Comments on 'Tests of Statistical Significance in Economics' (Economist's View, Mark Thoma; feed updated 2007-10-05)
<p>Deirdre McCloskey (2007-10-15T16:23:22Z) commented:</p>
<p>Dears,</p>
<p>I can't keep all your monikers straight, so let me "reply" in a general way. Ziliak and I are to get the page proofs of a book about all this in a couple of weeks, out in, I think, December: The Cult of Statistical Significance (University of Michigan Press). </p>
<p>(1.) There we tell how R. A. Fisher resisted calls to make his procedures relevant to scientific hypotheses (by Neyman and Pearson, for example, or by the inventor of the t test, Gossett; or by later exponents of statistics with loss functions such as Wald). He stuck with 5% and tests of one kind of scepticism, supposing (as is very frequently not the case) that the only source of scientific error is one of inference from a sample to a population. </p>
<p>(2.) But the basic point, which some of you-all are not quite seeing, is that good fit is not the same thing as importance. In fact, usually it has nothing to do with importance. Yet SPSS-sciences have thoroughly confused the two. As one of you said, irrelevant reporting of standard errors (when for example the issue is not one of sampling anyway!) is used as a referee-enforced entry fee to journals, and makes no contribution to the scientific oomph of the argument.</p>
<p>(3.) It's rather important to understand (see the book or any of the articles we wrote) that we are making a very old and elementary point. So it just won't do to say, "Wow, this is crazy. Who are these people anyway?" We're merely reporting on scores of the leading minds in statistics, from the beginning, who make the same point. See for example Bill Kruskal's devastating article on "significance" in the old Encyclopedia of Statistics. It's not our point. It's Savage's point, or Freedman's point (Freedman, Pisani, and Purves).</p>
<p>In other words, some fields---not physics and engineering and chemistry---have made a dreadful mistake, which vitiates most of their statistical work.</p>
<p>Sincerely,</p>
<p><br />
Deirdre McCloskey</p>
<p>KIO (inflationusa.blogspot.com, 2007-10-08T13:09:32Z) commented:</p>
<p>As a matter of fact, experimental physics is impossible without statistics (with tests of statistical significance as an inevitable part of it). Theoretical physics could not exist without experimental physics, and thus without statistics.<br />
Effectively, all our fundamental physical laws are statistical relationships between numerous measurables, with various levels of significance. There are no strict or exact physical laws; they all have a certain level of uncertainty (and an associated level of significance as a simple description of the statistical properties of the corresponding data), as predefined by limited measurement precision (not accuracy, because we do not always know the true value of a given variable - we have never used dark matter as a part of Newton's laws).</p>
<p>Therefore, by definition (we just often forget about it), testing of statistical significance is a core procedure of physics. (In a physics laboratory it often happens, if not always at the highest research level, that researchers throw away the results of experiments that do not meet certain criteria of significance, and repeat the experiment in a new setting until those criteria are satisfied.) This is the result of measurement errors, which are of the same amplitude as the signal at the cutting edge of physics.<br />
</p>
<p>(Unnamed commenter, 2007-10-07T13:24:14Z) commented:</p>
<p>According to G.S. Maddala (Introduction to Econometrics), the consensus around the 5% and 10% significance levels is something of an historical accident. Maddala argues that Sir R.A. Fisher, who is regarded as the "father" of modern statistics, simply offered 5% and 10% as suggested levels of significance, and over the generations that suggestion became gospel.</p>
<p>Outside of academia very few people in the real world take the 5% and 10% thresholds all that seriously. Something coming in at the 11% significance level will probably be good enough to win the practical argument.</p>
<p>You can also make the argument that strict adherence to 5% and 10% significance levels has (perhaps unintentionally) corrupted many studies. A few years ago there was a paper that examined the reported tests of statistical significance across randomly selected social-science papers. It found that a statistically unlikely number of studies reported results that were barely under the wire, which suggests some unconscious manipulation of the data in order to publish results.</p>
<p>ScentOfViolets (2007-10-06T23:25:18Z) commented:</p>
<p>I suspect this 'consensus' dates back to the days when this was looked up in standard tables, which had a limited number of entries, rather than being directly calculated.</p>
<p>Meanwhile, back in the real world... (2007-10-06T22:21:21Z) commented:</p>
<p>"why is the significance taken to be at the 5% level? What's the justification? Why not at the 0.0001% level, or the 20% level? ...It's quite possible that under one set of circumstances, significance at the 5% level is a meaningless statement, while in another, significance at, say, the 10% level is very significant indeed."</p>
<p>That is correct, in a general way. The term 'significance' has a precise definition, referring to how often a result would be expected to arise by chance under certain conditions. The question of what level of significance is substantively important is not a math/stat question.</p>
<p>Basically, the consensus has developed over time in the social sciences that five percent is enough to justify further work and/or taking the matter seriously (pending confirmation) in thinking about whatever it is. Partly this is simply tradition; it is also done with reference to experience of how good the data are likely to be. Assuming that research is to go forward, expectations must be set at a level that can be satisfied by results achievable in the real world.</p>
<p>However, in other contexts other levels are appropriate. In marketing or product research, ten percent is often used. If your life depended on the result, you might insist on a stricter standard. This is not a question with an absolute answer; it's always a judgement call.<br />
<br />
</p>
<p>James Killus (sff.net/people/james-killus, 2007-10-06T22:12:36Z) commented:</p>
<p>I never say or imply that physicists never use statistics, but I would appreciate it if you can explain the difference between "hypothesis testing" and "arbitrary measure of significance" and the proof that M&Z focus on the latter...</p>
<p>I beg your pardon for slightly misquoting your original claim: that M&Z say that physicists do not use "statistical test(s)." However, the difference between "statistics" and "statistical tests" pales before the difference between "hypothesis testing" and "hypothesis testing via statistical significance levels," the latter being what I hope, hope, hope is what you meant to say.</p>
<p>When Cavendish burned hydrogen and got water, he was testing the hypothesis that water is an element. No statistics were involved, but it was hypothesis testing. If one determines (via regression) that a particular data set is better explained by a quadratic function than by a linear one (something I myself once did, not to compare myself to Cavendish, of course; I'm a much nicer guy than he was) one is using a statistical method to compare two hypotheses, but there is no need to use "significance levels" unless one suspects that the differences in the data might be due to chance. Frequently, one doesn't really care; you're going to either follow that line of research or you aren't, and if it's just an accident, you'll find out soon enough.</p>
<p>Physicists, chemists, and practitioners of many other sciences usually report error bounds for their measurements as a matter of course, but the connection to whatever hypotheses are being tested is tenuous, and frequently there is no hypothesis involved; the data are simply reported as data.</p>
<p>The reasons why these particular sciences do not rely excessively on tests of statistical significance are very similar to the reasons why McCloskey and Ziliak object to them: they do very little to advance the science, and they seem primarily to be a "community" phenomenon within certain sciences, used as a method of selecting which papers are worthy of publication, without recourse to any insight into the science itself.</p>
<p>ScentOfViolets (2007-10-06T17:50:15Z) commented:</p>
<p>Robert, I'm still not sure what you mean. It might help if you illustrate with a specific example.</p>
<p>I looked over the links, btw, and this is nothing new, from what was an admittedly brief perusal. Basically, all they're saying is that you can have very precise tests for significance. For example, in a one-tailed test on a normal distribution with 90% confidence, if your measurements give a figure that would come up only 5% of the time, you could conclude that you should reject the null hypothesis. (There's a lot of detail in there I'm skipping, so please take this explanation to be heuristic.) Now, the conditions for this test of statistical significance can be very precisely stated, the measurements can be made arbitrarily (theoretically) accurate, and so on and so forth, which can be very appealing to people who want a 'hard' result. The problem is . . . why is the significance taken to be at the 5% level? What's the justification? Why not at the 0.0001% level, or the 20% level? To cut a long story short, this is the problem with these tests - the essential arbitrariness of assigning these sorts of cutoff values. It's quite possible that under one set of circumstances, significance at the 5% level is a meaningless statement, while in another, significance at, say, the 10% level is very significant indeed. And unfortunately, it is sometimes quite difficult to make a good argument on why a given level of significance is chosen . . . until after the fact.</p>
<p>robertdfeinman (robertdfeinman.com/society, 2007-10-06T16:54:10Z) commented:</p>
<p>Here's an example of a recent physics paper. In the place where it compares theory to recent experiments, notice the use of error bars (figure 6).</p>
<p><a href="http://arxiv.org/PS_cache/nucl-th/pdf/0201/0201021v4.pdf" rel="nofollow">http://arxiv.org/PS_cache/nucl-th/pdf/0201/0201021v4.pdf</a></p>
<p>I show it for two reasons: first, to confirm that even theoretical physicists use statistical techniques when appropriate, and second, to show that any field can render itself incomprehensible when it wants to.</p>
<p>The actual point of the paper is to conjecture whether neutrons are actually spherical under all circumstances. Apparently (this is from the press release) they can look more like a peanut at times.</p>
<p>2slugbaits (2007-10-06T14:29:14Z) commented:</p>
<p>Regarding the issue of McCloskey's gender, I would point out that even Deirdre refers to Donald and Deirdre as two separate persons. When Deirdre McCloskey was promoting her book "Crossings" at a local bookstore, she said that her new gender had caused her to revise many of Donald's former views. So in that sense it might be a bit misleading to use the term "Donald/Deirdre," because it implies a unity of thought that is not always there.</p>
<p>Regarding McCloskey's comment that statistics has nothing new to contribute, that reminds me of a comment that Donald McCloskey once made in a lecture, in which he said that understanding Alfred Marshall was just about all you needed to know in economics...the rest was just scribbles on the board. His lectures were always interesting.</p>
<p>Meanwhile, back in the real world... (2007-10-06T13:37:53Z) commented:</p>
<p>"I also think they totally misunderstand why physicists don't use statistical tests."</p>
<p>Physicists use statistical tests on a regular basis, e.g. to allow for random effects in noisy data. Of course, the significance levels they get are somewhat beyond those achievable by mere mortals. For instance, in the search for the Higgs boson they intend to insist on five sigma. </p>
<p>passerby (2007-10-06T04:16:19Z) commented:</p>
<p>James,</p>
<p>I never say or imply that physicists never use statistics, but I would appreciate it if you can explain the difference between "hypothesis testing" and "arbitrary measure of significance" and the proof that M&Z focus on the latter. </p>
<p>James Killus (sff.net/people/james-killus, 2007-10-06T04:00:41Z) commented:</p>
<p>passerby,</p>
<p>No, McCloskey and Ziliak do not claim that physicists use hypothesis testing very rarely. They say that physicists (and chemists, and geologists) rarely use the arbitrary measure of "statistical significance." My own observation and experience tends to confirm that. However, that is not the same as "not using statistics." Neither is it the same as not engaging in "hypothesis testing." I don't really want to have to explain the differences, but I will if I must.<br />
</p>
<p>passerby (2007-10-06T02:21:26Z) commented:</p>
<p>I am citing McCloskey and Ziliak for the claim that physicists use hypothesis testing very rarely. (I don't mean that they absolutely don't use it, but that they generally don't need to use it.) </p>
<p>James Killus (sff.net/people/james-killus, 2007-10-06T01:26:51Z) commented:</p>
<p>passerby,</p>
<p>Wherever did you get the idea that physicists do not use statistical tests? I cannot imagine that you know any actual physicists. Or chemists, for that matter.</p>
<p>robertdfeinman (robertdfeinman.com/society, 2007-10-05T23:48:15Z) commented:</p>
<p>Scent:<br />
In a normal distribution there is about a 68% probability that the data will lie within one standard deviation of the mean and about a 95% probability that it will lie within two standard deviations of the mean.</p>
<p>Since you said the data would be reliable in 9 out of 10 cases that is about the same as saying that the level of confidence lies within two standard deviations of the mean.</p>
<p>Rich:<br />
If the remark about McCloskey wasn't being snide, then what about the "zeal of a radio preacher" remark? Sorry, the paper doesn't pass the smell test. Ad hominem attacks may be acceptable in blogs, but not in scholarly publications. </p>
<p>ScentOfViolets (2007-10-05T23:05:07Z) commented:</p>
<p>I'm sorry, Robert, but I'm not following you. Could you try again please? I'm an old man, and not the quickest of sorts by any stretch of the imagination.</p>
<p>passerby (2007-10-05T22:36:54Z) commented:</p>
<p>But don't you guys think that MZ's last reply is also appealing to authority and emotion more than elaborating their points? </p>
<p>I also think they totally misunderstand why physicists don't use statistical tests. The data are of a different nature and physicists don't need them; it is not that the method is bad per se. We don't need chopsticks to eat Western meals; does that imply chopsticks are bad tools for meals?</p>
<p>Rich (2007-10-05T22:26:50Z) commented:</p>
<p>The reference to McCloskey's original given name was not, I suspect, gratuitous. It was probably provided to help readers who might not know that McCloskey's name changed between 1985 and 2002 (the two publication dates cited).</p>
<p>robertdfeinman (robertdfeinman.com/society, 2007-10-05T21:52:36Z) commented:</p>
<p>Scent:<br />
Thanks, that would imply about two standard deviations for a normal distribution. There is a nice graph on Wikipedia:<br />
<a href="http://en.wikipedia.org/wiki/Standard_deviation#Standard_deviation_of_a_random_variable" rel="nofollow">http://en.wikipedia.org/wiki/Standard_deviation#Standard_deviation_of_a_random_variable</a></p>
<p>ScentOfViolets (2007-10-05T21:17:08Z) commented:</p>
<p>I'm not sure how much you know about statistics, so please excuse me if this sounds patronizing, and understand that I have no intention of being so. When you hear things like "the margin of error is four percent," what is usually meant is that in 19 out of 20 samples drawn from the population, the sample means (or some other statistic that will enable us to draw inferences about some characteristic of the general population) will be within four percent of the given figure. This is called the confidence level, and that too can vary; another common one would be 9 out of 10. So, for example, if your local paper says that 38% of the voters favor candidate X, with a margin of error of 4%, what that means is that if this survey were done 20 times, in 19 out of 20 cases the polling data would give a figure somewhere between 34% and 42%.</p>
<p>James Killus (sff.net/people/james-killus, 2007-10-05T21:06:08Z) commented:</p>
<p>And that's not even addressing the "zeal of a radio preacher" remark.</p>
<p>The paper linked to in item #8 is quite good and also brief.</p>
<p>robertdfeinman (robertdfeinman.com/society, 2007-10-05T20:28:47Z) commented:</p>
<p>As usual, resorting to ad hominem attacks makes one doubtful of the theoretical aspects of a person's arguments, to wit (Hoover and Siegler):</p>
<p>For twenty years, since the publication of the first edition of The Rhetoric of Economics (1985a), Deirdre (né Donald) N. McCloskey has campaigned tirelessly to convince the economics profession that it is deeply confused about statistical significance. With the zeal of a radio preacher, McCloskey (2002) declares the current practice of significance testing to be one of the two deadly “secret sins of economics”:</p>
<p>Is it relevant to drag in a person's gender identity?</p>
<p>If I understand the dispute, there are those who fail to measure the uncertainty of the data adequately and some who also don't reveal the accuracy of fit between the data and the model. This is not a "good thing" in any study.</p>
<p>The other issue seems to be something called "economic significance". The best I can figure is that this means the problem under study isn't "important", or that while the factor under discussion has a high probability of being correlated with some other factor, the variable itself is not one that exerts much influence in the real world.<br />
<br />
It would seem one wouldn't know which factors are going to be the most important before doing the study, and any attempt to isolate a small subset may choose those which have less influence. But one seldom knows this in advance.</p>
<p>Not discussed is the reliability of the data. A perfect example can be seen when polling people. The polls always quote things like "margin of error is 4%". What this means is that if they ran the poll twice with two groups of people, there would be a slight chance that the results would differ by more than this margin (two standard errors?). It doesn't mean that the people being polled know what they are talking about. Jay Leno makes a career out of demonstrating this with his man-on-the-street interviews.</p>
<p>I don't know what ax Hoover and Siegler have to grind, but they don't seem to be meeting the standards of scholarly debate.<br />
</p>