
Oct 17, 2007

Comment on "Tests of Statistical Significance in Economics"

This comment scrolled down the sidebar fairly quickly, so many of you may have missed Deirdre McCloskey's response to your comments on the post "Tests of Statistical Significance in Economics." As you'll recall, the post highlighted comments by Andrew Gelman on a debate between McCloskey/Ziliak and Hoover/Siegler over the use of classical statistical methodology in economics. Here's the comment:

Dears,

I can't keep all your monikers straight, so let me "reply" in a general way. Ziliak and I are to get the page proofs of a book about all this in a couple of weeks, out, I think, in December: The Cult of Statistical Significance (University of Michigan Press).

(1.) There we tell how R. A. Fisher resisted calls to make his procedures relevant to scientific hypotheses (by Neyman and Pearson, for example, or by the inventor of the t test, Gosset; or by later exponents of statistics with loss functions, such as Wald). He stuck with 5% and tests of one kind of scepticism, supposing (as is very frequently not the case) that the only source of scientific error is one of inference from a sample to a population.

(2.) But the basic point, which some of you-all are not quite seeing, is that good fit is not the same thing as importance. In fact, usually it has nothing to do with importance. Yet SPSS-sciences have thoroughly confused the two. As one of you said, irrelevant reporting of standard errors (when for example the issue is not one of sampling anyway!) is used as a referee-enforced entry fee to journals, and makes no contribution to the scientific oomph of the argument.

(3.) It's rather important to understand (see the book or any of the articles we wrote) that we are making a very old and elementary point. So it just won't do to say, "Wow, this is crazy. Who are these people anyway?" We're merely reporting on scores of the leading minds in statistics, from the beginning, who make the same point. See for example Bill Kruskal's devastating article on "significance" in the old Encyclopedia of Statistics. It's not our point. It's Savage's point, or Freedman's point (Freedman, Pisani, and Purves).
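To make point (2.) concrete, here is a minimal sketch with hypothetical numbers (not taken from the book or the articles). It compares two two-group studies, assuming a known unit standard deviation and a normal z-test; the helper name two_sample_z is illustrative. A trivially small effect measured on a huge sample comes out overwhelmingly "significant," while a large effect measured on a small sample does not.

```python
from statistics import NormalDist


def two_sample_z(effect_sd_units: float, n_per_group: int) -> tuple[float, float]:
    """Return the z statistic and two-sided p-value for a difference in means.

    Hypothetical helper. Assumes both groups have a known standard deviation
    of 1, so the standard error of the difference is sqrt(2 / n_per_group).
    """
    se = (2.0 / n_per_group) ** 0.5
    z = effect_sd_units / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical study A: a trivially small effect (0.01 sd) on a huge sample.
z_a, p_a = two_sample_z(effect_sd_units=0.01, n_per_group=1_000_000)

# Hypothetical study B: a large effect (0.5 sd) on a tiny sample.
z_b, p_b = two_sample_z(effect_sd_units=0.5, n_per_group=10)

print(f"A: effect 0.01 sd, z = {z_a:.2f}, p = {p_a:.1e}")
print(f"B: effect 0.50 sd, z = {z_b:.2f}, p = {p_b:.2f}")
```

Ranking the two studies by p-value and ranking them by effect size give opposite answers. That is the sense in which statistical precision and substantive importance come apart.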

In other words, some fields---not physics and engineering and chemistry---have made a dreadful mistake, which vitiates most of their statistical work.

Sincerely,

Deirdre McCloskey

    Posted by Mark Thoma on Wednesday, October 17, 2007 at 02:07 PM in Economics, Methodology




    Comments



    Harold says...

    So what do they recommend? What use is this argument in policy-oriented research, or in applied microeconomics, for instance?

    Posted by: Harold | Oct 17, 2007 at 02:26 PM

    2slugbaits says...

    Harold,

    Suppose a new cancer treatment shows great promise, but the null hypothesis of no effect cannot be rejected at the 10% level, only at the 11% level. Suppose another government agency finds a cure for dog dandruff, and the null hypothesis of no effect can be rejected at the 0.1% level. Which project deserves government funding?

    Posted by: 2slugbaits | Oct 17, 2007 at 04:08 PM

    notsneaky says...

    But no one is disagreeing with McCloskey and Ziliak on points 1 through 3!!!!! Hoover and Siegler certainly aren't. The disagreement is over whether economists focus on statistical significance to the exclusion of everything else. M&Z say yes, based on the data they constructed. H&S say no: M&Z's measure is constructed in a way which makes it more likely to generate the result they're looking for. Basically you've got to cross yourself, anoint your forehead with virgin olive oil blessed by the Pope, say five Hail Marys and two Our Fathers, and write "Statistical significance is not the only thing that is important" 100 times on the blackboard, or you're guilty by the McCloskey and Ziliak criteria. I think H&S get the better of the argument, but others may disagree. But the argument should be mostly about M&Z's methodology and data, not the straw man that Deirdre keeps pushing forward. Though I guess that kind of strategy can be expected from a rhetorician.

    Posted by: notsneaky | Oct 17, 2007 at 07:11 PM

    James Killus says...

    Hey, cool, notsneaky, you've got a classic straw man argument (and you get double points for the straw man argument being that the other guy is using straw man arguments) followed by a classic ad hominem, with yet another double score for the ad hominem being that your opponent is a rhetorician.

    That is really good.

    However, if M&Z's metric was really "constructed in a way which makes it more likely to generate the result they're looking for," then it should also generate that result for papers in physics, chemistry, and engineering, yet it does not. And if M&Z only made the point (and they actually make several others) that the difference between "statistically significant" and "real world significant" can be large, and that many papers do not sufficiently make the distinction, they would still have a fairly important criticism.

    I will say, however, that the best thing I saw in the halo of commentary etc. around the original set-to was the paper that bears the telling title "The Difference Between 'Significant' and 'Not Significant' is not Itself Statistically Significant."
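    The logic of that title can be sketched in a few lines. The estimates and standard errors below are illustrative assumptions, not figures taken from the paper, and z_and_p is a hypothetical helper. One estimate clears the 5% bar and the other does not, yet the difference between them is nowhere near "significant" itself.

```python
from statistics import NormalDist


def z_and_p(estimate: float, se: float) -> tuple[float, float]:
    """Return z and the two-sided p-value, assuming a normal sampling distribution."""
    z = estimate / se
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Illustrative (hypothetical) independent estimates of the same kind of effect.
est_1, se_1 = 25.0, 10.0  # z = 2.5: "significant" at the 5% level
est_2, se_2 = 10.0, 10.0  # z = 1.0: "not significant"

# For independent estimates, the SE of the difference is sqrt(se_1^2 + se_2^2).
diff = est_1 - est_2
se_diff = (se_1 ** 2 + se_2 ** 2) ** 0.5

rows = [("estimate 1", est_1, se_1), ("estimate 2", est_2, se_2), ("difference", diff, se_diff)]
for label, est, se in rows:
    z, p = z_and_p(est, se)
    print(f"{label}: {est:.1f} (SE {se:.1f}), z = {z:.2f}, p = {p:.3f}")
```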

    Posted by: James Killus | Oct 17, 2007 at 08:53 PM

    notsneaky says...

    James Killus,

    Yes, I exaggerated to make a point. It's still a valid point. There should be a lot more discussion about the methodology M&Z use to support their hypothesis (that economists care only about statistical significance, to the exclusion of other factors) rather than about whether R. A. Fisher was a chump or whether statistical significance is the only thing that ever matters. Particularly since no one's arguing that it is.

    As an addendum: when researchers present their findings, the usual practice is to take the "hypothesis hasn't been proven (since it never really can be), only not rejected" line. As a consequence, the results are often presented as "suggesting this" or "offering some support for the view that," etc. In other words, with caution and restraint. But M&Z are so eager to paint economists with their black soot brush that they present theirs with a lot of hubris and unwarranted bravado. That in itself is telling.

    Posted by: notsneaky | Oct 18, 2007 at 12:17 AM

    notsneaky says...

    Also, there are some pretty good and obvious reasons why folks who study physics, chemistry, and engineering don't spend as much time talking about statistical significance as economists do. They're dealing with quantities which can be much more easily observed (mostly) and precisely measured. They can do controlled experiments. Hence, there's less need for the "inferential" in "inferential statistics" in the first place.

    And the loss function that McCloskey keeps complaining about arises much more naturally in those areas than it does in economics (and where it does arise naturally in economics, say in the inflation/unemployment trade-off, it is used, as in the Taylor Rule).
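    For readers who want the Taylor Rule mentioned above spelled out, a minimal sketch follows. Strictly speaking it is a policy reaction function rather than a loss function, though it is often motivated by a central-bank loss that penalizes deviations of inflation from target and of output from potential. The coefficients follow Taylor's 1993 formulation; the function name taylor_1993_rate and the inputs are hypothetical.

```python
def taylor_1993_rate(inflation: float, output_gap: float,
                     target_inflation: float = 2.0,
                     equilibrium_real_rate: float = 2.0) -> float:
    """Nominal policy rate, in percent, from Taylor's (1993) rule.

    i = inflation + r* + 0.5 * (inflation - target) + 0.5 * output_gap
    The 0.5 weights and the 2% defaults follow Taylor's original paper;
    the function itself is a hypothetical helper for illustration.
    """
    inflation_gap = inflation - target_inflation
    return inflation + equilibrium_real_rate + 0.5 * inflation_gap + 0.5 * output_gap


# Hypothetical inputs: 3% inflation, output 1% below potential.
print(taylor_1993_rate(inflation=3.0, output_gap=-1.0))
```

The explicit 0.5 weights are where the trade-off between the two objectives is written down, which is the sense in which the rule encodes relative costs.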

    I don't see why one discipline that tackles a particular type of problem should be judged by the standards of other disciplines that tackle problems of a completely different nature. Should people studying music theory incorporate carbon dating into their toolbox?
    (Again, just exaggerating to make a point)

    Posted by: notsneaky | Oct 18, 2007 at 12:31 AM

    James Killus says...

    notsneaky, you still have not explained how M&Z have gimmicked their methodology to find a phenomenon in economics and other social sciences that they do not find in papers in physics, chemistry, and engineering. Until you do that, you are just slinging insults.

    And badly.

    Posted by: James Killus | Oct 18, 2007 at 12:34 AM

    notsneaky says...

    "you still have not explained how M&Z have gimmicked their methodology to find a phenomenon in economics and other social sciences that they do not find in papers in physics, chemistry, and engineering."

    Huh? I just gave two reasons why you wouldn't expect physics, chemistry and engineering to use statistical significance as much as economists do (and they do use stat sign much more than McCloskey claims):

    One, there's no reason for it since in physics, chemistry and engineering one can perform controlled experiments and measure variables precisely. Usually this is not the case in economics. We have to deal with the data the world gives us. Hence the need for inferential statistics.

    Two, the loss function arises naturally in those disciplines. In economics it often does not. It's much easier to specify a loss function for "bridge too weak, will collapse vs. bridge too strong, we used too many materials" than for "price elasticity of tomatoes too high vs. price elasticity of tomatoes too low". Again, different disciplines, different tools. You wouldn't expect carpenters to build houses with scalpels or surgeons to use ball-peen hammers.
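    The bridge example can be sketched as a decision under an asymmetric loss. Assume, hypothetically, that the load is normally distributed and that each unit of shortfall costs far more than each unit of excess capacity. Under a linear asymmetric loss of this kind, the capacity that minimizes expected loss is a quantile of the load distribution fixed by the cost ratio, not by a 5% significance cutoff. The function names and numbers are illustrative only.

```python
from statistics import NormalDist


def optimal_capacity(load: NormalDist, cost_too_weak: float, cost_too_strong: float) -> float:
    """Capacity that minimizes expected linear asymmetric loss (hypothetical helper).

    Each unit of shortfall (load above capacity) costs cost_too_weak; each unit
    of excess capacity costs cost_too_strong. The minimizer is the load quantile
    at tau = cost_too_weak / (cost_too_weak + cost_too_strong).
    """
    tau = cost_too_weak / (cost_too_weak + cost_too_strong)
    return load.inv_cdf(tau)


def expected_loss(capacity: float, load: NormalDist, cost_too_weak: float,
                  cost_too_strong: float, grid: int = 20_000) -> float:
    """Numerical check: average the loss over evenly spaced quantiles of the load."""
    total = 0.0
    for i in range(grid):
        x = load.inv_cdf((i + 0.5) / grid)
        total += cost_too_weak * max(x - capacity, 0.0)
        total += cost_too_strong * max(capacity - x, 0.0)
    return total / grid


load = NormalDist(mu=100.0, sigma=10.0)  # hypothetical load distribution
for weak, strong in [(1.0, 1.0), (99.0, 1.0)]:
    cap = optimal_capacity(load, weak, strong)
    print(f"cost ratio {weak:.0f}:{strong:.0f} -> capacity {cap:.1f}, "
          f"expected loss {expected_loss(cap, load, weak, strong):.2f}")
```

With equal costs the rule picks the median load; with a 99:1 cost ratio it picks the 99th percentile. The cutoff comes from the costs of each kind of error.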

    Why you make me repeat myself?

    And to, um, repeat myself, the point is that the discussion should be much more about M&Z's methodology rather than whether there exists at least one economist who thinks that stat sign is the Alpha and Omega of scientific research.

    For more details, refer to Hoover and Siegler, roughly pages 30 through 40 and then some later on.


    "Until you do that, you are just slinging insults."

    I haven't insulted you anywhere. Yet.

    Posted by: notsneaky | Oct 18, 2007 at 07:54 AM

    James Killus says...

    We cross-posted, notsneaky. Sorry I haven't been more prompt in reply.

    Part of current mythology is that physics, chemistry, and engineering deal in "quantities which can be much more easily observed (mostly) and precisely measured." This mythology has a substantial following among people who do not follow the literature of physics, chemistry, and engineering. I've used statistical inference as a matter of course on practically every paper I've ever written or co-written, and it's hard to find papers that do not at least use regression packages and variations on the theme.

    The simple fact is that there are plenty of the so-called "hard sciences" that cannot perform controlled experiments (astronomy, geology, atmospheric science, etc.) yet which do not fall into the "statistical significance" trap. There are traps that these fields fall into, things that have the same function as the "entry fee" that DM refers to, but arbitrary measures of statistical significance are not among them.

    Moreover, this is still carping. If M&Z "gimmicked" their selection criteria, there is still no reason why those gimmicks should not also sweep up physics and chemistry. At the moment, all you are doing is making an unsupported claim that this is because physics and chemistry do not use "statistical inference" as often as economics does. But that is just your assertion, unsupported by, dare I say it, any real data, statistical or otherwise. Indeed, if you read M&Z's paper, they specifically argue against your interpretation, and they are using data to back up their claims.

    Now, shall we go over everything here that you have written and divide it into the categories of "fact based criticisms" and "mere name calling?" And if we did so, would there be a difference at the two sigma level?

    Posted by: James Killus | Oct 18, 2007 at 01:23 PM

    notsneaky says...

    James,
    Sure, statistical inference is sometimes used in "hard" sciences although not to the same extent as in economics. Again, this has to do with the feasibility of running controlled experiments vs. relying on already available data. If you look in the Hoover & Siegler paper they do document this through a very crude, but telling, search through journals from various disciplines. Something like 10% of physics papers reference stat sign while 55% of (applied) econ papers do. So economists probably do use stat sign more often than "hard" sciences.

    Still, Hoover and Siegler argue, based on comments apparently from Horowitz (which I haven't read), that the 10% or so use of stat sign in physics is probably a lower bound, since oftentimes physicists use stat sign without explicitly using the term. But this is just evidence that stat sign is NOT "unimportant" even in the "hard sciences", whereas "unimportant" is pretty much what McCloskey and Ziliak argue. Moreover, according to Hoover and Siegler again, physicists use stat sign the same way economists do, so no one is really falling into "the "statistical significance" trap".

    Now as to whether M&Z's methodology should "sweep" up errors in physics, chemistry and engineering papers: if M&Z's methodology was properly constructed, then it certainly should. But it is not properly constructed! As a matter of fact, M&Z DO NOT EVEN LOOK AT the practice and usage of stat sign within the "hard" sciences in either of the two relevant papers (JEL 96 and JSE 04)! If you don't even look at how it's done in physics and chemistry, how can your methodology possibly "sweep up" physics and chemistry?!? I thought this was obvious.

    In other words, McCloskey's comment that "some fields---not physics and engineering and chemistry---have made a dreadful mistake" is a mere assertion with regard to physics, engineering and chemistry. This is readily apparent if one reads the two M&Z papers and the H&S reply.

    To state things one more time: M&Z's methodology is to look through economics journals (the AER specifically) for "applied papers". They then go through these papers and decide whether the papers address some questions chosen by M&Z. We've got double subjectivity here, in fact, since 1) the questions are subjectively chosen by M&Z, and some of them have a tenuous connection to the topic (if I put little asteri-xi next to my estimates that are sig at the 95% level, I get counted as "bad" even if I spend the next 10 pages talking about "oomph"), and 2) even accepting the criteria, it's very often a judgment call whether a given paper is "bad" or "good" (what the hell is a "crescendo" of an article?). Given that McCloskey at least has a lot riding on this (since she's been pushing this line for 10+ years, and this is perhaps what she's best known for), how can you trust them to be fair?

    Ok, enough "fact based criticisms". Now for some subjective insinuations:

    Are you sure that you're not just readily and gladly accepting McCloskey's claims without actually reading the relevant articles and looking at the relevant evidence, simply because it lets you look down on those "stupid economists"?

    Posted by: notsneaky | Oct 18, 2007 at 02:21 PM

    James Killus says...

    Well, notsneaky, if you look up above, you'll find that I referenced a paper tangential to the debate, so I've already given just a bit of evidence that I have read a little more widely than you imply. I read all the articles linked in the prior articles, and several of the further commentaries as well.

    In fact, I read economics papers quite frequently, albeit (I'm sure) idiosyncratically. Few of those papers are relevant to this discussion, of course, as I tend to read papers about historical price levels, monetary flows into and out of markets, and the phenomenon of "spurious correlation."

    You might also want to pay attention to my remark about the fact that there are other "entry fee" type devices in papers in the hard sciences, which should also give you a little pause before you snark about my just wanting to disrespect economists. I remember one photochemical modeling paper that drew its conclusions from a single simulation without comparing it to a base case, for example, so I have no illusions about the superiority of researchers in any given field.

    However, the error bars given in physical science papers are usually single sigma, not nearly two sigma, but then there is usually a raft of data reported, rather than a single test for statistical significance. Oh, and I'll also mention that I've seen some seriously twisted statistics used in policy arguments, my favorite being from an attack on a stratospheric ozone depletion paper where the commenter separated out sub-sets of data and then trumpeted the fact that the sub-sets were no longer statistically significant. So perhaps my stance can best be viewed as a disdain for the very idea of any standard measure of "statistical significance."
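    The subset-splitting move described above can be illustrated with hypothetical numbers. Suppose a real effect of 0.2 standard deviations is measured on 400 observations, and the data are then cut into k equal subsets that each show the same effect. Each subset's z statistic falls by a factor of sqrt(k), so the subsets stop being "significant" even though nothing about the effect has changed. The sketch assumes a known standard deviation and a simple z-test, and the helper names are illustrative.

```python
from statistics import NormalDist


def z_for_mean(mean: float, sd: float, n: int) -> float:
    """z statistic for testing a mean of zero with known standard deviation (hypothetical helper)."""
    return mean / (sd / n ** 0.5)


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical data: a real effect of 0.2 sd, observed 400 times.
mean, sd, n = 0.2, 1.0, 400
z_all = z_for_mean(mean, sd, n)
print(f"all data:    z = {z_all:.2f}, p = {two_sided_p(z_all):.1e}")

# Split the same data into k equal subsets, assuming each shows the same mean.
for k in (4, 16):
    z_sub = z_for_mean(mean, sd, n // k)
    print(f"{k:2d} subsets: z = {z_sub:.2f} each, p = {two_sided_p(z_sub):.2f}")
```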

    So to reiterate, I find the idea that I'm writing these things out of a lack of respect for economists rather bemusing, since that isn't my attitude at all. I also have substantial respect for statistics (and statisticians), which I imagine has more to do with my response when I see people misusing statistical inference. Why there would be a greater misuse of statistics in economics than in physics and chemistry is also an interesting question (I understand that you are claiming that there is no difference, but I believe that you are wrong). My conjecture would be that it is similar to the reason that there are almost always more misuses of any science that has policy implications, and economics is near the very top of the list when it comes to that. And it's illuminating, I think, that all the egregious misuses of the physical sciences that come immediately to mind have to do with things like environmental policy, where the same argument applies.

    Posted by: James Killus | Oct 18, 2007 at 10:59 PM

    notsneaky says...

    James,
    I do not doubt your expertise and knowledge. Nor do I doubt that you've read the relevant papers. But then it is really puzzling why you keep arguing that M&Z's methodology should "sweep" up results in physics, engineering and chemistry, since to the best of my knowledge they haven't really looked at physics, engineering and chemistry. So let me ask you some questions:
    a) Can you point me to a paper or a place where M&Z apply the methodology they used on AER articles to articles from physics, engineering and chemistry? And I'm not asking for just McCloskey making assertions to the effect that 'economists bad, engineers good' but for the actual application of the methodology.
    b) Do you really think the M&Z methodology is appropriate? Are you not concerned that many of the questions are only barely related to the topic at hand, if at all? Do you think that a person who's been on a personal crusade for more than 10 years is the best person to make these kinds of subjective judgment calls? Do none of H&S's arguments strike you as persuasive?
    and relatedly
    c) even supposing that the methodology is not total quackery, and that it actually has been applied somewhere to papers in chemistry, physics and engineering, do you really think M&Z possess enough expertise and knowledge about those fields to make these kinds of judgment calls correctly? I know, statistics is statistics and an error is an error, but lots of fields have their own terminology and nomenclature that an outsider may not catch on to.

    Posted by: notsneaky | Oct 19, 2007 at 10:44 AM

    James Killus says...

    to the best of my knowledge they haven't really looked at physics, engineering and chemistry. --notsneaky

    ...we have found by examining The Physical Review that physicists approximately never use tests of statistical significance; so too in the magazine Science the chemists and geologists...--Ziliak and McCloskey, "Size Matters: The Standard Error of Regressions in the American

    I'll grant that this is a single question in Z&M's questionnaire, but it is rather "the money shot," as it were, since the other questions tend to be subsequent statistical tests.

    As for whether or not I find any of H&S's arguments persuasive, I fear that there is probably a halo effect in operation: I rather dislike any set of "arguments" that begins with the phrase "with the zeal of a radio preacher." But in the final analysis, nothing that H&S argue undercuts the primary argument of M&Z, that economists (and psychologists, and many medical researchers) often misuse and misinterpret the notion of "statistical significance."

    This is, incidentally, not an entirely academic exercise. In the treatment of esophageal erosion, the "statistically significant" difference between Nexium (Esomeprazole, an expensive, patented drug) and Prilosec (Omeprazole, which contains both optical isomers of the same drug, and is now out of patent) was 84% vs 94%. The "statistically significant" difference between Nexium and Prevacid was 93% to 89%. Such differences, plus massive advertising, are the basis of $6 billion in drug sales for Nexium.
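    One way to keep the magnitudes in that example front and center is to report the risk difference and the number needed to treat (NNT) alongside the p-value. The sketch below uses the 93% vs 89% figures quoted above. The arm sizes are hypothetical, since the trials' sample sizes are not given here, and compare_rates is an illustrative helper built on a standard pooled two-proportion z-test.

```python
from statistics import NormalDist


def compare_rates(p1: float, p2: float, n_per_arm: int) -> dict[str, float]:
    """Risk difference, number needed to treat, and a pooled two-proportion z-test.

    Hypothetical helper; assumes equal arm sizes.
    """
    diff = p1 - p2
    pooled = (p1 + p2) / 2  # pooled proportion when both arms have n_per_arm patients
    se = (pooled * (1 - pooled) * 2 / n_per_arm) ** 0.5
    z = diff / se
    return {
        "risk_difference": diff,
        "nnt": 1 / abs(diff),  # patients treated per one additional healed patient
        "z": z,
        "p": 2 * (1 - NormalDist().cdf(abs(z))),
    }


# Healing rates quoted above; the arm sizes are hypothetical.
for n in (200, 2000):
    r = compare_rates(0.93, 0.89, n)
    print(f"n = {n}: difference {r['risk_difference']:.1%}, NNT {r['nnt']:.0f}, p = {r['p']:.3g}")
```

The difference and the NNT (about 25 patients treated for one additional healing) are the same in both runs; only the p-value moves with the sample size.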

    Posted by: James Killus | Oct 19, 2007 at 07:03 PM

    notsneaky says...

    " ...we have found by examining The Physical Review that physicists approximately never use tests of statistical significance; so too in the magazine Science the chemists and geologists...

    --Ziliak and McCloskey, "Size Matters: The Standard Error of Regressions in the American"

    But then we're back to square one! Apparently physicists, chemists and geologists, according to M&Z, DON'T use statistical significance tests, contrary to what you were arguing earlier (and for the record, I agree with you, and H&S, not M&Z. It's you who are disagreeing with yourself).
    If physicists etc. "approximately never" use stat sign, how do we know that, unlike the economists, they're using them correctly?

    "I'll grant that this is a single question in Z&M's questionaire, but it is rather "the money shot," as it were, since the other questions tend to be subsequent statistical tests."

    No, it's a cursory, sloppy look that pretends to confirm what are essentially the researchers' prior biases. Again, show me where they actually go through the physics etc. journals, identify uses of stat sign and show that these are correct, unlike in the economics journals.

    "I rather dislike any set of "arguments" that begins with the phrase "with the zeal of a radio preacher." "

    Perhaps that's a bit of unwarranted editorializing on their part which has no place in a scholarly article (though I find it very accurate) but it's irrelevant to whether their criticisms are valid or not. And if you want nasty remarks and insults just read McCloskey's reply which is much much worse.

    "But in the final analysis, nothing that H&S argue undercuts the primary argument of M&Z,"

    It undercuts the entire methodology, so it undercuts the entire argument. The questionnaire is arbitrary, ad hoc, imprecise and many of the questions are only tenuously related to the question (do economists screw it up...). The implementation of the questionnaire is subjective and it is done by two people who obviously have a stake in finding the "right" answer.

    Also, I just wanna say that I read pretty much the whole of each issue of AER. In fact I get giddy when it arrives in my mail box. And I just don't see the M&Z critique in them. In the applied papers it's pretty much standard practice to discuss the size of the effects, to do "counterfactual" simulations, to carry out policy predictions. Yes, sometimes people still put little asteri-xi next to their estimated coefficients which are sig at the 95% level (which makes M&Z count them as "bad"), but there's a lot of other stuff in those papers. And oftentimes people discuss the magnitude without actually using the phrase "economic significance" (which also makes M&Z count them as "bad"). So AT BEST I'm very very sceptical of M&Z's "findings", and there's a lot more evidence that needs to be presented before I'll be convinced that there's something to it. As I said before, the usual practice is to present one's findings as "suggestive" or "possibly supporting", not as definitive gospel truth. Particularly when the methodology is so subjective.

    "This is, incidentally, not an entirely academic exercise."

    But it's precisely the point that no one is arguing that it is. That's just a strawman that M&Z have constructed.

    Posted by: notsneaky | Oct 21, 2007 at 02:06 PM

    James Killus says...

    "But then we're back to square one! Apparently physicists, chemists and geologists, according to M&Z, DON'T use statistical significance tests, contrary to what you were arguing earlier (and for the record, I agree with you, and H&S, not M&Z. It's you who are disagreeing with yourself)."

    Okay, you've got me. I have no idea as to what the hell you are talking about, unless you have somehow decided that calculating a p value is exactly the same as a "statistical significance test." But that is not what M&Z are arguing, and that is not what H&S are responding to, and, as nearly as I can tell from rereading my earlier messages, that is not what I've said. I've reported p values in several papers; I've never bothered with reporting any particular value as "statistically significant," because I think the term is rubbish. Please explain how this means that I'm arguing with myself.

    But tell you what, let's look at a specific case cited by H&S as an example of M&Z's sloppiness and a triumph of statistical significance: the study of the effects of small dose aspirin on heart attack and strokes (Charles H. Hennekens 1988 as cited by H&S). McCloskey wrote that it was halted, for ethical reasons, because the results were so large, before the statistics had achieved the level of "statistical significance," i.e. p=0.05. H&S say that in fact the study had showed statistical significance, and in fact, had achieved p=0.00001 (for fatal and non-fatal myocardial infarctions).

    Now I'm going to set aside for a moment the fact that anyone who believes in a p value of 0.00001 in a clinical trial is an idiot. Believing that you have a statistical model that is good out that far on the distributional tail is something that only the truly ill-informed can believe (excepting those truly rare occurrences where one has collected enough data, or knows enough about the underlying process, to have extreme confidence in the nature of the underlying distribution, something that is most assuredly not the case in a clinical trial).
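    The point about extreme tails can be illustrated by holding the test statistic fixed and swapping the assumed sampling distribution. The sketch finds the statistic that gives a two-sided p of 0.00001 under a normal model. It then computes the tail probability for the same statistic under a Laplace distribution with the same variance. The Laplace choice is only an illustrative heavier-tailed alternative, not a claim about the trial's data.

```python
import math
from statistics import NormalDist

# Statistic at which a two-sided normal test gives p = 0.00001.
z = NormalDist().inv_cdf(1 - 0.00001 / 2)
p_normal = 2 * (1 - NormalDist().cdf(z))

# Same statistic, same unit variance, but a heavier-tailed Laplace model.
# For a Laplace with scale b, P(|X| > z) = exp(-z / b); variance 2 * b**2 = 1 gives b = 1 / sqrt(2).
b = 1 / math.sqrt(2)
p_laplace = math.exp(-z / b)

print(f"z = {z:.2f}")
print(f"normal model:  p = {p_normal:.1e}")
print(f"Laplace model: p = {p_laplace:.1e} ({p_laplace / p_normal:.0f}x larger)")
```

Both models have unit variance and look broadly similar near the center, yet their tail probabilities at this statistic differ by roughly two orders of magnitude.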

    But consider the process that is being described here. I'm very much hoping that this particular trial had a single "analysis point" at which these calculations were made, but even then I get a little dubious, because there is no obvious reason why the calculations cannot be made continuously as the data on morbidity and mortality come in, and in a situation where the data include deaths, well, not looking at the ongoing data has some serious implications. Let's consider that continuous calculation process model for a moment, as an exemplar.

    If the calculated p value was 0.00001 at the time the study was halted, at some earlier time, it would have been 0.05. Earlier still, it would have been 0.1, and so forth. I assume we're on the same page here.

    So, was there ever a point at which someone said, "Oh, sure, there's only 1 chance in 10 that the excess deaths in the control group are due to chance, but we have to wait for enough people to die for that to get to 1 chance in 20 before we halt the trial"? Does this never happen? From the sounds of it, it happens a lot. H&S dismiss this as the "licorice jellybean mistake," citing a hypothetical situation in which a few trials of licorice jellybeans might seem to have an effect on migraines. Remember, they are comparing this hypothetical with a real clinical trial with a much larger number of cases (p=0.1, remember?), which include a number of people dying.
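    The continuous-monitoring scenario above is the classic optional-stopping problem, and a small simulation shows why it matters. The sketch assumes no true effect, normally distributed observations with known variance, and a two-sided z-test at the 5% level. The function name and its parameters are hypothetical. Testing once at the end rejects about 5% of the time, as advertised, but testing after every batch and stopping at the first p < 0.05 rejects far more often.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_sims: int, batch: int, looks: int,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Share of null trials that ever reach p < alpha when tested after every batch.

    Hypothetical helper. Each observation is N(0, 1), so there is no true
    effect; the test is a two-sided z-test on the running mean with known
    unit variance.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(looks):
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
            n += batch
            # z = mean / (1 / sqrt(n)) = total / sqrt(n)
            if abs(total / n ** 0.5) > z_crit:
                hits += 1
                break
    return hits / n_sims


single_look = false_positive_rate(1000, batch=500, looks=1)
many_looks = false_positive_rate(1000, batch=10, looks=50)
print(f"one test after 500 observations:  {single_look:.3f}")
print(f"a test after every 10 (50 looks): {many_looks:.3f}")
```

This is why monitored trials typically use pre-specified sequential stopping boundaries (group-sequential designs, for example) rather than a fixed 0.05 applied at every look. The simulation says nothing about how the aspirin trial itself was actually monitored.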

    Moreover, the description of the trial gives other possible outcomes, and notes that there was no "significant difference" in strokes between the two groups, or in overall deaths from cardiovascular disease. So the phenomenon of "statistical significance" caused those in charge of the study to focus on myocardial infarction, that is, they allowed a particular statistical measure to draw their focus to specific test variables, which is exactly what M&Z are complaining about.
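    The arithmetic behind focusing on whichever outcome happens to clear the bar is short. Assume, for simplicity, k independent outcomes with no true effect. The chance that at least one of them shows p < 0.05 is 1 - 0.95^k, and a Bonferroni correction divides the threshold by k to keep that chance at or below 5%. Independence is a simplifying assumption here, since the outcomes in a cardiovascular trial are correlated, and the helper names are illustrative.

```python
def chance_of_some_hit(k: int, alpha: float = 0.05) -> float:
    """P(at least one of k independent null outcomes has p < alpha)."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-outcome threshold keeping the family-wise error rate at or below alpha."""
    return alpha / k


# Three outcomes mentioned above: heart attack, stroke, cardiovascular death.
# Treating them as independent is a simplifying assumption.
k = 3
print(f"P(some outcome 'significant' by chance) = {chance_of_some_hit(k):.3f}")
print(f"Bonferroni per-outcome threshold = {bonferroni_threshold(k):.4f}")
```

With three outcomes the chance of at least one spurious "hit" is about 14%. On the p-value H&S report for myocardial infarction, that result would clear the Bonferroni threshold comfortably. The sketch only shows how selecting among outcomes changes the arithmetic.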

    So, you say that nobody disagrees with the idea that one should not confuse statistical significance with real world significance, and that nobody thinks of using an artificial statistical cutoff as a data filter? It looks to me like H&S allow statistical blindness to equate migraine headaches with lethal heart attacks, and approve of a study which filtered out stroke data because of an arbitrary statistical measure. Exactly what kind of examples are you looking for, if these don't count? And remember, H&S thought that this example was an argument against M&Z's complaints.

    Posted by: James Killus | Oct 21, 2007 at 07:31 PM

    notsneaky says...

    James,
    Well, first, you're still evading the question - do you consider the M&Z methodology sound and if so why? Do you really think that what they do gets at the question?

    Second - "I assume we're on the same page here." Yes but I think the crux is here: "this particular trial had a single "analysis point" at which these calculations were made" - what if the "statistical significance" came at this point? It was both stat sign and big in magnitude.

    "there is no obvious reason why the calculations cannot be made continuously as the data on morbidity and mortality come in"

    There are plenty of obvious reasons. And anyway, you'd have to read the actual study to know. And McCloskey hadn't.

    Third, while insisting on an all-or-nothing 95% significance level is of course ridiculous, providing t-stats, or using asterixi is just a quick way to indicate what range the p-values are in. Mind you, McCloskey's not big on p-values either.
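    The asterisk convention amounts to binning a continuous p-value. Here is a minimal sketch assuming one common convention (*** for p < 0.01, ** for p < 0.05, * for p < 0.10; journals differ). It also assumes a large-sample normal approximation for turning a t statistic into a p-value. The helper names are hypothetical.

```python
from statistics import NormalDist

# One common convention (assumed here; conventions vary by journal).
STAR_CUTOFFS = [(0.01, "***"), (0.05, "**"), (0.10, "*")]


def p_from_t(t: float) -> float:
    """Two-sided p-value using the normal approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def stars(p: float) -> str:
    """Return the star string for the tightest cutoff the p-value falls under."""
    for cutoff, mark in STAR_CUTOFFS:
        if p < cutoff:
            return mark
    return ""


for t in (1.5, 1.8, 2.1, 2.7):
    p = p_from_t(t)
    print(f"t = {t:.1f}  p = {p:.3f}  {stars(p) or '(no stars)'}")
```

Nothing in the binning uses the size of the coefficient. The stars only summarize where the p-value falls.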

    Anyway. We're starting to run in circles here.

    Posted by: notsneaky | Oct 22, 2007 at 04:44 PM

    James Killus says...

    Well, I certainly agree that someone is evading the question here.

    Posted by: James Killus | Oct 22, 2007 at 07:29 PM



