Wednesday, September 20, 2006

If At First You Don't Succeed, Run Another Regression

In political science, and I suspect in economics as well, published t-statistics are surprisingly scarce just below the critical value for significance at the 5% level and surprisingly plentiful just above it, suggesting that “scholars ‘tweak’ regression specifications and samples to move barely insignificant results above conventional thresholds.”

Something fishy in political science p-values..., Statistical Modeling, Causal Inference, and Social Science, by Andrew: A comment pointed out this note by Kevin Drum on this cool paper by Alan Gerber and Neil Malhotra on p-values in published political science papers. They find that there are surprisingly many papers with results that are just barely statistically significant (t=1.96 to 2.06) and surprisingly few that are just barely not significant (t=1.85 to 1.95). Perhaps people are fudging their results or selecting analyses to get significance. Gerber and Malhotra's analysis is excellent--clean and thorough.

Here's the abstract from the paper:

Abstract ...This paper examines the two most prominent political science journals (the APSR and the AJPS) and two major literatures in the discipline (the effect of negative advertisements and economic voting) to see if there is evidence of publication bias. We examine the effect of the .05 significance level on the pattern of published findings using what we term a “caliper” test and can reject the hypothesis of no publication bias at the 1 in 100,000,000 level. Our findings therefore strongly suggest that the results reported in the leading political science journals and in two important literatures are misleading and inaccurate due to publication bias. We also discuss some of the reasons for publication bias and propose reforms to reduce its impact on research.
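The abstract does not spell out the mechanics of the “caliper” test, so the following is only a minimal sketch of the general idea, not G&M's actual procedure: count the reported statistics that fall in a narrow band just below the critical value and in an equally wide band just above it, then ask how likely such a lopsided split would be if a result inside the caliper were equally likely to land on either side. The function name `caliper_test`, the band width of 0.10, and the example numbers are all assumptions made up for illustration.

```python
from math import comb
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal, approximately 1.96.
CRITICAL = NormalDist().inv_cdf(0.975)

def caliper_test(z_stats, critical=CRITICAL, width=0.10):
    """Return (below, above, p_value) for a simple caliper comparison.

    below: number of |z| values in [critical - width, critical)
    above: number of |z| values in [critical, critical + width)
    p_value: one-sided binomial probability of seeing at least `above`
             results over the line out of below + above, assuming each
             result in the caliper is equally likely to fall on either side.
    """
    below = sum(1 for z in z_stats if critical - width <= abs(z) < critical)
    above = sum(1 for z in z_stats if critical <= abs(z) < critical + width)
    n = below + above
    # P(X >= above) for X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
    return below, above, p_value

# Made-up z-statistics for illustration only; these are NOT G&M's data.
example = [1.88, 1.97, 1.99, 2.01, 2.03, 2.04, 2.05, 1.98, 2.02, 1.93, 0.5, 3.1]
below, above, p = caliper_test(example)
print(below, above, round(p, 4))
```

If the density of z-statistics is falling in this region, as it would be under a bell curve centered at zero, slightly more results should land below the line than above it, so the 50/50 benchmark is, if anything, a conservative one for detecting an excess just above.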

Kevin Drum includes a graph showing this effect explicitly:

Lies, Damn Lies, and....: Via Kieran Healy, ...It is, at first glance, just what it says it is: a study of publication bias, the tendency of academic journals to publish studies that find positive results but not to publish studies that fail to find results. ...

The chart on the right shows G&M's basic result. In statistics jargon, a significant result is anything with a "z-score" higher than 1.96, and if journals accepted articles based solely on the quality of the work, with no regard to z-scores, you'd expect the z-score of studies to resemble a bell curve. But that's not what Gerber and Malhotra found. Above a z-score of 1.96, the results fit the bell curve pretty well, but below a z-score of 1.96 there are far fewer studies than you'd expect. Apparently, studies that fail to show significant results have a hard time getting published.

So far, this is unsurprising. Publication bias is a well-known and widely studied effect, and it would be surprising if G&M hadn't found evidence of it. But take a closer look at the graph. In particular, take a look at the two bars directly adjacent to the magic number of 1.96. That's kind of funny, isn't it? They should be roughly the same height, but they aren't even close. There are a lot of studies that just barely show significant results, and there are hardly any that fall just barely short of significance. There's a pretty obvious conclusion here, and it has nothing to do with publication bias: data is being massaged on a wide scale. A lot of researchers who almost find significant results are fiddling with the data to get themselves just over the line into significance.

And looky here. In properly sober language, that's exactly what G&M say:

It is sometimes suggested that insignificant findings end up in “file drawers,” but we observe many results with z-statistics between zero and the critical value. There is, however, no way to know how many studies are “missing.” If scholars “tweak” regression specifications and samples to move barely insignificant results above conventional thresholds, then there may be many z-statistics below the critical value, but an inordinate number barely above the critical value and not very many barely below it. We see that pattern in the data.

We see that pattern in the data. Message to political science professors: you are being watched. And if you report results just barely above the significance level, we want to see your work....
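The mechanism G&M describe, tweaking specifications until a barely insignificant result crosses the threshold, can be illustrated with a small simulation. This is a toy sketch under assumptions of my own, not a model taken from the paper: every study's z-statistic is drawn from a normal distribution centered one standard error from zero, and a "tweaking" researcher whose result lands between 1.6 and 1.96 tries up to five alternative specifications, each shifting z by a small random amount, and reports the first one that clears 1.96. The function names and all of the numeric settings are hypothetical.

```python
import random

CRITICAL = 1.96  # conventional two-sided 5% critical value

def simulate(n_studies, tweak, seed=0):
    """Return the list of reported |z| statistics for n_studies studies."""
    rng = random.Random(seed)
    reported = []
    for _ in range(n_studies):
        # Hypothetical: estimates average one standard error from zero.
        z = abs(rng.gauss(1.0, 1.0))
        if tweak and 1.6 <= z < CRITICAL:
            # Hypothetical specification search: up to 5 alternatives,
            # each perturbing z a little; keep the first significant one.
            for _ in range(5):
                alt = z + rng.gauss(0.0, 0.15)
                if alt >= CRITICAL:
                    z = alt
                    break
        reported.append(z)
    return reported

def caliper_counts(zs, width=0.10):
    """Count results just below and just above the critical value."""
    below = sum(1 for z in zs if CRITICAL - width <= z < CRITICAL)
    above = sum(1 for z in zs if CRITICAL <= z < CRITICAL + width)
    return below, above

honest = caliper_counts(simulate(20000, tweak=False))
tweaked = caliper_counts(simulate(20000, tweak=True))
print("honest  (below, above):", honest)
print("tweaked (below, above):", tweaked)
```

Without tweaking, the two bars next to 1.96 come out at similar heights. With tweaking, the bar just below the line is hollowed out and the bar just above it swells, which is the qualitative pattern in the chart. Nothing here selects on publication, so the gap in this toy setup comes from the specification search alone.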

I'd be interested in seeing a similar chart for economics.

Posted on Wednesday, September 20, 2006 at 10:41 AM in Economics, Methodology

