Friday, September 15, 2006

Start with the Question

Brad DeLong takes on, rightly, the Columbia Journalism Review for its attack on Paul Krugman for "partisan slipperiness" over his use of statistics in a recent column. There seems to be a "Krugman does it too" industry developing as some misguided attempt to balance out the misleading analytics that often come from his opponents. So far, though, they haven't been successful.

It's useful to remember that every statistic is an unbiased estimator of something; it just may not be what you want. Statistics aren't wrong. They are what they are, and they estimate what they estimate. If you look at "wage gains for one-eyed bearded men with 2.5 years of college" you get an estimate of how this group has fared. The estimate isn't wrong; it just might not answer a useful question, or have much to do with the question being asked. The question to ask is not "Are the statistics wrong?" They aren't. They measure something. The question is whether a particular statistic is a useful measure for answering the question at hand.

No measure is perfect. Theoretically we may know precisely what we mean by a term like output or prices, but as we've seen with the debate over measuring inflation, measurement can be controversial. To evaluate a statistic such as mean income, median income, all goods inflation, core inflation, wage gains for prime-age males with no college, and so on, start with the question being asked, then ask if a particular measure is the best possible way, given inevitable data constraints, of answering the question (which is not to say that won't be debatable at times).
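To see how two perfectly valid statistics can answer different questions, here is a minimal sketch. The incomes are invented purely for illustration and are not drawn from any real data set. Mean income answers "how did total income change per person," while median income answers "how did the person in the middle fare."

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars (made up for illustration, not real data).
incomes_before = [30_000, 40_000, 50_000, 60_000, 1_000_000]
# Same hypothetical group later: the top earner gains a lot, everyone else loses a little.
incomes_after = [29_000, 39_000, 49_000, 59_000, 1_500_000]


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


mean_change = pct_change(mean(incomes_before), mean(incomes_after))
median_change = pct_change(median(incomes_before), median(incomes_after))

print(f"mean income change:   {mean_change:+.1f}%")
print(f"median income change: {median_change:+.1f}%")
```

With these made-up numbers the mean rises by roughly 42 percent while the median falls by 2 percent. Neither number is wrong. Which one is useful depends on whether the question is about aggregate income or about the typical person.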

Measuring inflation provides a good example. If the question is "How has the cost of living changed for a typical consumer," core inflation is not the best measure; all goods inflation is better. But if the question is "What measure of inflation forecasts future all goods inflation best," then core inflation appears to work better. It would be wrong to criticize the use of core inflation to predict future inflation on the grounds that it's not the best measure of the cost of living. That's not the question the statistic is being asked to answer.

This is where Bree Nordenson of the Columbia Journalism Review got herself into trouble with her attempt at "Krugman does it too." She failed to start with the question - Why isn't the American economy generating higher income gains for those in middle America who have attained a college education - then ask if there is a better statistic to answer the question (I don't think there is), or if the statistic being used gives a misleading answer. As Brad DeLong puts it:

The idea is to rebut the claim that the American economy is still offering rapidly increasing rewards to those in middle America who have bettered themselves by getting a college education. Paul chose it not because it was one of the few statistics that goes his way, but because it is one of the statistics that would be expected to show income gains if the principal factor at work were rising rewards to education.

She also seems to be concerned because "This is not a widely published statistic," and is critical because "he chooses somewhat specific data himself." Of course it's specific; the question was about a specific group of people. All that matters is whether it's the best statistic to answer the question. Whether the data are widely published has nothing to do with it. In any case, Brad shows it isn't obscure at all.

So, it's really pretty simple. Start with the question, then ask what empirical measures shed the most light on the answer.

Update: Brad DeLong posts Bree Nordenson's response as she grasps at straws trying to defend herself. For example, she says, this time attacking DeLong for his defense of Krugman:

We wonder whether Professor DeLong's economics students would be permitted to look at the median personal income of college graduates in two separate years, chosen at random, and then present a paper suggesting that this data alone sheds light on the question of education's effect on income inequality.

The years are 2000 and 2005. Does she really think these were "chosen at random?" The data end in 2005 - could that have anything to do with why 2005 is chosen? Perhaps. She could just read what Krugman said:

Political analysts tried all sorts of explanations for popular discontent with the “Bush boom” — it’s the price of gasoline; no, people are in a bad mood because of Iraq — before finally acknowledging that most Americans think it’s a bad economy because for them, it is. The lion’s share of the benefits from recent economic growth has gone to a small, wealthy minority, while most Americans were worse off in 2005 than they were in 2000.

The question is about discontent and the distribution of gains during the so-called "Bush boom" and how the distribution is related to education. Could that have anything to do with the choice of 2000? (Her suggestion of using the change from 2004 to 2005 instead is ludicrous, given that the question is about the gains during the recent cycle.)
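The point about base years can be sketched with a toy series. The index values below are hypothetical, chosen only to show the mechanics, and are not the income figures Krugman or Nordenson cite. A change measured over the whole cycle and a change measured over the final year can easily have opposite signs, so the question being asked has to pick the window.

```python
# Hypothetical real income index (made-up numbers, 2000 = 100); not actual data.
income_index = {
    2000: 100.0,
    2001: 98.5,
    2002: 97.0,
    2003: 96.0,
    2004: 95.0,
    2005: 96.0,
}


def pct_change(series, start, end):
    """Percentage change in the series between two years."""
    return 100.0 * (series[end] - series[start]) / series[start]


# Question: how did this group fare over the whole cycle?
cycle_change = pct_change(income_index, 2000, 2005)
# Different question: how did this group fare in the most recent year?
last_year_change = pct_change(income_index, 2004, 2005)

print(f"2000 to 2005: {cycle_change:+.1f}%")
print(f"2004 to 2005: {last_year_change:+.1f}%")
```

In this invented series the group is down 4 percent over the cycle even though the last year shows a gain of about 1 percent. Both figures are accurate, but only the first one answers a question about the cycle as a whole.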

This part was funny too:

Leaving aside the question of whether "typical" households can be characterized as "median" (rather than, say, "average" or "neighbors of Paul Krugman")...

Huh? Good decision to leave that aside, as it would be hard to win an argument that Krugman's neighbors are more typical than the median person. The rest of the argument is similarly . . . let's just say unpersuasive. [Update: Yet another post from Brad DeLong shows that David Brooks, to whom Krugman is responding in part, also appears to use the years 2000 and 2005 as base years for comparison, further explaining why Krugman would want to use the same years in rebuttal. Bree?]

