When Census data are collected, incomes in excess of an upper limit (approximately $1 million) are top-coded and reported as $999,999 rather than as the full amount. There's a bit more to it than this, but what's important is that there are reporting limits.
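To make the mechanics concrete, here is a minimal sketch of top-coding as a simple cap. The three incomes are made up for illustration, and the real Census procedure is more involved than a single `min`, as noted above.

```python
TOP_CODE = 999_999  # reported value for capped incomes, as described in the text


def top_code(income, cap=TOP_CODE):
    """Return the income as it would be reported under a simple cap."""
    return min(income, cap)


# Hypothetical incomes, for illustration only.
incomes = [45_000, 250_000, 3_000_000]
reported = [top_code(x) for x in incomes]

# Income that disappears from the reported data because of the cap.
missed = sum(incomes) - sum(reported)
```

Only the third income is affected, and everything it earned above the cap goes missing from the reported total.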
Recently, the effect that top-coding of Census data has on measured income inequality has been under discussion. The issue arose when Alan Reynolds claimed that various statistical issues have created a false impression of rising inequality, a claim that has been thoroughly rebutted here and elsewhere (see below). In response, Paul Krugman emails "a finger exercise on earnings inequality and reporting limits":
Do Reporting Limits Really Affect Measured Income Inequality, by Paul Krugman 1/8/07: For my own edification, I thought I’d make a rough estimate of how much the Census reporting limits affect one dimension of inequality, inequality in earnings. What we learned amid all the nonsense from Alan Reynolds is that the Census data don’t count earned income in excess of approximately $1 million, and other forms of income are subject to even tighter reporting limits. But let’s focus just on earnings.
Now, only a tiny minority of Americans make enough for the reporting limits to matter. According to the Social Security Administration data, http://www.ssa.gov/OACT/COLA/awidevelop.html, in 2005 less than 0.06% of workers had wages and salaries exceeding $1 million – and only the part of their income over $1 million is censored. Can that piece really be big enough to significantly affect overall measures of the level and trend in inequality?
According to an estimate I’ve just done using the SSA data, the answer is yes. This kind of calculation is new to me – I was just having what passes for fun in the sick mind of an economist – but I’d like to hear any comments.
The SSA data give the number of people with wage and salary income in ranges – e.g., $1 million to $1.5 million, $1.5 million to $2 million, and so on. But it’s pretty easy to use those data to estimate the total income excluded by a $1 million reporting limit.
The key is knowing that top incomes tend to follow a Pareto distribution. That is, the number of people with any given (high) income falls off as a power of that income:
n = Ky^(-alpha)
We can integrate this to get the number of people with incomes exceeding some level Y:
N = KY^(1-alpha)/(alpha-1)
Or, in logs,
ln(N) = ln(K/(alpha-1)) + (1-alpha) ln(Y)
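For readers who want the integration step spelled out, it can be written as follows. The condition alpha > 1 is not stated in the text; it is the standard requirement for the integral to converge.

```latex
N(Y) = \int_Y^{\infty} K y^{-\alpha}\,dy
     = \left[\frac{K y^{1-\alpha}}{1-\alpha}\right]_Y^{\infty}
     = \frac{K Y^{1-\alpha}}{\alpha-1}, \qquad \alpha > 1

\ln N(Y) = \ln\frac{K}{\alpha-1} + (1-\alpha)\ln Y
```

So a plot of ln(N) against ln(Y) should be a straight line with slope 1-alpha.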
Does this work? You bet! Figure 1 shows the SSA data for 2005, with income measured in millions and actual numbers of people. The Pareto distribution works very well indeed. And the fitted line lets us estimate both K and alpha: K = 154318; alpha = 2.6867.
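The fit itself is not shown, but a log-log regression of this kind might look like the sketch below. It assumes ordinary least squares on ln(N) against ln(Y). Because the SSA table is not reproduced here, the counts are synthetic, generated from the reported K and alpha, so the example only checks that the procedure recovers the parameters it was fed. The function name and the thresholds are illustrative.

```python
import math


def fit_pareto_tail(thresholds, counts):
    """Least-squares fit of ln(N) = a + b*ln(Y); returns (K, alpha)."""
    xs = [math.log(y) for y in thresholds]
    ys = [math.log(n) for n in counts]
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    alpha = 1.0 - slope                       # slope = 1 - alpha
    k = math.exp(intercept) * (alpha - 1.0)   # intercept = ln(K / (alpha - 1))
    return k, alpha


# Synthetic counts built from the reported parameters (NOT the actual SSA data).
K_REPORTED, ALPHA_REPORTED = 154318.0, 2.6867
thresholds = [1.0, 1.5, 2.0, 2.5, 3.0, 5.0, 10.0]  # income in $ millions
counts = [K_REPORTED * y ** (1 - ALPHA_REPORTED) / (ALPHA_REPORTED - 1)
          for y in thresholds]

k_hat, alpha_hat = fit_pareto_tail(thresholds, counts)
```

With real bracket counts, the points would scatter around the line rather than sit exactly on it, and the quality of the fit is what Figure 1 is showing.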
A bit more math lets us derive the total income of people with earnings above a given Y: it’s
Y total = KY^(2-alpha)/(alpha-2)
Since we’re measuring income in millions, and looking for the income of people in the million-plus category, this becomes simply K/(alpha-2) = about $225 billion.
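As a quick check on that figure, the formula can be evaluated directly with the fitted values. This is only a sketch; the function name is made up, and the guard on alpha reflects the fact that the total-income integral converges only when alpha exceeds 2.

```python
K = 154318.0    # fitted constant reported in the text
ALPHA = 2.6867  # fitted Pareto exponent reported in the text


def income_above(y_min, k=K, alpha=ALPHA):
    """Total income, in $ millions, of earners above y_min ($ millions)."""
    if alpha <= 2:
        raise ValueError("total income diverges unless alpha > 2")
    return k * y_min ** (2 - alpha) / (alpha - 2)


total_millions = income_above(1.0)        # at Y = 1 this is just K / (alpha - 2)
total_billions = total_millions / 1000.0
```

The result lands just under $225 billion, in line with the rounded number above.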
The SSA also gives us total wage and salary income: $5.374 trillion in 2005. So the 82,000 workers in the million-plus club, less than 0.06 percent of the work force, accounted for 4.2% of wages and salaries.
Some of that total, the part under $1 million, was counted. How much? $82 billion: $1 million for each of the 82,000 workers. What’s left, roughly $143 billion above the reporting limit, I estimate at 2.7% of total wage and salary income. If I understand correctly, that explains about a third of the difference between the Census and Piketty-Saez estimates of the top 5% share. Bear in mind that the reporting limits on other forms of income also matter, and that there are other sources of bias in the Census numbers, such as a tendency of high-income respondents to understate their incomes.
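The arithmetic behind those shares is short enough to write down. The sketch below uses only the rounded figures quoted in the text, so the results are approximate; the variable names are illustrative.

```python
TOTAL_WAGES_BN = 5374.0   # total wage and salary income, 2005, $ billions
TOP_INCOME_BN = 225.0     # estimated income of the million-plus group, $ billions
N_TOP_WORKERS = 82_000    # workers earning more than $1 million
CAP_MILLIONS = 1.0        # reporting limit, $ millions

# Each capped worker still has the first $1 million counted.
counted_bn = N_TOP_WORKERS * CAP_MILLIONS / 1000.0

# Everything above the cap is missed.
missed_bn = TOP_INCOME_BN - counted_bn

top_share = TOP_INCOME_BN / TOTAL_WAGES_BN    # share earned by the group
missed_share = missed_bn / TOTAL_WAGES_BN     # share hidden by the cap
```

That gives a top share of about 4.2% and a missed share of about 2.7%.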
More important is how the reporting limits affect trends. Here’s what I did: I went back to 1994, just after the Census changed the reporting limits, and did exactly the same exercise. Figure 2 shows the Pareto plot for 1994: notice that the line is steeper, which says that income among the million-plus club wasn’t quite as unequal in 1994 as it was in 2005.
When you run through the whole exercise for 1994, what you find is that the earnings that would be missed because of reporting limits are much smaller than in 2005, only 0.7% of total wage and salary income. This isn’t surprising: the reporting limit hasn’t changed, while the structure of wages has shifted right both because of inflation and because of rising average real earnings. Moreover, top incomes have become more unequal, with more income in the far right tail. So there’s a lot more income above the reporting limit, all of which will be captured by income-tax-based estimates of income inequality, but won’t be captured by Census data.
Now, the Census data say that the income share of the top 5% rose only slightly, from 21.2% to 22.2%, between 1994 and 2005. The Piketty-Saez data, which only go up to 2004, show a rise of 3.7 percentage points. Our little exercise with earnings data suggests that the missed income due to reporting limits rose by about 2 percentage points over the same period (from 0.7% to 2.7% of wages and salaries), so that more than half the difference between the Census and Piketty-Saez trends could be the result of reporting limits that caused the Census data to miss a large and growing amount of income at the very top.
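The "more than half" claim can be checked with the numbers quoted above. The sketch reads the 3.7 Piketty-Saez figure as percentage points, and, like the text, it loosely compares a share of wages and salaries with shares of total income, so treat it as back-of-the-envelope only.

```python
census_rise = 22.2 - 21.2        # Census top-5% share change, percentage points
piketty_saez_rise = 3.7          # Piketty-Saez change, read as percentage points

# Gap between the two measured trends.
trend_gap = piketty_saez_rise - census_rise

# Change in earnings hidden by the reporting limit, 1994 to 2005.
missed_1994, missed_2005 = 0.7, 2.7
missed_rise = missed_2005 - missed_1994

# Fraction of the gap that top-coding of earnings alone could account for.
explained_fraction = missed_rise / trend_gap
```

On these figures, top-coding of earnings accounts for roughly three quarters of the 2.7-point gap.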
The bottom line: top-coding really, truly does matter – and yes, Virginia, income inequality is still rising.