The
blow-up over the Reinhart-Rogoff results reminds me of a point I’ve been
meaning to make about our ability to use empirical methods to make progress in
macroeconomics. This isn't about the computational mistakes that Reinhart and
Rogoff made, though those are certainly important, especially in small samples;
it's about the quantity and quality of the data we use to draw important
conclusions in macroeconomics.
Everybody has been highly critical of theoretical macroeconomic models, DSGE
models in particular, and for good reason. But the imaginative construction of
theoretical models is not the biggest problem in macro – we can build reasonable
models to explain just about anything. The biggest problem in macroeconomics is
the inability of econometricians of all flavors (classical, Bayesian) to
definitively choose one model over another, i.e. to sort between these
imaginative constructions. We like to think of ourselves as scientists, but if
data can’t settle our theoretical disputes – and it doesn’t appear that it can –
then our claim for scientific validity has little or no merit.
There are many reasons for this. For example, the use of historical rather
than “all else equal” laboratory/experimental data makes it difficult to figure
out if a particular relationship we find in the data reveals an important truth
rather than a chance run that mimics a causal relationship. If we could run
repeated experiments, or compare data across countries (or other jurisdictions)
without worrying about the "all else equal" assumption, we could perhaps sort
this out; cross-country comparisons would serve as the repeated experiments. But, unfortunately, there are
too many institutional differences and common shocks across countries to
reliably treat each country as an independent, all else equal experiment.
Without repeated experiments – with just one set of historical data for the US
to rely upon – it is extraordinarily difficult to tell the difference between a
spurious correlation and a true, noteworthy relationship in the data.
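To see how easily a short historical sample can manufacture an apparent relationship, here is a rough illustrative simulation (my own sketch, not anything from the empirical literature discussed here): it generates pairs of completely unrelated random walks about the length of a postwar quarterly macro sample and counts how often an ordinary regression declares them significantly related.

```python
# Illustrative sketch of the classic "spurious regression" problem:
# two independent random walks, in a sample about the size of a typical
# postwar quarterly macro dataset, routinely look related by the usual
# statistical standards.
import numpy as np

rng = np.random.default_rng(0)
T = 112            # roughly the quarterly observations available circa 1986
n_sims = 5_000
significant = 0

for _ in range(n_sims):
    x = np.cumsum(rng.standard_normal(T))   # one random walk
    y = np.cumsum(rng.standard_normal(T))   # another, completely unrelated

    # OLS of y on x (with intercept), plus the naive t-statistic on the slope
    X = np.column_stack([np.ones(T), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (T - 2)
    var_beta = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
    t_stat = beta[1] / np.sqrt(var_beta)

    if abs(t_stat) > 1.96:                  # "significant" at the 5% level
        significant += 1

# Far more than 5% of these unrelated series pass a conventional t-test.
print(f"share flagged as significant: {significant / n_sims:.2f}")
```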
Even so, if we had a very, very long time-series for a single country, and if
certain regularity conditions persisted over time (e.g. no structural change),
we might be able to answer important theoretical and policy questions (if the
same policy is tried again and again over time within a country, we can sort out
the random and the systematic effects). Unfortunately, the time period
covered by a typical data set in macroeconomics is relatively short (so that
very few useful policy experiments are contained in the available data, e.g.
there are very few data points telling us how the economy reacts to fiscal
policy in deep recessions).
There is another problem with using historical as opposed to experimental
data: testing theoretical models against data the researcher already knows
about when the model is built. In this regard, when I was a new assistant professor, Milton
Friedman presented some work at a conference that impressed me quite a bit. He
resurrected a theoretical paper he had written 25 years earlier (it was
his plucking model of aggregate fluctuations), and tested it against the
data that had accumulated in the time since he had published his work. It's not
really fair to test a theory against historical macroeconomic data; we all know
what the data say, and it would be foolish to build a model that is inconsistent
with the historical data it was built to explain – of course the model will fit
the data, and who would be impressed by that? But a test against data that the
investigator could not have known about when the theory was formulated is a
different story – those tests are meaningful (Friedman’s model passed the test
using only the newer data).
As a young time-series econometrician struggling with data/degrees of freedom
issues I found this encouraging. So what if in 1986 – when I finished graduate
school – there were only 28 years of quarterly observations for macro variables
(112 total observations; reliable data on money, which I almost always needed,
doesn't begin until 1959). By, say, the end of 2012 there would be almost double
that amount (216 versus 112!!!). Asymptotic (plim-type) results here we come!
(Switching to monthly data doesn’t help much since it’s the span of the data –
the distance between the beginning and the end of the sample – rather than the
frequency at which the data are sampled that determines many of the "large-sample
results”).
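To make that span-versus-frequency point concrete, here is another rough sketch (again my own illustration, with arbitrary numbers): estimate the drift of a simulated random walk from monthly and from quarterly observations of the same sample period, and compare the precision to what a longer span buys.

```python
# Illustrative sketch: for trend-type quantities such as the drift of a
# random walk, the precision of the estimate is pinned down by the span of
# the sample, not by how finely it is sampled within that span.
import numpy as np

rng = np.random.default_rng(1)
n_sims = 2_000
drift = 0.25          # per month, arbitrary

def drift_estimates(years, step):
    """Per-year drift estimates from a random walk with drift, sampled every
    `step` months over `years` years, across many simulated realizations."""
    months = 12 * years
    shocks = rng.standard_normal((n_sims, months))
    paths = np.cumsum(drift + shocks, axis=1)
    sampled = paths[:, step - 1::step]          # keep every `step`-th month
    diffs = np.diff(sampled, axis=1)
    return diffs.mean(axis=1) * (12 / step)     # convert to per-year units

# Same 28-year span, monthly vs. quarterly sampling: precision barely differs.
print(np.std(drift_estimates(28, 1)), np.std(drift_estimates(28, 3)))
# Doubling the span is what actually helps (std falls by about 1/sqrt(2)).
print(np.std(drift_estimates(56, 3)))
```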
By today, I thought, I would have almost double the data I had back then and
that would improve the precision of tests quite a bit. I could also do what
Friedman did, take really important older papers that give us results “everyone
knows” and see if they hold up when tested against newer data.
It didn’t work out that way. There was a big change in the Fed’s operating
procedure in the early 1980s, and because of this structural break 1984 is now
a common starting point for empirical investigations (start dates can fall
anywhere in the 1979–84 range, though later dates are more common). Data before
this period are discarded.
So, here we are 25 years or so later and macroeconomists don’t have any more
data at our disposal than we did when I was in graduate school. And if the
structure of the economy keeps changing – as it will – the same will probably be
true 25 years from now. We will either have to model the structural change
explicitly (which isn’t easy, and attempts to model structural breaks often
induce as much uncertainty as clarity), or continually discard historical data
as time goes on (maybe big data, digital technology, theoretical advances, etc.
will help?).
The point is that for a variety of reasons – the lack of experimental data, small data sets, and important structural change foremost among them – empirical macroeconomics
is not able to definitively say which competing model of the economy best
explains the data. There are some questions we’ve been able to address
successfully with empirical methods, e.g., there has been a big change in views
about the effectiveness of monetary policy over the last few decades driven by
empirical work. But for the most part empirical macro has not been able to
settle important policy questions. The debate over government spending
multipliers is a good example. Theoretically the multiplier can take a range of
values from small to large, and even though most theoretical models in use today
say that the multiplier is large in deep recessions, ultimately this is an
empirical issue. I think the preponderance of the empirical evidence shows that
multipliers are, in fact, relatively large in deep recessions – but you can find
whatever result you like and none of the results are sufficiently definitive to
make this a fully settled issue.
I used to think that the accumulation of data along with ever improving
empirical techniques would eventually allow us to answer important theoretical
and policy questions. I haven’t completely lost faith, but it’s hard to be
satisfied with our progress to date. It’s even more disappointing to see
researchers overlooking these well-known, obvious problems – for example, the
lack of precision and the sensitivity to data errors that come with relying on
just a few observations – in order to oversell their results.