Crossposted from mathbabe.org. The views expressed below are those of Cathy O’Neil.
As Rortybomb reported yesterday on the Roosevelt Institute blog (hat tip Adam Obeng), a recent paper by Thomas Herndon, Michael Ash, and Robert Pollin tried to replicate the results of an economics paper by Carmen Reinhart and Kenneth Rogoff entitled Growth in a Time of Debt.
The original Reinhart and Rogoff paper had concluded that public debt loads greater than 90 percent of GDP consistently reduce GDP growth, a “fact” which has been widely used. However, the more recent paper finds problems. Here’s the abstract:
Herndon, Ash and Pollin replicate Reinhart and Rogoff and find that coding errors, selective exclusion of available data, and unconventional weighting of summary statistics lead to serious errors that inaccurately represent the relationship between public debt and GDP growth among 20 advanced economies in the post-war period. They find that when properly calculated, the average real GDP growth rate for countries carrying a public-debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent as published in Reinhart and Rogoff. That is, contrary to RR, average GDP growth at public debt/GDP ratios over 90 percent is not dramatically different than when debt/GDP ratios are lower.
The authors also show how the relationship between public debt and GDP growth varies significantly by time period and country. Overall, the evidence we review contradicts Reinhart and Rogoff’s claim to have identified an important stylized fact, that public debt loads greater than 90 percent of GDP consistently reduce GDP growth.
A few comments.
1) We should always have the data and code for published results.
Herndon, Ash and Pollin were able to attempt the replication because they personally requested the Excel spreadsheets from Reinhart and Rogoff. Given how politically useful and important this result has been (see Rortybomb’s explanation of this), it’s kind of a miracle that Reinhart and Rogoff released the spreadsheet. Indeed that’s the best part of this story from a scientific viewpoint.
2) The data and code should be open source.
One cool thing is that now you can actually download the data – there’s a link at the bottom of this page. I did this and was happy to have a bunch of csv files and some (open source) R code which presumably recovers the Excel spreadsheet mistakes. I also found some .dta files, which seem to be in Stata’s proprietary file format. That’s annoying, but I googled and it seems you can use R to turn .dta files into csv files. It’s still weird that they wrote code in R but saved files in Stata.
3) These mistakes are easy to make and they’re mostly not considered mistakes.
Let’s talk about the “mistakes” the authors found in Reinhart and Rogoff’s work. First, Reinhart and Rogoff excluded certain time periods for certain countries, specifically right after World War II. Second, they chose certain “non-standard” weightings for the various countries they considered. Finally, they accidentally excluded certain rows from their calculation.
Only that last one is considered a mistake by modelers. The others are modeling choices, and they happen all the time. Indeed it’s impossible not to make such choices. Who’s to say that you have to use standard country weightings? Why? How much data do you actually need to consider? Why?
[Aside: I'm sure there are proprietary trading models running right now in hedge funds that anticipate how other people weight countries in standard ways and bet accordingly. In that sense, using standard weightings might be a stupid thing to do. But in any case validating a weighting scheme is extremely difficult. In the end you're trying to decide how much various countries matter in a certain light, and the answer is often that your country matters the most to you.]
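To see how much a weighting choice alone can move the answer, here is a minimal Python sketch. The country names and growth numbers are invented for illustration and are not taken from either paper's data; the two function names are mine as well. One function averages each country's years first and then averages the countries equally, the other pools every country-year so that each year counts once.

```python
from statistics import mean

# Hypothetical growth rates (percent) for country-years with debt/GDP over 90%.
# These numbers are made up for illustration; they are NOT the RR or HAP data.
high_debt_growth = {
    "Country A": [2.5, 2.4, 2.6, 2.5, 2.3, 2.7, 2.4, 2.6],  # 8 high-debt years
    "Country B": [-7.0],                                     # 1 high-debt year
    "Country C": [1.0, 1.2, 0.8],                            # 3 high-debt years
}


def mean_by_country(data):
    """Average each country's years first, then weight every country equally."""
    return mean(mean(years) for years in data.values())


def mean_by_country_year(data):
    """Pool all country-years, so a country with more years counts for more."""
    return mean(g for years in data.values() for g in years)


if __name__ == "__main__":
    print("equal weight per country:     ", round(mean_by_country(high_debt_growth), 2))
    print("equal weight per country-year:", round(mean_by_country_year(high_debt_growth), 2))
```

With these made-up numbers the first scheme gives a negative average and the second a positive one, because one bad year in Country B counts as much as eight decent years in Country A under equal country weights. Neither scheme is obviously the right one, which is the point.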
4) We need to actually consider other modeling possibilities.
It’s not a surprise, to economists anyway, that after you include more post-WWII years of data, which we all know to be high debt and high growth years worldwide, you get a substantively different answer. Excluding these data points is just as much a political decision as a modeling decision.
In the end the only reasonable way to proceed is to describe your choices, and your reasoning, and the result, but also consider other “reasonable” choices and report the results there too. And if you don’t like the answer, or don’t want to do the work, at the very least you need to provide your code and data and let other people check how your result changes with different “reasonable” choices.
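Reporting results under several “reasonable” choices doesn’t have to be much work. Here is a rough Python sketch of the idea, assuming your data is a list of (country, year, growth) rows; the rows, the cutoff years, and the function names are all hypothetical and only there to show the shape of the exercise. It runs the same estimate over every combination of two choices (which early years to drop, and how to weight) and returns a small table.

```python
from itertools import product
from statistics import mean

# Hypothetical (country, year, growth %) rows for the high-debt bucket.
# Invented for illustration only; not real data.
rows = [
    ("A", 1946, 4.0), ("A", 1947, 3.0), ("A", 1960, 1.0),
    ("B", 1948, 5.0), ("B", 1970, -1.0),
    ("C", 1980, 0.0),
]


def estimate(rows, first_year, weighting):
    """Average growth, keeping years >= first_year, under one weighting scheme."""
    kept = [(country, growth) for country, year, growth in rows if year >= first_year]
    if weighting == "country-year":
        # Every country-year counts once.
        return mean(growth for _, growth in kept)
    # Otherwise: average within each country, then across countries.
    by_country = {}
    for country, growth in kept:
        by_country.setdefault(country, []).append(growth)
    return mean(mean(values) for values in by_country.values())


def sensitivity_table(rows, first_years=(1946, 1950),
                      weightings=("country", "country-year")):
    """Run the estimate under every combination of the modeling choices."""
    return {
        (first_year, weighting): estimate(rows, first_year, weighting)
        for first_year, weighting in product(first_years, weightings)
    }


if __name__ == "__main__":
    for (first_year, weighting), value in sensitivity_table(rows).items():
        print(f"start={first_year}  weighting={weighting:<12}  mean growth={value:.2f}")
```

If the numbers in that table agree, great. If they don’t, the reader deserves to see the whole table and not just the cell you liked best.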
Once the community of economists (and other data-centric fields) starts doing this, we will all realize that our so-called “objective results” utterly depend on such modeling decisions, and are about as variable as our own opinions.
5) And this is an easy model.
Think about how many modeling decisions and errors are in more complicated models!