On Borjas, Data and More Data

I see my craft as an economic historian as a dual mission. The first is to answer historical question by using economic theory (and in the process enliven economic theory through the use of history). The second relates to my obsessive-compulsive nature which can be observed by how much attention and care I give to getting the data right. My co-authors have often observed me “freaking out” over a possible improvement in data quality or be plagued by doubts over whether or not I had gone “one assumption too far” (pun on a bridge too far). Sometimes, I wish more economists would follow my historian-like freakouts over data quality. Why?

Because of this!

In that paper, Michael Clemens (whom I secretly admire – not so secretly now that I have written it on a blog) criticizes the recent paper produced by George Borjas showing the negative effect of immigration on wages for workers without a high school degree. Using the famous Mariel boatlift of 1980, Clemens basically shows that there were pressures on the US Census Bureau at the same time as the boatlift to add more black workers without high school degrees. This previously underrepresented group surged in importance within the survey data. However since that underrepresented group had lower wages than the average of the wider group of workers without high school degrees, there was an composition effect at play that caused wages to fall (in appearance). However, a composition effect is also a bias causing an artificial drop in wages and this drove the results produced by Borjas (and underestimated the conclusion made by David Card in his original paper to which Borjas was replying).

This is cautionary tale about the limits of econometrics. After all, a regression is only as good as the data it uses and suited to the question it seeks to answer. Sometimes, simple Ordinary Least Squares are excellent tools. When the question is broad and/or the data is excellent, an OLS can be a sufficient and necessary condition to a viable answer. However, the narrower the question (i.e. is there an effect of immigration only on unskilled and low-education workers), the better the method has to be. The problem is that the better methods often require better data as well. To obtain the latter, one must know the details of a data source. This is why I am nuts over data accuracy. Even small things matter – like a shift in the representation of blacks in survey data – in these cases. Otherwise, you end up with your results being reversed by very minor changes (see this paper in Journal of Economic Methodology for examples).

This is why I freak out over data. Maybe I can make two suggestions about sharing my freak-outs.

The first is to prefer a skewed ratio of data quality to advanced methods (i.e. simple methods with crazy-data). This reduces the chances of being criticized for relying on weak assumptions. The second is to take a leaf out of the book of the historians. While historians are often averse to advantaged data techniques (I remember a case when I had to explain panel data regressions to historians which ended terribly for me), they are very respectful of data sources. I have seen historians nurture datasets for years before being willing to present them. When published, they generally stand up to scrutiny because of the extensive wealth of details compiled.

That’s it folks.

 

Advertisements

2 thoughts on “On Borjas, Data and More Data

  1. I agree with you fully. I think that knowing your data, and its flaws, is under rated.

Please keep it civil

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s