I am finishing up my first year of doctoral-level political science studies. During that time I have read a lot of articles: approximately 550. 11 courses. 5 articles a week on average. 10 weeks. 11×5×10=550. Two things have bothered me immensely when reading these pieces: (1) it’s unclear whether authors know their data well, regardless of whether it is original or secondary data, and (2) the reader is rarely shown much about the data.
I take the stance that when you use a dataset you should know it inside and out. I do not just mean that you should have an idea of whether it’s normally distributed or has outliers. I expect you to know who collected it. I expect you to know its limitations.
For example, I have read public opinion data that sampled minority populations. Because those populations are minorities, the surveyors had to oversample in areas where the groups are overrepresented. The problem with this is that those who live near co-ethnics are different from those who live elsewhere. This restricts the external validity of results derived from the data, but I rarely see an acknowledgement of this.
Sometimes data is flawed but it’s the best we have. That’s fine. I’m not against using flawed data. I’m willing to buy most arguments if the underlying theory is well grounded. To be honest, I view statistical work as fluff most of the time. If I don’t really care about the statistics, why do I care whether the authors know their data well? I do because it serves as a way for authors to signal that they thought about their work. It’s similar to why artists sometimes place a “bowl of only green M&Ms” requirement in their performance contracts. Artists don’t know if their contracts were read, but if their candy bowl is filled with red Twizzlers they know something is wrong. I can’t monitor whether the authors took care with their manuscripts, but NOT seeing the bowl of only green M&Ms gives me a heads up that something is off.
Of those 500+ articles I have read, only a handful had a dedicated descriptive statistics section. The logic seems to be that editors encourage moving this material to appendices to make articles more readable. I don’t buy that argument for descriptive statistics. Moving robustness checks or replications to the appendices is fine, but descriptive stats give me a chance to actually look at the data and feel less concerned that the results are driven by outliers. In my second-best world, all dependent variables and major independent variables would be graphed. If the data was collected across different geographic areas, I would want the data mapped. In my first-best world, replication files with the full dataset and do-files would be mandatory for all papers.
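For what it’s worth, the kind of check being asked for here is cheap. Below is a rough sketch in Python using only the standard library. The `describe` helper and the toy numbers are made up for illustration, and the 1.5×IQR rule is just one common convention for flagging outliers, not something taken from any of the articles in question.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics plus values flagged as outliers.

    Hypothetical helper for illustration. Outliers are flagged with the
    common 1.5 * IQR rule, which is an assumption, not a universal standard.
    """
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": median,
        "sd": statistics.stdev(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "outliers": [v for v in ordered if v < low or v > high],
    }


# Toy data, invented for this example: one value sits far from the rest.
toy_scores = [2, 3, 3, 4, 4, 5, 5, 6, 40]
summary = describe(toy_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

On the toy data the mean is pulled well above the median and the value 40 is flagged, which is exactly the sort of thing a reader would want to see before trusting a regression table.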
I don’t think I am asking too much here. Hell, I am not even fond of empirical work. My favorite academic is Peter Leeson (GMU Econ & Law) and he rarely (if ever?) does empirical work. As long as empirical work is being done in the social sciences, though, I expect a certain standard. Otherwise all we’re doing is engaging in math masturbation.
TL;DR: I don’t trust most empirical work out there. I’ll rant about excessive literature reviews next time.