Know your data, show your data: A rant

I am finishing up my first year of doctoral-level political science studies. During that time I have read a lot of articles, approximately 550: 11 courses, 5 articles a week on average, 10 weeks per course, so 11×5×10=550. Two things have bothered me immensely when reading these pieces: (1) it’s unclear whether authors know their data well, regardless of whether it is original or secondary data, and (2) the reader is rarely shown much about the data.

I take the stance that when you use a dataset you should know it inside and out. I do not just mean that you should have an idea of whether it’s normally distributed or has outliers. I expect you to know who collected it. I expect you to know its limitations.

For example, I have read public opinion data that sampled minority populations. Because these populations are minorities, the researchers had to oversample in areas where the groups are overrepresented. The problem is that those who live near co-ethnics are different from those who live elsewhere. This restricts the external validity of results derived from the data, but I rarely see an acknowledgement of this.

Sometimes data is flawed but it’s the best we have. That’s fine. I’m not against using flawed data. I’m willing to buy most arguments if the underlying theory is well grounded. To be honest, I view statistical work as fluff most of the time. If I don’t really care about the statistics, why do I care whether the authors know their data well? I care because it serves as a way for authors to signal that they thought about their work. It’s similar to why artists sometimes place a “bowl of only green m&ms” requirement in their performance contracts. Artists don’t know whether their contracts were read, but if their candy bowl is filled with red twizzlers they know something is wrong. I can’t monitor whether the authors took care with their manuscripts, but NOT seeing the bowl of green-only m&ms gives me a heads up that something is off.

Of those 500+ articles I have read, only a handful had a dedicated descriptive statistics section. The logic seems to be that editors encourage authors to move that material to appendices to make articles more readable. I don’t buy that argument for descriptive statistics. Moving robustness checks or replications to the appendices is fine, but descriptive stats give me a chance to actually look at the data and feel less concerned that the results are driven by outliers. In my 2nd best world all dependent variables and major independent variables would be graphed. If the data were collected across different geographies I would want the data mapped. In my 1st best world replication files with the full dataset and do-files would be mandatory for all papers.
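
A minimal sketch of the kind of descriptive check meant here, in Python using only the standard library. The function name `describe`, the 1.5×IQR rule for flagging outliers, and the sample numbers are all illustrative assumptions, not taken from any paper or dataset.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics and flag potential outliers.

    Outliers are flagged with the common 1.5 x IQR rule, which is only
    one possible convention (an assumption made for this sketch).
    """
    data = sorted(values)
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
        "min": data[0],
        "max": data[-1],
        "outliers": [x for x in data if x < low or x > high],
    }


# Hypothetical variable with one extreme value (made-up numbers)
sample = [2, 3, 3, 4, 4, 5, 5, 6, 40]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these made-up numbers the mean lands at twice the median, and the single extreme value is flagged. That is exactly the sort of thing a short descriptive table or plot lets a reader spot immediately.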

I don’t think I am asking too much here. Hell, I am not even fond of empirical work. My favorite academic is Peter Leeson (GMU Econ & Law) and he rarely (ever?) does empirical work. As long as empirical work is being done in the social sciences, though, I expect a certain standard. Otherwise all we’re doing is engaging in math masturbation.

TL;DR: I don’t trust most empirical work out there. I’ll rant about excessive literature reviews next time.

6 thoughts on “Know your data, show your data: A rant”

  1. I agree wholeheartedly with Know Your Data. As for Show Your Data, let’s be clear about who the culprits are…it’s the journals. You point the finger at editors and they certainly bear some responsibility but, in my opinion, the bulk of it lies at the feet of the publishers. I’ve been on both sides. As an author, I’d LOVE to have more space for lots of things. As a member of an editorial board, a co-editor of a journal special issue, and co-editor of several books….ain’t gonna happen.

    Just one example of how broken academic publishing has become.

  2. Why do you think that is? Is it because of physical space concerns? Why not just include it in the online appendices?

    • As silly as it is, I think you got it: physical space concerns. IMO, academic publishing is woefully behind the times. The mindset I see still seems to think in terms of paper even though I can’t think of a single colleague who still uses hard copy journals. Why? Damned if I know.

  3. Regarding Leeson not doing empirical work: I think he would probably disagree with you classifying him as non-empirical (his paper on witch trials uses a fair amount of quantitative data). If you tell an economist or political scientist that someone does not do empirical analysis, that economist or political scientist would conclude that the person is, instead, a theorist. I would not consider Leeson a theorist. He is someone who writes analytic narratives, basically using historical (usually qualitative) research. This requires a lot of data, and that data is not as easily summarized as quantitative data is. There tend to be a lot more researcher degrees of freedom when presenting historical data (like how effective reputation is for facilitating trade in colonial Africa) than quantitative data. That doesn’t negate your criticisms of “empirical” researchers, but you might want to consider expanding your idea of empirical.

    • I misspoke. A better term might have been ‘quantitative’ to distinguish Leeson’s qualitative approach from the empirical approach that dominates mainstream econ.
