Some more borderline fraud from the higher education industry.

From the Wall Street Journal: For Sale: SAT-Takers’ Names. Colleges Buy Student Data and Boost Exclusivity

The title pretty much says it all: the College Board is selling data about test-takers (i.e. high school students) to colleges, which use it to market to a wider pool of applicants. That wider pool often includes students who don’t stand a chance of getting into the schools that are now marketing to them, but the marketing gives the false impression that the school wants them.

Joe Six-pack Jr. takes the SAT, fills out a survey, and that survey goes into a database. Some school that normally ranks near the middle of the pack buys a piece of that database, including Joe’s data. They send him a brochure and a letter that looks like it was written specifically for him (and he doesn’t know any better), so Joe figures he’s being recruited. Instead of just applying to his local state schools, now he shells out an extra $50 to apply to Middling University. They summarily reject his application because his SAT scores were 1100 and they’re only accepting students who scored above 1300. With one more rejection on the books, MU now looks a little bit more prestigious in the rankings (which means their current administration can take credit before jumping ship to take a higher paying job at a school also looking to rise in the rankings). The College Board gets paid. The administrators get paid. The U.S. News rankings get a little less useful for incoming students, but they don’t know that. On the other hand, the rankings get a little more important for decision makers at schools. And Joe Jr. is funding this whole mess despite being a) the least informed, and b) the least well funded player in it.
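The mechanism is plain arithmetic, and a rough sketch in Python makes it visible. Only the $50 fee comes from the story above; the applicant and admit counts are made-up numbers for illustration, not figures from the WSJ article.

```python
def acceptance_rate(admitted: int, applicants: int) -> float:
    """Share of applicants who are admitted."""
    return admitted / applicants


# HYPOTHETICAL figures for Middling University (assumptions, not real data).
admitted = 2_000
baseline_applicants = 8_000
extra_applicants = 2_000  # students like Joe, recruited by mail and rejected

application_fee = 50  # dollars, from the story above

before = acceptance_rate(admitted, baseline_applicants)
after = acceptance_rate(admitted, baseline_applicants + extra_applicants)
extra_fee_revenue = extra_applicants * application_fee

print(f"acceptance rate before marketing: {before:.0%}")
print(f"acceptance rate after marketing:  {after:.0%}")
print(f"fees paid by the rejected extras: ${extra_fee_revenue:,}")
```

The school admits exactly the same class in both cases. Only the denominator changes, and the rejected applicants pay for the change.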

My Startup Experience

Over the past 4 years, I have had a huge transition in my life–from history student to law student to serial medical entrepreneur. Essentially, I have learned a great deal from my academic work that taught me the value that we can create if we find an unmet need in the world, create an idea that fills that need, and then use technology, personal networks, and hard work to create novelties. While startups obviously tackle any new problem under the sun, to me, they are the mechanism to bring about a positive change–and, along the way, get the resources to scale that change across the globe.

I am still very far from reaching that goal, but my family and cofounders have several visions of how to improve not only how patients are treated but also how we build the knowledge base that physicians, patients, and researchers can use to inform care and innovation. My brother/cofounder and I were recently on an entrepreneurship-focused podcast, and we got the chance to discuss our experience, our vision, and our companies. I hope this can be a springboard for more discussions about how companies are a unique agent of advancing human flourishing, and about the history and philosophy of entrepreneurship, technology, and knowledge.

You can listen here: http://rochesterrising.org/podcast/episode-151-talking-medical-startups-with-keith-and-kevin-kallmes. Heartfelt thanks to Amanda Leightner and Rochester Rising for a great conversation!

Thank you!

Kevin Kallmes

Let’s Find Out – or: the Power of Reference

The core message of a number of books I’ve recently had the great pleasure to read has been fairly simple. Have a look. Check it out. Put your numbers in perspective. In a world awash with statistics and cognitive biases imploring us to cheer mindlessly for our own team, having the skill and wherewithal to step back and carefully ask: “can this really be so?” is golden.

One of the most profound pieces of advice from the late celebrity professor and YouTube phenomenon Hans Rosling for countering misinformation about the state of the world is precisely this: put all numbers in perspective. Never accept unaccompanied numbers – never believe the numerator without checking the denominator. What matters, as Bryan Caplan never ceases to emphasize as the GMU Economics creed, “are statistics, not emotions – and arguments, not stories.”

A statistic should never be left alone, Rosling maintains, but always compared to other relevant numbers. What share of its total category does this statistic represent? What was it last year, 5 or 10 or 20 years ago? Is there some self-evident change in associated behavior that is relevant or ought to explain it? A century ago street cars used to kill and injure hundreds of people every year, but since very few American cities make use of street cars today, the casualty count is fortunately much lower. If we keep in mind that miles travelled by cars far outnumber miles travelled by street cars, reporting the number of street car deaths – while probably correct – entirely misses the point when discussing traffic safety. In How Not To Be Wrong, mathematics professor Jordan Ellenberg quipped:

Dividing one number by another is mere computation; knowing what to divide by what is mathematics.

Here’s another example. If I told you about 23 000 individual deaths and spent a brief 10 seconds on each of them, going through the list would take me almost three days. On a personal level like that, 23 000 deaths is an absurd, insane, catastrophe-style event that few people are emotionally equipped to handle – essentially the size of my hometown, wiped out in a single year. If I told you those 23 000 deaths were due to antibiotic-resistant diseases in the U.S. last year, the pandemic scenarios working through your mind would quickly escalate. That many! Let’s find the nearest bunker!

If I then told you that cancer and heart diseases (each!) claim the lives of about 20x that, the fear of lethal apocalyptic germs consuming the world ought to quickly recede. Oh.
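For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the figures quoted above; the "20x" multiplier is the rough ratio given in the text, not an official statistic.

```python
SECONDS_PER_DAY = 24 * 60 * 60

deaths_resistant = 23_000  # antibiotic-resistant deaths, figure from the text
seconds_each = 10          # time spent on each death, from the text

# How long it takes to go through the list one by one.
days_to_read = deaths_resistant * seconds_each / SECONDS_PER_DAY

# "About 20x" per the text, for cancer and for heart disease separately.
deaths_each_major_cause = 20 * deaths_resistant

print(f"time to read the list: {days_to_read:.2f} days")
print(f"implied deaths per major cause: {deaths_each_major_cause:,}")
```

The same 23 000 reads as a catastrophe on its own and as one twentieth of a single major cause of death once a comparison sits beside it.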

Here’s another example. It is entirely correct to point out that the number of people killed in worldwide airplane accidents in 2018 (556 people) was much higher than the year before (44 people) and the year before that (325 people). Would one be excused for believing that air travel is getting more risky and dangerous? Forbes, for instance, ran a roughly accurate story claiming that airline fatalities increased by 900%.

Not in the slightest. The number of fatalities from air travel has been falling for decades, all while the number of flights and miles travelled has increased exponentially, meaning that the per-flight, per-mile or per-passenger risk of death has kept dropping. Not to mention that alternative modes of travel, like driving, are orders of magnitude more dangerous.
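A small Python sketch of the difference between a raw count and a rate. The fatality counts are the ones quoted above; the number of flights is a hypothetical placeholder chosen only to show the calculation, so the resulting rate should not be read as a real statistic.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


def rate_per_million(events: int, exposure: int) -> float:
    """Events per one million units of exposure (flights, miles, passengers)."""
    return events / exposure * 1_000_000


# Fatality counts quoted in the text.
fatalities = {2016: 325, 2017: 44, 2018: 556}

# Numerator only: the year-over-year jump looks alarming.
jump = percent_change(fatalities[2017], fatalities[2018])

# Numerator over denominator. HYPOTHETICAL flight count, for illustration only.
assumed_flights_2018 = 40_000_000
rate_2018 = rate_per_million(fatalities[2018], assumed_flights_2018)

print(f"2017 -> 2018 change in deaths: {jump:.0f}%")
print(f"2018 deaths per million flights (assumed denominator): {rate_2018:.1f}")
```

A percentage change computed against an unusually quiet year says more about that year than about the trend. The rate, tracked over decades with real exposure data, is the number that answers the safety question.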

While Rosling teaches us to figure out what the base rate is, i.e. putting our statistic into appropriate perspective, one of Philip Tetlock’s tricks for becoming a ‘Superforecaster’ is to use Bayesian updating of one’s beliefs. This picks up precisely where Rosling’s idea left off. Once we know where to start, we have to amass more information, numbers and observations from other points of view – Bayesian updating is a popular method to incorporate and synthesize new information with the old.
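As a rough illustration of what Bayesian updating looks like in practice, here is a minimal Python sketch of Bayes' rule applied twice. The base rate and the two likelihoods are invented for the example; they are not taken from Tetlock or Rosling.

```python
def bayes_update(prior: float, p_evidence_if_true: float,
                 p_evidence_if_false: float) -> float:
    """Posterior probability of a hypothesis after seeing one piece of evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator


# Start from the base rate (HYPOTHETICAL: 10% of such claims turn out true).
belief = 0.10

# Each new observation is assumed to be more likely if the claim is true (80%)
# than if it is false (30%). Both likelihoods are made up for illustration.
history = [belief]
for _ in range(2):
    belief = bayes_update(belief, p_evidence_if_true=0.8, p_evidence_if_false=0.3)
    history.append(belief)

for step, value in enumerate(history):
    print(f"after {step} observations: {value:.3f}")
```

The starting point is the base rate, and each observation moves the belief by an amount that depends on how much more likely that evidence is under one hypothesis than the other.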

In short, “Calculation, like logic, is your friend” (Landsburg 2018: 44). Statistics matter and numbers can deceive. In order to better understand our realities and see through mistakes that others make – either intentionally to deceive or persuade, or unintentionally through ignorance – we must embrace the core message of people like Ellenberg, Tetlock, Duffy, Rosling or Pinker.

Always Be Comparing Thy Numbers. Never accept an unaccompanied statistic. Never trust numerators without denominators.

Legal Immigration Into the United States (Part 5); The Net Contribution of Immigrants: An Attempt at Critical Quantification

In the October 2006 Liberty article “Immigration: Yes, No, and Maybe” (by Richard Fields, Stephen Cox, and Bruce Ramsey), Cox tries to summarize the net cost that (then) current immigrants impose on American society by working out a quantitative example. He stages an imaginary but realistic (Mexican) immigrant family of five living in Los Angeles – two parents and three minor children. He assigns reasonable earnings to the parents and sets those against the probable costs that the whole family imposes in the form of normal local and other services. He arrives at the conclusion that the family annually costs American society 38,900 2006 dollars. (I agree with Cox that this may be a conservative estimate.) That would be about 48,000 June 2018 dollars, using the CPI Inflation Calculator of the Bureau of Labor Statistics.

To gauge the real magnitude of the overall normal costs legal immigrants thus impose on American society, let’s suppose further that all of the 2016 legal immigration is composed of Cox’s families of five. With about 1,200,000 legal immigrants that year, that’s 240,000 such families. The aggregate excess of their social costs over their earnings is 48,000 × 240,000 = 11.52 billion dollars. As a share of 2016 GDP, this figure is less than 7/10,000 (seven over ten thousand – 2016 GDP from CountryEconomy.com).

Now, let’s suppose that Cox was too conservative by one half in his estimate of the cost his family imposes on American society. This would imply that the legal immigrant families that compose all of 2016 immigration cost American society about 14/10,000 of GDP. The numerator in this last estimate includes only legal immigrants. Let’s suppose further that the number of illegal immigrants for the year of reference equals the number of legal ones and that they cost the same and contribute the same as legal immigrants. The cost that all immigrants impose on American society is then approximately 28/10,000, or about 1/3 of one per cent of GDP. If you assume that illegal immigrants earn only half as much as legal immigrants, the net cost of immigration overall goes up correspondingly. It’s still not much. My point is this: In the worst case scenario I can conjure, the net cost that immigrants impose on American society is very low. It’s of the order of 12 million Americans buying a $10 lottery ticket at 7-Eleven every payday.
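The chain of suppositions above can be laid out as a short Python sketch. The family count and the per-family cost are the figures used in the text. The GDP value is my assumption of roughly 18.7 trillion dollars for 2016; the text takes its figure from CountryEconomy.com and does not state it, so treat the exact ratios as approximate.

```python
# ASSUMPTION: approximate 2016 US nominal GDP in dollars (not stated in the text).
GDP_2016 = 18.7e12

families = 240_000        # 2016 legal immigration as families of five (text)
cost_per_family = 48_000  # Cox's net annual cost in June 2018 dollars (text)

base_cost = families * cost_per_family  # 11.52 billion dollars


def per_ten_thousand(amount: float, gdp: float = GDP_2016) -> float:
    """Express a dollar amount as parts per 10,000 of GDP."""
    return amount / gdp * 10_000


scenarios = {
    "Cox's estimate, legal immigrants only": base_cost,
    "estimate doubled": 2 * base_cost,
    "doubled, plus as many illegal immigrants": 4 * base_cost,
}

for label, cost in scenarios.items():
    print(f"{label}: ${cost / 1e9:.2f} billion, "
          f"{per_ten_thousand(cost):.1f} per 10,000 of GDP")
```

With this GDP assumption each scenario comes in a little under the 7, 14 and 28 per 10,000 quoted above, which is consistent with the text treating those figures as upper bounds.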

This is still certainly an overestimation, for two reasons. One, this scenario is the extreme, limiting case. There is, of course, zero chance that the total legal immigration in any one year is composed entirely of the kind of families of five Cox describes. Among the immigrants, as with nearly all immigration everywhere, there must be a preponderance of healthy young men and young women without children. This happens through self-selection: emigration is very difficult. It requires courage and even a solid dose of unrealism; children are a big impediment in this respect. But, in most cases, younger people without children must easily contribute more than they cost American society because they land all raised up and ready to work (as I said). The exceptions concern those who fall seriously sick – uncommon among the young – and those who end up in jail or prison. The latter is not a rare occurrence among the young in general, among young males in particular. As I said, I deal below with the particular cost of incarcerating immigrants.

The other imaginary limiting case is this: Among the 1,200,000 immigrants in 2016, there is a single family of five as described by Cox and the balance is made up of vigorous young women and young men who never become sick and never transgress the law. In that other limiting case, immigrants are almost certainly a net economic boon to American society. I don’t know where the reality lies and it may change from year to year. It’s doable research which, I think, has not been done.

The second reason why the figure of 28/10,000 is probably an overestimation, or why it leads to fallacious inferences, has to do with life cycles. First, there will probably be a period during the family’s life when the children will be grown and capable of working while the parents themselves are working, undisturbed by family obligations. During that period, three or four, or all five immigrants will in all likelihood contribute more than they take from American society, in spite of their low qualifications. This sweet spot may vanish when the parents reach Medicare and Social Security age. In the meantime, several family members will have contributed to the relevant social funds; one or more of the children will too, probably for 30 years or more. Hence, whether the family of five receives a net benefit or imposes a net cost over a longer, trans-generational period depends on actuarial calculations that neither Cox nor I have performed.

I hasten to add that it’s quite possible that such actuarial calculations, performed with real numbers, would still show the five in my chosen family as imposing a net cost on American society. To be thorough, one would have to take into account two more things. One is the possibility that one of the three children will turn out to be a great, outsize contributor, like the 40% of American Nobel Prize winners who were born abroad. Or all three. The relevant reasoning has to be trans-generational to some extent, it seems to me. Just look at the extreme imaginary scenario below.

For ten years in a row, the US admits as many immigrants as it did in 2016. That’s 12 million immigrants. Let’s assume none dies during that period and they have no children. (We will see that this unrealistic assumption does not matter here.) Not one of the twelve million is able to pay his full fare. On the average, they each cost American society $20,000 that there is no chance they will ever pay back, one way or another. However, one of these hapless immigrants is Steve Jobs’s biological father. You know the rest of this true story. Ask yourself: If it were your decision, knowing this, and based solely on the economic matters which are at stake here, would you keep out all twelve million?

This quandary poses an interesting conceptual problem we keep encountering: Had Jobs’s biological father not accidentally made his girlfriend pregnant; had they not decided to give Steve up for adoption, would someone else have developed the personal computer with Wozniak? Without him? Would you bet on it? The truth is that American society is unusually inventive but it’s probably not the most inventive on a per capita basis. (Last time I looked, the Japanese were registering more patents than Americans – that’s per capita.) It also seems true that immigrants account for a disproportionate number of American innovations, including 40% of all Nobel prizes in fields other than literature. (And also excluding the often farcical Nobel Peace Prize.) It’s not absurd to think of American inventiveness as the happy encounter of American institutions unusually favorable to innovation with immigrant vigor. This is just a speculation, of course, but how willing are you to discard it summarily?

Finally, the calculation of immigrants’ net burden imposed on American society necessarily fails to take into account real positive contributions that are difficult to quantify, more or less intangible contributions, some of which I have mentioned elsewhere. They go from Italian cuisine to my own ability to interpret some world events better than almost any native-born professor. Here is another mental experiment: Suppose a national society decided, through some process or other, to bring up the average quality of its everyday food from, say, English levels to 1/3 of Italian levels. The cost would be astronomical and the result would clearly constitute a significant improvement in the quality of Americans’ everyday life – which is what the science of Economics is all about, of course. My point is that the fact that this felicitous result was achieved through the happenstance of immigration does not imply that its societal value is zero.

One of the highest per capita expenditures that immigrants–like every other population group over and below a certain age–impose on American society is the cost of incarceration. That cost is also mostly borne by state and local authorities, although there exists a process by which the federal government reimburses local governments for illegal immigrants incarcerated for crimes other than illegal border crossing (explained in Cox 2006). I examine below the tangled issue of the cost of immigrant incarceration.

[Editor’s note: In case you missed it, here is Part 4]

Know your data, show your data: A rant

I am finishing up my first year of doctoral level political science studies. During that time I have read a lot of articles – approximately 550. 11 courses. 5 articles a week on average. 10 weeks. 11×5×10=550. Two things have bothered me immensely when reading these pieces: (1) it’s unclear whether authors know their data well, regardless of whether it is original or secondary data, and (2) the reader is rarely shown much about the data.

I take the stance that when you use a dataset you should know it inside and out. I do not just mean that you should have an idea of whether it’s normally distributed or has outliers. I expect you to know who collected it. I expect you to know its limitations.

For example, I have read public opinion data that sampled minority populations. Given that said populations are minorities, the researchers had to oversample in areas where said groups are overrepresented. The problem with this is that those who live near co-ethnics are different from those who live elsewhere. This restricts the external validity of results derived from the data, but I rarely see an acknowledgement of this.

Sometimes data is flawed but it’s the best we have. That’s fine. I’m not against using flawed data. I’m willing to buy most arguments if the underlying theory is well grounded. To be honest I view statistical work to be fluff most times. If I don’t really care about the statistics, why do I care if the authors know their data well? I do because it serves as a way for authors to signal that they thought about their work. It’s similar to why artists sometimes place a “bowl of only green m&ms” requirement on their performance contracts. Artists don’t know if their contracts were read, but if their candy bowl is filled with red twizzlers they know something is wrong. I can’t monitor whether the authors took care in their manuscripts, but NOT seeing the bowl of green only m&ms gives me a heads up that something is off.

Of those 500+ articles I have read, only a handful had a devoted descriptive statistics section. The logic seems to be that editors are encouraging that stuff be placed in appendices to make articles more readable. I don’t buy that argument for descriptive statistics. Moving robustness checks or replications to the appendices is fine, but descriptive stats give me a chance to actually look at the data and feel less concerned that the results are driven by outliers. In my 2nd best world all dependent variables and major independent variables would be graphed. If the data was collected in differing geographies I would want the data mapped. In my 1st best world replication files with the full dataset and dofiles would be mandatory for all papers.
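To show how little it takes, here is a rough Python sketch of the kind of descriptive summary I have in mind, using only the standard library. The sample values are invented, and the 1.5 × IQR cutoff for flagging outliers is one common rule of thumb, not the only reasonable choice.

```python
import statistics


def describe(values):
    """Basic descriptive statistics plus a simple IQR-based outlier flag."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }


# HYPOTHETICAL dependent variable with one extreme observation.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]
summary = describe(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean sitting well above the median, together with a flagged observation, is exactly the kind of thing a reader wants to see before trusting a regression table.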

I don’t think I am asking too much here. Hell, I am not even fond of empirical work. My favorite academic is Peter Leeson (GMU Econ & Law) and he rarely (ever?) does empirical work. As long as empirical work is being done in the social sciences though I expect a certain standard. Otherwise all we’re doing is engaging in math masturbation.

Tldr; I don’t trust most empirical work out there. I’ll rant about excessive literature reviews next time.

On Borjas, Data and More Data

I see my craft as an economic historian as a dual mission. The first is to answer historical questions by using economic theory (and in the process enliven economic theory through the use of history). The second relates to my obsessive-compulsive nature, which can be observed in how much attention and care I give to getting the data right. My co-authors have often observed me “freaking out” over a possible improvement in data quality or being plagued by doubts over whether or not I had gone “one assumption too far” (a pun on “a bridge too far”). Sometimes, I wish more economists would follow my historian-like freakouts over data quality. Why?

Because of this!

In that paper, Michael Clemens (whom I secretly admire – not so secretly now that I have written it on a blog) criticizes the recent paper by George Borjas, which uses the famous Mariel boatlift of 1980 to show a negative effect of immigration on wages for workers without a high school degree. Clemens basically shows that there were pressures on the US Census Bureau at the same time as the boatlift to add more black workers without high school degrees. This previously underrepresented group surged in importance within the survey data. However, since that underrepresented group had lower wages than the average of the wider group of workers without high school degrees, there was a composition effect at play that caused wages to fall (in appearance). This composition effect is a bias causing an artificial drop in wages, and it drove the results produced by Borjas (and underestimated the conclusion made by David Card in his original paper to which Borjas was replying).
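A composition effect is easy to reproduce with a toy example. In the Python sketch below, all group sizes and wages are invented for illustration and have nothing to do with the actual survey data used by Borjas, Card or Clemens.

```python
def average_wage(groups):
    """Weighted average wage over groups given as {name: (workers, wage)}."""
    total_workers = sum(workers for workers, _ in groups.values())
    total_pay = sum(workers * wage for workers, wage in groups.values())
    return total_pay / total_workers


# HYPOTHETICAL sample before the change in survey coverage.
sample_before = {
    "previously well covered": (900, 10.0),
    "previously underrepresented": (100, 8.0),
}

# Same wages in each group, but the sample now includes more of the
# lower-wage group.
sample_after = {
    "previously well covered": (700, 10.0),
    "previously underrepresented": (300, 8.0),
}

avg_before = average_wage(sample_before)
avg_after = average_wage(sample_after)

print(f"average wage before: {avg_before:.2f}")
print(f"average wage after:  {avg_after:.2f}")
```

Nobody's wage changed between the two samples, yet the measured average falls. A regression run on the pooled data cannot tell that apart from a real wage drop unless the analyst knows how the sample was built.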

This is a cautionary tale about the limits of econometrics. After all, a regression is only as good as the data it uses and its fit to the question it seeks to answer. Sometimes, simple Ordinary Least Squares are excellent tools. When the question is broad and/or the data is excellent, an OLS can be a sufficient and necessary condition for a viable answer. However, the narrower the question (i.e. is there an effect of immigration only on unskilled and low-education workers), the better the method has to be. The problem is that the better methods often require better data as well. To obtain the latter, one must know the details of a data source. This is why I am nuts over data accuracy. Even small things matter – like a shift in the representation of blacks in survey data – in these cases. Otherwise, you end up with your results being reversed by very minor changes (see this paper in Journal of Economic Methodology for examples).

This is why I freak out over data. Maybe I can make two suggestions about sharing my freak-outs.

The first is to prefer a skewed ratio of data quality to advanced methods (i.e. simple methods with crazy-data). This reduces the chances of being criticized for relying on weak assumptions. The second is to take a leaf out of the book of the historians. While historians are often averse to advanced data techniques (I remember a case when I had to explain panel data regressions to historians which ended terribly for me), they are very respectful of data sources. I have seen historians nurture datasets for years before being willing to present them. When published, they generally stand up to scrutiny because of the extensive wealth of details compiled.

That’s it folks.