There is no Bloomberg for medicine

When I began working in medical research, I was shocked to find that no one in the medical industry has actually collected and compared all of the clinical outcomes data that has been published. With Big Data in Healthcare as such a major initiative, it was incomprehensible to me that the highest-value data–the data that is directly used to clear therapies, recommend them to the medical community, and assess their efficacy–were being managed in the following way:

  1. Physician completes study, and then spends up to a year writing it up and submitting it.
  2. Journal sits on the study for months, then publishes (in some cases), but without ensuring that the data it reports are comparable to those in similar studies.
  3. Oh, by the way, the journal does not make the data available in a structured format!
  4. Then, if you want to see how that one study compares to related studies, you have to either find a recent, comprehensive, on-point meta-analysis (which, in my experience, rarely exists), or comb the literature and extract the data by hand.
  5. That’s it.

This strikes me as mismanagement of data that are relevant to life-changing healthcare decisions. Effectively, no one in the medical field has anything like what the financial industry has had for decades–the Bloomberg terminal, which presents comprehensive information on an updatable basis by pulling data from centralized repositories. If we can do it for stocks, we can do it for medical studies, and in fact that is what I am trying to do. I recently wrote an article on the topic for the Minneapolis-St Paul Business Journal, calling for the medical community to support a centralized, constantly updated, data-centric platform to enable not only physicians but also insurers, policymakers, and even patients to examine the actual scientific consensus, and the data that support it, in a single interface.

Read the full article at https://www.bizjournals.com/twincities/news/2019/12/27/there-is-no-bloomberg-for-medicine.html

Changing the way doctors see data

Over the past four years, my brother and I have grown a business that helps doctors publish data-driven articles, from just the two of us to over 30 experienced researchers. However, along the way, we noticed that data management in medical publication was decades behind other fields–in fact, the vital clinical outcomes from major trials are generally published as singular PDFs with no structured data, and are analyzed in comparison to existing studies only in nonsystematic, nonupdatable publications. Effectively, medicine has no central method for sharing or comparing patient outcomes across therapies, and I think that it is our responsibility as researchers to present these data to the medical community.

Based on our internal estimates, there are >3 million published clinical outcomes studies (with over 200 million individual datapoints) that need to be abstracted, structured, and compared through a central database. We recognized that this is a monumental task, and we therefore have focused on automating and scaling research processes that have been, through today, entirely manual. Only after a year of intensive work have we found a path toward creating a central database for all published patient outcomes, and we are excited to debut our technology publicly!

Keith recently presented our venture at a Mayo Clinic-hosted event, Walleye Tank (a Shark Tank-style competition of medical ventures), and I think that it is an excellent fast-paced introduction to a complex issue. Thanks also to the Mayo Clinic researchers for their interesting questions! You can see his two-minute presentation and the Q&A here. We would love to get more questions from the economic/data science/medical communities, and will continue putting our ideas out there for feedback!

Some more borderline fraud from the higher education industry.

From the Wall Street Journal: For Sale: SAT-Takers’ Names. Colleges Buy Student Data and Boost Exclusivity

The title pretty much says it all: the College Board is selling data about test-takers (i.e. high school students) to colleges who use that to market to a wider pool of applicants. That wider pool often includes students who don’t stand a chance of getting in to the schools that are now marketing to them, but the marketing gives the false impression that the school wants them.

Joe Six-pack Jr. takes the SAT, fills out a survey, and that survey goes into a database. Some school that normally ranks near the middle of the pack buys a piece of that database, including Joe’s data. They send him a brochure and a letter that looks like it was written specifically for him (and he doesn’t know any better), so Joe figures he’s being recruited. Instead of just applying to his local state schools, now he shells out an extra $50 to apply to Middling University. They summarily reject his application because his SAT scores were 1100 and they’re only accepting students who scored above 1300. MU now looks a little bit more prestigious in the rankings (which means their current administration can take credit before jumping ship to take a higher paying job at a school also looking to rise in the rankings). The College Board gets paid. The administrators get paid. The U.S. News rankings get a little less useful for incoming students, but they don’t know that. On the other hand, the rankings get a little more important for decision makers at schools. And Joe Jr. is funding this whole mess despite being a) the least informed, and b) the least well funded player in this whole mess.

My Startup Experience

Over the past 4 years, I have had a huge transition in my life–from history student to law student to serial medical entrepreneur. Essentially, I have learned a great deal from my academic work that taught me the value that we can create if we find an unmet need in the world, create an idea that fills that need, and then use technology, personal networks, and hard work to create novelties. While startups obviously tackle any new problem under the sun, to me, they are the mechanism to bring about a positive change–and, along the way, get the resources to scale that change across the globe.

I am still very far from reaching that goal, but my family and cofounders have several visions of how to improve not only how patients are treated but also how we build the knowledge base that physicians, patients, and researchers can use to inform care and innovation. My brother/cofounder and I were recently on an entrepreneurship-focused podcast, and we got the chance to discuss our experience, our vision, and our companies. I hope this can be a springboard for more discussions about how companies are a unique agent of advancing human flourishing, and about the history and philosophy of entrepreneurship, technology, and knowledge.

You can listen here: http://rochesterrising.org/podcast/episode-151-talking-medical-startups-with-keith-and-kevin-kallmes. Heartfelt thanks to Amanda Leightner and Rochester Rising for a great conversation!

Thank you!

Kevin Kallmes

Let’s Find Out – or: the Power of Reference

The core message of a number of books I’ve recently had the great pleasure to read has been fairly simple. Have a look. Check it out. Put your numbers in perspective. In a world awash with statistics and cognitive biases imploring us to cheer mindlessly for our own team, having the skill and wherewithal to step back and carefully ask: “can this really be so?” is golden.

One of the most profound pieces of advice from the late celebrity professor and YouTube phenomenon Hans Rosling for countering misinformation about the state of the world is precisely this: put all numbers in perspective. Never accept unaccompanied numbers – never believe the numerator without checking the denominator. What matters, as Bryan Caplan never ceases to emphasize as the GMU Economics creed, “are statistics, not emotions – and arguments, not stories.”

A statistic, Rosling maintains, may never be left alone but must always be compared to other relevant numbers. What share of its total category does this statistic represent? What was it last year, 5 or 10 or 20 years ago? Is there some self-evident change in associated behavior that is relevant or ought to explain it? A century ago street cars used to kill and injure hundreds of people every year, but since very few American cities make use of street cars today, the casualty count is fortunately much lower. If we keep in mind that miles travelled by cars far outnumber miles travelled by street cars, reporting the number of street car deaths – while probably correct – entirely misses the point when discussing traffic safety. In How Not To Be Wrong, mathematics professor Jordan Ellenberg quipped:

Dividing one number by another is mere computation; knowing what to divide by what is mathematics.

Here’s another example. If I told you about 23 000 individual deaths and spent a brief 10 seconds on each of them, going through the list would take me almost three days. On a personal level like that, 23 000 deaths is an absurd, insane, catastrophe-style event that few people are emotionally equipped to handle – essentially the size of my hometown, wiped out in a single year. If I told you those 23 000 deaths were due to antibiotic resistant diseases in the U.S. last year, the pandemic scenarios working through your mind quickly escalate. That many! Let’s find the nearest bunker!

If I then told you that cancer and heart diseases (each!) claim the lives of about 20x that, the fear of lethal apocalyptic germs consuming the world ought to quickly recede. Oh.
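The arithmetic behind this example is easy to check. The short Python sketch below uses only the figures quoted above (23 000 deaths, 10 seconds each, "about 20x"); the function and variable names are my own and purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reading_time_days(count: int, seconds_each: float) -> float:
    """Days needed to spend `seconds_each` seconds on each of `count` items."""
    return count * seconds_each / SECONDS_PER_DAY


resistant_deaths = 23_000  # figure quoted in the text
seconds_per_death = 10     # figure quoted in the text
multiplier = 20            # "about 20x", as quoted in the text

days = reading_time_days(resistant_deaths, seconds_per_death)
# Implied deaths from cancer or from heart disease (each), per the text's ratio.
comparison_deaths = resistant_deaths * multiplier

print(f"Going through the list: {days:.2f} days")
print(f"Implied deaths from cancer or heart disease (each): {comparison_deaths:,}")
print(f"Resistant infections relative to one such cause: {resistant_deaths / comparison_deaths:.0%}")
```

The same number looks apocalyptic in isolation and modest next to its comparison class, which is the whole point of asking for the denominator.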

Here’s another example. It is entirely correct to point out that the number of people killed in worldwide airplane accidents in 2018 (556 people) was much higher than the year before (44 people) and the year before that (325 people). Would one be excused for believing that air travel is getting more risky and dangerous? Forbes, for instance, ran a roughly accurate story claiming that airline fatalities increased by 900%.

Not in the slightest. The number of fatalities from air travel has been falling for decades, all while the number of flights and miles travelled have increased exponentially, meaning that the per-flight, per-mile or per-passenger risk of death has kept dropping. Not to mention that alternative modes of travel, like driving, are orders of magnitude more dangerous.

While Rosling teaches us to figure out what the base rate is, i.e. putting our statistic into appropriate perspective, one of Philip Tetlock’s tricks for becoming a ‘Superforecaster’ is to use Bayesian updating of one’s beliefs. This picks up precisely where Rosling’s idea left off. Once we know where to start, we have to amass more information, numbers and observations from other points of view – Bayesian updating is a popular method to incorporate and synthesize new information with the old.
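To make the idea of Bayesian updating concrete, here is a minimal Python sketch of Bayes’ rule applied twice in a row. It is not Tetlock’s own procedure, only the textbook formula; the function name and all the probabilities are hypothetical and chosen purely for illustration.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return P(hypothesis | evidence) using Bayes' rule."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


# Start from the base rate (the Rosling step), then fold in new evidence.
belief = 0.10  # hypothetical base rate

# Each pair is (P(evidence | hypothesis true), P(evidence | hypothesis false)).
# These likelihoods are made up for the example.
for likelihood_true, likelihood_false in [(0.8, 0.3), (0.7, 0.4)]:
    belief = bayes_update(belief, likelihood_true, likelihood_false)
    print(f"updated belief: {belief:.3f}")
```

Note that evidence which is equally likely whether or not the hypothesis is true leaves the belief exactly where it started; only evidence that discriminates between the two cases should move you.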

In short, “Calculation, like logic, is your friend” (Landsburg 2018: 44). Statistics matter and numbers can deceive. In order to better understand our realities and see through mistakes that others make – either intentionally to deceive or persuade, or unintentionally through ignorance – we must embrace the core message of people like Ellenberg, Tetlock, Duffy, Rosling or Pinker.

Always Be Comparing Thy Numbers. Never accept an unaccompanied statistic. Never trust numerators without denominators.

Legal Immigration Into the United States (Part 5); The Net Contribution of Immigrants: An Attempt at Critical Quantification

In his October 2006 article in Liberty, (“Immigration: Yes, No, and Maybe” by Richard Fields, Stephen Cox, and Bruce Ramsey), Cox tries to summarize the net cost that (then) current immigrants impose on American society by working out a quantitative example. He stages an imaginary but realistic (Mexican) immigrant family of five living in Los Angeles – two parents and three minor children. He assigns reasonable earnings to the parents and sets those against the probable costs that the whole family imposes in the form of normal local and other services. He arrives at the conclusion that the family annually costs American society 38,900 2006 dollars. (I agree with Cox that this may be a conservative estimate. That would be about 48,000 June 2018 dollars, using the CPI Inflation Calculator of the Bureau of Labor Statistics).

To gauge the real magnitude of the overall normal costs legal immigrants thus impose on American society, let’s suppose further that all of the 2016 legal immigration is composed of Cox’s families of five. That’s 240,000 such families. The aggregate excess of their social costs over their earnings is 48,000 x 240,000 = 11.52 billion dollars. As a percentage of 2016 GDP, this figure is less than 7/10,000 (seven over ten thousand – 2016 GDP from CountryEconomy.Com).

Now, let’s suppose that Cox was too conservative by one half in his estimate of the cost his family imposes on American society. This would imply that the legal immigrant families that compose all of 2016 immigration cost American society an amount that is like 14/10,000. The numerator in this last estimate includes only legal immigrants. Let’s suppose further that the number of illegal immigrants for the year of reference equals the number of legal ones and that they cost the same and contribute the same as legal immigrants. The cost that all immigrants impose on American society is then approximately 28/10,000 or about 1/3 of one per cent of GDP. If you assume that illegal immigrants earn only half as much as legal immigrants, the net cost of immigration overall goes up correspondingly. It’s still not much. My point is this: In the worst case scenario I can conjure, the net cost that immigrants impose on American society is very low. It’s of the order of 12 million Americans buying a $10 lottery ticket at Nine/Eleven every payday.
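For readers who want to retrace the chain of estimates, the Python sketch below recomputes the three scenarios. The cost per family and the 240,000 families come from the text (1,200,000 immigrants in families of five); the GDP figure of roughly 18.7 trillion dollars is my own assumption for illustration, since the text cites a source without quoting the number. The names are hypothetical.

```python
COST_PER_FAMILY = 48_000           # June 2018 dollars, from the text
LEGAL_IMMIGRANTS_2016 = 1_200_000  # from the text
FAMILY_SIZE = 5                    # Cox's family of five
GDP_2016 = 18.7e12                 # ASSUMPTION: approximate 2016 US GDP in dollars


def per_ten_thousand(cost: float, gdp: float = GDP_2016) -> float:
    """Express a cost as parts per 10,000 of GDP."""
    return cost / gdp * 10_000


families = LEGAL_IMMIGRANTS_2016 // FAMILY_SIZE
base_cost = COST_PER_FAMILY * families

scenarios = {
    "Cox's estimate, legal immigrants only": base_cost,
    "cost doubled, legal immigrants only": 2 * base_cost,
    "cost doubled, plus an equal number of illegal immigrants": 4 * base_cost,
}

for label, cost in scenarios.items():
    print(f"{label}: ${cost / 1e9:.2f} billion, "
          f"{per_ten_thousand(cost):.1f} per 10,000 of GDP")
```

Under this assumed GDP, each scenario comes out somewhat below the rounded ceilings used in the text (7, 14, and 28 per 10,000), so the rounding errs on the side of overstating the cost.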

This is still certainly an overestimation, for two reasons. One, this scenario is the extreme, limiting case. There is, of course, zero chance that the total legal immigration in any one year is composed entirely of the kind of families of five Cox describes. Among the immigrants, as with nearly all immigration everywhere, there must be a preponderance of healthy young men and young women without children. This happens through self-selection: emigration is very difficult. It requires courage and even a solid dose of unrealism; children are a big impediment in this respect. But, in most cases, younger people without children must easily contribute more than they cost American society because they land all raised up and ready to work (as I said). The exceptions concern those who fall seriously sick – uncommon among the young – and those who end up in jail or prison. The latter is not a rare occurrence among the young in general, among young males in particular. As I said, I deal below with the particular cost of incarcerating immigrants.

The other imaginary limiting case is this: Among the 1,200,000 immigrants in 2016, there is a single family of five as described by Cox and the balance is made up of vigorous young women and young men who never become sick and never transgress the law. In that other limiting case, immigrants are almost certainly a net economic boon to American society. I don’t know where the reality lies and it may change from year to year. It’s doable research which, I think, has not been done.

The second reason why the figure of 28/10,000 is probably an overestimation, or why it leads to fallacious inferences, has to do with life cycles. First, there will probably be a period during the family’s life when the children will be grown and capable of working while the parents themselves are working, undisturbed by family obligations. During that period, three or four, or all five immigrants will in all likelihood contribute more than they take from American society, in spite of their low qualifications. This sweet spot may vanish when the parents reach Medicare and Social Security age. In the meantime, several family members will have contributed to the relevant social funds; one or more of the children will too, probably for 30 years or more. Hence, whether the family of five receives a net benefit or imposes a net cost over a longer, trans-generational period depends on actuarial calculations that neither Cox nor I have performed.

I hasten to add that it’s quite possible that such actuarial calculations, performed with real numbers, would still show the five in my chosen family as perpetrating a net cost on American society. To be thorough, one would have to take into account two more things. One is the possibility that one of the three children will turn out to be a great, outsize contributor, like the 40% of American Nobel Prize winners who were born abroad. Or all three. The relevant reasoning has to be trans-generational to some extent, it seems to me. Just look at the extreme imaginary scenario below.

For ten years in a row, the US admits as many immigrants as it did in 2016. That’s 12 million immigrants. Let’s assume none dies during that period and they have no children. (We will see that this unrealistic assumption does not matter here.) Not one of the twelve million is able to pay his full fare. On the average, they each cost American society $20,000 that there is no chance they will ever pay back, one way or another. However, one of these hapless immigrants is Steve Jobs’s biological father. You know the rest of this true story. Ask yourself: If it were your decision, knowing this, and based solely on the economic matters that are at stake here, would you keep out all twelve million?

This quandary poses an interesting conceptual problem we keep encountering: Had Jobs’s biological father not accidentally made his girlfriend pregnant; had they not decided to give Steve up for adoption, would someone else have developed the personal computer with Wozniak? Without him? Would you bet on it? The truth is that American society is unusually inventive but it’s probably not the most inventive on a per capita basis. (Last time I looked, the Japanese were registering more patents than Americans – that’s per capita.) It also seems true that immigrants account for a disproportionate number of American innovations, including 40% of all Nobel prizes in fields other than literature. (And also excluding the often farcical Nobel Peace Prize.) It’s not absurd to think of American inventiveness as the happy encounter of American institutions unusually favorable to innovation with immigrant vigor. This is just a speculation, of course, but how willing are you to discard it summarily?

Finally, the calculation of immigrants’ net burden imposed on American society necessarily fails to take into account real positive contributions that are difficult to quantify, more or less intangible contributions, some of which I have mentioned elsewhere. They go from Italian cuisine to my own ability to interpret some world events better than almost any native-born professor. Here is another mental experiment: Suppose a national society decided, through some process or other, to bring up the average quality of its everyday food from, say, English levels to 1/3 of the Italian level. The cost would be astronomical and the result would clearly constitute a significant improvement in the quality of Americans’ everyday life – which is what the science of Economics is all about, of course. My point is that the fact that this felicitous result was achieved through the happenstance of immigration does not imply that its societal value is zero.

One of the highest per capita expenditures that immigrants–like every other population group over and below a certain age–impose on American society is the cost of incarceration. That cost is also mostly borne by state and local authorities, although there exists a process by which the federal government reimburses local governments for illegal immigrants incarcerated for crimes other than illegal border crossing (explained in Cox 2006). I examine below the tangled issue of the cost of immigrant incarceration.

[Editor’s note: In case you missed it, here is Part 4]

Know your data, show your data: A rant

I am finishing up my first year of doctoral level political science studies. During that time I have read a lot of articles – approximately 550. 11 courses. 5 articles a week on average. 10 weeks. 11×5×10=550. Two things have bothered me immensely when reading these pieces: (1) it’s unclear whether authors know their data well, regardless of whether it is original or secondary data, and (2) the reader is rarely shown much about the data.

I take the stance that when you use a dataset you should know it well, inside and out. I do not just mean that you should have an idea of whether it’s normally distributed or has outliers. I expect you to know who collected it. I expect you to know its limitations.

For example, I have read public opinion data that sampled minority populations. Given that said populations are minorities, the researchers had to oversample in areas where said groups are overrepresented. The problem with this is that those who live near co-ethnics are different from those who live elsewhere. This restricts the external validity of results derived from the data, but I rarely see an acknowledgement of this.

Sometimes data is flawed but it’s the best we have. That’s fine. I’m not against using flawed data. I’m willing to buy most arguments if the underlying theory is well grounded. To be honest I view statistical work to be fluff most times. If I don’t really care about the statistics, why do I care if the authors know their data well? I do because it serves as a way for authors to signal that they thought about their work. It’s similar to why artists sometimes place a “bowl of only green m&ms” requirement on their performance contracts. Artists don’t know if their contracts were read, but if their candy bowl is filled with red twizzlers they know something is wrong. I can’t monitor whether the authors took care in their manuscripts, but NOT seeing the bowl of green only m&ms gives me a heads up that something is off.

Of those 500+ articles I have read, only a handful had a dedicated descriptive statistics section. The logic seems to be that editors are encouraging authors to place that material in appendices to make articles more readable. I don’t buy that argument for descriptive statistics. Moving robustness checks or replications to the appendices is fine, but descriptive stats give me a chance to actually look at the data and feel less concerned that the results are driven by outliers. In my 2nd best world all dependent variables and major independent variables would be graphed. If the data was collected in differing geographies I would want the data mapped. In my 1st best world replication files with the full dataset and do-files would be mandatory for all papers.
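To show how little it takes, here is a rough Python sketch of the kind of descriptive summary I have in mind, using only the standard library. The helper name, the 1.5×IQR outlier rule, and the toy values are all my own illustrative choices, not anything taken from a real paper or dataset.

```python
import statistics


def describe(values):
    """Summarize one variable and flag points outside the 1.5*IQR fences."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)  # quartile cut points
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": median,
        "stdev": statistics.stdev(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


# Made-up values for a single hypothetical variable; one point is far from the rest.
toy_variable = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6, 9.7]

summary = describe(toy_variable)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A table like this, one row per major variable, lets the reader see immediately when a mean is being dragged away from the median by a single observation.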

I don’t think I am asking too much here. Hell, I am not even fond of empirical work. My favorite academic is Peter Leeson (GMU Econ & Law) and he rarely (ever?) does empirical work. As long as empirical work is being done in the social sciences though I expect a certain standard. Otherwise all we’re doing is engaging in math masturbation.

TL;DR: I don’t trust most empirical work out there. I’ll rant about excessive literature reviews next time.