The Cost of ‘Free’ – or why I don’t like freeware

This is a partial response to Fabio Rojas’s recent post on the fate of Stata, a statistics package, given the rise of a free alternative, R. Rojas and others give many reasons why R is a good package, but for now I wish to deal with the argument that its being ‘free’ is a virtue.

R is free, but I see that as a fault, both because the zero price reveals that it doesn’t come with a devoted support system and because it isn’t really free at all. It’s actually very costly!

If you’ve spent any time with an economist you should know that there is no such thing as a free lunch. If R is free we should not simply assume it is better. On the contrary, we should ask why it is free. As I have tried to argue elsewhere, the answer is that when you purchase software you aren’t just purchasing a few lines of code. You’re purchasing the support system that comes with it, and free software comes without one. When a company purchases Stata, or any commercial software, it does so with the expectation that it can call a dedicated hotline for troubleshooting. As software has evolved, companies have experimented with pricing that acknowledges that we don’t purchase a one-time product but a continuous support system.

Consider the online services for Xbox or PlayStation. Their use is charged by the period because it costs money to run servers and provide customer support. Even ‘freemium’ games, which nominally don’t require any money to play, survive off microtransactions, which let companies earn steady revenue in exchange for continuing support and new content. I would not be surprised if freemium statistical software is tried in the future – access to basic regressions is free but more advanced models cost money to run. I half joke.

But let’s assume you’re good at coding and don’t need much support beyond a few days spent reading an R book. Should you praise R for being ‘free’? No, because you still paid with your time. Every hour spent learning how to code in R is an hour you could have spent doing any number of other things.

Now to be clear, you may still want to learn R if it frees up your time in the future by automating some process. This post isn’t an argument against adopting R. My point is only that it isn’t free in any meaningful sense. Adopting R costs you a devoted support system plus the value of the time it takes you to become proficient in it.

It’s possible that once you account for those things R is still ‘cheaper’ than commercial software like Stata or SPSS. That is an empirical question beyond the scope of this post.
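
For anyone who wants to run that comparison themselves, here is a rough sketch of the arithmetic in Python. Everything in it is an assumption made for illustration: the `Package` fields, the `total_cost` function, and every number are hypothetical placeholders, not real prices or measured learning times.

```python
from dataclasses import dataclass


@dataclass
class Package:
    """A software package described by its money cost and its time cost."""
    name: str
    license_fee: float     # sticker price in USD (0 for freeware)
    learning_hours: float  # hours until you are proficient (hypothetical)
    support_hours: float   # hours spent troubleshooting without a hotline (hypothetical)


def total_cost(pkg: Package, hourly_value: float) -> float:
    """Sticker price plus the value of the time the package consumes."""
    return pkg.license_fee + hourly_value * (pkg.learning_hours + pkg.support_hours)


# Made-up figures, chosen only to show how the comparison can flip.
free_pkg = Package("free package", license_fee=0, learning_hours=60, support_hours=20)
paid_pkg = Package("paid package", license_fee=500, learning_hours=30, support_hours=5)

for hourly_value in (5, 25):
    print(hourly_value, total_cost(free_pkg, hourly_value), total_cost(paid_pkg, hourly_value))
```

With these made-up inputs the ‘free’ package comes out cheaper when an hour is worth $5 (400 versus 625) and dearer when an hour is worth $25 (2,000 versus 1,375). Which case applies to you is exactly the empirical question.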



Like Brandon, I’ve begun playing around with new statistical packages. Like many libertarian scholars, I am skeptical about how much we can learn from number crunching. I think there is a place for statistical analysis in the social sciences, but it is definitely meant to be a tool, not an end in itself, and should be complemented with additional methods.

Recently I’ve begun looking for a Geographic Information System (GIS) package. I had initially intended to buy a copy of ArcGIS, one of the dominant GIS packages, until I looked at its pricing plans. A single license for the basic version costs $1,500 USD. I’m sad to say this price tag is not abnormal. Stata, one of the larger statistical packages, sells an annual license for its bare-bones version at $125 USD. SAS has its pro version going for $9,000 USD.

What is abnormal is that several freeware packages exist that provide comparable services. Are you an undergraduate student taking a class on univariate regression analysis? Download Gretl. It has a menu-based system that is relatively easy for even the newest of users to play around with. If you’re looking to challenge yourself, opt instead for R.

Likewise, for those who, like me, are on a budget, there exist several freeware GIS alternatives, such as GRASS and QGIS. I’m still learning GIS so I can’t comment on either package, but I will be sure to provide reviews once I’m comfortable with them.

If several freeware alternatives exist, why do retail versions remain dominant in the industry?

Part of the answer is that corporations and universities value having a customer help hotline to call if their software starts to malfunction. Poor graduate students don’t have much money, but tend to have a surplus of free time to spend figuring out why their software isn’t working. Corporations face the opposite constraints: they have far more money than graduate students but much stricter limits on their time.

Surely that can’t explain it all though, can it? If what you are purchasing with retail packages is the customer hotline, why hasn’t a group of entrepreneurial (and hungry) grad students set up a business providing dedicated IT support for freeware? Linux enthusiasts have made several attempts to provide such services for corporations looking to replace their Microsoft operating systems, so the idea has surely been thought of before.

Another possible answer is that what these retail packages are selling is their community. Stata may not be technically superior to Gretl, but the former’s community is larger than the latter’s. If you have a problem with Gretl you can’t easily find another user to help out outside of a few niche forums. Meanwhile you are sure to find a Stata compatriot just by walking down the halls of a social science college. I am not really convinced by this idea though. There is value in joining an existing community, but in the long run people do move across networks. Consider Myspace, which less than a decade ago was the social network, until it was defeated by another social network. How much longer will Stata and ArcGIS last before their user bases migrate to R and GRASS?

What do you all think? What other reasons might explain why pricey retail statistical packages remain dominant over comparable freeware alternatives?

Questions about R

And lots of ’em.

I just downloaded the R package from the CRAN in Seattle. I haven’t opened it yet. I don’t even know what CRAN is. I’ve been gathering some data on the GDP (PPP) per capita of regions in the world and I want to tinker with them, but I also want to get familiar with a stats program.

Any help with the fundamentals of what I’m dealing with would be great. Thanks!

UPDATE 12/18/2014: Michelangelo has steered me away from R and into the loving arms of gretl:

I prefer gretl to R because the former has a menu-based interface. R, Stata, etc., on the other hand, require you to know how to ‘code’. There are menus in the latter, but I don’t find them user friendly. The coding is hardly hard, but I think it confuses people who are just starting out and it isn’t really worth coding if you’re doing it for fun.