I want to invest in “big data stocks”. After all, everyone is saying that big data is the future of health care, education, government, and business, and that it will literally change the world. As someone who works with data both as an academic at USC and as the principal data scientist at Ranker, I am the type of person who is likely to make, and believe in, such hyperbolic claims. I recently put money into my IRA and needed to invest it, and as someone who believes in investing in what I know, I naturally wanted to invest in our data-driven future.
Where should I invest? If you look around the internet, you’ll find a number of recommendations from places like Forbes or The Street. The general consensus appears to be to take the “picks and shovels” approach to investing in big data, where you invest in the companies that make the tools that enable people to use data, rather than in the data itself. I’m writing this post because I think this is absolutely the wrong approach. I believe in investing in data, not in tools. Why do I believe that?
- My experience in academia has taught me that simple statistics and tools are often the most reliable. If there is a signal to detect, any reasonable analysis or tool should be able to find it. Many people turn to more complex statistics when they don’t find the relationship they expected using simple statistics. In psychology, researchers are finding that more complex models (e.g., models that add covariates) are often a sign that a study’s results are less likely to be reliable. Given the size of the datasets we typically have in data science, we rarely need special statistical techniques to find relationships, because we have so much statistical power that most tools and techniques should give convergent results. Put simply, the tools matter less than the data.
- The most popular tools and techniques are often open source. You can do a lot with R, Python, Gephi, Mahout, etc.
- Yes, there are advantages to using particular distributions of open source tools (e.g., Hadoop distributions that come with particular features), but so many companies offer different flavors of products that do essentially the same thing that I can’t see any one of them becoming the next Apple or Google in terms of stock growth. There are no barriers to entry in the tools market. Perhaps a company will become the next Red Hat, which may be a fine business to be in, but I don’t believe that is the revolutionary wave that investors in big data stocks are looking for.
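To make the statistical-power point concrete, here is a toy simulation, not evidence from any real dataset. It assumes a weak linear relationship (a hypothetical effect size of 0.1) between two normally distributed variables and treats two standard measures of association, Pearson correlation and Spearman rank correlation, as stand-ins for two different “tools”. The function names, effect size, and sample sizes are all illustrative choices. With a small sample the estimates are noisy and the signal is hard to distinguish from zero; with a large sample both methods land close together and are overwhelmingly significant.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def simulate(n, effect=0.1, seed=0):
    """Draw n points where y depends weakly on x (hypothetical effect size)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [effect * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def compare(n):
    """Return (pearson, spearman, z_pearson, z_spearman) for a sample of size n."""
    xs, ys = simulate(n)
    r_p = pearson(xs, ys)
    r_s = spearman(xs, ys)
    # Under no association, both coefficients have a standard error of
    # 1/sqrt(n - 1), so r * sqrt(n - 1) is an approximate z-statistic.
    scale = math.sqrt(n - 1)
    return r_p, r_s, r_p * scale, r_s * scale


if __name__ == "__main__":
    for n in (50, 100_000):
        r_p, r_s, z_p, z_s = compare(n)
        print(f"n={n:>7}: pearson={r_p:.3f} (z={z_p:.1f})  "
              f"spearman={r_s:.3f} (z={z_s:.1f})")
```

Run it with different seeds: at n=50 the two coefficients bounce around and often fail to clear the usual z > 1.96 threshold, while at n=100,000 they agree to within a few thousandths and sit far beyond it.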
So what should you do if you want to invest in big data? Buy stock in companies that have the best, biggest, most unique datasets and/or the most defensible ways of collecting that data. I invested my IRA money in Facebook, which has the biggest and best dataset of human behavior that has ever existed. I invest my academic time in scalable data collection projects such as YourMorals, BeyondThePurchase, and ExploringMyReligion, confident that they will produce the most long-term knowledge. And I invest my professional time in Ranker, which has a scalable process for collecting an opinion graph, something that will be essential for the kinds of intelligent applications that big data futurists have been promising us.
Do you want to invest in big data? Generally, you’ll get better returns if you invest your money, time, and energy in data, rather than in tools.
- Ravi Iyer