The data problem
As remarked in a recent paper (1), many are excited at the prospect of exponentially growing data. Supercomputers, faster and fatter pipes to connect the nodes, and ever-expanding cloud storage are adding to the excitement of data scientists. It is, however, not clear how expanding data will improve information content and, more importantly, affect the lives of people and the economics of companies and countries. The null hypothesis to test is that expanding data has no effect on the world.
Let’s look at history (2). In business, just two decades ago, many management and information technology consultants concluded that the “singularity” had already been reached. Enterprise Resource Planning systems would collect and store every piece of available data in the enterprise. It was obvious to many that once all the data was collected, the productivity of the business would improve exponentially. After all, information is in the data and all one has to do is collect it. Computer manufacturers and database vendors gladly entered the fray, providing ever faster computers and ever larger storage bins at cheaper prices. For many, the productivity curve was unending; some consultants even predicted infinite profits for their clients.
A few years later, they did it again. This time it was “business intelligence,” nice platters to cut, dice and present data. The hypothesis was that once managers saw the pictures, they would get great insights to drive their companies to ever higher profitability. Now, two decades later, most of the garbage data collected is sitting in the basement with little decision utility. The picture makers have vanished and the profitability of the companies is about the same. So what happened? Did collecting, storing and presenting data really help managers run their companies better?
This is a good lesson for those in academia also. All they have to ask is what significant insights arrived in the last two decades as they swam in the ever-expanding data ocean, regressing the heck out of every empirical tidbit. Did they find how markets work? Did they find how medicines can be improved? Did they find how the universe works? If they did not, it should give them pause as the next wave of data moves in like a tsunami and drenches every remaining brain cell.
Data is good, but more of it is not necessarily good. Even more could be bad.
(1) A 100-gigabit highway for science. Published: Tuesday, May 1, 2012 - 10:34 in Mathematics & Economics. Source: DOE/Lawrence Berkeley National Laboratory
(2) Flexibility: Flexible Companies for the Uncertain World. Gill Eapen. http://www.amazon.com/Flexibility-Flexible-Companies-Uncertain-World/dp/1439816328/ref=sr_1_2?ie=UTF8&qid=1336001147&sr=8-2
