Where does the value of big data truly lie: in the data itself, or in the algorithms we use to make sense of it? Bill Franks of Teradata comes down sharply on the side of the data:
…I’m convinced that new information will beat new algorithms and new metrics based on existing information almost every time. Indeed, new information can be so powerful that, once it is found, analytics professionals should stop worrying about improving existing models with existing data and focus instead on incorporating and testing that new information.
By “new information,” he means information that didn’t exist before, or information we now have at a depth never before possible. Sensor data in Internet of Things environments can represent either kind. For example, we may always have used temperature data in some calculation, but back in the day we used a daily average. Now we have sensors providing temperature readings every few minutes (or seconds). That’s data at a greater depth. As an example of data we didn’t have before, Bill cites sensors on cars that track wear and tear as the vehicle is driven. Previously, vehicle repair was primarily reactive. Now we can begin to anticipate repairs before they are needed.
Somehow this reminds me of a talk that Eliezer Yudkowsky gave at the Singularity Summit back in 2007. He said:
In the intelligence explosion the key threshold is criticality of recursive self-improvement. It’s not enough to have an AI that improves itself a little. It has to be able to improve itself enough to significantly increase its ability to make further self-improvements, which sounds to me like a software issue, not a hardware issue. So there is a question of, Can you predict that threshold using Moore’s Law at all?
Geordie Rose of D-Wave Systems recently was kind enough to provide us with a startling illustration of software progress versus hardware progress. Suppose you want to factor a 75-digit number. Would you rather have a 2007 supercomputer, IBM’s Blue Gene/L, running an algorithm from 1977, or a 1977 computer, an Apple II, running a 2007 algorithm? And Geordie Rose calculated that Blue Gene/L with 1977’s algorithm would take ten years, and an Apple II with 2007’s algorithm would take three years.
There is a progression here, albeit a counterintuitive one. We might be inclined to think that hardware adds more value than “mere” software, and that software is inherently more valuable than “mere” data (or, to use the term we like to throw around, “raw” data). The opposite turns out to be true. The data itself is where the value is. Hardware and software only help us focus on the potentialities, the possibilities, that it already contains.