Chris Anderson suggests that it’s time to chuck the scientific method in favor of a new methodology that serves up facts the way Google serves up ads — through calculations on massive sets of data:
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.
There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
I think there’s a lot to be learned from statistical analysis of data in the cloud, but I’m not sure theory and models can be put away so quickly. There has to be a framework for the questions we’re asking, and we still need to interpret the data once we have it. The theory may migrate into the algorithms or into the interpretive methodology, but it still has to be in there somewhere.
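To make that concrete, here is a minimal sketch, in Python, of the kind of "model-free" pattern hunt Anderson describes: scan every pair of variables in a dataset for strong correlations. The data, the threshold, and the planted relationship are all hypothetical, invented purely for illustration; the point is to show where the theory hides even when no hypothesis is stated up front.

import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 10,000 observations of 50 unnamed variables,
# with one real relationship planted between variables 0 and 1.
data = rng.normal(size=(10_000, 50))
data[:, 1] = 0.8 * data[:, 0] + 0.2 * rng.normal(size=10_000)

# Theory, choice 1: which variables were measured and how they were encoded.
# Theory, choice 2: Pearson correlation assumes linear, pairwise relationships.
corr = np.corrcoef(data, rowvar=False)

# Theory, choice 3: what counts as a "pattern" -- here an arbitrary cutoff,
# mindful that 50 * 49 / 2 = 1,225 pairs are being tested at once.
THRESHOLD = 0.5

hits = [
    (i, j, corr[i, j])
    for i, j in itertools.combinations(range(data.shape[1]), 2)
    if abs(corr[i, j]) > THRESHOLD
]

for i, j, r in hits:
    print(f"variables {i} and {j}: r = {r:.2f}")

Even in this deliberately naive scan, the correlation measure, the threshold, and the choice of what to measure in the first place are doing the work a model used to do. The petabytes don't eliminate those decisions; they just make them easier to overlook.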