Is the Scientific Method Obsolete? from Matthew Alexander
June 25, 2008
Google strikes again. With its huge computing power it is changing the way science is done, according to The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, an article by Chris Anderson in Wired magazine.
Some excerpts:
"All models are wrong, but some are useful." So proclaimed statistician George Box 30 years ago. […] Speaking at the O'Reilly Emerging Technology Conference this past March, Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."
[…]
Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise. But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.
[…]
There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
Dylan Distasio replies:
I didn't see any compelling argument in that article that the scientific method is on the verge of becoming obsolete.
While cloud computing can be a great tool for analyzing protein folding or looking for extraterrestrial signals, I don't see how throwing "the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot" is a viable approach.
This sounds like the ultimate in overfitting data (although I guess there is no model!). Venter may have come up with a brilliant software tool for rapidly sequencing DNA, but it's only useful within the context of a genomic model that was built using the scientific method.
I still see noise without a framework to hang the data on, and a testable (i.e., potentially refutable) hypothesis.
William Smith relates his experiences:
I work with large amounts of data all the time. Failure to understand the mechanism and specify the model form based on the mechanism leads one to make worthless models that fail miserably.
The noise level is very high in these types of datasets. Many "patterns" are just random, but algorithms will treat them as real. The "kitchen sink" approach of throwing many algorithms on a cluster of processors at a large set of data is guaranteed to find something. If all that's there is noise, they fit the noise. When tested on new data that it has never been exposed to, the model won't work. I have seen even the most sophisticated machine learning algorithms, support vector machines for example, fail this way.
I have worked with many modelers who have tried the automated brute force approach and they have never once managed to solve the problem they were working on.
I have built models in the field of chemometrics that I patented which use only two variables because I understood the mechanism. It was an iterative process, and I certainly didn't get it all correct in the first cycle. In fact, I have had some outstandingly stupid ideas. But the process of testing, refining, discarding, and generating ideas is hypothesis-driven science, and that is a process which hypster computer scientists cannot perform. I have never seen one of those guys capable of doing it in the last 15 years. If this is where Google is going, they are in deep kimchee and wasting a lot of money. Looking for patterns in the hay does not find the needle.
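Smith's point about algorithms fitting noise can be illustrated with a small simulation. What follows is a minimal sketch, not anything supplied by the contributors: it assumes a purely random target and 2,000 purely random candidate variables (the function names, sizes, and seed are all hypothetical choices), picks the variable that correlates best with the target on a training half, and then checks that same variable on a held-out half.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_noise_feature(n_features=2000, n_train=50, n_test=50, seed=0):
    """Search pure-noise features for the best training correlation.

    Returns (feature_index, train_correlation, test_correlation) for the
    feature with the largest absolute correlation on the training half.
    All data here is independent Gaussian noise (an assumption of this
    sketch), so there is no real relationship to discover.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    target = [rng.gauss(0, 1) for _ in range(n)]
    best = None
    for j in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n)]
        r_train = pearson(feature[:n_train], target[:n_train])
        if best is None or abs(r_train) > abs(best[1]):
            # Score the current winner on data it was not selected on.
            r_test = pearson(feature[n_train:], target[n_train:])
            best = (j, r_train, r_test)
    return best


if __name__ == "__main__":
    j, r_train, r_test = best_noise_feature()
    print(f"best feature: {j}")
    print(f"training correlation: {r_train:+.2f}")
    print(f"held-out correlation: {r_test:+.2f}")
```

Because every series is independent noise, any strong training correlation is purely an artifact of searching many candidates. The held-out correlation for the selected variable should typically fall back toward zero, which is the failure on new data that Smith describes.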
Comments
2 Comments so far
A good critique: http://cscs.umich.edu/~crshalizi/weblog/581.html
That Wired article stated:
“This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear”
With respect to the author, there’s a fallacy in his thinking. All mathematics is subject to the rigor of proofs, testing, and retesting, under all possible permutations and combinations. In math, every solution is scrutinized by others, and nothing is taken for granted. Math guys are notorious for lusting to prove their colleagues wrong.
Is the Wired author suggesting that the use of empirical equations, complete with their inherent fudge factors, will replace real science? Is he suggesting that data, which is a result of observation and science, will replace science?
Frankly, I think the good old Scientific Method will triumph over all.
Jeff