Bill Egan wrote:

Plotting the data in different ways pays off all the time. I earned a US patent because I examined bi-plots of ~50 variables and saw something interesting. Further investigation showed a sensible relationship to the physical mechanism I was interested in modeling. I always use bi-plots. Once I have a feel for the data and can throw out some variables…
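For readers unfamiliar with the technique: a biplot overlays the observations (PCA scores) and the variables (loadings, drawn as arrows) in the space of the first two principal components. A minimal sketch with NumPy, using small synthetic data rather than anything like the 50-variable set described above:

```python
import numpy as np

# synthetic stand-in data: 100 observations, 5 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# standardize, then take the SVD (equivalent to PCA on the correlation matrix)
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U[:, :2] * s[:2]   # observation coordinates to plot as points
loadings = Vt[:2].T         # variable directions to plot as arrows
```

Plotting `scores` as points and `loadings` as arrows from the origin (e.g. with matplotlib) gives the classic biplot; patterns such as clusters of points or nearly collinear arrows are the sort of thing worth chasing.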

 Reminds me of an old and slightly caricatural study we did in my R&D lab. We were always looking for data to test our algorithms on. One day, we got some astronomy data: a huge volume, nearly fifteen variables. The fellow who brought us the data was quite proud of the sheer quantity. So much data, so many variables. It must be serious. We didn't know much about the physics of the problem, and we too were a bit impressed. (Also because we had to work out how, in practice, to handle that amount of data with our software.)

Once the practical aspects were solved, the study turned out to be very quick. In fact, it was a real slaughter of variables. When it was finished, the intrinsic dimension of the data set appeared to be between one and two: some kind of spiral in 3D.
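The effect is easy to reproduce on synthetic data (not the original astronomy set, of course). The sketch below builds a helix, a one-dimensional curve embedded in three dimensions, and uses local PCA on a small neighborhood to show that locally the data is essentially one-dimensional, whatever the nominal number of coordinates:

```python
import numpy as np

# a helix: nominally 3 coordinates, intrinsically a 1-D curve
t = np.linspace(0, 4 * np.pi, 2000)
pts = np.column_stack([np.cos(t), np.sin(t), t])

# local PCA: take the 20 nearest neighbors of one point,
# center them, and look at the singular-value spectrum
i = 1000
d = np.linalg.norm(pts - pts[i], axis=1)
nbrs = pts[np.argsort(d)[:20]]
nbrs = nbrs - nbrs.mean(axis=0)
s = np.linalg.svd(nbrs, compute_uv=False)

# fraction of local variance along the first direction;
# close to 1 means the data is locally one-dimensional
ratio = s[0] ** 2 / (s ** 2).sum()
```

With noise, or at coarser neighborhood scales, the estimate drifts upward a little, which matches the "between one and two" verdict in the story; the neighborhood size (20 here) is an arbitrary choice for illustration.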

On the statistical forums and lists I follow, people regularly complain: "I have too much data for my computer's memory / my software's abilities. How can I handle it as a whole?"

It sounds as though we have made such progress since 1637 that we can forget Descartes:

"Diviser chacune des difficultés que j'examinerais, en autant de parcelles qu'il se pourrait, et qu'il serait requis pour les mieux résoudre."

[To divide up each of the difficulties which I examined into as many parts as possible, and as seemed requisite in order that it might be resolved in the best manner possible.]
