Data Mining, from Yucheng Pan

December 15, 2007

Many researchers have only very limited data to play with and are not necessarily required to know the perils of data mining. With abundant data, however, the perils must be fully recognized. The most popular paper in modern mid-ocean-ridge geochemistry and geophysics was about a correlation between sodium content (adjusted to a constant magnesium content) and the thickness of oceanic crust. When the chief author of that awe-inspiring paper was asked how one could even think of such a correlation, she said that with the large global data set they had just collected, they simply plotted anything against anything else (what a smart way not to miss any possible correlation, I thought at the time), and the above-mentioned correlation appeared. They came up with a grandiose explanation for the correlation. I was never able to fully appreciate that paper, as most of the correlation comes from one outlier location (Iceland), and on a local scale (meaning each location studied along mid-ocean ridges) the correlation either disappears or reverses. In addition, my own test showed that the adjustment to a constant magnesium content itself produces the same correlation.
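The risk in plotting anything against anything else can be illustrated with a small sketch. It is not the analysis from the paper or my own test. It only assumes a hypothetical data set of 40 unrelated variables with 20 observations each (all names and sizes are made up for illustration) and searches every pair for the strongest correlation.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongest_spurious_pair(n_vars=40, n_obs=20, seed=0):
    """Search all pairs of independent noise variables for the largest |r|.

    The variables are independent by construction, so any strong
    correlation found here is purely an artifact of the search.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
    candidates = (
        (pearson(data[i], data[j]), i, j)
        for i, j in combinations(range(n_vars), 2)
    )
    return max(candidates, key=lambda item: abs(item[0]))


if __name__ == "__main__":
    r, i, j = strongest_spurious_pair()
    print(f"strongest of 780 pairs: variables {i} and {j}, r = {r:.2f}")
```

With 780 pairs to choose from, the strongest pair tends to look like a real relationship even though every variable is noise. A correlation found this way says little until it is checked on data that played no part in finding it.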

To apply the plot-anything-against-anything-else approach to trading, I tested trading methods based on all combinations of a very large number of entry and exit criteria. Needless to say, a large number of trading methods that could produce daily returns of over one percent appeared. Once I further tested these methods on out-of-sample trading data, however, all their grandiosity disappeared. The above sodium correlation has not been similarly tested.
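A minimal sketch of this kind of experiment follows. It is not the set of criteria I actually tested. The entry rule (recent returns above a threshold), the exit rule (a fixed holding period), the parameter grid, and the simulated zero-mean returns are all assumptions chosen for illustration. Since the simulated market is pure noise, no rule has a real edge, yet the best rule found in-sample still looks profitable.

```python
import random
from itertools import product


def simulate_returns(n_days, rng, daily_vol=0.01):
    """Zero-mean noise returns: by construction no rule has a real edge."""
    return [rng.gauss(0.0, daily_vol) for _ in range(n_days)]


def strategy_mean_return(returns, lookback, threshold, hold, direction):
    """Mean daily return of one hypothetical entry/exit rule.

    Entry: the sum of the previous `lookback` returns exceeds `threshold`.
    Exit: after `hold` days. `direction` is +1 (long) or -1 (short).
    """
    total = 0.0
    days_left = 0  # days remaining in the current position
    for t in range(lookback, len(returns)):
        if days_left == 0 and sum(returns[t - lookback:t]) > threshold:
            days_left = hold
        if days_left > 0:
            total += direction * returns[t]
            days_left -= 1
    return total / len(returns)


def run_experiment(seed, n_days=250):
    """Pick the best rule in-sample, then score it on fresh data."""
    rng = random.Random(seed)
    in_sample = simulate_returns(n_days, rng)
    out_sample = simulate_returns(n_days, rng)
    # Every combination of entry lookback, entry threshold, exit holding
    # period and direction: 5 * 3 * 5 * 2 = 150 candidate rules.
    grid = product(range(1, 6), (-0.01, 0.0, 0.01), range(1, 6), (1, -1))
    best = max(grid, key=lambda p: strategy_mean_return(in_sample, *p))
    return (
        best,
        strategy_mean_return(in_sample, *best),
        strategy_mean_return(out_sample, *best),
    )


if __name__ == "__main__":
    params, in_ret, out_ret = run_experiment(seed=0)
    print("best rule (lookback, threshold, hold, direction):", params)
    print(f"in-sample mean daily return:     {in_ret:.5f}")
    print(f"out-of-sample mean daily return: {out_ret:.5f}")
```

The in-sample figure is the maximum over 150 tries, so it is biased upward by the search itself. The out-of-sample figure is a single fresh draw for one fixed rule and, averaged over many seeds, sits near zero.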

