30-Apr-2006
A Lesson from Philip J. McDonnell: Robustness of Counting Studies

Suppose we have performed a study of some methodology for the last 1000 days of trading and found that it produces an expected return of r with a standard deviation of s, significant at the 5% level. What do we really know, and how much confidence can we have in our results?

First, the significance level means that if the methodology had no real edge, a result this strong would still arise through chance about 5% of the time. Put another way, if we ran 20 studies of methods with no edge at all, we would expect about one of them to look significant purely as a random data aberration. That is not quite the same as saying a significant study is valid 95% of the time, since that also depends on how many of the ideas we test are any good to begin with, but it does mean that no single study is a guarantee. So if you want perfect guaranteed returns consider T-bills, not statistics. I also hear that Elliott Wave, astrology and Gann work perfectly as well but haven't tried them.
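
To see what a 5% significance level means in practice, here is a minimal sketch that "studies" many methods with no edge at all and counts how many look significant anyway. Everything in it is an assumption made for illustration: the function name false_positive_rate, the 250-day sample, the 1% daily volatility, and the use of 1.96 as a normal approximation to the two-sided 5% critical value.

```python
import math
import random
import statistics


def false_positive_rate(n_studies=1000, n_days=250, t_crit=1.96, seed=0):
    """Fraction of pure-noise 'studies' whose mean return tests significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Zero-mean returns: any "edge" found here is a chance artifact.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        std_err = statistics.stdev(returns) / math.sqrt(n_days)
        t_stat = statistics.mean(returns) / std_err
        if abs(t_stat) > t_crit:
            hits += 1
    return hits / n_studies


rate = false_positive_rate()
print(f"share of worthless methods that test significant: {rate:.3f}")
```

The printed share should land somewhere near 0.05, which is the sense in which roughly one in twenty studies of a worthless method will still pass the test.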

If you've decided that 95% confidence is good enough for you then the next question is how to test your study for robustness. One way is to break the data sample into different time periods. If the study worked in an older period and in a more recent period one can have a higher confidence in it.
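
The split-sample check can be sketched as follows. This is only an illustration under stated assumptions: subperiod_t_stats is a hypothetical helper, the returns are a toy series, and the t statistic tests the mean return of each period against zero.

```python
import math
import statistics


def subperiod_t_stats(returns, n_periods=2):
    """Split returns into consecutive periods; return (mean, t-stat) for each.

    Leftover observations at the end (when the length does not divide
    evenly) are dropped.
    """
    size = len(returns) // n_periods
    results = []
    for i in range(n_periods):
        chunk = returns[i * size:(i + 1) * size]
        mean = statistics.mean(chunk)
        std_err = statistics.stdev(chunk) / math.sqrt(len(chunk))
        results.append((mean, mean / std_err))
    return results


# Toy daily returns, oldest first (made-up numbers, not real data).
toy_returns = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002] * 50
for label, (mean, t_stat) in zip(("older", "recent"), subperiod_t_stats(toy_returns)):
    print(f"{label}: mean={mean:.5f} t={t_stat:.2f}")
```

If the effect shows up with the same sign and a respectable t statistic in both halves, that is some evidence it is not an artifact of one stretch of data.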

By the same token we can look at the variance of two different periods. If the variance is changing then we may have a problem called heteroskedasticity. Before this post gets banished to the KRS-LIST let me say that simply means different variances. An F statistic can be calculated to tell us if the data has a significantly different variance in two different periods.
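
A minimal sketch of the F statistic for two periods, assuming the returns in each period are roughly normal and independent. The helper name variance_ratio_f and the simulated data are assumptions for illustration. The p-value is left to an F table or a library, for example scipy.stats.f.sf(f_stat, df_num, df_den), doubled for a two-sided test.

```python
import random
import statistics


def variance_ratio_f(period_a, period_b):
    """Return (F, df_numerator, df_denominator), larger variance on top."""
    var_a = statistics.variance(period_a)  # sample variance, n - 1 divisor
    var_b = statistics.variance(period_b)
    if var_a >= var_b:
        return var_a / var_b, len(period_a) - 1, len(period_b) - 1
    return var_b / var_a, len(period_b) - 1, len(period_a) - 1


# Simulated returns: the recent period is twice as volatile as the older one.
rng = random.Random(42)
older = [rng.gauss(0.0, 0.01) for _ in range(500)]
recent = [rng.gauss(0.0, 0.02) for _ in range(500)]

f_stat, df_num, df_den = variance_ratio_f(older, recent)
print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) degrees of freedom")
```

An F well above 1 on this many degrees of freedom points to a real change in variance between the two periods.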

Markets often exhibit non-stationary variance and that is a fact which must be recognized. However our choice of model can also be the cause of some of the heteroskedasticity. For example, with long-term data, choosing a simple linear model without detrending can result in serious heteroskedasticity due solely to the wrong model. Using a log transform minimizes the problem because compounded returns grow roughly exponentially, so the log of price is much closer to a straight line over time and gives a better fit of the long-term compounded returns of the market.
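
A small simulation can illustrate how the model alone creates heteroskedasticity. It is a sketch under stated assumptions: the price path is simulated with constant volatility in log terms, the names simulate_prices and half_variance_ratio are hypothetical, and it compares daily changes rather than fitting a full regression.

```python
import math
import random
import statistics


def simulate_prices(n_days=2000, drift=0.001, vol=0.005, seed=0):
    """Compounding price path whose log changes have constant variance."""
    rng = random.Random(seed)
    log_price = math.log(100.0)
    prices = []
    for _ in range(n_days):
        log_price += drift + vol * rng.gauss(0.0, 1.0)
        prices.append(math.exp(log_price))
    return prices


def half_variance_ratio(series):
    """Variance of the second half of a series over that of the first half."""
    mid = len(series) // 2
    return statistics.variance(series[mid:]) / statistics.variance(series[:mid])


prices = simulate_prices()
raw_changes = [b - a for a, b in zip(prices, prices[1:])]
log_changes = [math.log(b / a) for a, b in zip(prices, prices[1:])]

print(f"raw price changes, variance ratio: {half_variance_ratio(raw_changes):.2f}")
print(f"log price changes, variance ratio: {half_variance_ratio(log_changes):.2f}")
```

The raw changes look far more volatile in the later half simply because the price level is higher, while the log changes show a ratio close to 1, even though nothing about the underlying process changed.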

Another common syndrome is autocorrelation in the residuals or in the trading results. One can test for this with a simple autocorrelation of the residuals or use a Box-Pierce or Ljung-Box Q statistic, the portmanteau tests associated with the Box-Jenkins methodology. If autocorrelation is present it usually means there is a missing variable which should be included in the model, if you only knew what it was. One can explicitly include a variable for the last residual or alternatively use a Kalman filter to do the same thing.
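
As a rough sketch of the autocorrelation check, the code below computes sample autocorrelations and the Ljung-Box Q statistic, one common portmanteau test from the Box-Jenkins family. The function names and the simulated AR(1) residuals are assumptions for illustration. Under the null of no autocorrelation, Q over 10 lags is approximately chi-squared with 10 degrees of freedom, whose 5% critical value is about 18.31.

```python
import random
import statistics


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    mean = statistics.fmean(x)
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, len(x)))
    return num / denom


def ljung_box_q(x, max_lag=10):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    total = sum(autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total


# Simulated residuals with built-in dependence: e[t] = 0.5 * e[t-1] + noise.
rng = random.Random(7)
residuals = [0.0]
for _ in range(999):
    residuals.append(0.5 * residuals[-1] + rng.gauss(0.0, 1.0))

CHI2_CRIT_10DF_5PCT = 18.307  # tabulated 5% critical value, 10 degrees of freedom
q_stat = ljung_box_q(residuals, max_lag=10)
print(f"lag-1 autocorrelation: {autocorrelation(residuals, 1):.2f}")
print(f"Ljung-Box Q: {q_stat:.1f} (reject if above {CHI2_CRIT_10DF_5PCT})")
```

A Q far above the critical value says the residuals still carry predictable structure, which is the hint that something is missing from the model.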

Classical non-parametric tests can be used if the underlying distribution is non-normal. Examples include Mann-Whitney, Spearman rank correlation, chi-squared, Kruskal-Wallis (the rank-based counterpart of ANOVA, which itself assumes normality) and simple binomial tests. Robust testing using bootstrap techniques can also give us confidence intervals and p-values without assuming a particular underlying distribution.
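
The bootstrap idea can be sketched as a percentile confidence interval for the mean return. This is a minimal illustration, not a definitive implementation: bootstrap_mean_ci is a hypothetical helper, the returns are simulated, and plain resampling with replacement assumes the observations are independent, so serially correlated returns would call for a block bootstrap instead.

```python
import random
import statistics


def bootstrap_mean_ci(returns, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of returns."""
    rng = random.Random(seed)
    n = len(returns)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(returns, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Simulated daily returns standing in for a study's trading results.
rng = random.Random(123)
sample = [rng.gauss(0.001, 0.01) for _ in range(500)]

low, high = bootstrap_mean_ci(sample)
print(f"95% bootstrap CI for the mean return: [{low:.5f}, {high:.5f}]")
```

If the interval excludes zero, the mean return is significant at roughly the 5% level without any appeal to normality.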