Bias, a query from Bill Rafter

November 10, 2007

Let's say that I work really hard and come up with a long-only trading system of largecap stocks that over the last 10 years had a compound annual rate of return of 20% with a maximum drawdown of 15%. The first thing everyone says is my universe was biased — survivor bias or look-ahead bias. I know there is some bias because I test my universe and find the universe I used had a 14% return with a 25% drawdown. So although there is some bias, I still beat the universe. But I am also happy because I know the S&P500 and Russell 3000 each had a 10% rate of return and a 45% drawdown. But the bias charge still nags me. I go back to the computer and come up with a short side to complement my long-only version. So my new system is long-short. Using the same stock universe over the same period, my long-short combined program produces a 10% return with a 3% drawdown. By going to a long-short program, did I eliminate the previously existing bias?

Phil McDonnell replies:

You cannot tell whether the bias has been eliminated. Let me give a simple example. The S&P and most indexes are cap weighted. Effectively this means there is a lower bound on size that a company must reach to be included in the S&P. Assume the sample is the current constituents of the index. Then in an historical study the sample includes knowledge of the future, because it includes stocks that were added and excludes stocks that were deleted. In the bottom portion of the biggest 500 stocks there is a group of companies that grew their way into the elite index; these stocks probably outperformed. Over the last few years an equal number of stocks dropped out to make way for the new ones; these shrank and presumably underperformed.

In this example one would expect bias to arise if the data are filtered on market cap, sales or earnings growth, or stock price growth (relative strength). When those factors are implicitly combined with future knowledge that a stock will cross the threshold of index inclusion, the result can be a strong bias. For example, relative strength is related to market cap by a simple multiplication by the number of shares.
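McDonnell's argument can be illustrated with a toy simulation (all numbers here are hypothetical assumptions, not from the discussion): give every stock the identical return distribution, then retroactively keep only the top performers, just as a sample built from an index's current constituents implicitly does. The survivors' average growth always looks at least as good as the true universe average.

```python
import random

def simulate_survivorship_bias(n_stocks=500, n_years=10, keep_top=400, seed=7):
    """Toy illustration: every stock draws i.i.d. annual returns, yet a
    'survivors only' sample (the retroactively-largest stocks) shows a
    higher average cumulative growth than the full universe."""
    rng = random.Random(seed)
    growth = []
    for _ in range(n_stocks):
        g = 1.0
        for _ in range(n_years):
            # assumed 8% mean annual return, 20% volatility
            g *= 1.0 + rng.gauss(0.08, 0.20)
        growth.append(g)
    universe_mean = sum(growth) / n_stocks
    survivors = sorted(growth, reverse=True)[:keep_top]
    survivor_mean = sum(survivors) / keep_top
    return universe_mean, survivor_mean

u, s = simulate_survivorship_bias()
print(f"full universe mean growth:  {u:.2f}x")
print(f"survivors-only mean growth: {s:.2f}x")  # always >= universe mean
```

The bias appears even though no stock had any real edge, which is exactly why beating a survivor-tainted universe by itself proves little.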

The only way to really determine what the bias might be is to identify the stocks which were added to or deleted from the index but would have met the filtering criteria. Only then can we truly know the bias. But if you are going to do that, you might as well simply start with the original stock list which existed at the time and do the study right.
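The point-in-time approach McDonnell recommends can be sketched as follows. The tickers and dates below are invented for illustration; in practice the membership records would come from a point-in-time constituents database.

```python
from datetime import date

# Hypothetical membership records: (ticker, added, deleted); deleted=None
# means the stock is still a member today.
MEMBERSHIP = [
    ("AAA", date(1995, 1, 1), None),
    ("BBB", date(1995, 1, 1), date(2001, 6, 30)),  # later dropped out
    ("CCC", date(2001, 7, 1), None),               # replaced BBB
]

def universe_as_of(records, when):
    """Return the tickers that were actually in the index on `when`, so a
    backtest run on that date never sees future additions or deletions."""
    return sorted(t for t, added, deleted in records
                  if added <= when and (deleted is None or when <= deleted))

print(universe_as_of(MEMBERSHIP, date(2000, 1, 1)))  # ['AAA', 'BBB']
print(universe_as_of(MEMBERSHIP, date(2005, 1, 1)))  # ['AAA', 'CCC']
```

A backtest that draws its candidate list from `universe_as_of` on each rebalance date is "doing the study right" in McDonnell's sense: the deleted stock still appears in the years it was genuinely a member.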

Rob Steele remarks:

If you were data snooping you'd probably see better performance. Survivorship bias is certainly an issue; if you can, expand your universe to include everything that would ever have come into it over the test period. The big issue, however, is the "I work really hard and come up with …" part. How do you know you aren't data mining? The harder you look, the more likely you are to find spurious correlations that aren't predictive. You can never be totally positive you've found something real, but you can guard against chimeras to some extent. One way is not to look too hard: limit the free parameters and the size of your search space. Another is rolling backtests, in which you repeatedly introduce previously unseen data. Aronson's Evidence-Based Technical Analysis is good on this.
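The rolling backtest Steele mentions can be sketched as a window generator (the window lengths below are arbitrary assumptions): fit on a training window, score on the never-before-seen window that follows it, then roll both windows forward.

```python
def walk_forward_splits(n_obs, train_len, test_len):
    """Yield (train_range, test_range) index pairs for a rolling backtest.
    Each test window contains only data the model has never seen when fit."""
    start = 0
    while start + train_len + test_len <= n_obs:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len  # roll forward by one test window

# e.g. 10 years of monthly data: 120 observations, 5-year fit, 1-year test
splits = list(walk_forward_splits(120, 60, 12))
print(len(splits))  # 5 out-of-sample folds
```

Only the out-of-sample folds count as evidence; performance on the training windows is exactly the "looking too hard" that the post warns against.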

Gregory van Kipnis replies:

What bias? There was still residual information despite survivor/peek-ahead bias. Or are you saying Dr. Rafter did not use a hold-out sample either? Information decays, but if it doesn't decay too quickly you can exploit it. If (big if) there was bias, then going short part of the remaining universe adds to the bias; it doesn't subtract from it. Systems that learn from the past are not ipso facto completely biased.

A little bias is not such a bad thing (I stay away from all growling dogs for that reason even though most won't bite me). Learning from the past is great. Adding common sense and questioning if anything is different from the past is what creates an edge. I seek that.






1 Comment so far

  1. Curmudgeon 2312 on November 9, 2007 12:17 pm

    If you suspect there is bias in your database, why not address the question directly with a bias-test. Try to reproduce the historical returns of a known portfolio, like the S&P 500. When I did this with a database my boss had paid a lot of $$$ for about 5 years ago, I found reasonable (though far from perfect) agreement between my numbers and the published ones for the last few years, but for older years (late seventies to mid eighties) I found large discrepancies, with “my” S&P returns massively higher than the real ones. On closer examination my run included only about 375 of the 500 stocks in the early years; the rest could not be matched. That is a situation of MASSIVE bias, which called into question all the testing I had done for fancy strategies in those years.
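The commenter's bias test can be sketched roughly as follows. Every name, return, and the tolerance below is hypothetical: the idea is simply to compute the return over whatever official index members your database can match, then compare both the coverage ratio and the reproduced return against the published figure.

```python
def bias_test(db_returns, published_return, index_members, tol=0.02):
    """Sketch of a database bias test: equal-weight the returns of the
    index members the database can match for one historical year, and
    flag whether the result is within `tol` of the published return.
    `db_returns` maps ticker -> that year's total return."""
    matched = [t for t in index_members if t in db_returns]
    coverage = len(matched) / len(index_members)
    if not matched:
        return coverage, None, False
    reproduced = sum(db_returns[t] for t in matched) / len(matched)
    return coverage, reproduced, abs(reproduced - published_return) <= tol

# Hypothetical year: "BBB" cannot be matched, like the commenter's missing stocks
db = {"AAA": 0.12, "CCC": 0.30}
coverage, reproduced, ok = bias_test(db, 0.10, ["AAA", "BBB", "CCC"])
print(coverage, reproduced, ok)  # low coverage and a large mismatch: bias suspected
```

A real test would use the index's actual cap weighting rather than equal weights, but even this crude version surfaces the two red flags the commenter found: missing constituents and reproduced returns far above the published ones.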

