Testing for Seasonality, from Andrew Moe
September 1, 2009
After I looked at the data from 1900 to 2008, it is safe to conclude that September historically was the worst month for investors, period. — A Reader of Dailyspeculations.
Analysis of seasonality effects often falls victim to one of the most common oversights in probability. It is illustrated by the birthday problem: in a group of 23 or more randomly chosen individuals, the probability that at least one pair shares a birthday is greater than 50%. With two individuals and 365 days in a year, matches are rare, and 23 individuals still do not seem many compared to 365 days, but this apparent paradox is resolved by considering the number of possible pairings among those 23 individuals instead [Ed.: 23*22/2 = 253 pairings, which is close to 365].
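The birthday arithmetic is easy to check directly. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name is ours, chosen for illustration.

```python
from math import comb

def birthday_match_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday,
    assuming every day of the year is equally likely."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

pairs = comb(23, 2)                       # number of distinct pairings among 23 people
p_match = birthday_match_probability(23)  # just over one half
print(pairs, round(p_match, 4))
```

The point for seasonality work is the count of pairings, not the count of people: the number of comparisons being made implicitly grows much faster than the number of items.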
In much the same way that a naive application of probability massively underestimates the odds of two individuals in the group of 23 sharing a birthday, seasonality studies suffer from a similar effect. When grouping by week, month, or season, combinatorial considerations come into play. While 63 out of 108 Septembers having a loss might appear statistically significant as a series of Bernoulli random trials (assuming an underlying 50/50 split between up and down months, p = .03), such effects are washed away when we instead consider the underlying empirical distribution of days or weeks, randomly permuted to form months. When the days that make up September are compared to a random basket of days, the results are indistinguishable from random. Attempts to find seasons of non-randomness are frequently subject to data mining bias, as the same permutation test that debunks the September drift is easily used to identify (falsely) statistically significant periods.
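For readers who want to reproduce the Bernoulli-trial view, here is a minimal sketch of the exact binomial tail for 63 or more down months out of 108 under a fair 50/50 split. The one-sided tail is our assumption; the quoted p may have been computed under a different convention (two-sided, a normal approximation, or a different count of years), so the output need not match it to the digit.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: one-sided test, fair coin, 108 Septembers of which 63 were losses.
p_one_sided = binomial_upper_tail(63, 108)
print(f"P(at least 63 losses in 108 fair trials) = {p_one_sided:.4f}")
```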
The study: Running a bootstrap permutation study on Dow data from 1960 to 2008, we estimate the empirical distribution of differences in monthly return between September and other months. We test the hypothesis that a random September is no more bearish than a composition of random days sampled with replacement. We find that the mean difference between populations is 0.0695%, yielding a p-value of 0.3612, which is consistent with randomness.
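A test of this kind might be sketched as follows. This is not the author's code: the 21-day month, the compounding of simple daily returns, the one-sided comparison, and the synthetic Gaussian "daily returns" are all assumptions made so that the example runs on its own.

```python
import random
from statistics import mean

def compound(daily_returns):
    """Compound simple daily returns (as fractions) into one period return."""
    total = 1.0
    for r in daily_returns:
        total *= 1.0 + r
    return total - 1.0

def bootstrap_p_value(september_months, all_days, days_per_month=21,
                      n_boot=1000, seed=0):
    """One-sided p-value: the share of bootstrap samples in which a set of
    'months' built from days resampled with replacement has a mean return
    at or below the observed mean of the September returns."""
    rng = random.Random(seed)
    observed = mean(september_months)
    n_sept = len(september_months)
    hits = 0
    for _ in range(n_boot):
        fake = [compound(rng.choices(all_days, k=days_per_month))
                for _ in range(n_sept)]
        if mean(fake) <= observed:
            hits += 1
    return hits / n_boot

# Synthetic stand-in data (NOT Dow data): 49 years of Gaussian daily returns.
rng = random.Random(1)
all_days = [rng.gauss(0.0003, 0.01) for _ in range(49 * 252)]
# Hypothetical "Septembers" drawn from the same days, so no real effect exists.
septembers = [compound(rng.choices(all_days, k=21)) for _ in range(49)]
p = bootstrap_p_value(septembers, all_days)
print(p)
```

With real data, `all_days` would hold the actual daily returns and `september_months` the actual September returns for the same period.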
Bob Humbert writes:
The same September underperformance anomaly exists in the municipal and corporate bond markets. Doesn't this seem "unusual" or is it simply a byproduct of relative value transmitting itself through the various asset classes?
I am not as numerate as you; but keep in mind this: if a coin comes up tails 20 times in a row a Trader would examine the coin… while a Quant would merely assume he was witness to an extremely remote event…
Alston Mabry reports on another study of the issue:
Stats for all Dow months from Oct 1928 through Aug 2009:
All Dow months:
mean: +0.37%
sd: 5.44%
Take all days in this period, randomly pull 20 to create a month like September, and do this 1000 times (with replacement) to create 1000 randomly-selected "months" with the following stats:
1000 randomly-created months:
mean: +0.26%
sd: 5.03%
Close enough, given the vagaries of the actual monthly data, the use of replacement, etc. Randomly pulling out days creates a distribution of "months" very much like the actual distribution, so one cannot find a solid critique of the use of the actual monthly data, given the similar stats of the randomly-created months.
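The random-month construction above can be sketched in a few lines. The daily series here is a synthetic Gaussian placeholder, not the Dow, and compounding the 20 sampled days is our assumption about how a "month" is assembled.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
# Hypothetical stand-in for daily Dow % changes (values are in percent).
daily = [rng.gauss(0.02, 1.1) for _ in range(20000)]

def random_month(days, k=20):
    """Compound k daily % changes, drawn with replacement, into one monthly % change."""
    growth = 1.0
    for r in rng.choices(days, k=k):
        growth *= 1.0 + r / 100.0
    return (growth - 1.0) * 100.0

# 1000 randomly created "months", as in the text.
months = [random_month(daily) for _ in range(1000)]
print(f"mean: {mean(months):+.2f}%  sd: {stdev(months):.2f}%")
```

Comparing these two summary numbers with the mean and sd of the actual calendar months is the sanity check described above.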
Then pull the actual Septembers out and compare them to the actual months:
All actual Septembers:
mean: -1.66%
sd: 6.37%
z vs all Dow months: -3.34
That z is spot on with the result from the random re-sorts of months posted earlier. So, one must conclude again that, in the time period under study, September has been unusually cruel.
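The z above appears to be the September mean measured in standard errors of the all-months distribution. That formula is our inference rather than something stated in the post, and the count of 80 Septembers (Sep 1929 through Sep 2008) is derived from the stated date range. Under those assumptions the reported figure is reproduced:

```python
from math import sqrt

def z_of_subset_mean(subset_mean, n, pop_mean, pop_sd):
    """z score of a subset's mean against the population mean,
    using the standard error pop_sd / sqrt(n)."""
    return (subset_mean - pop_mean) / (pop_sd / sqrt(n))

# Inputs are the rounded figures quoted in the text; n = 80 is an assumption.
z_sept = z_of_subset_mean(-1.66, 80, 0.37, 5.44)
print(round(z_sept, 2))  # -3.34
```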
The thing about the previously-posted analysis with the random re-sorts is that one is really asking the generalized question: If one treats the monthly % change series as a set that can be redistributed among the months-as-containers, what is the likelihood that any month will have an extreme mean like -1.66%? I think this eliminates the multiple-comparison problem, since it doesn't have to be September.
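One way to pose the "any month" question in code is to shuffle the monthly series among the calendar slots and track the worst of the twelve monthly means each time. This is a sketch under our own assumptions (the minimum mean as test statistic, synthetic Gaussian data, and slot index 8 standing in for September), not the earlier analysis itself.

```python
import random
from statistics import mean

def worst_month_mean(returns):
    """Lowest of the 12 calendar-slot means; returns[i] falls in slot i % 12."""
    return min(mean(returns[m::12]) for m in range(12))

def any_month_p_value(returns, n_perm=500, seed=0):
    """Share of random re-sorts whose worst slot mean is at least as low
    as the worst slot mean in the observed ordering."""
    rng = random.Random(seed)
    observed = worst_month_mean(returns)
    shuffled = list(returns)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if worst_month_mean(shuffled) <= observed:
            hits += 1
    return hits / n_perm

# Synthetic monthly % changes (NOT Dow data): 80 years, no seasonal effect.
rng = random.Random(7)
months = [rng.gauss(0.37, 5.44) for _ in range(80 * 12)]
# Same series with a deliberately planted drag on one calendar slot.
planted = [r - 5.0 if i % 12 == 8 else r for i, r in enumerate(months)]

p_null = any_month_p_value(months)
p_planted = any_month_p_value(planted)
print(p_null, p_planted)
```

Because the statistic is the minimum over all twelve slots, the p-value already accounts for the freedom to pick whichever month looks worst after the fact.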
But another issue is: Can you treat a series like Dow monthly % changes as a set that can be re-sorted? One concern is the issue of volatility regime changes. For example: in a volatile year, September is the worst month at -4%, and December the best at +4%; then in a calmer year, September is the best month at +2%, and December the worst at -2%; now you have September's mean return as -1% and December's as +1%. But is September really "worse"? Or does it just appear so because of the problems inherent in mixing volatility regimes?
One way I've tried to address this issue is to normalize each month as a z score compared to the mean and sd of the previous 12 months. In the example with September and December, September's values might then be -2.5 and +2.5 and December's +2.5 and -2.5, making the two months equivalent.
Normalizing the Dow months (again, Oct 1928 through Aug 2009) in this way and then analyzing September again, one gets:
All Dow months normalized:
mean: -0.05
sd: 1.19
September:
mean: -0.37
sd: 1.23
z vs all months: -2.42
So this adjustment pulls the z score in (as it does in all cases I've used it), but here the z for September still leaves it in the "unusually cruel" category.
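The trailing normalization might look like the sketch below. The use of the sample standard deviation and the exclusion of the current month from its own window are our assumptions, since the post does not specify either; the numbers are invented to illustrate the point about volatility regimes.

```python
from statistics import mean, stdev

def trailing_z_scores(returns, window=12):
    """z score of each value against the mean and sample sd of the
    `window` values immediately before it. The first `window` values
    have no full history and are skipped."""
    out = []
    for i in range(window, len(returns)):
        prior = returns[i - window:i]
        out.append((returns[i] - mean(prior)) / stdev(prior))
    return out

# Invented monthly % changes: a volatile regime and the same pattern at half scale.
volatile = [4, -3, 2, -4, 1, 3, -2, 4, -1, -3, 2, 1, -4]
calm = [x / 2 for x in volatile]

z_volatile = trailing_z_scores(volatile)
z_calm = trailing_z_scores(calm)
print(z_volatile, z_calm)
```

The final month is -4% in the volatile series and -2% in the calm one, yet both receive the same z score, which is what makes months from different regimes comparable.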
Mr. K wrote: "Shorting September every year for 80 years could be fine, but on any given year, it is a crapshoot."
Alas, yes — a crapshoot with a bias. But the analysis is fun.
Comments
4 Comments so far
Very helpful analysis that deserves more readers. Perhaps I can push a few there.
What??
bootstrapping… a total load of cr_p. ask vic… best z
I think instead of a z-value he meant the probability side: 0.3612 or in z-value terms: 0.35. Heavily random. Am I right?
Quite a post, I liked it.