After I looked at the data from 1900 to 2008, it is safe to conclude that September historically was the worst month for investors, period. — A Reader of Dailyspeculations.

Analysis of seasonality effects often falls victim to one of the most common oversights in probability. The birthday problem illustrates it: in a group of 23 or more randomly chosen individuals, there is a probability greater than 50% that at least one pair shares a birthday. With two individuals and 365 days in a year, matches are rare. Even 23 individuals do not seem like many compared to 365 days. The apparent paradox is resolved by counting the possible pairings among those 23 individuals instead [Ed.: 23*22/2 = 253 pairings, which is close to 365].
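The birthday arithmetic is easy to check directly. Here is a minimal Python sketch. It assumes birthdays are independent and uniform over 365 days, with no leap years. `p_shared_birthday` is a hypothetical helper name.

```python
from math import comb, prod

def p_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday,
    assuming independent birthdays spread uniformly over `days` days."""
    if n > days:
        return 1.0  # pigeonhole: a match is certain
    # Probability that all n birthdays are distinct: 365/365 * 364/365 * ...
    p_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - p_distinct

print(comb(23, 2))                      # number of distinct pairs: 253
print(round(p_shared_birthday(23), 4))  # just over one half
print(round(p_shared_birthday(22), 4))  # just under one half
```

The jump past 50% happens at 23 people because the number of pairs grows roughly with the square of the group size.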

A naive application of probability massively underestimates the odds that two members of a group of 23 share a birthday. Seasonality studies suffer from a similar effect: when returns are grouped by week, month, or season, combinatorial considerations come into play. Suppose 63 out of 108 Septembers showed a loss. Treated as a series of Bernoulli trials, this might appear statistically significant (assuming an underlying 50/50 split between up and down months, p = .03). Such effects wash away when we instead consider the underlying empirical distribution of days or weeks, randomly permuted to form months. When Septembers are compared with random baskets of days, the difference is indistinguishable from chance. Attempts to find seasons of non-randomness are frequently subject to data-mining bias, since the same permutation test that debunks the September drift is easily used to (falsely) identify statistically significant periods.
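The Bernoulli-trial p-value can be recomputed exactly with the binomial distribution. The sketch below assumes independent months with a 50/50 up/down split. Note that it looks at September in isolation, which is exactly the framing the paragraph warns against. The quoted figure depends on whether a one- or two-sided test is meant, so both are printed. `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more down months
    if each month were an independent coin flip with down-probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 63 losing Septembers out of 108, under the 50/50 assumption in the text.
one_sided = binomial_upper_tail(63, 108)
# Two-sided version by doubling, valid here because p = 0.5 is symmetric.
two_sided = min(1.0, 2 * one_sided)
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```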

The study: Running a bootstrap permutation study on Dow data from 1960 to 2008, we estimate the empirical distribution of differences in monthly return between September and other months. We test the hypothesis that a random September is no more bearish than a composite of random days sampled with replacement. We find a mean difference between populations of 0.0695%, with a p-value of 0.3612: indistinguishable from random.
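The study does not spell out its exact resampling scheme, so what follows is only one plausible reading of it, not the original code. It is a Python/NumPy sketch that rebuilds each September from days drawn with replacement out of the whole sample and then asks how often those random Septembers are at least as bearish as the real ones. Simple daily returns compounded into monthly returns are an assumption. The function names and the generated data are hypothetical stand-ins for the Dow series.

```python
import numpy as np

def compound(daily):
    """Compound simple daily returns along the last axis into a period return."""
    return np.prod(1.0 + np.asarray(daily), axis=-1) - 1.0

def bootstrap_september_test(all_days, septembers, n_boot=5000, seed=0):
    """Bootstrap comparison of actual Septembers with 'random Septembers'.

    all_days   : 1-D array of every daily simple return in the study period
    septembers : list of 1-D arrays, the daily returns of each actual September
    Each random September keeps the day count of the real one it replaces, but
    its days are drawn with replacement from the whole sample.
    Returns (observed mean minus bootstrap mean, one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    all_days = np.asarray(all_days, dtype=float)
    observed = np.mean([compound(s) for s in septembers])

    total = np.zeros(n_boot)
    for s in septembers:
        draws = rng.choice(all_days, size=(n_boot, len(s)), replace=True)
        total += compound(draws)           # one random September per replicate
    boot_means = total / len(septembers)   # mean over all years, per replicate

    diff = observed - boot_means.mean()
    # Share of replicates at least as bearish as the actual Septembers.
    p_value = float(np.mean(boot_means <= observed))
    return diff, p_value

# Hypothetical stand-in data: 49 years of 252 i.i.d. trading days each,
# with days 168..188 of each year playing the role of September.
rng = np.random.default_rng(42)
all_days = rng.normal(0.0003, 0.01, size=49 * 252)
septembers = [all_days[y * 252 + 168 : y * 252 + 189] for y in range(49)]
diff, p = bootstrap_september_test(all_days, septembers)
print(f"mean difference = {diff:.4%}, p-value = {p:.4f}")
```

Strictly speaking, drawing with replacement makes this a bootstrap rather than a pure permutation test. To run it on real data, you would replace the generated series with actual daily Dow changes.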

Bob Humbert writes:

The same September underperformance anomaly exists in the municipal and corporate bond markets. Doesn't this seem "unusual" or is it simply a byproduct of relative value transmitting itself through the various asset classes?

I am not as numerate as you, but keep this in mind: if a coin comes up tails 20 times in a row, a Trader would examine the coin… while a Quant would merely assume he had witnessed an extremely remote event…

Alston Mabry reports on another study of the issue:

Stats for all Dow months from Oct 1928 thru Aug 2009:

All Dow months:
mean: +0.37%
sd: 5.44%

Take all days in this period, randomly pull 20 to create a month like September, and do this 1000 times (with replacement) to create 1000 randomly-selected "months" with the following stats:

1000 randomly-created months:
mean: +0.26%
sd: 5.03%
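The random-month construction can be sketched in a few lines of Python with NumPy. This assumes simple daily returns compounded over 20 trading days. `random_months` is a hypothetical helper, and the generated daily series is placeholder data, not the Dow.

```python
import numpy as np

def random_months(all_days, n_months=1000, days_per_month=20, seed=0):
    """Build synthetic 'months' by drawing days with replacement and compounding."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(np.asarray(all_days, dtype=float),
                       size=(n_months, days_per_month), replace=True)
    return np.prod(1.0 + draws, axis=1) - 1.0

# Hypothetical stand-in for the Oct 1928 to Aug 2009 daily Dow returns.
all_days = np.random.default_rng(1).normal(0.0002, 0.011, size=20_000)
months = random_months(all_days)
print(f"mean: {months.mean():+.2%}  sd: {months.std(ddof=1):.2%}")
```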

Close enough, given the vagaries of the actual monthly data, the use of replacement, etc. Randomly drawing days produces a distribution of "months" very much like the actual one. Because the two sets of statistics are so similar, there is no solid basis for objecting to the use of the actual monthly data.

Then pull the actual Septembers out and compare them to the actual months:

All actual Septembers:
mean: -1.66%
sd: 6.37%
z vs all Dow months: -3.34

That z is spot on with the result from the random re-sorts of months posted earlier. So, one must conclude again that, in the time period under study, September has been unusually cruel.
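The reported z appears consistent with a standard error-of-the-mean calculation. The text does not state this, so treat it as an assumption. The sketch further assumes 80 Septembers, one per year from 1929 to 2008. `z_of_subset_mean` is a hypothetical name.

```python
from math import sqrt

def z_of_subset_mean(subset_mean, pop_mean, pop_sd, n):
    """z score of the mean of n months against the population of all months,
    using the standard error pop_sd / sqrt(n)."""
    return (subset_mean - pop_mean) / (pop_sd / sqrt(n))

# Figures from the text; n = 80 assumes one September per year, 1929 to 2008.
z = z_of_subset_mean(-1.66, 0.37, 5.44, 80)
print(round(z, 2))  # -3.34
```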

The point of the previously posted analysis with the random re-sorts is that it asks a more general question. If one treats the monthly % change series as a set that can be redistributed among the months-as-containers, how likely is it that any month will have a mean as extreme as -1.66%? I think this eliminates the multiple-comparison problem, since the extreme month doesn't have to be September.
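Below is a minimal sketch of this "any month" re-sort test in Python with NumPy. It assumes monthly changes are exchangeable, so they can be freely shuffled among calendar months. The function name and the generated data are hypothetical. The test statistic is the lowest of all twelve month means, not September's mean specifically. The resulting p-value therefore already accounts for having twelve months to choose from.

```python
import numpy as np

def any_month_permutation_test(monthly_returns, n_perm=5000, seed=0):
    """Re-sort test that does not single out September in advance.

    monthly_returns : 1-D array of consecutive monthly % changes (at least 12).
    Position i is assigned to calendar-month group i % 12; the grouping is the
    same whichever calendar month the series starts in.
    Returns (observed lowest month-mean, p-value), where the p-value is the share
    of random re-sorts in which *some* month has a mean at least that low.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(monthly_returns, dtype=float)
    labels = np.arange(len(x)) % 12
    counts = np.bincount(labels, minlength=12)

    def lowest_month_mean(values):
        sums = np.bincount(labels, weights=values, minlength=12)
        return (sums / counts).min()

    observed = lowest_month_mean(x)
    hits = sum(lowest_month_mean(rng.permutation(x)) <= observed
               for _ in range(n_perm))
    return observed, hits / n_perm

# Hypothetical stand-in: 971 i.i.d. monthly changes with the mean and sd
# quoted above (Oct 1928 to Aug 2009 spans 971 months).
fake = np.random.default_rng(7).normal(0.37, 5.44, size=971)
worst, p = any_month_permutation_test(fake)
print(f"lowest month mean = {worst:.2f}%, p-value = {p:.3f}")
```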

But another issue is: Can you treat a series like Dow monthly % changes as a set that can be re-sorted? One concern is the issue of volatility regime changes. For example: in a volatile year, September is the worst month at -4%, and December the best at +4%; then in a calmer year, September is the best month at +2%, and December the worst at -2%; now you have September's mean return as -1% and December's as +1%. But is September really "worse"? Or does it just appear so because of the problems inherent in mixing volatility regimes?

One way I've tried to address this issue is to normalize each month as a z score relative to the mean and sd of the previous 12 months. In the September/December example, September's two values might then become -2.5 and +2.5, and December's the same, making the two months equivalent.
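Here is a sketch of the trailing-window normalization in Python with NumPy. Several details are assumptions: the sample sd (ddof=1) rather than the population sd, and the exclusion of the current month from its own window. `trailing_z` is a hypothetical name.

```python
import numpy as np

def trailing_z(monthly_returns, window=12):
    """Express each month as a z score against the mean and sd of the
    `window` months before it; the first `window` months get NaN."""
    x = np.asarray(monthly_returns, dtype=float)
    z = np.full(len(x), np.nan)
    for i in range(window, len(x)):
        past = x[i - window:i]          # the previous `window` months only
        z[i] = (x[i] - past.mean()) / past.std(ddof=1)
    return z
```

Each value is standardized by its own recent history. As a result, scaling the whole series by a constant, as in a uniformly calmer or wilder regime, leaves the z scores unchanged.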

Normalizing the Dow months (again, Oct 1928 through Aug 2009) in this way and then analyzing September again, one gets:

All Dow months normalized:
mean: -0.05
sd: 1.19

All Septembers normalized:
mean: -0.37
sd: 1.23
z vs all months: -2.42

So this adjustment pulls the z score toward zero (as it has in every case where I've used it), but here September's z still leaves it in the "unusually cruel" category.

Mr. K wrote: "Shorting September every year for 80 years could be fine, but on any given year, it is a crapshoot."

Alas, yes — a crapshoot with a bias. But the analysis is fun.

