Inside Day Contest

Daily Speculations

The Web Site of Victor Niederhoffer & Laurel Kenner

Dedicated to the scientific method, free markets, deflating ballyhoo, creating value, and laughter; a forum for us to use our meager abilities to make the world of specinvestments a better place.


7/13/04
Inside Day Contest

A recent study by Mr. Downing found that out of 2,111 days from 1996, only 227 or 11% were inside days for the S&P. An inside day is defined as a day where the current high is lower than the previous high, and the current low is higher than the previous low. The number of inside days strikes me as low relative to randomness. It is interesting to speculate about the best way to test this naive supposition, as there are many things in the philosophy of where the close is relative to the high and low of a day, and how this relates to the next day, that might not be dreamed of. (An interesting aside is that the number of outside days during this period was 230.) We will offer a prize of $500 to you or your favorite charity for the best answer to this question. Your answer may be via closed form, simulation, reference to tables of extreme values, or what have you. Note that this is a study for pure increase in knowledge as opposed to one of practical value. A meal for a lifetime rather than a day.
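The definition above translates directly into a counting routine. The following is a minimal sketch in Python; the function name and the toy price lists are made up for illustration. The outside-day test is assumed to be the mirror image of the inside-day test (higher high and lower low), since the post does not spell it out.

```python
from typing import Sequence, Tuple


def count_inside_outside(highs: Sequence[float], lows: Sequence[float]) -> Tuple[int, int]:
    """Count inside and outside days in parallel series of daily highs and lows."""
    inside = outside = 0
    for t in range(1, len(highs)):
        if highs[t] < highs[t - 1] and lows[t] > lows[t - 1]:
            inside += 1   # today's range sits strictly within yesterday's
        elif highs[t] > highs[t - 1] and lows[t] < lows[t - 1]:
            outside += 1  # assumed definition: today's range engulfs yesterday's
    return inside, outside


# Toy data, not market data.
toy_highs = [10.0, 9.5, 11.0, 10.5, 10.4]
toy_lows = [8.0, 8.5, 7.5, 8.0, 8.1]
print(count_inside_outside(toy_highs, toy_lows))
```

The first day can only serve as a reference day, so a series of N days yields N-1 candidates.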
-- Victor Niederhoffer

7/15/04
Victor Niederhoffer: The Four Winners

I was so impressed by the responses to our question on Inside Days that I decided to post the four best. They are by Munawar Cheema, Alston Mabry, Blake McShane, and Charles Pennington.

7/15/04
Statistical Analysis of the Occurrence of Inside Days in the S&P 500 by Munawar Cheema

Objective: To observe how frequently Inside Days occur in the S&P 500 continuous futures contract and determine if the results are consistent with a random behavior of certain independent variables of the stock market. This line of inquiry was inspired by www.dailyspeculations.com.
Sample Data: The data used was continuous contract data from January 3rd, 1989 to September 30th 2003.
Observations on the Data: Number of Inside Days is 12.083% of all days or 449 of 3716.
Methodology for Test:
1. Monte Carlo Simulation: I first tried to develop a closed form solution. Assuming no drift and independent behavior of the highs and lows, a rough guess at the probability of an inside day seemed to be an obvious 25%. The fact that the data has a drift required that this be adjusted down, as the probability of a shift out of each day's range is increased by this drift. On closer examination, though, the high correlation between the change in the daily low and high and the change in close made this type of analysis harder than I anticipated. This correlation of the changes in highs, lows, and closes was observed to be about 63 1/2% in the sample time series;
2. Model for simulation:
  1. Using the starting high, low and closing prices from the sample data set I generated a time series for the identical period of each of these by modeling three random variables that I believe should be independent in a random world;
  2. I ran this time series 106 times to develop statistics on occurrences of Inside Days to compare the actual number observed in the sample time series;
3. Choice of independent Random Variables: To avoid problems with correlations I chose different variables than the change in the target variables which were the daily highs and lows. I tried to find variables that would be independent of the change in closing prices, but would allow me to infer the time series of lows and highs. I chose the following:
  1. Daily change in close as a % of previous close: I used a normal distribution with the sample mean and variance to model this random variable. A histogram of the sample data compared to a histogram of the generated test variable shows that the actual close change series has the expected longer tail and a slightly thinner and taller hump, but is close enough to a normal distribution for the model's purposes.
  2. Daily range as a % of previous close: I used a lognormal distribution to model this, using the natural logs of the daily range data in the sample to determine the mean and variance. A visual inspection of the histogram of ranges seemed to confirm the distribution choice. I see no reason why the range in a random world should have any correlation with the day's change, and I felt comfortable modeling this as an independent random variable. In the sample data there was a negative correlation of 7% with the change in the close.
  3. Position of close within range as a % of range: This was interesting, as I see no reason why this should not have a mean of 50%. The sample had a mean of 55%, and a look at the histogram showed that a close below 3% of the range was very rare and closes above 90% of the range were far more frequent than a uniform distribution would predict. Another interesting feature was the extremely high 70% correlation with the change in the close as a % of the previous close. This seemed to invalidate my assumption of independence, but in a random world I see no reason why the position of the close should be anything but uniformly distributed. See further study below for more on this.
4. Summary of Results:
  1. The percentage of all days that were inside days in the sample data was 12.083%;
  2. The mean percentage of such days in each test run of the time series was 11.176% with a variance of .533%;
5. Conclusions:
  1. This means the 11% observed by Mr. Downing is consistent with my random model; in my sample the post 1995 data is 216 occurrences out of 1948 for a total of 11.08% as I only had data through the third quarter of 2003 but the number of Inside Days observed ties out.
  2. The number of inside days in my data seems to be higher by 1.7 standard deviations than would be predicted by my random model. This, combined with the first conclusion, hints at a change in cycle: prior to 1996 Inside Days occurred more frequently than could be expected, but this seems to have dissipated. (See further study below)
  3. Further study:
    1. Model the position of the close within the range to match its histogram and see the effect;
    2. I want to look more closely into the correlation of the position of the close within the range to understand it or find an error in the entire setup or in the way I am computing correlations.
    3. Study the effect on the number of inside days of removing the upward drift in the simulation. I removed the drift by setting the mean of the changes in closes to zero, but this didn't affect the outcome in any meaningful way: the mean of the 106 time series was 11.176% with a variance of .42%. Needs further review.
    4. The sample data before 1996 shows 13.18% of the days or 233 out of 1768 days were inside days which hints at a change of cycle around 1996 but needs more thought and work.
    5. In my model, unsurprisingly, position of close as % of range and range as % of previous close were uncorrelated with the % change in closes, as had been built into the assumptions. The correlation of the highs and lows was about 80%, significantly higher than my observed 63 1/2%, and this merits further investigation.
Disclaimer: All the above conclusions and observations are dependent on a thorough checking of my work and methodology.
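For readers who want to experiment, here is a rough Python sketch of a simulation built from the three independent variables described above: a normal daily close change, a lognormal daily range, and a uniform position of the close within the range. It is only one possible reading of the write-up, which does not say exactly how highs and lows were rebuilt from those draws. Every parameter value below is a placeholder chosen for illustration, not a figure fitted to the S&P sample.

```python
import numpy as np


def simulate_inside_fraction(n_days, ret_mean, ret_sd, log_range_mean, log_range_sd, rng):
    """Simulate one price history and return the fraction of inside days."""
    # Three independent draws per day (independence is the modeling assumption).
    ret = rng.normal(ret_mean, ret_sd, n_days)                       # close change / previous close
    range_frac = rng.lognormal(log_range_mean, log_range_sd, n_days)  # range / previous close
    close_pos = rng.uniform(0.0, 1.0, n_days)                        # close position within range

    close = np.cumprod(1.0 + ret)                    # starting close normalized to 1
    prev_close = np.concatenate(([1.0], close[:-1]))
    width = prev_close * range_frac                  # day's high minus low
    low = close - close_pos * width                  # place the range around the close
    high = low + width

    inside = (high[1:] < high[:-1]) & (low[1:] > low[:-1])
    return float(inside.mean())


rng = np.random.default_rng(2004)
# Placeholder parameters: 0.03% mean daily change, 1% daily sd, median range 1.3%.
fractions = [
    simulate_inside_fraction(3716, 0.0003, 0.01, np.log(0.013), 0.5, rng)
    for _ in range(20)
]
print(f"mean inside-day fraction: {np.mean(fractions):.4f}, sd: {np.std(fractions, ddof=1):.4f}")
```

Setting `ret_mean` to zero gives the no-drift variant mentioned under further study.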

7/15/04
Alston Mabry on Inside Days

Here's one take on your Inside Day question:

First, the data I used is the S&P 500, taken from Yahoo, for the dates 2 January, 1996 through 9 July, 2004. In that set I count 2143 total days, of which 251 are "inside" days, where High T1 is lower than High T-zero, and Low T1 is higher than Low T-zero. (Note that because I did not include the last trading day of 1995, 2 January, 1996, is counted only as a T-zero and not as a possible T1.)

I took the approach that an inside day is like putting a basketball through a hoop. The basketball must be smaller than the hoop, and there is some probability with each shot that a small-enough ball will go through a large-enough hoop. (The mean High-Low gap for the total distribution is 1.53%, whereas the mean High-Low gap for the subset of 251 inside days is only 1.05%.)

I calculated for each day Tx its High-Low gap, measured as a % of its Open. Then I calculated for each day Tx how many of the other days in the total distribution had High-Low gaps larger than High-Low Tx, i.e., how many of the other days were large enough "hoops". I then turned that into a percent probability that Tx might follow a day with a large-enough High-Low gap.

For example, 9 July, 2004, had a High-Low gap of 0.57%. Of the total population of 2143 days, 2046 have larger High-Low gaps. So, were I to randomize these days, 9 July, 2004, has a .9543 probability of following a day with a larger gap.

Having calculated the through-the-hoop probability for each day, I summed them for all 2143 days, to get a total of 1070.4958. I then divided this sum by the total number of inside days (251) to get an "inside day constant" of 4.26492351.

Now, for any group of days with a certain range of High-Low gaps, I should be able to predict how many inside days I would get by summing their probabilities and dividing by this inside day constant.

For example, out of the total distribution of days (2143), there are 59 days with High-Low gaps between 0% and .50% inclusive. The sum of the inside day probabilities for those 59 days is 58.14226. This sum, divided by the inside day constant, produces 13.63266. So, we would expect 13-14 inside days from this group. Looking at the actual group of 251 inside days, we find that there are 20 days with High-Low gaps between 0% and .50% inclusive.
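The calculation described above can be sketched in a few lines of Python. The function names are invented for this illustration, and the denominator for the probability (all days, rather than all other days) is an assumption, since the text does not pin it down. The final lines simply redo the arithmetic with the figures quoted in the text.

```python
import numpy as np


def hoop_probabilities(gaps):
    """For each day, the fraction of days whose High-Low gap is strictly larger."""
    gaps = np.asarray(gaps, dtype=float)
    ordered = np.sort(gaps)
    # searchsorted(..., side="right") counts gaps <= each value; the rest are larger.
    n_larger = gaps.size - np.searchsorted(ordered, gaps, side="right")
    return n_larger / gaps.size  # assumption: divide by the total number of days


def inside_constant(probabilities, n_inside):
    """Sum of all through-the-hoop probabilities per observed inside day."""
    return probabilities.sum() / n_inside


def predicted_inside(probabilities, mask, constant):
    """Predicted inside days for the subset of days selected by a boolean mask."""
    return probabilities[mask].sum() / constant


# Toy gaps (in %), not market data.
toy_probs = hoop_probabilities([0.5, 1.0, 1.5, 2.0])
print(toy_probs)

# Arithmetic with the figures quoted in the text.
constant = 1070.4958 / 251
print(constant, 58.14226 / constant)
```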

Running the predictor equation for the distribution, split into segments, produces these results:

The equation underestimates on the small end of the distribution, but I think that seeing the equation work fairly accurately over different segments of the distribution demonstrates its usefulness.

To test the idea out of sample, I took the S&P 500 daily data for the period 4 January, 1988, through 29 December, 1995, and applied the "inside constant" derived from the 1996-2004 period to predict how many inside days would come from different High-Low gap groups in the 1988-1995 period. Results as follows:

As with the 1996-2004 period, the predictor is less accurate at the small end of the 1988-1995 distribution, where, evidently, the small size of the "basketballs" gives us more hits than the predictor equation predicts. But overall, I would say the predictor equation works well in the out-of-sample test. (Interesting to note the accuracy of the predictor over segments with very different numbers of days, e.g., the .45%-.52% segment versus the .95% segment.)

I would conclude that the distribution of inside days is a function of the probabilities of one High-Low gap fitting inside another, as opposed to some underlying market structure or tradable anomaly. One might do a more detailed analysis of the probabilities for specific combinations of ball size and hoop size, to try to understand how the "inside constant" operates at the micro level, or to analyze the divergence at the small end of the distribution -- but probably just for sport.

Thanks for a stimulating question. I look forward to seeing other responses on the site.

7/15/04
Blake McShane on Inside Days

I tried solving it analytically assuming a price process of geometric Brownian motion but, unfortunately, could not figure out the process for the supremum and infimum of the underlying price process. Thus, I resorted to simulation and discretized with 10,000 "steps" per day. After running 1,000 such simulations of two days and comparing the highs and lows, I observed an inside day percentage of 12.7%. I am in the process of running 10,000, but this should take roughly five hours. If anyone wants the spreadsheet with the simulation, email me and I will send it.
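A simulation of this kind can be sketched in Python rather than a spreadsheet. The version below is an illustration under stated assumptions, not the original workbook: drift is omitted, the daily volatility is a placeholder, and the walk is run in log prices (comparing log highs and lows gives the same inside-day verdict as comparing prices, because the logarithm is monotonic).

```python
import numpy as np


def gbm_inside_fraction(n_pairs, steps_per_day, daily_vol, rng):
    """Fraction of simulated two-day pairs in which day 2 is an inside day."""
    step_sd = daily_vol / np.sqrt(steps_per_day)  # per-step sd so a day has sd daily_vol
    inside = 0
    for _ in range(n_pairs):
        # Driftless log-price path over two consecutive days (assumption: no drift).
        log_price = np.cumsum(rng.normal(0.0, step_sd, size=2 * steps_per_day))
        day1, day2 = log_price[:steps_per_day], log_price[steps_per_day:]
        if day2.max() < day1.max() and day2.min() > day1.min():
            inside += 1
    return inside / n_pairs


rng = np.random.default_rng(0)
# Smaller than the 10,000 steps and 1,000 pairs in the text, to keep the run short.
fraction = gbm_inside_fraction(n_pairs=500, steps_per_day=1000, daily_vol=0.01, rng=rng)
print(f"inside-day fraction: {fraction:.3f}")
```

With zero drift the result does not depend on `daily_vol`, since rescaling both days by the same factor leaves the comparison unchanged.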

7/15/04
Victor Niederhoffer Comments

Mr. McShane's work is very erudite. Perhaps if he modifies his work to take into consideration that the moves of the open relative to the close are quite different from any other tick, and in addition that the serial correlation between consecutive ranges is of the order of 25%, he will win hands down the prize for the best solution to this pearl.

7/15/04
Blake McShane Responds

The serial correlation of the daily S&P High - S&P Low has been .157 over the last ten years, with the 90 day correlation ranging from -.33 to +.33 and averaging about zero. With this in mind, I retweaked my study so the volatility term in what before was a discretized geometric Brownian motion now follows a GARCH(1,1) process. After running several simulations, I am seeing the same inside and outside day percentages (i.e., about 9-13%). The range serial correlation is obviously dependent upon the weights chosen for the GARCH model, but the inside/outside day percentages seem roughly constant despite variations in the level of range serial correlation.

Close to open prices are determined in a manner similar to intraday prices except that 2,000 "steps" are taken from close to open. Intraday I use 10,000 "steps" with each step representing one tick whereas 2,000 "steps" represent the close-to-open tick. 2,000 was chosen arbitrarily and I am open to changing it, but it clearly must be greater than one to reflect the distinctive qualities of this tick.
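One way to sketch this revised setup in Python follows. How the GARCH(1,1) recursion was wired into the original spreadsheet is not stated, so this version makes its own assumptions: the variance is updated once per day from the previous close-to-close move, every step within a day shares that day's variance, and the `omega`, `alpha`, and `beta` weights are placeholders. The overnight steps move the price but do not count toward the day's high and low.

```python
import numpy as np


def garch_inside_outside(n_days, omega, alpha, beta, rng,
                         overnight_steps=2000, intraday_steps=10000):
    """Return (inside fraction, outside fraction) for a GARCH(1,1)-driven walk."""
    total_steps = overnight_steps + intraday_steps  # overnight_steps must be >= 1
    variance = omega / (1.0 - alpha - beta)         # start at the unconditional variance
    level = 0.0                                     # log price at the previous close
    highs, lows = np.empty(n_days), np.empty(n_days)
    for day in range(n_days):
        steps = rng.normal(0.0, np.sqrt(variance / total_steps), total_steps)
        path = level + np.cumsum(steps)
        session = path[overnight_steps - 1:]        # from the open through the close
        highs[day], lows[day] = session.max(), session.min()
        day_move = path[-1] - level                 # close-to-close move
        level = path[-1]
        variance = omega + alpha * day_move ** 2 + beta * variance  # GARCH(1,1) update
    inside = (highs[1:] < highs[:-1]) & (lows[1:] > lows[:-1])
    outside = (highs[1:] > highs[:-1]) & (lows[1:] < lows[:-1])
    return float(inside.mean()), float(outside.mean())


rng = np.random.default_rng(0)
# Placeholder weights and reduced step counts to keep the run short.
inside_frac, outside_frac = garch_inside_outside(
    500, omega=2e-6, alpha=0.08, beta=0.90, rng=rng,
    overnight_steps=200, intraday_steps=1000)
print(f"inside: {inside_frac:.3f}, outside: {outside_frac:.3f}")
```

Varying `alpha` and `beta` changes the serial correlation of the simulated ranges, which is the knob the text describes turning.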

7/15/04
Charles Pennington on Inside Days

The statistics of inside days can be modeled on the E***l program, if you have not already uninstalled it from your machine as you were instructed.

I broke the day up into 100 time steps; ultimately the first 25 steps represent the pre-market hours (close-to-open) when the market is closed, and the next 75 steps represent regular market hours. Our daily "high" and "low" will be the max and min, respectively, of these 75 time steps. (The chosen ratio 75-to-25 will be explained shortly.)

The market started at zero; then for each subsequent time step, I added a random number, evenly distributed between -1 and 1. I put in 65,000 or so time steps (the limit of E***l), and now I have a diffusing market. Each day is 100 time steps, so there are ~650 simulated trading days.

The ratio of 75 to 25 was chosen as follows: I took the mean square percentage move from open-to-close and close-to-open, using about 10 years of data for SPY (the exchange traded fund version of the S&P 500). The mean square open-to-close move was 3.1 times the mean square close-to-open move. That's how I chose the 3-to-1 ratio for the number of time steps. (The root-mean-square diffusion distance grows like the square root of the time, but the mean-square distance grows linearly with time.)
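The parenthetical remark can be written out as a short worked equation. This is a sketch assuming independent, zero-mean steps with common variance $s^2$; for steps uniform on $[-1, 1]$, $s^2 = 1/3$. The symbols $x_n$, $\xi_i$, and $s$ are introduced here for illustration.

```latex
x_n = \sum_{i=1}^{n} \xi_i , \qquad
\mathbb{E}[\xi_i] = 0 , \quad \operatorname{Var}(\xi_i) = s^2
\;\Longrightarrow\;
\mathbb{E}\!\left[x_n^2\right] = n s^2 , \qquad
\sqrt{\mathbb{E}\!\left[x_n^2\right]} = s \sqrt{n} .

\frac{\mathbb{E}\!\left[x_{75}^2\right]}{\mathbb{E}\!\left[x_{25}^2\right]}
= \frac{75\, s^2}{25\, s^2} = 3 \approx 3.1 .
```

So a 75-to-25 split of time steps reproduces, to within rounding, the measured 3.1 ratio of mean square open-to-close to close-to-open moves.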

For the time step sequences #25 to 100, 125 to 200, 225 to 300, etc., I stored the max and min prices, each representing the max and min price for one day.

Then, if today's max (H) was less than yesterday's (H1), and if today's min (L) was greater than yesterday's (L1), then today was an inside day.
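The same experiment is easy to repeat outside a spreadsheet. Below is a minimal Python sketch of the procedure described above; the function name and the seed are arbitrary. The high and low are taken over steps 25 through 100 of each day (counting from 1), so that the opening price is included, as in the step ranges listed above.

```python
import numpy as np


def random_walk_inside_days(n_days, rng, steps_per_day=100, overnight_steps=25):
    """Return (number of inside days, number of candidate days) for a uniform-step walk."""
    steps = rng.uniform(-1.0, 1.0, n_days * steps_per_day)   # each step uniform on [-1, 1]
    path = np.cumsum(steps).reshape(n_days, steps_per_day)   # one row per simulated day
    session = path[:, overnight_steps - 1:]                  # open (step 25) through close
    highs, lows = session.max(axis=1), session.min(axis=1)
    inside = (highs[1:] < highs[:-1]) & (lows[1:] > lows[:-1])
    return int(inside.sum()), n_days - 1                     # first day has no predecessor


rng = np.random.default_rng(0)
count, candidates = random_walk_inside_days(654, rng)
print(f"{count} inside days out of {candidates} ({count / candidates:.1%})")
```

Without the spreadsheet's row limit, `n_days` can be raised to shrink the statistical error.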

The results?

654 simulated trading days.
73, or 11%, were inside days.

The statistical error in the # of days is probably the square root of 73, or about 8, so for an error bar we could use about 1%.
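The error estimate can be checked by hand. The first line below is the square-root-of-count rule used above; the second line, added here for comparison, treats each day as an independent trial with success probability $\hat{p} = 73/654$, which is an idealization because consecutive days share a range.

```latex
\sqrt{73} \approx 8.5 , \qquad \frac{8.5}{654} \approx 1.3\% .

\hat{p} = \frac{73}{654} \approx 0.112 , \qquad
\sqrt{654\, \hat{p}\,(1 - \hat{p})} \approx 8.1 , \qquad
\frac{8.1}{654} \approx 1.2\% .
```

Both estimates round to an error bar of about 1%.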

So with just random walk, one expects that 11% plus/minus 1% of all trading days should be inside days. The number that Tom observed was 11%. So all is well here in the best of all possible worlds.