# Simulation, from Phil McDonnell

January 13, 2008 | When working with spreadsheets and random numbers, there are some techniques which can be helpful. For example, suppose we used the real data of daily market changes but wished to randomly reorder them to see whether the original order had different properties than a random order. We could perform 999 reorderings and compute some statistic of interest for each reordering. Sorting the 999 statistics from smallest to largest would then give us a very nice table in which we could look up the p-value of our statistic of interest.

For example, suppose we wanted to test whether the recent trading ranges were wider than they should have been at random. We would take 999 reorderings and calculate the range for each. We then take the range from the original empirical distribution and see where it falls in the sorted table of random ranges. If it is in the bottom 50, it is too small at the 5% level (one-tailed test). If it is in the top 50, it is too large for a 5% one-tailed test. For a two-tailed test we look for the bottom and top 25.
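The reordering test above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and "range" is taken to mean the high minus the low of the cumulative path of daily changes, since a statistic has to depend on the order of the days for a reordering test to say anything.

```python
import numpy as np

def path_range(changes):
    """High minus low of the cumulative path built from daily changes."""
    path = np.concatenate(([0.0], np.cumsum(changes)))
    return float(path.max() - path.min())

def reorder_table(changes, stat=path_range, n_reorderings=999, seed=0):
    """Sorted table of the stat computed on random reorderings of the data."""
    rng = np.random.default_rng(seed)
    changes = np.asarray(changes, dtype=float)
    stats = [stat(rng.permutation(changes)) for _ in range(n_reorderings)]
    return np.sort(stats)

def position_in_table(table, observed):
    """Number of table entries strictly below the observed stat."""
    return int(np.searchsorted(table, observed, side="left"))

# Assumption: synthetic daily changes standing in for real market data.
daily_changes = np.random.default_rng(42).normal(0.0, 1.0, size=250)

table = reorder_table(daily_changes)
observed = path_range(daily_changes)
pos = position_in_table(table, observed)

too_small = pos < 50                 # roughly the bottom 50 of 999
too_large = pos >= len(table) - 50   # roughly the top 50 of 999
```

Swapping in a different order-dependent statistic only requires passing another function as `stat`.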

On an Excel spreadsheet, if we use the RANDBETWEEN() function as an INDEX into the original table of numbers, it will allow us to randomly sample from the original distribution to create our new random distribution. The VLOOKUP() function is often useful here as well. In R one can use the boot() function, from package boot, to do all the heavy lifting of randomization. That is easier than trying to do it in a spreadsheet.
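For readers working outside a spreadsheet, the random-index idea might look like this in Python. It is a minimal sketch, not a translation of any particular workbook: the function name and the sample values are made up for illustration. Note that drawing independent random indices samples with replacement, so some days can appear more than once and others not at all, which differs from a pure reordering.

```python
import numpy as np

def resample_with_replacement(values, rng):
    """Build a new series by looking up random positions in the original."""
    values = np.asarray(values)
    # rng.integers(0, n) plays the role of RANDBETWEEN(1, n), but zero-based.
    idx = rng.integers(0, len(values), size=len(values))
    return values[idx]

# Assumption: a tiny made-up table of daily changes.
rng = np.random.default_rng(7)
original = np.array([0.5, -1.2, 0.3, 2.1, -0.7])
sample = resample_with_replacement(original, rng)
```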

In my book Optimal Portfolio Modeling I recommend a slight variation on this technique. The usual technique is to randomize each day. The thinking behind this is that the low autocorrelation usually present in markets implies that there is little or no linear relationship between successive days in a time series. But this says nothing about any non-linear relationships, such as patterns, cycles, mean reversion, conditional heteroskedasticity and many others. To capture these (if they exist) we can take random blocks of time, such as 20 days. We would randomize the start time of each block. So if our random number picked day 31 as the start, we would use the 20 days from day 31 to day 50 as the block. The idea is to splice together these random blocks to see if they behave differently from the original. Making the block length long relative to the granularity of the data (say 20:1) helps to preserve most of any putative non-linear behavior.
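A rough Python sketch of the block idea follows. The function name, the default block length of 20, and the choice to trim the spliced series back to the original length are all assumptions made for illustration, not details taken from the book.

```python
import numpy as np

def block_resample(values, block_len=20, rng=None):
    """Splice together blocks of consecutive days with random start points."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    n = len(values)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(values)")
    n_blocks = -(-n // block_len)  # ceiling division: enough blocks to cover n days
    # Each start leaves room for a full block, e.g. start 31 gives days 31..50.
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [values[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]  # trim to the original length

# Assumption: day numbers used as data so the preserved order is easy to see.
days = np.arange(1, 101)
spliced = block_resample(days, block_len=20, rng=np.random.default_rng(1))
```

Within each block the days stay in their original order, so any short-range structure inside a 20-day window survives; only the joins between blocks are randomized.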

It is also instructive to compare the randomized daily design to the randomized 20-day block design. If there is a significant difference, then something non-linear may be going on.
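One rough way to set the two designs side by side is to build a sorted table of the same statistic under each and compare them. The sketch below is illustrative only: the series is synthetic, with deliberate 20-day runs chosen so that the two designs visibly differ, and the statistic (range of the cumulative path) and all function names are assumptions.

```python
import numpy as np

def path_range(changes):
    """High minus low of the cumulative path built from daily changes."""
    path = np.concatenate(([0.0], np.cumsum(changes)))
    return float(path.max() - path.min())

def daily_design(values, rng):
    """Randomize each day: a plain reordering of the series."""
    return rng.permutation(values)

def block_design(values, rng, block_len=20):
    """Splice random blocks of consecutive days, trimmed to the original length."""
    n = len(values)
    n_blocks = -(-n // block_len)
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([values[s:s + block_len] for s in starts])[:n]

def stat_table(values, design, stat, n_sims=999, seed=0):
    """Sorted table of the stat over n_sims randomized series."""
    rng = np.random.default_rng(seed)
    return np.sort([stat(design(values, rng)) for _ in range(n_sims)])

# Assumption: 400 synthetic days made of alternating 20-day up and down runs.
values = np.tile(np.repeat([1.0, -1.0], 20), 10)

daily_table = stat_table(values, daily_design, path_range)
block_table = stat_table(values, block_design, path_range)

daily_median = float(np.median(daily_table))
block_median = float(np.median(block_table))
```

With real data the comparison would be between the two tables as a whole (for example, where the original statistic falls in each), not just their medians.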