Daily Speculations

 

Bifurcations

Jan. 29, 2004

 

When I was a boy, there was a program called Automatic Interaction Detector that would find the optimal binary splits on x independent variables, nested to predict a dependent variable. It examined all possible splits and produced a tree similar to the one below, where u is the mean change:

    S&P, all days: u = 1
        When bonds < 50
            When gold up: u = -1
            When gold down: u = 0
                Dollar down: u = 1
                Dollar up: u = -1
        When bonds > 50
            When gold up: u = 2
            When gold down: u = 1.5

etc., with continual branches.
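The split search described above can be sketched as a greedy recursive procedure. This is a minimal illustration, not the original Automatic Interaction Detector: it assumes every independent is a binary indicator (bonds above 50, gold up, dollar up), it scores splits by reduction in squared error (the usual criterion for regression trees), and the names `best_split`, `grow`, `max_depth`, `min_leaf`, and the toy data are all hypothetical.

```python
from itertools import product
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the group mean (0.0 for an empty group)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows, ys, features, min_leaf):
    """Try every binary feature; return (gain, feature) with the largest reduction
    in squared error, or None if no split leaves min_leaf days on each side."""
    parent = sse(ys)
    best = None
    for f in features:
        no = [y for r, y in zip(rows, ys) if not r[f]]
        yes = [y for r, y in zip(rows, ys) if r[f]]
        if len(no) < min_leaf or len(yes) < min_leaf:
            continue
        gain = parent - sse(no) - sse(yes)
        if best is None or gain > best[0]:
            best = (gain, f)
    return best

def grow(rows, ys, features, depth=0, max_depth=3, min_leaf=2):
    """Nest splits recursively; every node records its mean change u and size n."""
    node = {"u": mean(ys), "n": len(ys)}
    split = best_split(rows, ys, features, min_leaf) if depth < max_depth else None
    if split is None or split[0] <= 0:
        return node  # leaf: no allowed split reduces the error
    f = split[1]
    node["feature"] = f
    node["branches"] = {}
    rest = [g for g in features if g != f]
    for flag in (False, True):
        idx = [i for i, r in enumerate(rows) if bool(r[f]) == flag]
        node["branches"][flag] = grow([rows[i] for i in idx], [ys[i] for i in idx],
                                      rest, depth + 1, max_depth, min_leaf)
    return node

# Made-up toy data: 8 days covering every up/down combination,
# with the change driven mostly by gold and slightly by bonds.
names = ("bonds_above_50", "gold_up", "dollar_up")
rows = [dict(zip(names, combo)) for combo in product((False, True), repeat=3)]
ys = [(2.0 if r["gold_up"] else -1.0) + (0.1 if r["bonds_above_50"] else 0.0) for r in rows]
tree = grow(rows, ys, list(names))
```

Here the top split lands on `gold_up`, the one indicator that actually drives the toy data. On real market data the same search will just as readily pick whichever indicator happens to fit the sample best, and then keep nesting.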

It seemed so good on paper but was useless. I believe its successor is called CART. It doesn't work because it implicitly examines thousands of hypotheses to retrofit a mainly random phenomenon. Also, even when the results are neither random nor explained by multiple hypotheses, the cycles are ready to change.
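One way to see the multiple-hypotheses problem is to search many splits on data that are random by construction. The sketch below is a hypothetical simulation, not anything from the original program: the "returns" are Gaussian noise, each candidate split is a coin-flip indicator, and the split that looks best in-sample is then re-measured on a fresh period. The names `split_gap` and `best_noise_split` and all parameter values are assumptions for illustration.

```python
import random
from statistics import mean

def split_gap(rets, flags):
    """Absolute difference in mean return between 'up' and 'down' days of one indicator."""
    up = [r for r, f in zip(rets, flags) if f]
    down = [r for r, f in zip(rets, flags) if not f]
    return abs(mean(up) - mean(down))

def best_noise_split(n_days=250, n_features=100, seed=0):
    """Choose the best of many coin-flip indicators on pure-noise returns, then
    re-measure that same indicator on a fresh period. Returns (in_gap, out_gap)."""
    rng = random.Random(seed)

    def period():
        # Returns and indicators are independent by construction: no real effect exists.
        rets = [rng.gauss(0.0, 1.0) for _ in range(n_days)]
        flags = [[rng.random() < 0.5 for _ in range(n_days)] for _ in range(n_features)]
        return rets, flags

    rets_in, flags_in = period()
    gaps = [split_gap(rets_in, f) for f in flags_in]
    best = max(range(n_features), key=gaps.__getitem__)  # the "discovered" split
    rets_out, flags_out = period()
    return gaps[best], split_gap(rets_out, flags_out[best])

for seed in range(5):
    print(seed, best_noise_split(seed=seed))
```

The in-sample gap of the winning split typically comes out much larger than the same split's gap on new data, even though no split has any predictive value; the apparent edge is simply the maximum of many noisy comparisons.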

Such defects apply to almost all the work of quant shops on Wall Street in particular, and of technical analysts in general. When they test things, they use many splits, many hypotheses, and many exceptions based on the lame duck or the Iraq War that retrofit their data. Their predictive power and their departures from randomness are even smaller than in my experience with the Automatic Interaction Detector. Beware of splits.--Vic


From Bill Egan, psychometrician:

CART (classification and regression trees) is also called RP (recursive partitioning). Vic's experience mirrors my own: some people in computational chemistry like these models, but I have found them to be useless. The modern incarnations of these algorithms do attempt to assess statistical significance, but when I tested them for predicting physicochemical properties of molecules, not only did they fail, they gave physically impossible splits. I have found one or two splits to work very well for certain problems, but the reason was always physically meaningful and obvious in that particular data the instant I made some bi- and tri-plots. The more splits, the more you overfit with random, nonsensical multiple hypotheses.
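The arithmetic behind "the more splits, the more you overfit" is the familywise error rate. Assuming, as a simplification, m independent tests of hypotheses that are all truly null, each at level α, the chance that at least one split looks "significant" is 1 - (1 - α)^m. The Bonferroni correction, one common and conservative guard that a significance-aware tree might use, tests each split at α/m instead. Real tree splits are correlated, so these numbers are only a rough guide, and the function names are hypothetical.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Chance that at least one of m independent null tests passes at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / m

print(familywise_error(0.05, 1))      # one honest hypothesis: 0.05
print(familywise_error(0.05, 1000))   # a thousand candidate splits: essentially certain
print(bonferroni_threshold(0.05, 1000))
```

With a thousand candidate splits, a spurious "discovery" at the 5% level is all but guaranteed unless each split is held to a far stricter standard.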

From George R. Zachar, trader:

I too walked this path and met failure, though a naive "gut check" meant that I lost only time, not money: after coming up with historically winning "paper" models, I simply ran them in real time without putting any capital at risk. It quickly became evident that the entire approach was, or would have been, a very expensive failure, generating P&Ls no different from random.
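Running a model in real time on paper, as described above, can be paired with a simple check of whether its P&L differs from random. One option, sketched here under assumptions not in the original (daily positions of +1 long, -1 short, or 0 flat, and a shuffle test as the "random" benchmark), is to compare the model's P&L with the P&Ls of the same positions placed in random order. The names `pnl` and `shuffle_p_value` are hypothetical.

```python
import random

def pnl(positions, returns):
    """Total P&L from holding each day's position against that day's return."""
    return sum(p * r for p, r in zip(positions, returns))

def shuffle_p_value(positions, returns, n_shuffles=2000, seed=0):
    """Fraction of randomly reordered position sequences (same mix of longs,
    shorts, and flat days) whose P&L is at least the model's actual P&L."""
    rng = random.Random(seed)
    actual = pnl(positions, returns)
    shuffled = list(positions)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if pnl(shuffled, returns) >= actual:
            hits += 1
    # The +1 terms count the actual sequence itself, so the p-value is never 0.
    return (hits + 1) / (n_shuffles + 1)

rng = random.Random(1)
rets = [rng.gauss(0.0, 1.0) for _ in range(250)]
hindsight = [1 if r > 0 else -1 for r in rets]  # perfect foresight, for contrast
print(shuffle_p_value(hindsight, rets))         # far below 0.05: clearly not random
print(shuffle_p_value([1] * 250, rets))         # 1.0: always-long has nothing to shuffle
```

A model whose forward p-value sits near the middle of the range is, in the sense of the letter above, generating P&Ls no different from random. Shuffling also breaks any clustering in the positions, so this is a rough benchmark rather than a complete test.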

Given my utter lack of training in the field, and how quickly I learned that the entire approach was Baal-worshipping, I was able to restructure my understanding of the role that mechanical-systems players and marketers fill in the marketplace.