Mar 5

Curve Fitting

Newton Linchen asks:

How can we avoid curve fitting when designing a trading strategy? Are there any solid parameters one can use as a guide? It seems very easy to adjust the trading signals to the data. This leads to a perfectly backtested system - and tomorrow's crash. Where is the line that separates legitimate optimization of a trading strategy from curve fitting? The worry is arriving at a model that explains everything and predicts nothing. (And a further question: what is the NATURE of the predictive value of a system? What - philosophically speaking - confers on a model its ability to predict future market behavior?)

James Sogi writes:

KISS. Keep parameters simple and robust.

Newton Linchen replies:

You have to agree that it's easier said than done. There is always the desire to "improve" results, to avoid drawdown, to boost profitability…

Is there a "wise speculator's" to-do list on, for example, how many parameters a system requires/accepts (can handle)?

Nigel Davies offers:

Here's an offbeat view:

Curve fitting isn't the only problem; there's also the issue of whether one takes into account contrary evidence. And there will usually be some kind of contrary evidence, unless and until a feeding frenzy occurs (i.e. a segment of market participants start to lose their heads).

So for me the whole thing boils down to inner mental balance and harmony - when someone is under stress or has certain personality issues, they're going to find a way to fit some curves somehow. On the other hand, those who are relaxed (even when the external situation is very difficult) and have stable characters will tend towards objectivity even in the most trying circumstances.

I think this way of seeing things provides a couple of important insights: a) True non randomness will tend to occur when most market participants are highly emotional. b) A good way to avoid curve fitting is to work on someone's ability to withstand stress - if they want to improve they should try green vegetables, good water and maybe some form of yoga, meditation or martial art (tai chi and yiquan are certainly good).

Newton Linchen replies:

The word that I found most important in your e-mail was "objectivity".

I kind of agree with the rest, but I'm referring mostly to the curve fitting that happens while developing trading ideas, not while trading them. That's why a scale to measure curve fitting (if such a thing were possible at all) is in order: at what point does curve fitting enter the data-modeling process?

And what would be the chess player's point of view on this issue?

Nigel Davies replies:

Well, what we chess players do is essentially try to destroy our own ideas, because if we don't then our opponents will. In the midst of this process 'hope' is the enemy, and unless you're on top of your game it can appear in all sorts of situations, despite our best intentions.

Markets don't function in the same way as chess opponents; they act more as a mirror for our own flaws (mainly hope) than as a malevolent force that's there to do you in. So the requirement to falsify doesn't seem quite so urgent, especially when one is winning the game with a particular 'system'.

Out of sample testing can help simulate the process of falsification but not with the same level of paranoia, and also what's built into it is an assumption that the effect is stable.

This brings me to the other difference between chess and markets: the former offers a stable platform on which to experiment and test one's ideas; the latter has only moments of stability. How long will they last? Who knows. But I suspect that subliminal knowledge about the out-of-sample data may play a part in system construction, not to mention the fact that other people may be doing the same kind of thing and thus competing for the entrées.

An interesting experiment might be to see how the real time application of a system compares to the out of sample test. I hypothesize that it will be worse, much worse.

Kim Zussman adds:

Markets demonstrate repeating patterns over irregularly spaced intervals. It's one thing to find those patterns in the current regime, but how to determine when your precious pattern has failed vs. simply statistical noise?

The answers given here before include money-management and control analysis.

But if you manage your money so carefully as to not go bust when the patterns do, on the whole can you make money (beyond, say, B/H, net of vig, opportunity cost, day job)?

If control analysis and similar quantitative methods work, why aren't engineers rich? (OK some are, but more lawyers are and they don't understand this stuff)

The point will be made that systematic approaches fail, because all patterns get uncovered and you need to be alert to this, and adapt faster and bolder than other agents competing for mating rights. Which should result in certain runners at the top of the distribution (of smarts, guts, determination, etc) far out-distancing the pack.

And it seems there are such, in the infinitesimally small proportion predicted by the curve.

That is curve fitting.

Legacy Daily observes:

"I hypothesize that it will be worse, much worse." If it were so easy, I doubt this discussion would be taking place.

I think human judgment (plus the emotional balance Nigel mentions) is what makes multiple-regression statistical analysis work. I am skeptical that the past price history of a security can predict its future price action, but less skeptical that past relationships between multiple correlated markets (variables) can hold true in the future. The number of independent variables used to explain the dependent variable, which variables to choose, how to lag them, and the interpretation of the result (why the numbers say what they say, and the historical version of the same), among other decisions, rest on so many human judgments that I doubt any system can perpetually and accurately predict anything. Even if it could, the force (impact) of the system itself would skew the results, rendering the original analysis, premises, and decisions invalid. I have heard of "learning" systems, but I haven't had an opportunity to experiment with a model that is able to choose independent variables as the cycles change.

The system has two advantages over us humans: it takes emotion out of the picture, and it can perform many computations quickly. If one gives it any more credit than that, one learns some painful lessons sooner or later. The solution many people implement is "money management" techniques to cut losses short and let the winners take care of themselves (which again are based on judgment). I am sure there are studies out there that try to determine the impact of quantitative models on the markets. Perhaps fading those models by a contra model may yield more positive (dare I say predictable) results…

One last comment: check out how a system generates random numbers (if you haven't already looked into this). While the numbers appear random to us, they are anything but random, unless the generator is based on an external random phenomenon.
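The point about generated "randomness" is easy to demonstrate: a seeded pseudorandom generator is fully deterministic, so two generators given the same seed produce identical sequences. A minimal sketch using Python's standard library:

```python
import random

# Two generators seeded identically produce identical "random" sequences:
# the output is a deterministic function of the seed, not true randomness.
gen_a = random.Random(42)
gen_b = random.Random(42)

seq_a = [gen_a.random() for _ in range(5)]
seq_b = [gen_b.random() for _ in range(5)]

print(seq_a == seq_b)  # True: the sequences are identical
```

Truly unpredictable numbers require an external entropy source (e.g. the operating system's, via `random.SystemRandom`), exactly as noted above.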

Bill Rafter adds:

Research to identify a universal truth to be used going either forward or backward (out of sample or in-sample) is not curvefitting. An example of that might be the implications of higher levels of implied volatility to future asset price levels.

Research of past data to identify a specific value to be used going forward (out of sample) is not curvefitting, but used backward (in-sample) is curvefitting. If you think of the latter as look-ahead bias it becomes a little more clear. Optimization would clearly count as curvefitting.

Sometimes (usually because of insufficient history) you have no ability to divide your data into two tranches – one for identifying values and the second for testing. In such a case you had best limit your research to identifying universal truths rather than specific values.

Scott Brooks comments:

If the past is not a good measure of today and we only use the present data, then isn't that really just short term trend following? As has been said on this list many times, trend following works great until it doesn't. Therefore, using today's data doesn't really work either.

Phil McDonnell comments:

Curve fitting is one of those things market researchers try NOT to do. But as Mr. Linchen suggests, it is difficult to know when we are approaching the slippery slope of curve fitting. What is curve fitting and what is wrong with it?

A simple example of curve fitting may help. Suppose we had two variables that could not possibly have any predictive value. Call them x1 and x2. They are random numbers. Then let's use them to 'predict' two days' worth of market changes m. We have the following table:

  m   x1   x2
 +4    2    1
+20    8    6

Can our random numbers predict the market with a model like this? In fact they can. We know this because we can set up 2 simultaneous equations in two unknowns and solve them. The basic equation is:

m = a * x1 + b * x2

The solution is a = 1 and b = 2. You can check this by back-substituting: multiply x1 by 1, add two times x2, and each row gives the correct answer for m. The reason is that it is almost always possible (*) to solve two equations in two unknowns.
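The two-point example above can be reproduced directly. A short NumPy sketch, using the numbers from the table:

```python
import numpy as np

# The two days of "predictors" x1, x2 and the market moves m from the table.
X = np.array([[2.0, 1.0],
              [8.0, 6.0]])
m = np.array([4.0, 20.0])

# Two equations, two unknowns: an exact solution almost always exists,
# so the meaningless random predictors "fit" the market perfectly.
a, b = np.linalg.solve(X, m)
print(a, b)                   # 1.0 2.0
print(X @ np.array([a, b]))   # [ 4. 20.] -- a perfect, worthless fit
```

The zero-residual fit here is guaranteed by the algebra, not by any predictive power in x1 and x2.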

So this gives us one rule to consider when we are fitting. The rule is: Never fit n data points with n parameters.

The reason is that you will generally get a 'too good to be true' fit, as Larry Williams suggests. This rule generalizes: best practice is to use much more data than the number of parameters you are trying to fit. There is a statistical concept called degrees of freedom involved here.

Degrees of freedom is how much wiggle room there is in your model. Each variable you add is a chance for your model to wiggle to better fit the data. The rule of thumb is that you take the number of data points you have and subtract the number of variables. Another way to say this is the number of data points should be MUCH more than the number of fitted parameters.
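The degrees-of-freedom point can be illustrated by extending the example above: give the same kind of random predictors many more observations than parameters, and the perfect fit disappears. A sketch (all data here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_params = 100, 2

X = rng.normal(size=(n_obs, n_params))   # random "predictors"
m = rng.normal(size=n_obs)               # random "market moves"

# Least-squares fit: 100 data points, 2 parameters -> 98 degrees of freedom.
coef, residual_ss, rank, _ = np.linalg.lstsq(X, m, rcond=None)

# Unlike the 2x2 case, the fit is far from perfect: the residual
# sum of squares stays close to the total variation in m.
print(residual_ss[0], np.sum(m**2))
```

With ample wiggle room removed, random variables can no longer masquerade as predictive ones.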

It is also good to mention that the number of parameters can be tricky to count. Looking at intraday patterns, a parameter could be something like "today's high was lower than yesterday's high" - even though it is a true/false criterion, it is still an independent variable. The choice of the length of a moving average is a parameter. Whether one is above or below it is another parameter. Some people use thresholds in moving-average systems; each is a parameter. Adding a second moving average may add four more parameters, and the comparison between the two averages yet another. A system involving a 200-day and a 50-day average that showed 10 buy/sell signals might have as many as 10 parameters and thus be nearly useless.

Steve Ellison mentioned the two sample data technique. Basically you can fit your model on one data set and then use the same parameters to test out of sample. What you cannot do is refit the model or system parameters to the new data.

Another caveat here is the data mining slippery slope. This means you need to keep track of how many other variables you tried and rejected. This is also called the multiple comparison problem. It can be as insidious as trying to know how many variables someone else tried before coming up with their idea. For example how many parameters did Welles Wilder try before coming up with his 14 day RSI index? There is no way 14 was his first and only guess.
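The multiple-comparison trap is easy to simulate: score many purely random "signals" against the same price changes, and the best of them always looks impressive in-sample. A sketch (all data are random; the rule count and return scale are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_candidates = 250, 500

returns = rng.normal(0, 0.01, size=n_days)                  # random daily "returns"
signals = rng.choice([-1, 1], size=(n_candidates, n_days))  # random long/short rules

# In-sample "performance" of each candidate rule.
pnl = signals @ returns

best = int(np.argmax(pnl))
print(f"best of {n_candidates} random rules earned {pnl[best]:.3f} in-sample")
# With 500 tries, the winner looks good by pure chance -- that is data mining.
```

Any honest evaluation of the winning rule has to account for the 499 discarded tries, which is exactly the point about Wilder's 14-day RSI.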

Another bad practice is when you have a system that has picked say 20 profitable trades and you look for rules to weed out those pesky few bad trades to get the perfect system. If you find yourself adding a rule or variable to rule out one or two trades you are well into data mining territory.

Bruno's suggestion to use the BIC or AIC is a good one. If one is doing a multiple regression, one should look at the individual t-stats for the coefficients AND at the F test for the overall quality of the fit. Any variable whose t-stat is not above 2 should be tossed. Also, of any two variables that are highly correlated with each other, the weaker should be tossed.
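The t-stat screen can be computed from scratch in a few lines. A sketch, with made-up data in which x1 genuinely drives y while x2 is pure noise (the variable names and the 3.0 coefficient are illustrative assumptions):

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS coefficients and their t-statistics, computed from scratch."""
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance, n-k dof
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance matrix
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)                       # genuinely predictive variable
x2 = rng.normal(size=n)                       # pure noise variable
y = 3.0 * x1 + rng.normal(size=n)

X = np.column_stack([x1, x2])
beta, t = ols_t_stats(X, y)
print(beta, t)   # x1's t-stat is large; x2's is typically below 2
```

Under the screen above, x2 would be tossed and x1 kept, which matches how the data were constructed.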

George Parkanyi reminds us:

Yeah but you guys are forgetting that without curve-fitting we never would have invented the bra.

Say, has anybody got any experience with vertical drop fitting? I just back-tested some oil data and …

Larry Williams writes:

If it looks like it works real well it is curve fitting.

Newton Linchen reiterates:

My point is: what degree of system optimization amounts to curve fitting? In other words, how can one recognize curve fitting while modeling the data? Perhaps returns too good to believe?

What I mean is to get a general rule that would tell: "Hey, man, from THIS point on you are curve fitting, so step back!"

Steve Ellison proffers:

I learned from Dr. McDonnell to divide the data into two halves and do the curve fitting on only the first half of the data, then test a strategy that looks good on the second half of the data.
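That split-and-hold-out procedure can be sketched in a few lines. The "strategy" here (a moving-average rule on a random-walk price series) and its parameter grid are hypothetical stand-ins; the point is only the discipline of choosing the parameter on the first half and evaluating it untouched on the second:

```python
import numpy as np

def ma_signal(prices, window):
    """+1 when price is above its trailing moving average, else -1."""
    ma = np.convolve(prices, np.ones(window) / window, mode="valid")
    return np.where(prices[window - 1:] > ma, 1, -1)

def strategy_pnl(prices, window):
    sig = ma_signal(prices, window)[:-1]   # signal known at the close
    rets = np.diff(prices[window - 1:])    # next-day price changes
    return np.sum(sig * rets)

rng = np.random.default_rng(3)
prices = np.cumsum(rng.normal(size=1000)) + 100.0

half = len(prices) // 2
in_sample, out_sample = prices[:half], prices[half:]

# Fit: pick the best window on the first half only...
windows = range(5, 50, 5)
best_w = max(windows, key=lambda w: strategy_pnl(in_sample, w))

# ...then evaluate that single, frozen choice on the second half.
print(best_w, strategy_pnl(out_sample, best_w))
```

The one rule that must not be broken is the one stated above: never refit the parameters to the second half.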

Yishen Kuik writes:

The usual out of sample testing says, take price series data, break it into 2, optimize on the 1st piece, test on the 2nd piece, see if you still get a good result.

If you get a bad result you know you've curve fitted. If you get a good result, you know you have something that works.

But what if you get a mildly good result? Then what do you "know" ?

Jim Sogi adds:

This reminds me of the three blind men each touching one part of the elephant and describing what the elephant was like. Quants are often like the blind men: one touching, say, the 90's bull-run tranche, others sampling recent data, others sampling the whole. Each has his own description of the market which, like the blind men's, is wrong.

The most important data tranche is the most recent, as that is the current cycle. You want your trades to work there. Don't try to make reality fit the model.

Also, why not break it into 3 pieces and have 2 out-of-sample pieces to test on?

We can go further. If each discrete trade is of limited length, then why not slice the price series into 100 pieces, reassemble all the odd-numbered time slices chronologically into sample A, and the even-numbered ones into sample B?

Then optimize on sample A and test on sample B. This can address to some degree concerns about regime shifts that might differently characterize your two samples in a simple break of the data.
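The interleaving idea above is a few lines of NumPy. A sketch (the series here is a stand-in; 100 slices as proposed):

```python
import numpy as np

def interleaved_split(series, n_slices=100):
    """Cut a series into n_slices chunks; odd-numbered chunks form
    sample A, even-numbered chunks sample B, each kept in time order."""
    chunks = np.array_split(series, n_slices)
    sample_a = np.concatenate(chunks[0::2])
    sample_b = np.concatenate(chunks[1::2])
    return sample_a, sample_b

prices = np.arange(1000, dtype=float)   # stand-in price series
a, b = interleaved_split(prices)
print(len(a), len(b))  # 500 500
```

Because both samples draw slices from every era of the data, a regime shift lands in both halves rather than splitting cleanly between them - the property the post is after. The trade-off is that each sample contains artificial joins between non-adjacent prices, so trades spanning a slice boundary are distorted.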

 

