# Subprime Information and Real Estate, from Gordon Haave

March 14, 2007

Why is it that whenever the government decides to protect us from market forces, we taxpayers get the shaft? Would it not be better not to interfere in the first place?

Financially it would be better to let the foreclosures happen. Not foreclosing on bad debts means the economy doesn't get to reallocate capital to its best uses. That, in short, was the cause of the 10-year recession in Japan.

If it is in the interest of the lenders to give people breathing room, they will do so without the government forcing them to.

Not long ago, I was involved in building predictive models for sub-prime products for one of the major shorts in the market today. There was no way of predicting today's scenario, because a large part of the poor credit performance is due to fraudulent mortgages (loans originated on falsified information). Thus, most of the models are based on false historical data. For instance, a borrower with a Debt to Income Ratio (DTI) of 0.4 is now all of a sudden a borrower with a DTI of 1.4 … oops! It is interesting that this fraud was mostly conducted by "Loan Officers" and not the borrowers. Here is an example of quant models being useless!
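To make the DTI arithmetic concrete, here is a minimal sketch. It assumes the falsified field is the borrower's income, which the text does not specify; the dollar amounts and the function name `debt_to_income` are hypothetical and were picked only to reproduce the 0.4 to 1.4 jump.

```python
def debt_to_income(monthly_debt_payments: float, monthly_income: float) -> float:
    """Debt-to-income ratio: monthly debt service divided by monthly income."""
    if monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return monthly_debt_payments / monthly_income


# Hypothetical figures, chosen only to reproduce the 0.4 -> 1.4 jump in the text.
monthly_debt = 2800.0
stated_income = 7000.0  # income written on the application (assumed falsified)
actual_income = 2000.0  # income the borrower really earns (assumed)

stated_dti = debt_to_income(monthly_debt, stated_income)
actual_dti = debt_to_income(monthly_debt, actual_income)
print(f"stated DTI = {stated_dti:.2f}, actual DTI = {actual_dti:.2f}")
```

A model calibrated on the stated figure sees a borrower well inside the usual limits, while the real borrower owes more each month than he earns.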

# Non-Linear Relationships, from Philip McDonnell

I would like to offer some simple thoughts on non-linear relationships. The usual way to study non-linear correlations is to transform one or more of the variables in question. For example, if we have reason to believe that the underlying process is multiplicative, then we can use a log function to model our data. When we do a correlation or regression of y~x, we can just take the transformed variables ln(y)~ln(x) as our new data set. We are still doing a linear correlation or a linear regression, but now we are doing it on the transformed variables.
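As a rough illustration of the log transform, the sketch below fits ln(y)~ln(x) on synthetic data. The data-generating process, the parameter values `a_true` and `b_true`, and the noise level are all assumptions made for the example, not anything taken from market data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic multiplicative process: y = a * x**b * noise (all values assumed).
a_true, b_true = 2.0, 1.5
x = rng.uniform(1.0, 100.0, size=500)
noise = np.exp(rng.normal(0.0, 0.1, size=x.size))  # multiplicative, log-normal
y = a_true * x**b_true * noise

# Ordinary linear regression on the transformed variables:
#   ln(y) = ln(a) + b * ln(x)
# np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
b_hat = slope
a_hat = np.exp(intercept)

# Linear (Pearson) correlation of the transformed variables.
r_log = np.corrcoef(np.log(x), np.log(y))[0, 1]

print(f"b_hat = {b_hat:.3f}, a_hat = {a_hat:.3f}, corr(ln x, ln y) = {r_log:.4f}")
```

The slope of the log-log fit estimates the exponent b, and exponentiating the intercept recovers the multiplier a.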

Ideally we would know the form of the non-linear relationship from some theory. Absent that, we could use a general functional form such as a polynomial, so our transform could be something like x^2, x^3, or x^4. Using one of these terms is usually pretty safe. But combining them in a multiple regression can be problematic. The reason is that the terms x^2 and x^3 are about 67% correlated. Using highly correlated variables to model or predict some third variable is a bad idea because you cannot trust the statistics you get.
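How correlated x^2 and x^3 turn out to be depends on the range of x in the sample, so any single figure is specific to one data set. A minimal sketch with two assumed sample ranges (the helper name `power_correlation` is hypothetical):

```python
import numpy as np


def power_correlation(x: np.ndarray) -> float:
    """Pearson correlation between x**2 and x**3 over the sample x."""
    return float(np.corrcoef(x**2, x**3)[0, 1])


x_positive = np.linspace(0.0, 1.0, 1001)    # all-positive sample (assumed)
x_symmetric = np.linspace(-1.0, 1.0, 1001)  # sample centred on zero (assumed)

r_positive = power_correlation(x_positive)
r_symmetric = power_correlation(x_symmetric)

print(f"corr(x^2, x^3) on [0, 1]:  {r_positive:.3f}")
print(f"corr(x^2, x^3) on [-1, 1]: {r_symmetric:.3f}")
```

On the all-positive range the two terms are nearly collinear, while on the symmetric range the even and odd powers are essentially uncorrelated. Prices and volumes are positive, so the collinear case is the one to expect there.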

One way around that is to use orthogonal polynomials or functions. We have previously discussed Fourier transforms and Chebyshev polynomials. Both of these classes are orthogonal, which also means that we can fit a few terms and add or delete terms at will. The fitted coefficients will not change if we truncate or add to the series. Each term is guaranteed to be linearly independent of the others.
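The stability of the coefficients can be checked numerically. The sketch below is one way to do it, under the assumption that the data are sampled at Chebyshev nodes, where the polynomials are exactly orthogonal; the target function exp(x) is arbitrary. It compares a Chebyshev fit with an ordinary power-series fit at degrees 2 and 4.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

# Chebyshev nodes on [-1, 1]. T_0..T_{n-1} are exactly orthogonal over
# these n points (this sampling choice is an assumption of the sketch).
n = 50
k = np.arange(n)
x = np.cos(np.pi * (k + 0.5) / n)
y = np.exp(x)  # arbitrary smooth target, assumed for illustration

# Least-squares fits in the Chebyshev basis (coefficients low to high).
cheb_deg2 = C.chebfit(x, y, 2)
cheb_deg4 = C.chebfit(x, y, 4)

# Least-squares fits in the plain power basis 1, x, x^2, ...
pow_deg2 = P.polyfit(x, y, 2)
pow_deg4 = P.polyfit(x, y, 4)

# Do the first three coefficients survive when two more terms are added?
cheb_stable = bool(np.allclose(cheb_deg2, cheb_deg4[:3]))
power_stable = bool(np.allclose(pow_deg2, pow_deg4[:3]))

print("Chebyshev coefficients unchanged:", cheb_stable)
print("Power-basis coefficients unchanged:", power_stable)
```

With real data sampled at irregular points the orthogonality holds only approximately, so the coefficients can still shift a little when terms are added.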

I have a question.

One of the reasons for adding regressors is to take into account all possible reasons behind a move in the variable we are trying to explain. However, multicollinearity being prevalent in finance, it is a source of headaches.

If we could randomize and/or design experiments for empirical studies, as is done in biology, we could get rid of part of the problem.

Is it possible to randomize ex post? Let's say I want to study Y = aX + b + e. If, instead of taking the full history of observed (Y,X), I take a random sample of (Y,X), it creates some kind of post-randomization, which should reduce the impact of other factors.

Does it make sense? Of course, we would lose all the information contained in the non-sampled (Y,X). That means even less data to work with, which is not nice with ever-changing cycles.
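One way to probe the question is a small simulation. The sketch below is built entirely on assumptions: a synthetic history in which a hidden factor `z` drives both X and Y, a true coefficient of 1.0 on X, and a uniform random subsample of 10% of the observations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic history with a hidden factor z driving both x and y (assumed).
n = 20_000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 1.0 * x + 2.0 * z + rng.normal(size=n)  # true coefficient on x is 1.0


def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope a of the least-squares line y = a*x + b."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return float(slope)


# Regression on the full observed history, omitting z.
slope_full = ols_slope(x, y)

# "Post-randomization": a uniform random subsample of the same history.
idx = rng.choice(n, size=2_000, replace=False)
slope_sample = ols_slope(x[idx], y[idx])

print(f"true slope = 1.0, full history = {slope_full:.3f}, "
      f"random subsample = {slope_sample:.3f}")
```

In this setup the subsample keeps the same joint distribution of (Y,X) as the full history, so its slope stays close to the full-history estimate rather than moving toward the true coefficient, and it is noisier because it uses fewer points.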

## Rich Ghazarian mentions:

And of course, if you want a more powerful model, you fit a copula to your processes, and now you are in a more realistic dependence structure. Engle has a nice paper on Dynamic Conditional Correlation that may interest dependence modelers on the list. The use of Excel correlation, Pearson correlation, linear correlation … these must be the biggest flaws in quant finance today.
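The sketch below is not a copula fit. It only illustrates, on assumed synthetic data, how linear (Pearson) correlation can badly understate a dependence that is perfectly monotone, while a rank-based measure, which is unchanged by monotone transforms, picks it up. The helper `ranks` is hypothetical and assumes no ties.

```python
import numpy as np

rng = np.random.default_rng(2)


def ranks(v: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 of a sample, assuming no ties."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(len(v))
    return r


# y is a deterministic, strictly increasing function of x (assumed example).
x = rng.normal(size=5_000)
y = np.exp(3.0 * x)

# Linear correlation of the raw values.
pearson = float(np.corrcoef(x, y)[0, 1])

# Spearman rank correlation: Pearson correlation of the ranks.
spearman = float(np.corrcoef(ranks(x), ranks(y))[0, 1])

print(f"Pearson = {pearson:.3f}, Spearman = {spearman:.3f}")
```

Here y is completely determined by x, yet the linear correlation is far from 1 while the rank correlation is 1.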

With linear functions we can compute the eigenvectors to get an orthogonal representation. One problem that gets in the way of nonlinear models is that it isn't clear what the appropriate "distance" measurement is. You need a formal metric of distance to model, compare, or optimize anything. How far apart are these points?

With linear axes, distance is determined by Pythagoras. But what is suggested for the underlying measure of distance if the axes aren't linear?
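For the linear case, a minimal sketch of the eigenvector representation is below. The two correlated series are synthetic and the covariance matrix `cov_true` is an assumption. Rotating onto the eigenvectors of the sample covariance gives uncorrelated axes, and the Pythagorean distance between two points is the same before and after the rotation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two correlated synthetic series (assumed data).
cov_true = np.array([[1.0, 0.8], [0.8, 1.0]])
data = rng.multivariate_normal([0.0, 0.0], cov_true, size=2_000)
centered = data - data.mean(axis=0)

# Eigen-decomposition of the symmetric sample covariance matrix.
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are orthonormal

# Express every observation in the eigenvector basis.
rotated = centered @ eigvecs
cov_rotated = np.cov(rotated, rowvar=False)

# Euclidean distance between the first two observations, in both bases.
d_before = float(np.linalg.norm(centered[0] - centered[1]))
d_after = float(np.linalg.norm(rotated[0] - rotated[1]))

print("off-diagonal covariance after rotation:", cov_rotated[0, 1])
print(f"distance before = {d_before:.6f}, after = {d_after:.6f}")
```

The rotation leaves distances unchanged because the eigenvector matrix is orthogonal, which is exactly the property that is lost once the axes are bent by a nonlinear transform.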

These remarks about correlation resonate with me, especially in the case of the stock market.

## From Vincent Andres:

If you replace your original axes X and Y with new axes X'=fx(X) and Y'=fy(Y), this is a transformation of the kind P=(x,y) -> P'=f(P)=(x',y')=(fx(x), fy(y)).

This transformation can be inverted without difficulty: P'=(x',y') -> P=(x,y), where x and y are the antecedents of x' and y' through the inverse functions fx^-1 and fy^-1.

A "natural" distance measure in this new universe is thus: dist'(P1', P2') = dist(ant(P1'), ant(P2')), where ant = antecedent.

This works for all strictly monotonic functions fx and fy (e.g., ln(x), or x^2 restricted to x >= 0), because there is then a strict bijection between the two universes. It could even do something for a larger class of functions.

Sorry for the difficult notations, but I hope the idea is clear.
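The antecedent-based distance can be sketched in a few lines. This is only an illustration under assumptions: fx = fy = ln, Euclidean distance in the original universe, and two arbitrary points. All function names are hypothetical.

```python
import math


def transform(p):
    """P -> P' with fx = fy = ln (assumed choice of monotonic functions)."""
    x, y = p
    return (math.log(x), math.log(y))


def antecedent(p_prime):
    """P' -> P through the inverse functions fx^-1 = fy^-1 = exp."""
    xp, yp = p_prime
    return (math.exp(xp), math.exp(yp))


def euclid(p, q):
    """Pythagorean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dist_transformed(p1_prime, p2_prime):
    """Distance in the new universe, measured between the antecedents."""
    return euclid(antecedent(p1_prime), antecedent(p2_prime))


p1, p2 = (1.0, 1.0), (4.0, 5.0)  # arbitrary example points (3-4-5 triangle)
d_original = euclid(p1, p2)
d_pulled_back = dist_transformed(transform(p1), transform(p2))
d_naive = euclid(transform(p1), transform(p2))  # Pythagoras on the log axes

print(f"original = {d_original:.4f}, via antecedents = {d_pulled_back:.4f}, "
      f"naive on transformed axes = {d_naive:.4f}")
```

Measured through the antecedents, the distance agrees with the original one, whereas applying Pythagoras directly to the transformed coordinates gives a different number.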