Apr 19

David MacKay says:

"Principal Component Analysis" is a dimensionally invalid method that gives people a delusion that they are doing something useful with their data. If you change the units that one of the variables is measured in, it will change all the "principal components"! It's for that reason that I made no mention of PCA in my book. I am not a slavish conformist, regurgitating whatever other people think should be taught. I think before I teach.

Bill Egan responds:

Well, Prof. MacKay is wrong. In fact, I have made predictive models that have worked for years in real-world corporate environments that were based on PCA. Worse yet for the good Prof., all work done in the predictive modeling of optical spectra for the last 40 years or so has involved PCA or a related method.
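
It is worth adding why the unit problem does not bite in practice: in chemometrics and spectroscopy the variables are routinely standardized (autoscaled) to zero mean and unit variance before the decomposition, which is equivalent to running PCA on the correlation matrix rather than the covariance matrix and makes the components independent of the units chosen. A minimal sketch of that fix, again assuming numpy and made-up data:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))              # made-up data for illustration

    def standardized_directions(X):
        Z = (X - X.mean(axis=0)) / X.std(axis=0)   # autoscale each variable
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        return Vt

    X2 = X.copy()
    X2[:, 1] *= 100                            # change the units of one variable

    # Up to sign, the standardized components agree whatever the units.
    print(np.allclose(np.abs(standardized_directions(X)),
                      np.abs(standardized_directions(X2))))   # prints True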

PCA transforms your data into a set of uncorrelated new variables, each defined by a unit-length direction vector. The first of these captures as much of the variation in the original data as possible; the next captures as much of the remaining variation as possible, and so on. Each new variable is simply a linear combination of the original variables. The method is reproducible and quite numerically stable if you compute it with the singular value decomposition.
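
To make that description concrete, here is a minimal PCA-via-SVD sketch (assuming numpy, with toy data): it checks that the directions are unit-length and mutually orthogonal, that the captured variance decreases down the list, and that the new variables are uncorrelated.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated toy data

    Xc = X - X.mean(axis=0)               # center the columns
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    scores = Xc @ Vt.T                    # the data expressed in the new basis
    var_explained = S**2 / (len(Xc) - 1)  # variance captured by each component

    print(np.allclose(Vt @ Vt.T, np.eye(4)))     # unit-length, orthogonal: True
    print(np.all(np.diff(var_explained) <= 0))   # most variation first: True

    # The new variables are uncorrelated: their covariance matrix is diagonal.
    off_diag = np.cov(scores.T) - np.diag(var_explained)
    print(np.allclose(off_diag, 0, atol=1e-8))   # True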

PCA is a very useful way to reduce the dimensionality of a data set, say one with many variables, to a smaller set of uncorrelated variables you can actually work with. To be fair, the new variables do not necessarily have a physical meaning, but they often do, and it always pays to look at the weights applied to the original variables (called loadings in some of the literature).
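
Continuing the toy sketch above (same Xc, Vt, and var_explained), keeping just the top two components reduces the data set to two uncorrelated variables, and printing the loadings shows how each component weights the original variables:

    k = 2
    top = Vt[:k]                          # the two highest-variance directions
    X_reduced = Xc @ top.T                # 100 samples, now only 2 variables

    kept = var_explained[:k].sum() / var_explained.sum()
    print(f"top {k} components keep {kept:.1%} of the variance")

    # Loadings: each row weights the original variables; large entries show
    # which measurements drive that component.
    print(top)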


