### Principal components analysis (PCA) using a sequential method

Submitted by kevindunn on 23 July 2011; updated 08 August 2011.
The singular value decomposition is usually presented as the way to calculate the PCA decomposition of a data matrix.

The NIPALS algorithm is a computationally tractable way of calculating PCA for large data sets, since it computes only the components actually needed, whereas the SVD computes all components at once.

The nonlinear iterative partial least squares (NIPALS) method is also more informative: what the loadings and scores really mean becomes apparent when examining the algorithm's steps.
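The algorithm can be sketched as follows. This is an illustrative implementation in NumPy, not the exact code originally attached to this page; the variable names (`t_a`, `p_a`) and the step numbering follow the text below.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-8, max_iter=500):
    """PCA by the NIPALS algorithm, computing one component at a time.

    Returns the scores T (n x A) and loadings P (k x A).
    """
    X = np.array(X, dtype=float)          # work on a copy; X is deflated in place
    n, k = X.shape
    scores = np.zeros((n, n_components))
    loadings = np.zeros((k, n_components))
    for a in range(n_components):
        t_a = X[:, 0].copy()              # initial guess for the score vector
        for _ in range(max_iter):
            # Step 1: regress each column of X onto t_a; the slopes form p_a
            p_a = X.T @ t_a / (t_a @ t_a)
            # Step 2: normalize the loading vector to unit length
            p_a /= np.linalg.norm(p_a)
            # Step 3: regress each row of X onto p_a; the slopes form t_a
            t_new = X @ p_a / (p_a @ p_a)
            # Repeat until the score vector stops changing
            if np.linalg.norm(t_new - t_a) < tol * np.linalg.norm(t_new):
                t_a = t_new
                break
            t_a = t_new
        scores[:, a] = t_a
        loadings[:, a] = p_a
        X -= np.outer(t_a, p_a)           # deflate: remove this component from X
    return scores, loadings
```

Note the deflation step at the end of each outer iteration: each component is computed from the residual matrix left after subtracting the previous components, which is why NIPALS can stop after only the components you need.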

For example, in step 1 of the while loop we see that the loading, $$p_a$$, contains the regression coefficients from regressing each column of $$\mathbf{X}$$ onto the score vector, $$t_a$$. So at convergence of the while loop, any columns of $$\mathbf{X}$$ that are strongly correlated with each other will have similar loading values in $$p_a$$.
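In symbols, step 1 is a set of univariate, no-intercept least-squares regressions:

$$p_a = \frac{\mathbf{X}^T t_a}{t_a^T t_a}$$

so the $$j$$-th entry of $$p_a$$ is the slope from regressing column $$j$$ of $$\mathbf{X}$$ on $$t_a$$.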

In step 3 of the while loop each row of $$\mathbf{X}$$ is regressed onto the loading, $$p_a$$, and the regression coefficient is stored in the corresponding entry of the score vector, $$t_a$$. At convergence of the while loop, any rows of $$\mathbf{X}$$ that are strongly aligned with that loading will have a large positive or negative score value, while rows of $$\mathbf{X}$$ that are unrelated to $$p_a$$ will have a near-zero score value in $$t_a$$.
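Step 3 is the mirror image of step 1, one regression per row:

$$t_a = \frac{\mathbf{X} p_a}{p_a^T p_a}$$

so the $$i$$-th entry of $$t_a$$ is the slope from regressing row $$i$$ of $$\mathbf{X}$$ on $$p_a$$ (and $$p_a^T p_a = 1$$ after the normalization in step 2).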

Still to come

• Calculating confidence limits for SPE and Hotelling’s $$T^2$$ to determine which points are likely outliers

Creative Commons Zero. No rights reserved: users have permission to do anything with the code and other material on this page.