Principal Components Part One

22 Mar, 2016 — 5 min

This post lets me combine a few of my favorite things: interest rates, the singular value decomposition, eigen-things, and maybe something from Attilio Meucci’s book Risk and Asset Allocation. Not to mention principal components analysis (PCA). I find most descriptions of PCA confusing, and most authors do not try to convey an intuitive explanation of the concept. What’s different about this post? My goal is to explain the topic using data (i.e., past realized observations) and our favorite computer language, R. It’s never much fun until there’s data involved. Hopefully you’ll stick with this (long) post long enough to see if I’m right.

Besides some interesting math, the main thing PCA has going for it is that it is an organized way of choosing uncorrelated factors in declining order of significance (that is, in declining order of their contribution to \(R^2\)). On the other hand, there are some drawbacks. PCA is a hidden-factor (or statistical factor) model, which means that the factors may not make much intuitive sense. Also, some of the matrix math is hard to understand and interpret.

My contention is that matrix decomposition/diagonalization methods are very cool, and that relatively pedestrian finance types have latched onto something much bigger and more interesting than their own work. But that said, why argue about it? PCA is popular in fixed income because it produces an intuitive set of factors for interest rate changes: a parallel shift, a change in slope, and a bend (curvature) in the yield curve. It is also useful: from each principal component you can identify a one-standard-deviation move in the yield curve. And it seems like magic.

Let’s call a time series of interest rate changes \(\mathrm{R}\), with n rows (dates) and m columns (one per point on the yield curve). This is the layout you would get from an R time-series package like xts.

It All Begins With Covariance

In finance we care a lot about covariance, and this is no exception. The covariance of two variables is a measure of linear association, defined as \( Cov(X,Y) = E[(X-\bar{X})(Y-\bar{Y})] \), where \(\bar{X}\) and \(\bar{Y}\) are the means. With several variables, the variance-covariance matrix summarizes both the variability of each variable around its mean (on the diagonal) and the linear association between each pair of variables (off the diagonal).

In PCA, it’s best to center (demean) your data by subtracting the column means before running the analysis. Once the variables are demeaned, the covariance formula reduces to \( Cov(X,Y) = E[\:X\:Y\:] \), since \(\bar{X}\) and \(\bar{Y}\) are zero. One bit of magic follows: for a demeaned series \(\mathrm{R}\), the sample covariance matrix is just \(\mathrm{R}^T \:\mathrm{R} \) times the constant \( \frac{1}{n-1} \).
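Here is a quick numerical check of the demeaning claim. The post’s own examples will be in R, so treat this as a minimal stand-in sketch in Python with NumPy. The matrix is synthetic random data shaped like n dates by m tenors, not real yield changes. The sketch demeans the columns and confirms that \(\mathrm{R}^T \mathrm{R}/(n-1)\) matches NumPy’s `np.cov`.

```python
import numpy as np

# Synthetic stand-in for a matrix of yield changes: n dates x m tenors.
# (Random numbers only; cumsum across columns just makes them correlated.)
rng = np.random.default_rng(0)
n, m = 250, 5
R_raw = rng.normal(size=(n, m)).cumsum(axis=1)

# Demean each column by subtracting the column means.
R = R_raw - R_raw.mean(axis=0)

# For demeaned data, the sample covariance matrix is R^T R / (n - 1).
Sigma = R.T @ R / (n - 1)

# np.cov treats rows as variables by default, so pass rowvar=False;
# it demeans internally and divides by n - 1.
assert np.allclose(Sigma, np.cov(R_raw, rowvar=False))
```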

Enter The Eigenmagic

The spectral decomposition (a.k.a. eigendecomposition) factors a matrix into the product of three matrices. The center one is diagonal and holds the eigenvalues, and the ones on either side are made up of eigenvectors. For a symmetric matrix such as a covariance matrix, the eigenvectors are mutually orthogonal, so the factors they define are uncorrelated with one another. That is the sense in which each factor is independent; it is not a claim of full statistical independence.

One happy coincidence is that the matrix we already care about, the covariance matrix, is symmetric and positive semidefinite. This eigendecomposition therefore always exists for it, with real, non-negative eigenvalues. It can be written as:

$$ \Sigma = \frac{1}{n-1} \: \mathrm{R}^T \: \mathrm{R} = \mathrm{Q} \: \Lambda \: \mathrm{Q}^T $$

Here \( \Sigma \) is the covariance matrix of \(\:\mathrm{R} \:\), \(\: \mathrm{Q} \) is the matrix of eigenvectors (one per column) and \( \Lambda \) is the diagonal matrix of eigenvalues. The eigenvalues turn out to be the variances of the factors. Ordering them from largest to smallest therefore sorts the factors from most to least important, that is, from those that explain the most variance to those that explain the least.
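Here is a minimal sketch of the eigendecomposition in Python/NumPy, standing in for the R the post will use. The data are synthetic random numbers, not yield changes. `np.linalg.eigh` is NumPy’s routine for symmetric matrices and returns eigenvalues in ascending order, so the sketch re-sorts them largest first. It then checks that each eigenvalue equals the sample variance of its factor.

```python
import numpy as np

# Synthetic, demeaned stand-in data (not real interest-rate changes).
rng = np.random.default_rng(0)
n, m = 250, 5
R = rng.normal(size=(n, m)).cumsum(axis=1)
R = R - R.mean(axis=0)
Sigma = R.T @ R / (n - 1)

# eigh is for symmetric matrices; eigenvalues come back in ascending order.
lam, Q = np.linalg.eigh(Sigma)

# Reorder from largest to smallest so factor 1 explains the most variance.
order = np.argsort(lam)[::-1]
lam, Q = lam[order], Q[:, order]

# Sigma = Q Lambda Q^T holds exactly (up to floating point).
assert np.allclose(Sigma, Q @ np.diag(lam) @ Q.T)

# Each eigenvalue is the sample variance of the corresponding factor R q_k.
factors = R @ Q
assert np.allclose(factors.var(axis=0, ddof=1), lam)

# Share of total variance explained by each factor, largest first.
explained = lam / lam.sum()
print(np.round(explained, 3))
```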

The eigenvectors form an orthonormal basis made up of eigenvectors of \(\mathrm{R}^T \:\mathrm{R} \). Orthogonal means ‘awesome’ in math speak. Orthogonal matrices have some great properties; for example, the transpose of an orthogonal matrix is its inverse. In this case, the eigenvectors are the factor loadings that give us the curve shift/steepening/bend that we love so much. Orthogonality also means that the factors themselves will be uncorrelated.
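To see the orthogonality at work, this sketch uses Python/NumPy on synthetic random data as a stand-in for R. It checks that \(\mathrm{Q}^T \mathrm{Q} = \mathrm{I}\), that the inverse of \(\mathrm{Q}\) is its transpose, and that the factors \(\mathrm{R}\:\mathrm{Q}\) have a diagonal covariance matrix. Column \(q_k\) of \(\mathrm{Q}\) holds the loadings of factor k. Scaling it by \(\sqrt{\lambda_k}\) is one way to express a one-standard-deviation move of the curve along that factor. Eigenvectors are only defined up to sign, so a computed "parallel shift" may come out pointing down rather than up.

```python
import numpy as np

# Synthetic, demeaned stand-in data (random numbers, not real yield changes).
rng = np.random.default_rng(1)
n, m = 250, 5
R = rng.normal(size=(n, m)).cumsum(axis=1)
R = R - R.mean(axis=0)

# Eigendecomposition of the covariance matrix, largest eigenvalue first.
lam, Q = np.linalg.eigh(R.T @ R / (n - 1))
order = np.argsort(lam)[::-1]
lam, Q = lam[order], Q[:, order]

# Orthogonal: Q^T Q = I, so the transpose is the inverse.
assert np.allclose(Q.T @ Q, np.eye(m))
assert np.allclose(np.linalg.inv(Q), Q.T)

# Column k of Q holds the loadings of factor k on each of the m tenors.
# The factors R Q have a diagonal covariance matrix: they are uncorrelated.
F = R @ Q
cov_F = np.cov(F, rowvar=False)
off_diag = cov_F - np.diag(np.diag(cov_F))
assert np.allclose(off_diag, 0)

# A one-standard-deviation move along the first factor, expressed on the curve.
# (The sign of an eigenvector is arbitrary, so this may point either way.)
one_sd_shift = np.sqrt(lam[0]) * Q[:, 0]
print(np.round(one_sd_shift, 3))
```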

The Matrix Reloaded

But there is also another matrix factorization out there: the singular value decomposition (SVD). Personally, I think this is a more direct way to view things. The SVD can be used on any matrix, including rectangular ones, not just square or symmetric matrices. For our matrix \( \mathrm{R} \) it can be written as:

$$ \mathrm{R} = \mathrm{U} \: \mathrm{D} \: \mathrm{V}^T $$

Here \( \mathrm{U} \) is a set of orthonormal eigenvectors of \(\mathrm{R} \: \mathrm{R}^T \), \( \mathrm{D} \) is a diagonal matrix of singular values, and \( \mathrm{V} \) is a set of orthonormal eigenvectors of \(\mathrm{R}^T \: \mathrm{R} \). If you have a lot of data, \( \mathrm{U} \) will be a large matrix of dimension n x n, while \( \mathrm{V} \) will have dimension m x m. (Many implementations, including R’s svd(), return a "thin" n x m version of \( \mathrm{U} \) by default.) Remember that the covariance matrix is this latter quantity, \(\mathrm{R}^T \: \mathrm{R} \), scaled by \( \frac{1}{n-1} \). So will \( \mathrm{V} \) be equal to \( \mathrm{Q} \)? Will \( \mathrm{D} \) have a relationship to \( \Lambda \)? A little algebra will tell you, and in the next post our R examples will show it.
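The little algebra is short. We have \(\mathrm{R}^T \mathrm{R} = \mathrm{V}\mathrm{D}\mathrm{U}^T\mathrm{U}\mathrm{D}\mathrm{V}^T = \mathrm{V}\mathrm{D}^2\mathrm{V}^T\). Comparing this with \( (n-1)\,\Sigma = (n-1)\,\mathrm{Q}\Lambda\mathrm{Q}^T \) suggests \( \Lambda = \mathrm{D}^2/(n-1) \) and \( \mathrm{V} = \mathrm{Q} \), up to the sign of each column (assuming distinct eigenvalues). Below is a hedged numerical check in Python/NumPy on synthetic random data, standing in for the R examples to come. It uses the thin SVD (`full_matrices=False`).

```python
import numpy as np

# Synthetic, demeaned stand-in data (random numbers, not real yield changes).
rng = np.random.default_rng(2)
n, m = 250, 5
R = rng.normal(size=(n, m)).cumsum(axis=1)
R = R - R.mean(axis=0)

# Thin SVD: U is n x m, d holds the m singular values (descending), Vt is V^T.
U, d, Vt = np.linalg.svd(R, full_matrices=False)
V = Vt.T
assert np.allclose(R, U @ np.diag(d) @ Vt)

# Eigendecomposition of the covariance matrix, sorted largest first.
lam, Q = np.linalg.eigh(R.T @ R / (n - 1))
order = np.argsort(lam)[::-1]
lam, Q = lam[order], Q[:, order]

# Squared singular values scaled by 1/(n-1) are the eigenvalues.
assert np.allclose(d**2 / (n - 1), lam)

# V matches Q column by column, up to the sign of each eigenvector.
signs = np.sign(np.sum(V * Q, axis=0))
assert np.allclose(V * signs, Q)
```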

Remember that, at this point, these are exact equalities: there is no approximation going on.

Finally, if you are looking to simulate something using PCA or to analyze data over time, you will want to look at the factor realizations. There are two ways to calculate these. One is to multiply your original data matrix by the eigenvector matrix. The other is to multiply the left two matrices from the SVD, \( \mathrm{U} \) and \( \mathrm{D} \). This is simple, and it works because the transpose of an orthogonal matrix is its inverse:

$$ \begin{aligned} \mathrm{R} \quad &= \mathrm{U} \: \mathrm{D} \: \mathrm{V}^T \\ \mathrm{R} \: \mathrm{V} \quad &= \mathrm{U} \: \mathrm{D} \: \mathrm{V}^T \: \mathrm{V}\\ \mathrm{R} \: \mathrm{V} \quad &= \mathrm{U} \: \mathrm{D} \: \mathrm{I}_m\\ \mathrm{R} \: \mathrm{V} \quad &= \mathrm{U} \: \mathrm{D} \\ \end{aligned} $$

So the two routes give the same result: to look at the factor realizations, you can use either \(\mathrm{R} \: \mathrm{V} \) or \( \mathrm{U} \: \mathrm{D} \). I prefer the SVD because I can see the pieces as a decomposition of \(\mathrm{R}\) itself, rather than of its covariance matrix.
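Both routes to the factor realizations can be checked numerically. This minimal Python/NumPy sketch uses synthetic random data and stands in for the R code the post promises. It computes \(\mathrm{R}\:\mathrm{V}\) and \(\mathrm{U}\:\mathrm{D}\), confirms they agree, and shows that multiplying the realizations by \(\mathrm{V}^T\) recovers \(\mathrm{R}\) exactly.

```python
import numpy as np

# Synthetic, demeaned stand-in data (random numbers, not real yield changes).
rng = np.random.default_rng(3)
n, m = 250, 5
R = rng.normal(size=(n, m)).cumsum(axis=1)
R = R - R.mean(axis=0)

U, d, Vt = np.linalg.svd(R, full_matrices=False)
V = Vt.T

# Two routes to the same factor realizations (one row per date, one column per factor).
scores_from_V = R @ V   # project the data onto the loadings
scores_from_U = U * d   # U D: scale column k of U by the k-th singular value
assert np.allclose(scores_from_V, scores_from_U)

# Going back: multiplying the realizations by V^T recovers the data exactly.
assert np.allclose(scores_from_U @ Vt, R)
```

Broadcasting `U * d` scales each column of `U` by the matching singular value. That gives the same result as `U @ np.diag(d)` without building the diagonal matrix.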

It’s always better with some data, which we’ll have next time. Another daring feat of R to WordPress coming up!