Eigenvalues and how to find them

Recently, I have taken some time to look into numerical eigenvalue algorithms. This is a subject that has interested me for some time now, since it is not regularly taught in basic lectures on numerics, and there are many instances where you actually need to compute eigenvalues. Of course, any reasonable math software will have built-in algorithms that do the trick for you – but time and again, I’d like to know what makes them tick. So how do they work?

In what follows, I refer to chapter 3 of the basic-ish book by Hämmerlin on Numerical Mathematics. It is not very well known, and its notation is somewhat intricate, but at least it started at roughly the same level where I stood. For more advanced ideas, the book by Stoer and Bulirsch might be more applicable, or so I’m told – however, I haven’t had a closer look at it yet.

To fix notation, we look at some real or complex n\times n matrix A = (a_{ij})_{i,j=1,\ldots,n}, and we are interested in its eigenvalues \lambda_i with corresponding eigenvectors v_i, i=1\ldots,n. We denote the n-dimensional identity matrix by I.

As it turns out, these eigenvalue-algorithms are rather intricate beings. There are different ones for slightly different purposes, and if you are interested in eigenvectors as well – that’s almost always the bottleneck. So, unless you have a great package of built-in algorithms (such as Lapack, which is the foundation to what R and Matlab do with their eigen functions), you should really think twice before having a look at eigenvectors.

The most basic algorithm is the Power Method, which will yield the largest eigenvalue and its corresponding eigenvector (large in the sense of absolute value). It is based upon the concept of eigen-decomposition of vectors: if there are n eigenvalues \lambda_i and a base of corresponding eigenvalues x_i, then we have for any initial vector v_0: A^k v_0 = \sum_{i=1}^n\lambda_i^k x_i. Here, the largest eigenvalue, \lambda_1 say, will dominate the others and you can extract its value from the iterates A^k v_0. The iterates will also give you the corresponding eigenvector – that is, if the eigenspace is one-dimensional. If there are other eigenvalues that have the same absolute values as \lambda_1, things get rather messy. Manageable, yes, but it will not be as nice as before (here, Hämmerlin goes into some detail on the convergence issues especially of the eigenvectors). Related to this problem is the concept of deflation, where your n-dimensional space splits into lower dimensional ones to deal with the eigenvalue-problem in lower dimensions only (this can sometimes happen, depending on the exact structure of the matrix). This will improve the feasibility of all the algorithms discussed here, but we won’t go into this much deeper.

This algorithm may be what you need, if you are only interested in the largest eigenvalue. This might happen in physics or when dealing with the spectral norm, but in my own applications in statistics, I usually have to know all of the eigenvalues. Thankfully, there are more possibilities. For instance, there are tailored algorithms that work particularly well on tridiagonal matrices (which may arise in problems on ODEs for instance). Those can be based on a non-explicit computation of the characteristic polynomial – you mustn’t evaluate that polynomial straight-forwardly, since you will get all kinds of rubbish for your answer. The problem is highly instable from that point of view. But a recursive evaluation will do just fine, because the structure of the problem can be broken down to an evaluation of the characteristic polynomial and its derivative at any given point, thus allowing for Newton’s method to take over.

Statisticians will use eigenvalue-algorithms for real symmetric matrices (or suspected covariance matrices, if you will), which have the additional property of yielding n real eigenvalues (remember the principal axis theorem?). It will be interesting to find out, for statisticians at least, if all the eigenvalues are positive. Here, a clever use of orthogonal transformations will be useful, the Givens rotation, which will iteratively tone down the off-diagonal entries of your matrix (the application of this transformation in this application is called the Jacobi Algorithm). In the limit, you can read the eigenvalues off the diagonal. The proof of this fact is not too hard, but we won’t give it here. In practice, the question is when to stop your iterations.

For the Givens rotation, as well as for the Newton method, approximate facts about the eigenvalues would be helpful. Surprisingly, something can be said without much effort: the Gershgorin disks. All eigenvalues \lambda of the matrix A must lie in the union of the disks \left\{z\in\mathbb{C} : \left|z-a_{kk}\right|\leq r_k\right\}, where the radii are r_k=\sum_{i=1, i\neq k}^n \left|a_{ki}\right|.

The proof is remarkably simple: the eigenvalue equation Ax=\lambda x can be written as (\lambda-a_{kk})x_k=\sum_{i=1, i\neq k}^n a_{ki}x_i. One of the indices k must be such that x_k=\left\|x\right\|_\infty, then \left|\lambda-a_{kk}\right| \leq\left|\sum_{i=1, i\neq k}^na_{ki}\frac{x_i}{x_k}\right| \leq\sum_{i=1, i\neq k}^n\left|a_{ki}\right|. q.e.d.

These Gershgorin disks can tell you when your iterative algorithm has successfully split two eigenvalues. It will be rather undecisive with multiple zeros of the characteristic polynomial. But, as I mentioned above, those tricky situations will be troublesome for all of the algorithms.

Another fancy idea is an advanced power method. If you apply it to A^{-1} instead of A, it will give you the smallest eigenvalue. And you might even use it on (A-c I)^{-1} to find the eigenvalue closest to c\in\mathbb{C}. Gershgorin will tell you some proper choices of c to start with. On top of that, you can find the eigenvalues from the power method, as I mentioned above. That’s something that the Jacobi method will not provide – but of course you can use the power method on an eigenvalue approximation that you have gathered from one of the other algorithms. The bad thing is, that you will need to apply the power method about n times for this, and you might not even have the proper choices of c yet, so this will take even more effort. If you’re unlucky, you still have to distinguish multiple eigenvalues – it’s a hard problem with many nasty special cases.

The most famous algorithm, and probably the most widely used one, is the QR algorithm. It unites ideas from the power method and the Givens rotation (when you dig down deeply, that is), it is quite fast and highly stable. But still, this algorithm doesn’t make a tough problem easy. My own implementation in R works just fine on usual matrices, but it struggles with several problems of convergence in special cases and it uses quite some memory. But then again, I didn’t really try properly… The basic idea lies in the QR-decomposition of the matrix A = QR, which can also be used for solving linear equations and whatnot. As the names should suggest, R is an upper triangular matrix, while Q is orthogonal. Then, one will compute the new matrix A_{i+1} := R_iQ_i from A_i = Q_iR_i, and iterate. One can show that all the A_i have the same eigenvalues (they differ by orthogonal transformations only) and that the iterates will converge to some upper triangular matrix from which you can read the eigenvalues.

The proof is somewhat intricate – already in the base case that was proved by Wilkinson in the 1960’s. That is why we will not discuss it here – but most of the books I have consulted on this give the same proof and only in the special case of symmetric real matrices with some additional technical assumptions that you couldn’t check in the first place. The proof hinges on many QR-decompositions of various matrices performed in a row, and on the uniqueness of this decomposition.

It is rather tricky to decide when the algorithm should stop. You cannot expect the elements under the diagonal to vanish, since complex eigenvalues will not show up on the diagonal (how would they, the algorithm is real if you work on real matrices) but form boxes of dimension 2 from which you can still read off the eigenvalues. But then, fluctuations on the first subdiagonal can either come from two conjugate complex eigenvalues or from the algorithm not having properly converged yet. I am not aware of a way to decide what happens during runtime. But, apparently, sophisticated methods have been found to cope with this in the usual implementations.

Principally, the algorithm works in the general case as well, and it works quite fast to give all the eigenvalues at the same time. The decomposition itself will take about n^3 operations in each step of the iterations, which is on the faster side of eigenvalue algorithms (the power method takes n^2 operations for the multiplications, but you’ll need to do the iterations with n different matrices at least). People improve on the convergence by using the QR-decomposition of the matrix A_i - rI, where r is the so-called shift and is often applied such that the lower right element of A_i is set to 0. Depending on how deep you like to delve into this, you can also use more sophisticated shifts than that one. I am afraid, I didn’t get why this shifting should work, but, in practice, on many matrices it does wonders for the speed of convergence.

Finally, note that wikipedia has a list of eigenvalue-algorithms that goes beyond what we have covered here. It is yet another question whether this is an exhaustive list in any sense.

In the end, I have come to a much deeper appreciation of these algorithms. As I said earlier, you can’t do the first thing with the definition of an eigenvalue, you need more sophisticated algorithms for this problem. And these algorithms may be trivial to implement on easy matrices, but to make them work on “random” matrices is already very hard work – who guarantees that you found all the eigenvalues, and that they are reasonably exact? And that doesn’t include any thoughts about why these algorithms should work. In all of them, there is really hard work to be done, not only for the very many special cases that arise in your applications. All of this came as a surprise to me when I first thought about it – after all eigenvalues are a very basic, very natural and rather easy concept in linear algebra. By now, I have found and appreciated that for the numerical tackling of the problem, many intelligent people have spent huge amounts of time and effort on this highly important problem. Sometimes, the gap between theory and practice is really large.


One Moment of Clarity

On Star Trek, Jadzia Dax said she experienced two moments of clarity in seven lifetimes. On Pulp Fiction, Jules experiences a moment of clarity in a near-death situation. I just had one this night when I lay awake in my bed. This was the final piece to a puzzle over which I have been brooding for more than four years now. Wonderful. For the first time ever, I feel I have understood everything about this one. It’s haunted me and even though I managed to connect the biggest parts two years ago, there has always been this one aspect that escaped me.

Now, I can understand. How marvellous.

Actually, the next posting is already in the pipeline, but it cannot hope to compare with this thought that came to my mind tonight. Made my day.