Polynomials aren’t well-behaved after all

A remarkable problem can be encountered in the context of finding the eigenvalues of a given quadratic matrix A\in\mathbb{C}^{n\times n}. As everybody knows, an eigenvalue is some \lambda\in\mathbb{C} that satisfies the equation Av=\lambda v for some vector v\neq0 (which is appropriately called the eigenvector). Every book on linear algebra will tell you the theoretical properties of these eigenvectors, for instance how you can find them: look at the characteristic polynomial \det (A-x I) and find the zeros. As we’re dealing with complex numbers, this polynomial will have complex roots and these are the eigenvalues (because if \lambda is a root, A-\lambda I will have a non-trivial kernel, which means in turn that there is some vector v\neq0 with (A-\lambda I)v = 0).

So far, so simple. For small matrices, you can do just that with a pencil and paper. For larger matrices, you can do just that in a computer. But there are some difficulties to overcome: computing determinants is quite expensive, and you will need to do this often for your iterative zero-finders. Not to mention that the derivative of this polynomial is not easily at hand. You really would need a symbolic machine before you can start with the numerics. But there is another, much bigger problem.

Everybody has an intuitive impression that the zeros of a polynomial depend continuously on its coefficients (proving that lies a little deeper – I’d start with the theorem on implicit functions which might show a little more than that. That doesn’t really matter here.). But this problem is highly unstable in the case of characteristic polynomials. This is what makes this construction break down when you try to find eigenvalues numerically – you will be forced to use different methods in order to get useful results. Hence, the chapters on computing eigenvalues in texts on numerical analysis will hardly ever use the definition of eigenvalues given above. Maybe the proofs will, but the algorithms will avoid that wherever possible. This is one not-so-rare instance where you want to get some mathematical object but you can’t do the first thing with its definition – you need to dig a lot deeper (which is why the eigenvalue-algorithms were never taught in the lectures on numerical mathematics that I was fed with – everybody avoided those because of the necessary overhead; and, not to mention, I hadn’t been that interested in these methods after all. I was young and enthusiastic about theory, not numerics – I even deemed the quadrature formulae un-useful. People change.).

We won’t cover the standard eigenvalue algorithms here. This post is about the counterexample to the slogan “Polynomials are well-behaved”. Let’s look at Wilkinson’s polynomial \prod_{i=1}^{20}(x-i). This is rather nice: well-separated zeros, all of them real, low degree… what more could you wish for?

These are the reasons why Wilkinson looked at this thing in the 1950’s. He wanted some easy test case for a zero-finder. So, he ignored his built-in knowledge of the zeros of this polynomial and let his routine look for the zeros on a 9-digit-precision machine. The results were devastatingly bad. On Wikipedia you can find the computed zeros of the polynomial, some of which have rather rough errors, others have moved to the complex plane.

The basic problem lies in the representation of the explicit polynomial in machine numbers. The constant term of the polynomial is 20!, and this isn’t even the largest of them. On Wilkinson’s machine, several coefficients encountered massive loss of precision and even in today’s double precision, some of the numbers aren’t represented correctly. Hence the root of the ill-conditioning of the problem: you can’t even put the polynomial properly in a computer. Even evaluating the polynomial in a small area around one of the precise zeros will yield results that are dominated by rounding errors (including an almost pseudo-random sign).

Wilkinson published a survey article on the problem which he labelled “The perfidious polynomial” and where he also conducts some analysis of the problem sloppily described above. Besides, you can find there some examples of where the numerical problems can take their origin – it may sometimes be necessary to determine the coefficients of an explicit polynomial much more accurately than the data are: the eigenvalue equation \det(A-x I) itself is rather insensitive to small perturbations of the matrix, but the transformation to the explicit characteristic polynomial may demand for massively higher accuracy. This makes sense heuristically: the matrix contains n^2 entries which are condensed to n coefficients of the characteristic polynomial. In theory, no information is lost by this transformation, but in practice you might need to take a closer look than usual.

All this serves to show the care that needs to be taken in numerical analysis. Even things that are as well-behaved and well-understood as polynomials can show unanticipated behavior, notwithstanding fancy theorems like Weierstrass’s Approximation Theorem and others. You can hardly ever be careful enough.