The methods for numerical integration are of extreme practical interest for any mathematician. In any number of applications one is obliged to find the value of some definite integral. One of the most trivial examples should be the values of probability distributions, a little less obvious example are solutions to differential equations.

Especially since the advent of cheap but high-performing computer power those possibilities have risen immensely. Things that were unattainable some decades ago are used thoroughly by today’s standards. This is why I found it very necessary to make myself familiar with how one can find definite integrals numerically (basically that’s the same motivation for why I wanted to know about numerical algorithms for finding eigenvalues – and there is a nice connection between the two fields, as we shall see).

At first, I had wanted to compile some of the basic ideas for numerical integration here. However, I dived deeper down into some certain aspect of the theory and decided to cut that one short to present something a little less known on convergence of quadrature methods. But to begin with, a little review of what every mathematician should learn in his basic course on numerical analysis.

The numerical integration is sometimes called quadrature, as one tries to find the value of a given shape (the “area under the curve”) and therefore find a square with the same area – similar to what one does in “squaring the circle” which is called “quadraturam circuli” in latin. It’s just that we don’t have to be restricted to ruler and compass anymore.

**Basics**

Of course, the natural way of numerical integration is to evaluate the function (to fix notation here) at certain points and make a weighted average of the values. This is related to the Riemann-definition of integrability, where you use sums of the form for and let the grid get finer. This is a starting point for what we can achieve, but the first question is, how can we choose the smartly?

Let’s not delve into this too deeply, but the short answer from Riemann-integration is: do as you please (as long as is continuous). In the end, it will converge. The most common first choice is the appropriately-called midpoint-rule (here’s a picture). If you don’t know anything further about , why not. But one will soon find the trapezoidal rule (picture) and there is no general rule which one of them will be better in any instance. Let’s just write those down for further reference:

– **Midpoint-rule**: ,

– **Trapezoidal rule**: .

A certain mixture of those two rules can be found in

– **Simpson’s rule**: .

All these rules are fit to integrate polynomials up to a certain degree exactly (which is why this feature is called the degree of exactness for the quadrature rule): the midpoint rule is exact for linear polynomials, the trapezoidal rule makes it up to degree 2 and Simpson’s rule can integrate cubic polynomials exactly.

Each of these rules can be constructed by interpolation of with low-degree polynomials: one chooses the unique polynomial for which in those that come up in the formulae given above and choose constants (“weights”) such that . Note, that this will not be a Riemann-sum anymore, in general. You will find that for being of degree , 2 or 3, you will end up with the three formulae given above.

Now, if you wish to be a little better in your interpolation, you can subdivide the interval a little further and use any of those rules on the smaller intervals (interpolating repeatedly) to get the compound rules:

– Midpoint-rule: with ,

– Trapezoidal rule: with ,

– Simpsons rule: with .

Note, that we have a constant grid-width here. This doesn’t have to be the case in general, as we shall see. Besides, note that in Simpson’s rule we have three evaluations of the in each of the sub-intervals, and only the boundary points are used twice (once as the left-hand point, once as the right-hand point). One might also define it slightly differently, by having the sub-intervals overlap and each evaluation of is used twice (once as the midpoint, the other time as a boundary point), except for and .

Now, any basic text on numerical analysis and Wikipedia will tell you everything there is to know about the error that you get from these quadrature formulae. Let’s not go there, except for some small remarks. First, each of these rules has an error that will diminish with more sub-divisions of . The error will depend on how smooth is, in terms of the norm for some . Therefore, to know about the error, you will need to conjure some information on the derivatives of , which will rarely be possible in applications. However, if is “only” continuous, you may find yourself in a bad position for error estimation: the quadrature rule might be just looking at the unusual points to evaluate , which are not typical of the overall behaviour:

(here’s an example going with the picture: take to be some positive constant , except for a tiny area around the evaluation, where it shall vanish; then – though not impossible, it’s harder to do this the smoother will be, and the derivatives of will blow up, giving a worse estimate). This effect has been called “resonance” by Pólya.

Apart from ad hoc methods for the error estimation there is the method of *Peano kernels* which can be applied to bound quadrature formulae (and other linear functionals). Let there just be mentioned that this method requires rather little effort to yield error bounds. I wouldn’t quite call it elegant, possibly since I lack the background in functional analysis, but it gets the job nicely done.

Next natural question: which should we use? Short answer: the more the better, though many evaluations of are “expensive” (especially in applications from the sciences and economics where you might even be unable to get more measurements). Apart from the compound rules, which are used quite often in practice, one might think that interpolation using polynomials of higher degree should work out better and give finer error estimates. The quadrature stemming from this idea is called Newton-Cotes-formula (of which our previous rules are special cases). In principle, this is sound reasoning, but here’s the showstopper: interpolating polynomials need not be similar to the function that they interpolate, and hence the quadrature runs wild. The most famous example is Runge’s phenomenon: for the sequence of polynomials interpolating on an equidistant grid does not converge pointwise, which can be explained by the singularity at that stays invisible while working on the real numbers only. In particular, an interpolating polynomial of sufficiently high degree won’t tell us anything useful about the behaviour of .

For numerical applications, the next problem is that the weights in Newton-Cotes-formulae become negative when (that means, if you use a polynomial of higher degree to interpolate on ). In effect, you will encounter annihilation in your computation from subtracting rather large numbers – there won’t be any guarantee for proper results anymore, even though in theory your method might work fine on your given function. Quadrature formulae with positive weights are called for.

Before the discussion of convergence of quadrature formulae, we will still look at what methods are best possible in terms of the degree of exactness. It’s the Gaussian integration (many practical things are named after Gauß, aren’t they?) which allows for exact integration of polynomials up to degree from evaluations.

But first, why will we be content with a degree of exactness , why not go further? Well, there is always at least one polynomial of degree that can’t be exactly integrated by a quadrature formula on evaluations. Let’s fix some quadrature formula, it will look like with weights and evaluations in , and let’s look at the polynomial which has its zeroes at the points of evaluation of the quadrature formula. Then, we find

Hence, there is always a polynomial of degree that can’t be integrated exactly, and therefore is the best we could hope to achieve. Now, how do we get there? Up to now, the quadrature formulae had used some (mostly because those were the natural thing to do or because we can’t always chose the at liberty, if they stem from experimental measurements, say) and the were computed accordingly to make the polynomials up to degree be integrated exactly. But we can of course choose the wisely as well: they can act as a degree of freedom, and then we’ll have clever choices to make – this seems to indicate that we can achieve the degree of exactness . But can we? And how?

Let’s assume that there is some quadrature formula which has this degree of exactness, and let’s look at what it will do to the polynomials , where is any polynomial of degree less than . Restricting ourselves to the interval for simplicity, we should find

That shows that would be orthogonal to , no matter what actually was (as the -space is a Hilbert space, of course, with the integration as its scalar product). From the abstract theory of Hilbert spaces we can know that there are orthonormal bases of polynomials. For our setting here, we are led to use the Legendre polynomials as ; there are other orthonormal polynomials which will then use another weight function inside the integral – later more on that.

We have seen that the zeroes of the Legendre polynomials should be used to evaluate . But how to find those? Here’s the pretty thing I alluded to above. There’s a recursion formula for the Legendre polynomials which can be shown by induction:

, with and ,

and using . This recursion may be re-written as a vector-matrix-shape:

The last vector drops out for the zeroes of , and then we have an eigenvalue-problem. In particular: find the eigenvalues of the matrix on the left-hand-side to find the zeroes of . That’s some rather unexpected connection – I found it marvellous when I first saw this.

Computing eigenvalues is a tricky thing to do, as everybody knows, and getting eigenvectors is even harder. It turns out that you’ll need the eigenvectors to compute the proper weights for the quadrature formula, and many intelligent people have come up with ingenious algorithms to do so, such that we can have the proper values for the maximum degree of exactness that we have yearned for.

This shall suffice here. We can understand that these so-called Gauss-Legendre-formulae can’t be expressed in a closed form, but they yield satisfying results in practice. Of course, when there was no cheap computer power available, those formulae were more of an academic exercise, as you couldn’t easily evaluate any given function in those crude points (even though these came in tables) and you can’t re-use those evaluations when you take other values for .

A numerical advantage is that the weights will be positive for the Gaussian quadrature, which is easy to see, as a matter of fact: with the Lagrange-polynomials , that are used for explicit interpolation with polynomials and that have for the evaluation points, we find

As an aside: there are some variants of Gauss-Legendre that may come in useful in some applications. For instance there is the *Gauss-Chebyshev-formula* which uses the weight function in the integration (not in the quadrature – weights and weight functions are not related directly). This means, if your function is of the shape , use Gauss-Chebyshev on the function , which may be easier or more stable; in that case, other and are necessary, of course. Similar things hold for *Gauss-Laguerre* (weight , the integral running on , which makes it possible to integrate on infinite intervals, being a non-trivial task at a first glance) and *Gauss-Hermite* (weight , ). Other variants have certain fixed points of evaluation, which certainly reduces the degree of exactness, but which might help in solving differential equations (when you always evaluate in the point , you save one evaluation as this is your starting-point of your multi-step method; *Gauss-Radau* does this, for instance, and similarly *Gauss-Lobatto* with and fixed).

Obvious linear transformations will enable us to use different values for and than the standard ones for which the various orthonormal polynomials are constructed. Let’s ignore this here.

Those were pretty much the basic facts of what any mathematician should know or at least should have heard once, without much of a proof stated here. Now for a little more specialized material. Who guarantees that the quadrature gets better with increasing numbers of evaluation? Evaluations are expensive, generally speaking, and maybe it doesn’t even do any good (as was the case with the Newton-Cotes-method – or does it)? It seems that Stieltjes first came up with this question and gave partial answers in 1884, while Pólya answered it rather exhaustively a couple of decades later in 1933. We rely on Pólya’s original article “Über die Konvergenz von Quadraturverfahren” (Math. Zeitschr. 37, 1933, 264-286)

**Convergence**

A quadrature method is, for all intents and purposes that are to come, a triangular scheme of weights and evaluation points like

and

For any , we need . Of course we shall combine those triangular schemes like this

and hold up our hope that . If so, the quadrature method *converges* for the funtion . Of course, needs to be at least Riemann-integrable (We are unable to focus on Lebesgue-integrable functions here, as no quadrature method can converge for any Lebesgue-integrable function: there are only countably many evaluation points on which may be altered at will without effect on its integral).

It is clear that the Newton-Cotes-formulae fit in this definition. We can choose the at liberty and the follow from the demand that in stage polynomials up to degree are integrated exactly. In particular, the Newton-Cotes-formulae converge for any polynomial by definition.

But there’s the question, what do the and need to satisfy in order for the quadrature method to converge on a given set of functions (any polynomial, any continuous or even any integrable function)? If you switch to a larger set, what are the additional necessary assumptions?

For polynomials, the case is quite easy. We look for for any , as every polynomial is a linear combination of the and both integration and quadrature are linear functionals. No big deal, and for in particular we find the important assumption .

Pólya’s result that we shall examine in a little more detail is this

**Theorem**: The quadrature method converges for every continuous function, if and only if it converges for every polynomial and there is a constant such that .

**One direction of the proof** is easy. Let and let be any continuous function. We show that the method converges for . We know by Weierstrass’ Approximation Theorem that there are uniformly close polynomials for , in the sense that for every we can get a polynomial with . Thus,

Here, we used that converges for the polynomial and that for sufficiently large . This shows that is convergent.

**The other direction of the proof** is a little harder. We assume that there is no constant such that – therefore . We will construct a continuous function for which diverges.

This will involve the “resonance” that I alluded to earlier: choose a function that reacts as badly as possible on the evaluation points. For fixed take such that and .

This ensures that cannot possibly take a larger value on any function that itself is bounded by . Such may be constructed, for instance, via and linear connections between the evaluation points.

Now, one of two possible cases may occur: *either* there is some with , *or* that expression stays bounded uniformly in . In the first case, we have already found the continuous function which makes the quadrature method diverge and we are done. So, let us look at the second case only.

Let us inductively construct a subsequence of the as follows. Let . If we have got , set

As we have already decided we’re in the second case, will be finite. Now, we wish to choose the next part of the subsequence, and we ensure both and (as , we may find some index to make this sum of weights as large as we like)

With the subsequence at our disposal, let’s define the function

This is a normally convergent series, as each of the is bounded by and thus for sufficiently large :

As a uniformly convergent limit of continuous functions, is continuous. Next, we invoke linearity of quadrature methods again and have a closer look at what the do to :

by the definition of and by the construction of . Then,

which may explain the strange bound that we used in the construction of the subsequence.

Letting , we see that the quadrature method diverges for . The entire construction hinges on the assumption that ; in particular, the condition given in the theorem is necessary for the convergence of the quadrature method.

The theorem is proved. **q.e.d.**

The condition for the convergence on Riemann-integrable functions is rather technical and we won’t dwell on this for too long. The basic ideas of resonance are similar, and in applications there is often good reason to assume that your function is continuous (as opposed to differentiable which you usually can’t know). We’ll content ourselves with the remark that quadrature methods converge on any Riemann-integrable function if and only if they converge for every continuous function and where is a sequence of intervals which contains the and the total length of the intervals converges to .

A little more interesting is the following **counter-example**: There is an analytic function for which the Newton-Cotes-formulae on equidistant grids diverge.

Analytic functions are about the most well-behaved functions you may imagine, and even there the Newton-Cotes-method won’t do much good (as long as they’re computed on equidistant grids). Of course, by the theorem we proved, we can already see why: the weights are not bounded and the basic problem is what happened in Runge’s counterexample; the interpolating polynomials don’t have anything to do with the function that we wish to integrate. As Ouspensky put it:

La formule de Cotes perd tout valeur pratique tant que le nombre des ordonnées devient considérable.

However, the problem is not inherent in Newton-Cotes: if you choose the points of evaluation wisely (as zeroes of Legendre polynomials, or the like, usually clustering them around the endpoints of the interval), the sequence will converge by the theorem that we have proved: the quadrature is then Gaussian, its weights are positive and integrate the constant function properly (which even gives and so ) – hence, it must converge for any continuous (and in particular: analytic) function. The message is: avoid negative weights in quadrature methods and choose wisely whatever you can choose; not only will you avoid trouble in numerical computations, but you may also keep nice convergence properties.

Let us at least sketch the **proofs of this counter-example**.

Let us switch the notation very slightly and fix it like this: take for , as we try to approximate . Then, .

We’ll need two preliminaries. First: Asymptotics on the quadrature weights. Second: An estimate for the interpolating polynomial.

**First preliminary**: Asymptotics on the quadrature weights. As we use Newton-Cotes, the idea is to interpolate and then integrate the corresponding polynomial. The interpolating polynomial can be expressed via the Lagrange polynomials and our idea narrows down to

On the other hand, . Hence, the weights must equal , or to write it down for once

The really troublesome part is an asymptotic formula for the growth of these . Pólya uses all kinds of neat ideas from the theory of the gamma function, far-sighted variable transformations and a lot of computation. This has been done by Ouspensky first (“Sur les valeurs asymptotiques des coefficients de Cotes“; Bull. Amer. Math. Soc., 31, 1925, 145-156), who concluded with the quote given above, that the weights for Newton-Cotes get larger and larger in absolute value and therefore become increasingly useless. The result (which was at least a little generalized and simplified by Pólya) goes like this

with . In particular, a close inspection shows the special case

for even . Let’s keep this guy in mind.

**Second preliminary**: An estimate for the interpolating polynomial. As we wish to say something about analytic functions, we can invoke the pretty results of complex analysis. This can tell us about a representation of the interpolation polynomial. In what follows, is, again, the polynomial with zeroes in the points of evaluation. Of course, in the complex plane interpolation works with the Lagrange polynomials as well:

Some close inspections show:

For the sake of seeing how to invoke the residue theorem, we set and . This leads to

The last equation followed from a little divine intuition – something we can’t avoid in this argument; in order to drop this intuition, we would have needed to start with an eerie expression that would have come from nowhere. So, how does it follow? The function has simple poles in the , while does not (the simple singularities in the cancel out in the numerator to leave us with a polynomial without further zeroes in the ). The Laurent expansion of starts with the , being the residue. Multiplying with , using the Taylor expansion of and taking the limit gives

The argument looks slightly butchered as we read it the wrong way around – but it works nonetheless. Now comes the residue theorem into play, applying it on any reasonable domain that contains the on its interior:

Re-writing this last expression with the help of Cauchy’s integral formula shows

Now, consider a certain domain in the complex plane

(half-circles of radius around and , connected by straight lines), in which is bounded by some constant . Let be analytic in this domain and being interpolated by the polynomial in the points which are located in the interval . For on the boundary of , we are far away from the interval and all in the sense that and , while for in we have . Hence,

Therefore

Let’s keep this one in mind as well.

From all of these preliminaries, let us state **the promised counter-example**: consider the function

with and where the denominator and the starting index are chosen to ensure that the singularity in is removable by a positive constant. This function is actually analytic as the uniform limit of analytic functions (the pretty complex analysis again; and here we need for the summands to actually diminish quickly enough). Besides, it is periodic and one can use this to find its maximum value on the horizontal bounds of the domain: by complex analysis (yet again!) the maximum must be located on the boundary, and if it were on the vertical bounds, we could (by periodicity) shift the domain without losing any information and then contradictively find the maximum on the inside of the domain. Therefore, any summand of which is constructed may be bounded inside the domain by

where the constant helps to encode the minimum of the cosine on (it won’t vanish anywhere near this line).

Now that we have bounded the analytic summand of our function , from our second preliminary, we get

and we shall consume the 3 into the constant without further notice. On top of that, we are about to apply a Newton-Cotes-method; the integral of the interpolating polynomial is equal to what all the quadratures do:

Now we see

since in case , for any relevant and all terms in vanish except for where the denominator vanishes as well and the singularity is removable by taking the value .

This is the time to take our first preliminary out of the drawer:

As , the term in the large brackets converges to 1 ( grows slower than does), and the entire term diverges. In particular, the Newton-Cotes-method diverges for . That’s what we wanted to show. Oof. **q.e.d.**

So, we have found necessary and sufficient conditions for quadrature methods to converge and we dealt with a striking counter-example for a plausible method on a reasonably nice class of functions to make us dismiss this plausible method. However, whenever we spoke about convergence, we didn’t deal with the speed of convergence – and we couldn’t have. As Davis and Rabinovitz have rather un-nervedly put it: Nothing positive can be said about the degree of convergence. Lipow and Stenger (“How slowly can quadrature Formulas converge?“; Math. Comp. 26, 1972, 917-922) have shown that all quadrature methods will converge arbitrarily slowly on the space of continuous functions: for every sequence converging to , there is a continuous function such that .

Before closing this chapter, let’s say that in practical applications it’s hard to know whether your approximation to the integral is sufficient. The convergence theorems don’t speak about the speed of convergence, the error estimates deal with sup-norms of derivatives. Most applications deal with adaptive methods (which we haven’t covered here) to have a closer look where the function might become erratic, and in the end the methods are used with a little hope in the back of the user’s minds that stable methods with a high degree of exactness will do sufficiently well. There can be no certainty, however. But still, it’s some pretty mathematics to spend your time with.