# On numerical integration and convergence issues

Methods for numerical integration are of great practical interest to any mathematician. In any number of applications one is obliged to find the value of some definite integral. One of the simplest examples is the evaluation of probability distributions; a slightly less obvious one is the solution of differential equations.

Especially since the advent of cheap but high-performing computing power, the possibilities have grown immensely. Things that were unattainable a few decades ago are routine today. This is why I found it necessary to familiarize myself with how one can find definite integrals numerically (basically the same motivation for why I wanted to learn about numerical algorithms for finding eigenvalues – and there is a nice connection between the two fields, as we shall see).

At first, I had wanted to compile some of the basic ideas of numerical integration here. However, I dived deeper into one particular aspect of the theory and decided to cut the overview short in order to present something a little less well known on the convergence of quadrature methods. But to begin with, a little review of what every mathematician should learn in a basic course on numerical analysis.

Numerical integration is sometimes called quadrature: one tries to find the area of a given shape (the “area under the curve”) and hence a square with the same area – similar to what one does in “squaring the circle”, which is called “quadratura circuli” in Latin. It’s just that we are no longer restricted to ruler and compass.

## Basics

Of course, the natural way of numerical integration is to evaluate the function $f$ (to fix notation here) at certain points and make a weighted average of the values. This is related to the Riemann-definition of integrability, where you use sums of the form $\sum_{i=1}^nf(\xi_i)(x_{i}-x_{i-1})$ for $\xi_i\in(x_{i-1},x_{i})$ and let the grid $x_i$ get finer. This is a starting point for what we can achieve, but the first question is, how can we choose the $\xi_i$ smartly?

Let’s not delve into this too deeply; the short answer from Riemann integration is: do as you please (as long as $f$ is continuous) – in the end, it will converge. The most common first choice is the aptly-named midpoint-rule (here’s a picture). If you don’t know anything further about $f$, why not. But one soon finds the trapezoidal rule (picture), and there is no general rule for which of the two will be better in a given instance. Let’s write them down for further reference:

Midpoint-rule: $\int_a^b f(t)dt \approx (b-a)f(\frac{a+b}2)$,

Trapezoidal rule: $\int_a^b f(t)dt \approx \frac{b-a}{2}\bigl(f(a)+f(b)\bigr)$.

A certain mixture of those two rules can be found in

Simpson’s rule: $\int_a^b f(t)dt \approx \frac{b-a}{6}\bigl(f(a)+4f(\frac{a+b}2)+f(b)\bigr)$.

All these rules are able to integrate polynomials up to a certain degree exactly (this feature is called the degree of exactness of the quadrature rule): the midpoint and trapezoidal rules are exact for linear polynomials, while Simpson’s rule can even integrate cubic polynomials exactly.
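These exactness properties are easy to check numerically. Here is a minimal Python sketch; the interval $[0,2]$ and the test polynomials are my own arbitrary choices:

```python
def midpoint(f, a, b):
    # one evaluation at the midpoint of [a, b]
    return (b - a) * f((a + b) / 2)

def trapezoid(f, a, b):
    # average of the two boundary values
    return (b - a) / 2 * (f(a) + f(b))

def simpson(f, a, b):
    # weighted mixture of boundary and midpoint values
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

def exact_monomial(k, a, b):
    # the exact value of the integral of t^k over [a, b]
    return (b ** (k + 1) - a ** (k + 1)) / (k + 1)

# midpoint and trapezoid reproduce integrals of linear polynomials exactly,
# Simpson even handles cubics; one degree higher and the rules fail
```

Comparing, say, `simpson(lambda t: t**3, 0, 2)` with `exact_monomial(3, 0, 2)` shows agreement to machine precision, while `t**4` already produces a visible error.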

Each of these rules can be constructed by interpolating $f$ with low-degree polynomials: one chooses the unique polynomial $p$ with $p(t)=f(t)$ at those $t$ that appear in the formulae above and chooses constants (“weights”) such that $\int_a^b p(t)dt = \sum_{i=1}^n w_ip(t_i)$. Note that this will, in general, no longer be a Riemann sum. For $p$ of degree $0$, $1$ or $2$, you end up with the three formulae given above.

Now, if you wish to be a little better in your interpolation, you can subdivide the interval $[a,b]$ a little further and use any of those rules on the smaller intervals (interpolating repeatedly) to get the compound rules:

– Midpoint-rule: $\int_a^b f(t)dt \approx \frac{b-a}{n}\sum_{i=1}^nf(\frac{x_{i-1}+x_{i}}2)$ with $x_k = a+k\frac{b-a}{n}$,

– Trapezoidal rule: $\int_a^b f(t)dt \approx \frac{b-a}{n}\sum_{i=1}^n\frac{f(x_{i-1})+f(x_i)}{2} = \frac{b-a}{n}\Bigl(\frac12f(a)+\sum_{i=1}^{n-1}f(x_i)+\frac12f(b)\Bigr)$ with $x_k = a+k\frac{b-a}{n}$,

– Simpson’s rule: $\int_a^b f(t)dt \approx \frac{b-a}{6n}\bigl(f(a) + 2\sum_{i=1}^{n-1} f(x_i) + 4\sum_{i=1}^nf(\frac{x_{i-1}+x_{i}}{2}) + f(b)\bigr)$ with $x_k = a+k\frac{b-a}{n}$.

Note that we have a constant grid-width here; this doesn’t have to be the case in general, as we shall see. Besides, note that in Simpson’s rule there are three evaluations of $f$ in each sub-interval, and only the boundary points $x_k$ are used twice (once as the left-hand point, once as the right-hand point). One might also define it slightly differently, with overlapping sub-intervals so that each evaluation of $f$ is used twice (once as the midpoint, once as a boundary point), except for $f(a)$ and $f(b)$.
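The compound rules translate directly into code. A minimal Python sketch on the equidistant grid $x_k = a + k\frac{b-a}{n}$; the function names and the test integral $\int_0^\pi \sin t\,dt = 2$ in the usage note are my own choices:

```python
import math

def composite_midpoint(f, a, b, n):
    # midpoint rule on each sub-interval [x_{i-1}, x_i]
    h = (b - a) / n
    return h * sum(f(a + (i - 0.5) * h) for i in range(1, n + 1))

def composite_trapezoid(f, a, b, n):
    # boundary values count half, inner grid points count fully
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (f(a) / 2 + inner + f(b) / 2)

def composite_simpson(f, a, b, n):
    # Simpson's rule on each sub-interval; inner grid points are used twice
    h = (b - a) / n
    boundary = f(a) + f(b)
    inner = 2 * sum(f(a + i * h) for i in range(1, n))
    mids = 4 * sum(f(a + (i - 0.5) * h) for i in range(1, n + 1))
    return h / 6 * (boundary + inner + mids)
```

For instance, `composite_simpson(math.sin, 0, math.pi, 20)` already agrees with the exact value $2$ to about six digits, while the trapezoid rule needs far more sub-intervals for comparable accuracy.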

Now, any basic text on numerical analysis – and Wikipedia – will tell you everything there is to know about the error of these quadrature formulae. Let’s not go there, except for some small remarks. First, each of these rules has an error that diminishes with more sub-divisions of $[a,b]$. The error depends on how smooth $f$ is, in terms of the norm $\left\|f^{(k)}\right\|_\infty$ for some $k\geq1$. Therefore, to know about the error, you need to conjure some information on the derivatives of $f$, which will rarely be possible in applications. And if $f$ is “only” continuous, you may find yourself in a bad position for error estimation: the quadrature rule might happen to evaluate $f$ precisely at unusual points, which are not typical of its overall behaviour:

(here’s an example to go with the picture: take $f$ to be some positive constant $c$, except for a tiny area around each evaluation point, where it shall vanish; then $\int_a^bf(t)dt \approx c(b-a) \neq 0 = \sum_{i=1}^n w_if(x_i)$ – though not impossible, this becomes harder the smoother $f$ is, since the derivatives of $f$ blow up, giving a worse estimate). This effect has been called “resonance” by Pólya.

Apart from ad hoc methods for error estimation there is the method of Peano kernels, which can be applied to bound quadrature errors (and other linear functionals). Let it just be mentioned that this method yields error bounds with rather little effort. I wouldn’t quite call it elegant, possibly because I lack the background in functional analysis, but it gets the job done nicely.

Next natural question: which $n$ should we use? Short answer: the more the better, though many evaluations of $f$ are “expensive” (especially in applications from the sciences and economics, where you might even be unable to get more measurements). Apart from the compound rules, which are used quite often in practice, one might think that interpolation with polynomials of higher degree should work out better and give finer error estimates. The quadrature stemming from this idea is called the Newton-Cotes-formulae (of which our previous rules are special cases). In principle, this is sound reasoning, but here’s the showstopper: interpolating polynomials need not be similar to the function they interpolate, and hence the quadrature runs wild. The most famous example is Runge’s phenomenon: for $f(x)=\frac{1}{1+x^2}$, the sequence of polynomials interpolating on an equidistant grid does not converge pointwise, which can be explained by the singularities at $x = \pm i$ that stay invisible as long as one works over the real numbers only. In particular, an interpolating polynomial of sufficiently high degree won’t tell us anything useful about the behaviour of $f$.
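Runge’s phenomenon is easy to reproduce. A sketch using the customary interval $[-5,5]$ (my choice of degrees; `numpy.polynomial.Polynomial.fit` with full degree performs the interpolation):

```python
import numpy as np

def runge(x):
    # Runge's function 1 / (1 + x^2)
    return 1.0 / (1.0 + x ** 2)

def max_interpolation_error(n):
    # interpolate on n + 1 equidistant nodes in [-5, 5] with a degree-n
    # polynomial and measure the maximal deviation on a fine grid
    nodes = np.linspace(-5.0, 5.0, n + 1)
    p = np.polynomial.Polynomial.fit(nodes, runge(nodes), deg=n)
    x = np.linspace(-5.0, 5.0, 2001)
    return np.max(np.abs(p(x) - runge(x)))
```

Raising the degree makes the maximal error grow instead of shrink: the degree-14 interpolant already overshoots the function (whose values lie in $(0,1]$) by several units near the endpoints.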

For numerical applications, the next problem is that the weights in the Newton-Cotes-formulae become negative when $n\geq 9$ (that is, if you use a polynomial of correspondingly high degree to interpolate $f$ on $[a,b]$). In effect, you will encounter cancellation in your computation from subtracting rather large numbers – there is no guarantee of proper results anymore, even though in theory the method might work fine on your given function. Quadrature formulae with positive weights are called for.
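One can watch the negative weights appear by computing the interpolatory weights directly from the moment conditions $\sum_i w_i x_i^k = \int_0^1 t^k\,dt$. A sketch (solving the Vandermonde system is numerically naive, but fine for small $n$):

```python
import numpy as np

def newton_cotes_weights(n):
    # interpolatory weights on the equidistant nodes i/n of [0, 1]:
    # demand exactness for all monomials t^k, k = 0..n
    x = np.linspace(0.0, 1.0, n + 1)
    V = np.vander(x, increasing=True).T   # V[k, i] = x_i ** k
    moments = 1.0 / np.arange(1, n + 2)   # int_0^1 t^k dt = 1/(k+1)
    return np.linalg.solve(V, moments)
```

For $n=2$ this reproduces Simpson’s rule on $[0,1]$, i.e. the weights $(\frac16, \frac46, \frac16)$; for larger $n$ some weights turn negative and $\sum_i \left|w_{ni}\right|$ starts to grow, even though the weights still sum to $b-a=1$.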

Before discussing the convergence of quadrature formulae, we will look at which methods are best possible in terms of the degree of exactness. It’s Gaussian integration (many practical things are named after Gauß, aren’t they?), which allows for exact integration of polynomials up to degree $2n-1$ from $n$ evaluations.

But first, why will we be content with a degree of exactness $2n-1$, why not go further? Well, there is always at least one polynomial of degree $2n$ that can’t be exactly integrated by a quadrature formula on $n$ evaluations. Let’s fix some quadrature formula; it will look like $\sum_{i=1}^nw_if(x_i)$ with weights $w_i$ and evaluations at $x_i$. Let’s look at the polynomial $\omega^2(x) = \prod_{i=1}^n(x-x_i)^2$ which has its zeroes at the points of evaluation of the quadrature formula. Then, we find

$\displaystyle \sum_{i=1}^nw_i\omega^2(x_i) = 0 < \int_a^b \omega^2(t)dt.$

Hence, there is always a polynomial of degree $2n$ that can’t be integrated exactly, and therefore $2n-1$ is the best we can hope to achieve. Now, how do we get there? Up to now, the quadrature formulae used some given $x_i$ (mostly because those $x_i$ were the natural thing to take, or because we can’t always choose the $x_i$ at liberty, if they stem from experimental measurements, say) and the $w_i$ were computed accordingly so that polynomials up to degree $n-1$ are integrated exactly. But we can of course choose the $x_i$ wisely as well: they act as additional degrees of freedom, and then we have $2n$ clever choices to make – this seems to indicate that we can achieve a degree of exactness of $2n-1$. But can we? And how?

Let’s assume that there is some quadrature formula which has this degree of exactness, and let’s look at what it will do to the polynomials $\omega p$, where $p$ is any polynomial of degree less than $n$. Restricting ourselves to the interval $[a,b] = [-1,1]$ for simplicity, we should find

$\displaystyle 0 = \sum_{i=1}^nw_i\omega(x_i)p(x_i) = \int_{-1}^1\omega(t)p(t)dt.$

That shows that $p$ would be orthogonal to $\omega$, no matter what $p$ actually was (as the $L^2$-space is a Hilbert space, of course, with the integration as its scalar product). From the abstract theory of Hilbert spaces we can know that there are orthonormal bases of polynomials. For our setting here, we are led to use the Legendre polynomials as $\omega$; there are other orthonormal polynomials which will then use another weight function inside the integral – later more on that.

We have seen that the zeroes of the $n$-th Legendre polynomial should be used to evaluate $f$. But how to find those? Here’s the pretty thing I alluded to above. There’s a recursion formula for the Legendre polynomials $P_n$ which can be shown by induction:

$\displaystyle \beta_{n+1}P_{n+1}(x) = xP_n(x) - \beta_nP_{n-1}(x)$, with $\displaystyle P_0\equiv\frac1{\sqrt2}$ and $\displaystyle P_1(x) = \sqrt{\frac32}\,x$,

and using $\beta_n = \frac{n}{\sqrt{(2n-1)(2n+1)}}$. This recursion may be re-written in matrix-vector form:

$\displaystyle\begin{bmatrix}0&\beta_1&&&&\\\beta_1&0&\beta_2&&&\\&\beta_2&0&\beta_3&&\\&&\ddots&\ddots&\ddots&\\&&&\beta_{n-2}&0&\beta_{n-1}\\&&&&\beta_{n-1}&0\end{bmatrix}\begin{bmatrix}P_0(x)\\P_1(x)\\P_2(x)\\\vdots\\P_{n-2}(x)\\P_{n-1}(x)\end{bmatrix}= x\begin{bmatrix}P_0(x)\\P_1(x)\\P_2(x)\\\vdots\\P_{n-2}(x)\\P_{n-1}(x)\end{bmatrix}-\begin{bmatrix}0\\0\\0\\\vdots\\0\\\beta_nP_n(x)\end{bmatrix}$

The last vector drops out for the zeroes of $P_n$, and then we have an eigenvalue-problem. In particular: find the eigenvalues of the matrix on the left-hand-side to find the zeroes of $P_n$. That’s some rather unexpected connection – I found it marvellous when I first saw this.

Computing eigenvalues is a tricky thing to do, as everybody knows, and getting eigenvectors is even harder. It turns out that you’ll need the eigenvectors to compute the proper weights $w_i$ for the quadrature formula, and many intelligent people have come up with ingenious algorithms for both tasks, so that we can obtain the proper values for the maximal degree of exactness that we have yearned for.
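This eigenvalue connection can be made concrete in a few lines. A sketch of the idea (known as the Golub-Welsch algorithm), assuming the recursion coefficients $\beta_k = \frac{k}{\sqrt{(2k-1)(2k+1)}}$ from above; the factor $2$ below is the total mass $\int_{-1}^1 1\,dt$:

```python
import numpy as np

def gauss_legendre(n):
    # symmetric tridiagonal Jacobi matrix built from the beta_k
    k = np.arange(1, n)
    beta = k / np.sqrt((2.0 * k - 1.0) * (2.0 * k + 1.0))
    J = np.diag(beta, 1) + np.diag(beta, -1)
    # eigenvalues = zeroes of P_n = quadrature nodes; the weights come
    # from the first components of the normalized eigenvectors
    nodes, vectors = np.linalg.eigh(J)
    weights = 2.0 * vectors[0, :] ** 2
    return nodes, weights
```

The result matches NumPy’s own `numpy.polynomial.legendre.leggauss`, and with $n$ nodes it integrates polynomials up to degree $2n-1$ exactly, e.g. $\int_{-1}^1 x^8\,dx = \frac29$ already with $n=5$.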

This shall suffice here. We can understand why these so-called Gauss-Legendre-formulae can’t be expressed in closed form, but they yield satisfying results in practice. Of course, when no cheap computing power was available, these formulae were more of an academic exercise, as you couldn’t easily evaluate a given function $f$ at those awkward points $x_i$ (even though the $x_i$ came in tables), and you can’t re-use those evaluations when you take other values of $n$.

A numerical advantage is that the weights will be positive for the Gaussian quadrature, which is easy to see, as a matter of fact: with the Lagrange-polynomials $\ell_{ni}$, that are used for explicit interpolation with polynomials and that have $\ell_{ni}(x_j) = \delta_{ij}$ for the evaluation points, we find

$\displaystyle w_i = \sum_{j=1}^nw_j\ell_{ni}^2(x_j) = \int_{-1}^1\ell_{ni}^2(t)dt > 0.$

As an aside: there are some variants of Gauss-Legendre that may come in handy in certain applications. For instance, there is the Gauss-Chebyshev-formula, which uses the weight function $\frac1{\sqrt{1-x^2}}$ in the integration (not in the quadrature – quadrature weights and weight functions are not directly related). This means: if your function $f$ has the shape $f(x) = \frac{g(x)}{\sqrt{1-x^2}}$, use Gauss-Chebyshev on the function $g$, which may be easier or more stable; in that case, other $x_i$ and $w_i$ are necessary, of course. Similar things hold for Gauss-Laguerre (weight $e^{-x}$, the integral running over $[a,b)=[0,\infty)$, which makes it possible to integrate over infinite intervals – a non-trivial task at first glance) and Gauss-Hermite (weight $e^{-x^2}$, $(a,b)=(-\infty,\infty)$). Other variants fix certain points of evaluation, which certainly reduces the degree of exactness, but which might help in solving differential equations (when you always evaluate at the point $x_1=a$, you save one evaluation, as this is the starting point of your multi-step method; Gauss-Radau does this, for instance, and similarly Gauss-Lobatto with $x_1=a$ and $x_n=b$ fixed).
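NumPy ships nodes and weights for several of these variants. A quick sketch of Gauss-Hermite and Gauss-Chebyshev; the toy integrands are my own choices:

```python
import numpy as np

# Gauss-Hermite: integrates g(x) * exp(-x^2) over (-inf, inf)
x, w = np.polynomial.hermite.hermgauss(20)
# example: int x^2 e^{-x^2} dx = sqrt(pi) / 2
gauss_hermite_x2 = np.sum(w * x ** 2)

# Gauss-Chebyshev: integrates g(x) / sqrt(1 - x^2) over [-1, 1]
xc, wc = np.polynomial.chebyshev.chebgauss(20)
# example with g = 1: int dx / sqrt(1 - x^2) = pi
gauss_chebyshev_1 = np.sum(wc * np.ones_like(xc))
```

Both results are exact up to rounding, since the respective $g$ is a polynomial of degree well below $2n-1$.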

Obvious linear transformations will enable us to use different values for $a$ and $b$ than the standard ones for which the various orthonormal polynomials are constructed. Let’s ignore this here.

Those were pretty much the basic facts that any mathematician should know, or at least have heard once, without much proof stated here. Now for some slightly more specialized material. Who guarantees that the quadrature gets better with an increasing number of evaluations? Evaluations are expensive, generally speaking, and maybe they don’t even do any good (as was the case with the Newton-Cotes-method – or was it?). It seems that Stieltjes first came up with this question and gave partial answers in 1884, while Pólya answered it rather exhaustively a couple of decades later, in 1933. We rely on Pólya’s original article “Über die Konvergenz von Quadraturverfahren” (Math. Zeitschr. 37, 1933, 264–286).

## Convergence

A quadrature method is, for all intents and purposes that are to come, a triangular scheme of weights $w_{ij}$ and evaluation points $x_{ij}$ like

$\displaystyle \begin{matrix}w_{11}\hphantom{,}\\w_{21}, &w_{22}\hphantom{,}\\w_{31},&w_{32},&w_{33}\\\vdots\\w_{n1},&w_{n2},&\ldots&w_{nn}&\\\vdots&\end{matrix}\hspace{2em}$ and $\hspace{2em}\displaystyle \begin{matrix}x_{11}\hphantom{,}\\x_{21}, &x_{22}\hphantom{,}\\x_{31},&x_{32},&x_{33}\\\vdots\\x_{n1},&x_{n2},&\ldots&x_{nn}&\\\vdots&\end{matrix}$

For any $n$, we need $a\leq x_{n1}<x_{n2}<\cdots<x_{nn}\leq b$. Of course we shall combine these triangular schemes like this

$\displaystyle Q_n[f] := \sum_{i=1}^nw_{ni}f(x_{ni})$

and hold up our hope that $\lim_{n\to\infty} Q_n[f] = \int_a^bf(t)dt$. If so, the quadrature method converges for the function $f$. Of course, $f$ needs to be at least Riemann-integrable (we cannot deal with general Lebesgue-integrable functions here, as no quadrature method can converge for every Lebesgue-integrable function: there are only countably many evaluation points, on which $f$ may be altered at will without affecting its integral).

It is clear that the Newton-Cotes-formulae fit in this definition. We can choose the $x_{ij}$ at liberty and the $w_{ij}$ follow from the demand that in stage $n$ polynomials up to degree $n-1$ are integrated exactly. In particular, the Newton-Cotes-formulae converge for any polynomial by definition.

But there’s the question, what do the $x_{ij}$ and $w_{ij}$ need to satisfy in order for the quadrature method to converge on a given set of functions (any polynomial, any continuous or even any integrable function)? If you switch to a larger set, what are the additional necessary assumptions?

For polynomials, the case is quite easy. We look for $\lim_{n\to\infty}\sum_{i=1}^nw_{ni}x_{ni}^k = \frac{1}{k+1}(b^{k+1}-a^{k+1})$ for any $k\geq0$, as every polynomial is a linear combination of the $x^k$ and both integration and quadrature are linear functionals. No big deal, and for $k=0$ in particular we find the important assumption $\lim_{n\to\infty}\sum_{i=1}^nw_{ni} = b-a$.

Pólya’s result that we shall examine in a little more detail is this

Theorem: The quadrature method converges for every continuous function, if and only if it converges for every polynomial and there is a constant $\Gamma$ such that $\sup_n\sum_{i=1}^n\left|w_{ni}\right| < \Gamma$.

One direction of the proof is easy. Let $\sup_n\sum_{i=1}^n\left|w_{ni}\right| < \Gamma$ and let $f$ be any continuous function; we show that the method converges for $f$. By Weierstrass’ approximation theorem there are polynomials uniformly close to $f$: for every $\varepsilon>0$ we can find a polynomial $p$ with $\left\|f - p\right\|_\infty<\varepsilon$. Thus,

$\displaystyle \begin{aligned} \left|\int_a^bf(t)dt - Q_n[f]\right| &\leq \int_a^b\left|f(t)-p(t)\right|dt + \left|\int_a^bp(t)dt - Q_n[p]\right| + \sum_{i=1}^n\left|w_{ni}\right|\left|p(x_{ni})-f(x_{ni})\right| \\ &\leq \varepsilon(b-a) + \varepsilon + \varepsilon\Gamma. \end{aligned}$

Here, we used that $Q_n$ converges for the polynomial $p$, so that for $n$ sufficiently large $\left|\int_a^bp(t)dt - Q_n[p]\right|<\varepsilon$. This shows that $Q_n[f]$ converges to the integral of $f$.

The other direction of the proof is a little harder. We assume that there is no constant $\Gamma$ such that $\sup_n\sum_{i=1}^n\left|w_{ni}\right| < \Gamma$ – therefore $\limsup_n\sum_{i=1}^n\left|w_{ni}\right| = \infty$. We will construct a continuous function $f$ for which $Q_n[f]$ diverges.

This will involve the “resonance” that I alluded to earlier: choose a function that reacts as badly as possible to the evaluation points. For fixed $k$, take $g_k$ such that $\left\|g_k\right\|_\infty = 1$ and $Q_k[g_k] = \sum_{i=1}^k\left|w_{ki}\right|$.

This ensures that $Q_k$ cannot possibly take a larger value on any function that is itself bounded by $1$. Such $g_k$ may be constructed, for instance, via $g_k(x_{kj}) = \mathrm{sgn}\,w_{kj}$ and linear interpolation between the evaluation points.
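The resonance construction is simple enough to spell out. A toy sketch with a hypothetical rule whose mixed-sign weights are my own invention; `np.interp` provides the linear connections:

```python
import numpy as np

# hypothetical quadrature rule on [0, 1] with mixed-sign weights
nodes = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
weights = np.array([0.3, -0.2, 0.8, -0.2, 0.3])

def g(x):
    # resonance function: g(x_i) = sgn(w_i), linear in between, so |g| <= 1
    return np.interp(x, nodes, np.sign(weights))

# Q[g] attains the largest possible value sum |w_i| over all |g| <= 1
resonance = np.sum(weights * g(nodes))
```

Here $Q[g] = 1.8 = \sum_i |w_i|$, although $\|g\|_\infty = 1$ and the exact integral of $g$ is at most $1$ in absolute value.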

Now, one of two possible cases may occur: either there is some $k$ with $\limsup_n\left|Q_n[g_k]\right|=\infty$, or that expression stays bounded uniformly in $k$. In the first case, we have already found the continuous function which makes the quadrature method diverge and we are done. So, let us look at the second case only.

Let us inductively construct a subsequence of the $g_k$ as follows. Let $g_{k_1} = g_1$. If we have got $g_{k_{m-1}}$, set

$\displaystyle M_{m-1}:= \sup_n\left|Q_n\left[\sum_{j=1}^{m-1}g_{k_j}3^{-j}\right]\right|.$

As we have already decided we’re in the second case, $M_{m-1}$ is finite. Now we wish to choose the next member $g_{k_m}$ of the subsequence, and we ensure both $k_m>k_{m-1}$ and (as $\limsup_n\sum_{i=1}^n\left|w_{ni}\right| = \infty$, we may find some index making this sum of weights as large as we like)

$\displaystyle \sum_{i=1}^{k_m}\left|w_{k_mi}\right| > 3^m 2(M_{m-1} + m).$

With the subsequence $g_{k_m}$ at our disposal, let’s define the function

$\displaystyle f(x):=\sum_{j=1}^\infty 3^{-j}g_{k_j}(x).$

This is a normally convergent series, as each of the $g_{k_j}$ is bounded by $1$ and thus for sufficiently large $n$:

$\displaystyle\left\|\sum_{j=n}^\infty 3^{-j}g_{k_j}\right\|_\infty\leq\sum_{j=n}^\infty 3^{-j} = \frac32\cdot 3^{-n}<\varepsilon.$

As a uniformly convergent limit of continuous functions, $f$ is continuous. Next, we invoke linearity of quadrature methods again and have a closer look at what the $Q_n$ do to $f$:

$\displaystyle \begin{aligned} Q_{k_m}[f] &= Q_{k_m}\left[\sum_{j=1}^{m-1}g_{k_j}3^{-j}\right] + 3^{-m}Q_{k_m}\left[g_{k_m}\right] + Q_{k_m}\left[\sum_{j=m+1}^\infty g_{k_j}3^{-j}\right]\\ & > -M_{m-1} + 3^{-m}\sum_{j=1}^{k_m}\left|w_{k_mj}\right| - \sum_{\ell=1}^{k_m}\left|w_{k_m\ell}\right|\sum_{j=m+1}^\infty 1\cdot 3^{-j}, \end{aligned}$

by the definition of $M_{m-1}$ and by the construction of $g_{k_m}$. Then,

$\displaystyle \begin{aligned} Q_{k_m}[f] & > -M_{m-1} + 3^{-m}\sum_{j=1}^{k_m}\left|w_{k_mj}\right| - 3^{-m-1}\sum_{\ell=1}^{k_m}\left|w_{k_m\ell}\right|\cdot\frac1{1-\frac13}\\ & = -M_{m-1} + \frac12\,3^{-m}\sum_{j=1}^{k_m}\left|w_{k_mj}\right|\\ & > m, \end{aligned}$

which may explain the strange bound that we used in the construction of the subsequence.

Letting $m\to\infty$, we see that the quadrature method diverges for $f$. The entire construction hinges on the assumption that $\limsup_n\sum_{i=1}^n\left|w_{ni}\right| = \infty$; in particular, the condition given in the theorem is necessary for the convergence of the quadrature method.

The theorem is proved. q.e.d.

The condition for convergence on Riemann-integrable functions is rather technical, and we won’t dwell on it for too long. The basic resonance ideas are similar, and in applications there is often good reason to assume that your function is continuous (as opposed to differentiable, which you usually can’t know). We’ll content ourselves with the remark that a quadrature method converges for every Riemann-integrable function if and only if it converges for every continuous function and $\limsup_n\sum_{i\in I_n}\left|w_{ni}\right|=0$, where the sum runs over those $i$ for which $x_{ni}$ lies in $I_n$, a sequence of unions of intervals whose total length converges to $0$.

A little more interesting is the following counter-example: There is an analytic function for which the Newton-Cotes-formulae on equidistant grids diverge.

Analytic functions are about the most well-behaved functions you can imagine, and even there the Newton-Cotes-method won’t do much good (as long as it is computed on equidistant grids). Of course, by the theorem we proved, we can already see why: the sums of the absolute weights are not bounded, and the basic problem is what happened in Runge’s counterexample – the interpolating polynomials don’t have anything to do with the function $f$ that we wish to integrate. As Ouspensky put it:

La formule de Cotes perd toute valeur pratique tant que le nombre des ordonnées devient considérable.

(“Cotes’ formula loses all practical value once the number of ordinates becomes considerable.”)

However, the problem is not inherent in Newton-Cotes: if you choose the points of evaluation wisely (as zeroes of Legendre polynomials or the like, usually clustering them near the endpoints of the interval), the sequence will converge by the theorem that we have proved. The quadrature is then Gaussian, its weights are positive and they integrate the constant function $1$ properly (which even gives $\sum_{i=1}^n w_{ni} = b-a$, so $b-a =: \Gamma <\infty$) – hence, it must converge for any continuous (and in particular: analytic) function. The message is: avoid negative weights in quadrature methods and choose wisely whatever you can choose; not only will you avoid trouble in numerical computations, but you will also keep nice convergence properties.
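To see the contrast, one can apply Gauss-Legendre to Runge’s function itself. A sketch (my own toy setup: nodes scaled from $[-1,1]$ to $[-5,5]$, where $\int_{-5}^5\frac{dx}{1+x^2} = 2\arctan 5$):

```python
import numpy as np

def gauss_runge(n):
    # Gauss-Legendre nodes and weights on [-1, 1], scaled to [-5, 5]
    x, w = np.polynomial.legendre.leggauss(n)
    return 5.0 * np.sum(w / (1.0 + (5.0 * x) ** 2))

# exact value of the integral of 1/(1+x^2) over [-5, 5]
exact = 2.0 * np.arctan(5.0)
```

Unlike equidistant Newton-Cotes, the error decays as $n$ grows, exactly as the convergence theorem promises for a rule with positive weights.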

Let us at least sketch the proof of this counter-example.

Let us switch the notation very slightly and fix it like this: take $x_{ni} = \frac in$ for $i=0,\ldots,n$, as we try to approximate $\int_0^1f(t)dt$. Then, $Q_n[f] = \sum_{i=0}^nw_{ni}f(\frac in)$.

We’ll need two preliminaries. First: Asymptotics on the quadrature weights. Second: An estimate for the interpolating polynomial.

First preliminary: Asymptotics on the quadrature weights. As we use Newton-Cotes, the idea is to interpolate and then integrate the corresponding polynomial. The interpolating polynomial can be expressed via the Lagrange polynomials $\ell_{ni}(x) := \prod_{j=0, j\neq i}^n\frac{x-x_{nj}}{x_{ni}-x_{nj}}$ and our idea narrows down to

$\displaystyle \int_0^1f(t)dt\approx \int_0^1p(t)dt = \int_0^1\sum_{i=0}^nf(x_{ni})\ell_{ni}(t)dt = \sum_{i=0}^nf(x_{ni})\int_0^1\ell_{ni}(t)dt.$

On the other hand, $\int_0^1f(t)dt\approx Q_n[f] = \sum_{i=0}^nf(x_{ni})w_{ni}$. Hence, the weights $w_{ni}$ must equal $\int_0^1\ell_{ni}(t)dt$, or to write it down for once

$\displaystyle w_{nk} = \int_0^1\frac{nt\cdot (nt-1)\cdots (nt-k+1)\cdot (nt-k-1)\cdots (nt-n)}{k(k-1)\cdots 1\cdot(-1)\cdots(k-n)}dt.$

The really troublesome part is an asymptotic formula for the growth of these $w_{nk}$. Pólya uses all kinds of neat ideas from the theory of the gamma function, far-sighted variable transformations and a lot of computation. This was first done by Ouspensky (“Sur les valeurs asymptotiques des coefficients de Cotes”; Bull. Amer. Math. Soc. 31, 1925, 145–156), who concluded with the quote given above: the weights for Newton-Cotes get larger and larger in absolute value and therefore become increasingly useless. The result (which was somewhat generalized and simplified by Pólya) goes like this:

$\displaystyle w_{nk} = -\frac{1}{n(\log n)^2}\binom{n}{k}\left(\frac{(-1)^k}{k}+\frac{(-1)^{n-k}}{n-k}\right)(1+\eta_{nk})$

with $\lim_n\sup_k\eta_{nk}=0$. In particular, a close inspection shows the special case

$\displaystyle w_{n,\frac n2} > 2^n n^{-3}$

for even $n$. Let’s keep this guy in mind.

Second preliminary: An estimate for the interpolating polynomial. As we wish to say something about analytic functions, we can invoke the pretty results of complex analysis, which will give us a representation of the interpolating polynomial. In what follows, $\omega$ is, again, the polynomial with zeroes at the points of evaluation. Of course, interpolation with the Lagrange polynomials works in the complex plane as well:

$\displaystyle p(x) = \sum_{i=0}^nf(x_{ni})\prod_{j=0, j\neq i}^n\frac{x-x_{nj}}{x_{ni}-x_{nj}}.$

Some close inspections show:

$\displaystyle \begin{aligned} p(x) &= \sum_{i=0}^nf(x_{ni})\frac{\prod_{j=0, j\neq i}^n(x-x_{nj})}{\prod_{j=0, j\neq i}^n(x_{ni}-x_{nj})}\\ &= \sum_{i=0}^nf(x_{ni})\frac{\prod_{j=0}^n(x-x_{nj})}{(x-x_{ni})\sum_{k=0}^n\prod_{\ell=0, \ell\neq k}^n(x_{ni}-x_{n\ell})}\\ &= \sum_{i=0}^n\frac{\omega(x)f(x_{ni})}{(x-x_{ni})\omega'(x_{ni})}. \end{aligned}$

For the sake of seeing how to invoke the residue theorem, we set $F(t) := \frac{\omega(x)-\omega(t)}{x-t}f(t)$ and $G(t):=\omega(t)$. This leads to

$\displaystyle p(x) = \sum_{i=0}^n\frac{F(x_{ni})}{G'(x_{ni})} = \sum_{i=0}^n\mathrm{res}_{x_{ni}}\left(\frac{F}{G}\right).$

The last equation followed from a little divine intuition – something we can’t avoid in this argument; to drop this intuition, we would have had to start with an eerie expression coming from nowhere. So, how does it follow? The function $\frac1G$ has simple poles at the $x_{ni}$, while $F$ is analytic there (its only potential singularity, at $t=x$, is removable, as the numerator vanishes there). The Laurent expansion of $\frac FG$ around $x_{ni}$ starts with $a_{-1}(x-x_{ni})^{-1}+\cdots$, $a_{-1}$ being the residue. Multiplying by $x-x_{ni}$, using the Taylor expansion of $G$ and taking the limit gives

$\displaystyle \begin{aligned} \mathrm{res}_{x_{ni}}\left(\frac{F}{G}\right) = \lim_{x\to x_{ni}}(x-x_{ni})\frac{F(x)}{G(x)} &= \lim_{x\to x_{ni}}(x-x_{ni})\frac{F(x)}{(x-x_{ni})G'(x_{ni})+\cdots} \\ &= \lim_{x\to x_{ni}}\frac{F(x)}{G'(x_{ni})+\cdots} = \frac{F(x_{ni})}{G'(x_{ni})}. \end{aligned}$

The argument looks slightly butchered, as we read it the wrong way around – but it works nonetheless. Now the residue theorem comes into play; we apply it to any reasonable domain $D$ that contains the $x_{ni}$ in its interior:

$\displaystyle \begin{aligned} p(x) = \sum_{i=0}^n\mathrm{res}_{x_{ni}}\left(\frac{F(\zeta)}{G(\zeta)}\right)&=\sum_{i=0}^n\mathrm{res}_{x_{ni}}\left(\frac{\bigl(\omega(x)-\omega(\zeta)\bigr)f(\zeta)}{(x-\zeta)\omega(\zeta)}\right)\\ &= \frac1{2\pi i}\int_{\partial D}\frac{\omega(x)-\omega(\zeta)}{(x-\zeta)\omega(\zeta)}f(\zeta)d\zeta. \end{aligned}$

Re-writing this last expression with the help of Cauchy’s integral formula shows

$\displaystyle \begin{aligned}p(x) &= \frac1{2\pi i}\int_{\partial D}\frac{\omega(x)f(\zeta)}{(x-\zeta)\omega(\zeta)}d\zeta - \frac1{2\pi i}\int_{\partial D}\frac{f(\zeta)}{x-\zeta}d\zeta \\ &= \frac1{2\pi i}\int_{\partial D}\frac{\omega(x)f(\zeta)}{(x-\zeta)\omega(\zeta)}d\zeta + f(x). \end{aligned}$

Now, consider a certain domain $D$ in the complex plane

(half-circles of radius $1$ around $z=0$ and $z=1$, connected by straight lines), in which $f$ is bounded by some constant $M$. Let $f$ be analytic in this domain $D$ and interpolated by the polynomial $p$ at the points $x_{n0},\ldots,x_{nn}$, which are located in the interval $[0,1]$. For $\zeta$ on the boundary of $D$, we are far away from the interval $[0,1]$ and all the $x_{ni}$, in the sense that $\left|\omega(\zeta)\right|\geq1$ and $\left|x-\zeta\right|\geq1$, while for $x$ in $[0,1]$ we have $\left|\omega(x)\right|<1$. Hence,

$\displaystyle \left|p(x)-f(x)\right|\leq \left|\frac1{2\pi i}\int_{\partial D}\frac{\omega(x)f(\zeta)}{(x-\zeta)\omega(\zeta)}d\zeta\right| \leq \frac1{2\pi}(2+2\pi)M.$

Therefore

$\displaystyle \left|\int_0^1p(t)dt\right|\leq\left|\int_0^1\bigl(p(t)-f(t)\bigr)dt\right|+\left|\int_0^1f(t)dt\right|\leq \frac{\pi+1}{\pi}M+M\leq 3M.$

Let’s keep this one in mind as well.

From all of these preliminaries, let us state the promised counter-example: consider the function

$\displaystyle f(x) := \sum_{k=4}^\infty a^{k!}\frac{\sin(k! \pi x)}{-\cos(\pi x)},$

with $a\in(\frac12, 1)$, where the denominator and the starting index $k=4$ are chosen so that the singularity at $x=\frac12$ is removable with a positive value. This function is actually analytic, as a uniform limit of analytic functions (the pretty complex analysis again; here we need $a<1$ for the summands to shrink quickly enough). Besides, it is periodic, and one can use this to find its maximum on the horizontal parts of the boundary of the domain: by complex analysis (yet again!) the maximum must be located on the boundary, and if it were on the vertical parts, we could (by periodicity) shift the domain without losing any information and then find the maximum in the interior of the domain – a contradiction. Therefore, each summand of which $f$ is built can be bounded inside the domain by

$\displaystyle \left|a^{k!}\frac{\sin(k! \pi x)}{-\cos(\pi x)}\right| \leq Ca^{k!}\exp(k!\pi \mathrm{Im}(x)) \leq Ca^{k!}\exp(k!\pi),$

where the constant $C$ helps to encode the minimum of the cosine on $\mathrm{Im}z=1$ (it won’t vanish anywhere near this line).

Now that we have bounded the analytic summand of our function $f$, from our second preliminary, we get

$\displaystyle \left|\int_0^1 p(t)dt\right| < 3Ca^{k!}\exp(k!\pi)$

and we shall absorb the $3$ into the constant $C$ without further notice. On top of that, we are about to apply a Newton-Cotes-method; the integral of the interpolating polynomial is precisely what the quadrature $Q_n$ yields on each summand:

$\displaystyle \left|\sum_{i=0}^nw_{ni}a^{k!}\frac{\sin(k! \pi \frac in)}{-\cos(\pi \frac in)}\right| = \left|\int_0^1 p(t)dt\right| < Ca^{k!}\exp(k!\pi).$

Now we see

$\displaystyle \begin{aligned}Q_{n!}[f] &= Q_{n!}\left[\sum_{k=n}^\infty a^{k!}\frac{\sin(k!\pi x)}{-\cos(\pi x)}\right]+Q_{n!}\left[\sum_{k=4}^{n-1} a^{k!}\frac{\sin(k!\pi x)}{-\cos(\pi x)}\right]\\ &\geq Q_{n!}\left[\sum_{k=n}^\infty a^{k!}\frac{\sin(k!\pi x)}{-\cos(\pi x)}\right]-\sum_{k=4}^{n-1}Ca^{k!}\exp(k!\pi)\\ &= w_{n!, \frac{n!}{2}}\sum_{k=n}^\infty k! a^{k!} -\sum_{k=4}^{n-1}Ca^{k!}\exp(k!\pi), \end{aligned}$

since for $k\geq n$ we have $\sin(k!\pi x)=0$ at every relevant node $x_{n!,i}=\frac i{n!}$, so all terms in $Q_{n!}$ vanish except the one at $x_{n!,\frac12n!}=\frac12$, where the denominator vanishes as well and the removable singularity takes the value $k!$.

This is the time to take our first preliminary out of the drawer:

$\displaystyle \begin{aligned} Q_{n!}[f] &> \frac{2^{n!}}{(n!)^3}\, n!\, a^{n!} - (n-4)C(ae^\pi)^{(n-1)!}\\ &= \frac{(2a)^{n!}}{(n!)^2}\left(1-\bigl(n!\bigr)^2(n-4)C\left(\frac{ae^\pi}{(2a)^n}\right)^{(n-1)!}\right)\end{aligned}$.

As $a>\frac12$, the term in the large brackets converges to $1$ (the factor $(n!)^2(n-4)$ grows far slower than $\bigl(\frac{(2a)^n}{ae^\pi}\bigr)^{(n-1)!}$ does), and the entire expression diverges. In particular, the Newton-Cotes-method diverges for $f$. That’s what we wanted to show. Oof. q.e.d.

So, we have found necessary and sufficient conditions for quadrature methods to converge, and we dealt with a striking counter-example in which a plausible method fails on a reasonably nice class of functions. However, whenever we spoke about convergence, we didn’t deal with the speed of convergence – and we couldn’t have. As Davis and Rabinowitz rather soberly put it: nothing positive can be said about the degree of convergence. Lipow and Stenger (“How slowly can quadrature formulas converge?”; Math. Comp. 26, 1972, 917–922) have shown that all quadrature methods converge arbitrarily slowly on the space of continuous functions: for every sequence $\varepsilon_n$ converging to $0$, there is a continuous function $f$ such that $\int_0^1f(t)dt - Q_n[f] \geq \varepsilon_n$.

Before closing this chapter, let’s note that in practical applications it’s hard to know whether your approximation to the integral is sufficient. The convergence theorems don’t speak about the speed of convergence, and the error estimates deal with sup-norms of derivatives. Most applications use adaptive methods (which we haven’t covered here) to look more closely where the function might become erratic, and in the end the methods are applied with a little hope in the back of the user’s mind that stable methods with a high degree of exactness will do sufficiently well. There can be no certainty, however. But still, it’s some pretty mathematics to spend your time with.