On approximating functions – Part One

Let us have a closer look at the Stone-Weierstrass-Theorem. It is a generalization of Weierstrass’ classical approximation theorem on polynomials which stands at the base of approximation theory. Stone took Weierstrass’ classical statement and rendered it more abstractly, so it can be used for a number of things in proofs about continuous functions. In practice, I have rarely seen it used in its pure form – maybe because it lacks a direct statement about the approximation error, and because you can base more powerful statements upon it.

Let us state the Stone-Weierstrass-Theorem in its pure form first, and its two most useful corollaries next, before we turn to different methods to prove them and to a deeper rendition of their significance.

We shall denote by \mathcal{C}(K) the set of continuous complex-valued functions on the set K, and by \mathcal{C}_{\mathbb{R}}(K) the set of real-valued continuous functions on the set K.

Remember that an algebra \mathcal{A} over a field f is a set which satisfies the following conditions:

  • for any f\in \mathcal{A} and any \alpha\in F, \alpha f\in \mathcal{A}, and
  • for any f, g\in \mathcal{A}, f+g\in \mathcal{A} and f\cdot g\in \mathcal{A}.

Also, we shall call a family of functions on the set x separating, if for any x, y\in X, with x\neq y, there is some f\in \mathcal{A} such that f(x)\neq f(y).


Stone-Weierstrass-Theorem (real version): Let K\subset\mathbb{R} be a compact set and let \mathcal{A}\subset\mathcal{C}_{\mathbb{R}}(K) a separating algebra with the constant function 1\in\mathcal{A}.

Then, \mathcal{A} is dense in \mathcal{C}_{\mathbb{R}}(K) in the topology of uniform convergence.

Equivalently, \overline{\mathcal{A}} = \mathcal{C}_{\mathbb{R}}(K). Equivalently, for every \varepsilon>0 and every f\in\mathcal{C}_{\mathbb{R}}(K), there is some g\in\mathcal{A} with \left\|f-g\right\|_\infty<\varepsilon.


Stone-Weierstrass-Theorem (complex version): Let K\subset\mathbb{C} be a compact set and let \mathcal{A}\subset\mathcal{C}(K) a separating algebra with the constant function 1\in\mathcal{A} and for any g\in\mathcal{A} let \overline g\in\mathcal{A}.

Then, \mathcal{A} is dense in \mathcal{C}(K) in the topology of uniform convergence.


Corollary: Weierstrass’ classical Theorem: For every continuous real-valued function f on an interval [a,b] there is a sequence of polynomials p_n which converges uniformly to f on [a,b].


Corollary: On trigonometric polynomials: For every continuous 2\pi-periodic real-valued function f there is a sequence of trigonometric polynomials \sum_{j=0}^n \alpha_j\cos(jx)+\beta_j\sin(jx) which converges uniformly to f.


In this second corollary, the objects don’t look like polynomials at all, but in the proof we will give a hint on why the series can be called that. It has to do with the representation (e^{iz})^j.

The conditions in the theorem are quite natural. You need some sort of richness in your set of approximating functions, this is achieved by the separation: if there were two points x\neq y and you couldn’t find a separating f\in\mathcal A, then how could you properly approximate any given continuous function which takes different values at x and y? You could never, well, separate the values with what the algebra provides you. Hence, this is obviously a necessary condition. As Hewitt/Stromberg put out nicely, this obviously necessary condition is also sufficient in the real case – that makes some of the beauty and elegance of the theorem.

In the complex version, there is some sort of extra-richness that we must provide for the theorem to hold. We obviously can’t do without that: Let us think of the family of polynomials on the unit disk in \mathbb{C}. They are an algebra, alright, they contain the continuous functions, and they are separating (even the identity polynomial f(x)=x can do that). But they are holomorphic as well, and the uniform limit on a compact set of holomorphic functions will be holomorphic again (this follows very naturally from Cauchy’s Integral Formula) – but certainly, there are continuous functions on the unit disk which are not holomorphic, so there needs to be an extra assumption in the Stone-Weierstrass-Theorem. And it follows very smoothly, that the lacking assumption can already be met by demanding that complex conjugates be contained in \mathcal{A} as well. Let us show this, why not:

Proof of the complex version basing on the real version

The real and imaginary part of any g\in\mathcal{A} can be represented as

\displaystyle\mathrm{Re}g=\frac{g+\overline g}{2}


\displaystyle\mathrm{Im}g=\frac{g-\overline g}{2i},

which means that \mathrm{Re}g, \mathrm{Im}g \in \mathcal{A}, by assumption. Moreover, if x, y\in K and x\neq y, then by separation there is some g\in\mathcal{A} which gives us g(x)\neq g(y), and that means at least \mathrm{Re}g(x)\neq\mathrm{Re}g(y) or \mathrm{Im}g(x)\neq\mathrm{Im}g(y). So the family of the real parts of the functions in \mathcal{A} is separating and thus meets the conditions of the real version of the theorem, and so does the family of imaginary parts. The theorem yields that either one of these families is dense in \mathcal{C}_{\mathbb{R}}(K).

Hence, for any f\in\mathcal{C}(K) and for any \varepsilon>0, there are some g_0, g_1\in\mathcal{A} such that

\displaystyle\left\|\mathrm{Re}f - g_0\right\|_\infty < \frac\varepsilon2


\displaystyle\left\|\mathrm{Im}f - g_1\right\|_\infty < \frac\varepsilon2.


\begin{aligned}\displaystyle\left\|f - (g_0+ig_1)\right\|_\infty &= \left\|\mathrm{Re}f - g_0 + i(\mathrm{Im}f - g_1)\right\|_\infty \\    &\leq \left\|\mathrm{Re}f - g_0\right\|_\infty + \left\|\mathrm{Im}f - g_1\right\|_\infty \\ &< \varepsilon.    \end{aligned}

Since g_0+ig_1\in\mathcal{A}, the complex version is proved. q.e.d.


While we’re at it, let’s give the proofs for the corollaries now.

Proof of Weierstrass’ classical Theorem

That’s rather simple: the set of polynomials is obviously an algebra, and it’s separating because you can find the identity polynomial g(x)=x in it. The conditions of the theorem are met – the corollary follows. q.e.d.


Proof of the statement on trigonometric polynomials

This follows from the complex version. Let us look at some 2\pi-periodic function f:\mathbb{R}\to\mathbb{R}. The domain of f is not compact, but all that matters is the compact interval [0,2\pi].

Now consider the set of functions \sum_{j=-n}^nc_jz^j with z\in [0,2\pi], which obviously form a separating algebra. The conditions of the real version are thus met. Those functions can approximate the function f on [0,2\pi] uniformly. We are interested in trigonometric polynomials, however, and so far, for any \varepsilon>0, we’ve only found some appropriate n and c_j with

\displaystyle\left\|f(z) - \sum_{j=-n}^nc_jz^j\right\|_\infty < \varepsilon.

Of course, that’s not news. We already knew that c_j=0 for j<0 from the real version. But we can now consider the bijection z\mapsto e^{iz}, which transforms our setting to the compact complex set S^1. Let us take g(e^{iz}):=f(z) which maps g:S^1\to\mathbb{R}. We try to apply the complex version to this function g and this time we use the approximating functions \sum_{j=-n}^nc_je^{ijz} (in order to show the resemblance with the approximating functions for f, they can also be written like \sum_{j=-n}^nc_k(e^{iz})^j). They are still an algebra containing constant functions, they are separating because e^{iz} is a bijection on S^1, and now we need to ensure the extra assumption on complex conjugation – but that’s not a problem because of \overline{e^{ijz}} = e^{-ijz}. The complex version thus holds and we can find some appropriate n and c_j with

\displaystyle\left\|g(e^{iz}) - \sum_{j=-n}^nc_je^{ijz}\right\|_\infty < \varepsilon.

Now, we can represent c_j = a_j+ib_j with a_j,b_j\in\mathbb{R}, and e^{ijz} = \cos(jz)+i\sin(jz), which yields

\displaystyle\left\|g(e^{iz}) - \sum_{j=-n}^n\bigl(a_j\cos(jz)-b_j\sin(jz)\bigr) - i\sum_{j=-n}^n\bigl(a_j\sin(jz)+b_j\cos(jz)\bigr)\right\|_\infty < \varepsilon,

but we had chosen g(e^{iz})=f(z) and this is real by assumption (no imaginary part). So:

\displaystyle\left\|f(z) - \sum_{j=-n}^n\bigl(a_j\cos(jz)-b_j\sin(jz)\bigr)\right\|_\infty < \varepsilon.

Finally, we employ the relations \cos(-x) = \cos(x) and \sin(-x) = -\sin(x) to find

\displaystyle\left\|f(z) - a_0 - \sum_{j=1}^n \bigl((a_j+a_{-j}) \cos(jz) - (b_j-b_{-j})\sin(jz)\bigr)\right\|_\infty < \varepsilon.

By appropriate definitions of the coefficients \alpha_j and \beta_j, we have proved the corollary:

\displaystyle\left\|f(z) - \sum_{j=0}^n \bigl(\alpha_j\cos(jz)+\beta_j\sin(jz)\bigr)\right\|_\infty < \varepsilon. q.e.d.


In a way, that was surprisingly non-smooth for the proof of a corollary. But the Fourier series people are thankful for a proof like that. We shall see a sketch of another proof of that later, but this one here does without explicit computations of integrals and fuzzing around with sine-cosine-identities.

Up until now, everything has hinged on the real version of the Stone-Weierstrass-Theorem. In order to give a self-contained proof of this, we are going to need several lemmas and a little work. We take the proof from Königsberger’s book on Calculus (it’s very similar to the proof given in Heuser’s book). Let us start with the


The square root Lemma: Define the generalized binomial coefficient via \binom{\alpha}{n}:=\frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}. Then we have the generalized binomial formula (1+x)^\alpha=\sum_{k=0}^\infty\binom{\alpha}{k}x^k for any \left|x\right|<1.

For \alpha>0, uniform and absolute convergence even holds for any \left|x\right|\leq1

In particular, for \alpha=\frac12 and any x\in[-1,1],

\displaystyle\sqrt{1+x} = \sum_{k=0}^\infty\binom{\frac12}{k}x^k.


Proof: One part is easy, if we apply the approriate machinery. Since (1+x)^\alpha is holomorphic with a possible singularity at x=-1, we can find its absolutely convergent power series in the point x=0 with a radius of convergence at least 1:

\displaystyle (1+x)^{\alpha} = \sum_{k=0}^\infty \left.\frac{d^k}{dx^k}(1+x)^{\alpha}\right|_{x=0} \frac1{k!}x^k,\qquad\left|x\right|<1.

But the derivatives in x=0 are easily found to be \frac{d^k}{dx^k}(1+x)^{\alpha} = \prod_{j=0}^{k-1}(\alpha-j). This proves the series representation.

This can also be shown in an elementary way, as seen in Hewitt/Stromberg.

We still need to prove something about convergence on the boundary of the circle for \alpha>0. Let us consider the series \sum_{k=0}^\infty\left|\binom{\alpha}k\right|. We have, for K\geq\alpha:

\displaystyle\frac{\left|\binom{\alpha}{k+1}\right|}{\left|\binom{\alpha}k\right|} = \frac{\left|\alpha-k\right|}{k+1} = \frac{k-\alpha}{k+1}.

In particular, for those large K:

\displaystyle k\left|\binom{\alpha}k\right|-(k+1)\left|\binom{\alpha}{k+1}\right| = \alpha \left|\binom{\alpha}k\right| > 0

The sequence K\left|\binom{\alpha}k\right| is thus eventually decreasing in K, it’s bounded and therefore has a limit \gamma\geq0. Now look at the telescoping series

\displaystyle \sum_{k=0}^\infty \left(k\left|\binom{\alpha}k\right| - (k+1)\left|\binom{\alpha}{k+1}\right|\right),

whose p-th partial sum is -(p+1)\left|\binom{\alpha}{p+1}\right|, which converges to $-\gamma$ (however, only the first few terms of it are negative, which gives the negative limit). So, the telescoping series converges, and we find

\begin{aligned}\displaystyle \sum_{k=0}^\infty\left|\binom{\alpha}k\right| &= \sum_{k=0}^{\left\lfloor\alpha\right\rfloor}\left|\binom{\alpha}k\right| + \sum_{k=\left\lfloor\alpha\right\rfloor+1}^\infty\left|\binom{\alpha}k\right| \\    &= C_1 + \frac1\alpha\sum_{k=\left\lfloor\alpha\right\rfloor+1}^\infty\left(k\left|\binom{\alpha}k\right| - (k+1)\left|\binom{\alpha}{k+1}\right|\right) \\    &= C_1 + C_2.    \end{aligned}

This shows that the series \sum_{k=0}^\infty\left|\binom{\alpha}k\right| is convergent. For in x\in[-1,1], we are actually interested in

\displaystyle \left|\sum_{k=0}^\infty\binom{\alpha}{k}x^k\right|\leq \sum_{k=0}^\infty\left|\binom{\alpha}{k}\right|,

which is convergent. The series thus defines a uniformly and absolutely convergent function on the compact interval [-1,1].

The fact that (1+x)^\alpha = \sum_{k=0}^\infty\binom{\alpha}{k}x^k even for x=-1 and x=1 follows by continuity on [-1,1] and by equality of both sides on (-1,1). q.e.d.


The Closure Lemma: Let \mathcal A be an algebra of continuous functions on a compact set and \overline{\mathcal A} its closure in the topology of uniform convergence. Then, for any f,g\in\overline{\mathcal A}:

f+g, fg, \left|f\right|, \max(f,g) and \min(f,g) are in \overline{\mathcal A}.


Proof: Let \varepsilon>0. There are p,q\in\mathcal A with \left\|f-p\right\|_\infty<\varepsilon and \left\|g-q\right\|_\infty<\varepsilon. Therefore,

\displaystyle\left\|f+g-(p+q)\right\|_\infty \leq \left\|f-p\right\|_\infty + \left\|g-q\right\|_\infty < 2\varepsilon,


\begin{aligned}    \displaystyle\left\|fg-pq\right\|_\infty &= \left\|fg-fq+fq-pq\right\|_\infty \\    &\leq \left\|f-p\right\|_\infty\left\|q\right\|_\infty + \left\|f\right\|_\infty\left\|g-q\right\|_\infty \\    &\leq \left\|f-p\right\|_\infty\bigl(\left\|g\right\|_\infty+\left\|g-q\right\|_\infty\bigr) + \left\|f\right\|_\infty\left\|g-q\right\|_\infty\\    &\leq \varepsilon\bigl(\left\|g\right\|_\infty + \varepsilon + \left\|f\right\|_\infty\bigr) < C\varepsilon.    \end{aligned}

The right-hand side can be made arbitrarily small, which proves the first two statements.

To deal with \sqrt{f}, we employ the compactness of the set: f will be bounded. We can therefore consider the function \phi = \frac{f}{\left\|f\right\|_\infty} taking values in [-1,1]. Note that the constant function \left\|f\right\|_\infty^{-1} belongs to \mathcal A, and since f\in\overline{\mathcal A}, \phi\in\overline{\mathcal A}. By the Square root Lemma,

\displaystyle\left|\phi(x)\right| = \sqrt{\left|\phi(x)\right|^2} = \sqrt{1+\bigl(\phi^2(x)-1\bigr)} = \sum_{k=0}^\infty\binom{\frac12}{k}\bigl(\phi^2(x)-1\bigr)^k,

with an absolutely and uniformly convergent series representation. There is some N_0 with

\displaystyle \left\|\left|\phi(x)\right| - \sum_{k=0}^{N_0}\binom{\frac12}{k}\bigl(\phi^2(x)-1\bigr)^k\right\|_\infty < \frac\varepsilon2.

By the arguments given above, together with induction, one sees that the partial sum up to N_0 belongs to \overline{\mathcal A}, if \phi does. Hence, there is some r\in\mathcal A with

\displaystyle\left\|r - \sum_{k=0}^{N_0}\binom{\frac12}{k}\bigl(\phi^2(x)-1\bigr)^k\right\|_\infty < \frac\varepsilon2.

This yields

\begin{aligned}    \displaystyle \bigl\|\left|\phi(x)\right| - r\bigr\|_\infty &\leq \left\|\left|\phi(x)\right| - \sum_{k=0}^{N_0}\binom{\frac12}{k}\bigl(\phi^2(x)-1\bigr)^k\right\|_\infty + \left\|\sum_{k=0}^{N_0}\binom{\frac12}{k}\bigl(\phi^2(x)-1\bigr)^k - r\right\|_\infty \\    &< \frac\varepsilon2+\frac\varepsilon2 = \varepsilon.    \end{aligned}

Thus, \left|\phi\right|\in\overline{\mathcal A}, and therefore \left|f\right|\in\overline{\mathcal A}.

Finally, we apply the equalities

\displaystyle \max(f,g) = \frac12\bigl(f+g\left|f-g\right|\bigr),\qquad \min(f,g) = \frac12\bigl(f+g-\left|f-g\right|\bigr),

to show the final statements. q.e.d.


Note that we need \overline{\mathcal{A}} to find these functions, even if f,g\in\mathcal{A}. The algebra \mathcal{A} itself is not rich enough, in general, to contain the functions in the statement of the Closure Lemma. Besides, we have made frequent use of \overline{\overline{\mathcal{A}}}=\overline{\mathcal{A}}, which is at the heart of why the Closure Lemma works.


The Approximation Lemma: Let \mathcal A\subset\mathcal{C}_{\mathbb{R}}(K) be a separating algebra and let f\in\mathcal{C}_{\mathbb{R}}(K), x\in K, \varepsilon>0. Then, there is some q_x\in\overline{\mathcal A} with the properties

q_x(x) = f(x) and q_x(y) \leq f(y)+\varepsilon for any y\in K.


Proof: First of all, for any z\in K there is some function h_z\in\mathcal A with h_z(x)=a and h_z(z) = b. Take

\displaystyle h_z(y) = (b-a)\frac{g(y)-g(x)}{g(z)-g(x)}+a

with an appropriate g\in\mathcal{A} for instance (separation allows for this to be well-defined: pick g such that g(z)\neq g(x)). In particular, one can choose a=f(x), b=f(z), which makes h_z coincide with f at least in the points x and z.

Since h_z and f are continuous, there is some interval I_z around z, with h_z(y)\leq f(y)+\varepsilon for all y\in I_z. Now, any point z\in K is at least contained in the interval I_z, and by compactness, finitely many of those intervals suffice to cover K. Say, K\subset\bigcup_{i=1}^nI_{z_i}. Then consider

\displaystyle q_x:=\min(h_{z_1},\ldots,h_{z_n}),

which is in \overline{\mathcal A} by the Closure Lemma. Besides, q_x(x) = \min\bigl(h_{z_1}(x),\ldots,h_{z_n}(x)\bigr) = \min\bigl(f(x),\ldots,f(x)\bigr) = f(x) by construction, and for every y\in K, there is some j, such that y\in I_{z_j}, and so q_x(y) = \min\bigl(h_{z_1}(y),\ldots,h_{z_n}(y)\bigr) \leq h_{z_j}(y) \leq f(y)+\varepsilon. q.e.d.


Proof of the Stone-Weierstrass-Theorem, real version: For any x\in K, choose some q_x\in\overline{\mathcal A} as in the Approximation Lemma (in particular: q_x(x)=f(x)). Then, by continuity, pick an open interval U_x around x, such that for any of the y\in U_x\cap K,

\displaystyle q_x(y) \geq f(y)-\frac\varepsilon2.

By compactness, finitely many of the U_{x_i} suffice to cover K. Consider

\displaystyle g:=\max(q_{x_1},\ldots,q_{x_n}),

which is in \overline{\mathcal A} by the Closure Lemma. For any z\in K, we find some i with z\in U_{x_i}, and thus

\displaystyle g(z) = \max\bigl(q_{x_1}(z),\ldots,q_{x_n}(z)\bigr) \geq q_{x_i}(z) \geq f(z)-\frac\varepsilon2.

By the Approximation Lemma, however,

\displaystyle g(z) = \max\bigl(q_{x_1}(z),\ldots,q_{x_n}(z)\bigr) \leq \max\bigl(f(z)+\varepsilon,\ldots,f(z)+\varepsilon\bigr) = f(z)+\varepsilon.

Now, since g\in\overline{\mathcal A}, we can pick some p\in\mathcal A with \left\|g-p\right\|_\infty<\varepsilon. Then, we have found

\displaystyle\left\|f-p\right\|_\infty\leq\left\|f-g\right\|_\infty+\left\|g-p\right\|_\infty < \varepsilon+\varepsilon = 2\varepsilon.

The Stone-Weierstrass-Theorem is proved. q.e.d.


A slightly different proof is given in Hewitt/Stromberg. The idea is about the same, but the compactness argument works on the image of f, w.l.o.g. the interval [0,1]. Here, we have used the compactness of the definition space K. Of course, Hewitt/Stromberg can’t do without the compactness of K, since they need that f takes its maximum and minimum value. The downside of their approach is that they need to iterate their argument, so their approximating function is actually a series of those max-min-functions that we have used. But there are not too many differences in the proofs, so let’s just stick to that. I have actually found the proof given above a little slicker; but I have come to admire the book by Hewitt/Stromberg for its clarity and style. There are several nice things hidden inside there to come back to.

For now, we’ll stop. But there’s some more backstory about the Stone-Weierstrass-Theorem to come soon.