The last time, we have proved the Stone-Weierstass-Theorem on how to find approximating functions to any continuous function on a compact interval only using a certain algebra of “easy” functions – such as polynomials.

Now that the hard, abstract work is done, we shall note that there are other, less abstract ways to prove Weierstrass’ Approximation Theorem. I am unaware of a less abstract method for Stone-Weierstrass, in particular the method given above is non-constructive. You could go through most of these things, but the way that we used compactness is not constructable in general – how do you choose the finite covering, as you only know there should exist one?

Let us therefore look at Bernstein Polynomials, which are associated with a given function on :

One can prove:

**Proposition:** Let be continuous. Then the Bernstein Polynomials converge uniformly to .

**Proof:** Since is continuous on a compact interval, it is also uniformly continuous. Let . Then, there is some with if . Besides, for any ,

where we have used the standard fact that

which will (if used again and together with uniform continuity) yield

So, we have split our problem in two parts: in the first part, if we want to approximate well, it’s easy because an interpolation point for is close by. If this is not the case, the interpolation point is far away, meaning:

On top of that, by continuity, will be bounded by a constant , say. This gives:

As far as ideas are concerned, we’re done. All that is left to do is to shove terms and binomial coefficients from left to right and to find out that they behave properly. We won’t do that here, as it’s not particularly hard. One will find:

if is chosen sufficiently large. **q.e.d.**

There is a variant of the proof which avoids some of these messier calculations that we didn’t pull through above. Well, this variant doesn’t avoid the calculations, it rather shoves them away to another concept: the Weak Law of Large Numbers. We rely on Klenke’s book on Probability for this (but the idea is due to Bernstein).

**Alternative Proof:** Let , and let be independent random variables with a Bernoulli distribution with parameter (the random experiment of throwing one coin and its chance of landing on heads is ). Then, the sum has a binomial distribution . Therefore,

We find, again with ,

and hence

By the Chebyshev Inequality (which is just shy from using the weak law of large numbers itself), since and , we have

Hence,

if is taken large enough. **q.e.d.**

These two have been more direct ways of proving the classical Weierstrass Approximation Theorem. It is what Werner’s book on Functional Analysis does, and so does the treatment of Körner’s on Fourier Series. The pro side is, it’s a direct and simple proof. The downside is, however, you won’t get the stronger theorems given at the start of this text; it doesn’t even say anything about trigonometric polynomials.

But, there’s a slight hope: actually, in the non-probabilistic approach, we only needed to prove that Bernstein Polynomials can deal well with the monomials , and (that’s the part that we have avoided, since there is some computation to be done). Maybe one can generalize this one – and one can: this is Korovkin’s approach.

**Theorem (Korovkin):** Let a sequence of positive linear operators, and let be a test set that satisfies certain conditions on positivity and on its zero set.

If for any , we have , then we also have for all .

Remember that an operator is just a function on a function space (plug a function in, get a function out). It is linear by the usual definition of that term, and by positive one means as long as everywhere it is defined.

Korovkin tells us that you only need to approximate certain functions well with a cleverly chosen operator, to make the classical Weierstrass-Theorem work. This is what can be achieved by Bernstein Polynomials (they are a sequence of operators which only use finitely many values of a function) and the test set . You can also prove the Theorem on trigonometric polynomials with the operator chosen as the Fejér-kernel and the test set . We don’t delve into this here, especially not in the test set ; let’s just say that there is one and in the given examples you can’t make it smaller. In the detailed proof, there is some computation required and you can actually learn many things about Fourier series (so go into it; for instance in the books by Werner on Functional Analysis, or in Hämmerlin’s book on Numerics – however, to place a quote by Gauss in here: Most readers may well wish in some places for greater clarity in the presentation).

We won’t prove that here because for the main applications there is not much to gain on top of what we already have. But the approach has a certain elegance to it which is especially captured by functional analysts. I am not aware of generalizations of its applicability, I’d be very interested to hear about them.

As a final alternate proof of the full real version of Stone-Weierstrass Theorem (not just Weierstrass’ Theorem), we shall look at an elementary approach suggested by Brosowski and Deutsch in 1981 (see the original paper here). There is no dealing with convergent series, operators and the like; the proof only hinges on definitions of compactness and continuity and three easily proved ideas: closed subsets of compact spaces are compact, continuous positive functions have a positive minimum on compact sets, and the Bernoulli inequality for any and . We will give two lemmas before we are able to prove the real version of Stone-Weierstrass.

The proof will actually show a little bit more, since we need not suppose that . In fact, Stone-Weierstrass holds for any real-valued function on a compact topological space (it’s just that we didn’t make a fuss about that earlier on – most applications deal with functions on , as a matter of fact).

**The pointwise nearly indicator Lemma:** Let and let be a neighborhood of . Then, for any there is some neighborhood of with and some function such that

- for ,
- for ,
- for .

The lemma got its name because serves as an indicator function (or characteristic function for the non-probabilists out there) for the set . Of course, the algebra is not rich enough for a proper indicator function, since that wouldn’t be continuous. But the algebra approximates this indicator of , and we can guarantee the function to nearly vanish in some neighborhood of the given point . We’ll improve on that soon.

**Proof:** First of all, we need to find a function that takes its values only in the interval and which already has a controllably small value in . What easier way to pick that small value as ?

For any , by separation, there is some function with . Besides, the function is in and of course . Even the function is in , we have , and .

That was the first important step. vanishes at and it doesn’t vanish in some point outside of . Now, we need to push the “doesn’t vanish” part to make our function take values near , uniformly outside of .

Consider the set . This is a neighborhood of , hence its name. We can look at any of those , whenever . Now, is compact (it’s a closed subset of a compact space), and hence finitely many will suffice to cover , let’s say . The function is in , we still have , and for any .

Because of compactness, must take a positive minimum on the compact set , let’s call it . Then, for this , we look at the set . Obviously, and is a neighborhood of .

Let us look at the integer . Thus, and . In total, . We can therefore, for every , define the functions , which, of course, are in , which have and which have .

Let . Then, and Bernoulli’s inequality (used with and ) shows . Therefore, , and this limit is uniform in .

Let . Then, and Bernoulli’s inequality shows

and this converges to uniformly.

In particular, for any , we can find some large enough such that on and on . Now, to conclude the proof, pick (remember that was a neighborhood of by construction) and . **q.e.d.**

This one felt quite chore-ish. It was a little messy and technical, but note the necessary tools: no operator theory, no convergence of series, just stuff from high school calculus and the very basics of the theory of compact sets. That would seem like a value in and of itself.

**The setwise nearly indicator Lemma:** Let disjoint and closed. Let . Then, there is some with

- for ,
- for ,
- for .

Obviously, this lemma improves on the first one, we have nearly an indicator function on which nearly vanishes on . In this respect, it resembles Urysohn’s Lemma from elementary topology, but Urysohn isn’t restricted to the algebra and can hence do even a little better. Never mind.

**Proof:** This is going to be a lot quicker: you know the drill. Let . Let and let . We can choose neighborhoods as in the previous Lemma. Of course, since is compact, finitely many of those suffice to cover , let’s say . For each of the , there is a function such that , for all and for all .

The function will now do the trick: Of course, we have .

Let , in particular , say. Then

Now, let . Then Bernoulli tells us that . Note that we need here (which is not a very hard restriction anyway – nevertheless, we included some condition like that in the assertion).

That’s all we needed to show. **q.e.d.**

Note that the notion of compactness has been used rather slightly differently from the very first proof we gave for Stone-Weierstrass. Then, we had the Approximation Lemma which used continuity and direct neighborhoods where we could approximate adequately. Here, we have decomposed the underlying set by using the level sets of to approximate it. Then, compactness tells us every time that we can do with finitely many sets – but the nature of those sets is ever so slightly different. The math, however, is not.

Now, let’s harvest the ideas:

**Proof** *of the Stone-Weierstrass, real version:*

Let and let . Without loss of generality, we may assume that , since we might as well replace it by . Besides, is not much of a restriction.

Let be such that . Let, for , and be the sets

Obviously, those sets are disjoint for fixed , and the are increasing to , while the are decreasing to . The setwise nearly indicator Lemma then says that there is some with , for and for .

That’s all well and good – but how will this help us to prove Stone-Weierstrass? Imagine the space decomposed like an onion into the . The functions will have very small values on the , but large values outside, in the . We will combine these small and large values properly, to have similar values as the function in every ring of this onion. Note that the values of are closely attached to the – by the very definition of and .

Now consider the function , which of course is in . Let , then there is some with , and so

,

and of course for all . Besides, our fixed is in for . Therefore, . All of this tells us (remember that )

Now, if ,

Note that, for , appears to be correct.

To conclude, we wish to show something about . Let us consider the case first:

The other case is :

This shows, that in every case and differ by at most , uniformly on . **q.e.d.**

To conclude, let us mention some little question that is related to the classical Weierstrass Aproximation Theorem. If we have all monomials at our disposal, we can uniformly approximate any continuous function; but what if some monomials are missing? With how few of them can the Approimation Theorem still hold? The surprisingly neat answer is given in Müntz’ Theorem (sometimes called the Müntz-Szász-Theorem for the sake of more signs on top of the letters):

**Theorem (Müntz):** Let be a strictly increasing sequence of natural numbers with . Then, we have

The span is meant as the linear space generated by the given monomials, i.e. the polynomials that can be constructed using only those monomials. Hence, the Weierstrass Theorem can even hold if you disallow to use infinitely many monomials. However, you will always need the monomial , since otherwise your approximating functions could only satisfy .

The proof of that theorem is not particularly easy, especially the part where you show that the series must diverge if you can approximate any function. For the other direction there has been a very elementary proof that we shall give here because of its inherent beauty:

**Proof*** of Müntz’ Theorem, -part:*

Let .

Let be such that . Without loss of generality, we assume that is smaller than any of the : just take away those which are smaller than and re-enumerate the starting from ; the condition will then still hold.

We will show that . This will imply that all monomials are in the closure of that span, and if we have that, Weierstrass tells us that we can approximate any continuous function.

Note that we don’t need the constant function to approximate , since for our purposes the relevant functions all have .

Now consider the function

with certain . Then first of all, , and if we choose certain , we find the recursion

Let’s prove this by induction:

Obviously, one could try to define the appropriately or to give some closed formula expression; but it doesn’t seem to be useful for the problem at hand.

Now, we have , and

This shows

and hence

.

But the infinite product will be 0, because by assumption, and because of the basic inequality for any , we have

Hence, vanishes uniformly as , and so is being approximated uniformly by some polynomials using only the allowed monomials . This concludes the proof. **q.e.d.**

At first, I was excited because I thought one could reverse this idea to show the other direction of Müntz’ Theorem as well. However, it doesn’t work because of the second-to-last inequality given above: If the product is finite, that doesn’t say anything about how well (or how badly) is approximated. The proof of the reverse direction is harder and I am only aware of deeper ideas from Fourier Analysis (that’s historically the first proof) or using the Hahn-Banach Theorem (as in Rudin’s book).

But the elementary idea given here is actually very neat, even though it doesn’t, to the best my knowledge, solve any real-world problems. A very pretty academic finger-exercise. And this should be a good place to stop.