The Theorems of Schauder and Peano

In the text on Brouwer’s Fixed Point Theorem, I had confidently stated that Schauder’s Theorem follows from it with less effort and that one may easily conclude things like Peano’s Theorem from there. As a matter of fact, things lie considerably deeper than I had naively thought. If sound proofs are to be given, there is technical work to be done in many instances. The ideas are not tough themselves, but the sheer number of steps to be taken and the methodological machinery cannot be neglected. Let’s see.

We shall follow the lines of Heuser’s books (both this one and this one), as we did before, to collect the ingredients to give a proof of Schauder’s Fixed Point Theorem. It involves a statement about convex sets, on which we will focus first, followed by an excursion on approximation in normed vector spaces. We shall also need the theorem named after Arzelà and Ascoli, being a basic glimpse into the ways of thinking of functional analysis. All of this will allow us to prove Schauder’s Theorem in a rather strong flavour. For the conclusion, we split our path: we give both Heuser’s treatment of Peano’s Theorem in the spirit of functional analysis and Walter’s more elementary approach (which, however, also makes use of the Theorem of Arzelà-Ascoli).


Remember that a set K is called convex, if for any x,y\in K and for any \alpha\in[0,1], we have \alpha x + (1-\alpha)y\in K. This formalizes the intuition that the line connecting x and y be contained in K as well.


Lemma (on convex sets): Let E be a normed space and let x_1,\ldots,x_n\in E. Let

\displaystyle\mathrm{conv}(x_1,\ldots,x_n) = \bigcap_{\substack{K\subset E~\mathrm{ convex}\\\{x_1,\ldots,x_n\}\subset K}} K

the convex hull. Then,

\displaystyle \mathrm{conv}(x_1,\ldots,x_n) = \left\{v\in E\colon v=\sum_{i=1}^n\lambda_ix_i\text{ with }\sum_{i=1}^n\lambda_i=1; \lambda_i\geq0\right\},\qquad(\diamondsuit)

and \mathrm{conv}(x_1,\ldots,x_n) is compact.


Proof: Let us first prove the representation of the convex hull. For the “\subset“-direction, we will show that the set on the right-hand side of (\diamondsuit) is convex. Let x = \sum_{i=1}^n\lambda_ix_i and y=\sum_{i=1}^n\mu_ix_i, with \sum\lambda_i=\sum\mu_i=1. Let \alpha\in[0,1], then

\displaystyle \alpha x + (1-\alpha)y = \sum_{i=1}^n\bigl(\alpha\lambda_i+(1-\alpha)\mu_i\bigr)x_i,

where \sum_{i=1}^n\bigl(\alpha\lambda_i+(1-\alpha)\mu_i\bigr) = \alpha+(1-\alpha) = 1. Hence, \alpha x+(1-\alpha)y is part of the set on the right-hand side of (\diamondsuit) .

We now turn to the “\supset“-direction. Let y_1,\ldots, y_m\in\mathrm{conv}(x_1,\ldots,x_n). We show that \sum_{j=1}^m\mu_jy_j\in\mathrm{conv}(x_1,\ldots,x_m) if \sum_{j=1}^m\mu_j = 1. That means any point that as a representation as in the right-hand side of (\diamondsuit) must be in \mathrm{conv}(x_1,\ldots,x_n). This is clear for m=1. For m>1, we take C:=\sum_{j=1}^{m-1}\mu_j to see

\displaystyle    \begin{aligned}    \sum_{j=1}^m\mu_jy_j &= \sum_{j=1}^{m-1}\mu_jy_j + \mu_my_m\\    &= C\sum_{j=1}^{m-1}\frac{\mu_j}{C}y_j + \mu_my_m\\    &= C\sum_{j=1}^{m-1}\tilde\mu_j y_j + (1-C)y_m.    \end{aligned}

Note that \mu_m = \sum_{j=1}^m\mu_j - \sum_{j=1}^{m-1}\mu_j = 1-C. By induction,

\displaystyle \sum_{j=1}^{m-1}\tilde\mu_jy_j\in\mathrm{conv}(x_1,\ldots,x_n),

since \sum_{j=1}^{m-1}\tilde\mu_j = \frac1C\sum_{j=1}^{m-1}\mu_j = 1. Hence,


Finally, we shall prove compactness. Let (y_k)_k\subset\mathrm{conv}(x_1,\ldots,x_n), with a representation y_k = \sum_{i=1}^n\lambda_i^{(k)}x_i. The sequences (\lambda_i^{(k)})_k are bounded by [0,1] and hence have convergent subsequences. Choosing subsequences n times (for each i=1,\ldots,n), we find a subsequence (\lambda_i^{(k_\ell)})_\ell that converges to some \lambda_i for each i=1,\ldots,n. Besides,

\displaystyle\sum_{i=1}^n\lambda_i = \sum_{i=1}^n\lim_{\ell\to\infty}\lambda_i^{(k_\ell)} = \lim_{\ell\to\infty}1 = 1.

This yields

\displaystyle\lim_{\ell\to\infty}y_{k_\ell} = \lim_{\ell\to\infty}\sum_{j=1}^n\lambda_j^{(k_\ell)}x_j = \sum_{j=1}^n\lambda_jx_j\in\mathrm{conv}(x_1,\ldots,x_n).



We will now prove a result that extends Brouwer’s Fixed Point Theorem to a more general setting. This is the one that I had skimmed earlier, believing it consisted only of standard arguments; in principle, this is true. But let’s have a closer look at it and how these standard arguments work together.


Theorem (on fixed points in real convex sets): Let \emptyset\neq K\subset\mathbb{R}^p be convex, compact, and let f:K\to K be continuous. Then f has a fixed point.

Proof: 0th step. As K is compact, it is bounded and thus there is some r>0 such that K\subset B_r(0).

1st step. Let us construct the best approximation of some x\in B_r(0) within K; that means we look for a z\in K with \left|x-z\right| = \inf_{y\in K}\left|x-y\right|.

Taking \gamma:=\inf_{y\in K}\left|x-y\right|, there is a sequence (z_n)_n\subset K with \lim_{n\to\infty}\left|x-z_n\right| = \gamma.

We wish to prove that (z_n)_n is a Cauchy sequence. From the basic properties of any scalar product, we find

\displaystyle    \begin{aligned}    \left|u+v\right|^2+\left|u-v\right|^2 &= \left\langle u+v,u+v\right\rangle + \left\langle u-v,u-v\right\rangle \\    &= \left|u\right|^2 + \left|v\right|^2 + 2\left\langle u,v\right\rangle + \left|u\right|^2 + \left|v\right|^2 - 2\left\langle u,v\right\rangle \\    &= 2 \left|u\right|^2 + 2 \left|v\right|^2.    \end{aligned}

In our case, this shows

\displaystyle    \begin{aligned}    \left|z_n-z_m\right|^2 &= \left|(z_n-x)-(z_m-x)\right|^2 \\    &= 2\left|z_n-x\right|^2 + 2\left|z_m-x\right|^2- \left|(z_m+z_n)-2x\right|^2 \\    &= 2\left|z_n-x\right|^2 + 2\left|z_m-x\right|^2 - 4\left|\frac{z_m+z_n}2-x\right|^2.    \end{aligned}

Since K is convex, \frac12z_n+\frac12z_m\in K. Therefore,



\displaystyle    \begin{aligned}    \lim_{n,m\to\infty}\left|z_n-z_m\right|^2&\leq 2\lim_{n\to\infty}\left|z_n-x\right|^2 + 2\lim_{m\to\infty}\left|z_m-x\right|^2 - 4\gamma^2 \\    &= 2\gamma^2+2\gamma^2-4\gamma^2 = 0.    \end{aligned}

Therefore, (z_n)_n is a Cauchy sequence, having a limit y, say. As K is closed, y\in K. In total, we have seen (noting that the absolute value is continuous)

\displaystyle\gamma = \lim_{n\to\infty}\left|z_n-x\right| = \left|y-x\right|.

y is the best approximation to x within K.

2nd step. The best approximation is unique.

If there were two of them, u and V, say, then \left|x-u\right| = \left|x-v\right| = \gamma. If we consider the sequence (z_n)_n that alternates between u and V, we’d find \left|x-z_n\right| = \gamma for all n, and hence (z_n)_n is a Cauchy sequence by what we found in step 1. Therefore z_n must be convergent, which implies u=v.

3rd step. The mapping A:B_r(0) \to K that takes x to its best approximation, is continuous.

Let (x_n)_n\subset B_r(0) with \lim_{n\to\infty}x_n =: x. Let \varepsilon>0. For sufficiently large n, we have

\displaystyle \gamma_n :=\inf_{y\in K}\left|x_n-y\right| \leq \inf_{y\in K}\bigl(\left|x_n-x\right|+\left|x-y\right|\bigr) < \varepsilon + \inf_{y\in K}\left|x-y\right| = \varepsilon + \gamma.


\displaystyle \gamma\leq\left|x - A(x_n)\right|\leq\left|x_n-A(x_n)\right| + \left|x_n-x\right| = \gamma_n + \left|x_n-x\right| < \gamma_n+\varepsilon.

These inequalities give us

\displaystyle \gamma\leq\left|x-A(x_n)\right| < 2\varepsilon + \gamma,\qquad\text{ which means }\lim_{n\to\infty}\left|x-A(x_n)\right| = \gamma.

Hence, \lim_{n\to\infty}A(x_n) is the best approximation to x. As this is unique, we have shown that A is sequentially continuous.

4th step. The quest for the fixed point.

The mapping f\circ A:B_r(0)\to K\subset B_r(0) is continuous. By Brouwer’s Fixed Point Theorem, f\circ A has a fixed point: there is some w\in B_r(0) with f\bigl(A(w)\bigr) = w. As f only takes images in K, we must have w\in K. By construction, for points in K, the mapping A does not do anything: hence

\displaystyle w = f\bigl(A(w)\bigr) = f(w).



Corollary (on fixed points in convex sets of normed spaces): Let x_1,\ldots,x_n\in E a normed vector space, let \emptyset\neq K\subset\mathrm{span}(x_1,\ldots,x_n) convex, compact, and let f:K\to K continuous. Then f has a fixed point.

Proof: Let us choose a base for \mathrm{span}(x_1,\ldots,x_n) from the x_1,\ldots,x_n. We take w.l.o.g. x_1,\ldots,x_p for a certain p\leq n. Then any y\in\mathrm{span}(x_1,\ldots,x_n) has a unique representation as y=\sum_{j=1}^p\beta_jx_j, and the maping

\displaystyle A:\mathrm{span}(x_1,\ldots,x_n)\to\mathbb{R}^p,\qquad y\mapsto(\beta_1,\ldots,\beta_p)

is a bijection. As all norms on \mathbb{R}^p are equivalent, convergence issues are not affected by this bijection. Hence, the theorem and its proof work out in the setting of this corollary, too. q.e.d.


Note that this Corollary may deal with an infinite-dimensional space, however we make use of a finite-dimensional subspace only. This will become relevant in Schauder’s Theorem as well.


Theorem (Arzelà 1895, Ascoli 1884): Let X\subset\mathbb{R}^d compact, let \mathcal F be a family of continuous real-valued functions on X, which satisfies two properties:

  • it is pointwise bounded: for any x\in X, there is some M(x)\in\mathbb{R} with \left|f(x)\right|\leq M(x), for all f\in\mathcal F.
  • it is equicontinuous: for any \varepsilon>0 there is some \delta>0 such that for any x,y\in X with \left|x-y\right|<\delta we have \left|f(x)-f(y)\right|<\varepsilon, for all f\in\mathcal{F}.

Then, \mathcal F is relatively compact, that means every sequence in \mathcal F has a uniformly convergent subsequence.


Note, that we do not demand the limit of the convergent subsequence to be contained in \mathcal F; that would mean compact, instead of relatively compact.


Proof: 1st step. We get hold of a countable dense subset of X.

For our immediate uses of the theorem, it should suffice to choose \mathbb{Q}\cap X, since we will take X to be intervals and there will not be any need for more exotic applications. However, to show up something a little more general, have a look at the sets \bigl(U_{1/k}(x)\bigr)_{x\in X}. This is a covering of X and finitely many of them will suffice to cover X, for instance M_k:=\{x_{k1},\ldots,x_{kn}\}. The set M:=\bigcup_{k=1}^\infty M_k is countable. By construction, for any x_0\in X and any \varepsilon>0, we can find some point y\in M that has \left|x-y\right|<\varepsilon. Therefore, M is dense in X.

2nd step. We construct a certain subsequence to a given sequence (f_n)_n\subset\mathcal F.

This step is at the heart of the Arzelà-Ascoli-Theorem, with a diagonal argument to make it work. Let us enumerate the set M from the step 1 as \{x_1,x_2,\ldots\}.

As \mathcal F is pointwise bounded, the sequence \bigl(f_n(x_1)\bigr)_n\subset\mathbb{R} is bounded as well. By Bolzano-Weierstrass, it has a convergent subsequence that we will call \bigl(f_{1,n}(x_1)\bigr)_n.

If we evaluate this new sequence in x_2, we arrive at \bigl(f_{1,n}(x_2)\bigr)_n\subset\mathbb{R}, which is bounded as well. Again, we find a convergent subsequence that is now called \bigl(f_{2,n}(x_2)\bigr)_n. As this is a subsequence of \bigl(f_{1,n}(x_1)\bigr)_n, it converges in x_1 as well.

We continue this scheme and we find an array of sequences like this

f_{11} f_{12} f_{13} \cdots
f_{21} f_{22} f_{23} \cdots
f_{31} f_{32} f_{33} \cdots
\vdots \vdots \vdots \ddots


where each row is a subsequence of the row above. Row K is convergent in the point x_k by Bolzano-Weierstrass and convergent in the points x_1,\ldots,x_{k-1} by construction.

Now, consider the sequence (f_{nn})_n. It will converge in any point of M.

3rd step. Our subsequence of the 2nd step converges uniformly on X. We will use equicontinuity to expand the convergence from M to the whole of X.

As \mathcal F is equicontinuous, we will find for any \varepsilon>0 some \delta>0 with \left|f_{nn}(x)-f_{nn}(y)\right| < \frac\varepsilon3, for all n\in\mathbb{N}, as long as \left|x-y\right|<2\delta. Since X is compact, there are some points y_1,\ldots,y_p\in X with X\subset\bigcup_{j=1}^pU_\delta(y_j). And as M is dense in X, we can find some \xi_j\in U_\delta(y_j)\cap M for any j=1,\ldots,p.

Let x\in U_\delta(y_j), then

\displaystyle \left|x-\xi_j\right| \leq \left|x-y_j\right|+\left|y_j-\xi_j\right| < 2\delta,

which shows

\displaystyle \left| f_{nn}(x)-f_{nn}(\xi_j)\right| < \frac\varepsilon 3\qquad\text{ for any }n\in\mathbb{N}\text{ and }x\in X\cap U_\delta(y_j).

We have already seen that (f_{nn})_n is convergent on M, and hence (convergent sequences are Cauchy-sequences)

\displaystyle \left|f_{nn}(\xi_j) - f_{mm}(\xi_j)\right| < \frac\varepsilon 3\qquad \text{ for sufficiently large }m,n\text{ and for any }j=1,\ldots,p.

Now, let x\in X, no longer restricted. Then, there is some j=1,\ldots,p such that x\in U_\delta(x_j), and

\displaystyle    \begin{aligned}    \left|f_{nn}(x)-f_{mm}(x)\right| &\leq \left| f_{nn}(x)-f_{nn}(\xi_j)\right| + \left|f_{nn}(\xi_j) - f_{mm}(\xi_j)\right| + \left|f_{mm}(\xi_j) - f_{mm}(x)\right| \\    &< \frac\varepsilon3+\frac\varepsilon3+\frac\varepsilon3 = \varepsilon.    \end{aligned}

Thus, \left\|f_{nn}-f_{mm}\right\|_\infty < \varepsilon for sufficiently large n,m. This sequence is a Cauchy sequence and hence convergent. q.e.d.


This was our last stepping stone towards Schauder’s Theorem. Let’s see what we can do.


Theorem (Schauder, 1930): Let E be a normed vector space, \emptyset\neq K\subset E convex and closed, let f:K\to K continuous, f(K) relatively compact. Then f has a fixed point.


Proof: 1st step. As f(K) is relatively compact, its closure is compact. We construct a finite approximating subset of \overline{f(K)}.

Let \varepsilon>0. There are some finitely many points x_1,\ldots,x_m\in\overline{f(K)} with \overline{f(K)}\subset\bigcup_{j=1}^mU_\varepsilon(x_j). In particular, for any x\in f(K), there is some j=1,\ldots,m with \left|x_j-x\right| < \varepsilon. Let us consider the function for x\in f(K)

\displaystyle \varphi_j(x):=\mathbf{1}_{\left|x_j-x\right|<\varepsilon}\bigl(\varepsilon-\left|x-x_j\right|\bigr).

It is obviously continuous, and as \overline{f(K)} is covered by these U_\varepsilon,

\displaystyle \varphi(x)=\sum_{j=1}^m\varphi_j(x) > 0.

This allows \psi_j(x):=\frac{\varphi_j(x)}{\varphi(x)} to be well-defined, and by construction \psi(x)=\sum_{j=1}^m\psi_j(x) = 1. Hence, the function g:f(K)\to\mathrm{conv}(x_1,\ldots,x_m)

\displaystyle g(x):=\sum_{j=1}^m\psi_j(x) x_j

is continuous (the Lemma on convex sets tells us that this actually maps into the convex hull). Now, let x\in f(K). We find

\displaystyle g(x)-x = \sum_{j=1}^m\psi_j(x)x_j - x = \sum_{j=1}^m\psi_j(x)\bigl(x_j-x\bigr) = \sum_{\substack{j=1\\\left|x_j-x\right|<\varepsilon}}^m\psi_j(x)\bigl(x_j-x),

and therefore, for any x\in f(K),

\displaystyle \left|g(x)-x\right| \leq \sum_{\substack{j=1\\\left|x_j-x\right|<\varepsilon}}^m\psi_j(x)\left|x_j-x\right| < \varepsilon\sum_{j=1}^m\psi_j(x) = \varepsilon.

This shows that g uniformly approximates the identity on \mathrm{conv}(x_1,\ldots,x_m)\subset f(K). Note that g depends on the choice of \varepsilon.

2nd step. Reference to the Theorem on fixed points in convex sets and approximation of the fixed point.

We set h:=g\circ f, which is a continuous mapping

\displaystyle h:K\to\mathrm{conv}(x_1,\ldots,x_m)\subset f(K) \subset K.

We can restrict it to \mathrm{conv}(x_1,\ldots,x_m) and then re-name it \tilde h.  By the Lemma on convex sets, \mathrm{conv}(x_1,\ldots,x_m) is compact, it is finite-dimensional, and by the Corollary on fixed point sets in normed spaces, \tilde h has a fixed point z:

z = \tilde h(z) = g\bigl(f(z)\bigr)\qquad\text{ for some }z\in\mathrm{conv}(x_1,\ldots,x_m).


\displaystyle \left|f(z)-z\right| = \left|f(z)-g\bigl(f(z)\bigr)\right| < \varepsilon.

Note that z depends on g and hence on \varepsilon.

3rd step. Construction of the fixed point.

For any n\in\mathbb{N}, by step 2, we find some z_n\in\mathrm{conv}(x_1^{(n)},\ldots,x_{m(n)}^{(n)})\subset f(K)\subset K with

\displaystyle\left|f(z_n)-z_n\right| < \frac1n.

As f(K) is relatively compact, the sequence \bigl(f(z_n)\bigr)_n has a convergent subsequence: there is some \tilde z\in\overline{f(K)} with \tilde z=\lim_{k\to\infty}f(z_{n_k}). As K is closed, we get \tilde z\in \overline{f(K)}\subset\overline K = K. Now,

\displaystyle\left|z_{n_k} - \tilde z\right| \leq \left|z_{n_k} - f(z_{n_k})\right| + \left|f(z_{n_k}) - \tilde z\right| < \frac1{n_k} + \varepsilon_{n_k},

which means that z_{n_k} and \tilde z get arbitrarily close: \tilde z = \lim_{k\to\infty}z_{n_k}. Since f is continuous, we arrive at

\displaystyle f(\tilde z) = \lim_{k\to\infty}f(z_{n_k}) = \tilde z. q.e.d.


It is apparent that Schauder’s Theorem already has very general conditions that are tough to weaken further. Obviously the Theorem gets false if f is not continuous. If K were not closed, we’d get the counter-example of f:(0,1)\to(0,1), x\mapsto x^2, which doesn’t have any fixed points. If K were not convex, we’d get the counter-example of f:\partial B_1(0)\to \partial B_1(0), e^{it} \mapsto e^{i(t+\pi)}. It is hard to give a counter-example if f(K) is not relatively compact – in fact I would be interested to hear of any such counter-example or of the generalization of Schauder’s Theorem to such cases. Which is the most general such fixed point theorem?


Now, we are able to harvest the ideas of all this work and apply it to differential equations. Usually, in courses on ordinary differential equations, the famous Picard-Lindelöf-Theorem is proved, which states that for well-behaved functions f (meaning that they satisfy a Lipschitz-condition), the initial-value problem

y'(x) = f\bigl(x,y(x)\bigr),\qquad y(x_0) = y_0,

has a unique solution. This is a powerful theorem which simplifies the entire theory of differential equations. However, a little more holds true: it suffices that f is continuous to guarantee a solution. However, uniqueness is lost in general. While in many applications one can assume continuity of f without remorse (especially in physics), a Lipschitz-condition is much harder to justify. This is not to diminish the usefulness of Picard and Lindelöf, as any model has assumptions to be justified – the Lipschitz-condition is just one of them (if one even bothers to demand for a proper justification of existence and uniqueness – sometimes this would seem obvious from the start).

Let us have a look at what Peano told us:


Theorem (Peano, 1886/1890): Let f:R\to\mathbb{R} be continuous, where

\displaystyle R:=\bigl\{(x,y)\in\mathbb{R}^2\colon \left|x-x_0\right| \leq a, \left|y-y_0\right|\leq b\bigr\},

let M:=\max_{(x,y)\in R}\left|f(x,y)\right|, \alpha := \min\bigl(a, \frac bM\bigr).

Then, the initial value problem y'(x)= f\bigl(x,y(x)\bigr), y(x_0)=y_0, has a solution on the interval [x_0-\alpha, x_0+\alpha].


Concerning the interval on which we claim the solution to exist, have a look at how such a solution y might behave: as we vary x, the solution may “leave” R either to the vertical bounds (to left/right) or to the horizontonal bounds (up/down). A solution y can at most have a slope of \pm M, and thus, if it leaves on the horizontal bounds, this will happen at x_0\pm\frac bM as the earliest point. If it doesn’t leave there, it will exist until x_0\pm a. Of course, it might exist even further, but we have only demanded f to be defined till there. A little more formally, the mean value theorem tells us

\displaystyle \left|y(x)-y_0\right| = \left|y'(\xi)\right|\left|x-x_0\right| = \left|f\bigl(\xi,y(\xi)\bigr)\right|\left|x-x_0\right| \leq M\alpha\leq b.

This guarantees that the solution y is well-defined on R, because f is defined there.


Proof: 0th step. To simplify notation, let us set

\displaystyle    \begin{aligned}    J&:=[x_0-\alpha, x_0+\alpha],\\    \mathcal{C}(J)&:=\bigl\{f:J\to\mathbb{R}\text{ continuous}\bigr\},\\    K&:=\bigl\{y\in\mathcal{C}(J)\colon \left|y(x)-y_0\right|\leq b\text{ for any }x\in J\bigr\}.    \end{aligned}

1st step. We twist the problem to another equivalent shape, making it more accessible to our tools.

First, let y(x) be a solution to the initial value problem on a sub-interval I\subset J. Then, for any x\in I,

\displaystyle y'(x) = f\bigl(x,y(x)\bigr),\quad y(x_0)=y_0,

and hence

\displaystyle y(x) = y_0 + \int_{x_0}^x f\bigl(t,y(t)\bigr)dt.\qquad (\heartsuit)

On the other hand, if we start from this equation and suppose that it holds for any x\in I, y must be differentiable with y'(x) = f\bigl(x,y(x)\bigr) and y(x_0)=y_0.

We have seen that a function y solves the initial value problem on J if and only if it satisfies the equation (\heartsuit) on J.

2nd step. We try to give a representation of the problem as a fixed-point-problem.

Let us consider the mapping

\displaystyle A:K\to\mathcal{C}(J),~ y\mapsto y_0 + \int_{x_0}^{\text{\large\textbf{.}}} f\bigl(t,y(t)\bigr)dt.

This is a functional where we plug in a continuous function and where we get a continuous function back. In particular, and to make it even more painfully obvious,

\displaystyle (Ay)(x) = y_0+\int_{x_0}^xf\bigl(t,y(t)\bigr)dt,\qquad\text{for any }x\in J.

Therefore, y is a solution to the intial value problem, if it is a fixed point of A, meaning Ay=y.

3rd step. We show that A maps K to itself. We have defined A only on K, so let y\in K and x\in J; then:

\displaystyle \left|(Ay)(x)-y_0\right| = \left|\int_{x_0}^x f\bigl(t,y(t)\bigr) dt\right| \leq M\left|x-x_0\right| \leq \alpha M\leq b.

The second-to-last inequality follows from x\in J, the last one from the definition of \alpha.

This shows that Ay\in K.

4th step. K\neq\emptyset is obvious, as the constant function y_0 is in K.

5th step. K is convex. Let f,g\in K and let \beta\in[0,1]. Then, for any x\in J,

\displaystyle    \begin{aligned}    \left|(1-\beta)f(x)+\beta g(x) - y_0\right| &= \left|(1-\beta)\bigl(f(x)-y_0\bigr) + \beta\bigl(g(x)-y_0\bigr)\right| \\    &\leq (1-\beta)\left|f(x)-y_0\right| + \beta\left|g(x)-y_0\right| \\    &\leq (1-\beta)b+\beta b = b.    \end{aligned}

This proves (1-\beta)f+\beta g\in K.

6th step. K is a closed set in \mathcal{C}(J), where we use the topology of uniform convergence.

Consider the sequence (f_n)_n\subset K which converges uniformly to some f\in\mathcal{C}(J). Remember that \mathcal{C}(J) is complete, which is why we can do this. Then, for any x\in J,

\displaystyle\left|f(x)-y_0\right| = \left|\lim_{n\to\infty}f_n(x)-y_0\right| = \lim_{n\to\infty}\left|f_n(x)-y_0\right| \leq \lim_{n\to\infty} b = b.

This shows that f\in K.

7th step. Using the topology of uniform convergence, the mapping A:K\to K is continuous.

Let \varepsilon>0. The function f is continuous on the compact set R and hence uniformly continuous. Therefore, there is some \delta>0 such that for \left|u-v\right|<\delta,

\displaystyle \left|f(t,u)-f(t,v)\right|<\frac\varepsilon\alpha.

Now, let y,z\in K with \left\|y-z\right\|_\infty < \delta. Then we have just seen that for any t\in J

\displaystyle \left|f\bigl(t,y(t)\bigr)-f\bigl(t,z(t)\bigr)\right| < \frac\varepsilon\alpha.

That yields

\displaystyle    \begin{aligned}    \left|(Ay)(x)-(Az)(x)\right| &= \left|\int_{x_0}^xf\bigl(t,y(t)\bigr)dt - \int_{x_0}^xf\bigl(t,z(t)\bigr)dt\right| \\    &< \left|x-x_0\right|\frac\varepsilon\alpha\\    &\leq\varepsilon.    \end{aligned}

In particular,

\displaystyle\left\|Ay-Az\right\|_\infty < \varepsilon.

8th step. The set A(K)\subset K is relatively compact. Note that A(K) is a set of continuous functions.

Let y\in K and let x,x_1,x_2\in J. Then, every function of A(K) is bounded pointwise, since

\displaystyle \left|(Ay)(x)\right| = \left|y_0+\int_{x_0}^xf\bigl(t,y(t)\bigr)dt\right|\leq \left|y_0\right|+\left|x-x_0\right|M \leq \left|y_0\right|+\alpha M.

Besides, A(K) is equicontinuous, because of

\displaystyle    \begin{aligned}    \left|(Ay)(x_1)-(Ay)(x_2)\right| &= \left|\int_{x_0}^{x_1}f\bigl(t,y(t)\bigr)dt - \int_ {x_0}^{x_2}f\bigl(t,y(t)\bigr)dt\right| \\    &= \left|\int_{x_1}^{x_2}f\bigl(t,y(t)\bigr) dt\right| \\    &\leq \left|x_1-x_2\right| M.    \end{aligned}

Arzelà and Ascoli now tell us that any sequence in A(K) has a uniformly convergent subsequence.

9th and final step. From Schauder’s Fixed Point Theorem and steps 3 to 8, A has a fixed point in K. From step 2, the initial value problem has a solution. q.e.d.



There was a lot of technical work that we have only needed to invoke Schauder’s Theorem. Some of this could have been avoided, if we had a more elementary proof of Schauder’s Theorem. Such a proof exists, however, some of our machinery is still needed – the proof cannot honestly be called elementary. In some way, the proof matches our procedure given above, however, not everything is needed in such a fine manner. Let’s have short look at it; this is taken from Walter’s book.


Proof (Peano’s Theorem in a more elementary fashion): We proceed in two parts. First, we shall prove the weaker statement that if f is continuous and bounded on the (non-compact) set [x_0,x_0+a]\times\mathbb{R}, then there is a solution to the intial-value problem on [x_0,x_0+a]. Afterwards, we extend this to our compact set R. We won’t deal with extending the solution to the left of x_0 as it’s neither important nor difficult. In the previous proof we didn’t need to bother about this.

1st step. Let us define a function on [x_0,x_0+a] using some parameter \alpha\in(0,a] by

\displaystyle z_\alpha(x) = y_0\mathbf{1}_{x\leq x_0} + \left(y_0+\int_{x_0}^xf\bigl(t,z_\alpha(t)\bigr)dt\right)\mathbf{1}_{x>x_0}.

This is well-defined, since on the sub-interval [x_0+k\alpha, x_0+(k+1)\alpha) we have t-\alpha \in[x_0+(k-1)\alpha, x_0+k\alpha), and thus z_\alpha(t-\alpha) has been recursively defined; hence z_\alpha is defined as well.

Let us denote \mathcal{F}:=\bigl\{z_\alpha\colon\alpha\in(0,a]\bigr\}\subset\mathcal{C}\bigl([x_0,x_0+a]\bigr).

2nd step. \mathcal F is equicontinuous. Let \varepsilon > 0 and let x_1,x_2\in[x_0,x_0+a]. Then we get, as f is bounded by some M,

\displaystyle \left|z_\alpha(x_1)-z_\alpha(x_2)\right| = \left|\int_{x_1}^{x_2} f\bigl(t, z_\alpha(t-\alpha)\bigr)dt\right| \leq \left|x_2-x_1\right|M,

which doesn’t depend on \alpha, x_1 or x_2 (only on their distance). Hence, if \left|x_1-x_2\right|<\delta = \frac\varepsilon M,

\displaystyle \left|z_\alpha(x_1)-z_\alpha(x_2)\right| \leq \varepsilon.

3rd step. \mathcal F is pointwise bounded. This is obvious from \left|z_\alpha(x)\right|\leq\left|y_0\right|+aM, which doesn’t depend on \alpha or x.

4th step. We determine a solution to the intial-value problem.

From steps 2 and 3 and from Arzelà-Ascoli, we know that the sequence (z_{1/n})_n\subset\mathcal{F} has a uniformly convergent subsequence. Let us denote its limit by y(x), which is defined for all x\in[x_0,x_0+a]. This allows us to get

\displaystyle    \begin{aligned}    \left|z_{1/n_k}\left(t-\frac1{n_k}\right) - y(t)\right| &\leq \left| z_{1/n_k}\left(t-\frac1{n_k}\right) - z_{1/n_k}(t)\right| + \left|z_{1/n_k}(t) - y(t)\right| \\    &\leq \frac M{n_k} + \varepsilon\\    &< \overline\delta,    \end{aligned}

for any t\in[x_0,x_0+a] and for sufficiently large K. It should be clear what we intend to say with \overline\delta (let’s bring in a little sloppiness here, shall we). Since f is continuous in its second component, this proves

\displaystyle \left|f\left(t, z_{1/n_k}\left(t-\frac1{n_k}\right)\right) - f\bigl(t,y(t)\bigr)\right| < \overline\varepsilon\qquad\text{for any }t\in[x_0,x_0+a].

Hence, as every participant here converges uniformly,

\displaystyle    \begin{aligned}    y(x) &= \lim_{k\to\infty}z_{1/n_k}(x) \\    &= \lim_{k\to\infty}\left(y_0 + \int_{x_0}^x f\left(t, z_{1/n_k}\left(t-\frac1{n_k}\right)\right)dt\right)\\    &= y_0 + \int_{x_0}^x \lim_{k\to\infty} f\left(t, z_{1/n_k}\left(t-\frac1{n_k}\right)\right) dt\\    &= y_0 + \int_{x_0}^x f\bigl(t, y(t)\bigr) dt.    \end{aligned}

This shows that

y'(x) = f\bigl(x, y(x)\bigr),\qquad y(x_0) = y_0.

5th step. Extension to the general case: Let f be defined on the compact rectangle R.

We give a continuation of f beyond [y_0-b,y_0+b] for all x\in[x_0-a, x_0+a] via

\displaystyle \tilde f(x,y) = \begin{cases}f(x, y_0-b) ,&\text{for }y < y_0-b\\ f(x,y), &\text{for }(x,y)\in R\\ f(x,y_0+b),&\text{for }y>y_0+b\end{cases}

Obviously, \tilde f is continuous and bounded. From Step 1, y' = \tilde f(x, y) has a solution on [x_0, x_0+a]. For \left|x-x_0\right|\leq\frac bM, we get

\displaystyle    \begin{aligned}    \left|y(x) - y_0\right| &\leq\left|y'(\xi)\right|\left|x-x_0\right| \\    &= \left|\tilde f\bigl(\xi,y(\xi)\bigr)\right|\left|x-x_0\right| \\    &\leq M\left|x-x_0\right|\leq b    \end{aligned}

Therefore, the solution of y'=f(x,y) is well-defined if \left|x-x_0\right|\leq\frac bM, and thus the solution is guaranteed to exist for \left|x-x_0\right|<\alpha = \min\bigl(a,\frac bM\bigr). q.e.d.


As a sort of last-minute addendum, I have stumbled upon two articles from the 1970s that shed some more light on the issue of elementary proofs to Peano’s Theorem which completely avoid the technicalties of Schauder’s Theorem and of Arzelà-Ascoli. One is called “There is an Elementary Proof of Peano’s Existence Theorem” by Wolfgang Walter (the author of the book we cited earlier; Amer. Math. Monthly 78 1971, 170-173), the other is “On Elementary Proofs of Peano’s Existence Theorems” by Johann Walter (Amer. Math. Monthly 80, 1973, 282-286). The issue of whether Arzelà-Ascoli can be avoided is solved by both papers positively: they give proofs of Peano’s Theorem which only deal with standard calculus methods. Basically, the employ the Euler polygon method to construct a solution of the intitial value problem. However, again, the proofs are not constructive. Besides, “elementary” is not to be confused with “easy”, Peano’s Theorem is still nothing that lies directly on the surface of things. A brief look at the second of those articles (to the best of my knowledge the identical names of the authors are a coincidence) raises hope that this proof is actually not too hard – it should be understandable with a lot less effort than the proof via Schauder’s Theorem that we gave above in full detail; remember that Schauder itself required many non-standard theorems on its way. The elementary proof will only work for one-dimensional differential equations, but we bothered only with those anyway; it uses monotonicity of its approximating sequence which is only applicable in the real numbers. On the plus side, the proof explicitly constructs a solution via the Euler method.

The papers also shed some light on the history of Peano’s Theorem and the quest for its proof (together with some rather unusual disagreement on whether an earlier proof is valid or not; some interesting lines to read in passing). This should be enough on this matter for now. If the interest holds up (which is, to this extent, rather unlikely), we’ll return to it. But not for now.


Brouwer’s Fixed Point Theorem

Recently, we have concluded the text on the Transformation formula, remarking that it was a tool in an elementary proof of Brouwer’s Fixed Point Theorem. Let’s have a closer look at that.

Brouwer’s Fixed Point Theorem is at the core of many insights in topology and functional analysis. As many other powerful theorems, it can be stated and understood very easily, however the proof is quite deep. In particular, the conclusions that are drawn from it, are considered even deeper. As we shall see, Brouwer’s theorem can be shown in an elementary fashion, where the Transformation Formula, the Inverse Function Theorem and Weierstrass’ Approximation Theorem are the toughest stepping stones; note that we have given a perfectly elementary proof of Weierstrass’ Theorem before. This makes Brouwer’s theorem accessible to undergraduate calculus students (even though, of course, these stepping stones already mean bringing the big guns to the fight). The downside is that the proof, even though elementary, is quite long-ish. The undergraduate student needs to exercise some patience.


Theorem (Brouwer, 1910): Let K:=\{x\in\mathbb{R}^p\colon \left|x\right|\leq1\} be the compact unit ball, and let f:K\to K be continuous. Then f has a fixed point, i.e. there is some x\in K such that f(x)=x.


There are many generalizations of the Theorem, considering more complex sets instead of K, and taking place in the infinite-dimensional space. We shall get back to that later. First, we shall look at a perfectly trivial and then a slightly less trivial special case.


For p=1, the statement asks to find a fixed point for the continuous mapping f:[0,1]\to[0,1]. W.l.o.g. we have shrunk the set K to [0,1] instead of [-1,1] to avoid some useless notational difficulty. This is a standard exercise on the intermediate value theorem with the function g(x):=f(x)-x. Either, f(1)=1 is the fixed point, or else f(1)<1, meaning g(1)<0 and g(0)=f(0)\geq0. As g is continuous, some point needs to be the zero of g, meaning 0=g(\xi) = f(\xi)-\xi and hence f(\xi)=\xi. q.e.d. (p=1)


For p=2, things are still easy to see, even though a little less trivial. This is an application of homotopy theory (even though one doesn’t need to know much about it). The proof is by contradiction however. We will show an auxiliary statement first: there is no continuous mapping h:K\to\partial K, which is the identity on \partial K, i.e. h(x)=x for x\in\partial K. If there was, we would set

\displaystyle H(t,s):=h(se^{it}), \qquad t\in[0,2\pi], s\in[0,1].

H is a homotopy of the constant curve h(0) = H(t,0) to the circle e^{it} = h(e^{it}) = H(t,1). This means, we can continuously transform the constant curve to the circle. This is a contradiction, as the winding number of the constant is 0, but the winding number of the circle is 1. There can be no such h.

Now, we turn to the proof of the actual statement of Brouwer’s Theorem: If f had no fixed point, we could define a continuous mapping as follows: let x\in K, and consider the line through x and f(x) (which is well-defined by assumption). This line crosses \partial K in the point h(x); actually there are two such points, we shall use the one that is closer to x itself. Apparently, h(x)=x for x\in\partial K. By the auxiliary statement, there is no such h and the assumption fails. f must have a fixed point. q.e.d. (p=2)


For the general proof, we shall follow the lines of Heuser who has found this elementary fashion in the 1970’s and who made it accessible in his second volume of his book on calculus. It is interesting, that most of the standard literature for undergraduate students shies away from any proof of Brouwer’s theorem. Often, the theorem is stated without proof and then some conclusions and applications are drawn from it. Sometimes, a proof via differential forms is given (such as in Königsberger’s book, where it is somewhat downgraded to an exercise after the proof of Stoke’s Theorem) which I wouldn’t call elementary because of the theory which is needed to be developed first. The same holds for proofs using homology groups and the like (even though this is one of the simplest fashions to prove the auxiliary statement given above – it was done in my topology class, but this is by no means elementary).

A little downside is the non-constructiveness of the proof we are about to give. It is a proof by contradiction and it won’t give any indication on how to find the fixed point. For many applications, even the existence of a fixed point is already a gift (think of Peano’s theorem on the existence of solutions to a differential equation, for instance). On the other hand, there are constructive proofs as well, a fact that is quite in the spirit of Brouwer.

In some way, the basic structure of the following proof is similar to the proof that we gave for the case p=2. We will apply the same reasoning that concluded the proof for the special case (after the auxiliary statement), we will just add a little more formality to show that the mapping g is actually continuous and well-defined. The trickier part in higher dimensions is to show the corresponding half from which the contradiction followed. Our auxiliary statement within this previous proof involved the non-existence of a certain continuous mapping, that is called a retraction: for a subset A of a topological space X, f:X\to A is called a retraction of X to A, if f(x)=x for all x\in A. We have found that there is no retraction from K to \partial K. As a matter of fact, Brouwer’s Fixed Point Theorem and the non-existence of a retraction are equivalent (we’ll get back to that at the end).

The basic structure of the proof is like this:

  • we reduce the problem to polynomials, so we only have to deal with those functions instead of a general continuous f;
  • we formalize the geometric intuition that came across in the special case p=2 (this step is in essence identical to what we did above): basing on the assumption that Brouwer’s Theorem is wrong, we define a mapping quite similar to a retraction of K to \partial K;
  • we show that this almost-retraction is locally bijective;
  • we find, via the Transformation Formula, a contradiction: there can be no retraction and there must be a fixed point.

Steps 3 and 4 are the tricky part. They may be replaced by some other argument that yields a contradiction (homology theory, for instance), but we’ll stick to the elementary parts. Let’s go.


Lemma (The polynomial simplification): It will suffice to show Brouwer’s Fixed Point Theorem for those functions f:K\to K, whose components are polynomials on K and which have f(K)\subset\mathring K.


Proof: Let f:K\to K continuous, it has the components f = (f_1,\ldots,f_p), each of which has the arguments x_1,\ldots,x_p. By Weierstrass’ Approximation Theorem, for any \varepsilon>0 there are polynomials p_k^\varepsilon such that \left|f_k(x)-p_k^\varepsilon(x)\right| < \varepsilon, k=1,\ldots,p, for any x\in K. In particular, there are polynomials \varphi_{k,n} such that

\displaystyle \left|f_k(x)-\varphi_{k,n}(x)\right| < \frac{1}{\sqrt{p}n}\qquad\text{for any }x\in K.

If we define the function \varphi_n:=(\varphi_{1,n},\ldots,\varphi_{p,n}) which maps K to \mathbb{R}^p, we get

\displaystyle    \begin{aligned}    \left|f(x)-\varphi_n(x)\right|^2 &= \sum_{k=1}^p\left|f_k(x)-\varphi_{k,n}(x)\right|^2 \\    &< \frac{p}{pn^2} \\    &= \frac1{n^2},\qquad\text{for any }x\in K    \end{aligned}

and in particular \varphi_n\to f uniformly in K.


\displaystyle \left|\varphi_n(x)\right|\leq\left|\varphi_n(x)-f(x)\right| + \left|f(x)\right| < \frac1n + \left|f(x)\right| \leq \frac1n + 1 =:\alpha_n.

This allows us to set

\displaystyle \psi_n(x) = \frac{\varphi_n(x)}{\alpha_n}.

This function also converges uniformly to f, as for any x\in K,

\displaystyle    \begin{aligned}    \left|\psi_n(x)-f(x)\right| &= \left|\frac{\varphi_n(x)}{\alpha_n} - f(x)\right| \\    &= \frac1{\left|\alpha_n\right|}\left|\varphi_n(x)-\alpha_nf(x)\right|\\    &\leq \frac1{\left|\alpha_n\right|}\left|\varphi_n(x)-f(x)\right| + \frac1{\left|\alpha_n\right|}\left|f(x)-\alpha_nf(x)\right|\\    &< \frac1{\left|\alpha_n\right|}\frac1n + \frac1{\left|\alpha_n\right|}\left|f(x)\right|\left|1-\alpha_n\right|\\    &< (1+\delta)\frac1n + \frac{\delta}{1+\delta}\left|f(x)\right|\\    &< \varepsilon \qquad\text{for }n\gg0.    \end{aligned}

Finally, for x\in K, by construction, \left|\varphi_n(x)\right|\leq\alpha_n, and so \left|\psi_n(x)\right| = \frac{\left|\varphi_n(x)\right|}{\alpha_n} < 1, which means that \psi_n:K\to\mathring K.

The point of this lemma is to state that if we had shown Brouwer’s Fixed Point Theorem for every such function \psi_n:K\to\mathring K, whose components are polynomials, we had proved it for the general continuous function f. This can be seen as follows:

As we suppose Brouwer’s Theorem was true for the \psi_n, there would be a sequence (x_n)\subset K with \psi_n(x_n) = x_n. As K is (sequentially) compact, there is a convergent subsequence (x_{n_j})\subset(x_n), and \lim_jx_{n_j} = x_0\in K. For sufficiently large j, we see

\displaystyle \left|\psi_{n_j}(x_{n_j})-f(x_0)\right| \leq\left|\psi_{n_j}(x_{n_j})-f(x_{n_j})\right| + \left|f(x_{n_j})-f(x_0)\right| < \frac\varepsilon2 + \frac\varepsilon2.

The first bound follows from the fact that \psi_{n_j}\to f uniformly, the second bound is the continuity of f itself, with the fact that x_{n_j}\to x_0. In particular,

\displaystyle x_0 = \lim_{j} x_{n_j} = \lim_{j} \psi_{n_j}(x_{n_j}) = f(x_0).

So, f has the fixed point x_0, which proves Brouwer’s Theorem.

In effect, it suffices to deal with functions like the \psi_n for the rest of this text. q.e.d.


Slogan (The geometric intuition): If Brouwer’s Fixed Point Theorem is wrong, then there is “almost” a retraction of K to \partial K.

Or, rephrased as a proper lemma:

Lemma: For f being polynomially simplified as in the previous lemma, assuming x\neq f(x) for any x\in K, we can construct a continuously differentiable function g_t:K\to K, t\in[0,1], with g_t(x)=x for x\in\partial K. This function is given via

\displaystyle g_t(x) =x + t\lambda(x)\bigl(x-f(x)\bigr),

\displaystyle \lambda(x)=\frac{-x\cdot\bigl(x-f(x)\bigr)+\sqrt{\left(x\cdot\bigl(x-f(x)\bigr)\right)^2+\bigl(1-\left|x\right|^2\bigr)\left|x-f(x)\right|^2}}{\left|x-f(x)\right|^2}.

The mapping t\mapsto g_t is the direct line from x to the boundary of \partial K, which also passes through f(x). \lambda(x) is the parameter in the straight line that defines the intersection with \partial K.


Proof: As we suppose, Brouwer’s Fixed Point Theorem is wrong the continuous function \left|x-f(x)\right| is positive for any x\in K. Because of continuity, for every y\in \partial K, there is some \varepsilon = \varepsilon(y) > 0, such that still \left|x-f(x)\right|>0 in the neighborhood U_{\varepsilon(y)}(y).

Here, we have been in technical need of a continuation of f beyond K. As f is only defined on K itself, we might take f(x):=f\bigl(\frac{x}{\left|x\right|}\bigr) for \left|x\right|>1. We still have \left|f(x)\right| < 1 and f(x)\neq x, which means that we don’t get contradictions to our assumptions on f. Let’s not dwell on this for longer than necessary.

On the compact set \partial K, finitely many of the neighborhoods U_{\varepsilon(y)}(y) will suffice to cover \partial K. One of them will have a minimal radius. We shall set \delta =  \min_y\varepsilon(y) +1, to find: there is an open set U = U_\delta(0)\supset K with \left|x-f(x)\right| >0 for all x\in U.

Let us define for any x\in U

\displaystyle d(x):=\frac{\left(x\cdot\bigl(x-f(x)\bigr)\right)^2+\bigl(1-\left|x\right|\bigr)^2\left|x-f(x)\right|^2}{\left|x-f(x)\right|^4}.

It is well-defined by assumption. We distinguish three cases:


a) \left|x\right|<1: Then 1-\left|x\right|^2>0 and the numerator of d(x) is positive.

b) \left|x\right|=1: Then the numerator of d(x) is

\displaystyle \left(x\cdot\bigl(x-f(x)\bigr)\right)^2 = \bigl(x\cdot x - x\cdot f(x)\bigr)^2 = \bigl(\left|x\right|^2-x\cdot f(x)\bigr)^2 = \bigl(1-x\cdot f(x)\bigr)^2,

where by Cauchy-Schwarz and by assumption on f, we get

\displaystyle x\cdot f(x) \leq \left|x\right|\left|f(x)\right| = \left|f(x)\right| < 1\qquad (\spadesuit).

In particular, the numerator of d(x) is strictly positive.

c) \left|x\right|>1: This case is not relevant for what’s to come.


We have seen that d(x)>0 for all \left|x\right|\leq 1. Since d is continuous, a compactness argument similar to the one above shows that there is some V = V_{\delta'}(0)\supset K with d(x)>0 for all x\in V. If we pick \delta'=\delta if \delta is smaller, we find: d is positive and well-defined on V.

The reason why we have looked at d is not clear yet. Let us grasp at some geometry first. Let x\in V and \Gamma_x = \left\{x+\lambda\bigl(x-f(x)\bigr)\colon\lambda\in\mathbb{R}\right\} the straight line through x and f(x). If we look for the intersection of \Gamma_x with \partial K, we solve the equation

\displaystyle\left|x+\lambda\bigl(x-f(x)\bigr)\right| = 1.

The intersection “closer to” x is denoted by some \lambda>0.

This equation comes down to

\displaystyle    \begin{aligned}    && \left(x+\lambda\bigl(x-f(x)\bigr)\right) \cdot \left(x+\lambda\bigl(x-f(x)\bigr)\right) &=1 \\    &\iff& \left|x\right|^2 + 2\lambda x\cdot\bigl(x-f(x)\bigr) + \lambda^2\left|x-f(x)\right|^2 &=1\\    &\iff& \lambda^2\left|x-f(x)\right|^2 + 2\lambda x\cdot\bigl(x-f(x)\bigr) &= 1-\left|x\right|^2\\    &\iff& \left(\lambda+\frac{x\cdot\bigl(x-f(x)\bigr)}{\left|x-f(x)\right|^2}\right)^2 &= \frac{1-\left|x\right|^2}{\left|x-f(x)\right|^2} + \left(\frac{x\cdot\bigl(x-f(x)\bigr)}{\left|x-f(x)\right|^2}\right)^2 \\    &\iff& \left(\lambda+\frac{x\cdot\bigl(x-f(x)\bigr)}{\left|x-f(x)\right|^2}\right)^2 &= \frac{(1-\left|x\right|)^2\left|x-f(x)\right|^2+\left(x\cdot\bigl(x-f(x)\bigr)\right)^2}{\left|x-f(x)\right|^4} \\    &\iff& \left(\lambda+\frac{x\cdot\bigl(x-f(x)\bigr)}{\left|x-f(x)\right|^2}\right)^2 &= d(x).    \end{aligned}

As x\in V, d(x)>0, and hence there are two real solutions to the last displayed equation. Let \lambda(x) be the larger one (to get the intersection with \partial K closer to x), then we find

\displaystyle    \begin{aligned}    \lambda(x) &= \sqrt{d(x)} - \frac{x\cdot\bigl(x-f(x)\bigr)}{\left|x-f(x)\right|^2}\\    &= \frac{-x\cdot\bigl(x-f(x)\bigr)+\sqrt{\left(x\cdot\bigl(x-f(x)\bigr)\right)^2+\bigl(1-\left|x\right|^2\bigr)\left|x-f(x)\right|^2}}{\left|x-f(x)\right|^2}.    \end{aligned}

By construction,

\displaystyle \left|x+\lambda(x)\bigl(x-f(x)\bigr)\right| = 1,\qquad\text{for all }x\in V.\qquad(\clubsuit)

Let us define

\displaystyle g_t(x) = x+t\lambda(x)\bigl(x-f(x)\bigr),\qquad t\in[0,1],~x\in V.

This is (at least) a continuously differentiable function, as we simplified f to be a polynomial and the denominator in \lambda(x) is bounded away from 0. Trivially and by construction, g_0(x)=x and \left|g_1(x)\right| = 1 for all x\in V.

For \left|x\right|<1 and t<1, we have

\displaystyle    \begin{aligned}    \left|x+t\lambda(x)\bigl(x-f(x)\bigr)\right| &\stackrel{\hphantom{(\clubsuit)}}{=} \left|\bigl(t+(1-t)\bigr)x + t\lambda(x)\bigl(x-f(x)\bigr)\right|\\    &\stackrel{\hphantom{(\clubsuit)}}{=}\left|t\left(x+\lambda(x)\bigl(x-f(x)\bigr)\right)+(1-t)x\right|\\    &\stackrel{\hphantom{(\clubsuit)}}{\leq} t\left|x+\lambda(x)\bigl(x-f(x)\bigr)\right|+(1-t)\left|x\right|\\    &\stackrel{(\clubsuit)}{=} t+(1-t)\left|x\right|\\    &\stackrel{\hphantom{(\clubsuit)}}{<} t+(1-t) = 1\qquad (\heartsuit).    \end{aligned}

Hence, \left|g_t(x)\right|<1 for \left|x\right|<1 and t\in[0,1). This means g_t(\mathring K)\subset\mathring K for t<1.

For \left|x\right|=1, we find (notice that x\cdot\bigl(x-f(x)\bigr)>0 for \left|x\right|=1, by (\spadesuit)).

\displaystyle    \begin{aligned}    \lambda(x) &= \frac{-x\cdot\bigl(x-f(x)\bigr)+\sqrt{\left(x\cdot\bigl(x-f(x)\bigr)\right)^2}}{\left|x-f(x)\right|^4} \\    &= \frac{-x\cdot\bigl(x-f(x)\bigr)+x\cdot\bigl(x-f(x)\bigr)}{\left|x-f(x)\right|^4} = 0.    \end{aligned}

This is geometrically entirely obvious, since \lambda(x) denotes the distance of x to the intersection with \partial K; if x\in\partial K, this distance is apparently 0.

We have seen that g_t(x)=x for \left|x\right|=1 for any t\in[0,1]. Hence, g_t(\partial K)=\partial K for all t. g_t is almost a retraction, g_1 actually is a retraction. q.e.d.


Note how tricky the general formality gets, compared to the more compact and descriptive proof that we gave in the special case p=2. The arguments of the lemma and in the special case are identical.


Lemma (The bijection): Let \hat K be a closed ball around 0, K\subset\hat K\subset V. The function g_t is a bijection on \hat K, for t\geq0 sufficiently small.


Proof: We first show that g_t is injective. Let us define h(x):=\lambda(x)\bigl(x-f(x)\bigr), for reasons of legibility. As we saw above, h is (at least) continuously differentiable. We have

\displaystyle g_t(x) = x+th(x),\qquad g_t'(x)=\mathrm{Id}+th'(x).

As \hat K is compact, h' is bounded by \left|h'(x)\right|\leq C, say. By enlarging C if necessary, we can take C\geq1. Now let x,y\in\hat K with g_t(x)=g_t(y). That means x+th(x)=y+th(y) and so, by the mean value theorem,

\displaystyle \left|x-y\right| = t\left|h(x)-h(y)\right|\leq tC\left|x-y\right|.

By setting \varepsilon:=\frac1C and taking t\in[0,\varepsilon), we get \left|x-y\right| = 0. g_t is injective for t<\varepsilon.

Our arguments also proved \left|th'(x)\right| < 1. Let us briefly look at the convergent Neumann series \sum_{k=0}^\infty\bigl(th'(x)\bigr)^k, having the limit s, say. We find

\displaystyle sth'(x) = \sum_{k=0}^\infty\bigl(th'(x)\bigr)^{k+1} = s-\mathrm{Id},

which tells us

\displaystyle \mathrm{Id} = s-s\cdot th'(x) = s\bigl(\mathrm{Id}-th'(x)\bigr).

In particular, g_t'(x) = \mathrm{Id}-th'(x) is invertible, with the inverse s. Therefore, \det g_t'(x)\neq0. Since this determinant is a continuous function of t, and \det g_0'(x) = \det\mathrm{Id} = 1, we have found

\displaystyle \det g_t'(x) > 0 \text{ for any }t\in[0,\varepsilon),~x\in\hat K.

Now, let us show that g_t is surjective. As \det g_t'(x) never vanishes on \hat K, g_t is an open mapping (by an argument involving the inverse function theorem; g_t can be inverted locally in any point, hence no point can be a boundary point of the image). This means that g_t(\mathring K) is open.

Let z\in K with z\notin g_t(\mathring K); this makes z the test case for non-surjectivity. Let y\in g_t(\mathring K); there is some such y due to (\heartsuit). The straight line between y and z is

\displaystyle \overline{yz}:=\left\{(1-\lambda)y+\lambda z\colon \lambda\in[0,1]\right\}.

As g_t is continuous, there must be some point v\in\partial g_t(\mathring K)\cap\overline{yz}; we have to leave the set g_t(\mathring K) somewhere. Let us walk the line until we do, and then set

\displaystyle v=(1-\lambda_0)y+\lambda_0z,\qquad\text{with }\lambda_0=\sup\left\{\lambda\in[0,1]\colon\overline{y;(1-\lambda)y+\lambda z}\subset g_t(\mathring K)\right\}.

Now, continuous images of compact sets remain compact: g_t(K) is compact and hence closed. Therefore, we can conclude

\displaystyle g_t(\mathring K)\subset g_t(K)\quad\implies\quad \overline{g_t(\mathring K)}\subset g_t(K)\quad\implies\quad v\in\overline{g_t(\mathring K)}\subset g_t(K).

This means that there is some u\in K such that v=g_t(u). As g_t(\mathring K) is open, u\in\partial K (since otherwise, v\notin\partial g_t(\mathring K) which contradicts the construction). Therefore, \left|u\right|=1, and since g_t is almost a retraction, g_t(u)=u. Now,

\displaystyle v=g_t(u) = u \quad\implies\quad v\in\partial K.

But by construction, v is a point between z\in K and y\in g_t(\mathring K); however, y\notin\partial K, since g_t(\mathring K) is open. Due to the convexity of K, we have no choice but z\in\partial K, and by retraction again, g_t(z)=z. In particular, z\in g_t(\partial K).

We have shown that if z\notin g_t(\mathring K), then z\in g_t(\partial K). In particular, z\in g_t(K) for any z\in K. g_t is surjective. q.e.d.


Lemma (The Integral Application): The real-valued function

\displaystyle V(t)=\int_K\det g_t'(x)dx

is a polynomial and satisfies V(1)>0.


Proof: We have already seen in the previous lemma that \det g_t'(x)>0 on x\in\mathring{\hat K} for t<\varepsilon. This fact allows us to apply the transformation formula to the integral:

\displaystyle V(t) = \int_{g_t(K)}1dx.

As g_t is surjective, provided t is this small, g_t(K) = K, and therefore

\displaystyle V(t) = \int_K1dx = \mu(K).

In particular, this no longer depends on t, which implies V(t)>0 for any t<\varepsilon.

By the Leibniz representation of the determinant, \det g_t'(x) is a polynomial in t, and therefore, so is V(t). The identity theorem shows that V is constant altogether: in particular V(1)=V(0)>0. q.e.d.


Now we can readily conclude the proof of Brouwer’s Fixed Point Theorem, and we do it in a rather unexpected way. After the construction of g_t, we had found \left|g_1(x)\right|=1 for all x\in V. Let us write this in its components and take a partial derivative (j=1,\ldots,p)

\displaystyle    \begin{aligned}    &&1 &= \sum_{k=1}^p\bigl(g_{1,k}(x)\bigr)^2\\    &\implies& 0 &= \frac\partial{\partial x_j}\sum_{k=1}^p\bigl(g_{1,k}(x)\bigr)^2 = \sum_{k=1}^p2\frac{\partial g_{1,k}(x)}{\partial x_j}g_{1,k}(x)    \end{aligned}

This last line is a homogeneous system of linear equations, that we might also write like this

\displaystyle \begin{pmatrix}\frac{\partial g_{1,1}(x)}{\partial x_1}&\cdots &\frac{\partial g_{1,p}(x)}{\partial x_1}\\ \ldots&&\ldots\\ \frac{\partial g_{1,1}(x)}{\partial x_p}&\cdots&\frac{\partial g_{1,p}(x)}{\partial x_p}\end{pmatrix} \begin{pmatrix}\xi_1\\\ldots\\\xi_p\end{pmatrix} = 0,

and our computation has shown that the vector \bigl(g_{1,1}(x),\ldots,g_{1,p}(x)\bigr) is a solution. But the vector 0 is a solution as well. These solutions are different because of \left|g_1(x)\right| = 1. If a system of linear equations has two different solutions, it must be singular (it is not injective), and the determinant of the linear system vanishes:

\displaystyle 0 = \det \begin{pmatrix}\frac{\partial g_{1,1}(x)}{\partial x_1}&\cdots &\frac{\partial g_{1,p}(x)}{\partial x_1}\\ \ldots&&\ldots\\ \frac{\partial g_{1,1}(x)}{\partial x_p}&\cdots&\frac{\partial g_{1,p}(x)}{\partial x_p}\end{pmatrix} = \det g_1'(x).

This means

\displaystyle 0 = \int_K\det g_1'(x)dx = V(1) > 0.

A contradiction, which stems from the basic assumption that Brouwer’s Fixed Point Theorem were wrong. The Theorem is thus proved. q.e.d.


Let us make some concluding remarks. Our proof made vivid use of the fact that if there is a retraction, Brouwer’s Theorem must be wrong (this is where we got our contradiction in the end: the retraction cannot exist). The proof may also be started the other way round. If we had proved Brouwer’s Theorem without reference to retractions (this is how Elstrodt does it), you can conclude that there is no retraction from K to \partial K as follows: if there was a retraction g:K\to\partial K, we could consider the mapping -g. It is, in particular, a mapping of K to itself, but it does not have any fixed point – a contradiction to Brouwer’s Theorem.


Brouwer’s Theorem, as we have stated it here, is not yet ready to drink. For many applications, the set K is too much of a restriction. It turns out, however, that the hardest work has been done. Some little approximation argument (which in the end amounts to continuous projections) allows to formulate, for instance:

  • Let C\subset\mathbb{R}^p be convex, compact and C\neq\emptyset. Let f:C\to C be continuous. Then f has a fixed point.
  • Let E be a normed space, K\subset E convex and \emptyset\neq C\subset K compact. Let f:K\to C be continuous. Then f has a fixed point.
  • Let E be a normed space, K\subset E convex and K\neq\emptyset. Let f:K\to K be continuous. Let either K be compact or K bounded and f(K) relatively compact. Then f has a fixed point.

The last two statements are called Schauder’s Fixed Point Theorems, which may often be applied in functional analysis, or are famously used for proofs of Peano’s Theorem in differential equations. But at the core of all of them is Brouwer’s Theorem. This seems like a good place to end.