The Theorems of Schauder and Peano

In the text on Brouwer’s Fixed Point Theorem, I had confidently stated that Schauder’s Theorem follows from it with less effort and that one may easily conclude things like Peano’s Theorem from there. As a matter of fact, things lie considerably deeper than I had naively thought. If sound proofs are to be given, there is technical work to be done in many instances. The ideas are not tough themselves, but the sheer number of steps to be taken and the methodological machinery cannot be neglected. Let’s see.

We shall follow the lines of Heuser’s books (both this one and this one), as we did before, to collect the ingredients to give a proof of Schauder’s Fixed Point Theorem. It involves a statement about convex sets, on which we will focus first, followed by an excursion on approximation in normed vector spaces. We shall also need the theorem named after Arzelà and Ascoli, being a basic glimpse into the ways of thinking of functional analysis. All of this will allow us to prove Schauder’s Theorem in a rather strong flavour. For the conclusion, we split our path: we give both Heuser’s treatment of Peano’s Theorem in the spirit of functional analysis and Walter’s more elementary approach (which, however, also makes use of the Theorem of Arzelà-Ascoli).


Remember that a set K is called convex, if for any x,y\in K and for any \alpha\in[0,1], we have \alpha x + (1-\alpha)y\in K. This formalizes the intuition that the line connecting x and y be contained in K as well.


Lemma (on convex sets): Let E be a normed space and let x_1,\ldots,x_n\in E. Let

\displaystyle\mathrm{conv}(x_1,\ldots,x_n) = \bigcap_{\substack{K\subset E~\mathrm{ convex}\\\{x_1,\ldots,x_n\}\subset K}} K

the convex hull. Then,

\displaystyle \mathrm{conv}(x_1,\ldots,x_n) = \left\{v\in E\colon v=\sum_{i=1}^n\lambda_ix_i\text{ with }\sum_{i=1}^n\lambda_i=1; \lambda_i\geq0\right\},\qquad(\diamondsuit)

and \mathrm{conv}(x_1,\ldots,x_n) is compact.


Proof: Let us first prove the representation of the convex hull. For the “\subset“-direction, we will show that the set on the right-hand side of (\diamondsuit) is convex. Let x = \sum_{i=1}^n\lambda_ix_i and y=\sum_{i=1}^n\mu_ix_i, with \sum\lambda_i=\sum\mu_i=1. Let \alpha\in[0,1], then

\displaystyle \alpha x + (1-\alpha)y = \sum_{i=1}^n\bigl(\alpha\lambda_i+(1-\alpha)\mu_i\bigr)x_i,

where \sum_{i=1}^n\bigl(\alpha\lambda_i+(1-\alpha)\mu_i\bigr) = \alpha+(1-\alpha) = 1. Hence, \alpha x+(1-\alpha)y is part of the set on the right-hand side of (\diamondsuit) .

We now turn to the “\supset“-direction. Let y_1,\ldots, y_m\in\mathrm{conv}(x_1,\ldots,x_n). We show that \sum_{j=1}^m\mu_jy_j\in\mathrm{conv}(x_1,\ldots,x_m) if \sum_{j=1}^m\mu_j = 1. That means any point that as a representation as in the right-hand side of (\diamondsuit) must be in \mathrm{conv}(x_1,\ldots,x_n). This is clear for m=1. For m>1, we take C:=\sum_{j=1}^{m-1}\mu_j to see

\displaystyle    \begin{aligned}    \sum_{j=1}^m\mu_jy_j &= \sum_{j=1}^{m-1}\mu_jy_j + \mu_my_m\\    &= C\sum_{j=1}^{m-1}\frac{\mu_j}{C}y_j + \mu_my_m\\    &= C\sum_{j=1}^{m-1}\tilde\mu_j y_j + (1-C)y_m.    \end{aligned}

Note that \mu_m = \sum_{j=1}^m\mu_j - \sum_{j=1}^{m-1}\mu_j = 1-C. By induction,

\displaystyle \sum_{j=1}^{m-1}\tilde\mu_jy_j\in\mathrm{conv}(x_1,\ldots,x_n),

since \sum_{j=1}^{m-1}\tilde\mu_j = \frac1C\sum_{j=1}^{m-1}\mu_j = 1. Hence,


Finally, we shall prove compactness. Let (y_k)_k\subset\mathrm{conv}(x_1,\ldots,x_n), with a representation y_k = \sum_{i=1}^n\lambda_i^{(k)}x_i. The sequences (\lambda_i^{(k)})_k are bounded by [0,1] and hence have convergent subsequences. Choosing subsequences n times (for each i=1,\ldots,n), we find a subsequence (\lambda_i^{(k_\ell)})_\ell that converges to some \lambda_i for each i=1,\ldots,n. Besides,

\displaystyle\sum_{i=1}^n\lambda_i = \sum_{i=1}^n\lim_{\ell\to\infty}\lambda_i^{(k_\ell)} = \lim_{\ell\to\infty}1 = 1.

This yields

\displaystyle\lim_{\ell\to\infty}y_{k_\ell} = \lim_{\ell\to\infty}\sum_{j=1}^n\lambda_j^{(k_\ell)}x_j = \sum_{j=1}^n\lambda_jx_j\in\mathrm{conv}(x_1,\ldots,x_n).



We will now prove a result that extends Brouwer’s Fixed Point Theorem to a more general setting. This is the one that I had skimmed earlier, believing it consisted only of standard arguments; in principle, this is true. But let’s have a closer look at it and how these standard arguments work together.


Theorem (on fixed points in real convex sets): Let \emptyset\neq K\subset\mathbb{R}^p be convex, compact, and let f:K\to K be continuous. Then f has a fixed point.

Proof: 0th step. As K is compact, it is bounded and thus there is some r>0 such that K\subset B_r(0).

1st step. Let us construct the best approximation of some x\in B_r(0) within K; that means we look for a z\in K with \left|x-z\right| = \inf_{y\in K}\left|x-y\right|.

Taking \gamma:=\inf_{y\in K}\left|x-y\right|, there is a sequence (z_n)_n\subset K with \lim_{n\to\infty}\left|x-z_n\right| = \gamma.

We wish to prove that (z_n)_n is a Cauchy sequence. From the basic properties of any scalar product, we find

\displaystyle    \begin{aligned}    \left|u+v\right|^2+\left|u-v\right|^2 &= \left\langle u+v,u+v\right\rangle + \left\langle u-v,u-v\right\rangle \\    &= \left|u\right|^2 + \left|v\right|^2 + 2\left\langle u,v\right\rangle + \left|u\right|^2 + \left|v\right|^2 - 2\left\langle u,v\right\rangle \\    &= 2 \left|u\right|^2 + 2 \left|v\right|^2.    \end{aligned}

In our case, this shows

\displaystyle    \begin{aligned}    \left|z_n-z_m\right|^2 &= \left|(z_n-x)-(z_m-x)\right|^2 \\    &= 2\left|z_n-x\right|^2 + 2\left|z_m-x\right|^2- \left|(z_m+z_n)-2x\right|^2 \\    &= 2\left|z_n-x\right|^2 + 2\left|z_m-x\right|^2 - 4\left|\frac{z_m+z_n}2-x\right|^2.    \end{aligned}

Since K is convex, \frac12z_n+\frac12z_m\in K. Therefore,



\displaystyle    \begin{aligned}    \lim_{n,m\to\infty}\left|z_n-z_m\right|^2&\leq 2\lim_{n\to\infty}\left|z_n-x\right|^2 + 2\lim_{m\to\infty}\left|z_m-x\right|^2 - 4\gamma^2 \\    &= 2\gamma^2+2\gamma^2-4\gamma^2 = 0.    \end{aligned}

Therefore, (z_n)_n is a Cauchy sequence, having a limit y, say. As K is closed, y\in K. In total, we have seen (noting that the absolute value is continuous)

\displaystyle\gamma = \lim_{n\to\infty}\left|z_n-x\right| = \left|y-x\right|.

y is the best approximation to x within K.

2nd step. The best approximation is unique.

If there were two of them, u and V, say, then \left|x-u\right| = \left|x-v\right| = \gamma. If we consider the sequence (z_n)_n that alternates between u and V, we’d find \left|x-z_n\right| = \gamma for all n, and hence (z_n)_n is a Cauchy sequence by what we found in step 1. Therefore z_n must be convergent, which implies u=v.

3rd step. The mapping A:B_r(0) \to K that takes x to its best approximation, is continuous.

Let (x_n)_n\subset B_r(0) with \lim_{n\to\infty}x_n =: x. Let \varepsilon>0. For sufficiently large n, we have

\displaystyle \gamma_n :=\inf_{y\in K}\left|x_n-y\right| \leq \inf_{y\in K}\bigl(\left|x_n-x\right|+\left|x-y\right|\bigr) < \varepsilon + \inf_{y\in K}\left|x-y\right| = \varepsilon + \gamma.


\displaystyle \gamma\leq\left|x - A(x_n)\right|\leq\left|x_n-A(x_n)\right| + \left|x_n-x\right| = \gamma_n + \left|x_n-x\right| < \gamma_n+\varepsilon.

These inequalities give us

\displaystyle \gamma\leq\left|x-A(x_n)\right| < 2\varepsilon + \gamma,\qquad\text{ which means }\lim_{n\to\infty}\left|x-A(x_n)\right| = \gamma.

Hence, \lim_{n\to\infty}A(x_n) is the best approximation to x. As this is unique, we have shown that A is sequentially continuous.

4th step. The quest for the fixed point.

The mapping f\circ A:B_r(0)\to K\subset B_r(0) is continuous. By Brouwer’s Fixed Point Theorem, f\circ A has a fixed point: there is some w\in B_r(0) with f\bigl(A(w)\bigr) = w. As f only takes images in K, we must have w\in K. By construction, for points in K, the mapping A does not do anything: hence

\displaystyle w = f\bigl(A(w)\bigr) = f(w).



Corollary (on fixed points in convex sets of normed spaces): Let x_1,\ldots,x_n\in E a normed vector space, let \emptyset\neq K\subset\mathrm{span}(x_1,\ldots,x_n) convex, compact, and let f:K\to K continuous. Then f has a fixed point.

Proof: Let us choose a base for \mathrm{span}(x_1,\ldots,x_n) from the x_1,\ldots,x_n. We take w.l.o.g. x_1,\ldots,x_p for a certain p\leq n. Then any y\in\mathrm{span}(x_1,\ldots,x_n) has a unique representation as y=\sum_{j=1}^p\beta_jx_j, and the maping

\displaystyle A:\mathrm{span}(x_1,\ldots,x_n)\to\mathbb{R}^p,\qquad y\mapsto(\beta_1,\ldots,\beta_p)

is a bijection. As all norms on \mathbb{R}^p are equivalent, convergence issues are not affected by this bijection. Hence, the theorem and its proof work out in the setting of this corollary, too. q.e.d.


Note that this Corollary may deal with an infinite-dimensional space, however we make use of a finite-dimensional subspace only. This will become relevant in Schauder’s Theorem as well.


Theorem (Arzelà 1895, Ascoli 1884): Let X\subset\mathbb{R}^d compact, let \mathcal F be a family of continuous real-valued functions on X, which satisfies two properties:

  • it is pointwise bounded: for any x\in X, there is some M(x)\in\mathbb{R} with \left|f(x)\right|\leq M(x), for all f\in\mathcal F.
  • it is equicontinuous: for any \varepsilon>0 there is some \delta>0 such that for any x,y\in X with \left|x-y\right|<\delta we have \left|f(x)-f(y)\right|<\varepsilon, for all f\in\mathcal{F}.

Then, \mathcal F is relatively compact, that means every sequence in \mathcal F has a uniformly convergent subsequence.


Note, that we do not demand the limit of the convergent subsequence to be contained in \mathcal F; that would mean compact, instead of relatively compact.


Proof: 1st step. We get hold of a countable dense subset of X.

For our immediate uses of the theorem, it should suffice to choose \mathbb{Q}\cap X, since we will take X to be intervals and there will not be any need for more exotic applications. However, to show up something a little more general, have a look at the sets \bigl(U_{1/k}(x)\bigr)_{x\in X}. This is a covering of X and finitely many of them will suffice to cover X, for instance M_k:=\{x_{k1},\ldots,x_{kn}\}. The set M:=\bigcup_{k=1}^\infty M_k is countable. By construction, for any x_0\in X and any \varepsilon>0, we can find some point y\in M that has \left|x-y\right|<\varepsilon. Therefore, M is dense in X.

2nd step. We construct a certain subsequence to a given sequence (f_n)_n\subset\mathcal F.

This step is at the heart of the Arzelà-Ascoli-Theorem, with a diagonal argument to make it work. Let us enumerate the set M from the step 1 as \{x_1,x_2,\ldots\}.

As \mathcal F is pointwise bounded, the sequence \bigl(f_n(x_1)\bigr)_n\subset\mathbb{R} is bounded as well. By Bolzano-Weierstrass, it has a convergent subsequence that we will call \bigl(f_{1,n}(x_1)\bigr)_n.

If we evaluate this new sequence in x_2, we arrive at \bigl(f_{1,n}(x_2)\bigr)_n\subset\mathbb{R}, which is bounded as well. Again, we find a convergent subsequence that is now called \bigl(f_{2,n}(x_2)\bigr)_n. As this is a subsequence of \bigl(f_{1,n}(x_1)\bigr)_n, it converges in x_1 as well.

We continue this scheme and we find an array of sequences like this

f_{11} f_{12} f_{13} \cdots
f_{21} f_{22} f_{23} \cdots
f_{31} f_{32} f_{33} \cdots
\vdots \vdots \vdots \ddots


where each row is a subsequence of the row above. Row K is convergent in the point x_k by Bolzano-Weierstrass and convergent in the points x_1,\ldots,x_{k-1} by construction.

Now, consider the sequence (f_{nn})_n. It will converge in any point of M.

3rd step. Our subsequence of the 2nd step converges uniformly on X. We will use equicontinuity to expand the convergence from M to the whole of X.

As \mathcal F is equicontinuous, we will find for any \varepsilon>0 some \delta>0 with \left|f_{nn}(x)-f_{nn}(y)\right| < \frac\varepsilon3, for all n\in\mathbb{N}, as long as \left|x-y\right|<2\delta. Since X is compact, there are some points y_1,\ldots,y_p\in X with X\subset\bigcup_{j=1}^pU_\delta(y_j). And as M is dense in X, we can find some \xi_j\in U_\delta(y_j)\cap M for any j=1,\ldots,p.

Let x\in U_\delta(y_j), then

\displaystyle \left|x-\xi_j\right| \leq \left|x-y_j\right|+\left|y_j-\xi_j\right| < 2\delta,

which shows

\displaystyle \left| f_{nn}(x)-f_{nn}(\xi_j)\right| < \frac\varepsilon 3\qquad\text{ for any }n\in\mathbb{N}\text{ and }x\in X\cap U_\delta(y_j).

We have already seen that (f_{nn})_n is convergent on M, and hence (convergent sequences are Cauchy-sequences)

\displaystyle \left|f_{nn}(\xi_j) - f_{mm}(\xi_j)\right| < \frac\varepsilon 3\qquad \text{ for sufficiently large }m,n\text{ and for any }j=1,\ldots,p.

Now, let x\in X, no longer restricted. Then, there is some j=1,\ldots,p such that x\in U_\delta(x_j), and

\displaystyle    \begin{aligned}    \left|f_{nn}(x)-f_{mm}(x)\right| &\leq \left| f_{nn}(x)-f_{nn}(\xi_j)\right| + \left|f_{nn}(\xi_j) - f_{mm}(\xi_j)\right| + \left|f_{mm}(\xi_j) - f_{mm}(x)\right| \\    &< \frac\varepsilon3+\frac\varepsilon3+\frac\varepsilon3 = \varepsilon.    \end{aligned}

Thus, \left\|f_{nn}-f_{mm}\right\|_\infty < \varepsilon for sufficiently large n,m. This sequence is a Cauchy sequence and hence convergent. q.e.d.


This was our last stepping stone towards Schauder’s Theorem. Let’s see what we can do.


Theorem (Schauder, 1930): Let E be a normed vector space, \emptyset\neq K\subset E convex and closed, let f:K\to K continuous, f(K) relatively compact. Then f has a fixed point.


Proof: 1st step. As f(K) is relatively compact, its closure is compact. We construct a finite approximating subset of \overline{f(K)}.

Let \varepsilon>0. There are some finitely many points x_1,\ldots,x_m\in\overline{f(K)} with \overline{f(K)}\subset\bigcup_{j=1}^mU_\varepsilon(x_j). In particular, for any x\in f(K), there is some j=1,\ldots,m with \left|x_j-x\right| < \varepsilon. Let us consider the function for x\in f(K)

\displaystyle \varphi_j(x):=\mathbf{1}_{\left|x_j-x\right|<\varepsilon}\bigl(\varepsilon-\left|x-x_j\right|\bigr).

It is obviously continuous, and as \overline{f(K)} is covered by these U_\varepsilon,

\displaystyle \varphi(x)=\sum_{j=1}^m\varphi_j(x) > 0.

This allows \psi_j(x):=\frac{\varphi_j(x)}{\varphi(x)} to be well-defined, and by construction \psi(x)=\sum_{j=1}^m\psi_j(x) = 1. Hence, the function g:f(K)\to\mathrm{conv}(x_1,\ldots,x_m)

\displaystyle g(x):=\sum_{j=1}^m\psi_j(x) x_j

is continuous (the Lemma on convex sets tells us that this actually maps into the convex hull). Now, let x\in f(K). We find

\displaystyle g(x)-x = \sum_{j=1}^m\psi_j(x)x_j - x = \sum_{j=1}^m\psi_j(x)\bigl(x_j-x\bigr) = \sum_{\substack{j=1\\\left|x_j-x\right|<\varepsilon}}^m\psi_j(x)\bigl(x_j-x),

and therefore, for any x\in f(K),

\displaystyle \left|g(x)-x\right| \leq \sum_{\substack{j=1\\\left|x_j-x\right|<\varepsilon}}^m\psi_j(x)\left|x_j-x\right| < \varepsilon\sum_{j=1}^m\psi_j(x) = \varepsilon.

This shows that g uniformly approximates the identity on \mathrm{conv}(x_1,\ldots,x_m)\subset f(K). Note that g depends on the choice of \varepsilon.

2nd step. Reference to the Theorem on fixed points in convex sets and approximation of the fixed point.

We set h:=g\circ f, which is a continuous mapping

\displaystyle h:K\to\mathrm{conv}(x_1,\ldots,x_m)\subset f(K) \subset K.

We can restrict it to \mathrm{conv}(x_1,\ldots,x_m) and then re-name it \tilde h.  By the Lemma on convex sets, \mathrm{conv}(x_1,\ldots,x_m) is compact, it is finite-dimensional, and by the Corollary on fixed point sets in normed spaces, \tilde h has a fixed point z:

z = \tilde h(z) = g\bigl(f(z)\bigr)\qquad\text{ for some }z\in\mathrm{conv}(x_1,\ldots,x_m).


\displaystyle \left|f(z)-z\right| = \left|f(z)-g\bigl(f(z)\bigr)\right| < \varepsilon.

Note that z depends on g and hence on \varepsilon.

3rd step. Construction of the fixed point.

For any n\in\mathbb{N}, by step 2, we find some z_n\in\mathrm{conv}(x_1^{(n)},\ldots,x_{m(n)}^{(n)})\subset f(K)\subset K with

\displaystyle\left|f(z_n)-z_n\right| < \frac1n.

As f(K) is relatively compact, the sequence \bigl(f(z_n)\bigr)_n has a convergent subsequence: there is some \tilde z\in\overline{f(K)} with \tilde z=\lim_{k\to\infty}f(z_{n_k}). As K is closed, we get \tilde z\in \overline{f(K)}\subset\overline K = K. Now,

\displaystyle\left|z_{n_k} - \tilde z\right| \leq \left|z_{n_k} - f(z_{n_k})\right| + \left|f(z_{n_k}) - \tilde z\right| < \frac1{n_k} + \varepsilon_{n_k},

which means that z_{n_k} and \tilde z get arbitrarily close: \tilde z = \lim_{k\to\infty}z_{n_k}. Since f is continuous, we arrive at

\displaystyle f(\tilde z) = \lim_{k\to\infty}f(z_{n_k}) = \tilde z. q.e.d.


It is apparent that Schauder’s Theorem already has very general conditions that are tough to weaken further. Obviously the Theorem gets false if f is not continuous. If K were not closed, we’d get the counter-example of f:(0,1)\to(0,1), x\mapsto x^2, which doesn’t have any fixed points. If K were not convex, we’d get the counter-example of f:\partial B_1(0)\to \partial B_1(0), e^{it} \mapsto e^{i(t+\pi)}. It is hard to give a counter-example if f(K) is not relatively compact – in fact I would be interested to hear of any such counter-example or of the generalization of Schauder’s Theorem to such cases. Which is the most general such fixed point theorem?


Now, we are able to harvest the ideas of all this work and apply it to differential equations. Usually, in courses on ordinary differential equations, the famous Picard-Lindelöf-Theorem is proved, which states that for well-behaved functions f (meaning that they satisfy a Lipschitz-condition), the initial-value problem

y'(x) = f\bigl(x,y(x)\bigr),\qquad y(x_0) = y_0,

has a unique solution. This is a powerful theorem which simplifies the entire theory of differential equations. However, a little more holds true: it suffices that f is continuous to guarantee a solution. However, uniqueness is lost in general. While in many applications one can assume continuity of f without remorse (especially in physics), a Lipschitz-condition is much harder to justify. This is not to diminish the usefulness of Picard and Lindelöf, as any model has assumptions to be justified – the Lipschitz-condition is just one of them (if one even bothers to demand for a proper justification of existence and uniqueness – sometimes this would seem obvious from the start).

Let us have a look at what Peano told us:


Theorem (Peano, 1886/1890): Let f:R\to\mathbb{R} be continuous, where

\displaystyle R:=\bigl\{(x,y)\in\mathbb{R}^2\colon \left|x-x_0\right| \leq a, \left|y-y_0\right|\leq b\bigr\},

let M:=\max_{(x,y)\in R}\left|f(x,y)\right|, \alpha := \min\bigl(a, \frac bM\bigr).

Then, the initial value problem y'(x)= f\bigl(x,y(x)\bigr), y(x_0)=y_0, has a solution on the interval [x_0-\alpha, x_0+\alpha].


Concerning the interval on which we claim the solution to exist, have a look at how such a solution y might behave: as we vary x, the solution may “leave” R either to the vertical bounds (to left/right) or to the horizontonal bounds (up/down). A solution y can at most have a slope of \pm M, and thus, if it leaves on the horizontal bounds, this will happen at x_0\pm\frac bM as the earliest point. If it doesn’t leave there, it will exist until x_0\pm a. Of course, it might exist even further, but we have only demanded f to be defined till there. A little more formally, the mean value theorem tells us

\displaystyle \left|y(x)-y_0\right| = \left|y'(\xi)\right|\left|x-x_0\right| = \left|f\bigl(\xi,y(\xi)\bigr)\right|\left|x-x_0\right| \leq M\alpha\leq b.

This guarantees that the solution y is well-defined on R, because f is defined there.


Proof: 0th step. To simplify notation, let us set

\displaystyle    \begin{aligned}    J&:=[x_0-\alpha, x_0+\alpha],\\    \mathcal{C}(J)&:=\bigl\{f:J\to\mathbb{R}\text{ continuous}\bigr\},\\    K&:=\bigl\{y\in\mathcal{C}(J)\colon \left|y(x)-y_0\right|\leq b\text{ for any }x\in J\bigr\}.    \end{aligned}

1st step. We twist the problem to another equivalent shape, making it more accessible to our tools.

First, let y(x) be a solution to the initial value problem on a sub-interval I\subset J. Then, for any x\in I,

\displaystyle y'(x) = f\bigl(x,y(x)\bigr),\quad y(x_0)=y_0,

and hence

\displaystyle y(x) = y_0 + \int_{x_0}^x f\bigl(t,y(t)\bigr)dt.\qquad (\heartsuit)

On the other hand, if we start from this equation and suppose that it holds for any x\in I, y must be differentiable with y'(x) = f\bigl(x,y(x)\bigr) and y(x_0)=y_0.

We have seen that a function y solves the initial value problem on J if and only if it satisfies the equation (\heartsuit) on J.

2nd step. We try to give a representation of the problem as a fixed-point-problem.

Let us consider the mapping

\displaystyle A:K\to\mathcal{C}(J),~ y\mapsto y_0 + \int_{x_0}^{\text{\large\textbf{.}}} f\bigl(t,y(t)\bigr)dt.

This is a functional where we plug in a continuous function and where we get a continuous function back. In particular, and to make it even more painfully obvious,

\displaystyle (Ay)(x) = y_0+\int_{x_0}^xf\bigl(t,y(t)\bigr)dt,\qquad\text{for any }x\in J.

Therefore, y is a solution to the intial value problem, if it is a fixed point of A, meaning Ay=y.

3rd step. We show that A maps K to itself. We have defined A only on K, so let y\in K and x\in J; then:

\displaystyle \left|(Ay)(x)-y_0\right| = \left|\int_{x_0}^x f\bigl(t,y(t)\bigr) dt\right| \leq M\left|x-x_0\right| \leq \alpha M\leq b.

The second-to-last inequality follows from x\in J, the last one from the definition of \alpha.

This shows that Ay\in K.

4th step. K\neq\emptyset is obvious, as the constant function y_0 is in K.

5th step. K is convex. Let f,g\in K and let \beta\in[0,1]. Then, for any x\in J,

\displaystyle    \begin{aligned}    \left|(1-\beta)f(x)+\beta g(x) - y_0\right| &= \left|(1-\beta)\bigl(f(x)-y_0\bigr) + \beta\bigl(g(x)-y_0\bigr)\right| \\    &\leq (1-\beta)\left|f(x)-y_0\right| + \beta\left|g(x)-y_0\right| \\    &\leq (1-\beta)b+\beta b = b.    \end{aligned}

This proves (1-\beta)f+\beta g\in K.

6th step. K is a closed set in \mathcal{C}(J), where we use the topology of uniform convergence.

Consider the sequence (f_n)_n\subset K which converges uniformly to some f\in\mathcal{C}(J). Remember that \mathcal{C}(J) is complete, which is why we can do this. Then, for any x\in J,

\displaystyle\left|f(x)-y_0\right| = \left|\lim_{n\to\infty}f_n(x)-y_0\right| = \lim_{n\to\infty}\left|f_n(x)-y_0\right| \leq \lim_{n\to\infty} b = b.

This shows that f\in K.

7th step. Using the topology of uniform convergence, the mapping A:K\to K is continuous.

Let \varepsilon>0. The function f is continuous on the compact set R and hence uniformly continuous. Therefore, there is some \delta>0 such that for \left|u-v\right|<\delta,

\displaystyle \left|f(t,u)-f(t,v)\right|<\frac\varepsilon\alpha.

Now, let y,z\in K with \left\|y-z\right\|_\infty < \delta. Then we have just seen that for any t\in J

\displaystyle \left|f\bigl(t,y(t)\bigr)-f\bigl(t,z(t)\bigr)\right| < \frac\varepsilon\alpha.

That yields

\displaystyle    \begin{aligned}    \left|(Ay)(x)-(Az)(x)\right| &= \left|\int_{x_0}^xf\bigl(t,y(t)\bigr)dt - \int_{x_0}^xf\bigl(t,z(t)\bigr)dt\right| \\    &< \left|x-x_0\right|\frac\varepsilon\alpha\\    &\leq\varepsilon.    \end{aligned}

In particular,

\displaystyle\left\|Ay-Az\right\|_\infty < \varepsilon.

8th step. The set A(K)\subset K is relatively compact. Note that A(K) is a set of continuous functions.

Let y\in K and let x,x_1,x_2\in J. Then, every function of A(K) is bounded pointwise, since

\displaystyle \left|(Ay)(x)\right| = \left|y_0+\int_{x_0}^xf\bigl(t,y(t)\bigr)dt\right|\leq \left|y_0\right|+\left|x-x_0\right|M \leq \left|y_0\right|+\alpha M.

Besides, A(K) is equicontinuous, because of

\displaystyle    \begin{aligned}    \left|(Ay)(x_1)-(Ay)(x_2)\right| &= \left|\int_{x_0}^{x_1}f\bigl(t,y(t)\bigr)dt - \int_ {x_0}^{x_2}f\bigl(t,y(t)\bigr)dt\right| \\    &= \left|\int_{x_1}^{x_2}f\bigl(t,y(t)\bigr) dt\right| \\    &\leq \left|x_1-x_2\right| M.    \end{aligned}

Arzelà and Ascoli now tell us that any sequence in A(K) has a uniformly convergent subsequence.

9th and final step. From Schauder’s Fixed Point Theorem and steps 3 to 8, A has a fixed point in K. From step 2, the initial value problem has a solution. q.e.d.



There was a lot of technical work that we have only needed to invoke Schauder’s Theorem. Some of this could have been avoided, if we had a more elementary proof of Schauder’s Theorem. Such a proof exists, however, some of our machinery is still needed – the proof cannot honestly be called elementary. In some way, the proof matches our procedure given above, however, not everything is needed in such a fine manner. Let’s have short look at it; this is taken from Walter’s book.


Proof (Peano’s Theorem in a more elementary fashion): We proceed in two parts. First, we shall prove the weaker statement that if f is continuous and bounded on the (non-compact) set [x_0,x_0+a]\times\mathbb{R}, then there is a solution to the intial-value problem on [x_0,x_0+a]. Afterwards, we extend this to our compact set R. We won’t deal with extending the solution to the left of x_0 as it’s neither important nor difficult. In the previous proof we didn’t need to bother about this.

1st step. Let us define a function on [x_0,x_0+a] using some parameter \alpha\in(0,a] by

\displaystyle z_\alpha(x) = y_0\mathbf{1}_{x\leq x_0} + \left(y_0+\int_{x_0}^xf\bigl(t,z_\alpha(t)\bigr)dt\right)\mathbf{1}_{x>x_0}.

This is well-defined, since on the sub-interval [x_0+k\alpha, x_0+(k+1)\alpha) we have t-\alpha \in[x_0+(k-1)\alpha, x_0+k\alpha), and thus z_\alpha(t-\alpha) has been recursively defined; hence z_\alpha is defined as well.

Let us denote \mathcal{F}:=\bigl\{z_\alpha\colon\alpha\in(0,a]\bigr\}\subset\mathcal{C}\bigl([x_0,x_0+a]\bigr).

2nd step. \mathcal F is equicontinuous. Let \varepsilon > 0 and let x_1,x_2\in[x_0,x_0+a]. Then we get, as f is bounded by some M,

\displaystyle \left|z_\alpha(x_1)-z_\alpha(x_2)\right| = \left|\int_{x_1}^{x_2} f\bigl(t, z_\alpha(t-\alpha)\bigr)dt\right| \leq \left|x_2-x_1\right|M,

which doesn’t depend on \alpha, x_1 or x_2 (only on their distance). Hence, if \left|x_1-x_2\right|<\delta = \frac\varepsilon M,

\displaystyle \left|z_\alpha(x_1)-z_\alpha(x_2)\right| \leq \varepsilon.

3rd step. \mathcal F is pointwise bounded. This is obvious from \left|z_\alpha(x)\right|\leq\left|y_0\right|+aM, which doesn’t depend on \alpha or x.

4th step. We determine a solution to the intial-value problem.

From steps 2 and 3 and from Arzelà-Ascoli, we know that the sequence (z_{1/n})_n\subset\mathcal{F} has a uniformly convergent subsequence. Let us denote its limit by y(x), which is defined for all x\in[x_0,x_0+a]. This allows us to get

\displaystyle    \begin{aligned}    \left|z_{1/n_k}\left(t-\frac1{n_k}\right) - y(t)\right| &\leq \left| z_{1/n_k}\left(t-\frac1{n_k}\right) - z_{1/n_k}(t)\right| + \left|z_{1/n_k}(t) - y(t)\right| \\    &\leq \frac M{n_k} + \varepsilon\\    &< \overline\delta,    \end{aligned}

for any t\in[x_0,x_0+a] and for sufficiently large K. It should be clear what we intend to say with \overline\delta (let’s bring in a little sloppiness here, shall we). Since f is continuous in its second component, this proves

\displaystyle \left|f\left(t, z_{1/n_k}\left(t-\frac1{n_k}\right)\right) - f\bigl(t,y(t)\bigr)\right| < \overline\varepsilon\qquad\text{for any }t\in[x_0,x_0+a].

Hence, as every participant here converges uniformly,

\displaystyle    \begin{aligned}    y(x) &= \lim_{k\to\infty}z_{1/n_k}(x) \\    &= \lim_{k\to\infty}\left(y_0 + \int_{x_0}^x f\left(t, z_{1/n_k}\left(t-\frac1{n_k}\right)\right)dt\right)\\    &= y_0 + \int_{x_0}^x \lim_{k\to\infty} f\left(t, z_{1/n_k}\left(t-\frac1{n_k}\right)\right) dt\\    &= y_0 + \int_{x_0}^x f\bigl(t, y(t)\bigr) dt.    \end{aligned}

This shows that

y'(x) = f\bigl(x, y(x)\bigr),\qquad y(x_0) = y_0.

5th step. Extension to the general case: Let f be defined on the compact rectangle R.

We give a continuation of f beyond [y_0-b,y_0+b] for all x\in[x_0-a, x_0+a] via

\displaystyle \tilde f(x,y) = \begin{cases}f(x, y_0-b) ,&\text{for }y < y_0-b\\ f(x,y), &\text{for }(x,y)\in R\\ f(x,y_0+b),&\text{for }y>y_0+b\end{cases}

Obviously, \tilde f is continuous and bounded. From Step 1, y' = \tilde f(x, y) has a solution on [x_0, x_0+a]. For \left|x-x_0\right|\leq\frac bM, we get

\displaystyle    \begin{aligned}    \left|y(x) - y_0\right| &\leq\left|y'(\xi)\right|\left|x-x_0\right| \\    &= \left|\tilde f\bigl(\xi,y(\xi)\bigr)\right|\left|x-x_0\right| \\    &\leq M\left|x-x_0\right|\leq b    \end{aligned}

Therefore, the solution of y'=f(x,y) is well-defined if \left|x-x_0\right|\leq\frac bM, and thus the solution is guaranteed to exist for \left|x-x_0\right|<\alpha = \min\bigl(a,\frac bM\bigr). q.e.d.


As a sort of last-minute addendum, I have stumbled upon two articles from the 1970s that shed some more light on the issue of elementary proofs to Peano’s Theorem which completely avoid the technicalties of Schauder’s Theorem and of Arzelà-Ascoli. One is called “There is an Elementary Proof of Peano’s Existence Theorem” by Wolfgang Walter (the author of the book we cited earlier; Amer. Math. Monthly 78 1971, 170-173), the other is “On Elementary Proofs of Peano’s Existence Theorems” by Johann Walter (Amer. Math. Monthly 80, 1973, 282-286). The issue of whether Arzelà-Ascoli can be avoided is solved by both papers positively: they give proofs of Peano’s Theorem which only deal with standard calculus methods. Basically, the employ the Euler polygon method to construct a solution of the intitial value problem. However, again, the proofs are not constructive. Besides, “elementary” is not to be confused with “easy”, Peano’s Theorem is still nothing that lies directly on the surface of things. A brief look at the second of those articles (to the best of my knowledge the identical names of the authors are a coincidence) raises hope that this proof is actually not too hard – it should be understandable with a lot less effort than the proof via Schauder’s Theorem that we gave above in full detail; remember that Schauder itself required many non-standard theorems on its way. The elementary proof will only work for one-dimensional differential equations, but we bothered only with those anyway; it uses monotonicity of its approximating sequence which is only applicable in the real numbers. On the plus side, the proof explicitly constructs a solution via the Euler method.

The papers also shed some light on the history of Peano’s Theorem and the quest for its proof (together with some rather unusual disagreement on whether an earlier proof is valid or not; some interesting lines to read in passing). This should be enough on this matter for now. If the interest holds up (which is, to this extent, rather unlikely), we’ll return to it. But not for now.