Johann Kepler and how planets move

Kepler’s Laws of planetary motion follow very smoothly from Newton’s Law of Gravitation. Very little tough mathematics is needed for the proofs, it can actually be done with ordinary differential calculus and some knowledge on path integration.

Of course, from a historical point of view, these laws appeared reversed. Kepler had neither Newton’s Law at his disposal, nor had he sufficient use of the calculus machinery that we have today. Instead, his way of coming up with his Laws included years of hard work on the astronomical tables compiled by Tycho Brahe; Kepler himself was unable to make astronomical observations himself, even with his self-invented telescope, as he was ill-sighted for all his life. Later, Newton could rely on Kepler’s results to find inspiration for his Law of Gravitation: indeed, unless he could relate his results to Kepler’s laws, he knew his results to be incomplete. Together with his many other achievements (for instance, Kepler was the first to state Simpson’s rule, which is accordingly called “Kepler’s barrel rule” sometimes) this makes him one of the most interesting minds of the early modern era. His interest in planet’s orbits was rooted in astrology and the construction of horoscopes; his observations and deductions led him to drop both Copernicus’ thought that the orbits were circles and, later, that there were no platonic solids involved. Both Copernicus and Kepler had revolutionized the thoughts on space itself, Copernicus by showing how much easier the orbits can be described when Earth is allowed to move itself (no more epicycles and the like), but Kepler made it even easier by dropping circles altogether.

To consider how hard the problem of finding the planet’s orbits is, consider that we have plenty of observational data of the planets, however the data contain two unknowns: the orbit of Earth and the orbit of the planet. On top of that, our observations only tell us about the angle under which the planet is observed, we don’t learn anything about the distances involved (unless we have Kepler’s third law at our disposal). In a very insightful talk for a wider audience, Terry Tao has explained some of Kepler’s ideas on this, especially how Kepler dealt with the orbit of Mars which had been for several reasons the most tricky one of the orbits in the models that preceded Kepler. Tao mentions that Einstein valued the finding of Kepler’s Laws one of the most shining moments in the history of human curiosity. “And if Einstein calls you a genius, you are really doing well.

From these Laws and from what Newton and his successors achieved, many things can be inferred that are impossible to measure directly. For instance the mass of the Sun and all planets can be computed from here, once the gravitational constant is known (which is tricky to pinpoint, actually). Voltaire is quoted with the sentence, regarding Newton’s achievements but this would fit to Kepler as well, that the insights gained “semblaient n’être pas faites pour l’esprit humain.

To give just a tiny bit of contrast, we mention that Kepler also had erroneous thoughts that show how deeply he was still rooted in ancient ideas of harmonics and aesthetics. For instance, Kepler tried to prove why the Solar system had exactly six planets (or to rephrase a little more accurately to his thinking: why God had found pleasure in creating exactly six planets). For some time, he believed that the reason was related to the fact that there are exactly five platonic solids which define the structure of the six orbits around the sun. Those were ideas also related to the integer harmonies of a vibrating string, as the planets were supposed to move in a harmonical way themselves. Of course, in these days the observations were limited up to Saturn, as the outer planets (and dwarf planets) cannot be found by eyesight or the telescopes at Kepler’s disposal; all such ideas were doomed to be incomplete. However, his quest for harmonics in the Solar system led him in the end to his Third Law. On another account, Kepler was mistaken in the deduction of his First Law, since he lacked the deep knowledge about integration that would be developed decades later; luckily, his mistake cancels out with another mistake later on: “Es ist schon atemberaubend, wie sich bei Keplers Rechnungen letztlich doch alles fügt.” (“It is stunning how everything in Kepler’s computations adds up in the end”; have a look at Sonar’s highly readable and interesting book on the history of calculus for this).

In what follows, we shall show how the Kepler’s Laws can be proved, assuming Newton’s Law of Gravitation, in a purely mathematical fashion. There will be no heuristics from physics or from astronomy, only the axiomatic mathematical deduction that mostly works without any intuition from the applications (though we will look at motivations for why some definitions are made the way they are).

As a nice aside, we can look at the mathematical descriptions of the conic sections on which the first Law relies. But here again, there’s no connection to why these curve are called this way.

Let us state Kepler’s Laws here first.

Kepler’s First Law of Planetary Motion: Planets orbit in ellipses, with the Sun as one of the foci.

Kepler’s Second Law of Planetary Motion: A planet sweeps out equal areas in equal times.

Kepler’s Third Law of Planetary Motion: The square of the period of an orbit is proportional to the cube of its semi-major axis.

Let us prove this, by following the account of Königsberger’s book on calculus. Many calculus books deal with Kepler’s Laws in a similar axiomatical fashion, yet we stick to this account as it appears to be the neatest one without conjuring up too much of physics.

We shall give a couple of technical lemmas first.


The Triangle-Lemma: The triangle marked by the points (x_1,y_1), (x_2,y_2), (x_3,y_3) has area \displaystyle \frac12\left|\mathrm{det}\begin{pmatrix}1&x_1&y_1\\1&x_2&y_2\\1&x_3&y_3\end{pmatrix}\right|.

Proof: The triangle together with the coordinate axes marks the parallelograms:

P_{13} with the points (x_1,y_1), (x_3, y_3), (x_3,0), (x_1,0);

P_{23} with the points (x_2,y_2), (x_3, y_3), (x_3,0), (x_2,0);

P_{12} with the points (x_1,y_1), (x_2, y_2), (x_2,0), (x_1,0).


Thus, the area of the triangle is:

F_{T} = P_{13}+P_{23}-P_{12}.

The sign represents the situation given in the figure. For other triangles, another permutation of the signs may be necessary, but there will always be exactly one negative sign. Other permutations of the sign only represent a re-numbering of the points and therefore a change of sign in the determinant given in the statement. As we put absolute values to our statement, we avoid any difficulties of this kind.

As each of the paralellograms has two of their points on the x-axis, we find

\begin{aligned}    F_{T} &= \frac{y_1+y_3}{2}(x_3-x_1)+\frac{y_2+y_3}{2}(x_2-x_3)-\frac{y_1+y_2}{2}(x_2-x_1)\\    &= \frac12\left(x_1\bigl((y_1+y_2)-(y_1+y_3)\bigr) + x_2\bigl((y_2+y_3)-(y_1+y_2)\bigr) + x_3\bigl((y_1+y_3)-(y_2+y_3)\bigr)\right)\\    &= \frac12\left(x_1\bigl((y_1-y_3) + (y_2 - y_1)\bigr) + x_2(y_3-y_1) + x_3(y_1-y_2)\right)\\    &= \frac12\left((x_2-x_1)(y_3-y_1) - (x_3-x_1)(y_2-y_1)\right)\\    &= \frac12\mathrm{det}\begin{pmatrix}x_2-x_1&y_2-y_1\\x_3-x_1&y_3-y_1\end{pmatrix}\\    &= \frac12\mathrm{det}\begin{pmatrix}1&x_1&y_1\\0&x_2-x_1&y_2-y_1\\0&x_3-x_1&y_3-y_1\end{pmatrix}\\    &= \frac12\mathrm{det}\begin{pmatrix}1&x_1&y_1\\1&x_2&y_2\\1&x_3&y_3\end{pmatrix}.    \end{aligned}



Lemma (Leibniz’ sector formula): Let \gamma:[a,b]\to\mathbb{R}^2 be a continuously differentiable path, \gamma(t) = \begin{pmatrix}x(t)\\y(t)\end{pmatrix}. Then the line segment from 0 to the points of the path sweeps the area \displaystyle\frac12\int_a^b(x\dot y - y\dot x)dt.

Note that we have used Newton’s notation for derivatives. One might also write the integral as \displaystyle\int_a^b\bigl(x(t)y'(t)-y(t)x'(t)\bigr)dt.

Proof: Let us clarify first, what we understand by “sweeping” line segments. Consider the path \gamma given in the image.


As this path is not closed (it’s not a contour), it doesn’t contain an area. But if we take the origin into account, we can define an area that is related to where the path is:


Now, pick a partition of [a,b], such as \{a=t_0,t_1,\ldots,t_{n-1},b=t_n\} and make a polygon of the partition and the origin – the corresponding triangles form an area that approximates the area bounded by \gamma, as above.


As the partition gets finer, we expect that the polygon-area converges to the \gamma-area. And this is where the definition originates, of the area that is swept by a path:

For any \varepsilon>0 there shall be \delta>0, such that for every partition \{t_0,\ldots,t_n\} of [a,b] that is finer than \delta, we get

\displaystyle \left|\sum_{i=0}^{n-1}F_{T_i} - \frac12\int_a^b(x\dot y-y\dot x)dt\right| < \varepsilon.\qquad(\ast)

Here, F(T_i) is the area of the triangle bounded by \gamma_{t_i}, \gamma_{t_{i+1}} and the origin. By the Triangle-Lemma, its area is \displaystyle\frac12\mathrm{det}\begin{pmatrix}1&0&0\\1&x_i&y_i\\1&x_{i+1}&y_{i+1}\end{pmatrix}.

Because the orientation of the T_i might be of importance, F(T_i) may keep its sign in what follows (Imagine, for instance, a path that traverses a line segment once from left to right and once from right to left; in total, no area is covered).

Now, let us prove that (\ast) is true.

As \dot x and \dot y are continuous, choose L=\max\left(\max_{[a,b]}\left|\dot x(t)\right|, \max_{[a,b]}\left|\dot y(t)\right|\right) and take \delta = \frac{\varepsilon}{2L^2(b-a)}. Take a partition \{t_0,\ldots,t_n\} with t_0=a, t_0=b and \left|t_{k+1}-t_k\right| < \delta for k=0,\ldots,n-1. Then, for any such k,

\begin{aligned}    F(T_k) &= \frac12\mathrm{det}\begin{pmatrix}1&0&0\\1&x_k&y_k\\1&x_{k+1}&y_{k+1}\end{pmatrix}\\    &=\frac12(x_ky_{k+1}-x_{k+1}y_k)\\    &= \frac12\Bigl(x_k\bigl(y_k+(y_{k+1}-y_k)\bigr) - y_k\bigl(x_k+(x_{k+1}-x_k)\bigr)\Bigr)\\    &=\frac12\bigl(x_k(y_{k+1}-y_k)-y_k(x_{k+1}-x_k)\bigr)\\    &=\frac12\left(x_k\int_{t_k}^{t_{k+1}}\dot y(t)dt - y_k\int_{t_k}^{t_{k+1}}\dot x(t)dt\right)\\    &=\frac12\int_{t_k}^{t_{k+1}}\bigl(x_k\dot y(t)-y_k\dot x(t)\bigr)dt.    \end{aligned}

This yields, using the mean value theorem,

\begin{aligned}    \left|2F(T_k)-\int_{t_k}^{t_{k+1}}(x\dot y-y\dot x)dt\right| &= \left|\int_{t_k}^{t_{k+1}}(x_k\dot y-y_k\dot x-x\dot y+y\dot x)dt\right|\\    &\leq \left|\int_{t_k}^{t_{k+1}}(x_k-x)\dot ydt\right| + \left|\int_{t_k}^{t_{k+1}}(y-y_k)\dot xdt\right|\\    &\leq\int_{t_k}^{t_{k+1}}\max_{[a,b]}\left|\dot x(t)\right|\left|{t_k-t}\right|\left|\dot y(t)\right|dt + \\    &\hphantom{=}+\int_{t_k}^{t_{k+1}}\max_{[a,b]}\left|\dot y(t)\right|\left|{t_k-t}\right|\left|\dot x(t)\right|dt\\    &\leq L^2\left|t_{k+1}-t_k\right|(t_{k+1}-t_k) + L^2\left|t_{k+1}-t_k\right|(t_{k+1}-t_k)\\    &< 2L^2\delta(t_{k+1}-t_k).    \end{aligned}

We conclude

\displaystyle\left|2\sum_{k=0}^{n-1}F(T_k) - \int_a^b(x\dot y - y\dot x)dt\right| < 2L^2\delta(b-a) = \varepsilon.


One might as well prove this by applying Green’s Theorem, but in this case it just gets less elementary.


The \times-Lemma: Let a,b\in\mathbb{R}^3. We set

\displaystyle a\times b := \begin{pmatrix}a_2b_3-a_3b_2\\a_3b_1-a_1b_3\\a_1b_2-a_2b_1\end{pmatrix}.

Obviously, this is linear both in a and in b. We have

(i)\quad\displaystyle \left\langle (a\times b),c\right\rangle = \mathrm{det}(a,b,c).

(ii)\quad a\times b is orthogonal to a and to b.

(iii)\quad\displaystyle (a\times b)\times c = -\left\langle b,c\right\rangle a + \left\langle a,c\right\rangle b (Grassmann’s equation).

(iv)\quad\displaystyle (a\times b)^{\cdot} = \dot a\times b+a\times\dot b.

Proof: For (i):

\begin{aligned}    \left\langle (a\times b),c\right\rangle &= \left\langle\begin{pmatrix}a_2b_3-a_3b_2\\a_3b_1-a_1b_3\\a_1b_2-a_2b_1\end{pmatrix},\begin{pmatrix}c_1\\c_2\\c_3\end{pmatrix}\right\rangle\\    &=c_1(a_2b_3-a_3b_2)+c_2(a_3b_1-a_1b_3)+c_3(a_1b_2-a_2b_1)\\    &=c_1\mathrm{det}\begin{pmatrix}a_2&b_2\\a_3&b_3\end{pmatrix}-c_2\mathrm{det}\begin{pmatrix}a_1&b_1\\a_3&b_3\end{pmatrix}+c_3\mathrm{det}\begin{pmatrix}a_1&b_1\\a_2&b_2\end{pmatrix}\\    &=\mathrm{det}\begin{pmatrix}a_1&b_1&c_1\\a_2&b_2&c_2\\a_3&b_3&c_3\end{pmatrix}\\    &=\mathrm{det}(a,b,c).    \end{aligned}

For (ii): \left\langle a\times b, a\right\rangle = \mathrm{det}(a,b,a)=0 and \left\langle a\times b,b\right\rangle = \mathrm{det}(a,b,b)=0.

For (iii):

\begin{aligned}    (a\times b)\times c &=\begin{pmatrix}a_2b_3-a_3b_2\\a_3b_1-a_1b_3\\a_1b_2-a_2b_1\end{pmatrix}\times\begin{pmatrix}c_1\\c_2\\c_3\end{pmatrix}\\    &=\begin{pmatrix}(a_3b_1-a_1b_3)c_3-(a_1b_2-a_2b_1)c_2\\(a_1b_2-a_2b_1)c_1-(a_2b_3-a_3b_2)c_3\\(a_2b_3-a_3b_2)c_2-(a_3b_1-a_1b_3)c_1\end{pmatrix}\\    &=\begin{pmatrix}a_3b_1c_3-a_1b_3c_3-a_1b_2c_2+a_2b_1c_2\\a_1b_2c_1-a_2b_1c_1-a_2b_3c_3+a_3b_2c_3\\a_2b_3c_2-a_3b_2c_2-a_3b_1c_1+a_1b_3c_1\end{pmatrix}\\    &=\begin{pmatrix}a_1b_1c_1+a_2b_1c_2+a_3b_1c_3-a_1b_1c_1-a_1b_2c_2-a_1b_3c_3\\a_1b_2c_1+a_2b_2c_2+a_3b_2c_3-a_2b_1c_1-a_2b_2c_2-a_2b_3c_3\\a_1b_3c_1+a_2b_3c_2+a_3b_3c_3-a_3b_1c_1-a_3b_2c_2-a_3b_3c_3\end{pmatrix}\\    &=(a_1c_1+a_2c_2+a_3c_3)\begin{pmatrix}b_1\\b_2\\b_3\end{pmatrix} - (b_1c_1+b_2c_2+b_3c_3)\begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}\\    &=\left\langle a,c\right\rangle b - \left\langle b,c\right\rangle a.    \end{aligned}

For (iv):

\begin{aligned}    \bigl(a(t)\times b(t)\bigr)^\cdot &= \begin{pmatrix}a_2(t)b_3(t)-a_3(t)b_2(t)\\a_3(t)b_1(t)-a_1(t)b_3(t)\\a_1(t)b_2(t)-a_2(t)b_1(t)\end{pmatrix}^\cdot\\    &=\begin{pmatrix}\dot a_2b_3+a_2\dot b_3-\dot a_3b_2-a_3\dot b_2\\\dot a_3b_1+a_3\dot b_1-\dot a_1b_3-a_1\dot b_3\\\dot a_1b_2+a_1\dot b_2-\dot a_2b_1-a_2\dot b_1\end{pmatrix}\\    &=\begin{pmatrix}\dot a_2b_3-\dot a_3b_2\\\dot a_3b_1-\dot a_1b_3\\\dot a_1b_2-\dot a_2b_1\end{pmatrix}+\begin{pmatrix}a_2\dot b_3-a_3\dot b_2\\a_3\dot b_1-a_1\dot b_3\\ a_1\dot b_2-a_2\dot b_1\end{pmatrix} \\    &=\dot a\times b + a\times \dot b    \end{aligned}



Now, let us look at conic sections and define them mathematically. We will not be interested in what these things have to do with cones – as stated at the beginning: pure mathematics here.

We are going to work in \mathbb{R}^2 here. Let F be a point (the so-called focal point) and l be a line (the so-called directrix), and the distance of F and l shall be some p>0. We are looking for all those points in \mathbb{R}^2 for which the distance to F and the distance to l are proportional – formally: For any point (\xi,\eta)^t\in\mathbb{R}^2, we set

r=\mathrm{dist}\bigl(F,(\xi,\eta)^t\bigr)\qquad\text{and}\qquad d = \mathrm{dist}\bigl(l, (\xi,\eta)^t\bigr),

and for \varepsilon > 0 we demand

\displaystyle \frac{r}{d}=\varepsilon.

For simplicity, we will put F into the origin of our coordinate system, and l parallel to one of the axes, as in the figure. In particular, r^2 = \xi^2+\eta^2 and d = p+\xi. Our equation for the interesting points thus becomes:

\begin{aligned}    && r^2&=\varepsilon^2d^2\\    &\iff& \xi^2+\eta^2 &= \varepsilon^2p^2+2\varepsilon^2p\xi+\varepsilon^2\xi^2\\    &\iff&\xi^2(1-\varepsilon^2) &= \varepsilon^2p^2-\eta^2+2\varepsilon^2p\xi.    \end{aligned}


Let us distinguish the following cases:

Case \varepsilon = 1. Then we set x := \xi+\frac p2, y := \eta and find

\begin{aligned}    &&\xi^2 (1-\varepsilon^2) &= \varepsilon^2p^2-\eta^2 + 2\varepsilon^2 p \xi \\    &\iff& 0 &= p^2-y^2+2p\left(x-\frac p2\right)\\    &\iff& 0 &= p^2-y^2+2px-p^2\\    &\iff& y^2 &= 2px.    \end{aligned}

We see that the interesting points lie on a parabola which is open to the right (by choosing other coordinate systems, of course, any other parabola will appear; in some way, this is its normal form).

Case \varepsilon < 1. Here we set x:=\xi-\frac{p\varepsilon^2}{1-\varepsilon^2}, y:=\eta. Then we get

\begin{aligned}    &&\xi^2 (1-\varepsilon^2) &= \varepsilon^2p^2-\eta^2 + 2\varepsilon^2 p \xi \\    &\iff &\left(x+\frac{p\varepsilon^2}{1-\varepsilon^2}\right)^2(1-\varepsilon^2) &= \varepsilon^2p^2-y^2+2\varepsilon^2p\left(x+\frac{p\varepsilon^2}{1-\varepsilon^2}\right)\\    &\iff& x^2(1-\varepsilon^2)+2xp\varepsilon^2 + \frac{p^2\varepsilon^4}{1-\varepsilon^2} &= \varepsilon^2p^2-y^2+2\varepsilon^2px+\frac{2\varepsilon^4p^2}{1-\varepsilon^2}\\    &\iff& x^2(1-\varepsilon^2) + y^2 &= \varepsilon^2p^2+\frac{\varepsilon^4p^2}{1-\varepsilon^2}\\    &\iff& x^2(1-\varepsilon^2) + y^2 &= \frac{(1-\varepsilon^2)\varepsilon^2p^2+\varepsilon^4p^2}{1-\varepsilon^2}\\    &\iff& x^2(1-\varepsilon^2) + y^2 &= \frac{\varepsilon^2p^2}{1-\varepsilon^2}\\    &\iff& \frac{x^2(1-\varepsilon^2)^2}{\varepsilon^2p^2} + \frac{y^2(1-\varepsilon^2)}{\varepsilon^2p^2} &= 1\\    &\iff& \frac{x^2}{a^2} + \frac{y^2}{b^2} &= 1,    \end{aligned}

with a = \frac{\varepsilon p}{1-\varepsilon^2} and b = \frac{\varepsilon p}{\sqrt{1-\varepsilon^2}}.

We have found that the interesting points lie on an ellipse.

Case \varepsilon > 1. This is exactly the same as the case \varepsilon<1, except for the last step. We mustn’t set b as before, since 1-\varepsilon^2<0 and we cannot get a real square root of this. Thus, we use b = \frac{\varepsilon p}{\sqrt{\varepsilon^2-1}} and the resulting negative sign is placed in the final equation:

\displaystyle \frac{x^2}{a^2}-\frac{y^2}{b^2} = 1.

This is a hyperbola.

To conclude this part, we give the general representation of conic sections in polar coordinates. From the figure given above, we see d = p+r\cos\varphi and so

\displaystyle r = \varepsilon d = \varepsilon p + \varepsilon r\cos\varphi,

which means

\displaystyle r = \frac{\varepsilon p}{1-\varepsilon\cos\varphi}.

That yields the polar coordinates (only depending on parameters and on the variable \varphi:

\displaystyle re^{i\varphi} = \frac{\varepsilon p}{1-\varepsilon\cos\varphi}e^{i\varphi}.


Now, let us turn to our base for the proofs of Kepler’s Laws: Newton’s Law of Gravitation. Let m be the mass of a planet, M the mass of the Sun, \gamma a real constant (the gravitational constant), and let x(t) be a path (the planet’s orbit). By Newton’s Law we have the differential equation

\displaystyle m\ddot x = -\gamma Mm\frac{x}{\left\|x\right\|^3}.

On the left-hand side, there’s the definition of force as mass multiplied by acceleration. On the right-hand side is Newton’s Law stating the gravitational force between Sun and planet.

We define the vector-valued functions of t (note that x depends on t):

\displaystyle J = x\times m\dot x\qquad\text{and}\qquad A=\frac1{\gamma Mm}J\times\dot x+\frac{x}{\left\|x\right\|}.

(J is the angular momentum, A is an axis; but our math doesn’t care for either of those names or intentions).

The AJ-Lemma: As functions of t, A and J are constant.

Proof: Let us look at J first. Using the fact that by definition a\times a = 0, and Newton’s Law of Gravitation, we get

\begin{aligned}    \dot J \underset{\times\text{-Lemma}}{\overset{(iv)}{=}} (x\times m\dot x)^\cdot &= \dot x \times m\dot x + x\times m\ddot x\\    &= \dot x \times m\dot x + x\times\left(-\gamma Mm\frac{x}{\left\|x\right\|^3}\right)\\    &= m(\dot x\times \dot x) - \frac{\gamma Mm}{\left\|x\right\|^3}(x\times x)\\    &= 0 - 0 = 0.    \end{aligned}

Now, for A, we will need a side-result first.

\begin{aligned}    \frac{d}{dt}\frac{1}{\left\|x\right\|} &= \frac{d}{dt}\bigl(x_1^2(t)+x_2^2(t)+x_3^2(t)\bigr)^{-1/2}\\    &= -\frac12\bigl(x_1^2(t)+x_2^2(t)+x_3^2(t)\bigr)^{-3/2}\bigl(2x_1(t)\dot x_1(t)+2x_2(t)\dot x_2(t)+2x_3(t)\dot x_3(t)\bigr)\\    &= -\frac{\left\langle x,\dot x\right\rangle}{\left\|x\right\|^3}.    \end{aligned}

This yields

\begin{aligned}    \dot A &\underset{\hphantom{\times\text{-Lemma}}}{=} \left(\frac{1}{\gamma Mm}J\times \dot x + \frac{x}{\left\|x\right\|}\right)^\cdot \\    &\underset{\times\text{-Lemma}}{\overset{(iv)}{=}} \frac{1}{\gamma Mm}\bigl(\dot J\times\dot x + J\times\ddot x\bigr) + \left(\frac{1}{\left\|x\right\|}\dot x-\frac{\left\langle x,\dot x\right\rangle}{\left\|x\right\|^3}x\right)\\    &\underset{\hphantom{\times\text{-Lemma}}}{=} \frac1{\gamma Mm}\left(0+J\times\left(-\gamma M\frac{x}{\left\|x\right\|^3}\right)\right) + \left(\frac{1}{\left\|x\right\|}\dot x-\frac{\left\langle x,\dot x\right\rangle}{\left\|x\right\|^3}x\right)\\    &\underset{\hphantom{\times\text{-Lemma}}}{=}-\frac1m\left((x\times m\dot x)\times \frac{x}{\left\|x\right\|^3}\right) + \left(\frac{1}{\left\|x\right\|}\dot x-\frac{\left\langle x,\dot x\right\rangle}{\left\|x\right\|^3}x\right)\\    &\underset{\hphantom{\times\text{-Lemma}}}{=}-\left((x\times \dot x)\times \frac{x}{\left\|x\right\|^3}\right) + \left(\frac{1}{\left\|x\right\|}\dot x-\frac{\left\langle x,\dot x\right\rangle}{\left\|x\right\|^3}x\right)\\    &\underset{\times\text{-Lemma}}{\overset{(iii)}{=}} -\left\langle x,\frac{x}{\left\|x\right\|^3}\right\rangle \dot x + \left\langle \dot x,\frac{x}{\left\|x\right\|^3}\right\rangle x + \frac{1}{\left\|x\right\|}\dot x-\frac{\left\langle x,\dot x\right\rangle}{\left\|x\right\|^3}x \\    &\underset{\hphantom{\times\text{-Lemma}}}{=} -\frac1{\left\|x\right\|^3}\left\langle x,x\right\rangle \dot x + \frac1{\left\|x\right\|^3}\left\langle \dot x, x\right\rangle x + \frac1{\left\|x\right\|}\dot x-\frac1{\left\|x\right\|^3}\left\langle x,\dot x\right\rangle x\\    &\underset{\hphantom{\times\text{-Lemma}}}{=} -\frac1{\left\|x\right\|^3}\left\|x\right\|^2\dot x + \frac1{\left\|x\right\|}\dot x\\    &\underset{\hphantom{\times\text{-Lemma}}}{=} 0.    \end{aligned}



Now, we have all ingredients to prove Kepler’s Laws. We conclude axiomatically by assuming that planetary motion is governed by Newton’s Law of Gravitation (the differential equation given above).

Let’s start with

Theorem (Kepler’s First Law of Planetary Motion): Planets orbit in ellipses, with the Sun as one of the foci.

Proof: Let x(t) denote the orbit of a planet around the Sun. By the AJ-Lemma, J is constant. By definition of J = x\times m\dot x and by (ii) of the \times-Lemma, both x and \dot x are orthogonal to J; the orbit is therefore located in a two-dimensional plane. Let us introduce polar coordinates in this plane, with the Sun in the origin, and with axis A (this works, as A is located in the same plane as well: A\bot J by the definition of A).


Now let \varphi(t) be the angle of x(t) and A, set \varepsilon := \left\|A\right\|. This means

\displaystyle \cos\varphi(t) = \frac{\left\langle x(t),A\right\rangle}{\left\|x\right\|\left\|A\right\|},

and so

\displaystyle \left\langle A,x(t)\right\rangle = \varepsilon \left\|x\right\|\cos\varphi(t).

By definition of A, we have

\begin{aligned}    \left\langle A,x(t)\right\rangle &\underset{\hphantom{\times\text{-Lemma}}}{=} \left\langle\frac{1}{\gamma Mm}J\times\dot x + \frac{x}{\left\|x\right\|}, x(t)\right\rangle\\    &\underset{\hphantom{\times\text{-Lemma}}}{=} \frac{1}{\gamma Mm}\left\langle J\times \dot x, x\right\rangle + \frac{1}{\left\|x\right\|}\left\langle x,x\right\rangle\\    &\underset{\times\text{-Lemma}}{\overset{(i)}{=}} \frac{1}{\gamma Mm}\mathrm{det}(J,\dot x,x)+\frac{\left\| x\right\|^2}{\left\|x\right\|}\\    &\underset{\hphantom{\times\text{-Lemma}}}{=} (-1)^3\frac{1}{\gamma Mm}\mathrm{det}(x,\dot x,J) + \left\|x(t)\right\|\\    &\underset{\times\text{-Lemma}}{\overset{(i)}{=}} -\frac1{\gamma Mm}\left\langle x\times \dot x,J\right\rangle + \left\|x(t)\right\|\\    &\underset{\hphantom{\times\text{-Lemma}}}{=} -\frac1{\gamma Mm^2}\left\langle x\times m\dot x, J\right\rangle + \left\|x(t)\right\|\\    &\underset{\hphantom{\times\text{-Lemma}}}{=} -\frac1{\gamma Mm^2}\left\langle J,J\right\rangle + \left\|x(t)\right\|\\    &\underset{\hphantom{\times\text{-Lemma}}}{=} const. + \left\|x(t)\right\|.    \end{aligned}

Now, if A=0, then we have found

\displaystyle \left\|x(t)\right\| = \frac1{\gamma Mm^2}\left\|J\right\|^2,

which means that the planet moves on a circular orbit.

If A\neq0, then we conclude

\begin{aligned}    &&\varepsilon \left\|x(t)\right\|\cos\varphi(t) &= -\frac1{\gamma Mm^2}\left\|J\right\|^2+\left\|x(t)\right\|\\    &\implies&\varepsilon\left\|x(t)\right\|\cos\varphi(t)&=-\frac1{\gamma Mm^2}\left\|J\right\|^2+\left\|x(t)\right\|\\    &\implies&\frac1{\gamma Mm^2}\left\|J\right\|^2&=\bigl(1-\varepsilon\cos\varphi(t)\bigr)\left\|x(t)\right\|\\    &\implies&\left\|x(t)\right\|&=\frac{\frac{\left\|J\right\|^2}{\gamma Mm^2}}{1-\varepsilon\cos\varphi(t)} = \frac{\varepsilon\frac{\left\|J\right\|^2}{\gamma Mm^2\left\|A\right\|}}{1-\varepsilon\cos\varphi(t)} = \frac{\varepsilon p}{1-\varepsilon\cos\phi(t)},    \end{aligned}

with p defined in the obvious fashion to make the last equation work.

Therefore, the planet moves on a conic section, with focus in the Sun. As the planet’s orbits are bounded, we have proved that it must follow an ellipse. Q.e.d.

Theorem (Kepler’s Second Law of Planetary Motion): A planet sweeps out equal areas in equal times.

Proof: We use cartesian coordinates in \mathbb{R}^3, such that e_1 is parallel to A and e_3 is parallel to J. Then x(t) is the plane \mathrm{span}(e_1,e_2) and the Sun is in (0,0). In particular, x_3(t) = 0 for all t by the proof of the First Law. Then,

\displaystyle \frac1m J = x\times \dot x = \begin{pmatrix}x_1\\x_2\\0\end{pmatrix}\times \begin{pmatrix}\dot x_1\\\dot x_2\\ 0\end{pmatrix} = \begin{pmatrix}0\\0\\x_1\dot x_2-x_2\dot x_1\end{pmatrix}.

By Leibniz’ sector formula, the line segment between times t_1 and t_2 sweeps the area

\displaystyle \frac12\left|\int_{t_1}^{t_2}(x_1\dot x_2-x_2\dot x_1)dt\right| = \frac1{2m}\left\|J\right\|(t_2-t_1).

This area only depends on the difference of times, as stated. Q.e.d.

Theorem (Kepler’s Third Law of Planetary Motion): The square of the period of an orbit is proportional to the cube of its semi-major axis.

Proof: By Leibniz’ sector formula (used similarly to the proof of the Second Law), the area contained in the planet’s entire orbit is

\displaystyle \frac12\left|\int_0^T (x_1\dot x_2-x_2\dot x_1)dt\right| = \frac1{2m}\left\|J\right\| T,

where T is the time taken for a full orbit around the Sun. By the First Law, this orbit is an ellipse, the area of which may be computed as follows: The cartesian coordinates of an ellipse are

\begin{pmatrix}x(t)\\y(t)\end{pmatrix}=\begin{pmatrix}a\cos t\\b\sin t\end{pmatrix},

with a and b real constants (the larger one is called the semi-major axis). This is actually an ellipse because of

\displaystyle \frac{x^2}{a^2}+\frac{y^2}{b^2} = \frac{a^2\cos^2 t}{a^2}+\frac{b^2\sin^2t}{b^2} = 1.

From the notations about normal forms of conic sections, we find b = \frac{\varepsilon p}{\sqrt{1-\varepsilon2}} = a\sqrt{1-\varepsilon^2}, which implies that a>b (as \varepsilon > 0 for any conic section). Now, the area of the ellipse is, by Leibniz’ sector formula again,

\begin{aligned}    \frac12\left|\int_0^{2\pi} (x_1\dot x_2-x_2\dot x_1)dt\right| &= \frac12\left|\int_0^{2\pi}\bigl(a \cos t \cdot b \cos t-b\sin t \cdot a (-\sin t)\bigr)dt\right| \\    &= \frac12 ab\int_0^{2\pi}dt \\    &= \pi ab.    \end{aligned}

Both representations of the area covered by the orbit now yield

\displaystyle T \frac1{2m}\left\|J\right\| = \pi a^2\sqrt{1-\varepsilon^2},

and so, using the definition of p obtained in the proof of the First Law,

\begin{aligned}    \displaystyle T^2 &= \frac{4m^2}{\left\|J\right\|^2}\pi^2a^4(1-\varepsilon^2) \\    &= \frac{4\pi^2 m^2a^3}{\left\|J\right\|^2}a(1-\varepsilon^2) \\    &= \frac{4\pi^2m^2a^3}{\left\|J\right\|^2}\frac{\varepsilon p}{1-\varepsilon^2}(1-\varepsilon^2) \\    &= \frac{4\pi^2m^2a^3}{\left\|J\right\|^2}\varepsilon\frac{\left\|J\right\|^2}{\gamma Mm^2\left\|A\right\|} \\    &= \frac{4\pi^2}{\gamma M}a^3.    \end{aligned}

The constant \frac{4\pi^2}{\gamma M} is identical for any planet travelling around the Sun, and thus is constant. Q.e.d.

Let us conclude with a brief remark of how beautiful and elegant those Laws are – made my day.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s