Shellsort – Bounds, Gems and Open Questions

In recent months I’ve spent a rather large amount of my spare time on the What, Why and Huh? of Shellsort. Part of the problem was that I didn’t have much spare time. The other part was that Shellsort turns out to be a rather tricky thing to analyse once you look at it closely.

In the beginning, I was interested in the more standard sorting algorithms like Quicksort and Heapsort. But then I got hooked by the analysis of Shellsort that can be found in Knuth’s book. There are some quite intriguing ideas in it, and there’s still research going on about it – this came as a surprise to me: I had thought that sorting algorithms had been so very thoroughly researched, there wouldn’t be anything interesting left to find out. But apparently, there’s still some work to do on the optimal parametrization of Shellsort.

Now, let’s start at the beginning. Shellsort is a generalization of Insertion Sort, which is called “bridge player’s sort” in Knuth’s book (Chapter 5.1 in Volume 3). The problem that these algorithms solve is: take the real numbers $a_1,\ldots,a_n$ and sort them. Abstractly, this means: find a permutation $\pi$ on the set $\{1,\ldots,n\}$ such that $a_{\pi(1)} < \ldots < a_{\pi(n)}$. Insertion Sort does this inductively: if $a_1,\ldots, a_k$ have already been sorted, take $a_{k+1}$ and put it in its correct place among $a_1,\ldots, a_k$ by looking at $a_k$, then at $a_{k-1}$, and so on, until you find it.
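In code, the inductive step reads off almost verbatim (a minimal Python sketch; the function name is my own):

```python
def insertion_sort(a):
    """Sort the list a in place by repeated insertion."""
    for k in range(1, len(a)):
        x = a[k]              # the element to be inserted
        j = k - 1
        # look at a[k-1], then a[k-2], ... until the right place is found
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]   # shift larger elements one step to the right
            j -= 1
        a[j + 1] = x

nums = [5, 2, 4, 6, 1, 3]
insertion_sort(nums)
print(nums)  # [1, 2, 3, 4, 5, 6]
```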

Insertion Sort is a very straightforward method; it’s pretty much what most people do when playing cards (hence Knuth’s nickname for this sorting algorithm). On the other hand, it’s not very efficient: in the average case, you’ll perform on the order of $n^2$ swapping operations, which is a lot. On inputs of any serious size, the running time is too long to be acceptable.

Shellsort improves on those many swaps. Obviously, when a number is “far away” from where it’s supposed to end up, you have to check it against many numbers in between. Well, then – just don’t. Let the elements make larger leaps instead of small steps. This means we will make several successive passes of Insertion Sort, but consider only a subset of the elements each time. For instance, we’ll start with the set $a_1, a_{1+h}, a_{1+2h},\ldots,a_{1+rh}$, then we’ll look at the set $a_2, a_{2+h}, a_{2+2h},\ldots,a_{2+rh}$, and so on, until $a_h, a_{2h}, a_{3h},\ldots,a_{(r+1)h}$. Of course, that will not, in general, sort the set and solve our problem; we have only sorted each subset that consists of the original numbers with skips of $h$ positions in between (we’ll call the result “$h$-sorted”). So, we make another pass, now with another skipping number $k$. We continue until our original set is sorted. We can guarantee this by making a final pass with skip 1 (which amounts to a usual Insertion Sort).

The last sentence makes it seem as if Shellsort were no use at all – in the end, you always need a final pass of Insertion Sort to achieve the goal, and we’ve already seen that this is not efficient for practical purposes. But then again, the final pass of Insertion Sort won’t have a lot of work left to do: those numbers that were “far away” and “very much out of order” will already have been moved very close to where they belong. The biggest issue with Insertion Sort has thus been resolved.

Let’s have a short example and let’s be real Knuth-ish here: we consider 16 random numbers that I picked using R yesterday morning. First, we’ll do a 7-sorting pass on them; the numbers that are compared to one another are those in the same interleaved subset: 155, 352 and 320 are compared and sorted; then 299, 893 and 974 are compared; and so on. After that, we have a 5-sorting pass, and then a 3-sorting pass. Finally, there’s Insertion Sort in the end and we arrive at a perfectly sorted set of 16 numbers.

    original set      155 299 923  41 152 484 145 352 893 472 291  44 664 354 320 974
    after 7-sorting   155 299 472  41  44 484 145 320 893 923 291 152 664 354 352 974
    after 5-sorting   155 145 320  41  44 291 152 472 354 352 484 299 664 893 923 974
    after 3-sorting    41  44 291 152 145 299 155 472 320 352 484 354 664 893 923 974
    after 1-sorting    41  44 145 152 155 291 299 320 352 354 472 484 664 893 923 974
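The passes above can be reproduced mechanically. Here is a minimal Python sketch of one $h$-pass (the function name h_sort is my own); running it with skips 7, 5, 3 and 1 replays the example line by line:

```python
def h_sort(a, h):
    """One h-pass: insertion sort on each of the h interleaved subsets of a."""
    for k in range(h, len(a)):
        x, j = a[k], k - h
        while j >= 0 and a[j] > x:
            a[j + h] = a[j]   # shift within the h-strided subsequence
            j -= h
        a[j + h] = x

a = [155, 299, 923, 41, 152, 484, 145, 352,
     893, 472, 291, 44, 664, 354, 320, 974]
for h in (7, 5, 3, 1):
    h_sort(a, h)
    print(f"after {h}-sorting:", a)
```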

Now, let’s have a look at the analysis of the algorithm. The nice thing is a theorem to which Knuth has given the fitting name, well, “K”. If we make an “$h$-pass” over a file, followed by a “$k$-pass”, the file will afterwards be both $h$-sorted and $k$-sorted. In particular, subsequent passes never render useless the work we did in previous passes: we can only improve on our set with every next pass. A final pass of “$1$-sorting” (i.e. Insertion Sort) concludes our algorithm. The proof of this is not really difficult, but it’s hard to write down, because there’s some muddling with indices.

Then, the question must arise: what are the most efficient choices for the sorting passes? And this is where research comes in: nobody can tell what the best sequence is. The algorithm was conceived by Donald Shell (which is why it bears his name – the sorting has nothing to do with beaches or gas stations), and his choice was the sequence $2^k$, $k\geq0$. This choice was quickly challenged, because with it there will never be any interchange between numbers in odd and even positions until the final pass. The nice thing about it is a divisibility condition: each skipping number is a multiple of the next one, which makes the analysis feasible (although still not easy) – Knuth devotes several pages to sequences of this kind. If chosen correctly, one can achieve an average-case running time of $O(n^{3/2+\varepsilon})$, which is much better than the $O(n^2)$ of Straight Insertion. Anything better than $O(n^{\frac 32})$ cannot be hoped for, because there will still be too many comparisons in the final 1-sorting pass.
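The odd/even problem with Shell’s original sequence is easy to observe (a small Python sketch, with an adversarial input of my own choosing): on a file that alternates between the two residue classes, every pass with an even gap finds nothing to do, and the final 1-pass is left with all $\frac n2$ inversions by itself.

```python
def h_sort(a, h):
    """One h-pass of insertion sort over the list a, in place."""
    for k in range(h, len(a)):
        x, j = a[k], k - h
        while j >= 0 and a[j] > x:
            a[j + h] = a[j]
            j -= h
        a[j + h] = x

a = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
for h in (8, 4, 2):
    before = list(a)
    h_sort(a, h)
    assert a == before    # the even gaps never move anything here
h_sort(a, 1)              # ... so the 1-pass must fix all 8 inversions itself
print(a)
```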

We do better if we drop the divisibility condition and thus mix up the odd- and even-numbered positions (or whatever residue classes the chosen skipping sequence produces). The downside is that we can’t say much about the average case for a given skipping sequence, and this is the core problem of the analysis: there are many $h$-sorted arrangements and many $k$-sorted arrangements of a set of $n$ numbers, but their number need not divide $n!$, the number of possible permutations – hence, some arrangements appear more often than others.

However, already the sequence $2^k-1$ does much better, allowing a worst-case complexity of $O(n^{\frac 32})$. This is a huge achievement, because it is a statement about the worst case, not merely the average case! Under the divisibility condition, by contrast, the worst case is still $O(n^2)$.

Let’s have a brief look at sequences with a very special property: let the sequence $a_1,\ldots, a_n$ be both 2-sorted and 3-sorted already, and let’s call this an $n$-drowsy sequence (they’ll mostly appear towards the end of the Shellsort). This brings a lot of structure to the final 1-pass: for instance, there will be at most $\frac n2$ inversions in it. You can see that by thinking about what such a sequence must look like: it will start either with 1 or with 21 (anything else would contradict either the 2-sorting or the 3-sorting). Let’s use some induction now. For $n=1$, nothing is to be done. For $n=2$, the sequence can be 12 (no inversion, the assertion holds) or 21 ($1 = \frac22 = \frac n2$ inversions, the assertion still holds). For any other $n$, the sequence must be constructed

• either from some $(n-1)$-drowsy sequence by putting $n$ at the very end of it. This construction gives no further inversions, so nothing changes: we have $m$ inversions with $m \leq \frac{n-1}2 < \frac n2$, and the assertion still holds,
• or from some $(n-2)$-drowsy sequence by putting $n, n-1$ at the very end of it. This adds one inversion to those that we already had, so we have $m$ inversions with $m \leq \frac{n-2}2 +1=\frac n2$, and the assertion holds.

With any other construction the sequence won’t be drowsy anymore (any value smaller than $n-1$ in the last position would violate the 2-sorting or the 3-sorting). q.e.d.

This inductive construction can be used for some unexpected mathematics as well. For instance, there are only $F_{n+1}$ drowsy sequences on $n$ numbers, where $F_n$ denotes the $n$-th Fibonacci number: each $(n-1)$-drowsy sequence and each $(n-2)$-drowsy sequence gives a unique $n$-drowsy sequence, and by induction there are $F_n$ and $F_{n-1}$ of those, respectively. Digging slightly deeper, one can similarly work out how many inversions all drowsy sequences have in total – a linear function of the Fibonacci numbers again.
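Both the Fibonacci count and the inversion bound are easy to confirm by brute force for small $n$ (a Python sketch; is_h_sorted and the hard-coded Fibonacci list are my own scaffolding, and “drowsy” is of course not standard terminology):

```python
from itertools import permutations

def is_h_sorted(seq, h):
    """True if every h-strided subsequence of seq is in increasing order."""
    return all(seq[i] <= seq[i + h] for i in range(len(seq) - h))

def inversions(seq):
    """Count pairs i < j with seq[i] > seq[j]."""
    return sum(seq[i] > seq[j]
               for i in range(len(seq)) for j in range(i + 1, len(seq)))

fib = [1, 1, 2, 3, 5, 8, 13, 21, 34]       # F_1, F_2, F_3, ...
for n in range(1, 8):
    drowsy = [p for p in permutations(range(1, n + 1))
              if is_h_sorted(p, 2) and is_h_sorted(p, 3)]
    assert len(drowsy) == fib[n]           # exactly F_{n+1} drowsy sequences
    assert max(inversions(p) for p in drowsy) <= n / 2
print("checked n = 1 .. 7")
```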

When I found this tiny gem in Knuth’s book, I was honestly surprised by this beautiful connection. Now that I’ve thought about it long enough, it doesn’t really amaze me anymore. It’s another instance of thinking about something for a couple of days and then saying “oh, of course, it’s trivial!”

The next astonishing idea is due to Vaughan Pratt: choose the numbers $2^p3^q$ as the skipping sequence and arrive at a complexity of $O(n(\log n)^2)$! Such a primitive sequence, and what a huge improvement on the worst-case timing! This sequence somewhat chooses the best of both worlds, using many divisors and many coprime skips to bring “order” to the “chaos” (actually, those quotation marks were unintentional…). Sadly, there’s a lot of overhead in this sequence: there are many passes which will do very little, and there’s much book-keeping work to be done. Thus, we cannot hope to have any use for Pratt’s sequence on real-world sets in practical applications. But for theoretical purposes, this is a very amazing object to look at. And here’s the icing on top of all this: the proof is not really hard, not even lengthy (considering my sloppy style of writing here). Let’s see:

First, how many passes will be necessary with Pratt’s sequence? That is the cardinality of the set $\{k \in\mathbb{N}\colon k= 2^p3^q< n\} = \{k\colon \log k = p\log 2 + q\log 3< \log n\}$. The last inequality defines a set of integer points inside a triangle in the two-dimensional plane: the triangle bounded by the two coordinate axes and the line defined by $x\log 2+y\log 3=\log n$, where $x,y\in\mathbb{R}$. The points where this line intersects the axes are, of course, $x = \log_2 n$ and $y = \log_3 n$. In the picture below, the integer points of interest are marked, as well as the triangle described so far. The area of the triangle is $\frac12 \log_2 n \log_3 n$, and the number of integer points inside the triangle is approximately equal to this area, because each of these integer points corresponds bijectively to a square of unit area (the integer point being the lower-left corner of its square). You get a little fuss about what happens on the diagonal and about the square at the intersection of the axes, but that’s what the $O$-notation is for. No longer caring about this fuss, nor about which kind of logarithm to use (they only differ by a constant factor anyway), we find that Pratt’s sequence demands $O((\log n)^2)$ passes.

[Figure: integer points that can be used for Pratt’s sequence in Shellsort]
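Counting these integer points for concrete $n$ matches the triangle estimate reasonably well (a Python sketch; pratt_numbers is my own helper name):

```python
import math

def pratt_numbers(n):
    """All numbers of the form 2^p * 3^q that are smaller than n."""
    out, p2 = [], 1
    while p2 < n:           # walk the powers of 2 ...
        k = p2
        while k < n:        # ... and multiply each by powers of 3
            out.append(k)
            k *= 3
        p2 *= 2
    return sorted(out)

for n in (10**3, 10**6, 10**9):
    passes = len(pratt_numbers(n))
    area = 0.5 * math.log2(n) * math.log(n, 3)
    print(f"n = {n}: {passes} passes, triangle area {area:.1f}")
```

The count and the area only agree up to lower-order terms, which is exactly what the $O$-notation absorbs.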

Finally, what cost does each pass have? In an $h$-pass, where $h = 2^p3^q$ with suitable $p,q$, the file will already be $2h$- and $3h$-sorted. If $2h > n$, this extends the definition of “$2h$-sorted” trivially; if not, it follows from what the algorithm has achieved so far and from Knuth’s “Theorem K”. From this you can conclude that there won’t be too many inversions left in the considered subsequences: an $h$-pass treats $h$ subsequences of about $\frac nh$ numbers each, and by construction, each of these subsequences is already 2-sorted and 3-sorted. Thankfully, we already talked about those drowsy sequences: each has at most $\frac{n}{2h}$ inversions in it (its length is $\frac nh$ here, not $n$ as we had above), and we have to consider $h$ such sequences during this pass. Hence, the total number of inversions that may need curing in the $h$-pass is at most $h\cdot\frac{n}{2h} = \frac n2$.

So, the cost of Shellsort with Pratt’s sequence is $O(\frac n2(\log n)^2) = O(n(\log n)^2)$, as stated. q.e.d.
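The bound of $\frac n2$ inversions per pass can even be watched live: each shift in the inner loop of Insertion Sort removes exactly one inversion, so if we count the shifts and run the gaps $2^p3^q$ in decreasing order, no single pass should ever perform more than $\frac n2$ of them. A Python sketch, with helper names of my own:

```python
import random

def pratt_gaps(n):
    """Numbers of the form 2^p * 3^q below n, largest first."""
    gaps, p2 = [], 1
    while p2 < n:
        k = p2
        while k < n:
            gaps.append(k)
            k *= 3
        p2 *= 2
    return sorted(gaps, reverse=True)

def shellsort_pratt(a):
    """Shellsort a in place with Pratt's gaps; return each pass's shift count."""
    counts = []
    for h in pratt_gaps(len(a)):
        shifts = 0
        for k in range(h, len(a)):
            x, j = a[k], k - h
            while j >= 0 and a[j] > x:
                a[j + h] = a[j]
                j -= h
                shifts += 1      # one shift = one inversion removed
            a[j + h] = x
        counts.append(shifts)
    return counts

random.seed(42)
data = [random.randrange(10**6) for _ in range(1000)]
a = list(data)
counts = shellsort_pratt(a)
assert a == sorted(data)
assert max(counts) <= len(a) // 2   # never more than n/2 shifts in one pass
print("busiest pass:", max(counts), "shifts; bound:", len(a) // 2)
```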

Knuth then goes on to look at bounds that hinge on deeper number-theoretic relations between the skips. By choosing those cleverly, one can even get down to $O(n e^{c\sqrt{\ln n}})$ – even for real-world sets, this may be a suitable choice, since the constant implied by the $O$-notation is usually much smaller than with Pratt’s sequence. Pratt’s sequence is way better for astronomically large $n$, though. I admit, I shied away from the proofs of those bounds… I may return to them some time later. At first glance, they didn’t seem as appealing as those that I rephrased here. But maybe I’m just being too picky and not doing justice to those ideas.

As of now, apparently no sequence is known that achieves the famous $O(n\log n)$ bound. I would be very interested to learn whether one could actually prove this to be impossible.

Now, for practical purposes, Shellsort will not be your first choice of sorting algorithm. Its merits are mostly that it is very easy to implement, quite fast on partially sorted sets, and in-place (it doesn’t use much extra space while sorting). I have turned to Knuth’s next chapter by now, which deals with exchange sorts (like Bubblesort and, in particular, Quicksort) – in the beginning I had just wanted to skip Shellsort to get to the next chapter faster. But I have dwelled here for quite some time now, and for good reason: Shellsort allows for much deeper mathematics than I had anticipated, and I have learned a great deal about the principles that underlie the theory of algorithms. And there is still research to do. Amazing stuff.