About Haskell and why the quicksort example is bogus 2005-03-23

About Haskell is a page about the functional programming language Haskell.

It's well worth a read if you haven't read up on functional programming before.

But whenever I read intro's like these, there is one thing that provoke me: That bloody quicksort example.

Here is quicksort in Haskell:

qsort []     = []
qsort (x:xs) = qsort elts_lt_x ++ [x] ++ qsort elts_greq_x
                   elts_lt_x   = [y | y = x]

It looks quite straightforward, doesn't it: Quicksort of an empty list gives an empty result, otherwise quicksort of a set where x represent an element and xs represent the remaining elements is equal to the result of applying quicksort to all the elements smaller than x plus x plus the result of quicksort of all elements greater or equal than x.

It's even simpler than the description almost.

Contrast that with the pesky C implementation:

    qsort( a, lo, hi ) int a[], hi, lo;
      int h, l, p, t;
      if (lo < hi) {
        l = lo;
        h = hi;
        p = a[hi];
        do {
          while ((l < h) && (a[l] <= p)) 
              l = l+1;
          while ((h > l) && (a[h] >= p))
              h = h-1;
          if (l < h) {
              t = a[l];
              a[l] = a[h];
              a[h] = t;
        } while (l < h);
        t = a[l];
        a[l] = a[hi];
        a[hi] = t;
        qsort( a, lo, l-1 );
        qsort( a, l+1, hi );

Ohh. Nasty.

What the linked page doesn't tell you is that the reason the C version is nasty (apart from being pre-ANSI C spec judging from the signature) is performance. It's a version that sorts an array in place, whereas the Haskell version have all kinds of potential for massive memory wastage unless the compiler is far smarter than most.

That aside, the C version could trivially be simplified too.

What does the C version actually do? The answer is: Almost exactly the same as the Haskell version.

The Haskell version used what Haskell calls list comprehension to partition the input data into two groups and a pivot (the pivot is the value we split the input data by, and in the Haskell example it's "x").

That's what those nested loops does. In Haskell, partitioning is built into the language. In C it's not, but hoisting the loops out of Quicksort easily creates a partitioning function (in C++ we're even better off: std::partition() would do the job for us) that reduces the core of the C quicksort to just a couple of lines:

Partition the array in place, recursively call quicksort on the two subparts before and after the pivot.

I'll post some comments on how simple you can do it in C and C++ later.

Meanwhile, if you want to see flexible sorting in C++ that's usually faster than quicksort (improving on quicksort by selectively switching to other algorithms), take a look at this article by Andrei Alexandrescu.

For the record, Alexandrescu boils the core of quicksort down to:

    template <class Iter>
    void Sort(Iter b, Iter e)
      const pair<Iter, Iter> midRange = Partition(b, e, SelectPivot(b, e));
      Sort(b, midRange.first);
      Sort(midRange.second, e);


blog comments powered by Disqus