In the previous post we explored “array extras” and how they can help us to write concise yet performant and clean code. In this post we take a look at generalizing recursive algorithms with recursion combinators — high-level functions that encapsulate all boilerplate code needed to set up the recursion. These functions were added to dojox.lang.functional and will be officially released with Dojo 1.2.
In general the recursion is a form of iterative problem solving in the same category as loops. There are two major natural sources of recursive algorithms: recursive data structures (lists, trees, and so on), and recursive definitions (factorial, Fibonacci numbers, the GCD algorithm, etc.). The recursion plays a prominent role in the functional programming (FP), and one of the best articles on this topic is “Recursion Theory and Joy” by Manfred von Thun, the creator of Joy (a purely functional programming language with Forth-like syntax). Manfred’s article explains intricacies of recursion including the venerable Y combinator, recursion combinators in general, and introduces a practical set of recursion combinators, which will guide us in this post.
The simplest kind of recursion is a linear recursion (linrec). This is a pattern when a function calls itself once, or terminates when some condition is met. Basically it means that in order to do any linear recursion function we need four non-recursive functions:
- a stop condition,
- a “then” function, which is called when condition is met before stopping the recursion, produces a final value,
- a “before” function, which is called before the recursive call to produce new arguments,
- an “after” function, which is called after the recursive call to process the returned value.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
The code is simple, yet flexible. Let’s go over all 4 parameter functions:
- cond() takes all parameters passed to the linrec() and returns a Boolean value. If it is true, we stop recursions and call then(), otherwise we proceed to before().
- then() takes all parameters passed to the linrec() and returns a value, which in turn will be the returned value of the linrec().
- before() sets up the recursion arguments. It takes all parameters passed to linrec() and returns an array of new parameters to call linrec() recursively.
- When the recursive call is finished we process its return with after(). It takes two parameters: the returned value, and the array of all arguments passed to our linrec(). It returns a new value, which will be the returned value of linrec().
Let’s see how we can code well-known algorithms using linrec():
1 2 3 4 5 6 7 8 9 10 11 12 13
As you can see using linrec() is pretty straight-forward. But gsd0() demonstrates a very important case: a tail recursion. In terms of linrec() it means that we don’t process the result of the recursive call, but return it directly ⇒ we don’t need after() anymore. Let’s code it up:
1 2 3 4 5 6 7 8 9 10 11 12
Let’s see our previous examples coded with tailrec():
1 2 3 4 5 6 7 8 9 10
What does it buy us? The recursive part is encapsulated in linrec() or tailrec(), and we saved some bytes on the boilerplate. That’s it? Wait, there is more.
The recursive solutions sound like fun, but in the real life we have to take into account harsh realities:
- Recursions can be expensive: stack frames are allocated, internal variables are allocated, we don’t reuse unneeded variables of the previous stack frame, and so on.
- Usually there is a restriction on how many times we can recurse.
How severe are restrictions on number of recursions? We can write a super-simple program to find out:
1 2 3 4
n we can find a limit. I tried this code on different browsers. Results are
|IE 8.0.6001.17184 Beta||2,385||n/a||n/a|
|Midori 0.0.18/Webkit 1.0.1||149,794|
Let’s convert recursive algorithms into loops. Now we can easily do that, because by abstracting the recursive algorithms we set us up for improving their implementation without changing their interface. These are “loop” versions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
By eliminating the recursion we used an explicit stack (a list, really) in linrec to hold some necessary variables. But tailrec was converted without any linear structures. Now you can see why the tail recursion is the preferred form of recursion for many — it doesn’t require any intermediate storage for iterations, and can be optimized easily.
Is it all we can do to make it faster? No. As you’ve noticed we worked with lambdas, which allow to represent functions in a compact textual notation. We can easily inline them saving on calling external functions for small operations. Of course the inlining works only for text snippets, regular functions will be called as usual.
But before taking a look at numbers let me introduce two more recursion combinators — binrec, and multirec, which are here to deal with binary and generic tree-like recursion respectively:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
Pay attention to different signatures of before() and after() functions.
Let’s implement the Fibonacci algorithm with binrec:
1 2 3 4 5 6 7
Yes, it is that simple.
All 4 recursion combinators implemented with loops and possible lambda inlining are the part of dojox.lang.functional package:
1 2 3 4
And now it is time for obligatory numbers. I wrote a simple program, which measures performance of different versions of factorial functions, and Fibonacci functions. Here are the results for the factorial:
|OS||Browser||raw rec||raw loop||linrec rec||linrec loop||linrec||tailrec rec||tailrec loop||tailrec||multirec rec||multirec loop||multirec|
|IE 8 Beta||516||47||1,344||1,078||594||1,297||734||406||2,140||2,125||1,640|
And here are the results for the Fibonacci:
|OS||Browser||raw rec||raw tail||raw loop||binrec rec||binrec loop||binrec||tailrec rec||tailrec loop||tailrec||multirec rec||multirec loop||multirec|
|IE 8 Beta||797||281||15||3,609||2,985||1,953||750||406||235||4,344||5,844||3,093|
- raw rec is the manually-written recursive version of an algorithm,
- raw tail is the tail-recursive version of an algorithm,
- raw loop is the loop-based version of an algorithm,
- XXXrec rec is the naïve recursive version of linrec, tailrec, binrec, or multirec given above,
- XXXrec loop is the loop-based version of respective recursion combinators,
- XXXrec is the version from dojox.lang.factorial, which implements an optional inlining,
- Midori is the Webkit-based web browser for Linux,
- multirec() doesn’t match the factorial nor Fibonacci algorithms, I used it just to see how (badly) it stacks up against normal methods,
- all numbers were taken on the same machine under Linux, and Windows (the latter was running under VMware),
- all numbers were taken on the 3rd run,
- all numbers are in milliseconds.
The results are in.
As you can see while special versions of these simple functions written with the knowledge of their respective domains are extremely fast, our optimized generic versions can match and in some cases exceed the naïve versions without investing a lot of time in writing them. Obviously, if we are to test more calculation-heavy recursive functions (both factorial and Fibonacci use very light-weight snippets), the difference between the generic version and the original version will be much smaller, if any.
The conclusion: it pays off to abstract algorithms, because you can improve them independently of the applications they are used in.