Eugene's Blog

I can't believe it's blog!

Using recursion combinators in JavaScript

In the previous post we explored “array extras” and how they can help us to write concise yet performant and clean code. In this post we take a look at generalizing recursive algorithms with recursion combinators — high-level functions that encapsulate all boilerplate code needed to set up the recursion. These functions were added to dojox.lang.functional and will be officially released with Dojo 1.2.

In general, recursion is a form of iterative problem solving in the same category as loops. There are two major natural sources of recursive algorithms: recursive data structures (lists, trees, and so on) and recursive definitions (factorial, Fibonacci numbers, the GCD algorithm, etc.). Recursion plays a prominent role in functional programming (FP), and one of the best articles on this topic is “Recursion Theory and Joy” by Manfred von Thun, the creator of Joy (a purely functional programming language with Forth-like syntax). Manfred’s article explains the intricacies of recursion, including the venerable Y combinator and recursion combinators in general, and introduces a practical set of recursion combinators, which will guide us in this post.

FP programmers have spent a lot of time advancing the theory of programming, and it shows. Studying functional languages gives you a better understanding of computing and provides elegant solutions to many difficult questions. Yet there is a problem of applicability: some solutions are neither applicable nor practical in multi-paradigm languages like JavaScript. For example, while it is simple to replicate monads, it doesn’t make much sense: monads are mostly used to express sequential computations, introduce side effects like input/output, and implement state. We can do all these things directly without artificial constructs. The same goes for the Y combinator, which solves the problem of self-reference to a yet undefined or incomplete function: there is a direct way to do it in JavaScript (hint: arguments.callee).
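To make the comparison concrete, here is a minimal sketch of both approaches for the factorial. The names Y, factY, and factDirect are illustrative only. The direct version uses a named function expression rather than arguments.callee, on the assumption that the code may have to run in strict mode, where arguments.callee is not available.

```javascript
// An applicative-order Y combinator: it hands a function a reference to
// "itself" without that function ever being bound to a name.
var Y = function(f){
    return (function(x){ return x(x); })(function(x){
        return f(function(){ return x(x).apply(this, arguments); });
    });
};

var factY = Y(function(self){
    return function(n){ return n <= 1 ? 1 : n * self(n - 1); };
});

// The direct way: the function expression names itself, and that name is
// visible only inside its own body.
var factDirect = function fact(n){
    return n <= 1 ? 1 : n * fact(n - 1);
};

factY(5);       // 120
factDirect(5);  // 120
```

Both functions compute the same values; the second one simply has nothing to set up.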

The simplest kind of recursion is linear recursion (linrec). In this pattern a function either calls itself exactly once or terminates when some condition is met. This means that to build any linearly recursive function we need four non-recursive functions:

  1. a stop condition,
  2. a “then” function, which is called when the condition is met and produces the final value before the recursion stops,
  3. a “before” function, which is called before the recursive call to produce new arguments,
  4. an “after” function, which is called after the recursive call to process the returned value.

Let’s code it up using lambdas (read more on them in Functional fun in JavaScript with Dojo):

var df = dojox.lang.functional;

var linrec = function(cond, then, before, after){
    var cond   = df.lambda(cond),
        then   = df.lambda(then),
        before = df.lambda(before),
        after  = df.lambda(after);
    return function(){
        if(cond.apply(this, arguments)){
            return then.apply(this, arguments);
        }
        var args = before.apply(this, arguments);
        var ret  = arguments.callee.apply(this, args);
        return after.call(this, ret, arguments);
    };
};

The code is simple, yet flexible. Let’s go over all 4 parameter functions:

  1. cond() takes all parameters passed to the function generated by linrec() and returns a Boolean value. If it is true, we stop the recursion and call then(); otherwise we proceed to before().
  2. then() takes the same parameters and returns a value, which becomes the return value of the generated function.
  3. before() sets up the arguments for the recursive call. It takes the same parameters and returns an array of new parameters with which the generated function is called recursively.
  4. When the recursive call is finished we process its result with after(). It takes two parameters: the returned value and the array of all arguments passed to the current call. It returns a new value, which becomes the return value of the generated function.
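The four contracts above can be tried out without Dojo. The following is only a sketch: linrecPlain and sumTo are hypothetical names, the combinator accepts real functions instead of lambda strings, and it uses a named function expression in place of arguments.callee.

```javascript
// A Dojo-free variant of the combinator: plain functions only.
var linrecPlain = function(cond, then, before, after){
    return function self(){
        if(cond.apply(this, arguments)){
            return then.apply(this, arguments);
        }
        var args = before.apply(this, arguments);   // array of new arguments
        var ret  = self.apply(this, args);          // the single recursive call
        return after.call(this, ret, arguments);    // (result, original arguments)
    };
};

// Sum of the integers 1..n.
var sumTo = linrecPlain(
    function(n){ return n <= 0; },                  // cond: nothing left to add
    function(n){ return 0; },                       // then: the empty sum
    function(n){ return [n - 1]; },                 // before: recurse on n - 1
    function(ret, args){ return ret + args[0]; }    // after: add this level's n
);

sumTo(4);  // ((((0) + 1) + 2) + 3) + 4 = 10
```

Note that after() gets the original arguments as one array-like second parameter, which is why the last function reads args[0] rather than n.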

Let’s see how we can code well-known algorithms using linrec():

// factorial
var fact0 = function(n){
    return n <= 1 ? 1 : n * arguments.callee.call(this, n - 1);
};

var fact1 = linrec("<= 1", "1", "[n - 1]", "m * n[0]");

// find the greatest common divisor using the Euclidean algorithm
var gcd0 = function(a, b){
    return b == 0 ? a : arguments.callee.call(this, b, a % b);
};

var gcd1 = linrec("a, b -> b == 0", "a", "a, b -> [b, a % b]", "x");

As you can see, using linrec() is pretty straightforward. But gcd0() demonstrates a very important case: tail recursion. In terms of linrec() it means that we don’t process the result of the recursive call but return it directly, so we don’t need after() anymore (gcd1 had to pass the identity function “x” for it). Let’s code it up:

var tailrec = function(cond, then, before){
    var cond   = df.lambda(cond),
        then   = df.lambda(then),
        before = df.lambda(before);
    return function(){
        if(cond.apply(this, arguments)){
            return then.apply(this, arguments);
        }
        var args = before.apply(this, arguments);
        return arguments.callee.apply(this, args);
    };
};

Let’s see our previous examples coded with tailrec():

// factorial: the tail recursive version with an accumulator
var fact2Aux = function(n, acc){
    return n <= 1 ? acc : arguments.callee.call(this, n - 1, n * acc);
};
var fact2 = function(n){ return fact2Aux(n, 1); }

var fact3Aux = tailrec("<= 1", "a, b -> b", "[n - 1, n * acc]");
var fact3 = function(n){ return fact3Aux(n, 1); }

var gcd2 = tailrec("a, b -> b == 0", "a", "a, b -> [b, a % b]");

What does it buy us? The recursive part is encapsulated in linrec() or tailrec(), and we saved some bytes on the boilerplate. That’s it? Wait, there is more.

Recursive solutions sound like fun, but in real life we have to take harsh realities into account:

  • Recursions can be expensive: stack frames are allocated, internal variables are allocated, we don’t reuse unneeded variables of the previous stack frame, and so on.
  • Usually there is a restriction on how many times we can recurse.

How severe are the restrictions on recursion depth? We can write a super-simple program to find out:

var rec = function(n){
    if(n <= 1){ return 0; }
    return rec(n - 1) + 1;  // to eliminate possible tail recursion optimization
};
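Instead of trying values by hand, the probing can be automated. The sketch below is not the program used for the measurements in this post; measureDepth is a hypothetical helper that simply counts nested calls until the engine gives up. It assumes the engine throws a catchable exception when the stack is exhausted. The number it reports depends on the size of each stack frame, so it will not match the limit for rec() exactly.

```javascript
// Hypothetical helper: counts how many nested calls succeed before the
// engine reports that the stack is exhausted.
var measureDepth = function(){
    var depth = 0;
    var dive = function(){
        ++depth;
        dive();
    };
    try{
        dive();
    }catch(e){
        // the engine ran out of stack; depth holds the number of calls made
    }
    return depth;
};

var limit = measureDepth();
```

An environment that aborts the script instead of throwing will never reach the return statement, so this approach only works where the overflow is reported as an exception.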

Trying different n we can find a limit. I tried this code on different browsers. Results are below:

| Browser | Windows | Linux | Macintosh |
|---|---|---|---|
| Firefox 2 | 1,000 | 1,000 | |
| Firefox 3 | 3,000 | 3,000 | 3,000 |
| IE 6 | 2,542 | n/a | n/a |
| IE 7 | 2,556 | n/a | n/a |
| IE 8.0.6001.17184 Beta | 2,385 | n/a | n/a |
| Opera 9.5 | 9,997 | 9,999 | |
| Safari 3.1.1 | 499 | n/a | 499 |
| Midori 0.0.18/Webkit 1.0.1 | | 149,794 | |

 

Most browsers limit the depth of recursion to roughly 3,000 calls or fewer. Safari is the weak spot with ~500, and the WebKit-based Midori is the outlier at the other end with ~150,000. Opera looks good with ~10,000, but when the limit was exceeded it aborted the JavaScript interpreter thread that ran the test (the “nuclear” response), while the other browsers threw an exception.

Let’s convert recursive algorithms into loops. Now we can easily do that: by abstracting the recursive algorithms we set ourselves up for improving their implementation without changing their interface. These are the “loop” versions:

var linrec = function(cond, then, before, after){
    var cond   = df.lambda(cond),
        then   = df.lambda(then),
        before = df.lambda(before),
        after  = df.lambda(after);
    return function(){
        var args = arguments, top, ret;
        // 1st part
        for(; !cond.apply(this, args); args = before.apply(this, args)){
            top = {prev: top, args: args};
        }
        ret = then.apply(this, args);
        //2nd part
        for(; top; top = top.prev){
            ret = after.call(this, ret, top.args);
        }
        return ret;
    };
};

var tailrec = function(cond, then, before){
    var cond   = df.lambda(cond),
        then   = df.lambda(then),
        before = df.lambda(before);
    return function(){
        var args = arguments;
        for(; !cond.apply(this, args); args = before.apply(this, args));
        return then.apply(this, args);
    };
};

To eliminate the recursion in linrec we used an explicit stack (a linked list, really) to hold the arguments of each pending call. But tailrec was converted without any such structure. Now you can see why tail recursion is the preferred form of recursion for many: it doesn’t require any intermediate storage between iterations and can be optimized easily.
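The practical payoff is that the loop-based form is no longer bound by the recursion limits measured earlier. Here is a small self-contained sketch, assuming plain functions instead of lambda strings; tailrecLoop and sumAux are hypothetical names used only for this illustration.

```javascript
// Loop-based tail recursion over plain functions (no Dojo, no lambdas).
var tailrecLoop = function(cond, then, before){
    return function(){
        var args = arguments;
        while(!cond.apply(this, args)){
            args = before.apply(this, args);   // the next "call" reuses the same frame
        }
        return then.apply(this, args);
    };
};

// Sum 1..n with an accumulator: (n, acc) -> (n - 1, acc + n).
var sumAux = tailrecLoop(
    function(n, acc){ return n <= 0; },
    function(n, acc){ return acc; },
    function(n, acc){ return [n - 1, acc + n]; }
);

var deep = sumAux(1000000, 0);  // 500000500000, after a million "recursive" steps
```

A directly recursive version of the same function would exceed the depth limit of every browser in the table above.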

Is that all we can do to make it faster? No. As you’ve noticed, we worked with lambdas, which allow us to represent functions in a compact textual notation. We can easily inline them, which saves the cost of calling external functions for small operations. Of course, inlining works only for text snippets; regular functions will be called as usual.
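To illustrate the idea of inlining (and only the idea: this is not the dojox.lang.functional implementation), here is a rough sketch that pastes text snippets directly into the body of one generated loop. The helper inlineTailrec is hypothetical. Unlike df.lambda, it assumes the caller lists the parameter names explicitly and that every snippet is a plain expression over those names.

```javascript
// Hypothetical helper: builds a single function whose loop body contains the
// snippets themselves, so no per-iteration calls to cond/then/before remain.
var inlineTailrec = function(params, cond, then, before){
    // Statements that copy the freshly computed values back into the parameters.
    var reassign = params.map(function(name, i){
        return name + " = next[" + i + "];";
    }).join(" ");
    var body =
        "while(!(" + cond + ")){ " +
            "var next = " + before + "; " +
            reassign +
        " } " +
        "return (" + then + ");";
    return new Function(params.join(", "), body);
};

var gcdInline = inlineTailrec(["a", "b"], "b == 0", "a", "[b, a % b]");

gcdInline(12, 8);   // 4
gcdInline(48, 18);  // 6
```

The generated function for gcdInline is equivalent to a hand-written while loop over a and b, which is where the speed-up for small snippets comes from.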

But before taking a look at numbers let me introduce two more recursion combinators — binrec, and multirec, which are here to deal with binary and generic tree-like recursion respectively:

var binrec = function(cond, then, before, after){
    var cond   = df.lambda(cond),
        then   = df.lambda(then),
        before = df.lambda(before),
        after  = df.lambda(after);
    return function(){
        if(cond.apply(this, arguments)){
            return then.apply(this, arguments);
        }
        var args = before.apply(this, arguments);
        var ret1 = arguments.callee.apply(this, args[0]);
        var ret2 = arguments.callee.apply(this, args[1]);
        return after.call(this, ret1, ret2, arguments);
    };
};

var multirec = function(cond, then, before, after){
    var cond   = df.lambda(cond),
        then   = df.lambda(then),
        before = df.lambda(before),
        after  = df.lambda(after);
    return function(){
        if(cond.apply(this, arguments)){
            return then.apply(this, arguments);
        }
        var args = before.apply(this, arguments),
            ret  = new Array(args.length);
        for(var i = 0; i < args.length; ++i){
            ret[i] = arguments.callee.apply(this, args[i]);
        }
        return after.call(this, ret, arguments);
    };
};

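Pay attention to the different signatures of the before() and after() functions. In binrec, before() returns an array of two argument arrays, and after() receives both results followed by the original arguments. In multirec, before() returns an array of any number of argument arrays, and after() receives an array of results followed by the original arguments.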

Let’s implement the Fibonacci algorithm with binrec:

var fib0 = function(n){
    return n <= 1 ? 1 :
        arguments.callee.call(this, n - 1) +
            arguments.callee.call(this, n - 2);
};

var fib1 = binrec("<= 1", "1", "[[n - 1], [n - 2]]", "+");

Yes, it is that simple.
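For completeness, here is how multirec might be applied to a problem that actually branches an arbitrary number of times: summing the values in a tree. This is a sketch with plain functions and no Dojo; multirecPlain, sumTree, and the {value, children} node shape are assumptions made for this example.

```javascript
// A Dojo-free variant of the naive multirec: plain functions only.
var multirecPlain = function(cond, then, before, after){
    return function self(){
        if(cond.apply(this, arguments)){
            return then.apply(this, arguments);
        }
        var args = before.apply(this, arguments),   // array of argument arrays
            ret  = new Array(args.length);
        for(var i = 0; i < args.length; ++i){
            ret[i] = self.apply(this, args[i]);     // one recursive call per entry
        }
        return after.call(this, ret, arguments);    // (results, original arguments)
    };
};

// Sum all values in a tree whose nodes look like {value: ..., children: [...]}.
var sumTree = multirecPlain(
    function(node){ return node.children.length == 0; },   // cond: a leaf
    function(node){ return node.value; },                   // then: the leaf's value
    function(node){                                         // before: one call per child
        return node.children.map(function(child){ return [child]; });
    },
    function(rets, args){                                   // after: own value + subtrees
        var total = args[0].value;
        for(var i = 0; i < rets.length; ++i){ total += rets[i]; }
        return total;
    }
);

var tree = {value: 1, children: [
    {value: 2, children: []},
    {value: 3, children: [{value: 4, children: []}]}
]};

sumTree(tree);  // 1 + 2 + (3 + 4) = 10
```

Each child is wrapped in its own one-element array because before() must return a list of argument lists, not a list of arguments.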

All four recursion combinators, implemented with loops and optional lambda inlining, are part of the dojox.lang.functional package:

dojo.require("dojox.lang.functional.linrec");
dojo.require("dojox.lang.functional.tailrec");
dojo.require("dojox.lang.functional.binrec");
dojo.require("dojox.lang.functional.multirec");

And now it is time for the obligatory numbers. I wrote a simple program that measures the performance of different versions of the factorial and Fibonacci functions. Here are the results for the factorial:

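| OS | Browser | raw rec | raw loop | linrec rec | linrec loop | linrec | tailrec rec | tailrec loop | tailrec | multirec rec | multirec loop | multirec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Windows | IE 6 | 703 | 78 | 2,266 | 1,875 | 782 | 2,157 | 1,422 | 531 | 3,312 | 3,219 | 2,063 |
| Windows | IE 7 | 719 | 94 | 2,328 | 2,047 | 765 | 2,531 | 1,828 | 562 | 3,390 | 3,390 | 1,985 |
| Windows | IE 8 Beta | 516 | 47 | 1,344 | 1,078 | 594 | 1,297 | 734 | 406 | 2,140 | 2,125 | 1,640 |
| Windows | FF 1.5 | 625 | 31 | 1,281 | 1,156 | 657 | 1,469 | 766 | 532 | 2,172 | 2,094 | 1,578 |
| Windows | FF 2 | 984 | 47 | 2,860 | 2,406 | 1,984 | 2,469 | 1,281 | 1,359 | 4,968 | 5,406 | 4,859 |
| Windows | FF 3 | 613 | 8 | 247 | 172 | 131 | 694 | 359 | 343 | 576 | 417 | 903 |
| Windows | Opera 9.5 | 94 | 0 | 188 | 156 | 94 | 203 | 109 | 63 | 328 | 328 | 250 |
| Windows | Safari 3.1.1 | 62 | 0 | 281 | 94 | 78 | 188 | 63 | 31 | 329 | 234 | 109 |
| Linux | FF 2 | 819 | 42 | 2,015 | 1,658 | 1,033 | 2,155 | 1,175 | 790 | 3,362 | 3,379 | 2,245 |
| Linux | FF 3 | 462 | 26 | 916 | 1,138 | 325 | 1,440 | 397 | 250 | 2,178 | 1,054 | 745 |
| Linux | Opera 9.5 | 133 | 24 | 445 | 382 | 182 | 411 | 263 | 123 | 673 | 763 | 528 |
| Linux | Midori 0.0.18 | 154 | 11 | 343 | 194 | 99 | 347 | 116 | 76 | 469 | 329 | 249 |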

 

And here are the results for the Fibonacci:

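| OS | Browser | raw rec | raw tail | raw loop | binrec rec | binrec loop | binrec | tailrec rec | tailrec loop | tailrec | multirec rec | multirec loop | multirec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Windows | IE 6 | 1,125 | 453 | 47 | 5,953 | 5,109 | 2,328 | 1,187 | 766 | 312 | 6,609 | 7,094 | 3,719 |
| Windows | IE 7 | 1,219 | 407 | 47 | 6,000 | 4,938 | 2,485 | 1,219 | 750 | 313 | 6,781 | 6,938 | 3,875 |
| Windows | IE 8 Beta | 797 | 281 | 15 | 3,609 | 2,985 | 1,953 | 750 | 406 | 235 | 4,344 | 5,844 | 3,093 |
| Windows | FF 1.5 | 797 | 282 | 31 | 3,594 | 3,156 | 2,203 | 750 | 438 | 343 | 4,422 | 4,032 | 3,266 |
| Windows | FF 2 | 1,656 | 672 | 15 | 8,860 | 8,234 | 7,000 | 1,610 | 672 | 672 | 11,985 | 11,328 | 8,390 |
| Windows | FF 3 | 449 | 93 | 11 | 1,240 | 845 | 1,257 | 228 | 88 | 48 | 1,332 | 653 | 1,109 |
| Windows | Opera 9.5 | 140 | 32 | 0 | 500 | 484 | 250 | 93 | 62 | 32 | 547 | 671 | 515 |
| Windows | Safari 3.1.1 | 156 | 63 | 0 | 547 | 313 | 203 | 172 | 47 | 47 | 656 | 343 | 250 |
| Linux | FF 2 | 1,343 | 346 | 36 | 5,858 | 4,942 | 3,188 | 1,064 | 776 | 464 | 6,787 | 6,317 | 4,459 |
| Linux | FF 3 | 718 | 914 | 12 | 3,067 | 2,134 | 1,756 | 573 | 228 | 152 | 3,484 | 3,307 | 2,025 |
| Linux | Opera 9.5 | 312 | 87 | 11 | 1,173 | 1,041 | 544 | 260 | 167 | 79 | 1,364 | 1,392 | 993 |
| Linux | Midori 0.0.18 | 268 | 116 | 8 | 831 | 457 | 288 | 219 | 103 | 65 | 1,003 | 574 | 394 |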

 

The legend:

  • raw rec is the manually-written recursive version of an algorithm,
  • raw tail is the tail-recursive version of an algorithm,
  • raw loop is the loop-based version of an algorithm,
  • XXXrec rec is the naïve recursive version of linrec, tailrec, binrec, or multirec given above,
  • XXXrec loop is the loop-based version of respective recursion combinators,
  • XXXrec is the version from dojox.lang.functional, which implements optional inlining,
  • Midori is the Webkit-based web browser for Linux,
  • multirec() is not a natural fit for either the factorial or the Fibonacci algorithm; I used it just to see how (badly) it stacks up against the normal methods,
  • all numbers were taken on the same machine under Linux and Windows (the latter was running under VMware),
  • all numbers were taken on the 3rd run,
  • all numbers are in milliseconds.

You are invited to check out the code of the test program. I didn’t massage the numbers, and you can spot certain artifacts where the results look a little strange. Numbers change from run to run, and if we were unlucky and caught a massive garbage collection, the numbers would be a little inflated. Unfortunately, most browsers don’t allow JavaScript to run for more than several seconds in a row and show an alert about an “unresponsive script”, which makes the result invalid. But the numbers are real: just disregard small differences due to the inaccuracy of timers, and look for major trends.
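For readers who want to take rough measurements of their own, a timing loop along the following lines is enough to see the major trends. This is only a sketch and not the test program mentioned above; timeIt and factLoop are hypothetical names, and the repetition count is arbitrary.

```javascript
// Hypothetical helper: calls fn(arg) `reps` times and returns the elapsed
// wall-clock time in milliseconds.
var timeIt = function(fn, arg, reps){
    var start = new Date().getTime();
    for(var i = 0; i < reps; ++i){
        fn(arg);
    }
    return new Date().getTime() - start;
};

// A hand-written loop version of the factorial, used here as the subject.
var factLoop = function(n){
    var result = 1;
    for(; n > 1; --n){
        result *= n;
    }
    return result;
};

var elapsed = timeIt(factLoop, 20, 100000);  // milliseconds; varies from run to run
```

Keeping the repetition count low enough to finish within a second or two avoids the “unresponsive script” alert described above.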

The results are in.

As you can see, while special versions of these simple functions written with knowledge of their respective domains are extremely fast, our optimized generic versions can match and in some cases exceed the naïve versions without investing a lot of time in writing them. Obviously, if we were to test more calculation-heavy recursive functions (both factorial and Fibonacci use very lightweight snippets), the difference between the generic version and the original version would be much smaller, if any.

The conclusion: it pays off to abstract algorithms, because you can improve them independently of the applications they are used in.