Analysis of generators and other async patterns in node

Fri Aug 09 2013

Table of contents:

A gentle introduction to generators
The analysis
The examples
Complexity
Performance (time and memory)
Debuggability
- Source maps support
- Stack trace accuracy
Conclusion

Async coding patterns are the subject of never-ending debates for us node.js developers. Everyone has their own favorite method or pet library as well as strong feelings and opinions on all the other methods and libraries. Debates can be heated: sometimes social pariahs may be declared or grave rolling may be induced.

The reason for this is that JavaScript never had any continuation mechanism to allow code to pause and resume across the event loop boundary.

Until now.

A gentle introduction to generators

If you know how generators work, you can skip this and continue to the analysis

Generators are a new feature of ES6. Normally they would be used for iteration. Here is a generator that generates Fibonacci numbers. The example is taken from the ECMAScript harmony wiki:

function* fibonacci() {
  let [prev, curr] = [0, 1];
  for (;;) {
    [prev, curr] = [curr, prev + curr];
    yield curr;
  }
}

And here is how we iterate through this generator:

for (n of fibonacci()) {
  // truncate the sequence at 1000
  if (n > 1000) break;
  console.log(n);
}

What happens behind the scene?

Generator functions are actually constructors of iterators. The returned iterator object has a next() method. We can invoke that method manually:

var seq = fibonacci();
console.log(seq.next()); // 1
console.log(seq.next()); // 2 etc.

When next is invoked, it starts the execution of the generator. The generator runs until it encounters a yield expression. Then it pauses and the execution goes back to the code that called next

So in a way, yield works similarly to return. But there is a big difference. If we call next on the generator again, the generator will resume from the point where it left off - from the last yield line.

In our example, the generator will resume to the top of the endless for loop and calculate the next Fibonacci pair.

So how would we use this to write async code?

A great thing about the next() method is that it can also send values to the generator. Let's write a simple number generator that also collects the stuff it receives. When it gets two things it prints them using console.log:

function* numbers() {
  var stuffIgot = [];
  for (var k = 0; k < 2; ++k) {
    var itemReceived = yield k;
    stuffIgot.push(itemReceived);
  }
  console.log(stuffIgot);
}

This generator gives us 3 numbers using yield. Can we give something back?

Let's give two things to this generator:

var iterator = numbers();
// Cant give anything the first time: need to get to a yield first.
console.log(iterator.next()); // logs 0
console.log(iterator.next("present")); // logs 1
fs.readFile("file.txt", function (err, resultFromAnAsyncTask) {
  console.log(iterator.next(resultFromAnAsyncTask)); // logs 2
});

The generator will log the string 'present' and the contents of file.txt

Uh-oh.

Seems that we can keep the generator paused across the event loop boundary.

What if instead of numbers, we yielded some files to be read?

function* files() {
  var results = [];
  for (var k = 0; k < files.length; ++k) results.push(yield files[k]);
  return results;
}

We could process those file reading tasks asynchronously.

var iterator = files();
function process(iterator, sendValue) {
  var fileTask = iterator.next(sendValue);
  fs.readFile(fileTask, function (err, res) {
    if (err) iterator.throw(err);
    else process(iterator, res);
  });
}
process(iterator);

But from the generator's point of view, everything seems to be happening synchronously: it gives us the file using yield, then it waits to be resumed, then it receives the contents of the file and makes a push to the results array.

And there is also generator.throw(). It causes an exception to be thrown from inside the generator. How cool is that?

With next and throw combined together, we can easily run async code. Here is an example from one of the earliest ES6 async generators library task.js.

spawn(function* () {
  var data = yield $.ajax(url);
  $("#result").html(data);
  var status = $("#status").html("Download complete.");
  yield status.fadeIn().promise();
  yield sleep(2000);
  status.fadeOut();
});

This generator yields promises, which causes it to suspend execution. The spawn function that runs the generator takes those promises and waits until they're fulfilled. Then it resumes the generator by sending it the resulting value.

When used in this form, generators look a lot like classical threads. You spawn a thread, it issues blocking I/O calls using yield, then the code resumes execution from the point it left off.

There is one very important difference though. While threads can be suspended involuntarily at any point by the operating systems, generators have to willingly suspend themselves using yield. This means that there is no danger of variables changing under our feet, except after a yield.

Generators go a step further with this: it's impossible to suspend execution without using the yield keyword. In fact, if you want to call another generator you will have to write yield* anotherGenerator(args). This means that suspend points are always visible in the code, just like they are when using callbacks.

Amazing stuff! So what does this mean? What is the reduction of code complexity? What are the performance characteristics of code using generators? Is debugging easy? What about environments that don't have ES6 support?

I decided to do a big comparison of all existing node async code patterns and find the answers to these questions.

The analysis

For the analysis, I took file.upload, a typical CRUD method extracted from DoxBee called when uploading files. It executes multiple queries to the database: a couple of selects, some inserts and one update. Lots of mixed sync / async action.

It looks like this:

function upload(stream, idOrPath, tag, done) {
    var blob = blobManager.create(account);
    var tx = db.begin();
    function backoff(err) {
        tx.rollback();
        return done(new Error(err));
    }
    blob.put(stream, function (err, blobId) {
        if (err) return done(err);
        self.byUuidOrPath(idOrPath).get(function (err, file) {
            if (err) return done(err);
            var previousId = file ? file.version : null;
            var version = {
                userAccountId: userAccount.id,
                date: new Date(),
                blobId: blobId,
                creatorId: userAccount.id,
                previousId: previousId
            };
            version.id = Version.createHash(version);
            Version.insert(version).execWithin(tx, function (err) {
                if (err) return backoff(err);
                if (!file) {
                    var splitPath = idOrPath.split('/');
                    var fileName = splitPath[splitPath.length - 1];
                    var newId = uuid.v1();
                    self.createQuery(idOrPath, {
                        id: newId,
                        userAccountId: userAccount.id,
                        name: fileName,
                        version: version.id
                    }, function (err, q) {
                        if (err) return backoff(err);
                        q.execWithin(tx, function (err) {
                            afterFileExists(err, newId);
                        });

                    })
                }
                else return afterFileExists(null, file.id);
            });
            function afterFileExists(err, fileId) {
                if (err) return backoff(err);
                FileVersion.insert({fileId: fileId,versionId: version.id})
                    .execWithin(tx, function (err) {
                        if (err) return backoff(err);
                        File.whereUpdate({id: fileId}, {
                            version: version.id
                        }).execWithin(tx, function (err) {
                            if (err) return backoff(err);
                            tx.commit(done);
                        });
                })
            }
        });
    });
}

Slightly pyramidal code full of callbacks.

This is how it looks like when written with generators:

var genny = require('genny');
module.exports = genny.fn(function* upload(resume, stream, idOrPath, tag) {
    var blob = blobManager.create(account);
    var tx = db.begin();
    try {
        var blobId = yield blob.put(stream, resume());
        var file = yield self.byUuidOrPath(idOrPath).get(resume());
        var previousId = file ? file.version : null;
        var version = {
            userAccountId: userAccount.id,
            blobId: blobId,
            creatorId: userAccount.id,
            previousId: previousId
        };
        version.id = Version.createHash(version);
        yield Version.insert(version).execWithin(tx, resume());
        if (!file) {
            var splitPath = idOrPath.split('/');
            var fileName = splitPath[splitPath.length - 1];
            var newId = uuid.v1();
            var file = {
                id: newId,
                userAccountId: userAccount.id,
                name: fileName,
                version: version.id
            }
            var q = yield self.createQuery(idOrPath, file, resume());
            yield q.execWithin(tx, resume());
        }
        yield FileVersion.insert({fileId: file.id, versionId: version.id})
            .execWithin(tx, resume());
        yield File.whereUpdate({id: file.id}, {version: version.id})
            .execWithin(tx, resume());
        yield tx.commit(resume());
    } catch (e) {
        tx.rollback();
        throw e;
    }
});

Shorter, very straight-forward code and absolutely no nesting of callback functions. Awesome.

Yet subjective adjectives are not very convincing. I want to have a measure of complexity, a number that tells me what I'm actually saving.

I also want to know what the performance characteristics are - how much time and memory would it take to execute a thousand of parallel invocations of this method? What about 2000 or 3000?

Also, what happens if an exception is thrown? Do I get a complete stack trace like in the original version?

I also wanted to compare the results with other alternatives, such as fibers, streamlinejs and promises (without generators).

So I wrote a lot of different versions of this method, and I will share my personal impressions before giving you the results of the analysis

The examples

original.js

The original solution, presented above. Vanilla callbacks. Slightly pyramidal. I consider it acceptable, if a bit mediocre.

flattened.js

Flattened variant of the original via named functions. Taking the advice from callback hell, I flattened the pyramid a little bit. As I did that, I found that while the pyramid shrunk, the code actually grew.

catcher.js

I noticed that the first two vanilla solutions had a lot of common error handling code everywhere. So I wrote a tiny library called catcher.js which works very much like node's domain.intercept. This reduced the complexity and the number of lines further, but the pyramidal looks remained.

async.js

Uses the waterfall function from caolan's async. Very similar to flattened.js but without the need to handle errors at every step.

flattened-class.js, flattened-noclosure.js, flattened-class-ctx.js

See this post for details

promises.js

I'll be honest. I've never written promise code in node before. Driven by Gozalla's excellent post I concluded that everything should be a promise, and things that can't handle promises should also be rewritten.

Take for example this particular line in the original:

var previousId = file ? file.version : null;

If file is a promise, we can't use the ternary operator or the property getter. Instead we need to write two helpers: a ternary operator helper and a property getter helper:

var previousIdP = p.ternary(fileP, p.get(fileP, "version"), null);

Unfortunately this gets out of hand quickly:

var versionP = p.allObject({
    userAccountId: userAccount.id,
    blobId: blobIdP,
    creatorId: userAccount.id,
    previousId: previousIdP,
    ...
});
versionP = p.set(versionP, p.allObject({
    id: fn.call(Version.createHash, versionP)
}));
// Even if Version.insert has been lifted to take promise arguments, it returns
// a promise and therefore we cannot call execWithinP. We have to wait for the
// promise  to resolve to invoke the function.
var versionInsert = p.eventuallyCall(
    Version.insert(versionP), 'execWithinP', tx);
var versionIdP = p.get(versionP, 'id');

So I decided to write a less aggressive version, promiseish.js

note: I used when because i liked its function lifting API better than Q's

promiseish.js and promiseishQ.js

Nothing fancy here, just some .then() chaining. In fact it feels less complex than the promise.js version, where I felt like I was trying to fight the language all the time.

The second file promiseishQ.js uses Q instead of when. No big difference there.

fibrous.js

Fibrous is a fibers library that creates "sync" methods out of your async ones, which you can then run in a fiber.

So if for example you had:

fs.readFile(file, function(err, data){ ... });

Fibrous would generate a version that returns a future, suspends the running fiber and resumes execution when the value becomes available.

var data = fs.sync.readFile(file);

I also needed to wrap the entire upload function:

fibrous(function upload() { ... })

This felt very similar to the generators version above but with sync instead of yield to indicate the methods that will yield. The one benefit I can think of is that it feels more natural for chaining - less parenthesis are needed.

somefn.sync(arg).split('/')
// vs
(yield somefn(arg, resume)).split('/')

Major drawback: this will never be available outside of node.js or without native modules.

Library: fibrous

suspend.js and genny.js

suspend and genny are generator-based solutions that can work directly with node-style functions.

I'm biased here since I wrote genny. I still think that this is objectively the best way to use generators in node. Just replace the callback with a placeholder generator-resuming function, then yield that. Comes back to you with the value.

Kudos to jmar777 for realizing that you don't need to actually yield anything and can resume the generator using the placeholder callback instead.

Both suspend and genny use generators roughly the same way. The resulting code is very clean, very straightforward and completely devoid of callbacks.

qasync.js

Q provides two methods that allow you to use generators: Q.spawn and Q.async. In both cases the generator yields promises and in turn receives resolved values.

The code didn't feel very different from genny and suspend. Its slightly less complicated: you can yield the promise instead of placing the provided resume function at every point where a callback is needed.

Caveat: as always with promises you will need to wrap all callback-based functions.

Library: Q

co.js and gens.js

Gens and co are generator-based libraries. Both can work by yielding thunk-style functions: that is, functions that take a single argument which is a node style callback in the format function (err, result)

The code looks roughly the same as qasync.js

The problem is, thunks still require wrapping. The recommended way to wrap node style functions is to use co.wrap for co and fn.bind for gens - so thats what I did.

streamline.js

Uses streamlinejs CPS transformer and works very much like co and qasync, except without needing to write yield all the time.

Caveat: you will need to compile the file in order to use it. Also, even though it looks like valid JavaScript, it isn't JavaScript. Superficially, it has the same syntax, but it has very different semantics, particularly when it comes to the _ keyword, which acts like yield and resume combined in one.

The code however is really simple and straightforward: infact it has the lowest complexity.

Complexity

To measure complexity I took the number of tokens in the source code found by Esprima's lexer (comments excluded). The idea is taken from Paul Graham's essay Succinctness is Power

I decided to allow all callback wrapping to happen in a separate file: In a large system, the wrapped layer will probably be a small part of the code.

Results:

name	tokens	complexity
src-streamline._js	302	1.00
co.js	304	1.01
qasync.js	314	1.04
fibrous.js	317	1.05
suspend.js	331	1.10
genny.js	339	1.12
gens.js	341	1.13
catcher.js	392	1.30
promiseishQ.js	396	1.31
promiseish.js	411	1.36
original.js	421	1.39
async.js	442	1.46
promises.js	461	1.53
flattened.js	473	1.57
flattened-noclosure.js	595	1.97
flattened-class-ctx.js	674	2.23
flattened-class.js	718	2.38
rx.js	935	3.10

Streamline and co have the lowest complexity. Fibrous, qasync, suspend, genny and gens are roughly comparable.

Catcher is comparable with the normal promise solutions. Both are roughly comparable to the original version with callbacks, but there is some improvement as the error handling is consolidated to one place.

It seems that flattening the callback pyramid increases the complexity a little bit. However, arguably the readability of the flattened version is improved.

Using caolan's async in this particular case doesn't seem to yield much improvement. Its complexity however is lower than the flattened version because it consolidates error handling.

Going promises-all-the-way as Gozala suggests also increases the complexity because we're fighting the language all the time.

The rx.js sample is still a work in progress - it can be made much better.

Performance (time and memory)

All external methods are mocked using setTimeout to simulate waiting for I/O.

There are two variables that control the test:

$n$ - the number of parallel "upload requests"
$t$ - average wait time per async I/O operation

For the first test, I set the time for every async operation to 1ms then ran every solution for $n \in \lbrace 100, 500, 1000, 1500, 2000 \rbrace$

note: hover over the legend to highlight the item on the chart.

Wow. Promises seem really, really slow. Fibers are also slow, with time complexity $O(n^2)$ . Everything else seems to be much faster.

Update (Dec 20 2013): Promises not slow anymore. PetkaAntonov wrote Bluebird, which is faster than almost everything else and very low on memory usage. For more info read Why I am switching to Promises

Lets try removing all those promises and fibers to see whats down there.

Ah, much better.

The original and flattened solutions are the fastest, as they use vanilla callbacks, with the fastest flattened solution being flattened-class.js.

suspend is the fastest generator based solution. It incurred minimal overhead of about 60% running time. Its also roughly comparable with streamlinejs (when in raw callbacks mode).

caolan's async adds some measurable overhead (its about 2 times slower than the original versions). Its also somewhat slower than the fastest generator based solution.

genny is about 3 times slower. This is because it adds some protection guarantees: it makes sure that callback-calling functions behave and call the callback only once. It also provides a mechanism to enable better stack traces when errors are encountered.

The slowest of the generator bunch is co, but not by much. There is nothing intrinsically slow about it though: the slowness is probably caused by co.wrap which creates a new arguments array on every invocation of the wrapped function.

All generator solutions become about 2 times slower when compiled with Google Traceur, an ES6 to ES5 compiler which we need to run generators code without the --harmony switch or in browsers.

Finally we have rx.js which is about 10 times slower than the original.

However, this test is a bit unrealistic.

Most async operations take much longer than 1 millisecond to complete, especially when the load is as high as thousands of requests per second. As a result, performance is I/O bound - why measure things as if it were CPU-bound?

So lets make the average time needed for an async operation depend on the number of parallel calls to upload().

On my machine redis can be queried about 40 000 times per second; node's "hello world" http server can serve up to 10 000 requests per second; postgresql's pgbench can do 300 mixed or 15 000 select transactions per second.

Given all that, I decided to go with 10 000 requests per second - it looks like a reasonable (rounded) mean.

Each I/O operation will take 10 ms on average when there are 100 running in parallel and 1000 ms when there are 10 000 running in parallel. Makes much more sense.

promises.js and fibrous.js are still significantly slower. However all of the other solutions are quite comparable now. Lets remove the worst two:

Everything is about the same now. Great! So in practice, you won't notice the CPU overhead in I/O bound cases - even if you're using promises. And with some of the generator libraries, the overhead becomes practically invisible.

Excellent. But what about memory usage? Lets chart that too!

Note: the y axis represents peak memory usage (in MB).

Seems like promises also use a lot of memory, especially the extreme implementation promises.js. promiseish.js as well as qasync.js are not too far behind.

fibrous.js, rx.js and stratifiedjs are somewhat better than the above, however their memory usage is still over 5 times bigger than the original.

Lets remove the hogs and see what remains underneath.

Streamline's fibers implementation uses 35MB while the rest use between 10MB and 25MB.

This is amazing. Generators (without promises) also have a low memory overhead, even when compiled with traceur.

Streamline is also quite good in this category. It has very low overhead, both in CPU and memory usage.

Its important to note that the testing method that I use is not statistically sound. Its however good enough to be used to compare orders of magnitude, which is fine considering the narrowly defined benchmark.

With that said, here is a table for 1000 parallel requests with 10 ms response time for I/O operations (i.e. 100K IO / s)

file	time(ms)	memory(MB)
suspend.js	101	8.62
flattened-class.js	111	4.48
flattened-noclosure.js	124	5.04
flattened.js	125	5.18
original.js	130	5.33
async.js	135	7.36
dst-streamline.js	139	8.34
catcher.js	140	6.45
dst-suspend-traceur.js	142	6.76
gens.js	149	8.40
genny.js	161	11.69
co.js	182	11.14
dst-genny-traceur.js	250	8.84
dst-stratifiedjs-014.js	267	23.55
dst-co-traceur.js	284	13.54
rx.js	295	40.43
dst-streamline-fibers.js	526	17.05
promiseish.js	825	117.88
qasync.js	971	98.39
fibrous.js	1159	57.48
promiseishQ.js	1161	96.47
dst-qasync-traceur.js	1195	112.10
promises.js	2315	240.39

Debuggability

Having good performance is important. However, all the performance is worth nothing if our code doesn't do what its supposed to. Debugging is therefore at least as important as performance.

How can we measure debuggability? We can look at source maps support and the generated stack traces.

Source maps support

I split this category into 5 levels:

level 1: no source maps, but needs them (wacky stack trace line numbers)
level 2: no source maps and needs them sometimes (to view the original code)

Streamline used to be in this category but now it does have source maps support.
level 3: has source maps and needs them always.

Nothing is in this category.
level 4: has source maps and needs them sometimes

Generator libraries are in this category. When compiled with traceur (e.g. for the browser) source maps are required and needed. If ES6 is available, source maps are unnecessary.

Streamline is also in this category for another reason. With streamline, you don't need source maps to get accurate stack traces. However, you will need them if you want to read the original code (e.g. when debugging in the browser).
level 5: doesn't need source maps

Everything else is in this category. That's a bit unfair as fibers will never work in a browser.

Stack trace accuracy

This category also has 5 levels:

level 1: stack traces are missing

suspend, co and gens are in this category. When an error happens in one of the async functions, this is how the result looks like:
```
Error: Error happened
  at null._onTimeout (/home/spion/Documents/tests/async-compare/lib/fakes.js:27:27)
  at Timer.listOnTimeout [as ontimeout] (timers.js:105:15)
```
No mention of the original file, examples/suspend.js

Unfortunately, if you throw an error to a generator using iterator.throw(error), the last yield point will not be present in the resulting stack trace. This means you will have no idea which line in your generator is the offending one.

Regular exceptions that are not thrown using iterator.throw have complete stack traces, so only yield points will suffer.

Some solutions that aren't generator based are also in this category, namely promiseish.js and async.js. When a library handles errors for you, the callback stack trace will not be preserved unless special care is taken to preserve it. async and when don't do that.
level 2: stack traces are correct with native modules

Bruno Jouhier's generator based solution galaxy is in this category. It has a native companion module called galaxy-stack that implements long stack traces without a performance penalty.

Note that galaxy-stack doesn't work with node v0.11.5

level 3: stack traces are correct with a flag (adding a performance penalty).

All Q-based solutions are here, even qasync.js, which uses generators. Q's support for stack traces via Q.longStackSupport = true; is good:

Error: Error happened
    at null._onTimeout (/home/spion/Documents/tests/async-compare/lib/fakes.js:27:27)
    at Timer.listOnTimeout [as ontimeout] (timers.js:105:15)
From previous event:
    at /home/spion/Documents/tests/async-compare/examples/qasync.js:41:18
    at GeneratorFunctionPrototype.next (native)

So, does this mean that its possible to add long stack traces support to a callbacks-based generator library the way that Q does it?

Yes it does! Genny is in this category too:

Error: Error happened
    at null._onTimeout (/home/spion/Documents/tests/async-compare/lib/fakes.js:27:27)
    at Timer.listOnTimeout [as ontimeout] (timers.js:105:15)
From generator:
    at upload (/home/spion/Documents/tests/async-compare/examples/genny.js:38:35)

However it incurs about 50-70% memory overhead and is about 6 times slower.

Catcher is also in this category, with 100% memory overhead and about 10 times slower.

level 4: stack traces are correct but fragile

All the raw-callback solutions are in this category: original, flattened, flattened-class, etc. At the moment, rx.js is in this category too.

As long as the callback functions are defined by your code everything will be fine. However, the moment you introduce some wrapper that handles the errors for you, your stack traces will break and will show functions from the wrapper library instead.
level 5: stack traces are always correct

Streamline and fibers are in this category. Streamline compiles the file in a way that preserves line numbers, making stack traces correct in all cases. Fibers also preserve the full call stack.

Ah yes. A table.

name	source maps	stack traces	total
fibrous.js	5	5	10
src-streamline._js	4	5	9
original.js	5	4	9
flattened*.js	5	4	9
rx.js	5	4	9
catcher.js	5	3	8
promiseishQ.js	5	3	8
qasync.js	4	3	7
genny.js	4	3	7
async.js	5	1	6
promiseish.js	5	1	6
promises.js	5	1	6
suspend.js	4	1	5
gens.js	4	1	5
co.js	4	1	5

Generators are not exactly great. They're doing well enough thanks to qasync and genny.

Conclusion

If this analysis left you even more confused than before, you're not alone. It seems hard to make a decision even with all the data available.

My opinion is biased. I love generators, and I've been pushing pretty hard to direct the attention of V8 developers to them (maybe a bit too hard). And its obvious from the analysis above that they have good characteristics: low code complexity, good performance.

More importantly, they will eventually become a part of everyday JavaScript with no compilation (except for older browsers) or native modules required, and the yield keyword is in principle as good indicator of async code as callbacks are.

Unfortunately, the debugging story for generators is somewhat bad, especially because of the missing stack traces for thrown errors. Fortunately, there are solutions and workarounds, like those implemented by genny (obtrusive, reduces performance) and galaxy (unobtrusive, but requires native modules).

But there are things that cannot be measured. How will the community accept generators? Will people find it hard to decide whether to use them or not? Will they be frowned upon when used in code published to npm?

I don't have the answers to these questions. I only have hunches. But they are generally positive. Generators will play an important role in the future of node.

Special thanks to Raynos, maxogden, mikeal and damonoehlman for their input on the draft version of this analysis.

Thanks to jmar777 for making suspend