One of the innovations that makes BABLR work efficiently is its use of stream iterators. The BABLR parser operates on streams at every level: it uses streams of characters for input, streams of tags for output, and streams of instructions to control the parser. In fact we rely so heavily on streams that for reasons of abstraction and performance we've had to develop our own way of representing them.
Iterators add a layer of indirection to data access, and in Javascript they have a performance cost. What they offer in return is an abstraction over how data is stored. You can write a method `parse(chrs)` that works without knowing whether `chrs` is a `String`, an `Array`, or a `Buffer`.
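A minimal sketch of that kind of storage-agnostic method (the token shape here is illustrative, not BABLR's real tag format):

```js
// One sync parser, many storage types. The "parsing" is a stand-in;
// it just wraps each character it sees in a token object.
function* parse(chrs) {
  for (const chr of chrs) {
    yield { type: 'Character', value: chr };
  }
}

// The same function accepts any sync-iterable store of characters:
const fromString = [...parse('ab')];
const fromArray = [...parse(['a', 'b'])];
```

Both calls produce identical tokens because the generator only ever touches the iteration protocol, never the underlying storage.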
In practice sync iteration only abstracts away how data is laid out in memory. To be able to abstract over the differences between something like a character stream held in a string and a character stream being loaded over a network socket you need a different kind of iterator: one which can be suspended and resumed at a different moment in time. The Javascript standards present this as an "async iterator".
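To make the suspend-and-resume behavior concrete, here is a small sketch in which `setTimeout` stands in for network latency (the names `charsOverTime` and `collect` are hypothetical, chosen for illustration):

```js
// An async character source: each character arrives at a later
// moment in time, as it would from a network socket.
async function* charsOverTime(text) {
  for (const chr of text) {
    await new Promise((resolve) => setTimeout(resolve, 1));
    yield chr;
  }
}

// Consuming it requires `for await .. of` inside an async function:
async function collect(asyncChrs) {
  const out = [];
  for await (const chr of asyncChrs) out.push(chr);
  return out;
}
```

The consumer is suspended between characters, which is exactly what a sync iterator cannot express.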
Unfortunately, async iterators as a construct were designed incorrectly. I've been quite careful to lay out the purpose of iterators as a time-sequence abstraction over storage, and I hope it seems natural to think of both a network stream of data and a string, in the abstract, as sequences of characters. The problem, then, is that async iterators do not provide an abstraction over this!
Far from giving you an abstraction, Javascript instead makes sync and async iteration two completely separate concrete things. Not just the official advice but the community best practice is to literally write all your code twice: once using synchronous iterators and synchronous `for .. of` loops, and once again using `async function`s and `for await .. of` loops.
Let's see what all this looks like in code:
```js
// The sync API must exist for callers in sync functions
function* syncParse(chrs) {
  for (let chr of chrs) {
    /* ...do parsing stuff */
    yield tag;
  }
}

// The async API must exist for when data is a network stream
async function* asyncParse(chrs) {
  for await (let chr of chrs) {
    /* ...do parsing stuff */
    yield tag;
  }
}
```
Because BABLR is the most ambitious application ever conceived and executed on top of Javascript data streams, writing all our code twice was not a realistic option for us. To be able to move forward we needed to create a successful abstraction. We call the result "stream iterators". The easiest way of describing stream iterators is using the syntax we would propose:
```js
async? function* streamParse(chrs) {
  for await? (let chr of chrs) {
    /* ...do parsing stuff */
    yield tag;
  }
}
```
While we can't really use this `async?` syntax, since no such syntax exists in the language today, we can implement the same pattern in userland.
Here's how we can write `streamParse` right now:
```js
import { streamify, wait } from '@bablr/stream-iterator';

let streamParse = streamify(function* (chrs) {
  let iter = chrs[Symbol.streamIterator]();
  let step;
  while (true) {
    // iter.next returns either a step object or a promise
    step = iter.next();
    if (step instanceof Promise) {
      // resolve a promise to a step object
      // this line causes this generator to emit a promise as a step
      step = yield wait(step);
    }
    if (step.done) break;
    // either yields a step synchronously or fulfills a step promise
    yield tag;
  }
});
```
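To give a feel for what `streamify` and `wait` are doing, here is a heavily simplified userland sketch of the pattern. This is not the actual `@bablr/stream-iterator` implementation, just an illustration of the core trick: a driver that steps a plain generator, and escalates to a promise only when the generator yields `wait(promise)`:

```js
// Hypothetical, simplified model of the streamify/wait pattern.
const Wait = Symbol('wait');
const wait = (promise) => ({ [Wait]: promise });

function streamify(genFn) {
  return (...args) => {
    const gen = genFn(...args);
    return {
      // next() returns a step object while the generator stays sync,
      // or a promise of a step object once it yields wait(promise)
      next(input) {
        const step = gen.next(input);
        if (!step.done && step.value && step.value[Wait]) {
          // Resume the generator with the resolved value later,
          // turning this step into a promise of a step
          return step.value[Wait].then((resolved) => this.next(resolved));
        }
        return step;
      },
    };
  };
}

// Usage: one generator producing a mix of sync and async steps
const iter = streamify(function* () {
  yield 'sync step';
  const resolved = yield wait(Promise.resolve('async step'));
  yield resolved;
})();
```

A sync caller pays no promise overhead until the underlying data actually forces a wait, which is the cost/benefit match the real library is after.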
While this code is certainly messier-looking, it succeeds handsomely in its purpose: we now have a practical method to define a parsing algorithm once while being truly agnostic to whether the data we're parsing lives in a string, on the filesystem, or on the internet. We're now properly able to match a cost to a benefit!