Parsing Expression Grammars are a form of context-free grammars with ordered choice. This reduces ambiguity in the grammar and makes writing simple recursive-descent parsers (and parser generators) quite easy. With memoization, you can parse in linear time. The resulting parser is called a Packrat Parser.
The only problem is handling left-recursion. If you have grammar rules like this:
Expression = Expression "+" Term
| Term
your recursive-descent parser will recurse until it runs out of stack.
I just noticed a paper on detecting and handling left-recursion: Packrat Parsers can Handle Left Recursion. It’s a nifty technique, but it somewhat reduces the simplicity of the resulting program.
For a finger exercise a while ago I wrote a pretty basic packrat parser generator and used it to implement a C preprocessor. Source is here. When my supply of circular tuits increases I’ll have to look at implementing the stuff from the paper.
For what it’s worth, the way I proposed handling left-recursion has a much simpler runtime. Instead, more of the work is done upfront with the grammar.
https://lists.csail.mit.edu/pipermail/peg/2007-September/000157.html
https://lists.csail.mit.edu/pipermail/peg/2007-September/000160.html
Hmm, I’ll have to take a look.
I didn’t understand more than one word in four of that. Good thing I’m not a linguist.
Actually, it’s computer science, not linguistics :-) Computers have grammar too!
Ah! No wonder the computer gods don’t like me, when I can’t even understand their language :).