Parsing Expression Grammars and Left Recursion

Parsing Expression Grammars are a form of context-free grammars with ordered choice. This reduces ambiguity in the grammar and makes writing simple recursive-descent parsers (and parser generators) quite easy. With memoization, you can parse in linear time. The resulting parser is called a Packrat Parser.

The only problem is handling left-recursion. If you have grammar rules like this:

Expression = Expression "+" Term
           | Term

your recursive-descent parser will recurse until it runs out of stack.

I just noticed a paper on detecting and handling left-recursion: Packrat Parsers can Handle Left Recursion. It’s a nifty technique, but it somewhat reduces the simplicity of the resulting program.

For a finger exercise a while ago I wrote a pretty basic packrat parser generator and used it to implement a C preprocessor. Source is here. When my supply of circular tuits increases I’ll have to look at implementing the stuff from the paper.