Friday 3 September 2004 — This is more than 20 years old. Be careful.
If you want to generate a parser, there are lots of choices of software to help you. Most follow the classic yacc model: you write a grammar in a dedicated grammar language, run it through a parser generator, and end up with executable code for your parser. Newer tools run the same way: Bison, ANTLR, and Lemon.
So I was interested to find the Spirit Parser Framework, which employs a completely different technology. Grammars are written in highly stylized C++, and executed directly. The grammar syntax is unconventional, because it’s all based on C++ operator overloading. Here’s part of a SQL parser:
program = +(query) ;
query
= longest_d[ select_from_clause
| select_from_where_clause ] >> SEMI;
select_from_clause
= select_clause >> from_clause ;
select_from_where_clause
= select_from_clause >> where_clause ;
select_clause
= SELECT >> !(DISTINCT) >> ( STAR | var_list_clause ) ;
var_list_clause
= list_p( varname >> !alias_clause, COMMA ) ;
from_clause
= FROM >> list_p( table_name >> !alias_clause, COMMA ) ;
Significant template wizardry is at work behind the scenes, I don’t claim to understand it.
Comments
You'll get a mixture of compile-time and runtime errors when your grammar has problems. Getting a C++ compiler error for a parser grammar error isn't going to be pretty. That's not to say that this is an entirely bad idea, just that it may be, as they say "too clever by half".
Doing a quick scan of the documentation I found this question in the FAQ that speaks to the complexity of this approach:
Are there any techniques to minimize compile times using spirit. For simple parsers compile time doesn't seem to be a big issue, but I recently tried creating a parser with about 78 rules, and it took about 2 hours to compile
78 rules is a moderately complex grammar, not that unusual. Two hours to build this parser? Yikes.
Note: the response to the question mentions some ways of mitigating the problem but it's not clear how much effect these would have on the compile time.
It uses Python's operator overloading to construct the grammar. Of course it doesn't take 2 hours to compile...
I'm up in the air about redefining syntax via a library. The boost lambda library takes the same approach. I'm not convinced it is a good idea.
Add a comment: