-----------------------------------------------------------------------------
RECOVERY OF SYNTAX DEFINITION FOR LEX
-----------------------------------------------------------------------------

No syntax definition for LEX was available in the grammar-base. In
order to further automate the translation of LEX/YACC grammars to SDF2
syntax definitions, such a syntax definition is needed. In this file I
report the steps I took to get LEX in SDF2.

-- Eelco Visser 2001/09/29

-----------------------------------------------------------------------------

[Step 1] Locate the sources

  ftp://ftp.gnu.org/non-gnu/flex/

[Step 2] Inspect the source

  cd flex-2.5.4
  less parse.y

[Step 3a] Copy source to grammar base

  cp parse.y ~/res/XT/gb/grammars/lex.0

[Step 3b] Parse the YACC (Bison) source

  > parse -l yacc -i lex.y -I -o lex.af

  => Error: charliteral '\n' not recognized. Repair the syntax
     definition of YACC.

[Step 4] Translate to AbstractSDF

  > parse -l yacc -i lex.y -I -o lex.af
    yacc2sdf -i lex.af -o lex.asdf

[Step 5] Pretty-print syntax definition and inspect

  > sdf-bracket -i lex.asdf | pp -a -l sdf -o lex.def -v 2.1
    less lex.def

[Step 6] Regularize the syntax definition

  > parse -l yacc -i lex.y -I -o lex.af
    yacc2sdf -i lex.af -o lex.asdf
    sdf-regularize -i lex.asdf -o lex.reg.asdf
    sdf-bracket -i lex.reg.asdf | pp -a -l sdf -o lex.def -v 2.1
    less lex.def

[Step 7] Generate constructors

  > parse -l yacc -i lex.y -I -o lex.af
    yacc2sdf -i lex.af -o lex.asdf
    sdf-regularize -i lex.asdf -o lex.reg.asdf
    sdf-cons -i lex.reg.asdf -o lex.reg.cons.asdf
    sdf-bracket -i lex.reg.cons.asdf | pp -a -l sdf -o lex.def -v 2.1
    less lex.def

[Step 8] Edit the definition to define lexical syntax and improve
constructors

[Step 9] Unpack the definition to create separate SDF modules. 'make
check' can then be used to generate lex.def and automatically parse
various example files. Before doing this, change the names of the
modules Lexical and Generated into Lex-Symbols and Lex, respectively.

  > unpack-sdf lex.def

[Step 10] Further edit the modules to define lexical syntax.

  => It turns out to be rather hard to parse complete lex files. The
     lexical syntax is very tricky. Instead I decided to reduce the
     problem by editing lex files by hand to remove all irrelevant
     stuff and only leave definitions of the form |name re| and rules
     of the form |re return id;|. Newlines cannot be used as general
     layout, but are used to delimit definitions and rules. No
     superfluous newlines are allowed.

  => Succeeded in parsing stratego.mod.l, a modified version of the
     Stratego lexical syntax in lex.

[Step 11] Improve the syntax definition to get good abstract syntax.
Start with unfolding literals.

  > parse -l sdf -v 2.1 -I -i lex.def -o lex.adef
    unfold-literal -i lex.adef -o lex.unf.adef
    sdf-bracket -i lex.adef | pp -a -l sdf -o lex.def -v 2.1
    less lex.def

  => Automatic application does not work, so do it manually.
     (Come back and repair unfold-literal later.)

[Step 12] Abstract syntax looks good. Install the parse table such that
it can be used with the parse tool of the grammar base.

  > make install
    parse -l lex -i data/stratego.mod.l -I

[Step 13] Project finished.

  => Future work:
     - parse full lex definitions
     - generate a signature from the syntax definition
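
[Note] Sketch of the reduced input format from Step 10

The reduced lex files of Step 10 contain only definitions of the form
|name re| and rules of the form |re return id;|, one per line, with no
blank lines. The Python sketch below illustrates that line discipline.
It is not part of the grammar base and does not use SDF2. Several
details are assumptions made for illustration: that a line ending in
"return <id>;" is a rule, that any other line is a definition split at
its first whitespace, and that names and ids are C-style identifiers.
The sample names digit and INT are hypothetical and are not taken from
stratego.mod.l.

```python
import re
from typing import List, NamedTuple, Union


class Definition(NamedTuple):
    """A line of the form |name re|."""
    name: str
    regex: str


class Rule(NamedTuple):
    """A line of the form |re return id;|."""
    regex: str
    token: str


# Assumption: identifiers are C-style names.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Assumption: a rule is a regular expression followed by "return <id>;".
RULE = re.compile(r"^(?P<regex>.*\S)\s+return\s+(?P<token>" + IDENT + r")\s*;$")

# Assumption: a definition is a name, whitespace, then the expression.
DEFINITION = re.compile(r"^(?P<name>" + IDENT + r")\s+(?P<regex>\S.*)$")


def parse_reduced_lex(text: str) -> List[Union[Definition, Rule]]:
    """Split a reduced lex file into definitions and rules, line by line."""
    entries: List[Union[Definition, Rule]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Newlines delimit entries, so a blank line is an error here.
        if not line.strip():
            raise ValueError(f"line {number}: superfluous newline")
        rule = RULE.match(line)
        if rule:
            entries.append(Rule(rule["regex"], rule["token"]))
            continue
        definition = DEFINITION.match(line)
        if definition is None:
            raise ValueError(f"line {number}: neither a definition nor a rule")
        entries.append(Definition(definition["name"], definition["regex"]))
    return entries
```

Calling parse_reduced_lex on the text of a reduced file returns the
entries in file order. A blank line raises ValueError, which mirrors
the restriction that no superfluous newlines are allowed.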