source: trunk/essentials/sys-devel/flex/doc/flex.info-3

Last change on this file was 3031, checked in by bird, 18 years ago

flex 2.5.33.

1This is flex.info, produced by makeinfo version 4.5 from flex.texi.
2
3INFO-DIR-SECTION Programming
4START-INFO-DIR-ENTRY
5* flex: (flex). Fast lexical analyzer generator (lex replacement).
6END-INFO-DIR-ENTRY
7
8
9 The flex manual is placed under the same licensing conditions as the
10rest of flex:
11
12 Copyright (C) 1990, 1997 The Regents of the University of California.
13All rights reserved.
14
15 This code is derived from software contributed to Berkeley by Vern
16Paxson.
17
18 The United States Government has rights in this work pursuant to
19contract no. DE-AC03-76SF00098 between the United States Department of
20Energy and the University of California.
21
22 Redistribution and use in source and binary forms, with or without
23modification, are permitted provided that the following conditions are
24met:
25
26 1. Redistributions of source code must retain the above copyright
27 notice, this list of conditions and the following disclaimer.
28
29 2. Redistributions in binary form must reproduce the above copyright
30 notice, this list of conditions and the following disclaimer in the
31 documentation and/or other materials provided with the
32 distribution.
33 Neither the name of the University nor the names of its contributors
34may be used to endorse or promote products derived from this software
35without specific prior written permission.
36
37 THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
38WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
39MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
40
41File: flex.info, Node: Debugging Options, Next: Miscellaneous Options, Prev: Options for Scanner Speed and Size, Up: Scanner Options
42
43Debugging Options
44=================
45
46`-b, --backup, `%option backup''
47 Generate backing-up information to `lex.backup'. This is a list of
48 scanner states which require backing up and the input characters on
49 which they do so. By adding rules one can remove backing-up
50 states. If _all_ backing-up states are eliminated and `-Cf' or
51 `-CF' is used, the generated scanner will run faster (see the
52 `--perf-report' flag). Only users who wish to squeeze every last
53 cycle out of their scanners need worry about this option. (*note
54 Performance::).
55
`-d, --debug, `%option debug''
     makes the generated scanner run in "debug" mode.  Whenever a
     pattern is recognized and the global variable `yy_flex_debug' is
     non-zero (which is the default), the scanner will write to
     `stderr' a line of the form:

          --accepting rule at line 53 ("the matched text")

     The line number refers to the location of the rule in the file
     defining the scanner (i.e., the file that was fed to flex).
     Messages are also generated when the scanner backs up, accepts the
     default rule, reaches the end of its input buffer (or encounters a
     NUL; at this point, the two look the same as far as the scanner's
     concerned), or reaches an end-of-file.
71
72`-p, --perf-report, `%option perf-report''
73 generates a performance report to `stderr'. The report consists of
74 comments regarding features of the `flex' input file which will
75 cause a serious loss of performance in the resulting scanner. If
76 you give the flag twice, you will also get comments regarding
77 features that lead to minor performance losses.
78
79 Note that the use of `REJECT', and variable trailing context
80 (*note Limitations::) entails a substantial performance penalty;
81 use of `yymore()', the `^' operator, and the `--interactive' flag
82 entail minor performance penalties.
83
`-s, --nodefault, `%option nodefault''
     causes the _default rule_ (that unmatched scanner input is echoed
     to `stdout') to be suppressed.  If the scanner encounters input
     that does not match any of its rules, it aborts with an error.
     This option is useful for finding holes in a scanner's rule set.
89
90`-T, --trace, `%option trace''
91 makes `flex' run in "trace" mode. It will generate a lot of
92 messages to `stderr' concerning the form of the input and the
93 resultant non-deterministic and deterministic finite automata.
94 This option is mostly for use in maintaining `flex'.
95
96`-w, --nowarn, `%option nowarn''
97 suppresses warning messages.
98
99`-v, --verbose, `%option verbose''
100 specifies that `flex' should write to `stderr' a summary of
101 statistics regarding the scanner it generates. Most of the
102 statistics are meaningless to the casual `flex' user, but the
103 first line identifies the version of `flex' (same as reported by
104 `--version'), and the next line the flags used when generating the
105 scanner, including those that are on by default.
106
`--warn, `%option warn''
     warns about certain things.  In particular, if the default rule can
     be matched but no default rule has been given, `flex' will warn
     you.  We recommend always using this option.
111
112
113
114File: flex.info, Node: Miscellaneous Options, Prev: Debugging Options, Up: Scanner Options
115
116Miscellaneous Options
117=====================
118
119`-c'
120 is a do-nothing option included for POSIX compliance.
121
124`-h, -?, --help'
125 generates a "help" summary of `flex''s options to `stdout' and
126 then exits.
127
128`-n'
129 is another do-nothing option included only for POSIX compliance.
130
131`-V, --version'
132 prints the version number to `stdout' and exits.
133
134
135
136File: flex.info, Node: Performance, Next: Cxx, Prev: Scanner Options, Up: Top
137
138Performance Considerations
139**************************
140
   The main design goal of `flex' is that it generate high-performance
scanners.  It has been optimized for dealing well with large sets of
rules.  Aside from the effects on scanner speed of the table compression
`-C' options outlined above, there are a number of options/actions
which degrade performance.  These are, from most expensive to least:

         REJECT
         arbitrary trailing context

         pattern sets that require backing up
         %option yylineno
         %array

         %option interactive
         %option always-interactive

         `^' beginning-of-line operator
         yymore()

   with the first two being quite expensive and the last two being
quite cheap.  Note also that `unput()' is implemented as a routine call
that potentially does quite a bit of work, while `yyless()' is a
quite-cheap macro.  So if you are just putting back some excess text you
scanned, use `yyless()'.
166
167 `REJECT' should be avoided at all costs when performance is
168important. It is a particularly expensive option.
169
170 There is one case when `%option yylineno' can be expensive. That is
171when your patterns match long tokens that could _possibly_ contain a
172newline character. There is no performance penalty for rules that can
173not possibly match newlines, since flex does not need to check them for
174newlines. In general, you should avoid rules such as `[^f]+', which
175match very long tokens, including newlines, and may possibly match your
176entire file! A better approach is to separate `[^f]+' into two rules:
177
178
179 %option yylineno
180 %%
181 [^f\n]+
182 \n+
183
184 The above scanner does not incur a performance penalty.
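
   The cost described above can be pictured with a small C sketch.  It
is only an illustration under stated assumptions: `update_lineno' and
its `rule_can_match_newline' flag are hypothetical names, not code
taken from a generated scanner.  The point is that a rule which can
match a newline forces a pass over the whole matched text, while any
other rule needs no pass at all.

```c
#include <stddef.h>

/* Hypothetical sketch of per-token line counting; not flex's own code.
 * Returns how many characters had to be examined for this token. */
static size_t update_lineno(const char *text, size_t len,
                            int rule_can_match_newline, int *lineno)
{
    size_t i;

    if (!rule_can_match_newline)
        return 0;               /* nothing to scan for this rule */
    for (i = 0; i < len; i++)
        if (text[i] == '\n')
            ++*lineno;          /* one more line seen inside the token */
    return len;                 /* the whole token was examined */
}
```

   With the two-rule version, the long `[^f\n]+' tokens take the cheap
path and only the short `\n+' tokens are examined.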
185
   Getting rid of backing up is messy and can often be an enormous
amount of work for a complicated scanner.  In principle, one begins by
using the `-b' flag to generate a `lex.backup' file.  For example, on
the input:
190
191
192 %%
193 foo return TOK_KEYWORD;
194 foobar return TOK_KEYWORD;
195
196 the file looks like:
197
198
199 State #6 is non-accepting -
200 associated rule line numbers:
201 2 3
202 out-transitions: [ o ]
203 jam-transitions: EOF [ \001-n p-\177 ]
204
205 State #8 is non-accepting -
206 associated rule line numbers:
207 3
208 out-transitions: [ a ]
209 jam-transitions: EOF [ \001-` b-\177 ]
210
211 State #9 is non-accepting -
212 associated rule line numbers:
213 3
214 out-transitions: [ r ]
215 jam-transitions: EOF [ \001-q s-\177 ]
216
217 Compressed tables always back up.
218
219 The first few lines tell us that there's a scanner state in which it
220can make a transition on an 'o' but not on any other character, and
221that in that state the currently scanned text does not match any rule.
222The state occurs when trying to match the rules found at lines 2 and 3
223in the input file. If the scanner is in that state and then reads
224something other than an 'o', it will have to back up to find a rule
225which is matched. With a bit of headscratching one can see that this
226must be the state it's in when it has seen `fo'. When this has
227happened, if anything other than another `o' is seen, the scanner will
228have to back up to simply match the `f' (by the default rule).
229
230 The comment regarding State #8 indicates there's a problem when
231`foob' has been scanned. Indeed, on any character other than an `a',
232the scanner will have to back up to accept "foo". Similarly, the
233comment for State #9 concerns when `fooba' has been scanned and an `r'
234does not follow.
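
   To make the three states concrete, here is a hand-written C sketch
of longest-match scanning for the two rules `foo' and `foobar' plus the
one-character default rule.  It is not the code `flex' generates;
`longest_match' is a hypothetical helper written for this
illustration.  It reports how many characters were read past the last
accepting position, which is the amount the scanner has to back up.

```c
#include <stddef.h>

/* Hypothetical helper, not flex output: longest match of "foo" or
 * "foobar" at s, falling back to the 1-character default rule.
 * *backup receives the number of characters that were matched
 * beyond the text finally accepted. */
static size_t longest_match(const char *s, size_t *backup)
{
    static const char longest[] = "foobar";
    size_t i = 0;          /* characters consumed so far */
    size_t accepted = 0;   /* length at the last accepting state */

    while (i < 6 && s[i] != '\0' && s[i] == longest[i]) {
        i++;
        if (i == 3 || i == 6)   /* "foo" and "foobar" are accepting */
            accepted = i;
    }
    if (accepted == 0) {
        /* No rule matched: the default rule takes one character. */
        *backup = (i > 1) ? i - 1 : 0;
        return (s[0] != '\0') ? 1 : 0;
    }
    *backup = i - accepted;
    return accepted;
}
```

   On the input `fox' the sketch accepts one character and backs up
over one (the State #6 case); on `fooba!' it accepts `foo' and backs up
over two (the State #9 case).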
235
236 The final comment reminds us that there's no point going to all the
237trouble of removing backing up from the rules unless we're using `-Cf'
238or `-CF', since there's no performance gain doing so with compressed
239scanners.
240
241 The way to remove the backing up is to add "error" rules:
242
243
244 %%
245 foo return TOK_KEYWORD;
246 foobar return TOK_KEYWORD;
247
248 fooba |
249 foob |
250 fo {
251 /* false alarm, not really a keyword */
252 return TOK_ID;
253 }
254
255 Eliminating backing up among a list of keywords can also be done
256using a "catch-all" rule:
257
258
259 %%
260 foo return TOK_KEYWORD;
261 foobar return TOK_KEYWORD;
262
263 [a-z]+ return TOK_ID;
264
265 This is usually the best solution when appropriate.
266
267 Backing up messages tend to cascade. With a complicated set of rules
268it's not uncommon to get hundreds of messages. If one can decipher
269them, though, it often only takes a dozen or so rules to eliminate the
270backing up (though it's easy to make a mistake and have an error rule
271accidentally match a valid token. A possible future `flex' feature
272will be to automatically add rules to eliminate backing up).
273
274 It's important to keep in mind that you gain the benefits of
275eliminating backing up only if you eliminate _every_ instance of
276backing up. Leaving just one means you gain nothing.
277
278 _Variable_ trailing context (where both the leading and trailing
279parts do not have a fixed length) entails almost the same performance
280loss as `REJECT' (i.e., substantial). So when possible a rule like:
281
282
283 %%
284 mouse|rat/(cat|dog) run();
285
286 is better written:
287
288
289 %%
290 mouse/cat|dog run();
291 rat/cat|dog run();
292
293 or as
294
295
296 %%
297 mouse|rat/cat run();
298 mouse|rat/dog run();
299
300 Note that here the special '|' action does _not_ provide any
301savings, and can even make things worse (*note Limitations::).
302
303 Another area where the user can increase a scanner's performance (and
304one that's easier to implement) arises from the fact that the longer the
305tokens matched, the faster the scanner will run. This is because with
306long tokens the processing of most input characters takes place in the
307(short) inner scanning loop, and does not often have to go through the
308additional work of setting up the scanning environment (e.g., `yytext')
309for the action. Recall the scanner for C comments:
310
311
312 %x comment
313 %%
314 int line_num = 1;
315
316 "/*" BEGIN(comment);
317
318 <comment>[^*\n]*
319 <comment>"*"+[^*/\n]*
320 <comment>\n ++line_num;
321 <comment>"*"+"/" BEGIN(INITIAL);
322
323 This could be sped up by writing it as:
324
325
326 %x comment
327 %%
328 int line_num = 1;
329
330 "/*" BEGIN(comment);
331
332 <comment>[^*\n]*
333 <comment>[^*\n]*\n ++line_num;
334 <comment>"*"+[^*/\n]*
335 <comment>"*"+[^*/\n]*\n ++line_num;
336 <comment>"*"+"/" BEGIN(INITIAL);
337
338 Now instead of each newline requiring the processing of another
339action, recognizing the newlines is distributed over the other rules to
340keep the matched text as long as possible. Note that _adding_ rules
341does _not_ slow down the scanner! The speed of the scanner is
342independent of the number of rules or (modulo the considerations given
343at the beginning of this section) how complicated the rules are with
344regard to operators such as `*' and `|'.
345
346 A final example in speeding up a scanner: suppose you want to scan
347through a file containing identifiers and keywords, one per line and
348with no other extraneous characters, and recognize all the keywords. A
349natural first approach is:
350
351
352 %%
353 asm |
354 auto |
355 break |
356 ... etc ...
357 volatile |
358 while /* it's a keyword */
359
360 .|\n /* it's not a keyword */
361
362 To eliminate the back-tracking, introduce a catch-all rule:
363
364
365 %%
366 asm |
367 auto |
368 break |
369 ... etc ...
370 volatile |
371 while /* it's a keyword */
372
373 [a-z]+ |
374 .|\n /* it's not a keyword */
375
376 Now, if it's guaranteed that there's exactly one word per line, then
377we can reduce the total number of matches by a half by merging in the
378recognition of newlines with that of the other tokens:
379
380
381 %%
382 asm\n |
383 auto\n |
384 break\n |
385 ... etc ...
386 volatile\n |
387 while\n /* it's a keyword */
388
389 [a-z]+\n |
390 .|\n /* it's not a keyword */
391
   One has to be careful here, as we have now reintroduced backing up
into the scanner.  In particular, while _we_ know that there will never
be any characters in the input stream other than letters or newlines,
`flex' can't figure this out, and it will plan for possibly needing to
back up when it has scanned a token like `auto' and then the next
character is something other than a newline or a letter.  Previously it
would then just match the `auto' rule and be done, but now it has no
`auto' rule, only an `auto\n' rule.  To eliminate the possibility of
backing up, we could either duplicate all rules but without final
newlines, or, since we never expect to encounter such an input and
therefore don't care how it's classified, we can introduce one more
catch-all rule, this one which doesn't include a newline:
404
405
406 %%
407 asm\n |
408 auto\n |
409 break\n |
410 ... etc ...
411 volatile\n |
412 while\n /* it's a keyword */
413
414 [a-z]+\n |
415 [a-z]+ |
416 .|\n /* it's not a keyword */
417
418 Compiled with `-Cf', this is about as fast as one can get a `flex'
419scanner to go for this particular problem.
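
   The claim that merging the newline into each rule halves the number
of matches can be checked with a small C sketch.  Both functions are
hypothetical stand-ins that only count how many rule matches a scanner
would perform; they assume, as the text does, that the input contains
nothing but lowercase words and newlines.

```c
#include <stddef.h>

/* Counts matches when words and newlines are separate tokens. */
static size_t matches_separate(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        if (*s == '\n') {
            s++;                          /* the lone newline rule */
        } else {
            while (*s != '\0' && *s != '\n')
                s++;                      /* a keyword or [a-z]+ */
        }
        n++;
    }
    return n;
}

/* Counts matches when each rule also consumes the trailing newline. */
static size_t matches_merged(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        while (*s != '\0' && *s != '\n')
            s++;                          /* the word itself */
        if (*s == '\n')
            s++;                          /* its newline, same match */
        n++;
    }
    return n;
}
```

   For the three-line input `auto\nfoo\nwhile\n' the first function
counts six matches and the second counts three.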
420
421 A final note: `flex' is slow when matching `NUL's, particularly when
422a token contains multiple `NUL's. It's best to write rules which match
423_short_ amounts of text if it's anticipated that the text will often
424include `NUL's.
425
426 Another final note regarding performance: as mentioned in *Note
427Matching::, dynamically resizing `yytext' to accommodate huge tokens is
428a slow process because it presently requires that the (huge) token be
429rescanned from the beginning. Thus if performance is vital, you should
430attempt to match "large" quantities of text but not "huge" quantities,
431where the cutoff between the two is at about 8K characters per token.
432
433
434File: flex.info, Node: Cxx, Next: Reentrant, Prev: Performance, Up: Top
435
436Generating C++ Scanners
437***********************
438
439 *IMPORTANT*: the present form of the scanning class is _experimental_
440and may change considerably between major releases.
441
442 `flex' provides two different ways to generate scanners for use with
443C++. The first way is to simply compile a scanner generated by `flex'
444using a C++ compiler instead of a C compiler. You should not encounter
445any compilation errors (*note Reporting Bugs::). You can then use C++
446code in your rule actions instead of C code. Note that the default
447input source for your scanner remains `yyin', and default echoing is
448still done to `yyout'. Both of these remain `FILE *' variables and not
449C++ _streams_.
450
   You can also use `flex' to generate a C++ scanner class, using the
`-+' option (or, equivalently, `%option c++'), which is automatically
specified if the name of the `flex' executable ends in a '+', such as
`flex++'.  When using this option, `flex' defaults to generating the
scanner to the file `lex.yy.cc' instead of `lex.yy.c'.  The generated
scanner includes the header file `FlexLexer.h', which defines the
interface to two C++ classes.
458
459 The first class, `FlexLexer', provides an abstract base class
460defining the general scanner class interface. It provides the
461following member functions:
462
463`const char* YYText()'
464 returns the text of the most recently matched token, the
465 equivalent of `yytext'.
466
467`int YYLeng()'
468 returns the length of the most recently matched token, the
469 equivalent of `yyleng'.
470
`int lineno() const'
     returns the current input line number (see `%option yylineno'), or
     `1' if `%option yylineno' was not used.

`void set_debug( int flag )'
     sets the debugging flag for the scanner, equivalent to assigning to
     `yy_flex_debug' (*note Scanner Options::).  Note that you must
     build the scanner using `%option debug' to include debugging
     information in it.
480
481`int debug() const'
482 returns the current setting of the debugging flag.
483
   Also provided are member functions equivalent to
`yy_switch_to_buffer()', `yy_create_buffer()' (though the first
argument is an `istream*' object pointer and not a `FILE*'),
`yy_flush_buffer()', `yy_delete_buffer()', and `yyrestart()' (again,
the first argument is an `istream*' object pointer).
489
490 The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
491derived from `FlexLexer'. It defines the following additional member
492functions:
493
494`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
495 constructs a `yyFlexLexer' object using the given streams for input
496 and output. If not specified, the streams default to `cin' and
497 `cout', respectively.
498
`virtual int yylex()'
     performs the same role as `yylex()' does for ordinary `flex'
     scanners: it scans the input stream, consuming tokens, until a
     rule's action returns a value.  If you derive a subclass `S' from
     `yyFlexLexer' and want to access the member functions and variables
     of `S' inside `yylex()', then you need to use `%option
     yyclass="S"' to inform `flex' that you will be using that subclass
     instead of `yyFlexLexer'.  In this case, rather than generating
     `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
     generates a dummy `yyFlexLexer::yylex()' that calls
     `yyFlexLexer::LexerError()' if called).
510
511`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
512 reassigns `yyin' to `new_in' (if non-null) and `yyout' to
513 `new_out' (if non-null), deleting the previous input buffer if
514 `yyin' is reassigned.
515
516`int yylex( istream* new_in, ostream* new_out = 0 )'
517 first switches the input streams via `switch_streams( new_in,
518 new_out )' and then returns the value of `yylex()'.
519
520 In addition, `yyFlexLexer' defines the following protected virtual
521functions which you can redefine in derived classes to tailor the
522scanner:
523
524`virtual int LexerInput( char* buf, int max_size )'
525 reads up to `max_size' characters into `buf' and returns the
526 number of characters read. To indicate end-of-input, return 0
527 characters. Note that `interactive' scanners (see the `-B' and
528 `-I' flags in *Note Scanner Options::) define the macro
529 `YY_INTERACTIVE'. If you redefine `LexerInput()' and need to take
530 different actions depending on whether or not the scanner might be
531 scanning an interactive input source, you can test for the
532 presence of this name via `#ifdef' statements.
533
534`virtual void LexerOutput( const char* buf, int size )'
535 writes out `size' characters from the buffer `buf', which, while
536 `NUL'-terminated, may also contain internal `NUL's if the
537 scanner's rules can match text with `NUL's in them.
538
539`virtual void LexerError( const char* msg )'
540 reports a fatal error message. The default version of this
541 function writes the message to the stream `cerr' and exits.
542
543 Note that a `yyFlexLexer' object contains its _entire_ scanning
544state. Thus you can use such objects to create reentrant scanners, but
545see also *Note Reentrant::. You can instantiate multiple instances of
546the same `yyFlexLexer' class, and you can also combine multiple C++
547scanner classes together in the same program using the `-P' option
548discussed above.
549
550 Finally, note that the `%array' feature is not available to C++
551scanner classes; you must use `%pointer' (the default).
552
553 Here is an example of a simple C++ scanner:
554
555
         // An example of using the flex C++ scanner class.

     %{
     int mylineno = 0;
     %}

     string  \"[^\n"]+\"

     ws      [ \t]+

     alpha   [A-Za-z]
     dig     [0-9]
     name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
     num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
     num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
     number  {num1}|{num2}

     %%

     {ws}    /* skip blanks and tabs */

     "/*"    {
             int c;

             while((c = yyinput()) != 0)
                 {
                 if(c == '\n')
                     ++mylineno;

                 else if(c == '*')
                     {
                     if((c = yyinput()) == '/')
                         break;
                     else
                         unput(c);
                     }
                 }
             }

     {number}  std::cout << "number " << YYText() << '\n';

     \n        mylineno++;

     {name}    std::cout << "name " << YYText() << '\n';

     {string}  std::cout << "string " << YYText() << '\n';

     %%

     int main( int /* argc */, char** /* argv */ )
         {
         FlexLexer* lexer = new yyFlexLexer;
         while(lexer->yylex() != 0)
             ;
         return 0;
         }
612
613 If you want to create multiple (different) lexer classes, you use the
614`-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
615some other `xxFlexLexer'. You then can include `<FlexLexer.h>' in your
616other sources once per lexer class, first renaming `yyFlexLexer' as
617follows:
618
619
620 #undef yyFlexLexer
621 #define yyFlexLexer xxFlexLexer
622 #include <FlexLexer.h>
623
624 #undef yyFlexLexer
625 #define yyFlexLexer zzFlexLexer
626 #include <FlexLexer.h>
627
628 if, for example, you used `%option prefix="xx"' for one of your
629scanners and `%option prefix="zz"' for the other.
630
631
632File: flex.info, Node: Reentrant, Next: Lex and Posix, Prev: Cxx, Up: Top
633
634Reentrant C Scanners
635********************
636
   `flex' has the ability to generate a reentrant C scanner.  This is
accomplished by specifying `%option reentrant' (`-R').  The generated
scanner is both portable and safe to use in one or more separate
threads of control.  The most common use for reentrant scanners is from
within multi-threaded applications.  Any thread may create and execute
a reentrant `flex' scanner without the need for synchronization with
other threads.
644
645* Menu:
646
647* Reentrant Uses::
648* Reentrant Overview::
649* Reentrant Example::
650* Reentrant Detail::
651* Reentrant Functions::
652
653
654File: flex.info, Node: Reentrant Uses, Next: Reentrant Overview, Prev: Reentrant, Up: Reentrant
655
656Uses for Reentrant Scanners
657===========================
658
659 However, there are other uses for a reentrant scanner. For example,
660you could scan two or more files simultaneously to implement a `diff' at
661the token level (i.e., instead of at the character level):
662
663
     /* Example of maintaining more than one active scanner. */

     int tok1, tok2;   /* declared outside the loop so the
                          `while' condition can see them */

     do {
         tok1 = yylex( scanner_1 );
         tok2 = yylex( scanner_2 );

         if( tok1 != tok2 )
             printf("Files are different.\n");

     } while ( tok1 && tok2 );
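
   The same idea can be tried without a generated scanner.  The sketch
below is a self-contained C illustration, not `flex' output:
`struct mini_scanner', `mini_lex' and `same_tokens' are hypothetical
names, and the "tokens" are simply whitespace-delimited words.  What it
shares with a reentrant scanner is that all scanning state lives in a
structure passed as an argument, so two instances can advance side by
side.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a reentrant scanner's state: everything the
 * tokenizer needs lives here, so several instances can coexist. */
struct mini_scanner {
    const char *pos;   /* next unread character */
    const char *tok;   /* start of the most recent token */
    size_t len;        /* length of the most recent token */
};

/* Returns 1 and records the next whitespace-delimited token, or 0 at
 * the end of the input. */
static int mini_lex(struct mini_scanner *sc)
{
    while (isspace((unsigned char)*sc->pos))
        sc->pos++;
    if (*sc->pos == '\0')
        return 0;
    sc->tok = sc->pos;
    while (*sc->pos != '\0' && !isspace((unsigned char)*sc->pos))
        sc->pos++;
    sc->len = (size_t)(sc->pos - sc->tok);
    return 1;
}

/* Token-level comparison: returns 1 if both inputs yield the same
 * sequence of tokens, regardless of spacing and line breaks. */
static int same_tokens(const char *a, const char *b)
{
    struct mini_scanner s1 = { a, NULL, 0 };
    struct mini_scanner s2 = { b, NULL, 0 };

    for (;;) {
        int more1 = mini_lex(&s1);
        int more2 = mini_lex(&s2);

        if (!more1 || !more2)
            return more1 == more2;   /* equal only if both ended */
        if (s1.len != s2.len || memcmp(s1.tok, s2.tok, s1.len) != 0)
            return 0;
    }
}
```

   Two inputs that differ only in whitespace compare equal here, which
is the difference between a token-level and a character-level `diff'.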
676
677 Another use for a reentrant scanner is recursion. (Note that a
678recursive scanner can also be created using a non-reentrant scanner and
679buffer states. *Note Multiple Input Buffers::.)
680
681 The following crude scanner supports the `eval' command by invoking
682another instance of itself.
683
684
685 /* Example of recursive invocation. */
686
687 %option reentrant
688
689 %%
690 "eval(".+")" {
691 yyscan_t scanner;
692 YY_BUFFER_STATE buf;
693
694 yylex_init( &scanner );
695 yytext[yyleng-1] = ' ';
696
697 buf = yy_scan_string( yytext + 5, scanner );
698 yylex( scanner );
699
700 yy_delete_buffer(buf,scanner);
701 yylex_destroy( scanner );
702 }
703 ...
704 %%
705
706
707File: flex.info, Node: Reentrant Overview, Next: Reentrant Example, Prev: Reentrant Uses, Up: Reentrant
708
709An Overview of the Reentrant API
710================================
711
   The API for reentrant scanners is different from that for
non-reentrant scanners.  Here is a quick overview of the API:

   * `%option reentrant' must be specified.

   * All functions take one additional argument: `yyscanner'.

   * All global variables are replaced by their macro equivalents.  (We
     tell you this because it may be important to you during debugging.)

   * `yylex_init' and `yylex_destroy' must be called before and after
     `yylex', respectively.

   * Accessor methods (get/set functions) provide access to common
     `flex' variables.

   * User-specific data can be stored in `yyextra'.
729
730
731File: flex.info, Node: Reentrant Example, Next: Reentrant Detail, Prev: Reentrant Overview, Up: Reentrant
732
733Reentrant Example
734=================
735
736 First, an example of a reentrant scanner:
737
738 /* This scanner prints "//" comments. */
739 %option reentrant stack
740 %x COMMENT
741 %%
742 "//" yy_push_state( COMMENT, yyscanner);
743 .|\n
744 <COMMENT>\n yy_pop_state( yyscanner );
745 <COMMENT>[^\n]+ fprintf( yyout, "%s\n", yytext);
746 %%
747 int main ( int argc, char * argv[] )
748 {
749 yyscan_t scanner;
750
751 yylex_init ( &scanner );
752 yylex ( scanner );
753 yylex_destroy ( scanner );
754 return 0;
755 }
756
757
758File: flex.info, Node: Reentrant Detail, Next: Reentrant Functions, Prev: Reentrant Example, Up: Reentrant
759
760The Reentrant API in Detail
761===========================
762
763 Here are the things you need to do or know to use the reentrant C
764API of `flex'.
765
766* Menu:
767
768* Specify Reentrant::
769* Extra Reentrant Argument::
770* Global Replacement::
771* Init and Destroy Functions::
772* Accessor Methods::
773* Extra Data::
774* About yyscan_t::
775
776
777File: flex.info, Node: Specify Reentrant, Next: Extra Reentrant Argument, Prev: Reentrant Detail, Up: Reentrant Detail
778
779Declaring a Scanner As Reentrant
780--------------------------------
781
   `%option reentrant' (`--reentrant') must be specified.

   Notice that `%option reentrant' is specified in the above example
(*note Reentrant Example::).  Had this option not been specified, `flex'
would have happily generated a non-reentrant scanner without
complaining.  You may explicitly specify `%option noreentrant' if you
do _not_ want a reentrant scanner, although it is not necessary.  The
default is to generate a non-reentrant scanner.
790
791
792File: flex.info, Node: Extra Reentrant Argument, Next: Global Replacement, Prev: Specify Reentrant, Up: Reentrant Detail
793
794The Extra Argument
795------------------
796
797 All functions take one additional argument: `yyscanner'.
798
799 Notice that the calls to `yy_push_state' and `yy_pop_state' both
800have an argument, `yyscanner' , that is not present in a non-reentrant
801scanner. Here are the declarations of `yy_push_state' and
802`yy_pop_state' in the generated scanner:
803
804
805 static void yy_push_state ( int new_state , yyscan_t yyscanner ) ;
806 static void yy_pop_state ( yyscan_t yyscanner ) ;
807
808 Notice that the argument `yyscanner' appears in the declaration of
809both functions. In fact, all `flex' functions in a reentrant scanner
810have this additional argument. It is always the last argument in the
811argument list, it is always of type `yyscan_t' (which is typedef'd to
812`void *') and it is always named `yyscanner'. As you may have guessed,
813`yyscanner' is a pointer to an opaque data structure encapsulating the
814current state of the scanner. For a list of function declarations, see
815*Note Reentrant Functions::. Note that preprocessor macros, such as
816`BEGIN', `ECHO', and `REJECT', do not take this additional argument.
817
818
819File: flex.info, Node: Global Replacement, Next: Init and Destroy Functions, Prev: Extra Reentrant Argument, Up: Reentrant Detail
820
821Global Variables Replaced By Macros
822-----------------------------------
823
824 All global variables in traditional flex have been replaced by macro
825equivalents.
826
827 Note that in the above example, `yyout' and `yytext' are not plain
828variables. These are macros that will expand to their equivalent lvalue.
829All of the familiar `flex' globals have been replaced by their macro
830equivalents. In particular, `yytext', `yyleng', `yylineno', `yyin',
831`yyout', `yyextra', `yylval', and `yylloc' are macros. You may safely
832use these macros in actions as if they were plain variables. We only
833tell you this so you don't expect to link to these variables
834externally. Currently, each macro expands to a member of an internal
835struct, e.g.,
836
837
838 #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)
839
   One important thing to remember about `yytext' and friends is that,
because `yytext' is not a global variable in a reentrant scanner, you
cannot access it directly from outside an action or from other
functions.  You must use an accessor method, e.g., `yyget_text', to
accomplish this (*note Accessor Methods::).
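
   The scheme can be imitated in a few lines of plain C.  This is a
sketch under stated assumptions, not the generated code: `demo_scan_t',
`struct demo_guts', the `demo_text' and `demo_leng' macros, and the
`demo_get_*' accessors are all hypothetical names standing in for
`yyscan_t', `struct yyguts_t', `yytext', `yyleng' and the `yyget_*'
functions.

```c
#include <stddef.h>

/* Hypothetical miniature of the scheme; real flex uses struct yyguts_t. */
typedef void *demo_scan_t;            /* opaque handle, like yyscan_t */

struct demo_guts {
    char *text_r;                     /* plays the role of yytext_r */
    int leng_r;                       /* plays the role of yyleng_r */
};

/* Inside "actions" a macro makes a struct member look like a variable.
 * It only works where a handle named `scanner' is in scope. */
#define demo_text (((struct demo_guts *)scanner)->text_r)
#define demo_leng (((struct demo_guts *)scanner)->leng_r)

/* An action-like function: uses the macros as if they were globals. */
static void demo_action(demo_scan_t scanner, char *match, int len)
{
    demo_text = match;
    demo_leng = len;
}

/* Accessors for code that holds only the handle (compare yyget_text). */
static char *demo_get_text(demo_scan_t scanner)
{
    return ((struct demo_guts *)scanner)->text_r;
}

static int demo_get_leng(demo_scan_t scanner)
{
    return ((struct demo_guts *)scanner)->leng_r;
}
```

   Code outside the scanner never sees the macros or the structure
layout; it works only with the opaque handle and the accessor
functions.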
845
846
847File: flex.info, Node: Init and Destroy Functions, Next: Accessor Methods, Prev: Global Replacement, Up: Reentrant Detail
848
849Init and Destroy Functions
850--------------------------
851
852 `yylex_init' and `yylex_destroy' must be called before and after
853`yylex', respectively.
854
855
856 int yylex_init ( yyscan_t * ptr_yy_globals ) ;
857 int yylex ( yyscan_t yyscanner ) ;
858 int yylex_destroy ( yyscan_t yyscanner ) ;
859
   The function `yylex_init' must be called before calling any other
function.  The argument to `yylex_init' is the address of an
uninitialized pointer to be filled in by `flex'.  The contents of
`ptr_yy_globals' need not be initialized, since `flex' will overwrite
them anyway.  The value stored in `ptr_yy_globals' should thereafter be
passed to `yylex()' and `yylex_destroy()'.  `flex' does not save the
argument passed to `yylex_init', so it is safe to pass the address of a
local pointer to `yylex_init'.  The function `yylex' should be familiar
to you by now.  The reentrant version takes one argument, which is the
value returned (via an argument) by `yylex_init'.  Otherwise, it
behaves the same as the non-reentrant version of `yylex'.
871
872 `yylex_init' returns 0 (zero) on success, or non-zero on failure, in
873which case, errno is set to one of the following values:
874
875 * ENOMEM Memory allocation error. *Note memory-management::.
876
877 * EINVAL Invalid argument.
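
   The return-value convention just described can be sketched in C.
The functions below are hypothetical stand-ins (`demo_lex_init',
`demo_lex_destroy') that only mimic the documented contract: zero on
success, non-zero with `errno' set to `EINVAL' or `ENOMEM' on failure.
They are not the functions `flex' generates, and the 64-byte
allocation is an arbitrary placeholder for the scanner state.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins that mimic the documented contract of
 * yylex_init and yylex_destroy; they are not the flex functions. */
typedef void *demo_scan_t;

static int demo_lex_init(demo_scan_t *ptr_globals)
{
    if (ptr_globals == NULL) {
        errno = EINVAL;               /* invalid argument */
        return 1;
    }
    *ptr_globals = calloc(1, 64);     /* placeholder scanner state */
    if (*ptr_globals == NULL) {
        errno = ENOMEM;               /* memory allocation error */
        return 1;
    }
    return 0;
}

static int demo_lex_destroy(demo_scan_t scanner)
{
    free(scanner);                    /* handle must not be used again */
    return 0;
}
```

   A caller following this pattern tests the result of the init call
before scanning, and inspects `errno' only when that result is
non-zero.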
878
879 The function `yylex_destroy' should be called to free resources used
880by the scanner. After `yylex_destroy' is called, the contents of
881`yyscanner' should not be used. Of course, there is no need to destroy
882a scanner if you plan to reuse it. A `flex' scanner (both reentrant
883and non-reentrant) may be restarted by calling `yyrestart'.
884
885 Below is an example of a program that creates a scanner, uses it,
886then destroys it when done:
887
888
     int main ()
     {
         yyscan_t scanner;
         int tok;

         yylex_init(&scanner);

         /* The reentrant yylex takes the scanner as its argument. */
         while ((tok=yylex(scanner)) > 0)
             printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));

         yylex_destroy(scanner);
         return 0;
     }
902
903
904File: flex.info, Node: Accessor Methods, Next: Extra Data, Prev: Init and Destroy Functions, Up: Reentrant Detail
905
906Accessing Variables with Reentrant Scanners
907-------------------------------------------
908
909 Accessor methods (get/set functions) provide access to common `flex'
910variables.
911
912 Many scanners that you build will be part of a larger project.
913Portions of your project will need access to `flex' values, such as
914`yytext'. In a non-reentrant scanner, these values are global, so
915there is no problem accessing them. However, in a reentrant scanner,
there are no global `flex' values. You cannot access them directly.
917Instead, you must access `flex' values using accessor methods (get/set
918functions). Each accessor method is named `yyget_NAME' or `yyset_NAME',
919where `NAME' is the name of the `flex' variable you want. For example:
920
921
     /* Overwrite the last character of yytext with a NUL terminator. */
     void chop ( yyscan_t scanner )
     {
         int len = yyget_leng( scanner );
         if ( len > 0 )   /* nothing to chop in an empty match */
             yyget_text( scanner )[len - 1] = '\0';
     }
928
929 The above code may be called from within an action like this:
930
931
932 %%
933 .+\n { chop( yyscanner );}
934
935 You may find that `%option header-file' is particularly useful for
936generating prototypes of all the accessor functions. *Note
937option-header::.
938
939
940File: flex.info, Node: Extra Data, Next: About yyscan_t, Prev: Accessor Methods, Up: Reentrant Detail
941
942Extra Data
943----------
944
945 User-specific data can be stored in `yyextra'.
946
947 In a reentrant scanner, it is unwise to use global variables to
948communicate with or maintain state between different pieces of your
949program. However, you may need access to external data or invoke
950external functions from within the scanner actions. Likewise, you may
951need to pass information to your scanner (e.g., open file descriptors,
952or database connections). In a non-reentrant scanner, the only way to
953do this would be through the use of global variables. `Flex' allows
954you to store arbitrary, "extra" data in a scanner. This data is
955accessible through the accessor methods `yyget_extra' and `yyset_extra'
956from outside the scanner, and through the shortcut macro `yyextra' from
957within the scanner itself. They are defined as follows:
958
959
960 #define YY_EXTRA_TYPE void*
961 YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
962 void yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);
963
   By default, `YY_EXTRA_TYPE' is defined as type `void *'. You will
have to cast `yyextra' and the return value from `yyget_extra' to the
appropriate type each time you access the extra data. To avoid
casting, you may override the default type by defining `YY_EXTRA_TYPE'
in section 1 of your scanner:
969
970
     /* An example of overriding YY_EXTRA_TYPE. */
     %{
     #include <stdio.h>
     #include <sys/stat.h>
     #include <unistd.h>
     #define YY_EXTRA_TYPE  struct stat*
     %}
     %option reentrant
     %%

     __filesize__     printf( "%ld", (long) yyextra->st_size  );
     __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
     %%
     void scan_file( char* filename )
     {
         yyscan_t scanner;
         struct stat buf;
         FILE *in;

         /* Give up early if the file cannot be opened or examined. */
         if ( (in = fopen( filename, "r" )) == NULL )
             return;
         if ( stat( filename, &buf ) != 0 || yylex_init( &scanner ) != 0 ) {
             fclose( in );
             return;
         }

         yyset_in( in, scanner );
         yyset_extra( &buf, scanner );
         yylex ( scanner );
         yylex_destroy( scanner );
         fclose( in );
     }
996
997
998File: flex.info, Node: About yyscan_t, Prev: Extra Data, Up: Reentrant Detail
999
1000About yyscan_t
1001--------------
1002
1003 `yyscan_t' is defined as:
1004
1005
1006 typedef void* yyscan_t;
1007
   It is initialized by `yylex_init()' to point to an internal
structure. You should never access this value directly. In particular,
you should never attempt to free it (use `yylex_destroy()' instead).
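
   The opaque-handle arrangement described above can be illustrated
with a small stand-alone sketch. It is not `flex''s implementation:
the names `demo_scan_t', `demo_guts', `demo_init', `demo_set_extra',
`demo_get_extra' and `demo_destroy' are hypothetical, and the error
codes merely mirror those documented for `yylex_init'.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical opaque handle, playing the role that yyscan_t plays. */
typedef void *demo_scan_t;

/* Private state; callers only ever hold a demo_scan_t. */
struct demo_guts {
    void *extra;    /* user data, analogous to yyextra */
    int   lineno;
};

/* Fill in *out with a fresh handle. Returns 0 on success; on failure
   returns non-zero and sets errno to EINVAL or ENOMEM. */
int demo_init(demo_scan_t *out)
{
    struct demo_guts *g;

    if (out == NULL) {
        errno = EINVAL;
        return 1;
    }
    g = calloc(1, sizeof *g);
    if (g == NULL) {
        *out = NULL;
        errno = ENOMEM;
        return 1;
    }
    g->lineno = 1;
    *out = g;
    return 0;
}

/* Accessors hide the cast from the opaque handle to the real type. */
void demo_set_extra(void *data, demo_scan_t s)
{
    ((struct demo_guts *)s)->extra = data;
}

void *demo_get_extra(demo_scan_t s)
{
    return ((struct demo_guts *)s)->extra;
}

/* Release the handle; it must not be used afterwards. */
int demo_destroy(demo_scan_t s)
{
    if (s == NULL) {
        errno = EINVAL;
        return 1;
    }
    free(s);
    return 0;
}
```

   Because the caller never sees `struct demo_guts', the only ways to
reach the state are the accessor functions, and the only correct way to
release it is the destroy function. This is the same discipline the
manual asks for with `yyscan_t'.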
1011
1012
1013File: flex.info, Node: Reentrant Functions, Prev: Reentrant Detail, Up: Reentrant
1014
1015Functions and Macros Available in Reentrant C Scanners
1016======================================================
1017
   The following functions are available in a reentrant scanner:
1019
1020
1021 char *yyget_text ( yyscan_t scanner );
1022 int yyget_leng ( yyscan_t scanner );
1023 FILE *yyget_in ( yyscan_t scanner );
1024 FILE *yyget_out ( yyscan_t scanner );
1025 int yyget_lineno ( yyscan_t scanner );
1026 YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
1027 int yyget_debug ( yyscan_t scanner );
1028
1029 void yyset_debug ( int flag, yyscan_t scanner );
1030 void yyset_in ( FILE * in_str , yyscan_t scanner );
1031 void yyset_out ( FILE * out_str , yyscan_t scanner );
1032 void yyset_lineno ( int line_number , yyscan_t scanner );
1033 void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );
1034
   There are no "set" functions for `yytext' and `yyleng'. This is
intentional.

   The following macro shortcuts are available in actions in a reentrant
scanner:
1040
1041
1042 yytext
1043 yyleng
1044 yyin
1045 yyout
1046 yylineno
1047 yyextra
1048 yy_flex_debug
1049
1050 In a reentrant C scanner, support for yylineno is always present
1051(i.e., you may access yylineno), but the value is never modified by
1052`flex' unless `%option yylineno' is enabled. This is to allow the user
1053to maintain the line count independently of `flex'.
1054
1055 The following functions and macros are made available when `%option
1056bison-bridge' (`--bison-bridge') is specified:
1057
1058
1059 YYSTYPE * yyget_lval ( yyscan_t scanner );
1060 void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
1061 yylval
1062
1063 The following functions and macros are made available when `%option
1064bison-locations' (`--bison-locations') is specified:
1065
1066
1067 YYLTYPE *yyget_lloc ( yyscan_t scanner );
1068 void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
1069 yylloc
1070
   Support for `yylval' assumes that `YYSTYPE' is a valid type. Support
for `yylloc' assumes that `YYLTYPE' is a valid type. Typically, these
types are generated by `bison', and are included in section 1 of the
`flex' input.
1075
1076
1077File: flex.info, Node: Lex and Posix, Next: Memory Management, Prev: Reentrant, Up: Top
1078
1079Incompatibilities with Lex and Posix
1080************************************
1081
1082 `flex' is a rewrite of the AT&T Unix _lex_ tool (the two
1083implementations do not share any code, though), with some extensions and
1084incompatibilities, both of which are of concern to those who wish to
1085write scanners acceptable to both implementations. `flex' is fully
1086compliant with the POSIX `lex' specification, except that when using
1087`%pointer' (the default), a call to `unput()' destroys the contents of
1088`yytext', which is counter to the POSIX specification. In this section
1089we discuss all of the known areas of incompatibility between `flex',
1090AT&T `lex', and the POSIX specification. `flex''s `-l' option turns on
1091maximum compatibility with the original AT&T `lex' implementation, at
1092the cost of a major loss in the generated scanner's performance. We
1093note below which incompatibilities can be overcome using the `-l'
1094option. `flex' is fully compatible with `lex' with the following
1095exceptions:
1096
1097 * The undocumented `lex' scanner internal variable `yylineno' is not
1098 supported unless `-l' or `%option yylineno' is used.
1099
1100 * `yylineno' should be maintained on a per-buffer basis, rather than
1101 a per-scanner (single global variable) basis.
1102
1103 * `yylineno' is not part of the POSIX specification.
1104
1105 * The `input()' routine is not redefinable, though it may be called
1106 to read characters following whatever has been matched by a rule.
1107 If `input()' encounters an end-of-file the normal `yywrap()'
1108 processing is done. A "real" end-of-file is returned by `input()'
1109 as `EOF'.
1110
1111 * Input is instead controlled by defining the `YY_INPUT()' macro.
1112
1113 * The `flex' restriction that `input()' cannot be redefined is in
1114 accordance with the POSIX specification, which simply does not
1115 specify any way of controlling the scanner's input other than by
1116 making an initial assignment to `yyin'.
1117
1118 * The `unput()' routine is not redefinable. This restriction is in
1119 accordance with POSIX.
1120
1121 * `flex' scanners are not as reentrant as `lex' scanners. In
1122 particular, if you have an interactive scanner and an interrupt
1123 handler which long-jumps out of the scanner, and the scanner is
1124 subsequently called again, you may get the following message:
1125
1126
          fatal flex scanner internal error--end of buffer missed
1128
1129 To reenter the scanner, first use:
1130
1131
1132 yyrestart( yyin );
1133
1134 Note that this call will throw away any buffered input; usually
1135 this isn't a problem with an interactive scanner. *Note
1136 Reentrant::, for `flex''s reentrant API.
1137
1138 * Also note that `flex' C++ scanner classes _are_ reentrant, so if
1139 using C++ is an option for you, you should use them instead.
1140 *Note Cxx::, and *Note Reentrant:: for details.
1141
   * `output()' is not supported. Output from the `ECHO' macro is done
     to the file-pointer `yyout' (default `stdout').
1144
1145 * `output()' is not part of the POSIX specification.
1146
1147 * `lex' does not support exclusive start conditions (%x), though they
1148 are in the POSIX specification.
1149
1150 * When definitions are expanded, `flex' encloses them in parentheses.
1151 With `lex', the following:
1152
1153
1154 NAME [A-Z][A-Z0-9]*
1155 %%
1156 foo{NAME}? printf( "Found it\n" );
1157 %%
1158
1159 will not match the string `foo' because when the macro is expanded
1160 the rule is equivalent to `foo[A-Z][A-Z0-9]*?' and the precedence
1161 is such that the `?' is associated with `[A-Z0-9]*'. With `flex',
1162 the rule will be expanded to `foo([A-Z][A-Z0-9]*)?' and so the
1163 string `foo' will match.
1164
1165 * Note that if the definition begins with `^' or ends with `$' then
1166 it is _not_ expanded with parentheses, to allow these operators to
1167 appear in definitions without losing their special meanings. But
1168 the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
1169 definition.
1170
1171 * Using `-l' results in the `lex' behavior of no parentheses around
1172 the definition.
1173
1174 * The POSIX specification is that the definition be enclosed in
1175 parentheses.
1176
1177 * Some implementations of `lex' allow a rule's action to begin on a
1178 separate line, if the rule's pattern has trailing whitespace:
1179
1180
1181 %%
1182 foo|bar<space here>
1183 { foobar_action();}
1184
1185 `flex' does not support this feature.
1186
1187 * The `lex' `%r' (generate a Ratfor scanner) option is not
1188 supported. It is not part of the POSIX specification.
1189
1190 * After a call to `unput()', _yytext_ is undefined until the next
1191 token is matched, unless the scanner was built using `%array'.
1192 This is not the case with `lex' or the POSIX specification. The
1193 `-l' option does away with this incompatibility.
1194
   * The precedence of the `{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of `lex' interpret `abc{1,3}'
     as "match one, two, or three occurrences of `abc'", whereas `flex'
     interprets it as "match `ab' followed by one, two, or three
     occurrences of `c'". The `-l' and `--posix' options do away with
     this incompatibility.
1201
   * The precedence of the `^' operator is different. `lex' interprets
     `^foo|bar' as "match either `foo' at the beginning of a line, or
     `bar' anywhere", whereas `flex' interprets it as "match either
     `foo' or `bar' if they come at the beginning of a line". The
     latter is in agreement with the POSIX specification.
1207
   * The special table-size declarations such as `%a' supported by
     `lex' are not required by `flex' scanners. `flex' ignores them.
1210
1211 * The name `FLEX_SCANNER' is `#define''d so scanners may be written
1212 for use with either `flex' or `lex'. Scanners also include
1213 `YY_FLEX_MAJOR_VERSION', `YY_FLEX_MINOR_VERSION' and
1214 `YY_FLEX_SUBMINOR_VERSION' indicating which version of `flex'
1215 generated the scanner. For example, for the 2.5.22 release, these
1216 defines would be 2, 5 and 22 respectively. If the version of
1217 `flex' being used is a beta version, then the symbol `FLEX_BETA'
1218 is defined.
1219
1220 * The symbols `[[' and `]]' in the code sections of the input may
1221 conflict with the m4 delimiters. *Note M4 Dependency::.
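
   The precedence differences listed above can be explored outside of
either tool. The sketch below is only an approximation: it uses the
POSIX `regcomp' and `regexec' functions, with explicit parentheses
spelling out each interpretation, and the helper name `matches' is
hypothetical. Neither `lex' nor `flex' is implemented this way, and the
anchors `^' and `$' are added here only to force whole-string matches.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regular expression `pattern` matches
   somewhere in `text`, 0 if it does not, and -1 if the pattern does
   not compile. */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

   With this helper, the `flex' expansion of `foo{NAME}?' corresponds
to `matches("^foo([A-Z][A-Z0-9]*)?$", "foo")', which succeeds, while
the `lex' reading corresponds to
`matches("^foo[A-Z]([A-Z0-9]*)?$", "foo")', which fails. Likewise,
`"^(abc){1,3}$"' models the AT&T and POSIX reading of `abc{1,3}' and
accepts `abcabc', whereas `"^ab(c{1,3})$"' models the `flex' reading
and accepts `abccc' instead.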
1222
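
   The `FLEX_SCANNER' and version symbols described in the list above
lend themselves to conditional compilation. The following is a minimal
sketch of code that could sit in a scanner's user-code section and
still compile when another tool generated the scanner. The function
name `describe_generator' and the message strings are hypothetical.

```c
#include <stdio.h>

/* Write a short description of the generating tool into buf.
   Returns whatever snprintf returns. */
static int describe_generator(char *buf, size_t size)
{
#if defined(FLEX_SCANNER)
#  if defined(FLEX_BETA)
    const char *beta = " (beta)";
#  else
    const char *beta = "";
#  endif
    /* These three symbols are defined in flex-generated scanners. */
    return snprintf(buf, size, "flex %d.%d.%d%s",
                    YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION,
                    YY_FLEX_SUBMINOR_VERSION, beta);
#else
    /* Compiled outside a flex-generated scanner. */
    return snprintf(buf, size, "not flex");
#endif
}
```

   Compiled on its own, outside any generated scanner, the function
takes the `#else' branch, since none of the `flex' symbols are defined.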
1223
1224 The following `flex' features are not included in `lex' or the POSIX
1225specification:
1226
1227 * C++ scanners
1228
1229 * %option
1230
1231 * start condition scopes
1232
1233 * start condition stacks
1234
1235 * interactive/non-interactive scanners
1236
1237 * yy_scan_string() and friends
1238
1239 * yyterminate()
1240
1241 * yy_set_interactive()
1242
1243 * yy_set_bol()
1244
   * YY_AT_BOL()

   * <<EOF>>
1246
1247 * <*>
1248
1249 * YY_DECL
1250
1251 * YY_START
1252
1253 * YY_USER_ACTION
1254
1255 * YY_USER_INIT
1256
1257 * #line directives
1258
1259 * %{}'s around actions
1260
1261 * reentrant C API
1262
1263 * multiple actions on a line
1264
1265 * almost all of the `flex' command-line options
1266
1267 The feature "multiple actions on a line" refers to the fact that
1268with `flex' you can put multiple actions on the same line, separated
1269with semi-colons, while with `lex', the following:
1270
1271
1272 foo handle_foo(); ++num_foos_seen;
1273
1274 is (rather surprisingly) truncated to
1275
1276
1277 foo handle_foo();
1278
1279 `flex' does not truncate the action. Actions that are not enclosed
1280in braces are simply terminated at the end of the line.
1281
1282
1283File: flex.info, Node: Memory Management, Next: Serialized Tables, Prev: Lex and Posix, Up: Top
1284
1285Memory Management
1286*****************
1287
1288 This chapter describes how flex handles dynamic memory, and how you
1289can override the default behavior.
1290
1291* Menu:
1292
1293* The Default Memory Management::
1294* Overriding The Default Memory Management::
1295* A Note About yytext And Memory::
1296
1297
1298File: flex.info, Node: The Default Memory Management, Next: Overriding The Default Memory Management, Prev: Memory Management, Up: Memory Management
1299
1300The Default Memory Management
1301=============================
1302
   Flex allocates dynamic memory during initialization, and once in a
while from within a call to `yylex()'. Initialization takes place during
the first call to `yylex()'. Thereafter, flex may reallocate more memory
if it needs to enlarge a buffer. As of version 2.5.9, flex will clean up
all memory when you call `yylex_destroy'. *Note faq-memory-leak::.
1308
   Flex allocates dynamic memory for the purposes listed below (1):
1310
16kB for the input buffer.
     Flex allocates memory for the character buffer used to perform
     pattern matching. Flex must read ahead from the input stream and
     store it in a large character buffer. This buffer is typically
     the largest chunk of dynamic memory flex consumes. This buffer
     will grow if necessary, doubling the size each time. Flex frees
     this memory when you call `yylex_destroy()'. The default size of
     this buffer (16384 bytes) is almost always too large. The ideal
     size for this buffer is the length of the longest token expected,
     in bytes, plus a little more. Flex will allocate a few extra
     bytes for housekeeping. Currently, to override the size of the
     input buffer you must `#define YY_BUF_SIZE' to whatever number of
     bytes you want. We don't plan to change this in the near future,
     but we reserve the right to do so if we ever add a more robust
     memory management API.
1326
64kB for the REJECT state.
     This will only be allocated if you use `REJECT'. The size is
     large enough to hold the same number of states as characters in
     the input buffer. If you override the size of the input buffer
     (via `YY_BUF_SIZE'), then you automatically override the size of
     this buffer as well.
1332
100 bytes for the start condition stack.
     Flex allocates memory for the start condition stack. This is the
     stack used for pushing start states, i.e., with `yy_push_state()'.
     It will grow if necessary. Since the states are simply integers,
     this stack doesn't consume much memory. This stack is not present
     if `%option stack' is not specified. You will rarely need to tune
     this buffer. The ideal size for this stack is the maximum depth
     expected. The memory for this stack is automatically destroyed
     when you call `yylex_destroy()'. *Note option-stack::.
1342
40 bytes for each YY_BUFFER_STATE.
     Flex allocates memory for each `YY_BUFFER_STATE'. The buffer state
     itself is about 40 bytes, plus an additional large character
     buffer (described above). The initial buffer state is created
     during initialization, and with each call to `yy_create_buffer()'.
     You can't tune the size of this, but you can tune the character
     buffer as described above. Any buffer state that you explicitly
     create by calling `yy_create_buffer()' is _NOT_ destroyed
     automatically. You must call `yy_delete_buffer()' to free the
     memory. The exception to this rule is that flex will delete the
     current buffer automatically when you call `yylex_destroy()'. If
     you delete the current buffer, be sure to set it to NULL. That
     way, flex will not try to delete the buffer a second time
     (possibly crashing your program!). At the time of this writing,
     flex does not provide a growable stack for the buffer states. You
     have to manage that yourself. *Note Multiple Input Buffers::.
1359
84 bytes for the reentrant scanner guts.
     Flex allocates about 84 bytes for the reentrant scanner structure
     when you call `yylex_init()'. It is destroyed when the user calls
     `yylex_destroy()'.
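
   The doubling policy mentioned above for the input buffer can be
sketched in a few lines of stand-alone C. This is an illustration of
the growth strategy only, not code taken from `flex': the names
`DEMO_BUF_SIZE' and `grown_capacity' are hypothetical, and the starting
figure simply repeats the 16384-byte default quoted above.

```c
#include <stddef.h>

#ifndef DEMO_BUF_SIZE
#define DEMO_BUF_SIZE 16384   /* hypothetical stand-in for the default size */
#endif

/* Return the capacity reached by doubling `current` until it can hold
   `needed` bytes. Returns 0 if `current` is 0 or if doubling would
   overflow size_t. */
static size_t grown_capacity(size_t current, size_t needed)
{
    if (current == 0)
        return 0;
    while (current < needed) {
        if (current > (size_t)-1 / 2)
            return 0;          /* doubling again would overflow */
        current *= 2;
    }
    return current;
}
```

   Starting from `DEMO_BUF_SIZE', a request for 70000 bytes takes three
doublings (32768, 65536, 131072). This is one reason a starting size
close to the longest expected token keeps memory use down.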
1364
1365
1366 ---------- Footnotes ----------
1367
1368 (1) The quantities given here are approximate, and may vary due to
1369host architecture, compiler configuration, or due to future
1370enhancements to flex.
1371