flex 2.5.33.

This is flex.info, produced by makeinfo version 4.5 from flex.texi.

INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex). Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY


   The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

File: flex.info, Node: Start Conditions, Next: Multiple Input Buffers, Prev: Generated Scanner, Up: Top

Start Conditions
****************

   `flex' provides a mechanism for conditionally activating rules. Any
rule whose pattern is prefixed with `<sc>' will only be active when the
scanner is in the "start condition" named `sc'. For example,

     <STRING>[^"]*        { /* eat up the string body ... */
                 ...
                 }

will be active only when the scanner is in the `STRING' start
condition, and

     <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
                 ...
                 }

will be active only when the current start condition is either
`INITIAL', `STRING', or `QUOTE'.

   Start conditions are declared in the definitions (first) section of
the input using unindented lines beginning with either `%s' or `%x'
followed by a list of names. The former declares "inclusive" start
conditions, the latter "exclusive" start conditions. A start condition
is activated using the `BEGIN' action. Until the next `BEGIN' action
is executed, rules with the given start condition will be active and
rules with other start conditions will be inactive. If the start
condition is inclusive, then rules with no start conditions at all will
also be active. If it is exclusive, then _only_ rules qualified with
the start condition will be active. A set of rules contingent on the
same exclusive start condition describes a scanner which is independent
of any of the other rules in the `flex' input. Because of this,
exclusive start conditions make it easy to specify "mini-scanners"
which scan portions of the input that are syntactically different from
the rest (e.g., comments).

   If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two. The set of rules:

     %s example
     %%

     <example>foo   do_something();

     bar            something_else();

is equivalent to

     %x example
     %%

     <example>foo   do_something();

     <INITIAL,example>bar    something_else();

   Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
second example wouldn't be active (i.e., couldn't match) when in start
condition `example'. If we just used `<example>' to qualify `bar',
though, then it would only be active in `example' and not in `INITIAL',
while in the first example it's active in both, because in the first
example the `example' start condition is an inclusive (`%s') start
condition.
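
   The activation rule just described can be modelled in a few lines of
plain C. This is only a sketch of the semantics, not code taken from a
generated scanner: the names `SC_EXAMPLE', `struct rule' and
`rule_is_active()' are made up for illustration, and the `<*>'
specifier is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Start conditions are small integers; INITIAL is always 0. */
enum { INITIAL = 0, SC_EXAMPLE = 1 };

/* A rule and the start conditions it is qualified with.
 * n_scs == 0 means the pattern has no <...> prefix at all. */
struct rule {
    const int *scs;
    size_t n_scs;
};

/* Decide whether a rule can match in start condition `current'.
 * `current_is_exclusive' is true if `current' was declared with %x. */
bool rule_is_active(const struct rule *r, int current,
                    bool current_is_exclusive)
{
    size_t i;

    if (r->n_scs == 0)
        /* Unqualified rules are active in INITIAL and in every
         * inclusive (%s) start condition, never in an exclusive one. */
        return current == INITIAL || !current_is_exclusive;

    /* Qualified rules are active only in the conditions they list. */
    for (i = 0; i < r->n_scs; i++)
        if (r->scs[i] == current)
            return true;
    return false;
}
```

   With an unqualified `bar' rule, `rule_is_active()' reports it as
active in `example' only when `example' is inclusive, which is exactly
the difference between the two rule sets shown above.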

   Also note that the special start-condition specifier `<*>' matches
every start condition. Thus, the above example could also have been
written:

     %x example
     %%

     <example>foo   do_something();

     <*>bar    something_else();

   The default rule (to `ECHO' any unmatched character) remains active
in start conditions. It is equivalent to:

     <*>.|\n     ECHO;

   `BEGIN(0)' returns to the original state where only the rules with
no start conditions are active. This state can also be referred to as
the start-condition `INITIAL', so `BEGIN(INITIAL)' is equivalent to
`BEGIN(0)'. (The parentheses around the start condition name are not
required but are considered good style.)

   `BEGIN' actions can also be given as indented code at the beginning
of the rules section. For example, the following will cause the scanner
to enter the `SPECIAL' start condition whenever `yylex()' is called and
the global variable `enter_special' is true:

     %{
     int enter_special;
     %}

     %x SPECIAL
     %%
             if ( enter_special )
                 BEGIN(SPECIAL);

     <SPECIAL>blahblahblah
     ...more rules follow...

   To illustrate the uses of start conditions, here is a scanner which
provides two different interpretations of a string like `123.456'. By
default it will treat it as three tokens, the integer `123', a dot
(`.'), and the integer `456'. But if the string is preceded earlier in
the line by the string `expect-floats' it will treat it as a single
token, the floating-point number `123.456':

     %{
     #include <stdio.h>
     #include <stdlib.h>
     %}
     %s expect

     %%
     expect-floats        BEGIN(expect);

     <expect>[0-9]+"."[0-9]+      {
                 printf( "found a float, = %f\n",
                         atof( yytext ) );
                 }
     <expect>\n           {
                 /* that's the end of the line, so
                  * we need another "expect-floats"
                  * before we'll recognize any more
                  * floats
                  */
                 BEGIN(INITIAL);
                 }

     [0-9]+      {
                 printf( "found an integer, = %d\n",
                         atoi( yytext ) );
                 }

     "."         printf( "found a dot\n" );

   Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This scanner goes to a bit of trouble to match as much text as
possible with each rule. In general, when attempting to write a
high-speed scanner, try to match as much as possible in each rule, as
it's a big win.

   Note that start condition names are really integer values and can
be stored as such. Thus, the above could be extended in the following
fashion:

     %x comment foo
     %%
             int line_num = 1;
             int comment_caller;

     "/*"         {
                  comment_caller = INITIAL;
                  BEGIN(comment);
                  }

     ...

     <foo>"/*"    {
                  comment_caller = foo;
                  BEGIN(comment);
                  }

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(comment_caller);

   Furthermore, you can access the current start condition using the
integer-valued `YY_START' macro. For example, the above assignments to
`comment_caller' could instead be written

     comment_caller = YY_START;

   Flex provides `YYSTATE' as an alias for `YY_START' (since that is
what's used by AT&T `lex').

   For historical reasons, start conditions do not have their own
name-space within the generated scanner. The start condition names are
unmodified in the generated scanner and generated header. *Note
option-header::. *Note option-prefix::.

   Finally, here's an example of how to match C-style quoted strings
using exclusive start conditions, including expanded escape sequences
(but not including checking for a string that's too long):

     %x str

     %%
             /* MAX_STR_CONST must be defined by the user */
             char string_buf[MAX_STR_CONST];
             char *string_buf_ptr;

     \"      string_buf_ptr = string_buf; BEGIN(str);

     <str>\"        { /* saw closing quote - all done */
             BEGIN(INITIAL);
             *string_buf_ptr = '\0';
             /* return string constant token type and
              * value to parser
              */
             }

     <str>\n        {
             /* error - unterminated string constant */
             /* generate error message */
             }

     <str>\\[0-7]{1,3} {
             /* octal escape sequence */
             unsigned int result;   /* "%o" stores an unsigned int */

             (void) sscanf( yytext + 1, "%o", &result );

             if ( result > 0xff )
                     {
                     /* error, constant is out-of-bounds */
                     }
             else
                     *string_buf_ptr++ = result;
             }

     <str>\\[0-9]+ {
             /* generate error - bad escape sequence; something
              * like '\48' or '\0777777'
              */
             }

     <str>\\n  *string_buf_ptr++ = '\n';
     <str>\\t  *string_buf_ptr++ = '\t';
     <str>\\r  *string_buf_ptr++ = '\r';
     <str>\\b  *string_buf_ptr++ = '\b';
     <str>\\f  *string_buf_ptr++ = '\f';

     <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

     <str>[^\\\n\"]+        {
             char *yptr = yytext;

             while ( *yptr )
                     *string_buf_ptr++ = *yptr++;
             }
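
   The octal-escape step is easy to get subtly wrong, so it may help to
see it in isolation. The following is a stand-alone sketch in plain C,
not part of any generated scanner; the helper name
`decode_octal_escape()' is hypothetical.

```c
#include <stdio.h>

/* Decode an octal escape such as "\101" (a backslash followed by one
 * to three octal digits).  Returns 0 and stores the byte in *out on
 * success, or -1 if the text is malformed or the value exceeds 0xff. */
int decode_octal_escape(const char *text, unsigned char *out)
{
    unsigned int value;   /* "%o" requires a pointer to unsigned int */

    if (text[0] != '\\' || sscanf(text + 1, "%3o", &value) != 1)
        return -1;        /* no backslash, or no octal digit after it */
    if (value > 0xff)
        return -1;        /* e.g. "\777" does not fit in one byte */

    *out = (unsigned char) value;
    return 0;
}
```

   An action for the `\\[0-7]{1,3}' pattern could call such a helper
with `yytext' and append the byte to the string buffer only when it
returns 0.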

   Often, such as in some of the examples above, you wind up writing a
whole bunch of rules all preceded by the same start condition(s). Flex
makes this a little easier and cleaner by introducing a notion of start
condition "scope". A start condition scope is begun with:

     <SCs>{

where `SCs' is a list of one or more start conditions. Inside the
start condition scope, every rule automatically has the prefix `<SCs>'
applied to it, until a `}' which matches the initial `{'. So, for
example,

     <ESC>{
         "\\n"   return '\n';
         "\\r"   return '\r';
         "\\f"   return '\f';
         "\\0"   return '\0';
     }

is equivalent to:

     <ESC>"\\n"  return '\n';
     <ESC>"\\r"  return '\r';
     <ESC>"\\f"  return '\f';
     <ESC>"\\0"  return '\0';

   Start condition scopes may be nested.

   The following routines are available for manipulating stacks of
start conditions:

 - Function: void yy_push_state ( int `new_state' )
     pushes the current start condition onto the top of the start
     condition stack and switches to `new_state' as though you had used
     `BEGIN new_state' (recall that start condition names are also
     integers).

 - Function: void yy_pop_state ()
     pops the top of the stack and switches to it via `BEGIN'.

 - Function: int yy_top_state ()
     returns the top of the stack without altering the stack's contents.

   The start condition stack grows dynamically and so has no built-in
size limitation. If memory is exhausted, program execution aborts.

   To use start condition stacks, your scanner must include a `%option
stack' directive (*note Scanner Options::).
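
   To make the push/pop/top behaviour concrete, here is a small model
of such a stack in plain C. It is a sketch of the semantics described
above, not flex's own implementation; `struct sc_stack', `sc_push()',
`sc_pop()' and `sc_top()' are hypothetical names, and the growth policy
(doubling from 8 entries) is an arbitrary choice.

```c
#include <stdlib.h>

/* The active start condition plus a stack of saved ones. */
struct sc_stack {
    int current;      /* what YY_START would report */
    int *saved;       /* saved start conditions, bottom first */
    size_t depth;     /* number of saved entries */
    size_t capacity;  /* number of allocated entries */
};

/* Like yy_push_state(): save the current condition, then switch. */
void sc_push(struct sc_stack *s, int new_state)
{
    if (s->depth == s->capacity) {
        size_t cap = s->capacity ? 2 * s->capacity : 8;
        int *p = realloc(s->saved, cap * sizeof *p);
        if (p == NULL)
            abort();  /* out of memory: execution aborts */
        s->saved = p;
        s->capacity = cap;
    }
    s->saved[s->depth++] = s->current;
    s->current = new_state;
}

/* Like yy_pop_state(): switch back to the most recently saved one. */
void sc_pop(struct sc_stack *s)
{
    if (s->depth == 0)
        abort();      /* popping an empty stack is a usage error */
    s->current = s->saved[--s->depth];
}

/* Like yy_top_state(): peek at the saved condition without popping. */
int sc_top(const struct sc_stack *s)
{
    if (s->depth == 0)
        abort();
    return s->saved[s->depth - 1];
}
```

   Starting from `struct sc_stack s = { 0, NULL, 0, 0 };', a push of
state 1 followed by a pop leaves `s.current' back at 0, which is the
property that makes nested mini-scanners convenient to write.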


File: flex.info, Node: Multiple Input Buffers, Next: EOF, Prev: Start Conditions, Up: Top

Multiple Input Buffers
**********************

   Some scanners (such as those which support "include" files) require
reading from several input streams. As `flex' scanners do a large
amount of buffering, one cannot control where the next input will be
read from by simply writing a `YY_INPUT()' which is sensitive to the
scanning context. `YY_INPUT()' is only called when the scanner reaches
the end of its buffer, which may be a long time after scanning a
statement such as an `include' statement which requires switching the
input source.

   To negotiate these sorts of problems, `flex' provides a mechanism
for creating and switching between multiple input buffers. An input
buffer is created by using:

 - Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )

which takes a `FILE' pointer and a size and creates a buffer
associated with the given file and large enough to hold `size'
characters (when in doubt, use `YY_BUF_SIZE' for the size). It returns
a `YY_BUFFER_STATE' handle, which may then be passed to other routines
(see below). The `YY_BUFFER_STATE' type is a pointer to an opaque
`struct yy_buffer_state' structure, so you may safely initialize
`YY_BUFFER_STATE' variables to `((YY_BUFFER_STATE) 0)' if you wish, and
also refer to the opaque structure in order to correctly declare input
buffers in source files other than that of your scanner. Note that the
`FILE' pointer in the call to `yy_create_buffer' is only used as the
value of `yyin' seen by `YY_INPUT'. If you redefine `YY_INPUT()' so it
no longer uses `yyin', then you can safely pass a NULL `FILE' pointer to
`yy_create_buffer'. You select a particular buffer to scan from using:

 - Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )

   The above function switches the scanner's input buffer so subsequent
tokens will come from `new_buffer'. Note that `yy_switch_to_buffer()'
may be used by `yywrap()' to set things up for continued scanning,
instead of opening a new file and pointing `yyin' at it. If you are
looking for a stack of input buffers, then you want to use
`yypush_buffer_state()' instead of this function. Note also that
switching input sources via either `yy_switch_to_buffer()' or
`yywrap()' does _not_ change the start condition.

 - Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )

is used to reclaim the storage associated with a buffer. (`buffer'
can be NULL, in which case the routine does nothing.)

 - Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )

   This function pushes the new buffer state onto an internal stack.
The pushed state becomes the new current state. The stack is maintained
by flex and will grow as required. This function is intended to be used
instead of `yy_switch_to_buffer', when you want to change states, but
preserve the current state for later use.

 - Function: void yypop_buffer_state ( )

   This function removes the current state from the top of the stack,
and deletes it by calling `yy_delete_buffer'. The next state on the
stack, if any, becomes the new current state.

   You can also clear the current contents of a buffer using:

 - Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )

   This function discards the buffer's contents, so the next time the
scanner attempts to match a token from the buffer, it will first fill
the buffer anew using `YY_INPUT()'.

 - Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )

is an alias for `yy_create_buffer()', provided for compatibility
with the C++ use of `new' and `delete' for creating and destroying
dynamic objects.

   The `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE' handle to
the current buffer. It should not be used as an lvalue.

   Here are two examples of using these features for writing a scanner
which expands include files (the `<<EOF>>' feature is discussed below).

   This first example uses `yypush_buffer_state()' and
`yypop_buffer_state()'. Flex maintains the stack internally.

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl
     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));

             BEGIN(INITIAL);
             }

     <<EOF>> {
             yypop_buffer_state();

             if ( !YY_CURRENT_BUFFER )
                 {
                 yyterminate();
                 }
             }

   The second example, below, does the same thing as the previous
example did, but manages its own input buffer stack manually (instead
of letting flex do it).

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl

     %{
     #define MAX_INCLUDE_DEPTH 10
     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
     int include_stack_ptr = 0;
     %}

     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                 {
                 fprintf( stderr, "Includes nested too deeply\n" );
                 exit( 1 );
                 }

             include_stack[include_stack_ptr++] =
                 YY_CURRENT_BUFFER;

             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yy_switch_to_buffer(
                 yy_create_buffer( yyin, YY_BUF_SIZE ) );

             BEGIN(INITIAL);
             }

     <<EOF>> {
             if ( --include_stack_ptr < 0 )
                 {
                 yyterminate();
                 }

             else
                 {
                 yy_delete_buffer( YY_CURRENT_BUFFER );
                 yy_switch_to_buffer(
                      include_stack[include_stack_ptr] );
                 }
             }

   The following routines are available for setting up input buffers for
scanning in-memory strings instead of files. All of them create a new
input buffer for scanning the string, and return a corresponding
`YY_BUFFER_STATE' handle (which you should delete with
`yy_delete_buffer()' when done with it). They also switch to the new
buffer using `yy_switch_to_buffer()', so the next call to `yylex()'
will start scanning the string.

 - Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
     scans a NUL-terminated string.

 - Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len )
     scans `len' bytes (including possibly `NUL's) starting at location
     `bytes'.

   Note that both of these functions create and scan a _copy_ of the
string or bytes. (This may be desirable, since `yylex()' modifies the
contents of the buffer it is scanning.) You can avoid the copy by
using:

 - Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t size)
     which scans in place the buffer starting at `base', consisting of
     `size' bytes, the last two bytes of which _must_ be
     `YY_END_OF_BUFFER_CHAR' (ASCII NUL). These last two bytes are not
     scanned; thus, scanning consists of `base[0]' through
     `base[size-3]', inclusive.

   If you fail to set up `base' in this manner (i.e., forget the final
two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()' returns a
NULL pointer instead of creating a new input buffer.
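
   Since the two trailing bytes are easy to forget, one way to prepare
a buffer for in-place scanning is sketched below in plain C. The helper
`make_scan_buffer()' is hypothetical and not part of flex; it only
builds the memory layout that `yy_scan_buffer()' expects and leaves the
actual call to the scanner's own code.

```c
#include <stdlib.h>
#include <string.h>

/* Copy `len' bytes from `src' into a freshly allocated buffer that
 * ends with the two NUL bytes required for in-place scanning.  The
 * total size to hand to yy_scan_buffer() is stored in *size_out.
 * Returns NULL if allocation fails; the caller frees the buffer once
 * the corresponding YY_BUFFER_STATE has been deleted. */
char *make_scan_buffer(const char *src, size_t len, size_t *size_out)
{
    char *base = malloc(len + 2);

    if (base == NULL)
        return NULL;
    memcpy(base, src, len);
    base[len] = '\0';       /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = '\0';   /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}
```

   Inside the scanner's user code the result would then be passed on
as `yy_scan_buffer( base, size )', checking for a NULL return. Note
that `size' counts the two terminating bytes.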

 - Data type: yy_size_t
     is an integral type to which you can cast an integer expression
     reflecting the size of the buffer.


File: flex.info, Node: EOF, Next: Misc Macros, Prev: Multiple Input Buffers, Up: Top

End-of-File Rules
*****************

   The special rule `<<EOF>>' indicates actions which are to be taken
when an end-of-file is encountered and `yywrap()' returns non-zero
(i.e., indicates no further files to process). The action must finish
by doing one of the following things:

   * assigning `yyin' to a new input file (in previous versions of
     `flex', after doing the assignment you had to call the special
     action `YY_NEW_FILE'; this is no longer necessary);

   * executing a `return' statement;

   * executing the special `yyterminate()' action;

   * or switching to a new buffer using `yy_switch_to_buffer()' as
     shown in the example above.

   `<<EOF>>' rules may not be used with other patterns; they may only
be qualified with a list of start conditions. If an unqualified
`<<EOF>>' rule is given, it applies to _all_ start conditions which do
not already have `<<EOF>>' actions. To specify an `<<EOF>>' rule for
only the initial start condition, use:

     <INITIAL><<EOF>>

   These rules are useful for catching things like unclosed comments.
An example:

     %x quote
     %%

     ...other rules for dealing with quotes...

     <quote><<EOF>>   {
              error( "unterminated quote" );
              yyterminate();
              }
     <<EOF>>  {
              if ( *++filelist )
                  yyin = fopen( *filelist, "r" );
              else
                  yyterminate();
              }


File: flex.info, Node: Misc Macros, Next: User Values, Prev: EOF, Up: Top

Miscellaneous Macros
********************

   The macro `YY_USER_ACTION' can be defined to provide an action which
is always executed prior to the matched rule's action. For example, it
could be #define'd to call a routine to convert yytext to lower-case.
When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
number of the matched rule (rules are numbered starting with 1).
Suppose you want to profile how often each of your rules is matched.
The following would do the trick:

     #define YY_USER_ACTION ++ctr[yy_act]

where `ctr' is an array to hold the counts for the different rules.
Note that the macro `YY_NUM_RULES' gives the total number of rules
(including the default rule, even if you use `-s'), so a correct
declaration for `ctr' is:

     int ctr[YY_NUM_RULES];

   The macro `YY_USER_INIT' may be defined to provide an action which
is always executed before the first scan (and before the scanner's
internal initializations are done). For example, it could be used to
call a routine to read in a data table or open a logging file.

   The macro `yy_set_interactive(is_interactive)' can be used to
control whether the current buffer is considered "interactive". An
interactive buffer is processed more slowly, but must be used when the
scanner's input source is indeed interactive to avoid problems due to
waiting to fill buffers (see the discussion of the `-I' flag in *Note
Scanner Options::). A non-zero value in the macro invocation marks the
buffer as interactive, a zero value as non-interactive. Note that use
of this macro overrides `%option always-interactive' or `%option
never-interactive' (*note Scanner Options::). `yy_set_interactive()'
must be invoked prior to beginning to scan the buffer that is (or is
not) to be considered interactive.

   The macro `yy_set_bol(at_bol)' can be used to control whether the
current buffer's scanning context for the next token match is done as
though at the beginning of a line. A non-zero macro argument makes
rules anchored with `^' active, while a zero argument makes `^' rules
inactive.

   The macro `YY_AT_BOL()' returns true if the next token scanned from
the current buffer will have `^' rules active, false otherwise.

   In the generated scanner, the actions are all gathered in one large
switch statement and separated using `YY_BREAK', which may be
redefined. By default, it is simply a `break', to separate each rule's
action from the following rule's. Redefining `YY_BREAK' allows, for
example, C++ users to #define YY_BREAK to do nothing (while being very
careful that every rule ends with a `break' or a `return'!) to avoid
suffering from unreachable statement warnings where, because a rule's
action ends with `return', the `YY_BREAK' is inaccessible.


File: flex.info, Node: User Values, Next: Yacc, Prev: Misc Macros, Up: Top

Values Available To the User
****************************

   This chapter summarizes the various values available to the user in
the rule actions.

`char *yytext'
     holds the text of the current token. It may be modified but not
     lengthened (you cannot append characters to the end).

     If the special directive `%array' appears in the first section of
     the scanner description, then `yytext' is instead declared `char
     yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
     redefine in the first section if you don't like the default value
     (generally 8KB). Using `%array' results in somewhat slower
     scanners, but the value of `yytext' becomes immune to calls to
     `unput()', which potentially destroy its value when `yytext' is a
     character pointer. The opposite of `%array' is `%pointer', which
     is the default.

     You cannot use `%array' when generating C++ scanner classes (the
     `-+' flag).

`int yyleng'
     holds the length of the current token.

`FILE *yyin'
     is the file which by default `flex' reads from. It may be
     redefined but doing so only makes sense before scanning begins or
     after an EOF has been encountered. Changing it in the midst of
     scanning will have unexpected results since `flex' buffers its
     input; use `yyrestart()' instead. Once scanning terminates
     because an end-of-file has been seen, you can point `yyin' at the
     new input file and then call the scanner again to continue
     scanning.

`void yyrestart( FILE *new_file )'
     may be called to point `yyin' at the new input file. The
     switch-over to the new file is immediate (any previously
     buffered-up input is lost). Note that calling `yyrestart()' with
     `yyin' as an argument thus throws away the current input buffer
     and continues scanning the same input file.

`FILE *yyout'
     is the file to which `ECHO' actions are done. It can be reassigned
     by the user.

`YY_CURRENT_BUFFER'
     returns a `YY_BUFFER_STATE' handle to the current buffer.

`YY_START'
     returns an integer value corresponding to the current start
     condition. You can subsequently use this value with `BEGIN' to
     return to that start condition.


File: flex.info, Node: Yacc, Next: Scanner Options, Prev: User Values, Up: Top

Interfacing with Yacc
*********************

   One of the main uses of `flex' is as a companion to the `yacc'
parser-generator. `yacc' parsers expect to call a routine named
`yylex()' to find the next input token. The routine is supposed to
return the type of the next token as well as putting any associated
value in the global `yylval'. To use `flex' with `yacc', one specifies
the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
containing definitions of all the `%token's appearing in the `yacc'
input. This file is then included in the `flex' scanner. For example,
if one of the tokens is `TOK_NUMBER', part of the scanner might look
like:

     %{
     #include "y.tab.h"
     %}

     %%

     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
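
   The contract between the two tools is small enough to show with a
hand-written stand-in for the generated scanner. The sketch below is
plain C, not flex output: the numeric value given to `TOK_NUMBER' is an
assumption (the real one comes from `y.tab.h'), and `set_input()' and
`input_pos' are hypothetical helpers that replace flex's buffering.

```c
#include <ctype.h>
#include <stdlib.h>

#define TOK_NUMBER 258         /* assumption: normally from y.tab.h */

int yylval;                    /* value of the most recent token */
static const char *input_pos;  /* hypothetical cursor into the input */

void set_input(const char *text)
{
    input_pos = text;
}

/* Return the type of the next token, or 0 at end of input, and leave
 * any associated value in yylval, as a yacc parser expects. */
int yylex(void)
{
    while (*input_pos == ' ' || *input_pos == '\t' || *input_pos == '\n')
        input_pos++;                    /* skip whitespace */

    if (*input_pos == '\0')
        return 0;                       /* end of input */

    if (isdigit((unsigned char) *input_pos)) {
        char *end;

        /* plays the role of: yylval = atoi( yytext ); */
        yylval = (int) strtol(input_pos, &end, 10);
        input_pos = end;
        return TOK_NUMBER;
    }

    /* any other character stands for itself */
    return (unsigned char) *input_pos++;
}
```

   After `set_input("12 + 7")', successive calls to this `yylex()'
yield `TOK_NUMBER' (with `yylval' set to 12), the character `+',
`TOK_NUMBER' again (with `yylval' set to 7), and finally 0.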


File: flex.info, Node: Scanner Options, Next: Performance, Prev: Yacc, Up: Top

Scanner Options
***************

   The various `flex' options are categorized by function in the
following menu. If you want to look up a particular option by name,
*Note Index of Scanner Options::.

* Menu:

* Options for Specifing Filenames::
* Options Affecting Scanner Behavior::
* Code-Level And API Options::
* Options for Scanner Speed and Size::
* Debugging Options::
* Miscellaneous Options::

   Even though there are many scanner options, a typical scanner might
only specify the following options:

     %option 8bit reentrant bison-bridge
     %option warn nodefault
     %option yylineno
     %option outfile="scanner.c" header-file="scanner.h"

   The first line specifies the general type of scanner we want. The
second line specifies that we are being careful. The third line asks
flex to track line numbers. The last line tells flex what to name the
files. (The options can be specified in any order. We just divided
them.)

   `flex' also provides a mechanism for controlling options within the
scanner specification itself, rather than from the flex command-line.
This is done by including `%option' directives in the first section of
the scanner specification. You can specify multiple options with a
single `%option' directive, and multiple directives in the first
section of your flex input file.

   Most options are given simply as names, optionally preceded by the
word `no' (with no intervening whitespace) to negate their meaning.
The names are the same as their long-option equivalents (but without the
leading `--').

   `flex' scans your rule actions to determine whether you use the
`REJECT' or `yymore()' features. The `reject' and `yymore' options are
available to override its decision as to whether you use these
features, either by setting them (e.g., `%option reject') to indicate
the feature is indeed used, or unsetting them to indicate it actually
is not used (e.g., `%option noyymore').

   A number of options are available for lint purists who want to
suppress the appearance of unneeded routines in the generated scanner.
Each of the following, if unset (e.g., `%option nounput'), results in
the corresponding routine not appearing in the generated scanner:

     input, unput
     yy_push_state, yy_pop_state, yy_top_state
     yy_scan_buffer, yy_scan_bytes, yy_scan_string

     yyget_extra, yyset_extra, yyget_leng, yyget_text,
     yyget_lineno, yyset_lineno, yyget_in, yyset_in,
     yyget_out, yyset_out, yyget_lval, yyset_lval,
     yyget_lloc, yyset_lloc, yyget_debug, yyset_debug

(though `yy_push_state()' and friends won't appear anyway unless you
use `%option stack').


File: flex.info, Node: Options for Specifing Filenames, Next: Options Affecting Scanner Behavior, Prev: Scanner Options, Up: Scanner Options

Options for Specifing Filenames
===============================

`--header-file=FILE, `%option header-file="FILE"''
     instructs flex to write a C header to `FILE'. This file contains
     function prototypes, extern variables, and types used by the
     scanner. Only the external API is exported by the header file.
     Many macros that are usable from within scanner actions are not
     exported to the header file. This is due to namespace problems and
     the goal of a clean external API.

     While in the header, the macro `yyIN_HEADER' is defined, where `yy'
     is substituted with the appropriate prefix.

     The `--header-file' option is not compatible with the `--c++'
     option, since the C++ scanner provides its own header in
     `yyFlexLexer.h'.

`-oFILE, --outfile=FILE, `%option outfile="FILE"''
     directs flex to write the scanner to the file `FILE' instead of
     `lex.yy.c'. If you combine `--outfile' with the `--stdout' option,
     then the scanner is written to `stdout' but its `#line' directives
     (see the `-L' option) refer to the file `FILE'.

`-t, --stdout, `%option stdout''
     instructs `flex' to write the scanner it generates to standard
     output instead of `lex.yy.c'.

`-SFILE, --skel=FILE'
     overrides the default skeleton file from which `flex' constructs
     its scanners. You'll never need this option unless you are doing
     `flex' maintenance or development.

`--tables-file=FILE'
     Write serialized scanner DFA tables to FILE. The generated scanner
     will not contain the tables, and requires them to be loaded at
     runtime. *Note serialization::.

`--tables-verify'
     This option is for flex development. We document it here in case
     you stumble upon it by accident or in case you suspect some
     inconsistency in the serialized tables. Flex will serialize the
     scanner DFA tables but will also generate the in-code tables as it
     normally does. At runtime, the scanner will verify that the
     serialized tables match the in-code tables, instead of loading
     them.



893File: flex.info, Node: Options Affecting Scanner Behavior, Next: Code-Level And API Options, Prev: Options for Specifing Filenames, Up: Scanner Options
894
895Options Affecting Scanner Behavior
896==================================
897
898`-i, --case-insensitive, `%option case-insensitive''
899 instructs `flex' to generate a "case-insensitive" scanner. The
900 case of letters given in the `flex' input patterns will be ignored,
901 and tokens in the input will be matched regardless of case. The
902 matched text given in `yytext' will have the preserved case (i.e.,
903 it will not be folded). For tricky behavior, see *Note case and
904 character ranges::.
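
As an analogy only (this is POSIX `regcomp()' with `REG_ICASE', not a `flex' scanner), the following sketch shows the same idea: the pattern matches regardless of case, while the matched text keeps the case found in the input. The helper name `match_ignoring_case' is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: match `pattern` against `input` ignoring case and
 * copy the matched text, unchanged, into `out`. Returns 0 on a match and
 * -1 otherwise. */
int match_ignoring_case(const char *pattern, const char *input,
                        char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    rc = regexec(&re, input, 1, &m, 0);
    if (rc == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);
        if (len >= outsz)
            len = outsz - 1;
        memcpy(out, input + m.rm_so, len);   /* case is not folded */
        out[len] = '\0';
    }
    regfree(&re);
    return rc == 0 ? 0 : -1;
}
```

Matching the pattern `switch' against the input `SwItCh' succeeds and leaves `SwItCh' in the output buffer, much as `yytext' keeps the original case.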
905
906`-l, --lex-compat, `%option lex-compat''
907 turns on maximum compatibility with the original AT&T `lex'
908 implementation. Note that this does not mean _full_ compatibility.
909 Use of this option costs a considerable amount of performance, and
910 it cannot be used with the `--c++', `--full', `--fast', `-Cf', or
911 `-CF' options. For details on the compatibilities it provides, see
912 *Note Lex and Posix::. This option also results in the name
913 `YY_FLEX_LEX_COMPAT' being `#define''d in the generated scanner.
914
915`-B, --batch, `%option batch''
916 instructs `flex' to generate a "batch" scanner, the opposite of
917 _interactive_ scanners generated by `--interactive' (see below).
918 In general, you use `-B' when you are _certain_ that your scanner
919 will never be used interactively, and you want to squeeze a
920 _little_ more performance out of it. If your goal is instead to
921 squeeze out a _lot_ more performance, you should be using the
922 `-Cf' or `-CF' options, which turn on `--batch' automatically
923 anyway.
924
925`-I, --interactive, `%option interactive''
926 instructs `flex' to generate an interactive scanner. An
927 interactive scanner is one that only looks ahead to decide what
928 token has been matched if it absolutely must. It turns out that
929 always looking one extra character ahead, even if the scanner has
930 already seen enough text to disambiguate the current token, is a
931 bit faster than only looking ahead when necessary. But scanners
932 that always look ahead give dreadful interactive performance; for
933 example, when a user types a newline, it is not recognized as a
934 newline token until they enter _another_ token, which often means
935 typing in another whole line.
936
937 `flex' scanners default to `interactive' unless you use the `-Cf'
938 or `-CF' table-compression options (*note Performance::). That's
939 because if you're looking for high-performance you should be using
940 one of these options, so if you didn't, `flex' assumes you'd
941 rather trade off a bit of run-time performance for intuitive
942 interactive behavior. Note also that you _cannot_ use
943 `--interactive' in conjunction with `-Cf' or `-CF'. Thus, this
944 option is not really needed; it is on by default for all those
945 cases in which it is allowed.
946
947 You can force a scanner to _not_ be interactive by using `--batch'.
948
949`-7, --7bit, `%option 7bit''
950 instructs `flex' to generate a 7-bit scanner, i.e., one which can
951 only recognize 7-bit characters in its input. The advantage of
952 using `--7bit' is that the scanner's tables can be up to half the
953 size of those generated using `--8bit'. The disadvantage is
954 that such scanners often hang or crash if their input contains an
955 8-bit character.
956
957 Note, however, that unless you generate your scanner using the
958 `-Cf' or `-CF' table compression options, use of `--7bit' will
959 save only a small amount of table space, and make your scanner
960 considerably less portable. `Flex''s default behavior is to
961 generate an 8-bit scanner unless you use `-Cf' or `-CF', in
962 which case `flex' defaults to generating 7-bit scanners unless
963 your site was always configured to generate 8-bit scanners (as will
964 often be the case with non-USA sites). You can tell whether flex
965 generated a 7-bit or an 8-bit scanner by inspecting the flag
966 summary in the `--verbose' output as described above.
967
968 Note that if you use `-Cfe' or `-CFe' `flex' still defaults to
969 generating an 8-bit scanner, since usually with these compression
970 options full 8-bit tables are not much more expensive than 7-bit
971 tables.
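
Since a 7-bit scanner may misbehave on 8-bit input, a caller might want to check a buffer before scanning it. The following is a minimal sketch of such a check; `has_8bit_char' is a hypothetical helper and is not part of `flex'.

```c
#include <stddef.h>

/* Hypothetical pre-check: returns 1 if any byte in buf has its high bit
 * set (i.e., it is not a 7-bit character), and 0 otherwise. */
int has_8bit_char(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] & 0x80)
            return 1;
    return 0;
}
```

A program could refuse such input, or fall back to a scanner built with `--8bit', when this function returns 1.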
972
973`-8, --8bit, `%option 8bit''
974 instructs `flex' to generate an 8-bit scanner, i.e., one which can
975 recognize 8-bit characters. This flag is only needed for scanners
976 generated using `-Cf' or `-CF', as otherwise flex defaults to
977 generating an 8-bit scanner anyway.
978
979 See the discussion of `--7bit' above for `flex''s default behavior
980 and the tradeoffs between 7-bit and 8-bit scanners.
981
982`--default, `%option default''
983 generate the default rule.
984
985`--always-interactive, `%option always-interactive''
986 instructs flex to generate a scanner which always considers its
987 input _interactive_. Normally, on each new input file the scanner
988 calls `isatty()' in an attempt to determine whether the scanner's
989 input source is interactive and thus should be read a character at
990 a time. When this option is used, however, then no such call is
991 made.
992
993`--never-interactive, `%option never-interactive''
994 instructs flex to generate a scanner which never considers its
995 input interactive. This is the opposite of `always-interactive'.
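
The `isatty()' test mentioned under `--always-interactive' can be sketched roughly as follows. This is an illustration of the idea, not the code `flex' generates; the name `input_is_interactive' is hypothetical, and `fileno()' and `isatty()' are assumed to be available (POSIX).

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper mirroring the check described above: a stream is
 * treated as interactive when it is attached to a terminal. */
int input_is_interactive(FILE *fp)
{
    return fp != NULL && isatty(fileno(fp)) > 0;
}
```

With `--always-interactive' or `--never-interactive' the answer is fixed when the scanner is generated, so no such call is needed at run time.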
996
997`-X, --posix, `%option posix''
998 turns on maximum compatibility with the POSIX 1003.2-1992
999 definition of `lex'. Since `flex' was originally designed to
1000 implement the POSIX definition of `lex' this generally involves
1001 very few changes in behavior. At the current writing the known
1002 differences between `flex' and the POSIX standard are:
1003
1004 * In POSIX and AT&T `lex', the repeat operator, `{}', has lower
1005 precedence than concatenation (thus `ab{3}' yields `ababab').
1006 Most POSIX utilities use an Extended Regular Expression (ERE)
1007 precedence that has the precedence of the repeat operator
1008 higher than concatenation (which causes `ab{3}' to yield
1009 `abbb'). By default, `flex' places the precedence of the
1010 repeat operator higher than concatenation which matches the
1011 ERE processing of other POSIX utilities. When either
1012 `--posix' or `-l' is specified, `flex' will use the
1013 traditional AT&T and POSIX-compliant precedence for the
1014 repeat operator where concatenation has higher precedence
1015 than the repeat operator.
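
The ERE precedence that `flex' follows by default can be observed with the POSIX `regcomp()' interface. The sketch below uses that library rather than `flex' itself, and the helper name `ere_matches' is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the POSIX extended
 * regular expression `pattern`, 0 otherwise. The caller anchors the
 * pattern with ^ and $ when a whole-string match is wanted. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Under ERE rules, `^ab{3}$' matches `abbb' but not `ababab'. To match `ababab', the group must be written explicitly, as in `^(ab){3}$'.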
1016
1017`--stack, `%option stack''
1018 enables the use of start condition stacks (*note Start
1019 Conditions::).
1020
1021`--stdinit, `%option stdinit''
1022 if set (i.e., %option stdinit) initializes `yyin' and `yyout' to
1023 `stdin' and `stdout', instead of the default of `NULL'. Some
1024 existing `lex' programs depend on this behavior, even though it is
1025 not compliant with ANSI C, which does not require `stdin' and
1026 `stdout' to be compile-time constant. In a reentrant scanner,
1027 however, this is not a problem since initialization is performed
1028 in `yylex_init' at runtime.
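
The portability concern is that a file-scope initializer must be a constant expression. A common workaround, sketched here with the hypothetical names `scan_in' and `scan_input', is to start from `NULL' and fill in the stream at run time.

```c
#include <stdio.h>

/* Hypothetical stand-in for a scanner's input stream. A file-scope
 * initializer such as `FILE *scan_in = stdin;` is not portable, because
 * ANSI C does not require stdin to be a compile-time constant. */
static FILE *scan_in = NULL;

/* Resolve the stream lazily, at run time, on first use. */
FILE *scan_input(void)
{
    if (scan_in == NULL)
        scan_in = stdin;
    return scan_in;
}
```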
1029
1030`--yylineno, `%option yylineno''
1031 directs `flex' to generate a scanner that maintains the number of
1032 the current line read from its input in the global variable
1033 `yylineno'. This option is implied by `%option lex-compat'. In a
1034 reentrant C scanner, the macro `yylineno' is accessible regardless
1035 of the value of `%option yylineno'; however, its value is not
1036 modified by `flex' unless `%option yylineno' is enabled.
1037
1038`--yywrap, `%option yywrap''
1039 if unset (i.e., `--noyywrap'), makes the scanner not call
1040 `yywrap()' upon an end-of-file, but simply assume that there are no
1041 more files to scan (until the user points `yyin' at a new file and
1042 calls `yylex()' again).
1043
1044
1045
1046File: flex.info, Node: Code-Level And API Options, Next: Options for Scanner Speed and Size, Prev: Options Affecting Scanner Behavior, Up: Scanner Options
1047
1048Code-Level And API Options
1049==========================
1050
1051`--ansi-definitions, `%option ansi-definitions''
1052 instructs flex to generate ANSI C99 definitions for functions.
1053 This option is enabled by default. If `%option
1054 noansi-definitions' is specified, then the obsolete style is
1055 generated.
1056
1057`--ansi-prototypes, `%option ansi-prototypes''
1058 instructs flex to generate ANSI C99 prototypes for functions.
1059 This option is enabled by default. If `noansi-prototypes' is
1060 specified, then prototypes will have empty parameter lists.
1061
1062`--bison-bridge, `%option bison-bridge''
1063 instructs flex to generate a C scanner that is meant to be called
1064 by a `GNU bison' parser. The scanner has minor API changes for
1065 `bison' compatibility. In particular, the declaration of `yylex'
1066 is modified to take an additional parameter, `yylval'. *Note
1067 Bison Bridge::.
1068
1069`--bison-locations, `%option bison-locations''
1070 instructs flex that `GNU bison' `%locations' are being used. This
1071 means `yylex' will be passed an additional parameter, `yylloc'.
1072 This option implies `%option bison-bridge'. *Note Bison Bridge::.
1073
1074`-L, --noline, `%option noline''
1075 instructs `flex' not to generate `#line' directives. Without this
1076 option, `flex' peppers the generated scanner with `#line'
1077 directives so error messages in the actions will be correctly
1078 located with respect to either the original `flex' input file (if
1079 the errors are due to code in the input file), or `lex.yy.c' (if
1080 the errors are `flex''s fault - you should report these sorts of
1081 errors to the email address given in *Note Reporting Bugs::).
1082
1083`-R, --reentrant, `%option reentrant''
1084 instructs flex to generate a reentrant C scanner. The generated
1085 scanner may safely be used in a multi-threaded environment. The
1086 API for a reentrant scanner is different than for a non-reentrant
1087 scanner (*note Reentrant::). Because of the API difference between
1088 reentrant and non-reentrant `flex' scanners, non-reentrant flex
1089 code must be modified before it is suitable for use with this
1090 option. This option is not compatible with the `--c++' option.
1091
1092 The option `--reentrant' does not affect the performance of the
1093 scanner.
1094
1095`-+, --c++, `%option c++''
1096 specifies that you want flex to generate a C++ scanner class.
1097 *Note Cxx::, for details.
1098
1099`--array, `%option array''
1100 specifies that you want `yytext' to be an array instead of a `char *'.
1101
1102`--pointer, `%option pointer''
1103 specifies that `yytext' should be a `char *', not an array. The
1104 default is `char *'.
1105
1106`-PPREFIX, --prefix=PREFIX, `%option prefix="PREFIX"''
1107 changes the default `yy' prefix used by `flex' for all
1108 globally-visible variable and function names to instead be
1109 `PREFIX'. For example, `--prefix=foo' changes the name of
1110 `yytext' to `footext'. It also changes the name of the default
1111 output file from `lex.yy.c' to `lex.foo.c'. Here is a partial
1112 list of the names affected:
1113
1114
1115 yy_create_buffer
1116 yy_delete_buffer
1117 yy_flex_debug
1118 yy_init_buffer
1119 yy_flush_buffer
1120 yy_load_buffer_state
1121 yy_switch_to_buffer
1122 yyin
1123 yyleng
1124 yylex
1125 yylineno
1126 yyout
1127 yyrestart
1128 yytext
1129 yywrap
1130 yyalloc
1131 yyrealloc
1132 yyfree
1133
1134 (If you are using a C++ scanner, then only `yywrap' and
1135 `yyFlexLexer' are affected.) Within your scanner itself, you can
1136 still refer to the global variables and functions using either
1137 version of their name; but externally, they have the modified name.
1138
1139 This option lets you easily link together multiple `flex' programs
1140 into the same executable. Note, though, that using this option
1141 also renames `yywrap()', so you now _must_ either provide your own
1142 (appropriately-named) version of the routine for your scanner, or
1143 use `%option noyywrap', as linking with `-lfl' no longer provides
1144 one for you by default.
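
For instance, with `--prefix=foo' and without `%option noyywrap', the program must supply the renamed routine itself. A minimal version, assuming there is never another input file to continue with, might look like this:

```c
/* With `--prefix=foo', the scanner calls foowrap() instead of yywrap().
 * Returning 1 tells the scanner that there is no more input to scan. */
int foowrap(void)
{
    return 1;
}
```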
1145
1146`--main, `%option main''
1147 directs flex to provide a default `main()' program for the
1148 scanner, which simply calls `yylex()'. This option implies
1149 `noyywrap' (see above).
1150
1151`--nounistd, `%option nounistd''
1152 suppresses inclusion of the non-ANSI header file `unistd.h'. This
1153 option is meant to target environments in which `unistd.h' does
1154 not exist. Be aware that certain options may cause flex to
1155 generate code that relies on functions normally found in
1156 `unistd.h' (e.g., `isatty()', `read()'). If you wish to use these
1157 functions, you will have to inform your compiler where to find
1158 them. *Note option-always-interactive::. *Note option-read::.
1159
1160`--yyclass, `%option yyclass="NAME"''
1161 only applies when generating a C++ scanner (the `--c++' option).
1162 It informs `flex' that you have derived `foo' as a subclass of
1163 `yyFlexLexer', so `flex' will place your actions in the member
1164 function `foo::yylex()' instead of `yyFlexLexer::yylex()'. It
1165 also generates a `yyFlexLexer::yylex()' member function that emits
1166 a run-time error (by invoking `yyFlexLexer::LexerError()') if
1167 called. *Note Cxx::.
1168
1169
1170
1171File: flex.info, Node: Options for Scanner Speed and Size, Next: Debugging Options, Prev: Code-Level And API Options, Up: Scanner Options
1172
1173Options for Scanner Speed and Size
1174==================================
1175
1176`-C[aefFmr]'
1177 controls the degree of table compression and, more generally,
1178 trade-offs between small scanners and fast scanners.
1179
1180 `-C'
1181 A lone `-C' specifies that the scanner tables should be
1182 compressed but neither equivalence classes nor
1183 meta-equivalence classes should be used.
1184
1185 `-Ca, --align, `%option align''
1186 ("align") instructs flex to trade off larger tables in the
1187 generated scanner for faster performance because the elements
1188 of the tables are better aligned for memory access and
1189 computation. On some RISC architectures, fetching and
1190 manipulating longwords is more efficient than with
1191 smaller-sized units such as shortwords. This option can
1192 quadruple the size of the tables used by your scanner.
1193
1194 `-Ce, --ecs, `%option ecs''
1195 directs `flex' to construct "equivalence classes", i.e., sets
1196 of characters which have identical lexical properties (for
1197 example, if the only appearance of digits in the `flex' input
1198 is in the character class "[0-9]" then the digits '0', '1',
1199 ..., '9' will all be put in the same equivalence class).
1200 Equivalence classes usually give dramatic reductions in the
1201 final table/object file sizes (typically a factor of 2-5) and
1202 are pretty cheap performance-wise (one array look-up per
1203 character scanned).
1204
1205 `-Cf'
1206 specifies that the "full" scanner tables should be generated -
1207 `flex' should not compress the tables by taking advantage of
1208 similar transition functions for different states.
1209
1210 `-CF'
1211 specifies that the alternate fast scanner representation
1212 (described below under the `--fast' flag) should be used.
1213 This option cannot be used with `--c++'.
1214
1215 `-Cm, --meta-ecs, `%option meta-ecs''
1216 directs `flex' to construct "meta-equivalence classes", which
1217 are sets of equivalence classes (or characters, if equivalence
1218 classes are not being used) that are commonly used together.
1219 Meta-equivalence classes are often a big win when using
1220 compressed tables, but they have a moderate performance
1221 impact (one or two `if' tests and one array look-up per
1222 character scanned).
1223
1224 `-Cr, --read, `%option read''
1225 causes the generated scanner to _bypass_ use of the standard
1226 I/O library (`stdio') for input. Instead of calling
1227 `fread()' or `getc()', the scanner will use the `read()'
1228 system call, resulting in a performance gain which varies
1229 from system to system, but in general is probably negligible
1230 unless you are also using `-Cf' or `-CF'. Using `-Cr' can
1231 cause strange behavior if, for example, you read from `yyin'
1232 using `stdio' prior to calling the scanner (because the
1233 scanner will miss whatever text your previous reads left in
1234 the `stdio' input buffer). `-Cr' has no effect if you define
1235 `YY_INPUT()' (*note Generated Scanner::).
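
To make the notion of equivalence classes from `-Ce' concrete, here is a small sketch that groups characters by the set of character classes they belong to. It illustrates the idea only and is not the algorithm `flex' uses; the names `build_equiv_classes' and `NCHARS' are hypothetical.

```c
#include <limits.h>

#define NCHARS (UCHAR_MAX + 1)

/* Hypothetical sketch: each character class is given as a membership
 * table; characters that belong to exactly the same classes receive the
 * same equivalence-class number. At most 32 classes are supported.
 * Fills ec[] and returns the number of equivalence classes found. */
int build_equiv_classes(unsigned char member[][NCHARS], int nsets,
                        int ec[NCHARS])
{
    unsigned long sig[NCHARS];   /* one bit per class containing c */
    int nclasses = 0;
    int c, s, d;

    for (c = 0; c < NCHARS; c++) {
        sig[c] = 0;
        for (s = 0; s < nsets; s++)
            if (member[s][c])
                sig[c] |= 1UL << s;
    }
    for (c = 0; c < NCHARS; c++) {
        ec[c] = -1;
        for (d = 0; d < c; d++)          /* reuse an earlier class */
            if (sig[d] == sig[c]) {
                ec[c] = ec[d];
                break;
            }
        if (ec[c] < 0)
            ec[c] = nclasses++;
    }
    return nclasses;
}
```

Given the two classes `[0-9]' and `[a-z]', all digits land in one equivalence class, all lowercase letters in another, and every remaining character in a third, so a transition table needs three columns rather than one per character.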
1236
1237 The options `-Cf' or `-CF' and `-Cm' do not make sense together -
1238 there is no opportunity for meta-equivalence classes if the table
1239 is not being compressed. Otherwise the options may be freely
1240 mixed, and are cumulative.
1241
1242 The default setting is `-Cem', which specifies that `flex' should
1243 generate equivalence classes and meta-equivalence classes. This
1244 setting provides the highest degree of table compression. You can
1245 get faster-executing scanners at the cost of larger tables,
1246 with the following generally being true:
1247
1248
1249 slowest & smallest
1250 -Cem
1251 -Cm
1252 -Ce
1253 -C
1254 -C{f,F}e
1255 -C{f,F}
1256 -C{f,F}a
1257 fastest & largest
1258
1259 Note that scanners with the smallest tables are usually generated
1260 and compiled the quickest, so during development you will usually
1261 want to use the default, maximal compression.
1262
1263 `-Cfe' is often a good compromise between speed and size for
1264 production scanners.
1265
1266`-f, --full, `%option full''
1267 specifies "fast scanner". No table compression is done and
1268 `stdio' is bypassed. The result is large but fast. This option
1269 is equivalent to `-Cfr'.
1270
1271`-F, --fast, `%option fast''
1272 specifies that the _fast_ scanner table representation should be
1273 used (and `stdio' bypassed). This representation is about as fast
1274 as the full table representation `--full', and for some sets of
1275 patterns will be considerably smaller (and for others, larger). In
1276 general, if the pattern set contains both _keywords_ and a
1277 catch-all, _identifier_ rule, such as in the set:
1278
1279
1280 "case" return TOK_CASE;
1281 "switch" return TOK_SWITCH;
1282 ...
1283 "default" return TOK_DEFAULT;
1284 [a-z]+ return TOK_ID;
1285
1286 then you're better off using the full table representation. If
1287 only the _identifier_ rule is present and you then use a hash
1288 table or some such to detect the keywords, you're better off using
1289 `--fast'.
1290
1291 This option is equivalent to `-CFr' (see above). It cannot be used
1292 with `--c++'.
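
The "identifier rule plus keyword lookup" arrangement might be sketched as follows. A sorted table searched with `bsearch()' stands in for the hash table mentioned above; the token values and the helper name `classify_identifier' are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Token codes named after the example above; the values are made up. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name for bsearch(). */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)elem)->name);
}

/* Hypothetical helper, meant to be called from the action of the single
 * identifier rule: returns the keyword's token, or TOK_ID. */
int classify_identifier(const char *text)
{
    const struct keyword *kw = bsearch(text, keywords,
                                       sizeof keywords / sizeof keywords[0],
                                       sizeof keywords[0], compare_keyword);
    return kw != NULL ? kw->token : TOK_ID;
}
```

The scanner would then keep only the `[a-z]+' rule, with an action along the lines of `return classify_identifier(yytext);'.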
1293
1294