1 | This is flex.info, produced by makeinfo version 4.5 from flex.texi.
|
---|
2 |
|
---|
3 | INFO-DIR-SECTION Programming
|
---|
4 | START-INFO-DIR-ENTRY
|
---|
5 | * flex: (flex). Fast lexical analyzer generator (lex replacement).
|
---|
6 | END-INFO-DIR-ENTRY
|
---|
7 |
|
---|
8 |
|
---|
9 | The flex manual is placed under the same licensing conditions as the
|
---|
10 | rest of flex:
|
---|
11 |
|
---|
12 | Copyright (C) 1990, 1997 The Regents of the University of California.
|
---|
13 | All rights reserved.
|
---|
14 |
|
---|
15 | This code is derived from software contributed to Berkeley by Vern
|
---|
16 | Paxson.
|
---|
17 |
|
---|
18 | The United States Government has rights in this work pursuant to
|
---|
19 | contract no. DE-AC03-76SF00098 between the United States Department of
|
---|
20 | Energy and the University of California.
|
---|
21 |
|
---|
22 | Redistribution and use in source and binary forms, with or without
|
---|
23 | modification, are permitted provided that the following conditions are
|
---|
24 | met:
|
---|
25 |
|
---|
26 | 1. Redistributions of source code must retain the above copyright
|
---|
27 | notice, this list of conditions and the following disclaimer.
|
---|
28 |
|
---|
29 | 2. Redistributions in binary form must reproduce the above copyright
|
---|
30 | notice, this list of conditions and the following disclaimer in the
|
---|
31 | documentation and/or other materials provided with the
|
---|
32 | distribution.
|
---|
33 | Neither the name of the University nor the names of its contributors
|
---|
34 | may be used to endorse or promote products derived from this software
|
---|
35 | without specific prior written permission.
|
---|
36 |
|
---|
37 | THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
|
---|
38 | WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
---|
39 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
---|
40 |
|
---|
41 | File: flex.info, Node: Start Conditions, Next: Multiple Input Buffers, Prev: Generated Scanner, Up: Top
|
---|
42 |
|
---|
43 | Start Conditions
|
---|
44 | ****************
|
---|
45 |
|
---|
46 | `flex' provides a mechanism for conditionally activating rules. Any
|
---|
47 | rule whose pattern is prefixed with `<sc>' will only be active when the
|
---|
48 | scanner is in the "start condition" named `sc'. For example,
|
---|
49 |
|
---|
50 |
|
---|
51 | <STRING>[^"]* { /* eat up the string body ... */
|
---|
52 | ...
|
---|
53 | }
|
---|
54 |
|
---|
55 | will be active only when the scanner is in the `STRING' start
|
---|
56 | condition, and
|
---|
57 |
|
---|
58 |
|
---|
59 | <INITIAL,STRING,QUOTE>\. { /* handle an escape ... */
|
---|
60 | ...
|
---|
61 | }
|
---|
62 |
|
---|
63 | will be active only when the current start condition is either
|
---|
64 | `INITIAL', `STRING', or `QUOTE'.
|
---|
65 |
|
---|
66 | Start conditions are declared in the definitions (first) section of
|
---|
67 | the input using unindented lines beginning with either `%s' or `%x'
|
---|
68 | followed by a list of names. The former declares "inclusive" start
|
---|
69 | conditions, the latter "exclusive" start conditions. A start condition
|
---|
70 | is activated using the `BEGIN' action. Until the next `BEGIN' action
|
---|
71 | is executed, rules with the given start condition will be active and
|
---|
72 | rules with other start conditions will be inactive. If the start
|
---|
73 | condition is inclusive, then rules with no start conditions at all will
|
---|
74 | also be active. If it is exclusive, then _only_ rules qualified with
|
---|
75 | the start condition will be active. A set of rules contingent on the
|
---|
76 | same exclusive start condition describe a scanner which is independent
|
---|
77 | of any of the other rules in the `flex' input. Because of this,
|
---|
78 | exclusive start conditions make it easy to specify "mini-scanners"
|
---|
79 | which scan portions of the input that are syntactically different from
|
---|
80 | the rest (e.g., comments).
|
---|
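The activation rule described above can be sketched in plain C. This is only a model for illustration, not how `flex' implements it: the names `sc_kind', `struct rule' and `rule_is_active' are hypothetical, and the caller is assumed to pass `SC_INCLUSIVE' for `INITIAL', where unqualified rules are active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of rule activation; not flex internals. */
enum sc_kind { SC_INCLUSIVE, SC_EXCLUSIVE };

struct rule {
    bool any_sc;      /* rule was written as <*>pattern */
    const int *scs;   /* start conditions listed in <...>, or NULL */
    size_t n_scs;     /* number of entries in scs; 0 for an unqualified rule */
};

/* Returns true if rule r can match while the scanner is in start
 * condition `current', which was declared with %s or %x per `kind'. */
static bool rule_is_active(const struct rule *r, int current, enum sc_kind kind)
{
    assert(r != NULL);
    if (r->any_sc)
        return true;                  /* <*> matches every start condition */
    if (r->n_scs == 0)
        return kind == SC_INCLUSIVE;  /* unqualified rules need an inclusive SC */
    for (size_t i = 0; i < r->n_scs; i++)
        if (r->scs[i] == current)
            return true;              /* rule explicitly names this SC */
    return false;
}
```

Under this model an unqualified rule is rejected in an exclusive start condition, which is the property that makes exclusive conditions suitable for "mini-scanners".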
81 |
|
---|
82 | If the distinction between inclusive and exclusive start conditions
|
---|
83 | is still a little vague, here's a simple example illustrating the
|
---|
84 | connection between the two. The set of rules:
|
---|
85 |
|
---|
86 |
|
---|
87 | %s example
|
---|
88 | %%
|
---|
89 |
|
---|
90 | <example>foo do_something();
|
---|
91 |
|
---|
92 | bar something_else();
|
---|
93 |
|
---|
94 | is equivalent to
|
---|
95 |
|
---|
96 |
|
---|
97 | %x example
|
---|
98 | %%
|
---|
99 |
|
---|
100 | <example>foo do_something();
|
---|
101 |
|
---|
102 | <INITIAL,example>bar something_else();
|
---|
103 |
|
---|
104 | Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
|
---|
105 | second example wouldn't be active (i.e., couldn't match) when in start
|
---|
106 | condition `example'. If we just used `<example>' to qualify `bar',
|
---|
107 | though, then it would only be active in `example' and not in `INITIAL',
|
---|
108 | while in the first example it's active in both, because in the first
|
---|
109 | example the `example' start condition is an inclusive `(%s)' start
|
---|
110 | condition.
|
---|
111 |
|
---|
112 | Also note that the special start-condition specifier `<*>' matches
|
---|
113 | every start condition. Thus, the above example could also have been
|
---|
114 | written:
|
---|
115 |
|
---|
116 |
|
---|
117 | %x example
|
---|
118 | %%
|
---|
119 |
|
---|
120 | <example>foo do_something();
|
---|
121 |
|
---|
122 | <*>bar something_else();
|
---|
123 |
|
---|
124 | The default rule (to `ECHO' any unmatched character) remains active
|
---|
125 | in start conditions. It is equivalent to:
|
---|
126 |
|
---|
127 |
|
---|
128 | <*>.|\n ECHO;
|
---|
129 |
|
---|
130 | `BEGIN(0)' returns to the original state where only the rules with
|
---|
131 | no start conditions are active. This state can also be referred to as
|
---|
132 | the start-condition `INITIAL', so `BEGIN(INITIAL)' is equivalent to
|
---|
133 | `BEGIN(0)'. (The parentheses around the start condition name are not
|
---|
134 | required but are considered good style.)
|
---|
135 |
|
---|
136 | `BEGIN' actions can also be given as indented code at the beginning
|
---|
137 | of the rules section. For example, the following will cause the scanner
|
---|
138 | to enter the `SPECIAL' start condition whenever `yylex()' is called and
|
---|
139 | the global variable `enter_special' is true:
|
---|
140 |
|
---|
141 |
|
---|
142 | int enter_special;
|
---|
143 |
|
---|
144 | %x SPECIAL
|
---|
145 | %%
|
---|
146 | if ( enter_special )
|
---|
147 | BEGIN(SPECIAL);
|
---|
148 |
|
---|
149 | <SPECIAL>blahblahblah
|
---|
150 | ...more rules follow...
|
---|
151 |
|
---|
152 | To illustrate the uses of start conditions, here is a scanner which
|
---|
153 | provides two different interpretations of a string like `123.456'. By
|
---|
154 | default it will treat it as three tokens, the integer `123', a dot
|
---|
155 | (`.'), and the integer `456'. But if the string is preceded earlier in
|
---|
156 | the line by the string `expect-floats' it will treat it as a single
|
---|
157 | token, the floating-point number `123.456':
|
---|
158 |
|
---|
159 |
|
---|
160 | %{
|
---|
161 | #include <math.h>
|
---|
162 | %}
|
---|
163 | %s expect
|
---|
164 |
|
---|
165 | %%
|
---|
166 | expect-floats BEGIN(expect);
|
---|
167 |
|
---|
168 | <expect>[0-9]+"."[0-9]+ {
|
---|
169 | printf( "found a float, = %f\n",
|
---|
170 | atof( yytext ) );
|
---|
171 | }
|
---|
172 | <expect>\n {
|
---|
173 | /* that's the end of the line, so
|
---|
174 | * we need another "expect-floats"
|
---|
175 | * before we'll recognize any more
|
---|
176 | * numbers
|
---|
177 | */
|
---|
178 | BEGIN(INITIAL);
|
---|
179 | }
|
---|
180 |
|
---|
181 | [0-9]+ {
|
---|
182 | printf( "found an integer, = %d\n",
|
---|
183 | atoi( yytext ) );
|
---|
184 | }
|
---|
185 |
|
---|
186 | "." printf( "found a dot\n" );
|
---|
187 |
|
---|
188 | Here is a scanner which recognizes (and discards) C comments while
|
---|
189 | maintaining a count of the current input line.
|
---|
190 |
|
---|
191 |
|
---|
192 | %x comment
|
---|
193 | %%
|
---|
194 | int line_num = 1;
|
---|
195 |
|
---|
196 | "/*" BEGIN(comment);
|
---|
197 |
|
---|
198 | <comment>[^*\n]* /* eat anything that's not a '*' */
|
---|
199 | <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
|
---|
200 | <comment>\n ++line_num;
|
---|
201 | <comment>"*"+"/" BEGIN(INITIAL);
|
---|
202 |
|
---|
203 | This scanner goes to a bit of trouble to match as much text as
|
---|
204 | possible with each rule. In general, when attempting to write a
|
---|
205 | high-speed scanner, try to match as much as possible in each rule, as it's
|
---|
206 | a big win.
|
---|
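For comparison, here is a hand-written C sketch of what the comment rules above do. It is an illustration only: `strip_comments', `ST_INITIAL' and `ST_COMMENT' are hypothetical names, and the `state' variable stands in for the exclusive `comment' start condition. As in the `flex' rules shown, only newlines inside comments are counted.

```c
#include <assert.h>
#include <stddef.h>

enum { ST_INITIAL, ST_COMMENT };  /* hypothetical stand-ins for start conditions */

/* Copies src to dst with C comments removed and returns the line counter,
 * which starts at 1 and is advanced for each newline inside a comment.
 * dst must have room for strlen(src) + 1 bytes. */
static int strip_comments(const char *src, char *dst)
{
    int state = ST_INITIAL;
    int line_num = 1;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (state == ST_INITIAL) {
            if (src[0] == '/' && src[1] == '*') {
                state = ST_COMMENT;      /* like BEGIN(comment) */
                src += 2;
            } else {
                *dst++ = *src++;         /* like the default ECHO rule */
            }
        } else {
            if (src[0] == '*' && src[1] == '/') {
                state = ST_INITIAL;      /* like BEGIN(INITIAL) */
                src += 2;
            } else {
                if (*src == '\n')
                    ++line_num;
                src++;                   /* discard comment text */
            }
        }
    }
    *dst = '\0';
    return line_num;
}
```

The sketch handles one character per iteration, whereas the `flex' rules consume long runs of comment text in a single match; that difference is the point of the advice above.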
207 |
|
---|
208 | Note that start condition names are really integer values and can
|
---|
209 | be stored as such. Thus, the above could be extended in the following
|
---|
210 | fashion:
|
---|
211 |
|
---|
212 |
|
---|
213 | %x comment foo
|
---|
214 | %%
|
---|
215 | int line_num = 1;
|
---|
216 | int comment_caller;
|
---|
217 |
|
---|
218 | "/*" {
|
---|
219 | comment_caller = INITIAL;
|
---|
220 | BEGIN(comment);
|
---|
221 | }
|
---|
222 |
|
---|
223 | ...
|
---|
224 |
|
---|
225 | <foo>"/*" {
|
---|
226 | comment_caller = foo;
|
---|
227 | BEGIN(comment);
|
---|
228 | }
|
---|
229 |
|
---|
230 | <comment>[^*\n]* /* eat anything that's not a '*' */
|
---|
231 | <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
|
---|
232 | <comment>\n ++line_num;
|
---|
233 | <comment>"*"+"/" BEGIN(comment_caller);
|
---|
234 |
|
---|
235 | Furthermore, you can access the current start condition using the
|
---|
236 | integer-valued `YY_START' macro. For example, the above assignments to
|
---|
237 | `comment_caller' could instead be written
|
---|
238 |
|
---|
239 |
|
---|
240 | comment_caller = YY_START;
|
---|
241 |
|
---|
242 | Flex provides `YYSTATE' as an alias for `YY_START' (since that is
|
---|
243 | what's used by AT&T `lex').
|
---|
244 |
|
---|
245 | For historical reasons, start conditions do not have their own
|
---|
246 | name-space within the generated scanner. The start condition names are
|
---|
247 | unmodified in the generated scanner and generated header. *Note
|
---|
248 | option-header::. *Note option-prefix::.
|
---|
249 |
|
---|
250 | Finally, here's an example of how to match C-style quoted strings
|
---|
251 | using exclusive start conditions, including expanded escape sequences
|
---|
252 | (but not including checking for a string that's too long):
|
---|
253 |
|
---|
254 |
|
---|
255 | %x str
|
---|
256 |
|
---|
257 | %%
|
---|
258 | char string_buf[MAX_STR_CONST];
|
---|
259 | char *string_buf_ptr;
|
---|
260 |
|
---|
261 |
|
---|
262 | \" string_buf_ptr = string_buf; BEGIN(str);
|
---|
263 |
|
---|
264 | <str>\" { /* saw closing quote - all done */
|
---|
265 | BEGIN(INITIAL);
|
---|
266 | *string_buf_ptr = '\0';
|
---|
267 | /* return string constant token type and
|
---|
268 | * value to parser
|
---|
269 | */
|
---|
270 | }
|
---|
271 |
|
---|
272 | <str>\n {
|
---|
273 | /* error - unterminated string constant */
|
---|
274 | /* generate error message */
|
---|
275 | }
|
---|
276 |
|
---|
277 | <str>\\[0-7]{1,3} {
|
---|
278 | /* octal escape sequence */
|
---|
279 | unsigned int result;
|
---|
280 |
|
---|
281 | (void) sscanf( yytext + 1, "%o", &result );
|
---|
282 |
|
---|
283 | if ( result > 0xff )
|
---|
284 | { /* error, constant is out-of-bounds */ }
|
---|
285 |
|
---|
286 | *string_buf_ptr++ = result;
|
---|
287 | }
|
---|
288 |
|
---|
289 | <str>\\[0-9]+ {
|
---|
290 | /* generate error - bad escape sequence; something
|
---|
291 | * like '\48' or '\0777777'
|
---|
292 | */
|
---|
293 | }
|
---|
294 |
|
---|
295 | <str>\\n *string_buf_ptr++ = '\n';
|
---|
296 | <str>\\t *string_buf_ptr++ = '\t';
|
---|
297 | <str>\\r *string_buf_ptr++ = '\r';
|
---|
298 | <str>\\b *string_buf_ptr++ = '\b';
|
---|
299 | <str>\\f *string_buf_ptr++ = '\f';
|
---|
300 |
|
---|
301 | <str>\\(.|\n) *string_buf_ptr++ = yytext[1];
|
---|
302 |
|
---|
303 | <str>[^\\\n\"]+ {
|
---|
304 | char *yptr = yytext;
|
---|
305 |
|
---|
306 | while ( *yptr )
|
---|
307 | *string_buf_ptr++ = *yptr++;
|
---|
308 | }
|
---|
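The octal-escape rule in the example above can be sketched as a standalone helper, which makes the range check easier to test in isolation. The function name `decode_octal_escape' and its error convention (returning -1) are assumptions made for this sketch; they are not part of `flex'.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: decodes text matched by \\[0-7]{1,3}, i.e. a
 * backslash followed by one to three octal digits.
 * Returns the byte value (0..255), or -1 if the text is malformed or
 * the value does not fit in one byte. */
static int decode_octal_escape(const char *text)
{
    unsigned int result;   /* %o requires a pointer to unsigned int */

    assert(text != NULL && text[0] == '\\');
    if (sscanf(text + 1, "%o", &result) != 1)
        return -1;         /* no octal digits after the backslash */
    if (result > 0xff)
        return -1;         /* e.g. \777 is 511, too large for a byte */
    return (int) result;
}
```

In a scanner action, the result would be checked for -1 before being stored through `string_buf_ptr'.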
309 |
|
---|
310 | Often, such as in some of the examples above, you wind up writing a
|
---|
311 | whole bunch of rules all preceded by the same start condition(s). Flex
|
---|
312 | makes this a little easier and cleaner by introducing a notion of start
|
---|
313 | condition "scope". A start condition scope is begun with:
|
---|
314 |
|
---|
315 |
|
---|
316 | <SCs>{
|
---|
317 |
|
---|
318 | where `SCs' is a list of one or more start conditions. Inside the
|
---|
319 | start condition scope, every rule automatically has the prefix `<SCs>'
|
---|
320 | applied to it, until a `}' which matches the initial `{'. So, for
|
---|
321 | example,
|
---|
322 |
|
---|
323 |
|
---|
324 | <ESC>{
|
---|
325 | "\\n" return '\n';
|
---|
326 | "\\r" return '\r';
|
---|
327 | "\\f" return '\f';
|
---|
328 | "\\0" return '\0';
|
---|
329 | }
|
---|
330 |
|
---|
331 | is equivalent to:
|
---|
332 |
|
---|
333 |
|
---|
334 | <ESC>"\\n" return '\n';
|
---|
335 | <ESC>"\\r" return '\r';
|
---|
336 | <ESC>"\\f" return '\f';
|
---|
337 | <ESC>"\\0" return '\0';
|
---|
338 |
|
---|
339 | Start condition scopes may be nested.
|
---|
340 |
|
---|
341 | The following routines are available for manipulating stacks of
|
---|
342 | start conditions:
|
---|
343 |
|
---|
344 | - Function: void yy_push_state ( int `new_state' )
|
---|
345 | pushes the current start condition onto the top of the start
|
---|
346 | condition stack and switches to `new_state' as though you had used
|
---|
347 | `BEGIN new_state' (recall that start condition names are also
|
---|
348 | integers).
|
---|
349 |
|
---|
350 | - Function: void yy_pop_state ()
|
---|
351 | pops the top of the stack and switches to it via `BEGIN'.
|
---|
352 |
|
---|
353 | - Function: int yy_top_state ()
|
---|
354 | returns the top of the stack without altering the stack's contents.
|
---|
355 |
|
---|
356 | The start condition stack grows dynamically and so has no built-in
|
---|
357 | size limitation. If memory is exhausted, program execution aborts.
|
---|
358 |
|
---|
359 | To use start condition stacks, your scanner must include a `%option
|
---|
360 | stack' directive (*note Scanner Options::).
|
---|
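The behaviour of the three stack routines can be modelled in a few lines of plain C. This is a sketch of the semantics described above, not the code `flex' generates: the names `model_push_state', `model_pop_state', `model_top_state' and `current_sc' are hypothetical, and `current_sc' stands in for the value of `YY_START'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int current_sc = 0;      /* stands in for YY_START; 0 is INITIAL */
static int *sc_stack = NULL;    /* saved start conditions */
static size_t sc_depth = 0;     /* number of saved entries */
static size_t sc_cap = 0;       /* allocated capacity */

/* Saves the current start condition and switches to new_state. */
static void model_push_state(int new_state)
{
    if (sc_depth == sc_cap) {
        size_t new_cap = sc_cap ? sc_cap * 2 : 8;
        int *p = realloc(sc_stack, new_cap * sizeof *p);
        if (p == NULL) {
            /* mirrors the documented behaviour: abort on exhaustion */
            fprintf(stderr, "out of memory expanding start-condition stack\n");
            abort();
        }
        sc_stack = p;
        sc_cap = new_cap;
    }
    sc_stack[sc_depth++] = current_sc;
    current_sc = new_state;
}

/* Restores the most recently saved start condition. */
static void model_pop_state(void)
{
    assert(sc_depth > 0);
    current_sc = sc_stack[--sc_depth];
}

/* Returns the most recently saved start condition without popping it. */
static int model_top_state(void)
{
    assert(sc_depth > 0);
    return sc_stack[sc_depth - 1];
}
```

Note that the value returned by the "top" routine is the condition that a pop would return to, not the condition currently in effect.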
361 |
|
---|
362 |
|
---|
363 | File: flex.info, Node: Multiple Input Buffers, Next: EOF, Prev: Start Conditions, Up: Top
|
---|
364 |
|
---|
365 | Multiple Input Buffers
|
---|
366 | **********************
|
---|
367 |
|
---|
368 | Some scanners (such as those which support "include" files) require
|
---|
369 | reading from several input streams. As `flex' scanners do a large
|
---|
370 | amount of buffering, one cannot control where the next input will be
|
---|
371 | read from by simply writing a `YY_INPUT()' which is sensitive to the
|
---|
372 | scanning context. `YY_INPUT()' is only called when the scanner reaches
|
---|
373 | the end of its buffer, which may be a long time after scanning a
|
---|
374 | statement such as an `include' statement which requires switching the
|
---|
375 | input source.
|
---|
376 |
|
---|
377 | To negotiate these sorts of problems, `flex' provides a mechanism
|
---|
378 | for creating and switching between multiple input buffers. An input
|
---|
379 | buffer is created by using:
|
---|
380 |
|
---|
381 | - Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )
|
---|
382 |
|
---|
383 | which takes a `FILE' pointer and a size and creates a buffer
|
---|
384 | associated with the given file and large enough to hold `size'
|
---|
385 | characters (when in doubt, use `YY_BUF_SIZE' for the size). It returns
|
---|
386 | a `YY_BUFFER_STATE' handle, which may then be passed to other routines
|
---|
387 | (see below). The `YY_BUFFER_STATE' type is a pointer to an opaque
|
---|
388 | `struct yy_buffer_state' structure, so you may safely initialize
|
---|
389 | `YY_BUFFER_STATE' variables to `((YY_BUFFER_STATE) 0)' if you wish, and
|
---|
390 | also refer to the opaque structure in order to correctly declare input
|
---|
391 | buffers in source files other than that of your scanner. Note that the
|
---|
392 | `FILE' pointer in the call to `yy_create_buffer' is only used as the
|
---|
393 | value of `yyin' seen by `YY_INPUT'. If you redefine `YY_INPUT()' so it
|
---|
394 | no longer uses `yyin', then you can safely pass a NULL `FILE' pointer to
|
---|
395 | `yy_create_buffer'. You select a particular buffer to scan from using:
|
---|
396 |
|
---|
397 | - Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )
|
---|
398 |
|
---|
399 | The above function switches the scanner's input buffer so subsequent
|
---|
400 | tokens will come from `new_buffer'. Note that `yy_switch_to_buffer()'
|
---|
401 | may be used by `yywrap()' to set things up for continued scanning,
|
---|
402 | instead of opening a new file and pointing `yyin' at it. If you are
|
---|
403 | looking for a stack of input buffers, then you want to use
|
---|
404 | `yypush_buffer_state()' instead of this function. Note also that
|
---|
405 | switching input sources via either `yy_switch_to_buffer()' or
|
---|
406 | `yywrap()' does _not_ change the start condition.
|
---|
407 |
|
---|
408 | - Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )
|
---|
409 |
|
---|
410 | is used to reclaim the storage associated with a buffer. (`buffer'
|
---|
411 | can be NULL, in which case the routine does nothing.) To keep a stack
|
---|
412 | of input buffers rather than switching between them by hand, use:
|
---|
413 |
|
---|
414 | - Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )
|
---|
415 |
|
---|
416 | This function pushes the new buffer state onto an internal stack.
|
---|
417 | The pushed state becomes the new current state. The stack is maintained
|
---|
418 | by flex and will grow as required. This function is intended to be used
|
---|
419 | instead of `yy_switch_to_buffer', when you want to change states, but
|
---|
420 | preserve the current state for later use.
|
---|
421 |
|
---|
422 | - Function: void yypop_buffer_state ( )
|
---|
423 |
|
---|
424 | This function removes the current state from the top of the stack,
|
---|
425 | and deletes it by calling `yy_delete_buffer'. The next state on the
|
---|
426 | stack, if any, becomes the new current state.
|
---|
427 |
|
---|
428 | - Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )
|
---|
429 |
|
---|
430 | This function discards the buffer's contents, so the next time the
|
---|
431 | scanner attempts to match a token from the buffer, it will first fill
|
---|
432 | the buffer anew using `YY_INPUT()'.
|
---|
433 |
|
---|
434 | - Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )
|
---|
435 |
|
---|
436 | is an alias for `yy_create_buffer()', provided for compatibility
|
---|
437 | with the C++ use of `new' and `delete' for creating and destroying
|
---|
438 | dynamic objects.
|
---|
439 |
|
---|
440 | The `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE' handle to the
|
---|
441 | current buffer. It should not be used as an lvalue.
|
---|
442 |
|
---|
443 | Here are two examples of using these features for writing a scanner
|
---|
444 | which expands include files (the `<<EOF>>' feature is discussed below).
|
---|
445 |
|
---|
446 | This first example uses yypush_buffer_state and yypop_buffer_state.
|
---|
447 | Flex maintains the stack internally.
|
---|
448 |
|
---|
449 |
|
---|
450 | /* the "incl" state is used for picking up the name
|
---|
451 | * of an include file
|
---|
452 | */
|
---|
453 | %x incl
|
---|
454 | %%
|
---|
455 | include BEGIN(incl);
|
---|
456 |
|
---|
457 | [a-z]+ ECHO;
|
---|
458 | [^a-z\n]*\n? ECHO;
|
---|
459 |
|
---|
460 | <incl>[ \t]* /* eat the whitespace */
|
---|
461 | <incl>[^ \t\n]+ { /* got the include file name */
|
---|
462 | yyin = fopen( yytext, "r" );
|
---|
463 |
|
---|
464 | if ( ! yyin )
|
---|
465 | error( ... );
|
---|
466 |
|
---|
467 | yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));
|
---|
468 |
|
---|
469 | BEGIN(INITIAL);
|
---|
470 | }
|
---|
471 |
|
---|
472 | <<EOF>> {
|
---|
473 | yypop_buffer_state();
|
---|
474 |
|
---|
475 | if ( !YY_CURRENT_BUFFER )
|
---|
476 | {
|
---|
477 | yyterminate();
|
---|
478 | }
|
---|
479 | }
|
---|
480 |
|
---|
481 | The second example, below, does the same thing as the previous
|
---|
482 | example did, but manages its own input buffer stack manually (instead
|
---|
483 | of letting flex do it).
|
---|
484 |
|
---|
485 |
|
---|
486 | /* the "incl" state is used for picking up the name
|
---|
487 | * of an include file
|
---|
488 | */
|
---|
489 | %x incl
|
---|
490 |
|
---|
491 | %{
|
---|
492 | #define MAX_INCLUDE_DEPTH 10
|
---|
493 | YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
|
---|
494 | int include_stack_ptr = 0;
|
---|
495 | %}
|
---|
496 |
|
---|
497 | %%
|
---|
498 | include BEGIN(incl);
|
---|
499 |
|
---|
500 | [a-z]+ ECHO;
|
---|
501 | [^a-z\n]*\n? ECHO;
|
---|
502 |
|
---|
503 | <incl>[ \t]* /* eat the whitespace */
|
---|
504 | <incl>[^ \t\n]+ { /* got the include file name */
|
---|
505 | if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
|
---|
506 | {
|
---|
507 | fprintf( stderr, "Includes nested too deeply\n" );
|
---|
508 | exit( 1 );
|
---|
509 | }
|
---|
510 |
|
---|
511 | include_stack[include_stack_ptr++] =
|
---|
512 | YY_CURRENT_BUFFER;
|
---|
513 |
|
---|
514 | yyin = fopen( yytext, "r" );
|
---|
515 |
|
---|
516 | if ( ! yyin )
|
---|
517 | error( ... );
|
---|
518 |
|
---|
519 | yy_switch_to_buffer(
|
---|
520 | yy_create_buffer( yyin, YY_BUF_SIZE ) );
|
---|
521 |
|
---|
522 | BEGIN(INITIAL);
|
---|
523 | }
|
---|
524 |
|
---|
525 | <<EOF>> {
|
---|
526 | if ( --include_stack_ptr < 0 )
|
---|
527 | {
|
---|
528 | yyterminate();
|
---|
529 | }
|
---|
530 |
|
---|
531 | else
|
---|
532 | {
|
---|
533 | yy_delete_buffer( YY_CURRENT_BUFFER );
|
---|
534 | yy_switch_to_buffer(
|
---|
535 | include_stack[include_stack_ptr] );
|
---|
536 | }
|
---|
537 | }
|
---|
538 |
|
---|
539 | The following routines are available for setting up input buffers for
|
---|
540 | scanning in-memory strings instead of files. All of them create a new
|
---|
541 | input buffer for scanning the string, and return a corresponding
|
---|
542 | `YY_BUFFER_STATE' handle (which you should delete with
|
---|
543 | `yy_delete_buffer()' when done with it). They also switch to the new
|
---|
544 | buffer using `yy_switch_to_buffer()', so the next call to `yylex()'
|
---|
545 | will start scanning the string.
|
---|
546 |
|
---|
547 | - Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
|
---|
548 | scans a NUL-terminated string.
|
---|
549 |
|
---|
550 | - Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len )
|
---|
552 | scans `len' bytes (including possibly `NUL's) starting at location
|
---|
553 | `bytes'.
|
---|
554 |
|
---|
555 | Note that both of these functions create and scan a _copy_ of the
|
---|
556 | string or bytes. (This may be desirable, since `yylex()' modifies the
|
---|
557 | contents of the buffer it is scanning.) You can avoid the copy by
|
---|
558 | using:
|
---|
559 |
|
---|
560 | - Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t size)
|
---|
561 | which scans in place the buffer starting at `base', consisting of
|
---|
562 | `size' bytes, the last two bytes of which _must_ be
|
---|
563 | `YY_END_OF_BUFFER_CHAR' (ASCII NUL). These last two bytes are not
|
---|
564 | scanned; thus, scanning consists of `base[0]' through
|
---|
565 | `base[size-3]', inclusive.
|
---|
566 |
|
---|
567 | If you fail to set up `base' in this manner (i.e., forget the final
|
---|
568 | two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()' returns a
|
---|
569 | NULL pointer instead of creating a new input buffer.
|
---|
570 |
|
---|
571 | - Data type: yy_size_t
|
---|
572 | is an integral type to which you can cast an integer expression
|
---|
573 | reflecting the size of the buffer.
|
---|
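Preparing a buffer for in-place scanning can be sketched as follows. The helper name `make_scan_buffer' is hypothetical; it only builds a block of memory laid out as `yy_scan_buffer()' requires (the text followed by two NUL bytes) and does not call into the scanner itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies len bytes of text into a freshly allocated
 * block and appends the two trailing NUL bytes that in-place scanning
 * requires.  Stores the total size (len + 2) in *size_out.
 * Returns NULL if allocation fails; the caller frees the result. */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base;

    assert(text != NULL && size_out != NULL);
    base = malloc(len + 2);
    if (base == NULL)
        return NULL;
    memcpy(base, text, len);
    base[len] = '\0';       /* first end-of-buffer byte */
    base[len + 1] = '\0';   /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}
```

Inside a scanner, the result would be handed over with something like `yy_scan_buffer( base, (yy_size_t) size )', and the returned handle checked against NULL before scanning begins. The memory must stay valid until the buffer has been deleted.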
574 |
|
---|
575 |
|
---|
576 | File: flex.info, Node: EOF, Next: Misc Macros, Prev: Multiple Input Buffers, Up: Top
|
---|
577 |
|
---|
578 | End-of-File Rules
|
---|
579 | *****************
|
---|
580 |
|
---|
581 | The special rule `<<EOF>>' indicates actions which are to be taken
|
---|
582 | when an end-of-file is encountered and `yywrap()' returns non-zero
|
---|
583 | (i.e., indicates no further files to process). The action must finish
|
---|
584 | by doing one of the following things:
|
---|
585 |
|
---|
586 | * assigning `yyin' to a new input file (in previous versions of
|
---|
587 | `flex', after doing the assignment you had to call the special
|
---|
588 | action `YY_NEW_FILE'. This is no longer necessary.)
|
---|
589 |
|
---|
590 | * executing a `return' statement;
|
---|
591 |
|
---|
592 | * executing the special `yyterminate()' action.
|
---|
593 |
|
---|
594 | * or, switching to a new buffer using `yy_switch_to_buffer()' as
|
---|
595 | shown in the example above.
|
---|
596 |
|
---|
597 | <<EOF>> rules may not be used with other patterns; they may only be
|
---|
598 | qualified with a list of start conditions. If an unqualified <<EOF>>
|
---|
599 | rule is given, it applies to _all_ start conditions which do not
|
---|
600 | already have <<EOF>> actions. To specify an <<EOF>> rule for only the
|
---|
601 | initial start condition, use:
|
---|
602 |
|
---|
603 |
|
---|
604 | <INITIAL><<EOF>>
|
---|
605 |
|
---|
606 | These rules are useful for catching things like unclosed comments.
|
---|
607 | An example:
|
---|
608 |
|
---|
609 |
|
---|
610 | %x quote
|
---|
611 | %%
|
---|
612 |
|
---|
613 | ...other rules for dealing with quotes...
|
---|
614 |
|
---|
615 | <quote><<EOF>> {
|
---|
616 | error( "unterminated quote" );
|
---|
617 | yyterminate();
|
---|
618 | }
|
---|
619 | <<EOF>> {
|
---|
620 | if ( *++filelist )
|
---|
621 | yyin = fopen( *filelist, "r" );
|
---|
622 | else
|
---|
623 | yyterminate();
|
---|
624 | }
|
---|
625 |
|
---|
626 |
|
---|
627 | File: flex.info, Node: Misc Macros, Next: User Values, Prev: EOF, Up: Top
|
---|
628 |
|
---|
629 | Miscellaneous Macros
|
---|
630 | ********************
|
---|
631 |
|
---|
632 | The macro `YY_USER_ACTION' can be defined to provide an action which
|
---|
633 | is always executed prior to the matched rule's action. For example, it
|
---|
634 | could be #define'd to call a routine to convert yytext to lower-case.
|
---|
635 | When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
|
---|
636 | number of the matched rule (rules are numbered starting with 1).
|
---|
637 | Suppose you want to profile how often each of your rules is matched.
|
---|
638 | The following would do the trick:
|
---|
639 |
|
---|
640 |
|
---|
641 | #define YY_USER_ACTION ++ctr[yy_act]
|
---|
642 |
|
---|
643 | where `ctr' is an array to hold the counts for the different rules.
|
---|
644 | Note that the macro `YY_NUM_RULES' gives the total number of rules
|
---|
645 | (including the default rule), even if you use `-s', so a correct
|
---|
646 | declaration for `ctr' is:
|
---|
647 |
|
---|
648 |
|
---|
649 | int ctr[YY_NUM_RULES];
|
---|
650 |
|
---|
651 | The macro `YY_USER_INIT' may be defined to provide an action which
|
---|
652 | is always executed before the first scan (and before the scanner's
|
---|
653 | internal initializations are done). For example, it could be used to
|
---|
654 | call a routine to read in a data table or open a logging file.
|
---|
655 |
|
---|
656 | The macro `yy_set_interactive(is_interactive)' can be used to
|
---|
657 | control whether the current buffer is considered "interactive". An
|
---|
658 | interactive buffer is processed more slowly, but must be used when the
|
---|
659 | scanner's input source is indeed interactive to avoid problems due to
|
---|
660 | waiting to fill buffers (see the discussion of the `-I' flag in *Note
|
---|
661 | Scanner Options::). A non-zero value in the macro invocation marks the
|
---|
662 | buffer as interactive, a zero value as non-interactive. Note that use
|
---|
663 | of this macro overrides `%option always-interactive' or `%option
|
---|
664 | never-interactive' (*note Scanner Options::). `yy_set_interactive()'
|
---|
665 | must be invoked prior to beginning to scan the buffer that is (or is
|
---|
666 | not) to be considered interactive.
|
---|
667 |
|
---|
668 | The macro `yy_set_bol(at_bol)' can be used to control whether the
|
---|
669 | current buffer's scanning context for the next token match is done as
|
---|
670 | though at the beginning of a line. A non-zero macro argument makes
|
---|
671 | rules anchored with `^' active, while a zero argument makes `^' rules
|
---|
672 | inactive.
|
---|
673 |
|
---|
674 | The macro `YY_AT_BOL()' returns true if the next token scanned from
|
---|
675 | the current buffer will have `^' rules active, false otherwise.
|
---|
676 |
|
---|
677 | In the generated scanner, the actions are all gathered in one large
|
---|
678 | switch statement and separated using `YY_BREAK', which may be
|
---|
679 | redefined. By default, it is simply a `break', to separate each rule's
|
---|
680 | action from the following rule's. Redefining `YY_BREAK' allows, for
|
---|
681 | example, C++ users to #define YY_BREAK to do nothing (while being very
|
---|
682 | careful that every rule ends with a `break' or a `return'!) to avoid
|
---|
683 | suffering from unreachable-statement warnings where, because a rule's
|
---|
684 | action ends with `return', the `YY_BREAK' is inaccessible.
|
---|
685 |
|
---|
686 |
|
---|
687 | File: flex.info, Node: User Values, Next: Yacc, Prev: Misc Macros, Up: Top
|
---|
688 |
|
---|
689 | Values Available To the User
|
---|
690 | ****************************
|
---|
691 |
|
---|
692 | This chapter summarizes the various values available to the user in
|
---|
693 | the rule actions.
|
---|
694 |
|
---|
695 | `char *yytext'
|
---|
696 | holds the text of the current token. It may be modified but not
|
---|
697 | lengthened (you cannot append characters to the end).
|
---|
698 |
|
---|
699 | If the special directive `%array' appears in the first section of
|
---|
700 | the scanner description, then `yytext' is instead declared `char
|
---|
701 | yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
|
---|
702 | redefine in the first section if you don't like the default value
|
---|
703 | (generally 8KB). Using `%array' results in somewhat slower
|
---|
704 | scanners, but the value of `yytext' becomes immune to calls to
|
---|
705 | `unput()', which potentially destroy its value when `yytext' is a
|
---|
706 | character pointer. The opposite of `%array' is `%pointer', which
|
---|
707 | is the default.
|
---|
708 |
|
---|
709 | You cannot use `%array' when generating C++ scanner classes (the
|
---|
710 | `-+' flag).
|
---|
711 |
|
---|
712 | `int yyleng'
|
---|
713 | holds the length of the current token.
|
---|
714 |
|
---|
715 | `FILE *yyin'
|
---|
716 | is the file which by default `flex' reads from. It may be
|
---|
717 | redefined but doing so only makes sense before scanning begins or
|
---|
718 | after an EOF has been encountered. Changing it in the midst of
|
---|
719 | scanning will have unexpected results since `flex' buffers its
|
---|
720 | input; use `yyrestart()' instead. Once scanning terminates
|
---|
721 | because an end-of-file has been seen, you can point `yyin' at the
|
---|
722 | new input file and then call the scanner again to continue
|
---|
723 | scanning.
|
---|
724 |
|
---|
725 | `void yyrestart( FILE *new_file )'
|
---|
726 | may be called to point `yyin' at the new input file. The
|
---|
727 | switch-over to the new file is immediate (any previously
|
---|
728 | buffered-up input is lost). Note that calling `yyrestart()' with
|
---|
729 | `yyin' as an argument thus throws away the current input buffer
|
---|
730 | and continues scanning the same input file.
|
---|
731 |
|
---|
732 | `FILE *yyout'
|
---|
733 | is the file to which `ECHO' actions are done. It can be reassigned
|
---|
734 | by the user.
|
---|
735 |
|
---|
736 | `YY_CURRENT_BUFFER'
|
---|
737 | returns a `YY_BUFFER_STATE' handle to the current buffer.
|
---|
738 |
|
---|
739 | `YY_START'
|
---|
740 | returns an integer value corresponding to the current start
|
---|
741 | condition. You can subsequently use this value with `BEGIN' to
|
---|
742 | return to that start condition.
|
---|
743 |
|
---|
744 |
|
---|
745 | File: flex.info, Node: Yacc, Next: Scanner Options, Prev: User Values, Up: Top
|
---|
746 |
|
---|
747 | Interfacing with Yacc
|
---|
748 | *********************
|
---|
749 |
|
---|
750 | One of the main uses of `flex' is as a companion to the `yacc'
|
---|
751 | parser-generator. `yacc' parsers expect to call a routine named
|
---|
752 | `yylex()' to find the next input token. The routine is supposed to
|
---|
753 | return the type of the next token as well as putting any associated
|
---|
754 | value in the global `yylval'. To use `flex' with `yacc', one specifies
|
---|
755 | the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
|
---|
756 | containing definitions of all the `%tokens' appearing in the `yacc'
|
---|
757 | input. This file is then included in the `flex' scanner. For example,
|
---|
758 | if one of the tokens is `TOK_NUMBER', part of the scanner might look
|
---|
759 | like:
|
---|
760 |
|
---|
761 |
|
---|
762 | %{
|
---|
763 | #include "y.tab.h"
|
---|
764 | %}
|
---|
765 |
|
---|
766 | %%
|
---|
767 |
|
---|
768 | [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
|
---|
769 |
|
---|
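The contract that `yacc' expects from `yylex()' can be sketched in plain C. What follows is a hand-written stand-in, not code generated by `flex'. The value 258 for `TOK_NUMBER', the `int' type of `yylval', and the `input_cursor' and `set_input()' names are assumptions made for illustration; in a real build the token codes come from `y.tab.h' and the text comes from the scanner's input buffer.

```c
#include <ctype.h>
#include <stdlib.h>

/* Assumed stand-ins for what y.tab.h would normally provide. */
#define TOK_NUMBER 258
int yylval;

/* Hypothetical input cursor; a generated scanner reads from yyin instead. */
static const char *input_cursor = "";

void set_input(const char *text) { input_cursor = text; }

/* Returns the token type (0 at end of input) and stores any value in yylval. */
int yylex(void)
{
    while (*input_cursor == ' ')
        input_cursor++;                     /* skip blanks between tokens */
    if (*input_cursor == '\0')
        return 0;                           /* yacc treats 0 as end of input */
    if (isdigit((unsigned char)*input_cursor)) {
        char *end;
        yylval = (int)strtol(input_cursor, &end, 10);
        input_cursor = end;
        return TOK_NUMBER;
    }
    return (unsigned char)*input_cursor++;  /* single-character token */
}
```

The parser only ever sees the return value and `yylval', which is why the `flex' rule above both assigns `yylval' and returns `TOK_NUMBER'.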
770 |
|
---|
771 | File: flex.info, Node: Scanner Options, Next: Performance, Prev: Yacc, Up: Top
|
---|
772 |
|
---|
773 | Scanner Options
|
---|
774 | ***************
|
---|
775 |
|
---|
776 | The various `flex' options are categorized by function in the
|
---|
777 | following menu. If you want to look up a particular option by name,
|
---|
778 | *Note Index of Scanner Options::.
|
---|
779 |
|
---|
780 | * Menu:
|
---|
781 |
|
---|
782 | * Options for Specifing Filenames::
|
---|
783 | * Options Affecting Scanner Behavior::
|
---|
784 | * Code-Level And API Options::
|
---|
785 | * Options for Scanner Speed and Size::
|
---|
786 | * Debugging Options::
|
---|
787 | * Miscellaneous Options::
|
---|
788 |
|
---|
789 | Even though there are many scanner options, a typical scanner might
|
---|
790 | only specify the following options:
|
---|
791 |
|
---|
792 |
|
---|
793 | %option 8bit reentrant bison-bridge
|
---|
794 | %option warn nodefault
|
---|
795 | %option yylineno
|
---|
796 | %option outfile="scanner.c" header-file="scanner.h"
|
---|
797 |
|
---|
798 | The first line specifies the general type of scanner we want. The
|
---|
799 | second line specifies that we are being careful. The third line asks
|
---|
800 | flex to track line numbers. The last line tells flex what to name the
|
---|
801 | files. (The options can be specified in any order. We just divided
|
---|
802 | them.)
|
---|
803 |
|
---|
804 | `flex' also provides a mechanism for controlling options within the
|
---|
805 | scanner specification itself, rather than from the flex command-line.
|
---|
806 | This is done by including `%option' directives in the first section of
|
---|
807 | the scanner specification. You can specify multiple options with a
|
---|
808 | single `%option' directive, and multiple directives in the first
|
---|
809 | section of your flex input file.
|
---|
810 |
|
---|
811 | Most options are given simply as names, optionally preceded by the
|
---|
812 | word `no' (with no intervening whitespace) to negate their meaning.
|
---|
813 | The names are the same as their long-option equivalents (but without the
|
---|
814 | leading `--').
|
---|
815 |
|
---|
816 | `flex' scans your rule actions to determine whether you use the
|
---|
817 | `REJECT' or `yymore()' features. The `reject' and `yymore' options are
|
---|
818 | available to override its decision as to whether you use the features,
|
---|
819 | either by setting them (e.g., `%option reject') to indicate the feature
|
---|
820 | is indeed used, or unsetting them to indicate it actually is not used
|
---|
821 | (e.g., `%option noyymore').
|
---|
822 |
|
---|
823 | A number of options are available for lint purists who want to
|
---|
824 | suppress the appearance of unneeded routines in the generated scanner.
|
---|
825 | Each of the following, if unset (e.g., `%option nounput'), results in
|
---|
826 | the corresponding routine not appearing in the generated scanner:
|
---|
827 |
|
---|
828 |
|
---|
829 | input, unput
|
---|
830 | yy_push_state, yy_pop_state, yy_top_state
|
---|
831 | yy_scan_buffer, yy_scan_bytes, yy_scan_string
|
---|
832 |
|
---|
833 | yyget_extra, yyset_extra, yyget_leng, yyget_text,
|
---|
834 | yyget_lineno, yyset_lineno, yyget_in, yyset_in,
|
---|
835 | yyget_out, yyset_out, yyget_lval, yyset_lval,
|
---|
836 | yyget_lloc, yyset_lloc, yyget_debug, yyset_debug
|
---|
837 |
|
---|
838 | (though `yy_push_state()' and friends won't appear anyway unless you
|
---|
839 | use `%option stack').
|
---|
840 |
|
---|
841 |
|
---|
842 | File: flex.info, Node: Options for Specifing Filenames, Next: Options Affecting Scanner Behavior, Prev: Scanner Options, Up: Scanner Options
|
---|
843 |
|
---|
844 | Options for Specifying Filenames
|
---|
845 | ================================
|
---|
846 |
|
---|
847 | `--header-file=FILE, `%option header-file="FILE"''
|
---|
848 | instructs flex to write a C header to `FILE'. This file contains
|
---|
849 | function prototypes, extern variables, and types used by the
|
---|
850 | scanner. Only the external API is exported by the header file.
|
---|
851 | Many macros that are usable from within scanner actions are not
|
---|
852 | exported to the header file. This is due to namespace problems and
|
---|
853 | the goal of a clean external API.
|
---|
854 |
|
---|
855 | While in the header, the macro `yyIN_HEADER' is defined, where `yy'
|
---|
856 | is substituted with the appropriate prefix.
|
---|
857 |
|
---|
858 | The `--header-file' option is not compatible with the `--c++'
|
---|
859 | option, since the C++ scanner provides its own header in
|
---|
860 | `yyFlexLexer.h'.
|
---|
861 |
|
---|
862 | `-oFILE, --outfile=FILE, `%option outfile="FILE"''
|
---|
863 | directs flex to write the scanner to the file `FILE' instead of
|
---|
864 | `lex.yy.c'. If you combine `--outfile' with the `--stdout' option,
|
---|
865 | then the scanner is written to `stdout' but its `#line' directives
|
---|
866 | (see the `-L' option below) refer to the file `FILE'.
|
---|
867 |
|
---|
868 | `-t, --stdout, `%option stdout''
|
---|
869 | instructs `flex' to write the scanner it generates to standard
|
---|
870 | output instead of `lex.yy.c'.
|
---|
871 |
|
---|
872 | `-SFILE, --skel=FILE'
|
---|
873 | overrides the default skeleton file from which `flex' constructs
|
---|
874 | its scanners. You'll never need this option unless you are doing
|
---|
875 | `flex' maintenance or development.
|
---|
876 |
|
---|
877 | `--tables-file=FILE'
|
---|
878 | writes serialized scanner DFA tables to `FILE'. The generated scanner
|
---|
879 | will not contain the tables, and requires them to be loaded at
|
---|
880 | runtime. *Note serialization::.
|
---|
881 |
|
---|
882 | `--tables-verify'
|
---|
883 | This option is for flex development. We document it here in case
|
---|
884 | you stumble upon it by accident or in case you suspect some
|
---|
885 | inconsistency in the serialized tables. Flex will serialize the
|
---|
886 | scanner DFA tables but will also generate the in-code tables as it
|
---|
887 | normally does. At runtime, the scanner will verify that the
|
---|
888 | serialized tables match the in-code tables, instead of loading
|
---|
889 | them.
|
---|
890 |
|
---|
891 |
|
---|
892 |
|
---|
893 | File: flex.info, Node: Options Affecting Scanner Behavior, Next: Code-Level And API Options, Prev: Options for Specifing Filenames, Up: Scanner Options
|
---|
894 |
|
---|
895 | Options Affecting Scanner Behavior
|
---|
896 | ==================================
|
---|
897 |
|
---|
898 | `-i, --case-insensitive, `%option case-insensitive''
|
---|
899 | instructs `flex' to generate a "case-insensitive" scanner. The
|
---|
900 | case of letters given in the `flex' input patterns will be ignored,
|
---|
901 | and tokens in the input will be matched regardless of case. The
|
---|
902 | matched text given in `yytext' will have the preserved case (i.e.,
|
---|
903 | it will not be folded). For tricky behavior, see *Note case and
|
---|
904 | character ranges::.
|
---|
905 |
|
---|
906 | `-l, --lex-compat, `%option lex-compat''
|
---|
907 | turns on maximum compatibility with the original AT&T `lex'
|
---|
908 | implementation. Note that this does not mean _full_ compatibility.
|
---|
909 | Use of this option costs a considerable amount of performance, and
|
---|
910 | it cannot be used with the `--c++', `--full', `--fast', `-Cf', or
|
---|
911 | `-CF' options. For details on the compatibilities it provides, see
|
---|
912 | *Note Lex and Posix::. This option also results in the name
|
---|
913 | `YY_FLEX_LEX_COMPAT' being `#define''d in the generated scanner.
|
---|
914 |
|
---|
915 | `-B, --batch, `%option batch''
|
---|
916 | instructs `flex' to generate a "batch" scanner, the opposite of
|
---|
917 | _interactive_ scanners generated by `--interactive' (see below).
|
---|
918 | In general, you use `-B' when you are _certain_ that your scanner
|
---|
919 | will never be used interactively, and you want to squeeze a
|
---|
920 | _little_ more performance out of it. If your goal is instead to
|
---|
921 | squeeze out a _lot_ more performance, you should be using the
|
---|
922 | `-Cf' or `-CF' options, which turn on `--batch' automatically
|
---|
923 | anyway.
|
---|
924 |
|
---|
925 | `-I, --interactive, `%option interactive''
|
---|
926 | instructs `flex' to generate an interactive scanner. An
|
---|
927 | interactive scanner is one that only looks ahead to decide what
|
---|
928 | token has been matched if it absolutely must. It turns out that
|
---|
929 | always looking one extra character ahead, even if the scanner has
|
---|
930 | already seen enough text to disambiguate the current token, is a
|
---|
931 | bit faster than only looking ahead when necessary. But scanners
|
---|
932 | that always look ahead give dreadful interactive performance; for
|
---|
933 | example, when a user types a newline, it is not recognized as a
|
---|
934 | newline token until they enter _another_ token, which often means
|
---|
935 | typing in another whole line.
|
---|
936 |
|
---|
937 | `flex' scanners default to `interactive' unless you use the `-Cf'
|
---|
938 | or `-CF' table-compression options (*note Performance::). That's
|
---|
939 | because if you're looking for high-performance you should be using
|
---|
940 | one of these options, so if you didn't, `flex' assumes you'd
|
---|
941 | rather trade off a bit of run-time performance for intuitive
|
---|
942 | interactive behavior. Note also that you _cannot_ use
|
---|
943 | `--interactive' in conjunction with `-Cf' or `-CF'. Thus, this
|
---|
944 | option is not really needed; it is on by default for all those
|
---|
945 | cases in which it is allowed.
|
---|
946 |
|
---|
947 | You can force a scanner to _not_ be interactive by using `--batch'.
|
---|
948 |
|
---|
949 | `-7, --7bit, `%option 7bit''
|
---|
950 | instructs `flex' to generate a 7-bit scanner, i.e., one which can
|
---|
951 | only recognize 7-bit characters in its input. The advantage of
|
---|
952 | using `--7bit' is that the scanner's tables can be up to half the
|
---|
953 | size of those generated using `--8bit'. The disadvantage is
|
---|
954 | that such scanners often hang or crash if their input contains an
|
---|
955 | 8-bit character.
|
---|
956 |
|
---|
957 | Note, however, that unless you generate your scanner using the
|
---|
958 | `-Cf' or `-CF' table compression options, use of `--7bit' will
|
---|
959 | save only a small amount of table space, and make your scanner
|
---|
960 | considerably less portable. `Flex''s default behavior is to
|
---|
961 | generate an 8-bit scanner unless you use `-Cf' or `-CF', in
|
---|
962 | which case `flex' defaults to generating 7-bit scanners unless
|
---|
963 | your site was always configured to generate 8-bit scanners (as will
|
---|
964 | often be the case with non-USA sites). You can tell whether flex
|
---|
965 | generated a 7-bit or an 8-bit scanner by inspecting the flag
|
---|
966 | summary in the `--verbose' output as described above.
|
---|
967 |
|
---|
968 | Note that if you use `-Cfe' or `-CFe' `flex' still defaults to
|
---|
969 | generating an 8-bit scanner, since usually with these compression
|
---|
970 | options full 8-bit tables are not much more expensive than 7-bit
|
---|
971 | tables.
|
---|
972 |
|
---|
973 | `-8, --8bit, `%option 8bit''
|
---|
974 | instructs `flex' to generate an 8-bit scanner, i.e., one which can
|
---|
975 | recognize 8-bit characters. This flag is only needed for scanners
|
---|
976 | generated using `-Cf' or `-CF', as otherwise flex defaults to
|
---|
977 | generating an 8-bit scanner anyway.
|
---|
978 |
|
---|
979 | See the discussion of `--7bit' above for `flex''s default behavior
|
---|
980 | and the tradeoffs between 7-bit and 8-bit scanners.
|
---|
981 |
|
---|
982 | `--default, `%option default''
|
---|
983 | generates the default rule, which copies unmatched input to `yyout'.
|
---|
984 |
|
---|
985 | `--always-interactive, `%option always-interactive''
|
---|
986 | instructs flex to generate a scanner which always considers its
|
---|
987 | input _interactive_. Normally, on each new input file the scanner
|
---|
988 | calls `isatty()' in an attempt to determine whether the scanner's
|
---|
989 | input source is interactive and thus should be read a character at
|
---|
990 | a time. When this option is used, however, then no such call is
|
---|
991 | made.
|
---|
992 |
|
---|
993 | `--never-interactive, `%option never-interactive''
|
---|
994 | instructs flex to generate a scanner which never considers its
|
---|
995 | input interactive. This is the opposite of `always-interactive'.
|
---|
996 |
|
---|
997 | `-X, --posix, `%option posix''
|
---|
998 | turns on maximum compatibility with the POSIX 1003.2-1992
|
---|
999 | definition of `lex'. Since `flex' was originally designed to
|
---|
1000 | implement the POSIX definition of `lex' this generally involves
|
---|
1001 | very few changes in behavior. At the current writing the known
|
---|
1002 | differences between `flex' and the POSIX standard are:
|
---|
1003 |
|
---|
1004 | * In POSIX and AT&T `lex', the repeat operator, `{}', has lower
|
---|
1005 | precedence than concatenation (thus `ab{3}' yields `ababab').
|
---|
1006 | Most POSIX utilities use an Extended Regular Expression (ERE)
|
---|
1007 | precedence that has the precedence of the repeat operator
|
---|
1008 | higher than concatenation (which causes `ab{3}' to yield
|
---|
1009 | `abbb'). By default, `flex' places the precedence of the
|
---|
1010 | repeat operator higher than concatenation which matches the
|
---|
1011 | ERE processing of other POSIX utilities. When either
|
---|
1012 | `--posix' or `-l' is specified, `flex' will use the
|
---|
1013 | traditional AT&T and POSIX-compliant precedence for the
|
---|
1014 | repeat operator where concatenation has higher precedence
|
---|
1015 | than the repeat operator.
|
---|
1016 |
|
---|
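The ERE precedence described above can be checked with the POSIX `regcomp()' interface, used here as a stand-in for "other POSIX utilities". This is a sketch that exercises the C library, not `flex' itself, and the helper name `ere_match()' is hypothetical. The caller anchors each pattern with `^' and `$' so that the whole string has to match.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the ERE pattern, 0 if it does not,
   and -1 if the pattern fails to compile. */
int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With ERE precedence, `ere_match("^ab{3}$", "abbb")' succeeds and `ere_match("^ab{3}$", "ababab")' fails; matching `ababab' takes an explicit group, as in `^(ab){3}$'. Under `--posix' or `-l', by contrast, a `flex' pattern `ab{3}' is read as if the group were present.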
1017 | `--stack, `%option stack''
|
---|
1018 | enables the use of start condition stacks (*note Start
|
---|
1019 | Conditions::).
|
---|
1020 |
|
---|
1021 | `--stdinit, `%option stdinit''
|
---|
1022 | if set (i.e., `%option stdinit') initializes `yyin' and `yyout' to
|
---|
1023 | `stdin' and `stdout', instead of the default of `NULL'. Some
|
---|
1024 | existing `lex' programs depend on this behavior, even though it is
|
---|
1025 | not compliant with ANSI C, which does not require `stdin' and
|
---|
1026 | `stdout' to be compile-time constant. In a reentrant scanner,
|
---|
1027 | however, this is not a problem since initialization is performed
|
---|
1028 | in `yylex_init' at runtime.
|
---|
1029 |
|
---|
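Because `stdin' and `stdout' need not be compile-time constants, a portable scanner cannot rely on a file-scope initializer such as `FILE *yyin = stdin;'. The general alternative is to start from `NULL' and fill in the defaults at run time. The sketch below shows that pattern with hypothetical names (`in_stream', `out_stream', `ensure_streams()'); it illustrates the idea and is not the code `flex' emits.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the scanner's stream variables.  They start
   out NULL because stdin and stdout may not be constant expressions. */
static FILE *in_stream = NULL;
static FILE *out_stream = NULL;

/* Fill in the defaults on first use, leaving user-assigned streams alone. */
void ensure_streams(void)
{
    if (in_stream == NULL)
        in_stream = stdin;
    if (out_stream == NULL)
        out_stream = stdout;
}

FILE *current_in(void)  { return in_stream; }
FILE *current_out(void) { return out_stream; }
void set_in(FILE *f)    { in_stream = f; }
```

A stream assigned before the first call to `ensure_streams()' is kept, which mirrors how a user can point `yyin' at a file before scanning begins.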
1030 | `--yylineno, `%option yylineno''
|
---|
1031 | directs `flex' to generate a scanner that maintains the number of
|
---|
1032 | the current line read from its input in the global variable
|
---|
1033 | `yylineno'. This option is implied by `%option lex-compat'. In a
|
---|
1034 | reentrant C scanner, the macro `yylineno' is accessible regardless
|
---|
1035 | of the value of `%option yylineno'; however, its value is not
|
---|
1036 | modified by `flex' unless `%option yylineno' is enabled.
|
---|
1037 |
|
---|
1038 | `--yywrap, `%option yywrap''
|
---|
1039 | if unset (i.e., `--noyywrap'), makes the scanner not call
|
---|
1040 | `yywrap()' upon an end-of-file, but simply assume that there are no
|
---|
1041 | more files to scan (until the user points `yyin' at a new file and
|
---|
1042 | calls `yylex()' again).
|
---|
1043 |
|
---|
1044 |
|
---|
1045 |
|
---|
1046 | File: flex.info, Node: Code-Level And API Options, Next: Options for Scanner Speed and Size, Prev: Options Affecting Scanner Behavior, Up: Scanner Options
|
---|
1047 |
|
---|
1048 | Code-Level And API Options
|
---|
1049 | ==========================
|
---|
1050 |
|
---|
1051 | `--ansi-definitions, `%option ansi-definitions''
|
---|
1052 | instructs flex to generate ANSI C99 definitions for functions.
|
---|
1053 | This option is enabled by default. If `%option
|
---|
1054 | noansi-definitions' is specified, then the obsolete style is
|
---|
1055 | generated.
|
---|
1056 |
|
---|
1057 | `--ansi-prototypes, `%option ansi-prototypes''
|
---|
1058 | instructs flex to generate ANSI C99 prototypes for functions.
|
---|
1059 | This option is enabled by default. If `noansi-prototypes' is
|
---|
1060 | specified, then prototypes will have empty parameter lists.
|
---|
1061 |
|
---|
1062 | `--bison-bridge, `%option bison-bridge''
|
---|
1063 | instructs flex to generate a C scanner that is meant to be called
|
---|
1064 | by a `GNU bison' parser. The scanner has minor API changes for
|
---|
1065 | `bison' compatibility. In particular, the declaration of `yylex'
|
---|
1066 | is modified to take an additional parameter, `yylval'. *Note
|
---|
1067 | Bison Bridge::.
|
---|
1068 |
|
---|
1069 | `--bison-locations, `%option bison-locations''
|
---|
1070 | instructs flex that `GNU bison' `%locations' are being used. This
|
---|
1071 | means `yylex' will be passed an additional parameter, `yylloc'.
|
---|
1072 | This option implies `%option bison-bridge'. *Note Bison Bridge::.
|
---|
1073 |
|
---|
1074 | `-L, --noline, `%option noline''
|
---|
1075 | instructs `flex' not to generate `#line' directives. Without this
|
---|
1076 | option, `flex' peppers the generated scanner with `#line'
|
---|
1077 | directives so error messages in the actions will be correctly
|
---|
1078 | located with respect to either the original `flex' input file (if
|
---|
1079 | the errors are due to code in the input file), or `lex.yy.c' (if
|
---|
1080 | the errors are `flex''s fault - you should report these sorts of
|
---|
1081 | errors to the email address given in *Note Reporting Bugs::).
|
---|
1082 |
|
---|
1083 | `-R, --reentrant, `%option reentrant''
|
---|
1084 | instructs flex to generate a reentrant C scanner. The generated
|
---|
1085 | scanner may safely be used in a multi-threaded environment. The
|
---|
1086 | API for a reentrant scanner is different than for a non-reentrant
|
---|
1087 | scanner (*note Reentrant::). Because of the API difference between
|
---|
1088 | reentrant and non-reentrant `flex' scanners, non-reentrant flex
|
---|
1089 | code must be modified before it is suitable for use with this
|
---|
1090 | option. This option is not compatible with the `--c++' option.
|
---|
1091 |
|
---|
1092 | The option `--reentrant' does not affect the performance of the
|
---|
1093 | scanner.
|
---|
1094 |
|
---|
1095 | `-+, --c++, `%option c++''
|
---|
1096 | specifies that you want flex to generate a C++ scanner class.
|
---|
1097 | *Note Cxx::, for details.
|
---|
1098 |
|
---|
1099 | `--array, `%option array''
|
---|
1100 | specifies that you want `yytext' to be an array instead of a `char *'.
|
---|
1101 |
|
---|
1102 | `--pointer, `%option pointer''
|
---|
1103 | specifies that `yytext' should be a `char *', not an array. This
|
---|
1104 | is the default.
|
---|
1105 |
|
---|
1106 | `-PPREFIX, --prefix=PREFIX, `%option prefix="PREFIX"''
|
---|
1107 | changes the default `yy' prefix used by `flex' for all
|
---|
1108 | globally-visible variable and function names to instead be
|
---|
1109 | `PREFIX'. For example, `--prefix=foo' changes the name of
|
---|
1110 | `yytext' to `footext'. It also changes the name of the default
|
---|
1111 | output file from `lex.yy.c' to `lex.foo.c'. Here is a partial
|
---|
1112 | list of the names affected:
|
---|
1113 |
|
---|
1114 |
|
---|
1115 | yy_create_buffer
|
---|
1116 | yy_delete_buffer
|
---|
1117 | yy_flex_debug
|
---|
1118 | yy_init_buffer
|
---|
1119 | yy_flush_buffer
|
---|
1120 | yy_load_buffer_state
|
---|
1121 | yy_switch_to_buffer
|
---|
1122 | yyin
|
---|
1123 | yyleng
|
---|
1124 | yylex
|
---|
1125 | yylineno
|
---|
1126 | yyout
|
---|
1127 | yyrestart
|
---|
1128 | yytext
|
---|
1129 | yywrap
|
---|
1130 | yyalloc
|
---|
1131 | yyrealloc
|
---|
1132 | yyfree
|
---|
1133 |
|
---|
1134 | (If you are using a C++ scanner, then only `yywrap' and
|
---|
1135 | `yyFlexLexer' are affected.) Within your scanner itself, you can
|
---|
1136 | still refer to the global variables and functions using either
|
---|
1137 | version of their name; but externally, they have the modified name.
|
---|
1138 |
|
---|
1139 | This option lets you easily link together multiple `flex' programs
|
---|
1140 | into the same executable. Note, though, that using this option
|
---|
1141 | also renames `yywrap()', so you now _must_ either provide your own
|
---|
1142 | (appropriately-named) version of the routine for your scanner, or
|
---|
1143 | use `%option noyywrap', as linking with `-lfl' no longer provides
|
---|
1144 | one for you by default.
|
---|
1145 |
|
---|
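One way to picture the renaming is with preprocessor macros: the scanner source keeps using the `yy' names, while the symbol the linker sees carries the new prefix. The sketch below applies that idea to two stub functions in a single file. The stub bodies and their return values are hypothetical, and the layout illustrates the technique rather than reproducing `flex' output.

```c
/* Sketch: give two scanners distinct external names via the preprocessor. */

/* "Scanner" one, as if built with --prefix=foo. */
#define yylex foolex
int yylex(void) { return 1; }   /* externally visible as foolex() */
#undef yylex

/* "Scanner" two, as if built with --prefix=bar. */
#define yylex barlex
int yylex(void) { return 2; }   /* externally visible as barlex() */
#undef yylex
```

A caller can then invoke `foolex()' and `barlex()' side by side without a link-time clash, which is the situation `--prefix' is designed for.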
1146 | `--main, `%option main''
|
---|
1147 | directs flex to provide a default `main()' program for the
|
---|
1148 | scanner, which simply calls `yylex()'. This option implies
|
---|
1149 | `noyywrap' (see above).
|
---|
1150 |
|
---|
1151 | `--nounistd, `%option nounistd''
|
---|
1152 | suppresses inclusion of the non-ANSI header file `unistd.h'. This
|
---|
1153 | option is meant to target environments in which `unistd.h' does
|
---|
1154 | not exist. Be aware that certain options may cause flex to
|
---|
1155 | generate code that relies on functions normally found in
|
---|
1156 | `unistd.h' (e.g., `isatty()', `read()'). If you wish to use these
|
---|
1157 | functions, you will have to inform your compiler where to find
|
---|
1158 | them. *Note option-always-interactive::. *Note option-read::.
|
---|
1159 |
|
---|
1160 | `--yyclass=NAME, `%option yyclass="NAME"''
|
---|
1161 | only applies when generating a C++ scanner (the `--c++' option).
|
---|
1162 | It informs `flex' that you have derived `NAME' as a subclass of
|
---|
1163 | `yyFlexLexer', so `flex' will place your actions in the member
|
---|
1164 | function `NAME::yylex()' instead of `yyFlexLexer::yylex()'. It
|
---|
1165 | also generates a `yyFlexLexer::yylex()' member function that emits
|
---|
1166 | a run-time error (by invoking `yyFlexLexer::LexerError()') if
|
---|
1167 | called. *Note Cxx::.
|
---|
1168 |
|
---|
1169 |
|
---|
1170 |
|
---|
1171 | File: flex.info, Node: Options for Scanner Speed and Size, Next: Debugging Options, Prev: Code-Level And API Options, Up: Scanner Options
|
---|
1172 |
|
---|
1173 | Options for Scanner Speed and Size
|
---|
1174 | ==================================
|
---|
1175 |
|
---|
1176 | `-C[aefFmr]'
|
---|
1177 | controls the degree of table compression and, more generally,
|
---|
1178 | trade-offs between small scanners and fast scanners.
|
---|
1179 |
|
---|
1180 | `-C'
|
---|
1181 | A lone `-C' specifies that the scanner tables should be
|
---|
1182 | compressed but neither equivalence classes nor
|
---|
1183 | meta-equivalence classes should be used.
|
---|
1184 |
|
---|
1185 | `-Ca, --align, `%option align''
|
---|
1186 | ("align") instructs flex to trade off larger tables in the
|
---|
1187 | generated scanner for faster performance because the elements
|
---|
1188 | of the tables are better aligned for memory access and
|
---|
1189 | computation. On some RISC architectures, fetching and
|
---|
1190 | manipulating longwords is more efficient than with
|
---|
1191 | smaller-sized units such as shortwords. This option can
|
---|
1192 | quadruple the size of the tables used by your scanner.
|
---|
1193 |
|
---|
1194 | `-Ce, --ecs, `%option ecs''
|
---|
1195 | directs `flex' to construct "equivalence classes", i.e., sets
|
---|
1196 | of characters which have identical lexical properties (for
|
---|
1197 | example, if the only appearance of digits in the `flex' input
|
---|
1198 | is in the character class "[0-9]" then the digits '0', '1',
|
---|
1199 | ..., '9' will all be put in the same equivalence class).
|
---|
1200 | Equivalence classes usually give dramatic reductions in the
|
---|
1201 | final table/object file sizes (typically a factor of 2-5) and
|
---|
1202 | are pretty cheap performance-wise (one array look-up per
|
---|
1203 | character scanned).
|
---|
1204 |
|
---|
1205 | `-Cf'
|
---|
1206 | specifies that the "full" scanner tables should be generated -
|
---|
1207 | `flex' should not compress the tables by taking advantage of
|
---|
1208 | similar transition functions for different states.
|
---|
1209 |
|
---|
1210 | `-CF'
|
---|
1211 | specifies that the alternate fast scanner representation
|
---|
1212 | (described below under the `--fast' flag) should be used.
|
---|
1213 | This option cannot be used with `--c++'.
|
---|
1214 |
|
---|
1215 | `-Cm, --meta-ecs, `%option meta-ecs''
|
---|
1216 | directs `flex' to construct "meta-equivalence classes", which
|
---|
1217 | are sets of equivalence classes (or characters, if equivalence
|
---|
1218 | classes are not being used) that are commonly used together.
|
---|
1219 | Meta-equivalence classes are often a big win when using
|
---|
1220 | compressed tables, but they have a moderate performance
|
---|
1221 | impact (one or two `if' tests and one array look-up per
|
---|
1222 | character scanned).
|
---|
1223 |
|
---|
1224 | `-Cr, --read, `%option read''
|
---|
1225 | causes the generated scanner to _bypass_ use of the standard
|
---|
1226 | I/O library (`stdio') for input. Instead of calling
|
---|
1227 | `fread()' or `getc()', the scanner will use the `read()'
|
---|
1228 | system call, resulting in a performance gain which varies
|
---|
1229 | from system to system, but in general is probably negligible
|
---|
1230 | unless you are also using `-Cf' or `-CF'. Using `-Cr' can
|
---|
1231 | cause strange behavior if, for example, you read from `yyin'
|
---|
1232 | using `stdio' prior to calling the scanner (because the
|
---|
1233 | scanner will miss whatever text your previous reads left in
|
---|
1234 | the `stdio' input buffer). `-Cr' has no effect if you define
|
---|
1235 | `YY_INPUT()' (*note Generated Scanner::).
|
---|
1236 |
|
---|
1237 | The options `-Cf' or `-CF' and `-Cm' do not make sense together -
|
---|
1238 | there is no opportunity for meta-equivalence classes if the table
|
---|
1239 | is not being compressed. Otherwise the options may be freely
|
---|
1240 | mixed, and are cumulative.
|
---|
1241 |
|
---|
1242 | The default setting is `-Cem', which specifies that `flex' should
|
---|
1243 | generate equivalence classes and meta-equivalence classes. This
|
---|
1244 | setting provides the highest degree of table compression. You can
|
---|
1245 | trade off faster-executing scanners at the cost of larger tables
|
---|
1246 | with the following generally being true:
|
---|
1247 |
|
---|
1248 |
|
---|
1249 | slowest & smallest
|
---|
1250 | -Cem
|
---|
1251 | -Cm
|
---|
1252 | -Ce
|
---|
1253 | -C
|
---|
1254 | -C{f,F}e
|
---|
1255 | -C{f,F}
|
---|
1256 | -C{f,F}a
|
---|
1257 | fastest & largest
|
---|
1258 |
|
---|
1259 | Note that scanners with the smallest tables are usually generated
|
---|
1260 | and compiled the quickest, so during development you will usually
|
---|
1261 | want to use the default, maximal compression.
|
---|
1262 |
|
---|
1263 | `-Cfe' is often a good compromise between speed and size for
|
---|
1264 | production scanners.
|
---|
1265 |
|
---|
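The equivalence classes that `-Ce' builds, described earlier in this entry, can be pictured with a small C sketch. It is a simplified model, not the algorithm `flex' implements: it assumes every pattern has been reduced to a set of character classes, and it groups characters that belong to exactly the same classes. The names `build_ecs()' and `add_range()' are hypothetical.

```c
#define NCHARS 256

/* Assign an equivalence-class id to every character: two characters share
   an id exactly when they belong to the same subset of the given character
   classes.  member[k][c] is nonzero if character c is in class k.
   Returns the number of equivalence classes found. */
int build_ecs(unsigned char member[][NCHARS], int nclasses, int ec[NCHARS])
{
    int count = 0;

    for (int c = 0; c < NCHARS; c++) {
        ec[c] = -1;
        for (int prev = 0; prev < c && ec[c] < 0; prev++) {
            int same = 1;
            for (int k = 0; k < nclasses; k++)
                if (!member[k][c] != !member[k][prev]) { same = 0; break; }
            if (same)
                ec[c] = ec[prev];   /* reuse the earlier character's class */
        }
        if (ec[c] < 0)
            ec[c] = count++;        /* first character of a new class */
    }
    return count;
}

/* Helper: mark the inclusive range lo..hi as belonging to one class. */
void add_range(unsigned char set[NCHARS], int lo, int hi)
{
    for (int c = lo; c <= hi; c++)
        set[c] = 1;
}
```

With `[0-9]' as the only character class, all ten digits land in one equivalence class and every other character in a second. In this model the transition tables then need two columns instead of 256, at the cost of one `ec[]' look-up per character scanned.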
1266 | `-f, --full, `%option full''
|
---|
1267 | specifies "fast scanner". No table compression is done and
|
---|
1268 | `stdio' is bypassed. The result is large but fast. This option
|
---|
1269 | is equivalent to `-Cfr'.
|
---|
1270 |
|
---|
1271 | `-F, --fast, `%option fast''
|
---|
1272 | specifies that the _fast_ scanner table representation should be
|
---|
1273 | used (and `stdio' bypassed). This representation is about as fast
|
---|
1274 | as the full table representation `--full', and for some sets of
|
---|
1275 | patterns will be considerably smaller (and for others, larger). In
|
---|
1276 | general, if the pattern set contains both _keywords_ and a
|
---|
1277 | catch-all, _identifier_ rule, such as in the set:
|
---|
1278 |
|
---|
1279 |
|
---|
1280 | "case" return TOK_CASE;
|
---|
1281 | "switch" return TOK_SWITCH;
|
---|
1282 | ...
|
---|
1283 | "default" return TOK_DEFAULT;
|
---|
1284 | [a-z]+ return TOK_ID;
|
---|
1285 |
|
---|
1286 | then you're better off using the full table representation. If
|
---|
1287 | only the _identifier_ rule is present and you then use a hash
|
---|
1288 | table or some such to detect the keywords, you're better off using
|
---|
1289 | `--fast'.
|
---|
1290 |
|
---|
1291 | This option is equivalent to `-CFr' (see above). It cannot be used
|
---|
1292 | with `--c++'.
|
---|
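The "identifier rule plus keyword lookup" arrangement mentioned above can be sketched as follows. The text suggests a hash table "or some such"; this sketch swaps in a sorted table searched with `bsearch()', which is simpler to show. The numeric token codes and the `classify_identifier()' name are assumptions; in a real project the codes would come from `y.tab.h'.

```c
#include <stdlib.h>
#include <string.h>

/* Token codes are assumptions; a yacc-generated y.tab.h would define them. */
enum { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword { const char *name; int token; };

/* Must stay sorted by name for bsearch(). */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct keyword *)elem)->name);
}

/* Meant to be called from the action of a single [a-z]+ rule. */
int classify_identifier(const char *text)
{
    const struct keyword *kw = bsearch(text, keywords,
                                       sizeof keywords / sizeof keywords[0],
                                       sizeof keywords[0], compare_keyword);
    return kw ? kw->token : TOK_ID;
}
```

The scanner then needs only one rule, along the lines of `[a-z]+ return classify_identifier( yytext );', which is the case where the text above recommends `--fast'.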
1293 |
|
---|
1294 |
|
---|