| 1 | This is flex.info, produced by makeinfo version 4.5 from flex.texi.
|
|---|
| 2 |
|
|---|
| 3 | INFO-DIR-SECTION Programming
|
|---|
| 4 | START-INFO-DIR-ENTRY
|
|---|
| 5 | * flex: (flex). Fast lexical analyzer generator (lex replacement).
|
|---|
| 6 | END-INFO-DIR-ENTRY
|
|---|
| 7 |
|
|---|
| 8 |
|
|---|
| 9 | The flex manual is placed under the same licensing conditions as the
|
|---|
| 10 | rest of flex:
|
|---|
| 11 |
|
|---|
| 12 | Copyright (C) 1990, 1997 The Regents of the University of California.
|
|---|
| 13 | All rights reserved.
|
|---|
| 14 |
|
|---|
| 15 | This code is derived from software contributed to Berkeley by Vern
|
|---|
| 16 | Paxson.
|
|---|
| 17 |
|
|---|
| 18 | The United States Government has rights in this work pursuant to
|
|---|
| 19 | contract no. DE-AC03-76SF00098 between the United States Department of
|
|---|
| 20 | Energy and the University of California.
|
|---|
| 21 |
|
|---|
| 22 | Redistribution and use in source and binary forms, with or without
|
|---|
| 23 | modification, are permitted provided that the following conditions are
|
|---|
| 24 | met:
|
|---|
| 25 |
|
|---|
| 26 | 1. Redistributions of source code must retain the above copyright
|
|---|
| 27 | notice, this list of conditions and the following disclaimer.
|
|---|
| 28 |
|
|---|
| 29 | 2. Redistributions in binary form must reproduce the above copyright
|
|---|
| 30 | notice, this list of conditions and the following disclaimer in the
|
|---|
| 31 | documentation and/or other materials provided with the
|
|---|
| 32 | distribution.
|
|---|
| 33 | Neither the name of the University nor the names of its contributors
|
|---|
| 34 | may be used to endorse or promote products derived from this software
|
|---|
| 35 | without specific prior written permission.
|
|---|
| 36 |
|
|---|
| 37 | THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
|
|---|
| 38 | WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
|---|
| 39 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
|---|
| 40 |
|
|---|
| 41 | File: flex.info, Node: Start Conditions, Next: Multiple Input Buffers, Prev: Generated Scanner, Up: Top
|
|---|
| 42 |
|
|---|
| 43 | Start Conditions
|
|---|
| 44 | ****************
|
|---|
| 45 |
|
|---|
| 46 | `flex' provides a mechanism for conditionally activating rules. Any
|
|---|
| 47 | rule whose pattern is prefixed with `<sc>' will only be active when the
|
|---|
| 48 | scanner is in the "start condition" named `sc'. For example,
|
|---|
| 49 |
|
|---|
| 50 |
|
|---|
| 51 | <STRING>[^"]* { /* eat up the string body ... */
|
|---|
| 52 | ...
|
|---|
| 53 | }
|
|---|
| 54 |
|
|---|
| 55 | will be active only when the scanner is in the `STRING' start
|
|---|
| 56 | condition, and
|
|---|
| 57 |
|
|---|
| 58 |
|
|---|
| 59 | <INITIAL,STRING,QUOTE>\. { /* handle an escape ... */
|
|---|
| 60 | ...
|
|---|
| 61 | }
|
|---|
| 62 |
|
|---|
| 63 | will be active only when the current start condition is either
|
|---|
| 64 | `INITIAL', `STRING', or `QUOTE'.
|
|---|
| 65 |
|
|---|
| 66 | Start conditions are declared in the definitions (first) section of
|
|---|
| 67 | the input using unindented lines beginning with either `%s' or `%x'
|
|---|
| 68 | followed by a list of names. The former declares "inclusive" start
|
|---|
| 69 | conditions, the latter "exclusive" start conditions. A start condition
|
|---|
| 70 | is activated using the `BEGIN' action. Until the next `BEGIN' action
|
|---|
| 71 | is executed, rules with the given start condition will be active and
|
|---|
| 72 | rules with other start conditions will be inactive. If the start
|
|---|
| 73 | condition is inclusive, then rules with no start conditions at all will
|
|---|
| 74 | also be active. If it is exclusive, then _only_ rules qualified with
|
|---|
| 75 | the start condition will be active. A set of rules contingent on the
|
|---|
| 76 | same exclusive start condition describes a scanner which is independent
|
|---|
| 77 | of any of the other rules in the `flex' input. Because of this,
|
|---|
| 78 | exclusive start conditions make it easy to specify "mini-scanners"
|
|---|
| 79 | which scan portions of the input that are syntactically different from
|
|---|
| 80 | the rest (e.g., comments).
|
|---|
| 81 |
|
|---|
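The activation rule described above can be modelled in a few lines of plain C. This is only an illustrative sketch, not how `flex' implements start conditions: the names `struct rule', `rule_is_active' and the `exclusive' table are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical numbering for this sketch: INITIAL is 0, as in flex. */
enum { SC_INITIAL = 0, SC_EXAMPLE = 1, SC_COUNT = 2 };

/* A rule together with the start conditions listed in its <...> prefix. */
struct rule {
    const int *conds;   /* listed start conditions, or NULL */
    size_t     nconds;  /* 0 means the rule has no <...> prefix */
};

/*
 * Return true if rule r may match while the scanner is in start
 * condition `current'.  exclusive[sc] is true when sc was declared
 * with %x and false when it was declared with %s (or is INITIAL).
 */
static bool rule_is_active(const struct rule *r, int current,
                           const bool *exclusive)
{
    assert(r != NULL && exclusive != NULL);

    /* Unqualified rules are active unless the condition is exclusive. */
    if (r->nconds == 0)
        return !exclusive[current];

    /* Qualified rules are active only in the conditions they list. */
    for (size_t i = 0; i < r->nconds; i++)
        if (r->conds[i] == current)
            return true;

    return false;
}
```

With `example' marked exclusive, an unqualified rule is reported inactive in `example' but still active in `INITIAL', which is the behaviour the text describes for `%x'.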
| 82 | If the distinction between inclusive and exclusive start conditions
|
|---|
| 83 | is still a little vague, here's a simple example illustrating the
|
|---|
| 84 | connection between the two. The set of rules:
|
|---|
| 85 |
|
|---|
| 86 |
|
|---|
| 87 | %s example
|
|---|
| 88 | %%
|
|---|
| 89 |
|
|---|
| 90 | <example>foo do_something();
|
|---|
| 91 |
|
|---|
| 92 | bar something_else();
|
|---|
| 93 |
|
|---|
| 94 | is equivalent to
|
|---|
| 95 |
|
|---|
| 96 |
|
|---|
| 97 | %x example
|
|---|
| 98 | %%
|
|---|
| 99 |
|
|---|
| 100 | <example>foo do_something();
|
|---|
| 101 |
|
|---|
| 102 | <INITIAL,example>bar something_else();
|
|---|
| 103 |
|
|---|
| 104 | Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
|
|---|
| 105 | second example wouldn't be active (i.e., couldn't match) when in start
|
|---|
| 106 | condition `example'. If we just used `<example>' to qualify `bar',
|
|---|
| 107 | though, then it would only be active in `example' and not in `INITIAL',
|
|---|
| 108 | while in the first example it's active in both, because in the first
|
|---|
| 109 | example the `example' start condition is an inclusive (`%s') start
|
|---|
| 110 | condition.
|
|---|
| 111 |
|
|---|
| 112 | Also note that the special start-condition specifier `<*>' matches
|
|---|
| 113 | every start condition. Thus, the above example could also have been
|
|---|
| 114 | written:
|
|---|
| 115 |
|
|---|
| 116 |
|
|---|
| 117 | %x example
|
|---|
| 118 | %%
|
|---|
| 119 |
|
|---|
| 120 | <example>foo do_something();
|
|---|
| 121 |
|
|---|
| 122 | <*>bar something_else();
|
|---|
| 123 |
|
|---|
| 124 | The default rule (to `ECHO' any unmatched character) remains active
|
|---|
| 125 | in start conditions. It is equivalent to:
|
|---|
| 126 |
|
|---|
| 127 |
|
|---|
| 128 | <*>.|\n ECHO;
|
|---|
| 129 |
|
|---|
| 130 | `BEGIN(0)' returns to the original state where only the rules with
|
|---|
| 131 | no start conditions are active. This state can also be referred to as
|
|---|
| 132 | the start-condition `INITIAL', so `BEGIN(INITIAL)' is equivalent to
|
|---|
| 133 | `BEGIN(0)'. (The parentheses around the start condition name are not
|
|---|
| 134 | required but are considered good style.)
|
|---|
| 135 |
|
|---|
| 136 | `BEGIN' actions can also be given as indented code at the beginning
|
|---|
| 137 | of the rules section. For example, the following will cause the scanner
|
|---|
| 138 | to enter the `SPECIAL' start condition whenever `yylex()' is called and
|
|---|
| 139 | the global variable `enter_special' is true:
|
|---|
| 140 |
|
|---|
| 141 |
|
|---|
| 142 | int enter_special;
|
|---|
| 143 |
|
|---|
| 144 | %x SPECIAL
|
|---|
| 145 | %%
|
|---|
| 146 | if ( enter_special )
|
|---|
| 147 | BEGIN(SPECIAL);
|
|---|
| 148 |
|
|---|
| 149 | <SPECIAL>blahblahblah
|
|---|
| 150 | ...more rules follow...
|
|---|
| 151 |
|
|---|
| 152 | To illustrate the uses of start conditions, here is a scanner which
|
|---|
| 153 | provides two different interpretations of a string like `123.456'. By
|
|---|
| 154 | default it will treat it as three tokens, the integer `123', a dot
|
|---|
| 155 | (`.'), and the integer `456'. But if the string is preceded earlier in
|
|---|
| 156 | the line by the string `expect-floats' it will treat it as a single
|
|---|
| 157 | token, the floating-point number `123.456':
|
|---|
| 158 |
|
|---|
| 159 |
|
|---|
| 160 | %{
|
|---|
| 161 | #include <math.h>
|
|---|
| 162 | %}
|
|---|
| 163 | %s expect
|
|---|
| 164 |
|
|---|
| 165 | %%
|
|---|
| 166 | expect-floats BEGIN(expect);
|
|---|
| 167 |
|
|---|
| 168 | <expect>[0-9]+"."[0-9]+ {
|
|---|
| 169 | printf( "found a float, = %f\n",
|
|---|
| 170 | atof( yytext ) );
|
|---|
| 171 | }
|
|---|
| 172 | <expect>\n {
|
|---|
| 173 | /* that's the end of the line, so
|
|---|
| 174 | * we need another "expect-floats"
|
|---|
| 175 | * before we'll recognize any more
|
|---|
| 176 | * numbers
|
|---|
| 177 | */
|
|---|
| 178 | BEGIN(INITIAL);
|
|---|
| 179 | }
|
|---|
| 180 |
|
|---|
| 181 | [0-9]+ {
|
|---|
| 182 | printf( "found an integer, = %d\n",
|
|---|
| 183 | atoi( yytext ) );
|
|---|
| 184 | }
|
|---|
| 185 |
|
|---|
| 186 | "." printf( "found a dot\n" );
|
|---|
| 187 |
|
|---|
| 188 | Here is a scanner which recognizes (and discards) C comments while
|
|---|
| 189 | maintaining a count of the current input line.
|
|---|
| 190 |
|
|---|
| 191 |
|
|---|
| 192 | %x comment
|
|---|
| 193 | %%
|
|---|
| 194 | int line_num = 1;
|
|---|
| 195 |
|
|---|
| 196 | "/*" BEGIN(comment);
|
|---|
| 197 |
|
|---|
| 198 | <comment>[^*\n]* /* eat anything that's not a '*' */
|
|---|
| 199 | <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
|
|---|
| 200 | <comment>\n ++line_num;
|
|---|
| 201 | <comment>"*"+"/" BEGIN(INITIAL);
|
|---|
| 202 |
|
|---|
| 203 | This scanner goes to a bit of trouble to match as much text as
|
|---|
| 204 | possible with each rule. In general, when attempting to write a
|
|---|
| 205 | high-speed scanner try to match as much as possible in each rule, as it's
|
|---|
| 206 | a big win.
|
|---|
| 207 |
|
|---|
| 208 | Note that start condition names are really integer values and can
|
|---|
| 209 | be stored as such. Thus, the above could be extended in the following
|
|---|
| 210 | fashion:
|
|---|
| 211 |
|
|---|
| 212 |
|
|---|
| 213 | %x comment foo
|
|---|
| 214 | %%
|
|---|
| 215 | int line_num = 1;
|
|---|
| 216 | int comment_caller;
|
|---|
| 217 |
|
|---|
| 218 | "/*" {
|
|---|
| 219 | comment_caller = INITIAL;
|
|---|
| 220 | BEGIN(comment);
|
|---|
| 221 | }
|
|---|
| 222 |
|
|---|
| 223 | ...
|
|---|
| 224 |
|
|---|
| 225 | <foo>"/*" {
|
|---|
| 226 | comment_caller = foo;
|
|---|
| 227 | BEGIN(comment);
|
|---|
| 228 | }
|
|---|
| 229 |
|
|---|
| 230 | <comment>[^*\n]* /* eat anything that's not a '*' */
|
|---|
| 231 | <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
|
|---|
| 232 | <comment>\n ++line_num;
|
|---|
| 233 | <comment>"*"+"/" BEGIN(comment_caller);
|
|---|
| 234 |
|
|---|
| 235 | Furthermore, you can access the current start condition using the
|
|---|
| 236 | integer-valued `YY_START' macro. For example, the above assignments to
|
|---|
| 237 | `comment_caller' could instead be written
|
|---|
| 238 |
|
|---|
| 239 |
|
|---|
| 240 | comment_caller = YY_START;
|
|---|
| 241 |
|
|---|
| 242 | Flex provides `YYSTATE' as an alias for `YY_START' (since that is
|
|---|
| 243 | what's used by AT&T `lex').
|
|---|
| 244 |
|
|---|
| 245 | For historical reasons, start conditions do not have their own
|
|---|
| 246 | name-space within the generated scanner. The start condition names are
|
|---|
| 247 | unmodified in the generated scanner and generated header. *Note
|
|---|
| 248 | option-header::. *Note option-prefix::.
|
|---|
| 249 |
|
|---|
| 250 | Finally, here's an example of how to match C-style quoted strings
|
|---|
| 251 | using exclusive start conditions, including expanded escape sequences
|
|---|
| 252 | (but not including checking for a string that's too long):
|
|---|
| 253 |
|
|---|
| 254 |
|
|---|
| 255 | %x str
|
|---|
| 256 |
|
|---|
| 257 | %%
|
|---|
| 258 | char string_buf[MAX_STR_CONST];
|
|---|
| 259 | char *string_buf_ptr;
|
|---|
| 260 |
|
|---|
| 261 |
|
|---|
| 262 | \" string_buf_ptr = string_buf; BEGIN(str);
|
|---|
| 263 |
|
|---|
| 264 | <str>\" { /* saw closing quote - all done */
|
|---|
| 265 | BEGIN(INITIAL);
|
|---|
| 266 | *string_buf_ptr = '\0';
|
|---|
| 267 | /* return string constant token type and
|
|---|
| 268 | * value to parser
|
|---|
| 269 | */
|
|---|
| 270 | }
|
|---|
| 271 |
|
|---|
| 272 | <str>\n {
|
|---|
| 273 | /* error - unterminated string constant */
|
|---|
| 274 | /* generate error message */
|
|---|
| 275 | }
|
|---|
| 276 |
|
|---|
| 277 | <str>\\[0-7]{1,3} {
|
|---|
| 278 | /* octal escape sequence */
|
|---|
| 279 | unsigned int result; /* %o stores into an unsigned int */
|
|---|
| 280 |
|
|---|
| 281 | (void) sscanf( yytext + 1, "%o", &result );
|
|---|
| 282 |
|
|---|
| 283 | if ( result > 0xff )
|
|---|
| 284 | { /* error, constant is out-of-bounds */ }
|
|---|
| 285 |
|
|---|
| 286 | *string_buf_ptr++ = result;
|
|---|
| 287 | }
|
|---|
| 288 |
|
|---|
| 289 | <str>\\[0-9]+ {
|
|---|
| 290 | /* generate error - bad escape sequence; something
|
|---|
| 291 | * like '\48' or '\0777777'
|
|---|
| 292 | */
|
|---|
| 293 | }
|
|---|
| 294 |
|
|---|
| 295 | <str>\\n *string_buf_ptr++ = '\n';
|
|---|
| 296 | <str>\\t *string_buf_ptr++ = '\t';
|
|---|
| 297 | <str>\\r *string_buf_ptr++ = '\r';
|
|---|
| 298 | <str>\\b *string_buf_ptr++ = '\b';
|
|---|
| 299 | <str>\\f *string_buf_ptr++ = '\f';
|
|---|
| 300 |
|
|---|
| 301 | <str>\\(.|\n) *string_buf_ptr++ = yytext[1];
|
|---|
| 302 |
|
|---|
| 303 | <str>[^\\\n\"]+ {
|
|---|
| 304 | char *yptr = yytext;
|
|---|
| 305 |
|
|---|
| 306 | while ( *yptr )
|
|---|
| 307 | *string_buf_ptr++ = *yptr++;
|
|---|
| 308 | }
|
|---|
| 309 |
|
|---|
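The octal-escape action in the scanner above can be tried outside of `flex' with a small stand-alone helper. The following is a sketch under stated assumptions: `decode_octal_escape' is a hypothetical name, and the function only mirrors the `sscanf' conversion and range check used in the `<str>\\[0-7]{1,3}' action.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Decode an escape of the form \ooo (one to three octal digits).
 * On success store the byte in *out and return 0.  Return -1 if the
 * text is not such an escape or if the value does not fit in a byte.
 */
static int decode_octal_escape(const char *text, unsigned char *out)
{
    unsigned int result;   /* %o requires a pointer to unsigned int */

    assert(text != NULL && out != NULL);

    /* Skip the backslash, then read at most three octal digits. */
    if (text[0] != '\\' || sscanf(text + 1, "%3o", &result) != 1)
        return -1;

    if (result > 0xff)     /* e.g. \777 is out of bounds */
        return -1;

    *out = (unsigned char) result;
    return 0;
}
```

Returning an error code rather than storing the value unconditionally keeps the out-of-bounds case from writing a truncated byte into the string buffer.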
| 310 | Often, such as in some of the examples above, you wind up writing a
|
|---|
| 311 | whole bunch of rules all preceded by the same start condition(s). Flex
|
|---|
| 312 | makes this a little easier and cleaner by introducing a notion of start
|
|---|
| 313 | condition "scope". A start condition scope is begun with:
|
|---|
| 314 |
|
|---|
| 315 |
|
|---|
| 316 | <SCs>{
|
|---|
| 317 |
|
|---|
| 318 | where `SCs' is a list of one or more start conditions. Inside the
|
|---|
| 319 | start condition scope, every rule automatically has the prefix `<SCs>'
|
|---|
| 320 | applied to it, until a `}' which matches the initial `{'. So, for
|
|---|
| 321 | example,
|
|---|
| 322 |
|
|---|
| 323 |
|
|---|
| 324 | <ESC>{
|
|---|
| 325 | "\\n" return '\n';
|
|---|
| 326 | "\\r" return '\r';
|
|---|
| 327 | "\\f" return '\f';
|
|---|
| 328 | "\\0" return '\0';
|
|---|
| 329 | }
|
|---|
| 330 |
|
|---|
| 331 | is equivalent to:
|
|---|
| 332 |
|
|---|
| 333 |
|
|---|
| 334 | <ESC>"\\n" return '\n';
|
|---|
| 335 | <ESC>"\\r" return '\r';
|
|---|
| 336 | <ESC>"\\f" return '\f';
|
|---|
| 337 | <ESC>"\\0" return '\0';
|
|---|
| 338 |
|
|---|
| 339 | Start condition scopes may be nested.
|
|---|
| 340 |
|
|---|
| 341 | The following routines are available for manipulating stacks of
|
|---|
| 342 | start conditions:
|
|---|
| 343 |
|
|---|
| 344 | - Function: void yy_push_state ( int `new_state' )
|
|---|
| 345 | pushes the current start condition onto the top of the start
|
|---|
| 346 | condition stack and switches to `new_state' as though you had used
|
|---|
| 347 | `BEGIN new_state' (recall that start condition names are also
|
|---|
| 348 | integers).
|
|---|
| 349 |
|
|---|
| 350 | - Function: void yy_pop_state ()
|
|---|
| 351 | pops the top of the stack and switches to it via `BEGIN'.
|
|---|
| 352 |
|
|---|
| 353 | - Function: int yy_top_state ()
|
|---|
| 354 | returns the top of the stack without altering the stack's contents.
|
|---|
| 355 |
|
|---|
| 356 | The start condition stack grows dynamically and so has no built-in
|
|---|
| 357 | size limitation. If memory is exhausted, program execution aborts.
|
|---|
| 358 |
|
|---|
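The semantics of the three stack routines can be summarized with a plain C model. This is a sketch only: `struct sc_stack', `sc_push', `sc_pop' and `sc_top' are hypothetical names, and the code does not reproduce the generated scanner's implementation. The `current' field plays the role of `YY_START'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct sc_stack {
    int    *items;     /* saved start conditions, oldest first */
    size_t  depth;     /* number of saved entries */
    size_t  capacity;  /* allocated slots in items */
    int     current;   /* the active start condition */
};

/* Save the current start condition, then switch to new_state. */
static void sc_push(struct sc_stack *s, int new_state)
{
    assert(s != NULL);
    if (s->depth == s->capacity) {
        /* Grow on demand; give up if memory is exhausted. */
        size_t cap = s->capacity ? s->capacity * 2 : 8;
        int *p = realloc(s->items, cap * sizeof *p);
        if (p == NULL) {
            fputs("out of memory expanding start-condition stack\n", stderr);
            abort();
        }
        s->items = p;
        s->capacity = cap;
    }
    s->items[s->depth++] = s->current;
    s->current = new_state;
}

/* Remove the top entry and make it the current start condition. */
static void sc_pop(struct sc_stack *s)
{
    assert(s != NULL && s->depth > 0);
    s->current = s->items[--s->depth];
}

/* Return the top entry without changing the stack. */
static int sc_top(const struct sc_stack *s)
{
    assert(s != NULL && s->depth > 0);
    return s->items[s->depth - 1];
}
```

Note that in this model the top of the stack is the start condition that was active before the most recent push, not the one currently active.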
| 359 | To use start condition stacks, your scanner must include a `%option
|
|---|
| 360 | stack' directive (*note Scanner Options::).
|
|---|
| 361 |
|
|---|
| 362 |
|
|---|
| 363 | File: flex.info, Node: Multiple Input Buffers, Next: EOF, Prev: Start Conditions, Up: Top
|
|---|
| 364 |
|
|---|
| 365 | Multiple Input Buffers
|
|---|
| 366 | **********************
|
|---|
| 367 |
|
|---|
| 368 | Some scanners (such as those which support "include" files) require
|
|---|
| 369 | reading from several input streams. As `flex' scanners do a large
|
|---|
| 370 | amount of buffering, one cannot control where the next input will be
|
|---|
| 371 | read from by simply writing a `YY_INPUT()' which is sensitive to the
|
|---|
| 372 | scanning context. `YY_INPUT()' is only called when the scanner reaches
|
|---|
| 373 | the end of its buffer, which may be a long time after scanning a
|
|---|
| 374 | statement such as an `include' statement which requires switching the
|
|---|
| 375 | input source.
|
|---|
| 376 |
|
|---|
| 377 | To negotiate these sorts of problems, `flex' provides a mechanism
|
|---|
| 378 | for creating and switching between multiple input buffers. An input
|
|---|
| 379 | buffer is created by using:
|
|---|
| 380 |
|
|---|
| 381 | - Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )
|
|---|
| 382 |
|
|---|
| 383 | which takes a `FILE' pointer and a size and creates a buffer
|
|---|
| 384 | associated with the given file and large enough to hold `size'
|
|---|
| 385 | characters (when in doubt, use `YY_BUF_SIZE' for the size). It returns
|
|---|
| 386 | a `YY_BUFFER_STATE' handle, which may then be passed to other routines
|
|---|
| 387 | (see below). The `YY_BUFFER_STATE' type is a pointer to an opaque
|
|---|
| 388 | `struct yy_buffer_state' structure, so you may safely initialize
|
|---|
| 389 | `YY_BUFFER_STATE' variables to `((YY_BUFFER_STATE) 0)' if you wish, and
|
|---|
| 390 | also refer to the opaque structure in order to correctly declare input
|
|---|
| 391 | buffers in source files other than that of your scanner. Note that the
|
|---|
| 392 | `FILE' pointer in the call to `yy_create_buffer' is only used as the
|
|---|
| 393 | value of `yyin' seen by `YY_INPUT'. If you redefine `YY_INPUT()' so it
|
|---|
| 394 | no longer uses `yyin', then you can safely pass a NULL `FILE' pointer to
|
|---|
| 395 | `yy_create_buffer'. You select a particular buffer to scan from using:
|
|---|
| 396 |
|
|---|
| 397 | - Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )
|
|---|
| 398 |
|
|---|
| 399 | The above function switches the scanner's input buffer so subsequent
|
|---|
| 400 | tokens will come from `new_buffer'. Note that `yy_switch_to_buffer()'
|
|---|
| 401 | may be used by `yywrap()' to set things up for continued scanning,
|
|---|
| 402 | instead of opening a new file and pointing `yyin' at it. If you are
|
|---|
| 403 | looking for a stack of input buffers, then you want to use
|
|---|
| 404 | `yypush_buffer_state()' instead of this function. Note also that
|
|---|
| 405 | switching input sources via either `yy_switch_to_buffer()' or
|
|---|
| 406 | `yywrap()' does _not_ change the start condition.
|
|---|
| 407 |
|
|---|
| 408 | - Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )
|
|---|
| 409 |
|
|---|
| 410 | is used to reclaim the storage associated with a buffer. (`buffer'
|
|---|
| 411 | can be NULL, in which case the routine does nothing.)
|
|---|
| 412 |
|
|---|
| 413 |
|
|---|
| 414 | - Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )
|
|---|
| 415 |
|
|---|
| 416 | This function pushes the new buffer state onto an internal stack.
|
|---|
| 417 | The pushed state becomes the new current state. The stack is maintained
|
|---|
| 418 | by flex and will grow as required. This function is intended to be used
|
|---|
| 419 | instead of `yy_switch_to_buffer', when you want to change states, but
|
|---|
| 420 | preserve the current state for later use.
|
|---|
| 421 |
|
|---|
| 422 | - Function: void yypop_buffer_state ( )
|
|---|
| 423 |
|
|---|
| 424 | This function removes the current state from the top of the stack,
|
|---|
| 425 | and deletes it by calling `yy_delete_buffer'. The next state on the
|
|---|
| 426 | stack, if any, becomes the new current state.
|
|---|
| 427 |
|
|---|
| 428 | - Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )
|
|---|
| 429 |
|
|---|
| 430 | This function discards the buffer's contents, so the next time the
|
|---|
| 431 | scanner attempts to match a token from the buffer, it will first fill
|
|---|
| 432 | the buffer anew using `YY_INPUT()'.
|
|---|
| 433 |
|
|---|
| 434 | - Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )
|
|---|
| 435 |
|
|---|
| 436 | is an alias for `yy_create_buffer()', provided for compatibility
|
|---|
| 437 | with the C++ use of `new' and `delete' for creating and destroying
|
|---|
| 438 | dynamic objects.
|
|---|
| 439 |
|
|---|
| 440 | The `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE' handle to the
|
|---|
| 441 | current buffer. It should not be used as an lvalue.
|
|---|
| 442 |
|
|---|
| 443 | Here are two examples of using these features for writing a scanner
|
|---|
| 444 | which expands include files (the `<<EOF>>' feature is discussed below).
|
|---|
| 445 |
|
|---|
| 446 | This first example uses yypush_buffer_state and yypop_buffer_state.
|
|---|
| 447 | Flex maintains the stack internally.
|
|---|
| 448 |
|
|---|
| 449 |
|
|---|
| 450 | /* the "incl" state is used for picking up the name
|
|---|
| 451 | * of an include file
|
|---|
| 452 | */
|
|---|
| 453 | %x incl
|
|---|
| 454 | %%
|
|---|
| 455 | include BEGIN(incl);
|
|---|
| 456 |
|
|---|
| 457 | [a-z]+ ECHO;
|
|---|
| 458 | [^a-z\n]*\n? ECHO;
|
|---|
| 459 |
|
|---|
| 460 | <incl>[ \t]* /* eat the whitespace */
|
|---|
| 461 | <incl>[^ \t\n]+ { /* got the include file name */
|
|---|
| 462 | yyin = fopen( yytext, "r" );
|
|---|
| 463 |
|
|---|
| 464 | if ( ! yyin )
|
|---|
| 465 | error( ... );
|
|---|
| 466 |
|
|---|
| 467 | yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));
|
|---|
| 468 |
|
|---|
| 469 | BEGIN(INITIAL);
|
|---|
| 470 | }
|
|---|
| 471 |
|
|---|
| 472 | <<EOF>> {
|
|---|
| 473 | yypop_buffer_state();
|
|---|
| 474 |
|
|---|
| 475 | if ( !YY_CURRENT_BUFFER )
|
|---|
| 476 | {
|
|---|
| 477 | yyterminate();
|
|---|
| 478 | }
|
|---|
| 479 | }
|
|---|
| 480 |
|
|---|
| 481 | The second example, below, does the same thing as the previous
|
|---|
| 482 | example did, but manages its own input buffer stack manually (instead
|
|---|
| 483 | of letting flex do it).
|
|---|
| 484 |
|
|---|
| 485 |
|
|---|
| 486 | /* the "incl" state is used for picking up the name
|
|---|
| 487 | * of an include file
|
|---|
| 488 | */
|
|---|
| 489 | %x incl
|
|---|
| 490 |
|
|---|
| 491 | %{
|
|---|
| 492 | #define MAX_INCLUDE_DEPTH 10
|
|---|
| 493 | YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
|
|---|
| 494 | int include_stack_ptr = 0;
|
|---|
| 495 | %}
|
|---|
| 496 |
|
|---|
| 497 | %%
|
|---|
| 498 | include BEGIN(incl);
|
|---|
| 499 |
|
|---|
| 500 | [a-z]+ ECHO;
|
|---|
| 501 | [^a-z\n]*\n? ECHO;
|
|---|
| 502 |
|
|---|
| 503 | <incl>[ \t]* /* eat the whitespace */
|
|---|
| 504 | <incl>[^ \t\n]+ { /* got the include file name */
|
|---|
| 505 | if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
|
|---|
| 506 | {
|
|---|
| 507 | fprintf( stderr, "Includes nested too deeply\n" );
|
|---|
| 508 | exit( 1 );
|
|---|
| 509 | }
|
|---|
| 510 |
|
|---|
| 511 | include_stack[include_stack_ptr++] =
|
|---|
| 512 | YY_CURRENT_BUFFER;
|
|---|
| 513 |
|
|---|
| 514 | yyin = fopen( yytext, "r" );
|
|---|
| 515 |
|
|---|
| 516 | if ( ! yyin )
|
|---|
| 517 | error( ... );
|
|---|
| 518 |
|
|---|
| 519 | yy_switch_to_buffer(
|
|---|
| 520 | yy_create_buffer( yyin, YY_BUF_SIZE ) );
|
|---|
| 521 |
|
|---|
| 522 | BEGIN(INITIAL);
|
|---|
| 523 | }
|
|---|
| 524 |
|
|---|
| 525 | <<EOF>> {
|
|---|
| 526 | if ( --include_stack_ptr < 0 )
|
|---|
| 527 | {
|
|---|
| 528 | yyterminate();
|
|---|
| 529 | }
|
|---|
| 530 |
|
|---|
| 531 | else
|
|---|
| 532 | {
|
|---|
| 533 | yy_delete_buffer( YY_CURRENT_BUFFER );
|
|---|
| 534 | yy_switch_to_buffer(
|
|---|
| 535 | include_stack[include_stack_ptr] );
|
|---|
| 536 | }
|
|---|
| 537 | }
|
|---|
| 538 |
|
|---|
| 539 | The following routines are available for setting up input buffers for
|
|---|
| 540 | scanning in-memory strings instead of files. All of them create a new
|
|---|
| 541 | input buffer for scanning the string, and return a corresponding
|
|---|
| 542 | `YY_BUFFER_STATE' handle (which you should delete with
|
|---|
| 543 | `yy_delete_buffer()' when done with it). They also switch to the new
|
|---|
| 544 | buffer using `yy_switch_to_buffer()', so the next call to `yylex()'
|
|---|
| 545 | will start scanning the string.
|
|---|
| 546 |
|
|---|
| 547 | - Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
|
|---|
| 548 | scans a NUL-terminated string.
|
|---|
| 549 |
|
|---|
| 550 | - Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len )
|
|---|
| 552 | scans `len' bytes (including possibly `NUL's) starting at location
|
|---|
| 553 | `bytes'.
|
|---|
| 554 |
|
|---|
| 555 | Note that both of these functions create and scan a _copy_ of the
|
|---|
| 556 | string or bytes. (This may be desirable, since `yylex()' modifies the
|
|---|
| 557 | contents of the buffer it is scanning.) You can avoid the copy by
|
|---|
| 558 | using:
|
|---|
| 559 |
|
|---|
| 560 | - Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t size)
|
|---|
| 561 | which scans in place the buffer starting at `base', consisting of
|
|---|
| 562 | `size' bytes, the last two bytes of which _must_ be
|
|---|
| 563 | `YY_END_OF_BUFFER_CHAR' (ASCII NUL). These last two bytes are not
|
|---|
| 564 | scanned; thus, scanning consists of `base[0]' through
|
|---|
| 565 | `base[size-3]', inclusive.
|
|---|
| 566 |
|
|---|
| 567 | If you fail to set up `base' in this manner (i.e., forget the final
|
|---|
| 568 | two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()' returns a
|
|---|
| 569 | NULL pointer instead of creating a new input buffer.
|
|---|
| 570 |
|
|---|
| 571 | - Data type: yy_size_t
|
|---|
| 572 | is an integral type to which you can cast an integer expression
|
|---|
| 573 | reflecting the size of the buffer.
|
|---|
| 574 |
|
|---|
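Preparing a buffer that satisfies the two-terminator requirement of `yy_scan_buffer()' is easy to get wrong, so here is a sketch of one way to do it in plain C. The helpers `make_scan_buffer' and `has_end_markers' are hypothetical and are not part of `flex'. The sketch relies on the statement above that `YY_END_OF_BUFFER_CHAR' is ASCII NUL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy len bytes of text into a newly allocated buffer of len + 2
 * bytes whose last two bytes are NUL.  The total size is stored in
 * *size_out.  Returns NULL if allocation fails; the caller frees
 * the result.
 */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base;

    assert(text != NULL && size_out != NULL);

    base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len]     = '\0';   /* first end-of-buffer marker */
    base[len + 1] = '\0';   /* second end-of-buffer marker */
    *size_out = len + 2;
    return base;
}

/* Check the documented precondition: the last two bytes are NUL. */
static int has_end_markers(const char *base, size_t size)
{
    assert(base != NULL);
    return size >= 2 && base[size - 2] == '\0' && base[size - 1] == '\0';
}
```

Inside a scanner, the result would then be handed over with something like `yy_scan_buffer( base, (yy_size_t) size )', and the returned handle released with `yy_delete_buffer()' when scanning is finished.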
| 575 |
|
|---|
| 576 | File: flex.info, Node: EOF, Next: Misc Macros, Prev: Multiple Input Buffers, Up: Top
|
|---|
| 577 |
|
|---|
| 578 | End-of-File Rules
|
|---|
| 579 | *****************
|
|---|
| 580 |
|
|---|
| 581 | The special rule `<<EOF>>' indicates actions which are to be taken
|
|---|
| 582 | when an end-of-file is encountered and `yywrap()' returns non-zero
|
|---|
| 583 | (i.e., indicates no further files to process). The action must finish
|
|---|
| 584 | by doing one of the following things:
|
|---|
| 585 |
|
|---|
| 586 | * assigning `yyin' to a new input file (in previous versions of
|
|---|
| 587 | `flex', after doing the assignment you had to call the special
|
|---|
| 588 | action `YY_NEW_FILE'. This is no longer necessary.)
|
|---|
| 589 |
|
|---|
| 590 | * executing a `return' statement;
|
|---|
| 591 |
|
|---|
| 592 | * executing the special `yyterminate()' action.
|
|---|
| 593 |
|
|---|
| 594 | * or, switching to a new buffer using `yy_switch_to_buffer()' as
|
|---|
| 595 | shown in the example above.
|
|---|
| 596 |
|
|---|
| 597 | <<EOF>> rules may not be used with other patterns; they may only be
|
|---|
| 598 | qualified with a list of start conditions. If an unqualified <<EOF>>
|
|---|
| 599 | rule is given, it applies to _all_ start conditions which do not
|
|---|
| 600 | already have <<EOF>> actions. To specify an <<EOF>> rule for only the
|
|---|
| 601 | initial start condition, use:
|
|---|
| 602 |
|
|---|
| 603 |
|
|---|
| 604 | <INITIAL><<EOF>>
|
|---|
| 605 |
|
|---|
| 606 | These rules are useful for catching things like unclosed comments.
|
|---|
| 607 | An example:
|
|---|
| 608 |
|
|---|
| 609 |
|
|---|
| 610 | %x quote
|
|---|
| 611 | %%
|
|---|
| 612 |
|
|---|
| 613 | ...other rules for dealing with quotes...
|
|---|
| 614 |
|
|---|
| 615 | <quote><<EOF>> {
|
|---|
| 616 | error( "unterminated quote" );
|
|---|
| 617 | yyterminate();
|
|---|
| 618 | }
|
|---|
| 619 | <<EOF>> {
|
|---|
| 620 | if ( *++filelist )
|
|---|
| 621 | yyin = fopen( *filelist, "r" );
|
|---|
| 622 | else
|
|---|
| 623 | yyterminate();
|
|---|
| 624 | }
|
|---|
| 625 |
|
|---|
| 626 |
|
|---|
| 627 | File: flex.info, Node: Misc Macros, Next: User Values, Prev: EOF, Up: Top
|
|---|
| 628 |
|
|---|
| 629 | Miscellaneous Macros
|
|---|
| 630 | ********************
|
|---|
| 631 |
|
|---|
| 632 | The macro `YY_USER_ACTION' can be defined to provide an action which
|
|---|
| 633 | is always executed prior to the matched rule's action. For example, it
|
|---|
| 634 | could be #define'd to call a routine to convert yytext to lower-case.
|
|---|
| 635 | When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
|
|---|
| 636 | number of the matched rule (rules are numbered starting with 1).
|
|---|
| 637 | Suppose you want to profile how often each of your rules is matched.
|
|---|
| 638 | The following would do the trick:
|
|---|
| 639 |
|
|---|
| 640 |
|
|---|
| 641 | #define YY_USER_ACTION ++ctr[yy_act]
|
|---|
| 642 |
|
|---|
| 643 | where `ctr' is an array to hold the counts for the different rules.
|
|---|
| 644 | Note that the macro `YY_NUM_RULES' gives the total number of rules
|
|---|
| 645 | (including the default rule, even if you use `-s'), so a correct
|
|---|
| 646 | declaration for `ctr' is:
|
|---|
| 647 |
|
|---|
| 648 |
|
|---|
| 649 | int ctr[YY_NUM_RULES];
|
|---|
| 650 |
|
|---|
| 651 | The macro `YY_USER_INIT' may be defined to provide an action which
|
|---|
| 652 | is always executed before the first scan (and before the scanner's
|
|---|
| 653 | internal initializations are done). For example, it could be used to
|
|---|
| 654 | call a routine to read in a data table or open a logging file.
|
|---|
| 655 |
|
|---|
| 656 | The macro `yy_set_interactive(is_interactive)' can be used to
|
|---|
| 657 | control whether the current buffer is considered "interactive". An
|
|---|
| 658 | interactive buffer is processed more slowly, but must be used when the
|
|---|
| 659 | scanner's input source is indeed interactive to avoid problems due to
|
|---|
| 660 | waiting to fill buffers (see the discussion of the `-I' flag in *Note
|
|---|
| 661 | Scanner Options::). A non-zero value in the macro invocation marks the
|
|---|
| 662 | buffer as interactive, a zero value as non-interactive. Note that use
|
|---|
| 663 | of this macro overrides `%option always-interactive' or `%option
|
|---|
| 664 | never-interactive' (*note Scanner Options::). `yy_set_interactive()'
|
|---|
| 665 | must be invoked prior to beginning to scan the buffer that is (or is
|
|---|
| 666 | not) to be considered interactive.
|
|---|
| 667 |
|
|---|
| 668 | The macro `yy_set_bol(at_bol)' can be used to control whether the
|
|---|
| 669 | current buffer's scanning context for the next token match is done as
|
|---|
| 670 | though at the beginning of a line. A non-zero macro argument makes
|
|---|
| 671 | rules anchored with `^' active, while a zero argument makes `^' rules
|
|---|
| 672 | inactive.
|
|---|
| 673 |
|
|---|
| 674 | The macro `YY_AT_BOL()' returns true if the next token scanned from
|
|---|
| 675 | the current buffer will have `^' rules active, false otherwise.
|
|---|
| 676 |
|
|---|
| 677 | In the generated scanner, the actions are all gathered in one large
|
|---|
| 678 | switch statement and separated using `YY_BREAK', which may be
|
|---|
| 679 | redefined. By default, it is simply a `break', to separate each rule's
|
|---|
| 680 | action from the following rule's. Redefining `YY_BREAK' allows, for
|
|---|
| 681 | example, C++ users to #define YY_BREAK to do nothing (while being very
|
|---|
| 682 | careful that every rule ends with a `break' or a `return'!) to avoid
|
|---|
| 683 | unreachable-statement warnings in cases where, because a rule's
|
|---|
| 684 | action ends with `return', the `YY_BREAK' is inaccessible.
|
|---|
| 685 |
|
|---|
| 686 |
|
|---|
| 687 | File: flex.info, Node: User Values, Next: Yacc, Prev: Misc Macros, Up: Top
|
|---|
| 688 |
|
|---|
| 689 | Values Available To the User
|
|---|
| 690 | ****************************
|
|---|
| 691 |
|
|---|
| 692 | This chapter summarizes the various values available to the user in
|
|---|
| 693 | the rule actions.
|
|---|
| 694 |
|
|---|
| 695 | `char *yytext'
|
|---|
| 696 | holds the text of the current token. It may be modified but not
|
|---|
| 697 | lengthened (you cannot append characters to the end).
|
|---|
| 698 |
|
|---|
| 699 | If the special directive `%array' appears in the first section of
|
|---|
| 700 | the scanner description, then `yytext' is instead declared `char
|
|---|
| 701 | yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
|
|---|
| 702 | redefine in the first section if you don't like the default value
|
|---|
| 703 | (generally 8KB). Using `%array' results in somewhat slower
|
|---|
| 704 | scanners, but the value of `yytext' becomes immune to calls to
|
|---|
| 705 | `unput()', which potentially destroy its value when `yytext' is a
|
|---|
| 706 | character pointer. The opposite of `%array' is `%pointer', which
|
|---|
| 707 | is the default.
|
|---|
| 708 |
|
|---|
| 709 | You cannot use `%array' when generating C++ scanner classes (the
|
|---|
| 710 | `-+' flag).
|
|---|
| 711 |
|
|---|
| 712 | `int yyleng'
|
|---|
| 713 | holds the length of the current token.
|
|---|
| 714 |
|
|---|
| 715 | `FILE *yyin'
|
|---|
| 716 | is the file which by default `flex' reads from. It may be
|
|---|
| 717 | reassigned, but doing so only makes sense before scanning begins or
|
|---|
| 718 | after an EOF has been encountered. Changing it in the midst of
|
|---|
| 719 | scanning will have unexpected results since `flex' buffers its
|
|---|
| 720 | input; use `yyrestart()' instead. Once scanning terminates
|
|---|
| 721 | because an end-of-file has been seen, you can point `yyin' at the
|
|---|
| 722 | new input file and then call the scanner again to continue
|
|---|
| 723 | scanning.
|
|---|
| 724 |
|
|---|
| 725 | `void yyrestart( FILE *new_file )'
|
|---|
| 726 | may be called to point `yyin' at the new input file. The
|
|---|
| 727 | switch-over to the new file is immediate (any previously
|
|---|
| 728 | buffered-up input is lost). Note that calling `yyrestart()' with
|
|---|
| 729 | `yyin' as an argument thus throws away the current input buffer
|
|---|
| 730 | and continues scanning the same input file.
|
|---|
| 731 |
|
|---|
| 732 | `FILE *yyout'
|
|---|
| 733 | is the file to which `ECHO' actions are done. It can be reassigned
|
|---|
| 734 | by the user.
|
|---|
| 735 |
|
|---|
| 736 | `YY_CURRENT_BUFFER'
|
|---|
| 737 | returns a `YY_BUFFER_STATE' handle to the current buffer.
|
|---|
| 738 |
|
|---|
| 739 | `YY_START'
|
|---|
| 740 | returns an integer value corresponding to the current start
|
|---|
| 741 | condition. You can subsequently use this value with `BEGIN' to
|
|---|
| 742 | return to that start condition.
|
|---|
| 743 |
|
|---|
| 744 |
|
|---|
| 745 | File: flex.info, Node: Yacc, Next: Scanner Options, Prev: User Values, Up: Top
|
|---|
| 746 |
|
|---|
| 747 | Interfacing with Yacc
|
|---|
| 748 | *********************
|
|---|
| 749 |
|
|---|
| 750 | One of the main uses of `flex' is as a companion to the `yacc'
|
|---|
| 751 | parser-generator. `yacc' parsers expect to call a routine named
|
|---|
| 752 | `yylex()' to find the next input token. The routine is supposed to
|
|---|
| 753 | return the type of the next token as well as putting any associated
|
|---|
| 754 | value in the global `yylval'. To use `flex' with `yacc', one specifies
|
|---|
| 755 | the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
|
|---|
| 756 | containing definitions of all the `%tokens' appearing in the `yacc'
|
|---|
| 757 | input. This file is then included in the `flex' scanner. For example,
|
|---|
| 758 | if one of the tokens is `TOK_NUMBER', part of the scanner might look
|
|---|
| 759 | like:
|
|---|
| 760 |
|
|---|
| 761 |
|
|---|
| 762 | %{
|
|---|
| 763 | #include "y.tab.h"
|
|---|
| 764 | %}
|
|---|
| 765 |
|
|---|
| 766 | %%
|
|---|
| 767 |
|
|---|
| 768 | [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
|
|---|
| 769 |
|
|---|
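The calling contract described above (return the token type, leave the value in the global `yylval') can be modelled without `flex' or `yacc'. The following is a minimal hand-written stand-in for a generated `yylex()', offered only as a sketch: the variable `input_cursor' and the numeric value chosen for `TOK_NUMBER' are assumptions, since `yacc' would normally assign token numbers in `y.tab.h'.

```c
#include <ctype.h>
#include <stdlib.h>

enum { TOK_NUMBER = 258 };         /* hypothetical; normally from y.tab.h */

static int yylval;                 /* value associated with the last token */
static const char *input_cursor;   /* hypothetical input position */

/* Hand-written stand-in for the scanner routine a yacc parser calls. */
static int yylex(void)
{
    while (*input_cursor == ' ')
        ++input_cursor;

    if (*input_cursor == '\0')
        return 0;                  /* yacc convention: 0 means end of input */

    if (isdigit((unsigned char)*input_cursor)) {
        char *end;
        yylval = (int)strtol(input_cursor, &end, 10);
        input_cursor = end;
        return TOK_NUMBER;         /* token type; the value is in yylval */
    }

    /* Any other character is returned as a single-character token. */
    return (unsigned char)*input_cursor++;
}
```

A parser would call `yylex()' repeatedly, reading `yylval' after each `TOK_NUMBER', until the routine returns 0.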
| 770 |
|
|---|
| 771 | File: flex.info, Node: Scanner Options, Next: Performance, Prev: Yacc, Up: Top
|
|---|
| 772 |
|
|---|
| 773 | Scanner Options
|
|---|
| 774 | ***************
|
|---|
| 775 |
|
|---|
| 776 | The various `flex' options are categorized by function in the
|
|---|
| 777 | following menu. If you want to look up a particular option by name,
|
|---|
| 778 | *Note Index of Scanner Options::.
|
|---|
| 779 |
|
|---|
| 780 | * Menu:
|
|---|
| 781 |
|
|---|
| 782 | * Options for Specifing Filenames::
|
|---|
| 783 | * Options Affecting Scanner Behavior::
|
|---|
| 784 | * Code-Level And API Options::
|
|---|
| 785 | * Options for Scanner Speed and Size::
|
|---|
| 786 | * Debugging Options::
|
|---|
| 787 | * Miscellaneous Options::
|
|---|
| 788 |
|
|---|
| 789 | Even though there are many scanner options, a typical scanner might
|
|---|
| 790 | only specify the following options:
|
|---|
| 791 |
|
|---|
| 792 |
|
|---|
| 793 | %option 8bit reentrant bison-bridge
|
|---|
| 794 | %option warn nodefault
|
|---|
| 795 | %option yylineno
|
|---|
| 796 | %option outfile="scanner.c" header-file="scanner.h"
|
|---|
| 797 |
|
|---|
| 798 | The first line specifies the general type of scanner we want. The
|
|---|
| 799 | second line specifies that we are being careful. The third line asks
|
|---|
| 800 | flex to track line numbers. The last line tells flex what to name the
|
|---|
| 801 | files. (The options can be specified in any order. We just divided
|
|---|
| 802 | them.)
|
|---|
| 803 |
|
|---|
| 804 | `flex' also provides a mechanism for controlling options within the
|
|---|
| 805 | scanner specification itself, rather than from the flex command-line.
|
|---|
| 806 | This is done by including `%option' directives in the first section of
|
|---|
| 807 | the scanner specification. You can specify multiple options with a
|
|---|
| 808 | single `%option' directive, and multiple directives in the first
|
|---|
| 809 | section of your flex input file.
|
|---|
| 810 |
|
|---|
| 811 | Most options are given simply as names, optionally preceded by the
|
|---|
| 812 | word `no' (with no intervening whitespace) to negate their meaning.
|
|---|
| 813 | The names are the same as their long-option equivalents (but without the
|
|---|
| 814 | leading `--' ).
|
|---|
| 815 |
|
|---|
| 816 | `flex' scans your rule actions to determine whether you use the
|
|---|
| 817 | `REJECT' or `yymore()' features. The `reject' and `yymore' options are
|
|---|
| 818 | available to override its decision as to whether you use the features,
|
|---|
| 819 | either by setting them (e.g., `%option reject') to indicate the feature
|
|---|
| 820 | is indeed used, or unsetting them to indicate it actually is not used
|
|---|
| 821 | (e.g., `%option noyymore').
|
|---|
| 822 |
|
|---|
| 823 | A number of options are available for lint purists who want to
|
|---|
| 824 | suppress the appearance of unneeded routines in the generated scanner.
|
|---|
| 825 | Each of the following, if unset (e.g., `%option nounput'), results in
|
|---|
| 826 | the corresponding routine not appearing in the generated scanner:
|
|---|
| 827 |
|
|---|
| 828 |
|
|---|
| 829 | input, unput
|
|---|
| 830 | yy_push_state, yy_pop_state, yy_top_state
|
|---|
| 831 | yy_scan_buffer, yy_scan_bytes, yy_scan_string
|
|---|
| 832 |
|
|---|
| 833 | yyget_extra, yyset_extra, yyget_leng, yyget_text,
|
|---|
| 834 | yyget_lineno, yyset_lineno, yyget_in, yyset_in,
|
|---|
| 835 | yyget_out, yyset_out, yyget_lval, yyset_lval,
|
|---|
| 836 | yyget_lloc, yyset_lloc, yyget_debug, yyset_debug
|
|---|
| 837 |
|
|---|
| 838 | (though `yy_push_state()' and friends won't appear anyway unless you
|
|---|
| 839 | use `%option stack').
|
|---|
| 840 |
|
|---|
| 841 |
|
|---|
| 842 | File: flex.info, Node: Options for Specifing Filenames, Next: Options Affecting Scanner Behavior, Prev: Scanner Options, Up: Scanner Options
|
|---|
| 843 |
|
|---|
| 844 | Options for Specifying Filenames
|
|---|
| 845 | ================================
|
|---|
| 846 |
|
|---|
| 847 | `--header-file=FILE, `%option header-file="FILE"''
|
|---|
| 848 | instructs flex to write a C header to `FILE'. This file contains
|
|---|
| 849 | function prototypes, extern variables, and types used by the
|
|---|
| 850 | scanner. Only the external API is exported by the header file.
|
|---|
| 851 | Many macros that are usable from within scanner actions are not
|
|---|
| 852 | exported to the header file. This is due to namespace problems and
|
|---|
| 853 | the goal of a clean external API.
|
|---|
| 854 |
|
|---|
| 855 | While in the header, the macro `yyIN_HEADER' is defined, where `yy'
|
|---|
| 856 | is substituted with the appropriate prefix.
|
|---|
| 857 |
|
|---|
| 858 | The `--header-file' option is not compatible with the `--c++'
|
|---|
| 859 | option, since the C++ scanner provides its own header in
|
|---|
| 860 | `yyFlexLexer.h'.
|
|---|
| 861 |
|
|---|
| 862 | `-oFILE, --outfile=FILE, `%option outfile="FILE"''
|
|---|
| 863 | directs flex to write the scanner to the file `FILE' instead of
|
|---|
| 864 | `lex.yy.c'. If you combine `--outfile' with the `--stdout' option,
|
|---|
| 865 | then the scanner is written to `stdout' but its `#line' directives
|
|---|
| 866 | (see the `-L' option below) refer to the file `FILE'.
|
|---|
| 867 |
|
|---|
| 868 | `-t, --stdout, `%option stdout''
|
|---|
| 869 | instructs `flex' to write the scanner it generates to standard
|
|---|
| 870 | output instead of `lex.yy.c'.
|
|---|
| 871 |
|
|---|
| 872 | `-SFILE, --skel=FILE'
|
|---|
| 873 | overrides the default skeleton file from which `flex' constructs
|
|---|
| 874 | its scanners. You'll never need this option unless you are doing
|
|---|
| 875 | `flex' maintenance or development.
|
|---|
| 876 |
|
|---|
| 877 | `--tables-file=FILE'
|
|---|
| 878 | Write serialized scanner DFA tables to FILE. The generated scanner
|
|---|
| 879 | will not contain the tables, and requires them to be loaded at
|
|---|
| 880 | runtime. *Note serialization::.
|
|---|
| 881 |
|
|---|
| 882 | `--tables-verify'
|
|---|
| 883 | This option is for flex development. We document it here in case
|
|---|
| 884 | you stumble upon it by accident or in case you suspect some
|
|---|
| 885 | inconsistency in the serialized tables. Flex will serialize the
|
|---|
| 886 | scanner DFA tables but will also generate the in-code tables as it
|
|---|
| 887 | normally does. At runtime, the scanner will verify that the
|
|---|
| 888 | serialized tables match the in-code tables, instead of loading
|
|---|
| 889 | them.
|
|---|
| 890 |
|
|---|
| 891 |
|
|---|
| 892 |
|
|---|
| 893 | File: flex.info, Node: Options Affecting Scanner Behavior, Next: Code-Level And API Options, Prev: Options for Specifing Filenames, Up: Scanner Options
|
|---|
| 894 |
|
|---|
| 895 | Options Affecting Scanner Behavior
|
|---|
| 896 | ==================================
|
|---|
| 897 |
|
|---|
| 898 | `-i, --case-insensitive, `%option case-insensitive''
|
|---|
| 899 | instructs `flex' to generate a "case-insensitive" scanner. The
|
|---|
| 900 | case of letters given in the `flex' input patterns will be ignored,
|
|---|
| 901 | and tokens in the input will be matched regardless of case. The
|
|---|
| 902 | matched text given in `yytext' will have the preserved case (i.e.,
|
|---|
| 903 | it will not be folded). For tricky behavior, see *Note case and
|
|---|
| 904 | character ranges::.
|
|---|
| 905 |
|
|---|
| 906 | `-l, --lex-compat, `%option lex-compat''
|
|---|
| 907 | turns on maximum compatibility with the original AT&T `lex'
|
|---|
| 908 | implementation. Note that this does not mean _full_ compatibility.
|
|---|
| 909 | Use of this option costs a considerable amount of performance, and
|
|---|
| 910 | it cannot be used with the `--c++', `--full', `--fast', `-Cf', or
|
|---|
| 911 | `-CF' options. For details on the compatibilities it provides, see
|
|---|
| 912 | *Note Lex and Posix::. This option also results in the name
|
|---|
| 913 | `YY_FLEX_LEX_COMPAT' being `#define''d in the generated scanner.
|
|---|
| 914 |
|
|---|
| 915 | `-B, --batch, `%option batch''
|
|---|
| 916 | instructs `flex' to generate a "batch" scanner, the opposite of
|
|---|
| 917 | _interactive_ scanners generated by `--interactive' (see below).
|
|---|
| 918 | In general, you use `-B' when you are _certain_ that your scanner
|
|---|
| 919 | will never be used interactively, and you want to squeeze a
|
|---|
| 920 | _little_ more performance out of it. If your goal is instead to
|
|---|
| 921 | squeeze out a _lot_ more performance, you should be using the
|
|---|
| 922 | `-Cf' or `-CF' options, which turn on `--batch' automatically
|
|---|
| 923 | anyway.
|
|---|
| 924 |
|
|---|
| 925 | `-I, --interactive, `%option interactive''
|
|---|
| 926 | instructs `flex' to generate an interactive scanner. An
|
|---|
| 927 | interactive scanner is one that only looks ahead to decide what
|
|---|
| 928 | token has been matched if it absolutely must. It turns out that
|
|---|
| 929 | always looking one extra character ahead, even if the scanner has
|
|---|
| 930 | already seen enough text to disambiguate the current token, is a
|
|---|
| 931 | bit faster than only looking ahead when necessary. But scanners
|
|---|
| 932 | that always look ahead give dreadful interactive performance; for
|
|---|
| 933 | example, when a user types a newline, it is not recognized as a
|
|---|
| 934 | newline token until they enter _another_ token, which often means
|
|---|
| 935 | typing in another whole line.
|
|---|
| 936 |
|
|---|
| 937 | `flex' scanners default to `interactive' unless you use the `-Cf'
|
|---|
| 938 | or `-CF' table-compression options (*note Performance::). That's
|
|---|
| 939 | because if you're looking for high-performance you should be using
|
|---|
| 940 | one of these options, so if you didn't, `flex' assumes you'd
|
|---|
| 941 | rather trade off a bit of run-time performance for intuitive
|
|---|
| 942 | interactive behavior. Note also that you _cannot_ use
|
|---|
| 943 | `--interactive' in conjunction with `-Cf' or `-CF'. Thus, this
|
|---|
| 944 | option is not really needed; it is on by default for all those
|
|---|
| 945 | cases in which it is allowed.
|
|---|
| 946 |
|
|---|
| 947 | You can force a scanner to _not_ be interactive by using `--batch'.
|
|---|
| 948 |
|
|---|
| 949 | `-7, --7bit, `%option 7bit''
|
|---|
| 950 | instructs `flex' to generate a 7-bit scanner, i.e., one which can
|
|---|
| 951 | only recognize 7-bit characters in its input. The advantage of
|
|---|
| 952 | using `--7bit' is that the scanner's tables can be up to half the
|
|---|
| 953 | size of those generated using `--8bit'. The disadvantage is
|
|---|
| 954 | that such scanners often hang or crash if their input contains an
|
|---|
| 955 | 8-bit character.
|
|---|
| 956 |
|
|---|
| 957 | Note, however, that unless you generate your scanner using the
|
|---|
| 958 | `-Cf' or `-CF' table compression options, use of `--7bit' will
|
|---|
| 959 | save only a small amount of table space, and make your scanner
|
|---|
| 960 | considerably less portable. `Flex''s default behavior is to
|
|---|
| 961 | generate an 8-bit scanner unless you use `-Cf' or `-CF', in
|
|---|
| 962 | which case `flex' defaults to generating 7-bit scanners unless
|
|---|
| 963 | your site was always configured to generate 8-bit scanners (as will
|
|---|
| 964 | often be the case with non-USA sites). You can tell whether flex
|
|---|
| 965 | generated a 7-bit or an 8-bit scanner by inspecting the flag
|
|---|
| 966 | summary in the `--verbose' output as described above.
|
|---|
| 967 |
|
|---|
| 968 | Note that if you use `-Cfe' or `-CFe' `flex' still defaults to
|
|---|
| 969 | generating an 8-bit scanner, since usually with these compression
|
|---|
| 970 | options full 8-bit tables are not much more expensive than 7-bit
|
|---|
| 971 | tables.
|
|---|
| 972 |
|
|---|
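The "up to half the size" figure follows from simple arithmetic on an uncompressed table, which has one entry per state and input character. The sketch below only illustrates that arithmetic; the helper `full_table_bytes', the state count, and the two-byte entry size are hypothetical and are not taken from `flex' itself.

```c
#include <stddef.h>

/* Bytes in an uncompressed table: one entry per (state, input character). */
static size_t full_table_bytes(size_t num_states, size_t alphabet_size,
                               size_t entry_size)
{
    return num_states * alphabet_size * entry_size;
}
```

For a hypothetical scanner with 500 states and two-byte entries, `full_table_bytes(500, 128, 2)' is 128000 bytes for a 7-bit alphabet, while `full_table_bytes(500, 256, 2)' is 256000 bytes for an 8-bit one. With compressed tables the difference is much smaller, as noted above.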
| 973 | `-8, --8bit, `%option 8bit''
|
|---|
| 974 | instructs `flex' to generate an 8-bit scanner, i.e., one which can
|
|---|
| 975 | recognize 8-bit characters. This flag is only needed for scanners
|
|---|
| 976 | generated using `-Cf' or `-CF', as otherwise flex defaults to
|
|---|
| 977 | generating an 8-bit scanner anyway.
|
|---|
| 978 |
|
|---|
| 979 | See the discussion of `--7bit' above for `flex''s default behavior
|
|---|
| 980 | and the tradeoffs between 7-bit and 8-bit scanners.
|
|---|
| 981 |
|
|---|
| 982 | `--default, `%option default''
|
|---|
| 983 | generates the default rule.
|
|---|
| 984 |
|
|---|
| 985 | `--always-interactive, `%option always-interactive''
|
|---|
| 986 | instructs flex to generate a scanner which always considers its
|
|---|
| 987 | input _interactive_. Normally, on each new input file the scanner
|
|---|
| 988 | calls `isatty()' in an attempt to determine whether the scanner's
|
|---|
| 989 | input source is interactive and thus should be read a character at
|
|---|
| 990 | a time. When this option is used, however, then no such call is
|
|---|
| 991 | made.
|
|---|
| 992 |
|
|---|
| 993 | `--never-interactive, `%option never-interactive''
|
|---|
| 994 | instructs flex to generate a scanner which never considers its
|
|---|
| 995 | input interactive. This is the opposite of `always-interactive'.
|
|---|
| 996 |
|
|---|
| 997 | `-X, --posix, `%option posix''
|
|---|
| 998 | turns on maximum compatibility with the POSIX 1003.2-1992
|
|---|
| 999 | definition of `lex'. Since `flex' was originally designed to
|
|---|
| 1000 | implement the POSIX definition of `lex' this generally involves
|
|---|
| 1001 | very few changes in behavior. At the current writing the known
|
|---|
| 1002 | differences between `flex' and the POSIX standard are:
|
|---|
| 1003 |
|
|---|
| 1004 | * In POSIX and AT&T `lex', the repeat operator, `{}', has lower
|
|---|
| 1005 | precedence than concatenation (thus `ab{3}' yields `ababab').
|
|---|
| 1006 | Most POSIX utilities use an Extended Regular Expression (ERE)
|
|---|
| 1007 | precedence that has the precedence of the repeat operator
|
|---|
| 1008 | higher than concatenation (which causes `ab{3}' to yield
|
|---|
| 1009 | `abbb'). By default, `flex' places the precedence of the
|
|---|
| 1010 | repeat operator higher than concatenation which matches the
|
|---|
| 1011 | ERE processing of other POSIX utilities. When either
|
|---|
| 1012 | `--posix' or `-l' is specified, `flex' will use the
|
|---|
| 1013 | traditional AT&T and POSIX-compliant precedence for the
|
|---|
| 1014 | repeat operator where concatenation has higher precedence
|
|---|
| 1015 | than the repeat operator.
|
|---|
| 1016 |
|
|---|
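The two readings of `ab{3}' can be made concrete with a small C helper. This is purely illustrative and is not how `flex' parses patterns: the function `expand_repeat' and its parameters are hypothetical, and it handles only a literal string followed by a single repeat count.

```c
#include <stddef.h>
#include <string.h>

/*
 * Expand LITERAL{COUNT} into OUT (capacity OUT_SIZE, including the NUL).
 * repeat_binds_tighter != 0: only the last character repeats (flex default).
 * repeat_binds_tighter == 0: the whole literal repeats (--posix or -l).
 * Returns 0 on success, -1 on invalid arguments or if OUT is too small.
 */
static int expand_repeat(const char *literal, int count,
                         int repeat_binds_tighter,
                         char *out, size_t out_size)
{
    size_t len = strlen(literal);
    size_t pos = 0;

    if (len == 0 || count < 1)
        return -1;

    if (repeat_binds_tighter) {
        if (len - 1 + (size_t)count + 1 > out_size)
            return -1;
        memcpy(out, literal, len - 1);      /* everything but the last char */
        pos = len - 1;
        for (int i = 0; i < count; ++i)
            out[pos++] = literal[len - 1];  /* repeat the last char */
    } else {
        if (len * (size_t)count + 1 > out_size)
            return -1;
        for (int i = 0; i < count; ++i) {   /* repeat the whole literal */
            memcpy(out + pos, literal, len);
            pos += len;
        }
    }
    out[pos] = '\0';
    return 0;
}
```

Calling `expand_repeat("ab", 3, 1, buf, sizeof buf)' leaves `abbb' in `buf', matching the default `flex' behavior, while passing 0 as the third argument leaves `ababab', matching `--posix' and `-l'.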
| 1017 | `--stack, `%option stack''
|
|---|
| 1018 | enables the use of start condition stacks (*note Start
|
|---|
| 1019 | Conditions::).
|
|---|
| 1020 |
|
|---|
| 1021 | `--stdinit, `%option stdinit''
|
|---|
| 1022 | if set (i.e., `%option stdinit'), initializes `yyin' and `yyout' to
|
|---|
| 1023 | `stdin' and `stdout', instead of the default of `NULL'. Some
|
|---|
| 1024 | existing `lex' programs depend on this behavior, even though it is
|
|---|
| 1025 | not compliant with ANSI C, which does not require `stdin' and
|
|---|
| 1026 | `stdout' to be compile-time constants. In a reentrant scanner,
|
|---|
| 1027 | however, this is not a problem since initialization is performed
|
|---|
| 1028 | in `yylex_init' at runtime.
|
|---|
| 1029 |
|
|---|
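The portability point behind `stdinit' can be shown in plain C. Since `stdin' and `stdout' need not be constant expressions, a portable program initializes the file pointers to `NULL' and fills in the defaults at run time. The sketch below models that idea only; the helper `init_streams' is hypothetical and is not part of the `flex' API.

```c
#include <stdio.h>

/* NULL is a constant expression, so these initializers are always valid. */
static FILE *yyin = NULL;
static FILE *yyout = NULL;

/* static FILE *yyin = stdin;   not portable: stdin need not be a constant */

/* Hypothetical helper that applies the defaults at run time. */
static void init_streams(void)
{
    if (yyin == NULL)
        yyin = stdin;
    if (yyout == NULL)
        yyout = stdout;
}
```

A stream assigned by the caller before `init_streams()' runs is left alone, so the run-time defaulting does not get in the way of reading from a file.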
| 1030 | `--yylineno, `%option yylineno''
|
|---|
| 1031 | directs `flex' to generate a scanner that maintains the number of
|
|---|
| 1032 | the current line read from its input in the global variable
|
|---|
| 1033 | `yylineno'. This option is implied by `%option lex-compat'. In a
|
|---|
| 1034 | reentrant C scanner, the macro `yylineno' is accessible regardless
|
|---|
| 1035 | of the value of `%option yylineno'; however, its value is not
|
|---|
| 1036 | modified by `flex' unless `%option yylineno' is enabled.
|
|---|
| 1037 |
|
|---|
| 1038 | `--yywrap, `%option yywrap''
|
|---|
| 1039 | if unset (i.e., `--noyywrap'), makes the scanner not call
|
|---|
| 1040 | `yywrap()' upon an end-of-file, but simply assume that there are no
|
|---|
| 1041 | more files to scan (until the user points `yyin' at a new file and
|
|---|
| 1042 | calls `yylex()' again).
|
|---|
| 1043 |
|
|---|
| 1044 |
|
|---|
| 1045 |
|
|---|
| 1046 | File: flex.info, Node: Code-Level And API Options, Next: Options for Scanner Speed and Size, Prev: Options Affecting Scanner Behavior, Up: Scanner Options
|
|---|
| 1047 |
|
|---|
| 1048 | Code-Level And API Options
|
|---|
| 1049 | ==========================
|
|---|
| 1050 |
|
|---|
| 1051 | `--ansi-definitions, `%option ansi-definitions''
|
|---|
| 1052 | instructs flex to generate ANSI C99 definitions for functions.
|
|---|
| 1053 | This option is enabled by default. If `%option
|
|---|
| 1054 | noansi-definitions' is specified, then the obsolete style is
|
|---|
| 1055 | generated.
|
|---|
| 1056 |
|
|---|
| 1057 | `--ansi-prototypes, `%option ansi-prototypes''
|
|---|
| 1058 | instructs flex to generate ANSI C99 prototypes for functions.
|
|---|
| 1059 | This option is enabled by default. If `noansi-prototypes' is
|
|---|
| 1060 | specified, then prototypes will have empty parameter lists.
|
|---|
| 1061 |
|
|---|
| 1062 | `--bison-bridge, `%option bison-bridge''
|
|---|
| 1063 | instructs flex to generate a C scanner that is meant to be called
|
|---|
| 1064 | by a `GNU bison' parser. The scanner has minor API changes for
|
|---|
| 1065 | `bison' compatibility. In particular, the declaration of `yylex'
|
|---|
| 1066 | is modified to take an additional parameter, `yylval'. *Note
|
|---|
| 1067 | Bison Bridge::.
|
|---|
| 1068 |
|
|---|
| 1069 | `--bison-locations, `%option bison-locations''
|
|---|
| 1070 | instructs flex that `GNU bison' `%locations' are being used. This
|
|---|
| 1071 | means `yylex' will be passed an additional parameter, `yylloc'.
|
|---|
| 1072 | This option implies `%option bison-bridge'. *Note Bison Bridge::.
|
|---|
| 1073 |
|
|---|
| 1074 | `-L, --noline, `%option noline''
|
|---|
| 1075 | instructs `flex' not to generate `#line' directives. Without this
|
|---|
| 1076 | option, `flex' peppers the generated scanner with `#line'
|
|---|
| 1077 | directives so error messages in the actions will be correctly
|
|---|
| 1078 | located with respect to either the original `flex' input file (if
|
|---|
| 1079 | the errors are due to code in the input file), or `lex.yy.c' (if
|
|---|
| 1080 | the errors are `flex''s fault - you should report these sorts of
|
|---|
| 1081 | errors to the email address given in *Note Reporting Bugs::).
|
|---|
| 1082 |
|
|---|
| 1083 | `-R, --reentrant, `%option reentrant''
|
|---|
| 1084 | instructs flex to generate a reentrant C scanner. The generated
|
|---|
| 1085 | scanner may safely be used in a multi-threaded environment. The
|
|---|
| 1086 | API for a reentrant scanner is different than for a non-reentrant
|
|---|
| 1087 | scanner (*note Reentrant::). Because of the API difference between
|
|---|
| 1088 | reentrant and non-reentrant `flex' scanners, non-reentrant flex
|
|---|
| 1089 | code must be modified before it is suitable for use with this
|
|---|
| 1090 | option. This option is not compatible with the `--c++' option.
|
|---|
| 1091 |
|
|---|
| 1092 | The option `--reentrant' does not affect the performance of the
|
|---|
| 1093 | scanner.
|
|---|
| 1094 |
|
|---|
| 1095 | `-+, --c++, `%option c++''
|
|---|
| 1096 | specifies that you want flex to generate a C++ scanner class.
|
|---|
| 1097 | *Note Cxx::, for details.
|
|---|
| 1098 |
|
|---|
| 1099 | `--array, `%option array''
|
|---|
| 1100 | specifies that you want `yytext' to be an array instead of a `char *'.
|
|---|
| 1101 |
|
|---|
| 1102 | `--pointer, `%option pointer''
|
|---|
| 1103 | specifies that `yytext' should be a `char *', not an array. The
|
|---|
| 1104 | default is `char *'.
|
|---|
| 1105 |
|
|---|
| 1106 | `-PPREFIX, --prefix=PREFIX, `%option prefix="PREFIX"''
|
|---|
| 1107 | changes the default `yy' prefix used by `flex' for all
|
|---|
| 1108 | globally-visible variable and function names to instead be
|
|---|
| 1109 | `PREFIX'. For example, `--prefix=foo' changes the name of
|
|---|
| 1110 | `yytext' to `footext'. It also changes the name of the default
|
|---|
| 1111 | output file from `lex.yy.c' to `lex.foo.c'. Here is a partial
|
|---|
| 1112 | list of the names affected:
|
|---|
| 1113 |
|
|---|
| 1114 |
|
|---|
| 1115 | yy_create_buffer
|
|---|
| 1116 | yy_delete_buffer
|
|---|
| 1117 | yy_flex_debug
|
|---|
| 1118 | yy_init_buffer
|
|---|
| 1119 | yy_flush_buffer
|
|---|
| 1120 | yy_load_buffer_state
|
|---|
| 1121 | yy_switch_to_buffer
|
|---|
| 1122 | yyin
|
|---|
| 1123 | yyleng
|
|---|
| 1124 | yylex
|
|---|
| 1125 | yylineno
|
|---|
| 1126 | yyout
|
|---|
| 1127 | yyrestart
|
|---|
| 1128 | yytext
|
|---|
| 1129 | yywrap
|
|---|
| 1130 | yyalloc
|
|---|
| 1131 | yyrealloc
|
|---|
| 1132 | yyfree
|
|---|
| 1133 |
|
|---|
| 1134 | (If you are using a C++ scanner, then only `yywrap' and
|
|---|
| 1135 | `yyFlexLexer' are affected.) Within your scanner itself, you can
|
|---|
| 1136 | still refer to the global variables and functions using either
|
|---|
| 1137 | version of their name; but externally, they have the modified name.
|
|---|
| 1138 |
|
|---|
| 1139 | This option lets you easily link together multiple `flex' programs
|
|---|
| 1140 | into the same executable. Note, though, that using this option
|
|---|
| 1141 | also renames `yywrap()', so you now _must_ either provide your own
|
|---|
| 1142 | (appropriately-named) version of the routine for your scanner, or
|
|---|
| 1143 | use `%option noyywrap', as linking with `-lfl' no longer provides
|
|---|
| 1144 | one for you by default.
|
|---|
| 1145 |
|
|---|
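The remark that both names work inside the scanner while only the modified name is visible externally is consistent with renaming done through the C preprocessor. The sketch below models that by hand for `--prefix=foo'. It assumes that macro mapping is the mechanism and is not output copied from `flex'; the function `scan_one_with_foo' is hypothetical.

```c
/* Map the default names onto prefixed ones (assumed effect of --prefix=foo). */
#define yylex  foolex
#define yytext footext

const char *yytext = "";     /* actually defines footext */

int yylex(void)              /* actually defines foolex() */
{
    yytext = "token";        /* inside the scanner the yy name still works */
    return 1;
}

/* Code outside the scanner never sees the macros, only the prefixed names. */
#undef yylex
#undef yytext

int scan_one_with_foo(void)
{
    return foolex();
}
```

A second scanner built with a different prefix would define a disjoint set of symbols, which is why the two can be linked into one executable.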
| 1146 | `--main, `%option main''
|
|---|
| 1147 | directs flex to provide a default `main()' program for the
|
|---|
| 1148 | scanner, which simply calls `yylex()'. This option implies
|
|---|
| 1149 | `noyywrap' (see above).
|
|---|
| 1150 |
|
|---|
| 1151 | `--nounistd, `%option nounistd''
|
|---|
| 1152 | suppresses inclusion of the non-ANSI header file `unistd.h'. This
|
|---|
| 1153 | option is meant to target environments in which `unistd.h' does
|
|---|
| 1154 | not exist. Be aware that certain options may cause flex to
|
|---|
| 1155 | generate code that relies on functions normally found in
|
|---|
| 1156 | `unistd.h' (e.g., `isatty()', `read()'). If you wish to use these
|
|---|
| 1157 | functions, you will have to inform your compiler where to find
|
|---|
| 1158 | them. *Note option-always-interactive::. *Note option-read::.
|
|---|
| 1159 |
|
|---|
| 1160 | `--yyclass=NAME, `%option yyclass="NAME"''
|
|---|
| 1161 | only applies when generating a C++ scanner (the `--c++' option).
|
|---|
| 1162 | It informs `flex' that you have derived `NAME' as a subclass of
|
|---|
| 1163 | `yyFlexLexer', so `flex' will place your actions in the member
|
|---|
| 1164 | function `NAME::yylex()' instead of `yyFlexLexer::yylex()'. It
|
|---|
| 1165 | also generates a `yyFlexLexer::yylex()' member function that emits
|
|---|
| 1166 | a run-time error (by invoking `yyFlexLexer::LexerError()') if
|
|---|
| 1167 | called. *Note Cxx::.
|
|---|
| 1168 |
|
|---|
| 1169 |
|
|---|
| 1170 |
|
|---|
| 1171 | File: flex.info, Node: Options for Scanner Speed and Size, Next: Debugging Options, Prev: Code-Level And API Options, Up: Scanner Options
|
|---|
| 1172 |
|
|---|
| 1173 | Options for Scanner Speed and Size
|
|---|
| 1174 | ==================================
|
|---|
| 1175 |
|
|---|
| 1176 | `-C[aefFmr]'
|
|---|
| 1177 | controls the degree of table compression and, more generally,
|
|---|
| 1178 | trade-offs between small scanners and fast scanners.
|
|---|
| 1179 |
|
|---|
| 1180 | `-C'
|
|---|
| 1181 | A lone `-C' specifies that the scanner tables should be
|
|---|
| 1182 | compressed but neither equivalence classes nor
|
|---|
| 1183 | meta-equivalence classes should be used.
|
|---|
| 1184 |
|
|---|
| 1185 | `-Ca, --align, `%option align''
|
|---|
| 1186 | ("align") instructs flex to trade off larger tables in the
|
|---|
| 1187 | generated scanner for faster performance because the elements
|
|---|
| 1188 | of the tables are better aligned for memory access and
|
|---|
| 1189 | computation. On some RISC architectures, fetching and
|
|---|
| 1190 | manipulating longwords is more efficient than with
|
|---|
| 1191 | smaller-sized units such as shortwords. This option can
|
|---|
| 1192 | quadruple the size of the tables used by your scanner.
|
|---|
| 1193 |
|
|---|
| 1194 | `-Ce, --ecs, `%option ecs''
|
|---|
| 1195 | directs `flex' to construct "equivalence classes", i.e., sets
|
|---|
| 1196 | of characters which have identical lexical properties (for
|
|---|
| 1197 | example, if the only appearance of digits in the `flex' input
|
|---|
| 1198 | is in the character class "[0-9]" then the digits '0', '1',
|
|---|
| 1199 | ..., '9' will all be put in the same equivalence class).
|
|---|
| 1200 | Equivalence classes usually give dramatic reductions in the
|
|---|
| 1201 | final table/object file sizes (typically a factor of 2-5) and
|
|---|
| 1202 | are pretty cheap performance-wise (one array look-up per
|
|---|
| 1203 | character scanned).
|
|---|
| 1204 |
|
|---|
| 1205 | `-Cf'
|
|---|
| 1206 | specifies that the "full" scanner tables should be generated -
|
|---|
| 1207 | `flex' should not compress the tables by taking advantage of
|
|---|
| 1208 | similar transition functions for different states.
|
|---|
| 1209 |
|
|---|
| 1210 | `-CF'
|
|---|
| 1211 | specifies that the alternate fast scanner representation
|
|---|
| 1212 | (described below under the `--fast' flag) should be used.
|
|---|
| 1213 | This option cannot be used with `--c++'.
|
|---|
| 1214 |
|
|---|
| 1215 | `-Cm, --meta-ecs, `%option meta-ecs''
|
|---|
| 1216 | directs `flex' to construct "meta-equivalence classes", which
|
|---|
| 1217 | are sets of equivalence classes (or characters, if equivalence
|
|---|
| 1218 | classes are not being used) that are commonly used together.
|
|---|
| 1219 | Meta-equivalence classes are often a big win when using
|
|---|
| 1220 | compressed tables, but they have a moderate performance
|
|---|
| 1221 | impact (one or two `if' tests and one array look-up per
|
|---|
| 1222 | character scanned).
|
|---|
| 1223 |
|
|---|
| 1224 | `-Cr, --read, `%option read''
|
|---|
| 1225 | causes the generated scanner to _bypass_ use of the standard
|
|---|
| 1226 | I/O library (`stdio') for input. Instead of calling
|
|---|
| 1227 | `fread()' or `getc()', the scanner will use the `read()'
|
|---|
| 1228 | system call, resulting in a performance gain which varies
|
|---|
| 1229 | from system to system, but in general is probably negligible
|
|---|
| 1230 | unless you are also using `-Cf' or `-CF'. Using `-Cr' can
|
|---|
| 1231 | cause strange behavior if, for example, you read from `yyin'
|
|---|
| 1232 | using `stdio' prior to calling the scanner (because the
|
|---|
| 1233 | scanner will miss whatever text your previous reads left in
|
|---|
| 1234 | the `stdio' input buffer). `-Cr' has no effect if you define
|
|---|
| 1235 | `YY_INPUT()' (*note Generated Scanner::).
|
|---|
| 1236 |
|
|---|
| 1237 | The options `-Cf' or `-CF' and `-Cm' do not make sense together -
|
|---|
| 1238 | there is no opportunity for meta-equivalence classes if the table
|
|---|
| 1239 | is not being compressed. Otherwise the options may be freely
|
|---|
| 1240 | mixed, and are cumulative.
|
|---|
| 1241 |
|
|---|
| 1242 | The default setting is `-Cem', which specifies that `flex' should
|
|---|
| 1243 | generate equivalence classes and meta-equivalence classes. This
|
|---|
| 1244 | setting provides the highest degree of table compression. You can
|
|---|
| 1245 | trade off faster-executing scanners at the cost of larger tables
|
|---|
| 1246 | with the following generally being true:
|
|---|
| 1247 |
|
|---|
| 1248 |
|
|---|
| 1249 | slowest & smallest
|
|---|
| 1250 | -Cem
|
|---|
| 1251 | -Cm
|
|---|
| 1252 | -Ce
|
|---|
| 1253 | -C
|
|---|
| 1254 | -C{f,F}e
|
|---|
| 1255 | -C{f,F}
|
|---|
| 1256 | -C{f,F}a
|
|---|
| 1257 | fastest & largest
|
|---|
| 1258 |
|
|---|
| 1259 | Note that scanners with the smallest tables are usually generated
|
|---|
| 1260 | and compiled the quickest, so during development you will usually
|
|---|
| 1261 | want to use the default, maximal compression.
|
|---|
| 1262 |
|
|---|
| 1263 | `-Cfe' is often a good compromise between speed and size for
|
|---|
| 1264 | production scanners.
|
|---|
| 1265 |
|
|---|
| 1266 | `-f, --full, `%option full''
|
|---|
| 1267 | specifies "fast scanner". No table compression is done and
|
|---|
| 1268 | `stdio' is bypassed. The result is large but fast. This option
|
|---|
| 1269 | is equivalent to `-Cfr'.
|
|---|
| 1270 |
|
|---|
| 1271 | `-F, --fast, `%option fast''
|
|---|
| 1272 | specifies that the _fast_ scanner table representation should be
|
|---|
| 1273 | used (and `stdio' bypassed). This representation is about as fast
|
|---|
| 1274 | as the full table representation `--full', and for some sets of
|
|---|
| 1275 | patterns will be considerably smaller (and for others, larger). In
|
|---|
| 1276 | general, if the pattern set contains both _keywords_ and a
|
|---|
| 1277 | catch-all _identifier_ rule, such as in the set:
|
|---|
| 1278 |
|
|---|
| 1279 |
|
|---|
| 1280 | "case" return TOK_CASE;
|
|---|
| 1281 | "switch" return TOK_SWITCH;
|
|---|
| 1282 | ...
|
|---|
| 1283 | "default" return TOK_DEFAULT;
|
|---|
| 1284 | [a-z]+ return TOK_ID;
|
|---|
| 1285 |
|
|---|
| 1286 | then you're better off using the full table representation. If
|
|---|
| 1287 | only the _identifier_ rule is present and you then use a hash
|
|---|
| 1288 | table or some such to detect the keywords, you're better off using
|
|---|
| 1289 | `--fast'.
|
|---|
| 1290 |
|
|---|
| 1291 | This option is equivalent to `-CFr' (see above). It cannot be used
|
|---|
| 1292 | with `--c++'.
|
|---|
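The `--fast' discussion above suggests keeping only the _identifier_ rule and detecting keywords separately. The following is a minimal sketch of that idea. The helper `classify_identifier' and the numeric token values are hypothetical. A sorted table searched with `bsearch()' stands in for the hash table mentioned in the text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Token names follow the example rules; the numeric values are assumed. */
enum token { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    enum token  tok;
};

/* Must stay sorted by name, because bsearch() relies on the ordering. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

/* bsearch() comparator: the key is a plain string, the element a keyword. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct keyword *)elem)->name);
}

/* Hypothetical helper: return the keyword token for text, or TOK_ID
 * when text is not a keyword. */
enum token classify_identifier(const char *text)
{
    assert(text != NULL);

    const struct keyword *kw =
        bsearch(text, keywords, sizeof keywords / sizeof keywords[0],
                sizeof keywords[0], compare_keyword);

    return kw != NULL ? kw->tok : TOK_ID;
}
```

Under these assumptions, the scanner needs a single rule whose action is along the lines of `return classify_identifier(yytext);'. This is the keyword-free pattern set for which the text recommends `--fast'.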
| 1293 |
|
|---|
| 1294 |
|
|---|