source: trunk/flex/doc/flex.info-4@ 3032

Last change on this file since 3032 was 3031, checked in by bird, 18 years ago

flex 2.5.33.

File size: 50.2 KB
Line 
1This is flex.info, produced by makeinfo version 4.5 from flex.texi.
2
3INFO-DIR-SECTION Programming
4START-INFO-DIR-ENTRY
5* flex: (flex). Fast lexical analyzer generator (lex replacement).
6END-INFO-DIR-ENTRY
7
8
9 The flex manual is placed under the same licensing conditions as the
10rest of flex:
11
12 Copyright (C) 1990, 1997 The Regents of the University of California.
13All rights reserved.
14
15 This code is derived from software contributed to Berkeley by Vern
16Paxson.
17
18 The United States Government has rights in this work pursuant to
19contract no. DE-AC03-76SF00098 between the United States Department of
20Energy and the University of California.
21
22 Redistribution and use in source and binary forms, with or without
23modification, are permitted provided that the following conditions are
24met:
25
26 1. Redistributions of source code must retain the above copyright
27 notice, this list of conditions and the following disclaimer.
28
29 2. Redistributions in binary form must reproduce the above copyright
30 notice, this list of conditions and the following disclaimer in the
31 documentation and/or other materials provided with the
32 distribution.
33 Neither the name of the University nor the names of its contributors
34may be used to endorse or promote products derived from this software
35without specific prior written permission.
36
37 THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
38WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
39MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
40
41File: flex.info, Node: Overriding The Default Memory Management, Next: A Note About yytext And Memory, Prev: The Default Memory Management, Up: Memory Management
42
43Overriding The Default Memory Management
44========================================
45
46 Flex calls the functions `yyalloc', `yyrealloc', and `yyfree' when
47it needs to allocate or free memory. By default, these functions are
48wrappers around the standard C functions, `malloc', `realloc', and
49`free', respectively. You can override the default implementations by
50telling flex that you will provide your own implementations.
51
52 To override the default implementations, you must do two things:
53
54 1. Suppress the default implementations by specifying one or more of
55 the following options:
56
57 * `%option noyyalloc'
58
59 * `%option noyyrealloc'
60
61 * `%option noyyfree'.
62
63 2. Provide your own implementation of the following functions: (1)
64
65
66 // For a non-reentrant scanner
67 void * yyalloc (size_t bytes);
68 void * yyrealloc (void * ptr, size_t bytes);
69 void yyfree (void * ptr);
70
71 // For a reentrant scanner
72 void * yyalloc (size_t bytes, void * yyscanner);
73 void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
74 void yyfree (void * ptr, void * yyscanner);
75
76
77 In the following example, we will override all three memory
78routines. We assume that there is a custom allocator with garbage
79collection. In order to make this example interesting, we will use a
80reentrant scanner, passing a pointer to the custom allocator through
81`yyextra'.
82
83
84 %{
85 #include "some_allocator.h"
86 %}
87
88 /* Suppress the default implementations. */
89 %option noyyalloc noyyrealloc noyyfree
90 %option reentrant
91
92 /* Initialize the allocator. */
93 #define YY_EXTRA_TYPE struct allocator*
94 #define YY_USER_INIT yyextra = allocator_create();
95
96 %%
97 .|\n ;
98 %%
99
100 /* Provide our own implementations. */
101 void * yyalloc (size_t bytes, void* yyscanner) {
102 return allocator_alloc (yyextra, bytes);
103 }
104
105 void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
106 return allocator_realloc (yyextra, bytes);
107 }
108
109 void yyfree (void * ptr, void * yyscanner) {
110 /* Do nothing -- we leave it to the garbage collector. */
111 }
112
113 ---------- Footnotes ----------
114
115 (1) It is not necessary to override all (or any) of the memory
116management routines. You may, for example, override `yyrealloc', but
117not `yyfree' or `yyalloc'.
118
119
120File: flex.info, Node: A Note About yytext And Memory, Prev: Overriding The Default Memory Management, Up: Memory Management
121
122A Note About yytext And Memory
123==============================
124
125 When flex finds a match, `yytext' points to the first character of
126the match in the input buffer. The string itself is part of the input
127buffer, and is _NOT_ allocated separately. The value of yytext will be
128overwritten the next time yylex() is called. In short, the value of
129yytext is only valid from within the matched rule's action.
130
131 Often, you want the value of yytext to persist for later processing,
132i.e., by a parser with non-zero lookahead. In order to preserve yytext,
133you will have to copy it with strdup() or a similar function. But this
134introduces some headache because your parser is now responsible for
135freeing the copy of yytext. If you use a yacc or bison parser,
136(commonly used with flex), you will discover that the error recovery
137mechanisms can cause memory to be leaked.
138
139 To prevent memory leaks from strdup'd yytext, you will have to track
140the memory somehow. Our experience has shown that a garbage collection
141mechanism or a pooled memory mechanism will save you a lot of grief
142when writing parsers.
143
144
145File: flex.info, Node: Serialized Tables, Next: Diagnostics, Prev: Memory Management, Up: Top
146
147Serialized Tables
148*****************
149
150 A `flex' scanner has the ability to save the DFA tables to a file,
151and load them at runtime when needed. The motivation for this feature
152is to reduce the runtime memory footprint. Traditionally, these tables
153have been compiled into the scanner as C arrays, and are sometimes
154quite large. Since the tables are compiled into the scanner, the
155memory used by the tables can never be freed. This is a waste of
156memory, especially if an application uses several scanners, but none of
157them at the same time.
158
159 The serialization feature allows the tables to be loaded at runtime,
160before scanning begins. The tables may be discarded when scanning is
161finished.
162
163* Menu:
164
165* Creating Serialized Tables::
166* Loading and Unloading Serialized Tables::
167* Tables File Format::
168
169
170File: flex.info, Node: Creating Serialized Tables, Next: Loading and Unloading Serialized Tables, Prev: Serialized Tables, Up: Serialized Tables
171
172Creating Serialized Tables
173==========================
174
175 You may create a scanner with serialized tables by specifying:
176
177
178 %option tables-file=FILE
179 or
180 --tables-file=FILE
181
182 These options instruct flex to save the DFA tables to the file FILE.
183The tables will _not_ be embedded in the generated scanner. The scanner
184will not function on its own. The scanner will be dependent upon the
185serialized tables. You must load the tables from this file at runtime
186before you can scan anything.
187
188 If you do not specify a filename to `--tables-file', the tables will
189be saved to `lex.yy.tables', where `yy' is the appropriate prefix.
190
191 If your project uses several different scanners, you can concatenate
192the serialized tables into one file, and flex will find the correct set
193of tables, using the scanner prefix as part of the lookup key. An
194example follows:
195
196
197 $ flex --tables-file --prefix=cpp cpp.l
198 $ flex --tables-file --prefix=c c.l
199 $ cat lex.cpp.tables lex.c.tables > all.tables
200
201 The above example created two scanners, `cpp', and `c'. Since we did
202not specify a filename, the tables were serialized to `lex.c.tables' and
203`lex.cpp.tables', respectively. Then, we concatenated the two files
204together into `all.tables', which we will distribute with our project.
205At runtime, we will open the file and tell flex to load the tables from
206it. Flex will find the correct tables automatically. (See next
207section).
208
209
210File: flex.info, Node: Loading and Unloading Serialized Tables, Next: Tables File Format, Prev: Creating Serialized Tables, Up: Serialized Tables
211
212Loading and Unloading Serialized Tables
213=======================================
214
215 If you've built your scanner with `%option tables-file', then you
216must load the scanner tables at runtime. This can be accomplished with
217the following function:
218
219 - Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
220 Locates scanner tables in the stream pointed to by FP and loads
221 them. Memory for the tables is allocated via `yyalloc'. You must
222 call this function before the first call to `yylex'. The argument
223 SCANNER only appears in the reentrant scanner. This function
224 returns `0' (zero) on success, or non-zero on error.
225
226 The loaded tables are *not* automatically destroyed (unloaded) when
227you call `yylex_destroy'. The reason is that you may create several
228scanners of the same type (in a reentrant scanner), each of which needs
229access to these tables. To avoid a nasty memory leak, you must call
230the following function:
231
232 - Function: int yytables_destroy ([yyscan_t SCANNER])
233 Unloads the scanner tables. The tables must be loaded again before
234 you can scan any more data. The argument SCANNER only appears in
235 the reentrant scanner. This function returns `0' (zero) on
236 success, or non-zero on error.
237
238 *The functions `yytables_fload' and `yytables_destroy' are not
239thread-safe.* You must ensure that these functions are called exactly
240once (for each scanner type) in a threaded program, before any thread
241calls `yylex'. After the tables are loaded, they are never written to,
242and no thread protection is required thereafter - until you destroy
243them.
244
245
246File: flex.info, Node: Tables File Format, Prev: Loading and Unloading Serialized Tables, Up: Serialized Tables
247
248Tables File Format
249==================
250
251 This section defines the file format of serialized `flex' tables.
252
253 The tables format allows for one or more sets of tables to be
254specified, where each set corresponds to a given scanner. Scanners are
255indexed by name, as described below. The file format is as follows:
256
257
258 TABLE SET 1
259 +-------------------------------+
260 Header | uint32 th_magic; |
261 | uint32 th_hsize; |
262 | uint32 th_ssize; |
263 | uint16 th_flags; |
264 | char th_version[]; |
265 | char th_name[]; |
266 | uint8 th_pad64[]; |
267 +-------------------------------+
268 Table 1 | uint16 td_id; |
269 | uint16 td_flags; |
270 | uint32 td_lolen; |
271 | uint32 td_hilen; |
272 | void td_data[]; |
273 | uint8 td_pad64[]; |
274 +-------------------------------+
275 Table 2 | |
276 . . .
277 . . .
278 . . .
279 . . .
280 Table n | |
281 +-------------------------------+
282 TABLE SET 2
283 .
284 .
285 .
286 TABLE SET N
287
288 The above diagram shows that a complete set of tables consists of a
289header followed by multiple individual tables. Furthermore, multiple
290complete sets may be present in the same file, each set with its own
291header and tables. The sets are contiguous in the file. The only way to
292know if another set follows is to check the next four bytes for the
293magic number (or check for EOF). The header and tables sections are
294padded to 64-bit boundaries. Below we describe each field in detail.
295This format does not specify how the scanner will expand the given
296data, i.e., data may be serialized as int8, but expanded to an int32
297array at runtime. This is to reduce the size of the serialized data
298where possible. Remember, _all integer values are in network byte
299order_.
300
301Fields of a table header:
302
303`th_magic'
304 Magic number, always 0xF13C57B1.
305
306`th_hsize'
307 Size of this entire header, in bytes, including all fields plus
308 any padding.
309
310`th_ssize'
311 Size of this entire set, in bytes, including the header, all
312 tables, plus any padding.
313
314`th_flags'
315 Bit flags for this table set. Currently unused.
316
317`th_version[]'
318 Flex version in NULL-termninated string format. e.g., `2.5.13a'.
319 This is the version of flex that was used to create the serialized
320 tables.
321
322`th_name[]'
323 Contains the name of this table set. The default is `yytables',
324 and is prefixed accordingly, e.g., `footables'. Must be
325 NULL-terminated.
326
327`th_pad64[]'
328 Zero or more NULL bytes, padding the entire header to the next
329 64-bit boundary as calculated from the beginning of the header.
330
331Fields of a table:
332
333`td_id'
334 Specifies the table identifier. Possible values are:
335 `YYTD_ID_ACCEPT (0x01)'
336 `yy_accept'
337
338 `YYTD_ID_BASE (0x02)'
339 `yy_base'
340
341 `YYTD_ID_CHK (0x03)'
342 `yy_chk'
343
344 `YYTD_ID_DEF (0x04)'
345 `yy_def'
346
347 `YYTD_ID_EC (0x05)'
348 `yy_ec '
349
350 `YYTD_ID_META (0x06)'
351 `yy_meta'
352
353 `YYTD_ID_NUL_TRANS (0x07)'
354 `yy_NUL_trans'
355
356 `YYTD_ID_NXT (0x08)'
357 `yy_nxt'. This array may be two dimensional. See the
358 `td_hilen' field below.
359
360 `YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
361 `yy_rule_can_match_eol'
362
363 `YYTD_ID_START_STATE_LIST (0x0A)'
364 `yy_start_state_list'. This array is handled specially
365 because it is an array of pointers to structs. See the
366 `td_flags' field below.
367
368 `YYTD_ID_TRANSITION (0x0B)'
369 `yy_transition'. This array is handled specially because it
370 is an array of structs. See the `td_lolen' field below.
371
372 `YYTD_ID_ACCLIST (0x0C)'
373 `yy_acclist'
374
375`td_flags'
376 Bit flags describing how to interpret the data in `td_data'. The
377 data arrays are one-dimensional by default, but may be two
378 dimensional as specified in the `td_hilen' field.
379
380 `YYTD_DATA8 (0x01)'
381 The data is serialized as an array of type int8.
382
383 `YYTD_DATA16 (0x02)'
384 The data is serialized as an array of type int16.
385
386 `YYTD_DATA32 (0x04)'
387 The data is serialized as an array of type int32.
388
389 `YYTD_PTRANS (0x08)'
390 The data is a list of indexes of entries in the expanded
391 `yy_transition' array. Each index should be expanded to a
392 pointer to the corresponding entry in the `yy_transition'
393 array. We count on the fact that the `yy_transition' array
394 has already been seen.
395
396 `YYTD_STRUCT (0x10)'
397 The data is a list of yy_trans_info structs, each of which
398 consists of two integers. There is no padding between struct
399 elements or between structs. The type of each member is
400 determined by the `YYTD_DATA*' bits.
401
402`td_lolen'
403 Specifies the number of elements in the lowest dimension array. If
404 this is a one-dimensional array, then it is simply the number of
405 elements in this array. The element size is determined by the
406 `td_flags' field.
407
408`td_hilen'
409 If `td_hilen' is non-zero, then the data is a two-dimensional
410 array. Otherwise, the data is a one-dimensional array. `td_hilen'
411 contains the number of elements in the higher dimensional array,
412 and `td_lolen' contains the number of elements in the lowest
413 dimension.
414
415 Conceptually, `td_data' is either `sometype td_data[td_lolen]', or
416 `sometype td_data[td_hilen][td_lolen]', where `sometype' is
417 specified by the `td_flags' field. It is possible for both
418 `td_lolen' and `td_hilen' to be zero, in which case `td_data' is a
419 zero length array, and no data is loaded, i.e., this table is
420 simply skipped. Flex does not currently generate tables of zero
421 length.
422
423`td_data[]'
424 The table data. This array may be a one- or two-dimensional array,
425 of type `int8', `int16', `int32', `struct yy_trans_info', or
426 `struct yy_trans_info*', depending upon the values in the
427 `td_flags', `td_lolen', and `td_hilen' fields.
428
429`td_pad64[]'
430 Zero or more NULL bytes, padding the entire table to the next
431 64-bit boundary as calculated from the beginning of this table.
432
433
434File: flex.info, Node: Diagnostics, Next: Limitations, Prev: Serialized Tables, Up: Top
435
436Diagnostics
437***********
438
439 The following is a list of `flex' diagnostic messages:
440
441 * `warning, rule cannot be matched' indicates that the given rule
442 cannot be matched because it follows other rules that will always
443 match the same text as it. For example, in the following `foo'
444 cannot be matched because it comes after an identifier "catch-all"
445 rule:
446
447
448 [a-z]+ got_identifier();
449 foo got_foo();
450
451 Using `REJECT' in a scanner suppresses this warning.
452
453 * `warning, -s option given but default rule can be matched' means
454 that it is possible (perhaps only in a particular start condition)
455 that the default rule (match any single character) is the only one
456 that will match a particular input. Since `-s' was given,
457 presumably this is not intended.
458
459 * `reject_used_but_not_detected undefined' or
460 `yymore_used_but_not_detected undefined'. These errors can occur
461 at compile time. They indicate that the scanner uses `REJECT' or
462 `yymore()' but that `flex' failed to notice the fact, meaning that
463 `flex' scanned the first two sections looking for occurrences of
464 these actions and failed to find any, but somehow you snuck some in
465 (via a #include file, for example). Use `%option reject' or
466 `%option yymore' to indicate to `flex' that you really do use
467 these features.
468
469 * `flex scanner jammed'. a scanner compiled with `-s' has
470 encountered an input string which wasn't matched by any of its
471 rules. This error can also occur due to internal problems.
472
473 * `token too large, exceeds YYLMAX'. your scanner uses `%array' and
474 one of its rules matched a string longer than the `YYLMAX'
475 constant (8K bytes by default). You can increase the value by
476 #define'ing `YYLMAX' in the definitions section of your `flex'
477 input.
478
479 * `scanner requires -8 flag to use the character 'x''. Your scanner
480 specification includes recognizing the 8-bit character `'x'' and
481 you did not specify the -8 flag, and your scanner defaulted to
482 7-bit because you used the `-Cf' or `-CF' table compression
483 options. See the discussion of the `-7' flag, *Note Scanner
484 Options::, for details.
485
486 * `flex scanner push-back overflow'. you used `unput()' to push back
487 so much text that the scanner's buffer could not hold both the
488 pushed-back text and the current token in `yytext'. Ideally the
489 scanner should dynamically resize the buffer in this case, but at
490 present it does not.
491
492 * `input buffer overflow, can't enlarge buffer because scanner uses
493 REJECT'. the scanner was working on matching an extremely large
494 token and needed to expand the input buffer. This doesn't work
495 with scanners that use `REJECT'.
496
497 * `fatal flex scanner internal error--end of buffer missed'. This can
498 occur in a scanner which is reentered after a long-jump has jumped
499 out (or over) the scanner's activation frame. Before reentering
500 the scanner, use:
501
502 yyrestart( yyin );
503 or, as noted above, switch to using the C++ scanner class.
504
505 * `too many start conditions in <> construct!' you listed more start
506 conditions in a <> construct than exist (so you must have listed at
507 least one of them twice).
508
509
510File: flex.info, Node: Limitations, Next: Bibliography, Prev: Diagnostics, Up: Top
511
512Limitations
513***********
514
515 Some trailing context patterns cannot be properly matched and
516generate warning messages (`dangerous trailing context'). These are
517patterns where the ending of the first part of the rule matches the
518beginning of the second part, such as `zx*/xy*', where the 'x*' matches
519the 'x' at the beginning of the trailing context. (Note that the POSIX
520draft states that the text matched by such patterns is undefined.) For
521some trailing context rules, parts which are actually fixed-length are
522not recognized as such, leading to the abovementioned performance loss.
523In particular, parts using `|' or `{n}' (such as `foo{3}') are always
524considered variable-length. Combining trailing context with the
525special `|' action can result in _fixed_ trailing context being turned
526into the more expensive _variable_ trailing context. For example, in
527the following:
528
529
530 %%
531 abc |
532 xyz/def
533
534 Use of `unput()' invalidates yytext and yyleng, unless the `%array'
535directive or the `-l' option has been used. Pattern-matching of `NUL's
536is substantially slower than matching other characters. Dynamic
537resizing of the input buffer is slow, as it entails rescanning all the
538text matched so far by the current (generally huge) token. Due to both
539buffering of input and read-ahead, you cannot intermix calls to
540`<stdio.h>' routines, such as, getchar(), with `flex' rules and expect
541it to work. Call `input()' instead. The total table entries listed by
542the `-v' flag excludes the number of table entries needed to determine
543what rule has been matched. The number of entries is equal to the
544number of DFA states if the scanner does not use `REJECT', and somewhat
545greater than the number of states if it does. `REJECT' cannot be used
546with the `-f' or `-F' options.
547
548 The `flex' internal algorithms need documentation.
549
550
551File: flex.info, Node: Bibliography, Next: FAQ, Prev: Limitations, Up: Top
552
553Additional Reading
554******************
555
556 You may wish to read more about the following programs:
557 * lex
558
559 * yacc
560
561 * sed
562
563 * awk
564
565 The following books may contain material of interest:
566
567 John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
568Associates. Be sure to get the 2nd edition.
569
570 M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_
571
572 Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
573Techniques and Tools_, Addison-Wesley (1986). Describes the
574pattern-matching techniques used by `flex' (deterministic finite
575automata).
576
577
578File: flex.info, Node: FAQ, Next: Appendices, Prev: Bibliography, Up: Top
579
580FAQ
581***
582
583 From time to time, the `flex' maintainer receives certain questions.
584Rather than repeat answers to well-understood problems, we publish them
585here.
586
587* Menu:
588
589* When was flex born?::
590* How do I expand \ escape sequences in C-style quoted strings?::
591* Why do flex scanners call fileno if it is not ANSI compatible?::
592* Does flex support recursive pattern definitions?::
593* How do I skip huge chunks of input (tens of megabytes) while using flex?::
594* Flex is not matching my patterns in the same order that I defined them.::
595* My actions are executing out of order or sometimes not at all.::
596* How can I have multiple input sources feed into the same scanner at the same time?::
597* Can I build nested parsers that work with the same input file?::
598* How can I match text only at the end of a file?::
599* How can I make REJECT cascade across start condition boundaries?::
600* Why cant I use fast or full tables with interactive mode?::
601* How much faster is -F or -f than -C?::
602* If I have a simple grammar cant I just parse it with flex?::
603* Why doesnt yyrestart() set the start state back to INITIAL?::
604* How can I match C-style comments?::
605* The period isnt working the way I expected.::
606* Can I get the flex manual in another format?::
607* Does there exist a "faster" NDFA->DFA algorithm?::
608* How does flex compile the DFA so quickly?::
609* How can I use more than 8192 rules?::
610* How do I abandon a file in the middle of a scan and switch to a new file?::
611* How do I execute code only during initialization (only before the first scan)?::
612* How do I execute code at termination?::
613* Where else can I find help?::
614* Can I include comments in the "rules" section of the file?::
615* I get an error about undefined yywrap().::
616* How can I change the matching pattern at run time?::
617* How can I expand macros in the input?::
618* How can I build a two-pass scanner?::
619* How do I match any string not matched in the preceding rules?::
620* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
621* Is there a way to make flex treat NULL like a regular character?::
622* Whenever flex can not match the input it says "flex scanner jammed".::
623* Why doesnt flex have non-greedy operators like perl does?::
624* Memory leak - 16386 bytes allocated by malloc.::
625* How do I track the byte offset for lseek()?::
626* How do I use my own I/O classes in a C++ scanner?::
627* How do I skip as many chars as possible?::
628* deleteme00::
629* Are certain equivalent patterns faster than others?::
630* Is backing up a big deal?::
631* Can I fake multi-byte character support?::
632* deleteme01::
633* Can you discuss some flex internals?::
634* unput() messes up yy_at_bol::
635* The | operator is not doing what I want::
636* Why can't flex understand this variable trailing context pattern?::
637* The ^ operator isn't working::
638* Trailing context is getting confused with trailing optional patterns::
639* Is flex GNU or not?::
640* ERASEME53::
641* I need to scan if-then-else blocks and while loops::
642* ERASEME55::
643* ERASEME56::
644* ERASEME57::
645* Is there a repository for flex scanners?::
646* How can I conditionally compile or preprocess my flex input file?::
647* Where can I find grammars for lex and yacc?::
648* I get an end-of-buffer message for each character scanned.::
649* unnamed-faq-62::
650* unnamed-faq-63::
651* unnamed-faq-64::
652* unnamed-faq-65::
653* unnamed-faq-66::
654* unnamed-faq-67::
655* unnamed-faq-68::
656* unnamed-faq-69::
657* unnamed-faq-70::
658* unnamed-faq-71::
659* unnamed-faq-72::
660* unnamed-faq-73::
661* unnamed-faq-74::
662* unnamed-faq-75::
663* unnamed-faq-76::
664* unnamed-faq-77::
665* unnamed-faq-78::
666* unnamed-faq-79::
667* unnamed-faq-80::
668* unnamed-faq-81::
669* unnamed-faq-82::
670* unnamed-faq-83::
671* unnamed-faq-84::
672* unnamed-faq-85::
673* unnamed-faq-86::
674* unnamed-faq-87::
675* unnamed-faq-88::
676* unnamed-faq-90::
677* unnamed-faq-91::
678* unnamed-faq-92::
679* unnamed-faq-93::
680* unnamed-faq-94::
681* unnamed-faq-95::
682* unnamed-faq-96::
683* unnamed-faq-97::
684* unnamed-faq-98::
685* unnamed-faq-99::
686* unnamed-faq-100::
687* unnamed-faq-101::
688* What is the difference between YYLEX_PARAM and YY_DECL?::
689* Why do I get "conflicting types for yylex" error?::
690* How do I access the values set in a Flex action from within a Bison action?::
691
692
693File: flex.info, Node: When was flex born?, Next: How do I expand \ escape sequences in C-style quoted strings?, Up: FAQ
694
695When was flex born?
696===================
697
698 Vern Paxson took over the `Software Tools' lex project from Jef
699Poskanzer in 1982. At that point it was written in Ratfor. Around
7001987 or so, Paxson translated it into C, and a legend was born :-).
701
702
703File: flex.info, Node: How do I expand \ escape sequences in C-style quoted strings?, Next: Why do flex scanners call fileno if it is not ANSI compatible?, Prev: When was flex born?, Up: FAQ
704
705How do I expand \ escape sequences in C-style quoted strings?
706=============================================================
707
708 A key point when scanning quoted strings is that you cannot (easily)
709write a single rule that will precisely match the string if you allow
710things like embedded escape sequences and newlines. If you try to
711match strings with a single rule then you'll wind up having to rescan
712the string anyway to find any escape sequences.
713
714 Instead you can use exclusive start conditions and a set of rules,
715one for matching non-escaped text, one for matching a single escape,
716one for matching an embedded newline, and one for recognizing the end
717of the string. Each of these rules is then faced with the question of
718where to put its intermediary results. The best solution is for the
719rules to append their local value of `yytext' to the end of a "string
720literal" buffer. A rule like the escape-matcher will append to the
721buffer the meaning of the escape sequence rather than the literal text
722in `yytext'. In this way, `yytext' does not need to be modified at all.
723
724
725File: flex.info, Node: Why do flex scanners call fileno if it is not ANSI compatible?, Next: Does flex support recursive pattern definitions?, Prev: How do I expand \ escape sequences in C-style quoted strings?, Up: FAQ
726
727Why do flex scanners call fileno if it is not ANSI compatible?
728==============================================================
729
730 Flex scanners call `fileno()' in order to get the file descriptor
731corresponding to `yyin'. The file descriptor may be passed to
732`isatty()' or `read()', depending upon which `%options' you specified.
733If your system does not have `fileno()' support, to get rid of the
734`read()' call, do not specify `%option read'. To get rid of the
735`isatty()' call, you must specify one of `%option always-interactive' or
736`%option never-interactive'.
737
738
739File: flex.info, Node: Does flex support recursive pattern definitions?, Next: How do I skip huge chunks of input (tens of megabytes) while using flex?, Prev: Why do flex scanners call fileno if it is not ANSI compatible?, Up: FAQ
740
741Does flex support recursive pattern definitions?
742================================================
743
744 e.g.,
745
746
747 %%
748 block "{"({block}|{statement})*"}"
749
750 No. You cannot have recursive definitions. The pattern-matching
751power of regular expressions in general (and therefore flex scanners,
752too) is limited. In particular, regular expressions cannot "balance"
753parentheses to an arbitrary degree. For example, it's impossible to
754write a regular expression that matches all strings containing the same
755number of '{'s as '}'s. For more powerful pattern matching, you need a
756parser, such as `GNU bison'.
757
758
759File: flex.info, Node: How do I skip huge chunks of input (tens of megabytes) while using flex?, Next: Flex is not matching my patterns in the same order that I defined them., Prev: Does flex support recursive pattern definitions?, Up: FAQ
760
761How do I skip huge chunks of input (tens of megabytes) while using flex?
762========================================================================
763
764 Use `fseek()' (or `lseek()') to position yyin, then call
765`yyrestart()'.
766
767
768File: flex.info, Node: Flex is not matching my patterns in the same order that I defined them., Next: My actions are executing out of order or sometimes not at all., Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?, Up: FAQ
769
770Flex is not matching my patterns in the same order that I defined them.
771=======================================================================
772
773 `flex' picks the rule that matches the most text (i.e., the longest
774possible input string). This is because `flex' uses an entirely
775different matching technique ("deterministic finite automata") that
776actually does all of the matching simultaneously, in parallel. (Seems
777impossible, but it's actually a fairly simple technique once you
778understand the principles.)
779
780 A side-effect of this parallel matching is that when the input
781matches more than one rule, `flex' scanners pick the rule that matched
782the _most_ text. This is explained further in the manual, in the
783section *Note Matching::.
784
785 If you want `flex' to choose a shorter match, then you can work
786around this behavior by expanding your short rule to match more text,
787then put back the extra:
788
789
790 data_.* yyless( 5 ); BEGIN BLOCKIDSTATE;
791
792 Another fix would be to make the second rule active only during the
793`<BLOCKIDSTATE>' start condition, and make that start condition
794exclusive by declaring it with `%x' instead of `%s'.
795
796 A final fix is to change the input language so that the ambiguity for
797`data_' is removed, by adding characters to it that don't match the
798identifier rule, or by removing characters (such as `_') from the
799identifier rule so it no longer matches `data_'. (Of course, you might
800also not have the option of changing the input language.)
801
802
803File: flex.info, Node: My actions are executing out of order or sometimes not at all., Next: How can I have multiple input sources feed into the same scanner at the same time?, Prev: Flex is not matching my patterns in the same order that I defined them., Up: FAQ
804
805My actions are executing out of order or sometimes not at all.
806==============================================================
807
808 Most likely, you have (in error) placed the opening `{' of the action
809block on a different line than the rule, e.g.,
810
811
812 ^(foo|bar)
813 { <<<--- WRONG!
814
815 }
816
817 `flex' requires that the opening `{' of an action associated with a
818rule begin on the same line as does the rule. You need instead to
819write your rules as follows:
820
821
822 ^(foo|bar) { // CORRECT!
823
824 }
825
826
827File: flex.info, Node: How can I have multiple input sources feed into the same scanner at the same time?, Next: Can I build nested parsers that work with the same input file?, Prev: My actions are executing out of order or sometimes not at all., Up: FAQ
828
829How can I have multiple input sources feed into the same scanner at the same time?
830==================================================================================
831
832 If ...
833 * your scanner is free of backtracking (verified using `flex''s `-b'
834 flag),
835
836 * AND you run your scanner interactively (`-I' option; default
837 unless using special table compression options),
838
839 * AND you feed it one character at a time by redefining `YY_INPUT'
840 to do so,
841
842 then every time it matches a token, it will have exhausted its input
843buffer (because the scanner is free of backtracking). This means you
844can safely use `select()' at the point and only call `yylex()' for
845another token if `select()' indicates there's data available.
846
847 That is, move the `select()' out from the input function to a point
848where it determines whether `yylex()' gets called for the next token.
849
850 With this approach, you will still have problems if your input can
851arrive piecemeal; `select()' could inform you that the beginning of a
852token is available, you call `yylex()' to get it, but it winds up
853blocking waiting for the later characters in the token.
854
855 Here's another way: Move your input multiplexing inside of
856`YY_INPUT'. That is, whenever `YY_INPUT' is called, it `select()''s to
857see where input is available. If input is available for the scanner,
858it reads and returns the next byte. If input is available from another
859source, it calls whatever function is responsible for reading from that
860source. (If no input is available, it blocks until some input is
861available.) I've used this technique in an interpreter I wrote that
862both reads keyboard input using a `flex' scanner and IPC traffic from
863sockets, and it works fine.
864
865
866File: flex.info, Node: Can I build nested parsers that work with the same input file?, Next: How can I match text only at the end of a file?, Prev: How can I have multiple input sources feed into the same scanner at the same time?, Up: FAQ
867
868Can I build nested parsers that work with the same input file?
869==============================================================
870
871 This is not going to work without some additional effort. The
872reason is that `flex' block-buffers the input it reads from `yyin'.
873This means that the "outermost" `yylex()', when called, will
874automatically slurp up the first 8K of input available on yyin, and
875subsequent calls to other `yylex()''s won't see that input. You might
876be tempted to work around this problem by redefining `YY_INPUT' to only
877return a small amount of text, but it turns out that that approach is
878quite difficult. Instead, the best solution is to combine all of your
879scanners into one large scanner, using a different exclusive start
880condition for each.
881
882
883File: flex.info, Node: How can I match text only at the end of a file?, Next: How can I make REJECT cascade across start condition boundaries?, Prev: Can I build nested parsers that work with the same input file?, Up: FAQ
884
885How can I match text only at the end of a file?
886===============================================
887
888 There is no way to write a rule which is "match this text, but only
889if it comes at the end of the file". You can fake it, though, if you
890happen to have a character lying around that you don't allow in your
891input. Then you redefine `YY_INPUT' to call your own routine which, if
892it sees an `EOF', returns the magic character first (and remembers to
893return a real `EOF' next time it's called). Then you could write:
894
895
896 <COMMENT>(.|\n)*{EOF_CHAR} /* saw comment at EOF */
897
898
899File: flex.info, Node: How can I make REJECT cascade across start condition boundaries?, Next: Why cant I use fast or full tables with interactive mode?, Prev: How can I match text only at the end of a file?, Up: FAQ
900
901How can I make REJECT cascade across start condition boundaries?
902================================================================
903
904 You can do this as follows. Suppose you have a start condition `A',
905and after exhausting all of the possible matches in `<A>', you want to
906try matches in `<INITIAL>'. Then you could use the following:
907
908
909 %x A
910 %%
911 <A>rule_that_is_long ...; REJECT;
912 <A>rule ...; REJECT; /* shorter rule */
913 <A>etc.
914 ...
915 <A>.|\n {
916 /* Shortest and last rule in <A>, so
917 * cascaded REJECT's will eventually
918 * wind up matching this rule. We want
919 * to now switch to the initial state
920 * and try matching from there instead.
921 */
922 yyless(0); /* put back matched text */
923 BEGIN(INITIAL);
924 }
925
926
927File: flex.info, Node: Why cant I use fast or full tables with interactive mode?, Next: How much faster is -F or -f than -C?, Prev: How can I make REJECT cascade across start condition boundaries?, Up: FAQ
928
929Why can't I use fast or full tables with interactive mode?
930==========================================================
931
932 One of the assumptions flex makes is that interactive applications
933are inherently slow (they're waiting on a human after all). It has to
934do with how the scanner detects that it must be finished scanning a
935token. For interactive scanners, after scanning each character the
936current state is looked up in a table (essentially) to see whether
937there's a chance of another input character possibly extending the
938length of the match. If not, the scanner halts. For non-interactive
939scanners, the end-of-token test is much simpler, basically a compare
940with 0, so no memory bus cycles. Since the test occurs in the
941innermost scanning loop, one would like to make it go as fast as
942possible.
943
944 Still, it seems reasonable to allow the user to choose to trade off
945a bit of performance in this area to gain the corresponding
946flexibility. There might be another reason, though, why fast scanners
947don't support the interactive option.
948
949
950File: flex.info, Node: How much faster is -F or -f than -C?, Next: If I have a simple grammar cant I just parse it with flex?, Prev: Why cant I use fast or full tables with interactive mode?, Up: FAQ
951
952How much faster is -F or -f than -C?
953====================================
954
955 Much faster (factor of 2-3).
956
957
958File: flex.info, Node: If I have a simple grammar cant I just parse it with flex?, Next: Why doesnt yyrestart() set the start state back to INITIAL?, Prev: How much faster is -F or -f than -C?, Up: FAQ
959
960If I have a simple grammar can't I just parse it with flex?
961===========================================================
962
963 Is your grammar recursive? That's almost always a sign that you're
964better off using a parser/scanner rather than just trying to use a
965scanner alone.
966
967
968File: flex.info, Node: Why doesnt yyrestart() set the start state back to INITIAL?, Next: How can I match C-style comments?, Prev: If I have a simple grammar cant I just parse it with flex?, Up: FAQ
969
970Why doesn't yyrestart() set the start state back to INITIAL?
971============================================================
972
973 There are two reasons. The first is that there might be programs
974that rely on the start state not changing across file changes. The
975second is that beginning with `flex' version 2.4, use of `yyrestart()'
976is no longer required, so fixing the problem there doesn't solve the
977more general problem.
978
979
980File: flex.info, Node: How can I match C-style comments?, Next: The period isnt working the way I expected., Prev: Why doesnt yyrestart() set the start state back to INITIAL?, Up: FAQ
981
982How can I match C-style comments?
983=================================
984
985 You might be tempted to try something like this:
986
987
988 "/*".*"*/" // WRONG!
989
990 or, worse, this:
991
992
993 "/*"(.|\n)"*/" // WRONG!
994
995 The above rules will eat too much input, and blow up on things like:
996
997
998 /* a comment */ do_my_thing( "oops */" );
999
1000 Here is one way which allows you to track line information:
1001
1002
1003 <INITIAL>{
1004 "/*" BEGIN(IN_COMMENT);
1005 }
1006 <IN_COMMENT>{
1007 "*/" BEGIN(INITIAL);
1008 [^*\n]+ // eat comment in chunks
1009 "*" // eat the lone star
1010 \n yylineno++;
1011 }
1012
1013
1014File: flex.info, Node: The period isnt working the way I expected., Next: Can I get the flex manual in another format?, Prev: How can I match C-style comments?, Up: FAQ
1015
1016The '.' isn't working the way I expected.
1017=========================================
1018
1019 Here are some tips for using `.':
1020
1021 * A common mistake is to place the grouping parenthesis AFTER an
1022 operator, when you really meant to place the parenthesis BEFORE
1023 the operator, e.g., you probably want this `(foo|bar)+' and NOT
1024 this `(foo|bar+)'.
1025
1026 The first pattern matches the words `foo' or `bar' any number of
1027 times, e.g., it matches the text `barfoofoobarfoo'. The second
1028 pattern matches a single instance of `foo' or a single instance of
1029 `bar' followed by one or more `r's, e.g., it matches the text
1030 `barrrr' .
1031
1032 * A `.' inside `[]''s just means a literal`.' (period), and NOT "any
1033 character except newline".
1034
1035 * Remember that `.' matches any character EXCEPT `\n' (and `EOF').
1036 If you really want to match ANY character, including newlines,
1037 then use `(.|\n)' Beware that the regex `(.|\n)+' will match your
1038 entire input!
1039
1040 * Finally, if you want to match a literal `.' (a period), then use
1041 `[.]' or `"."'
1042
1043
1044File: flex.info, Node: Can I get the flex manual in another format?, Next: Does there exist a "faster" NDFA->DFA algorithm?, Prev: The period isnt working the way I expected., Up: FAQ
1045
1046Can I get the flex manual in another format?
1047============================================
1048
1049 The `flex' source distribution includes a texinfo manual. You are
1050free to convert that texinfo into whatever format you desire. The
1051`texinfo' package includes tools for conversion to a number of formats.
1052
1053
1054File: flex.info, Node: Does there exist a "faster" NDFA->DFA algorithm?, Next: How does flex compile the DFA so quickly?, Prev: Can I get the flex manual in another format?, Up: FAQ
1055
1056Does there exist a "faster" NDFA->DFA algorithm?
1057================================================
1058
1059 There's no way around the potential exponential running time - it
1060can take you exponential time just to enumerate all of the DFA states.
1061In practice, though, the running time is closer to linear, or sometimes
1062quadratic.
1063
1064
1065File: flex.info, Node: How does flex compile the DFA so quickly?, Next: How can I use more than 8192 rules?, Prev: Does there exist a "faster" NDFA->DFA algorithm?, Up: FAQ
1066
1067How does flex compile the DFA so quickly?
1068=========================================
1069
1070 There are two big speed wins that `flex' uses:
1071
1072 1. It analyzes the input rules to construct equivalence classes for
1073 those characters that always make the same transitions. It then
1074 rewrites the NFA using equivalence classes for transitions instead
1075 of characters. This cuts down the NFA->DFA computation time
1076 dramatically, to the point where, for uncompressed DFA tables, the
1077 DFA generation is often I/O bound in writing out the tables.
1078
1079 2. It maintains hash values for previously computed DFA states, so
1080 testing whether a newly constructed DFA state is equivalent to a
1081 previously constructed state can be done very quickly, by first
1082 comparing hash values.
1083
1084
1085File: flex.info, Node: How can I use more than 8192 rules?, Next: How do I abandon a file in the middle of a scan and switch to a new file?, Prev: How does flex compile the DFA so quickly?, Up: FAQ
1086
1087How can I use more than 8192 rules?
1088===================================
1089
1090 `Flex' is compiled with an upper limit of 8192 rules per scanner.
1091If you need more than 8192 rules in your scanner, you'll have to
1092recompile `flex' with the following changes in `flexdef.h':
1093
1094
1095 < #define YY_TRAILING_MASK 0x2000
1096 < #define YY_TRAILING_HEAD_MASK 0x4000
1097 --
1098 > #define YY_TRAILING_MASK 0x20000000
1099 > #define YY_TRAILING_HEAD_MASK 0x40000000
1100
1101 This should work okay as long as your C compiler uses 32 bit
1102integers. But you might want to think about whether using such a huge
1103number of rules is the best way to solve your problem.
1104
1105 The following may also be relevant:
1106
1107 With luck, you should be able to increase the definitions in
1108flexdef.h for:
1109
1110
1111 #define JAMSTATE -32766 /* marks a reference to the state that always jams */
1112 #define MAXIMUM_MNS 31999
1113 #define BAD_SUBSCRIPT -32767
1114
1115 recompile everything, and it'll all work. Flex only has these
111616-bit-like values built into it because a long time ago it was
1117developed on a machine with 16-bit ints. I've given this advice to
1118others in the past but haven't heard back from them whether it worked
1119okay or not...
1120
1121
1122File: flex.info, Node: How do I abandon a file in the middle of a scan and switch to a new file?, Next: How do I execute code only during initialization (only before the first scan)?, Prev: How can I use more than 8192 rules?, Up: FAQ
1123
1124How do I abandon a file in the middle of a scan and switch to a new file?
1125=========================================================================
1126
1127 Just call `yyrestart(newfile)'. Be sure to reset the start state if
1128you want a "fresh start, since `yyrestart' does NOT reset the start
1129state back to `INITIAL'.
1130
1131
1132File: flex.info, Node: How do I execute code only during initialization (only before the first scan)?, Next: How do I execute code at termination?, Prev: How do I abandon a file in the middle of a scan and switch to a new file?, Up: FAQ
1133
1134How do I execute code only during initialization (only before the first scan)?
1135==============================================================================
1136
1137 You can specify an initial action by defining the macro
1138`YY_USER_INIT' (though note that `yyout' may not be available at the
1139time this macro is executed). Or you can add to the beginning of your
1140rules section:
1141
1142
1143 %%
1144 /* Must be indented! */
1145 static int did_init = 0;
1146
1147 if ( ! did_init ){
1148 do_my_init();
1149 did_init = 1;
1150 }
1151
1152
1153File: flex.info, Node: How do I execute code at termination?, Next: Where else can I find help?, Prev: How do I execute code only during initialization (only before the first scan)?, Up: FAQ
1154
1155How do I execute code at termination?
1156=====================================
1157
1158 You can specify an action for the `<<EOF>>' rule.
1159
1160
1161File: flex.info, Node: Where else can I find help?, Next: Can I include comments in the "rules" section of the file?, Prev: How do I execute code at termination?, Up: FAQ
1162
1163Where else can I find help?
1164===========================
1165
1166 You can find the flex homepage on the web at
1167`http://flex.sourceforge.net/'. See that page for details about flex
1168mailing lists as well.
1169
1170
1171File: flex.info, Node: Can I include comments in the "rules" section of the file?, Next: I get an error about undefined yywrap()., Prev: Where else can I find help?, Up: FAQ
1172
1173Can I include comments in the "rules" section of the file?
1174==========================================================
1175
1176 Yes, just about anywhere you want to. See the manual for the
1177specific syntax.
1178
1179
1180File: flex.info, Node: I get an error about undefined yywrap()., Next: How can I change the matching pattern at run time?, Prev: Can I include comments in the "rules" section of the file?, Up: FAQ
1181
1182I get an error about undefined yywrap().
1183========================================
1184
1185 You must supply a `yywrap()' function of your own, or link to
1186`libfl.a' (which provides one), or use
1187
1188
1189 %option noyywrap
1190
1191 in your source to say you don't want a `yywrap()' function.
1192
1193
1194File: flex.info, Node: How can I change the matching pattern at run time?, Next: How can I expand macros in the input?, Prev: I get an error about undefined yywrap()., Up: FAQ
1195
1196How can I change the matching pattern at run time?
1197==================================================
1198
1199 You can't, it's compiled into a static table when flex builds the
1200scanner.
1201
1202
1203File: flex.info, Node: How can I expand macros in the input?, Next: How can I build a two-pass scanner?, Prev: How can I change the matching pattern at run time?, Up: FAQ
1204
1205How can I expand macros in the input?
1206=====================================
1207
1208 The best way to approach this problem is at a higher level, e.g., in
1209the parser.
1210
1211 However, you can do this using multiple input buffers.
1212
1213
1214 %%
1215 macro/[a-z]+ {
1216 /* Saw the macro "macro" followed by extra stuff. */
1217 main_buffer = YY_CURRENT_BUFFER;
1218 expansion_buffer = yy_scan_string(expand(yytext));
1219 yy_switch_to_buffer(expansion_buffer);
1220 }
1221
1222 <<EOF>> {
1223 if ( expansion_buffer )
1224 {
1225 // We were doing an expansion, return to where
1226 // we were.
1227 yy_switch_to_buffer(main_buffer);
1228 yy_delete_buffer(expansion_buffer);
1229 expansion_buffer = 0;
1230 }
1231 else
1232 yyterminate();
1233 }
1234
1235 You probably will want a stack of expansion buffers to allow nested
1236macros. From the above though hopefully the idea is clear.
1237
1238
1239File: flex.info, Node: How can I build a two-pass scanner?, Next: How do I match any string not matched in the preceding rules?, Prev: How can I expand macros in the input?, Up: FAQ
1240
1241How can I build a two-pass scanner?
1242===================================
1243
1244 One way to do it is to filter the first pass to a temporary file,
1245then process the temporary file on the second pass. You will probably
1246see a performance hit, do to all the disk I/O.
1247
1248 When you need to look ahead far forward like this, it almost always
1249means that the right solution is to build a parse tree of the entire
1250input, then walk it after the parse in order to generate the output.
1251In a sense, this is a two-pass approach, once through the text and once
1252through the parse tree, but the performance hit for the latter is
1253usually an order of magnitude smaller, since everything is already
1254classified, in binary format, and residing in memory.
1255
Note: See TracBrowser for help on using the repository browser.