1 | This is flex.info, produced by makeinfo version 4.5 from flex.texi.
|
---|
2 |
|
---|
3 | INFO-DIR-SECTION Programming
|
---|
4 | START-INFO-DIR-ENTRY
|
---|
5 | * flex: (flex). Fast lexical analyzer generator (lex replacement).
|
---|
6 | END-INFO-DIR-ENTRY
|
---|
7 |
|
---|
8 |
|
---|
9 | The flex manual is placed under the same licensing conditions as the
|
---|
10 | rest of flex:
|
---|
11 |
|
---|
12 | Copyright (C) 1990, 1997 The Regents of the University of California.
|
---|
13 | All rights reserved.
|
---|
14 |
|
---|
15 | This code is derived from software contributed to Berkeley by Vern
|
---|
16 | Paxson.
|
---|
17 |
|
---|
18 | The United States Government has rights in this work pursuant to
|
---|
19 | contract no. DE-AC03-76SF00098 between the United States Department of
|
---|
20 | Energy and the University of California.
|
---|
21 |
|
---|
22 | Redistribution and use in source and binary forms, with or without
|
---|
23 | modification, are permitted provided that the following conditions are
|
---|
24 | met:
|
---|
25 |
|
---|
26 | 1. Redistributions of source code must retain the above copyright
|
---|
27 | notice, this list of conditions and the following disclaimer.
|
---|
28 |
|
---|
29 | 2. Redistributions in binary form must reproduce the above copyright
|
---|
30 | notice, this list of conditions and the following disclaimer in the
|
---|
31 | documentation and/or other materials provided with the
|
---|
32 | distribution.
|
---|
33 | Neither the name of the University nor the names of its contributors
|
---|
34 | may be used to endorse or promote products derived from this software
|
---|
35 | without specific prior written permission.
|
---|
36 |
|
---|
37 | THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
|
---|
38 | WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
---|
39 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
---|
40 |
|
---|
41 | File: flex.info, Node: Overriding The Default Memory Management, Next: A Note About yytext And Memory, Prev: The Default Memory Management, Up: Memory Management
|
---|
42 |
|
---|
43 | Overriding The Default Memory Management
|
---|
44 | ========================================
|
---|
45 |
|
---|
46 | Flex calls the functions `yyalloc', `yyrealloc', and `yyfree' when
|
---|
47 | it needs to allocate or free memory. By default, these functions are
|
---|
48 | wrappers around the standard C functions, `malloc', `realloc', and
|
---|
49 | `free', respectively. You can override the default implementations by
|
---|
50 | telling flex that you will provide your own implementations.
|
---|
51 |
|
---|
52 | To override the default implementations, you must do two things:
|
---|
53 |
|
---|
54 | 1. Suppress the default implementations by specifying one or more of
|
---|
55 | the following options:
|
---|
56 |
|
---|
57 | * `%option noyyalloc'
|
---|
58 |
|
---|
59 | * `%option noyyrealloc'
|
---|
60 |
|
---|
61 | * `%option noyyfree'.
|
---|
62 |
|
---|
63 | 2. Provide your own implementation of the following functions: (1)
|
---|
64 |
|
---|
65 |
|
---|
66 | // For a non-reentrant scanner
|
---|
67 | void * yyalloc (size_t bytes);
|
---|
68 | void * yyrealloc (void * ptr, size_t bytes);
|
---|
69 | void yyfree (void * ptr);
|
---|
70 |
|
---|
71 | // For a reentrant scanner
|
---|
72 | void * yyalloc (size_t bytes, void * yyscanner);
|
---|
73 | void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
|
---|
74 | void yyfree (void * ptr, void * yyscanner);
|
---|
75 |
|
---|
76 |
|
---|
77 | In the following example, we will override all three memory
|
---|
78 | routines. We assume that there is a custom allocator with garbage
|
---|
79 | collection. In order to make this example interesting, we will use a
|
---|
80 | reentrant scanner, passing a pointer to the custom allocator through
|
---|
81 | `yyextra'.
|
---|
82 |
|
---|
83 |
|
---|
84 | %{
|
---|
85 | #include "some_allocator.h"
|
---|
86 | %}
|
---|
87 |
|
---|
88 | /* Suppress the default implementations. */
|
---|
89 | %option noyyalloc noyyrealloc noyyfree
|
---|
90 | %option reentrant
|
---|
91 |
|
---|
92 | /* Initialize the allocator. */
|
---|
93 | #define YY_EXTRA_TYPE struct allocator*
|
---|
94 | #define YY_USER_INIT yyextra = allocator_create();
|
---|
95 |
|
---|
96 | %%
|
---|
97 | .|\n ;
|
---|
98 | %%
|
---|
99 |
|
---|
100 | /* Provide our own implementations. */
|
---|
101 | void * yyalloc (size_t bytes, void* yyscanner) {
|
---|
102 | return allocator_alloc (yyextra, bytes);
|
---|
103 | }
|
---|
104 |
|
---|
105 | void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
|
---|
106 | return allocator_realloc (yyextra, bytes);
|
---|
107 | }
|
---|
108 |
|
---|
109 | void yyfree (void * ptr, void * yyscanner) {
|
---|
110 | /* Do nothing -- we leave it to the garbage collector. */
|
---|
111 | }
|
---|
112 |
|
---|
113 | ---------- Footnotes ----------
|
---|
114 |
|
---|
115 | (1) It is not necessary to override all (or any) of the memory
|
---|
116 | management routines. You may, for example, override `yyrealloc', but
|
---|
117 | not `yyfree' or `yyalloc'.
|
---|
118 |
|
---|
119 |
|
---|
120 | File: flex.info, Node: A Note About yytext And Memory, Prev: Overriding The Default Memory Management, Up: Memory Management
|
---|
121 |
|
---|
122 | A Note About yytext And Memory
|
---|
123 | ==============================
|
---|
124 |
|
---|
125 | When flex finds a match, `yytext' points to the first character of
|
---|
126 | the match in the input buffer. The string itself is part of the input
|
---|
127 | buffer, and is _NOT_ allocated separately. The value of yytext will be
|
---|
128 | overwritten the next time yylex() is called. In short, the value of
|
---|
129 | yytext is only valid from within the matched rule's action.
|
---|
130 |
|
---|
131 | Often, you want the value of yytext to persist for later processing,
|
---|
132 | i.e., by a parser with non-zero lookahead. In order to preserve yytext,
|
---|
133 | you will have to copy it with strdup() or a similar function. But this
|
---|
134 | introduces some headache because your parser is now responsible for
|
---|
135 | freeing the copy of yytext. If you use a yacc or bison parser,
|
---|
136 | (commonly used with flex), you will discover that the error recovery
|
---|
137 | mechanisms can cause memory to be leaked.
|
---|
138 |
|
---|
139 | To prevent memory leaks from strdup'd yytext, you will have to track
|
---|
140 | the memory somehow. Our experience has shown that a garbage collection
|
---|
141 | mechanism or a pooled memory mechanism will save you a lot of grief
|
---|
142 | when writing parsers.
|
---|
143 |
|
---|
144 |
|
---|
145 | File: flex.info, Node: Serialized Tables, Next: Diagnostics, Prev: Memory Management, Up: Top
|
---|
146 |
|
---|
147 | Serialized Tables
|
---|
148 | *****************
|
---|
149 |
|
---|
150 | A `flex' scanner has the ability to save the DFA tables to a file,
|
---|
151 | and load them at runtime when needed. The motivation for this feature
|
---|
152 | is to reduce the runtime memory footprint. Traditionally, these tables
|
---|
153 | have been compiled into the scanner as C arrays, and are sometimes
|
---|
154 | quite large. Since the tables are compiled into the scanner, the
|
---|
155 | memory used by the tables can never be freed. This is a waste of
|
---|
156 | memory, especially if an application uses several scanners, but none of
|
---|
157 | them at the same time.
|
---|
158 |
|
---|
159 | The serialization feature allows the tables to be loaded at runtime,
|
---|
160 | before scanning begins. The tables may be discarded when scanning is
|
---|
161 | finished.
|
---|
162 |
|
---|
163 | * Menu:
|
---|
164 |
|
---|
165 | * Creating Serialized Tables::
|
---|
166 | * Loading and Unloading Serialized Tables::
|
---|
167 | * Tables File Format::
|
---|
168 |
|
---|
169 |
|
---|
170 | File: flex.info, Node: Creating Serialized Tables, Next: Loading and Unloading Serialized Tables, Prev: Serialized Tables, Up: Serialized Tables
|
---|
171 |
|
---|
172 | Creating Serialized Tables
|
---|
173 | ==========================
|
---|
174 |
|
---|
175 | You may create a scanner with serialized tables by specifying:
|
---|
176 |
|
---|
177 |
|
---|
178 | %option tables-file=FILE
|
---|
179 | or
|
---|
180 | --tables-file=FILE
|
---|
181 |
|
---|
182 | These options instruct flex to save the DFA tables to the file FILE.
|
---|
183 | The tables will _not_ be embedded in the generated scanner. The scanner
|
---|
184 | will not function on its own. The scanner will be dependent upon the
|
---|
185 | serialized tables. You must load the tables from this file at runtime
|
---|
186 | before you can scan anything.
|
---|
187 |
|
---|
188 | If you do not specify a filename to `--tables-file', the tables will
|
---|
189 | be saved to `lex.yy.tables', where `yy' is the appropriate prefix.
|
---|
190 |
|
---|
191 | If your project uses several different scanners, you can concatenate
|
---|
192 | the serialized tables into one file, and flex will find the correct set
|
---|
193 | of tables, using the scanner prefix as part of the lookup key. An
|
---|
194 | example follows:
|
---|
195 |
|
---|
196 |
|
---|
197 | $ flex --tables-file --prefix=cpp cpp.l
|
---|
198 | $ flex --tables-file --prefix=c c.l
|
---|
199 | $ cat lex.cpp.tables lex.c.tables > all.tables
|
---|
200 |
|
---|
201 | The above example created two scanners, `cpp', and `c'. Since we did
|
---|
202 | not specify a filename, the tables were serialized to `lex.c.tables' and
|
---|
203 | `lex.cpp.tables', respectively. Then, we concatenated the two files
|
---|
204 | together into `all.tables', which we will distribute with our project.
|
---|
205 | At runtime, we will open the file and tell flex to load the tables from
|
---|
206 | it. Flex will find the correct tables automatically. (See next
|
---|
207 | section).
|
---|
208 |
|
---|
209 |
|
---|
210 | File: flex.info, Node: Loading and Unloading Serialized Tables, Next: Tables File Format, Prev: Creating Serialized Tables, Up: Serialized Tables
|
---|
211 |
|
---|
212 | Loading and Unloading Serialized Tables
|
---|
213 | =======================================
|
---|
214 |
|
---|
215 | If you've built your scanner with `%option tables-file', then you
|
---|
216 | must load the scanner tables at runtime. This can be accomplished with
|
---|
217 | the following function:
|
---|
218 |
|
---|
219 | - Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
|
---|
220 | Locates scanner tables in the stream pointed to by FP and loads
|
---|
221 | them. Memory for the tables is allocated via `yyalloc'. You must
|
---|
222 | call this function before the first call to `yylex'. The argument
|
---|
223 | SCANNER only appears in the reentrant scanner. This function
|
---|
224 | returns `0' (zero) on success, or non-zero on error.
|
---|
225 |
|
---|
226 | The loaded tables are *not* automatically destroyed (unloaded) when
|
---|
227 | you call `yylex_destroy'. The reason is that you may create several
|
---|
228 | scanners of the same type (in a reentrant scanner), each of which needs
|
---|
229 | access to these tables. To avoid a nasty memory leak, you must call
|
---|
230 | the following function:
|
---|
231 |
|
---|
232 | - Function: int yytables_destroy ([yyscan_t SCANNER])
|
---|
233 | Unloads the scanner tables. The tables must be loaded again before
|
---|
234 | you can scan any more data. The argument SCANNER only appears in
|
---|
235 | the reentrant scanner. This function returns `0' (zero) on
|
---|
236 | success, or non-zero on error.
|
---|
237 |
|
---|
238 | *The functions `yytables_fload' and `yytables_destroy' are not
|
---|
239 | thread-safe.* You must ensure that these functions are called exactly
|
---|
240 | once (for each scanner type) in a threaded program, before any thread
|
---|
241 | calls `yylex'. After the tables are loaded, they are never written to,
|
---|
242 | and no thread protection is required thereafter - until you destroy
|
---|
243 | them.
|
---|
244 |
|
---|
245 |
|
---|
246 | File: flex.info, Node: Tables File Format, Prev: Loading and Unloading Serialized Tables, Up: Serialized Tables
|
---|
247 |
|
---|
248 | Tables File Format
|
---|
249 | ==================
|
---|
250 |
|
---|
251 | This section defines the file format of serialized `flex' tables.
|
---|
252 |
|
---|
253 | The tables format allows for one or more sets of tables to be
|
---|
254 | specified, where each set corresponds to a given scanner. Scanners are
|
---|
255 | indexed by name, as described below. The file format is as follows:
|
---|
256 |
|
---|
257 |
|
---|
258 | TABLE SET 1
|
---|
259 | +-------------------------------+
|
---|
260 | Header | uint32 th_magic; |
|
---|
261 | | uint32 th_hsize; |
|
---|
262 | | uint32 th_ssize; |
|
---|
263 | | uint16 th_flags; |
|
---|
264 | | char th_version[]; |
|
---|
265 | | char th_name[]; |
|
---|
266 | | uint8 th_pad64[]; |
|
---|
267 | +-------------------------------+
|
---|
268 | Table 1 | uint16 td_id; |
|
---|
269 | | uint16 td_flags; |
|
---|
270 | | uint32 td_lolen; |
|
---|
271 | | uint32 td_hilen; |
|
---|
272 | | void td_data[]; |
|
---|
273 | | uint8 td_pad64[]; |
|
---|
274 | +-------------------------------+
|
---|
275 | Table 2 | |
|
---|
276 | . . .
|
---|
277 | . . .
|
---|
278 | . . .
|
---|
279 | . . .
|
---|
280 | Table n | |
|
---|
281 | +-------------------------------+
|
---|
282 | TABLE SET 2
|
---|
283 | .
|
---|
284 | .
|
---|
285 | .
|
---|
286 | TABLE SET N
|
---|
287 |
|
---|
288 | The above diagram shows that a complete set of tables consists of a
|
---|
289 | header followed by multiple individual tables. Furthermore, multiple
|
---|
290 | complete sets may be present in the same file, each set with its own
|
---|
291 | header and tables. The sets are contiguous in the file. The only way to
|
---|
292 | know if another set follows is to check the next four bytes for the
|
---|
293 | magic number (or check for EOF). The header and tables sections are
|
---|
294 | padded to 64-bit boundaries. Below we describe each field in detail.
|
---|
295 | This format does not specify how the scanner will expand the given
|
---|
296 | data, i.e., data may be serialized as int8, but expanded to an int32
|
---|
297 | array at runtime. This is to reduce the size of the serialized data
|
---|
298 | where possible. Remember, _all integer values are in network byte
|
---|
299 | order_.
|
---|
300 |
|
---|
301 | Fields of a table header:
|
---|
302 |
|
---|
303 | `th_magic'
|
---|
304 | Magic number, always 0xF13C57B1.
|
---|
305 |
|
---|
306 | `th_hsize'
|
---|
307 | Size of this entire header, in bytes, including all fields plus
|
---|
308 | any padding.
|
---|
309 |
|
---|
310 | `th_ssize'
|
---|
311 | Size of this entire set, in bytes, including the header, all
|
---|
312 | tables, plus any padding.
|
---|
313 |
|
---|
314 | `th_flags'
|
---|
315 | Bit flags for this table set. Currently unused.
|
---|
316 |
|
---|
317 | `th_version[]'
|
---|
318 | Flex version in NULL-termninated string format. e.g., `2.5.13a'.
|
---|
319 | This is the version of flex that was used to create the serialized
|
---|
320 | tables.
|
---|
321 |
|
---|
322 | `th_name[]'
|
---|
323 | Contains the name of this table set. The default is `yytables',
|
---|
324 | and is prefixed accordingly, e.g., `footables'. Must be
|
---|
325 | NULL-terminated.
|
---|
326 |
|
---|
327 | `th_pad64[]'
|
---|
328 | Zero or more NULL bytes, padding the entire header to the next
|
---|
329 | 64-bit boundary as calculated from the beginning of the header.
|
---|
330 |
|
---|
331 | Fields of a table:
|
---|
332 |
|
---|
333 | `td_id'
|
---|
334 | Specifies the table identifier. Possible values are:
|
---|
335 | `YYTD_ID_ACCEPT (0x01)'
|
---|
336 | `yy_accept'
|
---|
337 |
|
---|
338 | `YYTD_ID_BASE (0x02)'
|
---|
339 | `yy_base'
|
---|
340 |
|
---|
341 | `YYTD_ID_CHK (0x03)'
|
---|
342 | `yy_chk'
|
---|
343 |
|
---|
344 | `YYTD_ID_DEF (0x04)'
|
---|
345 | `yy_def'
|
---|
346 |
|
---|
347 | `YYTD_ID_EC (0x05)'
|
---|
348 | `yy_ec '
|
---|
349 |
|
---|
350 | `YYTD_ID_META (0x06)'
|
---|
351 | `yy_meta'
|
---|
352 |
|
---|
353 | `YYTD_ID_NUL_TRANS (0x07)'
|
---|
354 | `yy_NUL_trans'
|
---|
355 |
|
---|
356 | `YYTD_ID_NXT (0x08)'
|
---|
357 | `yy_nxt'. This array may be two dimensional. See the
|
---|
358 | `td_hilen' field below.
|
---|
359 |
|
---|
360 | `YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
|
---|
361 | `yy_rule_can_match_eol'
|
---|
362 |
|
---|
363 | `YYTD_ID_START_STATE_LIST (0x0A)'
|
---|
364 | `yy_start_state_list'. This array is handled specially
|
---|
365 | because it is an array of pointers to structs. See the
|
---|
366 | `td_flags' field below.
|
---|
367 |
|
---|
368 | `YYTD_ID_TRANSITION (0x0B)'
|
---|
369 | `yy_transition'. This array is handled specially because it
|
---|
370 | is an array of structs. See the `td_lolen' field below.
|
---|
371 |
|
---|
372 | `YYTD_ID_ACCLIST (0x0C)'
|
---|
373 | `yy_acclist'
|
---|
374 |
|
---|
375 | `td_flags'
|
---|
376 | Bit flags describing how to interpret the data in `td_data'. The
|
---|
377 | data arrays are one-dimensional by default, but may be two
|
---|
378 | dimensional as specified in the `td_hilen' field.
|
---|
379 |
|
---|
380 | `YYTD_DATA8 (0x01)'
|
---|
381 | The data is serialized as an array of type int8.
|
---|
382 |
|
---|
383 | `YYTD_DATA16 (0x02)'
|
---|
384 | The data is serialized as an array of type int16.
|
---|
385 |
|
---|
386 | `YYTD_DATA32 (0x04)'
|
---|
387 | The data is serialized as an array of type int32.
|
---|
388 |
|
---|
389 | `YYTD_PTRANS (0x08)'
|
---|
390 | The data is a list of indexes of entries in the expanded
|
---|
391 | `yy_transition' array. Each index should be expanded to a
|
---|
392 | pointer to the corresponding entry in the `yy_transition'
|
---|
393 | array. We count on the fact that the `yy_transition' array
|
---|
394 | has already been seen.
|
---|
395 |
|
---|
396 | `YYTD_STRUCT (0x10)'
|
---|
397 | The data is a list of yy_trans_info structs, each of which
|
---|
398 | consists of two integers. There is no padding between struct
|
---|
399 | elements or between structs. The type of each member is
|
---|
400 | determined by the `YYTD_DATA*' bits.
|
---|
401 |
|
---|
402 | `td_lolen'
|
---|
403 | Specifies the number of elements in the lowest dimension array. If
|
---|
404 | this is a one-dimensional array, then it is simply the number of
|
---|
405 | elements in this array. The element size is determined by the
|
---|
406 | `td_flags' field.
|
---|
407 |
|
---|
408 | `td_hilen'
|
---|
409 | If `td_hilen' is non-zero, then the data is a two-dimensional
|
---|
410 | array. Otherwise, the data is a one-dimensional array. `td_hilen'
|
---|
411 | contains the number of elements in the higher dimensional array,
|
---|
412 | and `td_lolen' contains the number of elements in the lowest
|
---|
413 | dimension.
|
---|
414 |
|
---|
415 | Conceptually, `td_data' is either `sometype td_data[td_lolen]', or
|
---|
416 | `sometype td_data[td_hilen][td_lolen]', where `sometype' is
|
---|
417 | specified by the `td_flags' field. It is possible for both
|
---|
418 | `td_lolen' and `td_hilen' to be zero, in which case `td_data' is a
|
---|
419 | zero length array, and no data is loaded, i.e., this table is
|
---|
420 | simply skipped. Flex does not currently generate tables of zero
|
---|
421 | length.
|
---|
422 |
|
---|
423 | `td_data[]'
|
---|
424 | The table data. This array may be a one- or two-dimensional array,
|
---|
425 | of type `int8', `int16', `int32', `struct yy_trans_info', or
|
---|
426 | `struct yy_trans_info*', depending upon the values in the
|
---|
427 | `td_flags', `td_lolen', and `td_hilen' fields.
|
---|
428 |
|
---|
429 | `td_pad64[]'
|
---|
430 | Zero or more NULL bytes, padding the entire table to the next
|
---|
431 | 64-bit boundary as calculated from the beginning of this table.
|
---|
432 |
|
---|
433 |
|
---|
434 | File: flex.info, Node: Diagnostics, Next: Limitations, Prev: Serialized Tables, Up: Top
|
---|
435 |
|
---|
436 | Diagnostics
|
---|
437 | ***********
|
---|
438 |
|
---|
439 | The following is a list of `flex' diagnostic messages:
|
---|
440 |
|
---|
441 | * `warning, rule cannot be matched' indicates that the given rule
|
---|
442 | cannot be matched because it follows other rules that will always
|
---|
443 | match the same text as it. For example, in the following `foo'
|
---|
444 | cannot be matched because it comes after an identifier "catch-all"
|
---|
445 | rule:
|
---|
446 |
|
---|
447 |
|
---|
448 | [a-z]+ got_identifier();
|
---|
449 | foo got_foo();
|
---|
450 |
|
---|
451 | Using `REJECT' in a scanner suppresses this warning.
|
---|
452 |
|
---|
453 | * `warning, -s option given but default rule can be matched' means
|
---|
454 | that it is possible (perhaps only in a particular start condition)
|
---|
455 | that the default rule (match any single character) is the only one
|
---|
456 | that will match a particular input. Since `-s' was given,
|
---|
457 | presumably this is not intended.
|
---|
458 |
|
---|
459 | * `reject_used_but_not_detected undefined' or
|
---|
460 | `yymore_used_but_not_detected undefined'. These errors can occur
|
---|
461 | at compile time. They indicate that the scanner uses `REJECT' or
|
---|
462 | `yymore()' but that `flex' failed to notice the fact, meaning that
|
---|
463 | `flex' scanned the first two sections looking for occurrences of
|
---|
464 | these actions and failed to find any, but somehow you snuck some in
|
---|
465 | (via a #include file, for example). Use `%option reject' or
|
---|
466 | `%option yymore' to indicate to `flex' that you really do use
|
---|
467 | these features.
|
---|
468 |
|
---|
469 | * `flex scanner jammed'. a scanner compiled with `-s' has
|
---|
470 | encountered an input string which wasn't matched by any of its
|
---|
471 | rules. This error can also occur due to internal problems.
|
---|
472 |
|
---|
473 | * `token too large, exceeds YYLMAX'. your scanner uses `%array' and
|
---|
474 | one of its rules matched a string longer than the `YYLMAX'
|
---|
475 | constant (8K bytes by default). You can increase the value by
|
---|
476 | #define'ing `YYLMAX' in the definitions section of your `flex'
|
---|
477 | input.
|
---|
478 |
|
---|
479 | * `scanner requires -8 flag to use the character 'x''. Your scanner
|
---|
480 | specification includes recognizing the 8-bit character `'x'' and
|
---|
481 | you did not specify the -8 flag, and your scanner defaulted to
|
---|
482 | 7-bit because you used the `-Cf' or `-CF' table compression
|
---|
483 | options. See the discussion of the `-7' flag, *Note Scanner
|
---|
484 | Options::, for details.
|
---|
485 |
|
---|
486 | * `flex scanner push-back overflow'. you used `unput()' to push back
|
---|
487 | so much text that the scanner's buffer could not hold both the
|
---|
488 | pushed-back text and the current token in `yytext'. Ideally the
|
---|
489 | scanner should dynamically resize the buffer in this case, but at
|
---|
490 | present it does not.
|
---|
491 |
|
---|
492 | * `input buffer overflow, can't enlarge buffer because scanner uses
|
---|
493 | REJECT'. the scanner was working on matching an extremely large
|
---|
494 | token and needed to expand the input buffer. This doesn't work
|
---|
495 | with scanners that use `REJECT'.
|
---|
496 |
|
---|
497 | * `fatal flex scanner internal error--end of buffer missed'. This can
|
---|
498 | occur in a scanner which is reentered after a long-jump has jumped
|
---|
499 | out (or over) the scanner's activation frame. Before reentering
|
---|
500 | the scanner, use:
|
---|
501 |
|
---|
502 | yyrestart( yyin );
|
---|
503 | or, as noted above, switch to using the C++ scanner class.
|
---|
504 |
|
---|
505 | * `too many start conditions in <> construct!' you listed more start
|
---|
506 | conditions in a <> construct than exist (so you must have listed at
|
---|
507 | least one of them twice).
|
---|
508 |
|
---|
509 |
|
---|
510 | File: flex.info, Node: Limitations, Next: Bibliography, Prev: Diagnostics, Up: Top
|
---|
511 |
|
---|
512 | Limitations
|
---|
513 | ***********
|
---|
514 |
|
---|
515 | Some trailing context patterns cannot be properly matched and
|
---|
516 | generate warning messages (`dangerous trailing context'). These are
|
---|
517 | patterns where the ending of the first part of the rule matches the
|
---|
518 | beginning of the second part, such as `zx*/xy*', where the 'x*' matches
|
---|
519 | the 'x' at the beginning of the trailing context. (Note that the POSIX
|
---|
520 | draft states that the text matched by such patterns is undefined.) For
|
---|
521 | some trailing context rules, parts which are actually fixed-length are
|
---|
522 | not recognized as such, leading to the abovementioned performance loss.
|
---|
523 | In particular, parts using `|' or `{n}' (such as `foo{3}') are always
|
---|
524 | considered variable-length. Combining trailing context with the
|
---|
525 | special `|' action can result in _fixed_ trailing context being turned
|
---|
526 | into the more expensive _variable_ trailing context. For example, in
|
---|
527 | the following:
|
---|
528 |
|
---|
529 |
|
---|
530 | %%
|
---|
531 | abc |
|
---|
532 | xyz/def
|
---|
533 |
|
---|
534 | Use of `unput()' invalidates yytext and yyleng, unless the `%array'
|
---|
535 | directive or the `-l' option has been used. Pattern-matching of `NUL's
|
---|
536 | is substantially slower than matching other characters. Dynamic
|
---|
537 | resizing of the input buffer is slow, as it entails rescanning all the
|
---|
538 | text matched so far by the current (generally huge) token. Due to both
|
---|
539 | buffering of input and read-ahead, you cannot intermix calls to
|
---|
540 | `<stdio.h>' routines, such as, getchar(), with `flex' rules and expect
|
---|
541 | it to work. Call `input()' instead. The total table entries listed by
|
---|
542 | the `-v' flag excludes the number of table entries needed to determine
|
---|
543 | what rule has been matched. The number of entries is equal to the
|
---|
544 | number of DFA states if the scanner does not use `REJECT', and somewhat
|
---|
545 | greater than the number of states if it does. `REJECT' cannot be used
|
---|
546 | with the `-f' or `-F' options.
|
---|
547 |
|
---|
548 | The `flex' internal algorithms need documentation.
|
---|
549 |
|
---|
550 |
|
---|
551 | File: flex.info, Node: Bibliography, Next: FAQ, Prev: Limitations, Up: Top
|
---|
552 |
|
---|
553 | Additional Reading
|
---|
554 | ******************
|
---|
555 |
|
---|
556 | You may wish to read more about the following programs:
|
---|
557 | * lex
|
---|
558 |
|
---|
559 | * yacc
|
---|
560 |
|
---|
561 | * sed
|
---|
562 |
|
---|
563 | * awk
|
---|
564 |
|
---|
565 | The following books may contain material of interest:
|
---|
566 |
|
---|
567 | John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
|
---|
568 | Associates. Be sure to get the 2nd edition.
|
---|
569 |
|
---|
570 | M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_
|
---|
571 |
|
---|
572 | Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
|
---|
573 | Techniques and Tools_, Addison-Wesley (1986). Describes the
|
---|
574 | pattern-matching techniques used by `flex' (deterministic finite
|
---|
575 | automata).
|
---|
576 |
|
---|
577 |
|
---|
578 | File: flex.info, Node: FAQ, Next: Appendices, Prev: Bibliography, Up: Top
|
---|
579 |
|
---|
580 | FAQ
|
---|
581 | ***
|
---|
582 |
|
---|
583 | From time to time, the `flex' maintainer receives certain questions.
|
---|
584 | Rather than repeat answers to well-understood problems, we publish them
|
---|
585 | here.
|
---|
586 |
|
---|
587 | * Menu:
|
---|
588 |
|
---|
589 | * When was flex born?::
|
---|
590 | * How do I expand \ escape sequences in C-style quoted strings?::
|
---|
591 | * Why do flex scanners call fileno if it is not ANSI compatible?::
|
---|
592 | * Does flex support recursive pattern definitions?::
|
---|
593 | * How do I skip huge chunks of input (tens of megabytes) while using flex?::
|
---|
594 | * Flex is not matching my patterns in the same order that I defined them.::
|
---|
595 | * My actions are executing out of order or sometimes not at all.::
|
---|
596 | * How can I have multiple input sources feed into the same scanner at the same time?::
|
---|
597 | * Can I build nested parsers that work with the same input file?::
|
---|
598 | * How can I match text only at the end of a file?::
|
---|
599 | * How can I make REJECT cascade across start condition boundaries?::
|
---|
600 | * Why cant I use fast or full tables with interactive mode?::
|
---|
601 | * How much faster is -F or -f than -C?::
|
---|
602 | * If I have a simple grammar cant I just parse it with flex?::
|
---|
603 | * Why doesnt yyrestart() set the start state back to INITIAL?::
|
---|
604 | * How can I match C-style comments?::
|
---|
605 | * The period isnt working the way I expected.::
|
---|
606 | * Can I get the flex manual in another format?::
|
---|
607 | * Does there exist a "faster" NDFA->DFA algorithm?::
|
---|
608 | * How does flex compile the DFA so quickly?::
|
---|
609 | * How can I use more than 8192 rules?::
|
---|
610 | * How do I abandon a file in the middle of a scan and switch to a new file?::
|
---|
611 | * How do I execute code only during initialization (only before the first scan)?::
|
---|
612 | * How do I execute code at termination?::
|
---|
613 | * Where else can I find help?::
|
---|
614 | * Can I include comments in the "rules" section of the file?::
|
---|
615 | * I get an error about undefined yywrap().::
|
---|
616 | * How can I change the matching pattern at run time?::
|
---|
617 | * How can I expand macros in the input?::
|
---|
618 | * How can I build a two-pass scanner?::
|
---|
619 | * How do I match any string not matched in the preceding rules?::
|
---|
620 | * I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
|
---|
621 | * Is there a way to make flex treat NULL like a regular character?::
|
---|
622 | * Whenever flex can not match the input it says "flex scanner jammed".::
|
---|
623 | * Why doesnt flex have non-greedy operators like perl does?::
|
---|
624 | * Memory leak - 16386 bytes allocated by malloc.::
|
---|
625 | * How do I track the byte offset for lseek()?::
|
---|
626 | * How do I use my own I/O classes in a C++ scanner?::
|
---|
627 | * How do I skip as many chars as possible?::
|
---|
628 | * deleteme00::
|
---|
629 | * Are certain equivalent patterns faster than others?::
|
---|
630 | * Is backing up a big deal?::
|
---|
631 | * Can I fake multi-byte character support?::
|
---|
632 | * deleteme01::
|
---|
633 | * Can you discuss some flex internals?::
|
---|
634 | * unput() messes up yy_at_bol::
|
---|
635 | * The | operator is not doing what I want::
|
---|
636 | * Why can't flex understand this variable trailing context pattern?::
|
---|
637 | * The ^ operator isn't working::
|
---|
638 | * Trailing context is getting confused with trailing optional patterns::
|
---|
639 | * Is flex GNU or not?::
|
---|
640 | * ERASEME53::
|
---|
641 | * I need to scan if-then-else blocks and while loops::
|
---|
642 | * ERASEME55::
|
---|
643 | * ERASEME56::
|
---|
644 | * ERASEME57::
|
---|
645 | * Is there a repository for flex scanners?::
|
---|
646 | * How can I conditionally compile or preprocess my flex input file?::
|
---|
647 | * Where can I find grammars for lex and yacc?::
|
---|
648 | * I get an end-of-buffer message for each character scanned.::
|
---|
649 | * unnamed-faq-62::
|
---|
650 | * unnamed-faq-63::
|
---|
651 | * unnamed-faq-64::
|
---|
652 | * unnamed-faq-65::
|
---|
653 | * unnamed-faq-66::
|
---|
654 | * unnamed-faq-67::
|
---|
655 | * unnamed-faq-68::
|
---|
656 | * unnamed-faq-69::
|
---|
657 | * unnamed-faq-70::
|
---|
658 | * unnamed-faq-71::
|
---|
659 | * unnamed-faq-72::
|
---|
660 | * unnamed-faq-73::
|
---|
661 | * unnamed-faq-74::
|
---|
662 | * unnamed-faq-75::
|
---|
663 | * unnamed-faq-76::
|
---|
664 | * unnamed-faq-77::
|
---|
665 | * unnamed-faq-78::
|
---|
666 | * unnamed-faq-79::
|
---|
667 | * unnamed-faq-80::
|
---|
668 | * unnamed-faq-81::
|
---|
669 | * unnamed-faq-82::
|
---|
670 | * unnamed-faq-83::
|
---|
671 | * unnamed-faq-84::
|
---|
672 | * unnamed-faq-85::
|
---|
673 | * unnamed-faq-86::
|
---|
674 | * unnamed-faq-87::
|
---|
675 | * unnamed-faq-88::
|
---|
676 | * unnamed-faq-90::
|
---|
677 | * unnamed-faq-91::
|
---|
678 | * unnamed-faq-92::
|
---|
679 | * unnamed-faq-93::
|
---|
680 | * unnamed-faq-94::
|
---|
681 | * unnamed-faq-95::
|
---|
682 | * unnamed-faq-96::
|
---|
683 | * unnamed-faq-97::
|
---|
684 | * unnamed-faq-98::
|
---|
685 | * unnamed-faq-99::
|
---|
686 | * unnamed-faq-100::
|
---|
687 | * unnamed-faq-101::
|
---|
688 | * What is the difference between YYLEX_PARAM and YY_DECL?::
|
---|
689 | * Why do I get "conflicting types for yylex" error?::
|
---|
690 | * How do I access the values set in a Flex action from within a Bison action?::
|
---|
691 |
|
---|
692 |
|
---|
693 | File: flex.info, Node: When was flex born?, Next: How do I expand \ escape sequences in C-style quoted strings?, Up: FAQ
|
---|
694 |
|
---|
695 | When was flex born?
|
---|
696 | ===================
|
---|
697 |
|
---|
698 | Vern Paxson took over the `Software Tools' lex project from Jef
|
---|
699 | Poskanzer in 1982. At that point it was written in Ratfor. Around
|
---|
700 | 1987 or so, Paxson translated it into C, and a legend was born :-).
|
---|
701 |
|
---|
702 |
|
---|
703 | File: flex.info, Node: How do I expand \ escape sequences in C-style quoted strings?, Next: Why do flex scanners call fileno if it is not ANSI compatible?, Prev: When was flex born?, Up: FAQ
|
---|
704 |
|
---|
705 | How do I expand \ escape sequences in C-style quoted strings?
|
---|
706 | =============================================================
|
---|
707 |
|
---|
708 | A key point when scanning quoted strings is that you cannot (easily)
|
---|
709 | write a single rule that will precisely match the string if you allow
|
---|
710 | things like embedded escape sequences and newlines. If you try to
|
---|
711 | match strings with a single rule then you'll wind up having to rescan
|
---|
712 | the string anyway to find any escape sequences.
|
---|
713 |
|
---|
714 | Instead you can use exclusive start conditions and a set of rules,
|
---|
715 | one for matching non-escaped text, one for matching a single escape,
|
---|
716 | one for matching an embedded newline, and one for recognizing the end
|
---|
717 | of the string. Each of these rules is then faced with the question of
|
---|
718 | where to put its intermediary results. The best solution is for the
|
---|
719 | rules to append their local value of `yytext' to the end of a "string
|
---|
720 | literal" buffer. A rule like the escape-matcher will append to the
|
---|
721 | buffer the meaning of the escape sequence rather than the literal text
|
---|
722 | in `yytext'. In this way, `yytext' does not need to be modified at all.
|
---|
723 |
|
---|
724 |
|
---|
725 | File: flex.info, Node: Why do flex scanners call fileno if it is not ANSI compatible?, Next: Does flex support recursive pattern definitions?, Prev: How do I expand \ escape sequences in C-style quoted strings?, Up: FAQ
|
---|
726 |
|
---|
727 | Why do flex scanners call fileno if it is not ANSI compatible?
|
---|
728 | ==============================================================
|
---|
729 |
|
---|
730 | Flex scanners call `fileno()' in order to get the file descriptor
|
---|
731 | corresponding to `yyin'. The file descriptor may be passed to
|
---|
732 | `isatty()' or `read()', depending upon which `%options' you specified.
|
---|
733 | If your system does not have `fileno()' support, to get rid of the
|
---|
734 | `read()' call, do not specify `%option read'. To get rid of the
|
---|
735 | `isatty()' call, you must specify one of `%option always-interactive' or
|
---|
736 | `%option never-interactive'.
|
---|
737 |
|
---|
738 |
|
---|
739 | File: flex.info, Node: Does flex support recursive pattern definitions?, Next: How do I skip huge chunks of input (tens of megabytes) while using flex?, Prev: Why do flex scanners call fileno if it is not ANSI compatible?, Up: FAQ
|
---|
740 |
|
---|
741 | Does flex support recursive pattern definitions?
|
---|
742 | ================================================
|
---|
743 |
|
---|
744 | e.g.,
|
---|
745 |
|
---|
746 |
|
---|
747 | %%
|
---|
748 | block "{"({block}|{statement})*"}"
|
---|
749 |
|
---|
750 | No. You cannot have recursive definitions. The pattern-matching
|
---|
751 | power of regular expressions in general (and therefore flex scanners,
|
---|
752 | too) is limited. In particular, regular expressions cannot "balance"
|
---|
753 | parentheses to an arbitrary degree. For example, it's impossible to
|
---|
754 | write a regular expression that matches all strings containing the same
|
---|
755 | number of '{'s as '}'s. For more powerful pattern matching, you need a
|
---|
756 | parser, such as `GNU bison'.
|
---|
757 |
|
---|
758 |
|
---|
759 | File: flex.info, Node: How do I skip huge chunks of input (tens of megabytes) while using flex?, Next: Flex is not matching my patterns in the same order that I defined them., Prev: Does flex support recursive pattern definitions?, Up: FAQ
|
---|
760 |
|
---|
761 | How do I skip huge chunks of input (tens of megabytes) while using flex?
|
---|
762 | ========================================================================
|
---|
763 |
|
---|
764 | Use `fseek()' (or `lseek()') to position yyin, then call
|
---|
765 | `yyrestart()'.
|
---|
766 |
|
---|
767 |
|
---|
768 | File: flex.info, Node: Flex is not matching my patterns in the same order that I defined them., Next: My actions are executing out of order or sometimes not at all., Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?, Up: FAQ
|
---|
769 |
|
---|
770 | Flex is not matching my patterns in the same order that I defined them.
|
---|
771 | =======================================================================
|
---|
772 |
|
---|
773 | `flex' picks the rule that matches the most text (i.e., the longest
|
---|
774 | possible input string). This is because `flex' uses an entirely
|
---|
775 | different matching technique ("deterministic finite automata") that
|
---|
776 | actually does all of the matching simultaneously, in parallel. (Seems
|
---|
777 | impossible, but it's actually a fairly simple technique once you
|
---|
778 | understand the principles.)
|
---|
779 |
|
---|
780 | A side-effect of this parallel matching is that when the input
|
---|
781 | matches more than one rule, `flex' scanners pick the rule that matched
|
---|
782 | the _most_ text. This is explained further in the manual, in the
|
---|
783 | section *Note Matching::.
|
---|
784 |
|
---|
785 | If you want `flex' to choose a shorter match, then you can work
|
---|
786 | around this behavior by expanding your short rule to match more text,
|
---|
787 | then put back the extra:
|
---|
788 |
|
---|
789 |
|
---|
790 | data_.* yyless( 5 ); BEGIN BLOCKIDSTATE;
|
---|
791 |
|
---|
792 | Another fix would be to make the second rule active only during the
|
---|
793 | `<BLOCKIDSTATE>' start condition, and make that start condition
|
---|
794 | exclusive by declaring it with `%x' instead of `%s'.
|
---|
795 |
|
---|
796 | A final fix is to change the input language so that the ambiguity for
|
---|
797 | `data_' is removed, by adding characters to it that don't match the
|
---|
798 | identifier rule, or by removing characters (such as `_') from the
|
---|
799 | identifier rule so it no longer matches `data_'. (Of course, you might
|
---|
800 | also not have the option of changing the input language.)
|
---|
801 |
|
---|
802 |
|
---|
803 | File: flex.info, Node: My actions are executing out of order or sometimes not at all., Next: How can I have multiple input sources feed into the same scanner at the same time?, Prev: Flex is not matching my patterns in the same order that I defined them., Up: FAQ
|
---|
804 |
|
---|
805 | My actions are executing out of order or sometimes not at all.
|
---|
806 | ==============================================================
|
---|
807 |
|
---|
808 | Most likely, you have (in error) placed the opening `{' of the action
|
---|
809 | block on a different line than the rule, e.g.,
|
---|
810 |
|
---|
811 |
|
---|
812 | ^(foo|bar)
|
---|
813 | { <<<--- WRONG!
|
---|
814 |
|
---|
815 | }
|
---|
816 |
|
---|
817 | `flex' requires that the opening `{' of an action associated with a
|
---|
818 | rule begin on the same line as does the rule. You need instead to
|
---|
819 | write your rules as follows:
|
---|
820 |
|
---|
821 |
|
---|
822 | ^(foo|bar) { // CORRECT!
|
---|
823 |
|
---|
824 | }
|
---|
825 |
|
---|
826 |
|
---|
827 | File: flex.info, Node: How can I have multiple input sources feed into the same scanner at the same time?, Next: Can I build nested parsers that work with the same input file?, Prev: My actions are executing out of order or sometimes not at all., Up: FAQ
|
---|
828 |
|
---|
829 | How can I have multiple input sources feed into the same scanner at the same time?
|
---|
830 | ==================================================================================
|
---|
831 |
|
---|
832 | If ...
|
---|
833 | * your scanner is free of backtracking (verified using `flex''s `-b'
|
---|
834 | flag),
|
---|
835 |
|
---|
836 | * AND you run your scanner interactively (`-I' option; default
|
---|
837 | unless using special table compression options),
|
---|
838 |
|
---|
839 | * AND you feed it one character at a time by redefining `YY_INPUT'
|
---|
840 | to do so,
|
---|
841 |
|
---|
842 | then every time it matches a token, it will have exhausted its input
|
---|
843 | buffer (because the scanner is free of backtracking). This means you
|
---|
844 | can safely use `select()' at the point and only call `yylex()' for
|
---|
845 | another token if `select()' indicates there's data available.
|
---|
846 |
|
---|
847 | That is, move the `select()' out from the input function to a point
|
---|
848 | where it determines whether `yylex()' gets called for the next token.
|
---|
849 |
|
---|
850 | With this approach, you will still have problems if your input can
|
---|
851 | arrive piecemeal; `select()' could inform you that the beginning of a
|
---|
852 | token is available, you call `yylex()' to get it, but it winds up
|
---|
853 | blocking waiting for the later characters in the token.
|
---|
854 |
|
---|
855 | Here's another way: Move your input multiplexing inside of
|
---|
856 | `YY_INPUT'. That is, whenever `YY_INPUT' is called, it `select()''s to
|
---|
857 | see where input is available. If input is available for the scanner,
|
---|
858 | it reads and returns the next byte. If input is available from another
|
---|
859 | source, it calls whatever function is responsible for reading from that
|
---|
860 | source. (If no input is available, it blocks until some input is
|
---|
861 | available.) I've used this technique in an interpreter I wrote that
|
---|
862 | both reads keyboard input using a `flex' scanner and IPC traffic from
|
---|
863 | sockets, and it works fine.
|
---|
864 |
|
---|
865 |
|
---|
866 | File: flex.info, Node: Can I build nested parsers that work with the same input file?, Next: How can I match text only at the end of a file?, Prev: How can I have multiple input sources feed into the same scanner at the same time?, Up: FAQ
|
---|
867 |
|
---|
868 | Can I build nested parsers that work with the same input file?
|
---|
869 | ==============================================================
|
---|
870 |
|
---|
871 | This is not going to work without some additional effort. The
|
---|
872 | reason is that `flex' block-buffers the input it reads from `yyin'.
|
---|
873 | This means that the "outermost" `yylex()', when called, will
|
---|
874 | automatically slurp up the first 8K of input available on yyin, and
|
---|
875 | subsequent calls to other `yylex()''s won't see that input. You might
|
---|
876 | be tempted to work around this problem by redefining `YY_INPUT' to only
|
---|
877 | return a small amount of text, but it turns out that that approach is
|
---|
878 | quite difficult. Instead, the best solution is to combine all of your
|
---|
879 | scanners into one large scanner, using a different exclusive start
|
---|
880 | condition for each.
|
---|
881 |
|
---|
882 |
|
---|
883 | File: flex.info, Node: How can I match text only at the end of a file?, Next: How can I make REJECT cascade across start condition boundaries?, Prev: Can I build nested parsers that work with the same input file?, Up: FAQ
|
---|
884 |
|
---|
885 | How can I match text only at the end of a file?
|
---|
886 | ===============================================
|
---|
887 |
|
---|
888 | There is no way to write a rule which is "match this text, but only
|
---|
889 | if it comes at the end of the file". You can fake it, though, if you
|
---|
890 | happen to have a character lying around that you don't allow in your
|
---|
891 | input. Then you redefine `YY_INPUT' to call your own routine which, if
|
---|
892 | it sees an `EOF', returns the magic character first (and remembers to
|
---|
893 | return a real `EOF' next time it's called). Then you could write:
|
---|
894 |
|
---|
895 |
|
---|
896 | <COMMENT>(.|\n)*{EOF_CHAR} /* saw comment at EOF */
|
---|
897 |
|
---|
898 |
|
---|
899 | File: flex.info, Node: How can I make REJECT cascade across start condition boundaries?, Next: Why cant I use fast or full tables with interactive mode?, Prev: How can I match text only at the end of a file?, Up: FAQ
|
---|
900 |
|
---|
901 | How can I make REJECT cascade across start condition boundaries?
|
---|
902 | ================================================================
|
---|
903 |
|
---|
904 | You can do this as follows. Suppose you have a start condition `A',
|
---|
905 | and after exhausting all of the possible matches in `<A>', you want to
|
---|
906 | try matches in `<INITIAL>'. Then you could use the following:
|
---|
907 |
|
---|
908 |
|
---|
909 | %x A
|
---|
910 | %%
|
---|
911 | <A>rule_that_is_long ...; REJECT;
|
---|
912 | <A>rule ...; REJECT; /* shorter rule */
|
---|
913 | <A>etc.
|
---|
914 | ...
|
---|
915 | <A>.|\n {
|
---|
916 | /* Shortest and last rule in <A>, so
|
---|
917 | * cascaded REJECT's will eventually
|
---|
918 | * wind up matching this rule. We want
|
---|
919 | * to now switch to the initial state
|
---|
920 | * and try matching from there instead.
|
---|
921 | */
|
---|
922 | yyless(0); /* put back matched text */
|
---|
923 | BEGIN(INITIAL);
|
---|
924 | }
|
---|
925 |
|
---|
926 |
|
---|
927 | File: flex.info, Node: Why cant I use fast or full tables with interactive mode?, Next: How much faster is -F or -f than -C?, Prev: How can I make REJECT cascade across start condition boundaries?, Up: FAQ
|
---|
928 |
|
---|
929 | Why can't I use fast or full tables with interactive mode?
|
---|
930 | ==========================================================
|
---|
931 |
|
---|
932 | One of the assumptions flex makes is that interactive applications
|
---|
933 | are inherently slow (they're waiting on a human after all). It has to
|
---|
934 | do with how the scanner detects that it must be finished scanning a
|
---|
935 | token. For interactive scanners, after scanning each character the
|
---|
936 | current state is looked up in a table (essentially) to see whether
|
---|
937 | there's a chance of another input character possibly extending the
|
---|
938 | length of the match. If not, the scanner halts. For non-interactive
|
---|
939 | scanners, the end-of-token test is much simpler, basically a compare
|
---|
940 | with 0, so no memory bus cycles. Since the test occurs in the
|
---|
941 | innermost scanning loop, one would like to make it go as fast as
|
---|
942 | possible.
|
---|
943 |
|
---|
944 | Still, it seems reasonable to allow the user to choose to trade off
|
---|
945 | a bit of performance in this area to gain the corresponding
|
---|
946 | flexibility. There might be another reason, though, why fast scanners
|
---|
947 | don't support the interactive option.
|
---|
948 |
|
---|
949 |
|
---|
950 | File: flex.info, Node: How much faster is -F or -f than -C?, Next: If I have a simple grammar cant I just parse it with flex?, Prev: Why cant I use fast or full tables with interactive mode?, Up: FAQ
|
---|
951 |
|
---|
952 | How much faster is -F or -f than -C?
|
---|
953 | ====================================
|
---|
954 |
|
---|
955 | Much faster (factor of 2-3).
|
---|
956 |
|
---|
957 |
|
---|
958 | File: flex.info, Node: If I have a simple grammar cant I just parse it with flex?, Next: Why doesnt yyrestart() set the start state back to INITIAL?, Prev: How much faster is -F or -f than -C?, Up: FAQ
|
---|
959 |
|
---|
960 | If I have a simple grammar can't I just parse it with flex?
|
---|
961 | ===========================================================
|
---|
962 |
|
---|
963 | Is your grammar recursive? That's almost always a sign that you're
|
---|
964 | better off using a parser/scanner rather than just trying to use a
|
---|
965 | scanner alone.
|
---|
966 |
|
---|
967 |
|
---|
968 | File: flex.info, Node: Why doesnt yyrestart() set the start state back to INITIAL?, Next: How can I match C-style comments?, Prev: If I have a simple grammar cant I just parse it with flex?, Up: FAQ
|
---|
969 |
|
---|
970 | Why doesn't yyrestart() set the start state back to INITIAL?
|
---|
971 | ============================================================
|
---|
972 |
|
---|
973 | There are two reasons. The first is that there might be programs
|
---|
974 | that rely on the start state not changing across file changes. The
|
---|
975 | second is that beginning with `flex' version 2.4, use of `yyrestart()'
|
---|
976 | is no longer required, so fixing the problem there doesn't solve the
|
---|
977 | more general problem.
|
---|
978 |
|
---|
979 |
|
---|
980 | File: flex.info, Node: How can I match C-style comments?, Next: The period isnt working the way I expected., Prev: Why doesnt yyrestart() set the start state back to INITIAL?, Up: FAQ
|
---|
981 |
|
---|
982 | How can I match C-style comments?
|
---|
983 | =================================
|
---|
984 |
|
---|
985 | You might be tempted to try something like this:
|
---|
986 |
|
---|
987 |
|
---|
988 | "/*".*"*/" // WRONG!
|
---|
989 |
|
---|
990 | or, worse, this:
|
---|
991 |
|
---|
992 |
|
---|
993 | "/*"(.|\n)"*/" // WRONG!
|
---|
994 |
|
---|
995 | The above rules will eat too much input, and blow up on things like:
|
---|
996 |
|
---|
997 |
|
---|
998 | /* a comment */ do_my_thing( "oops */" );
|
---|
999 |
|
---|
1000 | Here is one way which allows you to track line information:
|
---|
1001 |
|
---|
1002 |
|
---|
1003 | <INITIAL>{
|
---|
1004 | "/*" BEGIN(IN_COMMENT);
|
---|
1005 | }
|
---|
1006 | <IN_COMMENT>{
|
---|
1007 | "*/" BEGIN(INITIAL);
|
---|
1008 | [^*\n]+ // eat comment in chunks
|
---|
1009 | "*" // eat the lone star
|
---|
1010 | \n yylineno++;
|
---|
1011 | }
|
---|
1012 |
|
---|
1013 |
|
---|
1014 | File: flex.info, Node: The period isnt working the way I expected., Next: Can I get the flex manual in another format?, Prev: How can I match C-style comments?, Up: FAQ
|
---|
1015 |
|
---|
1016 | The '.' isn't working the way I expected.
|
---|
1017 | =========================================
|
---|
1018 |
|
---|
1019 | Here are some tips for using `.':
|
---|
1020 |
|
---|
1021 | * A common mistake is to place the grouping parenthesis AFTER an
|
---|
1022 | operator, when you really meant to place the parenthesis BEFORE
|
---|
1023 | the operator, e.g., you probably want this `(foo|bar)+' and NOT
|
---|
1024 | this `(foo|bar+)'.
|
---|
1025 |
|
---|
1026 | The first pattern matches the words `foo' or `bar' any number of
|
---|
1027 | times, e.g., it matches the text `barfoofoobarfoo'. The second
|
---|
1028 | pattern matches a single instance of `foo' or a single instance of
|
---|
1029 | `bar' followed by one or more `r's, e.g., it matches the text
|
---|
1030 | `barrrr' .
|
---|
1031 |
|
---|
1032 | * A `.' inside `[]''s just means a literal`.' (period), and NOT "any
|
---|
1033 | character except newline".
|
---|
1034 |
|
---|
1035 | * Remember that `.' matches any character EXCEPT `\n' (and `EOF').
|
---|
1036 | If you really want to match ANY character, including newlines,
|
---|
1037 | then use `(.|\n)' Beware that the regex `(.|\n)+' will match your
|
---|
1038 | entire input!
|
---|
1039 |
|
---|
1040 | * Finally, if you want to match a literal `.' (a period), then use
|
---|
1041 | `[.]' or `"."'
|
---|
1042 |
|
---|
1043 |
|
---|
1044 | File: flex.info, Node: Can I get the flex manual in another format?, Next: Does there exist a "faster" NDFA->DFA algorithm?, Prev: The period isnt working the way I expected., Up: FAQ
|
---|
1045 |
|
---|
1046 | Can I get the flex manual in another format?
|
---|
1047 | ============================================
|
---|
1048 |
|
---|
1049 | The `flex' source distribution includes a texinfo manual. You are
|
---|
1050 | free to convert that texinfo into whatever format you desire. The
|
---|
1051 | `texinfo' package includes tools for conversion to a number of formats.
|
---|
1052 |
|
---|
1053 |
|
---|
1054 | File: flex.info, Node: Does there exist a "faster" NDFA->DFA algorithm?, Next: How does flex compile the DFA so quickly?, Prev: Can I get the flex manual in another format?, Up: FAQ
|
---|
1055 |
|
---|
1056 | Does there exist a "faster" NDFA->DFA algorithm?
|
---|
1057 | ================================================
|
---|
1058 |
|
---|
1059 | There's no way around the potential exponential running time - it
|
---|
1060 | can take you exponential time just to enumerate all of the DFA states.
|
---|
1061 | In practice, though, the running time is closer to linear, or sometimes
|
---|
1062 | quadratic.
|
---|
1063 |
|
---|
1064 |
|
---|
1065 | File: flex.info, Node: How does flex compile the DFA so quickly?, Next: How can I use more than 8192 rules?, Prev: Does there exist a "faster" NDFA->DFA algorithm?, Up: FAQ
|
---|
1066 |
|
---|
1067 | How does flex compile the DFA so quickly?
|
---|
1068 | =========================================
|
---|
1069 |
|
---|
1070 | There are two big speed wins that `flex' uses:
|
---|
1071 |
|
---|
1072 | 1. It analyzes the input rules to construct equivalence classes for
|
---|
1073 | those characters that always make the same transitions. It then
|
---|
1074 | rewrites the NFA using equivalence classes for transitions instead
|
---|
1075 | of characters. This cuts down the NFA->DFA computation time
|
---|
1076 | dramatically, to the point where, for uncompressed DFA tables, the
|
---|
1077 | DFA generation is often I/O bound in writing out the tables.
|
---|
1078 |
|
---|
1079 | 2. It maintains hash values for previously computed DFA states, so
|
---|
1080 | testing whether a newly constructed DFA state is equivalent to a
|
---|
1081 | previously constructed state can be done very quickly, by first
|
---|
1082 | comparing hash values.
|
---|
1083 |
|
---|
1084 |
|
---|
1085 | File: flex.info, Node: How can I use more than 8192 rules?, Next: How do I abandon a file in the middle of a scan and switch to a new file?, Prev: How does flex compile the DFA so quickly?, Up: FAQ
|
---|
1086 |
|
---|
1087 | How can I use more than 8192 rules?
|
---|
1088 | ===================================
|
---|
1089 |
|
---|
1090 | `Flex' is compiled with an upper limit of 8192 rules per scanner.
|
---|
1091 | If you need more than 8192 rules in your scanner, you'll have to
|
---|
1092 | recompile `flex' with the following changes in `flexdef.h':
|
---|
1093 |
|
---|
1094 |
|
---|
1095 | < #define YY_TRAILING_MASK 0x2000
|
---|
1096 | < #define YY_TRAILING_HEAD_MASK 0x4000
|
---|
1097 | --
|
---|
1098 | > #define YY_TRAILING_MASK 0x20000000
|
---|
1099 | > #define YY_TRAILING_HEAD_MASK 0x40000000
|
---|
1100 |
|
---|
1101 | This should work okay as long as your C compiler uses 32 bit
|
---|
1102 | integers. But you might want to think about whether using such a huge
|
---|
1103 | number of rules is the best way to solve your problem.
|
---|
1104 |
|
---|
1105 | The following may also be relevant:
|
---|
1106 |
|
---|
1107 | With luck, you should be able to increase the definitions in
|
---|
1108 | flexdef.h for:
|
---|
1109 |
|
---|
1110 |
|
---|
1111 | #define JAMSTATE -32766 /* marks a reference to the state that always jams */
|
---|
1112 | #define MAXIMUM_MNS 31999
|
---|
1113 | #define BAD_SUBSCRIPT -32767
|
---|
1114 |
|
---|
1115 | recompile everything, and it'll all work. Flex only has these
|
---|
1116 | 16-bit-like values built into it because a long time ago it was
|
---|
1117 | developed on a machine with 16-bit ints. I've given this advice to
|
---|
1118 | others in the past but haven't heard back from them whether it worked
|
---|
1119 | okay or not...
|
---|
1120 |
|
---|
1121 |
|
---|
1122 | File: flex.info, Node: How do I abandon a file in the middle of a scan and switch to a new file?, Next: How do I execute code only during initialization (only before the first scan)?, Prev: How can I use more than 8192 rules?, Up: FAQ
|
---|
1123 |
|
---|
1124 | How do I abandon a file in the middle of a scan and switch to a new file?
|
---|
1125 | =========================================================================
|
---|
1126 |
|
---|
1127 | Just call `yyrestart(newfile)'. Be sure to reset the start state if
|
---|
1128 | you want a "fresh start, since `yyrestart' does NOT reset the start
|
---|
1129 | state back to `INITIAL'.
|
---|
1130 |
|
---|
1131 |
|
---|
1132 | File: flex.info, Node: How do I execute code only during initialization (only before the first scan)?, Next: How do I execute code at termination?, Prev: How do I abandon a file in the middle of a scan and switch to a new file?, Up: FAQ
|
---|
1133 |
|
---|
1134 | How do I execute code only during initialization (only before the first scan)?
|
---|
1135 | ==============================================================================
|
---|
1136 |
|
---|
1137 | You can specify an initial action by defining the macro
|
---|
1138 | `YY_USER_INIT' (though note that `yyout' may not be available at the
|
---|
1139 | time this macro is executed). Or you can add to the beginning of your
|
---|
1140 | rules section:
|
---|
1141 |
|
---|
1142 |
|
---|
1143 | %%
|
---|
1144 | /* Must be indented! */
|
---|
1145 | static int did_init = 0;
|
---|
1146 |
|
---|
1147 | if ( ! did_init ){
|
---|
1148 | do_my_init();
|
---|
1149 | did_init = 1;
|
---|
1150 | }
|
---|
1151 |
|
---|
1152 |
|
---|
1153 | File: flex.info, Node: How do I execute code at termination?, Next: Where else can I find help?, Prev: How do I execute code only during initialization (only before the first scan)?, Up: FAQ
|
---|
1154 |
|
---|
1155 | How do I execute code at termination?
|
---|
1156 | =====================================
|
---|
1157 |
|
---|
1158 | You can specify an action for the `<<EOF>>' rule.
|
---|
1159 |
|
---|
1160 |
|
---|
1161 | File: flex.info, Node: Where else can I find help?, Next: Can I include comments in the "rules" section of the file?, Prev: How do I execute code at termination?, Up: FAQ
|
---|
1162 |
|
---|
1163 | Where else can I find help?
|
---|
1164 | ===========================
|
---|
1165 |
|
---|
1166 | You can find the flex homepage on the web at
|
---|
1167 | `http://flex.sourceforge.net/'. See that page for details about flex
|
---|
1168 | mailing lists as well.
|
---|
1169 |
|
---|
1170 |
|
---|
1171 | File: flex.info, Node: Can I include comments in the "rules" section of the file?, Next: I get an error about undefined yywrap()., Prev: Where else can I find help?, Up: FAQ
|
---|
1172 |
|
---|
1173 | Can I include comments in the "rules" section of the file?
|
---|
1174 | ==========================================================
|
---|
1175 |
|
---|
1176 | Yes, just about anywhere you want to. See the manual for the
|
---|
1177 | specific syntax.
|
---|
1178 |
|
---|
1179 |
|
---|
1180 | File: flex.info, Node: I get an error about undefined yywrap()., Next: How can I change the matching pattern at run time?, Prev: Can I include comments in the "rules" section of the file?, Up: FAQ
|
---|
1181 |
|
---|
1182 | I get an error about undefined yywrap().
|
---|
1183 | ========================================
|
---|
1184 |
|
---|
1185 | You must supply a `yywrap()' function of your own, or link to
|
---|
1186 | `libfl.a' (which provides one), or use
|
---|
1187 |
|
---|
1188 |
|
---|
1189 | %option noyywrap
|
---|
1190 |
|
---|
1191 | in your source to say you don't want a `yywrap()' function.
|
---|
1192 |
|
---|
1193 |
|
---|
1194 | File: flex.info, Node: How can I change the matching pattern at run time?, Next: How can I expand macros in the input?, Prev: I get an error about undefined yywrap()., Up: FAQ
|
---|
1195 |
|
---|
1196 | How can I change the matching pattern at run time?
|
---|
1197 | ==================================================
|
---|
1198 |
|
---|
1199 | You can't, it's compiled into a static table when flex builds the
|
---|
1200 | scanner.
|
---|
1201 |
|
---|
1202 |
|
---|
1203 | File: flex.info, Node: How can I expand macros in the input?, Next: How can I build a two-pass scanner?, Prev: How can I change the matching pattern at run time?, Up: FAQ
|
---|
1204 |
|
---|
1205 | How can I expand macros in the input?
|
---|
1206 | =====================================
|
---|
1207 |
|
---|
1208 | The best way to approach this problem is at a higher level, e.g., in
|
---|
1209 | the parser.
|
---|
1210 |
|
---|
1211 | However, you can do this using multiple input buffers.
|
---|
1212 |
|
---|
1213 |
|
---|
1214 | %%
|
---|
1215 | macro/[a-z]+ {
|
---|
1216 | /* Saw the macro "macro" followed by extra stuff. */
|
---|
1217 | main_buffer = YY_CURRENT_BUFFER;
|
---|
1218 | expansion_buffer = yy_scan_string(expand(yytext));
|
---|
1219 | yy_switch_to_buffer(expansion_buffer);
|
---|
1220 | }
|
---|
1221 |
|
---|
1222 | <<EOF>> {
|
---|
1223 | if ( expansion_buffer )
|
---|
1224 | {
|
---|
1225 | // We were doing an expansion, return to where
|
---|
1226 | // we were.
|
---|
1227 | yy_switch_to_buffer(main_buffer);
|
---|
1228 | yy_delete_buffer(expansion_buffer);
|
---|
1229 | expansion_buffer = 0;
|
---|
1230 | }
|
---|
1231 | else
|
---|
1232 | yyterminate();
|
---|
1233 | }
|
---|
1234 |
|
---|
1235 | You probably will want a stack of expansion buffers to allow nested
|
---|
1236 | macros. From the above though hopefully the idea is clear.
|
---|
1237 |
|
---|
1238 |
|
---|
1239 | File: flex.info, Node: How can I build a two-pass scanner?, Next: How do I match any string not matched in the preceding rules?, Prev: How can I expand macros in the input?, Up: FAQ
|
---|
1240 |
|
---|
1241 | How can I build a two-pass scanner?
|
---|
1242 | ===================================
|
---|
1243 |
|
---|
1244 | One way to do it is to filter the first pass to a temporary file,
|
---|
1245 | then process the temporary file on the second pass. You will probably
|
---|
1246 | see a performance hit, do to all the disk I/O.
|
---|
1247 |
|
---|
1248 | When you need to look ahead far forward like this, it almost always
|
---|
1249 | means that the right solution is to build a parse tree of the entire
|
---|
1250 | input, then walk it after the parse in order to generate the output.
|
---|
1251 | In a sense, this is a two-pass approach, once through the text and once
|
---|
1252 | through the parse tree, but the performance hit for the latter is
|
---|
1253 | usually an order of magnitude smaller, since everything is already
|
---|
1254 | classified, in binary format, and residing in memory.
|
---|
1255 |
|
---|