1 | This is flex.info, produced by makeinfo version 4.5 from flex.texi.
|
---|
2 |
|
---|
3 | INFO-DIR-SECTION Programming
|
---|
4 | START-INFO-DIR-ENTRY
|
---|
5 | * flex: (flex). Fast lexical analyzer generator (lex replacement).
|
---|
6 | END-INFO-DIR-ENTRY
|
---|
7 |
|
---|
8 |
|
---|
9 | The flex manual is placed under the same licensing conditions as the
|
---|
10 | rest of flex:
|
---|
11 |
|
---|
12 | Copyright (C) 1990, 1997 The Regents of the University of California.
|
---|
13 | All rights reserved.
|
---|
14 |
|
---|
15 | This code is derived from software contributed to Berkeley by Vern
|
---|
16 | Paxson.
|
---|
17 |
|
---|
18 | The United States Government has rights in this work pursuant to
|
---|
19 | contract no. DE-AC03-76SF00098 between the United States Department of
|
---|
20 | Energy and the University of California.
|
---|
21 |
|
---|
22 | Redistribution and use in source and binary forms, with or without
|
---|
23 | modification, are permitted provided that the following conditions are
|
---|
24 | met:
|
---|
25 |
|
---|
26 | 1. Redistributions of source code must retain the above copyright
|
---|
27 | notice, this list of conditions and the following disclaimer.
|
---|
28 |
|
---|
29 | 2. Redistributions in binary form must reproduce the above copyright
|
---|
30 | notice, this list of conditions and the following disclaimer in the
|
---|
31 | documentation and/or other materials provided with the
|
---|
32 | distribution.
|
---|
33 | Neither the name of the University nor the names of its contributors
|
---|
34 | may be used to endorse or promote products derived from this software
|
---|
35 | without specific prior written permission.
|
---|
36 |
|
---|
37 | THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
|
---|
38 | WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
---|
39 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
---|
40 |
|
---|
41 | File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ
|
---|
42 |
|
---|
43 | How do I match any string not matched in the preceding rules?
|
---|
44 | =============================================================
|
---|
45 |
|
---|
46 | One way to assign precedence is to place the more specific rules
|
---|
47 | first. If two rules would match the same input (same sequence of
|
---|
48 | characters), then the first rule listed in the `flex' input wins. For example:
|
---|
49 |
|
---|
50 |
|
---|
51 | %%
|
---|
52 | foo[a-zA-Z_]+ return FOO_ID;
|
---|
53 | bar[a-zA-Z_]+ return BAR_ID;
|
---|
54 | [a-zA-Z_]+ return GENERIC_ID;
|
---|
55 |
|
---|
56 | Note that the rule `[a-zA-Z_]+' must come *after* the others. It
|
---|
57 | will match the same amount of text as the more specific rules, and in
|
---|
58 | that case the `flex' scanner will pick the first rule listed in your
|
---|
59 | scanner as the one to match.
|
---|
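
As a rough illustration of the tie-break described above, the following C
sketch emulates the three rules by hand. It is not code that `flex'
generates; the helper names (`ident_run', `prefixed_ident', `classify') and
the numeric token values are assumptions made for this example only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token codes in rule order; the values are arbitrary for this sketch. */
enum { FOO_ID = 0, BAR_ID = 1, GENERIC_ID = 2, NO_MATCH = -1 };

/* Length of the longest prefix of s made of [a-zA-Z_] characters. */
size_t ident_run(const char *s)
{
    size_t n = 0;

    while (isalpha((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Match length of prefix[a-zA-Z_]+ at the start of s, or 0 for no match. */
size_t prefixed_ident(const char *s, const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t run;

    if (strncmp(s, prefix, plen) != 0)
        return 0;
    run = ident_run(s + plen);
    return run > 0 ? plen + run : 0;
}

/* Pick the rule as described above: longest match, earliest rule on a tie. */
int classify(const char *s)
{
    size_t len[3];
    int best = NO_MATCH;
    int i;

    assert(s != NULL);
    len[FOO_ID] = prefixed_ident(s, "foo");
    len[BAR_ID] = prefixed_ident(s, "bar");
    len[GENERIC_ID] = ident_run(s);

    for (i = 0; i < 3; i++) {
        /* Strict '>' keeps the earlier rule when two lengths are equal. */
        if (len[i] > 0 && (best == NO_MATCH || len[i] > len[best]))
            best = i;
    }
    return best;
}
```

With these definitions, classify("foobar") yields FOO_ID: the first and
third rules both match six characters, and the earlier one wins.
classify("foo") yields GENERIC_ID, since `foo[a-zA-Z_]+' needs at least one
character after the prefix.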
60 |
|
---|
61 |
|
---|
62 | File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ
|
---|
63 |
|
---|
64 | I am trying to port code from AT&T lex that uses yysptr and yysbuf.
|
---|
65 | ===================================================================
|
---|
66 |
|
---|
67 | Those are internal variables pointing into the AT&T scanner's input
|
---|
68 | buffer. I imagine they're being manipulated in user versions of the
|
---|
69 | `input()' and `unput()' functions. If so, what you need to do is
|
---|
70 | analyze those functions to figure out what they're doing, and then
|
---|
71 | replace `input()' with an appropriate definition of `YY_INPUT'. You
|
---|
72 | shouldn't need to (and must not) replace `flex''s `unput()' function.
|
---|
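
A minimal sketch of what such a replacement might look like, assuming the
old `input()' read from an in-memory string. `legacy_set_source' and
`legacy_read' are hypothetical names for this example. `YY_INPUT(buf,
result, max_size)' is the documented hook: it stores the number of bytes
read in `result', and 0 signals end of input. The exact integer type of
`result' differs between flex versions, so treat the assignment below as
an assumption to check against your generated scanner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for whatever the old input() used to read from. */
static const char *legacy_src = "";
static size_t legacy_pos = 0;

/* Point the reader at a new NUL-terminated source string. */
void legacy_set_source(const char *text)
{
    assert(text != NULL);
    legacy_src = text;
    legacy_pos = 0;
}

/* Copy up to max_size bytes into buf; return the count, 0 at end of input. */
size_t legacy_read(char *buf, size_t max_size)
{
    size_t left = strlen(legacy_src) - legacy_pos;
    size_t n = left < max_size ? left : max_size;

    memcpy(buf, legacy_src + legacy_pos, n);
    legacy_pos += n;
    return n;
}

/* In a flex input file this would sit in the definitions section,
   inside %{ ... %}, so the generated scanner picks it up. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = legacy_read((buf), (size_t)(max_size)))
```

The scanner then calls the macro whenever it needs more input, and
`unput()' keeps working unchanged because it never touches this code.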
73 |
|
---|
74 |
|
---|
75 | File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ
|
---|
76 |
|
---|
77 | Is there a way to make flex treat NULL like a regular character?
|
---|
78 | ================================================================
|
---|
79 |
|
---|
80 | Yes, `\0' and `\x00' should both do the trick. Perhaps you have an
|
---|
81 | ancient version of `flex'. The latest release is version 2.5.33.
|
---|
82 |
|
---|
83 |
|
---|
84 | File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesnt flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ
|
---|
85 |
|
---|
86 | Whenever flex can not match the input it says "flex scanner jammed".
|
---|
87 | ====================================================================
|
---|
88 |
|
---|
89 | You need to add a rule that matches the otherwise-unmatched text.
|
---|
90 | For example:
|
---|
91 |
|
---|
92 |
|
---|
93 | %option yylineno
|
---|
94 | %%
|
---|
95 | [[a bunch of rules here]]
|
---|
96 |
|
---|
97 | . printf("bad input character '%s' at line %d\n", yytext, yylineno);
|
---|
98 |
|
---|
99 | See `%option default' for more information.
|
---|
100 |
|
---|
101 |
|
---|
102 | File: flex.info, Node: Why doesnt flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ
|
---|
103 |
|
---|
104 | Why doesn't flex have non-greedy operators like perl does?
|
---|
105 | ==========================================================
|
---|
106 |
|
---|
107 | A DFA can do a non-greedy match by stopping the first time it enters
|
---|
108 | an accepting state, instead of consuming input until it determines that
|
---|
109 | no further matching is possible (a "jam" state). This is actually
|
---|
110 | easier to implement than longest leftmost match (which flex does).
|
---|
111 |
|
---|
112 | But it's also much less useful than longest leftmost match. In
|
---|
113 | general, when you find yourself wishing for non-greedy matching, that's
|
---|
114 | usually a sign that you're trying to make the scanner do some parsing.
|
---|
115 | That's generally the wrong approach, since a scanner lacks the power to do a
|
---|
116 | decent job. Better is to either introduce a separate parser, or to
|
---|
117 | split the scanner into multiple scanners using (exclusive) start
|
---|
118 | conditions.
|
---|
119 |
|
---|
120 | You might have a separate start state once you've seen the `BEGIN'.
|
---|
121 | In that state, you might then have a regex that will match `END' (to
|
---|
122 | kick you out of the state), and perhaps `(.|\n)' to get a single
|
---|
123 | character within the chunk ...
|
---|
124 |
|
---|
125 | This approach also has much better error-reporting properties.
|
---|
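
The start-condition idea can be emulated in plain C to show why no
non-greedy operator is needed. This is only a sketch: the state names and
`extract_chunk' are invented for the example, and a real scanner would
express the same thing with an exclusive start condition and rules for the
`BEGIN' and `END' markers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum state { ST_INITIAL, ST_CHUNK };

/*
 * Copy the text between the first "BEGIN" and the next "END" into out.
 * Returns the chunk length, or -1 if there is no complete chunk or the
 * output buffer is too small.
 */
long extract_chunk(const char *in, char *out, size_t cap)
{
    enum state st = ST_INITIAL;
    size_t n = 0;

    assert(in != NULL && out != NULL && cap > 0);
    while (*in != '\0') {
        if (st == ST_INITIAL) {
            if (strncmp(in, "BEGIN", 5) == 0) {
                st = ST_CHUNK;      /* enter the chunk state */
                in += 5;
            } else {
                in++;               /* text outside a chunk is skipped */
            }
        } else {
            if (strncmp(in, "END", 3) == 0) {
                out[n] = '\0';
                return (long)n;     /* the END rule: chunk is complete */
            }
            if (n + 1 >= cap)
                return -1;          /* no room for this byte plus the NUL */
            out[n++] = *in++;       /* the (.|\n) rule: one character */
        }
    }
    return -1;                      /* end of input without a full chunk */
}
```

Because the chunk state stops at the first `END' it sees, the match is
effectively non-greedy, and the final `return -1' is a natural place to
report an unterminated chunk.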
126 |
|
---|
127 |
|
---|
128 | File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesnt flex have non-greedy operators like perl does?, Up: FAQ
|
---|
129 |
|
---|
130 | Memory leak - 16386 bytes allocated by malloc.
|
---|
131 | ==============================================
|
---|
132 |
|
---|
133 | UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
|
---|
134 | you did not call `yylex_destroy()'. If you are using an earlier version
|
---|
135 | of `flex', then read on.
|
---|
136 |
|
---|
137 | The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the
|
---|
138 | read-buffer, and about 40 for `struct yy_buffer_state' (depending upon
|
---|
139 | alignment). The leak is in the non-reentrant C scanner only (NOT in the
|
---|
140 | reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
|
---|
141 | when you are done, the buffer is never freed.
|
---|
142 |
|
---|
143 | However, the leak won't multiply since the buffer is reused no
|
---|
144 | matter how many times you call `yylex()'.
|
---|
145 |
|
---|
146 | If you want to reclaim the memory when you are completely done
|
---|
147 | scanning, then you might try this:
|
---|
148 |
|
---|
149 |
|
---|
150 | /* For non-reentrant C scanner only. */
|
---|
151 | yy_delete_buffer(YY_CURRENT_BUFFER);
|
---|
152 | yy_init = 1;
|
---|
153 |
|
---|
154 | Note: `yy_init' is an "internal variable", and hasn't been tested in
|
---|
155 | this situation. It is possible that some other globals may need
|
---|
156 | resetting as well.
|
---|
157 |
|
---|
158 |
|
---|
159 | File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ
|
---|
160 |
|
---|
161 | How do I track the byte offset for lseek()?
|
---|
162 | ===========================================
|
---|
163 |
|
---|
164 |
|
---|
165 | > We thought that it would be possible to have this number through the
|
---|
166 | > evaluation of the following expression:
|
---|
167 | >
|
---|
168 | > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
|
---|
169 |
|
---|
170 | While this is the right idea, it has two problems. The first is that
|
---|
171 | it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
|
---|
172 | during an invocation of `YY_INPUT' (or that your input source will
|
---|
173 | return less even though `YY_READ_BUF_SIZE' bytes were requested). The
|
---|
174 | second problem is that when refilling its internal buffer, `flex' keeps
|
---|
175 | some characters from the previous buffer (because usually it's in the
|
---|
176 | middle of a match, and needs those characters to construct `yytext' for
|
---|
177 | the match once it's done). Because of this, `yy_c_buf_p -
|
---|
178 | YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
|
---|
179 | already read from the current buffer.
|
---|
180 |
|
---|
181 | An alternative solution is to count the number of characters you've
|
---|
182 | matched since starting to scan. This can be done by using
|
---|
183 | `YY_USER_ACTION'. For example,
|
---|
184 |
|
---|
185 |
|
---|
186 | #define YY_USER_ACTION num_chars += yyleng;
|
---|
187 |
|
---|
188 | (You need to be careful to update your bookkeeping if you use
|
---|
189 | `yymore()', `yyless()', `unput()', or `input()'.)
|
---|
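
The bookkeeping mentioned above might be organized along these lines. This
is a sketch under assumptions: the struct and the `track_*' functions are
hypothetical helpers (not part of flex), to be called from
`YY_USER_ACTION' and from any action that uses the listed functions. It
also assumes characters passed to `unput()' are ones that were read from
the input in the first place.

```c
#include <assert.h>
#include <stddef.h>

struct offset_tracker {
    long num_chars;      /* bytes consumed from the input so far */
    size_t more_prefix;  /* bytes carried over by a pending yymore() */
};

/* What YY_USER_ACTION would do: account for a match of yyleng bytes. */
void track_match(struct offset_tracker *t, size_t yyleng)
{
    assert(yyleng >= t->more_prefix);
    /* After yymore(), yyleng also covers the prefix counted last time. */
    t->num_chars += (long)(yyleng - t->more_prefix);
    t->more_prefix = 0;
}

/* The action called yymore(): the current yyleng bytes will reappear. */
void track_yymore(struct offset_tracker *t, size_t yyleng)
{
    t->more_prefix = yyleng;
}

/* The action called yyless(n): all but the first n bytes are given back. */
void track_yyless(struct offset_tracker *t, size_t yyleng, size_t n)
{
    assert(n <= yyleng);
    t->num_chars -= (long)(yyleng - n);
}

/* The action called unput(c): one byte will be scanned (and counted) again. */
void track_unput(struct offset_tracker *t)
{
    t->num_chars -= 1;
}

/* The action called input(): one byte was consumed outside any match. */
void track_input(struct offset_tracker *t)
{
    t->num_chars += 1;
}
```

Combinations such as `yymore()' followed by `yyless()' in the same action
are not handled here and would need extra care.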
190 |
|
---|
191 |
|
---|
192 | File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ
|
---|
193 |
|
---|
194 | How do I use my own I/O classes in a C++ scanner?
|
---|
195 | =================================================
|
---|
196 |
|
---|
197 | When the flex C++ scanning class rewrite finally happens, then this
|
---|
198 | sort of thing should become much easier.
|
---|
199 |
|
---|
200 | You can do this by passing the various functions (such as
|
---|
201 | `LexerInput()' and `LexerOutput()') NULL `iostream*''s, and then
|
---|
202 | dealing with your own I/O classes surreptitiously (i.e., stashing them
|
---|
203 | in special member variables). This works because the only assumption
|
---|
204 | the lexer makes regarding what's done with the iostreams is that
|
---|
205 | they're ultimately passed to `LexerInput()' and `LexerOutput()', which
|
---|
206 | then do whatever is necessary with them.
|
---|
207 |
|
---|
208 |
|
---|
209 | File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ
|
---|
210 |
|
---|
211 | How do I skip as many chars as possible?
|
---|
212 | ========================================
|
---|
213 |
|
---|
214 | How do I skip as many chars as possible - without interfering with
|
---|
215 | the other patterns?
|
---|
216 |
|
---|
217 | In the example below, we want to skip over characters until we see
|
---|
218 | the phrase "endskip". The following will _NOT_ work correctly (do you
|
---|
219 | see why not?)
|
---|
220 |
|
---|
221 |
|
---|
222 | /* INCORRECT SCANNER */
|
---|
223 | %x SKIP
|
---|
224 | %%
|
---|
225 | <INITIAL>startskip BEGIN(SKIP);
|
---|
226 | ...
|
---|
227 | <SKIP>"endskip" BEGIN(INITIAL);
|
---|
228 | <SKIP>.* ;
|
---|
229 |
|
---|
230 | The problem is that the pattern `.*' will eat up the word "endskip".
|
---|
231 | The simplest (but slow) fix is:
|
---|
232 |
|
---|
233 |
|
---|
234 | <SKIP>"endskip" BEGIN(INITIAL);
|
---|
235 | <SKIP>. ;
|
---|
236 |
|
---|
237 | A faster fix involves making the second rule match more, without making
|
---|
238 | it match "endskip" plus something else. So for example:
|
---|
239 |
|
---|
240 |
|
---|
241 | <SKIP>"endskip" BEGIN(INITIAL);
|
---|
242 | <SKIP>[^e]+ ;
|
---|
243 | <SKIP>. ;/* so you eat up e's, too */
|
---|
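
To see why the `[^e]+' rule helps, the sketch below replays the three
<SKIP> rules by hand in C and counts how many actions fire. The function
name `skip_to_endskip' is made up for this example, and the code only
mimics the matching order; it is not generated scanner code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Emulate the <SKIP> rules on a NUL-terminated input.
 * Returns the offset just past "endskip", or -1 if it never appears.
 * *actions receives the number of rule actions that were executed.
 */
long skip_to_endskip(const char *in, size_t *actions)
{
    size_t pos = 0;

    assert(in != NULL && actions != NULL);
    *actions = 0;
    while (in[pos] != '\0') {
        (*actions)++;
        if (in[pos] != 'e') {
            /* <SKIP>[^e]+ : swallow the whole run up to the next 'e' */
            pos += strcspn(in + pos, "e");
        } else if (strncmp(in + pos, "endskip", 7) == 0) {
            /* <SKIP>"endskip" : longest match at this position */
            return (long)(pos + 7);
        } else {
            /* <SKIP>. : a lone 'e' that does not start "endskip" */
            pos++;
        }
    }
    return -1;
}
```

For the input "abc e def endskip rest" the function returns 17 (the offset
just past "endskip") after 6 actions; the one-character-per-action version
would need 11.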
244 |
|
---|
245 |
|
---|
246 | File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ
|
---|
247 |
|
---|
248 | deleteme00
|
---|
249 | ==========
|
---|
250 |
|
---|
251 |
|
---|
252 | QUESTION:
|
---|
253 | When was flex born?
|
---|
254 |
|
---|
255 | Vern Paxson took over
|
---|
256 | the Software Tools lex project from Jef Poskanzer in 1982. At that point it
|
---|
257 | was written in Ratfor. Around 1987 or so, Paxson translated it into C, and
|
---|
258 | a legend was born :-).
|
---|
259 |
|
---|
260 |
|
---|
261 | File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ
|
---|
262 |
|
---|
263 | Are certain equivalent patterns faster than others?
|
---|
264 | ===================================================
|
---|
265 |
|
---|
266 |
|
---|
267 | To: Adoram Rogel <adoram@orna.hybridge.com>
|
---|
268 | Subject: Re: Flex 2.5.2 performance questions
|
---|
269 | In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
|
---|
270 | Date: Wed, 18 Sep 96 10:51:02 PDT
|
---|
271 | From: Vern Paxson <vern>
|
---|
272 |
|
---|
273 | [Note, the most recent flex release is 2.5.4, which you can get from
|
---|
274 | ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]
|
---|
275 |
|
---|
276 | > 1. Using the pattern
|
---|
277 | > ([Ff](oot)?)?[Nn](ote)?(\.)?
|
---|
278 | > instead of
|
---|
279 | > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
|
---|
280 | > (in a very complicated flex program) caused the program to slow from
|
---|
281 | > 300K+/min to 100K/min (no other changes were done).
|
---|
282 |
|
---|
283 | These two are not equivalent. For example, the first can match "footnote."
|
---|
284 | but the second can only match "footnote". This is almost certainly the
|
---|
285 | cause of the discrepancy - the slower scanner run is matching more tokens,
|
---|
286 | and/or having to do more backing up.
|
---|
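
One way to check such a claim without building a scanner is to try both
patterns against sample strings. The sketch below uses the POSIX
`regcomp()'/`regexec()' interface as a stand-in for flex's matcher; that
substitution is an assumption that happens to hold here because both
patterns use only syntax shared by flex and POSIX extended regular
expressions. The helper `full_match' and the two constant names are
invented for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* The two patterns from the question, written as C string literals. */
const char *const pattern_short = "([Ff](oot)?)?[Nn](ote)?(\\.)?";
const char *const pattern_long =
    "(((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\\.)|((F|f)(N|n)(\\.)))";

/* Return 1 if pattern matches all of text, 0 if not, -1 on a bad pattern. */
int full_match(const char *pattern, const char *text)
{
    char anchored[512];
    regex_t re;
    int rc;

    /* Anchor the pattern so only whole-string matches count. */
    rc = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    assert(rc > 0 && (size_t)rc < sizeof anchored);

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

full_match(pattern_short, "footnote.") succeeds while
full_match(pattern_long, "footnote.") does not, which is the difference
pointed out above; "fn" is another string only the first pattern accepts.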
287 |
|
---|
288 | > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
|
---|
289 |
|
---|
290 | From a performance point of view, they're equivalent (modulo presumably
|
---|
291 | minor effects such as memory cache hit rates; and the presence of trailing
|
---|
292 | context, see below). From a space point of view, the first is slightly
|
---|
293 | preferable.
|
---|
294 |
|
---|
295 | > 3. I have a pattern that look like this:
|
---|
296 | > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
|
---|
297 | >
|
---|
298 | > running yet another complicated program that includes the following rule:
|
---|
299 | > <snext>{and}/{no4}{bb}{pats}
|
---|
300 | >
|
---|
301 | > gets me to "too complicated - over 32,000 states"...
|
---|
302 |
|
---|
303 | I can't tell from this example whether the trailing context is variable-length
|
---|
304 | or fixed-length (it could be the latter if {and} is fixed-length). If it's
|
---|
305 | variable length, which flex -p will tell you, then this reflects a basic
|
---|
306 | performance problem, and if you can eliminate it by restructuring your
|
---|
307 | scanner, you will see significant improvement.
|
---|
308 |
|
---|
309 | > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
|
---|
310 | > 10 patterns and changed the rule to be 5 rules.
|
---|
311 | > This did compile, but what is the rule of thumb here ?
|
---|
312 |
|
---|
313 | The rule is to avoid trailing context other than fixed-length, in which for
|
---|
314 | a/b, either the 'a' pattern or the 'b' pattern has a fixed length. Use
|
---|
315 | of the '|' operator automatically makes the pattern variable length, so in
|
---|
316 | this case '[Ff]oot' is preferred to '(F|f)oot'.
|
---|
317 |
|
---|
318 | > 4. I changed a rule that looked like this:
|
---|
319 | > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
|
---|
320 | >
|
---|
321 | > to the next 2 rules:
|
---|
322 | > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
|
---|
323 | > <snext8>{and}{bb}/{ROMAN} { BEGIN...
|
---|
324 | >
|
---|
325 | > Again, I understand the using [^...] will cause a great performance loss
|
---|
326 |
|
---|
327 | Actually, it doesn't cause any sort of performance loss. It's a surprising
|
---|
328 | fact about regular expressions that they always match in linear time
|
---|
329 | regardless of how complex they are.
|
---|
330 |
|
---|
331 | > but are there any specific rules about it ?
|
---|
332 |
|
---|
333 | See the "Performance Considerations" section of the man page, and also
|
---|
334 | the example in MISC/fastwc/.
|
---|
335 |
|
---|
336 | Vern
|
---|
337 |
|
---|
338 |
|
---|
339 | File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ
|
---|
340 |
|
---|
341 | Is backing up a big deal?
|
---|
342 | =========================
|
---|
343 |
|
---|
344 |
|
---|
345 | To: Adoram Rogel <adoram@hybridge.com>
|
---|
346 | Subject: Re: Flex 2.5.2 performance questions
|
---|
347 | In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
|
---|
348 | Date: Thu, 19 Sep 96 09:58:00 PDT
|
---|
349 | From: Vern Paxson <vern>
|
---|
350 |
|
---|
351 | > a lot about the backing up problem.
|
---|
352 | > I believe that there lies my biggest problem, and I'll try to improve
|
---|
353 | > it.
|
---|
354 |
|
---|
355 | Since you have variable trailing context, this is a bigger performance
|
---|
356 | problem. Fixing it is usually easier than fixing backing up, which in a
|
---|
357 | complicated scanner (yours seems to fit the bill) can be extremely
|
---|
358 | difficult to do correctly.
|
---|
359 |
|
---|
360 | You also don't mention what flags you are using for your scanner.
|
---|
361 | -f makes a large speed difference, and -Cfe buys you nearly as much
|
---|
362 | speed but the resulting scanner is considerably smaller.
|
---|
363 |
|
---|
364 | > I have an | operator in {and} and in {pats} so both of them are variable
|
---|
365 | > length.
|
---|
366 |
|
---|
367 | -p should have reported this.
|
---|
368 |
|
---|
369 | > Is changing one of them to fixed-length is enough ?
|
---|
370 |
|
---|
371 | Yes.
|
---|
372 |
|
---|
373 | > Is it possible to change the 32,000 states limit ?
|
---|
374 |
|
---|
375 | Yes. I've appended instructions on how. Before you make this change,
|
---|
376 | though, you should think about whether there are ways to fundamentally
|
---|
377 | simplify your scanner - those are certainly preferable!
|
---|
378 |
|
---|
379 | Vern
|
---|
380 |
|
---|
381 | To increase the 32K limit (on a machine with 32-bit integers), you increase
|
---|
382 | the magnitude of the following in flexdef.h:
|
---|
383 |
|
---|
384 | #define JAMSTATE -32766 /* marks a reference to the state that always jams */
|
---|
385 | #define MAXIMUM_MNS 31999
|
---|
386 | #define BAD_SUBSCRIPT -32767
|
---|
387 | #define MAX_SHORT 32700
|
---|
388 |
|
---|
389 | Adding a 0 or two after each should do the trick.
|
---|
390 |
|
---|
391 |
|
---|
392 | File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ
|
---|
393 |
|
---|
394 | Can I fake multi-byte character support?
|
---|
395 | ========================================
|
---|
396 |
|
---|
397 |
|
---|
398 | To: Heeman_Lee@hp.com
|
---|
399 | Subject: Re: flex - multi-byte support?
|
---|
400 | In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
|
---|
401 | Date: Fri, 04 Oct 1996 11:42:18 PDT
|
---|
402 | From: Vern Paxson <vern>
|
---|
403 |
|
---|
404 | > I assume as long as my *.l file defines the
|
---|
405 | > range of expected character code values (in octal format), flex will
|
---|
406 | > scan the file and read multi-byte characters correctly. But I have no
|
---|
407 | > confidence in this assumption.
|
---|
408 |
|
---|
409 | Your lack of confidence is justified - this won't work.
|
---|
410 |
|
---|
411 | Flex has in it a widespread assumption that the input is processed
|
---|
412 | one byte at a time. Fixing this is on the to-do list, but is involved,
|
---|
413 | so it won't happen any time soon. In the interim, the best I can suggest
|
---|
414 | (unless you want to try fixing it yourself) is to write your rules in
|
---|
415 | terms of pairs of bytes, using definitions in the first section:
|
---|
416 |
|
---|
417 | X \xfe\xc2
|
---|
418 | ...
|
---|
419 | %%
|
---|
420 | foo{X}bar found_foo_fe_c2_bar();
|
---|
421 |
|
---|
422 | etc. Definitely a pain - sorry about that.
|
---|
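
If there are many such characters, the definitions can be generated rather
than typed. The helper below is a hypothetical sketch (the name
`bytes_to_pattern' is not part of flex) that turns a byte sequence into the
`\xNN' escapes used in the definition above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write the bytes as consecutive \xNN escapes into out.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if out is too small.
 */
int bytes_to_pattern(const unsigned char *bytes, size_t n,
                     char *out, size_t cap)
{
    size_t used = 0;
    size_t i;

    assert(bytes != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* Each escape needs 4 characters plus room for the final NUL. */
        if (cap - used < 5)
            return -1;
        snprintf(out + used, cap - used, "\\x%02x", (unsigned)bytes[i]);
        used += 4;
    }
    return (int)used;
}
```

Called with the two bytes 0xfe and 0xc2, it produces the text \xfe\xc2,
ready to paste after a definition name such as X.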
423 |
|
---|
424 | By the way, the email address you used for me is ancient, indicating you
|
---|
425 | have a very old version of flex. You can get the most recent, 2.5.4, from
|
---|
426 | ftp.ee.lbl.gov.
|
---|
427 |
|
---|
428 | Vern
|
---|
429 |
|
---|
430 |
|
---|
431 | File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ
|
---|
432 |
|
---|
433 | deleteme01
|
---|
434 | ==========
|
---|
435 |
|
---|
436 |
|
---|
437 | To: moleary@primus.com
|
---|
438 | Subject: Re: Flex / Unicode compatibility question
|
---|
439 | In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
|
---|
440 | Date: Tue, 22 Oct 1996 11:06:13 PDT
|
---|
441 | From: Vern Paxson <vern>
|
---|
442 |
|
---|
443 | Unfortunately flex at the moment has a widespread assumption within it
|
---|
444 | that characters are processed 8 bits at a time. I don't see any easy
|
---|
445 | fix for this (other than writing your rules in terms of double characters -
|
---|
446 | a pain). I also don't know of a wider lex, though you might try surfing
|
---|
447 | the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
|
---|
448 | toolkit (try searching say Alta Vista for "Purdue Compiler Construction
|
---|
449 | Toolkit").
|
---|
450 |
|
---|
451 | Fixing flex to handle wider characters is on the long-term to-do list.
|
---|
452 | But since flex is a strictly spare-time project these days, this probably
|
---|
453 | won't happen for quite a while, unless someone else does it first.
|
---|
454 |
|
---|
455 | Vern
|
---|
456 |
|
---|
457 |
|
---|
458 | File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ
|
---|
459 |
|
---|
460 | Can you discuss some flex internals?
|
---|
461 | ====================================
|
---|
462 |
|
---|
463 |
|
---|
464 | To: Johan Linde <jl@theophys.kth.se>
|
---|
465 | Subject: Re: translation of flex
|
---|
466 | In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
|
---|
467 | Date: Mon, 11 Nov 1996 10:33:50 PST
|
---|
468 | From: Vern Paxson <vern>
|
---|
469 |
|
---|
470 | > I'm working for the Swedish team translating GNU program, and I'm currently
|
---|
471 | > working with flex. I have a few questions about some of the messages which
|
---|
472 | > I hope you can answer.
|
---|
473 |
|
---|
474 | All of the things you're wondering about, by the way, concerning flex
|
---|
475 | internals - probably the only person who understands what they mean in
|
---|
476 | English is me! So I wouldn't worry too much about getting them right.
|
---|
477 | That said ...
|
---|
478 |
|
---|
479 | > #: main.c:545
|
---|
480 | > msgid " %d protos created\n"
|
---|
481 | >
|
---|
482 | > Does proto mean prototype?
|
---|
483 |
|
---|
484 | Yes - prototypes of state compression tables.
|
---|
485 |
|
---|
486 | > #: main.c:539
|
---|
487 | > msgid " %d/%d (peak %d) template nxt-chk entries created\n"
|
---|
488 | >
|
---|
489 | > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
|
---|
490 | > However, 'template next-check entries' doesn't make much sense to me. To be
|
---|
491 | > able to find a good translation I need to know a little bit more about it.
|
---|
492 |
|
---|
493 | There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
|
---|
494 | scanner tables. It involves creating two pairs of tables. The first has
|
---|
495 | "base" and "default" entries, the second has "next" and "check" entries.
|
---|
496 | The "base" entry is indexed by the current state and yields an index into
|
---|
497 | the next/check table. The "default" entry gives what to do if the state
|
---|
498 | transition isn't found in next/check. The "next" entry gives the next
|
---|
499 | state to enter, but only if the "check" entry verifies that this entry is
|
---|
500 | correct for the current state. Flex creates templates of series of
|
---|
501 | next/check entries and then encodes differences from these templates as a
|
---|
502 | way to compress the tables.
|
---|
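
The lookup described above can be written down in a few lines of C. The
tables here are hand-built for a made-up three-state, two-symbol automaton
and follow the textbook scheme only; flex's generated tables (and its
template mechanism) are more elaborate, and none of the names below come
from flex.

```c
#include <assert.h>

#define JAM (-1)  /* no transition: the automaton jams */

/*
 * Full transition table being compressed (rows: states, columns: symbols):
 *   state 0: 0 -> 1, 1 -> 2
 *   state 1: 0 -> 1, 1 -> 0   (same as state 0 on symbol 0)
 *   state 2: 0 -> JAM, 1 -> 2
 */
static const int base_tab[3]  = { 0, 1, 2 };      /* offset into next/check */
static const int def_tab[3]   = { JAM, 0, JAM };  /* fallback state */
static const int next_tab[4]  = { 1, 2, 0, 2 };   /* candidate next states */
static const int check_tab[4] = { 0, 0, 1, 2 };   /* owner of each entry */

/* Return the next state for (state, sym), or JAM if there is none. */
int next_state(int state, int sym)
{
    assert(state >= 0 && state < 3 && (sym == 0 || sym == 1));
    /* An entry is only valid if "check" says it belongs to this state. */
    while (check_tab[base_tab[state] + sym] != state) {
        state = def_tab[state];   /* otherwise retry in the default state */
        if (state == JAM)
            return JAM;
    }
    return next_tab[base_tab[state] + sym];
}
```

State 1 stores only the one transition that differs from state 0 and
borrows the other through its "default" entry, so four next/check slots
stand in for the six cells of the full table.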
503 |
|
---|
504 | > #: main.c:533
|
---|
505 | > msgid " %d/%d base-def entries created\n"
|
---|
506 | >
|
---|
507 | > The same problem here for 'base-def'.
|
---|
508 |
|
---|
509 | See above.
|
---|
510 |
|
---|
511 | Vern
|
---|
512 |
|
---|
513 |
|
---|
514 | File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ
|
---|
515 |
|
---|
516 | unput() messes up yy_at_bol
|
---|
517 | ===========================
|
---|
518 |
|
---|
519 |
|
---|
520 | To: Xinying Li <xli@npac.syr.edu>
|
---|
521 | Subject: Re: FLEX ?
|
---|
522 | In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
|
---|
523 | Date: Wed, 13 Nov 1996 19:51:54 PST
|
---|
524 | From: Vern Paxson <vern>
|
---|
525 |
|
---|
526 | > "unput()" them to input flow, question occurs. If I do this after I scan
|
---|
527 | > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
|
---|
528 | > means the carriage flag has gone.
|
---|
529 |
|
---|
530 | You can control this by calling yy_set_bol(). It's described in the manual.
|
---|
531 |
|
---|
532 | > And if in pre-reading it goes to the end of file, is anything done
|
---|
533 | > to control the end of curren buffer and end of file?
|
---|
534 |
|
---|
535 | No, there's no way to put back an end-of-file.
|
---|
536 |
|
---|
537 | > By the way I am using flex 2.5.2 and using the "-l".
|
---|
538 |
|
---|
539 | The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
|
---|
540 | 2.5.3. You can get it from ftp.ee.lbl.gov.
|
---|
541 |
|
---|
542 | Vern
|
---|
543 |
|
---|
544 |
|
---|
545 | File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ
|
---|
546 |
|
---|
547 | The | operator is not doing what I want
|
---|
548 | =======================================
|
---|
549 |
|
---|
550 |
|
---|
551 | To: Alain.ISSARD@st.com
|
---|
552 | Subject: Re: Start condition with FLEX
|
---|
553 | In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
|
---|
554 | Date: Mon, 18 Nov 1996 10:41:34 PST
|
---|
555 | From: Vern Paxson <vern>
|
---|
556 |
|
---|
557 | > I am not able to use the start condition scope and to use the | (OR) with
|
---|
558 | > rules having start conditions.
|
---|
559 |
|
---|
560 | The problem is that if you use '|' as a regular expression operator, for
|
---|
561 | example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
|
---|
562 | any blanks around it. If you instead want the special '|' *action* (which
|
---|
563 | from your scanner appears to be the case), which is a way of giving two
|
---|
564 | different rules the same action:
|
---|
565 |
|
---|
566 | foo |
|
---|
567 | bar matched_foo_or_bar();
|
---|
568 |
|
---|
569 | then '|' *must* be separated from the first rule by whitespace and *must*
|
---|
570 | be followed by a new line. You *cannot* write it as:
|
---|
571 |
|
---|
572 | foo | bar matched_foo_or_bar();
|
---|
573 |
|
---|
574 | even though you might think you could because yacc supports this syntax.
|
---|
575 | The reason for this unfortunate incompatibility is historical, but it's
|
---|
576 | unlikely to be changed.
|
---|
577 |
|
---|
578 | Your problems with start condition scope are simply due to syntax errors
|
---|
579 | from your use of '|' later confusing flex.
|
---|
580 |
|
---|
581 | Let me know if you still have problems.
|
---|
582 |
|
---|
583 | Vern
|
---|
584 |
|
---|
585 |
|
---|
586 | File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ
|
---|
587 |
|
---|
588 | Why can't flex understand this variable trailing context pattern?
|
---|
589 | =================================================================
|
---|
590 |
|
---|
591 |
|
---|
592 | To: Gregory Margo <gmargo@newton.vip.best.com>
|
---|
593 | Subject: Re: flex-2.5.3 bug report
|
---|
594 | In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
|
---|
595 | Date: Sat, 23 Nov 1996 17:07:32 PST
|
---|
596 | From: Vern Paxson <vern>
|
---|
597 |
|
---|
598 | > Enclosed is a lex file that "real" lex will process, but I cannot get
|
---|
599 | > flex to process it. Could you try it and maybe point me in the right direction?
|
---|
600 |
|
---|
601 | Your problem is that some of the definitions in the scanner use the '/'
|
---|
602 | trailing context operator, and have it enclosed in ()'s. Flex does not
|
---|
603 | allow this operator to be enclosed in ()'s because doing so allows undefined
|
---|
604 | regular expressions such as "(a/b)+". So the solution is to remove the
|
---|
605 | parentheses. Note that you must also be building the scanner with the -l
|
---|
606 | option for AT&T lex compatibility. Without this option, flex automatically
|
---|
607 | encloses the definitions in parentheses.
|
---|
608 |
|
---|
609 | Vern
|
---|
610 |
|
---|
611 |
|
---|
612 | File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ
|
---|
613 |
|
---|
614 | The ^ operator isn't working
|
---|
615 | ============================
|
---|
616 |
|
---|
617 |
|
---|
618 | To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
|
---|
619 | Subject: Re: Flex Bug ?
|
---|
620 | In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
|
---|
621 | Date: Tue, 26 Nov 1996 11:15:05 PST
|
---|
622 | From: Vern Paxson <vern>
|
---|
623 |
|
---|
624 | > In my lexer code, i have the line :
|
---|
625 | > ^\*.* { }
|
---|
626 | >
|
---|
627 | > Thus all lines starting with an astrix (*) are comment lines.
|
---|
628 | > This does not work !
|
---|
629 |
|
---|
630 | I can't get this problem to reproduce - it works fine for me. Note
|
---|
631 | though that if what you have is slightly different:
|
---|
632 |
|
---|
633 | COMMENT ^\*.*
|
---|
634 | %%
|
---|
635 | {COMMENT} { }
|
---|
636 |
|
---|
637 | then it won't work, because flex pushes back macro definitions enclosed
|
---|
638 | in ()'s, so the rule becomes
|
---|
639 |
|
---|
640 | (^\*.*) { }
|
---|
641 |
|
---|
642 | and now that the '^' operator is not at the immediate beginning of the
|
---|
643 | line, it's interpreted as just a regular character. You can avoid this
|
---|
644 | behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
|
---|
645 |
|
---|
646 | Vern
|
---|
647 |
|
---|
648 |
|
---|
649 | File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ
|
---|
650 |
|
---|
651 | Trailing context is getting confused with trailing optional patterns
|
---|
652 | ====================================================================
|
---|
653 |
|
---|
654 |
|
---|
655 | To: Adoram Rogel <adoram@hybridge.com>
|
---|
656 | Subject: Re: Flex 2.5.4 BOF ???
|
---|
657 | In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
|
---|
658 | Date: Wed, 27 Nov 1996 10:56:25 PST
|
---|
659 | From: Vern Paxson <vern>
|
---|
660 |
|
---|
661 | > Organization(s)?/[a-z]
|
---|
662 | >
|
---|
663 | > This matched "Organizations" (looking in debug mode, the trailing s
|
---|
664 | > was matched with trailing context instead of the optional (s) in the
|
---|
665 | > end of the word.
|
---|
666 |
|
---|
667 | That should only happen with lex. Flex can properly match this pattern.
|
---|
668 | (That might be what you're saying, I'm just not sure.)
|
---|
669 |
|
---|
670 | > Is there a way to avoid this dangerous trailing context problem ?
|
---|
671 |
|
---|
672 | Unfortunately, there's no easy way. On the other hand, I don't see why
|
---|
673 | it should be a problem. Lex's matching is clearly wrong, and I'd hope
|
---|
674 | that usually the intent remains the same as expressed with the pattern,
|
---|
675 | so flex's matching will be correct.
|
---|
676 |
|
---|
677 | Vern
|
---|
678 |
|
---|
679 |
|
---|
680 | File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ
|
---|
681 |
|
---|
682 | Is flex GNU or not?
|
---|
683 | ===================
|
---|
684 |
|
---|
685 |
|
---|
686 | To: Cameron MacKinnon <mackin@interlog.com>
|
---|
687 | Subject: Re: Flex documentation bug
|
---|
688 | In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
|
---|
689 | Date: Sun, 01 Dec 1996 22:29:39 PST
|
---|
690 | From: Vern Paxson <vern>
|
---|
691 |
|
---|
692 | > I'm not sure how or where to submit bug reports (documentation or
|
---|
693 | > otherwise) for the GNU project stuff ...
|
---|
694 |
|
---|
695 | Well, strictly speaking flex isn't part of the GNU project. They just
|
---|
696 | distribute it because no one's written a decent GPL'd lex replacement.
|
---|
697 | So you should send bugs directly to me. Those sent to the GNU folks
|
---|
698 | sometimes find their way to me, but some may drop between the cracks.
|
---|
699 |
|
---|
700 | > In GNU Info, under the section 'Start Conditions', and also in the man
|
---|
701 | > page (mine's dated April '95) is a nice little snippet showing how to
|
---|
702 | > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
|
---|
703 | > size. Unfortunately, no overflow checking is ever done ...
|
---|
704 |
|
---|
705 | This is already mentioned in the manual:
|
---|
706 |
|
---|
707 | Finally, here's an example of how to match C-style quoted
|
---|
708 | strings using exclusive start conditions, including expanded
|
---|
709 | escape sequences (but not including checking for a string
|
---|
710 | that's too long):
|
---|
711 |
|
---|
712 | The reason for not doing the overflow checking is that it will needlessly
|
---|
713 | clutter up an example whose main purpose is just to demonstrate how to
|
---|
714 | use flex.
|
---|
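If you do want the check in your own scanner, it only takes a few lines.
The following is a minimal sketch in plain C, not the manual's code.
MAX_STR_CONST, string_buf and string_buf_ptr follow the names in the
manual's example; string_begin(), string_append() and string_end() are
hypothetical helpers that the quoted-string rules would call instead of
writing through the pointer directly.

```c
#define MAX_STR_CONST 8   /* kept tiny here so the overflow path is easy to exercise */

static char string_buf[MAX_STR_CONST];
static char *string_buf_ptr = string_buf;

/* Reset the buffer, as the rule that sees the opening quote would. */
static void string_begin(void)
{
    string_buf_ptr = string_buf;
}

/* Append one character, reserving the last byte for the terminating NUL.
 * Returns 1 on success, 0 if the string constant is too long. */
static int string_append(char c)
{
    if (string_buf_ptr >= string_buf + MAX_STR_CONST - 1)
        return 0;
    *string_buf_ptr++ = c;
    return 1;
}

/* Terminate the string, as the rule that sees the closing quote would. */
static const char *string_end(void)
{
    *string_buf_ptr = '\0';
    return string_buf;
}
```

An action would then test the return value of string_append() and report
an error (or switch to a recovery start condition) when it is 0.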
715 |
|
---|
716 | The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
|
---|
717 |
|
---|
718 | Vern
|
---|
719 |
|
---|
720 |
|
---|
721 | File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ
|
---|
722 |
|
---|
723 | ERASEME53
|
---|
724 | =========
|
---|
725 |
|
---|
726 |
|
---|
727 | To: tsv@cs.UManitoba.CA
|
---|
728 | Subject: Re: Flex (reg)..
|
---|
729 | In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
|
---|
730 | Date: Thu, 06 Mar 1997 15:54:19 PST
|
---|
731 | From: Vern Paxson <vern>
|
---|
732 |
|
---|
733 | > [:alpha:] ([:alnum:] | \\_)*
|
---|
734 |
|
---|
735 | If your rule really has embedded blanks as shown above, then it won't
|
---|
736 | work, as the first blank delimits the rule from the action. (It wouldn't
|
---|
737 | even compile ...) You need instead:
|
---|
738 |
|
---|
739 | [:alpha:]([:alnum:]|\\_)*
|
---|
740 |
|
---|
741 | and that should work fine - there's no restriction on what can go inside
|
---|
742 | of ()'s except for the trailing context operator, '/'.
|
---|
743 |
|
---|
744 | Vern
|
---|
745 |
|
---|
746 |
|
---|
747 | File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ
|
---|
748 |
|
---|
749 | I need to scan if-then-else blocks and while loops
|
---|
750 | ==================================================
|
---|
751 |
|
---|
752 |
|
---|
753 | To: "Mike Stolnicki" <mstolnic@ford.com>
|
---|
754 | Subject: Re: FLEX help
|
---|
755 | In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
|
---|
756 | Date: Fri, 30 May 1997 10:46:35 PDT
|
---|
757 | From: Vern Paxson <vern>
|
---|
758 |
|
---|
759 | > We'd like to add "if-then-else", "while", and "for" statements to our
|
---|
760 | > language ...
|
---|
761 | > We've investigated many possible solutions. The one solution that seems
|
---|
762 | > the most reasonable involves knowing the position of a TOKEN in yyin.
|
---|
763 |
|
---|
764 | I strongly advise you to instead build a parse tree (abstract syntax tree)
|
---|
765 | and loop over that instead. You'll find this has major benefits in keeping
|
---|
766 | your interpreter simple and extensible.
|
---|
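To make the parse-tree suggestion concrete, here is a rough sketch in C
of an interpreter that loops by re-walking a subtree rather than by
repositioning yyin. Everything in it is made up for illustration: the
node kinds, the single-variable environment, and the function names are
assumptions, not part of flex or of the language in the question.

```c
#include <stdlib.h>

/* Node kinds for a toy language with one integer variable. */
enum node_kind { N_SEQ, N_WHILE_POSITIVE, N_DECREMENT, N_COUNT };

struct node {
    enum node_kind kind;
    struct node *left;   /* loop body, or first statement of a sequence */
    struct node *right;  /* second statement of a sequence */
};

struct env {
    int var;        /* the language's single variable */
    int counter;    /* incremented by N_COUNT nodes */
};

static struct node *make_node(enum node_kind kind, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->left = l;
    n->right = r;
    return n;
}

/* Execute a tree. A loop re-walks its body subtree; no input is rescanned. */
static void run(const struct node *n, struct env *e)
{
    switch (n->kind) {
    case N_SEQ:
        run(n->left, e);
        run(n->right, e);
        break;
    case N_WHILE_POSITIVE:
        while (e->var > 0)
            run(n->left, e);
        break;
    case N_DECREMENT:
        e->var--;
        break;
    case N_COUNT:
        e->counter++;
        break;
    }
}

static void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

In a real interpreter the parser's actions would call something like
make_node() as it reduces each statement, and run() would be invoked once
on the finished tree.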
767 |
|
---|
768 | That said, the functionality you mention for get_position and set_position
|
---|
769 | has been on the to-do list for a while. As flex is a purely spare-time
|
---|
770 | project for me, no guarantees when this will be added (in particular, it
|
---|
771 | for sure won't be for many months to come).
|
---|
772 |
|
---|
773 | Vern
|
---|
774 |
|
---|
775 |
|
---|
776 | File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ
|
---|
777 |
|
---|
778 | ERASEME55
|
---|
779 | =========
|
---|
780 |
|
---|
781 |
|
---|
782 | To: Colin Paul Adams <colin@colina.demon.co.uk>
|
---|
783 | Subject: Re: Flex C++ classes and Bison
|
---|
784 | In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
|
---|
785 | Date: Fri, 15 Aug 1997 10:48:19 PDT
|
---|
786 | From: Vern Paxson <vern>
|
---|
787 |
|
---|
788 | > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
|
---|
789 | > *parm)
|
---|
790 | >
|
---|
791 | > I have been trying to get this to work as a C++ scanner, but it does
|
---|
792 | > not appear to be possible (warning that it matches no declarations in
|
---|
793 | > yyFlexLexer, or something like that).
|
---|
794 | >
|
---|
795 | > Is this supposed to be possible, or is it being worked on (I DID
|
---|
796 | > notice the comment that scanner classes are still experimental, so I'm
|
---|
797 | > not too hopeful)?
|
---|
798 |
|
---|
799 | What you need to do is derive a subclass from yyFlexLexer that provides
|
---|
800 | the above yylex() method, squirrels away lvalp and parm into member
|
---|
801 | variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
|
---|
802 |
|
---|
803 | Vern
|
---|
804 |
|
---|
805 |
|
---|
806 | File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ
|
---|
807 |
|
---|
808 | ERASEME56
|
---|
809 | =========
|
---|
810 |
|
---|
811 |
|
---|
812 | To: Mikael.Latvala@lmf.ericsson.se
|
---|
813 | Subject: Re: Possible mistake in Flex v2.5 document
|
---|
814 | In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
|
---|
815 | Date: Fri, 05 Sep 1997 10:01:54 PDT
|
---|
816 | From: Vern Paxson <vern>
|
---|
817 |
|
---|
818 | > In that example you show how to count comment lines when using
|
---|
819 | > C style /* ... */ comments. My question is, shouldn't you take into
|
---|
820 | > account a scenario where end of a comment marker occurs inside
|
---|
821 | > character or string literals?
|
---|
822 |
|
---|
823 | The scanner certainly needs to also scan character and string literals.
|
---|
824 | However it does that (there's an example in the man page for strings), the
|
---|
825 | lexer will recognize the beginning of the literal before it runs across the
|
---|
826 | embedded "/*". Consequently, it will finish scanning the literal before it
|
---|
827 | even considers the possibility of matching "/*".
|
---|
828 |
|
---|
829 | Example:
|
---|
830 |
|
---|
831 | '([^']*|{ESCAPE_SEQUENCE})'
|
---|
832 |
|
---|
833 | will match all the text between the ''s (inclusive). So the lexer
|
---|
834 | considers this as a token beginning at the first ', and doesn't even
|
---|
835 | attempt to match other tokens inside it.
|
---|
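The effect can be mimicked with a hand-written loop in C. This is only a
sketch of the principle, not what a flex-generated scanner does
internally: count_comments() is a hypothetical function that consumes a
whole literal as soon as it sees the opening quote, so a comment marker
inside the literal is never examined on its own.

```c
/* Count C comments in src, skipping over string and character literals. */
static int count_comments(const char *src)
{
    int comments = 0;
    const char *p = src;

    while (*p != '\0') {
        if (*p == '"' || *p == '\'') {
            /* Literal: consume through the matching quote, honoring escapes. */
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    p++;            /* skip the escaped character */
                p++;
            }
            if (*p == quote)
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            /* Comment: consume through the closing marker. */
            comments++;
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;
        } else {
            p++;
        }
    }
    return comments;
}
```

Given input that contains a string literal with comment markers inside it
followed by one genuine comment, the function counts only the genuine one.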
836 |
|
---|
837 | I think this subtlety is not worth putting in the manual, as I suspect
|
---|
838 | it would confuse more people than it would enlighten.
|
---|
839 |
|
---|
840 | Vern
|
---|
841 |
|
---|
842 |
|
---|
843 | File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ
|
---|
844 |
|
---|
845 | ERASEME57
|
---|
846 | =========
|
---|
847 |
|
---|
848 |
|
---|
849 | To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
|
---|
850 | Subject: Re: flex limitations
|
---|
851 | In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
|
---|
852 | Date: Mon, 08 Sep 1997 11:38:08 PDT
|
---|
853 | From: Vern Paxson <vern>
|
---|
854 |
|
---|
855 | > %%
|
---|
856 | > [a-zA-Z]+ /* skip a line */
|
---|
857 | > { printf("got %s\n", yytext); }
|
---|
858 | > %%
|
---|
859 |
|
---|
860 | What version of flex are you using? If I feed this to 2.5.4, it complains:
|
---|
861 |
|
---|
862 | "bug.l", line 5: EOF encountered inside an action
|
---|
863 | "bug.l", line 5: unrecognized rule
|
---|
864 | "bug.l", line 5: fatal parse error
|
---|
865 |
|
---|
866 | Not the world's greatest error message, but it manages to flag the problem.
|
---|
867 |
|
---|
868 | (With the introduction of start condition scopes, flex can't accommodate
|
---|
869 | an action on a separate line, since it's ambiguous with an indented rule.)
|
---|
870 |
|
---|
871 | You can get 2.5.4 from ftp.ee.lbl.gov.
|
---|
872 |
|
---|
873 | Vern
|
---|
874 |
|
---|
875 |
|
---|
876 | File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ
|
---|
877 |
|
---|
878 | Is there a repository for flex scanners?
|
---|
879 | ========================================
|
---|
880 |
|
---|
881 | Not that we know of. You might try asking on comp.compilers.
|
---|
882 |
|
---|
883 |
|
---|
884 | File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ
|
---|
885 |
|
---|
886 | How can I conditionally compile or preprocess my flex input file?
|
---|
887 | =================================================================
|
---|
888 |
|
---|
889 | Flex doesn't have a preprocessor like C does. You might try using
|
---|
890 | m4, or the C preprocessor plus a sed script to clean up the result.
|
---|
891 |
|
---|
892 |
|
---|
893 | File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ
|
---|
894 |
|
---|
895 | Where can I find grammars for lex and yacc?
|
---|
896 | ===========================================
|
---|
897 |
|
---|
898 | In the sources for flex and bison.
|
---|
899 |
|
---|
900 |
|
---|
901 | File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ
|
---|
902 |
|
---|
903 | I get an end-of-buffer message for each character scanned.
|
---|
904 | ==========================================================
|
---|
905 |
|
---|
906 | This will happen if your LexerInput() function returns only one
|
---|
907 | character at a time, which can happen either if your scanner is
|
---|
908 | "interactive", or if the streams library on your platform always
|
---|
909 | returns 1 for yyin->gcount().
|
---|
910 |
|
---|
911 | Solution: override LexerInput() with a version that returns whole
|
---|
912 | buffers.
|
---|
913 |
|
---|
914 |
|
---|
915 | File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ
|
---|
916 |
|
---|
917 | unnamed-faq-62
|
---|
918 | ==============
|
---|
919 |
|
---|
920 |
|
---|
921 | To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
|
---|
922 | Subject: Re: Flex maximums
|
---|
923 | In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
|
---|
924 | Date: Mon, 17 Nov 1997 17:16:15 PST
|
---|
925 | From: Vern Paxson <vern>
|
---|
926 |
|
---|
927 | > I took a quick look into the flex-sources and altered some #defines in
|
---|
928 | > flexdefs.h:
|
---|
929 | >
|
---|
930 | > #define INITIAL_MNS 64000
|
---|
931 | > #define MNS_INCREMENT 1024000
|
---|
932 | > #define MAXIMUM_MNS 64000
|
---|
933 |
|
---|
934 | The things to fix are to add a couple of zeroes to:
|
---|
935 |
|
---|
936 | #define JAMSTATE -32766 /* marks a reference to the state that always jams */
|
---|
937 | #define MAXIMUM_MNS 31999
|
---|
938 | #define BAD_SUBSCRIPT -32767
|
---|
939 | #define MAX_SHORT 32700
|
---|
940 |
|
---|
941 | and, if you get complaints about too many rules, make the following change too:
|
---|
942 |
|
---|
943 | #define YY_TRAILING_MASK 0x200000
|
---|
944 | #define YY_TRAILING_HEAD_MASK 0x400000
|
---|
945 |
|
---|
946 | - Vern
|
---|
947 |
|
---|
948 |
|
---|
949 | File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ
|
---|
950 |
|
---|
951 | unnamed-faq-63
|
---|
952 | ==============
|
---|
953 |
|
---|
954 |
|
---|
955 | To: jimmey@lexis-nexis.com (Jimmey Todd)
|
---|
956 | Subject: Re: FLEX question regarding istream vs ifstream
|
---|
957 | In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
|
---|
958 | Date: Mon, 15 Dec 1997 13:21:35 PST
|
---|
959 | From: Vern Paxson <vern>
|
---|
960 |
|
---|
961 | > stdin_handle = YY_CURRENT_BUFFER;
|
---|
962 | > ifstream fin( "aFile" );
|
---|
963 | > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
|
---|
964 | >
|
---|
965 | > What I'm wanting to do, is pass the contents of a file thru one set
|
---|
966 | > of rules and then pass stdin thru another set... It works great if, I
|
---|
967 | > don't use the C++ classes. But since everything else that I'm doing is
|
---|
968 | > in C++, I thought I'd be consistent.
|
---|
969 | >
|
---|
970 | > The problem is that 'yy_create_buffer' is expecting an istream* as its
|
---|
971 | > first argument (as stated in the man page). However, fin is a ifstream
|
---|
972 | > object. Any ideas on what I might be doing wrong? Any help would be
|
---|
973 | > appreciated. Thanks!!
|
---|
974 |
|
---|
975 | You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
|
---|
976 | Then its type will be compatible with the expected istream*, because ifstream
|
---|
977 | is derived from istream.
|
---|
978 |
|
---|
979 | Vern
|
---|
980 |
|
---|
981 |
|
---|
982 | File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ
|
---|
983 |
|
---|
984 | unnamed-faq-64
|
---|
985 | ==============
|
---|
986 |
|
---|
987 |
|
---|
988 | To: Enda Fadian <fadiane@piercom.ie>
|
---|
989 | Subject: Re: Question related to Flex man page?
|
---|
990 | In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
|
---|
991 | Date: Tue, 16 Dec 1997 14:17:09 PST
|
---|
992 | From: Vern Paxson <vern>
|
---|
993 |
|
---|
994 | > Can you explain to me what is meant by a long-jump in relation to flex?
|
---|
995 |
|
---|
996 | Using the longjmp() function while inside yylex() or a routine called by it.
|
---|
997 |
|
---|
998 | > what is the flex activation frame.
|
---|
999 |
|
---|
1000 | Just yylex()'s stack frame.
|
---|
1001 |
|
---|
1002 | > As far as I can see yyrestart will bring me back to the start of the input
|
---|
1003 | > file and using flex++ is not really an option!
|
---|
1004 |
|
---|
1005 | No, yyrestart() doesn't imply a rewind, even though its name might sound
|
---|
1006 | like it does. It tells the scanner to flush its internal buffers and
|
---|
1007 | start reading from the given file at its present location.
|
---|
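The distinction between "flush the buffers" and "rewind the file" can be
illustrated with plain stdio. The sketch below does not call yyrestart()
at all; toy_scanner, toy_restart() and toy_next_char() are hypothetical
stand-ins for a scanner that reads its input in blocks, and the tiny
buffer size is chosen only to make the effect visible.

```c
#include <stdio.h>

#define TOY_BUF_SIZE 4

struct toy_scanner {
    FILE *in;
    char buf[TOY_BUF_SIZE];
    size_t pos, len;
};

/* Forget any buffered input and read from f from now on. Note: no rewind(). */
static void toy_restart(struct toy_scanner *s, FILE *f)
{
    s->in = f;
    s->pos = s->len = 0;
}

/* Return the next character, refilling the buffer a block at a time. */
static int toy_next_char(struct toy_scanner *s)
{
    if (s->pos == s->len) {
        s->len = fread(s->buf, 1, TOY_BUF_SIZE, s->in);
        s->pos = 0;
        if (s->len == 0)
            return EOF;
    }
    return (unsigned char)s->buf[s->pos++];
}
```

With a file containing "abcdefgh", reading one character and then calling
toy_restart() on the same FILE makes the next character 'e': the rest of
the first block was discarded and the file itself was never repositioned.
To start over from the top, the caller has to rewind() the file before
restarting.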
1008 |
|
---|
1009 | Vern
|
---|
1010 |
|
---|
1011 |
|
---|
1012 | File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ
|
---|
1013 |
|
---|
1014 | unnamed-faq-65
|
---|
1015 | ==============
|
---|
1016 |
|
---|
1017 |
|
---|
1018 | To: hassan@larc.info.uqam.ca (Hassan Alaoui)
|
---|
1019 | Subject: Re: Need urgent Help
|
---|
1020 | In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
|
---|
1021 | Date: Sun, 21 Dec 1997 21:30:46 PST
|
---|
1022 | From: Vern Paxson <vern>
|
---|
1023 |
|
---|
1024 | > /usr/lib/yaccpar: In function `int yyparse()':
|
---|
1025 | > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
|
---|
1026 | >
|
---|
1027 | > ld: Undefined symbol
|
---|
1028 | > _yylex
|
---|
1029 | > _yyparse
|
---|
1030 | > _yyin
|
---|
1031 |
|
---|
1032 | This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
|
---|
1033 | the fix is to explicitly insert some 'extern "C"' statements for the
|
---|
1034 | corresponding routines/symbols.
|
---|
1035 |
|
---|
1036 | Vern
|
---|
1037 |
|
---|
1038 |
|
---|
1039 | File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ
|
---|
1040 |
|
---|
1041 | unnamed-faq-66
|
---|
1042 | ==============
|
---|
1043 |
|
---|
1044 |
|
---|
1045 | To: mc0307@mclink.it
|
---|
1046 | Cc: gnu@prep.ai.mit.edu
|
---|
1047 | Subject: Re: [mc0307@mclink.it: Help request]
|
---|
1048 | In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
|
---|
1049 | Date: Sun, 21 Dec 1997 22:33:37 PST
|
---|
1050 | From: Vern Paxson <vern>
|
---|
1051 |
|
---|
1052 | > This is my definition for float and integer types:
|
---|
1053 | > . . .
|
---|
1054 | > NZD [1-9]
|
---|
1055 | > ...
|
---|
1056 | > I've tested my program on other lex version (on UNIX Sun Solaris an HP
|
---|
1057 | > UNIX) and it work well, so I think that my definitions are correct.
|
---|
1058 | > There are any differences between Lex and Flex?
|
---|
1059 |
|
---|
1060 | There are indeed differences, as discussed in the man page. The one
|
---|
1061 | you are probably running into is that when flex expands a name definition,
|
---|
1062 | it puts parentheses around the expansion, while lex does not. There's
|
---|
1063 | an example in the man page of how this can lead to different matching.
|
---|
1064 | Flex's behavior complies with the POSIX standard (or at least with the
|
---|
1065 | last POSIX draft I saw).
|
---|
1066 |
|
---|
1067 | Vern
|
---|
1068 |
|
---|
1069 |
|
---|
1070 | File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ
|
---|
1071 |
|
---|
1072 | unnamed-faq-67
|
---|
1073 | ==============
|
---|
1074 |
|
---|
1075 |
|
---|
1076 | To: hassan@larc.info.uqam.ca (Hassan Alaoui)
|
---|
1077 | Subject: Re: Thanks
|
---|
1078 | In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
|
---|
1079 | Date: Mon, 22 Dec 1997 14:35:05 PST
|
---|
1080 | From: Vern Paxson <vern>
|
---|
1081 |
|
---|
1082 | > Thank you very much for your help. I compile and link well with C++ while
|
---|
1083 | > declaring 'yylex ...' extern, But a little problem remains. I get a
|
---|
1084 | > segmentation default when executing ( I linked with lfl library) while it
|
---|
1085 | > works well when using LEX instead of flex. Do you have some ideas about the
|
---|
1086 | > reason for this ?
|
---|
1087 |
|
---|
1088 | The one possible reason for this that comes to mind is if you've defined
|
---|
1089 | yytext as "extern char yytext[]" (which is what lex uses) instead of
|
---|
1090 | "extern char *yytext" (which is what flex uses). If it's not that, then
|
---|
1091 | I'm afraid I don't know what the problem might be.
|
---|
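The two declarations really do describe different objects, which is why
mixing them up across files can crash. A small C sketch of the
difference follows; the variable names and the array size are invented
for the illustration, and the mismatch itself is not reproduced because
its behavior is undefined.

```c
#include <stddef.h>

/* lex style: yytext is the character storage itself. */
#define TOY_YYLMAX 200              /* illustrative size only */
static char lex_yytext[TOY_YYLMAX];

/* flex style: yytext is a pointer to text held in a separate buffer. */
static char flex_buffer[] = "token";
static char *flex_yytext = flex_buffer;

static size_t size_of_lex_yytext(void)  { return sizeof lex_yytext; }
static size_t size_of_flex_yytext(void) { return sizeof flex_yytext; }
```

The first size is that of the whole array, the second only that of a
pointer. A file that declares "extern char yytext[]" while the scanner
defines a pointer ends up treating the pointer's own bytes as if they
were the matched text.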
1092 |
|
---|
1093 | Vern
|
---|
1094 |
|
---|
1095 |
|
---|
1096 | File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ
|
---|
1097 |
|
---|
1098 | unnamed-faq-68
|
---|
1099 | ==============
|
---|
1100 |
|
---|
1101 |
|
---|
1102 | To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
|
---|
1103 | Subject: Re: flex 2.5: c++ scanners & start conditions
|
---|
1104 | In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
|
---|
1105 | Date: Tue, 06 Jan 1998 19:19:30 PST
|
---|
1106 | From: Vern Paxson <vern>
|
---|
1107 |
|
---|
1108 | > The problem is that when I do this (using %option c++) start
|
---|
1109 | > conditions seem to not apply.
|
---|
1110 |
|
---|
1111 | The BEGIN macro modifies the yy_start variable. For C scanners, this
|
---|
1112 | is a static with scope visible through the whole file. For C++ scanners,
|
---|
1113 | it's a member variable, so it only has visible scope within a member
|
---|
1114 | function. Your lexbegin() routine is not a member function when you
|
---|
1115 | build a C++ scanner, so it's not modifying the correct yy_start. The
|
---|
1116 | diagnostic that indicates this is that you found you needed to add
|
---|
1117 | a declaration of yy_start in order to get your scanner to compile when
|
---|
1118 | using C++; instead, the correct fix is to make lexbegin() a member
|
---|
1119 | function (by deriving from yyFlexLexer).
|
---|
1120 |
|
---|
1121 | Vern
|
---|
1122 |
|
---|
1123 |
|
---|
1124 | File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ
|
---|
1125 |
|
---|
1126 | unnamed-faq-69
|
---|
1127 | ==============
|
---|
1128 |
|
---|
1129 |
|
---|
1130 | To: "Boris Zinin" <boris@ippe.rssi.ru>
|
---|
1131 | Subject: Re: current position in flex buffer
|
---|
1132 | In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
|
---|
1133 | Date: Mon, 12 Jan 1998 12:03:15 PST
|
---|
1134 | From: Vern Paxson <vern>
|
---|
1135 |
|
---|
1136 | > The problem is how to determine the current position in flex active
|
---|
1137 | > buffer when a rule is matched....
|
---|
1138 |
|
---|
1139 | You will need to keep track of this explicitly, such as by redefining
|
---|
1140 | YY_USER_ACTION to count the number of characters matched.
|
---|
1141 |
|
---|
1142 | The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
|
---|
1143 |
|
---|
1144 | Vern
|
---|
1145 |
|
---|
1146 |
|
---|
1147 | File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ
|
---|
1148 |
|
---|
1149 | unnamed-faq-70
|
---|
1150 | ==============
|
---|
1151 |
|
---|
1152 |
|
---|
1153 | To: Bik.Dhaliwal@bis.org
|
---|
1154 | Subject: Re: Flex question
|
---|
1155 | In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
|
---|
1156 | Date: Tue, 27 Jan 1998 22:41:52 PST
|
---|
1157 | From: Vern Paxson <vern>
|
---|
1158 |
|
---|
1159 | > That requirement involves knowing
|
---|
1160 | > the character position at which a particular token was matched
|
---|
1161 | > in the lexer.
|
---|
1162 |
|
---|
1163 | The way you have to do this is by explicitly keeping track of where
|
---|
1164 | you are in the file, by counting the number of characters scanned
|
---|
1165 | for each token (available in yyleng). It may prove convenient to
|
---|
1166 | do this by redefining YY_USER_ACTION, as described in the manual.
|
---|
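A minimal sketch of that bookkeeping in C is shown below. In a real
scanner the YY_USER_ACTION definition would go in the definitions section
of the .l file and flex would supply yyleng; here yyleng is a stand-in,
and token_offset, next_offset and simulate_match() are hypothetical names
used only to exercise the macro.

```c
/* Stand-in for the scanner's global; a real flex scanner supplies yyleng. */
static int yyleng;

static long token_offset = 0;   /* offset of the token just matched */
static long next_offset = 0;    /* offset of the first character after it */

/* The macro is expanded ahead of each rule's action with no semicolon
 * added after it, so it has to consist of complete statements. */
#define YY_USER_ACTION \
    token_offset = next_offset; \
    next_offset += yyleng;

/* Hypothetical driver that mimics the scanner matching a token. */
static long simulate_match(int length)
{
    yyleng = length;
    YY_USER_ACTION
    return token_offset;
}
```

Inside an action, token_offset then gives the position at which the
current token began. Actions that call yyless() or yymore() change how
much text a match accounts for, so the counters would need adjusting in
those cases.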
1167 |
|
---|
1168 | Vern
|
---|
1169 |
|
---|
1170 |
|
---|
1171 | File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ
|
---|
1172 |
|
---|
1173 | unnamed-faq-71
|
---|
1174 | ==============
|
---|
1175 |
|
---|
1176 |
|
---|
1177 | To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
|
---|
1178 | Subject: Re: flex: how to control start condition from parser?
|
---|
1179 | In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
|
---|
1180 | Date: Tue, 27 Jan 1998 22:45:37 PST
|
---|
1181 | From: Vern Paxson <vern>
|
---|
1182 |
|
---|
1183 | > It seems useful for the parser to be able to tell the lexer about such
|
---|
1184 | > context dependencies, because then they don't have to be limited to
|
---|
1185 | > local or sequential context.
|
---|
1186 |
|
---|
1187 | One way to do this is to have the parser call a stub routine that's
|
---|
1188 | included in the scanner's .l file, and consequently that has access to
|
---|
1189 | BEGIN. The only ugliness is that the parser can't pass in the state
|
---|
1190 | it wants, because those aren't visible - but if you don't have many
|
---|
1191 | such states, then using a different set of names doesn't seem like
|
---|
1192 | too much of a burden.
|
---|
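Such a stub might look roughly like the following C sketch. In a real
scanner the function would sit in the user-code section of the .l file,
where flex has already defined BEGIN and the start condition names; the
definitions at the top are stand-ins modeled on what flex generates, and
SC_TYPE_CONTEXT, lexer_mode and both function names are hypothetical.

```c
/* Stand-ins for what flex generates inside the scanner file. */
static int yy_start = 1;
#define BEGIN yy_start = 1 + 2 *
#define YY_START ((yy_start - 1) / 2)
#define INITIAL 0
#define SC_TYPE_CONTEXT 1   /* hypothetical start condition */

/* The "different set of names": these would live in a small hand-written
 * header shared by the parser and the scanner. */
enum lexer_mode { LEXER_MODE_NORMAL, LEXER_MODE_TYPE_CONTEXT };

/* Stub the parser calls; only this file needs to see BEGIN. */
void lexer_set_mode(enum lexer_mode mode)
{
    switch (mode) {
    case LEXER_MODE_NORMAL:
        BEGIN INITIAL;
        break;
    case LEXER_MODE_TYPE_CONTEXT:
        BEGIN SC_TYPE_CONTEXT;
        break;
    }
}

/* Lets the caller check which start condition is active. */
int lexer_current_start_condition(void)
{
    return YY_START;
}
```

A parser action would call lexer_set_mode(LEXER_MODE_TYPE_CONTEXT) at the
point where the context changes. Bear in mind that the parser may already
have read a lookahead token by then, so the switch takes effect one token
later than the grammar might suggest.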
1193 |
|
---|
1194 | While generating a .h file like you suggest is certainly cleaner,
|
---|
1195 | flex development has come to a virtual stand-still :-(, so a workaround
|
---|
1196 | like the above is much more pragmatic than waiting for a new feature.
|
---|
1197 |
|
---|
1198 | Vern
|
---|
1199 |
|
---|
1200 |
|
---|
1201 | File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ
|
---|
1202 |
|
---|
1203 | unnamed-faq-72
|
---|
1204 | ==============
|
---|
1205 |
|
---|
1206 |
|
---|
1207 | To: Barbara Denny <denny@3com.com>
|
---|
1208 | Subject: Re: freebsd flex bug?
|
---|
1209 | In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
|
---|
1210 | Date: Fri, 30 Jan 1998 12:42:32 PST
|
---|
1211 | From: Vern Paxson <vern>
|
---|
1212 |
|
---|
1213 | > lex.yy.c:1996: parse error before `='
|
---|
1214 |
|
---|
1215 | This is the key, identifying this error. (It may help to pinpoint
|
---|
1216 | it by using flex -L, so it doesn't generate #line directives in its
|
---|
1217 | output.) I will bet you heavy money that you have a start condition
|
---|
1218 | name that is also a variable name, or something like that; flex spits
|
---|
1219 | out #define's for each start condition name, mapping them to a number,
|
---|
1220 | so you can wind up with:
|
---|
1221 |
|
---|
1222 | %x foo
|
---|
1223 | %%
|
---|
1224 | ...
|
---|
1225 | %%
|
---|
1226 | void bar()
|
---|
1227 | {
|
---|
1228 | int foo = 3;
|
---|
1229 | }
|
---|
1230 |
|
---|
1231 | and the penultimate line will turn into "int 1 = 3" after C preprocessing,
|
---|
1232 | since flex will put "#define foo 1" in the generated scanner.
|
---|
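One way out is to give start conditions names that cannot collide with
ordinary identifiers. The C sketch below shows the situation after such a
rename; the SC_ prefix is merely a convention assumed here, and the
#define stands in for the one flex would emit for "%x SC_FOO".

```c
/* Stand-in for the #define flex emits for a start condition named SC_FOO. */
#define SC_FOO 1

int bar(void)
{
    int foo = 3;    /* no clash: the preprocessor no longer rewrites "foo" */
    return foo + SC_FOO;
}
```

With the original name, the same function body would not compile, because
the declaration would be rewritten before the compiler proper ever saw it.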
1233 |
|
---|
1234 | Vern
|
---|
1235 |
|
---|
1236 |
|
---|
1237 | File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ
|
---|
1238 |
|
---|
1239 | unnamed-faq-73
|
---|
1240 | ==============
|
---|
1241 |
|
---|
1242 |
|
---|
1243 | To: Maurice Petrie <mpetrie@infoscigroup.com>
|
---|
1244 | Subject: Re: Lost flex .l file
|
---|
1245 | In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
|
---|
1246 | Date: Mon, 02 Feb 1998 11:15:12 PST
|
---|
1247 | From: Vern Paxson <vern>
|
---|
1248 |
|
---|
1249 | > I am curious as to
|
---|
1250 | > whether there is a simple way to backtrack from the generated source to
|
---|
1251 | > reproduce the lost list of tokens we are searching on.
|
---|
1252 |
|
---|
1253 | In theory, it's straightforward to go from the DFA representation
|
---|
1254 | back to a regular-expression representation - the two are isomorphic.
|
---|
1255 | In practice, it's a huge headache, because you have to unpack all the tables
|
---|
1256 | back into a single DFA representation, and then write a program to munch
|
---|
1257 | on that and translate it into an RE.
|
---|
1258 |
|
---|
1259 | Sorry for the less-than-happy news ...
|
---|
1260 |
|
---|
1261 | Vern
|
---|
1262 |
|
---|
1263 |
|
---|
1264 | File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ
|
---|
1265 |
|
---|
1266 | unnamed-faq-74
|
---|
1267 | ==============
|
---|
1268 |
|
---|
1269 |
|
---|
1270 | To: jimmey@lexis-nexis.com (Jimmey Todd)
|
---|
1271 | Subject: Re: Flex performance question
|
---|
1272 | In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
|
---|
1273 | Date: Thu, 19 Feb 1998 08:48:51 PST
|
---|
1274 | From: Vern Paxson <vern>
|
---|
1275 |
|
---|
1276 | > What I have found, is that the smaller the data chunk, the faster the
|
---|
1277 | > program executes. This is the opposite of what I expected. Should this be
|
---|
1278 | > happening this way?
|
---|
1279 |
|
---|
1280 | This is exactly what will happen if your input file has embedded NULs.
|
---|
1281 | From the man page:
|
---|
1282 |
|
---|
1283 | A final note: flex is slow when matching NUL's, particularly
|
---|
1284 | when a token contains multiple NUL's. It's best to write
|
---|
1285 | rules which match short amounts of text if it's anticipated
|
---|
1286 | that the text will often include NUL's.
|
---|
1287 |
|
---|
1288 | So that's the first thing to look for.
|
---|
1289 |
|
---|
1290 | Vern
|
---|
1291 |
|
---|
1292 |
|
---|
1293 | File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ
|
---|
1294 |
|
---|
1295 | unnamed-faq-75
|
---|
1296 | ==============
|
---|
1297 |
|
---|
1298 |
|
---|
1299 | To: jimmey@lexis-nexis.com (Jimmey Todd)
|
---|
1300 | Subject: Re: Flex performance question
|
---|
1301 | In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
|
---|
1302 | Date: Thu, 19 Feb 1998 15:42:25 PST
|
---|
1303 | From: Vern Paxson <vern>
|
---|
1304 |
|
---|
1305 | So there are several problems.
|
---|
1306 |
|
---|
1307 | First, to go fast, you want to match as much text as possible, which
|
---|
1308 | your scanners don't in the case that what they're scanning is *not*
|
---|
1309 | a <RN> tag. So you want a rule like:
|
---|
1310 |
|
---|
1311 | [^<]+
|
---|
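To see why this matters, compare how many tokens the scanner has to hand
back with and without such a rule. The C sketch below only counts
tokens; it is not a scanner, the two function names are hypothetical, and
the assumption that text outside tags would otherwise be matched one
character at a time is a guess about the rules in the question.

```c
#include <string.h>

/* One token per character: the cost when no rule matches long runs. */
static size_t tokens_per_char(const char *text)
{
    return strlen(text);
}

/* One token per run of non-'<' characters, the effect of a rule like
 * [^<]+ ; each '<' is still handed over as its own one-character token. */
static size_t tokens_per_run(const char *text)
{
    size_t tokens = 0;

    while (*text != '\0') {
        size_t run = strcspn(text, "<");
        if (run == 0)
            run = 1;        /* a lone '<' */
        text += run;
        tokens++;
    }
    return tokens;
}
```

For the 16-character input "hello <RN> world" the first function reports
16 tokens and the second 3. Each token costs a pass through the
scanner's per-match setup, so fewer, longer matches mean less overhead.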
1312 |
|
---|
1313 | Second, C++ scanners are particularly slow if they're interactive,
|
---|
1314 | which they are by default. Using -B speeds it up by a factor of 3-4
|
---|
1315 | on my workstation.
|
---|
1316 |
|
---|
1317 | Third, C++ scanners that use the istream interface are slow, because
|
---|
1318 | of how poorly implemented istreams are. I built two versions of
|
---|
1319 | the following scanner:
|
---|
1320 |
|
---|
1321 | %%
|
---|
1322 | .*\n
|
---|
1323 | .*
|
---|
1324 | %%
|
---|
1325 |
|
---|
1326 | and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
|
---|
1327 | The C++ istream version, using -B, takes 3.8 seconds.
|
---|
1328 |
|
---|
1329 | Vern
|
---|
1330 |
|
---|