source: trunk/essentials/sys-devel/flex/doc/flex.info-5

Last change on this file was 3031, checked in by bird, 18 years ago

flex 2.5.33.

1This is flex.info, produced by makeinfo version 4.5 from flex.texi.
2
3INFO-DIR-SECTION Programming
4START-INFO-DIR-ENTRY
5* flex: (flex). Fast lexical analyzer generator (lex replacement).
6END-INFO-DIR-ENTRY
7
8
9 The flex manual is placed under the same licensing conditions as the
10rest of flex:
11
12 Copyright (C) 1990, 1997 The Regents of the University of California.
13All rights reserved.
14
15 This code is derived from software contributed to Berkeley by Vern
16Paxson.
17
18 The United States Government has rights in this work pursuant to
19contract no. DE-AC03-76SF00098 between the United States Department of
20Energy and the University of California.
21
22 Redistribution and use in source and binary forms, with or without
23modification, are permitted provided that the following conditions are
24met:
25
26 1. Redistributions of source code must retain the above copyright
27 notice, this list of conditions and the following disclaimer.
28
29 2. Redistributions in binary form must reproduce the above copyright
30 notice, this list of conditions and the following disclaimer in the
31 documentation and/or other materials provided with the
32 distribution.
33 Neither the name of the University nor the names of its contributors
34may be used to endorse or promote products derived from this software
35without specific prior written permission.
36
37 THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
38WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
39MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
40
41File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ
42
43How do I match any string not matched in the preceding rules?
44=============================================================
45
   One way to assign precedence is to place the more specific rules
first. If two rules would match the same input (same sequence of
characters) then the first rule listed in the `flex' input wins. For
example:
49
50
     %%
     foo[a-zA-Z_]+    return FOO_ID;
     bar[a-zA-Z_]+    return BAR_ID;
     [a-zA-Z_]+       return GENERIC_ID;
55
56 Note that the rule `[a-zA-Z_]+' must come *after* the others. It
57will match the same amount of text as the more specific rules, and in
58that case the `flex' scanner will pick the first rule listed in your
59scanner as the one to match.
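
   The selection policy described above can be sketched in plain C.
This is only an illustration, not code taken from a generated scanner:
`pick_rule' and its `lens' array (how many characters each rule could
match at the current position, in the order the rules are listed) are
hypothetical names.

```c
#include <stddef.h>

/* lens[i] is the number of characters rule i can match at the current
   position (0 means the rule does not match).  Returns the index of the
   winning rule, or -1 if no rule matches. */
int pick_rule(const size_t *lens, size_t nrules)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < nrules; i++) {
        /* Strictly longer wins; on a tie the earlier rule is kept. */
        if (lens[i] > best_len) {
            best_len = lens[i];
            best = (int)i;
        }
    }
    return best;
}
```

   For the input "foobar" and the three rules above, the first and
third rules can both match six characters, so the sketch picks rule 0
(`FOO_ID'). If the generic rule were listed first, it would win the tie
instead and `FOO_ID' would never be returned.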
60
61
62File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ
63
64I am trying to port code from AT&T lex that uses yysptr and yysbuf.
65===================================================================
66
67 Those are internal variables pointing into the AT&T scanner's input
68buffer. I imagine they're being manipulated in user versions of the
69`input()' and `unput()' functions. If so, what you need to do is
70analyze those functions to figure out what they're doing, and then
71replace `input()' with an appropriate definition of `YY_INPUT'. You
72shouldn't need to (and must not) replace `flex''s `unput()' function.
73
74
75File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ
76
77Is there a way to make flex treat NULL like a regular character?
78================================================================
79
80 Yes, `\0' and `\x00' should both do the trick. Perhaps you have an
81ancient version of `flex'. The latest release is version 2.5.33.
82
83
84File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesnt flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ
85
86Whenever flex can not match the input it says "flex scanner jammed".
87====================================================================
88
89 You need to add a rule that matches the otherwise-unmatched text.
90e.g.,
91
92
     %option yylineno
     %%
     [[a bunch of rules here]]

     .  printf("bad input character '%s' at line %d\n", yytext, yylineno);
98
99 See `%option default' for more information.
100
101
102File: flex.info, Node: Why doesnt flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ
103
104Why doesn't flex have non-greedy operators like perl does?
105==========================================================
106
107 A DFA can do a non-greedy match by stopping the first time it enters
108an accepting state, instead of consuming input until it determines that
109no further matching is possible (a "jam" state). This is actually
110easier to implement than longest leftmost match (which flex does).
111
   But it's also much less useful than longest leftmost match. In
general, when you find yourself wishing for non-greedy matching, that's
usually a sign that you're trying to make the scanner do some parsing.
That's generally the wrong approach, since a scanner lacks the power to
do a decent job. It is better either to introduce a separate parser or
to split the scanner into multiple scanners using (exclusive) start
conditions.

   For example, to match a chunk of text delimited by `BEGIN' and `END',
you might enter a separate start state once you've seen the `BEGIN'.
In that state, you might then have a regex that will match `END' (to
kick you out of the state), and perhaps `(.|\n)' to get a single
character within the chunk ...
124
125 This approach also has much better error-reporting properties.
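
   The two matching disciplines contrasted at the start of this answer
can be sketched in a few lines of C. This is a hand-written toy, not
code that `flex' generates: the DFA recognizes the made-up pattern
`ab*', and `step' and `match_len' are hypothetical names.

```c
#include <stddef.h>

#define JAM (-1)

/* DFA for the pattern ab*: state 0 is the start state, state 1 is the
   only accepting state. */
static int step(int state, char c)
{
    if (state == 0 && c == 'a') return 1;
    if (state == 1 && c == 'b') return 1;
    return JAM;
}

/* Returns the length of the match at the start of s, or 0 if there is
   none.  With non_greedy set, scanning stops the first time an
   accepting state is entered; otherwise it continues until the DFA
   jams and the last accepting position is reported. */
size_t match_len(const char *s, int non_greedy)
{
    int state = 0;
    size_t last_accept = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        state = step(state, s[i]);
        if (state == JAM)
            break;
        if (state == 1) {
            last_accept = i + 1;
            if (non_greedy)
                break;
        }
    }
    return last_accept;
}
```

   On the input "abbbc" the longest-match variant returns 4, while the
non-greedy variant returns 1 because it stops as soon as the `a' has
been seen.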
126
127
128File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesnt flex have non-greedy operators like perl does?, Up: FAQ
129
130Memory leak - 16386 bytes allocated by malloc.
131==============================================
132
133 UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
134you did not call `yylex_destroy()'. If you are using an earlier version
135of `flex', then read on.
136
137 The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the
138read-buffer, and about 40 for `struct yy_buffer_state' (depending upon
139alignment). The leak is in the non-reentrant C scanner only (NOT in the
140reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
141when you are done, the buffer is never freed.
142
143 However, the leak won't multiply since the buffer is reused no
144matter how many times you call `yylex()'.
145
146 If you want to reclaim the memory when you are completely done
147scanning, then you might try this:
148
149
     /* For non-reentrant C scanner only. */
     yy_delete_buffer(YY_CURRENT_BUFFER);
     yy_init = 1;
153
154 Note: `yy_init' is an "internal variable", and hasn't been tested in
155this situation. It is possible that some other globals may need
156resetting as well.
157
158
159File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ
160
161How do I track the byte offset for lseek()?
162===========================================
163
164
165 > We thought that it would be possible to have this number through the
166 > evaluation of the following expression:
167 >
168 > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
169
170 While this is the right idea, it has two problems. The first is that
171it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
172during an invocation of `YY_INPUT' (or that your input source will
173return less even though `YY_READ_BUF_SIZE' bytes were requested). The
174second problem is that when refilling its internal buffer, `flex' keeps
175some characters from the previous buffer (because usually it's in the
176middle of a match, and needs those characters to construct `yytext' for
177the match once it's done). Because of this, `yy_c_buf_p -
178YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
179already read from the current buffer.
180
181 An alternative solution is to count the number of characters you've
182matched since starting to scan. This can be done by using
183`YY_USER_ACTION'. For example,
184
185
     #define YY_USER_ACTION num_chars += yyleng;
187
   (You need to be careful to update your bookkeeping if you use
`yymore()', `yyless()', `unput()', or `input()'.)
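
   One possible shape for that bookkeeping is sketched below in C. It
is an assumption-laden illustration rather than part of `flex': the
`scan_pos' type and the `pos_on_*' helpers are hypothetical, you would
have to call them yourself at the matching points in your actions, and
the sketch treats `unput()' as handing back one byte that had already
been counted.

```c
#include <stddef.h>

typedef struct {
    size_t offset;       /* bytes consumed from the input so far */
    size_t more_prefix;  /* bytes carried over by a pending yymore() */
} scan_pos;

/* Call once per match with the scanner's yyleng.  After yymore(), the
   next yyleng includes the earlier text, which was already counted. */
void pos_on_match(scan_pos *p, size_t len)
{
    p->offset += len - p->more_prefix;
    p->more_prefix = 0;
}

/* Call when an action invokes yymore(); len is the current yyleng. */
void pos_on_yymore(scan_pos *p, size_t len)
{
    p->more_prefix = len;
}

/* Call when an action invokes yyless(n): all but the first n of the
   len matched bytes go back to the input. */
void pos_on_yyless(scan_pos *p, size_t len, size_t n)
{
    p->offset -= len - n;
}

/* input() consumes one more byte; unput() is assumed to return one. */
void pos_on_input(scan_pos *p) { p->offset += 1; }
void pos_on_unput(scan_pos *p) { p->offset -= 1; }
```

   With a global `scan_pos pos', the per-match call could be hooked in
with something like `#define YY_USER_ACTION pos_on_match(&pos, yyleng);',
leaving the other helpers to be called from the individual actions that
use `yymore()', `yyless()', `unput()', or `input()'.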
190
191
192File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ
193
194How do I use my own I/O classes in a C++ scanner?
195=================================================
196
197 When the flex C++ scanning class rewrite finally happens, then this
198sort of thing should become much easier.
199
   You can do this by passing NULL `iostream*''s to the various functions
(such as `LexerInput()' and `LexerOutput()'), and then dealing with your
own I/O classes surreptitiously (i.e., stashing them in special member
variables). This works because the only assumption the lexer makes about
the iostreams is that they're ultimately passed to `LexerInput()' and
`LexerOutput()', which then do whatever is necessary with them.
207
208
209File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ
210
211How do I skip as many chars as possible?
212========================================
213
214 How do I skip as many chars as possible - without interfering with
215the other patterns?
216
217 In the example below, we want to skip over characters until we see
218the phrase "endskip". The following will _NOT_ work correctly (do you
219see why not?)
220
221
     /* INCORRECT SCANNER */
     %x SKIP
     %%
     <INITIAL>startskip   BEGIN(SKIP);
     ...
     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>.*             ;
229
230 The problem is that the pattern .* will eat up the word "endskip."
231The simplest (but slow) fix is:
232
233
     <SKIP>"endskip"  BEGIN(INITIAL);
     <SKIP>.          ;
236
   A faster fix involves making the second rule match more, without
making it match "endskip" plus something else. So for example:
239
240
     <SKIP>"endskip"  BEGIN(INITIAL);
     <SKIP>[^e]+      ;
     <SKIP>.          ;/* so you eat up e's, too */
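
   To see why the last version is faster, here is a rough C simulation
that counts how many skip actions each rule set runs before "endskip"
is reached. It is a sketch under simplifying assumptions, not scanner
code: `skip_actions' is a hypothetical helper, and the input is assumed
to contain no newlines (which `.' would not match).

```c
#include <stddef.h>
#include <string.h>

/* Counts the skip actions executed in the SKIP state before "endskip"
   is matched (or the input ends).  With use_run_rule set, the [^e]+
   rule is available; otherwise only the single-character rule is. */
int skip_actions(const char *s, int use_run_rule)
{
    int actions = 0;
    size_t i = 0;

    while (s[i] != '\0') {
        /* <SKIP>"endskip": the longest match whenever it applies. */
        if (strncmp(s + i, "endskip", 7) == 0)
            break;
        if (use_run_rule && s[i] != 'e') {
            /* <SKIP>[^e]+ swallows the whole run of non-e characters. */
            while (s[i] != '\0' && s[i] != 'e')
                i++;
        } else {
            /* <SKIP>. consumes exactly one character. */
            i++;
        }
        actions++;
    }
    return actions;
}
```

   For the input "aaa e bbb endskip" the single-character rule alone
needs 10 actions, while adding `[^e]+' brings that down to 3.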
244
245
246File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ
247
248deleteme00
249==========
250
251
252 QUESTION:
253 When was flex born?
254
255 Vern Paxson took over
256 the Software Tools lex project from Jef Poskanzer in 1982. At that point it
257 was written in Ratfor. Around 1987 or so, Paxson translated it into C, and
258 a legend was born :-).
259
260
261File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ
262
263Are certain equivalent patterns faster than others?
264===================================================
265
266
267 To: Adoram Rogel <adoram@orna.hybridge.com>
268 Subject: Re: Flex 2.5.2 performance questions
269 In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
270 Date: Wed, 18 Sep 96 10:51:02 PDT
271 From: Vern Paxson <vern>
272
273 [Note, the most recent flex release is 2.5.4, which you can get from
274 ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]
275
276 > 1. Using the pattern
277 > ([Ff](oot)?)?[Nn](ote)?(\.)?
278 > instead of
279 > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
280 > (in a very complicated flex program) caused the program to slow from
281 > 300K+/min to 100K/min (no other changes were done).
282
     These two are not equivalent. For example, the first can match "footnote."
     but the second can only match "footnote". This is almost certainly the
     cause of the discrepancy - the slower scanner run is matching more tokens,
     and/or having to do more backing up.
287
288 > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
289
290 From a performance point of view, they're equivalent (modulo presumably
291 minor effects such as memory cache hit rates; and the presence of trailing
292 context, see below). From a space point of view, the first is slightly
293 preferable.
294
295 > 3. I have a pattern that look like this:
296 > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
297 >
298 > running yet another complicated program that includes the following rule:
299 > <snext>{and}/{no4}{bb}{pats}
300 >
301 > gets me to "too complicated - over 32,000 states"...
302
303 I can't tell from this example whether the trailing context is variable-length
304 or fixed-length (it could be the latter if {and} is fixed-length). If it's
305 variable length, which flex -p will tell you, then this reflects a basic
306 performance problem, and if you can eliminate it by restructuring your
307 scanner, you will see significant improvement.
308
309 > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
310 > 10 patterns and changed the rule to be 5 rules.
311 > This did compile, but what is the rule of thumb here ?
312
     The rule is to avoid trailing context other than fixed-length, in which for
     a/b, either the 'a' pattern or the 'b' pattern has a fixed length. Use
     of the '|' operator automatically makes the pattern variable length, so in
     this case '[Ff]oot' is preferred to '(F|f)oot'.
317
318 > 4. I changed a rule that looked like this:
319 > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
320 >
321 > to the next 2 rules:
322 > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
323 > <snext8>{and}{bb}/{ROMAN} { BEGIN...
324 >
325 > Again, I understand the using [^...] will cause a great performance loss
326
327 Actually, it doesn't cause any sort of performance loss. It's a surprising
328 fact about regular expressions that they always match in linear time
329 regardless of how complex they are.
330
331 > but are there any specific rules about it ?
332
333 See the "Performance Considerations" section of the man page, and also
334 the example in MISC/fastwc/.
335
336 Vern
337
338
339File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ
340
341Is backing up a big deal?
342=========================
343
344
345 To: Adoram Rogel <adoram@hybridge.com>
346 Subject: Re: Flex 2.5.2 performance questions
347 In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
348 Date: Thu, 19 Sep 96 09:58:00 PDT
349 From: Vern Paxson <vern>
350
351 > a lot about the backing up problem.
352 > I believe that there lies my biggest problem, and I'll try to improve
353 > it.
354
355 Since you have variable trailing context, this is a bigger performance
356 problem. Fixing it is usually easier than fixing backing up, which in a
357 complicated scanner (yours seems to fit the bill) can be extremely
358 difficult to do correctly.
359
360 You also don't mention what flags you are using for your scanner.
361 -f makes a large speed difference, and -Cfe buys you nearly as much
362 speed but the resulting scanner is considerably smaller.
363
364 > I have an | operator in {and} and in {pats} so both of them are variable
365 > length.
366
367 -p should have reported this.
368
369 > Is changing one of them to fixed-length is enough ?
370
371 Yes.
372
373 > Is it possible to change the 32,000 states limit ?
374
375 Yes. I've appended instructions on how. Before you make this change,
376 though, you should think about whether there are ways to fundamentally
377 simplify your scanner - those are certainly preferable!
378
379 Vern
380
381 To increase the 32K limit (on a machine with 32 bit integers), you increase
382 the magnitude of the following in flexdef.h:
383
     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767
     #define MAX_SHORT 32700
388
389 Adding a 0 or two after each should do the trick.
390
391
392File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ
393
394Can I fake multi-byte character support?
395========================================
396
397
398 To: Heeman_Lee@hp.com
399 Subject: Re: flex - multi-byte support?
400 In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
401 Date: Fri, 04 Oct 1996 11:42:18 PDT
402 From: Vern Paxson <vern>
403
404 > I assume as long as my *.l file defines the
405 > range of expected character code values (in octal format), flex will
406 > scan the file and read multi-byte characters correctly. But I have no
407 > confidence in this assumption.
408
409 Your lack of confidence is justified - this won't work.
410
411 Flex has in it a widespread assumption that the input is processed
412 one byte at a time. Fixing this is on the to-do list, but is involved,
413 so it won't happen any time soon. In the interim, the best I can suggest
414 (unless you want to try fixing it yourself) is to write your rules in
415 terms of pairs of bytes, using definitions in the first section:
416
         X       \xfe\xc2
         ...
         %%
         foo{X}bar       found_foo_fe_c2_bar();
421
422 etc. Definitely a pain - sorry about that.
423
424 By the way, the email address you used for me is ancient, indicating you
425 have a very old version of flex. You can get the most recent, 2.5.4, from
426 ftp.ee.lbl.gov.
427
428 Vern
429
430
431File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ
432
433deleteme01
434==========
435
436
437 To: moleary@primus.com
438 Subject: Re: Flex / Unicode compatibility question
439 In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
440 Date: Tue, 22 Oct 1996 11:06:13 PDT
441 From: Vern Paxson <vern>
442
443 Unfortunately flex at the moment has a widespread assumption within it
444 that characters are processed 8 bits at a time. I don't see any easy
445 fix for this (other than writing your rules in terms of double characters -
446 a pain). I also don't know of a wider lex, though you might try surfing
447 the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
448 toolkit (try searching say Alta Vista for "Purdue Compiler Construction
449 Toolkit").
450
451 Fixing flex to handle wider characters is on the long-term to-do list.
452 But since flex is a strictly spare-time project these days, this probably
453 won't happen for quite a while, unless someone else does it first.
454
455 Vern
456
457
458File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ
459
460Can you discuss some flex internals?
461====================================
462
463
464 To: Johan Linde <jl@theophys.kth.se>
465 Subject: Re: translation of flex
466 In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
467 Date: Mon, 11 Nov 1996 10:33:50 PST
468 From: Vern Paxson <vern>
469
470 > I'm working for the Swedish team translating GNU program, and I'm currently
471 > working with flex. I have a few questions about some of the messages which
472 > I hope you can answer.
473
474 All of the things you're wondering about, by the way, concerning flex
475 internals - probably the only person who understands what they mean in
476 English is me! So I wouldn't worry too much about getting them right.
477 That said ...
478
479 > #: main.c:545
480 > msgid " %d protos created\n"
481 >
482 > Does proto mean prototype?
483
484 Yes - prototypes of state compression tables.
485
486 > #: main.c:539
487 > msgid " %d/%d (peak %d) template nxt-chk entries created\n"
488 >
489 > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
490 > However, 'template next-check entries' doesn't make much sense to me. To be
491 > able to find a good translation I need to know a little bit more about it.
492
493 There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
494 scanner tables. It involves creating two pairs of tables. The first has
495 "base" and "default" entries, the second has "next" and "check" entries.
496 The "base" entry is indexed by the current state and yields an index into
497 the next/check table. The "default" entry gives what to do if the state
498 transition isn't found in next/check. The "next" entry gives the next
499 state to enter, but only if the "check" entry verifies that this entry is
500 correct for the current state. Flex creates templates of series of
501 next/check entries and then encodes differences from these templates as a
502 way to compress the tables.
503
504 > #: main.c:533
505 > msgid " %d/%d base-def entries created\n"
506 >
507 > The same problem here for 'base-def'.
508
509 See above.
510
511 Vern
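
   The base/default/next/check lookup described in the reply above can
be sketched in C as follows. This is a toy with hand-built tables for a
made-up three-state DFA over a two-symbol alphabet, not the tables or
the code that `flex' emits, and it leaves out the template ("proto")
step entirely. The names `base', `def', `nxt', `check', and
`next_state' are chosen for this illustration.

```c
#define NO_STATE (-1)

/* Toy DFA: symbols 0 and 1, states 0..2.
   State 0: 0 -> 1, 1 -> 2   (stored in slots 0 and 1)
   State 1: same as state 0  (no slots of its own, defaults to 0)
   State 2: 0 -> 0           (slot 2); symbol 1 falls back to state 0 */
static const int base[3]  = { 0, 2, 2 };
static const int def[3]   = { NO_STATE, 0, 0 };
static const int nxt[4]   = { 1, 2, 0, NO_STATE };
static const int check[4] = { 0, 0, 2, NO_STATE };

/* Returns the state reached from `state' on `symbol', or NO_STATE if
   neither the state nor its chain of defaults has a transition. */
int next_state(int state, int symbol)
{
    int s = state;

    while (s != NO_STATE) {
        int slot = base[s] + symbol;
        if (check[slot] == s)   /* the slot really belongs to s */
            return nxt[slot];
        s = def[s];             /* otherwise consult the default state */
    }
    return NO_STATE;
}
```

   State 1 owns no entries at all: every lookup for it fails the
"check" test and falls through to its default, state 0, which is where
the space saving comes from.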
512
513
514File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ
515
516unput() messes up yy_at_bol
517===========================
518
519
520 To: Xinying Li <xli@npac.syr.edu>
521 Subject: Re: FLEX ?
522 In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
523 Date: Wed, 13 Nov 1996 19:51:54 PST
524 From: Vern Paxson <vern>
525
526 > "unput()" them to input flow, question occurs. If I do this after I scan
527 > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
528 > means the carriage flag has gone.
529
530 You can control this by calling yy_set_bol(). It's described in the manual.
531
532 > And if in pre-reading it goes to the end of file, is anything done
533 > to control the end of curren buffer and end of file?
534
535 No, there's no way to put back an end-of-file.
536
537 > By the way I am using flex 2.5.2 and using the "-l".
538
539 The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
540 2.5.3. You can get it from ftp.ee.lbl.gov.
541
542 Vern
543
544
545File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ
546
547The | operator is not doing what I want
548=======================================
549
550
551 To: Alain.ISSARD@st.com
552 Subject: Re: Start condition with FLEX
553 In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
554 Date: Mon, 18 Nov 1996 10:41:34 PST
555 From: Vern Paxson <vern>
556
557 > I am not able to use the start condition scope and to use the | (OR) with
558 > rules having start conditions.
559
560 The problem is that if you use '|' as a regular expression operator, for
561 example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
562 any blanks around it. If you instead want the special '|' *action* (which
563 from your scanner appears to be the case), which is a way of giving two
564 different rules the same action:
565
         foo |
         bar matched_foo_or_bar();
568
569 then '|' *must* be separated from the first rule by whitespace and *must*
570 be followed by a new line. You *cannot* write it as:
571
         foo | bar matched_foo_or_bar();
573
     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
     unlikely to be changed.
577
578 Your problems with start condition scope are simply due to syntax errors
579 from your use of '|' later confusing flex.
580
581 Let me know if you still have problems.
582
583 Vern
584
585
586File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ
587
588Why can't flex understand this variable trailing context pattern?
589=================================================================
590
591
592 To: Gregory Margo <gmargo@newton.vip.best.com>
593 Subject: Re: flex-2.5.3 bug report
594 In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
595 Date: Sat, 23 Nov 1996 17:07:32 PST
596 From: Vern Paxson <vern>
597
598 > Enclosed is a lex file that "real" lex will process, but I cannot get
599 > flex to process it. Could you try it and maybe point me in the right direction?
600
601 Your problem is that some of the definitions in the scanner use the '/'
602 trailing context operator, and have it enclosed in ()'s. Flex does not
603 allow this operator to be enclosed in ()'s because doing so allows undefined
604 regular expressions such as "(a/b)+". So the solution is to remove the
605 parentheses. Note that you must also be building the scanner with the -l
606 option for AT&T lex compatibility. Without this option, flex automatically
607 encloses the definitions in parentheses.
608
609 Vern
610
611
612File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ
613
614The ^ operator isn't working
615============================
616
617
618 To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
619 Subject: Re: Flex Bug ?
620 In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
621 Date: Tue, 26 Nov 1996 11:15:05 PST
622 From: Vern Paxson <vern>
623
624 > In my lexer code, i have the line :
625 > ^\*.* { }
626 >
627 > Thus all lines starting with an astrix (*) are comment lines.
628 > This does not work !
629
630 I can't get this problem to reproduce - it works fine for me. Note
631 though that if what you have is slightly different:
632
         COMMENT ^\*.*
         %%
         {COMMENT} { }
636
637 then it won't work, because flex pushes back macro definitions enclosed
638 in ()'s, so the rule becomes
639
         (^\*.*) { }
641
642 and now that the '^' operator is not at the immediate beginning of the
643 line, it's interpreted as just a regular character. You can avoid this
644 behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
645
646 Vern
647
648
649File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ
650
651Trailing context is getting confused with trailing optional patterns
652====================================================================
653
654
655 To: Adoram Rogel <adoram@hybridge.com>
656 Subject: Re: Flex 2.5.4 BOF ???
657 In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
658 Date: Wed, 27 Nov 1996 10:56:25 PST
659 From: Vern Paxson <vern>
660
661 > Organization(s)?/[a-z]
662 >
663 > This matched "Organizations" (looking in debug mode, the trailing s
664 > was matched with trailing context instead of the optional (s) in the
665 > end of the word.
666
667 That should only happen with lex. Flex can properly match this pattern.
668 (That might be what you're saying, I'm just not sure.)
669
670 > Is there a way to avoid this dangerous trailing context problem ?
671
672 Unfortunately, there's no easy way. On the other hand, I don't see why
673 it should be a problem. Lex's matching is clearly wrong, and I'd hope
674 that usually the intent remains the same as expressed with the pattern,
675 so flex's matching will be correct.
676
677 Vern
678
679
680File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ
681
682Is flex GNU or not?
683===================
684
685
686 To: Cameron MacKinnon <mackin@interlog.com>
687 Subject: Re: Flex documentation bug
688 In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
689 Date: Sun, 01 Dec 1996 22:29:39 PST
690 From: Vern Paxson <vern>
691
692 > I'm not sure how or where to submit bug reports (documentation or
693 > otherwise) for the GNU project stuff ...
694
     Well, strictly speaking flex isn't part of the GNU project. They just
     distribute it because no one's written a decent GPL'd lex replacement.
     So you should send bugs directly to me. Those sent to the GNU folks
     sometimes find their way to me, but some may fall between the cracks.
699
700 > In GNU Info, under the section 'Start Conditions', and also in the man
701 > page (mine's dated April '95) is a nice little snippet showing how to
702 > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
703 > size. Unfortunately, no overflow checking is ever done ...
704
705 This is already mentioned in the manual:
706
707 Finally, here's an example of how to match C-style quoted
708 strings using exclusive start conditions, including expanded
709 escape sequences (but not including checking for a string
710 that's too long):
711
712 The reason for not doing the overflow checking is that it will needlessly
713 clutter up an example whose main purpose is just to demonstrate how to
714 use flex.
715
716 The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
717
718 Vern
719
720
721File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ
722
723ERASEME53
724=========
725
726
727 To: tsv@cs.UManitoba.CA
728 Subject: Re: Flex (reg)..
729 In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
730 Date: Thu, 06 Mar 1997 15:54:19 PST
731 From: Vern Paxson <vern>
732
733 > [:alpha:] ([:alnum:] | \\_)*
734
735 If your rule really has embedded blanks as shown above, then it won't
736 work, as the first blank delimits the rule from the action. (It wouldn't
737 even compile ...) You need instead:
738
         [:alpha:]([:alnum:]|\\_)*
740
741 and that should work fine - there's no restriction on what can go inside
742 of ()'s except for the trailing context operator, '/'.
743
744 Vern
745
746
747File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ
748
749I need to scan if-then-else blocks and while loops
750==================================================
751
752
753 To: "Mike Stolnicki" <mstolnic@ford.com>
754 Subject: Re: FLEX help
755 In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
756 Date: Fri, 30 May 1997 10:46:35 PDT
757 From: Vern Paxson <vern>
758
759 > We'd like to add "if-then-else", "while", and "for" statements to our
760 > language ...
761 > We've investigated many possible solutions. The one solution that seems
762 > the most reasonable involves knowing the position of a TOKEN in yyin.
763
764 I strongly advise you to instead build a parse tree (abstract syntax tree)
765 and loop over that instead. You'll find this has major benefits in keeping
766 your interpreter simple and extensible.
767
     That said, the functionality you mention for get_position and set_position
     has been on the to-do list for a while. As flex is a purely spare-time
     project for me, there are no guarantees about when this will be added (in
     particular, it for sure won't be for many months to come).
772
773 Vern
774
775
776File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ
777
778ERASEME55
779=========
780
781
782 To: Colin Paul Adams <colin@colina.demon.co.uk>
783 Subject: Re: Flex C++ classes and Bison
784 In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
785 Date: Fri, 15 Aug 1997 10:48:19 PDT
786 From: Vern Paxson <vern>
787
788 > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
789 > *parm)
790 >
791 > I have been trying to get this to work as a C++ scanner, but it does
792 > not appear to be possible (warning that it matches no declarations in
793 > yyFlexLexer, or something like that).
794 >
795 > Is this supposed to be possible, or is it being worked on (I DID
796 > notice the comment that scanner classes are still experimental, so I'm
797 > not too hopeful)?
798
799 What you need to do is derive a subclass from yyFlexLexer that provides
800 the above yylex() method, squirrels away lvalp and parm into member
801 variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
802
803 Vern
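
A minimal sketch of the subclassing approach described in the reply. So that it compiles without flex installed, StandInFlexLexer is a hypothetical stand-in for yyFlexLexer; YYSTYPE, parser_control, and the class name ParserLexer are likewise illustrative assumptions. A real scanner would derive from yyFlexLexer (declared in FlexLexer.h) instead.

```cpp
// Hypothetical stand-ins so the sketch compiles on its own.
union YYSTYPE { int ival; };            // normally supplied by the parser
struct parser_control { int depth; };   // assumed user-defined parser state

// Stand-in for yyFlexLexer: only the no-argument yylex() matters here.
class StandInFlexLexer {
public:
    virtual ~StandInFlexLexer() {}
    virtual int yylex() { return 0; }   // pretend we hit end of input
};

class ParserLexer : public StandInFlexLexer {
public:
    ParserLexer() : lvalp_(nullptr), parm_(nullptr) {}

    // Entry point with the signature the parser expects.
    int yylex(YYSTYPE *lvalp, parser_control *parm) {
        lvalp_ = lvalp;                     // squirrel away the arguments
        parm_ = parm;
        return StandInFlexLexer::yylex();   // then do the regular scanning
    }

    YYSTYPE *lval() const { return lvalp_; }
    parser_control *control() const { return parm_; }

private:
    YYSTYPE *lvalp_;
    parser_control *parm_;
};
```

The base-class call is qualified because declaring the two-argument yylex() hides the inherited no-argument one inside the subclass.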
804
805
806File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ
807
808ERASEME56
809=========
810
811
812 To: Mikael.Latvala@lmf.ericsson.se
813 Subject: Re: Possible mistake in Flex v2.5 document
814 In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
815 Date: Fri, 05 Sep 1997 10:01:54 PDT
816 From: Vern Paxson <vern>
817
818 > In that example you show how to count comment lines when using
819 > C style /* ... */ comments. My question is, shouldn't you take into
820 > account a scenario where end of a comment marker occurs inside
821 > character or string literals?
822
823 The scanner certainly needs to also scan character and string literals.
824 However it does that (there's an example in the man page for strings), the
825 lexer will recognize the beginning of the literal before it runs across the
826 embedded "/*". Consequently, it will finish scanning the literal before it
827 even considers the possibility of matching "/*".
828
829 Example:
830
831 '([^']*|{ESCAPE_SEQUENCE})'
832
833 will match all the text between the ''s (inclusive). So the lexer
834 considers this as a token beginning at the first ', and doesn't even
835 attempt to match other tokens inside it.
836
837 I think this subtlety is not worth putting in the manual, as I suspect
838 it would confuse more people than it would enlighten.
839
840 Vern
841
842
843File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ
844
845ERASEME57
846=========
847
848
849 To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
850 Subject: Re: flex limitations
851 In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
852 Date: Mon, 08 Sep 1997 11:38:08 PDT
853 From: Vern Paxson <vern>
854
855 > %%
856 > [a-zA-Z]+ /* skip a line */
857 > { printf("got %s\n", yytext); }
858 > %%
859
860 What version of flex are you using? If I feed this to 2.5.4, it complains:
861
862 "bug.l", line 5: EOF encountered inside an action
863 "bug.l", line 5: unrecognized rule
864 "bug.l", line 5: fatal parse error
865
866 Not the world's greatest error message, but it manages to flag the problem.
867
868 (With the introduction of start condition scopes, flex can't accommodate
869 an action on a separate line, since it's ambiguous with an indented rule.)
870
871 You can get 2.5.4 from ftp.ee.lbl.gov.
872
873 Vern
874
875
876File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ
877
878Is there a repository for flex scanners?
879========================================
880
881 Not that we know of. You might try asking on comp.compilers.
882
883
884File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ
885
886How can I conditionally compile or preprocess my flex input file?
887=================================================================
888
889 Flex doesn't have a preprocessor like C does. You might try using
890m4, or the C preprocessor plus a sed script to clean up the result.
891
892
893File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ
894
895Where can I find grammars for lex and yacc?
896===========================================
897
898 In the sources for flex and bison.
899
900
901File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ
902
903I get an end-of-buffer message for each character scanned.
904==========================================================
905
906 This will happen if your LexerInput() function returns only one
907character at a time, which can happen either if your scanner is
908"interactive", or if the streams library on your platform always
909returns 1 for yyin->gcount().
910
911 Solution: override LexerInput() with a version that returns whole
912buffers.
913
914
915File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ
916
917unnamed-faq-62
918==============
919
920
921 To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
922 Subject: Re: Flex maximums
923 In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
924 Date: Mon, 17 Nov 1997 17:16:15 PST
925 From: Vern Paxson <vern>
926
927 > I took a quick look into the flex-sources and altered some #defines in
928 > flexdefs.h:
929 >
930 > #define INITIAL_MNS 64000
931 > #define MNS_INCREMENT 1024000
932 > #define MAXIMUM_MNS 64000
933
934 The things to fix are to add a couple of zeroes to:
935
936 #define JAMSTATE -32766 /* marks a reference to the state that always jams */
937 #define MAXIMUM_MNS 31999
938 #define BAD_SUBSCRIPT -32767
939 #define MAX_SHORT 32700
940
941 and, if you get complaints about too many rules, make the following change too:
942
943 #define YY_TRAILING_MASK 0x200000
944 #define YY_TRAILING_HEAD_MASK 0x400000
945
946 - Vern
947
948
949File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ
950
951unnamed-faq-63
952==============
953
954
955 To: jimmey@lexis-nexis.com (Jimmey Todd)
956 Subject: Re: FLEX question regarding istream vs ifstream
957 In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
958 Date: Mon, 15 Dec 1997 13:21:35 PST
959 From: Vern Paxson <vern>
960
961 > stdin_handle = YY_CURRENT_BUFFER;
962 > ifstream fin( "aFile" );
963 > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
964 >
965 > What I'm wanting to do, is pass the contents of a file thru one set
966 > of rules and then pass stdin thru another set... It works great if, I
967 > don't use the C++ classes. But since everything else that I'm doing is
968 > in C++, I thought I'd be consistent.
969 >
970 > The problem is that 'yy_create_buffer' is expecting an istream* as it's
971 > first argument (as stated in the man page). However, fin is a ifstream
972 > object. Any ideas on what I might be doing wrong? Any help would be
973 > appreciated. Thanks!!
974
975 You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
976 Then its type will be compatible with the expected istream*, because ifstream
977 is derived from istream.
978
979 Vern
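
The pointer conversion can be seen in isolation in the sketch below. first_word() is a hypothetical function that merely stands in for any call taking an istream*, such as yy_create_buffer() in a C++ scanner; an istringstream is used so that no file is needed, and an ifstream converts in the same way.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical function with an istream* parameter.
std::string first_word(std::istream *in) {
    std::string word;
    *in >> word;
    return word;
}

// Pass the address of a derived stream object where an istream* is expected.
std::string demo() {
    std::istringstream fin("alpha beta");
    return first_word(&fin);   // istringstream* converts to istream*
}
```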
980
981
982File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ
983
984unnamed-faq-64
985==============
986
987
988 To: Enda Fadian <fadiane@piercom.ie>
989 Subject: Re: Question related to Flex man page?
990 In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
991 Date: Tue, 16 Dec 1997 14:17:09 PST
992 From: Vern Paxson <vern>
993
994 > Can you explain to me what is ment by a long-jump in relation to flex?
995
996 Using the longjmp() function while inside yylex() or a routine called by it.
997
998 > what is the flex activation frame.
999
1000 Just yylex()'s stack frame.
1001
1002 > As far as I can see yyrestart will bring me back to the sart of the input
1003 > file and using flex++ isnot really an option!
1004
1005 No, yyrestart() doesn't imply a rewind, even though its name might sound
1006 like it does. It tells the scanner to flush its internal buffers and
1007 start reading from the given file at its present location.
1008
1009 Vern
1010
1011
1012File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ
1013
1014unnamed-faq-65
1015==============
1016
1017
1018 To: hassan@larc.info.uqam.ca (Hassan Alaoui)
1019 Subject: Re: Need urgent Help
1020 In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
1021 Date: Sun, 21 Dec 1997 21:30:46 PST
1022 From: Vern Paxson <vern>
1023
1024 > /usr/lib/yaccpar: In function `int yyparse()':
1025 > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
1026 >
1027 > ld: Undefined symbol
1028 > _yylex
1029 > _yyparse
1030 > _yyin
1031
1032 This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
1033 the fix is to explicitly insert some 'extern "C"' statements for the
1034 corresponding routines/symbols.
1035
1036 Vern
1037
1038
1039File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ
1040
1041unnamed-faq-66
1042==============
1043
1044
1045 To: mc0307@mclink.it
1046 Cc: gnu@prep.ai.mit.edu
1047 Subject: Re: [mc0307@mclink.it: Help request]
1048 In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
1049 Date: Sun, 21 Dec 1997 22:33:37 PST
1050 From: Vern Paxson <vern>
1051
1052 > This is my definition for float and integer types:
1053 > . . .
1054 > NZD [1-9]
1055 > ...
1056 > I've tested my program on other lex version (on UNIX Sun Solaris an HP
1057 > UNIX) and it work well, so I think that my definitions are correct.
1058 > There are any differences between Lex and Flex?
1059
1060 There are indeed differences, as discussed in the man page. The one
1061 you are probably running into is that when flex expands a name definition,
1062 it puts parentheses around the expansion, while lex does not. There's
1063 an example in the man page of how this can lead to different matching.
1064 Flex's behavior complies with the POSIX standard (or at least with the
1065 last POSIX draft I saw).
1066
1067 Vern
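
The effect of the added parentheses can be imitated outside flex. The sketch below expands a definition both ways and checks the results with std::regex; the definition [A-Z][A-Z0-9]* and the rule foo{NAME}? are chosen for illustration and are not the correspondent's definitions, and expand() and matches_all() are hypothetical helpers.

```cpp
#include <regex>
#include <string>

// Expand a name definition into a rule. With parenthesize set, the
// definition is wrapped in parentheses as flex does; otherwise it is
// pasted in as is, as the reply says lex does.
std::string expand(const std::string &prefix, const std::string &definition,
                   const std::string &suffix, bool parenthesize) {
    if (parenthesize)
        return prefix + "(" + definition + ")" + suffix;
    return prefix + definition + suffix;
}

// True if the whole input matches the pattern.
bool matches_all(const std::string &pattern, const std::string &input) {
    return std::regex_match(input, std::regex(pattern));
}
```

With the parenthesized expansion foo([A-Z][A-Z0-9]*)? the input "foo" matches, because the whole name is optional. With the plain expansion foo[A-Z][A-Z0-9]*? it does not, because the '?' attaches only to the last piece and [A-Z] stays mandatory. std::regex reads "*?" as a non-greedy star rather than as lex would, but the sketch relies only on [A-Z] remaining mandatory.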
1068
1069
1070File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ
1071
1072unnamed-faq-67
1073==============
1074
1075
1076 To: hassan@larc.info.uqam.ca (Hassan Alaoui)
1077 Subject: Re: Thanks
1078 In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
1079 Date: Mon, 22 Dec 1997 14:35:05 PST
1080 From: Vern Paxson <vern>
1081
1082 > Thank you very much for your help. I compile and link well with C++ while
1083 > declaring 'yylex ...' extern, But a little problem remains. I get a
1084 > segmentation default when executing ( I linked with lfl library) while it
1085 > works well when using LEX instead of flex. Do you have some ideas about the
1086 > reason for this ?
1087
1088 The one possible reason for this that comes to mind is if you've defined
1089 yytext as "extern char yytext[]" (which is what lex uses) instead of
1090 "extern char *yytext" (which is what flex uses). If it's not that, then
1091 I'm afraid I don't know what the problem might be.
1092
1093 Vern
1094
1095
1096File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ
1097
1098unnamed-faq-68
1099==============
1100
1101
1102 To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
1103 Subject: Re: flex 2.5: c++ scanners & start conditions
1104 In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
1105 Date: Tue, 06 Jan 1998 19:19:30 PST
1106 From: Vern Paxson <vern>
1107
1108 > The problem is that when I do this (using %option c++) start
1109 > conditions seem to not apply.
1110
1111 The BEGIN macro modifies the yy_start variable. For C scanners, this
1112 is a static with scope visible through the whole file. For C++ scanners,
1113 it's a member variable, so it only has visible scope within a member
1114 function. Your lexbegin() routine is not a member function when you
1115 build a C++ scanner, so it's not modifying the correct yy_start. The
1116 diagnostic that indicates this is that you found you needed to add
1117 a declaration of yy_start in order to get your scanner to compile when
1118 using C++; instead, the correct fix is to make lexbegin() a member
1119 function (by deriving from yyFlexLexer).
1120
1121 Vern
1122
1123
1124File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ
1125
1126unnamed-faq-69
1127==============
1128
1129
1130 To: "Boris Zinin" <boris@ippe.rssi.ru>
1131 Subject: Re: current position in flex buffer
1132 In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
1133 Date: Mon, 12 Jan 1998 12:03:15 PST
1134 From: Vern Paxson <vern>
1135
1136 > The problem is how to determine the current position in flex active
1137 > buffer when a rule is matched....
1138
1139 You will need to keep track of this explicitly, such as by redefining
1140 YY_USER_ACTION to count the number of characters matched.
1141
1142 The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
1143
1144 Vern
1145
1146
1147File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ
1148
1149unnamed-faq-70
1150==============
1151
1152
1153 To: Bik.Dhaliwal@bis.org
1154 Subject: Re: Flex question
1155 In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
1156 Date: Tue, 27 Jan 1998 22:41:52 PST
1157 From: Vern Paxson <vern>
1158
1159 > That requirement involves knowing
1160 > the character position at which a particular token was matched
1161 > in the lexer.
1162
1163 The way you have to do this is by explicitly keeping track of where
1164 you are in the file, by counting the number of characters scanned
1165 for each token (available in yyleng). It may prove convenient to
1166 do this by redefining YY_USER_ACTION, as described in the manual.
1167
1168 Vern
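
A sketch of the bookkeeping this implies. The variable names token_offset and next_offset and the helper simulate_match() are hypothetical, and yyleng is declared here only as a stand-in so that the example runs without flex. In a real scanner the YY_USER_ACTION definition would go in the definitions section of the .l file.

```cpp
// Stand-in for the scanner's yyleng so the sketch runs without flex.
static int yyleng = 0;

static long token_offset = 0;   // offset of the token just matched
static long next_offset = 0;    // offset just past that token

// Runs before each rule's action in a real scanner.
#define YY_USER_ACTION \
    token_offset = next_offset; \
    next_offset += yyleng;

// Pretend flex matched a token of the given length.
long simulate_match(int length) {
    yyleng = length;
    YY_USER_ACTION
    return token_offset;
}
```

Actions that change how much text is consumed, for example through yyless(), yymore(), unput(), input(), or REJECT, would likely need to adjust the count themselves.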
1169
1170
1171File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ
1172
1173unnamed-faq-71
1174==============
1175
1176
1177 To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
1178 Subject: Re: flex: how to control start condition from parser?
1179 In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
1180 Date: Tue, 27 Jan 1998 22:45:37 PST
1181 From: Vern Paxson <vern>
1182
1183 > It seems useful for the parser to be able to tell the lexer about such
1184 > context dependencies, because then they don't have to be limited to
1185 > local or sequential context.
1186
1187 One way to do this is to have the parser call a stub routine that's
1188 included in the scanner's .l file, and consequently that has access to
1189 BEGIN. The only ugliness is that the parser can't pass in the state
1190 it wants, because those aren't visible - but if you don't have many
1191 such states, then using a different set of names doesn't seem like
1192 too much of a burden.
1193
1194 While generating a .h file like you suggest is certainly cleaner,
1195 flex development has come to a virtual stand-still :-(, so a workaround
1196 like the above is much more pragmatic than waiting for a new feature.
1197
1198 Vern
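
A sketch of the stub-routine workaround. The state IN_QUOTE and the routine names are hypothetical, and BEGIN and INITIAL are defined here only as simplified stand-ins so that the example runs without flex. In a real scanner the stubs would sit in the user-code section of the .l file, where flex's own BEGIN and start-condition names are visible.

```cpp
// Simplified stand-ins for what flex provides inside the generated scanner.
enum { INITIAL = 0, IN_QUOTE = 1 };   // IN_QUOTE is a hypothetical state
static int current_state = INITIAL;
#define BEGIN(state) (current_state = (state))

// Stub routines for the parser to call. The parser sees only these
// names, never the start conditions themselves.
void scanner_enter_quote(void) { BEGIN(IN_QUOTE); }
void scanner_leave_quote(void) { BEGIN(INITIAL); }

// Helper for this sketch only, so the effect can be observed.
int scanner_state(void) { return current_state; }
```

One stub per state keeps the parser independent of the scanner's internal numbering, at the cost of a second set of names.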
1199
1200
1201File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ
1202
1203unnamed-faq-72
1204==============
1205
1206
1207 To: Barbara Denny <denny@3com.com>
1208 Subject: Re: freebsd flex bug?
1209 In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
1210 Date: Fri, 30 Jan 1998 12:42:32 PST
1211 From: Vern Paxson <vern>
1212
1213 > lex.yy.c:1996: parse error before `='
1214
1215 This is the key, identifying this error. (It may help to pinpoint
1216 it by using flex -L, so it doesn't generate #line directives in its
1217 output.) I will bet you heavy money that you have a start condition
1218 name that is also a variable name, or something like that; flex spits
1219 out #define's for each start condition name, mapping them to a number,
1220 so you can wind up with:
1221
1222 %x foo
1223 %%
1224 ...
1225 %%
1226 void bar()
1227 {
1228 int foo = 3;
1229 }
1230
1231 and the penultimate line will turn into "int 1 = 3" after C preprocessing,
1232 since flex will put "#define foo 1" in the generated scanner.
1233
1234 Vern
1235
1236
1237File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ
1238
1239unnamed-faq-73
1240==============
1241
1242
1243 To: Maurice Petrie <mpetrie@infoscigroup.com>
1244 Subject: Re: Lost flex .l file
1245 In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
1246 Date: Mon, 02 Feb 1998 11:15:12 PST
1247 From: Vern Paxson <vern>
1248
1249 > I am curious as to
1250 > whether there is a simple way to backtrack from the generated source to
1251 > reproduce the lost list of tokens we are searching on.
1252
1253 In theory, it's straight-forward to go from the DFA representation
1254 back to a regular-expression representation - the two are isomorphic.
1255 In practice, a huge headache, because you have to unpack all the tables
1256 back into a single DFA representation, and then write a program to munch
1257 on that and translate it into an RE.
1258
1259 Sorry for the less-than-happy news ...
1260
1261 Vern
1262
1263
1264File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ
1265
1266unnamed-faq-74
1267==============
1268
1269
1270 To: jimmey@lexis-nexis.com (Jimmey Todd)
1271 Subject: Re: Flex performance question
1272 In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
1273 Date: Thu, 19 Feb 1998 08:48:51 PST
1274 From: Vern Paxson <vern>
1275
1276 > What I have found, is that the smaller the data chunk, the faster the
1277 > program executes. This is the opposite of what I expected. Should this be
1278 > happening this way?
1279
1280 This is exactly what will happen if your input file has embedded NULs.
1281 From the man page:
1282
1283 A final note: flex is slow when matching NUL's, particularly
1284 when a token contains multiple NUL's. It's best to write
1285 rules which match short amounts of text if it's anticipated
1286 that the text will often include NUL's.
1287
1288 So that's the first thing to look for.
1289
1290 Vern
1291
1292
1293File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ
1294
1295unnamed-faq-75
1296==============
1297
1298
1299 To: jimmey@lexis-nexis.com (Jimmey Todd)
1300 Subject: Re: Flex performance question
1301 In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
1302 Date: Thu, 19 Feb 1998 15:42:25 PST
1303 From: Vern Paxson <vern>
1304
1305 So there are several problems.
1306
1307 First, to go fast, you want to match as much text as possible, which
1308 your scanners don't in the case that what they're scanning is *not*
1309 a <RN> tag. So you want a rule like:
1310
1311 [^<]+
1312
1313 Second, C++ scanners are particularly slow if they're interactive,
1314 which they are by default. Using -B speeds it up by a factor of 3-4
1315 on my workstation.
1316
1317 Third, C++ scanners that use the istream interface are slow, because
1318 of how poorly implemented istream's are. I built two versions of
1319 the following scanner:
1320
1321 %%
1322 .*\n
1323 .*
1324 %%
1325
1326 and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
1327 The C++ istream version, using -B, takes 3.8 seconds.
1328
1329 Vern
1330