flex.info-5

Last change on this file was 3031, checked in by bird, 18 years ago
flex 2.5.33.
File size: 50.0 KB

1	This is flex.info, produced by makeinfo version 4.5 from flex.texi.
2
3	INFO-DIR-SECTION Programming
4	START-INFO-DIR-ENTRY
5	* flex: (flex). Fast lexical analyzer generator (lex replacement).
6	END-INFO-DIR-ENTRY
7
8
9	The flex manual is placed under the same licensing conditions as the
10	rest of flex:
11
12	Copyright (C) 1990, 1997 The Regents of the University of California.
13	All rights reserved.
14
15	This code is derived from software contributed to Berkeley by Vern
16	Paxson.
17
18	The United States Government has rights in this work pursuant to
19	contract no. DE-AC03-76SF00098 between the United States Department of
20	Energy and the University of California.
21
22	Redistribution and use in source and binary forms, with or without
23	modification, are permitted provided that the following conditions are
24	met:
25
26	1. Redistributions of source code must retain the above copyright
27	notice, this list of conditions and the following disclaimer.
28
29	2. Redistributions in binary form must reproduce the above copyright
30	notice, this list of conditions and the following disclaimer in the
31	documentation and/or other materials provided with the
32	distribution.
33	Neither the name of the University nor the names of its contributors
34	may be used to endorse or promote products derived from this software
35	without specific prior written permission.
36
37	THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
38	WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
39	MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
40
41	File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ
42
43	How do I match any string not matched in the preceding rules?
44	=============================================================
45
46	One way to assign precedence is to place the more specific rules
47	first. If two rules would match the same input (same sequence of
48	characters), then the first rule listed in the `flex' input wins, e.g.,
49
50
51	%%
52	foo[a-zA-Z_]+ return FOO_ID;
53	bar[a-zA-Z_]+ return BAR_ID;
54	[a-zA-Z_]+ return GENERIC_ID;
55
56	Note that the rule `[a-zA-Z_]+' must come after the others. It
57	will match the same amount of text as the more specific rules, and in
58	that case the `flex' scanner will pick the first rule listed in your
59	scanner as the one to match.
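
The tie-breaking described above can be sketched in plain C. This is
only an illustration under assumptions, not code that `flex' generates:
the helpers `match_generic', `match_prefixed' and `pick_rule' are
hypothetical names, and each pattern is hand-coded rather than compiled
from a regular expression.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rule indices, in the order the rules appear in the scanner. */
enum { FOO_ID = 0, BAR_ID = 1, GENERIC_ID = 2, NO_MATCH = -1 };

/* Length of the leading run matching [a-zA-Z_]+ (0 if there is none). */
static size_t match_generic(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Length matched by prefix[a-zA-Z_]+ (0 if the rule does not match). */
static size_t match_prefixed(const char *prefix, const char *s)
{
    size_t plen = strlen(prefix);
    size_t rest;

    if (strncmp(s, prefix, plen) != 0)
        return 0;
    rest = match_generic(s + plen);
    return rest ? plen + rest : 0;
}

/* Longest match wins; on a tie the rule listed first wins. */
int pick_rule(const char *s)
{
    size_t len[3];
    size_t best_len = 0;
    int best = NO_MATCH;
    int i;

    assert(s != NULL);
    len[FOO_ID] = match_prefixed("foo", s);
    len[BAR_ID] = match_prefixed("bar", s);
    len[GENERIC_ID] = match_generic(s);

    for (i = 0; i < 3; i++) {
        /* Strict '>' keeps the earlier rule when lengths are equal. */
        if (len[i] > best_len) {
            best_len = len[i];
            best = i;
        }
    }
    return best;
}
```

Because the comparison is strict, moving the generic rule to index 0
would make it win every tie, which is why it has to be listed last.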
60
61
62	File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ
63
64	I am trying to port code from AT&T lex that uses yysptr and yysbuf.
65	===================================================================
66
67	Those are internal variables pointing into the AT&T scanner's input
68	buffer. I imagine they're being manipulated in user versions of the
69	`input()' and `unput()' functions. If so, what you need to do is
70	analyze those functions to figure out what they're doing, and then
71	replace `input()' with an appropriate definition of `YY_INPUT'. You
72	shouldn't need to (and must not) replace `flex''s `unput()' function.
73
74
75	File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ
76
77	Is there a way to make flex treat NULL like a regular character?
78	================================================================
79
80	Yes, `\0' and `\x00' should both do the trick. Perhaps you have an
81	ancient version of `flex'. The latest release is version 2.5.33.
82
83
84	File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesnt flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ
85
86	Whenever flex can not match the input it says "flex scanner jammed".
87	====================================================================
88
89	You need to add a rule that matches the otherwise-unmatched text.
90	e.g.,
91
92
93	%option yylineno
94	%%
95	[[a bunch of rules here]]
96
97	. printf("bad input character '%s' at line %d\n", yytext, yylineno);
98
99	See `%option default' for more information.
100
101
102	File: flex.info, Node: Why doesnt flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ
103
104	Why doesn't flex have non-greedy operators like perl does?
105	==========================================================
106
107	A DFA can do a non-greedy match by stopping the first time it enters
108	an accepting state, instead of consuming input until it determines that
109	no further matching is possible (a "jam" state). This is actually
110	easier to implement than longest leftmost match (which flex does).
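
The difference between the two strategies can be sketched as follows.
This is a minimal illustration, not `flex' internals: the hand-written
DFA (for the made-up pattern `ab*') and the function names
`match_shortest' and `match_longest' are assumptions chosen for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DFA for ab*: state 0 is the start state, state 1 is
   accepting, and -1 stands for the "jam" state. */
static int step(int state, char c)
{
    if (state == 0 && c == 'a')
        return 1;
    if (state == 1 && c == 'b')
        return 1;
    return -1;
}

static int accepting(int state)
{
    return state == 1;
}

/* Non-greedy: stop the first time an accepting state is entered. */
size_t match_shortest(const char *s)
{
    int st = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        st = step(st, s[i]);
        if (st < 0)
            return 0;
        if (accepting(st))
            return i + 1;
    }
    return 0;
}

/* Longest match: keep going until the DFA jams, remembering the
   position of the last accepting state seen. */
size_t match_longest(const char *s)
{
    int st = 0;
    size_t last = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        st = step(st, s[i]);
        if (st < 0)
            break;
        if (accepting(st))
            last = i + 1;
    }
    return last;
}
```

The longest-match version needs the extra `last' bookkeeping, which is
the sense in which the non-greedy version is simpler to implement.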
111
112	But it's also much less useful than longest leftmost match. In
113	general, when you find yourself wishing for non-greedy matching, that's
114	usually a sign that you're trying to make the scanner do some parsing.
115	That's generally the wrong approach, since it lacks the power to do a
116	decent job. Better is to either introduce a separate parser, or to
117	split the scanner into multiple scanners using (exclusive) start
118	conditions.
119
120	You might have a separate start state once you've seen the `BEGIN'.
121	In that state, you might then have a regex that will match `END' (to
122	kick you out of the state), and perhaps `(.|\n)' to get a single
123	character within the chunk ...
124
125	This approach also has much better error-reporting properties.
126
127
128	File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesnt flex have non-greedy operators like perl does?, Up: FAQ
129
130	Memory leak - 16386 bytes allocated by malloc.
131	==============================================
132
133	UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
134	you did not call `yylex_destroy()'. If you are using an earlier version
135	of `flex', then read on.
136
137	The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the
138	read-buffer, and about 40 for `struct yy_buffer_state' (depending upon
139	alignment). The leak is in the non-reentrant C scanner only (NOT in the
140	reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
141	when you are done, the buffer is never freed.
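
The arithmetic behind those figures can be written out as a small C
sketch. The function names are hypothetical, and the 40-byte figure for
`struct yy_buffer_state' is only the approximate value quoted above; the
real size depends on the platform and the `flex' version.

```c
#include <assert.h>
#include <stddef.h>

/* Read buffer size as quoted in the text: twice the read size plus
   two extra bytes. */
size_t read_buffer_bytes(size_t read_buf_size)
{
    return read_buf_size * 2 + 2;
}

/* Rough total: the read buffer plus the buffer-state structure. */
size_t leak_estimate(size_t read_buf_size, size_t state_struct_bytes)
{
    assert(read_buf_size > 0);
    return read_buffer_bytes(read_buf_size) + state_struct_bytes;
}
```

With a read size of 8192 the buffer alone accounts for the 16386 bytes
in the title, and adding roughly 40 bytes gives the 16426 figure.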
142
143	However, the leak won't multiply since the buffer is reused no
144	matter how many times you call `yylex()'.
145
146	If you want to reclaim the memory when you are completely done
147	scanning, then you might try this:
148
149
150	/* For non-reentrant C scanner only. */
151	yy_delete_buffer(YY_CURRENT_BUFFER);
152	yy_init = 1;
153
154	Note: `yy_init' is an "internal variable", and hasn't been tested in
155	this situation. It is possible that some other globals may need
156	resetting as well.
157
158
159	File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ
160
161	How do I track the byte offset for lseek()?
162	===========================================
163
164
165	> We thought that it would be possible to have this number through the
166	> evaluation of the following expression:
167	>
168	> seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
169
170	While this is the right idea, it has two problems. The first is that
171	it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
172	during an invocation of `YY_INPUT' (or that your input source will
173	return less even though `YY_READ_BUF_SIZE' bytes were requested). The
174	second problem is that when refilling its internal buffer, `flex' keeps
175	some characters from the previous buffer (because usually it's in the
176	middle of a match, and needs those characters to construct `yytext' for
177	the match once it's done). Because of this, `yy_c_buf_p -
178	YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
179	already read from the current buffer.
180
181	An alternative solution is to count the number of characters you've
182	matched since starting to scan. This can be done by using
183	`YY_USER_ACTION'. For example,
184
185
186	#define YY_USER_ACTION num_chars += yyleng;
187
188	(You need to be careful to update your bookkeeping if you use
189	`yymore()', `yyless()', `unput()', or `input()'.)
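
The extra bookkeeping might look like the following C sketch. None of
these helpers exist in `flex'; `struct offset_tracker' and the `on_...'
functions are hypothetical, and the sketch assumes that characters
pushed back with `unput()' are ones that were originally read from the
input.

```c
#include <assert.h>

/* Hypothetical running count of input characters consumed so far. */
struct offset_tracker {
    long num_chars;
};

/* What YY_USER_ACTION does in the example: add yyleng per match. */
void on_match(struct offset_tracker *t, int yyleng)
{
    assert(t != 0 && yyleng >= 0);
    t->num_chars += yyleng;
}

/* yyless(n) returns all but the first n characters of the token to
   the input; they will be matched, and counted, again later. */
void on_yyless(struct offset_tracker *t, int yyleng, int n)
{
    assert(t != 0 && n >= 0 && n <= yyleng);
    t->num_chars -= yyleng - n;
}

/* unput(c) pushes one character back, to be counted on rescanning. */
void on_unput(struct offset_tracker *t)
{
    assert(t != 0);
    t->num_chars -= 1;
}

/* input() consumes one character that no rule will count.  Call this
   only when input() returned a character rather than end-of-file. */
void on_input(struct offset_tracker *t)
{
    assert(t != 0);
    t->num_chars += 1;
}

/* After yymore(), the next yyleng includes the current text (of
   length prev_len), which has already been counted once. */
void on_yymore(struct offset_tracker *t, int prev_len)
{
    assert(t != 0 && prev_len >= 0);
    t->num_chars -= prev_len;
}
```

Each helper would be called from the action that uses the matching
scanner routine, so the count stays in step with the true file offset.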
190
191
192	File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ
193
194	How do I use my own I/O classes in a C++ scanner?
195	=================================================
196
197	When the flex C++ scanning class rewrite finally happens, then this
198	sort of thing should become much easier.
199
200	You can do this by passing the various functions (such as
201	`LexerInput()' and `LexerOutput()') NULL `iostream*''s, and then
202	dealing with your own I/O classes surreptitiously (i.e., stashing them
203	in special member variables). This works because the only assumption
204	the lexer makes about the iostreams is that they are ultimately passed
205	to `LexerInput()' and `LexerOutput()', which then do whatever is
206	necessary with them.
207
208
209	File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ
210
211	How do I skip as many chars as possible?
212	========================================
213
214	How do I skip as many chars as possible - without interfering with
215	the other patterns?
216
217	In the example below, we want to skip over characters until we see
218	the phrase "endskip". The following will _NOT_ work correctly (do you
219	see why not?)
220
221
222	/* INCORRECT SCANNER */
223	%x SKIP
224	%%
225	<INITIAL>startskip BEGIN(SKIP);
226	...
227	<SKIP>"endskip" BEGIN(INITIAL);
228	<SKIP>.* ;
229
230	The problem is that the pattern .* will eat up the word "endskip."
231	The simplest (but slow) fix is:
232
233
234	<SKIP>"endskip" BEGIN(INITIAL);
235	<SKIP>. ;
236
237	The fix involves making the second rule match more, without making
238	it match "endskip" plus something else. So for example:
239
240
241	<SKIP>"endskip" BEGIN(INITIAL);
242	<SKIP>[^e]+ ;
243	<SKIP>. ;/* so you eat up e's, too */
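
To see why the last version executes fewer actions, here is a C sketch
that imitates the three `<SKIP>' rules by hand. It is an illustration
under assumptions, not generated scanner code: `skip_until_endskip' is a
hypothetical name, and the input is assumed to contain no newlines
(which `.' would not match).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Imitates the <SKIP> rules on s.  Returns the offset just past
   "endskip", or strlen(s) if it never appears.  *actions receives
   the number of rule actions that were executed. */
size_t skip_until_endskip(const char *s, size_t *actions)
{
    size_t pos = 0;

    assert(s != NULL && actions != NULL);
    *actions = 0;
    while (s[pos] != '\0') {
        (*actions)++;
        if (s[pos] != 'e') {
            /* <SKIP>[^e]+ : the whole run of non-'e' characters is
               the longest match at this position. */
            pos += strcspn(s + pos, "e");
        } else if (strncmp(s + pos, "endskip", 7) == 0) {
            /* <SKIP>"endskip" : longer than the one-character rule. */
            return pos + 7;
        } else {
            /* <SKIP>. : a lone 'e' that does not start "endskip". */
            pos += 1;
        }
    }
    return pos;
}
```

Every run of characters between two e's costs one action instead of one
action per character, which is where the speedup comes from.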
244
245
246	File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ
247
248	deleteme00
249	==========
250
251
252	QUESTION:
253	When was flex born?
254
255	Vern Paxson took over
256	the Software Tools lex project from Jef Poskanzer in 1982. At that point it
257	was written in Ratfor. Around 1987 or so, Paxson translated it into C, and
258	a legend was born :-).
259
260
261	File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ
262
263	Are certain equivalent patterns faster than others?
264	===================================================
265
266
267	To: Adoram Rogel <adoram@orna.hybridge.com>
268	Subject: Re: Flex 2.5.2 performance questions
269	In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
270	Date: Wed, 18 Sep 96 10:51:02 PDT
271	From: Vern Paxson <vern>
272
273	[Note, the most recent flex release is 2.5.4, which you can get from
274	ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]
275
276	> 1. Using the pattern
277	> ([Ff](oot)?)?[Nn](ote)?(\.)?
278	> instead of
279	> (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
280	> (in a very complicated flex program) caused the program to slow from
281	> 300K+/min to 100K/min (no other changes were done).
282
283	These two are not equivalent. For example, the first can match "footnote."
284	but the second can only match "footnote". This is almost certainly the
285	cause of the discrepancy - the slower scanner run is matching more tokens,
286	and/or having to do more backing up.
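
One way to check the claim is with the POSIX `regcomp()' and `regexec()'
functions. This is a sketch under assumptions: POSIX extended regular
expressions are a different dialect from `flex' patterns, but these two
patterns use only constructs common to both. The names `full_match',
`PAT_SHORT' and `PAT_LONG' are made up for the example, and the patterns
are wrapped in `^(...)$' so that the whole string must match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The two patterns from the question, anchored at both ends. */
const char *const PAT_SHORT =
    "^(([Ff](oot)?)?[Nn](ote)?(\\.)?)$";
const char *const PAT_LONG =
    "^(((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\\.)|((F|f)(N|n)(\\.)))$";

/* Returns 1 if text matches pattern, 0 if it does not, and -1 if the
   pattern could not be compiled. */
int full_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Calling `full_match()' with each pattern on "footnote." and on
"footnote" shows where the two patterns disagree.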
287
288	> 2. Which of these two are better: [Ff]oot or (F|f)oot ?
289
290	From a performance point of view, they're equivalent (modulo presumably
291	minor effects such as memory cache hit rates; and the presence of trailing
292	context, see below). From a space point of view, the first is slightly
293	preferable.
294
295	> 3. I have a pattern that look like this:
296	> pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
297	>
298	> running yet another complicated program that includes the following rule:
299	> <snext>{and}/{no4}{bb}{pats}
300	>
301	> gets me to "too complicated - over 32,000 states"...
302
303	I can't tell from this example whether the trailing context is variable-length
304	or fixed-length (it could be the latter if {and} is fixed-length). If it's
305	variable length, which flex -p will tell you, then this reflects a basic
306	performance problem, and if you can eliminate it by restructuring your
307	scanner, you will see significant improvement.
308
309	> so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
310	> 10 patterns and changed the rule to be 5 rules.
311	> This did compile, but what is the rule of thumb here ?
312
313	The rule is to avoid trailing context other than fixed-length, in which for
314	a/b, either the 'a' pattern or the 'b' pattern has a fixed length. Use
315	of the '|' operator automatically makes the pattern variable length, so in
316	this case '[Ff]oot' is preferred to '(F|f)oot'.
317
318	> 4. I changed a rule that looked like this:
319	> <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
320	>
321	> to the next 2 rules:
322	> <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
323	> <snext8>{and}{bb}/{ROMAN} { BEGIN...
324	>
325	> Again, I understand the using [^...] will cause a great performance loss
326
327	Actually, it doesn't cause any sort of performance loss. It's a surprising
328	fact about regular expressions that they always match in linear time
329	regardless of how complex they are.
330
331	> but are there any specific rules about it ?
332
333	See the "Performance Considerations" section of the man page, and also
334	the example in MISC/fastwc/.
335
336	Vern
337
338
339	File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ
340
341	Is backing up a big deal?
342	=========================
343
344
345	To: Adoram Rogel <adoram@hybridge.com>
346	Subject: Re: Flex 2.5.2 performance questions
347	In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
348	Date: Thu, 19 Sep 96 09:58:00 PDT
349	From: Vern Paxson <vern>
350
351	> a lot about the backing up problem.
352	> I believe that there lies my biggest problem, and I'll try to improve
353	> it.
354
355	Since you have variable trailing context, this is a bigger performance
356	problem. Fixing it is usually easier than fixing backing up, which in a
357	complicated scanner (yours seems to fit the bill) can be extremely
358	difficult to do correctly.
359
360	You also don't mention what flags you are using for your scanner.
361	-f makes a large speed difference, and -Cfe buys you nearly as much
362	speed but the resulting scanner is considerably smaller.
363
364	> I have an | operator in {and} and in {pats} so both of them are variable
365	> length.
366
367	-p should have reported this.
368
369	> Is changing one of them to fixed-length is enough ?
370
371	Yes.
372
373	> Is it possible to change the 32,000 states limit ?
374
375	Yes. I've appended instructions on how. Before you make this change,
376	though, you should think about whether there are ways to fundamentally
377	simplify your scanner - those are certainly preferable!
378
379	Vern
380
381	To increase the 32K limit (on a machine with 32 bit integers), you increase
382	the magnitude of the following in flexdef.h:
383
384	#define JAMSTATE -32766 /* marks a reference to the state that always jams */
385	#define MAXIMUM_MNS 31999
386	#define BAD_SUBSCRIPT -32767
387	#define MAX_SHORT 32700
388
389	Adding a 0 or two after each should do the trick.
390
391
392	File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ
393
394	Can I fake multi-byte character support?
395	========================================
396
397
398	To: Heeman_Lee@hp.com
399	Subject: Re: flex - multi-byte support?
400	In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
401	Date: Fri, 04 Oct 1996 11:42:18 PDT
402	From: Vern Paxson <vern>
403
404	> I assume as long as my *.l file defines the
405	> range of expected character code values (in octal format), flex will
406	> scan the file and read multi-byte characters correctly. But I have no
407	> confidence in this assumption.
408
409	Your lack of confidence is justified - this won't work.
410
411	Flex has in it a widespread assumption that the input is processed
412	one byte at a time. Fixing this is on the to-do list, but is involved,
413	so it won't happen any time soon. In the interim, the best I can suggest
414	(unless you want to try fixing it yourself) is to write your rules in
415	terms of pairs of bytes, using definitions in the first section:
416
417	X \xfe\xc2
418	...
419	%%
420	foo{X}bar found_foo_fe_c2_bar();
421
422	etc. Definitely a pain - sorry about that.
423
424	By the way, the email address you used for me is ancient, indicating you
425	have a very old version of flex. You can get the most recent, 2.5.4, from
426	ftp.ee.lbl.gov.
427
428	Vern
429
430
431	File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ
432
433	deleteme01
434	==========
435
436
437	To: moleary@primus.com
438	Subject: Re: Flex / Unicode compatibility question
439	In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
440	Date: Tue, 22 Oct 1996 11:06:13 PDT
441	From: Vern Paxson <vern>
442
443	Unfortunately flex at the moment has a widespread assumption within it
444	that characters are processed 8 bits at a time. I don't see any easy
445	fix for this (other than writing your rules in terms of double characters -
446	a pain). I also don't know of a wider lex, though you might try surfing
447	the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
448	toolkit (try searching say Alta Vista for "Purdue Compiler Construction
449	Toolkit").
450
451	Fixing flex to handle wider characters is on the long-term to-do list.
452	But since flex is a strictly spare-time project these days, this probably
453	won't happen for quite a while, unless someone else does it first.
454
455	Vern
456
457
458	File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ
459
460	Can you discuss some flex internals?
461	====================================
462
463
464	To: Johan Linde <jl@theophys.kth.se>
465	Subject: Re: translation of flex
466	In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
467	Date: Mon, 11 Nov 1996 10:33:50 PST
468	From: Vern Paxson <vern>
469
470	> I'm working for the Swedish team translating GNU program, and I'm currently
471	> working with flex. I have a few questions about some of the messages which
472	> I hope you can answer.
473
474	All of the things you're wondering about, by the way, concerning flex
475	internals - probably the only person who understands what they mean in
476	English is me! So I wouldn't worry too much about getting them right.
477	That said ...
478
479	> #: main.c:545
480	> msgid " %d protos created\n"
481	>
482	> Does proto mean prototype?
483
484	Yes - prototypes of state compression tables.
485
486	> #: main.c:539
487	> msgid " %d/%d (peak %d) template nxt-chk entries created\n"
488	>
489	> Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
490	> However, 'template next-check entries' doesn't make much sense to me. To be
491	> able to find a good translation I need to know a little bit more about it.
492
493	There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
494	scanner tables. It involves creating two pairs of tables. The first has
495	"base" and "default" entries, the second has "next" and "check" entries.
496	The "base" entry is indexed by the current state and yields an index into
497	the next/check table. The "default" entry gives what to do if the state
498	transition isn't found in next/check. The "next" entry gives the next
499	state to enter, but only if the "check" entry verifies that this entry is
500	correct for the current state. Flex creates templates of series of
501	next/check entries and then encodes differences from these templates as a
502	way to compress the tables.
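
The lookup described above can be sketched in C. This shows only the
textbook base/default/next/check scheme on a made-up three-state,
two-symbol automaton; it is not flex's generated code and leaves out the
template ("proto") step. All table contents and names are assumptions
for the example.

```c
#include <assert.h>

#define NUM_STATES  3
#define NUM_SYMBOLS 2
#define JAM         (-1)

/* State 0: symbol 0 -> 1, symbol 1 -> 2 (both stored explicitly).
   State 1: like state 0, except symbol 1 -> 0; only that difference
            is stored, and state 0 serves as its default.
   State 2: no transitions at all. */
static const int base_tbl[NUM_STATES] = { 0, 1, 2 };
static const int def_tbl[NUM_STATES]  = { JAM, 0, JAM };
static const int next_tbl[4]  = { 1, 2, 0, JAM };
static const int check_tbl[4] = { 0, 0, 1, JAM };

/* Follow "default" links until the "check" entry confirms that the
   slot belongs to the state being consulted, then use "next". */
int next_state(int state, int symbol)
{
    assert(state >= 0 && state < NUM_STATES);
    assert(symbol >= 0 && symbol < NUM_SYMBOLS);

    while (check_tbl[base_tbl[state] + symbol] != state) {
        state = def_tbl[state];
        if (state == JAM)
            return JAM;
    }
    return next_tbl[base_tbl[state] + symbol];
}
```

State 1 occupies one slot instead of two because its transition on
symbol 0 is found through the default link to state 0.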
503
504	> #: main.c:533
505	> msgid " %d/%d base-def entries created\n"
506	>
507	> The same problem here for 'base-def'.
508
509	See above.
510
511	Vern
512
513
514	File: flex.info, Node: unput() messes up yy_at_bol, Next: The \| operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ
515
516	unput() messes up yy_at_bol
517	===========================
518
519
520	To: Xinying Li <xli@npac.syr.edu>
521	Subject: Re: FLEX ?
522	In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
523	Date: Wed, 13 Nov 1996 19:51:54 PST
524	From: Vern Paxson <vern>
525
526	> "unput()" them to input flow, question occurs. If I do this after I scan
527	> a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
528	> means the carriage flag has gone.
529
530	You can control this by calling yy_set_bol(). It's described in the manual.
531
532	> And if in pre-reading it goes to the end of file, is anything done
533	> to control the end of curren buffer and end of file?
534
535	No, there's no way to put back an end-of-file.
536
537	> By the way I am using flex 2.5.2 and using the "-l".
538
539	The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
540	2.5.3. You can get it from ftp.ee.lbl.gov.
541
542	Vern
543
544
545	File: flex.info, Node: The \| operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ
546
547	The | operator is not doing what I want
548	=======================================
549
550
551	To: Alain.ISSARD@st.com
552	Subject: Re: Start condition with FLEX
553	In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
554	Date: Mon, 18 Nov 1996 10:41:34 PST
555	From: Vern Paxson <vern>
556
557	> I am not able to use the start condition scope and to use the | (OR) with
558	> rules having start conditions.
559
560	The problem is that if you use '|' as a regular expression operator, for
561	example "a|b" meaning "match either 'a' or 'b'", then it must not have
562	any blanks around it. If you instead want the special '|' action (which
563	from your scanner appears to be the case), which is a way of giving two
564	different rules the same action:
565
566	foo |
567	bar matched_foo_or_bar();
568
569	then '|' must be separated from the first rule by whitespace and must
570	be followed by a new line. You cannot write it as:
571
572	foo | bar matched_foo_or_bar();
573
574	even though you might think you could because yacc supports this syntax.
575	The reason for this unfortunate incompatibility is historical, but it's
576	unlikely to be changed.
577
578	Your problems with start condition scope are simply due to syntax errors
579	from your use of '|' later confusing flex.
580
581	Let me know if you still have problems.
582
583	Vern
584
585
586	File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The \| operator is not doing what I want, Up: FAQ
587
588	Why can't flex understand this variable trailing context pattern?
589	=================================================================
590
591
592	To: Gregory Margo <gmargo@newton.vip.best.com>
593	Subject: Re: flex-2.5.3 bug report
594	In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
595	Date: Sat, 23 Nov 1996 17:07:32 PST
596	From: Vern Paxson <vern>
597
598	> Enclosed is a lex file that "real" lex will process, but I cannot get
599	> flex to process it. Could you try it and maybe point me in the right direction?
600
601	Your problem is that some of the definitions in the scanner use the '/'
602	trailing context operator, and have it enclosed in ()'s. Flex does not
603	allow this operator to be enclosed in ()'s because doing so allows undefined
604	regular expressions such as "(a/b)+". So the solution is to remove the
605	parentheses. Note that you must also be building the scanner with the -l
606	option for AT&T lex compatibility. Without this option, flex automatically
607	encloses the definitions in parentheses.
608
609	Vern
610
611
612	File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ
613
614	The ^ operator isn't working
615	============================
616
617
618	To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
619	Subject: Re: Flex Bug ?
620	In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
621	Date: Tue, 26 Nov 1996 11:15:05 PST
622	From: Vern Paxson <vern>
623
624	> In my lexer code, i have the line :
625	> ^\*.* { }
626	>
627	> Thus all lines starting with an asterisk (*) are comment lines.
628	> This does not work !
629
630	I can't get this problem to reproduce - it works fine for me. Note
631	though that if what you have is slightly different:
632
633	COMMENT ^\*.*
634	%%
635	{COMMENT} { }
636
637	then it won't work, because flex pushes back macro definitions enclosed
638	in ()'s, so the rule becomes
639
640	(^\*.*) { }
641
642	and now that the '^' operator is not at the immediate beginning of the
643	line, it's interpreted as just a regular character. You can avoid this
644	behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
645
646	Vern
647
648
649	File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ
650
651	Trailing context is getting confused with trailing optional patterns
652	====================================================================
653
654
655	To: Adoram Rogel <adoram@hybridge.com>
656	Subject: Re: Flex 2.5.4 BOF ???
657	In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
658	Date: Wed, 27 Nov 1996 10:56:25 PST
659	From: Vern Paxson <vern>
660
661	> Organization(s)?/[a-z]
662	>
663	> This matched "Organizations" (looking in debug mode, the trailing s
664	> was matched with trailing context instead of the optional (s) in the
665	> end of the word.
666
667	That should only happen with lex. Flex can properly match this pattern.
668	(That might be what you're saying, I'm just not sure.)
669
670	> Is there a way to avoid this dangerous trailing context problem ?
671
672	Unfortunately, there's no easy way. On the other hand, I don't see why
673	it should be a problem. Lex's matching is clearly wrong, and I'd hope
674	that usually the intent remains the same as expressed with the pattern,
675	so flex's matching will be correct.
676
677	Vern
678
679
680	File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ
681
682	Is flex GNU or not?
683	===================
684
685
686	To: Cameron MacKinnon <mackin@interlog.com>
687	Subject: Re: Flex documentation bug
688	In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
689	Date: Sun, 01 Dec 1996 22:29:39 PST
690	From: Vern Paxson <vern>
691
692	> I'm not sure how or where to submit bug reports (documentation or
693	> otherwise) for the GNU project stuff ...
694
695	Well, strictly speaking flex isn't part of the GNU project. They just
696	distribute it because no one's written a decent GPL'd lex replacement.
697	So you should send bugs directly to me. Those sent to the GNU folks
698	sometimes find their way to me, but some may fall through the cracks.
699
700	> In GNU Info, under the section 'Start Conditions', and also in the man
701	> page (mine's dated April '95) is a nice little snippet showing how to
702	> parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
703	> size. Unfortunately, no overflow checking is ever done ...
704
705	This is already mentioned in the manual:
706
707	Finally, here's an example of how to match C-style quoted
708	strings using exclusive start conditions, including expanded
709	escape sequences (but not including checking for a string
710	that's too long):
711
712	The reason for not doing the overflow checking is that it will needlessly
713	clutter up an example whose main purpose is just to demonstrate how to
714	use flex.
715
716	The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
717
718	Vern
719
720
721	File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ
722
723	ERASEME53
724	=========
725
726
727	To: tsv@cs.UManitoba.CA
728	Subject: Re: Flex (reg)..
729	In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
730	Date: Thu, 06 Mar 1997 15:54:19 PST
731	From: Vern Paxson <vern>
732
733	> [:alpha:] ([:alnum:] | \\_)*
734
735	If your rule really has embedded blanks as shown above, then it won't
736	work, as the first blank delimits the rule from the action. (It wouldn't
737	even compile ...) You need instead:
738
739	[:alpha:]([:alnum:]|\\_)*
740
741	and that should work fine - there's no restriction on what can go inside
742	of ()'s except for the trailing context operator, '/'.
743
744	Vern
745
746
747	File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ
748
749	I need to scan if-then-else blocks and while loops
750	==================================================
751
752
753	To: "Mike Stolnicki" <mstolnic@ford.com>
754	Subject: Re: FLEX help
755	In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
756	Date: Fri, 30 May 1997 10:46:35 PDT
757	From: Vern Paxson <vern>
758
759	> We'd like to add "if-then-else", "while", and "for" statements to our
760	> language ...
761	> We've investigated many possible solutions. The one solution that seems
762	> the most reasonable involves knowing the position of a TOKEN in yyin.
763
764	I strongly advise you to instead build a parse tree (abstract syntax tree)
765	and loop over that instead. You'll find this has major benefits in keeping
766	your interpreter simple and extensible.
767
768	That said, the functionality you mention for get_position and set_position
769	has been on the to-do list for a while. As flex is a purely spare-time
770	project for me, no guarantees when this will be added (in particular, it
771	for sure won't be for many months to come).
772
773	Vern
774
775
776	File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ
777
778	ERASEME55
779	=========
780
781
782	To: Colin Paul Adams <colin@colina.demon.co.uk>
783	Subject: Re: Flex C++ classes and Bison
784	In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
785	Date: Fri, 15 Aug 1997 10:48:19 PDT
786	From: Vern Paxson <vern>
787
788	> #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
789	> *parm)
790	>
791	> I have been trying to get this to work as a C++ scanner, but it does
792	> not appear to be possible (warning that it matches no declarations in
793	> yyFlexLexer, or something like that).
794	>
795	> Is this supposed to be possible, or is it being worked on (I DID
796	> notice the comment that scanner classes are still experimental, so I'm
797	> not too hopeful)?
798
799	What you need to do is derive a subclass from yyFlexLexer that provides
800	the above yylex() method, squirrels away lvalp and parm into member
801	variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
802
803	Vern
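
A minimal sketch of the subclass approach described above, not a definitive
implementation. So that it compiles without flex or bison, it uses stand-ins
that are assumptions of this sketch: BaseLexer plays the role of yyFlexLexer
(which a real scanner would get from FlexLexer.h), and YYSTYPE and
parser_control are placeholder structs. ParserLexer and its accessors are
hypothetical names.

```cpp
// Stand-ins (assumptions) so the sketch compiles on its own. In a real
// project YYSTYPE comes from the parser, parser_control is the caller's
// own struct, and BaseLexer would be yyFlexLexer.
struct YYSTYPE { int ival; };
struct parser_control { int depth; };

class BaseLexer {
public:
    virtual ~BaseLexer() {}
    // Placeholder for the generated scanning routine.
    virtual int yylex() { return 42; }
};

class ParserLexer : public BaseLexer {
public:
    ParserLexer() : lvalp_(nullptr), parm_(nullptr) {}

    // Keep the zero-argument yylex() visible alongside the overload.
    using BaseLexer::yylex;

    // The signature the parser wants to call.
    int yylex(YYSTYPE* lvalp, parser_control* parm)
    {
        lvalp_ = lvalp;             // squirrel the arguments away ...
        parm_ = parm;
        return BaseLexer::yylex();  // ... then do the regular scanning
    }

    YYSTYPE* lvalp() const { return lvalp_; }
    parser_control* parm() const { return parm_; }

private:
    YYSTYPE* lvalp_;
    parser_control* parm_;
};
```

The parser would then call lexer.yylex(&lval, &control) on a ParserLexer
object, and code inside the class can reach the saved pointers through the
member variables.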
804
805
806	File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ
807
808	ERASEME56
809	=========
810
811
812	To: Mikael.Latvala@lmf.ericsson.se
813	Subject: Re: Possible mistake in Flex v2.5 document
814	In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
815	Date: Fri, 05 Sep 1997 10:01:54 PDT
816	From: Vern Paxson <vern>
817
818	> In that example you show how to count comment lines when using
819	> C style /* ... */ comments. My question is, shouldn't you take into
820	> account a scenario where end of a comment marker occurs inside
821	> character or string literals?
822
823	The scanner certainly needs to also scan character and string literals.
824	However it does that (there's an example in the man page for strings), the
825	lexer will recognize the beginning of the literal before it runs across the
826	embedded "/*". Consequently, it will finish scanning the literal before it
827	even considers the possibility of matching "/*".
828
829	Example:
830
831	'([^']*\|{ESCAPE_SEQUENCE})'
832
833	will match all the text between the ''s (inclusive). So the lexer
834	considers this as a token beginning at the first ', and doesn't even
835	attempt to match other tokens inside it.
836
837	I think this subtlety is not worth putting in the manual, as I suspect
838	it would confuse more people than it would enlighten.
839
840	Vern
841
842
843	File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ
844
845	ERASEME57
846	=========
847
848
849	To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
850	Subject: Re: flex limitations
851	In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
852	Date: Mon, 08 Sep 1997 11:38:08 PDT
853	From: Vern Paxson <vern>
854
855	> %%
856	> [a-zA-Z]+ /* skip a line */
857	> { printf("got %s\n", yytext); }
858	> %%
859
860	What version of flex are you using? If I feed this to 2.5.4, it complains:
861
862	"bug.l", line 5: EOF encountered inside an action
863	"bug.l", line 5: unrecognized rule
864	"bug.l", line 5: fatal parse error
865
866	Not the world's greatest error message, but it manages to flag the problem.
867
868	(With the introduction of start condition scopes, flex can't accommodate
869	an action on a separate line, since it's ambiguous with an indented rule.)
870
871	You can get 2.5.4 from ftp.ee.lbl.gov.
872
873	Vern
874
875
876	File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ
877
878	Is there a repository for flex scanners?
879	========================================
880
881	Not that we know of. You might try asking on comp.compilers.
882
883
884	File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ
885
886	How can I conditionally compile or preprocess my flex input file?
887	=================================================================
888
889	Flex doesn't have a preprocessor like C does. You might try using
890	m4, or the C preprocessor plus a sed script to clean up the result.
891
892
893	File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ
894
895	Where can I find grammars for lex and yacc?
896	===========================================
897
898	In the sources for flex and bison.
899
900
901	File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ
902
903	I get an end-of-buffer message for each character scanned.
904	==========================================================
905
906	This will happen if your LexerInput() function returns only one
907	character at a time, which can happen either if your scanner is
908	"interactive", or if the streams library on your platform always
909	returns 1 for yyin->gcount().
910
911	Solution: override LexerInput() with a version that returns whole
912	buffers.
913
914
915	File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ
916
917	unnamed-faq-62
918	==============
919
920
921	To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
922	Subject: Re: Flex maximums
923	In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
924	Date: Mon, 17 Nov 1997 17:16:15 PST
925	From: Vern Paxson <vern>
926
927	> I took a quick look into the flex-sources and altered some #defines in
928	> flexdefs.h:
929	>
930	> #define INITIAL_MNS 64000
931	> #define MNS_INCREMENT 1024000
932	> #define MAXIMUM_MNS 64000
933
934	The things to fix are to add a couple of zeroes to:
935
936	#define JAMSTATE -32766 /* marks a reference to the state that always jams */
937	#define MAXIMUM_MNS 31999
938	#define BAD_SUBSCRIPT -32767
939	#define MAX_SHORT 32700
940
941	and, if you get complaints about too many rules, make the following change too:
942
943	#define YY_TRAILING_MASK 0x200000
944	#define YY_TRAILING_HEAD_MASK 0x400000
945
946	- Vern
947
948
949	File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ
950
951	unnamed-faq-63
952	==============
953
954
955	To: jimmey@lexis-nexis.com (Jimmey Todd)
956	Subject: Re: FLEX question regarding istream vs ifstream
957	In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
958	Date: Mon, 15 Dec 1997 13:21:35 PST
959	From: Vern Paxson <vern>
960
961	> stdin_handle = YY_CURRENT_BUFFER;
962	> ifstream fin( "aFile" );
963	> yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
964	>
965	> What I'm wanting to do, is pass the contents of a file thru one set
966	> of rules and then pass stdin thru another set... It works great if, I
967	> don't use the C++ classes. But since everything else that I'm doing is
968	> in C++, I thought I'd be consistent.
969	>
970	> The problem is that 'yy_create_buffer' is expecting an istream* as its
971	> first argument (as stated in the man page). However, fin is a ifstream
972	> object. Any ideas on what I might be doing wrong? Any help would be
973	> appreciated. Thanks!!
974
975	You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
976	Then its type will be compatible with the expected istream*, because ifstream
977	is derived from istream.
978
979	Vern
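
A small sketch of the pointer conversion described above. The function
first_line() is a hypothetical stand-in for any routine that, like the C++
yy_create_buffer(), takes an istream*; it is not part of flex, and
read_back() exists only to give the example a file to open.

```cpp
#include <fstream>
#include <istream>
#include <string>

// Hypothetical stand-in for a function that expects an istream*.
// Returns the first line of the stream, or "" for a null pointer.
std::string first_line(std::istream* in)
{
    std::string line;
    if (in != nullptr)
        std::getline(*in, line);
    return line;
}

// Writes a one-line file, then reads it back through an ifstream.
std::string read_back(const char* path)
{
    {
        std::ofstream out(path);
        out << "hello\n";
    }   // out is flushed and closed here

    std::ifstream fin(path);
    // &fin has type ifstream*, which converts implicitly to istream*
    // because ifstream is publicly derived from istream.
    return first_line(&fin);
}
```

Passing fin itself instead of &fin would not compile, since an ifstream
object is not a pointer of any kind.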
980
981
982	File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ
983
984	unnamed-faq-64
985	==============
986
987
988	To: Enda Fadian <fadiane@piercom.ie>
989	Subject: Re: Question related to Flex man page?
990	In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
991	Date: Tue, 16 Dec 1997 14:17:09 PST
992	From: Vern Paxson <vern>
993
994	> Can you explain to me what is meant by a long-jump in relation to flex?
995
996	Using the longjmp() function while inside yylex() or a routine called by it.
997
998	> what is the flex activation frame.
999
1000	Just yylex()'s stack frame.
1001
1002	> As far as I can see yyrestart will bring me back to the start of the input
1003	> file and using flex++ is not really an option!
1004
1005	No, yyrestart() doesn't imply a rewind, even though its name might sound
1006	like it does. It tells the scanner to flush its internal buffers and
1007	start reading from the given file at its present location.
1008
1009	Vern
1010
1011
1012	File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ
1013
1014	unnamed-faq-65
1015	==============
1016
1017
1018	To: hassan@larc.info.uqam.ca (Hassan Alaoui)
1019	Subject: Re: Need urgent Help
1020	In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
1021	Date: Sun, 21 Dec 1997 21:30:46 PST
1022	From: Vern Paxson <vern>
1023
1024	> /usr/lib/yaccpar: In function `int yyparse()':
1025	> /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
1026	>
1027	> ld: Undefined symbol
1028	> _yylex
1029	> _yyparse
1030	> _yyin
1031
1032	This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
1033	the fix is to explicitly insert some 'extern "C"' statements for the
1034	corresponding routines/symbols.
1035
1036	Vern
1037
1038
1039	File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ
1040
1041	unnamed-faq-66
1042	==============
1043
1044
1045	To: mc0307@mclink.it
1046	Cc: gnu@prep.ai.mit.edu
1047	Subject: Re: [mc0307@mclink.it: Help request]
1048	In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
1049	Date: Sun, 21 Dec 1997 22:33:37 PST
1050	From: Vern Paxson <vern>
1051
1052	> This is my definition for float and integer types:
1053	> . . .
1054	> NZD [1-9]
1055	> ...
1056	> I've tested my program on other lex version (on UNIX Sun Solaris an HP
1057	> UNIX) and it work well, so I think that my definitions are correct.
1058	> There are any differences between Lex and Flex?
1059
1060	There are indeed differences, as discussed in the man page. The one
1061	you are probably running into is that when flex expands a name definition,
1062	it puts parentheses around the expansion, while lex does not. There's
1063	an example in the man page of how this can lead to different matching.
1064	Flex's behavior complies with the POSIX standard (or at least with the
1065	last POSIX draft I saw).
1066
1067	Vern
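
The effect of the added parentheses can be sketched with an illustrative
definition (chosen here, not the man page's own example): a definition
"AB a|b" used in the rule "x{AB}y". The code below uses std::regex purely as
a stand-in for the scanner's pattern matcher; flex_style() and lex_style()
are hypothetical names.

```cpp
#include <regex>
#include <string>

// flex expands {AB} with parentheses, giving the pattern x(a|b)y.
bool flex_style(const std::string& s)
{
    static const std::regex re("x(a|b)y");
    return std::regex_match(s, re);
}

// A lex that substitutes the definition text verbatim sees xa|by,
// which is the alternation of "xa" and "by".
bool lex_style(const std::string& s)
{
    static const std::regex re("xa|by");
    return std::regex_match(s, re);
}
```

With these patterns, "xay" is accepted only by the parenthesized form, while
"xa" and "by" are accepted only by the verbatim substitution.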
1068
1069
1070	File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ
1071
1072	unnamed-faq-67
1073	==============
1074
1075
1076	To: hassan@larc.info.uqam.ca (Hassan Alaoui)
1077	Subject: Re: Thanks
1078	In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
1079	Date: Mon, 22 Dec 1997 14:35:05 PST
1080	From: Vern Paxson <vern>
1081
1082	> Thank you very much for your help. I compile and link well with C++ while
1083	> declaring 'yylex ...' extern, But a little problem remains. I get a
1084	> segmentation default when executing ( I linked with lfl library) while it
1085	> works well when using LEX instead of flex. Do you have some ideas about the
1086	> reason for this ?
1087
1088	The one possible reason for this that comes to mind is if you've defined
1089	yytext as "extern char yytext[]" (which is what lex uses) instead of
1090	"extern char *yytext" (which is what flex uses). If it's not that, then
1091	I'm afraid I don't know what the problem might be.
1092
1093	Vern
1094
1095
1096	File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ
1097
1098	unnamed-faq-68
1099	==============
1100
1101
1102	To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
1103	Subject: Re: flex 2.5: c++ scanners & start conditions
1104	In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
1105	Date: Tue, 06 Jan 1998 19:19:30 PST
1106	From: Vern Paxson <vern>
1107
1108	> The problem is that when I do this (using %option c++) start
1109	> conditions seem to not apply.
1110
1111	The BEGIN macro modifies the yy_start variable. For C scanners, this
1112	is a static with scope visible through the whole file. For C++ scanners,
1113	it's a member variable, so it only has visible scope within a member
1114	function. Your lexbegin() routine is not a member function when you
1115	build a C++ scanner, so it's not modifying the correct yy_start. The
1116	diagnostic that indicates this is that you found you needed to add
1117	a declaration of yy_start in order to get your scanner to compile when
1118	using C++; instead, the correct fix is to make lexbegin() a member
1119	function (by deriving from yyFlexLexer).
1120
1121	Vern
1122
1123
1124	File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ
1125
1126	unnamed-faq-69
1127	==============
1128
1129
1130	To: "Boris Zinin" <boris@ippe.rssi.ru>
1131	Subject: Re: current position in flex buffer
1132	In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
1133	Date: Mon, 12 Jan 1998 12:03:15 PST
1134	From: Vern Paxson <vern>
1135
1136	> The problem is how to determine the current position in flex active
1137	> buffer when a rule is matched....
1138
1139	You will need to keep track of this explicitly, such as by redefining
1140	YY_USER_ACTION to count the number of characters matched.
1141
1142	The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
1143
1144	Vern
1145
1146
1147	File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ
1148
1149	unnamed-faq-70
1150	==============
1151
1152
1153	To: Bik.Dhaliwal@bis.org
1154	Subject: Re: Flex question
1155	In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
1156	Date: Tue, 27 Jan 1998 22:41:52 PST
1157	From: Vern Paxson <vern>
1158
1159	> That requirement involves knowing
1160	> the character position at which a particular token was matched
1161	> in the lexer.
1162
1163	The way you have to do this is by explicitly keeping track of where
1164	you are in the file, by counting the number of characters scanned
1165	for each token (available in yyleng). It may prove convenient to
1166	do this by redefining YY_USER_ACTION, as described in the manual.
1167
1168	Vern
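
A rough sketch of the bookkeeping described above. In a real scanner the
YY_USER_ACTION definition would go in the definitions section of the .l file
and the generated yylex() would run it before each rule's action. Here
token_offsets() is a hypothetical toy loop standing in for the generated
scanner, and chars_scanned is a name chosen for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Running count of characters consumed so far.
static long chars_scanned = 0;

// Executed before each matched rule's action; yyleng is the match length.
#define YY_USER_ACTION chars_scanned += yyleng;

// Toy stand-in for a generated scanner: splits the input into runs of
// whitespace and runs of non-whitespace, and records the start offset
// of every non-whitespace token.
std::vector<long> token_offsets(const std::string& input)
{
    std::vector<long> starts;
    chars_scanned = 0;
    std::size_t pos = 0;
    while (pos < input.size()) {
        const bool space =
            std::isspace(static_cast<unsigned char>(input[pos])) != 0;
        std::size_t end = pos;
        while (end < input.size() &&
               (std::isspace(static_cast<unsigned char>(input[end])) != 0) == space)
            ++end;
        const int yyleng = static_cast<int>(end - pos);
        YY_USER_ACTION
        // "Action": the token began yyleng characters before the count.
        if (!space)
            starts.push_back(chars_scanned - yyleng);
        pos = end;
    }
    return starts;
}
```

Because the count is updated before the action runs, the start of the
current token is the running count minus yyleng.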
1169
1170
1171	File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ
1172
1173	unnamed-faq-71
1174	==============
1175
1176
1177	To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
1178	Subject: Re: flex: how to control start condition from parser?
1179	In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
1180	Date: Tue, 27 Jan 1998 22:45:37 PST
1181	From: Vern Paxson <vern>
1182
1183	> It seems useful for the parser to be able to tell the lexer about such
1184	> context dependencies, because then they don't have to be limited to
1185	> local or sequential context.
1186
1187	One way to do this is to have the parser call a stub routine that's
1188	included in the scanner's .l file, and consequently that has access to
1189	BEGIN. The only ugliness is that the parser can't pass in the state
1190	it wants, because those aren't visible - but if you don't have many
1191	such states, then using a different set of names doesn't seem like
1192	too much of a burden.
1193
1194	While generating a .h file like you suggest is certainly cleaner,
1195	flex development has come to a virtual stand-still :-(, so a workaround
1196	like the above is much more pragmatic than waiting for a new feature.
1197
1198	Vern
1199
1200
1201	File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ
1202
1203	unnamed-faq-72
1204	==============
1205
1206
1207	To: Barbara Denny <denny@3com.com>
1208	Subject: Re: freebsd flex bug?
1209	In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
1210	Date: Fri, 30 Jan 1998 12:42:32 PST
1211	From: Vern Paxson <vern>
1212
1213	> lex.yy.c:1996: parse error before `='
1214
1215	This is the key, identifying this error. (It may help to pinpoint
1216	it by using flex -L, so it doesn't generate #line directives in its
1217	output.) I will bet you heavy money that you have a start condition
1218	name that is also a variable name, or something like that; flex spits
1219	out #define's for each start condition name, mapping them to a number,
1220	so you can wind up with:
1221
1222	%x foo
1223	%%
1224	...
1225	%%
1226	void bar()
1227	{
1228	int foo = 3;
1229	}
1230
1231	and the penultimate line will turn into "int 1 = 3" after C preprocessing,
1232	since flex will put "#define foo 1" in the generated scanner.
1233
1234	Vern
1235
1236
1237	File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ
1238
1239	unnamed-faq-73
1240	==============
1241
1242
1243	To: Maurice Petrie <mpetrie@infoscigroup.com>
1244	Subject: Re: Lost flex .l file
1245	In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
1246	Date: Mon, 02 Feb 1998 11:15:12 PST
1247	From: Vern Paxson <vern>
1248
1249	> I am curious as to
1250	> whether there is a simple way to backtrack from the generated source to
1251	> reproduce the lost list of tokens we are searching on.
1252
1253	In theory, it's straight-forward to go from the DFA representation
1254	back to a regular-expression representation - the two are isomorphic.
1255	In practice, a huge headache, because you have to unpack all the tables
1256	back into a single DFA representation, and then write a program to munch
1257	on that and translate it into an RE.
1258
1259	Sorry for the less-than-happy news ...
1260
1261	Vern
1262
1263
1264	File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ
1265
1266	unnamed-faq-74
1267	==============
1268
1269
1270	To: jimmey@lexis-nexis.com (Jimmey Todd)
1271	Subject: Re: Flex performance question
1272	In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
1273	Date: Thu, 19 Feb 1998 08:48:51 PST
1274	From: Vern Paxson <vern>
1275
1276	> What I have found, is that the smaller the data chunk, the faster the
1277	> program executes. This is the opposite of what I expected. Should this be
1278	> happening this way?
1279
1280	This is exactly what will happen if your input file has embedded NULs.
1281	From the man page:
1282
1283	A final note: flex is slow when matching NUL's, particularly
1284	when a token contains multiple NUL's. It's best to write
1285	rules which match short amounts of text if it's anticipated
1286	that the text will often include NUL's.
1287
1288	So that's the first thing to look for.
1289
1290	Vern
1291
1292
1293	File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ
1294
1295	unnamed-faq-75
1296	==============
1297
1298
1299	To: jimmey@lexis-nexis.com (Jimmey Todd)
1300	Subject: Re: Flex performance question
1301	In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
1302	Date: Thu, 19 Feb 1998 15:42:25 PST
1303	From: Vern Paxson <vern>
1304
1305	So there are several problems.
1306
1307	First, to go fast, you want to match as much text as possible, which
1308	your scanners don't in the case that what they're scanning is not
1309	a <RN> tag. So you want a rule like:
1310
1311	[^<]+
1312
1313	Second, C++ scanners are particularly slow if they're interactive,
1314	which they are by default. Using -B speeds it up by a factor of 3-4
1315	on my workstation.
1316
1317	Third, C++ scanners that use the istream interface are slow, because
1318	of how poorly implemented istream's are. I built two versions of
1319	the following scanner:
1320
1321	%%
1322	.*\n
1323	.*
1324	%%
1325
1326	and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
1327	The C++ istream version, using -B, takes 3.8 seconds.
1328
1329	Vern
1330
