flex.info-2@ 3032

Last change on this file since 3032 was 3031, checked in by bird, 18 years ago
flex 2.5.33.
File size: 50.9 KB

1	This is flex.info, produced by makeinfo version 4.5 from flex.texi.
2
3	INFO-DIR-SECTION Programming
4	START-INFO-DIR-ENTRY
5	* flex: (flex). Fast lexical analyzer generator (lex replacement).
6	END-INFO-DIR-ENTRY
7
8
9	The flex manual is placed under the same licensing conditions as the
10	rest of flex:
11
12	Copyright (C) 1990, 1997 The Regents of the University of California.
13	All rights reserved.
14
15	This code is derived from software contributed to Berkeley by Vern
16	Paxson.
17
18	The United States Government has rights in this work pursuant to
19	contract no. DE-AC03-76SF00098 between the United States Department of
20	Energy and the University of California.
21
22	Redistribution and use in source and binary forms, with or without
23	modification, are permitted provided that the following conditions are
24	met:
25
26	1. Redistributions of source code must retain the above copyright
27	notice, this list of conditions and the following disclaimer.
28
29	2. Redistributions in binary form must reproduce the above copyright
30	notice, this list of conditions and the following disclaimer in the
31	documentation and/or other materials provided with the
32	distribution.
33	Neither the name of the University nor the names of its contributors
34	may be used to endorse or promote products derived from this software
35	without specific prior written permission.
36
37	THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
38	WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
39	MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
40
41	File: flex.info, Node: Start Conditions, Next: Multiple Input Buffers, Prev: Generated Scanner, Up: Top
42
43	Start Conditions
44	****************
45
46	`flex' provides a mechanism for conditionally activating rules. Any
47	rule whose pattern is prefixed with `<sc>' will only be active when the
48	scanner is in the "start condition" named `sc'. For example,
49
50
51	<STRING>[^"]* { /* eat up the string body ... */
52	...
53	}
54
55	will be active only when the scanner is in the `STRING' start
56	condition, and
57
58
59	<INITIAL,STRING,QUOTE>\. { /* handle an escape ... */
60	...
61	}
62
63	will be active only when the current start condition is either
64	`INITIAL', `STRING', or `QUOTE'.
65
66	Start conditions are declared in the definitions (first) section of
67	the input using unindented lines beginning with either `%s' or `%x'
68	followed by a list of names. The former declares "inclusive" start
69	conditions, the latter "exclusive" start conditions. A start condition
70	is activated using the `BEGIN' action. Until the next `BEGIN' action
71	is executed, rules with the given start condition will be active and
72	rules with other start conditions will be inactive. If the start
73	condition is inclusive, then rules with no start conditions at all will
74	also be active. If it is exclusive, then _only_ rules qualified with
75	the start condition will be active. A set of rules contingent on the
76	same exclusive start condition describes a scanner which is independent
77	of any of the other rules in the `flex' input. Because of this,
78	exclusive start conditions make it easy to specify "mini-scanners"
79	which scan portions of the input that are syntactically different from
80	the rest (e.g., comments).
81
82	If the distinction between inclusive and exclusive start conditions
83	is still a little vague, here's a simple example illustrating the
84	connection between the two. The set of rules:
85
86
87	%s example
88	%%
89
90	<example>foo do_something();
91
92	bar something_else();
93
94	is equivalent to
95
96
97	%x example
98	%%
99
100	<example>foo do_something();
101
102	<INITIAL,example>bar something_else();
103
104	Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
105	second example wouldn't be active (i.e., couldn't match) when in start
106	condition `example'. If we just used `<example>' to qualify `bar',
107	though, then it would only be active in `example' and not in `INITIAL',
108	while in the first example it's active in both, because in the first
109	example the `example' start condition is an inclusive (`%s') start
110	condition.
111
112	Also note that the special start-condition specifier `<*>' matches
113	every start condition. Thus, the above example could also have been
114	written:
115
116
117	%x example
118	%%
119
120	<example>foo do_something();
121
122	<*>bar something_else();
123
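The activation rules above can be restated as a small C model. This is only a sketch for reasoning about which rules are active: the `struct rule' type, the `SC_*' constants and `rule_active()' are hypothetical names invented here, and `flex' itself exposes nothing like them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the start conditions used in the example. */
enum { SC_INITIAL = 0, SC_EXAMPLE = 1 };

/* Hypothetical description of how one rule is qualified. */
struct rule {
    const int *conds;   /* start conditions listed in <...>, or NULL */
    size_t     nconds;  /* number of entries in conds */
    bool       any;     /* true when the rule is qualified with <*> */
};

/* Returns true if rule r can match while the scanner is in `current'.
 * `exclusive' says whether `current' was declared with %x (true)
 * or with %s (false); it is ignored for INITIAL. */
static bool rule_active(const struct rule *r, int current, bool exclusive)
{
    if (r->any)
        return true;                    /* <*> matches every condition */

    if (r->nconds == 0)                 /* rule with no start conditions */
        return current == SC_INITIAL || !exclusive;

    for (size_t i = 0; i < r->nconds; i++)
        if (r->conds[i] == current)
            return true;                /* explicitly listed */

    return false;
}
```

Under this model an unqualified `bar' rule is active in `example' only when `example' is inclusive, while a rule listing both `INITIAL' and `example' (or using `<*>') is active either way, which is the equivalence shown above.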
124	The default rule (to `ECHO' any unmatched character) remains active
125	in start conditions. It is equivalent to:
126
127
128	<*>.|\n ECHO;
129
130	`BEGIN(0)' returns to the original state where only the rules with
131	no start conditions are active. This state can also be referred to as
132	the start-condition `INITIAL', so `BEGIN(INITIAL)' is equivalent to
133	`BEGIN(0)'. (The parentheses around the start condition name are not
134	required but are considered good style.)
135
136	`BEGIN' actions can also be given as indented code at the beginning
137	of the rules section. For example, the following will cause the scanner
138	to enter the `SPECIAL' start condition whenever `yylex()' is called and
139	the global variable `enter_special' is true:
140
141
142	int enter_special;
143
144	%x SPECIAL
145	%%
146	if ( enter_special )
147	BEGIN(SPECIAL);
148
149	<SPECIAL>blahblahblah
150	...more rules follow...
151
152	To illustrate the uses of start conditions, here is a scanner which
153	provides two different interpretations of a string like `123.456'. By
154	default it will treat it as three tokens, the integer `123', a dot
155	(`.'), and the integer `456'. But if the string is preceded earlier in
156	the line by the string `expect-floats' it will treat it as a single
157	token, the floating-point number `123.456':
158
159
160	%{
161	#include <math.h>
162	%}
163	%s expect
164
165	%%
166	expect-floats BEGIN(expect);
167
168	<expect>[0-9]+"."[0-9]+ {
169	printf( "found a float, = %f\n",
170	atof( yytext ) );
171	}
172	<expect>\n {
173	/* that's the end of the line, so
174	* we need another "expect-floats"
175	* before we'll recognize any more
176	* numbers
177	*/
178	BEGIN(INITIAL);
179	}
180
181	[0-9]+ {
182	printf( "found an integer, = %d\n",
183	atoi( yytext ) );
184	}
185
186	"." printf( "found a dot\n" );
187
188	Here is a scanner which recognizes (and discards) C comments while
189	maintaining a count of the current input line.
190
191
192	%x comment
193	%%
194	int line_num = 1;
195
196	"/*" BEGIN(comment);
197
198	<comment>[^*\n]* /* eat anything that's not a '*' */
199	<comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
200	<comment>\n ++line_num;
201	<comment>"*"+"/" BEGIN(INITIAL);
202
203	This scanner goes to a bit of trouble to match as much text as
204	possible with each rule. In general, when attempting to write a
205	high-speed scanner, try to match as much as possible in each rule, as it's
206	a big win.
207
208	Note that start condition names are really integer values and can
209	be stored as such. Thus, the above could be extended in the following
210	fashion:
211
212
213	%x comment foo
214	%%
215	int line_num = 1;
216	int comment_caller;
217
218	"/*" {
219	comment_caller = INITIAL;
220	BEGIN(comment);
221	}
222
223	...
224
225	<foo>"/*" {
226	comment_caller = foo;
227	BEGIN(comment);
228	}
229
230	<comment>[^*\n]* /* eat anything that's not a '*' */
231	<comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
232	<comment>\n ++line_num;
233	<comment>"*"+"/" BEGIN(comment_caller);
234
235	Furthermore, you can access the current start condition using the
236	integer-valued `YY_START' macro. For example, the above assignments to
237	`comment_caller' could instead be written
238
239
240	comment_caller = YY_START;
241
242	Flex provides `YYSTATE' as an alias for `YY_START' (since that is
243	what's used by AT&T `lex').
244
245	For historical reasons, start conditions do not have their own
246	name-space within the generated scanner. The start condition names are
247	unmodified in the generated scanner and generated header. *Note
248	option-header::. *Note option-prefix::.
249
250	Finally, here's an example of how to match C-style quoted strings
251	using exclusive start conditions, including expanded escape sequences
252	(but not including checking for a string that's too long):
253
254
255	%x str
256
257	%%
258	char string_buf[MAX_STR_CONST];
259	char *string_buf_ptr;
260
261
262	\" string_buf_ptr = string_buf; BEGIN(str);
263
264	<str>\" { /* saw closing quote - all done */
265	BEGIN(INITIAL);
266	*string_buf_ptr = '\0';
267	/* return string constant token type and
268	* value to parser
269	*/
270	}
271
272	<str>\n {
273	/* error - unterminated string constant */
274	/* generate error message */
275	}
276
277	<str>\\[0-7]{1,3} {
278	/* octal escape sequence */
279	unsigned int result;
280
281	(void) sscanf( yytext + 1, "%o", &result );
282
283	if ( result > 0xff )
284	{ /* error, constant is out-of-bounds */ }
285	else
286	*string_buf_ptr++ = result;
287	}
288
289	<str>\\[0-9]+ {
290	/* generate error - bad escape sequence; something
291	* like '\48' or '\0777777'
292	*/
293	}
294
295	<str>\\n *string_buf_ptr++ = '\n';
296	<str>\\t *string_buf_ptr++ = '\t';
297	<str>\\r *string_buf_ptr++ = '\r';
298	<str>\\b *string_buf_ptr++ = '\b';
299	<str>\\f *string_buf_ptr++ = '\f';
300
301	<str>\\(.|\n) *string_buf_ptr++ = yytext[1];
302
303	<str>[^\\\n\"]+ {
304	char *yptr = yytext;
305
306	while ( *yptr )
307	*string_buf_ptr++ = *yptr++;
308	}
309
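The octal-escape action in the example above can be factored into a plain C helper, which makes the range check easier to get right. This is a minimal sketch rather than part of `flex': the name `decode_octal_escape()' and its return convention are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: decode an octal escape such as "\101" (the text
 * matched by the pattern \\[0-7]{1,3}) into a single byte.
 * Returns 0 on success and stores the byte in *out.
 * Returns -1 if no octal digits follow the backslash or if the value
 * does not fit in one byte. */
static int decode_octal_escape(const char *text, unsigned char *out)
{
    unsigned int result;

    /* Skip the leading backslash; %o requires an unsigned int pointer. */
    if (sscanf(text + 1, "%o", &result) != 1)
        return -1;

    if (result > 0xff)
        return -1;          /* constant is out of bounds */

    *out = (unsigned char) result;
    return 0;
}
```

Inside the `<str>' rule, the action could call this helper with `yytext' and append the byte to the string buffer only when it returns 0.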
310	Often, such as in some of the examples above, you wind up writing a
311	whole bunch of rules all preceded by the same start condition(s). Flex
312	makes this a little easier and cleaner by introducing a notion of start
313	condition "scope". A start condition scope is begun with:
314
315
316	<SCs>{
317
318	where `SCs' is a list of one or more start conditions. Inside the
319	start condition scope, every rule automatically has the prefix `<SCs>'
320	applied to it, until a `}' which matches the initial `{'. So, for
321	example,
322
323
324	<ESC>{
325	"\\n" return '\n';
326	"\\r" return '\r';
327	"\\f" return '\f';
328	"\\0" return '\0';
329	}
330
331	is equivalent to:
332
333
334	<ESC>"\\n" return '\n';
335	<ESC>"\\r" return '\r';
336	<ESC>"\\f" return '\f';
337	<ESC>"\\0" return '\0';
338
339	Start condition scopes may be nested.
340
341	The following routines are available for manipulating stacks of
342	start conditions:
343
344	- Function: void yy_push_state ( int `new_state' )
345	pushes the current start condition onto the top of the start
346	condition stack and switches to `new_state' as though you had used
347	`BEGIN new_state' (recall that start condition names are also
348	integers).
349
350	- Function: void yy_pop_state ()
351	pops the top of the stack and switches to it via `BEGIN'.
352
353	- Function: int yy_top_state ()
354	returns the top of the stack without altering the stack's contents.
355
356	The start condition stack grows dynamically and so has no built-in
357	size limitation. If memory is exhausted, program execution aborts.
358
359	To use start condition stacks, your scanner must include a `%option
360	stack' directive (*note Scanner Options::).
361
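The behavior of the three stack routines can be pictured with the following C model. It is a sketch of the semantics described above, not the code `flex' generates: the `struct sc_stack' type and the `sc_push()', `sc_pop()' and `sc_top()' names are hypothetical, and the `current' field stands in for the scanner's current start condition.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a start condition stack. */
struct sc_stack {
    int    *items;    /* saved start conditions */
    size_t  depth;    /* number of saved entries */
    size_t  cap;      /* allocated capacity of items */
    int     current;  /* models the current start condition */
};

/* Save the current start condition, then switch to new_state. */
static void sc_push(struct sc_stack *s, int new_state)
{
    if (s->depth == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 8;   /* grow dynamically */
        int *p = realloc(s->items, ncap * sizeof *p);
        if (!p)
            abort();      /* out of memory: execution aborts */
        s->items = p;
        s->cap = ncap;
    }
    s->items[s->depth++] = s->current;
    s->current = new_state;
}

/* Pop the most recently saved start condition and switch to it. */
static void sc_pop(struct sc_stack *s)
{
    assert(s->depth > 0);
    s->current = s->items[--s->depth];
}

/* Return the top saved start condition without altering the stack. */
static int sc_top(const struct sc_stack *s)
{
    assert(s->depth > 0);
    return s->items[s->depth - 1];
}
```

In this model, pushing state 3 from `INITIAL' (0) and then pushing state 5 leaves 3 on top of the stack; two pops return the scanner to 3 and then to 0.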
362
363	File: flex.info, Node: Multiple Input Buffers, Next: EOF, Prev: Start Conditions, Up: Top
364
365	Multiple Input Buffers
366	**********************
367
368	Some scanners (such as those which support "include" files) require
369	reading from several input streams. As `flex' scanners do a large
370	amount of buffering, one cannot control where the next input will be
371	read from by simply writing a `YY_INPUT()' which is sensitive to the
372	scanning context. `YY_INPUT()' is only called when the scanner reaches
373	the end of its buffer, which may be a long time after scanning a
374	statement such as an `include' statement which requires switching the
375	input source.
376
377	To negotiate these sorts of problems, `flex' provides a mechanism
378	for creating and switching between multiple input buffers. An input
379	buffer is created by using:
380
381	- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )
382
383	which takes a `FILE' pointer and a size and creates a buffer
384	associated with the given file and large enough to hold `size'
385	characters (when in doubt, use `YY_BUF_SIZE' for the size). It returns
386	a `YY_BUFFER_STATE' handle, which may then be passed to other routines
387	(see below). The `YY_BUFFER_STATE' type is a pointer to an opaque
388	`struct yy_buffer_state' structure, so you may safely initialize
389	`YY_BUFFER_STATE' variables to `((YY_BUFFER_STATE) 0)' if you wish, and
390	also refer to the opaque structure in order to correctly declare input
391	buffers in source files other than that of your scanner. Note that the
392	`FILE' pointer in the call to `yy_create_buffer' is only used as the
393	value of `yyin' seen by `YY_INPUT'. If you redefine `YY_INPUT()' so it
394	no longer uses `yyin', then you can safely pass a NULL `FILE' pointer to
395	`yy_create_buffer'. You select a particular buffer to scan from using:
396
397	- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )
398
399	The above function switches the scanner's input buffer so subsequent
400	tokens will come from `new_buffer'. Note that `yy_switch_to_buffer()'
401	may be used by `yywrap()' to set things up for continued scanning,
402	instead of opening a new file and pointing `yyin' at it. If you are
403	looking for a stack of input buffers, then you want to use
404	`yypush_buffer_state()' instead of this function. Note also that
405	switching input sources via either `yy_switch_to_buffer()' or
406	`yywrap()' does _not_ change the start condition.
407
408	- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )
409
410	is used to reclaim the storage associated with a buffer. (`buffer'
411	can be NULL, in which case the routine does nothing.) To switch to a
412	new buffer while keeping the current one for later use, call:
413
414	- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )
415
416	This function pushes the new buffer state onto an internal stack.
417	The pushed state becomes the new current state. The stack is maintained
418	by flex and will grow as required. This function is intended to be used
419	instead of `yy_switch_to_buffer', when you want to change states, but
420	preserve the current state for later use.
421
422	- Function: void yypop_buffer_state ( )
423
424	This function removes the current state from the top of the stack,
425	and deletes it by calling `yy_delete_buffer'. The next state on the
426	stack, if any, becomes the new current state.
427
428	- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )
429
430	This function discards the buffer's contents, so the next time the
431	scanner attempts to match a token from the buffer, it will first fill
432	the buffer anew using `YY_INPUT()'.
433
434	- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )
435
436	is an alias for `yy_create_buffer()', provided for compatibility
437	with the C++ use of `new' and `delete' for creating and destroying
438	dynamic objects.
439
440	The `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE' handle to the
441	current buffer. It should not be used as an lvalue.
442
443	Here are two examples of using these features for writing a scanner
444	which expands include files (the `<<EOF>>' feature is discussed below).
445
446	This first example uses yypush_buffer_state and yypop_buffer_state.
447	Flex maintains the stack internally.
448
449
450	/* the "incl" state is used for picking up the name
451	* of an include file
452	*/
453	%x incl
454	%%
455	include BEGIN(incl);
456
457	[a-z]+ ECHO;
458	[^a-z\n]*\n? ECHO;
459
460	<incl>[ \t]* /* eat the whitespace */
461	<incl>[^ \t\n]+ { /* got the include file name */
462	yyin = fopen( yytext, "r" );
463
464	if ( ! yyin )
465	error( ... );
466
467	yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));
468
469	BEGIN(INITIAL);
470	}
471
472	<<EOF>> {
473	yypop_buffer_state();
474
475	if ( !YY_CURRENT_BUFFER )
476	{
477	yyterminate();
478	}
479	}
480
481	The second example, below, does the same thing as the previous
482	example did, but manages its own input buffer stack manually (instead
483	of letting flex do it).
484
485
486	/* the "incl" state is used for picking up the name
487	* of an include file
488	*/
489	%x incl
490
491	%{
492	#define MAX_INCLUDE_DEPTH 10
493	YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
494	int include_stack_ptr = 0;
495	%}
496
497	%%
498	include BEGIN(incl);
499
500	[a-z]+ ECHO;
501	[^a-z\n]*\n? ECHO;
502
503	<incl>[ \t]* /* eat the whitespace */
504	<incl>[^ \t\n]+ { /* got the include file name */
505	if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
506	{
507	fprintf( stderr, "Includes nested too deeply" );
508	exit( 1 );
509	}
510
511	include_stack[include_stack_ptr++] =
512	YY_CURRENT_BUFFER;
513
514	yyin = fopen( yytext, "r" );
515
516	if ( ! yyin )
517	error( ... );
518
519	yy_switch_to_buffer(
520	yy_create_buffer( yyin, YY_BUF_SIZE ) );
521
522	BEGIN(INITIAL);
523	}
524
525	<<EOF>> {
526	if ( --include_stack_ptr < 0 )
527	{
528	yyterminate();
529	}
530
531	else
532	{
533	yy_delete_buffer( YY_CURRENT_BUFFER );
534	yy_switch_to_buffer(
535	include_stack[include_stack_ptr] );
536	}
537	}
538
539	The following routines are available for setting up input buffers for
540	scanning in-memory strings instead of files. All of them create a new
541	input buffer for scanning the string, and return a corresponding
542	`YY_BUFFER_STATE' handle (which you should delete with
543	`yy_delete_buffer()' when done with it). They also switch to the new
544	buffer using `yy_switch_to_buffer()', so the next call to `yylex()'
545	will start scanning the string.
546
547	- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
548	scans a NUL-terminated string.
549
550	- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len
551	)
552	scans `len' bytes (including possibly `NUL's) starting at location
553	`bytes'.
554
555	Note that both of these functions create and scan a _copy_ of the
556	string or bytes. (This may be desirable, since `yylex()' modifies the
557	contents of the buffer it is scanning.) You can avoid the copy by
558	using:
559
560	- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t size)
561	which scans in place the buffer starting at `base', consisting of
562	`size' bytes, the last two bytes of which _must_ be
563	`YY_END_OF_BUFFER_CHAR' (ASCII NUL). These last two bytes are not
564	scanned; thus, scanning consists of `base[0]' through
565	`base[size-3]', inclusive.
566
567	If you fail to set up `base' in this manner (i.e., forget the final
568	two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()' returns a
569	NULL pointer instead of creating a new input buffer.
570
571	- Data type: yy_size_t
572	is an integral type to which you can cast an integer expression
573	reflecting the size of the buffer.
574
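Preparing a buffer in the layout `yy_scan_buffer()' expects can be sketched in plain C as follows. The helper name `make_scan_buffer()' and its interface are assumptions made for illustration; only the requirement of two trailing NUL bytes comes from the description above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len bytes of text into a freshly allocated
 * buffer whose last two bytes are NUL, as yy_scan_buffer() requires.
 * On success, stores the total size (len + 2) in *size_out and returns
 * the buffer, which the caller releases with free().
 * Returns NULL if the size would overflow or allocation fails. */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base;

    if (len > (size_t) -1 - 2)
        return NULL;

    base = malloc(len + 2);
    if (!base)
        return NULL;

    memcpy(base, text, len);
    base[len] = '\0';        /* first end-of-buffer byte */
    base[len + 1] = '\0';    /* second end-of-buffer byte */

    *size_out = len + 2;
    return base;
}
```

Within a scanner, the returned pointer and size would then be passed as `yy_scan_buffer( base, size )'. Because the scanner works on the buffer in place, the memory must stay valid until the corresponding `YY_BUFFER_STATE' has been deleted.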
575
576	File: flex.info, Node: EOF, Next: Misc Macros, Prev: Multiple Input Buffers, Up: Top
577
578	End-of-File Rules
579	*****************
580
581	The special rule `<<EOF>>' indicates actions which are to be taken
582	when an end-of-file is encountered and `yywrap()' returns non-zero
583	(i.e., indicates no further files to process). The action must finish
584	by doing one of the following things:
585
586	* assigning `yyin' to a new input file (in previous versions of
587	`flex', after doing the assignment you had to call the special
588	action `YY_NEW_FILE'. This is no longer necessary.)
589
590	* executing a `return' statement;
591
592	* executing the special `yyterminate()' action.
593
594	* or, switching to a new buffer using `yy_switch_to_buffer()' as
595	shown in the example above.
596
597	<<EOF>> rules may not be used with other patterns; they may only be
598	qualified with a list of start conditions. If an unqualified <<EOF>>
599	rule is given, it applies to _all_ start conditions which do not
600	already have <<EOF>> actions. To specify an <<EOF>> rule for only the
601	initial start condition, use:
602
603
604	<INITIAL><<EOF>>
605
606	These rules are useful for catching things like unclosed comments.
607	An example:
608
609
610	%x quote
611	%%
612
613	...other rules for dealing with quotes...
614
615	<quote><<EOF>> {
616	error( "unterminated quote" );
617	yyterminate();
618	}
619	<<EOF>> {
620	if ( *++filelist )
621	yyin = fopen( *filelist, "r" );
622	else
623	yyterminate();
624	}
625
626
627	File: flex.info, Node: Misc Macros, Next: User Values, Prev: EOF, Up: Top
628
629	Miscellaneous Macros
630	********************
631
632	The macro `YY_USER_ACTION' can be defined to provide an action which
633	is always executed prior to the matched rule's action. For example, it
634	could be #define'd to call a routine to convert yytext to lower-case.
635	When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
636	number of the matched rule (rules are numbered starting with 1).
637	Suppose you want to profile how often each of your rules is matched.
638	The following would do the trick:
639
640
641	#define YY_USER_ACTION ++ctr[yy_act]
642
643	where `ctr' is an array to hold the counts for the different rules.
644	Note that the macro `YY_NUM_RULES' gives the total number of rules
645	(including the default rule, even if you use `-s'), so a correct
646	declaration for `ctr' is:
647
648
649	int ctr[YY_NUM_RULES];
650
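A reporting routine for such counters might look like the sketch below. It is written so that it compiles on its own: the fallback definition of `YY_NUM_RULES' is a placeholder (in a real scanner `flex' supplies the value), `report_rule_counts()' is a hypothetical name, and sizing the array with one extra slot is a conservative choice of this sketch so that the 1-based rule numbers can be used directly as indices.

```c
#include <stdio.h>

/* Placeholder so the sketch compiles outside a scanner; inside a
 * generated scanner flex defines YY_NUM_RULES and this is skipped. */
#ifndef YY_NUM_RULES
#define YY_NUM_RULES 4
#endif

/* One extra slot (an assumption of this sketch) so that rule numbers
 * 1..YY_NUM_RULES index the array directly; slot 0 is unused. */
static int ctr[YY_NUM_RULES + 1];

/* Hypothetical helper: print the count for every rule that matched at
 * least once and return the total number of matches. */
static long report_rule_counts(FILE *out)
{
    long total = 0;

    for (int rule = 1; rule <= YY_NUM_RULES; rule++) {
        if (ctr[rule] > 0)
            fprintf(out, "rule %d matched %d time(s)\n", rule, ctr[rule]);
        total += ctr[rule];
    }
    return total;
}
```

Such a routine would typically be called once scanning has finished, for example after `yylex()' returns at end of input.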
651	The macro `YY_USER_INIT' may be defined to provide an action which
652	is always executed before the first scan (and before the scanner's
653	internal initializations are done). For example, it could be used to
654	call a routine to read in a data table or open a logging file.
655
656	The macro `yy_set_interactive(is_interactive)' can be used to
657	control whether the current buffer is considered "interactive". An
658	interactive buffer is processed more slowly, but must be used when the
659	scanner's input source is indeed interactive to avoid problems due to
660	waiting to fill buffers (see the discussion of the `-I' flag in *Note
661	Scanner Options::). A non-zero value in the macro invocation marks the
662	buffer as interactive, a zero value as non-interactive. Note that use
663	of this macro overrides `%option always-interactive' or `%option
664	never-interactive' (*note Scanner Options::). `yy_set_interactive()'
665	must be invoked prior to beginning to scan the buffer that is (or is
666	not) to be considered interactive.
667
668	The macro `yy_set_bol(at_bol)' can be used to control whether the
669	current buffer's scanning context for the next token match is done as
670	though at the beginning of a line. A non-zero macro argument makes
671	rules anchored with `^' active, while a zero argument makes `^' rules
672	inactive.
673
674	The macro `YY_AT_BOL()' returns true if the next token scanned from
675	the current buffer will have `^' rules active, false otherwise.
676
677	In the generated scanner, the actions are all gathered in one large
678	switch statement and separated using `YY_BREAK', which may be
679	redefined. By default, it is simply a `break', to separate each rule's
680	action from the following rule's. Redefining `YY_BREAK' allows, for
681	example, C++ users to #define YY_BREAK to do nothing (while being very
682	careful that every rule ends with a `break' or a `return'!) to avoid
683	suffering from unreachable statement warnings where, because a rule's
684	action ends with `return', the `YY_BREAK' is inaccessible.
685
686
687	File: flex.info, Node: User Values, Next: Yacc, Prev: Misc Macros, Up: Top
688
689	Values Available To the User
690	****************************
691
692	This chapter summarizes the various values available to the user in
693	the rule actions.
694
695	`char *yytext'
696	holds the text of the current token. It may be modified but not
697	lengthened (you cannot append characters to the end).
698
699	If the special directive `%array' appears in the first section of
700	the scanner description, then `yytext' is instead declared `char
701	yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
702	redefine in the first section if you don't like the default value
703	(generally 8KB). Using `%array' results in somewhat slower
704	scanners, but the value of `yytext' becomes immune to calls to
705	`unput()', which potentially destroy its value when `yytext' is a
706	character pointer. The opposite of `%array' is `%pointer', which
707	is the default.
708
709	You cannot use `%array' when generating C++ scanner classes (the
710	`-+' flag).
711
712	`int yyleng'
713	holds the length of the current token.
714
715	`FILE *yyin'
716	is the file which by default `flex' reads from. It may be
717	redefined but doing so only makes sense before scanning begins or
718	after an EOF has been encountered. Changing it in the midst of
719	scanning will have unexpected results since `flex' buffers its
720	input; use `yyrestart()' instead. Once scanning terminates
721	because an end-of-file has been seen, you can point `yyin' at the
722	new input file and then call the scanner again to continue
723	scanning.
724
725	`void yyrestart( FILE *new_file )'
726	may be called to point `yyin' at the new input file. The
727	switch-over to the new file is immediate (any previously
728	buffered-up input is lost). Note that calling `yyrestart()' with
729	`yyin' as an argument thus throws away the current input buffer
730	and continues scanning the same input file.
731
732	`FILE *yyout'
733	is the file to which `ECHO' actions are done. It can be reassigned
734	by the user.
735
736	`YY_CURRENT_BUFFER'
737	returns a `YY_BUFFER_STATE' handle to the current buffer.
738
739	`YY_START'
740	returns an integer value corresponding to the current start
741	condition. You can subsequently use this value with `BEGIN' to
742	return to that start condition.
743
744
745	File: flex.info, Node: Yacc, Next: Scanner Options, Prev: User Values, Up: Top
746
747	Interfacing with Yacc
748	*********************
749
750	One of the main uses of `flex' is as a companion to the `yacc'
751	parser-generator. `yacc' parsers expect to call a routine named
752	`yylex()' to find the next input token. The routine is supposed to
753	return the type of the next token as well as putting any associated
754	value in the global `yylval'. To use `flex' with `yacc', one specifies
755	the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
756	containing definitions of all the `%tokens' appearing in the `yacc'
757	input. This file is then included in the `flex' scanner. For example,
758	if one of the tokens is `TOK_NUMBER', part of the scanner might look
759	like:
760
761
762	%{
763	#include "y.tab.h"
764	%}
765
766	%%
767
768	[0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
769
770
771	File: flex.info, Node: Scanner Options, Next: Performance, Prev: Yacc, Up: Top
772
773	Scanner Options
774	***************
775
776	The various `flex' options are categorized by function in the
777	following menu. If you want to look up a particular option by name,
778	*Note Index of Scanner Options::.
779
780	* Menu:
781
782	* Options for Specifing Filenames::
783	* Options Affecting Scanner Behavior::
784	* Code-Level And API Options::
785	* Options for Scanner Speed and Size::
786	* Debugging Options::
787	* Miscellaneous Options::
788
789	Even though there are many scanner options, a typical scanner might
790	only specify the following options:
791
792
793	%option 8bit reentrant bison-bridge
794	%option warn nodefault
795	%option yylineno
796	%option outfile="scanner.c" header-file="scanner.h"
797
798	The first line specifies the general type of scanner we want. The
799	second line specifies that we are being careful. The third line asks
800	flex to track line numbers. The last line tells flex what to name the
801	files. (The options can be specified in any order. We just divided
802	them.)
803
804	`flex' also provides a mechanism for controlling options within the
805	scanner specification itself, rather than from the flex command-line.
806	This is done by including `%option' directives in the first section of
807	the scanner specification. You can specify multiple options with a
808	single `%option' directive, and multiple directives in the first
809	section of your flex input file.
810
811	Most options are given simply as names, optionally preceded by the
812	word `no' (with no intervening whitespace) to negate their meaning.
813	The names are the same as their long-option equivalents (but without the
814	leading `--' ).
815
816	`flex' scans your rule actions to determine whether you use the
817	`REJECT' or `yymore()' features. The `reject' and `yymore' options are
818	available to override its decision as to whether you use these features,
819	either by setting them (e.g., `%option reject') to indicate the feature
820	is indeed used, or unsetting them to indicate it actually is not used
821	(e.g., `%option noyymore').
822
823	A number of options are available for lint purists who want to
824	suppress the appearance of unneeded routines in the generated scanner.
825	Each of the following, if unset (e.g., `%option nounput'), results in
826	the corresponding routine not appearing in the generated scanner:
827
828
829	input, unput
830	yy_push_state, yy_pop_state, yy_top_state
831	yy_scan_buffer, yy_scan_bytes, yy_scan_string
832
833	yyget_extra, yyset_extra, yyget_leng, yyget_text,
834	yyget_lineno, yyset_lineno, yyget_in, yyset_in,
835	yyget_out, yyset_out, yyget_lval, yyset_lval,
836	yyget_lloc, yyset_lloc, yyget_debug, yyset_debug
837
838	(though `yy_push_state()' and friends won't appear anyway unless you
839	use `%option stack').
840
841
842	File: flex.info, Node: Options for Specifing Filenames, Next: Options Affecting Scanner Behavior, Prev: Scanner Options, Up: Scanner Options
843
844	Options for Specifying Filenames
845	================================
846
847	`--header-file=FILE, `%option header-file="FILE"''
848	instructs flex to write a C header to `FILE'. This file contains
849	function prototypes, extern variables, and types used by the
850	scanner. Only the external API is exported by the header file.
851	Many macros that are usable from within scanner actions are not
852	exported to the header file. This is due to namespace problems and
853	the goal of a clean external API.
854
855	While in the header, the macro `yyIN_HEADER' is defined, where `yy'
856	is substituted with the appropriate prefix.
857
858	The `--header-file' option is not compatible with the `--c++'
859	option, since the C++ scanner provides its own header in
860	`yyFlexLexer.h'.
861
862	`-oFILE, --outfile=FILE, `%option outfile="FILE"''
863	directs flex to write the scanner to the file `FILE' instead of
864	`lex.yy.c'. If you combine `--outfile' with the `--stdout' option,
865	then the scanner is written to `stdout' but its `#line' directives
866	(see the `-L' option) refer to the file `FILE'.
867
868	`-t, --stdout, `%option stdout''
869	instructs `flex' to write the scanner it generates to standard
870	output instead of `lex.yy.c'.
871
872	`-SFILE, --skel=FILE'
873	overrides the default skeleton file from which `flex' constructs
874	its scanners. You'll never need this option unless you are doing
875	`flex' maintenance or development.
876
877	`--tables-file=FILE'
878	Write serialized scanner DFA tables to FILE. The generated scanner
879	will not contain the tables, and requires them to be loaded at
880	runtime. *Note serialization::.
881
882	`--tables-verify'
883	This option is for flex development. We document it here in case
884	you stumble upon it by accident or in case you suspect some
885	inconsistency in the serialized tables. Flex will serialize the
886	scanner DFA tables but will also generate the in-code tables as it
887	normally does. At runtime, the scanner will verify that the
888	serialized tables match the in-code tables, instead of loading
889	them.
890
891
892
893	File: flex.info, Node: Options Affecting Scanner Behavior, Next: Code-Level And API Options, Prev: Options for Specifing Filenames, Up: Scanner Options
894
895	Options Affecting Scanner Behavior
896	==================================
897
898	`-i, --case-insensitive, `%option case-insensitive''
899	instructs `flex' to generate a "case-insensitive" scanner. The
900	case of letters given in the `flex' input patterns will be ignored,
901	and tokens in the input will be matched regardless of case. The
902	matched text given in `yytext' will have the preserved case (i.e.,
903	it will not be folded). For tricky behavior, see *Note case and
904	character ranges::.
905
906	`-l, --lex-compat, `%option lex-compat''
907	turns on maximum compatibility with the original AT&T `lex'
908	implementation. Note that this does not mean _full_ compatibility.
909	Use of this option costs a considerable amount of performance, and
910	it cannot be used with the `--c++', `--full', `--fast', `-Cf', or
911	`-CF' options. For details on the compatibilities it provides, see
912	*Note Lex and Posix::. This option also results in the name
913	`YY_FLEX_LEX_COMPAT' being `#define''d in the generated scanner.
914
915	`-B, --batch, `%option batch''
916	instructs `flex' to generate a "batch" scanner, the opposite of
917	_interactive_ scanners generated by `--interactive' (see below).
918	In general, you use `-B' when you are _certain_ that your scanner
919	will never be used interactively, and you want to squeeze a
920	_little_ more performance out of it. If your goal is instead to
921	squeeze out a _lot_ more performance, you should be using the
922	`-Cf' or `-CF' options, which turn on `--batch' automatically
923	anyway.
924
925	`-I, --interactive, `%option interactive''
926	instructs `flex' to generate an interactive scanner. An
927	interactive scanner is one that only looks ahead to decide what
928	token has been matched if it absolutely must. It turns out that
929	always looking one extra character ahead, even if the scanner has
930	already seen enough text to disambiguate the current token, is a
931	bit faster than only looking ahead when necessary. But scanners
932	that always look ahead give dreadful interactive performance; for
933	example, when a user types a newline, it is not recognized as a
934	newline token until they enter _another_ token, which often means
935	typing in another whole line.
936
937	`flex' scanners default to `interactive' unless you use the `-Cf'
938	or `-CF' table-compression options (*note Performance::). That's
939	because if you're looking for high-performance you should be using
940	one of these options, so if you didn't, `flex' assumes you'd
941	rather trade off a bit of run-time performance for intuitive
942	interactive behavior. Note also that you _cannot_ use
943	`--interactive' in conjunction with `-Cf' or `-CF'. Thus, this
944	option is not really needed; it is on by default for all those
945	cases in which it is allowed.
946
947	You can force a scanner to _not_ be interactive by using `--batch'.
948
949	`-7, --7bit, `%option 7bit''
950	instructs `flex' to generate a 7-bit scanner, i.e., one which can
951	only recognize 7-bit characters in its input. The advantage of
952	using `--7bit' is that the scanner's tables can be up to half the
953	size of those generated using `--8bit'. The disadvantage is
954	that such scanners often hang or crash if their input contains an
955	8-bit character.
956
957	Note, however, that unless you generate your scanner using the
958	`-Cf' or `-CF' table compression options, use of `--7bit' will
959	save only a small amount of table space, and make your scanner
960	considerably less portable. `Flex''s default behavior is to
961	generate an 8-bit scanner unless you use `-Cf' or `-CF', in
962	which case `flex' defaults to generating 7-bit scanners unless
963	your site was always configured to generate 8-bit scanners (as will
964	often be the case with non-USA sites). You can tell whether flex
965	generated a 7-bit or an 8-bit scanner by inspecting the flag
966	summary in the `--verbose' output as described above.
967
968	Note that if you use `-Cfe' or `-CFe' `flex' still defaults to
969	generating an 8-bit scanner, since usually with these compression
970	options full 8-bit tables are not much more expensive than 7-bit
971	tables.
972
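Because a 7-bit scanner may misbehave on 8-bit input, a caller might
choose to screen its input before scanning. The following is only a
minimal sketch of such a check; the helper `is_7bit_clean' is a
hypothetical name and is not provided by `flex'.

```c
#include <stddef.h>

/* Hypothetical helper, not part of flex: returns 1 if every byte in
   buf[0..len-1] has its high bit clear (fits in 7 bits), else 0. */
int is_7bit_clean(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] & 0x80)
            return 0;
    }
    return 1;
}
```

A program could call this on each buffer it hands to the scanner and
reject the input, or fall back to an 8-bit scanner, when it returns 0.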
973	`-8, --8bit, `%option 8bit''
974	instructs `flex' to generate an 8-bit scanner, i.e., one which can
975	recognize 8-bit characters. This flag is only needed for scanners
976	generated using `-Cf' or `-CF', as otherwise flex defaults to
977	generating an 8-bit scanner anyway.
978
979	See the discussion of `--7bit' above for `flex''s default behavior
980	and the tradeoffs between 7-bit and 8-bit scanners.
981
982	`--default, `%option default''
983	generates the default rule, which echoes unmatched input to `yyout'.
984
985	`--always-interactive, `%option always-interactive''
986	instructs flex to generate a scanner which always considers its
987	input _interactive_. Normally, on each new input file the scanner
988	calls `isatty()' in an attempt to determine whether the scanner's
989	input source is interactive and thus should be read a character at
990	a time. When this option is used, however, then no such call is
991	made.
992
993	`--never-interactive, `%option never-interactive''
994	instructs flex to generate a scanner which never considers its
995	input interactive. This is the opposite of `always-interactive'.
996
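The decision described for `always-interactive' and
`never-interactive' can be pictured roughly as follows. This is a
sketch under assumptions, not the code `flex' generates: the enum and
the function `treat_as_interactive' are hypothetical names, and only
`isatty()' and `fileno()' are real library calls.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and isatty() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for the three policies described above. */
enum interactive_policy { POLICY_AUTO, POLICY_ALWAYS, POLICY_NEVER };

/* Returns 1 if input from fp should be treated as interactive. */
int treat_as_interactive(FILE *fp, enum interactive_policy policy)
{
    switch (policy) {
    case POLICY_ALWAYS:
        return 1;                 /* isatty() is not consulted */
    case POLICY_NEVER:
        return 0;                 /* never interactive */
    default:
        /* Ask the system whether the stream is a terminal. */
        return fp != NULL && isatty(fileno(fp)) > 0;
    }
}
```

With `POLICY_AUTO', a regular file is reported as non-interactive,
while a terminal is reported as interactive.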
997	`-X, --posix, `%option posix''
998	turns on maximum compatibility with the POSIX 1003.2-1992
999	definition of `lex'. Since `flex' was originally designed to
1000	implement the POSIX definition of `lex' this generally involves
1001	very few changes in behavior. At the current writing the known
1002	differences between `flex' and the POSIX standard are:
1003
1004	* In POSIX and AT&T `lex', the repeat operator, `{}', has lower
1005	precedence than concatenation (thus `ab{3}' yields `ababab').
1006	Most POSIX utilities use an Extended Regular Expression (ERE)
1007	precedence that has the precedence of the repeat operator
1008	higher than concatenation (which causes `ab{3}' to yield
1009	`abbb'). By default, `flex' places the precedence of the
1010	repeat operator higher than concatenation which matches the
1011	ERE processing of other POSIX utilities. When either
1012	`--posix' or `-l' is specified, `flex' will use the
1013	traditional AT&T and POSIX-compliant precedence for the
1014	repeat operator where concatenation has higher precedence
1015	than the repeat operator.
1016
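The ERE precedence mentioned in the list above can be observed with
the POSIX `regcomp()' and `regexec()' functions. The wrapper
`ere_matches' below is a hypothetical helper written for this
illustration; it exercises the C library's regular expressions, not
`flex' itself.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended regular expression
   pattern, 0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Under ERE rules, `^ab{3}$' matches `abbb' but not `ababab'. The
traditional `lex' reading of `ab{3}' corresponds to the grouped ERE
`^(ab){3}$', which matches `ababab'.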
1017	`--stack, `%option stack''
1018	enables the use of start condition stacks (*note Start
1019	Conditions::).
1020
1021	`--stdinit, `%option stdinit''
1022	if set (i.e., `%option stdinit'), initializes `yyin' and `yyout' to
1023	`stdin' and `stdout', instead of the default of `NULL'. Some
1024	existing `lex' programs depend on this behavior, even though it is
1025	not compliant with ANSI C, which does not require `stdin' and
1026	`stdout' to be compile-time constant. In a reentrant scanner,
1027	however, this is not a problem since initialization is performed
1028	in `yylex_init' at runtime.
1029
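Where `stdin' and `stdout' are not compile-time constants, the
portable alternative is to start with `NULL' and fill in the streams
at run time. The following is a minimal sketch of that pattern; the
variables `scan_in' and `scan_out' are hypothetical stand-ins for
`yyin' and `yyout', and the function is not taken from the generated
scanner.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the scanner's yyin and yyout. */
FILE *scan_in = NULL;
FILE *scan_out = NULL;

/* Fill in defaults at run time, which is valid even where stdin and
   stdout are not constant expressions. Streams that the caller has
   already set are left alone. */
void ensure_default_streams(void)
{
    if (scan_in == NULL)
        scan_in = stdin;
    if (scan_out == NULL)
        scan_out = stdout;
}
```

A caller that wants a different input simply assigns `scan_in' before
the first call, and the default is then never applied.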
1030	`--yylineno, `%option yylineno''
1031	directs `flex' to generate a scanner that maintains the number of
1032	the current line read from its input in the global variable
1033	`yylineno'. This option is implied by `%option lex-compat'. In a
1034	reentrant C scanner, the macro `yylineno' is accessible regardless
1035	of the value of `%option yylineno'; however, its value is not
1036	modified by `flex' unless `%option yylineno' is enabled.
1037
1038	`--yywrap, `%option yywrap''
1039	if unset (i.e., `--noyywrap'), makes the scanner not call
1040	`yywrap()' upon an end-of-file, but simply assume that there are no
1041	more files to scan (until the user points `yyin' at a new file and
1042	calls `yylex()' again).
1043
1044
1045
1046	File: flex.info, Node: Code-Level And API Options, Next: Options for Scanner Speed and Size, Prev: Options Affecting Scanner Behavior, Up: Scanner Options
1047
1048	Code-Level And API Options
1049	==========================
1050
1051	`--ansi-definitions, `%option ansi-definitions''
1052	instructs flex to generate ANSI C99 definitions for functions.
1053	This option is enabled by default. If `%option
1054	noansi-definitions' is specified, then the obsolete style is
1055	generated.
1056
1057	`--ansi-prototypes, `%option ansi-prototypes''
1058	instructs flex to generate ANSI C99 prototypes for functions.
1059	This option is enabled by default. If `noansi-prototypes' is
1060	specified, then prototypes will have empty parameter lists.
1061
1062	`--bison-bridge, `%option bison-bridge''
1063	instructs flex to generate a C scanner that is meant to be called
1064	by a `GNU bison' parser. The scanner has minor API changes for
1065	`bison' compatibility. In particular, the declaration of `yylex'
1066	is modified to take an additional parameter, `yylval'. *Note
1067	Bison Bridge::.
1068
1069	`--bison-locations, `%option bison-locations''
1070	instructs flex that `GNU bison' `%locations' are being used. This
1071	means `yylex' will be passed an additional parameter, `yylloc'.
1072	This option implies `%option bison-bridge'. *Note Bison Bridge::.
1073
1074	`-L, --noline, `%option noline''
1075	instructs `flex' not to generate `#line' directives. Without this
1076	option, `flex' peppers the generated scanner with `#line'
1077	directives so error messages in the actions will be correctly
1078	located with respect to either the original `flex' input file (if
1079	the errors are due to code in the input file), or `lex.yy.c' (if
1080	the errors are `flex''s fault - you should report these sorts of
1081	errors to the email address given in *Note Reporting Bugs::).
1082
1083	`-R, --reentrant, `%option reentrant''
1084	instructs flex to generate a reentrant C scanner. The generated
1085	scanner may safely be used in a multi-threaded environment. The
1086	API for a reentrant scanner is different than for a non-reentrant
1087	scanner (*note Reentrant::). Because of the API difference between
1088	reentrant and non-reentrant `flex' scanners, non-reentrant flex
1089	code must be modified before it is suitable for use with this
1090	option. This option is not compatible with the `--c++' option.
1091
1092	The option `--reentrant' does not affect the performance of the
1093	scanner.
1094
1095	`-+, --c++, `%option c++''
1096	specifies that you want flex to generate a C++ scanner class.
1097	*Note Cxx::, for details.
1098
1099	`--array, `%option array''
1100	specifies that you want `yytext' to be an array instead of a `char *'.
1101
1102	`--pointer, `%option pointer''
1103	specifies that `yytext' should be a `char *', not an array. The
1104	default is `char *'.
1105
1106	`-PPREFIX, --prefix=PREFIX, `%option prefix="PREFIX"''
1107	changes the default `yy' prefix used by `flex' for all
1108	globally-visible variable and function names to instead be
1109	`PREFIX'. For example, `--prefix=foo' changes the name of
1110	`yytext' to `footext'. It also changes the name of the default
1111	output file from `lex.yy.c' to `lex.foo.c'. Here is a partial
1112	list of the names affected:
1113
1114
1115	yy_create_buffer
1116	yy_delete_buffer
1117	yy_flex_debug
1118	yy_init_buffer
1119	yy_flush_buffer
1120	yy_load_buffer_state
1121	yy_switch_to_buffer
1122	yyin
1123	yyleng
1124	yylex
1125	yylineno
1126	yyout
1127	yyrestart
1128	yytext
1129	yywrap
1130	yyalloc
1131	yyrealloc
1132	yyfree
1133
1134	(If you are using a C++ scanner, then only `yywrap' and
1135	`yyFlexLexer' are affected.) Within your scanner itself, you can
1136	still refer to the global variables and functions using either
1137	version of their name; but externally, they have the modified name.
1138
1139	This option lets you easily link together multiple `flex' programs
1140	into the same executable. Note, though, that using this option
1141	also renames `yywrap()', so you now _must_ either provide your own
1142	(appropriately-named) version of the routine for your scanner, or
1143	use `%option noyywrap', as linking with `-lfl' no longer provides
1144	one for you by default.
1145
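One way to picture why both spellings work inside the scanner while
only the prefixed names are visible outside is renaming by
preprocessor macro. The sketch below assumes `--prefix=foo' and is a
simplification for illustration, not a copy of the code `flex' emits.
It also shows the kind of `yywrap()' replacement you would have to
supply yourself.

```c
#include <stddef.h>

/* Simplified illustration of renaming by macro; a real scanner
   covers many more names than these two. */
#define yytext footext
#define yywrap foowrap

char *yytext = NULL;      /* the linker sees this symbol as footext */

/* User-supplied replacement for the library's yywrap().
   Returning 1 tells the scanner there is no further input. */
int yywrap(void)          /* the linker sees this function as foowrap */
{
    return 1;
}
```

Code in other files would refer to `footext' and `foowrap()', since
the macros are in effect only where they are defined.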
1146	`--main, `%option main''
1147	directs flex to provide a default `main()' program for the
1148	scanner, which simply calls `yylex()'. This option implies
1149	`noyywrap' (see `--yywrap' above).
1150
1151	`--nounistd, `%option nounistd''
1152	suppresses inclusion of the non-ANSI header file `unistd.h'. This
1153	option is meant to target environments in which `unistd.h' does
1154	not exist. Be aware that certain options may cause flex to
1155	generate code that relies on functions normally found in
1156	`unistd.h' (e.g., `isatty()', `read()'). If you wish to use these
1157	functions, you will have to inform your compiler where to find
1158	them. *Note option-always-interactive::. *Note option-read::.
1159
1160	`--yyclass, `%option yyclass="NAME"''
1161	only applies when generating a C++ scanner (the `--c++' option).
1162	It informs `flex' that you have derived `NAME' as a subclass of
1163	`yyFlexLexer', so `flex' will place your actions in the member
1164	function `NAME::yylex()' instead of `yyFlexLexer::yylex()'. It
1165	also generates a `yyFlexLexer::yylex()' member function that emits
1166	a run-time error (by invoking `yyFlexLexer::LexerError()') if
1167	called. *Note Cxx::.
1168
1169
1170
1171	File: flex.info, Node: Options for Scanner Speed and Size, Next: Debugging Options, Prev: Code-Level And API Options, Up: Scanner Options
1172
1173	Options for Scanner Speed and Size
1174	==================================
1175
1176	`-C[aefFmr]'
1177	controls the degree of table compression and, more generally,
1178	trade-offs between small scanners and fast scanners.
1179
1180	`-C'
1181	A lone `-C' specifies that the scanner tables should be
1182	compressed but neither equivalence classes nor
1183	meta-equivalence classes should be used.
1184
1185	`-Ca, --align, `%option align''
1186	("align") instructs flex to trade off larger tables in the
1187	generated scanner for faster performance because the elements
1188	of the tables are better aligned for memory access and
1189	computation. On some RISC architectures, fetching and
1190	manipulating longwords is more efficient than with
1191	smaller-sized units such as shortwords. This option can
1192	quadruple the size of the tables used by your scanner.
1193
1194	`-Ce, --ecs, `%option ecs''
1195	directs `flex' to construct "equivalence classes", i.e., sets
1196	of characters which have identical lexical properties (for
1197	example, if the only appearance of digits in the `flex' input
1198	is in the character class "[0-9]" then the digits '0', '1',
1199	..., '9' will all be put in the same equivalence class).
1200	Equivalence classes usually give dramatic reductions in the
1201	final table/object file sizes (typically a factor of 2-5) and
1202	are pretty cheap performance-wise (one array look-up per
1203	character scanned).
1204
1205	`-Cf'
1206	specifies that the "full" scanner tables should be generated -
1207	`flex' should not compress the tables by taking advantage of
1208	similar transition functions for different states.
1209
1210	`-CF'
1211	specifies that the alternate fast scanner representation
1212	(described below under the `--fast' flag) should be used.
1213	This option cannot be used with `--c++'.
1214
1215	`-Cm, --meta-ecs, `%option meta-ecs''
1216	directs `flex' to construct "meta-equivalence classes", which
1217	are sets of equivalence classes (or characters, if equivalence
1218	classes are not being used) that are commonly used together.
1219	Meta-equivalence classes are often a big win when using
1220	compressed tables, but they have a moderate performance
1221	impact (one or two `if' tests and one array look-up per
1222	character scanned).
1223
1224	`-Cr, --read, `%option read''
1225	causes the generated scanner to _bypass_ use of the standard
1226	I/O library (`stdio') for input. Instead of calling
1227	`fread()' or `getc()', the scanner will use the `read()'
1228	system call, resulting in a performance gain which varies
1229	from system to system, but in general is probably negligible
1230	unless you are also using `-Cf' or `-CF'. Using `-Cr' can
1231	cause strange behavior if, for example, you read from `yyin'
1232	using `stdio' prior to calling the scanner (because the
1233	scanner will miss whatever text your previous reads left in
1234	the `stdio' input buffer). `-Cr' has no effect if you define
1235	`YY_INPUT()' (*note Generated Scanner::).
1236
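The `-Cr' hazard of mixing `stdio' reads with `read()' can be
reproduced outside of `flex'. The function below is a hypothetical
demonstration, not scanner code: it consumes a single character
through `stdio' and then asks `read()' for the remainder of the same
file. How much `stdio' buffers is implementation-dependent, so the
exact count returned is not guaranteed.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and read() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes text to a temporary file, consumes one character through
   stdio, then asks read() for the rest. Returns the number of bytes
   read() delivered, or -1 on error. */
long bytes_seen_by_read_after_getc(const char *text)
{
    char buf[256];
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF || fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    if (fgetc(fp) == EOF) {   /* stdio may buffer far more than 1 byte */
        fclose(fp);
        return -1;
    }
    long n = (long)read(fileno(fp), buf, sizeof buf);
    fclose(fp);
    return n;
}
```

On typical systems the single `fgetc()' pulls the whole short file
into the `stdio' buffer, so `read()' then delivers fewer bytes than
logically remain (often none). That missing text is what a `-Cr'
scanner would never see.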
1237	The options `-Cf' or `-CF' and `-Cm' do not make sense together -
1238	there is no opportunity for meta-equivalence classes if the table
1239	is not being compressed. Otherwise the options may be freely
1240	mixed, and are cumulative.
1241
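To make the idea of equivalence classes (`-Ce') concrete, here is a
toy sketch with only two classes. It is not the algorithm `flex' uses
to compute classes; the names `build_ec_map' and `next_state' and the
row-major table layout are assumptions made for this illustration.

```c
#include <string.h>

#define NUM_CLASSES 2

/* Builds a toy equivalence-class map: characters listed in members
   share class 1, and every other character falls into class 0. */
void build_ec_map(const char *members, unsigned char ec[256])
{
    memset(ec, 0, 256);
    for (const unsigned char *p = (const unsigned char *)members; *p; p++)
        ec[*p] = 1;
}

/* One array look-up maps the character to its class; the transition
   table then needs one column per class rather than one per
   character. The table is stored row-major, one row per state. */
int next_state(const int *table, int state,
               const unsigned char ec[256], unsigned char c)
{
    return table[state * NUM_CLASSES + ec[c]];
}
```

With the digits as the only members, each row of the transition table
shrinks from 256 columns to 2, at the price of the extra look-up in
`ec' for every character scanned.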
1242	The default setting is `-Cem', which specifies that `flex' should
1243	generate equivalence classes and meta-equivalence classes. This
1244	setting provides the highest degree of table compression. You can
1245	trade off faster-executing scanners at the cost of larger tables
1246	with the following generally being true:
1247
1248
1249	slowest & smallest
1250	-Cem
1251	-Cm
1252	-Ce
1253	-C
1254	-C{f,F}e
1255	-C{f,F}
1256	-C{f,F}a
1257	fastest & largest
1258
1259	Note that scanners with the smallest tables are usually generated
1260	and compiled the quickest, so during development you will usually
1261	want to use the default, maximal compression.
1262
1263	`-Cfe' is often a good compromise between speed and size for
1264	production scanners.
1265
1266	`-f, --full, `%option full''
1267	specifies "fast scanner". No table compression is done and
1268	`stdio' is bypassed. The result is large but fast. This option
1269	is equivalent to `-Cfr'.
1270
1271	`-F, --fast, `%option fast''
1272	specifies that the _fast_ scanner table representation should be
1273	used (and `stdio' bypassed). This representation is about as fast
1274	as the full table representation `--full', and for some sets of
1275	patterns will be considerably smaller (and for others, larger). In
1276	general, if the pattern set contains both _keywords_ and a
1277	catch-all, _identifier_ rule, such as in the set:
1278
1279
1280	"case" return TOK_CASE;
1281	"switch" return TOK_SWITCH;
1282	...
1283	"default" return TOK_DEFAULT;
1284	[a-z]+ return TOK_ID;
1285
1286	then you're better off using the full table representation. If
1287	only the _identifier_ rule is present and you then use a hash
1288	table or some such to detect the keywords, you're better off using
1289	`--fast'.
1290
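The keyword-detection approach mentioned above might look like the
following sketch, which would be called from the action of the single
identifier rule. It swaps in a sorted table searched with `bsearch()'
in place of a hash table. The token codes, the table, and the
function `classify_identifier' are hypothetical and exist only for
this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; a real parser would supply its own. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword { const char *name; int token; };

/* Must stay sorted by name for bsearch(). */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

/* Compares the search key (a string) with one table entry. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)elem)->name);
}

/* Called with the text matched by the identifier rule; returns the
   keyword's token if the text is a keyword, TOK_ID otherwise. */
int classify_identifier(const char *text)
{
    const struct keyword *kw = bsearch(text, keywords,
        sizeof keywords / sizeof keywords[0], sizeof keywords[0],
        compare_keyword);
    return kw ? kw->token : TOK_ID;
}
```

The rule's action would then amount to something like `return
classify_identifier(yytext);', leaving the scanner tables with a
single identifier pattern.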
1291	This option is equivalent to `-CFr' (see above). It cannot be used
1292	with `--c++'.
1293
1294
