This directory contains some examples illustrating techniques for extracting
high performance from flex scanners.  Each program implements a simplified
version of the Unix "wc" tool: read text from stdin and print the number of
characters, words, and lines present in the text.  All programs were compiled
using gcc (version unavailable, sorry) with the -O flag, and run on a
SPARCstation 1+.  The input used was a PostScript file, mainly containing
figures, with the following "wc" counts:

        lines   words   characters
        214217  635954  2592172


The basic principles illustrated by these programs are:

        - match as much text with each rule as possible
        - adding rules does not slow you down!
        - avoid backing up

and the big caveat that comes with them is:

        - you buy performance with decreased maintainability; make
          sure you really need it before applying the above techniques.

See the "Performance Considerations" section of flexdoc for more
details regarding these principles.


The different versions of "wc":

        mywc.c  a simple but fairly efficient C version

        wc1.l   a naive flex "wc" implementation

        wc2.l   somewhat faster; adds rules to match multiple tokens at once

        wc3.l   faster still; adds more rules to match longer runs of tokens

        wc4.l   fastest; still more rules added; hard to do much better
                using flex (or, I suspect, hand-coding)

        wc5.l   identical to wc3.l except one rule has been slightly
                shortened, introducing backing-up
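
For readers who want a feel for what a hand-written counter looks like, here
is a minimal sketch in C.  It is not the mywc.c shipped in this directory; the
names wc_counts, wc_is_space, and wc_update are hypothetical, and treating
space, tab, and newline as the only word separators is an assumption of the
sketch.

```c
#include <stddef.h>

/* Running totals, in the same three categories that "wc" reports. */
struct wc_counts {
    long lines;
    long words;
    long chars;
};

/* Hypothetical helper: nonzero for the separators this sketch recognizes
 * (an assumption; a real "wc" may treat more characters as whitespace). */
static int wc_is_space(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n';
}

/*
 * Fold one buffer of input into the totals.  *in_word carries the
 * "currently inside a word" state across calls, so a word that is
 * split between two buffers is counted only once.
 */
static void wc_update(struct wc_counts *c, int *in_word,
                      const char *buf, size_t len)
{
    size_t i;

    c->chars += (long)len;
    for (i = 0; i < len; ++i) {
        if (buf[i] == '\n')
            ++c->lines;
        if (wc_is_space(buf[i])) {
            *in_word = 0;
        } else if (!*in_word) {
            /* First non-separator after a separator starts a new word. */
            *in_word = 1;
            ++c->words;
        }
    }
}
```

A driver would start with zeroed counts and in_word set to 0, read stdin in
large blocks (for example with fread), call wc_update on each block, and print
the three totals at end of input.  Handling a whole buffer per call, rather
than making one call per character, is the hand-coded counterpart of matching
as much text as possible with each flex rule.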

Timing results (all times in user CPU seconds):

        program   time   notes
        -------   ----   -----
        wc1       16.4   default flex table compression (= -Cem)
        wc1        6.7   -Cf compression option
        /bin/wc    5.8   Sun's standard "wc" tool
        mywc       4.6   simple but better C implementation!
        wc2        4.6   as good as C implementation; built using -Cf
        wc3        3.8   -Cf
        wc4        3.3   -Cf
        wc5        5.7   -Cf; ouch, backing up is expensive