This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

Copyright (c) 1986 by University of Toronto.
Written by Henry Spencer. Not derived from licensed software.

Permission is granted to anyone to use this software for any
purpose on any computer system, and to redistribute it freely,
subject to the following restrictions:

1. The author is not responsible for the consequences of use of
   this software, no matter how awful, even if they arise
   from defects in it.

2. The origin of this software must not be misrepresented, either
   by explicit claim or by omission.

3. Altered versions must be plainly marked as such, and must not
   be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8. It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

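To make the layout point concrete, here is a sketch assuming a declaration along the lines of the one in regexp.h: the user-visible startp[] and endp[] arrays come first, internal fields follow. The internal field names below are illustrative only; regexp.h is the authority. The helper copy_match() is hypothetical, not part of the package; it just shows the kind of access other code is entitled to.

```c
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10  /* number of match slots, as assumed here */

/* Assumed layout: public fields first, internal ones after. */
typedef struct regexp {
    char *startp[NSUBEXP];  /* public: start of match (0) or subexpression n */
    char *endp[NSUBEXP];    /* public: one past the end of that span */
    char regstart;          /* internal (illustrative) */
    char reganch;           /* internal (illustrative) */
    char *regmust;          /* internal (illustrative) */
    int regmlen;            /* internal (illustrative) */
    char program[1];        /* internal: compiled program follows */
} regexp;

/* Hypothetical helper: copy span n of the last match into buf,
 * truncating to fit. Returns the number of chars copied, or -1 if n
 * is out of range or that span did not take part in the match. */
int copy_match(const regexp *prog, int n, char *buf, size_t bufsize)
{
    size_t len;

    if (n < 0 || n >= NSUBEXP || bufsize == 0)
        return -1;
    if (prog->startp[n] == NULL || prog->endp[n] == NULL)
        return -1;
    len = (size_t)(prog->endp[n] - prog->startp[n]);
    if (len >= bufsize)
        len = bufsize - 1;          /* leave room for the NUL */
    memcpy(buf, prog->startp[n], len);
    buf[len] = '\0';
    return (int)len;
}
```

Code that only ever touches startp[] and endp[] finds them at the same offsets whichever of the two declarations it was compiled against, which is the "sort of" binary compatibility described above.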
This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee. This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete by the time our V8 tape arrived.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points. I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically: put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r". This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests. If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

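Since regerror.o goes into the C library as a separate object, a program that wants different error handling can presumably supply its own regerror(): with a conventional archive library, the linker pulls in a member only to resolve a symbol that is still undefined, so the program's own definition wins. A sketch, assuming the V8-style interface void regerror(char *msg); the last_regerror buffer and the record-instead-of-report behaviour are hypothetical choices for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: the most recent message passed to regerror(). */
char last_regerror[128];

/* Replacement for the library's default regerror(), assuming the
 * V8-style signature void regerror(char *). It records the message
 * instead of reporting it, so the caller can decide what to do. */
void regerror(char *msg)
{
    /* snprintf truncates safely and always NUL-terminates */
    snprintf(last_regerror, sizeof last_regerror, "regexp: %s",
             msg != NULL ? msg : "(null)");
}
```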
Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS.

The files are:

Makefile     instructions to make everything
regexp.3     manual page
regexp.h     header file, for /usr/include
regexp.c     source for regcomp() and regexec()
regsub.c     source for regsub()
regerror.c   source for default regerror()
regmagic.h   internal header file
try.c        source for test program
timer.c      source for timing program
tests        test list for try and timer

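The substitution regsub() performs can be sketched as follows, assuming the V8 semantics given in regexp.3: '&' in the source is replaced by the whole match (startp[0] to endp[0]), '\n' for a digit n by subexpression n, and a backslash before '&' or '\' yields that character literally. This is a simplified stand-in, not the code in regsub.c; the name sketch_regsub, the explicit output size, and the minimal match_t struct are all assumptions.

```c
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10

/* Minimal stand-in for the public part of struct regexp (assumed). */
typedef struct {
    char *startp[NSUBEXP];
    char *endp[NSUBEXP];
} match_t;

/* Hypothetical simplified regsub(): copy src into dest (of size
 * destsize), expanding '&' and '\0'..'\9' from the match m.
 * Returns 0 on success, -1 if dest would overflow. */
int sketch_regsub(const match_t *m, const char *src, char *dest, size_t destsize)
{
    size_t used = 0;
    char c;

    while ((c = *src++) != '\0') {
        int no = -1;                 /* slot to copy; -1 means literal char */

        if (c == '&')
            no = 0;
        else if (c == '\\' && *src >= '0' && *src <= '9')
            no = *src++ - '0';

        if (no < 0) {
            /* '\&' and '\\' produce the escaped character literally */
            if (c == '\\' && (*src == '\\' || *src == '&'))
                c = *src++;
            if (used + 1 >= destsize)
                return -1;
            dest[used++] = c;
        } else if (m->startp[no] != NULL && m->endp[no] != NULL) {
            size_t len = (size_t)(m->endp[no] - m->startp[no]);
            if (used + len >= destsize)
                return -1;
            memcpy(dest + used, m->startp[no], len);
            used += len;
        }
    }
    if (destsize == 0)
        return -1;
    dest[used] = '\0';
    return 0;
}
```

For example, with the whole match "John Smith", \1 = "John" and \2 = "Smith", the source "\2, \1" would come out as "Smith, John". The size check is an addition for safety in the sketch; it is not something the V8 interface asks for.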
This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them. In theory, anyway. This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly. In general,
if you want blazing speed you're in the wrong place. Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work. But if you want to use regular
expressions a little bit in something else, you're in luck. Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.

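One way to execute a nondeterministic automaton is to try its choices one at a time and back up on failure. A minimal sketch of that style, modeled on Rob Pike's matcher from Kernighan and Pike's "The Practice of Programming" (not the code in regexp.c, which handles far more operators via a compiled program), supporting only literal characters, '.', '*', '^' and '$'. The bt_ names are hypothetical.

```c
static int bt_matchhere(const char *re, const char *text);
static int bt_matchstar(int c, const char *re, const char *text);

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int bt_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return bt_matchhere(re + 1, text);
    do {                            /* try each starting position in turn */
        if (bt_matchhere(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* Does re match at the beginning of text? */
static int bt_matchhere(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return bt_matchstar(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return bt_matchhere(re + 1, text + 1);
    return 0;
}

/* c* followed by re: try zero, one, two, ... copies of c,
 * backing up to the next choice whenever the rest fails. */
static int bt_matchstar(int c, const char *re, const char *text)
{
    do {
        if (bt_matchhere(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}
```

Here "compiling" costs nothing, but a pattern like a*a*a*b against a long run of a's can make bt_matchstar retry many ways of splitting the input before giving up, which is the "slower at executing" half of the trade-off.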
This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized. There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line. Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.
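To illustrate why char width matters, here is a sketch of the kind of encoding a compiled program can use: each node is an opcode char followed by a 16-bit offset to the next node, split across two chars. Masking with 0377 keeps the value right whether plain char is signed or unsigned, which is one way of attaching no significance to the top bit. The layout and the put_next()/get_next() names are assumptions for illustration, loosely in the spirit of regexp.c rather than copied from it. With chars narrower than 8 bits, two of them could no longer hold a 16-bit offset.

```c
/* Hypothetical node layout: [opcode][next-hi][next-lo][operand...] */

/* Store a 16-bit offset in two consecutive chars, high byte first.
 * Values above 127 rely on the usual modular conversion when plain
 * char is signed. */
void put_next(char *node, unsigned offset)
{
    node[1] = (char)((offset >> 8) & 0377);
    node[2] = (char)(offset & 0377);
}

/* Read it back. Masking with 0377 undoes any sign extension when
 * plain char is signed, so bytes with the top bit set decode right. */
unsigned get_next(const char *node)
{
    return ((unsigned)(node[1] & 0377) << 8) + (unsigned)(node[2] & 0377);
}
```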