source: trunk/essentials/sys-apps/findutils/doc/regexprops.texi

Last change on this file was 3170, checked in by bird, 18 years ago

findutils 4.3.2

@menu
* findutils-default regular expression syntax::
* awk regular expression syntax::
* egrep regular expression syntax::
* emacs regular expression syntax::
* gnu-awk regular expression syntax::
* grep regular expression syntax::
* posix-awk regular expression syntax::
* posix-basic regular expression syntax::
* posix-egrep regular expression syntax::
* posix-extended regular expression syntax::
@end menu

@node findutils-default regular expression syntax
@subsection @samp{findutils-default} regular expression syntax

The character @samp{.} matches any single character.

@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.

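For the syntaxes without character classes, a class name can be expanded by hand into explicit ranges before the pattern is used. The Python sketch below illustrates the idea only. The helper @code{expand_posix_classes} and its table are hypothetical, cover just a few classes, assume ASCII (C locale) ranges, and assume that a class name appears only inside a bracket expression.

```python
import re

# Explicit ASCII ranges for a few POSIX class names (assumption: C locale).
POSIX_CLASSES = {
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "alpha": "A-Za-z",
    "alnum": "0-9A-Za-z",
    "xdigit": "0-9A-Fa-f",
}


def expand_posix_classes(pattern: str) -> str:
    """Replace [:name:] with explicit ranges, e.g. [[:digit:]] -> [0-9]."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Unknown class names are left untouched rather than guessed.
        return POSIX_CLASSES.get(name, match.group(0))

    return re.sub(r"\[:([a-z]+):\]", replace, pattern)


print(expand_posix_classes("[[:digit:]]+"))
print(expand_posix_classes("[[:alpha:]_][[:alnum:]_]*"))
```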
GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at any position that is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate

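As a rough illustration of the operators listed above, the Python sketch below uses Python's @code{re} module as a stand-in. This is an assumption made for illustration: Python is not one of the syntaxes documented here. It has @samp{\w}, @samp{\W}, @samp{\b} and @samp{\B}, but lacks @samp{\<}, @samp{\>}, @samp{\`} and @samp{\'}, so those are approximated with lookarounds and with @samp{\A} and @samp{\Z}.

```python
import re

text = "foo bar"

# Approximations (assumptions, not GNU regex syntax):
WORD_START = r"\b(?=\w)"    # stands in for \<
WORD_END = r"\b(?<=\w)"     # stands in for \>

# Offsets at which a word begins or ends.
starts = [m.start() for m in re.finditer(WORD_START, text)]
ends = [m.start() for m in re.finditer(WORD_END, text)]

# \B matches only at offsets that are not word boundaries.
non_boundaries = [m.start() for m in re.finditer(r"\B", text)]

# \A and \Z stand in for \` and \' (start and end of the whole input).
bar_at_start = re.search(r"\Abar", text)
bar_at_end = re.search(r"bar\Z", text)

print(starts, ends, non_boundaries)
print(bar_at_start is None, bar_at_end is not None)
```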
Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.

The alternation operator is @samp{\|}.

The character @samp{^} only represents the beginning of a string when it appears:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{\(}

@item After the alternation operator @samp{\|}

@end enumerate

The character @samp{$} only represents the end of a string when it appears:
@enumerate

@item At the end of a regular expression

@item Before a close-group, signified by @samp{\)}

@item Before the alternation operator @samp{\|}

@end enumerate

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{\(}

@item After the alternation operator @samp{\|}

@end enumerate

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

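The longest-match rule can differ from what backtracking engines do. The Python sketch below is an illustration under stated assumptions, not a description of how @code{find} is implemented. Python's @code{re} module returns the first alternative that succeeds, so the hypothetical helper @code{longest_match_at} emulates the rule for the expression as a whole by trying ever shorter end positions. Python writes alternation as @samp{|} where this syntax uses @samp{\|}.

```python
import re
from typing import Optional


def longest_match_at(pattern: str, text: str, pos: int = 0) -> Optional[str]:
    """Return the longest substring starting at pos that the whole pattern matches."""
    compiled = re.compile(pattern)
    # Try the longest candidate first; the first success is the longest match.
    for end in range(len(text), pos - 1, -1):
        if compiled.fullmatch(text, pos, end):
            return text[pos:end]
    return None


# Python's own matcher stops at the first alternative that works ...
first = re.match(r"a|ab", "abc").group(0)
# ... whereas the longest-match rule prefers the longer alternative.
longest = longest_match_at(r"a|ab", "abc")

print(first, longest)
```

The sketch covers only the whole-expression part of the rule; it does not model how subexpressions within groups are chosen.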
@node awk regular expression syntax
@subsection @samp{awk} regular expression syntax

The character @samp{.} matches any single character except the null character.

@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.

GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.

Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit matches that digit.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{(}

@item After the alternation operator @samp{|}

@end enumerate

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

@node egrep regular expression syntax
@subsection @samp{egrep} regular expression syntax

The character @samp{.} matches any single character except newline.

@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. Non-matching lists @samp{[^@dots{}]} never match newline.

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at any position that is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate

Grouping is performed with parentheses @samp{()}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.

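Group numbering and back-references can be tried out in any engine that shares this notation. The Python sketch below is only an illustration: Python's @code{re} module is an assumed stand-in that happens to use the same unescaped parentheses and the same @samp{\2} form, and the sample strings are made up.

```python
import re

# Groups are numbered by the position of their opening parenthesis:
# group 1 is ((a)(b)), group 2 is (a), group 3 is (b).
nested = re.fullmatch(r"((a)(b))", "ab")
groups = nested.groups()

# \2 must repeat exactly what the second group matched.
pattern = r"(\w+)-(\w+)-\2"
repeated = re.fullmatch(pattern, "left-right-right")
mismatch = re.fullmatch(pattern, "left-right-left")

print(groups)
print(repeated is not None, mismatch is None)
```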
The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.

The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

@node emacs regular expression syntax
@subsection @samp{emacs} regular expression syntax

The character @samp{.} matches any single character except newline.

@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at any position that is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate

Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.

The alternation operator is @samp{\|}.

The character @samp{^} only represents the beginning of a string when it appears:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{\(}

@item After the alternation operator @samp{\|}

@end enumerate

The character @samp{$} only represents the end of a string when it appears:
@enumerate

@item At the end of a regular expression

@item Before a close-group, signified by @samp{\)}

@item Before the alternation operator @samp{\|}

@end enumerate

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{\(}

@item After the alternation operator @samp{\|}

@end enumerate

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

@node gnu-awk regular expression syntax
@subsection @samp{gnu-awk} regular expression syntax

The character @samp{.} matches any single character.

@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at any position that is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate

Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{(}

@item After the alternation operator @samp{|}

@end enumerate

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

@node grep regular expression syntax
@subsection @samp{grep} regular expression syntax

The character @samp{.} matches any single character except newline.

@table @samp

@item \+
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item \?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item +
@itemx ?
match themselves.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. Non-matching lists @samp{[^@dots{}]} never match newline.

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at any position that is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate

Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.

The alternation operator is @samp{\|}.

The character @samp{^} only represents the beginning of a string when it appears:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{\(}

@item After a newline

@item After the alternation operator @samp{\|}

@end enumerate

The character @samp{$} only represents the end of a string when it appears:
@enumerate

@item At the end of a regular expression

@item Before a close-group, signified by @samp{\)}

@item Before a newline

@item Before the alternation operator @samp{\|}

@end enumerate

@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{\(}

@item After a newline

@item After the alternation operator @samp{\|}

@end enumerate

Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted.

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

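The backslashed operators of this syntax correspond to the bare operators of the @samp{egrep}-style syntaxes, so a pattern can be moved from one notation to the other by swapping the two forms. The Python sketch below illustrates that swap under stated assumptions: the helper @code{bre_to_ere} is hypothetical, it only exchanges the escaped and unescaped forms of the grouping, alternation, interval, @samp{+} and @samp{?} operators, it copies bracket expressions unchanged, and it ignores the context-dependent rules for @samp{^}, @samp{$} and leading repetition operators.

```python
# Characters whose escaped and unescaped meanings are exchanged
# between the two notations.
SWAPPED = set("+?(){}|")


def bre_to_ere(pattern: str) -> str:
    """Swap backslashed and bare forms of the operators in SWAPPED."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            # \+ becomes +; other escapes such as \w or \1 pass through.
            out.append(nxt if nxt in SWAPPED else ch + nxt)
            i += 2
        elif ch == "[":
            # Copy a bracket expression verbatim: nothing inside is swapped.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading ] is an ordinary member of the list
            while j < n and pattern[j] != "]":
                if pattern.startswith("[:", j):
                    # Step over a class name such as [:digit:].
                    end = pattern.find(":]", j + 2)
                    j = end + 2 if end != -1 else j + 1
                else:
                    j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch in SWAPPED:
            # A bare + is literal here, so it must be escaped on the other side.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(bre_to_ere(r"\(ab\)\+c+"))
print(bre_to_ere(r"[+?]\?x\{2,3\}"))
```

Because the swap is symmetric, applying the same function to an @samp{egrep}-style pattern gives the backslashed notation back, within the limits noted above.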
@node posix-awk regular expression syntax
@subsection @samp{posix-awk} regular expression syntax

The character @samp{.} matches any single character except the null character.

@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.

GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.

Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{(}

@item After the alternation operator @samp{|}

@end enumerate

Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals such as @samp{a@{1z} are not accepted.

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

@node posix-basic regular expression syntax
@subsection @samp{posix-basic} regular expression syntax

The character @samp{.} matches any single character except the null character.

@table @samp

@item \+
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item \?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item +
@itemx ?
match themselves.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at any position that is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate

Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.

The alternation operator is @samp{\|}.

The character @samp{^} only represents the beginning of a string when it appears:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{\(}

@item After the alternation operator @samp{\|}

@end enumerate

The character @samp{$} only represents the end of a string when it appears:
@enumerate

@item At the end of a regular expression

@item Before a close-group, signified by @samp{\)}

@item Before the alternation operator @samp{\|}

@end enumerate

@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{\(}

@item After the alternation operator @samp{\|}

@end enumerate

Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted.

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

@node posix-egrep regular expression syntax
@subsection @samp{posix-egrep} regular expression syntax

The character @samp{.} matches any single character except newline.

@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. Non-matching lists @samp{[^@dots{}]} never match newline.

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at any position that is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate

Grouping is performed with parentheses @samp{()}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.

The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.

Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals are treated as literals; for example, @samp{a@{1} is treated as @samp{a\@{1}.

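The interval behaviour described above can be observed in Python's @code{re} module, which is used here only as an assumed stand-in: it also writes intervals with bare braces and also falls back to literal characters when an interval is malformed. The sample strings are made up for illustration.

```python
import re

# A valid interval: the previous atom repeated between 1 and 3 times.
two = re.fullmatch(r"a{1,3}", "aa")
four = re.fullmatch(r"a{1,3}", "aaaa")

# A malformed interval is read as ordinary characters, so the pattern
# behaves as if the brace had been escaped.
unescaped = re.fullmatch(r"a{1", "a{1")
escaped = re.fullmatch(r"a\{1", "a{1")

print(two is not None, four is None)
print(unescaped is not None, escaped is not None)
```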
The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.

@node posix-extended regular expression syntax
@subsection @samp{posix-extended} regular expression syntax

The character @samp{.} matches any single character except the null character.

@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at any position that is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate

Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by @samp{(}

@item After the alternation operator @samp{|}

@end enumerate

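A repetition operator in one of the three places listed above has nothing to repeat, which is why it is rejected. The Python sketch below shows the same kind of rejection in Python's @code{re} module. Python is an assumed stand-in here, not this syntax, and the helper @code{is_accepted} is hypothetical.

```python
import re


def is_accepted(pattern: str) -> bool:
    """Report whether Python's re module compiles the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# One pattern per position in the list: start of the expression,
# after an open-group, and after the alternation operator.
rejected = [is_accepted(p) for p in ("*a", "(+a)", "a|?b")]

# Escaping the operator turns it into an ordinary character again.
accepted = [is_accepted(p) for p in (r"\*a", r"(\+a)", r"a|\?b")]

print(rejected, accepted)
```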
Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals such as @samp{a@{1z} are not accepted.

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
