sed.texi@ 1846

Last change on this file since 1846 was 599, checked in by bird, 19 years ago
GNU sed 4.1.5.
1	\input texinfo @c --texinfo--
2	@c Do not edit this file!! It is automatically generated from sed-in.texi.
3	@c
4	@c -- Stuff that needs adding: ----------------------------------------------
5	@c (document the `;' command-separator)
6	@c --------------------------------------------------------------------------
7	@c Check for consistency: regexps in @code, text that they match in @samp.
8	@c
9	@c Tips:
10	@c @command for command
11	@c @samp for command fragments: @samp{cat -s}
12	@c @code for sed commands and flags
13	@c Use ``quote'' not `quote' or "quote".
14	@c
15	@c %**start of header
16	@setfilename sed.info
17	@settitle sed, a stream editor
18	@c %**end of header
19
20	@c @smallbook
21
22	@include version.texi
23
24	@c Combine indices.
25	@syncodeindex ky cp
26	@syncodeindex pg cp
27	@syncodeindex tp cp
28
29	@defcodeindex op
30	@syncodeindex op fn
31
32	@include config.texi
33
34	@copying
35	This file documents version @value{VERSION} of
36	@value{SSED}, a stream editor.
37
38	Copyright @copyright{} 1998, 1999, 2001, 2002, 2003, 2004 Free
39	Software Foundation, Inc.
40
41	This document is released under the terms of the @acronym{GNU} Free
42	Documentation License as published by the Free Software Foundation;
43	either version 1.1, or (at your option) any later version.
44
45	You should have received a copy of the @acronym{GNU} Free Documentation
46	License along with @value{SSED}; see the file @file{COPYING.DOC}.
47	If not, write to the Free Software Foundation, 59 Temple Place - Suite
48	330, Boston, MA 02110-1301, USA.
49
50	There are no Cover Texts and no Invariant Sections; this text, along
51	with its equivalent in the printed manual, constitutes the Title Page.
52	@end copying
53
54	@setchapternewpage off
55
56	@titlepage
57	@title @command{sed}, a stream editor
58	@subtitle version @value{VERSION}, @value{UPDATED}
59	@author by Ken Pizzini, Paolo Bonzini
60
61	@page
62	@vskip 0pt plus 1filll
63	Copyright @copyright{} 1998, 1999 Free Software Foundation, Inc.
64
65	@insertcopying
66
67	Published by the Free Software Foundation, @*
68	51 Franklin Street, Fifth Floor @*
69	Boston, MA 02110-1301, USA
70	@end titlepage
71
72
73	@node Top
74	@top
75
76	@ifnottex
77	@insertcopying
78	@end ifnottex
79
80	@menu
81	* Introduction:: Introduction
82	* Invoking sed:: Invocation
83	* sed Programs:: @command{sed} programs
84	* Examples:: Some sample scripts
85	* Limitations:: Limitations and (non-)limitations of @value{SSED}
86	* Other Resources:: Other resources for learning about @command{sed}
87	* Reporting Bugs:: Reporting bugs
88
89	* Extended regexps:: @command{egrep}-style regular expressions
90	@ifset PERL
91	* Perl regexps:: Perl-style regular expressions
92	@end ifset
93
94	* Concept Index:: A menu with all the topics in this manual.
95	* Command and Option Index:: A menu with all @command{sed} commands and
96	command-line options.
97
98	@detailmenu
99	--- The detailed node listing ---
100
101	sed Programs:
102	* Execution Cycle:: How @command{sed} works
103	* Addresses:: Selecting lines with @command{sed}
104	* Regular Expressions:: Overview of regular expression syntax
105	* Common Commands:: Often used commands
106	* The "s" Command:: @command{sed}'s Swiss Army Knife
107	* Other Commands:: Less frequently used commands
108	* Programming Commands:: Commands for @command{sed} gurus
109	* Extended Commands:: Commands specific to @value{SSED}
110	* Escapes:: Specifying special characters
111
112	Examples:
113	* Centering lines::
114	* Increment a number::
115	* Rename files to lower case::
116	* Print bash environment::
117	* Reverse chars of lines::
118	* tac:: Reverse lines of files
119	* cat -n:: Numbering lines
120	* cat -b:: Numbering non-blank lines
121	* wc -c:: Counting chars
122	* wc -w:: Counting words
123	* wc -l:: Counting lines
124	* head:: Printing the first lines
125	* tail:: Printing the last lines
126	* uniq:: Make duplicate lines unique
127	* uniq -d:: Print duplicated lines of input
128	* uniq -u:: Remove all duplicated lines
129	* cat -s:: Squeezing blank lines
130
131	@ifset PERL
132	Perl regexps:: Perl-style regular expressions
133	* Backslash:: Introduces special sequences
134	* Circumflex/dollar sign/period:: Behave specially with regard to new lines
135	* Square brackets:: Are a bit different in strange cases
136	* Options setting:: Toggle modifiers in the middle of a regexp
137	* Non-capturing subpatterns:: Are not counted when backreferencing
138	* Repetition:: Allows for non-greedy matching
139	* Backreferences:: Allows for more than 10 back references
140	* Assertions:: Allows for complex look ahead matches
141	* Non-backtracking subpatterns:: Often gives more performance
142	* Conditional subpatterns:: Allows if/then/else branches
143	* Recursive patterns:: For example to match parentheses
144	* Comments:: Because things can get complex...
145	@end ifset
146
147	@end detailmenu
148	@end menu
149
150
151	@node Introduction
152	@chapter Introduction
153
154	@cindex Stream editor
155	@command{sed} is a stream editor.
156	A stream editor is used to perform basic text
157	transformations on an input stream
158	(a file or input from a pipeline).
159	While in some ways similar to an editor which
160	permits scripted edits (such as @command{ed}),
161	@command{sed} works by making only one pass over the
162	input(s), and is consequently more efficient.
163	But it is @command{sed}'s ability to filter text in a pipeline
164	which particularly distinguishes it from other types of
165	editors.
166
167
168	@node Invoking sed
169	@chapter Invocation
170
171	Normally @command{sed} is invoked like this:
172
173	@example
174	sed SCRIPT INPUTFILE...
175	@end example
176
177	The full format for invoking @command{sed} is:
178
179	@example
180	sed OPTIONS... [SCRIPT] [INPUTFILE...]
181	@end example
182
183	If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
184	@command{sed} filters the contents of the standard input. The @var{script}
185	is actually the first non-option parameter, which @command{sed} specially
186	considers a script and not an input file if (and only if) none of the
187	other @var{options} specifies a script to be executed, that is if neither
188	of the @option{-e} and @option{-f} options is specified.
189
190	@command{sed} may be invoked with the following command-line options:
191
192	@table @code
193	@item --version
194	@opindex --version
195	@cindex Version, printing
196	Print out the version of @command{sed} that is being run and a copyright notice,
197	then exit.
198
199	@item --help
200	@opindex --help
201	@cindex Usage summary, printing
202	Print a usage message briefly summarizing these command-line options
203	and the bug-reporting address,
204	then exit.
205
206	@item -n
207	@itemx --quiet
208	@itemx --silent
209	@opindex -n
210	@opindex --quiet
211	@opindex --silent
212	@cindex Disabling autoprint, from command line
213	By default, @command{sed} prints out the pattern space
214	at the end of each cycle through the script.
215	These options disable this automatic printing,
216	and @command{sed} only produces output when explicitly told to
217	via the @code{p} command.
218
219	@item -i[@var{SUFFIX}]
220	@itemx --in-place[=@var{SUFFIX}]
221	@opindex -i
222	@opindex --in-place
223	@cindex In-place editing, activating
224	@cindex @value{SSEDEXT}, in-place editing
225	This option specifies that files are to be edited in-place.
226	@value{SSED} does this by creating a temporary file and
227	sending output to this file rather than to the standard
228	output.@footnote{This applies to commands such as @code{=},
229	@code{a}, @code{c}, @code{i}, @code{l}, @code{p}. You can
230	still write to the standard output by using the @code{w}
231	@cindex @value{SSEDEXT}, @file{/dev/stdout} file
232	or @code{W} commands together with the @file{/dev/stdout}
233	special file}.
234
235	This option implies @option{-s}.
236
237	When the end of the file is reached, the temporary file is
238	renamed to the output file's original name. The extension,
239	if supplied, is used to modify the name of the old file
240	before renaming the temporary file, thereby making a backup
241	copy@footnote{Note that @value{SSED} creates the backup
242	file whether or not any output is actually changed.}.
243
244	@cindex In-place editing, Perl-style backup file names
245	This rule is followed: if the extension doesn't contain a @code{*},
246	then it is appended to the end of the current filename as a
247	suffix; if the extension does contain one or more @code{*}
248	characters, then @emph{each} asterisk is replaced with the
249	current filename. This allows you to add a prefix to the
250	backup file, instead of (or in addition to) a suffix, or
251	even to place backup copies of the original files into another
252	directory (provided the directory already exists).
253
254	If no extension is supplied, the original file is
255	overwritten without making a backup.
256
257	@item -l @var{N}
258	@itemx --line-length=@var{N}
259	@opindex -l
260	@opindex --line-length
261	@cindex Line length, setting
262	Specify the default line-wrap length for the @code{l} command.
263	A length of 0 (zero) means to never wrap long lines. If
264	not specified, it is taken to be 70.
265
266	@item --posix
267	@cindex @value{SSEDEXT}, disabling
268	@value{SSED} includes several extensions to @acronym{POSIX}
269	sed. In order to simplify writing portable scripts, this
270	option disables all the extensions that this manual documents,
271	including additional commands.
272	@cindex @code{POSIXLY_CORRECT} behavior, enabling
273	Most of the extensions accept @command{sed} programs that
274	are outside the syntax mandated by @acronym{POSIX}, but some
275	of them (such as the behavior of the @command{N} command
276	described in @ref{Reporting Bugs}) actually violate the
277	standard. If you want to disable only the latter kind of
278	extension, you can set the @code{POSIXLY_CORRECT} variable
279	to a non-empty value.
280
281	@item -r
282	@itemx --regexp-extended
283	@opindex -r
284	@opindex --regexp-extended
285	@cindex Extended regular expressions, choosing
286	@cindex @acronym{GNU} extensions, extended regular expressions
287	Use extended regular expressions rather than basic
288	regular expressions. Extended regexps are those that
289	@command{egrep} accepts; they can be clearer because they
290	usually have fewer backslashes, but are a @acronym{GNU} extension
291	and hence scripts that use them are not portable.
292	@xref{Extended regexps, , Extended regular expressions}.
293
294	@ifset PERL
295	@item -R
296	@itemx --regexp-perl
297	@opindex -R
298	@opindex --regexp-perl
299	@cindex Perl-style regular expressions, choosing
300	@cindex @value{SSEDEXT}, Perl-style regular expressions
301	Use Perl-style regular expressions rather than basic
302	regular expressions. Perl-style regexps are extremely
303	powerful but are a @value{SSED} extension and hence scripts that
304	use them are not portable. @xref{Perl regexps, ,
305	Perl-style regular expressions}.
306	@end ifset
307
308	@item -s
309	@itemx --separate
310	@cindex Working on separate files
311	By default, @command{sed} will consider the files specified on the
312	command line as a single continuous long stream. This @value{SSED}
313	extension allows the user to consider them as separate files:
314	range addresses (such as @samp{/abc/,/def/}) are not allowed
315	to span several files, line numbers are relative to the start
316	of each file, @code{$} refers to the last line of each file,
317	and files invoked from the @code{R} commands are rewound at the
318	start of each file.
319
320	@item -u
321	@itemx --unbuffered
322	@opindex -u
323	@opindex --unbuffered
324	@cindex Unbuffered I/O, choosing
325	Buffer both input and output as minimally as practical.
326	(This is particularly useful if the input is coming from
327	the likes of @samp{tail -f}, and you wish to see the transformed
328	output as soon as possible.)
329
330	@item -e @var{script}
331	@itemx --expression=@var{script}
332	@opindex -e
333	@opindex --expression
334	@cindex Script, from command line
335	Add the commands in @var{script} to the set of commands to be
336	run while processing the input.
337
338	@item -f @var{script-file}
339	@itemx --file=@var{script-file}
340	@opindex -f
341	@opindex --file
342	@cindex Script, from a file
343	Add the commands contained in the file @var{script-file}
344	to the set of commands to be run while processing the input.
345
346	@end table
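As a minimal sketch of the @option{-i} behavior described above, the
following shell function edits a scratch file in place and keeps a
backup with the suffix @samp{.bak}.  The function name, the file name
@file{greeting.txt} and the use of @command{mktemp} are illustrative
assumptions, not part of @command{sed}.

```shell
# Hypothetical demo: edit a scratch file in place, keeping a backup copy.
demo_in_place() {
    dir=$(mktemp -d) || return 1
    printf 'hello world\n' > "$dir/greeting.txt"
    # -i.bak rewrites greeting.txt and saves the original as greeting.txt.bak
    sed -i.bak 's/hello/goodbye/' "$dir/greeting.txt"
    # Show the edited file first, then the untouched backup.
    cat "$dir/greeting.txt" "$dir/greeting.txt.bak"
    rm -rf "$dir"
}
```

Calling @code{demo_in_place} should print the edited line followed by
the original line taken from the backup file.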
347
348	If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file}
349	options are given on the command-line,
350	then the first non-option argument on the command line is
351	taken to be the @var{script} to be executed.
352
353	@cindex Files to be processed as input
354	If any command-line parameters remain after processing the above,
355	these parameters are interpreted as the names of input files to
356	be processed.
357	@cindex Standard input, processing as input
358	A file name of @samp{-} refers to the standard input stream.
359	The standard input will be processed if no file names are specified.
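The rules above can be sketched with a few invocations that read from
the standard input.  The function names are only labels chosen for this
example.

```shell
# The script is the first non-option argument; input comes from stdin.
first_arg_script() { printf 'one\ntwo\n' | sed 's/o/0/'; }

# With -e, every expression is added to the script, in order.
two_expressions() { printf 'one\ntwo\n' | sed -e 's/o/0/' -e 's/t/T/'; }

# -n disables autoprint, so only the explicit p command produces output.
quiet_print() { printf 'one\ntwo\n' | sed -n '2p'; }

# A file name of - names the standard input explicitly.
dash_is_stdin() { printf 'one\ntwo\n' | sed -n '1p' -; }
```

Note that @code{s/o/0/} replaces only the first @samp{o} on each line,
since no @code{g} flag is given.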
360
361
362	@node sed Programs
363	@chapter @command{sed} Programs
364
365	@cindex @command{sed} program structure
366	@cindex Script structure
367	A @command{sed} program consists of one or more @command{sed} commands,
368	passed in by one or more of the
369	@option{-e}, @option{-f}, @option{--expression}, and @option{--file}
370	options, or the first non-option argument if zero of these
371	options are used.
372	This document will refer to ``the'' @command{sed} script;
373	this is understood to mean the in-order catenation
374	of all of the @var{script}s and @var{script-file}s passed in.
375
376	Each @code{sed} command consists of an optional address or
377	address range, followed by a one-character command name
378	and any additional command-specific code.
379
380	@menu
381	* Execution Cycle:: How @command{sed} works
382	* Addresses:: Selecting lines with @command{sed}
383	* Regular Expressions:: Overview of regular expression syntax
384	* Common Commands:: Often used commands
385	* The "s" Command:: @command{sed}'s Swiss Army Knife
386	* Other Commands:: Less frequently used commands
387	* Programming Commands:: Commands for @command{sed} gurus
388	* Extended Commands:: Commands specific to @value{SSED}
389	* Escapes:: Specifying special characters
390	@end menu
391
392
393	@node Execution Cycle
394	@section How @command{sed} Works
395
396	@cindex Buffer spaces, pattern and hold
397	@cindex Spaces, pattern and hold
398	@cindex Pattern space, definition
399	@cindex Hold space, definition
400	@command{sed} maintains two data buffers: the active @emph{pattern} space,
401	and the auxiliary @emph{hold} space. Both are initially empty.
402
403	@command{sed} operates by performing the following cycle on each
404	line of input: first, @command{sed} reads one line from the input
405	stream, removes any trailing newline, and places it in the pattern space.
406	Then commands are executed; each command can have an address associated
407	with it: addresses are a kind of condition code, and a command is
408	executed only if the condition holds when the command is about to be
409	executed.
410
411	When the end of the script is reached, unless the @option{-n} option
412	is in use, the contents of pattern space are printed out to the output
413	stream, adding back the trailing newline if it was removed.@footnote{Actually,
414	if @command{sed} prints a line without the terminating newline, it will
415	nevertheless print the missing newline as soon as more text is sent to
416	the same output stream, which gives the ``least expected surprise''
417	even though it does not make commands like @samp{sed -n p} exactly
418	identical to @command{cat}.} Then the next cycle starts for the next
419	input line.
420
421	Unless special commands (like @samp{D}) are used, the pattern space is
422	deleted between two cycles. The hold space, on the other hand, keeps
423	its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
424	@samp{g}, @samp{G} to move data between both buffers).
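To make the cycle concrete, here is a small sketch.  The first function
shows autoprint combined with an explicit @code{p}; the second uses the
hold space to accumulate lines across cycles and print them in reverse
order.  The function names are illustrative only.

```shell
# Autoprint plus an explicit p: every input line appears twice.
print_twice() { sed p; }

# Reverse the order of lines using the hold space:
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space into the hold space
#   $p   on the last line, print the accumulated text
reverse_lines() { sed -n '1!G;h;$p'; }
```

For instance, @samp{printf 'a\nb\nc\n' | reverse_lines} prints
@samp{c}, @samp{b} and @samp{a} on separate lines, because the hold
space survives from one cycle to the next while the pattern space is
refilled on each cycle.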
425
426
427	@node Addresses
428	@section Selecting lines with @command{sed}
429	@cindex Addresses, in @command{sed} scripts
430	@cindex Line selection
431	@cindex Selecting lines to process
432
433	Addresses in a @command{sed} script can be in any of the following forms:
434	@table @code
435	@item @var{number}
436	@cindex Address, numeric
437	@cindex Line, selecting by number
438	Specifying a line number will match only that line in the input.
439	(Note that @command{sed} counts lines continuously across all input files
440	unless @option{-i} or @option{-s} options are specified.)
441
442	@item @var{first}~@var{step}
443	@cindex @acronym{GNU} extensions, @samp{@var{n}~@var{m}} addresses
444	This @acronym{GNU} extension matches every @var{step}th line
445	starting with line @var{first}.
446	In particular, lines will be selected when there exists
447	a non-negative @var{n} such that the current line-number equals
448	@var{first} + (@var{n} * @var{step}).
449	Thus, to select the odd-numbered lines,
450	one would use @code{1~2};
451	to pick every third line starting with the second, @samp{2~3} would be used;
452	to pick every fifth line starting with the tenth, use @samp{10~5};
453	and @samp{50~0} is just an obscure way of saying @code{50}.
454
455	@item $
456	@cindex Address, last line
457	@cindex Last line, selecting
458	@cindex Line, selecting last
459	This address matches the last line of the last file of input, or
460	the last line of each file when the @option{-i} or @option{-s} options
461	are specified.
462
463	@item /@var{regexp}/
464	@cindex Address, as a regular expression
465	@cindex Line, selecting by regular expression match
466	This will select any line which matches the regular expression @var{regexp}.
467	If @var{regexp} itself includes any @code{/} characters,
468	each must be escaped by a backslash (@code{\}).
469
470	@cindex empty regular expression
471	@cindex @value{SSEDEXT}, modifiers and the empty regular expression
472	The empty regular expression @samp{//} repeats the last regular
473	expression match (the same holds if the empty regular expression is
474	passed to the @code{s} command). Note that modifiers to regular expressions
475	are evaluated when the regular expression is compiled, thus it is invalid to
476	specify them together with the empty regular expression.
477
478	@item \%@var{regexp}%
479	(The @code{%} may be replaced by any other single character.)
480
481	@cindex Slash character, in regular expressions
482	This also matches the regular expression @var{regexp},
483	but allows one to use a different delimiter than @code{/}.
484	This is particularly useful if the @var{regexp} itself contains
485	a lot of slashes, since it avoids the tedious escaping of every @code{/}.
486	If @var{regexp} itself includes any delimiter characters,
487	each must be escaped by a backslash (@code{\}).
488
489	@item /@var{regexp}/I
490	@itemx \%@var{regexp}%I
491	@cindex @acronym{GNU} extensions, @code{I} modifier
492	@ifset PERL
493	@cindex Perl-style regular expressions, case-insensitive
494	@end ifset
495	The @code{I} modifier to regular-expression matching is a @acronym{GNU}
496	extension which causes the @var{regexp} to be matched in
497	a case-insensitive manner.
498
499	@item /@var{regexp}/M
500	@itemx \%@var{regexp}%M
501	@ifset PERL
502	@cindex @value{SSEDEXT}, @code{M} modifier
503	@end ifset
504	@cindex Perl-style regular expressions, multiline
505	The @code{M} modifier to regular-expression matching is a @value{SSED}
506	extension which causes @code{^} and @code{$} to match respectively
507	(in addition to the normal behavior) the empty string after a newline,
508	and the empty string before a newline. There are special character
509	sequences
510	@ifset PERL
511	(@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'}
512	in basic or extended regular expression modes)
513	@end ifset
514	@ifclear PERL
515	(@code{\`} and @code{\'})
516	@end ifclear
517	which always match the beginning or the end of the buffer.
518	@code{M} stands for @cite{multi-line}.
519
520	@ifset PERL
521	@item /@var{regexp}/S
522	@itemx \%@var{regexp}%S
523	@cindex @value{SSEDEXT}, @code{S} modifier
524	@cindex Perl-style regular expressions, single line
525	The @code{S} modifier to regular-expression matching is only valid
526	in Perl mode and specifies that the dot character (@code{.}) will
527	match the newline character too. @code{S} stands for @cite{single-line}.
528	@end ifset
529
530	@ifset PERL
531	@item /@var{regexp}/X
532	@itemx \%@var{regexp}%X
533	@cindex @value{SSEDEXT}, @code{X} modifier
534	@cindex Perl-style regular expressions, extended
535	The @code{X} modifier to regular-expression matching is also
536	valid in Perl mode only. If it is used, whitespace in the
537	pattern (other than in a character class) and
538	characters between a @kbd{#} outside a character class and the
539	next newline character are ignored. An escaping backslash
540	can be used to include a whitespace or @kbd{#} character as part
541	of the pattern.
542	@end ifset
543	@end table
544
545	If no addresses are given, then all lines are matched;
546	if one address is given, then only lines matching that
547	address are matched.
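The single-address forms can be sketched as follows.  The sample input
and the function names are made up for this example; the @code{I}
modifier and the @samp{first~step} form are @acronym{GNU} extensions, so
the last two functions assume @acronym{GNU} @command{sed}.

```shell
# Four sample lines used by the functions below.
paths() { printf '%s\n' /usr/bin /usr/local/bin /opt/Tools /tmp; }

third_line()  { paths | sed -n '3p'; }              # numeric address
last_line()   { paths | sed -n '$p'; }              # last line of input
alt_delim()   { paths | sed -n '\%/usr/local%p'; }  # regexp with % as delimiter
ignore_case() { paths | sed -n '/tools/Ip'; }       # GNU: case-insensitive match
every_other() { paths | sed -n '1~2p'; }            # GNU: lines 1, 3, 5, ...
```

The @code{alt_delim} form avoids writing @samp{/\/usr\/local/}, which
is the equivalent address with the default delimiter.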
548
549	@cindex Range of lines
550	@cindex Several lines, selecting
551	An address range can be specified by specifying two addresses
552	separated by a comma (@code{,}). An address range matches lines
553	starting from where the first address matches, and continues
554	until the second address matches (inclusively).
555
556	If the second address is a @var{regexp}, then checking for the
557	ending match will start with the line @emph{following} the
558	line which matched the first address: a range will always
559	span at least two lines (except of course if the input stream
560	ends).
561
562	If the second address is a @var{number} less than (or equal to)
563	the line matching the first address, then only the one line is
564	matched.
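A short sketch of these range rules, using made-up sample input and
illustrative function names:

```shell
sample() { printf '%s\n' alpha begin beta end gamma; }

# Lines 2 through 4, inclusive.
by_number() { sample | sed -n '2,4p'; }

# From the first line matching "begin" through the next line matching "end".
by_regexp() { sample | sed -n '/begin/,/end/p'; }

# Second address is a number not greater than the first: one line only.
one_line() { sample | sed -n '4,2p'; }

# The end regexp is first tried on the line AFTER the start line, so the
# range covers two lines even though line 1 already matches /x/.
two_lines() { printf 'x\nx\ny\n' | sed -n '/x/,/x/p'; }
```

Here @code{by_number} and @code{by_regexp} select the same three lines
(@samp{begin}, @samp{beta}, @samp{end}), and @code{one_line} prints
only @samp{end}.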
565
566	@cindex Special addressing forms
567	@cindex Range with start address of zero
568	@cindex Zero, as range start address
569	@cindex @var{addr1},+N
570	@cindex @var{addr1},~N
571	@cindex @acronym{GNU} extensions, special two-address forms
572	@cindex @acronym{GNU} extensions, @code{0} address
573	@cindex @acronym{GNU} extensions, 0,@var{addr2} addressing
574	@cindex @acronym{GNU} extensions, @var{addr1},+@var{N} addressing
575	@cindex @acronym{GNU} extensions, @var{addr1},~@var{N} addressing
576	@value{SSED} also supports some special two-address forms; all these
577	are @acronym{GNU} extensions:
578	@table @code
579	@item 0,/@var{regexp}/
580	A line number of @code{0} can be used in an address specification like
581	@code{0,/@var{regexp}/} so that @command{sed} will try to match
582	@var{regexp} in the first input line too. In other words,
583	@code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
584	except that if @var{addr2} matches the very first line of input the
585	@code{0,/@var{regexp}/} form will consider it to end the range, whereas
586	the @code{1,/@var{regexp}/} form will match the beginning of its range and
587	hence make the range span up to the @emph{second} occurrence of the
588	regular expression.
589
590	Note that this is the only place where the @code{0} address makes
591	sense; there is no 0-th line and commands which are given the @code{0}
592	address in any other way will give an error.
593
594	@item @var{addr1},+@var{N}
595	Matches @var{addr1} and the @var{N} lines following @var{addr1}.
596
597	@item @var{addr1},~@var{N}
598	Matches @var{addr1} and the lines following @var{addr1}
599	until the next line whose input line number is a multiple of @var{N}.
600	@end table
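The special two-address forms can be tried out as in the following
sketch, which assumes @acronym{GNU} @command{sed} since all of them are
extensions.  The function names and sample inputs are illustrative.

```shell
nums() { printf '%s\n' 1 2 3 4 5 6 7 8 9 10; }

# addr1,+N : the matching line and the two lines after it.
plus_two() { nums | sed -n '/4/,+2p'; }

# addr1,~N : from line 5 up to the next line number that is a multiple of 4.
until_multiple() { nums | sed -n '5,~4p'; }

# 0,/re/ may end the range on the very first line ...
zero_start() { printf 'x\nx\ny\n' | sed -n '0,/x/p'; }

# ... whereas 1,/re/ starts on line 1 and looks for the end from line 2 on.
one_start() { printf 'x\nx\ny\n' | sed -n '1,/x/p'; }
```

With this input @code{zero_start} prints a single @samp{x}, while
@code{one_start} prints two.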
601
602	@cindex Excluding lines
603	@cindex Selecting non-matching lines
604	Appending the @code{!} character to the end of an address
605	specification negates the sense of the match.
606	That is, if the @code{!} character follows an address range,
607	then only lines which do @emph{not} match the address range
608	will be selected.
609	This also works for singleton addresses,
610	and, perhaps perversely, for the null address.
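As a brief sketch of negated addresses (function names are illustrative
only):

```shell
# Print every line that is NOT in the range 2,4.
outside_range() { printf '%s\n' 1 2 3 4 5 | sed -n '2,4!p'; }

# Delete every line except the last one.
only_last() { printf '%s\n' 1 2 3 4 5 | sed '$!d'; }
```

The first function prints @samp{1} and @samp{5}; the second prints
only @samp{5}.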
611
612
613	@node Regular Expressions
614	@section Overview of Regular Expression Syntax
615
616	To know how to use @command{sed}, people should understand regular
617	expressions (@dfn{regexp} for short). A regular expression
618	is a pattern that is matched against a
619	subject string from left to right. Most characters are
620	@dfn{ordinary}: they stand for
621	themselves in a pattern, and match the corresponding characters
622	in the subject. As a trivial example, the pattern
623
624	@example
625	The quick brown fox
626	@end example
627
628	@noindent
629	matches a portion of a subject string that is identical to
630	itself. The power of regular expressions comes from the
631	ability to include alternatives and repetitions in the pattern.
632	These are encoded in the pattern by the use of @dfn{special characters},
633	which do not stand for themselves but instead
634	are interpreted in some special way. Here is a brief description
635	of regular expression syntax as used in @command{sed}.
636
637	@table @code
638	@item @var{char}
639	A single ordinary character matches itself.
640
641	@item *
642	@cindex @acronym{GNU} extensions, to basic regular expressions
643	Matches a sequence of zero or more instances of matches for the
644	preceding regular expression, which must be an ordinary character, a
645	special character preceded by @code{\}, a @code{.}, a grouped regexp
646	(see below), or a bracket expression. As a @acronym{GNU} extension, a
647	postfixed regular expression can also be followed by @code{*}; for
648	example, @code{a**} is equivalent to @code{a*}. @acronym{POSIX}
649	1003.1-2001 says that @code{*} stands for itself when it appears at
650	the start of a regular expression or subexpression, but many
651	non-@acronym{GNU} implementations do not support this and portable
652	scripts should instead use @code{\*} in these contexts.
653
654	@item \+
655	@cindex @acronym{GNU} extensions, to basic regular expressions
656	As @code{*}, but matches one or more. It is a @acronym{GNU} extension.
657
658	@item \?
659	@cindex @acronym{GNU} extensions, to basic regular expressions
660	As @code{*}, but only matches zero or one. It is a @acronym{GNU} extension.
661
662	@item \@{@var{i}\@}
663	As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
664	decimal integer; for portability, keep it between 0 and 255
665	inclusive).
666
667	@item \@{@var{i},@var{j}\@}
668	Matches between @var{i} and @var{j}, inclusive, sequences.
669
670	@item \@{@var{i},\@}
671	Matches more than or equal to @var{i} sequences.
672
673	@item \(@var{regexp}\)
674	Groups the inner @var{regexp} as a whole; this is used to:
675
676	@itemize @bullet
677	@item
678	@cindex @acronym{GNU} extensions, to basic regular expressions
679	Apply postfix operators, like @code{\(abcd\)*}:
680	this will search for zero or more whole sequences
681	of @samp{abcd}, while @code{abcd*} would search
682	for @samp{abc} followed by zero or more occurrences
683	of @samp{d}. Note that support for @code{\(abcd\)*} is
684	required by @acronym{POSIX} 1003.1-2001, but many non-@acronym{GNU}
685	implementations do not support it and hence it is not universally
686	portable.
687
688	@item
689	Use back references (see below).
690	@end itemize
691
692	@item .
693	Matches any character, including newline.
694
695	@item ^
696	Matches the null string at beginning of line, i.e. what
697	appears after the circumflex must appear at the
698	beginning of line. @code{^#include} will match only
699	lines where @samp{#include} is the first thing on line---if
700	there are spaces before, for example, the match fails.
701	@code{^} acts as a special character only at the beginning
702	of the regular expression or subexpression (that is,
703	after @code{\(} or @code{\|}). Portable scripts should avoid
704	@code{^} at the beginning of a subexpression, though, as
705	@acronym{POSIX} allows implementations that treat @code{^} as
706	an ordinary character in that context.
707
708
709	@item $
710	It is the same as @code{^}, but refers to end of line.
711	@code{$} also acts as a special character only at the end
712	of the regular expression or subexpression (that is, before @code{\)}
713	or @code{\|}), and its use at the end of a subexpression is not
714	portable.
715
716
717	@item [@var{list}]
718	@itemx [^@var{list}]
719	Matches any single character in @var{list}: for example,
720	@code{[aeiou]} matches all vowels. A list may include
721	sequences like @code{@var{char1}-@var{char2}}, which
722	matches any character between (inclusive) @var{char1}
723	and @var{char2}.
724
725	A leading @code{^} reverses the meaning of @var{list}, so that
726	it matches any single character @emph{not} in @var{list}. To include
727	@code{]} in the list, make it the first character (after
728	the @code{^} if needed); to include @code{-} in the list,
729	make it the first or last; to include @code{^}, put
730	it after the first character.
731
732	@cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
733	The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
734	are normally not special within @var{list}. For example, @code{[\*]}
735	matches either @samp{\} or @samp{*}, because the @code{\} is not
736	special here. However, strings like @code{[.ch.]}, @code{[=a=]}, and
737	@code{[:space:]} are special within @var{list} and represent collating
738	symbols, equivalence classes, and character classes, respectively, and
739	@code{[} is therefore special within @var{list} when it is followed by
740	@code{.}, @code{=}, or @code{:}. Also, when not in
741	@env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
742	@code{\t} are recognized within @var{list}. @xref{Escapes}.
743
744	@item @var{regexp1}\|@var{regexp2}
745	@cindex @acronym{GNU} extensions, to basic regular expressions
746	Matches either @var{regexp1} or @var{regexp2}. Use
747	parentheses to use complex alternative regular expressions.
748	The matching process tries each alternative in turn, from
749	left to right, and the first one that succeeds is used.
750	It is a @acronym{GNU} extension.
751
752	@item @var{regexp1}@var{regexp2}
753	Matches the concatenation of @var{regexp1} and @var{regexp2}.
754	Concatenation binds more tightly than @code{\|}, @code{^}, and
755	@code{$}, but less tightly than the other regular expression
756	operators.
757
758	@item \@var{digit}
759	Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
760	subexpression in the regular expression. This is called a @dfn{back
761	reference}. Subexpressions are implicitly numbered by counting
762	occurrences of @code{\(} left-to-right.
763
764	@item \n
765	Matches the newline character.
766
767	@item \@var{char}
768	Matches @var{char}, where @var{char} is one of @code{$},
769	@code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
770	Note that the only C-like
771	backslash sequences that you can portably assume to be
772	interpreted are @code{\n} and @code{\\}; in particular
773	@code{\t} is not portable, and matches a @samp{t} under most
774	implementations of @command{sed}, rather than a tab character.
775
776	@end table
777
778	@cindex Greedy regular expression matching
779	Note that the regular expression matcher is greedy, i.e., matches
780	are attempted from left to right and, if two or more matches are
781	possible starting at the same character, it selects the longest.
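A small sketch of what greediness means in practice.  The sample text
and function names are invented for this example.

```shell
# .* is greedy: the match runs from the first < to the LAST > on the line.
greedy() { printf 'a <b> c <d> e\n' | sed 's/<.*>/X/'; }

# A bracket expression that excludes > stops the match at the first >.
bounded() { printf 'a <b> c <d> e\n' | sed 's/<[^>]*>/X/'; }
```

The first function prints @samp{a X e}; the second prints
@samp{a X c <d> e}.  Excluding the closing delimiter in a bracket
expression is a common way to get the shorter match in basic regular
expressions, which have no non-greedy operators.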
782
783	@noindent
784	Examples:
785	@table @samp
786	@item abcdef
787	Matches @samp{abcdef}.
788
789	@item a*b
790	Matches zero or more @samp{a}s followed by a single
791	@samp{b}. For example, @samp{b} or @samp{aaaaab}.
792
793	@item a\?b
794	Matches @samp{b} or @samp{ab}.
795
796	@item a\+b\+
797	Matches one or more @samp{a}s followed by one or more
798	@samp{b}s: @samp{ab} is the shortest possible match, but
799	other examples are @samp{aaaab} or @samp{abbbbb} or
800	@samp{aaaaaabbbbbbb}.
801
802	@item .*
803	@itemx .\+
804	These two both match all the characters in a string;
805	however, the first matches every string (including the empty
806	string), while the second matches only strings containing
807	at least one character.
808
809	@item ^main.*(.*)
810	This matches a string starting with @samp{main},
811	followed by an opening and closing
812	parenthesis. The @samp{n}, @samp{(} and @samp{)} need not
813	be adjacent.
814
815	@item ^#
816	This matches a string beginning with @samp{#}.
817
818	@item \\$
819	This matches a string ending with a single backslash. The
820	regexp contains two backslashes for escaping.
821
822	@item \$
823	Instead, this matches a string consisting of a single dollar sign,
824	because it is escaped.
825
826	@item [a-zA-Z0-9]
827	In the C locale, this matches any @acronym{ASCII} letters or digits.
828
829	@item [^ @kbd{tab}]\+
830	(Here @kbd{tab} stands for a single tab character.)
831	This matches a string of one or more
832	characters, none of which is a space or a tab.
833	Usually this means a word.
834
835	@item ^\(.*\)\n\1$
836	This matches a string consisting of two equal substrings separated by
837	a newline.
838
839	@item .\@{9\@}A$
840	This matches nine characters followed by an @samp{A}.
841
842	@item ^.\@{15\@}A
843	This matches the start of a string that contains 16 characters,
844	the last of which is an @samp{A}.
845
846	@end table
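
A few of the patterns above can be tried directly from the shell. This
is only a sketch: the sample inputs and variable names are made up, and
the alternation operator @code{\|} is a @acronym{GNU} extension, so the
first command assumes that @command{sed} is @acronym{GNU} @command{sed}.

```shell
# Alternation (GNU extension): print lines containing "cat" or "dog".
alt=$(printf 'cat\ndog\nbird\n' | sed -n '/cat\|dog/p')

# Back reference: two characters followed by the same two characters.
backref=$(printf 'abab\nabba\n' | sed -n '/^\(..\)\1$/p')

# Interval: a line whose 16th character is an A.
interval=$(printf '0123456789ABCDEA\nA\n' | sed -n '/^.\{15\}A/p')

# "main", then an opening and a closing parenthesis somewhere later.
mainline=$(printf 'main(void)\nint main()\n' | sed -n '/^main.*(.*)/p')

printf '%s\n' "$alt" "$backref" "$interval" "$mainline"
```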
847
848
849
850	@node Common Commands
851	@section Often-Used Commands
852
853	If you use @command{sed} at all, you will quite likely want to know
854	these commands.
855
856	@table @code
857	@item #
858	[No addresses allowed.]
859
860	@findex # (comments)
861	@cindex Comments, in scripts
862	The @code{#} character begins a comment;
863	the comment continues until the next newline.
864
865	@cindex Portability, comments
866	If you are concerned about portability, be aware that
867	some implementations of @command{sed} (which are not @sc{posix}
868	conformant) may only support a single one-line comment,
869	and then only when the very first character of the script is a @code{#}.
870
871	@findex -n, forcing from within a script
872	@cindex Caveat --- #n on first line
873	Warning: if the first two characters of the @command{sed} script
874	are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
875	If you want to put a comment in the first line of your script
876	and that comment begins with the letter @samp{n}
877	and you do not want this behavior,
878	then be sure to either use a capital @samp{N},
879	or place at least one space before the @samp{n}.
880
881	@item q [@var{exit-code}]
882	This command only accepts a single address.
883
884	@findex q (quit) command
885	@cindex @value{SSEDEXT}, returning an exit code
886	@cindex Quitting
887	Exit @command{sed} without processing any more commands or input.
888	Note that the current pattern space is printed if auto-print is
889	not disabled with the @option{-n} option. The ability to return
890	an exit code from the @command{sed} script is a @value{SSED} extension.
891
892	@item d
893	@findex d (delete) command
894	@cindex Text, deleting
895	Delete the pattern space;
896	immediately start next cycle.
897
898	@item p
899	@findex p (print) command
900	@cindex Text, printing
901	Print out the pattern space (to the standard output).
902	This command is usually only used in conjunction with the @option{-n}
903	command-line option.
904
905	@item n
906	@findex n (next-line) command
907	@cindex Next input line, replace pattern space with
908	@cindex Read next input line
909	If auto-print is not disabled, print the pattern space,
910	then, regardless, replace the pattern space with the next line of input.
911	If there is no more input then @command{sed} exits without processing
912	any more commands.
913
914	@item @{ @var{commands} @}
915	@findex @{@} command grouping
916	@cindex Grouping commands
917	@cindex Command groups
918	A group of commands may be enclosed between
919	@code{@{} and @code{@}} characters.
920	This is particularly useful when you want a group of commands
921	to be triggered by a single address (or address-range) match.
922
923	@end table
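
The commands in this section can be sketched with a four-line sample
input. The input and the variable names are assumptions made for the
example; the exit code given to @code{q} and the use of @samp{;} to
separate commands before @samp{@}} assume @acronym{GNU} @command{sed}.

```shell
input=$(printf 'one\ntwo\nthree\nfour\n')

# q: print through line 2, then quit.
head2=$(printf '%s\n' "$input" | sed '2q')

# d: delete every line that starts with "t".
no_t=$(printf '%s\n' "$input" | sed '/^t/d')

# p with -n: print only line 3.
only3=$(printf '%s\n' "$input" | sed -n '3p')

# n then d: print a line, fetch the next one, delete it (keeps odd lines).
odd=$(printf '%s\n' "$input" | sed 'n;d')

# Grouping: run two commands on each line matched by one address.
grouped=$(printf '%s\n' "$input" | sed -n '/^t/{s/t/T/;p;}')

# q with an exit code (GNU extension); "|| status=$?" captures it.
status=0
printf 'x\n' | sed 'q5' >/dev/null || status=$?
```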
924
925	@node The "s" Command
926	@section The @code{s} Command
927
928	The syntax of the @code{s} (as in substitute) command is
929	@samp{s/@var{regexp}/@var{replacement}/@var{flags}}. The @code{/}
930	characters may be uniformly replaced by any other single
931	character within any given @code{s} command. The @code{/}
932	character (or whatever other character is used in its stead)
933	can appear in the @var{regexp} or @var{replacement}
934	only if it is preceded by a @code{\} character.
935
936	The @code{s} command is probably the most important in @command{sed}
937	and has a lot of different options. Its basic concept is simple:
938	the @code{s} command attempts to match the pattern
939	space against the supplied @var{regexp}; if the match is
940	successful, then that portion of the pattern
941	space which was matched is replaced with @var{replacement}.
942
943	@cindex Backreferences, in regular expressions
944	@cindex Parenthesized substrings
945	The @var{replacement} can contain @code{\@var{n}} (@var{n} being
946	a number from 1 to 9, inclusive) references, which refer to
947	the portion of the match which is contained between the @var{n}th
948	@code{\(} and its matching @code{\)}.
949	Also, the @var{replacement} can contain unescaped @code{&}
950	characters which reference the whole matched portion
951	of the pattern space.
952	@cindex @value{SSEDEXT}, case modifiers in @code{s} commands
953	Finally, as a @value{SSED} extension, you can include a
954	special sequence made of a backslash and one of the letters
955	@code{L}, @code{l}, @code{U}, @code{u}, or @code{E}.
956	The meaning is as follows:
957
958	@table @code
959	@item \L
960	Turn the replacement
961	to lowercase until a @code{\U} or @code{\E} is found,
962
963	@item \l
964	Turn the
965	next character to lowercase,
966
967	@item \U
968	Turn the replacement to uppercase
969	until a @code{\L} or @code{\E} is found,
970
971	@item \u
972	Turn the next character
973	to uppercase,
974
975	@item \E
976	Stop case conversion started by @code{\L} or @code{\U}.
977	@end table
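
The case-conversion sequences can be tried as follows. This is a sketch
that assumes @acronym{GNU} @command{sed}, since these sequences are
extensions; the sample text and variable names are invented.

```shell
# \U: uppercase the whole match.
upper=$(printf 'hello world\n' | sed 's/.*/\U&/')

# \u: uppercase only the first character of each matched word.
caps=$(printf 'hello world\n' | sed 's/[a-z][a-z]*/\u&/g')

# \U ... \E: uppercase the first group, leave the second untouched.
mixed=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\U\1\E \2/')

printf '%s\n' "$upper" "$caps" "$mixed"
```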
978
979	To include a literal @code{\}, @code{&}, or newline in the final
980	replacement, be sure to precede the desired @code{\}, @code{&},
981	or newline in the @var{replacement} with a @code{\}.
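
The replacement syntax described so far can be sketched with a few
one-liners. The inputs and variable names are made up for the example;
nothing here goes beyond standard @command{sed} behavior.

```shell
# & stands for the whole matched text.
amp=$(printf 'hello world\n' | sed 's/world/[&]/')

# \1 and \2 refer to the first and second parenthesized groups.
swap=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# \& inserts a literal ampersand.
literal=$(printf 'a b\n' | sed 's/ / \& /')

# Another delimiter avoids escaping slashes in path names.
path=$(printf '/usr/bin\n' | sed 's|/usr|/opt|')

printf '%s\n' "$amp" "$swap" "$literal" "$path"
```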
982
983	@findex s command, option flags
984	@cindex Substitution of text, options
985	The @code{s} command can be followed by zero or more of the
986	following @var{flags}:
987
988	@table @code
989	@item g
990	@cindex Global substitution
991	@cindex Replacing all text matching regexp in a line
992	Apply the replacement to @emph{all} matches to the @var{regexp},
993	not just the first.
994
995	@item @var{number}
996	@cindex Replacing only @var{n}th match of regexp in a line
997	Only replace the @var{number}th match of the @var{regexp}.
998
999	@cindex @acronym{GNU} extensions, @code{g} and @var{number} modifier interaction in @code{s} command
1000	@cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command
1001	Note: the @sc{posix} standard does not specify what should happen
1002	when you mix the @code{g} and @var{number} modifiers,
1003	and currently there is no widely agreed upon meaning
1004	across @command{sed} implementations.
1005	For @value{SSED}, the interaction is defined to be:
1006	ignore matches before the @var{number}th,
1007	and then match and replace all matches from
1008	the @var{number}th on.
1009
1010	@item p
1011	@cindex Text, printing after substitution
1012	If the substitution was made, then print the new pattern space.
1013
1014	Note: when both the @code{p} and @code{e} options are specified,
1015	the relative ordering of the two produces very different results.
1016	In general, @code{ep} (evaluate then print) is what you want,
1017	but operating the other way round can be useful for debugging.
1018	For this reason, the current version of @value{SSED} interprets
1019	specially the presence of @code{p} options both before and after
1020	@code{e}, printing the pattern space before and after evaluation,
1021	while in general flags for the @code{s} command show their
1022	effect just once. This behavior, although documented, might
1023	change in future versions.
1024
1025	@item w @var{file-name}
1026	@cindex Text, writing to a file after substitution
1027	@cindex @value{SSEDEXT}, @file{/dev/stdout} file
1028	@cindex @value{SSEDEXT}, @file{/dev/stderr} file
1029	If the substitution was made, then write out the result to the named file.
1030	As a @value{SSED} extension, two special values of @var{file-name} are
1031	supported: @file{/dev/stderr}, which writes the result to the standard
1032	error, and @file{/dev/stdout}, which writes to the standard
1033	output.@footnote{This is equivalent to @code{p} unless the @option{-i}
1034	option is being used.}
1035
1036	@item e
1037	@cindex Evaluate Bourne-shell commands, after substitution
1038	@cindex Subprocesses
1039	@cindex @value{SSEDEXT}, evaluating Bourne-shell commands
1040	@cindex @value{SSEDEXT}, subprocesses
1041	This command allows one to pipe input from a shell command
1042	into pattern space. If a substitution was made, the command
1043	that is found in pattern space is executed and pattern space
1044	is replaced with its output. A trailing newline is suppressed;
1045	results are undefined if the command to be executed contains
1046	a @sc{nul} character. This is a @value{SSED} extension.
1047
1048	@item I
1049	@itemx i
1050	@cindex @acronym{GNU} extensions, @code{I} modifier
1051	@cindex Case-insensitive matching
1052	@ifset PERL
1053	@cindex Perl-style regular expressions, case-insensitive
1054	@end ifset
1055	The @code{I} modifier to regular-expression matching is a @acronym{GNU}
1056	extension which makes @command{sed} match @var{regexp} in a
1057	case-insensitive manner.
1058
1059	@item M
1060	@itemx m
1061	@cindex @value{SSEDEXT}, @code{M} modifier
1062	@ifset PERL
1063	@cindex Perl-style regular expressions, multiline
1064	@end ifset
1065	The @code{M} modifier to regular-expression matching is a @value{SSED}
1066	extension which causes @code{^} and @code{$} to match respectively
1067	(in addition to the normal behavior) the empty string after a newline,
1068	and the empty string before a newline. There are special character
1069	sequences
1070	@ifset PERL
1071	(@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'}
1072	in basic or extended regular expression modes)
1073	@end ifset
1074	@ifclear PERL
1075	(@code{\`} and @code{\'})
1076	@end ifclear
1077	which always match the beginning or the end of the buffer.
1078	@code{M} stands for @cite{multi-line}.
1079
1080	@ifset PERL
1081	@item S
1082	@itemx s
1083	@cindex @value{SSEDEXT}, @code{S} modifier
1084	@cindex Perl-style regular expressions, single line
1085	The @code{S} modifier to regular-expression matching is only valid
1086	in Perl mode and specifies that the dot character (@code{.}) will
1087	match the newline character too. @code{S} stands for @cite{single-line}.
1088	@end ifset
1089
1090	@ifset PERL
1091	@item X
1092	@itemx x
1093	@cindex @value{SSEDEXT}, @code{X} modifier
1094	@cindex Perl-style regular expressions, extended
1095	The @code{X} modifier to regular-expression matching is also
1096	valid in Perl mode only. If it is used, whitespace in the
1097	pattern (other than in a character class) and
1098	characters between a @kbd{#} outside a character class and the
1099	next newline character are ignored. An escaping backslash
1100	can be used to include a whitespace or @kbd{#} character as part
1101	of the pattern.
1102	@end ifset
1103	@end table
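
The effect of the most common flags can be compared side by side. The
following is a sketch with invented inputs and variable names; the
combined @samp{2g} form and the @code{I} and @code{M} modifiers assume
@acronym{GNU} @command{sed}.

```shell
line='a-b-c-d'

first=$(printf '%s\n' "$line" | sed 's/-/+/')      # first match only
all=$(printf '%s\n' "$line" | sed 's/-/+/g')       # every match
second=$(printf '%s\n' "$line" | sed 's/-/+/2')    # second match only
from2=$(printf '%s\n' "$line" | sed 's/-/+/2g')    # second match onward (GNU)

# p: with -n, print only the lines where a substitution happened.
printed=$(printf 'x\ny\n' | sed -n 's/x/X/p')

# I: case-insensitive matching (GNU).
nocase=$(printf 'Hello\n' | sed 's/hello/bye/I')

# M: after N joins two lines, ^ also matches just after the newline (GNU).
multi=$(printf 'a\nb\n' | sed 'N;s/^b/B/M')
```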
1104
1105
1106	@node Other Commands
1107	@section Less Frequently-Used Commands
1108
1109	Though perhaps less frequently used than those in the previous
1110	section, some very small yet useful @command{sed} scripts can be built with
1111	these commands.
1112
1113	@table @code
1114	@item y/@var{source-chars}/@var{dest-chars}/
1115	(The @code{/} characters may be uniformly replaced by
1116	any other single character within any given @code{y} command.)
1117
1118	@findex y (transliterate) command
1119	@cindex Transliteration
1120	Transliterate any characters in the pattern space which match
1121	any of the @var{source-chars} with the corresponding character
1122	in @var{dest-chars}.
1123
1124	Instances of the @code{/} (or whatever other character is used in its stead),
1125	@code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars}
1126	lists, provided that each instance is escaped by a @code{\}.
1127	The @var{source-chars} and @var{dest-chars} lists @emph{must}
1128	contain the same number of characters (after de-escaping).
1129
1130	@item a\
1131	@itemx @var{text}
1132	@cindex @value{SSEDEXT}, two addresses supported by most commands
1133	As a @acronym{GNU} extension, this command accepts two addresses.
1134
1135	@findex a (append text lines) command
1136	@cindex Appending text after a line
1137	@cindex Text, appending
1138	Queue the lines of text which follow this command
1139	(each but the last ending with a @code{\},
1140	which are removed from the output)
1141	to be output at the end of the current cycle,
1142	or when the next input line is read.
1143
1144	Escape sequences in @var{text} are processed, so you should
1145	use @code{\\} in @var{text} to print a single backslash.
1146
1147	As a @acronym{GNU} extension, if between the @code{a} and the newline there is
1148	anything other than a whitespace-@code{\} sequence, then the text of this line,
1149	starting at the first non-whitespace character after the @code{a},
1150	is taken as the first line of the @var{text} block.
1151	(This enables a simplification in scripting a one-line add.)
1152	This extension also works with the @code{i} and @code{c} commands.
1153
1154	@item i\
1155	@itemx @var{text}
1156	@cindex @value{SSEDEXT}, two addresses supported by most commands
1157	As a @acronym{GNU} extension, this command accepts two addresses.
1158
1159	@findex i (insert text lines) command
1160	@cindex Inserting text before a line
1161	@cindex Text, insertion
1162	Immediately output the lines of text which follow this command
1163	(each but the last ending with a @code{\},
1164	which are removed from the output).
1165
1166	@item c\
1167	@itemx @var{text}
1168	@findex c (change to text lines) command
1169	@cindex Replacing selected lines with other text
1170	Delete the lines matching the address or address-range,
1171	and output the lines of text which follow this command
1172	(each but the last ending with a @code{\},
1173	which are removed from the output)
1174	in place of the last line
1175	(or in place of each line, if no addresses were specified).
1176	A new cycle is started after this command is done,
1177	since the pattern space will have been deleted.
1178
1179	@item =
1180	@cindex @value{SSEDEXT}, two addresses supported by most commands
1181	As a @acronym{GNU} extension, this command accepts two addresses.
1182
1183	@findex = (print line number) command
1184	@cindex Printing line number
1185	@cindex Line number, printing
1186	Print out the current input line number (with a trailing newline).
1187
1188	@item l @var{n}
1189	@findex l (list unambiguously) command
1190	@cindex List pattern space
1191	@cindex Printing text unambiguously
1192	@cindex Line length, setting
1193	@cindex @value{SSEDEXT}, setting line length
1194	Print the pattern space in an unambiguous form:
1195	non-printable characters (and the @code{\} character)
1196	are printed in C-style escaped form; long lines are split,
1197	with a trailing @code{\} character to indicate the split;
1198	the end of each line is marked with a @code{$}.
1199
1200	@var{n} specifies the desired line-wrap length;
1201	a length of 0 (zero) means to never wrap long lines. If omitted,
1202	the default as specified on the command line is used. The @var{n}
1203	parameter is a @value{SSED} extension.
1204
1205	@item r @var{filename}
1206	@cindex @value{SSEDEXT}, two addresses supported by most commands
1207	As a @acronym{GNU} extension, this command accepts two addresses.
1208
1209	@findex r (read file) command
1210	@cindex Read text from a file
1211	@cindex @value{SSEDEXT}, @file{/dev/stdin} file
1212	Queue the contents of @var{filename} to be read and
1213	inserted into the output stream at the end of the current cycle,
1214	or when the next input line is read.
1215	Note that if @var{filename} cannot be read, it is treated as
1216	if it were an empty file, without any error indication.
1217
1218	As a @value{SSED} extension, the special value @file{/dev/stdin}
1219	is supported for the file name, which reads the contents of the
1220	standard input.
1221
1222	@item w @var{filename}
1223	@findex w (write file) command
1224	@cindex Write to a file
1225	@cindex @value{SSEDEXT}, @file{/dev/stdout} file
1226	@cindex @value{SSEDEXT}, @file{/dev/stderr} file
1227	Write the pattern space to @var{filename}.
1228	As a @value{SSED} extension, two special values of @var{filename} are
1229	supported: @file{/dev/stderr}, which writes the result to the standard
1230	error, and @file{/dev/stdout}, which writes to the standard
1231	output.@footnote{This is equivalent to @code{p} unless the @option{-i}
1232	option is being used.}
1233
1234	The file will be created (or truncated) before the
1235	first input line is read; all @code{w} commands
1236	(including instances of @code{w} flag on successful @code{s} commands)
1237	which refer to the same @var{filename} are output without
1238	closing and reopening the file.
1239
1240	@item D
1241	@findex D (delete first line) command
1242	@cindex Delete first line from pattern space
1243	Delete text in the pattern space up to the first newline.
1244	If any text is left, restart cycle with the resultant
1245	pattern space (without reading a new line of input),
1246	otherwise start a normal new cycle.
1247
1248	@item N
1249	@findex N (append Next line) command
1250	@cindex Next input line, append to pattern space
1251	@cindex Append next input line to pattern space
1252	Add a newline to the pattern space,
1253	then append the next line of input to the pattern space.
1254	If there is no more input then @command{sed} exits without processing
1255	any more commands.
1256
1257	@item P
1258	@findex P (print first line) command
1259	@cindex Print first line from pattern space
1260	Print out the portion of the pattern space up to the first newline.
1261
1262	@item h
1263	@findex h (hold) command
1264	@cindex Copy pattern space into hold space
1265	@cindex Replace hold space with copy of pattern space
1266	@cindex Hold space, copying pattern space into
1267	Replace the contents of the hold space with the contents of the pattern space.
1268
1269	@item H
1270	@findex H (append Hold) command
1271	@cindex Append pattern space to hold space
1272	@cindex Hold space, appending from pattern space
1273	Append a newline to the contents of the hold space,
1274	and then append the contents of the pattern space to that of the hold space.
1275
1276	@item g
1277	@findex g (get) command
1278	@cindex Copy hold space into pattern space
1279	@cindex Replace pattern space with copy of hold space
1280	@cindex Hold space, copy into pattern space
1281	Replace the contents of the pattern space with the contents of the hold space.
1282
1283	@item G
1284	@findex G (appending Get) command
1285	@cindex Append hold space to pattern space
1286	@cindex Hold space, appending to pattern space
1287	Append a newline to the contents of the pattern space,
1288	and then append the contents of the hold space to that of the pattern space.
1289
1290	@item x
1291	@findex x (eXchange) command
1292	@cindex Exchange hold space with pattern space
1293	@cindex Hold space, exchange with pattern space
1294	Exchange the contents of the hold and pattern spaces.
1295
1296	@end table
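
Several of these commands are easiest to understand from short
one-liners. The sketch below uses invented inputs and variable names;
the one-line forms of @code{a}, @code{i} and @code{c} rely on the
@acronym{GNU} extension described above.

```shell
# y: transliterate e->i and l->p.
trans=$(printf 'hello\n' | sed 'y/el/ip/')

# a, i, c in their one-line GNU form.
appended=$(printf 'one\ntwo\n' | sed '1a after one')
inserted=$(printf 'one\ntwo\n' | sed '2i before two')
changed=$(printf 'one\ntwo\n' | sed '1c changed')

# =: print the number of the last line, that is, count the lines.
count=$(printf 'a\nb\n' | sed -n '$=')

# N: join each pair of lines with a space.
pairs=$(printf '1\n2\n3\n4\n' | sed 'N;s/\n/ /')

# G and h: accumulate lines in reverse order in the hold space.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

# l: show a tab unambiguously, with $ marking the end of the line.
listed=$(printf 'a\tb\n' | sed -n 'l')
```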
1297
1298
1299	@node Programming Commands
1300	@section Commands for @command{sed} gurus
1301
1302	In most cases, use of these commands indicates that you are
1303	probably better off programming in something like @command{awk}
1304	or Perl. But occasionally one is committed to sticking
1305	with @command{sed}, and these commands can enable one to write
1306	quite convoluted scripts.
1307
1308	@cindex Flow of control in scripts
1309	@table @code
1310	@item : @var{label}
1311	[No addresses allowed.]
1312
1313	@findex : (label) command
1314	@cindex Labels, in scripts
1315	Specify the location of @var{label} for branch commands.
1316	In all other respects, a no-op.
1317
1318	@item b @var{label}
1319	@findex b (branch) command
1320	@cindex Branch to a label, unconditionally
1321	@cindex Goto, in scripts
1322	Unconditionally branch to @var{label}.
1323	The @var{label} may be omitted, in which case the next cycle is started.
1324
1325	@item t @var{label}
1326	@findex t (test and branch if successful) command
1327	@cindex Branch to a label, if @code{s///} succeeded
1328	@cindex Conditional branch
1329	Branch to @var{label} only if there has been a successful @code{s}ubstitution
1330	since the last input line was read or conditional branch was taken.
1331	The @var{label} may be omitted, in which case the next cycle is started.
1332
1333	@end table
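
Labels and branches are typically combined into small loops. The
following sketch shows one loop driven by @code{t}, one driven by
@code{b}, and a bare @code{b} that skips the rest of the script. The
inputs, the label @samp{a} and the variable names are assumptions made
for the example; each command is passed with its own @option{-e} so that
the label is terminated by the end of the script fragment.

```shell
# t: turn leading spaces into dots, one per iteration, until s fails.
dots=$(printf '   x\n' | sed -e ':a' -e 's/^\(\.*\) /\1./' -e 'ta')

# b: keep appending lines until the last one, then join them with commas.
joined=$(printf 'x\ny\nz\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')

# b without a label: leave comment lines alone, prefix all others.
quoted=$(printf '#c\nx\n' | sed -e '/^#/b' -e 's/^/> /')

printf '%s\n' "$dots" "$joined" "$quoted"
```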
1334
1335	@node Extended Commands
1336	@section Commands Specific to @value{SSED}
1337
1338	These commands are specific to @value{SSED}, so you
1339	must use them with care and only when you are sure that
1340	hindering portability is acceptable. They allow you to check
1341	for @value{SSED} extensions or to do tasks that are required
1342	quite often, yet are unsupported by standard @command{sed}s.
1343
1344	@table @code
1345	@item e [@var{command}]
1346	@findex e (evaluate) command
1347	@cindex Evaluate Bourne-shell commands
1348	@cindex Subprocesses
1349	@cindex @value{SSEDEXT}, evaluating Bourne-shell commands
1350	@cindex @value{SSEDEXT}, subprocesses
1351	This command allows one to pipe input from a shell command
1352	into pattern space. Without parameters, the @code{e} command
1353	executes the command that is found in pattern space and
1354	replaces the pattern space with the output; a trailing newline
1355	is suppressed.
1356
1357	If a parameter is specified, instead, the @code{e} command
1358	interprets it as a command and sends its output to the output stream
1359	(like @code{r} does). The command can run across multiple
1360	lines, all but the last ending with a back-slash.
1361
1362	In both cases, the results are undefined if the command to be
1363	executed contains a @sc{nul} character.
1364
1365	@item L @var{n}
1366	@findex L (fLow paragraphs) command
1367	@cindex Reformat pattern space
1368	@cindex Reformatting paragraphs
1369	@cindex @value{SSEDEXT}, reformatting paragraphs
1370	@cindex @value{SSEDEXT}, @code{L} command
1371	This @value{SSED} extension fills and joins lines in pattern space
1372	to produce output lines of (at most) @var{n} characters, like
1373	@code{fmt} does; if @var{n} is omitted, the default as specified
1374	on the command line is used. This command is considered a failed
1375	experiment and unless there is enough demand (which seems unlikely)
1376	will be removed in future versions.
1377
1378	@ignore
1379	Blank lines, spaces between words, and indentation are
1380	preserved in the output; successive input lines with different
1381	indentation are not joined; tabs are expanded to 8 columns.
1382
1383	If the pattern space contains multiple lines, they are joined, but
1384	since the pattern space usually contains a single line, the behavior
1385	of a simple @code{L;d} script is the same as @samp{fmt -s} (i.e.,
1386	it does not join short lines to form longer ones).
1387
1388	@var{n} specifies the desired line-wrap length; if omitted,
1389	the default as specified on the command line is used.
1390	@end ignore
1391
1392	@item Q [@var{exit-code}]
1393	This command only accepts a single address.
1394
1395	@findex Q (silent Quit) command
1396	@cindex @value{SSEDEXT}, quitting silently
1397	@cindex @value{SSEDEXT}, returning an exit code
1398	@cindex Quitting
1399	This command is the same as @code{q}, but will not print the
1400	contents of pattern space. Like @code{q}, it provides the
1401	ability to return an exit code to the caller.
1402
1403	This command can be useful because the only alternative ways
1404	to accomplish this apparently trivial function are to use
1405	the @option{-n} option (which can unnecessarily complicate
1406	your script) or resorting to the following snippet, which
1407	wastes time by reading the whole file without any visible effect:
1408
1409	@example
1410	:eat
1411	$d @i{Quit silently on the last line}
1412	N @i{Read another line, silently}
1413	g @i{Overwrite pattern space each time to save memory}
1414	b eat
1415	@end example
1416
1417	@item R @var{filename}
1418	@findex R (read line) command
1419	@cindex Read text from a file
1420	@cindex @value{SSEDEXT}, reading a file a line at a time
1421	@cindex @value{SSEDEXT}, @code{R} command
1422	@cindex @value{SSEDEXT}, @file{/dev/stdin} file
1423	Queue a line of @var{filename} to be read and
1424	inserted into the output stream at the end of the current cycle,
1425	or when the next input line is read.
1426	Note that if @var{filename} cannot be read, or if its end is
1427	reached, no line is appended, without any error indication.
1428
1429	As with the @code{r} command, the special value @file{/dev/stdin}
1430	is supported for the file name, which reads a line from the
1431	standard input.
1432
1433	@item T @var{label}
1434	@findex T (test and branch if failed) command
1435	@cindex @value{SSEDEXT}, branch if @code{s///} failed
1436	@cindex Branch to a label, if @code{s///} failed
1437	@cindex Conditional branch
1438	Branch to @var{label} only if there have been no successful
1439	@code{s}ubstitutions since the last input line was read or
1440	conditional branch was taken. The @var{label} may be omitted,
1441	in which case the next cycle is started.
1442
1443	@item v @var{version}
1444	@findex v (version) command
1445	@cindex @value{SSEDEXT}, checking for their presence
1446	@cindex Requiring @value{SSED}
1447	This command does nothing, but makes @command{sed} fail if
1448	@value{SSED} extensions are not supported, simply because other
1449	versions of @command{sed} do not implement it. In addition, you
1450	can specify the version of @command{sed} that your script
1451	requires, such as @code{4.0.5}. The default is @code{4.0}
1452	because that is the first version that implemented this command.
1453
1454	This command enables all @value{SSEDEXT} even if
1455	@env{POSIXLY_CORRECT} is set in the environment.
1456
1457	@item W @var{filename}
1458	@findex W (write first line) command
1459	@cindex Write first line to a file
1460	@cindex @value{SSEDEXT}, writing first line to a file
1461	Write to the given filename the portion of the pattern space up to
1462	the first newline. Everything said under the @code{w} command about
1463	file handling holds here too.
1464	@end table
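
A few of these commands can be sketched as one-liners. All of them
assume @acronym{GNU} @command{sed}; the inputs, the temporary file and
the variable names are invented for the example.

```shell
# Q: quit at line 2 without printing it.
silent=$(printf 'a\nb\nc\n' | sed '2Q')

# T: skip the rest of the script when no substitution was made.
marked=$(printf 'x\ny\n' | sed -e 's/x/X/' -e 'T' -e 's/$/!/')

# R: interleave the input with one line of another file per cycle.
extra=$(mktemp)
printf 'a\nb\n' > "$extra"
merged=$(printf '1\n2\n' | sed "R $extra")
rm -f "$extra"

# W: write only the first line of a two-line pattern space.
firstline=$(printf 'a\nb\n' | sed -n 'N;W /dev/stdout')
```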
1465
1466	@node Escapes
1467	@section @acronym{GNU} Extensions for Escapes in Regular Expressions
1468
1469	@cindex @acronym{GNU} extensions, special escapes
1470	Until this chapter, we have only encountered escapes of the form
1471	@samp{\^}, which tell @command{sed} not to interpret the circumflex
1472	as a special character, but rather to take it literally. For
1473	example, @samp{\*} matches a single asterisk rather than zero
1474	or more backslashes.
1475
1476	@cindex @code{POSIXLY_CORRECT} behavior, escapes
1477	This chapter introduces another kind of escape@footnote{All
1478	the escapes introduced here are @acronym{GNU}
1479	extensions, with the exception of @code{\n}. In basic regular
1480	expression mode, setting @code{POSIXLY_CORRECT} disables them inside
1481	bracket expressions.}---that
1482	is, escapes that are applied to a character or sequence of characters
1483	that ordinarily are taken literally, and that @command{sed} replaces
1484	with a special character. This provides a way
1485	of encoding non-printable characters in patterns in a visible manner.
1486	There is no restriction on the appearance of non-printing characters
1487	in a @command{sed} script but when a script is being prepared in the
1488	shell or by text editing, it is usually easier to use one of
1489	the following escape sequences than the binary character it
1490	represents.
1491
1492	The list of these escapes is:
1493
1494	@table @code
1495	@item \a
1496	Produces or matches a @sc{bel} character, that is an ``alert'' (@sc{ascii} 7).
1497
1498	@item \f
1499	Produces or matches a form feed (@sc{ascii} 12).
1500
1501	@item \n
1502	Produces or matches a newline (@sc{ascii} 10).
1503
1504	@item \r
1505	Produces or matches a carriage return (@sc{ascii} 13).
1506
1507	@item \t
1508	Produces or matches a horizontal tab (@sc{ascii} 9).
1509
1510	@item \v
1511	Produces or matches a so-called ``vertical tab'' (@sc{ascii} 11).
1512
1513	@item \c@var{x}
1514	Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is
1515	any character. The precise effect of @samp{\c@var{x}} is as follows:
1516	if @var{x} is a lower case letter, it is converted to upper case.
1517	Then bit 6 of the character (hex 40) is inverted. Thus @samp{\cz} becomes
1518	hex 1A, but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B.
1519
1520	@item \d@var{xxx}
1521	Produces or matches a character whose decimal @sc{ascii} value is @var{xxx}.
1522
1523	@item \o@var{xxx}
1524	@ifset PERL
1525	@item \@var{xxx}
1526	@end ifset
1527	Produces or matches a character whose octal @sc{ascii} value is @var{xxx}.
1528	@ifset PERL
1529	The syntax without the @code{o} is active in Perl mode, while the one
1530	with the @code{o} is active in the normal or extended @sc{posix} regular
1531	expression modes.
1532	@end ifset
1533
1534	@item \x@var{xx}
1535	Produces or matches a character whose hexadecimal @sc{ascii} value is @var{xx}.
1536	@end table
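
These escapes work both in the regular expression and in the
replacement. The sketch below assumes @acronym{GNU} @command{sed} and
uses the letter @samp{A} (decimal 65, octal 101, hexadecimal 41) as the
target character; inputs and variable names are made up.

```shell
# \t in the replacement produces a tab; in the regexp it matches one.
tabbed=$(printf 'a b\n' | sed 's/ /\t/')
untabbed=$(printf 'a\tb\n' | sed 's/\t/ /')

# The same character written in hexadecimal, octal and decimal.
hex=$(printf 'x\n' | sed 's/x/\x41/')
oct=$(printf 'x\n' | sed 's/x/\o101/')
dec=$(printf 'x\n' | sed 's/x/\d065/')
```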
1537
1538	@samp{\b} (backspace) was omitted because of the conflict with
1539	the existing ``word boundary'' meaning.
1540
1541	Other escapes match a particular character class and are valid only in
1542	regular expressions:
1543
1544	@table @code
1545	@item \w
1546	Matches any ``word'' character. A ``word'' character is any
1547	letter or digit or the underscore character.
1548
1549	@item \W
1550	Matches any ``non-word'' character.
1551
1552	@item \b
1553	Matches a word boundary; that is, it matches if the character
1554	to the left is a ``word'' character and the character to the
1555	right is a ``non-word'' character, or vice-versa.
1556
1557	@item \B
1558	Matches everywhere but on a word boundary; that is, it matches
1559	if the character to the left and the character to the right
1560	are either both ``word'' characters or both ``non-word''
1561	characters.
1562
1563	@item \`
1564	Matches only at the start of pattern space. This is different
1565	from @code{^} in multi-line mode.
1566
1567	@item \'
1568	Matches only at the end of pattern space. This is different
1569	from @code{$} in multi-line mode.
1570
1571	@ifset PERL
1572	@item \G
1573	Match only at the start of pattern space or, when doing a global
1574	substitution using the @code{s///g} command and option, at
1575	the end-of-match position of the prior match. For example,
1576	@samp{s/\Ga/Z/g} will change an initial run of @code{a}s to
1577	a run of @code{Z}s.
1578	@end ifset
1579	@end table
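
The class and anchor escapes can be sketched as follows, again assuming
@acronym{GNU} @command{sed}. The inputs and variable names are invented;
the last two commands contrast @code{\`} with @code{^} in multi-line
mode.

```shell
# \w\+ : every run of word characters.
words=$(printf 'foo-bar baz\n' | sed 's/\w\+/X/g')

# \b : "cat" only as a whole word, not inside "concat".
whole=$(printf 'concat cat\n' | sed 's/\bcat\b/dog/')

# With M, ^ matches after each newline, but \` only at the buffer start.
caret=$(printf 'a\na\n' | sed 'N;s/^a/A/Mg')
start=$(printf 'a\na\n' | sed 'N;s/\`a/A/Mg')
```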
1580
1581	@node Examples
1582	@chapter Some Sample Scripts
1583
1584	Here are some @command{sed} scripts to guide you in the art of mastering
1585	@command{sed}.
1586
1587	@menu
1588	Some exotic examples:
1589	* Centering lines::
1590	* Increment a number::
1591	* Rename files to lower case::
1592	* Print bash environment::
1593	* Reverse chars of lines::
1594
1595	Emulating standard utilities:
1596	* tac:: Reverse lines of files
1597	* cat -n:: Numbering lines
1598	* cat -b:: Numbering non-blank lines
1599	* wc -c:: Counting chars
1600	* wc -w:: Counting words
1601	* wc -l:: Counting lines
1602	* head:: Printing the first lines
1603	* tail:: Printing the last lines
1604	* uniq:: Make duplicate lines unique
1605	* uniq -d:: Print duplicated lines of input
1606	* uniq -u:: Remove all duplicated lines
1607	* cat -s:: Squeezing blank lines
1608	@end menu
1609
1610	@node Centering lines
1611	@section Centering Lines
1612
1613	This script centers all lines of a file within a width of 80 columns.
1614	To change that width, the number in @code{\@{@dots{}\@}} must be
1615	replaced, and the number of added spaces must be changed to match.
1616
1617	Note how the buffer commands are used to separate parts in
1618	the regular expressions to be matched---this is a common
1619	technique.
1620
1621	@c start-------------------------------------------
1622	@example
1623	#!/usr/bin/sed -f
1624
1625	@group
1626	# Put 80 spaces in the buffer
1627	1 @{
1628	x
1629	s/^$/          /
1630	s/^.*$/&&&&&&&&/
1631	x
1632	@}
1633	@end group
1634
1635	@group
1636	# del leading and trailing spaces
1637	y/@kbd{tab}/ /
1638	s/^ *//
1639	s/ *$//
1640	@end group
1641
1642	@group
1643	# add a newline and 80 spaces to end of line
1644	G
1645	@end group
1646
1647	@group
1648	# keep first 81 chars (80 + a newline)
1649	s/^\(.\@{81\@}\).*$/\1/
1650	@end group
1651
1652	@group
1653	# \2 matches half of the spaces, which are moved to the beginning
1654	s/^\(.*\)\n\(.*\)\2/\2\1/
1655	@end group
1656	@end example
1657	@c end---------------------------------------------
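
Since the width appears in two places (the interval expression and the
number of spaces put in the hold space), one way to keep them in sync is
to generate both from a shell variable. The following is only a sketch
under that assumption: the function name @code{center_lines} and the use
of @command{printf} to build the padding are not part of the script above,
and the one-line @code{1@{@dots{}@}} block relies on @acronym{GNU}
@command{sed} accepting semicolons inside braces.

```shell
# center_lines WIDTH: center each line of standard input in WIDTH columns.
# Hypothetical wrapper; the sed commands follow the script above.
center_lines() {
    width=$1
    pad=$(printf "%${width}s" '')      # a string of WIDTH spaces
    tab=$(printf '\t')
    sed -e "1{x;s/^\$/$pad/;x;}" \
        -e "y/$tab/ /" -e 's/^ *//' -e 's/ *$//' \
        -e G \
        -e "s/^\(.\{$((width + 1))\}\).*\$/\1/" \
        -e 's/^\(.*\)\n\(.*\)\2/\2\1/'
}
```

With a width of 10, the line @samp{ab} leaves 8 spaces after the newline;
the last substitution moves half of them (4) in front of the text.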
1658
1659	@node Increment a number
1660	@section Increment a Number
1661
1662	This script is one of a few that demonstrate how to do arithmetic
1663	in @command{sed}. This is indeed possible,@footnote{@command{sed} guru Greg
1664	Ubben wrote an implementation of the @command{dc} @sc{rpn} calculator!
1665	It is distributed together with sed.} but must be done manually.
1666
1667	To increment a number, you add 1 to its last digit, replacing
1668	it by the following digit. There is one exception: when the digit
1669	is a nine, it wraps around to zero and the previous digit must also
1670	be incremented, repeating until a digit other than nine is found.
1671
1672	This solution by Bruno Haible is clever because
1673	it uses a single buffer; if you don't have this limitation, the
1674	algorithm used in @ref{cat -n, Numbering lines}, is faster.
1675	It works by replacing trailing nines with an underscore, then
1676	using multiple @code{s} commands to increment the last digit,
1677	and finally replacing the underscores with zeros.
1678
1679	@c start-------------------------------------------
1680	@example
1681	#!/usr/bin/sed -f
1682
1683	/[^0-9]/ d
1684
1685	@group
1686	# replace all trailing 9s by _ (any other character except digits could
1687	# be used)
1688	:d
1689	s/9\(_*\)$/_\1/
1690	td
1691	@end group
1692
1693	@group
1694	# incr last digit only. The first line adds a most-significant
1695	# digit of 1 if we have to add a digit.
1696	#
1697	# The @code{tn} commands are not necessary, but make the thing
1698	# faster
1699	@end group
1700
1701	@group
1702	s/^\(_*\)$/1\1/; tn
1703	s/8\(_*\)$/9\1/; tn
1704	s/7\(_*\)$/8\1/; tn
1705	s/6\(_*\)$/7\1/; tn
1706	s/5\(_*\)$/6\1/; tn
1707	s/4\(_*\)$/5\1/; tn
1708	s/3\(_*\)$/4\1/; tn
1709	s/2\(_*\)$/3\1/; tn
1710	s/1\(_*\)$/2\1/; tn
1711	s/0\(_*\)$/1\1/; tn
1712	@end group
1713
1714	@group
1715	:n
1716	y/_/0/
1717	@end group
1718	@end example
1719	@c end---------------------------------------------
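
To watch the algorithm at work, the commands can be wrapped in a shell
function and fed a few values. This is only a sketch: the function name
@code{incr} is made up for the example, and the body repeats the commands
of the script above. For the input 199, pattern space goes from
@samp{199} to @samp{1__} (trailing nines replaced), then to @samp{2__}
(last remaining digit incremented), and finally to @samp{200}.

```shell
# incr NUMBER: print NUMBER + 1 using only sed substitutions.
# Hypothetical wrapper around the commands shown above.
incr() {
    printf '%s\n' "$1" | sed '
/[^0-9]/ d
# turn trailing 9s into underscores, one per iteration
:d
s/9\(_*\)$/_\1/
td
# only underscores left: the number was all 9s, prepend a 1
s/^\(_*\)$/1\1/; tn
s/8\(_*\)$/9\1/; tn
s/7\(_*\)$/8\1/; tn
s/6\(_*\)$/7\1/; tn
s/5\(_*\)$/6\1/; tn
s/4\(_*\)$/5\1/; tn
s/3\(_*\)$/4\1/; tn
s/2\(_*\)$/3\1/; tn
s/1\(_*\)$/2\1/; tn
s/0\(_*\)$/1\1/; tn
# the underscores stand for the 9s that wrapped around to 0
:n
y/_/0/
'
}
```

For example, @samp{incr 999} prints @samp{1000} and @samp{incr 41}
prints @samp{42}.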
1720
1721	@node Rename files to lower case
1722	@section Rename Files to Lower Case
1723
1724	This is a pretty strange use of @command{sed}. We transform text
1725	into shell commands, then just feed them to the shell.
1726	Don't worry, even worse hacks are done when using @command{sed}; I have
1727	seen a script converting the output of @command{date} into a @command{bc}
1728	program!
1729
1730	The main body of this is the @command{sed} script, which remaps the name
1731	from upper to lower case (or vice versa) and even checks
1732	whether the remapped name is the same as the original name.
1733	Note how the script is parameterized using shell
1734	variables and proper quoting.
1735
1736	@c start-------------------------------------------
1737	@example
1738	@group
1739	#! /bin/sh
1740	# rename files to lower/upper case...
1741	#
1742	# usage:
1743	# move-to-lower *
1744	# move-to-upper *
1745	# or
1746	# move-to-lower -R .
1747	# move-to-upper -R .
1748	#
1749	@end group
1750
1751	@group
1752	help()
1753	@{
1754	cat << eof
1755	Usage: $0 [-n] [-R] [-h] files...
1756	@end group
1757
1758	@group
1759	-n do nothing, only see what would be done
1760	-R recursive (use find)
1761	-h this message
1762	files files to remap to lower case
1763	@end group
1764
1765	@group
1766	Examples:
1767	$0 -n * (see if everything is ok, then...)
1768	$0 *
1769	@end group
1770
1771	$0 -R .
1772
1773	@group
1774	eof
1775	@}
1776	@end group
1777
1778	@group
1779	apply_cmd='sh'
1780	finder='echo "$@@" | tr " " "\n"'
1781	files_only=
1782	@end group
1783
1784	@group
1785	while :
1786	do
1787	case "$1" in
1788	-n) apply_cmd='cat' ;;
1789	-R) finder='find "$@@" -type f';;
1790	-h) help ; exit 1 ;;
1791	*) break ;;
1792	esac
1793	shift
1794	done
1795	@end group
1796
1797	@group
1798	if [ -z "$1" ]; then
1799	echo Usage: $0 [-h] [-n] [-R] files...
1800	exit 1
1801	fi
1802	@end group
1803
1804	@group
1805	LOWER='abcdefghijklmnopqrstuvwxyz'
1806	UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
1807	@end group
1808
1809	@group
1810	case `basename $0` in
1811	*upper*) TO=$UPPER; FROM=$LOWER ;;
1812	*) FROM=$UPPER; TO=$LOWER ;;
1813	esac
1814	@end group
1815
1816	eval $finder | sed -n '
1817
1818	@group
1819	# remove all trailing slashes
1820	s/\/*$//
1821	@end group
1822
1823	@group
1824	# add ./ if there is no path, only a filename
1825	/\//! s/^/.\//
1826	@end group
1827
1828	@group
1829	# save path+filename
1830	h
1831	@end group
1832
1833	@group
1834	# remove path
1835	s/.*\///
1836	@end group
1837
1838	@group
1839	# do conversion only on filename
1840	y/'$FROM'/'$TO'/
1841	@end group
1842
1843	@group
1844	# now line contains original path+file, while
1845	# hold space contains the new filename
1846	x
1847	@end group
1848
1849	@group
1850	# add converted file name to line, which now contains
1851	# path/file-name\nconverted-file-name
1852	G
1853	@end group
1854
1855	@group
1856	# check if converted file name is equal to original file name,
1857	# if it is, do not print anything
1858	/^.*\/\(.*\)\n\1/b
1859	@end group
1860
1861	@group
1862	# now, transform path/fromfile\ntofile into
1863	# mv path/fromfile path/tofile and print it
1864	s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
1865	@end group
1866
1867	' | $apply_cmd
1868	@end example
1869	@c end---------------------------------------------
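
The heart of the script is easier to follow when run on a single path
name. The sketch below isolates the @command{sed} part in a function;
the name @code{lower_mv_cmds} is made up for this example and the
conversion is fixed to lower case. It only prints the @command{mv}
commands, much like the @option{-n} option of the script above.

```shell
# lower_mv_cmds: read path names on standard input and print one mv
# command for each name whose last component is not already lower case.
# Hypothetical helper; the sed commands mirror the script above.
lower_mv_cmds() {
    FROM='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    TO='abcdefghijklmnopqrstuvwxyz'
    sed -n '
# remove trailing slashes, add ./ if there is no directory part
s/\/*$//
/\//! s/^/.\//
# save the full path, then convert only the file name
h
s/.*\///
y/'$FROM'/'$TO'/
# pattern space becomes: original-path\nconverted-name
x
G
# nothing to do if the name did not change
/^.*\/\(.*\)\n\1/b
s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
'
}
```

Given the input @samp{Dir/FILE.TXT}, the function prints
@samp{mv "Dir/FILE.TXT" "Dir/file.txt"}; the directory part is left
untouched. A name such as @samp{readme} produces no output.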
1870
1871	@node Print bash environment
1872	@section Print @command{bash} Environment
1873
1874	This script strips the definition of the shell functions
1875	from the output of the @command{set} Bourne-shell command.
1876
1877	@c start-------------------------------------------
1878	@example
1879	#!/bin/sh
1880
1881	@group
1882	set | sed -n '
1883	:x
1884	@end group
1885
1886	@group
1887	@ifinfo
1888	# if no occurrence of "=()" print and load next line
1889	@end ifinfo
1890	@ifnotinfo
1891	# if no occurrence of @samp{=()} print and load next line
1892	@end ifnotinfo
1893	/=()/! @{ p; b; @}
1894	/ () $/! @{ p; b; @}
1895	@end group
1896
1897	@group
1898	# possible start of functions section
1899	# save the line in case this is a var like FOO="() "
1900	h
1901	@end group
1902
1903	@group
1904	# if the next line has a brace, we quit because
1905	# nothing comes after functions
1906	n
1907	/^@{/ q
1908	@end group
1909
1910	@group
1911	# print the old line
1912	x; p
1913	@end group
1914
1915	@group
1916	# work on the new line now
1917	x; bx
1918	'
1919	@end group
1920	@end example
1921	@c end---------------------------------------------
1922
1923	@node Reverse chars of lines
1924	@section Reverse Characters of Lines
1925
1926	This script can be used to reverse the position of characters
1927	in lines. The technique moves two characters at a time, hence
1928	it is faster than more intuitive implementations.
1929
1930	Note the @code{tx} command before the definition of the label.
1931	This is often needed to reset the flag that is tested by
1932	the @code{t} command.
1933
1934	Imaginative readers will find uses for this script. An example
1935	is reversing the output of @command{banner}.@footnote{This requires
1936	another script to pad the output of banner; for example
1937
1938	@example
1939	#! /bin/sh
1940
1941	banner -w $1 $2 $3 $4 |
1942	sed -e :a -e '/^.\@{0,'$1'\@}$/ @{ s/$/ /; ba; @}' |
1943	~/sedscripts/reverseline.sed
1944	@end example
1945	}
1946
1947	@c start-------------------------------------------
1948	@example
1949	#!/usr/bin/sed -f
1950
1951	/../! b
1952
1953	@group
1954	# Reverse a line. Begin embedding the line between two newlines
1955	s/^.*$/\
1956	&\
1957	/
1958	@end group
1959
1960	@group
1961	# Move the first character to the end and the last to the beginning.
1962	# The regexp matches until there are zero or one characters between the markers
1963	tx
1964	:x
1965	s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
1966	tx
1967	@end group
1968
1969	@group
1970	# Remove the newline markers
1971	s/\n//g
1972	@end group
1973	@end example
1974	@c end---------------------------------------------
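
A short trace may help in following the loop. Writing the newline
markers as @samp{|}, the line @samp{hello} becomes @samp{|hello|},
then @samp{o|ell|h}, then @samp{ol|l|eh}, at which point only one
character is left between the markers and the substitution fails.
The sketch below wraps the same commands in a shell function so that
this can be tried from the command line; the function name
@code{rev_line} is an assumption of this example.

```shell
# rev_line: reverse the characters of each line of standard input.
# Hypothetical wrapper around the commands shown above.
rev_line() {
    sed '
# lines shorter than two characters are already reversed
/../! b
# embed the line between two newline markers
s/^.*$/\
&\
/
# reset the flag tested by t, then swap the outermost characters
tx
:x
s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
tx
# remove the markers
s/\n//g
'
}
```

For example, @samp{echo hello | rev_line} prints @samp{olleh}.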
1975
1976	@node tac
1977	@section Reverse Lines of Files
1978
1979	This one begins a series of totally useless (yet interesting)
1980	scripts emulating various Unix commands. This, in particular,
1981	is a @command{tac} workalike.
1982
1983	Note that on implementations other than @acronym{GNU} @command{sed}
1984	@ifset PERL
1985	and @value{SSED}
1986	@end ifset
1987	this script might easily overflow internal buffers.
1988
1989	@c start-------------------------------------------
1990	@example
1991	#!/usr/bin/sed -nf
1992
1993	# reverse all lines of input, i.e. first line becomes last, ...
1994
1995	@group
1996	# from the second line, the buffer (which contains all previous lines)
1997	# is appended to current line, so, the order will be reversed
1998	1! G
1999	@end group
2000
2001	@group
2002	# on the last line we're done -- print everything
2003	$ p
2004	@end group
2005
2006	@group
2007	# store everything on the buffer again
2008	h
2009	@end group
2010	@end example
2011	@c end---------------------------------------------
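
The same three commands fit on a single command line. As a sketch
(the function name @code{tac_sed} is made up for this example), consider
the input lines @samp{1}, @samp{2} and @samp{3}: after each cycle the
hold space contains @samp{1}, then @samp{2\n1}, then @samp{3\n2\n1},
and only the last of these is printed.

```shell
# tac_sed: print the lines of standard input in reverse order.
# Hypothetical one-line form of the script above.
tac_sed() {
    sed -n '1!G;$p;h'
}
```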
2012
2013	@node cat -n
2014	@section Numbering Lines
2015
2016	This script replaces @samp{cat -n}; in fact it formats its output
2017	exactly like @acronym{GNU} @command{cat} does.
2018
2019	Of course this is completely useless, for two reasons: first,
2020	because somebody else did it in C, second, because the following
2021	Bourne-shell script could be used for the same purpose and would
2022	be much faster:
2023
2024	@c start-------------------------------------------
2025	@example
2026	@group
2027	#! /bin/sh
2028	sed -e "=" $@@ | sed -e '
2029	s/^/      /
2030	N
2031	s/^ *\(......\)\n/\1 /
2032	'
2033	@end group
2034	@end example
2035	@c end---------------------------------------------
2036
2037	It uses @command{sed} to print the line number, then groups lines two
2038	by two using @code{N}. Of course, this script does not teach as much as
2039	the one presented below.
2040
2041	The algorithm used for incrementing uses both buffers, so the line
2042	is printed as soon as possible and then discarded. The number
2043	is split so that changing digits go in a buffer and unchanged ones go
2044	in the other; the changed digits are modified in a single step
2045	(using a @code{y} command). The line number for the next line
2046	is then composed and stored in the hold space, to be used in the
2047	next iteration.
2048
2049	@c start-------------------------------------------
2050	@example
2051	#!/usr/bin/sed -nf
2052
2053	@group
2054	# Prime the pump on the first line
2055	x
2056	/^$/ s/^.*$/1/
2057	@end group
2058
2059	@group
2060	# Add the correct line number before the pattern
2061	G
2062	h
2063	@end group
2064
2065	@group
2066	# Format it and print it
2067	s/^/      /
2068	s/^ *\(......\)\n/\1 /p
2069	@end group
2070
2071	@group
2072	# Get the line number from hold space; add a zero
2073	# if we're going to add a digit on the next line
2074	g
2075	s/\n.*$//
2076	/^9*$/ s/^/0/
2077	@end group
2078
2079	@group
2080	# separate changing/unchanged digits with an x
2081	s/.9*$/x&/
2082	@end group
2083
2084	@group
2085	# keep changing digits in hold space
2086	h
2087	s/^.*x//
2088	y/0123456789/1234567890/
2089	x
2090	@end group
2091
2092	@group
2093	# keep unchanged digits in pattern space
2094	s/x.*$//
2095	@end group
2096
2097	@group
2098	# compose the new number, remove the newline implicitly added by G
2099	G
2100	s/\n//
2101	h
2102	@end group
2103	@end example
2104	@c end---------------------------------------------
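
The increment step can be tried in isolation. The sketch below applies
only the second half of the script to a single number; the function name
@code{next_number} is made up for this example. For the input 1299, the
number is split as @samp{1x299}; the changing digits @samp{299} become
@samp{300} through the @code{y} command, the unchanged digit @samp{1} is
kept as is, and joining them gives @samp{1300}.

```shell
# next_number NUMBER: print NUMBER + 1 using the two-buffer technique.
# Hypothetical wrapper; the commands are those of the script above.
next_number() {
    printf '%s\n' "$1" | sed '
# all nines: add a leading zero so that a digit can be incremented
/^9*$/ s/^/0/
# separate changing/unchanged digits with an x
s/.9*$/x&/
# changing digits go to hold space, incremented in one step
h
s/^.*x//
y/0123456789/1234567890/
x
# unchanged digits stay in pattern space
s/x.*$//
# compose the new number
G
s/\n//
'
}
```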
2105
2106	@node cat -b
2107	@section Numbering Non-blank Lines
2108
2109	Emulating @samp{cat -b} is almost the same as @samp{cat -n}---we only
2110	have to select which lines are to be numbered and which are not.
2111
2112	The part that is common to this script and the previous one is
2113	not commented to show how important it is to comment @command{sed}
2114	scripts properly...
2115
2116	@c start-------------------------------------------
2117	@example
2118	#!/usr/bin/sed -nf
2119
2120	@group
2121	/^$/ @{
2122	p
2123	b
2124	@}
2125	@end group
2126
2127	@group
2128	# Same as cat -n from now
2129	x
2130	/^$/ s/^.*$/1/
2131	G
2132	h
2133	s/^/      /
2134	s/^ *\(......\)\n/\1 /p
2135	x
2136	s/\n.*$//
2137	/^9*$/ s/^/0/
2138	s/.9*$/x&/
2139	h
2140	s/^.*x//
2141	y/0123456789/1234567890/
2142	x
2143	s/x.*$//
2144	G
2145	s/\n//
2146	h
2147	@end group
2148	@end example
2149	@c end---------------------------------------------
2150
2151	@node wc -c
2152	@section Counting Characters
2153
2154	This script shows another way to do arithmetic with @command{sed}.
2155	In this case we have to add possibly large numbers, so implementing
2156	this by successive increments would not be feasible (and possibly
2157	even more complicated to contrive than this script).
2158
2159	The approach is to map numbers to letters, kind of an abacus
2160	implemented with @command{sed}. @samp{a}s are units, @samp{b}s are
2161	tens and so on: we simply add the number of characters
2162	on the current line as units, and then propagate the carry
2163	to tens, hundreds, and so on.
2164
2165	As usual, running totals are kept in hold space.
2166
2167	On the last line, we convert the abacus form back to decimal.
2168	For the sake of variety, this is done with a loop rather than
2169	with some 80 @code{s} commands@footnote{Some implementations
2170	have a limit of 199 commands per script.}: first we
2171	convert units, removing @samp{a}s from the number; then we
2172	rotate letters so that tens become @samp{a}s, and so on
2173	until no more letters remain.
2174
2175	@c start-------------------------------------------
2176	@example
2177	#!/usr/bin/sed -nf
2178
2179	@group
2180	# Add n+1 a's to hold space (+1 is for the newline)
2181	s/./a/g
2182	H
2183	x
2184	s/\n/a/
2185	@end group
2186
2187	@group
2188	# Do the carry. The t's and b's are not necessary,
2189	# but they do speed up the thing
2190	t a
2191	: a; s/aaaaaaaaaa/b/g; t b; b done
2192	: b; s/bbbbbbbbbb/c/g; t c; b done
2193	: c; s/cccccccccc/d/g; t d; b done
2194	: d; s/dddddddddd/e/g; t e; b done
2195	: e; s/eeeeeeeeee/f/g; t f; b done
2196	: f; s/ffffffffff/g/g; t g; b done
2197	: g; s/gggggggggg/h/g; t h; b done
2198	: h; s/hhhhhhhhhh//g
2199	@end group
2200
2201	@group
2202	: done
2203	$! @{
2204	h
2205	b
2206	@}
2207	@end group
2208
2209	# On the last line, convert back to decimal
2210
2211	@group
2212	: loop
2213	/a/! s/[b-h]*/&0/
2214	s/aaaaaaaaa/9/
2215	s/aaaaaaaa/8/
2216	s/aaaaaaa/7/
2217	s/aaaaaa/6/
2218	s/aaaaa/5/
2219	s/aaaa/4/
2220	s/aaa/3/
2221	s/aa/2/
2222	s/a/1/
2223	@end group
2224
2225	@group
2226	: next
2227	y/bcdefgh/abcdefg/
2228	/[a-h]/ b loop
2229	p
2230	@end group
2231	@end example
2232	@c end---------------------------------------------
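
The conversion loop can be exercised on its own by feeding it a number
already in abacus form. This is only a sketch; the function name
@code{abacus_to_decimal} is an assumption of this example. For the input
@samp{cbbaaa} (one hundred, two tens, three units), the first pass turns
the units into a digit and rotates the letters, giving @samp{baa3}; the
second pass gives @samp{a23}, and the third gives @samp{123}.

```shell
# abacus_to_decimal STRING: convert an abacus string (a = units,
# b = tens, ... h = ten millions) to a decimal number.
# Hypothetical wrapper around the final loop of the script above.
abacus_to_decimal() {
    printf '%s\n' "$1" | sed -n '
:loop
# no units at this position: the digit is a zero
/a/! s/[b-h]*/&0/
s/aaaaaaaaa/9/
s/aaaaaaaa/8/
s/aaaaaaa/7/
s/aaaaaa/6/
s/aaaaa/5/
s/aaaa/4/
s/aaa/3/
s/aa/2/
s/a/1/
# shift every remaining letter down one position and go on
y/bcdefgh/abcdefg/
/[a-h]/ b loop
p
'
}
```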
2233
2234	@node wc -w
2235	@section Counting Words
2236
2237	This script is almost the same as the previous one, once each
2238	of the words on the line is converted to a single @samp{a}
2239	(in the previous script each letter was changed to an @samp{a}).
2240
2241	It is interesting that real @command{wc} programs have optimized
2242	loops for @samp{wc -c}, so they are much slower at counting
2243	words than at counting characters. This script's bottleneck,
2244	instead, is arithmetic, and hence the word-counting one
2245	is faster (it has to manage smaller numbers).
2246
2247	Again, the common parts are not commented to show the importance
2248	of commenting @command{sed} scripts.
2249
2250	@c start-------------------------------------------
2251	@example
2252	#!/usr/bin/sed -nf
2253
2254	@group
2255	# Convert words to a's
2256	s/[ @kbd{tab}][ @kbd{tab}]*/ /g
2257	s/^/ /
2258	s/ [^ ][^ ]*/a /g
2259	s/ //g
2260	@end group
2261
2262	@group
2263	# Append them to hold space
2264	H
2265	x
2266	s/\n//
2267	@end group
2268
2269	@group
2270	# From here on it is the same as in wc -c.
2271	/aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
2272	/bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
2273	/cccccccccc/! bx; s/cccccccccc/d/g
2274	/dddddddddd/! bx; s/dddddddddd/e/g
2275	/eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
2276	/ffffffffff/! bx; s/ffffffffff/g/g
2277	/gggggggggg/! bx; s/gggggggggg/h/g
2278	s/hhhhhhhhhh//g
2279	:x
2280	$! @{ h; b; @}
2281	:y
2282	/a/! s/[b-h]*/&0/
2283	s/aaaaaaaaa/9/
2284	s/aaaaaaaa/8/
2285	s/aaaaaaa/7/
2286	s/aaaaaa/6/
2287	s/aaaaa/5/
2288	s/aaaa/4/
2289	s/aaa/3/
2290	s/aa/2/
2291	s/a/1/
2292	y/bcdefgh/abcdefg/
2293	/[a-h]/ by
2294	p
2295	@end group
2296	@end example
2297	@c end---------------------------------------------
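
The only new part, turning each word into a single @samp{a}, can be
checked separately. In the sketch below the function name
@code{words_to_abacus} is made up for this example, and a shell variable
holds the literal tab character that the script writes as @kbd{tab}.

```shell
# words_to_abacus: print one "a" per word for each input line.
# Hypothetical helper using the first four commands of the script above.
words_to_abacus() {
    tab=$(printf '\t')
    sed -e "s/[ $tab][ $tab]*/ /g" \
        -e 's/^/ /' \
        -e 's/ [^ ][^ ]*/a /g' \
        -e 's/ //g'
}
```

A line such as @samp{foo  bar baz} first becomes @samp{ foo bar baz}
(blanks squeezed, one leading space added), then @samp{a a a }, and
finally @samp{aaa}.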
2298
2299	@node wc -l
2300	@section Counting Lines
2301
2302	No strange things are done now, because @command{sed} gives us
2303	@samp{wc -l} functionality for free!!! Look:
2304
2305	@c start-------------------------------------------
2306	@example
2307	@group
2308	#!/usr/bin/sed -nf
2309	$=
2310	@end group
2311	@end example
2312	@c end---------------------------------------------
2313
2314	@node head
2315	@section Printing the First Lines
2316
2317	This script is probably the simplest useful @command{sed} script.
2318	It displays the first 10 lines of input; the number of displayed
2319	lines is right before the @code{q} command.
2320
2321	@c start-------------------------------------------
2322	@example
2323	@group
2324	#!/usr/bin/sed -f
2325	10q
2326	@end group
2327	@end example
2328	@c end---------------------------------------------
2329
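
If the number of lines should be chosen at run time, the address can be
built from a shell variable. This is a minimal sketch; the function name
@code{head_n} is made up for this example.

```shell
# head_n COUNT: print the first COUNT lines of standard input.
# Hypothetical wrapper; COUNT is placed right before the q command.
head_n() {
    sed "${1}q"
}
```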
2330	@node tail
2331	@section Printing the Last Lines
2332
2333	Printing the last @var{n} lines rather than the first is more complex
2334	but indeed possible. @var{n} is encoded in the second line, before
2335	the bang character.
2336
2337	This script is similar to the @command{tac} script in that it keeps the
2338	final output in the hold space and prints it at the end:
2339
2340	@c start-------------------------------------------
2341	@example
2342	#!/usr/bin/sed -nf
2343
2344	@group
2345	1! @{; H; g; @}
2346	1,10 !s/[^\n]*\n//
2347	$p
2348	h
2349	@end group
2350	@end example
2351	@c end---------------------------------------------
2352
2353	Essentially, the script keeps a window of 10 lines and slides it
2354	by adding a line and deleting the oldest (the substitution command
2355	on the second line works like a @code{D} command but does not
2356	restart the loop).
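
To experiment with other window sizes, the count can be passed in from
the shell. The following sketch assumes @acronym{GNU} @command{sed}
(for @code{\n} inside a bracket expression and for semicolons inside
braces); the function name @code{tail_n} is made up for this example.

```shell
# tail_n COUNT: print the last COUNT lines of standard input.
# Hypothetical wrapper around the hold-space script above.
tail_n() {
    n=$1
    sed -n -e '1!{H;g;}' \
           -e "1,${n}!s/[^\n]*\n//" \
           -e '$p' \
           -e h
}
```

With a count of 2 and the input lines 1 to 5, the window holds
@samp{1}, @samp{1\n2}, @samp{2\n3}, @samp{3\n4} and finally
@samp{4\n5}, which is printed on the last line.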
2357
2358	The ``sliding window'' technique is a very powerful way to write
2359	efficient and complex @command{sed} scripts, because commands like
2360	@code{P} would require a lot of work if implemented manually.
2361
2362	To introduce the technique, which is fully demonstrated in the
2363	rest of this chapter and is based on the @code{N}, @code{P}
2364	and @code{D} commands, here is an implementation of @command{tail}
2365	using a simple ``sliding window.''
2366
2367	This looks complicated but in fact the working is the same as
2368	the last script: after we have kicked in the appropriate number
2369	of lines, however, we stop using the hold space to keep inter-line
2370	state, and instead use @code{N} and @code{D} to slide pattern
2371	space by one line:
2372
2373	@c start-------------------------------------------
2374	@example
2375	#!/usr/bin/sed -f
2376
2377	@group
2378	1h
2379	2,10 @{; H; g; @}
2380	$q
2381	1,9d
2382	N
2383	D
2384	@end group
2385	@end example
2386	@c end---------------------------------------------
2387
2388	Note how the first, second and fourth lines are inactive after
2389	the first ten lines of input. After that, all the script does
2390	is: exiting on the last line of input, appending the next input
2391	line to pattern space, and removing the first line.
2392
2393	@node uniq
2394	@section Make Duplicate Lines Unique
2395
2396	This is an example of the art of using the @code{N}, @code{P}
2397	and @code{D} commands, probably the most difficult to master.
2398
2399	@c start-------------------------------------------
2400	@example
2401	@group
2402	#!/usr/bin/sed -f
2403	h
2404	@end group
2405
2406	@group
2407	:b
2408	# On the last line, print and exit
2409	$b
2410	N
2411	/^\(.*\)\n\1$/ @{
2412	# The two lines are identical. Undo the effect of
2413	# the N command.
2414	g
2415	bb
2416	@}
2417	@end group
2418
2419	@group
2420	# If the @code{N} command had added the last line, print and exit
2421	$b
2422	@end group
2423
2424	@group
2425	# The lines are different; print the first and go
2426	# back working on the second.
2427	P
2428	D
2429	@end group
2430	@end example
2431	@c end---------------------------------------------
2432
2433	As you can see, we maintain a 2-line window using @code{P} and @code{D}.
2434	This technique is often used in advanced @command{sed} scripts.
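
To see the window at work, the commands can be run on a small input.
This is only a sketch: the function name @code{uniq_sed} is an assumption
of this example, and the behavior of @code{D} described in the comments
(restarting the script without reading a new line when pattern space
still holds text) is that of @acronym{GNU} @command{sed}.

```shell
# uniq_sed: collapse runs of identical adjacent lines into one line.
# Hypothetical wrapper around the script above.
uniq_sed() {
    sed '
h
:b
# on the last line, print and exit
$b
N
/^\(.*\)\n\1$/ {
    # identical lines: go back to the single saved copy
    g
    bb
}
# N just read the last line: print both and exit
$b
# different lines: print the first, keep working on the second
P
D
'
}
```

Given the six input lines @samp{a}, @samp{a}, @samp{b}, @samp{b},
@samp{b}, @samp{c}, the function prints @samp{a}, @samp{b} and @samp{c}.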
2435
2436	@node uniq -d
2437	@section Print Duplicated Lines of Input
2438
2439	This script prints only duplicated lines, like @samp{uniq -d}.
2440
2441	@c start-------------------------------------------
2442	@example
2443	#!/usr/bin/sed -nf
2444
2445	@group
2446	$b
2447	N
2448	/^\(.*\)\n\1$/ @{
2449	# Print the first of the duplicated lines
2450	s/.*\n//
2451	p
2452	@end group
2453
2454	@group
2455	# Loop until we get a different line
2456	:b
2457	$b
2458	N
2459	/^\(.*\)\n\1$/ @{
2460	s/.*\n//
2461	bb
2462	@}
2463	@}
2464	@end group
2465
2466	@group
2467	# The last line cannot be followed by duplicates
2468	$b
2469	@end group
2470
2471	@group
2472	# Found a different one. Leave it alone in the pattern space
2473	# and go back to the top, hunting its duplicates
2474	D
2475	@end group
2476	@end example
2477	@c end---------------------------------------------
2478
2479	@node uniq -u
2480	@section Remove All Duplicated Lines
2481
2482	This script prints only unique lines, like @samp{uniq -u}.
2483
2484	@c start-------------------------------------------
2485	@example
2486	#!/usr/bin/sed -f
2487
2488	@group
2489	# Search for a duplicate line --- until then, print what you find.
2490	$b
2491	N
2492	/^\(.*\)\n\1$/ ! @{
2493	P
2494	D
2495	@}
2496	@end group
2497
2498	@group
2499	:c
2500	# Got two equal lines in pattern space. At the
2501	# end of the file we simply exit
2502	$d
2503	@end group
2504
2505	@group
2506	# Else, we keep reading lines with @code{N} until we
2507	# find a different one
2508	s/.*\n//
2509	N
2510	/^\(.*\)\n\1$/ @{
2511	bc
2512	@}
2513	@end group
2514
2515	@group
2516	# Remove the last instance of the duplicate line
2517	# and go back to the top
2518	D
2519	@end group
2520	@end example
2521	@c end---------------------------------------------
2522
2523	@node cat -s
2524	@section Squeezing Blank Lines
2525
2526	As a final example, here are three scripts, of increasing complexity
2527	and speed, that implement the same function as @samp{cat -s}, that is
2528	squeezing blank lines.
2529
2530	The first leaves a blank line at the beginning and end if there are
2531	some already.
2532
2533	@c start-------------------------------------------
2534	@example
2535	#!/usr/bin/sed -f
2536
2537	@group
2538	# on empty lines, join with next
2539	# Note there is a star in the regexp
2540	:x
2541	/^\n*$/ @{
2542	N
2543	bx
2544	@}
2545	@end group
2546
2547	@group
2548	# now, squeeze all '\n', this can be also done by:
2549	# s/^\(\n\)*/\1/
2550	s/\n\n*/\
2551	/
2552	@end group
2553	@end example
2554	@c end---------------------------------------------
2555
2556	This one is a bit more complex and removes all empty lines
2557	at the beginning. It does leave a single blank line at end
2558	if one was there.
2559
2560	@c start-------------------------------------------
2561	@example
2562	#!/usr/bin/sed -f
2563
2564	@group
2565	# delete all leading empty lines
2566	1,/^./@{
2567	/./!d
2568	@}
2569	@end group
2570
2571	@group
2572	# on an empty line we remove it and all the following
2573	# empty lines, but one
2574	:x
2575	/./!@{
2576	N
2577	s/^\n$//
2578	tx
2579	@}
2580	@end group
2581	@end example
2582	@c end---------------------------------------------
2583
2584	This removes leading and trailing blank lines. It is also the
2585	fastest. Note that loops are completely done with @code{n} and
2586	@code{b}, without relying on @command{sed} to restart the
2587	script automatically at the end of a line.
2588
2589	@c start-------------------------------------------
2590	@example
2591	#!/usr/bin/sed -nf
2592
2593	@group
2594	# delete all (leading) blanks
2595	/./!d
2596	@end group
2597
2598	@group
2599	# get here: so there is a non empty
2600	:x
2601	# print it
2602	p
2603	# get next
2604	n
2605	# got chars? print it again, etc...
2606	/./bx
2607	@end group
2608
2609	@group
2610	# no, don't have chars: got an empty line
2611	:z
2612	# get next, if last line we finish here so no trailing
2613	# empty lines are written
2614	n
2615	# also empty? then ignore it, and get next... this will
2616	# remove ALL empty lines
2617	/./!bz
2618	@end group
2619
2620	@group
2621	# all empty lines were deleted/ignored, but we have a non empty. As
2622	# what we want to do is to squeeze, insert a blank line artificially
2623	i\
2624	@end group
2625
2626	bx
2627	@end example
2628	@c end---------------------------------------------
2629
2630	@node Limitations
2631	@chapter @value{SSED}'s Limitations and Non-limitations
2632
2633	@cindex @acronym{GNU} extensions, unlimited line length
2634	@cindex Portability, line length limitations
2635	For those who want to write portable @command{sed} scripts,
2636	be aware that some implementations have been known to
2637	limit line lengths (for the pattern and hold spaces)
2638	to be no more than 4000 bytes.
2639	The @sc{posix} standard specifies that conforming @command{sed}
2640	implementations shall support at least 8192 byte line lengths.
2641	@value{SSED} has no built-in limit on line length;
2642	as long as it can @code{malloc()} more (virtual) memory,
2643	you can feed or construct lines as long as you like.
2644
2645	However, recursion is used to handle subpatterns and indefinite
2646	repetition. This means that the available stack space may limit
2647	the size of the buffer that can be processed by certain patterns.
2648
2649	@ifset PERL
2650	There are some size limitations in the regular expression
2651	matcher but it is hoped that they will never in practice
2652	be relevant. The maximum length of a compiled pattern
2653	is 65539 (sic) bytes. All values in repeating quantifiers
2654	must be less than 65536. The maximum nesting depth of
2655	all parenthesized subpatterns, including capturing and
2656	non-capturing subpatterns@footnote{The
2657	distinction is meaningful when referring to Perl-style
2658	regular expressions.}, assertions, and other types of
2659	subpattern, is 200.
2660
2661	Also, @value{SSED} recognizes the @sc{posix} syntax
2662	@code{[.@var{ch}.]} and @code{[=@var{ch}=]}
2663	where @var{ch} is a ``collating element'', but these
2664	are not supported, and an error is given if they are
2665	encountered.
2666
2667	Here are a few distinctions between the real Perl-style
2668	regular expressions and those that @option{-R} recognizes.
2669
2670	@enumerate
2671	@item
2672	Lookahead assertions do not allow repeat quantifiers after them.
2673	Perl permits them, but they do not mean what you
2674	might think. For example, @samp{(?!a)@{3@}} does not assert that the
2675	next three characters are not @samp{a}. It just asserts three times that the
2676	next character is not @samp{a} --- a waste of time and nothing else.
2677
2678	@item
2679	Capturing subpatterns that occur inside negative lookahead
2680	assertions are counted, but their entries are counted
2681	as empty in the second half of an @code{s} command.
2682	Perl sets its numerical variables from any such patterns
2683	that are matched before the assertion fails to match
2684	something (thereby succeeding), but only if the negative
2685	lookahead assertion contains just one branch.
2686
2687	@item
2688	The following Perl escape sequences are not supported:
2689	@samp{\l}, @samp{\u}, @samp{\L}, @samp{\U}, @samp{\E},
2690	@samp{\Q}. In fact these are implemented by Perl's general
2691	string-handling and are not part of its pattern matching engine.
2692
2693	@item
2694	The Perl @samp{\G} assertion is not supported as it is not
2695	relevant to single pattern matches.
2696
2697	@item
2698	Fairly obviously, @value{SSED} does not support the @samp{(?@{code@})}
2699	and @samp{(?p@{code@})} constructions. However, there is some experimental
2700	support for recursive patterns using the non-Perl item @samp{(?R)}.
2701
2702	@item
2703	There are at the time of writing some oddities in Perl
2704	5.005_02 concerned with the settings of captured strings
2705	when part of a pattern is repeated. For example, matching
2706	@samp{aba} against the pattern @samp{/^(a(b)?)+$/} sets
2707	@samp{$2}@footnote{@samp{$2} would be @samp{\2} in @value{SSED}.}
2708	to the value @samp{b}, but matching @samp{aabbaa}
2709	against @samp{/^(aa(bb)?)+$/} leaves @samp{$2}
2710	unset. However, if the pattern is changed to
2711	@samp{/^(aa(b(b))?)+$/} then @samp{$2} (and @samp{$3}) are set.
2712	In Perl 5.004 @samp{$2} is set in both cases, and that is also
2713	true of @value{SSED}.
2714
2715	@item
2716	Another as yet unresolved discrepancy is that in Perl
2717	5.005_02 the pattern @samp{/^(a)?(?(1)a|b)+$/} matches
2718	the string @samp{a}, whereas in @value{SSED} it does not.
2719	However, in both Perl and @value{SSED} @samp{/^(a)?a/} matched
2720	against @samp{a} leaves @samp{$1} unset.
2721	@end enumerate
2722	@end ifset
2723
2724	@node Other Resources
2725	@chapter Other Resources for Learning About @command{sed}
2726
2727	@cindex Additional reading about @command{sed}
2728	In addition to several books that have been written about @command{sed}
2729	(either specifically or as chapters in books which discuss
2730	shell programming), one can find out more about @command{sed}
2731	(including suggestions of a few books) from the FAQ
2732	for the @code{sed-users} mailing list, available from any of:
2733	@display
2734	@uref{http://www.student.northpark.edu/pemente/sed/sedfaq.html}
2735	@uref{http://sed.sf.net/grabbag/tutorials/sedfaq.html}
2736	@end display
2737
2738	Also of interest are
2739	@uref{http://www.student.northpark.edu/pemente/sed/index.htm}
2740	and @uref{http://sed.sf.net/grabbag},
2741	which include @command{sed} tutorials and other @command{sed}-related goodies.
2742
2743	The @code{sed-users} mailing list itself is maintained by Sven Guckes.
2744	To subscribe, visit @uref{http://groups.yahoo.com} and search
2745	for the @code{sed-users} mailing list.
2746
2747	@node Reporting Bugs
2748	@chapter Reporting Bugs
2749
2750	@cindex Bugs, reporting
2751	Email bug reports to @email{bonzini@@gnu.org}.
2752	Be sure to include the word ``sed'' somewhere in the @code{Subject:} field.
2753	Also, please include the output of @samp{sed --version} in the body
2754	of your report if at all possible.
2755
2756	Please do not send a bug report like this:
2757
2758	@example
2759	@i{while building frobme-1.3.4}
2760	$ configure
2761	@error{} sed: file sedscr line 1: Unknown option to 's'
2762	@end example
2763
2764	If @value{SSED} doesn't configure your favorite package, take a
2765	few extra minutes to identify the specific problem and make a stand-alone
2766	test case. Unlike with other programs such as C compilers, making such test
2767	cases for @command{sed} is quite simple.
2768
2769	A stand-alone test case includes all the data necessary to perform the
2770	test, and the specific invocation of @command{sed} that causes the problem.
2771	The smaller a stand-alone test case is, the better. A test case should
2772	not involve something as far removed from @command{sed} as ``try to configure
2773	frobme-1.3.4''. Yes, that is in principle enough information to look
2774	for the bug, but that is not a very practical prospect.
2775
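As a rough illustration, a stand-alone test case can be as short as the
following shell session: it supplies its own input, shows the exact
invocation, and records the version. The input and the script are made up
for this sketch and do not come from an actual bug report.

@example
# Hypothetical report: version, input data and invocation in one place.
sed --version | head -n 1
result=$(printf 'one\ntwo\n' | sed -n '2p')
printf '%s\n' "$result"    # prints "two"
@end example
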
2776	Here are a few commonly reported bugs that are not bugs.
2777
2778	@table @asis
2779	@item @code{N} command on the last line
2780	@cindex Portability, @code{N} command on the last line
2781	@cindex Non-bugs, @code{N} command on the last line
2782
2783	Most versions of @command{sed} exit without printing anything when
2784	the @command{N} command is issued on the last line of a file.
2785	@value{SSED} prints pattern space before exiting unless of course
2786	the @option{-n} command-line switch has been specified. This choice is
2787	by design.
2788
2789	For example, the behavior of
2790	@example
2791	sed N foo bar
2792	@end example
2793	@noindent
2794	would depend on whether @file{foo} has an even or an odd number of
2795	lines@footnote{which is the actual ``bug'' that prompted the
2796	change in behavior}. Or, when writing a script to read the
2797	next few lines following a pattern match, traditional
2798	implementations of @code{sed} would force you to write
2799	something like
2800	@example
2801	/foo/@{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N @}
2802	@end example
2803	@noindent
2804	instead of just
2805	@example
2806	/foo/@{ N;N;N;N;N;N;N;N;N; @}
2807	@end example
2808
2809	@cindex @code{POSIXLY_CORRECT} behavior, @code{N} command
2810	In any case, the simplest workaround is to use @code{$d;N} in
2811	scripts that rely on the traditional behavior, or to set
2812	the @code{POSIXLY_CORRECT} variable to a non-empty value.
2813
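The difference can be observed with a three-line input, so that the second
@code{N} finds no next line. This is only a sketch, assuming a shell with
@command{printf} and a @command{sed} that behaves as described above; the
variable names are arbitrary. Under those assumptions the first command
keeps all three lines, while the second drops the unpaired last line.

@example
# Behavior described above: the leftover last line is still printed.
gnu=$(printf 'a\nb\nc\n' | sed N)
# Traditional behavior, emulated with the $d;N workaround.
trad=$(printf 'a\nb\nc\n' | sed '$d;N')
# Traditional behavior, requested through the environment instead.
posix=$(printf 'a\nb\nc\n' | POSIXLY_CORRECT=1 sed N)
printf '%s\n--\n%s\n--\n%s\n' "$gnu" "$trad" "$posix"
@end example
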
2814	@item Regex syntax clashes (problems with backslashes)
2815	@cindex @acronym{GNU} extensions, to basic regular expressions
2816	@cindex Non-bugs, regex syntax clashes
2817	@command{sed} uses the @sc{posix} basic regular expression syntax. According to
2818	the standard, the meaning of some escape sequences is undefined in
2819	this syntax; notable in the case of @command{sed} are @code{\|},
2820	@code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<},
2821	@code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.
2822
2823	As in all @acronym{GNU} programs that use @sc{posix} basic regular
2824	expressions, @command{sed} interprets these escape sequences as special
2825	characters. So, @code{x\+} matches one or more occurrences of @samp{x}.
2826	@code{abc\|def} matches either @samp{abc} or @samp{def}.
2827
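A minimal sketch of both extensions, assuming a @command{sed} that
interprets these sequences as special, as described here (other
implementations may treat them literally). The input strings are invented
for the example.

@example
# \+ repeats the preceding x; the trailing + in the input is untouched.
plus=$(echo 'xxx+' | sed 's/x\+/Y/')
# \| separates two alternatives.
alt=$(echo 'abc def' | sed 's/abc\|def/Z/g')
printf '%s\n%s\n' "$plus" "$alt"    # prints "Y+" and "Z Z"
@end example
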
2828	This syntax may cause problems when running scripts written for other
2829	@command{sed}s. Some @command{sed} programs have been written with the
2830	assumption that @code{\|} and @code{\+} match the literal characters
2831	@code{|} and @code{+}. Such scripts must be modified by removing the
2832	spurious backslashes if they are to be used with modern implementations
2833	of @command{sed}, like
2834	@ifset PERL
2835	@value{SSED} or
2836	@end ifset
2837	@acronym{GNU} @command{sed}.
2838
2839	On the other hand, some scripts use @code{s|abc\|def||g} to remove occurrences
2840	of @emph{either} @code{abc} or @code{def}. While this worked until
2841	@command{sed} 4.0.x, newer versions interpret this as removing the
2842	string @code{abc|def}. This is again undefined behavior according to
2843	@acronym{POSIX}, and this interpretation is arguably more robust: older
2844	@command{sed}s, for example, required that the regex matcher parse
2845	@code{\/} as @code{/} in the common case of escaping a slash, which is
2846	again undefined behavior; the new behavior avoids this, and this is good
2847	because the regex matcher is only partially under our control.
2848
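The newer interpretation can be checked with a short sketch. It assumes a
version of @command{sed} newer than 4.0.x; the input string is made up for
the example.

@example
# With | as the delimiter, \| stands for a literal | in the regex,
# so only the string "abc|def" is removed, not every abc or def.
literal=$(echo 'abc|def abc' | sed 's|abc\|def||g')
printf '[%s]\n' "$literal"    # prints "[ abc]"
@end example
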
2849	@cindex @acronym{GNU} extensions, special escapes
2850	In addition, this version of @command{sed} supports several escape characters
2851	(some of which are multi-character) to insert non-printable characters
2852	in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r},
2853	@code{\t}, @code{\v}, @code{\x}). These can cause similar problems
2854	with scripts written for other @command{sed}s.
2855
2856	@item @option{-i} clobbers read-only files
2857	@cindex In-place editing
2858	@cindex @value{SSEDEXT}, in-place editing
2859	@cindex Non-bugs, in-place editing
2860
2861	In short, @samp{sed -i} will let you delete the contents of
2862	a read-only file, and in general the @option{-i} option
2863	(@pxref{Invoking sed, , Invocation}) lets you clobber
2864	protected files. This is not a bug, but rather a consequence
2865	of how the Unix filesystem works.
2866
2867	The permissions on a file say what can happen to the data
2868	in that file, while the permissions on a directory say what can
2869	happen to the list of files in that directory. @samp{sed -i}
2870	will not ever open for writing a file that is already on disk.
2871	Rather, it will work on a temporary file that is finally renamed
2872	to the original name: if you rename or delete files, you're actually
2873	modifying the contents of the directory, so the operation depends on
2874	the permissions of the directory, not of the file. For this same
2875	reason, @command{sed} does not let you use @option{-i} on a writeable file
2876	in a read-only directory (but unbelievably nobody reports that as a
2877	bug@dots{}).
2878
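The following sketch shows the effect in a scratch directory. It assumes
that @command{mktemp} is available and that @option{-i} is accepted without
a suffix; the file name @file{f} is arbitrary.

@example
dir=$(mktemp -d)
printf 'hello\n' > "$dir/f"
chmod 444 "$dir/f"                # the file itself is read-only
sed -i 's/hello/bye/' "$dir/f"    # allowed: the directory is writable
result=$(cat "$dir/f")
printf '%s\n' "$result"           # prints "bye"
rm -rf "$dir"
@end example
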
2879	@item @code{0a} does not work (gives an error)
2880	There is no line 0; @code{0} is a special address that is only used to treat
2881	addresses like @code{0,/@var{RE}/} as active when the script starts: if
2882	you write @code{1,/abc/d} and the first line includes the word @samp{abc},
2883	then that match would be ignored because address ranges must span at least
2884	two lines (barring the end of the file); but what you probably wanted is
2885	to delete every line up to the first one including @samp{abc}, and this
2886	is obtained with @code{0,/abc/d}.
2887
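The two address forms can be compared on an input whose first line already
matches. This is a sketch with made-up data, assuming a @command{sed} that
supports the @code{0,/@var{RE}/} form.

@example
input='abc\nx\nabc\ny\n'
# The match on line 1 is ignored, so lines 1 to 3 are deleted.
one=$(printf "$input" | sed '1,/abc/d')
# The range ends on line 1, so only that line is deleted.
zero=$(printf "$input" | sed '0,/abc/d')
printf '%s\n--\n%s\n' "$one" "$zero"
@end example
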
2888	@ifclear PERL
2889	@item @code{[a-z]} is case insensitive
2890	You are encountering problems with locales. POSIX mandates that @code{[a-z]}
2891	uses the current locale's collation order -- in C parlance, that means using
2892	@code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
2893	case-insensitive collation order, others don't: one of those that have
2894	problems is Estonian.
2895
2896	Another problem is that @code{[a-z]} tries to use collation symbols.
2897	This only happens if you are on the @acronym{GNU} system, using
2898	@acronym{GNU} libc's regular expression matcher instead of compiling the
2899	one supplied with @acronym{GNU} sed. In a Danish locale, for example,
2900	the regular expression @code{^[a-z]$} matches the string @samp{aa},
2901	because this is a single collating symbol that comes after @samp{a}
2902	and before @samp{b}; @samp{ll} behaves similarly in Spanish
2903	locales, or @samp{ij} in Dutch locales.
2904
2905	To work around these problems, which may cause bugs in shell scripts, set
2906	the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
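
A minimal sketch of the workaround for a single command. It sets
@env{LC_ALL}, which takes precedence over both @env{LC_COLLATE} and
@env{LC_CTYPE}; that shortcut is a choice made for this example, not
something prescribed above.

@example
# In the C locale [a-z] covers only the lowercase ASCII letters.
result=$(echo 'aBc' | LC_ALL=C sed 's/[a-z]//g')
printf '%s\n' "$result"    # prints "B"
@end example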
2907	@end ifclear
2908	@end table
2909
2910
2911	@node Extended regexps
2912	@appendix Extended regular expressions
2913	@cindex Extended regular expressions, syntax
2914
2915	The only difference between basic and extended regular expressions is in
2916	the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
2917	braces (@samp{@{@}}), and @samp{|}. While basic regular expressions require
2918	these to be escaped if you want them to behave as special characters,
2919	when using extended regular expressions you must escape them if
2920	you want them @emph{to match a literal character}.
2921
2922	@noindent
2923	Examples:
2924	@table @code
2925	@item abc?
2926	becomes @samp{abc\?} when using extended regular expressions. It matches
2927	the literal string @samp{abc?}.
2928
2929	@item c\+
2930	becomes @samp{c+} when using extended regular expressions. It matches
2931	one or more @samp{c}s.
2932
2933	@item a\@{3,\@}
2934	becomes @samp{a@{3,@}} when using extended regular expressions. It matches
2935	three or more @samp{a}s.
2936
2937	@item \(abc\)\@{2,3\@}
2938	becomes @samp{(abc)@{2,3@}} when using extended regular expressions. It
2939	matches either @samp{abcabc} or @samp{abcabcabc}.
2940
2941	@item \(abc*\)\1
2942	becomes @samp{(abc*)\1} when using extended regular expressions.
2943	Backreferences must still be escaped when using extended regular
2944	expressions.
2945	@end table
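
Some of the pairs above can be tried side by side. The sketch assumes that
extended regular expressions are selected with the @option{-r} option; the
input strings are invented for the example.

@example
# One or more c: escaped in basic syntax, bare in extended syntax.
bre_plus=$(echo 'abccc' | sed 's/c\+/X/')
ere_plus=$(echo 'abccc' | sed -r 's/c+/X/')
# Grouping changes, but the backreference \1 keeps its backslash.
bre_back=$(echo 'abccabcc' | sed 's/\(abc*\)\1/X/')
ere_back=$(echo 'abccabcc' | sed -r 's/(abc*)\1/X/')
# A literal question mark must be escaped in extended syntax.
ere_lit=$(echo 'abc?' | sed -r 's/abc\?/X/')
printf '%s\n' "$bre_plus" "$ere_plus" "$bre_back" "$ere_back" "$ere_lit"
@end example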
2946
2947	@ifset PERL
2948	@node Perl regexps
2949	@appendix Perl-style regular expressions
2950	@cindex Perl-style regular expressions, syntax
2951
2952	@emph{This part is taken from the @file{pcre.txt} file distributed together
2953	with the free @sc{pcre} regular expression matcher; it was written by Philip Hazel.}
2954
2955	Perl introduced several extensions to regular expressions, some
2956	of them incompatible with the syntax of regular expressions
2957	accepted by Emacs and other @acronym{GNU} tools (whose matcher was
2958	based on the Emacs matcher). @value{SSED} implements
2959	both kinds of extensions.
2960
2961	@iftex
2962	Summarizing, we have:
2963
2964	@itemize @bullet
2965	@item
2966	A backslash can introduce several special sequences
2967
2968	@item
2969	The circumflex, dollar sign, and period characters behave specially
2970	with regard to new lines
2971
2972	@item
2973	Strange uses of square brackets are parsed differently
2974
2975	@item
2976	You can toggle modifiers in the middle of a regular expression
2977
2978	@item
2979	You can specify that a subpattern does not count when numbering backreferences
2980
2981	@item
2982	@cindex Greedy regular expression matching
2983	You can specify greedy or non-greedy matching
2984
2985	@item
2986	You can have more than ten back references
2987
2988	@item
2989	You can do complex look aheads and look behinds (in the spirit of
2990	@code{\b}, but with subpatterns).
2991
2992	@item
2993	You can often improve performance by preventing @command{sed} from
2994	wasting time on backtracking
2995
2996	@item
2997	You can have if/then/else branches
2998
2999	@item
3000	You can do recursive matches, for example to look for unbalanced parentheses
3001
3002	@item
3003	You can have comments and non-significant whitespace, because things can
3004	get complex...
3005	@end itemize
3006
3007	Most of these extensions are introduced by the special @code{(?}
3008	sequence, which gives special meanings to parenthesized groups.
3009	@end iftex
3010	@menu
3011	Other extensions can be roughly subdivided into two categories.
3012	On one hand Perl introduces several more escaped sequences
3013	(that is, sequences introduced by a backslash). On the other
3014	hand, it specifies that if a question mark follows an open
3015	parenthesis it should give a special meaning to the parenthesized
3016	group.
3017
3018	* Backslash:: Introduces special sequences
3019	* Circumflex/dollar sign/period:: Behave specially with regard to new lines
3020	* Square brackets:: Are a bit different in strange cases
3021	* Options setting:: Toggle modifiers in the middle of a regexp
3022	* Non-capturing subpatterns:: Are not counted when backreferencing
3023	* Repetition:: Allows for non-greedy matching
3024	* Backreferences:: Allows for more than 10 back references
3025	* Assertions:: Allows for complex look ahead matches
3026	* Non-backtracking subpatterns:: Often gives more performance
3027	* Conditional subpatterns:: Allows if/then/else branches
3028	* Recursive patterns:: For example to match parentheses
3029	* Comments:: Because things can get complex...
3030	@end menu
3031
3032	@node Backslash
3033	@appendixsec Backslash
3034	@cindex Perl-style regular expressions, escaped sequences
3035
3036	There are a few differences in the handling of backslashed
3037	sequences in Perl mode.
3038
3039	First of all, there are no @code{\o} and @code{\d} sequences.
3040	@sc{ascii} values for characters can be specified in octal
3041	with a @code{\@var{xxx}} sequence, where @var{xxx} is a
3042	sequence of up to three octal digits. If the first digit
3043	is a zero, the treatment of the sequence is straightforward;
3044	just note that if the character that follows the escaped digit
3045	is itself an octal digit, you have to supply three octal digits
3046	for @var{xxx}. For example @code{\07} is a @sc{bel} character
3047	rather than a @sc{nul} and a literal @code{7} (this sequence is
3048	instead represented by @code{\0007}).
3049
3050	@cindex Perl-style regular expressions, backreferences
3051	The handling of a backslash followed by a digit other than 0
3052	is complicated. Outside a character class, @command{sed} reads it
3053	and any following digits as a decimal number. If the number
3054	is less than 10, or if there have been at least that many
3055	previous capturing left parentheses in the expression, the
3056	entire sequence is taken as a back reference. A description
3057	of how this works is given later, following the discussion
3058	of parenthesized subpatterns.
3059
3060	Inside a character class, or if the decimal number is
3061	greater than 9 and there have not been that many capturing
3062	subpatterns, @command{sed} re-reads up to three octal digits following
3063	the backslash, and generates a single byte from the
3064	least significant 8 bits of the value. Any subsequent digits
3065	stand for themselves. For example:
3066
3067	@example
3068	\040 @i{is another way of writing a space}
3069	\40 @i{is the same, provided there are fewer than 40}
3070	@i{previous capturing subpatterns}
3071	\7 @i{is always a back reference}
3072	\011 @i{is always a tab}
3073	\11 @i{might be a back reference, or another way of}
3074	@i{writing a tab}
3075	\0113 @i{is a tab followed by the character @samp{3}}
3076	\113 @i{is the character with octal code 113 (since there}
3077	@i{can be no more than 99 back references)}
3078	\377 @i{is a byte consisting entirely of 1 bits (@sc{ascii} 255)}
3079	\81 @i{is either a back reference, or a binary zero}
3080	@i{followed by the two characters @samp{81}}
3081	@end example
3082
3083	Note that octal values of 100 or greater must not be introduced
3084	by a leading zero, because no more than three octal
3085	digits are ever read.
3086
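Two of the cases above can be tried with a short sketch. It uses the
@command{perl} interpreter itself rather than @value{SSED}, on the
assumption that Perl mode follows the same rules; the variable names are
arbitrary.

@example
# \040 is always the character with octal code 40, a space.
space=$(perl -e 'print " " =~ /^\040$/ ? "match" : "no match"')
# With fewer than 11 capturing subpatterns, \11 is read as a tab.
tab=$(perl -e 'print "\t" =~ /^\11$/ ? "match" : "no match"')
printf '%s\n%s\n' "$space" "$tab"    # prints "match" twice
@end example
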
3087	All the sequences that define a single byte value can be
3088	used both inside and outside character classes. In addition,
3089	inside a character class, the sequence @code{\b} is interpreted
3090	as the backspace character (hex 08). Outside a character
3091	class it has a different meaning (see below).
3092
3093	In addition, there are four additional escapes specifying
3094	generic character classes (like @code{\w} and @code{\W} do):
3095
3096	@cindex Perl-style regular expressions, character classes
3097	@table @samp
3098	@item \d
3099	Matches any decimal digit
3100
3101	@item \D
3102	Matches any character that is not a decimal digit
3103	@end table
3104
3105	In Perl mode, these character type sequences can appear both inside and
3106	outside character classes. Instead, in @sc{posix} mode these sequences
3107	(as well as @code{\w} and @code{\W}) are treated as two literal characters
3108	(a backslash and a letter) inside square brackets.
3109
3110	Escaped sequences specifying assertions are also different in
3111	Perl mode. An assertion specifies a condition that has to be met
3112	at a particular point in a match, without consuming any
3113	characters from the subject string. The use of subpatterns
3114	for more complicated assertions is described below. The
3115	backslashed assertions are
3116
3117	@cindex Perl-style regular expressions, assertions
3118	@table @samp
3119	@item \b
3120	Asserts that the point is at a word boundary.
3121	A word boundary is a position in the subject string where
3122	the current character and the previous character do not both
3123	match @code{\w} or @code{\W} (i.e. one matches @code{\w} and
3124	the other matches @code{\W}), or the start or end of the string
3125	if the first or last character matches @code{\w}, respectively.
3126
3127	@item \B
3128	Asserts that the point is not at a word boundary.
3129
3130	@item \A
3131	Asserts the matcher is at the start of pattern space (independent
3132	of multiline mode).
3133
3134	@item \Z
3135	Asserts the matcher is at the end of pattern space,
3136	or at a newline before the end of pattern space (independent of
3137	multiline mode).
3138
3139	@item \z
3140	Asserts the matcher is at the end of pattern space (independent
3141	of multiline mode).
3142	@end table
3143
3144	These assertions may not appear in character classes (but
3145	note that @code{\b} has a different meaning, namely the
3146	backspace character, inside a character class).
3147	Note that Perl mode does not directly support assertions
3148	for the beginning and the end of word; the @acronym{GNU} extensions
3149	@code{\<} and @code{\>} achieve this purpose in @sc{posix} mode
3150	instead.
3151
3152	The @code{\A}, @code{\Z}, and @code{\z} assertions differ
3153	from the traditional circumflex and dollar sign (described below)
3154	in that they only ever match at the very start and end of the
3155	subject string, whatever options are set; in particular @code{\A}
3156	and @code{\z} are the same as the @acronym{GNU} extensions
3157	@code{\`} and @code{\'} that are active in @sc{posix} mode.
3158
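The difference between @code{\Z} and @code{\z}, and a simple use of
@code{\b}, can be sketched as follows. The sketch runs @command{perl}
itself, assuming that Perl mode in @value{SSED} agrees with it on these
assertions; the subject strings are made up.

@example
# \Z also matches just before a newline that ends the subject.
cap_z=$(perl -e 'print "foo\n" =~ /foo\Z/ ? "match" : "no match"')
# \z matches only at the very end, so the trailing newline gets in the way.
low_z=$(perl -e 'print "foo\n" =~ /foo\z/ ? "match" : "no match"')
# \b finds the boundaries around the word cat.
word=$(perl -e 'print "a cat" =~ /\bcat\b/ ? "match" : "no match"')
printf '%s\n' "$cap_z" "$low_z" "$word"
@end example
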
3159	@node Circumflex/dollar sign/period
3160	@appendixsec Circumflex, dollar sign, period
3161	@cindex Perl-style regular expressions, newlines
3162
3163	Outside a character class, in the default matching mode, the
3164	circumflex character is an assertion which is true only if
3165	the current matching point is at the start of the subject
3166	string. Inside a character class, the circumflex has an entirely
3167	different meaning (see below).
3168
3169	The circumflex need not be the first character of the pattern if
3170	a number of alternatives are involved, but it should be the
3171	first thing in each alternative in which it appears if the
3172	pattern is ever to match that branch. If all possible alternatives
3173	start with a circumflex, that is, if the pattern is
3174	constrained to match only at the start of the subject, it is
3175	said to be an @dfn{anchored} pattern. (There are also other
3176	constructs that can cause a pattern to be anchored.)
3177
3178	A dollar sign is an assertion which is true only if the
3179	current matching point is at the end of the subject string,
3180	or immediately before a newline character that is the last
3181	character in the string (by default). A dollar sign need not be the
3182	last character of the pattern if a number of alternatives
3183	are involved, but it should be the last item in any branch
3184	in which it appears. A dollar sign has no special meaning in a
3185	character class.
3186
3187	@cindex Perl-style regular expressions, multiline
3188	The meanings of the circumflex and dollar sign characters are
3189	changed if the @code{M} modifier option is used. When this is
3190	the case, they match immediately after and immediately
3191	before an internal @code{\n} character, respectively, in addition
3192	to matching at the start and end of the subject string. For
3193	example, the pattern @code{/^abc$/} matches the subject string
3194	@samp{def\nabc} in multiline mode, but not otherwise. Consequently,
3195	patterns that are anchored in single line mode
3196	because all branches start with @code{^} are not anchored in
3197	multiline mode.
3198
3199	@cindex Perl-style regular expressions, multiline
3200	Note that the sequences @code{\A}, @code{\Z}, and @code{\z}
3201	can be used to match the start and end of the subject in both
3202	modes, and if all branches of a pattern start with @code{\A}
3203	it is always anchored, whether the @code{M} modifier is set or not.
3204
3205	@cindex Perl-style regular expressions, single line
3206	Outside a character class, a dot in the pattern matches any
3207	one character in the subject, including a non-printing character,
3208	but not (by default) newline. If the @code{S} modifier is used,
3209	dots match newlines as well. Actually, the handling of
3210	dot is entirely independent of the handling of circumflex
3211	and dollar sign, the only relationship being that they both
3212	involve newline characters. Dot has no special meaning in a
3213	character class.
3214
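The effect of the two modifiers can be sketched with @command{perl}, whose
@code{m} and @code{s} flags are assumed here to correspond to the @code{M}
and @code{S} modifiers of @value{SSED}. The subject strings are the ones
used above or made up in the same spirit.

@example
# Without multiline mode, ^ matches only at the start of the subject.
plain=$(perl -e 'print "def\nabc" =~ /^abc$/ ? "match" : "no match"')
# In multiline mode, ^ also matches right after the internal newline.
multi=$(perl -e 'print "def\nabc" =~ /^abc$/m ? "match" : "no match"')
# By default a dot does not match a newline; in single line mode it does.
dot=$(perl -e 'print "a\nb" =~ /a.b/ ? "match" : "no match"')
dot_s=$(perl -e 'print "a\nb" =~ /a.b/s ? "match" : "no match"')
printf '%s\n' "$plain" "$multi" "$dot" "$dot_s"
@end example
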
3215	@node Square brackets
3216	@appendixsec Square brackets
3217	@cindex Perl-style regular expressions, character classes
3218
3219	An opening square bracket introduces a character class, terminated
3220	by a closing square bracket. A closing square bracket on its own
3221	is not special. If a closing square bracket is required as a
3222	member of the class, it should be the first data character in
3223	the class (after an initial circumflex, if present) or escaped with a backslash.
3224
3225	A character class matches a single character in the subject;
3226	the character must be in the set of characters defined by
3227	the class, unless the first character in the class is a circumflex,
3228	in which case the subject character must not be in
3229	the set defined by the class. If a circumflex is actually
3230	required as a member of the class, ensure it is not the
3231	first character, or escape it with a backslash.
3232
3233	For example, the character class @code{[aeiou]} matches any lower
3234	case vowel, while @code{[^aeiou]} matches any character that is not
3235	a lower case vowel. Note that a circumflex is just a convenient
3236	notation for specifying the characters which are in
3237	the class by enumerating those that are not. It is not an
3238	assertion: it still consumes a character from the subject
3239	string, and fails if the current pointer is at the end of
3240	the string.
3241
3242	@cindex Perl-style regular expressions, case-insensitive
3243	When caseless matching is set, any letters in a class
3244	represent both their upper case and lower case versions, so
3245	for example, a caseless @code{[aeiou]} matches uppercase
3246	and lowercase @samp{A}s, and a caseless @code{[^aeiou]}
3247	does not match @samp{A}, whereas a case-sensitive version would.
3248
3249	@cindex Perl-style regular expressions, single line
3250	@cindex Perl-style regular expressions, multiline
3251	The newline character is never treated in any special way in
3252	character classes, whatever the setting of the @code{S} and
3253	@code{M} options (modifiers) is. A class such as @code{[^a]} will
3254	always match a newline.
3255
3256	The minus (hyphen) character can be used to specify a range
3257	of characters in a character class. For example, @code{[d-m]}
3258	matches any letter between d and m, inclusive. If a minus
3259	character is required in a class, it must be escaped with a
3260	backslash or appear in a position where it cannot be interpreted
3261	as indicating a range, typically as the first or last
3262	character in the class.
3263
3264	It is not possible to have the literal character @code{]} as the
3265	end character of a range. A pattern such as @code{[W-]46]} is
3266	interpreted as a class of two characters (@code{W} and @code{-})
3267	followed by a literal string @code{46]}, so it would match
3268	@samp{W46]} or @samp{-46]}. However, if the @code{]} is escaped
3269	with a backslash it is interpreted as the end of range, so
3270	@code{[W-\]46]} is interpreted as a single class containing a
3271	range followed by two separate characters. The octal or
3272	hexadecimal representation of @code{]} can also be used to end a range.
3273
3274	Ranges operate in @sc{ascii} collating sequence. They can also be
3275	used for characters specified numerically, for example
3276	@code{[\000-\037]}. If a range that includes letters is used when
3277	caseless matching is set, it matches the letters in either
3278	case. For example, a caseless @code{[W-c]} is equivalent to
3279	@code{[][\^_`wxyzabc]}, matched caselessly, and if character
3280	tables for the French locale are in use, @code{[\xc8-\xcb]}
3281	matches accented E characters in both cases.
3282
3283	Unlike in @sc{posix} mode, the character types @code{\d},
3284	@code{\D}, @code{\s}, @code{\S}, @code{\w}, and @code{\W}
3285	may also appear in a character class, and add the characters
3286	that they match to the class. For example, @code{[\dABCDEF]} matches any
3287	hexadecimal digit. A circumflex can conveniently be used
3288	with the upper case character types to specify a more restricted
3289	set of characters than the matching lower case type.
3290	For example, the class @code{[^\W_]} matches any letter or digit,
3291	but not underscore.
3292
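Both classes can be tried with a short sketch. It runs @command{perl}
itself, assuming that Perl mode in @value{SSED} treats character types
inside classes the same way; the subject strings are made up.

@example
# \d inside a class adds the decimal digits to the listed letters.
hex=$(perl -e 'print "7F" =~ /^[\dABCDEF]+$/ ? "match" : "no match"')
# [^\W_] accepts letters and digits but rejects the underscore.
under=$(perl -e 'print "_" =~ /^[^\W_]$/ ? "match" : "no match"')
letter=$(perl -e 'print "q" =~ /^[^\W_]$/ ? "match" : "no match"')
printf '%s\n' "$hex" "$under" "$letter"
@end example
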
3293	All non-alphanumeric characters other than @code{\}, @code{-},
3294	@code{^} (at the start) and the terminating @code{]}
3295	are non-special in character classes, but it does no harm
3296	if they are escaped.
3297
3298	Perl 5.6 supports the @sc{posix} notation for character classes, which
3299	uses names enclosed by @code{[:} and @code{:]} within the enclosing
3300	square brackets, and @value{SSED} supports this notation as well.
3301	For example,
3302
3303	@example
3304	[01[:alpha:]%]
3305	@end example
3306
3307	@noindent
3308	matches @samp{0}, @samp{1}, any alphabetic character, or @samp{%}.
3309	The supported class names are
3310
3311	@table @code
3312	@item alnum
3313	Matches letters and digits
3314
3315	@item alpha
3316	Matches letters
3317
3318	@item ascii
3319	Matches character codes 0 - 127
3320
3321	@item cntrl
3322	Matches control characters
3323
3324	@item digit
3325	Matches decimal digits (same as \d)
3326
3327	@item graph
3328	Matches printing characters, excluding space
3329
3330	@item lower
3331	Matches lower case letters
3332
3333	@item print
3334	Matches printing characters, including space
3335
3336	@item punct
3337	Matches printing characters, excluding letters and digits
3338
3339	@item space
3340	Matches white space (same as \s)
3341
3342	@item upper
3343	Matches upper case letters
3344
3345	@item word
3346	Matches ``word'' characters (same as \w)
3347
3348	@item xdigit
3349	Matches hexadecimal digits
3350	@end table
3351
3352	The names @code{ascii} and @code{word} are extensions valid only in
3353	Perl mode. Another Perl extension is negation, which is
3354	indicated by a circumflex character after the colon. For example,
3355
3356	@example
3357	[12[:^digit:]]
3358	@end example
3359
3360	@noindent
3361	matches @samp{1}, @samp{2}, or any non-digit.
3362
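A sketch of the negated class above, run through @command{perl} on the
assumption that @value{SSED} handles the notation identically. The three
subject strings are arbitrary.

@example
# 1 is listed explicitly, 3 is a digit that is not listed, x is a non-digit.
one=$(perl -e 'print "1" =~ /^[12[:^digit:]]$/ ? "match" : "no match"')
three=$(perl -e 'print "3" =~ /^[12[:^digit:]]$/ ? "match" : "no match"')
ex=$(perl -e 'print "x" =~ /^[12[:^digit:]]$/ ? "match" : "no match"')
printf '%s\n' "$one" "$three" "$ex"
@end example
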
3363	@node Options setting
3364	@appendixsec Options setting
3365	@cindex Perl-style regular expressions, toggling options
3366	@cindex Perl-style regular expressions, case-insensitive
3367	@cindex Perl-style regular expressions, multiline
3368	@cindex Perl-style regular expressions, single line
3369	@cindex Perl-style regular expressions, extended
3370
3371	The settings of the @code{I}, @code{M}, @code{S}, @code{X}
3372	modifiers can be changed from within the pattern by
3373	a sequence of Perl option letters enclosed between @code{(?}
3374	and @code{)}. The option letters must be lowercase.
3375
3376	For example, @code{(?im)} sets caseless, multiline matching. It is
3377	also possible to unset these options by preceding the letter
3378	with a hyphen; you can also have combined settings and unsettings:
3379	@code{(?im-sx)} sets caseless and multiline matching,
3380	while unsetting single line matching (for dots) and extended
3381	whitespace interpretation. If a letter appears both before
3382	and after the hyphen, the option is unset.
3383
3384	The scope of these option changes depends on where in the
3385	pattern the setting occurs. For settings that are outside
3386	any subpattern (defined below), the effect is the same as if
3387	the options were set or unset at the start of matching. The
3388	following patterns all behave in exactly the same way:
3389
3390	@example
3391	(?i)abc
3392	a(?i)bc
3393	ab(?i)c
3394	abc(?i)
3395	@end example
3396
3397	which in turn is the same as specifying the pattern @code{abc} with
3398	the @code{I} modifier. In other words, ``top level'' settings
3399	apply to the whole pattern (unless there are other
3400	changes inside subpatterns). If there is more than one setting
3401	of the same option at top level, the rightmost setting
3402	is used.
3403
3404	If an option change occurs inside a subpattern, the effect
3405	is different. This is a change of behaviour in Perl 5.005.
3406	An option change inside a subpattern affects only that part
3407	of the subpattern @emph{that follows} it, so
3408
3409	@example
3410	(a(?i)b)c
3411	@end example
3412
3413	@noindent
3414	matches @samp{abc} and @samp{aBc} and no other strings (assuming
3415	case-sensitive matching is used). By this means, options can
3416	be made to have different settings in different parts of the
3417	pattern. Any changes made in one alternative do carry on
3418	into subsequent branches within the same subpattern. For
3419	example,
3420
3421	@example
3422	(a(?i)b|c)
3423	@end example
3424
3425	@noindent
3426	matches @samp{ab}, @samp{aB}, @samp{c}, and @samp{C},
3427	even though when matching @samp{C} the first branch is
3428	abandoned before the option setting.
3429	This is because the effects of option settings happen at
3430	compile time. There would be some very weird behaviour otherwise.
3431
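The scope of a setting inside a subpattern can be sketched with
@command{perl}. Only the subpattern case is shown; top-level settings are
left out because this sketch does not assume that @command{perl} and
@value{SSED} agree on them. The subject strings are made up.

@example
# (?i) applies to b only: it follows the setting inside the subpattern.
lower=$(perl -e 'print "abc" =~ /^(a(?i)b)c$/ ? "match" : "no match"')
mixed=$(perl -e 'print "aBc" =~ /^(a(?i)b)c$/ ? "match" : "no match"')
# The c lies outside the subpattern and stays case-sensitive.
upper_c=$(perl -e 'print "abC" =~ /^(a(?i)b)c$/ ? "match" : "no match"')
printf '%s\n' "$lower" "$mixed" "$upper_c"
@end example
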
3432	@ignore
3433	There are two PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA
3434	that can be changed in the same way as the Perl-compatible options by
3435	using the characters U and X respectively. The (?X) flag
3436	setting is special in that it must always occur earlier in
3437	the pattern than any of the additional features it turns on,
3438	even when it is at top level. It is best put at the start.
3439	@end ignore
3440
3441
3442	@node Non-capturing subpatterns
3443	@appendixsec Non-capturing subpatterns
3444	@cindex Perl-style regular expressions, non-capturing subpatterns
3445
3446	Marking part of a pattern as a subpattern does two things.
3447	On one hand, it localizes a set of alternatives; on the other
3448	hand, it sets up the subpattern as a capturing subpattern (as
3449	defined above). The subpattern can be backreferenced and
3450	referenced in the right side of @code{s} commands.
3451
3452	For example, if the string @samp{the red king} is matched against
3453	the pattern
3454
3455	@example
3456	the ((red|white) (king|queen))
3457	@end example
3458
3459	@noindent
3460	the captured substrings are @samp{red king}, @samp{red},
3461	and @samp{king}, and are numbered 1, 2, and 3.
3462
3463	The fact that plain parentheses fulfil two functions is not
3464	always helpful. There are often times when a grouping
3465	subpattern is required without a capturing requirement. If an
3466	opening parenthesis is followed by @code{?:}, the subpattern does
3467	not do any capturing, and is not counted when computing the
3468	number of any subsequent capturing subpatterns. For example,
3469	if the string @samp{the white queen} is matched against the pattern
3470
3471	@example
3472	the ((?:red|white) (king|queen))
3473	@end example
3474
3475	@noindent
3476	the captured substrings are @samp{white queen} and @samp{queen},
3477	and are numbered 1 and 2. The maximum number of captured
3478	substrings is 99, while the maximum number of all subpatterns,
3479	both capturing and non-capturing, is 200.
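
The numbering rule can be checked outside @command{sed}. The following is
a minimal sketch in Python, assuming its standard @code{re} module, which
accepts the same group syntax for these two patterns; it illustrates the
rule and is not part of @value{SSED}.

```python
import re

# Plain parentheses both group and capture: three numbered substrings.
capturing = re.match(r"the ((red|white) (king|queen))", "the red king")
captured_all = capturing.groups()

# (?:...) groups without capturing, so the later group is renumbered.
non_capturing = re.match(r"the ((?:red|white) (king|queen))", "the white queen")
captured_some = non_capturing.groups()

print(captured_all)   # ('red king', 'red', 'king')
print(captured_some)  # ('white queen', 'queen')
```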
3480
3481	As a convenient shorthand, if any option settings are
3482	required at the start of a non-capturing subpattern, the
3483	option letters may appear between the @code{?} and the
3484	@code{:}. Thus the two patterns
3485
3486	@example
3487	(?i:saturday|sunday)
3488	(?:(?i)saturday|sunday)
3489	@end example
3490
3491	@noindent
3492	match exactly the same set of strings. Because alternative
3493	branches are tried from left to right, and options are not
3494	reset until the end of the subpattern is reached, an option
3495	setting in one branch does affect subsequent branches, so
3496	the above patterns match @samp{SUNDAY} as well as @samp{Saturday}.
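
As a rough illustration of an option that is local to one group, here is a
sketch using Python's @code{re} module (an assumption of this example, not
something @value{SSED} uses). Python 3.11 and later reject an option
setting such as @code{(?i)} that is not at the very start of the pattern,
so only the @code{(?i:...)} form is shown.

```python
import re

# (?i:...) turns on caseless matching for this group only.
weekend = re.compile(r"(?i:saturday|sunday)")
matches_upper = weekend.fullmatch("SUNDAY") is not None
matches_mixed = weekend.fullmatch("Saturday") is not None

# Text outside the group is still matched casefully.
scoped = re.compile(r"(?i:sat)urday")
inside_only = scoped.fullmatch("SATurday") is not None
outside_too = scoped.fullmatch("SATURDAY") is not None

print(matches_upper, matches_mixed)  # True True
print(inside_only, outside_too)      # True False
```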
3497
3498
3499	@node Repetition
3500	@appendixsec Repetition
3501	@cindex Perl-style regular expressions, repetitions
3502
3503	Repetition is specified by quantifiers, which can follow any
3504	of the following items:
3505
3506	@itemize @bullet
3507	@item
3508	a single character, possibly escaped
3509
3510	@item
3511	the @code{.} special character
3512
3513	@item
3514	a character class
3515
3516	@item
3517	a back reference (see next section)
3518
3519	@item
3520	a parenthesized subpattern (unless it is an assertion; @pxref{Assertions})
3521	@end itemize
3522
3523	The general repetition quantifier specifies a minimum and
3524	maximum number of permitted matches, by giving the two
3525	numbers in curly brackets (braces), separated by a comma.
3526	The numbers must be less than 65536, and the first must be
3527	less than or equal to the second. For example:
3528
3529	@example
3530	z@{2,4@}
3531	@end example
3532
3533	@noindent
3534	matches @samp{zz}, @samp{zzz}, or @samp{zzzz}. A closing brace on its own
3535	is not a special character. If the second number is omitted,
3536	but the comma is present, there is no upper limit; if the
3537	second number and the comma are both omitted, the quantifier
3538	specifies an exact number of required matches. Thus
3539
3540	@example
3541	[aeiou]@{3,@}
3542	@end example
3543
3544	@noindent
3545	matches at least 3 successive vowels, but may match many
3546	more, while
3547
3548	@example
3549	\d@{8@}
3550	@end example
3551
3552	@noindent
3553	matches exactly 8 digits. An opening curly bracket that
3554	appears in a position where a quantifier is not allowed, or
3555	one that does not match the syntax of a quantifier, is taken
3556	as a literal character. For example, @{,6@} is not a quantifier,
3557	but a literal string of four characters.@footnote{It
3558	raises an error if @option{-R} is not used.}
3559
3560	The quantifier @samp{@{0@}} is permitted, causing the expression to
3561	behave as if the previous item and the quantifier were not
3562	present.
3563
3564	For convenience (and historical compatibility) the three
3565	most common quantifiers have single-character abbreviations:
3566
3567	@table @code
3568	@item *
3569	is equivalent to @{0,@}
3570
3571	@item +
3572	is equivalent to @{1,@}
3573
3574	@item ?
3575	is equivalent to @{0,1@}
3576	@end table
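
The brace quantifiers and their abbreviations can be tried out with a short
sketch. It assumes Python's @code{re} module, which treats these particular
forms the same way; the helper name @code{full} is made up for the example.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

bounded = [full(r"z{2,4}", s) for s in ("z", "zz", "zzzz", "zzzzz")]
open_ended = full(r"[aeiou]{3,}", "aeiouaeiou")
exact = [full(r"\d{8}", s) for s in ("12345678", "1234567")]

# Each abbreviation agrees with its brace form on these samples.
samples = ("", "a", "aa", "aaa")
star_same = all(full(r"a*", s) == full(r"a{0,}", s) for s in samples)
plus_same = all(full(r"a+", s) == full(r"a{1,}", s) for s in samples)
optional_same = all(full(r"a?", s) == full(r"a{0,1}", s) for s in samples)

print(bounded)            # [False, True, True, False]
print(open_ended, exact)  # True [True, False]
print(star_same, plus_same, optional_same)  # True True True
```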
3577
3578	It is possible to construct infinite loops by following a
3579	subpattern that can match no characters with a quantifier
3580	that has no upper limit, for example:
3581
3582	@example
3583	(a?)*
3584	@end example
3585
3586	Earlier versions of Perl used to give an error at
3587	compile time for such patterns. However, because there are
3588	cases where this can be useful, such patterns are now
3589	accepted, but if any repetition of the subpattern does in
3590	fact match no characters, the loop is forcibly broken.
3591
3592	@cindex Greedy regular expression matching
3593	@cindex Perl-style regular expressions, stingy repetitions
3594	By default, the quantifiers are @dfn{greedy} like in @sc{posix}
3595	mode, that is, they match as much as possible (up to the maximum
3596	number of permitted times), without causing the rest of the
3597	pattern to fail. The classic example of where this gives problems
3598	is in trying to match comments in C programs. These appear between
3599	the sequences @code{/*} and @code{*/} and within the sequence, individual
3600	@code{*} and @code{/} characters may appear. An attempt to match C
3601	comments by applying the pattern
3602
3603	@example
3604	/\*.*\*/
3605	@end example
3606
3607	@noindent
3608	to the string
3609
3610	@example
3611	/* first comment */ not comment /* second comment */
3612	@end example
3613
3614	@noindent
3615
3616	fails, because it matches the entire string owing to the
3617	greediness of the @code{.*} item.
3618
3619	However, if a quantifier is followed by a question mark, it
3620	ceases to be greedy, and instead matches the minimum number
3621	of times possible, so the pattern @code{/\*.*?\*/}
3622	does the right thing with the C comments. The meaning of the
3623	various quantifiers is not otherwise changed, just the preferred
3624	number of matches. Do not confuse this use of question
3625	mark with its use as a quantifier in its own right.
3626	Because it has two uses, it can sometimes appear doubled, as in
3627
3628	@example
3629	\d??\d
3630	@end example
3631
3632	which matches one digit by preference, but can match two if
3633	that is the only way the rest of the pattern matches.
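
The difference between greedy and non-greedy repetition is easy to observe.
This sketch assumes Python's @code{re} module, whose @code{*?} and
@code{??} quantifiers behave like the Perl-style ones described here.

```python
import re

text = "/* first comment */ not comment /* second comment */"

# Greedy .* runs to the last */ in the string: one oversized match.
greedy = re.findall(r"/\*.*\*/", text)

# Non-greedy .*? stops at the first */ it can: one match per comment.
lazy = re.findall(r"/\*.*?\*/", text)

# A doubled question mark: an optional digit that prefers to be absent.
prefers_one = re.match(r"\d??\d", "12").group()
takes_two = re.fullmatch(r"\d??\d", "12").group()

print(greedy)  # the whole of text as a single element
print(lazy)    # ['/* first comment */', '/* second comment */']
print(prefers_one, takes_two)  # 1 12
```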
3634
3635	Note that greediness does not matter when specifying addresses,
3636	but can nevertheless be used to improve performance.
3637
3638	@ignore
3639	If the PCRE_UNGREEDY option is set (an option which is not
3640	available in Perl), the quantifiers are not greedy by
3641	default, but individual ones can be made greedy by following
3642	them with a question mark. In other words, it inverts the
3643	default behaviour.
3644	@end ignore
3645
3646	When a parenthesized subpattern is quantified with a minimum
3647	repeat count that is greater than 1 or with a limited maximum,
3648	more storage is required for the compiled pattern, in
3649	proportion to the size of the minimum or maximum.
3650
3651	@cindex Perl-style regular expressions, single line
3652	If a pattern starts with @code{.*} or @code{.@{0,@}} and the
3653	@code{S} modifier is used, the pattern is implicitly anchored,
3654	because whatever follows will be tried against every character
3655	position in the subject string, so there is no point in
3656	retrying the overall match at any position after the first.
3657	PCRE treats such a pattern as though it were preceded by @code{\A}.
3658
3659	When a capturing subpattern is repeated, the value captured
3660	is the substring that matched the final iteration. For example,
3661	after
3662
3663	@example
3664	(tweedle[dume]@{3@}\s*)+
3665	@end example
3666
3667	@noindent
3668	has matched @samp{tweedledum tweedledee} the value of the
3669	captured substring is @samp{tweedledee}. However, if there are
3670	nested capturing subpatterns, the corresponding captured
3671	values may have been set in previous iterations. For example,
3672	after
3673
3674	@example
3675	/(a|(b))+/
3676	@end example
3677
3678	matches @samp{aba}, the value of the second captured substring is
3679	@samp{b}.
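
Both cases can be reproduced with a small sketch. It assumes Python's
@code{re} module, which, like the matcher described here, keeps the value
from the final iteration of a repeated group and leaves a nested group's
earlier value in place.

```python
import re

# The repeated group ends up holding only its final iteration.
repeated = re.match(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
last_iteration = repeated.group(1)

# The inner group last took part in the second iteration ("b").
nested = re.match(r"(a|(b))+", "aba")
outer, inner = nested.groups()

print(last_iteration)  # tweedledee
print(outer, inner)    # a b
```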
3680
3681	@node Backreferences
3682	@appendixsec Backreferences
3683	@cindex Perl-style regular expressions, backreferences
3684
3685	Outside a character class, a backslash followed by a digit
3686	greater than 0 (and possibly further digits) is a back
3687	reference to a capturing subpattern earlier (i.e. to its
3688	left) in the pattern, provided there have been that many
3689	previous capturing left parentheses.
3690
3691	However, if the decimal number following the backslash is
3692	less than 10, it is always taken as a back reference, and
3693	causes an error only if there are not that many capturing
3694	left parentheses in the entire pattern. In other words, the
3695	parentheses that are referenced need not be to the left of
3696	the reference for numbers less than 10. @xref{Backslash},
3697	for further details of the handling of digits following a backslash.
3698
3699	A back reference matches whatever actually matched the capturing
3700	subpattern in the current subject string, rather than
3701	anything matching the subpattern itself. So the pattern
3702
3703	@example
3704	(sens|respons)e and \1ibility
3705	@end example
3706
3707	@noindent
3708	matches @samp{sense and sensibility} and @samp{response and responsibility},
3709	but not @samp{sense and responsibility}. If caseful
3710	matching is in force at the time of the back reference, the
3711	case of letters is relevant. For example,
3712
3713	@example
3714	((?i)blah)\s+\1
3715	@end example
3716
3717	@noindent
3718	matches @samp{blah blah} and @samp{Blah Blah}, but not
3719	@samp{BLAH blah}, even though the original capturing
3720	subpattern is matched caselessly.
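
A short sketch of both examples follows, assuming Python's @code{re}
module. Python 3.11 and later do not accept @code{(?i)} in the middle of a
pattern, so the second pattern is written with the scoped form
@code{(?i:blah)}; that spelling is an adaptation for this example.

```python
import re

austen = re.compile(r"(sens|respons)e and \1ibility")
same_stem = [austen.fullmatch(s) is not None for s in (
    "sense and sensibility",
    "response and responsibility",
    "sense and responsibility",
)]

# Group 1 is matched caselessly, but \1 lies outside the (?i:...) scope
# and so must repeat the captured text exactly.
echo = re.compile(r"((?i:blah))\s+\1")
exact_repeat = [echo.fullmatch(s) is not None for s in (
    "blah blah", "Blah Blah", "BLAH blah",
)]

print(same_stem)     # [True, True, False]
print(exact_repeat)  # [True, True, False]
```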
3721
3722	There may be more than one back reference to the same subpattern.
3723	Also, if a subpattern has not actually been used in a
3724	particular match, any back references to it always fail. For
3725	example, the pattern
3726
3727	@example
3728	(a|(bc))\2
3729	@end example
3730
3731	@noindent
3732	always fails if it starts to match @samp{a} rather than
3733	@samp{bc}. Because there may be up to 99 back references, all
3734	digits following the backslash are taken as part of a potential
3735	back reference number; this is different from what happens
3736	in @sc{posix} mode. If the pattern continues with a digit
3737	character, some delimiter must be used to terminate the back
3738	reference. If the @code{X} modifier option is set, this can be
3739	whitespace. Otherwise an empty comment can be used, or the
3740	following character can be expressed in hexadecimal or octal.
3741
3742	A back reference that occurs inside the parentheses to which
3743	it refers fails when the subpattern is first used, so, for
3744	example, @code{(a\1)} never matches. However, such references
3745	can be useful inside repeated subpatterns. For example, the
3746	pattern
3747
3748	@example
3749	(a|b\1)+
3750	@end example
3751
3752	@noindent
3753	matches any number of @samp{a}s and also @samp{aba}, @samp{ababbaa},
3754	etc. At each iteration of the subpattern, the back reference matches
3755	the character string corresponding to the previous iteration. In
3756	order for this to work, the pattern must be such that the first
3757	iteration does not need to match the back reference. This can be
3758	done using alternation, as in the example above, or by a
3759	quantifier with a minimum of zero.
3760
3761	@node Assertions
3762	@appendixsec Assertions
3763	@cindex Perl-style regular expressions, assertions
3764	@cindex Perl-style regular expressions, asserting subpatterns
3765
3766	An assertion is a test on the characters following or
3767	preceding the current matching point that does not actually
3768	consume any characters. The simple assertions coded as @code{\b},
3769	@code{\B}, @code{\A}, @code{\Z}, @code{\z}, @code{^} and @code{$}
3770	are described above. More complicated assertions are coded as
3771	subpatterns. There are two kinds: those that look ahead of the
3772	current position in the subject string, and those that look behind it.
3773
3774	@cindex Perl-style regular expressions, lookahead subpatterns
3775	An assertion subpattern is matched in the normal way, except
3776	that it does not cause the current matching position to be
3777	changed. Lookahead assertions start with @code{(?=} for positive
3778	assertions and @code{(?!} for negative assertions. For example,
3779
3780	@example
3781	\w+(?=;)
3782	@end example
3783
3784	@noindent
3785	matches a word followed by a semicolon, but does not include
3786	the semicolon in the match, and
3787
3788	@example
3789	foo(?!bar)
3790	@end example
3791
3792	@noindent
3793	matches any occurrence of @samp{foo} that is not followed by
3794	@samp{bar}.
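
The two lookahead examples can be exercised as follows. This is a sketch
that assumes Python's @code{re} module, which uses the same @code{(?=} and
@code{(?!} notation; the sample subject strings are invented.

```python
import re

# The semicolon must follow the word but is not part of the match.
word = re.search(r"\w+(?=;)", "x = value;").group()

# Only the foo that is not followed by bar is reported.
spans = [m.span() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

print(word)   # value
print(spans)  # [(7, 10)]
```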
3795
3796	Note that the apparently similar pattern
3797
3798	@example
3799	(?!foo)bar
3800	@end example
3801
3802	@noindent
3803	@cindex Perl-style regular expressions, lookbehind subpatterns
3804	finds any occurrence of @samp{bar} even if it is preceded by
3805	@samp{foo}, because the assertion @code{(?!foo)} is always true
3806	when the next three characters are @samp{bar}. A lookbehind
3807	assertion is needed to achieve this effect.
3808	Lookbehind assertions start with @code{(?<=} for positive
3809	assertions and @code{(?<!} for negative assertions. So,
3810
3811	@example
3812	(?<!foo)bar
3813	@end example
3814
3815	achieves the required effect of finding an occurrence of
3816	@samp{bar} that is not preceded by @samp{foo}. The contents of a
3817	lookbehind assertion are restricted
3818	such that all the strings it matches must have a fixed
3819	length. However, if there are several alternatives, they do
3820	not all have to have the same fixed length. This is an extension
3821	compared with Perl 5.005, which requires all branches to match
3822	the same length of string. Thus
3823
3824	@example
3825	(?<=dogs|cats|)
3826	@end example
3827
3828	@noindent
3829	is permitted, but the apparently equivalent regular expression
3830
3831	@example
3832	(?<!dogs?|cats?)
3833	@end example
3834
3835	@noindent
3836	causes an error at compile time. Branches that match different
3837	length strings are permitted only at the top level of
3838	a lookbehind assertion: an assertion such as
3839
3840	@example
3841	(?<=ab(c|de))
3842	@end example
3843
3844	@noindent
3845	is not permitted, because its single top-level branch can
3846	match two different lengths, but it is acceptable if rewritten
3847	to use two top-level branches:
3848
3849	@example
3850	(?<=abc|abde)
3851	@end example
3852
3853	All this is required because lookbehind assertions simply
3854	move the current position back by the alternative's fixed
3855	width and then try to match. If there are
3856	insufficient characters before the current position, the
3857	match is deemed to fail. Lookbehinds, in conjunction with
3858	non-backtracking subpatterns can be particularly useful for
3859	matching at the ends of strings; an example is given at the end
3860	of the section on non-backtracking subpatterns.
3861
3862	Several assertions (of any sort) may occur in succession.
3863	For example,
3864
3865	@example
3866	(?<=\d@{3@})(?<!999)foo
3867	@end example
3868
3869	@noindent
3870	matches @samp{foo} preceded by three digits that are not @samp{999}.
3871	Notice that each of the assertions is applied independently
3872	at the same point in the subject string. First there is a
3873	check that the previous three characters are all digits, and
3874	then there is a check that the same three characters are not
3875	@samp{999}. This pattern does not match @samp{foo} preceded by six
3876	characters, the first three of which are digits and the last three
3877	of which are not @samp{999}. For example, it doesn't match
3878	@samp{123abcfoo}. A pattern to do that is
3879
3880	@example
3881	(?<=\d@{3@}...)(?<!999)foo
3882	@end example
3883
3884	@noindent
3885	This time the first assertion looks at the preceding six
3886	characters, checking that the first three are digits, and
3887	then the second assertion checks that the preceding three
3888	characters are not @samp{999}. Actually, assertions can be
3889	nested in any combination, so one can write this as
3890
3891	@example
3892	(?<=\d@{3@}(?!999)...)foo
3893	@end example
3894
3895	or
3896
3897	@example
3898	(?<=\d@{3@}...(?<!999))foo
3899	@end example
3900
3901	@noindent
3902	both of which might be considered more readable.
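
The first two patterns of this discussion can be compared on a few invented
subjects. The sketch assumes Python's @code{re} module, whose lookbehind
assertions are at least as restrictive as the ones described here; the
helper name @code{finds} is hypothetical.

```python
import re

def finds(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# Both assertions inspect the same three characters before foo.
three_digits_not_999 = r"(?<=\d{3})(?<!999)foo"
adjacent = [finds(three_digits_not_999, s)
            for s in ("123foo", "999foo", "123abcfoo")]

# Here the first assertion looks six characters back, the second three.
digits_then_any_three = r"(?<=\d{3}...)(?<!999)foo"
spaced = [finds(digits_then_any_three, s)
          for s in ("123abcfoo", "123999foo", "abcdeffoo")]

print(adjacent)  # [True, False, False]
print(spaced)    # [True, False, False]
```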
3903
3904	Assertion subpatterns are not capturing subpatterns, and may
3905	not be repeated, because it makes no sense to assert the
3906	same thing several times. If any kind of assertion contains
3907	capturing subpatterns within it, these are counted for the
3908	purposes of numbering the capturing subpatterns in the whole
3909	pattern. However, substring capturing is carried out only
3910	for positive assertions, because it does not make sense for
3911	negative assertions.
3912
3913	Assertions count towards the maximum of 200 parenthesized
3914	subpatterns.
3915
3916	@node Non-backtracking subpatterns
3917	@appendixsec Non-backtracking subpatterns
3918	@cindex Perl-style regular expressions, non-backtracking subpatterns
3919
3920	With both maximizing and minimizing repetition, failure of
3921	what follows normally causes the repeated item to be evaluated
3922	again to see if a different number of repeats allows the
3923	rest of the pattern to match. Sometimes it is useful to
3924	prevent this, either to change the nature of the match, or
3925	to cause it to fail earlier than it otherwise might, when the
3926	author of the pattern knows there is no point in carrying
3927	on.
3928
3929	Consider, for example, the pattern @code{\d+foo} when applied to
3930	the subject line
3931
3932	@example
3933	123456bar
3934	@end example
3935
3936	After matching all 6 digits and then failing to match @samp{foo},
3937	the normal action of the matcher is to try again with only 5
3938	digits matching the @code{\d+} item, and then with 4, and so on,
3939	before ultimately failing. Non-backtracking subpatterns
3940	provide the means for specifying that once a portion of the
3941	pattern has matched, it is not to be re-evaluated in this way,
3942	so the matcher would give up immediately on failing to match
3943	@samp{foo} the first time. The notation is another kind of special
3944	parenthesis, starting with @code{(?>} as in this example:
3945
3946	@example
3947	(?>\d+)foo
3948	@end example
3949
3950	This kind of parenthesis ``locks up'' the part of the pattern
3951	it contains once it has matched, and a failure further into
3952	the pattern is prevented from backtracking into it.
3953	Backtracking past it to previous items, however, works as
3954	normal.
3955
3956	Non-backtracking subpatterns are not capturing subpatterns. Simple
3957	cases such as the above example can be thought of as a maximizing
3958	repeat that must swallow everything it can. So,
3959	while both @code{\d+} and @code{\d+?} are prepared to adjust the number of
3960	digits they match in order to make the rest of the pattern
3961	match, @code{(?>\d+)} can only match an entire sequence of digits.
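
The locking effect can be seen in a sketch that assumes Python's @code{re}
module. Python accepts @code{(?>...)} only from version 3.11 onwards, so on
older versions the sketch substitutes the well-known idiom of a lookahead
followed by a back reference, which gives the same result for this case.

```python
import re
import sys

# Native atomic group where available, otherwise the lookahead idiom.
if sys.version_info >= (3, 11):
    ATOMIC_DIGITS = r"(?>\d+)"
else:
    ATOMIC_DIGITS = r"(?=(\d+))\1"

# Ordinary \d+ gives back one digit so that the trailing \d can match.
backtracking = re.search(r"\d+\d", "123456")

# The atomic form keeps every digit, so the trailing \d can never match.
locked = re.search(ATOMIC_DIGITS + r"\d", "123456")

# When what follows does match, the atomic form succeeds as usual.
followed = re.search(ATOMIC_DIGITS + r"bar", "123456bar")

print(backtracking.group())  # 123456
print(locked)                # None
print(followed.group())      # 123456bar
```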
3962
3963	This construction can of course contain arbitrarily complicated
3964	subpatterns, and it can be nested.
3965
3966	@cindex Perl-style regular expressions, lookbehind subpatterns
3967	Non-backtracking subpatterns can be used in conjunction with look-behind
3968	assertions to specify efficient matching at the end
3969	of the subject string. Consider a simple pattern such as
3970
3971	@example
3972	abcd$
3973	@end example
3974
3975	@noindent
3976	when applied to a long string which does not match. Because
3977	matching proceeds from left to right, @command{sed} will look for
3978	each @samp{a} in the subject and then see if what follows matches
3979	the rest of the pattern. If the pattern is specified as
3980
3981	@example
3982	^.*abcd$
3983	@end example
3984
3985	@noindent
3986	the initial @code{.*} matches the entire string at first, but when
3987	this fails (because there is no following @samp{a}), it backtracks
3988	to match all but the last character, then all but the
3989	last two characters, and so on. Once again the search for
3990	@samp{a} covers the entire string, from right to left, so we are
3991	no better off. However, if the pattern is written as
3992
3993	@example
3994	^(?>.*)(?<=abcd)
3995	@end example
3996
3997	there can be no backtracking for the @code{.*} item; it can match
3998	only the entire string. The subsequent lookbehind assertion
3999	does a single test on the last four characters. If it fails,
4000	the match fails immediately. For long strings, this approach
4001	makes a significant difference to the processing time.
4002
4003	When a pattern contains an unlimited repeat inside a subpattern
4004	that can itself be repeated an unlimited number of
4005	times, the use of a non-backtracking subpattern is the only way to
4006	avoid some failing matches taking a very long time
4007	indeed.@footnote{Actually, the matcher embedded in @value{SSED}
4008	tries to do something for this in the simplest cases,
4009	like @code{([^b]*b)*}. These cases are actually quite
4010	common: they happen for example in a regular expression
4011	like @code{\/\*([^*]*\*)*\/} which matches C comments.}
4012
4013	The pattern
4014
4015	@example
4016	(\D+|<\d+>)*[!?]
4017	@end example
4018
4019	([^0-9<]+<(\d+>)?)*[!?]
4020
4021	@noindent
4022	matches an unlimited number of substrings that either consist
4023	of non-digits, or digits enclosed in angle brackets, followed by
4024	an exclamation or question mark. When it matches, it runs quickly.
4025	However, if it is applied to
4026
4027	@example
4028	aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4029	@end example
4030
4031	@noindent
4032	it takes a long time before reporting failure. This is
4033	because the string can be divided between the two repeats in
4034	a large number of ways, and all have to be tried.@footnote{The
4035	example used @code{[!?]} rather than a single character at the end,
4036	because both @value{SSED} and Perl have an optimization that allows
4037	for fast failure when a single character is used. They
4038	remember the last single character that is required for a
4039	match, and fail early if it is not present in the string.}
4040
4041	If the pattern is changed to
4042
4043	@example
4044	((?>\D+)|<\d+>)*[!?]
4045	@end example
4046
4047	sequences of non-digits cannot be broken, and failure happens
4048	quickly.
4049
4050	@node Conditional subpatterns
4051	@appendixsec Conditional subpatterns
4052	@cindex Perl-style regular expressions, conditional subpatterns
4053
4054	It is possible to cause the matching process to obey a subpattern
4055	conditionally or to choose between two alternative
4056	subpatterns, depending on the result of an assertion, or
4057	whether a previous capturing subpattern matched or not. The
4058	two possible forms of conditional subpattern are
4059
4060	@example
4061	(?(@var{condition})@var{yes-pattern})
4062	(?(@var{condition})@var{yes-pattern}|@var{no-pattern})
4063	@end example
4064
4065	If the condition is satisfied, the yes-pattern is used; otherwise
4066	the no-pattern (if present) is used. If there are more than two
4067	alternatives in the subpattern, a compile-time error occurs.
4068
4069	There are two kinds of condition. If the text between the
4070	parentheses consists of a sequence of digits, the condition
4071	is satisfied if the capturing subpattern of that number has
4072	previously matched. The number must be greater than zero.
4073	Consider the following pattern, which contains non-significant
4074	white space to make it more readable (assume the @code{X} modifier)
4075	and to divide it into three parts for ease of discussion:
4076
4077	@example
4078	( \( )?    [^()]+    (?(1) \) )
4079	@end example
4080
4081	The first part matches an optional opening parenthesis, and
4082	if that character is present, sets it as the first captured
4083	substring. The second part matches one or more characters
4084	that are not parentheses. The third part is a conditional
4085	subpattern that tests whether the first set of parentheses
4086	matched or not. If they did, that is, if the subject started
4087	with an opening parenthesis, the condition is true, and so
4088	the yes-pattern is executed and a closing parenthesis is
4089	required. Otherwise, since no-pattern is not present, the
4090	subpattern matches nothing. In other words, this pattern
4091	matches a sequence of non-parentheses, optionally enclosed
4092	in parentheses.
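
A condition that tests a group number can be tried with the following
sketch. It assumes Python's @code{re} module, which supports this form of
conditional subpattern, and uses @code{re.VERBOSE} in place of the
@code{X} modifier; the subject strings are invented.

```python
import re

# re.VERBOSE plays the role of the X modifier: white space is ignored.
optionally_wrapped = re.compile(r"( \( )?  [^()]+  (?(1) \) )", re.VERBOSE)

# A closing parenthesis is required exactly when an opening one matched.
results = {s: optionally_wrapped.fullmatch(s) is not None
           for s in ("abc", "(abc)", "(abc", "abc)")}

print(results)
# {'abc': True, '(abc)': True, '(abc': False, 'abc)': False}
```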
4093
4094	@cindex Perl-style regular expressions, lookahead subpatterns
4095	If the condition is not a sequence of digits, it must be an
4096	assertion. This may be a positive or negative lookahead or
4097	lookbehind assertion. Consider this pattern, again containing
4098	non-significant white space, and with the two alternatives
4099	on the second line:
4100
4101	@example
4102	(?(?=...[a-z])
4103	\d\d-[a-z]@{3@}-\d\d |
4104	\d\d-\d\d-\d\d )
4105	@end example
4106
4107	The condition is a positive lookahead assertion that matches
4108	a letter that is three characters away from the current point.
4109	If a letter is found, the subject is matched against the first
4110	alternative @samp{@var{dd}-@var{aaa}-@var{dd}} (where @var{aaa} are
4111	letters and @var{dd} are digits); otherwise it is matched against
4112	the second alternative, @samp{@var{dd}-@var{dd}-@var{dd}}.
4113
4114
4115	@node Recursive patterns
4116	@appendixsec Recursive patterns
4117	@cindex Perl-style regular expressions, recursive patterns
4118	@cindex Perl-style regular expressions, recursion
4119
4120	Consider the problem of matching a string in parentheses,
4121	allowing for unlimited nested parentheses. Without the use
4122	of recursion, the best that can be done is to use a pattern
4123	that matches up to some fixed depth of nesting. It is not
4124	possible to handle an arbitrary nesting depth. Perl 5.6 has
4125	provided an experimental facility that allows regular
4126	expressions to recurse (amongst other things). It does this
4127	by interpolating Perl code in the expression at run time,
4128	and the code can refer to the expression itself. A Perl pattern
4129	to solve the parentheses problem can be created like
4130	this:
4131
4132	@example
4133	$re = qr@{\( (?: (?>[^()]+) | (?p@{$re@}) )* \)@}x;
4134	@end example
4135
4136	The @code{(?p@{...@})} item interpolates Perl code at run time,
4137	and in this case refers recursively to the pattern in which it
4138	appears. Obviously, @command{sed} cannot support the interpolation of
4139	Perl code. Instead, the special item @code{(?R)} is provided for
4140	the specific case of recursion. This pattern solves the
4141	parentheses problem (assume the @code{X} modifier option is used
4142	so that white space is ignored):
4143
4144	@example
4145	\( ( (?>[^()]+) | (?R) )* \)
4146	@end example
4147
4148	First it matches an opening parenthesis. Then it matches any
4149	number of substrings which can either be a sequence of
4150	non-parentheses, or a recursive match of the pattern itself
4151	(i.e. a correctly parenthesized substring). Finally there is
4152	a closing parenthesis.
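
The three steps just described can be mirrored in ordinary code. Python's
standard @code{re} module has no @code{(?R)} item, so the sketch below is a
hand-written recursive function rather than a regular expression; the name
@code{match_parenthesized} is hypothetical.

```python
def match_parenthesized(text, pos=0):
    """Return the index just past a balanced group at pos, or None."""
    # First, an opening parenthesis is required.
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    # Then any number of substrings, each one of two kinds.
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            # A recursive match of the whole construct, as with (?R).
            pos = match_parenthesized(text, pos)
            if pos is None:
                return None
        else:
            # A run of non-parentheses, consumed in one go.
            while pos < len(text) and text[pos] not in "()":
                pos += 1
    # Finally, a closing parenthesis is required.
    if pos < len(text):
        return pos + 1
    return None

balanced = match_parenthesized("(ab(cd)ef)")
unbalanced = match_parenthesized("(aaaaaaaa()")

print(balanced)    # 10
print(unbalanced)  # None
```

Because each run of non-parentheses is consumed once and never revisited,
the function fails on the unbalanced subject without retrying alternative
splits, which is the effect the non-backtracking subpattern has in the
pattern.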
4153
4154	This particular example pattern contains nested unlimited
4155	repeats, and so the use of a non-backtracking subpattern for
4156	matching strings of non-parentheses is important when applying
4157	the pattern to strings that do not match. For example, when
4158	it is applied to
4159
4160	@example
4161	(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
4162	@end example
4163
4164	it yields a ``no match'' response quickly. However, if a
4165	non-backtracking subpattern is not used, the match runs
4166	for a very long time indeed because there are so many different
4167	ways the @code{+} and @code{*} repeats can carve up the subject,
4168	and all have to be tested before failure can be reported.
4169
4170	The values set for any capturing subpatterns are those from
4171	the outermost level of the recursion at which the subpattern
4172	value is set. If the pattern above is matched against
4173
4174	@example
4175	(ab(cd)ef)
4176	@end example
4177
4178	@noindent
4179	the value for the capturing parentheses is @samp{ef}, which is
4180	the last value taken on at the top level.
4181
4182	@node Comments
4183	@appendixsec Comments
4184	@cindex Perl-style regular expressions, comments
4185
4186	The sequence @code{(?#} marks the start of a comment which continues
4187	up to the next closing parenthesis. Nested parentheses
4188	are not permitted. The characters that make up a comment
4189	play no part in the pattern matching at all.
4190
4191	@cindex Perl-style regular expressions, extended
4192	If the @code{X} modifier option is used, an unescaped @code{#} character
4193	outside a character class introduces a comment that continues
4194	up to the next newline character in the pattern.
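
Both kinds of comment can be seen in a sketch that assumes Python's
@code{re} module, where @code{re.VERBOSE} corresponds to the @code{X}
modifier. The pattern and subject strings are invented for the example.

```python
import re

# (?#...) is skipped entirely by the matcher.
inline = re.compile(r"\d{3}(?#exchange)-\d{4}(?#line number)")

# With re.VERBOSE, an unescaped # outside a character class starts a
# comment that runs to the end of the line.
verbose = re.compile(r"""
    \d{3}   # exchange
    -
    \d{4}   # line number
    [#]     # inside a class, # is an ordinary character
""", re.VERBOSE)

inline_ok = inline.fullmatch("555-1234") is not None
verbose_ok = verbose.fullmatch("555-1234#") is not None

print(inline_ok, verbose_ok)  # True True
```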
4195	@end ifset
4196
4197
4198	@page
4199	@node Concept Index
4200	@unnumbered Concept Index
4201
4202	This is a general index of all issues discussed in this manual, with the
4203	exception of the @command{sed} commands and command-line options.
4204
4205	@printindex cp
4206
4207	@page
4208	@node Command and Option Index
4209	@unnumbered Command and Option Index
4210
4211	This is an alphabetical list of all @command{sed} commands and command-line
4212	options.
4213
4214	@printindex fn
4215
4216	@contents
4217	@bye
4218
4219	@c XXX FIXME: the term "cycle" is never defined...
