sed.texi

Last change on this file was 3613, checked in by bird, 14 months ago
src/sed: Merged in changes between 4.1.5 and 4.9 from the vendor branch. (svn merge ^{/vendor/sed/4.1.5}/vendor/sed/current .)
File size: 161.5 KB

1	\input texinfo @c --texinfo--
2	@c
3	@c -- Stuff that needs adding: ----------------------------------------------
4	@c (nothing!)
5	@c --------------------------------------------------------------------------
6	@c Check for consistency: regexps in @code, text that they match in @samp.
7	@c
8	@c Tips:
9	@c @command for command
10	@c @samp for command fragments: @samp{cat -s}
11	@c @code for sed commands and flags
12	@c Use ``quote'' not `quote' or "quote".
13	@c
14	@c %**start of header
15	@setfilename sed.info
16	@settitle sed, a stream editor
17	@c %**end of header
18
19	@c @smallbook
20
21	@include version.texi
22
23	@c Combine indices.
24	@syncodeindex ky cp
25	@syncodeindex pg cp
26	@syncodeindex tp cp
27
28	@defcodeindex op
29	@syncodeindex op fn
30
31	@include config.texi
32
33	@copying
34	This file documents version @value{VERSION} of
35	@value{SSED}, a stream editor.
36
37	Copyright @copyright{} 1998--2022 Free Software Foundation, Inc.
38
39	@quotation
40	Permission is granted to copy, distribute and/or modify this document
41	under the terms of the GNU Free Documentation License, Version 1.3
42	or any later version published by the Free Software Foundation;
43	with no Invariant Sections, no Front-Cover Texts, and no
44	Back-Cover Texts. A copy of the license is included in the
45	section entitled ``GNU Free Documentation License''.
46	@end quotation
47	@end copying
48
49	@setchapternewpage off
50
51	@titlepage
52	@title @value{SSED}, a stream editor
53	@subtitle version @value{VERSION}, @value{UPDATED}
54	@author by Ken Pizzini, Paolo Bonzini, Jim Meyering, Assaf Gordon
55
56	@page
57	@vskip 0pt plus 1filll
58	@insertcopying
59	@end titlepage
60
61	@contents
62
63	@ifnottex
64	@node Top
65	@top @value{SSED}
66
67	@insertcopying
68	@end ifnottex
69
70	@menu
71	* Introduction:: Introduction
72	* Invoking sed:: Invocation
73	* sed scripts:: @command{sed} scripts
74	* sed addresses:: Addresses: selecting lines
75	* sed regular expressions:: Regular expressions: selecting text
76	* advanced sed:: Advanced @command{sed}: cycles and buffers
77	* Examples:: Some sample scripts
78	* Limitations:: Limitations and (non-)limitations of @value{SSED}
79	* Other Resources:: Other resources for learning about @command{sed}
80	* Reporting Bugs:: Reporting bugs
81	* GNU Free Documentation License:: Copying and sharing this manual
82	* Concept Index:: A menu with all the topics in this manual.
83	* Command and Option Index:: A menu with all @command{sed} commands and
84	command-line options.
85	@end menu
86
87
88	@node Introduction
89	@chapter Introduction
90
91	@cindex Stream editor
92	@command{sed} is a stream editor.
93	A stream editor is used to perform basic text
94	transformations on an input stream
95	(a file or input from a pipeline).
96	While in some ways similar to an editor which
97	permits scripted edits (such as @command{ed}),
98	@command{sed} works by making only one pass over the
99	input(s), and is consequently more efficient.
100	But it is @command{sed}'s ability to filter text in a pipeline
101	which particularly distinguishes it from other types of
102	editors.
103
104
105	@node Invoking sed
106	@chapter Running sed
107
108	This chapter covers how to run @command{sed}. Details of @command{sed}
109	scripts and individual @command{sed} commands are discussed in the
110	next chapter.
111
112	@menu
113	* Overview::
114	* Command-Line Options::
115	* Exit status::
116	@end menu
117
118
119	@node Overview
120	@section Overview
121	Normally @command{sed} is invoked like this:
122
123	@example
124	sed SCRIPT INPUTFILE...
125	@end example
126
127	For example, to change every @samp{hello} to @samp{world}
128	in the file @file{input.txt}:
129
130	@example
131	sed 's/hello/world/g' input.txt > output.txt
132	@end example
133
134	Without the @samp{g} (global) modifier, @command{sed} affects
135	only the first instance per line.
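
As a minimal sketch of that difference (the sample text is made up for
illustration), the following shell commands run the same substitution
with and without the @samp{g} flag:

```shell
# Illustrative input only; any line with a repeated word would do.
line='hello hello'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/hello/world/')

# With g: every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/hello/world/g')

printf '%s\n' "$first_only"   # world hello
printf '%s\n' "$all"          # world world
```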
136
137	@cindex stdin
138	@cindex standard input
139	If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
140	@command{sed} filters the contents of the standard input. The following
141	commands are equivalent:
142
143	@example
144	sed 's/hello/world/g' input.txt > output.txt
145	sed 's/hello/world/g' < input.txt > output.txt
146	cat input.txt | sed 's/hello/world/g' - > output.txt
147	@end example
148
149	@cindex stdout
150	@cindex output
151	@cindex standard output
152	@cindex -i, example
153	@command{sed} writes output to standard output. Use @option{-i} to edit
154	files in-place instead of printing to standard output.
155	See also the @code{W} and @code{s///w} commands for writing output to
156	other files. The following command modifies @file{file.txt} and
157	does not produce any output:
158
159	@example
160	sed -i 's/hello/world/' file.txt
161	@end example
162
163	@cindex -n, example
164	@cindex p, example
165	@cindex suppressing output
166	@cindex output, suppressing
167	By default @command{sed} prints all processed input (except input
168	that has been modified/deleted by commands such as @command{d}).
169	Use @option{-n} to suppress output, and the @code{p} command
170	to print specific lines. The following command prints only line 45
171	of the input file:
172
173	@example
174	sed -n '45p' file.txt
175	@end example
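
The interaction between @option{-n} and @code{p} can be tried on a few
generated lines. This is only a sketch; the five input lines are
invented for the demonstration:

```shell
# Five illustrative input lines: line1 ... line5.
input=$(printf 'line%s\n' 1 2 3 4 5)

# -n turns off automatic printing; 3p prints only the third line.
third=$(printf '%s\n' "$input" | sed -n '3p')

# Without -n, p prints the line in addition to the automatic print,
# so line 3 appears twice in the output.
doubled=$(printf '%s\n' "$input" | sed '3p')

printf '%s\n' "$third"
printf '%s\n' "$doubled"
```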
176
177
178
179	@cindex multiple files
180	@cindex -s, example
181	@command{sed} treats multiple input files as one long stream.
182	The following example prints the first line of the first file
183	(@file{one.txt}) and the last line of the last file (@file{three.txt}).
184	Use @option{-s} to reverse this behavior.
185
186	@example
187	sed -n '1p ; $p' one.txt two.txt three.txt
188	@end example
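
A small sketch of the difference, using two scratch files whose names
and contents are assumptions made for this illustration, and assuming
a @command{sed} that supports @option{-s}:

```shell
# Scratch files in a temporary directory (names are illustrative).
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/one.txt"
printf 'b1\nb2\n' > "$tmpdir/two.txt"

# Default: one long stream, so 1 and $ refer to the combined input.
joined=$(sed -n '1p ; $p' "$tmpdir/one.txt" "$tmpdir/two.txt")

# With -s, 1 and $ are evaluated separately for each file.
separate=$(sed -s -n '1p ; $p' "$tmpdir/one.txt" "$tmpdir/two.txt")

printf '%s\n' "$joined"     # a1, then b2
printf '%s\n' "$separate"   # a1, a2, b1, b2
rm -r "$tmpdir"
```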
189
190
191	@cindex -e, example
192	@cindex --expression, example
193	@cindex -f, example
194	@cindex --file, example
195	@cindex script parameter
196	@cindex parameters, script
197	Without @option{-e} or @option{-f} options, @command{sed} uses
198	the first non-option parameter as the @var{script}, and the following
199	non-option parameters as input files.
200	If @option{-e} or @option{-f} options are used to specify a @var{script},
201	all non-option parameters are taken as input files.
202	Options @option{-e} and @option{-f} can be combined, and can appear
203	multiple times (in which case the final effective @var{script} will be
204	the concatenation of all the individual @var{script}s).
205
206	The following examples are equivalent:
207
208	@example
209	sed 's/hello/world/' input.txt > output.txt
210
211	sed -e 's/hello/world/' input.txt > output.txt
212	sed --expression='s/hello/world/' input.txt > output.txt
213
214	echo 's/hello/world/' > myscript.sed
215	sed -f myscript.sed input.txt > output.txt
216	sed --file=myscript.sed input.txt > output.txt
217	@end example
218
219
220	@node Command-Line Options
221	@section Command-Line Options
222
223	The full format for invoking @command{sed} is:
224
225	@example
226	sed OPTIONS... [SCRIPT] [INPUTFILE...]
227	@end example
228
229	@command{sed} may be invoked with the following command-line options:
230
231	@table @code
232	@item --version
233	@opindex --version
234	@cindex Version, printing
235	Print out the version of @command{sed} that is being run and a copyright notice,
236	then exit.
237
238	@item --help
239	@opindex --help
240	@cindex Usage summary, printing
241	Print a usage message briefly summarizing these command-line options
242	and the bug-reporting address,
243	then exit.
244
245	@item -n
246	@itemx --quiet
247	@itemx --silent
248	@opindex -n
249	@opindex --quiet
250	@opindex --silent
251	@cindex Disabling autoprint, from command line
252	By default, @command{sed} prints out the pattern space
253	at the end of each cycle through the script (@pxref{Execution Cycle, ,
254	How @code{sed} works}).
255	These options disable this automatic printing,
256	and @command{sed} only produces output when explicitly told to
257	via the @code{p} command.
258
259	@item --debug
260	@opindex --debug
261	@cindex @value{SSEDEXT}, debug
262	Print the input sed program in canonical form,
263	and annotate program execution.
264	@codequotebacktick on
265	@codequoteundirected on
266	@example
267	$ echo 1 | sed '\%1%s21232'
268	3
269
270	$ echo 1 | sed --debug '\%1%s21232'
271	SED PROGRAM:
272	/1/ s/1/3/
273	INPUT: 'STDIN' line 1
274	PATTERN: 1
275	COMMAND: /1/ s/1/3/
276	PATTERN: 3
277	END-OF-CYCLE:
278	3
279	@end example
280	@codequotebacktick off
281	@codequoteundirected off
282
283
284	@item -e @var{script}
285	@itemx --expression=@var{script}
286	@opindex -e
287	@opindex --expression
288	@cindex Script, from command line
289	Add the commands in @var{script} to the set of commands to be
290	run while processing the input.
291
292	@item -f @var{script-file}
293	@itemx --file=@var{script-file}
294	@opindex -f
295	@opindex --file
296	@cindex Script, from a file
297	Add the commands contained in the file @var{script-file}
298	to the set of commands to be run while processing the input.
299
300	@item -i[@var{SUFFIX}]
301	@itemx --in-place[=@var{SUFFIX}]
302	@opindex -i
303	@opindex --in-place
304	@cindex In-place editing, activating
305	@cindex @value{SSEDEXT}, in-place editing
306	This option specifies that files are to be edited in-place.
307	@value{SSED} does this by creating a temporary file and
308	sending output to this file rather than to the standard
309	output.@footnote{This applies to commands such as @code{=},
310	@code{a}, @code{c}, @code{i}, @code{l}, @code{p}. You can
311	still write to the standard output by using the @code{w}
312	@cindex @value{SSEDEXT}, @file{/dev/stdout} file
313	or @code{W} commands together with the @file{/dev/stdout}
314	special file}.
315
316	This option implies @option{-s}.
317
318	When the end of the file is reached, the temporary file is
319	renamed to the output file's original name. The extension,
320	if supplied, is used to modify the name of the old file
321	before renaming the temporary file, thereby making a backup
322	copy@footnote{Note that @value{SSED} creates the backup
323	file whether or not any output is actually changed.}.
324
325	@cindex In-place editing, Perl-style backup file names
326	This rule is followed: if the extension doesn't contain a @code{*},
327	then it is appended to the end of the current filename as a
328	suffix; if the extension does contain one or more @code{*}
329	characters, then @emph{each} asterisk is replaced with the
330	current filename. This allows you to add a prefix to the
331	backup file, instead of (or in addition to) a suffix, or
332	even to place backup copies of the original files into another
333	directory (provided the directory already exists).
334
335	If no extension is supplied, the original file is
336	overwritten without making a backup.
337
338	Because @option{-i} takes an optional argument, it should
339	not be followed by other short options:
340	@table @code
341	@item sed -Ei '...' FILE
342	Same as @option{-E -i} with no backup suffix - @file{FILE} will be
343	edited in-place without creating a backup.
344
345	@item sed -iE '...' FILE
346	This is equivalent to @option{--in-place=E}, creating @file{FILEE} as backup
347	of @file{FILE}.
348	@end table
349
350	Be cautious of using @option{-n} with @option{-i}: the former disables
351	automatic printing of lines and the latter changes the file in-place
352	without a backup. Used carelessly (and without an explicit @code{p} command),
353	the output file will be empty:
354	@codequotebacktick on
355	@codequoteundirected on
356	@example
357	# WRONG USAGE: 'FILE' will be truncated.
358	sed -ni 's/foo/bar/' FILE
359	@end example
360	@codequotebacktick off
361	@codequoteundirected off
362
363	@item -l @var{N}
364	@itemx --line-length=@var{N}
365	@opindex -l
366	@opindex --line-length
367	@cindex Line length, setting
368	Specify the default line-wrap length for the @code{l} command.
369	A length of 0 (zero) means to never wrap long lines. If
370	not specified, it is taken to be 70.
371
372	@item --posix
373	@opindex --posix
374	@cindex @value{SSEDEXT}, disabling
375	@value{SSED} includes several extensions to POSIX
376	sed. In order to simplify writing portable scripts, this
377	option disables all the extensions that this manual documents,
378	including additional commands.
379	@cindex @code{POSIXLY_CORRECT} behavior, enabling
380	Most of the extensions accept @command{sed} programs that
381	are outside the syntax mandated by POSIX, but some
382	of them (such as the behavior of the @command{N} command
383	described in @ref{Reporting Bugs}) actually violate the
384	standard. If you want to disable only the latter kind of
385	extension, you can set the @code{POSIXLY_CORRECT} variable
386	to a non-empty value.
387
388	@item -b
389	@itemx --binary
390	@opindex -b
391	@opindex --binary
392	This option is available on every platform, but is only effective where the
393	operating system makes a distinction between text files and binary files.
394	When such a distinction is made---as is the case for MS-DOS, Windows,
395	Cygwin---text files are composed of lines separated by a carriage return
396	@emph{and} a line feed character, and @command{sed} does not see the
397	ending CR. When this option is specified, @command{sed} will open
398	input files in binary mode, thus not requesting this special processing
399	and considering lines to end at a line feed.
400
401	@item --follow-symlinks
402	@opindex --follow-symlinks
403	This option is available only on platforms that support
404	symbolic links and has an effect only if option @option{-i}
405	is specified. In this case, if the file that is specified
406	on the command line is a symbolic link, @command{sed} will
407	follow the link and edit the ultimate destination of the
408	link. The default behavior is to break the symbolic link,
409	so that the link destination will not be modified.
410
411	@item -E
412	@itemx -r
413	@itemx --regexp-extended
414	@opindex -E
415	@opindex -r
416	@opindex --regexp-extended
417	@cindex Extended regular expressions, choosing
418	@cindex GNU extensions, extended regular expressions
419	Use extended regular expressions rather than basic
420	regular expressions. Extended regexps are those that
421	@command{egrep} accepts; they can be clearer because they
422	usually have fewer backslashes.
423	Historically this was a GNU extension,
424	but the @option{-E}
425	extension has since been added to the POSIX standard
426	(http://austingroupbugs.net/view.php?id=528),
427	so use @option{-E} for portability.
428	GNU sed has accepted @option{-E} as an undocumented option for years,
429	and *BSD seds have accepted @option{-E} for years as well,
430	but scripts that use @option{-E} might not port to other older systems.
431	@xref{ERE syntax, , Extended regular expressions}.
432
433
434	@item -s
435	@itemx --separate
436	@opindex -s
437	@opindex --separate
438	@cindex Working on separate files
439	By default, @command{sed} will consider the files specified on the
440	command line as a single continuous long stream. This @value{SSED}
441	extension allows the user to consider them as separate files:
442	range addresses (such as @samp{/abc/,/def/}) are not allowed
443	to span several files, line numbers are relative to the start
444	of each file, @code{$} refers to the last line of each file,
445	and files invoked from the @code{R} commands are rewound at the
446	start of each file.
447
448	@item --sandbox
449	@opindex --sandbox
450	@cindex Sandbox mode
451	In sandbox mode, @code{e/w/r} commands are rejected - programs containing
452	them will be aborted without being run. Sandbox mode ensures @command{sed}
453	operates only on the input files designated on the command line, and
454	cannot run external programs.
455
456
457	@item -u
458	@itemx --unbuffered
459	@opindex -u
460	@opindex --unbuffered
461	@cindex Unbuffered I/O, choosing
462	Buffer both input and output as minimally as practical.
463	(This is particularly useful if the input is coming from
464	the likes of @samp{tail -f}, and you wish to see the transformed
465	output as soon as possible.)
466
467	@item -z
468	@itemx --null-data
469	@itemx --zero-terminated
470	@opindex -z
471	@opindex --null-data
472	@opindex --zero-terminated
473	Treat the input as a set of lines, each terminated by a zero byte
474	(the ASCII @samp{NUL} character) instead of a newline. This option can
475	be used with commands like @samp{sort -z} and @samp{find -print0}
476	to process arbitrary file names.
477	@end table
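
As a hedged illustration of the backup behavior described under
@option{-i} above, the following sketch edits a scratch file in place
and keeps the original under a @samp{.bak} suffix. The file name and
contents are assumptions, and the attached-suffix spelling
@samp{-i.bak} is the GNU form; other implementations may parse
@option{-i} differently.

```shell
# Work in a scratch directory so no real file is touched.
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/file.txt"

# Edit in place, keeping the original as file.txt.bak.
sed -i.bak 's/hello/world/' "$tmpdir/file.txt"

edited=$(cat "$tmpdir/file.txt")
backup=$(cat "$tmpdir/file.txt.bak")
printf '%s\n' "$edited"   # world
printf '%s\n' "$backup"   # hello
rm -r "$tmpdir"
```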
478
479	If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file}
480	options are given on the command-line,
481	then the first non-option argument on the command line is
482	taken to be the @var{script} to be executed.
483
484	@cindex Files to be processed as input
485	If any command-line parameters remain after processing the above,
486	these parameters are interpreted as the names of input files to
487	be processed.
488	@cindex Standard input, processing as input
489	A file name of @samp{-} refers to the standard input stream.
490	The standard input will be processed if no file names are specified.
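
For instance, a named file and the standard input can be combined in
one run. The sketch below assumes a scratch file created just for the
demonstration:

```shell
tmpdir=$(mktemp -d)
printf 'from file\n' > "$tmpdir/a.txt"

# The file is read first, then standard input where - appears.
combined=$(printf 'from stdin\n' | sed 's/^/> /' "$tmpdir/a.txt" -)

printf '%s\n' "$combined"
rm -r "$tmpdir"
```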
491
492	@node Exit status
493	@section Exit status
494	@cindex exit status
495	An exit status of zero indicates success, and a nonzero value
496	indicates failure. @value{SSED} returns the following exit status
497	error values:
498
499	@table @asis
500	@item 0
501	Successful completion.
502
503	@item 1
504	Invalid command, invalid syntax, invalid regular expression or a
505	@value{SSED} extension command used with @option{--posix}.
506
507	@item 2
508	One or more of the input files specified on the command line could not be
509	opened (e.g. if a file is not found, or read permission is denied).
510	Processing continued with other files.
511
512	@item 4
513	An I/O error or a serious processing error occurred at run time;
514	@value{SSED} aborted immediately.
515	@end table
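
The first three values in the table above can be observed from a
shell. This is a sketch that assumes GNU @command{sed}; the missing
path is deliberately invented, and other implementations may report
different values.

```shell
# Status 0: the script ran and all input was read.
ok_status=0
printf 'x\n' | sed -n 'p' > /dev/null || ok_status=$?

# Status 1: k is not a sed command, so the script is rejected.
bad_script_status=0
sed 'k' < /dev/null > /dev/null 2>&1 || bad_script_status=$?

# Status 2: the named input file (a deliberately missing path)
# cannot be opened.
missing_file_status=0
sed -n 'p' /nonexistent/input.txt > /dev/null 2>&1 || missing_file_status=$?

echo "$ok_status $bad_script_status $missing_file_status"
```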
516
517	@cindex Q, example
518	@cindex exit status, example
519	Additionally, the commands @code{q} and @code{Q} can be used to terminate
520	@command{sed} with a custom exit code value (this is a @value{SSED} extension):
521
522	@example
523	$ echo | sed 'Q42' ; echo $?
524	42
525	@end example
526
527
528	@node sed scripts
529	@chapter @command{sed} scripts
530
531
532	@menu
533	* sed script overview:: @command{sed} script overview
534	* sed commands list:: @command{sed} commands summary
535	* The "s" Command:: @command{sed}'s Swiss Army Knife
536	* Common Commands:: Often used commands
537	* Other Commands:: Less frequently used commands
538	* Programming Commands:: Commands for @command{sed} gurus
539	* Extended Commands:: Commands specific to @value{SSED}
540	* Multiple commands syntax:: Extension for easier scripting
541	@end menu
542
543	@node sed script overview
544	@section @command{sed} script overview
545
546	@cindex @command{sed} script structure
547	@cindex Script structure
548
549	A @command{sed} program consists of one or more @command{sed} commands,
550	passed in by one or more of the
551	@option{-e}, @option{-f}, @option{--expression}, and @option{--file}
552	options, or the first non-option argument if none of these
553	options are used.
554	This document will refer to ``the'' @command{sed} script;
555	this is understood to mean the in-order concatenation
556	of all of the @var{script}s and @var{script-file}s passed in.
557	@xref{Overview}.
558
559
560	@cindex @command{sed} commands syntax
561	@cindex syntax, @command{sed} commands
562	@cindex addresses, syntax
563	@cindex syntax, addresses
564	@command{sed} commands follow this syntax:
565
566	@example
567	[addr]@var{X}[options]
568	@end example
569
570	@var{X} is a single-letter @command{sed} command.
571	@c TODO: add @pxref{commands} when there is a command-list section.
572	@code{[addr]} is an optional line address. If @code{[addr]} is specified,
573	the command @var{X} will be executed only on the matched lines.
574	@code{[addr]} can be a single line number, a regular expression,
575	or a range of lines (@pxref{sed addresses}).
576	Additional @code{[options]} are used for some @command{sed} commands.
577
578	@cindex @command{d}, example
579	@cindex address range, example
580	@cindex example, address range
581	The following example deletes lines 30 to 35 in the input.
582	@code{30,35} is an address range. @command{d} is the delete command:
583
584	@example
585	sed '30,35d' input.txt > output.txt
586	@end example
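
The same idea on a smaller scale, as a sketch with invented input: a
numeric range passed to @command{d}, and a regular-expression range
passed to @command{p}.

```shell
# Delete lines 2 through 4 of six numbered lines.
kept=$(printf '%s\n' 1 2 3 4 5 6 | sed '2,4d')

# Print only the lines from the one matching BEGIN
# through the one matching END.
between=$(printf '%s\n' a BEGIN b END c | sed -n '/BEGIN/,/END/p')

printf '%s\n' "$kept"      # 1, 5 and 6, one per line
printf '%s\n' "$between"   # BEGIN, b and END, one per line
```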
587
588	@cindex @command{q}, example
589	@cindex regular expression, example
590	@cindex example, regular expression
591	The following example prints all input until a line
592	starting with the string @samp{foo} is found. If such a line is found,
593	@command{sed} will terminate with exit status 42.
594	If no such line is found (and no other error occurs), @command{sed}
595	will exit with status 0.
596	@code{/^foo/} is a regular-expression address.
597	@command{q} is the quit command. @code{42} is the command option.
598
599	@example
600	sed '/^foo/q42' input.txt > output.txt
601	@end example
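
A sketch of how a calling script might observe both the output and the
exit status. The input text and variable names are illustrative, and
the exit-code argument to @command{q} is a GNU extension.

```shell
# q prints the current line before quitting, so "foo two" is included.
status=0
output=$(printf 'one\nfoo two\nthree\n' | sed '/^foo/q42') || status=$?

# No line starts with foo here, so sed reads everything and exits 0.
plain_status=0
printf 'one\ntwo\n' | sed '/^foo/q42' > /dev/null || plain_status=$?

printf '%s\n' "$output"
echo "$status $plain_status"   # 42 0
```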
602
603
604	@cindex multiple @command{sed} commands
605	@cindex @command{sed} commands, multiple
606	@cindex newline, command separator
607	@cindex semicolons, command separator
608	@cindex ;, command separator
609	@cindex -e, example
610	@cindex -f, example
611	Commands within a @var{script} or @var{script-file} can be
612	separated by semicolons (@code{;}) or newlines (ASCII 10).
613	Multiple scripts can be specified with @option{-e} or @option{-f}
614	options.
615
616	The following examples are all equivalent. They perform two @command{sed}
617	operations: deleting any lines matching the regular expression @code{/^foo/},
618	and replacing all occurrences of the string @samp{hello} with @samp{world}:
619
620	@example
621	sed '/^foo/d ; s/hello/world/g' input.txt > output.txt
622
623	sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt
624
625	echo '/^foo/d' > script.sed
626	echo 's/hello/world/g' >> script.sed
627	sed -f script.sed input.txt > output.txt
628
629	echo 's/hello/world/g' > script2.sed
630	sed -e '/^foo/d' -f script2.sed input.txt > output.txt
631	@end example
632
633
634	@cindex @command{a}, and semicolons
635	@cindex @command{c}, and semicolons
636	@cindex @command{i}, and semicolons
637	Commands @command{a}, @command{c}, @command{i}, due to their syntax,
638	cannot be followed by semicolons working as command separators and
639	thus should be terminated
640	with newlines or be placed at the end of a @var{script} or @var{script-file}.
641	Commands can also be preceded with optional non-significant
642	whitespace characters.
643	@xref{Multiple commands syntax}.
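
The effect is easy to miss, so here is a hedged sketch. It relies on
the one-line @samp{a @var{text}} form, which is a GNU extension, and
the input is invented.

```shell
# The semicolon and everything after it become part of the appended text.
swallowed=$(printf 'x\n' | sed 'a hello ; s/x/y/')

# Ending the a command with its own -e lets the substitution
# run as a separate command.
separated=$(printf 'x\n' | sed -e 'a hello' -e 's/x/y/')

printf '%s\n' "$swallowed"   # x, then: hello ; s/x/y/
printf '%s\n' "$separated"   # y, then: hello
```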
644
645
646
647	@node sed commands list
648	@section @command{sed} commands summary
649
650	The following commands are supported in @value{SSED}.
651	Some are standard POSIX commands, while others are @value{SSEDEXT}.
652	Details and examples for each command are in the following sections.
653	(Mnemonics) are shown in parentheses.
654
655	@table @code
656
657	@item a\
658	@itemx @var{text}
659	Append @var{text} after a line.
660
661	@item a @var{text}
662	Append @var{text} after a line (alternative syntax).
663
664	@item b @var{label}
665	Branch unconditionally to @var{label}.
666	The @var{label} may be omitted, in which case the next cycle is started.
667
668	@item c\
669	@itemx @var{text}
670	Replace (change) lines with @var{text}.
671
672	@item c @var{text}
673	Replace (change) lines with @var{text} (alternative syntax).
674
675	@item d
676	Delete the pattern space;
677	immediately start next cycle.
678
679	@item D
680	If pattern space contains newlines, delete text in the pattern
681	space up to the first newline, and restart cycle with the resultant
682	pattern space, without reading a new line of input.
683
684	If pattern space contains no newline, start a normal new cycle as if
685	the @code{d} command was issued.
686	@c TODO: add a section about D+N and D+n commands
687
688	@item e
689	Executes the command that is found in pattern space and
690	replaces the pattern space with the output; a trailing newline
691	is suppressed.
692
693	@item e @var{command}
694	Executes @var{command} and sends its output to the output stream.
695	The command can run across multiple lines, all but the last ending with
696	a back-slash.
697
698	@item F
699	(filename) Print the file name of the current input file (with a trailing
700	newline).
701
702	@item g
703	Replace the contents of the pattern space with the contents of the hold space.
704
705	@item G
706	Append a newline to the contents of the pattern space,
707	and then append the contents of the hold space to that of the pattern space.
708
709	@item h
710	(hold) Replace the contents of the hold space with the contents of the
711	pattern space.
712
713	@item H
714	Append a newline to the contents of the hold space,
715	and then append the contents of the pattern space to that of the hold space.
716
717	@item i\
718	@itemx @var{text}
719	Insert @var{text} before a line.
720
721	@item i @var{text}
722	Insert @var{text} before a line (alternative syntax).
723
724	@item l
725	Print the pattern space in an unambiguous form.
726
727	@item n
728	(next) If auto-print is not disabled, print the pattern space,
729	then, regardless, replace the pattern space with the next line of input.
730	If there is no more input then @command{sed} exits without processing
731	any more commands.
732
733	@item N
734	Add a newline to the pattern space,
735	then append the next line of input to the pattern space.
736	If there is no more input then @command{sed} exits without processing
737	any more commands.
738
739	@item p
740	Print the pattern space.
741	@c useful with @option{-n}
742
743	@item P
744	Print the pattern space, up to the first <newline>.
745
746	@item q@var{[exit-code]}
747	(quit) Exit @command{sed} without processing any more commands or input.
748
749	@item Q@var{[exit-code]}
750	(quit) This command is the same as @code{q}, but will not print the
751	contents of pattern space. Like @code{q}, it provides the
752	ability to return an exit code to the caller.
753	@c useful to quit on a conditional without printing
754
755	@item r filename
756	Reads file @var{filename}.
757
758	@item R filename
759	Queue a line of @var{filename} to be read and
760	inserted into the output stream at the end of the current cycle,
761	or when the next input line is read.
762	@c useful to interleave files
763
764	@item s@var{/regexp/replacement/[flags]}
765	(substitute) Match the regular-expression against the content of the
766	pattern space. If found, replace matched string with
767	@var{replacement}.
768
769	@item t @var{label}
770	(test) Branch to @var{label} only if there has been a successful
771	@code{s}ubstitution since the last input line was read or conditional
772	branch was taken. The @var{label} may be omitted, in which case the
773	next cycle is started.
774
775	@item T @var{label}
776	(test) Branch to @var{label} only if there have been no successful
777	@code{s}ubstitutions since the last input line was read or
778	conditional branch was taken. The @var{label} may be omitted,
779	in which case the next cycle is started.
780
781	@item v @var{[version]}
782	(version) This command does nothing, but makes @command{sed} fail if
783	@value{SSED} extensions are not supported, or if the requested version
784	is not available.
785
786	@item w filename
787	Write the pattern space to @var{filename}.
788
789	@item W filename
790	Write to the given filename the portion of the pattern space up to
791	the first newline.
792
793	@item x
794	Exchange the contents of the hold and pattern spaces.
795
796
797	@item y/@var{source-chars}/@var{dest-chars}/
798	Transliterate any characters in the pattern space which match
799	any of the @var{source-chars} with the corresponding character
800	in @var{dest-chars}.
801
802
803	@item z
804	(zap) This command empties the content of pattern space.
805
806	@item #
807	A comment, until the next newline.
808
809
810	@item @{ @var{cmd ; cmd ...} @}
811	Group several commands together.
812	@c useful for multiple commands on same address
813
814	@item =
815	Print the current input line number (with a trailing newline).
816
817	@item : @var{label}
818	Specify the location of @var{label} for branch commands (@code{b},
819	@code{t}, @code{T}).
820
821	@end table
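
To give a feel for how several of these commands combine, the sketch
below uses @code{G}, @code{h} and @code{p} to print the input in
reverse line order, and @code{=} to count lines. The three-line input
is made up for the illustration.

```shell
# 1!G  on every line but the first, append the hold space
#      to the pattern space
# h    copy the pattern space (lines seen so far, newest first)
#      to the hold space
# $p   on the last line, print the accumulated result
reversed=$(printf '%s\n' a b c | sed -n '1!G;h;$p')

# $= prints the line number of the last line, i.e. the line count.
count=$(printf '%s\n' a b c | sed -n '$=')

printf '%s\n' "$reversed"   # c, b, a, one per line
printf '%s\n' "$count"      # 3
```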
822
823
824	@node The "s" Command
825	@section The @code{s} Command
826
827	The @code{s} command (as in substitute) is probably the most important
828	in @command{sed} and has a lot of different options. The syntax of
829	the @code{s} command is
830	@samp{s/@var{regexp}/@var{replacement}/@var{flags}}.
831
832	Its basic concept is simple: the @code{s} command attempts to match
833	the pattern space against the supplied regular expression @var{regexp};
834	if the match is successful, then that portion of the
835	pattern space which was matched is replaced with @var{replacement}.
836
837	For details about @var{regexp} syntax, @pxref{Regexp Addresses,,Regular
838	Expression Addresses}.
839
840	@cindex Backreferences, in regular expressions
841	@cindex Parenthesized substrings
842	The @var{replacement} can contain @code{\@var{n}} (@var{n} being
843	a number from 1 to 9, inclusive) references, which refer to
844	the portion of the match which is contained between the @var{n}th
845	@code{\(} and its matching @code{\)}.
846	Also, the @var{replacement} can contain unescaped @code{&}
847	characters which reference the whole matched portion
848	of the pattern space.
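
A brief sketch of both kinds of reference, using basic regular
expression syntax and sample strings chosen only for illustration:

```shell
# \1 and \2 refer to the first and second parenthesized groups.
swapped=$(printf 'foo bar\n' | sed 's/\(foo\) \(bar\)/\2 \1/')

# An unescaped & stands for the whole matched text.
wrapped=$(printf 'id 42\n' | sed 's/[0-9][0-9]*/<&>/')

# An escaped \& is a literal ampersand.
literal=$(printf 'a b\n' | sed 's/ / \& /')

printf '%s\n' "$swapped"   # bar foo
printf '%s\n' "$wrapped"   # id <42>
printf '%s\n' "$literal"   # a & b
```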
849
850	@c TODO: xref to backreference section mention @var{\'}.
851
852	The @code{/}
853	characters may be uniformly replaced by any other single
854	character within any given @code{s} command. The @code{/}
855	character (or whatever other character is used in its stead)
856	can appear in the @var{regexp} or @var{replacement}
857	only if it is preceded by a @code{\} character.
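
Choosing another delimiter is mostly a readability aid when the text
itself contains slashes. The path below is only an example; both
commands perform the same substitution:

```shell
path='/usr/local/bin'

# With / as the delimiter, every slash in the pattern
# and replacement must be escaped.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter, the slashes need no escaping.
piped=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

printf '%s\n' "$escaped"   # /opt/bin
printf '%s\n' "$piped"     # /opt/bin
```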
858
859
860
861	@cindex @value{SSEDEXT}, case modifiers in @code{s} commands
862	Finally, as a @value{SSED} extension, you can include a
863	special sequence made of a backslash and one of the letters
864	@code{L}, @code{l}, @code{U}, @code{u}, or @code{E}.
865	The meaning is as follows:
866
867	@table @code
868	@item \L
869	Turn the replacement
870	to lowercase until a @code{\U} or @code{\E} is found.
871
872	@item \l
873	Turn the
874	next character to lowercase.
875
876	@item \U
877	Turn the replacement to uppercase
878	until a @code{\L} or @code{\E} is found.
879
880	@item \u
881	Turn the next character
882	to uppercase.
883
884	@item \E
885	Stop case conversion started by @code{\L} or @code{\U}.
886	@end table
887
888	When the @code{g} flag is being used, case conversion does not
889	propagate from one occurrence of the regular expression to
890	another. For example, when the following command is executed
891	with @samp{a-b-} in pattern space:
892	@example
893	s/\(b\?\)-/x\u\1/g
894	@end example
895
896	@noindent
897	the output is @samp{axxB}. When replacing the first @samp{-},
898	the @samp{\u} sequence only affects the empty replacement of
899	@samp{\1}. It does not affect the @code{x} character that is
900	added to pattern space when replacing @code{b-} with @code{xB}.
901
902	On the other hand, @code{\l} and @code{\u} do affect the remainder
903	of the replacement text if they are followed by an empty substitution.
904	With @samp{a-b-} in pattern space, the following command:
905	@example
906	s/\(b\?\)-/\u\1x/g
907	@end example
908
909	@noindent
910	will replace @samp{-} with @samp{X} (uppercase) and @samp{b-} with
911	@samp{Bx}. If this behavior is undesirable, you can prevent it by
912	adding a @samp{\E} sequence---after @samp{\1} in this case.
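
Both cases can be reproduced from a shell. The sketch below assumes
GNU @command{sed}, since the case-conversion sequences and @samp{\?}
are extensions; the last command is an additional illustration of
@samp{\U} applied to the whole match.

```shell
# \u finds nothing to convert in the empty \1 of the first match,
# and does not carry over to the x of the second match.
first=$(printf 'a-b-\n' | sed 's/\(b\?\)-/x\u\1/g')

# Here the pending \u falls through the empty \1 onto the x.
second=$(printf 'a-b-\n' | sed 's/\(b\?\)-/\u\1x/g')

# \U converts everything that follows in the replacement.
upper=$(printf 'hello world\n' | sed 's/.*/\U&/')

printf '%s\n' "$first"    # axxB
printf '%s\n' "$second"   # aXBx
printf '%s\n' "$upper"    # HELLO WORLD
```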
913
914	To include a literal @code{\}, @code{&}, or newline in the final
915	replacement, be sure to precede the desired @code{\}, @code{&},
916	or newline in the @var{replacement} with a @code{\}.
917
918	@findex s command, option flags
919	@cindex Substitution of text, options
920	The @code{s} command can be followed by zero or more of the
921	following @var{flags}:
922
923	@table @code
924	@item g
925	@cindex Global substitution
926	@cindex Replacing all text matching regexp in a line
927	Apply the replacement to @emph{all} matches to the @var{regexp},
928	not just the first.
929
930	@item @var{number}
931	@cindex Replacing only @var{n}th match of regexp in a line
932	Only replace the @var{number}th match of the @var{regexp}.
933
934	@cindex GNU extensions, @code{g} and @var{number} modifier interaction in @code{s} command
936	@cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command
937	Note: the @sc{posix} standard does not specify what should happen
938	when you mix the @code{g} and @var{number} modifiers,
939	and currently there is no widely agreed upon meaning
940	across @command{sed} implementations.
941	For @value{SSED}, the interaction is defined to be:
942	ignore matches before the @var{number}th,
943	and then match and replace all matches from
944	the @var{number}th on.
945
946	@item p
947	@cindex Text, printing after substitution
948	If the substitution was made, then print the new pattern space.
949
950	Note: when both the @code{p} and @code{e} options are specified,
951	the relative ordering of the two produces very different results.
952	In general, @code{ep} (evaluate then print) is what you want,
953	but operating the other way round can be useful for debugging.
954	For this reason, the current version of @value{SSED} interprets
955	specially the presence of @code{p} options both before and after
956	@code{e}, printing the pattern space before and after evaluation,
957	while in general flags for the @code{s} command show their
958	effect just once. This behavior, although documented, might
959	change in future versions.
960
961	@item w @var{filename}
962	@cindex Text, writing to a file after substitution
963	@cindex @value{SSEDEXT}, @file{/dev/stdout} file
964	@cindex @value{SSEDEXT}, @file{/dev/stderr} file
965	If the substitution was made, then write out the result to the named file.
966	As a @value{SSED} extension, two special values of @var{filename} are
967	supported: @file{/dev/stderr}, which writes the result to the standard
968	error, and @file{/dev/stdout}, which writes to the standard
969	output.@footnote{This is equivalent to @code{p} unless the @option{-i}
970	option is being used.}
971
972	@item e
973	@cindex Evaluate Bourne-shell commands, after substitution
974	@cindex Subprocesses
975	@cindex @value{SSEDEXT}, evaluating Bourne-shell commands
976	@cindex @value{SSEDEXT}, subprocesses
977	This flag allows one to pipe input from a shell command
978	into pattern space. If a substitution was made, the command
979	that is found in pattern space is executed and pattern space
980	is replaced with its output. A trailing newline is suppressed;
981	results are undefined if the command to be executed contains
982	a @sc{nul} character. This is a @value{SSED} extension.
983
984	@item I
985	@itemx i
986	@cindex GNU extensions, @code{I} modifier
987	@cindex Case-insensitive matching
988	The @code{I} modifier to regular-expression matching is a GNU
989	extension which makes @command{sed} match @var{regexp} in a
990	case-insensitive manner.
991
992	@item M
993	@itemx m
994	@cindex @value{SSEDEXT}, @code{M} modifier
995	The @code{M} modifier to regular-expression matching is a @value{SSED}
996	extension which directs @value{SSED} to match the regular expression
997	in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
998	match respectively (in addition to the normal behavior) the empty string
999	after a newline, and the empty string before a newline. There are
1000	special character sequences
1001	@ifclear PERL
1002	(@code{\`} and @code{\'})
1003	@end ifclear
1004	which always match the beginning or the end of the buffer.
1005	In addition,
1006	the period character does not match a newline character in
1007	multi-line mode.
1008
1009
1010	@end table
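
As a rough illustration of some of the flags listed above, the
following sketch assumes @value{SSED}, a POSIX shell, and the
@command{seq} utility used elsewhere in this manual. The variable
names are hypothetical.

```shell
# number: replace only the 2nd match.
second=$(echo 'a-a-a-a' | sed 's/a/X/2')
printf '%s\n' "$second"        # a-X-a-a

# number combined with g: replace the 2nd match and every later one
# (this combination is not specified by POSIX).
from_second=$(echo 'a-a-a-a' | sed 's/a/X/2g')
printf '%s\n' "$from_second"   # a-X-X-X

# p together with -n: print only lines where a substitution was made.
changed=$(seq 3 | sed -n 's/2/two/p')
printf '%s\n' "$changed"       # two

# I: match the regexp case-insensitively.
nocase=$(echo 'Hello' | sed 's/hello/bye/I')
printf '%s\n' "$nocase"        # bye
```
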
1011
1012	@node Common Commands
1013	@section Often-Used Commands
1014
1015	If you use @command{sed} at all, you will quite likely want to know
1016	these commands.
1017
1018	@table @code
1019	@item #
1020	[No addresses allowed.]
1021
1022	@findex # (comments)
1023	@cindex Comments, in scripts
1024	The @code{#} character begins a comment;
1025	the comment continues until the next newline.
1026
1027	@cindex Portability, comments
1028	If you are concerned about portability, be aware that
1029	some implementations of @command{sed} (which are not @sc{posix}
1030	conforming) may only support a single one-line comment,
1031	and then only when the very first character of the script is a @code{#}.
1032
1033	@findex -n, forcing from within a script
1034	@cindex Caveat --- #n on first line
1035	Warning: if the first two characters of the @command{sed} script
1036	are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
1037	If you want to put a comment in the first line of your script
1038	and that comment begins with the letter @samp{n}
1039	and you do not want this behavior,
1040	then be sure to either use a capital @samp{N},
1041	or place at least one space before the @samp{n}.
1042
1043	@item q [@var{exit-code}]
1044	@findex q (quit) command
1045	@cindex @value{SSEDEXT}, returning an exit code
1046	@cindex Quitting
1047	Exit @command{sed} without processing any more commands or input.
1048
1049	Example: stop after printing the second line:
1050	@example
1051	$ seq 3 | sed 2q
1052	1
1053	2
1054	@end example
1055
1056	This command accepts only one address.
1057	Note that the current pattern space is printed if auto-print is
1058	not disabled with the @option{-n} option. The ability to return
1059	an exit code from the @command{sed} script is a @value{SSED} extension.
1060
1061	See also the @value{SSED} extension @code{Q} command which quits silently
1062	without printing the current pattern space.
1063
1064	@item d
1065	@findex d (delete) command
1066	@cindex Text, deleting
1067	Delete the pattern space;
1068	immediately start next cycle.
1069
1070	Example: delete the second input line:
1071	@example
1072	$ seq 3 | sed 2d
1073	1
1074	3
1075	@end example
1076
1077	@item p
1078	@findex p (print) command
1079	@cindex Text, printing
1080	Print out the pattern space (to the standard output).
1081	This command is usually only used in conjunction with the @option{-n}
1082	command-line option.
1083
1084	Example: print only the second input line:
1085	@example
1086	$ seq 3 | sed -n 2p
1087	2
1088	@end example
1089
1090	@item n
1091	@findex n (next-line) command
1092	@cindex Next input line, replace pattern space with
1093	@cindex Read next input line
1094	If auto-print is not disabled, print the pattern space,
1095	then, regardless, replace the pattern space with the next line of input.
1096	If there is no more input then @command{sed} exits without processing
1097	any more commands.
1098
1099	This command is useful to skip lines (e.g. process every Nth line).
1100
1101	Example: perform substitution on every 3rd line (i.e. two @code{n} commands
1102	skip two lines):
1103	@codequoteundirected on
1104	@codequotebacktick on
1105	@example
1106	$ seq 6 | sed 'n;n;s/./x/'
1107	1
1108	2
1109	x
1110	4
1111	5
1112	x
1113	@end example
1114
1115	@value{SSED} provides an extension address syntax of @var{first}~@var{step}
1116	to achieve the same result:
1117
1118	@example
1119	$ seq 6 | sed '0~3s/./x/'
1120	1
1121	2
1122	x
1123	4
1124	5
1125	x
1126	@end example
1127
1128	@codequotebacktick off
1129	@codequoteundirected off
1130
1131
1132	@item @{ @var{commands} @}
1133	@findex @{@} command grouping
1134	@cindex Grouping commands
1135	@cindex Command groups
1136	A group of commands may be enclosed between
1137	@code{@{} and @code{@}} characters.
1138	This is particularly useful when you want a group of commands
1139	to be triggered by a single address (or address-range) match.
1140
1141	Example: perform substitution then print the second input line:
1142	@codequoteundirected on
1143	@codequotebacktick on
1144	@example
1145	$ seq 3 | sed -n '2@{s/2/X/ ; p@}'
1146	X
1147	@end example
1148	@codequoteundirected off
1149	@codequotebacktick off
1150
1151	@end table
1152
1153
1154	@node Other Commands
1155	@section Less Frequently-Used Commands
1156
1157	Though perhaps less frequently used than those in the previous
1158	section, some very small yet useful @command{sed} scripts can be built with
1159	these commands.
1160
1161	@table @code
1162	@item y/@var{source-chars}/@var{dest-chars}/
1163	@findex y (transliterate) command
1164	@cindex Transliteration
1165	Transliterate any characters in the pattern space which match
1166	any of the @var{source-chars} with the corresponding character
1167	in @var{dest-chars}.
1168
1169	Example: transliterate @samp{a-j} into @samp{0-9}:
1170	@codequoteundirected on
1171	@codequotebacktick on
1172	@example
1173	$ echo hello world | sed 'y/abcdefghij/0123456789/'
1174	74llo worl3
1175	@end example
1176	@codequoteundirected off
1177	@codequotebacktick off
1178
1179	(The @code{/} characters may be uniformly replaced by
1180	any other single character within any given @code{y} command.)
1181
1182	Instances of the @code{/} (or whatever other character is used in its stead),
1183	@code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars}
1184	lists, provided that each instance is escaped by a @code{\}.
1185	The @var{source-chars} and @var{dest-chars} lists @emph{must}
1186	contain the same number of characters (after de-escaping).
1187
1188	See the @command{tr} command from GNU coreutils for similar functionality.
1189
1190	@item a @var{text}
1191	Append @var{text} after a line. This is a GNU extension
1192	to the standard @code{a} command; see below for details.
1193
1194	Example: Add @samp{hello} after the second line:
1195	@codequoteundirected on
1196	@codequotebacktick on
1197	@example
1198	$ seq 3 | sed '2a hello'
1199	1
1200	2
1201	hello
1202	3
1203	@end example
1204	@codequoteundirected off
1205	@codequotebacktick off
1206
1207	Leading whitespace after the @code{a} command is ignored.
1208	The text to add is read until the end of the line.
1209
1210
1211	@item a\
1212	@itemx @var{text}
1213	@findex a (append text lines) command
1214	@cindex Appending text after a line
1215	@cindex Text, appending
1216	Append @var{text} after a line.
1217
1218	Example: Add @samp{hello} after the second line
1219	(@print{} indicates printed output lines):
1220	@codequoteundirected on
1221	@codequotebacktick on
1222	@example
1223	$ seq 3 | sed '2a\
1224	hello'
1225	@print{}1
1226	@print{}2
1227	@print{}hello
1228	@print{}3
1229	@end example
1230	@codequoteundirected off
1231	@codequotebacktick off
1232
1233	The @code{a} command queues the lines of text which follow this command
1234	(each but the last ending with a @code{\},
1235	which are removed from the output)
1236	to be output at the end of the current cycle,
1237	or when the next input line is read.
1238
1239	@cindex @value{SSEDEXT}, two addresses supported by most commands
1240	As a GNU extension, this command accepts two addresses.
1241
1242	Escape sequences in @var{text} are processed, so you should
1243	use @code{\\} in @var{text} to print a single backslash.
1244
1245	The commands resume after the last line without a backslash (@code{\}),
1246	which is @samp{world} in the following example:
1247	@codequoteundirected on
1248	@codequotebacktick on
1249	@example
1250	$ seq 3 | sed '2a\
1251	hello\
1252	world
1253	3s/./X/'
1254	@print{}1
1255	@print{}2
1256	@print{}hello
1257	@print{}world
1258	@print{}X
1259	@end example
1260	@codequoteundirected off
1261	@codequotebacktick off
1262
1263	As a GNU extension, the @code{a} command and @var{text} can be
1264	separated into two @code{-e} parameters, enabling easier scripting:
1265	@codequoteundirected on
1266	@codequotebacktick on
1267	@example
1268	$ seq 3 | sed -e '2a\' -e hello
1269	1
1270	2
1271	hello
1272	3
1273
1274	$ sed -e '2a\' -e "$VAR"
1275	@end example
1276	@codequoteundirected off
1277	@codequotebacktick off
1278
1279	@item i @var{text}
1280	Insert @var{text} before a line. This is a GNU extension
1281	to the standard @code{i} command; see below for details.
1282
1283	Example: Insert @samp{hello} before the second line:
1284	@codequoteundirected on
1285	@codequotebacktick on
1286	@example
1287	$ seq 3 | sed '2i hello'
1288	1
1289	hello
1290	2
1291	3
1292	@end example
1293	@codequoteundirected off
1294	@codequotebacktick off
1295
1296	Leading whitespace after the @code{i} command is ignored.
1297	The text to add is read until the end of the line.
1298
1299	@anchor{insert command}
1300	@item i\
1301	@itemx @var{text}
1302	@findex i (insert text lines) command
1303	@cindex Inserting text before a line
1304	@cindex Text, insertion
1305	Immediately output the lines of text which follow this command.
1306
1307	Example: Insert @samp{hello} before the second line
1308	(@print{} indicates printed output lines):
1309	@codequoteundirected on
1310	@codequotebacktick on
1311	@example
1312	$ seq 3 | sed '2i\
1313	hello'
1314	@print{}1
1315	@print{}hello
1316	@print{}2
1317	@print{}3
1318	@end example
1319	@codequoteundirected off
1320	@codequotebacktick off
1321
1322	@cindex @value{SSEDEXT}, two addresses supported by most commands
1323	As a GNU extension, this command accepts two addresses.
1324
1325	Escape sequences in @var{text} are processed, so you should
1326	use @code{\\} in @var{text} to print a single backslash.
1327
1328	The commands resume after the last line without a backslash (@code{\}),
1329	which is @samp{world} in the following example:
1330	@codequoteundirected on
1331	@codequotebacktick on
1332	@example
1333	$ seq 3 | sed '2i\
1334	hello\
1335	world
1336	s/./X/'
1337	@print{}X
1338	@print{}hello
1339	@print{}world
1340	@print{}X
1341	@print{}X
1342	@end example
1343	@codequoteundirected off
1344	@codequotebacktick off
1345
1346	As a GNU extension, the @code{i} command and @var{text} can be
1347	separated into two @code{-e} parameters, enabling easier scripting:
1348	@codequoteundirected on
1349	@codequotebacktick on
1350	@example
1351	$ seq 3 | sed -e '2i\' -e hello
1352	1
1353	hello
1354	2
1355	3
1356
1357	$ sed -e '2i\' -e "$VAR"
1358	@end example
1359	@codequoteundirected off
1360	@codequotebacktick off
1361
1362	@item c @var{text}
1363	Replace the line(s) with @var{text}. This is a GNU extension
1364	to the standard @code{c} command; see below for details.
1365
1366	Example: Replace the 2nd to 9th lines with the word @samp{hello}:
1367	@codequoteundirected on
1368	@codequotebacktick on
1369	@example
1370	$ seq 10 | sed '2,9c hello'
1371	1
1372	hello
1373	10
1374	@end example
1375	@codequoteundirected off
1376	@codequotebacktick off
1377
1378	Leading whitespace after the @code{c} command is ignored.
1379	The text to add is read until the end of the line.
1380
1381	@item c\
1382	@itemx @var{text}
1383	@findex c (change to text lines) command
1384	@cindex Replacing selected lines with other text
1385	Delete the lines matching the address or address-range,
1386	and output the lines of text which follow this command.
1387
1388	Example: Replace 2nd to 4th lines with the words @samp{hello} and
1389	@samp{world} (@print{} indicates printed output lines):
1390	@codequoteundirected on
1391	@codequotebacktick on
1392	@example
1393	$ seq 5 | sed '2,4c\
1394	hello\
1395	world'
1396	@print{}1
1397	@print{}hello
1398	@print{}world
1399	@print{}5
1400	@end example
1401	@codequoteundirected off
1402	@codequotebacktick off
1403
1404	If no addresses are given, each line is replaced.
1405
1406	A new cycle is started after this command is done,
1407	since the pattern space will have been deleted.
1408	In the following example, the @code{c} starts a
1409	new cycle and the substitution command is not performed
1410	on the replaced text:
1411
1412	@codequoteundirected on
1413	@codequotebacktick on
1414	@example
1415	$ seq 3 | sed '2c\
1416	hello
1417	s/./X/'
1418	@print{}X
1419	@print{}hello
1420	@print{}X
1421	@end example
1422	@codequoteundirected off
1423	@codequotebacktick off
1424
1425	As a GNU extension, the @code{c} command and @var{text} can be
1426	separated into two @code{-e} parameters, enabling easier scripting:
1427	@codequoteundirected on
1428	@codequotebacktick on
1429	@example
1430	$ seq 3 | sed -e '2c\' -e hello
1431	1
1432	hello
1433	3
1434
1435	$ sed -e '2c\' -e "$VAR"
1436	@end example
1437	@codequoteundirected off
1438	@codequotebacktick off
1439
1440
1441	@item =
1442	@findex = (print line number) command
1443	@cindex Printing line number
1444	@cindex Line number, printing
1445	Print out the current input line number (with a trailing newline).
1446
1447	@codequoteundirected on
1448	@codequotebacktick on
1449	@example
1450	$ printf '%s\n' aaa bbb ccc | sed =
1451	1
1452	aaa
1453	2
1454	bbb
1455	3
1456	ccc
1457	@end example
1458	@codequoteundirected off
1459	@codequotebacktick off
1460
1461	@cindex @value{SSEDEXT}, two addresses supported by most commands
1462	As a GNU extension, this command accepts two addresses.
1463
1464
1465
1466
1467	@item l @var{n}
1468	@findex l (list unambiguously) command
1469	@cindex List pattern space
1470	@cindex Printing text unambiguously
1471	@cindex Line length, setting
1472	@cindex @value{SSEDEXT}, setting line length
1473	Print the pattern space in an unambiguous form:
1474	non-printable characters (and the @code{\} character)
1475	are printed in C-style escaped form; long lines are split,
1476	with a trailing @code{\} character to indicate the split;
1477	the end of each line is marked with a @code{$}.
1478
1479	@var{n} specifies the desired line-wrap length;
1480	a length of 0 (zero) means to never wrap long lines. If omitted,
1481	the default as specified on the command line is used. The @var{n}
1482	parameter is a @value{SSED} extension.
1483
1484	@item r @var{filename}
1485
1486	@findex r (read file) command
1487	@cindex Read text from a file
1488	Reads file @var{filename}. Example:
1489
1490	@codequoteundirected on
1491	@codequotebacktick on
1492	@example
1493	$ seq 3 | sed '2r/etc/hostname'
1494	1
1495	2
1496	fencepost.gnu.org
1497	3
1498	@end example
1499	@codequoteundirected off
1500	@codequotebacktick off
1501
1502	@cindex @value{SSEDEXT}, @file{/dev/stdin} file
1503	Queue the contents of @var{filename} to be read and
1504	inserted into the output stream at the end of the current cycle,
1505	or when the next input line is read.
1506	Note that if @var{filename} cannot be read, it is treated as
1507	if it were an empty file, without any error indication.
1508
1509	As a @value{SSED} extension, the special value @file{/dev/stdin}
1510	is supported for the file name, which reads the contents of the
1511	standard input.
1512
1513	@cindex @value{SSEDEXT}, two addresses supported by most commands
1514	As a GNU extension, this command accepts two addresses. The
1515	file will then be reread and inserted on each of the addressed lines.
1516
1517	As a @value{SSED} extension, the @code{r} command accepts a zero address,
1518	inserting a file @emph{before} the first line of the input
1519	(@pxref{Adding a header to multiple files}).
1520
1521	@item w @var{filename}
1522	@findex w (write file) command
1523	@cindex Write to a file
1524	@cindex @value{SSEDEXT}, @file{/dev/stdout} file
1525	@cindex @value{SSEDEXT}, @file{/dev/stderr} file
1526	Write the pattern space to @var{filename}.
1527	As a @value{SSED} extension, two special values of @var{filename} are
1528	supported: @file{/dev/stderr}, which writes the result to the standard
1529	error, and @file{/dev/stdout}, which writes to the standard
1530	output.@footnote{This is equivalent to @code{p} unless the @option{-i}
1531	option is being used.}
1532
1533	The file will be created (or truncated) before the first input line is
1534	read; all @code{w} commands (including instances of the @code{w} flag
1535	on successful @code{s} commands) which refer to the same @var{filename}
1536	are output without closing and reopening the file.
1537
1538	@item D
1539	@findex D (delete first line) command
1540	@cindex Delete first line from pattern space
1541	If pattern space contains no newline, start a normal new cycle as if
1542	the @code{d} command was issued. Otherwise, delete text in the pattern
1543	space up to the first newline, and restart cycle with the resultant
1544	pattern space, without reading a new line of input.
1545
1546	@item N
1547	@findex N (append Next line) command
1548	@cindex Next input line, append to pattern space
1549	@cindex Append next input line to pattern space
1550	Add a newline to the pattern space,
1551	then append the next line of input to the pattern space.
1552	If there is no more input then @command{sed} exits without processing
1553	any more commands.
1554
1555	When @option{-z} is used, a zero byte (the ASCII @samp{NUL} character) is
1556	added between the lines (instead of a newline).
1557
1558	By default, if there is no ``next'' input line, @value{SSED} prints the pattern
1559	space before exiting (unless auto-print is disabled). This is a GNU extension which can be disabled with @option{--posix}.
1560	@xref{N_command_last_line,,N command on the last line}.
1561
1562
1563	@item P
1564	@findex P (print first line) command
1565	@cindex Print first line from pattern space
1566	Print out the portion of the pattern space up to the first newline.
1567
1568	@item h
1569	@findex h (hold) command
1570	@cindex Copy pattern space into hold space
1571	@cindex Replace hold space with copy of pattern space
1572	@cindex Hold space, copying pattern space into
1573	Replace the contents of the hold space with the contents of the pattern space.
1574
1575	@item H
1576	@findex H (append Hold) command
1577	@cindex Append pattern space to hold space
1578	@cindex Hold space, appending from pattern space
1579	Append a newline to the contents of the hold space,
1580	and then append the contents of the pattern space to that of the hold space.
1581
1582	@item g
1583	@findex g (get) command
1584	@cindex Copy hold space into pattern space
1585	@cindex Replace pattern space with copy of hold space
1586	@cindex Hold space, copy into pattern space
1587	Replace the contents of the pattern space with the contents of the hold space.
1588
1589	@item G
1590	@findex G (appending Get) command
1591	@cindex Append hold space to pattern space
1592	@cindex Hold space, appending to pattern space
1593	Append a newline to the contents of the pattern space,
1594	and then append the contents of the hold space to that of the pattern space.
1595
1596	@item x
1597	@findex x (eXchange) command
1598	@cindex Exchange hold space with pattern space
1599	@cindex Hold space, exchange with pattern space
1600	Exchange the contents of the hold and pattern spaces.
1601
1602	@end table
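
A few of the commands in the table above can be combined as in the
following sketch. It assumes a POSIX shell and the @command{seq}
utility; the variable names are hypothetical, and the outputs noted in
the comments are what @value{SSED} is expected to print.

```shell
# G and h accumulate the lines read so far, in reverse order, in the
# hold space; $p prints the result once the last line has been read.
reversed=$(seq 3 | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"    # 3, 2, 1 on separate lines

# N appends the next input line; s then replaces the embedded newline.
pairs=$(seq 4 | sed 'N;s/\n/,/')
printf '%s\n' "$pairs"       # 1,2 then 3,4

# l shows a tab as \t and marks the end of the line with $.
listed=$(printf 'a\tb\n' | sed -n l)
printf '%s\n' "$listed"      # a\tb$
```
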
1603
1604
1605	@node Programming Commands
1606	@section Commands for @command{sed} gurus
1607
1608	In most cases, use of these commands indicates that you are
1609	probably better off programming in something like @command{awk}
1610	or Perl. But occasionally one is committed to sticking
1611	with @command{sed}, and these commands can enable one to write
1612	quite convoluted scripts.
1613
1614	@cindex Flow of control in scripts
1615	@table @code
1616	@item : @var{label}
1617	[No addresses allowed.]
1618
1619	@findex : (label) command
1620	@cindex Labels, in scripts
1621	Specify the location of @var{label} for branch commands.
1622	In all other respects, a no-op.
1623
1624	@item b @var{label}
1625	@findex b (branch) command
1626	@cindex Branch to a label, unconditionally
1627	@cindex Goto, in scripts
1628	Unconditionally branch to @var{label}.
1629	The @var{label} may be omitted, in which case the next cycle is started.
1630
1631	@item t @var{label}
1632	@findex t (test and branch if successful) command
1633	@cindex Branch to a label, if @code{s///} succeeded
1634	@cindex Conditional branch
1635	Branch to @var{label} only if there has been a successful @code{s}ubstitution
1636	since the last input line was read or conditional branch was taken.
1637	The @var{label} may be omitted, in which case the next cycle is started.
1638
1639	@end table
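
The label and branch commands above are typically used to build loops.
The following is a minimal sketch, assuming a POSIX shell; the label
@code{again} and the variable @code{squeezed} are names chosen for
illustration. Each command is passed in its own @option{-e} option so
that the label is not followed by a semicolon.

```shell
# Collapse each run of spaces into a single space: s removes one
# surplus space, and t jumps back to the label while s keeps succeeding.
squeezed=$(echo 'a  b    c' | sed -e ':again' -e 's/  / /' -e 't again')
printf '%s\n' "$squeezed"    # a b c
```
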
1640
1641	@node Extended Commands
1642	@section Commands Specific to @value{SSED}
1643
1644	These commands are specific to @value{SSED}, so you
1645	must use them with care and only when you are sure that
1646	hindering portability is not evil. They allow you to check
1647	for @value{SSED} extensions or to do tasks that are required
1648	quite often, yet are unsupported by standard @command{sed}s.
1649
1650	@table @code
1651	@item e [@var{command}]
1652	@findex e (evaluate) command
1653	@cindex Evaluate Bourne-shell commands
1654	@cindex Subprocesses
1655	@cindex @value{SSEDEXT}, evaluating Bourne-shell commands
1656	@cindex @value{SSEDEXT}, subprocesses
1657	This command allows one to pipe input from a shell command
1658	into pattern space. Without parameters, the @code{e} command
1659	executes the command that is found in pattern space and
1660	replaces the pattern space with the output; a trailing newline
1661	is suppressed.
1662
1663	If a parameter is specified, instead, the @code{e} command
1664	interprets it as a command and sends its output to the output stream.
1665	The command can run across multiple lines, all but the last ending with
1666	a backslash.
1667
1668	In both cases, the results are undefined if the command to be
1669	executed contains a @sc{nul} character.
1670
1671	Note that, unlike the @code{r} command, the output of the command will
1672	be printed immediately; the @code{r} command instead delays the output
1673	to the end of the current cycle.
1674
1675	@item F
1676	@findex F (File name) command
1677	@cindex Printing file name
1678	@cindex File name, printing
1679	Print out the file name of the current input file (with a trailing
1680	newline).
1681
1682	@item Q [@var{exit-code}]
1683	This command accepts only one address.
1684
1685	@findex Q (silent Quit) command
1686	@cindex @value{SSEDEXT}, quitting silently
1687	@cindex @value{SSEDEXT}, returning an exit code
1688	@cindex Quitting
1689	This command is the same as @code{q}, but will not print the
1690	contents of pattern space. Like @code{q}, it provides the
1691	ability to return an exit code to the caller.
1692
1693	This command can be useful because the only alternative ways
1694	to accomplish this apparently trivial function are to use
1695	the @option{-n} option (which can unnecessarily complicate
1696	your script) or to resort to the following snippet, which
1697	wastes time by reading the whole file without any visible effect:
1698
1699	@example
1700	:eat
1701	$d @i{@r{Quit silently on the last line}}
1702	N @i{@r{Read another line, silently}}
1703	g @i{@r{Overwrite pattern space each time to save memory}}
1704	b eat
1705	@end example
1706
1707	@item R @var{filename}
1708	@findex R (read line) command
1709	@cindex Read text from a file
1710	@cindex @value{SSEDEXT}, reading a file a line at a time
1711	@cindex @value{SSEDEXT}, @code{R} command
1712	@cindex @value{SSEDEXT}, @file{/dev/stdin} file
1713	Queue a line of @var{filename} to be read and
1714	inserted into the output stream at the end of the current cycle,
1715	or when the next input line is read.
1716	Note that if @var{filename} cannot be read, or if its end is
1717	reached, no line is appended, without any error indication.
1718
1719	As with the @code{r} command, the special value @file{/dev/stdin}
1720	is supported for the file name, which reads a line from the
1721	standard input.
1722
1723	@item T @var{label}
1724	@findex T (test and branch if failed) command
1725	@cindex @value{SSEDEXT}, branch if @code{s///} failed
1726	@cindex Branch to a label, if @code{s///} failed
1727	@cindex Conditional branch
1728	Branch to @var{label} only if there have been no successful
1729	@code{s}ubstitutions since the last input line was read or
1730	conditional branch was taken. The @var{label} may be omitted,
1731	in which case the next cycle is started.
1732
1733	@item v @var{version}
1734	@findex v (version) command
1735	@cindex @value{SSEDEXT}, checking for their presence
1736	@cindex Requiring @value{SSED}
1737	This command does nothing, but makes @command{sed} fail if
1738	@value{SSED} extensions are not supported, simply because other
1739	versions of @command{sed} do not implement it. In addition, you
1740	can specify the version of @command{sed} that your script
1741	requires, such as @code{4.0.5}. The default is @code{4.0}
1742	because that is the first version that implemented this command.
1743
1744	This command enables all @value{SSEDEXT} even if
1745	@env{POSIXLY_CORRECT} is set in the environment.
1746
1747	@item W @var{filename}
1748	@findex W (write first line) command
1749	@cindex Write first line to a file
1750	@cindex @value{SSEDEXT}, writing first line to a file
1751	Write to the given filename the portion of the pattern space up to
1752	the first newline. Everything said under the @code{w} command about
1753	file handling holds here too.
1754
1755	@item z
1756	@findex z (Zap) command
1757	@cindex @value{SSEDEXT}, emptying pattern space
1758	@cindex Emptying pattern space
1759	This command empties the content of pattern space. It is
1760	usually the same as @samp{s/.*//}, but is more efficient
1761	and works in the presence of invalid multibyte sequences
1762	in the input stream. @sc{posix} mandates that such sequences
1763	are @emph{not} matched by @samp{.}, so that there is no portable
1764	way to clear @command{sed}'s buffers in the middle of the
1765	script in most multibyte locales (including UTF-8 locales).
1766	@end table
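
Some of the extensions in the table above are sketched below. The
example assumes @value{SSED}, a POSIX shell, and the @command{seq}
utility; the variable names are hypothetical.

```shell
# Q quits before the second line is printed; 5 becomes the exit status.
status=0
first=$(seq 3 | sed '2Q5') || status=$?
printf '%s (exit status %s)\n' "$first" "$status"    # 1 (exit status 5)

# z empties the pattern space, so the following s sees an empty line.
zapped=$(echo 'abc' | sed -e z -e 's/^$/empty/')
printf '%s\n' "$zapped"      # empty

# T branches to the end of the script when the first s did not match,
# so only the substituted line gets the trailing "!".
marked=$(printf 'a\nb\n' | sed -e 's/a/X/' -e T -e 's/$/!/')
printf '%s\n' "$marked"      # X! then b
```
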
1767
1768
1769	@node Multiple commands syntax
1770	@section Multiple commands syntax
1771
1772	@c POSIX says:
1773	@c Editing commands other than {...}, a, b, c, i, r, t, w, :, and #
1774	@c can be followed by a <semicolon>, optional <blank> characters, and
1775	@c another editing command. However, when an s editing command is used
1776	@c with the w flag, following it with another command in this manner
1777	@c produces undefined results.
1778
1779	There are several methods to specify multiple commands in a @command{sed}
1780	program.
1781
1782	Using newlines is most natural when running a @command{sed} script from a file
1783	(using the @option{-f} option).
1784
1785	On the command line, all @command{sed} commands may be separated by newlines.
1786	Alternatively, you may specify each command as an argument to an @option{-e}
1787	option:
1788
1789	@codequoteundirected on
1790	@codequotebacktick on
1791	@example
1792	@group
1793	$ seq 6 | sed '1d
1794	3d
1795	5d'
1796	2
1797	4
1798	6
1799
1800	$ seq 6 | sed -e 1d -e 3d -e 5d
1801	2
1802	4
1803	6
1804	@end group
1805	@end example
1806	@codequoteundirected off
1807	@codequotebacktick off
1808
1809	A semicolon (@samp{;}) may be used to separate most simple commands:
1810
1811	@codequoteundirected on
1812	@codequotebacktick on
1813	@example
1814	@group
1815	$ seq 6 | sed '1d;3d;5d'
1816	2
1817	4
1818	6
1819	@end group
1820	@end example
1821	@codequoteundirected off
1822	@codequotebacktick off
1823
1824	The @code{@{}, @code{@}}, @code{b}, @code{t}, @code{T}, and @code{:} commands can
1825	be separated with a semicolon (this is a non-portable @value{SSED} extension).
1826
1827	@codequoteundirected on
1828	@codequotebacktick on
1829	@example
1830	@group
1831	$ seq 4 | sed '@{1d;3d@}'
1832	2
1833	4
1834
1835	$ seq 6 | sed '@{1d;3d@};5d'
1836	2
1837	4
1838	6
1839	@end group
1840	@end example
1841	@codequoteundirected off
1842	@codequotebacktick off
1843
1844	Labels used in the @code{b}, @code{t}, @code{T}, and @code{:} commands are read
1845	until a semicolon. Leading and trailing whitespace is ignored. In
1846	the examples below the label is @samp{x}. The first example works
1847	with @value{SSED}. The second is a portable equivalent.
1848	@xref{Branching and flow control}, for more information about
1849	branching and labels.
1850
1851	@codequoteundirected on
1852	@codequotebacktick on
1853	@example
1854	@group
1855	$ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
1856	1
1857	=2
1858
1859	$ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
1860	1
1861	=2
1862	@end group
1863	@end example
1864	@codequoteundirected off
1865	@codequotebacktick off
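
As a further sketch of label handling (assuming @value{SSED} and the
@command{seq} utility are available; the variable names are purely
illustrative), the following loop joins all input lines with commas.
The first form relies on labels ending at a semicolon, the second
places each label and branch in its own @option{-e} fragment:

```shell
# GNU sed: the label "a" ends at the semicolon.
# N appends the next line; $!ba loops back until the last line is read.
joined_gnu=$(seq 3 | sed ':a;N;$!ba;s/\n/,/g')

# Portable spelling: every label and branch sits in its own -e fragment.
joined_portable=$(seq 3 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')

echo "$joined_gnu"        # 1,2,3
echo "$joined_portable"   # 1,2,3
```

Both forms should produce the same single line of output.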
1866
1867
1868
1869	@subsection Commands Requiring a newline
1870
1871	The following commands cannot be separated by a semicolon and
1872	require a newline:
1873
1874	@table @asis
1875
1876	@item @code{a},@code{c},@code{i} (append/change/insert)
1877
1878	All characters following @code{a},@code{c},@code{i} commands are taken
1879	as the text to append/change/insert. Using a semicolon leads to
1880	undesirable results:
1881
1882	@codequoteundirected on
1883	@codequotebacktick on
1884	@example
1885	@group
1886	$ seq 2 | sed '1aHello ; 2d'
1887	1
1888	Hello ; 2d
1889	2
1890	@end group
1891	@end example
1892	@codequoteundirected off
1893	@codequotebacktick off
1894
1895	Separate the commands using @option{-e} or a newline:
1896
1897	@codequoteundirected on
1898	@codequotebacktick on
1899	@example
1900	@group
1901	$ seq 2 | sed -e 1aHello -e 2d
1902	1
1903	Hello
1904
1905	$ seq 2 | sed '1aHello
1906	2d'
1907	1
1908	Hello
1909	@end group
1910	@end example
1911	@codequoteundirected off
1912	@codequotebacktick off
1913
1914	Note that specifying the text to add (@samp{Hello}) immediately
1915	after @code{a},@code{c},@code{i} is itself a @value{SSED} extension.
1916	A portable, POSIX-compliant alternative is:
1917
1918	@codequoteundirected on
1919	@codequotebacktick on
1920	@example
1921	@group
1922	$ seq 2 | sed '1a\
1923	Hello
1924	2d'
1925	1
1926	Hello
1927	@end group
1928	@end example
1929	@codequoteundirected off
1930	@codequotebacktick off
1931
1932	@item @code{#} (comment)
1933
1934	All characters following @samp{#} until the next newline are ignored.
1935
1936	@codequoteundirected on
1937	@codequotebacktick on
1938	@example
1939	@group
1940	$ seq 3 | sed '# this is a comment ; 2d'
1941	1
1942	2
1943	3
1944
1945
1946	$ seq 3 | sed '# this is a comment
1947	2d'
1948	1
1949	3
1950	@end group
1951	@end example
1952	@codequoteundirected off
1953	@codequotebacktick off
1954
1955	@item @code{r},@code{R},@code{w},@code{W} (reading and writing files)
1956
1957	The @code{r},@code{R},@code{w},@code{W} commands parse the filename
1958	until end of the line. If whitespace, comments or semicolons are found,
1959	they will be included in the filename, leading to unexpected results:
1960
1961	@codequoteundirected on
1962	@codequotebacktick on
1963	@example
1964	@group
1965	$ seq 2 | sed '1w hello.txt ; 2d'
1966	1
1967	2
1968
1969	$ ls -log
1970	total 4
1971	-rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d
1972
1973	$ cat 'hello.txt ; 2d'
1974	1
1975	@end group
1976	@end example
1977	@codequoteundirected off
1978	@codequotebacktick off
1979
1980	Note that @command{sed} silently ignores read/write errors in
1981	@code{r},@code{R},@code{w},@code{W} commands (such as missing files).
1982	In the following example, @command{sed} tries to read a file named
1983	@samp{@file{hello.txt ; N}}. The file is missing, and the error is silently
1984	ignored:
1985
1986	@codequoteundirected on
1987	@codequotebacktick on
1988	@example
1989	@group
1990	$ echo x | sed '1rhello.txt ; N'
1991	x
1992	@end group
1993	@end example
1994	@codequoteundirected off
1995	@codequotebacktick off
1996
1997	@item @code{e} (command execution)
1998
1999	Any characters following the @code{e} command until the end of the line
2000	will be sent to the shell. If whitespace, comments or semicolons are found,
2001	they will be included in the shell command, leading to unexpected results:
2002
2003	@codequoteundirected on
2004	@codequotebacktick on
2005	@example
2006	@group
2007	$ echo a | sed '1e touch foo#bar'
2008	a
2009
2010	$ ls -1
2011	foo#bar
2012
2013	$ echo a | sed '1e touch foo ; s/a/b/'
2014	sh: 1: s/a/b/: not found
2015	a
2016	@end group
2017	@end example
2018	@codequoteundirected off
2019	@codequotebacktick off
2020
2021
2022	@item @code{s///[we]} (substitute with @code{e} or @code{w} flags)
2023
2024	In a substitution command, the @code{w} flag writes the substitution
2025	result to a file, and the @code{e} flag executes the substitution result
2026	as a shell command. As with the @code{r/R/w/W/e} commands, these
2027	must be terminated with a newline. If whitespace, comments or semicolons
2028	are found, they will be included in the shell command or filename, leading to
2029	unexpected results:
2030
2031	@codequoteundirected on
2032	@codequotebacktick on
2033	@example
2034	@group
2035	$ echo a | sed 's/a/b/w1.txt#foo'
2036	b
2037
2038	$ ls -1
2039	1.txt#foo
2040	@end group
2041	@end example
2042	@codequoteundirected off
2043	@codequotebacktick off
2044
2045	@end table
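
The safe pattern for the file-writing commands can be sketched as
follows. This is only an illustration: the temporary directory and
the file name @file{two.txt} are assumptions, and @command{mktemp} is
assumed to be available. Because each @option{-e} fragment ends like
a line of the script, the file name stops at the end of the first
fragment and the second command is parsed normally:

```shell
workdir=$(mktemp -d)

# The -e fragment ends the filename, so the file is named exactly "two.txt".
printed=$(seq 3 | sed -n -e "2w $workdir/two.txt" -e '3p')
saved=$(cat "$workdir/two.txt")

echo "printed: $printed"   # printed: 3
echo "saved:   $saved"     # saved:   2

rm -r "$workdir"
```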
2046
2047
2048	@node sed addresses
2049	@chapter Addresses: selecting lines
2050
2051	@menu
2052	* Addresses overview:: Addresses overview
2053	* Numeric Addresses:: selecting lines by numbers
2054	* Regexp Addresses:: selecting lines by text matching
2055	* Range Addresses:: selecting a range of lines
2056	* Zero Address:: Using address @code{0}
2057	@end menu
2058
2059	@node Addresses overview
2060	@section Addresses overview
2061
2062	@cindex addresses, numeric
2063	@cindex numeric addresses
2064	Addresses determine on which line(s) the @command{sed} command will be
2065	executed. The following command replaces the first occurrence of @samp{hello}
2066	with @samp{world} only on line 144:
2067
2068	@codequoteundirected on
2069	@codequotebacktick on
2070	@example
2071	sed '144s/hello/world/' input.txt > output.txt
2072	@end example
2073	@codequoteundirected off
2074	@codequotebacktick off
2075
2076
2077
2078	If no address is specified, the command is performed on all lines.
2079	The following command replaces @samp{hello} with @samp{world},
2080	targeting every line of the input file.
2081	However, note that it modifies only the first instance of @samp{hello}
2082	on each line.
2083	Use the @samp{g} modifier to affect every instance on each affected line.
2084
2085	@codequoteundirected on
2086	@codequotebacktick on
2087	@example
2088	sed 's/hello/world/' input.txt > output.txt
2089	@end example
2090	@codequoteundirected off
2091	@codequotebacktick off
2092
2093
2094
2095	@cindex addresses, regular expression
2096	@cindex regular expression addresses
2097	Addresses can contain regular expressions to match lines based
2098	on content instead of line numbers. The following command replaces
2099	@samp{hello} with @samp{world} only on lines
2100	containing the string @samp{apple}:
2101
2102	@codequoteundirected on
2103	@codequotebacktick on
2104	@example
2105	sed '/apple/s/hello/world/' input.txt > output.txt
2106	@end example
2107	@codequoteundirected off
2108	@codequotebacktick off
2109
2110
2111
2112	@cindex addresses, range
2113	@cindex range addresses
2114	An address range is specified with two addresses separated by a comma
2115	(@code{,}). Addresses can be numeric, regular expressions, or a mix of
2116	both.
2117	The following command replaces @samp{hello} with @samp{world}
2118	only on lines 4 to 17 (inclusive):
2119
2120	@codequoteundirected on
2121	@codequotebacktick on
2122	@example
2123	sed '4,17s/hello/world/' input.txt > output.txt
2124	@end example
2125	@codequoteundirected off
2126	@codequotebacktick off
2127
2128
2129
2130	@cindex Excluding lines
2131	@cindex Selecting non-matching lines
2132	@cindex addresses, negating
2133	@cindex addresses, excluding
2134	Appending the @code{!} character to the end of an address
2135	specification (before the command letter) negates the sense of the
2136	match. That is, if the @code{!} character follows an address or an
2137	address range, then only lines which do @emph{not} match the addresses
2138	will be selected. The following command replaces @samp{hello}
2139	with @samp{world} only on lines @emph{not} containing the string
2140	@samp{apple}:
2141
2142	@example
2143	sed '/apple/!s/hello/world/' input.txt > output.txt
2144	@end example
2145
2146	The following command replaces @samp{hello} with
2147	@samp{world} only on lines 1 to 3 and from line 18 to the last line of the
2148	input file (i.e. excluding lines 4 to 17):
2149
2150	@example
2151	sed '4,17!s/hello/world/' input.txt > output.txt
2152	@end example
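
A small self-checking sketch of address negation, using
@command{seq} to generate the input (the variable names are
illustrative only):

```shell
# Delete every line that is NOT in the range 2-4.
kept=$(seq 5 | sed '2,4!d')

# Print only the lines that do NOT contain "3".
others=$(seq 5 | sed -n '/3/!p')

echo "$kept"     # 2, 3 and 4, one per line
echo "$others"   # 1, 2, 4 and 5, one per line
```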
2153
2154
2155
2156
2157
2158	@node Numeric Addresses
2159	@section Selecting lines by numbers
2160	@cindex Addresses, in @command{sed} scripts
2161	@cindex Line selection
2162	@cindex Selecting lines to process
2163
2164	Addresses in a @command{sed} script can be in any of the following forms:
2165	@table @code
2166	@item @var{number}
2167	@cindex Address, numeric
2168	@cindex Line, selecting by number
2169	Specifying a line number will match only that line in the input.
2170	(Note that @command{sed} counts lines continuously across all input files
2171	unless @option{-i} or @option{-s} options are specified.)
2172
2173	@item $
2174	@cindex Address, last line
2175	@cindex Last line, selecting
2176	@cindex Line, selecting last
2177	This address matches the last line of the last file of input, or
2178	the last line of each file when the @option{-i} or @option{-s} options
2179	are specified.
2180
2181
2182	@item @var{first}~@var{step}
2183	@cindex GNU extensions, @samp{@var{n}~@var{m}} addresses
2184	This GNU extension matches every @var{step}th line
2185	starting with line @var{first}.
2186	In particular, lines will be selected when there exists
2187	a non-negative @var{n} such that the current line-number equals
2188	@var{first} + (@var{n} * @var{step}).
2189	Thus, one would use @code{1~2} to select the odd-numbered lines and
2190	@code{0~2} for even-numbered lines;
2191	to pick every third line starting with the second, @samp{2~3} would be used;
2192	to pick every fifth line starting with the tenth, use @samp{10~5};
2193	and @samp{50~0} is just an obscure way of saying @code{50}.
2194
2195	The following commands demonstrate the step address usage:
2196
2197	@example
2198	$ seq 10 | sed -n '0~4p'
2199	4
2200	8
2201
2202	$ seq 10 | sed -n '1~3p'
2203	1
2204	4
2205	7
2206	10
2207	@end example
2208
2209
2210	@end table
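
The effect of @option{-s} on line numbering and on the @code{$}
address can be sketched with two small files. The file names and the
temporary directory are assumptions made for this illustration, and
@option{-s} itself requires @value{SSED}:

```shell
workdir=$(mktemp -d)
printf 'a1\na2\n' > "$workdir/a.txt"
printf 'b1\nb2\n' > "$workdir/b.txt"

# Default: both files form one stream, so $ is the last line of b.txt.
last_overall=$(sed -n '$p' "$workdir/a.txt" "$workdir/b.txt")

# With -s each file is handled separately, so $ matches once per file.
last_each=$(sed -s -n '$p' "$workdir/a.txt" "$workdir/b.txt")

# Line numbers continue across files by default: line 3 is "b1".
third=$(sed -n '3p' "$workdir/a.txt" "$workdir/b.txt")

echo "$last_overall"   # b2
echo "$last_each"      # a2 and b2, one per line
echo "$third"          # b1

rm -r "$workdir"
```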
2211
2212
2213
2214	@node Regexp Addresses
2215	@section Selecting lines by text matching
2216
2217	@value{SSED} supports the following regular expression addresses.
2218	The default regular expression is
2219	@ref{BRE syntax, , Basic Regular Expression (BRE)}.
2220	If the @option{-E} or @option{-r} option is used, the regular expression should be
2221	in @ref{ERE syntax, , Extended Regular Expression (ERE)} syntax.
2222	@xref{BRE vs ERE}.
2223
2224	@table @code
2225	@item /@var{regexp}/
2226	@cindex Address, as a regular expression
2227	@cindex Line, selecting by regular expression match
2228	This will select any line which matches the regular expression @var{regexp}.
2229	If @var{regexp} itself includes any @code{/} characters,
2230	each must be escaped by a backslash (@code{\}).
2231
2232	The following command prints lines in @file{/etc/passwd}
2233	which end with @samp{bash}@footnote{
2234	There are of course many other ways to do the same,
2235	e.g.
2236	@example
2237	grep 'bash$' /etc/passwd
2238	awk -F: '$7 == "/bin/bash"' /etc/passwd
2239	@end example
2240	}:
2241
2242	@example
2243	sed -n '/bash$/p' /etc/passwd
2244	@end example
2245
2246	@cindex empty regular expression
2247	@cindex @value{SSEDEXT}, modifiers and the empty regular expression
2248	The empty regular expression @samp{//} repeats the last regular
2249	expression match (the same holds if the empty regular expression is
2250	passed to the @code{s} command). Note that modifiers to regular expressions
2251	are evaluated when the regular expression is compiled, thus it is invalid to
2252	specify them together with the empty regular expression.
2253
2254	@item \%@var{regexp}%
2255	(The @code{%} may be replaced by any other single character.)
2256
2257	@cindex Slash character, in regular expressions
2258	This also matches the regular expression @var{regexp},
2259	but allows one to use a different delimiter than @code{/}.
2260	This is particularly useful if the @var{regexp} itself contains
2261	a lot of slashes, since it avoids the tedious escaping of every @code{/}.
2262	If @var{regexp} itself includes any delimiter characters,
2263	each must be escaped by a backslash (@code{\}).
2264
2265	The following commands are equivalent. They print lines
2266	which start with @samp{/home/alice/documents/}:
2267
2268	@example
2269	sed -n '/^\/home\/alice\/documents\//p'
2270	sed -n '\%^/home/alice/documents/%p'
2271	sed -n '\;^/home/alice/documents/;p'
2272	@end example
2273
2274
2275	@item /@var{regexp}/I
2276	@itemx \%@var{regexp}%I
2277	@cindex GNU extensions, @code{I} modifier
2278	@cindex case insensitive, regular expression
2279	The @code{I} modifier to regular-expression matching is a GNU
2280	extension which causes the @var{regexp} to be matched in
2281	a case-insensitive manner.
2282
2283	In many other programming languages, a lower case @code{i} is used
2284	for case-insensitive regular expression matching. However, in @command{sed}
2285	the @code{i} is used for the insert command (@pxref{insert command}).
2286
2287	Observe the difference between the following examples.
2288
2289	In this example, @code{/b/I} is the address: a regular expression with the
2290	@code{I} modifier. @code{d} is the delete command:
2291
2292	@example
2293	$ printf "%s\n" a b c | sed '/b/Id'
2294	a
2295	c
2296	@end example
2297
2298	Here, @code{/b/} is the address: a regular expression.
2299	@code{i} is the insert command.
2300	@code{d} is the value to insert.
2301	A line with @samp{d} is then inserted above the matched line:
2302
2303	@example
2304	$ printf "%s\n" a b c | sed '/b/id'
2305	a
2306	d
2307	b
2308	c
2309	@end example
2310
2311	@item /@var{regexp}/M
2312	@itemx \%@var{regexp}%M
2313	@cindex @value{SSEDEXT}, @code{M} modifier
2314	The @code{M} modifier to regular-expression matching is a @value{SSED}
2315	extension which directs @value{SSED} to match the regular expression
2316	in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
2317	match respectively (in addition to the normal behavior) the empty string
2318	after a newline, and the empty string before a newline. There are
2319	special character sequences
2320	@ifclear PERL
2321	(@code{\`} and @code{\'})
2322	@end ifclear
2323	which always match the beginning or the end of the buffer.
2324	In addition,
2325	the period character does not match a new-line character in
2326	multi-line mode.
2327	@end table
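
The @code{M} modifier only matters when the pattern space holds more
than one line. The sketch below (assuming @value{SSED}; variable
names are illustrative) uses @code{N} to build a two-line pattern
space and then tests the same regular expression with and without
the modifier:

```shell
# N appends the next input line, so the pattern space is "a<newline>b".

# With M, ^ and $ also match next to the embedded newline: "b" is found.
with_m=$(printf 'a\nb\n' | sed -n 'N; /^b$/Mp')

# Without M, ^ and $ match only at the ends of the whole pattern space.
without_m=$(printf 'a\nb\n' | sed -n 'N; /^b$/p')

echo "with M:    [$with_m]"      # both lines are printed
echo "without M: [$without_m]"   # nothing is printed
```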
2328
2329
2330	@cindex regex addresses and pattern space
2331	@cindex regex addresses and input lines
2332	Regex addresses operate on the content of the current
2333	pattern space. If the pattern space is changed (for example with the @code{s///}
2334	command) the regular expression matching will operate on the changed text.
2335
2336	In the following example, automatic printing is disabled with
2337	@option{-n}. The @code{s/2/X/} command changes lines containing
2338	@samp{2} to @samp{X}. The command @code{/[0-9]/p} matches
2339	lines with digits and prints them.
2340	Because the second line is changed before the @code{/[0-9]/} regex,
2341	it will not match and will not be printed:
2342
2343	@codequoteundirected on
2344	@codequotebacktick on
2345	@example
2346	@group
2347	$ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
2348	1
2349	3
2350	@end group
2351	@end example
2352	@codequoteundirected off
2353	@codequotebacktick off
2354
2355
2356	@node Range Addresses
2357	@section Range Addresses
2358
2359	@cindex Range of lines
2360	@cindex Several lines, selecting
2361	An address range is specified by two addresses
2362	separated by a comma (@code{,}). An address range matches lines
2363	starting from where the first address matches, and continues
2364	until the second address matches (inclusively):
2365
2366	@example
2367	$ seq 10 | sed -n '4,6p'
2368	4
2369	5
2370	6
2371	@end example
2372
2373	If the second address is a @var{regexp}, then checking for the
2374	ending match will start with the line @emph{following} the
2375	line which matched the first address: a range will always
2376	span at least two lines (except of course if the input stream
2377	ends).
2378
2379	@example
2380	$ seq 10 | sed -n '4,/[0-9]/p'
2381	4
2382	5
2383	@end example
2384
2385	If the second address is a @var{number} less than (or equal to)
2386	the line matching the first address, then only the one line is
2387	matched:
2388
2389	@example
2390	$ seq 10 | sed -n '4,1p'
2391	4
2392	@end example
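
The two ends of a range need not be of the same kind. As a brief
sketch (variable names are illustrative), a regular expression can
open a range that a line number closes, and vice versa:

```shell
# Regexp start, numeric end: from the line that is exactly "7" up to line 9.
regex_to_number=$(seq 10 | sed -n '/^7$/,9p')

# Numeric start, regexp end: from line 2 up to the next line containing "4".
number_to_regex=$(seq 10 | sed -n '2,/4/p')

echo "$regex_to_number"   # 7, 8 and 9, one per line
echo "$number_to_regex"   # 2, 3 and 4, one per line
```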
2393
2394	@anchor{Zero Address Regex Range}
2395	@cindex Special addressing forms
2396	@cindex Range with start address of zero
2397	@cindex Zero, as range start address
2398	@cindex @var{addr1},+N
2399	@cindex @var{addr1},~N
2400	@cindex GNU extensions, special two-address forms
2401	@cindex GNU extensions, @code{0} address
2402	@cindex GNU extensions, 0,@var{addr2} addressing
2403	@cindex GNU extensions, @var{addr1},+@var{N} addressing
2404	@cindex GNU extensions, @var{addr1},~@var{N} addressing
2405	@value{SSED} also supports some special two-address forms; all these
2406	are GNU extensions:
2407	@table @code
2408	@item 0,/@var{regexp}/
2409	A line number of @code{0} can be used in an address specification like
2410	@code{0,/@var{regexp}/} so that @command{sed} will try to match
2411	@var{regexp} in the first input line too. In other words,
2412	@code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
2413	except that if @var{addr2} matches the very first line of input the
2414	@code{0,/@var{regexp}/} form will consider it to end the range, whereas
2415	the @code{1,/@var{regexp}/} form will match the beginning of its range and
2416	hence make the range span up to the @emph{second} occurrence of the
2417	regular expression.
2418
2419	The following examples demonstrate the difference between starting
2420	with address 1 and 0:
2421
2422	@example
2423	$ seq 10 | sed -n '1,/[0-9]/p'
2424	1
2425	2
2426
2427	$ seq 10 | sed -n '0,/[0-9]/p'
2428	1
2429	@end example
2430
2431
2432	@item @var{addr1},+@var{N}
2433	Matches @var{addr1} and the @var{N} lines following @var{addr1}.
2434
2435	@example
2436	$ seq 10 | sed -n '6,+2p'
2437	6
2438	7
2439	8
2440	@end example
2441
2442	@var{addr1} can be a line number or a regular expression.
2443
2444	@item @var{addr1},~@var{N}
2445	Matches @var{addr1} and the lines following @var{addr1}
2446	until the next line whose input line number is a multiple of @var{N}.
2447	The following command prints starting at line 6, until the next line which
2448	is a multiple of 4 (i.e. line 8):
2449
2450	@example
2451	$ seq 10 | sed -n '6,~4p'
2452	6
2453	7
2454	8
2455	@end example
2456
2457	@var{addr1} can be a line number or a regular expression.
2458
2459	@end table
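
Since @var{addr1} may also be a regular expression in the
@code{+@var{N}} and @code{~@var{N}} forms, the following sketch
(assuming @value{SSED}; variable names are illustrative) starts both
ranges at the line that consists of @samp{5}:

```shell
# The matching line plus the one line that follows it.
plus=$(seq 10 | sed -n '/^5$/,+1p')

# The matching line up to the next line number that is a multiple of 4.
mult=$(seq 10 | sed -n '/^5$/,~4p')

echo "$plus"   # 5 and 6, one per line
echo "$mult"   # 5, 6, 7 and 8, one per line
```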
2460
2461
2462
2463	@node Zero Address
2464	@section Zero Address
2465	@cindex Zero Address
2466	As a @value{SSED} extension, @code{0} address can be used in two cases:
2467	@enumerate
2468	@item
2469	In a regex range address such as @code{0,/@var{regexp}/}
2470	(@pxref{Zero Address Regex Range}).
2471	@item
2472	With the @code{r} command, inserting a file before the first line
2473	(@pxref{Adding a header to multiple files}).
2474	@end enumerate
2475
2476	Note that these are the only places where the @code{0} address makes
2477	sense; commands which are given the @code{0} address in any
2478	other way will give an error.
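
One common use of the @code{0,/@var{regexp}/} form is to change only
the first matching line of the input, even when that is line 1. The
sketch below (assuming @value{SSED}; the sample text is made up for
illustration) contrasts it with the @code{1,/@var{regexp}/} form:

```shell
# Line 1 already matches: 0,/foo/ ends the range right there.
zero=$(printf 'foo\nfoo\nfoo\n' | sed '0,/foo/s/foo/bar/')

# 1,/foo/ starts on line 1 and looks for the end from line 2 onwards.
one=$(printf 'foo\nfoo\nfoo\n' | sed '1,/foo/s/foo/bar/')

echo "$zero"   # bar, foo, foo
echo "$one"    # bar, bar, foo
```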
2479
2480
2481
2482	@node sed regular expressions
2483	@chapter Regular Expressions: selecting text
2484
2485	@menu
2486	* Regular Expressions Overview:: Overview of Regular expression in @command{sed}
2487	* BRE vs ERE:: Basic (BRE) and extended (ERE) regular expression
2488	syntax
2489	* BRE syntax:: Overview of basic regular expression syntax
2490	* ERE syntax:: Overview of extended regular expression syntax
2491	* Character Classes and Bracket Expressions::
2492	* regexp extensions:: Additional regular expression commands
2493	* Back-references and Subexpressions:: Back-references and Subexpressions
2494	* Escapes:: Specifying special characters
2495	* Locale Considerations:: Multibyte characters and locale considerations
2496	@end menu
2497
2498	@node Regular Expressions Overview
2499	@section Overview of regular expression in @command{sed}
2500
2501	@c NOTE: Keep examples in the 'overview' section
2502	@c neutral in regards to BRE/ERE - to ease understanding.
2503
2504
2505	To know how to use @command{sed}, people should understand regular
2506	expressions (@dfn{regexp} for short). A regular expression
2507	is a pattern that is matched against a
2508	subject string from left to right. Most characters are
2509	@dfn{ordinary}: they stand for
2510	themselves in a pattern, and match the corresponding characters.
2511	Regular expressions in @command{sed} are specified between two
2512	slashes.
2513
2514	The following command prints lines containing the string @samp{hello}:
2515
2516	@example
2517	sed -n '/hello/p'
2518	@end example
2519
2520	The above example is equivalent to this @command{grep} command:
2521
2522	@example
2523	grep 'hello'
2524	@end example
2525
2526	The power of regular expressions comes from the ability to include
2527	alternatives and repetitions in the pattern. These are encoded in the
2528	pattern by the use of @dfn{special characters}, which do not stand for
2529	themselves but instead are interpreted in some special way.
2530
2531	The character @code{^} (caret) in a regular expression matches the
2532	beginning of the line. The character @code{.} (dot) matches any single
2533	character. The following @command{sed} command matches and prints
2534	lines which start with the letter @samp{b}, followed by any single character,
2535	followed by the letter @samp{d}:
2536
2537	@example
2538	$ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
2539	bad
2540	bed
2541	bid
2542	body
2543	@end example
2544
2545	The following sections explain the meaning and usage of special
2546	characters in regular expressions.
2547
2548	@node BRE vs ERE
2549	@section Basic (BRE) and extended (ERE) regular expression
2550
2551	Basic and extended regular expressions are two variations on the
2552	syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the
2553	default in @command{sed} (and similarly in @command{grep}).
2554	Use the POSIX-specified @option{-E} option (@option{-r},
2555	@option{--regexp-extended}) to enable Extended Regular Expression (ERE) syntax.
2556
2557	In @value{SSED}, the only difference between basic and extended regular
2558	expressions is in the behavior of a few special characters: @samp{?},
2559	@samp{+}, parentheses, braces (@samp{@{@}}), and @samp{|}.
2560
2561	With basic (BRE) syntax, these characters do not have special meaning
2562	unless prefixed with a backslash (@samp{\}); while with extended (ERE) syntax
2563	it is reversed: these characters are special unless they are prefixed
2564	with backslash (@samp{\}).
2565
2566	@multitable @columnfractions .28 .36 .35
2567
2568	@headitem Desired pattern
2569	@tab Basic (BRE) Syntax
2570	@tab Extended (ERE) Syntax
2571
2572	@item literal @samp{+} (plus sign)
2573
2574	@tab
2575	@exampleindent 0
2576	@codequoteundirected on
2577	@codequotebacktick on
2578	@example
2579	$ echo 'a+b=c' > foo
2580	$ sed -n '/a+b/p' foo
2581	a+b=c
2582	@end example
2583	@codequotebacktick off
2584	@codequoteundirected off
2585
2586	@tab
2587	@exampleindent 0
2588	@codequoteundirected on
2589	@codequotebacktick on
2590	@example
2591	$ echo 'a+b=c' > foo
2592	$ sed -E -n '/a\+b/p' foo
2593	a+b=c
2594	@end example
2595	@codequotebacktick off
2596	@codequoteundirected off
2597
2598
2599	@item One or more @samp{a} characters followed by @samp{b}
2600	(plus sign as special meta-character)
2601
2602	@tab
2603	@exampleindent 0
2604	@codequoteundirected on
2605	@codequotebacktick on
2606	@example
2607	$ echo aab > foo
2608	$ sed -n '/a\+b/p' foo
2609	aab
2610	@end example
2611	@codequotebacktick off
2612	@codequoteundirected off
2613
2614	@tab
2615	@exampleindent 0
2616	@codequoteundirected on
2617	@codequotebacktick on
2618	@example
2619	$ echo aab > foo
2620	$ sed -E -n '/a+b/p' foo
2621	aab
2622	@end example
2623	@codequotebacktick off
2624	@codequoteundirected off
2625
2626	@end multitable
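
The same reversal applies to the other characters listed above. As a
hedged sketch (assuming @value{SSED}, since @samp{\?} and @samp{\|}
in basic syntax are GNU extensions; the sample strings are made up),
each pair below expresses one pattern in both syntaxes:

```shell
text='color colour'

# Optional "u": \? is special in BRE, plain ? is special in ERE.
bre_opt=$(echo "$text" | sed 's/colou\?r/X/g')
ere_opt=$(echo "$text" | sed -E 's/colou?r/X/g')

# Alternation: \| in BRE, plain | in ERE.
bre_alt=$(echo 'cat dog' | sed 's/cat\|dog/pet/g')
ere_alt=$(echo 'cat dog' | sed -E 's/cat|dog/pet/g')

# In ERE an escaped \? is a literal question mark.
literal=$(echo 'why?' | sed -E 's/y\?/!/')

echo "$bre_opt / $ere_opt"   # X X / X X
echo "$bre_alt / $ere_alt"   # pet pet / pet pet
echo "$literal"              # wh!
```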
2627
2628
2629
2630
2631	@node BRE syntax
2632	@section Overview of basic regular expression syntax
2633
2634	Here is a brief description
2635	of regular expression syntax as used in @command{sed}.
2636
2637	@table @code
2638	@item @var{char}
2639	A single ordinary character matches itself.
2640
2641	@item *
2642	@cindex GNU extensions, to basic regular expressions
2643	Matches a sequence of zero or more instances of matches for the
2644	preceding regular expression, which must be an ordinary character, a
2645	special character preceded by @code{\}, a @code{.}, a grouped regexp
2646	(see below), or a bracket expression. As a GNU extension, a
2647	postfixed regular expression can also be followed by @code{*}; for
2648	example, @code{a**} is equivalent to @code{a*}. POSIX
2649	1003.1-2001 says that @code{*} stands for itself when it appears at
2650	the start of a regular expression or subexpression, but many
2651	non-GNU implementations do not support this and portable
2652	scripts should instead use @code{\*} in these contexts.
2653	@item .
2654	Matches any character, including newline.
2655
2656	@item ^
2657	Matches the null string at beginning of the pattern space, i.e. what
2658	appears after the circumflex must appear at the beginning of the
2659	pattern space.
2660
2661	In most scripts, pattern space is initialized to the content of each
2662	line (@pxref{Execution Cycle, , How @code{sed} works}). So, it is a
2663	useful simplification to think of @code{^#include} as matching only
2664	lines where @samp{#include} is the first thing on the line---if there is
2665	any preceding space, for example, the match fails. This simplification is
2666	valid as long as the original content of pattern space is not modified,
2667	for example with an @code{s} command.
2668
2669	@code{^} acts as a special character only at the beginning of the
2670	regular expression or subexpression (that is, after @code{\(} or
2671	@code{\|}). Portable scripts should avoid @code{^} at the beginning of
2672	a subexpression, though, as POSIX allows implementations that
2673	treat @code{^} as an ordinary character in that context.
2674
2675	@item $
2676	It is the same as @code{^}, but refers to end of pattern space.
2677	@code{$} also acts as a special character only at the end
2678	of the regular expression or subexpression (that is, before @code{\)}
2679	or @code{\|}), and its use at the end of a subexpression is not
2680	portable.
2681
2682
2683	@item [@var{list}]
2684	@itemx [^@var{list}]
2685	Matches any single character in @var{list}: for example,
2686	@code{[aeiou]} matches all vowels. A list may include
2687	sequences like @code{@var{char1}-@var{char2}}, which
2688	matches any character between (inclusive) @var{char1}
2689	and @var{char2}.
2690	@xref{Character Classes and Bracket Expressions}.
2691
2692	@item \+
2693	@cindex GNU extensions, to basic regular expressions
2694	As @code{*}, but matches one or more. It is a GNU extension.
2695
2696	@item \?
2697	@cindex GNU extensions, to basic regular expressions
2698	As @code{*}, but only matches zero or one. It is a GNU extension.
2699
2700	@item \@{@var{i}\@}
2701	As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
2702	decimal integer; for portability, keep it between 0 and 255
2703	inclusive).
2704
2705	@item \@{@var{i},@var{j}\@}
2706	Matches between @var{i} and @var{j}, inclusive, sequences.
2707
2708	@item \@{@var{i},\@}
2709	Matches more than or equal to @var{i} sequences.
2710
2711	@item \(@var{regexp}\)
2712	Groups the inner @var{regexp} as a whole; this is used to:
2713
2714	@itemize @bullet
2715	@item
2716	@cindex GNU extensions, to basic regular expressions
2717	Apply postfix operators, like @code{\(abcd\)*}:
2718	this will search for zero or more whole sequences
2719	of @samp{abcd}, while @code{abcd*} would search
2720	for @samp{abc} followed by zero or more occurrences
2721	of @samp{d}. Note that support for @code{\(abcd\)*} is
2722	required by POSIX 1003.1-2001, but many non-GNU
2723	implementations do not support it and hence it is not universally
2724	portable.
2725
2726	@item
2727	Use back references (see below).
2728	@end itemize
2729
2730
2731	@item @var{regexp1}\|@var{regexp2}
2732	@cindex GNU extensions, to basic regular expressions
2733	Matches either @var{regexp1} or @var{regexp2}. Use
2734	parentheses to use complex alternative regular expressions.
2735	The matching process tries each alternative in turn, from
2736	left to right, and the first one that succeeds is used.
2737	It is a GNU extension.
2738
2739	@item @var{regexp1}@var{regexp2}
2740	Matches the concatenation of @var{regexp1} and @var{regexp2}.
2741	Concatenation binds more tightly than @code{\|}, @code{^}, and
2742	@code{$}, but less tightly than the other regular expression
2743	operators.
2744
2745	@item \@var{digit}
2746	Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
2747	subexpression in the regular expression. This is called a @dfn{back
2748	reference}. Subexpressions are implicitly numbered by counting
2749	occurrences of @code{\(} left-to-right.
2750
2751	@item \n
2752	Matches the newline character.
2753
2754	@item \@var{char}
2755	Matches @var{char}, where @var{char} is one of @code{$},
2756	@code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
2757	Note that the only C-like
2758	backslash sequences that you can portably assume to be
2759	interpreted are @code{\n} and @code{\\}; in particular
2760	@code{\t} is not portable, and matches a @samp{t} under most
2761	implementations of @command{sed}, rather than a tab character.
2762
2763	@end table
2764
2765	@cindex Greedy regular expression matching
2766	Note that the regular expression matcher is greedy, i.e., matches
2767	are attempted from left to right and, if two or more matches are
2768	possible starting at the same character, it selects the longest.
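
Greedy matching is easiest to see with a substitution. In this small
sketch (the sample string is made up for illustration), @samp{.*X}
consumes everything up to the @emph{last} @samp{X}, whereas a negated
bracket expression stops at the first one:

```shell
# .* is greedy: it swallows "aXb" before the final X is matched.
greedy=$(echo 'aXbXc' | sed 's/.*X/-/')

# [^X]* cannot cross an X, so the match ends at the first X.
bounded=$(echo 'aXbXc' | sed 's/[^X]*X/-/')

echo "$greedy"    # -c
echo "$bounded"   # -bXc
```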
2769
2770	@noindent
2771	Examples:
2772	@table @samp
2773	@item abcdef
2774	Matches @samp{abcdef}.
2775
2776	@item a*b
2777	Matches zero or more @samp{a}s followed by a single
2778	@samp{b}. For example, @samp{b} or @samp{aaaaab}.
2779
2780	@item a\?b
2781	Matches @samp{b} or @samp{ab}.
2782
2783	@item a\+b\+
2784	Matches one or more @samp{a}s followed by one or more
2785	@samp{b}s: @samp{ab} is the shortest possible match, but
2786	other examples are @samp{aaaab} or @samp{abbbbb} or
2787	@samp{aaaaaabbbbbbb}.
2788
2789	@item .*
2790	@itemx .\+
2791	These two both match all the characters in a string;
2792	however, the first matches every string (including the empty
2793	string), while the second matches only strings containing
2794	at least one character.
2795
2796	@item ^main.*(.*)
2797	This matches a string starting with @samp{main},
2798	followed by an opening and closing
2799	parenthesis. The @samp{n}, @samp{(} and @samp{)} need not
2800	be adjacent.
2801
2802	@item ^#
2803	This matches a string beginning with @samp{#}.
2804
2805	@item \\$
2806	This matches a string ending with a single backslash. The
2807	regexp contains two backslashes for escaping.
2808
2809	@item \$
2810	Instead, this matches a string consisting of a single dollar sign,
2811	because it is escaped.
2812
2813	@item [a-zA-Z0-9]
2814	In the C locale, this matches any ASCII letters or digits.
2815
2816	@item [^ @kbd{@key{TAB}}]\+
2817	(Here @kbd{@key{TAB}} stands for a single tab character.)
2818	This matches a string of one or more
2819	characters, none of which is a space or a tab.
2820	Usually this means a word.
2821
2822	@item ^\(.*\)\n\1$
2823	This matches a string consisting of two equal substrings separated by
2824	a newline.
2825
2826	@item .\@{9\@}A$
2827	This matches nine characters followed by an @samp{A} at the end of a line.
2828
2829	@item ^.\@{15\@}A
2830	This matches the start of a string that contains 16 characters,
2831	the last of which is an @samp{A}.
2832
2833	@end table
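
@noindent
The back-reference pattern from the table can be tried on a two-line
pattern space. This is only a sketch: the input is made up, and the
@code{N} command is used here to join two input lines so that the
regexp sees an embedded newline. The second command prints nothing
because the two lines differ.

@example
$ printf 'ab\nab\n' | sed -n 'N;/^\(.*\)\n\1$/p'
ab
ab

$ printf 'ab\ncd\n' | sed -n 'N;/^\(.*\)\n\1$/p'
@end example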
2834
2835
2836	@node ERE syntax
2837	@section Overview of extended regular expression syntax
2838	@cindex Extended regular expressions, syntax
2839
The only difference between basic and extended regular expressions is in
the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
braces (@samp{@{@}}), and @samp{|}. While basic regular expressions
require these to be escaped if you want them to behave as special
characters, when using extended regular expressions you must escape
them if you want them @emph{to match a literal character}. @samp{|}
is special here because @samp{\|} is a GNU extension -- standard
basic regular expressions do not provide its functionality.
2848
2849	@noindent
2850	Examples:
2851	@table @code
2852	@item abc?
2853	becomes @samp{abc\?} when using extended regular expressions. It matches
2854	the literal string @samp{abc?}.
2855
2856	@item c\+
2857	becomes @samp{c+} when using extended regular expressions. It matches
2858	one or more @samp{c}s.
2859
2860	@item a\@{3,\@}
2861	becomes @samp{a@{3,@}} when using extended regular expressions. It matches
2862	three or more @samp{a}s.
2863
@item \(abc\)\@{2,3\@}
becomes @samp{(abc)@{2,3@}} when using extended regular expressions. It
matches either @samp{abcabc} or @samp{abcabcabc}.

@item \(abc*\)\1
becomes @samp{(abc*)\1} when using extended regular expressions.
Backreferences must still be escaped when using extended regular
expressions.

@item a\|b
becomes @samp{a|b} when using extended regular expressions. It matches
@samp{a} or @samp{b}.
2876	@end table
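
@noindent
The following sketch (made-up input) runs the same substitutions in
both syntaxes. The first pair uses grouping and an interval, which
must be escaped in basic syntax and left bare with @option{-E}; the
second pair matches a literal @samp{+}, where the escaping is
reversed.

@example
$ echo 'abcabc' | sed 's/\(abc\)\@{2\@}/X/'
X

$ echo 'abcabc' | sed -E 's/(abc)@{2@}/X/'
X

$ echo 'a+b' | sed 's/a+b/X/'
X

$ echo 'a+b' | sed -E 's/a\+b/X/'
X
@end example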
2877
2878	@node Character Classes and Bracket Expressions
2879	@section Character Classes and Bracket Expressions
2880
2881	@c The 'character class' section is shamelessly copied from grep's manual.
2882
2883	@cindex bracket expression
2884	@cindex character class
2885	A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
2886	@samp{]}.
2887	It matches any single character in that list;
2888	if the first character of the list is the caret @samp{^},
2889	then it matches any character @strong{not} in the list.
2890	For example, the following command replaces the strings
2891	@samp{gray} or @samp{grey} with @samp{blue}:
2892
2893	@example
2894	sed 's/gr[ae]y/blue/'
2895	@end example
2896
2897	@c TODO: fix 'ref' to look good in both HTML and PDF
2898	Bracket expressions can be used in both
2899	@ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
2900	regular expressions (that is, with or without the @option{-E}/@option{-r}
2901	options).
2902
2903	@cindex range expression
2904	Within a bracket expression, a @dfn{range expression} consists of two
2905	characters separated by a hyphen.
2906	It matches any single character that
2907	sorts between the two characters, inclusive.
2908	In the default C locale, the sorting sequence is the native character
2909	order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
2910
2911
2912	Finally, certain named classes of characters are predefined within
2913	bracket expressions, as follows.
2914
2915	These named classes must be used @emph{inside} brackets
2916	themselves. Correct usage:
@example
$ echo 1 | sed 's/[[:digit:]]/X/'
X
@end example
2921
2922	Incorrect usage is rejected by newer @command{sed} versions.
2923	Older versions accepted it but treated it as a single bracket expression
2924	(which is equivalent to @samp{[dgit:]},
2925	that is, only the characters @var{d/g/i/t/:}):
@example
# current GNU sed versions - incorrect usage rejected
$ echo 1 | sed 's/[:digit:]/X/'
sed: character class syntax is [[:space:]], not [:space:]

# older GNU sed versions
$ echo 1 | sed 's/[:digit:]/X/'
1
@end example
2935
2936
2937	@cindex classes of characters
2938	@cindex character classes
2939	@cindex named character classes
2940	@table @samp
2941
2942	@item [:alnum:]
2943	@opindex alnum @r{character class}
2944	@cindex alphanumeric characters
2945	Alphanumeric characters:
2946	@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
2947	character encoding, this is the same as @samp{[0-9A-Za-z]}.
2948
2949	@item [:alpha:]
2950	@opindex alpha @r{character class}
2951	@cindex alphabetic characters
2952	Alphabetic characters:
2953	@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
2954	character encoding, this is the same as @samp{[A-Za-z]}.
2955
2956	@item [:blank:]
2957	@opindex blank @r{character class}
2958	@cindex blank characters
2959	Blank characters:
2960	space and tab.
2961
2962	@item [:cntrl:]
2963	@opindex cntrl @r{character class}
2964	@cindex control characters
2965	Control characters.
2966	In ASCII, these characters have octal codes 000
2967	through 037, and 177 (DEL).
2968	In other character sets, these are
2969	the equivalent characters, if any.
2970
2971	@item [:digit:]
2972	@opindex digit @r{character class}
2973	@cindex digit characters
2974	@cindex numeric characters
2975	Digits: @code{0 1 2 3 4 5 6 7 8 9}.
2976
2977	@item [:graph:]
2978	@opindex graph @r{character class}
2979	@cindex graphic characters
2980	Graphical characters:
2981	@samp{[:alnum:]} and @samp{[:punct:]}.
2982
2983	@item [:lower:]
2984	@opindex lower @r{character class}
2985	@cindex lower-case letters
2986	Lower-case letters; in the @samp{C} locale and ASCII character
2987	encoding, this is
2988	@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.
2989
2990	@item [:print:]
2991	@opindex print @r{character class}
2992	@cindex printable characters
2993	Printable characters:
2994	@samp{[:alnum:]}, @samp{[:punct:]}, and space.
2995
2996	@item [:punct:]
2997	@opindex punct @r{character class}
2998	@cindex punctuation characters
Punctuation characters; in the @samp{C} locale and ASCII character
encoding, this is
@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.
3002
3003	@item [:space:]
3004	@opindex space @r{character class}
3005	@cindex space characters
3006	@cindex whitespace characters
3007	Space characters: in the @samp{C} locale, this is
3008	tab, newline, vertical tab, form feed, carriage return, and space.
3009
3010
3011	@item [:upper:]
3012	@opindex upper @r{character class}
3013	@cindex upper-case letters
3014	Upper-case letters: in the @samp{C} locale and ASCII character
3015	encoding, this is
3016	@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.
3017
3018	@item [:xdigit:]
3019	@opindex xdigit @r{character class}
3020	@cindex xdigit class
3021	@cindex hexadecimal digits
3022	Hexadecimal digits:
3023	@code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.
3024
3025	@end table
3026	Note that the brackets in these class names are
3027	part of the symbolic names, and must be included in addition to
3028	the brackets delimiting the bracket expression.
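
@noindent
For instance (a sketch with made-up input), several named classes can
share one bracket expression, and a class can be negated with a
leading @samp{^}:

@example
$ echo 'Hello, World 42!' | sed 's/[[:upper:][:digit:]]/#/g'
#ello, #orld ##!

$ echo 'Hello, World 42!' | sed 's/[^[:alnum:]]//g'
HelloWorld42
@end example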
3029
Most meta-characters lose their special meaning inside bracket expressions.
The following characters remain special, depending on their position:
3031
3032	@table @samp
3033	@item ]
3034	ends the bracket expression if it's not the first list item.
3035	So, if you want to make the @samp{]} character a list item,
3036	you must put it first.
3037
3038	@item -
3039	represents the range if it's not first or last in a list or the ending point
3040	of a range.
3041
3042	@item ^
3043	represents the characters not in the list.
3044	If you want to make the @samp{^}
3045	character a list item, place it anywhere but first.
3046	@end table
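
@noindent
The placement rules above can be combined. In this sketch (made-up
input), the first bracket expression lists @samp{]} first, @samp{^}
in the middle and @samp{-} last, so all three are taken literally;
the second is a complemented list whose only member is @samp{]}.

@example
$ echo 'a]b-c^d' | sed 's/[]^-]/X/g'
aXbXcXd

$ echo 'a]b' | sed 's/[^]]/X/g'
X]X
@end example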
3047
3048	TODO: incorporate this paragraph (copied verbatim from BRE section).
3049
3050	@cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
3051	The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
3052	are normally not special within @var{list}. For example, @code{[\*]}
3053	matches either @samp{\} or @samp{*}, because the @code{\} is not
3054	special here. However, strings like @code{[.ch.]}, @code{[=a=]}, and
3055	@code{[:space:]} are special within @var{list} and represent collating
3056	symbols, equivalence classes, and character classes, respectively, and
3057	@code{[} is therefore special within @var{list} when it is followed by
3058	@code{.}, @code{=}, or @code{:}. Also, when not in
3059	@env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
3060	@code{\t} are recognized within @var{list}. @xref{Escapes}.
3061	@c ********
3062
3063
3064	@c TODO: improve explanation about collation classes and equivalence classes
3065	@c perhaps dedicate a section to Locales ??
3066
3067	@table @samp
3068	@item [.
3069	represents the open collating symbol.
3070
3071	@item .]
3072	represents the close collating symbol.
3073
3074	@item [=
3075	represents the open equivalence class.
3076
3077	@item =]
3078	represents the close equivalence class.
3079
3080	@item [:
3081	represents the open character class symbol, and should be followed by a
3082	valid character class name.
3083
3084	@item :]
3085	represents the close character class symbol.
3086	@end table
3087
3088
3089	@node regexp extensions
3090	@section regular expression extensions
3091
3092	The following sequences have special meaning inside regular expressions
3093	(used in @ref{Regexp Addresses,,addresses} and the @code{s} command).
3094
3095	These can be used in both
3096	@ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
3097	regular expressions (that is, with or without the @option{-E}/@option{-r}
3098	options).
3099
3100	@table @code
@item \w
Matches any ``word'' character. A ``word'' character is any
letter or digit or the underscore character.

@example
$ echo "abc %-= def." | sed 's/\w/X/g'
XXX %-= XXX.
@end example


@item \W
Matches any ``non-word'' character.

@example
$ echo "abc %-= def." | sed 's/\W/X/g'
abcXXXXXdefX
@end example


@item \b
Matches a word boundary; that is, it matches if the character
to the left is a ``word'' character and the character to the
right is a ``non-word'' character, or vice-versa.

@example
$ echo "abc %-= def." | sed 's/\b/X/g'
XabcX %-= XdefX.
@end example


@item \B
Matches everywhere but on a word boundary; that is, it matches
if the character to the left and the character to the right
are either both ``word'' characters or both ``non-word''
characters.

@example
$ echo "abc %-= def." | sed 's/\B/X/g'
aXbXc X%X-X=X dXeXf.X
@end example


@item \s
Matches whitespace characters (spaces and tabs).
Newlines embedded in the pattern/hold spaces will also match:

@example
$ echo "abc %-= def." | sed 's/\s/X/g'
abcX%-=Xdef.
@end example


@item \S
Matches non-whitespace characters.

@example
$ echo "abc %-= def." | sed 's/\S/X/g'
XXX XXX XXXX
@end example


@item \<
Matches the beginning of a word.

@example
$ echo "abc %-= def." | sed 's/\</X/g'
Xabc %-= Xdef.
@end example


@item \>
Matches the end of a word.

@example
$ echo "abc %-= def." | sed 's/\>/X/g'
abcX %-= defX.
@end example


@item \`
Matches only at the start of pattern space. This is different
from @code{^} in multi-line mode.

Compare the following two examples:

@example
$ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
Xa
Xb
Xc

$ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
Xa
b
c
@end example
3197
3198	@item \'
3199	Matches only at the end of pattern space. This is different
3200	from @code{$} in multi-line mode.
3201
3202
3203
3204	@end table
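
@noindent
Two entries in the table above have no accompanying example; the
following sketch (made-up input) fills that gap. The first two
commands contrast @code{$} with @code{\'} in multi-line mode,
mirroring the comparison given for @code{\`}; the script is
double-quoted in the second command only because it contains a
single quote. The last command shows @code{\s} matching a newline
that @code{N} embedded in the pattern space.

@codequoteundirected on
@codequotebacktick on
@example
$ printf "a\nb\nc\n" | sed 'N;N;s/$/X/gm'
aX
bX
cX

$ printf "a\nb\nc\n" | sed "N;N;s/\'/X/gm"
a
b
cX

$ printf "a\nb\n" | sed 'N;s/\s/X/'
aXb
@end example
@codequoteundirected off
@codequotebacktick off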
3205
3206
3207	@node Back-references and Subexpressions
3208	@section Back-references and Subexpressions
3209	@cindex subexpression
3210	@cindex back-reference
3211
@dfn{Back-references} are regular expression commands which refer to a
previous part of the matched regular expression. Back-references are
specified with backslash and a single digit (e.g. @samp{\1}). The
part of the regular expression they refer to is called a
@dfn{subexpression}, and is designated with parentheses.
3217
3218	Back-references and subexpressions are used in two cases: in the
3219	regular expression search pattern, and in the @var{replacement} part
3220	of the @command{s} command (@pxref{Regexp Addresses,,Regular
3221	Expression Addresses} and @ref{The "s" Command}).
3222
3223	In a regular expression pattern, back-references are used to match
3224	the same content as a previously matched subexpression. In the
3225	following example, the subexpression is @samp{.} - any single
3226	character (being surrounded by parentheses makes it a
3227	subexpression). The back-reference @samp{\1} asks to match the same
3228	content (same character) as the sub-expression.
3229
3230	The command below matches words starting with any character,
3231	followed by the letter @samp{o}, followed by the same character as the
3232	first.
3233
3234	@example
3235	$ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
3236	bob
3237	mom
3238	non
3239	pop
3240	sos
3241	tot
3242	wow
3243	@end example
3244
3245	Multiple subexpressions are automatically numbered from
3246	left-to-right. This command searches for 6-letter
3247	palindromes (the first three letters are 3 subexpressions,
3248	followed by 3 back-references in reverse order):
3249
3250	@example
3251	$ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
3252	redder
3253	@end example
3254
3255	In the @command{s} command, back-references can be
3256	used in the @var{replacement} part to refer back to subexpressions in
3257	the @var{regexp} part.
3258
The following example uses two subexpressions in the regular
expression to match two space-separated words. The back-references in
the @var{replacement} part print the words in a different order:

@example
$ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
The name is Bond, James Bond.
@end example
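
@noindent
As a further sketch (made-up input), three subexpressions are numbered
left to right and reused in reverse order to rearrange a date; the
second command shows the same kind of reordering written in basic
syntax, where the grouping parentheses must be escaped.

@example
$ echo '2024-01-31' | sed -E 's/([0-9]+)-([0-9]+)-([0-9]+)/\3\/\2\/\1/'
31/01/2024

$ echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/'
world hello
@end example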
3267
3268
When used with alternation, if the group does not participate in the
match then the back-reference makes the whole match fail. For
example, @samp{a(.)|b\1} will not match @samp{ba}. When multiple
regular expressions are given with @option{-e} or from a file
(@samp{-f @var{file}}), back-references are local to each expression.
3274
3275
3276	@node Escapes
3277	@section Escape Sequences - specifying special characters
3278
3279	@cindex GNU extensions, special escapes
3280	Until this chapter, we have only encountered escapes of the form
3281	@samp{\^}, which tell @command{sed} not to interpret the circumflex
3282	as a special character, but rather to take it literally. For
3283	example, @samp{\*} matches a single asterisk rather than zero
3284	or more backslashes.
3285
3286	@cindex @code{POSIXLY_CORRECT} behavior, escapes
3287	This chapter introduces another kind of escape@footnote{All
3288	the escapes introduced here are GNU
3289	extensions, with the exception of @code{\n}. In basic regular
3290	expression mode, setting @code{POSIXLY_CORRECT} disables them inside
3291	bracket expressions.}---that
3292	is, escapes that are applied to a character or sequence of characters
3293	that ordinarily are taken literally, and that @command{sed} replaces
3294	with a special character. This provides a way
3295	of encoding non-printable characters in patterns in a visible manner.
3296	There is no restriction on the appearance of non-printing characters
3297	in a @command{sed} script but when a script is being prepared in the
3298	shell or by text editing, it is usually easier to use one of
3299	the following escape sequences than the binary character it
3300	represents:
3301
3302	The list of these escapes is:
3303
3304	@table @code
3305	@item \a
3306	Produces or matches a @sc{bel} character, that is an ``alert'' (@sc{ascii} 7).
3307
3308	@item \f
3309	Produces or matches a form feed (@sc{ascii} 12).
3310
3311	@item \n
3312	Produces or matches a newline (@sc{ascii} 10).
3313
3314	@item \r
3315	Produces or matches a carriage return (@sc{ascii} 13).
3316
3317	@item \t
3318	Produces or matches a horizontal tab (@sc{ascii} 9).
3319
3320	@item \v
3321	Produces or matches a so called ``vertical tab'' (@sc{ascii} 11).
3322
3323	@item \c@var{x}
3324	Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is
3325	any character. The precise effect of @samp{\c@var{x}} is as follows:
3326	if @var{x} is a lower case letter, it is converted to upper case.
3327	Then bit 6 of the character (hex 40) is inverted. Thus @samp{\cz} becomes
3328	hex 1A, but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B.
3329
3330	@item \d@var{xxx}
3331	Produces or matches a character whose decimal @sc{ascii} value is @var{xxx}.
3332
3333	@item \o@var{xxx}
3334	Produces or matches a character whose octal @sc{ascii} value is @var{xxx}.
3335
3336	@item \x@var{xx}
3337	Produces or matches a character whose hexadecimal @sc{ascii} value is @var{xx}.
3338	@end table
3339
3340	@samp{\b} (backspace) was omitted because of the conflict with
3341	the existing ``word boundary'' meaning.
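
@noindent
A few of the escapes listed above in use (a sketch with made-up
input, assuming @value{SSED} is not running in @env{POSIXLY_CORRECT}
mode). The three numeric forms all name the letter @samp{A}
(decimal 65, octal 101, hexadecimal 41); the last command uses
@code{\n} in the replacement to produce a newline.

@example
$ printf 'a\tb\n' | sed 's/\t/,/'
a,b

$ echo 'ABC' | sed 's/\d065/a/'
aBC

$ echo 'ABC' | sed 's/\o101/a/'
aBC

$ echo 'ABC' | sed 's/\x41/a/'
aBC

$ echo 'a b' | sed 's/ /\n/'
a
b
@end example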
3342
3343	@subsection Escaping Precedence
3344
3345	@value{SSED} processes escape sequences @emph{before} passing
3346	the text onto the regular-expression matching of the @command{s///} command
3347	and Address matching. Thus the following two commands are equivalent
3348	(@samp{0x5e} is the hexadecimal @sc{ascii} value of the character @samp{^}):
3349
@codequoteundirected on
@codequotebacktick on
@example
@group
$ echo 'a^c' | sed 's/^/b/'
ba^c

$ echo 'a^c' | sed 's/\x5e/b/'
ba^c
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3363
3364	As are the following (@samp{0x5b},@samp{0x5d} are the hexadecimal
3365	@sc{ascii} values of @samp{[},@samp{]}, respectively):
3366
@codequoteundirected on
@codequotebacktick on
@example
@group
$ echo abc | sed 's/[a]/x/'
xbc
$ echo abc | sed 's/\x5ba\x5d/x/'
xbc
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3379
However, it is recommended to avoid such special characters
due to unexpected edge-cases. For example, the following
are not equivalent:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ echo 'a^c' | sed 's/\^/b/'
abc

$ echo 'a^c' | sed 's/\\\x5e/b/'
a^c
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3397
3398	@c also: this fails in different places:
3399	@c $ sed 's/[//'
3400	@c sed: -e expression #1, char 5: unterminated `s' command
3401	@c $ sed 's/\x5b//'
3402	@c sed: -e expression #1, char 8: Invalid regular expression
3403	@c
3404	@c which is OK but confusing to explain why (the first
3405	@c fails in compile.c:snarf_char_class while the second
3406	@c is passed to the regex engine and then fails).
3407
3408
3409	@node Locale Considerations
3410	@section Multibyte characters and Locale Considerations
3411
@value{SSED} processes valid multibyte characters in multibyte locales
(e.g. @code{UTF-8}). @footnote{Some regexp edge-cases depend on the
operating system and libc implementation. The examples shown are known
to work as-expected on GNU/Linux systems using glibc.}
3416
3417	@noindent The following example uses the Greek letter Capital Sigma
3418	(@value{ucsigma},
3419	Unicode code point @code{0x03A3}). In a @code{UTF-8} locale,
3420	@command{sed} correctly processes the Sigma as one character despite
3421	it being 2 octets (bytes):
3422
@codequoteundirected on
@codequotebacktick on
@example
@group
$ locale | grep LANG
LANG=en_US.UTF-8

$ printf 'a\u03A3b'
a@value{ucsigma}b

$ printf 'a\u03A3b' | sed 's/./X/g'
XXX

$ printf 'a\u03A3b' | od -tx1 -An
61 ce a3 62
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3442
3443	@noindent
3444	To force @command{sed} to process octets separately, use the @code{C} locale
3445	(also known as the @code{POSIX} locale):
3446
@codequoteundirected on
@codequotebacktick on
@example
$ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g'
XXXX
@end example
@codequoteundirected off
@codequotebacktick off
3455
3456	@subsection Invalid multibyte characters
3457
3458	@command{sed}'s regular expressions @emph{do not} match
3459	invalid multibyte sequences in a multibyte locale.
3460
@noindent
In the following examples, the octet @code{0xCE} is
an incomplete multibyte character (shown here as @value{unicodeFFFD}).
The regular expression @samp{.} does not match it:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf 'a\xCEb\n'
a@value{unicodeFFFD}b

$ printf 'a\xCEb\n' | sed 's/./X/g'
X@value{unicodeFFFD}X

$ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An
58 ce 58 0a
X X \n
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3483
@noindent Similarly, the ``catch-all'' regular expression @samp{.*} does not
match the entire line:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An
ce 63 0a
c \n
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3498
3499	@noindent
3500	@value{SSED} offers the special @command{z} command to clear the
3501	current pattern space regardless of invalid multibyte characters
3502	(i.e. it works like @code{s/.*//} but also removes invalid multibyte
3503	characters):
3504
@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An
0a
\n
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3516
3517	@noindent Alternatively, force the @code{C} locale to process
3518	each octet separately (every octet is a valid character in the @code{C}
3519	locale):
3520
@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An
0a
\n
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3532
3533
@command{sed}'s inability to process invalid multibyte characters
can be used to detect such invalid sequences in a file.
In the following examples, the @code{\xCE\xCE} is an invalid
multibyte sequence, while @code{\xCE\xA3} is a valid multibyte sequence
(of the Greek Sigma character).

@noindent
The following @command{sed} program removes all valid
characters using @code{s/.//g}. Any content left in the pattern space
(the invalid characters) is added to the hold space using the
@code{H} command. On the last line (@code{$}), the hold space is retrieved
(@code{x}), newlines are removed (@code{s/\n//g}), and any remaining
octets are printed unambiguously (@code{l}). Thus, any invalid
multibyte sequences are printed as octal values:
3548
3549	@codequoteundirected on
3550	@codequotebacktick on
3551	@example
3552	@group
3553	$ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt
3554
3555	$ cat invalid.txt
3556	ab
3557	c
3558	@value{unicodeFFFD}@value{unicodeFFFD}de
3559	@value{ucsigma}f
3560
3561	$ sed -n 's/.//g ; H ; $@{x;s/\n//g;l@}' invalid.txt
3562	\316\316$
3563	@end group
3564	@end example
3565	@codequoteundirected off
3566	@codequotebacktick off
3567
@noindent With a few more commands, @command{sed} can print
the exact line number corresponding to each invalid character (line 3).
These characters can then be removed by forcing the @code{C} locale
and using octal escape sequences:

@codequoteundirected on
@codequotebacktick on
@example
$ sed -n 's/.//g;=;l' invalid.txt | paste - - | awk '$2!="$"'
3 \316\316$

$ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt
@end example
@codequoteundirected off
@codequotebacktick off
3583
3584	@subsection Upper/Lower case conversion
3585
3586
3587	@value{SSED}'s substitute command (@code{s}) supports upper/lower
3588	case conversions using @code{\U},@code{\L} codes.
3589	These conversions support multibyte characters:
3590
@codequoteundirected on
@codequotebacktick on
@example
$ printf 'ABC\u03a3\n'
ABC@value{ucsigma}

$ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
abc@value{lcsigma}
@end example
@codequoteundirected off
@codequotebacktick off
3602
3603	@noindent
3604	@xref{The "s" Command}.
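
@noindent
A plain-ASCII sketch of the same conversions (made-up input).
@code{\U} converts the rest of the replacement to upper case; the
second command uses @code{\u}, a related GNU extension not named
above, which converts only the next character.

@example
$ echo 'hello world' | sed 's/.*/\U&/'
HELLO WORLD

$ echo 'hello world' | sed 's/\w\+/\u&/g'
Hello World
@end example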
3605
3606
3607	@subsection Multibyte regexp character classes
3608
3609	@c TODO: fix following paragraphs (copied verbatim from 'bracket
3610	@c expression' section).
3611
3612	In other locales, the sorting sequence is not specified, and
3613	@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
3614	@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
3615	characters that it matches might even be erratic.
3616	To obtain the traditional interpretation
3617	of bracket expressions, you can use the @samp{C} locale by setting the
3618	@env{LC_ALL} environment variable to the value @samp{C}.
3619
@example
# TODO: is there any real-world system/locale where 'A'
# is replaced by '-' ?
$ echo A | sed 's/[a-z]/-/'
A
@end example

The interpretation of named character classes depends on the
@env{LC_CTYPE} locale; for example, @samp{[[:alnum:]]} means the
character class of numbers and letters in the current locale.
3630
3631	TODO: show example of collation
3632
@codequoteundirected on
@codequotebacktick on
@example
# TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
$ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
clichX
@end example
@codequoteundirected off
@codequotebacktick off
3642
3643
3644	@node advanced sed
3645	@chapter Advanced @command{sed}: cycles and buffers
3646
3647	@menu
3648	* Execution Cycle:: How @command{sed} works
3649	* Hold and Pattern Buffers::
3650	* Multiline techniques:: Using D,G,H,N,P to process multiple lines
3651	* Branching and flow control::
3652	@end menu
3653
3654	@node Execution Cycle
3655	@section How @command{sed} Works
3656
3657	@cindex Buffer spaces, pattern and hold
3658	@cindex Spaces, pattern and hold
3659	@cindex Pattern space, definition
3660	@cindex Hold space, definition
3661	@command{sed} maintains two data buffers: the active @emph{pattern} space,
3662	and the auxiliary @emph{hold} space. Both are initially empty.
3663
3664	@command{sed} operates by performing the following cycle on each
3665	line of input: first, @command{sed} reads one line from the input
3666	stream, removes any trailing newline, and places it in the pattern space.
3667	Then commands are executed; each command can have an address associated
3668	to it: addresses are a kind of condition code, and a command is only
3669	executed if the condition is verified before the command is to be
3670	executed.
3671
3672	When the end of the script is reached, unless the @option{-n} option
3673	is in use, the contents of pattern space are printed out to the output
3674	stream, adding back the trailing newline if it was removed.@footnote{Actually,
3675	if @command{sed} prints a line without the terminating newline, it will
3676	nevertheless print the missing newline as soon as more text is sent to
3677	the same output stream, which gives the ``least expected surprise''
3678	even though it does not make commands like @samp{sed -n p} exactly
3679	identical to @command{cat}.} Then the next cycle starts for the next
3680	input line.
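
As a small illustration of the automatic print at the end of the
script (a sketch with made-up input, assuming the @command{seq}
utility is available): without @option{-n}, line 2 appears twice, once
from the @code{p} command and once from the automatic print; with
@option{-n}, only the explicit @code{p} produces output.

@example
$ seq 3 | sed '2p'
1
2
2
3

$ seq 3 | sed -n '2p'
2
@end example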
3681
3682	Unless special commands (like @samp{D}) are used, the pattern space is
3683	deleted between two cycles. The hold space, on the other hand, keeps
3684	its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
3685	@samp{g}, @samp{G} to move data between both buffers).
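
That the hold space survives from one cycle to the next can be seen
in the following sketch (made-up input, assuming @command{seq} is
available), which prints its input in reverse order. On every line
but the first, @code{G} appends the lines saved so far to the current
line; @code{h} then saves the result; on the last line, @code{p}
prints the accumulated text.

@example
$ seq 3 | sed -n '1!G;h;$p'
3
2
1
@end example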
3686
3687	@node Hold and Pattern Buffers
3688	@section Hold and Pattern Buffers
3689
3690	TODO
3691
3692	@node Multiline techniques
3693	@section Multiline techniques - using D,G,H,N,P to process multiple lines
3694
Multiple lines can be processed as one buffer using the
@code{D}, @code{G}, @code{H}, @code{N}, and @code{P} commands. They are
similar to their lowercase counterparts (@code{d},@code{g},
@code{h},@code{n},@code{p}), except that these commands append or
subtract data while respecting embedded newlines, which allows adding
and removing lines from the pattern and hold spaces.
3701
3702	They operate as follows:
3703	@table @code
@item D
@emph{deletes} text from the pattern space up to and including the
first newline, and restarts the cycle.

@item G
@emph{appends} a newline to the pattern space, followed by the
contents of the hold space.

@item H
@emph{appends} a newline to the hold space, followed by the
contents of the pattern space.

@item N
@emph{appends} a newline to the pattern space, followed by the next
line of input.

@item P
@emph{prints} the pattern space up to and including the first newline.
3721
3722	@end table
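
@noindent
The newline handling of these commands can be made visible with the
@code{l} command. The following is only a sketch (made-up input,
assuming @command{seq} is available): @code{G} appends a newline and
the (empty) hold space to each line; @code{H} puts a newline in front
of every line it adds to the hold space, including the first;
@code{N;P} joins lines in pairs and prints only the first line of
each pair.

@codequoteundirected on
@codequotebacktick on
@example
$ seq 2 | sed -n 'G;l'
1\n$
2\n$

$ seq 3 | sed -n 'H;$!d;x;l'
\n1\n2\n3$

$ seq 4 | sed -n 'N;P'
1
3
@end example
@codequoteundirected off
@codequotebacktick off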
3723
3724
3725	The following example illustrates the operation of @code{N} and
3726	@code{D} commands:
3727
@codequoteundirected on
@codequotebacktick on
@example
@group
$ seq 6 | sed -n 'N;l;D'
1\n2$
2\n3$
3\n4$
4\n5$
5\n6$
@end group
@end example
@codequoteundirected off
@codequotebacktick off
3742
3743	@enumerate
3744	@item
3745	@command{sed} starts by reading the first line into the pattern space
3746	(i.e. @samp{1}).
3747	@item
3748	At the beginning of every cycle, the @code{N}
3749	command appends a newline and the next line to the pattern space
3750	(i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle).
3751	@item
3752	The @code{l} command prints the content of the pattern space
3753	unambiguously.
3754	@item
3755	The @code{D} command then removes the content of pattern
3756	space up to the first newline (leaving @samp{2} at the end of
3757	the first cycle).
3758	@item
3759	At the next cycle the @code{N} command appends a
3760	newline and the next input line to the pattern space
3761	(e.g. @samp{2}, @samp{\n}, @samp{3}).
3762	@end enumerate
3763
3764
3765	@cindex processing paragraphs
3766	@cindex paragraphs, processing
3767	A common technique to process blocks of text such as paragraphs
3768	(instead of line-by-line) is using the following construct:
3769
3770	@codequoteundirected on
3771	@codequotebacktick on
3772	@example
3773	sed '/./@{H;$!d@} ; x ; s/REGEXP/REPLACEMENT/'
3774	@end example
3775	@codequoteundirected off
3776	@codequotebacktick off
3777
3778	@enumerate
3779	@item
3780	The first expression, @code{/./@{H;$!d@}} operates on all non-empty lines,
3781	and adds the current line (in the pattern space) to the hold space.
3782	On all lines except the last, the pattern space is deleted and the cycle is
3783	restarted.
3784
3785	@item
3786	The other expressions @code{x} and @code{s} are executed only on empty
3787	lines (i.e. paragraph separators). The @code{x} command fetches the
3788	accumulated lines from the hold space back to the pattern space. The
3789	@code{s///} command then operates on all the text in the paragraph
3790	(including the embedded newlines).
3791	@end enumerate
3792
3793	The following example demonstrates this technique:
3794	@codequoteundirected on
3795	@codequotebacktick on
3796	@example
3797	@group
3798	$ cat input.txt
3799	a a a aa aaa
3800	aaaa aaaa aa
3801	aaaa aaa aaa
3802
3803	bbbb bbb bbb
3804	bb bb bbb bb
3805	bbbbbbbb bbb
3806
3807	ccc ccc cccc
3808	cccc ccccc c
3809	cc cc cc cc
3810
3811	$ sed '/./@{H;$!d@} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt
3812
3813	START-->
3814	a a a aa aaa
3815	aaaa aaaa aa
3816	aaaa aaa aaa
3817	<--END
3818
3819	START-->
3820	bbbb bbb bbb
3821	bb bb bbb bb
3822	bbbbbbbb bbb
3823	<--END
3824
3825	START-->
3826	ccc ccc cccc
3827	cccc ccccc c
3828	cc cc cc cc
3829	<--END
3830	@end group
3831	@end example
3832	@codequoteundirected off
3833	@codequotebacktick off
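The same construct can also select whole paragraphs instead of editing
them. The following is only a minimal sketch: the sample text and the
pattern @samp{bbb} are made up for illustration, and the extra
@code{s/^\n//} merely strips the newline that @code{H} leaves at the
start of the accumulated text.

```shell
# Paragraph-wise "grep": print only the paragraphs that contain "bbb".
# The sample input is hypothetical; paragraphs are separated by empty lines.
matched=$(printf '%s\n' 'aaa 1' 'aaa 2' '' 'bbb 1' 'bbb 2' '' 'ccc 1' 'ccc 2' |
  sed -n '/./{H;$!d;} ; x ; /bbb/{s/^\n//;p;}')

# Show the selected paragraph (the two "bbb" lines).
printf '%s\n' "$matched"
```

Because the test runs on the whole paragraph, a pattern may also span
several lines of it.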
3834
3835	For more annotated examples, @pxref{Text search across multiple lines}
3836	and @ref{Line length adjustment}.
3837
3838	@node Branching and flow control
3839	@section Branching and Flow Control
3840
3841	The branching commands @code{b}, @code{t}, and @code{T} enable
3842	changing the flow of @command{sed} programs.
3843
3844	By default, @command{sed} reads an input line into the pattern buffer,
3845	then continues to process all commands in order.
3846	Commands without addresses affect all lines.
3847	Commands with addresses affect only matching lines.
3848	@xref{Execution Cycle} and @ref{Addresses overview}.
3849
3850	@command{sed} does not support a typical @code{if/then} construct.
3851	Instead, some commands can be used as conditionals or to change the
3852	default flow control:
3853
3854	@table @code
3855
3856	@item d
3857	delete (clear) the current pattern space,
3858	and restart the program cycle without processing the rest of the commands
3859	and without printing the pattern space.
3860
3861	@item D
3862	delete the contents of the pattern space @emph{up to the first newline},
3863	and restart the program cycle without processing the rest of
3864	the commands and without printing the pattern space.
3865
3866	@item [addr]X
3867	@itemx [addr]@{ X ; X ; X @}
3868	@itemx /regexp/X
3869	@itemx /regexp/@{ X ; X ; X @}
3870	Addresses and regular expressions can be used as an @code{if/then}
3871	conditional: If @var{[addr]} matches the current pattern space,
3872	execute the command(s).
3873	For example: The command @code{/^#/d} means:
3874	@emph{if} the current pattern space matches the regular expression @code{^#} (a line
3875	starting with a hash), @emph{then} execute the @code{d} command:
3876	delete the line without printing it, and restart the program cycle
3877	immediately.
3878
3879	@item b
3880	branch unconditionally (that is: always jump to a label, skipping
3881	or repeating other commands, without restarting a new cycle). Combined
3882	with an address, the branch can be conditionally executed on matched
3883	lines.
3884
3885	@item t
3886	branch conditionally (that is: jump to a label) @emph{only if} a
3887	@code{s///} command has succeeded since the last input line was read
3888	or another conditional branch was taken.
3889
3890	@item T
3891	similar but opposite to the @code{t} command: branch only if
3892	there have been @emph{no} successful substitutions since the last
3893	input line was read.
3894	@end table
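The difference between @code{d} and @code{D} is easiest to see side by
side. In this small sketch (the input numbers are arbitrary), both
programs read lines in pairs with @code{N} and print the first line of
each pair with @code{P}; only the final command differs.

```shell
# d discards the whole two-line pattern space, so the second line of
# each pair is never printed.
with_d=$(printf '%s\n' 1 2 3 4 5 | sed '$!N;P;d')

# D discards only the text up to the first newline and restarts the
# cycle with what is left, so no input line is lost.
with_D=$(printf '%s\n' 1 2 3 4 5 | sed '$!N;P;D')

echo "d: $(echo $with_d)"    # d: 1 3 5
echo "D: $(echo $with_D)"    # D: 1 2 3 4 5
```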
3895
3896
3897	The following two @command{sed} programs are equivalent. The first
3898	(contrived) example uses the @code{b} command to skip the @code{s///}
3899	command on lines containing @samp{1}. The second example uses an
3900	address with negation (@samp{!}) to perform substitution only on
3901	desired lines. The @code{y///} command is still executed on all
3902	lines:
3903
3904	@codequoteundirected on
3905	@codequotebacktick on
3906	@example
3907	@group
3908	$ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
3909	a4
3910	z5
3911	z6
3912
3913	$ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
3914	a4
3915	z5
3916	z6
3917	@end group
3918	@end example
3919	@codequoteundirected off
3920	@codequotebacktick off
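The conditional branches @code{t} and @code{T} can be tried in the same
way. The sketch below is illustrative only (the input and the appended
markers are invented); each program uses a branch without a label, which
jumps to the end of the script, to skip the second substitution.
@code{T} is a GNU extension.

```shell
# t: branch when s/a/z/ succeeded, so only untouched lines get the marker.
with_t=$(printf '%s\n' a1 b2 a3 |
  sed -e 's/a/z/' -e t -e 's/$/ (unchanged)/')

# T: branch when s/a/z/ failed, so only modified lines get the marker.
with_T=$(printf '%s\n' a1 b2 a3 |
  sed -e 's/a/z/' -e T -e 's/$/ (changed)/')

printf '%s\n' "$with_t" "$with_T"
```

The first program prints @samp{z1}, @samp{b2 (unchanged)}, @samp{z3};
the second prints @samp{z1 (changed)}, @samp{b2}, @samp{z3 (changed)}.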
3921
3922
3923
3924	@subsection Branching and Cycles
3925	@cindex labels
3926	@cindex omitting labels
3927	@cindex cycle, restarting
3928	@cindex restarting a cycle
3929	The @code{b}, @code{t} and @code{T} commands can be followed by a label
3930	(typically a single letter). Labels are defined with a colon followed by
3931	one or more letters (e.g. @samp{:x}). If the label is omitted the
3932	branch commands restart the cycle. Note the difference between
3933	branching to a label and restarting the cycle: when a cycle is
3934	restarted, @command{sed} first prints the current content of the
3935	pattern space, then reads the next input line into the pattern space.
3936	Jumping to a label (even if it is at the beginning of the program)
3937	does not print the pattern space and does not read the next input line.
3938
3939	The following program is a no-op. The @code{b} command (the only command
3940	in the program) does not have a label, and thus simply restarts the cycle.
3941	On each cycle, the pattern space is printed and the next input line is read:
3942
3943	@example
3944	@group
3945	$ seq 3 | sed b
3946	1
3947	2
3948	3
3949	@end group
3950	@end example
3951
3952	@cindex infinite loop, branching
3953	@cindex branching, infinite loop
3954	The following example is an infinite loop: it doesn't terminate and
3955	doesn't print anything. The @code{b} command jumps to the @samp{x}
3956	label, and a new cycle is never started:
3957
3958	@codequoteundirected on
3959	@codequotebacktick on
3960	@example
3961	@group
3962	$ seq 3 | sed ':x ; bx'
3963
3964	# The above command requires GNU sed (which supports additional
3965	# commands following a label, without a newline). A portable equivalent:
3966	# sed -e ':x' -e bx
3967	@end group
3968	@end example
3969	@codequoteundirected off
3970	@codequotebacktick off
3971
3972	@cindex branching and n, N
3973	@cindex n, and branching
3974	@cindex N, and branching
3975	Branching is often complemented with the @code{n} or @code{N} commands:
3976	both commands read the next input line into the pattern space without waiting
3977	for the cycle to restart. Before reading the next input line, @code{n}
3978	prints the current pattern space then empties it, while @code{N}
3979	appends a newline and the next input line to the pattern space.
3980
3981	Consider the following two examples:
3982
3983	@codequoteundirected on
3984	@codequotebacktick on
3985	@example
3986	@group
3987	$ seq 3 | sed ':x ; n ; bx'
3988	1
3989	2
3990	3
3991
3992	$ seq 3 | sed ':x ; N ; bx'
3993	1
3994	2
3995	3
3996	@end group
3997	@end example
3998	@codequoteundirected off
3999	@codequotebacktick off
4000
4001	@itemize
4002	@item
4003	Neither example loops forever, despite never starting a new cycle.
4004
4005	@item
4006	In the first example, the @code{n} command first prints the content
4007	of the pattern space, empties the pattern space then reads the next
4008	input line.
4009
4010	@item
4011	In the second example, the @code{N} command appends the next input
4012	line to the pattern space (with a newline). Lines are accumulated in
4013	the pattern space until there are no more input lines to read, then
4014	the @code{N} command terminates the @command{sed} program. When the
4015	program terminates, the end-of-cycle actions are performed, and the
4016	entire pattern space is printed.
4017
4018	@item
4019	The second example requires @value{SSED},
4020	because it uses the non-POSIX-standard behavior of @code{N}.
4021	See the ``@code{N} command on the last line'' paragraph
4022	in @ref{Reporting Bugs}.
4023
4024	@item
4025	To further examine the difference between the two examples,
4026	try the following commands:
4027	@codequoteundirected on
4028	@codequotebacktick on
4029	@example
4030	@group
4031	printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
4032	printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
4033	printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
4034	printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
4035	@end group
4036	@end example
4037	@codequoteundirected off
4038	@codequotebacktick off
4039
4040	@end itemize
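For readers who cannot run the commands above right away, the following
sketch captures the output of three of them; the comments describe what
GNU @command{sed} is expected to print. The remaining command (@code{n}
followed by @code{s/\n/***/}) simply echoes its input, because with
@code{n} the pattern space never contains a newline.

```shell
# n prints each line before reading the next one, so the line numbers
# printed by = are interleaved with the text: aa 2 bb 3 cc 4 dd
out_n=$(printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx')

# N accumulates the lines, so the line numbers come first and the whole
# pattern space is printed once at the end: 2 3 4 aa bb cc dd
out_N=$(printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx')

# With N the embedded newlines are visible to s///: aa***bb***cc***dd
out_s=$(printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx')

printf '%s\n' "$out_n" '--' "$out_N" '--' "$out_s"
```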
4041
4042
4043
4044	@subsection Branching example: joining lines
4045
4046	@cindex joining lines with branching
4047	@cindex branching, joining lines
4048	@cindex quoted-printable lines, joining
4049	@cindex joining quoted-printable lines
4050	@cindex t, joining lines with
4051	@cindex b, joining lines with
4052	@cindex b, versus t
4053	@cindex t, versus b
4054	As a real-world example of using branching, consider the case of
4055	@uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files,
4056	typically used to encode email messages.
4057	In these files long lines are split and marked with a @dfn{soft line break}
4058	consisting of a single @samp{=} character at the end of the line:
4059
4060	@example
4061	@group
4062	$ cat jaques.txt
4063	All the wor=
4064	ld's a stag=
4065	e,
4066	And all the=
4067	 men and wo=
4068	men merely =
4069	players:
4070	They have t=
4071	heir exits =
4072	and their e=
4073	ntrances;
4074	And one man=
4075	 in his tim=
4076	e plays man=
4077	y parts.
4078	@end group
4079	@end example
4080
4081
4082	The following program uses an address match @samp{/=$/} as a
4083	conditional: If the current pattern space ends with a @samp{=}, it
4084	reads the next input line using @code{N}, replaces all @samp{=}
4085	characters which are followed by a newline, and unconditionally
4086	branches (@code{b}) to the beginning of the program without restarting
4087	a new cycle. If the pattern space does not end with @samp{=}, the
4088	default action is performed: the pattern space is printed and a new
4089	cycle is started:
4090
4091	@codequoteundirected on
4092	@codequotebacktick on
4093	@example
4094	@group
4095	$ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt
4096	All the world's a stage,
4097	And all the men and women merely players:
4098	They have their exits and their entrances;
4099	And one man in his time plays many parts.
4100	@end group
4101	@end example
4102	@codequoteundirected off
4103	@codequotebacktick off
4104
4105	Here's an alternative program with a slightly different approach: On
4106	all lines except the last, @code{N} appends the line to the pattern
4107	space. A substitution command then removes soft line breaks
4108	(@samp{=} at the end of a line, i.e. followed by a newline) by replacing
4109	them with an empty string.
4110	@emph{If} the substitution was successful (meaning the pattern space contained
4111	a line which should be joined), the conditional branch command @code{t} jumps
4112	to the beginning of the program without completing or restarting the cycle.
4113	If the substitution failed (meaning there were no soft line breaks),
4114	the @code{t} command will @emph{not} branch. Then, @code{P} will
4115	print the pattern space content until the first newline, and @code{D}
4116	will delete the pattern space content until the first newline.
4117	(To learn more about @code{N}, @code{P} and @code{D} commands
4118	@pxref{Multiline techniques}).
4119
4120
4121	@codequoteundirected on
4122	@codequotebacktick on
4123	@example
4124	@group
4125	$ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
4126	All the world's a stage,
4127	And all the men and women merely players:
4128	They have their exits and their entrances;
4129	And one man in his time plays many parts.
4130	@end group
4131	@end example
4132	@codequoteundirected off
4133	@codequotebacktick off
4134
4135
4136	For more line-joining examples, @pxref{Joining lines}.
4137
4138
4139	@node Examples
4140	@chapter Some Sample Scripts
4141
4142	Here are some @command{sed} scripts to guide you in the art of mastering
4143	@command{sed}.
4144
4145	@menu
4146
4147	Useful one-liners:
4148	* Joining lines::
4149
4150	Some exotic examples:
4151	* Centering lines::
4152	* Increment a number::
4153	* Rename files to lower case::
4154	* Print bash environment::
4155	* Reverse chars of lines::
4156	* Text search across multiple lines::
4157	* Line length adjustment::
4158	* Adding a header to multiple files::
4159
4160	Emulating standard utilities:
4161	* tac:: Reverse lines of files
4162	* cat -n:: Numbering lines
4163	* cat -b:: Numbering non-blank lines
4164	* wc -c:: Counting chars
4165	* wc -w:: Counting words
4166	* wc -l:: Counting lines
4167	* head:: Printing the first lines
4168	* tail:: Printing the last lines
4169	* uniq:: Make duplicate lines unique
4170	* uniq -d:: Print duplicated lines of input
4171	* uniq -u:: Remove all duplicated lines
4172	* cat -s:: Squeezing blank lines
4173	@end menu
4174
4175	@node Joining lines
4176	@section Joining lines
4177
4178	This section uses @code{N}, @code{D} and @code{P} commands to process
4179	multiple lines, and the @code{b} and @code{t} commands for branching.
4180	@xref{Multiline techniques} and @ref{Branching and flow control}.
4181
4182	Join specific lines (e.g. if lines 2 and 3 need to be joined):
4183
4184	@codequoteundirected on
4185	@codequotebacktick on
4186	@example
4187	$ cat lines.txt
4188	hello
4189	hel
4190	lo
4191	hello
4192
4193	$ sed '2@{N;s/\n//;@}' lines.txt
4194	hello
4195	hello
4196	hello
4197	@end example
4198	@codequoteundirected off
4199	@codequotebacktick off
4200
4201	Join backslash-continued lines:
4202
4203	@codequoteundirected on
4204	@codequotebacktick on
4205	@example
4206	$ cat 1.txt
4207	this \
4208	is \
4209	a \
4210	long \
4211	line
4212	and another \
4213	line
4214
4215	$ sed -e ':x /\\$/ @{ N; s/\\\n//g ; bx @}' 1.txt
4216	this is a long line
4217	and another line
4218
4219
4220	# TODO: The above requires GNU sed.
4221	# Non-GNU seds need newlines after ':' and 'b'.
4222	@end example
4223	@codequoteundirected off
4224	@codequotebacktick off
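One possible portable spelling of the backslash-joining program is
sketched below. It passes each command as its own @option{-e} fragment,
so that the label and the branch are each terminated by a newline. The
input is fed through a pipe here instead of the file @file{1.txt}; that
choice is only for the sake of a self-contained example.

```shell
# Each -e fragment becomes one script line, so ':x' and 'bx' are
# followed by a newline as non-GNU seds require.
joined=$(printf '%s\n' 'this \' 'is \' 'a \' 'long \' 'line' \
                       'and another \' 'line' |
  sed -e ':x' -e '/\\$/{' -e 'N' -e 's/\\\n//g' -e 'bx' -e '}')

printf '%s\n' "$joined"
```

This should print the same two joined lines as the GNU-specific
one-liner above.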
4225
4226	Join lines that start with whitespace (e.g. SMTP headers):
4227
4228	@codequoteundirected on
4229	@codequotebacktick on
4230	@example
4231	@group
4232	$ cat 2.txt
4233	Subject: Hello
4234	    World
4235	Content-Type: multipart/alternative;
4236	    boundary=94eb2c190cc6370f06054535da6a
4237	Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
4238	Authentication-Results: mx.gnu.org;
4239	    dkim=pass header.i=@@gnu.org;
4240	    spf=pass
4241	Message-ID: <abcdef@@gnu.org>
4242	From: John Doe <jdoe@@gnu.org>
4243	To: Jane Smith <jsmith@@gnu.org>
4244
4245	$ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
4246	Subject: Hello World
4247	Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
4248	Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
4249	Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass
4250	Message-ID: <abcdef@@gnu.org>
4251	From: John Doe <jdoe@@gnu.org>
4252	To: Jane Smith <jsmith@@gnu.org>
4253
4254	# A portable (non-gnu) variation:
4255	# sed -e :a -e '$!N;s/\n */ /;ta' -e 'P;D'
4256	@end group
4257	@end example
4258	@codequoteundirected off
4259	@codequotebacktick off
4260
4261
4262	@node Centering lines
4263	@section Centering Lines
4264
4265	This script centers all lines of a file to a width of 80 columns.
4266	To change that width, the number in @code{\@{@dots{}\@}} must be
4267	replaced, and the number of added spaces must also be changed.
4268
4269	Note how the buffer commands are used to separate parts in
4270	the regular expressions to be matched---this is a common
4271	technique.
4272
4273	@c start-------------------------------------------
4274	@example
4275	#!/usr/bin/sed -f
4276
4277	@group
4278	# Put 80 spaces in the buffer
4279	1 @{
4280	x
4281	s/^$/          /
4282	s/^.*$/&&&&&&&&/
4283	x
4284	@}
4285	@end group
4286
4287	@group
4288	# delete leading and trailing spaces
4289	y/@kbd{@key{TAB}}/ /
4290	s/^ *//
4291	s/ *$//
4292	@end group
4293
4294	@group
4295	# add a newline and 80 spaces to end of line
4296	G
4297	@end group
4298
4299	@group
4300	# keep first 81 chars (80 + a newline)
4301	s/^\(.\@{81\@}\).*$/\1/
4302	@end group
4303
4304	@group
4305	# \2 matches half of the spaces, which are moved to the beginning
4306	s/^\(.*\)\n\(.*\)\2/\2\1/
4307	@end group
4308	@end example
4309	@c end---------------------------------------------
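As a sketch of the adjustment described above, the following
hypothetical variant centers text on 20 columns instead: the hold space
is filled with 20 spaces (5 spaces repeated four times) and the interval
becomes 21. The function name @code{center20} is invented for this
example, and the TAB-to-space @code{y} command is left out for brevity.

```shell
# Center each input line within 20 columns (a reduced-width variant of
# the 80-column script; lines are assumed to be shorter than 20 chars).
center20() {
  sed -e '1{x;s/^$/     /;s/^.*$/&&&&/;x;}' \
      -e 's/^ *//;s/ *$//' \
      -e 'G' \
      -e 's/^\(.\{21\}\).*$/\1/' \
      -e 's/^\(.*\)\n\(.*\)\2/\2\1/'
}

centered=$(printf '%s\n' hi center | center20)
printf '%s\n' "$centered"
```

With this input, @samp{hi} is preceded by 9 spaces and @samp{center} by 7.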
4310
4311	@node Increment a number
4312	@section Increment a Number
4313
4314	This script is one of a few that demonstrate how to do arithmetic
4315	in @command{sed}. This is indeed possible,@footnote{@command{sed} guru Greg
4316	Ubben wrote an implementation of the @command{dc} @sc{rpn} calculator!
4317	It is distributed together with sed.} but must be done manually.
4318
4319	To increment a number, add 1 to its last digit, replacing
4320	it with the following digit. There is one exception: when the digit
4321	is a nine, the preceding digits must also be incremented until a
4322	digit other than nine is reached.
4323
4324	This solution by Bruno Haible is clever because
4325	it uses a single buffer; if you don't have this limitation, the
4326	algorithm used in @ref{cat -n, Numbering lines}, is faster.
4327	It works by replacing trailing nines with an underscore, then
4328	using multiple @code{s} commands to increment the last digit,
4329	and then again substituting underscores with zeros.
4330
4331	@c start-------------------------------------------
4332	@example
4333	#!/usr/bin/sed -f
4334
4335	/[^0-9]/ d
4336
4337	@group
4338	# replace all trailing 9s by _ (any other character except digits could
4339	# be used)
4340	:d
4341	s/9\(_*\)$/_\1/
4342	td
4343	@end group
4344
4345	@group
4346	# incr last digit only. The first line adds a most-significant
4347	# digit of 1 if we have to add a digit.
4348	@end group
4349
4350	@group
4351	s/^\(_*\)$/1\1/; tn
4352	s/8\(_*\)$/9\1/; tn
4353	s/7\(_*\)$/8\1/; tn
4354	s/6\(_*\)$/7\1/; tn
4355	s/5\(_*\)$/6\1/; tn
4356	s/4\(_*\)$/5\1/; tn
4357	s/3\(_*\)$/4\1/; tn
4358	s/2\(_*\)$/3\1/; tn
4359	s/1\(_*\)$/2\1/; tn
4360	s/0\(_*\)$/1\1/; tn
4361	@end group
4362
4363	@group
4364	:n
4365	y/_/0/
4366	@end group
4367	@end example
4368	@c end---------------------------------------------
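To try the script without saving it to a file, it can be wrapped in a
shell function, as in the sketch below. The function name
@code{increment} and the sample numbers are assumptions made for this
illustration; the @command{sed} commands are the ones listed above.

```shell
# Read one non-negative integer per line and print it incremented by one.
increment() {
  sed '/[^0-9]/ d
:d
s/9\(_*\)$/_\1/
td
s/^\(_*\)$/1\1/; tn
s/8\(_*\)$/9\1/; tn
s/7\(_*\)$/8\1/; tn
s/6\(_*\)$/7\1/; tn
s/5\(_*\)$/6\1/; tn
s/4\(_*\)$/5\1/; tn
s/3\(_*\)$/4\1/; tn
s/2\(_*\)$/3\1/; tn
s/1\(_*\)$/2\1/; tn
s/0\(_*\)$/1\1/; tn
:n
y/_/0/'
}

# 41 -> 42, 199 -> 200 (two carries), 999 -> 1000 (a new leading digit)
incremented=$(printf '%s\n' 41 199 999 | increment)
printf '%s\n' "$incremented"
```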
4369
4370	@node Rename files to lower case
4371	@section Rename Files to Lower Case
4372
4373	This is a pretty strange use of @command{sed}. We transform text
4374	into shell commands, then just feed them to the shell.
4375	Don't worry, even worse hacks are done when using @command{sed}; I have
4376	seen a script converting the output of @command{date} into a @command{bc}
4377	program!
4378
4379	The main body of this is the @command{sed} script, which remaps the name
4380	from lower to upper (or vice versa) and even checks
4381	whether the remapped name is the same as the original name.
4382	Note how the script is parameterized using shell
4383	variables and proper quoting.
4384
4385	@c start-------------------------------------------
4386	@example
4387	@group
4388	#! /bin/sh
4389	# rename files to lower/upper case...
4390	#
4391	# usage:
4392	# move-to-lower *
4393	# move-to-upper *
4394	# or
4395	# move-to-lower -R .
4396	# move-to-upper -R .
4397	#
4398	@end group
4399
4400	@group
4401	help()
4402	@{
4403	cat << eof
4404	Usage: $0 [-n] [-R] [-h] files...
4405	@end group
4406
4407	@group
4408	-n do nothing, only see what would be done
4409	-R recursive (use find)
4410	-h this message
4411	files files to remap to lower case
4412	@end group
4413
4414	@group
4415	Examples:
4416	$0 -n * (see if everything is ok, then...)
4417	$0 *
4418	@end group
4419
4420	$0 -R .
4421
4422	@group
4423	eof
4424	@}
4425	@end group
4426
4427	@group
4428	apply_cmd='sh'
4429	finder='echo "$@@" | tr " " "\n"'
4430	files_only=
4431	@end group
4432
4433	@group
4434	while :
4435	do
4436	case "$1" in
4437	-n) apply_cmd='cat' ;;
4438	-R) finder='find "$@@" -type f';;
4439	-h) help ; exit 1 ;;
4440	*) break ;;
4441	esac
4442	shift
4443	done
4444	@end group
4445
4446	@group
4447	if [ -z "$1" ]; then
4448	echo Usage: $0 [-h] [-n] [-R] files...
4449	exit 1
4450	fi
4451	@end group
4452
4453	@group
4454	LOWER='abcdefghijklmnopqrstuvwxyz'
4455	UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
4456	@end group
4457
4458	@group
4459	case `basename $0` in
4460	*upper*) TO=$UPPER; FROM=$LOWER ;;
4461	*) FROM=$UPPER; TO=$LOWER ;;
4462	esac
4463	@end group
4464
4465	eval $finder | sed -n '
4466
4467	@group
4468	# remove all trailing slashes
4469	s/\/*$//
4470	@end group
4471
4472	@group
4473	# add ./ if there is no path, only a filename
4474	/\//! s/^/.\//
4475	@end group
4476
4477	@group
4478	# save path+filename
4479	h
4480	@end group
4481
4482	@group
4483	# remove path
4484	s/.*\///
4485	@end group
4486
4487	@group
4488	# do conversion only on filename
4489	y/'$FROM'/'$TO'/
4490	@end group
4491
4492	@group
4493	# now line contains original path+file, while
4494	# hold space contains the new filename
4495	x
4496	@end group
4497
4498	@group
4499	# add converted file name to line, which now contains
4500	# path/file-name\nconverted-file-name
4501	G
4502	@end group
4503
4504	@group
4505	# check if converted file name is equal to original file name,
4506	# if it is, do not print anything
4507	/^.*\/\(.*\)\n\1/b
4508	@end group
4509
4510	@group
4511	# escape special characters for the shell
4512	s/["$`\\]/\\&/g
4513	@end group
4514
4515	@group
4516	# now, transform path/fromfile\n, into
4517	# mv path/fromfile path/tofile and print it
4518	s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
4519	@end group
4520
4521	' | $apply_cmd
4522	@end example
4523	@c end---------------------------------------------
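The effect of the embedded @command{sed} program can be inspected on its
own, much like the @option{-n} dry run does. In the sketch below the
helper name @code{mv_commands} and the sample file names are
hypothetical; nothing is renamed, the generated commands are only
printed.

```shell
FROM='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
TO='abcdefghijklmnopqrstuvwxyz'

# Hypothetical helper: read file names on stdin and print the mv
# commands that the script above would feed to the shell.
mv_commands() {
  sed -n '
s/\/*$//
/\//! s/^/.\//
h
s/.*\///
y/'"$FROM"'/'"$TO"'/
x
G
/^.*\/\(.*\)\n\1/b
s/["$`\\]/\\&/g
s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
'
}

# "lower" is already lower case, so no command is generated for it.
planned=$(printf '%s\n' ./Foo.TXT dir/README lower | mv_commands)
printf '%s\n' "$planned"
```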
4524
4525	@node Print bash environment
4526	@section Print @command{bash} Environment
4527
4528	This script strips the definition of the shell functions
4529	from the output of the @command{set} Bourne-shell command.
4530
4531	@c start-------------------------------------------
4532	@example
4533	#!/bin/sh
4534
4535	@group
4536	set | sed -n '
4537	:x
4538	@end group
4539
4540	@group
4541	@ifinfo
4542	# if no occurrence of "=()" print and load next line
4543	@end ifinfo
4544	@ifnotinfo
4545	# if no occurrence of @samp{=()} print and load next line
4546	@end ifnotinfo
4547	/=()/! @{ p; b; @}
4548	/ () $/! @{ p; b; @}
4549	@end group
4550
4551	@group
4552	# possible start of functions section
4553	# save the line in case this is a var like FOO="() "
4554	h
4555	@end group
4556
4557	@group
4558	# if the next line has a brace, we quit because
4559	# nothing comes after functions
4560	n
4561	/^@{/ q
4562	@end group
4563
4564	@group
4565	# print the old line
4566	x; p
4567	@end group
4568
4569	@group
4570	# work on the new line now
4571	x; bx
4572	'
4573	@end group
4574	@end example
4575	@c end---------------------------------------------
4576
4577	@node Reverse chars of lines
4578	@section Reverse Characters of Lines
4579
4580	This script can be used to reverse the position of characters
4581	in lines. The technique moves two characters at a time, hence
4582	it is faster than more intuitive implementations.
4583
4584	Note the @code{tx} command before the definition of the label.
4585	This is often needed to reset the flag that is tested by
4586	the @code{t} command.
4587
4588	Imaginative readers will find uses for this script. An example
4589	is reversing the output of @command{banner}.@footnote{This requires
4590	another script to pad the output of banner; for example
4591
4592	@example
4593	#! /bin/sh
4594
4595	banner -w $1 $2 $3 $4 |
4596	sed -e :a -e '/^.\@{0,'$1'\@}$/ @{ s/$/ /; ba; @}' |
4597	~/sedscripts/reverseline.sed
4598	@end example
4599	}
4600
4601	@c start-------------------------------------------
4602	@example
4603	#!/usr/bin/sed -f
4604
4605	/../! b
4606
4607	@group
4608	# Reverse a line. Begin embedding the line between two newlines
4609	s/^.*$/\
4610	&\
4611	/
4612	@end group
4613
4614	@group
4615	# Move first character to the end. The regexp matches until
4616	# there are zero or one characters between the markers
4617	tx
4618	:x
4619	s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
4620	tx
4621	@end group
4622
4623	@group
4624	# Remove the newline markers
4625	s/\n//g
4626	@end group
4627	@end example
4628	@c end---------------------------------------------
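A quick way to try the script is to wrap it in a shell function, as
sketched here. The function name @code{reverse_chars} and the sample
strings are made up for this illustration. The lines of the script are
deliberately not indented, because leading blanks on the continuation
lines of the first @code{s} command would become part of the replacement
text.

```shell
# Reverse the characters of every input line.
reverse_chars() {
  sed '/../! b
s/^.*$/\
&\
/
tx
:x
s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
tx
s/\n//g'
}

# Even length, odd length and single-character lines are all handled.
reversed=$(printf '%s\n' 'hello world' abcd abc a | reverse_chars)
printf '%s\n' "$reversed"
```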
4629
4630
4631	@node Text search across multiple lines
4632	@section Text search across multiple lines
4633
4634	This section uses @code{N} and @code{D} commands to search for
4635	consecutive words spanning multiple lines. @xref{Multiline techniques}.
4636
4637	These examples deal with finding doubled occurrences of words in a document.
4638
4639	Finding doubled words in a single line is easy using GNU @command{grep}
4640	and similarly with @value{SSED}:
4641
4642	@c NOTE: in all examples, 'the@ the' is used to prevent
4643	@c 'make syntax-check' from complaining about double words.
4644	@codequoteundirected on
4645	@codequotebacktick on
4646	@example
4647	@group
4648	$ cat two-cities-dup1.txt
4649	It was the best of times,
4650	it was the worst of times,
4651	it was the@ the age of wisdom,
4652	it was the age of foolishness,
4653
4654	$ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
4655	it was the@ the age of wisdom,
4656
4657	$ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
4658	3:it was the@ the age of wisdom,
4659
4660	$ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
4661	it was the@ the age of wisdom,
4662
4663	$ sed -En '/\b(\w+)\s+\1\b/@{=;p@}' two-cities-dup1.txt
4664	3
4665	it was the@ the age of wisdom,
4666	@end group
4667	@end example
4668	@codequoteundirected off
4669	@codequotebacktick off
4670
4671	@itemize @bullet
4672	@item
4673	The regular expression @samp{\b\w+\s+} searches for word-boundary (@samp{\b}),
4674	followed by one-or-more word-characters (@samp{\w+}), followed by whitespace
4675	(@samp{\s+}). @xref{regexp extensions}.
4676
4677	@item
4678	Adding parentheses around the @samp{(\w+)} expression creates a subexpression.
4679	The regular expression pattern @samp{(PATTERN)\s+\1} defines a subexpression
4680	(in the parentheses) followed by a back-reference, separated by whitespace.
4681	A successful match means the @var{PATTERN} was repeated twice in succession.
4682	@xref{Back-references and Subexpressions}.
4683
4684	@item
4685	The word-boundary expression (@samp{\b}) at both ends ensures partial
4686	words are not matched (e.g. @samp{the then} is not a desired match).
4687	@c Thanks to Jim for pointing this out in
4688	@c https://lists.gnu.org/archive/html/sed-devel/2016-12/msg00041.html
4689
4690	@item
4691	The @option{-E} option enables extended regular expression syntax, alleviating
4692	the need to add backslashes before the parentheses. @xref{ERE syntax}.
4693
4694	@end itemize
4695
4696	When the doubled word spans two lines, the above regular expression
4697	will not find it, as @command{grep} and @command{sed} operate line-by-line.
4698
4699	By using the @code{N} and @code{D} commands, @command{sed} can apply
4700	regular expressions on multiple lines (that is, multiple lines are stored
4701	in the pattern space, and the regular expression works on it):
4702
4703	@c NOTE: use 'the@*the' instead of a real new line to prevent
4704	@c 'make syntax-check' to complain about doubled-words.
4705	@codequoteundirected on
4706	@codequotebacktick on
4707	@example
4708	$ cat two-cities-dup2.txt
4709	It was the best of times, it was the
4710	worst of times, it was the@*the age of wisdom,
4711	it was the age of foolishness,
4712
4713	$ sed -En '@{N; /\b(\w+)\s+\1\b/@{=;p@} ; D@}' two-cities-dup2.txt
4714	3
4715	worst of times, it was the@*the age of wisdom,
4716	@end example
4717	@codequoteundirected off
4718	@codequotebacktick off
4719
4720	@itemize @bullet
4721	@item
4722	The @code{N} command appends the next line to the pattern space
4723	(thus ensuring it contains two consecutive lines in every cycle).
4724
4725	@item
4726	The regular expression uses @samp{\s+} for word separator which matches
4727	both spaces and newlines.
4728
4729	@item
4730	If the regular expression matches, the entire pattern space is printed
4731	with @code{p}. No lines are printed by default due to the @option{-n} option.
4732
4733	@item
4734	The @code{D} command removes the first line from the pattern space (up until the
4735	first newline), readying it for the next cycle.
4736	@end itemize
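A possible variation, sketched below under the same GNU-specific
assumptions (@option{-E}, @samp{\b}, @samp{\w}, @samp{\s}), reports only
the repeated word instead of the whole pattern space. Since the
substitution destroys the two-line window that @code{D} relies on, the
window is saved with @code{h} and restored with @code{g}. This is a
rough sketch: a word doubled within a single line would be reported
twice, once for each window that contains it, and only the last pair in
a window is reported.

```shell
# Print just the doubled word found across (or within) adjacent lines.
doubled=$(printf '%s\n' \
    'It was the best of times, it was the' \
    'worst of times, it was the' \
    'the age of wisdom,' \
    'it was the age of foolishness,' |
  sed -En 'N; h; s/.*\b(\w+)\s+\1\b.*/\1/p; g; D')

printf '%s\n' "$doubled"
```

For this input the only word reported is @samp{the}.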
4737
4738	See the GNU @command{coreutils} manual for an alternative solution using
4739	@command{tr -s} and @command{uniq} at
4740	@c NOTE: cheating and keeping the URL line shorter than 80 characters
4741	@c by using 'gnu.org' and '/s/'.
4742	@url{https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html}.
4743
4744	@node Line length adjustment
4745	@section Line length adjustment
4746
4747	This section uses @code{N} and @code{P} commands to read and write
4748	lines, and the @code{b} command for branching.
4749	@xref{Multiline techniques} and @ref{Branching and flow control}.
4750
4751	This (somewhat contrived) example deals with formatting and wrapping
4752	lines of text of the following input file:
4753
4754	@example
4755	@group
4756	$ cat two-cities-mix.txt
4757	It was the best of times, it was
4758	the worst of times, it
4759	was the age of
4760	wisdom,
4761	it
4762	was
4763	the age
4764	of foolishness,
4765	@end group
4766	@end example
4767
4768	@exdent The following sed program wraps lines at 40 characters:
4769	@codequoteundirected on
4770	@codequotebacktick on
4771	@example
4772	@group
4773	$ cat wrap40.sed
4774	# outer loop
4775	:x
4776
4777	# Append a newline followed by the next input line to the pattern buffer
4778	N
4779
4780	# Remove all newlines from the pattern buffer
4781	s/\n/ /g
4782
4783
4784	# Inner loop
4785	:y
4786
4787	# Add a newline after the first 40 characters
4788	s/(.@{40,40@})/\1\n/
4789
4790	# If there is a newline in the pattern buffer
4791	# (i.e. the previous substitution added a newline)
4792	/\n/ @{
4793	# There are newlines in the pattern buffer -
4794	# print the content until the first newline.
4795	P
4796
4797	# Remove the printed characters and the first newline
4798	s/.*\n//
4799
4800	# branch to label 'y' - repeat inner loop
4801	by
4802	@}
4803
4804	# No newlines in the pattern buffer - Branch to label 'x' (outer loop)
4805	# and read the next input line
4806	bx
4807	@end group
4808	@end example
4809	@codequoteundirected off
4810	@codequotebacktick off
4811
4812
4813
4814	@exdent The wrapped output:
4815	@codequoteundirected on
4816	@codequotebacktick on
4817	@example
4818	@group
4819	$ sed -E -f wrap40.sed two-cities-mix.txt
4820	It was the best of times, it was the wor
4821	st of times, it was the age of wisdom, i
4822	t was the age of foolishness,
4823	@end group
4824	@end example
4825	@codequoteundirected off
4826	@codequotebacktick off
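The width does not have to be fixed in the script file. The following
sketch passes it as a shell parameter; the function name @code{wrap} and
the short sample input are assumptions made for this illustration, and
the program still relies on GNU @command{sed} behavior (@samp{\n} in the
replacement, and @code{N} printing the pattern space at end of input).

```shell
# wrap WIDTH: join all input lines with spaces, then break the text
# every WIDTH characters (same algorithm as wrap40.sed).
wrap() {
  sed -E ':x
N
s/\n/ /g
:y
s/(.{'"$1"'})/\1\n/
/\n/ {
P
s/.*\n//
by
}
bx'
}

wrapped=$(printf '%s\n' abcdefg hij klmno | wrap 5)
printf '%s\n' "$wrapped"
```

With a width of 5 this input is printed as @samp{abcde}, @samp{fg hi},
@samp{j klm} and @samp{no}.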
4827
4828
4829
4830
4831	@node Adding a header to multiple files
4832	@section Adding a header to multiple files
4833
4834	@value{SSED} can be used to safely modify multiple files at once.
4835
4836	@exdent Add a single line to the beginning of source code files:
4837
4838	@codequoteundirected on
4839	@codequotebacktick on
4840	@example
4841	sed -i '1i/* Copyright (C) FOO BAR */' *.c
4842	@end example
4843	@codequoteundirected off
4844	@codequotebacktick off
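Before running such a command on real sources, it can be tried on
throwaway copies. The sketch below works in a temporary directory; the
file names @file{a.c} and @file{b.c} and their contents are
hypothetical, and the one-line form of @code{i} together with
@option{-i} assumes GNU @command{sed}.

```shell
# Create two scratch files in a temporary directory.
dir=$(mktemp -d)
printf 'int a;\n' > "$dir/a.c"
printf 'int b;\n' > "$dir/b.c"

# With -i every file is edited separately, so line 1 of each file
# receives the header.
sed -i '1i/* Copyright (C) FOO BAR */' "$dir"/*.c

result_a=$(cat "$dir/a.c")
result_b=$(cat "$dir/b.c")
printf '%s\n' "$result_a" "$result_b"
rm -rf "$dir"
```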
4845
4846	@exdent Adding a few lines is possible using @samp{\n} in the text:
4847
4848	@codequoteundirected on
4849	@codequotebacktick on
4850	@example
4851	sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c
4852	@end example
4853	@codequoteundirected off
4854	@codequotebacktick off
4855
4856	To add multiple lines from another file, use @code{0rFILE}.
4857	A typical use case is adding a license notice header to all files:
4858
4859	@codequoteundirected on
4860	@codequotebacktick on
4861	@example
4862	## Create the header file:
4863	$ cat<<'EOF'>LIC.TXT
4864	/*
4865	Copyright (C) 1989-2021 FOO BAR
4866
4867	This program is free software; you can redistribute it and/or modify
4868	it under the terms of the GNU General Public License as published by
4869	the Free Software Foundation; either version 3, or (at your option)
4870	any later version.
4871
4872	This program is distributed in the hope that it will be useful,
4873	but WITHOUT ANY WARRANTY; without even the implied warranty of
4874	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4875	GNU General Public License for more details.
4876
4877	You should have received a copy of the GNU General Public License
4878	along with this program; If not, see <https://www.gnu.org/licenses/>.
4879	*/
4880	EOF
4881
## Add the file at the beginning of all source code files:
$ sed -i '0rLIC.TXT' *.cpp *.h
4884	@end example
4885	@codequoteundirected off
4886	@codequotebacktick off
4887
4888
With script files (e.g. @file{.sh}, @file{.py}, @file{.pl} files)
the license notice typically appears @emph{after} the first line (the
``shebang'' @samp{#!} line).  The @code{1rFILE} command will add @file{FILE}
@emph{after} the first line:
4893
4894	@codequoteundirected on
4895	@codequotebacktick on
4896	@example
4897	## Create the header file:
4898	$ cat<<'EOF'>LIC.TXT
4899	##
4900	## Copyright (C) 1989-2021 FOO BAR
4901	##
4902	## This program is free software; you can redistribute it and/or modify
4903	## it under the terms of the GNU General Public License as published by
4904	## the Free Software Foundation; either version 3, or (at your option)
4905	## any later version.
4906	##
4907	## This program is distributed in the hope that it will be useful,
4908	## but WITHOUT ANY WARRANTY; without even the implied warranty of
4909	## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4910	## GNU General Public License for more details.
4911	##
4912	## You should have received a copy of the GNU General Public License
4913	## along with this program; If not, see <https://www.gnu.org/licenses/>.
4914	##
4915	##
4916	EOF
4917
## Add the file after the first line of all script files:
$ sed -i '1rLIC.TXT' *.py *.sh
4920	@end example
4921	@codequoteundirected off
4922	@codequotebacktick off
4923
4924	The above @command{sed} commands can be combined with @command{find}
4925	to locate files in all subdirectories, @command{xargs} to run additional
4926	commands on selected files and @command{grep} to filter out files that already
4927	contain a copyright notice:
4928
4929	@codequoteundirected on
4930	@codequotebacktick on
@example
find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
    | xargs grep -Li copyright \
    | xargs -r sed -i '0rLIC.TXT'
@end example
4936	@codequoteundirected off
4937	@codequotebacktick off
4938
@exdent Or a slightly safer version (handling file names with spaces and newlines):
4940
4941	@codequoteundirected on
4942	@codequotebacktick on
@example
find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
    | xargs -0 grep -Z -Li copyright \
    | xargs -0 -r sed -i '0rLIC.TXT'
@end example
4948	@codequoteundirected off
4949	@codequotebacktick off
4950
Note: using the @code{0} address with the @code{r} command requires
@value{SSED} version 4.9 or later.  @xref{Zero Address}.
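
Before running such a pipeline on a real source tree, it can be tried in a
scratch directory.  The following is only a sketch: it assumes GNU
@command{sed} (for @option{-i} and the one-line form of @code{i}) and GNU
@command{xargs} (for @option{-r}), and the function name and file names are
made up for illustration.  It uses @code{1i} rather than @code{0r} so that it
does not depend on version 4.9.

```shell
# Dry run of "add a header only where it is missing" in a scratch directory.
# Assumes GNU sed (-i, one-line 'i') and GNU xargs (-r).
# add_header_demo and the file names are hypothetical.
add_header_demo() {
    dir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$dir/new.c"
    printf '/* Copyright (C) FOO BAR */\nint x;\n' > "$dir/old.c"

    # grep -L prints the names of files that do NOT match, so old.c,
    # which already carries a notice, is skipped.
    grep -Li copyright "$dir"/*.c |
        xargs -r sed -i '1i/* Copyright (C) FOO BAR */'

    # Report how many lines mention the notice in each file.
    grep -c Copyright "$dir"/*.c | sed 's|.*/||'
    rm -rf "$dir"
}

add_header_demo
```

Both files should report exactly one matching line: the notice was added to
@file{new.c} and @file{old.c} was left alone.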
4953
4954
4955
4956	@node tac
4957	@section Reverse Lines of Files
4958
4959	This one begins a series of totally useless (yet interesting)
4960	scripts emulating various Unix commands. This, in particular,
4961	is a @command{tac} workalike.
4962
4963	Note that on implementations other than GNU @command{sed}
4964	this script might easily overflow internal buffers.
4965
4966	@c start-------------------------------------------
4967	@example
4968	#!/usr/bin/sed -nf
4969
# reverse all lines of input, i.e. the first line becomes the last, ...
4971
4972	@group
4973	# from the second line, the buffer (which contains all previous lines)
4974	# is appended to current line, so, the order will be reversed
4975	1! G
4976	@end group
4977
4978	@group
4979	# on the last line we're done -- print everything
4980	$ p
4981	@end group
4982
4983	@group
4984	# store everything on the buffer again
4985	h
4986	@end group
4987	@end example
4988	@c end---------------------------------------------
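
For quick experiments the same three commands fit on one line.  This is a
sketch only; @code{reverse_lines} is a hypothetical wrapper name, not part of
@command{sed}.

```shell
# The tac script collapsed to one line:
#   1!G  append the hold space (all earlier lines) after the current line
#   $p   on the last line, print the accumulated text
#   h    save the accumulated text for the next cycle
reverse_lines() {
    sed -n '1!G;$p;h' "$@"
}

printf 'one\ntwo\nthree\n' | reverse_lines
```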
4989
4990	@node cat -n
4991	@section Numbering Lines
4992
4993	This script replaces @samp{cat -n}; in fact it formats its output
4994	exactly like GNU @command{cat} does.
4995
Of course this is completely useless, for two reasons: first,
because somebody else already did it in C; second, because the following
Bourne-shell script could be used for the same purpose and would
be much faster:
5000
5001	@c start-------------------------------------------
@example
@group
#! /bin/sh
sed -e "=" $@@ | sed -e '
  s/^/      /
  N
  s/^ *\(......\)\n/\1  /
'
@end group
@end example
5012	@c end---------------------------------------------
5013
5014	It uses @command{sed} to print the line number, then groups lines two
5015	by two using @code{N}. Of course, this script does not teach as much as
5016	the one presented below.
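
The two stages can be wrapped in a shell function to see them at work.  This
is a sketch: @code{number_lines} is a hypothetical name, and the padding of six
columns followed by two spaces is an assumption about the intended layout.

```shell
# Stage 1: "sed =" writes each line number on a line of its own,
#          followed by the line it belongs to.
# Stage 2: pad the number with six spaces, pull in the text line with N,
#          keep the last six characters of the padded number and replace
#          the newline between number and text with two spaces.
number_lines() {
    sed = "$@" | sed '
        s/^/      /
        N
        s/^ *\(......\)\n/\1  /
    '
}

printf 'foo\nbar\n' | number_lines
```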
5017
5018	The algorithm used for incrementing uses both buffers, so the line
5019	is printed as soon as possible and then discarded. The number
5020	is split so that changing digits go in a buffer and unchanged ones go
5021	in the other; the changed digits are modified in a single step
5022	(using a @code{y} command). The line number for the next line
5023	is then composed and stored in the hold space, to be used in the
5024	next iteration.
5025
5026	@c start-------------------------------------------
5027	@example
5028	#!/usr/bin/sed -nf
5029
5030	@group
5031	# Prime the pump on the first line
5032	x
5033	/^$/ s/^.*$/1/
5034	@end group
5035
5036	@group
5037	# Add the correct line number before the pattern
5038	G
5039	h
5040	@end group
5041
@group
# Format it and print it
s/^/      /
s/^ *\(......\)\n/\1  /p
@end group
5047
5048	@group
5049	# Get the line number from hold space; add a zero
5050	# if we're going to add a digit on the next line
5051	g
5052	s/\n.*$//
5053	/^9*$/ s/^/0/
5054	@end group
5055
5056	@group
5057	# separate changing/unchanged digits with an x
5058	s/.9*$/x&/
5059	@end group
5060
5061	@group
5062	# keep changing digits in hold space
5063	h
5064	s/^.*x//
5065	y/0123456789/1234567890/
5066	x
5067	@end group
5068
5069	@group
5070	# keep unchanged digits in pattern space
5071	s/x.*$//
5072	@end group
5073
5074	@group
5075	# compose the new number, remove the newline implicitly added by G
5076	G
5077	s/\n//
5078	h
5079	@end group
5080	@end example
5081	@c end---------------------------------------------
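
The increment step can be studied on its own, without the line-printing
parts.  The sketch below reads one decimal number per line and prints its
successor, using the same commands in the same order; @code{increment} is a
hypothetical wrapper name.

```shell
# Increment a decimal number with sed alone.
increment() {
    sed '
        # a number made only of 9s grows by one digit
        /^9*$/ s/^/0/
        # mark the digits that change: the last non-9 and the 9s after it
        s/.9*$/x&/
        # save the marked number, then bump the changing digits in one step
        h
        s/^.*x//
        y/0123456789/1234567890/
        # swap: bumped digits go to hold space, the marked number comes back
        x
        # keep the unchanged prefix and glue the bumped digits back on
        s/x.*$//
        G
        s/\n//
    '
}

echo 199 | increment
```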
5082
5083	@node cat -b
5084	@section Numbering Non-blank Lines
5085
5086	Emulating @samp{cat -b} is almost the same as @samp{cat -n}---we only
5087	have to select which lines are to be numbered and which are not.
5088
5089	The part that is common to this script and the previous one is
5090	not commented to show how important it is to comment @command{sed}
5091	scripts properly...
5092
5093	@c start-------------------------------------------
5094	@example
5095	#!/usr/bin/sed -nf
5096
5097	@group
5098	/^$/ @{
5099	p
5100	b
5101	@}
5102	@end group
5103
5104	@group
5105	# Same as cat -n from now
5106	x
5107	/^$/ s/^.*$/1/
5108	G
5109	h
s/^/      /
s/^ *\(......\)\n/\1  /p
5112	x
5113	s/\n.*$//
5114	/^9*$/ s/^/0/
5115	s/.9*$/x&/
5116	h
5117	s/^.*x//
5118	y/0123456789/1234567890/
5119	x
5120	s/x.*$//
5121	G
5122	s/\n//
5123	h
5124	@end group
5125	@end example
5126	@c end---------------------------------------------
5127
5128	@node wc -c
5129	@section Counting Characters
5130
5131	This script shows another way to do arithmetic with @command{sed}.
5132	In this case we have to add possibly large numbers, so implementing
5133	this by successive increments would not be feasible (and possibly
5134	even more complicated to contrive than this script).
5135
5136	The approach is to map numbers to letters, kind of an abacus
5137	implemented with @command{sed}. @samp{a}s are units, @samp{b}s are
5138	tens and so on: we simply add the number of characters
5139	on the current line as units, and then propagate the carry
5140	to tens, hundreds, and so on.
5141
5142	As usual, running totals are kept in hold space.
5143
5144	On the last line, we convert the abacus form back to decimal.
5145	For the sake of variety, this is done with a loop rather than
5146	with some 80 @code{s} commands@footnote{Some implementations
5147	have a limit of 199 commands per script}: first we
5148	convert units, removing @samp{a}s from the number; then we
5149	rotate letters so that tens become @samp{a}s, and so on
5150	until no more letters remain.
5151
5152	@c start-------------------------------------------
5153	@example
5154	#!/usr/bin/sed -nf
5155
5156	@group
5157	# Add n+1 a's to hold space (+1 is for the newline)
5158	s/./a/g
5159	H
5160	x
5161	s/\n/a/
5162	@end group
5163
5164	@group
5165	# Do the carry. The t's and b's are not necessary,
5166	# but they do speed up the thing
5167	t a
5168	: a; s/aaaaaaaaaa/b/g; t b; b done
5169	: b; s/bbbbbbbbbb/c/g; t c; b done
5170	: c; s/cccccccccc/d/g; t d; b done
5171	: d; s/dddddddddd/e/g; t e; b done
5172	: e; s/eeeeeeeeee/f/g; t f; b done
5173	: f; s/ffffffffff/g/g; t g; b done
5174	: g; s/gggggggggg/h/g; t h; b done
5175	: h; s/hhhhhhhhhh//g
5176	@end group
5177
5178	@group
5179	: done
5180	$! @{
5181	h
5182	b
5183	@}
5184	@end group
5185
5186	# On the last line, convert back to decimal
5187
5188	@group
5189	: loop
5190	/a/! s/[b-h]*/&0/
5191	s/aaaaaaaaa/9/
5192	s/aaaaaaaa/8/
5193	s/aaaaaaa/7/
5194	s/aaaaaa/6/
5195	s/aaaaa/5/
5196	s/aaaa/4/
5197	s/aaa/3/
5198	s/aa/2/
5199	s/a/1/
5200	@end group
5201
5202	@group
5203	: next
5204	y/bcdefgh/abcdefg/
5205	/[a-h]/ b loop
5206	p
5207	@end group
5208	@end example
5209	@c end---------------------------------------------
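
To watch the abacus at work on a small scale, the following sketch counts the
characters of each input line separately, with no running total and no extra
unit for the newline.  It carries only as far as hundreds, so it is an
illustration limited to lines shorter than 1000 characters;
@code{count_chars} is a hypothetical name.

```shell
# Per-line character count using the abacus representation.
count_chars() {
    sed '
        # one "a" (unit) per character
        s/./a/g
        # carry: ten units make a "b", ten "b"s make a "c"
        s/aaaaaaaaaa/b/g
        s/bbbbbbbbbb/c/g
        # convert back to decimal, least significant digit first
        :loop
        /a/! s/[b-h]*/&0/
        s/aaaaaaaaa/9/
        s/aaaaaaaa/8/
        s/aaaaaaa/7/
        s/aaaaaa/6/
        s/aaaaa/5/
        s/aaaa/4/
        s/aaa/3/
        s/aa/2/
        s/a/1/
        # rotate: tens become units, hundreds become tens, ...
        y/bcdefgh/abcdefg/
        /[a-h]/ b loop
    '
}

printf 'hello, world\n' | count_chars
```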
5210
5211	@node wc -w
5212	@section Counting Words
5213
5214	This script is almost the same as the previous one, once each
5215	of the words on the line is converted to a single @samp{a}
5216	(in the previous script each letter was changed to an @samp{a}).
5217
It is interesting that real @command{wc} programs have optimized
loops for @samp{wc -c}, so they are much slower at counting
words than at counting characters.  This script's bottleneck,
instead, is arithmetic, and hence the word-counting one
is faster (it has to manage smaller numbers).
5223
5224	Again, the common parts are not commented to show the importance
5225	of commenting @command{sed} scripts.
5226
5227	@c start-------------------------------------------
5228	@example
5229	#!/usr/bin/sed -nf
5230
5231	@group
5232	# Convert words to a's
5233	s/[ @kbd{@key{TAB}}][ @kbd{@key{TAB}}]*/ /g
5234	s/^/ /
5235	s/ [^ ][^ ]*/a /g
5236	s/ //g
5237	@end group
5238
5239	@group
5240	# Append them to hold space
5241	H
5242	x
5243	s/\n//
5244	@end group
5245
5246	@group
5247	# From here on it is the same as in wc -c.
5248	/aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
5249	/bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
5250	/cccccccccc/! bx; s/cccccccccc/d/g
5251	/dddddddddd/! bx; s/dddddddddd/e/g
5252	/eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
5253	/ffffffffff/! bx; s/ffffffffff/g/g
5254	/gggggggggg/! bx; s/gggggggggg/h/g
5255	s/hhhhhhhhhh//g
5256	:x
5257	$! @{ h; b; @}
5258	:y
5259	/a/! s/[b-h]*/&0/
5260	s/aaaaaaaaa/9/
5261	s/aaaaaaaa/8/
5262	s/aaaaaaa/7/
5263	s/aaaaaa/6/
5264	s/aaaaa/5/
5265	s/aaaa/4/
5266	s/aaa/3/
5267	s/aa/2/
5268	s/a/1/
5269	y/bcdefgh/abcdefg/
5270	/[a-h]/ by
5271	p
5272	@end group
5273	@end example
5274	@c end---------------------------------------------
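
The only new part, turning each word into a single @samp{a}, can be tried in
isolation.  In this sketch @code{[[:blank:]]} stands in for the literal space
and TAB of the script above, and @code{words_to_marks} is a hypothetical name.

```shell
# Replace every word on a line with one "a".
words_to_marks() {
    sed '
        # collapse each run of spaces and tabs to one space
        s/[[:blank:]][[:blank:]]*/ /g
        # make sure the first word is preceded by a space too
        s/^/ /
        # each " word" becomes "a "
        s/ [^ ][^ ]*/a /g
        # drop the separators, leaving one "a" per word
        s/ //g
    '
}

printf 'it was  the\tbest of times\n' | words_to_marks
```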
5275
5276	@node wc -l
5277	@section Counting Lines
5278
5279	No strange things are done now, because @command{sed} gives us
5280	@samp{wc -l} functionality for free!!! Look:
5281
5282	@c start-------------------------------------------
5283	@example
5284	@group
5285	#!/usr/bin/sed -nf
5286	$=
5287	@end group
5288	@end example
5289	@c end---------------------------------------------
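
As a usage sketch (@code{count_lines} is a hypothetical wrapper name): the
@code{=} command prints the current line number, and the @code{$} address
restricts it to the last line.  One difference from @samp{wc -l} is that empty
input has no last line, so nothing is printed at all.

```shell
# Print the number of the last input line, i.e. the line count.
count_lines() {
    sed -n '$=' "$@"
}

printf 'a\nb\nc\n' | count_lines
```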
5290
5291	@node head
5292	@section Printing the First Lines
5293
5294	This script is probably the simplest useful @command{sed} script.
5295	It displays the first 10 lines of input; the number of displayed
5296	lines is right before the @code{q} command.
5297
5298	@c start-------------------------------------------
5299	@example
5300	@group
5301	#!/usr/bin/sed -f
5302	10q
5303	@end group
5304	@end example
5305	@c end---------------------------------------------
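
Since the count is just the address in front of @code{q}, a shell wrapper can
supply it as a parameter.  This is a sketch; @code{first_lines} is a
hypothetical name, and the default of 10 mirrors the script above.

```shell
# Print the first N lines of standard input (default 10).
first_lines() {
    sed "${1:-10}q"
}

printf '%s\n' a b c d | first_lines 2
```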
5306
5307	@node tail
5308	@section Printing the Last Lines
5309
5310	Printing the last @var{n} lines rather than the first is more complex
5311	but indeed possible. @var{n} is encoded in the second line, before
5312	the bang character.
5313
5314	This script is similar to the @command{tac} script in that it keeps the
5315	final output in the hold space and prints it at the end:
5316
5317	@c start-------------------------------------------
5318	@example
5319	#!/usr/bin/sed -nf
5320
5321	@group
5322	1! @{; H; g; @}
5323	1,10 !s/[^\n]*\n//
5324	$p
5325	h
5326	@end group
5327	@end example
5328	@c end---------------------------------------------
5329
Mainly, the script keeps a window of 10 lines and slides it
by adding a line and deleting the oldest (the substitution command
on the second line works like a @code{D} command but does not
restart the loop).
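
The window size can be made a parameter with a small shell wrapper.  This is
a sketch that assumes GNU @command{sed}, because it relies on @code{\n} inside
a bracket expression; @code{last_lines} is a hypothetical name.

```shell
# Print the last N lines of standard input (default 10).
last_lines() {
    n=${1:-10}
    # 1!{H;g;}  from line 2 on, append the line to the window and reload it
    # 1,N!s...  once the window is full, drop its oldest line
    # $p        on the last line, print the window
    # h         save the window for the next cycle
    sed -n -e '1!{H;g;}' \
           -e "1,${n}"'!s/[^\n]*\n//' \
           -e '$p' \
           -e h
}

printf '%s\n' 1 2 3 4 5 | last_lines 3
```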
5334
5335	The ``sliding window'' technique is a very powerful way to write
5336	efficient and complex @command{sed} scripts, because commands like
5337	@code{P} would require a lot of work if implemented manually.
5338
5339	To introduce the technique, which is fully demonstrated in the
5340	rest of this chapter and is based on the @code{N}, @code{P}
5341	and @code{D} commands, here is an implementation of @command{tail}
5342	using a simple ``sliding window.''
5343
5344	This looks complicated but in fact the working is the same as
5345	the last script: after we have kicked in the appropriate number
5346	of lines, however, we stop using the hold space to keep inter-line
5347	state, and instead use @code{N} and @code{D} to slide pattern
5348	space by one line:
5349
5350	@c start-------------------------------------------
5351	@example
5352	#!/usr/bin/sed -f
5353
5354	@group
5355	1h
5356	2,10 @{; H; g; @}
5357	$q
5358	1,9d
5359	N
5360	D
5361	@end group
5362	@end example
5363	@c end---------------------------------------------
5364
Note how the first, second and fourth lines are inactive after
the first ten lines of input.  After that, all the script does
is exit on the last line of input, append the next input
line to pattern space, and remove the first line.
5369
5370	@node uniq
5371	@section Make Duplicate Lines Unique
5372
5373	This is an example of the art of using the @code{N}, @code{P}
5374	and @code{D} commands, probably the most difficult to master.
5375
5376	@c start-------------------------------------------
5377	@example
5378	@group
5379	#!/usr/bin/sed -f
5380	h
5381	@end group
5382
5383	@group
5384	:b
5385	# On the last line, print and exit
5386	$b
5387	N
/^\(.*\)\n\1$/ @{
# The two lines are identical. Undo the effect of
# the N command.
5391	g
5392	bb
5393	@}
5394	@end group
5395
5396	@group
5397	# If the @code{N} command had added the last line, print and exit
5398	$b
5399	@end group
5400
5401	@group
5402	# The lines are different; print the first and go
5403	# back working on the second.
5404	P
5405	D
5406	@end group
5407	@end example
5408	@c end---------------------------------------------
5409
5410	As you can see, we maintain a 2-line window using @code{P} and @code{D}.
5411	This technique is often used in advanced @command{sed} scripts.
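
A quick way to try the two-line window is to wrap the script in a shell
function and feed it a few repeated lines.  This is a sketch;
@code{squeeze_repeats} is a hypothetical name.

```shell
# Collapse runs of identical adjacent lines into one line.
squeeze_repeats() {
    sed '
        h
        :b
        # on the last line, print and exit
        $b
        N
        /^\(.*\)\n\1$/ {
            # identical pair: go back to the single saved line
            g
            bb
        }
        # if N has just read the last line, print both and exit
        $b
        # different lines: print the first, keep working on the second
        P
        D
    '
}

printf '%s\n' a a b b b c | squeeze_repeats
```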
5412
5413	@node uniq -d
5414	@section Print Duplicated Lines of Input
5415
5416	This script prints only duplicated lines, like @samp{uniq -d}.
5417
5418	@c start-------------------------------------------
5419	@example
5420	#!/usr/bin/sed -nf
5421
5422	@group
5423	$b
5424	N
/^\(.*\)\n\1$/ @{
5426	# Print the first of the duplicated lines
5427	s/.*\n//
5428	p
5429	@end group
5430
5431	@group
5432	# Loop until we get a different line
5433	:b
5434	$b
5435	N
/^\(.*\)\n\1$/ @{
5437	s/.*\n//
5438	bb
5439	@}
5440	@}
5441	@end group
5442
5443	@group
5444	# The last line cannot be followed by duplicates
5445	$b
5446	@end group
5447
5448	@group
5449	# Found a different one. Leave it alone in the pattern space
5450	# and go back to the top, hunting its duplicates
5451	D
5452	@end group
5453	@end example
5454	@c end---------------------------------------------
5455
5456	@node uniq -u
5457	@section Remove All Duplicated Lines
5458
5459	This script prints only unique lines, like @samp{uniq -u}.
5460
5461	@c start-------------------------------------------
5462	@example
5463	#!/usr/bin/sed -f
5464
5465	@group
5466	# Search for a duplicate line --- until that, print what you find.
5467	$b
5468	N
/^\(.*\)\n\1$/ ! @{
5470	P
5471	D
5472	@}
5473	@end group
5474
5475	@group
5476	:c
5477	# Got two equal lines in pattern space. At the
5478	# end of the file we simply exit
5479	$d
5480	@end group
5481
5482	@group
5483	# Else, we keep reading lines with @code{N} until we
5484	# find a different one
5485	s/.*\n//
5486	N
/^\(.*\)\n\1$/ @{
5488	bc
5489	@}
5490	@end group
5491
5492	@group
5493	# Remove the last instance of the duplicate line
5494	# and go back to the top
5495	D
5496	@end group
5497	@end example
5498	@c end---------------------------------------------
5499
5500	@node cat -s
5501	@section Squeezing Blank Lines
5502
5503	As a final example, here are three scripts, of increasing complexity
5504	and speed, that implement the same function as @samp{cat -s}, that is
5505	squeezing blank lines.
5506
5507	The first leaves a blank line at the beginning and end if there are
5508	some already.
5509
5510	@c start-------------------------------------------
5511	@example
5512	#!/usr/bin/sed -f
5513
5514	@group
5515	# on empty lines, join with next
5516	# Note there is a star in the regexp
5517	:x
5518	/^\n*$/ @{
5519	N
5520	bx
5521	@}
5522	@end group
5523
5524	@group
# now, squeeze all '\n', this can be also done by:
# s/^\(\n\)*/\1/
# (\n\n* rather than \n*, so a line with no leading newline is left alone)
s/\n\n*/\
/
5529	@end group
5530	@end example
5531	@c end---------------------------------------------
5532
5533	This one is a bit more complex and removes all empty lines
5534	at the beginning. It does leave a single blank line at end
5535	if one was there.
5536
5537	@c start-------------------------------------------
5538	@example
5539	#!/usr/bin/sed -f
5540
5541	@group
# delete all leading empty lines; the 0 address (a GNU extension) lets
# the range end on line 1 when the first line is already non-empty
0,/^./@{
5544	/./!d
5545	@}
5546	@end group
5547
5548	@group
5549	# on an empty line we remove it and all the following
5550	# empty lines, but one
5551	:x
5552	/./!@{
5553	N
5554	s/^\n$//
5555	tx
5556	@}
5557	@end group
5558	@end example
5559	@c end---------------------------------------------
5560
5561	This removes leading and trailing blank lines. It is also the
5562	fastest. Note that loops are completely done with @code{n} and
5563	@code{b}, without relying on @command{sed} to restart the
5564	script automatically at the end of a line.
5565
5566	@c start-------------------------------------------
5567	@example
5568	#!/usr/bin/sed -nf
5569
5570	@group
5571	# delete all (leading) blanks
5572	/./!d
5573	@end group
5574
5575	@group
5576	# get here: so there is a non empty
5577	:x
5578	# print it
5579	p
5580	# get next
5581	n
5582	# got chars? print it again, etc...
5583	/./bx
5584	@end group
5585
5586	@group
5587	# no, don't have chars: got an empty line
5588	:z
5589	# get next, if last line we finish here so no trailing
5590	# empty lines are written
5591	n
5592	# also empty? then ignore it, and get next... this will
5593	# remove ALL empty lines
5594	/./!bz
5595	@end group
5596
5597	@group
5598	# all empty lines were deleted/ignored, but we have a non empty. As
5599	# what we want to do is to squeeze, insert a blank line artificially
5600	i\
5601	@end group
5602
5603	bx
5604	@end example
5605	@c end---------------------------------------------
5606
5607	@node Limitations
5608	@chapter @value{SSED}'s Limitations and Non-limitations
5609
5610	@cindex GNU extensions, unlimited line length
5611	@cindex Portability, line length limitations
5612	For those who want to write portable @command{sed} scripts,
5613	be aware that some implementations have been known to
5614	limit line lengths (for the pattern and hold spaces)
5615	to be no more than 4000 bytes.
5616	The @sc{posix} standard specifies that conforming @command{sed}
5617	implementations shall support at least 8192 byte line lengths.
5618	@value{SSED} has no built-in limit on line length;
5619	as long as it can @code{malloc()} more (virtual) memory,
5620	you can feed or construct lines as long as you like.
5621
5622	However, recursion is used to handle subpatterns and indefinite
5623	repetition. This means that the available stack space may limit
5624	the size of the buffer that can be processed by certain patterns.
5625
5626
5627	@node Other Resources
5628	@chapter Other Resources for Learning About @command{sed}
5629
For up-to-date information about @value{SSED} please
visit @uref{https://www.gnu.org/software/sed/}.
5632
5633	Send general questions and suggestions to @email{sed-devel@@gnu.org}.
5634	Visit the mailing list archives for past discussions at
5635	@uref{https://lists.gnu.org/archive/html/sed-devel/}.
5636
5637	@cindex Additional reading about @command{sed}
The following resources provide information about @command{sed}
(both @value{SSED} and other variations).  Note that these are not
maintained by the @value{SSED} developers.
5641
5642	@itemize @bullet
5643
5644	@item
5645	sed @code{$HOME}: @uref{http://sed.sf.net}
5646
5647	@item
5648	sed FAQ: @uref{http://sed.sf.net/sedfaq.html}
5649
5650	@item
5651	seder's grabbag: @uref{http://sed.sf.net/grabbag}
5652
5653	@item
5654	The @code{sed-users} mailing list maintained by Sven Guckes:
5655	@uref{http://groups.yahoo.com/group/sed-users/}
5656	(note this is @emph{not} the @value{SSED} mailing list).
5657
5658	@end itemize
5659
5660	@node Reporting Bugs
5661	@chapter Reporting Bugs
5662
5663	@cindex Bugs, reporting
5664	Email bug reports to @email{bug-sed@@gnu.org}.
5665	Also, please include the output of @samp{sed --version} in the body
5666	of your report if at all possible.
5667
5668	Please do not send a bug report like this:
5669
5670	@example
5671	@i{@i{@r{while building frobme-1.3.4}}}
5672	$ configure
5673	@error{} sed: file sedscr line 1: Unknown option to 's'
5674	@end example
5675
5676	If @value{SSED} doesn't configure your favorite package, take a
5677	few extra minutes to identify the specific problem and make a stand-alone
5678	test case. Unlike other programs such as C compilers, making such test
5679	cases for @command{sed} is quite simple.
5680
5681	A stand-alone test case includes all the data necessary to perform the
5682	test, and the specific invocation of @command{sed} that causes the problem.
5683	The smaller a stand-alone test case is, the better. A test case should
5684	not involve something as far removed from @command{sed} as ``try to configure
5685	frobme-1.3.4''. Yes, that is in principle enough information to look
5686	for the bug, but that is not a very practical prospect.
5687
5688	Here are a few commonly reported bugs that are not bugs.
5689
5690	@table @asis
5691	@anchor{N_command_last_line}
5692	@item @code{N} command on the last line
5693	@cindex Portability, @code{N} command on the last line
5694	@cindex Non-bugs, @code{N} command on the last line
5695
5696	Most versions of @command{sed} exit without printing anything when
5697	the @command{N} command is issued on the last line of a file.
5698	@value{SSED} prints pattern space before exiting unless of course
5699	the @command{-n} command switch has been specified. This choice is
5700	by design.
5701
Default behavior (GNU extension, non-POSIX conforming):
@example
$ seq 3 | sed N
5705	1
5706	2
5707	3
5708	@end example
5709	@noindent
5710	To force POSIX-conforming behavior:
5711	@example
$ seq 3 | sed --posix N
5713	1
5714	2
5715	@end example
5716
5717	For example, the behavior of
5718	@example
5719	sed N foo bar
5720	@end example
5721	@noindent
5722	would depend on whether foo has an even or an odd number of
5723	lines@footnote{which is the actual ``bug'' that prompted the
5724	change in behavior}. Or, when writing a script to read the
5725	next few lines following a pattern match, traditional
5726	implementations of @code{sed} would force you to write
5727	something like
5728	@example
5729	/foo/@{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N @}
5730	@end example
5731	@noindent
5732	instead of just
5733	@example
5734	/foo/@{ N;N;N;N;N;N;N;N;N; @}
5735	@end example
5736
5737	@cindex @code{POSIXLY_CORRECT} behavior, @code{N} command
5738	In any case, the simplest workaround is to use @code{$d;N} in
5739	scripts that rely on the traditional behavior, or to set
5740	the @code{POSIXLY_CORRECT} variable to a non-empty value.
5741
5742	@item Regex syntax clashes (problems with backslashes)
5743	@cindex GNU extensions, to basic regular expressions
5744	@cindex Non-bugs, regex syntax clashes
@command{sed} uses the @sc{posix} basic regular expression syntax.  According to
the standard, the meaning of some escape sequences is undefined in
this syntax; notable in the case of @command{sed} are @code{\|},
@code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<},
@code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.

As in all GNU programs that use @sc{posix} basic regular
expressions, @command{sed} interprets these escape sequences as special
characters.  So, @code{x\+} matches one or more occurrences of @samp{x}.
@code{abc\|def} matches either @samp{abc} or @samp{def}.

This syntax may cause problems when running scripts written for other
@command{sed}s.  Some @command{sed} programs have been written with the
assumption that @code{\|} and @code{\+} match the literal characters
@code{|} and @code{+}.  Such scripts must be modified by removing the
spurious backslashes if they are to be used with modern implementations
of @command{sed}, like
GNU @command{sed}.

On the other hand, some scripts use @code{s|abc\|def||g} to remove occurrences
of @emph{either} @code{abc} or @code{def}.  While this worked until
@command{sed} 4.0.x, newer versions interpret this as removing the
string @code{abc|def}.  This is again undefined behavior according to
POSIX, and this interpretation is arguably more robust: older
@command{sed}s, for example, required that the regex matcher parsed
@code{\/} as @code{/} in the common case of escaping a slash, which is
again undefined behavior; the new behavior avoids this, and this is good
because the regex matcher is only partially under our control.
5773
5774	@cindex GNU extensions, special escapes
5775	In addition, this version of @command{sed} supports several escape characters
5776	(some of which are multi-character) to insert non-printable characters
5777	in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r},
5778	@code{\t}, @code{\v}, @code{\x}). These can cause similar problems
5779	with scripts written for other @command{sed}s.
5780
5781	@item @option{-i} clobbers read-only files
5782	@cindex In-place editing
5783	@cindex @value{SSEDEXT}, in-place editing
5784	@cindex Non-bugs, in-place editing
5785
5786	In short, @samp{sed -i} will let you delete the contents of
5787	a read-only file, and in general the @option{-i} option
5788	(@pxref{Invoking sed, , Invocation}) lets you clobber
5789	protected files. This is not a bug, but rather a consequence
5790	of how the Unix file system works.
5791
5792	The permissions on a file say what can happen to the data
5793	in that file, while the permissions on a directory say what can
5794	happen to the list of files in that directory. @samp{sed -i}
5795	will not ever open for writing a file that is already on disk.
5796	Rather, it will work on a temporary file that is finally renamed
5797	to the original name: if you rename or delete files, you're actually
5798	modifying the contents of the directory, so the operation depends on
5799	the permissions of the directory, not of the file. For this same
5800	reason, @command{sed} does not let you use @option{-i} on a writable file
5801	in a read-only directory, and will break hard or symbolic links when
5802	@option{-i} is used on such a file.
5803
5804	@item @code{0a} does not work (gives an error)
5805	@cindex @code{0} address
5806	@cindex GNU extensions, @code{0} address
5807	@cindex Non-bugs, @code{0} address
5808
5809	There is no line 0. 0 is a special address that is only used to treat
5810	addresses like @code{0,/@var{RE}/} as active when the script starts: if
5811	you write @code{1,/abc/d} and the first line includes the string @samp{abc},
5812	then that match would be ignored because address ranges must span at least
5813	two lines (barring the end of the file); but what you probably wanted is
5814	to delete every line up to the first one including @samp{abc}, and this
5815	is obtained with @code{0,/abc/d}.
5816
5817	@ifclear PERL
5818	@item @code{[a-z]} is case insensitive
5819	@cindex Non-bugs, localization-related
5820
5821	You are encountering problems with locales. POSIX mandates that @code{[a-z]}
5822	uses the current locale's collation order -- in C parlance, that means using
5823	@code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
5824	case-insensitive collation order, others don't.
5825
5826	Another problem is that @code{[a-z]} tries to use collation symbols.
5827	This only happens if you are on the GNU system, using
5828	GNU libc's regular expression matcher instead of compiling the
5829	one supplied with GNU sed. In a Danish locale, for example,
5830	the regular expression @code{^[a-z]$} matches the string @samp{aa},
5831	because this is a single collating symbol that comes after @samp{a}
5832	and before @samp{b}; @samp{ll} behaves similarly in Spanish
5833	locales, or @samp{ij} in Dutch locales.
5834
5835	To work around these problems, which may cause bugs in shell scripts, set
5836	the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
5837
5838	@item @code{s/.*//} does not clear pattern space
5839	@cindex Non-bugs, localization-related
5840	@cindex @value{SSEDEXT}, emptying pattern space
5841	@cindex Emptying pattern space
5842
5843	This happens if your input stream includes invalid multibyte
5844	sequences. @sc{posix} mandates that such sequences
5845	are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
5846	pattern space as you would expect. In fact, there is no way to clear
5847	sed's buffers in the middle of the script in most multibyte locales
(including UTF-8 locales).  For this reason, @value{SSED} provides a @code{z}
command (for ``zap'') as an extension.
5850
5851	To work around these problems, which may cause bugs in shell scripts, set
5852	the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
5853	@end ifclear
5854	@end table
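
The difference between a range starting at line 1 and one starting at the
special address 0, described in the table above, is easy to check from the
shell.  This sketch assumes GNU @command{sed}, since the @code{0} address is
a GNU extension; @code{sample} is a hypothetical helper that produces four
lines of test input.

```shell
# Four lines of input; the first and third contain "abc".
sample() {
    printf 'abc\nx\nabc\ny\n'
}

# Range starts on line 1, so the end pattern is first tried on line 2:
# lines 1 through 3 are deleted and only "y" survives.
sample | sed '1,/abc/d'

# Range is already active when line 1 is read, so it ends right there:
# only line 1 is deleted.
sample | sed '0,/abc/d'
```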
5855
5856
5857
5858
5859	@page
5860	@node GNU Free Documentation License
5861	@appendix GNU Free Documentation License
5862
5863	@include fdl.texi
5864
5865
5866	@page
5867	@node Concept Index
5868	@unnumbered Concept Index
5869
5870	This is a general index of all issues discussed in this manual, with the
5871	exception of the @command{sed} commands and command-line options.
5872
5873	@printindex cp
5874
5875	@page
5876	@node Command and Option Index
5877	@unnumbered Command and Option Index
5878
5879	This is an alphabetical list of all @command{sed} commands and command-line
5880	options.
5881
5882	@printindex fn
5883
5884	@contents
5885	@bye
5886
5887	@c XXX FIXME: the term "cycle" is never defined...
