\input texinfo @c -*-texinfo-*-
@c
@c -- Stuff that needs adding: ----------------------------------------------
@c (nothing!)
@c --------------------------------------------------------------------------
@c Check for consistency: regexps in @code, text that they match in @samp.
@c
@c Tips:
@c @command for command
@c @samp for command fragments: @samp{cat -s}
@c @code for sed commands and flags
@c Use ``quote'' not `quote' or "quote".
@c
@c %**start of header
@setfilename sed.info
@settitle sed, a stream editor
@c %**end of header

@c @smallbook

@include version.texi

@c Combine indices.
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@defcodeindex op
@syncodeindex op fn

@include config.texi

@copying
This file documents version @value{VERSION} of
@value{SSED}, a stream editor.

Copyright @copyright{} 1998--2022 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no
Back-Cover Texts. A copy of the license is included in the
section entitled ``GNU Free Documentation License''.
@end quotation
@end copying

@setchapternewpage off

@titlepage
@title @value{SSED}, a stream editor
@subtitle version @value{VERSION}, @value{UPDATED}
@author by Ken Pizzini, Paolo Bonzini, Jim Meyering, Assaf Gordon

@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@ifnottex
@node Top
@top @value{SSED}

@insertcopying
@end ifnottex

@menu
* Introduction:: Introduction
* Invoking sed:: Invocation
* sed scripts:: @command{sed} scripts
* sed addresses:: Addresses: selecting lines
* sed regular expressions:: Regular expressions: selecting text
* advanced sed:: Advanced @command{sed}: cycles and buffers
* Examples:: Some sample scripts
* Limitations:: Limitations and (non-)limitations of @value{SSED}
* Other Resources:: Other resources for learning about @command{sed}
* Reporting Bugs:: Reporting bugs
* GNU Free Documentation License:: Copying and sharing this manual
* Concept Index:: A menu with all the topics in this manual.
* Command and Option Index:: A menu with all @command{sed} commands and
                                  command-line options.
@end menu


@node Introduction
@chapter Introduction

@cindex Stream editor
@command{sed} is a stream editor.
A stream editor is used to perform basic text
transformations on an input stream
(a file or input from a pipeline).
While in some ways similar to an editor which
permits scripted edits (such as @command{ed}),
@command{sed} works by making only one pass over the
input(s), and is consequently more efficient.
But it is @command{sed}'s ability to filter text in a pipeline
which particularly distinguishes it from other types of
editors.


@node Invoking sed
@chapter Running sed

This chapter covers how to run @command{sed}. Details of @command{sed}
scripts and individual @command{sed} commands are discussed in the
next chapter.

@menu
* Overview::
* Command-Line Options::
* Exit status::
@end menu


@node Overview
@section Overview
Normally @command{sed} is invoked like this:

@example
sed SCRIPT INPUTFILE...
@end example

For example, to change every @samp{hello} to @samp{world}
in the file @file{input.txt}:

@example
sed 's/hello/world/g' input.txt > output.txt
@end example

Without the @samp{g} (global) modifier, @command{sed} affects
only the first instance per line.

@cindex stdin
@cindex standard input
If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
@command{sed} filters the contents of the standard input. The following
commands are equivalent:

@example
sed 's/hello/world/g' input.txt > output.txt
sed 's/hello/world/g' < input.txt > output.txt
cat input.txt | sed 's/hello/world/g' - > output.txt
@end example

@cindex stdout
@cindex output
@cindex standard output
@cindex -i, example
@command{sed} writes output to standard output. Use @option{-i} to edit
files in-place instead of printing to standard output.
See also the @code{W} and @code{s///w} commands for writing output to
other files. The following command modifies @file{file.txt} and
does not produce any output:

@example
sed -i 's/hello/world/' file.txt
@end example

@cindex -n, example
@cindex p, example
@cindex suppressing output
@cindex output, suppressing
By default @command{sed} prints all processed input (except input
that has been deleted by commands such as @code{d}).
Use @option{-n} to suppress output, and the @code{p} command
to print specific lines. The following command prints only line 45
of the input file:

@example
sed -n '45p' file.txt
@end example


@cindex multiple files
@cindex -s, example
@command{sed} treats multiple input files as one long stream.
The following example prints the first line of the first file
(@file{one.txt}) and the last line of the last file (@file{three.txt}).
Use @option{-s} to treat each file separately instead.

@example
sed -n '1p ; $p' one.txt two.txt three.txt
@end example
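
To see the difference, suppose there are three small input files. Their
names and contents are invented for this illustration. Without @option{-s},
only the first line of @file{one.txt} and the last line of @file{three.txt}
are printed. With @option{-s}, the addresses @samp{1} and @samp{$} apply to
each file in turn:

@example
# Hypothetical sample files, three lines each.
printf '1a\n1b\n1c\n' > one.txt
printf '2a\n2b\n2c\n' > two.txt
printf '3a\n3b\n3c\n' > three.txt

# One long stream: prints 1a and 3c.
sed -n '1p ; $p' one.txt two.txt three.txt

# Separate files: prints 1a, 1c, 2a, 2c, 3a and 3c.
sed -s -n '1p ; $p' one.txt two.txt three.txt
@end example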

@cindex -e, example
@cindex --expression, example
@cindex -f, example
@cindex --file, example
@cindex script parameter
@cindex parameters, script
Without @option{-e} or @option{-f} options, @command{sed} uses
the first non-option parameter as the @var{script}, and the following
non-option parameters as input files.
If @option{-e} or @option{-f} options are used to specify a @var{script},
all non-option parameters are taken as input files.
Options @option{-e} and @option{-f} can be combined, and can appear
multiple times (in which case the final effective @var{script} will be
the concatenation of all the individual @var{script}s).

The following examples are equivalent:

@example
sed 's/hello/world/' input.txt > output.txt

sed -e 's/hello/world/' input.txt > output.txt
sed --expression='s/hello/world/' input.txt > output.txt

echo 's/hello/world/' > myscript.sed
sed -f myscript.sed input.txt > output.txt
sed --file=myscript.sed input.txt > output.txt
@end example


@node Command-Line Options
@section Command-Line Options

The full format for invoking @command{sed} is:

@example
sed OPTIONS... [SCRIPT] [INPUTFILE...]
@end example

@command{sed} may be invoked with the following command-line options:

@table @code
@item --version
@opindex --version
@cindex Version, printing
Print out the version of @command{sed} that is being run and a copyright notice,
then exit.

@item --help
@opindex --help
@cindex Usage summary, printing
Print a usage message briefly summarizing these command-line options
and the bug-reporting address,
then exit.

@item -n
@itemx --quiet
@itemx --silent
@opindex -n
@opindex --quiet
@opindex --silent
@cindex Disabling autoprint, from command line
By default, @command{sed} prints out the pattern space
at the end of each cycle through the script (@pxref{Execution Cycle, ,
How @code{sed} works}).
These options disable this automatic printing,
and @command{sed} only produces output when explicitly told to
via the @code{p} command.

@item --debug
@opindex --debug
@cindex @value{SSEDEXT}, debug
Print the input @command{sed} program in canonical form
and annotate program execution.
@codequotebacktick on
@codequoteundirected on
@example
$ echo 1 | sed '\%1%s21232'
3

$ echo 1 | sed --debug '\%1%s21232'
SED PROGRAM:
/1/ s/1/3/
INPUT: 'STDIN' line 1
PATTERN: 1
COMMAND: /1/ s/1/3/
PATTERN: 3
END-OF-CYCLE:
3
@end example
@codequotebacktick off
@codequoteundirected off

@item -e @var{script}
@itemx --expression=@var{script}
@opindex -e
@opindex --expression
@cindex Script, from command line
Add the commands in @var{script} to the set of commands to be
run while processing the input.

@item -f @var{script-file}
@itemx --file=@var{script-file}
@opindex -f
@opindex --file
@cindex Script, from a file
Add the commands contained in the file @var{script-file}
to the set of commands to be run while processing the input.

@item -i[@var{SUFFIX}]
@itemx --in-place[=@var{SUFFIX}]
@opindex -i
@opindex --in-place
@cindex In-place editing, activating
@cindex @value{SSEDEXT}, in-place editing
This option specifies that files are to be edited in-place.
@value{SSED} does this by creating a temporary file and
sending output to this file rather than to the standard
output.@footnote{This applies to commands such as @code{=},
@code{a}, @code{c}, @code{i}, @code{l}, @code{p}. You can
still write to the standard output by using the @code{w}
@cindex @value{SSEDEXT}, @file{/dev/stdout} file
or @code{W} commands together with the @file{/dev/stdout}
special file}.

This option implies @option{-s}.

When the end of the file is reached, the temporary file is
renamed to the output file's original name. The extension,
if supplied, is used to modify the name of the old file
before renaming the temporary file, thereby making a backup
copy.@footnote{Note that @value{SSED} creates the backup
file whether or not any output is actually changed.}

@cindex In-place editing, Perl-style backup file names
The backup name is built as follows: if the extension doesn't contain a @code{*},
then it is appended to the end of the current filename as a
suffix; if the extension does contain one or more @code{*}
characters, then @emph{each} asterisk is replaced with the
current filename. This allows you to add a prefix to the
backup file, instead of (or in addition to) a suffix, or
even to place backup copies of the original files into another
directory (provided the directory already exists).
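
The following example shows both naming rules. It uses a throwaway
@file{file.txt} and a @file{bak} directory, both invented for the
illustration. A suffix without @code{*} is appended to the file name. A
suffix containing @code{*} has the asterisk replaced by the file name, which
here places the backup inside @file{bak}:

@example
printf 'hello\n' > file.txt
mkdir -p bak

# No '*' in the suffix: the backup is file.txt.orig.
sed -i.orig 's/hello/world/' file.txt

# '*' in the suffix: the backup is bak/file.txt.
sed -i'bak/*' 's/world/there/' file.txt

cat file.txt        # there
cat file.txt.orig   # hello
cat bak/file.txt    # world
@end example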

If no extension is supplied, the original file is
overwritten without making a backup.

Because @option{-i} takes an optional argument, it should
not be followed by other short options:
@table @code
@item sed -Ei '...' FILE
Same as @option{-E -i} with no backup suffix: @file{FILE} will be
edited in-place without creating a backup.

@item sed -iE '...' FILE
This is equivalent to @option{--in-place=E}, creating @file{FILEE} as a backup
of @file{FILE}.
@end table

Be cautious when using @option{-n} with @option{-i}: the former disables
automatic printing of lines and the latter changes the file in-place
without a backup. If the two are combined carelessly (without an explicit
@code{p} command), the output file will be empty:
@codequotebacktick on
@codequoteundirected on
@example
# WRONG USAGE: 'FILE' will be truncated.
sed -ni 's/foo/bar/' FILE
@end example
@codequotebacktick off
@codequoteundirected off

@item -l @var{N}
@itemx --line-length=@var{N}
@opindex -l
@opindex --line-length
@cindex Line length, setting
Specify the default line-wrap length for the @code{l} command.
A length of 0 (zero) means to never wrap long lines. If
not specified, it is taken to be 70.

@item --posix
@opindex --posix
@cindex @value{SSEDEXT}, disabling
@value{SSED} includes several extensions to POSIX
@command{sed}. In order to simplify writing portable scripts, this
option disables all the extensions that this manual documents,
including additional commands.
@cindex @code{POSIXLY_CORRECT} behavior, enabling
Most of the extensions accept @command{sed} programs that
are outside the syntax mandated by POSIX, but some
of them (such as the behavior of the @code{N} command
described in @ref{Reporting Bugs}) actually violate the
standard. If you want to disable only the latter kind of
extension, you can set the @code{POSIXLY_CORRECT} variable
to a non-empty value.

@item -b
@itemx --binary
@opindex -b
@opindex --binary
This option is available on every platform, but is only effective where the
operating system makes a distinction between text files and binary files.
When such a distinction is made (as is the case for MS-DOS, Windows and
Cygwin), text files are composed of lines separated by a carriage return
@emph{and} a line feed character, and @command{sed} does not see the
ending CR. When this option is specified, @command{sed} will open
input files in binary mode, thus not requesting this special processing
and considering lines to end at a line feed.

@item --follow-symlinks
@opindex --follow-symlinks
This option is available only on platforms that support
symbolic links and has an effect only if option @option{-i}
is specified. In this case, if the file that is specified
on the command line is a symbolic link, @command{sed} will
follow the link and edit the ultimate destination of the
link. The default behavior is to break the symbolic link,
so that the link destination will not be modified.
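
The following example contrasts the two behaviors. It uses the hypothetical
files @file{target.txt} and @file{link.txt}, assumes a platform with
symbolic links, and relies on @command{ln -s} to create the link:

@example
printf 'hello\n' > target.txt
ln -sf target.txt link.txt

# Follows the link: target.txt is edited, link.txt stays a symlink.
sed -i --follow-symlinks 's/hello/world/' link.txt

# Default: link.txt is replaced by a regular file holding the
# edited text, and target.txt is left untouched.
sed -i 's/world/there/' link.txt
@end example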

@item -E
@itemx -r
@itemx --regexp-extended
@opindex -E
@opindex -r
@opindex --regexp-extended
@cindex Extended regular expressions, choosing
@cindex GNU extensions, extended regular expressions
Use extended regular expressions rather than basic
regular expressions. Extended regexps are those that
@command{egrep} accepts; they can be clearer because they
usually have fewer backslashes.
Historically this was a GNU extension,
but the @option{-E}
extension has since been added to the POSIX standard
(@url{http://austingroupbugs.net/view.php?id=528}),
so use @option{-E} for portability.
GNU sed has accepted @option{-E} as an undocumented option for years,
and *BSD seds have accepted @option{-E} for years as well,
but scripts that use @option{-E} might not port to other older systems.
@xref{ERE syntax, , Extended regular expressions}.
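
The two commands below should be equivalent under @value{SSED}. The basic
form needs backslashes before the grouping parentheses and before @code{+}
(@samp{\+} in a basic regular expression is itself a GNU extension). The
input text is arbitrary:

@example
# Basic regular expressions: grouping and '+' are backslash-escaped.
echo 'aaa bbb' | sed 's/\(a\+\) \(b\+\)/\2 \1/'

# Extended regular expressions: the same script with fewer backslashes.
echo 'aaa bbb' | sed -E 's/(a+) (b+)/\2 \1/'
@end example

@noindent
Both print @samp{bbb aaa}.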

@item -s
@itemx --separate
@opindex -s
@opindex --separate
@cindex Working on separate files
By default, @command{sed} will consider the files specified on the
command line as a single continuous long stream. This @value{SSED}
extension allows the user to consider them as separate files:
range addresses (such as @samp{/abc/,/def/}) are not allowed
to span several files, line numbers are relative to the start
of each file, @code{$} refers to the last line of each file,
and files invoked from the @code{R} commands are rewound at the
start of each file.

@item --sandbox
@opindex --sandbox
@cindex Sandbox mode
In sandbox mode, @code{e/w/r} commands are rejected; programs containing
them will be aborted without being run. Sandbox mode ensures @command{sed}
operates only on the input files designated on the command line, and
cannot run external programs.
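
As a quick check, the program below uses a hypothetical output file
@file{out.txt}. It is rejected before any input is processed, and
@command{sed} is expected to exit with status 1, the status for an invalid
command (@pxref{Exit status}). The error message goes to standard error:

@example
# Rejected: the w command is disabled in sandbox mode.
echo x | sed --sandbox 'w out.txt' ; echo $?
@end example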

@item -u
@itemx --unbuffered
@opindex -u
@opindex --unbuffered
@cindex Unbuffered I/O, choosing
Buffer both input and output as minimally as practical.
(This is particularly useful if the input is coming from
the likes of @samp{tail -f}, and you wish to see the transformed
output as soon as possible.)

@item -z
@itemx --null-data
@itemx --zero-terminated
@opindex -z
@opindex --null-data
@opindex --zero-terminated
Treat the input as a set of lines, each terminated by a zero byte
(the ASCII @samp{NUL} character) instead of a newline. This option can
be used with commands like @samp{sort -z} and @samp{find -print0}
to process arbitrary file names.
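
The pipeline below feeds two NUL-terminated records (made-up names, one
containing a space) to @samp{sed -z}. @command{tr} converts the NUL bytes
back to newlines only so that the result is readable:

@example
# Prefix each NUL-terminated record with "dir/".
printf 'a b\0c\0' | sed -z 's|^|dir/|' | tr '\0' '\n'
@end example

@noindent
This prints @samp{dir/a b} and @samp{dir/c} on separate lines.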
@end table

If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file}
options are given on the command-line,
then the first non-option argument on the command line is
taken to be the @var{script} to be executed.

@cindex Files to be processed as input
If any command-line parameters remain after processing the above,
these parameters are interpreted as the names of input files to
be processed.
@cindex Standard input, processing as input
A file name of @samp{-} refers to the standard input stream.
The standard input will be processed if no file names are specified.
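
@samp{-} can also be mixed with ordinary file names. In the example below,
whose file names are invented, the standard input is read between the two
files:

@example
printf 'first\n' > a.txt
printf 'last\n' > b.txt

# Prints "first", "middle" and "last", in that order.
echo middle | sed -n p a.txt - b.txt
@end example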

@node Exit status
@section Exit status
@cindex exit status
An exit status of zero indicates success, and a nonzero value
indicates failure. @value{SSED} returns the following exit status
error values:

@table @asis
@item 0
Successful completion.

@item 1
Invalid command, invalid syntax, invalid regular expression or a
@value{SSED} extension command used with @option{--posix}.

@item 2
One or more of the input files specified on the command line could not be
opened (e.g.@: if a file is not found, or read permission is denied).
Processing continued with other files.

@item 4
An I/O error or a serious processing error occurred during runtime;
@value{SSED} aborted immediately.
@end table
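
The example below triggers statuses 1 and 2. @samp{k} is used simply because
it is not a @command{sed} command, and @file{no-such-file.txt} is assumed not
to exist. The error messages go to standard error, and each @samp{echo $?}
prints the resulting status:

@example
# Status 1: invalid command.
echo | sed 'k' ; echo $?

# Status 2: an input file cannot be opened.
sed -n p no-such-file.txt ; echo $?
@end example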

@cindex Q, example
@cindex exit status, example
Additionally, the commands @code{q} and @code{Q} can be used to terminate
@command{sed} with a custom exit code value (this is a @value{SSED} extension):

@example
$ echo | sed 'Q42' ; echo $?
42
@end example


@node sed scripts
@chapter @command{sed} scripts


@menu
* sed script overview:: @command{sed} script overview
* sed commands list:: @command{sed} commands summary
* The "s" Command:: @command{sed}'s Swiss Army Knife
* Common Commands:: Often used commands
* Other Commands:: Less frequently used commands
* Programming Commands:: Commands for @command{sed} gurus
* Extended Commands:: Commands specific to @value{SSED}
* Multiple commands syntax:: Extension for easier scripting
@end menu

@node sed script overview
@section @command{sed} script overview

@cindex @command{sed} script structure
@cindex Script structure

A @command{sed} program consists of one or more @command{sed} commands,
passed in by one or more of the
@option{-e}, @option{-f}, @option{--expression}, and @option{--file}
options, or the first non-option argument if none of these
options is used.
This document will refer to ``the'' @command{sed} script;
this is understood to mean the in-order concatenation
of all of the @var{script}s and @var{script-file}s passed in.
@xref{Overview}.


@cindex @command{sed} commands syntax
@cindex syntax, @command{sed} commands
@cindex addresses, syntax
@cindex syntax, addresses
@command{sed} commands follow this syntax:

@example
[addr]@var{X}[options]
@end example

@var{X} is a single-letter @command{sed} command.
@c TODO: add @pxref{commands} when there is a command-list section.
@code{[addr]} is an optional line address. If @code{[addr]} is specified,
the command @var{X} will be executed only on the matched lines.
@code{[addr]} can be a single line number, a regular expression,
or a range of lines (@pxref{sed addresses}).
Additional @code{[options]} are used for some @command{sed} commands.

@cindex @command{d}, example
@cindex address range, example
@cindex example, address range
The following example deletes lines 30 to 35 in the input.
@code{30,35} is an address range. @code{d} is the delete command:

@example
sed '30,35d' input.txt > output.txt
@end example

@cindex @command{q}, example
@cindex regular expression, example
@cindex example, regular expression
The following example prints all input until a line
starting with the string @samp{foo} is found. If such a line is found,
@command{sed} will terminate with exit status 42.
If no such line is found (and no other error occurs), @command{sed}
will exit with status 0.
@code{/^foo/} is a regular-expression address.
@code{q} is the quit command. @code{42} is the command option.

@example
sed '/^foo/q42' input.txt > output.txt
@end example


@cindex multiple @command{sed} commands
@cindex @command{sed} commands, multiple
@cindex newline, command separator
@cindex semicolons, command separator
@cindex ;, command separator
@cindex -e, example
@cindex -f, example
Commands within a @var{script} or @var{script-file} can be
separated by semicolons (@code{;}) or newlines (ASCII 10).
Multiple scripts can be specified with @option{-e} or @option{-f}
options.

The following examples are all equivalent. They perform two @command{sed}
operations: deleting any lines matching the regular expression @code{/^foo/},
and replacing all occurrences of the string @samp{hello} with @samp{world}:

@example
sed '/^foo/d ; s/hello/world/g' input.txt > output.txt

sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt

echo '/^foo/d' > script.sed
echo 's/hello/world/g' >> script.sed
sed -f script.sed input.txt > output.txt

echo 's/hello/world/g' > script2.sed
sed -e '/^foo/d' -f script2.sed input.txt > output.txt
@end example


@cindex @command{a}, and semicolons
@cindex @command{c}, and semicolons
@cindex @command{i}, and semicolons
Because of their syntax, the commands @code{a}, @code{c} and @code{i}
cannot be followed by semicolons acting as command separators. They
should be terminated with newlines or placed at the end of a
@var{script} or @var{script-file}.
Commands can also be preceded with optional non-significant
whitespace characters.
@xref{Multiple commands syntax}.
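
In the example below, @command{seq} supplies three input lines. The first
command appends the literal text @samp{hello ; 3d} after line 2, so the
semicolon becomes part of the text. The second command ends the @code{a}
text at the end of the first @option{-e} script, so @samp{3d} runs as a
separate command:

@example
# Prints 1, 2, "hello ; 3d", 3
seq 3 | sed '2a hello ; 3d'

# Prints 1, 2, "hello"
seq 3 | sed -e '2a hello' -e '3d'
@end example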


@node sed commands list
@section @command{sed} commands summary

The following commands are supported in @value{SSED}.
Some are standard POSIX commands, while others are @value{SSEDEXT}.
Details and examples for each command are in the following sections.
Mnemonics are shown in parentheses.

@table @code

@item a\
@itemx @var{text}
Append @var{text} after a line.

@item a @var{text}
Append @var{text} after a line (alternative syntax).

@item b @var{label}
Branch unconditionally to @var{label}.
The @var{label} may be omitted, in which case the next cycle is started.

@item c\
@itemx @var{text}
Replace (change) lines with @var{text}.

@item c @var{text}
Replace (change) lines with @var{text} (alternative syntax).

@item d
Delete the pattern space;
immediately start next cycle.

@item D
If pattern space contains newlines, delete text in the pattern
space up to the first newline, and restart cycle with the resultant
pattern space, without reading a new line of input.

If pattern space contains no newline, start a normal new cycle as if
the @code{d} command was issued.
@c TODO: add a section about D+N and D+n commands

@item e
Executes the command that is found in pattern space and
replaces the pattern space with the output; a trailing newline
is suppressed.

@item e @var{command}
Executes @var{command} and sends its output to the output stream.
The command can run across multiple lines, all but the last ending with
a back-slash.

@item F
(filename) Print the file name of the current input file (with a trailing
newline).

@item g
Replace the contents of the pattern space with the contents of the hold space.

@item G
Append a newline to the contents of the pattern space,
and then append the contents of the hold space to that of the pattern space.

@item h
(hold) Replace the contents of the hold space with the contents of the
pattern space.

@item H
Append a newline to the contents of the hold space,
and then append the contents of the pattern space to that of the hold space.

@item i\
@itemx @var{text}
Insert @var{text} before a line.

@item i @var{text}
Insert @var{text} before a line (alternative syntax).

@item l
Print the pattern space in an unambiguous form.

@item n
(next) If auto-print is not disabled, print the pattern space,
then, regardless, replace the pattern space with the next line of input.
If there is no more input then @command{sed} exits without processing
any more commands.

@item N
Add a newline to the pattern space,
then append the next line of input to the pattern space.
If there is no more input then @command{sed} exits without processing
any more commands.

@item p
Print the pattern space.
@c useful with @option{-n}

@item P
Print the pattern space, up to the first <newline>.

@item q@var{[exit-code]}
(quit) Exit @command{sed} without processing any more commands or input.

@item Q@var{[exit-code]}
(quit) This command is the same as @code{q}, but will not print the
contents of pattern space. Like @code{q}, it provides the
ability to return an exit code to the caller.
@c useful to quit on a conditional without printing

@item r @var{filename}
Queue the contents of @var{filename} to be read and inserted into the
output stream at the end of the current cycle, or when the next input
line is read.

@item R @var{filename}
Queue a line of @var{filename} to be read and
inserted into the output stream at the end of the current cycle,
or when the next input line is read.
@c useful to interleave files

@item s@var{/regexp/replacement/[flags]}
(substitute) Match the regular expression against the content of the
pattern space. If found, replace the matched string with
@var{replacement}.

@item t @var{label}
(test) Branch to @var{label} only if there has been a successful
@code{s}ubstitution since the last input line was read or conditional
branch was taken. The @var{label} may be omitted, in which case the
next cycle is started.

@item T @var{label}
(test) Branch to @var{label} only if there have been no successful
@code{s}ubstitutions since the last input line was read or
conditional branch was taken. The @var{label} may be omitted,
in which case the next cycle is started.

@item v @var{[version]}
(version) This command does nothing, but makes @command{sed} fail if
@value{SSED} extensions are not supported, or if the requested version
is not available.

@item w @var{filename}
Write the pattern space to @var{filename}.

@item W @var{filename}
Write to the given @var{filename} the portion of the pattern space up to
the first newline.

@item x
Exchange the contents of the hold and pattern spaces.

@item y/@var{source-chars}/@var{dest-chars}/
Transliterate any characters in the pattern space which match
any of the @var{source-chars} with the corresponding character
in @var{dest-chars}.

@item z
(zap) This command empties the content of pattern space.

@item #
A comment, until the next newline.

@item @{ @var{cmd ; cmd ...} @}
Group several commands together.
@c useful for multiple commands on same address

@item =
Print the current input line number (with a trailing newline).

@item : @var{label}
Specify the location of @var{label} for branch commands (@code{b},
@code{t}, @code{T}).

@end table
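
As a small illustration of the hold-space commands above, here is the
classic one-line emulation of @command{tac}, which prints lines in reverse
order. It uses @code{G}, @code{h} and @code{p}. On every line except the
first, @code{G} appends the lines saved so far. @code{h} then saves the
result, and on the last line @code{p} prints it. The input from
@command{seq} is arbitrary:

@example
# Prints 3, 2 and 1 on separate lines.
seq 3 | sed -n '1!G ; h ; $p'
@end example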


@node The "s" Command
@section The @code{s} Command

The @code{s} command (as in substitute) is probably the most important
in @command{sed} and has a lot of different options. The syntax of
the @code{s} command is
@samp{s/@var{regexp}/@var{replacement}/@var{flags}}.

Its basic concept is simple: the @code{s} command attempts to match
the pattern space against the supplied regular expression @var{regexp};
if the match is successful, then that portion of the
pattern space which was matched is replaced with @var{replacement}.

For details about @var{regexp} syntax, see @ref{Regexp Addresses,,Regular
Expression Addresses}.

@cindex Backreferences, in regular expressions
@cindex Parenthesized substrings
The @var{replacement} can contain @code{\@var{n}} (@var{n} being
a number from 1 to 9, inclusive) references, which refer to
the portion of the match which is contained between the @var{n}th
@code{\(} and its matching @code{\)}.
Also, the @var{replacement} can contain unescaped @code{&}
characters which reference the whole matched portion
of the pattern space.
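
In the example below, with arbitrary input text, the command swaps two words
using @code{\2} and @code{\1}, and uses @code{&} to append the whole match in
brackets:

@example
# Prints: world hello [hello world]
echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1 [&]/'
@end example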
[3613]	849
	850	@c TODO: xref to backreference section mention @var{\'}.
	851
	852	The @code{/}
	853	characters may be uniformly replaced by any other single
	854	character within any given @code{s} command. The @code{/}
	855	character (or whatever other character is used in its stead)
	856	can appear in the @var{regexp} or @var{replacement}
	857	only if it is preceded by a @code{\} character.
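
For example, when the pattern itself contains slashes, choosing @samp{|} as the delimiter avoids escaping them. The two commands below are a sketch (assuming a POSIX shell) that should produce the same result:

@codequoteundirected on
@codequotebacktick on
@example
$ echo /usr/local/bin | sed 's|/usr/local|/opt|'
/opt/bin
$ echo /usr/local/bin | sed 's/\/usr\/local/\/opt/'
/opt/bin
@end example
@codequoteundirected off
@codequotebacktick off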
	858
	859
	860
[599]	861	@cindex @value{SSEDEXT}, case modifiers in @code{s} commands
	862	Finally, as a @value{SSED} extension, you can include a
	863	special sequence made of a backslash and one of the letters
	864	@code{L}, @code{l}, @code{U}, @code{u}, or @code{E}.
	865	The meaning is as follows:
	866
	867	@table @code
	868	@item \L
	869	Turn the replacement
	870	to lowercase until a @code{\U} or @code{\E} is found.
	871
	872	@item \l
	873	Turn the
	874	next character to lowercase.
	875
	876	@item \U
	877	Turn the replacement to uppercase
	878	until a @code{\L} or @code{\E} is found.
	879
	880	@item \u
	881	Turn the next character
	882	to uppercase.
	883
	884	@item \E
	885	Stop case conversion started by @code{\L} or @code{\U}.
	886	@end table
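
A few sketches of these sequences, assuming @value{SSED}; @code{\w} and @code{\+} are also GNU extensions. They capitalize each word, lowercase a whole line, and uppercase only the first group by closing it with @code{\E}:

@codequoteundirected on
@codequotebacktick on
@example
$ echo 'hello world' | sed 's/\w\+/\u&/g'
Hello World
$ echo 'MiXeD' | sed 's/.*/\L&/'
mixed
$ echo 'foo bar' | sed 's/\(foo\) \(bar\)/\U\1\E \2/'
FOO bar
@end example
@codequoteundirected off
@codequotebacktick off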
	887
[3613]	888	When the @code{g} flag is being used, case conversion does not
	889	propagate from one occurrence of the regular expression to
	890	another. For example, when the following command is executed
	891	with @samp{a-b-} in pattern space:
	892	@example
	893	s/\(b\?\)-/x\u\1/g
	894	@end example
	895
	896	@noindent
	897	the output is @samp{axxB}. When replacing the first @samp{-},
	898	the @samp{\u} sequence only affects the empty replacement of
	899	@samp{\1}. It does not affect the @code{x} character that is
	900	added to pattern space when replacing @code{b-} with @code{xB}.
	901
	902	On the other hand, @code{\l} and @code{\u} do affect the remainder
	903	of the replacement text if they are followed by an empty substitution.
	904	With @samp{a-b-} in pattern space, the following command:
	905	@example
	906	s/\(b\?\)-/\u\1x/g
	907	@end example
	908
	909	@noindent
	910	will replace @samp{-} with @samp{X} (uppercase) and @samp{b-} with
	911	@samp{Bx}. If this behavior is undesirable, you can prevent it by
	912	adding a @samp{\E} sequence---after @samp{\1} in this case.
	913
[599]	914	To include a literal @code{\}, @code{&}, or newline in the final
	915	replacement, be sure to precede the desired @code{\}, @code{&},
	916	or newline in the @var{replacement} with a @code{\}.
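
A sketch assuming a POSIX shell. The commands insert a literal @code{&}, a literal backslash, and a newline. The last one relies on a backslash-newline pair inside single quotes:

@codequoteundirected on
@codequotebacktick on
@example
$ echo 'a b' | sed 's/ /\&/'
a&b
$ echo 'a b' | sed 's/ /\\/'
a\b
$ echo 'a b' | sed 's/ /\
/'
a
b
@end example
@codequoteundirected off
@codequotebacktick off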
	917
	918	@findex s command, option flags
	919	@cindex Substitution of text, options
	920	The @code{s} command can be followed by zero or more of the
	921	following @var{flags}:
	922
	923	@table @code
	924	@item g
	925	@cindex Global substitution
	926	@cindex Replacing all text matching regexp in a line
	927	Apply the replacement to @emph{all} matches to the @var{regexp},
	928	not just the first.
	929
	930	@item @var{number}
	931	@cindex Replacing only @var{n}th match of regexp in a line
	932	Only replace the @var{number}th match of the @var{regexp}.
	933
[3613]	934	@cindex GNU extensions, @code{g} and @var{number} modifier
	935	interaction in @code{s} command
[599]	936	@cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command
	937	Note: the @sc{posix} standard does not specify what should happen
	938	when you mix the @code{g} and @var{number} modifiers,
	939	and currently there is no widely agreed upon meaning
	940	across @command{sed} implementations.
	941	For @value{SSED}, the interaction is defined to be:
	942	ignore matches before the @var{number}th,
	943	and then match and replace all matches from
	944	the @var{number}th on.
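
A sketch assuming @value{SSED}, showing the @var{number} flag on its own and combined with @code{g` following the GNU interpretation described above:

@codequoteundirected on
@codequotebacktick on
@example
$ echo aaaa | sed 's/a/x/2'
axaa
$ echo aaaa | sed 's/a/x/2g'
axxx
@end example
@codequoteundirected off
@codequotebacktick off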
	945
	946	@item p
	947	@cindex Text, printing after substitution
	948	If the substitution was made, then print the new pattern space.
	949
	950	Note: when both the @code{p} and @code{e} options are specified,
	951	the relative ordering of the two produces very different results.
	952	In general, @code{ep} (evaluate then print) is what you want,
	953	but operating the other way round can be useful for debugging.
	954	For this reason, the current version of @value{SSED} treats
	955	@code{p} options specially when they appear both before and after
	956	@code{e}, printing the pattern space before and after evaluation,
	957	while in general flags for the @code{s} command show their
	958	effect just once. This behavior, although documented, might
	959	change in future versions.
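
A sketch assuming a POSIX shell. With @option{-n}, @code{p} prints only the lines that were changed. Without @option{-n}, changed lines appear twice:

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'foo\nbar\n' | sed -n 's/o/0/p'
f0o
$ printf 'foo\nbar\n' | sed 's/o/0/p'
f0o
f0o
bar
@end example
@codequoteundirected off
@codequotebacktick off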
	960
[3613]	961	@item w @var{filename}
[599]	962	@cindex Text, writing to a file after substitution
	963	@cindex @value{SSEDEXT}, @file{/dev/stdout} file
	964	@cindex @value{SSEDEXT}, @file{/dev/stderr} file
	965	If the substitution was made, then write out the result to the named file.
[3613]	966	As a @value{SSED} extension, two special values of @var{filename} are
[599]	967	supported: @file{/dev/stderr}, which writes the result to the standard
	968	error, and @file{/dev/stdout}, which writes to the standard
	969	output.@footnote{This is equivalent to @code{p} unless the @option{-i}
	970	option is being used.}
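
A sketch assuming @value{SSED}. It uses the special @file{/dev/stdout} name so the result is visible without creating a file. With @option{-n}, only lines where a substitution was made are written:

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'a1\nb\nc2\n' | sed -n 's/[0-9]/#/w /dev/stdout'
a#
c#
@end example
@codequoteundirected off
@codequotebacktick off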
	971
	972	@item e
	973	@cindex Evaluate Bourne-shell commands, after substitution
	974	@cindex Subprocesses
	975	@cindex @value{SSEDEXT}, evaluating Bourne-shell commands
	976	@cindex @value{SSEDEXT}, subprocesses
	977	This command allows one to pipe input from a shell command
	978	into pattern space. If a substitution was made, the command
	979	that is found in pattern space is executed and pattern space
	980	is replaced with its output. A trailing newline is suppressed;
	981	results are undefined if the command to be executed contains
	982	a @sc{nul} character. This is a @value{SSED} extension.
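
A minimal sketch, assuming @value{SSED} and a shell that provides @command{echo}. The substitution turns the pattern space into the command @samp{echo world}. That command is then executed, and the pattern space is replaced by its output:

@codequoteundirected on
@codequotebacktick on
@example
$ echo 'echo hello' | sed 's/hello/world/e'
world
@end example
@codequoteundirected off
@codequotebacktick off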
	983
	984	@item I
	985	@itemx i
[3613]	986	@cindex GNU extensions, @code{I} modifier
[599]	987	@cindex Case-insensitive matching
[3613]	988	The @code{I} modifier to regular-expression matching is a GNU
[599]	989	extension which makes @command{sed} match @var{regexp} in a
	990	case-insensitive manner.
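
A sketch assuming @value{SSED}, combining @code{I} with @code{g}:

@codequoteundirected on
@codequotebacktick on
@example
$ echo 'Hello HELLO hello' | sed 's/hello/bye/Ig'
bye bye bye
@end example
@codequoteundirected off
@codequotebacktick off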
	991
	992	@item M
	993	@itemx m
	994	@cindex @value{SSEDEXT}, @code{M} modifier
	995	The @code{M} modifier to regular-expression matching is a @value{SSED}
[3613]	996	extension which directs @value{SSED} to match the regular expression
	997	in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
	998	match respectively (in addition to the normal behavior) the empty string
	999	after a newline, and the empty string before a newline. There are
	1000	special character sequences
[599]	1001	@ifclear PERL
	1002	(@code{\`} and @code{\'})
	1003	@end ifclear
	1004	which always match the beginning or the end of the buffer.
[3613]	1005	In addition,
	1006	the period character does not match a newline character in
	1007	multi-line mode.
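
A sketch assuming @value{SSED}. In each command, @code{N} first joins two input lines so the pattern space contains a newline. Only with @code{M} does @code{^} also match after that newline:

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'a\nb\n' | sed 'N;s/^/>/g'
>a
b
$ printf 'a\nb\n' | sed 'N;s/^/>/Mg'
>a
>b
@end example
@codequoteundirected off
@codequotebacktick off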
[599]	1008
	1009
	1010	@end table
	1011
[3613]	1012	@node Common Commands
	1013	@section Often-Used Commands
[599]	1014
[3613]	1015	If you use @command{sed} at all, you will quite likely want to know
	1016	these commands.
	1017
	1018	@table @code
	1019	@item #
	1020	[No addresses allowed.]
	1021
	1022	@findex # (comments)
	1023	@cindex Comments, in scripts
	1024	The @code{#} character begins a comment;
	1025	the comment continues until the next newline.
	1026
	1027	@cindex Portability, comments
	1028	If you are concerned about portability, be aware that
	1029	some implementations of @command{sed} (which are not @sc{posix}
	1030	conforming) may only support a single one-line comment,
	1031	and then only when the very first character of the script is a @code{#}.
	1032
	1033	@findex -n, forcing from within a script
	1034	@cindex Caveat --- #n on first line
	1035	Warning: if the first two characters of the @command{sed} script
	1036	are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
	1037	If you want to put a comment in the first line of your script
	1038	and that comment begins with the letter @samp{n}
	1039	and you do not want this behavior,
	1040	then be sure to either use a capital @samp{N},
	1041	or place at least one space before the @samp{n}.
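
A small sketch, assuming @value{SSED}. In the first command, the script starts with a line containing only @code{#n}, so auto-print is disabled and only the output of @code{p} appears. In the second, the space after @code{#} makes the line an ordinary comment:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 3 | sed '#n
2p'
2
$ seq 3 | sed '# n
2p'
1
2
2
3
@end example
@codequoteundirected off
@codequotebacktick off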
	1042
	1043	@item q [@var{exit-code}]
	1044	@findex q (quit) command
	1045	@cindex @value{SSEDEXT}, returning an exit code
	1046	@cindex Quitting
	1047	Exit @command{sed} without processing any more commands or input.
	1048
	1049	Example: stop after printing the second line:
	1050	@example
	1051	$ seq 3 \| sed 2q
	1052	1
	1053	2
	1054	@end example
	1055
	1056	This command accepts only one address.
	1057	Note that the current pattern space is printed if auto-print is
	1058	not disabled with the @option{-n} option. The ability to return
	1059	an exit code from the @command{sed} script is a @value{SSED} extension.
	1060
	1061	See also the @value{SSED} extension @code{Q} command which quits silently
	1062	without printing the current pattern space.
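
A short sketch of the exit-code extension, assuming @value{SSED} and a POSIX shell. In such a shell, @code{$?} holds the exit status of the last command in a pipeline:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 5 | sed '3q5'
1
2
3
$ echo $?
5
@end example
@codequoteundirected off
@codequotebacktick off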
	1063
	1064	@item d
	1065	@findex d (delete) command
	1066	@cindex Text, deleting
	1067	Delete the pattern space;
	1068	immediately start next cycle.
	1069
	1070	Example: delete the second input line:
	1071	@example
	1072	$ seq 3 \| sed 2d
	1073	1
	1074	3
	1075	@end example
	1076
	1077	@item p
	1078	@findex p (print) command
	1079	@cindex Text, printing
	1080	Print out the pattern space (to the standard output).
	1081	This command is usually only used in conjunction with the @option{-n}
	1082	command-line option.
	1083
	1084	Example: print only the second input line:
	1085	@example
	1086	$ seq 3 \| sed -n 2p
	1087	2
	1088	@end example
	1089
	1090	@item n
	1091	@findex n (next-line) command
	1092	@cindex Next input line, replace pattern space with
	1093	@cindex Read next input line
	1094	If auto-print is not disabled, print the pattern space,
	1095	then, regardless, replace the pattern space with the next line of input.
	1096	If there is no more input then @command{sed} exits without processing
	1097	any more commands.
	1098
	1099	This command is useful to skip lines (e.g. process every Nth line).
	1100
	1101	Example: perform substitution on every 3rd line (i.e. two @code{n} commands
	1102	skip two lines):
	1103	@codequoteundirected on
	1104	@codequotebacktick on
	1105	@example
	1106	$ seq 6 \| sed 'n;n;s/./x/'
	1107	1
	1108	2
	1109	x
	1110	4
	1111	5
	1112	x
	1113	@end example
	1114
	1115	@value{SSED} provides an extension address syntax of @var{first}~@var{step}
	1116	to achieve the same result:
	1117
	1118	@example
	1119	$ seq 6 \| sed '0~3s/./x/'
	1120	1
	1121	2
	1122	x
	1123	4
	1124	5
	1125	x
	1126	@end example
	1127
	1128	@codequotebacktick off
	1129	@codequoteundirected off
	1130
	1131
	1132	@item @{ @var{commands} @}
	1133	@findex @{@} command grouping
	1134	@cindex Grouping commands
	1135	@cindex Command groups
	1136	A group of commands may be enclosed between
	1137	@code{@{} and @code{@}} characters.
	1138	This is particularly useful when you want a group of commands
	1139	to be triggered by a single address (or address-range) match.
	1140
	1141	Example: perform substitution then print the second input line:
	1142	@codequoteundirected on
	1143	@codequotebacktick on
	1144	@example
	1145	$ seq 3 \| sed -n '2@{s/2/X/ ; p@}'
	1146	X
	1147	@end example
	1148	@codequoteundirected off
	1149	@codequotebacktick off
	1150
	1151	@end table
	1152
	1153
[599]	1154	@node Other Commands
	1155	@section Less Frequently-Used Commands
	1156
	1157	Though perhaps less frequently used than those in the previous
	1158	section, some very small yet useful @command{sed} scripts can be built with
	1159	these commands.
	1160
	1161	@table @code
	1162	@item y/@var{source-chars}/@var{dest-chars}/
	1163	@findex y (transliterate) command
	1164	@cindex Transliteration
	1165	Transliterate any characters in the pattern space which match
	1166	any of the @var{source-chars} with the corresponding character
	1167	in @var{dest-chars}.
	1168
[3613]	1169	Example: transliterate @samp{a-j} into @samp{0-9}:
	1170	@codequoteundirected on
	1171	@codequotebacktick on
	1172	@example
	1173	$ echo hello world \| sed 'y/abcdefghij/0123456789/'
	1174	74llo worl3
	1175	@end example
	1176	@codequoteundirected off
	1177	@codequotebacktick off
	1178
	1179	(The @code{/} characters may be uniformly replaced by
	1180	any other single character within any given @code{y} command.)
	1181
[599]	1182	Instances of the @code{/} (or whatever other character is used in its stead),
	1183	@code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars}
	1184	lists, provided that each instance is escaped by a @code{\}.
	1185	The @var{source-chars} and @var{dest-chars} lists @emph{must}
	1186	contain the same number of characters (after de-escaping).
	1187
[3613]	1188	See the @command{tr} command from GNU coreutils for similar functionality.
	1189
	1190	@item a @var{text}
	1191	Append @var{text} after a line. This is a GNU extension
	1192	to the standard @code{a} command; see below for details.
	1193
	1194	Example: Add @samp{hello} after the second line:
	1195	@codequoteundirected on
	1196	@codequotebacktick on
	1197	@example
	1198	$ seq 3 \| sed '2a hello'
	1199	1
	1200	2
	1201	hello
	1202	3
	1203	@end example
	1204	@codequoteundirected off
	1205	@codequotebacktick off
	1206
	1207	Leading whitespace after the @code{a} command is ignored.
	1208	The text to add is read until the end of the line.
	1209
	1210
[599]	1211	@item a\
	1212	@itemx @var{text}
	1213	@findex a (append text lines) command
	1214	@cindex Appending text after a line
	1215	@cindex Text, appending
[3613]	1216	Append @var{text} after a line.
	1217
	1218	Example: Add @samp{hello} after the second line
	1219	(@print{} indicates printed output lines):
	1220	@codequoteundirected on
	1221	@codequotebacktick on
	1222	@example
	1223	$ seq 3 \| sed '2a\
	1224	hello'
	1225	@print{}1
	1226	@print{}2
	1227	@print{}hello
	1228	@print{}3
	1229	@end example
	1230	@codequoteundirected off
	1231	@codequotebacktick off
	1232
	1233	The @code{a} command queues the lines of text which follow this command
[599]	1234	(each but the last ending with a @code{\},
	1235	which are removed from the output)
	1236	to be output at the end of the current cycle,
	1237	or when the next input line is read.
	1238
[3613]	1239	@cindex @value{SSEDEXT}, two addresses supported by most commands
	1240	As a GNU extension, this command accepts two addresses.
	1241
[599]	1242	Escape sequences in @var{text} are processed, so you should
	1243	use @code{\\} in @var{text} to print a single backslash.
	1244
[3613]	1245	Command parsing resumes after the first line of @var{text} that does not
	1246	end with a backslash (@code{\}); in the following example, that line is @samp{world}:
	1247	@codequoteundirected on
	1248	@codequotebacktick on
	1249	@example
	1250	$ seq 3 \| sed '2a\
	1251	hello\
	1252	world
	1253	3s/./X/'
	1254	@print{}1
	1255	@print{}2
	1256	@print{}hello
	1257	@print{}world
	1258	@print{}X
	1259	@end example
	1260	@codequoteundirected off
	1261	@codequotebacktick off
[599]	1262
[3613]	1263	As a GNU extension, the @code{a} command and @var{text} can be
	1264	separated into two @code{-e} parameters, enabling easier scripting:
	1265	@codequoteundirected on
	1266	@codequotebacktick on
	1267	@example
	1268	$ seq 3 \| sed -e '2a\' -e hello
	1269	1
	1270	2
	1271	hello
	1272	3
	1273
	1274	$ sed -e '2a\' -e "$VAR"
	1275	@end example
	1276	@codequoteundirected off
	1277	@codequotebacktick off
	1278
	1279	@item i @var{text}
	1280	Insert @var{text} before a line. This is a GNU extension
	1281	to the standard @code{i} command; see below for details.
	1282
	1283	Example: Insert @samp{hello} before the second line:
	1284	@codequoteundirected on
	1285	@codequotebacktick on
	1286	@example
	1287	$ seq 3 \| sed '2i hello'
	1288	1
	1289	hello
	1290	2
	1291	3
	1292	@end example
	1293	@codequoteundirected off
	1294	@codequotebacktick off
	1295
	1296	Leading whitespace after the @code{i} command is ignored.
	1297	The text to add is read until the end of the line.
	1298
	1299	@anchor{insert command}
[599]	1300	@item i\
	1301	@itemx @var{text}
	1302	@findex i (insert text lines) command
	1303	@cindex Inserting text before a line
	1304	@cindex Text, insertion
[3613]	1305	Immediately output the lines of text which follow this command.
[599]	1306
[3613]	1307	Example: Insert @samp{hello} before the second line
	1308	(@print{} indicates printed output lines):
	1309	@codequoteundirected on
	1310	@codequotebacktick on
	1311	@example
	1312	$ seq 3 \| sed '2i\
	1313	hello'
	1314	@print{}1
	1315	@print{}hello
	1316	@print{}2
	1317	@print{}3
	1318	@end example
	1319	@codequoteundirected off
	1320	@codequotebacktick off
	1321
	1322	@cindex @value{SSEDEXT}, two addresses supported by most commands
	1323	As a GNU extension, this command accepts two addresses.
	1324
	1325	Escape sequences in @var{text} are processed, so you should
	1326	use @code{\\} in @var{text} to print a single backslash.
	1327
	1328	Command parsing resumes after the first line of @var{text} that does not
	1329	end with a backslash (@code{\}); in the following example, that line is @samp{world}:
	1330	@codequoteundirected on
	1331	@codequotebacktick on
	1332	@example
	1333	$ seq 3 \| sed '2i\
	1334	hello\
	1335	world
	1336	s/./X/'
	1337	@print{}X
	1338	@print{}hello
	1339	@print{}world
	1340	@print{}X
	1341	@print{}X
	1342	@end example
	1343	@codequoteundirected off
	1344	@codequotebacktick off
	1345
	1346	As a GNU extension, the @code{i} command and @var{text} can be
	1347	separated into two @code{-e} parameters, enabling easier scripting:
	1348	@codequoteundirected on
	1349	@codequotebacktick on
	1350	@example
	1351	$ seq 3 \| sed -e '2i\' -e hello
	1352	1
	1353	hello
	1354	2
	1355	3
	1356
	1357	$ sed -e '2i\' -e "$VAR"
	1358	@end example
	1359	@codequoteundirected off
	1360	@codequotebacktick off
	1361
	1362	@item c @var{text}
	1363	Replace the line(s) with @var{text}. This is a GNU extension
	1364	to the standard @code{c} command; see below for details.
	1365
	1366	Example: Replace the 2nd to 9th lines with the word @samp{hello}:
	1367	@codequoteundirected on
	1368	@codequotebacktick on
	1369	@example
	1370	$ seq 10 \| sed '2,9c hello'
	1371	1
	1372	hello
	1373	10
	1374	@end example
	1375	@codequoteundirected off
	1376	@codequotebacktick off
	1377
	1378	Leading whitespace after the @code{c} command is ignored.
	1379	The text to add is read until the end of the line.
	1380
[599]	1381	@item c\
	1382	@itemx @var{text}
	1383	@findex c (change to text lines) command
	1384	@cindex Replacing selected lines with other text
	1385	Delete the lines matching the address or address-range,
[3613]	1386	and output the lines of text which follow this command.
	1387
	1388	Example: Replace 2nd to 4th lines with the words @samp{hello} and
	1389	@samp{world} (@print{} indicates printed output lines):
	1390	@codequoteundirected on
	1391	@codequotebacktick on
	1392	@example
	1393	$ seq 5 \| sed '2,4c\
	1394	hello\
	1395	world'
	1396	@print{}1
	1397	@print{}hello
	1398	@print{}world
	1399	@print{}5
	1400	@end example
	1401	@codequoteundirected off
	1402	@codequotebacktick off
	1403
	1404	If no addresses are given, each line is replaced.
	1405
[599]	1406	A new cycle is started after this command is done,
	1407	since the pattern space will have been deleted.
[3613]	1408	In the following example, the @code{c} starts a
	1409	new cycle and the substitution command is not performed
	1410	on the replaced text:
[599]	1411
[3613]	1412	@codequoteundirected on
	1413	@codequotebacktick on
	1414	@example
	1415	$ seq 3 \| sed '2c\
	1416	hello
	1417	s/./X/'
	1418	@print{}X
	1419	@print{}hello
	1420	@print{}X
	1421	@end example
	1422	@codequoteundirected off
	1423	@codequotebacktick off
	1424
	1425	As a GNU extension, the @code{c} command and @var{text} can be
	1426	separated into two @code{-e} parameters, enabling easier scripting:
	1427	@codequoteundirected on
	1428	@codequotebacktick on
	1429	@example
	1430	$ seq 3 \| sed -e '2c\' -e hello
	1431	1
	1432	hello
	1433	3
	1434
	1435	$ sed -e '2c\' -e "$VAR"
	1436	@end example
	1437	@codequoteundirected off
	1438	@codequotebacktick off
	1439
	1440
[599]	1441	@item =
	1442	@findex = (print line number) command
	1443	@cindex Printing line number
	1444	@cindex Line number, printing
	1445	Print out the current input line number (with a trailing newline).
	1446
[3613]	1447	@codequoteundirected on
	1448	@codequotebacktick on
	1449	@example
	1450	$ printf '%s\n' aaa bbb ccc \| sed =
	1451	1
	1452	aaa
	1453	2
	1454	bbb
	1455	3
	1456	ccc
	1457	@end example
	1458	@codequoteundirected off
	1459	@codequotebacktick off
	1460
	1461	@cindex @value{SSEDEXT}, two addresses supported by most commands
	1462	As a GNU extension, this command accepts two addresses.
	1463
	1464
	1465
	1466
[599]	1467	@item l @var{n}
	1468	@findex l (list unambiguously) command
	1469	@cindex List pattern space
	1470	@cindex Printing text unambiguously
	1471	@cindex Line length, setting
	1472	@cindex @value{SSEDEXT}, setting line length
	1473	Print the pattern space in an unambiguous form:
	1474	non-printable characters (and the @code{\} character)
	1475	are printed in C-style escaped form; long lines are split,
	1476	with a trailing @code{\} character to indicate the split;
	1477	the end of each line is marked with a @code{$}.
	1478
	1479	@var{n} specifies the desired line-wrap length;
	1480	a length of 0 (zero) means to never wrap long lines. If omitted,
	1481	the default as specified on the command line is used. The @var{n}
	1482	parameter is a @value{SSED} extension.
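
A sketch assuming @value{SSED} and @command{printf} from a POSIX shell. The input line contains a tab and a backslash:

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'a\tb\\c\n' | sed -n l
a\tb\\c$
@end example
@codequoteundirected off
@codequotebacktick off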
	1483
	1484	@item r @var{filename}
	1485
	1486	@findex r (read file) command
	1487	@cindex Read text from a file
[3613]	1488	Read the file @var{filename}. Example:
	1489
	1490	@codequoteundirected on
	1491	@codequotebacktick on
	1492	@example
	1493	$ seq 3 \| sed '2r/etc/hostname'
	1494	1
	1495	2
	1496	fencepost.gnu.org
	1497	3
	1498	@end example
	1499	@codequoteundirected off
	1500	@codequotebacktick off
	1501
[599]	1502	@cindex @value{SSEDEXT}, @file{/dev/stdin} file
	1503	Queue the contents of @var{filename} to be read and
	1504	inserted into the output stream at the end of the current cycle,
	1505	or when the next input line is read.
	1506	Note that if @var{filename} cannot be read, it is treated as
	1507	if it were an empty file, without any error indication.
	1508
	1509	As a @value{SSED} extension, the special value @file{/dev/stdin}
	1510	is supported for the file name, which reads the contents of the
	1511	standard input.
	1512
[3613]	1513	@cindex @value{SSEDEXT}, two addresses supported by most commands
	1514	As a GNU extension, this command accepts two addresses. The
	1515	file will then be reread and inserted on each of the addressed lines.
	1516
	1517	As a @value{SSED} extension, the @code{r} command accepts a zero address,
	1518	inserting a file @emph{before} the first line of the input
	1519	(@pxref{Adding a header to multiple files}).
	1520
[599]	1521	@item w @var{filename}
	1522	@findex w (write file) command
	1523	@cindex Write to a file
	1524	@cindex @value{SSEDEXT}, @file{/dev/stdout} file
	1525	@cindex @value{SSEDEXT}, @file{/dev/stderr} file
	1526	Write the pattern space to @var{filename}.
[3613]	1527	As a @value{SSED} extension, two special values of @var{filename} are
[599]	1528	supported: @file{/dev/stderr}, which writes the result to the standard
	1529	error, and @file{/dev/stdout}, which writes to the standard
	1530	output.@footnote{This is equivalent to @code{p} unless the @option{-i}
	1531	option is being used.}
	1532
[3613]	1533	The file will be created (or truncated) before the first input line is
	1534	read; all @code{w} commands (including instances of the @code{w} flag
	1535	on successful @code{s} commands) which refer to the same @var{filename}
	1536	are output without closing and reopening the file.
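
A sketch assuming @value{SSED}. @file{evens.txt} is a hypothetical output file in the current directory:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 4 | sed -n '/[24]/w evens.txt'
$ cat evens.txt
2
4
@end example
@codequoteundirected off
@codequotebacktick off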
[599]	1537
	1538	@item D
	1539	@findex D (delete first line) command
	1540	@cindex Delete first line from pattern space
[3613]	1541	If pattern space contains no newline, start a normal new cycle as if
	1542	the @code{d} command was issued. Otherwise, delete text in the pattern
	1543	space up to the first newline, and restart cycle with the resultant
	1544	pattern space, without reading a new line of input.
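
@code{D} is often combined with @code{N} and @code{P} to keep a two-line window in the pattern space. The sketch below assumes @value{SSED}. It removes consecutive duplicate lines, much like @command{uniq}:

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'a\na\nb\n' | sed '$!N;/^\(.*\)\n\1$/!P;D'
a
b
@end example
@codequoteundirected off
@codequotebacktick off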
[599]	1545
	1546	@item N
	1547	@findex N (append Next line) command
	1548	@cindex Next input line, append to pattern space
	1549	@cindex Append next input line to pattern space
	1550	Add a newline to the pattern space,
	1551	then append the next line of input to the pattern space.
	1552	If there is no more input then @command{sed} exits without processing
	1553	any more commands.
	1554
[3613]	1555	When @option{-z} is used, a zero byte (the @sc{ascii} @samp{NUL} character) is
	1556	added between the lines (instead of a newline).
	1557
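	1558	If there is no next input line, @value{SSED} by default prints the pattern space before exiting (unless @option{-n} is in effect), whereas most other implementations exit without printing.
	1559	This is a GNU extension which can be disabled with @option{--posix}.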
	1560	@xref{N_command_last_line,,N command on the last line}.
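
For example, here is a sketch that joins pairs of lines with @code{N}, assuming @value{SSED} defaults. The unpaired last line is still printed:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 5 | sed 'N;s/\n/ /'
1 2
3 4
5
@end example
@codequoteundirected off
@codequotebacktick off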
	1561
	1562
[599]	1563	@item P
	1564	@findex P (print first line) command
	1565	@cindex Print first line from pattern space
	1566	Print out the portion of the pattern space up to the first newline.
	1567
	1568	@item h
	1569	@findex h (hold) command
	1570	@cindex Copy pattern space into hold space
	1571	@cindex Replace hold space with copy of pattern space
	1572	@cindex Hold space, copying pattern space into
	1573	Replace the contents of the hold space with the contents of the pattern space.
	1574
	1575	@item H
	1576	@findex H (append Hold) command
	1577	@cindex Append pattern space to hold space
	1578	@cindex Hold space, appending from pattern space
	1579	Append a newline to the contents of the hold space,
	1580	and then append the contents of the pattern space to that of the hold space.
	1581
	1582	@item g
	1583	@findex g (get) command
	1584	@cindex Copy hold space into pattern space
	1585	@cindex Replace pattern space with copy of hold space
	1586	@cindex Hold space, copy into pattern space
	1587	Replace the contents of the pattern space with the contents of the hold space.
	1588
	1589	@item G
	1590	@findex G (appending Get) command
	1591	@cindex Append hold space to pattern space
	1592	@cindex Hold space, appending to pattern space
	1593	Append a newline to the contents of the pattern space,
	1594	and then append the contents of the hold space to that of the pattern space.
	1595
	1596	@item x
	1597	@findex x (eXchange) command
	1598	@cindex Exchange hold space with pattern space
	1599	@cindex Hold space, exchange with pattern space
	1600	Exchange the contents of the hold and pattern spaces.
	1601
	1602	@end table
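
A small sketch of the hold space, assuming @value{SSED}. @code{G} appends the lines saved so far, @code{h} saves the result, and the last line prints the accumulated text. The effect is to reverse the input, like @command{tac}:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 3 | sed -n '1!G;h;$p'
3
2
1
@end example
@codequoteundirected off
@codequotebacktick off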
	1603
	1604
	1605	@node Programming Commands
	1606	@section Commands for @command{sed} gurus
	1607
	1608	In most cases, use of these commands indicates that you are
	1609	probably better off programming in something like @command{awk}
	1610	or Perl. But occasionally one is committed to sticking
	1611	with @command{sed}, and these commands can enable one to write
	1612	quite convoluted scripts.
	1613
	1614	@cindex Flow of control in scripts
	1615	@table @code
	1616	@item : @var{label}
	1617	[No addresses allowed.]
	1618
	1619	@findex : (label) command
	1620	@cindex Labels, in scripts
	1621	Specify the location of @var{label} for branch commands.
	1622	In all other respects, a no-op.
	1623
	1624	@item b @var{label}
	1625	@findex b (branch) command
	1626	@cindex Branch to a label, unconditionally
	1627	@cindex Goto, in scripts
	1628	Unconditionally branch to @var{label}.
	1629	The @var{label} may be omitted, in which case the next cycle is started.
	1630
	1631	@item t @var{label}
	1632	@findex t (test and branch if successful) command
	1633	@cindex Branch to a label, if @code{s///} succeeded
	1634	@cindex Conditional branch
	1635	Branch to @var{label} only if there has been a successful @code{s}ubstitution
	1636	since the last input line was read or conditional branch was taken.
	1637	The @var{label} may be omitted, in which case the next cycle is started.
	1638
	1639	@end table
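
A minimal sketch, assuming @value{SSED}. The first command uses @code{b} without a label to skip the remaining commands on line 2. The second uses a @code{t} loop (the label name @code{a} is arbitrary) to insert thousands separators, repeating the substitution until it no longer succeeds. @code{\B} and @code{\>} are GNU regular-expression extensions:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 4 | sed -e 2b -e 's/./x/'
x
2
x
x
$ echo 1234567 | sed ':a;s/\B[0-9][0-9][0-9]\>/,&/;ta'
1,234,567
@end example
@codequoteundirected off
@codequotebacktick off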
	1640
	1641	@node Extended Commands
	1642	@section Commands Specific to @value{SSED}
	1643
	1644	These commands are specific to @value{SSED}, so you
	1645	must use them with care and only when you are sure that
	1646	sacrificing portability is acceptable. They allow you to check
	1647	for @value{SSED} extensions or to do tasks that are required
	1648	quite often, yet are unsupported by standard @command{sed}s.
	1649
	1650	@table @code
	1651	@item e [@var{command}]
	1652	@findex e (evaluate) command
	1653	@cindex Evaluate Bourne-shell commands
	1654	@cindex Subprocesses
	1655	@cindex @value{SSEDEXT}, evaluating Bourne-shell commands
	1656	@cindex @value{SSEDEXT}, subprocesses
	1657	This command allows one to pipe input from a shell command
	1658	into pattern space. Without parameters, the @code{e} command
	1659	executes the command that is found in pattern space and
	1660	replaces the pattern space with the output; a trailing newline
	1661	is suppressed.
	1662
	1663	If a parameter is specified, instead, the @code{e} command
[3613]	1664	interprets it as a command and sends its output to the output stream.
	1665	The command can run across multiple lines, all but the last ending with
	1666	a backslash.
[599]	1667
	1668	In both cases, the results are undefined if the command to be
	1669	executed contains a @sc{nul} character.
	1670
[3613]	1671	Note that, unlike the @code{r} command, the output of the command will
	1672	be printed immediately; the @code{r} command instead delays the output
	1673	to the end of the current cycle.
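
A sketch assuming @value{SSED}. With a parameter, @code{e} runs the command, and its output appears before the pattern space of that line is printed:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 2 | sed '1e echo hello'
hello
1
2
@end example
@codequoteundirected off
@codequotebacktick off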
[599]	1674
[3613]	1675	@item F
	1676	@findex F (File name) command
	1677	@cindex Printing file name
	1678	@cindex File name, printing
	1679	Print out the file name of the current input file (with a trailing
	1680	newline).
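
A sketch assuming @value{SSED}. @file{notes.txt} is a hypothetical file created just for the example:

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'a\n' > notes.txt
$ sed F notes.txt
notes.txt
a
@end example
@codequoteundirected off
@codequotebacktick off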
[599]	1681
	1682	@item Q [@var{exit-code}]
[3613]	1683	This command accepts only one address.
[599]	1684
	1685	@findex Q (silent Quit) command
	1686	@cindex @value{SSEDEXT}, quitting silently
	1687	@cindex @value{SSEDEXT}, returning an exit code
	1688	@cindex Quitting
	1689	This command is the same as @code{q}, but will not print the
	1690	contents of pattern space. Like @code{q}, it provides the
	1691	ability to return an exit code to the caller.
	1692
	1693	This command can be useful because the only alternative ways
	1694	to accomplish this apparently trivial function are to use
	1695	the @option{-n} option (which can unnecessarily complicate
	1696	your script) or to resort to the following snippet, which
	1697	wastes time by reading the whole file without any visible effect:
	1698
	1699	@example
	1700	:eat
[3613]	1701	$d @i{@r{Quit silently on the last line}}
	1702	N @i{@r{Read another line, silently}}
	1703	g @i{@r{Overwrite pattern space each time to save memory}}
[599]	1704	b eat
	1705	@end example
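
For comparison, the following sketch (assuming @value{SSED} and a POSIX shell) shows @code{Q} next to @code{q}, including an exit code:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 3 | sed 2q
1
2
$ seq 3 | sed 2Q
1
$ seq 3 | sed 2Q3
1
$ echo $?
3
@end example
@codequoteundirected off
@codequotebacktick off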
	1706
	1707	@item R @var{filename}
	1708	@findex R (read line) command
	1709	@cindex Read text from a file
	1710	@cindex @value{SSEDEXT}, reading a file a line at a time
	1711	@cindex @value{SSEDEXT}, @code{R} command
	1712	@cindex @value{SSEDEXT}, @file{/dev/stdin} file
	1713	Queue a line of @var{filename} to be read and
	1714	inserted into the output stream at the end of the current cycle,
	1715	or when the next input line is read.
	1716	Note that if @var{filename} cannot be read, or if its end is
	1717	reached, no line is appended, without any error indication.
	1718
	1719	As with the @code{r} command, the special value @file{/dev/stdin}
	1720	is supported for the file name, which reads a line from the
	1721	standard input.
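
A sketch assuming @value{SSED}. @file{letters.txt} is a hypothetical two-line file. One of its lines is appended after each input line until the file runs out:

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'A\nB\n' > letters.txt
$ seq 3 | sed 'R letters.txt'
1
A
2
B
3
@end example
@codequoteundirected off
@codequotebacktick off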
	1722
	1723	@item T @var{label}
	1724	@findex T (test and branch if failed) command
	1725	@cindex @value{SSEDEXT}, branch if @code{s///} failed
	1726	@cindex Branch to a label, if @code{s///} failed
	1727	@cindex Conditional branch
	1728	Branch to @var{label} only if there have been no successful
	1729	@code{s}ubstitutions since the last input line was read or
	1730	conditional branch was taken. The @var{label} may be omitted,
	1731	in which case the next cycle is started.
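
A minimal sketch, assuming @value{SSED}. With no label, @code{T} ends the cycle for lines where the substitution failed, so only the changed line receives the prefix:

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'a1\nb\n' | sed -e 's/[0-9]/#/' -e T -e 's/^/changed: /'
changed: a#
b
@end example
@codequoteundirected off
@codequotebacktick off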
	1732
	1733	@item v @var{version}
	1734	@findex v (version) command
	1735	@cindex @value{SSEDEXT}, checking for their presence
	1736	@cindex Requiring @value{SSED}
	1737	This command does nothing, but makes @command{sed} fail if
	1738	@value{SSED} extensions are not supported, simply because other
	1739	versions of @command{sed} do not implement it. In addition, you
	1740	can specify the version of @command{sed} that your script
	1741	requires, such as @code{4.0.5}. The default is @code{4.0}
	1742	because that is the first version that implemented this command.
	1743
	1744	This command enables all @value{SSEDEXT} even if
	1745	@env{POSIXLY_CORRECT} is set in the environment.
	1746
	1747	@item W @var{filename}
	1748	@findex W (write first line) command
	1749	@cindex Write first line to a file
	1750	@cindex @value{SSEDEXT}, writing first line to a file
	1751	Write to the given filename the portion of the pattern space up to
	1752	the first newline. Everything said under the @code{w} command about
	1753	file handling holds here too.
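
For instance, in this sketch (assuming @value{SSED}; @file{first.txt} is a hypothetical output file) @code{N} joins pairs of lines, so the pattern space holds two lines at a time, and @code{W} writes only the part before the embedded newline.

@codequoteundirected on
@codequotebacktick on
@example
# Assumes GNU sed; first.txt is a hypothetical output file
seq 4 | sed -n 'N;W first.txt'
# first.txt now holds 1 and 3, the first line of each pair
cat first.txt
@end example
@codequoteundirected off
@codequotebacktick off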
[3613]	1754
	1755	@item z
	1756	@findex z (Zap) command
	1757	@cindex @value{SSEDEXT}, emptying pattern space
	1758	@cindex Emptying pattern space
	1759	This command empties the pattern space. It is
	1760	usually the same as @samp{s/.*//}, but is more efficient
	1761	and works in the presence of invalid multibyte sequences
	1762	in the input stream. @sc{posix} mandates that such sequences
	1763	are @emph{not} matched by @samp{.}, so that there is no portable
	1764	way to clear @command{sed}'s buffers in the middle of the
	1765	script in most multibyte locales (including UTF-8 locales).
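
A short sketch, assuming @value{SSED}: after @code{z} the pattern space is empty, so a following substitution that matches only the empty string (@code{^$}) succeeds.

@codequoteundirected on
@codequotebacktick on
@example
# Assumes GNU sed (z is an extension); prints "(emptied)"
echo abc | sed 'z;s/^$/(emptied)/'
@end example
@codequoteundirected off
@codequotebacktick off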
[599]	1766	@end table
	1767
[3613]	1768
	1769	@node Multiple commands syntax
	1770	@section Multiple commands syntax
	1771
	1772	@c POSIX says:
	1773	@c Editing commands other than {...}, a, b, c, i, r, t, w, :, and #
	1774	@c can be followed by a <semicolon>, optional <blank> characters, and
	1775	@c another editing command. However, when an s editing command is used
	1776	@c with the w flag, following it with another command in this manner
	1777	@c produces undefined results.
	1778
	1779	There are several methods to specify multiple commands in a @command{sed}
	1780	program.
	1781
	1782	Using newlines is most natural when running a @command{sed} script from a file
	1783	(using the @option{-f} option).
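
For example, a script that deletes lines 1, 3 and 5 can be kept in a file with one command per line. This is a sketch; the file name @file{odd.sed} is hypothetical:

@codequoteundirected on
@codequotebacktick on
@example
# odd.sed is a hypothetical script file, one command per line
cat > odd.sed <<'EOF'
1d
3d
5d
EOF
# Prints 2, 4 and 6
seq 6 | sed -f odd.sed
@end example
@codequoteundirected off
@codequotebacktick off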
	1784
	1785	On the command line, all @command{sed} commands may be separated by newlines.
	1786	Alternatively, you may specify each command as an argument to an @option{-e}
	1787	option:
	1788
	1789	@codequoteundirected on
	1790	@codequotebacktick on
	1791	@example
	1792	@group
	1793	$ seq 6 | sed '1d
	1794	3d
	1795	5d'
	1796	2
	1797	4
	1798	6
	1799
	1800	$ seq 6 | sed -e 1d -e 3d -e 5d
	1801	2
	1802	4
	1803	6
	1804	@end group
	1805	@end example
	1806	@codequoteundirected off
	1807	@codequotebacktick off
	1808
	1809	A semicolon (@samp{;}) may be used to separate most simple commands:
	1810
	1811	@codequoteundirected on
	1812	@codequotebacktick on
	1813	@example
	1814	@group
	1815	$ seq 6 | sed '1d;3d;5d'
	1816	2
	1817	4
	1818	6
	1819	@end group
	1820	@end example
	1821	@codequoteundirected off
	1822	@codequotebacktick off
	1823
	1824	The @code{@{},@code{@}},@code{b},@code{t},@code{T},@code{:} commands can
	1825	be separated with a semicolon (this is a non-portable @value{SSED} extension).
	1826
	1827	@codequoteundirected on
	1828	@codequotebacktick on
	1829	@example
	1830	@group
	1831	$ seq 4 | sed '@{1d;3d@}'
	1832	2
	1833	4
	1834
	1835	$ seq 6 | sed '@{1d;3d@};5d'
	1836	2
	1837	4
	1838	6
	1839	@end group
	1840	@end example
	1841	@codequoteundirected off
	1842	@codequotebacktick off
	1843
	1844	Labels used in @code{b},@code{t},@code{T},@code{:} commands are read
	1845	until a semicolon. Leading and trailing whitespace is ignored. In
	1846	the examples below the label is @samp{x}. The first example works
	1847	with @value{SSED}. The second is a portable equivalent. For more
	1848	information about branching and labels, @pxref{Branching and flow
	1849	control}.
	1850
	1851	@codequoteundirected on
	1852	@codequotebacktick on
	1853	@example
	1854	@group
	1855	$ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
	1856	1
	1857	=2
	1858
	1859	$ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
	1860	1
	1861	=2
	1862	@end group
	1863	@end example
	1864	@codequoteundirected off
	1865	@codequotebacktick off
	1866
	1867
	1868
	1869	@subsection Commands Requiring a newline
	1870
	1871	The following commands cannot be separated by a semicolon and
	1872	require a newline:
	1873
	1874	@table @asis
	1875
	1876	@item @code{a},@code{c},@code{i} (append/change/insert)
	1877
	1878	All characters following @code{a},@code{c},@code{i} commands are taken
	1879	as the text to append/change/insert. Using a semicolon leads to
	1880	undesirable results:
	1881
	1882	@codequoteundirected on
	1883	@codequotebacktick on
	1884	@example
	1885	@group
	1886	$ seq 2 | sed '1aHello ; 2d'
	1887	1
	1888	Hello ; 2d
	1889	2
	1890	@end group
	1891	@end example
	1892	@codequoteundirected off
	1893	@codequotebacktick off
	1894
	1895	Separate the commands using @option{-e} or a newline:
	1896
	1897	@codequoteundirected on
	1898	@codequotebacktick on
	1899	@example
	1900	@group
	1901	$ seq 2 | sed -e 1aHello -e 2d
	1902	1
	1903	Hello
	1904
	1905	$ seq 2 | sed '1aHello
	1906	2d'
	1907	1
	1908	Hello
	1909	@end group
	1910	@end example
	1911	@codequoteundirected off
	1912	@codequotebacktick off
	1913
	1914	Note that specifying the text to add (@samp{Hello}) immediately
	1915	after @code{a},@code{c},@code{i} is itself a @value{SSED} extension.
	1916	A portable, POSIX-compliant alternative is:
	1917
	1918	@codequoteundirected on
	1919	@codequotebacktick on
	1920	@example
	1921	@group
	1922	$ seq 2 | sed '1a\
	1923	Hello
	1924	2d'
	1925	1
	1926	Hello
	1927	@end group
	1928	@end example
	1929	@codequoteundirected off
	1930	@codequotebacktick off
	1931
	1932	@item @code{#} (comment)
	1933
	1934	All characters following @samp{#} until the next newline are ignored.
	1935
	1936	@codequoteundirected on
	1937	@codequotebacktick on
	1938	@example
	1939	@group
	1940	$ seq 3 | sed '# this is a comment ; 2d'
	1941	1
	1942	2
	1943	3
	1944
	1945
	1946	$ seq 3 | sed '# this is a comment
	1947	2d'
	1948	1
	1949	3
	1950	@end group
	1951	@end example
	1952	@codequoteundirected off
	1953	@codequotebacktick off
	1954
	1955	@item @code{r},@code{R},@code{w},@code{W} (reading and writing files)
	1956
	1957	The @code{r},@code{R},@code{w},@code{W} commands parse the filename
	1958	until the end of the line. If whitespace, comments or semicolons are found,
	1959	they will be included in the filename, leading to unexpected results:
	1960
	1961	@codequoteundirected on
	1962	@codequotebacktick on
	1963	@example
	1964	@group
	1965	$ seq 2 | sed '1w hello.txt ; 2d'
	1966	1
	1967	2
	1968
	1969	$ ls -log
	1970	total 4
	1971	-rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d
	1972
	1973	$ cat 'hello.txt ; 2d'
	1974	1
	1975	@end group
	1976	@end example
	1977	@codequoteundirected off
	1978	@codequotebacktick off
	1979
	1980	Note that @command{sed} silently ignores read errors in the
	1981	@code{r} and @code{R} commands (such as missing files).
	1982	In the following example, @command{sed} tries to read a file named
	1983	@file{hello.txt ; N}. The file is missing, and the error is silently
	1984	ignored:
	1985
	1986	@codequoteundirected on
	1987	@codequotebacktick on
	1988	@example
	1989	@group
	1990	$ echo x | sed '1rhello.txt ; N'
	1991	x
	1992	@end group
	1993	@end example
	1994	@codequoteundirected off
	1995	@codequotebacktick off
	1996
	1997	@item @code{e} (command execution)
	1998
	1999	Any characters following the @code{e} command until the end of the line
	2000	will be sent to the shell. If whitespace, comments or semicolons are found,
	2001	they will be included in the shell command, leading to unexpected results:
	2002
	2003	@codequoteundirected on
	2004	@codequotebacktick on
	2005	@example
	2006	@group
	2007	$ echo a | sed '1e touch foo#bar'
	2008	a
	2009
	2010	$ ls -1
	2011	foo#bar
	2012
	2013	$ echo a | sed '1e touch foo ; s/a/b/'
	2014	sh: 1: s/a/b/: not found
	2015	a
	2016	@end group
	2017	@end example
	2018	@codequoteundirected off
	2019	@codequotebacktick off
	2020
	2021
	2022	@item @code{s///[we]} (substitute with @code{e} or @code{w} flags)
	2023
	2024	In a substitution command, the @code{w} flag writes the substitution
	2025	result to a file, and the @code{e} flag executes the substitution result
	2026	as a shell command. As with the @code{r/R/w/W/e} commands, these
	2027	must be terminated with a newline. If whitespace, comments or semicolons
	2028	are found, they will be included in the shell command or filename, leading to
	2029	unexpected results:
	2030
	2031	@codequoteundirected on
	2032	@codequotebacktick on
	2033	@example
	2034	@group
	2035	$ echo a | sed 's/a/b/w1.txt#foo'
	2036	b
	2037
	2038	$ ls -1
	2039	1.txt#foo
	2040	@end group
	2041	@end example
	2042	@codequoteundirected off
	2043	@codequotebacktick off
	2044
	2045	@end table
	2046
	2047
	2048	@node sed addresses
	2049	@chapter Addresses: selecting lines
	2050
	2051	@menu
	2052	* Addresses overview:: Addresses overview
	2053	* Numeric Addresses:: selecting lines by numbers
	2054	* Regexp Addresses:: selecting lines by text matching
	2055	* Range Addresses:: selecting a range of lines
	2056	* Zero Address:: Using address @code{0}
	2057	@end menu
	2058
	2059	@node Addresses overview
	2060	@section Addresses overview
	2061
	2062	@cindex addresses, numeric
	2063	@cindex numeric addresses
	2064	Addresses determine on which line(s) the @command{sed} command will be
	2065	executed. The following command replaces the first occurrence of @samp{hello}
	2066	with @samp{world} only on line 144:
	2067
	2068	@codequoteundirected on
	2069	@codequotebacktick on
	2070	@example
	2071	sed '144s/hello/world/' input.txt > output.txt
	2072	@end example
	2073	@codequoteundirected off
	2074	@codequotebacktick off
	2075
	2076
	2077
	2078	If no address is specified, the command is performed on all lines.
	2079	The following command replaces @samp{hello} with @samp{world},
	2080	targeting every line of the input file.
	2081	However, note that it modifies only the first instance of @samp{hello}
	2082	on each line.
	2083	Use the @samp{g} modifier to affect every instance on each affected line.
	2084
	2085	@codequoteundirected on
	2086	@codequotebacktick on
	2087	@example
	2088	sed 's/hello/world/' input.txt > output.txt
	2089	@end example
	2090	@codequoteundirected off
	2091	@codequotebacktick off
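
Since @file{input.txt} is not shown here, the following sketch uses made-up input from @command{printf} to show both effects: the numeric address restricts the substitution to line 2, and the @samp{g} modifier replaces every instance on a line.

@codequoteundirected on
@codequotebacktick on
@example
# Input lines are made up for illustration
# Prints "hello hello" and then "world hello"
printf '%s\n' 'hello hello' 'hello hello' | sed '2s/hello/world/'
# Prints "world world"
printf '%s\n' 'hello hello' | sed 's/hello/world/g'
@end example
@codequoteundirected off
@codequotebacktick off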
	2092
	2093
	2094
	2095	@cindex addresses, regular expression
	2096	@cindex regular expression addresses
	2097	Addresses can contain regular expressions to match lines based
	2098	on content instead of line numbers. The following command replaces
	2099	@samp{hello} with @samp{world} only on lines
	2100	containing the string @samp{apple}:
	2101
	2102	@codequoteundirected on
	2103	@codequotebacktick on
	2104	@example
	2105	sed '/apple/s/hello/world/' input.txt > output.txt
	2106	@end example
	2107	@codequoteundirected off
	2108	@codequotebacktick off
	2109
	2110
	2111
	2112	@cindex addresses, range
	2113	@cindex range addresses
	2114	An address range is specified with two addresses separated by a comma
	2115	(@code{,}). Addresses can be numeric, regular expressions, or a mix of
	2116	both.
	2117	The following command replaces @samp{hello} with @samp{world}
	2118	only on lines 4 to 17 (inclusive):
	2119
	2120	@codequoteundirected on
	2121	@codequotebacktick on
	2122	@example
	2123	sed '4,17s/hello/world/' input.txt > output.txt
	2124	@end example
	2125	@codequoteundirected off
	2126	@codequotebacktick off
	2127
	2128
	2129
	2130	@cindex Excluding lines
	2131	@cindex Selecting non-matching lines
	2132	@cindex addresses, negating
	2133	@cindex addresses, excluding
	2134	Appending the @code{!} character to the end of an address
	2135	specification (before the command letter) negates the sense of the
	2136	match. That is, if the @code{!} character follows an address or an
	2137	address range, then only lines which do @emph{not} match the addresses
	2138	will be selected. The following command replaces @samp{hello}
	2139	with @samp{world} only on lines @emph{not} containing the string
	2140	@samp{apple}:
	2141
	2142	@example
	2143	sed '/apple/!s/hello/world/' input.txt > output.txt
	2144	@end example
	2145
	2146	The following command replaces @samp{hello} with
	2147	@samp{world} only on lines 1 to 3 and from line 18 to the last line of the
	2148	input file (i.e. excluding lines 4 to 17):
	2149
	2150	@example
	2151	sed '4,17!s/hello/world/' input.txt > output.txt
	2152	@end example
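
With numeric input the effect of @code{!} is easy to check; in this sketch lines 2 to 4 are excluded and only the remaining lines are printed:

@codequoteundirected on
@codequotebacktick on
@example
# Prints 1 and 5
seq 5 | sed -n '2,4!p'
@end example
@codequoteundirected off
@codequotebacktick off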
	2153
	2154
	2155
	2156
	2157
	2158	@node Numeric Addresses
	2159	@section Selecting lines by numbers
	2160	@cindex Addresses, in @command{sed} scripts
	2161	@cindex Line selection
	2162	@cindex Selecting lines to process
	2163
	2164	Addresses in a @command{sed} script can be in any of the following forms:
	2165	@table @code
	2166	@item @var{number}
	2167	@cindex Address, numeric
	2168	@cindex Line, selecting by number
	2169	Specifying a line number will match only that line in the input.
	2170	(Note that @command{sed} counts lines continuously across all input files
	2171	unless @option{-i} or @option{-s} options are specified.)
	2172
	2173	@item $
	2174	@cindex Address, last line
	2175	@cindex Last line, selecting
	2176	@cindex Line, selecting last
	2177	This address matches the last line of the last file of input, or
	2178	the last line of each file when the @option{-i} or @option{-s} options
	2179	are specified.
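
The following sketch illustrates both rules, assuming two small scratch files with the hypothetical names @file{f1} and @file{f2}:

@codequoteundirected on
@codequotebacktick on
@example
# f1 and f2 are hypothetical two-line scratch files
printf '%s\n' a b > f1
printf '%s\n' c d > f2
sed -n '3p' f1 f2       # lines are counted across files: prints c
sed -n '$p' f1 f2       # last line of the last file: prints d
sed -s -n '$p' f1 f2    # with -s, last line of each file: prints b and d
@end example
@codequoteundirected off
@codequotebacktick off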
	2180
	2181
	2182	@item @var{first}~@var{step}
	2183	@cindex GNU extensions, @samp{@var{n}~@var{m}} addresses
	2184	This GNU extension matches every @var{step}th line
	2185	starting with line @var{first}.
	2186	In particular, lines will be selected when there exists
	2187	a non-negative @var{n} such that the current line-number equals
	2188	@var{first} + (@var{n} * @var{step}).
	2189	Thus, one would use @code{1~2} to select the odd-numbered lines and
	2190	@code{0~2} for even-numbered lines;
	2191	to pick every third line starting with the second, @samp{2~3} would be used;
	2192	to pick every fifth line starting with the tenth, use @samp{10~5};
	2193	and @samp{50~0} is just an obscure way of saying @code{50}.
	2194
	2195	The following commands demonstrate the step address usage:
	2196
	2197	@example
	2198	$ seq 10 | sed -n '0~4p'
	2199	4
	2200	8
	2201
	2202	$ seq 10 | sed -n '1~3p'
	2203	1
	2204	4
	2205	7
	2206	10
	2207	@end example
	2208
	2209
	2210	@end table
	2211
	2212
	2213
	2214	@node Regexp Addresses
	2215	@section Selecting lines by text matching
	2216
	2217	@value{SSED} supports the following regular expression addresses.
	2218	The default regular expression is
	2219	@ref{BRE syntax, , Basic Regular Expression (BRE)}.
	2220	If the @option{-E} or @option{-r} option is used, the regular expression should be
	2221	in @ref{ERE syntax, , Extended Regular Expression (ERE)} syntax.
	2222	@xref{BRE vs ERE}.
	2223
	2224	@table @code
	2225	@item /@var{regexp}/
	2226	@cindex Address, as a regular expression
	2227	@cindex Line, selecting by regular expression match
	2228	This will select any line which matches the regular expression @var{regexp}.
	2229	If @var{regexp} itself includes any @code{/} characters,
	2230	each must be escaped by a backslash (@code{\}).
	2231
	2232	The following command prints lines in @file{/etc/passwd}
	2233	which end with @samp{bash}@footnote{
	2234	There are of course many other ways to do the same,
	2235	e.g.
	2236	@example
	2237	grep 'bash$' /etc/passwd
	2238	awk -F: '$7 == "/bin/bash"' /etc/passwd
	2239	@end example
	2240	}:
	2241
	2242	@example
	2243	sed -n '/bash$/p' /etc/passwd
	2244	@end example
	2245
	2246	@cindex empty regular expression
	2247	@cindex @value{SSEDEXT}, modifiers and the empty regular expression
	2248	The empty regular expression @samp{//} reuses the last regular
	2249	expression that was applied (the same holds if the empty regular expression is
	2250	passed to the @code{s} command). Note that modifiers to regular expressions
	2251	are evaluated when the regular expression is compiled, so it is invalid to
	2252	specify them together with the empty regular expression.
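
For example, in this sketch the @code{s} command is given the empty regular expression, so it reuses @code{/foo/} from the address:

@codequoteundirected on
@codequotebacktick on
@example
# Prints X and then bar
printf '%s\n' foo bar | sed '/foo/s//X/'
@end example
@codequoteundirected off
@codequotebacktick off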
	2253
	2254	@item \%@var{regexp}%
	2255	(The @code{%} may be replaced by any other single character.)
	2256
	2257	@cindex Slash character, in regular expressions
	2258	This also matches the regular expression @var{regexp},
	2259	but allows one to use a different delimiter than @code{/}.
	2260	This is particularly useful if the @var{regexp} itself contains
	2261	a lot of slashes, since it avoids the tedious escaping of every @code{/}.
	2262	If @var{regexp} itself includes any delimiter characters,
	2263	each must be escaped by a backslash (@code{\}).
	2264
	2265	The following commands are equivalent. They print lines
	2266	which start with @samp{/home/alice/documents/}:
	2267
	2268	@example
	2269	sed -n '/^\/home\/alice\/documents\//p'
	2270	sed -n '\%^/home/alice/documents/%p'
	2271	sed -n '\;^/home/alice/documents/;p'
	2272	@end example
	2273
	2274
	2275	@item /@var{regexp}/I
	2276	@itemx \%@var{regexp}%I
	2277	@cindex GNU extensions, @code{I} modifier
	2278	@cindex case insensitive, regular expression
	2279	The @code{I} modifier to regular-expression matching is a GNU
	2280	extension which causes the @var{regexp} to be matched in
	2281	a case-insensitive manner.
	2282
	2283	In many other programming languages, a lower case @code{i} is used
	2284	for case-insensitive regular expression matching. However, in @command{sed}
	2285	the @code{i} is used for the insert command (@pxref{insert command}).
	2286
	2287	Observe the difference between the following examples.
	2288
	2289	In this example, @code{/b/I} is the address: a regular expression with the @code{I}
	2290	modifier. @code{d} is the delete command:
	2291
	2292	@example
	2293	$ printf "%s\n" a b c | sed '/b/Id'
	2294	a
	2295	c
	2296	@end example
	2297
	2298	Here, @code{/b/} is the address: a regular expression.
	2299	@code{i} is the insert command.
	2300	@code{d} is the value to insert.
	2301	A line with @samp{d} is then inserted above the matched line:
	2302
	2303	@example
	2304	$ printf "%s\n" a b c | sed '/b/id'
	2305	a
	2306	d
	2307	b
	2308	c
	2309	@end example
	2310
	2311	@item /@var{regexp}/M
	2312	@itemx \%@var{regexp}%M
	2313	@cindex @value{SSEDEXT}, @code{M} modifier
	2314	The @code{M} modifier to regular-expression matching is a @value{SSED}
	2315	extension which directs @value{SSED} to match the regular expression
	2316	in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
	2317	match respectively (in addition to the normal behavior) the empty string
	2318	after a newline, and the empty string before a newline. There are
	2319	special character sequences
	2320	@ifclear PERL
	2321	(@code{\`} and @code{\'})
	2322	@end ifclear
	2323	which always match the beginning or the end of the buffer.
	2324	In addition,
	2325	the period character does not match a new-line character in
	2326	multi-line mode.
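
A small sketch, assuming @value{SSED}: @code{N} puts two lines into the pattern space; with @code{M} (and @code{g}) the @code{^} anchor also matches right after the embedded newline, while without @code{M} it matches only at the start of the buffer.

@codequoteundirected on
@codequotebacktick on
@example
# Assumes GNU sed (M is an extension)
printf '%s\n' a b | sed 'N;s/^/> /Mg'   # prints "> a" and "> b"
printf '%s\n' a b | sed 'N;s/^/> /g'    # prints "> a" and "b"
@end example
@codequoteundirected off
@codequotebacktick off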
	2327	@end table
	2328
	2329
	2330	@cindex regex addresses and pattern space
	2331	@cindex regex addresses and input lines
	2332	Regex addresses operate on the content of the current
	2333	pattern space. If the pattern space is changed (for example with the @code{s///}
	2334	command), the regular expression matching will operate on the changed text.
	2335
	2336	In the following example, automatic printing is disabled with
	2337	@option{-n}. The @code{s/2/X/} command replaces the first @samp{2} on a
	2338	line with @samp{X}. The command @code{/[0-9]/p} matches
	2339	lines with digits and prints them.
	2340	Because the second line is changed before the @code{/[0-9]/} regex,
	2341	it will not match and will not be printed:
	2342
	2343	@codequoteundirected on
	2344	@codequotebacktick on
	2345	@example
	2346	@group
	2347	$ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
	2348	1
	2349	3
	2350	@end group
	2351	@end example
	2352	@codequoteundirected off
	2353	@codequotebacktick off
	2354
	2355
	2356	@node Range Addresses
	2357	@section Range Addresses
	2358
	2359	@cindex Range of lines
	2360	@cindex Several lines, selecting
	2361	An address range is specified with two addresses
	2362	separated by a comma (@code{,}). An address range matches lines
	2363	starting from where the first address matches, and continues
	2364	until the second address matches (inclusively):
	2365
	2366	@example
	2367	$ seq 10 | sed -n '4,6p'
	2368	4
	2369	5
	2370	6
	2371	@end example
	2372
	2373	If the second address is a @var{regexp}, then checking for the
	2374	ending match will start with the line @emph{following} the
	2375	line which matched the first address: a range will always
	2376	span at least two lines (except of course if the input stream
	2377	ends).
	2378
	2379	@example
	2380	$ seq 10 | sed -n '4,/[0-9]/p'
	2381	4
	2382	5
	2383	@end example
	2384
	2385	If the second address is a @var{number} less than (or equal to)
	2386	the number of the line matching the first address, then only that one line is
	2387	matched:
	2388
	2389	@example
	2390	$ seq 10 | sed -n '4,1p'
	2391	4
	2392	@end example
	2393
	2394	@anchor{Zero Address Regex Range}
	2395	@cindex Special addressing forms
	2396	@cindex Range with start address of zero
	2397	@cindex Zero, as range start address
	2398	@cindex @var{addr1},+N
	2399	@cindex @var{addr1},~N
	2400	@cindex GNU extensions, special two-address forms
	2401	@cindex GNU extensions, @code{0} address
	2402	@cindex GNU extensions, 0,@var{addr2} addressing
	2403	@cindex GNU extensions, @var{addr1},+@var{N} addressing
	2404	@cindex GNU extensions, @var{addr1},~@var{N} addressing
	2405	@value{SSED} also supports some special two-address forms; all these
	2406	are GNU extensions:
	2407	@table @code
	2408	@item 0,/@var{regexp}/
	2409	A line number of @code{0} can be used in an address specification like
	2410	@code{0,/@var{regexp}/} so that @command{sed} will try to match
	2411	@var{regexp} in the first input line too. In other words,
	2412	@code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
	2413	except that if @var{addr2} matches the very first line of input, the
	2414	@code{0,/@var{regexp}/} form will consider it to end the range, whereas
	2415	the @code{1,/@var{regexp}/} form will match the beginning of its range and
	2416	hence make the range span up to the @emph{second} occurrence of the
	2417	regular expression.
	2418
	2419	The following examples demonstrate the difference between starting
	2420	with address 1 and 0:
	2421
	2422	@example
	2423	$ seq 10 | sed -n '1,/[0-9]/p'
	2424	1
	2425	2
	2426
	2427	$ seq 10 | sed -n '0,/[0-9]/p'
	2428	1
	2429	@end example
	2430
	2431
	2432	@item @var{addr1},+@var{N}
	2433	Matches @var{addr1} and the @var{N} lines following @var{addr1}.
	2434
	2435	@example
	2436	$ seq 10 | sed -n '6,+2p'
	2437	6
	2438	7
	2439	8
	2440	@end example
	2441
	2442	@var{addr1} can be a line number or a regular expression.
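
For instance, with a regular expression as @var{addr1} (a sketch, assuming @value{SSED}), each line matching @samp{x} that is not already inside a range starts a new range covering it and the following line:

@codequoteundirected on
@codequotebacktick on
@example
# Assumes GNU sed; prints x, b, x and d
printf '%s\n' a x b c x d e | sed -n '/x/,+1p'
@end example
@codequoteundirected off
@codequotebacktick off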
	2443
	2444	@item @var{addr1},~@var{N}
	2445	Matches @var{addr1} and the lines following @var{addr1}
	2446	until the next line whose input line number is a multiple of @var{N}.
	2447	The following command prints from line 6 through the next line whose number
	2448	is a multiple of 4 (i.e. line 8):
	2449
	2450	@example
	2451	$ seq 10 | sed -n '6,~4p'
	2452	6
	2453	7
	2454	8
	2455	@end example
	2456
	2457	@var{addr1} can be a line number or a regular expression.
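
Likewise with a regular expression as @var{addr1} (a sketch, assuming @value{SSED}): the range starts at line 5, the only line matching @samp{5}, and ends at line 8, the next multiple of 4:

@codequoteundirected on
@codequotebacktick on
@example
# Assumes GNU sed; prints 5, 6, 7 and 8
seq 10 | sed -n '/5/,~4p'
@end example
@codequoteundirected off
@codequotebacktick off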
	2458
	2459	@end table
	2460
	2461
	2462
	2463	@node Zero Address
	2464	@section Zero Address
	2465	@cindex Zero Address
	2466	As a @value{SSED} extension, the @code{0} address can be used in two cases:
	2467	@enumerate
	2468	@item
	2469	In a regex range address, as @code{0,/@var{regexp}/}
	2470	(@pxref{Zero Address Regex Range}).
	2471	@item
	2472	With the @code{r} command, inserting a file before the first line
	2473	(@pxref{Adding a header to multiple files}).
	2474	@end enumerate
	2475
	2476	Note that these are the only places where the @code{0} address makes
	2477	sense; commands given the @code{0} address in any
	2478	other way produce an error.
	2479
	2480
	2481
	2482	@node sed regular expressions
	2483	@chapter Regular Expressions: selecting text
	2484
	2485	@menu
	2486	* Regular Expressions Overview:: Overview of regular expressions in @command{sed}
	2487	* BRE vs ERE:: Basic (BRE) and extended (ERE) regular expression
	2488	syntax
	2489	* BRE syntax:: Overview of basic regular expression syntax
	2490	* ERE syntax:: Overview of extended regular expression syntax
	2491	* Character Classes and Bracket Expressions::
	2492	* regexp extensions:: Additional regular expression commands
	2493	* Back-references and Subexpressions:: Back-references and Subexpressions
	2494	* Escapes:: Specifying special characters
	2495	* Locale Considerations:: Multibyte characters and locale considerations
	2496	@end menu
	2497
	2498	@node Regular Expressions Overview
	2499	@section Overview of regular expressions in @command{sed}
	2500
	2501	@c NOTE: Keep examples in the 'overview' section
	2502	@c neutral in regards to BRE/ERE - to ease understanding.
	2503
	2504
	2505	To use @command{sed} effectively, you need to understand regular
	2506	expressions (@dfn{regexp} for short). A regular expression
	2507	is a pattern that is matched against a
	2508	subject string from left to right. Most characters are
	2509	@dfn{ordinary}: they stand for
	2510	themselves in a pattern, and match the corresponding characters.
	2511	Regular expressions in @command{sed} are specified between two
	2512	slashes.
	2513
	2514	The following command prints lines containing the string @samp{hello}:
	2515
	2516	@example
	2517	sed -n '/hello/p'
	2518	@end example
	2519
	2520	The above example is equivalent to this @command{grep} command:
	2521
	2522	@example
	2523	grep 'hello'
	2524	@end example
	2525
	2526	The power of regular expressions comes from the ability to include
	2527	alternatives and repetitions in the pattern. These are encoded in the
	2528	pattern by the use of @dfn{special characters}, which do not stand for
	2529	themselves but instead are interpreted in some special way.
	2530
	2531	The character @code{^} (caret) in a regular expression matches the
	2532	beginning of the line. The character @code{.} (dot) matches any single
	2533	character. The following @command{sed} command matches and prints
	2534	lines which start with the letter @samp{b}, followed by any single character,
	2535	followed by the letter @samp{d}:
	2536
	2537	@example
	2538	$ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
	2539	bad
	2540	bed
	2541	bid
	2542	body
	2543	@end example
	2544
	2545	The following sections explain the meaning and usage of special
	2546	characters in regular expressions.
	2547
	2548	@node BRE vs ERE
	2549	@section Basic (BRE) and extended (ERE) regular expressions
	2550
	2551	Basic and extended regular expressions are two variations on the
	2552	syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the
	2553	default in @command{sed} (and similarly in @command{grep}).
	2554	Use the POSIX-specified @option{-E} option (@option{-r},
	2555	@option{--regexp-extended}) to enable Extended Regular Expression (ERE) syntax.
	2556
	2557	In @value{SSED}, the only difference between basic and extended regular
	2558	expressions is in the behavior of a few special characters: @samp{?},
	2559	@samp{+}, parentheses, braces (@samp{@{@}}), and @samp{|}.
	2560
	2561	With basic (BRE) syntax, these characters do not have special meaning
	2562	unless prefixed with a backslash (@samp{\}); with extended (ERE) syntax
	2563	it is reversed: these characters are special unless they are prefixed
	2564	with backslash (@samp{\}).
	2565
	2566	@multitable @columnfractions .28 .36 .35
	2567
	2568	@headitem Desired pattern
	2569	@tab Basic (BRE) Syntax
	2570	@tab Extended (ERE) Syntax
	2571
	2572	@item literal @samp{+} (plus sign)
	2573
	2574	@tab
	2575	@exampleindent 0
	2576	@codequoteundirected on
	2577	@codequotebacktick on
	2578	@example
	2579	$ echo 'a+b=c' > foo
	2580	$ sed -n '/a+b/p' foo
	2581	a+b=c
	2582	@end example
	2583	@codequotebacktick off
	2584	@codequoteundirected off
	2585
	2586	@tab
	2587	@exampleindent 0
	2588	@codequoteundirected on
	2589	@codequotebacktick on
	2590	@example
	2591	$ echo 'a+b=c' > foo
	2592	$ sed -E -n '/a\+b/p' foo
	2593	a+b=c
	2594	@end example
	2595	@codequotebacktick off
	2596	@codequoteundirected off
	2597
	2598
	2599	@item One or more @samp{a} characters followed by @samp{b}
	2600	(plus sign as special meta-character)
	2601
	2602	@tab
	2603	@exampleindent 0
	2604	@codequoteundirected on
	2605	@codequotebacktick on
	2606	@example
	2607	$ echo aab > foo
	2608	$ sed -n '/a\+b/p' foo
	2609	aab
	2610	@end example
	2611	@codequotebacktick off
	2612	@codequoteundirected off
	2613
	2614	@tab
	2615	@exampleindent 0
	2616	@codequoteundirected on
	2617	@codequotebacktick on
	2618	@example
	2619	$ echo aab > foo
	2620	$ sed -E -n '/a+b/p' foo
	2621	aab
	2622	@end example
	2623	@codequotebacktick off
	2624	@codequoteundirected off
	2625
	2626	@end multitable
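
The other characters listed above follow the same rule. This sketch (assuming @value{SSED}, since @samp{\?} and @samp{\|} are GNU extensions in basic syntax) writes each pattern in both syntaxes; each pair prints the same output:

@codequoteundirected on
@codequotebacktick on
@example
# Assumes GNU sed; each BRE/ERE pair prints the same line
echo 'color colour' | sed 's/colou\?r/C/g'     # prints "C C"
echo 'color colour' | sed -E 's/colou?r/C/g'   # prints "C C"
echo 'cat dog' | sed 's/cat\|dog/pet/g'        # prints "pet pet"
echo 'cat dog' | sed -E 's/cat|dog/pet/g'      # prints "pet pet"
echo 'ababx' | sed 's/\(ab\)*/X/'              # prints "Xx"
echo 'ababx' | sed -E 's/(ab)*/X/'             # prints "Xx"
@end example
@codequoteundirected off
@codequotebacktick off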
	2627
	2628
	2629
	2630
	2631	@node BRE syntax
	2632	@section Overview of basic regular expression syntax
	2633
	2634	Here is a brief description
	2635	of regular expression syntax as used in @command{sed}.
	2636
	2637	@table @code
	2638	@item @var{char}
	2639	A single ordinary character matches itself.
	2640
	2641	@item *
	2642	@cindex GNU extensions, to basic regular expressions
	2643	Matches the null string at the beginning of the pattern space, i.e. what
	2644	preceding regular expression, which must be an ordinary character, a
	2645	special character preceded by @code{\}, a @code{.}, a grouped regexp
	2646	(see below), or a bracket expression. As a GNU extension, a
	2647	postfixed regular expression can also be followed by @code{*}; for
	2648	example, @code{a**} is equivalent to @code{a*}. POSIX
	2649	1003.1-2001 says that @code{*} stands for itself when it appears at
	2650	the start of a regular expression or subexpression, but many
	2651	non-GNU implementations do not support this and portable
	2652	scripts should instead use @code{\*} in these contexts.
	2653	@item .
	2654	Matches any character, including newline.
	2655
	2656	@item ^
	2657	Matches the null string at beginning of the pattern space, i.e. what
	2658	appears after the circumflex must appear at the beginning of the
	2659	pattern space.
	2660
	2661	In most scripts, pattern space is initialized to the content of each
	2662	line (@pxref{Execution Cycle, , How @code{sed} works}). So, it is a
	2663	useful simplification to think of @code{^#include} as matching only
	2664	lines where @samp{#include} is the first thing on the line---if there is
	2665	any preceding space, for example, the match fails. This simplification is
	2666	valid as long as the original content of pattern space is not modified,
	2667	for example with an @code{s} command.
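
The following sketch shows where the simplification stops holding: the @code{s} command first inserts two spaces, so @code{^#include} no longer matches and the line is not deleted.

@codequoteundirected on
@codequotebacktick on
@example
# Prints "  #include <stdio.h>" (with two leading spaces)
echo '#include <stdio.h>' | sed 's/^/  /;/^#include/d'
@end example
@codequoteundirected off
@codequotebacktick off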
	2668
	2669	@code{^} acts as a special character only at the beginning of the
	2670	regular expression or subexpression (that is, after @code{\(} or
	2671	@code{\|}). Portable scripts should avoid @code{^} at the beginning of
	2672	a subexpression, though, as POSIX allows implementations that
	2673	treat @code{^} as an ordinary character in that context.
	2674
	2675	@item $
	2676	It is the same as @code{^}, but refers to the end of the pattern space.
	2677	@code{$} also acts as a special character only at the end
	2678	of the regular expression or subexpression (that is, before @code{\)}
	2679	or @code{\|}), and its use at the end of a subexpression is not
	2680	portable.
	2681
	2682
	2683	@item [@var{list}]
	2684	@itemx [^@var{list}]
	2685	Matches any single character in @var{list}: for example,
	2686	@code{[aeiou]} matches all vowels. A list may include
	2687	sequences like @code{@var{char1}-@var{char2}}, which
	2688	matches any character between (inclusive) @var{char1}
	2689	and @var{char2}.
	2690	@xref{Character Classes and Bracket Expressions}.
	2691
	2692	@item \+
	2693	@cindex GNU extensions, to basic regular expressions
	2694	As @code{*}, but matches one or more. It is a GNU extension.
	2695
	2696	@item \?
	2697	@cindex GNU extensions, to basic regular expressions
	2698	As @code{*}, but only matches zero or one. It is a GNU extension.
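
A quick sketch of both operators, assuming @value{SSED} (they are GNU extensions):

@codequoteundirected on
@codequotebacktick on
@example
# Assumes GNU sed
printf '%s\n' ac abc abbc | sed -n '/ab\+c/p'   # prints abc and abbc
printf '%s\n' ac abc abbc | sed -n '/ab\?c/p'   # prints ac and abc
@end example
@codequoteundirected off
@codequotebacktick off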
	2699
	2700	@item \@{@var{i}\@}
	2701	As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
	2702	decimal integer; for portability, keep it between 0 and 255
	2703	inclusive).
	2704
	2705	@item \@{@var{i},@var{j}\@}
	2706	Matches between @var{i} and @var{j} sequences, inclusive.
	2707
	2708	@item \@{@var{i},\@}
	2709	Matches @var{i} or more sequences.
	2710
@item \(@var{regexp}\)
Groups the inner @var{regexp} as a whole; this is used to:

@itemize @bullet
@item
@cindex GNU extensions, to basic regular expressions
Apply postfix operators, like @code{\(abcd\)*}:
this will search for zero or more whole sequences
of @samp{abcd}, while @code{abcd*} would search
for @samp{abc} followed by zero or more occurrences
of @samp{d}. Note that support for @code{\(abcd\)*} is
required by POSIX 1003.1-2001, but many non-GNU
implementations do not support it and hence it is not universally
portable.
	2725
	2726	@item
	2727	Use back references (see below).
	2728	@end itemize
	2729
	2730
@item @var{regexp1}\|@var{regexp2}
@cindex GNU extensions, to basic regular expressions
Matches either @var{regexp1} or @var{regexp2}. Use
parentheses to build complex alternative regular expressions.
	2735	The matching process tries each alternative in turn, from
	2736	left to right, and the first one that succeeds is used.
	2737	It is a GNU extension.
	2738
	2739	@item @var{regexp1}@var{regexp2}
	2740	Matches the concatenation of @var{regexp1} and @var{regexp2}.
Concatenation binds more tightly than @code{\|}, @code{^}, and
	2742	@code{$}, but less tightly than the other regular expression
	2743	operators.
	2744
	2745	@item \@var{digit}
Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
	2747	subexpression in the regular expression. This is called a @dfn{back
	2748	reference}. Subexpressions are implicitly numbered by counting
	2749	occurrences of @code{\(} left-to-right.
	2750
	2751	@item \n
	2752	Matches the newline character.
	2753
	2754	@item \@var{char}
	2755	Matches @var{char}, where @var{char} is one of @code{$},
	2756	@code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
	2757	Note that the only C-like
	2758	backslash sequences that you can portably assume to be
	2759	interpreted are @code{\n} and @code{\\}; in particular
	2760	@code{\t} is not portable, and matches a @samp{t} under most
	2761	implementations of @command{sed}, rather than a tab character.
	2762
	2763	@end table
	2764
	2765	@cindex Greedy regular expression matching
	2766	Note that the regular expression matcher is greedy, i.e., matches
	2767	are attempted from left to right and, if two or more matches are
	2768	possible starting at the same character, it selects the longest.
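
For example, with GNU @command{sed} the leftmost rule takes precedence
over the longest rule: @samp{a*} already matches the empty string at the
start of @samp{xaaay}, so the substitution happens there, whereas
@samp{a\+} cannot match the empty string and therefore replaces the
longest run of @samp{a}s:

@example
$ echo xaaay | sed 's/a*/-/'
-xaaay

$ echo xaaay | sed 's/a\+/-/'
x-y
@end example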
	2769
	2770	@noindent
	2771	Examples:
	2772	@table @samp
	2773	@item abcdef
	2774	Matches @samp{abcdef}.
	2775
	2776	@item a*b
	2777	Matches zero or more @samp{a}s followed by a single
	2778	@samp{b}. For example, @samp{b} or @samp{aaaaab}.
	2779
	2780	@item a\?b
	2781	Matches @samp{b} or @samp{ab}.
	2782
	2783	@item a\+b\+
	2784	Matches one or more @samp{a}s followed by one or more
	2785	@samp{b}s: @samp{ab} is the shortest possible match, but
	2786	other examples are @samp{aaaab} or @samp{abbbbb} or
	2787	@samp{aaaaaabbbbbbb}.
	2788
	2789	@item .*
	2790	@itemx .\+
	2791	These two both match all the characters in a string;
	2792	however, the first matches every string (including the empty
	2793	string), while the second matches only strings containing
	2794	at least one character.
	2795
@item ^main.*(.*)
	2797	This matches a string starting with @samp{main},
	2798	followed by an opening and closing
	2799	parenthesis. The @samp{n}, @samp{(} and @samp{)} need not
	2800	be adjacent.
	2801
	2802	@item ^#
	2803	This matches a string beginning with @samp{#}.
	2804
	2805	@item \\$
	2806	This matches a string ending with a single backslash. The
	2807	regexp contains two backslashes for escaping.
	2808
	2809	@item \$
	2810	Instead, this matches a string consisting of a single dollar sign,
	2811	because it is escaped.
	2812
	2813	@item [a-zA-Z0-9]
	2814	In the C locale, this matches any ASCII letters or digits.
	2815
	2816	@item [^ @kbd{@key{TAB}}]\+
	2817	(Here @kbd{@key{TAB}} stands for a single tab character.)
	2818	This matches a string of one or more
	2819	characters, none of which is a space or a tab.
	2820	Usually this means a word.
	2821
@item ^\(.*\)\n\1$
	2823	This matches a string consisting of two equal substrings separated by
	2824	a newline.
	2825
	2826	@item .\@{9\@}A$
	2827	This matches nine characters followed by an @samp{A} at the end of a line.
	2828
	2829	@item ^.\@{15\@}A
	2830	This matches the start of a string that contains 16 characters,
	2831	the last of which is an @samp{A}.
	2832
	2833	@end table
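
To try the back-reference example above, the following sketch joins each
pair of input lines with the @code{N} command and prints (using
@option{-n} and @code{p}) only a pair made of two identical lines:

@example
$ printf 'foo\nfoo\nbar\nbaz\n' | sed -n 'N;/^\(.*\)\n\1$/p'
foo
foo
@end example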
	2834
	2835
	2836	@node ERE syntax
	2837	@section Overview of extended regular expression syntax
	2838	@cindex Extended regular expressions, syntax
	2839
The only difference between basic and extended regular expressions is in
the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
braces (@samp{@{@}}), and @samp{|}. While basic regular expressions
require these to be escaped if you want them to behave as special
characters, when using extended regular expressions you must escape
them if you want them @emph{to match a literal character}. @samp{|}
is special here because @samp{\|} is a GNU extension; standard
basic regular expressions do not provide its functionality.
	2848
	2849	@noindent
	2850	Examples:
	2851	@table @code
	2852	@item abc?
	2853	becomes @samp{abc\?} when using extended regular expressions. It matches
	2854	the literal string @samp{abc?}.
	2855
	2856	@item c\+
	2857	becomes @samp{c+} when using extended regular expressions. It matches
	2858	one or more @samp{c}s.
	2859
	2860	@item a\@{3,\@}
	2861	becomes @samp{a@{3,@}} when using extended regular expressions. It matches
	2862	three or more @samp{a}s.
	2863
@item \(abc\)\@{2,3\@}
	2865	becomes @samp{(abc)@{2,3@}} when using extended regular expressions. It
	2866	matches either @samp{abcabc} or @samp{abcabcabc}.
	2867
@item \(abc*\)\1
	2869	becomes @samp{(abc*)\1} when using extended regular expressions.
	2870	Backreferences must still be escaped when using extended regular
	2871	expressions.
	2872
@item a\|b
becomes @samp{a|b} when using extended regular expressions. It matches
	2875	@samp{a} or @samp{b}.
	2876	@end table
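
A sketch with GNU @command{sed} that runs one of the examples above in
both syntaxes; the last command shows that @samp{\+} stands for an
ordinary @samp{+} in extended mode:

@example
$ echo abcabc | sed 's/\(abc\)\@{2,3\@}/X/'
X

$ echo abcabc | sed -E 's/(abc)@{2,3@}/X/'
X

$ echo 'a+b aab' | sed -E 's/a\+b/X/'
X aab
@end example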
	2877
	2878	@node Character Classes and Bracket Expressions
	2879	@section Character Classes and Bracket Expressions
	2880
	2881	@c The 'character class' section is shamelessly copied from grep's manual.
	2882
	2883	@cindex bracket expression
	2884	@cindex character class
	2885	A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
	2886	@samp{]}.
	2887	It matches any single character in that list;
	2888	if the first character of the list is the caret @samp{^},
	2889	then it matches any character @strong{not} in the list.
	2890	For example, the following command replaces the strings
	2891	@samp{gray} or @samp{grey} with @samp{blue}:
	2892
	2893	@example
	2894	sed 's/gr[ae]y/blue/'
	2895	@end example
	2896
	2897	@c TODO: fix 'ref' to look good in both HTML and PDF
	2898	Bracket expressions can be used in both
	2899	@ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
	2900	regular expressions (that is, with or without the @option{-E}/@option{-r}
	2901	options).
	2902
	2903	@cindex range expression
	2904	Within a bracket expression, a @dfn{range expression} consists of two
	2905	characters separated by a hyphen.
	2906	It matches any single character that
	2907	sorts between the two characters, inclusive.
	2908	In the default C locale, the sorting sequence is the native character
	2909	order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
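
For instance, with @env{LC_ALL} set to @samp{C} to get the native
character order, the range @samp{[a-d]} covers the first four letters
only:

@example
$ echo abcde | LC_ALL=C sed 's/[a-d]/X/g'
XXXXe
@end example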
	2910
	2911
	2912	Finally, certain named classes of characters are predefined within
	2913	bracket expressions, as follows.
	2914
	2915	These named classes must be used @emph{inside} brackets
	2916	themselves. Correct usage:
	2917	@example
$ echo 1 | sed 's/[[:digit:]]/X/'
	2919	X
	2920	@end example
	2921
	2922	Incorrect usage is rejected by newer @command{sed} versions.
	2923	Older versions accepted it but treated it as a single bracket expression
	2924	(which is equivalent to @samp{[dgit:]},
	2925	that is, only the characters @var{d/g/i/t/:}):
	2926	@example
	2927	# current GNU sed versions - incorrect usage rejected
$ echo 1 | sed 's/[:digit:]/X/'
sed: character class syntax is [[:space:]], not [:space:]

# older GNU sed versions
$ echo 1 | sed 's/[:digit:]/X/'
	2933	1
	2934	@end example
	2935
	2936
	2937	@cindex classes of characters
	2938	@cindex character classes
	2939	@cindex named character classes
	2940	@table @samp
	2941
	2942	@item [:alnum:]
	2943	@opindex alnum @r{character class}
	2944	@cindex alphanumeric characters
	2945	Alphanumeric characters:
	2946	@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
	2947	character encoding, this is the same as @samp{[0-9A-Za-z]}.
	2948
	2949	@item [:alpha:]
	2950	@opindex alpha @r{character class}
	2951	@cindex alphabetic characters
	2952	Alphabetic characters:
	2953	@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
	2954	character encoding, this is the same as @samp{[A-Za-z]}.
	2955
	2956	@item [:blank:]
	2957	@opindex blank @r{character class}
	2958	@cindex blank characters
	2959	Blank characters:
	2960	space and tab.
	2961
	2962	@item [:cntrl:]
	2963	@opindex cntrl @r{character class}
	2964	@cindex control characters
	2965	Control characters.
	2966	In ASCII, these characters have octal codes 000
	2967	through 037, and 177 (DEL).
	2968	In other character sets, these are
	2969	the equivalent characters, if any.
	2970
	2971	@item [:digit:]
	2972	@opindex digit @r{character class}
	2973	@cindex digit characters
	2974	@cindex numeric characters
	2975	Digits: @code{0 1 2 3 4 5 6 7 8 9}.
	2976
	2977	@item [:graph:]
	2978	@opindex graph @r{character class}
	2979	@cindex graphic characters
	2980	Graphical characters:
	2981	@samp{[:alnum:]} and @samp{[:punct:]}.
	2982
	2983	@item [:lower:]
	2984	@opindex lower @r{character class}
	2985	@cindex lower-case letters
	2986	Lower-case letters; in the @samp{C} locale and ASCII character
	2987	encoding, this is
	2988	@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.
	2989
	2990	@item [:print:]
	2991	@opindex print @r{character class}
	2992	@cindex printable characters
	2993	Printable characters:
	2994	@samp{[:alnum:]}, @samp{[:punct:]}, and space.
	2995
	2996	@item [:punct:]
	2997	@opindex punct @r{character class}
	2998	@cindex punctuation characters
	2999	Punctuation characters; in the @samp{C} locale and ASCII character
	3000	encoding, this is
@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.
	3002
	3003	@item [:space:]
	3004	@opindex space @r{character class}
	3005	@cindex space characters
	3006	@cindex whitespace characters
	3007	Space characters: in the @samp{C} locale, this is
	3008	tab, newline, vertical tab, form feed, carriage return, and space.
	3009
	3010
	3011	@item [:upper:]
	3012	@opindex upper @r{character class}
	3013	@cindex upper-case letters
	3014	Upper-case letters: in the @samp{C} locale and ASCII character
	3015	encoding, this is
	3016	@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.
	3017
	3018	@item [:xdigit:]
	3019	@opindex xdigit @r{character class}
	3020	@cindex xdigit class
	3021	@cindex hexadecimal digits
	3022	Hexadecimal digits:
	3023	@code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.
	3024
	3025	@end table
	3026	Note that the brackets in these class names are
	3027	part of the symbolic names, and must be included in addition to
	3028	the brackets delimiting the bracket expression.
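
Since a class name is just one element of the list, several classes can
share one pair of outer brackets, optionally mixed with ordinary
characters. In this sketch letters and digits are replaced, while the
space and @samp{!} are left alone:

@example
$ echo 'a1 b2!' | sed 's/[[:alpha:][:digit:]]/X/g'
XX XX!
@end example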
	3029
	3030	Most meta-characters lose their special meaning inside bracket expressions:
	3031
	3032	@table @samp
	3033	@item ]
	3034	ends the bracket expression if it's not the first list item.
	3035	So, if you want to make the @samp{]} character a list item,
	3036	you must put it first.
	3037
	3038	@item -
	3039	represents the range if it's not first or last in a list or the ending point
	3040	of a range.
	3041
	3042	@item ^
	3043	represents the characters not in the list.
	3044	If you want to make the @samp{^}
	3045	character a list item, place it anywhere but first.
	3046	@end table
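
Combining these rules, the bracket expression below lists @samp{]}
first, @samp{^} in a non-initial position and @samp{-} last, so all
three are treated as ordinary characters:

@example
$ echo 'a]b-c^d' | sed 's/[]^-]/X/g'
aXbXcXd
@end example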
	3047
@c TODO: incorporate this paragraph (copied verbatim from BRE section).
	3049
	3050	@cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
	3051	The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
	3052	are normally not special within @var{list}. For example, @code{[\*]}
	3053	matches either @samp{\} or @samp{*}, because the @code{\} is not
	3054	special here. However, strings like @code{[.ch.]}, @code{[=a=]}, and
	3055	@code{[:space:]} are special within @var{list} and represent collating
	3056	symbols, equivalence classes, and character classes, respectively, and
	3057	@code{[} is therefore special within @var{list} when it is followed by
	3058	@code{.}, @code{=}, or @code{:}. Also, when not in
	3059	@env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
	3060	@code{\t} are recognized within @var{list}. @xref{Escapes}.
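
A sketch of the paragraph above with GNU @command{sed} (not in
@env{POSIXLY_CORRECT} mode): @code{[\*]} matches both a backslash and an
asterisk, and @code{\t} inside a bracket expression stands for a tab
character:

@example
$ printf 'a\\b*c\n' | sed 's/[\*]/X/g'
aXbXc

$ printf 'a\tb\n' | sed 's/[\t]/X/'
aXb
@end example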
	3061	@c ********
	3062
	3063
	3064	@c TODO: improve explanation about collation classes and equivalence classes
	3065	@c perhaps dedicate a section to Locales ??
	3066
	3067	@table @samp
	3068	@item [.
	3069	represents the open collating symbol.
	3070
	3071	@item .]
	3072	represents the close collating symbol.
	3073
	3074	@item [=
	3075	represents the open equivalence class.
	3076
	3077	@item =]
	3078	represents the close equivalence class.
	3079
	3080	@item [:
	3081	represents the open character class symbol, and should be followed by a
	3082	valid character class name.
	3083
	3084	@item :]
	3085	represents the close character class symbol.
	3086	@end table
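
In the @code{C} locale every equivalence class and collating symbol
contains exactly one character, which makes the syntax easy to try out
(what they match in other locales depends on the locale and on the C
library). A sketch:

@example
$ echo banana | LC_ALL=C sed 's/[[=a=]]/X/g'
bXnXnX

$ echo 'a-b' | LC_ALL=C sed 's/[[.-.]]/X/'
aXb
@end example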
	3087
	3088
	3089	@node regexp extensions
	3090	@section regular expression extensions
	3091
	3092	The following sequences have special meaning inside regular expressions
	3093	(used in @ref{Regexp Addresses,,addresses} and the @code{s} command).
	3094
	3095	These can be used in both
	3096	@ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
	3097	regular expressions (that is, with or without the @option{-E}/@option{-r}
	3098	options).
	3099
	3100	@table @code
	3101	@item \w
	3102	Matches any ``word'' character. A ``word'' character is any
	3103	letter or digit or the underscore character.
	3104
	3105	@example
$ echo "abc %-= def." | sed 's/\w/X/g'
	3107	XXX %-= XXX.
	3108	@end example
	3109
	3110
	3111	@item \W
	3112	Matches any ``non-word'' character.
	3113
	3114	@example
$ echo "abc %-= def." | sed 's/\W/X/g'
	3116	abcXXXXXdefX
	3117	@end example
	3118
	3119
	3120	@item \b
Matches a word boundary; that is, it matches if the character
to the left is a ``word'' character and the character to the
right is a ``non-word'' character, or vice versa.
	3124
	3125	@example
$ echo "abc %-= def." | sed 's/\b/X/g'
	3127	XabcX %-= XdefX.
	3128	@end example
	3129
	3130
	3131	@item \B
Matches everywhere but on a word boundary; that is, it matches
if the character to the left and the character to the right
are either both ``word'' characters or both ``non-word''
characters.
	3136
	3137	@example
$ echo "abc %-= def." | sed 's/\B/X/g'
	3139	aXbXc X%X-X=X dXeXf.X
	3140	@end example
	3141
	3142
	3143	@item \s
	3144	Matches whitespace characters (spaces and tabs).
	3145	Newlines embedded in the pattern/hold spaces will also match:
	3146
	3147	@example
$ echo "abc %-= def." | sed 's/\s/X/g'
	3149	abcX%-=Xdef.
	3150	@end example
	3151
	3152
	3153	@item \S
	3154	Matches non-whitespace characters.
	3155
	3156	@example
$ echo "abc %-= def." | sed 's/\S/X/g'
	3158	XXX XXX XXXX
	3159	@end example
	3160
	3161
	3162	@item \<
	3163	Matches the beginning of a word.
	3164
	3165	@example
$ echo "abc %-= def." | sed 's/\</X/g'
	3167	Xabc %-= Xdef.
	3168	@end example
	3169
	3170
	3171	@item \>
	3172	Matches the end of a word.
	3173
	3174	@example
$ echo "abc %-= def." | sed 's/\>/X/g'
	3176	abcX %-= defX.
	3177	@end example
	3178
	3179
	3180	@item \`
	3181	Matches only at the start of pattern space. This is different
	3182	from @code{^} in multi-line mode.
	3183
	3184	Compare the following two examples:
	3185
	3186	@example
$ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
	3188	Xa
	3189	Xb
	3190	Xc
	3191
$ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
	3193	Xa
	3194	b
	3195	c
	3196	@end example
	3197
	3198	@item \'
	3199	Matches only at the end of pattern space. This is different
	3200	from @code{$} in multi-line mode.
	3201
	3202
	3203
	3204	@end table
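
The @code{\'} anchor can be contrasted with @code{$} in multi-line mode,
just like @code{\`} and @code{^} above. Here the second script is written
in double quotes so that it can contain a single quote; the shell turns
@samp{\\'} into @samp{\'} before @command{sed} sees it:

@example
$ printf 'a\nb\nc\n' | sed 'N;N;s/$/X/gm'
aX
bX
cX

$ printf 'a\nb\nc\n' | sed "N;N;s/\\'/X/gm"
a
b
cX
@end example

A common use of @code{\b} is to replace whole words only; below, the
@samp{cat} inside @samp{concat} is left alone:

@example
$ echo 'cat concat cat.' | sed 's/\bcat\b/dog/g'
dog concat dog.
@end example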
	3205
	3206
	3207	@node Back-references and Subexpressions
	3208	@section Back-references and Subexpressions
	3209	@cindex subexpression
	3210	@cindex back-reference
	3211
	3212	@dfn{back-references} are regular expression commands which refer to a
	3213	previous part of the matched regular expression. Back-references are
	3214	specified with backslash and a single digit (e.g. @samp{\1}). The
	3215	part of the regular expression they refer to is called a
	3216	@dfn{subexpression}, and is designated with parentheses.
	3217
	3218	Back-references and subexpressions are used in two cases: in the
	3219	regular expression search pattern, and in the @var{replacement} part
	3220	of the @command{s} command (@pxref{Regexp Addresses,,Regular
	3221	Expression Addresses} and @ref{The "s" Command}).
	3222
	3223	In a regular expression pattern, back-references are used to match
	3224	the same content as a previously matched subexpression. In the
	3225	following example, the subexpression is @samp{.} - any single
	3226	character (being surrounded by parentheses makes it a
	3227	subexpression). The back-reference @samp{\1} asks to match the same
	3228	content (same character) as the sub-expression.
	3229
The command below matches three-letter words that start with any character,
followed by the letter @samp{o}, followed by the same character as the
first.
	3233
	3234	@example
	3235	$ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
	3236	bob
	3237	mom
	3238	non
	3239	pop
	3240	sos
	3241	tot
	3242	wow
	3243	@end example
	3244
	3245	Multiple subexpressions are automatically numbered from
	3246	left-to-right. This command searches for 6-letter
	3247	palindromes (the first three letters are 3 subexpressions,
	3248	followed by 3 back-references in reverse order):
	3249
	3250	@example
	3251	$ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
	3252	redder
	3253	@end example
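
If @file{/usr/share/dict/words} is not available, the same pattern can be
tried on a small hand-written input; @samp{rudder} and the five-letter
@samp{level} are not printed:

@example
$ printf 'redder\nrudder\nlevel\n' | sed -E -n '/^(.)(.)(.)\3\2\1$/p'
redder
@end example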
	3254
	3255	In the @command{s} command, back-references can be
	3256	used in the @var{replacement} part to refer back to subexpressions in
	3257	the @var{regexp} part.
	3258
	3259	The following example uses two subexpressions in the regular
	3260	expression to match two space-separated words. The back-references in
the @var{replacement} part print the words in a different order:
	3262
	3263	@example
$ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
	3265	The name is Bond, James Bond.
	3266	@end example
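
The same substitution written as a basic regular expression (without
@option{-E}) needs escaped parentheses, while the back-references in the
@var{replacement} part are unchanged:

@example
$ echo "James Bond" | sed 's/\(.*\) \(.*\)/The name is \2, \1 \2./'
The name is Bond, James Bond.
@end example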
	3267
	3268
	3269	When used with alternation, if the group does not participate in the
	3270	match then the back-reference makes the whole match fail. For
example, @samp{a(.)|b\1} will not match @samp{ba}. When multiple
	3272	regular expressions are given with @option{-e} or from a file
	3273	(@samp{-f @var{file}}), back-references are local to each expression.
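
A sketch of the alternation rule with GNU @command{sed}: @samp{ba} is
printed unchanged because neither alternative matches, whereas
@samp{xay} matches the first alternative:

@example
$ echo ba | sed -E 's/a(.)|b\1/X/'
ba

$ echo xay | sed -E 's/a(.)|b\1/X/'
xX
@end example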
	3274
	3275
[599]	3276	@node Escapes
[3613]	3277	@section Escape Sequences - specifying special characters
[599]	3278
[3613]	3279	@cindex GNU extensions, special escapes
[599]	3280	Until this chapter, we have only encountered escapes of the form
	3281	@samp{\^}, which tell @command{sed} not to interpret the circumflex
	3282	as a special character, but rather to take it literally. For
	3283	example, @samp{\*} matches a single asterisk rather than zero
	3284	or more backslashes.
	3285
	3286	@cindex @code{POSIXLY_CORRECT} behavior, escapes
	3287	This chapter introduces another kind of escape@footnote{All
[3613]	3288	the escapes introduced here are GNU
[599]	3289	extensions, with the exception of @code{\n}. In basic regular
	3290	expression mode, setting @code{POSIXLY_CORRECT} disables them inside
	3291	bracket expressions.}---that
	3292	is, escapes that are applied to a character or sequence of characters
	3293	that ordinarily are taken literally, and that @command{sed} replaces
	3294	with a special character. This provides a way
	3295	of encoding non-printable characters in patterns in a visible manner.
There is no restriction on the appearance of non-printing characters
in a @command{sed} script, but when a script is being prepared in the
shell or by text editing, it is usually easier to use one of
the following escape sequences than the binary character it
represents.

The list of these escapes is:
	3303
	3304	@table @code
	3305	@item \a
Produces or matches a @sc{bel} character, that is, an ``alert'' (@sc{ascii} 7).
	3307
	3308	@item \f
	3309	Produces or matches a form feed (@sc{ascii} 12).
	3310
	3311	@item \n
	3312	Produces or matches a newline (@sc{ascii} 10).
	3313
	3314	@item \r
	3315	Produces or matches a carriage return (@sc{ascii} 13).
	3316
	3317	@item \t
	3318	Produces or matches a horizontal tab (@sc{ascii} 9).
	3319
	3320	@item \v
Produces or matches a so-called ``vertical tab'' (@sc{ascii} 11).
	3322
	3323	@item \c@var{x}
	3324	Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is
	3325	any character. The precise effect of @samp{\c@var{x}} is as follows:
	3326	if @var{x} is a lower case letter, it is converted to upper case.
	3327	Then bit 6 of the character (hex 40) is inverted. Thus @samp{\cz} becomes
	3328	hex 1A, but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B.
	3329
	3330	@item \d@var{xxx}
	3331	Produces or matches a character whose decimal @sc{ascii} value is @var{xxx}.
	3332
	3333	@item \o@var{xxx}
	3334	Produces or matches a character whose octal @sc{ascii} value is @var{xxx}.
	3335
	3336	@item \x@var{xx}
	3337	Produces or matches a character whose hexadecimal @sc{ascii} value is @var{xx}.
	3338	@end table
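
A few of these escapes used in the @var{replacement} part of the
@code{s} command (GNU @command{sed}). The numeric forms all produce
@samp{A} (65 decimal, 101 octal, 41 hexadecimal), and @samp{\cz}
produces hex 1A as described above; @command{od} is used only to make
that control character visible:

@example
$ printf 'a\tb\n' | sed 's/\t/,/'
a,b

$ echo 'x,y' | sed 's/,/\d065\o101\x41/'
xAAAy

$ echo x | sed 's/x/\cz/' | od -An -tx1
 1a 0a
@end example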
	3339
	3340	@samp{\b} (backspace) was omitted because of the conflict with
	3341	the existing ``word boundary'' meaning.
	3342
[3613]	3343	@subsection Escaping Precedence
[599]	3344
@value{SSED} processes escape sequences @emph{before} passing
the text on to the regular-expression matching of the @command{s///} command
and to address matching. Thus the following two commands are equivalent
	3348	(@samp{0x5e} is the hexadecimal @sc{ascii} value of the character @samp{^}):
	3349
	3350	@codequoteundirected on
	3351	@codequotebacktick on
	3352	@example
	3353	@group
$ echo 'a^c' | sed 's/^/b/'
ba^c

$ echo 'a^c' | sed 's/\x5e/b/'
	3358	ba^c
	3359	@end group
	3360	@end example
	3361	@codequoteundirected off
	3362	@codequotebacktick off
	3363
As are the following (@samp{0x5b} and @samp{0x5d} are the hexadecimal
@sc{ascii} values of @samp{[} and @samp{]}, respectively):
	3366
	3367	@codequoteundirected on
	3368	@codequotebacktick on
	3369	@example
	3370	@group
$ echo abc | sed 's/[a]/x/'
xbc
$ echo abc | sed 's/\x5ba\x5d/x/'
xbc
	3375	@end group
	3376	@end example
	3377	@codequoteundirected off
	3378	@codequotebacktick off
	3379
However, it is recommended to avoid producing such special characters
with escapes, because of unexpected edge cases. For example, the following
are not equivalent:
	3383
	3384	@codequoteundirected on
	3385	@codequotebacktick on
	3386	@example
	3387	@group
$ echo 'a^c' | sed 's/\^/b/'
abc

$ echo 'a^c' | sed 's/\\\x5e/b/'
	3392	a^c
	3393	@end group
	3394	@end example
	3395	@codequoteundirected off
	3396	@codequotebacktick off
	3397
	3398	@c also: this fails in different places:
	3399	@c $ sed 's/[//'
	3400	@c sed: -e expression #1, char 5: unterminated `s' command
	3401	@c $ sed 's/\x5b//'
	3402	@c sed: -e expression #1, char 8: Invalid regular expression
	3403	@c
	3404	@c which is OK but confusing to explain why (the first
	3405	@c fails in compile.c:snarf_char_class while the second
	3406	@c is passed to the regex engine and then fails).
	3407
	3408
	3409	@node Locale Considerations
	3410	@section Multibyte characters and Locale Considerations
	3411
@value{SSED} processes valid multibyte characters in multibyte locales
(e.g. @code{UTF-8}).@footnote{Some regexp edge cases depend on the
operating system and libc implementation. The examples shown are known
to work as expected on GNU/Linux systems using glibc.}
	3416
	3417	@noindent The following example uses the Greek letter Capital Sigma
	3418	(@value{ucsigma},
	3419	Unicode code point @code{0x03A3}). In a @code{UTF-8} locale,
	3420	@command{sed} correctly processes the Sigma as one character despite
	3421	it being 2 octets (bytes):
	3422
	3423	@codequoteundirected on
	3424	@codequotebacktick on
	3425	@example
	3426	@group
$ locale | grep LANG
LANG=en_US.UTF-8

$ printf 'a\u03A3b'
a@value{ucsigma}b

$ printf 'a\u03A3b' | sed 's/./X/g'
XXX

$ printf 'a\u03A3b' | od -tx1 -An
 61 ce a3 62
	3438	@end group
	3439	@end example
	3440	@codequoteundirected off
	3441	@codequotebacktick off
	3442
	3443	@noindent
	3444	To force @command{sed} to process octets separately, use the @code{C} locale
	3445	(also known as the @code{POSIX} locale):
	3446
	3447	@codequoteundirected on
	3448	@codequotebacktick on
	3449	@example
$ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g'
	3451	XXXX
	3452	@end example
	3453	@codequoteundirected off
	3454	@codequotebacktick off
	3455
	3456	@subsection Invalid multibyte characters
	3457
	3458	@command{sed}'s regular expressions @emph{do not} match
	3459	invalid multibyte sequences in a multibyte locale.
	3460
	3461	@noindent
In the following examples, the byte @code{0xCE} is
an incomplete multibyte character (shown here as @value{unicodeFFFD}).
The regular expression @samp{.} does not match it:
	3465
	3466	@codequoteundirected on
	3467	@codequotebacktick on
	3468	@example
	3469	@group
$ printf 'a\xCEb\n'
a@value{unicodeFFFD}b

$ printf 'a\xCEb\n' | sed 's/./X/g'
X@value{unicodeFFFD}X

$ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An
  58  ce  58  0a
   X 316   X  \n
	3479	@end group
	3480	@end example
	3481	@codequoteundirected off
	3482	@codequotebacktick off
	3483
@noindent Similarly, the ``catch-all'' regular expression @samp{.*} does not
	3485	match the entire line:
	3486
	3487	@codequoteundirected on
	3488	@codequotebacktick on
	3489	@example
	3490	@group
$ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An
  ce  63  0a
 316   c  \n
	3494	@end group
	3495	@end example
	3496	@codequoteundirected off
	3497	@codequotebacktick off
	3498
	3499	@noindent
	3500	@value{SSED} offers the special @command{z} command to clear the
	3501	current pattern space regardless of invalid multibyte characters
	3502	(i.e. it works like @code{s/.*//} but also removes invalid multibyte
	3503	characters):
	3504
	3505	@codequoteundirected on
	3506	@codequotebacktick on
	3507	@example
	3508	@group
$ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An
  0a
  \n
	3512	@end group
	3513	@end example
	3514	@codequoteundirected off
	3515	@codequotebacktick off
	3516
	3517	@noindent Alternatively, force the @code{C} locale to process
	3518	each octet separately (every octet is a valid character in the @code{C}
	3519	locale):
	3520
	3521	@codequoteundirected on
	3522	@codequotebacktick on
	3523	@example
	3524	@group
$ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An
  0a
  \n
	3528	@end group
	3529	@end example
	3530	@codequoteundirected off
	3531	@codequotebacktick off
	3532
	3533
	3534	@command{sed}'s inability to process invalid multibyte characters
	3535	can be used to detect such invalid sequences in a file.
In the following examples, @code{\xCE\xCE} is an invalid
multibyte sequence, while @code{\xCE\xA3} is a valid multibyte sequence
(of the Greek Sigma character).
	3539
	3540	@noindent
	3541	The following @command{sed} program removes all valid
characters using @code{s/.//g}. Any content left in the pattern space
(the invalid characters) is added to the hold space using the
	3544	@code{H} command. On the last line (@code{$}), the hold space is retrieved
	3545	(@code{x}), newlines are removed (@code{s/\n//g}), and any remaining
	3546	octets are printed unambiguously (@code{l}). Thus, any invalid
	3547	multibyte sequences are printed as octal values:
	3548
	3549	@codequoteundirected on
	3550	@codequotebacktick on
	3551	@example
	3552	@group
	3553	$ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt
	3554
	3555	$ cat invalid.txt
	3556	ab
	3557	c
	3558	@value{unicodeFFFD}@value{unicodeFFFD}de
	3559	@value{ucsigma}f
	3560
	3561	$ sed -n 's/.//g ; H ; $@{x;s/\n//g;l@}' invalid.txt
	3562	\316\316$
	3563	@end group
	3564	@end example
	3565	@codequoteundirected off
	3566	@codequotebacktick off
	3567
@noindent With a few more commands, @command{sed} can print
the exact line number of each line containing invalid characters (line 3 here).
	3570	These characters can then be removed by forcing the @code{C} locale
	3571	and using octal escape sequences:
	3572
	3573	@codequoteundirected on
	3574	@codequotebacktick on
	3575	@example
$ sed -n 's/.//g;=;l' invalid.txt | paste - - | awk '$2!="$"'
	3577	3 \316\316$
	3578
	3579	$ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt
	3580	@end example
	3581	@codequoteundirected off
	3582	@codequotebacktick off
	3583
	3584	@subsection Upper/Lower case conversion
	3585
	3586
	3587	@value{SSED}'s substitute command (@code{s}) supports upper/lower
case conversions using the @code{\U} and @code{\L} codes.
	3589	These conversions support multibyte characters:
	3590
	3591	@codequoteundirected on
	3592	@codequotebacktick on
	3593	@example
	3594	$ printf 'ABC\u03a3\n'
	3595	ABC@value{ucsigma}
	3596
$ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
	3598	abc@value{lcsigma}
	3599	@end example
	3600	@codequoteundirected off
	3601	@codequotebacktick off
	3602
	3603	@noindent
	3604	@xref{The "s" Command}.
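
GNU @command{sed} also accepts @code{\u} and @code{\l}, which convert
only the next character, and @code{\E}, which stops a conversion started
by @code{\U} or @code{\L}. Two ASCII sketches:

@codequoteundirected on
@codequotebacktick on
@example
$ echo 'hello world' | sed 's/\w\+/\u&/g'
Hello World

$ echo 'foo bar' | sed -E 's/(foo) (bar)/\U\1\E \2/'
FOO bar
@end example
@codequoteundirected off
@codequotebacktick off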
	3605
	3606
	3607	@subsection Multibyte regexp character classes
	3608
	3609	@c TODO: fix following paragraphs (copied verbatim from 'bracket
	3610	@c expression' section).
	3611
In locales other than @samp{C}, the sorting sequence of range expressions is not specified, and
	3613	@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
	3614	@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
	3615	characters that it matches might even be erratic.
	3616	To obtain the traditional interpretation
	3617	of bracket expressions, you can use the @samp{C} locale by setting the
	3618	@env{LC_ALL} environment variable to the value @samp{C}.
	3619
	3620	@example
	3621	# TODO: is there any real-world system/locale where 'A'
	3622	# is replaced by '-' ?
$ echo A | sed 's/[a-z]/-/'
	3624	A
	3625	@end example
	3626
The interpretation of named character classes depends on the @env{LC_CTYPE} locale;
for example, @samp{[[:alnum:]]} means the character class of numbers and letters
in the current locale.
	3630
@c TODO: show example of collation
	3632
	3633	@codequoteundirected on
	3634	@codequotebacktick on
	3635	@example
	3636	# TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
$ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
	3638	clichX
	3639	@end example
	3640	@codequoteundirected off
	3641	@codequotebacktick off
	3642
	3643
	3644	@node advanced sed
	3645	@chapter Advanced @command{sed}: cycles and buffers
	3646
	3647	@menu
	3648	* Execution Cycle:: How @command{sed} works
	3649	* Hold and Pattern Buffers::
	3650	* Multiline techniques:: Using D,G,H,N,P to process multiple lines
	3651	* Branching and flow control::
	3652	@end menu
	3653
	3654	@node Execution Cycle
	3655	@section How @command{sed} Works
	3656
	3657	@cindex Buffer spaces, pattern and hold
	3658	@cindex Spaces, pattern and hold
	3659	@cindex Pattern space, definition
	3660	@cindex Hold space, definition
	3661	@command{sed} maintains two data buffers: the active @emph{pattern} space,
	3662	and the auxiliary @emph{hold} space. Both are initially empty.
	3663
	3664	@command{sed} operates by performing the following cycle on each
	3665	line of input: first, @command{sed} reads one line from the input
	3666	stream, removes any trailing newline, and places it in the pattern space.
	3667	Then commands are executed; each command can have an address associated
	3668	to it: addresses are a kind of condition code, and a command is only
	3669	executed if the condition is verified before the command is to be
	3670	executed.
	3671
	3672	When the end of the script is reached, unless the @option{-n} option
	3673	is in use, the contents of pattern space are printed out to the output
	3674	stream, adding back the trailing newline if it was removed.@footnote{Actually,
	3675	if @command{sed} prints a line without the terminating newline, it will
	3676	nevertheless print the missing newline as soon as more text is sent to
	3677	the same output stream, which gives the ``least expected surprise''
	3678	even though it does not make commands like @samp{sed -n p} exactly
	3679	identical to @command{cat}.} Then the next cycle starts for the next
	3680	input line.
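
The cycle can be observed with a few one-liners. This is a sketch that
assumes GNU @command{sed}. The @command{seq}, @command{printf} and
@command{wc} commands are used only to produce and measure the data:

@codequoteundirected on
@codequotebacktick on
@example
# Every cycle ends by printing the pattern space:
seq 3 | sed 's/$/!/'
# prints 1!, 2! and 3!, one per line

# With -n, only explicit commands such as p print anything:
seq 3 | sed -n 2p
# prints 2

# The input has no final newline. p prints "a" without one; the
# automatic print then emits the missing newline before the second "a":
printf 'a' | sed p | wc -c
# counts 3 bytes: a, newline, a
@end example
@codequoteundirected off
@codequotebacktick off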

Unless special commands (like @samp{D}) are used, the pattern space is
deleted between two cycles. The hold space, on the other hand, keeps
its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
@samp{g}, @samp{G} to move data between both buffers).
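
Here is a small illustration of the hold space persisting between cycles.
The @code{x} command swaps the two buffers, so each line is printed one
cycle late, and the first output line is the initially empty hold space.
The second command accumulates lines with @code{H} and prints them only
on the last line:

@codequoteundirected on
@codequotebacktick on
@example
# x: print the previous line, keep the current one for the next cycle
seq 3 | sed x
# prints an empty line, then 1, then 2 (3 stays in the hold space)

# H: append every line to the hold space; on the last line, x brings
# the accumulated text back into the pattern space and p prints it
seq 3 | sed -n 'H;$!d;x;p'
# prints an empty line, then 1, 2 and 3
@end example
@codequoteundirected off
@codequotebacktick off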

@node Hold and Pattern Buffers
@section Hold and Pattern Buffers

TODO

@node Multiline techniques
@section Multiline techniques - using D,G,H,N,P to process multiple lines

Multiple lines can be processed as one buffer using the commands
@code{D}, @code{G}, @code{H}, @code{N} and @code{P}. They are similar to
their lowercase counterparts (@code{d}, @code{g},
@code{h}, @code{n}, @code{p}), except that these commands append or
remove data while respecting embedded newlines. This makes it possible
to add and remove lines in the pattern and hold spaces.

They operate as follows:
@table @code
@item D
@emph{deletes} text in the pattern space up to and including the first
newline, and restarts the cycle with the remaining text, without reading
a new line of input. If the pattern space contains no newline, @code{D}
acts like @code{d}.

@item G
@emph{appends} a newline followed by the contents of the hold space
to the pattern space.

@item H
@emph{appends} a newline followed by the contents of the pattern space
to the hold space.

@item N
@emph{appends} a newline followed by the next line of input
to the pattern space.

@item P
@emph{prints} the pattern space up to the first newline.

@end table
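
Two short sketches of these commands follow, using @command{seq} to
generate input. With an empty hold space, @code{G} appends just a newline,
which double-spaces the input. @code{N} followed by @code{P} prints only
the first line of each pair:

@codequoteundirected on
@codequotebacktick on
@example
# G appends a newline plus the (empty) hold space to every line:
seq 3 | sed G
# prints 1, an empty line, 2, an empty line, 3, an empty line

# N joins two lines; P prints the pattern space up to the first newline:
seq 4 | sed -n 'N;P'
# prints 1 and 3
@end example
@codequoteundirected off
@codequotebacktick off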


The following example illustrates the operation of the @code{N} and
@code{D} commands:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ seq 6 | sed -n 'N;l;D'
1\n2$
2\n3$
3\n4$
4\n5$
5\n6$
@end group
@end example
@codequoteundirected off
@codequotebacktick off

@enumerate
@item
@command{sed} starts by reading the first line into the pattern space
(i.e. @samp{1}).
@item
At the beginning of every cycle, the @code{N}
command appends a newline and the next line to the pattern space
(i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle).
@item
The @code{l} command prints the content of the pattern space
unambiguously, showing the embedded newline as @samp{\n} and marking
the end of the pattern space with @samp{$}.
@item
The @code{D} command then removes the content of the pattern
space up to and including the first newline (leaving @samp{2} at the end
of the first cycle), and starts the next cycle without reading new input.
@item
At the next cycle the @code{N} command appends a
newline and the next input line to the pattern space
(e.g. @samp{2}, @samp{\n}, @samp{3}).
@end enumerate
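
The same @code{N}/@code{P}/@code{D} window is often used to act on a line
depending on the line that follows it. As a sketch (the word @samp{MARK}
is an arbitrary example), the following prints every line except the one
immediately preceding a line equal to @samp{MARK}. Writing @code{$!N}
keeps @code{N} from running on the last line:

@codequoteundirected on
@codequotebacktick on
@example
# Keep a two-line window: print its first line unless the second
# line is MARK, then drop the first line and restart with the second.
printf '%s\n' a b MARK c | sed '$!N;/\nMARK$/!P;D'
# prints a, MARK and c (the b before MARK is dropped)
@end example
@codequoteundirected off
@codequotebacktick off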


@cindex processing paragraphs
@cindex paragraphs, processing
A common technique to process blocks of text such as paragraphs
(instead of line-by-line) is to use the following construct:

@codequoteundirected on
@codequotebacktick on
@example
sed '/./@{H;$!d@} ; x ; s/REGEXP/REPLACEMENT/'
@end example
@codequoteundirected off
@codequotebacktick off

@enumerate
@item
The first expression, @code{/./@{H;$!d@}}, operates on all non-empty lines
and appends the current line (in the pattern space) to the hold space.
On all lines except the last, the pattern space is deleted and the cycle is
restarted.

@item
The other expressions, @code{x} and @code{s}, are executed only on empty
lines (i.e. paragraph separators) and on the last input line. The @code{x}
command fetches the accumulated lines from the hold space back to the
pattern space, and leaves the current line in the hold space, ready for
the next paragraph. The @code{s///} command then operates on all the text
in the paragraph (including the embedded newlines).
@end enumerate

The following example demonstrates this technique:
@codequoteundirected on
@codequotebacktick on
@example
@group
$ cat input.txt
a a a aa aaa
aaaa aaaa aa
aaaa aaa aaa

bbbb bbb bbb
bb bb bbb bb
bbbbbbbb bbb

ccc ccc cccc
cccc ccccc c
cc cc cc cc

$ sed '/./@{H;$!d@} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt

START-->
a a a aa aaa
aaaa aaaa aa
aaaa aaa aaa
<--END

START-->
bbbb bbb bbb
bb bb bbb bb
bbbbbbbb bbb
<--END

START-->
ccc ccc cccc
cccc ccccc c
cc cc cc cc
<--END
@end group
@end example
@codequoteundirected off
@codequotebacktick off

For more annotated examples, see @ref{Text search across multiple lines}
and @ref{Line length adjustment}.

@node Branching and flow control
@section Branching and Flow Control

The branching commands @code{b}, @code{t}, and @code{T} enable
changing the flow of @command{sed} programs.

By default, @command{sed} reads an input line into the pattern buffer,
then processes all commands in order.
Commands without addresses affect all lines.
Commands with addresses affect only matching lines.
@xref{Execution Cycle} and @ref{Addresses overview}.

@command{sed} does not support a typical @code{if/then} construct.
Instead, some commands can be used as conditionals or to change the
default flow control:

@table @code

@item d
delete (clear) the current pattern space,
and restart the program cycle without processing the rest of the commands
and without printing the pattern space.

@item D
delete the contents of the pattern space @emph{up to the first newline},
and restart the program cycle without processing the rest of
the commands and without printing the pattern space.

@item [addr]X
@itemx [addr]@{ X ; X ; X @}
@itemx /regexp/X
@itemx /regexp/@{ X ; X ; X @}
Addresses and regular expressions can be used as an @code{if/then}
conditional: if @var{[addr]} matches the current line,
execute the command(s).
For example, the command @code{/^#/d} means:
@emph{if} the current pattern space matches the regular expression @code{^#} (a line
starting with a hash), @emph{then} execute the @code{d} command:
delete the line without printing it, and restart the program cycle
immediately.

@item b
branch unconditionally (that is: always jump to a label, skipping
or repeating other commands, without restarting a new cycle). Combined
with an address, the branch can be conditionally executed on matched
lines.

@item t
branch conditionally (that is: jump to a label) @emph{only if} an
@code{s///} command has succeeded since the last input line was read
or another conditional branch was taken.

@item T
similar but opposite to the @code{t} command: branch only if
there have been @emph{no} successful substitutions since the last
input line was read or conditional branch was taken.
@end table
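
The following sketch contrasts the two conditional branches. It assumes
GNU @command{sed}, because the @code{T} command is a GNU extension.
Without a label, both commands branch to the end of the script, so the
pattern space is printed and the next cycle starts:

@codequoteundirected on
@codequotebacktick on
@example
# t: skip the rest of the script when the substitution succeeded
printf '%s\n' a1 b2 c | sed 's/[0-9]/#/;t;s/^/-/'
# prints a#, b# and -c

# T: skip the rest of the script when the substitution failed
printf '%s\n' a1 b2 c | sed 's/[0-9]/#/;T;s/^/+/'
# prints +a#, +b# and c
@end example
@codequoteundirected off
@codequotebacktick off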


The following two @command{sed} programs are equivalent. The first
(contrived) example uses the @code{b} command to skip the @code{s///}
command on lines containing @samp{1}. The second example uses an
address with negation (@samp{!}) to perform substitution only on
desired lines. The @code{y///} command is still executed on all
lines:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
a4
z5
z6

$ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
a4
z5
z6
@end group
@end example
@codequoteundirected off
@codequotebacktick off



@subsection Branching and Cycles
@cindex labels
@cindex omitting labels
@cindex cycle, restarting
@cindex restarting a cycle
The @code{b}, @code{t} and @code{T} commands can be followed by a label
(typically a single letter). Labels are defined with a colon followed by
one or more letters (e.g. @samp{:x}). If the label is omitted, the
branch commands restart the cycle. Note the difference between
branching to a label and restarting the cycle: when a cycle is
restarted, @command{sed} first prints the current content of the
pattern space, then reads the next input line into the pattern space.
Jumping to a label (even if it is at the beginning of the program)
does not print the pattern space and does not read the next input line.

The following program is a no-op. The @code{b} command (the only command
in the program) does not have a label, and thus simply restarts the cycle.
On each cycle, the pattern space is printed and the next input line is read:

@example
@group
$ seq 3 | sed b
1
2
3
@end group
@end example

@cindex infinite loop, branching
@cindex branching, infinite loop
The following example is an infinite loop: it does not terminate and
does not print anything. The @code{b} command jumps to the @samp{x}
label, and a new cycle is never started:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ seq 3 | sed ':x ; bx'

# The above command requires GNU sed (which supports additional
# commands following a label, without a newline). A portable equivalent:
#     sed -e ':x' -e bx
@end group
@end example
@codequoteundirected off
@codequotebacktick off

@cindex branching and n, N
@cindex n, and branching
@cindex N, and branching
Branching is often complemented with the @code{n} or @code{N} commands:
both commands read the next input line into the pattern space without waiting
for the cycle to restart. Before reading the next input line, @code{n}
prints the current pattern space then empties it, while @code{N}
appends a newline and the next input line to the pattern space.

Consider the following two examples:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ seq 3 | sed ':x ; n ; bx'
1
2
3

$ seq 3 | sed ':x ; N ; bx'
1
2
3
@end group
@end example
@codequoteundirected off
@codequotebacktick off

@itemize
@item
Neither example loops forever, even though neither ever starts a new cycle.

@item
In the first example, the @code{n} command first prints the content
of the pattern space, empties the pattern space, then reads the next
input line.

@item
In the second example, the @code{N} command appends the next input
line to the pattern space (with a newline). Lines are accumulated in
the pattern space until there are no more input lines to read, then
the @code{N} command terminates the @command{sed} program. When the
program terminates, the end-of-cycle actions are performed, and the
entire pattern space is printed.

@item
The second example requires @value{SSED},
because it uses the non-POSIX-standard behavior of @code{N}.
See the ``@code{N} command on the last line'' paragraph
in @ref{Reporting Bugs}.

@item
To further examine the difference between the two examples,
try the following commands:
@codequoteundirected on
@codequotebacktick on
@example
@group
printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
@end group
@end example
@codequoteundirected off
@codequotebacktick off

@end itemize
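
For instance, the last two of these commands differ as follows. This
sketch assumes GNU @command{sed}, which is needed for the behavior of
@code{N} on the last line. With @code{n}, the pattern space never holds
more than one line, so the substitution never finds a newline. With
@code{N}, the lines accumulate and every newline is replaced:

@codequoteundirected on
@codequotebacktick on
@example
printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
# prints aa, bb, cc and dd on separate lines

printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
# prints aa***bb***cc***dd
@end example
@codequoteundirected off
@codequotebacktick off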



@subsection Branching example: joining lines

@cindex joining lines with branching
@cindex branching, joining lines
@cindex quoted-printable lines, joining
@cindex joining quoted-printable lines
@cindex t, joining lines with
@cindex b, joining lines with
@cindex b, versus t
@cindex t, versus b
As a real-world example of using branching, consider the case of
@uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files,
typically used to encode email messages.
In these files long lines are split and marked with a @dfn{soft line break}
consisting of a single @samp{=} character at the end of the line:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ cat jaques.txt
All the wor=
ld's a stag=
e,
And all the=
 men and wo=
men merely =
players:
They have t=
heir exits =
and their e=
ntrances;
And one man=
 in his tim=
e plays man=
y parts.
@end group
@end example
@codequoteundirected off
@codequotebacktick off


The following program uses an address match @samp{/=$/} as a
conditional: if the current pattern space ends with a @samp{=}, it
reads the next input line using @code{N}, removes every @samp{=}
that is followed by a newline (together with that newline), and
unconditionally branches (@code{b}) to the beginning of the program
without restarting a new cycle. If the pattern space does not end with
@samp{=}, the default action is performed: the pattern space is printed
and a new cycle is started:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt
All the world's a stage,
And all the men and women merely players:
They have their exits and their entrances;
And one man in his time plays many parts.
@end group
@end example
@codequoteundirected off
@codequotebacktick off

Here's an alternative program with a slightly different approach. On
all lines except the last, @code{N} appends the next input line to the
pattern space. A substitution command then removes soft line breaks
(@samp{=} at the end of a line, i.e. followed by a newline) by replacing
them with an empty string.
@emph{If} the substitution was successful (meaning the pattern space contained
a line which should be joined), the conditional branch command @code{t} jumps
to the beginning of the program without completing or restarting the cycle.
If the substitution failed (meaning there were no soft line breaks),
the @code{t} command does @emph{not} branch. Then @code{P}
prints the pattern space content up to the first newline, and @code{D}
deletes the pattern space content up to the first newline.
(To learn more about the @code{N}, @code{P} and @code{D} commands,
@pxref{Multiline techniques}).


@codequoteundirected on
@codequotebacktick on
@example
@group
$ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
All the world's a stage,
And all the men and women merely players:
They have their exits and their entrances;
And one man in his time plays many parts.
@end group
@end example
@codequoteundirected off
@codequotebacktick off


For more line-joining examples, see @ref{Joining lines}.


@node Examples
@chapter Some Sample Scripts

Here are some @command{sed} scripts to guide you in the art of mastering
@command{sed}.

@menu

Useful one-liners:
* Joining lines::

Some exotic examples:
* Centering lines::
* Increment a number::
* Rename files to lower case::
* Print bash environment::
* Reverse chars of lines::
* Text search across multiple lines::
* Line length adjustment::
* Adding a header to multiple files::

Emulating standard utilities:
* tac:: Reverse lines of files
* cat -n:: Numbering lines
* cat -b:: Numbering non-blank lines
* wc -c:: Counting chars
* wc -w:: Counting words
* wc -l:: Counting lines
* head:: Printing the first lines
* tail:: Printing the last lines
* uniq:: Make duplicate lines unique
* uniq -d:: Print duplicated lines of input
* uniq -u:: Remove all duplicated lines
* cat -s:: Squeezing blank lines
@end menu

@node Joining lines
@section Joining lines

This section uses @code{N}, @code{D} and @code{P} commands to process
multiple lines, and the @code{b} and @code{t} commands for branching.
@xref{Multiline techniques} and @ref{Branching and flow control}.

Join specific lines (e.g. if lines 2 and 3 need to be joined):

@codequoteundirected on
@codequotebacktick on
@example
$ cat lines.txt
hello
hel
lo
hello

$ sed '2@{N;s/\n//;@}' lines.txt
hello
hello
hello
@end example
@codequoteundirected off
@codequotebacktick off

Join backslash-continued lines:

@codequoteundirected on
@codequotebacktick on
@example
$ cat 1.txt
this \
is \
a \
long \
line
and another \
line

$ sed -e ':x /\\$/ @{ N; s/\\\n//g ; bx @}' 1.txt
this is a long line
and another line


# TODO: The above requires GNU sed.
# Non-GNU seds need newlines after ':' and 'b'.
@end example
@codequoteundirected off
@codequotebacktick off

Join lines that start with whitespace (e.g., SMTP headers):

@codequoteundirected on
@codequotebacktick on
@example
@group
$ cat 2.txt
Subject: Hello
    World
Content-Type: multipart/alternative;
    boundary=94eb2c190cc6370f06054535da6a
Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
Authentication-Results: mx.gnu.org;
    dkim=pass header.i=@@gnu.org;
    spf=pass
Message-ID: <abcdef@@gnu.org>
From: John Doe <jdoe@@gnu.org>
To: Jane Smith <jsmith@@gnu.org>

$ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
Subject: Hello World
Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass
Message-ID: <abcdef@@gnu.org>
From: John Doe <jdoe@@gnu.org>
To: Jane Smith <jsmith@@gnu.org>

# A portable (non-gnu) variation:
#   sed -e :a -e '$!N;s/\n  */ /;ta' -e 'P;D'
@end group
@end example
@codequoteundirected off
@codequotebacktick off
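
Join all lines into a single line. This is a sketch in the same style,
and the comma separator is an arbitrary choice. The loop appends every
line with @code{N} and branches back to @samp{a} until the last line has
been read, then replaces all embedded newlines at once. As with the
examples above, putting commands after the label on one line requires
GNU @command{sed}:

@codequoteundirected on
@codequotebacktick on
@example
seq 5 | sed ':a;N;$!ba;s/\n/,/g'
# prints 1,2,3,4,5

# A portable variation, ending each label at the end of an -e fragment:
seq 5 | sed -e :a -e N -e '$!ba' -e 's/\n/,/g'
@end example
@codequoteundirected off
@codequotebacktick off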


@node Centering lines
@section Centering Lines

This script centers all lines of a file on an 80-column width.
To change that width, the number in @code{\@{@dots{}\@}} must be
replaced, and the number of added spaces also must be changed.

Note how the buffer commands are used to separate parts in
the regular expressions to be matched; this is a common
technique.

@c start-------------------------------------------
@example
#!/usr/bin/sed -f

@group
# Put 80 spaces in the buffer
1 @{
  x
  s/^$/          /
  s/^.*$/&&&&&&&&/
  x
@}
@end group

@group
# delete leading and trailing spaces
y/@kbd{@key{TAB}}/ /
s/^ *//
s/ *$//
@end group

@group
# add a newline and 80 spaces to end of line
G
@end group

@group
# keep first 81 chars (80 + a newline)
s/^\(.\@{81\@}\).*$/\1/
@end group

@group
# \2 matches half of the spaces, which are moved to the beginning
s/^\(.*\)\n\(.*\)\2/\2\1/
@end group
@end example
@c end---------------------------------------------
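
The last substitution relies on a back-reference to split a run of spaces
in half. In the script, @samp{\(.*\)\2} can only match two identical
copies of the same text, so @samp{\2} captures half of the run. The
sketch below shows the same trick in isolation as @samp{\(.*\)\1}, with
dots instead of spaces so the result is visible. Any repeated character
behaves the same way:

@codequoteundirected on
@codequotebacktick on
@example
# 8 dots: the longest match of \(.*\)\1 is two copies of 4 dots
echo '........' | sed 's/^\(.*\)\1/\1/'
# prints ....

# 7 dots: \(.*\)\1 matches the first 6 dots; the 7th is left over
echo '.......' | sed 's/^\(.*\)\1/\1/'
# prints .... (3 dots from \1 plus the unmatched one)
@end example
@codequoteundirected off
@codequotebacktick off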

@node Increment a number
@section Increment a Number

This script is one of a few that demonstrate how to do arithmetic
in @command{sed}. This is indeed possible,@footnote{@command{sed} guru Greg
Ubben wrote an implementation of the @command{dc} @sc{rpn} calculator!
It is distributed together with sed.} but must be done manually.

To increment a number, you add 1 to its last digit, replacing
it with the following digit. There is one exception: when the digit
is a nine, it becomes a zero and the digit before it must be
incremented too, repeating until a digit other than nine is reached.

This solution by Bruno Haible is clever because
it uses a single buffer; if you don't have this limitation, the
algorithm used in @ref{cat -n, Numbering lines}, is faster.
It works by replacing trailing nines with underscores, then
using multiple @code{s} commands to increment the last digit,
and then replacing the underscores with zeros.

@c start-------------------------------------------
@example
#!/usr/bin/sed -f

/[^0-9]/ d

@group
# replace all trailing 9s by _ (any non-digit character
# could be used)
:d
s/9\(_*\)$/_\1/
td
@end group

@group
# incr last digit only. The first line adds a most-significant
# digit of 1 if we have to add a digit.
@end group

@group
s/^\(_*\)$/1\1/; tn
s/8\(_*\)$/9\1/; tn
s/7\(_*\)$/8\1/; tn
s/6\(_*\)$/7\1/; tn
s/5\(_*\)$/6\1/; tn
s/4\(_*\)$/5\1/; tn
s/3\(_*\)$/4\1/; tn
s/2\(_*\)$/3\1/; tn
s/1\(_*\)$/2\1/; tn
s/0\(_*\)$/1\1/; tn
@end group

@group
:n
y/_/0/
@end group
@end example
@c end---------------------------------------------
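
You can watch the phases at work without saving the script to a file.
The following sketch assumes GNU @command{sed}, which allows semicolons
after labels. The first command runs the underscore loop alone. The
second adds only the substitution rule needed for the digit @samp{2},
followed by the final @code{y} command:

@codequoteundirected on
@codequotebacktick on
@example
# Phase 1: trailing nines become underscores
echo 1299 | sed ':d;s/9\(_*\)$/_\1/;td'
# prints 12__

# All phases, with only the rule for the digit 2: increment the
# last non-nine digit, then turn the underscores back into zeros
echo 1299 | sed ':d;s/9\(_*\)$/_\1/;td;s/2\(_*\)$/3\1/;y/_/0/'
# prints 1300
@end example
@codequoteundirected off
@codequotebacktick off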

@node Rename files to lower case
@section Rename Files to Lower Case

This is a pretty strange use of @command{sed}. We transform text into
shell commands, then just feed them to the shell.
Don't worry, even worse hacks are done when using @command{sed}; I have
seen a script converting the output of @command{date} into a @command{bc}
program!

The main body of this is the @command{sed} script, which remaps the name
from lower to upper (or vice-versa) and even checks whether
the remapped name is the same as the original name.
Note how the script is parameterized using shell
variables and proper quoting.

@c start-------------------------------------------
@example
@group
#! /bin/sh
# rename files to lower/upper case...
#
# usage:
#    move-to-lower *
#    move-to-upper *
# or
#    move-to-lower -R .
#    move-to-upper -R .
#
@end group

@group
help()
@{
        cat << eof
Usage: $0 [-n] [-R] [-h] files...
@end group

@group
-n      do nothing, only see what would be done
-R      recursive (use find)
-h      this message
files   files to remap to lower case
@end group

@group
Examples:
       $0 -n *        (see if everything is ok, then...)
       $0 *
@end group

       $0 -R .

@group
eof
@}
@end group

@group
apply_cmd='sh'
finder='echo "$@@" | tr " " "\n"'
files_only=
@end group

@group
while :
do
    case "$1" in
        -n) apply_cmd='cat' ;;
        -R) finder='find "$@@" -type f';;
        -h) help ; exit 1 ;;
        *) break ;;
    esac
    shift
done
@end group

@group
if [ -z "$1" ]; then
        echo "Usage: $0 [-h] [-n] [-R] files..."
        exit 1
fi
@end group

@group
LOWER='abcdefghijklmnopqrstuvwxyz'
UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
@end group

@group
case `basename $0` in
        *upper*) TO=$UPPER; FROM=$LOWER ;;
        *)       FROM=$UPPER; TO=$LOWER ;;
esac
@end group

eval $finder | sed -n '

@group
# remove all trailing slashes
s/\/*$//
@end group

@group
# add ./ if there is no path, only a filename
/\//! s/^/.\//
@end group

@group
# save path+filename
h
@end group

@group
# remove path
s/.*\///
@end group

@group
# do conversion only on filename
y/'$FROM'/'$TO'/
@end group

@group
# now line contains original path+file, while
# hold space contains the new filename
x
@end group

@group
# add converted file name to line, which now contains
# path/file-name\nconverted-file-name
G
@end group

@group
# check if converted file name is equal to original file name,
# if it is, do not print anything
/^.\/\(.*\)\n\1/b
@end group

@group
# escape special characters for the shell
s/["$`\\]/\\&/g
@end group

@group
# now, transform path/fromfile\n, into
# mv path/fromfile path/tofile and print it
s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
@end group

' | $apply_cmd
@end example
@c end---------------------------------------------
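
You can try the core of the script on a single path. This sketch
hard-codes a hypothetical path, @file{Dir/README.TXT}, and the
upper-to-lower mapping. It also omits the trailing-slash, equality-check
and shell-escaping steps. It prints the @command{mv} command that the
script would generate instead of running it:

@codequoteundirected on
@codequotebacktick on
@example
echo 'Dir/README.TXT' | sed -n '
h
s/.*\///
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
x
G
s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p'
# prints mv "Dir/README.TXT" "Dir/readme.txt"
@end example
@codequoteundirected off
@codequotebacktick off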

@node Print bash environment
@section Print @command{bash} Environment

This script strips the definition of the shell functions
from the output of the @command{set} Bourne-shell command.

@c start-------------------------------------------
@example
#!/bin/sh

@group
set | sed -n '
:x
@end group

@group
@ifinfo
# if no occurrence of "=()" print and load next line
@end ifinfo
@ifnotinfo
# if no occurrence of @samp{=()} print and load next line
@end ifnotinfo
/=()/! @{ p; b; @}
/ () $/! @{ p; b; @}
@end group

@group
# possible start of functions section
# save the line in case this is a var like FOO="() "
h
@end group

@group
# if the next line has a brace, we quit because
# nothing comes after functions
n
/^@{/ q
@end group

@group
# print the old line
x; p
@end group

@group
# work on the new line now
x; bx
'
@end group
@end example
@c end---------------------------------------------

@node Reverse chars of lines
@section Reverse Characters of Lines

This script can be used to reverse the position of characters
in lines. The technique moves two characters at a time, hence
it is faster than more intuitive implementations.

Note the @code{tx} command before the definition of the label.
This is often needed to reset the flag that is tested by
the @code{t} command.

Imaginative readers will find uses for this script. An example
is reversing the output of @command{banner}.@footnote{This requires
another script to pad the output of banner; for example

@example
#! /bin/sh

banner -w $1 $2 $3 $4 |
  sed -e :a -e '/^.\@{0,'$1'\@}$/ @{ s/$/ /; ba; @}' |
  ~/sedscripts/reverseline.sed
@end example
}

@c start-------------------------------------------
@example
#!/usr/bin/sed -f

/../! b

@group
# Reverse a line. Begin embedding the line between two newlines
s/^.*$/\
&\
/
@end group

@group
# Move the first character to the end. The regexp matches until
# there are zero or one characters between the markers
tx
:x
s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
tx
@end group

@group
# Remove the newline markers
s/\n//g
@end group
@end example
@c end---------------------------------------------
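
GNU @command{sed} accepts @samp{\n} in the replacement and semicolons
after labels, so the same script can be condensed into a one-liner. This
is a sketch; the multi-line form above remains the portable version:

@codequoteundirected on
@codequotebacktick on
@example
echo 'hello' | sed '/../!b;s/^.*$/\n&\n/;tx;:x;s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/;tx;s/\n//g'
# prints olleh
@end example
@codequoteundirected off
@codequotebacktick off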
	4629
[3613]	4630
	4631	@node Text search across multiple lines
	4632	@section Text search across multiple lines
	4633
	4634	This section uses @code{N} and @code{D} commands to search for
	4635	consecutive words spanning multiple lines. @xref{Multiline techniques}.
	4636
	4637	These examples deal with finding doubled occurrences of words in a document.
	4638
	4639	Finding doubled words in a single line is easy using GNU @command{grep}
	4640	and similarly with @value{SSED}:
	4641
	4642	@c NOTE: in all examples, 'the@ the' is used to prevent
	4643	@c 'make syntax-check' from complaining about double words.
	4644	@codequoteundirected on
	4645	@codequotebacktick on
	4646	@example
	4647	@group
	4648	$ cat two-cities-dup1.txt
	4649	It was the best of times,
	4650	it was the worst of times,
	4651	it was the@ the age of wisdom,
	4652	it was the age of foolishness,
	4653
	4654	$ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
	4655	it was the@ the age of wisdom,
	4656
	4657	$ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
	4658	3:it was the@ the age of wisdom,
	4659
	4660	$ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
	4661	it was the@ the age of wisdom,
	4662
	4663	$ sed -En '/\b(\w+)\s+\1\b/@{=;p@}' two-cities-dup1.txt
	4664	3
	4665	it was the@ the age of wisdom,
	4666	@end group
	4667	@end example
	4668	@codequoteundirected off
	4669	@codequotebacktick off
	4670
	4671	@itemize @bullet
	4672	@item
	4673	The regular expression @samp{\b\w+\s+} searches for word-boundary (@samp{\b}),
	4674	followed by one-or-more word-characters (@samp{\w+}), followed by whitespace
	4675	(@samp{\s+}). @xref{regexp extensions}.
	4676
	4677	@item
	4678	Adding parentheses around the @samp{(\w+)} expression creates a subexpression.
	4679	The regular expression pattern @samp{(PATTERN)\s+\1} defines a subexpression
	4680	(in the parentheses) followed by a back-reference, separated by whitespace.
	4681	A successful match means the @var{PATTERN} was repeated twice in succession.
	4682	@xref{Back-references and Subexpressions}.
	4683
	4684	@item
The word-boundary expression (@samp{\b}) at both ends ensures partial
words are not matched (e.g. @samp{the then} is not a desired match).
	4687	@c Thanks to Jim for pointing this out in
	4688	@c https://lists.gnu.org/archive/html/sed-devel/2016-12/msg00041.html
	4689
	4690	@item
The @option{-E} option enables extended regular expression syntax, removing
the need to add backslashes before the parentheses. @xref{ERE syntax}.
	4693
	4694	@end itemize
	4695
When the doubled words span two lines, the above regular expression
will not find them, because @command{grep} and @command{sed} operate line by line.
	4698
By using the @code{N} and @code{D} commands, @command{sed} can apply
regular expressions to multiple lines (that is, multiple lines are stored
in the pattern space, and the regular expression operates on all of them):
	4702
	4703	@c NOTE: use 'the@*the' instead of a real new line to prevent
	4704	@c 'make syntax-check' to complain about doubled-words.
	4705	@codequoteundirected on
	4706	@codequotebacktick on
	4707	@example
	4708	$ cat two-cities-dup2.txt
	4709	It was the best of times, it was the
	4710	worst of times, it was the@*the age of wisdom,
	4711	it was the age of foolishness,
	4712
	4713	$ sed -En '@{N; /\b(\w+)\s+\1\b/@{=;p@} ; D@}' two-cities-dup2.txt
	4714	3
	4715	worst of times, it was the@*the age of wisdom,
	4716	@end example
	4717	@codequoteundirected off
	4718	@codequotebacktick off
	4719
	4720	@itemize @bullet
	4721	@item
	4722	The @command{N} command appends the next line to the pattern space
	4723	(thus ensuring it contains two consecutive lines in every cycle).
	4724
	4725	@item
The regular expression uses @samp{\s+} as the word separator, which matches
both spaces and newlines.
	4728
	4729	@item
If the regular expression matches, the current line number is printed
with @code{=} and the entire pattern space is printed with @command{p}.
No lines are printed by default due to the @option{-n} option.
	4732
	4733	@item
The @command{D} command removes the first line from the pattern space (up to the
first newline), readying it for the next cycle.
	4736	@end itemize
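
The following sketch is not part of the original example. It makes the
two-line window that @code{N} and @code{D} maintain visible by
replacing the search with the @code{l} command. @code{l} prints the
pattern space unambiguously: embedded newlines appear as @samp{\n} and
the end of the buffer is marked with @samp{$}. @command{printf} is used
only to generate sample input.

@codequoteundirected on
@codequotebacktick on
@example
$ printf 'a\nb\nc\n' | sed -n 'N;l;D'
a\nb$
b\nc$
@end example
@codequoteundirected off
@codequotebacktick off

Each pair of adjacent lines is examined once. A doubled word is
therefore found even when a single line break separates its two
occurrences.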
	4737
	4738	See the GNU @command{coreutils} manual for an alternative solution using
	4739	@command{tr -s} and @command{uniq} at
	4740	@c NOTE: cheating and keeping the URL line shorter than 80 characters
	4741	@c by using 'gnu.org' and '/s/'.
	4742	@url{https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html}.
	4743
	4744	@node Line length adjustment
	4745	@section Line length adjustment
	4746
	4747	This section uses @code{N} and @code{P} commands to read and write
	4748	lines, and the @code{b} command for branching.
	4749	@xref{Multiline techniques} and @ref{Branching and flow control}.
	4750
This (somewhat contrived) example deals with formatting and wrapping
the lines of text in the following input file:
	4753
	4754	@example
	4755	@group
	4756	$ cat two-cities-mix.txt
	4757	It was the best of times, it was
	4758	the worst of times, it
	4759	was the age of
	4760	wisdom,
	4761	it
	4762	was
	4763	the age
	4764	of foolishness,
	4765	@end group
	4766	@end example
	4767
	4768	@exdent The following sed program wraps lines at 40 characters:
	4769	@codequoteundirected on
	4770	@codequotebacktick on
	4771	@example
	4772	@group
	4773	$ cat wrap40.sed
	4774	# outer loop
	4775	:x
	4776
	4777	# Append a newline followed by the next input line to the pattern buffer
	4778	N
	4779
# Replace all newlines in the pattern buffer with spaces
s/\n/ /g
	4782
	4783
	4784	# Inner loop
	4785	:y
	4786
	4787	# Add a newline after the first 40 characters
	4788	s/(.@{40,40@})/\1\n/
	4789
	4790	# If there is a newline in the pattern buffer
	4791	# (i.e. the previous substitution added a newline)
	4792	/\n/ @{
	4793	# There are newlines in the pattern buffer -
	4794	# print the content until the first newline.
	4795	P
	4796
	4797	# Remove the printed characters and the first newline
	4798	s/.*\n//
	4799
	4800	# branch to label 'y' - repeat inner loop
	4801	by
	4802	@}
	4803
	4804	# No newlines in the pattern buffer - Branch to label 'x' (outer loop)
	4805	# and read the next input line
	4806	bx
	4807	@end group
	4808	@end example
	4809	@codequoteundirected off
	4810	@codequotebacktick off
	4811
	4812
	4813
	4814	@exdent The wrapped output:
	4815	@codequoteundirected on
	4816	@codequotebacktick on
	4817	@example
	4818	@group
	4819	$ sed -E -f wrap40.sed two-cities-mix.txt
	4820	It was the best of times, it was the wor
	4821	st of times, it was the age of wisdom, i
	4822	t was the age of foolishness,
	4823	@end group
	4824	@end example
	4825	@codequoteundirected off
	4826	@codequotebacktick off
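
The loop is easier to follow with a much smaller width. The sketch
below is @file{wrap40.sed} rewritten as @option{-e} fragments, with the
width reduced from 40 to 5 purely for illustration. @value{SSED} joins
@option{-e} fragments with newlines.

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf '%s\n' abcdefgh ij \
    | sed -E -e ':x' -e 'N' -e 's/\n/ /g' \
             -e ':y' -e 's/(.@{5,5@})/\1\n/' \
             -e '/\n/@{' -e 'P' -e 's/.*\n//' -e 'by' -e '@}' \
             -e 'bx'
abcde
fgh i
j
@end group
@end example
@codequoteundirected off
@codequotebacktick off

As in the 40-character version, text is split wherever the width limit
falls, even in the middle of a word. The last partial line is printed
when @code{N} finds no more input.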
	4827
	4828
	4829
	4830
	4831	@node Adding a header to multiple files
	4832	@section Adding a header to multiple files
	4833
	4834	@value{SSED} can be used to safely modify multiple files at once.
	4835
	4836	@exdent Add a single line to the beginning of source code files:
	4837
	4838	@codequoteundirected on
	4839	@codequotebacktick on
	4840	@example
sed -i '1i/* Copyright (C) FOO BAR */' *.c
	4842	@end example
	4843	@codequoteundirected off
	4844	@codequotebacktick off
	4845
	4846	@exdent Adding a few lines is possible using @samp{\n} in the text:
	4847
	4848	@codequoteundirected on
	4849	@codequotebacktick on
	4850	@example
sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c
	4852	@end example
	4853	@codequoteundirected off
	4854	@codequotebacktick off
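
Because @option{-i} overwrites the files, it can be worth previewing
the result first. Without @option{-i}, @command{sed} writes the
modified text to standard output and leaves the input untouched.
Alternatively, a suffix given to @option{-i} keeps a backup copy of
each modified file. In this sketch, @file{foo.c} is only an example
name; it would be saved as @file{foo.c.bak}.

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf 'int x;\n' | sed '1i/* Copyright (C) FOO BAR */'
/* Copyright (C) FOO BAR */
int x;

$ sed -i.bak '1i/* Copyright (C) FOO BAR */' *.c
@end group
@end example
@codequoteundirected off
@codequotebacktick off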
	4855
	4856	To add multiple lines from another file, use @code{0rFILE}.
	4857	A typical use case is adding a license notice header to all files:
	4858
	4859	@codequoteundirected on
	4860	@codequotebacktick on
	4861	@example
	4862	## Create the header file:
	4863	$ cat<<'EOF'>LIC.TXT
	4864	/*
	4865	Copyright (C) 1989-2021 FOO BAR
	4866
	4867	This program is free software; you can redistribute it and/or modify
	4868	it under the terms of the GNU General Public License as published by
	4869	the Free Software Foundation; either version 3, or (at your option)
	4870	any later version.
	4871
	4872	This program is distributed in the hope that it will be useful,
	4873	but WITHOUT ANY WARRANTY; without even the implied warranty of
	4874	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	4875	GNU General Public License for more details.
	4876
	4877	You should have received a copy of the GNU General Public License
	4878	along with this program; If not, see <https://www.gnu.org/licenses/>.
	4879	*/
	4880	EOF
	4881
	4882	## Add the file at the beginning of all source code files:
$ sed -i '0rLIC.TXT' *.cpp *.h
	4884	@end example
	4885	@codequoteundirected off
	4886	@codequotebacktick off
	4887
	4888
With script files (e.g. @file{.sh}, @file{.py}, @file{.pl} files)
the license notice typically appears @emph{after} the first line (the
``shebang'' @samp{#!} line). The @code{1rFILE} command will add @file{FILE}
@emph{after} the first line:
	4893
	4894	@codequoteundirected on
	4895	@codequotebacktick on
	4896	@example
	4897	## Create the header file:
	4898	$ cat<<'EOF'>LIC.TXT
	4899	##
	4900	## Copyright (C) 1989-2021 FOO BAR
	4901	##
	4902	## This program is free software; you can redistribute it and/or modify
	4903	## it under the terms of the GNU General Public License as published by
	4904	## the Free Software Foundation; either version 3, or (at your option)
	4905	## any later version.
	4906	##
	4907	## This program is distributed in the hope that it will be useful,
	4908	## but WITHOUT ANY WARRANTY; without even the implied warranty of
	4909	## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	4910	## GNU General Public License for more details.
	4911	##
	4912	## You should have received a copy of the GNU General Public License
	4913	## along with this program; If not, see <https://www.gnu.org/licenses/>.
	4914	##
	4915	##
	4916	EOF
	4917
## Add the file after the first line of all script files:
$ sed -i '1rLIC.TXT' *.py *.sh
	4920	@end example
	4921	@codequoteundirected off
	4922	@codequotebacktick off
	4923
	4924	The above @command{sed} commands can be combined with @command{find}
	4925	to locate files in all subdirectories, @command{xargs} to run additional
	4926	commands on selected files and @command{grep} to filter out files that already
	4927	contain a copyright notice:
	4928
	4929	@codequoteundirected on
	4930	@codequotebacktick on
	4931	@example
find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
    | xargs grep -Li copyright \
    | xargs -r sed -i '0rLIC.TXT'
	4935	@end example
	4936	@codequoteundirected off
	4937	@codequotebacktick off
	4938
@exdent Or a slightly safer version (handling file names containing spaces and newlines):
	4940
	4941	@codequoteundirected on
	4942	@codequotebacktick on
	4943	@example
find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
    | xargs -0 grep -Z -Li copyright \
    | xargs -0 -r sed -i '0rLIC.TXT'
	4947	@end example
	4948	@codequoteundirected off
	4949	@codequotebacktick off
	4950
Note: using the @code{0} address with the @code{r} command requires @value{SSED}
version 4.9 or later. @xref{Zero Address}.
	4953
	4954
	4955
[599]	4956	@node tac
	4957	@section Reverse Lines of Files
	4958
	4959	This one begins a series of totally useless (yet interesting)
	4960	scripts emulating various Unix commands. This, in particular,
	4961	is a @command{tac} workalike.
	4962
[3613]	4963	Note that on implementations other than GNU @command{sed}
[599]	4964	this script might easily overflow internal buffers.
	4965
	4966	@c start-------------------------------------------
	4967	@example
	4968	#!/usr/bin/sed -nf
	4969
# reverse all lines of input, i.e. first line becomes last, ...
	4971
	4972	@group
# from the second line on, the hold space (which contains all previous
# lines) is appended to the current line, so the order is reversed
	4975	1! G
	4976	@end group
	4977
	4978	@group
	4979	# on the last line we're done -- print everything
	4980	$ p
	4981	@end group
	4982
	4983	@group
# store everything in the hold space again
	4985	h
	4986	@end group
	4987	@end example
	4988	@c end---------------------------------------------
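
The same three commands also work as a one-liner, separated by
semicolons. This minimal sketch uses @command{seq} to generate the
input:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 3 | sed -n '1!G;$p;h'
3
2
1
@end example
@codequoteundirected off
@codequotebacktick off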
	4989
	4990	@node cat -n
	4991	@section Numbering Lines
	4992
	4993	This script replaces @samp{cat -n}; in fact it formats its output
[3613]	4994	exactly like GNU @command{cat} does.
[599]	4995
Of course, this is completely useless, for two reasons: first,
because somebody else did it in C; second, because the following
Bourne-shell script could be used for the same purpose and would
be much faster:
	5000
	5001	@c start-------------------------------------------
	5002	@example
	5003	@group
#! /bin/sh
sed -e "=" "$@@" | sed -e '
  s/^/      /
  N
  s/^ *\(......\)\n/\1  /
'
	5010	@end group
	5011	@end example
	5012	@c end---------------------------------------------
	5013
	5014	It uses @command{sed} to print the line number, then groups lines two
	5015	by two using @code{N}. Of course, this script does not teach as much as
	5016	the one presented below.
	5017
	5018	The algorithm used for incrementing uses both buffers, so the line
	5019	is printed as soon as possible and then discarded. The number
	5020	is split so that changing digits go in a buffer and unchanged ones go
	5021	in the other; the changed digits are modified in a single step
	5022	(using a @code{y} command). The line number for the next line
	5023	is then composed and stored in the hold space, to be used in the
	5024	next iteration.
	5025
	5026	@c start-------------------------------------------
	5027	@example
	5028	#!/usr/bin/sed -nf
	5029
	5030	@group
	5031	# Prime the pump on the first line
	5032	x
	5033	/^$/ s/^.*$/1/
	5034	@end group
	5035
	5036	@group
	5037	# Add the correct line number before the pattern
	5038	G
	5039	h
	5040	@end group
	5041
	5042	@group
	5043	# Format it and print it
s/^/      /
s/^ *\(......\)\n/\1  /p
	5046	@end group
	5047
	5048	@group
	5049	# Get the line number from hold space; add a zero
	5050	# if we're going to add a digit on the next line
	5051	g
	5052	s/\n.*$//
	5053	/^9*$/ s/^/0/
	5054	@end group
	5055
	5056	@group
	5057	# separate changing/unchanged digits with an x
	5058	s/.9*$/x&/
	5059	@end group
	5060
	5061	@group
	5062	# keep changing digits in hold space
	5063	h
	5064	s/^.*x//
	5065	y/0123456789/1234567890/
	5066	x
	5067	@end group
	5068
	5069	@group
	5070	# keep unchanged digits in pattern space
	5071	s/x.*$//
	5072	@end group
	5073
	5074	@group
	5075	# compose the new number, remove the newline implicitly added by G
	5076	G
	5077	s/\n//
	5078	h
	5079	@end group
	5080	@end example
	5081	@c end---------------------------------------------
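
The increment step can be tried on its own. This sketch keeps only
the commands that compute the next line number, from
@code{/^9*$/} through the final @code{s/\n//}, and drops the
line-printing part. Each input line is incremented independently.

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf '%s\n' 7 9 129 199 \
    | sed -e '/^9*$/ s/^/0/' -e 's/.9*$/x&/' -e h \
          -e 's/^.*x//' -e 'y/0123456789/1234567890/' \
          -e x -e 's/x.*$//' -e G -e 's/\n//'
8
10
130
200
@end group
@end example
@codequoteundirected off
@codequotebacktick off

For @samp{129}, @code{s/.9*$/x&/} produces @samp{1x29}. Only the
changing digits @samp{29} are translated by @code{y} into @samp{30}.
Joining them back to the unchanged @samp{1} gives @samp{130}.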
	5082
	5083	@node cat -b
	5084	@section Numbering Non-blank Lines
	5085
	5086	Emulating @samp{cat -b} is almost the same as @samp{cat -n}---we only
	5087	have to select which lines are to be numbered and which are not.
	5088
	5089	The part that is common to this script and the previous one is
	5090	not commented to show how important it is to comment @command{sed}
	5091	scripts properly...
	5092
	5093	@c start-------------------------------------------
	5094	@example
	5095	#!/usr/bin/sed -nf
	5096
	5097	@group
	5098	/^$/ @{
	5099	p
	5100	b
	5101	@}
	5102	@end group
	5103
	5104	@group
	5105	# Same as cat -n from now
	5106	x
	5107	/^$/ s/^.*$/1/
	5108	G
	5109	h
s/^/      /
s/^ *\(......\)\n/\1  /p
	5112	x
	5113	s/\n.*$//
	5114	/^9*$/ s/^/0/
	5115	s/.9*$/x&/
	5116	h
	5117	s/^.*x//
	5118	y/0123456789/1234567890/
	5119	x
	5120	s/x.*$//
	5121	G
	5122	s/\n//
	5123	h
	5124	@end group
	5125	@end example
	5126	@c end---------------------------------------------
	5127
	5128	@node wc -c
	5129	@section Counting Characters
	5130
	5131	This script shows another way to do arithmetic with @command{sed}.
	5132	In this case we have to add possibly large numbers, so implementing
	5133	this by successive increments would not be feasible (and possibly
	5134	even more complicated to contrive than this script).
	5135
	5136	The approach is to map numbers to letters, kind of an abacus
	5137	implemented with @command{sed}. @samp{a}s are units, @samp{b}s are
	5138	tens and so on: we simply add the number of characters
	5139	on the current line as units, and then propagate the carry
	5140	to tens, hundreds, and so on.
	5141
	5142	As usual, running totals are kept in hold space.
	5143
	5144	On the last line, we convert the abacus form back to decimal.
	5145	For the sake of variety, this is done with a loop rather than
	5146	with some 80 @code{s} commands@footnote{Some implementations
	5147	have a limit of 199 commands per script}: first we
	5148	convert units, removing @samp{a}s from the number; then we
	5149	rotate letters so that tens become @samp{a}s, and so on
	5150	until no more letters remain.
	5151
	5152	@c start-------------------------------------------
	5153	@example
	5154	#!/usr/bin/sed -nf
	5155
	5156	@group
	5157	# Add n+1 a's to hold space (+1 is for the newline)
	5158	s/./a/g
	5159	H
	5160	x
	5161	s/\n/a/
	5162	@end group
	5163
	5164	@group
	5165	# Do the carry. The t's and b's are not necessary,
	5166	# but they do speed up the thing
	5167	t a
	5168	: a; s/aaaaaaaaaa/b/g; t b; b done
	5169	: b; s/bbbbbbbbbb/c/g; t c; b done
	5170	: c; s/cccccccccc/d/g; t d; b done
	5171	: d; s/dddddddddd/e/g; t e; b done
	5172	: e; s/eeeeeeeeee/f/g; t f; b done
	5173	: f; s/ffffffffff/g/g; t g; b done
	5174	: g; s/gggggggggg/h/g; t h; b done
	5175	: h; s/hhhhhhhhhh//g
	5176	@end group
	5177
	5178	@group
	5179	: done
	5180	$! @{
	5181	h
	5182	b
	5183	@}
	5184	@end group
	5185
	5186	# On the last line, convert back to decimal
	5187
	5188	@group
	5189	: loop
	5190	/a/! s/[b-h]*/&0/
	5191	s/aaaaaaaaa/9/
	5192	s/aaaaaaaa/8/
	5193	s/aaaaaaa/7/
	5194	s/aaaaaa/6/
	5195	s/aaaaa/5/
	5196	s/aaaa/4/
	5197	s/aaa/3/
	5198	s/aa/2/
	5199	s/a/1/
	5200	@end group
	5201
	5202	@group
	5203	: next
	5204	y/bcdefgh/abcdefg/
	5205	/[a-h]/ b loop
	5206	p
	5207	@end group
	5208	@end example
	5209	@c end---------------------------------------------
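
The conversion loop at the end can also be exercised in isolation. In
this sketch, which contains only the loop commands and not the counting
part, the input is already in abacus form. @samp{cbbaaa} stands for one
hundred, two tens and three units; @samp{c} alone stands for one
hundred.

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf '%s\n' cbbaaa c \
    | sed -e ':loop' -e '/a/! s/[b-h]*/&0/' \
          -e 's/aaaaaaaaa/9/' -e 's/aaaaaaaa/8/' -e 's/aaaaaaa/7/' \
          -e 's/aaaaaa/6/' -e 's/aaaaa/5/' -e 's/aaaa/4/' \
          -e 's/aaa/3/' -e 's/aa/2/' -e 's/a/1/' \
          -e 'y/bcdefgh/abcdefg/' -e '/[a-h]/ b loop'
123
100
@end group
@end example
@codequoteundirected off
@codequotebacktick off

The @code{/a/!} command matters for @samp{c}. When a digit position
has no letters, it appends a @samp{0}, so the zeros in @samp{100} are
not lost.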
	5210
	5211	@node wc -w
	5212	@section Counting Words
	5213
	5214	This script is almost the same as the previous one, once each
	5215	of the words on the line is converted to a single @samp{a}
	5216	(in the previous script each letter was changed to an @samp{a}).
	5217
	5218	It is interesting that real @command{wc} programs have optimized
	5219	loops for @samp{wc -c}, so they are much slower at counting
	5220	words rather than characters. This script's bottleneck,
	5221	instead, is arithmetic, and hence the word-counting one
	5222	is faster (it has to manage smaller numbers).
	5223
	5224	Again, the common parts are not commented to show the importance
	5225	of commenting @command{sed} scripts.
	5226
	5227	@c start-------------------------------------------
	5228	@example
	5229	#!/usr/bin/sed -nf
	5230
	5231	@group
	5232	# Convert words to a's
[3613]	5233	s/[ @kbd{@key{TAB}}][ @kbd{@key{TAB}}]*/ /g
[599]	5234	s/^/ /
	5235	s/ [^ ][^ ]*/a /g
	5236	s/ //g
	5237	@end group
	5238
	5239	@group
	5240	# Append them to hold space
	5241	H
	5242	x
	5243	s/\n//
	5244	@end group
	5245
	5246	@group
	5247	# From here on it is the same as in wc -c.
	5248	/aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
	5249	/bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
	5250	/cccccccccc/! bx; s/cccccccccc/d/g
	5251	/dddddddddd/! bx; s/dddddddddd/e/g
	5252	/eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
	5253	/ffffffffff/! bx; s/ffffffffff/g/g
	5254	/gggggggggg/! bx; s/gggggggggg/h/g
	5255	s/hhhhhhhhhh//g
	5256	:x
	5257	$! @{ h; b; @}
	5258	:y
	5259	/a/! s/[b-h]*/&0/
	5260	s/aaaaaaaaa/9/
	5261	s/aaaaaaaa/8/
	5262	s/aaaaaaa/7/
	5263	s/aaaaaa/6/
	5264	s/aaaaa/5/
	5265	s/aaaa/4/
	5266	s/aaa/3/
	5267	s/aa/2/
	5268	s/a/1/
	5269	y/bcdefgh/abcdefg/
	5270	/[a-h]/ by
	5271	p
	5272	@end group
	5273	@end example
	5274	@c end---------------------------------------------
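
The word-to-@samp{a} conversion can be checked separately. This sketch
omits the first command, which turns runs of spaces and tabs into single
spaces, so it assumes the input contains no tabs. Each word becomes one
@samp{a}:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ echo 'it was the best of times' \
    | sed -e 's/^/ /' -e 's/ [^ ][^ ]*/a /g' -e 's/ //g'
aaaaaa
@end group
@end example
@codequoteundirected off
@codequotebacktick off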
	5275
	5276	@node wc -l
	5277	@section Counting Lines
	5278
	5279	No strange things are done now, because @command{sed} gives us
	5280	@samp{wc -l} functionality for free!!! Look:
	5281
	5282	@c start-------------------------------------------
	5283	@example
	5284	@group
	5285	#!/usr/bin/sed -nf
	5286	$=
	5287	@end group
	5288	@end example
	5289	@c end---------------------------------------------
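
The same command works directly on the command line. It differs from
@samp{wc -l} in two small ways. With empty input, nothing is printed,
because there is no last line for @code{$=} to act on. A final line
that lacks a trailing newline is still counted:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ seq 5 | sed -n '$='
5
$ printf 'a\nb' | sed -n '$='
2
@end group
@end example
@codequoteundirected off
@codequotebacktick off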
	5290
	5291	@node head
	5292	@section Printing the First Lines
	5293
	5294	This script is probably the simplest useful @command{sed} script.
	5295	It displays the first 10 lines of input; the number of displayed
	5296	lines is right before the @code{q} command.
	5297
	5298	@c start-------------------------------------------
	5299	@example
	5300	@group
	5301	#!/usr/bin/sed -f
	5302	10q
	5303	@end group
	5304	@end example
	5305	@c end---------------------------------------------
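
From the command line this is simply @samp{sed 10q}. @samp{sed -n 1,10p}
prints the same lines, but it keeps reading (and discarding) the rest
of the input. @code{q} stops as soon as the last wanted line has been
printed, which can matter for large inputs. A small sketch with 3
lines:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ seq 1000 | sed 3q
1
2
3
@end group
@end example
@codequoteundirected off
@codequotebacktick off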
	5306
	5307	@node tail
	5308	@section Printing the Last Lines
	5309
	5310	Printing the last @var{n} lines rather than the first is more complex
	5311	but indeed possible. @var{n} is encoded in the second line, before
	5312	the bang character.
	5313
	5314	This script is similar to the @command{tac} script in that it keeps the
	5315	final output in the hold space and prints it at the end:
	5316
	5317	@c start-------------------------------------------
	5318	@example
	5319	#!/usr/bin/sed -nf
	5320
	5321	@group
	5322	1! @{; H; g; @}
	5323	1,10 !s/[^\n]*\n//
	5324	$p
	5325	h
	5326	@end group
	5327	@end example
	5328	@c end---------------------------------------------
	5329
Mainly, the script keeps a window of 10 lines and slides it
by adding a line and deleting the oldest (the substitution command
on the second line works like a @code{D} command but does not
restart the loop).
	5334
	5335	The ``sliding window'' technique is a very powerful way to write
	5336	efficient and complex @command{sed} scripts, because commands like
	5337	@code{P} would require a lot of work if implemented manually.
	5338
	5339	To introduce the technique, which is fully demonstrated in the
	5340	rest of this chapter and is based on the @code{N}, @code{P}
	5341	and @code{D} commands, here is an implementation of @command{tail}
	5342	using a simple ``sliding window.''
	5343
	5344	This looks complicated but in fact the working is the same as
	5345	the last script: after we have kicked in the appropriate number
	5346	of lines, however, we stop using the hold space to keep inter-line
	5347	state, and instead use @code{N} and @code{D} to slide pattern
	5348	space by one line:
	5349
	5350	@c start-------------------------------------------
	5351	@example
	5352	#!/usr/bin/sed -f
	5353
	5354	@group
	5355	1h
	5356	2,10 @{; H; g; @}
	5357	$q
	5358	1,9d
	5359	N
	5360	D
	5361	@end group
	5362	@end example
	5363	@c end---------------------------------------------
	5364
	5365	Note how the first, second and fourth line are inactive after
	5366	the first ten lines of input. After that, all the script does
	5367	is: exiting on the last line of input, appending the next input
	5368	line to pattern space, and removing the first line.
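
As a quick check, assume the second script is saved as @file{tail.sed}
(a hypothetical file name). Given 15 numbered lines, it prints only the
last ten. With fewer than ten input lines, all of them are printed.

@example
@group
$ seq 15 | sed -f tail.sed
6
7
8
9
10
11
12
13
14
15
@end group
@end example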
	5369
	5370	@node uniq
	5371	@section Make Duplicate Lines Unique
	5372
	5373	This is an example of the art of using the @code{N}, @code{P}
	5374	and @code{D} commands, probably the most difficult to master.
	5375
	5376	@c start-------------------------------------------
	5377	@example
	5378	@group
	5379	#!/usr/bin/sed -f
	5380	h
	5381	@end group
	5382
	5383	@group
	5384	:b
	5385	# On the last line, print and exit
	5386	$b
	5387	N
/^\(.*\)\n\1$/ @{
# The two lines are identical. Undo the effect of
# the N command.
	5391	g
	5392	bb
	5393	@}
	5394	@end group
	5395
	5396	@group
	5397	# If the @code{N} command had added the last line, print and exit
	5398	$b
	5399	@end group
	5400
	5401	@group
	5402	# The lines are different; print the first and go
	5403	# back working on the second.
	5404	P
	5405	D
	5406	@end group
	5407	@end example
	5408	@c end---------------------------------------------
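
As a quick check, assume the script is saved as @file{uniq.sed} (a
hypothetical file name). Each run of identical adjacent lines collapses
to a single line:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf '%s\n' a a b c c c d | sed -f uniq.sed
a
b
c
d
@end group
@end example
@codequoteundirected off
@codequotebacktick off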
	5409
[3613]	5410	As you can see, we maintain a 2-line window using @code{P} and @code{D}.
[599]	5411	This technique is often used in advanced @command{sed} scripts.
	5412
	5413	@node uniq -d
	5414	@section Print Duplicated Lines of Input
	5415
	5416	This script prints only duplicated lines, like @samp{uniq -d}.
	5417
	5418	@c start-------------------------------------------
	5419	@example
	5420	#!/usr/bin/sed -nf
	5421
	5422	@group
	5423	$b
	5424	N
/^\(.*\)\n\1$/ @{
	5426	# Print the first of the duplicated lines
	5427	s/.*\n//
	5428	p
	5429	@end group
	5430
	5431	@group
	5432	# Loop until we get a different line
	5433	:b
	5434	$b
	5435	N
/^\(.*\)\n\1$/ @{
	5437	s/.*\n//
	5438	bb
	5439	@}
	5440	@}
	5441	@end group
	5442
	5443	@group
	5444	# The last line cannot be followed by duplicates
	5445	$b
	5446	@end group
	5447
	5448	@group
	5449	# Found a different one. Leave it alone in the pattern space
	5450	# and go back to the top, hunting its duplicates
	5451	D
	5452	@end group
	5453	@end example
	5454	@c end---------------------------------------------
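
With the same input as in the previous section, and assuming this script
is saved as @file{uniq-d.sed} (a hypothetical file name), only one copy
of each repeated line is printed. The script expects @option{-n}, as
its first line indicates.

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf '%s\n' a a b c c c d | sed -n -f uniq-d.sed
a
c
@end group
@end example
@codequoteundirected off
@codequotebacktick off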
	5455
	5456	@node uniq -u
	5457	@section Remove All Duplicated Lines
	5458
	5459	This script prints only unique lines, like @samp{uniq -u}.
	5460
	5461	@c start-------------------------------------------
	5462	@example
	5463	#!/usr/bin/sed -f
	5464
	5465	@group
	5466	# Search for a duplicate line --- until that, print what you find.
	5467	$b
	5468	N
/^\(.*\)\n\1$/ ! @{
	5470	P
	5471	D
	5472	@}
	5473	@end group
	5474
	5475	@group
	5476	:c
	5477	# Got two equal lines in pattern space. At the
	5478	# end of the file we simply exit
	5479	$d
	5480	@end group
	5481
	5482	@group
	5483	# Else, we keep reading lines with @code{N} until we
	5484	# find a different one
	5485	s/.*\n//
	5486	N
/^\(.*\)\n\1$/ @{
	5488	bc
	5489	@}
	5490	@end group
	5491
	5492	@group
	5493	# Remove the last instance of the duplicate line
	5494	# and go back to the top
	5495	D
	5496	@end group
	5497	@end example
	5498	@c end---------------------------------------------
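
On the same input, and assuming the script is saved as
@file{uniq-u.sed} (a hypothetical file name), only the lines that are
never repeated survive:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf '%s\n' a a b c c c d | sed -f uniq-u.sed
b
d
@end group
@end example
@codequoteundirected off
@codequotebacktick off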
	5499
	5500	@node cat -s
	5501	@section Squeezing Blank Lines
	5502
As a final example, here are three scripts, of increasing complexity
and speed, that implement the same function as @samp{cat -s}, that is,
squeezing blank lines.
	5506
	5507	The first leaves a blank line at the beginning and end if there are
	5508	some already.
	5509
	5510	@c start-------------------------------------------
	5511	@example
	5512	#!/usr/bin/sed -f
	5513
	5514	@group
	5515	# on empty lines, join with next
	5516	# Note there is a star in the regexp
	5517	:x
	5518	/^\n*$/ @{
	5519	N
	5520	bx
	5521	@}
	5522	@end group
	5523
	5524	@group
	5525	# now, squeeze all '\n', this can be also done by:
# s/^\(\n\)*/\1/
	5527	s/\n*/\
	5528	/
	5529	@end group
	5530	@end example
	5531	@c end---------------------------------------------
	5532
	5533	This one is a bit more complex and removes all empty lines
	5534	at the beginning. It does leave a single blank line at end
	5535	if one was there.
	5536
	5537	@c start-------------------------------------------
	5538	@example
	5539	#!/usr/bin/sed -f
	5540
	5541	@group
	5542	# delete all leading empty lines
	5543	1,/^./@{
	5544	/./!d
	5545	@}
	5546	@end group
	5547
	5548	@group
	5549	# on an empty line we remove it and all the following
	5550	# empty lines, but one
	5551	:x
	5552	/./!@{
	5553	N
	5554	s/^\n$//
	5555	tx
	5556	@}
	5557	@end group
	5558	@end example
	5559	@c end---------------------------------------------
	5560
	5561	This removes leading and trailing blank lines. It is also the
	5562	fastest. Note that loops are completely done with @code{n} and
	5563	@code{b}, without relying on @command{sed} to restart the
[3613]	5564	script automatically at the end of a line.
[599]	5565
	5566	@c start-------------------------------------------
	5567	@example
	5568	#!/usr/bin/sed -nf
	5569
	5570	@group
	5571	# delete all (leading) blanks
	5572	/./!d
	5573	@end group
	5574
	5575	@group
# if we get here, the line is not empty
	5577	:x
	5578	# print it
	5579	p
	5580	# get next
	5581	n
[3613]	5582	# got chars? print it again, etc...
[599]	5583	/./bx
	5584	@end group
	5585
	5586	@group
	5587	# no, don't have chars: got an empty line
	5588	:z
	5589	# get next, if last line we finish here so no trailing
	5590	# empty lines are written
	5591	n
	5592	# also empty? then ignore it, and get next... this will
	5593	# remove ALL empty lines
	5594	/./!bz
	5595	@end group
	5596
	5597	@group
	5598	# all empty lines were deleted/ignored, but we have a non empty. As
	5599	# what we want to do is to squeeze, insert a blank line artificially
	5600	i\
	5601	@end group
	5602
	5603	bx
	5604	@end example
	5605	@c end---------------------------------------------
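
For instance, assume the last script is saved as @file{cat-s.sed} (a
hypothetical file name) and run with @option{-n}, as its first line
requests. Leading and trailing blank lines disappear, and the run of
blank lines in the middle is squeezed to one:

@codequoteundirected on
@codequotebacktick on
@example
@group
$ printf '\n\na\n\n\nb\n\n' | sed -n -f cat-s.sed
a

b
@end group
@end example
@codequoteundirected off
@codequotebacktick off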
	5606
	5607	@node Limitations
	5608	@chapter @value{SSED}'s Limitations and Non-limitations
	5609
[3613]	5610	@cindex GNU extensions, unlimited line length
[599]	5611	@cindex Portability, line length limitations
	5612	For those who want to write portable @command{sed} scripts,
	5613	be aware that some implementations have been known to
	5614	limit line lengths (for the pattern and hold spaces)
	5615	to be no more than 4000 bytes.
	5616	The @sc{posix} standard specifies that conforming @command{sed}
	5617	implementations shall support at least 8192 byte line lengths.
	5618	@value{SSED} has no built-in limit on line length;
	5619	as long as it can @code{malloc()} more (virtual) memory,
	5620	you can feed or construct lines as long as you like.
	5621
	5622	However, recursion is used to handle subpatterns and indefinite
	5623	repetition. This means that the available stack space may limit
	5624	the size of the buffer that can be processed by certain patterns.
	5625
	5626
[3613]	5627	@node Other Resources
	5628	@chapter Other Resources for Learning About @command{sed}
[599]	5629
[3613]	5630	For up to date information about @value{SSED} please
	5631	visit @uref{https://www.gnu.org/software/sed/}.
[599]	5632
[3613]	5633	Send general questions and suggestions to @email{sed-devel@@gnu.org}.
	5634	Visit the mailing list archives for past discussions at
	5635	@uref{https://lists.gnu.org/archive/html/sed-devel/}.
[599]	5636
[3613]	5637	@cindex Additional reading about @command{sed}
	5638	The following resources provide information about @command{sed}
(both @value{SSED} and other variations). Note that these are not maintained by
@value{SSED} developers.
[599]	5641
[3613]	5642	@itemize @bullet
[599]	5643
	5644	@item
[3613]	5645	sed @code{$HOME}: @uref{http://sed.sf.net}
[599]	5646
	5647	@item
[3613]	5648	sed FAQ: @uref{http://sed.sf.net/sedfaq.html}
[599]	5649
	5650	@item
[3613]	5651	seder's grabbag: @uref{http://sed.sf.net/grabbag}
[599]	5652
	5653	@item
[3613]	5654	The @code{sed-users} mailing list maintained by Sven Guckes:
	5655	@uref{http://groups.yahoo.com/group/sed-users/}
	5656	(note this is @emph{not} the @value{SSED} mailing list).
[599]	5657
[3613]	5658	@end itemize
[599]	5659
	5660	@node Reporting Bugs
	5661	@chapter Reporting Bugs
	5662
	5663	@cindex Bugs, reporting
[3613]	5664	Email bug reports to @email{bug-sed@@gnu.org}.
[599]	5665	Also, please include the output of @samp{sed --version} in the body
	5666	of your report if at all possible.
	5667
	5668	Please do not send a bug report like this:
	5669
	5670	@example
[3613]	5671	@i{@i{@r{while building frobme-1.3.4}}}
	5672	$ configure
[599]	5673	@error{} sed: file sedscr line 1: Unknown option to 's'
	5674	@end example
	5675
	5676	If @value{SSED} doesn't configure your favorite package, take a
	5677	few extra minutes to identify the specific problem and make a stand-alone
	5678	test case. Unlike other programs such as C compilers, making such test
	5679	cases for @command{sed} is quite simple.
	5680
	5681	A stand-alone test case includes all the data necessary to perform the
	5682	test, and the specific invocation of @command{sed} that causes the problem.
	5683	The smaller a stand-alone test case is, the better. A test case should
	5684	not involve something as far removed from @command{sed} as ``try to configure
	5685	frobme-1.3.4''. Yes, that is in principle enough information to look
	5686	for the bug, but that is not a very practical prospect.
	5687
	5688	Here are a few commonly reported bugs that are not bugs.
	5689
	5690	@table @asis
[3613]	5691	@anchor{N_command_last_line}
[599]	5692	@item @code{N} command on the last line
	5693	@cindex Portability, @code{N} command on the last line
	5694	@cindex Non-bugs, @code{N} command on the last line
	5695
	5696	Most versions of @command{sed} exit without printing anything when
	5697	the @command{N} command is issued on the last line of a file.
	5698	@value{SSED} prints pattern space before exiting unless of course
	5699	the @command{-n} command switch has been specified. This choice is
	5700	by design.
	5701
Default behavior (GNU extension, non-POSIX conforming):
	5703	@example
$ seq 3 | sed N
	5705	1
	5706	2
	5707	3
	5708	@end example
	5709	@noindent
	5710	To force POSIX-conforming behavior:
	5711	@example
$ seq 3 | sed --posix N
	5713	1
	5714	2
	5715	@end example
	5716
[599]	5717	For example, the behavior of
	5718	@example
	5719	sed N foo bar
	5720	@end example
	5721	@noindent
	5722	would depend on whether foo has an even or an odd number of
	5723	lines@footnote{which is the actual ``bug'' that prompted the
	5724	change in behavior}. Or, when writing a script to read the
	5725	next few lines following a pattern match, traditional
	5726	implementations of @code{sed} would force you to write
	5727	something like
	5728	@example
	5729	/foo/@{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N @}
	5730	@end example
	5731	@noindent
	5732	instead of just
	5733	@example
	5734	/foo/@{ N;N;N;N;N;N;N;N;N; @}
	5735	@end example
[3613]	5736
@cindex @code{POSIXLY_CORRECT} behavior, @code{N} command
In any case, the simplest workaround is to use @code{$d;N} in
scripts that rely on the traditional behavior, or to set
the @env{POSIXLY_CORRECT} environment variable to a non-empty value.
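
The commands below are a minimal sketch comparing both workarounds on a three-line input. They assume the @command{seq} utility used above. The last command shows @code{$!N}, the portable way for a script to keep an unpaired last line:

@example
seq 3 | sed '$d;N'               # prints 1 and 2, as POSIX sed does
seq 3 | POSIXLY_CORRECT=1 sed N  # also prints 1 and 2
seq 3 | sed '$!N;s/\n/ /'        # prints "1 2" and then "3"
@end example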

@item Regex syntax clashes (problems with backslashes)
@cindex GNU extensions, to basic regular expressions
@cindex Non-bugs, regex syntax clashes
@command{sed} uses the @sc{posix} basic regular expression syntax.  According to
the standard, the meaning of some escape sequences is undefined in
this syntax.  The most notable ones in the case of @command{sed} are @code{\|},
@code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<},
@code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.

As in all GNU programs that use @sc{posix} basic regular
expressions, @command{sed} interprets these escape sequences as special
characters.  So, @code{x\+} matches one or more occurrences of @samp{x},
and @code{abc\|def} matches either @samp{abc} or @samp{def}.
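
You can check these extensions with a quick sketch like the following, run in any shell:

@example
echo xxxy | sed 's/x\+/X/'              # prints Xy
echo ac | sed 's/ab\?c/X/'              # prints X: \? makes b optional
echo 'abc def' | sed 's/abc\|def/Z/g'   # prints Z Z
@end example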

This syntax may cause problems when running scripts written for other
@command{sed}s.  Some @command{sed} programs have been written with the
assumption that @code{\|} and @code{\+} match the literal characters
@samp{|} and @samp{+}.  Such scripts must be modified by removing the
spurious backslashes if they are to be used with modern implementations
of @command{sed}, like GNU @command{sed}.
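
For instance, once the backslashes are removed, @samp{+} and @samp{|} are ordinary characters in a basic regular expression. In this minimal check on a made-up input line, the whole line matches:

@example
# Unescaped + and | are literal characters in a basic regex.
echo 'a+b|c' | sed 's/a+b|c/literal/'   # prints literal
@end example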

On the other hand, some scripts use @code{s|abc\|def||g} to remove occurrences
of @emph{either} @samp{abc} or @samp{def}.  While this worked until
@command{sed} 4.0.x, newer versions interpret this as removing the
string @samp{abc|def}.  This is again undefined behavior according to
@sc{posix}, and this interpretation is arguably more robust: older
@command{sed}s, for example, required that the regex matcher parse
@code{\/} as @code{/} in the common case of escaping a slash, which is
again undefined behavior.  The new behavior avoids this, which is good
because the regex matcher is only partially under our control.
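
The commands below sketch the difference on a made-up input line. With @samp{|} as the delimiter, @code{\|} stands for a literal @samp{|}. With any other delimiter, @code{\|} remains available for alternation. Choosing a delimiter other than @samp{/} likewise removes the need to escape slashes:

@example
echo 'abc|def,abc,def' | sed 's|abc\|def||g'   # prints ,abc,def
echo 'abc|def,abc,def' | sed 's/abc\|def//g'   # prints |,,
echo a/b | sed 's/\//:/'                       # prints a:b
echo a/b | sed 's|/|:|'                        # prints a:b, no escaping needed
@end example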

@cindex GNU extensions, special escapes
In addition, this version of @command{sed} supports several escape sequences,
some of them longer than two characters, to insert non-printable characters
in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r},
@code{\t}, @code{\v}, @code{\x}).  These can cause similar problems
with scripts written for other @command{sed}s.
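
A few of these escapes are sketched below. Because they are GNU extensions, other implementations may treat the same scripts differently:

@example
echo 'a b' | sed 's/ /\t/'   # replaces the space with a TAB character
echo A | sed 's/\x41/B/'     # \x41 is hexadecimal for A: prints B
echo A | sed 's/\o101/B/'    # \o101 is octal for A: prints B
echo A | sed 's/\d065/B/'    # \d065 is decimal for A: prints B
@end example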

@item @option{-i} clobbers read-only files
@cindex In-place editing
@cindex @value{SSEDEXT}, in-place editing
@cindex Non-bugs, in-place editing

In short, @samp{sed -i} will let you delete the contents of
a read-only file, and in general the @option{-i} option
(@pxref{Invoking sed, , Invocation}) lets you clobber
protected files.  This is not a bug, but rather a consequence
of how the Unix file system works.

The permissions on a file say what can happen to the data
in that file, while the permissions on a directory say what can
happen to the list of files in that directory.  @samp{sed -i}
never opens for writing a file that is already on disk.
Rather, it works on a temporary file that is finally renamed
to the original name.  Renaming or deleting files actually
modifies the contents of the directory, so the operation depends on
the permissions of the directory, not of the file.  For this same
reason, @command{sed} does not let you use @option{-i} on a writable file
in a read-only directory, and it will break hard or symbolic links when
@option{-i} is used on a linked file.
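
The following is a minimal demonstration, run in a scratch directory created with @samp{mktemp -d}. The file names @file{ro.txt} and @file{link.txt} are made up for the example. The file is read-only, yet @samp{sed -i} replaces it, while its second hard link still refers to the old contents:

@example
cd "$(mktemp -d)"              # scratch directory; the file names are hypothetical
printf 'hello\n' > ro.txt
chmod a-w ro.txt               # the file is now read-only
ln ro.txt link.txt             # and it has a second hard link
sed -i 's/hello/bye/' ro.txt   # succeeds: the directory is writable
cat ro.txt                     # prints bye
cat link.txt                   # prints hello: the hard link was broken
@end example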

@item @code{0a} does not work (gives an error)
@cindex @code{0} address
@cindex GNU extensions, @code{0} address
@cindex Non-bugs, @code{0} address

There is no line 0.  0 is a special address that is only used to treat
addresses like @code{0,/@var{RE}/} as active when the script starts: if
you write @code{1,/abc/d} and the first line includes the string @samp{abc},
then that match would be ignored because address ranges must span at least
two lines (barring the end of the file); but what you probably wanted is
to delete every line up to the first one including @samp{abc}, and this
is obtained with @code{0,/abc/d}.
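
The following sketch uses a four-line sample input made up for this example. It shows the difference between the two ranges. @code{1,/abc/d} only looks for the end of the range from line 2 on, so it deletes everything up to the second @samp{abc}. @code{0,/abc/d} can end the range on line 1:

@example
printf 'abc\nx\nabc\ny\n' | sed '1,/abc/d'   # prints only y
printf 'abc\nx\nabc\ny\n' | sed '0,/abc/d'   # prints x, abc and y
@end example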

@ifclear PERL
@item @code{[a-z]} is case insensitive
@cindex Non-bugs, localization-related

You are encountering problems with locales.  @sc{posix} mandates that @code{[a-z]}
uses the current locale's collation order; in C parlance, that means using
@code{strcoll(3)} instead of @code{strcmp(3)}.  Some locales have a
case-insensitive collation order, others don't.

Another problem is that @code{[a-z]} tries to use collation symbols.
This only happens if you are on the GNU system, using
GNU libc's regular expression matcher instead of compiling the
one supplied with GNU @command{sed}.  In a Danish locale, for example,
the regular expression @code{^[a-z]$} matches the string @samp{aa},
because this is a single collating symbol that comes after @samp{a}
and before @samp{b}.  @samp{ll} behaves similarly in Spanish
locales, and so does @samp{ij} in Dutch locales.

To work around these problems, which may cause bugs in shell scripts, set
the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
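
In the minimal sketch below, @code{[a-z]} then matches only the 26 lowercase ASCII letters. @env{LC_ALL}, when set to a non-empty value, overrides both variables, so the sketch clears it as well:

@example
# LC_ALL is emptied so that LC_COLLATE and LC_CTYPE take effect.
echo aA | LC_ALL= LC_COLLATE=C LC_CTYPE=C sed 's/[a-z]/x/g'   # prints xA
@end example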

@item @code{s/.*//} does not clear pattern space
@cindex Non-bugs, localization-related
@cindex @value{SSEDEXT}, emptying pattern space
@cindex Emptying pattern space

This happens if your input stream includes invalid multibyte
sequences.  @sc{posix} mandates that such sequences
are @emph{not} matched by @code{.}, so @code{s/.*//} will not clear
pattern space as you would expect.  In fact, in most multibyte locales
(including UTF-8 locales), there is no portable way to clear
@command{sed}'s buffers in the middle of the script.  For this reason,
@value{SSED} provides a @code{z} command (for ``zap'') as an extension.
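
As a sketch, consider an input line ending in the byte 0351. That byte is an accented letter in Latin-1 but an invalid sequence in UTF-8. @code{z} empties pattern space regardless of its contents. The substitution that follows only makes the resulting empty line visible:

@example
# z empties pattern space even when it holds invalid multibyte data.
printf 'caf\351\n' | sed 'z;s/^$/(empty)/'   # prints (empty)
@end example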

To work around these problems, which may cause bugs in shell scripts, set
the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
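
With the same input, the workaround can be sketched by setting @env{LC_ALL} to @samp{C}, which overrides both variables. Every byte is then a character, so @code{.*} matches the whole line:

@example
# In the C locale, . matches any byte, including 0351.
printf 'caf\351\n' | LC_ALL=C sed 's/.*//'   # prints an empty line
@end example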
@end ifclear
@end table

@page
@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi


@page
@node Concept Index
@unnumbered Concept Index

This is a general index of all issues discussed in this manual, with the
exception of the @command{sed} commands and command-line options.

@printindex cp

@page
@node Command and Option Index
@unnumbered Command and Option Index

This is an alphabetical list of all @command{sed} commands and command-line
options.

@printindex fn

@contents
@bye

@c XXX FIXME: the term "cycle" is never defined...
