Changeset 3613 for trunk/src/sed/doc


Timestamp:
Sep 19, 2024, 2:34:43 AM
Author:
bird
Message:

src/sed: Merged in changes between 4.1.5 and 4.9 from the vendor branch. (svn merge /vendor/sed/4.1.5 /vendor/sed/current .)

Location:
trunk/src/sed
Files:
6 deleted
8 edited
4 copied

  • trunk/src/sed

  • trunk/src/sed/doc/config.texi

    r599 r3613  
    66
    77@clear PERL
    8 @set SSEDEXT @acronym{GNU} extensions
    9 @set SSED @acronym{GNU} @command{sed}
     8@set SSEDEXT GNU extensions
     9@set SSED GNU @command{sed}
     10
     11@c Ugly hack to enable using new texinfo commands '@codequotebacktick'
     12@c and '@codequoteundirected' or define empty fallbacks if they are
     13@c not available.
     14
     15@ifclear txicommandconditionals
     16@c If we got here, this is a REALLY old texinfo (pre 5.0),
     17@c and '@ifcommandnotdefined' is not defined.
     18@c Assume these commands are not defined as well.
     19@macro codequotebacktick
     20@end macro
     21@macro codequoteundirected
     22@end macro
     23@end ifclear
     24
     25@ifset txicommandconditionals
     26@c if we got here, this texinfo supports checking for defined
     27@c commands. If these commands aren't available - define empty
     28@c fallbacks.
     29@ifcommandnotdefined codequotebacktick
     30@macro codequotebacktick
     31@end macro
     32@macro codequoteundirected
     33@end macro
     34@end ifcommandnotdefined
     35@end ifset
     36
     37
     38@c define variables that will render as characters
     39@c on both HTML (with @U{}) and PDF (with greek symbols).
     40@c Use with: @value{ucsigma}
     41@c
     42@c Based on:
     43@c https://lists.gnu.org/archive/html/help-texinfo/2012-06/msg00004.html
     44@iftex
     45@set ucsigma @math{@Sigma{}}
     46@end iftex
     47@ifnottex
     48@set ucsigma @U{03A3}
     49@end ifnottex
     50
     51@iftex
     52@set lcsigma @math{@sigma{}}
     53@end iftex
     54@ifnottex
     55@set lcsigma @U{03C3}
     56@end ifnottex
     57
     58@c Unicode Replacement Character (U+FFFD):
     59@c no easy/portable tex equivalent, so use another
     60@c distinct symbol (which will be rendered very differently
     61@c than ascii characters in @examples.
     62@iftex
     63@set unicodeFFFD @math{@otimes{}}
     64@end iftex
     65@ifnottex
     66@set unicodeFFFD @U{FFFD}
     67@end ifnottex
  • trunk/src/sed/doc/sed.1

    r599 r3613  
    1 .\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.28.
    2 .TH SED "1" "February 2006" "sed version 4.1.4" "User Commands"
     1.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.48.5.
     2.TH SED "1" "November 2022" "GNU sed 4.9" "User Commands"
    33.SH NAME
    44sed \- stream editor for filtering and transforming text
    55.SH SYNOPSIS
    6 .B sed
    7 [\fIOPTION\fR]... \fI{script-only-if-no-other-script} \fR[\fIinput-file\fR]...
     6.nf
     7sed [-V] [--version] [--help] [-n] [--quiet] [--silent]
     8    [-l N] [--line-length=N] [-u] [--unbuffered]
     9    [-E] [-r] [--regexp-extended]
     10    [-e script] [--expression=script]
     11    [-f script-file] [--file=script-file]
     12    [script-if-no-other-script]
     13    [file...]
     14.fi
    815.SH DESCRIPTION
    916.ds sd \fIsed\fP
     
    2532suppress automatic printing of pattern space
    2633.HP
    27 \fB\-e\fR script, \fB\-\-expression\fR=\fIscript\fR
     34\fB\-\-debug\fR
     35.IP
     36annotate program execution
     37.HP
     38\fB\-e\fR script, \fB\-\-expression\fR=\fI\,script\/\fR
    2839.IP
    2940add the script to the commands to be executed
    3041.HP
    31 \fB\-f\fR script-file, \fB\-\-file\fR=\fIscript\-file\fR
    32 .IP
    33 add the contents of script-file to the commands to be executed
    34 .HP
    35 \fB\-i[SUFFIX]\fR, \fB\-\-in\-place\fR[=\fISUFFIX\fR]
    36 .IP
    37 edit files in place (makes backup if extension supplied)
    38 .HP
    39 \fB\-l\fR N, \fB\-\-line\-length\fR=\fIN\fR
    40 .IP
    41 specify the desired line-wrap length for the `l' command
     42\fB\-f\fR script\-file, \fB\-\-file\fR=\fI\,script\-file\/\fR
     43.IP
     44add the contents of script\-file to the commands to be executed
     45.HP
     46\fB\-\-follow\-symlinks\fR
     47.IP
     48follow symlinks when processing in place
     49.HP
     50\fB\-i[SUFFIX]\fR, \fB\-\-in\-place\fR[=\fI\,SUFFIX\/\fR]
     51.IP
     52edit files in place (makes backup if SUFFIX supplied)
     53.HP
     54\fB\-l\fR N, \fB\-\-line\-length\fR=\fI\,N\/\fR
     55.IP
     56specify the desired line\-wrap length for the `l' command
    4257.HP
    4358\fB\-\-posix\fR
     
    4560disable all GNU extensions.
    4661.HP
    47 \fB\-r\fR, \fB\-\-regexp\-extended\fR
    48 .IP
    49 use extended regular expressions in the script.
     62\fB\-E\fR, \fB\-r\fR, \fB\-\-regexp\-extended\fR
     63.IP
     64use extended regular expressions in the script
     65(for portability use POSIX \fB\-E\fR).
    5066.HP
    5167\fB\-s\fR, \fB\-\-separate\fR
    5268.IP
    53 consider files as separate rather than as a single continuous
    54 long stream.
     69consider files as separate rather than as a single,
     70continuous long stream.
     71.HP
     72\fB\-\-sandbox\fR
     73.IP
     74operate in sandbox mode (disable e/r/w commands).
    5575.HP
    5676\fB\-u\fR, \fB\-\-unbuffered\fR
     
    5878load minimal amounts of data from the input files and flush
    5979the output buffers more often
     80.HP
     81\fB\-z\fR, \fB\-\-null\-data\fR
     82.IP
     83separate lines by NUL characters
    6084.TP
    6185\fB\-\-help\fR
     
    6690.PP
    6791If no \fB\-e\fR, \fB\-\-expression\fR, \fB\-f\fR, or \fB\-\-file\fR option is given, then the first
    68 non-option argument is taken as the sed script to interpret.  All
     92non\-option argument is taken as the sed script to interpret.  All
    6993remaining arguments are names of input files; if no input files are
    7094specified, then the standard input is read.
    7195.PP
    72 E-mail bug reports to: bonzini@gnu.org .
    73 Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
     96GNU sed home page: <https://www.gnu.org/software/sed/>.
     97General help using GNU software: <https://www.gnu.org/gethelp/>.
     98E\-mail bug reports to: <bug\-sed@gnu.org>.
    7499.SH "COMMAND SYNOPSIS"
    75100This is just a brief synopsis of \*(sd commands to serve as
     
    89114.RI # comment
    90115The comment extends until the next newline (or the end of a
    91 .B -e
     116.B \-e
    92117script fragment).
    93118.TP
     
    114139which has each embedded newline preceded by a backslash.
    115140.TP
    116 q
     141q [\fIexit-code\fR]
    117142Immediately quit the \*(sd script without processing
    118 any more input,
    119 except that if auto-print is not disabled
    120 the current pattern space will be printed.
    121 .TP
    122 Q
     143any more input, except that if auto-print is not disabled
     144the current pattern space will be printed.  The exit code
     145argument is a GNU extension.
     146.TP
     147Q [\fIexit-code\fR]
    123148Immediately quit the \*(sd script without processing
    124 any more input.
     149any more input.  This is a GNU extension.
    125150.TP
    126151.RI r\  filename
     
    131156Append a line read from
    132157.IR filename .
     158Each invocation of the command reads a line from the file.
     159This is a GNU extension.
    133160.SS
    134161Commands which accept address ranges
     
    144171is omitted, branch to end of script.
    145172.TP
    146 .RI t\  label
    147 If a s/// has done a successful substitution since the
    148 last input line was read and since the last t or T
    149 command, then branch to
    150 .IR label ;
    151 if
    152 .I label
    153 is omitted, branch to end of script.
    154 .TP
    155 .RI T\  label
    156 If no s/// has done a successful substitution since the
    157 last input line was read and since the last t or T
    158 command, then branch to
    159 .IR label ;
    160 if
    161 .I label
    162 is omitted, branch to end of script.
    163 .TP
    164173c \e
    165174.TP
     
    174183.TP
    175184D
    176 Delete up to the first embedded newline in the pattern space.
    177 Start next cycle, but skip reading from the input
    178 if there is still data in the pattern space.
     185If pattern space contains no newline, start a normal new cycle as if
     186the d command was issued.  Otherwise, delete text in the pattern
     187space up to the first newline, and restart cycle with the resultant
     188pattern space, without reading a new line of input.
    179189.TP
    180190h H
     
    184194Copy/append hold space to pattern space.
    185195.TP
    186 x
    187 Exchange the contents of the hold and pattern spaces.
    188 .TP
    189196l
    190197List out the current line in a ``visually unambiguous'' form.
     198.TP
     199.RI l\  width
     200List out the current line in a ``visually unambiguous'' form,
     201breaking it at
     202.I width
     203characters.  This is a GNU extension.
    191204.TP
    192205n N
     
    215228.IR regexp .
    216229.TP
     230.RI t\  label
     231If a s/// has done a successful substitution since the
     232last input line was read and since the last t or T
     233command, then branch to
     234.IR label ;
     235if
     236.I label
     237is omitted, branch to end of script.
     238.TP
     239.RI T\  label
     240If no s/// has done a successful substitution since the
     241last input line was read and since the last t or T
     242command, then branch to
     243.IR label ;
     244if
     245.I label
     246is omitted, branch to end of script.  This is a GNU
     247extension.
     248.TP
    217249.RI w\  filename
    218250Write the current pattern space to
     
    222254Write the first line of the current pattern space to
    223255.IR filename .
     256This is a GNU extension.
     257.TP
     258x
     259Exchange the contents of the hold and pattern spaces.
    224260.TP
    225261.RI y/ source / dest /
     
    269305.I number
    270306Match only the specified line
    271 .IR number .
     307.IR number
     308(which increments cumulatively across files, unless the
     309.B \-s
     310option is specified on the command line).
    272311.TP
    273312.IR first ~ step
     
    276315line starting with line
    277316.IR first .
    278 For example, ``sed -n 1~2p'' will print all the odd-numbered lines in
     317For example, ``sed \-n 1~2p'' will print all the odd-numbered lines in
    279318the input stream, and the address 2~5 will match every fifth line,
    280 starting with the second. (This is an extension.)
     319starting with the second.
     320.I first
     321can be zero; in this case, \*(sd operates as if it were equal to
     322.IR step .
     323(This is an extension.)
    281324.TP
    282325$
     
    286329Match lines matching the regular expression
    287330.IR regexp .
     331Matching is performed on the current pattern space, which
     332can be modified with commands such as ``s///''.
    288333.TP
    289334.BI \fR\e\fPc regexp c
     
    309354.RI 1, addr2
    310355form will still be at the beginning of its range.
     356This works only when
     357.I addr2
     358is a regular expression.
    311359.TP
    312360.IR addr1 ,+ N
     
    337385.BR \et ,
    338386and other sequences.
     387The \fI-E\fP option switches to using extended regular expressions instead;
     388it has been supported for years by GNU sed, and is now
     389included in POSIX.
    339390.SH BUGS
    340391.PP
    341392E-mail bug reports to
    342 .BR bonzini@gnu.org .
    343 Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
    344 Also, please include the output of ``sed --version'' in the body
     393.BR bug-sed@gnu.org .
     394Also, please include the output of ``sed \-\-version'' in the body
    345395of your report if at all possible.
     396.SH AUTHOR
     397Written by Jay Fenlason, Tom Lord, Ken Pizzini,
     398Paolo Bonzini, Jim Meyering, and Assaf Gordon.
     399.PP
     400This sed program was built with SELinux support.
     401SELinux is enabled on this system.
     402.PP
     403GNU sed home page: <https://www.gnu.org/software/sed/>.
     404General help using GNU software: <https://www.gnu.org/gethelp/>.
     405E\-mail bug reports to: <bug\-sed@gnu.org>.
    346406.SH COPYRIGHT
    347 Copyright \(co 2003 Free Software Foundation, Inc.
     407Copyright \(co 2022 Free Software Foundation, Inc.
     408License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
    348409.br
    349 This is free software; see the source for copying conditions.  There is NO
    350 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
    351 to the extent permitted by law.
     410This is free software: you are free to change and redistribute it.
     411There is NO WARRANTY, to the extent permitted by law.
    352412.SH "SEE ALSO"
    353413.BR awk (1),
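The updated sed.1 text above documents several behaviors that are easier to grasp when run: a zero `first` in `first~step`, line numbers that count across files unless `-s` is given, `-i` with a backup suffix, `Q` with an exit code, and `-z`. The script below is a minimal sketch that exercises them. It assumes GNU sed 4.x is installed as `sed` (BSD and macOS sed lack `Q`, `-z` and `~` addresses), and the sample files `one.txt` and `two.txt` are hypothetical.

```shell
#!/bin/sh
# Sketch: exercise several GNU sed 4.9 behaviors documented in sed.1.
# Assumption: GNU sed is installed as `sed` (BSD/macOS sed lacks Q, -z, ~).
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'a\nb\n' > "$tmp/one.txt"   # hypothetical sample inputs
printf 'c\nd\n' > "$tmp/two.txt"

# first~step: 1~2 selects odd lines; a first of 0 acts as if equal to step.
odd=$(printf '%s\n' 1 2 3 4 5 6 | sed -n '1~2p' | paste -sd, -)
every3=$(printf '%s\n' 1 2 3 4 5 6 | sed -n '0~3p' | paste -sd, -)

# Line numbers run cumulatively across input files unless -s is given.
joined=$(sed -n '3p' "$tmp/one.txt" "$tmp/two.txt")
separate=$(sed -s -n '1p' "$tmp/one.txt" "$tmp/two.txt" | paste -sd, -)

# -i[SUFFIX]: edit in place; a suffix keeps a backup of the original file.
sed -i.bak 's/a/A/' "$tmp/one.txt"
edited=$(paste -sd, - < "$tmp/one.txt")
backup=$(paste -sd, - < "$tmp/one.txt.bak")

# Q [exit-code]: quit without printing, with a custom status (GNU extension).
set +e
printf 'x\n' | sed 'Q42' > /dev/null
qstatus=$?
set -e

# -z: records are NUL-terminated instead of newline-terminated.
nul=$(printf 'foo\000bar\000' | sed -z 's/^/>/' | tr '\000' '\n' | paste -sd, -)

echo "odd=$odd every3=$every3 joined=$joined separate=$separate"
echo "edited=$edited backup=$backup qstatus=$qstatus nul=$nul"
```

With GNU sed this should print `odd=1,3,5 every3=3,6 joined=c separate=a,c` followed by `edited=A,b backup=a,b qstatus=42 nul=>foo,>bar`. Note that the suffix is attached directly to `-i` (`-i.bak`); as the sed.info text below explains, `-i` takes an optional argument, so a separate word would be read as the script or a file name.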
  • trunk/src/sed/doc/sed.info

    r599 r3613  
    1 This is ../../doc/sed.info, produced by makeinfo version 4.5 from
    2 ../../doc/sed.texi.
    3 
     1This is sed.info, produced by makeinfo version 6.8dev from sed.texi.
     2
     3This file documents version 4.9 of GNU ‘sed’, a stream editor.
     4
     5   Copyright © 1998–2022 Free Software Foundation, Inc.
     6
     7     Permission is granted to copy, distribute and/or modify this
     8     document under the terms of the GNU Free Documentation License,
     9     Version 1.3 or any later version published by the Free Software
     10     Foundation; with no Invariant Sections, no Front-Cover Texts, and
     11     no Back-Cover Texts.  A copy of the license is included in the
     12     section entitled “GNU Free Documentation License”.
    413INFO-DIR-SECTION Text creation and manipulation
    514START-INFO-DIR-ENTRY
     
    817END-INFO-DIR-ENTRY
    918
    10 This file documents version 4.1.5 of GNU `sed', a stream editor.
    11 
    12    Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004 Free Software
    13 Foundation, Inc.
    14 
    15    This document is released under the terms of the GNU Free
    16 Documentation License as published by the Free Software Foundation;
    17 either version 1.1, or (at your option) any later version.
    18 
    19    You should have received a copy of the GNU Free Documentation
    20 License along with GNU `sed'; see the file `COPYING.DOC'.  If not,
    21 write to the Free Software Foundation, 59 Temple Place - Suite 330,
    22 Boston, MA 02110-1301, USA.
    23 
    24    There are no Cover Texts and no Invariant Sections; this text, along
    25 with its equivalent in the printed manual, constitutes the Title Page.
    26 
    27 Indirect:
    28 sed.info-1: 935
    29 sed.info-2: 50405
     19
     20File: sed.info,  Node: Top,  Next: Introduction,  Up: (dir)
     21
     22GNU ‘sed’
     23*********
     24
     25This file documents version 4.9 of GNU ‘sed’, a stream editor.
     26
     27   Copyright © 1998–2022 Free Software Foundation, Inc.
     28
     29     Permission is granted to copy, distribute and/or modify this
     30     document under the terms of the GNU Free Documentation License,
     31     Version 1.3 or any later version published by the Free Software
     32     Foundation; with no Invariant Sections, no Front-Cover Texts, and
     33     no Back-Cover Texts.  A copy of the license is included in the
     34     section entitled “GNU Free Documentation License”.
     35
     36* Menu:
     37
     38* Introduction::               Introduction
     39* Invoking sed::               Invocation
     40* sed scripts::                ‘sed’ scripts
     41* sed addresses::              Addresses: selecting lines
     42* sed regular expressions::    Regular expressions: selecting text
     43* advanced sed::               Advanced ‘sed’: cycles and buffers
     44* Examples::                   Some sample scripts
     45* Limitations::                Limitations and (non-)limitations of GNU ‘sed’
     46* Other Resources::            Other resources for learning about ‘sed’
     47* Reporting Bugs::             Reporting bugs
     48* GNU Free Documentation License:: Copying and sharing this manual
     49* Concept Index::              A menu with all the topics in this manual.
     50* Command and Option Index::   A menu with all ‘sed’ commands and
     51                               command-line options.
     52
     53
     54File: sed.info,  Node: Introduction,  Next: Invoking sed,  Prev: Top,  Up: Top
     55
     561 Introduction
     57**************
     58
     59‘sed’ is a stream editor.  A stream editor is used to perform basic text
     60transformations on an input stream (a file or input from a pipeline).
     61While in some ways similar to an editor which permits scripted edits
     62(such as ‘ed’), ‘sed’ works by making only one pass over the input(s),
     63and is consequently more efficient.  But it is ‘sed’’s ability to filter
     64text in a pipeline which particularly distinguishes it from other types
     65of editors.
     66
     67
     68File: sed.info,  Node: Invoking sed,  Next: sed scripts,  Prev: Introduction,  Up: Top
     69
     702 Running sed
     71*************
     72
     73This chapter covers how to run ‘sed’.  Details of ‘sed’ scripts and
     74individual ‘sed’ commands are discussed in the next chapter.
     75
     76* Menu:
     77
     78* Overview::
     79* Command-Line Options::
     80* Exit status::
     81
     82
     83File: sed.info,  Node: Overview,  Next: Command-Line Options,  Up: Invoking sed
     84
     852.1 Overview
     86============
     87
     88Normally ‘sed’ is invoked like this:
     89
     90     sed SCRIPT INPUTFILE...
     91
     92   For example, to change every ‘hello’ to ‘world’ in the file
     93‘input.txt’:
     94
     95     sed 's/hello/world/g' input.txt > output.txt
     96
     97   Without the ‘g’ (global) modifier, ‘sed’ affects only the first
     98instance per line.
     99
     100   If you do not specify INPUTFILE, or if INPUTFILE is ‘-’, ‘sed’
     101filters the contents of the standard input.  The following commands are
     102equivalent:
     103
     104     sed 's/hello/world/g' input.txt > output.txt
     105     sed 's/hello/world/g' < input.txt > output.txt
     106     cat input.txt | sed 's/hello/world/g' - > output.txt
     107
     108   ‘sed’ writes output to standard output.  Use ‘-i’ to edit files
     109in-place instead of printing to standard output.  See also the ‘W’ and
     110‘s///w’ commands for writing output to other files.  The following
     111command modifies ‘file.txt’ and does not produce any output:
     112
     113     sed -i 's/hello/world/' file.txt
     114
     115   By default ‘sed’ prints all processed input (except input that has
     116been modified/deleted by commands such as ‘d’).  Use ‘-n’ to suppress
     117output, and the ‘p’ command to print specific lines.  The following
     118command prints only line 45 of the input file:
     119
     120     sed -n '45p' file.txt
     121
     122   ‘sed’ treats multiple input files as one long stream.  The following
     123example prints the first line of the first file (‘one.txt’) and the last
     124line of the last file (‘three.txt’).  Use ‘-s’ to reverse this behavior.
     125
     126     sed -n  '1p ; $p' one.txt two.txt three.txt
     127
     128   Without ‘-e’ or ‘-f’ options, ‘sed’ uses the first non-option
     129parameter as the SCRIPT, and the following non-option parameters as
     130input files.  If ‘-e’ or ‘-f’ options are used to specify a SCRIPT, all
     131non-option parameters are taken as input files.  Options ‘-e’ and ‘-f’
     132can be combined, and can appear multiple times (in which case the final
     133effective SCRIPT will be concatenation of all the individual SCRIPTs).
     134
     135   The following examples are equivalent:
     136
     137     sed 's/hello/world/' input.txt > output.txt
     138
     139     sed -e 's/hello/world/' input.txt > output.txt
     140     sed --expression='s/hello/world/' input.txt > output.txt
     141
     142     echo 's/hello/world/' > myscript.sed
     143     sed -f myscript.sed input.txt > output.txt
     144     sed --file=myscript.sed input.txt > output.txt
     145
     146
     147File: sed.info,  Node: Command-Line Options,  Next: Exit status,  Prev: Overview,  Up: Invoking sed
     148
     1492.2 Command-Line Options
     150========================
     151
     152The full format for invoking ‘sed’ is:
     153
     154     sed OPTIONS... [SCRIPT] [INPUTFILE...]
     155
     156   ‘sed’ may be invoked with the following command-line options:
     157
     158‘--version’
     159     Print out the version of ‘sed’ that is being run and a copyright
     160     notice, then exit.
     161
     162‘--help’
     163     Print a usage message briefly summarizing these command-line
     164     options and the bug-reporting address, then exit.
     165
     166‘-n’
     167‘--quiet’
     168‘--silent’
     169     By default, ‘sed’ prints out the pattern space at the end of each
     170     cycle through the script (*note How ‘sed’ works: Execution Cycle.).
     171     These options disable this automatic printing, and ‘sed’ only
     172     produces output when explicitly told to via the ‘p’ command.
     173
     174‘--debug’
     175     Print the input sed program in canonical form, and annotate program
     176     execution.
     177          $ echo 1 | sed '\%1%s21232'
     178          3
     179
     180          $ echo 1 | sed --debug '\%1%s21232'
     181          SED PROGRAM:
     182            /1/ s/1/3/
     183          INPUT:   'STDIN' line 1
     184          PATTERN: 1
     185          COMMAND: /1/ s/1/3/
     186          PATTERN: 3
     187          END-OF-CYCLE:
     188          3
     189
     190‘-e SCRIPT’
     191‘--expression=SCRIPT’
     192     Add the commands in SCRIPT to the set of commands to be run while
     193     processing the input.
     194
     195‘-f SCRIPT-FILE’
     196‘--file=SCRIPT-FILE’
     197     Add the commands contained in the file SCRIPT-FILE to the set of
     198     commands to be run while processing the input.
     199
     200‘-i[SUFFIX]’
     201‘--in-place[=SUFFIX]’
     202     This option specifies that files are to be edited in-place.  GNU
     203     ‘sed’ does this by creating a temporary file and sending output to
     204     this file rather than to the standard output.(1).
     205
     206     This option implies ‘-s’.
     207
     208     When the end of the file is reached, the temporary file is renamed
     209     to the output file’s original name.  The extension, if supplied, is
     210     used to modify the name of the old file before renaming the
     211     temporary file, thereby making a backup copy(2)).
     212
     213     This rule is followed: if the extension doesn’t contain a ‘*’, then
     214     it is appended to the end of the current filename as a suffix; if
     215     the extension does contain one or more ‘*’ characters, then _each_
     216     asterisk is replaced with the current filename.  This allows you to
     217     add a prefix to the backup file, instead of (or in addition to) a
     218     suffix, or even to place backup copies of the original files into
     219     another directory (provided the directory already exists).
     220
     221     If no extension is supplied, the original file is overwritten
     222     without making a backup.
     223
     224     Because ‘-i’ takes an optional argument, it should not be followed
     225     by other short options:
     226     ‘sed -Ei '...' FILE’
     227          Same as ‘-E -i’ with no backup suffix - ‘FILE’ will be edited
     228          in-place without creating a backup.
     229
     230     ‘sed -iE '...' FILE’
     231          This is equivalent to ‘--in-place=E’, creating ‘FILEE’ as
     232          backup of ‘FILE’
     233
     234     Be cautious of using ‘-n’ with ‘-i’: the former disables automatic
     235     printing of lines and the latter changes the file in-place without
     236     a backup.  Used carelessly (and without an explicit ‘p’ command),
     237     the output file will be empty:
     238          # WRONG USAGE: 'FILE' will be truncated.
     239          sed -ni 's/foo/bar/' FILE
     240
     241‘-l N’
     242‘--line-length=N’
     243     Specify the default line-wrap length for the ‘l’ command.  A length
     244     of 0 (zero) means to never wrap long lines.  If not specified, it
     245     is taken to be 70.
     246
     247‘--posix’
     248     GNU ‘sed’ includes several extensions to POSIX sed.  In order to
     249     simplify writing portable scripts, this option disables all the
     250     extensions that this manual documents, including additional
     251     commands.  Most of the extensions accept ‘sed’ programs that are
     252     outside the syntax mandated by POSIX, but some of them (such as the
     253     behavior of the ‘N’ command described in *note Reporting Bugs::)
     254     actually violate the standard.  If you want to disable only the
     255     latter kind of extension, you can set the ‘POSIXLY_CORRECT’
     256     variable to a non-empty value.
     257
     258‘-b’
     259‘--binary’
     260     This option is available on every platform, but is only effective
     261     where the operating system makes a distinction between text files
     262     and binary files.  When such a distinction is made—as is the case
     263     for MS-DOS, Windows, Cygwin—text files are composed of lines
     264     separated by a carriage return _and_ a line feed character, and
     265     ‘sed’ does not see the ending CR. When this option is specified,
     266     ‘sed’ will open input files in binary mode, thus not requesting
     267     this special processing and considering lines to end at a line
     268     feed.
     269
     270‘--follow-symlinks’
     271     This option is available only on platforms that support symbolic
     272     links and has an effect only if option ‘-i’ is specified.  In this
     273     case, if the file that is specified on the command line is a
     274     symbolic link, ‘sed’ will follow the link and edit the ultimate
     275     destination of the link.  The default behavior is to break the
     276     symbolic link, so that the link destination will not be modified.
     277
     278‘-E’
     279‘-r’
     280‘--regexp-extended’
     281     Use extended regular expressions rather than basic regular
     282     expressions.  Extended regexps are those that ‘egrep’ accepts; they
     283     can be clearer because they usually have fewer backslashes.
     284     Historically this was a GNU extension, but the ‘-E’ extension has
     285     since been added to the POSIX standard
     286     (http://austingroupbugs.net/view.php?id=528), so use ‘-E’ for
     287     portability.  GNU sed has accepted ‘-E’ as an undocumented option
     288     for years, and *BSD seds have accepted ‘-E’ for years as well, but
     289     scripts that use ‘-E’ might not port to other older systems.  *Note
     290     Extended regular expressions: ERE syntax.
     291
     292‘-s’
     293‘--separate’
     294     By default, ‘sed’ will consider the files specified on the command
     295     line as a single continuous long stream.  This GNU ‘sed’ extension
     296     allows the user to consider them as separate files: range addresses
     297     (such as ‘/abc/,/def/’) are not allowed to span several files, line
     298     numbers are relative to the start of each file, ‘$’ refers to the
     299     last line of each file, and files invoked from the ‘R’ commands are
     300     rewound at the start of each file.
     301
     302‘--sandbox’
     303     In sandbox mode, ‘e/w/r’ commands are rejected - programs
     304     containing them will be aborted without being run.  Sandbox mode
     305     ensures ‘sed’ operates only on the input files designated on the
     306     command line, and cannot run external programs.
     307
     308‘-u’
     309‘--unbuffered’
     310     Buffer both input and output as minimally as practical.  (This is
     311     particularly useful if the input is coming from the likes of ‘tail
     312     -f’, and you wish to see the transformed output as soon as
     313     possible.)
     314
     315‘-z’
     316‘--null-data’
     317‘--zero-terminated’
     318     Treat the input as a set of lines, each terminated by a zero byte
     319     (the ASCII ‘NUL’ character) instead of a newline.  This option can
     320     be used with commands like ‘sort -z’ and ‘find -print0’ to process
     321     arbitrary file names.
     322
     323   If no ‘-e’, ‘-f’, ‘--expression’, or ‘--file’ options are given on
     324the command-line, then the first non-option argument on the command line
     325is taken to be the SCRIPT to be executed.
     326
     327   If any command-line parameters remain after processing the above,
     328these parameters are interpreted as the names of input files to be
     329processed.  A file name of ‘-’ refers to the standard input stream.  The
     330standard input will be processed if no file names are specified.
     331
     332   ---------- Footnotes ----------
     333
     334   (1) This applies to commands such as ‘=’, ‘a’, ‘c’, ‘i’, ‘l’, ‘p’.
     335You can still write to the standard output by using the ‘w’ or ‘W’
     336commands together with the ‘/dev/stdout’ special file
     337
     338   (2) Note that GNU ‘sed’ creates the backup file whether or not any
     339output is actually changed.
     340
     341
     342File: sed.info,  Node: Exit status,  Prev: Command-Line Options,  Up: Invoking sed
     343
     3442.3 Exit status
     345===============
     346
     347An exit status of zero indicates success, and a nonzero value indicates
     348failure.  GNU ‘sed’ returns the following exit status error values:
     349
     3500
     351     Successful completion.
     352
     3531
     354     Invalid command, invalid syntax, invalid regular expression or a
     355     GNU ‘sed’ extension command used with ‘--posix’.
     356
     3572
     358     One or more of the input file specified on the command line could
     359     not be opened (e.g.  if a file is not found, or read permission is
     360     denied).  Processing continued with other files.
     361
     3624
     363     An I/O error, or a serious processing error during runtime, GNU
     364     ‘sed’ aborted immediately.
     365
     366   Additionally, the commands ‘q’ and ‘Q’ can be used to terminate ‘sed’
     367with a custom exit code value (this is a GNU ‘sed’ extension):
     368
     369     $ echo | sed 'Q42' ; echo $?
     370     42
     371
     372
     373File: sed.info,  Node: sed scripts,  Next: sed addresses,  Prev: Invoking sed,  Up: Top
     374
     3753 ‘sed’ scripts
     376***************
     377
     378* Menu:
     379
     380* sed script overview::      ‘sed’ script overview
     381* sed commands list::        ‘sed’ commands summary
     382* The "s" Command::          ‘sed’’s Swiss Army Knife
     383* Common Commands::          Often used commands
     384* Other Commands::           Less frequently used commands
     385* Programming Commands::     Commands for ‘sed’ gurus
     386* Extended Commands::        Commands specific of GNU ‘sed’
     387* Multiple commands syntax:: Extension for easier scripting
     388
     389
     390File: sed.info,  Node: sed script overview,  Next: sed commands list,  Up: sed scripts
     391
     3923.1 ‘sed’ script overview
     393=========================
     394
     395A ‘sed’ program consists of one or more ‘sed’ commands, passed in by one
     396or more of the ‘-e’, ‘-f’, ‘--expression’, and ‘--file’ options, or the
     397first non-option argument if none of these options are used.  This
     398document will refer to “the” ‘sed’ script; this is understood to mean
     399the in-order concatenation of all of the SCRIPTs and SCRIPT-FILEs passed
     400in.  *Note Overview::.
     401
     402   ‘sed’ commands follow this syntax:
     403
     404     [addr]X[options]
     405
     406   X is a single-letter ‘sed’ command.  ‘[addr]’ is an optional line
     407address.  If ‘[addr]’ is specified, the command X will be executed only
     408on the matched lines.  ‘[addr]’ can be a single line number, a regular
     409expression, or a range of lines (*note sed addresses::).  Additional
     410‘[options]’ are used for some ‘sed’ commands.
     411
     412   The following example deletes lines 30 to 35 in the input.  ‘30,35’
     413is an address range.  ‘d’ is the delete command:
     414
     415     sed '30,35d' input.txt > output.txt
     416
     417   The following example prints all input until a line starting with the
     418string ‘foo’ is found.  If such a line is found, ‘sed’ will terminate with
     419exit status 42.  If no such line is found (and no other error
     420occurred), ‘sed’ will exit with status 0.  ‘/^foo/’ is a
     421regular-expression address.  ‘q’ is the quit command.  ‘42’ is the
     422command option.
     423
     424     sed '/^foo/q42' input.txt > output.txt
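   Both examples above read a file named ‘input.txt’.  A self-contained
variant feeds generated input through a pipe instead.  This is a sketch
assuming GNU ‘sed’ and the ‘seq’ utility; the input lines are made up
for the illustration.

```shell
# Deleting lines 30-35 of a 40-line input leaves 34 lines.
seq 40 | sed '30,35d' | wc -l

# Print input up to and including the first line that starts with "foo",
# then quit with exit status 42.  '||' captures the nonzero status
# instead of letting it abort the script.
printf '%s\n' alpha foobar omega | sed '/^foo/q42' || echo "exit status: $?"
```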
     425
     426   Commands within a SCRIPT or SCRIPT-FILE can be separated by
     427semicolons (‘;’) or newlines (ASCII 10).  Multiple scripts can be
     428specified with ‘-e’ or ‘-f’ options.
     429
     430   The following examples are all equivalent.  They perform two ‘sed’
     431operations: deleting any lines matching the regular expression ‘/^foo/’,
     432and replacing all occurrences of the string ‘hello’ with ‘world’:
     433
     434     sed '/^foo/d ; s/hello/world/g' input.txt > output.txt
     435
     436     sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt
     437
     438     echo '/^foo/d' > script.sed
     439     echo 's/hello/world/g' >> script.sed
     440     sed -f script.sed input.txt > output.txt
     441
     442     echo 's/hello/world/g' > script2.sed
     443     sed -e '/^foo/d' -f script2.sed input.txt > output.txt
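   One way to convince yourself that these four invocations are
equivalent is to run them all on the same input and compare the
results.  The sketch below assumes a POSIX shell with ‘mktemp’ and
‘cmp’ available; the contents of ‘input.txt’ are invented for the
illustration.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"
printf '%s\n' 'hello there' 'foo hello' 'say hello hello' > input.txt

sed '/^foo/d ; s/hello/world/g' input.txt > out1.txt
sed -e '/^foo/d' -e 's/hello/world/g' input.txt > out2.txt

echo '/^foo/d' > script.sed
echo 's/hello/world/g' >> script.sed
sed -f script.sed input.txt > out3.txt

echo 's/hello/world/g' > script2.sed
sed -e '/^foo/d' -f script2.sed input.txt > out4.txt

# Each output is "world there" followed by "say world world".
cat out1.txt
cmp out1.txt out2.txt && cmp out1.txt out3.txt && cmp out1.txt out4.txt &&
  echo "all four outputs are identical"
```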
     444
     445   Commands ‘a’, ‘c’, ‘i’, due to their syntax, cannot be followed by
     446semicolons working as command separators and thus should be terminated
     447with newlines or be placed at the end of a SCRIPT or SCRIPT-FILE.
     448Commands can also be preceded with optional non-significant whitespace
     449characters.  *Note Multiple commands syntax::.
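   The semicolon restriction on ‘a’, ‘c’ and ‘i’ is easy to trip over.
In the sketch below (assuming GNU ‘sed’, whose one-line ‘a TEXT’ form is
an extension), the ‘;’ simply becomes part of the appended text, while
ending the text with a newline lets the next command run as intended.

```shell
# The ';' does not separate commands here: "hello ; 3d" is appended
# verbatim after line 2, and line 3 is not deleted (1, 2, hello ; 3d, 3).
seq 3 | sed '2a hello ; 3d'

# Separate -e chunks are joined with newlines, which ends the text, so
# '3d' runs as its own command (1, 2, hello).
seq 3 | sed -e '2a hello' -e '3d'
```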
     450
     451
     452File: sed.info,  Node: sed commands list,  Next: The "s" Command,  Prev: sed script overview,  Up: sed scripts
     453
     4543.2 ‘sed’ commands summary
     455==========================
     456
     457The following commands are supported in GNU ‘sed’.  Some are standard
     458POSIX commands, while others are GNU extensions.  Details and examples
     459for each command are in the following sections.  (Mnemonics) are shown
     460in parentheses.
     461
     462‘a\’
     463‘TEXT’
     464     Append TEXT after a line.
     465
     466‘a TEXT’
     467     Append TEXT after a line (alternative syntax).
     468
     469‘b LABEL’
     470     Branch unconditionally to LABEL.  The LABEL may be omitted, in
     471     which case the next cycle is started.
     472
     473‘c\’
     474‘TEXT’
     475     Replace (change) lines with TEXT.
     476
     477‘c TEXT’
     478     Replace (change) lines with TEXT (alternative syntax).
     479
     480‘d’
     481     Delete the pattern space; immediately start next cycle.
     482
     483‘D’
     484     If pattern space contains newlines, delete text in the pattern
     485     space up to the first newline, and restart cycle with the resultant
     486     pattern space, without reading a new line of input.
     487
     488     If pattern space contains no newline, start a normal new cycle as
     489     if the ‘d’ command was issued.
     490
     491‘e’
     492     Executes the command that is found in pattern space and replaces
     493     the pattern space with the output; a trailing newline is
     494     suppressed.
     495
     496‘e COMMAND’
     497     Executes COMMAND and sends its output to the output stream.  The
     498     command can run across multiple lines, all but the last ending with
     499     a backslash.
     500
     501‘F’
     502     (filename) Print the file name of the current input file (with a
     503     trailing newline).
     504
     505‘g’
     506     Replace the contents of the pattern space with the contents of the
     507     hold space.
     508
     509‘G’
     510     Append a newline to the contents of the pattern space, and then
     511     append the contents of the hold space to that of the pattern space.
     512
     513‘h’
     514     (hold) Replace the contents of the hold space with the contents of
     515     the pattern space.
     516
     517‘H’
     518     Append a newline to the contents of the hold space, and then append
     519     the contents of the pattern space to that of the hold space.
     520
     521‘i\’
     522‘TEXT’
     523     Insert TEXT before a line.
     524
     525‘i TEXT’
     526     Insert TEXT before a line (alternative syntax).
     527
     528‘l’
     529     Print the pattern space in an unambiguous form.
     530
     531‘n’
     532     (next) If auto-print is not disabled, print the pattern space,
     533     then, regardless, replace the pattern space with the next line of
     534     input.  If there is no more input then ‘sed’ exits without
     535     processing any more commands.
     536
     537‘N’
     538     Add a newline to the pattern space, then append the next line of
     539     input to the pattern space.  If there is no more input then ‘sed’
     540     exits without processing any more commands.
     541
     542‘p’
     543     Print the pattern space.
     544
     545‘P’
     546     Print the pattern space, up to the first <newline>.
     547
     548‘q[EXIT-CODE]’
     549     (quit) Exit ‘sed’ without processing any more commands or input.
     550
     551‘Q[EXIT-CODE]’
     552     (quit) This command is the same as ‘q’, but will not print the
     553     contents of pattern space.  Like ‘q’, it provides the ability to
     554     return an exit code to the caller.
     555
     556‘r filename’
     557     Reads file FILENAME.
     558
     559‘R filename’
     560     Queue a line of FILENAME to be read and inserted into the output
     561     stream at the end of the current cycle, or when the next input line
     562     is read.
     563
     564‘s/REGEXP/REPLACEMENT/[FLAGS]’
     565     (substitute) Match the regular-expression against the content of
     566     the pattern space.  If found, replace matched string with
     567     REPLACEMENT.
     568
     569‘t LABEL’
     570     (test) Branch to LABEL only if there has been a successful
     571     ‘s’ubstitution since the last input line was read or conditional
     572     branch was taken.  The LABEL may be omitted, in which case the next
     573     cycle is started.
     574
     575‘T LABEL’
     576     (test) Branch to LABEL only if there have been no successful
     577     ‘s’ubstitutions since the last input line was read or conditional
     578     branch was taken.  The LABEL may be omitted, in which case the next
     579     cycle is started.
     580
     581‘v [VERSION]’
     582     (version) This command does nothing, but makes ‘sed’ fail if GNU
     583     ‘sed’ extensions are not supported, or if the requested version is
     584     not available.
     585
     586‘w filename’
     587     Write the pattern space to FILENAME.
     588
     589‘W filename’
     590     Write to FILENAME the portion of the pattern space up to the
     591     first newline.
     592
     593‘x’
     594     Exchange the contents of the hold and pattern spaces.
     595
     596‘y/SOURCE-CHARS/DEST-CHARS/’
     597     Transliterate any characters in the pattern space which match any
     598     of the SOURCE-CHARS with the corresponding character in DEST-CHARS.
     599
     600‘z’
     601     (zap) This command empties the content of pattern space.
     602
     603‘#’
     604     A comment, until the next newline.
     605
     606‘{ CMD ; CMD ... }’
     607     Group several commands together.
     608
     609‘=’
     610     Print the current input line number (with a trailing newline).
     611
     612‘: LABEL’
     613     Specify the location of LABEL for branch commands (‘b’, ‘t’, ‘T’).
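   Several commands from this summary are often combined.  As one
illustration (a well-known idiom, not something this list prescribes),
‘G’, ‘h’ and ‘p’, together with the ‘-n’ option and the ‘$’ (last line)
address, reverse the order of the input lines, much like ‘tac’:

```shell
# For every line except the first, 'G' appends the hold space (the
# lines seen so far, newest first) to the pattern space; 'h' saves the
# result back.  On the last line, '$p' prints the accumulated text.
seq 3 | sed -n '1!G;h;$p'     # 3, 2, 1
```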
     614
     615
     616File: sed.info,  Node: The "s" Command,  Next: Common Commands,  Prev: sed commands list,  Up: sed scripts
     617
     6183.3 The ‘s’ Command
     619===================
     620
     621The ‘s’ command (as in substitute) is probably the most important in
     622‘sed’ and has a lot of different options.  The syntax of the ‘s’ command
     623is ‘s/REGEXP/REPLACEMENT/FLAGS’.
     624
     625   Its basic concept is simple: the ‘s’ command attempts to match the
     626pattern space against the supplied regular expression REGEXP; if the
     627match is successful, then that portion of the pattern space which was
     628matched is replaced with REPLACEMENT.
     629
     630   For details about REGEXP syntax *note Regular Expression Addresses:
     631Regexp Addresses.
     632
     633   The REPLACEMENT can contain ‘\N’ (N being a number from 1 to 9,
     634inclusive) references, which refer to the portion of the match which is
     635contained between the Nth ‘\(’ and its matching ‘\)’.  Also, the
     636REPLACEMENT can contain unescaped ‘&’ characters which reference the
     637whole matched portion of the pattern space.
     638
     639   The ‘/’ characters may be uniformly replaced by any other single
     640character within any given ‘s’ command.  The ‘/’ character (or whatever
     641other character is used in its stead) can appear in the REGEXP or
     642REPLACEMENT only if it is preceded by a ‘\’ character.
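   A few small cases illustrate back-references, ‘&’ and the choice of
delimiter.  The input strings are arbitrary examples, and the commands
use only the basic regular-expression features described above.

```shell
# \1 and \2 refer to the two parenthesized groups: prints "Smith, John".
echo 'John Smith' | sed 's/\(.*\) \(.*\)/\2, \1/'

# & stands for the whole match: prints "ab<12>cd".
echo 'ab12cd' | sed 's/[0-9][0-9]*/<&>/'

# With '|' as the delimiter, '/' needs no escaping: prints "/opt/bin".
echo '/usr/local/bin' | sed 's|/usr/local|/opt|'

# The same substitution with '/' as the delimiter: prints "/opt/bin".
echo '/usr/local/bin' | sed 's/\/usr\/local/\/opt/'
```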
     643
     644   Finally, as a GNU ‘sed’ extension, you can include a special sequence
     645made of a backslash and one of the letters ‘L’, ‘l’, ‘U’, ‘u’, or ‘E’.
     646The meaning is as follows:
     647
     648‘\L’
     649     Turn the replacement to lowercase until a ‘\U’ or ‘\E’ is found,
     650
     651‘\l’
     652     Turn the next character to lowercase,
     653
     654‘\U’
     655     Turn the replacement to uppercase until a ‘\L’ or ‘\E’ is found,
     656
     657‘\u’
     658     Turn the next character to uppercase,
     659
     660‘\E’
     661     Stop case conversion started by ‘\L’ or ‘\U’.
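   The following sketch exercises these sequences.  It requires GNU
‘sed’; ‘\w’ and ‘\+’ in the first command are GNU regular-expression
extensions as well, and the inputs are arbitrary.

```shell
# \u uppercases the first character of each match: prints "Hello World".
echo 'hello world' | sed 's/\w\+/\u&/g'

# \U converts the rest of the replacement: prints "HELLO WORLD".
echo 'Hello World' | sed 's/.*/\U&/'

# \E ends the conversion started by \U: prints "FOO bar".
echo 'foo bar' | sed 's/\(foo\) \(bar\)/\U\1\E \2/'

# \l lowercases only the next character: prints "hELLO".
echo 'HELLO' | sed 's/.*/\l&/'
```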
     662
     663   When the ‘g’ flag is being used, case conversion does not propagate
     664from one occurrence of the regular expression to another.  For example,
     665when the following command is executed with ‘a-b-’ in pattern space:
     666     s/\(b\?\)-/x\u\1/g
     667
     668the output is ‘axxB’.  When replacing the first ‘-’, the ‘\u’ sequence
     669only affects the empty replacement of ‘\1’.  It does not affect the ‘x’
     670character that is added to pattern space when replacing ‘b-’ with ‘xB’.
     671
     672   On the other hand, ‘\l’ and ‘\u’ do affect the remainder of the
     673replacement text if they are followed by an empty substitution.  With
     674‘a-b-’ in pattern space, the following command:
     675     s/\(b\?\)-/\u\1x/g
     676
     677will replace ‘-’ with ‘X’ (uppercase) and ‘b-’ with ‘Bx’.  If this
     678behavior is undesirable, you can prevent it by adding a ‘\E’
     679sequence—after ‘\1’ in this case.
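   Both commands above can be checked directly.  This requires GNU
‘sed’, since ‘\?’ and ‘\u’ are GNU extensions:

```shell
# \u only affects the empty \1, not the 'x' of a later match: "axxB".
echo 'a-b-' | sed 's/\(b\?\)-/x\u\1/g'

# \u before an empty \1 carries over to the following 'x': "aXBx".
echo 'a-b-' | sed 's/\(b\?\)-/\u\1x/g'
```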
     680
     681   To include a literal ‘\’, ‘&’, or newline in the final replacement,
     682be sure to precede the desired ‘\’, ‘&’, or newline in the REPLACEMENT
     683with a ‘\’.
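   For example (the inputs are arbitrary; the backslash-newline form in
the last command is the POSIX way to put a newline into REPLACEMENT):

```shell
# An escaped & is a literal ampersand: prints "a & b".
echo 'a+b' | sed 's/+/ \& /'

# An escaped backslash is a literal backslash: prints "a\b".
echo 'a/b' | sed 's|/|\\|'

# A backslash followed by a newline inserts a newline: prints "a" and
# "b" on separate lines.
echo 'a,b' | sed 's/,/\
/'
```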
     684
     685   The ‘s’ command can be followed by zero or more of the following
     686FLAGS:
     687
     688‘g’
     689     Apply the replacement to _all_ matches to the REGEXP, not just the
     690     first.
     691
     692‘NUMBER’
     693     Only replace the NUMBERth match of the REGEXP.
     694
     695     Note: the POSIX standard does not
     696     specify what should happen when you mix the ‘g’ and NUMBER
     697     modifiers, and currently there is no widely agreed upon meaning
     698     across ‘sed’ implementations.  For GNU ‘sed’, the interaction is
     699     defined to be: ignore matches before the NUMBERth, and then match
     700     and replace all matches from the NUMBERth on.
     701
     702‘p’
     703     If the substitution was made, then print the new pattern space.
     704
     705     Note: when both the ‘p’ and ‘e’ options are specified, the relative
     706     ordering of the two produces very different results.  In general,
     707     ‘ep’ (evaluate then print) is what you want, but operating the
     708     other way round can be useful for debugging.  For this reason, the
     709     current version of GNU ‘sed’ treats ‘p’ options placed both before
     710     and after ‘e’ specially, printing the pattern space
     711     before and after evaluation, while in general flags for the ‘s’
     712     command show their effect just once.  This behavior, although
     713     documented, might change in future versions.
     714
     715‘w FILENAME’
     716     If the substitution was made, then write out the result to the
     717     named file.  As a GNU ‘sed’ extension, two special values of
     718     FILENAME are supported: ‘/dev/stderr’, which writes the result to
     719     the standard error, and ‘/dev/stdout’, which writes to the standard
     720     output.(1)
     721
     722‘e’
     723     This command allows one to pipe input from a shell command into
     724     pattern space.  If a substitution was made, the command that is
     725     found in pattern space is executed and pattern space is replaced
     726     with its output.  A trailing newline is suppressed; results are
     727     undefined if the command to be executed contains a NUL character.
     728     This is a GNU ‘sed’ extension.
     729
     730‘I’
     731‘i’
     732     The ‘I’ modifier to regular-expression matching is a GNU extension
     733     which makes ‘sed’ match REGEXP in a case-insensitive manner.
     734
     735‘M’
     736‘m’
     737     The ‘M’ modifier to regular-expression matching is a GNU ‘sed’
     738     extension which directs GNU ‘sed’ to match the regular expression
     739     in ‘multi-line’ mode.  The modifier causes ‘^’ and ‘$’ to match
     740     respectively (in addition to the normal behavior) the empty string
     741     after a newline, and the empty string before a newline.  There are
     742     special character sequences (‘\`’ and ‘\'’) which always match the
     743     beginning or the end of the buffer.  In addition, the period
     744     character does not match a new-line character in multi-line mode.
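   The following sketch exercises most of these flags.  It assumes GNU
‘sed’, since combining NUMBER with ‘g’, ‘w /dev/stdout’, ‘e’, ‘I’ and
‘M’ are GNU extensions; the input strings are invented, and the ‘e’
example runs ‘echo’ through the shell.

```shell
# NUMBER and g: first match, all matches, 3rd match, 3rd match onward.
echo aaaa | sed 's/a/x/'       # xaaa
echo aaaa | sed 's/a/x/g'      # xxxx
echo aaaa | sed 's/a/x/3'      # aaxa
echo aaaa | sed 's/a/x/3g'     # aaxx

# p with -n: print only lines where a substitution was made: "bANana".
printf '%s\n' apple banana cherry | sed -n 's/an/AN/p'

# w /dev/stdout: behaves like p here, since -i is not in use: "two".
seq 3 | sed -n 's/2/two/w /dev/stdout'

# e: the resulting pattern space "echo there" is run as a shell
# command and replaced by its output: "there".
echo 'echo hi' | sed 's/hi/there/e'

# I: case-insensitive matching: "x x x".
echo 'a A a' | sed 's/a/x/Ig'

# M: '^' also matches after the embedded newline that N adds:
# prints "> 1" and "> 2".
seq 2 | sed 'N;s/^/> /Mg'
```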
     745
     746   ---------- Footnotes ----------
     747
     748   (1) This is equivalent to ‘p’ unless the ‘-i’ option is being used.
     749
     750
     751File: sed.info,  Node: Common Commands,  Next: Other Commands,  Prev: The "s" Command,  Up: sed scripts
     752
     7533.4 Often-Used Commands
     754=======================
     755
     756If you use ‘sed’ at all, you will quite likely want to know these
     757commands.
     758
     759‘#’
     760     [No addresses allowed.]
     761
     762     The ‘#’ character begins a comment; the comment continues until the
     763     next newline.
     764
     765     If you are concerned about portability, be aware that some
     766     implementations of ‘sed’ (which are not POSIX conforming) may only
     767     support a single one-line comment, and then only when the very
     768     first character of the script is a ‘#’.
     769
     770     Warning: if the first two characters of the ‘sed’ script are ‘#n’,
     771     then the ‘-n’ (no-autoprint) option is forced.  If you want to put
     772     a comment in the first line of your script and that comment begins
     773     with the letter ‘n’ and you do not want this behavior, then be sure
     774     to either use a capital ‘N’, or place at least one space before the
     775     ‘n’.
     776
     777‘q [EXIT-CODE]’
     778     Exit ‘sed’ without processing any more commands or input.
     779
     780     Example: stop after printing the second line:
     781          $ seq 3 | sed 2q
     782          1
     783          2
     784
     785     This command accepts only one address.  Note that the current
     786     pattern space is printed if auto-print is not disabled with the
     787     ‘-n’ option.  The ability to return an exit code from the ‘sed’
     788     script is a GNU ‘sed’ extension.
     789
     790     See also the GNU ‘sed’ extension ‘Q’ command which quits silently
     791     without printing the current pattern space.
     792
     793‘d’
     794     Delete the pattern space; immediately start next cycle.
     795
     796     Example: delete the second input line:
     797          $ seq 3 | sed 2d
     798          1
     799          3
     800
     801‘p’
     802     Print out the pattern space (to the standard output).  This command
     803     is usually only used in conjunction with the ‘-n’ command-line
     804     option.
     805
     806     Example: print only the second input line:
     807          $ seq 3 | sed -n 2p
     808          2
     809
     810‘n’
     811     If auto-print is not disabled, print the pattern space, then,
     812     regardless, replace the pattern space with the next line of input.
     813     If there is no more input then ‘sed’ exits without processing any
     814     more commands.
     815
     816     This command is useful to skip lines (e.g.  process every Nth
     817     line).
     818
     819     Example: perform substitution on every 3rd line (i.e.  two ‘n’
     820     commands skip two lines):
     821          $ seq 6 | sed 'n;n;s/./x/'
     822          1
     823          2
     824          x
     825          4
     826          5
     827          x
     828
     829     GNU ‘sed’ provides an extension address syntax of FIRST~STEP to
     830     achieve the same result:
     831
     832          $ seq 6 | sed '0~3s/./x/'
     833          1
     834          2
     835          x
     836          4
     837          5
     838          x
     839
     840‘{ COMMANDS }’
     841     A group of commands may be enclosed between ‘{’ and ‘}’ characters.
     842     This is particularly useful when you want a group of commands to be
     843     triggered by a single address (or address-range) match.
     844
     845     Example: perform substitution then print the second input line:
     846          $ seq 3 | sed -n '2{s/2/X/ ; p}'
     847          X
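   Two details from this section are worth seeing in action: the
special meaning of ‘#n’ on the first line of a script, and the
difference between ‘q’ and the GNU ‘Q’ command when an exit code is
given.  A sketch, assuming GNU ‘sed’:

```shell
# '#n' as the first line of the script behaves like -n: prints only "2".
seq 3 | sed '#n
2p'

# '# n' (with a space) is an ordinary comment, so auto-print stays on
# and line 2 appears twice: 1, 2, 2, 3.
seq 3 | sed '# n
2p'

# q prints the current line before exiting; Q exits without printing.
# '||' captures the custom exit status (7) without aborting the script.
seq 5 | sed '3q7' || echo "q exit status: $?"   # 1, 2, 3, then 7
seq 5 | sed '3Q7' || echo "Q exit status: $?"   # 1, 2, then 7
```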
     848
     849
     850File: sed.info,  Node: Other Commands,  Next: Programming Commands,  Prev: Common Commands,  Up: sed scripts
     851
     8523.5 Less Frequently-Used Commands
     853=================================
     854
     855Though perhaps less frequently used than those in the previous section,
     856some very small yet useful ‘sed’ scripts can be built with these
     857commands.
     858
     859‘y/SOURCE-CHARS/DEST-CHARS/’
     860     Transliterate any characters in the pattern space which match any
     861     of the SOURCE-CHARS with the corresponding character in DEST-CHARS.
     862
     863     Example: transliterate ‘a-j’ into ‘0-9’:
     864          $ echo hello world | sed 'y/abcdefghij/0123456789/'
     865          74llo worl3
     866
     867     (The ‘/’ characters may be uniformly replaced by any other single
     868     character within any given ‘y’ command.)
     869
     870     Instances of the ‘/’ (or whatever other character is used in its
     871     stead), ‘\’, or newlines can appear in the SOURCE-CHARS or
     872     DEST-CHARS lists, provided that each instance is escaped by a ‘\’.
     873     The SOURCE-CHARS and DEST-CHARS lists _must_ contain the same
     874     number of characters (after de-escaping).
     875
     876     See the ‘tr’ command from GNU coreutils for similar functionality.
     877
     878‘a TEXT’
     879     Append TEXT after a line.  This is a GNU extension to the
     880     standard ‘a’ command; see below for details.
     881
     882     Example: Add ‘hello’ after the second line:
     883          $ seq 3 | sed '2a hello'
     884          1
     885          2
     886          hello
     887          3
     888
     889     Leading whitespace after the ‘a’ command is ignored.  The text to
     890     add is read until the end of the line.
     891
     892‘a\’
     893‘TEXT’
     894     Append TEXT after a line.
     895
     896     Example: Add ‘hello’ after the second line (⊣ indicates printed
     897     output lines):
     898          $ seq 3 | sed '2a\
     899          hello'
     900          ⊣1
     901          ⊣2
     902          ⊣hello
     903          ⊣3
     904
     905     The ‘a’ command queues the lines of text which follow this command
     906     (each but the last ending with a ‘\’, which are removed from the
     907     output) to be output at the end of the current cycle, or when the
     908     next input line is read.
     909
     910     As a GNU extension, this command accepts two addresses.
     911
     912     Escape sequences in TEXT are processed, so you should use ‘\\’ in
     913     TEXT to print a single backslash.
     914
     915     The TEXT ends at the first line that does not end with a backslash
     916     (‘world’ in the following example); commands resume after it:
     917          $ seq 3 | sed '2a\
     918          hello\
     919          world
     920          3s/./X/'
     921          ⊣1
     922          ⊣2
     923          ⊣hello
     924          ⊣world
     925          ⊣X
     926
     927     As a GNU extension, the ‘a’ command and TEXT can be separated into
     928     two ‘-e’ parameters, enabling easier scripting:
     929          $ seq 3 | sed -e '2a\' -e hello
     930          1
     931          2
     932          hello
     933          3
     934
     935          $ sed -e '2a\' -e "$VAR"
     936
     937‘i TEXT’
     938     Insert TEXT before a line.  This is a GNU extension to the standard
     939     ‘i’ command; see below for details.
     940
     941     Example: Insert ‘hello’ before the second line:
     942          $ seq 3 | sed '2i hello'
     943          1
     944          hello
     945          2
     946          3
     947
     948     Leading whitespace after the ‘i’ command is ignored.  The text to
     949     add is read until the end of the line.
     950
     951‘i\’
     952‘TEXT’
     953     Immediately output the lines of text which follow this command.
     954
     955     Example: Insert ‘hello’ before the second line (⊣ indicates printed
     956     output lines):
     957          $ seq 3 | sed '2i\
     958          hello'
     959          ⊣1
     960          ⊣hello
     961          ⊣2
     962          ⊣3
     963
     964     As a GNU extension, this command accepts two addresses.
     965
     966     Escape sequences in TEXT are processed, so you should use ‘\\’ in
     967     TEXT to print a single backslash.
     968
     969     The TEXT ends at the first line that does not end with a backslash
     970     (‘world’ in the following example); commands resume after it:
     971          $ seq 3 | sed '2i\
     972          hello\
     973          world
     974          s/./X/'
     975          ⊣X
     976          ⊣hello
     977          ⊣world
     978          ⊣X
     979          ⊣X
     980
     981     As a GNU extension, the ‘i’ command and TEXT can be separated into
     982     two ‘-e’ parameters, enabling easier scripting:
     983          $ seq 3 | sed -e '2i\' -e hello
     984          1
     985          hello
     986          2
     987          3
     988
     989          $ sed -e '2i\' -e "$VAR"
     990
     991‘c TEXT’
     992     Replaces the line(s) with TEXT.  This is a GNU extension to the
     993     standard ‘c’ command; see below for details.
     994
     995     Example: Replace the 2nd to 9th lines with the word ‘hello’:
     996          $ seq 10 | sed '2,9c hello'
     997          1
     998          hello
     999          10
     1000
     1001     Leading whitespace after the ‘c’ command is ignored.  The text to
     1002     add is read until the end of the line.
     1003
     1004‘c\’
     1005‘TEXT’
     1006     Delete the lines matching the address or address-range, and output
     1007     the lines of text which follow this command.
     1008
     1009     Example: Replace 2nd to 4th lines with the words ‘hello’ and
     1010     ‘world’ (⊣ indicates printed output lines):
     1011          $ seq 5 | sed '2,4c\
     1012          hello\
     1013          world'
     1014          ⊣1
     1015          ⊣hello
     1016          ⊣world
     1017          ⊣5
     1018
     1019     If no addresses are given, each line is replaced.
     1020
     1021     A new cycle is started after this command is done, since the
     1022     pattern space will have been deleted.  In the following example,
     1023     the ‘c’ starts a new cycle and the substitution command is not
     1024     performed on the replaced text:
     1025
     1026          $ seq 3 | sed '2c\
     1027          hello
     1028          s/./X/'
     1029          ⊣X
     1030          ⊣hello
     1031          ⊣X
     1032
     1033     As a GNU extension, the ‘c’ command and TEXT can be separated into
     1034     two ‘-e’ parameters, enabling easier scripting:
     1035          $ seq 3 | sed -e '2c\' -e hello
     1036          1
     1037          hello
     1038          3
     1039
     1040          $ sed -e '2c\' -e "$VAR"
     1041
     1042‘=’
     1043     Print out the current input line number (with a trailing newline).
     1044
     1045          $ printf '%s\n' aaa bbb ccc | sed =
     1046          1
     1047          aaa
     1048          2
     1049          bbb
     1050          3
     1051          ccc
     1052
     1053     As a GNU extension, this command accepts two addresses.
     1054
     1055‘l N’
     1056     Print the pattern space in an unambiguous form: non-printable
     1057     characters (and the ‘\’ character) are printed in C-style escaped
     1058     form; long lines are split, with a trailing ‘\’ character to
     1059     indicate the split; the end of each line is marked with a ‘$’.
     1060
     1061     N specifies the desired line-wrap length; a length of 0 (zero)
     1062     means to never wrap long lines.  If omitted, the default as
     1063     specified on the command line is used.  The N parameter is a GNU
     1064     ‘sed’ extension.
     1065
     1066‘r FILENAME’
     1067
     1068     Reads file FILENAME.  Example:
     1069
     1070          $ seq 3 | sed '2r/etc/hostname'
     1071          1
     1072          2
     1073          fencepost.gnu.org
     1074          3
     1075
     1076     Queue the contents of FILENAME to be read and inserted into the
     1077     output stream at the end of the current cycle, or when the next
     1078     input line is read.  Note that if FILENAME cannot be read, it is
     1079     treated as if it were an empty file, without any error indication.
     1080
     1081     As a GNU ‘sed’ extension, the special value ‘/dev/stdin’ is
     1082     supported for the file name, which reads the contents of the
     1083     standard input.
     1084
     1085     As a GNU extension, this command accepts two addresses.  The file
     1086     will then be reread and inserted on each of the addressed lines.
     1087
     1088     As a GNU ‘sed’ extension, the ‘r’ command accepts a zero address,
     1089     inserting a file _before_ the first line of the input (*note Adding
     1090     a header to multiple files::).
     1091
     1092‘w FILENAME’
     1093     Write the pattern space to FILENAME.  As a GNU ‘sed’ extension, two
     1094     special values of FILENAME are supported: ‘/dev/stderr’, which
     1095     writes the result to the standard error, and ‘/dev/stdout’, which
     1096     writes to the standard output.(1)
     1097
     1098     The file will be created (or truncated) before the first input line
     1099     is read; all ‘w’ commands (including instances of the ‘w’ flag on
     1100     successful ‘s’ commands) which refer to the same FILENAME are
     1101     output without closing and reopening the file.
     1102
     1103‘D’
     1104     If pattern space contains no newline, start a normal new cycle as
     1105     if the ‘d’ command was issued.  Otherwise, delete text in the
     1106     pattern space up to the first newline, and restart cycle with the
     1107     resultant pattern space, without reading a new line of input.
     1108
     1109‘N’
     1110     Add a newline to the pattern space, then append the next line of
     1111     input to the pattern space.  If there is no more input then ‘sed’
     1112     exits without processing any more commands.
     1113
     1114     When ‘-z’ is used, a zero byte (the ASCII ‘NUL’ character) is added
     1115     between the lines (instead of a newline).
     1116
     1117     By default ‘sed’ does not terminate if there is no ‘next’ input
     1118     line.  This is a GNU extension which can be disabled with
     1119     ‘--posix’.  *Note N command on the last line: N_command_last_line.
     1120
     1121‘P’
     1122     Print out the portion of the pattern space up to the first newline.
     1123
     1124‘h’
     1125     Replace the contents of the hold space with the contents of the
     1126     pattern space.
     1127
     1128‘H’
     1129     Append a newline to the contents of the hold space, and then append
     1130     the contents of the pattern space to that of the hold space.
     1131
     1132‘g’
     1133     Replace the contents of the pattern space with the contents of the
     1134     hold space.
     1135
     1136‘G’
     1137     Append a newline to the contents of the pattern space, and then
     1138     append the contents of the hold space to that of the pattern space.
     1139
     1140‘x’
     1141     Exchange the contents of the hold and pattern spaces.
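   Several commands in this section have no example above.  The sketch
below, assuming GNU ‘sed’ and using invented input, shows ‘l’, ‘N’, ‘P’
and ‘D’, ‘G’, ‘w’, and the split ‘a\’ form fed from a shell variable
(‘VAR’ and the scratch file from ‘mktemp’ are just example choices).

```shell
# l: a TAB and a backslash shown in escaped form, '$' marks the end:
# prints a\tb\\c$
printf 'a\tb\\c\n' | sed -n l

# N: join each pair of lines: "1 2", "3 4", "5 6".
seq 6 | sed 'N;s/\n/ /'

# N, P and D: keep a two-line window and print its first line only when
# it differs from the second, dropping consecutive duplicates (like
# 'uniq'): a, b, c.
printf '%s\n' a a b b b c | sed '$!N; /^\(.*\)\n\1$/!P; D'

# G with an empty hold space appends an empty line (double spacing).
seq 2 | sed G

# w: write the matching lines to a scratch file, then show it: 2, 4.
out=$(mktemp)
seq 4 | sed -n "/[24]/w $out"
cat "$out"
rm -f "$out"

# a\ with TEXT supplied by a separate -e, here from a variable.
VAR='hello world'
seq 3 | sed -e '2a\' -e "$VAR"     # 1, 2, hello world, 3
```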
     1142
     1143   ---------- Footnotes ----------
     1144
     1145   (1) This is equivalent to ‘p’ unless the ‘-i’ option is being used.
     1146
     1147
     1148File: sed.info,  Node: Programming Commands,  Next: Extended Commands,  Prev: Other Commands,  Up: sed scripts
     1149
     11503.6 Commands for ‘sed’ gurus
     1151============================
     1152
     1153In most cases, use of these commands indicates that you are probably
     1154better off programming in something like ‘awk’ or Perl.  But
     1155occasionally one is committed to sticking with ‘sed’, and these commands
     1156can enable one to write quite convoluted scripts.
     1157
     1158‘: LABEL’
     1159     [No addresses allowed.]
     1160
     1161     Specify the location of LABEL for branch commands.  In all other
     1162     respects, a no-op.
     1163
     1164‘b LABEL’
     1165     Unconditionally branch to LABEL.  The LABEL may be omitted, in
     1166     which case the next cycle is started.
     1167
     1168‘t LABEL’
     1169     Branch to LABEL only if there has been a successful ‘s’ substitution
     1170     since the last input line was read or conditional branch was taken.
     1171     The LABEL may be omitted, in which case the next cycle is started.
     1172
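Here is a sketch of ‘:’, ‘b’ and ‘t’ working together in a script.  It assumes GNU ‘sed’, because ‘\B’ and ‘\>’ are GNU regular expression extensions.  The thousands-separator loop is a well-known idiom, shown only as an illustration.

```shell
# A 't' loop that inserts thousands separators:
#   :a   label marking the top of the loop
#   s    put a comma before the last three-digit group that is not yet
#        separated ('\B' = not at a word boundary, '\>' = end of word)
#   ta   if that substitution succeeded, branch back to 'a' and retry
echo 1234567 | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'     # 1,234,567

# 'b LABEL' skips the rest of the script: lines starting with '#' jump
# straight to the label 'end', so only the other lines get a prefix.
printf '%s\n' '# comment' code |
  sed -e '/^#/b end' -e 's/^/> /' -e ':end'       # # comment, > code
```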
     1173
     1174File: sed.info,  Node: Extended Commands,  Next: Multiple commands syntax,  Prev: Programming Commands,  Up: sed scripts
     1175
     11763.7 Commands Specific to GNU ‘sed’
     1177==================================
     1178
     1179These commands are specific to GNU ‘sed’, so you must use them with care
     1180and only when you are sure that giving up portability is acceptable.  They
     1181allow you to check for GNU ‘sed’ extensions or to do tasks that are
     1182required quite often but are not supported by standard ‘sed’ implementations.
     1183
     1184‘e [COMMAND]’
     1185     This command allows one to pipe input from a shell command into
     1186     pattern space.  Without parameters, the ‘e’ command executes the
     1187     command that is found in pattern space and replaces the pattern
     1188     space with the output; a trailing newline is suppressed.
     1189
     1190     If a parameter is specified, instead, the ‘e’ command interprets it
     1191     as a command and sends its output to the output stream.  The
     1192     command can run across multiple lines, all but the last ending with
     1193     a backslash.
     1194
     1195     In both cases, the results are undefined if the command to be
     1196     executed contains a NUL character.
     1197
     1198     Note that, unlike the ‘r’ command, the output of the command will
     1199     be printed immediately; the ‘r’ command instead delays the output
     1200     to the end of the current cycle.
     1201
     1202‘F’
     1203     Print out the file name of the current input file (with a trailing
     1204     newline).
     1205
     1206‘Q [EXIT-CODE]’
     1207     This command accepts only one address.
     1208
     1209     This command is the same as ‘q’, but will not print the contents of
     1210     pattern space.  Like ‘q’, it provides the ability to return an exit
     1211     code to the caller.
     1212
     1213     This command can be useful because the only alternative ways to
     1214     accomplish this apparently trivial function are to use the ‘-n’
     1215     option (which can unnecessarily complicate your script) or to
     1216     resort to the following snippet, which wastes time by reading
     1217     the whole file without any visible effect:
     1218
     1219          :eat
     1220          $d       # Quit silently on the last line
     1221          N        # Read another line, silently
     1222          g        # Overwrite pattern space each time to save memory
     1223          b eat
     1224
     1225‘R FILENAME’
     1226     Queue a line of FILENAME to be read and inserted into the output
     1227     stream at the end of the current cycle, or when the next input line
     1228     is read.  Note that if FILENAME cannot be read, or if its end is
     1229     reached, no line is appended, without any error indication.
     1230
     1231     As with the ‘r’ command, the special value ‘/dev/stdin’ is
     1232     supported for the file name, which reads a line from the standard
     1233     input.
     1234
     1235‘T LABEL’
     1236     Branch to LABEL only if there have been no successful
     1237     ‘s’ substitutions since the last input line was read or conditional
     1238     branch was taken.  The LABEL may be omitted, in which case the next
     1239     cycle is started.
     1240
     1241‘v VERSION’
     1242     This command does nothing, but makes ‘sed’ fail if GNU ‘sed’
     1243     extensions are not supported, simply because other versions of
     1244     ‘sed’ do not implement it.  In addition, you can specify the
     1245     version of ‘sed’ that your script requires, such as ‘4.0.5’.  The
     1246     default is ‘4.0’ because that is the first version that implemented
     1247     this command.
     1248
     1249     This command enables all GNU extensions even if ‘POSIXLY_CORRECT’
     1250     is set in the environment.
     1251
     1252‘W FILENAME’
     1253     Write to the given filename the portion of the pattern space up to
     1254     the first newline.  Everything said under the ‘w’ command about
     1255     file handling holds here too.
     1256
     1257‘z’
     1258     This command empties the content of pattern space.  It is usually
     1259     the same as ‘s/.*//’, but is more efficient and works in the
     1260     presence of invalid multibyte sequences in the input stream.  POSIX
     1261     mandates that such sequences are _not_ matched by ‘.’, so that
     1262     there is no portable way to clear ‘sed’’s buffers in the middle of
     1263     the script in most multibyte locales (including UTF-8 locales).
     1264
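The following shell session is a sketch that exercises several of the commands above.  It assumes GNU ‘sed’, a POSIX shell and ‘mktemp’.  The file name ‘letters.txt’ is a hypothetical throwaway file, and writing to ‘/dev/stdout’ relies on GNU ‘sed’ accepting that name for ‘w’ and ‘W’.

```shell
# e COMMAND: the command's output is printed immediately, before the
# pattern space of the current cycle.
seq 3 | sed '2e echo hello'            # 1, hello, 2, 3

# e without an argument: run the pattern space as a shell command and
# replace the pattern space with its output.
echo 'echo from-shell' | sed 'e'       # from-shell

# Q: quit without printing the current pattern space...
seq 5 | sed '3Q'                       # 1, 2

# ...optionally returning an exit code to the caller.
status=0
seq 5 | sed '3Q7' > /dev/null || status=$?
echo "exit status: $status"            # exit status: 7

# R: append one line of a file after each input line, silently stopping
# once the file is exhausted.
tmp=$(mktemp -d)
printf '%s\n' a b > "$tmp/letters.txt"
seq 3 | sed "R $tmp/letters.txt"       # 1, a, 2, b, 3
rm -r "$tmp"

# T: with no label, skip to the end of the script when no substitution
# has succeeded, so only the 'apple' line gets the suffix.
printf '%s\n' apple kiwi |
  sed -e 's/apple/APPLE/' -e 'T' -e 's/$/ (changed)/'   # APPLE (changed), kiwi

# W: write only the first line of the pattern space.
printf '%s\n' a b | sed -n 'N;W /dev/stdout'            # a

# z: empty the pattern space, then build a new line from scratch.
echo hello | sed 'z;s/^/cleared/'      # cleared
```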
     1265
     1266File: sed.info,  Node: Multiple commands syntax,  Prev: Extended Commands,  Up: sed scripts
     1267
     12683.8 Multiple commands syntax
     1269============================
     1270
     1271There are several methods to specify multiple commands in a ‘sed’
     1272program.
     1273
     1274   Using newlines is most natural when running a sed script from a file
     1275(using the ‘-f’ option).
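As a sketch, a script that deletes lines 1, 3 and 5 can be written one command per line and run with ‘-f’.  The file name ‘delete-odd.sed’ is hypothetical, and ‘mktemp’ is assumed to be available.

```shell
# A throwaway script file with one command per line.
tmp=$(mktemp -d)
cat > "$tmp/delete-odd.sed" <<'EOF'
# Delete lines 1, 3 and 5
1d
3d
5d
EOF

seq 6 | sed -f "$tmp/delete-odd.sed"   # 2, 4, 6
rm -r "$tmp"
```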
     1276
     1277   On the command line, all ‘sed’ commands may be separated by newlines.
     1278Alternatively, you may specify each command as an argument to an ‘-e’
     1279option:
     1280
     1281     $ seq 6 | sed '1d
     1282     3d
     1283     5d'
     1284     2
     1285     4
     1286     6
     1287
     1288     $ seq 6 | sed -e 1d -e 3d -e 5d
     1289     2
     1290     4
     1291     6
     1292
     1293   A semicolon (‘;’) may be used to separate most simple commands:
     1294
     1295     $ seq 6 | sed '1d;3d;5d'
     1296     2
     1297     4
     1298     6
     1299
     1300   The ‘{’,‘}’,‘b’,‘t’,‘T’,‘:’ commands can be separated with a
     1301semicolon (this is a non-portable GNU ‘sed’ extension).
     1302
     1303     $ seq 4 | sed '{1d;3d}'
     1304     2
     1305     4
     1306
     1307     $ seq 6 | sed '{1d;3d};5d'
     1308     2
     1309     4
     1310     6
     1311
     1312   Labels used in ‘b’,‘t’,‘T’,‘:’ commands are read until a semicolon.
     1313Leading and trailing whitespace is ignored.  In the examples below the
     1314label is ‘x’.  The first example works with GNU ‘sed’.  The second is a
     1315portable equivalent.  For more information about branching and labels
     1316*note Branching and flow control::.
     1317
     1318     $ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
     1319     1
     1320     =2
     1321
     1322     $ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
     1323     1
     1324     =2
     1325
     13263.8.1 Commands Requiring a newline
     1327----------------------------------
     1328
     1329The following commands cannot be separated by a semicolon and require a
     1330newline:
     1331
     1332‘a’,‘c’,‘i’ (append/change/insert)
     1333
     1334     All characters following ‘a’,‘c’,‘i’ commands are taken as the text
     1335     to append/change/insert.  Using a semicolon leads to undesirable
     1336     results:
     1337
     1338          $ seq 2 | sed '1aHello ; 2d'
     1339          1
     1340          Hello ; 2d
     1341          2
     1342
     1343     Separate the commands using ‘-e’ or a newline:
     1344
     1345          $ seq 2 | sed -e 1aHello -e 2d
     1346          1
     1347          Hello
     1348
     1349          $ seq 2 | sed '1aHello
     1350          2d'
     1351          1
     1352          Hello
     1353
     1354     Note that specifying the text to add (‘Hello’) immediately after
     1355     ‘a’,‘c’,‘i’ is itself a GNU ‘sed’ extension.  A portable,
     1356     POSIX-compliant alternative is:
     1357
     1358          $ seq 2 | sed '1a\
     1359          Hello
     1360          2d'
     1361          1
     1362          Hello
     1363
     1364‘#’ (comment)
     1365
     1366     All characters following ‘#’ until the next newline are ignored.
     1367
     1368          $ seq 3 | sed '# this is a comment ; 2d'
     1369          1
     1370          2
     1371          3
     1372
     1373
     1374          $ seq 3 | sed '# this is a comment
     1375          2d'
     1376          1
     1377          3
     1378
     1379‘r’,‘R’,‘w’,‘W’ (reading and writing files)
     1380
     1381     The ‘r’,‘R’,‘w’,‘W’ commands parse the filename until the end of the
     1382     line.  If whitespace, comments or semicolons are found, they will
     1383     be included in the filename, leading to unexpected results:
     1384
     1385          $ seq 2 | sed '1w hello.txt ; 2d'
     1386          1
     1387          2
     1388
     1389          $ ls -log
     1390          total 4
     1391          -rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d
     1392
     1393          $ cat 'hello.txt ; 2d'
     1394          1
     1395
     1396     Note that ‘sed’ silently ignores read errors in the ‘r’ and ‘R’
     1397     commands (such as missing files).  In the following example, ‘sed’
     1398     tries to read a file named ‘hello.txt ; N’.  The file is missing,
     1399     and the error is silently ignored:
     1400
     1401          $ echo x | sed '1rhello.txt ; N'
     1402          x
     1403
     1404‘e’ (command execution)
     1405
     1406     Any characters following the ‘e’ command until the end of the line
     1407     will be sent to the shell.  If whitespace, comments or semicolons
     1408     are found, they will be included in the shell command, leading to
     1409     unexpected results:
     1410
     1411          $ echo a | sed '1e touch foo#bar'
     1412          a
     1413
     1414          $ ls -1
     1415          foo#bar
     1416
     1417          $ echo a | sed '1e touch foo ; s/a/b/'
     1418          sh: 1: s/a/b/: not found
     1419          a
     1420
     1421‘s///[we]’ (substitute with ‘e’ or ‘w’ flags)
     1422
     1423     In a substitution command, the ‘w’ flag writes the substitution
     1424     result to a file, and the ‘e’ flag executes the substitution result
     1425     as a shell command.  As with the ‘r/R/w/W/e’ commands, these must
     1426     be terminated with a newline.  If whitespace, comments or
     1427     semicolons are found, they will be included in the shell command or
     1428     filename, leading to unexpected results:
     1429
     1430          $ echo a | sed 's/a/b/w1.txt#foo'
     1431          b
     1432
     1433          $ ls -1
     1434          1.txt#foo
     1435
     1436
     1437File: sed.info,  Node: sed addresses,  Next: sed regular expressions,  Prev: sed scripts,  Up: Top
     1438
     14394 Addresses: selecting lines
     1440****************************
     1441
     1442* Menu:
     1443
     1444* Addresses overview::                Addresses overview
     1445* Numeric Addresses::                 selecting lines by numbers
     1446* Regexp Addresses::                  selecting lines by text matching
     1447* Range Addresses::                   selecting a range of lines
     1448* Zero Address::                      Using address ‘0’
     1449
     1450
     1451File: sed.info,  Node: Addresses overview,  Next: Numeric Addresses,  Up: sed addresses
     1452
     14534.1 Addresses overview
     1454======================
     1455
     1456Addresses determine on which line(s) the ‘sed’ command will be executed.
     1457The following command replaces the first occurrence of ‘hello’ with
     1458‘world’ only on line 144:
     1459
     1460     sed '144s/hello/world/' input.txt > output.txt
     1461
     1462   If no address is specified, the command is performed on all lines.
     1463The following command replaces ‘hello’ with ‘world’, targeting every
     1464line of the input file.  However, note that it modifies only the first
     1465instance of ‘hello’ on each line.  Use the ‘g’ modifier to affect every
     1466instance on each affected line.
     1467
     1468     sed 's/hello/world/' input.txt > output.txt
     1469
     1470   Addresses can contain regular expressions to match lines based on
     1471content instead of line numbers.  The following command replaces ‘hello’
     1472with ‘world’ only on lines containing the string ‘apple’:
     1473
     1474     sed '/apple/s/hello/world/' input.txt > output.txt
     1475
     1476   An address range is specified with two addresses separated by a comma
     1477(‘,’).  Addresses can be numeric, regular expressions, or a mix of both.
     1478The following command replaces ‘hello’ with ‘world’ only on lines 4 to
     147917 (inclusive):
     1480
     1481     sed '4,17s/hello/world/' input.txt > output.txt
     1482
     1483   Appending the ‘!’ character to the end of an address specification
     1484(before the command letter) negates the sense of the match.  That is, if
     1485the ‘!’ character follows an address or an address range, then only
     1486lines which do _not_ match the addresses will be selected.  The
     1487following command replaces ‘hello’ with ‘world’ only on lines _not_
     1488containing the string ‘apple’:
     1489
     1490     sed '/apple/!s/hello/world/' input.txt > output.txt
     1491
     1492   The following command replaces ‘hello’ with ‘world’ only on lines 1
     1493to 3 and from line 18 to the last line of the input file (i.e.
     1494excluding lines 4 to 17):
     1495
     1496     sed '4,17!s/hello/world/' input.txt > output.txt
     1497
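The ‘input.txt’ examples above do not show their output.  As a sketch, the same address forms can be tried on a few made-up lines piped in with ‘printf’ and ‘seq’ (GNU ‘sed’ assumed):

```shell
# Regex address: substitute only on lines containing 'apple'.
printf '%s\n' 'hello apple' 'hello kiwi' | sed '/apple/s/hello/world/'
# -> world apple, hello kiwi

# Negated regex address: substitute only on lines NOT containing 'apple'.
printf '%s\n' 'hello apple' 'hello kiwi' | sed '/apple/!s/hello/world/'
# -> hello apple, world kiwi

# Negated range: change every line except lines 2 to 3.
seq 5 | sed '2,3!s/$/ changed/'
# -> 1 changed, 2, 3, 4 changed, 5 changed
```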
     1498
     1499File: sed.info,  Node: Numeric Addresses,  Next: Regexp Addresses,  Prev: Addresses overview,  Up: sed addresses
     1500
     15014.2 Selecting lines by numbers
     1502==============================
     1503
     1504Addresses in a ‘sed’ script can be in any of the following forms:
     1505‘NUMBER’
     1506     Specifying a line number will match only that line in the input.
     1507     (Note that ‘sed’ counts lines continuously across all input files
     1508     unless ‘-i’ or ‘-s’ options are specified.)
     1509
     1510‘$’
     1511     This address matches the last line of the last file of input, or
     1512     the last line of each file when the ‘-i’ or ‘-s’ options are
     1513     specified.
     1514
     1515‘FIRST~STEP’
     1516     This GNU extension matches every STEPth line starting with line
     1517     FIRST.  In particular, lines will be selected when there exists a
     1518     non-negative N such that the current line-number equals FIRST + (N
     1519     * STEP).  Thus, one would use ‘1~2’ to select the odd-numbered
     1520     lines and ‘0~2’ for even-numbered lines; to pick every third line
     1521     starting with the second, ‘2~3’ would be used; to pick every fifth
     1522     line starting with the tenth, use ‘10~5’; and ‘50~0’ is just an
     1523     obscure way of saying ‘50’.
     1524
     1525     The following commands demonstrate the step address usage:
     1526
     1527          $ seq 10 | sed -n '0~4p'
     1528          4
     1529          8
     1530
     1531          $ seq 10 | sed -n '1~3p'
     1532          1
     1533          4
     1534          7
     1535          10
     1536
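The following sketch shows how ‘-s’ changes the meaning of ‘$’ and of line numbers.  It uses two hypothetical files, ‘a.txt’ (2 lines) and ‘b.txt’ (3 lines), created in a temporary directory (GNU ‘sed’ and ‘mktemp’ assumed):

```shell
# Two throwaway files in a temporary directory: a.txt has 2 lines, b.txt 3.
tmp=$(mktemp -d)
seq 2 > "$tmp/a.txt"
seq 3 > "$tmp/b.txt"

# Without -s the files form one stream: '$' is the last line of b.txt.
sed -n '$p' "$tmp/a.txt" "$tmp/b.txt"       # 3

# With -s each file is separate: '$' matches the last line of each file...
sed -s -n '$p' "$tmp/a.txt" "$tmp/b.txt"    # 2, 3

# ...and line numbers restart, so '1' matches once per file.
sed -s -n '1p' "$tmp/a.txt" "$tmp/b.txt"    # 1, 1

rm -r "$tmp"
```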
     1537
     1538File: sed.info,  Node: Regexp Addresses,  Next: Range Addresses,  Prev: Numeric Addresses,  Up: sed addresses
     1539
     15404.3 Selecting lines by text matching
     1541====================================
     1542
     1543GNU ‘sed’ supports the following regular expression addresses.  By
     1544default, regular expressions use Basic Regular Expression (BRE) syntax
     1545(*note BRE syntax::).  If the ‘-E’ or ‘-r’ option is used, they use
     1546Extended Regular Expression (ERE) syntax instead (*note ERE syntax::).
     1547*Note BRE vs ERE::.
     1548
     1549‘/REGEXP/’
     1550     This will select any line which matches the regular expression
     1551     REGEXP.  If REGEXP itself includes any ‘/’ characters, each must be
     1552     escaped by a backslash (‘\’).
     1553
     1554     The following command prints lines in ‘/etc/passwd’ which end with
     1555     â€˜bash’(1):
     1556
     1557          sed -n '/bash$/p' /etc/passwd
     1558
     1559     The empty regular expression ‘//’ repeats the last regular
     1560     expression match (the same holds if the empty regular expression is
     1561     passed to the ‘s’ command).  Note that modifiers to regular
     1562     expressions are evaluated when the regular expression is compiled,
     1563     thus it is invalid to specify them together with the empty regular
     1564     expression.
     1565
     1566‘\%REGEXP%’
     1567     (The ‘%’ may be replaced by any other single character.)
     1568
     1569     This also matches the regular expression REGEXP, but allows one to
     1570     use a different delimiter than ‘/’.  This is particularly useful if
     1571     the REGEXP itself contains a lot of slashes, since it avoids the
     1572     tedious escaping of every ‘/’.  If REGEXP itself includes any
     1573     delimiter characters, each must be escaped by a backslash (‘\’).
     1574
     1575     The following commands are equivalent.  They print lines which
     1576     start with ‘/home/alice/documents/’:
     1577
     1578          sed -n '/^\/home\/alice\/documents\//p'
     1579          sed -n '\%^/home/alice/documents/%p'
     1580          sed -n '\;^/home/alice/documents/;p'
     1581
     1582‘/REGEXP/I’
     1583‘\%REGEXP%I’
     1584     The ‘I’ modifier to regular-expression matching is a GNU extension
     1585     which causes the REGEXP to be matched in a case-insensitive manner.
     1586
     1587     In many other programming languages, a lower case ‘i’ is used for
     1588     case-insensitive regular expression matching.  However, in ‘sed’
     1589     the ‘i’ is used for the insert command (*note insert command::).
     1590
     1591     Observe the difference between the following examples.
     1592
     1593     In this example, ‘/b/I’ is the address: a regular expression with the
     1594     ‘I’ modifier.  ‘d’ is the delete command:
     1595
     1596          $ printf "%s\n" a b c | sed '/b/Id'
     1597          a
     1598          c
     1599
     1600     Here, ‘/b/’ is the address: a regular expression.  ‘i’ is the
     1601     insert command.  ‘d’ is the value to insert.  A line with ‘d’ is
     1602     then inserted above the matched line:
     1603
     1604          $ printf "%s\n" a b c | sed '/b/id'
     1605          a
     1606          d
     1607          b
     1608          c
     1609
     1610‘/REGEXP/M’
     1611‘\%REGEXP%M’
     1612     The ‘M’ modifier to regular-expression matching is a GNU ‘sed’
     1613     extension which directs GNU ‘sed’ to match the regular expression
     1614     in ‘multi-line’ mode.  The modifier causes ‘^’ and ‘$’ to match
     1615     respectively (in addition to the normal behavior) the empty string
     1616     after a newline, and the empty string before a newline.  There are
     1617     special character sequences (‘\`’ and ‘\'’) which always match the
     1618     beginning or the end of the buffer.  In addition, the period
     1619     character does not match a new-line character in multi-line mode.
     1620
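Here is a sketch of the ‘M’ modifier and of the empty regular expression ‘//’, assuming GNU ‘sed’.  ‘N’ is used only to get a newline into the pattern space.

```shell
# N builds the two-line pattern space "first\nsecond".
# With M, '^' also matches just after the embedded newline, so the
# address matches and the whole pattern space is printed.
printf '%s\n' first second | sed -n 'N;/^second$/Mp'   # first, second

# Without M, '^' matches only at the start of the pattern space, so
# nothing is printed.
printf '%s\n' first second | sed -n 'N;/^second$/p'

# The empty regex in 's//X/' reuses /foo/ from the address.
printf '%s\n' foo bar | sed '/foo/s//X/'               # X, bar
```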
     1621   Regex addresses operate on the content of the current pattern space.
     1622If the pattern space is changed (for example with ‘s///’ command) the
     1623regular expression matching will operate on the changed text.
     1624
     1625   In the following example, automatic printing is disabled with ‘-n’.
     1626The ‘s/2/X/’ command replaces the first ‘2’ on each line with ‘X’.  The
     1627command ‘/[0-9]/p’ matches lines with digits and prints them.  Because
     1628the second line is changed before the ‘/[0-9]/’ regex is tested, it
     1629will not match and will not be printed:
     1630
     1631     $ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
     1632     1
     1633     3
     1634
     1635   ---------- Footnotes ----------
     1636
     1637   (1) There are of course many other ways to do the same, e.g.
     1638     grep 'bash$' /etc/passwd
     1639     awk -F: '$7 == "/bin/bash"' /etc/passwd
     1640
     1641
     1642File: sed.info,  Node: Range Addresses,  Next: Zero Address,  Prev: Regexp Addresses,  Up: sed addresses
     1643
     16444.4 Range Addresses
     1645===================
     1646
     1647An address range is specified with two addresses separated by a comma
     1648(‘,’).  An address range matches lines starting from where the first
     1649address matches, and continues until the second address matches
     1650(inclusively):
     1651
     1652     $ seq 10 | sed -n '4,6p'
     1653     4
     1654     5
     1655     6
     1656
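Both ends of a range can also be regular expressions.  Here is a sketch using made-up marker lines ‘BEGIN’ and ‘END’ (GNU ‘sed’ assumed):

```shell
# Print from a line matching BEGIN through the next line matching END.
printf '%s\n' a BEGIN b END c | sed -n '/BEGIN/,/END/p'           # BEGIN, b, END

# After END closes the range, the next BEGIN starts a new one.
printf '%s\n' BEGIN 1 END x BEGIN 2 END | sed -n '/BEGIN/,/END/p'
# -> BEGIN, 1, END, BEGIN, 2, END
```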
     1657   If the second address is a REGEXP, then checking for the ending match
     1658will start with the line _following_ the line which matched the first
     1659address: a range will always span at least two lines (except of course
     1660if the input stream ends).
     1661
     1662     $ seq 10 | sed -n '4,/[0-9]/p'
     1663     4
     1664     5
     1665
     1666   If the second address is a NUMBER less than (or equal to) the line
     1667matching the first address, then only that one line is matched:
     1668
     1669     $ seq 10 | sed -n '4,1p'
     1670     4
     1671
     1672   GNU ‘sed’ also supports some special two-address forms; all these are
     1673GNU extensions:
     1674‘0,/REGEXP/’
     1675     A line number of ‘0’ can be used in an address specification like
     1676     ‘0,/REGEXP/’ so that ‘sed’ will try to match REGEXP in the first
     1677     input line too.  In other words, ‘0,/REGEXP/’ is similar to
     1678     ‘1,/REGEXP/’, except that if ADDR2 matches the very first line of
     1679     input the ‘0,/REGEXP/’ form will consider it to end the range,
     1680     whereas the ‘1,/REGEXP/’ form will match the beginning of its range
     1681     and hence make the range span up to the _second_ occurrence of the
     1682     regular expression.
     1683
     1684     The following examples demonstrate the difference between starting
     1685     with address 1 and 0:
     1686
     1687          $ seq 10 | sed -n '1,/[0-9]/p'
     1688          1
     1689          2
     1690
     1691          $ seq 10 | sed -n '0,/[0-9]/p'
     1692          1
     1693
     1694‘ADDR1,+N’
     1695     Matches ADDR1 and the N lines following ADDR1.
     1696
     1697          $ seq 10 | sed -n '6,+2p'
     1698          6
     1699          7
     1700          8
     1701
     1702     ADDR1 can be a line number or a regular expression.
     1703
     1704‘ADDR1,~N’
     1705     Matches ADDR1 and the lines following ADDR1 until the next line
     1706     whose input line number is a multiple of N.  The following command
     1707     prints starting at line 6, until the next line which is a multiple
     1708     of 4 (i.e.  line 8):
     1709
     1710          $ seq 10 | sed -n '6,~4p'
     1711          6
     1712          7
     1713          8
     1714
     1715     ADDR1 can be a line number or a regular expression.
     1716
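A common use of ‘0,/REGEXP/’ is changing only the first match in the whole input.  The sketch below contrasts it with ‘1,/REGEXP/’ on made-up input.  The regex is spelled out in the ‘s’ command, rather than written as ‘//’, to keep the example independent of the empty-regex rule.

```shell
# 0,/foo/: line 1 may already end the range, so only the first 'foo'
# in the whole input is replaced.
printf '%s\n' foo foo foo | sed '0,/foo/s/foo/bar/'   # bar, foo, foo

# 1,/foo/: line 1 only opens the range; the end regex is first tried on
# line 2, so the first two lines are replaced.
printf '%s\n' foo foo foo | sed '1,/foo/s/foo/bar/'   # bar, bar, foo
```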
     1717
     1718File: sed.info,  Node: Zero Address,  Prev: Range Addresses,  Up: sed addresses
     1719
     17204.5 Zero Address
     1721================
     1722
     1723As a GNU ‘sed’ extension, the ‘0’ address can be used in two cases:
     1724  1. In a regex range address, as in ‘0,/REGEXP/’ (*note Zero Address
     1725     Regex Range::).
     1726  2. With the ‘r’ command, inserting a file before the first line (*note
     1727     Adding a header to multiple files::).
     1728
     1729   Note that these are the only places where the ‘0’ address makes
     1730sense; commands which are given the ‘0’ address in any other way will
     1731give an error.
     1732
     1733
     1734File: sed.info,  Node: sed regular expressions,  Next: advanced sed,  Prev: sed addresses,  Up: Top
     1735
     17365 Regular Expressions: selecting text
     1737*************************************
     1738
     1739* Menu:
     1740
     1741* Regular Expressions Overview:: Overview of Regular expression in ‘sed’
     1742* BRE vs ERE::               Basic (BRE) and extended (ERE) regular expression
     1743                             syntax
     1744* BRE syntax::               Overview of basic regular expression syntax
     1745* ERE syntax::               Overview of extended regular expression syntax
     1746* Character Classes and Bracket Expressions::
     1747* regexp extensions::        Additional regular expression commands
     1748* Back-references and Subexpressions:: Back-references and Subexpressions
     1749* Escapes::                  Specifying special characters
     1750* Locale Considerations::    Multibyte characters and locale considerations
     1751
     1752
     1753File: sed.info,  Node: Regular Expressions Overview,  Next: BRE vs ERE,  Up: sed regular expressions
     1754
     17555.1 Overview of regular expression in ‘sed’
     1756===========================================
     1757
     1758To use ‘sed’ effectively, you need to understand regular expressions
     1759(“regexp” for short).  A regular expression is a pattern that is matched
     1760against a subject string from left to right.  Most characters are
     1761“ordinary”: they stand for themselves in a pattern, and match the
     1762corresponding characters.  Regular expressions in ‘sed’ are specified
     1763between two slashes.
     1764
     1765   The following command prints lines containing the string ‘hello’:
     1766
     1767     sed -n '/hello/p'
     1768
     1769   The above example is equivalent to this ‘grep’ command:
     1770
     1771     grep 'hello'
     1772
     1773   The power of regular expressions comes from the ability to include
     1774alternatives and repetitions in the pattern.  These are encoded in the
     1775pattern by the use of “special characters”, which do not stand for
     1776themselves but instead are interpreted in some special way.
     1777
     1778   The character ‘^’ (caret) in a regular expression matches the
     1779beginning of the line.  The character ‘.’ (dot) matches any single
     1780character.  The following ‘sed’ command matches and prints lines which
     1781start with the letter ‘b’, followed by any single character, followed by
     1782the letter ‘d’:
     1783
     1784     $ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
     1785     bad
     1786     bed
     1787     bid
     1788     body
     1789
     1790   The following sections explain the meaning and usage of special
     1791characters in regular expressions.
     1792
     1793
     1794File: sed.info,  Node: BRE vs ERE,  Next: BRE syntax,  Prev: Regular Expressions Overview,  Up: sed regular expressions
     1795
     17965.2 Basic (BRE) and extended (ERE) regular expression
     1797=====================================================
     1798
     1799Basic and extended regular expressions are two variations on the syntax
     1800of the specified pattern.  Basic Regular Expression (BRE) syntax is the
     1801default in ‘sed’ (and similarly in ‘grep’).  Use the POSIX-specified
     1802‘-E’ option (‘-r’, ‘--regexp-extended’) to enable Extended Regular
     1803Expression (ERE) syntax.
     1804
     1805   In GNU ‘sed’, the only difference between basic and extended regular
     1806expressions is in the behavior of a few special characters: ‘?’, ‘+’,
     1807parentheses, braces (‘{}’), and ‘|’.
     1808
     1809   With basic (BRE) syntax, these characters do not have special meaning
     1810unless prefixed with a backslash (‘\’); with extended (ERE) syntax it
     1811is the reverse: these characters are special unless they are prefixed
     1812with a backslash (‘\’).
     1813
     1814Desired pattern           Basic (BRE) Syntax         Extended (ERE) Syntax
     1815
     1816--------------------------------------------------------------------------
     1817literal ‘+’ (plus         $ echo 'a+b=c' > foo       $ echo 'a+b=c' > foo
     1818sign)                     $ sed -n '/a+b/p' foo      $ sed -E -n '/a\+b/p' foo
     1819                          a+b=c                      a+b=c
     1820
     1821One or more ‘a’           $ echo aab > foo           $ echo aab > foo
     1822characters                $ sed -n '/a\+b/p' foo     $ sed -E -n '/a+b/p' foo
     1823followed by ‘b’           aab                        aab
     1824(plus sign as
     1825special
     1826meta-character)
     1827
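Here are a few more BRE/ERE pairs in the spirit of the table above: grouping with an interval, alternation, and an optional character.  This is a sketch assuming GNU ‘sed’.  In each pair the BRE and ERE forms print the same result.

```shell
# Grouping and an interval: exactly two repetitions of 'ab'.
echo ababc | sed 's/\(ab\)\{2\}/X/'                    # Xc
echo ababc | sed -E 's/(ab){2}/X/'                     # Xc

# Alternation ('\|' in BRE is a GNU extension).
printf '%s\n' cat dog cow | sed -n '/cat\|dog/p'       # cat, dog
printf '%s\n' cat dog cow | sed -E -n '/cat|dog/p'     # cat, dog

# Optional character: '\?' in BRE (GNU extension) versus '?' in ERE.
printf '%s\n' color colour | sed -n '/colou\?r/p'      # color, colour
printf '%s\n' color colour | sed -E -n '/colou?r/p'    # color, colour
```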
     1828
     1829File: sed.info,  Node: BRE syntax,  Next: ERE syntax,  Prev: BRE vs ERE,  Up: sed regular expressions
     1830
     18315.3 Overview of basic regular expression syntax
     1832===============================================
     1833
     1834Here is a brief description of regular expression syntax as used in
     1835‘sed’.
     1836
     1837‘CHAR’
     1838     A single ordinary character matches itself.
     1839
     1840‘*’
     1841     Matches a sequence of zero or more instances of matches for the
     1842     preceding regular expression, which must be an ordinary character,
     1843     a special character preceded by ‘\’, a ‘.’, a grouped regexp (see
     1844     below), or a bracket expression.  As a GNU extension, a postfixed
     1845     regular expression can also be followed by ‘*’; for example, ‘a**’
     1846     is equivalent to ‘a*’.  POSIX 1003.1-2001 says that ‘*’ stands for
     1847     itself when it appears at the start of a regular expression or
     1848     subexpression, but many non-GNU implementations do not support this
     1849     and portable scripts should instead use ‘\*’ in these contexts.

     1850‘.’
     1851     Matches any character, including newline.
     1852
     1853‘^’
     1854     Matches the null string at the beginning of the pattern space, i.e.
     1855     what appears after the circumflex must appear at the beginning of
     1856     the pattern space.
     1857
     1858     In most scripts, pattern space is initialized to the content of
     1859     each line (*note How ‘sed’ works: Execution Cycle.).  So, it is a
     1860     useful simplification to think of ‘^#include’ as matching only
     1861     lines where ‘#include’ is the first thing on the line—if there is
     1862     any preceding space, for example, the match fails.  This
     1863     simplification is valid as long as the original content of pattern
     1864     space is not modified, for example with an ‘s’ command.
     1865
     1866     ‘^’ acts as a special character only at the beginning of the
     1867     regular expression or subexpression (that is, after ‘\(’ or ‘\|’).
     1868     Portable scripts should avoid ‘^’ at the beginning of a
     1869     subexpression, though, as POSIX allows implementations that treat
     1870     ‘^’ as an ordinary character in that context.
     1871
     1872‘$’
     1873     It is the same as ‘^’, but refers to the end of the pattern space.  ‘$’
     1874     also acts as a special character only at the end of the regular
     1875     expression or subexpression (that is, before ‘\)’ or ‘\|’), and its
     1876     use at the end of a subexpression is not portable.
     1877
     1878‘[LIST]’
     1879‘[^LIST]’
     1880     Matches any single character in LIST: for example, ‘[aeiou]’
     1881     matches all vowels.  A list may include sequences like
     1882     ‘CHAR1-CHAR2’, which matches any character between (inclusive)
     1883     CHAR1 and CHAR2.  *Note Character Classes and Bracket
     1884     Expressions::.
     1885
     1886‘\+’
     1887     As ‘*’, but matches one or more.  It is a GNU extension.
     1888
     1889‘\?’
     1890     As ‘*’, but only matches zero or one.  It is a GNU extension.
     1891
     1892‘\{I\}’
     1893     As ‘*’, but matches exactly I sequences (I is a decimal integer;
     1894     for portability, keep it between 0 and 255 inclusive).
     1895
     1896‘\{I,J\}’
     1897     Matches between I and J, inclusive, sequences.
     1898
     1899‘\{I,\}’
     1900     Matches more than or equal to I sequences.
     1901
     1902‘\(REGEXP\)’
     1903     Groups the inner REGEXP as a whole; this is used to:
     1904
     1905        • Apply postfix operators, like ‘\(abcd\)*’: this will search
     1906          for zero or more whole sequences of ‘abcd’, while ‘abcd*’
     1907          would search for ‘abc’ followed by zero or more occurrences of
     1908          ‘d’.  Note that support for ‘\(abcd\)*’ is required by POSIX
     1909          1003.1-2001, but many non-GNU implementations do not support
     1910          it and hence it is not universally portable.
     1911
     1912        • Use back references (see below).
     1913
     1914‘REGEXP1\|REGEXP2’
     1915     Matches either REGEXP1 or REGEXP2.  Use parentheses to use complex
     1916     alternative regular expressions.  The matching process tries each
     1917     alternative in turn, from left to right, and the first one that
     1918     succeeds is used.  It is a GNU extension.
     1919
     1920‘REGEXP1REGEXP2’
     1921     Matches the concatenation of REGEXP1 and REGEXP2.  Concatenation
     1922     binds more tightly than ‘\|’, ‘^’, and ‘$’, but less tightly than
     1923     the other regular expression operators.
     1924
     1925‘\DIGIT’
     1926     Matches the DIGIT-th ‘\(...\)’ parenthesized subexpression in the
     1927     regular expression.  This is called a “back reference”.
     1928     Subexpressions are implicitly numbered by counting occurrences of
     1929     ‘\(’ left-to-right.
     1930
     1931‘\n’
     1932     Matches the newline character.
     1933
     1934‘\CHAR’
     1935     Matches CHAR, where CHAR is one of ‘$’, ‘*’, ‘.’, ‘[’, ‘\’, or ‘^’.
     1936     Note that the only C-like backslash sequences that you can portably
     1937     assume to be interpreted are ‘\n’ and ‘\\’; in particular ‘\t’ is
     1938     not portable, and matches a ‘t’ under most implementations of
     1939     ‘sed’, rather than a tab character.
     1940
     1941   Note that the regular expression matcher is greedy, i.e., matches are
     1942attempted from left to right and, if two or more matches are possible
     1943starting at the same character, it selects the longest.
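The following is a small sketch (assuming GNU ‘sed’ and a POSIX shell) that makes the leftmost-then-longest rule visible by replacing a single match with ‘X’:

```shell
#!/bin/sh
# Leftmost wins first: at position 0 of 'xaaa', 'a*' already matches
# the empty string, so the substitution happens there.
echo 'xaaa' | sed 's/a*/X/'      # -> Xxaaa

# Among matches starting at the same position, the longest wins:
# 'a.*c' extends to the last 'c', not the first one.
echo 'abcabc' | sed 's/a.*c/X/'  # -> X
```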
     1944
     1945Examples:
     1946‘abcdef’
     1947     Matches ‘abcdef’.
     1948
     1949‘a*b’
     1950     Matches zero or more ‘a’s followed by a single ‘b’.  For example,
     1951     ‘b’ or ‘aaaaab’.
     1952
     1953‘a\?b’
     1954     Matches ‘b’ or ‘ab’.
     1955
     1956‘a\+b\+’
     1957     Matches one or more ‘a’s followed by one or more ‘b’s: ‘ab’ is the
     1958     shortest possible match, but other examples are ‘aaaab’ or ‘abbbbb’
     1959     or ‘aaaaaabbbbbbb’.
     1960
     1961‘.*’
     1962‘.\+’
     1963     These two both match all the characters in a string; however, the
     1964     first matches every string (including the empty string), while the
     1965     second matches only strings containing at least one character.
     1966
     1967‘^main.*(.*)’
     1968     This matches a string starting with ‘main’, followed by an opening
     1969     and closing parenthesis.  The ‘n’, ‘(’ and ‘)’ need not be
     1970     adjacent.
     1971
     1972‘^#’
     1973     This matches a string beginning with ‘#’.
     1974
     1975‘\\$’
     1976     This matches a string ending with a single backslash.  The regexp
     1977     contains two backslashes for escaping.
     1978
     1979‘\$’
     1980     In contrast, this matches a literal dollar sign anywhere in the
     1981     string, because the ‘$’ is escaped.
     1982
     1983‘[a-zA-Z0-9]’
     1984     In the C locale, this matches any ASCII letter or digit.
     1985
     1986‘[^ ‘<TAB>’]\+’
     1987     (Here ‘<TAB>’ stands for a single tab character.)  This matches a
     1988     string of one or more characters, none of which is a space or a
     1989     tab.  Usually this means a word.
     1990
     1991‘^\(.*\)\n\1$’
     1992     This matches a string consisting of two equal substrings separated
     1993     by a newline.
     1994
     1995‘.\{9\}A$’
     1996     This matches nine characters followed by an ‘A’ at the end of a
     1997     line.
     1998
     1999‘^.\{15\}A’
     2000     This matches the start of a string that contains 16 characters, the
     2001     last of which is an ‘A’.
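A few of the examples above can be tried directly. This sketch assumes GNU ‘sed’ (‘\?’ and ‘\+’ are GNU extensions) and anchors each pattern with ‘^’ and ‘$’ so that only whole-line matches are printed:

```shell
#!/bin/sh
# a\?b : 'b' or 'ab'; 'aab' is rejected because the line must match fully.
printf 'b\nab\naab\n' | sed -n '/^a\?b$/p'
# -> b
# -> ab

# a\+b\+ : one or more a's followed by one or more b's.
printf 'ab\nb\naaaabbb\n' | sed -n '/^a\+b\+$/p'
# -> ab
# -> aaaabbb

# ^\(.*\)\n\1$ : two equal substrings separated by a newline.
# 'N' joins two input lines so the pattern space contains a newline.
printf 'foo\nfoo\n' | sed -n 'N;/^\(.*\)\n\1$/p'
# -> foo
# -> foo
```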
     2002
     2003
     2004File: sed.info,  Node: ERE syntax,  Next: Character Classes and Bracket Expressions,  Prev: BRE syntax,  Up: sed regular expressions
     2005
     20065.4 Overview of extended regular expression syntax
     2007==================================================
     2008
     2009The only difference between basic and extended regular expressions is in
     2010the behavior of a few characters: ‘?’, ‘+’, parentheses, braces (‘{}’),
     2011and ‘|’.  While basic regular expressions require these to be escaped if
     2012you want them to behave as special characters, when using extended
     2013regular expressions you must escape them if you want them _to match a
     2014literal character_.  ‘|’ is special here because ‘\|’ is a GNU extension
     2015– standard basic regular expressions do not provide its functionality.
     2016
     2017Examples:
     2018‘abc?’
     2019     becomes ‘abc\?’ when using extended regular expressions.  It
     2020     matches the literal string ‘abc?’.
     2021
     2022‘c\+’
     2023     becomes ‘c+’ when using extended regular expressions.  It matches
     2024     one or more ‘c’s.
     2025
     2026‘a\{3,\}’
     2027     becomes ‘a{3,}’ when using extended regular expressions.  It
     2028     matches three or more ‘a’s.
     2029
     2030‘\(abc\)\{2,3\}’
     2031     becomes ‘(abc){2,3}’ when using extended regular expressions.  It
     2032     matches either ‘abcabc’ or ‘abcabcabc’.
     2033
     2034‘\(abc*\)\1’
     2035     becomes ‘(abc*)\1’ when using extended regular expressions.
     2036     Backreferences must still be escaped when using extended regular
     2037     expressions.
     2038
     2039‘a\|b’
     2040     becomes ‘a|b’ when using extended regular expressions.  It matches
     2041     ‘a’ or ‘b’.
     2042
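A sketch, assuming GNU ‘sed’, that runs some of these BRE/ERE pairs side by side to show that each pair behaves the same:

```shell
#!/bin/sh
# BRE and ERE spellings of the same interval expression.
echo 'abcabc x' | sed    's/\(abc\)\{2,3\}/[&]/'   # -> [abcabc] x
echo 'abcabc x' | sed -E 's/(abc){2,3}/[&]/'        # -> [abcabc] x

# In an ERE, '?' must be escaped to match a literal question mark.
echo 'abc?' | sed -E 's/abc\?/LITERAL/'             # -> LITERAL

# Alternation: '\|' in a BRE (GNU extension), '|' in an ERE.
echo 'xbx' | sed    's/a\|b/Y/'                     # -> xYx
echo 'xbx' | sed -E 's/a|b/Y/'                      # -> xYx
```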
     2043
     2044File: sed.info,  Node: Character Classes and Bracket Expressions,  Next: regexp extensions,  Prev: ERE syntax,  Up: sed regular expressions
     2045
     20465.5 Character Classes and Bracket Expressions
     2047=============================================
     2048
     2049A “bracket expression” is a list of characters enclosed by ‘[’ and ‘]’.
     2050It matches any single character in that list; if the first character of
     2051the list is the caret ‘^’, then it matches any character *not* in the
     2052list.  For example, the following command replaces the strings ‘gray’ or
     2053‘grey’ with ‘blue’:
     2054
     2055     sed  's/gr[ae]y/blue/'
     2056
     2057   Bracket expressions can be used in both *note basic: BRE syntax. and
     2058*note extended: ERE syntax. regular expressions (that is, with or
     2059without the ‘-E’/‘-r’ options).
     2060
     2061   Within a bracket expression, a “range expression” consists of two
     2062characters separated by a hyphen.  It matches any single character that
     2063sorts between the two characters, inclusive.  In the default C locale,
     2064the sorting sequence is the native character order; for example, ‘[a-d]’
     2065is equivalent to ‘[abcd]’.
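A minimal check of the C-locale behavior, assuming GNU ‘sed’; ‘LC_ALL=C’ is set explicitly so the result does not depend on the user's locale:

```shell
#!/bin/sh
# In the C locale the range [a-d] selects exactly the letters a, b, c and d.
echo 'abcdef' | LC_ALL=C sed 's/[a-d]/X/g'   # -> XXXXef
echo 'abcdef' | LC_ALL=C sed 's/[abcd]/X/g'  # -> XXXXef
```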
     2066
     2067   Finally, certain named classes of characters are predefined within
     2068bracket expressions, as follows.
     2069
     2070   These named classes must be used _inside_ brackets themselves.
     2071Correct usage:
     2072     $ echo 1 | sed 's/[[:digit:]]/X/'
     2073     X
     2074
     2075   Incorrect usage is rejected by newer ‘sed’ versions.  Older versions
     2076accepted it but treated it as a single bracket expression (which is
     2077equivalent to ‘[dgit:]’, that is, only the characters D/G/I/T/:):
     2078     # current GNU sed versions - incorrect usage rejected
     2079     $ echo 1 | sed 's/[:digit:]/X/'
     2080     sed: character class syntax is [[:space:]], not [:space:]
     2081
     2082     # older GNU sed versions
     2083     $ echo 1 | sed 's/[:digit:]/X/'
     2084     1
     2085
     2086‘[:alnum:]’
     2087     Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’
     2088     locale and ASCII character encoding, this is the same as
     2089     ‘[0-9A-Za-z]’.
     2090
     2091‘[:alpha:]’
     2092     Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’
     2093     locale and ASCII character encoding, this is the same as
     2094     ‘[A-Za-z]’.
     2095
     2096‘[:blank:]’
     2097     Blank characters: space and tab.
     2098
     2099‘[:cntrl:]’
     2100     Control characters.  In ASCII, these characters have octal codes
     2101     000 through 037, and 177 (DEL). In other character sets, these are
     2102     the equivalent characters, if any.
     2103
     2104‘[:digit:]’
     2105     Digits: ‘0 1 2 3 4 5 6 7 8 9’.
     2106
     2107‘[:graph:]’
     2108     Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.
     2109
     2110‘[:lower:]’
     2111     Lower-case letters; in the ‘C’ locale and ASCII character encoding,
     2112     this is ‘a b c d e f g h i j k l m n o p q r s t u v w x y z’.
     2113
     2114‘[:print:]’
     2115     Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.
     2116
     2117‘[:punct:]’
     2118     Punctuation characters; in the ‘C’ locale and ASCII character
     2119     encoding, this is ‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \
     2120     ] ^ _ ` { | } ~’.
     2121
     2122‘[:space:]’
     2123     Space characters: in the ‘C’ locale, this is tab, newline, vertical
     2124     tab, form feed, carriage return, and space.
     2125
     2126‘[:upper:]’
     2127     Upper-case letters: in the ‘C’ locale and ASCII character encoding,
     2128     this is ‘A B C D E F G H I J K L M N O P Q R S T U V W X Y Z’.
     2129
     2130‘[:xdigit:]’
     2131     Hexadecimal digits: ‘0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f’.
     2132
     2133   Note that the brackets in these class names are part of the symbolic
     2134names, and must be included in addition to the brackets delimiting the
     2135bracket expression.
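The sketch below (assuming GNU ‘sed’, with the ‘C’ locale forced so that the classes have their ASCII meaning) uses a few of these classes inside bracket expressions, including a negated class and a combination of two classes:

```shell
#!/bin/sh
# Named classes go inside a bracket expression, so the brackets are doubled.
echo 'Hello, World 42' | LC_ALL=C sed 's/[[:upper:]]/U/g'  # -> Uello, Uorld 42

# Negated class: delete everything that is not a letter or digit.
echo 'Hello, World 42' | LC_ALL=C sed 's/[^[:alnum:]]//g'  # -> HelloWorld42

# Two classes combined in one list.
echo 'a1 b2' | LC_ALL=C sed 's/[[:digit:][:space:]]/_/g'   # -> a__b_
```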
     2136
     2137   Most meta-characters lose their special meaning inside bracket
     2138expressions:
     2139
     2140‘]’
     2141     ends the bracket expression if it’s not the first list item.  So,
     2142     if you want to make the ‘]’ character a list item, you must put it
     2143     first.
     2144
     2145‘-’
     2146     represents the range if it’s not first or last in a list or the
     2147     ending point of a range.
     2148
     2149‘^’
     2150     represents the characters not in the list.  If you want to make the
     2151     ‘^’ character a list item, place it anywhere but first.
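A minimal illustration of these placement rules, assuming GNU ‘sed’; the input is printed with ‘printf’ so the shell does not alter it:

```shell
#!/bin/sh
# ']' comes first (literal), '^' is not first (literal), '-' is last (literal).
printf '%s\n' 'a]b-c^d' | sed 's/[]^-]/_/g'   # -> a_b_c_d

# A leading '^' negates the list instead.
echo 'abc' | sed 's/[^b]/_/g'                  # -> _b_
```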
     2152
     2155   The characters ‘$’, ‘*’, ‘.’, ‘[’, and ‘\’ are normally not special
     2156within LIST.  For example, ‘[\*]’ matches either ‘\’ or ‘*’, because the
     2157‘\’ is not special here.  However, strings like ‘[.ch.]’, ‘[=a=]’, and
     2158‘[:space:]’ are special within LIST and represent collating symbols,
     2159equivalence classes, and character classes, respectively, and ‘[’ is
     2160therefore special within LIST when it is followed by ‘.’, ‘=’, or ‘:’.
     2161Also, when not in ‘POSIXLY_CORRECT’ mode, special escapes like ‘\n’ and
     2162‘\t’ are recognized within LIST.  *Note Escapes::.
     2163
     2164‘[.’
     2165     represents the open collating symbol.
     2166
     2167‘.]’
     2168     represents the close collating symbol.
     2169
     2170‘[=’
     2171     represents the open equivalence class.
     2172
     2173‘=]’
     2174     represents the close equivalence class.
     2175
     2176‘[:’
     2177     represents the open character class symbol, and should be followed
     2178     by a valid character class name.
     2179
     2180‘:]’
     2181     represents the close character class symbol.
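The treatment of ‘\’ and of GNU escapes inside a list, as described above, can be checked with the following sketch. It assumes GNU ‘sed’ running without ‘POSIXLY_CORRECT’ set:

```shell
#!/bin/sh
# Inside a list '\' is an ordinary character: [\*] matches '\' or '*'.
printf '%s\n' 'x*y\z' | sed 's/[\*]/_/g'   # -> x_y_z

# Outside POSIXLY_CORRECT mode, GNU sed still recognizes \t within a list.
printf 'a\tb\n' | sed 's/[\t]/<TAB>/'      # -> a<TAB>b
```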
     2182
     2183
     2184File: sed.info,  Node: regexp extensions,  Next: Back-references and Subexpressions,  Prev: Character Classes and Bracket Expressions,  Up: sed regular expressions
     2185
     21865.6 regular expression extensions
     2187=================================
     2188
     2189The following sequences have special meaning inside regular expressions
     2190(used in *note addresses: Regexp Addresses. and the ‘s’ command).
     2191
     2192   These can be used in both *note basic: BRE syntax. and *note
     2193extended: ERE syntax. regular expressions (that is, with or without the
     2194‘-E’/‘-r’ options).
     2195
     2196‘\w’
     2197     Matches any “word” character.  A “word” character is any letter or
     2198     digit or the underscore character.
     2199
     2200          $ echo "abc %-= def." | sed 's/\w/X/g'
     2201          XXX %-= XXX.
     2202
     2203‘\W’
     2204     Matches any “non-word” character.
     2205
     2206          $ echo "abc %-= def." | sed 's/\W/X/g'
     2207          abcXXXXXdefX
     2208
     2209‘\b’
     2210     Matches a word boundary; that is, it matches if the character to the
     2211     left is a “word” character and the character to the right is a
     2212     “non-word” character, or vice-versa.
     2213
     2214          $ echo "abc %-= def." | sed 's/\b/X/g'
     2215          XabcX %-= XdefX.
     2216
     2217‘\B’
     2218     Matches everywhere but on a word boundary; that is, it matches if
     2219     the character to the left and the character to the right are either
     2220     both “word” characters or both “non-word” characters.
     2221
     2222          $ echo "abc %-= def." | sed 's/\B/X/g'
     2223          aXbXc X%X-X=X dXeXf.X
     2224
     2225‘\s’
     2226     Matches whitespace characters (spaces and tabs).  Newlines embedded
     2227     in the pattern/hold spaces will also match:
     2228
     2229          $ echo "abc %-= def." | sed 's/\s/X/g'
     2230          abcX%-=Xdef.
     2231
     2232‘\S’
     2233     Matches non-whitespace characters.
     2234
     2235          $ echo "abc %-= def." | sed 's/\S/X/g'
     2236          XXX XXX XXXX
     2237
     2238‘\<’
     2239     Matches the beginning of a word.
     2240
     2241          $ echo "abc %-= def." | sed 's/\</X/g'
     2242          Xabc %-= Xdef.
     2243
     2244‘\>’
     2245     Matches the end of a word.
     2246
     2247          $ echo "abc %-= def." | sed 's/\>/X/g'
     2248          abcX %-= defX.
     2249
     2250‘\`’
     2251     Matches only at the start of pattern space.  This is different from
     2252     ‘^’ in multi-line mode.
     2253
     2254     Compare the following two examples:
     2255
     2256          $ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
     2257          Xa
     2258          Xb
     2259          Xc
     2260
     2261          $ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
     2262          Xa
     2263          b
     2264          c
     2265
     2266‘\'’
     2267     Matches only at the end of pattern space.  This is different from
     2268     ‘$’ in multi-line mode.
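‘\'’ has no example above; the following sketch (assuming GNU ‘sed’) contrasts it with ‘$’ under the ‘M’ flag, mirroring the ‘\`’ example. The second script is in double quotes because it contains a single quote:

```shell
#!/bin/sh
# 'N;N' joins three lines into one pattern space: "a\nb\nc".
# With the M flag, '$' matches at the end of every embedded line.
printf 'a\nb\nc\n' | sed 'N;N;s/$/X/gm'
# -> aX
# -> bX
# -> cX

# \' matches only at the very end of pattern space.
# Inside double quotes "\\'" reaches sed as \'.
printf 'a\nb\nc\n' | sed "N;N;s/\\'/X/gm"
# -> a
# -> b
# -> cX
```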
     2269
     2270
     2271File: sed.info,  Node: Back-references and Subexpressions,  Next: Escapes,  Prev: regexp extensions,  Up: sed regular expressions
     2272
     22735.7 Back-references and Subexpressions
     2274======================================
     2275
     2276“back-references” are regular expression commands which refer to a
     2277previous part of the matched regular expression.  Back-references are
     2278specified with backslash and a single digit (e.g.  ‘\1’).  The part of
     2279the regular expression they refer to is called a “subexpression”, and is
     2280designated with parentheses.
     2281
     2282   Back-references and subexpressions are used in two cases: in the
     2283regular expression search pattern, and in the REPLACEMENT part of the
     2284‘s’ command (*note Regular Expression Addresses: Regexp Addresses. and
     2285*note The "s" Command::).
     2286
     2287   In a regular expression pattern, back-references are used to match
     2288the same content as a previously matched subexpression.  In the
     2289following example, the subexpression is ‘.’, that is, any single
     2290character (being surrounded by parentheses makes it a subexpression).
     2291The back-reference ‘\1’ asks to match the same content (the same
     2292character) as the subexpression.
     2293
     2294   The command below matches three-letter words made of any character,
     2295followed by the letter ‘o’, followed by the same character as the first.
     2296
     2297     $ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
     2298     bob
     2299     mom
     2300     non
     2301     pop
     2302     sos
     2303     tot
     2304     wow
     2305
     2306   Multiple subexpressions are automatically numbered from
     2307left-to-right.  This command searches for 6-letter palindromes (the
     2308first three letters are 3 subexpressions, followed by 3 back-references
     2309in reverse order):
     2310
     2311     $ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
     2312     redder
     2313
     2314   In the ‘s’ command, back-references can be used in the REPLACEMENT
     2315part to refer back to subexpressions in the REGEXP part.
     2316
     2317   The following example uses two subexpressions in the regular
     2318expression to match two space-separated words.  The back-references in
     2319the REPLACEMENT part print the words in a different order:
     2320
     2321     $ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
     2322     The name is Bond, James Bond.
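Back-references can also appear in both parts of one ‘s’ command. As a sketch (assuming GNU ‘sed’, whose ‘\b’ and ‘\w’ are extensions), the following collapses a doubled word; the second input shows that ‘\b’ prevents a false match inside a longer word:

```shell
#!/bin/sh
# \1 in the regexp must repeat what (\w+) matched;
# \1 in the replacement writes that word back once.
echo 'the the cat'  | sed -E 's/\b(\w+) \1\b/\1/'   # -> the cat

# No match here: after "the", the next character of "theory" is a word
# character, so the trailing \b fails and the line is left unchanged.
echo 'the theory'   | sed -E 's/\b(\w+) \1\b/\1/'   # -> the theory
```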
     2323
     2324   When used with alternation, if the group does not participate in the
     2325match then the back-reference makes the whole match fail.  For example,
     2326‘a(.)|b\1’ will not match ‘ba’.  When multiple regular expressions are
     2327given with ‘-e’ or from a file (‘-f FILE’), back-references are local to
     2328each expression.
     2329
     2330
     2331File: sed.info,  Node: Escapes,  Next: Locale Considerations,  Prev: Back-references and Subexpressions,  Up: sed regular expressions
     2332
     23335.8 Escape Sequences - specifying special characters
     2334====================================================
     2335
     2336Until this chapter, we have only encountered escapes of the form ‘\^’,
     2337which tell ‘sed’ not to interpret the circumflex as a special character,
     2338but rather to take it literally.  For example, ‘\*’ matches a single
     2339asterisk rather than zero or more backslashes.
     2340
     2341   This chapter introduces another kind of escape(1)—that is, escapes
     2342that are applied to a character or sequence of characters that
     2343ordinarily are taken literally, and that ‘sed’ replaces with a special
     2344character.  This provides a way of encoding non-printable characters in
     2345patterns in a visible manner.  There is no restriction on the appearance
     2346of non-printing characters in a ‘sed’ script but when a script is being
     2347prepared in the shell or by text editing, it is usually easier to use
     2348one of the following escape sequences than the binary character it
     2349represents.
     2350
     2351   The list of these escapes is:
     2352
     2353‘\a’
     2354     Produces or matches a BEL character, that is an “alert” (ASCII 7).
     2355
     2356‘\f’
     2357     Produces or matches a form feed (ASCII 12).
     2358
     2359‘\n’
     2360     Produces or matches a newline (ASCII 10).
     2361
     2362‘\r’
     2363     Produces or matches a carriage return (ASCII 13).
     2364
     2365‘\t’
     2366     Produces or matches a horizontal tab (ASCII 9).
     2367
     2368‘\v’
     2369     Produces or matches a so-called “vertical tab” (ASCII 11).
     2370
     2371‘\cX’
     2372     Produces or matches ‘CONTROL-X’, where X is any character.  The
     2373     precise effect of ‘\cX’ is as follows: if X is a lower case letter,
     2374     it is converted to upper case.  Then bit 6 of the character (hex
     2375     40) is inverted.  Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex
     2376     3B, while ‘\c;’ becomes hex 7B.
     2377
     2378‘\dXXX’
     2379     Produces or matches a character whose decimal ASCII value is XXX.
     2380
     2381‘\oXXX’
     2382     Produces or matches a character whose octal ASCII value is XXX.
     2383
     2384‘\xXX’
     2385     Produces or matches a character whose hexadecimal ASCII value is
     2386     XX.
     2387
     2388   ‘\b’ (backspace) was omitted because of the conflict with the
     2389existing “word boundary” meaning.
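A short sketch of several of these escapes, assuming GNU ‘sed’. The input tab is produced by the shell's ‘printf’, and each of the ‘\d’, ‘\o’ and ‘\x’ forms below names the same letter ‘b’:

```shell
#!/bin/sh
# \cI is CONTROL-I: 'I' (hex 49) with bit 6 (hex 40) inverted is hex 09, a tab.
printf 'a\tb\n' | sed 's/\cI/<TAB>/'   # -> a<TAB>b

# 'b' is decimal 98, octal 142 and hexadecimal 62.
echo abc | sed 's/\d098/X/'             # -> aXc
echo abc | sed 's/\o142/X/'             # -> aXc
echo abc | sed 's/\x62/X/'              # -> aXc
```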
     2390
     23915.8.1 Escaping Precedence
     2392-------------------------
     2393
     2394GNU ‘sed’ processes escape sequences _before_ passing the text onto the
     2395regular-expression matching of the ‘s///’ command and Address matching.
     2396Thus the following two commands are equivalent (‘0x5e’ is the
     2397hexadecimal ASCII value of the character ‘^’):
     2398
     2399     $ echo 'a^c' | sed 's/^/b/'
     2400     ba^c
     2401
     2402     $ echo 'a^c' | sed 's/\x5e/b/'
     2403     ba^c
     2404
     2405   As are the following (‘0x5b’,‘0x5d’ are the hexadecimal ASCII values
     2406of ‘[’,‘]’, respectively):
     2407
     2408     $ echo abc | sed 's/[a]/x/'
     2409     xbc
     2410     $ echo abc | sed 's/\x5ba\x5d/x/'
     2411     xbc
     2412
     2413   However, it is recommended to avoid such special characters due to
     2414unexpected edge cases.  For example, the following are not equivalent:
     2415
     2416     $ echo 'a^c' | sed 's/\^/b/'
     2417     abc
     2418
     2419     $ echo 'a^c' | sed 's/\\\x5e/b/'
     2420     a^c
     2421
     2422   ---------- Footnotes ----------
     2423
     2424   (1) All the escapes introduced here are GNU extensions, with the
     2425exception of ‘\n’.  In basic regular expression mode, setting
     2426‘POSIXLY_CORRECT’ disables them inside bracket expressions.
     2427
     2428
     2429File: sed.info,  Node: Locale Considerations,  Prev: Escapes,  Up: sed regular expressions
     2430
     24315.9 Multibyte characters and Locale Considerations
     2432==================================================
     2433
     2434GNU ‘sed’ processes valid multibyte characters in multibyte locales
     2435(e.g.  ‘UTF-8’).  (1)
     2436
     2437The following example uses the Greek letter Capital Sigma (Σ, Unicode
     2438code point ‘0x03A3’).  In a ‘UTF-8’ locale, ‘sed’ correctly processes
     2439the Sigma as one character despite it being 2 octets (bytes):
     2440
     2441     $ locale | grep LANG
     2442     LANG=en_US.UTF-8
     2443
     2444     $ printf 'a\u03A3b'
     2445     aΣb
     2446
     2447     $ printf 'a\u03A3b' | sed 's/./X/g'
     2448     XXX
     2449
     2450     $ printf 'a\u03A3b' | od -tx1 -An
     2451      61 ce a3 62
     2452
     2453To force ‘sed’ to process octets separately, use the ‘C’ locale (also
     2454known as the ‘POSIX’ locale):
     2455
     2456     $ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g'
     2457     XXXX
     2458
     24595.9.1 Invalid multibyte characters
     2460----------------------------------
     2461
     2462‘sed’’s regular expressions _do not_ match invalid multibyte sequences
     2463in a multibyte locale.
     2464
     2465In the following examples, the ascii value ‘0xCE’ is an incomplete
     2466multibyte character (shown here as ᅵ).  The regular expression ‘.’ does
     2467not match it:
     2468
     2469     $ printf 'a\xCEb\n'
     2470     aᅵe
     2471
     2472     $ printf 'a\xCEb\n' | sed 's/./X/g'
     2473     XᅵX
     2474
     2475     $ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An
     2476       58  ce  58  0a
     2477        X      X   \n
     2478
     2479Similarly, the ‘catch-all’ regular expression ‘.*’ does not match the
     2480entire line:
     2481
     2482     $ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An
     2483       ce  63  0a
     2484            c  \n
     2485
     2486GNU ‘sed’ offers the special ‘z’ command to clear the current pattern
     2487space regardless of invalid multibyte characters (i.e.  it works like
     2488‘s/.*//’ but also removes invalid multibyte characters):
     2489
     2490     $ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An
     2491        0a
     2492        \n
     2493
     2494Alternatively, force the ‘C’ locale to process each octet separately
     2495(every octet is a valid character in the ‘C’ locale):
     2496
     2497     $ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An
     2498       0a
     2499       \n
     2500
     2501   ‘sed’’s inability to process invalid multibyte characters can be used
     2502to detect such invalid sequences in a file.  In the following examples,
     2503the ‘\xCE\xCE’ is an invalid multibyte sequence, while ‘\xCE\xA3’ is a
     2504valid multibyte sequence (of the Greek Sigma character).
     2505
     2506The following ‘sed’ program removes all valid characters using ‘s/.//g’.
     2507Any content left in the pattern space (the invalid characters) is added
     2508to the hold space using the ‘H’ command.  On the last line (‘$’), the
     2509hold space is retrieved (‘x’), newlines are removed (‘s/\n//g’), and any
     2510remaining octets are printed unambiguously (‘l’).  Thus, any invalid
     2511multibyte sequences are printed as octal values:
     2512
     2513     $ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt
     2514
     2515     $ cat invalid.txt
     2516     ab
     2517     c
     2518     ï¿œï¿œde
     2519     Î£f
     2520
     2521     $ sed -n 's/.//g ; H ; ${x;s/\n//g;l}' invalid.txt
     2522     \316\316$
     2523
     2524With a few more commands, ‘sed’ can print the exact line number
     2525corresponding to each invalid character (line 3).  These characters can
     2526then be removed by forcing the ‘C’ locale and using octal escape
     2527sequences:
     2528
     2529     $ sed -n 's/.//g;=;l' invalid.txt | paste - -  | awk '$2!="$"'
     2530     3       \316\316$
     2531
     2532     $ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt
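To confirm the fix, the detection script can be re-run on the output file. This sketch assumes GNU ‘sed’; it recreates ‘invalid.txt’ with octal escapes (‘\316’ is 0xCE, ‘\243’ is 0xA3) because POSIX ‘printf’ does not guarantee ‘\x’:

```shell
#!/bin/sh
# Same content as invalid.txt above, written with portable octal escapes.
printf 'ab\nc\n\316\316de\n\316\243f\n' > invalid.txt

# Remove the two invalid octets on line 3, byte-wise in the C locale.
LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt

# Line 3 now holds only the valid text.
sed -n 3p fixed.txt                               # -> de

# The detection script now prints only the end-of-line marker.
sed -n 's/.//g ; H ; ${x;s/\n//g;l}' fixed.txt    # -> $
```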
     2533
     25345.9.2 Upper/Lower case conversion
     2535---------------------------------
     2536
     2537GNU ‘sed’’s substitute command (‘s’) supports upper/lower case
     2538conversions using ‘\U’,‘\L’ codes.  These conversions support multibyte
     2539characters:
     2540
     2541     $ printf 'ABC\u03a3\n'
     2542     ABCΣ
     2543
     2544     $ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
     2545     abcσ
     2546
     2547*Note The "s" Command::.
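The ‘\u’ and ‘\U’ codes work the same way on plain ASCII text. A sketch, assuming GNU ‘sed’:

```shell
#!/bin/sh
# \u upper-cases only the next character of the replacement.
echo 'hello world' | sed 's/\w\+/\u&/g'              # -> Hello World

# \U upper-cases until \E, so only the first group changes.
echo 'foo bar' | sed -E 's/(foo) (bar)/\U\1\E \2/'   # -> FOO bar
```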
     2548
     25495.9.3 Multibyte regexp character classes
     2550----------------------------------------
     2551
     2552Outside the ‘C’ locale, the sorting sequence is not specified, and ‘[a-d]’
     2553might be equivalent to ‘[abcd]’ or to ‘[aBbCcDd]’, or it might fail to
     2554match any character, or the set of characters that it matches might even
     2555be erratic.  To obtain the traditional interpretation of bracket
     2556expressions, you can use the ‘C’ locale by setting the ‘LC_ALL’
     2557environment variable to the value ‘C’.
     2558
     2559     # TODO: is there any real-world system/locale where 'A'
     2560     #       is replaced by '-' ?
     2561     $ echo A | sed 's/[a-z]/-/'
     2562     A
     2563
     2564   The interpretation of named character classes depends on the
     2565‘LC_CTYPE’ locale; for example, ‘[[:alnum:]]’ means the character class
     2566of numbers and letters in the current locale.
     2567
     2568   TODO: show example of collation
     2569
     2570     # TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
     2571     $ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
     2572     clichX
     2573
     2574   ---------- Footnotes ----------
     2575
     2576   (1) Some regexp edge cases depend on the operating system and libc
     2577implementation.  The examples shown are known to work as-expected on
     2578GNU/Linux systems using glibc.
     2579
     2580
     2581File: sed.info,  Node: advanced sed,  Next: Examples,  Prev: sed regular expressions,  Up: Top
     2582
     25836 Advanced ‘sed’: cycles and buffers
     2584************************************
     2585
     2586* Menu:
     2587
     2588* Execution Cycle::          How ‘sed’ works
     2589* Hold and Pattern Buffers::
     2590* Multiline techniques::     Using D,G,H,N,P to process multiple lines
     2591* Branching and flow control::
     2592
     2593
     2594File: sed.info,  Node: Execution Cycle,  Next: Hold and Pattern Buffers,  Up: advanced sed
     2595
     25966.1 How ‘sed’ Works
     2597===================
     2598
     2599‘sed’ maintains two data buffers: the active _pattern_ space, and the
     2600auxiliary _hold_ space.  Both are initially empty.
     2601
     2602   ‘sed’ operates by performing the following cycle on each line of
     2603input: first, ‘sed’ reads one line from the input stream, removes any
     2604trailing newline, and places it in the pattern space.  Then commands are
     2605executed; each command can have an address associated with it: addresses
     2606are a kind of condition code, and a command is only executed if the
     2607condition is verified before the command is to be executed.
     2608
     2609   When the end of the script is reached, unless the ‘-n’ option is in
     2610use, the contents of pattern space are printed out to the output stream,
     2611adding back the trailing newline if it was removed.(1)  Then the next
     2612cycle starts for the next input line.
     2613
     2614   Unless special commands (like ‘D’) are used, the pattern space is
     2615deleted between two cycles.  The hold space, on the other hand, keeps
     2616its data between cycles (see commands ‘h’, ‘H’, ‘x’, ‘g’, ‘G’ to move
     2617data between both buffers).
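The cycle described above can be observed with a few one-liners. This sketch assumes GNU ‘sed’ and the ‘seq’ utility (GNU coreutils):

```shell
#!/bin/sh
# Auto-print at the end of each cycle: 'p' prints once more, so lines double.
seq 2 | sed p
# -> 1
# -> 1
# -> 2
# -> 2

# With -n the auto-print is suppressed; only explicit 'p' output appears.
seq 2 | sed -n p
# -> 1
# -> 2

# The hold space survives between cycles: 'H' accumulates every line
# (each preceded by a newline, hence the leading comma), and on the last
# line ('$') the collected text is swapped in, joined and printed.
seq 3 | sed -n 'H;${x;s/\n/,/g;p}'   # -> ,1,2,3
```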
     2618
     2619   ---------- Footnotes ----------
     2620
     2621   (1) Actually, if ‘sed’ prints a line without the terminating newline,
     2622it will nevertheless print the missing newline as soon as more text is
     2623sent to the same output stream, which gives the “least expected
     2624surprise” even though it does not make commands like ‘sed -n p’ exactly
     2625identical to ‘cat’.
     2626
     2627
     2628File: sed.info,  Node: Hold and Pattern Buffers,  Next: Multiline techniques,  Prev: Execution Cycle,  Up: advanced sed
     2629
     26306.2 Hold and Pattern Buffers
     2631============================
     2632
     2633TODO
     2634
     2635
     2636File: sed.info,  Node: Multiline techniques,  Next: Branching and flow control,  Prev: Hold and Pattern Buffers,  Up: advanced sed
     2637
     26386.3 Multiline techniques - using D,G,H,N,P to process multiple lines
     2639====================================================================
     2640
     2641Multiple lines can be processed as one buffer using the ‘D’, ‘G’, ‘H’,
     2642‘N’ and ‘P’ commands.  They are similar to their lowercase counterparts
     2643(‘d’, ‘g’, ‘h’, ‘n’, ‘p’), except that these commands append or subtract
     2644data while respecting embedded newlines, allowing lines to be added to
     2645and removed from the pattern and hold spaces.
     2646
     2647   They operate as follows:
     2648‘D’
     2649     _deletes_ the text in the pattern space up to and including the
     2650     first newline, and restarts the cycle.
     2651
     2652‘G’
     2653     _appends_ the contents of the hold space to the pattern space, with
     2654     a newline before it.
     2655
     2656‘H’
     2657     _appends_ the contents of the pattern space to the hold space, with
     2658     a newline before it.
     2659
     2660‘N’
     2661     _appends_ a newline and the next line of input to the pattern space.
     2662
     2663‘P’
     2664     _prints_ the pattern space up to the first newline.
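Here is a short sketch of ‘G’ and ‘P’. It assumes GNU ‘sed’ and ‘seq’, and the input lines are arbitrary.

```shell
# 'G' appends a newline and the hold space to the pattern space.  The
# hold space is empty here, so each line is followed by an empty line
# (double spacing): 1, empty, 2, empty, 3, empty.
seq 3 | sed G

# 'N' joins two lines; 'P' then prints only the text before the first
# newline:
printf '%s\n' first second | sed -n 'N ; P'
# prints:
# first
```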
     2665
     2666   The following example illustrates the operation of ‘N’ and ‘D’
     2667commands:
     2668
     2669     $ seq 6 | sed -n 'N;l;D'
     2670     1\n2$
     2671     2\n3$
     2672     3\n4$
     2673     4\n5$
     2674     5\n6$
     2675
     2676  1. ‘sed’ starts by reading the first line into the pattern space (i.e.
     2677     ‘1’).
     2678  2. At the beginning of every cycle, the ‘N’ command appends a newline
     2679     and the next line to the pattern space (i.e.  ‘1’, ‘\n’, ‘2’ in the
     2680     first cycle).
     2681  3. The ‘l’ command prints the content of the pattern space
     2682     unambiguously.
     2683  4. The ‘D’ command then removes the content of pattern space up to the
     2684     first newline (leaving ‘2’ at the end of the first cycle).
     2685  5. At the next cycle the ‘N’ command appends a newline and the next
     2686     input line to the pattern space (e.g.  ‘2’, ‘\n’, ‘3’).
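The following is a variant of the example above and assumes GNU ‘sed’. Guarding ‘N’ with the ‘$!’ address keeps it from running on the last line, so the final line is also shown. Without the guard, GNU ‘sed’'s ‘N’ finds no next line on the last cycle and ends the program. This is why the output above stops at ‘5\n6$’.

```shell
# Same loop, but 'N' is skipped on the last line ('$!N').  The final
# cycle reaches 'l' with only "6" in the pattern space; 'D' then acts
# like 'd' (there is no newline) and, with no input left, sed exits.
seq 6 | sed -n '$!N ; l ; D'
# prints:
# 1\n2$
# 2\n3$
# 3\n4$
# 4\n5$
# 5\n6$
# 6$
```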
     2687
     2688   A common technique to process blocks of text such as paragraphs
     2689(instead of line-by-line) is using the following construct:
     2690
     2691     sed '/./{H;$!d} ; x ; s/REGEXP/REPLACEMENT/'
     2692
     2693  1. The first expression, ‘/./{H;$!d}’ operates on all non-empty lines,
     2694     and adds the current line (in the pattern space) to the hold space.
     2695     On all lines except the last, the pattern space is deleted and the
     2696     cycle is restarted.
     2697
     2698  2. The other expressions ‘x’ and ‘s’ are executed only on empty lines
     2699     (i.e.  paragraph separators) and on the last line.  The ‘x’
     2700     command fetches the accumulated lines from the hold space back to
     2701     the pattern space.  The ‘s///’ command then operates on all the
     2702     text in the paragraph (including the embedded newlines).
     2703
     2704   The following example demonstrates this technique:
     2705     $ cat input.txt
     2706     a a a aa aaa
     2707     aaaa aaaa aa
     2708     aaaa aaa aaa
     2709
     2710     bbbb bbb bbb
     2711     bb bb bbb bb
     2712     bbbbbbbb bbb
     2713
     2714     ccc ccc cccc
     2715     cccc ccccc c
     2716     cc cc cc cc
     2717
     2718     $ sed '/./{H;$!d} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt
     2719
     2720     START-->
     2721     a a a aa aaa
     2722     aaaa aaaa aa
     2723     aaaa aaa aaa
     2724     <--END
     2725
     2726     START-->
     2727     bbbb bbb bbb
     2728     bb bb bbb bb
     2729     bbbbbbbb bbb
     2730     <--END
     2731
     2732     START-->
     2733     ccc ccc cccc
     2734     cccc ccccc c
     2735     cc cc cc cc
     2736     <--END
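The same construct can also reflow each paragraph into a single line. This sketch assumes GNU ‘sed’, and the input is arbitrary. It strips the newline that ‘H’ places in front of each accumulated paragraph, then turns the remaining newlines into spaces. The empty separator lines do not appear in the output, because on each empty line the pattern space is replaced by the preceding paragraph.

```shell
# Accumulate each paragraph in the hold space, swap it in on empty lines
# (and on the last line), drop the leading newline added by 'H', and
# join the remaining lines with spaces.
printf '%s\n' 'a a' 'b b' '' 'c c' 'd d' |
  sed '/./{H;$!d} ; x ; s/^\n// ; s/\n/ /g'
# prints:
# a a b b
# c c d d
```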
     2737
     2738   For more annotated examples, *note Text search across multiple
     2739lines:: and *note Line length adjustment::.
     2740
     2741
     2742File: sed.info,  Node: Branching and flow control,  Prev: Multiline techniques,  Up: advanced sed
     2743
     27446.4 Branching and Flow Control
     2745==============================
     2746
     2747The branching commands ‘b’, ‘t’, and ‘T’ enable changing the flow of
     2748‘sed’ programs.
     2749
     2750   By default, ‘sed’ reads an input line into the pattern buffer, then
     2751continues to process all commands in order.  Commands without
     2752addresses affect all lines.  Commands with addresses affect only
     2753matching lines.  *Note Execution Cycle:: and *note Addresses overview::.
     2754
     2755   ‘sed’ does not support a typical ‘if/then’ construct.  Instead, some
     2756commands can be used as conditionals or to change the default flow
     2757control:
     2758
     2759‘d’
     2760     delete (clear) the current pattern space, and restart the program
     2761     cycle without processing the rest of the commands and without
     2762     printing the pattern space.
     2763
     2764‘D’
     2765     delete the contents of the pattern space _up to the first newline_,
     2766     and restart the program cycle without processing the rest of the
     2767     commands and without printing the pattern space.
     2768
     2769‘[addr]X’
     2770‘[addr]{ X ; X ; X }’
     2771‘/regexp/X’
     2772‘/regexp/{ X ; X ; X }’
     2773     Addresses and regular expressions can be used as an ‘if/then’
     2774     conditional: If [ADDR] matches the current pattern space, execute
     2775     the command(s).  For example: The command ‘/^#/d’ means: _if_ the
     2776     current pattern matches the regular expression ‘^#’ (a line
     2777     starting with a hash), _then_ execute the ‘d’ command: delete the
     2778     line without printing it, and restart the program cycle
     2779     immediately.
     2780
     2781‘b’
     2782     branch unconditionally (that is: always jump to a label, skipping
     2783     or repeating other commands, without restarting a new cycle).
     2784     Combined with an address, the branch can be conditionally executed
     2785     on matched lines.
     2786
     2787‘t’
     2788     branch conditionally (that is: jump to a label) _only if_ a ‘s///’
     2789     command has succeeded since the last input line was read or another
     2790     conditional branch was taken.
     2791
     2792‘T’
     2793     similar but opposite to the ‘t’ command: branch only if there have
     2794     been _no_ successful substitutions since the last input line was
     2795     read.
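The following is a brief sketch of ‘t’ and ‘T’. It assumes GNU ‘sed’, since ‘T’ is a GNU extension. The label name ‘x’ and the input words are arbitrary.

```shell
# 't' loop: ':x' marks the start; each pass replaces one 'a', and 'tx'
# jumps back for as long as a substitution succeeded.
echo aaa | sed ':x ; s/a/b/ ; tx'
# prints: bbb

# 'T' without a label jumps to the end of the script when no
# substitution succeeded, skipping the second 's' command.
printf '%s\n' cat dog | sed 's/cat/CAT/ ; T ; s/$/ (changed)/'
# prints:
# CAT (changed)
# dog
```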
     2796
     2797   The following two ‘sed’ programs are equivalent.  The first
     2798(contrived) example uses the ‘b’ command to skip the ‘s///’ command on
     2799lines containing ‘1’.  The second example uses an address with negation
     2800(‘!’) to perform substitution only on desired lines.  The ‘y///’ command
     2801is still executed on all lines:
     2802
     2803     $ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
     2804     a4
     2805     z5
     2806     z6
     2807
     2808     $ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
     2809     a4
     2810     z5
     2811     z6
     2812
     28136.4.1 Branching and Cycles
     2814--------------------------
     2815
     2816The ‘b’, ‘t’ and ‘T’ commands can be followed by a label (typically a
     2817single letter).  Labels are defined with a colon followed by one or more
     2818letters (e.g.  ‘:x’).  If the label is omitted the branch commands
     2819restart the cycle.  Note the difference between branching to a label and
     2820restarting the cycle: when a cycle is restarted, ‘sed’ first prints the
     2821current content of the pattern space, then reads the next input line
     2822into the pattern space; jumping to a label (even if it is at the
     2823beginning of the program) does not print the pattern space and does not
     2824read the next input line.
     2825
     2826   The following program is a no-op.  The ‘b’ command (the only command
     2827in the program) does not have a label, and thus simply restarts the
     2828cycle.  On each cycle, the pattern space is printed and the next input
     2829line is read:
     2830
     2831     $ seq 3 | sed b
     2832     1
     2833     2
     2834     3
     2835
     2836   The following example is an infinite loop: it doesn’t terminate and
     2837doesn’t print anything.  The ‘b’ command jumps to the ‘x’ label, and a
     2838new cycle is never started:
     2839
     2840     $ seq 3 | sed ':x ; bx'
     2841
     2842     # The above command requires gnu sed (which supports additional
     2843     # commands following a label, without a newline). A portable equivalent:
     2844     #     sed -e ':x' -e bx
     2845
     2846   Branching is often complemented with the ‘n’ or ‘N’ commands: both
     2847commands read the next input line into the pattern space without waiting
     2848for the cycle to restart.  Before reading the next input line, ‘n’
     2849prints the current pattern space then empties it, while ‘N’ appends a
     2850newline and the next input line to the pattern space.
     2851
     2852   Consider the following two examples:
     2853
     2854     $ seq 3 | sed ':x ; n ; bx'
     2855     1
     2856     2
     2857     3
     2858
     2859     $ seq 3 | sed ':x ; N ; bx'
     2860     1
     2861     2
     2862     3
     2863
     2864   • Neither example loops forever, despite never starting a new cycle.
     2865
     2866   • In the first example, the ‘n’ command first prints the content of
     2867     the pattern space, empties the pattern space and then reads the
     2868     next input line.
     2869
     2870   • In the second example, the ‘N’ command appends the next input line
     2871     to the pattern space (with a newline).  Lines are accumulated in
     2872     the pattern space until there are no more input lines to read, then
     2873     the ‘N’ command terminates the ‘sed’ program.  When the program
     2874     terminates, the end-of-cycle actions are performed, and the entire
     2875     pattern space is printed.
     2876
     2877   • The second example requires GNU ‘sed’, because it uses the
     2878     non-POSIX-standard behavior of ‘N’.  See the “‘N’ command on the
     2879     last line” paragraph in *note Reporting Bugs::.
     2880
     2881   • To further examine the difference between the two examples, try the
     2882     following commands:
     2883          printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
     2884          printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
     2885          printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
     2886          printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
     2887
     28886.4.2 Branching example: joining lines
     2889--------------------------------------
     2890
     2891As a real-world example of using branching, consider the case of
     2892quoted-printable (https://en.wikipedia.org/wiki/Quoted-printable) files,
     2893typically used to encode email messages.  In these files long lines are
     2894split and marked with a “soft line break” consisting of a single ‘=’
     2895character at the end of the line:
     2896
     2897     $ cat jaques.txt
     2898     All the wor=
     2899     ld's a stag=
     2900     e,
     2901     And all the=
     2902      men and wo=
     2903     men merely =
     2904     players:
     2905     They have t=
     2906     heir exits =
     2907     and their e=
     2908     ntrances;
     2909     And one man=
     2910      in his tim=
     2911     e plays man=
     2912     y parts.
     2913
     2914   The following program uses an address match ‘/=$/’ as a conditional:
     2915If the current pattern space ends with a ‘=’, it reads the next input
     2916line using ‘N’, replaces all ‘=’ characters which are followed by a
     2917newline, and unconditionally branches (‘b’) to the beginning of the
     2918program without restarting a new cycle.  If the pattern space does not
     2919end with ‘=’, the default action is performed: the pattern space is
     2920printed and a new cycle is started:
     2921
     2922     $ sed ':x ; /=$/ { N ; s/=\n//g ; bx }' jaques.txt
     2923     All the world's a stage,
     2924     And all the men and women merely players:
     2925     They have their exits and their entrances;
     2926     And one man in his time plays many parts.
     2927
     2928   Here’s an alternative program with a slightly different approach: On
     2929all lines except the last, ‘N’ appends the line to the pattern space.  A
     2930substitution command then removes soft line breaks (‘=’ at the end of a
     2931line, i.e.  followed by a newline) by replacing them with an empty
     2932string.  _If_ the substitution was successful (meaning the pattern space
     2933contained a line which should be joined), the conditional branch command
     2934‘t’ jumps to the beginning of the program without completing or
     2935restarting the cycle.  If the substitution failed (meaning there were no
     2936soft line breaks), the ‘t’ command will _not_ branch.  Then, ‘P’ will
     2937print the pattern space content until the first newline, and ‘D’ will
     2938delete the pattern space content until the first newline.  (To learn
     2939more about ‘N’, ‘P’ and ‘D’ commands *note Multiline techniques::).
     2940
     2941     $ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
     2942     All the world's a stage,
     2943     And all the men and women merely players:
     2944     They have their exits and their entrances;
     2945     And one man in his time plays many parts.
     2946
     2947   For more line-joining examples *note Joining lines::.
     2948
     2949
     2950File: sed.info,  Node: Examples,  Next: Limitations,  Prev: advanced sed,  Up: Top
     2951
     29527 Some Sample Scripts
     2953*********************
     2954
     2955Here are some ‘sed’ scripts to guide you in the art of mastering ‘sed’.
     2956
     2957* Menu:
     2958
     2959
     2960Useful one-liners:
     2961* Joining lines::
     2962
     2963Some exotic examples:
     2964* Centering lines::
     2965* Increment a number::
     2966* Rename files to lower case::
     2967* Print bash environment::
     2968* Reverse chars of lines::
     2969* Text search across multiple lines::
     2970* Line length adjustment::
     2971* Adding a header to multiple files::
     2972
     2973Emulating standard utilities:
     2974* tac::                             Reverse lines of files
     2975* cat -n::                          Numbering lines
     2976* cat -b::                          Numbering non-blank lines
     2977* wc -c::                           Counting chars
     2978* wc -w::                           Counting words
     2979* wc -l::                           Counting lines
     2980* head::                            Printing the first lines
     2981* tail::                            Printing the last lines
     2982* uniq::                            Make duplicate lines unique
     2983* uniq -d::                         Print duplicated lines of input
     2984* uniq -u::                         Remove all duplicated lines
     2985* cat -s::                          Squeezing blank lines
     2986
     2987
     2988File: sed.info,  Node: Joining lines,  Next: Centering lines,  Up: Examples
     2989
     29907.1 Joining lines
     2991=================
     2992
     2993This section uses ‘N’, ‘D’ and ‘P’ commands to process multiple lines,
     2994and the ‘b’ and ‘t’ commands for branching.  *Note Multiline
     2995techniques:: and *note Branching and flow control::.
     2996
     2997   Join specific lines (e.g.  if lines 2 and 3 need to be joined):
     2998
     2999     $ cat lines.txt
     3000     hello
     3001     hel
     3002     lo
     3003     hello
     3004
     3005     $ sed '2{N;s/\n//;}' lines.txt
     3006     hello
     3007     hello
     3008     hello
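The following related sketch joins every pair of lines and assumes ‘seq’ is available. The ‘$!’ address keeps ‘N’ from running on the last line, so an unpaired final line is printed unchanged.

```shell
# Join lines 1+2, 3+4, ...; an odd final line is left as is.
seq 5 | sed '$!N ; s/\n/ /'
# prints:
# 1 2
# 3 4
# 5
```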
     3009
     3010   Join backslash-continued lines:
     3011
     3012     $ cat 1.txt
     3013     this \
     3014     is \
     3015     a \
     3016     long \
     3017     line
     3018     and another \
     3019     line
     3020
     3021     $ sed -e ':x /\\$/ { N; s/\\\n//g ; bx }'  1.txt
     3022     this is a long line
     3023     and another line
     3024
     3025
     3026     #TODO: The above requires gnu sed.
     3027     #      non-gnu seds need newlines after ':' and 'b'
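Regarding the TODO above, one possible way to avoid the GNU-only syntax is to put each script line in its own ‘-e’ option. The newline between options then ends the labels, and the closing brace stands alone. This is a sketch: it behaves as shown with GNU ‘sed’, and other POSIX ‘sed’ implementations are assumed (not verified here) to accept it too.

```shell
# One script line per -e option: ':x' and 'bx' are terminated by the
# newline between options, and '}' is on a line of its own.
printf '%s\n' 'this \' 'is \' 'a line' 'and another \' 'line' |
  sed -e ':x' -e '/\\$/{' -e 'N' -e 's/\\\n//g' -e 'bx' -e '}'
# prints:
# this is a line
# and another line
```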
     3028
     3029   Join lines that start with whitespace (e.g. SMTP headers):
     3030
     3031     $ cat 2.txt
     3032     Subject: Hello
     3033         World
     3034     Content-Type: multipart/alternative;
     3035         boundary=94eb2c190cc6370f06054535da6a
     3036     Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
     3037     Authentication-Results: mx.gnu.org;
     3038            dkim=pass header.i=@gnu.org;
     3039            spf=pass
     3040     Message-ID: <abcdef@gnu.org>
     3041     From: John Doe <jdoe@gnu.org>
     3042     To: Jane Smith <jsmith@gnu.org>
     3043
     3044     $ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
     3045     Subject: Hello World
     3046     Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
     3047     Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
     3048     Authentication-Results: mx.gnu.org; dkim=pass header.i=@gnu.org; spf=pass
     3049     Message-ID: <abcdef@gnu.org>
     3050     From: John Doe <jdoe@gnu.org>
     3051     To: Jane Smith <jsmith@gnu.org>
     3052
     3053     # A portable (non-gnu) variation:
     3054     #   sed -e :a -e '$!N;s/\n  */ /;ta' -e 'P;D'
     3055
     3056
     3057File: sed.info,  Node: Centering lines,  Next: Increment a number,  Prev: Joining lines,  Up: Examples
     3058
     30597.2 Centering Lines
     3060===================
     3061
     3062This script centers all lines of a file within a width of 80 columns.
     3063To change that width, the number in ‘\{...\}’ must be replaced, and the
     3064number of added spaces must also be changed.
     3065
     3066   Note how the buffer commands are used to separate parts in the
     3067regular expressions to be matched; this is a common technique.
     3068
     3069     #!/usr/bin/sed -f
     3070
     3071     # Put 80 spaces in the buffer
     3072     1 {
     3073       x
     3074       s/^$/          /
     3075       s/^.*$/&&&&&&&&/
     3076       x
     3077     }
     3078
     3079     # turn tabs (<TAB> is a literal tab) into spaces, then trim both ends
     3080     y/<TAB>/ /
     3081     s/^ *//
     3082     s/ *$//
     3083
     3084     # add a newline and 80 spaces to end of line
     3085     G
     3086
     3087     # keep first 81 chars (80 + a newline)
     3088     s/^\(.\{81\}\).*$/\1/
     3089
     3090     # \2 matches half of the spaces, which are moved to the beginning
     3091     s/^\(.*\)\n\(.*\)\2/\2\1/
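Here is a scaled-down sketch of the same steps, using a hypothetical width of 10 columns and leaving out the trimming of spaces. The hold space receives 10 spaces on the first line, and ‘G’ appends them. The ‘\{11\}’ step keeps the text, the newline and the remaining padding. Finally, ‘\(.*\)\2’ moves half of that padding in front of the text.

```shell
# Center "hi" within a hypothetical width of 10 columns (GNU sed assumed).
echo 'hi' |
  sed -e '1{x;s/^$/          /;x;}' \
      -e 'G' \
      -e 's/^\(.\{11\}\).*$/\1/' \
      -e 's/^\(.*\)\n\(.*\)\2/\2\1/'
# prints "    hi" (four leading spaces)
```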
     3092
     3093
     3094File: sed.info,  Node: Increment a number,  Next: Rename files to lower case,  Prev: Centering lines,  Up: Examples
     3095
     30967.3 Increment a Number
     3097======================
     3098
     3099This script is one of a few that demonstrate how to do arithmetic in
     3100‘sed’.  This is indeed possible,(1) but must be done manually.
     3101
     3102   To increment a number, you just add 1 to the last digit, replacing it
     3103with the following digit.  There is one exception: when the digit is a
     3104nine, it becomes a zero and the previous digit must be incremented as
     3105well, repeating until you reach a digit that is not a nine.
     3106
     3107   This solution by Bruno Haible is very clever because it
     3108uses a single buffer; if you don’t have this limitation, the algorithm
     3109used in *note Numbering lines: cat -n, is faster.  It works by replacing
     3110trailing nines with an underscore, then using multiple ‘s’ commands to
     3111increment the last digit, and then again substituting underscores with
     3112zeros.
     3113
     3114     #!/usr/bin/sed -f
     3115
     3116     /[^0-9]/ d
     3117
     3118     # replace all trailing 9s by _ (any other character except digits, could
     3119     # be used)
     3120     :d
     3121     s/9\(_*\)$/_\1/
     3122     td
     3123
     3124     # incr last digit only.  The first line adds a most-significant
     3125     # digit of 1 if we have to add a digit.
     3126
     3127     s/^\(_*\)$/1\1/; tn
     3128     s/8\(_*\)$/9\1/; tn
     3129     s/7\(_*\)$/8\1/; tn
     3130     s/6\(_*\)$/7\1/; tn
     3131     s/5\(_*\)$/6\1/; tn
     3132     s/4\(_*\)$/5\1/; tn
     3133     s/3\(_*\)$/4\1/; tn
     3134     s/2\(_*\)$/3\1/; tn
     3135     s/1\(_*\)$/2\1/; tn
     3136     s/0\(_*\)$/1\1/; tn
     3137
     3138     :n
     3139     y/_/0/
     3140
     3141   ---------- Footnotes ----------
     3142
     3143   (1) ‘sed’ guru Greg Ubben wrote an implementation of the ‘dc’ RPN
     3144calculator!  It is distributed together with sed.
     3145
     3146
     3147File: sed.info,  Node: Rename files to lower case,  Next: Print bash environment,  Prev: Increment a number,  Up: Examples
     3148
     31497.4 Rename Files to Lower Case
     3150==============================
     3151
     3152This is a pretty strange use of ‘sed’.  We transform text into shell
     3153commands, then just feed them to the shell.  Don’t worry,
     3154even worse hacks are done when using ‘sed’; I have seen a script
     3155converting the output of ‘date’ into a ‘bc’ program!
     3156
     3157   The main body of this is the ‘sed’ script, which remaps the name from
     3158lower to upper (or vice-versa) and even checks whether the remapped name
     3159is the same as the original name.  Note how the script is parameterized
     3160using shell variables and proper quoting.
     3161
     3162     #! /bin/sh
     3163     # rename files to lower/upper case...
     3164     #
     3165     # usage:
     3166     #    move-to-lower *
     3167     #    move-to-upper *
     3168     # or
     3169     #    move-to-lower -R .
     3170     #    move-to-upper -R .
     3171     #
     3172
     3173     help()
     3174     {
     3175             cat << eof
     3176     Usage: $0 [-n] [-R] [-h] files...
     3177
     3178     -n      do nothing, only see what would be done
     3179     -R      recursive (use find)
     3180     -h      this message
     3181     files   files to remap to lower case
     3182
     3183     Examples:
     3184            $0 -n *        (see if everything is ok, then...)
     3185            $0 *
     3186
     3187            $0 -R .
     3188
     3189     eof
     3190     }
     3191
     3192     apply_cmd='sh'
     3193     finder='echo "$@" | tr " " "\n"'
     3194     files_only=
     3195
     3196     while :
     3197     do
     3198         case "$1" in
     3199             -n) apply_cmd='cat' ;;
     3200             -R) finder='find "$@" -type f';;
     3201             -h) help ; exit 1 ;;
     3202             *) break ;;
     3203         esac
     3204         shift
     3205     done
     3206
     3207     if [ -z "$1" ]; then
     3208             echo Usage: $0 [-h] [-n] [-R] files...
     3209             exit 1
     3210     fi
     3211
     3212     LOWER='abcdefghijklmnopqrstuvwxyz'
     3213     UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
     3214
     3215     case `basename $0` in
     3216             *upper*) TO=$UPPER; FROM=$LOWER ;;
     3217             *)       FROM=$UPPER; TO=$LOWER ;;
     3218     esac
     3219
     3220     eval $finder | sed -n '
     3221
     3222     # remove all trailing slashes
     3223     s/\/*$//
     3224
     3225     # add ./ if there is no path, only a filename
     3226     /\//! s/^/.\//
     3227
     3228     # save path+filename
     3229     h
     3230
     3231     # remove path
     3232     s/.*\///
     3233
     3234     # do conversion only on filename
     3235     y/'$FROM'/'$TO'/
     3236
     3237     # now line contains original path+file, while
     3238     # hold space contains the new filename
     3239     x
     3240
     3241     # add converted file name to line, which now contains
     3242     # path/file-name\nconverted-file-name
     3243     G
     3244
     3245     # check if converted file name is equal to original file name,
     3246     # if it is, do not print anything
     3247     /^.*\/\(.*\)\n\1/b
     3248
     3249     # escape special characters for the shell
     3250     s/["$`\\]/\\&/g
     3251
     3252     # now, transform path/fromfile\n, into
     3253     # mv path/fromfile path/tofile and print it
     3254     s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
     3255
     3256     ' | $apply_cmd
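The heart of the script is the ‘h’/‘x’/‘G’ sequence. The following reduced sketch covers just that part, using the hypothetical path ‘dir/FILE’ and assuming GNU ‘sed’.

```shell
# 'h' saves the full path, the filename is lowercased in the pattern
# space, 'x' swaps the two buffers, and 'G' appends the converted name
# after a newline.
echo 'dir/FILE' |
  sed 'h ; s/.*\/// ; y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/ ; x ; G'
# prints:
# dir/FILE
# file
```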
     3257
     3258
     3259File: sed.info,  Node: Print bash environment,  Next: Reverse chars of lines,  Prev: Rename files to lower case,  Up: Examples
     3260
     32617.5 Print ‘bash’ Environment
     3262============================
     3263
     3264This script strips the definition of the shell functions from the output
     3265of the ‘set’ Bourne-shell command.
     3266
     3267     #!/bin/sh
     3268
     3269     set | sed -n '
     3270     :x
     3271
     3272     # if no occurrence of "=()" print and load next line
     3273     /=()/! { p; b; }
     3274     / () $/! { p; b; }
     3275
     3276     # possible start of functions section
     3277     # save the line in case this is a var like FOO="() "
     3278     h
     3279
     3280     # if the next line has a brace, we quit because
     3281     # nothing comes after functions
     3282     n
     3283     /^{/ q
     3284
     3285     # print the old line
     3286     x; p
     3287
     3288     # work on the new line now
     3289     x; bx
     3290     '
     3291
     3292
     3293File: sed.info,  Node: Reverse chars of lines,  Next: Text search across multiple lines,  Prev: Print bash environment,  Up: Examples
     3294
     32957.6 Reverse Characters of Lines
     3296===============================
     3297
     3298This script can be used to reverse the position of characters in lines.
     3299The technique moves two characters at a time, hence it is faster than
     3300more intuitive implementations.
     3301
     3302   Note the ‘tx’ command before the definition of the label.  This is
     3303often needed to reset the flag that is tested by the ‘t’ command.
     3304
     3305   Imaginative readers will find uses for this script.  An example is
     3306reversing the output of ‘banner’.(1)
     3307
     3308     #!/usr/bin/sed -f
     3309
     3310     /../! b
     3311
     3312     # Reverse a line.  Begin embedding the line between two newlines
     3313     s/^.*$/\
     3314     &\
     3315     /
     3316
     3317     # Move first character at the end.  The regexp matches until
     3318     # there are zero or one characters between the markers
     3319     tx
     3320     :x
     3321     s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
     3322     tx
     3323
     3324     # Remove the newline markers
     3325     s/\n//g
     3326
     3327   ---------- Footnotes ----------
     3328
     3329   (1) This requires another script to pad the output of banner; for
     3330example
     3331
     3332     #! /bin/sh
     3333
     3334     banner -w $1 $2 $3 $4 |
     3335       sed -e :a -e '/^.\{0,'$1'\}$/ { s/$/ /; ba; }' |
     3336       ~/sedscripts/reverseline.sed
     3337
     3338
     3339File: sed.info,  Node: Text search across multiple lines,  Next: Line length adjustment,  Prev: Reverse chars of lines,  Up: Examples
     3340
     33417.7 Text search across multiple lines
     3342=====================================
     3343
     3344This section uses ‘N’ and ‘D’ commands to search for consecutive words
     3345spanning multiple lines.  *Note Multiline techniques::.
     3346
     3347   These examples deal with finding doubled occurrences of words in a
     3348document.
     3349
     3350   Finding doubled words in a single line is easy using GNU ‘grep’ and
     3351similarly with GNU ‘sed’:
     3352
     3353     $ cat two-cities-dup1.txt
     3354     It was the best of times,
     3355     it was the worst of times,
     3356     it was the the age of wisdom,
     3357     it was the age of foolishness,
     3358
     3359     $ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
     3360     it was the the age of wisdom,
     3361
     3362     $ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
     3363     3:it was the the age of wisdom,
     3364
     3365     $ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
     3366     it was the the age of wisdom,
     3367
     3368     $ sed -En '/\b(\w+)\s+\1\b/{=;p}' two-cities-dup1.txt
     3369     3
     3370     it was the the age of wisdom,
     3371
     3372   • The regular expression ‘\b\w+\s+’ searches for a word boundary
     3373     (‘\b’), followed by one-or-more word-characters (‘\w+’), followed
     3374     by whitespace (‘\s+’).  *Note regexp extensions::.
     3375
     3376   • Adding parentheses around the ‘(\w+)’ expression creates a
     3377     subexpression.  The regular expression pattern ‘(PATTERN)\s+\1’
     3378     defines a subexpression (in the parentheses) followed by a
     3379     back-reference, separated by whitespace.  A successful match means
     3380     the PATTERN was repeated twice in succession.  *Note
     3381     Back-references and Subexpressions::.
     3382
     3383   • The word-boundary expression (‘\b’) at both ends ensures partial
     3384     words are not matched (e.g.  ‘the then’ is not a desired match).
     3385
     3386   • The ‘-E’ option enables extended regular expression syntax,
     3387     alleviating the need to add backslashes before the parentheses.
     3388     *Note ERE syntax::.
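The following is a quick sketch of the word-boundary point above. It assumes GNU ‘sed’, since ‘\b’, ‘\w’ and ‘\s’ are GNU extensions.

```shell
# With '\b' at both ends, a whole word must be repeated, so "the then"
# does not match and nothing is printed:
echo 'the then' | sed -En '/\b(\w+)\s+\1\b/p'

# Without the trailing '\b', the back-reference matches the start of
# "then", so the line is printed:
echo 'the then' | sed -En '/\b(\w+)\s+\1/p'
# prints: the then
```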
     3389
     3390   When the doubled words span two lines, the above regular expression
     3391will not find them, as ‘grep’ and ‘sed’ operate line-by-line.
     3392
     3393   By using ‘N’ and ‘D’ commands, ‘sed’ can apply regular expressions on
     3394multiple lines (that is, multiple lines are stored in the pattern space,
     3395and the regular expression works on it):
     3396
     3397     $ cat two-cities-dup2.txt
     3398     It was the best of times, it was the
     3399     worst of times, it was the
     3400     the age of wisdom,
     3401     it was the age of foolishness,
     3402
     3403     $ sed -En '{N; /\b(\w+)\s+\1\b/{=;p} ; D}'  two-cities-dup2.txt
     3404     3
     3405     worst of times, it was the
     3406     the age of wisdom,
     3407
     3408   • The ‘N’ command appends the next line to the pattern space (thus
     3409     ensuring it contains two consecutive lines in every cycle).
     3410
     3411   • The regular expression uses ‘\s+’ as the word separator, which
     3412     matches both spaces and newlines.
     3413
     3414   • If the regular expression matches, the entire pattern space is printed
     3415     with ‘p’.  No lines are printed by default due to the ‘-n’ option.
     3416
     3417   • The ‘D’ removes the first line from the pattern space (up until the
     3418     first newline), readying it for the next cycle.
     3419
     3420   See the GNU ‘coreutils’ manual for an alternative solution using ‘tr
     3421-s’ and ‘uniq’ at
     3422<https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html>.
     3423
     3424
     3425File: sed.info,  Node: Line length adjustment,  Next: Adding a header to multiple files,  Prev: Text search across multiple lines,  Up: Examples
     3426
     34277.8 Line length adjustment
     3428==========================
     3429
     3430This section uses ‘N’ and ‘P’ commands to read and write lines, and the
     3431‘b’ command for branching.  *Note Multiline techniques:: and *note
     3432Branching and flow control::.
     3433
     3434   This (somewhat contrived) example deals with formatting and wrapping
     3435lines of text of the following input file:
     3436
     3437     $ cat two-cities-mix.txt
     3438     It was the best of times, it was
     3439     the worst of times, it
     3440     was the age of
     3441     wisdom,
     3442     it
     3443     was
     3444     the age
     3445     of foolishness,
     3446
     3447The following sed program wraps lines at 40 characters:
     3448     $ cat wrap40.sed
     3449     # outer loop
     3450     :x
     3451
     3452     # Append a newline followed by the next input line to the pattern buffer
     3453     N
     3454
     3455     # Replace all newlines in the pattern buffer with spaces
     3456     s/\n/ /g
     3457
     3458
     3459     # Inner loop
     3460     :y
     3461
     3462     # Add a newline after the first 40 characters
     3463     s/(.{40,40})/\1\n/
     3464
     3465     # If there is a newline in the pattern buffer
     3466     # (i.e. the previous substitution added a newline)
     3467     /\n/ {
     3468         # There are newlines in the pattern buffer -
     3469         # print the content until the first newline.
     3470         P
     3471
     3472         # Remove the printed characters and the first newline
     3473         s/.*\n//
     3474
     3475         # branch to label 'y' - repeat inner loop
     3476         by
     3477     }
     3478
     3479     # No newlines in the pattern buffer - Branch to label 'x' (outer loop)
     3480     # and read the next input line
     3481     bx
     3482
     3483The wrapped output:
     3484     $ sed -E -f wrap40.sed two-cities-mix.txt
     3485     It was the best of times, it was the wor
     3486     st of times, it was the age of wisdom, i
     3487     t was the age of foolishness,
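The width is hard-coded in ‘wrap40.sed’.  As a sketch, the same outer and inner loops can take the width as an argument; the shell function name ‘wrap’ is hypothetical, and GNU ‘sed’ is assumed both for ‘\n’ in the replacement and for ‘N’ printing the pattern space when there is no next line.

```shell
# wrap WIDTH: join input lines and re-break them every WIDTH characters,
# using the same outer (:x) and inner (:y) loops as wrap40.sed.
wrap() {
  width=$1
  sed -E "
:x
N
s/\n/ /g
:y
s/(.{$width})/\1\n/
/\n/ {
  P
  s/.*\n//
  by
}
bx
"
}

printf 'abcdefgh\nij\n' | wrap 5
# Output:
# abcde
# fgh i
# j
```

Like ‘wrap40.sed’, this never wraps a single-line input: ‘N’ fails on the first (and last) line, so that line is printed unchanged.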
     3488
     3489
     3490File: sed.info,  Node: Adding a header to multiple files,  Next: tac,  Prev: Line length adjustment,  Up: Examples
     3491
     34927.9 Adding a header to multiple files
     3493=====================================
     3494
     3495GNU ‘sed’ can be used to safely modify multiple files at once.
     3496
     3497Add a single line to the beginning of source code files:
     3498
     3499     sed -i '1i/* Copyright (C) FOO BAR */' *.c
     3500
     3501Adding a few lines is possible using ‘\n’ in the text:
     3502
     3503     sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c
     3504
     3505   To add multiple lines from another file, use ‘0rFILE’.  A typical use
     3506case is adding a license notice header to all files:
     3507
     3508     ## Create the header file:
     3509     $ cat<<'EOF'>LIC.TXT
     3510     /*
     3511         Copyright (C) 1989-2021 FOO BAR
     3512
     3513         This program is free software; you can redistribute it and/or modify
     3514         it under the terms of the GNU General Public License as published by
     3515         the Free Software Foundation; either version 3, or (at your option)
     3516         any later version.
     3517
     3518         This program is distributed in the hope that it will be useful,
     3519         but WITHOUT ANY WARRANTY; without even the implied warranty of
     3520         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     3521         GNU General Public License for more details.
     3522
     3523         You should have received a copy of the GNU General Public License
     3524         along with this program; If not, see <https://www.gnu.org/licenses/>.
     3525     */
     3526     EOF
     3527
     3528     ## Add the file at the beginning of all source code files:
     3529     $ sed -i '0rLIC.TXT' *.cpp *.h
     3530
     3531   With script files (e.g.  ‘.sh’, ‘.py’, ‘.pl’ files) the license notice
     3532typically appears _after_ the first line (the ‘shebang’ ‘#!’ line).  The
     3533‘1rFILE’ command will add ‘FILE’ _after_ the first line:
     3534
     3535     ## Create the header file:
     3536     $ cat<<'EOF'>LIC.TXT
     3537     ##
     3538     ## Copyright (C) 1989-2021 FOO BAR
     3539     ##
     3540     ## This program is free software; you can redistribute it and/or modify
     3541     ## it under the terms of the GNU General Public License as published by
     3542     ## the Free Software Foundation; either version 3, or (at your option)
     3543     ## any later version.
     3544     ##
     3545     ## This program is distributed in the hope that it will be useful,
     3546     ## but WITHOUT ANY WARRANTY; without even the implied warranty of
     3547     ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     3548     ## GNU General Public License for more details.
     3549     ##
     3550     ## You should have received a copy of the GNU General Public License
     3551     ## along with this program; If not, see <https://www.gnu.org/licenses/>.
     3552     ##
     3553     ##
     3554     EOF
     3555
     3556     ## Add the file after the first line of all script files:
     3557     $ sed -i '1rLIC.TXT' *.py *.sh
     3558
     3559   The above ‘sed’ commands can be combined with ‘find’ to locate files
     3560in all subdirectories, ‘xargs’ to run additional commands on selected
     3561files and ‘grep’ to filter out files that already contain a copyright
     3562notice:
     3563
     3564     find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
     3565         | xargs grep -Li copyright \
     3566         | xargs -r sed -i '0rLIC.TXT'
     3567
     3568Or a slightly safer version (handling file names with spaces and newlines):
     3569
     3570     find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
     3571         | xargs -0 grep -Z -Li copyright \
     3572         | xargs -0 -r sed -i '0rLIC.TXT'
     3573
     3574   Note: using the ‘0’ address with the ‘r’ command requires GNU ‘sed’
     3575version 4.9 or later.  *Note Zero Address::.
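The ‘grep -L’ filter above can also be written as a plain shell loop, which makes the insertion idempotent: running it twice adds the notice only once.  A minimal sketch, assuming GNU ‘sed -i’ and the GNU one-line form of ‘i’; the function name ‘add_notice’ is hypothetical.

```shell
# add_notice FILE...: insert a one-line notice before line 1 of each FILE,
# skipping files that already mention "copyright" (in any letter case).
add_notice() {
  for f in "$@"; do
    if grep -qi copyright "$f"; then
      continue                  # already has a notice: leave it alone
    fi
    sed -i '1i/* Copyright (C) FOO BAR */' "$f"
  done
}

# Example: add_notice *.c
```

As with the ‘1i’ one-liner, an empty file gets no notice, because it has no line 1 for ‘i’ to act on.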
     3576
     3577
     3578File: sed.info,  Node: tac,  Next: cat -n,  Prev: Adding a header to multiple files,  Up: Examples
     3579
     35807.10 Reverse Lines of Files
     3581===========================
     3582
     3583This one begins a series of totally useless (yet interesting) scripts
     3584emulating various Unix commands.  This, in particular, is a ‘tac’
     3585workalike.
     3586
     3587   Note that on implementations other than GNU ‘sed’ this script might
     3588easily overflow internal buffers.
     3589
     3590     #!/usr/bin/sed -nf
     3591
     3592     # reverse all lines of input, i.e. first line becomes last, ...
     3593
     3594     # from the second line, the buffer (which contains all previous lines)
     3595     # is *appended* to current line, so, the order will be reversed
     3596     1! G
     3597
     3598     # on the last line we're done -- print everything
     3599     $ p
     3600
     3601     # store everything on the buffer again
     3602     h
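The same three commands also fit on one command line, which makes it easy to try the script on a few lines of input:

```shell
# 1!G: append the saved lines after the current one (except on line 1)
# $p:  on the last line, print the accumulated, reversed text
# h:   save everything to hold space for the next cycle
printf '1\n2\n3\n' | sed -n '1!G;$p;h'
# Output:
# 3
# 2
# 1
```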
     3603
     3604
     3605File: sed.info,  Node: cat -n,  Next: cat -b,  Prev: tac,  Up: Examples
     3606
     36077.11 Numbering Lines
     3608====================
     3609
     3610This script replaces ‘cat -n’; in fact it formats its output almost exactly
     3611like GNU ‘cat’ does, except that GNU ‘cat’ puts a TAB, not two spaces, after the number.
     3612
     3613   Of course this is completely useless, for two reasons: first,
     3614because somebody else did it in C; second, because the following
     3615Bourne-shell script could be used for the same purpose and would be much
     3616faster:
     3617
     3618     #! /bin/sh
     3619     sed -e "=" "$@" | sed -e '
     3620       s/^/      /
     3621       N
     3622       s/^ *\(......\)\n/\1  /
     3623     '
     3624
     3625   It uses ‘sed’ to print the line number, then groups lines two by two
     3626using ‘N’.  Of course, this script does not teach as much as the one
     3627presented below.
     3628
     3629   The algorithm used for incrementing uses both buffers, so the line is
     3630printed as soon as possible and then discarded.  The number is split so
     3631that changing digits go in a buffer and unchanged ones go in the other;
     3632the changed digits are modified in a single step (using a ‘y’ command).
     3633The line number for the next line is then composed and stored in the
     3634hold space, to be used in the next iteration.
     3635
     3636     #!/usr/bin/sed -nf
     3637
     3638     # Prime the pump on the first line
     3639     x
     3640     /^$/ s/^.*$/1/
     3641
     3642     # Add the correct line number before the pattern
     3643     G
     3644     h
     3645
     3646     # Format it and print it
     3647     s/^/      /
     3648     s/^ *\(......\)\n/\1  /p
     3649
     3650     # Get the line number from hold space; add a zero
     3651     # if we're going to add a digit on the next line
     3652     g
     3653     s/\n.*$//
     3654     /^9*$/ s/^/0/
     3655
     3656     # separate changing/unchanged digits with an x
     3657     s/.9*$/x&/
     3658
     3659     # keep changing digits in hold space
     3660     h
     3661     s/^.*x//
     3662     y/0123456789/1234567890/
     3663     x
     3664
     3665     # keep unchanged digits in pattern space
     3666     s/x.*$//
     3667
     3668     # compose the new number, remove the newline implicitly added by G
     3669     G
     3670     s/\n//
     3671     h
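The increment step can be tried on its own.  In the sketch below (the function name ‘incr’ is hypothetical), a bare number goes through the same commands the script applies after ‘s/\n.*$//’.  The key is ‘s/.9*$/x&/’: it marks the rightmost digit that is not a 9, together with the 9s after it, which are exactly the digits that change; the ‘/^9*$/’ line first adds a leading zero so that such a digit always exists.

```shell
# incr: add 1 to a decimal number using only sed commands
incr() {
  sed '
    # all nines: make room for an extra digit (99 -> 099)
    /^9*$/ s/^/0/
    # mark where the changing digits start (129 -> 1x29)
    s/.9*$/x&/
    # copy to hold space, keep only the changing digits, increment them
    h
    s/^.*x//
    y/0123456789/1234567890/
    # swap: incremented digits go to hold space, the marked copy returns
    x
    # keep only the unchanged digits
    s/x.*$//
    # append the incremented digits and drop the newline added by G
    G
    s/\n//
  '
}

echo 129 | incr   # prints 130
echo 999 | incr   # prints 1000
```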
     3672
     3673
     3674File: sed.info,  Node: cat -b,  Next: wc -c,  Prev: cat -n,  Up: Examples
     3675
     36767.12 Numbering Non-blank Lines
     3677==============================
     3678
     3679Emulating ‘cat -b’ is almost the same as ‘cat -n’—we only have to select
     3680which lines are to be numbered and which are not.
     3681
     3682   The part that is common to this script and the previous one is not
     3683commented to show how important it is to comment ‘sed’ scripts
     3684properly...
     3685
     3686     #!/usr/bin/sed -nf
     3687
     3688     /^$/ {
     3689       p
     3690       b
     3691     }
     3692
     3693     # Same as cat -n from now
     3694     x
     3695     /^$/ s/^.*$/1/
     3696     G
     3697     h
     3698     s/^/      /
     3699     s/^ *\(......\)\n/\1  /p
     3700     x
     3701     s/\n.*$//
     3702     /^9*$/ s/^/0/
     3703     s/.9*$/x&/
     3704     h
     3705     s/^.*x//
     3706     y/0123456789/1234567890/
     3707     x
     3708     s/x.*$//
     3709     G
     3710     s/\n//
     3711     h
     3712
     3713
     3714File: sed.info,  Node: wc -c,  Next: wc -w,  Prev: cat -b,  Up: Examples
     3715
     37167.13 Counting Characters
     3717========================
     3718
     3719This script shows another way to do arithmetic with ‘sed’.  In this case
     3720we have to add possibly large numbers, so implementing this by
     3721successive increments would not be feasible (and possibly even more
     3722complicated to contrive than this script).
     3723
     3724   The approach is to map numbers to letters, kind of an abacus
     3725implemented with ‘sed’.  ‘a’s are units, ‘b’s are tens and so on: we
     3726simply add the number of characters on the current line as units, and
     3727then propagate the carry to tens, hundreds, and so on.
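To see the carry on its own, the sketch below builds the number 123 as 123 ‘a’s with a small shell loop and applies the first two carry substitutions of the script.  The result, ‘cbbaaa’, reads as one hundred, two tens and three units:

```shell
# Build a string of 123 'a's, i.e. the number 123 in "abacus" form.
units=
n=0
while [ "$n" -lt 123 ]; do
  units="${units}a"
  n=$((n + 1))
done

# Ten units make a ten (b), ten tens make a hundred (c).
echo "$units" | sed -e 's/aaaaaaaaaa/b/g' -e 's/bbbbbbbbbb/c/g'
# Output: cbbaaa
```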
     3728
     3729   As usual, running totals are kept in hold space.
     3730
     3731   On the last line, we convert the abacus form back to decimal.  For
     3732the sake of variety, this is done with a loop rather than with some 80
     3733‘s’ commands(1): first we convert units, removing ‘a’s from the number;
     3734then we rotate letters so that tens become ‘a’s, and so on until no more
     3735letters remain.
     3736
     3737     #!/usr/bin/sed -nf
     3738
     3739     # Add n+1 a's to hold space (+1 is for the newline)
     3740     s/./a/g
     3741     H
     3742     x
     3743     s/\n/a/
     3744
     3745     # Do the carry.  The t's and b's are not necessary,
     3746     # but they do speed up the thing
     3747     t a
     3748     : a;  s/aaaaaaaaaa/b/g; t b; b done
     3749     : b;  s/bbbbbbbbbb/c/g; t c; b done
     3750     : c;  s/cccccccccc/d/g; t d; b done
     3751     : d;  s/dddddddddd/e/g; t e; b done
     3752     : e;  s/eeeeeeeeee/f/g; t f; b done
     3753     : f;  s/ffffffffff/g/g; t g; b done
     3754     : g;  s/gggggggggg/h/g; t h; b done
     3755     : h;  s/hhhhhhhhhh//g
     3756
     3757     : done
     3758     $! {
     3759       h
     3760       b
     3761     }
     3762
     3763     # On the last line, convert back to decimal
     3764
     3765     : loop
     3766     /a/! s/[b-h]*/&0/
     3767     s/aaaaaaaaa/9/
     3768     s/aaaaaaaa/8/
     3769     s/aaaaaaa/7/
     3770     s/aaaaaa/6/
     3771     s/aaaaa/5/
     3772     s/aaaa/4/
     3773     s/aaa/3/
     3774     s/aa/2/
     3775     s/a/1/
     3776
     3777     : next
     3778     y/bcdefgh/abcdefg/
     3779     /[a-h]/ b loop
     3780     p
     3781
     3782   ---------- Footnotes ----------
     3783
     3784   (1) Some implementations have a limit of 199 commands per script
     3785
     3786
     3787File: sed.info,  Node: wc -w,  Next: wc -l,  Prev: wc -c,  Up: Examples
     3788
     37897.14 Counting Words
     3790===================
     3791
     3792This script is almost the same as the previous one, once each of the
     3793words on the line is converted to a single ‘a’ (in the previous script
     3794each letter was changed to an ‘a’).
     3795
     3796   It is interesting that real ‘wc’ programs have optimized loops for
     3797‘wc -c’, so they are much slower at counting words than at counting
     3798characters.  This script’s bottleneck, instead, is arithmetic, and hence
     3799the word-counting one is faster (it has to manage smaller numbers).
     3800
     3801   Again, the common parts are not commented to show the importance of
     3802commenting ‘sed’ scripts.
     3803
     3804     #!/usr/bin/sed -nf
     3805
     3806     # Convert words to a's (<TAB> stands for a literal tab character)
     3807     s/[ <TAB>][ <TAB>]*/ /g
     3808     s/^/ /
     3809     s/ [^ ][^ ]*/a /g
     3810     s/ //g
     3811
     3812     # Append them to hold space
     3813     H
     3814     x
     3815     s/\n//
     3816
     3817     # From here on it is the same as in wc -c.
     3818     /aaaaaaaaaa/! bx;   s/aaaaaaaaaa/b/g
     3819     /bbbbbbbbbb/! bx;   s/bbbbbbbbbb/c/g
     3820     /cccccccccc/! bx;   s/cccccccccc/d/g
     3821     /dddddddddd/! bx;   s/dddddddddd/e/g
     3822     /eeeeeeeeee/! bx;   s/eeeeeeeeee/f/g
     3823     /ffffffffff/! bx;   s/ffffffffff/g/g
     3824     /gggggggggg/! bx;   s/gggggggggg/h/g
     3825     s/hhhhhhhhhh//g
     3826     :x
     3827     $! { h; b; }
     3828     :y
     3829     /a/! s/[b-h]*/&0/
     3830     s/aaaaaaaaa/9/
     3831     s/aaaaaaaa/8/
     3832     s/aaaaaaa/7/
     3833     s/aaaaaa/6/
     3834     s/aaaaa/5/
     3835     s/aaaa/4/
     3836     s/aaa/3/
     3837     s/aa/2/
     3838     s/a/1/
     3839     y/bcdefgh/abcdefg/
     3840     /[a-h]/ by
     3841     p
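The word-to-‘a’ conversion at the top of this script can be checked in isolation.  In this sketch the ‘<TAB>’ placeholder is written as the POSIX class ‘[[:blank:]]’ (space or tab), so no literal tab has to be typed into the script:

```shell
# Squeeze runs of blanks, prefix a space, turn each " word" into "a ",
# then delete the spaces: one 'a' per word remains.
printf 'the  quick\tbrown fox\n' |
  sed -e 's/[[:blank:]][[:blank:]]*/ /g' \
      -e 's/^/ /' \
      -e 's/ [^ ][^ ]*/a /g' \
      -e 's/ //g'
# Output: aaaa
```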
     3842
     3843
     3844File: sed.info,  Node: wc -l,  Next: head,  Prev: wc -w,  Up: Examples
     3845
     38467.15 Counting Lines
     3847===================
     3848
     3849No strange things are done now, because ‘sed’ gives us ‘wc -l’
     3850functionality for free!!!  Look:
     3851
     3852     #!/usr/bin/sed -nf
     3853     $=
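A quick comparison with ‘wc -l’ shows one difference worth knowing: ‘wc -l’ counts newline characters, while ‘$=’ prints the number of the last line.  A final line without a trailing newline is therefore counted by ‘sed’ but not by ‘wc’, and empty input makes ‘sed’ print nothing at all:

```shell
printf 'one\ntwo\nthree\n' | sed -n '$='   # 3
printf 'one\ntwo' | sed -n '$='            # 2 (last line has no newline)
printf 'one\ntwo' | wc -l                  # 1 (only one newline)
```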
     3854
     3855
     3856File: sed.info,  Node: head,  Next: tail,  Prev: wc -l,  Up: Examples
     3857
     38587.16 Printing the First Lines
     3859=============================
     3860
     3861This script is probably the simplest useful ‘sed’ script.  It displays
     3862the first 10 lines of input; the number of displayed lines is right
     3863before the ‘q’ command.
     3864
     3865     #!/usr/bin/sed -f
     3866     10q
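Because ‘q’ makes ‘sed’ stop reading as soon as the given line has been printed, ‘3q’ is also cheaper than ‘sed -n 1,3p’, which prints the same lines but keeps reading (and discarding) the rest of the input:

```shell
printf '1\n2\n3\n4\n5\n' | sed 3q        # prints lines 1-3, then stops reading
printf '1\n2\n3\n4\n5\n' | sed -n 1,3p   # same output, but reads all input
```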
     3867
     3868
     3869File: sed.info,  Node: tail,  Next: uniq,  Prev: head,  Up: Examples
     3870
     38717.17 Printing the Last Lines
     3872============================
     3873
     3874Printing the last N lines rather than the first is more complex but
     3875indeed possible.  N is encoded in the second line, before the bang
     3876character.
     3877
     3878   This script is similar to the ‘tac’ script in that it keeps the final
     3879output in the hold space and prints it at the end:
     3880
     3881     #!/usr/bin/sed -nf
     3882
     3883     1! {; H; g; }
     3884     1,10 !s/[^\n]*\n//
     3885     $p
     3886     h
     3887
     3888   Mainly, the script keeps a window of 10 lines and slides it by
     3889adding a line and deleting the oldest (the substitution command on the
     3890second line works like a ‘D’ command but does not restart the loop).
     3891
     3892   The “sliding window” technique is a very powerful way to write
     3893efficient and complex ‘sed’ scripts, because commands like ‘P’ would
     3894require a lot of work if implemented manually.
     3895
     3896   To introduce the technique, which is fully demonstrated in the rest
     3897of this chapter and is based on the ‘N’, ‘P’ and ‘D’ commands, here is
     3898an implementation of ‘tail’ using a simple “sliding window.”
     3899
     3900   This looks complicated but in fact the working is the same as the
     3901last script: after we have kicked in the appropriate number of lines,
     3902however, we stop using the hold space to keep inter-line state, and
     3903instead use ‘N’ and ‘D’ to slide pattern space by one line:
     3904
     3905     #!/usr/bin/sed -f
     3906
     3907     1h
     3908     2,10 {; H; g; }
     3909     $q
     3910     1,9d
     3911     N
     3912     D
     3913
     3914   Note how the first, second and fourth lines are inactive after the
     3915first ten lines of input.  After that, all the script does is: exiting
     3916on the last line of input, appending the next input line to pattern
     3917space, and removing the first line.
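To try the sliding window without ten lines of input, the window can be shrunk.  In this sketch (the function name ‘tail3’ is hypothetical) ‘2,10’ becomes ‘2,3’ and ‘1,9d’ becomes ‘1,2d’, so the last three lines are printed:

```shell
# tail3: print the last 3 input lines with the N/D sliding window
tail3() {
  sed -e '1h' \
      -e '2,3{H;g;}' \
      -e '$q' \
      -e '1,2d' \
      -e 'N' \
      -e 'D'
}

printf '1\n2\n3\n4\n5\n' | tail3
# Output:
# 3
# 4
# 5
```

Inputs shorter than the window are printed whole, because on the last line ‘$q’ runs before ‘1,2d’ can delete anything.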
     3918
     3919
     3920File: sed.info,  Node: uniq,  Next: uniq -d,  Prev: tail,  Up: Examples
     3921
     39227.18 Make Duplicate Lines Unique
     3923================================
     3924
     3925This is an example of the art of using the ‘N’, ‘P’ and ‘D’ commands,
     3926probably the most difficult to master.
     3927
     3928     #!/usr/bin/sed -f
     3929     h
     3930
     3931     :b
     3932     # On the last line, print and exit
     3933     $b
     3934     N
     3935     /^\(.*\)\n\1$/ {
     3936         # The two lines are identical.  Undo the effect of
     3937         # the N command.
     3938         g
     3939         bb
     3940     }
     3941
     3942     # If the N command had added the last line, print and exit
     3943     $b
     3944
     3945     # The lines are different; print the first and go
     3946     # back working on the second.
     3947     P
     3948     D
     3949
     3950   As you can see, we maintain a 2-line window using ‘P’ and ‘D’.  This
     3951technique is often used in advanced ‘sed’ scripts.
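Running the script on a small input shows the 2-line window at work.  This sketch (the function name ‘uniq_sed’ is hypothetical) passes the script inline instead of through ‘-f’:

```shell
# uniq_sed: the uniq script above, passed inline
uniq_sed() {
  sed '
h
:b
# On the last line, print and exit
$b
N
/^\(.*\)\n\1$/ {
  # Identical lines: undo the effect of N and read another line
  g
  bb
}
$b
# Different lines: print the first, go back working on the second
P
D
'
}

printf 'a\na\nb\nb\nb\nc\n' | uniq_sed
# Output:
# a
# b
# c
```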
     3952
     3953
     3954File: sed.info,  Node: uniq -d,  Next: uniq -u,  Prev: uniq,  Up: Examples
     3955
     39567.19 Print Duplicated Lines of Input
     3957====================================
     3958
     3959This script prints only duplicated lines, like ‘uniq -d’.
     3960
     3961     #!/usr/bin/sed -nf
     3962
     3963     $b
     3964     N
     3965     /^\(.*\)\n\1$/ {
     3966         # Print the first of the duplicated lines
     3967         s/.*\n//
     3968         p
     3969
     3970         # Loop until we get a different line
     3971         :b
     3972         $b
     3973         N
     3974         /^\(.*\)\n\1$/ {
     3975             s/.*\n//
     3976             bb
     3977         }
     3978     }
     3979
     3980     # The last line cannot be followed by duplicates
     3981     $b
     3982
     3983     # Found a different one.  Leave it alone in the pattern space
     3984     # and go back to the top, hunting its duplicates
     3985     D
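A small input makes the behavior visible here too.  This sketch (the function name ‘uniq_d_sed’ is hypothetical) passes the script inline; each run of duplicates is reported once, and lines without a duplicate are dropped:

```shell
# uniq_d_sed: the uniq -d script above, passed inline
uniq_d_sed() {
  sed -n '
$b
N
/^\(.*\)\n\1$/ {
  s/.*\n//
  p
  :b
  $b
  N
  /^\(.*\)\n\1$/ {
    s/.*\n//
    bb
  }
}
$b
D
'
}

printf 'a\na\nb\nc\nc\nc\n' | uniq_d_sed
# Output:
# a
# c
```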
     3986
     3987
     3988File: sed.info,  Node: uniq -u,  Next: cat -s,  Prev: uniq -d,  Up: Examples
     3989
     39907.20 Remove All Duplicated Lines
     3991================================
     3992
     3993This script prints only unique lines, like ‘uniq -u’.
     3994
     3995     #!/usr/bin/sed -f
     3996
     3997     # Search for a duplicate line --- until that, print what you find.
     3998     $b
     3999     N
     4000     /^\(.*\)\n\1$/ ! {
     4001         P
     4002         D
     4003     }
     4004
     4005     :c
     4006     # Got two equal lines in pattern space.  At the
     4007     # end of the file we simply exit
     4008     $d
     4009
     4010     # Else, we keep reading lines with N until we
     4011     # find a different one
     4012     s/.*\n//
     4013     N
     4014     /^\(.*\)\n\1$/ {
     4015         bc
     4016     }
     4017
     4018     # Remove the last instance of the duplicate line
     4019     # and go back to the top
     4020     D
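Likewise for this script: in the sketch below (the function name ‘uniq_u_sed’ is hypothetical), every line that has an identical neighbor disappears, and only lines that are not repeated survive:

```shell
# uniq_u_sed: the uniq -u script above, passed inline
uniq_u_sed() {
  sed '
$b
N
/^\(.*\)\n\1$/!{
  P
  D
}
:c
$d
s/.*\n//
N
/^\(.*\)\n\1$/ {
  bc
}
D
'
}

printf 'a\na\nb\nc\nc\n' | uniq_u_sed
# Output: b
```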
     4021
     4022
     4023File: sed.info,  Node: cat -s,  Prev: uniq -u,  Up: Examples
     4024
     40257.21 Squeezing Blank Lines
     4026==========================
     4027
     4028As a final example, here are three scripts, of increasing complexity and
     4029speed, that implement the same function as ‘cat -s’, that is squeezing
     4030blank lines.
     4031
     4032   The first leaves a blank line at the beginning and end if there are
     4033some already.
     4034
     4035     #!/usr/bin/sed -f
     4036
     4037     # on empty lines, join with next
     4038     # Note there is a star in the regexp
     4039     :x
     4040     /^\n*$/ {
     4041     N
     4042     bx
     4043     }
     4044
     4045     # now, squeeze all '\n', this can be also done by:
     4046     # s/^\(\n\)*/\1/
     4047     s/\n*/\
     4048     /
     4049
     4050   This one is a bit more complex and removes all empty lines at the
     4051beginning.  It does leave a single blank line at end if one was there.
     4052
     4053     #!/usr/bin/sed -f
     4054
     4055     # delete all leading empty lines (0,/^./ also ends at a non-empty line 1)
     4056     0,/^./{
     4057     /./!d
     4058     }
     4059
     4060     # on an empty line we remove it and all the following
     4061     # empty lines, but one
     4062     :x
     4063     /./!{
     4064     N
     4065     s/^\n$//
     4066     tx
     4067     }
     4068
     4069   This removes leading and trailing blank lines.  It is also the
     4070fastest.  Note that loops are completely done with ‘n’ and ‘b’, without
     4071relying on ‘sed’ to restart the script automatically at the end of a
     4072line.
     4073
     4074     #!/usr/bin/sed -nf
     4075
     4076     # delete all (leading) blanks
     4077     /./!d
     4078
     4079     # if we get here, the line is non-empty
     4080     :x
     4081     # print it
     4082     p
     4083     # get next
     4084     n
     4085     # got chars? print it again, etc...
     4086     /./bx
     4087
     4088     # no, don't have chars: got an empty line
     4089     :z
     4090     # get next, if last line we finish here so no trailing
     4091     # empty lines are written
     4092     n
     4093     # also empty? then ignore it, and get next... this will
     4094     # remove ALL empty lines
     4095     /./!bz
     4096
     4097     # all empty lines were deleted/ignored, but we have a non empty.  As
     4098     # what we want to do is to squeeze, insert a blank line artificially
     4099     i\
     4100
     4101     bx
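The third script can be tried the same way.  In this sketch (the function name ‘squeeze’ is hypothetical) it is passed inline with ‘-n’ on the command line; leading and trailing blank lines vanish, and the run of blank lines between ‘a’ and ‘b’ becomes a single one:

```shell
# squeeze: the third cat -s script above, passed inline
squeeze() {
  sed -n '
/./!d
:x
p
n
/./bx
:z
n
/./!bz
i\

bx
'
}

printf '\n\na\n\n\nb\n\n' | squeeze
# Output:
# a
#
# b
```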
     4102
     4103
     4104File: sed.info,  Node: Limitations,  Next: Other Resources,  Prev: Examples,  Up: Top
     4105
     41068 GNU ‘sed’’s Limitations and Non-limitations
     4107*********************************************
     4108
     4109For those who want to write portable ‘sed’ scripts, be aware that some
     4110implementations have been known to limit line lengths (for the pattern
     4111and hold spaces) to be no more than 4000 bytes.  The POSIX standard
     4112specifies that conforming ‘sed’ implementations shall support at least
     41138192 byte line lengths.  GNU ‘sed’ has no built-in limit on line length;
     4114as long as it can ‘malloc()’ more (virtual) memory, you can feed or
     4115construct lines as long as you like.
     4116
     4117   However, recursion is used to handle subpatterns and indefinite
     4118repetition.  This means that the available stack space may limit the
     4119size of the buffer that can be processed by certain patterns.
     4120
     4121
     4122File: sed.info,  Node: Other Resources,  Next: Reporting Bugs,  Prev: Limitations,  Up: Top
     4123
     41249 Other Resources for Learning About ‘sed’
     4125******************************************
     4126
     4127For up to date information about GNU ‘sed’ please visit
     4128<https://www.gnu.org/software/sed/>.
     4129
     4130   Send general questions and suggestions to <sed-devel@gnu.org>.  Visit
     4131the mailing list archives for past discussions at
     4132<https://lists.gnu.org/archive/html/sed-devel/>.
     4133
     4134   The following resources provide information about ‘sed’ (both GNU
     4135‘sed’ and other variations).  Note that these are not maintained by GNU ‘sed’
     4136developers.
     4137
     4138   • sed ‘$HOME’: <http://sed.sf.net>
     4139
     4140   • sed FAQ: <http://sed.sf.net/sedfaq.html>
     4141
     4142   • seder’s grabbag: <http://sed.sf.net/grabbag>
     4143
     4144   • The ‘sed-users’ mailing list maintained by Sven Guckes:
     4145     <http://groups.yahoo.com/group/sed-users/> (note this is _not_ the
     4146     GNU ‘sed’ mailing list).
     4147
     4148
     4149File: sed.info,  Node: Reporting Bugs,  Next: GNU Free Documentation License,  Prev: Other Resources,  Up: Top
     4150
     415110 Reporting Bugs
     4152*****************
     4153
     4154Email bug reports to <bug-sed@gnu.org>.  Also, please include the output
     4155of ‘sed --version’ in the body of your report if at all possible.
     4156
     4157   Please do not send a bug report like this:
     4158
     4159     while building frobme-1.3.4
     4160     $ configure
     4161     error→ sed: file sedscr line 1: Unknown option to 's'
     4162
     4163   If GNU ‘sed’ doesn’t configure your favorite package, take a few
     4164extra minutes to identify the specific problem and make a stand-alone
     4165test case.  Unlike other programs such as C compilers, making such test
     4166cases for ‘sed’ is quite simple.
     4167
     4168   A stand-alone test case includes all the data necessary to perform
     4169the test, and the specific invocation of ‘sed’ that causes the problem.
     4170The smaller a stand-alone test case is, the better.  A test case should
     4171not involve something as far removed from ‘sed’ as “try to configure
     4172frobme-1.3.4”.  Yes, that is in principle enough information to look for
     4173the bug, but that is not a very practical prospect.
     4174
     4175   Here are a few commonly reported bugs that are not bugs.
     4176
     4177‘N’ command on the last line
     4178
     4179     Most versions of ‘sed’ exit without printing anything when the ‘N’
     4180     command is issued on the last line of a file.  GNU ‘sed’ prints
     4181     pattern space before exiting unless of course the ‘-n’ command
     4182     switch has been specified.  This choice is by design.
     4183
     4184     Default behavior (GNU extension, not POSIX-conforming):
     4185          $ seq 3 | sed N
     4186          1
     4187          2
     4188          3
     4189     To force POSIX-conforming behavior:
     4190          $ seq 3 | sed --posix N
     4191          1
     4192          2
     4193
     4194     For example, the behavior of
     4195          sed N foo bar
     4196     would depend on whether foo has an even or an odd number of
     4197     lines(1).  Or, when writing a script to read the next few lines
     4198     following a pattern match, traditional implementations of ‘sed’
     4199     would force you to write something like
     4200          /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }
     4201     instead of just
     4202          /foo/{ N;N;N;N;N;N;N;N;N; }
     4203
     4204     In any case, the simplest workaround is to use ‘$d;N’ in scripts
     4205     that rely on the traditional behavior, or to set the
     4206     ‘POSIXLY_CORRECT’ variable to a non-empty value.
     4207
     4208Regex syntax clashes (problems with backslashes)
     4209     ‘sed’ uses the POSIX basic regular expression syntax.  According to
     4210     the standard, the meaning of some escape sequences is undefined in
     4211     this syntax; notable in the case of ‘sed’ are ‘\|’, ‘\+’, ‘\?’,
     4212     ‘\`’, ‘\'’, ‘\<’, ‘\>’, ‘\b’, ‘\B’, ‘\w’, and ‘\W’.
     4213
     4214     As in all GNU programs that use POSIX basic regular expressions,
     4215     ‘sed’ interprets these escape sequences as special characters.  So,
     4216     ‘x\+’ matches one or more occurrences of ‘x’.  ‘abc\|def’ matches
     4217     either ‘abc’ or ‘def’.
     4218
     4219     This syntax may cause problems when running scripts written for
     4220     other ‘sed’s.  Some ‘sed’ programs have been written with the
     4221     assumption that ‘\|’ and ‘\+’ match the literal characters ‘|’ and
     4222     ‘+’.  Such scripts must be modified by removing the spurious
     4223     backslashes if they are to be used with modern implementations of
     4224     ‘sed’, like GNU ‘sed’.
     4225
     4226     On the other hand, some scripts use s|abc\|def||g to remove
     4227     occurrences of _either_ ‘abc’ or ‘def’.  While this worked until
     4228     ‘sed’ 4.0.x, newer versions interpret this as removing the string
     4229     ‘abc|def’.  This is again undefined behavior according to POSIX,
     4230     and this interpretation is arguably more robust: older ‘sed’s, for
     4231     example, required that the regex matcher parsed ‘\/’ as ‘/’ in the
     4232     common case of escaping a slash, which is again undefined behavior;
     4233     the new behavior avoids this, and this is good because the regex
     4234     matcher is only partially under our control.
     4235
     4236     In addition, this version of ‘sed’ supports several escape
     4237     characters (some of which are multi-character) to insert
     4238     non-printable characters in scripts (‘\a’, ‘\c’, ‘\d’, ‘\o’, ‘\r’,
     4239     ‘\t’, ‘\v’, ‘\x’).  These can cause similar problems with scripts
     4240     written for other ‘sed’s.
     4241
     4242‘-i’ clobbers read-only files
     4243
     4244     In short, ‘sed -i’ will let you delete the contents of a read-only
     4245     file, and in general the ‘-i’ option (*note Invocation: Invoking
     4246     sed.) lets you clobber protected files.  This is not a bug, but
     4247     rather a consequence of how the Unix file system works.
     4248
     4249     The permissions on a file say what can happen to the data in that
     4250     file, while the permissions on a directory say what can happen to
     4251     the list of files in that directory.  ‘sed -i’ never opens a file
     4252     that is already on disk for writing.  Instead, it writes to a
     4253     temporary file that is finally renamed to the original name:
     4254     if you rename or delete files, you’re actually modifying the
     4255     contents of the directory, so the operation depends on the
     4256     permissions of the directory, not of the file.  For the same
     4257     reason, ‘sed’ does not let you use ‘-i’ on a writable file in a
     4258     read-only directory, and will break hard or symbolic links when
     4259     ‘-i’ is used on a linked file.
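You can observe the rename behavior directly. This sketch assumes GNU ‘sed’, a POSIX shell, and ‘mktemp -d’ (available in GNU coreutils and the BSDs, though not required by POSIX). The file names are arbitrary.

```shell
#!/bin/sh
# Assumes GNU sed; everything happens in a throwaway directory.
set -e

dir=$(mktemp -d)          # a writable directory we are allowed to modify
cd "$dir"

printf 'old\n' > data.txt
chmod a-w data.txt        # the file itself is read-only
ln data.txt link.txt      # a second hard link to the same inode

# Succeeds even though data.txt is read-only: sed writes a temporary
# file in "$dir" and renames it over data.txt, which only needs write
# permission on the directory.
sed -i 's/old/new/' data.txt

data_now=$(cat data.txt)  # new
link_now=$(cat link.txt)  # old: link.txt still names the original inode

cd / && rm -rf "$dir"     # clean up; -f means rm does not prompt about read-only files
```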
     4260
     4261‘0a’ does not work (gives an error)
     4262
     4263     There is no line 0.  0 is a special address that is only used to
     4264     treat addresses like ‘0,/RE/’ as active when the script starts: if
     4265     you write ‘1,/abc/d’ and the first line includes the string ‘abc’,
     4266     then that match would be ignored because address ranges must span
     4267     at least two lines (barring the end of the file); but what you
     4268     probably wanted is to delete every line up to the first one
     4269     including ‘abc’, and this is obtained with ‘0,/abc/d’.
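The sketch below shows how the two ranges differ. It assumes GNU ‘sed’, because the ‘0,/RE/’ form is a GNU extension. ‘input’ is a hypothetical helper that prints four lines.

```shell
#!/bin/sh
# Assumes GNU sed; 0,/RE/ is a GNU extension.

input() { printf '%s\n' abc x abc y; }

# The range starts at line 1, but its end is only searched for from
# line 2 on, so the 'abc' on line 1 cannot close it: lines 1-3 go.
input | sed '1,/abc/d'    # prints: y

# With 0,/abc/ the range may end on line 1 itself, so only the
# lines up to and including the first 'abc' are deleted.
input | sed '0,/abc/d'    # prints: x, abc, y (one per line)
```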
     4270
     4271‘[a-z]’ is case insensitive
     4272
     4273     You are encountering problems with locales.  POSIX mandates that
     4274     ‘[a-z]’ uses the current locale’s collation order – in C parlance,
     4275     that means using ‘strcoll(3)’ instead of ‘strcmp(3)’.  Some locales
     4276     have a case-insensitive collation order, others don’t.
     4277
     4278     Another problem is that ‘[a-z]’ tries to use collation symbols.
     4279     This only happens if you are on the GNU system, using GNU libc’s
     4280     regular expression matcher instead of compiling the one supplied
     4281     with GNU sed.  In a Danish locale, for example, the regular
     4282     expression ‘^[a-z]$’ matches the string ‘aa’, because this is a
     4283     single collating symbol that comes after ‘a’ and before ‘b’; ‘ll’
     4284     behaves similarly in Spanish locales, or ‘ij’ in Dutch locales.
     4285
     4286     To work around these problems, which may cause bugs in shell
     4287     scripts, set the ‘LC_COLLATE’ and ‘LC_CTYPE’ environment variables
     4288     to ‘C’.
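One way to apply this workaround to a single command is a wrapper function. The sketch assumes a POSIX shell, and ‘c_sed’ is a hypothetical name. The wrapper also empties ‘LC_ALL’, because a non-empty ‘LC_ALL’ overrides both ‘LC_COLLATE’ and ‘LC_CTYPE’.

```shell
#!/bin/sh
# Assumes a POSIX shell and sed.

# Run sed with byte-wise collation and character classification.
# An empty LC_ALL is ignored, so LC_COLLATE and LC_CTYPE take effect.
c_sed() { LC_ALL= LC_COLLATE=C LC_CTYPE=C sed "$@"; }

printf '%s\n' a B z | c_sed -n '/^[a-z]$/p'   # prints: a, z (not B)
```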
     4289
     4290‘s/.*//’ does not clear pattern space
     4291
     4292     This happens if your input stream includes invalid multibyte
     4293     sequences.  POSIX mandates that such sequences are _not_ matched by
     4294     ‘.’, so that ‘s/.*//’ will not clear pattern space as you would
     4295     expect.  In fact, there is no way to clear sed’s buffers in the
     4296     middle of the script in most multibyte locales (including UTF-8
     4297     locales).  For this reason, GNU ‘sed’ provides a ‘z’ command (for
     4298     ‘zap’) as an extension.
     4299
     4300     To work around these problems, which may cause bugs in shell
     4301     scripts, set the ‘LC_COLLATE’ and ‘LC_CTYPE’ environment variables
     4302     to ‘C’.
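The sketch below feeds ‘sed’ a line containing the byte 0xFF, which is never valid in UTF-8, and clears the line in two ways. The first uses ‘z’, which works in any locale. The second uses ‘s/.*/’ under the C locale, where every byte counts as a character. It assumes GNU ‘sed’, and ‘bad_line’ is a hypothetical helper.

```shell
#!/bin/sh
# Assumes GNU sed; 'z' is a GNU extension.

# A line containing an invalid UTF-8 byte (octal 377, i.e. 0xFF).
bad_line() { printf 'ab\377cd\n'; }

# 'z' empties the pattern space whatever the locale.
bad_line | sed 'z;s/^$/EMPTY/'          # prints: EMPTY

# In the C locale every byte is a character, so '.*' matches the whole line.
bad_line | LC_ALL=C sed 's/.*/EMPTY/'   # prints: EMPTY
```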
     4303
     4304   ---------- Footnotes ----------
     4305
     4306   (1) which is the actual “bug” that prompted the change in behavior
     4307
     4308
     4309File: sed.info,  Node: GNU Free Documentation License,  Next: Concept Index,  Prev: Reporting Bugs,  Up: Top
     4310
     4311Appendix A GNU Free Documentation License
     4312*****************************************
     4313
     4314                     Version 1.3, 3 November 2008
     4315
     4316     Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
     4317     <https://fsf.org/>
     4318
     4319     Everyone is permitted to copy and distribute verbatim copies
     4320     of this license document, but changing it is not allowed.
     4321
     4322  0. PREAMBLE
     4323
     4324     The purpose of this License is to make a manual, textbook, or other
     4325     functional and useful document “free” in the sense of freedom: to
     4326     assure everyone the effective freedom to copy and redistribute it,
     4327     with or without modifying it, either commercially or
     4328     noncommercially.  Secondarily, this License preserves for the
     4329     author and publisher a way to get credit for their work, while not
     4330     being considered responsible for modifications made by others.
     4331
     4332     This License is a kind of “copyleft”, which means that derivative
     4333     works of the document must themselves be free in the same sense.
     4334     It complements the GNU General Public License, which is a copyleft
     4335     license designed for free software.
     4336
     4337     We have designed this License in order to use it for manuals for
     4338     free software, because free software needs free documentation: a
     4339     free program should come with manuals providing the same freedoms
     4340     that the software does.  But this License is not limited to
     4341     software manuals; it can be used for any textual work, regardless
     4342     of subject matter or whether it is published as a printed book.  We
     4343     recommend this License principally for works whose purpose is
     4344     instruction or reference.
     4345
     4346  1. APPLICABILITY AND DEFINITIONS
     4347
     4348     This License applies to any manual or other work, in any medium,
     4349     that contains a notice placed by the copyright holder saying it can
     4350     be distributed under the terms of this License.  Such a notice
     4351     grants a world-wide, royalty-free license, unlimited in duration,
     4352     to use that work under the conditions stated herein.  The
     4353     “Document”, below, refers to any such manual or work.  Any member
     4354     of the public is a licensee, and is addressed as “you”.  You accept
     4355     the license if you copy, modify or distribute the work in a way
     4356     requiring permission under copyright law.
     4357
     4358     A “Modified Version” of the Document means any work containing the
     4359     Document or a portion of it, either copied verbatim, or with
     4360     modifications and/or translated into another language.
     4361
     4362     A “Secondary Section” is a named appendix or a front-matter section
     4363     of the Document that deals exclusively with the relationship of the
     4364     publishers or authors of the Document to the Document’s overall
     4365     subject (or to related matters) and contains nothing that could
     4366     fall directly within that overall subject.  (Thus, if the Document
     4367     is in part a textbook of mathematics, a Secondary Section may not
     4368     explain any mathematics.)  The relationship could be a matter of
     4369     historical connection with the subject or with related matters, or
     4370     of legal, commercial, philosophical, ethical or political position
     4371     regarding them.
     4372
     4373     The “Invariant Sections” are certain Secondary Sections whose
     4374     titles are designated, as being those of Invariant Sections, in the
     4375     notice that says that the Document is released under this License.
     4376     If a section does not fit the above definition of Secondary then it
     4377     is not allowed to be designated as Invariant.  The Document may
     4378     contain zero Invariant Sections.  If the Document does not identify
     4379     any Invariant Sections then there are none.
     4380
     4381     The “Cover Texts” are certain short passages of text that are
     4382     listed, as Front-Cover Texts or Back-Cover Texts, in the notice
     4383     that says that the Document is released under this License.  A
     4384     Front-Cover Text may be at most 5 words, and a Back-Cover Text may
     4385     be at most 25 words.
     4386
     4387     A “Transparent” copy of the Document means a machine-readable copy,
     4388     represented in a format whose specification is available to the
     4389     general public, that is suitable for revising the document
     4390     straightforwardly with generic text editors or (for images composed
     4391     of pixels) generic paint programs or (for drawings) some widely
     4392     available drawing editor, and that is suitable for input to text
     4393     formatters or for automatic translation to a variety of formats
     4394     suitable for input to text formatters.  A copy made in an otherwise
     4395     Transparent file format whose markup, or absence of markup, has
     4396     been arranged to thwart or discourage subsequent modification by
     4397     readers is not Transparent.  An image format is not Transparent if
     4398     used for any substantial amount of text.  A copy that is not
     4399     “Transparent” is called “Opaque”.
     4400
     4401     Examples of suitable formats for Transparent copies include plain
     4402     ASCII without markup, Texinfo input format, LaTeX input format,
     4403     SGML or XML using a publicly available DTD, and standard-conforming
     4404     simple HTML, PostScript or PDF designed for human modification.
     4405     Examples of transparent image formats include PNG, XCF and JPG.
     4406     Opaque formats include proprietary formats that can be read and
     4407     edited only by proprietary word processors, SGML or XML for which
     4408     the DTD and/or processing tools are not generally available, and
     4409     the machine-generated HTML, PostScript or PDF produced by some word
     4410     processors for output purposes only.
     4411
     4412     The “Title Page” means, for a printed book, the title page itself,
     4413     plus such following pages as are needed to hold, legibly, the
     4414     material this License requires to appear in the title page.  For
     4415     works in formats which do not have any title page as such, “Title
     4416     Page” means the text near the most prominent appearance of the
     4417     work’s title, preceding the beginning of the body of the text.
     4418
     4419     The “publisher” means any person or entity that distributes copies
     4420     of the Document to the public.
     4421
     4422     A section “Entitled XYZ” means a named subunit of the Document
     4423     whose title either is precisely XYZ or contains XYZ in parentheses
     4424     following text that translates XYZ in another language.  (Here XYZ
     4425     stands for a specific section name mentioned below, such as
     4426     “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.)
     4427     To “Preserve the Title” of such a section when you modify the
     4428     Document means that it remains a section “Entitled XYZ” according
     4429     to this definition.
     4430
     4431     The Document may include Warranty Disclaimers next to the notice
     4432     which states that this License applies to the Document.  These
     4433     Warranty Disclaimers are considered to be included by reference in
     4434     this License, but only as regards disclaiming warranties: any other
     4435     implication that these Warranty Disclaimers may have is void and
     4436     has no effect on the meaning of this License.
     4437
     4438  2. VERBATIM COPYING
     4439
     4440     You may copy and distribute the Document in any medium, either
     4441     commercially or noncommercially, provided that this License, the
     4442     copyright notices, and the license notice saying this License
     4443     applies to the Document are reproduced in all copies, and that you
     4444     add no other conditions whatsoever to those of this License.  You
     4445     may not use technical measures to obstruct or control the reading
     4446     or further copying of the copies you make or distribute.  However,
     4447     you may accept compensation in exchange for copies.  If you
     4448     distribute a large enough number of copies you must also follow the
     4449     conditions in section 3.
     4450
     4451     You may also lend copies, under the same conditions stated above,
     4452     and you may publicly display copies.
     4453
     4454  3. COPYING IN QUANTITY
     4455
     4456     If you publish printed copies (or copies in media that commonly
     4457     have printed covers) of the Document, numbering more than 100, and
     4458     the Document’s license notice requires Cover Texts, you must
     4459     enclose the copies in covers that carry, clearly and legibly, all
     4460     these Cover Texts: Front-Cover Texts on the front cover, and
     4461     Back-Cover Texts on the back cover.  Both covers must also clearly
     4462     and legibly identify you as the publisher of these copies.  The
     4463     front cover must present the full title with all words of the title
     4464     equally prominent and visible.  You may add other material on the
     4465     covers in addition.  Copying with changes limited to the covers, as
     4466     long as they preserve the title of the Document and satisfy these
     4467     conditions, can be treated as verbatim copying in other respects.
     4468
     4469     If the required texts for either cover are too voluminous to fit
     4470     legibly, you should put the first ones listed (as many as fit
     4471     reasonably) on the actual cover, and continue the rest onto
     4472     adjacent pages.
     4473
     4474     If you publish or distribute Opaque copies of the Document
     4475     numbering more than 100, you must either include a machine-readable
     4476     Transparent copy along with each Opaque copy, or state in or with
     4477     each Opaque copy a computer-network location from which the general
     4478     network-using public has access to download using public-standard
     4479     network protocols a complete Transparent copy of the Document, free
     4480     of added material.  If you use the latter option, you must take
     4481     reasonably prudent steps, when you begin distribution of Opaque
     4482     copies in quantity, to ensure that this Transparent copy will
     4483     remain thus accessible at the stated location until at least one
     4484     year after the last time you distribute an Opaque copy (directly or
     4485     through your agents or retailers) of that edition to the public.
     4486
     4487     It is requested, but not required, that you contact the authors of
     4488     the Document well before redistributing any large number of copies,
     4489     to give them a chance to provide you with an updated version of the
     4490     Document.
     4491
     4492  4. MODIFICATIONS
     4493
     4494     You may copy and distribute a Modified Version of the Document
     4495     under the conditions of sections 2 and 3 above, provided that you
     4496     release the Modified Version under precisely this License, with the
     4497     Modified Version filling the role of the Document, thus licensing
     4498     distribution and modification of the Modified Version to whoever
     4499     possesses a copy of it.  In addition, you must do these things in
     4500     the Modified Version:
     4501
     4502       A. Use in the Title Page (and on the covers, if any) a title
     4503          distinct from that of the Document, and from those of previous
     4504          versions (which should, if there were any, be listed in the
     4505          History section of the Document).  You may use the same title
     4506          as a previous version if the original publisher of that
     4507          version gives permission.
     4508
     4509       B. List on the Title Page, as authors, one or more persons or
     4510          entities responsible for authorship of the modifications in
     4511          the Modified Version, together with at least five of the
     4512          principal authors of the Document (all of its principal
     4513          authors, if it has fewer than five), unless they release you
     4514          from this requirement.
     4515
     4516       C. State on the Title page the name of the publisher of the
     4517          Modified Version, as the publisher.
     4518
     4519       D. Preserve all the copyright notices of the Document.
     4520
     4521       E. Add an appropriate copyright notice for your modifications
     4522          adjacent to the other copyright notices.
     4523
     4524       F. Include, immediately after the copyright notices, a license
     4525          notice giving the public permission to use the Modified
     4526          Version under the terms of this License, in the form shown in
     4527          the Addendum below.
     4528
     4529       G. Preserve in that license notice the full lists of Invariant
     4530          Sections and required Cover Texts given in the Document’s
     4531          license notice.
     4532
     4533       H. Include an unaltered copy of this License.
     4534
     4535       I. Preserve the section Entitled “History”, Preserve its Title,
     4536          and add to it an item stating at least the title, year, new
     4537          authors, and publisher of the Modified Version as given on the
     4538          Title Page.  If there is no section Entitled “History” in the
     4539          Document, create one stating the title, year, authors, and
     4540          publisher of the Document as given on its Title Page, then add
     4541          an item describing the Modified Version as stated in the
     4542          previous sentence.
     4543
     4544       J. Preserve the network location, if any, given in the Document
     4545          for public access to a Transparent copy of the Document, and
     4546          likewise the network locations given in the Document for
     4547          previous versions it was based on.  These may be placed in the
     4548          “History” section.  You may omit a network location for a work
     4549          that was published at least four years before the Document
     4550          itself, or if the original publisher of the version it refers
     4551          to gives permission.
     4552
     4553       K. For any section Entitled “Acknowledgements” or “Dedications”,
     4554          Preserve the Title of the section, and preserve in the section
     4555          all the substance and tone of each of the contributor
     4556          acknowledgements and/or dedications given therein.
     4557
     4558       L. Preserve all the Invariant Sections of the Document, unaltered
     4559          in their text and in their titles.  Section numbers or the
     4560          equivalent are not considered part of the section titles.
     4561
     4562       M. Delete any section Entitled “Endorsements”.  Such a section
     4563          may not be included in the Modified Version.
     4564
     4565       N. Do not retitle any existing section to be Entitled
     4566          “Endorsements” or to conflict in title with any Invariant
     4567          Section.
     4568
     4569       O. Preserve any Warranty Disclaimers.
     4570
     4571     If the Modified Version includes new front-matter sections or
     4572     appendices that qualify as Secondary Sections and contain no
     4573     material copied from the Document, you may at your option designate
     4574     some or all of these sections as invariant.  To do this, add their
     4575     titles to the list of Invariant Sections in the Modified Version’s
     4576     license notice.  These titles must be distinct from any other
     4577     section titles.
     4578
     4579     You may add a section Entitled “Endorsements”, provided it contains
     4580     nothing but endorsements of your Modified Version by various
     4581     parties—for example, statements of peer review or that the text has
     4582     been approved by an organization as the authoritative definition of
     4583     a standard.
     4584
     4585     You may add a passage of up to five words as a Front-Cover Text,
     4586     and a passage of up to 25 words as a Back-Cover Text, to the end of
     4587     the list of Cover Texts in the Modified Version.  Only one passage
     4588     of Front-Cover Text and one of Back-Cover Text may be added by (or
     4589     through arrangements made by) any one entity.  If the Document
     4590     already includes a cover text for the same cover, previously added
     4591     by you or by arrangement made by the same entity you are acting on
     4592     behalf of, you may not add another; but you may replace the old
     4593     one, on explicit permission from the previous publisher that added
     4594     the old one.
     4595
     4596     The author(s) and publisher(s) of the Document do not by this
     4597     License give permission to use their names for publicity for or to
     4598     assert or imply endorsement of any Modified Version.
     4599
     4600  5. COMBINING DOCUMENTS
     4601
     4602     You may combine the Document with other documents released under
     4603     this License, under the terms defined in section 4 above for
     4604     modified versions, provided that you include in the combination all
     4605     of the Invariant Sections of all of the original documents,
     4606     unmodified, and list them all as Invariant Sections of your
     4607     combined work in its license notice, and that you preserve all
     4608     their Warranty Disclaimers.
     4609
     4610     The combined work need only contain one copy of this License, and
     4611     multiple identical Invariant Sections may be replaced with a single
     4612     copy.  If there are multiple Invariant Sections with the same name
     4613     but different contents, make the title of each such section unique
     4614     by adding at the end of it, in parentheses, the name of the
     4615     original author or publisher of that section if known, or else a
     4616     unique number.  Make the same adjustment to the section titles in
     4617     the list of Invariant Sections in the license notice of the
     4618     combined work.
     4619
     4620     In the combination, you must combine any sections Entitled
     4621     “History” in the various original documents, forming one section
     4622     Entitled “History”; likewise combine any sections Entitled
     4623     “Acknowledgements”, and any sections Entitled “Dedications”.  You
     4624     must delete all sections Entitled “Endorsements.”
     4625
     4626  6. COLLECTIONS OF DOCUMENTS
     4627
     4628     You may make a collection consisting of the Document and other
     4629     documents released under this License, and replace the individual
     4630     copies of this License in the various documents with a single copy
     4631     that is included in the collection, provided that you follow the
     4632     rules of this License for verbatim copying of each of the documents
     4633     in all other respects.
     4634
     4635     You may extract a single document from such a collection, and
     4636     distribute it individually under this License, provided you insert
     4637     a copy of this License into the extracted document, and follow this
     4638     License in all other respects regarding verbatim copying of that
     4639     document.
     4640
     4641  7. AGGREGATION WITH INDEPENDENT WORKS
     4642
     4643     A compilation of the Document or its derivatives with other
     4644     separate and independent documents or works, in or on a volume of a
     4645     storage or distribution medium, is called an “aggregate” if the
     4646     copyright resulting from the compilation is not used to limit the
     4647     legal rights of the compilation’s users beyond what the individual
     4648     works permit.  When the Document is included in an aggregate, this
     4649     License does not apply to the other works in the aggregate which
     4650     are not themselves derivative works of the Document.
     4651
     4652     If the Cover Text requirement of section 3 is applicable to these
     4653     copies of the Document, then if the Document is less than one half
     4654     of the entire aggregate, the Document’s Cover Texts may be placed
     4655     on covers that bracket the Document within the aggregate, or the
     4656     electronic equivalent of covers if the Document is in electronic
     4657     form.  Otherwise they must appear on printed covers that bracket
     4658     the whole aggregate.
     4659
     4660  8. TRANSLATION
     4661
     4662     Translation is considered a kind of modification, so you may
     4663     distribute translations of the Document under the terms of section
     4664     4.  Replacing Invariant Sections with translations requires special
     4665     permission from their copyright holders, but you may include
     4666     translations of some or all Invariant Sections in addition to the
     4667     original versions of these Invariant Sections.  You may include a
     4668     translation of this License, and all the license notices in the
     4669     Document, and any Warranty Disclaimers, provided that you also
     4670     include the original English version of this License and the
     4671     original versions of those notices and disclaimers.  In case of a
     4672     disagreement between the translation and the original version of
     4673     this License or a notice or disclaimer, the original version will
     4674     prevail.
     4675
     4676     If a section in the Document is Entitled “Acknowledgements”,
     4677     “Dedications”, or “History”, the requirement (section 4) to
     4678     Preserve its Title (section 1) will typically require changing the
     4679     actual title.
     4680
     4681  9. TERMINATION
     4682
     4683     You may not copy, modify, sublicense, or distribute the Document
     4684     except as expressly provided under this License.  Any attempt
     4685     otherwise to copy, modify, sublicense, or distribute it is void,
     4686     and will automatically terminate your rights under this License.
     4687
     4688     However, if you cease all violation of this License, then your
     4689     license from a particular copyright holder is reinstated (a)
     4690     provisionally, unless and until the copyright holder explicitly and
     4691     finally terminates your license, and (b) permanently, if the
     4692     copyright holder fails to notify you of the violation by some
     4693     reasonable means prior to 60 days after the cessation.
     4694
     4695     Moreover, your license from a particular copyright holder is
     4696     reinstated permanently if the copyright holder notifies you of the
     4697     violation by some reasonable means, this is the first time you have
     4698     received notice of violation of this License (for any work) from
     4699     that copyright holder, and you cure the violation prior to 30 days
     4700     after your receipt of the notice.
     4701
     4702     Termination of your rights under this section does not terminate
     4703     the licenses of parties who have received copies or rights from you
     4704     under this License.  If your rights have been terminated and not
     4705     permanently reinstated, receipt of a copy of some or all of the
     4706     same material does not give you any rights to use it.
     4707
     4708  10. FUTURE REVISIONS OF THIS LICENSE
     4709
     4710     The Free Software Foundation may publish new, revised versions of
     4711     the GNU Free Documentation License from time to time.  Such new
     4712     versions will be similar in spirit to the present version, but may
     4713     differ in detail to address new problems or concerns.  See
     4714     <https://www.gnu.org/licenses/>.
     4715
     4716     Each version of the License is given a distinguishing version
     4717     number.  If the Document specifies that a particular numbered
     4718     version of this License “or any later version” applies to it, you
     4719     have the option of following the terms and conditions either of
     4720     that specified version or of any later version that has been
     4721     published (not as a draft) by the Free Software Foundation.  If the
     4722     Document does not specify a version number of this License, you may
     4723     choose any version ever published (not as a draft) by the Free
     4724     Software Foundation.  If the Document specifies that a proxy can
     4725     decide which future versions of this License can be used, that
     4726     proxy’s public statement of acceptance of a version permanently
     4727     authorizes you to choose that version for the Document.
     4728
     4729  11. RELICENSING
     4730
     4731     “Massive Multiauthor Collaboration Site” (or “MMC Site”) means any
     4732     World Wide Web server that publishes copyrightable works and also
     4733     provides prominent facilities for anybody to edit those works.  A
     4734     public wiki that anybody can edit is an example of such a server.
     4735     A “Massive Multiauthor Collaboration” (or “MMC”) contained in the
     4736     site means any set of copyrightable works thus published on the MMC
     4737     site.
     4738
     4739     “CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0
     4740     license published by Creative Commons Corporation, a not-for-profit
     4741     corporation with a principal place of business in San Francisco,
     4742     California, as well as future copyleft versions of that license
     4743     published by that same organization.
     4744
     4745     “Incorporate” means to publish or republish a Document, in whole or
     4746     in part, as part of another Document.
     4747
     4748     An MMC is “eligible for relicensing” if it is licensed under this
     4749     License, and if all works that were first published under this
     4750     License somewhere other than this MMC, and subsequently
     4751     incorporated in whole or in part into the MMC, (1) had no cover
     4752     texts or invariant sections, and (2) were thus incorporated prior
     4753     to November 1, 2008.
     4754
     4755     The operator of an MMC Site may republish an MMC contained in the
     4756     site under CC-BY-SA on the same site at any time before August 1,
     4757     2009, provided the MMC is eligible for relicensing.
     4758
     4759ADDENDUM: How to use this License for your documents
     4760====================================================
     4761
     4762To use this License in a document you have written, include a copy of
     4763the License in the document and put the following copyright and license
     4764notices just after the title page:
     4765
     4766       Copyright (C)  YEAR  YOUR NAME.
     4767       Permission is granted to copy, distribute and/or modify this document
     4768       under the terms of the GNU Free Documentation License, Version 1.3
     4769       or any later version published by the Free Software Foundation;
     4770       with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
     4771       Texts.  A copy of the license is included in the section entitled ``GNU
     4772       Free Documentation License''.
     4773
     4774   If you have Invariant Sections, Front-Cover Texts and Back-Cover
     4775Texts, replace the “with...Texts.” line with this:
     4776
     4777         with the Invariant Sections being LIST THEIR TITLES, with
     4778         the Front-Cover Texts being LIST, and with the Back-Cover Texts
     4779         being LIST.
     4780
     4781   If you have Invariant Sections without Cover Texts, or some other
     4782combination of the three, merge those two alternatives to suit the
     4783situation.
     4784
     4785   If your document contains nontrivial examples of program code, we
     4786recommend releasing these examples in parallel under your choice of free
     4787software license, such as the GNU General Public License, to permit
     4788their use in free software.
     4789
     4790
     4791File: sed.info,  Node: Concept Index,  Next: Command and Option Index,  Prev: GNU Free Documentation License,  Up: Top
     4792
     4793Concept Index
     4794*************
     4795
     4796This is a general index of all issues discussed in this manual, with the
     4797exception of the ‘sed’ commands and command-line options.
     4798
     4799[index]
     4800* Menu:
     4801
     4802* -e, example:                           Overview.            (line  46)
     4803* -e, example <1>:                       sed script overview. (line  37)
     4804* --expression, example:                 Overview.            (line  46)
     4805* -f, example:                           Overview.            (line  46)
     4806* -f, example <1>:                       sed script overview. (line  37)
     4807* --file, example:                       Overview.            (line  46)
     4808* -i, example:                           Overview.            (line  26)
     4809* -n, example:                           Overview.            (line  33)
     4810* -s, example:                           Overview.            (line  40)
     4811* 0 address:                             Reporting Bugs.      (line 114)
     4812* ;, command separator:                  sed script overview. (line  37)
     4813* a, and semicolons:                     sed script overview. (line  56)
     4814* Additional reading about sed:          Other Resources.     (line  13)
     4815* ADDR1,+N:                              Range Addresses.     (line  31)
     4816* ADDR1,~N:                              Range Addresses.     (line  31)
     4817* address range, example:                sed script overview. (line  23)
     4818* Address, as a regular expression:      Regexp Addresses.    (line  13)
     4819* Address, last line:                    Numeric Addresses.   (line  13)
     4820* Address, numeric:                      Numeric Addresses.   (line   8)
     4821* addresses, excluding:                  Addresses overview.  (line  33)
     4822* Addresses, in sed scripts:             Numeric Addresses.   (line   6)
     4823* addresses, negating:                   Addresses overview.  (line  33)
     4824* addresses, numeric:                    Addresses overview.  (line   6)
     4825* addresses, range:                      Addresses overview.  (line  26)
     4826* addresses, regular expression:         Addresses overview.  (line  20)
     4827* addresses, syntax:                     sed script overview. (line  13)
     4828* alphabetic characters:                 Character Classes and Bracket Expressions.
     4829                                                              (line  49)
     4830* alphanumeric characters:               Character Classes and Bracket Expressions.
     4831                                                              (line  44)
     4832* Append hold space to pattern space:    Other Commands.      (line 288)
     4833* Append next input line to pattern space: Other Commands.    (line 261)
     4834* Append pattern space to hold space:    Other Commands.      (line 280)
     4835* Appending text after a line:           Other Commands.      (line  45)
     4836* b, joining lines with:                 Branching and flow control.
     4837                                                              (line 150)
     4838* b, versus t:                           Branching and flow control.
     4839                                                              (line 150)
     4840* back-reference:                        Back-references and Subexpressions.
     4841                                                              (line   6)
     4842* Backreferences, in regular expressions: The "s" Command.    (line  18)
     4843* blank characters:                      Character Classes and Bracket Expressions.
     4844                                                              (line  54)
     4845* bracket expression:                    Character Classes and Bracket Expressions.
     4846                                                              (line   6)
     4847* Branch to a label, if s/// failed:     Extended Commands.   (line  63)
     4848* Branch to a label, if s/// succeeded:  Programming Commands.
     4849                                                              (line  22)
     4850* Branch to a label, unconditionally:    Programming Commands.
     4851                                                              (line  18)
     4852* branching and n, N:                    Branching and flow control.
     4853                                                              (line 105)
     4854* branching, infinite loop:              Branching and flow control.
     4855                                                              (line  95)
     4856* branching, joining lines:              Branching and flow control.
     4857                                                              (line 150)
     4858* Buffer spaces, pattern and hold:       Execution Cycle.     (line   6)
     4859* Bugs, reporting:                       Reporting Bugs.      (line   6)
     4860* c, and semicolons:                     sed script overview. (line  56)
     4861* case insensitive, regular expression:  Regexp Addresses.    (line  47)
     4862* Case-insensitive matching:             The "s" Command.     (line 117)
     4863* Caveat — #n on first line:             Common Commands.     (line  20)
     4864* character class:                       Character Classes and Bracket Expressions.
     4865                                                              (line   6)
     4866* character classes:                     Character Classes and Bracket Expressions.
     4867                                                              (line  43)
     4868* classes of characters:                 Character Classes and Bracket Expressions.
     4869                                                              (line  43)
     4870* Command groups:                        Common Commands.     (line  91)
     4871* Comments, in scripts:                  Common Commands.     (line  12)
     4872* Conditional branch:                    Programming Commands.
     4873                                                              (line  22)
     4874* Conditional branch <1>:                Extended Commands.   (line  63)
     4875* control characters:                    Character Classes and Bracket Expressions.
     4876                                                              (line  57)
     4877* Copy hold space into pattern space:    Other Commands.      (line 284)
     4878* Copy pattern space into hold space:    Other Commands.      (line 276)
     4879* cycle, restarting:                     Branching and flow control.
     4880                                                              (line  75)
     4881* d, example:                            sed script overview. (line  23)
     4882* Delete first line from pattern space:  Other Commands.      (line 255)
     4883* digit characters:                      Character Classes and Bracket Expressions.
     4884                                                              (line  62)
     4885* Disabling autoprint, from command line: Command-Line Options.
     4886                                                              (line  23)
     4887* empty regular expression:              Regexp Addresses.    (line  22)
     4888* Emptying pattern space:                Extended Commands.   (line  85)
     4889* Emptying pattern space <1>:            Reporting Bugs.      (line 143)
     4890* Evaluate Bourne-shell commands:        Extended Commands.   (line  12)
     4891* Evaluate Bourne-shell commands, after substitution: The "s" Command.
     4892                                                              (line 108)
     4893* example, address range:                sed script overview. (line  23)
     4894* example, regular expression:           sed script overview. (line  28)
     4895* Exchange hold space with pattern space: Other Commands.     (line 292)
     4896* Excluding lines:                       Addresses overview.  (line  33)
     4897* exit status:                           Exit status.         (line   6)
     4898* exit status, example:                  Exit status.         (line  25)
     4899* Extended regular expressions, choosing: Command-Line Options.
     4900                                                              (line 135)
     4901* Extended regular expressions, syntax:  ERE syntax.          (line   6)
     4902* File name, printing:                   Extended Commands.   (line  30)
     4903* Files to be processed as input:        Command-Line Options.
     4904                                                              (line 181)
     4905* Flow of control in scripts:            Programming Commands.
     4906                                                              (line  11)
     4907* Global substitution:                   The "s" Command.     (line  74)
     4908* GNU extensions, /dev/stderr file:      The "s" Command.     (line 101)
     4909* GNU extensions, /dev/stderr file <1>:  Other Commands.      (line 244)
     4910* GNU extensions, /dev/stdin file:       Other Commands.      (line 227)
     4911* GNU extensions, /dev/stdin file <1>:   Extended Commands.   (line  53)
     4912* GNU extensions, /dev/stdout file:      Command-Line Options.
     4913                                                              (line 189)
     4914* GNU extensions, /dev/stdout file <1>:  The "s" Command.     (line 101)
     4915* GNU extensions, /dev/stdout file <2>:  Other Commands.      (line 244)
     4916* GNU extensions, 0 address:             Range Addresses.     (line  31)
     4917* GNU extensions, 0 address <1>:         Reporting Bugs.      (line 114)
     4918* GNU extensions, 0,ADDR2 addressing:    Range Addresses.     (line  31)
     4919* GNU extensions, ADDR1,+N addressing:   Range Addresses.     (line  31)
     4920* GNU extensions, ADDR1,~N addressing:   Range Addresses.     (line  31)
     4921* GNU extensions, branch if s/// failed: Extended Commands.   (line  63)
     4922* GNU extensions, case modifiers in s commands: The "s" Command.
     4923                                                              (line  29)
     4924* GNU extensions, checking for their presence: Extended Commands.
     4925                                                              (line  69)
     4926* GNU extensions, debug:                 Command-Line Options.
     4927                                                              (line  29)
     4928* GNU extensions, disabling:             Command-Line Options.
     4929                                                              (line 102)
     4930* GNU extensions, emptying pattern space: Extended Commands.  (line  85)
     4931* GNU extensions, emptying pattern space <1>: Reporting Bugs. (line 143)
     4932* GNU extensions, evaluating Bourne-shell commands: The "s" Command.
     4933                                                              (line 108)
     4934* GNU extensions, evaluating Bourne-shell commands <1>: Extended Commands.
     4935                                                              (line  12)
     4936* GNU extensions, extended regular expressions: Command-Line Options.
     4937                                                              (line 135)
     4938* GNU extensions, g and NUMBER modifier: The "s" Command.     (line  80)
     4939* GNU extensions, I modifier:            The "s" Command.     (line 117)
     4940* GNU extensions, I modifier <1>:        Regexp Addresses.    (line  47)
     4941* GNU extensions, in-place editing:      Command-Line Options.
     4942                                                              (line  56)
     4943* GNU extensions, in-place editing <1>:  Reporting Bugs.      (line  95)
     4944* GNU extensions, M modifier:            The "s" Command.     (line 122)
     4945* GNU extensions, M modifier <1>:        Regexp Addresses.    (line  75)
     4946* GNU extensions, modifiers and the empty regular expression: Regexp Addresses.
     4947                                                              (line  22)
     4948* GNU extensions, N~M addresses:         Numeric Addresses.   (line  18)
     4949* GNU extensions, quitting silently:     Extended Commands.   (line  36)
     4950* GNU extensions, R command:             Extended Commands.   (line  53)
     4951* GNU extensions, reading a file a line at a time: Extended Commands.
     4952                                                              (line  53)
     4953* GNU extensions, returning an exit code: Common Commands.    (line  28)
     4954* GNU extensions, returning an exit code <1>: Extended Commands.
     4955                                                              (line  36)
     4956* GNU extensions, setting line length:   Other Commands.      (line 207)
     4957* GNU extensions, special escapes:       Escapes.             (line   6)
     4958* GNU extensions, special escapes <1>:   Reporting Bugs.      (line  88)
     4959* GNU extensions, special two-address forms: Range Addresses. (line  31)
     4960* GNU extensions, subprocesses:          The "s" Command.     (line 108)
     4961* GNU extensions, subprocesses <1>:      Extended Commands.   (line  12)
     4962* GNU extensions, to basic regular expressions: BRE syntax.   (line  13)
     4963* GNU extensions, to basic regular expressions <1>: BRE syntax.
     4964                                                              (line  59)
     4965* GNU extensions, to basic regular expressions <2>: BRE syntax.
     4966                                                              (line  62)
     4967* GNU extensions, to basic regular expressions <3>: BRE syntax.
     4968                                                              (line  77)
     4969* GNU extensions, to basic regular expressions <4>: BRE syntax.
     4970                                                              (line  87)
     4971* GNU extensions, to basic regular expressions <5>: Reporting Bugs.
     4972                                                              (line  61)
     4973* GNU extensions, two addresses supported by most commands: Other Commands.
     4974                                                              (line  61)
     4975* GNU extensions, two addresses supported by most commands <1>: Other Commands.
     4976                                                              (line 115)
     4977* GNU extensions, two addresses supported by most commands <2>: Other Commands.
     4978                                                              (line 204)
     4979* GNU extensions, two addresses supported by most commands <3>: Other Commands.
     4980                                                              (line 236)
     4981* GNU extensions, unlimited line length: Limitations.         (line   6)
     4982* GNU extensions, writing first line to a file: Extended Commands.
     4983                                                              (line  80)
     4984* Goto, in scripts:                      Programming Commands.
     4985                                                              (line  18)
     4986* graphic characters:                    Character Classes and Bracket Expressions.
     4987                                                              (line  65)
     4988* Greedy regular expression matching:    BRE syntax.          (line 113)
     4989* Grouping commands:                     Common Commands.     (line  91)
     4990* hexadecimal digits:                    Character Classes and Bracket Expressions.
     4991                                                              (line  88)
     4992* Hold space, appending from pattern space: Other Commands.   (line 280)
     4993* Hold space, appending to pattern space: Other Commands.     (line 288)
     4994* Hold space, copy into pattern space:   Other Commands.      (line 284)
     4995* Hold space, copying pattern space into: Other Commands.     (line 276)
     4996* Hold space, definition:                Execution Cycle.     (line   6)
     4997* Hold space, exchange with pattern space: Other Commands.    (line 292)
     4998* i, and semicolons:                     sed script overview. (line  56)
     4999* In-place editing:                      Reporting Bugs.      (line  95)
     5000* In-place editing, activating:          Command-Line Options.
     5001                                                              (line  56)
     5002* In-place editing, Perl-style backup file names: Command-Line Options.
     5003                                                              (line  67)
     5004* infinite loop, branching:              Branching and flow control.
     5005                                                              (line  95)
     5006* Inserting text before a line:          Other Commands.      (line 104)
     5007* joining lines with branching:          Branching and flow control.
     5008                                                              (line 150)
     5009* joining quoted-printable lines:        Branching and flow control.
     5010                                                              (line 150)
     5011* labels:                                Branching and flow control.
     5012                                                              (line  75)
     5013* Labels, in scripts:                    Programming Commands.
     5014                                                              (line  14)
     5015* Last line, selecting:                  Numeric Addresses.   (line  13)
     5016* Line length, setting:                  Command-Line Options.
     5017                                                              (line  97)
     5018* Line length, setting <1>:              Other Commands.      (line 207)
     5019* Line number, printing:                 Other Commands.      (line 194)
     5020* Line selection:                        Numeric Addresses.   (line   6)
     5021* Line, selecting by number:             Numeric Addresses.   (line   8)
     5022* Line, selecting by regular expression match: Regexp Addresses.
     5023                                                              (line  13)
     5024* Line, selecting last:                  Numeric Addresses.   (line  13)
     5025* List pattern space:                    Other Commands.      (line 207)
     5026* lower-case letters:                    Character Classes and Bracket Expressions.
     5027                                                              (line  68)
     5028* Mixing g and NUMBER modifiers in the s command: The "s" Command.
     5029                                                              (line  80)
     5030* multiple files:                        Overview.            (line  40)
     5031* multiple sed commands:                 sed script overview. (line  37)
     5032* n, and branching:                      Branching and flow control.
     5033                                                              (line 105)
     5034* N, and branching:                      Branching and flow control.
     5035                                                              (line 105)
     5036* named character classes:               Character Classes and Bracket Expressions.
     5037                                                              (line  43)
     5038* newline, command separator:            sed script overview. (line  37)
     5039* Next input line, append to pattern space: Other Commands.   (line 261)
     5040* Next input line, replace pattern space with: Common Commands.
     5041                                                              (line  61)
     5042* Non-bugs, 0 address:                   Reporting Bugs.      (line 114)
     5043* Non-bugs, in-place editing:            Reporting Bugs.      (line  95)
     5044* Non-bugs, localization-related:        Reporting Bugs.      (line 124)
     5045* Non-bugs, localization-related <1>:    Reporting Bugs.      (line 143)
     5046* Non-bugs, N command on the last line:  Reporting Bugs.      (line  30)
     5047* Non-bugs, regex syntax clashes:        Reporting Bugs.      (line  61)
     5048* numeric addresses:                     Addresses overview.  (line   6)
     5049* numeric characters:                    Character Classes and Bracket Expressions.
     5050                                                              (line  62)
     5051* omitting labels:                       Branching and flow control.
     5052                                                              (line  75)
     5053* output:                                Overview.            (line  26)
     5054* output, suppressing:                   Overview.            (line  33)
     5055* p, example:                            Overview.            (line  33)
     5056* paragraphs, processing:                Multiline techniques.
     5057                                                              (line  53)
     5058* parameters, script:                    Overview.            (line  46)
     5059* Parenthesized substrings:              The "s" Command.     (line  18)
     5060* Pattern space, definition:             Execution Cycle.     (line   6)
     5061* Portability, comments:                 Common Commands.     (line  15)
     5062* Portability, line length limitations:  Limitations.         (line   6)
     5063* Portability, N command on the last line: Reporting Bugs.    (line  30)
     5064* POSIXLY_CORRECT behavior, bracket expressions: Character Classes and Bracket Expressions.
     5065                                                              (line 112)
     5066* POSIXLY_CORRECT behavior, enabling:    Command-Line Options.
     5067                                                              (line 105)
     5068* POSIXLY_CORRECT behavior, escapes:     Escapes.             (line  11)
     5069* POSIXLY_CORRECT behavior, N command:   Reporting Bugs.      (line  56)
     5070* Print first line from pattern space:   Other Commands.      (line 273)
     5071* printable characters:                  Character Classes and Bracket Expressions.
     5072                                                              (line  72)
     5073* Printing file name:                    Extended Commands.   (line  30)
     5074* Printing line number:                  Other Commands.      (line 194)
     5075* Printing text unambiguously:           Other Commands.      (line 207)
     5076* processing paragraphs:                 Multiline techniques.
     5077                                                              (line  53)
     5078* punctuation characters:                Character Classes and Bracket Expressions.
     5079                                                              (line  75)
     5080* Q, example:                            Exit status.         (line  25)
     5081* q, example:                            sed script overview. (line  28)
     5082* Quitting:                              Common Commands.     (line  28)
     5083* Quitting <1>:                          Extended Commands.   (line  36)
     5084* quoted-printable lines, joining:       Branching and flow control.
     5085                                                              (line 150)
     5086* range addresses:                       Addresses overview.  (line  26)
     5087* range expression:                      Character Classes and Bracket Expressions.
     5088                                                              (line  18)
     5089* Range of lines:                        Range Addresses.     (line   6)
     5090* Range with start address of zero:      Range Addresses.     (line  31)
     5091* Read next input line:                  Common Commands.     (line  61)
     5092* Read text from a file:                 Other Commands.      (line 219)
     5093* Read text from a file <1>:             Extended Commands.   (line  53)
     5094* regex addresses and input lines:       Regexp Addresses.    (line  84)
     5095* regex addresses and pattern space:     Regexp Addresses.    (line  84)
     5096* regular expression addresses:          Addresses overview.  (line  20)
     5097* regular expression, example:           sed script overview. (line  28)
     5098* Replace hold space with copy of pattern space: Other Commands.
     5099                                                              (line 276)
     5100* Replace pattern space with copy of hold space: Other Commands.
     5101                                                              (line 284)
     5102* Replacing all text matching regexp in a line: The "s" Command.
     5103                                                              (line  74)
     5104* Replacing only Nth match of regexp in a line: The "s" Command.
     5105                                                              (line  78)
     5106* Replacing selected lines with other text: Other Commands.   (line 157)
     5107* Requiring GNU sed:                     Extended Commands.   (line  69)
     5108* restarting a cycle:                    Branching and flow control.
     5109                                                              (line  75)
     5110* Sandbox mode:                          Command-Line Options.
     5111                                                              (line 157)
     5112* script parameter:                      Overview.            (line  46)
     5113* Script structure:                      sed script overview. (line   6)
     5114* Script, from a file:                   Command-Line Options.
     5115                                                              (line  51)
     5116* Script, from command line:             Command-Line Options.
     5117                                                              (line  46)
     5118* sed commands syntax:                   sed script overview. (line  13)
     5119* sed commands, multiple:                sed script overview. (line  37)
     5120* sed script structure:                  sed script overview. (line   6)
     5121* Selecting lines to process:            Numeric Addresses.   (line   6)
     5122* Selecting non-matching lines:          Addresses overview.  (line  33)
     5123* semicolons, command separator:         sed script overview. (line  37)
     5124* Several lines, selecting:              Range Addresses.     (line   6)
     5125* Slash character, in regular expressions: Regexp Addresses.  (line  32)
     5126* space characters:                      Character Classes and Bracket Expressions.
     5127                                                              (line  80)
     5128* Spaces, pattern and hold:              Execution Cycle.     (line   6)
     5129* Special addressing forms:              Range Addresses.     (line  31)
     5130* standard input:                        Overview.            (line  18)
     5131* Standard input, processing as input:   Command-Line Options.
     5132                                                              (line 183)
     5133* standard output:                       Overview.            (line  26)
     5134* stdin:                                 Overview.            (line  18)
     5135* stdout:                                Overview.            (line  26)
     5136* Stream editor:                         Introduction.        (line   6)
     5137* subexpression:                         Back-references and Subexpressions.
     5138                                                              (line   6)
     5139* Subprocesses:                          The "s" Command.     (line 108)
     5140* Subprocesses <1>:                      Extended Commands.   (line  12)
     5141* Substitution of text, options:         The "s" Command.     (line  70)
     5142* suppressing output:                    Overview.            (line  33)
     5143* syntax, addresses:                     sed script overview. (line  13)
     5144* syntax, sed commands:                  sed script overview. (line  13)
     5145* t, joining lines with:                 Branching and flow control.
     5146                                                              (line 150)
     5147* t, versus b:                           Branching and flow control.
     5148                                                              (line 150)
     5149* Text, appending:                       Other Commands.      (line  45)
     5150* Text, deleting:                        Common Commands.     (line  44)
     5151* Text, insertion:                       Other Commands.      (line 104)
     5152* Text, printing:                        Common Commands.     (line  52)
     5153* Text, printing after substitution:     The "s" Command.     (line  88)
     5154* Text, writing to a file after substitution: The "s" Command.
     5155                                                              (line 101)
     5156* Transliteration:                       Other Commands.      (line  11)
     5157* Unbuffered I/O, choosing:              Command-Line Options.
     5158                                                              (line 164)
     5159* upper-case letters:                    Character Classes and Bracket Expressions.
     5160                                                              (line  84)
     5161* Usage summary, printing:               Command-Line Options.
     5162                                                              (line  17)
     5163* Version, printing:                     Command-Line Options.
     5164                                                              (line  13)
     5165* whitespace characters:                 Character Classes and Bracket Expressions.
     5166                                                              (line  80)
     5167* Working on separate files:             Command-Line Options.
     5168                                                              (line 148)
     5169* Write first line to a file:            Extended Commands.   (line  80)
     5170* Write to a file:                       Other Commands.      (line 244)
     5171* xdigit class:                          Character Classes and Bracket Expressions.
     5172                                                              (line  88)
     5173* Zero Address:                          Zero Address.        (line   6)
     5174* Zero, as range start address:          Range Addresses.     (line  31)
     5175
     5176
     5177File: sed.info,  Node: Command and Option Index,  Prev: Concept Index,  Up: Top
     5178
     5179Command and Option Index
     5180************************
     5181
     5182This is an alphabetical list of all ‘sed’ commands and command-line
     5183options.
     5184
     5185[index]
     5186* Menu:
     5187
     5188* # (comments):                          Common Commands.     (line  12)
     5189* --binary:                              Command-Line Options.
     5190                                                              (line 114)
     5191* --debug:                               Command-Line Options.
     5192                                                              (line  29)
     5193* --expression:                          Command-Line Options.
     5194                                                              (line  46)
     5195* --file:                                Command-Line Options.
     5196                                                              (line  51)
     5197* --follow-symlinks:                     Command-Line Options.
     5198                                                              (line 125)
     5199* --help:                                Command-Line Options.
     5200                                                              (line  17)
     5201* --in-place:                            Command-Line Options.
     5202                                                              (line  56)
     5203* --line-length:                         Command-Line Options.
     5204                                                              (line  97)
     5205* --null-data:                           Command-Line Options.
     5206                                                              (line 172)
     5207* --posix:                               Command-Line Options.
     5208                                                              (line 102)
     5209* --quiet:                               Command-Line Options.
     5210                                                              (line  23)
     5211* --regexp-extended:                     Command-Line Options.
     5212                                                              (line 135)
     5213* --sandbox:                             Command-Line Options.
     5214                                                              (line 157)
     5215* --separate:                            Command-Line Options.
     5216                                                              (line 148)
     5217* --silent:                              Command-Line Options.
     5218                                                              (line  23)
     5219* --unbuffered:                          Command-Line Options.
     5220                                                              (line 164)
     5221* --version:                             Command-Line Options.
     5222                                                              (line  13)
     5223* --zero-terminated:                     Command-Line Options.
     5224                                                              (line 172)
     5225* -b:                                    Command-Line Options.
     5226                                                              (line 114)
     5227* -e:                                    Command-Line Options.
     5228                                                              (line  46)
     5229* -E:                                    Command-Line Options.
     5230                                                              (line 135)
     5231* -f:                                    Command-Line Options.
     5232                                                              (line  51)
     5233* -i:                                    Command-Line Options.
     5234                                                              (line  56)
     5235* -l:                                    Command-Line Options.
     5236                                                              (line  97)
     5237* -n:                                    Command-Line Options.
     5238                                                              (line  23)
     5239* -n, forcing from within a script:      Common Commands.     (line  20)
     5240* -r:                                    Command-Line Options.
     5241                                                              (line 135)
     5242* -s:                                    Command-Line Options.
     5243                                                              (line 148)
     5244* -u:                                    Command-Line Options.
     5245                                                              (line 164)
     5246* -z:                                    Command-Line Options.
     5247                                                              (line 172)
     5248* : (label) command:                     Programming Commands.
     5249                                                              (line  14)
     5250* = (print line number) command:         Other Commands.      (line 194)
     5251* {} command grouping:                   Common Commands.     (line  91)
     5252* a (append text lines) command:         Other Commands.      (line  45)
     5253* alnum character class:                 Character Classes and Bracket Expressions.
     5254                                                              (line  44)
     5255* alpha character class:                 Character Classes and Bracket Expressions.
     5256                                                              (line  49)
     5257* b (branch) command:                    Programming Commands.
     5258                                                              (line  18)
     5259* blank character class:                 Character Classes and Bracket Expressions.
     5260                                                              (line  54)
     5261* c (change to text lines) command:      Other Commands.      (line 157)
     5262* cntrl character class:                 Character Classes and Bracket Expressions.
     5263                                                              (line  57)
     5264* D (delete first line) command:         Other Commands.      (line 255)
     5265* d (delete) command:                    Common Commands.     (line  44)
     5266* digit character class:                 Character Classes and Bracket Expressions.
     5267                                                              (line  62)
     5268* e (evaluate) command:                  Extended Commands.   (line  12)
     5269* F (File name) command:                 Extended Commands.   (line  30)
     5270* G (appending Get) command:             Other Commands.      (line 288)
     5271* g (get) command:                       Other Commands.      (line 284)
     5272* graph character class:                 Character Classes and Bracket Expressions.
     5273                                                              (line  65)
     5274* H (append Hold) command:               Other Commands.      (line 280)
     5275* h (hold) command:                      Other Commands.      (line 276)
     5276* i (insert text lines) command:         Other Commands.      (line 104)
     5277* l (list unambiguously) command:        Other Commands.      (line 207)
     5278* lower character class:                 Character Classes and Bracket Expressions.
     5279                                                              (line  68)
     5280* N (append Next line) command:          Other Commands.      (line 261)
     5281* n (next-line) command:                 Common Commands.     (line  61)
     5282* P (print first line) command:          Other Commands.      (line 273)
     5283* p (print) command:                     Common Commands.     (line  52)
     5284* print character class:                 Character Classes and Bracket Expressions.
     5285                                                              (line  72)
     5286* punct character class:                 Character Classes and Bracket Expressions.
     5287                                                              (line  75)
     5288* q (quit) command:                      Common Commands.     (line  28)
     5289* Q (silent Quit) command:               Extended Commands.   (line  36)
     5290* r (read file) command:                 Other Commands.      (line 219)
     5291* R (read line) command:                 Extended Commands.   (line  53)
     5292* s command, option flags:               The "s" Command.     (line  70)
     5293* space character class:                 Character Classes and Bracket Expressions.
     5294                                                              (line  80)
     5295* T (test and branch if failed) command: Extended Commands.   (line  63)
     5296* t (test and branch if successful) command: Programming Commands.
     5297                                                              (line  22)
     5298* upper character class:                 Character Classes and Bracket Expressions.
     5299                                                              (line  84)
     5300* v (version) command:                   Extended Commands.   (line  69)
     5301* w (write file) command:                Other Commands.      (line 244)
     5302* W (write first line) command:          Extended Commands.   (line  80)
     5303* x (eXchange) command:                  Other Commands.      (line 292)
     5304* xdigit character class:                Character Classes and Bracket Expressions.
     5305                                                              (line  88)
     5306* y (transliterate) command:             Other Commands.      (line  11)
     5307* z (Zap) command:                       Extended Commands.   (line  85)
     5308
     5309
    305310
    315311Tag Table:
    32 (Indirect)
    33 Node: Top935
    34 Node: Introduction3816
    35 Node: Invoking sed4370
    36 Ref: Invoking sed-Footnote-19396
    37 Ref: Invoking sed-Footnote-29588
    38 Node: sed Programs9691
    39 Node: Execution Cycle10838
    40 Ref: Execution Cycle-Footnote-112011
    41 Node: Addresses12322
    42 Node: Regular Expressions17061
    43 Node: Common Commands24610
    44 Node: The "s" Command26608
    45 Ref: The "s" Command-Footnote-130940
    46 Node: Other Commands31012
    47 Ref: Other Commands-Footnote-136149
    48 Node: Programming Commands36221
    49 Node: Extended Commands37130
    50 Node: Escapes40705
    51 Ref: Escapes-Footnote-143711
    52 Node: Examples43902
    53 Node: Centering lines44997
    54 Node: Increment a number45909
    55 Ref: Increment a number-Footnote-147489
    56 Node: Rename files to lower case47609
    57 Node: Print bash environment50405
    58 Node: Reverse chars of lines51185
    59 Ref: Reverse chars of lines-Footnote-152202
    60 Node: tac52424
    61 Node: cat -n53206
    62 Node: cat -b55063
    63 Node: wc -c55815
    64 Ref: wc -c-Footnote-157748
    65 Node: wc -w57817
    66 Node: wc -l59289
    67 Node: head59526
    68 Node: tail59850
    69 Node: uniq61534
    70 Node: uniq -d62330
    71 Node: uniq -u63054
    72 Node: cat -s63778
    73 Node: Limitations65667
    74 Node: Other Resources66507
    75 Node: Reporting Bugs67433
    76 Ref: Reporting Bugs-Footnote-173962
    77 Node: Extended regexps74033
    78 Node: Concept Index75200
    79 Node: Command and Option Index85215
     5312Node: Top738
     5313Node: Introduction2217
     5314Node: Invoking sed2789
     5315Node: Overview3114
     5316Node: Command-Line Options5561
     5317Ref: Command-Line Options-Footnote-113530
     5318Ref: Command-Line Options-Footnote-213758
     5319Node: Exit status13861
     5320Node: sed scripts14795
     5321Node: sed script overview15394
     5322Node: sed commands list18057
     5323Node: The "s" Command23070
     5324Ref: The "s" Command-Footnote-128889
     5325Node: Common Commands28969
     5326Node: Other Commands32106
     5327Ref: insert command35324
     5328Ref: Other Commands-Footnote-141629
     5329Node: Programming Commands41709
     5330Node: Extended Commands42649
     5331Node: Multiple commands syntax46675
     5332Node: sed addresses51217
     5333Node: Addresses overview51706
     5334Node: Numeric Addresses53705
     5335Node: Regexp Addresses55116
     5336Ref: Regexp Addresses-Footnote-159252
     5337Node: Range Addresses59392
     5338Ref: Zero Address Regex Range60294
     5339Node: Zero Address61753
     5340Node: sed regular expressions62318
     5341Node: Regular Expressions Overview63172
     5342Node: BRE vs ERE64733
     5343Node: BRE syntax66484
     5344Node: ERE syntax73304
     5345Node: Character Classes and Bracket Expressions74878
     5346Node: regexp extensions80030
     5347Node: Back-references and Subexpressions82506
     5348Node: Escapes84958
     5349Ref: Escapes-Footnote-188105
     5350Node: Locale Considerations88304
     5351Ref: Locale Considerations-Footnote-193067
     5352Node: advanced sed93239
     5353Node: Execution Cycle93606
     5354Ref: Execution Cycle-Footnote-194845
     5355Node: Hold and Pattern Buffers95162
     5356Node: Multiline techniques95350
     5357Node: Branching and flow control98704
     5358Node: Examples107029
     5359Node: Joining lines108275
     5360Node: Centering lines110082
     5361Node: Increment a number111006
     5362Ref: Increment a number-Footnote-1112495
     5363Node: Rename files to lower case112623
     5364Node: Print bash environment115418
     5365Node: Reverse chars of lines116181
     5366Ref: Reverse chars of lines-Footnote-1117224
     5367Node: Text search across multiple lines117441
     5368Node: Line length adjustment120786
     5369Node: Adding a header to multiple files122533
     5370Node: tac125986
     5371Node: cat -n126774
     5372Node: cat -b128616
     5373Node: wc -c129378
     5374Ref: wc -c-Footnote-1131316
     5375Node: wc -w131385
     5376Node: wc -l132875
     5377Node: head133128
     5378Node: tail133467
     5379Node: uniq135196
     5380Node: uniq -d136007
     5381Node: uniq -u136722
     5382Node: cat -s137435
     5383Node: Limitations139298
     5384Node: Other Resources140161
     5385Node: Reporting Bugs141104
     5386Ref: N_command_last_line142294
     5387Ref: Reporting Bugs-Footnote-1148805
     5388Node: GNU Free Documentation License148880
     5389Node: Concept Index174239
     5390Node: Command and Option Index201600
    805391
    815392End Tag Table
     5393
     5394
     5395Local Variables:
     5396coding: utf-8
     5397End:
  • trunk/src/sed/doc/sed.texi

    r599 r3613  
    11\input texinfo  @c -*-texinfo-*-
    2 @c Do not edit this file!! It is automatically generated from sed-in.texi.
    32@c
    43@c -- Stuff that needs adding: ----------------------------------------------
    5 @c (document the `;' command-separator)
     4@c (nothing!)
    65@c --------------------------------------------------------------------------
    76@c Check for consistency: regexps in @code, text that they match in @samp.
    8 @c 
     7@c
    98@c Tips:
    109@c    @command for command
     
    3635@value{SSED}, a stream editor.
    3736
    38 Copyright @copyright{} 1998, 1999, 2001, 2002, 2003, 2004 Free
    39 Software Foundation, Inc.
    40 
    41 This document is released under the terms of the @acronym{GNU} Free
    42 Documentation License as published by the Free Software Foundation;
    43 either version 1.1, or (at your option) any later version.
    44 
    45 You should have received a copy of the @acronym{GNU} Free Documentation
    46 License along with @value{SSED}; see the file @file{COPYING.DOC}.
    47 If not, write to the Free Software Foundation, 59 Temple Place - Suite
    48 330, Boston, MA 02110-1301, USA.
    49 
    50 There are no Cover Texts and no Invariant Sections; this text, along
    51 with its equivalent in the printed manual, constitutes the Title Page.
     37Copyright @copyright{} 1998--2022 Free Software Foundation, Inc.
     38
     39@quotation
     40Permission is granted to copy, distribute and/or modify this document
     41under the terms of the GNU Free Documentation License, Version 1.3
     42or any later version published by the Free Software Foundation;
     43with no Invariant Sections, no Front-Cover Texts, and no
     44Back-Cover Texts.  A copy of the license is included in the
     45section entitled ``GNU Free Documentation License''.
     46@end quotation
    5247@end copying
    5348
     
    5550
    5651@titlepage
    57 @title @command{sed}, a stream editor
     52@title @value{SSED}, a stream editor
    5853@subtitle version @value{VERSION}, @value{UPDATED}
    59 @author by Ken Pizzini, Paolo Bonzini
     54@author by Ken Pizzini, Paolo Bonzini, Jim Meyering, Assaf Gordon
    6055
    6156@page
    6257@vskip 0pt plus 1filll
    63 Copyright @copyright{} 1998, 1999 Free Software Foundation, Inc.
    64 
    6558@insertcopying
    66 
    67 Published by the Free Software Foundation, @*
    68 51 Franklin Street, Fifth Floor @*
    69 Boston, MA 02110-1301, USA
    7059@end titlepage
    7160
    72 
     61@contents
     62
     63@ifnottex
    7364@node Top
    74 @top
    75 
    76 @ifnottex
     65@top @value{SSED}
     66
    7767@insertcopying
    7868@end ifnottex
     
    8171* Introduction::               Introduction
    8272* Invoking sed::               Invocation
    83 * sed Programs::               @command{sed} programs
     73* sed scripts::                @command{sed} scripts
     74* sed addresses::              Addresses: selecting lines
     75* sed regular expressions::    Regular expressions: selecting text
     76* advanced sed::               Advanced @command{sed}: cycles and buffers
    8477* Examples::                   Some sample scripts
    8578* Limitations::                Limitations and (non-)limitations of @value{SSED}
    8679* Other Resources::            Other resources for learning about @command{sed}
    8780* Reporting Bugs::             Reporting bugs
    88 
    89 * Extended regexps::           @command{egrep}-style regular expressions
    90 @ifset PERL
    91 * Perl regexps::               Perl-style regular expressions
    92 @end ifset
    93 
     81* GNU Free Documentation License:: Copying and sharing this manual
    9482* Concept Index::              A menu with all the topics in this manual.
    9583* Command and Option Index::   A menu with all @command{sed} commands and
    9684                               command-line options.
    97 
    98 @detailmenu
    99 --- The detailed node listing ---
    100 
    101 sed Programs:
    102 * Execution Cycle::                 How @command{sed} works
    103 * Addresses::                       Selecting lines with @command{sed}
    104 * Regular Expressions::             Overview of regular expression syntax
    105 * Common Commands::                 Often used commands
    106 * The "s" Command::                 @command{sed}'s Swiss Army Knife
    107 * Other Commands::                  Less frequently used commands
    108 * Programming Commands::            Commands for @command{sed} gurus
    109 * Extended Commands::               Commands specific of @value{SSED}
    110 * Escapes::                         Specifying special characters
    111 
    112 Examples:
    113 * Centering lines::
    114 * Increment a number::
    115 * Rename files to lower case::
    116 * Print bash environment::
    117 * Reverse chars of lines::
    118 * tac::                             Reverse lines of files
    119 * cat -n::                          Numbering lines
    120 * cat -b::                          Numbering non-blank lines
    121 * wc -c::                           Counting chars
    122 * wc -w::                           Counting words
    123 * wc -l::                           Counting lines
    124 * head::                            Printing the first lines
    125 * tail::                            Printing the last lines
    126 * uniq::                            Make duplicate lines unique
    127 * uniq -d::                         Print duplicated lines of input
    128 * uniq -u::                         Remove all duplicated lines
    129 * cat -s::                          Squeezing blank lines
    130 
    131 @ifset PERL
    132 Perl regexps::                      Perl-style regular expressions
    133 * Backslash::                       Introduces special sequences
    134 * Circumflex/dollar sign/period::   Behave specially with regard to new lines
    135 * Square brackets::                 Are a bit different in strange cases
    136 * Options setting::                 Toggle modifiers in the middle of a regexp
    137 * Non-capturing subpatterns::       Are not counted when backreferencing
    138 * Repetition::                      Allows for non-greedy matching
    139 * Backreferences::                  Allows for more than 10 back references
    140 * Assertions::                      Allows for complex look ahead matches
    141 * Non-backtracking subpatterns::    Often gives more performance
    142 * Conditional subpatterns::         Allows if/then/else branches
    143 * Recursive patterns::              For example to match parentheses
    144 * Comments::                        Because things can get complex...
    145 @end ifset
    146 
    147 @end detailmenu
    14885@end menu
    14986
     
    167104
    168105@node Invoking sed
    169 @chapter Invocation
    170 
     106@chapter Running sed
     107
     108This chapter covers how to run @command{sed}. Details of @command{sed}
     109scripts and individual @command{sed} commands are discussed in the
     110next chapter.
     111
     112@menu
     113* Overview::
     114* Command-Line Options::
     115* Exit status::
     116@end menu
     117
     118
     119@node Overview
     120@section Overview
    171121Normally @command{sed} is invoked like this:
    172122
     
    175125@end example
    176126
     127For example, to change every @samp{hello} to @samp{world}
     128in the file @file{input.txt}:
     129
     130@example
     131sed 's/hello/world/g' input.txt > output.txt
     132@end example
     133
     134Without the @samp{g} (global) modifier, @command{sed} affects
     135only the first instance per line.
     136
     137@cindex stdin
     138@cindex standard input
     139If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
     140@command{sed} filters the contents of the standard input. The following
     141commands are equivalent:
     142
     143@example
     144sed 's/hello/world/g' input.txt > output.txt
     145sed 's/hello/world/g' < input.txt > output.txt
     146cat input.txt | sed 's/hello/world/g' - > output.txt
     147@end example
     148
     149@cindex stdout
     150@cindex output
     151@cindex standard output
     152@cindex -i, example
     153@command{sed} writes output to standard output. Use @option{-i} to edit
     154files in-place instead of printing to standard output.
     155See also the @code{W} and @code{s///w} commands for writing output to
     156other files. The following command modifies @file{file.txt} and
     157does not produce any output:
     158
     159@example
     160sed -i 's/hello/world/' file.txt
     161@end example
     162
     163@cindex -n, example
     164@cindex p, example
     165@cindex suppressing output
     166@cindex output, suppressing
     167By default @command{sed} prints all processed input (except input
     168that has been modified/deleted by commands such as @code{d}).
     169Use @option{-n} to suppress output, and the @code{p} command
     170to print specific lines. The following command prints only line 45
     171of the input file:
     172
     173@example
     174sed -n '45p' file.txt
     175@end example
     176
     177
     178
     179@cindex multiple files
     180@cindex -s, example
     181@command{sed} treats multiple input files as one long stream.
     182The following example prints the first line of the first file
     183(@file{one.txt}) and the last line of the last file (@file{three.txt}).
     184Use @option{-s} to reverse this behavior.
     185
     186@example
     187sed -n  '1p ; $p' one.txt two.txt three.txt
     188@end example
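A minimal shell sketch of the difference, assuming GNU sed (where -s is an extension) and three hypothetical three-line sample files created in a scratch directory. Without -s the addresses 1 and $ refer to the whole concatenated stream. With -s they apply to each file separately:

```shell
#!/bin/sh
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample files, three lines each.
printf 'a1\na2\na3\n' > one.txt
printf 'b1\nb2\nb3\n' > two.txt
printf 'c1\nc2\nc3\n' > three.txt

# Default: one continuous stream, so '1' is the first line of one.txt
# and '$' is the last line of three.txt. Prints: a1, c3
sed -n '1p ; $p' one.txt two.txt three.txt

# With -s every file is a separate stream, so '1' and '$' match once
# per file. Prints: a1, a3, b1, b3, c1, c3
sed -s -n '1p ; $p' one.txt two.txt three.txt
```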
     189
     190
     191@cindex -e, example
     192@cindex --expression, example
     193@cindex -f, example
     194@cindex --file, example
     195@cindex script parameter
     196@cindex parameters, script
     197Without @option{-e} or @option{-f} options, @command{sed} uses
     198the first non-option parameter as the @var{script}, and the following
     199non-option parameters as input files.
     200If @option{-e} or @option{-f} options are used to specify a @var{script},
     201all non-option parameters are taken as input files.
     202Options @option{-e} and @option{-f} can be combined, and can appear
     203multiple times (in which case the final effective @var{script} will be
     204the concatenation of all the individual @var{script}s).
     205
     206The following examples are equivalent:
     207
     208@example
     209sed 's/hello/world/' input.txt > output.txt
     210
     211sed -e 's/hello/world/' input.txt > output.txt
     212sed --expression='s/hello/world/' input.txt > output.txt
     213
     214echo 's/hello/world/' > myscript.sed
     215sed -f myscript.sed input.txt > output.txt
     216sed --file=myscript.sed input.txt > output.txt
     217@end example
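As a hedged sketch of the concatenation rule, the following assumes a POSIX shell and uses hypothetical file names. The order of the -e and -f options determines the order in which their commands run on each line:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf 'hello there\n' > input.txt
echo 's/there/world/' > myscript.sed

# -e and -f combined: both scripts are joined into one program.
# Prints: goodbye world
sed -e 's/hello/goodbye/' -f myscript.sed input.txt

# Order matters: here 'a' becomes 'b', then the second command turns it into 'c'.
printf 'a\n' | sed -e 's/a/b/' -e 's/b/c/'

# Reversed order: 's/b/c/' finds nothing, then 'a' becomes 'b'.
printf 'a\n' | sed -e 's/b/c/' -e 's/a/b/'
```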
     218
     219
     220@node Command-Line Options
     221@section Command-Line Options
     222
    177223The full format for invoking @command{sed} is:
    178224
     
    180226sed OPTIONS... [SCRIPT] [INPUTFILE...]
    181227@end example
    182 
    183 If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
    184 @command{sed} filters the contents of the standard input.  The @var{script}
    185 is actually the first non-option parameter, which @command{sed} specially
    186 considers a script and not an input file if (and only if) none of the
    187 other @var{options} specifies a script to be executed, that is if neither
    188 of the @option{-e} and @option{-f} options is specified.
    189228
    190229@command{sed} may be invoked with the following command-line options:
     
    212251@cindex Disabling autoprint, from command line
    213252By default, @command{sed} prints out the pattern space
    214 at the end of each cycle through the script.
     253at the end of each cycle through the script (@pxref{Execution Cycle, ,
     254How @code{sed} works}).
    215255These options disable this automatic printing,
    216256and @command{sed} only produces output when explicitly told to
    217257via the @code{p} command.
     258
     259@item --debug
     260@opindex --debug
     261@cindex @value{SSEDEXT}, debug
     262Print the input sed program in canonical form,
     263and annotate program execution.
     264@codequotebacktick on
     265@codequoteundirected on
     266@example
     267$ echo 1 | sed '\%1%s21232'
     2683
     269
     270$ echo 1 | sed --debug '\%1%s21232'
     271SED PROGRAM:
     272  /1/ s/1/3/
     273INPUT:   'STDIN' line 1
     274PATTERN: 1
     275COMMAND: /1/ s/1/3/
     276PATTERN: 3
     277END-OF-CYCLE:
     2783
     279@end example
     280@codequotebacktick off
     281@codequoteundirected off
     282
     283
     284@item -e @var{script}
     285@itemx --expression=@var{script}
     286@opindex -e
     287@opindex --expression
     288@cindex Script, from command line
     289Add the commands in @var{script} to the set of commands to be
     290run while processing the input.
     291
     292@item -f @var{script-file}
     293@itemx --file=@var{script-file}
     294@opindex -f
     295@opindex --file
     296@cindex Script, from a file
     297Add the commands contained in the file @var{script-file}
     298to the set of commands to be run while processing the input.
    218299
    219300@item -i[@var{SUFFIX}]
     
    240321before renaming the temporary file, thereby making a backup
    241322copy@footnote{Note that @value{SSED} creates the backup
    242     file whether or not any output is actually changed.}).
     323file whether or not any output is actually changed.}).
    243324
    244325@cindex In-place editing, Perl-style backup file names
     
    255336overwritten without making a backup.
    256337
     338Because @option{-i} takes an optional argument, it should
     339not be followed by other short options:
     340@table @code
     341@item sed -Ei '...' FILE
     342Same as @option{-E -i} with no backup suffix; @file{FILE} will be
     343edited in-place without creating a backup.
     344
     345@item sed -iE '...' FILE
     346This is equivalent to @option{--in-place=E}, creating @file{FILEE} as a backup
     347of @file{FILE}.
     348@end table
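A minimal sketch of the two spellings, assuming GNU sed and hypothetical file names. With -iE the letter E becomes the backup suffix. With -Ei it is the separate -E option, and no backup is made:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf 'hello\n' > FILE
# -iE: "E" is taken as the backup SUFFIX (like --in-place=E),
# so FILE is edited and the original is saved as FILEE.
sed -iE 's/hello/world/' FILE

printf 'hello\n' > OTHER
# -Ei: -E (extended regexps) followed by -i with no suffix,
# so OTHER is edited in place and no backup file is created.
sed -Ei 's/(hel)lo/\1p/' OTHER
```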
     349
     350Be careful when combining @option{-n} with @option{-i}: the former disables
     351automatic printing of lines and the latter changes the file in-place
     352without a backup. If the two are combined carelessly (without an explicit @code{p} command),
     353the output file will be empty:
     354@codequotebacktick on
     355@codequoteundirected on
     356@example
     357# WRONG USAGE: 'FILE' will be truncated.
     358sed -ni 's/foo/bar/' FILE
     359@end example
     360@codequotebacktick off
     361@codequoteundirected off
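For contrast, here is a hedged sketch of safer alternatives, assuming GNU sed and hypothetical file names. You can drop -n, print explicitly with the s command's p flag, or keep a backup:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Safe: without -n every line is written back to FILE.
printf 'foo\nkeep\n' > FILE
sed -i 's/foo/bar/' FILE          # FILE: bar, keep

# If -n is wanted, print explicitly; only printed lines are kept.
printf 'foo\nkeep\n' > FILE2
sed -n -i 's/foo/bar/p' FILE2     # FILE2: bar

# Keeping a backup (here FILE3.bak) makes a mistake recoverable.
printf 'foo\nkeep\n' > FILE3
sed -i.bak 's/foo/bar/' FILE3     # FILE3: bar, keep; FILE3.bak: foo, keep
```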
     362
    257363@item -l @var{N}
    258364@itemx --line-length=@var{N}
     
    265371
    266372@item --posix
     373@opindex --posix
    267374@cindex @value{SSEDEXT}, disabling
    268 @value{SSED} includes several extensions to @acronym{POSIX}
     375@value{SSED} includes several extensions to POSIX
    269376sed.  In order to simplify writing portable scripts, this
    270377option disables all the extensions that this manual documents,
     
    272379@cindex @code{POSIXLY_CORRECT} behavior, enabling
    273380Most of the extensions accept @command{sed} programs that
    274 are outside the syntax mandated by @acronym{POSIX}, but some
     381are outside the syntax mandated by POSIX, but some
    275382of them (such as the behavior of the @command{N} command
    276 described in @pxref{Reporting Bugs}) actually violate the
     383described in @ref{Reporting Bugs}) actually violate the
    277384standard.  If you want to disable only the latter kind of
    278385extension, you can set the @code{POSIXLY_CORRECT} variable
    279386to a non-empty value.
    280387
    281 @item -r
     388@item -b
     389@itemx --binary
     390@opindex -b
     391@opindex --binary
     392This option is available on every platform, but is only effective where the
     393operating system makes a distinction between text files and binary files.
     394When such a distinction is made---as is the case for MS-DOS, Windows,
     395Cygwin---text files are composed of lines separated by a carriage return
     396@emph{and} a line feed character, and @command{sed} does not see the
     397ending CR.  When this option is specified, @command{sed} will open
     398input files in binary mode, thus not requesting this special processing
     399and considering lines to end at a line feed.
     400
     401@item --follow-symlinks
     402@opindex --follow-symlinks
     403This option is available only on platforms that support
     404symbolic links and has an effect only if option @option{-i}
     405is specified.  In this case, if the file that is specified
     406on the command line is a symbolic link, @command{sed} will
     407follow the link and edit the ultimate destination of the
     408link.  The default behavior is to break the symbolic link,
     409so that the link destination will not be modified.
     410
     411@item -E
     412@itemx -r
    282413@itemx --regexp-extended
     414@opindex -E
    283415@opindex -r
    284416@opindex --regexp-extended
    285417@cindex Extended regular expressions, choosing
    286 @cindex @acronym{GNU} extensions, extended regular expressions
     418@cindex GNU extensions, extended regular expressions
    287419Use extended regular expressions rather than basic
    288420regular expressions.  Extended regexps are those that
    289421@command{egrep} accepts; they can be clearer because they
    290 usually have less backslashes, but are a @acronym{GNU} extension
    291 and hence scripts that use them are not portable.
    292 @xref{Extended regexps, , Extended regular expressions}.
    293 
    294 @ifset PERL
    295 @item -R
    296 @itemx --regexp-perl
    297 @opindex -R
    298 @opindex --regexp-perl
    299 @cindex Perl-style regular expressions, choosing
    300 @cindex @value{SSEDEXT}, Perl-style regular expressions
    301 Use Perl-style regular expressions rather than basic
    302 regular expressions.  Perl-style regexps are extremely
    303 powerful but are a @value{SSED} extension and hence scripts that
    304 use it are not portable.  @xref{Perl regexps, ,
    305 Perl-style regular expressions}.
    306 @end ifset
     422usually have fewer backslashes.
     423Historically this was a GNU extension,
     424but the @option{-E}
     425extension has since been added to the POSIX standard
     426(http://austingroupbugs.net/view.php?id=528),
     427so use @option{-E} for portability.
     428GNU sed has accepted @option{-E} as an undocumented option for years,
     429and *BSD seds have accepted @option{-E} for years as well,
     430but scripts that use @option{-E} might not port to other older systems.
     431@xref{ERE syntax, , Extended regular expressions}.
     432
    307433
    308434@item -s
    309435@itemx --separate
     436@opindex -s
     437@opindex --separate
    310438@cindex Working on separate files
    311439By default, @command{sed} will consider the files specified on the
     
    318446start of each file.
    319447
     448@item --sandbox
     449@opindex --sandbox
     450@cindex Sandbox mode
     451In sandbox mode, @code{e/w/r} commands are rejected: programs containing
     452them will be aborted without being run. Sandbox mode ensures @command{sed}
     453operates only on the input files designated on the command line, and
     454cannot run external programs.
     455
     456
    320457@item -u
    321458@itemx --unbuffered
     
    328465output as soon as possible.)
    329466
    330 @item -e @var{script}
    331 @itemx --expression=@var{script}
    332 @opindex -e
    333 @opindex --expression
    334 @cindex Script, from command line
    335 Add the commands in @var{script} to the set of commands to be
    336 run while processing the input.
    337 
    338 @item -f @var{script-file}
    339 @itemx --file=@var{script-file}
    340 @opindex -f
    341 @opindex --file
    342 @cindex Script, from a file
    343 Add the commands contained in the file @var{script-file}
    344 to the set of commands to be run while processing the input.
    345 
     467@item -z
     468@itemx --null-data
     469@itemx --zero-terminated
     470@opindex -z
     471@opindex --null-data
     472@opindex --zero-terminated
     473Treat the input as a set of lines, each terminated by a zero byte
     474(the ASCII @samp{NUL} character) instead of a newline.  This option can
     475be used with commands like @samp{sort -z} and @samp{find -print0}
     476to process arbitrary file names.
    346477@end table
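The sketch below exercises three of the options above. It assumes GNU sed 4.3 or later (older releases may lack -z or --sandbox) and a POSIX shell; the sample strings are invented for illustration. The first pair of commands performs the same field swap with a basic and an extended regular expression, so the -E version needs fewer backslashes.

```shell
# -E: the same substitution written as a BRE and as an ERE.
# Both swap the two fields and print "123-aaa".
echo 'aaa-123' | sed 's/\([a-z]\{3\}\)-\([0-9]*\)/\2-\1/'
echo 'aaa-123' | sed -E 's/([a-z]{3})-([0-9]*)/\2-\1/'

# -z: records are terminated by NUL bytes instead of newlines, as produced
# by printf '\0' here (or by find -print0). tr makes the result readable:
# prints "file:one" and "file:two" on separate lines.
printf 'one\0two\0' | sed -z 's/^/file:/' | tr '\0' '\n'

# --sandbox: a script containing w (or e, r) is rejected before it runs,
# so sed exits with a nonzero status and an error message on stderr.
echo x | sed --sandbox 'w /dev/null' || echo "rejected, exit status $?"
```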
    347478
     
    359490The standard input will be processed if no file names are specified.
    360491
    361 
    362 @node sed Programs
    363 @chapter @command{sed} Programs
    364 
    365 @cindex @command{sed} program structure
     492@node Exit status
     493@section Exit status
     494@cindex exit status
     495An exit status of zero indicates success, and a nonzero value
     496indicates failure. @value{SSED} returns the following exit status
     497error values:
     498
     499@table @asis
     500@item 0
     501Successful completion.
     502
     503@item 1
     504Invalid command, invalid syntax, invalid regular expression or a
     505@value{SSED} extension command used with @option{--posix}.
     506
     507@item 2
     508One or more of the input files specified on the command line could not be
     509opened (e.g. if a file is not found, or read permission is denied).
     510Processing continued with other files.
     511
     512@item 4
     513An I/O error, or a serious processing error during runtime;
     514@value{SSED} aborted immediately.
     515@end table
     516
     517@cindex Q, example
     518@cindex exit status, example
     519Additionally, the commands @code{q} and @code{Q} can be used to terminate
     520@command{sed} with a custom exit code value (this is a @value{SSED} extension):
     521
     522@example
     523$ echo | sed 'Q42' ; echo $?
     52442
     525@end example
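To make the status table concrete, here is a small sketch assuming GNU sed and a POSIX shell. It works in a scratch directory so that the file name missing.txt (a hypothetical name) is guaranteed not to exist. It shows status 2 for an unreadable input file while the remaining file is still processed, and the difference between q and Q when both are given a custom exit code.

```shell
cd "$(mktemp -d)" || exit 1     # scratch directory: missing.txt cannot exist
printf 'hello\n' > present.txt

# Status 2: missing.txt cannot be read (error on stderr), but present.txt
# is still processed, so "world" is printed before the status line.
sed 's/hello/world/' missing.txt present.txt || echo "status: $?"

# q prints the current line before exiting; Q exits without printing it.
printf '1\n2\n3\n' | sed '2q5' || echo "status: $?"   # 1, 2, then status: 5
printf '1\n2\n3\n' | sed '2Q5' || echo "status: $?"   # 1, then status: 5
```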
     526
     527
     528@node sed scripts
     529@chapter @command{sed} scripts
     530
     531
     532@menu
     533* sed script overview::      @command{sed} script overview
     534* sed commands list::        @command{sed} commands summary
     535* The "s" Command::          @command{sed}'s Swiss Army Knife
     536* Common Commands::          Often used commands
     537* Other Commands::           Less frequently used commands
     538* Programming Commands::     Commands for @command{sed} gurus
     539* Extended Commands::        Commands specific to @value{SSED}
     540* Multiple commands syntax:: Extension for easier scripting
     541@end menu
     542
     543@node sed script overview
     544@section @command{sed} script overview
     545
     546@cindex @command{sed} script structure
    366547@cindex Script structure
     548
    367549A @command{sed} program consists of one or more @command{sed} commands,
    368550passed in by one or more of the
     
    371553options are used.
    372554This document will refer to ``the'' @command{sed} script;
    373 this is understood to mean the in-order catenation
     555this is understood to mean the in-order concatenation
    374556of all of the @var{script}s and @var{script-file}s passed in.
    375 
    376 Each @code{sed} command consists of an optional address or
    377 address range, followed by a one-character command name
    378 and any additional command-specific code.
    379 
    380 @menu
    381 * Execution Cycle::          How @command{sed} works
    382 * Addresses::                Selecting lines with @command{sed}
    383 * Regular Expressions::      Overview of regular expression syntax
    384 * Common Commands::          Often used commands
    385 * The "s" Command::          @command{sed}'s Swiss Army Knife
    386 * Other Commands::           Less frequently used commands
    387 * Programming Commands::     Commands for @command{sed} gurus
    388 * Extended Commands::        Commands specific to @value{SSED}
    389 * Escapes::                  Specifying special characters
    390 @end menu
    391 
    392 
    393 @node Execution Cycle
    394 @section How @command{sed} Works
    395 
    396 @cindex Buffer spaces, pattern and hold
    397 @cindex Spaces, pattern and hold
    398 @cindex Pattern space, definition
    399 @cindex Hold space, definition
    400 @command{sed} maintains two data buffers: the active @emph{pattern} space,
    401 and the auxiliary @emph{hold} space. Both are initially empty.
    402 
    403 @command{sed} operates by performing the following cycle on each
    404 line of input: first, @command{sed} reads one line from the input
    405 stream, removes any trailing newline, and places it in the pattern space.
    406 Then commands are executed; each command can have an address associated
    407 with it: addresses are a kind of condition code, and a command is only
    408 executed if the condition is verified before the command is to be
    409 executed.
    410 
    411 When the end of the script is reached, unless the @option{-n} option
    412 is in use, the contents of pattern space are printed out to the output
    413 stream, adding back the trailing newline if it was removed.@footnote{Actually,
    414   if @command{sed} prints a line without the terminating newline, it will
    415   nevertheless print the missing newline as soon as more text is sent to
    416   the same output stream, which gives the ``least expected surprise''
    417   even though it does not make commands like @samp{sed -n p} exactly
    418   identical to @command{cat}.} Then the next cycle starts for the next
    419 input line.
    420 
    421 Unless special commands (like @samp{D}) are used, the pattern space is
    422 deleted between two cycles. The hold space, on the other hand, keeps
    423 its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
    424 @samp{g}, @samp{G} to move data between both buffers).
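A common illustration of the pattern space / hold space cycle described above is reversing the order of lines. The sketch assumes a POSIX sed and shell; the three-line input is arbitrary. The hold space carries the accumulated text from one cycle to the next, while the pattern space is replaced by each new input line.

```shell
# Reverse the order of lines (like tac):
#   1!G  on every line except the first, append a newline and the hold space
#   h    copy the pattern space into the hold space
#   $p   on the last line, print the accumulated result (-n: no autoprint)
printf '1\n2\n3\n' | sed -n '1!G;h;$p'
# Output:
#   3
#   2
#   1
```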
    425 
    426 
    427 @node Addresses
    428 @section Selecting lines with @command{sed}
    429 @cindex Addresses, in @command{sed} scripts
    430 @cindex Line selection
    431 @cindex Selecting lines to process
    432 
    433 Addresses in a @command{sed} script can be in any of the following forms:
     557@xref{Overview}.
     558
     559
     560@cindex @command{sed} commands syntax
     561@cindex syntax, @command{sed} commands
     562@cindex addresses, syntax
     563@cindex syntax, addresses
     564@command{sed} commands follow this syntax:
     565
     566@example
     567[addr]@var{X}[options]
     568@end example
     569
     570@var{X} is a single-letter @command{sed} command.
     571@c TODO: add @pxref{commands} when there is a command-list section.
     572@code{[addr]} is an optional line address. If @code{[addr]} is specified,
     573the command @var{X} will be executed only on the matched lines.
     574@code{[addr]} can be a single line number, a regular expression,
     575or a range of lines (@pxref{sed addresses}).
     576Additional @code{[options]} are used for some @command{sed} commands.
     577
     578@cindex @command{d}, example
     579@cindex address range, example
     580@cindex example, address range
     581The following example deletes lines 30 to 35 in the input.
     582@code{30,35} is an address range. @command{d} is the delete command:
     583
     584@example
     585sed '30,35d' input.txt > output.txt
     586@end example
     587
     588@cindex @command{q}, example
     589@cindex regular expression, example
     590@cindex example, regular expression
     591The following example prints all input up to and including the first
     592line starting with the string @samp{foo}. If such a line is found,
     593@command{sed} will terminate with exit status 42.
     594If no such line is found (and no other error occurred), @command{sed}
     595will exit with status 0.
     596@code{/^foo/} is a regular-expression address.
     597@command{q} is the quit command. @code{42} is the command option.
     598
     599@example
     600sed '/^foo/q42' input.txt > output.txt
     601@end example
     602
     603
     604@cindex multiple @command{sed} commands
     605@cindex @command{sed} commands, multiple
     606@cindex newline, command separator
     607@cindex semicolons, command separator
     608@cindex ;, command separator
     609@cindex -e, example
     610@cindex -f, example
     611Commands within a @var{script} or @var{script-file} can be
     612separated by semicolons (@code{;}) or newlines (ASCII 10).
     613Multiple scripts can be specified with @option{-e} or @option{-f}
     614options.
     615
     616The following examples are all equivalent. They perform two @command{sed}
     617operations: deleting any lines matching the regular expression @code{/^foo/},
     618and replacing all occurrences of the string @samp{hello} with @samp{world}:
     619
     620@example
     621sed '/^foo/d ; s/hello/world/g' input.txt > output.txt
     622
     623sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt
     624
     625echo '/^foo/d' > script.sed
     626echo 's/hello/world/g' >> script.sed
     627sed -f script.sed input.txt > output.txt
     628
     629echo 's/hello/world/g' > script2.sed
     630sed -e '/^foo/d' -f script2.sed input.txt > output.txt
     631@end example
     632
     633
     634@cindex @command{a}, and semicolons
     635@cindex @command{c}, and semicolons
     636@cindex @command{i}, and semicolons
     637Commands @command{a}, @command{c}, @command{i}, due to their syntax,
     638cannot be followed by semicolons working as command separators and
     639thus should be terminated
     640with newlines or be placed at the end of a @var{script} or @var{script-file}.
     641Commands can also be preceded with optional non-significant
     642whitespace characters.
     643@xref{Multiple commands syntax}.
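The restriction on a, c and i can be seen directly. This sketch assumes GNU sed, because the one-line "a text" form is a GNU extension; the input is a single made-up line. In the first command the semicolon does not end the a command, so the s command is swallowed into the appended text.

```shell
# The one-line form of 'a' takes the rest of the line as its text, so the
# ';' and the s command become part of the appended text:
printf 'x\n' | sed '1a hello; s/x/y/'
# Output:
#   x
#   hello; s/x/y/

# Ending the -e fragment (equivalent to a newline) terminates the text:
printf 'x\n' | sed -e '1a hello' -e 's/x/y/'
# Output:
#   y
#   hello
```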
     644
     645
     646
     647@node sed commands list
     648@section @command{sed} commands summary
     649
     650The following commands are supported in @value{SSED}.
     651Some are standard POSIX commands, while others are @value{SSEDEXT}.
     652Details and examples for each command are in the following sections.
     653(Mnemonics) are shown in parentheses.
     654
    434655@table @code
    435 @item @var{number}
    436 @cindex Address, numeric
    437 @cindex Line, selecting by number
    438 Specifying a line number will match only that line in the input.
    439 (Note that @command{sed} counts lines continuously across all input files
    440 unless @option{-i} or @option{-s} options are specified.)
    441 
    442 @item @var{first}~@var{step}
    443 @cindex @acronym{GNU} extensions, @samp{@var{n}~@var{m}} addresses
    444 This @acronym{GNU} extension matches every @var{step}th line
    445 starting with line @var{first}.
    446 In particular, lines will be selected when there exists
    447 a non-negative @var{n} such that the current line-number equals
    448 @var{first} + (@var{n} * @var{step}).
    449 Thus, to select the odd-numbered lines,
    450 one would use @code{1~2};
    451 to pick every third line starting with the second, @samp{2~3} would be used;
    452 to pick every fifth line starting with the tenth, use @samp{10~5};
    453 and @samp{50~0} is just an obscure way of saying @code{50}.
    454 
    455 @item $
    456 @cindex Address, last line
    457 @cindex Last line, selecting
    458 @cindex Line, selecting last
    459 This address matches the last line of the last file of input, or
    460 the last line of each file when the @option{-i} or @option{-s} options
    461 are specified.
    462 
    463 @item /@var{regexp}/
    464 @cindex Address, as a regular expression
    465 @cindex Line, selecting by regular expression match
    466 This will select any line which matches the regular expression @var{regexp}.
    467 If @var{regexp} itself includes any @code{/} characters,
    468 each must be escaped by a backslash (@code{\}).
    469 
    470 @cindex empty regular expression
    471 @cindex @value{SSEDEXT}, modifiers and the empty regular expression
    472 The empty regular expression @samp{//} repeats the last regular
    473 expression match (the same holds if the empty regular expression is
    474 passed to the @code{s} command).  Note that modifiers to regular expressions
    475 are evaluated when the regular expression is compiled, thus it is invalid to
    476 specify them together with the empty regular expression.
    477 
    478 @item \%@var{regexp}%
    479 (The @code{%} may be replaced by any other single character.)
    480 
    481 @cindex Slash character, in regular expressions
    482 This also matches the regular expression @var{regexp},
    483 but allows one to use a different delimiter than @code{/}.
    484 This is particularly useful if the @var{regexp} itself contains
    485 a lot of slashes, since it avoids the tedious escaping of every @code{/}.
    486 If @var{regexp} itself includes any delimiter characters,
    487 each must be escaped by a backslash (@code{\}).
    488 
    489 @item /@var{regexp}/I
    490 @itemx \%@var{regexp}%I
    491 @cindex @acronym{GNU} extensions, @code{I} modifier
    492 @ifset PERL
    493 @cindex Perl-style regular expressions, case-insensitive
    494 @end ifset
    495 The @code{I} modifier to regular-expression matching is a @acronym{GNU}
    496 extension which causes the @var{regexp} to be matched in
    497 a case-insensitive manner.
    498 
    499 @item /@var{regexp}/M
    500 @itemx \%@var{regexp}%M
    501 @ifset PERL
    502 @cindex @value{SSEDEXT}, @code{M} modifier
    503 @end ifset
    504 @cindex Perl-style regular expressions, multiline
    505 The @code{M} modifier to regular-expression matching is a @value{SSED}
    506 extension which causes @code{^} and @code{$} to match respectively
    507 (in addition to the normal behavior) the empty string after a newline,
    508 and the empty string before a newline.  There are special character
    509 sequences
    510 @ifset PERL
    511 (@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'}
    512 in basic or extended regular expression modes)
    513 @end ifset
    514 @ifclear PERL
    515 (@code{\`} and @code{\'})
    516 @end ifclear
    517 which always match the beginning or the end of the buffer.
    518 @code{M} stands for @cite{multi-line}.
    519 
    520 @ifset PERL
    521 @item /@var{regexp}/S
    522 @itemx \%@var{regexp}%S
    523 @cindex @value{SSEDEXT}, @code{S} modifier
    524 @cindex Perl-style regular expressions, single line
    525 The @code{S} modifier to regular-expression matching is only valid
    526 in Perl mode and specifies that the dot character (@code{.}) will
    527 match the newline character too.  @code{S} stands for @cite{single-line}.
    528 @end ifset
    529 
    530 @ifset PERL
    531 @item /@var{regexp}/X
    532 @itemx \%@var{regexp}%X
    533 @cindex @value{SSEDEXT}, @code{X} modifier
    534 @cindex Perl-style regular expressions, extended
    535 The @code{X} modifier to regular-expression matching is also
    536 valid in Perl mode only.  If it is used, whitespace in the
    537 pattern (other than in a character class) and
    538 characters between a @kbd{#} outside a character class and the
    539 next newline character are ignored. An escaping backslash
    540 can be used to include a whitespace or @kbd{#} character as part
    541 of the pattern.
    542 @end ifset
    543 @end table
    544 
    545 If no addresses are given, then all lines are matched;
    546 if one address is given, then only lines matching that
    547 address are matched.
    548 
    549 @cindex Range of lines
    550 @cindex Several lines, selecting
    551 An address range can be specified by specifying two addresses
    552 separated by a comma (@code{,}).  An address range matches lines
    553 starting from where the first address matches, and continues
    554 until the second address matches (inclusively).
    555 
    556 If the second address is a @var{regexp}, then checking for the
    557 ending match will start with the line @emph{following} the
    558 line which matched the first address: a range will always
    559 span at least two lines (except of course if the input stream
    560 ends).
    561 
    562 If the second address is a @var{number} less than (or equal to)
    563 the line matching the first address, then only the one line is
    564 matched.
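The address forms and range rules above can be checked with a few one-liners. This sketch assumes GNU sed (first~step is a GNU extension) and uses seq from GNU coreutils only to generate numbered input; the other inputs are invented. Note how a regexp second address is first tested on the line after the range starts, and how a numeric second address that is not greater than the first selects just one line.

```shell
seq 10 | sed -n '1~3p'                     # GNU first~step: 1, 4, 7, 10
seq 10 | sed -n '$p'                       # last line: 10
printf 'a\nb\na\nb\n' | sed -n '/a/,/a/p'  # a, b, a: the end regexp is first
                                           # tried on the line after the start
seq 10 | sed -n '5,2p'                     # 5: second address precedes the first
printf '/usr/bin\n/tmp\n' | sed -n '\%^/usr/%p'  # /usr/bin: custom delimiter,
                                                 # no escaped slashes needed
```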
    565 
    566 @cindex Special addressing forms
    567 @cindex Range with start address of zero
    568 @cindex Zero, as range start address
    569 @cindex @var{addr1},+N
    570 @cindex @var{addr1},~N
    571 @cindex @acronym{GNU} extensions, special two-address forms
    572 @cindex @acronym{GNU} extensions, @code{0} address
    573 @cindex @acronym{GNU} extensions, 0,@var{addr2} addressing
    574 @cindex @acronym{GNU} extensions, @var{addr1},+@var{N} addressing
    575 @cindex @acronym{GNU} extensions, @var{addr1},~@var{N} addressing
    576 @value{SSED} also supports some special two-address forms; all these
    577 are @acronym{GNU} extensions:
    578 @table @code
    579 @item 0,/@var{regexp}/
    580 A line number of @code{0} can be used in an address specification like
    581 @code{0,/@var{regexp}/} so that @command{sed} will try to match
    582 @var{regexp} in the first input line too.  In other words,
    583 @code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
    584 except that if @var{addr2} matches the very first line of input the
    585 @code{0,/@var{regexp}/} form will consider it to end the range, whereas
    586 the @code{1,/@var{regexp}/} form will match the beginning of its range and
    587 hence make the range span up to the @emph{second} occurrence of the
    588 regular expression.
    589 
    590 Note that this is the only place where the @code{0} address makes
    591 sense; there is no 0-th line and commands which are given the @code{0}
    592 address in any other way will give an error.
    593 
    594 @item @var{addr1},+@var{N}
    595 Matches @var{addr1} and the @var{N} lines following @var{addr1}.
    596 
    597 @item @var{addr1},~@var{N}
    598 Matches @var{addr1} and the lines following @var{addr1}
    599 until the next line whose input line number is a multiple of @var{N}.
    600 @end table
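A sketch of the GNU-only two-address forms, assuming GNU sed and coreutils seq; the x/y input is made up so that the first line already matches the closing regexp. It contrasts 1,/x/ with 0,/x/ exactly as described above.

```shell
printf 'x\ny\nx\ny\n' | sed -n '1,/x/p'  # x, y, x: the end match is sought
                                         # from line 2 onward
printf 'x\ny\nx\ny\n' | sed -n '0,/x/p'  # x: line 1 may also end the range
seq 10 | sed -n '/3/,+2p'                # 3, 4, 5: the match plus 2 lines
seq 10 | sed -n '/3/,~4p'                # 3, 4: up to the next multiple of 4
```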
    601 
    602 @cindex Excluding lines
    603 @cindex Selecting non-matching lines
    604 Appending the @code{!} character to the end of an address
    605 specification negates the sense of the match.
    606 That is, if the @code{!} character follows an address range,
    607 then only lines which do @emph{not} match the address range
    608 will be selected.
    609 This also works for singleton addresses,
    610 and, perhaps perversely, for the null address.
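Negation applies to ranges, single addresses and regexps alike. The sketch assumes a POSIX sed and uses seq (GNU coreutils) only to produce numbered input.

```shell
seq 5 | sed -n '2,4!p'   # 1, 5: every line outside the range 2-4
seq 5 | sed '$!d'        # 5: delete all but the last line (like tail -n 1)
seq 3 | sed -n '/2/!p'   # 1, 3: lines that do not match the regexp
```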
    611 
    612 
    613 @node Regular Expressions
    614 @section Overview of Regular Expression Syntax
    615 
    616 To know how to use @command{sed}, people should understand regular
    617 expressions (@dfn{regexp} for short).  A regular expression
    618 is a pattern that is matched against a
    619 subject string from left to right.  Most characters are
    620 @dfn{ordinary}: they stand for
    621 themselves in a pattern, and match the corresponding characters
    622 in the subject.  As a trivial example, the pattern
    623 
    624 @example
    625      The quick brown fox
    626 @end example
    627 
    628 @noindent
    629 matches a portion of a subject string that is identical to
    630 itself.  The power of regular expressions comes from the
    631 ability to include alternatives and repetitions in the pattern.
    632 These are encoded in the pattern by the use of @dfn{special characters},
    633 which do not stand for themselves but instead
    634 are interpreted in some special way.  Here is a brief description
    635 of regular expression syntax as used in @command{sed}.
    636 
    637 @table @code
    638 @item @var{char}
    639 A single ordinary character matches itself.
    640 
    641 @item *
    642 @cindex @acronym{GNU} extensions, to basic regular expressions
    643 Matches a sequence of zero or more instances of matches for the
    644 preceding regular expression, which must be an ordinary character, a
    645 special character preceded by @code{\}, a @code{.}, a grouped regexp
    646 (see below), or a bracket expression.  As a @acronym{GNU} extension, a
    647 postfixed regular expression can also be followed by @code{*}; for
    648 example, @code{a**} is equivalent to @code{a*}.  @acronym{POSIX}
    649 1003.1-2001 says that @code{*} stands for itself when it appears at
    650 the start of a regular expression or subexpression, but many
    651 non-@acronym{GNU} implementations do not support this and portable
    652 scripts should instead use @code{\*} in these contexts.
    653 
    654 @item \+
    655 @cindex @acronym{GNU} extensions, to basic regular expressions
    656 As @code{*}, but matches one or more.  It is a @acronym{GNU} extension.
    657 
    658 @item \?
    659 @cindex @acronym{GNU} extensions, to basic regular expressions
    660 As @code{*}, but only matches zero or one.  It is a @acronym{GNU} extension.
    661 
    662 @item \@{@var{i}\@}
    663 As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
    664 decimal integer; for portability, keep it between 0 and 255
    665 inclusive).
    666 
    667 @item \@{@var{i},@var{j}\@}
    668 Matches between @var{i} and @var{j} sequences, inclusive.
    669 
    670 @item \@{@var{i},\@}
    671 Matches @var{i} or more sequences.
    672 
    673 @item \(@var{regexp}\)
    674 Groups the inner @var{regexp} as a whole; this is used to:
    675 
    676 @itemize @bullet
    677 @item
    678 @cindex @acronym{GNU} extensions, to basic regular expressions
    679 Apply postfix operators, like @code{\(abcd\)*}:
    680 this will search for zero or more whole sequences
    681 of @samp{abcd}, while @code{abcd*} would search
    682 for @samp{abc} followed by zero or more occurrences
    683 of @samp{d}.  Note that support for @code{\(abcd\)*} is
    684 required by @acronym{POSIX} 1003.1-2001, but many non-@acronym{GNU}
    685 implementations do not support it and hence it is not universally
    686 portable.
    687 
    688 @item
    689 Use back references (see below).
    690 @end itemize
    691 
    692 @item .
    693 Matches any character, including newline.
    694 
    695 @item ^
    696 Matches the null string at beginning of line, i.e. what
    697 appears after the circumflex must appear at the
    698 beginning of line. @code{^#include} will match only
    699 lines where @samp{#include} is the first thing on line---if
    700 there are spaces before, for example, the match fails.
    701 @code{^} acts as a special character only at the beginning
    702 of the regular expression or subexpression (that is,
    703 after @code{\(} or @code{\|}).  Portable scripts should avoid
    704 @code{^} at the beginning of a subexpression, though, as
    705 @acronym{POSIX} allows implementations that treat @code{^} as
    706 an ordinary character in that context.
    707 
    708 
    709 @item $
    710 It is the same as @code{^}, but refers to end of line.
    711 @code{$} also acts as a special character only at the end
    712 of the regular expression or subexpression (that is, before @code{\)}
    713 or @code{\|}), and its use at the end of a subexpression is not
    714 portable.
    715 
    716 
    717 @item [@var{list}]
    718 @itemx [^@var{list}]
    719 Matches any single character in @var{list}: for example,
    720 @code{[aeiou]} matches all vowels.  A list may include
    721 sequences like @code{@var{char1}-@var{char2}}, which
    722 matches any character between (inclusive) @var{char1}
    723 and @var{char2}.
    724 
    725 A leading @code{^} reverses the meaning of @var{list}, so that
    726 it matches any single character @emph{not} in @var{list}.  To include
    727 @code{]} in the list, make it the first character (after
    728 the @code{^} if needed), to include @code{-} in the list,
    729 make it the first or last; to include @code{^} put
    730 it after the first character.
    731 
    732 @cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
    733 The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
    734 are normally not special within @var{list}.  For example, @code{[\*]}
    735 matches either @samp{\} or @samp{*}, because the @code{\} is not
    736 special here.  However, strings like @code{[.ch.]}, @code{[=a=]}, and
    737 @code{[:space:]} are special within @var{list} and represent collating
    738 symbols, equivalence classes, and character classes, respectively, and
    739 @code{[} is therefore special within @var{list} when it is followed by
    740 @code{.}, @code{=}, or @code{:}.  Also, when not in
    741 @env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
    742 @code{\t} are recognized within @var{list}.  @xref{Escapes}.
    743 
    744 @item @var{regexp1}\|@var{regexp2}
    745 @cindex @acronym{GNU} extensions, to basic regular expressions
    746 Matches either @var{regexp1} or @var{regexp2}.  Use
    747 parentheses to use complex alternative regular expressions.
    748 The matching process tries each alternative in turn, from
    749 left to right, and the first one that succeeds is used.
    750 It is a @acronym{GNU} extension.
    751 
    752 @item @var{regexp1}@var{regexp2}
    753 Matches the concatenation of @var{regexp1} and @var{regexp2}.
    754 Concatenation binds more tightly than @code{\|}, @code{^}, and
    755 @code{$}, but less tightly than the other regular expression
    756 operators.
    757 
    758 @item \@var{digit}
    759 Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
    760 subexpression in the regular expression.  This is called a @dfn{back
    761 reference}.  Subexpressions are implicitly numbered by counting
    762 occurrences of @code{\(} left-to-right.
    763 
    764 @item \n
    765 Matches the newline character.
    766 
    767 @item \@var{char}
    768 Matches @var{char}, where @var{char} is one of @code{$},
    769 @code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
    770 Note that the only C-like
    771 backslash sequences that you can portably assume to be
    772 interpreted are @code{\n} and @code{\\}; in particular
    773 @code{\t} is not portable, and matches a @samp{t} under most
    774 implementations of @command{sed}, rather than a tab character.
    775 
    776 @end table
    777 
    778 @cindex Greedy regular expression matching
    779 Note that the regular expression matcher is greedy, i.e., matches
    780 are attempted from left to right and, if two or more matches are
    781 possible starting at the same character, it selects the longest.
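The following sketch, assuming GNU sed (the BRE operators \+ and \| are GNU extensions), shows greedy matching, a back reference, and the GNU BRE extensions from the table above. The input strings are invented for illustration.

```shell
# Greedy matching: .*X consumes everything up to the LAST X on the line.
echo 'aXbXc' | sed 's/.*X/-/'                                  # -c

# Back reference: print only lines made of one string repeated twice.
printf 'hello hello\nhello world\n' | sed -n '/^\(.*\) \1$/p'  # hello hello

# GNU BRE extensions \+ (one or more) and \| (alternation):
echo 'aaab cat dog' | sed 's/a\+b/B/; s/cat\|dog/pet/g'        # B pet pet
```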
    782 
    783 @noindent
    784 Examples:
    785 @table @samp
    786 @item abcdef
    787 Matches @samp{abcdef}.
    788 
    789 @item a*b
    790 Matches zero or more @samp{a}s followed by a single
    791 @samp{b}.  For example, @samp{b} or @samp{aaaaab}.
    792 
    793 @item a\?b
    794 Matches @samp{b} or @samp{ab}.
    795 
    796 @item a\+b\+
    797 Matches one or more @samp{a}s followed by one or more
    798 @samp{b}s: @samp{ab} is the shortest possible match, but
    799 other examples are @samp{aaaab} or @samp{abbbbb} or
    800 @samp{aaaaaabbbbbbb}.
    801 
    802 @item .*
    803 @itemx .\+
    804 These two both match all the characters in a string;
    805 however, the first matches every string (including the empty
    806 string), while the second matches only strings containing
    807 at least one character.
    808 
    809 @item ^main.*(.*)
    810 This matches a string starting with @samp{main},
    811 followed by an opening and closing
    812 parenthesis.  The @samp{n}, @samp{(} and @samp{)} need not
    813 be adjacent.
    814 
    815 @item ^#
    816 This matches a string beginning with @samp{#}.
    817 
    818 @item \\$
    819 This matches a string ending with a single backslash.  The
    820 regexp contains two backslashes for escaping.
    821 
    822 @item \$
    823 Instead, this matches a string consisting of a single dollar sign,
    824 because it is escaped.
    825 
    826 @item [a-zA-Z0-9]
    827 In the C locale, this matches any @acronym{ASCII} letter or digit.
    828 
    829 @item [^ @kbd{tab}]\+
    830 (Here @kbd{tab} stands for a single tab character.)
    831 This matches a string of one or more
    832 characters, none of which is a space or a tab.
    833 Usually this means a word.
    834 
    835 @item ^\(.*\)\n\1$
    836 This matches a string consisting of two equal substrings separated by
    837 a newline.
    838 
    839 @item .\@{9\@}A$
    840 This matches nine characters followed by an @samp{A} at the end of a string.
    841 
    842 @item ^.\@{15\@}A
    843 This matches the start of a string that contains 16 characters,
    844 the last of which is an @samp{A}.
    845 
    846 @end table
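Several entries of the example table can be tried with sed -n and the p command, which prints only the matching lines. This is a sketch assuming a POSIX sed and printf; each input contains one line that should match and one that should not.

```shell
# ^main.*(.*): in a BRE, ( and ) are ordinary characters.
printf 'main(int argc)\nmainly\n' | sed -n '/^main.*(.*)/p'  # main(int argc)

# \\$: a line ending with a single backslash.
printf 'ends with \\\nno backslash\n' | sed -n '/\\$/p'      # ends with \

# .\{9\}A$: nine characters followed by a final A.
printf '123456789A\n12345678A\n' | sed -n '/.\{9\}A$/p'     # 123456789A
```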
    847 
    848 
    849 
    850 @node Common Commands
    851 @section Often-Used Commands
    852 
    853 If you use @command{sed} at all, you will quite likely want to know
    854 these commands.
    855 
    856 @table @code
    857 @item #
    858 [No addresses allowed.]
    859 
    860 @findex # (comments)
    861 @cindex Comments, in scripts
    862 The @code{#} character begins a comment;
    863 the comment continues until the next newline.
    864 
    865 @cindex Portability, comments
    866 If you are concerned about portability, be aware that
    867 some implementations of @command{sed} (which are not @sc{posix}
    868 conformant) may only support a single one-line comment,
    869 and then only when the very first character of the script is a @code{#}.
    870 
    871 @findex -n, forcing from within a script
    872 @cindex Caveat --- #n on first line
    873 Warning: if the first two characters of the @command{sed} script
    874 are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
    875 If you want to put a comment in the first line of your script
    876 and that comment begins with the letter @samp{n}
    877 and you do not want this behavior,
    878 then be sure to either use a capital @samp{N},
    879 or place at least one space before the @samp{n}.
    880 
    881 @item q [@var{exit-code}]
    882 This command only accepts a single address.
    883 
    884 @findex q (quit) command
    885 @cindex @value{SSEDEXT}, returning an exit code
    886 @cindex Quitting
    887 Exit @command{sed} without processing any more commands or input.
    888 Note that the current pattern space is printed if auto-print is
    889 not disabled with the @option{-n} option.  The ability to return
    890 an exit code from the @command{sed} script is a @value{SSED} extension.
     656
     657@item a\
     658@itemx @var{text}
     659Append @var{text} after a line.
     660
     661@item a @var{text}
     662Append @var{text} after a line (alternative syntax).
     663
     664@item b @var{label}
     665Branch unconditionally to @var{label}.
     666The @var{label} may be omitted, in which case the next cycle is started.
     667
     668@item c\
     669@itemx @var{text}
     670Replace (change) lines with @var{text}.
     671
     672@item c @var{text}
     673Replace (change) lines with @var{text} (alternative syntax).
    891674
    892675@item d
    893 @findex d (delete) command
    894 @cindex Text, deleting
    895676Delete the pattern space;
    896677immediately start next cycle.
    897678
    898 @item p
    899 @findex p (print) command
    900 @cindex Text, printing
    901 Print out the pattern space (to the standard output).
    902 This command is usually only used in conjunction with the @option{-n}
    903 command-line option.
     679@item D
     680If pattern space contains newlines, delete text in the pattern
     681space up to the first newline, and restart cycle with the resultant
     682pattern space, without reading a new line of input.
     683
     684If pattern space contains no newline, start a normal new cycle as if
     685the @code{d} command was issued.
     686@c TODO: add a section about D+N and D+n commands
     687
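A common idiom combines @code{N}, @code{P} and @code{D} to drop consecutive duplicate lines, much like @command{uniq}. This is a sketch assuming @value{SSED}. The @code{$!N} address keeps @code{N} from running on the last line:

```shell
# $!N : append the next line (except on the last line)
# /^\(.*\)\n\1$/!P : print the first line unless both halves are equal
# D : delete up to the first newline and restart the cycle with the rest
printf '%s\n' a a b c c | sed '$!N;/^\(.*\)\n\1$/!P;D'
# -> a
# -> b
# -> c
```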
     688@item e
     689Executes the command that is found in pattern space and
     690replaces the pattern space with the output; a trailing newline
     691is suppressed.
     692
     693@item e @var{command}
     694Executes @var{command} and sends its output to the output stream.
     695The command can run across multiple lines, all but the last ending with
     696a backslash.
     697
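A sketch of both forms of @code{e}, assuming @value{SSED} and a POSIX shell to run the commands. Without an argument, the pattern space itself is executed. With an argument, the output of @var{command} is written before the current line is printed:

```shell
# e without argument: the pattern space "echo hi" is run as a command
# and replaced by its output
printf '%s\n' 'echo hi' | sed e
# -> hi

# e command: the output of "echo hello" appears before line 1
seq 2 | sed '1e echo hello'
# -> hello
# -> 1
# -> 2
```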
     698@item F
     699(filename) Print the file name of the current input file (with a trailing
     700newline).
     701
     702@item g
     703Replace the contents of the pattern space with the contents of the hold space.
     704
     705@item G
     706Append a newline to the contents of the pattern space,
     707and then append the contents of the hold space to that of the pattern space.
     708
     709@item h
     710(hold) Replace the contents of the hold space with the contents of the
     711pattern space.
     712
     713@item H
     714Append a newline to the contents of the hold space,
     715and then append the contents of the pattern space to that of the hold space.
     716
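The hold-space commands are often combined. This sketch (the classic @command{tac} emulation, assuming @value{SSED}) reverses the input by accumulating the lines in the hold space:

```shell
# 1!G : on every line but the first, append the hold space to the pattern space
# h   : copy the accumulated (reversed) text back into the hold space
# $p  : print the result once the last line has been read
seq 3 | sed -n '1!G;h;$p'
# -> 3
# -> 2
# -> 1
```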
     717@item i\
     718@itemx @var{text}
     719Insert @var{text} before a line.
     720
     721@item i @var{text}
     722Insert @var{text} before a line (alternative syntax).
     723
     724@item l
     725Print the pattern space in an unambiguous form.
    904726
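For example, @code{l} makes a tab and the end of the line visible. This sketch shows the output as printed by @value{SSED}:

```shell
# l prints non-printable characters as C-style escapes and marks the end
# of the line with '$'; -n suppresses the normal copy of the line
printf 'a\tb\n' | sed -n l
# -> a\tb$
```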
    905727@item n
    906 @findex n (next-line) command
    907 @cindex Next input line, replace pattern space with
    908 @cindex Read next input line
    909 If auto-print is not disabled, print the pattern space,
     728(next) If auto-print is not disabled, print the pattern space,
    910729then, regardless, replace the pattern space with the next line of input.
    911730If there is no more input then @command{sed} exits without processing
    912731any more commands.
    913732
    914 @item @{ @var{commands} @}
    915 @findex @{@} command grouping
    916 @cindex Grouping commands
    917 @cindex Command groups
    918 A group of commands may be enclosed between
    919 @code{@{} and @code{@}} characters.
    920 This is particularly useful when you want a group of commands
    921 to be triggered by a single address (or address-range) match.
     733@item N
     734Add a newline to the pattern space,
     735then append the next line of input to the pattern space.
     736If there is no more input then @command{sed} exits without processing
     737any more commands.
     738
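For example, @code{N} followed by a substitution joins the input lines in pairs. This is a sketch that assumes an even number of input lines:

```shell
# N appends a newline plus the next line; s/\n/ / turns that newline into a space
seq 4 | sed 'N;s/\n/ /'
# -> 1 2
# -> 3 4
```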
     739@item p
     740Print the pattern space.
     741@c useful with @option{-n}
     742
     743@item P
     744Print the pattern space, up to the first <newline>.
     745
     746@item q@var{[exit-code]}
     747(quit) Exit @command{sed} without processing any more commands or input.
     748
     749@item Q@var{[exit-code]}
     750(quit) This command is the same as @code{q}, but will not print the
     751contents of pattern space.  Like @code{q}, it provides the
     752ability to return an exit code to the caller.
     753@c useful to quit on a conditional without printing
     754
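A sketch contrasting @code{q} and @code{Q} on the same input (@code{Q} is a @value{SSED} extension):

```shell
# q auto-prints the current line before quitting
seq 3 | sed 2q
# -> 1
# -> 2

# Q quits without printing the current line
seq 3 | sed 2Q
# -> 1
```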
     755@item r filename
     756Reads file @var{filename}.
     757
     758@item R filename
     759Queue a line of @var{filename} to be read and
     760inserted into the output stream at the end of the current cycle,
     761or when the next input line is read.
     762@c useful to interleave files
     763
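To illustrate the interleaving, this sketch writes two lines to a temporary file and reads one of them after each input line. The file name comes from @command{mktemp} and is only an illustration:

```shell
tmp=$(mktemp)
printf '%s\n' a b > "$tmp"
# R queues one line of the file per cycle; it is printed after the current line
seq 2 | sed "R $tmp"
# -> 1
# -> a
# -> 2
# -> b
rm -f "$tmp"
```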
     764@item s@var{/regexp/replacement/[flags]}
     765(substitute) Match the regular expression @var{regexp} against the
     766content of the pattern space.  If it matches, replace the matched string
     767with @var{replacement}.
     768
     769@item t @var{label}
     770(test) Branch to @var{label} only if there has been a successful
     771@code{s}ubstitution since the last input line was read or conditional
     772branch was taken.  The @var{label} may be omitted, in which case the
     773next cycle is started.
     774
     775@item T @var{label}
     776(test) Branch to @var{label} only if there have been no successful
     777@code{s}ubstitutions since the last input line was read or
     778conditional branch was taken. The @var{label} may be omitted,
     779in which case the next cycle is started.
     780
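A sketch of both test commands, assuming @value{SSED} (@code{T} is an extension). The first script loops with @code{t} until no double space is left. The second uses @code{T} to skip the rest of the script on lines where the substitution failed:

```shell
# :x defines a label; tx jumps back to it as long as s/// succeeded
echo 'a  b   c' | sed -e ':x' -e 's/  / /' -e 'tx'
# -> a b c

# T (label omitted) starts the next cycle when no substitution was made
printf '%s\n' apple banana | sed -e 's/an/AN/' -e 'T' -e 's/$/ (changed)/'
# -> apple
# -> bANana (changed)
```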
     781@item v @var{[version]}
     782(version) This command does nothing, but makes @command{sed} fail if
     783@value{SSED} extensions are not supported, or if the requested version
     784is not available.
     785
     786@item w filename
     787Write the pattern space to @var{filename}.
     788
     789@item W filename
     790Write to the given filename the portion of the pattern space up to
     791the first newline.
     792
     793@item x
     794Exchange the contents of the hold and pattern spaces.
     795
     796
     797@item y/@var{source-chars}/@var{dest-chars}/
     798Transliterate any characters in the pattern space which match
     799any of the @var{source-chars} with the corresponding character
     800in @var{dest-chars}.
     801
     802
     803@item z
     804(zap) This command empties the content of pattern space.
     805
     806@item #
     807A comment, until the next newline.
     808
     809
     810@item @{ @var{cmd ; cmd ...} @}
     811Group several commands together.
     812@c useful for multiple commands on same address
     813
     814@item =
     815Print the current input line number (with a trailing newline).
     816
     817@item : @var{label}
     818Specify the location of @var{label} for branch commands (@code{b},
     819@code{t}, @code{T}).
    922820
    923821@end table
     822
    924823
    925824@node The "s" Command
    926825@section The @code{s} Command
    927826
    928 The syntax of the @code{s} (as in substitute) command is
    929 @samp{s/@var{regexp}/@var{replacement}/@var{flags}}.  The @code{/}
    930 characters may be uniformly replaced by any other single
    931 character within any given @code{s} command.  The @code{/}
    932 character (or whatever other character is used in its stead)
    933 can appear in the @var{regexp} or @var{replacement}
    934 only if it is preceded by a @code{\} character.
    935 
    936 The @code{s} command is probably the most important in @command{sed}
    937 and has a lot of different options.  Its basic concept is simple:
    938 the @code{s} command attempts to match the pattern
    939 space against the supplied @var{regexp}; if the match is
    940 successful, then that portion of the pattern
    941 space which was matched is replaced with @var{replacement}.
     827The @code{s} command (as in substitute) is probably the most important
     828in @command{sed} and has a lot of different options.  The syntax of
     829the @code{s} command is
     830@samp{s/@var{regexp}/@var{replacement}/@var{flags}}.
     831
     832Its basic concept is simple: the @code{s} command attempts to match
     833the pattern space against the supplied regular expression @var{regexp};
     834if the match is successful, then that portion of the
     835pattern space which was matched is replaced with @var{replacement}.
     836
     837For details about @var{regexp} syntax, see @ref{Regexp Addresses,,Regular
     838Expression Addresses}.
    942839
    943840@cindex Backreferences, in regular expressions
     
    950847characters which reference the whole matched portion
    951848of the pattern space.
     849
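A sketch of the basic @code{s} command, of @code{&} and of numbered back-references. The sample text is arbitrary, and the @code{g} flag replaces every match:

```shell
echo 'hello world' | sed 's/o/0/'      # first match only: hell0 world
echo 'hello world' | sed 's/o/0/g'     # every match:      hell0 w0rld
echo abc | sed 's/b/[&]/'              # & is the match:   a[b]c
# \1 and \2 refer to the text matched by the parenthesized groups
echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/'   # world hello
```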
     850@c TODO: xref to backreference section mention @var{\'}.
     851
     852The @code{/}
     853characters may be uniformly replaced by any other single
     854character within any given @code{s} command.  The @code{/}
     855character (or whatever other character is used in its stead)
     856can appear in the @var{regexp} or @var{replacement}
     857only if it is preceded by a @code{\} character.
     858
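For example, when the @var{regexp} or @var{replacement} contains slashes, choosing another delimiter avoids escaping them. This is a sketch, and @samp{|} is an arbitrary choice:

```shell
# With / as the delimiter, every / inside the command must be escaped
echo /usr/local/bin | sed 's/\/usr\/local/\/opt/'
# -> /opt/bin

# With | as the delimiter, no escaping is needed
echo /usr/local/bin | sed 's|/usr/local|/opt|'
# -> /opt/bin
```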
     859
     860
    952861@cindex @value{SSEDEXT}, case modifiers in @code{s} commands
    953862Finally, as a @value{SSED} extension, you can include a
     
    976885Stop case conversion started by @code{\L} or @code{\U}.
    977886@end table
     887
     888When the @code{g} flag is being used, case conversion does not
     889propagate from one occurrence of the regular expression to
     890another.  For example, when the following command is executed
     891with @samp{a-b-} in pattern space:
     892@example
     893s/\(b\?\)-/x\u\1/g
     894@end example
     895
     896@noindent
     897the output is @samp{axxB}.  When replacing the first @samp{-},
     898the @samp{\u} sequence only affects the empty replacement of
     899@samp{\1}.  It does not affect the @code{x} character that is
     900added to pattern space when replacing @code{b-} with @code{xB}.
     901
     902On the other hand, @code{\l} and @code{\u} do affect the remainder
     903of the replacement text if they are followed by an empty substitution.
     904With @samp{a-b-} in pattern space, the following command:
     905@example
     906s/\(b\?\)-/\u\1x/g
     907@end example
     908
     909@noindent
     910will replace @samp{-} with @samp{X} (uppercase) and @samp{b-} with
     911@samp{Bx}.  If this behavior is undesirable, you can prevent it by
     912adding a @samp{\E} sequence---after @samp{\1} in this case.
    978913
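Both commands above can be tried directly from a shell. This sketch assumes @value{SSED}, since @samp{\u} and @samp{\w} are extensions. It ends with a more everyday use of @samp{\u} that capitalizes each word:

```shell
echo a-b- | sed 's/\(b\?\)-/x\u\1/g'      # -> axxB
echo a-b- | sed 's/\(b\?\)-/\u\1x/g'      # -> aXBx
# \u applied to & uppercases the first letter of every word
echo 'hello world' | sed 's/\w\+/\u&/g'   # -> Hello World
```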
    979914To include a literal @code{\}, @code{&}, or newline in the final
     
    997932Only replace the @var{number}th match of the @var{regexp}.
    998933
    999 @cindex @acronym{GNU} extensions, @code{g} and @var{number} modifier interaction in @code{s} command
     934@cindex GNU extensions, @code{g} and @var{number} modifier interaction in @code{s} command
    1000936@cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command
    1001937Note: the @sc{posix} standard does not specify what should happen
     
    1023959change in future versions.
    1024960
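For example, @value{SSED} replaces the @var{number}th match and every later one when the two modifiers are mixed. A sketch:

```shell
echo aaaa | sed 's/a/X/2'     # only the 2nd match:          aXaa
echo aaaa | sed 's/a/X/2g'    # 2nd and all later matches:   aXXX
```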
    1025 @item w @var{file-name}
     961@item w @var{filename}
    1026962@cindex Text, writing to a file after substitution
    1027963@cindex @value{SSEDEXT}, @file{/dev/stdout} file
    1028964@cindex @value{SSEDEXT}, @file{/dev/stderr} file
    1029965If the substitution was made, then write out the result to the named file.
    1030 As a @value{SSED} extension, two special values of @var{file-name} are
     966As a @value{SSED} extension, two special values of @var{filename} are
    1031967supported: @file{/dev/stderr}, which writes the result to the standard
    1032968error, and @file{/dev/stdout}, which writes to the standard
     
    1048984@item I
    1049985@itemx i
    1050 @cindex @acronym{GNU} extensions, @code{I} modifier
     986@cindex GNU extensions, @code{I} modifier
    1051987@cindex Case-insensitive matching
    1052 @ifset PERL
    1053 @cindex Perl-style regular expressions, case-insensitive
    1054 @end ifset
    1055 The @code{I} modifier to regular-expression matching is a @acronym{GNU}
     988The @code{I} modifier to regular-expression matching is a GNU
    1056989extension which makes @command{sed} match @var{regexp} in a
    1057990case-insensitive manner.
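
For example (a sketch, assuming @value{SSED}):

```shell
# I makes the match case-insensitive, so "Hello" matches /hello/
echo 'Hello World' | sed 's/hello/bye/I'
# -> bye World
```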
     
    1060993@itemx m
    1061994@cindex @value{SSEDEXT}, @code{M} modifier
    1062 @ifset PERL
    1063 @cindex Perl-style regular expressions, multiline
    1064 @end ifset
    1065995The @code{M} modifier to regular-expression matching is a @value{SSED}
    1066 extension which causes @code{^} and @code{$} to match respectively
    1067 (in addition to the normal behavior) the empty string after a newline,
    1068 and the empty string before a newline.  There are special character
    1069 sequences
    1070 @ifset PERL
    1071 (@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'}
    1072 in basic or extended regular expression modes)
    1073 @end ifset
     996extension which directs @value{SSED} to match the regular expression
     997in @cite{multi-line} mode.  The modifier causes @code{^} and @code{$} to
     998match respectively (in addition to the normal behavior) the empty string
     999after a newline, and the empty string before a newline.  There are
     1000special character sequences
    10741001@ifclear PERL
    10751002(@code{\`} and @code{\'})
    10761003@end ifclear
    10771004which always match the beginning or the end of the buffer.
    1078 @code{M} stands for @cite{multi-line}.
    1079 
    1080 @ifset PERL
    1081 @item S
    1082 @itemx s
    1083 @cindex @value{SSEDEXT}, @code{S} modifier
    1084 @cindex Perl-style regular expressions, single line
    1085 The @code{S} modifier to regular-expression matching is only valid
    1086 in Perl mode and specifies that the dot character (@code{.}) will
    1087 match the newline character too.  @code{S} stands for @cite{single-line}.
    1088 @end ifset
    1089 
    1090 @ifset PERL
    1091 @item X
    1092 @itemx x
    1093 @cindex @value{SSEDEXT}, @code{X} modifier
    1094 @cindex Perl-style regular expressions, extended
    1095 The @code{X} modifier to regular-expression matching is also
    1096 valid in Perl mode only.  If it is used, whitespace in the
    1097 pattern (other than in a character class) and
    1098 characters between a @kbd{#} outside a character class and the
    1099 next newline character are ignored. An escaping backslash
    1100 can be used to include a whitespace or @kbd{#} character as part
    1101 of the pattern.
    1102 @end ifset
     1005In addition,
     1006the period character does not match a newline character in
     1007multi-line mode.
     1008
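A sketch of multi-line mode, assuming @value{SSED}. In each command, @code{N} first puts two lines into the pattern space, separated by a newline:

```shell
# With M, ^ also matches right after the embedded newline
printf '%s\n' a b | sed 'N;s/^/> /Mg'
# -> > a
# -> > b

# Without M, . matches the newline; with M it does not
printf '%s\n' a b | sed 'N;s/a.b/X/'
# -> X
printf '%s\n' a b | sed 'N;s/a.b/X/M'
# -> a
# -> b
```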
     1009
     1010@end table
     1011
     1012@node Common Commands
     1013@section Often-Used Commands
     1014
     1015If you use @command{sed} at all, you will quite likely want to know
     1016these commands.
     1017
     1018@table @code
     1019@item #
     1020[No addresses allowed.]
     1021
     1022@findex # (comments)
     1023@cindex Comments, in scripts
     1024The @code{#} character begins a comment;
     1025the comment continues until the next newline.
     1026
     1027@cindex Portability, comments
     1028If you are concerned about portability, be aware that
     1029some implementations of @command{sed} (which are not @sc{posix}
     1030conforming) may only support a single one-line comment,
     1031and then only when the very first character of the script is a @code{#}.
     1032
     1033@findex -n, forcing from within a script
     1034@cindex Caveat --- #n on first line
     1035Warning: if the first two characters of the @command{sed} script
     1036are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
     1037If you want to put a comment in the first line of your script
     1038and that comment begins with the letter @samp{n}
     1039and you do not want this behavior,
     1040then be sure to either use a capital @samp{N},
     1041or place at least one space before the @samp{n}.
     1042
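A sketch of this pitfall. The first script starts with @code{#n} and behaves as if @option{-n} were given. The second starts with @code{# n} and is only a comment:

```shell
# "#n" as the first line: auto-print is disabled, so only 2p prints
seq 3 | sed '#n
2p'
# -> 2

# "# n" (with a space) is an ordinary comment: line 2 appears twice
seq 3 | sed '# n
2p'
# -> 1
# -> 2
# -> 2
# -> 3
```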
     1043@item q [@var{exit-code}]
     1044@findex q (quit) command
     1045@cindex @value{SSEDEXT}, returning an exit code
     1046@cindex Quitting
     1047Exit @command{sed} without processing any more commands or input.
     1048
     1049Example: stop after printing the second line:
     1050@example
     1051$ seq 3 | sed 2q
     10521
     10532
     1054@end example
     1055
     1056This command accepts only one address.
     1057Note that the current pattern space is printed if auto-print is
     1058not disabled with the @option{-n} option.  The ability to return
     1059an exit code from the @command{sed} script is a @value{SSED} extension.
     1060
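For example, this sketch quits with an exit code that the calling shell can inspect (the value 5 is arbitrary):

```shell
# Quit at line 2 with exit status 5; line 2 is still auto-printed
seq 3 | sed '2q5'
echo "exit status: $?"
# -> 1
# -> 2
# -> exit status: 5
```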
     1061See also the @value{SSED} extension @code{Q} command which quits silently
     1062without printing the current pattern space.
     1063
     1064@item d
     1065@findex d (delete) command
     1066@cindex Text, deleting
     1067Delete the pattern space;
     1068immediately start next cycle.
     1069
     1070Example: delete the second input line:
     1071@example
     1072$ seq 3 | sed 2d
     10731
     10743
     1075@end example
     1076
     1077@item p
     1078@findex p (print) command
     1079@cindex Text, printing
     1080Print out the pattern space (to the standard output).
     1081This command is usually only used in conjunction with the @option{-n}
     1082command-line option.
     1083
     1084Example: print only the second input line:
     1085@example
     1086$ seq 3 | sed -n 2p
     10872
     1088@end example
     1089
     1090@item n
     1091@findex n (next-line) command
     1092@cindex Next input line, replace pattern space with
     1093@cindex Read next input line
     1094If auto-print is not disabled, print the pattern space,
     1095then, regardless, replace the pattern space with the next line of input.
     1096If there is no more input then @command{sed} exits without processing
     1097any more commands.
     1098
     1099This command is useful to skip lines (e.g. process every Nth line).
     1100
     1101Example: perform substitution on every 3rd line (i.e. two @code{n} commands
     1102skip two lines):
     1103@codequoteundirected on
     1104@codequotebacktick on
     1105@example
     1106$ seq 6 | sed 'n;n;s/./x/'
     11071
     11082
     1109x
     11104
     11115
     1112x
     1113@end example
     1114
     1115@value{SSED} provides an address syntax extension, @var{first}~@var{step},
     1116to achieve the same result:
     1117
     1118@example
     1119$ seq 6 | sed '0~3s/./x/'
     11201
     11212
     1122x
     11234
     11245
     1125x
     1126@end example
     1127
     1128@codequotebacktick off
     1129@codequoteundirected off
     1130
     1131
     1132@item @{ @var{commands} @}
     1133@findex @{@} command grouping
     1134@cindex Grouping commands
     1135@cindex Command groups
     1136A group of commands may be enclosed between
     1137@code{@{} and @code{@}} characters.
     1138This is particularly useful when you want a group of commands
     1139to be triggered by a single address (or address-range) match.
     1140
     1141Example: perform substitution then print the second input line:
     1142@codequoteundirected on
     1143@codequotebacktick on
     1144@example
     1145$ seq 3 | sed -n '2@{s/2/X/ ; p@}'
     1146X
     1147@end example
     1148@codequoteundirected off
     1149@codequotebacktick off
     1150
    11031151@end table
    11041152
     
    11131161@table @code
    11141162@item y/@var{source-chars}/@var{dest-chars}/
    1115 (The @code{/} characters may be uniformly replaced by
    1116 any other single character within any given @code{y} command.)
    1117 
    11181163@findex y (transliterate) command
    11191164@cindex Transliteration
     
    11221167in @var{dest-chars}.
    11231168
     1169Example: transliterate @samp{a-j} into @samp{0-9}:
     1170@codequoteundirected on
     1171@codequotebacktick on
     1172@example
     1173$ echo hello world | sed 'y/abcdefghij/0123456789/'
     117474llo worl3
     1175@end example
     1176@codequoteundirected off
     1177@codequotebacktick off
     1178
     1179(The @code{/} characters may be uniformly replaced by
     1180any other single character within any given @code{y} command.)
     1181
    11241182Instances of the @code{/} (or whatever other character is used in its stead),
    11251183@code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars}
     
    11281186contain the same number of characters (after de-escaping).
    11291187
     1188See the @command{tr} command from GNU coreutils for similar functionality.
     1189
     1190@item a @var{text}
     1191Append @var{text} after a line. This is a GNU extension
     1192to the standard @code{a} command; see below for details.
     1193
     1194Example: Add @samp{hello} after the second line:
     1195@codequoteundirected on
     1196@codequotebacktick on
     1197@example
     1198$ seq 3 | sed '2a hello'
     11991
     12002
     1201hello
     12023
     1203@end example
     1204@codequoteundirected off
     1205@codequotebacktick off
     1206
     1207Leading whitespace after the @code{a} command is ignored.
     1208The text to add is read until the end of the line.
     1209
     1210
    11301211@item a\
    11311212@itemx @var{text}
    1132 @cindex @value{SSEDEXT}, two addresses supported by most commands
    1133 As a @acronym{GNU} extension, this command accepts two addresses.
    1134 
    11351213@findex a (append text lines) command
    11361214@cindex Appending text after a line
    11371215@cindex Text, appending
    1138 Queue the lines of text which follow this command
     1216Append @var{text} after a line.
     1217
     1218Example: Add @samp{hello} after the second line
     1219(@print{} indicates printed output lines):
     1220@codequoteundirected on
     1221@codequotebacktick on
     1222@example
     1223$ seq 3 | sed '2a\
     1224hello'
     1225@print{}1
     1226@print{}2
     1227@print{}hello
     1228@print{}3
     1229@end example
     1230@codequoteundirected off
     1231@codequotebacktick off
     1232
     1233The @code{a} command queues the lines of text which follow this command
    11391234(each but the last ending with a @code{\},
    11401235which are removed from the output)
     
    11421237or when the next input line is read.
    11431238
     1239@cindex @value{SSEDEXT}, two addresses supported by most commands
     1240As a GNU extension, this command accepts two addresses.
     1241
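For example, with a range address the text is appended after every line in the range. This sketch uses the one-line @code{a @var{text}} form:

```shell
seq 3 | sed '1,2a X'
# -> 1
# -> X
# -> 2
# -> X
# -> 3
```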
    11441242Escape sequences in @var{text} are processed, so you should
    11451243use @code{\\} in @var{text} to print a single backslash.
    11461244
    1147 As a @acronym{GNU} extension, if between the @code{a} and the newline there is
    1148 other than a whitespace-@code{\} sequence, then the text of this line,
    1149 starting at the first non-whitespace character after the @code{a},
    1150 is taken as the first line of the @var{text} block.
    1151 (This enables a simplification in scripting a one-line add.)
    1152 This extension also works with the @code{i} and @code{c} commands.
    1153 
     1245Script commands resume after the last text line that does not end
     1246with a backslash (@code{\}); in the following example, that line is @samp{world}:
     1247@codequoteundirected on
     1248@codequotebacktick on
     1249@example
     1250$ seq 3 | sed '2a\
     1251hello\
     1252world
     12533s/./X/'
     1254@print{}1
     1255@print{}2
     1256@print{}hello
     1257@print{}world
     1258@print{}X
     1259@end example
     1260@codequoteundirected off
     1261@codequotebacktick off
     1262
     1263As a GNU extension, the @code{a} command and @var{text} can be
     1264separated into two @code{-e} parameters, enabling easier scripting:
     1265@codequoteundirected on
     1266@codequotebacktick on
     1267@example
     1268$ seq 3 | sed -e '2a\' -e hello
     12691
     12702
     1271hello
     12723
     1273
     1274$ sed -e '2a\' -e "$VAR"
     1275@end example
     1276@codequoteundirected off
     1277@codequotebacktick off
     1278
     1279@item i @var{text}
     1280Insert @var{text} before a line. This is a GNU extension
     1281to the standard @code{i} command; see below for details.
     1282
     1283Example: Insert @samp{hello} before the second line:
     1284@codequoteundirected on
     1285@codequotebacktick on
     1286@example
     1287$ seq 3 | sed '2i hello'
     12881
     1289hello
     12902
     12913
     1292@end example
     1293@codequoteundirected off
     1294@codequotebacktick off
     1295
     1296Leading whitespace after the @code{i} command is ignored.
     1297The text to add is read until the end of the line.
     1298
     1299@anchor{insert command}
    11541300@item i\
    11551301@itemx @var{text}
    1156 @cindex @value{SSEDEXT}, two addresses supported by most commands
    1157 As a @acronym{GNU} extension, this command accepts two addresses.
    1158 
    11591302@findex i (insert text lines) command
    11601303@cindex Inserting text before a line
    11611304@cindex Text, insertion
    1162 Immediately output the lines of text which follow this command
    1163 (each but the last ending with a @code{\},
    1164 which are removed from the output).
     1305Immediately output the lines of text which follow this command.
     1306
     1307Example: Insert @samp{hello} before the second line
     1308(@print{} indicates printed output lines):
     1309@codequoteundirected on
     1310@codequotebacktick on
     1311@example
     1312$ seq 3 | sed '2i\
     1313hello'
     1314@print{}1
     1315@print{}hello
     1316@print{}2
     1317@print{}3
     1318@end example
     1319@codequoteundirected off
     1320@codequotebacktick off
     1321
     1322@cindex @value{SSEDEXT}, two addresses supported by most commands
     1323As a GNU extension, this command accepts two addresses.
     1324
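For example, with a range address the text is inserted before every line in the range. This sketch uses the one-line @code{i @var{text}} form:

```shell
seq 3 | sed '2,3i X'
# -> 1
# -> X
# -> 2
# -> X
# -> 3
```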
     1325Escape sequences in @var{text} are processed, so you should
     1326use @code{\\} in @var{text} to print a single backslash.
     1327
     1328Script commands resume after the last text line that does not end
     1329with a backslash (@code{\}); in the following example, that line is @samp{world}:
     1330@codequoteundirected on
     1331@codequotebacktick on
     1332@example
     1333$ seq 3 | sed '2i\
     1334hello\
     1335world
     1336s/./X/'
     1337@print{}X
     1338@print{}hello
     1339@print{}world
     1340@print{}X
     1341@print{}X
     1342@end example
     1343@codequoteundirected off
     1344@codequotebacktick off
     1345
     1346As a GNU extension, the @code{i} command and @var{text} can be
     1347separated into two @code{-e} parameters, enabling easier scripting:
     1348@codequoteundirected on
     1349@codequotebacktick on
     1350@example
     1351$ seq 3 | sed -e '2i\' -e hello
     13521
     1353hello
     13542
     13553
     1356
     1357$ sed -e '2i\' -e "$VAR"
     1358@end example
     1359@codequoteundirected off
     1360@codequotebacktick off
     1361
     1362@item c @var{text}
     1363Replace the line(s) with @var{text}. This is a GNU extension
     1364to the standard @code{c} command; see below for details.
     1365
     1366Example: Replace the 2nd to 9th lines with the word @samp{hello}:
     1367@codequoteundirected on
     1368@codequotebacktick on
     1369@example
     1370$ seq 10 | sed '2,9c hello'
     13711
     1372hello
     137310
     1374@end example
     1375@codequoteundirected off
     1376@codequotebacktick off
     1377
     1378Leading whitespace after the @code{c} command is ignored.
     1379The text to add is read until the end of the line.
    11651380
    11661381@item c\
     
    11691384@cindex Replacing selected lines with other text
    11701385Delete the lines matching the address or address-range,
    1171 and output the lines of text which follow this command
    1172 (each but the last ending with a @code{\},
    1173 which are removed from the output)
    1174 in place of the last line
    1175 (or in place of each line, if no addresses were specified).
     1386and output the lines of text which follow this command.
     1387
     1388Example: Replace 2nd to 4th lines with the words @samp{hello} and
     1389@samp{world} (@print{} indicates printed output lines):
     1390@codequoteundirected on
     1391@codequotebacktick on
     1392@example
     1393$ seq 5 | sed '2,4c\
     1394hello\
     1395world'
     1396@print{}1
     1397@print{}hello
     1398@print{}world
     1399@print{}5
     1400@end example
     1401@codequoteundirected off
     1402@codequotebacktick off
     1403
     1404If no addresses are given, each line is replaced.
     1405
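For example, without an address every input line is replaced. A sketch using the one-line form:

```shell
seq 2 | sed 'c X'
# -> X
# -> X
```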
    11761406A new cycle is started after this command is done,
    11771407since the pattern space will have been deleted.
     1408In the following example, the @code{c} starts a
     1409new cycle and the substitution command is not performed
     1410on the replaced text:
     1411
     1412@codequoteundirected on
     1413@codequotebacktick on
     1414@example
     1415$ seq 3 | sed '2c\
     1416hello
     1417s/./X/'
     1418@print{}X
     1419@print{}hello
     1420@print{}X
     1421@end example
     1422@codequoteundirected off
     1423@codequotebacktick off
     1424
     1425As a GNU extension, the @code{c} command and @var{text} can be
     1426separated into two @code{-e} parameters, enabling easier scripting:
     1427@codequoteundirected on
     1428@codequotebacktick on
     1429@example
     1430$ seq 3 | sed -e '2c\' -e hello
     14311
     1432hello
     14333
     1434
     1435$ sed -e '2c\' -e "$VAR"
     1436@end example
     1437@codequoteundirected off
     1438@codequotebacktick off
     1439
    11781440
    11791441@item =
    1180 @cindex @value{SSEDEXT}, two addresses supported by most commands
    1181 As a @acronym{GNU} extension, this command accepts two addresses.
    1182 
    11831442@findex = (print line number) command
    11841443@cindex Printing line number
    11851444@cindex Line number, printing
    11861445Print out the current input line number (with a trailing newline).
     1446
     1447@codequoteundirected on
     1448@codequotebacktick on
     1449@example
     1450$ printf '%s\n' aaa bbb ccc | sed =
     14511
     1452aaa
     14532
     1454bbb
     14553
     1456ccc
     1457@end example
     1458@codequoteundirected off
     1459@codequotebacktick off
     1460
     1461@cindex @value{SSEDEXT}, two addresses supported by most commands
     1462As a GNU extension, this command accepts two addresses.
     1463
     1464
     1465
    11871466
    11881467@item l @var{n}
     
    12041483
    12051484@item r @var{filename}
    1206 @cindex @value{SSEDEXT}, two addresses supported by most commands
    1207 As a @acronym{GNU} extension, this command accepts two addresses.
    12081485
    12091486@findex r (read file) command
    12101487@cindex Read text from a file
     1488Reads file @var{filename}. Example:
     1489
     1490@codequoteundirected on
     1491@codequotebacktick on
     1492@example
     1493$ seq 3 | sed '2r/etc/hostname'
     14941
     14952
     1496fencepost.gnu.org
     14973
     1498@end example
     1499@codequoteundirected off
     1500@codequotebacktick off
     1501
    12111502@cindex @value{SSEDEXT}, @file{/dev/stdin} file
    12121503Queue the contents of @var{filename} to be read and
     
    12201511standard input.
    12211512
     1513@cindex @value{SSEDEXT}, two addresses supported by most commands
     1514As a GNU extension, this command accepts two addresses. The
     1515file will then be reread and inserted on each of the addressed lines.
     1516
     1517As a @value{SSED} extension, the @code{r} command accepts a zero address,
     1518inserting a file @emph{before} the first line of the input
     1519(@pxref{Adding a header to multiple files}).
     1520
    12221521@item w @var{filename}
    12231522@findex w (write file) command
     
    12261525@cindex @value{SSEDEXT}, @file{/dev/stderr} file
    12271526Write the pattern space to @var{filename}.
    1228 As a @value{SSED} extension, two special values of @var{file-name} are
     1527As a @value{SSED} extension, two special values of @var{filename} are
    12291528supported: @file{/dev/stderr}, which writes the result to the standard
    12301529error, and @file{/dev/stdout}, which writes to the standard
     
    12321531option is being used.}
    12331532
    1234 The file will be created (or truncated) before the
    1235 first input line is read; all @code{w} commands
    1236 (including instances of @code{w} flag on successful @code{s} commands)
    1237 which refer to the same @var{filename} are output without
    1238 closing and reopening the file.
     1533The file will be created (or truncated) before the first input line is
     1534read; all @code{w} commands (including instances of the @code{w} flag
     1535on successful @code{s} commands) which refer to the same @var{filename}
     1536are output without closing and reopening the file.
    12391537
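As an illustration, the following sketch (assuming @value{SSED}; the file name @file{out.txt} is only an example) writes just the second input line to a file, while @option{-n} suppresses the normal output:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 3 | sed -n '2w out.txt'
$ cat out.txt
2
@end example
@codequoteundirected off
@codequotebacktick off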
    12401538@item D
    12411539@findex D (delete first line) command
    12421540@cindex Delete first line from pattern space
    1243 Delete text in the pattern space up to the first newline.
    1244 If any text is left, restart cycle with the resultant
    1245 pattern space (without reading a new line of input),
    1246 otherwise start a normal new cycle.
     1541If the pattern space contains no newline, start a normal new cycle as if
     1542the @code{d} command was issued.  Otherwise, delete text in the pattern
     1543space up to and including the first newline, and restart the cycle with the
     1544resultant pattern space, without reading a new line of input.
    12471545
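The @code{D} command is typically combined with @code{N} and @code{P} to keep a sliding two-line window in the pattern space.  As an illustration, the following well-known one-liner (shown as a sketch, not as the only way to do this) removes consecutive duplicate lines, much like @command{uniq}:

@codequoteundirected on
@codequotebacktick on
@example
$ printf '%s\n' a a b c c | sed '$!N; /^\(.*\)\n\1$/!P; D'
a
b
c
@end example
@codequoteundirected off
@codequotebacktick off

Because @code{D} restarts the cycle without reading new input, the text left after the first newline is compared with the next line appended by @code{N}.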
    12481546@item N
     
    12541552If there is no more input then @command{sed} exits without processing
    12551553any more commands.
     1554
     1555When @option{-z} is used, a zero byte (the ASCII @samp{NUL} character) is
     1556added between the lines (instead of a newline).
     1557
     1558If there is no next input line, @value{SSED} by default prints the pattern space
     1559before exiting (unless @option{-n} is used).  This is a GNU extension which can
     1560be disabled with @option{--posix}.  @xref{N_command_last_line,,N command on the last line}.
     1561
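For example, the following command joins each pair of input lines into one; the input is chosen to have an even number of lines so that @code{N} always finds a next line:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 6 | sed 'N;s/\n/ /'
1 2
3 4
5 6
@end example
@codequoteundirected off
@codequotebacktick off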
    12561562
    12571563@item P
     
    13561662
    13571663If a parameter is specified instead, the @code{e} command
    1358 interprets it as a command and sends its output to the output stream
    1359 (like @code{r} does).  The command can run across multiple
    1360 lines, all but the last ending with a back-slash.
     1664interprets it as a command and sends its output to the output stream.
     1665The command can run across multiple lines, all but the last ending with
     1666a backslash.
    13611667
    13621668In both cases, the results are undefined if the command to be
    13631669executed contains a @sc{nul} character.
    13641670
    1365 @item L @var{n}
    1366 @findex L (fLow paragraphs) command
    1367 @cindex Reformat pattern space
    1368 @cindex Reformatting paragraphs
    1369 @cindex @value{SSEDEXT}, reformatting paragraphs
    1370 @cindex @value{SSEDEXT}, @code{L} command
    1371 This @value{SSED} extension fills and joins lines in pattern space
    1372 to produce output lines of (at most) @var{n} characters, like
    1373 @code{fmt} does; if @var{n} is omitted, the default as specified
    1374 on the command line is used.  This command is considered a failed
    1375 experiment and unless there is enough request (which seems unlikely)
    1376 will be removed in future versions.
    1377 
    1378 @ignore
    1379 Blank lines, spaces between words, and indentation are
    1380 preserved in the output; successive input lines with different
    1381 indentation are not joined; tabs are expanded to 8 columns.
    1382 
    1383 If the pattern space contains multiple lines, they are joined, but
    1384 since the pattern space usually contains a single line, the behavior
    1385 of a simple @code{L;d} script is the same as @samp{fmt -s} (i.e.,
    1386 it does not join short lines to form longer ones).
    1387 
    1388 @var{n} specifies the desired line-wrap length; if omitted,
    1389 the default as specified on the command line is used.
    1390 @end ignore
     1671Note that, unlike the @code{r} command, the output of the command will
     1672be printed immediately; the @code{r} command instead delays the output
     1673to the end of the current cycle.
     1674
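The following sketch contrasts the two (it assumes a POSIX shell is available to @code{e}; the file @file{x.txt} is only an example): the output of @code{e} is printed before the current line, while the file queued by @code{r} appears after it.

@codequoteundirected on
@codequotebacktick on
@example
$ seq 2 | sed '1e echo X'
X
1
2
$ echo X > x.txt
$ seq 2 | sed '1r x.txt'
1
X
2
@end example
@codequoteundirected off
@codequotebacktick off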
     1675@item F
     1676@findex F (File name) command
     1677@cindex Printing file name
     1678@cindex File name, printing
     1679Print out the file name of the current input file (with a trailing
     1680newline).
    13911681
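For example (the file name @file{in.txt} is only an example), the file name is printed as soon as the command runs, before the current line is printed at the end of the cycle:

@codequoteundirected on
@codequotebacktick on
@example
$ printf '%s\n' a b > in.txt
$ sed 1F in.txt
in.txt
a
b
@end example
@codequoteundirected off
@codequotebacktick off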
    13921682@item Q [@var{exit-code}]
    1393 This command only accepts a single address.
     1683This command accepts only one address.
    13941684
    13951685@findex Q (silent Quit) command
     
    14091699@example
    14101700:eat
    1411 $d       @i{Quit silently on the last line}
    1412 N        @i{Read another line, silently}
    1413 g        @i{Overwrite pattern space each time to save memory}
     1701$d       @i{@r{Quit silently on the last line}}
     1702N        @i{@r{Read another line, silently}}
     1703g        @i{@r{Overwrite pattern space each time to save memory}}
    14141704b eat
    14151705@end example
     
    14621752the first newline.  Everything said under the @code{w} command about
    14631753file handling holds here too.
     1754
     1755@item z
     1756@findex z (Zap) command
     1757@cindex @value{SSEDEXT}, emptying pattern space
     1758@cindex Emptying pattern space
     1759This command empties the contents of the pattern space.  It is
     1760usually the same as @samp{s/.*//}, but is more efficient
     1761and works in the presence of invalid multibyte sequences
     1762in the input stream.  @sc{posix} mandates that such sequences
     1763are @emph{not} matched by @samp{.}, so that there is no portable
     1764way to clear @command{sed}'s buffers in the middle of the
     1765script in most multibyte locales (including UTF-8 locales).
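
In the simplest case the result looks the same as with @samp{s/.*//}: the addressed line is printed as an empty line.  A minimal illustration:

@codequoteundirected on
@codequotebacktick on
@example
$ seq 3 | sed 2z
1

3
@end example
@codequoteundirected off
@codequotebacktick off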
    14641766@end table
    14651767
     1768
     1769@node Multiple commands syntax
     1770@section Multiple commands syntax
     1771
     1772@c POSIX says:
     1773@c   Editing commands other than {...}, a, b, c, i, r, t, w, :, and #
     1774@c   can be followed by a <semicolon>, optional <blank> characters, and
     1775@c   another editing command. However, when an s editing command is used
     1776@c   with the w flag, following it with another command in this manner
     1777@c   produces undefined results.
     1778
     1779There are several methods to specify multiple commands in a @command{sed}
     1780program.
     1781
     1782Using newlines is most natural when running a sed script from a file
     1783(using the @option{-f} option).
     1784
     1785On the command line, all @command{sed} commands may be separated by newlines.
     1786Alternatively, you may specify each command as an argument to an @option{-e}
     1787option:
     1788
     1789@codequoteundirected on
     1790@codequotebacktick on
     1791@example
     1792@group
     1793$ seq 6 | sed '1d
     17943d
     17955d'
     17962
     17974
     17986
     1799
     1800$ seq 6 | sed -e 1d -e 3d -e 5d
     18012
     18024
     18036
     1804@end group
     1805@end example
     1806@codequoteundirected off
     1807@codequotebacktick off
     1808
     1809A semicolon (@samp{;}) may be used to separate most simple commands:
     1810
     1811@codequoteundirected on
     1812@codequotebacktick on
     1813@example
     1814@group
     1815$ seq 6 | sed '1d;3d;5d'
     18162
     18174
     18186
     1819@end group
     1820@end example
     1821@codequoteundirected off
     1822@codequotebacktick off
     1823
     1824The @code{@{}, @code{@}}, @code{b}, @code{t}, @code{T}, and @code{:} commands can
     1825be separated with a semicolon (this is a non-portable @value{SSED} extension).
     1826
     1827@codequoteundirected on
     1828@codequotebacktick on
     1829@example
     1830@group
     1831$ seq 4 | sed '@{1d;3d@}'
     18322
     18334
     1834
     1835$ seq 6 | sed '@{1d;3d@};5d'
     18362
     18374
     18386
     1839@end group
     1840@end example
     1841@codequoteundirected off
     1842@codequotebacktick off
     1843
     1844Labels used in @code{b}, @code{t}, @code{T}, and @code{:} commands are read
     1845until a semicolon.  Leading and trailing whitespace is ignored.  In
     1846the examples below the label is @samp{x}.  The first example works
     1847with @value{SSED}.  The second is a portable equivalent.  For more
     1848information about branching and labels, see @ref{Branching and flow
     1849control}.
     1853@example
     1854@group
     1855$ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
     18561
     1857=2
     1858
     1859$ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
     18601
     1861=2
     1862@end group
     1863@end example
     1864@codequoteundirected off
     1865@codequotebacktick off
     1866
     1867
     1868
     1869@subsection Commands Requiring a newline
     1870
     1871The following commands cannot be separated by a semicolon and
     1872require a newline:
     1873
     1874@table @asis
     1875
     1876@item @code{a}, @code{c}, @code{i} (append/change/insert)
     1877
     1878All characters following the @code{a}, @code{c}, and @code{i} commands are taken
     1879as the text to append/change/insert.  Using a semicolon leads to
     1880undesirable results:
     1881
     1882@codequoteundirected on
     1883@codequotebacktick on
     1884@example
     1885@group
     1886$ seq 2 | sed '1aHello ; 2d'
     18871
     1888Hello ; 2d
     18892
     1890@end group
     1891@end example
     1892@codequoteundirected off
     1893@codequotebacktick off
     1894
     1895Separate the commands using @option{-e} or a newline:
     1896
     1897@codequoteundirected on
     1898@codequotebacktick on
     1899@example
     1900@group
     1901$ seq 2 | sed -e 1aHello -e 2d
     19021
     1903Hello
     1904
     1905$ seq 2 | sed '1aHello
     19062d'
     19071
     1908Hello
     1909@end group
     1910@end example
     1911@codequoteundirected off
     1912@codequotebacktick off
     1913
     1914Note that specifying the text to add (@samp{Hello}) immediately
     1915after @code{a}, @code{c}, or @code{i} is itself a @value{SSED} extension.
     1916A portable, POSIX-compliant alternative is:
     1917
     1918@codequoteundirected on
     1919@codequotebacktick on
     1920@example
     1921@group
     1922$ seq 2 | sed '1a\
     1923Hello
     19242d'
     19251
     1926Hello
     1927@end group
     1928@end example
     1929@codequoteundirected off
     1930@codequotebacktick off
     1931
     1932@item @code{#} (comment)
     1933
     1934All characters following @samp{#} until the next newline are ignored.
     1935
     1936@codequoteundirected on
     1937@codequotebacktick on
     1938@example
     1939@group
     1940$ seq 3 | sed '# this is a comment ; 2d'
     19411
     19422
     19433
     1944
     1945
     1946$ seq 3 | sed '# this is a comment
     19472d'
     19481
     19493
     1950@end group
     1951@end example
     1952@codequoteundirected off
     1953@codequotebacktick off
     1954
     1955@item @code{r}, @code{R}, @code{w}, @code{W} (reading and writing files)
     1956
     1957The @code{r}, @code{R}, @code{w}, and @code{W} commands parse the filename
     1958until the end of the line.  If whitespace, comments or semicolons are found,
     1959they will be included in the filename, leading to unexpected results:
     1960
     1961@codequoteundirected on
     1962@codequotebacktick on
     1963@example
     1964@group
     1965$ seq 2 | sed '1w hello.txt ; 2d'
     19661
     19672
     1968
     1969$ ls -log
     1970total 4
     1971-rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d
     1972
     1973$ cat 'hello.txt ; 2d'
     19741
     1975@end group
     1976@end example
     1977@codequoteundirected off
     1978@codequotebacktick off
     1979
     1980Note that @command{sed} silently ignores read/write errors in
     1981@code{r}, @code{R}, @code{w}, and @code{W} commands (such as missing files).
     1982In the following example, @command{sed} tries to read a file named
     1983@samp{@file{hello.txt ; N}}. The file is missing, and the error is silently
     1984ignored:
     1985
     1986@codequoteundirected on
     1987@codequotebacktick on
     1988@example
     1989@group
     1990$ echo x | sed '1rhello.txt ; N'
     1991x
     1992@end group
     1993@end example
     1994@codequoteundirected off
     1995@codequotebacktick off
     1996
     1997@item @code{e} (command execution)
     1998
     1999Any characters following the @code{e} command until the end of the line
     2000will be sent to the shell.  If whitespace, comments or semicolons are found,
     2001they will be included in the shell command, leading to unexpected results:
     2002
     2003@codequoteundirected on
     2004@codequotebacktick on
     2005@example
     2006@group
     2007$ echo a | sed '1e touch foo#bar'
     2008a
     2009
     2010$ ls -1
     2011foo#bar
     2012
     2013$ echo a | sed '1e touch foo ; s/a/b/'
     2014sh: 1: s/a/b/: not found
     2015a
     2016@end group
     2017@end example
     2018@codequoteundirected off
     2019@codequotebacktick off
     2020
     2021
     2022@item @code{s///[we]} (substitute with @code{e} or @code{w} flags)
     2023
     2024In a substitution command, the @code{w} flag writes the substitution
     2025result to a file, and the @code{e} flag executes the substitution result
     2026as a shell command.  As with the @code{r/R/w/W/e} commands, these
     2027must be terminated with a newline.  If whitespace, comments or semicolons
     2028are found, they will be included in the shell command or filename, leading to
     2029unexpected results:
     2030
     2031@codequoteundirected on
     2032@codequotebacktick on
     2033@example
     2034@group
     2035$ echo a | sed 's/a/b/w1.txt#foo'
     2036b
     2037
     2038$ ls -1
     20391.txt#foo
     2040@end group
     2041@end example
     2042@codequoteundirected off
     2043@codequotebacktick off
     2044
     2045@end table
     2046
     2047
     2048@node sed addresses
     2049@chapter Addresses: selecting lines
     2050
     2051@menu
     2052* Addresses overview::                Addresses overview
     2053* Numeric Addresses::                 selecting lines by numbers
     2054* Regexp Addresses::                  selecting lines by text matching
     2055* Range Addresses::                   selecting a range of lines
     2056* Zero Address::                      Using address @code{0}
     2057@end menu
     2058
     2059@node Addresses overview
     2060@section Addresses overview
     2061
     2062@cindex addresses, numeric
     2063@cindex numeric addresses
     2064Addresses determine on which line(s) the @command{sed} command will be
     2065executed. The following command replaces the first occurrence of @samp{hello}
     2066with @samp{world} only on line 144:
     2067
     2068@codequoteundirected on
     2069@codequotebacktick on
     2070@example
     2071sed '144s/hello/world/' input.txt > output.txt
     2072@end example
     2073@codequoteundirected off
     2074@codequotebacktick off
     2075
     2076
     2077
     2078If no address is specified, the command is performed on all lines.
     2079The following command replaces @samp{hello} with @samp{world},
     2080targeting every line of the input file.
     2081However, note that it modifies only the first instance of @samp{hello}
     2082on each line.
     2083Use the @samp{g} modifier to affect every instance on each affected line.
     2084
     2085@codequoteundirected on
     2086@codequotebacktick on
     2087@example
     2088sed 's/hello/world/' input.txt > output.txt
     2089@end example
     2090@codequoteundirected off
     2091@codequotebacktick off
     2092
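For instance, on a line containing two occurrences of @samp{hello}, only the first is replaced unless the @samp{g} modifier is given:

@codequoteundirected on
@codequotebacktick on
@example
$ echo 'hello hello' | sed 's/hello/world/'
world hello
$ echo 'hello hello' | sed 's/hello/world/g'
world world
@end example
@codequoteundirected off
@codequotebacktick off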
     2093
     2094
     2095@cindex addresses, regular expression
     2096@cindex regular expression addresses
     2097Addresses can contain regular expressions to match lines based
     2098on content instead of line numbers. The following command replaces
     2099@samp{hello} with @samp{world} only on lines
     2100containing the string @samp{apple}:
     2101
     2102@codequoteundirected on
     2103@codequotebacktick on
     2104@example
     2105sed '/apple/s/hello/world/' input.txt > output.txt
     2106@end example
     2107@codequoteundirected off
     2108@codequotebacktick off
     2109
     2110
     2111
     2112@cindex addresses, range
     2113@cindex range addresses
     2114An address range is specified with two addresses separated by a comma
     2115(@code{,}). Addresses can be numeric, regular expressions, or a mix of
     2116both.
     2117The following command replaces @samp{hello} with @samp{world}
     2118only on lines 4 to 17 (inclusive):
     2119
     2120@codequoteundirected on
     2121@codequotebacktick on
     2122@example
     2123sed '4,17s/hello/world/' input.txt > output.txt
     2124@end example
     2125@codequoteundirected off
     2126@codequotebacktick off
     2127
     2128
     2129
     2130@cindex Excluding lines
     2131@cindex Selecting non-matching lines
     2132@cindex addresses, negating
     2133@cindex addresses, excluding
     2134Appending the @code{!} character to the end of an address
     2135specification (before the command letter) negates the sense of the
     2136match.  That is, if the @code{!} character follows an address or an
     2137address range, then only lines which do @emph{not} match the addresses
     2138will be selected. The following command replaces @samp{hello}
     2139with @samp{world} only on lines @emph{not} containing the string
     2140@samp{apple}:
     2141
     2142@example
     2143sed '/apple/!s/hello/world/' input.txt > output.txt
     2144@end example
     2145
     2146The following command replaces @samp{hello} with
     2147@samp{world} only on lines 1 to 3 and from line 18 to the last line of the
     2148input file (i.e. excluding lines 4 to 17):
     2149
     2150@example
     2151sed '4,17!s/hello/world/' input.txt > output.txt
     2152@end example
     2153
     2154
     2155
     2156
     2157
     2158@node Numeric Addresses
     2159@section Selecting lines by numbers
     2160@cindex Addresses, in @command{sed} scripts
     2161@cindex Line selection
     2162@cindex Selecting lines to process
     2163
     2164Addresses in a @command{sed} script can be in any of the following forms:
     2165@table @code
     2166@item @var{number}
     2167@cindex Address, numeric
     2168@cindex Line, selecting by number
     2169Specifying a line number will match only that line in the input.
     2170(Note that @command{sed} counts lines continuously across all input files
     2171unless @option{-i} or @option{-s} options are specified.)
     2172
     2173@item $
     2174@cindex Address, last line
     2175@cindex Last line, selecting
     2176@cindex Line, selecting last
     2177This address matches the last line of the last file of input, or
     2178the last line of each file when the @option{-i} or @option{-s} options
     2179are specified.
     2180
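The following sketch shows the difference (the files @file{f1.txt} and @file{f2.txt} are only examples): without @option{-s} the address matches only the last line of the last file; with @option{-s} it matches the last line of each file:

@example
$ printf '%s\n' a b > f1.txt
$ printf '%s\n' c d > f2.txt
$ sed -n '$p' f1.txt f2.txt
d
$ sed -s -n '$p' f1.txt f2.txt
b
d
@end example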
     2181
     2182@item @var{first}~@var{step}
     2183@cindex GNU extensions, @samp{@var{n}~@var{m}} addresses
     2184This GNU extension matches every @var{step}th line
     2185starting with line @var{first}.
     2186In particular, lines will be selected when there exists
     2187a non-negative @var{n} such that the current line-number equals
     2188@var{first} + (@var{n} * @var{step}).
     2189Thus, one would use @code{1~2} to select the odd-numbered lines and
     2190@code{0~2} for even-numbered lines;
     2191to pick every third line starting with the second, @samp{2~3} would be used;
     2192to pick every fifth line starting with the tenth, use @samp{10~5};
     2193and @samp{50~0} is just an obscure way of saying @code{50}.
     2194
     2195The following commands demonstrate the step address usage:
     2196
     2197@example
     2198$ seq 10 | sed -n '0~4p'
     21994
     22008
     2201
     2202$ seq 10 | sed -n '1~3p'
     22031
     22044
     22057
     220610
     2207@end example
     2208
     2209
     2210@end table
     2211
     2212
     2213
     2214@node Regexp Addresses
     2215@section Selecting lines by text matching
     2216
     2217@value{SSED} supports the following regular expression addresses.
     2218The default regular expression syntax is
     2219@ref{BRE syntax, , Basic Regular Expression (BRE)}.
     2220If the @option{-E} or @option{-r} option is used, the regular expression should be
     2221in @ref{ERE syntax, , Extended Regular Expression (ERE)} syntax.
     2222@xref{BRE vs ERE}.
     2223
     2224@table @code
     2225@item /@var{regexp}/
     2226@cindex Address, as a regular expression
     2227@cindex Line, selecting by regular expression match
     2228This will select any line which matches the regular expression @var{regexp}.
     2229If @var{regexp} itself includes any @code{/} characters,
     2230each must be escaped by a backslash (@code{\}).
     2231
     2232The following command prints lines in @file{/etc/passwd}
     2233which end with @samp{bash}@footnote{
     2234There are of course many other ways to do the same,
     2235e.g.
     2236@example
     2237grep 'bash$' /etc/passwd
     2238awk -F: '$7 == "/bin/bash"' /etc/passwd
     2239@end example
     2240}:
     2241
     2242@example
     2243sed -n '/bash$/p' /etc/passwd
     2244@end example
     2245
     2246@cindex empty regular expression
     2247@cindex @value{SSEDEXT}, modifiers and the empty regular expression
     2248The empty regular expression @samp{//} repeats the last regular
     2249expression match (the same holds if the empty regular expression is
     2250passed to the @code{s} command).  Note that modifiers to regular expressions
     2251are evaluated when the regular expression is compiled, thus it is invalid to
     2252specify them together with the empty regular expression.
     2253
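For example, in the following command the @code{s} command reuses the regular expression @samp{foo} from its address:

@example
$ printf '%s\n' foo bar | sed '/foo/s//X/'
X
bar
@end example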
     2254@item \%@var{regexp}%
     2255(The @code{%} may be replaced by any other single character.)
     2256
     2257@cindex Slash character, in regular expressions
     2258This also matches the regular expression @var{regexp},
     2259but allows one to use a different delimiter than @code{/}.
     2260This is particularly useful if the @var{regexp} itself contains
     2261a lot of slashes, since it avoids the tedious escaping of every @code{/}.
     2262If @var{regexp} itself includes any delimiter characters,
     2263each must be escaped by a backslash (@code{\}).
     2264
     2265The following commands are equivalent. They print lines
     2266which start with @samp{/home/alice/documents/}:
     2267
     2268@example
     2269sed -n '/^\/home\/alice\/documents\//p'
     2270sed -n '\%^/home/alice/documents/%p'
     2271sed -n '\;^/home/alice/documents/;p'
     2272@end example
     2273
     2274
     2275@item /@var{regexp}/I
     2276@itemx \%@var{regexp}%I
     2277@cindex GNU extensions, @code{I} modifier
     2278@cindex case insensitive, regular expression
     2279The @code{I} modifier to regular-expression matching is a GNU
     2280extension which causes the @var{regexp} to be matched in
     2281a case-insensitive manner.
     2282
     2283In many other programming languages, a lower case @code{i} is used
     2284for case-insensitive regular expression matching. However, in @command{sed}
     2285the @code{i} is used for the insert command (@pxref{insert command}).
     2286
     2287Observe the difference between the following examples.
     2288
     2289In this example, @code{/b/I} is the address: a regular expression with the @code{I}
     2290modifier. @code{d} is the delete command:
     2291
     2292@example
     2293$ printf "%s\n" a b c | sed '/b/Id'
     2294a
     2295c
     2296@end example
     2297
     2298Here, @code{/b/} is the address: a regular expression.
     2299@code{i} is the insert command.
     2300@code{d} is the value to insert.
     2301A line with @samp{d} is then inserted above the matched line:
     2302
     2303@example
     2304$ printf "%s\n" a b c | sed '/b/id'
     2305a
     2306d
     2307b
     2308c
     2309@end example
     2310
     2311@item /@var{regexp}/M
     2312@itemx \%@var{regexp}%M
     2313@cindex @value{SSEDEXT}, @code{M} modifier
     2314The @code{M} modifier to regular-expression matching is a @value{SSED}
     2315extension which directs @value{SSED} to match the regular expression
     2316in @cite{multi-line} mode.  The modifier causes @code{^} and @code{$} to
     2317match respectively (in addition to the normal behavior) the empty string
     2318after a newline, and the empty string before a newline.  There are
     2319special character sequences
     2320@ifclear PERL
     2321(@code{\`} and @code{\'})
     2322@end ifclear
     2323which always match the beginning or the end of the buffer.
     2324In addition,
     2325the period character does not match a new-line character in
     2326multi-line mode.
     2327@end table
     2328
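As an illustration of the @code{M} modifier (assuming @value{SSED}), the pattern space below holds two lines joined by @code{N}.  With @code{M}, @code{^world} matches after the embedded newline and the pattern space is printed; without it, the second command prints nothing:

@codequoteundirected on
@codequotebacktick on
@example
$ printf '%s\n' hello world | sed -n 'N;/^world/Mp'
hello
world
$ printf '%s\n' hello world | sed -n 'N;/^world/p'
@end example
@codequoteundirected off
@codequotebacktick off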
     2329
     2330@cindex regex addresses and pattern space
     2331@cindex regex addresses and input lines
     2332Regex addresses operate on the content of the current
     2333pattern space. If the pattern space is changed (for example with @code{s///}
     2334command) the regular expression matching will operate on the changed text.
     2335
     2336In the following example, automatic printing is disabled with
     2337@option{-n}.  The @code{s/2/X/} command changes lines containing
     2338@samp{2} to @samp{X}. The command @code{/[0-9]/p} matches
     2339lines with digits and prints them.
     2340Because the second line is changed before the @code{/[0-9]/} regex,
     2341it will not match and will not be printed:
     2342
     2343@codequoteundirected on
     2344@codequotebacktick on
     2345@example
     2346@group
     2347$ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
     23481
     23493
     2350@end group
     2351@end example
     2352@codequoteundirected off
     2353@codequotebacktick off
     2354
     2355
     2356@node Range Addresses
     2357@section Range Addresses
     2358
     2359@cindex Range of lines
     2360@cindex Several lines, selecting
     2361An address range can be specified by specifying two addresses
     2362separated by a comma (@code{,}).  An address range matches lines
     2363starting from where the first address matches, and continues
     2364until the second address matches (inclusively):
     2365
     2366@example
     2367$ seq 10 | sed -n '4,6p'
     23684
     23695
     23706
     2371@end example
     2372
     2373If the second address is a @var{regexp}, then checking for the
     2374ending match will start with the line @emph{following} the
     2375line which matched the first address: a range will always
     2376span at least two lines (except of course if the input stream
     2377ends).
     2378
     2379@example
     2380$ seq 10 | sed -n '4,/[0-9]/p'
     23814
     23825
     2383@end example
     2384
     2385If the second address is a @var{number} less than (or equal to)
     2386the line matching the first address, then only the one line is
     2387matched:
     2388
     2389@example
     2390$ seq 10 | sed -n '4,1p'
     23914
     2392@end example
     2393
     2394@anchor{Zero Address Regex Range}
     2395@cindex Special addressing forms
     2396@cindex Range with start address of zero
     2397@cindex Zero, as range start address
     2398@cindex @var{addr1},+N
     2399@cindex @var{addr1},~N
     2400@cindex GNU extensions, special two-address forms
     2401@cindex GNU extensions, @code{0} address
     2402@cindex GNU extensions, 0,@var{addr2} addressing
     2403@cindex GNU extensions, @var{addr1},+@var{N} addressing
     2404@cindex GNU extensions, @var{addr1},~@var{N} addressing
     2405@value{SSED} also supports some special two-address forms; all these
     2406are GNU extensions:
     2407@table @code
     2408@item 0,/@var{regexp}/
     2409A line number of @code{0} can be used in an address specification like
     2410@code{0,/@var{regexp}/} so that @command{sed} will try to match
     2411@var{regexp} in the first input line too.  In other words,
     2412@code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
     2413except that if @var{addr2} matches the very first line of input the
     2414@code{0,/@var{regexp}/} form will consider it to end the range, whereas
     2415the @code{1,/@var{regexp}/} form will match the beginning of its range and
     2416hence make the range span up to the @emph{second} occurrence of the
     2417regular expression.
     2418
     2419The following examples demonstrate the difference between starting
     2420with address 1 and 0:
     2421
     2422@example
     2423$ seq 10 | sed -n '1,/[0-9]/p'
     24241
     24252
     2426
     2427$ seq 10 | sed -n '0,/[0-9]/p'
     24281
     2429@end example
     2430
     2431
     2432@item @var{addr1},+@var{N}
     2433Matches @var{addr1} and the @var{N} lines following @var{addr1}.
     2434
     2435@example
     2436$ seq 10 | sed -n '6,+2p'
     24376
     24387
     24398
     2440@end example
     2441
     2442@var{addr1} can be a line number or a regular expression.
     2443
     2444@item @var{addr1},~@var{N}
     2445Matches @var{addr1} and the lines following @var{addr1}
     2446until the next line whose input line number is a multiple of @var{N}.
     2447The following command prints starting at line 6, until the next line which
     2448is a multiple of 4 (i.e. line 8):
     2449
     2450@example
     2451$ seq 10 | sed -n '6,~4p'
     24526
     24537
     24548
     2455@end example
     2456
     2457@var{addr1} can be a line number or a regular expression.
     2458
     2459@end table
     2460
     2461
     2462
     2463@node Zero Address
     2464@section Zero Address
     2465@cindex Zero Address
     2466As a @value{SSED} extension, @code{0} address can be used in two cases:
     2467@enumerate
     2468@item
     2469In a regex range address, as @code{0,/@var{regexp}/}
     2470(@pxref{Zero Address Regex Range}).
     2471@item
     2472With the @code{r} command, inserting a file before the first line
     2473(@pxref{Adding a header to multiple files}).
     2474@end enumerate
     2475
     2476Note that these are the only places where the @code{0} address makes
     2477sense; commands which are given the @code{0} address in any
     2478other way will give an error.
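As a quick check of the rule above, the following sketch (assuming GNU @command{sed} and @command{seq} are available in a POSIX shell) shows a @code{0,/regexp/} range ending on the very first line, and a bare @code{0p} being rejected. The exact wording of the error message may differ between versions, so the sketch only looks at the exit status.

```shell
# A 0,/regexp/ range can end on the very first line.
seq 5 | sed -n '0,/[0-9]/p'        # prints: 1

# Address 0 used any other way is an error, so sed exits non-zero.
if seq 5 | sed -n '0p' 2>/dev/null; then
  echo '0p was accepted'
else
  echo '0p was rejected'           # expected branch
fi
```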
     2479
     2480
     2481
     2482@node sed regular expressions
     2483@chapter Regular Expressions: selecting text
     2484
     2485@menu
     2486* Regular Expressions Overview:: Overview of regular expressions in @command{sed}
     2487* BRE vs ERE::               Basic (BRE) and extended (ERE) regular expression
     2488                             syntax
     2489* BRE syntax::               Overview of basic regular expression syntax
     2490* ERE syntax::               Overview of extended regular expression syntax
     2491* Character Classes and Bracket Expressions::
     2492* regexp extensions::        Additional regular expression commands
     2493* Back-references and Subexpressions:: Back-references and Subexpressions
     2494* Escapes::                  Specifying special characters
     2495* Locale Considerations::    Multibyte characters and locale considerations
     2496@end menu
     2497
     2498@node Regular Expressions Overview
     2499@section Overview of regular expressions in @command{sed}
     2500
     2501@c NOTE: Keep examples in the 'overview' section
     2502@c neutral in regards to BRE/ERE - to ease understanding.
     2503
     2504
     2505To use @command{sed} effectively, you should understand regular
     2506expressions (@dfn{regexp} for short).  A regular expression
     2507is a pattern that is matched against a
     2508subject string from left to right.  Most characters are
     2509@dfn{ordinary}: they stand for
     2510themselves in a pattern, and match the corresponding characters.
     2511Regular expressions in @command{sed} are specified between two
     2512slashes.
     2513
     2514The following command prints lines containing the string @samp{hello}:
     2515
     2516@example
     2517sed -n '/hello/p'
     2518@end example
     2519
     2520The above example is equivalent to this @command{grep} command:
     2521
     2522@example
     2523grep 'hello'
     2524@end example
     2525
     2526The power of regular expressions comes from the ability to include
     2527alternatives and repetitions in the pattern.  These are encoded in the
     2528pattern by the use of @dfn{special characters}, which do not stand for
     2529themselves but instead are interpreted in some special way.
     2530
     2531The character @code{^} (caret) in a regular expression matches the
     2532beginning of the line. The character @code{.} (dot) matches any single
     2533character. The following @command{sed} command matches and prints
     2534lines which start with the letter @samp{b}, followed by any single character,
     2535followed by the letter @samp{d}:
     2536
     2537@example
     2538$ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
     2539bad
     2540bed
     2541bid
     2542body
     2543@end example
     2544
     2545The following sections explain the meaning and usage of special
     2546characters in regular expressions.
     2547
     2548@node BRE vs ERE
     2549@section Basic (BRE) and extended (ERE) regular expression
     2550
     2551Basic and extended regular expressions are two variations on the
     2552syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the
     2553default in @command{sed} (and similarly in @command{grep}).
     2554Use the POSIX-specified @option{-E} option (@option{-r},
     2555@option{--regexp-extended}) to enable Extended Regular Expression (ERE) syntax.
     2556
     2557In @value{SSED}, the only difference between basic and extended regular
     2558expressions is in the behavior of a few special characters: @samp{?},
     2559@samp{+}, parentheses, braces (@samp{@{@}}), and @samp{|}.
     2560
     2561With basic (BRE) syntax, these characters do not have special meaning
     2562unless prefixed with a backslash (@samp{\}), while with extended (ERE) syntax
     2563it is reversed: these characters are special unless they are prefixed
     2564with a backslash (@samp{\}).
     2565
     2566@multitable @columnfractions .28 .36 .35
     2567
     2568@headitem Desired pattern
     2569@tab Basic (BRE) Syntax
     2570@tab Extended (ERE) Syntax
     2571
     2572@item literal @samp{+} (plus sign)
     2573
     2574@tab
     2575@exampleindent 0
     2576@codequoteundirected on
     2577@codequotebacktick on
     2578@example
     2579$ echo 'a+b=c' > foo
     2580$ sed -n '/a+b/p' foo
     2581a+b=c
     2582@end example
     2583@codequotebacktick off
     2584@codequoteundirected off
     2585
     2586@tab
     2587@exampleindent 0
     2588@codequoteundirected on
     2589@codequotebacktick on
     2590@example
     2591$ echo 'a+b=c' > foo
     2592$ sed -E -n '/a\+b/p' foo
     2593a+b=c
     2594@end example
     2595@codequotebacktick off
     2596@codequoteundirected off
     2597
     2598
     2599@item One or more @samp{a} characters followed by @samp{b}
     2600(plus sign as special meta-character)
     2601
     2602@tab
     2603@exampleindent 0
     2604@codequoteundirected on
     2605@codequotebacktick on
     2606@example
     2607$ echo aab > foo
     2608$ sed -n '/a\+b/p' foo
     2609aab
     2610@end example
     2611@codequotebacktick off
     2612@codequoteundirected off
     2613
     2614@tab
     2615@exampleindent 0
     2616@codequoteundirected on
     2617@codequotebacktick on
     2618@example
     2619$ echo aab > foo
     2620$ sed -E -n '/a+b/p' foo
     2621aab
     2622@end example
     2623@codequotebacktick off
     2624@codequoteundirected off
     2625
     2626@end multitable
     2627
     2628
     2629
     2630
     2631@node BRE syntax
     2632@section Overview of basic regular expression syntax
     2633
     2634Here is a brief description
     2635of regular expression syntax as used in @command{sed}.
     2636
     2637@table @code
     2638@item @var{char}
     2639A single ordinary character matches itself.
     2640
     2641@item *
     2642@cindex GNU extensions, to basic regular expressions
     2643Matches a sequence of zero or more instances of matches for the
     2644preceding regular expression, which must be an ordinary character, a
     2645special character preceded by @code{\}, a @code{.}, a grouped regexp
     2646(see below), or a bracket expression.  As a GNU extension, a
     2647postfixed regular expression can also be followed by @code{*}; for
     2648example, @code{a**} is equivalent to @code{a*}.  POSIX
     26491003.1-2001 says that @code{*} stands for itself when it appears at
     2650the start of a regular expression or subexpression, but many
     2651non-GNU implementations do not support this and portable
     2652scripts should instead use @code{\*} in these contexts.
     2653@item .
     2654Matches any character, including newline.
     2655
     2656@item ^
     2657Matches the null string at beginning of the pattern space, i.e. what
     2658appears after the circumflex must appear at the beginning of the
     2659pattern space.
     2660
     2661In most scripts, pattern space is initialized to the content of each
     2662line (@pxref{Execution Cycle, , How @code{sed} works}).  So, it is a
     2663useful simplification to think of @code{^#include} as matching only
     2664lines where @samp{#include} is the first thing on the line---if there is
     2665any preceding space, for example, the match fails.  This simplification is
     2666valid as long as the original content of pattern space is not modified,
     2667for example with an @code{s} command.
     2668
     2669@code{^} acts as a special character only at the beginning of the
     2670regular expression or subexpression (that is, after @code{\(} or
     2671@code{\|}).  Portable scripts should avoid @code{^} at the beginning of
     2672a subexpression, though, as POSIX allows implementations that
     2673treat @code{^} as an ordinary character in that context.
     2674
     2675@item $
     2676It is the same as @code{^}, but refers to the end of pattern space.
     2677@code{$} also acts as a special character only at the end
     2678of the regular expression or subexpression (that is, before @code{\)}
     2679or @code{\|}), and its use at the end of a subexpression is not
     2680portable.
     2681
     2682
     2683@item [@var{list}]
     2684@itemx [^@var{list}]
     2685Matches any single character in @var{list}: for example,
     2686@code{[aeiou]} matches all vowels.  A list may include
     2687sequences like @code{@var{char1}-@var{char2}}, which
     2688matches any character between (inclusive) @var{char1}
     2689and @var{char2}.
     2690@xref{Character Classes and Bracket Expressions}.
     2691
     2692@item \+
     2693@cindex GNU extensions, to basic regular expressions
     2694As @code{*}, but matches one or more.  It is a GNU extension.
     2695
     2696@item \?
     2697@cindex GNU extensions, to basic regular expressions
     2698As @code{*}, but only matches zero or one.  It is a GNU extension.
     2699
     2700@item \@{@var{i}\@}
     2701As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
     2702decimal integer; for portability, keep it between 0 and 255
     2703inclusive).
     2704
     2705@item \@{@var{i},@var{j}\@}
     2706Matches between @var{i} and @var{j} sequences, inclusive.
     2707
     2708@item \@{@var{i},\@}
     2709Matches @var{i} or more sequences.
     2710
     2711@item \(@var{regexp}\)
     2712Groups the inner @var{regexp} as a whole; this is used to:
     2713
     2714@itemize @bullet
     2715@item
     2716@cindex GNU extensions, to basic regular expressions
     2717Apply postfix operators, like @code{\(abcd\)*}:
     2718this will search for zero or more whole sequences
     2719of @samp{abcd}, while @code{abcd*} would search
     2720for @samp{abc} followed by zero or more occurrences
     2721of @samp{d}.  Note that support for @code{\(abcd\)*} is
     2722required by POSIX 1003.1-2001, but many non-GNU
     2723implementations do not support it and hence it is not universally
     2724portable.
     2725
     2726@item
     2727Use back references (see below).
     2728@end itemize
     2729
     2730
     2731@item @var{regexp1}\|@var{regexp2}
     2732@cindex GNU extensions, to basic regular expressions
     2733Matches either @var{regexp1} or @var{regexp2}.  Use
     2734parentheses to group complex alternatives.
     2735The matching process tries each alternative in turn, from
     2736left to right, and the first one that succeeds is used.
     2737It is a GNU extension.
     2738
     2739@item @var{regexp1}@var{regexp2}
     2740Matches the concatenation of @var{regexp1} and @var{regexp2}.
     2741Concatenation binds more tightly than @code{\|}, @code{^}, and
     2742@code{$}, but less tightly than the other regular expression
     2743operators.
     2744
     2745@item \@var{digit}
     2746Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
     2747subexpression in the regular expression.  This is called a @dfn{back
     2748reference}.  Subexpressions are implicitly numbered by counting
     2749occurrences of @code{\(} left-to-right.
     2750
     2751@item \n
     2752Matches the newline character.
     2753
     2754@item \@var{char}
     2755Matches @var{char}, where @var{char} is one of @code{$},
     2756@code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
     2757Note that the only C-like
     2758backslash sequences that you can portably assume to be
     2759interpreted are @code{\n} and @code{\\}; in particular
     2760@code{\t} is not portable, and matches a @samp{t} under most
     2761implementations of @command{sed}, rather than a tab character.
     2762
     2763@end table
     2764
     2765@cindex Greedy regular expression matching
     2766Note that the regular expression matcher is greedy, i.e., matches
     2767are attempted from left to right and, if two or more matches are
     2768possible starting at the same character, it selects the longest.
     2769
     2770@noindent
     2771Examples:
     2772@table @samp
     2773@item abcdef
     2774Matches @samp{abcdef}.
     2775
     2776@item a*b
     2777Matches zero or more @samp{a}s followed by a single
     2778@samp{b}.  For example, @samp{b} or @samp{aaaaab}.
     2779
     2780@item a\?b
     2781Matches @samp{b} or @samp{ab}.
     2782
     2783@item a\+b\+
     2784Matches one or more @samp{a}s followed by one or more
     2785@samp{b}s: @samp{ab} is the shortest possible match, but
     2786other examples are @samp{aaaab} or @samp{abbbbb} or
     2787@samp{aaaaaabbbbbbb}.
     2788
     2789@item .*
     2790@itemx .\+
     2791These two both match all the characters in a string;
     2792however, the first matches every string (including the empty
     2793string), while the second matches only strings containing
     2794at least one character.
     2795
     2796@item ^main.*(.*)
     2797This matches a string starting with @samp{main},
     2798followed by an opening and closing
     2799parenthesis.  The @samp{n}, @samp{(} and @samp{)} need not
     2800be adjacent.
     2801
     2802@item ^#
     2803This matches a string beginning with @samp{#}.
     2804
     2805@item \\$
     2806This matches a string ending with a single backslash.  The
     2807regexp contains two backslashes for escaping.
     2808
     2809@item \$
     2810Instead, this matches a literal dollar sign anywhere in the string,
     2811because the @code{$} is escaped.
     2812
     2813@item [a-zA-Z0-9]
     2814In the C locale, this matches any ASCII letter or digit.
     2815
     2816@item [^ @kbd{@key{TAB}}]\+
     2817(Here @kbd{@key{TAB}} stands for a single tab character.)
     2818This matches a string of one or more
     2819characters, none of which is a space or a tab.
     2820Usually this means a word.
     2821
     2822@item ^\(.*\)\n\1$
     2823This matches a string consisting of two equal substrings separated by
     2824a newline.
     2825
     2826@item .\@{9\@}A$
     2827This matches nine characters followed by an @samp{A} at the end of a line.
     2828
     2829@item ^.\@{15\@}A
     2830This matches the first 16 characters of a string whose 16th character
     2831is an @samp{A}.
     2832
     2833@end table
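Several of the examples above can be tried directly. This sketch assumes GNU @command{sed} (for @code{\?}) and a POSIX shell; @code{N} appends the next input line so that pattern space contains an embedded newline for @code{\n} to match.

```shell
# a\?b : an optional "a" before "b" (\? is a GNU extension in BRE).
printf '%s\n' b ab aab | sed -n '/^a\?b$/p'          # prints: b, ab

# a\{2,3\} : two or three "a"s; the longest match at the leftmost
# position wins, so all three leading "a"s are replaced.
echo aaaab | sed 's/a\{2,3\}/X/'                     # prints: Xab

# ^\(.*\)\n\1$ : N joins two lines, and the back-reference requires
# both halves of pattern space to be equal.
printf 'foo\nfoo\n' | sed -n 'N;/^\(.*\)\n\1$/p'     # prints: foo, foo
printf 'foo\nbar\n' | sed -n 'N;/^\(.*\)\n\1$/p'     # prints nothing
```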
     2834
     2835
     2836@node ERE syntax
     2837@section Overview of extended regular expression syntax
     2838@cindex Extended regular expressions, syntax
     2839
     2840The only difference between basic and extended regular expressions is in
     2841the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
     2842braces (@samp{@{@}}), and @samp{|}.  While basic regular expressions
     2843require these to be escaped if you want them to behave as special
     2844characters, when using extended regular expressions you must escape
     2845them if you want them @emph{to match a literal character}.  @samp{|}
     2846is special here because @samp{\|} is a GNU extension -- standard
     2847basic regular expressions do not provide its functionality.
     2848
     2849@noindent
     2850Examples:
     2851@table @code
     2852@item abc?
     2853becomes @samp{abc\?} when using extended regular expressions.  It matches
     2854the literal string @samp{abc?}.
     2855
     2856@item c\+
     2857becomes @samp{c+} when using extended regular expressions.  It matches
     2858one or more @samp{c}s.
     2859
     2860@item a\@{3,\@}
     2861becomes @samp{a@{3,@}} when using extended regular expressions.  It matches
     2862three or more @samp{a}s.
     2863
     2864@item \(abc\)\@{2,3\@}
     2865becomes @samp{(abc)@{2,3@}} when using extended regular expressions.  It
     2866matches either @samp{abcabc} or @samp{abcabcabc}.
     2867
     2868@item \(abc*\)\1
     2869becomes @samp{(abc*)\1} when using extended regular expressions.
     2870Backreferences must still be escaped when using extended regular
     2871expressions.
     2872
     2873@item a\|b
     2874becomes @samp{a|b} when using extended regular expressions.  It matches
     2875@samp{a} or @samp{b}.
     2876@end table
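To see that the two syntaxes describe the same patterns, the sketch below runs several BRE forms next to their ERE spellings (assuming GNU @command{sed}, since @code{\|} in BRE is a GNU extension); each pair should print the same result.

```shell
# The same interval, first in BRE syntax and then in ERE syntax.
echo abcabc | sed 's/\(abc\)\{2\}/X/'       # prints: X
echo abcabc | sed -E 's/(abc){2}/X/'        # prints: X

# In ERE an escaped ? matches a literal question mark.
echo 'abc?' | sed -E 's/abc\?/X/'           # prints: X

# Alternation: \| in GNU BRE, | in ERE.
echo 'a b c' | sed 's/a\|b/X/g'             # prints: X X c
echo 'a b c' | sed -E 's/a|b/X/g'           # prints: X X c
```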
     2877
     2878@node Character Classes and Bracket Expressions
     2879@section Character Classes and Bracket Expressions
     2880
     2881@c The 'character class' section is shamelessly copied from grep's manual.
     2882
     2883@cindex bracket expression
     2884@cindex character class
     2885A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
     2886@samp{]}.
     2887It matches any single character in that list;
     2888if the first character of the list is the caret @samp{^},
     2889then it matches any character @strong{not} in the list.
     2890For example, the following command replaces the strings
     2891@samp{gray} or @samp{grey} with @samp{blue}:
     2892
     2893@example
     2894sed  's/gr[ae]y/blue/'
     2895@end example
     2896
     2897@c TODO: fix 'ref' to look good in both HTML and PDF
     2898Bracket expressions can be used in both
     2899@ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
     2900regular expressions (that is, with or without the @option{-E}/@option{-r}
     2901options).
     2902
     2903@cindex range expression
     2904Within a bracket expression, a @dfn{range expression} consists of two
     2905characters separated by a hyphen.
     2906It matches any single character that
     2907sorts between the two characters, inclusive.
     2908In the default C locale, the sorting sequence is the native character
     2909order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
     2910
     2911
     2912Finally, certain named classes of characters are predefined within
     2913bracket expressions, as follows.
     2914
     2915These named classes must be used @emph{inside} brackets
     2916themselves. Correct usage:
     2917@example
     2918$ echo 1 | sed 's/[[:digit:]]/X/'
     2919X
     2920@end example
     2921
     2922Incorrect usage is rejected by newer @command{sed} versions.
     2923Older versions accepted it but treated it as a single bracket expression
     2924(which is equivalent to @samp{[dgit:]},
     2925that is, only the characters @var{d/g/i/t/:}):
     2926@example
     2927# current GNU sed versions - incorrect usage rejected
     2928$ echo 1 | sed 's/[:digit:]/X/'
     2929sed: character class syntax is [[:space:]], not [:space:]
     2930
     2931# older GNU sed versions
     2932$ echo 1 | sed 's/[:digit:]/X/'
     29331
     2934@end example
     2935
     2936
     2937@cindex classes of characters
     2938@cindex character classes
     2939@cindex named character classes
     2940@table @samp
     2941
     2942@item [:alnum:]
     2943@opindex alnum @r{character class}
     2944@cindex alphanumeric characters
     2945Alphanumeric characters:
     2946@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
     2947character encoding, this is the same as @samp{[0-9A-Za-z]}.
     2948
     2949@item [:alpha:]
     2950@opindex alpha @r{character class}
     2951@cindex alphabetic characters
     2952Alphabetic characters:
     2953@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
     2954character encoding, this is the same as @samp{[A-Za-z]}.
     2955
     2956@item [:blank:]
     2957@opindex blank @r{character class}
     2958@cindex blank characters
     2959Blank characters:
     2960space and tab.
     2961
     2962@item [:cntrl:]
     2963@opindex cntrl @r{character class}
     2964@cindex control characters
     2965Control characters.
     2966In ASCII, these characters have octal codes 000
     2967through 037, and 177 (DEL).
     2968In other character sets, these are
     2969the equivalent characters, if any.
     2970
     2971@item [:digit:]
     2972@opindex digit @r{character class}
     2973@cindex digit characters
     2974@cindex numeric characters
     2975Digits: @code{0 1 2 3 4 5 6 7 8 9}.
     2976
     2977@item [:graph:]
     2978@opindex graph @r{character class}
     2979@cindex graphic characters
     2980Graphical characters:
     2981@samp{[:alnum:]} and @samp{[:punct:]}.
     2982
     2983@item [:lower:]
     2984@opindex lower @r{character class}
     2985@cindex lower-case letters
     2986Lower-case letters; in the @samp{C} locale and ASCII character
     2987encoding, this is
     2988@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.
     2989
     2990@item [:print:]
     2991@opindex print @r{character class}
     2992@cindex printable characters
     2993Printable characters:
     2994@samp{[:alnum:]}, @samp{[:punct:]}, and space.
     2995
     2996@item [:punct:]
     2997@opindex punct @r{character class}
     2998@cindex punctuation characters
     2999Punctuation characters; in the @samp{C} locale and ASCII character
     3000encoding, this is
     3001@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.
     3002
     3003@item [:space:]
     3004@opindex space @r{character class}
     3005@cindex space characters
     3006@cindex whitespace characters
     3007Space characters: in the @samp{C} locale, this is
     3008tab, newline, vertical tab, form feed, carriage return, and space.
     3009
     3010
     3011@item [:upper:]
     3012@opindex upper @r{character class}
     3013@cindex upper-case letters
     3014Upper-case letters: in the @samp{C} locale and ASCII character
     3015encoding, this is
     3016@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.
     3017
     3018@item [:xdigit:]
     3019@opindex xdigit @r{character class}
     3020@cindex xdigit class
     3021@cindex hexadecimal digits
     3022Hexadecimal digits:
     3023@code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.
     3024
     3025@end table
     3026Note that the brackets in these class names are
     3027part of the symbolic names, and must be included in addition to
     3028the brackets delimiting the bracket expression.
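A short sketch of named classes and a range expression in use, assuming GNU @command{sed} and ASCII input; @env{LC_ALL} is set to @samp{C} for the range so that the result does not depend on the locale's collation order.

```shell
# A named class goes inside an outer bracket expression.
echo 'a1 B2_c3' | sed 's/[[:digit:]]//g'              # prints: a B_c

# Classes can be combined with each other in one list.
echo 'a1 B2_c3' | sed 's/[[:upper:][:digit:]]/#/g'    # prints: a# ##_c#

# A range expression; LC_ALL=C pins the native (ASCII) order.
echo abcdef | LC_ALL=C sed 's/[a-d]/X/g'              # prints: XXXXef
```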
     3029
     3030Most meta-characters lose their special meaning inside bracket expressions:
     3031
     3032@table @samp
     3033@item ]
     3034ends the bracket expression if it's not the first list item.
     3035So, if you want to make the @samp{]} character a list item,
     3036you must put it first.
     3037
     3038@item -
     3039represents the range if it's not first or last in a list or the ending point
     3040of a range.
     3041
     3042@item ^
     3043represents the characters not in the list when it is the first list item.
     3044If you want to make the @samp{^}
     3045character a list item, place it anywhere but first.
     3046@end table
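The placement rules above can be combined in one bracket expression. In this sketch (assuming GNU @command{sed}), @samp{]} comes first, @samp{^} is not first, and @samp{-} is last, so all three are literal list items; in the second command a leading @samp{^} negates the list instead.

```shell
# ']' first, '^' not first, '-' last: all three are literal list items.
echo 'x]y^z-w' | sed 's/[]^-]/_/g'     # prints: x_y_z_w

# A leading '^' negates the list: everything except digits is replaced.
echo 'a1b2' | sed 's/[^0-9]/_/g'       # prints: _1_2
```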
     3047
     3048@c TODO: incorporate this paragraph (copied verbatim from BRE section).
     3049
     3050@cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
     3051The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
     3052are normally not special within @var{list}.  For example, @code{[\*]}
     3053matches either @samp{\} or @samp{*}, because the @code{\} is not
     3054special here.  However, strings like @code{[.ch.]}, @code{[=a=]}, and
     3055@code{[:space:]} are special within @var{list} and represent collating
     3056symbols, equivalence classes, and character classes, respectively, and
     3057@code{[} is therefore special within @var{list} when it is followed by
     3058@code{.}, @code{=}, or @code{:}.  Also, when not in
     3059@env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
     3060@code{\t} are recognized within @var{list}.  @xref{Escapes}.
     3061@c ********
     3062
     3063
     3064@c TODO: improve explanation about collation classes and equivalence classes
     3065@c       perhaps dedicate a section to Locales ??
     3066
     3067@table @samp
     3068@item [.
     3069represents the open collating symbol.
     3070
     3071@item .]
     3072represents the close collating symbol.
     3073
     3074@item [=
     3075represents the open equivalence class.
     3076
     3077@item =]
     3078represents the close equivalence class.
     3079
     3080@item [:
     3081represents the open character class symbol, and should be followed by a
     3082valid character class name.
     3083
     3084@item :]
     3085represents the close character class symbol.
     3086@end table
     3087
     3088
     3089@node regexp extensions
     3090@section regular expression extensions
     3091
     3092The following sequences have special meaning inside regular expressions
     3093(used in @ref{Regexp Addresses,,addresses} and the @code{s} command).
     3094
     3095These can be used in both
     3096@ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
     3097regular expressions (that is, with or without the @option{-E}/@option{-r}
     3098options).
     3099
     3100@table @code
     3101@item \w
     3102Matches any ``word'' character.  A ``word'' character is any
     3103letter or digit or the underscore character.
     3104
     3105@example
     3106$ echo "abc %-= def." | sed 's/\w/X/g'
     3107XXX %-= XXX.
     3108@end example
     3109
     3110
     3111@item \W
     3112Matches any ``non-word'' character.
     3113
     3114@example
     3115$ echo "abc %-= def." | sed 's/\W/X/g'
     3116abcXXXXXdefX
     3117@end example
     3118
     3119
     3120@item \b
     3121Matches a word boundary; that is, it matches if the character
     3122to the left is a ``word'' character and the character to the
     3123right is a ``non-word'' character, or vice-versa.
     3124
     3125@example
     3126$ echo "abc %-= def." | sed 's/\b/X/g'
     3127XabcX %-= XdefX.
     3128@end example
     3129
     3130
     3131@item \B
     3132Matches everywhere but on a word boundary; that is, it matches
     3133if the character to the left and the character to the right
     3134are either both ``word'' characters or both ``non-word''
     3135characters.
     3136
     3137@example
     3138$ echo "abc %-= def." | sed 's/\B/X/g'
     3139aXbXc X%X-X=X dXeXf.X
     3140@end example
     3141
     3142
     3143@item \s
     3144Matches whitespace characters (spaces, tabs, and the other characters in @samp{[[:space:]]}).
     3145Newlines embedded in the pattern/hold spaces will also match:
     3146
     3147@example
     3148$ echo "abc %-= def." | sed 's/\s/X/g'
     3149abcX%-=Xdef.
     3150@end example
     3151
     3152
     3153@item \S
     3154Matches non-whitespace characters.
     3155
     3156@example
     3157$ echo "abc %-= def." | sed 's/\S/X/g'
     3158XXX XXX XXXX
     3159@end example
     3160
     3161
     3162@item \<
     3163Matches the beginning of a word.
     3164
     3165@example
     3166$ echo "abc %-= def." | sed 's/\</X/g'
     3167Xabc %-= Xdef.
     3168@end example
     3169
     3170
     3171@item \>
     3172Matches the end of a word.
     3173
     3174@example
     3175$ echo "abc %-= def." | sed 's/\>/X/g'
     3176abcX %-= defX.
     3177@end example
     3178
     3179
     3180@item \`
     3181Matches only at the start of pattern space.  This is different
     3182from @code{^} in multi-line mode.
     3183
     3184Compare the following two examples:
     3185
     3186@example
     3187$ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
     3188Xa
     3189Xb
     3190Xc
     3191
     3192$ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
     3193Xa
     3194b
     3195c
     3196@end example
     3197
     3198@item \'
     3199Matches only at the end of pattern space.  This is different
     3200from @code{$} in multi-line mode.
     3201
     3202
     3203
     3204@end table
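The @code{\'} entry has no example of its own; by analogy with the @code{\`} comparison above, this sketch (assuming GNU @command{sed} and a POSIX shell) contrasts it with @code{$} in multi-line mode. Double quotes are used in the second command so that the single quote in @code{\'} reaches @command{sed} intact.

```shell
# With the M (multi-line) flag, $ matches before each embedded newline
# as well as at the end of pattern space.
printf 'a\nb\nc\n' | sed 'N;N;s/$/X/gm'     # prints: aX, bX, cX

# \' matches only at the very end of pattern space. Inside double
# quotes the shell turns \\ into a single backslash for sed.
printf 'a\nb\nc\n' | sed "N;N;s/\\'/X/gm"   # prints: a, b, cX
```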
     3205
     3206
     3207@node Back-references and Subexpressions
     3208@section Back-references and Subexpressions
     3209@cindex subexpression
     3210@cindex back-reference
     3211
     3212@dfn{Back-references} are regular expression commands which refer to a
     3213previous part of the matched regular expression.  Back-references are
     3214specified with backslash and a single digit (e.g. @samp{\1}).  The
     3215part of the regular expression they refer to is called a
     3216@dfn{subexpression}, and is designated with parentheses.
     3217
     3218Back-references and subexpressions are used in two cases: in the
     3219regular expression search pattern, and in the @var{replacement} part
     3220of the @command{s} command (@pxref{Regexp Addresses,,Regular
     3221Expression Addresses} and @ref{The "s" Command}).
     3222
     3223In a regular expression pattern, back-references are used to match
     3224the same content as a previously matched subexpression.  In the
     3225following example, the subexpression is @samp{.} - any single
     3226character (being surrounded by parentheses makes it a
     3227subexpression). The back-reference @samp{\1} asks to match the same
     3228content (same character) as the sub-expression.
     3229
     3230The command below matches words starting with any character,
     3231followed by the letter @samp{o}, followed by the same character as the
     3232first.
     3233
     3234@example
     3235$ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
     3236bob
     3237mom
     3238non
     3239pop
     3240sos
     3241tot
     3242wow
     3243@end example
     3244
     3245Multiple subexpressions are automatically numbered from
     3246left-to-right. This command searches for 6-letter
     3247palindromes (the first three letters are 3 subexpressions,
     3248followed by 3 back-references in reverse order):
     3249
     3250@example
     3251$ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
     3252redder
     3253@end example
     3254
     3255In the @command{s} command, back-references can be
     3256used in the @var{replacement} part to refer back to subexpressions in
     3257the @var{regexp} part.
     3258
     3259The following example uses two subexpressions in the regular
     3260expression to match two space-separated words. The back-references in
     3261the @var{replacement} part print the words in a different order:
     3262
     3263@example
     3264$ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
     3265The name is Bond, James Bond.
     3266@end example
     3267
     3268
     3269When used with alternation, if the group does not participate in the
     3270match then the back-reference makes the whole match fail.  For
     3271example, @samp{a(.)|b\1} will not match @samp{ba}.  When multiple
     3272regular expressions are given with @option{-e} or from a file
     3273(@samp{-f @var{file}}), back-references are local to each expression.
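To illustrate the last point, the sketch below (assuming GNU @command{sed}) passes two @option{-e} expressions that each define their own group 1, so @samp{\1} refers to a different subexpression in each.

```shell
# Each -e expression numbers its groups independently: \1 is (a) in
# the first expression and (b) in the second.
echo ab | sed -E -e 's/(a)/[\1]/' -e 's/(b)/<\1>/'   # prints: [a]<b>
```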
     3274
     3275
    14663276@node Escapes
    1467 @section @acronym{GNU} Extensions for Escapes in Regular Expressions
    1468 
    1469 @cindex @acronym{GNU} extensions, special escapes
     3277@section Escape Sequences - specifying special characters
     3278
     3279@cindex GNU extensions, special escapes
    14703280Until this chapter, we have only encountered escapes of the form
    14713281@samp{\^}, which tell @command{sed} not to interpret the circumflex
     
    14763286@cindex @code{POSIXLY_CORRECT} behavior, escapes
    14773287This chapter introduces another kind of escape@footnote{All
    1478 the escapes introduced here are @acronym{GNU}
     3288the escapes introduced here are GNU
    14793289extensions, with the exception of @code{\n}.  In basic regular
    14803290expression mode, setting @code{POSIXLY_CORRECT} disables them inside
     
    15223332
    15233333@item \o@var{xxx}
    1524 @ifset PERL
    1525 @item \@var{xxx}
    1526 @end ifset
    15273334Produces or matches a character whose octal @sc{ascii} value is @var{xxx}.
    1528 @ifset PERL
    1529 The syntax without the @code{o} is active in Perl mode, while the one
    1530 with the @code{o} is active in the normal or extended @sc{posix} regular
    1531 expression modes.
    1532 @end ifset
    15333335
    15343336@item \x@var{xx}
     
    15393341the existing ``word boundary'' meaning.
    15403342
    1541 Other escapes match a particular character class and are valid only in
    1542 regular expressions:
    1543 
     3343@subsection Escaping Precedence
     3344
     3345@value{SSED} processes escape sequences @emph{before} passing
     3346the text on to the regular-expression matching of the @command{s///} command
     3347and address matching. Thus, the following two commands are equivalent
     3348(@samp{0x5e} is the hexadecimal @sc{ascii} value of the character @samp{^}):
     3349
     3350@codequoteundirected on
     3351@codequotebacktick on
     3352@example
     3353@group
     3354$ echo 'a^c' | sed 's/^/b/'
     3355ba^c
     3356
     3357$ echo 'a^c' | sed 's/\x5e/b/'
     3358ba^c
     3359@end group
     3360@end example
     3361@codequoteundirected off
     3362@codequotebacktick off
     3363
     3364As are the following (@samp{0x5b} and @samp{0x5d} are the hexadecimal
     3365@sc{ascii} values of @samp{[} and @samp{]}, respectively):
     3366
     3367@codequoteundirected on
     3368@codequotebacktick on
     3369@example
     3370@group
     3371$ echo abc | sed 's/[a]/x/'
     3372xbc
     3373$ echo abc | sed 's/\x5ba\x5d/x/'
     3374xbc
     3375@end group
     3376@end example
     3377@codequoteundirected off
     3378@codequotebacktick off
     3379
     3380However, it is recommended to avoid such special characters
     3381because of unexpected edge cases. For example, the following
     3382are not equivalent:
     3383
     3384@codequoteundirected on
     3385@codequotebacktick on
     3386@example
     3387@group
     3388$ echo 'a^c' | sed 's/\^/b/'
     3389abc
     3390
     3391$ echo 'a^c' | sed 's/\\\x5e/b/'
     3392a^c
     3393@end group
     3394@end example
     3395@codequoteundirected off
     3396@codequotebacktick off
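Escapes that produce ordinary characters, with no special regular-expression meaning, do not run into these edge cases. A small sketch, assuming GNU @command{sed}: @samp{0x62} (hexadecimal) and @samp{142} (octal) are both the @sc{ascii} code of @samp{b}, so both commands replace the same character (the variable names are arbitrary):

```shell
#!/bin/sh
hex=$(echo abc | sed 's/\x62/B/')    # \x62 is 'b'
oct=$(echo abc | sed 's/\o142/B/')   # \o142 (octal 142 = 98) is also 'b'
printf '%s %s\n' "$hex" "$oct"
```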
     3397
     3398@c also: this fails in different places:
     3399@c   $ sed 's/[//'
     3400@c   sed: -e expression #1, char 5: unterminated `s' command
     3401@c   $ sed 's/\x5b//'
     3402@c   sed: -e expression #1, char 8: Invalid regular expression
     3403@c
     3404@c which is OK but confusing to explain why (the first
     3405@c fails in compile.c:snarf_char_class while the second
     3406@c is passed to the regex engine and then fails).
     3407
     3408
     3409@node Locale Considerations
     3410@section Multibyte characters and Locale Considerations
     3411
     3412@value{SSED} processes valid multibyte characters in multibyte locales
     3413(e.g. @code{UTF-8}).  @footnote{Some regexp edge cases depend on the
     3414operating system and libc implementation. The examples shown are known
     3415to work as-expected on GNU/Linux systems using glibc.}
     3416
     3417@noindent The following example uses the Greek letter Capital Sigma
     3418(@value{ucsigma},
     3419Unicode code point @code{0x03A3}). In a @code{UTF-8} locale,
     3420@command{sed} correctly processes the Sigma as one character despite
     3421it being 2 octets (bytes):
     3422
     3423@codequoteundirected on
     3424@codequotebacktick on
     3425@example
     3426@group
     3427$ locale | grep LANG
     3428LANG=en_US.UTF-8
     3429
     3430$ printf 'a\u03A3b'
     3431a@value{ucsigma}b
     3432
     3433$ printf 'a\u03A3b' | sed 's/./X/g'
     3434XXX
     3435
     3436$ printf 'a\u03A3b' | od -tx1 -An
     3437 61 ce a3 62
     3438@end group
     3439@end example
     3440@codequoteundirected off
     3441@codequotebacktick off
     3442
     3443@noindent
     3444To force @command{sed} to process octets separately, use the @code{C} locale
     3445(also known as the @code{POSIX} locale):
     3446
     3447@codequoteundirected on
     3448@codequotebacktick on
     3449@example
     3450$ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g'
     3451XXXX
     3452@end example
     3453@codequoteundirected off
     3454@codequotebacktick off
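The @samp{\u} escape in the @command{printf} format strings above is an extension (Bash and GNU coreutils support it, for example) and is not required by POSIX. A more portable sketch spells out Sigma's two UTF-8 octets (@samp{0xCE 0xA3}, octal @samp{316 243}) as octal escapes, which POSIX @command{printf} does support (the variable name is arbitrary):

```shell
#!/bin/sh
# Sigma written as its two UTF-8 octets in octal: \316 = 0xCE, \243 = 0xA3.
# In the C locale every octet is a separate character, so four X's result.
c_out=$(printf 'a\316\243b\n' | LC_ALL=C sed 's/./X/g')
printf '%s\n' "$c_out"
```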
     3455
     3456@subsection Invalid multibyte characters
     3457
     3458@command{sed}'s regular expressions @emph{do not} match
     3459invalid multibyte sequences in a multibyte locale.
     3460
     3461@noindent
     3462In the following examples, the octet @code{0xCE} is
     3463an incomplete multibyte character (shown here as @value{unicodeFFFD}).
     3464The regular expression @samp{.} does not match it:
     3465
     3466@codequoteundirected on
     3467@codequotebacktick on
     3468@example
     3469@group
     3470$ printf 'a\xCEb\n'
     3471a@value{unicodeFFFD}b
     3472
     3473$ printf 'a\xCEb\n' | sed 's/./X/g'
     3474X@value{unicodeFFFD}X
     3475
     3476$ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An
     3477  58  ce  58  0a
     3478   X      X   \n
     3479@end group
     3480@end example
     3481@codequoteundirected off
     3482@codequotebacktick off
     3483
     3484@noindent Similarly, the ``catch-all'' regular expression @samp{.*} does not
     3485match the entire line:
     3486
     3487@codequoteundirected on
     3488@codequotebacktick on
     3489@example
     3490@group
     3491$ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An
     3492  ce  63  0a
     3493       c  \n
     3494@end group
     3495@end example
     3496@codequoteundirected off
     3497@codequotebacktick off
     3498
     3499@noindent
     3500@value{SSED} offers the special @command{z} command to clear the
     3501current pattern space regardless of invalid multibyte characters
     3502(i.e. it works like @code{s/.*//} but also removes invalid multibyte
     3503characters):
     3504
     3505@codequoteundirected on
     3506@codequotebacktick on
     3507@example
     3508@group
     3509$ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An
     3510   0a
     3511   \n
     3512@end group
     3513@end example
     3514@codequoteundirected off
     3515@codequotebacktick off
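Like other commands, @command{z} accepts an address. A short sketch, assuming GNU @command{sed} and an illustrative two-line input, that empties only the first line:

```shell
#!/bin/sh
# 'z' empties the pattern space of line 1; the now-empty line is still
# printed at the end of the cycle, so an empty line replaces "one".
zout=$(printf 'one\ntwo\n' | sed '1z')
printf '%s\n' "$zout"
```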
     3516
     3517@noindent Alternatively, force the @code{C} locale to process
     3518each octet separately (every octet is a valid character in the @code{C}
     3519locale):
     3520
     3521@codequoteundirected on
     3522@codequotebacktick on
     3523@example
     3524@group
     3525$ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An
     3526  0a
     3527  \n
     3528@end group
     3529@end example
     3530@codequoteundirected off
     3531@codequotebacktick off
     3532
     3533
     3534@command{sed}'s inability to process invalid multibyte characters
     3535can be used to detect such invalid sequences in a file.
     3536In the following examples, @code{\xCE\xCE} is an invalid
     3537multibyte sequence, while @code{\xCE\xA3} is a valid multibyte sequence
     3538(of the Greek Sigma character).
     3539
     3540@noindent
     3541The following @command{sed} program removes all valid
     3542characters using @code{s/.//g}.  Any content left in the pattern space
     3543(the invalid characters) is added to the hold space using the
     3544@code{H} command. On the last line (@code{$}), the hold space is retrieved
     3545(@code{x}), newlines are removed (@code{s/\n//g}), and any remaining
     3546octets are printed unambiguously (@code{l}).  Thus, any invalid
     3547multibyte sequences are printed as octal values:
     3548
     3549@codequoteundirected on
     3550@codequotebacktick on
     3551@example
     3552@group
     3553$ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt
     3554
     3555$ cat invalid.txt
     3556ab
     3557c
     3558@value{unicodeFFFD}@value{unicodeFFFD}de
     3559@value{ucsigma}f
     3560
     3561$ sed -n 's/.//g ; H ; $@{x;s/\n//g;l@}' invalid.txt
     3562\316\316$
     3563@end group
     3564@end example
     3565@codequoteundirected off
     3566@codequotebacktick off
     3567
     3568@noindent With a few more commands, @command{sed} can print
     3569the exact line number corresponding to each invalid character (line 3).
     3570These characters can then be removed by forcing the @code{C} locale
     3571and using octal escape sequences:
     3572
     3573@codequoteundirected on
     3574@codequotebacktick on
     3575@example
     3576$ sed -n 's/.//g;=;l' invalid.txt | paste - -  | awk '$2!="$"'
     35773       \316\316$
     3578
     3579$ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt
     3580@end example
     3581@codequoteundirected off
     3582@codequotebacktick off
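To confirm the fix, a sketch that recreates @file{invalid.txt} with octal @command{printf} escapes inside a temporary directory (an arbitrary choice, to avoid overwriting files) and inspects the result in the @code{C} locale, where @command{l} shows non-@sc{ascii} octets as octal escapes:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'ab\nc\n\316\316de\n\316\243f\n' > invalid.txt
LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt

line3=$(sed -n '3p' fixed.txt)           # invalid octets removed
line4=$(LC_ALL=C sed -n '4l' fixed.txt)  # valid Sigma octets kept
printf '%s\n%s\n' "$line3" "$line4"
```

Line 3 now reads @samp{de}, while line 4 still holds the two octets of Sigma.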
     3583
     3584@subsection Upper/Lower case conversion
     3585
     3586
     3587@value{SSED}'s substitute command (@code{s}) supports upper/lower
     3588case conversions using the @code{\U} and @code{\L} codes.
     3589These conversions support multibyte characters:
     3590
     3591@codequoteundirected on
     3592@codequotebacktick on
     3593@example
     3594$ printf 'ABC\u03a3\n'
     3595ABC@value{ucsigma}
     3596
     3597$ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
     3598abc@value{lcsigma}
     3599@end example
     3600@codequoteundirected off
     3601@codequotebacktick off
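The same codes work on plain @sc{ascii} text and can be applied to only part of a line. A small sketch, assuming GNU @command{sed} and illustrative input:

```shell
#!/bin/sh
up=$(echo 'hello world' | sed 's/world/\U&/')   # upper-case only the match
low=$(echo 'MiXeD Case' | sed 's/.*/\L&/')      # lower-case the whole line
printf '%s\n%s\n' "$up" "$low"
```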
     3602
     3603@noindent
     3604@xref{The "s" Command}.
     3605
     3606
     3607@subsection Multibyte regexp character classes
     3608
     3609@c TODO: fix following paragraphs (copied verbatim from 'bracket
     3610@c expression' section).
     3611
     3612In locales other than @samp{C}, the sorting sequence is not specified, and
     3613@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
     3614@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
     3615characters that it matches might even be erratic.
     3616To obtain the traditional interpretation
     3617of bracket expressions, you can use the @samp{C} locale by setting the
     3618@env{LC_ALL} environment variable to the value @samp{C}.
     3619
     3620@example
     3621# TODO: is there any real-world system/locale where 'A'
     3622#       is replaced by '-' ?
     3623$ echo A | sed 's/[a-z]/-/'
     3624A
     3625@end example
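With @env{LC_ALL} set to @samp{C}, the traditional interpretation applies and @samp{[a-d]} matches only the four lowercase letters. A sketch with illustrative input:

```shell
#!/bin/sh
# Only lowercase a through d are replaced; B, D and e are left alone.
range=$(echo 'aBcDe' | LC_ALL=C sed 's/[a-d]/-/g')
printf '%s\n' "$range"
```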
     3626
     3627The interpretation of character classes depends on the @env{LC_CTYPE} locale;
     3628for example, @samp{[[:alnum:]]} means the character class of numbers and letters
     3629in the current locale.
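For instance, in the @code{C} locale @samp{[[:alnum:]]} covers exactly the @sc{ascii} letters and digits. A sketch with illustrative input showing that punctuation such as @samp{-} and @samp{_} is not included:

```shell
#!/bin/sh
cls=$(echo 'a1-b2_c3' | LC_ALL=C sed 's/[[:alnum:]]/X/g')
printf '%s\n' "$cls"
```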
     3630
     3631TODO: show example of collation
     3632
     3633@codequoteundirected on
     3634@codequotebacktick on
     3635@example
     3636# TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
     3637$ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
     3638clichX
     3639@end example
     3640@codequoteundirected off
     3641@codequotebacktick off
     3642
     3643
     3644@node advanced sed
     3645@chapter Advanced @command{sed}: cycles and buffers
     3646
     3647@menu
     3648* Execution Cycle::          How @command{sed} works
     3649* Hold and Pattern Buffers::
     3650* Multiline techniques::     Using D,G,H,N,P to process multiple lines
     3651* Branching and flow control::
     3652@end menu
     3653
     3654@node Execution Cycle
     3655@section How @command{sed} Works
     3656
     3657@cindex Buffer spaces, pattern and hold
     3658@cindex Spaces, pattern and hold
     3659@cindex Pattern space, definition
     3660@cindex Hold space, definition
     3661@command{sed} maintains two data buffers: the active @emph{pattern} space,
     3662and the auxiliary @emph{hold} space. Both are initially empty.
     3663
     3664@command{sed} operates by performing the following cycle on each
     3665line of input: first, @command{sed} reads one line from the input
     3666stream, removes any trailing newline, and places it in the pattern space.
     3667Then commands are executed; each command can have an address associated
     3668with it: addresses are a kind of condition code, and a command is only
     3669executed if the condition is verified before the command is to be
     3670executed.
     3671
     3672When the end of the script is reached, unless the @option{-n} option
     3673is in use, the contents of pattern space are printed out to the output
     3674stream, adding back the trailing newline if it was removed.@footnote{Actually,
     3675if @command{sed} prints a line without the terminating newline, it will
     3676nevertheless print the missing newline as soon as more text is sent to
     3677the same output stream, which gives the ``least expected surprise''
     3678even though it does not make commands like @samp{sed -n p} exactly
     3679identical to @command{cat}.} Then the next cycle starts for the next
     3680input line.
     3681
     3682Unless special commands (like @samp{D}) are used, the pattern space is
     3683deleted between two cycles. The hold space, on the other hand, keeps
     3684its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
     3685@samp{g}, @samp{G} to move data between both buffers).
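Because the hold space survives between cycles, it can collect data from every line. A minimal sketch, assuming GNU @command{sed}, that joins all input lines with commas:

```shell
#!/bin/sh
# H appends each line (preceded by a newline) to the hold space. On the
# last line, x swaps it into the pattern space, the newlines become
# commas, and the leading comma (from the first H) is dropped.
joined=$(seq 3 | sed -n 'H;${x;s/\n/,/g;s/^,//;p;}')
printf '%s\n' "$joined"
```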
     3686
     3687@node Hold and Pattern Buffers
     3688@section Hold and Pattern Buffers
     3689
     3690TODO
     3691
     3692@node Multiline techniques
     3693@section Multiline techniques - using D,G,H,N,P to process multiple lines
     3694
     3695Multiple lines can be processed as one buffer using the
     3696@code{D}, @code{G}, @code{H}, @code{N} and @code{P} commands. They are similar to
     3697their lowercase counterparts (@code{d}, @code{g},
     3698@code{h}, @code{n}, @code{p}), except that these commands append or
     3699subtract data while respecting embedded newlines, allowing lines to be
     3700added to and removed from the pattern and hold spaces.
     3701
     3702They operate as follows:
    15443703@table @code
    1545 @item \w
    1546 Matches any ``word'' character.  A ``word'' character is any
    1547 letter or digit or the underscore character.
    1548 
    1549 @item \W
    1550 Matches any ``non-word'' character.
    1551 
    1552 @item \b
    1553 Matches a word boundary; that is it matches if the character
    1554 to the left is a ``word'' character and the character to the
    1555 right is a ``non-word'' character, or vice-versa.
    1556 
    1557 @item \B
    1558 Matches everywhere but on a word boundary; that is it matches
    1559 if the character to the left and the character to the right
    1560 are either both ``word'' characters or both ``non-word''
    1561 characters.
    1562 
    1563 @item \`
    1564 Matches only at the start of pattern space.  This is different
    1565 from @code{^} in multi-line mode.
    1566 
    1567 @item \'
    1568 Matches only at the end of pattern space.  This is different
    1569 from @code{$} in multi-line mode.
    1570 
    1571 @ifset PERL
    1572 @item \G
    1573 Match only at the start of pattern space or, when doing a global
    1574 substitution using the @code{s///g} command and option, at
    1575 the end-of-match position of the prior match.  For example,
    1576 @samp{s/\Ga/Z/g} will change an initial run of @code{a}s to
    1577 a run of @code{Z}s
    1578 @end ifset
     3704@item D
     3705@emph{deletes} text from the pattern space up to the first newline,
     3706and restarts the cycle.
     3707
     3708@item G
     3709@emph{appends} the contents of the hold space to the pattern space, with a
     3710newline before it.
     3711
     3712@item H
     3713@emph{appends} the contents of the pattern space to the hold space, with a
     3714newline before it.
     3715
     3716@item N
     3717@emph{appends} a newline and the next line of input to the pattern space.
     3718
     3719@item P
     3720@emph{prints} the pattern space up to the first newline.
     3721
    15793722@end table
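Two of these commands in isolation, as a small sketch assuming GNU @command{sed}: @code{G} with an empty hold space simply adds an empty line after each line, and @code{N} followed by @code{P} prints only the first line of each pair (the variable names are arbitrary):

```shell
#!/bin/sh
# G: append a newline plus the (empty) hold space, i.e. double-space.
dbl=$(seq 3 | sed G)
# N then P: join pairs of lines, print only the first line of each pair.
odd=$(seq 4 | sed -n 'N;P')
printf '%s\n--\n%s\n' "$dbl" "$odd"
```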
     3723
     3724
     3725The following example illustrates the operation of the @code{N} and
     3726@code{D} commands:
     3727
     3728@codequoteundirected on
     3729@codequotebacktick on
     3730@example
     3731@group
     3732$ seq 6 | sed -n 'N;l;D'
     37331\n2$
     37342\n3$
     37353\n4$
     37364\n5$
     37375\n6$
     3738@end group
     3739@end example
     3740@codequoteundirected off
     3741@codequotebacktick off
     3742
     3743@enumerate
     3744@item
     3745@command{sed} starts by reading the first line into the pattern space
     3746(i.e. @samp{1}).
     3747@item
     3748At the beginning of every cycle, the @code{N}
     3749command appends a newline and the next line to the pattern space
     3750(i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle).
     3751@item
     3752The @code{l} command prints the content of the pattern space
     3753unambiguously.
     3754@item
     3755The @code{D} command then removes the content of pattern
     3756space up to the first newline (leaving @samp{2} at the end of
     3757the first cycle).
     3758@item
     3759At the next cycle the @code{N} command appends a
     3760newline and the next input line to the pattern space
     3761(e.g. @samp{2}, @samp{\n}, @samp{3}).
     3762@end enumerate
     3763
     3764
     3765@cindex processing paragraphs
     3766@cindex paragraphs, processing
     3767A common technique to process blocks of text such as paragraphs
     3768(instead of line-by-line) is using the following construct:
     3769
     3770@codequoteundirected on
     3771@codequotebacktick on
     3772@example
     3773sed '/./@{H;$!d@} ; x ; s/REGEXP/REPLACEMENT/'
     3774@end example
     3775@codequoteundirected off
     3776@codequotebacktick off
     3777
     3778@enumerate
     3779@item
     3780The first expression, @code{/./@{H;$!d@}}, operates on all non-empty lines,
     3781and adds the current line (in the pattern space) to the hold space.
     3782On all lines except the last, the pattern space is deleted and the cycle is
     3783restarted.
     3784
     3785@item
     3786The other expressions @code{x} and @code{s} are executed only on empty
     3787lines (i.e. paragraph separators) and on the last line. The @code{x} command fetches the
     3788accumulated lines from the hold space back to the pattern space. The
     3789@code{s///} command then operates on all the text in the paragraph
     3790(including the embedded newlines).
     3791@end enumerate
     3792
     3793The following example demonstrates this technique:
     3794@codequoteundirected on
     3795@codequotebacktick on
     3796@example
     3797@group
     3798$ cat input.txt
     3799a a a aa aaa
     3800aaaa aaaa aa
     3801aaaa aaa aaa
     3802
     3803bbbb bbb bbb
     3804bb bb bbb bb
     3805bbbbbbbb bbb
     3806
     3807ccc ccc cccc
     3808cccc ccccc c
     3809cc cc cc cc
     3810
     3811$ sed '/./@{H;$!d@} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt
     3812
     3813START-->
     3814a a a aa aaa
     3815aaaa aaaa aa
     3816aaaa aaa aaa
     3817<--END
     3818
     3819START-->
     3820bbbb bbb bbb
     3821bb bb bbb bb
     3822bbbbbbbb bbb
     3823<--END
     3824
     3825START-->
     3826ccc ccc cccc
     3827cccc ccccc c
     3828cc cc cc cc
     3829<--END
     3830@end group
     3831@end example
     3832@codequoteundirected off
     3833@codequotebacktick off
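The @var{REGEXP}/@var{REPLACEMENT} part can rewrite each paragraph as a whole. For example, a sketch (assuming GNU @command{sed} and illustrative input) that turns every paragraph into a single line by replacing its embedded newlines with spaces:

```shell
#!/bin/sh
# Per paragraph: collect lines with H; on the separator (or the last
# line) swap them in with x, turn newlines into spaces, drop the leading one.
para=$(printf 'a\nb\n\nc\nd\n' | sed '/./{H;$!d} ; x ; s/\n/ /g ; s/^ //')
printf '%s\n' "$para"
```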
     3834
     3835For more annotated examples, @pxref{Text search across multiple lines}
     3836and @ref{Line length adjustment}.
     3837
     3838@node Branching and flow control
     3839@section Branching and Flow Control
     3840
     3841The branching commands @code{b}, @code{t}, and @code{T} enable
     3842changing the flow of @command{sed} programs.
     3843
     3844By default, @command{sed} reads an input line into the pattern buffer,
     3845then continues to process all commands in order.
     3846Commands without addresses affect all lines.
     3847Commands with addresses affect only matching lines.
     3848@xref{Execution Cycle} and @ref{Addresses overview}.
     3849
     3850@command{sed} does not support a typical @code{if/then} construct.
     3851Instead, some commands can be used as conditionals or to change the
     3852default flow control:
     3853
     3854@table @code
     3855
     3856@item d
     3857delete (clear) the current pattern space,
     3858and restart the program cycle without processing the rest of the commands
     3859and without printing the pattern space.
     3860
     3861@item D
     3862delete the contents of the pattern space @emph{up to the first newline},
     3863and restart the program cycle without processing the rest of
     3864the commands and without printing the pattern space.
     3865
     3866@item [addr]X
     3867@itemx [addr]@{ X ; X ; X @}
     3868@itemx /regexp/X
     3869@itemx /regexp/@{ X ; X ; X @}
     3870Addresses and regular expressions can be used as an @code{if/then}
     3871conditional: If @var{[addr]} matches the current pattern space,
     3872execute the command(s).
     3873For example, the command @code{/^#/d} means:
     3874@emph{if} the current pattern space matches the regular expression @code{^#} (a line
     3875starting with a hash), @emph{then} execute the @code{d} command:
     3876delete the line without printing it, and restart the program cycle
     3877immediately.
     3878
     3879@item b
     3880branch unconditionally (that is: always jump to a label, skipping
     3881or repeating other commands, without restarting a new cycle). Combined
     3882with an address, the branch can be conditionally executed on matched
     3883lines.
     3884
     3885@item t
     3886branch conditionally (that is: jump to a label) @emph{only if} a
     3887@code{s///} command has succeeded since the last input line was read
     3888or another conditional branch was taken.
     3889
     3890@item T
     3891similar but opposite to the @code{t} command: branch only if
     3892there have been @emph{no} successful substitutions since the last
     3893input line was read.
     3894@end table
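The difference between @code{t} and @code{T} in a small sketch, assuming GNU @command{sed} (@code{T} is a GNU extension). Both are used without a label, so a taken branch ends the cycle and the pattern space is printed as is:

```shell
#!/bin/sh
# t: skip the final s/// on lines where s/a/X/ succeeded.
t_out=$(printf '%s\n' a1 b2 | sed 's/a/X/;t;s/$/?/')
# T: skip the final s/// on lines where s/a/X/ did NOT succeed.
T_out=$(printf '%s\n' a1 b2 | sed 's/a/X/;T;s/$/!/')
printf '%s\n--\n%s\n' "$t_out" "$T_out"
```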
     3895
     3896
     3897The following two @command{sed} programs are equivalent.  The first
     3898(contrived) example uses the @code{b} command to skip the @code{s///}
     3899command on lines containing @samp{1}.  The second example uses an
     3900address with negation (@samp{!}) to perform substitution only on
     3901desired lines.  The @code{y///} command is still executed on all
     3902lines:
     3903
     3904@codequoteundirected on
     3905@codequotebacktick on
     3906@example
     3907@group
     3908$ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
     3909a4
     3910z5
     3911z6
     3912
     3913$ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
     3914a4
     3915z5
     3916z6
     3917@end group
     3918@end example
     3919@codequoteundirected off
     3920@codequotebacktick off
     3921
     3922
     3923
     3924@subsection Branching and Cycles
     3925@cindex labels
     3926@cindex omitting labels
     3927@cindex cycle, restarting
     3928@cindex restarting a cycle
     3929The @code{b}, @code{t} and @code{T} commands can be followed by a label
     3930(typically a single letter). Labels are defined with a colon followed by
     3931one or more letters (e.g. @samp{:x}). If the label is omitted, the
     3932branch commands restart the cycle.  Note the difference between
     3933branching to a label and restarting the cycle: when a cycle is
     3934restarted, @command{sed} first prints the current content of the
     3935pattern space, then reads the next input line into the pattern space;
     3936jumping to a label (even if it is at the beginning of the program)
     3937does not print the pattern space and does not read the next input line.
     3938
     3939The following program is a no-op. The @code{b} command (the only command
     3940in the program) does not have a label, and thus simply restarts the cycle.
     3941On each cycle, the pattern space is printed and the next input line is read:
     3942
     3943@example
     3944@group
     3945$ seq 3 | sed b
     39461
     39472
     39483
     3949@end group
     3950@end example
     3951
     3952@cindex infinite loop, branching
     3953@cindex branching, infinite loop
     3954The following example is an infinite loop: it doesn't terminate and
     3955doesn't print anything. The @code{b} command jumps to the @samp{x}
     3956label, and a new cycle is never started:
     3957
     3958@codequoteundirected on
     3959@codequotebacktick on
     3960@example
     3961@group
     3962$ seq 3 | sed ':x ; bx'
     3963
     3964# The above command requires GNU sed (which supports additional
     3965# commands following a label, without a newline). A portable equivalent:
     3966#     sed -e ':x' -e bx
     3967@end group
     3968@end example
     3969@codequoteundirected off
     3970@codequotebacktick off
     3971
     3972@cindex branching and n, N
     3973@cindex n, and branching
     3974@cindex N, and branching
     3975Branching is often complemented with the @code{n} or @code{N} commands:
     3976both commands read the next input line into the pattern space without waiting
     3977for the cycle to restart. Before reading the next input line, @code{n}
     3978prints the current pattern space then empties it, while @code{N}
     3979appends a newline and the next input line to the pattern space.
     3980
     3981Consider the following two examples:
     3982
     3983@codequoteundirected on
     3984@codequotebacktick on
     3985@example
     3986@group
     3987$ seq 3 | sed ':x ; n ; bx'
     39881
     39892
     39903
     3991
     3992$ seq 3 | sed ':x ; N ; bx'
     39931
     39942
     39953
     3996@end group
     3997@end example
     3998@codequoteundirected off
     3999@codequotebacktick off
     4000
     4001@itemize
     4002@item
     4003Neither example loops forever, despite never starting a new cycle.
     4004
     4005@item
     4006In the first example, the @code{n} command first prints the content
     4007of the pattern space, empties the pattern space, then reads the next
     4008input line.
     4009
     4010@item
     4011In the second example, the @code{N} command appends the next input
     4012line to the pattern space (with a newline).  Lines are accumulated in
     4013the pattern space until there are no more input lines to read, then
     4014the @code{N} command terminates the @command{sed} program. When the
     4015program terminates, the end-of-cycle actions are performed, and the
     4016entire pattern space is printed.
     4017
     4018@item
     4019The second example requires @value{SSED},
     4020because it uses the non-POSIX-standard behavior of @code{N}.
     4021See the ``@code{N} command on the last line'' paragraph
     4022in @ref{Reporting Bugs}.
     4023
     4024@item
     4025To further examine the difference between the two examples,
     4026try the following commands:
     4027@codequoteundirected on
     4028@codequotebacktick on
     4029@example
     4030@group
     4031printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
     4032printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
     4033printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
     4034printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
     4035@end group
     4036@end example
     4037@codequoteundirected off
     4038@codequotebacktick off
     4039
     4040@end itemize
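For reference, the last pair of suggested commands behaves as follows in GNU @command{sed} (captured into arbitrarily named variables in this sketch):

```shell
#!/bin/sh
# n prints each line before reading the next, so the pattern space never
# holds a newline and s/\n/***/ never matches.
n_out=$(printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx')
# N accumulates lines, and each pass replaces the newline just added.
N_out=$(printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx')
printf '%s\n--\n%s\n' "$n_out" "$N_out"
```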
     4041
     4042
     4043
     4044@subsection Branching example: joining lines
     4045
     4046@cindex joining lines with branching
     4047@cindex branching, joining lines
     4048@cindex quoted-printable lines, joining
     4049@cindex joining quoted-printable lines
     4050@cindex t, joining lines with
     4051@cindex b, joining lines with
     4052@cindex b, versus t
     4053@cindex t, versus b
     4054As a real-world example of using branching, consider the case of
     4055@uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files,
     4056typically used to encode email messages.
     4057In these files long lines are split and marked with a @dfn{soft line break}
     4058consisting of a single @samp{=} character at the end of the line:
     4059
     4060@example
     4061@group
     4062$ cat jaques.txt
     4063All the wor=
     4064ld's a stag=
     4065e,
     4066And all the=
     4067 men and wo=
     4068men merely =
     4069players:
     4070They have t=
     4071heir exits =
     4072and their e=
     4073ntrances;
     4074And one man=
     4075 in his tim=
     4076e plays man=
     4077y parts.
     4078@end group
     4079@end example
     4080
     4081
     4082The following program uses an address match @samp{/=$/} as a
     4083conditional: If the current pattern space ends with a @samp{=}, it
     4084reads the next input line using @code{N}, removes every @samp{=}
     4085character that is followed by a newline, and unconditionally
     4086branches (@code{b}) to the beginning of the program without restarting
     4087a new cycle. If the pattern space does not end with @samp{=}, the
     4088default action is performed: the pattern space is printed and a new
     4089cycle is started:
     4090
     4091@codequoteundirected on
     4092@codequotebacktick on
     4093@example
     4094@group
     4095$ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt
     4096All the world's a stage,
     4097And all the men and women merely players:
     4098They have their exits and their entrances;
     4099And one man in his time plays many parts.
     4100@end group
     4101@end example
     4102@codequoteundirected off
     4103@codequotebacktick off
     4104
     4105Here's an alternative program with a slightly different approach: On
     4106all lines except the last, @code{N} appends the line to the pattern
     4107space.  A substitution command then removes soft line breaks
     4108(@samp{=} at the end of a line, i.e. followed by a newline) by replacing
     4109them with an empty string.
     4110@emph{If} the substitution was successful (meaning the pattern space contained
     4111a line which should be joined), the conditional branch command @code{t} jumps
     4112to the beginning of the program without completing or restarting the cycle.
     4113If the substitution failed (meaning there were no soft line breaks),
     4114the @code{t} command will @emph{not} branch. Then, @code{P} will
     4115print the pattern space content until the first newline, and @code{D}
     4116will delete the pattern space content until the first newline.
     4117(To learn more about @code{N}, @code{P} and @code{D} commands
     4118@pxref{Multiline techniques}).
     4119
     4120
     4121@codequoteundirected on
     4122@codequotebacktick on
     4123@example
     4124@group
     4125$ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
     4126All the world's a stage,
     4127And all the men and women merely players:
     4128They have their exits and their entrances;
     4129And one man in his time plays many parts.
     4130@end group
     4131@end example
     4132@codequoteundirected off
     4133@codequotebacktick off
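Both programs can be checked against each other without a @file{jaques.txt} file. A sketch, assuming GNU @command{sed}, using the first two verses as illustrative input (the variable names are arbitrary):

```shell
#!/bin/sh
# Shortened quoted-printable input with soft line breaks.
input=$(printf '%s\n' 'All the wor=' "ld's a stag=" 'e,' \
  'And all the=' ' men and wo=' 'men merely =' 'players:')

b_out=$(printf '%s\n' "$input" | sed ':x ; /=$/ { N ; s/=\n//g ; bx }')
t_out=$(printf '%s\n' "$input" | sed ':x ; $!N ; s/=\n// ; tx ; P ; D')
printf '%s\n--\n%s\n' "$b_out" "$t_out"
```

Both print the same two joined lines.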
     4134
     4135
     4136For more line-joining examples @pxref{Joining lines}.
     4137
    15804138
    15814139@node Examples
     
    15864144
    15874145@menu
     4146
     4147Useful one-liners:
     4148* Joining lines::
     4149
    15884150Some exotic examples:
    15894151* Centering lines::
     
    15924154* Print bash environment::
    15934155* Reverse chars of lines::
     4156* Text search across multiple lines::
     4157* Line length adjustment::
     4158* Adding a header to multiple files::
    15944159
    15954160Emulating standard utilities:
     
    16084173@end menu
    16094174
     4175@node Joining lines
     4176@section Joining lines
     4177
     4178This section uses @code{N}, @code{D} and @code{P} commands to process
     4179multiple lines, and the @code{b} and @code{t} commands for branching.
     4180@xref{Multiline techniques} and @ref{Branching and flow control}.
     4181
     4182Join specific lines (e.g. if lines 2 and 3 need to be joined):
     4183
     4184@codequoteundirected on
     4185@codequotebacktick on
     4186@example
     4187$ cat lines.txt
     4188hello
     4189hel
     4190lo
     4191hello
     4192
     4193$ sed '2@{N;s/\n//;@}' lines.txt
     4194hello
     4195hello
     4196hello
     4197@end example
     4198@codequoteundirected off
     4199@codequotebacktick off
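When the line to join is only known at run time, the same one-liner can take its address from a shell variable. A minimal sketch, assuming GNU sed; the variable name `n` is illustrative only. Double quotes let the shell substitute the number, and the braces in `${n}` keep the shell from reading the following text as part of the variable name.

```shell
# Line to join with its successor (illustrative value).
n=2

# Inside double quotes, \n is passed through unchanged, so sed sees: 2{N;s/\n//;}
joined=$(printf 'hello\nhel\nlo\nhello\n' | sed "${n}{N;s/\n//;}")
printf '%s\n' "$joined"
```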
     4200
     4201Join backslash-continued lines:
     4202
     4203@codequoteundirected on
     4204@codequotebacktick on
     4205@example
     4206$ cat 1.txt
     4207this \
     4208is \
     4209a \
     4210long \
     4211line
     4212and another \
     4213line
     4214
     4215$ sed -e ':x /\\$/ @{ N; s/\\\n//g ; bx @}'  1.txt
     4216this is a long line
     4217and another line
     4218
     4219
     4220# Note: the above requires GNU sed; non-GNU seds need
     4221#       newlines after ':' and 'b'.
     4222@end example
     4223@codequoteundirected off
     4224@codequotebacktick off
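As the comment inside the example says, other sed implementations need a newline after the ':' label and after the 'b' branch. One way to supply those newlines is to pass each piece of the script as a separate -e option, because each -e fragment is treated as a separate script line. The sketch below was written with GNU sed in mind and should also suit POSIX seds; the input is an inline copy of the sample file above.

```shell
# Inline copy of the sample input (single quotes keep the backslashes literal).
input='this \
is \
a \
long \
line
and another \
line'

# One command per -e option, so labels and branches end at a newline.
result=$(printf '%s\n' "$input" |
    sed -e ':x' -e '/\\$/{' -e 'N' -e 's/\\\n//g' -e 'bx' -e '}')
printf '%s\n' "$result"
```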
     4225
     4226Join lines that start with whitespace (e.g. SMTP headers):
     4227
     4228@codequoteundirected on
     4229@codequotebacktick on
     4230@example
     4231@group
     4232$ cat 2.txt
     4233Subject: Hello
     4234    World
     4235Content-Type: multipart/alternative;
     4236    boundary=94eb2c190cc6370f06054535da6a
     4237Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
     4238Authentication-Results: mx.gnu.org;
     4239       dkim=pass header.i=@@gnu.org;
     4240       spf=pass
     4241Message-ID: <abcdef@@gnu.org>
     4242From: John Doe <jdoe@@gnu.org>
     4243To: Jane Smith <jsmith@@gnu.org>
     4244
     4245$ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
     4246Subject: Hello World
     4247Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
     4248Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
     4249Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass
     4250Message-ID: <abcdef@@gnu.org>
     4251From: John Doe <jdoe@@gnu.org>
     4252To: Jane Smith <jsmith@@gnu.org>
     4253
     4254# A portable (non-gnu) variation:
     4255#   sed -e :a -e '$!N;s/\n  */ /;ta' -e 'P;D'
     4256@end group
     4257@end example
     4258@codequoteundirected off
     4259@codequotebacktick off
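To see that the GNU command and the portable variation agree, the sketch below runs both on a shortened, hypothetical copy of the header file, assuming GNU sed. The continuation lines are indented with spaces only; with tab-indented continuations the two would differ, since `\s+` also matches tabs while `\n  *` matches spaces only.

```shell
# Shortened header sample; continuation lines start with spaces.
headers='Subject: Hello
    World
Content-Type: multipart/alternative;
    boundary=94eb2c190cc6370f06054535da6a
Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)'

# GNU form: \s+ matches any run of whitespace after the newline.
gnu=$(printf '%s\n' "$headers" | sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D')

# Portable form: a newline followed by one or more spaces.
portable=$(printf '%s\n' "$headers" | sed -e :a -e '$!N;s/\n  */ /;ta' -e 'P;D')

printf '%s\n' "$gnu"
```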
     4260
     4261
    16104262@node Centering lines
    16114263@section Centering Lines
     
    16344286
    16354287@group
    1636 # del leading and trailing spaces
    1637 y/@kbd{tab}/ /
     4288# delete leading and trailing spaces
     4289y/@kbd{@key{TAB}}/ /
    16384290s/^ *//
    16394291s/ *$//
     
    16844336
    16854337@group
    1686 # replace all leading 9s by _ (any other character except digits, could
     4338# replace all trailing 9s by _ (any other character except digits, could
    16874339# be used)
    16884340:d
     
    16944346# incr last digit only.  The first line adds a most-significant
    16954347# digit of 1 if we have to add a digit.
    1696 #
    1697 # The @code{tn} commands are not necessary, but make the thing
    1698 # faster
    16994348@end group
    17004349
     
    17274376seen a script converting the output of @command{date} into a @command{bc}
    17284377program!
    1729  
     4378
    17304379The main body of this is the @command{sed} script, which remaps the name
    1731 from lower to upper (or vice-versa) and even checks out 
     4380from lower to upper (or vice versa) and even checks
     4381whether the remapped name is the same as the original name.
    17334382Note how the script is parameterized using shell
     
    17384387@group
    17394388#! /bin/sh
    1740 # rename files to lower/upper case... 
     4389# rename files to lower/upper case...
    17414390#
    1742 # usage: 
    1743 #    move-to-lower * 
    1744 #    move-to-upper * 
     4391# usage:
     4392#    move-to-lower *
     4393#    move-to-upper *
    17454394# or
    17464395#    move-to-lower -R .
     
    17524401help()
    17534402@{
    1754         cat << eof
     4403        cat << eof
    17554404Usage: $0 [-n] [-r] [-h] files...
    17564405@end group
     
    17854434while :
    17864435do
    1787     case "$1" in 
     4436    case "$1" in
    17884437        -n) apply_cmd='cat' ;;
    17894438        -R) finder='find "$@@" -type f';;
     
    18134462esac
    18144463@end group
    1815        
     4464
    18164465eval $finder | sed -n '
    18174466
     
    18554504@group
    18564505# check if converted file name is equal to original file name,
    1857 # if it is, do not print nothing
     4506# if it is, do not print anything
    18584507/^.*\/\(.*\)\n\1/b
     4508@end group
     4509
     4510@group
     4511# escape special characters for the shell
     4512s/["$`\\]/\\&/g
    18594513@end group
    18604514
     
    19744628@c end---------------------------------------------
    19754629
     4630
     4631@node Text search across multiple lines
     4632@section Text search across multiple lines
     4633
     4634This section uses @code{N} and @code{D} commands to search for
     4635consecutive words spanning multiple lines. @xref{Multiline techniques}.
     4636
     4637These examples deal with finding doubled occurrences of words in a document.
     4638
     4639Finding doubled words in a single line is easy using GNU @command{grep}
     4640and similarly with @value{SSED}:
     4641
     4642@c NOTE: in all examples, 'the@ the' is used to prevent
     4643@c 'make syntax-check' from complaining about double words.
     4644@codequoteundirected on
     4645@codequotebacktick on
     4646@example
     4647@group
     4648$ cat two-cities-dup1.txt
     4649It was the best of times,
     4650it was the worst of times,
     4651it was the@ the age of wisdom,
     4652it was the age of foolishness,
     4653
     4654$ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
     4655it was the@ the age of wisdom,
     4656
     4657$ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
     46583:it was the@ the age of wisdom,
     4659
     4660$ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
     4661it was the@ the age of wisdom,
     4662
     4663$ sed -En '/\b(\w+)\s+\1\b/@{=;p@}' two-cities-dup1.txt
     46643
     4665it was the@ the age of wisdom,
     4666@end group
     4667@end example
     4668@codequoteundirected off
     4669@codequotebacktick off
     4670
     4671@itemize @bullet
     4672@item
     4673The regular expression @samp{\b\w+\s+} searches for a word boundary (@samp{\b}),
     4674followed by one or more word characters (@samp{\w+}), followed by whitespace
     4675(@samp{\s+}). @xref{regexp extensions}.
     4676
     4677@item
     4678Adding parentheses around the @samp{(\w+)} expression creates a subexpression.
     4679The regular expression pattern @samp{(PATTERN)\s+\1} defines a subexpression
     4680(in the parentheses) followed by a back-reference, separated by whitespace.
     4681A successful match means the @var{PATTERN} was repeated twice in succession.
     4682@xref{Back-references and Subexpressions}.
     4683
     4684@item
     4685The word-boundary expression (@samp{\b}) at both ends ensures partial
     4686words are not matched (e.g. @samp{the then} is not a desired match).
     4687@c Thanks to Jim for pointing this out in
     4688@c https://lists.gnu.org/archive/html/sed-devel/2016-12/msg00041.html
     4689
     4690@item
     4691The @option{-E} option enables extended regular expression syntax, removing
     4692the need to add backslashes before the parentheses. @xref{ERE syntax}.
     4693
     4694@end itemize
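The effect of the trailing `\b` can be checked on a few hypothetical sample lines. This sketch assumes GNU sed (for `\b`, `\w` and `\s`): with both word boundaries only true doubled words match, while dropping the final `\b` also accepts the partial-word case `the then`.

```shell
# A doubled word, a partial-word near miss, and another doubled word.
sample=$(printf '%s\n' 'it was the the age' 'see the then' 'this is is it')

# Word boundary on both sides: only whole doubled words match.
strict=$(printf '%s\n' "$sample" | sed -En '/\b(\w+)\s+\1\b/p')

# Without the final \b, "the" followed by "then" also matches.
loose=$(printf '%s\n' "$sample" | sed -En '/\b(\w+)\s+\1/p')

printf '%s\n' "$strict"
```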
     4695
     4696When the doubled words span two lines, the above regular expression
     4697will not find them, as @command{grep} and @command{sed} operate line by line.
     4698
     4699By using @command{N} and @command{D} commands, @command{sed} can apply
     4700regular expressions on multiple lines (that is, multiple lines are stored
     4701in the pattern space, and the regular expression works on it):
     4702
     4703@c NOTE: use 'the@*the' instead of a real new line to prevent
     4704@c 'make syntax-check' from complaining about doubled words.
     4705@codequoteundirected on
     4706@codequotebacktick on
     4707@example
     4708$ cat two-cities-dup2.txt
     4709It was the best of times, it was the
     4710worst of times, it was the@*the age of wisdom,
     4711it was the age of foolishness,
     4712
     4713$ sed -En '@{N; /\b(\w+)\s+\1\b/@{=;p@} ; D@}'  two-cities-dup2.txt
     47143
     4715worst of times, it was the@*the age of wisdom,
     4716@end example
     4717@codequoteundirected off
     4718@codequotebacktick off
     4719
     4720@itemize @bullet
     4721@item
     4722The @command{N} command appends the next line to the pattern space
     4723(thus ensuring it contains two consecutive lines in every cycle).
     4724
     4725@item
     4726The regular expression uses @samp{\s+} for word separator which matches
     4727both spaces and newlines.
     4728
     4729@item
     4730The regular expression matches, the entire pattern space is printed
     4731with @command{p}. No lines are printed by default due to the @option{-n} option.
     4732
     4733@item
     4734The @command{D} removes the first line from the pattern space (up until the
     4735first newline), readying it for the next cycle.
     4736@end itemize
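The difference between line-by-line matching and the two-line window can be shown on one input. A minimal sketch, assuming GNU sed; the sample text is an inline, hypothetical copy of the file above, with the doubled word straddling lines 2 and 3.

```shell
# Inline sample: the doubled word spans lines 2 and 3.
text='It was the best of times, it was the
worst of times, it was the
the age of wisdom,'

# Line-by-line matching finds nothing.
single=$(printf '%s\n' "$text" | sed -En '/\b(\w+)\s+\1\b/p')

# A two-line window built with N and D finds the duplicate and prints
# the line number of the second line, then both lines.
window=$(printf '%s\n' "$text" | sed -En '{N; /\b(\w+)\s+\1\b/{=;p} ; D}')

printf 'single-line: [%s]\n' "$single"
printf '%s\n' "$window"
```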
     4737
     4738See the GNU @command{coreutils} manual for an alternative solution using
     4739@command{tr -s} and @command{uniq} at
     4740@c NOTE: cheating and keeping the URL line shorter than 80 characters
     4741@c by using 'gnu.org' and '/s/'.
     4742@url{https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html}.
     4743
     4744@node Line length adjustment
     4745@section Line length adjustment
     4746
     4747This section uses @code{N} and @code{P} commands to read and write
     4748lines, and the @code{b} command for branching.
     4749@xref{Multiline techniques} and @ref{Branching and flow control}.
     4750
     4751This (somewhat contrived) example deal with formatting and wrapping
     4752lines of text of the following input file:
     4753
     4754@example
     4755@group
     4756$ cat two-cities-mix.txt
     4757It was the best of times, it was
     4758the worst of times, it
     4759was the age of
     4760wisdom,
     4761it
     4762was
     4763the age
     4764of foolishness,
     4765@end group
     4766@end example
     4767
     4768@exdent The following sed program wraps lines at 40 characters:
     4769@codequoteundirected on
     4770@codequotebacktick on
     4771@example
     4772@group
     4773$ cat wrap40.sed
     4774# outer loop
     4775:x
     4776
     4777# Append a newline followed by the next input line to the pattern buffer
     4778N
     4779
     4780# Replace all newlines in the pattern buffer with spaces
     4781s/\n/ /g
     4782
     4783
     4784# Inner loop
     4785:y
     4786
     4787# Add a newline after the first 40 characters
     4788s/(.@{40,40@})/\1\n/
     4789
     4790# If there is a newline in the pattern buffer
     4791# (i.e. the previous substitution added a newline)
     4792/\n/ @{
     4793    # There are newlines in the pattern buffer -
     4794    # print the content until the first newline.
     4795    P
     4796
     4797    # Remove the printed characters and the first newline
     4798    s/.*\n//
     4799
     4800    # branch to label 'y' - repeat inner loop
     4801    by
     4802@}
     4803
     4804# No newlines in the pattern buffer - Branch to label 'x' (outer loop)
     4805# and read the next input line
     4806bx
     4807@end group
     4808@end example
     4809@codequoteundirected off
     4810@codequotebacktick off
     4811
     4812
     4813
     4814@exdent The wrapped output:
     4815@codequoteundirected on
     4816@codequotebacktick on
     4817@example
     4818@group
     4819$ sed -E -f wrap40.sed two-cities-mix.txt
     4820It was the best of times, it was the wor
     4821st of times, it was the age of wisdom, i
     4822t was the age of foolishness,
     4823@end group
     4824@end example
     4825@codequoteundirected off
     4826@codequotebacktick off
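The same logic can be run without a separate script file, with the wrap width taken from a shell variable. This is a sketch under a few assumptions: GNU sed (which prints the pattern space when N runs out of input, unless POSIXLY_CORRECT is set), a hypothetical variable name `width`, and one command per -e option. A width of 40 should reproduce the wrapped output shown above.

```shell
# Wrap width (illustrative variable name).
width=40

# Same commands as wrap40.sed, one per -e option.
# Inside double quotes, \\1 and \\n reach sed as \1 and \n.
wrapped=$(printf '%s\n' 'It was the best of times, it was' 'the worst of times, it' \
        'was the age of' 'wisdom,' 'it' 'was' 'the age' 'of foolishness,' |
    sed -E -e ':x' -e 'N' -e 's/\n/ /g' \
           -e ':y' -e "s/(.{$width})/\\1\\n/" \
           -e '/\n/{' -e 'P' -e 's/.*\n//' -e 'by' -e '}' \
           -e 'bx')
printf '%s\n' "$wrapped"
```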
     4827
     4828
     4829
     4830
     4831@node Adding a header to multiple files
     4832@section Adding a header to multiple files
     4833
     4834@value{SSED} can be used to safely modify multiple files at once.
     4835
     4836@exdent Add a single line to the beginning of source code files:
     4837
     4838@codequoteundirected on
     4839@codequotebacktick on
     4840@example
     4841sed -i '1i/* Copyright (C) FOO BAR */' *.c
     4842@end example
     4843@codequoteundirected off
     4844@codequotebacktick off
     4845
     4846@exdent Adding a few lines is possible using @samp{\n} in the text:
     4847
     4848@codequoteundirected on
     4849@codequotebacktick on
     4850@example
     4851sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c
     4852@end example
     4853@codequoteundirected off
     4854@codequotebacktick off
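Because `-i` rewrites files in place, it helps to try a header command on scratch copies first. A minimal sketch, assuming GNU sed and hypothetical file names `a.c` and `b.c` in a temporary directory; it prepends the three-line comment from the example and keeps the results in variables before cleaning up.

```shell
# Scratch directory so no real source files are touched.
dir=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$dir/a.c"
printf '#include <stdio.h>\n' > "$dir/b.c"

# Insert a three-line comment before line 1 of every .c file, in place.
sed -i '1i/*\n * Copyright (C) FOO BAR\n */' "$dir"/*.c

a_out=$(cat "$dir/a.c")
b_line4=$(sed -n 4p "$dir/b.c")
rm -r "$dir"
printf '%s\n' "$a_out"
```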
     4855
     4856To add multiple lines from another file, use @code{0rFILE}.
     4857A typical use case is adding a license notice header to all files:
     4858
     4859@codequoteundirected on
     4860@codequotebacktick on
     4861@example
     4862## Create the header file:
     4863$ cat<<'EOF'>LIC.TXT
     4864/*
     4865    Copyright (C) 1989-2021 FOO BAR
     4866
     4867    This program is free software; you can redistribute it and/or modify
     4868    it under the terms of the GNU General Public License as published by
     4869    the Free Software Foundation; either version 3, or (at your option)
     4870    any later version.
     4871
     4872    This program is distributed in the hope that it will be useful,
     4873    but WITHOUT ANY WARRANTY; without even the implied warranty of
     4874    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     4875    GNU General Public License for more details.
     4876
     4877    You should have received a copy of the GNU General Public License
     4878    along with this program; If not, see <https://www.gnu.org/licenses/>.
     4879*/
     4880EOF
     4881
     4882## Add the file at the beginning of all source code files:
     4883$ sed -i '0rLIC.TXT' *.cpp *.h
     4884@end example
     4885@codequoteundirected off
     4886@codequotebacktick off
     4887
     4888
     4889With script files (e.g. @file{.sh}, @file{.py}, @file{.pl} files)
     4890the license notice typically appears @emph{after} the first line (the
     4891'shebang' @samp{#!} line). The @code{1rFILE} command will add @file{FILE}
     4892@emph{after} the first line:
     4893
     4894@codequoteundirected on
     4895@codequotebacktick on
     4896@example
     4897## Create the header file:
     4898$ cat<<'EOF'>LIC.TXT
     4899##
     4900## Copyright (C) 1989-2021 FOO BAR
     4901##
     4902## This program is free software; you can redistribute it and/or modify
     4903## it under the terms of the GNU General Public License as published by
     4904## the Free Software Foundation; either version 3, or (at your option)
     4905## any later version.
     4906##
     4907## This program is distributed in the hope that it will be useful,
     4908## but WITHOUT ANY WARRANTY; without even the implied warranty of
     4909## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     4910## GNU General Public License for more details.
     4911##
     4912## You should have received a copy of the GNU General Public License
     4913## along with this program; If not, see <https://www.gnu.org/licenses/>.
     4914##
     4915##
     4916EOF
     4917
     4918## Add the file after the first line of all script files:
     4919$ sed -i '1rLIC.TXT' *.py *.sh
     4920@end example
     4921@codequoteundirected off
     4922@codequotebacktick off
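A quick way to confirm that the `1rFILE` form keeps the shebang line first is to run it on a scratch file. A minimal sketch, assuming GNU sed; the header text and the file name `hello.sh` are hypothetical.

```shell
# Scratch directory with a short header and a small script.
dir=$(mktemp -d)
printf '##\n## Copyright (C) 1989-2021 FOO BAR\n##\n' > "$dir/LIC.TXT"
printf '#!/bin/sh\necho hello\n' > "$dir/hello.sh"

# r queues the file's contents for output after line 1 (the shebang line).
sed -i "1r $dir/LIC.TXT" "$dir/hello.sh"

result=$(cat "$dir/hello.sh")
rm -r "$dir"
printf '%s\n' "$result"
```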
     4923
     4924The above @command{sed} commands can be combined with @command{find}
     4925to locate files in all subdirectories, @command{xargs} to run additional
     4926commands on selected files and @command{grep} to filter out files that already
     4927contain a copyright notice:
     4928
     4929@codequoteundirected on
     4930@codequotebacktick on
     4931@example
     4932find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
     4933    | xargs grep -Li copyright \
     4934    | xargs -r sed -i '0rLIC.TXT'
     4935@end example
     4936@codequoteundirected off
     4937@codequotebacktick off
     4938
     4939@exdent Or a slightly safer version (handling file names with spaces and newlines):
     4940
     4941@codequoteundirected on
     4942@codequotebacktick on
     4943@example
     4944find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
     4945    | xargs -0 grep -Z -Li copyright \
     4946    | xargs -0 -r sed -i '0rLIC.TXT'
     4947@end example
     4948@codequoteundirected off
     4949@codequotebacktick off
     4950
     4951Note: using the @code{0} address with the @code{r} command requires @value{SSED}
     4952version 4.9 or later. @xref{Zero Address}.
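The NUL-separated pipeline can be exercised on a scratch tree. Because the `0r` form needs sed 4.9 or later, this sketch substitutes the single-line `1i` header from the start of this section so it also runs with older GNU sed releases. It assumes GNU find, xargs and grep; the file names are hypothetical, one contains a space and one already carries a copyright line.

```shell
# Scratch tree: a name with a space, a file already marked, and a header.
dir=$(mktemp -d)
printf 'int x;\n' > "$dir/new file.c"
printf '/* Copyright (C) OLD */\nint y;\n' > "$dir/old.c"
printf 'int z;\n' > "$dir/util.h"

# NUL-separated names survive spaces; grep -L keeps only files lacking
# "copyright" (case-insensitive), and xargs -r skips sed if none remain.
find "$dir" \( -iname '*.c' -o -iname '*.h' \) -print0 \
    | xargs -0 grep -Z -Li copyright \
    | xargs -0 -r sed -i '1i/* Copyright (C) FOO BAR */'

new_first=$(head -n 1 "$dir/new file.c")
h_first=$(head -n 1 "$dir/util.h")
old_first=$(head -n 1 "$dir/old.c")
old_count=$(grep -ci copyright "$dir/old.c")
rm -r "$dir"
```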
     4953
     4954
     4955
    19764956@node tac
    19774957@section Reverse Lines of Files
     
    19814961is a @command{tac} workalike.
    19824962
    1983 Note that on implementations other than @acronym{GNU} @command{sed}
    1984 @ifset PERL
    1985 and @value{SSED}
    1986 @end ifset
     4963Note that on implementations other than GNU @command{sed}
    19874964this script might easily overflow internal buffers.
    19884965
     
    20154992
    20164993This script replaces @samp{cat -n}; in fact it formats its output
    2017 exactly like @acronym{GNU} @command{cat} does.
     4994exactly like GNU @command{cat} does.
    20184995
    20194996Of course this is completely useless and for two reasons:  first,
     
    22545231@group
    22555232# Convert words to a's
    2256 s/[ @kbd{tab}][ @kbd{tab}]*/ /g
     5233s/[ @kbd{@key{TAB}}][ @kbd{@key{TAB}}]*/ /g
    22575234s/^/ /
    22585235s/ [^ ][^ ]*/a /g
     
    24315408@c end---------------------------------------------
    24325409
    2433 As you can see, we mantain a 2-line window using @code{P} and @code{D}.
     5410As you can see, we maintain a 2-line window using @code{P} and @code{D}.
    24345411This technique is often used in advanced @command{sed} scripts.
    24355412
     
    25855562fastest.  Note that loops are completely done with @code{n} and
    25865563@code{b}, without relying on @command{sed} to restart the
    2587 the script automatically at the end of a line.
     5564script automatically at the end of a line.
    25885565
    25895566@c start-------------------------------------------
     
    26035580# get next
    26045581n
    2605 # got chars? print it again, etc... 
     5582# got chars? print it again, etc...
    26065583/./bx
    26075584@end group
     
    26315608@chapter @value{SSED}'s Limitations and Non-limitations
    26325609
    2633 @cindex @acronym{GNU} extensions, unlimited line length
     5610@cindex GNU extensions, unlimited line length
    26345611@cindex Portability, line length limitations
    26355612For those who want to write portable @command{sed} scripts,
     
    26475624the size of the buffer that can be processed by certain patterns.
    26485625
    2649 @ifset PERL
    2650 There are some size limitations in the regular expression
    2651 matcher but it is hoped that they will never in practice
    2652 be relevant.  The maximum length of a compiled pattern
    2653 is 65539 (sic) bytes.  All values in repeating quantifiers
    2654 must be less than 65536.  The maximum nesting depth of
    2655 all parenthesized subpatterns, including capturing and
    2656 non-capturing subpatterns@footnote{The
    2657 distinction is meaningful when referring to Perl-style
    2658 regular expressions.}, assertions, and other types of
    2659 subpattern, is 200.
    2660 
    2661 Also, @value{SSED} recognizes the @sc{posix} syntax
    2662 @code{[.@var{ch}.]} and @code{[=@var{ch}=]}
    2663 where @var{ch} is a ``collating element'', but these
    2664 are not supported, and an error is given if they are
    2665 encountered.
    2666 
    2667 Here are a few distinctions between the real Perl-style
    2668 regular expressions and those that @option{-R} recognizes.
    2669 
    2670 @enumerate
    2671 @item
    2672 Lookahead assertions do not allow repeat quantifiers after them
    2673 Perl permits them, but they do not mean what you
    2674 might think. For example, @samp{(?!a)@{3@}} does not assert that the
    2675 next three characters are not @samp{a}. It just asserts three times that the
    2676 next character is not @samp{a} --- a waste of time and nothing else.
    2677 
    2678 @item
    2679 Capturing subpatterns that occur inside  negative  lookahead
    2680 head  assertions  are  counted,  but  their  entries are counted
    2681 as empty in the second half of an @code{s} command.
    2682 Perl sets its numerical variables from any such patterns
    2683 that are matched before the assertion fails to match
    2684 something (thereby succeeding), but only if the negative
    2685 lookahead assertion contains just one branch.
    2686 
    2687 @item
    2688 The following Perl escape sequences are not supported:
    2689 @samp{\l}, @samp{\u}, @samp{\L}, @samp{\U}, @samp{\E},
    2690 @samp{\Q}. In fact these are implemented by Perl's general
    2691 string-handling and are not part of its pattern matching engine.
    2692 
    2693 @item
    2694 The Perl @samp{\G} assertion is not supported as it is not
    2695 relevant to single pattern matches.
    2696 
    2697 @item
    2698 Fairly obviously, @value{SSED} does not support the @samp{(?@{code@})}
    2699 and @samp{(?p@{code@})} constructions. However, there is some experimental
    2700 support for recursive patterns using the non-Perl item @samp{(?R)}.
    2701 
    2702 @item
    2703 There are at the time of writing some oddities in Perl
    2704 5.005_02 concerned with the settings of captured strings
    2705 when part of a pattern is repeated. For example, matching
    2706 @samp{aba} against the pattern @samp{/^(a(b)?)+$/} sets
    2707 @samp{$2}@footnote{@samp{$2} would be @samp{\2} in @value{SSED}.}
    2708 to the value @samp{b}, but matching @samp{aabbaa}
    2709 against @samp{/^(aa(bb)?)+$/} leaves @samp{$2}
    2710 unset.  However, if the pattern is changed to
    2711 @samp{/^(aa(b(b))?)+$/} then @samp{$2} (and @samp{$3}) are set.
    2712 In Perl 5.004 @samp{$2} is set in both cases, and that is also
    2713 true of @value{SSED}.
    2714 
    2715 @item
    2716 Another as yet unresolved discrepancy is that in Perl
    2717 5.005_02 the pattern @samp{/^(a)?(?(1)a|b)+$/} matches
    2718 the string @samp{a}, whereas in @value{SSED} it does not.
    2719 However, in both Perl and @value{SSED} @samp{/^(a)?a/} matched
    2720 against @samp{a} leaves $1 unset.
    2721 @end enumerate
    2722 @end ifset
    27235626
    27245627@node Other Resources
    27255628@chapter Other Resources for Learning About @command{sed}
    27265629
     5630For up to date information about @value{SSED} please
     5631visit @uref{https://www.gnu.org/software/sed/}.
     5632
     5633Send general questions and suggestions to @email{sed-devel@@gnu.org}.
     5634Visit the mailing list archives for past discussions at
     5635@uref{https://lists.gnu.org/archive/html/sed-devel/}.
     5636
    27275637@cindex Additional reading about @command{sed}
    2728 In addition to several books that have been written about @command{sed}
    2729 (either specifically or as chapters in books which discuss
    2730 shell programming), one can find out more about @command{sed}
    2731 (including suggestions of a few books) from the FAQ
    2732 for the @code{sed-users} mailing list, available from any of:
    2733 @display
    2734  @uref{http://www.student.northpark.edu/pemente/sed/sedfaq.html}
    2735  @uref{http://sed.sf.net/grabbag/tutorials/sedfaq.html}
    2736 @end display
    2737 
    2738 Also of interest are
    2739 @uref{http://www.student.northpark.edu/pemente/sed/index.htm}
    2740 and @uref{http://sed.sf.net/grabbag},
    2741 which include @command{sed} tutorials and other @command{sed}-related goodies.
    2742 
    2743 The @code{sed-users} mailing list itself maintained by Sven Guckes.
    2744 To subscribe, visit @uref{http://groups.yahoo.com} and search
    2745 for the @code{sed-users} mailing list.
     5638The following resources provide information about @command{sed}
     5639(both @value{SSED} and other variations). Note these are not maintained by
     5640@value{SSED} developers.
     5641
     5642@itemize @bullet
     5643
     5644@item
     5645sed @code{$HOME}: @uref{http://sed.sf.net}
     5646
     5647@item
     5648sed FAQ: @uref{http://sed.sf.net/sedfaq.html}
     5649
     5650@item
     5651seder's grabbag: @uref{http://sed.sf.net/grabbag}
     5652
     5653@item
     5654The @code{sed-users} mailing list maintained by Sven Guckes:
     5655@uref{http://groups.yahoo.com/group/sed-users/}
     5656(note this is @emph{not} the @value{SSED} mailing list).
     5657
     5658@end itemize
    27465659
    27475660@node Reporting Bugs
     
    27495662
    27505663@cindex Bugs, reporting
    2751 Email bug reports to @email{bonzini@@gnu.org}.
    2752 Be sure to include the word ``sed'' somewhere in the @code{Subject:} field.
     5664Email bug reports to @email{bug-sed@@gnu.org}.
    27535665Also, please include the output of @samp{sed --version} in the body
    27545666of your report if at all possible.
     
    27575669
    27585670@example
    2759 @i{while building frobme-1.3.4}
    2760 $ configure 
     5671@i{@i{@r{while building frobme-1.3.4}}}
     5672$ configure
    27615673@error{} sed: file sedscr line 1: Unknown option to 's'
    27625674@end example
     
    27775689
    27785690@table @asis
     5691@anchor{N_command_last_line}
    27795692@item @code{N} command on the last line
    27805693@cindex Portability, @code{N} command on the last line
     
    27865699the @command{-n} command switch has been specified.  This choice is
    27875700by design.
     5701
     5702Default behavior (GNU extension, non-POSIX-conforming):
     5703@example
     5704$ seq 3 | sed N
     57051
     57062
     57073
     5708@end example
     5709@noindent
     5710To force POSIX-conforming behavior:
     5711@example
     5712$ seq 3 | sed --posix N
     57131
     57142
     5715@end example
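The two behaviors, and a guard that makes them agree, can be compared in one place. A minimal sketch, assuming GNU sed and coreutils `seq`: guarding `N` with `$!` means it is never executed on the last line, so the result no longer depends on the mode.

```shell
# GNU default: N on the last line prints the pattern space before exiting.
default=$(seq 3 | sed N)

# POSIX mode: N on the last line exits without printing, so "3" is lost.
posix=$(seq 3 | sed --posix N)

# Guarding N with $! gives the same output in either mode.
guarded=$(seq 3 | sed --posix '$!N')

printf '%s\n---\n%s\n---\n%s\n' "$default" "$posix" "$guarded"
```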
    27885716
    27895717For example, the behavior of
     
    28065734/foo/@{ N;N;N;N;N;N;N;N;N; @}
    28075735@end example
    2808  
     5736
    28095737@cindex @code{POSIXLY_CORRECT} behavior, @code{N} command
    28105738In any case, the simplest workaround is to use @code{$d;N} in
     
    28135741
    28145742@item Regex syntax clashes (problems with backslashes)
    2815 @cindex @acronym{GNU} extensions, to basic regular expressions
     5743@cindex GNU extensions, to basic regular expressions
    28165744@cindex Non-bugs, regex syntax clashes
    28175745@command{sed} uses the @sc{posix} basic regular expression syntax.  According to
     
    28215749@code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.
    28225750
    2823 As in all @acronym{GNU} programs that use @sc{posix} basic regular
     5751As in all GNU programs that use @sc{posix} basic regular
    28245752expressions, @command{sed} interprets these escape sequences as special
    28255753characters.  So, @code{x\+} matches one or more occurrences of @samp{x}.
     
    28325760spurious backslashes if they are to be used with modern implementations
    28335761of @command{sed}, like
    2834 @ifset PERL
    2835 @value{SSED} or
    2836 @end ifset
    2837 @acronym{GNU} @command{sed}.
     5762GNU @command{sed}.
    28385763
    28395764On the other hand, some scripts use s|abc\|def||g to remove occurrences
     
    28415766@command{sed} 4.0.x, newer versions interpret this as removing the
    28425767string @code{abc|def}.  This is again undefined behavior according to
    2843 @acronym{POSIX}, and this interpretation is arguably more robust: older
     5768POSIX, and this interpretation is arguably more robust: older
    28445769@command{sed}s, for example, required that the regex matcher parsed
    28455770@code{\/} as @code{/} in the common case of escaping a slash, which is
     
    28475772because the regex matcher is only partially under our control.
    28485773
    2849 @cindex @acronym{GNU} extensions, special escapes
     5774@cindex GNU extensions, special escapes
    28505775In addition, this version of @command{sed} supports several escape characters
    28515776(some of which are multi-character) to insert non-printable characters
     
    28635788(@pxref{Invoking sed, , Invocation}) lets you clobber
    28645789protected files.  This is not a bug, but rather a consequence
    2865 of how the Unix filesystem works.
     5790of how the Unix file system works.
    28665791
    28675792The permissions on a file say what can happen to the data
     
    28735798modifying the contents of the directory, so the operation depends on
    28745799the permissions of the directory, not of the file.  For this same
    2875 reason, @command{sed} does not let you use @option{-i} on a writeable file
    2876 in a read-only directory (but unbelievably nobody reports that as a
    2877 bug@dots{}).
     5800reason, @command{sed} does not let you use @option{-i} on a writable file
     5801in a read-only directory, and will break hard or symbolic links when
     5802@option{-i} is used on such a file.
    28785803
    28795804@item @code{0a} does not work (gives an error)
     5805@cindex @code{0} address
     5806@cindex GNU extensions, @code{0} address
     5807@cindex Non-bugs, @code{0} address
     5808
    28805809There is no line 0.  0 is a special address that is only used to treat
    28815810addresses like @code{0,/@var{RE}/} as active when the script starts: if
    2882 you write @code{1,/abc/d} and the first line includes the word @samp{abc},
     5811you write @code{1,/abc/d} and the first line includes the string @samp{abc},
    28835812then that match would be ignored because address ranges must span at least
    28845813two lines (barring the end of the file); but what you probably wanted is
     
    28885817@ifclear PERL
    28895818@item @code{[a-z]} is case insensitive
     5819@cindex Non-bugs, localization-related
     5820
    28905821You are encountering problems with locales.  POSIX mandates that @code{[a-z]}
    28915822uses the current locale's collation order -- in C parlance, that means using
    28925823@code{strcoll(3)} instead of @code{strcmp(3)}.  Some locales have a
    2893 case-insensitive collation order, others don't: one of those that have
    2894 problems is Estonian.
     5824case-insensitive collation order, others don't.
    28955825
    28965826Another problem is that @code{[a-z]} tries to use collation symbols.
    2897 This only happens if you are on the @acronym{GNU} system, using
    2898 @acronym{GNU} libc's regular expression matcher instead of compiling the
    2899 one supplied with @acronym{GNU} sed.  In a Danish locale, for example,
     5827This only happens if you are on the GNU system, using
     5828GNU libc's regular expression matcher instead of compiling the
     5829one supplied with GNU sed.  In a Danish locale, for example,
    29005830the regular expression @code{^[a-z]$} matches the string @samp{aa},
    29015831because this is a single collating symbol that comes after @samp{a}
     
    29055835To work around these problems, which may cause bugs in shell scripts, set
    29065836the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
     5837
     5838@item @code{s/.*//} does not clear pattern space
     5839@cindex Non-bugs, localization-related
     5840@cindex @value{SSEDEXT}, emptying pattern space
     5841@cindex Emptying pattern space
     5842
     5843This happens if your input stream includes invalid multibyte
     5844sequences.  @sc{posix} mandates that such sequences
     5845are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
     5846pattern space as you would expect.  In fact, there is no way to clear
     5847sed's buffers in the middle of the script in most multibyte locales
     5848(including UTF-8 locales).  For this reason, @value{SSED} provides a `z'
     5849command (for `zap') as an extension.
     5850
     5851To work around these problems, which may cause bugs in shell scripts, set
     5852the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
    29075853@end ifclear
    29085854@end table
    29095855
    29105856
    2911 @node Extended regexps
    2912 @appendix Extended regular expressions
    2913 @cindex Extended regular expressions, syntax
    2914 
    2915 The only difference between basic and extended regular expressions is in
    2916 the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
    2917 braces (@samp{@{@}}), and @samp{|}.  While basic regular expressions require
    2918 these to be escaped if you want them to behave as special characters,
    2919 when using extended regular expressions you must escape them if
    2920 you want them @emph{to match a literal character}.
    2921 
    2922 @noindent
    2923 Examples:
    2924 @table @code
    2925 @item abc?
    2926 becomes @samp{abc\?} when using extended regular expressions.  It matches
    2927 the literal string @samp{abc?}.
    2928 
    2929 @item c\+
    2930 becomes @samp{c+} when using extended regular expressions.  It matches
    2931 one or more @samp{c}s.
    2932 
    2933 @item a\@{3,\@}
    2934 becomes @samp{a@{3,@}} when using extended regular expressions.  It matches
    2935 three or more @samp{a}s.
    2936 
    2937 @item \(abc\)\@{2,3\@}
    2938 becomes @samp{(abc)@{2,3@}} when using extended regular expressions.  It
    2939 matches either @samp{abcabc} or @samp{abcabcabc}.
    2940 
    2941 @item \(abc*\)\1
    2942 becomes @samp{(abc*)\1} when using extended regular expressions.
    2943 Backreferences must still be escaped when using extended regular
    2944 expressions.
    2945 @end table
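Python's `re` module is not sed. It uses Perl-style leftmost-first matching rather than POSIX leftmost-longest matching. However, it writes these operators the same way as the extended-syntax column above, so it offers a quick way to sanity-check the examples. This is a minimal sketch that assumes Python 3; the `examples` list is illustrative only.

```python
import re

# In Python's re (as in extended syntax), ?, +, (), {} and | are special
# without a backslash, and a backslash makes them literal.
examples = [
    (r"abc\?", "abc?"),            # escaped ? matches a literal question mark
    (r"c+", "ccc"),                # one or more c's
    (r"a{3,}", "aaaa"),            # three or more a's
    (r"(abc){2,3}", "abcabcabc"),  # two or three repetitions of abc
    (r"(abc*)\1", "abccabcc"),     # the back-reference is still written \1
]

for pattern, text in examples:
    print(f"{pattern!r:16} fullmatches {text!r}: {bool(re.fullmatch(pattern, text))}")
```

Every line prints `True`. Replacing the subject of `(abc){2,3}` with a single `abc` makes it fail, because at least two repetitions are required.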
    2946 
    2947 @ifset PERL
    2948 @node Perl regexps
    2949 @appendix Perl-style regular expressions
    2950 @cindex Perl-style regular expressions, syntax
    2951 
    2952 @emph{This part is taken from the @file{pcre.txt} file distributed together
    2953 with the free @sc{pcre} regular expression matcher; it was written by Philip Hazel.}
    2954 
    2955 Perl introduced several extensions to regular expressions, some
    2956 of them incompatible with the syntax of regular expressions
    2957 accepted by Emacs and other @acronym{GNU} tools (whose matcher was
    2958 based on the Emacs matcher).  @value{SSED} implements
    2959 both kinds of extensions.
    2960 
    2961 @iftex
    2962 Summarizing, we have:
    2963 
    2964 @itemize @bullet
    2965 @item
    2966 A backslash can introduce several special sequences
    2967 
    2968 @item
    2969 The circumflex, dollar sign, and period characters behave specially
    2970 with regard to new lines
    2971 
    2972 @item
    2973 Strange uses of square brackets are parsed differently
    2974 
    2975 @item
    2976 You can toggle modifiers in the middle of a regular expression
    2977 
    2978 @item
    2979 You can specify that a subpattern does not count when numbering backreferences
    2980 
    2981 @item
    2982 @cindex Greedy regular expression matching
    2983 You can specify greedy or non-greedy matching
    2984 
    2985 @item
    2986 You can have more than ten back references
    2987 
    2988 @item
    2989 You can do complex look aheads and look behinds (in the spirit of
    2990 @code{\b}, but with subpatterns).
    2991 
    2992 @item
    2993 You can often improve performance by preventing @command{sed} from wasting
    2994 time on backtracking
    2995 
    2996 @item
    2997 You can have if/then/else branches
    2998 
    2999 @item
    3000 You can do recursive matches, for example to look for unbalanced parentheses
    3001 
    3002 @item
    3003 You can have comments and non-significant whitespace, because things can
    3004 get complex...
    3005 @end itemize
    3006 
    3007 Most of these extensions are introduced by the special @code{(?}
    3008 sequence, which gives special meanings to parenthesized groups.
    3009 @end iftex
    3010 @menu
    3011 Other extensions can be roughly divided into two categories.
    3012 On one hand, Perl introduces several more escaped sequences
    3013 (that is, sequences introduced by a backslash).  On the other
    3014 hand, it specifies that if a question mark follows an opening
    3015 parenthesis, it gives a special meaning to the parenthesized
    3016 group.
    3017 
    3018 * Backslash::                       Introduces special sequences
    3019 * Circumflex/dollar sign/period::   Behave specially with regard to new lines
    3020 * Square brackets::                 Are a bit different in strange cases
    3021 * Options setting::                 Toggle modifiers in the middle of a regexp
    3022 * Non-capturing subpatterns::       Are not counted when backreferencing
    3023 * Repetition::                      Allows for non-greedy matching
    3024 * Backreferences::                  Allows for more than 10 back references
    3025 * Assertions::                      Allows for complex look ahead matches
    3026 * Non-backtracking subpatterns::    Often gives more performance
    3027 * Conditional subpatterns::         Allows if/then/else branches
    3028 * Recursive patterns::              For example to match parentheses
    3029 * Comments::                        Because things can get complex...
    3030 @end menu
    3031 
    3032 @node Backslash
    3033 @appendixsec Backslash
    3034 @cindex Perl-style regular expressions, escaped sequences
    3035 
    3036 There are a few differences in the handling of backslashed
    3037 sequences in Perl mode.
    3038 
    3039 First of all, there are no @code{\o} and @code{\d} sequences.
    3040 @sc{ascii} values for characters can be specified in octal
    3041 with a @code{\@var{xxx}} sequence, where @var{xxx} is a
    3042 sequence of up to three octal digits.  If the first digit
    3043 is a zero, the treatment of the sequence is straightforward;
    3044 just note that if the character that follows the escaped digit
    3045 is itself an octal digit, you have to supply three octal digits
    3046 for @var{xxx}.  For example @code{\07} is a @sc{bel} character
    3047 rather than a @sc{nul} and a literal @code{7} (this sequence is
    3048 instead represented by @code{\0007}).
    3049 
    3050 @cindex Perl-style regular expressions, backreferences
    3051 The handling of a backslash followed by a digit other than 0
    3052 is complicated.  Outside a character class, @command{sed} reads it
    3053 and any following digits as a decimal number. If the number
    3054 is less than 10, or if there have been at least that many
    3055 previous capturing left parentheses in the expression, the
    3056 entire sequence is taken as a back reference. A description
    3057 of how this works is given later, following the discussion
    3058 of parenthesized subpatterns.
    3059 
    3060 Inside a character class, or if the decimal number is
    3061 greater than 9 and there have not been that many capturing
    3062 subpatterns, @command{sed} re-reads up to three octal digits following
    3063 the backslash, and generates a single byte from the
    3064 least significant 8 bits of the value. Any subsequent digits
    3065 stand for themselves.  For example:
    3066 
    3067 @example
    3068      \040  @i{is another way of writing a space}
    3069      \40   @i{is the same, provided there are fewer than 40}
    3070            @i{previous capturing subpatterns}
    3071      \7    @i{is always a back reference}
    3072      \011  @i{is always a tab}
    3073      \11   @i{might be a back reference, or another way of}
    3074            @i{writing a tab}
    3075      \0113 @i{is a tab followed by the character @samp{3}}
    3076      \113  @i{is the character with octal code 113 (since there}
    3077            @i{can be no more than 99 back references)}
    3078      \377  @i{is a byte consisting entirely of 1 bits (@sc{ascii} 255)}
    3079      \81   @i{is either a back reference, or a binary zero}
    3080            @i{followed by the two characters @samp{81}}
    3081 @end example
    3082 
    3083 Note that octal values of 100 or greater must not be introduced
    3084 duced by a leading zero, because no more than three octal
    3085 digits are ever read.
    3086 
    3087 All the sequences that define a single byte value can be
    3088 used both inside and outside character classes. In addition,
    3089 inside a character class, the sequence @code{\b} is interpreted
    3090 as the backspace character (hex 08). Outside a character
    3091 class it has a different meaning (see below).
    3092 
    3093 In addition, Perl mode recognizes the following escapes specifying
    3094 generic character classes (as @code{\w} and @code{\W} do):
    3095 
    3096 @cindex Perl-style regular expressions, character classes
    3097 @table @samp
    3098 @item \d
    3099 Matches any decimal digit
    3100 
    3101 @item \D
    3102 Matches any character that is not a decimal digit
    3103 @end table
    3104 
    3105 In Perl mode, these character type sequences can appear both inside and
    3106 outside character classes. Instead, in @sc{posix} mode these sequences
    3107 (as well as @code{\w} and @code{\W}) are treated as two literal characters
    3108 (a backslash and a letter) inside square brackets.
    3109 
    3110 Escaped sequences specifying assertions are also different in
    3111 Perl mode.  An assertion specifies a condition that has to be met
    3112 at a particular point in a match, without consuming any
    3113 characters from the subject string. The use of subpatterns
    3114 for more complicated assertions is described below.  The
    3115 backslashed assertions are
    3116 
    3117 @cindex Perl-style regular expressions, assertions
    3118 @table @samp
    3119 @item \b
    3120 Asserts that the point is at a word boundary.
    3121 A word boundary is a position in the subject string where
    3122 the current character and the previous character do not both
    3123 match @code{\w} or @code{\W} (i.e. one matches @code{\w} and
    3124 the other matches @code{\W}), or the start or end of the string
    3125 if the first or last character matches @code{\w}, respectively.
    3126 
    3127 @item \B
    3128 Asserts that the point is not at a word boundary.
    3129 
    3130 @item \A
    3131 Asserts the matcher is at the start of pattern space (independent
    3132 of multiline mode).
    3133 
    3134 @item \Z
    3135 Asserts the matcher is at the end of pattern space,
    3136 or at a newline before the end of pattern space (independent of
    3137 multiline mode).
    3138 
    3139 @item \z
    3140 Asserts the matcher is at the end of pattern space (independent
    3141 of multiline mode).
    3142 @end table
    3143 
    3144 These assertions may not appear in character classes (but
    3145 note that @code{\b} has a different meaning, namely the
    3146 backspace character, inside a character class).
    3147 Note that Perl mode does not directly support assertions
    3148 for the beginning and end of a word; the @acronym{GNU} extensions
    3149 @code{\<} and @code{\>} achieve this purpose in @sc{posix} mode
    3150 instead.
    3151 
    3152 The @code{\A}, @code{\Z}, and @code{\z} assertions differ
    3153 from the traditional circumflex and dollar sign (described below)
    3154 in that they only ever match at the very start and end of the
    3155 subject string, whatever options are set; in particular @code{\A}
    3156 and @code{\z} are the same as the @acronym{GNU} extensions
    3157 @code{\`} and @code{\'} that are active in @sc{posix} mode.
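The word-boundary and start-of-subject assertions above behave the same way in Python's Perl-style `re` module, so a short Python 3 sketch can illustrate them. One caveat: Python's own `\Z` matches only at the very end of the string, like the `\z` described above. It therefore does not match the Perl `\Z` described here and is left out. The sample strings are hypothetical.

```python
import re

text = "cat concat cat."

# \b: a position where one side matches \w and the other does not
# (or the string edge next to a word character).
print(re.findall(r"\bcat\b", text))          # ['cat', 'cat']

# \B: not at a word boundary, so this finds the "cat" inside "concat".
print(re.search(r"\Bcat", text).start())     # 7

# \A anchors to the very start of the subject even in multiline mode,
# while ^ with re.MULTILINE may also match after an internal newline.
subject = "x\nabc"
print(re.search(r"^abc", subject, re.MULTILINE) is not None)   # True
print(re.search(r"\Aabc", subject, re.MULTILINE) is not None)  # False
```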
    3158 
    3159 @node Circumflex/dollar sign/period
    3160 @appendixsec Circumflex, dollar sign, period
    3161 @cindex Perl-style regular expressions, newlines
    3162 
    3163 Outside a character class, in the default matching mode, the
    3164 circumflex character is an assertion which is true only if
    3165 the current matching point is at the start of the subject
    3166 string.  Inside a character class, the circumflex has an entirely
    3167 different meaning (see below).
    3168 
    3169 The circumflex need not be the first character of the pattern if
    3170 a number of alternatives are involved, but it should be the
    3171 first thing in each alternative in which it appears if the
    3172 pattern is ever to match that branch. If all possible alternatives
    3173 start with a circumflex, that is, if the pattern is
    3174 constrained to match only at the start of the subject, it is
    3175 said to be an @dfn{anchored} pattern. (There are also other
    3176 constructs that can cause a pattern to be anchored.)
    3177 
    3178 A dollar sign is an assertion which is true only if the
    3179 current matching point is at the end of the subject string,
    3180 or immediately before a newline character that is the last
    3181 character in the string (by default).  A dollar sign need not be the
    3182 last character of the pattern if a number of alternatives
    3183 are involved, but it should be the last item in any branch
    3184 in which it appears.  A dollar sign has no special meaning in a
    3185 character class.
    3186 
    3187 @cindex Perl-style regular expressions, multiline
    3188 The meanings of the circumflex and dollar sign characters are
    3189 changed if the @code{M} modifier option is used. When this is
    3190 the case, they match immediately after and immediately
    3191 before an internal @code{\n} character, respectively, in addition
    3192 to matching at the start and end of the subject string.  For
    3193 example, the pattern @code{/^abc$/} matches the subject string
    3194 @samp{def\nabc} in multiline mode, but not otherwise.  Consequently,
    3195 patterns that are anchored in single line mode
    3196 because all branches start with @code{^} are not anchored in
    3197 multiline mode.
    3198 
    3199 @cindex Perl-style regular expressions, multiline
    3200 Note that the sequences @code{\A}, @code{\Z}, and @code{\z}
    3201 can be used to match the start and end of the subject in both
    3202 modes, and if all branches of a pattern start with @code{\A}
    3203 it is always anchored, whether the @code{M} modifier is set or not.
    3204 
    3205 @cindex Perl-style regular expressions, single line
    3206 Outside a character class, a dot in the pattern matches any
    3207 one character in the subject, including a non-printing character,
    3208 but not (by default) newline.  If the @code{S} modifier is used,
    3209 dots match newlines as well.  Actually, the handling of
    3210 dot is entirely independent of the handling of circumflex
    3211 and dollar sign, the only relationship being that they both
    3212 involve newline characters. Dot has no special meaning in a
    3213 character class.
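In Python's `re` module, `re.MULTILINE` plays roughly the role of the `M` modifier and `re.DOTALL` plays the role of the `S` modifier. The following Python 3 sketch, which uses that correspondence as an assumption and is not sed itself, reproduces the `def\nabc` example and the dot behavior described above.

```python
import re

subject = "def\nabc"

# Without multiline mode, ^ only matches at the start of the subject.
print(re.search(r"^abc$", subject))                        # None
# With re.MULTILINE (like M), ^ and $ also match around the internal newline.
print(re.search(r"^abc$", subject, re.MULTILINE).group())  # abc

# By default a dot does not match a newline; re.DOTALL (like S) changes that.
print(re.fullmatch(r"def.abc", subject))                   # None
print(re.fullmatch(r"def.abc", subject, re.DOTALL) is not None)  # True

# $ also matches just before a final newline at the end of the subject.
print(re.search(r"abc$", "abc\n") is not None)             # True
```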
    3214 
    3215 @node Square brackets
    3216 @appendixsec Square brackets
    3217 @cindex Perl-style regular expressions, character classes
    3218 
    3219 An opening square bracket introduces a character class, terminated
    3220 by a closing square bracket.  A closing square bracket on its own
    3221 is not special.  If a closing square bracket is required as a
    3222 member of the class, it should be the first data character in
    3223 the class (after an initial circumflex, if present) or escaped with a backslash.
    3224 
    3225 A character class matches a single character in the subject;
    3226 the character must be in the set of characters defined by
    3227 the class, unless the first character in the class is a circumflex,
    3228 in which case the subject character must not be in
    3229 the set defined by the class. If a circumflex is actually
    3230 required as a member of the class, ensure it is not the
    3231 first character, or escape it with a backslash.
    3232 
    3233 For example, the character class @code{[aeiou]} matches any lower
    3234 case vowel, while @code{[^aeiou]} matches any character that is not
    3235 a lower case vowel. Note that a circumflex is just a convenient
    3236 notation for specifying the characters which are in
    3237 the class by enumerating those that are not. It is not an
    3238 assertion: it still consumes a character from the subject
    3239 string, and fails if the current pointer is at the end of
    3240 the string.
    3241 
    3242 @cindex Perl-style regular expressions, case-insensitive
    3243 When caseless matching is set, any letters in a class
    3244 represent both their upper case and lower case versions, so
    3245 for example, a caseless @code{[aeiou]} matches uppercase
    3246 and lowercase @samp{A}s, and a caseless @code{[^aeiou]}
    3247 does not match @samp{A}, whereas a case-sensitive version would.
    3248 
    3249 @cindex Perl-style regular expressions, single line
    3250 @cindex Perl-style regular expressions, multiline
    3251 The newline character is never treated in any special way in
    3252 character classes, whatever the setting of the @code{S} and
    3253 @code{M} options (modifiers) is.  A class such as @code{[^a]} will
    3254 always match a newline.
    3255 
    3256 The minus (hyphen) character can be used to specify a range
    3257 of characters in a character class.  For example, @code{[d-m]}
    3258 matches any letter between d and m, inclusive.  If a minus
    3259 character is required in a class, it must be escaped with a
    3260 backslash or appear in a position where it cannot be interpreted
    3261 as indicating a range, typically as the first or last
    3262 character in the class.
    3263 
    3264 It is not possible to have the literal character @code{]} as the
    3265 end character of a range.  A pattern such as @code{[W-]46]} is
    3266 interpreted as a class of two characters (@code{W} and @code{-})
    3267 followed by a literal string @code{46]}, so it would match
    3268 @samp{W46]} or @samp{-46]}. However, if the @code{]} is escaped
    3269 with a backslash it is interpreted as the end of range, so
    3270 @code{[W-\]46]} is interpreted as a single class containing a
    3271 range followed by two separate characters. The octal or
    3272 hexadecimal representation of @code{]} can also be used to end a range.
    3273 
    3274 Ranges operate in @sc{ascii} collating sequence. They can also be
    3275 used for characters specified numerically, for example
    3276 @code{[\000-\037]}. If a range that includes letters is used when
    3277 caseless matching is set, it matches the letters in either
    3278 case. For example, a caseless @code{[W-c]} is equivalent to
    3279 @code{[][\^_`wxyzabc]}, matched caselessly, and if character
    3280 tables for the French locale are in use, @code{[\xc8-\xcb]}
    3281 matches accented E characters in both cases.
    3282 
    3283 Unlike in @sc{posix} mode, the character types @code{\d},
    3284 @code{\D}, @code{\s}, @code{\S}, @code{\w}, and @code{\W}
    3285 may also appear in a character class, and add the characters
    3286 that they match to the class. For example, @code{[\dABCDEF]} matches any
    3287 hexadecimal digit.  A circumflex can conveniently be used
    3288 with the upper case character types to specify a more restricted
    3289 set of characters than the matching lower case type.
    3290 For example, the class @code{[^\W_]} matches any letter or digit,
    3291 but not underscore.
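Several of the character-class rules above can be checked with Python's `re` module, which follows the same Perl conventions for these cases. The following is a Python 3 sketch. `re.ASCII` is added as an assumption so that `\d` and `\w` cover ASCII only, and the test strings are hypothetical. Python does not support the `[:name:]` notation described next, so that notation is not shown.

```python
import re

# A caseless [^aeiou] does not match "A", while a case-sensitive one does.
print(re.fullmatch(r"[^aeiou]", "A") is not None)                 # True
print(re.fullmatch(r"[^aeiou]", "A", re.IGNORECASE) is not None)  # False

# A negated class such as [^a] matches a newline regardless of flags.
print(re.fullmatch(r"[^a]", "\n") is not None)                    # True

# An unescaped "]" cannot end a range: [W-] is a two-character class.
print(re.fullmatch(r"[W-]46]", "-46]") is not None)               # True
# An escaped "]" can: W..] includes Z.
print(re.fullmatch(r"[W-\]46]", "Z") is not None)                 # True

# Character types inside a class.
hexdigit = re.compile(r"[\dABCDEF]", re.ASCII)
alnum = re.compile(r"[^\W_]", re.ASCII)   # letters and digits, not "_"
print([c for c in "9F_g" if hexdigit.fullmatch(c)])   # ['9', 'F']
print([c for c in "a1_-" if alnum.fullmatch(c)])      # ['a', '1']
```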
    3292 
    3293 All non-alphameric characters other than @code{\}, @code{-},
    3294 @code{^} (at the start) and the terminating @code{]}
    3295 are non-special in character classes, but it does no harm
    3296 if they are escaped.
    3297 
    3298 Perl 5.6 supports the @sc{posix} notation for character classes, which
    3299 uses names enclosed by @code{[:} and @code{:]} within the enclosing
    3300 square brackets, and @value{SSED} supports this notation as well.
    3301 For example,
    3302 
    3303 @example
    3304      [01[:alpha:]%]
    3305 @end example
    3306 
    3307 @noindent
    3308 matches @samp{0}, @samp{1}, any alphabetic character, or @samp{%}.
    3309 The supported class names are
    3310 
    3311 @table @code
    3312 @item alnum
    3313 Matches letters and digits
    3314 
    3315 @item alpha
    3316 Matches letters
    3317 
    3318 @item ascii
    3319 Matches character codes 0 - 127
    3320 
    3321 @item cntrl
    3322 Matches control characters
    3323 
    3324 @item digit
    3325 Matches decimal digits (same as \d)
    3326 
    3327 @item graph
    3328 Matches printing characters, excluding space
    3329 
    3330 @item lower
    3331 Matches lower case letters
    3332 
    3333 @item print
    3334 Matches printing characters, including space
    3335 
    3336 @item punct
    3337 Matches printing characters, excluding letters and digits
    3338 
    3339 @item space
    3340 Matches white space (same as \s)
    3341 
    3342 @item upper
    3343 Matches upper case letters
    3344 
    3345 @item word
    3346 Matches ``word'' characters (same as \w)
    3347 
    3348 @item xdigit
    3349 Matches hexadecimal digits
    3350 @end table
    3351 
    3352 The names @code{ascii} and @code{word} are extensions valid only in
    3353 Perl mode.  Another Perl extension is negation, which is
    3354 indicated by a circumflex character after the colon. For example,
    3355 
    3356 @example
    3357      [12[:^digit:]]
    3358 @end example
    3359 
    3360 @noindent
    3361 matches @samp{1}, @samp{2}, or any non-digit.
    3362 
    3363 @node Options setting
    3364 @appendixsec Options setting
    3365 @cindex Perl-style regular expressions, toggling options
    3366 @cindex Perl-style regular expressions, case-insensitive
    3367 @cindex Perl-style regular expressions, multiline
    3368 @cindex Perl-style regular expressions, single line
    3369 @cindex Perl-style regular expressions, extended
    3370 
    3371 The settings of the @code{I}, @code{M}, @code{S}, and @code{X}
    3372 modifiers can be changed from within the pattern by
    3373 a sequence of Perl option letters enclosed between @code{(?}
    3374 and @code{)}. The option letters must be lowercase.
    3375 
    3376 For example, @code{(?im)} sets caseless, multiline matching. It is
    3377 also possible to unset these options by preceding the letter
    3378 with a hyphen; you can also have combined settings and unsettings:
    3379 @code{(?im-sx)} sets caseless and multiline matching,
    3380 and unsets single line matching (for dots) and extended
    3381 whitespace interpretation.  If a letter appears both before
    3382 and after the hyphen, the option is unset.
    3383 
    3384 The scope of these option changes depends on where in the
    3385 pattern the setting occurs. For settings that are outside
    3386 any subpattern (defined below), the effect is the same as if
    3387 the options were set or unset at the start of matching. The
    3388 following patterns all behave in exactly the same way:
    3389 
    3390 @example
    3391      (?i)abc
    3392      a(?i)bc
    3393      ab(?i)c
    3394      abc(?i)
    3395 @end example
    3396 
    3397 which in turn is the same as specifying the pattern @samp{abc} with
    3398 the @code{I} modifier.  In other words, ``top level'' settings
    3399 apply to the whole pattern (unless there are other
    3400 changes inside subpatterns). If there is more than one setting
    3401 of the same option at top level, the rightmost setting
    3402 is used.
    3403 
    3404 If an option change occurs inside a subpattern, the effect
    3405 is different.  This is a change of behaviour in Perl 5.005.
    3406 An option change inside a subpattern affects only that part
    3407 of the subpattern @emph{that follows} it, so
    3408 
    3409 @example
    3410      (a(?i)b)c
    3411 @end example
    3412 
    3413 @noindent
    3414 matches @samp{abc} and @samp{aBc} and no other strings (assuming
    3415 case-sensitive matching is used).  By this means, options can
    3416 be made to have different settings in different parts of the
    3417 pattern.  Any changes made in one alternative do carry on
    3418 into subsequent branches within the same subpattern.  For
    3419 example,
    3420 
    3421 @example
    3422      (a(?i)b|c)
    3423 @end example
    3424 
    3425 @noindent
    3426 matches @samp{ab}, @samp{aB}, @samp{c}, and @samp{C},
    3427 even though when matching @samp{C} the first branch is
    3428 abandoned before the option setting.
    3429 This is because the effects of option settings happen at
    3430 compile time. There would be some very weird behaviour otherwise.
    3431 
    3432 @ignore
    3433 There are two PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA
    3434 that can be changed in the same way as the Perl-compatible options by
    3435 using the characters U and X respectively.  The (?X) flag
    3436 setting is special in that it must always occur earlier in
    3437 the pattern than any of the additional features it turns on,
    3438 even when it is at top level. It is best put at the start.
    3439 @end ignore
    3440 
    3441 
    3442 @node Non-capturing subpatterns
    3443 @appendixsec Non-capturing subpatterns
    3444 @cindex Perl-style regular expressions, non-capturing subpatterns
    3445 
    3446 Marking part of a pattern as a subpattern does two things.
    3447 On one hand, it localizes a set of alternatives; on the other
    3448 hand, it sets up the subpattern as a capturing subpattern (as
    3449 defined above).  The subpattern can be backreferenced and
    3450 referenced in the right side of @code{s} commands.
    3451 
    3452 For example, if the string @samp{the red king} is matched against
    3453 the pattern
    3454 
    3455 @example
    3456      the ((red|white) (king|queen))
    3457 @end example
    3458 
    3459 @noindent
    3460 the captured substrings are @samp{red king}, @samp{red},
    3461 and @samp{king}, and are numbered 1, 2, and 3.
    3462 
    3463 The fact that plain parentheses fulfil two functions is not
    3464 always helpful.  There are often times when a grouping
    3465 subpattern is required without a capturing requirement.  If an
    3466 opening parenthesis is followed by @code{?:}, the subpattern does
    3467 not do any capturing, and is not counted when computing the
    3468 number of any subsequent capturing subpatterns. For example,
    3469 if the string @samp{the white queen} is matched against the pattern
    3470 
    3471 @example
    3472      the ((?:red|white) (king|queen))
    3473 @end example
    3474 
    3475 @noindent
    3476 the captured substrings are @samp{white queen} and @samp{queen},
    3477 and are numbered 1 and 2. The maximum number of captured
    3478 substrings is 99, while the maximum number of all subpatterns,
    3479 both capturing and non-capturing, is 200.
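Python's `re` module numbers groups the same way, so the two example patterns above can be reproduced directly. This Python 3 sketch shows how `(?:` drops a group from the numbering.

```python
import re

# Every plain parenthesis captures: three groups, numbered 1 to 3.
m = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")
print(m.groups())   # ('red king', 'red', 'king')

# (?: ... ) groups the alternatives without capturing, so "queen"
# becomes group 2 instead of group 3.
m = re.fullmatch(r"the ((?:red|white) (king|queen))", "the white queen")
print(m.groups())   # ('white queen', 'queen')
```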
    3480 
    3481 As a convenient shorthand, if any option settings are
    3482 equired at the start of a non-capturing subpattern, the
    3483 option letters may appear between the @code{?} and the
    3484 @code{:}.  Thus the two patterns
    3485 
    3486 @example
    3487    (?i:saturday|sunday)
    3488    (?:(?i)saturday|sunday)
    3489 @end example
    3490 
    3491 @noindent
    3492 match exactly the same set of strings.  Because alternative
    3493 branches are tried from left to right, and options are not
    3494 reset until the end of the subpattern is reached, an option
    3495 setting in one branch does affect subsequent branches, so
    3496 the above patterns match @samp{SUNDAY} as well as @samp{Saturday}.
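Python 3.6 and later accept the scoped form `(?i:...)`, so the first of the two patterns above can be tried directly. Python 3.11 and later reject an inline flag such as `(?i)` that is not at the very start of the pattern, so the second form raises `re.error` there. For that reason, this sketch shows only the scoped form.

```python
import re

# (?i: ... ) applies caseless matching only inside the group.
days = re.compile(r"(?i:saturday|sunday)")
print([d for d in ["Saturday", "SUNDAY", "monday"] if days.fullmatch(d)])
# ['Saturday', 'SUNDAY']

# The option does not leak out of the group: "X" stays case-sensitive.
tagged = re.compile(r"(?i:sunday)X")
print(tagged.fullmatch("SUNDAYX") is not None, tagged.fullmatch("SUNDAYx") is not None)
# True False
```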
    3497 
    3498 
    3499 @node Repetition
    3500 @appendixsec Repetition
    3501 @cindex Perl-style regular expressions, repetitions
    3502 
    3503 Repetition is specified by quantifiers, which can follow any
    3504 of the following items:
    3505 
    3506 @itemize @bullet
    3507 @item
    3508 a single character, possibly escaped
    3509 
    3510 @item
    3511 the @code{.} special character
    3512 
    3513 @item
    3514 a character class
    3515 
    3516 @item
    3517 a back reference (see next section)
    3518 
    3519 @item
    3520 a parenthesized subpattern (unless it is an assertion; @pxref{Assertions})
    3521 @end itemize
    3522 
    3523 The general repetition quantifier specifies a minimum and
    3524 maximum number of permitted matches, by giving the two
    3525 numbers in curly brackets (braces), separated by a comma.
    3526 The numbers must be less than 65536, and the first must be
    3527 less than or equal to the second. For example:
    3528 
    3529 @example
    3530      z@{2,4@}
    3531 @end example
    3532 
    3533 @noindent
    3534 matches @samp{zz}, @samp{zzz}, or @samp{zzzz}. A closing brace on its own
    3535 is not a special character. If the second number is omitted,
    3536 but the comma is present, there is no upper limit; if the
    3537 second number and the comma are both omitted, the quantifier
    3538 specifies an exact number of required matches. Thus
    3539 
    3540 @example
    3541      [aeiou]@{3,@}
    3542 @end example
    3543 
    3544 @noindent
    3545 matches at least 3 successive vowels, but may match many
    3546 more, while
    3547 
    3548 @example
    3549      \d@{8@}
    3550 @end example
    3551 
    3552 @noindent
    3553 matches exactly 8 digits.  An opening curly bracket that
    3554 appears in a position where a quantifier is not allowed, or
    3555 one that does not match the syntax of a quantifier, is taken
    3556 as a literal character. For example, @code{@{,6@}} is not a quantifier,
    3557 but a literal string of four characters.@footnote{It
    3558 raises an error if @option{-R} is not used.}
    3559 
    3560 The quantifier @samp{@{0@}} is permitted, causing the expression to
    3561 behave as if the previous item and the quantifier were not
    3562 present.
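The brace quantifiers above can be checked with Python's `re` module. There is one known difference: Python treats `{,6}` as `{0,6}` instead of as literal text, so that case is omitted. In this Python 3 sketch, `matches` is a hypothetical helper, and `re.ASCII` keeps `\d` limited to ASCII digits.

```python
import re

def matches(pattern, texts):
    """Return the texts that the whole pattern matches (hypothetical helper)."""
    return [t for t in texts if re.fullmatch(pattern, t, re.ASCII)]

print(matches(r"z{2,4}", ["z", "zz", "zzz", "zzzz", "zzzzz"]))  # ['zz', 'zzz', 'zzzz']
print(matches(r"[aeiou]{3,}", ["ea", "eau", "aeiou"]))          # ['eau', 'aeiou']
print(matches(r"\d{8}", ["1234567", "12345678", "123456789"]))  # ['12345678']
# {0} makes the preceding item disappear: "ab{0}c" behaves like "ac".
print(matches(r"ab{0}c", ["ac", "abc"]))                        # ['ac']
```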
    3563 
    3564 For convenience (and historical compatibility) the three
    3565 most common quantifiers have single-character abbreviations:
    3566 
    3567 @table @code
    3568 @item *
    3569 is equivalent to @{0,@}
    3570 
    3571 @item +
    3572 is equivalent to @{1,@}
    3573 
    3574 @item ?
    3575 is equivalent to @{0,1@}
    3576 @end table
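The quantifier examples above can be checked with a short sketch. It uses Python's re module only as a convenient stand-in. That is an assumption: re is not the matcher built into sed, but it treats these particular brace quantifiers the same way. One known difference is that re reads a brace quantifier with an omitted minimum as having a minimum of zero, so that corner case is deliberately left out.

```python
import re

# (pattern, subject, expected result of a whole-string match)
CASES = [
    (r"z{2,4}", "zz", True),
    (r"z{2,4}", "zzzz", True),
    (r"z{2,4}", "z", False),            # fewer than the minimum
    (r"z{2,4}", "zzzzz", False),        # more than the maximum
    (r"[aeiou]{3,}", "aeiouae", True),  # no upper limit
    (r"[aeiou]{3,}", "ae", False),
    (r"\d{8}", "20240919", True),       # exactly eight digits
    (r"\d{8}", "2024091", False),
    (r"ab{0}c", "ac", True),            # b{0} behaves as if it were absent
]


def mismatches(cases):
    """Return the cases whose full-match result differs from the expectation."""
    return [case for case in cases
            if (re.fullmatch(case[0], case[1]) is not None) != case[2]]


# The one-character abbreviations agree with their brace equivalents.
for short, braces in [("*", "{0,}"), ("+", "{1,}"), ("?", "{0,1}")]:
    for subject in ["", "a", "aaa"]:
        assert (re.fullmatch("a" + short, subject) is None) == \
               (re.fullmatch("a" + braces, subject) is None)

print(mismatches(CASES))  # prints [] when every case behaves as described
```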
    3577 
    3578 It is possible to construct infinite loops by following a
    3579 subpattern that can match no characters with a quantifier
    3580 that has no upper limit, for example:
    3581 
    3582 @example
    3583      (a?)*
    3584 @end example
    3585 
    3586 Earlier versions of Perl used to give an error at
    3587 compile time for such patterns. However, because there are
    3588 cases where this can be useful, such patterns are now
    3589 accepted, but if any repetition of the subpattern does in
    3590 fact match no characters, the loop is forcibly broken.
    3591 
    3592 @cindex Greedy regular expression matching
    3593 @cindex Perl-style regular expressions, stingy repetitions
    3594 By default, the quantifiers are @dfn{greedy}, as in @sc{posix}
    3595 mode, that is, they match as much as possible (up to the maximum
    3596 number of permitted times), without causing the rest of the
    3597 pattern to fail. The classic example of where this gives problems
    3598 is in trying to match comments in C programs. These appear between
    3599 the sequences @code{/*} and @code{*/} and within the sequence, individual
    3600 @code{*} and @code{/} characters may appear. An attempt to match C
    3601 comments by applying the pattern
    3602 
    3603 @example
    3604      /\*.*\*/
    3605 @end example
    3606 
    3607 @noindent
    3608 to the string
    3609 
    3610 @example
    3611      /* first comment */ not comment /* second comment */
    3612 @end example
    3613 
    3614 @noindent
    3615 
    3616 fails, because it matches the entire string owing to the
    3617 greediness of the @code{.*} item.
    3618 
    3619 However, if a quantifier is followed by a question mark, it
    3620 ceases to be greedy, and instead matches the minimum number
    3621 of times possible, so the pattern @code{/\*.*?\*/}
    3622 does the right thing with the C comments. The meaning of the
    3623 various quantifiers is not otherwise changed, just the preferred
    3624 number of matches.  Do not confuse this use of question
    3625 mark with its use as a quantifier in its own right.
    3626 Because it has two uses, it can sometimes appear doubled, as in
    3627 
    3628 @example
    3629      \d??\d
    3630 @end example
    3631 
    3632 which matches one digit by preference, but can match two if
    3633 that is the only way the rest of the pattern matches.
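A minimal sketch of greedy versus minimizing repetition follows. It again assumes Python's re module as a stand-in for the matcher described here, and uses the C-comment subject from the example above.

```python
import re

text = "/* first comment */ not comment /* second comment */"

# Greedy: .* runs on to the last "*/" in the line, swallowing the middle text.
greedy = re.search(r"/\*.*\*/", text).group()

# Minimizing: .*? stops at the first "*/" that lets the rest of the pattern match.
lazy = re.findall(r"/\*.*?\*/", text)

# \d?? prefers to match nothing, so one digit is taken when that suffices ...
one = re.match(r"\d??\d", "12").group()
# ... and two digits only when the rest of the pattern forces it.
both = re.fullmatch(r"\d??\d", "12").group()

print(greedy == text)  # True
print(lazy)            # ['/* first comment */', '/* second comment */']
print(one, both)       # 1 12
```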
    3634 
    3635 Note that greediness does not matter when specifying addresses,
    3636 but choosing it carefully can nevertheless improve performance.
    3637 
    3638 @ignore
    3639    If the PCRE_UNGREEDY option is set (an option which is not
    3640    available in Perl), the quantifiers are not greedy by
    3641    default, but individual ones can be made greedy by following
    3642    them with a question mark. In other words, it inverts the
    3643    default behaviour.
    3644 @end ignore
    3645 
    3646 When a parenthesized subpattern is quantified with a minimum
    3647 repeat count that is greater than 1 or with a limited maximum,
    3648 more memory is required for the compiled pattern, in
    3649 proportion to the size of the minimum or maximum.
    3650 
    3651 @cindex Perl-style regular expressions, single line
    3652 If a pattern starts with @code{.*} or @code{.@{0,@}} and the
    3653 @code{S} modifier is used, the pattern is implicitly anchored,
    3654 because whatever follows will be tried against every character
    3655 position in the subject string, so there is no point in
    3656 retrying the overall match at any position after the first.
    3657 PCRE treats such a pattern as though it were preceded by @code{\A}.
    3658 
    3659 When a capturing subpattern is repeated, the value captured
    3660 is the substring that matched the final iteration. For example,
    3661 after
    3662 
    3663 @example
    3664      (tweedle[dume]@{3@}\s*)+
    3665 @end example
    3666 
    3667 @noindent
    3668 has matched @samp{tweedledum tweedledee}, the value of the
    3669 captured substring is @samp{tweedledee}.  However, if there are
    3670 nested capturing subpatterns, the corresponding captured
    3671 values may have been set in previous iterations. For example,
    3672 after
    3673 
    3674 @example
    3675      /(a|(b))+/
    3676 @end example
    3677 
    3678 matches @samp{aba}, the value of the second captured substring is
    3679 @samp{b}.
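The same behaviour of repeated capturing subpatterns can be observed with Python's re module. It is used here only as a readily available stand-in, not as the matcher sed uses. The groups are numbered as in the two examples above.

```python
import re

# A repeated capturing group keeps only what its final iteration matched.
tweedle = re.match(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
print(tweedle.group(1))  # tweedledee

# The nested group (b) did not take part in the last iteration, so it keeps
# the value it was given by an earlier one.
nested = re.match(r"(a|(b))+", "aba")
print(nested.groups())   # ('a', 'b')
```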
    3680 
    3681 @node Backreferences
    3682 @appendixsec Backreferences
    3683 @cindex Perl-style regular expressions, backreferences
    3684 
    3685 Outside a character class, a backslash followed by a digit
    3686 greater than 0 (and possibly further digits) is a back
    3687 reference to a capturing subpattern earlier (i.e., to its
    3688 left) in the pattern, provided there have been that many
    3689 previous capturing left parentheses.
    3690 
    3691 However, if the decimal number following the backslash is
    3692 less than 10, it is always taken as a back reference, and
    3693 causes an error only if there are not that many capturing
    3694 left parentheses in the entire pattern. In other words, the
    3695 parentheses that are referenced need not be to the left of
    3696 the reference for numbers less than 10. @xref{Backslash},
    3697 for further details of the handling of digits following a backslash.
    3698 
    3699 A back reference matches whatever actually matched the capturing
    3700 subpattern in the current subject string, rather than
    3701 anything matching the subpattern itself. So the pattern
    3702 
    3703 @example
    3704      (sens|respons)e and \1ibility
    3705 @end example
    3706 
    3707 @noindent
    3708 matches @samp{sense and sensibility} and @samp{response and responsibility},
    3709 but not @samp{sense and responsibility}. If caseful
    3710 matching is in force at the time of the back reference, the
    3711 case of letters is relevant. For example,
    3712 
    3713 @example
    3714      ((?i)blah)\s+\1
    3715 @end example
    3716 
    3717 @noindent
    3718 matches @samp{blah blah} and @samp{Blah Blah}, but not
    3719 @samp{BLAH blah}, even though the original capturing
    3720 subpattern is matched caselessly.
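Both back reference examples can be reproduced in Python's re module, which serves here only as a stand-in. One assumption differs from the text: recent versions of re reject a bare (?i) that is not at the start of the whole pattern. This sketch therefore spells the caseless group with the scoped form (?i:...), which leaves the back reference outside it case-sensitive.

```python
import re

PATTERN = r"(sens|respons)e and \1ibility"
results = {s: re.fullmatch(PATTERN, s) is not None
           for s in ["sense and sensibility",
                     "response and responsibility",
                     "sense and responsibility"]}

# The group itself matches caselessly, but \1 outside it is compared exactly.
CASEFUL = r"((?i:blah))\s+\1"
case_results = {s: re.fullmatch(CASEFUL, s) is not None
                for s in ["blah blah", "Blah Blah", "BLAH blah"]}

print(results)
print(case_results)
```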
    3721 
    3722 There may be more than one back reference to the same subpattern.
    3723 Also, if a subpattern has not actually been used in a
    3724 particular match, any back references to it always fail. For
    3725 example, the pattern
    3726 
    3727 @example
    3728      (a|(bc))\2
    3729 @end example
    3730 
    3731 @noindent
    3732 always fails if it starts to match @samp{a} rather than
    3733 @samp{bc}.  Because there may be up to 99 back references, all
    3734 digits following the backslash are taken as part of a potential
    3735 back reference number; this is different from what happens
    3736 in @sc{posix} mode. If the pattern continues with a digit
    3737 character, some delimiter must be used to terminate the back
    3738 reference.  If the @code{X} modifier option is set, this can be
    3739 whitespace.  Otherwise an empty comment can be used, or the
    3740 following character can be expressed in hexadecimal or octal.
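Both points in the paragraph above can be tried in Python's re module, again used only as a stand-in. For these two cases it behaves the same way: a reference to a group that did not take part in the match fails, and an empty (?#) comment separates a back reference from a following digit.

```python
import re

# \2 names the inner group.  When the "a" branch is taken that group is unset,
# and a back reference to an unset group fails instead of matching "".
unset = re.fullmatch(r"(a|(bc))\2", "a")
set_ = re.fullmatch(r"(a|(bc))\2", "bcbc")

# To follow \1 with a literal digit, an empty comment (?#) ends the number;
# written as \10 it would be read as a reference to group 10.
delimited = re.fullmatch(r"(a)\1(?#)0", "aa0")

print(unset, set_ is not None, delimited is not None)  # None True True
```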
    3741 
    3742 A back reference that occurs inside the parentheses to which
    3743 it refers fails when the subpattern is first used, so, for
    3744 example, @code{(a\1)} never matches.  However, such references
    3745 can be useful inside repeated subpatterns. For example, the
    3746 pattern
    3747 
    3748 @example
    3749      (a|b\1)+
    3750 @end example
    3751 
    3752 @noindent
    3753 matches any number of @samp{a}s and also @samp{aba}, @samp{ababbaa},
    3754 etc. At each iteration of the subpattern, the back reference matches
    3755 the character string corresponding to the previous iteration.  In
    3756 order for this to work, the pattern must be such that the first
    3757 iteration does not need to match the back reference.  This can be
    3758 done using alternation, as in the example above, or by a
    3759 quantifier with a minimum of zero.
    3760 
    3761 @node Assertions
    3762 @appendixsec Assertions
    3763 @cindex Perl-style regular expressions, assertions
    3764 @cindex Perl-style regular expressions, asserting subpatterns
    3765 
    3766 An assertion is a test on the characters following or
    3767 preceding the current matching point that does not actually
    3768 consume any characters. The simple assertions coded as @code{\b},
    3769 @code{\B}, @code{\A}, @code{\Z}, @code{\z}, @code{^} and @code{$}
    3770 are described above. More complicated assertions are coded as
    3771 subpatterns.  There are two kinds: those that look ahead of the
    3772 current position in the subject string, and those that look behind it.
    3773 
    3774 @cindex Perl-style regular expressions, lookahead subpatterns
    3775 An assertion subpattern is matched in the normal way, except
    3776 that it does not cause the current matching position to be
    3777 changed. Lookahead assertions start with @code{(?=} for positive
    3778 assertions and @code{(?!} for negative assertions. For example,
    3779 
    3780 @example
    3781      \w+(?=;)
    3782 @end example
    3783 
    3784 @noindent
    3785 matches a word followed by a semicolon, but does not include
    3786 the semicolon in the match, and
    3787 
    3788 @example
    3789      foo(?!bar)
    3790 @end example
    3791 
    3792 @noindent
    3793 matches any occurrence of @samp{foo} that is not followed by
    3794 @samp{bar}.
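The two lookahead examples can be checked with Python's re module, used here only as a stand-in whose lookahead syntax matches the one described above.

```python
import re

# Positive lookahead: the ";" must follow, but it is not part of the match.
words = re.findall(r"\w+(?=;)", "foo; bar baz;")

# Negative lookahead: "foo" only when it is not followed by "bar".
after_bar = re.search(r"foo(?!bar)", "foobar")
after_baz = re.search(r"foo(?!bar)", "foobaz")

print(words)                        # ['foo', 'baz']
print(after_bar, after_baz.group()) # None foo
```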
    3795 
    3796 Note that the apparently similar pattern
    3797 
    3798 @example
    3799      (?!foo)bar
    3800 @end example
    3801 
    3802 @noindent
    3803 @cindex Perl-style regular expressions, lookbehind subpatterns
    3804 finds any occurrence of @samp{bar} even if it is preceded by
    3805 @samp{foo}, because the assertion @code{(?!foo)} is always true
    3806 when the next three characters are @samp{bar}. A lookbehind
    3807 assertion is needed to achieve this effect.
    3808 Lookbehind assertions start with @code{(?<=} for positive
    3809 assertions and @code{(?<!} for negative assertions. So,
    3810 
    3811 @example
    3812      (?<!foo)bar
    3813 @end example
    3814 
    3815 achieves the required effect of finding an occurrence of
    3816 @samp{bar} that is not preceded by @samp{foo}. The contents of a
    3817 lookbehind assertion are restricted
    3818 such that all the strings it matches must have a fixed
    3819 length.  However, if there are several alternatives, they do
    3820 not all have to have the same fixed length.  This is an extension
    3821 compared with Perl 5.005, which requires all branches to match
    3822 the same length of string. Thus
    3823 
    3824 @example
    3825      (?<=dogs|cats|)
    3826 @end example
    3827 
    3828 @noindent
    3829 is permitted, but the apparently equivalent regular expression
    3830 
    3831 @example
    3832      (?<!dogs?|cats?)
    3833 @end example
    3834 
    3835 @noindent
    3836 causes an error at compile time. Branches that match different
    3837 length strings are permitted only at the top level of
    3838 a lookbehind assertion: an assertion such as
    3839 
    3840 @example
    3841      (?<=ab(c|de))
    3842 @end example
    3843 
    3844 @noindent
    3845 is not permitted, because its single top-level branch can
    3846 match two different lengths, but it is acceptable if rewritten
    3847 to use two top-level branches:
    3848 
    3849 @example
    3850      (?<=abc|abde)
    3851 @end example
    3852 
    3853 All this is required because lookbehind assertions simply
    3854 move the current position back by the alternative's fixed
    3855 width and then try to match.  If there are
    3856 insufficient characters before the current position, the
    3857 match is deemed to fail.  Lookbehinds, in conjunction with
    3858 non-backtracking subpatterns, can be particularly useful for
    3859 matching at the ends of strings; an example is given at the end
    3860 of the section on non-backtracking subpatterns.
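The difference between (?!foo)bar and (?<!foo)bar can be seen with Python's re module as a stand-in. Note one assumption. Python's re is stricter than the matcher described here: it requires every branch of a lookbehind to have the same width. The mixed-length examples in this section, such as abc|abde, are therefore rejected by re even though they are valid here.

```python
import re

# (?!foo)bar still finds the "bar" in "foobar": at that point the next three
# characters are "bar", so the lookahead is trivially true.
lookahead_hit = re.search(r"(?!foo)bar", "foobar")

# (?<!foo) looks at the three characters before "bar" instead.
blocked = re.search(r"(?<!foo)bar", "foobar")
allowed = re.search(r"(?<!foo)bar", "bazbar")


def compiles(pattern):
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(lookahead_hit.start(), blocked, allowed.start())          # 3 None 3
print(compiles(r"(?<=abc|abd)x"), compiles(r"(?<=abc|abde)x"))  # True False
```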
    3861 
    3862 Several assertions (of any sort) may occur in succession.
    3863 For example,
    3864 
    3865 @example
    3866      (?<=\d@{3@})(?<!999)foo
    3867 @end example
    3868 
    3869 @noindent
    3870 matches @samp{foo} preceded by three digits that are not @samp{999}.
    3871 Notice that each of the assertions is applied independently
    3872 at the same point in the subject string. First there is a
    3873 check that the previous three characters are all digits, and
    3874 then there is a check that the same three characters are not
    3875 @samp{999}.  This pattern does not match @samp{foo} preceded by six
    3876 characters, the first three of which are digits and the last three
    3877 of which are not @samp{999}.  For example, it doesn't match
    3878 @samp{123abcfoo}. A pattern to do that is
    3879 
    3880 @example
    3881      (?<=\d@{3@}...)(?<!999)foo
    3882 @end example
    3883 
    3884 @noindent
    3885 This time the first assertion looks at the preceding six
    3886 characters, checking that the first three are digits, and
    3887 then the second assertion checks that the preceding three
    3888 characters are not @samp{999}.  Actually, assertions can be
    3889 nested in any combination, so one can write this as
    3890 
    3891 @example
    3892      (?<=\d@{3@}(?!999)...)foo
    3893 @end example
    3894 
    3895 or
    3896 
    3897 @example
    3898      (?<=\d@{3@}...(?<!999))foo
    3899 @end example
    3900 
    3901 @noindent
    3902 both of which might be considered more readable.
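The three-digit examples above translate directly to Python's re module, which is used here only as a stand-in. Every lookbehind involved has a single fixed width, so re accepts them all. The sketch shows which subjects each variant accepts.

```python
import re

PAT_A = r"(?<=\d{3})(?<!999)foo"     # both assertions see the same 3 chars
PAT_B = r"(?<=\d{3}...)(?<!999)foo"  # the first assertion spans 6 chars
PAT_C = r"(?<=\d{3}(?!999)...)foo"   # nested form of PAT_B

table = {subject: [re.search(p, subject) is not None
                   for p in (PAT_A, PAT_B, PAT_C)]
         for subject in ["123foo", "999foo", "123abcfoo", "123999foo"]}

for subject, row in table.items():
    print(subject, row)
```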
    3903 
    3904 Assertion subpatterns are not capturing subpatterns, and may
    3905 not be repeated, because it makes no sense to assert the
    3906 same thing several times. If any kind of assertion contains
    3907 capturing subpatterns within it, these are counted for the
    3908 purposes of numbering the capturing subpatterns in the whole
    3909 pattern.  However, substring capturing is carried out only
    3910 for positive assertions, because it does not make sense for
    3911 negative assertions.
    3912 
    3913 Assertions count towards the maximum of 200 parenthesized
    3914 subpatterns.
    3915 
    3916 @node Non-backtracking subpatterns
    3917 @appendixsec Non-backtracking subpatterns
    3918 @cindex Perl-style regular expressions, non-backtracking subpatterns
    3919 
    3920 With both maximizing and minimizing repetition, failure of
    3921 what follows normally causes the repeated item to be evaluated
    3922 again to see if a different number of repeats allows the
    3923 rest of the pattern to match. Sometimes it is useful to
    3924 prevent this, either to change the nature of the match, or
    3925 to cause it to fail earlier than it otherwise might, when the
    3926 author of the pattern knows there is no point in carrying
    3927 on.
    3928 
    3929 Consider, for example, the pattern @code{\d+foo} when applied to
    3930 the subject line
    3931 
    3932 @example
    3933      123456bar
    3934 @end example
    3935 
    3936 After matching all 6 digits and then failing to match @samp{foo},
    3937 the normal action of the matcher is to try again with only 5
    3938 digits matching the @code{\d+} item, and then with 4, and so on,
    3939 before ultimately failing. Non-backtracking subpatterns
    3940 provide the means for specifying that once a portion of the
    3941 pattern has matched, it is not to be re-evaluated in this way,
    3942 so the matcher would give up immediately on failing to match
    3943 @samp{foo} the first time.  The notation is another kind of special
    3944 parenthesis, starting with @code{(?>} as in this example:
    3945 
    3946 @example
    3947      (?>\d+)foo
    3948 @end example
    3949 
    3950 This kind of parenthesis ``locks up'' the part of the pattern
    3951 it contains once it has matched, and a failure further into
    3952 the pattern is prevented from backtracking into it.
    3953 Backtracking past it to previous items, however, works as
    3954 normal.
    3955 
    3956 Non-backtracking subpatterns are not capturing subpatterns.  Simple
    3957 cases such as the above example can be thought of as a maximizing
    3958 repeat that must swallow everything it can.  So,
    3959 while both @code{\d+} and @code{\d+?} are prepared to adjust the number of
    3960 digits they match in order to make the rest of the pattern
    3961 match, @code{(?>\d+)} can only match an entire sequence of digits.
    3962 
    3963 This construction can of course contain arbitrarily complicated
    3964 subpatterns, and it can be nested.
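The "locking up" effect can be demonstrated in Python's re module, used here only as a stand-in. Python 3.11 and later accept (?>...) directly. So that the sketch also runs on older versions, it uses the common emulation (?=(...))\1 instead. This works because lookaheads in re are atomic: once one succeeds, the engine never re-enters it to try a shorter match.

```python
import re

# Emulated non-backtracking \d+: capture the longest run of digits inside a
# lookahead, then consume exactly that text with \1.
ATOMIC_DIGITS = r"(?=(\d+))\1"

# An ordinary \d+ gives back its last digit so that the final \d can match ...
plain = re.fullmatch(r"\d+\d", "123")
# ... but the locked-up run keeps all three digits, so this can never match.
locked = re.fullmatch(ATOMIC_DIGITS + r"\d", "123")
# When nothing has to be given back, both forms match the same text.
same = re.match(ATOMIC_DIGITS + "foo", "123456foo")

print(plain is not None, locked is None, same.group())  # True True 123456foo
```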
    3965 
    3966 @cindex Perl-style regular expressions, lookbehind subpatterns
    3967 Non-backtracking subpatterns can be used in conjunction with lookbehind
    3968 assertions to specify efficient matching at the end
    3969 of the subject string. Consider a simple pattern such as
    3970 
    3971 @example
    3972      abcd$
    3973 @end example
    3974 
    3975 @noindent
    3976 when applied to a long string which does not match.  Because
    3977 matching proceeds from left to right, @command{sed} will look for
    3978 each @samp{a} in the subject and then see if what follows matches
    3979 the rest of the pattern. If the pattern is specified as
    3980 
    3981 @example
    3982      ^.*abcd$
    3983 @end example
    3984 
    3985 @noindent
    3986 the initial @code{.*} matches the entire string at first, but when
    3987 this fails (because there is no following @samp{a}), it backtracks
    3988 to match all but the last character, then all but the
    3989 last two characters, and so on. Once again the search for
    3990 @samp{a} covers the entire string, from right to left, so we are
    3991 no better off. However, if the pattern is written as
    3992 
    3993 @example
    3994      ^(?>.*)(?<=abcd)
    3995 @end example
    3996 
    3997 there can be no backtracking for the @code{.*} item; it can match
    3998 only the entire string. The subsequent lookbehind assertion
    3999 does a single test on the last four characters. If it fails,
    4000 the match fails immediately. For long strings, this approach
    4001 makes a significant difference to the processing time.
    4002 
    4003 When a pattern contains an unlimited repeat inside a subpattern
    4004 that can itself be repeated an unlimited number of
    4005 times, the use of a non-backtracking subpattern is the only way to
    4006 avoid some failing matches taking a very long time
    4007 indeed.@footnote{Actually, the matcher embedded in @value{SSED}
    4008     tries to do something for this in the simplest cases,
    4009     like @code{([^b]*b)*}.  These cases are actually quite
    4010     common: they happen for example in a regular expression
    4011     like @code{\/\*([^*]*\*)*\/} which matches C comments.}
    4012 
    4013 The pattern
    4014 
    4015 @example
    4016      (\D+|<\d+>)*[!?]
    4017 @end example
    4018 
    4019
    4020 
    4021 @noindent
    4022 matches an unlimited number of substrings that either consist
    4023 of non-digits, or digits enclosed in angle brackets, followed by
    4024 an exclamation or question mark. When it matches, it runs quickly.
    4025 However, if it is applied to
    4026 
    4027 @example
    4028      aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    4029 @end example
    4030 
    4031 @noindent
    4032 it takes a long time before reporting failure.  This is
    4033 because the string can be divided between the two repeats in
    4034 a large number of ways, and all have to be tried.@footnote{The
    4035 example used @code{[!?]} rather than a single character at the end,
    4036 because both @value{SSED} and Perl have an optimization that allows
    4037 for fast failure when a single character is used. They
    4038 remember the last single character that is required for a
    4039 match, and fail early if it is not present in the string.}
    4040 
    4041 If the pattern is changed to
    4042 
    4043 @example
    4044      ((?>\D+)|<\d+>)*[!?]
    4045 @end example
    4046 
    4047 sequences of non-digits cannot be broken, and failure happens
    4048 quickly.
    4049 
    4050 @node Conditional subpatterns
    4051 @appendixsec Conditional subpatterns
    4052 @cindex Perl-style regular expressions, conditional subpatterns
    4053 
    4054 It is possible to cause the matching process to obey a subpattern
    4055 conditionally or to choose between two alternative
    4056 subpatterns, depending on the result of an assertion, or
    4057 whether a previous capturing subpattern matched or not. The
    4058 two possible forms of conditional subpattern are
    4059 
    4060 @example
    4061      (?(@var{condition})@var{yes-pattern})
    4062      (?(@var{condition})@var{yes-pattern}|@var{no-pattern})
    4063 @end example
    4064 
    4065 If the condition is satisfied, the yes-pattern is used; otherwise
    4066 the no-pattern (if present) is used. If there are more than two
    4067 alternatives in the subpattern, a compile-time error occurs.
    4068 
    4069 There are two kinds of condition. If the text between the
    4070 parentheses consists of a sequence of digits, the condition
    4071 is satisfied if the capturing subpattern of that number has
    4072 previously matched.  The number must be greater than zero.
    4073 Consider the following pattern, which contains non-significant
    4074 white space to make it more readable (assume the @code{X} modifier)
    4075 and to divide it into three parts for ease of discussion:
    4076 
    4077 @example
    4078      ( \( )?   [^()]+   (?(1) \) )
    4079 @end example
    4080 
    4081 The first part matches an optional opening parenthesis, and
    4082 if that character is present, sets it as the first captured
    4083 substring. The second part matches one or more characters
    4084 that are not parentheses. The third part is a conditional
    4085 subpattern that tests whether the first set of parentheses
    4086 matched or not.  If they did, that is, if the subject started
    4087 with an opening parenthesis, the condition is true, and so
    4088 the yes-pattern is executed and a closing parenthesis is
    4089 required. Otherwise, since no-pattern is not present, the
    4090 subpattern matches nothing.  In other words, this pattern
    4091 matches a sequence of non-parentheses, optionally enclosed
    4092 in parentheses.
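The first conditional example can be tried in Python's re module, used here only as a stand-in. re supports conditions that test a group number, but not conditions that are assertions, so only this form is shown. re.VERBOSE plays the role of the X modifier, so the spaces in the pattern are ignored.

```python
import re

# ( \( )?  [^()]+  (?(1) \) )  written exactly as in the text above.
OPTIONAL_PARENS = re.compile(r"( \( )?  [^()]+  (?(1) \) )", re.VERBOSE)

results = {s: OPTIONAL_PARENS.fullmatch(s) is not None
           for s in ["abc", "(abc)", "(abc", "abc)"]}
print(results)
```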
    4093 
    4094 @cindex Perl-style regular expressions, lookahead subpatterns
    4095 If the condition is not a sequence of digits, it must be an
    4096 assertion.  This may be a positive or negative lookahead or
    4097 lookbehind assertion. Consider this pattern, again containing
    4098 non-significant white space, and with the two alternatives
    4099 on the second line:
    4100 
    4101 @example
    4102      (?(?=...[a-z])
    4103         \d\d-[a-z]@{3@}-\d\d |
    4104         \d\d-\d\d-\d\d )
    4105 @end example
    4106 
    4107 The condition is a positive lookahead assertion that matches
    4108 a letter that is three characters away from the current point.
    4109 If a letter is found, the subject is matched against the first
    4110 alternative @samp{@var{dd}-@var{aaa}-@var{dd}} (where @var{aaa} are
    4111 letters and @var{dd} are digits); otherwise it is matched against
    4112 the second alternative, @samp{@var{dd}-@var{dd}-@var{dd}}.
    4113 
    4114 
    4115 @node Recursive patterns
    4116 @appendixsec Recursive patterns
    4117 @cindex Perl-style regular expressions, recursive patterns
    4118 @cindex Perl-style regular expressions, recursion
    4119 
    4120 Consider the problem of matching a string in parentheses,
    4121 allowing for unlimited nested parentheses. Without the use
    4122 of recursion, the best that can be done is to use a pattern
    4123 that matches up to some fixed depth of nesting. It is not
    4124 possible to handle an arbitrary nesting depth. Perl 5.6 has
    4125 provided an experimental facility that allows regular
    4126 expressions to recurse (amongst other things). It does this
    4127 by interpolating Perl code in the expression at run time,
    4128 and the code can refer to the expression itself. A Perl pattern
    4129 to solve the parentheses problem can be created like
    4130 this:
    4131 
    4132 @example
    4133      $re = qr@{\( (?: (?>[^()]+) | (?p@{$re@}) )* \)@}x;
    4134 @end example
    4135 
    4136 The @code{(?p@{...@})} item interpolates Perl code at run time,
    4137 and in this case refers recursively to the pattern in which it
    4138 appears. Obviously, @command{sed} cannot support the interpolation of
    4139 Perl code.  Instead, the special item @code{(?R)} is provided for
    4140 the specific case of recursion. This pattern solves the
    4141 parentheses problem (assume the @code{X} modifier option is used
    4142 so that white space is ignored):
    4143 
    4144 @example
    4145      \( ( (?>[^()]+) | (?R) )* \)
    4146 @end example
    4147 
    4148 First it matches an opening parenthesis. Then it matches any
    4149 number of substrings which can either be a sequence of
    4150 non-parentheses, or a recursive match of the pattern itself
    4151 (i.e. a correctly parenthesized substring). Finally there is
    4152 a closing parenthesis.
    4153 
    4154 This particular example pattern contains nested unlimited
    4155 repeats, and so the use of a non-backtracking subpattern for
    4156 matching strings of non-parentheses is important when applying
    4157 the pattern to strings that do not match. For example, when
    4158 it is applied to
    4159 
    4160 @example
    4161      (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
    4162 @end example
    4163 
    4164 it yields a ``no match'' response quickly. However, if a
    4165 non-backtracking subpattern is not used, the match runs
    4166 for a very long time indeed because there are so many different
    4167 ways the @code{+} and @code{*} repeats can carve up the subject,
    4168 and all have to be tested before failure can be reported.
    4169 
    4170 The values set for any capturing subpatterns are those from
    4171 the outermost level of the recursion at which the subpattern
    4172 value is set. If the pattern above is matched against
    4173 
    4174 @example
    4175      (ab(cd)ef)
    4176 @end example
    4177 
    4178 @noindent
    4179 the value for the capturing parentheses is @samp{ef}, which is
    4180 the last value taken on at the top level.
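Python's re module has no (?R) item, so the following is not a regular expression. It is a hand-written sketch: a hypothetical match_balanced function that follows the recursive pattern above step by step. It takes runs of non-parentheses without backtracking, recurses at each opening parenthesis, and reports the value the capturing group last took at the outermost level.

```python
from typing import Optional, Tuple


def match_balanced(s: str, pos: int = 0) -> Optional[Tuple[int, Optional[str]]]:
    r"""Mirror \( ( (?>[^()]+) | (?R) )* \) anchored at pos.

    Returns (end, last): end is the index just past the closing parenthesis,
    last is the final value of the capturing group at this level (None if
    the group never matched).  Returns None if the pattern fails at pos.
    """
    if pos >= len(s) or s[pos] != "(":
        return None                       # \( must match first
    i = pos + 1
    last = None
    while i < len(s) and s[i] != ")":
        if s[i] == "(":
            inner = match_balanced(s, i)  # the (?R) alternative
            if inner is None:
                return None               # neither alternative nor \) fits here
            end = inner[0]
        else:
            end = i                       # (?>[^()]+): whole run, never given back
            while end < len(s) and s[end] not in "()":
                end += 1
        last = s[i:end]                   # the group keeps its latest value
        i = end
    if i >= len(s):
        return None                       # input ended before \)
    return i + 1, last


print(match_balanced("(ab(cd)ef)"))  # (10, 'ef')
print(match_balanced("(a(b)c"))      # None
```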
    4181 
    4182 @node Comments
    4183 @appendixsec Comments
    4184 @cindex Perl-style regular expressions, comments
    4185 
    4186 The sequence @code{(?#} marks the start of a comment which continues
    4187 up to the next closing parenthesis. Nested parentheses
    4188 are not permitted. The characters that make up a comment
    4189 play no part in the pattern matching at all.
    4190 
    4191 @cindex Perl-style regular expressions, extended
    4192 If the @code{X} modifier option is used, an unescaped @code{#} character
    4193 outside a character class introduces a comment that continues
    4194 up to the next newline character in the pattern.
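Both kinds of comment can be tried in Python's re module, used here only as a stand-in. Its re.VERBOSE flag is the counterpart of the X modifier: it ignores whitespace and treats an unescaped # outside a character class as the start of a comment.

```python
import re

# (?#...) is an inline comment; it plays no part in matching.
INLINE = re.compile(r"\d+(?# year )-\d+(?# month )")

# With re.VERBOSE, # starts a comment running to the end of the line, but
# inside a character class it is an ordinary character.
EXTENDED = re.compile(r"""
    \d+      # year
    -        # separator
    [#]\d+   # a literal '#' followed by the month
""", re.VERBOSE)

print(INLINE.fullmatch("2024-09") is not None)     # True
print(EXTENDED.fullmatch("2024-#09") is not None)  # True
```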
    4195 @end ifset
     5857
     5858
     5859@page
     5860@node GNU Free Documentation License
     5861@appendix GNU Free Documentation License
     5862
     5863@include fdl.texi
    41965864
    41975865
  • trunk/src/sed/doc/sed.x

    r599 r3613  
    1 .SH NAME
    2 sed \- a Stream EDitor
    3 .SH SYNOPSIS
     1[NAME]
     2sed \- stream editor for filtering and transforming text
     3[SYNOPSIS]
    44.nf
    55sed [-V] [--version] [--help] [-n] [--quiet] [--silent]
    66    [-l N] [--line-length=N] [-u] [--unbuffered]
    7     [-r] [--regexp-extended]
     7    [-E] [-r] [--regexp-extended]
    88    [-e script] [--expression=script]
    99    [-f script-file] [--file=script-file]
     
    4343.RI # comment
    4444The comment extends until the next newline (or the end of a
    45 .B -e
     45.B \-e
    4646script fragment).
    4747.TP
     
    6868which has each embedded newline preceded by a backslash.
    6969.TP
    70 q
     70q [\fIexit-code\fR]
    7171Immediately quit the \*(sd script without processing
    72 any more input,
    73 except that if auto-print is not disabled
    74 the current pattern space will be printed.
    75 .TP
    76 Q
     72any more input, except that if auto-print is not disabled
     73the current pattern space will be printed.  The exit code
     74argument is a GNU extension.
     75.TP
     76Q [\fIexit-code\fR]
    7777Immediately quit the \*(sd script without processing
    78 any more input.
     78any more input.  This is a GNU extension.
    7979.TP
    8080.RI r\  filename
     
    8585Append a line read from
    8686.IR filename .
     87Each invocation of the command reads a line from the file.
     88This is a GNU extension.
    8789.SS
    8890Commands which accept address ranges
     
    98100is omitted, branch to end of script.
    99101.TP
    100 .RI t\  label
    101 If a s/// has done a successful substitution since the
    102 last input line was read and since the last t or T
    103 command, then branch to
    104 .IR label ;
    105 if
    106 .I label
    107 is omitted, branch to end of script.
    108 .TP
    109 .RI T\  label
    110 If no s/// has done a successful substitution since the
    111 last input line was read and since the last t or T
    112 command, then branch to
    113 .IR label ;
    114 if
    115 .I label
    116 is omitted, branch to end of script.
    117 .TP
    118102c \e
    119103.TP
     
    128112.TP
    129113D
    130 Delete up to the first embedded newline in the pattern space.
    131 Start next cycle, but skip reading from the input
    132 if there is still data in the pattern space.
     114If pattern space contains no newline, start a normal new cycle as if
     115the d command was issued.  Otherwise, delete text in the pattern
     116space up to the first newline, and restart cycle with the resultant
     117pattern space, without reading a new line of input.
    133118.TP
    134119h H
     
    138123Copy/append hold space to pattern space.
    139124.TP
    140 x
    141 Exchange the contents of the hold and pattern spaces.
    142 .TP
    143125l
    144126List out the current line in a ``visually unambiguous'' form.
     127.TP
     128.RI l\  width
     129List out the current line in a ``visually unambiguous'' form,
     130breaking it at
     131.I width
     132characters.  This is a GNU extension.
    145133.TP
    146134n N
     
    169157.IR regexp .
    170158.TP
     159.RI t\  label
     160If a s/// has done a successful substitution since the
     161last input line was read and since the last t or T
     162command, then branch to
     163.IR label ;
     164if
     165.I label
     166is omitted, branch to end of script.
     167.TP
     168.RI T\  label
     169If no s/// has done a successful substitution since the
     170last input line was read and since the last t or T
     171command, then branch to
     172.IR label ;
     173if
     174.I label
     175is omitted, branch to end of script.  This is a GNU
     176extension.
     177.TP
    171178.RI w\  filename
    172179Write the current pattern space to
     
    176183Write the first line of the current pattern space to
    177184.IR filename .
     185This is a GNU extension.
     186.TP
     187x
     188Exchange the contents of the hold and pattern spaces.
    178189.TP
    179190.RI y/ source / dest /
     
    223234.I number
    224235Match only the specified line
    225 .IR number .
     236.IR number
     237(which increments cumulatively across files, unless the
     238.B \-s
     239option is specified on the command line).
    226240.TP
    227241.IR first ~ step
     
    230244line starting with line
    231245.IR first .
    232 For example, ``sed -n 1~2p'' will print all the odd-numbered lines in
     246For example, ``sed \-n 1~2p'' will print all the odd-numbered lines in
    233247the input stream, and the address 2~5 will match every fifth line,
    234 starting with the second. (This is an extension.)
     248starting with the second.
     249.I first
     250can be zero; in this case, \*(sd operates as if it were equal to
     251.IR step .
     252(This is an extension.)
    235253.TP
    236254$
     
    240258Match lines matching the regular expression
    241259.IR regexp .
     260Matching is performed on the current pattern space, which
     261can be modified with commands such as ``s///''.
    242262.TP
    243263.BI \fR\e\fPc regexp c
     
    263283.RI 1, addr2
    264284form will still be at the beginning of its range.
     285This works only when
     286.I addr2
     287is a regular expression.
    265288.TP
    266289.IR addr1 ,+ N
     
    292315.BR \et ,
    293316and other sequences.
     317The \fI-E\fP option switches to using extended regular expressions instead;
     318it has been supported for years by GNU sed, and is now
     319included in POSIX.
    294320
    295321[SEE ALSO]
     
    308334.PP
    309335E-mail bug reports to
    310 .BR bonzini@gnu.org .
    311 Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
    312 Also, please include the output of ``sed --version'' in the body
     336.BR bug-sed@gnu.org .
     337Also, please include the output of ``sed \-\-version'' in the body
    313338of your report if at all possible.
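Several GNU extensions described in the updated man page text can be tried directly from a shell. These are the zero `first` in a `first~step` address, the `T` command, and `-E`. The sketch below is illustrative only. It assumes GNU sed 4.x is first on `PATH`, and the input lines are made up for the demonstration.

```shell
#!/bin/sh
# Illustrative sketch; assumes GNU sed 4.x (first~step, T and -E are GNU features).

# first~step with first = 0: sed acts as if first were equal to step,
# so 0~3 selects lines 3, 6, ...  Prints "3" and "6".
printf '1\n2\n3\n4\n5\n6\n7\n' | sed -n '0~3p'

# 1~2 selects the odd-numbered lines.  Prints "1", "3" and "5".
printf '1\n2\n3\n4\n5\n' | sed -n '1~2p'

# T branches to the end of the script when no s/// has succeeded since the
# last input line was read, so only lines that contained "x" gain a "!".
# Prints "ay!" and "b".
printf 'ax\nb\n' | sed 's/x/y/;T;s/$/!/'

# -E switches to extended regular expressions, so "+" needs no backslash.
# Prints "b".
printf 'aaa\n' | sed -E 's/a+/b/'
```

`T` is the complement of `t`. Pairing `s///` with `T` lets the remaining commands run only on lines where the substitution actually matched, without needing an explicit label.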
  • trunk/src/sed/doc/stamp-vti

    r599 r3613  
    1 @set UPDATED 30 January 2006
    2 @set UPDATED-MONTH January 2006
    3 @set EDITION 4.1.5
    4 @set VERSION 4.1.5
     1@set UPDATED 1 January 2022
     2@set UPDATED-MONTH January 2022
     3@set EDITION 4.9
     4@set VERSION 4.9
  • trunk/src/sed/doc/version.texi

    r599 r3613  
    1 @set UPDATED 30 January 2006
    2 @set UPDATED-MONTH January 2006
    3 @set EDITION 4.1.5
    4 @set VERSION 4.1.5
     1@set UPDATED 1 January 2022
     2@set UPDATED-MONTH January 2022
     3@set EDITION 4.9
     4@set VERSION 4.9