Changeset 3613 for trunk/src/sed/doc
- Timestamp:
- Sep 19, 2024, 2:34:43 AM
- Location:
- trunk/src/sed
- Files:
-
- 6 deleted
- 8 edited
- 4 copied
-
trunk/src/sed
-
Property svn:mergeinfo
set to
/vendor/sed/current merged eligible
-
Property svn:mergeinfo
set to
-
trunk/src/sed/doc/config.texi
r599 r3613 6 6 7 7 @clear PERL 8 @set SSEDEXT @acronym{GNU} extensions 9 @set SSED @acronym{GNU} @command{sed} 8 @set SSEDEXT GNU extensions 9 @set SSED GNU @command{sed} 10 11 @c Ugly hack to enable using new texinfo commands '@codequotebacktick' 12 @c and '@codequoteundirected' or define empty fallbacks if they are 13 @c not available. 14 15 @ifclear txicommandconditionals 16 @c If we got here, this is a REALLY old texinfo (pre 5.0), 17 @c and '@ifcommandnotdefined' is not defined. 18 @c Assume these commands are not defined as well. 19 @macro codequotebacktick 20 @end macro 21 @macro codequoteundirected 22 @end macro 23 @end ifclear 24 25 @ifset txicommandconditionals 26 @c if we got here, this texinfo supports checking for defined 27 @c commands. If these commands aren't available - define empty 28 @c fallbacks. 29 @ifcommandnotdefined codequotebacktick 30 @macro codequotebacktick 31 @end macro 32 @macro codequoteundirected 33 @end macro 34 @end ifcommandnotdefined 35 @end ifset 36 37 38 @c define variables that will render as characters 39 @c on both HTML (with @U{}) and PDF (with greek symbols). 40 @c Use with: @value{ucsigma} 41 @c 42 @c Based on: 43 @c https://lists.gnu.org/archive/html/help-texinfo/2012-06/msg00004.html 44 @iftex 45 @set ucsigma @math{@Sigma{}} 46 @end iftex 47 @ifnottex 48 @set ucsigma @U{03A3} 49 @end ifnottex 50 51 @iftex 52 @set lcsigma @math{@sigma{}} 53 @end iftex 54 @ifnottex 55 @set lcsigma @U{03C3} 56 @end ifnottex 57 58 @c Unicode Replacement Character (U+FFFD): 59 @c no easy/portable tex equivalent, so use another 60 @c distinct symbol (which will be rendered very differently 61 @c than ascii characters in @examples. 62 @iftex 63 @set unicodeFFFD @math{@otimes{}} 64 @end iftex 65 @ifnottex 66 @set unicodeFFFD @U{FFFD} 67 @end ifnottex -
trunk/src/sed/doc/sed.1
r599 r3613 1 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1. 28.2 .TH SED "1" " February 2006" "sed version 4.1.4" "User Commands"1 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.48.5. 2 .TH SED "1" "November 2022" "GNU sed 4.9" "User Commands" 3 3 .SH NAME 4 4 sed \- stream editor for filtering and transforming text 5 5 .SH SYNOPSIS 6 .B sed 7 [\fIOPTION\fR]... \fI{script-only-if-no-other-script} \fR[\fIinput-file\fR]... 6 .nf 7 sed [-V] [--version] [--help] [-n] [--quiet] [--silent] 8 [-l N] [--line-length=N] [-u] [--unbuffered] 9 [-E] [-r] [--regexp-extended] 10 [-e script] [--expression=script] 11 [-f script-file] [--file=script-file] 12 [script-if-no-other-script] 13 [file...] 14 .fi 8 15 .SH DESCRIPTION 9 16 .ds sd \fIsed\fP … … 25 32 suppress automatic printing of pattern space 26 33 .HP 27 \fB\-e\fR script, \fB\-\-expression\fR=\fIscript\fR 34 \fB\-\-debug\fR 35 .IP 36 annotate program execution 37 .HP 38 \fB\-e\fR script, \fB\-\-expression\fR=\fI\,script\/\fR 28 39 .IP 29 40 add the script to the commands to be executed 30 41 .HP 31 \fB\-f\fR script-file, \fB\-\-file\fR=\fIscript\-file\fR 32 .IP 33 add the contents of script-file to the commands to be executed 34 .HP 35 \fB\-i[SUFFIX]\fR, \fB\-\-in\-place\fR[=\fISUFFIX\fR] 36 .IP 37 edit files in place (makes backup if extension supplied) 38 .HP 39 \fB\-l\fR N, \fB\-\-line\-length\fR=\fIN\fR 40 .IP 41 specify the desired line-wrap length for the `l' command 42 \fB\-f\fR script\-file, \fB\-\-file\fR=\fI\,script\-file\/\fR 43 .IP 44 add the contents of script\-file to the commands to be executed 45 .HP 46 \fB\-\-follow\-symlinks\fR 47 .IP 48 follow symlinks when processing in place 49 .HP 50 \fB\-i[SUFFIX]\fR, \fB\-\-in\-place\fR[=\fI\,SUFFIX\/\fR] 51 .IP 52 edit files in place (makes backup if SUFFIX supplied) 53 .HP 54 \fB\-l\fR N, \fB\-\-line\-length\fR=\fI\,N\/\fR 55 .IP 56 specify the desired line\-wrap length for the `l' command 42 57 .HP 43 58 \fB\-\-posix\fR … … 45 60 
disable all GNU extensions. 46 61 .HP 47 \fB\-r\fR, \fB\-\-regexp\-extended\fR 48 .IP 49 use extended regular expressions in the script. 62 \fB\-E\fR, \fB\-r\fR, \fB\-\-regexp\-extended\fR 63 .IP 64 use extended regular expressions in the script 65 (for portability use POSIX \fB\-E\fR). 50 66 .HP 51 67 \fB\-s\fR, \fB\-\-separate\fR 52 68 .IP 53 consider files as separate rather than as a single continuous 54 long stream. 69 consider files as separate rather than as a single, 70 continuous long stream. 71 .HP 72 \fB\-\-sandbox\fR 73 .IP 74 operate in sandbox mode (disable e/r/w commands). 55 75 .HP 56 76 \fB\-u\fR, \fB\-\-unbuffered\fR … … 58 78 load minimal amounts of data from the input files and flush 59 79 the output buffers more often 80 .HP 81 \fB\-z\fR, \fB\-\-null\-data\fR 82 .IP 83 separate lines by NUL characters 60 84 .TP 61 85 \fB\-\-help\fR … … 66 90 .PP 67 91 If no \fB\-e\fR, \fB\-\-expression\fR, \fB\-f\fR, or \fB\-\-file\fR option is given, then the first 68 non -option argument is taken as the sed script to interpret. All92 non\-option argument is taken as the sed script to interpret. All 69 93 remaining arguments are names of input files; if no input files are 70 94 specified, then the standard input is read. 71 95 .PP 72 E-mail bug reports to: bonzini@gnu.org . 73 Be sure to include the word ``sed'' somewhere in the ``Subject:'' field. 96 GNU sed home page: <https://www.gnu.org/software/sed/>. 97 General help using GNU software: <https://www.gnu.org/gethelp/>. 98 E\-mail bug reports to: <bug\-sed@gnu.org>. 74 99 .SH "COMMAND SYNOPSIS" 75 100 This is just a brief synopsis of \*(sd commands to serve as … … 89 114 .RI # comment 90 115 The comment extends until the next newline (or the end of a 91 .B -e116 .B \-e 92 117 script fragment). 93 118 .TP … … 114 139 which has each embedded newline preceded by a backslash. 
115 140 .TP 116 q 141 q [\fIexit-code\fR] 117 142 Immediately quit the \*(sd script without processing 118 any more input, 119 except that if auto-print is not disabled 120 the current pattern space will be printed.121 .TP 122 Q 143 any more input, except that if auto-print is not disabled 144 the current pattern space will be printed. The exit code 145 argument is a GNU extension. 146 .TP 147 Q [\fIexit-code\fR] 123 148 Immediately quit the \*(sd script without processing 124 any more input. 149 any more input. This is a GNU extension. 125 150 .TP 126 151 .RI r\ filename … … 131 156 Append a line read from 132 157 .IR filename . 158 Each invocation of the command reads a line from the file. 159 This is a GNU extension. 133 160 .SS 134 161 Commands which accept address ranges … … 144 171 is omitted, branch to end of script. 145 172 .TP 146 .RI t\ label147 If a s/// has done a successful substitution since the148 last input line was read and since the last t or T149 command, then branch to150 .IR label ;151 if152 .I label153 is omitted, branch to end of script.154 .TP155 .RI T\ label156 If no s/// has done a successful substitution since the157 last input line was read and since the last t or T158 command, then branch to159 .IR label ;160 if161 .I label162 is omitted, branch to end of script.163 .TP164 173 c \e 165 174 .TP … … 174 183 .TP 175 184 D 176 Delete up to the first embedded newline in the pattern space. 177 Start next cycle, but skip reading from the input 178 if there is still data in the pattern space. 185 If pattern space contains no newline, start a normal new cycle as if 186 the d command was issued. Otherwise, delete text in the pattern 187 space up to the first newline, and restart cycle with the resultant 188 pattern space, without reading a new line of input. 179 189 .TP 180 190 h H … … 184 194 Copy/append hold space to pattern space. 
185 195 .TP 186 x187 Exchange the contents of the hold and pattern spaces.188 .TP189 196 l 190 197 List out the current line in a ``visually unambiguous'' form. 198 .TP 199 .RI l\ width 200 List out the current line in a ``visually unambiguous'' form, 201 breaking it at 202 .I width 203 characters. This is a GNU extension. 191 204 .TP 192 205 n N … … 215 228 .IR regexp . 216 229 .TP 230 .RI t\ label 231 If a s/// has done a successful substitution since the 232 last input line was read and since the last t or T 233 command, then branch to 234 .IR label ; 235 if 236 .I label 237 is omitted, branch to end of script. 238 .TP 239 .RI T\ label 240 If no s/// has done a successful substitution since the 241 last input line was read and since the last t or T 242 command, then branch to 243 .IR label ; 244 if 245 .I label 246 is omitted, branch to end of script. This is a GNU 247 extension. 248 .TP 217 249 .RI w\ filename 218 250 Write the current pattern space to … … 222 254 Write the first line of the current pattern space to 223 255 .IR filename . 256 This is a GNU extension. 257 .TP 258 x 259 Exchange the contents of the hold and pattern spaces. 224 260 .TP 225 261 .RI y/ source / dest / … … 269 305 .I number 270 306 Match only the specified line 271 .IR number . 307 .IR number 308 (which increments cumulatively across files, unless the 309 .B \-s 310 option is specified on the command line). 272 311 .TP 273 312 .IR first ~ step … … 276 315 line starting with line 277 316 .IR first . 278 For example, ``sed -n 1~2p'' will print all the odd-numbered lines in317 For example, ``sed \-n 1~2p'' will print all the odd-numbered lines in 279 318 the input stream, and the address 2~5 will match every fifth line, 280 starting with the second. (This is an extension.) 319 starting with the second. 320 .I first 321 can be zero; in this case, \*(sd operates as if it were equal to 322 .IR step . 323 (This is an extension.) 
281 324 .TP 282 325 $ … … 286 329 Match lines matching the regular expression 287 330 .IR regexp . 331 Matching is performed on the current pattern space, which 332 can be modified with commands such as ``s///''. 288 333 .TP 289 334 .BI \fR\e\fPc regexp c … … 309 354 .RI 1, addr2 310 355 form will still be at the beginning of its range. 356 This works only when 357 .I addr2 358 is a regular expression. 311 359 .TP 312 360 .IR addr1 ,+ N … … 337 385 .BR \et , 338 386 and other sequences. 387 The \fI-E\fP option switches to using extended regular expressions instead; 388 it has been supported for years by GNU sed, and is now 389 included in POSIX. 339 390 .SH BUGS 340 391 .PP 341 392 E-mail bug reports to 342 .BR bonzini@gnu.org . 343 Be sure to include the word ``sed'' somewhere in the ``Subject:'' field. 344 Also, please include the output of ``sed --version'' in the body 393 .BR bug-sed@gnu.org . 394 Also, please include the output of ``sed \-\-version'' in the body 345 395 of your report if at all possible. 396 .SH AUTHOR 397 Written by Jay Fenlason, Tom Lord, Ken Pizzini, 398 Paolo Bonzini, Jim Meyering, and Assaf Gordon. 399 .PP 400 This sed program was built with SELinux support. 401 SELinux is enabled on this system. 402 .PP 403 GNU sed home page: <https://www.gnu.org/software/sed/>. 404 General help using GNU software: <https://www.gnu.org/gethelp/>. 405 E\-mail bug reports to: <bug\-sed@gnu.org>. 346 406 .SH COPYRIGHT 347 Copyright \(co 2003 Free Software Foundation, Inc. 407 Copyright \(co 2022 Free Software Foundation, Inc. 408 License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>. 348 409 .br 349 This is free software; see the source for copying conditions. There is NO 350 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, 351 to the extent permitted by law. 410 This is free software: you are free to change and redistribute it. 411 There is NO WARRANTY, to the extent permitted by law. 
352 412 .SH "SEE ALSO" 353 413 .BR awk (1), -
trunk/src/sed/doc/sed.info
r599 r3613 1 This is ../../doc/sed.info, produced by makeinfo version 4.5 from 2 ../../doc/sed.texi. 3 1 This is sed.info, produced by makeinfo version 6.8dev from sed.texi. 2 3 This file documents version 4.9 of GNU ‘sed’, a stream editor. 4 5 Copyright © 1998–2022 Free Software Foundation, Inc. 6 7 Permission is granted to copy, distribute and/or modify this 8 document under the terms of the GNU Free Documentation License, 9 Version 1.3 or any later version published by the Free Software 10 Foundation; with no Invariant Sections, no Front-Cover Texts, and 11 no Back-Cover Texts. A copy of the license is included in the 12 section entitled “GNU Free Documentation License”. 4 13 INFO-DIR-SECTION Text creation and manipulation 5 14 START-INFO-DIR-ENTRY … … 8 17 END-INFO-DIR-ENTRY 9 18 10 This file documents version 4.1.5 of GNU `sed', a stream editor. 11 12 Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004 Free Software 13 Foundation, Inc. 14 15 This document is released under the terms of the GNU Free 16 Documentation License as published by the Free Software Foundation; 17 either version 1.1, or (at your option) any later version. 18 19 You should have received a copy of the GNU Free Documentation 20 License along with GNU `sed'; see the file `COPYING.DOC'. If not, 21 write to the Free Software Foundation, 59 Temple Place - Suite 330, 22 Boston, MA 02110-1301, USA. 23 24 There are no Cover Texts and no Invariant Sections; this text, along 25 with its equivalent in the printed manual, constitutes the Title Page. 26 27 Indirect: 28 sed.info-1: 935 29 sed.info-2: 50405 19 20 File: sed.info, Node: Top, Next: Introduction, Up: (dir) 21 22 GNU ‘sed’ 23 ********* 24 25 This file documents version 4.9 of GNU ‘sed’, a stream editor. 26 27 Copyright © 1998–2022 Free Software Foundation, Inc.
28 29 Permission is granted to copy, distribute and/or modify this 30 document under the terms of the GNU Free Documentation License, 31 Version 1.3 or any later version published by the Free Software 32 Foundation; with no Invariant Sections, no Front-Cover Texts, and 33 no Back-Cover Texts. A copy of the license is included in the 34 section entitled “GNU Free Documentation License”. 35 36 * Menu: 37 38 * Introduction:: Introduction 39 * Invoking sed:: Invocation 40 * sed scripts:: ‘sed’ scripts 41 * sed addresses:: Addresses: selecting lines 42 * sed regular expressions:: Regular expressions: selecting text 43 * advanced sed:: Advanced ‘sed’: cycles and buffers 44 * Examples:: Some sample scripts 45 * Limitations:: Limitations and (non-)limitations of GNU ‘sed’ 46 * Other Resources:: Other resources for learning about ‘sed’ 47 * Reporting Bugs:: Reporting bugs 48 * GNU Free Documentation License:: Copying and sharing this manual 49 * Concept Index:: A menu with all the topics in this manual. 50 * Command and Option Index:: A menu with all ‘sed’ commands and 51 command-line options. 52 53 54 File: sed.info, Node: Introduction, Next: Invoking sed, Prev: Top, Up: Top 55 56 1 Introduction 57 ************** 58 59 ‘sed’ is a stream editor. A stream editor is used to perform basic text 60 transformations on an input stream (a file or input from a pipeline). 61 While in some ways similar to an editor which permits scripted edits 62 (such as ‘ed’), ‘sed’ works by making only one pass over the input(s), 63 and is consequently more efficient. But it is ‘sed’’s ability to filter 64 text in a pipeline which particularly distinguishes it from other types 65 of editors. 66 67 68 File: sed.info, Node: Invoking sed, Next: sed scripts, Prev: Introduction, Up: Top 69 70 2 Running sed 71 ************* 72 73 This chapter covers how to run ‘sed’. Details of ‘sed’ scripts and 74 individual ‘sed’ commands are discussed in the next chapter.
75 76 * Menu: 77 78 * Overview:: 79 * Command-Line Options:: 80 * Exit status:: 81 82 83 File: sed.info, Node: Overview, Next: Command-Line Options, Up: Invoking sed 84 85 2.1 Overview 86 ============ 87 88 Normally ‘sed’ is invoked like this: 89 90 sed SCRIPT INPUTFILE... 91 92 For example, to change every ‘hello’ to ‘world’ in the file 93 ‘input.txt’: 94 95 sed 's/hello/world/g' input.txt > output.txt 96 97 Without the ‘g’ (global) modifier, ‘sed’ affects only the first 98 instance per line. 99 100 If you do not specify INPUTFILE, or if INPUTFILE is ‘-’, ‘sed’ 101 filters the contents of the standard input. The following commands are 102 equivalent: 103 104 sed 's/hello/world/g' input.txt > output.txt 105 sed 's/hello/world/g' < input.txt > output.txt 106 cat input.txt | sed 's/hello/world/g' - > output.txt 107 108 ‘sed’ writes output to standard output. Use ‘-i’ to edit files 109 in-place instead of printing to standard output. See also the ‘W’ and 110 ‘s///w’ commands for writing output to other files. The following 111 command modifies ‘file.txt’ and does not produce any output: 112 113 sed -i 's/hello/world/' file.txt 114 115 By default ‘sed’ prints all processed input (except input that has 116 been modified/deleted by commands such as ‘d’). Use ‘-n’ to suppress 117 output, and the ‘p’ command to print specific lines. The following 118 command prints only line 45 of the input file: 119 120 sed -n '45p' file.txt 121 122 ‘sed’ treats multiple input files as one long stream. The following 123 example prints the first line of the first file (‘one.txt’) and the last 124 line of the last file (‘three.txt’). Use ‘-s’ to reverse this behavior. 125 126 sed -n '1p ; $p' one.txt two.txt three.txt 127 128 Without ‘-e’ or ‘-f’ options, ‘sed’ uses the first non-option 129 parameter as the SCRIPT, and the following non-option parameters as 130 input files. If ‘-e’ or ‘-f’ options are used to specify a SCRIPT, all 131 non-option parameters are taken as input files.
Options ‘-e’ and ‘-f’ 132 can be combined, and can appear multiple times (in which case the final 133 effective SCRIPT will be concatenation of all the individual SCRIPTs). 134 135 The following examples are equivalent: 136 137 sed 's/hello/world/' input.txt > output.txt 138 139 sed -e 's/hello/world/' input.txt > output.txt 140 sed --expression='s/hello/world/' input.txt > output.txt 141 142 echo 's/hello/world/' > myscript.sed 143 sed -f myscript.sed input.txt > output.txt 144 sed --file=myscript.sed input.txt > output.txt 145 146 147 File: sed.info, Node: Command-Line Options, Next: Exit status, Prev: Overview, Up: Invoking sed 148 149 2.2 Command-Line Options 150 ======================== 151 152 The full format for invoking ‘sed’ is: 153 154 sed OPTIONS... [SCRIPT] [INPUTFILE...] 155 156 ‘sed’ may be invoked with the following command-line options: 157 158 ‘--version’ 159 Print out the version of ‘sed’ that is being run and a copyright 160 notice, then exit. 161 162 ‘--help’ 163 Print a usage message briefly summarizing these command-line 164 options and the bug-reporting address, then exit. 165 166 ‘-n’ 167 ‘--quiet’ 168 ‘--silent’ 169 By default, ‘sed’ prints out the pattern space at the end of each 170 cycle through the script (*note How ‘sed’ works: Execution Cycle.). 171 These options disable this automatic printing, and ‘sed’ only 172 produces output when explicitly told to via the ‘p’ command. 173 174 ‘--debug’ 175 Print the input sed program in canonical form, and annotate program 176 execution. 177 $ echo 1 | sed '\%1%s21232' 178 3 179 180 $ echo 1 | sed --debug '\%1%s21232' 181 SED PROGRAM: 182 /1/ s/1/3/ 183 INPUT: 'STDIN' line 1 184 PATTERN: 1 185 COMMAND: /1/ s/1/3/ 186 PATTERN: 3 187 END-OF-CYCLE: 188 3 189 190 ‘-e SCRIPT’ 191 ‘--expression=SCRIPT’ 192 Add the commands in SCRIPT to the set of commands to be run while 193 processing the input.
194 195 ‘-f SCRIPT-FILE’ 196 ‘--file=SCRIPT-FILE’ 197 Add the commands contained in the file SCRIPT-FILE to the set of 198 commands to be run while processing the input. 199 200 ‘-i[SUFFIX]’ 201 ‘--in-place[=SUFFIX]’ 202 This option specifies that files are to be edited in-place. GNU 203 ‘sed’ does this by creating a temporary file and sending output to 204 this file rather than to the standard output.(1). 205 206 This option implies ‘-s’. 207 208 When the end of the file is reached, the temporary file is renamed 209 to the output file’s original name. The extension, if supplied, is 210 used to modify the name of the old file before renaming the 211 temporary file, thereby making a backup copy(2)). 212 213 This rule is followed: if the extension doesn’t contain a ‘*’, then 214 it is appended to the end of the current filename as a suffix; if 215 the extension does contain one or more ‘*’ characters, then _each_ 216 asterisk is replaced with the current filename. This allows you to 217 add a prefix to the backup file, instead of (or in addition to) a 218 suffix, or even to place backup copies of the original files into 219 another directory (provided the directory already exists). 220 221 If no extension is supplied, the original file is overwritten 222 without making a backup. 223 224 Because ‘-i’ takes an optional argument, it should not be followed 225 by other short options: 226 ‘sed -Ei '...' FILE’ 227 Same as ‘-E -i’ with no backup suffix - ‘FILE’ will be edited 228 in-place without creating a backup. 229 230 ‘sed -iE '...' FILE’ 231 This is equivalent to ‘--in-place=E’, creating ‘FILEE’ as 232 backup of ‘FILE’ 233 234 Be cautious of using ‘-n’ with ‘-i’: the former disables automatic 235 printing of lines and the latter changes the file in-place without 236 a backup. Used carelessly (and without an explicit ‘p’ command), 237 the output file will be empty: 238 # WRONG USAGE: 'FILE' will be truncated.
239 sed -ni 's/foo/bar/' FILE 240 241 ‘-l N’ 242 ‘--line-length=N’ 243 Specify the default line-wrap length for the ‘l’ command. A length 244 of 0 (zero) means to never wrap long lines. If not specified, it 245 is taken to be 70. 246 247 ‘--posix’ 248 GNU ‘sed’ includes several extensions to POSIX sed. In order to 249 simplify writing portable scripts, this option disables all the 250 extensions that this manual documents, including additional 251 commands. Most of the extensions accept ‘sed’ programs that are 252 outside the syntax mandated by POSIX, but some of them (such as the 253 behavior of the ‘N’ command described in *note Reporting Bugs::) 254 actually violate the standard. If you want to disable only the 255 latter kind of extension, you can set the ‘POSIXLY_CORRECT’ 256 variable to a non-empty value. 257 258 ‘-b’ 259 ‘--binary’ 260 This option is available on every platform, but is only effective 261 where the operating system makes a distinction between text files 262 and binary files. When such a distinction is made--as is the case 263 for MS-DOS, Windows, Cygwin--text files are composed of lines 264 separated by a carriage return _and_ a line feed character, and 265 ‘sed’ does not see the ending CR. When this option is specified, 266 ‘sed’ will open input files in binary mode, thus not requesting 267 this special processing and considering lines to end at a line 268 feed. 269 270 ‘--follow-symlinks’ 271 This option is available only on platforms that support symbolic 272 links and has an effect only if option ‘-i’ is specified. In this 273 case, if the file that is specified on the command line is a 274 symbolic link, ‘sed’ will follow the link and edit the ultimate 275 destination of the link. The default behavior is to break the 276 symbolic link, so that the link destination will not be modified. 277 278 ‘-E’ 279 ‘-r’ 280 ‘--regexp-extended’ 281 Use extended regular expressions rather than basic regular 282 expressions.
Extended regexps are those that ‘egrep’ accepts; they 283 can be clearer because they usually have fewer backslashes. 284 Historically this was a GNU extension, but the ‘-E’ extension has 285 since been added to the POSIX standard 286 (http://austingroupbugs.net/view.php?id=528), so use ‘-E’ for 287 portability. GNU sed has accepted ‘-E’ as an undocumented option 288 for years, and *BSD seds have accepted ‘-E’ for years as well, but 289 scripts that use ‘-E’ might not port to other older systems. *Note 290 Extended regular expressions: ERE syntax. 291 292 ‘-s’ 293 ‘--separate’ 294 By default, ‘sed’ will consider the files specified on the command 295 line as a single continuous long stream. This GNU ‘sed’ extension 296 allows the user to consider them as separate files: range addresses 297 (such as ‘/abc/,/def/’) are not allowed to span several files, line 298 numbers are relative to the start of each file, ‘$’ refers to the 299 last line of each file, and files invoked from the ‘R’ commands are 300 rewound at the start of each file. 301 302 ‘--sandbox’ 303 In sandbox mode, ‘e/w/r’ commands are rejected - programs 304 containing them will be aborted without being run. Sandbox mode 305 ensures ‘sed’ operates only on the input files designated on the 306 command line, and cannot run external programs. 307 308 ‘-u’ 309 ‘--unbuffered’ 310 Buffer both input and output as minimally as practical. (This is 311 particularly useful if the input is coming from the likes of ‘tail 312 -f’, and you wish to see the transformed output as soon as 313 possible.) 314 315 ‘-z’ 316 ‘--null-data’ 317 ‘--zero-terminated’ 318 Treat the input as a set of lines, each terminated by a zero byte 319 (the ASCII ‘NUL’ character) instead of a newline. This option can 320 be used with commands like ‘sort -z’ and ‘find -print0’ to process 321 arbitrary file names.
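Editorial illustration (not part of the changeset): a few of the options listed above can be tried with the minimal sketch below. It assumes GNU sed is the `sed` on the PATH, since -s and -z are GNU extensions; the scratch directory and the file names one.txt and two.txt are hypothetical and exist only for this example.

```shell
# Hypothetical scratch files, created only for this illustration.
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/one.txt"
printf 'b1\nb2\n' > "$tmp/two.txt"

# Default: both files form one long stream, so '$' matches only the very last line.
joined=$(sed -n '$p' "$tmp/one.txt" "$tmp/two.txt")

# -s: each file is treated separately, so '$' matches the last line of every file.
separate=$(sed -s -n '$p' "$tmp/one.txt" "$tmp/two.txt")

# -E: extended regular expressions, so grouping parentheses need no backslashes.
swapped=$(printf 'key=value\n' | sed -E 's/(.*)=(.*)/\2=\1/')

# -z: records end at a NUL byte instead of a newline; tr makes the result printable.
nul=$(printf 'x y\0z\0' | sed -z 's/ /_/' | tr '\0' '\n')

rm -rf "$tmp"
```

Each result is captured in a variable rather than printed so that the effect of one option at a time can be inspected, for example with `printf '%s\n' "$separate"`.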
322 323 If no ‘-e’, ‘-f’, ‘--expression’, or ‘--file’ options are given on 324 the command-line, then the first non-option argument on the command line 325 is taken to be the SCRIPT to be executed. 326 327 If any command-line parameters remain after processing the above, 328 these parameters are interpreted as the names of input files to be 329 processed. A file name of ‘-’ refers to the standard input stream. The 330 standard input will be processed if no file names are specified. 331 332 ---------- Footnotes ---------- 333 334 (1) This applies to commands such as ‘=’, ‘a’, ‘c’, ‘i’, ‘l’, ‘p’. 335 You can still write to the standard output by using the ‘w’ or ‘W’ 336 commands together with the ‘/dev/stdout’ special file 337 338 (2) Note that GNU ‘sed’ creates the backup file whether or not any 339 output is actually changed. 340 341 342 File: sed.info, Node: Exit status, Prev: Command-Line Options, Up: Invoking sed 343 344 2.3 Exit status 345 =============== 346 347 An exit status of zero indicates success, and a nonzero value indicates 348 failure. GNU ‘sed’ returns the following exit status error values: 349 350 0 351 Successful completion. 352 353 1 354 Invalid command, invalid syntax, invalid regular expression or a 355 GNU ‘sed’ extension command used with ‘--posix’. 356 357 2 358 One or more of the input file specified on the command line could 359 not be opened (e.g. if a file is not found, or read permission is 360 denied). Processing continued with other files. 361 362 4 363 An I/O error, or a serious processing error during runtime, GNU 364 ‘sed’ aborted immediately. 365 366 Additionally, the commands ‘q’ and ‘Q’ can be used to terminate ‘sed’ 367 with a custom exit code value (this is a GNU ‘sed’ extension): 368 369 $ echo | sed 'Q42' ; echo $?
370 42 371 372 373 File: sed.info, Node: sed scripts, Next: sed addresses, Prev: Invoking sed, Up: Top 374 375 3 ‘sed’ scripts 376 *************** 377 378 * Menu: 379 380 * sed script overview:: ‘sed’ script overview 381 * sed commands list:: ‘sed’ commands summary 382 * The "s" Command:: ‘sed’’s Swiss Army Knife 383 * Common Commands:: Often used commands 384 * Other Commands:: Less frequently used commands 385 * Programming Commands:: Commands for ‘sed’ gurus 386 * Extended Commands:: Commands specific of GNU ‘sed’ 387 * Multiple commands syntax:: Extension for easier scripting 388 389 390 File: sed.info, Node: sed script overview, Next: sed commands list, Up: sed scripts 391 392 3.1 ‘sed’ script overview 393 ========================= 394 395 A ‘sed’ program consists of one or more ‘sed’ commands, passed in by one 396 or more of the ‘-e’, ‘-f’, ‘--expression’, and ‘--file’ options, or the 397 first non-option argument if zero of these options are used. This 398 document will refer to “the” ‘sed’ script; this is understood to mean 399 the in-order concatenation of all of the SCRIPTs and SCRIPT-FILEs passed 400 in. *Note Overview::. 401 402 ‘sed’ commands follow this syntax: 403 404 [addr]X[options] 405 406 X is a single-letter ‘sed’ command. ‘[addr]’ is an optional line 407 address. If ‘[addr]’ is specified, the command X will be executed only 408 on the matched lines. ‘[addr]’ can be a single line number, a regular 409 expression, or a range of lines (*note sed addresses::). Additional 410 ‘[options]’ are used for some ‘sed’ commands. 411 412 The following example deletes lines 30 to 35 in the input. ‘30,35’ 413 is an address range. ‘d’ is the delete command: 414 415 sed '30,35d' input.txt > output.txt 416 417 The following example prints all input until a line starting with the 418 string ‘foo’ is found. If such line is found, ‘sed’ will terminate with 419 exit status 42.
If such line was not found (and no other error 420 occurred), ‘sed’ will exit with status 0. ‘/^foo/’ is a 421 regular-expression address. ‘q’ is the quit command. ‘42’ is the 422 command option. 423 424 sed '/^foo/q42' input.txt > output.txt 425 426 Commands within a SCRIPT or SCRIPT-FILE can be separated by 427 semicolons (‘;’) or newlines (ASCII 10). Multiple scripts can be 428 specified with ‘-e’ or ‘-f’ options. 429 430 The following examples are all equivalent. They perform two ‘sed’ 431 operations: deleting any lines matching the regular expression ‘/^foo/’, 432 and replacing all occurrences of the string ‘hello’ with ‘world’: 433 434 sed '/^foo/d ; s/hello/world/g' input.txt > output.txt 435 436 sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt 437 438 echo '/^foo/d' > script.sed 439 echo 's/hello/world/g' >> script.sed 440 sed -f script.sed input.txt > output.txt 441 442 echo 's/hello/world/g' > script2.sed 443 sed -e '/^foo/d' -f script2.sed input.txt > output.txt 444 445 Commands ‘a’, ‘c’, ‘i’, due to their syntax, cannot be followed by 446 semicolons working as command separators and thus should be terminated 447 with newlines or be placed at the end of a SCRIPT or SCRIPT-FILE. 448 Commands can also be preceded with optional non-significant whitespace 449 characters. *Note Multiple commands syntax::. 450 451 452 File: sed.info, Node: sed commands list, Next: The "s" Command, Prev: sed script overview, Up: sed scripts 453 454 3.2 ‘sed’ commands summary 455 ========================== 456 457 The following commands are supported in GNU ‘sed’. Some are standard 458 POSIX commands, while other are GNU extensions. Details and examples 459 for each command are in the following sections. (Mnemonics) are shown 460 in parentheses. 461 462 ‘a\’ 463 ‘TEXT’ 464 Append TEXT after a line. 465 466 ‘a TEXT’ 467 Append TEXT after a line (alternative syntax). 468 469 ‘b LABEL’ 470 Branch unconditionally to LABEL.
The LABEL may be omitted, in 471 which case the next cycle is started. 472 473 ‘c\’ 474 ‘TEXT’ 475 Replace (change) lines with TEXT. 476 477 ‘c TEXT’ 478 Replace (change) lines with TEXT (alternative syntax). 479 480 ‘d’ 481 Delete the pattern space; immediately start next cycle. 482 483 ‘D’ 484 If pattern space contains newlines, delete text in the pattern 485 space up to the first newline, and restart cycle with the resultant 486 pattern space, without reading a new line of input. 487 488 If pattern space contains no newline, start a normal new cycle as 489 if the ‘d’ command was issued. 490 491 ‘e’ 492 Executes the command that is found in pattern space and replaces 493 the pattern space with the output; a trailing newline is 494 suppressed. 495 496 ‘e COMMAND’ 497 Executes COMMAND and sends its output to the output stream. The 498 command can run across multiple lines, all but the last ending with 499 a back-slash. 500 501 ‘F’ 502 (filename) Print the file name of the current input file (with a 503 trailing newline). 504 505 ‘g’ 506 Replace the contents of the pattern space with the contents of the 507 hold space. 508 509 ‘G’ 510 Append a newline to the contents of the pattern space, and then 511 append the contents of the hold space to that of the pattern space. 512 513 ‘h’ 514 (hold) Replace the contents of the hold space with the contents of 515 the pattern space. 516 517 ‘H’ 518 Append a newline to the contents of the hold space, and then append 519 the contents of the pattern space to that of the hold space. 520 521 ‘i\’ 522 ‘TEXT’ 523 insert TEXT before a line. 524 525 ‘i TEXT’ 526 insert TEXT before a line (alternative syntax). 527 528 ‘l’ 529 Print the pattern space in an unambiguous form. 530 531 ‘n’ 532 (next) If auto-print is not disabled, print the pattern space, 533 then, regardless, replace the pattern space with the next line of 534 input. If there is no more input then ‘sed’ exits without 535 processing any more commands.

‘N’
     Add a newline to the pattern space, then append the next line of
     input to the pattern space. If there is no more input then ‘sed’
     exits without processing any more commands.

‘p’
     Print the pattern space.

‘P’
     Print the pattern space, up to the first <newline>.

‘q[EXIT-CODE]’
     (quit) Exit ‘sed’ without processing any more commands or input.

‘Q[EXIT-CODE]’
     (quit) This command is the same as ‘q’, but will not print the
     contents of pattern space. Like ‘q’, it provides the ability to
     return an exit code to the caller.

‘r filename’
     Reads file FILENAME.

‘R filename’
     Queue a line of FILENAME to be read and inserted into the output
     stream at the end of the current cycle, or when the next input line
     is read.

‘s/REGEXP/REPLACEMENT/[FLAGS]’
     (substitute) Match the regular-expression against the content of
     the pattern space. If found, replace matched string with
     REPLACEMENT.

‘t LABEL’
     (test) Branch to LABEL only if there has been a successful
     ‘s’ubstitution since the last input line was read or conditional
     branch was taken. The LABEL may be omitted, in which case the next
     cycle is started.

‘T LABEL’
     (test) Branch to LABEL only if there have been no successful
     ‘s’ubstitutions since the last input line was read or conditional
     branch was taken. The LABEL may be omitted, in which case the next
     cycle is started.

‘v [VERSION]’
     (version) This command does nothing, but makes ‘sed’ fail if GNU
     ‘sed’ extensions are not supported, or if the requested version is
     not available.

‘w filename’
     Write the pattern space to FILENAME.

‘W filename’
     Write to the given filename the portion of the pattern space up to
     the first newline.

‘x’
     Exchange the contents of the hold and pattern spaces.

‘y/SOURCE-CHARS/DEST-CHARS/’
     Transliterate any characters in the pattern space which match any
     of the SOURCE-CHARS with the corresponding character in DEST-CHARS.

‘z’
     (zap) This command empties the content of pattern space.

‘#’
     A comment, until the next newline.

‘{ CMD ; CMD ... }’
     Group several commands together.

‘=’
     Print the current input line number (with a trailing newline).

‘: LABEL’
     Specify the location of LABEL for branch commands (‘b’, ‘t’, ‘T’).


File: sed.info,  Node: The "s" Command,  Next: Common Commands,  Prev: sed commands list,  Up: sed scripts

3.3 The ‘s’ Command
===================

The ‘s’ command (as in substitute) is probably the most important in
‘sed’ and has a lot of different options. The syntax of the ‘s’ command
is ‘s/REGEXP/REPLACEMENT/FLAGS’.

   Its basic concept is simple: the ‘s’ command attempts to match the
pattern space against the supplied regular expression REGEXP; if the
match is successful, then that portion of the pattern space which was
matched is replaced with REPLACEMENT.

   For details about REGEXP syntax *note Regular Expression Addresses:
Regexp Addresses.

   The REPLACEMENT can contain ‘\N’ (N being a number from 1 to 9,
inclusive) references, which refer to the portion of the match which is
contained between the Nth ‘\(’ and its matching ‘\)’. Also, the
REPLACEMENT can contain unescaped ‘&’ characters which reference the
whole matched portion of the pattern space.

   The ‘/’ characters may be uniformly replaced by any other single
character within any given ‘s’ command. The ‘/’ character (or whatever
other character is used in its stead) can appear in the REGEXP or
REPLACEMENT only if it is preceded by a ‘\’ character.
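
   As a rough illustration of the three points above, the following
sketch uses ‘|’ as the delimiter so that the slashes in the REGEXP need
no escaping, and refers to the match with ‘\1’, ‘\2’ and ‘&’. The input
strings and the variable names (‘input’, ‘swapped’, ‘wrapped’) are
hypothetical and chosen only for this example:

```shell
# Hypothetical sample input; any text containing slashes would do.
input='path=/usr/local/bin'

# '|' replaces '/' as the delimiter, so '/' can appear unescaped.
# \1 and \2 refer to the first and second \(...\) groups.
swapped=$(printf '%s\n' "$input" | sed 's|^\(path\)=\(/usr\)/local|\1=\2/opt|')

# An unescaped '&' in the REPLACEMENT stands for the whole match.
wrapped=$(printf '%s\n' 'hello world' | sed 's/hello/[&]/')

printf '%s\n' "$swapped" "$wrapped"
```

   Only features described in the paragraphs above are used here, so
the sketch should not depend on GNU extensions.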

   Finally, as a GNU ‘sed’ extension, you can include a special sequence
made of a backslash and one of the letters ‘L’, ‘l’, ‘U’, ‘u’, or ‘E’.
The meaning is as follows:

‘\L’
     Turn the replacement to lowercase until a ‘\U’ or ‘\E’ is found,

‘\l’
     Turn the next character to lowercase,

‘\U’
     Turn the replacement to uppercase until a ‘\L’ or ‘\E’ is found,

‘\u’
     Turn the next character to uppercase,

‘\E’
     Stop case conversion started by ‘\L’ or ‘\U’.

   When the ‘g’ flag is being used, case conversion does not propagate
from one occurrence of the regular expression to another. For example,
when the following command is executed with ‘a-b-’ in pattern space:
     s/\(b\?\)-/x\u\1/g

the output is ‘axxB’. When replacing the first ‘-’, the ‘\u’ sequence
only affects the empty replacement of ‘\1’. It does not affect the ‘x’
character that is added to pattern space when replacing ‘b-’ with ‘xB’.

   On the other hand, ‘\l’ and ‘\u’ do affect the remainder of the
replacement text if they are followed by an empty substitution. With
‘a-b-’ in pattern space, the following command:
     s/\(b\?\)-/\u\1x/g

will replace ‘-’ with ‘X’ (uppercase) and ‘b-’ with ‘Bx’. If this
behavior is undesirable, you can prevent it by adding a ‘\E’ sequence
(after ‘\1’ in this case).

   To include a literal ‘\’, ‘&’, or newline in the final replacement,
be sure to precede the desired ‘\’, ‘&’, or newline in the REPLACEMENT
with a ‘\’.

   The ‘s’ command can be followed by zero or more of the following
FLAGS:

‘g’
     Apply the replacement to _all_ matches to the REGEXP, not just the
     first.

‘NUMBER’
     Only replace the NUMBERth match of the REGEXP.

     Note: the POSIX standard does not specify what should happen when
     you mix the ‘g’ and NUMBER modifiers, and currently there is no
     widely agreed upon meaning across ‘sed’ implementations. For GNU
     ‘sed’, the interaction is defined to be: ignore matches before the
     NUMBERth, and then match and replace all matches from the NUMBERth
     on.

‘p’
     If the substitution was made, then print the new pattern space.

     Note: when both the ‘p’ and ‘e’ options are specified, the relative
     ordering of the two produces very different results. In general,
     ‘ep’ (evaluate then print) is what you want, but operating the
     other way round can be useful for debugging. For this reason, the
     current version of GNU ‘sed’ interprets specially the presence of
     ‘p’ options both before and after ‘e’, printing the pattern space
     before and after evaluation, while in general flags for the ‘s’
     command show their effect just once. This behavior, although
     documented, might change in future versions.

‘w FILENAME’
     If the substitution was made, then write out the result to the
     named file. As a GNU ‘sed’ extension, two special values of
     FILENAME are supported: ‘/dev/stderr’, which writes the result to
     the standard error, and ‘/dev/stdout’, which writes to the standard
     output.(1)

‘e’
     This command allows one to pipe input from a shell command into
     pattern space. If a substitution was made, the command that is
     found in pattern space is executed and pattern space is replaced
     with its output. A trailing newline is suppressed; results are
     undefined if the command to be executed contains a NUL character.
     This is a GNU ‘sed’ extension.

‘I’
‘i’
     The ‘I’ modifier to regular-expression matching is a GNU extension
     which makes ‘sed’ match REGEXP in a case-insensitive manner.

‘M’
‘m’
     The ‘M’ modifier to regular-expression matching is a GNU ‘sed’
     extension which directs GNU ‘sed’ to match the regular expression
     in “multi-line” mode. The modifier causes ‘^’ and ‘$’ to match
     respectively (in addition to the normal behavior) the empty string
     after a newline, and the empty string before a newline. There are
     special character sequences (‘\`’ and ‘\'’) which always match the
     beginning or the end of the buffer. In addition, the period
     character does not match a new-line character in multi-line mode.

   ---------- Footnotes ----------

   (1) This is equivalent to ‘p’ unless the ‘-i’ option is being used.


File: sed.info,  Node: Common Commands,  Next: Other Commands,  Prev: The "s" Command,  Up: sed scripts

3.4 Often-Used Commands
=======================

If you use ‘sed’ at all, you will quite likely want to know these
commands.

‘#’
     [No addresses allowed.]

     The ‘#’ character begins a comment; the comment continues until the
     next newline.

     If you are concerned about portability, be aware that some
     implementations of ‘sed’ (which are not POSIX conforming) may only
     support a single one-line comment, and then only when the very
     first character of the script is a ‘#’.

     Warning: if the first two characters of the ‘sed’ script are ‘#n’,
     then the ‘-n’ (no-autoprint) option is forced. If you want to put
     a comment in the first line of your script and that comment begins
     with the letter ‘n’ and you do not want this behavior, then be sure
     to either use a capital ‘N’, or place at least one space before the
     ‘n’.

‘q [EXIT-CODE]’
     Exit ‘sed’ without processing any more commands or input.

     Example: stop after printing the second line:
          $ seq 3 | sed 2q
          1
          2

     This command accepts only one address.
     Note that the current pattern space is printed if auto-print is not
     disabled with the ‘-n’ option. The ability to return an exit code
     from the ‘sed’ script is a GNU ‘sed’ extension.

     See also the GNU ‘sed’ extension ‘Q’ command which quits silently
     without printing the current pattern space.

‘d’
     Delete the pattern space; immediately start next cycle.

     Example: delete the second input line:
          $ seq 3 | sed 2d
          1
          3

‘p’
     Print out the pattern space (to the standard output). This command
     is usually only used in conjunction with the ‘-n’ command-line
     option.

     Example: print only the second input line:
          $ seq 3 | sed -n 2p
          2

‘n’
     If auto-print is not disabled, print the pattern space, then,
     regardless, replace the pattern space with the next line of input.
     If there is no more input then ‘sed’ exits without processing any
     more commands.

     This command is useful to skip lines (e.g. process every Nth
     line).

     Example: perform substitution on every 3rd line (i.e. two ‘n’
     commands skip two lines):
          $ seq 6 | sed 'n;n;s/./x/'
          1
          2
          x
          4
          5
          x

     GNU ‘sed’ provides an extension address syntax of FIRST~STEP to
     achieve the same result:

          $ seq 6 | sed '0~3s/./x/'
          1
          2
          x
          4
          5
          x

‘{ COMMANDS }’
     A group of commands may be enclosed between ‘{’ and ‘}’ characters.
     This is particularly useful when you want a group of commands to be
     triggered by a single address (or address-range) match.

     Example: perform substitution then print the second input line:
          $ seq 3 | sed -n '2{s/2/X/ ; p}'
          X


File: sed.info,  Node: Other Commands,  Next: Programming Commands,  Prev: Common Commands,  Up: sed scripts

3.5 Less Frequently-Used Commands
=================================

Though perhaps less frequently used than those in the previous section,
some very small yet useful ‘sed’ scripts can be built with these
commands.

‘y/SOURCE-CHARS/DEST-CHARS/’
     Transliterate any characters in the pattern space which match any
     of the SOURCE-CHARS with the corresponding character in DEST-CHARS.

     Example: transliterate ‘a-j’ into ‘0-9’:
          $ echo hello world | sed 'y/abcdefghij/0123456789/'
          74llo worl3

     (The ‘/’ characters may be uniformly replaced by any other single
     character within any given ‘y’ command.)

     Instances of the ‘/’ (or whatever other character is used in its
     stead), ‘\’, or newlines can appear in the SOURCE-CHARS or
     DEST-CHARS lists, provided that each instance is escaped by a ‘\’.
     The SOURCE-CHARS and DEST-CHARS lists _must_ contain the same
     number of characters (after de-escaping).

     See the ‘tr’ command from GNU coreutils for similar functionality.

‘a TEXT’
     Append TEXT after a line. This is a GNU extension to the standard
     ‘a’ command; see below for details.

     Example: Add ‘hello’ after the second line:
          $ seq 3 | sed '2a hello'
          1
          2
          hello
          3

     Leading whitespace after the ‘a’ command is ignored. The text to
     add is read until the end of the line.

‘a\’
‘TEXT’
     Append TEXT after a line.

     Example: Add ‘hello’ after the second line (⊣ indicates printed
     output lines):
          $ seq 3 | sed '2a\
          hello'
          ⊣1
          ⊣2
          ⊣hello
          ⊣3

     The ‘a’ command queues the lines of text which follow this command
     (each but the last ending with a ‘\’, which are removed from the
     output) to be output at the end of the current cycle, or when the
     next input line is read.

     As a GNU extension, this command accepts two addresses.

     Escape sequences in TEXT are processed, so you should use ‘\\’ in
     TEXT to print a single backslash.

     The commands resume after the last line without a backslash (‘\’),
     which is ‘world’ in the following example:
          $ seq 3 | sed '2a\
          hello\
          world
          3s/./X/'
          ⊣1
          ⊣2
          ⊣hello
          ⊣world
          ⊣X

     As a GNU extension, the ‘a’ command and TEXT can be separated into
     two ‘-e’ parameters, enabling easier scripting:
          $ seq 3 | sed -e '2a\' -e hello
          1
          2
          hello
          3

          $ sed -e '2a\' -e "$VAR"

‘i TEXT’
     Insert TEXT before a line. This is a GNU extension to the standard
     ‘i’ command; see below for details.

     Example: Insert ‘hello’ before the second line:
          $ seq 3 | sed '2i hello'
          1
          hello
          2
          3

     Leading whitespace after the ‘i’ command is ignored. The text to
     add is read until the end of the line.

‘i\’
‘TEXT’
     Immediately output the lines of text which follow this command.

     Example: Insert ‘hello’ before the second line (⊣ indicates printed
     output lines):
          $ seq 3 | sed '2i\
          hello'
          ⊣1
          ⊣hello
          ⊣2
          ⊣3

     As a GNU extension, this command accepts two addresses.

     Escape sequences in TEXT are processed, so you should use ‘\\’ in
     TEXT to print a single backslash.

     The commands resume after the last line without a backslash (‘\’),
     which is ‘world’ in the following example:
          $ seq 3 | sed '2i\
          hello\
          world
          s/./X/'
          ⊣X
          ⊣hello
          ⊣world
          ⊣X
          ⊣X

     As a GNU extension, the ‘i’ command and TEXT can be separated into
     two ‘-e’ parameters, enabling easier scripting:
          $ seq 3 | sed -e '2i\' -e hello
          1
          hello
          2
          3

          $ sed -e '2i\' -e "$VAR"

‘c TEXT’
     Replaces the line(s) with TEXT. This is a GNU extension to the
     standard ‘c’ command; see below for details.

     Example: Replace the 2nd to 9th lines with the word ‘hello’:
          $ seq 10 | sed '2,9c hello'
          1
          hello
          10

     Leading whitespace after the ‘c’ command is ignored. The text to
     add is read until the end of the line.

‘c\’
‘TEXT’
     Delete the lines matching the address or address-range, and output
     the lines of text which follow this command.

     Example: Replace 2nd to 4th lines with the words ‘hello’ and
     ‘world’ (⊣ indicates printed output lines):
          $ seq 5 | sed '2,4c\
          hello\
          world'
          ⊣1
          ⊣hello
          ⊣world
          ⊣5

     If no addresses are given, each line is replaced.

     A new cycle is started after this command is done, since the
     pattern space will have been deleted. In the following example,
     the ‘c’ starts a new cycle and the substitution command is not
     performed on the replaced text:

          $ seq 3 | sed '2c\
          hello
          s/./X/'
          ⊣X
          ⊣hello
          ⊣X

     As a GNU extension, the ‘c’ command and TEXT can be separated into
     two ‘-e’ parameters, enabling easier scripting:
          $ seq 3 | sed -e '2c\' -e hello
          1
          hello
          3

          $ sed -e '2c\' -e "$VAR"

‘=’
     Print out the current input line number (with a trailing newline).

          $ printf '%s\n' aaa bbb ccc | sed =
          1
          aaa
          2
          bbb
          3
          ccc

     As a GNU extension, this command accepts two addresses.

‘l N’
     Print the pattern space in an unambiguous form: non-printable
     characters (and the ‘\’ character) are printed in C-style escaped
     form; long lines are split, with a trailing ‘\’ character to
     indicate the split; the end of each line is marked with a ‘$’.

     N specifies the desired line-wrap length; a length of 0 (zero)
     means to never wrap long lines. If omitted, the default as
     specified on the command line is used. The N parameter is a GNU
     ‘sed’ extension.

‘r FILENAME’

     Reads file FILENAME. Example:

          $ seq 3 | sed '2r/etc/hostname'
          1
          2
          fencepost.gnu.org
          3

     Queue the contents of FILENAME to be read and inserted into the
     output stream at the end of the current cycle, or when the next
     input line is read. Note that if FILENAME cannot be read, it is
     treated as if it were an empty file, without any error indication.

     As a GNU ‘sed’ extension, the special value ‘/dev/stdin’ is
     supported for the file name, which reads the contents of the
     standard input.

     As a GNU extension, this command accepts two addresses. The file
     will then be reread and inserted on each of the addressed lines.

     As a GNU ‘sed’ extension, the ‘r’ command accepts a zero address,
     inserting a file _before_ the first line of the input *note Adding
     a header to multiple files::.

‘w FILENAME’
     Write the pattern space to FILENAME.
     As a GNU ‘sed’ extension, two special values of FILENAME are
     supported: ‘/dev/stderr’, which writes the result to the standard
     error, and ‘/dev/stdout’, which writes to the standard output.(1)

     The file will be created (or truncated) before the first input line
     is read; all ‘w’ commands (including instances of the ‘w’ flag on
     successful ‘s’ commands) which refer to the same FILENAME are
     output without closing and reopening the file.

‘D’
     If pattern space contains no newline, start a normal new cycle as
     if the ‘d’ command was issued. Otherwise, delete text in the
     pattern space up to the first newline, and restart cycle with the
     resultant pattern space, without reading a new line of input.

‘N’
     Add a newline to the pattern space, then append the next line of
     input to the pattern space. If there is no more input then ‘sed’
     exits without processing any more commands.

     When ‘-z’ is used, a zero byte (the ASCII ‘NUL’ character) is added
     between the lines (instead of a new line).

     By default ‘sed’ does not terminate if there is no ‘next’ input
     line. This is a GNU extension which can be disabled with
     ‘--posix’. *Note N command on the last line: N_command_last_line.

‘P’
     Print out the portion of the pattern space up to the first newline.

‘h’
     Replace the contents of the hold space with the contents of the
     pattern space.

‘H’
     Append a newline to the contents of the hold space, and then append
     the contents of the pattern space to that of the hold space.

‘g’
     Replace the contents of the pattern space with the contents of the
     hold space.

‘G’
     Append a newline to the contents of the pattern space, and then
     append the contents of the hold space to that of the pattern space.

‘x’
     Exchange the contents of the hold and pattern spaces.
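
   The hold-space commands listed above are easier to follow in a small
script. The sketch below is a commonly seen one-liner (not taken from
this section) that prints the input lines in reverse order using ‘G’
and ‘h’. The variable name ‘reversed’ and the three-line input are
assumptions made for the example:

```shell
# 1!G  on every line except the first, append the hold space
#      (all earlier lines, already reversed) to the pattern space
# h    copy the pattern space back into the hold space
# $p   on the last line, print the accumulated pattern space
reversed=$(printf '%s\n' 1 2 3 | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

   Because the hold space grows by one line per cycle, this approach
keeps the whole input in memory and is best suited to small inputs.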

   ---------- Footnotes ----------

   (1) This is equivalent to ‘p’ unless the ‘-i’ option is being used.


File: sed.info,  Node: Programming Commands,  Next: Extended Commands,  Prev: Other Commands,  Up: sed scripts

3.6 Commands for ‘sed’ gurus
============================

In most cases, use of these commands indicates that you are probably
better off programming in something like ‘awk’ or Perl. But
occasionally one is committed to sticking with ‘sed’, and these commands
can enable one to write quite convoluted scripts.

‘: LABEL’
     [No addresses allowed.]

     Specify the location of LABEL for branch commands. In all other
     respects, a no-op.

‘b LABEL’
     Unconditionally branch to LABEL. The LABEL may be omitted, in
     which case the next cycle is started.

‘t LABEL’
     Branch to LABEL only if there has been a successful ‘s’ubstitution
     since the last input line was read or conditional branch was taken.
     The LABEL may be omitted, in which case the next cycle is started.


File: sed.info,  Node: Extended Commands,  Next: Multiple commands syntax,  Prev: Programming Commands,  Up: sed scripts

3.7 Commands Specific to GNU ‘sed’
==================================

These commands are specific to GNU ‘sed’, so you must use them with care
and only when you are sure that hindering portability is not evil. They
allow you to check for GNU ‘sed’ extensions or to do tasks that are
required quite often, yet are unsupported by standard ‘sed’s.

‘e [COMMAND]’
     This command allows one to pipe input from a shell command into
     pattern space. Without parameters, the ‘e’ command executes the
     command that is found in pattern space and replaces the pattern
     space with the output; a trailing newline is suppressed.

     If a parameter is specified, instead, the ‘e’ command interprets it
     as a command and sends its output to the output stream. The
     command can run across multiple lines, all but the last ending with
     a backslash.

     In both cases, the results are undefined if the command to be
     executed contains a NUL character.

     Note that, unlike the ‘r’ command, the output of the command will
     be printed immediately; the ‘r’ command instead delays the output
     to the end of the current cycle.

‘F’
     Print out the file name of the current input file (with a trailing
     newline).

‘Q [EXIT-CODE]’
     This command accepts only one address.

     This command is the same as ‘q’, but will not print the contents of
     pattern space. Like ‘q’, it provides the ability to return an exit
     code to the caller.

     This command can be useful because the only alternative ways to
     accomplish this apparently trivial function are to use the ‘-n’
     option (which can unnecessarily complicate your script) or
     resorting to the following snippet, which wastes time by reading
     the whole file without any visible effect:

          :eat
          $d       Quit silently on the last line
          N        Read another line, silently
          g        Overwrite pattern space each time to save memory
          b eat

‘R FILENAME’
     Queue a line of FILENAME to be read and inserted into the output
     stream at the end of the current cycle, or when the next input line
     is read. Note that if FILENAME cannot be read, or if its end is
     reached, no line is appended, without any error indication.

     As with the ‘r’ command, the special value ‘/dev/stdin’ is
     supported for the file name, which reads a line from the standard
     input.

‘T LABEL’
     Branch to LABEL only if there have been no successful
     ‘s’ubstitutions since the last input line was read or conditional
     branch was taken.
     The LABEL may be omitted, in which case the next cycle is started.

‘v VERSION’
     This command does nothing, but makes ‘sed’ fail if GNU ‘sed’
     extensions are not supported, simply because other versions of
     ‘sed’ do not implement it. In addition, you can specify the
     version of ‘sed’ that your script requires, such as ‘4.0.5’. The
     default is ‘4.0’ because that is the first version that implemented
     this command.

     This command enables all GNU extensions even if ‘POSIXLY_CORRECT’
     is set in the environment.

‘W FILENAME’
     Write to the given filename the portion of the pattern space up to
     the first newline. Everything said under the ‘w’ command about
     file handling holds here too.

‘z’
     This command empties the content of pattern space. It is usually
     the same as ‘s/.*//’, but is more efficient and works in the
     presence of invalid multibyte sequences in the input stream. POSIX
     mandates that such sequences are _not_ matched by ‘.’, so that
     there is no portable way to clear ‘sed’’s buffers in the middle of
     the script in most multibyte locales (including UTF-8 locales).


File: sed.info,  Node: Multiple commands syntax,  Prev: Extended Commands,  Up: sed scripts

3.8 Multiple commands syntax
============================

There are several methods to specify multiple commands in a ‘sed’
program.

   Using newlines is most natural when running a sed script from a file
(using the ‘-f’ option).

   On the command line, all ‘sed’ commands may be separated by newlines.
Alternatively, you may specify each command as an argument to an ‘-e’
option:

     $ seq 6 | sed '1d
     3d
     5d'
     2
     4
     6

     $ seq 6 | sed -e 1d -e 3d -e 5d
     2
     4
     6

   A semicolon (‘;’) may be used to separate most simple commands:

     $ seq 6 | sed '1d;3d;5d'
     2
     4
     6

   The ‘{’,‘}’,‘b’,‘t’,‘T’,‘:’ commands can be separated with a
semicolon (this is a non-portable GNU ‘sed’ extension).

     $ seq 4 | sed '{1d;3d}'
     2
     4

     $ seq 6 | sed '{1d;3d};5d'
     2
     4
     6

   Labels used in ‘b’,‘t’,‘T’,‘:’ commands are read until a semicolon.
Leading and trailing whitespace is ignored. In the examples below the
label is ‘x’. The first example works with GNU ‘sed’. The second is a
portable equivalent. For more information about branching and labels
*note Branching and flow control::.

     $ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
     1
     =2

     $ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
     1
     =2

3.8.1 Commands Requiring a newline
----------------------------------

The following commands cannot be separated by a semicolon and require a
newline:

‘a’,‘c’,‘i’ (append/change/insert)

     All characters following ‘a’,‘c’,‘i’ commands are taken as the text
     to append/change/insert. Using a semicolon leads to undesirable
     results:

          $ seq 2 | sed '1aHello ; 2d'
          1
          Hello ; 2d
          2

     Separate the commands using ‘-e’ or a newline:

          $ seq 2 | sed -e 1aHello -e 2d
          1
          Hello

          $ seq 2 | sed '1aHello
          2d'
          1
          Hello

     Note that specifying the text to add (‘Hello’) immediately after
     ‘a’,‘c’,‘i’ is itself a GNU ‘sed’ extension.
     A portable, POSIX-compliant alternative is:

          $ seq 2 | sed '1a\
          Hello
          2d'
          1
          Hello

‘#’ (comment)

     All characters following ‘#’ until the next newline are ignored.

          $ seq 3 | sed '# this is a comment ; 2d'
          1
          2
          3


          $ seq 3 | sed '# this is a comment
          2d'
          1
          3

‘r’,‘R’,‘w’,‘W’ (reading and writing files)

     The ‘r’,‘R’,‘w’,‘W’ commands parse the filename until end of the
     line. If whitespace, comments or semicolons are found, they will
     be included in the filename, leading to unexpected results:

          $ seq 2 | sed '1w hello.txt ; 2d'
          1
          2

          $ ls -log
          total 4
          -rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d

          $ cat 'hello.txt ; 2d'
          1

     Note that ‘sed’ silently ignores read/write errors in
     ‘r’,‘R’,‘w’,‘W’ commands (such as missing files). In the following
     example, ‘sed’ tries to read a file named ‘hello.txt ; N’. The
     file is missing, and the error is silently ignored:

          $ echo x | sed '1rhello.txt ; N'
          x

‘e’ (command execution)

     Any characters following the ‘e’ command until the end of the line
     will be sent to the shell. If whitespace, comments or semicolons
     are found, they will be included in the shell command, leading to
     unexpected results:

          $ echo a | sed '1e touch foo#bar'
          a

          $ ls -1
          foo#bar

          $ echo a | sed '1e touch foo ; s/a/b/'
          sh: 1: s/a/b/: not found
          a

‘s///[we]’ (substitute with ‘e’ or ‘w’ flags)

     In a substitution command, the ‘w’ flag writes the substitution
     result to a file, and the ‘e’ flag executes the substitution result
     as a shell command. As with the ‘r/R/w/W/e’ commands, these must
     be terminated with a newline.
     If whitespace, comments or semicolons are found, they will be
     included in the shell command or filename, leading to unexpected
     results:

          $ echo a | sed 's/a/b/w1.txt#foo'
          b

          $ ls -1
          1.txt#foo


File: sed.info,  Node: sed addresses,  Next: sed regular expressions,  Prev: sed scripts,  Up: Top

4 Addresses: selecting lines
****************************

* Menu:

* Addresses overview::           Addresses overview
* Numeric Addresses::            selecting lines by numbers
* Regexp Addresses::             selecting lines by text matching
* Range Addresses::              selecting a range of lines
* Zero Address::                 Using address ‘0’


File: sed.info,  Node: Addresses overview,  Next: Numeric Addresses,  Up: sed addresses

4.1 Addresses overview
======================

Addresses determine on which line(s) the ‘sed’ command will be executed.
The following command replaces any first occurrence of ‘hello’ with
‘world’ only on line 144:

     sed '144s/hello/world/' input.txt > output.txt

   If no address is specified, the command is performed on all lines.
The following command replaces ‘hello’ with ‘world’, targeting every
line of the input file. However, note that it modifies only the first
instance of ‘hello’ on each line. Use the ‘g’ modifier to affect every
instance on each affected line.

     sed 's/hello/world/' input.txt > output.txt

   Addresses can contain regular expressions to match lines based on
content instead of line numbers. The following command replaces ‘hello’
with ‘world’ only on lines containing the string ‘apple’:

     sed '/apple/s/hello/world/' input.txt > output.txt

   An address range is specified with two addresses separated by a comma
(‘,’). Addresses can be numeric, regular expressions, or a mix of both.
The following command replaces ‘hello’ with ‘world’ only on lines 4 to
17 (inclusive):

     sed '4,17s/hello/world/' input.txt > output.txt

   Appending the ‘!’ character to the end of an address specification
(before the command letter) negates the sense of the match. That is, if
the ‘!’ character follows an address or an address range, then only
lines which do _not_ match the addresses will be selected. The
following command replaces ‘hello’ with ‘world’ only on lines _not_
containing the string ‘apple’:

     sed '/apple/!s/hello/world/' input.txt > output.txt

   The following command replaces ‘hello’ with ‘world’ only on lines 1
to 3 and from line 18 to the last line of the input file (i.e.
excluding lines 4 to 17):

     sed '4,17!s/hello/world/' input.txt > output.txt


File: sed.info,  Node: Numeric Addresses,  Next: Regexp Addresses,  Prev: Addresses overview,  Up: sed addresses

4.2 Selecting lines by numbers
==============================

Addresses in a ‘sed’ script can be in any of the following forms:
‘NUMBER’
     Specifying a line number will match only that line in the input.
     (Note that ‘sed’ counts lines continuously across all input files
     unless ‘-i’ or ‘-s’ options are specified.)

‘$’
     This address matches the last line of the last file of input, or
     the last line of each file when the ‘-i’ or ‘-s’ options are
     specified.

‘FIRST~STEP’
     This GNU extension matches every STEPth line starting with line
     FIRST. In particular, lines will be selected when there exists a
     non-negative N such that the current line-number equals FIRST + (N
     * STEP).
     Thus, one would use ‘1~2’ to select the odd-numbered
     lines and ‘0~2’ for even-numbered lines; to pick every third line
     starting with the second, ‘2~3’ would be used; to pick every fifth
     line starting with the tenth, use ‘10~5’; and ‘50~0’ is just an
     obscure way of saying ‘50’.

     The following commands demonstrate the step address usage:

          $ seq 10 | sed -n '0~4p'
          4
          8

          $ seq 10 | sed -n '1~3p'
          1
          4
          7
          10


File: sed.info, Node: Regexp Addresses, Next: Range Addresses, Prev: Numeric Addresses, Up: sed addresses

4.3 Selecting lines by text matching
====================================

GNU ‘sed’ supports the following regular expression addresses.  The
default regular expression syntax is *note Basic Regular Expression
(BRE): BRE syntax.  If the ‘-E’ or ‘-r’ option is used, the regular
expression should be in *note Extended Regular Expression (ERE): ERE
syntax.  *Note BRE vs ERE::.

‘/REGEXP/’
     This will select any line which matches the regular expression
     REGEXP.  If REGEXP itself includes any ‘/’ characters, each must be
     escaped by a backslash (‘\’).

     The following command prints lines in ‘/etc/passwd’ which end with
     ‘bash’(1):

          sed -n '/bash$/p' /etc/passwd

     The empty regular expression ‘//’ repeats the last regular
     expression match (the same holds if the empty regular expression is
     passed to the ‘s’ command).  Note that modifiers to regular
     expressions are evaluated when the regular expression is compiled,
     thus it is invalid to specify them together with the empty regular
     expression.

‘\%REGEXP%’
     (The ‘%’ may be replaced by any other single character.)

     This also matches the regular expression REGEXP, but allows one to
     use a different delimiter than ‘/’.
     This is particularly useful if
     the REGEXP itself contains a lot of slashes, since it avoids the
     tedious escaping of every ‘/’.  If REGEXP itself includes any
     delimiter characters, each must be escaped by a backslash (‘\’).

     The following commands are equivalent.  They print lines which
     start with ‘/home/alice/documents/’:

          sed -n '/^\/home\/alice\/documents\//p'
          sed -n '\%^/home/alice/documents/%p'
          sed -n '\;^/home/alice/documents/;p'

‘/REGEXP/I’
‘\%REGEXP%I’
     The ‘I’ modifier to regular-expression matching is a GNU extension
     which causes the REGEXP to be matched in a case-insensitive manner.

     In many other programming languages, a lower case ‘i’ is used for
     case-insensitive regular expression matching.  However, in ‘sed’
     the ‘i’ is used for the insert command (*note insert command::).

     Observe the difference between the following examples.

     In this example, ‘/b/I’ is the address: regular expression with ‘I’
     modifier.  ‘d’ is the delete command:

          $ printf "%s\n" a b c | sed '/b/Id'
          a
          c

     Here, ‘/b/’ is the address: a regular expression.  ‘i’ is the
     insert command.  ‘d’ is the value to insert.  A line with ‘d’ is
     then inserted above the matched line:

          $ printf "%s\n" a b c | sed '/b/id'
          a
          d
          b
          c

‘/REGEXP/M’
‘\%REGEXP%M’
     The ‘M’ modifier to regular-expression matching is a GNU ‘sed’
     extension which directs GNU ‘sed’ to match the regular expression
     in ‘multi-line’ mode.  The modifier causes ‘^’ and ‘$’ to match
     respectively (in addition to the normal behavior) the empty string
     after a newline, and the empty string before a newline.  There are
     special character sequences (‘\`’ and ‘\'’) which always match the
     beginning or the end of the buffer.  In addition, the period
     character does not match a new-line character in multi-line mode.
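   The effect of the ‘M’ address modifier can be sketched with a
two-line pattern space.  This is only an illustration and assumes GNU
‘sed’; the variable names ‘without_m’ and ‘with_m’ are made up for the
example.  ‘N’ joins both input lines into one pattern space, so ‘^b’
can match only if ‘^’ is allowed to match after the embedded newline:

```shell
# Assumes GNU sed (the M modifier is a GNU extension).
# N appends the second input line, so the pattern space is "a\nb".

# Without M, ^ matches only at the start of the pattern space,
# so /^b/ does not match and nothing is printed.
without_m=$(printf 'a\nb\n' | sed -n 'N;/^b/p')

# With M, ^ also matches right after the embedded newline,
# so the address matches and the whole pattern space is printed.
with_m=$(printf 'a\nb\n' | sed -n 'N;/^b/Mp')

printf 'without M: [%s]\n' "$without_m"
printf 'with M:    [%s]\n' "$with_m"
```

   The first command is expected to print nothing, while the second
prints both joined lines.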

   Regex addresses operate on the content of the current pattern space.
If the pattern space is changed (for example with the ‘s///’ command)
the regular expression matching will operate on the changed text.

   In the following example, automatic printing is disabled with ‘-n’.
The ‘s/2/X/’ command changes lines containing ‘2’ to ‘X’.  The command
‘/[0-9]/p’ matches lines with digits and prints them.  Because the
second line is changed before the ‘/[0-9]/’ regex, it will not match and
will not be printed:

     $ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
     1
     3

   ---------- Footnotes ----------

   (1) There are of course many other ways to do the same, e.g.
     grep 'bash$' /etc/passwd
     awk -F: '$7 == "/bin/bash"' /etc/passwd


File: sed.info, Node: Range Addresses, Next: Zero Address, Prev: Regexp Addresses, Up: sed addresses

4.4 Range Addresses
===================

An address range can be specified by specifying two addresses separated
by a comma (‘,’).  An address range matches lines starting from where
the first address matches, and continues until the second address
matches (inclusively):

     $ seq 10 | sed -n '4,6p'
     4
     5
     6

   If the second address is a REGEXP, then checking for the ending match
will start with the line _following_ the line which matched the first
address: a range will always span at least two lines (except of course
if the input stream ends).

     $ seq 10 | sed -n '4,/[0-9]/p'
     4
     5

   If the second address is a NUMBER less than (or equal to) the line
matching the first address, then only the one line is matched:

     $ seq 10 | sed -n '4,1p'
     4

   GNU ‘sed’ also supports some special two-address forms; all these are
GNU extensions:
‘0,/REGEXP/’
     A line number of ‘0’ can be used in an address specification like
     ‘0,/REGEXP/’ so that ‘sed’ will try to match REGEXP in the first
     input line too.  In other words, ‘0,/REGEXP/’ is similar to
     ‘1,/REGEXP/’, except that if ADDR2 matches the very first line of
     input the ‘0,/REGEXP/’ form will consider it to end the range,
     whereas the ‘1,/REGEXP/’ form will match the beginning of its range
     and hence make the range span up to the _second_ occurrence of the
     regular expression.

     The following examples demonstrate the difference between starting
     with address 1 and 0:

          $ seq 10 | sed -n '1,/[0-9]/p'
          1
          2

          $ seq 10 | sed -n '0,/[0-9]/p'
          1

‘ADDR1,+N’
     Matches ADDR1 and the N lines following ADDR1.

          $ seq 10 | sed -n '6,+2p'
          6
          7
          8

     ADDR1 can be a line number or a regular expression.

‘ADDR1,~N’
     Matches ADDR1 and the lines following ADDR1 until the next line
     whose input line number is a multiple of N.  The following command
     prints starting at line 6, until the next line which is a multiple
     of 4 (i.e. line 8):

          $ seq 10 | sed -n '6,~4p'
          6
          7
          8

     ADDR1 can be a line number or a regular expression.


File: sed.info, Node: Zero Address, Prev: Range Addresses, Up: sed addresses

4.5 Zero Address
================

As a GNU ‘sed’ extension, the ‘0’ address can be used in two cases:
  1. In a regex range address, as ‘0,/REGEXP/’ (*note Zero Address
     Regex Range::).
  2. With the ‘r’ command, inserting a file before the first line (*note
     Adding a header to multiple files::).

   Note that these are the only places where the ‘0’ address makes
sense; commands which are given the ‘0’ address in any other way will
give an error.


File: sed.info, Node: sed regular expressions, Next: advanced sed, Prev: sed addresses, Up: Top

5 Regular Expressions: selecting text
*************************************

* Menu:

* Regular Expressions Overview:: Overview of Regular expression in ‘sed’
* BRE vs ERE::                   Basic (BRE) and extended (ERE) regular expression
                                 syntax
* BRE syntax::                   Overview of basic regular expression syntax
* ERE syntax::                   Overview of extended regular expression syntax
* Character Classes and Bracket Expressions::
* regexp extensions::            Additional regular expression commands
* Back-references and Subexpressions:: Back-references and Subexpressions
* Escapes::                      Specifying special characters
* Locale Considerations::        Multibyte characters and locale considerations


File: sed.info, Node: Regular Expressions Overview, Next: BRE vs ERE, Up: sed regular expressions

5.1 Overview of regular expression in ‘sed’
===========================================

To know how to use ‘sed’, people should understand regular expressions
(‘regexp’ for short).  A regular expression is a pattern that is matched
against a subject string from left to right.  Most characters are
‘ordinary’: they stand for themselves in a pattern, and match the
corresponding characters.  Regular expressions in ‘sed’ are specified
between two slashes.

   The following command prints lines containing the string ‘hello’:

     sed -n '/hello/p'

   The above example is equivalent to this ‘grep’ command:

     grep 'hello'

   The power of regular expressions comes from the ability to include
alternatives and repetitions in the pattern.  These are encoded in the
pattern by the use of ‘special characters’, which do not stand for
themselves but instead are interpreted in some special way.

   The character ‘^’ (caret) in a regular expression matches the
beginning of the line.  The character ‘.’ (dot) matches any single
character.  The following ‘sed’ command matches and prints lines which
start with the letter ‘b’, followed by any single character, followed by
the letter ‘d’:

     $ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
     bad
     bed
     bid
     body

   The following sections explain the meaning and usage of special
characters in regular expressions.


File: sed.info, Node: BRE vs ERE, Next: BRE syntax, Prev: Regular Expressions Overview, Up: sed regular expressions

5.2 Basic (BRE) and extended (ERE) regular expression
=====================================================

Basic and extended regular expressions are two variations on the syntax
of the specified pattern.  Basic Regular Expression (BRE) syntax is the
default in ‘sed’ (and similarly in ‘grep’).  Use the POSIX-specified
‘-E’ option (‘-r’, ‘--regexp-extended’) to enable Extended Regular
Expression (ERE) syntax.

   In GNU ‘sed’, the only difference between basic and extended regular
expressions is in the behavior of a few special characters: ‘?’, ‘+’,
parentheses, braces (‘{}’), and ‘|’.

   With basic (BRE) syntax, these characters do not have special meaning
unless prefixed with a backslash (‘\’), while with extended (ERE) syntax
it is reversed: these characters are special unless they are prefixed
with a backslash (‘\’).

Desired pattern     Basic (BRE) Syntax         Extended (ERE) Syntax

--------------------------------------------------------------------------
literal ‘+’ (plus   $ echo 'a+b=c' > foo       $ echo 'a+b=c' > foo
sign)               $ sed -n '/a+b/p' foo      $ sed -E -n '/a\+b/p' foo
                    a+b=c                      a+b=c

One or more ‘a’     $ echo aab > foo           $ echo aab > foo
characters          $ sed -n '/a\+b/p' foo     $ sed -E -n '/a+b/p' foo
followed by ‘b’     aab                        aab
(plus sign as
special
meta-character)


File: sed.info, Node: BRE syntax, Next: ERE syntax, Prev: BRE vs ERE, Up: sed regular expressions

5.3 Overview of basic regular expression syntax
===============================================

Here is a brief description of regular expression syntax as used in
‘sed’.

‘CHAR’
     A single ordinary character matches itself.

‘*’
     Matches a sequence of zero or more instances of matches for the
     preceding regular expression, which must be an ordinary character,
     a special character preceded by ‘\’, a ‘.’, a grouped regexp (see
     below), or a bracket expression.  As a GNU extension, a postfixed
     regular expression can also be followed by ‘*’; for example, ‘a**’
     is equivalent to ‘a*’.  POSIX 1003.1-2001 says that ‘*’ stands for
     itself when it appears at the start of a regular expression or
     subexpression, but many non-GNU implementations do not support this
     and portable scripts should instead use ‘\*’ in these contexts.

‘.’
     Matches any character, including newline.

‘^’
     Matches the null string at beginning of the pattern space, i.e.
     what appears after the circumflex must appear at the beginning of
     the pattern space.

     In most scripts, pattern space is initialized to the content of
     each line (*note How ‘sed’ works: Execution Cycle.).  So, it is a
     useful simplification to think of ‘^#include’ as matching only
     lines where ‘#include’ is the first thing on the line; if there is
     any preceding space, for example, the match fails.  This
     simplification is valid as long as the original content of pattern
     space is not modified, for example with an ‘s’ command.

     ‘^’ acts as a special character only at the beginning of the
     regular expression or subexpression (that is, after ‘\(’ or ‘\|’).
     Portable scripts should avoid ‘^’ at the beginning of a
     subexpression, though, as POSIX allows implementations that treat
     ‘^’ as an ordinary character in that context.

‘$’
     It is the same as ‘^’, but refers to end of pattern space.  ‘$’
     also acts as a special character only at the end of the regular
     expression or subexpression (that is, before ‘\)’ or ‘\|’), and its
     use at the end of a subexpression is not portable.

‘[LIST]’
‘[^LIST]’
     Matches any single character in LIST: for example, ‘[aeiou]’
     matches all vowels.  A list may include sequences like
     ‘CHAR1-CHAR2’, which matches any character between (inclusive)
     CHAR1 and CHAR2.  *Note Character Classes and Bracket
     Expressions::.

‘\+’
     As ‘*’, but matches one or more.  It is a GNU extension.

‘\?’
     As ‘*’, but only matches zero or one.  It is a GNU extension.

‘\{I\}’
     As ‘*’, but matches exactly I sequences (I is a decimal integer;
     for portability, keep it between 0 and 255 inclusive).

‘\{I,J\}’
     Matches between I and J, inclusive, sequences.

‘\{I,\}’
     Matches I or more sequences.

‘\(REGEXP\)’
     Groups the inner REGEXP as a whole, this is used to:

        • Apply postfix operators, like ‘\(abcd\)*’: this will search
          for zero or more whole sequences of ‘abcd’, while ‘abcd*’
          would search for ‘abc’ followed by zero or more occurrences of
          ‘d’.  Note that support for ‘\(abcd\)*’ is required by POSIX
          1003.1-2001, but many non-GNU implementations do not support
          it and hence it is not universally portable.

        • Use back references (see below).

‘REGEXP1\|REGEXP2’
     Matches either REGEXP1 or REGEXP2.  Use parentheses to use complex
     alternative regular expressions.  The matching process tries each
     alternative in turn, from left to right, and the first one that
     succeeds is used.  It is a GNU extension.

‘REGEXP1REGEXP2’
     Matches the concatenation of REGEXP1 and REGEXP2.  Concatenation
     binds more tightly than ‘\|’, ‘^’, and ‘$’, but less tightly than
     the other regular expression operators.

‘\DIGIT’
     Matches the DIGIT-th ‘\(...\)’ parenthesized subexpression in the
     regular expression.  This is called a ‘back reference’.
     Subexpressions are implicitly numbered by counting occurrences of
     ‘\(’ left-to-right.

‘\n’
     Matches the newline character.

‘\CHAR’
     Matches CHAR, where CHAR is one of ‘$’, ‘*’, ‘.’, ‘[’, ‘\’, or ‘^’.
     Note that the only C-like backslash sequences that you can portably
     assume to be interpreted are ‘\n’ and ‘\\’; in particular ‘\t’ is
     not portable, and matches a ‘t’ under most implementations of
     ‘sed’, rather than a tab character.

   Note that the regular expression matcher is greedy, i.e., matches are
attempted from left to right and, if two or more matches are possible
starting at the same character, it selects the longest.

Examples:
‘abcdef’
     Matches ‘abcdef’.

‘a*b’
     Matches zero or more ‘a’s followed by a single ‘b’.
     For example,
     ‘b’ or ‘aaaaab’.

‘a\?b’
     Matches ‘b’ or ‘ab’.

‘a\+b\+’
     Matches one or more ‘a’s followed by one or more ‘b’s: ‘ab’ is the
     shortest possible match, but other examples are ‘aaaab’ or ‘abbbbb’
     or ‘aaaaaabbbbbbb’.

‘.*’
‘.\+’
     These two both match all the characters in a string; however, the
     first matches every string (including the empty string), while the
     second matches only strings containing at least one character.

‘^main.*(.*)’
     This matches a string starting with ‘main’, followed by an opening
     and closing parenthesis.  The ‘n’, ‘(’ and ‘)’ need not be
     adjacent.

‘^#’
     This matches a string beginning with ‘#’.

‘\\$’
     This matches a string ending with a single backslash.  The regexp
     contains two backslashes for escaping.

‘\$’
     Instead, this matches a string consisting of a single dollar sign,
     because it is escaped.

‘[a-zA-Z0-9]’
     In the C locale, this matches any ASCII letters or digits.

‘[^ <TAB>]\+’
     (Here ‘<TAB>’ stands for a single tab character.)  This matches a
     string of one or more characters, none of which is a space or a
     tab.  Usually this means a word.

‘^\(.*\)\n\1$’
     This matches a string consisting of two equal substrings separated
     by a newline.

‘.\{9\}A$’
     This matches nine characters followed by an ‘A’ at the end of a
     line.

‘^.\{15\}A’
     This matches the start of a string that contains 16 characters, the
     last of which is an ‘A’.
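
   Two of the patterns above can be tried directly from the shell.  The
following is a minimal sketch, assuming GNU ‘sed’ in its default (BRE)
mode; the input strings and the variable names ‘interval’, ‘dup’ and
‘nodup’ are invented for the illustration.  ‘N’ is used to place two
input lines in the pattern space so that ‘\n’ has something to match:

```shell
# Interval expression: exactly nine characters, then 'A' at end of line.
# Only the first input line has nine characters before the final 'A'.
interval=$(printf '%s\n' 123456789A 12345678A | sed -n '/.\{9\}A$/p')

# Back-reference in a pattern: two equal substrings separated by a
# newline.  N joins the two input lines into one pattern space.
dup=$(printf 'x\nx\n' | sed -n 'N;/^\(.*\)\n\1$/p')    # lines are equal
nodup=$(printf 'x\ny\n' | sed -n 'N;/^\(.*\)\n\1$/p')  # lines differ

printf 'interval: [%s]\n' "$interval"
printf 'dup:      [%s]\n' "$dup"
printf 'nodup:    [%s]\n' "$nodup"
```

   Only the pair of identical lines is printed by the second command;
the third prints nothing because the back-reference cannot match.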


File: sed.info, Node: ERE syntax, Next: Character Classes and Bracket Expressions, Prev: BRE syntax, Up: sed regular expressions

5.4 Overview of extended regular expression syntax
==================================================

The only difference between basic and extended regular expressions is in
the behavior of a few characters: ‘?’, ‘+’, parentheses, braces (‘{}’),
and ‘|’.  While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them _to match a
literal character_.  ‘|’ is special here because ‘\|’ is a GNU
extension; standard basic regular expressions do not provide its
functionality.

Examples:
‘abc?’
     becomes ‘abc\?’ when using extended regular expressions.  It
     matches the literal string ‘abc?’.

‘c\+’
     becomes ‘c+’ when using extended regular expressions.  It matches
     one or more ‘c’s.

‘a\{3,\}’
     becomes ‘a{3,}’ when using extended regular expressions.  It
     matches three or more ‘a’s.

‘\(abc\)\{2,3\}’
     becomes ‘(abc){2,3}’ when using extended regular expressions.  It
     matches either ‘abcabc’ or ‘abcabcabc’.

‘\(abc*\)\1’
     becomes ‘(abc*)\1’ when using extended regular expressions.
     Backreferences must still be escaped when using extended regular
     expressions.

‘a\|b’
     becomes ‘a|b’ when using extended regular expressions.  It matches
     ‘a’ or ‘b’.


File: sed.info, Node: Character Classes and Bracket Expressions, Next: regexp extensions, Prev: ERE syntax, Up: sed regular expressions

5.5 Character Classes and Bracket Expressions
=============================================

A ‘bracket expression’ is a list of characters enclosed by ‘[’ and ‘]’.
It matches any single character in that list; if the first character of
the list is the caret ‘^’, then it matches any character *not* in the
list.  For example, the following command replaces the strings ‘gray’ or
‘grey’ with ‘blue’:

     sed 's/gr[ae]y/blue/'

   Bracket expressions can be used in both *note basic: BRE syntax. and
*note extended: ERE syntax. regular expressions (that is, with or
without the ‘-E’/‘-r’ options).

   Within a bracket expression, a ‘range expression’ consists of two
characters separated by a hyphen.  It matches any single character that
sorts between the two characters, inclusive.  In the default C locale,
the sorting sequence is the native character order; for example, ‘[a-d]’
is equivalent to ‘[abcd]’.

   Finally, certain named classes of characters are predefined within
bracket expressions, as follows.

   These named classes must be used _inside_ brackets themselves.
Correct usage:
     $ echo 1 | sed 's/[[:digit:]]/X/'
     X

   Incorrect usage is rejected by newer ‘sed’ versions.  Older versions
accepted it but treated it as a single bracket expression (which is
equivalent to ‘[dgit:]’, that is, only the characters D/G/I/T/:):
     # current GNU sed versions - incorrect usage rejected
     $ echo 1 | sed 's/[:digit:]/X/'
     sed: character class syntax is [[:space:]], not [:space:]

     # older GNU sed versions
     $ echo 1 | sed 's/[:digit:]/X/'
     1

‘[:alnum:]’
     Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’
     locale and ASCII character encoding, this is the same as
     ‘[0-9A-Za-z]’.

‘[:alpha:]’
     Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’
     locale and ASCII character encoding, this is the same as
     ‘[A-Za-z]’.

‘[:blank:]’
     Blank characters: space and tab.

‘[:cntrl:]’
     Control characters.
     In ASCII, these characters have octal codes
     000 through 037, and 177 (DEL).  In other character sets, these are
     the equivalent characters, if any.

‘[:digit:]’
     Digits: ‘0 1 2 3 4 5 6 7 8 9’.

‘[:graph:]’
     Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.

‘[:lower:]’
     Lower-case letters; in the ‘C’ locale and ASCII character encoding,
     this is ‘a b c d e f g h i j k l m n o p q r s t u v w x y z’.

‘[:print:]’
     Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.

‘[:punct:]’
     Punctuation characters; in the ‘C’ locale and ASCII character
     encoding, this is ‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \
     ] ^ _ ` { | } ~’.

‘[:space:]’
     Space characters: in the ‘C’ locale, this is tab, newline, vertical
     tab, form feed, carriage return, and space.

‘[:upper:]’
     Upper-case letters: in the ‘C’ locale and ASCII character encoding,
     this is ‘A B C D E F G H I J K L M N O P Q R S T U V W X Y Z’.

‘[:xdigit:]’
     Hexadecimal digits: ‘0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f’.

   Note that the brackets in these class names are part of the symbolic
names, and must be included in addition to the brackets delimiting the
bracket expression.

   Most meta-characters lose their special meaning inside bracket
expressions:

‘]’
     ends the bracket expression if it’s not the first list item.  So,
     if you want to make the ‘]’ character a list item, you must put it
     first.

‘-’
     represents the range if it’s not first or last in a list or the
     ending point of a range.

‘^’
     represents the characters not in the list.  If you want to make the
     ‘^’ character a list item, place it anywhere but first.

   TODO: incorporate this paragraph (copied verbatim from BRE section).

   The characters ‘$’, ‘*’, ‘.’, ‘[’, and ‘\’ are normally not special
within LIST.  For example, ‘[\*]’ matches either ‘\’ or ‘*’, because the
‘\’ is not special here.  However, strings like ‘[.ch.]’, ‘[=a=]’, and
‘[:space:]’ are special within LIST and represent collating symbols,
equivalence classes, and character classes, respectively, and ‘[’ is
therefore special within LIST when it is followed by ‘.’, ‘=’, or ‘:’.
Also, when not in ‘POSIXLY_CORRECT’ mode, special escapes like ‘\n’ and
‘\t’ are recognized within LIST.  *Note Escapes::.

‘[.’
     represents the open collating symbol.

‘.]’
     represents the close collating symbol.

‘[=’
     represents the open equivalence class.

‘=]’
     represents the close equivalence class.

‘[:’
     represents the open character class symbol, and should be followed
     by a valid character class name.

‘:]’
     represents the close character class symbol.


File: sed.info, Node: regexp extensions, Next: Back-references and Subexpressions, Prev: Character Classes and Bracket Expressions, Up: sed regular expressions

5.6 Regular expression extensions
=================================

The following sequences have special meaning inside regular expressions
(used in *note addresses: Regexp Addresses. and the ‘s’ command).

   These can be used in both *note basic: BRE syntax. and *note
extended: ERE syntax. regular expressions (that is, with or without the
‘-E’/‘-r’ options).

‘\w’
     Matches any ‘word’ character.  A ‘word’ character is any letter or
     digit or the underscore character.

          $ echo "abc %-= def." | sed 's/\w/X/g'
          XXX %-= XXX.

‘\W’
     Matches any ‘non-word’ character.

          $ echo "abc %-= def." | sed 's/\W/X/g'
          abcXXXXXdefX

‘\b’
     Matches a word boundary; that is, it matches if the character to
     the left is a ‘word’ character and the character to the right is a
     ‘non-word’ character, or vice-versa.

          $ echo "abc %-= def." | sed 's/\b/X/g'
          XabcX %-= XdefX.

‘\B’
     Matches everywhere but on a word boundary; that is, it matches if
     the character to the left and the character to the right are either
     both ‘word’ characters or both ‘non-word’ characters.

          $ echo "abc %-= def." | sed 's/\B/X/g'
          aXbXc X%X-X=X dXeXf.X

‘\s’
     Matches whitespace characters (spaces and tabs).  Newlines embedded
     in the pattern/hold spaces will also match:

          $ echo "abc %-= def." | sed 's/\s/X/g'
          abcX%-=Xdef.

‘\S’
     Matches non-whitespace characters.

          $ echo "abc %-= def." | sed 's/\S/X/g'
          XXX XXX XXXX

‘\<’
     Matches the beginning of a word.

          $ echo "abc %-= def." | sed 's/\</X/g'
          Xabc %-= Xdef.

‘\>’
     Matches the end of a word.

          $ echo "abc %-= def." | sed 's/\>/X/g'
          abcX %-= defX.

‘\`’
     Matches only at the start of pattern space.  This is different from
     ‘^’ in multi-line mode.

     Compare the following two examples:

          $ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
          Xa
          Xb
          Xc

          $ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
          Xa
          b
          c

‘\'’
     Matches only at the end of pattern space.  This is different from
     ‘$’ in multi-line mode.


File: sed.info, Node: Back-references and Subexpressions, Next: Escapes, Prev: regexp extensions, Up: sed regular expressions

5.7 Back-references and Subexpressions
======================================

‘Back-references’ are regular expression commands which refer to a
previous part of the matched regular expression.
Back-references are
specified with backslash and a single digit (e.g. ‘\1’).  The part of
the regular expression they refer to is called a ‘subexpression’, and is
designated with parentheses.

   Back-references and subexpressions are used in two cases: in the
regular expression search pattern, and in the REPLACEMENT part of the
‘s’ command (*note Regular Expression Addresses: Regexp Addresses. and
*note The "s" Command::).

   In a regular expression pattern, back-references are used to match
the same content as a previously matched subexpression.  In the
following example, the subexpression is ‘.’ - any single character
(being surrounded by parentheses makes it a subexpression).  The
back-reference ‘\1’ asks to match the same content (same character) as
the sub-expression.

   The command below matches words starting with any character, followed
by the letter ‘o’, followed by the same character as the first.

     $ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
     bob
     mom
     non
     pop
     sos
     tot
     wow

   Multiple subexpressions are automatically numbered from
left-to-right.  This command searches for 6-letter palindromes (the
first three letters are 3 subexpressions, followed by 3 back-references
in reverse order):

     $ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
     redder

   In the ‘s’ command, back-references can be used in the REPLACEMENT
part to refer back to subexpressions in the REGEXP part.

   The following example uses two subexpressions in the regular
expression to match two space-separated words.  The back-references in
the REPLACEMENT part print the words in a different order:

     $ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
     The name is Bond, James Bond.
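
   Both uses of back-references described above can be sketched with
two further commands.  These are illustrations only, assuming GNU ‘sed’
with the ‘-E’ option; the sample inputs and the variable names
‘date_out’ and ‘dup_out’ are invented for the example:

```shell
# Back-references in the REPLACEMENT: reorder the three captured
# groups of an ISO date.  '|' is used as the s-command delimiter so
# the slashes in the replacement need no escaping.
date_out=$(echo '2024-09-19' |
    sed -E 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\3/\2/\1|')

# Back-reference in the PATTERN: \1 must repeat the text matched by
# the group, so this collapses an accidentally doubled word.
dup_out=$(echo 'it is is fine' | sed -E 's/([a-z]+) \1/\1/')

printf '%s\n' "$date_out"
printf '%s\n' "$dup_out"
```

   In the first command the groups are only rearranged; in the second,
‘\1’ appears on both sides of the ‘s’ command, once to find the
repetition and once to keep a single copy of it.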

   When used with alternation, if the group does not participate in the
match then the back-reference makes the whole match fail.  For example,
‘a(.)|b\1’ will not match ‘ba’.  When multiple regular expressions are
given with ‘-e’ or from a file (‘-f FILE’), back-references are local to
each expression.


File: sed.info, Node: Escapes, Next: Locale Considerations, Prev: Back-references and Subexpressions, Up: sed regular expressions

5.8 Escape Sequences - specifying special characters
====================================================

Until this chapter, we have only encountered escapes of the form ‘\^’,
which tell ‘sed’ not to interpret the circumflex as a special character,
but rather to take it literally.  For example, ‘\*’ matches a single
asterisk rather than zero or more backslashes.

   This chapter introduces another kind of escape(1), that is, escapes
that are applied to a character or sequence of characters that
ordinarily are taken literally, and that ‘sed’ replaces with a special
character.  This provides a way of encoding non-printable characters in
patterns in a visible manner.  There is no restriction on the appearance
of non-printing characters in a ‘sed’ script but when a script is being
prepared in the shell or by text editing, it is usually easier to use
one of the following escape sequences than the binary character it
represents.

   The list of these escapes is:

‘\a’
     Produces or matches a BEL character, that is an ‘alert’ (ASCII 7).

‘\f’
     Produces or matches a form feed (ASCII 12).

‘\n’
     Produces or matches a newline (ASCII 10).

‘\r’
     Produces or matches a carriage return (ASCII 13).

‘\t’
     Produces or matches a horizontal tab (ASCII 9).

‘\v’
     Produces or matches a so-called ‘vertical tab’ (ASCII 11).

‘\cX’
     Produces or matches ‘CONTROL-X’, where X is any character.  The
     precise effect of ‘\cX’ is as follows: if X is a lower case letter,
     it is converted to upper case.  Then bit 6 of the character (hex
     40) is inverted.  Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex
     3B, while ‘\c;’ becomes hex 7B.

‘\dXXX’
     Produces or matches a character whose decimal ASCII value is XXX.

‘\oXXX’
     Produces or matches a character whose octal ASCII value is XXX.

‘\xXX’
     Produces or matches a character whose hexadecimal ASCII value is
     XX.

   ‘\b’ (backspace) was omitted because of the conflict with the
existing ‘word boundary’ meaning.

5.8.1 Escaping Precedence
-------------------------

GNU ‘sed’ processes escape sequences _before_ passing the text on to the
regular-expression matching of the ‘s///’ command and address matching.
Thus the following two commands are equivalent (‘0x5e’ is the
hexadecimal ASCII value of the character ‘^’):

     $ echo 'a^c' | sed 's/^/b/'
     ba^c

     $ echo 'a^c' | sed 's/\x5e/b/'
     ba^c

   As are the following (‘0x5b’,‘0x5d’ are the hexadecimal ASCII values
of ‘[’,‘]’, respectively):

     $ echo abc | sed 's/[a]/x/'
     xbc
     $ echo abc | sed 's/\x5ba\x5d/x/'
     xbc

   However it is recommended to avoid such special characters due to
unexpected edge-cases.  For example, the following are not equivalent:

     $ echo 'a^c' | sed 's/\^/b/'
     abc

     $ echo 'a^c' | sed 's/\\\x5e/b/'
     a^c

   ---------- Footnotes ----------

   (1) All the escapes introduced here are GNU extensions, with the
exception of ‘\n’.  In basic regular expression mode, setting
‘POSIXLY_CORRECT’ disables them inside bracket expressions.
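
   The numeric escapes listed in this section can be compared side by
side.  The following is a small sketch, assuming GNU ‘sed’ (these
escapes are GNU extensions); the variable names ‘space_out’ and
‘tab_out’ and the sample inputs are invented for the illustration.  A
space character has the decimal value 32, octal 040 and hexadecimal 20,
so each of the four regular expressions below is expected to match the
same character:

```shell
# Four spellings of a single space (ASCII 32) used as the regexp:
# a literal space, decimal \d032, octal \o040 and hexadecimal \x20.
space_out=$(for re in ' ' '\d032' '\o040' '\x20'; do
    printf 'a b\n' | sed "s/$re/_/"
done)

# \t matches a horizontal tab in GNU sed.
tab_out=$(printf 'a\tb\n' | sed 's/\t/,/')

printf '%s\n' "$space_out"
printf '%s\n' "$tab_out"
```

   Each iteration of the loop should print the same line, since all
four forms denote the same character.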
2427 2428 2429 File: sed.info, Node: Locale Considerations, Prev: Escapes, Up: sed regular expressions 2430 2431 5.9 Multibyte characters and Locale Considerations 2432 ================================================== 2433 2434 GNU âsedâ processes valid multibyte characters in multibyte locales 2435 (e.g. âUTF-8â). (1) 2436 2437 The following example uses the Greek letter Capital Sigma (Σ, Unicode 2438 code point â0x03A3â). In a âUTF-8â locale, âsedâ correctly processes 2439 the Sigma as one character despite it being 2 octets (bytes): 2440 2441 $ locale | grep LANG 2442 LANG=en_US.UTF-8 2443 2444 $ printf 'a\u03A3b' 2445 aΣb 2446 2447 $ printf 'a\u03A3b' | sed 's/./X/g' 2448 XXX 2449 2450 $ printf 'a\u03A3b' | od -tx1 -An 2451 61 ce a3 62 2452 2453 To force âsedâ to process octets separately, use the âCâ locale (also 2454 known as the âPOSIXâ locale): 2455 2456 $ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g' 2457 XXXX 2458 2459 5.9.1 Invalid multibyte characters 2460 ---------------------------------- 2461 2462 âsedââs regular expressions _do not_ match invalid multibyte sequences 2463 in a multibyte locale. 2464 2465 In the following examples, the ascii value â0xCEâ is an incomplete 2466 multibyte character (shown here as ᅵ). The regular expression â.â does 2467 not match it: 2468 2469 $ printf 'a\xCEb\n' 2470 aᅵe 2471 2472 $ printf 'a\xCEb\n' | sed 's/./X/g' 2473 XᅵX 2474 2475 $ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An 2476 58 ce 58 0a 2477 X X \n 2478 2479 Similarly, the âcatch-allâ regular expression â.*â does not match the 2480 entire line: 2481 2482 $ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An 2483 ce 63 0a 2484 c \n 2485 2486 GNU âsedâ offers the special âzâ command to clear the current pattern 2487 space regardless of invalid multibyte characters (i.e. 
it works like 2488 âs/.*//â but also removes invalid multibyte characters): 2489 2490 $ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An 2491 0a 2492 \n 2493 2494 Alternatively, force the âCâ locale to process each octet separately 2495 (every octet is a valid character in the âCâ locale): 2496 2497 $ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An 2498 0a 2499 \n 2500 2501 âsedââs inability to process invalid multibyte characters can be used 2502 to detect such invalid sequences in a file. In the following examples, 2503 the â\xCE\xCEâ is an invalid multibyte sequence, while â\xCE\A3â is a 2504 valid multibyte sequence (of the Greek Sigma character). 2505 2506 The following âsedâ program removes all valid characters using âs/.//gâ. 2507 Any content left in the pattern space (the invalid characters) are added 2508 to the hold space using the âHâ command. On the last line (â$â), the 2509 hold space is retrieved (âxâ), newlines are removed (âs/\n//gâ), and any 2510 remaining octets are printed unambiguously (âlâ). Thus, any invalid 2511 multibyte sequences are printed as octal values: 2512 2513 $ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt 2514 2515 $ cat invalid.txt 2516 ab 2517 c 2518 ᅵᅵde 2519 Σf 2520 2521 $ sed -n 's/.//g ; H ; ${x;s/\n//g;l}' invalid.txt 2522 \316\316$ 2523 2524 With a few more commands, âsedâ can print the exact line number 2525 corresponding to each invalid characters (line 3). These characters can 2526 then be removed by forcing the âCâ locale and using octal escape 2527 sequences: 2528 2529 $ sed -n 's/.//g;=;l' invalid.txt | paste - - | awk '$2!="$"' 2530 3 \316\316$ 2531 2532 $ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt 2533 2534 5.9.2 Upper/Lower case conversion 2535 --------------------------------- 2536 2537 GNU âsedââs substitute command (âsâ) supports upper/lower case 2538 conversions using â\Uâ,â\Lâ codes. 
These conversions support multibyte
characters:

     $ printf 'ABC\u03a3\n'
     ABCΣ

     $ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
     abcσ

   *Note The "s" Command::.

5.9.3 Multibyte regexp character classes
----------------------------------------

In locales other than ‘C’, the sorting sequence is not specified, and
‘[a-d]’ might be equivalent to ‘[abcd]’ or to ‘[aBbCcDd]’, or it might
fail to match any character, or the set of characters that it matches
might even be erratic. To obtain the traditional interpretation of
bracket expressions, you can use the ‘C’ locale by setting the ‘LC_ALL’
environment variable to the value ‘C’.

     # TODO: is there any real-world system/locale where 'A'
     #       is replaced by '-' ?
     $ echo A | sed 's/[a-z]/-/'
     A

   The interpretation of named character classes depends on the
‘LC_CTYPE’ locale; for example, ‘[[:alnum:]]’ means the character class
of numbers and letters in the current locale.

   TODO: show example of collation

     # TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
     $ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
     clichX

   ---------- Footnotes ----------

   (1) Some regexp edge-cases depend on the operating system and libc
implementation. The examples shown are known to work as expected on
GNU/Linux systems using glibc.
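
   The advice in section 5.9.3 about forcing the ‘C’ locale can be
sketched as follows. The input string is made up for illustration; with
‘LC_ALL=C’ the range ‘[a-z]’ and the class ‘[[:upper:]]’ should cover
only the ASCII letters, so the results do not depend on the locale the
user happens to have configured:

```shell
# Illustrative input; LC_ALL=C selects the traditional byte-wise
# interpretation of bracket expressions.

# Only the lower case ASCII letters are replaced.
echo 'aBc1' | LC_ALL=C sed 's/[a-z]/-/g'

# Only the upper case ASCII letter is replaced.
echo 'aBc1' | LC_ALL=C sed 's/[[:upper:]]/X/g'

# Everything that is not an ASCII letter or digit becomes a space.
echo 'a-B_9' | LC_ALL=C sed 's/[^[:alnum:]]/ /g'
```

   These commands print ‘-B-1’, ‘aXc1’ and ‘a B 9’ respectively.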


File: sed.info, Node: advanced sed, Next: Examples, Prev: sed regular expressions, Up: Top

6 Advanced ‘sed’: cycles and buffers
************************************

* Menu:

* Execution Cycle::          How ‘sed’ works
* Hold and Pattern Buffers::
* Multiline techniques::     Using D,G,H,N,P to process multiple lines
* Branching and flow control::


File: sed.info, Node: Execution Cycle, Next: Hold and Pattern Buffers, Up: advanced sed

6.1 How ‘sed’ Works
===================

‘sed’ maintains two data buffers: the active _pattern_ space, and the
auxiliary _hold_ space. Both are initially empty.

   ‘sed’ operates by performing the following cycle on each line of
input: first, ‘sed’ reads one line from the input stream, removes any
trailing newline, and places it in the pattern space. Then commands are
executed; each command can have an address associated to it: addresses
are a kind of condition code, and a command is only executed if the
condition is verified before the command is to be executed.

   When the end of the script is reached, unless the ‘-n’ option is in
use, the contents of pattern space are printed out to the output stream,
adding back the trailing newline if it was removed.(1) Then the next
cycle starts for the next input line.

   Unless special commands (like ‘D’) are used, the pattern space is
deleted between two cycles. The hold space, on the other hand, keeps
its data between cycles (see commands ‘h’, ‘H’, ‘x’, ‘g’, ‘G’ to move
data between both buffers).
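
   The fact that the hold space survives from one cycle to the next can
be seen in two short commands. This is a sketch only, assuming GNU
‘sed’ and the ‘seq’ utility; the inputs are illustrative. The first
command collects every line in the hold space and joins them on the last
line; the second builds the input in reverse order:

```shell
# Assumes GNU sed and seq; inputs are illustrative.

# Copy line 1 to the hold space (h), append later lines (H), and on the
# last line swap the buffers (x), turn newlines into commas, and print.
seq 3 | sed -n '1h; 1!H; ${x; s/\n/,/g; p}'

# On every line but the first, append the hold space to the current
# line (G), then save the result back to the hold space (h).  On the
# last line, print the accumulated text.
seq 3 | sed -n '1!G; h; $p'
```

   The first command prints ‘1,2,3’; the second prints ‘3’, ‘2’ and ‘1’
on separate lines.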

   ---------- Footnotes ----------

   (1) Actually, if ‘sed’ prints a line without the terminating newline,
it will nevertheless print the missing newline as soon as more text is
sent to the same output stream, which gives the “least expected
surprise” even though it does not make commands like ‘sed -n p’ exactly
identical to ‘cat’.


File: sed.info, Node: Hold and Pattern Buffers, Next: Multiline techniques, Prev: Execution Cycle, Up: advanced sed

6.2 Hold and Pattern Buffers
============================

TODO


File: sed.info, Node: Multiline techniques, Next: Branching and flow control, Prev: Hold and Pattern Buffers, Up: advanced sed

6.3 Multiline techniques - using D,G,H,N,P to process multiple lines
====================================================================

Multiple lines can be processed as one buffer using the ‘D’, ‘G’, ‘H’,
‘N’ and ‘P’ commands. They are similar to their lowercase counterparts
(‘d’, ‘g’, ‘h’, ‘n’, ‘p’), except that these commands append or subtract
data while respecting embedded newlines - allowing adding and removing
lines from the pattern and hold spaces.

   They operate as follows:
‘D’
     _deletes_ line from the pattern space until the first newline, and
     restarts the cycle.

‘G’
     _appends_ line from the hold space to the pattern space, with a
     newline before it.

‘H’
     _appends_ line from the pattern space to the hold space, with a
     newline before it.

‘N’
     _appends_ line from the input file to the pattern space.

‘P’
     _prints_ line from the pattern space until the first newline.

   The following example illustrates the operation of ‘N’ and ‘D’
commands:

     $ seq 6 | sed -n 'N;l;D'
     1\n2$
     2\n3$
     3\n4$
     4\n5$
     5\n6$

  1. ‘sed’ starts by reading the first line into the pattern space (i.e.
     ‘1’).
  2. At the beginning of every cycle, the ‘N’ command appends a newline
     and the next line to the pattern space (i.e. ‘1’, ‘\n’, ‘2’ in the
     first cycle).
  3. The ‘l’ command prints the content of the pattern space
     unambiguously.
  4. The ‘D’ command then removes the content of pattern space up to the
     first newline (leaving ‘2’ at the end of the first cycle).
  5. At the next cycle the ‘N’ command appends a newline and the next
     input line to the pattern space (e.g. ‘2’, ‘\n’, ‘3’).

   A common technique to process blocks of text such as paragraphs
(instead of line-by-line) is using the following construct:

     sed '/./{H;$!d} ; x ; s/REGEXP/REPLACEMENT/'

  1. The first expression, ‘/./{H;$!d}’ operates on all non-empty lines,
     and adds the current line (in the pattern space) to the hold space.
     On all lines except the last, the pattern space is deleted and the
     cycle is restarted.

  2. The other expressions ‘x’ and ‘s’ are executed only on empty lines
     (i.e. paragraph separators). The ‘x’ command fetches the
     accumulated lines from the hold space back to the pattern space.
     The ‘s///’ command then operates on all the text in the paragraph
     (including the embedded newlines).

   The following example demonstrates this technique:
     $ cat input.txt
     a a a aa aaa
     aaaa aaaa aa
     aaaa aaa aaa

     bbbb bbb bbb
     bb bb bbb bb
     bbbbbbbb bbb

     ccc ccc cccc
     cccc ccccc c
     cc cc cc cc

     $ sed '/./{H;$!d} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt

     START-->
     a a a aa aaa
     aaaa aaaa aa
     aaaa aaa aaa
     <--END

     START-->
     bbbb bbb bbb
     bb bb bbb bb
     bbbbbbbb bbb
     <--END

     START-->
     ccc ccc cccc
     cccc ccccc c
     cc cc cc cc
     <--END

   For more annotated examples, *note Text search across multiple
lines:: and *note Line length adjustment::.


File: sed.info, Node: Branching and flow control, Prev: Multiline techniques, Up: advanced sed

6.4 Branching and Flow Control
==============================

The branching commands ‘b’, ‘t’, and ‘T’ enable changing the flow of
‘sed’ programs.

   By default, ‘sed’ reads an input line into the pattern buffer, then
continues to process all commands in order. Commands without
addresses affect all lines. Commands with addresses affect only
matching lines. *Note Execution Cycle:: and *note Addresses overview::.

   ‘sed’ does not support a typical ‘if/then’ construct. Instead, some
commands can be used as conditionals or to change the default flow
control:

‘d’
     delete (clear) the current pattern space, and restart the program
     cycle without processing the rest of the commands and without
     printing the pattern space.

‘D’
     delete the contents of the pattern space _up to the first newline_,
     and restart the program cycle without processing the rest of the
     commands and without printing the pattern space.

‘[addr]X’
‘[addr]{ X ; X ; X }’
‘/regexp/X’
‘/regexp/{ X ; X ; X }’
     Addresses and regular expressions can be used as an ‘if/then’
     conditional: if [ADDR] matches the current pattern space, execute
     the command(s). For example, the command ‘/^#/d’ means: _if_ the
     current pattern matches the regular expression ‘^#’ (a line
     starting with a hash), _then_ execute the ‘d’ command: delete the
     line without printing it, and restart the program cycle
     immediately.

‘b’
     branch unconditionally (that is: always jump to a label, skipping
     or repeating other commands, without restarting a new cycle).
     Combined with an address, the branch can be conditionally executed
     on matched lines.

‘t’
     branch conditionally (that is: jump to a label) _only if_ a ‘s///’
     command has succeeded since the last input line was read or another
     conditional branch was taken.

‘T’
     similar but opposite to the ‘t’ command: branch only if there have
     been _no_ successful substitutions since the last input line was
     read.

   The following two ‘sed’ programs are equivalent. The first
(contrived) example uses the ‘b’ command to skip the ‘s///’ command on
lines containing ‘1’. The second example uses an address with negation
(‘!’) to perform substitution only on desired lines. The ‘y///’ command
is still executed on all lines:

     $ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
     a4
     z5
     z6

     $ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
     a4
     z5
     z6

6.4.1 Branching and Cycles
--------------------------

The ‘b’, ‘t’ and ‘T’ commands can be followed by a label (typically a
single letter). Labels are defined with a colon followed by one or more
letters (e.g. ‘:x’). If the label is omitted the branch commands
restart the cycle.
Note the difference between branching to a label and
restarting the cycle: when a cycle is restarted, ‘sed’ first prints the
current content of the pattern space, then reads the next input line
into the pattern space; jumping to a label (even if it is at the
beginning of the program) does not print the pattern space and does not
read the next input line.

   The following program is a no-op. The ‘b’ command (the only command
in the program) does not have a label, and thus simply restarts the
cycle. On each cycle, the pattern space is printed and the next input
line is read:

     $ seq 3 | sed b
     1
     2
     3

   The following example is an infinite loop - it doesn’t terminate and
doesn’t print anything. The ‘b’ command jumps to the ‘x’ label, and a
new cycle is never started:

     $ seq 3 | sed ':x ; bx'

     # The above command requires gnu sed (which supports additional
     # commands following a label, without a newline). A portable equivalent:
     #     sed -e ':x' -e bx

   Branching is often complemented with the ‘n’ or ‘N’ commands: both
commands read the next input line into the pattern space without waiting
for the cycle to restart. Before reading the next input line, ‘n’
prints the current pattern space then empties it, while ‘N’ appends a
newline and the next input line to the pattern space.

   Consider the following two examples:

     $ seq 3 | sed ':x ; n ; bx'
     1
     2
     3

     $ seq 3 | sed ':x ; N ; bx'
     1
     2
     3

   • Neither example loops forever, despite never starting a new cycle.

   • In the first example, the ‘n’ command first prints the content of
     the pattern space, empties the pattern space, then reads the next
     input line.

   • In the second example, the ‘N’ command appends the next input line
     to the pattern space (with a newline).
     Lines are accumulated in
     the pattern space until there are no more input lines to read, then
     the ‘N’ command terminates the ‘sed’ program. When the program
     terminates, the end-of-cycle actions are performed, and the entire
     pattern space is printed.

   • The second example requires GNU ‘sed’, because it uses the
     non-POSIX-standard behavior of ‘N’. See the “‘N’ command on the
     last line” paragraph in *note Reporting Bugs::.

   • To further examine the difference between the two examples, try the
     following commands:
          printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
          printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
          printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
          printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'

6.4.2 Branching example: joining lines
--------------------------------------

As a real-world example of using branching, consider the case of
quoted-printable (https://en.wikipedia.org/wiki/Quoted-printable) files,
typically used to encode email messages. In these files long lines are
split and marked with a “soft line break” consisting of a single ‘=’
character at the end of the line:

     $ cat jaques.txt
     All the wor=
     ld's a stag=
     e,
     And all the=
      men and wo=
     men merely =
     players:
     They have t=
     heir exits =
     and their e=
     ntrances;
     And one man=
      in his tim=
     e plays man=
     y parts.

   The following program uses an address match ‘/=$/’ as a conditional:
If the current pattern space ends with a ‘=’, it reads the next input
line using ‘N’, replaces all ‘=’ characters which are followed by a
newline, and unconditionally branches (‘b’) to the beginning of the
program without restarting a new cycle.
If the pattern space does not
end with ‘=’, the default action is performed: the pattern space is
printed and a new cycle is started:

     $ sed ':x ; /=$/ { N ; s/=\n//g ; bx }' jaques.txt
     All the world's a stage,
     And all the men and women merely players:
     They have their exits and their entrances;
     And one man in his time plays many parts.

   Here’s an alternative program with a slightly different approach: on
all lines except the last, ‘N’ appends the line to the pattern space. A
substitution command then removes soft line breaks (‘=’ at the end of a
line, i.e. followed by a newline) by replacing them with an empty
string. _If_ the substitution was successful (meaning the pattern space
contained a line which should be joined), the conditional branch command
‘t’ jumps to the beginning of the program without completing or
restarting the cycle. If the substitution failed (meaning there were no
soft line breaks), the ‘t’ command will _not_ branch. Then, ‘P’ will
print the pattern space content until the first newline, and ‘D’ will
delete the pattern space content until the first newline. (To learn
more about the ‘N’, ‘P’ and ‘D’ commands *note Multiline techniques::).

     $ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
     All the world's a stage,
     And all the men and women merely players:
     They have their exits and their entrances;
     And one man in his time plays many parts.

   For more line-joining examples *note Joining lines::.


File: sed.info, Node: Examples, Next: Limitations, Prev: advanced sed, Up: Top

7 Some Sample Scripts
*********************

Here are some ‘sed’ scripts to guide you in the art of mastering ‘sed’.

* Menu:


Useful one-liners:
* Joining lines::

Some exotic examples:
* Centering lines::
* Increment a number::
* Rename files to lower case::
* Print bash environment::
* Reverse chars of lines::
* Text search across multiple lines::
* Line length adjustment::
* Adding a header to multiple files::

Emulating standard utilities:
* tac::                             Reverse lines of files
* cat -n::                          Numbering lines
* cat -b::                          Numbering non-blank lines
* wc -c::                           Counting chars
* wc -w::                           Counting words
* wc -l::                           Counting lines
* head::                            Printing the first lines
* tail::                            Printing the last lines
* uniq::                            Make duplicate lines unique
* uniq -d::                         Print duplicated lines of input
* uniq -u::                         Remove all duplicated lines
* cat -s::                          Squeezing blank lines


File: sed.info, Node: Joining lines, Next: Centering lines, Up: Examples

7.1 Joining lines
=================

This section uses ‘N’, ‘D’ and ‘P’ commands to process multiple lines,
and the ‘b’ and ‘t’ commands for branching. *Note Multiline
techniques:: and *note Branching and flow control::.

   Join specific lines (e.g. if lines 2 and 3 need to be joined):

     $ cat lines.txt
     hello
     hel
     lo
     hello

     $ sed '2{N;s/\n//;}' lines.txt
     hello
     hello
     hello

   Join backslash-continued lines:

     $ cat 1.txt
     this \
     is \
     a \
     long \
     line
     and another \
     line

     $ sed -e ':x /\\$/ { N; s/\\\n//g ; bx }' 1.txt
     this is a long line
     and another line


     #TODO: The above requires gnu sed.
     #      non-gnu seds need newlines after ':' and 'b'

   Join lines that start with whitespace (e.g. SMTP headers):

     $ cat 2.txt
     Subject: Hello
         World
     Content-Type: multipart/alternative;
         boundary=94eb2c190cc6370f06054535da6a
     Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
     Authentication-Results: mx.gnu.org;
         dkim=pass header.i=@gnu.org;
         spf=pass
     Message-ID: <abcdef@gnu.org>
     From: John Doe <jdoe@gnu.org>
     To: Jane Smith <jsmith@gnu.org>

     $ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
     Subject: Hello World
     Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
     Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
     Authentication-Results: mx.gnu.org; dkim=pass header.i=@gnu.org; spf=pass
     Message-ID: <abcdef@gnu.org>
     From: John Doe <jdoe@gnu.org>
     To: Jane Smith <jsmith@gnu.org>

     # A portable (non-gnu) variation:
     #   sed -e :a -e '$!N;s/\n  */ /;ta' -e 'P;D'


File: sed.info, Node: Centering lines, Next: Increment a number, Prev: Joining lines, Up: Examples

7.2 Centering Lines
===================

This script centers all lines of a file on an 80-column width. To
change that width, the number in ‘\{...\}’ must be replaced, and the
number of added spaces also must be changed.

   Note how the buffer commands are used to separate parts in the
regular expressions to be matched; this is a common technique.

     #!/usr/bin/sed -f

     # Put 80 spaces in the buffer
     1 {
       x
       s/^$/          /
       s/^.*$/&&&&&&&&/
       x
     }

     # delete leading and trailing spaces
     y/<TAB>/ /
     s/^ *//
     s/ *$//

     # add a newline and 80 spaces to end of line
     G

     # keep first 81 chars (80 + a newline)
     s/^\(.\{81\}\).*$/\1/

     # \2 matches half of the spaces, which are moved to the beginning
     s/^\(.*\)\n\(.*\)\2/\2\1/


File: sed.info, Node: Increment a number, Next: Rename files to lower case, Prev: Centering lines, Up: Examples

7.3 Increment a Number
======================

This script is one of a few that demonstrate how to do arithmetic in
‘sed’. This is indeed possible,(1) but must be done manually.

   To increment a number you just add 1 to the last digit, replacing it
by the following digit. There is one exception: when the digit is a
nine, the previous digits must also be incremented until you reach a
digit that is not a nine.

   This solution by Bruno Haible is very clever and smart because it
uses a single buffer; if you don’t have this limitation, the algorithm
used in *note Numbering lines: cat -n, is faster. It works by replacing
trailing nines with an underscore, then using multiple ‘s’ commands to
increment the last digit, and then again substituting underscores with
zeros.

     #!/usr/bin/sed -f

     /[^0-9]/ d

     # replace all trailing 9s by _ (any other character except digits, could
     # be used)
     :d
     s/9\(_*\)$/_\1/
     td

     # incr last digit only. The first line adds a most-significant
     # digit of 1 if we have to add a digit.

     s/^\(_*\)$/1\1/; tn
     s/8\(_*\)$/9\1/; tn
     s/7\(_*\)$/8\1/; tn
     s/6\(_*\)$/7\1/; tn
     s/5\(_*\)$/6\1/; tn
     s/4\(_*\)$/5\1/; tn
     s/3\(_*\)$/4\1/; tn
     s/2\(_*\)$/3\1/; tn
     s/1\(_*\)$/2\1/; tn
     s/0\(_*\)$/1\1/; tn

     :n
     y/_/0/

   ---------- Footnotes ----------

   (1) ‘sed’ guru Greg Ubben wrote an implementation of the ‘dc’ RPN
calculator! It is distributed together with sed.


File: sed.info, Node: Rename files to lower case, Next: Print bash environment, Prev: Increment a number, Up: Examples

7.4 Rename Files to Lower Case
==============================

This is a pretty strange use of ‘sed’. We transform text, and transform
it to be shell commands, then just feed them to shell. Don’t worry,
even worse hacks are done when using ‘sed’; I have seen a script
converting the output of ‘date’ into a ‘bc’ program!

   The main body of this is the ‘sed’ script, which remaps the name from
lower to upper (or vice-versa) and even checks whether the remapped name
is the same as the original name. Note how the script is parameterized
using shell variables and proper quoting.

     #! /bin/sh
     # rename files to lower/upper case...
     #
     # usage:
     #    move-to-lower *
     #    move-to-upper *
     # or
     #    move-to-lower -R .
     #    move-to-upper -R .
     #

     help()
     {
             cat << eof
     Usage: $0 [-n] [-R] [-h] files...

     -n      do nothing, only see what would be done
     -R      recursive (use find)
     -h      this message
     files   files to remap to lower case

     Examples:
            $0 -n *        (see if everything is ok, then...)
            $0 *

            $0 -R .

     eof
     }

     apply_cmd='sh'
     finder='echo "$@" | tr " " "\n"'
     files_only=

     while :
     do
         case "$1" in
             -n) apply_cmd='cat' ;;
             -R) finder='find "$@" -type f';;
             -h) help ; exit 1 ;;
             *) break ;;
         esac
         shift
     done

     if [ -z "$1" ]; then
             echo Usage: $0 [-h] [-n] [-R] files...
             exit 1
     fi

     LOWER='abcdefghijklmnopqrstuvwxyz'
     UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'

     case `basename $0` in
             *upper*) TO=$UPPER; FROM=$LOWER ;;
             *)       FROM=$UPPER; TO=$LOWER ;;
     esac

     eval $finder | sed -n '

     # remove all trailing slashes
     s/\/*$//

     # add ./ if there is no path, only a filename
     /\//! s/^/.\//

     # save path+filename
     h

     # remove path
     s/.*\///

     # do conversion only on filename
     y/'$FROM'/'$TO'/

     # now line contains original path+file, while
     # hold space contains the new filename
     x

     # add converted file name to line, which now contains
     # path/file-name\nconverted-file-name
     G

     # check if converted file name is equal to original file name,
     # if it is, do not print anything
     /^.*\/\(.*\)\n\1/b

     # escape special characters for the shell
     s/["$`\\]/\\&/g

     # now, transform path/fromfile\n, into
     # mv path/fromfile path/tofile and print it
     s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p

     ' | $apply_cmd


File: sed.info, Node: Print bash environment, Next: Reverse chars of lines, Prev: Rename files to lower case, Up: Examples

7.5 Print ‘bash’ Environment
============================

This script strips the definition of the shell functions from the output
of the ‘set’ Bourne-shell command.

     #!/bin/sh

     set | sed -n '
     :x

     # if no occurrence of "=()" print and load next line
     /=()/! { p; b; }
     / () $/! { p; b; }

     # possible start of functions section
     # save the line in case this is a var like FOO="() "
     h

     # if the next line has a brace, we quit because
     # nothing comes after functions
     n
     /^{/ q

     # print the old line
     x; p

     # work on the new line now
     x; bx
     '


File: sed.info, Node: Reverse chars of lines, Next: Text search across multiple lines, Prev: Print bash environment, Up: Examples

7.6 Reverse Characters of Lines
===============================

This script can be used to reverse the position of characters in lines.
The technique moves two characters at a time, hence it is faster than
more intuitive implementations.

   Note the ‘tx’ command before the definition of the label. This is
often needed to reset the flag that is tested by the ‘t’ command.

   Imaginative readers will find uses for this script. An example is
reversing the output of ‘banner’.(1)

     #!/usr/bin/sed -f

     /../! b

     # Reverse a line. Begin embedding the line between two newlines
     s/^.*$/\
     &\
     /

     # Move first character at the end. The regexp matches until
     # there are zero or one characters between the markers
     tx
     :x
     s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
     tx

     # Remove the newline markers
     s/\n//g

   ---------- Footnotes ----------

   (1) This requires another script to pad the output of banner; for
example

     #! /bin/sh

     banner -w $1 $2 $3 $4 |
       sed -e :a -e '/^.\{0,'$1'\}$/ { s/$/ /; ba; }' |
       ~/sedscripts/reverseline.sed


File: sed.info, Node: Text search across multiple lines, Next: Line length adjustment, Prev: Reverse chars of lines, Up: Examples

7.7 Text search across multiple lines
=====================================

This section uses ‘N’ and ‘D’ commands to search for consecutive words
spanning multiple lines. *Note Multiline techniques::.

   These examples deal with finding doubled occurrences of words in a
document.

   Finding doubled words in a single line is easy using GNU ‘grep’ and
similarly with GNU ‘sed’:

     $ cat two-cities-dup1.txt
     It was the best of times,
     it was the worst of times,
     it was the the age of wisdom,
     it was the age of foolishness,

     $ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
     it was the the age of wisdom,

     $ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
     3:it was the the age of wisdom,

     $ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
     it was the the age of wisdom,

     $ sed -En '/\b(\w+)\s+\1\b/{=;p}' two-cities-dup1.txt
     3
     it was the the age of wisdom,

   • The regular expression ‘\b\w+\s+’ searches for word-boundary
     (‘\b’), followed by one-or-more word-characters (‘\w+’), followed
     by whitespace (‘\s+’). *Note regexp extensions::.

   • Adding parentheses around the ‘(\w+)’ expression creates a
     subexpression. The regular expression pattern ‘(PATTERN)\s+\1’
     defines a subexpression (in the parentheses) followed by a
     back-reference, separated by whitespace. A successful match means
     the PATTERN was repeated twice in succession. *Note
     Back-references and Subexpressions::.

   • The word-boundary expression (‘\b’) at both ends ensures partial
     words are not matched (e.g. ‘the then’ is not a desired match).

   • The ‘-E’ option enables extended regular expression syntax,
     alleviating the need to add backslashes before the parentheses.
     *Note ERE syntax::.

   When the doubled word spans two lines, the above regular expression
will not find it, as ‘grep’ and ‘sed’ operate line-by-line.

   By using ‘N’ and ‘D’ commands, ‘sed’ can apply regular expressions on
multiple lines (that is, multiple lines are stored in the pattern space,
and the regular expression works on it):

     $ cat two-cities-dup2.txt
     It was the best of times, it was the
     worst of times, it was the
     the age of wisdom,
     it was the age of foolishness,

     $ sed -En '{N; /\b(\w+)\s+\1\b/{=;p} ; D}' two-cities-dup2.txt
     3
     worst of times, it was the
     the age of wisdom,

   • The ‘N’ command appends the next line to the pattern space (thus
     ensuring it contains two consecutive lines in every cycle).

   • The regular expression uses ‘\s+’ for word separator which matches
     both spaces and newlines.

   • If the regular expression matches, the entire pattern space is
     printed with ‘p’. No lines are printed by default due to the ‘-n’
     option.

   • The ‘D’ removes the first line from the pattern space (up until the
     first newline), readying it for the next cycle.

   See the GNU ‘coreutils’ manual for an alternative solution using ‘tr
-s’ and ‘uniq’ at
<https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html>.


File: sed.info, Node: Line length adjustment, Next: Adding a header to multiple files, Prev: Text search across multiple lines, Up: Examples

7.8 Line length adjustment
==========================

This section uses ‘N’ and ‘P’ commands to read and write lines, and the
‘b’ command for branching. *Note Multiline techniques:: and *note
Branching and flow control::.

   This (somewhat contrived) example deals with formatting and wrapping
the lines of text in the following input file:

     $ cat two-cities-mix.txt
     It was the best of times, it was
     the worst of times, it
     was the age of
     wisdom,
     it
     was
     the age
     of foolishness,

   The following sed program wraps lines at 40 characters:
     $ cat wrap40.sed
     # outer loop
     :x

     # Append a newline followed by the next input line to the pattern buffer
     N

     # Remove all newlines from the pattern buffer
     s/\n/ /g


     # Inner loop
     :y

     # Add a newline after the first 40 characters
     s/(.{40,40})/\1\n/

     # If there is a newline in the pattern buffer
     # (i.e. the previous substitution added a newline)
     /\n/ {
         # There are newlines in the pattern buffer -
         # print the content until the first newline.
         P

         # Remove the printed characters and the first newline
         s/.*\n//

         # branch to label 'y' - repeat inner loop
         by
     }

     # No newlines in the pattern buffer - Branch to label 'x' (outer loop)
     # and read the next input line
     bx

   The wrapped output:
     $ sed -E -f wrap40.sed two-cities-mix.txt
     It was the best of times, it was the wor
     st of times, it was the age of wisdom, i
     t was the age of foolishness,


File: sed.info, Node: Adding a header to multiple files, Next: tac, Prev: Line length adjustment, Up: Examples

7.9 Adding a header to multiple files
=====================================

GNU ‘sed’ can be used to safely modify multiple files at once.
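
   One way to keep such bulk edits easy to undo is to give ‘-i’ a
backup suffix, so that each original file is kept next to the edited
one. The following is a minimal sketch, assuming GNU ‘sed’; the
temporary directory, the file names and the substitution are invented
for illustration:

```shell
# Hypothetical demo files in a throwaway directory (not from the manual).
tmpdir=$(mktemp -d)
printf 'hello FOO\n' > "$tmpdir/a.txt"
printf 'FOO again\n' > "$tmpdir/b.txt"

# -i.bak edits each file in place and saves the original as FILE.bak.
sed -i.bak 's/FOO/BAR/' "$tmpdir"/*.txt

cat "$tmpdir/a.txt"       # edited copy
cat "$tmpdir/a.txt.bak"   # untouched original
```

   If the result is not what was intended, each ‘.bak’ file can simply
be moved back over its edited counterpart.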

   Add a single line to the beginning of source code files:

     sed -i '1i/* Copyright (C) FOO BAR */' *.c

   Adding a few lines is possible using ‘\n’ in the text:

     sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c

   To add multiple lines from another file, use ‘0rFILE’. A typical use
case is adding a license notice header to all files:

     ## Create the header file:
     $ cat<<'EOF'>LIC.TXT
     /*
       Copyright (C) 1989-2021 FOO BAR

       This program is free software; you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 3, or (at your option)
       any later version.

       This program is distributed in the hope that it will be useful,
       but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
       GNU General Public License for more details.

       You should have received a copy of the GNU General Public License
       along with this program; If not, see <https://www.gnu.org/licenses/>.
     */
     EOF

     ## Add the file at the beginning of all source code files:
     $ sed -i '0rLIC.TXT' *.cpp *.h

   With script files (e.g. ‘.sh’, ‘.py’, ‘.pl’ files) the license notice
typically appears _after_ the first line (the ‘shebang’ ‘#!’ line). The
‘1rFILE’ command will add ‘FILE’ _after_ the first line:

     ## Create the header file:
     $ cat<<'EOF'>LIC.TXT
     ##
     ## Copyright (C) 1989-2021 FOO BAR
     ##
     ## This program is free software; you can redistribute it and/or modify
     ## it under the terms of the GNU General Public License as published by
     ## the Free Software Foundation; either version 3, or (at your option)
     ## any later version.
     ##
     ## This program is distributed in the hope that it will be useful,
     ## but WITHOUT ANY WARRANTY; without even the implied warranty of
     ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     ## GNU General Public License for more details.
     ##
     ## You should have received a copy of the GNU General Public License
     ## along with this program; If not, see <https://www.gnu.org/licenses/>.
     ##
     ##
     EOF

     ## Add the file after the first line of all script files:
     $ sed -i '1rLIC.TXT' *.py *.sh

   The above ‘sed’ commands can be combined with ‘find’ to locate files
in all subdirectories, ‘xargs’ to run additional commands on selected
files and ‘grep’ to filter out files that already contain a copyright
notice:

     find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
         | xargs grep -Li copyright \
         | xargs -r sed -i '0rLIC.TXT'

   Or a slightly safer version (handling file names with spaces and
newlines):

     find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
         | xargs -0 grep -Z -Li copyright \
         | xargs -0 -r sed -i '0rLIC.TXT'

   Note: using the ‘0’ address with the ‘r’ command requires GNU ‘sed’
version 4.9 or later. *Note Zero Address::.


File: sed.info, Node: tac, Next: cat -n, Prev: Adding a header to multiple files, Up: Examples

7.10 Reverse Lines of Files
===========================

This one begins a series of totally useless (yet interesting) scripts
emulating various Unix commands. This, in particular, is a ‘tac’
workalike.

   Note that on implementations other than GNU ‘sed’ this script might
easily overflow internal buffers.

     #!/usr/bin/sed -nf

     # reverse all lines of input, i.e. first line becomes last, ...

     # from the second line, the buffer (which contains all previous lines)
     # is *appended* to current line, so, the order will be reversed
     1! G

     # on the last line we're done -- print everything
     $ p

     # store everything on the buffer again
     h


File: sed.info, Node: cat -n, Next: cat -b, Prev: tac, Up: Examples

7.11 Numbering Lines
====================

This script replaces ‘cat -n’; in fact it formats its output exactly
like GNU ‘cat’ does.

   Of course this is completely useless, for two reasons: first,
because somebody else did it in C, second, because the following
Bourne-shell script could be used for the same purpose and would be much
faster:

     #! /bin/sh
     sed -e "=" $@ | sed -e '
       s/^/      /
       N
       s/^ *\(......\)\n/\1  /
     '

   It uses ‘sed’ to print the line number, then groups lines two by two
using ‘N’. Of course, this script does not teach as much as the one
presented below.

   The algorithm used for incrementing uses both buffers, so the line is
printed as soon as possible and then discarded. The number is split so
that changing digits go in a buffer and unchanged ones go in the other;
the changed digits are modified in a single step (using a ‘y’ command).
The line number for the next line is then composed and stored in the
hold space, to be used in the next iteration.
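
   The increment step described above can be tried on its own. The
sketch below assumes GNU ‘sed’ and wraps the same split-and-translate
commands in a hypothetical shell function, ‘increment’, which is not
part of the manual's script; it adds one to each number read from
standard input:

```shell
# increment: hypothetical helper that prints each input number plus one.
increment() {
  sed -e '/^9*$/ s/^/0/' \
      -e 's/.9*$/x&/' \
      -e 'h' \
      -e 's/^.*x//' \
      -e 'y/0123456789/1234567890/' \
      -e 'x' \
      -e 's/x.*$//' \
      -e 'G' \
      -e 's/\n//'
}

# The trailing 9s and the digit before them are the "changing" digits.
printf '7\n1299\n999\n' | increment
# prints 8, 1300 and 1000, one per line
```

   For ‘1299’ the marker ‘x’ is placed before ‘299’, which ‘y’ turns
into ‘300’, while the unchanged ‘1’ stays in the other buffer until the
two halves are joined again.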

     #!/usr/bin/sed -nf

     # Prime the pump on the first line
     x
     /^$/ s/^.*$/1/

     # Add the correct line number before the pattern
     G
     h

     # Format it and print it
     s/^/      /
     s/^ *\(......\)\n/\1  /p

     # Get the line number from hold space; add a zero
     # if we're going to add a digit on the next line
     g
     s/\n.*$//
     /^9*$/ s/^/0/

     # separate changing/unchanged digits with an x
     s/.9*$/x&/

     # keep changing digits in hold space
     h
     s/^.*x//
     y/0123456789/1234567890/
     x

     # keep unchanged digits in pattern space
     s/x.*$//

     # compose the new number, remove the newline implicitly added by G
     G
     s/\n//
     h


File: sed.info, Node: cat -b, Next: wc -c, Prev: cat -n, Up: Examples

7.12 Numbering Non-blank Lines
==============================

Emulating ‘cat -b’ is almost the same as ‘cat -n’: we only have to
select which lines are to be numbered and which are not.

   The part that is common to this script and the previous one is not
commented to show how important it is to comment ‘sed’ scripts
properly...

     #!/usr/bin/sed -nf

     /^$/ {
       p
       b
     }

     # Same as cat -n from now
     x
     /^$/ s/^.*$/1/
     G
     h
     s/^/      /
     s/^ *\(......\)\n/\1  /p
     x
     s/\n.*$//
     /^9*$/ s/^/0/
     s/.9*$/x&/
     h
     s/^.*x//
     y/0123456789/1234567890/
     x
     s/x.*$//
     G
     s/\n//
     h


File: sed.info, Node: wc -c, Next: wc -w, Prev: cat -b, Up: Examples

7.13 Counting Characters
========================

This script shows another way to do arithmetic with ‘sed’. In this case
we have to add possibly large numbers, so implementing this by
successive increments would not be feasible (and possibly even more
complicated to contrive than this script).

   The approach is to map numbers to letters, kind of an abacus
implemented with ‘sed’. ‘a’s are units, ‘b’s are tens and so on: we
simply add the number of characters on the current line as units, and
then propagate the carry to tens, hundreds, and so on.

   As usual, running totals are kept in hold space.

   On the last line, we convert the abacus form back to decimal. For
the sake of variety, this is done with a loop rather than with some 80
‘s’ commands(1): first we convert units, removing ‘a’s from the number;
then we rotate letters so that tens become ‘a’s, and so on until no more
letters remain.

     #!/usr/bin/sed -nf

     # Add n+1 a's to hold space (+1 is for the newline)
     s/./a/g
     H
     x
     s/\n/a/

     # Do the carry. The t's and b's are not necessary,
     # but they do speed up the thing
     t a
     : a; s/aaaaaaaaaa/b/g; t b; b done
     : b; s/bbbbbbbbbb/c/g; t c; b done
     : c; s/cccccccccc/d/g; t d; b done
     : d; s/dddddddddd/e/g; t e; b done
     : e; s/eeeeeeeeee/f/g; t f; b done
     : f; s/ffffffffff/g/g; t g; b done
     : g; s/gggggggggg/h/g; t h; b done
     : h; s/hhhhhhhhhh//g

     : done
     $! {
       h
       b
     }

     # On the last line, convert back to decimal

     : loop
     /a/! s/[b-h]*/&0/
     s/aaaaaaaaa/9/
     s/aaaaaaaa/8/
     s/aaaaaaa/7/
     s/aaaaaa/6/
     s/aaaaa/5/
     s/aaaa/4/
     s/aaa/3/
     s/aa/2/
     s/a/1/

     : next
     y/bcdefgh/abcdefg/
     /[a-h]/ b loop
     p

   ---------- Footnotes ----------

   (1) Some implementations have a limit of 199 commands per script


File: sed.info, Node: wc -w, Next: wc -l, Prev: wc -c, Up: Examples

7.14 Counting Words
===================

This script is almost the same as the previous one, once each of the
words on the line is converted to a single ‘a’ (in the previous script
each letter was changed to an ‘a’).

   It is interesting that real ‘wc’ programs have optimized loops for
‘wc -c’, so they are much slower at counting words than at counting
characters. This script’s bottleneck, instead, is arithmetic, and hence
the word-counting one is faster (it has to manage smaller numbers).

   Again, the common parts are not commented to show the importance of
commenting ‘sed’ scripts.

     #!/usr/bin/sed -nf

     # Convert words to a's
     s/[ <TAB>][ <TAB>]*/ /g
     s/^/ /
     s/ [^ ][^ ]*/a /g
     s/ //g

     # Append them to hold space
     H
     x
     s/\n//

     # From here on it is the same as in wc -c.
     /aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
     /bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
     /cccccccccc/! bx; s/cccccccccc/d/g
     /dddddddddd/! bx; s/dddddddddd/e/g
     /eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
     /ffffffffff/! bx; s/ffffffffff/g/g
     /gggggggggg/! bx; s/gggggggggg/h/g
     s/hhhhhhhhhh//g
     :x
     $! { h; b; }
     :y
     /a/! s/[b-h]*/&0/
     s/aaaaaaaaa/9/
     s/aaaaaaaa/8/
     s/aaaaaaa/7/
     s/aaaaaa/6/
     s/aaaaa/5/
     s/aaaa/4/
     s/aaa/3/
     s/aa/2/
     s/a/1/
     y/bcdefgh/abcdefg/
     /[a-h]/ by
     p


File: sed.info, Node: wc -l, Next: head, Prev: wc -w, Up: Examples

7.15 Counting Lines
===================

No strange things are done now, because ‘sed’ gives us ‘wc -l’
functionality for free!!! Look:

     #!/usr/bin/sed -nf
     $=


File: sed.info, Node: head, Next: tail, Prev: wc -l, Up: Examples

7.16 Printing the First Lines
=============================

This script is probably the simplest useful ‘sed’ script. It displays
the first 10 lines of input; the number of displayed lines is right
before the ‘q’ command.

     #!/usr/bin/sed -f
     10q


File: sed.info, Node: tail, Next: uniq, Prev: head, Up: Examples

7.17 Printing the Last Lines
============================

Printing the last N lines rather than the first is more complex but
indeed possible. N is encoded in the second line, before the bang
character.

   This script is similar to the ‘tac’ script in that it keeps the final
output in the hold space and prints it at the end:

     #!/usr/bin/sed -nf

     1! {; H; g; }
     1,10 !s/[^\n]*\n//
     $p
     h

   Mainly, the script keeps a window of 10 lines and slides it by
adding a line and deleting the oldest (the substitution command on the
second line works like a ‘D’ command but does not restart the loop).

   The “sliding window” technique is a very powerful way to write
efficient and complex ‘sed’ scripts, because commands like ‘P’ would
require a lot of work if implemented manually.
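
   The hold-space script above can be checked against known input. The
following sketch assumes GNU ‘sed’ and the ‘seq’ utility; the function
name ‘last10’ is hypothetical and only wraps the script as shown:

```shell
# last10: hypothetical wrapper around the hold-space "tail" script.
last10() {
  sed -n '
1! {; H; g; }
1,10 !s/[^\n]*\n//
$p
h
'
}

# Fifteen input lines: the window drops 1..5 and keeps the last ten.
seq 15 | last10
```

   With fewer than ten input lines nothing is ever dropped from the
window, so the whole input is printed.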

   To introduce the technique, which is fully demonstrated in the rest
of this chapter and is based on the ‘N’, ‘P’ and ‘D’ commands, here is
an implementation of ‘tail’ using a simple “sliding window.”

   This looks complicated but in fact the working is the same as the
last script: after we have kicked in the appropriate number of lines,
however, we stop using the hold space to keep inter-line state, and
instead use ‘N’ and ‘D’ to slide pattern space by one line:

     #!/usr/bin/sed -f

     1h
     2,10 {; H; g; }
     $q
     1,9d
     N
     D

   Note how the first, second and fourth lines are inactive after the
first ten lines of input. After that, all the script does is exit on
the last line of input, append the next input line to pattern space,
and remove the first line.


File: sed.info, Node: uniq, Next: uniq -d, Prev: tail, Up: Examples

7.18 Make Duplicate Lines Unique
================================

This is an example of the art of using the ‘N’, ‘P’ and ‘D’ commands,
probably the most difficult to master.

     #!/usr/bin/sed -f
     h

     :b
     # On the last line, print and exit
     $b
     N
     /^\(.*\)\n\1$/ {
         # The two lines are identical. Undo the effect of
         # the N command.
         g
         bb
     }

     # If the N command had added the last line, print and exit
     $b

     # The lines are different; print the first and go
     # back working on the second.
     P
     D

   As you can see, we maintain a 2-line window using ‘P’ and ‘D’. This
technique is often used in advanced ‘sed’ scripts.


File: sed.info, Node: uniq -d, Next: uniq -u, Prev: uniq, Up: Examples

7.19 Print Duplicated Lines of Input
====================================

This script prints only duplicated lines, like ‘uniq -d’.

     #!/usr/bin/sed -nf

     $b
     N
     /^\(.*\)\n\1$/ {
         # Print the first of the duplicated lines
         s/.*\n//
         p

         # Loop until we get a different line
         :b
         $b
         N
         /^\(.*\)\n\1$/ {
             s/.*\n//
             bb
         }
     }

     # The last line cannot be followed by duplicates
     $b

     # Found a different one. Leave it alone in the pattern space
     # and go back to the top, hunting its duplicates
     D


File: sed.info, Node: uniq -u, Next: cat -s, Prev: uniq -d, Up: Examples

7.20 Remove All Duplicated Lines
================================

This script prints only unique lines, like ‘uniq -u’.

     #!/usr/bin/sed -f

     # Search for a duplicate line --- until that, print what you find.
     $b
     N
     /^\(.*\)\n\1$/ ! {
         P
         D
     }

     :c
     # Got two equal lines in pattern space. At the
     # end of the file we simply exit
     $d

     # Else, we keep reading lines with N until we
     # find a different one
     s/.*\n//
     N
     /^\(.*\)\n\1$/ {
         bc
     }

     # Remove the last instance of the duplicate line
     # and go back to the top
     D


File: sed.info, Node: cat -s, Prev: uniq -u, Up: Examples

7.21 Squeezing Blank Lines
==========================

As a final example, here are three scripts, of increasing complexity and
speed, that implement the same function as ‘cat -s’, that is squeezing
blank lines.

   The first leaves a blank line at the beginning and end if there are
some already.

     #!/usr/bin/sed -f

     # on empty lines, join with next
     # Note there is a star in the regexp
     :x
     /^\n*$/ {
     N
     bx
     }

     # now, squeeze all '\n', this can be also done by:
     # s/^\(\n\)*/\1/
     s/\n*/\
     /

   This one is a bit more complex and removes all empty lines at the
beginning.
It does leave a single blank line at end if one was there.

     #!/usr/bin/sed -f

     # delete all leading empty lines
     1,/^./{
     /./!d
     }

     # on an empty line we remove it and all the following
     # empty lines, but one
     :x
     /./!{
     N
     s/^\n$//
     tx
     }

   This removes leading and trailing blank lines. It is also the
fastest. Note that loops are completely done with ‘n’ and ‘b’, without
relying on ‘sed’ to restart the script automatically at the end of a
line.

     #!/usr/bin/sed -nf

     # delete all (leading) blanks
     /./!d

     # get here: so there is a non empty
     :x
     # print it
     p
     # get next
     n
     # got chars? print it again, etc...
     /./bx

     # no, don't have chars: got an empty line
     :z
     # get next, if last line we finish here so no trailing
     # empty lines are written
     n
     # also empty? then ignore it, and get next... this will
     # remove ALL empty lines
     /./!bz

     # all empty lines were deleted/ignored, but we have a non empty. As
     # what we want to do is to squeeze, insert a blank line artificially
     i\

     bx


File: sed.info, Node: Limitations, Next: Other Resources, Prev: Examples, Up: Top

8 GNU ‘sed’’s Limitations and Non-limitations
*********************************************

For those who want to write portable ‘sed’ scripts, be aware that some
implementations have been known to limit line lengths (for the pattern
and hold spaces) to be no more than 4000 bytes. The POSIX standard
specifies that conforming ‘sed’ implementations shall support at least
8192 byte line lengths. GNU ‘sed’ has no built-in limit on line length;
as long as it can ‘malloc()’ more (virtual) memory, you can feed or
construct lines as long as you like.

   However, recursion is used to handle subpatterns and indefinite
repetition.
This means that the available stack space may limit the
size of the buffer that can be processed by certain patterns.


File: sed.info, Node: Other Resources, Next: Reporting Bugs, Prev: Limitations, Up: Top

9 Other Resources for Learning About ‘sed’
******************************************

For up-to-date information about GNU ‘sed’ please visit
<https://www.gnu.org/software/sed/>.

   Send general questions and suggestions to <sed-devel@gnu.org>. Visit
the mailing list archives for past discussions at
<https://lists.gnu.org/archive/html/sed-devel/>.

   The following resources provide information about ‘sed’ (both GNU
‘sed’ and other variations). Note that these are not maintained by GNU
‘sed’ developers.

   • sed ‘$HOME’: <http://sed.sf.net>

   • sed FAQ: <http://sed.sf.net/sedfaq.html>

   • seder’s grabbag: <http://sed.sf.net/grabbag>

   • The ‘sed-users’ mailing list maintained by Sven Guckes:
     <http://groups.yahoo.com/group/sed-users/> (note this is _not_ the
     GNU ‘sed’ mailing list).


File: sed.info, Node: Reporting Bugs, Next: GNU Free Documentation License, Prev: Other Resources, Up: Top

10 Reporting Bugs
*****************

Email bug reports to <bug-sed@gnu.org>. Also, please include the output
of ‘sed --version’ in the body of your report if at all possible.

   Please do not send a bug report like this:

     while building frobme-1.3.4
     $ configure
     error→ sed: file sedscr line 1: Unknown option to 's'

   If GNU ‘sed’ doesn’t configure your favorite package, take a few
extra minutes to identify the specific problem and make a stand-alone
test case. Unlike other programs such as C compilers, making such test
cases for ‘sed’ is quite simple.
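
   As a rough illustration, a stand-alone report can be as small as the
sketch below. The input data and the substitution are invented for
illustration and do not describe a real bug; the point is only that the
version, the data and the exact invocation fit in a few lines:

```shell
# Hypothetical minimal report: which sed, which input, which command.
sed --version | head -n 1
printf 'foo\nbar\n' | sed 's/o/0/g'
```

   Pasting these commands together with the output actually observed
and the output expected is usually enough for someone else to reproduce
the problem.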

   A stand-alone test case includes all the data necessary to perform
the test, and the specific invocation of ‘sed’ that causes the problem.
The smaller a stand-alone test case is, the better. A test case should
not involve something as far removed from ‘sed’ as “try to configure
frobme-1.3.4”. Yes, that is in principle enough information to look for
the bug, but that is not a very practical prospect.

   Here are a few commonly reported bugs that are not bugs.

‘N’ command on the last line

     Most versions of ‘sed’ exit without printing anything when the ‘N’
     command is issued on the last line of a file. GNU ‘sed’ prints
     pattern space before exiting unless of course the ‘-n’ command
     switch has been specified. This choice is by design.

     Default behavior (GNU extension, non-POSIX conforming):
          $ seq 3 | sed N
          1
          2
          3
     To force POSIX-conforming behavior:
          $ seq 3 | sed --posix N
          1
          2

     For example, the behavior of
          sed N foo bar
     would depend on whether foo has an even or an odd number of
     lines(1). Or, when writing a script to read the next few lines
     following a pattern match, traditional implementations of ‘sed’
     would force you to write something like
          /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }
     instead of just
          /foo/{ N;N;N;N;N;N;N;N;N; }

     In any case, the simplest workaround is to use ‘$d;N’ in scripts
     that rely on the traditional behavior, or to set the
     ‘POSIXLY_CORRECT’ variable to a non-empty value.

Regex syntax clashes (problems with backslashes)
     ‘sed’ uses the POSIX basic regular expression syntax. According to
     the standard, the meaning of some escape sequences is undefined in
     this syntax; notable in the case of ‘sed’ are ‘\|’, ‘\+’, ‘\?’,
     ‘\`’, ‘\'’, ‘\<’, ‘\>’, ‘\b’, ‘\B’, ‘\w’, and ‘\W’.

     As in all GNU programs that use POSIX basic regular expressions,
     ‘sed’ interprets these escape sequences as special characters. So,
     ‘x\+’ matches one or more occurrences of ‘x’. ‘abc\|def’ matches
     either ‘abc’ or ‘def’.

     This syntax may cause problems when running scripts written for
     other ‘sed’s. Some ‘sed’ programs have been written with the
     assumption that ‘\|’ and ‘\+’ match the literal characters ‘|’ and
     ‘+’. Such scripts must be modified by removing the spurious
     backslashes if they are to be used with modern implementations of
     ‘sed’, like GNU ‘sed’.

     On the other hand, some scripts use s|abc\|def||g to remove
     occurrences of _either_ ‘abc’ or ‘def’. While this worked until
     ‘sed’ 4.0.x, newer versions interpret this as removing the string
     ‘abc|def’. This is again undefined behavior according to POSIX,
     and this interpretation is arguably more robust: older ‘sed’s, for
     example, required that the regex matcher parsed ‘\/’ as ‘/’ in the
     common case of escaping a slash, which is again undefined behavior;
     the new behavior avoids this, and this is good because the regex
     matcher is only partially under our control.

     In addition, this version of ‘sed’ supports several escape
     characters (some of which are multi-character) to insert
     non-printable characters in scripts (‘\a’, ‘\c’, ‘\d’, ‘\o’, ‘\r’,
     ‘\t’, ‘\v’, ‘\x’). These can cause similar problems with scripts
     written for other ‘sed’s.

‘-i’ clobbers read-only files

     In short, ‘sed -i’ will let you delete the contents of a read-only
     file, and in general the ‘-i’ option (*note Invocation: Invoking
     sed.) lets you clobber protected files. This is not a bug, but
     rather a consequence of how the Unix file system works.

     The permissions on a file say what can happen to the data in that
     file, while the permissions on a directory say what can happen to
     the list of files in that directory. ‘sed -i’ will not ever open
     for writing a file that is already on disk. Rather, it will work
     on a temporary file that is finally renamed to the original name:
     if you rename or delete files, you’re actually modifying the
     contents of the directory, so the operation depends on the
     permissions of the directory, not of the file. For this same
     reason, ‘sed’ does not let you use ‘-i’ on a writable file in a
     read-only directory, and will break hard or symbolic links when
     ‘-i’ is used on such a file.

‘0a’ does not work (gives an error)

     There is no line 0. 0 is a special address that is only used to
     treat addresses like ‘0,/RE/’ as active when the script starts: if
     you write ‘1,/abc/d’ and the first line includes the string ‘abc’,
     then that match would be ignored because address ranges must span
     at least two lines (barring the end of the file); but what you
     probably wanted is to delete every line up to the first one
     including ‘abc’, and this is obtained with ‘0,/abc/d’.

‘[a-z]’ is case insensitive

     You are encountering problems with locales. POSIX mandates that
     ‘[a-z]’ uses the current locale’s collation order; in C parlance,
     that means using ‘strcoll(3)’ instead of ‘strcmp(3)’. Some locales
     have a case-insensitive collation order, others don’t.

     Another problem is that ‘[a-z]’ tries to use collation symbols.
     This only happens if you are on the GNU system, using GNU libc’s
     regular expression matcher instead of compiling the one supplied
     with GNU sed.
     In a Danish locale, for example, the regular
     expression ‘^[a-z]$’ matches the string ‘aa’, because this is a
     single collating symbol that comes after ‘a’ and before ‘b’; ‘ll’
     behaves similarly in Spanish locales, or ‘ij’ in Dutch locales.

     To work around these problems, which may cause bugs in shell
     scripts, set the ‘LC_COLLATE’ and ‘LC_CTYPE’ environment variables
     to ‘C’.

‘s/.*//’ does not clear pattern space

     This happens if your input stream includes invalid multibyte
     sequences. POSIX mandates that such sequences are _not_ matched by
     ‘.’, so that ‘s/.*//’ will not clear pattern space as you would
     expect. In fact, there is no way to clear sed’s buffers in the
     middle of the script in most multibyte locales (including UTF-8
     locales). For this reason, GNU ‘sed’ provides a ‘z’ command (for
     “zap”) as an extension.

     To work around these problems, which may cause bugs in shell
     scripts, set the ‘LC_COLLATE’ and ‘LC_CTYPE’ environment variables
     to ‘C’.

   ---------- Footnotes ----------

   (1) which is the actual “bug” that prompted the change in behavior


File: sed.info, Node: GNU Free Documentation License, Next: Concept Index, Prev: Reporting Bugs, Up: Top

Appendix A GNU Free Documentation License
*****************************************

                     Version 1.3, 3 November 2008

     Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
     <https://fsf.org/>

     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.

  0. PREAMBLE

     The purpose of this License is to make a manual, textbook, or other
     functional and useful document “free” in the sense of freedom: to
     assure everyone the effective freedom to copy and redistribute it,
     with or without modifying it, either commercially or
     noncommercially. Secondarily, this License preserves for the
     author and publisher a way to get credit for their work, while not
     being considered responsible for modifications made by others.

     This License is a kind of “copyleft”, which means that derivative
     works of the document must themselves be free in the same sense.
     It complements the GNU General Public License, which is a copyleft
     license designed for free software.

     We have designed this License in order to use it for manuals for
     free software, because free software needs free documentation: a
     free program should come with manuals providing the same freedoms
     that the software does. But this License is not limited to
     software manuals; it can be used for any textual work, regardless
     of subject matter or whether it is published as a printed book. We
     recommend this License principally for works whose purpose is
     instruction or reference.

  1. APPLICABILITY AND DEFINITIONS

     This License applies to any manual or other work, in any medium,
     that contains a notice placed by the copyright holder saying it can
     be distributed under the terms of this License. Such a notice
     grants a world-wide, royalty-free license, unlimited in duration,
     to use that work under the conditions stated herein. The
     “Document”, below, refers to any such manual or work. Any member
     of the public is a licensee, and is addressed as “you”. You accept
     the license if you copy, modify or distribute the work in a way
     requiring permission under copyright law.

     A “Modified Version” of the Document means any work containing the
     Document or a portion of it, either copied verbatim, or with
     modifications and/or translated into another language.

     A “Secondary Section” is a named appendix or a front-matter section
     of the Document that deals exclusively with the relationship of the
     publishers or authors of the Document to the Document’s overall
     subject (or to related matters) and contains nothing that could
     fall directly within that overall subject. (Thus, if the Document
     is in part a textbook of mathematics, a Secondary Section may not
     explain any mathematics.) The relationship could be a matter of
     historical connection with the subject or with related matters, or
     of legal, commercial, philosophical, ethical or political position
     regarding them.

     The “Invariant Sections” are certain Secondary Sections whose
     titles are designated, as being those of Invariant Sections, in the
     notice that says that the Document is released under this License.
     If a section does not fit the above definition of Secondary then it
     is not allowed to be designated as Invariant. The Document may
     contain zero Invariant Sections. If the Document does not identify
     any Invariant Sections then there are none.

     The “Cover Texts” are certain short passages of text that are
     listed, as Front-Cover Texts or Back-Cover Texts, in the notice
     that says that the Document is released under this License. A
     Front-Cover Text may be at most 5 words, and a Back-Cover Text may
     be at most 25 words.
4386 4387 A "Transparent" copy of the Document means a machine-readable copy, 4388 represented in a format whose specification is available to the 4389 general public, that is suitable for revising the document 4390 straightforwardly with generic text editors or (for images composed 4391 of pixels) generic paint programs or (for drawings) some widely 4392 available drawing editor, and that is suitable for input to text 4393 formatters or for automatic translation to a variety of formats 4394 suitable for input to text formatters. A copy made in an otherwise 4395 Transparent file format whose markup, or absence of markup, has 4396 been arranged to thwart or discourage subsequent modification by 4397 readers is not Transparent. An image format is not Transparent if 4398 used for any substantial amount of text. A copy that is not 4399 "Transparent" is called "Opaque". 4400 4401 Examples of suitable formats for Transparent copies include plain 4402 ASCII without markup, Texinfo input format, LaTeX input format, 4403 SGML or XML using a publicly available DTD, and standard-conforming 4404 simple HTML, PostScript or PDF designed for human modification. 4405 Examples of transparent image formats include PNG, XCF and JPG. 4406 Opaque formats include proprietary formats that can be read and 4407 edited only by proprietary word processors, SGML or XML for which 4408 the DTD and/or processing tools are not generally available, and 4409 the machine-generated HTML, PostScript or PDF produced by some word 4410 processors for output purposes only. 4411 4412 The "Title Page" means, for a printed book, the title page itself, 4413 plus such following pages as are needed to hold, legibly, the 4414 material this License requires to appear in the title page. For 4415 works in formats which do not have any title page as such, "Title 4416 Page" means the text near the most prominent appearance of the 4417 work's title, preceding the beginning of the body of the text. 
4418 4419 The "publisher" means any person or entity that distributes copies 4420 of the Document to the public. 4421 4422 A section "Entitled XYZ" means a named subunit of the Document 4423 whose title either is precisely XYZ or contains XYZ in parentheses 4424 following text that translates XYZ in another language. (Here XYZ 4425 stands for a specific section name mentioned below, such as 4426 "Acknowledgements", "Dedications", "Endorsements", or "History".) 4427 To "Preserve the Title" of such a section when you modify the 4428 Document means that it remains a section "Entitled XYZ" according 4429 to this definition. 4430 4431 The Document may include Warranty Disclaimers next to the notice 4432 which states that this License applies to the Document. These 4433 Warranty Disclaimers are considered to be included by reference in 4434 this License, but only as regards disclaiming warranties: any other 4435 implication that these Warranty Disclaimers may have is void and 4436 has no effect on the meaning of this License. 4437 4438 2. VERBATIM COPYING 4439 4440 You may copy and distribute the Document in any medium, either 4441 commercially or noncommercially, provided that this License, the 4442 copyright notices, and the license notice saying this License 4443 applies to the Document are reproduced in all copies, and that you 4444 add no other conditions whatsoever to those of this License. You 4445 may not use technical measures to obstruct or control the reading 4446 or further copying of the copies you make or distribute. However, 4447 you may accept compensation in exchange for copies. If you 4448 distribute a large enough number of copies you must also follow the 4449 conditions in section 3. 4450 4451 You may also lend copies, under the same conditions stated above, 4452 and you may publicly display copies. 4453 4454 3. 
COPYING IN QUANTITY 4455 4456 If you publish printed copies (or copies in media that commonly 4457 have printed covers) of the Document, numbering more than 100, and 4458 the Document's license notice requires Cover Texts, you must 4459 enclose the copies in covers that carry, clearly and legibly, all 4460 these Cover Texts: Front-Cover Texts on the front cover, and 4461 Back-Cover Texts on the back cover. Both covers must also clearly 4462 and legibly identify you as the publisher of these copies. The 4463 front cover must present the full title with all words of the title 4464 equally prominent and visible. You may add other material on the 4465 covers in addition. Copying with changes limited to the covers, as 4466 long as they preserve the title of the Document and satisfy these 4467 conditions, can be treated as verbatim copying in other respects. 4468 4469 If the required texts for either cover are too voluminous to fit 4470 legibly, you should put the first ones listed (as many as fit 4471 reasonably) on the actual cover, and continue the rest onto 4472 adjacent pages. 4473 4474 If you publish or distribute Opaque copies of the Document 4475 numbering more than 100, you must either include a machine-readable 4476 Transparent copy along with each Opaque copy, or state in or with 4477 each Opaque copy a computer-network location from which the general 4478 network-using public has access to download using public-standard 4479 network protocols a complete Transparent copy of the Document, free 4480 of added material. If you use the latter option, you must take 4481 reasonably prudent steps, when you begin distribution of Opaque 4482 copies in quantity, to ensure that this Transparent copy will 4483 remain thus accessible at the stated location until at least one 4484 year after the last time you distribute an Opaque copy (directly or 4485 through your agents or retailers) of that edition to the public. 
4486 4487 It is requested, but not required, that you contact the authors of 4488 the Document well before redistributing any large number of copies, 4489 to give them a chance to provide you with an updated version of the 4490 Document. 4491 4492 4. MODIFICATIONS 4493 4494 You may copy and distribute a Modified Version of the Document 4495 under the conditions of sections 2 and 3 above, provided that you 4496 release the Modified Version under precisely this License, with the 4497 Modified Version filling the role of the Document, thus licensing 4498 distribution and modification of the Modified Version to whoever 4499 possesses a copy of it. In addition, you must do these things in 4500 the Modified Version: 4501 4502 A. Use in the Title Page (and on the covers, if any) a title 4503 distinct from that of the Document, and from those of previous 4504 versions (which should, if there were any, be listed in the 4505 History section of the Document). You may use the same title 4506 as a previous version if the original publisher of that 4507 version gives permission. 4508 4509 B. List on the Title Page, as authors, one or more persons or 4510 entities responsible for authorship of the modifications in 4511 the Modified Version, together with at least five of the 4512 principal authors of the Document (all of its principal 4513 authors, if it has fewer than five), unless they release you 4514 from this requirement. 4515 4516 C. State on the Title page the name of the publisher of the 4517 Modified Version, as the publisher. 4518 4519 D. Preserve all the copyright notices of the Document. 4520 4521 E. Add an appropriate copyright notice for your modifications 4522 adjacent to the other copyright notices. 4523 4524 F. Include, immediately after the copyright notices, a license 4525 notice giving the public permission to use the Modified 4526 Version under the terms of this License, in the form shown in 4527 the Addendum below. 4528 4529 G. 
Preserve in that license notice the full lists of Invariant 4530 Sections and required Cover Texts given in the Document's 4531 license notice. 4532 4533 H. Include an unaltered copy of this License. 4534 4535 I. Preserve the section Entitled "History", Preserve its Title, 4536 and add to it an item stating at least the title, year, new 4537 authors, and publisher of the Modified Version as given on the 4538 Title Page. If there is no section Entitled "History" in the 4539 Document, create one stating the title, year, authors, and 4540 publisher of the Document as given on its Title Page, then add 4541 an item describing the Modified Version as stated in the 4542 previous sentence. 4543 4544 J. Preserve the network location, if any, given in the Document 4545 for public access to a Transparent copy of the Document, and 4546 likewise the network locations given in the Document for 4547 previous versions it was based on. These may be placed in the 4548 "History" section. You may omit a network location for a work 4549 that was published at least four years before the Document 4550 itself, or if the original publisher of the version it refers 4551 to gives permission. 4552 4553 K. For any section Entitled "Acknowledgements" or "Dedications", 4554 Preserve the Title of the section, and preserve in the section 4555 all the substance and tone of each of the contributor 4556 acknowledgements and/or dedications given therein. 4557 4558 L. Preserve all the Invariant Sections of the Document, unaltered 4559 in their text and in their titles. Section numbers or the 4560 equivalent are not considered part of the section titles. 4561 4562 M. Delete any section Entitled "Endorsements". Such a section 4563 may not be included in the Modified Version. 4564 4565 N. Do not retitle any existing section to be Entitled 4566 "Endorsements" or to conflict in title with any Invariant 4567 Section. 4568 4569 O. Preserve any Warranty Disclaimers. 
4570 4571 If the Modified Version includes new front-matter sections or 4572 appendices that qualify as Secondary Sections and contain no 4573 material copied from the Document, you may at your option designate 4574 some or all of these sections as invariant. To do this, add their 4575 titles to the list of Invariant Sections in the Modified Version's 4576 license notice. These titles must be distinct from any other 4577 section titles. 4578 4579 You may add a section Entitled "Endorsements", provided it contains 4580 nothing but endorsements of your Modified Version by various 4581 parties--for example, statements of peer review or that the text has 4582 been approved by an organization as the authoritative definition of 4583 a standard. 4584 4585 You may add a passage of up to five words as a Front-Cover Text, 4586 and a passage of up to 25 words as a Back-Cover Text, to the end of 4587 the list of Cover Texts in the Modified Version. Only one passage 4588 of Front-Cover Text and one of Back-Cover Text may be added by (or 4589 through arrangements made by) any one entity. If the Document 4590 already includes a cover text for the same cover, previously added 4591 by you or by arrangement made by the same entity you are acting on 4592 behalf of, you may not add another; but you may replace the old 4593 one, on explicit permission from the previous publisher that added 4594 the old one. 4595 4596 The author(s) and publisher(s) of the Document do not by this 4597 License give permission to use their names for publicity for or to 4598 assert or imply endorsement of any Modified Version. 4599 4600 5. 
COMBINING DOCUMENTS 4601 4602 You may combine the Document with other documents released under 4603 this License, under the terms defined in section 4 above for 4604 modified versions, provided that you include in the combination all 4605 of the Invariant Sections of all of the original documents, 4606 unmodified, and list them all as Invariant Sections of your 4607 combined work in its license notice, and that you preserve all 4608 their Warranty Disclaimers. 4609 4610 The combined work need only contain one copy of this License, and 4611 multiple identical Invariant Sections may be replaced with a single 4612 copy. If there are multiple Invariant Sections with the same name 4613 but different contents, make the title of each such section unique 4614 by adding at the end of it, in parentheses, the name of the 4615 original author or publisher of that section if known, or else a 4616 unique number. Make the same adjustment to the section titles in 4617 the list of Invariant Sections in the license notice of the 4618 combined work. 4619 4620 In the combination, you must combine any sections Entitled 4621 "History" in the various original documents, forming one section 4622 Entitled "History"; likewise combine any sections Entitled 4623 "Acknowledgements", and any sections Entitled "Dedications". You 4624 must delete all sections Entitled "Endorsements." 4625 4626 6. COLLECTIONS OF DOCUMENTS 4627 4628 You may make a collection consisting of the Document and other 4629 documents released under this License, and replace the individual 4630 copies of this License in the various documents with a single copy 4631 that is included in the collection, provided that you follow the 4632 rules of this License for verbatim copying of each of the documents 4633 in all other respects. 
4634 4635 You may extract a single document from such a collection, and 4636 distribute it individually under this License, provided you insert 4637 a copy of this License into the extracted document, and follow this 4638 License in all other respects regarding verbatim copying of that 4639 document. 4640 4641 7. AGGREGATION WITH INDEPENDENT WORKS 4642 4643 A compilation of the Document or its derivatives with other 4644 separate and independent documents or works, in or on a volume of a 4645 storage or distribution medium, is called an "aggregate" if the 4646 copyright resulting from the compilation is not used to limit the 4647 legal rights of the compilation's users beyond what the individual 4648 works permit. When the Document is included in an aggregate, this 4649 License does not apply to the other works in the aggregate which 4650 are not themselves derivative works of the Document. 4651 4652 If the Cover Text requirement of section 3 is applicable to these 4653 copies of the Document, then if the Document is less than one half 4654 of the entire aggregate, the Document's Cover Texts may be placed 4655 on covers that bracket the Document within the aggregate, or the 4656 electronic equivalent of covers if the Document is in electronic 4657 form. Otherwise they must appear on printed covers that bracket 4658 the whole aggregate. 4659 4660 8. TRANSLATION 4661 4662 Translation is considered a kind of modification, so you may 4663 distribute translations of the Document under the terms of section 4664 4. Replacing Invariant Sections with translations requires special 4665 permission from their copyright holders, but you may include 4666 translations of some or all Invariant Sections in addition to the 4667 original versions of these Invariant Sections. 
You may include a 4668 translation of this License, and all the license notices in the 4669 Document, and any Warranty Disclaimers, provided that you also 4670 include the original English version of this License and the 4671 original versions of those notices and disclaimers. In case of a 4672 disagreement between the translation and the original version of 4673 this License or a notice or disclaimer, the original version will 4674 prevail. 4675 4676 If a section in the Document is Entitled "Acknowledgements", 4677 "Dedications", or "History", the requirement (section 4) to 4678 Preserve its Title (section 1) will typically require changing the 4679 actual title. 4680 4681 9. TERMINATION 4682 4683 You may not copy, modify, sublicense, or distribute the Document 4684 except as expressly provided under this License. Any attempt 4685 otherwise to copy, modify, sublicense, or distribute it is void, 4686 and will automatically terminate your rights under this License. 4687 4688 However, if you cease all violation of this License, then your 4689 license from a particular copyright holder is reinstated (a) 4690 provisionally, unless and until the copyright holder explicitly and 4691 finally terminates your license, and (b) permanently, if the 4692 copyright holder fails to notify you of the violation by some 4693 reasonable means prior to 60 days after the cessation. 4694 4695 Moreover, your license from a particular copyright holder is 4696 reinstated permanently if the copyright holder notifies you of the 4697 violation by some reasonable means, this is the first time you have 4698 received notice of violation of this License (for any work) from 4699 that copyright holder, and you cure the violation prior to 30 days 4700 after your receipt of the notice. 4701 4702 Termination of your rights under this section does not terminate 4703 the licenses of parties who have received copies or rights from you 4704 under this License. 
If your rights have been terminated and not 4705 permanently reinstated, receipt of a copy of some or all of the 4706 same material does not give you any rights to use it. 4707 4708 10. FUTURE REVISIONS OF THIS LICENSE 4709 4710 The Free Software Foundation may publish new, revised versions of 4711 the GNU Free Documentation License from time to time. Such new 4712 versions will be similar in spirit to the present version, but may 4713 differ in detail to address new problems or concerns. See 4714 <https://www.gnu.org/licenses/>. 4715 4716 Each version of the License is given a distinguishing version 4717 number. If the Document specifies that a particular numbered 4718 version of this License "or any later version" applies to it, you 4719 have the option of following the terms and conditions either of 4720 that specified version or of any later version that has been 4721 published (not as a draft) by the Free Software Foundation. If the 4722 Document does not specify a version number of this License, you may 4723 choose any version ever published (not as a draft) by the Free 4724 Software Foundation. If the Document specifies that a proxy can 4725 decide which future versions of this License can be used, that 4726 proxy's public statement of acceptance of a version permanently 4727 authorizes you to choose that version for the Document. 4728 4729 11. RELICENSING 4730 4731 "Massive Multiauthor Collaboration Site" (or "MMC Site") means any 4732 World Wide Web server that publishes copyrightable works and also 4733 provides prominent facilities for anybody to edit those works. A 4734 public wiki that anybody can edit is an example of such a server. 4735 A "Massive Multiauthor Collaboration" (or "MMC") contained in the 4736 site means any set of copyrightable works thus published on the MMC 4737 site. 
4738 4739 "CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 4740 license published by Creative Commons Corporation, a not-for-profit 4741 corporation with a principal place of business in San Francisco, 4742 California, as well as future copyleft versions of that license 4743 published by that same organization. 4744 4745 "Incorporate" means to publish or republish a Document, in whole or 4746 in part, as part of another Document. 4747 4748 An MMC is "eligible for relicensing" if it is licensed under this 4749 License, and if all works that were first published under this 4750 License somewhere other than this MMC, and subsequently 4751 incorporated in whole or in part into the MMC, (1) had no cover 4752 texts or invariant sections, and (2) were thus incorporated prior 4753 to November 1, 2008. 4754 4755 The operator of an MMC Site may republish an MMC contained in the 4756 site under CC-BY-SA on the same site at any time before August 1, 4757 2009, provided the MMC is eligible for relicensing. 4758 4759 ADDENDUM: How to use this License for your documents 4760 ==================================================== 4761 4762 To use this License in a document you have written, include a copy of 4763 the License in the document and put the following copyright and license 4764 notices just after the title page: 4765 4766 Copyright (C) YEAR YOUR NAME. 4767 Permission is granted to copy, distribute and/or modify this document 4768 under the terms of the GNU Free Documentation License, Version 1.3 4769 or any later version published by the Free Software Foundation; 4770 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover 4771 Texts. A copy of the license is included in the section entitled ``GNU 4772 Free Documentation License''. 
4773 4774 If you have Invariant Sections, Front-Cover Texts and Back-Cover 4775 Texts, replace the "with...Texts." line with this: 4776 4777 with the Invariant Sections being LIST THEIR TITLES, with 4778 the Front-Cover Texts being LIST, and with the Back-Cover Texts 4779 being LIST. 4780 4781 If you have Invariant Sections without Cover Texts, or some other 4782 combination of the three, merge those two alternatives to suit the 4783 situation. 4784 4785 If your document contains nontrivial examples of program code, we 4786 recommend releasing these examples in parallel under your choice of free 4787 software license, such as the GNU General Public License, to permit 4788 their use in free software. 4789 4790 4791 File: sed.info, Node: Concept Index, Next: Command and Option Index, Prev: GNU Free Documentation License, Up: Top 4792 4793 Concept Index 4794 ************* 4795 4796 This is a general index of all issues discussed in this manual, with the 4797 exception of the 'sed' commands and command-line options. 4798 4799 [index] 4800 * Menu: 4801 4802 * -e, example: Overview. (line 46) 4803 * -e, example <1>: sed script overview. (line 37) 4804 * --expression, example: Overview. (line 46) 4805 * -f, example: Overview. (line 46) 4806 * -f, example <1>: sed script overview. (line 37) 4807 * --file, example: Overview. (line 46) 4808 * -i, example: Overview. (line 26) 4809 * -n, example: Overview. (line 33) 4810 * -s, example: Overview. (line 40) 4811 * 0 address: Reporting Bugs. (line 114) 4812 * ;, command separator: sed script overview. (line 37) 4813 * a, and semicolons: sed script overview. (line 56) 4814 * Additional reading about sed: Other Resources. (line 13) 4815 * ADDR1,+N: Range Addresses. (line 31) 4816 * ADDR1,~N: Range Addresses. (line 31) 4817 * address range, example: sed script overview. (line 23) 4818 * Address, as a regular expression: Regexp Addresses. (line 13) 4819 * Address, last line: Numeric Addresses. 
(line 13) 4820 * Address, numeric: Numeric Addresses. (line 8) 4821 * addresses, excluding: Addresses overview. (line 33) 4822 * Addresses, in sed scripts: Numeric Addresses. (line 6) 4823 * addresses, negating: Addresses overview. (line 33) 4824 * addresses, numeric: Addresses overview. (line 6) 4825 * addresses, range: Addresses overview. (line 26) 4826 * addresses, regular expression: Addresses overview. (line 20) 4827 * addresses, syntax: sed script overview. (line 13) 4828 * alphabetic characters: Character Classes and Bracket Expressions. 4829 (line 49) 4830 * alphanumeric characters: Character Classes and Bracket Expressions. 4831 (line 44) 4832 * Append hold space to pattern space: Other Commands. (line 288) 4833 * Append next input line to pattern space: Other Commands. (line 261) 4834 * Append pattern space to hold space: Other Commands. (line 280) 4835 * Appending text after a line: Other Commands. (line 45) 4836 * b, joining lines with: Branching and flow control. 4837 (line 150) 4838 * b, versus t: Branching and flow control. 4839 (line 150) 4840 * back-reference: Back-references and Subexpressions. 4841 (line 6) 4842 * Backreferences, in regular expressions: The "s" Command. (line 18) 4843 * blank characters: Character Classes and Bracket Expressions. 4844 (line 54) 4845 * bracket expression: Character Classes and Bracket Expressions. 4846 (line 6) 4847 * Branch to a label, if s/// failed: Extended Commands. (line 63) 4848 * Branch to a label, if s/// succeeded: Programming Commands. 4849 (line 22) 4850 * Branch to a label, unconditionally: Programming Commands. 4851 (line 18) 4852 * branching and n, N: Branching and flow control. 4853 (line 105) 4854 * branching, infinite loop: Branching and flow control. 4855 (line 95) 4856 * branching, joining lines: Branching and flow control. 4857 (line 150) 4858 * Buffer spaces, pattern and hold: Execution Cycle. (line 6) 4859 * Bugs, reporting: Reporting Bugs. 
(line 6) 4860 * c, and semicolons: sed script overview. (line 56) 4861 * case insensitive, regular expression: Regexp Addresses. (line 47) 4862 * Case-insensitive matching: The "s" Command. (line 117) 4863 * Caveat â #n on first line: Common Commands. (line 20) 4864 * character class: Character Classes and Bracket Expressions. 4865 (line 6) 4866 * character classes: Character Classes and Bracket Expressions. 4867 (line 43) 4868 * classes of characters: Character Classes and Bracket Expressions. 4869 (line 43) 4870 * Command groups: Common Commands. (line 91) 4871 * Comments, in scripts: Common Commands. (line 12) 4872 * Conditional branch: Programming Commands. 4873 (line 22) 4874 * Conditional branch <1>: Extended Commands. (line 63) 4875 * control characters: Character Classes and Bracket Expressions. 4876 (line 57) 4877 * Copy hold space into pattern space: Other Commands. (line 284) 4878 * Copy pattern space into hold space: Other Commands. (line 276) 4879 * cycle, restarting: Branching and flow control. 4880 (line 75) 4881 * d, example: sed script overview. (line 23) 4882 * Delete first line from pattern space: Other Commands. (line 255) 4883 * digit characters: Character Classes and Bracket Expressions. 4884 (line 62) 4885 * Disabling autoprint, from command line: Command-Line Options. 4886 (line 23) 4887 * empty regular expression: Regexp Addresses. (line 22) 4888 * Emptying pattern space: Extended Commands. (line 85) 4889 * Emptying pattern space <1>: Reporting Bugs. (line 143) 4890 * Evaluate Bourne-shell commands: Extended Commands. (line 12) 4891 * Evaluate Bourne-shell commands, after substitution: The "s" Command. 4892 (line 108) 4893 * example, address range: sed script overview. (line 23) 4894 * example, regular expression: sed script overview. (line 28) 4895 * Exchange hold space with pattern space: Other Commands. (line 292) 4896 * Excluding lines: Addresses overview. (line 33) 4897 * exit status: Exit status. 
(line 6) 4898 * exit status, example: Exit status. (line 25) 4899 * Extended regular expressions, choosing: Command-Line Options. 4900 (line 135) 4901 * Extended regular expressions, syntax: ERE syntax. (line 6) 4902 * File name, printing: Extended Commands. (line 30) 4903 * Files to be processed as input: Command-Line Options. 4904 (line 181) 4905 * Flow of control in scripts: Programming Commands. 4906 (line 11) 4907 * Global substitution: The "s" Command. (line 74) 4908 * GNU extensions, /dev/stderr file: The "s" Command. (line 101) 4909 * GNU extensions, /dev/stderr file <1>: Other Commands. (line 244) 4910 * GNU extensions, /dev/stdin file: Other Commands. (line 227) 4911 * GNU extensions, /dev/stdin file <1>: Extended Commands. (line 53) 4912 * GNU extensions, /dev/stdout file: Command-Line Options. 4913 (line 189) 4914 * GNU extensions, /dev/stdout file <1>: The "s" Command. (line 101) 4915 * GNU extensions, /dev/stdout file <2>: Other Commands. (line 244) 4916 * GNU extensions, 0 address: Range Addresses. (line 31) 4917 * GNU extensions, 0 address <1>: Reporting Bugs. (line 114) 4918 * GNU extensions, 0,ADDR2 addressing: Range Addresses. (line 31) 4919 * GNU extensions, ADDR1,+N addressing: Range Addresses. (line 31) 4920 * GNU extensions, ADDR1,~N addressing: Range Addresses. (line 31) 4921 * GNU extensions, branch if s/// failed: Extended Commands. (line 63) 4922 * GNU extensions, case modifiers in s commands: The "s" Command. 4923 (line 29) 4924 * GNU extensions, checking for their presence: Extended Commands. 4925 (line 69) 4926 * GNU extensions, debug: Command-Line Options. 4927 (line 29) 4928 * GNU extensions, disabling: Command-Line Options. 4929 (line 102) 4930 * GNU extensions, emptying pattern space: Extended Commands. (line 85) 4931 * GNU extensions, emptying pattern space <1>: Reporting Bugs. (line 143) 4932 * GNU extensions, evaluating Bourne-shell commands: The "s" Command. 
4933 (line 108) 4934 * GNU extensions, evaluating Bourne-shell commands <1>: Extended Commands. 4935 (line 12) 4936 * GNU extensions, extended regular expressions: Command-Line Options. 4937 (line 135) 4938 * GNU extensions, g and NUMBER modifier: The "s" Command. (line 80) 4939 * GNU extensions, I modifier: The "s" Command. (line 117) 4940 * GNU extensions, I modifier <1>: Regexp Addresses. (line 47) 4941 * GNU extensions, in-place editing: Command-Line Options. 4942 (line 56) 4943 * GNU extensions, in-place editing <1>: Reporting Bugs. (line 95) 4944 * GNU extensions, M modifier: The "s" Command. (line 122) 4945 * GNU extensions, M modifier <1>: Regexp Addresses. (line 75) 4946 * GNU extensions, modifiers and the empty regular expression: Regexp Addresses. 4947 (line 22) 4948 * GNU extensions, N~M addresses: Numeric Addresses. (line 18) 4949 * GNU extensions, quitting silently: Extended Commands. (line 36) 4950 * GNU extensions, R command: Extended Commands. (line 53) 4951 * GNU extensions, reading a file a line at a time: Extended Commands. 4952 (line 53) 4953 * GNU extensions, returning an exit code: Common Commands. (line 28) 4954 * GNU extensions, returning an exit code <1>: Extended Commands. 4955 (line 36) 4956 * GNU extensions, setting line length: Other Commands. (line 207) 4957 * GNU extensions, special escapes: Escapes. (line 6) 4958 * GNU extensions, special escapes <1>: Reporting Bugs. (line 88) 4959 * GNU extensions, special two-address forms: Range Addresses. (line 31) 4960 * GNU extensions, subprocesses: The "s" Command. (line 108) 4961 * GNU extensions, subprocesses <1>: Extended Commands. (line 12) 4962 * GNU extensions, to basic regular expressions: BRE syntax. (line 13) 4963 * GNU extensions, to basic regular expressions <1>: BRE syntax. 4964 (line 59) 4965 * GNU extensions, to basic regular expressions <2>: BRE syntax. 4966 (line 62) 4967 * GNU extensions, to basic regular expressions <3>: BRE syntax. 
4968 (line 77) 4969 * GNU extensions, to basic regular expressions <4>: BRE syntax. 4970 (line 87) 4971 * GNU extensions, to basic regular expressions <5>: Reporting Bugs. 4972 (line 61) 4973 * GNU extensions, two addresses supported by most commands: Other Commands. 4974 (line 61) 4975 * GNU extensions, two addresses supported by most commands <1>: Other Commands. 4976 (line 115) 4977 * GNU extensions, two addresses supported by most commands <2>: Other Commands. 4978 (line 204) 4979 * GNU extensions, two addresses supported by most commands <3>: Other Commands. 4980 (line 236) 4981 * GNU extensions, unlimited line length: Limitations. (line 6) 4982 * GNU extensions, writing first line to a file: Extended Commands. 4983 (line 80) 4984 * Goto, in scripts: Programming Commands. 4985 (line 18) 4986 * graphic characters: Character Classes and Bracket Expressions. 4987 (line 65) 4988 * Greedy regular expression matching: BRE syntax. (line 113) 4989 * Grouping commands: Common Commands. (line 91) 4990 * hexadecimal digits: Character Classes and Bracket Expressions. 4991 (line 88) 4992 * Hold space, appending from pattern space: Other Commands. (line 280) 4993 * Hold space, appending to pattern space: Other Commands. (line 288) 4994 * Hold space, copy into pattern space: Other Commands. (line 284) 4995 * Hold space, copying pattern space into: Other Commands. (line 276) 4996 * Hold space, definition: Execution Cycle. (line 6) 4997 * Hold space, exchange with pattern space: Other Commands. (line 292) 4998 * i, and semicolons: sed script overview. (line 56) 4999 * In-place editing: Reporting Bugs. (line 95) 5000 * In-place editing, activating: Command-Line Options. 5001 (line 56) 5002 * In-place editing, Perl-style backup file names: Command-Line Options. 5003 (line 67) 5004 * infinite loop, branching: Branching and flow control. 5005 (line 95) 5006 * Inserting text before a line: Other Commands. (line 104) 5007 * joining lines with branching: Branching and flow control. 
5008 (line 150) 5009 * joining quoted-printable lines: Branching and flow control. 5010 (line 150) 5011 * labels: Branching and flow control. 5012 (line 75) 5013 * Labels, in scripts: Programming Commands. 5014 (line 14) 5015 * Last line, selecting: Numeric Addresses. (line 13) 5016 * Line length, setting: Command-Line Options. 5017 (line 97) 5018 * Line length, setting <1>: Other Commands. (line 207) 5019 * Line number, printing: Other Commands. (line 194) 5020 * Line selection: Numeric Addresses. (line 6) 5021 * Line, selecting by number: Numeric Addresses. (line 8) 5022 * Line, selecting by regular expression match: Regexp Addresses. 5023 (line 13) 5024 * Line, selecting last: Numeric Addresses. (line 13) 5025 * List pattern space: Other Commands. (line 207) 5026 * lower-case letters: Character Classes and Bracket Expressions. 5027 (line 68) 5028 * Mixing g and NUMBER modifiers in the s command: The "s" Command. 5029 (line 80) 5030 * multiple files: Overview. (line 40) 5031 * multiple sed commands: sed script overview. (line 37) 5032 * n, and branching: Branching and flow control. 5033 (line 105) 5034 * N, and branching: Branching and flow control. 5035 (line 105) 5036 * named character classes: Character Classes and Bracket Expressions. 5037 (line 43) 5038 * newline, command separator: sed script overview. (line 37) 5039 * Next input line, append to pattern space: Other Commands. (line 261) 5040 * Next input line, replace pattern space with: Common Commands. 5041 (line 61) 5042 * Non-bugs, 0 address: Reporting Bugs. (line 114) 5043 * Non-bugs, in-place editing: Reporting Bugs. (line 95) 5044 * Non-bugs, localization-related: Reporting Bugs. (line 124) 5045 * Non-bugs, localization-related <1>: Reporting Bugs. (line 143) 5046 * Non-bugs, N command on the last line: Reporting Bugs. (line 30) 5047 * Non-bugs, regex syntax clashes: Reporting Bugs. (line 61) 5048 * numeric addresses: Addresses overview. 
(line 6) 5049 * numeric characters: Character Classes and Bracket Expressions. 5050 (line 62) 5051 * omitting labels: Branching and flow control. 5052 (line 75) 5053 * output: Overview. (line 26) 5054 * output, suppressing: Overview. (line 33) 5055 * p, example: Overview. (line 33) 5056 * paragraphs, processing: Multiline techniques. 5057 (line 53) 5058 * parameters, script: Overview. (line 46) 5059 * Parenthesized substrings: The "s" Command. (line 18) 5060 * Pattern space, definition: Execution Cycle. (line 6) 5061 * Portability, comments: Common Commands. (line 15) 5062 * Portability, line length limitations: Limitations. (line 6) 5063 * Portability, N command on the last line: Reporting Bugs. (line 30) 5064 * POSIXLY_CORRECT behavior, bracket expressions: Character Classes and Bracket Expressions. 5065 (line 112) 5066 * POSIXLY_CORRECT behavior, enabling: Command-Line Options. 5067 (line 105) 5068 * POSIXLY_CORRECT behavior, escapes: Escapes. (line 11) 5069 * POSIXLY_CORRECT behavior, N command: Reporting Bugs. (line 56) 5070 * Print first line from pattern space: Other Commands. (line 273) 5071 * printable characters: Character Classes and Bracket Expressions. 5072 (line 72) 5073 * Printing file name: Extended Commands. (line 30) 5074 * Printing line number: Other Commands. (line 194) 5075 * Printing text unambiguously: Other Commands. (line 207) 5076 * processing paragraphs: Multiline techniques. 5077 (line 53) 5078 * punctuation characters: Character Classes and Bracket Expressions. 5079 (line 75) 5080 * Q, example: Exit status. (line 25) 5081 * q, example: sed script overview. (line 28) 5082 * Quitting: Common Commands. (line 28) 5083 * Quitting <1>: Extended Commands. (line 36) 5084 * quoted-printable lines, joining: Branching and flow control. 5085 (line 150) 5086 * range addresses: Addresses overview. (line 26) 5087 * range expression: Character Classes and Bracket Expressions. 5088 (line 18) 5089 * Range of lines: Range Addresses. 
(line 6) 5090 * Range with start address of zero: Range Addresses. (line 31) 5091 * Read next input line: Common Commands. (line 61) 5092 * Read text from a file: Other Commands. (line 219) 5093 * Read text from a file <1>: Extended Commands. (line 53) 5094 * regex addresses and input lines: Regexp Addresses. (line 84) 5095 * regex addresses and pattern space: Regexp Addresses. (line 84) 5096 * regular expression addresses: Addresses overview. (line 20) 5097 * regular expression, example: sed script overview. (line 28) 5098 * Replace hold space with copy of pattern space: Other Commands. 5099 (line 276) 5100 * Replace pattern space with copy of hold space: Other Commands. 5101 (line 284) 5102 * Replacing all text matching regexp in a line: The "s" Command. 5103 (line 74) 5104 * Replacing only Nth match of regexp in a line: The "s" Command. 5105 (line 78) 5106 * Replacing selected lines with other text: Other Commands. (line 157) 5107 * Requiring GNU sed: Extended Commands. (line 69) 5108 * restarting a cycle: Branching and flow control. 5109 (line 75) 5110 * Sandbox mode: Command-Line Options. 5111 (line 157) 5112 * script parameter: Overview. (line 46) 5113 * Script structure: sed script overview. (line 6) 5114 * Script, from a file: Command-Line Options. 5115 (line 51) 5116 * Script, from command line: Command-Line Options. 5117 (line 46) 5118 * sed commands syntax: sed script overview. (line 13) 5119 * sed commands, multiple: sed script overview. (line 37) 5120 * sed script structure: sed script overview. (line 6) 5121 * Selecting lines to process: Numeric Addresses. (line 6) 5122 * Selecting non-matching lines: Addresses overview. (line 33) 5123 * semicolons, command separator: sed script overview. (line 37) 5124 * Several lines, selecting: Range Addresses. (line 6) 5125 * Slash character, in regular expressions: Regexp Addresses. (line 32) 5126 * space characters: Character Classes and Bracket Expressions. 
5127 (line 80) 5128 * Spaces, pattern and hold: Execution Cycle. (line 6) 5129 * Special addressing forms: Range Addresses. (line 31) 5130 * standard input: Overview. (line 18) 5131 * Standard input, processing as input: Command-Line Options. 5132 (line 183) 5133 * standard output: Overview. (line 26) 5134 * stdin: Overview. (line 18) 5135 * stdout: Overview. (line 26) 5136 * Stream editor: Introduction. (line 6) 5137 * subexpression: Back-references and Subexpressions. 5138 (line 6) 5139 * Subprocesses: The "s" Command. (line 108) 5140 * Subprocesses <1>: Extended Commands. (line 12) 5141 * Substitution of text, options: The "s" Command. (line 70) 5142 * suppressing output: Overview. (line 33) 5143 * syntax, addresses: sed script overview. (line 13) 5144 * syntax, sed commands: sed script overview. (line 13) 5145 * t, joining lines with: Branching and flow control. 5146 (line 150) 5147 * t, versus b: Branching and flow control. 5148 (line 150) 5149 * Text, appending: Other Commands. (line 45) 5150 * Text, deleting: Common Commands. (line 44) 5151 * Text, insertion: Other Commands. (line 104) 5152 * Text, printing: Common Commands. (line 52) 5153 * Text, printing after substitution: The "s" Command. (line 88) 5154 * Text, writing to a file after substitution: The "s" Command. 5155 (line 101) 5156 * Transliteration: Other Commands. (line 11) 5157 * Unbuffered I/O, choosing: Command-Line Options. 5158 (line 164) 5159 * upper-case letters: Character Classes and Bracket Expressions. 5160 (line 84) 5161 * Usage summary, printing: Command-Line Options. 5162 (line 17) 5163 * Version, printing: Command-Line Options. 5164 (line 13) 5165 * whitespace characters: Character Classes and Bracket Expressions. 5166 (line 80) 5167 * Working on separate files: Command-Line Options. 5168 (line 148) 5169 * Write first line to a file: Extended Commands. (line 80) 5170 * Write to a file: Other Commands. (line 244) 5171 * xdigit class: Character Classes and Bracket Expressions. 
5172 (line 88) 5173 * Zero Address: Zero Address. (line 6) 5174 * Zero, as range start address: Range Addresses. (line 31) 5175 5176 5177 File: sed.info, Node: Command and Option Index, Prev: Concept Index, Up: Top 5178 5179 Command and Option Index 5180 ************************ 5181 5182 This is an alphabetical list of all ‘sed’ commands and command-line 5183 options. 5184 5185 [index] 5186 * Menu: 5187 5188 * # (comments): Common Commands. (line 12) 5189 * --binary: Command-Line Options. 5190 (line 114) 5191 * --debug: Command-Line Options. 5192 (line 29) 5193 * --expression: Command-Line Options. 5194 (line 46) 5195 * --file: Command-Line Options. 5196 (line 51) 5197 * --follow-symlinks: Command-Line Options. 5198 (line 125) 5199 * --help: Command-Line Options. 5200 (line 17) 5201 * --in-place: Command-Line Options. 5202 (line 56) 5203 * --line-length: Command-Line Options. 5204 (line 97) 5205 * --null-data: Command-Line Options. 5206 (line 172) 5207 * --posix: Command-Line Options. 5208 (line 102) 5209 * --quiet: Command-Line Options. 5210 (line 23) 5211 * --regexp-extended: Command-Line Options. 5212 (line 135) 5213 * --sandbox: Command-Line Options. 5214 (line 157) 5215 * --separate: Command-Line Options. 5216 (line 148) 5217 * --silent: Command-Line Options. 5218 (line 23) 5219 * --unbuffered: Command-Line Options. 5220 (line 164) 5221 * --version: Command-Line Options. 5222 (line 13) 5223 * --zero-terminated: Command-Line Options. 5224 (line 172) 5225 * -b: Command-Line Options. 5226 (line 114) 5227 * -e: Command-Line Options. 5228 (line 46) 5229 * -E: Command-Line Options. 5230 (line 135) 5231 * -f: Command-Line Options. 5232 (line 51) 5233 * -i: Command-Line Options. 5234 (line 56) 5235 * -l: Command-Line Options. 5236 (line 97) 5237 * -n: Command-Line Options. 5238 (line 23) 5239 * -n, forcing from within a script: Common Commands. (line 20) 5240 * -r: Command-Line Options. 5241 (line 135) 5242 * -s: Command-Line Options. 
5243 (line 148) 5244 * -u: Command-Line Options. 5245 (line 164) 5246 * -z: Command-Line Options. 5247 (line 172) 5248 * : (label) command: Programming Commands. 5249 (line 14) 5250 * = (print line number) command: Other Commands. (line 194) 5251 * {} command grouping: Common Commands. (line 91) 5252 * a (append text lines) command: Other Commands. (line 45) 5253 * alnum character class: Character Classes and Bracket Expressions. 5254 (line 44) 5255 * alpha character class: Character Classes and Bracket Expressions. 5256 (line 49) 5257 * b (branch) command: Programming Commands. 5258 (line 18) 5259 * blank character class: Character Classes and Bracket Expressions. 5260 (line 54) 5261 * c (change to text lines) command: Other Commands. (line 157) 5262 * cntrl character class: Character Classes and Bracket Expressions. 5263 (line 57) 5264 * D (delete first line) command: Other Commands. (line 255) 5265 * d (delete) command: Common Commands. (line 44) 5266 * digit character class: Character Classes and Bracket Expressions. 5267 (line 62) 5268 * e (evaluate) command: Extended Commands. (line 12) 5269 * F (File name) command: Extended Commands. (line 30) 5270 * G (appending Get) command: Other Commands. (line 288) 5271 * g (get) command: Other Commands. (line 284) 5272 * graph character class: Character Classes and Bracket Expressions. 5273 (line 65) 5274 * H (append Hold) command: Other Commands. (line 280) 5275 * h (hold) command: Other Commands. (line 276) 5276 * i (insert text lines) command: Other Commands. (line 104) 5277 * l (list unambiguously) command: Other Commands. (line 207) 5278 * lower character class: Character Classes and Bracket Expressions. 5279 (line 68) 5280 * N (append Next line) command: Other Commands. (line 261) 5281 * n (next-line) command: Common Commands. (line 61) 5282 * P (print first line) command: Other Commands. (line 273) 5283 * p (print) command: Common Commands. 
(line 52) 5284 * print character class: Character Classes and Bracket Expressions. 5285 (line 72) 5286 * punct character class: Character Classes and Bracket Expressions. 5287 (line 75) 5288 * q (quit) command: Common Commands. (line 28) 5289 * Q (silent Quit) command: Extended Commands. (line 36) 5290 * r (read file) command: Other Commands. (line 219) 5291 * R (read line) command: Extended Commands. (line 53) 5292 * s command, option flags: The "s" Command. (line 70) 5293 * space character class: Character Classes and Bracket Expressions. 5294 (line 80) 5295 * T (test and branch if failed) command: Extended Commands. (line 63) 5296 * t (test and branch if successful) command: Programming Commands. 5297 (line 22) 5298 * upper character class: Character Classes and Bracket Expressions. 5299 (line 84) 5300 * v (version) command: Extended Commands. (line 69) 5301 * w (write file) command: Other Commands. (line 244) 5302 * W (write first line) command: Extended Commands. (line 80) 5303 * x (eXchange) command: Other Commands. (line 292) 5304 * xdigit character class: Character Classes and Bracket Expressions. 5305 (line 88) 5306 * y (transliterate) command: Other Commands. (line 11) 5307 * z (Zap) command: Extended Commands. 
(line 85) 5308 5309 30 5310 31 5311 Tag Table: 32 (Indirect) 33 Node: Top935 34 Node: Introduction3816 35 Node: Invoking sed4370 36 Ref: Invoking sed-Footnote-19396 37 Ref: Invoking sed-Footnote-29588 38 Node: sed Programs9691 39 Node: Execution Cycle10838 40 Ref: Execution Cycle-Footnote-112011 41 Node: Addresses12322 42 Node: Regular Expressions17061 43 Node: Common Commands24610 44 Node: The "s" Command26608 45 Ref: The "s" Command-Footnote-130940 46 Node: Other Commands31012 47 Ref: Other Commands-Footnote-136149 48 Node: Programming Commands36221 49 Node: Extended Commands37130 50 Node: Escapes40705 51 Ref: Escapes-Footnote-143711 52 Node: Examples43902 53 Node: Centering lines44997 54 Node: Increment a number45909 55 Ref: Increment a number-Footnote-147489 56 Node: Rename files to lower case47609 57 Node: Print bash environment50405 58 Node: Reverse chars of lines51185 59 Ref: Reverse chars of lines-Footnote-152202 60 Node: tac52424 61 Node: cat -n53206 62 Node: cat -b55063 63 Node: wc -c55815 64 Ref: wc -c-Footnote-157748 65 Node: wc -w57817 66 Node: wc -l59289 67 Node: head59526 68 Node: tail59850 69 Node: uniq61534 70 Node: uniq -d62330 71 Node: uniq -u63054 72 Node: cat -s63778 73 Node: Limitations65667 74 Node: Other Resources66507 75 Node: Reporting Bugs67433 76 Ref: Reporting Bugs-Footnote-173962 77 Node: Extended regexps74033 78 Node: Concept Index75200 79 Node: Command and Option Index85215 5312 Node: Top738 5313 Node: Introduction2217 5314 Node: Invoking sed2789 5315 Node: Overview3114 5316 Node: Command-Line Options5561 5317 Ref: Command-Line Options-Footnote-113530 5318 Ref: Command-Line Options-Footnote-213758 5319 Node: Exit status13861 5320 Node: sed scripts14795 5321 Node: sed script overview15394 5322 Node: sed commands list18057 5323 Node: The "s" Command23070 5324 Ref: The "s" Command-Footnote-128889 5325 Node: Common Commands28969 5326 Node: Other Commands32106 5327 Ref: insert command35324 5328 Ref: Other Commands-Footnote-141629 5329 
Node: Programming Commands41709 5330 Node: Extended Commands42649 5331 Node: Multiple commands syntax46675 5332 Node: sed addresses51217 5333 Node: Addresses overview51706 5334 Node: Numeric Addresses53705 5335 Node: Regexp Addresses55116 5336 Ref: Regexp Addresses-Footnote-159252 5337 Node: Range Addresses59392 5338 Ref: Zero Address Regex Range60294 5339 Node: Zero Address61753 5340 Node: sed regular expressions62318 5341 Node: Regular Expressions Overview63172 5342 Node: BRE vs ERE64733 5343 Node: BRE syntax66484 5344 Node: ERE syntax73304 5345 Node: Character Classes and Bracket Expressions74878 5346 Node: regexp extensions80030 5347 Node: Back-references and Subexpressions82506 5348 Node: Escapes84958 5349 Ref: Escapes-Footnote-188105 5350 Node: Locale Considerations88304 5351 Ref: Locale Considerations-Footnote-193067 5352 Node: advanced sed93239 5353 Node: Execution Cycle93606 5354 Ref: Execution Cycle-Footnote-194845 5355 Node: Hold and Pattern Buffers95162 5356 Node: Multiline techniques95350 5357 Node: Branching and flow control98704 5358 Node: Examples107029 5359 Node: Joining lines108275 5360 Node: Centering lines110082 5361 Node: Increment a number111006 5362 Ref: Increment a number-Footnote-1112495 5363 Node: Rename files to lower case112623 5364 Node: Print bash environment115418 5365 Node: Reverse chars of lines116181 5366 Ref: Reverse chars of lines-Footnote-1117224 5367 Node: Text search across multiple lines117441 5368 Node: Line length adjustment120786 5369 Node: Adding a header to multiple files122533 5370 Node: tac125986 5371 Node: cat -n126774 5372 Node: cat -b128616 5373 Node: wc -c129378 5374 Ref: wc -c-Footnote-1131316 5375 Node: wc -w131385 5376 Node: wc -l132875 5377 Node: head133128 5378 Node: tail133467 5379 Node: uniq135196 5380 Node: uniq -d136007 5381 Node: uniq -u136722 5382 Node: cat -s137435 5383 Node: Limitations139298 5384 Node: Other Resources140161 5385 Node: Reporting Bugs141104 5386 Ref: N_command_last_line142294 5387 Ref: 
Reporting Bugs-Footnote-1148805 5388 Node: GNU Free Documentation License148880 5389 Node: Concept Index174239 5390 Node: Command and Option Index201600 80 5391 81 5392 End Tag Table 5393 5394 5395 Local Variables: 5396 coding: utf-8 5397 End: -
trunk/src/sed/doc/sed.texi
r599 r3613 1 1 \input texinfo @c -*-texinfo-*- 2 @c Do not edit this file!! It is automatically generated from sed-in.texi. 3 2 @c 4 3 @c -- Stuff that needs adding: ---------------------------------------------- 5 @c (document the `;' command-separator) 4 @c (nothing!) 6 5 @c -------------------------------------------------------------------------- 7 6 @c Check for consistency: regexps in @code, text that they match in @samp. 8 @c 7 @c 9 8 @c Tips: 10 9 @c @command for command … … 36 35 @value{SSED}, a stream editor. 37 36 38 Copyright @copyright{} 1998, 1999, 2001, 2002, 2003, 2004 Free 39 Software Foundation, Inc. 40 41 This document is released under the terms of the @acronym{GNU} Free 42 Documentation License as published by the Free Software Foundation; 43 either version 1.1, or (at your option) any later version. 44 45 You should have received a copy of the @acronym{GNU} Free Documentation 46 License along with @value{SSED}; see the file @file{COPYING.DOC}. 47 If not, write to the Free Software Foundation, 59 Temple Place - Suite 48 330, Boston, MA 02110-1301, USA. 49 50 There are no Cover Texts and no Invariant Sections; this text, along 51 with its equivalent in the printed manual, constitutes the Title Page. 37 Copyright @copyright{} 1998--2022 Free Software Foundation, Inc. 38 39 @quotation 40 Permission is granted to copy, distribute and/or modify this document 41 under the terms of the GNU Free Documentation License, Version 1.3 42 or any later version published by the Free Software Foundation; 43 with no Invariant Sections, no Front-Cover Texts, and no 44 Back-Cover Texts. A copy of the license is included in the 45 section entitled ``GNU Free Documentation License''. 
46 @end quotation 52 47 @end copying 53 48 … … 55 50 56 51 @titlepage 57 @title @command{sed}, a stream editor 52 @title @value{SSED}, a stream editor 58 53 @subtitle version @value{VERSION}, @value{UPDATED} 59 @author by Ken Pizzini, Paolo Bonzini 54 @author by Ken Pizzini, Paolo Bonzini, Jim Meyering, Assaf Gordon 60 55 61 56 @page 62 57 @vskip 0pt plus 1filll 63 Copyright @copyright{} 1998, 1999 Free Software Foundation, Inc. 64 65 58 @insertcopying 66 67 Published by the Free Software Foundation, @* 68 51 Franklin Street, Fifth Floor @* 69 Boston, MA 02110-1301, USA 70 59 @end titlepage 71 60 72 61 @contents 62 63 @ifnottex 73 64 @node Top 74 @top 75 76 @ifnottex 65 @top @value{SSED} 66 77 67 @insertcopying 78 68 @end ifnottex … … 81 71 * Introduction:: Introduction 82 72 * Invoking sed:: Invocation 83 * sed Programs:: @command{sed} programs 73 * sed scripts:: @command{sed} scripts 74 * sed addresses:: Addresses: selecting lines 75 * sed regular expressions:: Regular expressions: selecting text 76 * advanced sed:: Advanced @command{sed}: cycles and buffers 84 77 * Examples:: Some sample scripts 85 78 * Limitations:: Limitations and (non-)limitations of @value{SSED} 86 79 * Other Resources:: Other resources for learning about @command{sed} 87 80 * Reporting Bugs:: Reporting bugs 88 89 * Extended regexps:: @command{egrep}-style regular expressions 90 @ifset PERL 91 * Perl regexps:: Perl-style regular expressions 92 @end ifset 93 81 * GNU Free Documentation License:: Copying and sharing this manual 94 82 * Concept Index:: A menu with all the topics in this manual. 95 83 * Command and Option Index:: A menu with all @command{sed} commands and 96 84 command-line options. 
97
98 @detailmenu
99 --- The detailed node listing ---
100
101 sed Programs:
102 * Execution Cycle:: How @command{sed} works
103 * Addresses:: Selecting lines with @command{sed}
104 * Regular Expressions:: Overview of regular expression syntax
105 * Common Commands:: Often used commands
106 * The "s" Command:: @command{sed}'s Swiss Army Knife
107 * Other Commands:: Less frequently used commands
108 * Programming Commands:: Commands for @command{sed} gurus
109 * Extended Commands:: Commands specific of @value{SSED}
110 * Escapes:: Specifying special characters
111
112 Examples:
113 * Centering lines::
114 * Increment a number::
115 * Rename files to lower case::
116 * Print bash environment::
117 * Reverse chars of lines::
118 * tac:: Reverse lines of files
119 * cat -n:: Numbering lines
120 * cat -b:: Numbering non-blank lines
121 * wc -c:: Counting chars
122 * wc -w:: Counting words
123 * wc -l:: Counting lines
124 * head:: Printing the first lines
125 * tail:: Printing the last lines
126 * uniq:: Make duplicate lines unique
127 * uniq -d:: Print duplicated lines of input
128 * uniq -u:: Remove all duplicated lines
129 * cat -s:: Squeezing blank lines
130
131 @ifset PERL
132 Perl regexps:: Perl-style regular expressions
133 * Backslash:: Introduces special sequences
134 * Circumflex/dollar sign/period:: Behave specially with regard to new lines
135 * Square brackets:: Are a bit different in strange cases
136 * Options setting:: Toggle modifiers in the middle of a regexp
137 * Non-capturing subpatterns:: Are not counted when backreferencing
138 * Repetition:: Allows for non-greedy matching
139 * Backreferences:: Allows for more than 10 back references
140 * Assertions:: Allows for complex look ahead matches
141 * Non-backtracking subpatterns:: Often gives more performance
142 * Conditional subpatterns:: Allows if/then/else branches
143 * Recursive patterns:: For example to match parentheses
144 * Comments:: Because things can get complex...
145 @end ifset
146
147 @end detailmenu
148 85 @end menu
149 86
… … 167
104 168 105 @node Invoking sed 169 @chapter Invocation 170 106 @chapter Running sed 107 108 This chapter covers how to run @command{sed}. Details of @command{sed} 109 scripts and individual @command{sed} commands are discussed in the 110 next chapter. 111 112 @menu 113 * Overview:: 114 * Command-Line Options:: 115 * Exit status:: 116 @end menu 117 118 119 @node Overview 120 @section Overview 171 121 Normally @command{sed} is invoked like this: 172 122 … … 175 125 @end example 176 126 127 For example, to change every @samp{hello} to @samp{world} 128 in the file @file{input.txt}: 129 130 @example 131 sed 's/hello/world/g' input.txt > output.txt 132 @end example 133 134 Without the @samp{g} (global) modifier, @command{sed} affects 135 only the first instance per line. 136 137 @cindex stdin 138 @cindex standard input 139 If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-}, 140 @command{sed} filters the contents of the standard input. The following 141 commands are equivalent: 142 143 @example 144 sed 's/hello/world/g' input.txt > output.txt 145 sed 's/hello/world/g' < input.txt > output.txt 146 cat input.txt | sed 's/hello/world/g' - > output.txt 147 @end example 148 149 @cindex stdout 150 @cindex output 151 @cindex standard output 152 @cindex -i, example 153 @command{sed} writes output to standard output. Use @option{-i} to edit 154 files in-place instead of printing to standard output. 155 See also the @code{W} and @code{s///w} commands for writing output to 156 other files. The following command modifies @file{file.txt} and 157 does not produce any output: 158 159 @example 160 sed -i 's/hello/world/' file.txt 161 @end example 162 163 @cindex -n, example 164 @cindex p, example 165 @cindex suppressing output 166 @cindex output, suppressing 167 By default @command{sed} prints all processed input (except input 168 that has been modified/deleted by commands such as @command{d}). 
169 Use @option{-n} to suppress output, and the @code{p} command 170 to print specific lines. The following command prints only line 45 171 of the input file: 172 173 @example 174 sed -n '45p' file.txt 175 @end example 176 177 178 179 @cindex multiple files 180 @cindex -s, example 181 @command{sed} treats multiple input files as one long stream. 182 The following example prints the first line of the first file 183 (@file{one.txt}) and the last line of the last file (@file{three.txt}). 184 Use @option{-s} to reverse this behavior. 185 186 @example 187 sed -n '1p ; $p' one.txt two.txt three.txt 188 @end example 189 190 191 @cindex -e, example 192 @cindex --expression, example 193 @cindex -f, example 194 @cindex --file, example 195 @cindex script parameter 196 @cindex parameters, script 197 Without @option{-e} or @option{-f} options, @command{sed} uses 198 the first non-option parameter as the @var{script}, and the following 199 non-option parameters as input files. 200 If @option{-e} or @option{-f} options are used to specify a @var{script}, 201 all non-option parameters are taken as input files. 202 Options @option{-e} and @option{-f} can be combined, and can appear 203 multiple times (in which case the final effective @var{script} will be 204 concatenation of all the individual @var{script}s). 205 206 The following examples are equivalent: 207 208 @example 209 sed 's/hello/world/' input.txt > output.txt 210 211 sed -e 's/hello/world/' input.txt > output.txt 212 sed --expression='s/hello/world/' input.txt > output.txt 213 214 echo 's/hello/world/' > myscript.sed 215 sed -f myscript.sed input.txt > output.txt 216 sed --file=myscript.sed input.txt > output.txt 217 @end example 218 219 220 @node Command-Line Options 221 @section Command-Line Options 222 177 223 The full format for invoking @command{sed} is: 178 224 … … 180 226 sed OPTIONS... [SCRIPT] [INPUTFILE...] 
181 227 @end example 182 183 If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-}, 184 @command{sed} filters the contents of the standard input. The @var{script} 185 is actually the first non-option parameter, which @command{sed} specially 186 considers a script and not an input file if (and only if) none of the 187 other @var{options} specifies a script to be executed, that is if neither 188 of the @option{-e} and @option{-f} options is specified. 189 228 190 229 @command{sed} may be invoked with the following command-line options: … … 212 251 @cindex Disabling autoprint, from command line 213 252 By default, @command{sed} prints out the pattern space 214 at the end of each cycle through the script. 253 at the end of each cycle through the script (@pxref{Execution Cycle, , 254 How @code{sed} works}). 215 255 These options disable this automatic printing, 216 256 and @command{sed} only produces output when explicitly told to 217 257 via the @code{p} command. 258 259 @item --debug 260 @opindex --debug 261 @cindex @value{SSEDEXT}, debug 262 Print the input sed program in canonical form, 263 and annotate program execution. 264 @codequotebacktick on 265 @codequoteundirected on 266 @example 267 $ echo 1 | sed '\%1%s21232' 268 3 269 270 $ echo 1 | sed --debug '\%1%s21232' 271 SED PROGRAM: 272 /1/ s/1/3/ 273 INPUT: 'STDIN' line 1 274 PATTERN: 1 275 COMMAND: /1/ s/1/3/ 276 PATTERN: 3 277 END-OF-CYCLE: 278 3 279 @end example 280 @codequotebacktick off 281 @codequoteundirected off 282 283 284 @item -e @var{script} 285 @itemx --expression=@var{script} 286 @opindex -e 287 @opindex --expression 288 @cindex Script, from command line 289 Add the commands in @var{script} to the set of commands to be 290 run while processing the input. 
291 292 @item -f @var{script-file} 293 @itemx --file=@var{script-file} 294 @opindex -f 295 @opindex --file 296 @cindex Script, from a file 297 Add the commands contained in the file @var{script-file} 298 to the set of commands to be run while processing the input. 218 299 219 300 @item -i[@var{SUFFIX}] … … 240 321 before renaming the temporary file, thereby making a backup 241 322 copy@footnote{Note that @value{SSED} creates the backup 242 323 file whether or not any output is actually changed.}). 243 324 244 325 @cindex In-place editing, Perl-style backup file names … … 255 336 overwritten without making a backup. 256 337 338 Because @option{-i} takes an optional argument, it should 339 not be followed by other short options: 340 @table @code 341 @item sed -Ei '...' FILE 342 Same as @option{-E -i} with no backup suffix - @file{FILE} will be 343 edited in-place without creating a backup. 344 345 @item sed -iE '...' FILE 346 This is equivalent to @option{--in-place=E}, creating @file{FILEE} as backup 347 of @file{FILE} 348 @end table 349 350 Be cautious of using @option{-n} with @option{-i}: the former disables 351 automatic printing of lines and the latter changes the file in-place 352 without a backup. Used carelessly (and without an explicit @code{p} command), 353 the output file will be empty: 354 @codequotebacktick on 355 @codequoteundirected on 356 @example 357 # WRONG USAGE: 'FILE' will be truncated. 358 sed -ni 's/foo/bar/' FILE 359 @end example 360 @codequotebacktick off 361 @codequoteundirected off 362 257 363 @item -l @var{N} 258 364 @itemx --line-length=@var{N} … … 265 371 266 372 @item --posix 373 @opindex --posix 267 374 @cindex @value{SSEDEXT}, disabling 268 @value{SSED} includes several extensions to @acronym{POSIX}375 @value{SSED} includes several extensions to POSIX 269 376 sed. 
In order to simplify writing portable scripts, this 270 377 option disables all the extensions that this manual documents, … … 272 379 @cindex @code{POSIXLY_CORRECT} behavior, enabling 273 380 Most of the extensions accept @command{sed} programs that 274 are outside the syntax mandated by @acronym{POSIX}, but some 381 are outside the syntax mandated by POSIX, but some 275 382 of them (such as the behavior of the @command{N} command 276 described in @pxref{Reporting Bugs}) actually violate the 383 described in @ref{Reporting Bugs}) actually violate the 277 384 standard. If you want to disable only the latter kind of 278 385 extension, you can set the @code{POSIXLY_CORRECT} variable 279 386 to a non-empty value. 280 387 281 @item -r 388 @item -b 389 @itemx --binary 390 @opindex -b 391 @opindex --binary 392 This option is available on every platform, but is only effective where the 393 operating system makes a distinction between text files and binary files. 394 When such a distinction is made---as is the case for MS-DOS, Windows, 395 Cygwin---text files are composed of lines separated by a carriage return 396 @emph{and} a line feed character, and @command{sed} does not see the 397 ending CR. When this option is specified, @command{sed} will open 398 input files in binary mode, thus not requesting this special processing 399 and considering lines to end at a line feed. 400 401 @item --follow-symlinks 402 @opindex --follow-symlinks 403 This option is available only on platforms that support 404 symbolic links and has an effect only if option @option{-i} 405 is specified. In this case, if the file that is specified 406 on the command line is a symbolic link, @command{sed} will 407 follow the link and edit the ultimate destination of the 408 link. The default behavior is to break the symbolic link, 409 so that the link destination will not be modified. 
410 411 @item -E 412 @itemx -r 282 413 @itemx --regexp-extended 414 @opindex -E 283 415 @opindex -r 284 416 @opindex --regexp-extended 285 417 @cindex Extended regular expressions, choosing 286 @cindex @acronym{GNU}extensions, extended regular expressions418 @cindex GNU extensions, extended regular expressions 287 419 Use extended regular expressions rather than basic 288 420 regular expressions. Extended regexps are those that 289 421 @command{egrep} accepts; they can be clearer because they 290 usually have less backslashes, but are a @acronym{GNU} extension 291 and hence scripts that use them are not portable. 292 @xref{Extended regexps, , Extended regular expressions}. 293 294 @ifset PERL 295 @item -R 296 @itemx --regexp-perl 297 @opindex -R 298 @opindex --regexp-perl 299 @cindex Perl-style regular expressions, choosing 300 @cindex @value{SSEDEXT}, Perl-style regular expressions 301 Use Perl-style regular expressions rather than basic 302 regular expressions. Perl-style regexps are extremely 303 powerful but are a @value{SSED} extension and hence scripts that 304 use it are not portable. @xref{Perl regexps, , 305 Perl-style regular expressions}. 306 @end ifset 422 usually have fewer backslashes. 423 Historically this was a GNU extension, 424 but the @option{-E} 425 extension has since been added to the POSIX standard 426 (http://austingroupbugs.net/view.php?id=528), 427 so use @option{-E} for portability. 428 GNU sed has accepted @option{-E} as an undocumented option for years, 429 and *BSD seds have accepted @option{-E} for years as well, 430 but scripts that use @option{-E} might not port to other older systems. 431 @xref{ERE syntax, , Extended regular expressions}. 432 307 433 308 434 @item -s 309 435 @itemx --separate 436 @opindex -s 437 @opindex --separate 310 438 @cindex Working on separate files 311 439 By default, @command{sed} will consider the files specified on the … … 318 446 start of each file. 
319 447 448 @item --sandbox 449 @opindex --sandbox 450 @cindex Sandbox mode 451 In sandbox mode, @code{e/w/r} commands are rejected - programs containing 452 them will be aborted without being run. Sandbox mode ensures @command{sed} 453 operates only on the input files designated on the command line, and 454 cannot run external programs. 455 456 320 457 @item -u 321 458 @itemx --unbuffered … … 328 465 output as soon as possible.) 329 466 330 @item -e @var{script} 331 @itemx --expression=@var{script} 332 @opindex -e 333 @opindex --expression 334 @cindex Script, from command line 335 Add the commands in @var{script} to the set of commands to be 336 run while processing the input. 337 338 @item -f @var{script-file} 339 @itemx --file=@var{script-file} 340 @opindex -f 341 @opindex --file 342 @cindex Script, from a file 343 Add the commands contained in the file @var{script-file} 344 to the set of commands to be run while processing the input. 345 467 @item -z 468 @itemx --null-data 469 @itemx --zero-terminated 470 @opindex -z 471 @opindex --null-data 472 @opindex --zero-terminated 473 Treat the input as a set of lines, each terminated by a zero byte 474 (the ASCII @samp{NUL} character) instead of a newline. This option can 475 be used with commands like @samp{sort -z} and @samp{find -print0} 476 to process arbitrary file names. 346 477 @end table 347 478 … … 359 490 The standard input will be processed if no file names are specified. 360 491 361 362 @node sed Programs 363 @chapter @command{sed} Programs 364 365 @cindex @command{sed} program structure 492 @node Exit status 493 @section Exit status 494 @cindex exit status 495 An exit status of zero indicates success, and a nonzero value 496 indicates failure. @value{SSED} returns the following exit status 497 error values: 498 499 @table @asis 500 @item 0 501 Successful completion. 
502 503 @item 1 504 Invalid command, invalid syntax, invalid regular expression or a 505 @value{SSED} extension command used with @option{--posix}. 506 507 @item 2 508 One or more of the input files specified on the command line could not be 509 opened (e.g. if a file is not found, or read permission is denied). 510 Processing continued with other files. 511 512 @item 4 513 An I/O error, or a serious processing error during runtime; 514 @value{SSED} aborted immediately. 515 @end table 516 517 @cindex Q, example 518 @cindex exit status, example 519 Additionally, the commands @code{q} and @code{Q} can be used to terminate 520 @command{sed} with a custom exit code value (this is a @value{SSED} extension): 521 522 @example 523 $ echo | sed 'Q42' ; echo $? 524 42 525 @end example 526 527 528 @node sed scripts 529 @chapter @command{sed} scripts 530 531 532 @menu 533 * sed script overview:: @command{sed} script overview 534 * sed commands list:: @command{sed} commands summary 535 * The "s" Command:: @command{sed}'s Swiss Army Knife 536 * Common Commands:: Often used commands 537 * Other Commands:: Less frequently used commands 538 * Programming Commands:: Commands for @command{sed} gurus 539 * Extended Commands:: Commands specific to @value{SSED} 540 * Multiple commands syntax:: Extension for easier scripting 541 @end menu 542 543 @node sed script overview 544 @section @command{sed} script overview 545 546 @cindex @command{sed} script structure 366 547 @cindex Script structure 548 367 549 A @command{sed} program consists of one or more @command{sed} commands, 368 550 passed in by one or more of the … … 371 553 options are used. 372 554 This document will refer to ``the'' @command{sed} script; 373 this is understood to mean the in-order c atenation555 this is understood to mean the in-order concatenation 374 556 of all of the @var{script}s and @var{script-file}s passed in. 
375 376 Each @code{sed} command consists of an optional address or 377 address range, followed by a one-character command name 378 and any additional command-specific code. 379 380 @menu 381 * Execution Cycle:: How @command{sed} works 382 * Addresses:: Selecting lines with @command{sed} 383 * Regular Expressions:: Overview of regular expression syntax 384 * Common Commands:: Often used commands 385 * The "s" Command:: @command{sed}'s Swiss Army Knife 386 * Other Commands:: Less frequently used commands 387 * Programming Commands:: Commands for @command{sed} gurus 388 * Extended Commands:: Commands specific of @value{SSED} 389 * Escapes:: Specifying special characters 390 @end menu 391 392 393 @node Execution Cycle 394 @section How @command{sed} Works 395 396 @cindex Buffer spaces, pattern and hold 397 @cindex Spaces, pattern and hold 398 @cindex Pattern space, definition 399 @cindex Hold space, definition 400 @command{sed} maintains two data buffers: the active @emph{pattern} space, 401 and the auxiliary @emph{hold} space. Both are initially empty. 402 403 @command{sed} operates by performing the following cycle on each 404 lines of input: first, @command{sed} reads one line from the input 405 stream, removes any trailing newline, and places it in the pattern space. 406 Then commands are executed; each command can have an address associated 407 to it: addresses are a kind of condition code, and a command is only 408 executed if the condition is verified before the command is to be 409 executed. 
410 411 When the end of the script is reached, unless the @option{-n} option 412 is in use, the contents of pattern space are printed out to the output 413 stream, adding back the trailing newline if it was removed.@footnote{Actually, 414 if @command{sed} prints a line without the terminating newline, it will 415 nevertheless print the missing newline as soon as more text is sent to 416 the same output stream, which gives the ``least expected surprise'' 417 even though it does not make commands like @samp{sed -n p} exactly 418 identical to @command{cat}.} Then the next cycle starts for the next 419 input line. 420 421 Unless special commands (like @samp{D}) are used, the pattern space is 422 deleted between two cycles. The hold space, on the other hand, keeps 423 its data between cycles (see commands @samp{h}, @samp{H}, @samp{x}, 424 @samp{g}, @samp{G} to move data between both buffers). 425 426 427 @node Addresses 428 @section Selecting lines with @command{sed} 429 @cindex Addresses, in @command{sed} scripts 430 @cindex Line selection 431 @cindex Selecting lines to process 432 433 Addresses in a @command{sed} script can be in any of the following forms: 557 @xref{Overview}. 558 559 560 @cindex @command{sed} commands syntax 561 @cindex syntax, @command{sed} commands 562 @cindex addresses, syntax 563 @cindex syntax, addresses 564 @command{sed} commands follow this syntax: 565 566 @example 567 [addr]@var{X}[options] 568 @end example 569 570 @var{X} is a single-letter @command{sed} command. 571 @c TODO: add @pxref{commands} when there is a command-list section. 572 @code{[addr]} is an optional line address. If @code{[addr]} is specified, 573 the command @var{X} will be executed only on the matched lines. 574 @code{[addr]} can be a single line number, a regular expression, 575 or a range of lines (@pxref{sed addresses}). 576 Additional @code{[options]} are used for some @command{sed} commands. 
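As a minimal, self-contained sketch of the [addr]X[options] form described above: the sample input produced by printf and the variable names range_out and regex_out are assumptions made for illustration only and do not come from the manual.

```shell
# Numeric range address: delete (d) lines 2 through 4 of a five-line input.
range_out=$(printf '1\n2\n3\n4\n5\n' | sed '2,4d')
printf '%s\n' "$range_out"

# Regular-expression address: delete only the lines that start with "foo".
regex_out=$(printf 'foo one\nbar\nfoo two\n' | sed '/^foo/d')
printf '%s\n' "$regex_out"
```

In both commands the address comes first and the one-letter command follows it; leaving the address out would apply d to every input line.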
577 578 @cindex @command{d}, example 579 @cindex address range, example 580 @cindex example, address range 581 The following example deletes lines 30 to 35 in the input. 582 @code{30,35} is an address range. @command{d} is the delete command: 583 584 @example 585 sed '30,35d' input.txt > output.txt 586 @end example 587 588 @cindex @command{q}, example 589 @cindex regular expression, example 590 @cindex example, regular expression 591 The following example prints all input until a line 592 starting with the string @samp{foo} is found. If such line is found, 593 @command{sed} will terminate with exit status 42. 594 If such line was not found (and no other error occurred), @command{sed} 595 will exit with status 0. 596 @code{/^foo/} is a regular-expression address. 597 @command{q} is the quit command. @code{42} is the command option. 598 599 @example 600 sed '/^foo/q42' input.txt > output.txt 601 @end example 602 603 604 @cindex multiple @command{sed} commands 605 @cindex @command{sed} commands, multiple 606 @cindex newline, command separator 607 @cindex semicolons, command separator 608 @cindex ;, command separator 609 @cindex -e, example 610 @cindex -f, example 611 Commands within a @var{script} or @var{script-file} can be 612 separated by semicolons (@code{;}) or newlines (ASCII 10). 613 Multiple scripts can be specified with @option{-e} or @option{-f} 614 options. 615 616 The following examples are all equivalent. 
They perform two @command{sed} 617 operations: deleting any lines matching the regular expression @code{/^foo/}, 618 and replacing all occurrences of the string @samp{hello} with @samp{world}: 619 620 @example 621 sed '/^foo/d ; s/hello/world/g' input.txt > output.txt 622 623 sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt 624 625 echo '/^foo/d' > script.sed 626 echo 's/hello/world/g' >> script.sed 627 sed -f script.sed input.txt > output.txt 628 629 echo 's/hello/world/g' > script2.sed 630 sed -e '/^foo/d' -f script2.sed input.txt > output.txt 631 @end example 632 633 634 @cindex @command{a}, and semicolons 635 @cindex @command{c}, and semicolons 636 @cindex @command{i}, and semicolons 637 Commands @command{a}, @command{c}, @command{i}, due to their syntax, 638 cannot be followed by semicolons working as command separators and 639 thus should be terminated 640 with newlines or be placed at the end of a @var{script} or @var{script-file}. 641 Commands can also be preceded with optional non-significant 642 whitespace characters. 643 @xref{Multiple commands syntax}. 644 645 646 647 @node sed commands list 648 @section @command{sed} commands summary 649 650 The following commands are supported in @value{SSED}. 651 Some are standard POSIX commands, while others are @value{SSEDEXT}. 652 Details and examples for each command are in the following sections. 653 (Mnemonics) are shown in parentheses. 654 434 655 @table @code 435 @item @var{number} 436 @cindex Address, numeric 437 @cindex Line, selecting by number 438 Specifying a line number will match only that line in the input. 439 (Note that @command{sed} counts lines continuously across all input files 440 unless @option{-i} or @option{-s} options are specified.) 441 442 @item @var{first}~@var{step} 443 @cindex @acronym{GNU} extensions, @samp{@var{n}~@var{m}} addresses 444 This @acronym{GNU} extension matches every @var{step}th line 445 starting with line @var{first}. 
446 In particular, lines will be selected when there exists 447 a non-negative @var{n} such that the current line-number equals 448 @var{first} + (@var{n} * @var{step}). 449 Thus, to select the odd-numbered lines, 450 one would use @code{1~2}; 451 to pick every third line starting with the second, @samp{2~3} would be used; 452 to pick every fifth line starting with the tenth, use @samp{10~5}; 453 and @samp{50~0} is just an obscure way of saying @code{50}. 454 455 @item $ 456 @cindex Address, last line 457 @cindex Last line, selecting 458 @cindex Line, selecting last 459 This address matches the last line of the last file of input, or 460 the last line of each file when the @option{-i} or @option{-s} options 461 are specified. 462 463 @item /@var{regexp}/ 464 @cindex Address, as a regular expression 465 @cindex Line, selecting by regular expression match 466 This will select any line which matches the regular expression @var{regexp}. 467 If @var{regexp} itself includes any @code{/} characters, 468 each must be escaped by a backslash (@code{\}). 469 470 @cindex empty regular expression 471 @cindex @value{SSEDEXT}, modifiers and the empty regular expression 472 The empty regular expression @samp{//} repeats the last regular 473 expression match (the same holds if the empty regular expression is 474 passed to the @code{s} command). Note that modifiers to regular expressions 475 are evaluated when the regular expression is compiled, thus it is invalid to 476 specify them together with the empty regular expression. 477 478 @item \%@var{regexp}% 479 (The @code{%} may be replaced by any other single character.) 480 481 @cindex Slash character, in regular expressions 482 This also matches the regular expression @var{regexp}, 483 but allows one to use a different delimiter than @code{/}. 484 This is particularly useful if the @var{regexp} itself contains 485 a lot of slashes, since it avoids the tedious escaping of every @code{/}. 
486 If @var{regexp} itself includes any delimiter characters, 487 each must be escaped by a backslash (@code{\}). 488 489 @item /@var{regexp}/I 490 @itemx \%@var{regexp}%I 491 @cindex @acronym{GNU} extensions, @code{I} modifier 492 @ifset PERL 493 @cindex Perl-style regular expressions, case-insensitive 494 @end ifset 495 The @code{I} modifier to regular-expression matching is a @acronym{GNU} 496 extension which causes the @var{regexp} to be matched in 497 a case-insensitive manner. 498 499 @item /@var{regexp}/M 500 @itemx \%@var{regexp}%M 501 @ifset PERL 502 @cindex @value{SSEDEXT}, @code{M} modifier 503 @end ifset 504 @cindex Perl-style regular expressions, multiline 505 The @code{M} modifier to regular-expression matching is a @value{SSED} 506 extension which causes @code{^} and @code{$} to match respectively 507 (in addition to the normal behavior) the empty string after a newline, 508 and the empty string before a newline. There are special character 509 sequences 510 @ifset PERL 511 (@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'} 512 in basic or extended regular expression modes) 513 @end ifset 514 @ifclear PERL 515 (@code{\`} and @code{\'}) 516 @end ifclear 517 which always match the beginning or the end of the buffer. 518 @code{M} stands for @cite{multi-line}. 519 520 @ifset PERL 521 @item /@var{regexp}/S 522 @itemx \%@var{regexp}%S 523 @cindex @value{SSEDEXT}, @code{S} modifier 524 @cindex Perl-style regular expressions, single line 525 The @code{S} modifier to regular-expression matching is only valid 526 in Perl mode and specifies that the dot character (@code{.}) will 527 match the newline character too. @code{S} stands for @cite{single-line}. 528 @end ifset 529 530 @ifset PERL 531 @item /@var{regexp}/X 532 @itemx \%@var{regexp}%X 533 @cindex @value{SSEDEXT}, @code{X} modifier 534 @cindex Perl-style regular expressions, extended 535 The @code{X} modifier to regular-expression matching is also 536 valid in Perl mode only. 
If it is used, whitespace in the 537 pattern (other than in a character class) and 538 characters between a @kbd{#} outside a character class and the 539 next newline character are ignored. An escaping backslash 540 can be used to include a whitespace or @kbd{#} character as part 541 of the pattern. 542 @end ifset 543 @end table 544 545 If no addresses are given, then all lines are matched; 546 if one address is given, then only lines matching that 547 address are matched. 548 549 @cindex Range of lines 550 @cindex Several lines, selecting 551 An address range can be specified by specifying two addresses 552 separated by a comma (@code{,}). An address range matches lines 553 starting from where the first address matches, and continues 554 until the second address matches (inclusively). 555 556 If the second address is a @var{regexp}, then checking for the 557 ending match will start with the line @emph{following} the 558 line which matched the first address: a range will always 559 span at least two lines (except of course if the input stream 560 ends). 561 562 If the second address is a @var{number} less than (or equal to) 563 the line matching the first address, then only the one line is 564 matched. 
565 566 @cindex Special addressing forms 567 @cindex Range with start address of zero 568 @cindex Zero, as range start address 569 @cindex @var{addr1},+N 570 @cindex @var{addr1},~N 571 @cindex @acronym{GNU} extensions, special two-address forms 572 @cindex @acronym{GNU} extensions, @code{0} address 573 @cindex @acronym{GNU} extensions, 0,@var{addr2} addressing 574 @cindex @acronym{GNU} extensions, @var{addr1},+@var{N} addressing 575 @cindex @acronym{GNU} extensions, @var{addr1},~@var{N} addressing 576 @value{SSED} also supports some special two-address forms; all these 577 are @acronym{GNU} extensions: 578 @table @code 579 @item 0,/@var{regexp}/ 580 A line number of @code{0} can be used in an address specification like 581 @code{0,/@var{regexp}/} so that @command{sed} will try to match 582 @var{regexp} in the first input line too. In other words, 583 @code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/}, 584 except that if @var{addr2} matches the very first line of input the 585 @code{0,/@var{regexp}/} form will consider it to end the range, whereas 586 the @code{1,/@var{regexp}/} form will match the beginning of its range and 587 hence make the range span up to the @emph{second} occurrence of the 588 regular expression. 589 590 Note that this is the only place where the @code{0} address makes 591 sense; there is no 0-th line and commands which are given the @code{0} 592 address in any other way will give an error. 593 594 @item @var{addr1},+@var{N} 595 Matches @var{addr1} and the @var{N} lines following @var{addr1}. 596 597 @item @var{addr1},~@var{N} 598 Matches @var{addr1} and the lines following @var{addr1} 599 until the next line whose input line number is a multiple of @var{N}. 600 @end table 601 602 @cindex Excluding lines 603 @cindex Selecting non-matching lines 604 Appending the @code{!} character to the end of an address 605 specification negates the sense of the match. 
606 That is, if the @code{!} character follows an address range, 607 then only lines which do @emph{not} match the address range 608 will be selected. 609 This also works for singleton addresses, 610 and, perhaps perversely, for the null address. 611 612 613 @node Regular Expressions 614 @section Overview of Regular Expression Syntax 615 616 To know how to use @command{sed}, people should understand regular 617 expressions (@dfn{regexp} for short). A regular expression 618 is a pattern that is matched against a 619 subject string from left to right. Most characters are 620 @dfn{ordinary}: they stand for 621 themselves in a pattern, and match the corresponding characters 622 in the subject. As a trivial example, the pattern 623 624 @example 625 The quick brown fox 626 @end example 627 628 @noindent 629 matches a portion of a subject string that is identical to 630 itself. The power of regular expressions comes from the 631 ability to include alternatives and repetitions in the pattern. 632 These are encoded in the pattern by the use of @dfn{special characters}, 633 which do not stand for themselves but instead 634 are interpreted in some special way. Here is a brief description 635 of regular expression syntax as used in @command{sed}. 636 637 @table @code 638 @item @var{char} 639 A single ordinary character matches itself. 640 641 @item * 642 @cindex @acronym{GNU} extensions, to basic regular expressions 643 Matches a sequence of zero or more instances of matches for the 644 preceding regular expression, which must be an ordinary character, a 645 special character preceded by @code{\}, a @code{.}, a grouped regexp 646 (see below), or a bracket expression. As a @acronym{GNU} extension, a 647 postfixed regular expression can also be followed by @code{*}; for 648 example, @code{a**} is equivalent to @code{a*}. 
@acronym{POSIX} 649 1003.1-2001 says that @code{*} stands for itself when it appears at 650 the start of a regular expression or subexpression, but many 651 non@acronym{GNU} implementations do not support this and portable 652 scripts should instead use @code{\*} in these contexts. 653 654 @item \+ 655 @cindex @acronym{GNU} extensions, to basic regular expressions 656 As @code{*}, but matches one or more. It is a @acronym{GNU} extension. 657 658 @item \? 659 @cindex @acronym{GNU} extensions, to basic regular expressions 660 As @code{*}, but only matches zero or one. It is a @acronym{GNU} extension. 661 662 @item \@{@var{i}\@} 663 As @code{*}, but matches exactly @var{i} sequences (@var{i} is a 664 decimal integer; for portability, keep it between 0 and 255 665 inclusive). 666 667 @item \@{@var{i},@var{j}\@} 668 Matches between @var{i} and @var{j}, inclusive, sequences. 669 670 @item \@{@var{i},\@} 671 Matches more than or equal to @var{i} sequences. 672 673 @item \(@var{regexp}\) 674 Groups the inner @var{regexp} as a whole, this is used to: 675 676 @itemize @bullet 677 @item 678 @cindex @acronym{GNU} extensions, to basic regular expressions 679 Apply postfix operators, like @code{\(abcd\)*}: 680 this will search for zero or more whole sequences 681 of @samp{abcd}, while @code{abcd*} would search 682 for @samp{abc} followed by zero or more occurrences 683 of @samp{d}. Note that support for @code{\(abcd\)*} is 684 required by @acronym{POSIX} 1003.1-2001, but many non-@acronym{GNU} 685 implementations do not support it and hence it is not universally 686 portable. 687 688 @item 689 Use back references (see below). 690 @end itemize 691 692 @item . 693 Matches any character, including newline. 694 695 @item ^ 696 Matches the null string at beginning of line, i.e. what 697 appears after the circumflex must appear at the 698 beginning of line. 
@code{^#include} will match only 699 lines where @samp{#include} is the first thing on line---if 700 there are spaces before, for example, the match fails. 701 @code{^} acts as a special character only at the beginning 702 of the regular expression or subexpression (that is, 703 after @code{\(} or @code{\|}). Portable scripts should avoid 704 @code{^} at the beginning of a subexpression, though, as 705 @acronym{POSIX} allows implementations that treat @code{^} as 706 an ordinary character in that context. 707 708 709 @item $ 710 It is the same as @code{^}, but refers to end of line. 711 @code{$} also acts as a special character only at the end 712 of the regular expression or subexpression (that is, before @code{\)} 713 or @code{\|}), and its use at the end of a subexpression is not 714 portable. 715 716 717 @item [@var{list}] 718 @itemx [^@var{list}] 719 Matches any single character in @var{list}: for example, 720 @code{[aeiou]} matches all vowels. A list may include 721 sequences like @code{@var{char1}-@var{char2}}, which 722 matches any character between (inclusive) @var{char1} 723 and @var{char2}. 724 725 A leading @code{^} reverses the meaning of @var{list}, so that 726 it matches any single character @emph{not} in @var{list}. To include 727 @code{]} in the list, make it the first character (after 728 the @code{^} if needed), to include @code{-} in the list, 729 make it the first or last; to include @code{^} put 730 it after the first character. 731 732 @cindex @code{POSIXLY_CORRECT} behavior, bracket expressions 733 The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\} 734 are normally not special within @var{list}. For example, @code{[\*]} 735 matches either @samp{\} or @samp{*}, because the @code{\} is not 736 special here. 
However, strings like @code{[.ch.]}, @code{[=a=]}, and 737 @code{[:space:]} are special within @var{list} and represent collating 738 symbols, equivalence classes, and character classes, respectively, and 739 @code{[} is therefore special within @var{list} when it is followed by 740 @code{.}, @code{=}, or @code{:}. Also, when not in 741 @env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and 742 @code{\t} are recognized within @var{list}. @xref{Escapes}. 743 744 @item @var{regexp1}\|@var{regexp2} 745 @cindex @acronym{GNU} extensions, to basic regular expressions 746 Matches either @var{regexp1} or @var{regexp2}. Use 747 parentheses to use complex alternative regular expressions. 748 The matching process tries each alternative in turn, from 749 left to right, and the first one that succeeds is used. 750 It is a @acronym{GNU} extension. 751 752 @item @var{regexp1}@var{regexp2} 753 Matches the concatenation of @var{regexp1} and @var{regexp2}. 754 Concatenation binds more tightly than @code{\|}, @code{^}, and 755 @code{$}, but less tightly than the other regular expression 756 operators. 757 758 @item \@var{digit} 759 Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized 760 subexpression in the regular expression. This is called a @dfn{back 761 reference}. Subexpressions are implicity numbered by counting 762 occurrences of @code{\(} left-to-right. 763 764 @item \n 765 Matches the newline character. 766 767 @item \@var{char} 768 Matches @var{char}, where @var{char} is one of @code{$}, 769 @code{*}, @code{.}, @code{[}, @code{\}, or @code{^}. 770 Note that the only C-like 771 backslash sequences that you can portably assume to be 772 interpreted are @code{\n} and @code{\\}; in particular 773 @code{\t} is not portable, and matches a @samp{t} under most 774 implementations of @command{sed}, rather than a tab character. 
775 776 @end table 777 778 @cindex Greedy regular expression matching 779 Note that the regular expression matcher is greedy, i.e., matches 780 are attempted from left to right and, if two or more matches are 781 possible starting at the same character, it selects the longest. 782 783 @noindent 784 Examples: 785 @table @samp 786 @item abcdef 787 Matches @samp{abcdef}. 788 789 @item a*b 790 Matches zero or more @samp{a}s followed by a single 791 @samp{b}. For example, @samp{b} or @samp{aaaaab}. 792 793 @item a\?b 794 Matches @samp{b} or @samp{ab}. 795 796 @item a\+b\+ 797 Matches one or more @samp{a}s followed by one or more 798 @samp{b}s: @samp{ab} is the shortest possible match, but 799 other examples are @samp{aaaab} or @samp{abbbbb} or 800 @samp{aaaaaabbbbbbb}. 801 802 @item .* 803 @itemx .\+ 804 These two both match all the characters in a string; 805 however, the first matches every string (including the empty 806 string), while the second matches only strings containing 807 at least one character. 808 809 @item ^main.*(.*) 810 his matches a string starting with @samp{main}, 811 followed by an opening and closing 812 parenthesis. The @samp{n}, @samp{(} and @samp{)} need not 813 be adjacent. 814 815 @item ^# 816 This matches a string beginning with @samp{#}. 817 818 @item \\$ 819 This matches a string ending with a single backslash. The 820 regexp contains two backslashes for escaping. 821 822 @item \$ 823 Instead, this matches a string consisting of a single dollar sign, 824 because it is escaped. 825 826 @item [a-zA-Z0-9] 827 In the C locale, this matches any @acronym{ASCII} letters or digits. 828 829 @item [^ @kbd{tab}]\+ 830 (Here @kbd{tab} stands for a single tab character.) 831 This matches a string of one or more 832 characters, none of which is a space or a tab. 833 Usually this means a word. 834 835 @item ^\(.*\)\n\1$ 836 This matches a string consisting of two equal substrings separated by 837 a newline. 
838 839 @item .\@{9\@}A$ 840 This matches nine characters followed by an @samp{A}. 841 842 @item ^.\@{15\@}A 843 This matches the start of a string that contains 16 characters, 844 the last of which is an @samp{A}. 845 846 @end table 847 848 849 850 @node Common Commands 851 @section Often-Used Commands 852 853 If you use @command{sed} at all, you will quite likely want to know 854 these commands. 855 856 @table @code 857 @item # 858 [No addresses allowed.] 859 860 @findex # (comments) 861 @cindex Comments, in scripts 862 The @code{#} character begins a comment; 863 the comment continues until the next newline. 864 865 @cindex Portability, comments 866 If you are concerned about portability, be aware that 867 some implementations of @command{sed} (which are not @sc{posix} 868 conformant) may only support a single one-line comment, 869 and then only when the very first character of the script is a @code{#}. 870 871 @findex -n, forcing from within a script 872 @cindex Caveat --- #n on first line 873 Warning: if the first two characters of the @command{sed} script 874 are @code{#n}, then the @option{-n} (no-autoprint) option is forced. 875 If you want to put a comment in the first line of your script 876 and that comment begins with the letter @samp{n} 877 and you do not want this behavior, 878 then be sure to either use a capital @samp{N}, 879 or place at least one space before the @samp{n}. 880 881 @item q [@var{exit-code}] 882 This command only accepts a single address. 883 884 @findex q (quit) command 885 @cindex @value{SSEDEXT}, returning an exit code 886 @cindex Quitting 887 Exit @command{sed} without processing any more commands or input. 888 Note that the current pattern space is printed if auto-print is 889 not disabled with the @option{-n} options. The ability to return 890 an exit code from the @command{sed} script is a @value{SSED} extension. 656 657 @item a\ 658 @itemx @var{text} 659 Append @var{text} after a line. 
660 661 @item a @var{text} 662 Append @var{text} after a line (alternative syntax). 663 664 @item b @var{label} 665 Branch unconditionally to @var{label}. 666 The @var{label} may be omitted, in which case the next cycle is started. 667 668 @item c\ 669 @itemx @var{text} 670 Replace (change) lines with @var{text}. 671 672 @item c @var{text} 673 Replace (change) lines with @var{text} (alternative syntax). 891 674 892 675 @item d 893 @findex d (delete) command894 @cindex Text, deleting895 676 Delete the pattern space; 896 677 immediately start next cycle. 897 678 898 @item p 899 @findex p (print) command 900 @cindex Text, printing 901 Print out the pattern space (to the standard output). 902 This command is usually only used in conjunction with the @option{-n} 903 command-line option. 679 @item D 680 If pattern space contains newlines, delete text in the pattern 681 space up to the first newline, and restart cycle with the resultant 682 pattern space, without reading a new line of input. 683 684 If pattern space contains no newline, start a normal new cycle as if 685 the @code{d} command was issued. 686 @c TODO: add a section about D+N and D+n commands 687 688 @item e 689 Executes the command that is found in pattern space and 690 replaces the pattern space with the output; a trailing newline 691 is suppressed. 692 693 @item e @var{command} 694 Executes @var{command} and sends its output to the output stream. 695 The command can run across multiple lines, all but the last ending with 696 a back-slash. 697 698 @item F 699 (filename) Print the file name of the current input file (with a trailing 700 newline). 701 702 @item g 703 Replace the contents of the pattern space with the contents of the hold space. 704 705 @item G 706 Append a newline to the contents of the pattern space, 707 and then append the contents of the hold space to that of the pattern space. 708 709 @item h 710 (hold) Replace the contents of the hold space with the contents of the 711 pattern space. 
712 713 @item H 714 Append a newline to the contents of the hold space, 715 and then append the contents of the pattern space to that of the hold space. 716 717 @item i\ 718 @itemx @var{text} 719 insert @var{text} before a line. 720 721 @item i @var{text} 722 insert @var{text} before a line (alternative syntax). 723 724 @item l 725 Print the pattern space in an unambiguous form. 904 726 905 727 @item n 906 @findex n (next-line) command 907 @cindex Next input line, replace pattern space with 908 @cindex Read next input line 909 If auto-print is not disabled, print the pattern space, 728 (next) If auto-print is not disabled, print the pattern space, 910 729 then, regardless, replace the pattern space with the next line of input. 911 730 If there is no more input then @command{sed} exits without processing 912 731 any more commands. 913 732 914 @item @{ @var{commands} @} 915 @findex @{@} command grouping 916 @cindex Grouping commands 917 @cindex Command groups 918 A group of commands may be enclosed between 919 @code{@{} and @code{@}} characters. 920 This is particularly useful when you want a group of commands 921 to be triggered by a single address (or address-range) match. 733 @item N 734 Add a newline to the pattern space, 735 then append the next line of input to the pattern space. 736 If there is no more input then @command{sed} exits without processing 737 any more commands. 738 739 @item p 740 Print the pattern space. 741 @c useful with @option{-n} 742 743 @item P 744 Print the pattern space, up to the first <newline>. 745 746 @item q@var{[exit-code]} 747 (quit) Exit @command{sed} without processing any more commands or input. 748 749 @item Q@var{[exit-code]} 750 (quit) This command is the same as @code{q}, but will not print the 751 contents of pattern space. Like @code{q}, it provides the 752 ability to return an exit code to the caller. 753 @c useful to quit on a conditional without printing 754 755 @item r filename 756 Reads file @var{filename}. 
757 758 @item R filename 759 Queue a line of @var{filename} to be read and 760 inserted into the output stream at the end of the current cycle, 761 or when the next input line is read. 762 @c useful to interleave files 763 764 @item s@var{/regexp/replacement/[flags]} 765 (substitute) Match the regular-expression against the content of the 766 pattern space. If found, replace matched string with 767 @var{replacement}. 768 769 @item t @var{label} 770 (test) Branch to @var{label} only if there has been a successful 771 @code{s}ubstitution since the last input line was read or conditional 772 branch was taken. The @var{label} may be omitted, in which case the 773 next cycle is started. 774 775 @item T @var{label} 776 (test) Branch to @var{label} only if there have been no successful 777 @code{s}ubstitutions since the last input line was read or 778 conditional branch was taken. The @var{label} may be omitted, 779 in which case the next cycle is started. 780 781 @item v @var{[version]} 782 (version) This command does nothing, but makes @command{sed} fail if 783 @value{SSED} extensions are not supported, or if the requested version 784 is not available. 785 786 @item w filename 787 Write the pattern space to @var{filename}. 788 789 @item W filename 790 Write to the given filename the portion of the pattern space up to 791 the first newline 792 793 @item x 794 Exchange the contents of the hold and pattern spaces. 795 796 797 @item y/src/dst/ 798 Transliterate any characters in the pattern space which match 799 any of the @var{source-chars} with the corresponding character 800 in @var{dest-chars}. 801 802 803 @item z 804 (zap) This command empties the content of pattern space. 805 806 @item # 807 A comment, until the next newline. 808 809 810 @item @{ @var{cmd ; cmd ...} @} 811 Group several commands together. 812 @c useful for multiple commands on same address 813 814 @item = 815 Print the current input line number (with a trailing newline). 
816 817 @item : @var{label} 818 Specify the location of @var{label} for branch commands (@code{b}, 819 @code{t}, @code{T}). 922 820 923 821 @end table 822 924 823 925 824 @node The "s" Command 926 825 @section The @code{s} Command 927 826 928 The syntax of the @code{s} (as in substitute) command is 929 @samp{s/@var{regexp}/@var{replacement}/@var{flags}}. The @code{/} 930 characters may be uniformly replaced by any other single 931 character within any given @code{s} command. The @code{/} 932 character (or whatever other character is used in its stead) 933 can appear in the @var{regexp} or @var{replacement} 934 only if it is preceded by a @code{\} character. 935 936 The @code{s} command is probably the most important in @command{sed} 937 and has a lot of different options. Its basic concept is simple: 938 the @code{s} command attempts to match the pattern 939 space against the supplied @var{regexp}; if the match is 940 successful, then that portion of the pattern 941 space which was matched is replaced with @var{replacement}. 827 The @code{s} command (as in substitute) is probably the most important 828 in @command{sed} and has a lot of different options. The syntax of 829 the @code{s} command is 830 @samp{s/@var{regexp}/@var{replacement}/@var{flags}}. 831 832 Its basic concept is simple: the @code{s} command attempts to match 833 the pattern space against the supplied regular expression @var{regexp}; 834 if the match is successful, then that portion of the 835 pattern space which was matched is replaced with @var{replacement}. 836 837 For details about @var{regexp} syntax @pxref{Regexp Addresses,,Regular 838 Expression Addresses}. 942 839 943 840 @cindex Backreferences, in regular expressions … … 950 847 characters which reference the whole matched portion 951 848 of the pattern space. 849 850 @c TODO: xref to backreference section mention @var{\'}. 
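The substitution and back-reference behavior described above can be sketched with a few short commands. This is only an illustration: the input string and the shell variable names (`basic`, `whole`, `swapped`) are made up for the example, and only portable basic regular expression syntax is assumed.

```shell
# Basic substitution: replace the first match of the regexp on the line.
basic=$(echo 'hello world' | sed 's/world/sed/')
echo "$basic"

# '&' in the replacement stands for the whole matched portion.
whole=$(echo 'hello world' | sed 's/w[a-z]*/[&]/')
echo "$whole"

# \1 and \2 refer to the first and second \( ... \) groups of the regexp.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')
echo "$swapped"
```

Each command captures the result in a variable and then prints it, so the three cases can be compared side by side.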
851 852 The @code{/} 853 characters may be uniformly replaced by any other single 854 character within any given @code{s} command. The @code{/} 855 character (or whatever other character is used in its stead) 856 can appear in the @var{regexp} or @var{replacement} 857 only if it is preceded by a @code{\} character. 858 859 860 952 861 @cindex @value{SSEDEXT}, case modifiers in @code{s} commands 953 862 Finally, as a @value{SSED} extension, you can include a … … 976 885 Stop case conversion started by @code{\L} or @code{\U}. 977 886 @end table 887 888 When the @code{g} flag is being used, case conversion does not 889 propagate from one occurrence of the regular expression to 890 another. For example, when the following command is executed 891 with @samp{a-b-} in pattern space: 892 @example 893 s/\(b\?\)-/x\u\1/g 894 @end example 895 896 @noindent 897 the output is @samp{axxB}. When replacing the first @samp{-}, 898 the @samp{\u} sequence only affects the empty replacement of 899 @samp{\1}. It does not affect the @code{x} character that is 900 added to pattern space when replacing @code{b-} with @code{xB}. 901 902 On the other hand, @code{\l} and @code{\u} do affect the remainder 903 of the replacement text if they are followed by an empty substitution. 904 With @samp{a-b-} in pattern space, the following command: 905 @example 906 s/\(b\?\)-/\u\1x/g 907 @end example 908 909 @noindent 910 will replace @samp{-} with @samp{X} (uppercase) and @samp{b-} with 911 @samp{Bx}. If this behavior is undesirable, you can prevent it by 912 adding a @samp{\E} sequence---after @samp{\1} in this case. 978 913 979 914 To include a literal @code{\}, @code{&}, or newline in the final … … 997 932 Only replace the @var{number}th match of the @var{regexp}. 
998 933 999 @cindex @acronym{GNU} extensions, @code{g} and @var{number} modifier interaction in @code{s} command 934 @cindex GNU extensions, @code{g} and @var{number} modifier 935 interaction in @code{s} command 1000 936 @cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command 1001 937 Note: the @sc{posix} standard does not specify what should happen … … 1023 959 change in future versions. 1024 960 1025 @item w @var{file -name}961 @item w @var{filename} 1026 962 @cindex Text, writing to a file after substitution 1027 963 @cindex @value{SSEDEXT}, @file{/dev/stdout} file 1028 964 @cindex @value{SSEDEXT}, @file{/dev/stderr} file 1029 965 If the substitution was made, then write out the result to the named file. 1030 As a @value{SSED} extension, two special values of @var{file -name} are966 As a @value{SSED} extension, two special values of @var{filename} are 1031 967 supported: @file{/dev/stderr}, which writes the result to the standard 1032 968 error, and @file{/dev/stdout}, which writes to the standard … … 1048 984 @item I 1049 985 @itemx i 1050 @cindex @acronym{GNU}extensions, @code{I} modifier986 @cindex GNU extensions, @code{I} modifier 1051 987 @cindex Case-insensitive matching 1052 @ifset PERL 1053 @cindex Perl-style regular expressions, case-insensitive 1054 @end ifset 1055 The @code{I} modifier to regular-expression matching is a @acronym{GNU} 988 The @code{I} modifier to regular-expression matching is a GNU 1056 989 extension which makes @command{sed} match @var{regexp} in a 1057 990 case-insensitive manner. … … 1060 993 @itemx m 1061 994 @cindex @value{SSEDEXT}, @code{M} modifier 1062 @ifset PERL1063 @cindex Perl-style regular expressions, multiline1064 @end ifset1065 995 The @code{M} modifier to regular-expression matching is a @value{SSED} 1066 extension which causes @code{^} and @code{$} to match respectively 1067 (in addition to the normal behavior) the empty string after a newline, 1068 and the empty string before a newline. 
There are special character 1069 sequences 1070 @ifset PERL 1071 (@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'} 1072 in basic or extended regular expression modes) 1073 @end ifset 996 extension which directs @value{SSED} to match the regular expression 997 in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to 998 match respectively (in addition to the normal behavior) the empty string 999 after a newline, and the empty string before a newline. There are 1000 special character sequences 1074 1001 @ifclear PERL 1075 1002 (@code{\`} and @code{\'}) 1076 1003 @end ifclear 1077 1004 which always match the beginning or the end of the buffer. 1078 @code{M} stands for @cite{multi-line}. 1079 1080 @ifset PERL 1081 @item S 1082 @itemx s 1083 @cindex @value{SSEDEXT}, @code{S} modifier 1084 @cindex Perl-style regular expressions, single line 1085 The @code{S} modifier to regular-expression matching is only valid 1086 in Perl mode and specifies that the dot character (@code{.}) will 1087 match the newline character too. @code{S} stands for @cite{single-line}. 1088 @end ifset 1089 1090 @ifset PERL 1091 @item X 1092 @itemx x 1093 @cindex @value{SSEDEXT}, @code{X} modifier 1094 @cindex Perl-style regular expressions, extended 1095 The @code{X} modifier to regular-expression matching is also 1096 valid in Perl mode only. If it is used, whitespace in the 1097 pattern (other than in a character class) and 1098 characters between a @kbd{#} outside a character class and the 1099 next newline character are ignored. An escaping backslash 1100 can be used to include a whitespace or @kbd{#} character as part 1101 of the pattern. 1102 @end ifset 1005 In addition, 1006 the period character does not match a new-line character in 1007 multi-line mode. 1008 1009 1010 @end table 1011 1012 @node Common Commands 1013 @section Often-Used Commands 1014 1015 If you use @command{sed} at all, you will quite likely want to know 1016 these commands. 
1017 1018 @table @code 1019 @item # 1020 [No addresses allowed.] 1021 1022 @findex # (comments) 1023 @cindex Comments, in scripts 1024 The @code{#} character begins a comment; 1025 the comment continues until the next newline. 1026 1027 @cindex Portability, comments 1028 If you are concerned about portability, be aware that 1029 some implementations of @command{sed} (which are not @sc{posix} 1030 conforming) may only support a single one-line comment, 1031 and then only when the very first character of the script is a @code{#}. 1032 1033 @findex -n, forcing from within a script 1034 @cindex Caveat --- #n on first line 1035 Warning: if the first two characters of the @command{sed} script 1036 are @code{#n}, then the @option{-n} (no-autoprint) option is forced. 1037 If you want to put a comment in the first line of your script 1038 and that comment begins with the letter @samp{n} 1039 and you do not want this behavior, 1040 then be sure to either use a capital @samp{N}, 1041 or place at least one space before the @samp{n}. 1042 1043 @item q [@var{exit-code}] 1044 @findex q (quit) command 1045 @cindex @value{SSEDEXT}, returning an exit code 1046 @cindex Quitting 1047 Exit @command{sed} without processing any more commands or input. 1048 1049 Example: stop after printing the second line: 1050 @example 1051 $ seq 3 | sed 2q 1052 1 1053 2 1054 @end example 1055 1056 This command accepts only one address. 1057 Note that the current pattern space is printed if auto-print is 1058 not disabled with the @option{-n} options. The ability to return 1059 an exit code from the @command{sed} script is a @value{SSED} extension. 1060 1061 See also the @value{SSED} extension @code{Q} command which quits silently 1062 without printing the current pattern space. 1063 1064 @item d 1065 @findex d (delete) command 1066 @cindex Text, deleting 1067 Delete the pattern space; 1068 immediately start next cycle. 
1069 1070 Example: delete the second input line: 1071 @example 1072 $ seq 3 | sed 2d 1073 1 1074 3 1075 @end example 1076 1077 @item p 1078 @findex p (print) command 1079 @cindex Text, printing 1080 Print out the pattern space (to the standard output). 1081 This command is usually only used in conjunction with the @option{-n} 1082 command-line option. 1083 1084 Example: print only the second input line: 1085 @example 1086 $ seq 3 | sed -n 2p 1087 2 1088 @end example 1089 1090 @item n 1091 @findex n (next-line) command 1092 @cindex Next input line, replace pattern space with 1093 @cindex Read next input line 1094 If auto-print is not disabled, print the pattern space, 1095 then, regardless, replace the pattern space with the next line of input. 1096 If there is no more input then @command{sed} exits without processing 1097 any more commands. 1098 1099 This command is useful to skip lines (e.g. process every Nth line). 1100 1101 Example: perform substitution on every 3rd line (i.e. two @code{n} commands 1102 skip two lines): 1103 @codequoteundirected on 1104 @codequotebacktick on 1105 @example 1106 $ seq 6 | sed 'n;n;s/./x/' 1107 1 1108 2 1109 x 1110 4 1111 5 1112 x 1113 @end example 1114 1115 @value{SSED} provides an extension address syntax of @var{first}~@var{step} 1116 to achieve the same result: 1117 1118 @example 1119 $ seq 6 | sed '0~3s/./x/' 1120 1 1121 2 1122 x 1123 4 1124 5 1125 x 1126 @end example 1127 1128 @codequotebacktick off 1129 @codequoteundirected off 1130 1131 1132 @item @{ @var{commands} @} 1133 @findex @{@} command grouping 1134 @cindex Grouping commands 1135 @cindex Command groups 1136 A group of commands may be enclosed between 1137 @code{@{} and @code{@}} characters. 1138 This is particularly useful when you want a group of commands 1139 to be triggered by a single address (or address-range) match. 
1140 1141 Example: perform substitution then print the second input line: 1142 @codequoteundirected on 1143 @codequotebacktick on 1144 @example 1145 $ seq 3 | sed -n '2@{s/2/X/ ; p@}' 1146 X 1147 @end example 1148 @codequoteundirected off 1149 @codequotebacktick off 1150 1103 1151 @end table 1104 1152 … … 1113 1161 @table @code 1114 1162 @item y/@var{source-chars}/@var{dest-chars}/ 1115 (The @code{/} characters may be uniformly replaced by1116 any other single character within any given @code{y} command.)1117 1118 1163 @findex y (transliterate) command 1119 1164 @cindex Transliteration … … 1122 1167 in @var{dest-chars}. 1123 1168 1169 Example: transliterate @samp{a-j} into @samp{0-9}: 1170 @codequoteundirected on 1171 @codequotebacktick on 1172 @example 1173 $ echo hello world | sed 'y/abcdefghij/0123456789/' 1174 74llo worl3 1175 @end example 1176 @codequoteundirected off 1177 @codequotebacktick off 1178 1179 (The @code{/} characters may be uniformly replaced by 1180 any other single character within any given @code{y} command.) 1181 1124 1182 Instances of the @code{/} (or whatever other character is used in its stead), 1125 1183 @code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars} … … 1128 1186 contain the same number of characters (after de-escaping). 1129 1187 1188 See the @command{tr} command from GNU coreutils for similar functionality. 1189 1190 @item a @var{text} 1191 Appending @var{text} after a line. This is a GNU extension 1192 to the standard @code{a} command - see below for details. 1193 1194 Example: Add @samp{hello} after the second line: 1195 @codequoteundirected on 1196 @codequotebacktick on 1197 @example 1198 $ seq 3 | sed '2a hello' 1199 1 1200 2 1201 hello 1202 3 1203 @end example 1204 @codequoteundirected off 1205 @codequotebacktick off 1206 1207 Leading whitespace after the @code{a} command is ignored. 1208 The text to add is read until the end of the line. 
1209 1210 1130 1211 @item a\ 1131 1212 @itemx @var{text} 1132 @cindex @value{SSEDEXT}, two addresses supported by most commands1133 As a @acronym{GNU} extension, this command accepts two addresses.1134 1135 1213 @findex a (append text lines) command 1136 1214 @cindex Appending text after a line 1137 1215 @cindex Text, appending 1138 Queue the lines of text which follow this command 1216 Appending @var{text} after a line. 1217 1218 Example: Add @samp{hello} after the second line 1219 (@print{} indicates printed output lines): 1220 @codequoteundirected on 1221 @codequotebacktick on 1222 @example 1223 $ seq 3 | sed '2a\ 1224 hello' 1225 @print{}1 1226 @print{}2 1227 @print{}hello 1228 @print{}3 1229 @end example 1230 @codequoteundirected off 1231 @codequotebacktick off 1232 1233 The @code{a} command queues the lines of text which follow this command 1139 1234 (each but the last ending with a @code{\}, 1140 1235 which are removed from the output) … … 1142 1237 or when the next input line is read. 1143 1238 1239 @cindex @value{SSEDEXT}, two addresses supported by most commands 1240 As a GNU extension, this command accepts two addresses. 1241 1144 1242 Escape sequences in @var{text} are processed, so you should 1145 1243 use @code{\\} in @var{text} to print a single backslash. 1146 1244 1147 As a @acronym{GNU} extension, if between the @code{a} and the newline there is 1148 other than a whitespace-@code{\} sequence, then the text of this line, 1149 starting at the first non-whitespace character after the @code{a}, 1150 is taken as the first line of the @var{text} block. 1151 (This enables a simplification in scripting a one-line add.) 1152 This extension also works with the @code{i} and @code{c} commands. 
1153 1245 The commands resume after the last line without a backslash (@code{\}) - 1246 @samp{world} in the following example: 1247 @codequoteundirected on 1248 @codequotebacktick on 1249 @example 1250 $ seq 3 | sed '2a\ 1251 hello\ 1252 world 1253 3s/./X/' 1254 @print{}1 1255 @print{}2 1256 @print{}hello 1257 @print{}world 1258 @print{}X 1259 @end example 1260 @codequoteundirected off 1261 @codequotebacktick off 1262 1263 As a GNU extension, the @code{a} command and @var{text} can be 1264 separated into two @code{-e} parameters, enabling easier scripting: 1265 @codequoteundirected on 1266 @codequotebacktick on 1267 @example 1268 $ seq 3 | sed -e '2a\' -e hello 1269 1 1270 2 1271 hello 1272 3 1273 1274 $ sed -e '2a\' -e "$VAR" 1275 @end example 1276 @codequoteundirected off 1277 @codequotebacktick off 1278 1279 @item i @var{text} 1280 insert @var{text} before a line. This is a GNU extension 1281 to the standard @code{i} command - see below for details. 1282 1283 Example: Insert @samp{hello} before the second line: 1284 @codequoteundirected on 1285 @codequotebacktick on 1286 @example 1287 $ seq 3 | sed '2i hello' 1288 1 1289 hello 1290 2 1291 3 1292 @end example 1293 @codequoteundirected off 1294 @codequotebacktick off 1295 1296 Leading whitespace after the @code{i} command is ignored. 1297 The text to add is read until the end of the line. 1298 1299 @anchor{insert command} 1154 1300 @item i\ 1155 1301 @itemx @var{text} 1156 @cindex @value{SSEDEXT}, two addresses supported by most commands1157 As a @acronym{GNU} extension, this command accepts two addresses.1158 1159 1302 @findex i (insert text lines) command 1160 1303 @cindex Inserting text before a line 1161 1304 @cindex Text, insertion 1162 Immediately output the lines of text which follow this command 1163 (each but the last ending with a @code{\}, 1164 which are removed from the output). 1305 Immediately output the lines of text which follow this command. 
1306 1307 Example: Insert @samp{hello} before the second line 1308 (@print{} indicates printed output lines): 1309 @codequoteundirected on 1310 @codequotebacktick on 1311 @example 1312 $ seq 3 | sed '2i\ 1313 hello' 1314 @print{}1 1315 @print{}hello 1316 @print{}2 1317 @print{}3 1318 @end example 1319 @codequoteundirected off 1320 @codequotebacktick off 1321 1322 @cindex @value{SSEDEXT}, two addresses supported by most commands 1323 As a GNU extension, this command accepts two addresses. 1324 1325 Escape sequences in @var{text} are processed, so you should 1326 use @code{\\} in @var{text} to print a single backslash. 1327 1328 The commands resume after the last line without a backslash (@code{\}) - 1329 @samp{world} in the following example: 1330 @codequoteundirected on 1331 @codequotebacktick on 1332 @example 1333 $ seq 3 | sed '2i\ 1334 hello\ 1335 world 1336 s/./X/' 1337 @print{}X 1338 @print{}hello 1339 @print{}world 1340 @print{}X 1341 @print{}X 1342 @end example 1343 @codequoteundirected off 1344 @codequotebacktick off 1345 1346 As a GNU extension, the @code{i} command and @var{text} can be 1347 separated into two @code{-e} parameters, enabling easier scripting: 1348 @codequoteundirected on 1349 @codequotebacktick on 1350 @example 1351 $ seq 3 | sed -e '2i\' -e hello 1352 1 1353 hello 1354 2 1355 3 1356 1357 $ sed -e '2i\' -e "$VAR" 1358 @end example 1359 @codequoteundirected off 1360 @codequotebacktick off 1361 1362 @item c @var{text} 1363 Replaces the line(s) with @var{text}. This is a GNU extension 1364 to the standard @code{c} command - see below for details. 1365 1366 Example: Replace the 2nd to 9th lines with the word @samp{hello}: 1367 @codequoteundirected on 1368 @codequotebacktick on 1369 @example 1370 $ seq 10 | sed '2,9c hello' 1371 1 1372 hello 1373 10 1374 @end example 1375 @codequoteundirected off 1376 @codequotebacktick off 1377 1378 Leading whitespace after the @code{c} command is ignored. 
1379 The text to add is read until the end of the line. 1165 1380 1166 1381 @item c\ … … 1169 1384 @cindex Replacing selected lines with other text 1170 1385 Delete the lines matching the address or address-range, 1171 and output the lines of text which follow this command 1172 (each but the last ending with a @code{\}, 1173 which are removed from the output) 1174 in place of the last line 1175 (or in place of each line, if no addresses were specified). 1386 and output the lines of text which follow this command. 1387 1388 Example: Replace 2nd to 4th lines with the words @samp{hello} and 1389 @samp{world} (@print{} indicates printed output lines): 1390 @codequoteundirected on 1391 @codequotebacktick on 1392 @example 1393 $ seq 5 | sed '2,4c\ 1394 hello\ 1395 world' 1396 @print{}1 1397 @print{}hello 1398 @print{}world 1399 @print{}5 1400 @end example 1401 @codequoteundirected off 1402 @codequotebacktick off 1403 1404 If no addresses are given, each line is replaced. 1405 1176 1406 A new cycle is started after this command is done, 1177 1407 since the pattern space will have been deleted. 
1408 In the following example, the @code{c} starts a 1409 new cycle and the substitution command is not performed 1410 on the replaced text: 1411 1412 @codequoteundirected on 1413 @codequotebacktick on 1414 @example 1415 $ seq 3 | sed '2c\ 1416 hello 1417 s/./X/' 1418 @print{}X 1419 @print{}hello 1420 @print{}X 1421 @end example 1422 @codequoteundirected off 1423 @codequotebacktick off 1424 1425 As a GNU extension, the @code{c} command and @var{text} can be 1426 separated into two @code{-e} parameters, enabling easier scripting: 1427 @codequoteundirected on 1428 @codequotebacktick on 1429 @example 1430 $ seq 3 | sed -e '2c\' -e hello 1431 1 1432 hello 1433 3 1434 1435 $ sed -e '2c\' -e "$VAR" 1436 @end example 1437 @codequoteundirected off 1438 @codequotebacktick off 1439 1178 1440 1179 1441 @item = 1180 @cindex @value{SSEDEXT}, two addresses supported by most commands1181 As a @acronym{GNU} extension, this command accepts two addresses.1182 1183 1442 @findex = (print line number) command 1184 1443 @cindex Printing line number 1185 1444 @cindex Line number, printing 1186 1445 Print out the current input line number (with a trailing newline). 1446 1447 @codequoteundirected on 1448 @codequotebacktick on 1449 @example 1450 $ printf '%s\n' aaa bbb ccc | sed = 1451 1 1452 aaa 1453 2 1454 bbb 1455 3 1456 ccc 1457 @end example 1458 @codequoteundirected off 1459 @codequotebacktick off 1460 1461 @cindex @value{SSEDEXT}, two addresses supported by most commands 1462 As a GNU extension, this command accepts two addresses. 1463 1464 1465 1187 1466 1188 1467 @item l @var{n} … … 1204 1483 1205 1484 @item r @var{filename} 1206 @cindex @value{SSEDEXT}, two addresses supported by most commands1207 As a @acronym{GNU} extension, this command accepts two addresses.1208 1485 1209 1486 @findex r (read file) command 1210 1487 @cindex Read text from a file 1488 Reads file @var{filename}. 
Example: 1489 1490 @codequoteundirected on 1491 @codequotebacktick on 1492 @example 1493 $ seq 3 | sed '2r/etc/hostname' 1494 1 1495 2 1496 fencepost.gnu.org 1497 3 1498 @end example 1499 @codequoteundirected off 1500 @codequotebacktick off 1501 1211 1502 @cindex @value{SSEDEXT}, @file{/dev/stdin} file 1212 1503 Queue the contents of @var{filename} to be read and … … 1220 1511 standard input. 1221 1512 1513 @cindex @value{SSEDEXT}, two addresses supported by most commands 1514 As a GNU extension, this command accepts two addresses. The 1515 file will then be reread and inserted on each of the addressed lines. 1516 1517 As a @value{SSED} extension, the @code{r} command accepts a zero address, 1518 inserting a file @emph{before} the first line of the input 1519 @pxref{Adding a header to multiple files}. 1520 1222 1521 @item w @var{filename} 1223 1522 @findex w (write file) command … … 1226 1525 @cindex @value{SSEDEXT}, @file{/dev/stderr} file 1227 1526 Write the pattern space to @var{filename}. 1228 As a @value{SSED} extension, two special values of @var{file -name} are1527 As a @value{SSED} extension, two special values of @var{filename} are 1229 1528 supported: @file{/dev/stderr}, which writes the result to the standard 1230 1529 error, and @file{/dev/stdout}, which writes to the standard … … 1232 1531 option is being used.} 1233 1532 1234 The file will be created (or truncated) before the 1235 first input line is read; all @code{w} commands 1236 (including instances of @code{w} flag on successful @code{s} commands) 1237 which refer to the same @var{filename} are output without 1238 closing and reopening the file. 1533 The file will be created (or truncated) before the first input line is 1534 read; all @code{w} commands (including instances of the @code{w} flag 1535 on successful @code{s} commands) which refer to the same @var{filename} 1536 are output without closing and reopening the file. 
1239 1537 1240 1538 @item D 1241 1539 @findex D (delete first line) command 1242 1540 @cindex Delete first line from pattern space 1243 Delete text in the pattern space up to the first newline. 1244 If any text is left, restart cycle with the resultant 1245 pattern space (without reading a new line of input), 1246 otherwise start a normal new cycle.1541 If pattern space contains no newline, start a normal new cycle as if 1542 the @code{d} command was issued. Otherwise, delete text in the pattern 1543 space up to the first newline, and restart cycle with the resultant 1544 pattern space, without reading a new line of input. 1247 1545 1248 1546 @item N … … 1254 1552 If there is no more input then @command{sed} exits without processing 1255 1553 any more commands. 1554 1555 When @option{-z} is used, a zero byte (the ascii @samp{NUL} character) is 1556 added between the lines (instead of a new line). 1557 1558 By default @command{sed} does not terminate if there is no 'next' input line. 1559 This is a GNU extension which can be disabled with @option{--posix}. 1560 @xref{N_command_last_line,,N command on the last line}. 1561 1256 1562 1257 1563 @item P … … 1356 1662 1357 1663 If a parameter is specified, instead, the @code{e} command 1358 interprets it as a command and sends its output to the output stream 1359 (like @code{r} does). The command can run across multiple 1360 lines, all but the last ending witha back-slash.1664 interprets it as a command and sends its output to the output stream. 1665 The command can run across multiple lines, all but the last ending with 1666 a back-slash. 1361 1667 1362 1668 In both cases, the results are undefined if the command to be 1363 1669 executed contains a @sc{nul} character. 
1364 1670 1365 @item L @var{n} 1366 @findex L (fLow paragraphs) command 1367 @cindex Reformat pattern space 1368 @cindex Reformatting paragraphs 1369 @cindex @value{SSEDEXT}, reformatting paragraphs 1370 @cindex @value{SSEDEXT}, @code{L} command 1371 This @value{SSED} extension fills and joins lines in pattern space 1372 to produce output lines of (at most) @var{n} characters, like 1373 @code{fmt} does; if @var{n} is omitted, the default as specified 1374 on the command line is used. This command is considered a failed 1375 experiment and unless there is enough request (which seems unlikely) 1376 will be removed in future versions. 1377 1378 @ignore 1379 Blank lines, spaces between words, and indentation are 1380 preserved in the output; successive input lines with different 1381 indentation are not joined; tabs are expanded to 8 columns. 1382 1383 If the pattern space contains multiple lines, they are joined, but 1384 since the pattern space usually contains a single line, the behavior 1385 of a simple @code{L;d} script is the same as @samp{fmt -s} (i.e., 1386 it does not join short lines to form longer ones). 1387 1388 @var{n} specifies the desired line-wrap length; if omitted, 1389 the default as specified on the command line is used. 1390 @end ignore 1671 Note that, unlike the @code{r} command, the output of the command will 1672 be printed immediately; the @code{r} command instead delays the output 1673 to the end of the current cycle. 1674 1675 @item F 1676 @findex F (File name) command 1677 @cindex Printing file name 1678 @cindex File name, printing 1679 Print out the file name of the current input file (with a trailing 1680 newline). 1391 1681 1392 1682 @item Q [@var{exit-code}] 1393 This command only accepts a single address.1683 This command accepts only one address. 
1394 1684 1395 1685 @findex Q (silent Quit) command … … 1409 1699 @example 1410 1700 :eat 1411 $d @i{ Quit silently on the last line}1412 N @i{ Read another line, silently}1413 g @i{ Overwrite pattern space each time to save memory}1701 $d @i{@r{Quit silently on the last line}} 1702 N @i{@r{Read another line, silently}} 1703 g @i{@r{Overwrite pattern space each time to save memory}} 1414 1704 b eat 1415 1705 @end example … … 1462 1752 the first newline. Everything said under the @code{w} command about 1463 1753 file handling holds here too. 1754 1755 @item z 1756 @findex z (Zap) command 1757 @cindex @value{SSEDEXT}, emptying pattern space 1758 @cindex Emptying pattern space 1759 This command empties the content of pattern space. It is 1760 usually the same as @samp{s/.*//}, but is more efficient 1761 and works in the presence of invalid multibyte sequences 1762 in the input stream. @sc{posix} mandates that such sequences 1763 are @emph{not} matched by @samp{.}, so that there is no portable 1764 way to clear @command{sed}'s buffers in the middle of the 1765 script in most multibyte locales (including UTF-8 locales). 1464 1766 @end table 1465 1767 1768 1769 @node Multiple commands syntax 1770 @section Multiple commands syntax 1771 1772 @c POSIX says: 1773 @c Editing commands other than {...}, a, b, c, i, r, t, w, :, and # 1774 @c can be followed by a <semicolon>, optional <blank> characters, and 1775 @c another editing command. However, when an s editing command is used 1776 @c with the w flag, following it with another command in this manner 1777 @c produces undefined results. 1778 1779 There are several methods to specify multiple commands in a @command{sed} 1780 program. 1781 1782 Using newlines is most natural when running a sed script from a file 1783 (using the @option{-f} option). 1784 1785 On the command line, all @command{sed} commands may be separated by newlines. 
1786 Alternatively, you may specify each command as an argument to an @option{-e} 1787 option: 1788 1789 @codequoteundirected on 1790 @codequotebacktick on 1791 @example 1792 @group 1793 $ seq 6 | sed '1d 1794 3d 1795 5d' 1796 2 1797 4 1798 6 1799 1800 $ seq 6 | sed -e 1d -e 3d -e 5d 1801 2 1802 4 1803 6 1804 @end group 1805 @end example 1806 @codequoteundirected off 1807 @codequotebacktick off 1808 1809 A semicolon (@samp{;}) may be used to separate most simple commands: 1810 1811 @codequoteundirected on 1812 @codequotebacktick on 1813 @example 1814 @group 1815 $ seq 6 | sed '1d;3d;5d' 1816 2 1817 4 1818 6 1819 @end group 1820 @end example 1821 @codequoteundirected off 1822 @codequotebacktick off 1823 1824 The @code{@{},@code{@}},@code{b},@code{t},@code{T},@code{:} commands can 1825 be separated with a semicolon (this is a non-portable @value{SSED} extension). 1826 1827 @codequoteundirected on 1828 @codequotebacktick on 1829 @example 1830 @group 1831 $ seq 4 | sed '@{1d;3d@}' 1832 2 1833 4 1834 1835 $ seq 6 | sed '@{1d;3d@};5d' 1836 2 1837 4 1838 6 1839 @end group 1840 @end example 1841 @codequoteundirected off 1842 @codequotebacktick off 1843 1844 Labels used in @code{b},@code{t},@code{T},@code{:} commands are read 1845 until a semicolon. Leading and trailing whitespace is ignored. In 1846 the examples below the label is @samp{x}. The first example works 1847 with @value{SSED}. The second is a portable equivalent. For more 1848 information about branching and labels @pxref{Branching and flow 1849 control}. 
1850 1851 @codequoteundirected on 1852 @codequotebacktick on 1853 @example 1854 @group 1855 $ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d' 1856 1 1857 =2 1858 1859 $ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d' 1860 1 1861 =2 1862 @end group 1863 @end example 1864 @codequoteundirected off 1865 @codequotebacktick off 1866 1867 1868 1869 @subsection Commands Requiring a newline 1870 1871 The following commands cannot be separated by a semicolon and 1872 require a newline: 1873 1874 @table @asis 1875 1876 @item @code{a},@code{c},@code{i} (append/change/insert) 1877 1878 All characters following @code{a},@code{c},@code{i} commands are taken 1879 as the text to append/change/insert. Using a semicolon leads to 1880 undesirable results: 1881 1882 @codequoteundirected on 1883 @codequotebacktick on 1884 @example 1885 @group 1886 $ seq 2 | sed '1aHello ; 2d' 1887 1 1888 Hello ; 2d 1889 2 1890 @end group 1891 @end example 1892 @codequoteundirected off 1893 @codequotebacktick off 1894 1895 Separate the commands using @option{-e} or a newline: 1896 1897 @codequoteundirected on 1898 @codequotebacktick on 1899 @example 1900 @group 1901 $ seq 2 | sed -e 1aHello -e 2d 1902 1 1903 Hello 1904 1905 $ seq 2 | sed '1aHello 1906 2d' 1907 1 1908 Hello 1909 @end group 1910 @end example 1911 @codequoteundirected off 1912 @codequotebacktick off 1913 1914 Note that specifying the text to add (@samp{Hello}) immediately 1915 after @code{a},@code{c},@code{i} is itself a @value{SSED} extension. 1916 A portable, POSIX-compliant alternative is: 1917 1918 @codequoteundirected on 1919 @codequotebacktick on 1920 @example 1921 @group 1922 $ seq 2 | sed '1a\ 1923 Hello 1924 2d' 1925 1 1926 Hello 1927 @end group 1928 @end example 1929 @codequoteundirected off 1930 @codequotebacktick off 1931 1932 @item @code{#} (comment) 1933 1934 All characters following @samp{#} until the next newline are ignored. 
1935 1936 @codequoteundirected on 1937 @codequotebacktick on 1938 @example 1939 @group 1940 $ seq 3 | sed '# this is a comment ; 2d' 1941 1 1942 2 1943 3 1944 1945 1946 $ seq 3 | sed '# this is a comment 1947 2d' 1948 1 1949 3 1950 @end group 1951 @end example 1952 @codequoteundirected off 1953 @codequotebacktick off 1954 1955 @item @code{r},@code{R},@code{w},@code{W} (reading and writing files) 1956 1957 The @code{r},@code{R},@code{w},@code{W} commands parse the filename 1958 until end of the line. If whitespace, comments or semicolons are found, 1959 they will be included in the filename, leading to unexpected results: 1960 1961 @codequoteundirected on 1962 @codequotebacktick on 1963 @example 1964 @group 1965 $ seq 2 | sed '1w hello.txt ; 2d' 1966 1 1967 2 1968 1969 $ ls -log 1970 total 4 1971 -rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d 1972 1973 $ cat 'hello.txt ; 2d' 1974 1 1975 @end group 1976 @end example 1977 @codequoteundirected off 1978 @codequotebacktick off 1979 1980 Note that @command{sed} silently ignores read/write errors in 1981 @code{r},@code{R},@code{w},@code{W} commands (such as missing files). 1982 In the following example, @command{sed} tries to read a file named 1983 @samp{@file{hello.txt ; N}}. The file is missing, and the error is silently 1984 ignored: 1985 1986 @codequoteundirected on 1987 @codequotebacktick on 1988 @example 1989 @group 1990 $ echo x | sed '1rhello.txt ; N' 1991 x 1992 @end group 1993 @end example 1994 @codequoteundirected off 1995 @codequotebacktick off 1996 1997 @item @code{e} (command execution) 1998 1999 Any characters following the @code{e} command until the end of the line 2000 will be sent to the shell. 
If whitespace, comments or semicolons are found, 2001 they will be included in the shell command, leading to unexpected results: 2002 2003 @codequoteundirected on 2004 @codequotebacktick on 2005 @example 2006 @group 2007 $ echo a | sed '1e touch foo#bar' 2008 a 2009 2010 $ ls -1 2011 foo#bar 2012 2013 $ echo a | sed '1e touch foo ; s/a/b/' 2014 sh: 1: s/a/b/: not found 2015 a 2016 @end group 2017 @end example 2018 @codequoteundirected off 2019 @codequotebacktick off 2020 2021 2022 @item @code{s///[we]} (substitute with @code{e} or @code{w} flags) 2023 2024 In a substitution command, the @code{w} flag writes the substitution 2025 result to a file, and the @code{e} flag executes the substitution result 2026 as a shell command. As with the @code{r/R/w/W/e} commands, these 2027 must be terminated with a newline. If whitespace, comments or semicolons 2028 are found, they will be included in the shell command or filename, leading to 2029 unexpected results: 2030 2031 @codequoteundirected on 2032 @codequotebacktick on 2033 @example 2034 @group 2035 $ echo a | sed 's/a/b/w1.txt#foo' 2036 b 2037 2038 $ ls -1 2039 1.txt#foo 2040 @end group 2041 @end example 2042 @codequoteundirected off 2043 @codequotebacktick off 2044 2045 @end table 2046 2047 2048 @node sed addresses 2049 @chapter Addresses: selecting lines 2050 2051 @menu 2052 * Addresses overview:: Addresses overview 2053 * Numeric Addresses:: selecting lines by numbers 2054 * Regexp Addresses:: selecting lines by text matching 2055 * Range Addresses:: selecting a range of lines 2056 * Zero Address:: Using address @code{0} 2057 @end menu 2058 2059 @node Addresses overview 2060 @section Addresses overview 2061 2062 @cindex addresses, numeric 2063 @cindex numeric addresses 2064 Addresses determine on which line(s) the @command{sed} command will be 2065 executed. 
The following command replaces any first occurrence of @samp{hello} 2066 with @samp{world} only on line 144: 2067 2068 @codequoteundirected on 2069 @codequotebacktick on 2070 @example 2071 sed '144s/hello/world/' input.txt > output.txt 2072 @end example 2073 @codequoteundirected off 2074 @codequotebacktick off 2075 2076 2077 2078 If no address is specified, the command is performed on all lines. 2079 The following command replaces @samp{hello} with @samp{world}, 2080 targeting every line of the input file. 2081 However, note that it modifies only the first instance of @samp{hello} 2082 on each line. 2083 Use the @samp{g} modifier to affect every instance on each affected line. 2084 2085 @codequoteundirected on 2086 @codequotebacktick on 2087 @example 2088 sed 's/hello/world/' input.txt > output.txt 2089 @end example 2090 @codequoteundirected off 2091 @codequotebacktick off 2092 2093 2094 2095 @cindex addresses, regular expression 2096 @cindex regular expression addresses 2097 Addresses can contain regular expressions to match lines based 2098 on content instead of line numbers. The following command replaces 2099 @samp{hello} with @samp{world} only on lines 2100 containing the string @samp{apple}: 2101 2102 @codequoteundirected on 2103 @codequotebacktick on 2104 @example 2105 sed '/apple/s/hello/world/' input.txt > output.txt 2106 @end example 2107 @codequoteundirected off 2108 @codequotebacktick off 2109 2110 2111 2112 @cindex addresses, range 2113 @cindex range addresses 2114 An address range is specified with two addresses separated by a comma 2115 (@code{,}). Addresses can be numeric, regular expressions, or a mix of 2116 both. 
2117 The following command replaces @samp{hello} with @samp{world} 2118 only on lines 4 to 17 (inclusive): 2119 2120 @codequoteundirected on 2121 @codequotebacktick on 2122 @example 2123 sed '4,17s/hello/world/' input.txt > output.txt 2124 @end example 2125 @codequoteundirected off 2126 @codequotebacktick off 2127 2128 2129 2130 @cindex Excluding lines 2131 @cindex Selecting non-matching lines 2132 @cindex addresses, negating 2133 @cindex addresses, excluding 2134 Appending the @code{!} character to the end of an address 2135 specification (before the command letter) negates the sense of the 2136 match. That is, if the @code{!} character follows an address or an 2137 address range, then only lines which do @emph{not} match the addresses 2138 will be selected. The following command replaces @samp{hello} 2139 with @samp{world} only on lines @emph{not} containing the string 2140 @samp{apple}: 2141 2142 @example 2143 sed '/apple/!s/hello/world/' input.txt > output.txt 2144 @end example 2145 2146 The following command replaces @samp{hello} with 2147 @samp{world} only on lines 1 to 3 and from line 18 to the last line of the 2148 input file (i.e. excluding lines 4 to 17): 2149 2150 @example 2151 sed '4,17!s/hello/world/' input.txt > output.txt 2152 @end example 2153 2154 2155 2156 2157 2158 @node Numeric Addresses 2159 @section Selecting lines by numbers 2160 @cindex Addresses, in @command{sed} scripts 2161 @cindex Line selection 2162 @cindex Selecting lines to process 2163 2164 Addresses in a @command{sed} script can be in any of the following forms: 2165 @table @code 2166 @item @var{number} 2167 @cindex Address, numeric 2168 @cindex Line, selecting by number 2169 Specifying a line number will match only that line in the input. 2170 (Note that @command{sed} counts lines continuously across all input files 2171 unless @option{-i} or @option{-s} options are specified.) 
2172 2173 @item $ 2174 @cindex Address, last line 2175 @cindex Last line, selecting 2176 @cindex Line, selecting last 2177 This address matches the last line of the last file of input, or 2178 the last line of each file when the @option{-i} or @option{-s} options 2179 are specified. 2180 2181 2182 @item @var{first}~@var{step} 2183 @cindex GNU extensions, @samp{@var{n}~@var{m}} addresses 2184 This GNU extension matches every @var{step}th line 2185 starting with line @var{first}. 2186 In particular, lines will be selected when there exists 2187 a non-negative @var{n} such that the current line-number equals 2188 @var{first} + (@var{n} * @var{step}). 2189 Thus, one would use @code{1~2} to select the odd-numbered lines and 2190 @code{0~2} for even-numbered lines; 2191 to pick every third line starting with the second, @samp{2~3} would be used; 2192 to pick every fifth line starting with the tenth, use @samp{10~5}; 2193 and @samp{50~0} is just an obscure way of saying @code{50}. 2194 2195 The following commands demonstrate the step address usage: 2196 2197 @example 2198 $ seq 10 | sed -n '0~4p' 2199 4 2200 8 2201 2202 $ seq 10 | sed -n '1~3p' 2203 1 2204 4 2205 7 2206 10 2207 @end example 2208 2209 2210 @end table 2211 2212 2213 2214 @node Regexp Addresses 2215 @section Selecting lines by text matching 2216 2217 @value{SSED} supports the following regular expression addresses. 2218 The default regular expression syntax is 2219 @ref{BRE syntax, , Basic Regular Expression (BRE)}. 2220 If the @option{-E} or @option{-r} option is used, the regular expression should be 2221 in @ref{ERE syntax, , Extended Regular Expression (ERE)} syntax. 2222 @xref{BRE vs ERE}. 2223 2224 @table @code 2225 @item /@var{regexp}/ 2226 @cindex Address, as a regular expression 2227 @cindex Line, selecting by regular expression match 2228 This will select any line which matches the regular expression @var{regexp}. 
2229 If @var{regexp} itself includes any @code{/} characters, 2230 each must be escaped by a backslash (@code{\}). 2231 2232 The following command prints lines in @file{/etc/passwd} 2233 which end with @samp{bash}@footnote{ 2234 There are of course many other ways to do the same, 2235 e.g. 2236 @example 2237 grep 'bash$' /etc/passwd 2238 awk -F: '$7 == "/bin/bash"' /etc/passwd 2239 @end example 2240 }: 2241 2242 @example 2243 sed -n '/bash$/p' /etc/passwd 2244 @end example 2245 2246 @cindex empty regular expression 2247 @cindex @value{SSEDEXT}, modifiers and the empty regular expression 2248 The empty regular expression @samp{//} repeats the last regular 2249 expression match (the same holds if the empty regular expression is 2250 passed to the @code{s} command). Note that modifiers to regular expressions 2251 are evaluated when the regular expression is compiled, thus it is invalid to 2252 specify them together with the empty regular expression. 2253 2254 @item \%@var{regexp}% 2255 (The @code{%} may be replaced by any other single character.) 2256 2257 @cindex Slash character, in regular expressions 2258 This also matches the regular expression @var{regexp}, 2259 but allows one to use a different delimiter than @code{/}. 2260 This is particularly useful if the @var{regexp} itself contains 2261 a lot of slashes, since it avoids the tedious escaping of every @code{/}. 2262 If @var{regexp} itself includes any delimiter characters, 2263 each must be escaped by a backslash (@code{\}). 2264 2265 The following commands are equivalent. 
They print lines 2266 which start with @samp{/home/alice/documents/}: 2267 2268 @example 2269 sed -n '/^\/home\/alice\/documents\//p' 2270 sed -n '\%^/home/alice/documents/%p' 2271 sed -n '\;^/home/alice/documents/;p' 2272 @end example 2273 2274 2275 @item /@var{regexp}/I 2276 @itemx \%@var{regexp}%I 2277 @cindex GNU extensions, @code{I} modifier 2278 @cindex case insensitive, regular expression 2279 The @code{I} modifier to regular-expression matching is a GNU 2280 extension which causes the @var{regexp} to be matched in 2281 a case-insensitive manner. 2282 2283 In many other programming languages, a lower case @code{i} is used 2284 for case-insensitive regular expression matching. However, in @command{sed} 2285 the @code{i} is used for the insert command (@pxref{insert command}). 2286 2287 Observe the difference between the following examples. 2288 2289 In this example, @code{/b/I} is the address: regular expression with @code{I} 2290 modifier. @code{d} is the delete command: 2291 2292 @example 2293 $ printf "%s\n" a b c | sed '/b/Id' 2294 a 2295 c 2296 @end example 2297 2298 Here, @code{/b/} is the address: a regular expression. 2299 @code{i} is the insert command. 2300 @code{d} is the value to insert. 2301 A line with @samp{d} is then inserted above the matched line: 2302 2303 @example 2304 $ printf "%s\n" a b c | sed '/b/id' 2305 a 2306 d 2307 b 2308 c 2309 @end example 2310 2311 @item /@var{regexp}/M 2312 @itemx \%@var{regexp}%M 2313 @cindex @value{SSEDEXT}, @code{M} modifier 2314 The @code{M} modifier to regular-expression matching is a @value{SSED} 2315 extension which directs @value{SSED} to match the regular expression 2316 in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to 2317 match respectively (in addition to the normal behavior) the empty string 2318 after a newline, and the empty string before a newline. 
There are 2319 special character sequences 2320 @ifclear PERL 2321 (@code{\`} and @code{\'}) 2322 @end ifclear 2323 which always match the beginning or the end of the buffer. 2324 In addition, 2325 the period character does not match a new-line character in 2326 multi-line mode. 2327 @end table 2328 2329 2330 @cindex regex addresses and pattern space 2331 @cindex regex addresses and input lines 2332 Regex addresses operate on the content of the current 2333 pattern space. If the pattern space is changed (for example with @code{s///} 2334 command) the regular expression matching will operate on the changed text. 2335 2336 In the following example, automatic printing is disabled with 2337 @option{-n}. The @code{s/2/X/} command changes lines containing 2338 @samp{2} to @samp{X}. The command @code{/[0-9]/p} matches 2339 lines with digits and prints them. 2340 Because the second line is changed before the @code{/[0-9]/} regex, 2341 it will not match and will not be printed: 2342 2343 @codequoteundirected on 2344 @codequotebacktick on 2345 @example 2346 @group 2347 $ seq 3 | sed -n 's/2/X/ ; /[0-9]/p' 2348 1 2349 3 2350 @end group 2351 @end example 2352 @codequoteundirected off 2353 @codequotebacktick off 2354 2355 2356 @node Range Addresses 2357 @section Range Addresses 2358 2359 @cindex Range of lines 2360 @cindex Several lines, selecting 2361 An address range can be specified by specifying two addresses 2362 separated by a comma (@code{,}). An address range matches lines 2363 starting from where the first address matches, and continues 2364 until the second address matches (inclusively): 2365 2366 @example 2367 $ seq 10 | sed -n '4,6p' 2368 4 2369 5 2370 6 2371 @end example 2372 2373 If the second address is a @var{regexp}, then checking for the 2374 ending match will start with the line @emph{following} the 2375 line which matched the first address: a range will always 2376 span at least two lines (except of course if the input stream 2377 ends). 
2378 2379 @example 2380 $ seq 10 | sed -n '4,/[0-9]/p' 2381 4 2382 5 2383 @end example 2384 2385 If the second address is a @var{number} less than (or equal to) 2386 the line matching the first address, then only the one line is 2387 matched: 2388 2389 @example 2390 $ seq 10 | sed -n '4,1p' 2391 4 2392 @end example 2393 2394 @anchor{Zero Address Regex Range} 2395 @cindex Special addressing forms 2396 @cindex Range with start address of zero 2397 @cindex Zero, as range start address 2398 @cindex @var{addr1},+N 2399 @cindex @var{addr1},~N 2400 @cindex GNU extensions, special two-address forms 2401 @cindex GNU extensions, @code{0} address 2402 @cindex GNU extensions, 0,@var{addr2} addressing 2403 @cindex GNU extensions, @var{addr1},+@var{N} addressing 2404 @cindex GNU extensions, @var{addr1},~@var{N} addressing 2405 @value{SSED} also supports some special two-address forms; all these 2406 are GNU extensions: 2407 @table @code 2408 @item 0,/@var{regexp}/ 2409 A line number of @code{0} can be used in an address specification like 2410 @code{0,/@var{regexp}/} so that @command{sed} will try to match 2411 @var{regexp} in the first input line too. In other words, 2412 @code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/}, 2413 except that if @var{addr2} matches the very first line of input the 2414 @code{0,/@var{regexp}/} form will consider it to end the range, whereas 2415 the @code{1,/@var{regexp}/} form will match the beginning of its range and 2416 hence make the range span up to the @emph{second} occurrence of the 2417 regular expression. 2418 2419 The following examples demonstrate the difference between starting 2420 with address 1 and 0: 2421 2422 @example 2423 $ seq 10 | sed -n '1,/[0-9]/p' 2424 1 2425 2 2426 2427 $ seq 10 | sed -n '0,/[0-9]/p' 2428 1 2429 @end example 2430 2431 2432 @item @var{addr1},+@var{N} 2433 Matches @var{addr1} and the @var{N} lines following @var{addr1}. 
2434 2435 @example 2436 $ seq 10 | sed -n '6,+2p' 2437 6 2438 7 2439 8 2440 @end example 2441 2442 @var{addr1} can be a line number or a regular expression. 2443 2444 @item @var{addr1},~@var{N} 2445 Matches @var{addr1} and the lines following @var{addr1} 2446 until the next line whose input line number is a multiple of @var{N}. 2447 The following command prints lines starting at line 6, up to the next line 2448 whose number is a multiple of 4 (i.e. line 8): 2449 2450 @example 2451 $ seq 10 | sed -n '6,~4p' 2452 6 2453 7 2454 8 2455 @end example 2456 2457 @var{addr1} can be a line number or a regular expression. 2458 2459 @end table 2460 2461 2462 2463 @node Zero Address 2464 @section Zero Address 2465 @cindex Zero Address 2466 As a @value{SSED} extension, the @code{0} address can be used in two cases: 2467 @enumerate 2468 @item 2469 In a regex range address, as @code{0,/@var{regexp}/} 2470 (@pxref{Zero Address Regex Range}). 2471 @item 2472 With the @code{r} command, inserting a file before the first line 2473 (@pxref{Adding a header to multiple files}). 2474 @end enumerate 2475 2476 Note that these are the only places where the @code{0} address makes 2477 sense; commands which are given the @code{0} address in any 2478 other way will give an error. 
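As a rough illustration of both points, the following sketch (assuming @value{SSED}; the three identical input lines are made up for this example) uses the @code{0,/@var{regexp}/} form to change only the first matching line, compares it with @code{1,/@var{regexp}/}, and then checks that a bare @code{0} address is rejected:

```shell
# Assumes GNU sed: the 0,/regexp/ address form is a GNU extension.

# With 0,/foo/ the range can end on line 1, so only the first match changes.
first_only=$(printf '%s\n' foo foo foo | sed '0,/foo/s/foo/bar/')

# With 1,/foo/ the end of the range is first checked on line 2,
# so the first two lines change.
first_two=$(printf '%s\n' foo foo foo | sed '1,/foo/s/foo/bar/')

# A bare 0 address is not one of the two supported cases and is an error.
if echo x | sed -n '0p' >/dev/null 2>&1; then
  zero_alone=accepted
else
  zero_alone=rejected
fi

printf '%s\n' "$first_only" '---' "$first_two" '---' "$zero_alone"
```

The @code{0,/@var{regexp}/} variant is the usual way to limit a substitution to the first matching line of the input, even when that line is line 1.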
2479 2480 2481 2482 @node sed regular expressions 2483 @chapter Regular Expressions: selecting text 2484 2485 @menu 2486 * Regular Expressions Overview:: Overview of Regular expression in @command{sed} 2487 * BRE vs ERE:: Basic (BRE) and extended (ERE) regular expression 2488 syntax 2489 * BRE syntax:: Overview of basic regular expression syntax 2490 * ERE syntax:: Overview of extended regular expression syntax 2491 * Character Classes and Bracket Expressions:: 2492 * regexp extensions:: Additional regular expression commands 2493 * Back-references and Subexpressions:: Back-references and Subexpressions 2494 * Escapes:: Specifying special characters 2495 * Locale Considerations:: Multibyte characters and locale considerations 2496 @end menu 2497 2498 @node Regular Expressions Overview 2499 @section Overview of regular expression in @command{sed} 2500 2501 @c NOTE: Keep examples in the 'overview' section 2502 @c neutral in regards to BRE/ERE - to ease understanding. 2503 2504 2505 To know how to use @command{sed}, people should understand regular 2506 expressions (@dfn{regexp} for short). A regular expression 2507 is a pattern that is matched against a 2508 subject string from left to right. Most characters are 2509 @dfn{ordinary}: they stand for 2510 themselves in a pattern, and match the corresponding characters. 2511 Regular expressions in @command{sed} are specified between two 2512 slashes. 2513 2514 The following command prints lines containing the string @samp{hello}: 2515 2516 @example 2517 sed -n '/hello/p' 2518 @end example 2519 2520 The above example is equivalent to this @command{grep} command: 2521 2522 @example 2523 grep 'hello' 2524 @end example 2525 2526 The power of regular expressions comes from the ability to include 2527 alternatives and repetitions in the pattern. These are encoded in the 2528 pattern by the use of @dfn{special characters}, which do not stand for 2529 themselves but instead are interpreted in some special way. 
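To make the distinction between ordinary and special characters concrete, here is a small sketch (the word list is invented for illustration). It uses only @code{*} and its backslash-escaped form, which behave the same way in basic and extended syntax:

```shell
# Sample input, one word per line; the last word contains a literal '*'.
words=$(printf '%s\n' ac abc abbc adc 'ab*c')

# 'b*' is a repetition: zero or more 'b' characters between 'a' and 'c'.
repeat=$(printf '%s\n' "$words" | sed -n '/ab*c/p')

# '\*' removes the special meaning, so only a literal '*' matches.
literal=$(printf '%s\n' "$words" | sed -n '/ab\*c/p')

printf '%s\n' "$repeat" '---' "$literal"
```

The first pattern selects @samp{ac}, @samp{abc} and @samp{abbc}; the second selects only @samp{ab*c}.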
2530 2531 The character @code{^} (caret) in a regular expression matches the 2532 beginning of the line. The character @code{.} (dot) matches any single 2533 character. The following @command{sed} command matches and prints 2534 lines which start with the letter @samp{b}, followed by any single character, 2535 followed by the letter @samp{d}: 2536 2537 @example 2538 $ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p' 2539 bad 2540 bed 2541 bid 2542 body 2543 @end example 2544 2545 The following sections explain the meaning and usage of special 2546 characters in regular expressions. 2547 2548 @node BRE vs ERE 2549 @section Basic (BRE) and extended (ERE) regular expression 2550 2551 Basic and extended regular expressions are two variations on the 2552 syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the 2553 default in @command{sed} (and similarly in @command{grep}). 2554 Use the POSIX-specified @option{-E} option (@option{-r}, 2555 @option{--regexp-extended}) to enable Extended Regular Expression (ERE) syntax. 2556 2557 In @value{SSED}, the only difference between basic and extended regular 2558 expressions is in the behavior of a few special characters: @samp{?}, 2559 @samp{+}, parentheses, braces (@samp{@{@}}), and @samp{|}. 2560 2561 With basic (BRE) syntax, these characters do not have special meaning 2562 unless prefixed with a backslash (@samp{\}); with extended (ERE) syntax 2563 it is reversed: these characters are special unless they are prefixed 2564 with a backslash (@samp{\}). 
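The reversal can be checked with a short sketch such as the following (assuming @value{SSED}, since @samp{\?} in basic syntax is a GNU extension; the three sample words are invented for this example):

```shell
# Sample input: two spellings plus a line containing a literal '?'.
words='color
colour
colou?r'

# BRE: '\?' is the special "zero or one" operator (GNU extension).
bre_special=$(printf '%s\n' "$words" | sed -n '/colou\?r/p')

# ERE: the unescaped '?' is the special operator.
ere_special=$(printf '%s\n' "$words" | sed -E -n '/colou?r/p')

# BRE: an unescaped '?' is an ordinary character.
bre_literal=$(printf '%s\n' "$words" | sed -n '/colou?r/p')

# ERE: '\?' is an ordinary character.
ere_literal=$(printf '%s\n' "$words" | sed -E -n '/colou\?r/p')

printf '%s\n' "$bre_special" '---' "$bre_literal"
```

In both syntaxes the special form selects @samp{color} and @samp{colour}, while the literal form selects only @samp{colou?r}.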
2565 2566 @multitable @columnfractions .28 .36 .35 2567 2568 @headitem Desired pattern 2569 @tab Basic (BRE) Syntax 2570 @tab Extended (ERE) Syntax 2571 2572 @item literal @samp{+} (plus sign) 2573 2574 @tab 2575 @exampleindent 0 2576 @codequoteundirected on 2577 @codequotebacktick on 2578 @example 2579 $ echo 'a+b=c' > foo 2580 $ sed -n '/a+b/p' foo 2581 a+b=c 2582 @end example 2583 @codequotebacktick off 2584 @codequoteundirected off 2585 2586 @tab 2587 @exampleindent 0 2588 @codequoteundirected on 2589 @codequotebacktick on 2590 @example 2591 $ echo 'a+b=c' > foo 2592 $ sed -E -n '/a\+b/p' foo 2593 a+b=c 2594 @end example 2595 @codequotebacktick off 2596 @codequoteundirected off 2597 2598 2599 @item One or more @samp{a} characters followed by @samp{b} 2600 (plus sign as special meta-character) 2601 2602 @tab 2603 @exampleindent 0 2604 @codequoteundirected on 2605 @codequotebacktick on 2606 @example 2607 $ echo aab > foo 2608 $ sed -n '/a\+b/p' foo 2609 aab 2610 @end example 2611 @codequotebacktick off 2612 @codequoteundirected off 2613 2614 @tab 2615 @exampleindent 0 2616 @codequoteundirected on 2617 @codequotebacktick on 2618 @example 2619 $ echo aab > foo 2620 $ sed -E -n '/a+b/p' foo 2621 aab 2622 @end example 2623 @codequotebacktick off 2624 @codequoteundirected off 2625 2626 @end multitable 2627 2628 2629 2630 2631 @node BRE syntax 2632 @section Overview of basic regular expression syntax 2633 2634 Here is a brief description 2635 of regular expression syntax as used in @command{sed}. 2636 2637 @table @code 2638 @item @var{char} 2639 A single ordinary character matches itself. 2640 2641 @item * 2642 @cindex GNU extensions, to basic regular expressions 2643 Matches a sequence of zero or more instances of matches for the 2644 preceding regular expression, which must be an ordinary character, a 2645 special character preceded by @code{\}, a @code{.}, a grouped regexp 2646 (see below), or a bracket expression. 
As a GNU extension, a 2647 postfixed regular expression can also be followed by @code{*}; for 2648 example, @code{a**} is equivalent to @code{a*}. POSIX 2649 1003.1-2001 says that @code{*} stands for itself when it appears at 2650 the start of a regular expression or subexpression, but many 2651 non-GNU implementations do not support this and portable 2652 scripts should instead use @code{\*} in these contexts. 2653 @item . 2654 Matches any character, including newline. 2655 2656 @item ^ 2657 Matches the null string at beginning of the pattern space, i.e. what 2658 appears after the circumflex must appear at the beginning of the 2659 pattern space. 2660 2661 In most scripts, pattern space is initialized to the content of each 2662 line (@pxref{Execution Cycle, , How @code{sed} works}). So, it is a 2663 useful simplification to think of @code{^#include} as matching only 2664 lines where @samp{#include} is the first thing on the line---if there is 2665 any preceding space, for example, the match fails. This simplification is 2666 valid as long as the original content of pattern space is not modified, 2667 for example with an @code{s} command. 2668 2669 @code{^} acts as a special character only at the beginning of the 2670 regular expression or subexpression (that is, after @code{\(} or 2671 @code{\|}). Portable scripts should avoid @code{^} at the beginning of 2672 a subexpression, though, as POSIX allows implementations that 2673 treat @code{^} as an ordinary character in that context. 2674 2675 @item $ 2676 It is the same as @code{^}, but refers to end of pattern space. 2677 @code{$} also acts as a special character only at the end 2678 of the regular expression or subexpression (that is, before @code{\)} 2679 or @code{\|}), and its use at the end of a subexpression is not 2680 portable. 2681 2682 2683 @item [@var{list}] 2684 @itemx [^@var{list}] 2685 Matches any single character in @var{list}: for example, 2686 @code{[aeiou]} matches all vowels. 
A list may include 2687 sequences like @code{@var{char1}-@var{char2}}, which 2688 matches any character between (inclusive) @var{char1} 2689 and @var{char2}. 2690 @xref{Character Classes and Bracket Expressions}. 2691 2692 @item \+ 2693 @cindex GNU extensions, to basic regular expressions 2694 As @code{*}, but matches one or more. It is a GNU extension. 2695 2696 @item \? 2697 @cindex GNU extensions, to basic regular expressions 2698 As @code{*}, but only matches zero or one. It is a GNU extension. 2699 2700 @item \@{@var{i}\@} 2701 As @code{*}, but matches exactly @var{i} sequences (@var{i} is a 2702 decimal integer; for portability, keep it between 0 and 255 2703 inclusive). 2704 2705 @item \@{@var{i},@var{j}\@} 2706 Matches between @var{i} and @var{j} sequences, inclusive. 2707 2708 @item \@{@var{i},\@} 2709 Matches @var{i} or more sequences. 2710 2711 @item \(@var{regexp}\) 2712 Groups the inner @var{regexp} as a whole. This is used to: 2713 2714 @itemize @bullet 2715 @item 2716 @cindex GNU extensions, to basic regular expressions 2717 Apply postfix operators, like @code{\(abcd\)*}: 2718 this will search for zero or more whole sequences 2719 of @samp{abcd}, while @code{abcd*} would search 2720 for @samp{abc} followed by zero or more occurrences 2721 of @samp{d}. Note that support for @code{\(abcd\)*} is 2722 required by POSIX 1003.1-2001, but many non-GNU 2723 implementations do not support it and hence it is not universally 2724 portable. 2725 2726 @item 2727 Use back references (see below). 2728 @end itemize 2729 2730 2731 @item @var{regexp1}\|@var{regexp2} 2732 @cindex GNU extensions, to basic regular expressions 2733 Matches either @var{regexp1} or @var{regexp2}. Use 2734 parentheses to use complex alternative regular expressions. 2735 The matching process tries each alternative in turn, from 2736 left to right, and the first one that succeeds is used. 2737 It is a GNU extension. 
2738 2739 @item @var{regexp1}@var{regexp2} 2740 Matches the concatenation of @var{regexp1} and @var{regexp2}. 2741 Concatenation binds more tightly than @code{\|}, @code{^}, and 2742 @code{$}, but less tightly than the other regular expression 2743 operators. 2744 2745 @item \@var{digit} 2746 Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized 2747 subexpression in the regular expression. This is called a @dfn{back 2748 reference}. Subexpressions are implicitly numbered by counting 2749 occurrences of @code{\(} left-to-right. 2750 2751 @item \n 2752 Matches the newline character. 2753 2754 @item \@var{char} 2755 Matches @var{char}, where @var{char} is one of @code{$}, 2756 @code{*}, @code{.}, @code{[}, @code{\}, or @code{^}. 2757 Note that the only C-like 2758 backslash sequences that you can portably assume to be 2759 interpreted are @code{\n} and @code{\\}; in particular 2760 @code{\t} is not portable, and matches a @samp{t} under most 2761 implementations of @command{sed}, rather than a tab character. 2762 2763 @end table 2764 2765 @cindex Greedy regular expression matching 2766 Note that the regular expression matcher is greedy, i.e., matches 2767 are attempted from left to right and, if two or more matches are 2768 possible starting at the same character, it selects the longest. 2769 2770 @noindent 2771 Examples: 2772 @table @samp 2773 @item abcdef 2774 Matches @samp{abcdef}. 2775 2776 @item a*b 2777 Matches zero or more @samp{a}s followed by a single 2778 @samp{b}. For example, @samp{b} or @samp{aaaaab}. 2779 2780 @item a\?b 2781 Matches @samp{b} or @samp{ab}. 2782 2783 @item a\+b\+ 2784 Matches one or more @samp{a}s followed by one or more 2785 @samp{b}s: @samp{ab} is the shortest possible match, but 2786 other examples are @samp{aaaab} or @samp{abbbbb} or 2787 @samp{aaaaaabbbbbbb}. 
2788 2789 @item .* 2790 @itemx .\+ 2791 These two both match all the characters in a string; 2792 however, the first matches every string (including the empty 2793 string), while the second matches only strings containing 2794 at least one character. 2795 2796 @item ^main.*(.*) 2797 This matches a string starting with @samp{main}, 2798 followed by an opening and closing 2799 parenthesis. The @samp{n}, @samp{(} and @samp{)} need not 2800 be adjacent. 2801 2802 @item ^# 2803 This matches a string beginning with @samp{#}. 2804 2805 @item \\$ 2806 This matches a string ending with a single backslash. The 2807 regexp contains two backslashes for escaping. 2808 2809 @item \$ 2810 Instead, this matches a string consisting of a single dollar sign, 2811 because it is escaped. 2812 2813 @item [a-zA-Z0-9] 2814 In the C locale, this matches any ASCII letters or digits. 2815 2816 @item [^ @kbd{@key{TAB}}]\+ 2817 (Here @kbd{@key{TAB}} stands for a single tab character.) 2818 This matches a string of one or more 2819 characters, none of which is a space or a tab. 2820 Usually this means a word. 2821 2822 @item ^\(.*\)\n\1$ 2823 This matches a string consisting of two equal substrings separated by 2824 a newline. 2825 2826 @item .\@{9\@}A$ 2827 This matches nine characters followed by an @samp{A} at the end of a line. 2828 2829 @item ^.\@{15\@}A 2830 This matches the start of a string that contains 16 characters, 2831 the last of which is an @samp{A}. 2832 2833 @end table 2834 2835 2836 @node ERE syntax 2837 @section Overview of extended regular expression syntax 2838 @cindex Extended regular expressions, syntax 2839 2840 The only difference between basic and extended regular expressions is in 2841 the behavior of a few characters: @samp{?}, @samp{+}, parentheses, 2842 braces (@samp{@{@}}), and @samp{|}. 
While basic regular expressions 2843 require these to be escaped if you want them to behave as special 2844 characters, when using extended regular expressions you must escape 2845 them if you want them @emph{to match a literal character}. @samp{|} 2846 is special here because @samp{\|} is a GNU extension -- standard 2847 basic regular expressions do not provide its functionality. 2848 2849 @noindent 2850 Examples: 2851 @table @code 2852 @item abc? 2853 becomes @samp{abc\?} when using extended regular expressions. It matches 2854 the literal string @samp{abc?}. 2855 2856 @item c\+ 2857 becomes @samp{c+} when using extended regular expressions. It matches 2858 one or more @samp{c}s. 2859 2860 @item a\@{3,\@} 2861 becomes @samp{a@{3,@}} when using extended regular expressions. It matches 2862 three or more @samp{a}s. 2863 2864 @item \(abc\)\@{2,3\@} 2865 becomes @samp{(abc)@{2,3@}} when using extended regular expressions. It 2866 matches either @samp{abcabc} or @samp{abcabcabc}. 2867 2868 @item \(abc*\)\1 2869 becomes @samp{(abc*)\1} when using extended regular expressions. 2870 Backreferences must still be escaped when using extended regular 2871 expressions. 2872 2873 @item a\|b 2874 becomes @samp{a|b} when using extended regular expressions. It matches 2875 @samp{a} or @samp{b}. 2876 @end table 2877 2878 @node Character Classes and Bracket Expressions 2879 @section Character Classes and Bracket Expressions 2880 2881 @c The 'character class' section is shamelessly copied from grep's manual. 2882 2883 @cindex bracket expression 2884 @cindex character class 2885 A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and 2886 @samp{]}. 2887 It matches any single character in that list; 2888 if the first character of the list is the caret @samp{^}, 2889 then it matches any character @strong{not} in the list. 
2890 For example, the following command replaces the strings 2891 @samp{gray} or @samp{grey} with @samp{blue}: 2892 2893 @example 2894 sed 's/gr[ae]y/blue/' 2895 @end example 2896 2897 @c TODO: fix 'ref' to look good in both HTML and PDF 2898 Bracket expressions can be used in both 2899 @ref{BRE syntax,,basic} and @ref{ERE syntax,,extended} 2900 regular expressions (that is, with or without the @option{-E}/@option{-r} 2901 options). 2902 2903 @cindex range expression 2904 Within a bracket expression, a @dfn{range expression} consists of two 2905 characters separated by a hyphen. 2906 It matches any single character that 2907 sorts between the two characters, inclusive. 2908 In the default C locale, the sorting sequence is the native character 2909 order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}. 2910 2911 2912 Finally, certain named classes of characters are predefined within 2913 bracket expressions, as follows. 2914 2915 These named classes must be used @emph{inside} brackets 2916 themselves. Correct usage: 2917 @example 2918 $ echo 1 | sed 's/[[:digit:]]/X/' 2919 X 2920 @end example 2921 2922 Incorrect usage is rejected by newer @command{sed} versions. 
2923 Older versions accepted it but treated it as a single bracket expression 2924 (which is equivalent to @samp{[dgit:]}, 2925 that is, only the characters @var{d/g/i/t/:}): 2926 @example 2927 # current GNU sed versions - incorrect usage rejected 2928 $ echo 1 | sed 's/[:digit:]/X/' 2929 sed: character class syntax is [[:space:]], not [:space:] 2930 2931 # older GNU sed versions 2932 $ echo 1 | sed 's/[:digit:]/X/' 2933 1 2934 @end example 2935 2936 2937 @cindex classes of characters 2938 @cindex character classes 2939 @cindex named character classes 2940 @table @samp 2941 2942 @item [:alnum:] 2943 @opindex alnum @r{character class} 2944 @cindex alphanumeric characters 2945 Alphanumeric characters: 2946 @samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII 2947 character encoding, this is the same as @samp{[0-9A-Za-z]}. 2948 2949 @item [:alpha:] 2950 @opindex alpha @r{character class} 2951 @cindex alphabetic characters 2952 Alphabetic characters: 2953 @samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII 2954 character encoding, this is the same as @samp{[A-Za-z]}. 2955 2956 @item [:blank:] 2957 @opindex blank @r{character class} 2958 @cindex blank characters 2959 Blank characters: 2960 space and tab. 2961 2962 @item [:cntrl:] 2963 @opindex cntrl @r{character class} 2964 @cindex control characters 2965 Control characters. 2966 In ASCII, these characters have octal codes 000 2967 through 037, and 177 (DEL). 2968 In other character sets, these are 2969 the equivalent characters, if any. 2970 2971 @item [:digit:] 2972 @opindex digit @r{character class} 2973 @cindex digit characters 2974 @cindex numeric characters 2975 Digits: @code{0 1 2 3 4 5 6 7 8 9}. 2976 2977 @item [:graph:] 2978 @opindex graph @r{character class} 2979 @cindex graphic characters 2980 Graphical characters: 2981 @samp{[:alnum:]} and @samp{[:punct:]}. 
2982 2983 @item [:lower:] 2984 @opindex lower @r{character class} 2985 @cindex lower-case letters 2986 Lower-case letters; in the @samp{C} locale and ASCII character 2987 encoding, this is 2988 @code{a b c d e f g h i j k l m n o p q r s t u v w x y z}. 2989 2990 @item [:print:] 2991 @opindex print @r{character class} 2992 @cindex printable characters 2993 Printable characters: 2994 @samp{[:alnum:]}, @samp{[:punct:]}, and space. 2995 2996 @item [:punct:] 2997 @opindex punct @r{character class} 2998 @cindex punctuation characters 2999 Punctuation characters; in the @samp{C} locale and ASCII character 3000 encoding, this is 3001 @code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}. 3002 3003 @item [:space:] 3004 @opindex space @r{character class} 3005 @cindex space characters 3006 @cindex whitespace characters 3007 Space characters: in the @samp{C} locale, this is 3008 tab, newline, vertical tab, form feed, carriage return, and space. 3009 3010 3011 @item [:upper:] 3012 @opindex upper @r{character class} 3013 @cindex upper-case letters 3014 Upper-case letters: in the @samp{C} locale and ASCII character 3015 encoding, this is 3016 @code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}. 3017 3018 @item [:xdigit:] 3019 @opindex xdigit @r{character class} 3020 @cindex xdigit class 3021 @cindex hexadecimal digits 3022 Hexadecimal digits: 3023 @code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}. 3024 3025 @end table 3026 Note that the brackets in these class names are 3027 part of the symbolic names, and must be included in addition to 3028 the brackets delimiting the bracket expression. 3029 3030 Most meta-characters lose their special meaning inside bracket expressions: 3031 3032 @table @samp 3033 @item ] 3034 ends the bracket expression if it's not the first list item. 3035 So, if you want to make the @samp{]} character a list item, 3036 you must put it first. 
3037 3038 @item - 3039 represents the range if it's not first or last in a list or the ending point 3040 of a range. 3041 3042 @item ^ 3043 represents the characters not in the list. 3044 If you want to make the @samp{^} 3045 character a list item, place it anywhere but first. 3046 @end table 3047 3048 TODO: incorporate this paragraph (copied verbatim from BRE section). 3049 3050 @cindex @code{POSIXLY_CORRECT} behavior, bracket expressions 3051 The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\} 3052 are normally not special within @var{list}. For example, @code{[\*]} 3053 matches either @samp{\} or @samp{*}, because the @code{\} is not 3054 special here. However, strings like @code{[.ch.]}, @code{[=a=]}, and 3055 @code{[:space:]} are special within @var{list} and represent collating 3056 symbols, equivalence classes, and character classes, respectively, and 3057 @code{[} is therefore special within @var{list} when it is followed by 3058 @code{.}, @code{=}, or @code{:}. Also, when not in 3059 @env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and 3060 @code{\t} are recognized within @var{list}. @xref{Escapes}. 3061 @c ******** 3062 3063 3064 @c TODO: improve explanation about collation classes and equivalence classes 3065 @c perhaps dedicate a section to Locales ?? 3066 3067 @table @samp 3068 @item [. 3069 represents the open collating symbol. 3070 3071 @item .] 3072 represents the close collating symbol. 3073 3074 @item [= 3075 represents the open equivalence class. 3076 3077 @item =] 3078 represents the close equivalence class. 3079 3080 @item [: 3081 represents the open character class symbol, and should be followed by a 3082 valid character class name. 3083 3084 @item :] 3085 represents the close character class symbol. 
3086 @end table 3087 3088 3089 @node regexp extensions 3090 @section regular expression extensions 3091 3092 The following sequences have special meaning inside regular expressions 3093 (used in @ref{Regexp Addresses,,addresses} and the @code{s} command). 3094 3095 These can be used in both 3096 @ref{BRE syntax,,basic} and @ref{ERE syntax,,extended} 3097 regular expressions (that is, with or without the @option{-E}/@option{-r} 3098 options). 3099 3100 @table @code 3101 @item \w 3102 Matches any ``word'' character. A ``word'' character is any 3103 letter or digit or the underscore character. 3104 3105 @example 3106 $ echo "abc %-= def." | sed 's/\w/X/g' 3107 XXX %-= XXX. 3108 @end example 3109 3110 3111 @item \W 3112 Matches any ``non-word'' character. 3113 3114 @example 3115 $ echo "abc %-= def." | sed 's/\W/X/g' 3116 abcXXXXXdefX 3117 @end example 3118 3119 3120 @item \b 3121 Matches a word boundary; that is it matches if the character 3122 to the left is a ``word'' character and the character to the 3123 right is a ``non-word'' character, or vice-versa. 3124 3125 @example 3126 $ echo "abc %-= def." | sed 's/\b/X/g' 3127 XabcX %-= XdefX. 3128 @end example 3129 3130 3131 @item \B 3132 Matches everywhere but on a word boundary; that is it matches 3133 if the character to the left and the character to the right 3134 are either both ``word'' characters or both ``non-word'' 3135 characters. 3136 3137 @example 3138 $ echo "abc %-= def." | sed 's/\B/X/g' 3139 aXbXc X%X-X=X dXeXf.X 3140 @end example 3141 3142 3143 @item \s 3144 Matches whitespace characters (spaces and tabs). 3145 Newlines embedded in the pattern/hold spaces will also match: 3146 3147 @example 3148 $ echo "abc %-= def." | sed 's/\s/X/g' 3149 abcX%-=Xdef. 3150 @end example 3151 3152 3153 @item \S 3154 Matches non-whitespace characters. 3155 3156 @example 3157 $ echo "abc %-= def." | sed 's/\S/X/g' 3158 XXX XXX XXXX 3159 @end example 3160 3161 3162 @item \< 3163 Matches the beginning of a word. 
3164 3165 @example 3166 $ echo "abc %-= def." | sed 's/\</X/g' 3167 Xabc %-= Xdef. 3168 @end example 3169 3170 3171 @item \> 3172 Matches the end of a word. 3173 3174 @example 3175 $ echo "abc %-= def." | sed 's/\>/X/g' 3176 abcX %-= defX. 3177 @end example 3178 3179 3180 @item \` 3181 Matches only at the start of pattern space. This is different 3182 from @code{^} in multi-line mode. 3183 3184 Compare the following two examples: 3185 3186 @example 3187 $ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm' 3188 Xa 3189 Xb 3190 Xc 3191 3192 $ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm' 3193 Xa 3194 b 3195 c 3196 @end example 3197 3198 @item \' 3199 Matches only at the end of pattern space. This is different 3200 from @code{$} in multi-line mode. 3201 3202 3203 3204 @end table 3205 3206 3207 @node Back-references and Subexpressions 3208 @section Back-references and Subexpressions 3209 @cindex subexpression 3210 @cindex back-reference 3211 3212 @dfn{back-references} are regular expression commands which refer to a 3213 previous part of the matched regular expression. Back-references are 3214 specified with backslash and a single digit (e.g. @samp{\1}). The 3215 part of the regular expression they refer to is called a 3216 @dfn{subexpression}, and is designated with parentheses. 3217 3218 Back-references and subexpressions are used in two cases: in the 3219 regular expression search pattern, and in the @var{replacement} part 3220 of the @command{s} command (@pxref{Regexp Addresses,,Regular 3221 Expression Addresses} and @ref{The "s" Command}). 3222 3223 In a regular expression pattern, back-references are used to match 3224 the same content as a previously matched subexpression. In the 3225 following example, the subexpression is @samp{.} - any single 3226 character (being surrounded by parentheses makes it a 3227 subexpression). The back-reference @samp{\1} asks to match the same 3228 content (same character) as the sub-expression. 
3229 3230 The command below matches three-letter words: any character, 3231 followed by the letter @samp{o}, followed by the same character as the 3232 first. 3233 3234 @example 3235 $ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words 3236 bob 3237 mom 3238 non 3239 pop 3240 sos 3241 tot 3242 wow 3243 @end example 3244 3245 Multiple subexpressions are automatically numbered from 3246 left-to-right. This command searches for 6-letter 3247 palindromes (the first three letters are 3 subexpressions, 3248 followed by 3 back-references in reverse order): 3249 3250 @example 3251 $ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words 3252 redder 3253 @end example 3254 3255 In the @command{s} command, back-references can be 3256 used in the @var{replacement} part to refer back to subexpressions in 3257 the @var{regexp} part. 3258 3259 The following example uses two subexpressions in the regular 3260 expression to match two space-separated words. The back-references in 3261 the @var{replacement} part print the words in a different order: 3262 3263 @example 3264 $ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./' 3265 The name is Bond, James Bond. 3266 @end example 3267 3268 3269 When used with alternation, if the group does not participate in the 3270 match then the back-reference makes the whole match fail. For 3271 example, @samp{a(.)|b\1} will not match @samp{ba}. When multiple 3272 regular expressions are given with @option{-e} or from a file 3273 (@samp{-f @var{file}}), back-references are local to each expression. 
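As a small sketch of the last point (assuming a @command{sed} that accepts @option{-E}; the input string is made up for illustration), each @option{-e} expression below numbers its own subexpressions, so @samp{\1} in the second expression refers to @samp{(b)} and not to @samp{(a)}:

```shell
# Each -e expression numbers its subexpressions independently:
# \1 means (a) in the first expression and (b) in the second.
out=$(printf 'aa bb\n' | sed -E -e 's/(a)\1/X/' -e 's/(b)\1/Y/')
echo "$out"
# prints: X Y
```

If back-references were shared across expressions, the second substitution would look for @samp{ba} and leave @samp{bb} untouched.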
3274 3275 1466 3276 @node Escapes 1467 @section @acronym{GNU} Extensions for Escapes in Regular Expressions1468 1469 @cindex @acronym{GNU}extensions, special escapes3277 @section Escape Sequences - specifying special characters 3278 3279 @cindex GNU extensions, special escapes 1470 3280 Until this chapter, we have only encountered escapes of the form 1471 3281 @samp{\^}, which tell @command{sed} not to interpret the circumflex … … 1476 3286 @cindex @code{POSIXLY_CORRECT} behavior, escapes 1477 3287 This chapter introduces another kind of escape@footnote{All 1478 the escapes introduced here are @acronym{GNU}3288 the escapes introduced here are GNU 1479 3289 extensions, with the exception of @code{\n}. In basic regular 1480 3290 expression mode, setting @code{POSIXLY_CORRECT} disables them inside … … 1522 3332 1523 3333 @item \o@var{xxx} 1524 @ifset PERL1525 @item \@var{xxx}1526 @end ifset1527 3334 Produces or matches a character whose octal @sc{ascii} value is @var{xxx}. 1528 @ifset PERL1529 The syntax without the @code{o} is active in Perl mode, while the one1530 with the @code{o} is active in the normal or extended @sc{posix} regular1531 expression modes.1532 @end ifset1533 3335 1534 3336 @item \x@var{xx} … … 1539 3341 the existing ``word boundary'' meaning. 1540 3342 1541 Other escapes match a particular character class and are valid only in 1542 regular expressions: 1543 3343 @subsection Escaping Precedence 3344 3345 @value{SSED} processes escape sequences @emph{before} passing 3346 the text onto the regular-expression matching of the @command{s///} command 3347 and Address matching. 
Thus the following two commands are equivalent 3348 (@samp{0x5e} is the hexadecimal @sc{ascii} value of the character @samp{^}): 3349 3350 @codequoteundirected on 3351 @codequotebacktick on 3352 @example 3353 @group 3354 $ echo 'a^c' | sed 's/^/b/' 3355 ba^c 3356 3357 $ echo 'a^c' | sed 's/\x5e/b/' 3358 ba^c 3359 @end group 3360 @end example 3361 @codequoteundirected off 3362 @codequotebacktick off 3363 3364 As are the following (@samp{0x5b},@samp{0x5d} are the hexadecimal 3365 @sc{ascii} values of @samp{[},@samp{]}, respectively): 3366 3367 @codequoteundirected on 3368 @codequotebacktick on 3369 @example 3370 @group 3371 $ echo abc | sed 's/[a]/x/' 3372 xbc 3373 $ echo abc | sed 's/\x5ba\x5d/x/' 3374 xbc 3375 @end group 3376 @end example 3377 @codequoteundirected off 3378 @codequotebacktick off 3379 3380 However, it is recommended to avoid such special characters 3381 due to unexpected edge-cases. For example, the following 3382 are not equivalent: 3383 3384 @codequoteundirected on 3385 @codequotebacktick on 3386 @example 3387 @group 3388 $ echo 'a^c' | sed 's/\^/b/' 3389 abc 3390 3391 $ echo 'a^c' | sed 's/\\\x5e/b/' 3392 a^c 3393 @end group 3394 @end example 3395 @codequoteundirected off 3396 @codequotebacktick off 3397 3398 @c also: this fails in different places: 3399 @c $ sed 's/[//' 3400 @c sed: -e expression #1, char 5: unterminated `s' command 3401 @c $ sed 's/\x5b//' 3402 @c sed: -e expression #1, char 8: Invalid regular expression 3403 @c 3404 @c which is OK but confusing to explain why (the first 3405 @c fails in compile.c:snarf_char_class while the second 3406 @c is passed to the regex engine and then fails). 3407 3408 3409 @node Locale Considerations 3410 @section Multibyte characters and Locale Considerations 3411 3412 @value{SSED} processes valid multibyte characters in multibyte locales 3413 (e.g. @code{UTF-8}). @footnote{Some regexp edge-cases depend on the 3414 operating system and libc implementation. 
The examples shown are known 3415 to work as expected on GNU/Linux systems using glibc.} 3416 3417 @noindent The following example uses the Greek letter Capital Sigma 3418 (@value{ucsigma}, 3419 Unicode code point @code{0x03A3}). In a @code{UTF-8} locale, 3420 @command{sed} correctly processes the Sigma as one character despite 3421 it being 2 octets (bytes): 3422 3423 @codequoteundirected on 3424 @codequotebacktick on 3425 @example 3426 @group 3427 $ locale | grep LANG 3428 LANG=en_US.UTF-8 3429 3430 $ printf 'a\u03A3b' 3431 a@value{ucsigma}b 3432 3433 $ printf 'a\u03A3b' | sed 's/./X/g' 3434 XXX 3435 3436 $ printf 'a\u03A3b' | od -tx1 -An 3437 61 ce a3 62 3438 @end group 3439 @end example 3440 @codequoteundirected off 3441 @codequotebacktick off 3442 3443 @noindent 3444 To force @command{sed} to process octets separately, use the @code{C} locale 3445 (also known as the @code{POSIX} locale): 3446 3447 @codequoteundirected on 3448 @codequotebacktick on 3449 @example 3450 $ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g' 3451 XXXX 3452 @end example 3453 @codequoteundirected off 3454 @codequotebacktick off 3455 3456 @subsection Invalid multibyte characters 3457 3458 @command{sed}'s regular expressions @emph{do not} match 3459 invalid multibyte sequences in a multibyte locale. 3460 3461 @noindent 3462 In the following examples, the octet @code{0xCE} is 3463 an incomplete multibyte character (shown here as @value{unicodeFFFD}). 
3464 The regular expression @samp{.} does not match it: 3465 3466 @codequoteundirected on 3467 @codequotebacktick on 3468 @example 3469 @group 3470 $ printf 'a\xCEb\n' 3471 a@value{unicodeFFFD}b 3472 3473 $ printf 'a\xCEb\n' | sed 's/./X/g' 3474 X@value{unicodeFFFD}X 3475 3476 $ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An 3477 58 ce 58 0a 3478 X X \n 3479 @end group 3480 @end example 3481 @codequoteundirected off 3482 @codequotebacktick off 3483 3484 @noindent Similarly, the 'catch-all' regular expression @samp{.*} does not 3485 match the entire line: 3486 3487 @codequoteundirected on 3488 @codequotebacktick on 3489 @example 3490 @group 3491 $ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An 3492 ce 63 0a 3493 c \n 3494 @end group 3495 @end example 3496 @codequoteundirected off 3497 @codequotebacktick off 3498 3499 @noindent 3500 @value{SSED} offers the special @command{z} command to clear the 3501 current pattern space regardless of invalid multibyte characters 3502 (i.e. it works like @code{s/.*//} but also removes invalid multibyte 3503 characters): 3504 3505 @codequoteundirected on 3506 @codequotebacktick on 3507 @example 3508 @group 3509 $ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An 3510 0a 3511 \n 3512 @end group 3513 @end example 3514 @codequoteundirected off 3515 @codequotebacktick off 3516 3517 @noindent Alternatively, force the @code{C} locale to process 3518 each octet separately (every octet is a valid character in the @code{C} 3519 locale): 3520 3521 @codequoteundirected on 3522 @codequotebacktick on 3523 @example 3524 @group 3525 $ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An 3526 0a 3527 \n 3528 @end group 3529 @end example 3530 @codequoteundirected off 3531 @codequotebacktick off 3532 3533 3534 @command{sed}'s inability to process invalid multibyte characters 3535 can be used to detect such invalid sequences in a file. 
3536 In the following examples, @code{\xCE\xCE} is an invalid 3537 multibyte sequence, while @code{\xCE\xA3} is a valid multibyte sequence 3538 (of the Greek Sigma character). 3539 3540 @noindent 3541 The following @command{sed} program removes all valid 3542 characters using @code{s/.//g}. Any content left in the pattern space 3543 (the invalid characters) is added to the hold space using the 3544 @code{H} command. On the last line (@code{$}), the hold space is retrieved 3545 (@code{x}), newlines are removed (@code{s/\n//g}), and any remaining 3546 octets are printed unambiguously (@code{l}). Thus, any invalid 3547 multibyte sequences are printed as octal values: 3548 3549 @codequoteundirected on 3550 @codequotebacktick on 3551 @example 3552 @group 3553 $ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt 3554 3555 $ cat invalid.txt 3556 ab 3557 c 3558 @value{unicodeFFFD}@value{unicodeFFFD}de 3559 @value{ucsigma}f 3560 3561 $ sed -n 's/.//g ; H ; $@{x;s/\n//g;l@}' invalid.txt 3562 \316\316$ 3563 @end group 3564 @end example 3565 @codequoteundirected off 3566 @codequotebacktick off 3567 3568 @noindent With a few more commands, @command{sed} can print 3569 the exact line number corresponding to each invalid character (line 3). 3570 These characters can then be removed by forcing the @code{C} locale 3571 and using octal escape sequences: 3572 3573 @codequoteundirected on 3574 @codequotebacktick on 3575 @example 3576 $ sed -n 's/.//g;=;l' invalid.txt | paste - - | awk '$2!="$"' 3577 3 \316\316$ 3578 3579 $ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt 3580 @end example 3581 @codequoteundirected off 3582 @codequotebacktick off 3583 3584 @subsection Upper/Lower case conversion 3585 3586 3587 @value{SSED}'s substitute command (@code{s}) supports upper/lower 3588 case conversions using @code{\U},@code{\L} codes. 
3589 These conversions support multibyte characters: 3590 3591 @codequoteundirected on 3592 @codequotebacktick on 3593 @example 3594 $ printf 'ABC\u03a3\n' 3595 ABC@value{ucsigma} 3596 3597 $ printf 'ABC\u03a3\n' | sed 's/.*/\L&/' 3598 abc@value{lcsigma} 3599 @end example 3600 @codequoteundirected off 3601 @codequotebacktick off 3602 3603 @noindent 3604 @xref{The "s" Command}. 3605 3606 3607 @subsection Multibyte regexp character classes 3608 3609 @c TODO: fix following paragraphs (copied verbatim from 'bracket 3610 @c expression' section). 3611 3612 In other locales, the sorting sequence is not specified, and 3613 @samp{[a-d]} might be equivalent to @samp{[abcd]} or to 3614 @samp{[aBbCcDd]}, or it might fail to match any character, or the set of 3615 characters that it matches might even be erratic. 3616 To obtain the traditional interpretation 3617 of bracket expressions, you can use the @samp{C} locale by setting the 3618 @env{LC_ALL} environment variable to the value @samp{C}. 3619 3620 @example 3621 # TODO: is there any real-world system/locale where 'A' 3622 # is replaced by '-' ? 3623 $ echo A | sed 's/[a-z]/-/' 3624 A 3625 @end example 3626 3627 Their interpretation depends on the @env{LC_CTYPE} locale; 3628 for example, @samp{[[:alnum:]]} means the character class of numbers and letters 3629 in the current locale. 3630 3631 TODO: show example of collation 3632 3633 @codequoteundirected on 3634 @codequotebacktick on 3635 @example 3636 # TODO: this works on glibc systems, not on musl-libc/freebsd/macosx. 
3637 $ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g' 3638 clichX 3639 @end example 3640 @codequoteundirected off 3641 @codequotebacktick off 3642 3643 3644 @node advanced sed 3645 @chapter Advanced @command{sed}: cycles and buffers 3646 3647 @menu 3648 * Execution Cycle:: How @command{sed} works 3649 * Hold and Pattern Buffers:: 3650 * Multiline techniques:: Using D,G,H,N,P to process multiple lines 3651 * Branching and flow control:: 3652 @end menu 3653 3654 @node Execution Cycle 3655 @section How @command{sed} Works 3656 3657 @cindex Buffer spaces, pattern and hold 3658 @cindex Spaces, pattern and hold 3659 @cindex Pattern space, definition 3660 @cindex Hold space, definition 3661 @command{sed} maintains two data buffers: the active @emph{pattern} space, 3662 and the auxiliary @emph{hold} space. Both are initially empty. 3663 3664 @command{sed} operates by performing the following cycle on each 3665 line of input: first, @command{sed} reads one line from the input 3666 stream, removes any trailing newline, and places it in the pattern space. 3667 Then commands are executed; each command can have an address associated 3668 to it: addresses are a kind of condition code, and a command is only 3669 executed if the condition is verified before the command is to be 3670 executed. 3671 3672 When the end of the script is reached, unless the @option{-n} option 3673 is in use, the contents of pattern space are printed out to the output 3674 stream, adding back the trailing newline if it was removed.@footnote{Actually, 3675 if @command{sed} prints a line without the terminating newline, it will 3676 nevertheless print the missing newline as soon as more text is sent to 3677 the same output stream, which gives the ``least expected surprise'' 3678 even though it does not make commands like @samp{sed -n p} exactly 3679 identical to @command{cat}.} Then the next cycle starts for the next 3680 input line. 
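The automatic print at the end of each cycle can be seen with a short sketch (the two-line input is made up for illustration): without @option{-n}, every line is printed after the script has run on it, while with @option{-n} nothing is printed unless a command such as @code{p} asks for it.

```shell
# Without -n: the pattern space is printed at the end of every cycle.
with_autoprint=$(printf 'one\ntwo\n' | sed 's/o/0/')
echo "$with_autoprint"
# prints:
# 0ne
# tw0

# With -n: the substitution still runs on each line, but nothing is
# printed because no command (such as p) requests output.
without_autoprint=$(printf 'one\ntwo\n' | sed -n 's/o/0/')
echo "[$without_autoprint]"
# prints: []
```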
3681 3682 Unless special commands (like @samp{D}) are used, the pattern space is 3683 deleted between two cycles. The hold space, on the other hand, keeps 3684 its data between cycles (see commands @samp{h}, @samp{H}, @samp{x}, 3685 @samp{g}, @samp{G} to move data between both buffers). 3686 3687 @node Hold and Pattern Buffers 3688 @section Hold and Pattern Buffers 3689 3690 TODO 3691 3692 @node Multiline techniques 3693 @section Multiline techniques - using D,G,H,N,P to process multiple lines 3694 3695 Multiple lines can be processed as one buffer using the 3696 @code{D},@code{G},@code{H},@code{N},@code{P}. They are similar to 3697 their lowercase counterparts (@code{d},@code{g}, 3698 @code{h},@code{n},@code{p}), except that these commands append or 3699 subtract data while respecting embedded newlines - allowing adding and 3700 removing lines from the pattern and hold spaces. 3701 3702 They operate as follows: 1544 3703 @table @code 1545 @item \w 1546 Matches any ``word'' character. A ``word'' character is any 1547 letter or digit or the underscore character. 1548 1549 @item \W 1550 Matches any ``non-word'' character. 1551 1552 @item \b 1553 Matches a word boundary; that is it matches if the character 1554 to the left is a ``word'' character and the character to the 1555 right is a ``non-word'' character, or vice-versa. 1556 1557 @item \B 1558 Matches everywhere but on a word boundary; that is it matches 1559 if the character to the left and the character to the right 1560 are either both ``word'' characters or both ``non-word'' 1561 characters. 1562 1563 @item \` 1564 Matches only at the start of pattern space. This is different 1565 from @code{^} in multi-line mode. 1566 1567 @item \' 1568 Matches only at the end of pattern space. This is different 1569 from @code{$} in multi-line mode. 
1570 1571 @ifset PERL 1572 @item \G 1573 Match only at the start of pattern space or, when doing a global 1574 substitution using the @code{s///g} command and option, at 1575 the end-of-match position of the prior match. For example, 1576 @samp{s/\Ga/Z/g} will change an initial run of @code{a}s to 1577 a run of @code{Z}s 1578 @end ifset 3704 @item D 3705 @emph{deletes} line from the pattern space until the first newline, 3706 and restarts the cycle. 3707 3708 @item G 3709 @emph{appends} line from the hold space to the pattern space, with a 3710 newline before it. 3711 3712 @item H 3713 @emph{appends} line from the pattern space to the hold space, with a 3714 newline before it. 3715 3716 @item N 3717 @emph{appends} line from the input file to the pattern space. 3718 3719 @item P 3720 @emph{prints} line from the pattern space until the first newline. 3721 1579 3722 @end table 3723 3724 3725 The following example illustrates the operation of @code{N} and 3726 @code{D} commands: 3727 3728 @codequoteundirected on 3729 @codequotebacktick on 3730 @example 3731 @group 3732 $ seq 6 | sed -n 'N;l;D' 3733 1\n2$ 3734 2\n3$ 3735 3\n4$ 3736 4\n5$ 3737 5\n6$ 3738 @end group 3739 @end example 3740 @codequoteundirected off 3741 @codequotebacktick off 3742 3743 @enumerate 3744 @item 3745 @command{sed} starts by reading the first line into the pattern space 3746 (i.e. @samp{1}). 3747 @item 3748 At the beginning of every cycle, the @code{N} 3749 command appends a newline and the next line to the pattern space 3750 (i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle). 3751 @item 3752 The @code{l} command prints the content of the pattern space 3753 unambiguously. 3754 @item 3755 The @code{D} command then removes the content of pattern 3756 space up to the first newline (leaving @samp{2} at the end of 3757 the first cycle). 3758 @item 3759 At the next cycle the @code{N} command appends a 3760 newline and the next input line to the pattern space 3761 (e.g. 
@samp{2}, @samp{\n}, @samp{3}). 3762 @end enumerate 3763 3764 3765 @cindex processing paragraphs 3766 @cindex paragraphs, processing 3767 A common technique to process blocks of text such as paragraphs 3768 (instead of line-by-line) is using the following construct: 3769 3770 @codequoteundirected on 3771 @codequotebacktick on 3772 @example 3773 sed '/./@{H;$!d@} ; x ; s/REGEXP/REPLACEMENT/' 3774 @end example 3775 @codequoteundirected off 3776 @codequotebacktick off 3777 3778 @enumerate 3779 @item 3780 The first expression, @code{/./@{H;$!d@}} operates on all non-empty lines, 3781 and adds the current line (in the pattern space) to the hold space. 3782 On all lines except the last, the pattern space is deleted and the cycle is 3783 restarted. 3784 3785 @item 3786 The other expressions @code{x} and @code{s} are executed only on empty 3787 lines (i.e. paragraph separators). The @code{x} command fetches the 3788 accumulated lines from the hold space back to the pattern space. The 3789 @code{s///} command then operates on all the text in the paragraph 3790 (including the embedded newlines). 
3791 @end enumerate 3792 3793 The following example demonstrates this technique: 3794 @codequoteundirected on 3795 @codequotebacktick on 3796 @example 3797 @group 3798 $ cat input.txt 3799 a a a aa aaa 3800 aaaa aaaa aa 3801 aaaa aaa aaa 3802 3803 bbbb bbb bbb 3804 bb bb bbb bb 3805 bbbbbbbb bbb 3806 3807 ccc ccc cccc 3808 cccc ccccc c 3809 cc cc cc cc 3810 3811 $ sed '/./@{H;$!d@} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt 3812 3813 START--> 3814 a a a aa aaa 3815 aaaa aaaa aa 3816 aaaa aaa aaa 3817 <--END 3818 3819 START--> 3820 bbbb bbb bbb 3821 bb bb bbb bb 3822 bbbbbbbb bbb 3823 <--END 3824 3825 START--> 3826 ccc ccc cccc 3827 cccc ccccc c 3828 cc cc cc cc 3829 <--END 3830 @end group 3831 @end example 3832 @codequoteundirected off 3833 @codequotebacktick off 3834 3835 For more annotated examples, @pxref{Text search across multiple lines} 3836 and @ref{Line length adjustment}. 3837 3838 @node Branching and flow control 3839 @section Branching and Flow Control 3840 3841 The branching commands @code{b}, @code{t}, and @code{T} enable 3842 changing the flow of @command{sed} programs. 3843 3844 By default, @command{sed} reads an input line into the pattern buffer, 3845 then continues to process all commands in order. 3846 Commands without addresses affect all lines. 3847 Commands with addresses affect only matching lines. 3848 @xref{Execution Cycle} and @ref{Addresses overview}. 3849 3850 @command{sed} does not support a typical @code{if/then} construct. 3851 Instead, some commands can be used as conditionals or to change the 3852 default flow control: 3853 3854 @table @code 3855 3856 @item d 3857 delete (clear) the current pattern space, 3858 and restart the program cycle without processing the rest of the commands 3859 and without printing the pattern space. 
3860 3861 @item D 3862 delete the contents of the pattern space @emph{up to the first newline}, 3863 and restart the program cycle without processing the rest of 3864 the commands and without printing the pattern space. 3865 3866 @item [addr]X 3867 @itemx [addr]@{ X ; X ; X @} 3868 @item /regexp/X 3869 @item /regexp/@{ X ; X ; X @} 3870 Addresses and regular expressions can be used as an @code{if/then} 3871 conditional: If @var{[addr]} matches the current pattern space, 3872 execute the command(s). 3873 For example: The command @code{/^#/d} means: 3874 @emph{if} the current pattern space matches the regular expression @code{^#} (a line 3875 starting with a hash), @emph{then} execute the @code{d} command: 3876 delete the line without printing it, and restart the program cycle 3877 immediately. 3878 3879 @item b 3880 branch unconditionally (that is: always jump to a label, skipping 3881 or repeating other commands, without restarting a new cycle). Combined 3882 with an address, the branch can be conditionally executed on matched 3883 lines. 3884 3885 @item t 3886 branch conditionally (that is: jump to a label) @emph{only if} a 3887 @code{s///} command has succeeded since the last input line was read 3888 or another conditional branch was taken. 3889 3890 @item T 3891 similar but opposite to the @code{t} command: branch only if 3892 there have been @emph{no} successful substitutions since the last 3893 input line was read. 3894 @end table 3895 3896 3897 The following two @command{sed} programs are equivalent. The first 3898 (contrived) example uses the @code{b} command to skip the @code{s///} 3899 command on lines containing @samp{1}. The second example uses an 3900 address with negation (@samp{!}) to perform substitution only on 3901 desired lines. 
The @code{y///} command is still executed on all 3902 lines: 3903 3904 @codequoteundirected on 3905 @codequotebacktick on 3906 @example 3907 @group 3908 $ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/' 3909 a4 3910 z5 3911 z6 3912 3913 $ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/' 3914 a4 3915 z5 3916 z6 3917 @end group 3918 @end example 3919 @codequoteundirected off 3920 @codequotebacktick off 3921 3922 3923 3924 @subsection Branching and Cycles 3925 @cindex labels 3926 @cindex omitting labels 3927 @cindex cycle, restarting 3928 @cindex restarting a cycle 3929 The @code{b}, @code{t} and @code{T} commands can be followed by a label 3930 (typically a single letter). Labels are defined with a colon followed by 3931 one or more letters (e.g. @samp{:x}). If the label is omitted, the 3932 branch commands restart the cycle. Note the difference between 3933 branching to a label and restarting the cycle: when a cycle is 3934 restarted, @command{sed} first prints the current content of the 3935 pattern space, then reads the next input line into the pattern space; 3936 jumping to a label (even if it is at the beginning of the program) 3937 does not print the pattern space and does not read the next input line. 3938 3939 The following program is a no-op. The @code{b} command (the only command 3940 in the program) does not have a label, and thus simply restarts the cycle. 3941 On each cycle, the pattern space is printed and the next input line is read: 3942 3943 @example 3944 @group 3945 $ seq 3 | sed b 3946 1 3947 2 3948 3 3949 @end group 3950 @end example 3951 3952 @cindex infinite loop, branching 3953 @cindex branching, infinite loop 3954 The following example is an infinite loop: it doesn't terminate and 3955 doesn't print anything. 
The @code{b} command jumps to the @samp{x} 3956 label, and a new cycle is never started: 3957 3958 @codequoteundirected on 3959 @codequotebacktick on 3960 @example 3961 @group 3962 $ seq 3 | sed ':x ; bx' 3963 3964 # The above command requires gnu sed (which supports additional 3965 # commands following a label, without a newline). A portable equivalent: 3966 # sed -e ':x' -e bx 3967 @end group 3968 @end example 3969 @codequoteundirected off 3970 @codequotebacktick off 3971 3972 @cindex branching and n, N 3973 @cindex n, and branching 3974 @cindex N, and branching 3975 Branching is often complemented with the @code{n} or @code{N} commands: 3976 both commands read the next input line into the pattern space without waiting 3977 for the cycle to restart. Before reading the next input line, @code{n} 3978 prints the current pattern space then empties it, while @code{N} 3979 appends a newline and the next input line to the pattern space. 3980 3981 Consider the following two examples: 3982 3983 @codequoteundirected on 3984 @codequotebacktick on 3985 @example 3986 @group 3987 $ seq 3 | sed ':x ; n ; bx' 3988 1 3989 2 3990 3 3991 3992 $ seq 3 | sed ':x ; N ; bx' 3993 1 3994 2 3995 3 3996 @end group 3997 @end example 3998 @codequoteundirected off 3999 @codequotebacktick off 4000 4001 @itemize 4002 @item 4003 Neither example loops forever, despite never starting a new cycle. 4004 4005 @item 4006 In the first example, the @code{n} command first prints the content 4007 of the pattern space, empties the pattern space, then reads the next 4008 input line. 4009 4010 @item 4011 In the second example, the @code{N} command appends the next input 4012 line to the pattern space (with a newline). Lines are accumulated in 4013 the pattern space until there are no more input lines to read, then 4014 the @code{N} command terminates the @command{sed} program. When the 4015 program terminates, the end-of-cycle actions are performed, and the 4016 entire pattern space is printed. 
4017 4018 @item 4019 The second example requires @value{SSED}, 4020 because it uses the non-POSIX-standard behavior of @code{N}. 4021 See the ``@code{N} command on the last line'' paragraph 4022 in @ref{Reporting Bugs}. 4023 4024 @item 4025 To further examine the difference between the two examples, 4026 try the following commands: 4027 @codequoteundirected on 4028 @codequotebacktick on 4029 @example 4030 @group 4031 printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx' 4032 printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx' 4033 printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx' 4034 printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx' 4035 @end group 4036 @end example 4037 @codequoteundirected off 4038 @codequotebacktick off 4039 4040 @end itemize 4041 4042 4043 4044 @subsection Branching example: joining lines 4045 4046 @cindex joining lines with branching 4047 @cindex branching, joining lines 4048 @cindex quoted-printable lines, joining 4049 @cindex joining quoted-printable lines 4050 @cindex t, joining lines with 4051 @cindex b, joining lines with 4052 @cindex b, versus t 4053 @cindex t, versus b 4054 As a real-world example of using branching, consider the case of 4055 @uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files, 4056 typically used to encode email messages. 4057 In these files long lines are split and marked with a @dfn{soft line break} 4058 consisting of a single @samp{=} character at the end of the line: 4059 4060 @example 4061 @group 4062 $ cat jaques.txt 4063 All the wor= 4064 ld's a stag= 4065 e, 4066 And all the= 4067 men and wo= 4068 men merely = 4069 players: 4070 They have t= 4071 heir exits = 4072 and their e= 4073 ntrances; 4074 And one man= 4075 in his tim= 4076 e plays man= 4077 y parts. 
4078 @end group 4079 @end example 4080 4081 4082 The following program uses an address match @samp{/=$/} as a 4083 conditional: If the current pattern space ends with an @samp{=}, it 4084 reads the next input line using @code{N}, removes every @samp{=} 4085 character that is followed by a newline (together with that newline), and unconditionally 4086 branches (@code{b}) to the beginning of the program without restarting 4087 a new cycle. If the pattern space does not end with @samp{=}, the 4088 default action is performed: the pattern space is printed and a new 4089 cycle is started: 4090 4091 @codequoteundirected on 4092 @codequotebacktick on 4093 @example 4094 @group 4095 $ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt 4096 All the world's a stage, 4097 And all the men and women merely players: 4098 They have their exits and their entrances; 4099 And one man in his time plays many parts. 4100 @end group 4101 @end example 4102 @codequoteundirected off 4103 @codequotebacktick off 4104 4105 Here's an alternative program with a slightly different approach: On 4106 all lines except the last, @code{N} appends the next line to the pattern 4107 space. A substitution command then removes soft line breaks 4108 (@samp{=} at the end of a line, i.e. followed by a newline) by replacing 4109 them with an empty string. 4110 @emph{If} the substitution was successful (meaning the pattern space contained 4111 a line which should be joined), the conditional branch command @code{t} jumps 4112 to the beginning of the program without completing or restarting the cycle. 4113 If the substitution failed (meaning there were no soft line breaks), 4114 the @code{t} command will @emph{not} branch. Then, @code{P} will 4115 print the pattern space content up to the first newline, and @code{D} 4116 will delete the pattern space content up to the first newline. 4117 (To learn more about the @code{N}, @code{P} and @code{D} commands, 4118 @pxref{Multiline techniques}). 
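As a minimal sketch of the @code{t}-based loop just described, the same commands can be written one per line with a comment on each step. The function name @code{join_soft_breaks} and the four-line test input are illustrative assumptions, not part of the manual's examples; the sed commands themselves are the ones named in the paragraph above:

```shell
# join_soft_breaks: hypothetical helper name; reads stdin, writes stdout.
join_soft_breaks() {
  sed '
# Start of the loop.
:x
# Unless this is the last input line, append the next line.
$!N
# Remove one soft line break (an equals sign followed by a newline).
s/=\n//
# If the substitution succeeded, jump back to x without ending the cycle.
tx
# Otherwise print the pattern space up to the first newline,
P
# then delete that part and restart the cycle with what is left.
D
'
}

# Illustrative input: the first three lines form one logical line.
printf '%s\n' 'ab=' 'cd=' 'ef' 'gh' | join_soft_breaks
```

With this input the first three lines should come out joined as @samp{abcdef}, followed by @samp{gh} on its own line. Putting each command on its own line also avoids relying on commands following a label on the same line.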
4119 4120 4121 @codequoteundirected on 4122 @codequotebacktick on 4123 @example 4124 @group 4125 $ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt 4126 All the world's a stage, 4127 And all the men and women merely players: 4128 They have their exits and their entrances; 4129 And one man in his time plays many parts. 4130 @end group 4131 @end example 4132 @codequoteundirected off 4133 @codequotebacktick off 4134 4135 4136 For more line-joining examples @pxref{Joining lines}. 4137 1580 4138 1581 4139 @node Examples … … 1586 4144 1587 4145 @menu 4146 4147 Useful one-liners: 4148 * Joining lines:: 4149 1588 4150 Some exotic examples: 1589 4151 * Centering lines:: … … 1592 4154 * Print bash environment:: 1593 4155 * Reverse chars of lines:: 4156 * Text search across multiple lines:: 4157 * Line length adjustment:: 4158 * Adding a header to multiple files:: 1594 4159 1595 4160 Emulating standard utilities: … … 1608 4173 @end menu 1609 4174 4175 @node Joining lines 4176 @section Joining lines 4177 4178 This section uses @code{N}, @code{D} and @code{P} commands to process 4179 multiple lines, and the @code{b} and @code{t} commands for branching. 4180 @xref{Multiline techniques} and @ref{Branching and flow control}. 4181 4182 Join specific lines (e.g. if lines 2 and 3 need to be joined): 4183 4184 @codequoteundirected on 4185 @codequotebacktick on 4186 @example 4187 $ cat lines.txt 4188 hello 4189 hel 4190 lo 4191 hello 4192 4193 $ sed '2@{N;s/\n//;@}' lines.txt 4194 hello 4195 hello 4196 hello 4197 @end example 4198 @codequoteundirected off 4199 @codequotebacktick off 4200 4201 Join backslash-continued lines: 4202 4203 @codequoteundirected on 4204 @codequotebacktick on 4205 @example 4206 $ cat 1.txt 4207 this \ 4208 is \ 4209 a \ 4210 long \ 4211 line 4212 and another \ 4213 line 4214 4215 $ sed -e ':x /\\$/ @{ N; s/\\\n//g ; bx @}' 1.txt 4216 this is a long line 4217 and another line 4218 4219 4220 #TODO: The above requires gnu sed. 
4221 # non-gnu seds need newlines after ':' and 'b' 4222 @end example 4223 @codequoteundirected off 4224 @codequotebacktick off 4225 4226 Join lines that start with whitespace (e.g SMTP headers): 4227 4228 @codequoteundirected on 4229 @codequotebacktick on 4230 @example 4231 @group 4232 $ cat 2.txt 4233 Subject: Hello 4234 World 4235 Content-Type: multipart/alternative; 4236 boundary=94eb2c190cc6370f06054535da6a 4237 Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT) 4238 Authentication-Results: mx.gnu.org; 4239 dkim=pass header.i=@@gnu.org; 4240 spf=pass 4241 Message-ID: <abcdef@@gnu.org> 4242 From: John Doe <jdoe@@gnu.org> 4243 To: Jane Smith <jsmith@@gnu.org> 4244 4245 $ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt 4246 Subject: Hello World 4247 Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a 4248 Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT) 4249 Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass 4250 Message-ID: <abcdef@@gnu.org> 4251 From: John Doe <jdoe@@gnu.org> 4252 To: Jane Smith <jsmith@@gnu.org> 4253 4254 # A portable (non-gnu) variation: 4255 # sed -e :a -e '$!N;s/\n */ /;ta' -e 'P;D' 4256 @end group 4257 @end example 4258 @codequoteundirected off 4259 @codequotebacktick off 4260 4261 1610 4262 @node Centering lines 1611 4263 @section Centering Lines … … 1634 4286 1635 4287 @group 1636 # del leading and trailing spaces1637 y/@kbd{ tab}/ /4288 # delete leading and trailing spaces 4289 y/@kbd{@key{TAB}}/ / 1638 4290 s/^ *// 1639 4291 s/ *$// … … 1684 4336 1685 4337 @group 1686 # replace all leading 9s by _ (any other character except digits, could4338 # replace all trailing 9s by _ (any other character except digits, could 1687 4339 # be used) 1688 4340 :d … … 1694 4346 # incr last digit only. The first line adds a most-significant 1695 4347 # digit of 1 if we have to add a digit. 
1696 #1697 # The @code{tn} commands are not necessary, but make the thing1698 # faster1699 4348 @end group 1700 4349 … … 1727 4376 seen a script converting the output of @command{date} into a @command{bc} 1728 4377 program! 1729 4378 1730 4379 The main body of this is the @command{sed} script, which remaps the name 1731 from lower to upper (or vice-versa) and even checks out 4380 from lower to upper (or vice-versa) and even checks out 1732 4381 if the remapped name is the same as the original name. 1733 4382 Note how the script is parameterized using shell … … 1738 4387 @group 1739 4388 #! /bin/sh 1740 # rename files to lower/upper case... 4389 # rename files to lower/upper case... 1741 4390 # 1742 # usage: 1743 # move-to-lower * 1744 # move-to-upper * 4391 # usage: 4392 # move-to-lower * 4393 # move-to-upper * 1745 4394 # or 1746 4395 # move-to-lower -R . … … 1752 4401 help() 1753 4402 @{ 1754 4403 cat << eof 1755 4404 Usage: $0 [-n] [-r] [-h] files... 1756 4405 @end group … … 1785 4434 while : 1786 4435 do 1787 case "$1" in 4436 case "$1" in 1788 4437 -n) apply_cmd='cat' ;; 1789 4438 -R) finder='find "$@@" -type f';; … … 1813 4462 esac 1814 4463 @end group 1815 4464 1816 4465 eval $finder | sed -n ' 1817 4466 … … 1855 4504 @group 1856 4505 # check if converted file name is equal to original file name, 1857 # if it is, do not print nothing4506 # if it is, do not print anything 1858 4507 /^.*\/\(.*\)\n\1/b 4508 @end group 4509 4510 @group 4511 # escape special characters for the shell 4512 s/["$`\\]/\\&/g 1859 4513 @end group 1860 4514 … … 1974 4628 @c end--------------------------------------------- 1975 4629 4630 4631 @node Text search across multiple lines 4632 @section Text search across multiple lines 4633 4634 This section uses @code{N} and @code{D} commands to search for 4635 consecutive words spanning multiple lines. @xref{Multiline techniques}. 4636 4637 These examples deal with finding doubled occurrences of words in a document. 
4638 4639 Finding doubled words in a single line is easy using GNU @command{grep} 4640 and similarly with @value{SSED}: 4641 4642 @c NOTE: in all examples, 'the@ the' is used to prevent 4643 @c 'make syntax-check' from complaining about double words. 4644 @codequoteundirected on 4645 @codequotebacktick on 4646 @example 4647 @group 4648 $ cat two-cities-dup1.txt 4649 It was the best of times, 4650 it was the worst of times, 4651 it was the@ the age of wisdom, 4652 it was the age of foolishness, 4653 4654 $ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt 4655 it was the@ the age of wisdom, 4656 4657 $ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt 4658 3:it was the@ the age of wisdom, 4659 4660 $ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt 4661 it was the@ the age of wisdom, 4662 4663 $ sed -En '/\b(\w+)\s+\1\b/@{=;p@}' two-cities-dup1.txt 4664 3 4665 it was the@ the age of wisdom, 4666 @end group 4667 @end example 4668 @codequoteundirected off 4669 @codequotebacktick off 4670 4671 @itemize @bullet 4672 @item 4673 The regular expression @samp{\b\w+\s+} searches for a word boundary (@samp{\b}), 4674 followed by one or more word characters (@samp{\w+}), followed by whitespace 4675 (@samp{\s+}). @xref{regexp extensions}. 4676 4677 @item 4678 Adding parentheses around the @samp{\w+} expression creates a subexpression. 4679 The regular expression pattern @samp{(PATTERN)\s+\1} defines a subexpression 4680 (in the parentheses) followed by a back-reference, separated by whitespace. 4681 A successful match means the @var{PATTERN} was repeated twice in succession. 4682 @xref{Back-references and Subexpressions}. 4683 4684 @item 4685 The word-boundary expression (@samp{\b}) at both ends ensures partial 4686 words are not matched (e.g. @samp{the then} is not a desired match). 
4687 @c Thanks to Jim for pointing this out in 4688 @c https://lists.gnu.org/archive/html/sed-devel/2016-12/msg00041.html 4689 4690 @item 4691 The @option{-E} option enables extended regular expression syntax, alleviating 4692 the need to add backslashes before the parentheses. @xref{ERE syntax}. 4693 4694 @end itemize 4695 4696 When a doubled word spans two lines, the above regular expression 4697 will not find it, as @command{grep} and @command{sed} operate line-by-line. 4698 4699 By using @command{N} and @command{D} commands, @command{sed} can apply 4700 regular expressions on multiple lines (that is, multiple lines are stored 4701 in the pattern space, and the regular expression is applied to all of them): 4702 4703 @c NOTE: use 'the@*the' instead of a real new line to prevent 4704 @c 'make syntax-check' from complaining about doubled-words. 4705 @codequoteundirected on 4706 @codequotebacktick on 4707 @example 4708 $ cat two-cities-dup2.txt 4709 It was the best of times, it was the 4710 worst of times, it was the@*the age of wisdom, 4711 it was the age of foolishness, 4712 4713 $ sed -En '@{N; /\b(\w+)\s+\1\b/@{=;p@} ; D@}' two-cities-dup2.txt 4714 3 4715 worst of times, it was the@*the age of wisdom, 4716 @end example 4717 @codequoteundirected off 4718 @codequotebacktick off 4719 4720 @itemize @bullet 4721 @item 4722 The @command{N} command appends the next line to the pattern space 4723 (thus ensuring it contains two consecutive lines in every cycle). 4724 4725 @item 4726 The regular expression uses @samp{\s+} as the word separator, which matches 4727 both spaces and newlines. 4728 4729 @item 4730 If the regular expression matches, the entire pattern space is printed 4731 with @command{p}. No lines are printed by default due to the @option{-n} option. 4732 4733 @item 4734 The @command{D} command removes the first line from the pattern space (up until the 4735 first newline), readying it for the next cycle. 
4736 @end itemize 4737 4738 See the GNU @command{coreutils} manual for an alternative solution using 4739 @command{tr -s} and @command{uniq} at 4740 @c NOTE: cheating and keeping the URL line shorter than 80 characters 4741 @c by using 'gnu.org' and '/s/'. 4742 @url{https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html}. 4743 4744 @node Line length adjustment 4745 @section Line length adjustment 4746 4747 This section uses @code{N} and @code{P} commands to read and write 4748 lines, and the @code{b} command for branching. 4749 @xref{Multiline techniques} and @ref{Branching and flow control}. 4750 4751 This (somewhat contrived) example deals with formatting and wrapping 4752 the lines of text in the following input file: 4753 4754 @example 4755 @group 4756 $ cat two-cities-mix.txt 4757 It was the best of times, it was 4758 the worst of times, it 4759 was the age of 4760 wisdom, 4761 it 4762 was 4763 the age 4764 of foolishness, 4765 @end group 4766 @end example 4767 4768 @exdent The following sed program wraps lines at 40 characters: 4769 @codequoteundirected on 4770 @codequotebacktick on 4771 @example 4772 @group 4773 $ cat wrap40.sed 4774 # outer loop 4775 :x 4776 4777 # Append a newline followed by the next input line to the pattern buffer 4778 N 4779 4780 # Replace all newlines in the pattern buffer with spaces 4781 s/\n/ /g 4782 4783 4784 # Inner loop 4785 :y 4786 4787 # Add a newline after the first 40 characters 4788 s/(.@{40,40@})/\1\n/ 4789 4790 # If there is a newline in the pattern buffer 4791 # (i.e. the previous substitution added a newline) 4792 /\n/ @{ 4793 # There are newlines in the pattern buffer - 4794 # print the content until the first newline. 
4795 P 4796 4797 # Remove the printed characters and the first newline 4798 s/.*\n// 4799 4800 # branch to label 'y' - repeat inner loop 4801 by 4802 @} 4803 4804 # No newlines in the pattern buffer - Branch to label 'x' (outer loop) 4805 # and read the next input line 4806 bx 4807 @end group 4808 @end example 4809 @codequoteundirected off 4810 @codequotebacktick off 4811 4812 4813 4814 @exdent The wrapped output: 4815 @codequoteundirected on 4816 @codequotebacktick on 4817 @example 4818 @group 4819 $ sed -E -f wrap40.sed two-cities-mix.txt 4820 It was the best of times, it was the wor 4821 st of times, it was the age of wisdom, i 4822 t was the age of foolishness, 4823 @end group 4824 @end example 4825 @codequoteundirected off 4826 @codequotebacktick off 4827 4828 4829 4830 4831 @node Adding a header to multiple files 4832 @section Adding a header to multiple files 4833 4834 @value{SSED} can be used to safely modify multiple files at once. 4835 4836 @exdent Add a single line to the beginning of source code files: 4837 4838 @codequoteundirected on 4839 @codequotebacktick on 4840 @example 4841 sed -i '1i/* Copyright (C) FOO BAR */' *.c 4842 @end example 4843 @codequoteundirected off 4844 @codequotebacktick off 4845 4846 @exdent Adding a few lines is possible using @samp{\n} in the text: 4847 4848 @codequoteundirected on 4849 @codequotebacktick on 4850 @example 4851 sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c 4852 @end example 4853 @codequoteundirected off 4854 @codequotebacktick off 4855 4856 To add multiple lines from another file, use @code{0rFILE}. 
4857 A typical use case is adding a license notice header to all files: 4858 4859 @codequoteundirected on 4860 @codequotebacktick on 4861 @example 4862 ## Create the header file: 4863 $ cat<<'EOF'>LIC.TXT 4864 /* 4865 Copyright (C) 1989-2021 FOO BAR 4866 4867 This program is free software; you can redistribute it and/or modify 4868 it under the terms of the GNU General Public License as published by 4869 the Free Software Foundation; either version 3, or (at your option) 4870 any later version. 4871 4872 This program is distributed in the hope that it will be useful, 4873 but WITHOUT ANY WARRANTY; without even the implied warranty of 4874 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 4875 GNU General Public License for more details. 4876 4877 You should have received a copy of the GNU General Public License 4878 along with this program; If not, see <https://www.gnu.org/licenses/>. 4879 */ 4880 EOF 4881 4882 ## Add the file at the beginning of all source code files: 4883 $ sed -i '0rLIC.TXT' *.cpp *.h 4884 @end example 4885 @codequoteundirected off 4886 @codequotebacktick off 4887 4888 4889 With script files (e.g. @file{.sh},@file{.py},@file{.pl} files) 4890 the license notice typically appears @emph{after} the first line (the 4891 'shebang' @samp{#!} line). The @code{1rFILE} command will add @file{FILE} 4892 @emph{after} the first line: 4893 4894 @codequoteundirected on 4895 @codequotebacktick on 4896 @example 4897 ## Create the header file: 4898 $ cat<<'EOF'>LIC.TXT 4899 ## 4900 ## Copyright (C) 1989-2021 FOO BAR 4901 ## 4902 ## This program is free software; you can redistribute it and/or modify 4903 ## it under the terms of the GNU General Public License as published by 4904 ## the Free Software Foundation; either version 3, or (at your option) 4905 ## any later version. 
4906 ## 4907 ## This program is distributed in the hope that it will be useful, 4908 ## but WITHOUT ANY WARRANTY; without even the implied warranty of 4909 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 4910 ## GNU General Public License for more details. 4911 ## 4912 ## You should have received a copy of the GNU General Public License 4913 ## along with this program; If not, see <https://www.gnu.org/licenses/>. 4914 ## 4915 ## 4916 EOF 4917 4918 ## Add the file after the first line of all script files: 4919 $ sed -i '1rLIC.TXT' *.py *.sh 4920 @end example 4921 @codequoteundirected off 4922 @codequotebacktick off 4923 4924 The above @command{sed} commands can be combined with @command{find} 4925 to locate files in all subdirectories, @command{xargs} to run additional 4926 commands on selected files and @command{grep} to filter out files that already 4927 contain a copyright notice: 4928 4929 @codequoteundirected on 4930 @codequotebacktick on 4931 @example 4932 find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \ 4933 | xargs grep -Li copyright \ 4934 | xargs -r sed -i '0rLIC.TXT' 4935 @end example 4936 @codequoteundirected off 4937 @codequotebacktick off 4938 4939 @exdent Or a slightly safer version (handling files with spaces and newlines): 4940 4941 @codequoteundirected on 4942 @codequotebacktick on 4943 @example 4944 find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \ 4945 | xargs -0 grep -Z -Li copyright \ 4946 | xargs -0 -r sed -i '0rLIC.TXT' 4947 @end example 4948 @codequoteundirected off 4949 @codequotebacktick off 4950 4951 Note: using the @code{0} address with the @code{r} command requires @value{SSED} 4952 version 4.9 or later. @xref{Zero Address}. 4953 4954 4955 1976 4956 @node tac 1977 4957 @section Reverse Lines of Files … … 1981 4961 is a @command{tac} workalike. 
1982 4962 1983 Note that on implementations other than @acronym{GNU} @command{sed} 1984 @ifset PERL 1985 and @value{SSED} 1986 @end ifset 4963 Note that on implementations other than GNU @command{sed} 1987 4964 this script might easily overflow internal buffers. 1988 4965 … … 2015 4992 2016 4993 This script replaces @samp{cat -n}; in fact it formats its output 2017 exactly like @acronym{GNU}@command{cat} does.4994 exactly like GNU @command{cat} does. 2018 4995 2019 4996 Of course this is completely useless and for two reasons: first, … … 2254 5231 @group 2255 5232 # Convert words to a's 2256 s/[ @kbd{ tab}][ @kbd{tab}]*/ /g5233 s/[ @kbd{@key{TAB}}][ @kbd{@key{TAB}}]*/ /g 2257 5234 s/^/ / 2258 5235 s/ [^ ][^ ]*/a /g … … 2431 5408 @c end--------------------------------------------- 2432 5409 2433 As you can see, we ma ntain a 2-line window using @code{P} and @code{D}.5410 As you can see, we maintain a 2-line window using @code{P} and @code{D}. 2434 5411 This technique is often used in advanced @command{sed} scripts. 2435 5412 … … 2585 5562 fastest. Note that loops are completely done with @code{n} and 2586 5563 @code{b}, without relying on @command{sed} to restart the 2587 thescript automatically at the end of a line.5564 script automatically at the end of a line. 2588 5565 2589 5566 @c start------------------------------------------- … … 2603 5580 # get next 2604 5581 n 2605 # got chars? print it again, etc... 5582 # got chars? print it again, etc... 2606 5583 /./bx 2607 5584 @end group … … 2631 5608 @chapter @value{SSED}'s Limitations and Non-limitations 2632 5609 2633 @cindex @acronym{GNU}extensions, unlimited line length5610 @cindex GNU extensions, unlimited line length 2634 5611 @cindex Portability, line length limitations 2635 5612 For those who want to write portable @command{sed} scripts, … … 2647 5624 the size of the buffer that can be processed by certain patterns. 
2648 5625 2649 @ifset PERL2650 There are some size limitations in the regular expression2651 matcher but it is hoped that they will never in practice2652 be relevant. The maximum length of a compiled pattern2653 is 65539 (sic) bytes. All values in repeating quantifiers2654 must be less than 65536. The maximum nesting depth of2655 all parenthesized subpatterns, including capturing and2656 non-capturing subpatterns@footnote{The2657 distinction is meaningful when referring to Perl-style2658 regular expressions.}, assertions, and other types of2659 subpattern, is 200.2660 2661 Also, @value{SSED} recognizes the @sc{posix} syntax2662 @code{[.@var{ch}.]} and @code{[=@var{ch}=]}2663 where @var{ch} is a ``collating element'', but these2664 are not supported, and an error is given if they are2665 encountered.2666 2667 Here are a few distinctions between the real Perl-style2668 regular expressions and those that @option{-R} recognizes.2669 2670 @enumerate2671 @item2672 Lookahead assertions do not allow repeat quantifiers after them2673 Perl permits them, but they do not mean what you2674 might think. For example, @samp{(?!a)@{3@}} does not assert that the2675 next three characters are not @samp{a}. It just asserts three times that the2676 next character is not @samp{a} --- a waste of time and nothing else.2677 2678 @item2679 Capturing subpatterns that occur inside negative lookahead2680 head assertions are counted, but their entries are counted2681 as empty in the second half of an @code{s} command.2682 Perl sets its numerical variables from any such patterns2683 that are matched before the assertion fails to match2684 something (thereby succeeding), but only if the negative2685 lookahead assertion contains just one branch.2686 2687 @item2688 The following Perl escape sequences are not supported:2689 @samp{\l}, @samp{\u}, @samp{\L}, @samp{\U}, @samp{\E},2690 @samp{\Q}. 
In fact these are implemented by Perl's general2691 string-handling and are not part of its pattern matching engine.2692 2693 @item2694 The Perl @samp{\G} assertion is not supported as it is not2695 relevant to single pattern matches.2696 2697 @item2698 Fairly obviously, @value{SSED} does not support the @samp{(?@{code@})}2699 and @samp{(?p@{code@})} constructions. However, there is some experimental2700 support for recursive patterns using the non-Perl item @samp{(?R)}.2701 2702 @item2703 There are at the time of writing some oddities in Perl2704 5.005_02 concerned with the settings of captured strings2705 when part of a pattern is repeated. For example, matching2706 @samp{aba} against the pattern @samp{/^(a(b)?)+$/} sets2707 @samp{$2}@footnote{@samp{$2} would be @samp{\2} in @value{SSED}.}2708 to the value @samp{b}, but matching @samp{aabbaa}2709 against @samp{/^(aa(bb)?)+$/} leaves @samp{$2}2710 unset. However, if the pattern is changed to2711 @samp{/^(aa(b(b))?)+$/} then @samp{$2} (and @samp{$3}) are set.2712 In Perl 5.004 @samp{$2} is set in both cases, and that is also2713 true of @value{SSED}.2714 2715 @item2716 Another as yet unresolved discrepancy is that in Perl2717 5.005_02 the pattern @samp{/^(a)?(?(1)a|b)+$/} matches2718 the string @samp{a}, whereas in @value{SSED} it does not.2719 However, in both Perl and @value{SSED} @samp{/^(a)?a/} matched2720 against @samp{a} leaves $1 unset.2721 @end enumerate2722 @end ifset2723 5626 2724 5627 @node Other Resources 2725 5628 @chapter Other Resources for Learning About @command{sed} 2726 5629 5630 For up to date information about @value{SSED} please 5631 visit @uref{https://www.gnu.org/software/sed/}. 5632 5633 Send general questions and suggestions to @email{sed-devel@@gnu.org}. 5634 Visit the mailing list archives for past discussions at 5635 @uref{https://lists.gnu.org/archive/html/sed-devel/}. 
5636 2727 5637 @cindex Additional reading about @command{sed} 2728 In addition to several books that have been written about @command{sed} 2729 (either specifically or as chapters in books which discuss 2730 shell programming), one can find out more about @command{sed} 2731 (including suggestions of a few books) from the FAQ 2732 for the @code{sed-users} mailing list, available from any of: 2733 @display 2734 @uref{http://www.student.northpark.edu/pemente/sed/sedfaq.html} 2735 @uref{http://sed.sf.net/grabbag/tutorials/sedfaq.html} 2736 @end display 2737 2738 Also of interest are 2739 @uref{http://www.student.northpark.edu/pemente/sed/index.htm} 2740 and @uref{http://sed.sf.net/grabbag}, 2741 which include @command{sed} tutorials and other @command{sed}-related goodies. 2742 2743 The @code{sed-users} mailing list itself maintained by Sven Guckes. 2744 To subscribe, visit @uref{http://groups.yahoo.com} and search 2745 for the @code{sed-users} mailing list. 5638 The following resources provide information about @command{sed} 5639 (both @value{SSED} and other variations). Note that these are not maintained by 5640 the @value{SSED} developers. 5641 5642 @itemize @bullet 5643 5644 @item 5645 sed @code{$HOME}: @uref{http://sed.sf.net} 5646 5647 @item 5648 sed FAQ: @uref{http://sed.sf.net/sedfaq.html} 5649 5650 @item 5651 seder's grabbag: @uref{http://sed.sf.net/grabbag} 5652 5653 @item 5654 The @code{sed-users} mailing list maintained by Sven Guckes: 5655 @uref{http://groups.yahoo.com/group/sed-users/} 5656 (note this is @emph{not} the @value{SSED} mailing list). 5657 5658 @end itemize 2746 5659 2747 5660 @node Reporting Bugs … … 2749 5662 2750 5663 @cindex Bugs, reporting 2751 Email bug reports to @email{bonzini@@gnu.org}. 2752 Be sure to include the word ``sed'' somewhere in the @code{Subject:} field. 5664 Email bug reports to @email{bug-sed@@gnu.org}. 2753 5665 Also, please include the output of @samp{sed --version} in the body 2754 5666 of your report if at all possible. 
… … 2757 5669 2758 5670 @example 2759 @i{ while building frobme-1.3.4} 2760 $ configure 5671 @i{@i{@r{while building frobme-1.3.4}}} 5672 $ configure 2761 5673 @error{} sed: file sedscr line 1: Unknown option to 's' 2762 5674 @end example … … 2777 5689 2778 5690 @table @asis 5691 @anchor{N_command_last_line} 2779 5692 @item @code{N} command on the last line 2780 5693 @cindex Portability, @code{N} command on the last line … … 2786 5699 the @command{-n} command switch has been specified. This choice is 2787 5700 by design. 5701 5702 Default behavior (GNU extension, non-POSIX conforming): 5703 @example 5704 $ seq 3 | sed N 5705 1 5706 2 5707 3 5708 @end example 5709 @noindent 5710 To force POSIX-conforming behavior: 5711 @example 5712 $ seq 3 | sed --posix N 5713 1 5714 2 5715 @end example 2788 5716 2789 5717 For example, the behavior of … … 2806 5734 /foo/@{ N;N;N;N;N;N;N;N;N; @} 2807 5735 @end example 2808 5736 2809 5737 @cindex @code{POSIXLY_CORRECT} behavior, @code{N} command 2810 5738 In any case, the simplest workaround is to use @code{$d;N} in … … 2813 5741 2814 5742 @item Regex syntax clashes (problems with backslashes) 2815 @cindex @acronym{GNU} extensions, to basic regular expressions 5743 @cindex GNU extensions, to basic regular expressions 2816 5744 @cindex Non-bugs, regex syntax clashes 2817 5745 @command{sed} uses the @sc{posix} basic regular expression syntax. According to … … 2821 5749 @code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}. 2822 5750 2823 As in all @acronym{GNU} programs that use @sc{posix} basic regular 5751 As in all GNU programs that use @sc{posix} basic regular 2824 5752 expressions, @command{sed} interprets these escape sequences as special 2825 5753 characters. So, @code{x\+} matches one or more occurrences of @samp{x}. … … 2832 5760 spurious backslashes if they are to be used with modern implementations 2833 5761 of @command{sed}, like 2834 @ifset PERL 2835 @value{SSED} or 2836 @end ifset 2837 @acronym{GNU} @command{sed}.
5762 GNU @command{sed}. 2838 5763 2839 5764 On the other hand, some scripts use s|abc\|def||g to remove occurrences … … 2841 5766 @command{sed} 4.0.x, newer versions interpret this as removing the 2842 5767 string @code{abc|def}. This is again undefined behavior according to 2843 @acronym{POSIX}, and this interpretation is arguably more robust: older 5768 POSIX, and this interpretation is arguably more robust: older 2844 5769 @command{sed}s, for example, required that the regex matcher parsed 2845 5770 @code{\/} as @code{/} in the common case of escaping a slash, which is … … 2847 5772 because the regex matcher is only partially under our control. 2848 5773 2849 @cindex @acronym{GNU} extensions, special escapes 5774 @cindex GNU extensions, special escapes 2850 5775 In addition, this version of @command{sed} supports several escape characters 2851 5776 (some of which are multi-character) to insert non-printable characters … … 2863 5788 (@pxref{Invoking sed, , Invocation}) lets you clobber 2864 5789 protected files. This is not a bug, but rather a consequence 2865 of how the Unix file system works. 5790 of how the Unix file system works. 2866 5791 2867 5792 The permissions on a file say what can happen to the data … … 2873 5798 modifying the contents of the directory, so the operation depends on 2874 5799 the permissions of the directory, not of the file. For this same 2875 reason, @command{sed} does not let you use @option{-i} on a writeable file 2876 in a read-only directory (but unbelievably nobody reports that as a 2877 bug@dots{}). 5800 reason, @command{sed} does not let you use @option{-i} on a writable file 5801 in a read-only directory, and will break hard or symbolic links when 5802 @option{-i} is used on such a file. 2878 5803 2879 5804 @item @code{0a} does not work (gives an error) 5805 @cindex @code{0} address 5806 @cindex GNU extensions, @code{0} address 5807 @cindex Non-bugs, @code{0} address 5808 2880 5809 There is no line 0.
0 is a special address that is only used to treat 2881 5810 addresses like @code{0,/@var{RE}/} as active when the script starts: if 2882 you write @code{1,/abc/d} and the first line includes the word @samp{abc}, 5811 you write @code{1,/abc/d} and the first line includes the string @samp{abc}, 2883 5812 then that match would be ignored because address ranges must span at least 2884 5813 two lines (barring the end of the file); but what you probably wanted is … … 2888 5817 @ifclear PERL 2889 5818 @item @code{[a-z]} is case insensitive 5819 @cindex Non-bugs, localization-related 5820 2890 5821 You are encountering problems with locales. POSIX mandates that @code{[a-z]} 2891 5822 uses the current locale's collation order -- in C parlance, that means using 2892 5823 @code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a 2893 case-insensitive collation order, others don't: one of those that have 2894 problems is Estonian. 5824 case-insensitive collation order, others don't. 2895 5825 2896 5826 Another problem is that @code{[a-z]} tries to use collation symbols. 2897 This only happens if you are on the @acronym{GNU} system, using 2898 @acronym{GNU} libc's regular expression matcher instead of compiling the 2899 one supplied with @acronym{GNU} sed. In a Danish locale, for example, 5827 This only happens if you are on the GNU system, using 5828 GNU libc's regular expression matcher instead of compiling the 5829 one supplied with GNU sed. In a Danish locale, for example, 2900 5830 the regular expression @code{^[a-z]$} matches the string @samp{aa}, 2901 5831 because this is a single collating symbol that comes after @samp{a} … … 2905 5835 To work around these problems, which may cause bugs in shell scripts, set 2906 5836 the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
5837 5838 @item @code{s/.*//} does not clear pattern space 5839 @cindex Non-bugs, localization-related 5840 @cindex @value{SSEDEXT}, emptying pattern space 5841 @cindex Emptying pattern space 5842 5843 This happens if your input stream includes invalid multibyte 5844 sequences. @sc{posix} mandates that such sequences 5845 are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear 5846 pattern space as you would expect. In fact, there is no way to clear 5847 sed's buffers in the middle of the script in most multibyte locales 5848 (including UTF-8 locales). For this reason, @value{SSED} provides a `z' 5849 command (for `zap') as an extension. 5850 5851 To work around these problems, which may cause bugs in shell scripts, set 5852 the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}. 2907 5853 @end ifclear 2908 5854 @end table 2909 5855 2910 5856 2911 @node Extended regexps 2912 @appendix Extended regular expressions 2913 @cindex Extended regular expressions, syntax 2914 2915 The only difference between basic and extended regular expressions is in 2916 the behavior of a few characters: @samp{?}, @samp{+}, parentheses, 2917 and braces (@samp{@{@}}). While basic regular expressions require 2918 these to be escaped if you want them to behave as special characters, 2919 when using extended regular expressions you must escape them if 2920 you want them @emph{to match a literal character}. 2921 2922 @noindent 2923 Examples: 2924 @table @code 2925 @item abc? 2926 becomes @samp{abc\?} when using extended regular expressions. It matches 2927 the literal string @samp{abc?}. 2928 2929 @item c\+ 2930 becomes @samp{c+} when using extended regular expressions. It matches 2931 one or more @samp{c}s. 2932 2933 @item a\@{3,\@} 2934 becomes @samp{a@{3,@}} when using extended regular expressions. It matches 2935 three or more @samp{a}s. 2936 2937 @item \(abc\)\@{2,3\@} 2938 becomes @samp{(abc)@{2,3@}} when using extended regular expressions. 
It 2939 matches either @samp{abcabc} or @samp{abcabcabc}. 2940 2941 @item \(abc*\)\1 2942 becomes @samp{(abc*)\1} when using extended regular expressions. 2943 Backreferences must still be escaped when using extended regular 2944 expressions. 2945 @end table 2946 2947 @ifset PERL 2948 @node Perl regexps 2949 @appendix Perl-style regular expressions 2950 @cindex Perl-style regular expressions, syntax 2951 2952 @emph{This part is taken from the @file{pcre.txt} file distributed together 2953 with the free @sc{pcre} regular expression matcher; it was written by Philip Hazel.} 2954 2955 Perl introduced several extensions to regular expressions, some 2956 of them incompatible with the syntax of regular expressions 2957 accepted by Emacs and other @acronym{GNU} tools (whose matcher was 2958 based on the Emacs matcher). @value{SSED} implements 2959 both kinds of extensions. 2960 2961 @iftex 2962 Summarizing, we have: 2963 2964 @itemize @bullet 2965 @item 2966 A backslash can introduce several special sequences 2967 2968 @item 2969 The circumflex, dollar sign, and period characters behave specially 2970 with regard to new lines 2971 2972 @item 2973 Strange uses of square brackets are parsed differently 2974 2975 @item 2976 You can toggle modifiers in the middle of a regular expression 2977 2978 @item 2979 You can specify that a subpattern does not count when numbering backreferences 2980 2981 @item 2982 @cindex Greedy regular expression matching 2983 You can specify greedy or non-greedy matching 2984 2985 @item 2986 You can have more than ten back references 2987 2988 @item 2989 You can do complex look aheads and look behinds (in the spirit of 2990 @code{\b}, but with subpatterns). 
2991 2992 @item 2993 You can often improve performance by preventing @command{sed} from wasting 2994 time on backtracking 2995 2996 @item 2997 You can have if/then/else branches 2998 2999 @item 3000 You can do recursive matches, for example to look for unbalanced parentheses 3001 3002 @item 3003 You can have comments and non-significant whitespace, because things can 3004 get complex... 3005 @end itemize 3006 3007 Most of these extensions are introduced by the special @code{(?} 3008 sequence, which gives special meanings to parenthesized groups. 3009 @end iftex 3010 @menu 3011 Other extensions can be roughly subdivided in two categories. 3012 On one hand Perl introduces several more escaped sequences 3013 (that is, sequences introduced by a backslash). On the other 3014 hand, it specifies that if a question mark follows an open 3015 parenthesis it should give a special meaning to the parenthesized 3016 group. 3017 3018 * Backslash:: Introduces special sequences 3019 * Circumflex/dollar sign/period:: Behave specially with regard to new lines 3020 * Square brackets:: Are a bit different in strange cases 3021 * Options setting:: Toggle modifiers in the middle of a regexp 3022 * Non-capturing subpatterns:: Are not counted when backreferencing 3023 * Repetition:: Allows for non-greedy matching 3024 * Backreferences:: Allows for more than 10 back references 3025 * Assertions:: Allows for complex look ahead matches 3026 * Non-backtracking subpatterns:: Often gives more performance 3027 * Conditional subpatterns:: Allows if/then/else branches 3028 * Recursive patterns:: For example to match parentheses 3029 * Comments:: Because things can get complex... 3030 @end menu 3031 3032 @node Backslash 3033 @appendixsec Backslash 3034 @cindex Perl-style regular expressions, escaped sequences 3035 3036 There are a few differences in the handling of backslashed 3037 sequences in Perl mode. 3038 3039 First of all, there are no @code{\o} and @code{\d} sequences.
3040 @sc{ascii} values for characters can be specified in octal 3041 with a @code{\@var{xxx}} sequence, where @var{xxx} is a 3042 sequence of up to three octal digits. If the first digit 3043 is a zero, the treatment of the sequence is straightforward; 3044 just note that if the character that follows the escaped digit 3045 is itself an octal digit, you have to supply three octal digits 3046 for @var{xxx}. For example @code{\07} is a @sc{bel} character 3047 rather than a @sc{nul} and a literal @code{7} (this sequence is 3048 instead represented by @code{\0007}). 3049 3050 @cindex Perl-style regular expressions, backreferences 3051 The handling of a backslash followed by a digit other than 0 3052 is complicated. Outside a character class, @command{sed} reads it 3053 and any following digits as a decimal number. If the number 3054 is less than 10, or if there have been at least that many 3055 previous capturing left parentheses in the expression, the 3056 entire sequence is taken as a back reference. A description 3057 of how this works is given later, following the discussion 3058 of parenthesized subpatterns. 3059 3060 Inside a character class, or if the decimal number is 3061 greater than 9 and there have not been that many capturing 3062 subpatterns, @command{sed} re-reads up to three octal digits following 3063 the backslash, and generates a single byte from the 3064 least significant 8 bits of the value. Any subsequent digits 3065 stand for themselves. 
For example: 3066 3067 @example 3068 \040 @i{is another way of writing a space} 3069 \40 @i{is the same, provided there are fewer than 40} 3070 @i{previous capturing subpatterns} 3071 \7 @i{is always a back reference} 3072 \011 @i{is always a tab} 3073 \11 @i{might be a back reference, or another way of} 3074 @i{writing a tab} 3075 \0113 @i{is a tab followed by the character @samp{3}} 3076 \113 @i{is the character with octal code 113 (since there} 3077 @i{can be no more than 99 back references)} 3078 \377 @i{is a byte consisting entirely of 1 bits (@sc{ascii} 255)} 3079 \81 @i{is either a back reference, or a binary zero} 3080 @i{followed by the two characters @samp{81}} 3081 @end example 3082 3083 Note that octal values of 100 or greater must not be introduced 3084 by a leading zero, because no more than three octal 3085 digits are ever read. 3086 3087 All the sequences that define a single byte value can be 3088 used both inside and outside character classes. In addition, 3089 inside a character class, the sequence @code{\b} is interpreted 3090 as the backspace character (hex 08). Outside a character 3091 class it has a different meaning (see below). 3092 3093 In addition, there are two more escapes specifying 3094 generic character classes (like @code{\w} and @code{\W} do): 3095 3096 @cindex Perl-style regular expressions, character classes 3097 @table @samp 3098 @item \d 3099 Matches any decimal digit 3100 3101 @item \D 3102 Matches any character that is not a decimal digit 3103 @end table 3104 3105 In Perl mode, these character type sequences can appear both inside and 3106 outside character classes. Instead, in @sc{posix} mode these sequences 3107 (as well as @code{\w} and @code{\W}) are treated as two literal characters 3108 (a backslash and a letter) inside square brackets. 3109 3110 Escaped sequences specifying assertions are also different in 3111 Perl mode.
An assertion specifies a condition that has to be met 3112 at a particular point in a match, without consuming any 3113 characters from the subject string. The use of subpatterns 3114 for more complicated assertions is described below. The 3115 backslashed assertions are 3116 3117 @cindex Perl-style regular expressions, assertions 3118 @table @samp 3119 @item \b 3120 Asserts that the point is at a word boundary. 3121 A word boundary is a position in the subject string where 3122 the current character and the previous character do not both 3123 match @code{\w} or @code{\W} (i.e. one matches @code{\w} and 3124 the other matches @code{\W}), or the start or end of the string 3125 if the first or last character matches @code{\w}, respectively. 3126 3127 @item \B 3128 Asserts that the point is not at a word boundary. 3129 3130 @item \A 3131 Asserts the matcher is at the start of pattern space (independent 3132 of multiline mode). 3133 3134 @item \Z 3135 Asserts the matcher is at the end of pattern space, 3136 or at a newline before the end of pattern space (independent of 3137 multiline mode) 3138 3139 @item \z 3140 Asserts the matcher is at the end of pattern space (independent 3141 of multiline mode) 3142 @end table 3143 3144 These assertions may not appear in character classes (but 3145 note that @code{\b} has a different meaning, namely the 3146 backspace character, inside a character class). 3147 Note that Perl mode does not support directly assertions 3148 for the beginning and the end of word; the @acronym{GNU} extensions 3149 @code{\<} and @code{\>} achieve this purpose in @sc{posix} mode 3150 instead. 
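The word-boundary assertions described above can be tried out in @sc{posix} mode. The following is only a minimal sketch: it assumes that GNU @command{sed} is installed as @command{sed} (the escapes @code{\b} and @code{\<} are GNU extensions), and the sample text and variable names are invented for illustration.

```shell
# Assumption: "sed" is GNU sed; \b and \< are GNU extensions, not POSIX.

# \b matches at either edge of a word, so only the stand-alone "cat" is replaced.
whole_word=$(printf 'cat concat cat\n' | sed 's/\bcat\b/X/g')

# \< matches only at the start of a word, so the second "c" of "concat" is left alone.
word_start=$(printf 'cat concat cat\n' | sed 's/\<c/C/g')

printf '%s\n%s\n' "$whole_word" "$word_start"
```

Neither assertion consumes a character: the replacement text covers only the letters that were matched.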
3151 3152 The @code{\A}, @code{\Z}, and @code{\z} assertions differ 3153 from the traditional circumflex and dollar sign (described below) 3154 in that they only ever match at the very start and end of the 3155 subject string, whatever options are set; in particular @code{\A} 3156 and @code{\z} are the same as the @acronym{GNU} extensions 3157 @code{\`} and @code{\'} that are active in @sc{posix} mode. 3158 3159 @node Circumflex/dollar sign/period 3160 @appendixsec Circumflex, dollar sign, period 3161 @cindex Perl-style regular expressions, newlines 3162 3163 Outside a character class, in the default matching mode, the 3164 circumflex character is an assertion which is true only if 3165 the current matching point is at the start of the subject 3166 string. Inside a character class, the circumflex has an entirely 3167 different meaning (see below). 3168 3169 The circumflex need not be the first character of the pattern if 3170 a number of alternatives are involved, but it should be the 3171 first thing in each alternative in which it appears if the 3172 pattern is ever to match that branch. If all possible alternatives 3173 start with a circumflex, that is, if the pattern is 3174 constrained to match only at the start of the subject, it is 3175 said to be an @dfn{anchored} pattern. (There are also other constructs 3176 that can cause a pattern to be anchored.) 3177 3178 A dollar sign is an assertion which is true only if the 3179 current matching point is at the end of the subject string, 3180 or immediately before a newline character that is the last 3181 character in the string (by default). A dollar sign need not be the 3182 last character of the pattern if a number of alternatives 3183 are involved, but it should be the last item in any branch 3184 in which it appears. A dollar sign has no special meaning in a 3185 character class.
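The behavior of the circumflex and dollar sign inside alternatives can be checked with extended regular expressions, which share this part of the syntax. This is a small sketch rather than a statement about Perl mode itself: it assumes a @command{sed} that accepts @option{-E}, and the inputs are made up.

```shell
# Assumption: "sed" accepts -E (extended regular expressions).

# Each branch carries its own anchor: "^a" can match only at the start
# of the line and "b$" only at the end, so the middle "ba" is untouched.
anchored=$(printf 'abab\n' | sed -E 's/^a|b$/X/g')

# Inside a bracket expression the dollar sign is an ordinary character.
literal=$(printf '%s\n' 'a$b' | sed 's/[$]/-/')

printf '%s\n%s\n' "$anchored" "$literal"
```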
3186 3187 @cindex Perl-style regular expressions, multiline 3188 The meanings of the circumflex and dollar sign characters are 3189 changed if the @code{M} modifier option is used. When this is 3190 the case, they match immediately after and immediately 3191 before an internal @code{\n} character, respectively, in addition 3192 to matching at the start and end of the subject string. For 3193 example, the pattern @code{/^abc$/} matches the subject string 3194 @samp{def\nabc} in multiline mode, but not otherwise. Consequently, 3195 patterns that are anchored in single line mode 3196 because all branches start with @code{^} are not anchored in 3197 multiline mode. 3198 3199 @cindex Perl-style regular expressions, multiline 3200 Note that the sequences @code{\A}, @code{\Z}, and @code{\z} 3201 can be used to match the start and end of the subject in both 3202 modes, and if all branches of a pattern start with @code{\A} 3203 it is always anchored, whether the @code{M} modifier is set or not. 3204 3205 @cindex Perl-style regular expressions, single line 3206 Outside a character class, a dot in the pattern matches any 3207 one character in the subject, including a non-printing character, 3208 but not (by default) newline. If the @code{S} modifier is used, 3209 dots match newlines as well. Actually, the handling of 3210 dot is entirely independent of the handling of circumflex 3211 and dollar sign, the only relationship being that they both 3212 involve newline characters. Dot has no special meaning in a 3213 character class. 3214 3215 @node Square brackets 3216 @appendixsec Square brackets 3217 @cindex Perl-style regular expressions, character classes 3218 3219 An opening square bracket introduces a character class, terminated 3220 by a closing square bracket. A closing square bracket on its own 3221 is not special.
If a closing square bracket is required as a 3222 member of the class, it should be the first data character in 3223 the class (after an initial circumflex, if present) or escaped with a backslash. 3224 3225 A character class matches a single character in the subject; 3226 the character must be in the set of characters defined by 3227 the class, unless the first character in the class is a circumflex, 3228 in which case the subject character must not be in 3229 the set defined by the class. If a circumflex is actually 3230 required as a member of the class, ensure it is not the 3231 first character, or escape it with a backslash. 3232 3233 For example, the character class [aeiou] matches any lower 3234 case vowel, while [^aeiou] matches any character that is not 3235 a lower case vowel. Note that a circumflex is just a convenient 3236 notation for specifying the characters which are in 3237 the class by enumerating those that are not. It is not an 3238 assertion: it still consumes a character from the subject 3239 string, and fails if the current pointer is at the end of 3240 the string. 3241 3242 @cindex Perl-style regular expressions, case-insensitive 3243 When caseless matching is set, any letters in a class 3244 represent both their upper case and lower case versions, so 3245 for example, a caseless @code{[aeiou]} matches uppercase 3246 and lowercase @samp{A}s, and a caseless @code{[^aeiou]} 3247 does not match @samp{A}, whereas a case-sensitive version would. 3248 3249 @cindex Perl-style regular expressions, single line 3250 @cindex Perl-style regular expressions, multiline 3251 The newline character is never treated in any special way in 3252 character classes, whatever the setting of the @code{S} and 3253 @code{M} options (modifiers) is. A class such as @code{[^a]} will 3254 always match a newline. 3255 3256 The minus (hyphen) character can be used to specify a range 3257 of characters in a character class.
For example, @code{[d-m]} 3258 matches any letter between d and m, inclusive. If a minus 3259 character is required in a class, it must be escaped with a 3260 backslash or appear in a position where it cannot be interpreted 3261 as indicating a range, typically as the first or last 3262 character in the class. 3263 3264 It is not possible to have the literal character @code{]} as the 3265 end character of a range. A pattern such as @code{[W-]46]} is 3266 interpreted as a class of two characters (@code{W} and @code{-}) 3267 followed by a literal string @code{46]}, so it would match 3268 @samp{W46]} or @samp{-46]}. However, if the @code{]} is escaped 3269 with a backslash it is interpreted as the end of range, so 3270 @code{[W-\]46]} is interpreted as a single class containing a 3271 range followed by two separate characters. The octal or 3272 hexadecimal representation of @code{]} can also be used to end a range. 3273 3274 Ranges operate in @sc{ascii} collating sequence. They can also be 3275 used for characters specified numerically, for example 3276 @code{[\000-\037]}. If a range that includes letters is used when 3277 caseless matching is set, it matches the letters in either 3278 case. For example, a caseless @code{[W-c]} is equivalent to 3279 @code{[][\^_`wxyzabc]}, matched caselessly, and if character 3280 tables for the French locale are in use, @code{[\xc8-\xcb]} 3281 matches accented E characters in both cases. 3282 3283 Unlike in @sc{posix} mode, the character types @code{\d}, 3284 @code{\D}, @code{\s}, @code{\S}, @code{\w}, and @code{\W} 3285 may also appear in a character class, and add the characters 3286 that they match to the class. For example, @code{[\dABCDEF]} matches any 3287 hexadecimal digit. A circumflex can conveniently be used 3288 with the upper case character types to specify a more restricted 3289 set of characters than the matching lower case type. 
3290 For example, the class @code{[^\W_]} matches any letter or digit, 3291 but not underscore. 3292 3293 All non-alphameric characters other than @code{\}, @code{-}, 3294 @code{^} (at the start) and the terminating @code{]} 3295 are non-special in character classes, but it does no harm 3296 if they are escaped. 3297 3298 Perl 5.6 supports the @sc{posix} notation for character classes, which 3299 uses names enclosed by @code{[:} and @code{:]} within the enclosing 3300 square brackets, and @value{SSED} supports this notation as well. 3301 For example, 3302 3303 @example 3304 [01[:alpha:]%] 3305 @end example 3306 3307 @noindent 3308 matches @samp{0}, @samp{1}, any alphabetic character, or @samp{%}. 3309 The supported class names are 3310 3311 @table @code 3312 @item alnum 3313 Matches letters and digits 3314 3315 @item alpha 3316 Matches letters 3317 3318 @item ascii 3319 Matches character codes 0 - 127 3320 3321 @item cntrl 3322 Matches control characters 3323 3324 @item digit 3325 Matches decimal digits (same as \d) 3326 3327 @item graph 3328 Matches printing characters, excluding space 3329 3330 @item lower 3331 Matches lower case letters 3332 3333 @item print 3334 Matches printing characters, including space 3335 3336 @item punct 3337 Matches printing characters, excluding letters and digits 3338 3339 @item space 3340 Matches white space (same as \s) 3341 3342 @item upper 3343 Matches upper case letters 3344 3345 @item word 3346 Matches ``word'' characters (same as \w) 3347 3348 @item xdigit 3349 Matches hexadecimal digits 3350 @end table 3351 3352 The names @code{ascii} and @code{word} are extensions valid only in 3353 Perl mode. Another Perl extension is negation, which is 3354 indicated by a circumflex character after the colon. For example, 3355 3356 @example 3357 [12[:^digit:]] 3358 @end example 3359 3360 @noindent 3361 matches @samp{1}, @samp{2}, or any non-digit. 
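The parts of this section that also hold for @sc{posix} bracket expressions can be exercised as follows. This is a hedged sketch: it forces the C locale so that @code{[:alpha:]} covers only @sc{ascii} letters, the sample strings are invented, and the Perl-only forms (@code{[:^digit:]}, @code{\d} inside brackets) are deliberately left out because they are not expected to work outside Perl mode.

```shell
# The C locale keeps class membership predictable for this demonstration.

# A named class mixed with single characters, as in [01[:alpha:]%].
classes=$(printf '%s\n' '0a%2' | LC_ALL=C sed 's/[01[:alpha:]%]/X/g')

# A closing bracket is literal when it is the first character of the class.
bracket=$(printf '%s\n' 'a]b' | LC_ALL=C sed 's/[]a]/X/g')

# A hyphen is literal when it is the last character of the class.
hyphen=$(printf '%s\n' 'a-b' | LC_ALL=C sed 's/[b-]/X/g')

printf '%s\n%s\n%s\n' "$classes" "$bracket" "$hyphen"
```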
3362 3363 @node Options setting 3364 @appendixsec Options setting 3365 @cindex Perl-style regular expressions, toggling options 3366 @cindex Perl-style regular expressions, case-insensitive 3367 @cindex Perl-style regular expressions, multiline 3368 @cindex Perl-style regular expressions, single line 3369 @cindex Perl-style regular expressions, extended 3370 3371 The settings of the @code{I}, @code{M}, @code{S}, @code{X} 3372 modifiers can be changed from within the pattern by 3373 a sequence of Perl option letters enclosed between @code{(?} 3374 and @code{)}. The option letters must be lowercase. 3375 3376 For example, @code{(?im)} sets caseless, multiline matching. It is 3377 also possible to unset these options by preceding the letter 3378 with a hyphen; you can also have combined settings and unsettings: 3379 @code{(?im-sx)} sets caseless and multiline matching, 3380 while unsets single line matching (for dots) and extended 3381 whitespace interpretation. If a letter appears both before 3382 and after the hyphen, the option is unset. 3383 3384 The scope of these option changes depends on where in the 3385 pattern the setting occurs. For settings that are outside 3386 any subpattern (defined below), the effect is the same as if 3387 the options were set or unset at the start of matching. The 3388 following patterns all behave in exactly the same way: 3389 3390 @example 3391 (?i)abc 3392 a(?i)bc 3393 ab(?i)c 3394 abc(?i) 3395 @end example 3396 3397 which in turn is the same as specifying the pattern abc with 3398 the @code{I} modifier. In other words, ``top level'' settings 3399 apply to the whole pattern (unless there are other 3400 changes inside subpatterns). If there is more than one setting 3401 of the same option at top level, the rightmost setting 3402 is used. 3403 3404 If an option change occurs inside a subpattern, the effect 3405 is different. This is a change of behaviour in Perl 5.005. 
3406 An option change inside a subpattern affects only that part 3407 of the subpattern @emph{that follows} it, so 3408 3409 @example 3410 (a(?i)b)c 3411 @end example 3412 3413 @noindent 3414 matches abc and aBc and no other strings (assuming 3415 case-sensitive matching is used). By this means, options can 3416 be made to have different settings in different parts of the 3417 pattern. Any changes made in one alternative do carry on 3418 into subsequent branches within the same subpattern. For 3419 example, 3420 3421 @example 3422 (a(?i)b|c) 3423 @end example 3424 3425 @noindent 3426 matches @samp{ab}, @samp{aB}, @samp{c}, and @samp{C}, 3427 even though when matching @samp{C} the first branch is 3428 abandoned before the option setting. 3429 This is because the effects of option settings happen at 3430 compile time. There would be some very weird behaviour otherwise. 3431 3432 @ignore 3433 There are two PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA 3434 that can be changed in the same way as the Perl-compatible options by 3435 using the characters U and X respectively. The (?X) flag 3436 setting is special in that it must always occur earlier in 3437 the pattern than any of the additional features it turns on, 3438 even when it is at top level. It is best put at the start. 3439 @end ignore 3440 3441 3442 @node Non-capturing subpatterns 3443 @appendixsec Non-capturing subpatterns 3444 @cindex Perl-style regular expressions, non-capturing subpatterns 3445 3446 Marking part of a pattern as a subpattern does two things. 3447 On one hand, it localizes a set of alternatives; on the other 3448 hand, it sets up the subpattern as a capturing subpattern (as 3449 defined above). The subpattern can be backreferenced and 3450 referenced in the right side of @code{s} commands. 
3451 3452 For example, if the string @samp{the red king} is matched against 3453 the pattern 3454 3455 @example 3456 the ((red|white) (king|queen)) 3457 @end example 3458 3459 @noindent 3460 the captured substrings are @samp{red king}, @samp{red}, 3461 and @samp{king}, and are numbered 1, 2, and 3. 3462 3463 The fact that plain parentheses fulfil two functions is not 3464 always helpful. There are often times when a grouping 3465 subpattern is required without a capturing requirement. If an 3466 opening parenthesis is followed by @code{?:}, the subpattern does 3467 not do any capturing, and is not counted when computing the 3468 number of any subsequent capturing subpatterns. For example, 3469 if the string @samp{the white queen} is matched against the pattern 3470 3471 @example 3472 the ((?:red|white) (king|queen)) 3473 @end example 3474 3475 @noindent 3476 the captured substrings are @samp{white queen} and @samp{queen}, 3477 and are numbered 1 and 2. The maximum number of captured 3478 substrings is 99, while the maximum number of all subpatterns, 3479 both capturing and non-capturing, is 200. 3480 3481 As a convenient shorthand, if any option settings are 3482 required at the start of a non-capturing subpattern, the 3483 option letters may appear between the @code{?} and the 3484 @code{:}. Thus the two patterns 3485 3486 @example 3487 (?i:saturday|sunday) 3488 (?:(?i)saturday|sunday) 3489 @end example 3490 3491 @noindent 3492 match exactly the same set of strings. Because alternative 3493 branches are tried from left to right, and options are not 3494 reset until the end of the subpattern is reached, an option 3495 setting in one branch does affect subsequent branches, so 3496 the above patterns match @samp{SUNDAY} as well as @samp{Saturday}.
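The numbering of capturing subpatterns can be observed with an ordinary @code{s} command. The sketch below uses extended regular expressions (@option{-E}) rather than Perl mode, on the assumption that plain parentheses are numbered the same way in both; the non-capturing @code{(?:@dots{})} form is not shown because it is specific to Perl mode.

```shell
# Assumption: "sed" accepts -E (extended regular expressions).

# Groups are numbered by the position of their opening parenthesis:
# \1 is the outer group, \2 the colour, \3 the piece.
captures=$(printf 'the red king\n' |
  sed -E 's/the ((red|white) (king|queen))/1=[\1] 2=[\2] 3=[\3]/')

printf '%s\n' "$captures"
```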
3497 3498 3499 @node Repetition 3500 @appendixsec Repetition 3501 @cindex Perl-style regular expressions, repetitions 3502 3503 Repetition is specified by quantifiers, which can follow any 3504 of the following items: 3505 3506 @itemize @bullet 3507 @item 3508 a single character, possibly escaped 3509 3510 @item 3511 the @code{.} special character 3512 3513 @item 3514 a character class 3515 3516 @item 3517 a back reference (see next section) 3518 3519 @item 3520 a parenthesized subpattern (unless it is an assertion; @pxref{Assertions}) 3521 @end itemize 3522 3523 The general repetition quantifier specifies a minimum and 3524 maximum number of permitted matches, by giving the two 3525 numbers in curly brackets (braces), separated by a comma. 3526 The numbers must be less than 65536, and the first must be 3527 less than or equal to the second. For example: 3528 3529 @example 3530 z@{2,4@} 3531 @end example 3532 3533 @noindent 3534 matches @samp{zz}, @samp{zzz}, or @samp{zzzz}. A closing brace on its own 3535 is not a special character. If the second number is omitted, 3536 but the comma is present, there is no upper limit; if the 3537 second number and the comma are both omitted, the quantifier 3538 specifies an exact number of required matches. Thus 3539 3540 @example 3541 [aeiou]@{3,@} 3542 @end example 3543 3544 @noindent 3545 matches at least 3 successive vowels, but may match many 3546 more, while 3547 3548 @example 3549 \d@{8@} 3550 @end example 3551 3552 @noindent 3553 matches exactly 8 digits. An opening curly bracket that 3554 appears in a position where a quantifier is not allowed, or 3555 one that does not match the syntax of a quantifier, is taken 3556 as a literal character. 
For example, @{,6@} is not a quantifier, 3557 but a literal string of four characters.@footnote{It 3558 raises an error if @option{-R} is not used.} 3559 3560 The quantifier @samp{@{0@}} is permitted, causing the expression to 3561 behave as if the previous item and the quantifier were not 3562 present. 3563 3564 For convenience (and historical compatibility) the three 3565 most common quantifiers have single-character abbreviations: 3566 3567 @table @code 3568 @item * 3569 is equivalent to @{0,@} 3570 3571 @item + 3572 is equivalent to @{1,@} 3573 3574 @item ? 3575 is equivalent to @{0,1@} 3576 @end table 3577 3578 It is possible to construct infinite loops by following a 3579 subpattern that can match no characters with a quantifier 3580 that has no upper limit, for example: 3581 3582 @example 3583 (a?)* 3584 @end example 3585 3586 Earlier versions of Perl used to give an error at 3587 compile time for such patterns. However, because there are 3588 cases where this can be useful, such patterns are now 3589 accepted, but if any repetition of the subpattern does in 3590 fact match no characters, the loop is forcibly broken. 3591 3592 @cindex Greedy regular expression matching 3593 @cindex Perl-style regular expressions, stingy repetitions 3594 By default, the quantifiers are @dfn{greedy} like in @sc{posix} 3595 mode, that is, they match as much as possible (up to the maximum 3596 number of permitted times), without causing the rest of the 3597 pattern to fail. The classic example of where this gives problems 3598 is in trying to match comments in C programs. These appear between 3599 the sequences @code{/*} and @code{*/} and within the sequence, individual 3600 @code{*} and @code{/} characters may appear. 
An attempt to match C 3601 comments by applying the pattern 3602 3603 @example 3604 /\*.*\*/ 3605 @end example 3606 3607 @noindent 3608 to the string 3609 3610 @example 3611 /* first comment */ not comment /* second comment */ 3612 @end example 3613 3614 @noindent 3615 3616 fails, because it matches the entire string owing to the 3617 greediness of the @code{.*} item. 3618 3619 However, if a quantifier is followed by a question mark, it 3620 ceases to be greedy, and instead matches the minimum number 3621 of times possible, so the pattern @code{/\*.*?\*/} 3622 does the right thing with the C comments. The meaning of the 3623 various quantifiers is not otherwise changed, just the preferred 3624 number of matches. Do not confuse this use of question 3625 mark with its use as a quantifier in its own right. 3626 Because it has two uses, it can sometimes appear doubled, as in 3627 3628 @example 3629 \d??\d 3630 @end example 3631 3632 which matches one digit by preference, but can match two if 3633 that is the only way the rest of the pattern matches. 3634 3635 Note that greediness does not matter when specifying addresses, 3636 but can nevertheless be used to improve performance. 3637 3638 @ignore 3639 If the PCRE_UNGREEDY option is set (an option which is not 3640 available in Perl), the quantifiers are not greedy by 3641 default, but individual ones can be made greedy by following 3642 them with a question mark. In other words, it inverts the 3643 default behaviour. 3644 @end ignore 3645 3646 When a parenthesized subpattern is quantified with a minimum 3647 repeat count that is greater than 1 or with a limited maximum, 3648 more store is required for the compiled pattern, in 3649 proportion to the size of the minimum or maximum.
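The difference between greedy and non-greedy quantifiers discussed above is easy to observe. The sketch below again assumes Python's standard @code{re} module as a stand-in for the Perl-style matcher; the variable names are illustrative and not part of @command{sed}.

```python
import re

text = "/* first comment */ not comment /* second comment */"

# Greedy: .* runs to the end of the line and backs off only as far as
# the last "*/", so both comments and the text between them are taken.
greedy = re.search(r"/\*.*\*/", text).group()

# Non-greedy: .*? stops at the first "*/" that lets the pattern match.
lazy = re.search(r"/\*.*?\*/", text).group()

# \d??\d prefers one digit, but takes two when the rest of the
# match (here, reaching the end of the string) requires it.
one_digit = re.match(r"\d??\d", "12").group()
two_digits = re.fullmatch(r"\d??\d", "12").group()

print(repr(greedy))
print(repr(lazy))
print(one_digit, two_digits)
```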
3650 3651 @cindex Perl-style regular expressions, single line 3652 If a pattern starts with @code{.*} or @code{.@{0,@}} and the 3653 @code{S} modifier is used, the pattern is implicitly anchored, 3654 because whatever follows will be tried against every character 3655 position in the subject string, so there is no point in 3656 retrying the overall match at any position after the first. 3657 PCRE treats such a pattern as though it were preceded by \A. 3658 3659 When a capturing subpattern is repeated, the value captured 3660 is the substring that matched the final iteration. For example, 3661 after 3662 3663 @example 3664 (tweedle[dume]@{3@}\s*)+ 3665 @end example 3666 3667 @noindent 3668 has matched @samp{tweedledum tweedledee} the value of the 3669 captured substring is @samp{tweedledee}. However, if there are 3670 nested capturing subpatterns, the corresponding captured 3671 values may have been set in previous iterations. For example, 3672 after 3673 3674 @example 3675 /(a|(b))+/ 3676 @end example 3677 3678 matches @samp{aba}, the value of the second captured substring is 3679 @samp{b}. 3680 3681 @node Backreferences 3682 @appendixsec Backreferences 3683 @cindex Perl-style regular expressions, backreferences 3684 3685 Outside a character class, a backslash followed by a digit 3686 greater than 0 (and possibly further digits) is a back 3687 reference to a capturing subpattern earlier (i.e. to its 3688 left) in the pattern, provided there have been that many 3689 previous capturing left parentheses. 3690 3691 However, if the decimal number following the backslash is 3692 less than 10, it is always taken as a back reference, and 3693 causes an error only if there are not that many capturing 3694 left parentheses in the entire pattern. In other words, the 3695 parentheses that are referenced need not be to the left of 3696 the reference for numbers less than 10. @xref{Backslash}, 3697 for further details of the handling of digits following a backslash.
3698 3699 A back reference matches whatever actually matched the capturing 3700 subpattern in the current subject string, rather than 3701 anything matching the subpattern itself. So the pattern 3702 3703 @example 3704 (sens|respons)e and \1ibility 3705 @end example 3706 3707 @noindent 3708 matches @samp{sense and sensibility} and @samp{response and responsibility}, 3709 but not @samp{sense and responsibility}. If caseful 3710 matching is in force at the time of the back reference, the 3711 case of letters is relevant. For example, 3712 3713 @example 3714 ((?i)blah)\s+\1 3715 @end example 3716 3717 @noindent 3718 matches @samp{blah blah} and @samp{Blah Blah}, but not 3719 @samp{BLAH blah}, even though the original capturing 3720 subpattern is matched caselessly. 3721 3722 There may be more than one back reference to the same subpattern. 3723 Also, if a subpattern has not actually been used in a 3724 particular match, any back references to it always fail. For 3725 example, the pattern 3726 3727 @example 3728 (a|(bc))\2 3729 @end example 3730 3731 @noindent 3732 always fails if it starts to match @samp{a} rather than 3733 @samp{bc}. Because there may be up to 99 back references, all 3734 digits following the backslash are taken as part of a potential 3735 back reference number; this is different from what happens 3736 in @sc{posix} mode. If the pattern continues with a digit 3737 character, some delimiter must be used to terminate the back 3738 reference. If the @code{X} modifier option is set, this can be 3739 whitespace. Otherwise an empty comment can be used, or the 3740 following character can be expressed in hexadecimal or octal. 3741 3742 A back reference that occurs inside the parentheses to which 3743 it refers fails when the subpattern is first used, so, for 3744 example, @code{(a\1)} never matches. However, such references 3745 can be useful inside repeated subpatterns. 
For example, the 3746 pattern 3747 3748 @example 3749 (a|b\1)+ 3750 @end example 3751 3752 @noindent 3753 matches any number of @samp{a}s and also @samp{aba}, @samp{ababbaa}, 3754 etc. At each iteration of the subpattern, the back reference matches 3755 the character string corresponding to the previous iteration. In 3756 order for this to work, the pattern must be such that the first 3757 iteration does not need to match the back reference. This can be 3758 done using alternation, as in the example above, or by a 3759 quantifier with a minimum of zero. 3760 3761 @node Assertions 3762 @appendixsec Assertions 3763 @cindex Perl-style regular expressions, assertions 3764 @cindex Perl-style regular expressions, asserting subpatterns 3765 3766 An assertion is a test on the characters following or 3767 preceding the current matching point that does not actually 3768 consume any characters. The simple assertions coded as @code{\b}, 3769 @code{\B}, @code{\A}, @code{\Z}, @code{\z}, @code{^} and @code{$} 3770 are described above. More complicated assertions are coded as 3771 subpatterns. There are two kinds: those that look ahead of the 3772 current position in the subject string, and those that look behind it. 3773 3774 @cindex Perl-style regular expressions, lookahead subpatterns 3775 An assertion subpattern is matched in the normal way, except 3776 that it does not cause the current matching position to be 3777 changed. Lookahead assertions start with @code{(?=} for positive 3778 assertions and @code{(?!} for negative assertions. For example, 3779 3780 @example 3781 \w+(?=;) 3782 @end example 3783 3784 @noindent 3785 matches a word followed by a semicolon, but does not include 3786 the semicolon in the match, and 3787 3788 @example 3789 foo(?!bar) 3790 @end example 3791 3792 @noindent 3793 matches any occurrence of @samp{foo} that is not followed by 3794 @samp{bar}. 
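The two lookahead examples above can be checked with a short script. This is a minimal sketch that assumes Python's standard @code{re} module, whose lookahead syntax coincides with the one described here; the sample strings and variable names are made up for illustration.

```python
import re

# \w+(?=;) matches a word only when a semicolon follows; the
# semicolon itself is not part of the match.
words = re.findall(r"\w+(?=;)", "alpha; beta gamma;")

# foo(?!bar) matches "foo" only when "bar" does not follow it.
starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz foo")]

print(words)
print(starts)
```

Because an assertion consumes no characters, each reported match covers only the word or the @samp{foo} itself.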
3795 3796 Note that the apparently similar pattern 3797 3798 @example 3799 (?!foo)bar 3800 @end example 3801 3802 @noindent 3803 @cindex Perl-style regular expressions, lookbehind subpatterns 3804 finds any occurrence of @samp{bar} even if it is preceded by 3805 @samp{foo}, because the assertion @code{(?!foo)} is always true 3806 when the next three characters are @samp{bar}. A lookbehind 3807 assertion is needed to achieve this effect. 3808 Lookbehind assertions start with @code{(?<=} for positive 3809 assertions and @code{(?<!} for negative assertions. So, 3810 3811 @example 3812 (?<!foo)bar 3813 @end example 3814 3815 achieves the required effect of finding an occurrence of 3816 @samp{bar} that is not preceded by @samp{foo}. The contents of a 3817 lookbehind assertion are restricted 3818 such that all the strings it matches must have a fixed 3819 length. However, if there are several alternatives, they do 3820 not all have to have the same fixed length. This is an extension 3821 compared with Perl 5.005, which requires all branches to match 3822 the same length of string. Thus 3823 3824 @example 3825 (?<=dogs|cats|) 3826 @end example 3827 3828 @noindent 3829 is permitted, but the apparently equivalent regular expression 3830 3831 @example 3832 (?<!dogs?|cats?) 3833 @end example 3834 3835 @noindent 3836 causes an error at compile time. Branches that match different 3837 length strings are permitted only at the top level of 3838 a lookbehind assertion: an assertion such as 3839 3840 @example 3841 (?<=ab(c|de)) 3842 @end example 3843 3844 @noindent 3845 is not permitted, because its single top-level branch can 3846 match two different lengths, but it is acceptable if rewritten 3847 to use two top-level branches: 3848 3849 @example 3850 (?<=abc|abde) 3851 @end example 3852 3853 All this is required because lookbehind assertions simply 3854 move the current position back by the alternative's fixed 3855 width and then try to match. 
If there are 3856 insufficient characters before the current position, the 3857 match is deemed to fail. Lookbehinds, in conjunction with 3858 non-backtracking subpatterns can be particularly useful for 3859 matching at the ends of strings; an example is given at the end 3860 of the section on non-backtracking subpatterns. 3861 3862 Several assertions (of any sort) may occur in succession. 3863 For example, 3864 3865 @example 3866 (?<=\d@{3@})(?<!999)foo 3867 @end example 3868 3869 @noindent 3870 matches @samp{foo} preceded by three digits that are not @samp{999}. 3871 Notice that each of the assertions is applied independently 3872 at the same point in the subject string. First there is a 3873 check that the previous three characters are all digits, and 3874 then there is a check that the same three characters are not 3875 @samp{999}. This pattern does not match @samp{foo} preceded by six 3876 characters, the first of which are digits and the last three 3877 of which are not @samp{999}. For example, it doesn't match 3878 @samp{123abcfoo}. A pattern to do that is 3879 3880 @example 3881 (?<=\d@{3@}...)(?<!999)foo 3882 @end example 3883 3884 @noindent 3885 This time the first assertion looks at the preceding six 3886 characters, checking that the first three are digits, and 3887 then the second assertion checks that the preceding three 3888 characters are not @samp{999}. Actually, assertions can be 3889 nested in any combination, so one can write this as 3890 3891 @example 3892 (?<=\d@{3@}(?!999)...)foo 3893 @end example 3894 3895 or 3896 3897 @example 3898 (?<=\d@{3@}...(?<!999))foo 3899 @end example 3900 3901 @noindent 3902 both of which might be considered more readable. 3903 3904 Assertion subpatterns are not capturing subpatterns, and may 3905 not be repeated, because it makes no sense to assert the 3906 same thing several times. 
If any kind of assertion contains 3907 capturing subpatterns within it, these are counted for the 3908 purposes of numbering the capturing subpatterns in the whole 3909 pattern. However, substring capturing is carried out only 3910 for positive assertions, because it does not make sense for 3911 negative assertions. 3912 3913 Assertions count towards the maximum of 200 parenthesized 3914 subpatterns. 3915 3916 @node Non-backtracking subpatterns 3917 @appendixsec Non-backtracking subpatterns 3918 @cindex Perl-style regular expressions, non-backtracking subpatterns 3919 3920 With both maximizing and minimizing repetition, failure of 3921 what follows normally causes the repeated item to be evaluated 3922 again to see if a different number of repeats allows the 3923 rest of the pattern to match. Sometimes it is useful to 3924 prevent this, either to change the nature of the match, or 3925 to cause it to fail earlier than it otherwise might, when the 3926 author of the pattern knows there is no point in carrying 3927 on. 3928 3929 Consider, for example, the pattern @code{\d+foo} when applied to 3930 the subject line 3931 3932 @example 3933 123456bar 3934 @end example 3935 3936 After matching all 6 digits and then failing to match @samp{foo}, 3937 the normal action of the matcher is to try again with only 5 3938 digits matching the @code{\d+} item, and then with 4, and so on, 3939 before ultimately failing. Non-backtracking subpatterns 3940 provide the means for specifying that once a portion of the 3941 pattern has matched, it is not to be re-evaluated in this way, 3942 so the matcher would give up immediately on failing to match 3943 @samp{foo} the first time.
The notation is another kind of special 3944 parenthesis, starting with @code{(?>} as in this example: 3945 3946 @example 3947 (?>\d+)bar 3948 @end example 3949 3950 This kind of parenthesis ``locks up'' the part of the pattern 3951 it contains once it has matched, and a failure further into 3952 the pattern is prevented from backtracking into it. 3953 Backtracking past it to previous items, however, works as 3954 normal. 3955 3956 Non-backtracking subpatterns are not capturing subpatterns. Simple 3957 cases such as the above example can be thought of as a maximizing 3958 repeat that must swallow everything it can. So, 3959 while both @code{\d+} and @code{\d+?} are prepared to adjust the number of 3960 digits they match in order to make the rest of the pattern 3961 match, @code{(?>\d+)} can only match an entire sequence of digits. 3962 3963 This construction can of course contain arbitrarily complicated 3964 subpatterns, and it can be nested. 3965 3966 @cindex Perl-style regular expressions, lookbehind subpatterns 3967 Non-backtracking subpatterns can be used in conjunction with look-behind 3968 assertions to specify efficient matching at the end 3969 of the subject string. Consider a simple pattern such as 3970 3971 @example 3972 abcd$ 3973 @end example 3974 3975 @noindent 3976 when applied to a long string which does not match. Because 3977 matching proceeds from left to right, @command{sed} will look for 3978 each @samp{a} in the subject and then see if what follows matches 3979 the rest of the pattern. If the pattern is specified as 3980 3981 @example 3982 ^.*abcd$ 3983 @end example 3984 3985 @noindent 3986 the initial @code{.*} matches the entire string at first, but when 3987 this fails (because there is no following @samp{a}), it backtracks 3988 to match all but the last character, then all but the 3989 last two characters, and so on. Once again the search for 3990 @samp{a} covers the entire string, from right to left, so we are 3991 no better off. 
However, if the pattern is written as 3992 3993 @example 3994 ^(?>.*)(?<=abcd) 3995 @end example 3996 3997 there can be no backtracking for the .* item; it can match 3998 only the entire string. The subsequent lookbehind assertion 3999 does a single test on the last four characters. If it fails, 4000 the match fails immediately. For long strings, this approach 4001 makes a significant difference to the processing time. 4002 4003 When a pattern contains an unlimited repeat inside a subpattern 4004 that can itself be repeated an unlimited number of 4005 times, the use of a once-only subpattern is the only way to 4006 avoid some failing matches taking a very long time 4007 indeed.@footnote{Actually, the matcher embedded in @value{SSED} 4008 tries to do something for this in the simplest cases, 4009 like @code{([^b]*b)*}. These cases are actually quite 4010 common: they happen for example in a regular expression 4011 like @code{\/\*([^*]*\*)*\/} which matches C comments.} 4012 4013 The pattern 4014 4015 @example 4016 (\D+|<\d+>)*[!?] 4017 @end example 4018 4019 ([^0-9<]+<(\d+>)?)*[!?] 4020 4021 @noindent 4022 matches an unlimited number of substrings that either consist 4023 of non-digits, or digits enclosed in angular brackets, followed by 4024 an exclamation or question mark. When it matches, it runs quickly. 4025 However, if it is applied to 4026 4027 @example 4028 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 4029 @end example 4030 4031 @noindent 4032 it takes a long time before reporting failure. This is 4033 because the string can be divided between the two repeats in 4034 a large number of ways, and all have to be tried.@footnote{The 4035 example used @code{[!?]} rather than a single character at the end, 4036 because both @value{SSED} and Perl have an optimization that allows 4037 for fast failure when a single character is used. 
They 4038 remember the last single character that is required for a 4039 match, and fail early if it is not present in the string.} 4040 4041 If the pattern is changed to 4042 4043 @example 4044 ((?>\D+)|<\d+>)*[!?] 4045 @end example 4046 4047 sequences of non-digits cannot be broken, and failure happens 4048 quickly. 4049 4050 @node Conditional subpatterns 4051 @appendixsec Conditional subpatterns 4052 @cindex Perl-style regular expressions, conditional subpatterns 4053 4054 It is possible to cause the matching process to obey a subpattern 4055 conditionally or to choose between two alternative 4056 subpatterns, depending on the result of an assertion, or 4057 whether a previous capturing subpattern matched or not. The 4058 two possible forms of conditional subpattern are 4059 4060 @example 4061 (?(@var{condition})@var{yes-pattern}) 4062 (?(@var{condition})@var{yes-pattern}|@var{no-pattern}) 4063 @end example 4064 4065 If the condition is satisfied, the yes-pattern is used; otherwise 4066 the no-pattern (if present) is used. If there are more than two 4067 alternatives in the subpattern, a compile-time error occurs. 4068 4069 There are two kinds of condition. If the text between the 4070 parentheses consists of a sequence of digits, the condition 4071 is satisfied if the capturing subpattern of that number has 4072 previously matched. The number must be greater than zero. 4073 Consider the following pattern, which contains non-significant 4074 white space to make it more readable (assume the @code{X} modifier) 4075 and to divide it into three parts for ease of discussion: 4076 4077 @example 4078 ( \( )? [^()]+ (?(1) \) ) 4079 @end example 4080 4081 The first part matches an optional opening parenthesis, and 4082 if that character is present, sets it as the first captured 4083 substring. The second part matches one or more characters 4084 that are not parentheses. 
The third part is a conditional 4085 subpattern that tests whether the first set of parentheses 4086 matched or not. If they did, that is, if the subject started 4087 with an opening parenthesis, the condition is true, and so 4088 the yes-pattern is executed and a closing parenthesis is 4089 required. Otherwise, since no-pattern is not present, the 4090 subpattern matches nothing. In other words, this pattern 4091 matches a sequence of non-parentheses, optionally enclosed 4092 in parentheses. 4093 4094 @cindex Perl-style regular expressions, lookahead subpatterns 4095 If the condition is not a sequence of digits, it must be an 4096 assertion. This may be a positive or negative lookahead or 4097 lookbehind assertion. Consider this pattern, again containing 4098 non-significant white space, and with the two alternatives 4099 on the second and third lines: 4100 4101 @example 4102 (?(?=...[a-z]) 4103 \d\d-[a-z]@{3@}-\d\d | 4104 \d\d-\d\d-\d\d ) 4105 @end example 4106 4107 The condition is a positive lookahead assertion that matches 4108 a letter that is three characters away from the current point. 4109 If a letter is found, the subject is matched against the first 4110 alternative @samp{@var{dd}-@var{aaa}-@var{dd}} (where @var{aaa} are 4111 letters and @var{dd} are digits); otherwise it is matched against 4112 the second alternative, @samp{@var{dd}-@var{dd}-@var{dd}}. 4113 4114 4115 @node Recursive patterns 4116 @appendixsec Recursive patterns 4117 @cindex Perl-style regular expressions, recursive patterns 4118 @cindex Perl-style regular expressions, recursion 4119 4120 Consider the problem of matching a string in parentheses, 4121 allowing for unlimited nested parentheses. Without the use 4122 of recursion, the best that can be done is to use a pattern 4123 that matches up to some fixed depth of nesting. It is not 4124 possible to handle an arbitrary nesting depth.
Perl 5.6 has 4125 provided an experimental facility that allows regular 4126 expressions to recurse (amongst other things). It does this 4127 by interpolating Perl code in the expression at run time, 4128 and the code can refer to the expression itself. A Perl pattern 4129 to solve the parentheses problem can be created like 4130 this: 4131 4132 @example 4133 $re = qr@{\( (?: (?>[^()]+) | (?p@{$re@}) )* \)@}x; 4134 @end example 4135 4136 The @code{(?p@{...@})} item interpolates Perl code at run time, 4137 and in this case refers recursively to the pattern in which it 4138 appears. Obviously, @command{sed} cannot support the interpolation of 4139 Perl code. Instead, the special item @code{(?R)} is provided for 4140 the specific case of recursion. This pattern solves the 4141 parentheses problem (assume the @code{X} modifier option is used 4142 so that white space is ignored): 4143 4144 @example 4145 \( ( (?>[^()]+) | (?R) )* \) 4146 @end example 4147 4148 First it matches an opening parenthesis. Then it matches any 4149 number of substrings which can either be a sequence of 4150 non-parentheses, or a recursive match of the pattern itself 4151 (i.e. a correctly parenthesized substring). Finally there is 4152 a closing parenthesis. 4153 4154 This particular example pattern contains nested unlimited 4155 repeats, and so the use of a non-backtracking subpattern for 4156 matching strings of non-parentheses is important when applying 4157 the pattern to strings that do not match. For example, when 4158 it is applied to 4159 4160 @example 4161 (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() 4162 @end example 4163 4164 it yields a ``no match'' response quickly. However, if a 4165 non-backtracking subpattern is not used, the match runs 4166 for a very long time indeed because there are so many different 4167 ways the @code{+} and @code{*} repeats can carve up the subject, 4168 and all have to be tested before failure can be reported.
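The structure of the recursive pattern above can be mirrored by an ordinary recursive function. The sketch below is written in Python and deliberately does not use a regular expression, because Python's standard @code{re} module has no @code{(?R)} item; the function name @code{match_parens} is hypothetical. It never re-examines a character it has already accepted, which is roughly the effect the non-backtracking subpattern has in the pattern.

```python
def match_parens(s, pos=0):
    """Return the index just past the balanced group that starts at
    s[pos], or None if there is no balanced group there."""
    if pos >= len(s) or s[pos] != "(":
        return None                      # the leading \( did not match
    pos += 1
    while pos < len(s) and s[pos] != ")":
        if s[pos] == "(":
            pos = match_parens(s, pos)   # plays the role of (?R)
            if pos is None:
                return None
        else:
            pos += 1                     # one non-parenthesis character
    if pos < len(s):
        return pos + 1                   # consume the closing \)
    return None                          # ran out of input: no match


print(match_parens("(ab(cd)ef)"))
print(match_parens("(aaaa()"))
```

On an unbalanced subject such as the second one, the function reports failure after a single left-to-right pass over the string.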
4169 4170 The values set for any capturing subpatterns are those from 4171 the outermost level of the recursion at which the subpattern 4172 value is set. If the pattern above is matched against 4173 4174 @example 4175 (ab(cd)ef) 4176 @end example 4177 4178 @noindent 4179 the value for the capturing parentheses is @samp{ef}, which is 4180 the last value taken on at the top level. 4181 4182 @node Comments 4183 @appendixsec Comments 4184 @cindex Perl-style regular expressions, comments 4185 4186 The sequence (?# marks the start of a comment which continues 4187 up to the next closing parenthesis. Nested parentheses 4188 are not permitted. The characters that make up a comment 4189 play no part in the pattern matching at all. 4190 4191 @cindex Perl-style regular expressions, extended 4192 If the @code{X} modifier option is used, an unescaped @code{#} character 4193 outside a character class introduces a comment that continues 4194 up to the next newline character in the pattern. 4195 @end ifset 5857 5858 5859 @page 5860 @node GNU Free Documentation License 5861 @appendix GNU Free Documentation License 5862 5863 @include fdl.texi 4196 5864 4197 5865 -
trunk/src/sed/doc/sed.x
r599 r3613 1 .SH NAME 2 sed \- a Stream EDitor3 .SH SYNOPSIS 1 [NAME] 2 sed \- stream editor for filtering and transforming text 3 [SYNOPSIS] 4 4 .nf 5 5 sed [-V] [--version] [--help] [-n] [--quiet] [--silent] 6 6 [-l N] [--line-length=N] [-u] [--unbuffered] 7 [- r] [--regexp-extended]7 [-E] [-r] [--regexp-extended] 8 8 [-e script] [--expression=script] 9 9 [-f script-file] [--file=script-file] … … 43 43 .RI # comment 44 44 The comment extends until the next newline (or the end of a 45 .B -e45 .B \-e 46 46 script fragment). 47 47 .TP … … 68 68 which has each embedded newline preceded by a backslash. 69 69 .TP 70 q 70 q [\fIexit-code\fR] 71 71 Immediately quit the \*(sd script without processing 72 any more input, 73 except that if auto-print is not disabled 74 the current pattern space will be printed.75 .TP 76 Q 72 any more input, except that if auto-print is not disabled 73 the current pattern space will be printed. The exit code 74 argument is a GNU extension. 75 .TP 76 Q [\fIexit-code\fR] 77 77 Immediately quit the \*(sd script without processing 78 any more input. 78 any more input. This is a GNU extension. 79 79 .TP 80 80 .RI r\ filename … … 85 85 Append a line read from 86 86 .IR filename . 87 Each invocation of the command reads a line from the file. 88 This is a GNU extension. 87 89 .SS 88 90 Commands which accept address ranges … … 98 100 is omitted, branch to end of script. 
99 101 .TP 100 .RI t\ label101 If a s/// has done a successful substitution since the102 last input line was read and since the last t or T103 command, then branch to104 .IR label ;105 if106 .I label107 is omitted, branch to end of script.108 .TP109 .RI T\ label110 If no s/// has done a successful substitution since the111 last input line was read and since the last t or T112 command, then branch to113 .IR label ;114 if115 .I label116 is omitted, branch to end of script.117 .TP118 102 c \e 119 103 .TP … … 128 112 .TP 129 113 D 130 Delete up to the first embedded newline in the pattern space. 131 Start next cycle, but skip reading from the input 132 if there is still data in the pattern space. 114 If pattern space contains no newline, start a normal new cycle as if 115 the d command was issued. Otherwise, delete text in the pattern 116 space up to the first newline, and restart cycle with the resultant 117 pattern space, without reading a new line of input. 133 118 .TP 134 119 h H … … 138 123 Copy/append hold space to pattern space. 139 124 .TP 140 x141 Exchange the contents of the hold and pattern spaces.142 .TP143 125 l 144 126 List out the current line in a ``visually unambiguous'' form. 127 .TP 128 .RI l\ width 129 List out the current line in a ``visually unambiguous'' form, 130 breaking it at 131 .I width 132 characters. This is a GNU extension. 145 133 .TP 146 134 n N … … 169 157 .IR regexp . 170 158 .TP 159 .RI t\ label 160 If a s/// has done a successful substitution since the 161 last input line was read and since the last t or T 162 command, then branch to 163 .IR label ; 164 if 165 .I label 166 is omitted, branch to end of script. 167 .TP 168 .RI T\ label 169 If no s/// has done a successful substitution since the 170 last input line was read and since the last t or T 171 command, then branch to 172 .IR label ; 173 if 174 .I label 175 is omitted, branch to end of script. This is a GNU 176 extension. 
177 .TP 171 178 .RI w\ filename 172 179 Write the current pattern space to … … 176 183 Write the first line of the current pattern space to 177 184 .IR filename . 185 This is a GNU extension. 186 .TP 187 x 188 Exchange the contents of the hold and pattern spaces. 178 189 .TP 179 190 .RI y/ source / dest / … … 223 234 .I number 224 235 Match only the specified line 225 .IR number . 236 .IR number 237 (which increments cumulatively across files, unless the 238 .B \-s 239 option is specified on the command line). 226 240 .TP 227 241 .IR first ~ step … … 230 244 line starting with line 231 245 .IR first . 232 For example, ``sed -n 1~2p'' will print all the odd-numbered lines in246 For example, ``sed \-n 1~2p'' will print all the odd-numbered lines in 233 247 the input stream, and the address 2~5 will match every fifth line, 234 starting with the second. (This is an extension.) 248 starting with the second. 249 .I first 250 can be zero; in this case, \*(sd operates as if it were equal to 251 .IR step . 252 (This is an extension.) 235 253 .TP 236 254 $ … … 240 258 Match lines matching the regular expression 241 259 .IR regexp . 260 Matching is performed on the current pattern space, which 261 can be modified with commands such as ``s///''. 242 262 .TP 243 263 .BI \fR\e\fPc regexp c … … 263 283 .RI 1, addr2 264 284 form will still be at the beginning of its range. 285 This works only when 286 .I addr2 287 is a regular expression. 265 288 .TP 266 289 .IR addr1 ,+ N … … 292 315 .BR \et , 293 316 and other sequences. 317 The \fI-E\fP option switches to using extended regular expressions instead; 318 it has been supported for years by GNU sed, and is now 319 included in POSIX. 294 320 295 321 [SEE ALSO] … … 308 334 .PP 309 335 E-mail bug reports to 310 .BR bonzini@gnu.org . 311 Be sure to include the word ``sed'' somewhere in the ``Subject:'' field. 312 Also, please include the output of ``sed --version'' in the body 336 .BR bug-sed@gnu.org . 
337 Also, please include the output of ``sed \-\-version'' in the body 313 338 of your report if at all possible. -
trunk/src/sed/doc/stamp-vti
r599 r3613 1 @set UPDATED 30 January 2006 2 @set UPDATED-MONTH January 2006 3 @set EDITION 4.1.5 4 @set VERSION 4.1.5 1 @set UPDATED 1 January 2022 2 @set UPDATED-MONTH January 2022 3 @set EDITION 4.9 4 @set VERSION 4.9 -
trunk/src/sed/doc/version.texi
r599 r3613 1 @set UPDATED 30 January 2006 2 @set UPDATED-MONTH January 2006 3 @set EDITION 4.1.5 4 @set VERSION 4.1.5 1 @set UPDATED 1 January 2022 2 @set UPDATED-MONTH January 2022 3 @set EDITION 4.9 4 @set VERSION 4.9