| [599] | 1 | \input texinfo  @c -*-texinfo-*- | 
|---|
|  | 2 | @c | 
|---|
|  | 3 | @c -- Stuff that needs adding: ---------------------------------------------- | 
|---|
| [3613] | 4 | @c (nothing!) | 
|---|
| [599] | 5 | @c -------------------------------------------------------------------------- | 
|---|
|  | 6 | @c Check for consistency: regexps in @code, text that they match in @samp. | 
|---|
| [3613] | 7 | @c | 
|---|
| [599] | 8 | @c Tips: | 
|---|
|  | 9 | @c    @command for command | 
|---|
|  | 10 | @c    @samp for command fragments: @samp{cat -s} | 
|---|
|  | 11 | @c    @code for sed commands and flags | 
|---|
|  | 12 | @c    Use ``quote'' not `quote' or "quote". | 
|---|
|  | 13 | @c | 
|---|
|  | 14 | @c %**start of header | 
|---|
|  | 15 | @setfilename sed.info | 
|---|
|  | 16 | @settitle sed, a stream editor | 
|---|
|  | 17 | @c %**end of header | 
|---|
|  | 18 |  | 
|---|
|  | 19 | @c @smallbook | 
|---|
|  | 20 |  | 
|---|
|  | 21 | @include version.texi | 
|---|
|  | 22 |  | 
|---|
|  | 23 | @c Combine indices. | 
|---|
|  | 24 | @syncodeindex ky cp | 
|---|
|  | 25 | @syncodeindex pg cp | 
|---|
|  | 26 | @syncodeindex tp cp | 
|---|
|  | 27 |  | 
|---|
|  | 28 | @defcodeindex op | 
|---|
|  | 29 | @syncodeindex op fn | 
|---|
|  | 30 |  | 
|---|
|  | 31 | @include config.texi | 
|---|
|  | 32 |  | 
|---|
|  | 33 | @copying | 
|---|
|  | 34 | This file documents version @value{VERSION} of | 
|---|
|  | 35 | @value{SSED}, a stream editor. | 
|---|
|  | 36 |  | 
|---|
| [3613] | 37 | Copyright @copyright{} 1998--2022 Free Software Foundation, Inc. | 
|---|
| [599] | 38 |  | 
|---|
| [3613] | 39 | @quotation | 
|---|
|  | 40 | Permission is granted to copy, distribute and/or modify this document | 
|---|
|  | 41 | under the terms of the GNU Free Documentation License, Version 1.3 | 
|---|
|  | 42 | or any later version published by the Free Software Foundation; | 
|---|
|  | 43 | with no Invariant Sections, no Front-Cover Texts, and no | 
|---|
|  | 44 | Back-Cover Texts.  A copy of the license is included in the | 
|---|
|  | 45 | section entitled ``GNU Free Documentation License''. | 
|---|
|  | 46 | @end quotation | 
|---|
| [599] | 47 | @end copying | 
|---|
|  | 48 |  | 
|---|
|  | 49 | @setchapternewpage off | 
|---|
|  | 50 |  | 
|---|
|  | 51 | @titlepage | 
|---|
| [3613] | 52 | @title @value{SSED}, a stream editor | 
|---|
| [599] | 53 | @subtitle version @value{VERSION}, @value{UPDATED} | 
|---|
| [3613] | 54 | @author by Ken Pizzini, Paolo Bonzini, Jim Meyering, Assaf Gordon | 
|---|
| [599] | 55 |  | 
|---|
|  | 56 | @page | 
|---|
|  | 57 | @vskip 0pt plus 1filll | 
|---|
|  | 58 | @insertcopying | 
|---|
|  | 59 | @end titlepage | 
|---|
|  | 60 |  | 
|---|
| [3613] | 61 | @contents | 
|---|
| [599] | 62 |  | 
|---|
| [3613] | 63 | @ifnottex | 
|---|
| [599] | 64 | @node Top | 
|---|
| [3613] | 65 | @top @value{SSED} | 
|---|
| [599] | 66 |  | 
|---|
|  | 67 | @insertcopying | 
|---|
|  | 68 | @end ifnottex | 
|---|
|  | 69 |  | 
|---|
|  | 70 | @menu | 
|---|
|  | 71 | * Introduction::               Introduction | 
|---|
|  | 72 | * Invoking sed::               Invocation | 
|---|
| [3613] | 73 | * sed scripts::                @command{sed} scripts | 
|---|
|  | 74 | * sed addresses::              Addresses: selecting lines | 
|---|
|  | 75 | * sed regular expressions::    Regular expressions: selecting text | 
|---|
|  | 76 | * advanced sed::               Advanced @command{sed}: cycles and buffers | 
|---|
| [599] | 77 | * Examples::                   Some sample scripts | 
|---|
|  | 78 | * Limitations::                Limitations and (non-)limitations of @value{SSED} | 
|---|
|  | 79 | * Other Resources::            Other resources for learning about @command{sed} | 
|---|
|  | 80 | * Reporting Bugs::             Reporting bugs | 
|---|
| [3613] | 81 | * GNU Free Documentation License:: Copying and sharing this manual | 
|---|
| [599] | 82 | * Concept Index::              A menu with all the topics in this manual. | 
|---|
|  | 83 | * Command and Option Index::   A menu with all @command{sed} commands and | 
|---|
|  | 84 | command-line options. | 
|---|
|  | 85 | @end menu | 
|---|
|  | 86 |  | 
|---|
|  | 87 |  | 
|---|
|  | 88 | @node Introduction | 
|---|
|  | 89 | @chapter Introduction | 
|---|
|  | 90 |  | 
|---|
|  | 91 | @cindex Stream editor | 
|---|
|  | 92 | @command{sed} is a stream editor. | 
|---|
|  | 93 | A stream editor is used to perform basic text | 
|---|
|  | 94 | transformations on an input stream | 
|---|
|  | 95 | (a file or input from a pipeline). | 
|---|
|  | 96 | While in some ways similar to an editor which | 
|---|
|  | 97 | permits scripted edits (such as @command{ed}), | 
|---|
|  | 98 | @command{sed} works by making only one pass over the | 
|---|
|  | 99 | input(s), and is consequently more efficient. | 
|---|
|  | 100 | But it is @command{sed}'s ability to filter text in a pipeline | 
|---|
|  | 101 | which particularly distinguishes it from other types of | 
|---|
|  | 102 | editors. | 
|---|
|  | 103 |  | 
|---|
|  | 104 |  | 
|---|
|  | 105 | @node Invoking sed | 
|---|
| [3613] | 106 | @chapter Running sed | 
|---|
| [599] | 107 |  | 
|---|
| [3613] | 108 | This chapter covers how to run @command{sed}. Details of @command{sed} | 
|---|
|  | 109 | scripts and individual @command{sed} commands are discussed in the | 
|---|
|  | 110 | next chapter. | 
|---|
|  | 111 |  | 
|---|
|  | 112 | @menu | 
|---|
|  | 113 | * Overview:: | 
|---|
|  | 114 | * Command-Line Options:: | 
|---|
|  | 115 | * Exit status:: | 
|---|
|  | 116 | @end menu | 
|---|
|  | 117 |  | 
|---|
|  | 118 |  | 
|---|
|  | 119 | @node Overview | 
|---|
|  | 120 | @section Overview | 
|---|
| [599] | 121 | Normally @command{sed} is invoked like this: | 
|---|
|  | 122 |  | 
|---|
|  | 123 | @example | 
|---|
|  | 124 | sed SCRIPT INPUTFILE... | 
|---|
|  | 125 | @end example | 
|---|
|  | 126 |  | 
|---|
| [3613] | 127 | For example, to change every @samp{hello} to @samp{world} | 
|---|
|  | 128 | in the file @file{input.txt}: | 
|---|
|  | 129 |  | 
|---|
|  | 130 | @example | 
|---|
|  | 131 | sed 's/hello/world/g' input.txt > output.txt | 
|---|
|  | 132 | @end example | 
|---|
|  | 133 |  | 
|---|
|  | 134 | Without the @samp{g} (global) modifier, @command{sed} affects | 
|---|
|  | 135 | only the first instance per line. | 
|---|
|  | 136 |  | 
|---|
|  | 137 | @cindex stdin | 
|---|
|  | 138 | @cindex standard input | 
|---|
|  | 139 | If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-}, | 
|---|
|  | 140 | @command{sed} filters the contents of the standard input. The following | 
|---|
|  | 141 | commands are equivalent: | 
|---|
|  | 142 |  | 
|---|
|  | 143 | @example | 
|---|
|  | 144 | sed 's/hello/world/g' input.txt > output.txt | 
|---|
|  | 145 | sed 's/hello/world/g' < input.txt > output.txt | 
|---|
|  | 146 | cat input.txt | sed 's/hello/world/g' - > output.txt | 
|---|
|  | 147 | @end example | 
|---|
|  | 148 |  | 
|---|
|  | 149 | @cindex stdout | 
|---|
|  | 150 | @cindex output | 
|---|
|  | 151 | @cindex standard output | 
|---|
|  | 152 | @cindex -i, example | 
|---|
|  | 153 | @command{sed} writes output to standard output. Use @option{-i} to edit | 
|---|
|  | 154 | files in-place instead of printing to standard output. | 
|---|
|  | 155 | See also the @code{W} and @code{s///w} commands for writing output to | 
|---|
|  | 156 | other files. The following command modifies @file{file.txt} and | 
|---|
|  | 157 | does not produce any output: | 
|---|
|  | 158 |  | 
|---|
|  | 159 | @example | 
|---|
|  | 160 | sed -i 's/hello/world/' file.txt | 
|---|
|  | 161 | @end example | 
|---|
|  | 162 |  | 
|---|
|  | 163 | @cindex -n, example | 
|---|
|  | 164 | @cindex p, example | 
|---|
|  | 165 | @cindex suppressing output | 
|---|
|  | 166 | @cindex output, suppressing | 
|---|
|  | 167 | By default @command{sed} prints each input line after processing it | 
|---|
|  | 168 | (except lines deleted by commands such as @command{d}). | 
|---|
|  | 169 | Use @option{-n} to suppress output, and the @code{p} command | 
|---|
|  | 170 | to print specific lines. The following command prints only line 45 | 
|---|
|  | 171 | of the input file: | 
|---|
|  | 172 |  | 
|---|
|  | 173 | @example | 
|---|
|  | 174 | sed -n '45p' file.txt | 
|---|
|  | 175 | @end example | 
|---|
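As a minimal sketch of how automatic printing, @option{-n}, and the @code{p} command interact, the following runs three variants on the same three-line input. The variable names and the sample input are illustrative assumptions, not part of @command{sed} itself.

```shell
# Sample three-line input, fed to each variant through a pipe.
input='a
b
c'

# Empty script: each line is auto-printed exactly once.
default_out=$(printf '%s\n' "$input" | sed '')

# 'p' without -n: each line is printed explicitly and then auto-printed.
doubled_out=$(printf '%s\n' "$input" | sed 'p')

# -n disables auto-print, so only the addressed line 2 is printed.
second_only=$(printf '%s\n' "$input" | sed -n '2p')

printf 'default:\n%s\n' "$default_out"
printf 'doubled:\n%s\n' "$doubled_out"
printf 'second only:\n%s\n' "$second_only"
```

Forgetting @option{-n} while using @code{p} is a common source of duplicated output lines.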
|  | 176 |  | 
|---|
|  | 177 |  | 
|---|
|  | 178 |  | 
|---|
|  | 179 | @cindex multiple files | 
|---|
|  | 180 | @cindex -s, example | 
|---|
|  | 181 | @command{sed} treats multiple input files as one long stream. | 
|---|
|  | 182 | The following example prints the first line of the first file | 
|---|
|  | 183 | (@file{one.txt}) and the last line of the last file (@file{three.txt}). | 
|---|
|  | 184 | Use @option{-s} to treat each file as a separate stream instead. | 
|---|
|  | 185 |  | 
|---|
|  | 186 | @example | 
|---|
|  | 187 | sed -n  '1p ; $p' one.txt two.txt three.txt | 
|---|
|  | 188 | @end example | 
|---|
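The difference between the default single-stream behavior and @option{-s} can be sketched with three small files. The file contents and the temporary directory are assumptions made for illustration; @option{-s} is a GNU extension, so this sketch assumes GNU @command{sed}.

```shell
# Create three two-line sample files in a scratch directory.
dir=$(mktemp -d)
printf 'one: first\none: last\n'     > "$dir/one.txt"
printf 'two: first\ntwo: last\n'     > "$dir/two.txt"
printf 'three: first\nthree: last\n' > "$dir/three.txt"

# Default: the files form one stream, so '1' and '$' each match once.
joined=$(sed -n '1p ; $p' "$dir/one.txt" "$dir/two.txt" "$dir/three.txt")

# With -s: '1' and '$' are evaluated separately for every file.
separate=$(sed -s -n '1p ; $p' "$dir/one.txt" "$dir/two.txt" "$dir/three.txt")

printf 'joined:\n%s\n\nseparate:\n%s\n' "$joined" "$separate"
rm -rf "$dir"
```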
|  | 189 |  | 
|---|
|  | 190 |  | 
|---|
|  | 191 | @cindex -e, example | 
|---|
|  | 192 | @cindex --expression, example | 
|---|
|  | 193 | @cindex -f, example | 
|---|
|  | 194 | @cindex --file, example | 
|---|
|  | 195 | @cindex script parameter | 
|---|
|  | 196 | @cindex parameters, script | 
|---|
|  | 197 | Without @option{-e} or @option{-f} options, @command{sed} uses | 
|---|
|  | 198 | the first non-option parameter as the @var{script}, and the following | 
|---|
|  | 199 | non-option parameters as input files. | 
|---|
|  | 200 | If @option{-e} or @option{-f} options are used to specify a @var{script}, | 
|---|
|  | 201 | all non-option parameters are taken as input files. | 
|---|
|  | 202 | Options @option{-e} and @option{-f} can be combined, and can appear | 
|---|
|  | 203 | multiple times (in which case the final effective @var{script} is the | 
|---|
|  | 204 | concatenation of all the individual @var{script}s). | 
|---|
|  | 205 |  | 
|---|
|  | 206 | The following examples are equivalent: | 
|---|
|  | 207 |  | 
|---|
|  | 208 | @example | 
|---|
|  | 209 | sed 's/hello/world/' input.txt > output.txt | 
|---|
|  | 210 |  | 
|---|
|  | 211 | sed -e 's/hello/world/' input.txt > output.txt | 
|---|
|  | 212 | sed --expression='s/hello/world/' input.txt > output.txt | 
|---|
|  | 213 |  | 
|---|
|  | 214 | echo 's/hello/world/' > myscript.sed | 
|---|
|  | 215 | sed -f myscript.sed input.txt > output.txt | 
|---|
|  | 216 | sed --file=myscript.sed input.txt > output.txt | 
|---|
|  | 217 | @end example | 
|---|
|  | 218 |  | 
|---|
|  | 219 |  | 
|---|
|  | 220 | @node Command-Line Options | 
|---|
|  | 221 | @section Command-Line Options | 
|---|
|  | 222 |  | 
|---|
| [599] | 223 | The full format for invoking @command{sed} is: | 
|---|
|  | 224 |  | 
|---|
|  | 225 | @example | 
|---|
|  | 226 | sed OPTIONS... [SCRIPT] [INPUTFILE...] | 
|---|
|  | 227 | @end example | 
|---|
|  | 228 |  | 
|---|
|  | 229 | @command{sed} may be invoked with the following command-line options: | 
|---|
|  | 230 |  | 
|---|
|  | 231 | @table @code | 
|---|
|  | 232 | @item --version | 
|---|
|  | 233 | @opindex --version | 
|---|
|  | 234 | @cindex Version, printing | 
|---|
|  | 235 | Print out the version of @command{sed} that is being run and a copyright notice, | 
|---|
|  | 236 | then exit. | 
|---|
|  | 237 |  | 
|---|
|  | 238 | @item --help | 
|---|
|  | 239 | @opindex --help | 
|---|
|  | 240 | @cindex Usage summary, printing | 
|---|
|  | 241 | Print a usage message briefly summarizing these command-line options | 
|---|
|  | 242 | and the bug-reporting address, | 
|---|
|  | 243 | then exit. | 
|---|
|  | 244 |  | 
|---|
|  | 245 | @item -n | 
|---|
|  | 246 | @itemx --quiet | 
|---|
|  | 247 | @itemx --silent | 
|---|
|  | 248 | @opindex -n | 
|---|
|  | 249 | @opindex --quiet | 
|---|
|  | 250 | @opindex --silent | 
|---|
|  | 251 | @cindex Disabling autoprint, from command line | 
|---|
|  | 252 | By default, @command{sed} prints out the pattern space | 
|---|
| [3613] | 253 | at the end of each cycle through the script (@pxref{Execution Cycle, , | 
|---|
|  | 254 | How @code{sed} works}). | 
|---|
| [599] | 255 | These options disable this automatic printing, | 
|---|
|  | 256 | and @command{sed} only produces output when explicitly told to | 
|---|
|  | 257 | via the @code{p} command. | 
|---|
|  | 258 |  | 
|---|
| [3613] | 259 | @item --debug | 
|---|
|  | 260 | @opindex --debug | 
|---|
|  | 261 | @cindex @value{SSEDEXT}, debug | 
|---|
|  | 262 | Print the input sed program in canonical form, | 
|---|
|  | 263 | and annotate program execution. | 
|---|
|  | 264 | @codequotebacktick on | 
|---|
|  | 265 | @codequoteundirected on | 
|---|
|  | 266 | @example | 
|---|
|  | 267 | $ echo 1 | sed '\%1%s21232' | 
|---|
|  | 268 | 3 | 
|---|
|  | 269 |  | 
|---|
|  | 270 | $ echo 1 | sed --debug '\%1%s21232' | 
|---|
|  | 271 | SED PROGRAM: | 
|---|
|  | 272 | /1/ s/1/3/ | 
|---|
|  | 273 | INPUT:   'STDIN' line 1 | 
|---|
|  | 274 | PATTERN: 1 | 
|---|
|  | 275 | COMMAND: /1/ s/1/3/ | 
|---|
|  | 276 | PATTERN: 3 | 
|---|
|  | 277 | END-OF-CYCLE: | 
|---|
|  | 278 | 3 | 
|---|
|  | 279 | @end example | 
|---|
|  | 280 | @codequotebacktick off | 
|---|
|  | 281 | @codequoteundirected off | 
|---|
|  | 282 |  | 
|---|
|  | 283 |  | 
|---|
|  | 284 | @item -e @var{script} | 
|---|
|  | 285 | @itemx --expression=@var{script} | 
|---|
|  | 286 | @opindex -e | 
|---|
|  | 287 | @opindex --expression | 
|---|
|  | 288 | @cindex Script, from command line | 
|---|
|  | 289 | Add the commands in @var{script} to the set of commands to be | 
|---|
|  | 290 | run while processing the input. | 
|---|
|  | 291 |  | 
|---|
|  | 292 | @item -f @var{script-file} | 
|---|
|  | 293 | @itemx --file=@var{script-file} | 
|---|
|  | 294 | @opindex -f | 
|---|
|  | 295 | @opindex --file | 
|---|
|  | 296 | @cindex Script, from a file | 
|---|
|  | 297 | Add the commands contained in the file @var{script-file} | 
|---|
|  | 298 | to the set of commands to be run while processing the input. | 
|---|
|  | 299 |  | 
|---|
| [599] | 300 | @item -i[@var{SUFFIX}] | 
|---|
|  | 301 | @itemx --in-place[=@var{SUFFIX}] | 
|---|
|  | 302 | @opindex -i | 
|---|
|  | 303 | @opindex --in-place | 
|---|
|  | 304 | @cindex In-place editing, activating | 
|---|
|  | 305 | @cindex @value{SSEDEXT}, in-place editing | 
|---|
|  | 306 | This option specifies that files are to be edited in-place. | 
|---|
|  | 307 | @value{SSED} does this by creating a temporary file and | 
|---|
|  | 308 | sending output to this file rather than to the standard | 
|---|
|  | 309 | output.@footnote{This applies to commands such as @code{=}, | 
|---|
|  | 310 | @code{a}, @code{c}, @code{i}, @code{l}, @code{p}.  You can | 
|---|
|  | 311 | still write to the standard output by using the @code{w} | 
|---|
|  | 312 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file | 
|---|
|  | 313 | or @code{W} commands together with the @file{/dev/stdout} | 
|---|
|  | 314 | special file}. | 
|---|
|  | 315 |  | 
|---|
|  | 316 | This option implies @option{-s}. | 
|---|
|  | 317 |  | 
|---|
|  | 318 | When the end of the file is reached, the temporary file is | 
|---|
|  | 319 | renamed to the output file's original name.  The extension, | 
|---|
|  | 320 | if supplied, is used to modify the name of the old file | 
|---|
|  | 321 | before renaming the temporary file, thereby making a backup | 
|---|
|  | 322 | copy@footnote{Note that @value{SSED} creates the backup | 
|---|
| [3613] | 323 | file whether or not any output is actually changed.}. | 
|---|
| [599] | 324 |  | 
|---|
|  | 325 | @cindex In-place editing, Perl-style backup file names | 
|---|
|  | 326 | This rule is followed: if the extension doesn't contain a @code{*}, | 
|---|
|  | 327 | then it is appended to the end of the current filename as a | 
|---|
|  | 328 | suffix; if the extension does contain one or more @code{*} | 
|---|
|  | 329 | characters, then @emph{each} asterisk is replaced with the | 
|---|
|  | 330 | current filename.  This allows you to add a prefix to the | 
|---|
|  | 331 | backup file, instead of (or in addition to) a suffix, or | 
|---|
|  | 332 | even to place backup copies of the original files into another | 
|---|
|  | 333 | directory (provided the directory already exists). | 
|---|
|  | 334 |  | 
|---|
|  | 335 | If no extension is supplied, the original file is | 
|---|
|  | 336 | overwritten without making a backup. | 
|---|
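The backup naming rule can be sketched as follows. The file names, the @file{bak} directory, and the @samp{.orig} suffix are assumptions chosen for illustration; the sketch assumes GNU @command{sed} and runs inside a scratch directory so that the relative backup path is unambiguous.

```shell
start_dir=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'hello\n' > notes.txt
printf 'hello\n' > todo.txt
mkdir bak                      # the backup directory must already exist

# No '*' in the extension: it is appended as a suffix (notes.txt.orig).
sed -i.orig 's/hello/world/' notes.txt

# '*' is replaced by the file name: the backup becomes bak/todo.txt.
sed -i'bak/*' 's/hello/world/' todo.txt

# Record the results before cleaning up.
notes_new=$(cat notes.txt)
notes_backup=$(cat notes.txt.orig)
todo_new=$(cat todo.txt)
todo_backup=$(cat bak/todo.txt)

cd "$start_dir" || exit 1
rm -rf "$dir"

printf '%s / %s / %s / %s\n' "$notes_new" "$notes_backup" "$todo_new" "$todo_backup"
```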
|  | 337 |  | 
|---|
| [3613] | 338 | Because @option{-i} takes an optional argument, it should | 
|---|
|  | 339 | not be followed by other short options: | 
|---|
|  | 340 | @table @code | 
|---|
|  | 341 | @item sed -Ei '...' FILE | 
|---|
|  | 342 | Same as @option{-E -i} with no backup suffix: @file{FILE} will be | 
|---|
|  | 343 | edited in-place without creating a backup. | 
|---|
|  | 344 |  | 
|---|
|  | 345 | @item sed -iE '...' FILE | 
|---|
|  | 346 | This is equivalent to @option{--in-place=E}, creating @file{FILEE} as a backup | 
|---|
|  | 347 | of @file{FILE}. | 
|---|
|  | 348 | @end table | 
|---|
|  | 349 |  | 
|---|
|  | 350 | Be cautious of using @option{-n} with @option{-i}: the former disables | 
|---|
|  | 351 | automatic printing of lines and the latter changes the file in-place | 
|---|
|  | 352 | without a backup. If the two are combined carelessly (without an explicit | 
|---|
|  | 353 | @code{p} command), the output file will be empty: | 
|---|
|  | 354 | @codequotebacktick on | 
|---|
|  | 355 | @codequoteundirected on | 
|---|
|  | 356 | @example | 
|---|
|  | 357 | # WRONG USAGE: 'FILE' will be truncated. | 
|---|
|  | 358 | sed -ni 's/foo/bar/' FILE | 
|---|
|  | 359 | @end example | 
|---|
|  | 360 | @codequotebacktick off | 
|---|
|  | 361 | @codequoteundirected off | 
|---|
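The pitfall, and one possible safer variant, can be sketched on scratch files. The file names and the @samp{.bak} suffix are illustrative assumptions; the safer variant keeps a backup and adds the @code{p} flag so that substituted lines are still written.

```shell
dir=$(mktemp -d)
printf 'foo\nbaz\n' > "$dir/careless.txt"
printf 'foo\nbaz\n' > "$dir/careful.txt"

# -n suppresses auto-print and nothing prints explicitly,
# so the rewritten file ends up empty and no backup exists.
sed -n -i 's/foo/bar/' "$dir/careless.txt"
careless_new=$(cat "$dir/careless.txt")

# Keep a backup (-i.bak) and print substituted lines with the 'p' flag.
# Only lines where the substitution succeeded are kept in the file.
sed -n -i.bak 's/foo/bar/p' "$dir/careful.txt"
careful_new=$(cat "$dir/careful.txt")
careful_backup=$(cat "$dir/careful.txt.bak")

rm -rf "$dir"
printf 'careless: [%s]\ncareful: [%s]\n' "$careless_new" "$careful_new"
```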
|  | 362 |  | 
|---|
| [599] | 363 | @item -l @var{N} | 
|---|
|  | 364 | @itemx --line-length=@var{N} | 
|---|
|  | 365 | @opindex -l | 
|---|
|  | 366 | @opindex --line-length | 
|---|
|  | 367 | @cindex Line length, setting | 
|---|
|  | 368 | Specify the default line-wrap length for the @code{l} command. | 
|---|
|  | 369 | A length of 0 (zero) means to never wrap long lines.  If | 
|---|
|  | 370 | not specified, it is taken to be 70. | 
|---|
|  | 371 |  | 
|---|
|  | 372 | @item --posix | 
|---|
| [3613] | 373 | @opindex --posix | 
|---|
| [599] | 374 | @cindex @value{SSEDEXT}, disabling | 
|---|
| [3613] | 375 | @value{SSED} includes several extensions to POSIX | 
|---|
| [599] | 376 | sed.  In order to simplify writing portable scripts, this | 
|---|
|  | 377 | option disables all the extensions that this manual documents, | 
|---|
|  | 378 | including additional commands. | 
|---|
|  | 379 | @cindex @code{POSIXLY_CORRECT} behavior, enabling | 
|---|
|  | 380 | Most of the extensions accept @command{sed} programs that | 
|---|
| [3613] | 381 | are outside the syntax mandated by POSIX, but some | 
|---|
| [599] | 382 | of them (such as the behavior of the @command{N} command | 
|---|
| [3613] | 383 | described in @ref{Reporting Bugs}) actually violate the | 
|---|
| [599] | 384 | standard.  If you want to disable only the latter kind of | 
|---|
|  | 385 | extension, you can set the @code{POSIXLY_CORRECT} variable | 
|---|
|  | 386 | to a non-empty value. | 
|---|
|  | 387 |  | 
|---|
| [3613] | 388 | @item -b | 
|---|
|  | 389 | @itemx --binary | 
|---|
|  | 390 | @opindex -b | 
|---|
|  | 391 | @opindex --binary | 
|---|
|  | 392 | This option is available on every platform, but is only effective where the | 
|---|
|  | 393 | operating system makes a distinction between text files and binary files. | 
|---|
|  | 394 | When such a distinction is made---as is the case for MS-DOS, Windows, | 
|---|
|  | 395 | Cygwin---text files are composed of lines separated by a carriage return | 
|---|
|  | 396 | @emph{and} a line feed character, and @command{sed} does not see the | 
|---|
|  | 397 | ending CR.  When this option is specified, @command{sed} will open | 
|---|
|  | 398 | input files in binary mode, thus not requesting this special processing | 
|---|
|  | 399 | and considering lines to end at a line feed. | 
|---|
|  | 400 |  | 
|---|
|  | 401 | @item --follow-symlinks | 
|---|
|  | 402 | @opindex --follow-symlinks | 
|---|
|  | 403 | This option is available only on platforms that support | 
|---|
|  | 404 | symbolic links and has an effect only if option @option{-i} | 
|---|
|  | 405 | is specified.  In this case, if the file that is specified | 
|---|
|  | 406 | on the command line is a symbolic link, @command{sed} will | 
|---|
|  | 407 | follow the link and edit the ultimate destination of the | 
|---|
|  | 408 | link.  The default behavior is to break the symbolic link, | 
|---|
|  | 409 | so that the link destination will not be modified. | 
|---|
|  | 410 |  | 
|---|
|  | 411 | @item -E | 
|---|
|  | 412 | @itemx -r | 
|---|
| [599] | 413 | @itemx --regexp-extended | 
|---|
| [3613] | 414 | @opindex -E | 
|---|
| [599] | 415 | @opindex -r | 
|---|
|  | 416 | @opindex --regexp-extended | 
|---|
|  | 417 | @cindex Extended regular expressions, choosing | 
|---|
| [3613] | 418 | @cindex GNU extensions, extended regular expressions | 
|---|
| [599] | 419 | Use extended regular expressions rather than basic | 
|---|
|  | 420 | regular expressions.  Extended regexps are those that | 
|---|
|  | 421 | @command{egrep} accepts; they can be clearer because they | 
|---|
| [3613] | 422 | usually have fewer backslashes. | 
|---|
|  | 423 | Historically this was a GNU extension, | 
|---|
|  | 424 | but the @option{-E} | 
|---|
|  | 425 | extension has since been added to the POSIX standard | 
|---|
|  | 426 | (@url{http://austingroupbugs.net/view.php?id=528}), | 
|---|
|  | 427 | so use @option{-E} for portability. | 
|---|
|  | 428 | GNU sed has accepted @option{-E} as an undocumented option for years, | 
|---|
|  | 429 | and *BSD seds have accepted @option{-E} for years as well, | 
|---|
|  | 430 | but scripts that use @option{-E} might not port to other older systems. | 
|---|
|  | 431 | @xref{ERE syntax, , Extended regular expressions}. | 
|---|
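As a small illustration of the ``fewer backslashes'' point, the two commands below perform the same swap of two words, once with a basic and once with an extended regular expression. The sample input and variable names are assumptions made for this sketch.

```shell
# Basic regular expression: grouping parentheses must be escaped.
bre_out=$(printf 'aaa bbb\n' | sed 's/\(a*\) \(b*\)/\2 \1/')

# Extended regular expression (-E): parentheses group without escaping.
ere_out=$(printf 'aaa bbb\n' | sed -E 's/(a*) (b*)/\2 \1/')

printf 'BRE: %s\nERE: %s\n' "$bre_out" "$ere_out"
```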
| [599] | 432 |  | 
|---|
|  | 433 |  | 
|---|
|  | 434 | @item -s | 
|---|
|  | 435 | @itemx --separate | 
|---|
| [3613] | 436 | @opindex -s | 
|---|
|  | 437 | @opindex --separate | 
|---|
| [599] | 438 | @cindex Working on separate files | 
|---|
|  | 439 | By default, @command{sed} will consider the files specified on the | 
|---|
|  | 440 | command line as a single continuous long stream.  This @value{SSED} | 
|---|
|  | 441 | extension allows the user to consider them as separate files: | 
|---|
|  | 442 | range addresses (such as @samp{/abc/,/def/}) are not allowed | 
|---|
|  | 443 | to span several files, line numbers are relative to the start | 
|---|
|  | 444 | of each file, @code{$} refers to the last line of each file, | 
|---|
|  | 445 | and files invoked from the @code{R} commands are rewound at the | 
|---|
|  | 446 | start of each file. | 
|---|
|  | 447 |  | 
|---|
| [3613] | 448 | @item --sandbox | 
|---|
|  | 449 | @opindex --sandbox | 
|---|
|  | 450 | @cindex Sandbox mode | 
|---|
|  | 451 | In sandbox mode, the @code{e}, @code{w} and @code{r} commands are rejected: | 
|---|
|  | 452 | programs containing them are aborted without being run. Sandbox mode ensures @command{sed} | 
|---|
|  | 453 | operates only on the input files designated on the command line, and | 
|---|
|  | 454 | cannot run external programs. | 
|---|
|  | 455 |  | 
|---|
|  | 456 |  | 
|---|
| [599] | 457 | @item -u | 
|---|
|  | 458 | @itemx --unbuffered | 
|---|
|  | 459 | @opindex -u | 
|---|
|  | 460 | @opindex --unbuffered | 
|---|
|  | 461 | @cindex Unbuffered I/O, choosing | 
|---|
|  | 462 | Buffer both input and output as minimally as practical. | 
|---|
|  | 463 | (This is particularly useful if the input is coming from | 
|---|
|  | 464 | the likes of @samp{tail -f}, and you wish to see the transformed | 
|---|
|  | 465 | output as soon as possible.) | 
|---|
|  | 466 |  | 
|---|
| [3613] | 467 | @item -z | 
|---|
|  | 468 | @itemx --null-data | 
|---|
|  | 469 | @itemx --zero-terminated | 
|---|
|  | 470 | @opindex -z | 
|---|
|  | 471 | @opindex --null-data | 
|---|
|  | 472 | @opindex --zero-terminated | 
|---|
|  | 473 | Treat the input as a set of lines, each terminated by a zero byte | 
|---|
|  | 474 | (the ASCII @samp{NUL} character) instead of a newline.  This option can | 
|---|
|  | 475 | be used with commands like @samp{sort -z} and @samp{find -print0} | 
|---|
|  | 476 | to process arbitrary file names. | 
|---|
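A minimal sketch of @option{-z} processing follows, assuming GNU @command{sed}. The input holds two NUL-terminated records, the second of which contains an embedded newline; @command{tr} is used only to make the NUL bytes visible as @samp{|}. The sample data is an illustrative assumption.

```shell
# Two NUL-terminated records; the second one spans two text lines.
# With -z, '^' matches only at the start of each record, not after
# the embedded newline.
marked=$(printf 'one\000two\nlines\000' | sed -z 's/^/>/' | tr '\000' '|')

printf '%s\n' "$marked"
```

Because the newline inside the second record is ordinary data in this mode, the substitution is applied twice in total, once per record.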
| [599] | 477 | @end table | 
|---|
|  | 478 |  | 
|---|
|  | 479 | If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file} | 
|---|
|  | 480 | options are given on the command-line, | 
|---|
|  | 481 | then the first non-option argument on the command line is | 
|---|
|  | 482 | taken to be the @var{script} to be executed. | 
|---|
|  | 483 |  | 
|---|
|  | 484 | @cindex Files to be processed as input | 
|---|
|  | 485 | If any command-line parameters remain after processing the above, | 
|---|
|  | 486 | these parameters are interpreted as the names of input files to | 
|---|
|  | 487 | be processed. | 
|---|
|  | 488 | @cindex Standard input, processing as input | 
|---|
|  | 489 | A file name of @samp{-} refers to the standard input stream. | 
|---|
|  | 490 | The standard input will be processed if no file names are specified. | 
|---|
|  | 491 |  | 
|---|
| [3613] | 492 | @node Exit status | 
|---|
|  | 493 | @section Exit status | 
|---|
|  | 494 | @cindex exit status | 
|---|
|  | 495 | An exit status of zero indicates success, and a nonzero value | 
|---|
|  | 496 | indicates failure. @value{SSED} returns the following exit status | 
|---|
|  | 497 | values: | 
|---|
| [599] | 498 |  | 
|---|
| [3613] | 499 | @table @asis | 
|---|
|  | 500 | @item 0 | 
|---|
|  | 501 | Successful completion. | 
|---|
| [599] | 502 |  | 
|---|
| [3613] | 503 | @item 1 | 
|---|
|  | 504 | Invalid command, invalid syntax, invalid regular expression or a | 
|---|
|  | 505 | @value{SSED} extension command used with @option{--posix}. | 
|---|
| [599] | 506 |  | 
|---|
| [3613] | 507 | @item 2 | 
|---|
|  | 508 | One or more of the input files specified on the command line could not be | 
|---|
|  | 509 | opened (e.g. if a file is not found, or read permission is denied). | 
|---|
|  | 510 | Processing continued with other files. | 
|---|
| [599] | 511 |  | 
|---|
| [3613] | 512 | @item 4 | 
|---|
|  | 513 | An I/O error or a serious processing error occurred at run time; | 
|---|
|  | 514 | @value{SSED} aborted immediately. | 
|---|
|  | 515 | @end table | 
|---|
|  | 516 |  | 
|---|
|  | 517 | @cindex Q, example | 
|---|
|  | 518 | @cindex exit status, example | 
|---|
|  | 519 | Additionally, the commands @code{q} and @code{Q} can be used to terminate | 
|---|
|  | 520 | @command{sed} with a custom exit code value (this is a @value{SSED} extension): | 
|---|
|  | 521 |  | 
|---|
|  | 522 | @example | 
|---|
|  | 523 | $ echo | sed 'Q42' ; echo $? | 
|---|
|  | 524 | 42 | 
|---|
|  | 525 | @end example | 
|---|
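The exit status values listed above can be observed with a short sketch, assuming GNU @command{sed}. The scratch file names and the unknown command @samp{k} are illustrative assumptions; each status is captured with @samp{||} so the sketch also works in shells running with @samp{set -e}.

```shell
dir=$(mktemp -d)
printf 'data\n' > "$dir/ok.txt"

# Successful run: status 0.
status_ok=0
sed -n 'p' "$dir/ok.txt" > /dev/null || status_ok=$?

# 'k' is not a sed command, so the script is rejected: status 1.
status_bad_script=0
sed 'k' "$dir/ok.txt" > /dev/null 2>&1 || status_bad_script=$?

# One input file is missing; the other is still processed: status 2.
status_missing=0
sed -n 'p' "$dir/missing.txt" "$dir/ok.txt" > /dev/null 2>&1 || status_missing=$?

# Custom exit code supplied to the 'q' command (a GNU extension).
status_custom=0
printf 'data\n' | sed 'q5' > /dev/null || status_custom=$?

rm -rf "$dir"
printf '%s %s %s %s\n' "$status_ok" "$status_bad_script" "$status_missing" "$status_custom"
```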
|  | 526 |  | 
|---|
|  | 527 |  | 
|---|
|  | 528 | @node sed scripts | 
|---|
|  | 529 | @chapter @command{sed} scripts | 
|---|
|  | 530 |  | 
|---|
|  | 531 |  | 
|---|
| [599] | 532 | @menu | 
|---|
| [3613] | 533 | * sed script overview::      @command{sed} script overview | 
|---|
|  | 534 | * sed commands list::        @command{sed} commands summary | 
|---|
|  | 535 | * The "s" Command::          @command{sed}'s Swiss Army Knife | 
|---|
| [599] | 536 | * Common Commands::          Often used commands | 
|---|
|  | 537 | * Other Commands::           Less frequently used commands | 
|---|
|  | 538 | * Programming Commands::     Commands for @command{sed} gurus | 
|---|
|  | 539 | * Extended Commands::        Commands specific to @value{SSED} | 
|---|
| [3613] | 540 | * Multiple commands syntax:: Extension for easier scripting | 
|---|
| [599] | 541 | @end menu | 
|---|
|  | 542 |  | 
|---|
| [3613] | 543 | @node sed script overview | 
|---|
|  | 544 | @section @command{sed} script overview | 
|---|
| [599] | 545 |  | 
|---|
| [3613] | 546 | @cindex @command{sed} script structure | 
|---|
|  | 547 | @cindex Script structure | 
|---|
| [599] | 548 |  | 
|---|
| [3613] | 549 | A @command{sed} program consists of one or more @command{sed} commands, | 
|---|
|  | 550 | passed in by one or more of the | 
|---|
|  | 551 | @option{-e}, @option{-f}, @option{--expression}, and @option{--file} | 
|---|
|  | 552 | options, or the first non-option argument if none of these | 
|---|
|  | 553 | options is used. | 
|---|
|  | 554 | This document will refer to ``the'' @command{sed} script; | 
|---|
|  | 555 | this is understood to mean the in-order concatenation | 
|---|
|  | 556 | of all of the @var{script}s and @var{script-file}s passed in. | 
|---|
|  | 557 | @xref{Overview}. | 
|---|
| [599] | 558 |  | 
|---|
|  | 559 |  | 
|---|
| [3613] | 560 | @cindex @command{sed} commands syntax | 
|---|
|  | 561 | @cindex syntax, @command{sed} commands | 
|---|
|  | 562 | @cindex addresses, syntax | 
|---|
|  | 563 | @cindex syntax, addresses | 
|---|
|  | 564 | @command{sed} commands follow this syntax: | 
|---|
| [599] | 565 |  | 
|---|
| [3613] | 566 | @example | 
|---|
|  | 567 | [addr]@var{X}[options] | 
|---|
|  | 568 | @end example | 
|---|
| [599] | 569 |  | 
|---|
| [3613] | 570 | @var{X} is a single-letter @command{sed} command. | 
|---|
|  | 571 | @c TODO: add @pxref{commands} when there is a command-list section. | 
|---|
|  | 572 | @code{[addr]} is an optional line address. If @code{[addr]} is specified, | 
|---|
|  | 573 | the command @var{X} will be executed only on the matched lines. | 
|---|
|  | 574 | @code{[addr]} can be a single line number, a regular expression, | 
|---|
|  | 575 | or a range of lines (@pxref{sed addresses}). | 
|---|
|  | 576 | Additional @code{[options]} are used for some @command{sed} commands. | 
|---|
| [599] | 577 |  | 
|---|
| [3613] | 578 | @cindex @command{d}, example | 
|---|
|  | 579 | @cindex address range, example | 
|---|
|  | 580 | @cindex example, address range | 
|---|
|  | 581 | The following example deletes lines 30 to 35 in the input. | 
|---|
|  | 582 | @code{30,35} is an address range. @command{d} is the delete command: | 
|---|
| [599] | 583 |  | 
|---|
| [3613] | 584 | @example | 
|---|
|  | 585 | sed '30,35d' input.txt > output.txt | 
|---|
|  | 586 | @end example | 
|---|
| [599] | 587 |  | 
|---|
| [3613] | 588 | @cindex @command{q}, example | 
|---|
|  | 589 | @cindex regular expression, example | 
|---|
|  | 590 | @cindex example, regular expression | 
|---|
|  | 591 | The following example prints all input until a line | 
|---|
|  | 592 | starting with the string @samp{foo} is found. If such a line is found, | 
|---|
|  | 593 | @command{sed} will terminate with exit status 42. | 
|---|
|  | 594 | If no such line is found (and no other error occurred), @command{sed} | 
|---|
|  | 595 | will exit with status 0. | 
|---|
|  | 596 | @code{/^foo/} is a regular-expression address. | 
|---|
|  | 597 | @command{q} is the quit command. @code{42} is the command option. | 
|---|
| [599] | 598 |  | 
|---|
| [3613] | 599 | @example | 
|---|
|  | 600 | sed '/^foo/q42' input.txt > output.txt | 
|---|
|  | 601 | @end example | 
|---|
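Both outcomes of the @samp{/^foo/q42} script can be sketched with in-line sample input, assuming GNU @command{sed} (the exit code argument to @code{q} is a GNU extension). The sample lines and variable names are assumptions made for illustration.

```shell
# A line starting with 'foo' is present: sed prints up to and including
# that line, then quits with status 42.
status_hit=0
printed_hit=$(printf 'alpha\nfoo bar\nomega\n' | sed '/^foo/q42') || status_hit=$?

# No matching line: all input is printed and the status is 0.
status_miss=0
printed_miss=$(printf 'alpha\nomega\n' | sed '/^foo/q42') || status_miss=$?

printf 'hit (%s):\n%s\n\nmiss (%s):\n%s\n' \
  "$status_hit" "$printed_hit" "$status_miss" "$printed_miss"
```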
| [599] | 602 |  | 
|---|
|  | 603 |  | 
|---|
| [3613] | 604 | @cindex multiple @command{sed} commands | 
|---|
|  | 605 | @cindex @command{sed} commands, multiple | 
|---|
|  | 606 | @cindex newline, command separator | 
|---|
|  | 607 | @cindex semicolons, command separator | 
|---|
|  | 608 | @cindex ;, command separator | 
|---|
|  | 609 | @cindex -e, example | 
|---|
|  | 610 | @cindex -f, example | 
|---|
|  | 611 | Commands within a @var{script} or @var{script-file} can be | 
|---|
|  | 612 | separated by semicolons (@code{;}) or newlines (ASCII 10). | 
|---|
|  | 613 | Multiple scripts can be specified with @option{-e} or @option{-f} | 
|---|
|  | 614 | options. | 
|---|
| [599] | 615 |  | 
|---|
| [3613] | 616 | The following examples are all equivalent. They perform two @command{sed} | 
|---|
|  | 617 | operations: deleting any lines matching the regular expression @code{/^foo/}, | 
|---|
|  | 618 | and replacing all occurrences of the string @samp{hello} with @samp{world}: | 
|---|
| [599] | 619 |  | 
|---|
| [3613] | 620 | @example | 
|---|
|  | 621 | sed '/^foo/d ; s/hello/world/g' input.txt > output.txt | 
|---|
| [599] | 622 |  | 
|---|
| [3613] | 623 | sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt | 
|---|
| [599] | 624 |  | 
|---|
| [3613] | 625 | echo '/^foo/d' > script.sed | 
|---|
|  | 626 | echo 's/hello/world/g' >> script.sed | 
|---|
|  | 627 | sed -f script.sed input.txt > output.txt | 
|---|
| [599] | 628 |  | 
|---|
| [3613] | 629 | echo 's/hello/world/g' > script2.sed | 
|---|
|  | 630 | sed -e '/^foo/d' -f script2.sed input.txt > output.txt | 
|---|
|  | 631 | @end example | 
|---|
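The examples above separate commands with a semicolon or with separate
options. The newline separator can also be used directly inside a quoted
script. The following sketch compares the two forms on a small,
hypothetical input file and assumes nothing beyond what is described above.

```shell
# Hypothetical sample input; any text file would do.
tmp=$(mktemp -d)
printf 'foo first\nhello hello\nbar hello\n' > "$tmp/input.txt"

# Commands separated by a semicolon.
out_semicolon=$(sed '/^foo/d ; s/hello/world/g' "$tmp/input.txt")

# The same two commands separated by a newline inside the quoted script.
out_newline=$(sed '/^foo/d
s/hello/world/g' "$tmp/input.txt")

printf '%s\n' "$out_semicolon"
```

Both variables should hold the same two lines: the @samp{foo} line is
deleted and every @samp{hello} is replaced.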
| [599] | 632 |  | 
|---|
|  | 633 |  | 
|---|
| [3613] | 634 | @cindex @command{a}, and semicolons | 
|---|
|  | 635 | @cindex @command{c}, and semicolons | 
|---|
|  | 636 | @cindex @command{i}, and semicolons | 
|---|
|  | 637 | Because of their syntax, the @command{a}, @command{c}, and @command{i} | 
|---|
|  | 638 | commands cannot be followed by a semicolon acting as a command separator. | 
|---|
|  | 639 | They should therefore be terminated | 
|---|
|  | 640 | with newlines or be placed at the end of a @var{script} or @var{script-file}. | 
|---|
|  | 641 | Commands can also be preceded by optional non-significant | 
|---|
|  | 642 | whitespace characters. | 
|---|
|  | 643 | @xref{Multiple commands syntax}. | 
|---|
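The effect of a semicolon after @command{a} can be seen in a short
experiment. This sketch assumes GNU @command{sed}, since it uses the
one-line @samp{a @var{text}} form; the sample text is arbitrary.

```shell
# Assumes GNU sed (the one-line "a text" form is a GNU extension).

# The semicolon does not end the a command: everything up to the end of
# the line, including "; s/1/X/", becomes the appended text.
swallowed=$(seq 2 | sed '1a hello ; s/1/X/')

# Ending the a command with the end of its -e script lets the
# s command run as a separate command.
separated=$(seq 2 | sed -e '1a hello' -e 's/1/X/')

printf '%s\n---\n%s\n' "$swallowed" "$separated"
```

In the first case the substitution never runs; in the second, line 1
becomes @samp{X} and @samp{hello} is appended after it.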
| [599] | 644 |  | 
|---|
|  | 645 |  | 
|---|
|  | 646 |  | 
|---|
| [3613] | 647 | @node sed commands list | 
|---|
|  | 648 | @section @command{sed} commands summary | 
|---|
| [599] | 649 |  | 
|---|
| [3613] | 650 | The following commands are supported in @value{SSED}. | 
|---|
|  | 651 | Some are standard POSIX commands, while others are @value{SSEDEXT}. | 
|---|
|  | 652 | Details and examples for each command are in the following sections. | 
|---|
|  | 653 | (Mnemonics) are shown in parentheses. | 
|---|
| [599] | 654 |  | 
|---|
|  | 655 | @table @code | 
|---|
|  | 656 |  | 
|---|
| [3613] | 657 | @item a\ | 
|---|
|  | 658 | @itemx @var{text} | 
|---|
|  | 659 | Append @var{text} after a line. | 
|---|
| [599] | 660 |  | 
|---|
| [3613] | 661 | @item a @var{text} | 
|---|
|  | 662 | Append @var{text} after a line (alternative syntax). | 
|---|
| [599] | 663 |  | 
|---|
| [3613] | 664 | @item b @var{label} | 
|---|
|  | 665 | Branch unconditionally to @var{label}. | 
|---|
|  | 666 | The @var{label} may be omitted, in which case the next cycle is started. | 
|---|
| [599] | 667 |  | 
|---|
| [3613] | 668 | @item c\ | 
|---|
|  | 669 | @itemx @var{text} | 
|---|
|  | 670 | Replace (change) lines with @var{text}. | 
|---|
| [599] | 671 |  | 
|---|
| [3613] | 672 | @item c @var{text} | 
|---|
|  | 673 | Replace (change) lines with @var{text} (alternative syntax). | 
|---|
| [599] | 674 |  | 
|---|
| [3613] | 675 | @item d | 
|---|
|  | 676 | Delete the pattern space; | 
|---|
|  | 677 | immediately start next cycle. | 
|---|
| [599] | 678 |  | 
|---|
| [3613] | 679 | @item D | 
|---|
|  | 680 | If pattern space contains newlines, delete text in the pattern | 
|---|
|  | 681 | space up to the first newline, and restart cycle with the resultant | 
|---|
|  | 682 | pattern space, without reading a new line of input. | 
|---|
| [599] | 683 |  | 
|---|
| [3613] | 684 | If pattern space contains no newline, start a normal new cycle as if | 
|---|
|  | 685 | the @code{d} command was issued. | 
|---|
|  | 686 | @c TODO: add a section about D+N and D+n commands | 
|---|
| [599] | 687 |  | 
|---|
| [3613] | 688 | @item e | 
|---|
|  | 689 | Executes the command that is found in pattern space and | 
|---|
|  | 690 | replaces the pattern space with the output; a trailing newline | 
|---|
|  | 691 | is suppressed. | 
|---|
| [599] | 692 |  | 
|---|
| [3613] | 693 | @item e @var{command} | 
|---|
|  | 694 | Executes @var{command} and sends its output to the output stream. | 
|---|
|  | 695 | The command can run across multiple lines, all but the last ending with | 
|---|
|  | 696 | a backslash. | 
|---|
| [599] | 697 |  | 
|---|
| [3613] | 698 | @item F | 
|---|
|  | 699 | (filename) Print the file name of the current input file (with a trailing | 
|---|
|  | 700 | newline). | 
|---|
| [599] | 701 |  | 
|---|
| [3613] | 702 | @item g | 
|---|
|  | 703 | Replace the contents of the pattern space with the contents of the hold space. | 
|---|
| [599] | 704 |  | 
|---|
| [3613] | 705 | @item G | 
|---|
|  | 706 | Append a newline to the contents of the pattern space, | 
|---|
|  | 707 | and then append the contents of the hold space to that of the pattern space. | 
|---|
| [599] | 708 |  | 
|---|
| [3613] | 709 | @item h | 
|---|
|  | 710 | (hold) Replace the contents of the hold space with the contents of the | 
|---|
|  | 711 | pattern space. | 
|---|
| [599] | 712 |  | 
|---|
| [3613] | 713 | @item H | 
|---|
|  | 714 | Append a newline to the contents of the hold space, | 
|---|
|  | 715 | and then append the contents of the pattern space to that of the hold space. | 
|---|
| [599] | 716 |  | 
|---|
| [3613] | 717 | @item i\ | 
|---|
|  | 718 | @itemx @var{text} | 
|---|
|  | 719 | Insert @var{text} before a line. | 
|---|
| [599] | 720 |  | 
|---|
| [3613] | 721 | @item i @var{text} | 
|---|
|  | 722 | Insert @var{text} before a line (alternative syntax). | 
|---|
| [599] | 723 |  | 
|---|
| [3613] | 724 | @item l | 
|---|
|  | 725 | Print the pattern space in an unambiguous form. | 
|---|
| [599] | 726 |  | 
|---|
| [3613] | 727 | @item n | 
|---|
|  | 728 | (next) If auto-print is not disabled, print the pattern space, | 
|---|
|  | 729 | then, regardless, replace the pattern space with the next line of input. | 
|---|
|  | 730 | If there is no more input then @command{sed} exits without processing | 
|---|
|  | 731 | any more commands. | 
|---|
| [599] | 732 |  | 
|---|
| [3613] | 733 | @item N | 
|---|
|  | 734 | Add a newline to the pattern space, | 
|---|
|  | 735 | then append the next line of input to the pattern space. | 
|---|
|  | 736 | If there is no more input then @command{sed} exits without processing | 
|---|
|  | 737 | any more commands. | 
|---|
| [599] | 738 |  | 
|---|
| [3613] | 739 | @item p | 
|---|
|  | 740 | Print the pattern space. | 
|---|
|  | 741 | @c useful with @option{-n} | 
|---|
| [599] | 742 |  | 
|---|
| [3613] | 743 | @item P | 
|---|
|  | 744 | Print the pattern space, up to the first newline. | 
|---|
| [599] | 745 |  | 
|---|
| [3613] | 746 | @item q@var{[exit-code]} | 
|---|
|  | 747 | (quit) Exit @command{sed} without processing any more commands or input. | 
|---|
| [599] | 748 |  | 
|---|
| [3613] | 749 | @item Q@var{[exit-code]} | 
|---|
|  | 750 | (quit) This command is the same as @code{q}, but will not print the | 
|---|
|  | 751 | contents of pattern space.  Like @code{q}, it provides the | 
|---|
|  | 752 | ability to return an exit code to the caller. | 
|---|
|  | 753 | @c useful to quit on a conditional without printing | 
|---|
| [599] | 754 |  | 
|---|
| [3613] | 755 | @item r @var{filename} | 
|---|
|  | 756 | Queue the contents of @var{filename} for output at the end of the current cycle. | 
|---|
| [599] | 757 |  | 
|---|
| [3613] | 758 | @item R @var{filename} | 
|---|
|  | 759 | Queue a line of @var{filename} to be read and | 
|---|
|  | 760 | inserted into the output stream at the end of the current cycle, | 
|---|
|  | 761 | or when the next input line is read. | 
|---|
|  | 762 | @c useful to interleave files | 
|---|
| [599] | 763 |  | 
|---|
| [3613] | 764 | @item s@var{/regexp/replacement/[flags]} | 
|---|
|  | 765 | (substitute) Match the regular-expression against the content of the | 
|---|
|  | 766 | pattern space.  If found, replace matched string with | 
|---|
|  | 767 | @var{replacement}. | 
|---|
| [599] | 768 |  | 
|---|
| [3613] | 769 | @item t @var{label} | 
|---|
|  | 770 | (test) Branch to @var{label} only if there has been a successful | 
|---|
|  | 771 | @code{s}ubstitution since the last input line was read or conditional | 
|---|
|  | 772 | branch was taken.  The @var{label} may be omitted, in which case the | 
|---|
|  | 773 | next cycle is started. | 
|---|
| [599] | 774 |  | 
|---|
| [3613] | 775 | @item T @var{label} | 
|---|
|  | 776 | (test) Branch to @var{label} only if there have been no successful | 
|---|
|  | 777 | @code{s}ubstitutions since the last input line was read or | 
|---|
|  | 778 | conditional branch was taken. The @var{label} may be omitted, | 
|---|
|  | 779 | in which case the next cycle is started. | 
|---|
| [599] | 780 |  | 
|---|
| [3613] | 781 | @item v @var{[version]} | 
|---|
|  | 782 | (version) This command does nothing, but makes @command{sed} fail if | 
|---|
|  | 783 | @value{SSED} extensions are not supported, or if the requested version | 
|---|
|  | 784 | is not available. | 
|---|
| [599] | 785 |  | 
|---|
| [3613] | 786 | @item w @var{filename} | 
|---|
|  | 787 | Write the pattern space to @var{filename}. | 
|---|
| [599] | 788 |  | 
|---|
| [3613] | 789 | @item W @var{filename} | 
|---|
|  | 790 | Write to @var{filename} the portion of the pattern space up to | 
|---|
|  | 791 | the first newline. | 
|---|
| [599] | 792 |  | 
|---|
| [3613] | 793 | @item x | 
|---|
|  | 794 | Exchange the contents of the hold and pattern spaces. | 
|---|
| [599] | 795 |  | 
|---|
|  | 796 |  | 
|---|
| [3613] | 797 | @item y/@var{source-chars}/@var{dest-chars}/ | 
|---|
|  | 798 | Transliterate any characters in the pattern space which match | 
|---|
|  | 799 | any of the @var{source-chars} with the corresponding character | 
|---|
|  | 800 | in @var{dest-chars}. | 
|---|
| [599] | 801 |  | 
|---|
|  | 802 |  | 
|---|
| [3613] | 803 | @item z | 
|---|
|  | 804 | (zap) This command empties the content of pattern space. | 
|---|
| [599] | 805 |  | 
|---|
|  | 806 | @item # | 
|---|
| [3613] | 807 | A comment, until the next newline. | 
|---|
| [599] | 808 |  | 
|---|
|  | 809 |  | 
|---|
| [3613] | 810 | @item @{ @var{cmd ; cmd ...} @} | 
|---|
|  | 811 | Group several commands together. | 
|---|
|  | 812 | @c useful for multiple commands on same address | 
|---|
| [599] | 813 |  | 
|---|
| [3613] | 814 | @item = | 
|---|
|  | 815 | Print the current input line number (with a trailing newline). | 
|---|
| [599] | 816 |  | 
|---|
| [3613] | 817 | @item : @var{label} | 
|---|
|  | 818 | Specify the location of @var{label} for branch commands (@code{b}, | 
|---|
|  | 819 | @code{t}, @code{T}). | 
|---|
| [599] | 820 |  | 
|---|
| [3613] | 821 | @end table | 
|---|
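As one illustration of how several of the commands summarized above
combine, the following sketch reverses the order of its input lines
using the hold space. It is a well-known idiom rather than anything
specific to this manual, and the three-line input is arbitrary.

```shell
# Reverse the order of input lines using the hold space.
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space to the hold space
#   $!d  on every line but the last, delete the pattern space (nothing is printed)
reversed=$(seq 3 | sed '1!G;h;$!d')
printf '%s\n' "$reversed"
```

Only the last cycle reaches the end of the script without @code{d}, so
the accumulated lines are printed once, in reverse order.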
| [599] | 822 |  | 
|---|
|  | 823 |  | 
|---|
|  | 824 | @node The "s" Command | 
|---|
|  | 825 | @section The @code{s} Command | 
|---|
|  | 826 |  | 
|---|
| [3613] | 827 | The @code{s} command (as in substitute) is probably the most important | 
|---|
|  | 828 | in @command{sed} and has a lot of different options.  The syntax of | 
|---|
|  | 829 | the @code{s} command is | 
|---|
|  | 830 | @samp{s/@var{regexp}/@var{replacement}/@var{flags}}. | 
|---|
| [599] | 831 |  | 
|---|
| [3613] | 832 | Its basic concept is simple: the @code{s} command attempts to match | 
|---|
|  | 833 | the pattern space against the supplied regular expression @var{regexp}; | 
|---|
|  | 834 | if the match is successful, then that portion of the | 
|---|
|  | 835 | pattern space which was matched is replaced with @var{replacement}. | 
|---|
| [599] | 836 |  | 
|---|
| [3613] | 837 | For details about @var{regexp} syntax, see @ref{Regexp Addresses,,Regular | 
|---|
|  | 838 | Expression Addresses}. | 
|---|
|  | 839 |  | 
|---|
| [599] | 840 | @cindex Backreferences, in regular expressions | 
|---|
|  | 841 | @cindex Parenthesized substrings | 
|---|
|  | 842 | The @var{replacement} can contain @code{\@var{n}} (@var{n} being | 
|---|
|  | 843 | a number from 1 to 9, inclusive) references, which refer to | 
|---|
|  | 844 | the portion of the match which is contained between the @var{n}th | 
|---|
|  | 845 | @code{\(} and its matching @code{\)}. | 
|---|
|  | 846 | Also, the @var{replacement} can contain unescaped @code{&} | 
|---|
|  | 847 | characters which reference the whole matched portion | 
|---|
|  | 848 | of the pattern space. | 
|---|
| [3613] | 849 |  | 
|---|
|  | 850 | @c TODO: xref to backreference section mention @var{\'}. | 
|---|
|  | 851 |  | 
|---|
|  | 852 | The @code{/} | 
|---|
|  | 853 | characters may be uniformly replaced by any other single | 
|---|
|  | 854 | character within any given @code{s} command.  The @code{/} | 
|---|
|  | 855 | character (or whatever other character is used in its stead) | 
|---|
|  | 856 | can appear in the @var{regexp} or @var{replacement} | 
|---|
|  | 857 | only if it is preceded by a @code{\} character. | 
|---|
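The back-references, the @code{&} reference and the alternative
delimiter described above can be tried with a few one-liners. This is
a small sketch; the sample strings and the path are invented.

```shell
# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# An unescaped & stands for the whole matched text.
bracketed=$(echo 'hello world' | sed 's/world/[&]/')

# Using | as the delimiter avoids escaping the slashes in a path.
# The path itself is a made-up example.
repathed=$(echo '/usr/local/bin' | sed 's|/usr/local|/opt|')

# The same substitution with / as the delimiter needs \/ for each slash.
repathed_escaped=$(echo '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')

printf '%s\n' "$swapped" "$bracketed" "$repathed" "$repathed_escaped"
```

Choosing a delimiter that does not occur in the @var{regexp} or
@var{replacement} usually gives the more readable command.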
|  | 858 |  | 
|---|
|  | 859 |  | 
|---|
|  | 860 |  | 
|---|
| [599] | 861 | @cindex @value{SSEDEXT}, case modifiers in @code{s} commands | 
|---|
|  | 862 | Finally, as a @value{SSED} extension, you can include a | 
|---|
|  | 863 | special sequence made of a backslash and one of the letters | 
|---|
|  | 864 | @code{L}, @code{l}, @code{U}, @code{u}, or @code{E}. | 
|---|
|  | 865 | The meaning is as follows: | 
|---|
|  | 866 |  | 
|---|
|  | 867 | @table @code | 
|---|
|  | 868 | @item \L | 
|---|
|  | 869 | Turn the replacement | 
|---|
|  | 870 | to lowercase until a @code{\U} or @code{\E} is found. | 
|---|
|  | 871 |  | 
|---|
|  | 872 | @item \l | 
|---|
|  | 873 | Turn the | 
|---|
|  | 874 | next character to lowercase. | 
|---|
|  | 875 |  | 
|---|
|  | 876 | @item \U | 
|---|
|  | 877 | Turn the replacement to uppercase | 
|---|
|  | 878 | until a @code{\L} or @code{\E} is found. | 
|---|
|  | 879 |  | 
|---|
|  | 880 | @item \u | 
|---|
|  | 881 | Turn the next character | 
|---|
|  | 882 | to uppercase. | 
|---|
|  | 883 |  | 
|---|
|  | 884 | @item \E | 
|---|
|  | 885 | Stop case conversion started by @code{\L} or @code{\U}. | 
|---|
|  | 886 | @end table | 
|---|
|  | 887 |  | 
|---|
| [3613] | 888 | When the @code{g} flag is being used, case conversion does not | 
|---|
|  | 889 | propagate from one occurrence of the regular expression to | 
|---|
|  | 890 | another.  For example, when the following command is executed | 
|---|
|  | 891 | with @samp{a-b-} in pattern space: | 
|---|
|  | 892 | @example | 
|---|
|  | 893 | s/\(b\?\)-/x\u\1/g | 
|---|
|  | 894 | @end example | 
|---|
|  | 895 |  | 
|---|
|  | 896 | @noindent | 
|---|
|  | 897 | the output is @samp{axxB}.  When replacing the first @samp{-}, | 
|---|
|  | 898 | the @samp{\u} sequence only affects the empty replacement of | 
|---|
|  | 899 | @samp{\1}.  It does not affect the @code{x} character that is | 
|---|
|  | 900 | added to pattern space when replacing @code{b-} with @code{xB}. | 
|---|
|  | 901 |  | 
|---|
|  | 902 | On the other hand, @code{\l} and @code{\u} do affect the remainder | 
|---|
|  | 903 | of the replacement text if they are followed by an empty substitution. | 
|---|
|  | 904 | With @samp{a-b-} in pattern space, the following command: | 
|---|
|  | 905 | @example | 
|---|
|  | 906 | s/\(b\?\)-/\u\1x/g | 
|---|
|  | 907 | @end example | 
|---|
|  | 908 |  | 
|---|
|  | 909 | @noindent | 
|---|
|  | 910 | will replace @samp{-} with @samp{X} (uppercase) and @samp{b-} with | 
|---|
|  | 911 | @samp{Bx}.  If this behavior is undesirable, you can prevent it by | 
|---|
|  | 912 | adding a @samp{\E} sequence---after @samp{\1} in this case. | 
|---|
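The two case-conversion examples above can be reproduced from the
shell. This sketch assumes GNU @command{sed}, because the case
sequences and @code{\?} are extensions; the third command is an
additional illustration of @code{\U} and @code{\E} on arbitrary text.

```shell
# Assumes GNU sed: \u, \U, \E and \? are GNU extensions.

# \u only applies to \1, so the literal x is never uppercased.
first=$(echo 'a-b-' | sed 's/\(b\?\)-/x\u\1/g')

# When \1 is empty, \u carries over to the x that follows it.
second=$(echo 'a-b-' | sed 's/\(b\?\)-/\u\1x/g')

# \U ... \E uppercases a span of the replacement; "!" is left as is.
shout=$(echo 'hello world' | sed 's/hello/\U&\E!/')

printf '%s\n' "$first" "$second" "$shout"
```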
|  | 913 |  | 
|---|
| [599] | 914 | To include a literal @code{\}, @code{&}, or newline in the final | 
|---|
|  | 915 | replacement, be sure to precede the desired @code{\}, @code{&}, | 
|---|
|  | 916 | or newline in the @var{replacement} with a @code{\}. | 
|---|
|  | 917 |  | 
|---|
|  | 918 | @findex s command, option flags | 
|---|
|  | 919 | @cindex Substitution of text, options | 
|---|
|  | 920 | The @code{s} command can be followed by zero or more of the | 
|---|
|  | 921 | following @var{flags}: | 
|---|
|  | 922 |  | 
|---|
|  | 923 | @table @code | 
|---|
|  | 924 | @item g | 
|---|
|  | 925 | @cindex Global substitution | 
|---|
|  | 926 | @cindex Replacing all text matching regexp in a line | 
|---|
|  | 927 | Apply the replacement to @emph{all} matches of the @var{regexp}, | 
|---|
|  | 928 | not just the first. | 
|---|
|  | 929 |  | 
|---|
|  | 930 | @item @var{number} | 
|---|
|  | 931 | @cindex Replacing only @var{n}th match of regexp in a line | 
|---|
|  | 932 | Only replace the @var{number}th match of the @var{regexp}. | 
|---|
|  | 933 |  | 
|---|
| [3613] | 934 | @cindex GNU extensions, @code{g} and @var{number} modifier interaction in @code{s} command | 
|---|
| [599] | 936 | @cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command | 
|---|
|  | 937 | Note: the @sc{posix} standard does not specify what should happen | 
|---|
|  | 938 | when you mix the @code{g} and @var{number} modifiers, | 
|---|
|  | 939 | and currently there is no widely agreed upon meaning | 
|---|
|  | 940 | across @command{sed} implementations. | 
|---|
|  | 941 | For @value{SSED}, the interaction is defined to be: | 
|---|
|  | 942 | ignore matches before the @var{number}th, | 
|---|
|  | 943 | and then match and replace all matches from | 
|---|
|  | 944 | the @var{number}th on. | 
|---|
|  | 945 |  | 
|---|
|  | 946 | @item p | 
|---|
|  | 947 | @cindex Text, printing after substitution | 
|---|
|  | 948 | If the substitution was made, then print the new pattern space. | 
|---|
|  | 949 |  | 
|---|
|  | 950 | Note: when both the @code{p} and @code{e} options are specified, | 
|---|
|  | 951 | the relative ordering of the two produces very different results. | 
|---|
|  | 952 | In general, @code{ep} (evaluate then print) is what you want, | 
|---|
|  | 953 | but operating the other way round can be useful for debugging. | 
|---|
|  | 954 | For this reason, the current version of @value{SSED} interprets | 
|---|
|  | 955 | specially the presence of @code{p} options both before and after | 
|---|
|  | 956 | @code{e}, printing the pattern space before and after evaluation, | 
|---|
|  | 957 | while in general flags for the @code{s} command show their | 
|---|
|  | 958 | effect just once.  This behavior, although documented, might | 
|---|
|  | 959 | change in future versions. | 
|---|
|  | 960 |  | 
|---|
| [3613] | 961 | @item w @var{filename} | 
|---|
| [599] | 962 | @cindex Text, writing to a file after substitution | 
|---|
|  | 963 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file | 
|---|
|  | 964 | @cindex @value{SSEDEXT}, @file{/dev/stderr} file | 
|---|
|  | 965 | If the substitution was made, then write out the result to the named file. | 
|---|
| [3613] | 966 | As a @value{SSED} extension, two special values of @var{filename} are | 
|---|
| [599] | 967 | supported: @file{/dev/stderr}, which writes the result to the standard | 
|---|
|  | 968 | error, and @file{/dev/stdout}, which writes to the standard | 
|---|
|  | 969 | output.@footnote{This is equivalent to @code{p} unless the @option{-i} | 
|---|
|  | 970 | option is being used.} | 
|---|
|  | 971 |  | 
|---|
|  | 972 | @item e | 
|---|
|  | 973 | @cindex Evaluate Bourne-shell commands, after substitution | 
|---|
|  | 974 | @cindex Subprocesses | 
|---|
|  | 975 | @cindex @value{SSEDEXT}, evaluating Bourne-shell commands | 
|---|
|  | 976 | @cindex @value{SSEDEXT}, subprocesses | 
|---|
|  | 977 | This flag allows one to pipe input from a shell command | 
|---|
|  | 978 | into pattern space.  If a substitution was made, the command | 
|---|
|  | 979 | that is found in pattern space is executed and pattern space | 
|---|
|  | 980 | is replaced with its output.  A trailing newline is suppressed; | 
|---|
|  | 981 | results are undefined if the command to be executed contains | 
|---|
|  | 982 | a @sc{nul} character.  This is a @value{SSED} extension. | 
|---|
|  | 983 |  | 
|---|
|  | 984 | @item I | 
|---|
|  | 985 | @itemx i | 
|---|
| [3613] | 986 | @cindex GNU extensions, @code{I} modifier | 
|---|
| [599] | 987 | @cindex Case-insensitive matching | 
|---|
| [3613] | 988 | The @code{I} modifier to regular-expression matching is a GNU | 
|---|
| [599] | 989 | extension which makes @command{sed} match @var{regexp} in a | 
|---|
|  | 990 | case-insensitive manner. | 
|---|
|  | 991 |  | 
|---|
|  | 992 | @item M | 
|---|
|  | 993 | @itemx m | 
|---|
|  | 994 | @cindex @value{SSEDEXT}, @code{M} modifier | 
|---|
|  | 995 | The @code{M} modifier to regular-expression matching is a @value{SSED} | 
|---|
| [3613] | 996 | extension which directs @value{SSED} to match the regular expression | 
|---|
|  | 997 | in @cite{multi-line} mode.  The modifier causes @code{^} and @code{$} to | 
|---|
|  | 998 | match respectively (in addition to the normal behavior) the empty string | 
|---|
|  | 999 | after a newline, and the empty string before a newline.  There are | 
|---|
|  | 1000 | special character sequences | 
|---|
| [599] | 1001 | @ifclear PERL | 
|---|
|  | 1002 | (@code{\`} and @code{\'}) | 
|---|
|  | 1003 | @end ifclear | 
|---|
|  | 1004 | which always match the beginning or the end of the buffer. | 
|---|
| [3613] | 1005 | In addition, | 
|---|
|  | 1006 | the period character does not match a newline character in | 
|---|
|  | 1007 | multi-line mode. | 
|---|
| [599] | 1008 |  | 
|---|
|  | 1009 |  | 
|---|
|  | 1010 | @end table | 
|---|
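A few of the flags described above can be compared side by side.
This is a minimal sketch assuming GNU @command{sed} (the combined
@code{2g} form and the @code{I} flag are extensions); the input
strings are arbitrary.

```shell
# Assumes GNU sed for the 2g combination and the I flag.

# number: replace only the second match.
only_second=$(echo 'aaaa' | sed 's/a/b/2')

# number combined with g: skip the first match, replace all the rest.
from_second=$(echo 'aaaa' | sed 's/a/b/2g')

# g alone: replace every match.
all=$(echo 'aaaa' | sed 's/a/b/g')

# p with -n: print only the lines where a substitution was made.
printed=$(printf 'x\ny\n' | sed -n 's/x/z/p')

# I: match the regexp without regard to case.
nocase=$(echo 'Hello' | sed 's/hello/bye/I')

printf '%s\n' "$only_second" "$from_second" "$all" "$printed" "$nocase"
```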
|  | 1011 |  | 
|---|
| [3613] | 1012 | @node Common Commands | 
|---|
|  | 1013 | @section Often-Used Commands | 
|---|
| [599] | 1014 |  | 
|---|
| [3613] | 1015 | If you use @command{sed} at all, you will quite likely want to know | 
|---|
|  | 1016 | these commands. | 
|---|
|  | 1017 |  | 
|---|
|  | 1018 | @table @code | 
|---|
|  | 1019 | @item # | 
|---|
|  | 1020 | [No addresses allowed.] | 
|---|
|  | 1021 |  | 
|---|
|  | 1022 | @findex # (comments) | 
|---|
|  | 1023 | @cindex Comments, in scripts | 
|---|
|  | 1024 | The @code{#} character begins a comment; | 
|---|
|  | 1025 | the comment continues until the next newline. | 
|---|
|  | 1026 |  | 
|---|
|  | 1027 | @cindex Portability, comments | 
|---|
|  | 1028 | If you are concerned about portability, be aware that | 
|---|
|  | 1029 | some implementations of @command{sed} (which are not @sc{posix} | 
|---|
|  | 1030 | conforming) may only support a single one-line comment, | 
|---|
|  | 1031 | and then only when the very first character of the script is a @code{#}. | 
|---|
|  | 1032 |  | 
|---|
|  | 1033 | @findex -n, forcing from within a script | 
|---|
|  | 1034 | @cindex Caveat --- #n on first line | 
|---|
|  | 1035 | Warning: if the first two characters of the @command{sed} script | 
|---|
|  | 1036 | are @code{#n}, then the @option{-n} (no-autoprint) option is forced. | 
|---|
|  | 1037 | If you want to put a comment in the first line of your script | 
|---|
|  | 1038 | and that comment begins with the letter @samp{n} | 
|---|
|  | 1039 | and you do not want this behavior, | 
|---|
|  | 1040 | then be sure to either use a capital @samp{N}, | 
|---|
|  | 1041 | or place at least one space before the @samp{n}. | 
|---|
|  | 1042 |  | 
|---|
|  | 1043 | @item q [@var{exit-code}] | 
|---|
|  | 1044 | @findex q (quit) command | 
|---|
|  | 1045 | @cindex @value{SSEDEXT}, returning an exit code | 
|---|
|  | 1046 | @cindex Quitting | 
|---|
|  | 1047 | Exit @command{sed} without processing any more commands or input. | 
|---|
|  | 1048 |  | 
|---|
|  | 1049 | Example: stop after printing the second line: | 
|---|
|  | 1050 | @example | 
|---|
|  | 1051 | $ seq 3 | sed 2q | 
|---|
|  | 1052 | 1 | 
|---|
|  | 1053 | 2 | 
|---|
|  | 1054 | @end example | 
|---|
|  | 1055 |  | 
|---|
|  | 1056 | This command accepts only one address. | 
|---|
|  | 1057 | Note that the current pattern space is printed if auto-print is | 
|---|
|  | 1058 | not disabled with the @option{-n} option.  The ability to return | 
|---|
|  | 1059 | an exit code from the @command{sed} script is a @value{SSED} extension. | 
|---|
|  | 1060 |  | 
|---|
|  | 1061 | See also the @value{SSED} extension @code{Q} command which quits silently | 
|---|
|  | 1062 | without printing the current pattern space. | 
|---|
|  | 1063 |  | 
|---|
|  | 1064 | @item d | 
|---|
|  | 1065 | @findex d (delete) command | 
|---|
|  | 1066 | @cindex Text, deleting | 
|---|
|  | 1067 | Delete the pattern space; | 
|---|
|  | 1068 | immediately start next cycle. | 
|---|
|  | 1069 |  | 
|---|
|  | 1070 | Example: delete the second input line: | 
|---|
|  | 1071 | @example | 
|---|
|  | 1072 | $ seq 3 | sed 2d | 
|---|
|  | 1073 | 1 | 
|---|
|  | 1074 | 3 | 
|---|
|  | 1075 | @end example | 
|---|
|  | 1076 |  | 
|---|
|  | 1077 | @item p | 
|---|
|  | 1078 | @findex p (print) command | 
|---|
|  | 1079 | @cindex Text, printing | 
|---|
|  | 1080 | Print out the pattern space (to the standard output). | 
|---|
|  | 1081 | This command is usually only used in conjunction with the @option{-n} | 
|---|
|  | 1082 | command-line option. | 
|---|
|  | 1083 |  | 
|---|
|  | 1084 | Example: print only the second input line: | 
|---|
|  | 1085 | @example | 
|---|
|  | 1086 | $ seq 3 | sed -n 2p | 
|---|
|  | 1087 | 2 | 
|---|
|  | 1088 | @end example | 
|---|
|  | 1089 |  | 
|---|
|  | 1090 | @item n | 
|---|
|  | 1091 | @findex n (next-line) command | 
|---|
|  | 1092 | @cindex Next input line, replace pattern space with | 
|---|
|  | 1093 | @cindex Read next input line | 
|---|
|  | 1094 | If auto-print is not disabled, print the pattern space, | 
|---|
|  | 1095 | then, regardless, replace the pattern space with the next line of input. | 
|---|
|  | 1096 | If there is no more input then @command{sed} exits without processing | 
|---|
|  | 1097 | any more commands. | 
|---|
|  | 1098 |  | 
|---|
|  | 1099 | This command is useful to skip lines (e.g. process every Nth line). | 
|---|
|  | 1100 |  | 
|---|
|  | 1101 | Example: perform substitution on every 3rd line (i.e. two @code{n} commands | 
|---|
|  | 1102 | skip two lines): | 
|---|
|  | 1103 | @codequoteundirected on | 
|---|
|  | 1104 | @codequotebacktick on | 
|---|
|  | 1105 | @example | 
|---|
|  | 1106 | $ seq 6 | sed 'n;n;s/./x/' | 
|---|
|  | 1107 | 1 | 
|---|
|  | 1108 | 2 | 
|---|
|  | 1109 | x | 
|---|
|  | 1110 | 4 | 
|---|
|  | 1111 | 5 | 
|---|
|  | 1112 | x | 
|---|
|  | 1113 | @end example | 
|---|
|  | 1114 |  | 
|---|
|  | 1115 | @value{SSED} provides an extension address syntax of @var{first}~@var{step} | 
|---|
|  | 1116 | to achieve the same result: | 
|---|
|  | 1117 |  | 
|---|
|  | 1118 | @example | 
|---|
|  | 1119 | $ seq 6 | sed '0~3s/./x/' | 
|---|
|  | 1120 | 1 | 
|---|
|  | 1121 | 2 | 
|---|
|  | 1122 | x | 
|---|
|  | 1123 | 4 | 
|---|
|  | 1124 | 5 | 
|---|
|  | 1125 | x | 
|---|
|  | 1126 | @end example | 
|---|
|  | 1127 |  | 
|---|
|  | 1128 | @codequotebacktick off | 
|---|
|  | 1129 | @codequoteundirected off | 
|---|
|  | 1130 |  | 
|---|
|  | 1131 |  | 
|---|
|  | 1132 | @item @{ @var{commands} @} | 
|---|
|  | 1133 | @findex @{@} command grouping | 
|---|
|  | 1134 | @cindex Grouping commands | 
|---|
|  | 1135 | @cindex Command groups | 
|---|
|  | 1136 | A group of commands may be enclosed between | 
|---|
|  | 1137 | @code{@{} and @code{@}} characters. | 
|---|
|  | 1138 | This is particularly useful when you want a group of commands | 
|---|
|  | 1139 | to be triggered by a single address (or address-range) match. | 
|---|
|  | 1140 |  | 
|---|
|  | 1141 | Example: perform substitution then print the second input line: | 
|---|
|  | 1142 | @codequoteundirected on | 
|---|
|  | 1143 | @codequotebacktick on | 
|---|
|  | 1144 | @example | 
|---|
|  | 1145 | $ seq 3 | sed -n '2@{s/2/X/ ; p@}' | 
|---|
|  | 1146 | X | 
|---|
|  | 1147 | @end example | 
|---|
|  | 1148 | @codequoteundirected off | 
|---|
|  | 1149 | @codequotebacktick off | 
|---|
|  | 1150 |  | 
|---|
|  | 1151 | @end table | 
|---|
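Two points from the descriptions above lend themselves to a quick
check: the difference between @code{q} and the @code{Q} extension, and
the special meaning of @code{#n} at the very start of a script. This
sketch assumes GNU @command{sed}; the scripts are deliberately tiny.

```shell
# Assumes GNU sed (Q is a GNU extension).

# q prints the current pattern space before quitting; Q does not.
with_q=$(seq 3 | sed 2q)
with_Q=$(seq 3 | sed 2Q)

# A script whose first line is exactly "#n" behaves as if -n were given,
# so only the explicit p produces output.
quiet=$(seq 3 | sed '#n
2p')

# With a space before the n, the first line is an ordinary comment and
# auto-print stays on, so line 2 is printed twice.
loud=$(seq 3 | sed '# n
2p')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$with_q" "$with_Q" "$quiet" "$loud"
```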
|  | 1152 |  | 
|---|
|  | 1153 |  | 
|---|
| [599] | 1154 | @node Other Commands | 
|---|
|  | 1155 | @section Less Frequently-Used Commands | 
|---|
|  | 1156 |  | 
|---|
|  | 1157 | Though perhaps less frequently used than those in the previous | 
|---|
|  | 1158 | section, some very small yet useful @command{sed} scripts can be built with | 
|---|
|  | 1159 | these commands. | 
|---|
|  | 1160 |  | 
|---|
|  | 1161 | @table @code | 
|---|
|  | 1162 | @item y/@var{source-chars}/@var{dest-chars}/ | 
|---|
|  | 1163 | @findex y (transliterate) command | 
|---|
|  | 1164 | @cindex Transliteration | 
|---|
|  | 1165 | Transliterate any characters in the pattern space which match | 
|---|
|  | 1166 | any of the @var{source-chars} with the corresponding character | 
|---|
|  | 1167 | in @var{dest-chars}. | 
|---|
|  | 1168 |  | 
|---|
| [3613] | 1169 | Example: transliterate @samp{a-j} into @samp{0-9}: | 
|---|
|  | 1170 | @codequoteundirected on | 
|---|
|  | 1171 | @codequotebacktick on | 
|---|
|  | 1172 | @example | 
|---|
|  | 1173 | $ echo hello world | sed 'y/abcdefghij/0123456789/' | 
|---|
|  | 1174 | 74llo worl3 | 
|---|
|  | 1175 | @end example | 
|---|
|  | 1176 | @codequoteundirected off | 
|---|
|  | 1177 | @codequotebacktick off | 
|---|
|  | 1178 |  | 
|---|
|  | 1179 | (The @code{/} characters may be uniformly replaced by | 
|---|
|  | 1180 | any other single character within any given @code{y} command.) | 
|---|
|  | 1181 |  | 
|---|
| [599] | 1182 | Instances of the @code{/} (or whatever other character is used in its stead), | 
|---|
|  | 1183 | @code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars} | 
|---|
|  | 1184 | lists, provided that each instance is escaped by a @code{\}. | 
|---|
|  | 1185 | The @var{source-chars} and @var{dest-chars} lists @emph{must} | 
|---|
|  | 1186 | contain the same number of characters (after de-escaping). | 
|---|
|  | 1187 |  | 
|---|
| [3613] | 1188 | See the @command{tr} command from GNU coreutils for similar functionality. | 
|---|
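The delimiter and escaping rules for @code{y} can be sketched as
follows. The sample strings are arbitrary, and the last command only
records that GNU @command{sed} refuses lists of different lengths; the
exact error message is not reproduced here.

```shell
# A comma used as the delimiter instead of /.
hippo=$(echo hello | sed 'y,el,ip,')

# A literal / in source-chars must be escaped when / is the delimiter.
piped=$(echo 'a/b/c' | sed 'y/\//|/')

# Lists of different lengths are rejected with a non-zero exit status.
mismatch=accepted
echo abc | sed 'y/abc/xy/' > /dev/null 2>&1 || mismatch=rejected

printf '%s\n' "$hippo" "$piped" "$mismatch"
```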
|  | 1189 |  | 
|---|
|  | 1190 | @item a @var{text} | 
|---|
|  | 1191 | Append @var{text} after a line. This is a GNU extension | 
|---|
|  | 1192 | to the standard @code{a} command; see below for details. | 
|---|
|  | 1193 |  | 
|---|
|  | 1194 | Example: Add @samp{hello} after the second line: | 
|---|
|  | 1195 | @codequoteundirected on | 
|---|
|  | 1196 | @codequotebacktick on | 
|---|
|  | 1197 | @example | 
|---|
|  | 1198 | $ seq 3 | sed '2a hello' | 
|---|
|  | 1199 | 1 | 
|---|
|  | 1200 | 2 | 
|---|
|  | 1201 | hello | 
|---|
|  | 1202 | 3 | 
|---|
|  | 1203 | @end example | 
|---|
|  | 1204 | @codequoteundirected off | 
|---|
|  | 1205 | @codequotebacktick off | 
|---|
|  | 1206 |  | 
|---|
|  | 1207 | Leading whitespace after the @code{a} command is ignored. | 
|---|
|  | 1208 | The text to add is read until the end of the line. | 
|---|
|  | 1209 |  | 
|---|
|  | 1210 |  | 
|---|
| [599] | 1211 | @item a\ | 
|---|
|  | 1212 | @itemx @var{text} | 
|---|
|  | 1213 | @findex a (append text lines) command | 
|---|
|  | 1214 | @cindex Appending text after a line | 
|---|
|  | 1215 | @cindex Text, appending | 
|---|
| [3613] | 1216 | Append @var{text} after a line. | 
|---|
|  | 1217 |  | 
|---|
|  | 1218 | Example: Add @samp{hello} after the second line | 
|---|
|  | 1219 | (@print{} indicates printed output lines): | 
|---|
|  | 1220 | @codequoteundirected on | 
|---|
|  | 1221 | @codequotebacktick on | 
|---|
|  | 1222 | @example | 
|---|
|  | 1223 | $ seq 3 | sed '2a\ | 
|---|
|  | 1224 | hello' | 
|---|
|  | 1225 | @print{}1 | 
|---|
|  | 1226 | @print{}2 | 
|---|
|  | 1227 | @print{}hello | 
|---|
|  | 1228 | @print{}3 | 
|---|
|  | 1229 | @end example | 
|---|
|  | 1230 | @codequoteundirected off | 
|---|
|  | 1231 | @codequotebacktick off | 
|---|
|  | 1232 |  | 
|---|
|  | 1233 | The @code{a} command queues the lines of text which follow this command | 
|---|
| [599] | 1234 | (each but the last ending with a @code{\}, | 
|---|
|  | 1235 | which are removed from the output) | 
|---|
|  | 1236 | to be output at the end of the current cycle, | 
|---|
|  | 1237 | or when the next input line is read. | 
|---|
|  | 1238 |  | 
|---|
| [3613] | 1239 | @cindex @value{SSEDEXT}, two addresses supported by most commands | 
|---|
|  | 1240 | As a GNU extension, this command accepts two addresses. | 
|---|
|  | 1241 |  | 
|---|
| [599] | 1242 | Escape sequences in @var{text} are processed, so you should | 
|---|
|  | 1243 | use @code{\\} in @var{text} to print a single backslash. | 
|---|
|  | 1244 |  | 
|---|
| [3613] | 1245 | The commands resume after the last line without a backslash (@code{\}) - | 
|---|
|  | 1246 | @samp{world} in the following example: | 
|---|
|  | 1247 | @codequoteundirected on | 
|---|
|  | 1248 | @codequotebacktick on | 
|---|
|  | 1249 | @example | 
|---|
|  | 1250 | $ seq 3 | sed '2a\ | 
|---|
|  | 1251 | hello\ | 
|---|
|  | 1252 | world | 
|---|
|  | 1253 | 3s/./X/' | 
|---|
|  | 1254 | @print{}1 | 
|---|
|  | 1255 | @print{}2 | 
|---|
|  | 1256 | @print{}hello | 
|---|
|  | 1257 | @print{}world | 
|---|
|  | 1258 | @print{}X | 
|---|
|  | 1259 | @end example | 
|---|
|  | 1260 | @codequoteundirected off | 
|---|
|  | 1261 | @codequotebacktick off | 
|---|
| [599] | 1262 |  | 
|---|
| [3613] | 1263 | As a GNU extension, the @code{a} command and @var{text} can be | 
|---|
|  | 1264 | separated into two @code{-e} parameters, enabling easier scripting: | 
|---|
|  | 1265 | @codequoteundirected on | 
|---|
|  | 1266 | @codequotebacktick on | 
|---|
|  | 1267 | @example | 
|---|
|  | 1268 | $ seq 3 | sed -e '2a\' -e hello | 
|---|
|  | 1269 | 1 | 
|---|
|  | 1270 | 2 | 
|---|
|  | 1271 | hello | 
|---|
|  | 1272 | 3 | 
|---|
|  | 1273 |  | 
|---|
|  | 1274 | $ sed -e '2a\' -e "$VAR" | 
|---|
|  | 1275 | @end example | 
|---|
|  | 1276 | @codequoteundirected off | 
|---|
|  | 1277 | @codequotebacktick off | 
|---|
|  | 1278 |  | 
|---|
|  | 1279 | @item i @var{text} | 
|---|
|  | 1280 | Insert @var{text} before a line. This is a GNU extension | 
|---|
|  | 1281 | to the standard @code{i} command - see below for details. | 
|---|
|  | 1282 |  | 
|---|
|  | 1283 | Example: Insert @samp{hello} before the second line: | 
|---|
|  | 1284 | @codequoteundirected on | 
|---|
|  | 1285 | @codequotebacktick on | 
|---|
|  | 1286 | @example | 
|---|
|  | 1287 | $ seq 3 | sed '2i hello' | 
|---|
|  | 1288 | 1 | 
|---|
|  | 1289 | hello | 
|---|
|  | 1290 | 2 | 
|---|
|  | 1291 | 3 | 
|---|
|  | 1292 | @end example | 
|---|
|  | 1293 | @codequoteundirected off | 
|---|
|  | 1294 | @codequotebacktick off | 
|---|
|  | 1295 |  | 
|---|
|  | 1296 | Leading whitespace after the @code{i} command is ignored. | 
|---|
|  | 1297 | The text to add is read until the end of the line. | 
|---|
|  | 1298 |  | 
|---|
|  | 1299 | @anchor{insert command} | 
|---|
| [599] | 1300 | @item i\ | 
|---|
|  | 1301 | @itemx @var{text} | 
|---|
|  | 1302 | @findex i (insert text lines) command | 
|---|
|  | 1303 | @cindex Inserting text before a line | 
|---|
|  | 1304 | @cindex Text, insertion | 
|---|
| [3613] | 1305 | Immediately output the lines of text which follow this command. | 
|---|
| [599] | 1306 |  | 
|---|
| [3613] | 1307 | Example: Insert @samp{hello} before the second line | 
|---|
|  | 1308 | (@print{} indicates printed output lines): | 
|---|
|  | 1309 | @codequoteundirected on | 
|---|
|  | 1310 | @codequotebacktick on | 
|---|
|  | 1311 | @example | 
|---|
|  | 1312 | $ seq 3 | sed '2i\ | 
|---|
|  | 1313 | hello' | 
|---|
|  | 1314 | @print{}1 | 
|---|
|  | 1315 | @print{}hello | 
|---|
|  | 1316 | @print{}2 | 
|---|
|  | 1317 | @print{}3 | 
|---|
|  | 1318 | @end example | 
|---|
|  | 1319 | @codequoteundirected off | 
|---|
|  | 1320 | @codequotebacktick off | 
|---|
|  | 1321 |  | 
|---|
|  | 1322 | @cindex @value{SSEDEXT}, two addresses supported by most commands | 
|---|
|  | 1323 | As a GNU extension, this command accepts two addresses. | 
|---|
|  | 1324 |  | 
|---|
|  | 1325 | Escape sequences in @var{text} are processed, so you should | 
|---|
|  | 1326 | use @code{\\} in @var{text} to print a single backslash. | 
|---|
|  | 1327 |  | 
|---|
|  | 1328 | The commands resume after the last line without a backslash (@code{\}) - | 
|---|
|  | 1329 | @samp{world} in the following example: | 
|---|
|  | 1330 | @codequoteundirected on | 
|---|
|  | 1331 | @codequotebacktick on | 
|---|
|  | 1332 | @example | 
|---|
|  | 1333 | $ seq 3 | sed '2i\ | 
|---|
|  | 1334 | hello\ | 
|---|
|  | 1335 | world | 
|---|
|  | 1336 | s/./X/' | 
|---|
|  | 1337 | @print{}X | 
|---|
|  | 1338 | @print{}hello | 
|---|
|  | 1339 | @print{}world | 
|---|
|  | 1340 | @print{}X | 
|---|
|  | 1341 | @print{}X | 
|---|
|  | 1342 | @end example | 
|---|
|  | 1343 | @codequoteundirected off | 
|---|
|  | 1344 | @codequotebacktick off | 
|---|
|  | 1345 |  | 
|---|
|  | 1346 | As a GNU extension, the @code{i} command and @var{text} can be | 
|---|
|  | 1347 | separated into two @code{-e} parameters, enabling easier scripting: | 
|---|
|  | 1348 | @codequoteundirected on | 
|---|
|  | 1349 | @codequotebacktick on | 
|---|
|  | 1350 | @example | 
|---|
|  | 1351 | $ seq 3 | sed -e '2i\' -e hello | 
|---|
|  | 1352 | 1 | 
|---|
|  | 1353 | hello | 
|---|
|  | 1354 | 2 | 
|---|
|  | 1355 | 3 | 
|---|
|  | 1356 |  | 
|---|
|  | 1357 | $ sed -e '2i\' -e "$VAR" | 
|---|
|  | 1358 | @end example | 
|---|
|  | 1359 | @codequoteundirected off | 
|---|
|  | 1360 | @codequotebacktick off | 
|---|
|  | 1361 |  | 
|---|
|  | 1362 | @item c @var{text} | 
|---|
|  | 1363 | Replace the line(s) with @var{text}. This is a GNU extension | 
|---|
|  | 1364 | to the standard @code{c} command - see below for details. | 
|---|
|  | 1365 |  | 
|---|
|  | 1366 | Example: Replace the 2nd to 9th lines with the word @samp{hello}: | 
|---|
|  | 1367 | @codequoteundirected on | 
|---|
|  | 1368 | @codequotebacktick on | 
|---|
|  | 1369 | @example | 
|---|
|  | 1370 | $ seq 10 | sed '2,9c hello' | 
|---|
|  | 1371 | 1 | 
|---|
|  | 1372 | hello | 
|---|
|  | 1373 | 10 | 
|---|
|  | 1374 | @end example | 
|---|
|  | 1375 | @codequoteundirected off | 
|---|
|  | 1376 | @codequotebacktick off | 
|---|
|  | 1377 |  | 
|---|
|  | 1378 | Leading whitespace after the @code{c} command is ignored. | 
|---|
|  | 1379 | The text to add is read until the end of the line. | 
|---|
|  | 1380 |  | 
|---|
| [599] | 1381 | @item c\ | 
|---|
|  | 1382 | @itemx @var{text} | 
|---|
|  | 1383 | @findex c (change to text lines) command | 
|---|
|  | 1384 | @cindex Replacing selected lines with other text | 
|---|
|  | 1385 | Delete the lines matching the address or address-range, | 
|---|
| [3613] | 1386 | and output the lines of text which follow this command. | 
|---|
|  | 1387 |  | 
|---|
|  | 1388 | Example: Replace 2nd to 4th lines with the words @samp{hello} and | 
|---|
|  | 1389 | @samp{world} (@print{} indicates printed output lines): | 
|---|
|  | 1390 | @codequoteundirected on | 
|---|
|  | 1391 | @codequotebacktick on | 
|---|
|  | 1392 | @example | 
|---|
|  | 1393 | $ seq 5 | sed '2,4c\ | 
|---|
|  | 1394 | hello\ | 
|---|
|  | 1395 | world' | 
|---|
|  | 1396 | @print{}1 | 
|---|
|  | 1397 | @print{}hello | 
|---|
|  | 1398 | @print{}world | 
|---|
|  | 1399 | @print{}5 | 
|---|
|  | 1400 | @end example | 
|---|
|  | 1401 | @codequoteundirected off | 
|---|
|  | 1402 | @codequotebacktick off | 
|---|
|  | 1403 |  | 
|---|
|  | 1404 | If no addresses are given, each line is replaced. | 
|---|
|  | 1405 |  | 
|---|
| [599] | 1406 | A new cycle is started after this command is done, | 
|---|
|  | 1407 | since the pattern space will have been deleted. | 
|---|
| [3613] | 1408 | In the following example, the @code{c} starts a | 
|---|
|  | 1409 | new cycle and the substitution command is not performed | 
|---|
|  | 1410 | on the replaced text: | 
|---|
| [599] | 1411 |  | 
|---|
| [3613] | 1412 | @codequoteundirected on | 
|---|
|  | 1413 | @codequotebacktick on | 
|---|
|  | 1414 | @example | 
|---|
|  | 1415 | $ seq 3 | sed '2c\ | 
|---|
|  | 1416 | hello | 
|---|
|  | 1417 | s/./X/' | 
|---|
|  | 1418 | @print{}X | 
|---|
|  | 1419 | @print{}hello | 
|---|
|  | 1420 | @print{}X | 
|---|
|  | 1421 | @end example | 
|---|
|  | 1422 | @codequoteundirected off | 
|---|
|  | 1423 | @codequotebacktick off | 
|---|
|  | 1424 |  | 
|---|
|  | 1425 | As a GNU extension, the @code{c} command and @var{text} can be | 
|---|
|  | 1426 | separated into two @code{-e} parameters, enabling easier scripting: | 
|---|
|  | 1427 | @codequoteundirected on | 
|---|
|  | 1428 | @codequotebacktick on | 
|---|
|  | 1429 | @example | 
|---|
|  | 1430 | $ seq 3 | sed -e '2c\' -e hello | 
|---|
|  | 1431 | 1 | 
|---|
|  | 1432 | hello | 
|---|
|  | 1433 | 3 | 
|---|
|  | 1434 |  | 
|---|
|  | 1435 | $ sed -e '2c\' -e "$VAR" | 
|---|
|  | 1436 | @end example | 
|---|
|  | 1437 | @codequoteundirected off | 
|---|
|  | 1438 | @codequotebacktick off | 
|---|
|  | 1439 |  | 
|---|
|  | 1440 |  | 
|---|
| [599] | 1441 | @item = | 
|---|
|  | 1442 | @findex = (print line number) command | 
|---|
|  | 1443 | @cindex Printing line number | 
|---|
|  | 1444 | @cindex Line number, printing | 
|---|
|  | 1445 | Print out the current input line number (with a trailing newline). | 
|---|
|  | 1446 |  | 
|---|
| [3613] | 1447 | @codequoteundirected on | 
|---|
|  | 1448 | @codequotebacktick on | 
|---|
|  | 1449 | @example | 
|---|
|  | 1450 | $ printf '%s\n' aaa bbb ccc | sed = | 
|---|
|  | 1451 | 1 | 
|---|
|  | 1452 | aaa | 
|---|
|  | 1453 | 2 | 
|---|
|  | 1454 | bbb | 
|---|
|  | 1455 | 3 | 
|---|
|  | 1456 | ccc | 
|---|
|  | 1457 | @end example | 
|---|
|  | 1458 | @codequoteundirected off | 
|---|
|  | 1459 | @codequotebacktick off | 
|---|
|  | 1460 |  | 
|---|
|  | 1461 | @cindex @value{SSEDEXT}, two addresses supported by most commands | 
|---|
|  | 1462 | As a GNU extension, this command accepts two addresses. | 
|---|
|  | 1463 |  | 
|---|
|  | 1464 |  | 
|---|
|  | 1465 |  | 
|---|
|  | 1466 |  | 
|---|
| [599] | 1467 | @item l @var{n} | 
|---|
|  | 1468 | @findex l (list unambiguously) command | 
|---|
|  | 1469 | @cindex List pattern space | 
|---|
|  | 1470 | @cindex Printing text unambiguously | 
|---|
|  | 1471 | @cindex Line length, setting | 
|---|
|  | 1472 | @cindex @value{SSEDEXT}, setting line length | 
|---|
|  | 1473 | Print the pattern space in an unambiguous form: | 
|---|
|  | 1474 | non-printable characters (and the @code{\} character) | 
|---|
|  | 1475 | are printed in C-style escaped form; long lines are split, | 
|---|
|  | 1476 | with a trailing @code{\} character to indicate the split; | 
|---|
|  | 1477 | the end of each line is marked with a @code{$}. | 
|---|
|  | 1478 |  | 
|---|
|  | 1479 | @var{n} specifies the desired line-wrap length; | 
|---|
|  | 1480 | a length of 0 (zero) means to never wrap long lines.  If omitted, | 
|---|
|  | 1481 | the default as specified on the command line is used.  The @var{n} | 
|---|
|  | 1482 | parameter is a @value{SSED} extension. | 
|---|
|  | 1483 |  | 
|---|
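As a small illustration of the escaped form described above, the following sketch passes a line containing a tab and a backslash through @code{l}. The variable name @code{out} is only a placeholder for this example.

```shell
# Feed one line holding a tab and a backslash to the l command.
# -n suppresses the normal autoprint, so only the l output appears.
out=$(printf 'a\tb\\c\n' | sed -n l)
printf '%s\n' "$out"
# prints: a\tb\\c$
```

The tab is shown as @samp{\t}, the backslash is doubled, and the trailing @samp{$} marks the end of the line.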
|  | 1484 | @item r @var{filename} | 
|---|
|  | 1485 |  | 
|---|
|  | 1486 | @findex r (read file) command | 
|---|
|  | 1487 | @cindex Read text from a file | 
|---|
| [3613] | 1488 | Read file @var{filename}. Example: | 
|---|
|  | 1489 |  | 
|---|
|  | 1490 | @codequoteundirected on | 
|---|
|  | 1491 | @codequotebacktick on | 
|---|
|  | 1492 | @example | 
|---|
|  | 1493 | $ seq 3 | sed '2r/etc/hostname' | 
|---|
|  | 1494 | 1 | 
|---|
|  | 1495 | 2 | 
|---|
|  | 1496 | fencepost.gnu.org | 
|---|
|  | 1497 | 3 | 
|---|
|  | 1498 | @end example | 
|---|
|  | 1499 | @codequoteundirected off | 
|---|
|  | 1500 | @codequotebacktick off | 
|---|
|  | 1501 |  | 
|---|
| [599] | 1502 | @cindex @value{SSEDEXT}, @file{/dev/stdin} file | 
|---|
|  | 1503 | Queue the contents of @var{filename} to be read and | 
|---|
|  | 1504 | inserted into the output stream at the end of the current cycle, | 
|---|
|  | 1505 | or when the next input line is read. | 
|---|
|  | 1506 | Note that if @var{filename} cannot be read, it is treated as | 
|---|
|  | 1507 | if it were an empty file, without any error indication. | 
|---|
|  | 1508 |  | 
|---|
|  | 1509 | As a @value{SSED} extension, the special value @file{/dev/stdin} | 
|---|
|  | 1510 | is supported for the file name, which reads the contents of the | 
|---|
|  | 1511 | standard input. | 
|---|
|  | 1512 |  | 
|---|
| [3613] | 1513 | @cindex @value{SSEDEXT}, two addresses supported by most commands | 
|---|
|  | 1514 | As a GNU extension, this command accepts two addresses. The | 
|---|
|  | 1515 | file will then be reread and inserted on each of the addressed lines. | 
|---|
|  | 1516 |  | 
|---|
|  | 1517 | As a @value{SSED} extension, the @code{r} command accepts a zero address, | 
|---|
|  | 1518 | inserting a file @emph{before} the first line of the input | 
|---|
|  | 1519 | (@pxref{Adding a header to multiple files}). | 
|---|
|  | 1520 |  | 
|---|
| [599] | 1521 | @item w @var{filename} | 
|---|
|  | 1522 | @findex w (write file) command | 
|---|
|  | 1523 | @cindex Write to a file | 
|---|
|  | 1524 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file | 
|---|
|  | 1525 | @cindex @value{SSEDEXT}, @file{/dev/stderr} file | 
|---|
|  | 1526 | Write the pattern space to @var{filename}. | 
|---|
| [3613] | 1527 | As a @value{SSED} extension, two special values of @var{filename} are | 
|---|
| [599] | 1528 | supported: @file{/dev/stderr}, which writes the result to the standard | 
|---|
|  | 1529 | error, and @file{/dev/stdout}, which writes to the standard | 
|---|
|  | 1530 | output.@footnote{This is equivalent to @code{p} unless the @option{-i} | 
|---|
|  | 1531 | option is being used.} | 
|---|
|  | 1532 |  | 
|---|
| [3613] | 1533 | The file will be created (or truncated) before the first input line is | 
|---|
|  | 1534 | read; all @code{w} commands (including instances of the @code{w} flag | 
|---|
|  | 1535 | on successful @code{s} commands) which refer to the same @var{filename} | 
|---|
|  | 1536 | are output without closing and reopening the file. | 
|---|
| [599] | 1537 |  | 
|---|
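A minimal sketch of the @code{w} command, assuming @command{mktemp} and @command{seq} are available. The scratch file and the variable names are hypothetical and exist only for this illustration.

```shell
# Scratch file used only for this illustration.
tmp=$(mktemp)

# Write lines containing 2 or 4 to the file; -n keeps standard output empty.
shown=$(seq 5 | sed -n "/[24]/w $tmp")
saved=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n' "$saved"
# prints:
# 2
# 4
```

Because the file name extends to the end of the line, the @code{w} command is placed last in the script here.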
|  | 1538 | @item D | 
|---|
|  | 1539 | @findex D (delete first line) command | 
|---|
|  | 1540 | @cindex Delete first line from pattern space | 
|---|
| [3613] | 1541 | If pattern space contains no newline, start a normal new cycle as if | 
|---|
|  | 1542 | the @code{d} command was issued.  Otherwise, delete text in the pattern | 
|---|
|  | 1543 | space up to the first newline, and restart the cycle with the resulting | 
|---|
|  | 1544 | pattern space, without reading a new line of input. | 
|---|
| [599] | 1545 |  | 
|---|
|  | 1546 | @item N | 
|---|
|  | 1547 | @findex N (append Next line) command | 
|---|
|  | 1548 | @cindex Next input line, append to pattern space | 
|---|
|  | 1549 | @cindex Append next input line to pattern space | 
|---|
|  | 1550 | Add a newline to the pattern space, | 
|---|
|  | 1551 | then append the next line of input to the pattern space. | 
|---|
|  | 1552 | If there is no more input then @command{sed} exits without processing | 
|---|
|  | 1553 | any more commands. | 
|---|
|  | 1554 |  | 
|---|
| [3613] | 1555 | When @option{-z} is used, a zero byte (the ASCII @samp{NUL} character) is | 
|---|
|  | 1556 | added between the lines (instead of a newline). | 
|---|
|  | 1557 |  | 
|---|
|  | 1558 | By default @command{sed} does not terminate if there is no ``next'' input line. | 
|---|
|  | 1559 | This is a GNU extension which can be disabled with @option{--posix}. | 
|---|
|  | 1560 | @xref{N_command_last_line,,N command on the last line}. | 
|---|
|  | 1561 |  | 
|---|
|  | 1562 |  | 
|---|
| [599] | 1563 | @item P | 
|---|
|  | 1564 | @findex P (print first line) command | 
|---|
|  | 1565 | @cindex Print first line from pattern space | 
|---|
|  | 1566 | Print out the portion of the pattern space up to the first newline. | 
|---|
|  | 1567 |  | 
|---|
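The @code{N}, @code{P} and @code{D} commands are usually combined. The sketch below shows two common idioms: joining lines in pairs, and dropping consecutive duplicate lines. The variable names are placeholders, and the semicolon-separated form assumes @value{SSED}.

```shell
# Join each pair of lines: N appends the next line, s replaces the
# embedded newline with a comma.
joined=$(seq 4 | sed 'N;s/\n/,/')
printf '%s\n' "$joined"
# prints:
# 1,2
# 3,4

# Drop consecutive duplicates: keep two lines in pattern space, print the
# first one (P) only when the two differ, then discard it (D) and restart
# the cycle with the remaining line.
unique=$(printf '%s\n' a a b | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$unique"
# prints:
# a
# b
```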
|  | 1568 | @item h | 
|---|
|  | 1569 | @findex h (hold) command | 
|---|
|  | 1570 | @cindex Copy pattern space into hold space | 
|---|
|  | 1571 | @cindex Replace hold space with copy of pattern space | 
|---|
|  | 1572 | @cindex Hold space, copying pattern space into | 
|---|
|  | 1573 | Replace the contents of the hold space with the contents of the pattern space. | 
|---|
|  | 1574 |  | 
|---|
|  | 1575 | @item H | 
|---|
|  | 1576 | @findex H (append Hold) command | 
|---|
|  | 1577 | @cindex Append pattern space to hold space | 
|---|
|  | 1578 | @cindex Hold space, appending from pattern space | 
|---|
|  | 1579 | Append a newline to the contents of the hold space, | 
|---|
|  | 1580 | and then append the contents of the pattern space to that of the hold space. | 
|---|
|  | 1581 |  | 
|---|
|  | 1582 | @item g | 
|---|
|  | 1583 | @findex g (get) command | 
|---|
|  | 1584 | @cindex Copy hold space into pattern space | 
|---|
|  | 1585 | @cindex Replace pattern space with copy of hold space | 
|---|
|  | 1586 | @cindex Hold space, copy into pattern space | 
|---|
|  | 1587 | Replace the contents of the pattern space with the contents of the hold space. | 
|---|
|  | 1588 |  | 
|---|
|  | 1589 | @item G | 
|---|
|  | 1590 | @findex G (appending Get) command | 
|---|
|  | 1591 | @cindex Append hold space to pattern space | 
|---|
|  | 1592 | @cindex Hold space, appending to pattern space | 
|---|
|  | 1593 | Append a newline to the contents of the pattern space, | 
|---|
|  | 1594 | and then append the contents of the hold space to that of the pattern space. | 
|---|
|  | 1595 |  | 
|---|
|  | 1596 | @item x | 
|---|
|  | 1597 | @findex x (eXchange) command | 
|---|
|  | 1598 | @cindex Exchange hold space with pattern space | 
|---|
|  | 1599 | @cindex Hold space, exchange with pattern space | 
|---|
|  | 1600 | Exchange the contents of the hold and pattern spaces. | 
|---|
|  | 1601 |  | 
|---|
|  | 1602 | @end table | 
|---|
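The hold space commands above are easiest to follow with a short example. The sketch below reverses the order of the input lines using @code{G} and @code{h}, and then uses @code{G} alone to double-space the input (the hold space is empty, so only a newline is added). The variable names are placeholders.

```shell
# Reverse the input: on every line but the first, append the hold space
# (the lines seen so far, already reversed) to the current line; save the
# result back to the hold space; print only on the last line.
reversed=$(seq 3 | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
# prints:
# 3
# 2
# 1

# Double-space: G appends a newline plus the (empty) hold space.
spaced=$(seq 2 | sed G)
```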
|  | 1603 |  | 
|---|
|  | 1604 |  | 
|---|
|  | 1605 | @node Programming Commands | 
|---|
|  | 1606 | @section Commands for @command{sed} gurus | 
|---|
|  | 1607 |  | 
|---|
|  | 1608 | In most cases, use of these commands indicates that you are | 
|---|
|  | 1609 | probably better off programming in something like @command{awk} | 
|---|
|  | 1610 | or Perl.  But occasionally one is committed to sticking | 
|---|
|  | 1611 | with @command{sed}, and these commands can enable one to write | 
|---|
|  | 1612 | quite convoluted scripts. | 
|---|
|  | 1613 |  | 
|---|
|  | 1614 | @cindex Flow of control in scripts | 
|---|
|  | 1615 | @table @code | 
|---|
|  | 1616 | @item : @var{label} | 
|---|
|  | 1617 | [No addresses allowed.] | 
|---|
|  | 1618 |  | 
|---|
|  | 1619 | @findex : (label) command | 
|---|
|  | 1620 | @cindex Labels, in scripts | 
|---|
|  | 1621 | Specify the location of @var{label} for branch commands. | 
|---|
|  | 1622 | In all other respects, a no-op. | 
|---|
|  | 1623 |  | 
|---|
|  | 1624 | @item b @var{label} | 
|---|
|  | 1625 | @findex b (branch) command | 
|---|
|  | 1626 | @cindex Branch to a label, unconditionally | 
|---|
|  | 1627 | @cindex Goto, in scripts | 
|---|
|  | 1628 | Unconditionally branch to @var{label}. | 
|---|
|  | 1629 | The @var{label} may be omitted, in which case the next cycle is started. | 
|---|
|  | 1630 |  | 
|---|
|  | 1631 | @item t @var{label} | 
|---|
|  | 1632 | @findex t (test and branch if successful) command | 
|---|
|  | 1633 | @cindex Branch to a label, if @code{s///} succeeded | 
|---|
|  | 1634 | @cindex Conditional branch | 
|---|
|  | 1635 | Branch to @var{label} only if there has been a successful @code{s}ubstitution | 
|---|
|  | 1636 | since the last input line was read or conditional branch was taken. | 
|---|
|  | 1637 | The @var{label} may be omitted, in which case the next cycle is started. | 
|---|
|  | 1638 |  | 
|---|
|  | 1639 | @end table | 
|---|
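A typical use of @code{t} is a loop that repeats a substitution until it no longer matches. The following sketch squeezes runs of spaces into a single space. The label @samp{a} and the variable @code{out} are arbitrary names chosen for the example, and each command is given in its own @option{-e} so the script does not depend on semicolons after labels.

```shell
# :a marks the loop start; s replaces one pair of spaces with one space;
# ta jumps back to :a as long as the substitution succeeded.
out=$(echo 'a  b   c' | sed -e ':a' -e 's/  / /' -e 'ta')
printf '%s\n' "$out"
# prints: a b c
```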
|  | 1640 |  | 
|---|
|  | 1641 | @node Extended Commands | 
|---|
|  | 1642 | @section Commands Specific to @value{SSED} | 
|---|
|  | 1643 |  | 
|---|
|  | 1644 | These commands are specific to @value{SSED}, so you | 
|---|
|  | 1645 | must use them with care and only when you are sure that | 
|---|
|  | 1646 | hindering portability is not evil.  They allow you to check | 
|---|
|  | 1647 | for @value{SSED} extensions or to do tasks that are required | 
|---|
|  | 1648 | quite often, yet are unsupported by standard @command{sed}s. | 
|---|
|  | 1649 |  | 
|---|
|  | 1650 | @table @code | 
|---|
|  | 1651 | @item e [@var{command}] | 
|---|
|  | 1652 | @findex e (evaluate) command | 
|---|
|  | 1653 | @cindex Evaluate Bourne-shell commands | 
|---|
|  | 1654 | @cindex Subprocesses | 
|---|
|  | 1655 | @cindex @value{SSEDEXT}, evaluating Bourne-shell commands | 
|---|
|  | 1656 | @cindex @value{SSEDEXT}, subprocesses | 
|---|
|  | 1657 | This command allows one to pipe input from a shell command | 
|---|
|  | 1658 | into pattern space.  Without parameters, the @code{e} command | 
|---|
|  | 1659 | executes the command that is found in pattern space and | 
|---|
|  | 1660 | replaces the pattern space with the output; a trailing newline | 
|---|
|  | 1661 | is suppressed. | 
|---|
|  | 1662 |  | 
|---|
|  | 1663 | If a parameter is specified, instead, the @code{e} command | 
|---|
| [3613] | 1664 | interprets it as a command and sends its output to the output stream. | 
|---|
|  | 1665 | The command can run across multiple lines, all but the last ending with | 
|---|
|  | 1666 | a backslash. | 
|---|
| [599] | 1667 |  | 
|---|
|  | 1668 | In both cases, the results are undefined if the command to be | 
|---|
|  | 1669 | executed contains a @sc{nul} character. | 
|---|
|  | 1670 |  | 
|---|
| [3613] | 1671 | Note that, unlike the @code{r} command, the output of the command will | 
|---|
|  | 1672 | be printed immediately; the @code{r} command instead delays the output | 
|---|
|  | 1673 | to the end of the current cycle. | 
|---|
| [599] | 1674 |  | 
|---|
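The two forms of the @code{e} command can be compared with the sketch below, which assumes @value{SSED} (the command is not available in other implementations). The variable names are placeholders.

```shell
# With a parameter: the output of "echo hello" is written immediately,
# before the current line (line 1) is printed at the end of the cycle.
immediate=$(seq 2 | sed '1e echo hello')
printf '%s\n' "$immediate"
# prints:
# hello
# 1
# 2

# Without a parameter: the pattern space itself ("echo hello") is run as
# a command and replaced by its output.
replaced=$(echo 'echo hello' | sed e)
printf '%s\n' "$replaced"
# prints: hello
```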
| [3613] | 1675 | @item F | 
|---|
|  | 1676 | @findex F (File name) command | 
|---|
|  | 1677 | @cindex Printing file name | 
|---|
|  | 1678 | @cindex File name, printing | 
|---|
|  | 1679 | Print out the file name of the current input file (with a trailing | 
|---|
|  | 1680 | newline). | 
|---|
| [599] | 1681 |  | 
|---|
|  | 1682 | @item Q [@var{exit-code}] | 
|---|
| [3613] | 1683 | This command accepts only one address. | 
|---|
| [599] | 1684 |  | 
|---|
|  | 1685 | @findex Q (silent Quit) command | 
|---|
|  | 1686 | @cindex @value{SSEDEXT}, quitting silently | 
|---|
|  | 1687 | @cindex @value{SSEDEXT}, returning an exit code | 
|---|
|  | 1688 | @cindex Quitting | 
|---|
|  | 1689 | This command is the same as @code{q}, but will not print the | 
|---|
|  | 1690 | contents of pattern space.  Like @code{q}, it provides the | 
|---|
|  | 1691 | ability to return an exit code to the caller. | 
|---|
|  | 1692 |  | 
|---|
|  | 1693 | This command can be useful because the only alternative ways | 
|---|
|  | 1694 | to accomplish this apparently trivial function are to use | 
|---|
|  | 1695 | the @option{-n} option (which can unnecessarily complicate | 
|---|
|  | 1696 | your script) or to resort to the following snippet, which | 
|---|
|  | 1697 | wastes time by reading the whole file without any visible effect: | 
|---|
|  | 1698 |  | 
|---|
|  | 1699 | @example | 
|---|
|  | 1700 | :eat | 
|---|
| [3613] | 1701 | $d       @i{@r{Quit silently on the last line}} | 
|---|
|  | 1702 | N        @i{@r{Read another line, silently}} | 
|---|
|  | 1703 | g        @i{@r{Overwrite pattern space each time to save memory}} | 
|---|
| [599] | 1704 | b eat | 
|---|
|  | 1705 | @end example | 
|---|
|  | 1706 |  | 
|---|
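The difference between @code{q} and @code{Q} can be seen in the following sketch, which assumes @value{SSED}. The variable names and the exit code 7 are arbitrary choices for the illustration.

```shell
# q prints the current pattern space (line 3) before quitting.
with_q=$(seq 5 | sed 3q)
# Q quits without printing line 3.
with_Q=$(seq 5 | sed 3Q)

# An exit code may follow the command; capture it without aborting
# a script that runs under "set -e".
status=0
seq 5 | sed '3Q7' >/dev/null || status=$?

printf '%s\n' "$with_Q"
# prints:
# 1
# 2
```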
|  | 1707 | @item R @var{filename} | 
|---|
|  | 1708 | @findex R (read line) command | 
|---|
|  | 1709 | @cindex Read text from a file | 
|---|
|  | 1710 | @cindex @value{SSEDEXT}, reading a file a line at a time | 
|---|
|  | 1711 | @cindex @value{SSEDEXT}, @code{R} command | 
|---|
|  | 1712 | @cindex @value{SSEDEXT}, @file{/dev/stdin} file | 
|---|
|  | 1713 | Queue a line of @var{filename} to be read and | 
|---|
|  | 1714 | inserted into the output stream at the end of the current cycle, | 
|---|
|  | 1715 | or when the next input line is read. | 
|---|
|  | 1716 | Note that if @var{filename} cannot be read, or if its end is | 
|---|
|  | 1717 | reached, no line is appended, without any error indication. | 
|---|
|  | 1718 |  | 
|---|
|  | 1719 | As with the @code{r} command, the special value @file{/dev/stdin} | 
|---|
|  | 1720 | is supported for the file name, which reads a line from the | 
|---|
|  | 1721 | standard input. | 
|---|
|  | 1722 |  | 
|---|
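A minimal sketch of @code{R}, assuming @value{SSED} and @command{mktemp}: one line of the file is consumed on each cycle, so the two inputs are interleaved until the file runs out. The scratch file and the variable names are hypothetical.

```shell
# Scratch file with two lines, used only for this illustration.
tmp=$(mktemp)
printf '%s\n' a b > "$tmp"

# Each cycle queues the next unread line of the file after the input line.
out=$(seq 3 | sed "R $tmp")
rm -f "$tmp"

printf '%s\n' "$out"
# prints:
# 1
# a
# 2
# b
# 3
```

On the third cycle the file is already at its end, so nothing is appended after @samp{3}.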
|  | 1723 | @item T @var{label} | 
|---|
|  | 1724 | @findex T (test and branch if failed) command | 
|---|
|  | 1725 | @cindex @value{SSEDEXT}, branch if @code{s///} failed | 
|---|
|  | 1726 | @cindex Branch to a label, if @code{s///} failed | 
|---|
|  | 1727 | @cindex Conditional branch | 
|---|
|  | 1728 | Branch to @var{label} only if there have been no successful | 
|---|
|  | 1729 | @code{s}ubstitutions since the last input line was read or | 
|---|
|  | 1730 | conditional branch was taken. The @var{label} may be omitted, | 
|---|
|  | 1731 | in which case the next cycle is started. | 
|---|
|  | 1732 |  | 
|---|
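The sketch below, assuming @value{SSED}, uses @code{T} without a label to skip the rest of the script when a substitution did not match. The sample input and the appended marker text are made up for the illustration.

```shell
# Remove the first digit; if there was none, T ends the cycle, so the
# final s command only runs on lines that did contain a digit.
out=$(printf '%s\n' a1 b | sed -e 's/[0-9]//' -e T -e 's/$/ (had digit)/')
printf '%s\n' "$out"
# prints:
# a (had digit)
# b
```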
|  | 1733 | @item v @var{version} | 
|---|
|  | 1734 | @findex v (version) command | 
|---|
|  | 1735 | @cindex @value{SSEDEXT}, checking for their presence | 
|---|
|  | 1736 | @cindex Requiring @value{SSED} | 
|---|
|  | 1737 | This command does nothing, but makes @command{sed} fail if | 
|---|
|  | 1738 | @value{SSED} extensions are not supported, simply because other | 
|---|
|  | 1739 | versions of @command{sed} do not implement it.  In addition, you | 
|---|
|  | 1740 | can specify the version of @command{sed} that your script | 
|---|
|  | 1741 | requires, such as @code{4.0.5}.  The default is @code{4.0} | 
|---|
|  | 1742 | because that is the first version that implemented this command. | 
|---|
|  | 1743 |  | 
|---|
|  | 1744 | This command enables all @value{SSEDEXT} even if | 
|---|
|  | 1745 | @env{POSIXLY_CORRECT} is set in the environment. | 
|---|
|  | 1746 |  | 
|---|
|  | 1747 | @item W @var{filename} | 
|---|
|  | 1748 | @findex W (write first line) command | 
|---|
|  | 1749 | @cindex Write first line to a file | 
|---|
|  | 1750 | @cindex @value{SSEDEXT}, writing first line to a file | 
|---|
|  | 1751 | Write to the given filename the portion of the pattern space up to | 
|---|
|  | 1752 | the first newline.  Everything said under the @code{w} command about | 
|---|
|  | 1753 | file handling holds here too. | 
|---|
| [3613] | 1754 |  | 
|---|
|  | 1755 | @item z | 
|---|
|  | 1756 | @findex z (Zap) command | 
|---|
|  | 1757 | @cindex @value{SSEDEXT}, emptying pattern space | 
|---|
|  | 1758 | @cindex Emptying pattern space | 
|---|
|  | 1759 | This command empties the content of pattern space.  It is | 
|---|
|  | 1760 | usually the same as @samp{s/.*//}, but is more efficient | 
|---|
|  | 1761 | and works in the presence of invalid multibyte sequences | 
|---|
|  | 1762 | in the input stream.  @sc{posix} mandates that such sequences | 
|---|
|  | 1763 | are @emph{not} matched by @samp{.}, so that there is no portable | 
|---|
|  | 1764 | way to clear @command{sed}'s buffers in the middle of the | 
|---|
|  | 1765 | script in most multibyte locales (including UTF-8 locales). | 
|---|
| [599] | 1766 | @end table | 
|---|
|  | 1767 |  | 
|---|
| [3613] | 1768 |  | 
|---|
|  | 1769 | @node Multiple commands syntax | 
|---|
|  | 1770 | @section Multiple commands syntax | 
|---|
|  | 1771 |  | 
|---|
|  | 1772 | @c POSIX says: | 
|---|
|  | 1773 | @c   Editing commands other than {...}, a, b, c, i, r, t, w, :, and # | 
|---|
|  | 1774 | @c   can be followed by a <semicolon>, optional <blank> characters, and | 
|---|
|  | 1775 | @c   another editing command. However, when an s editing command is used | 
|---|
|  | 1776 | @c   with the w flag, following it with another command in this manner | 
|---|
|  | 1777 | @c   produces undefined results. | 
|---|
|  | 1778 |  | 
|---|
|  | 1779 | There are several methods to specify multiple commands in a @command{sed} | 
|---|
|  | 1780 | program. | 
|---|
|  | 1781 |  | 
|---|
|  | 1782 | Using newlines is most natural when running a @command{sed} script from a file | 
|---|
|  | 1783 | (using the @option{-f} option). | 
|---|
|  | 1784 |  | 
|---|
|  | 1785 | On the command line, all @command{sed} commands may be separated by newlines. | 
|---|
|  | 1786 | Alternatively, you may specify each command as an argument to an @option{-e} | 
|---|
|  | 1787 | option: | 
|---|
|  | 1788 |  | 
|---|
|  | 1789 | @codequoteundirected on | 
|---|
|  | 1790 | @codequotebacktick on | 
|---|
|  | 1791 | @example | 
|---|
|  | 1792 | @group | 
|---|
|  | 1793 | $ seq 6 | sed '1d | 
|---|
|  | 1794 | 3d | 
|---|
|  | 1795 | 5d' | 
|---|
|  | 1796 | 2 | 
|---|
|  | 1797 | 4 | 
|---|
|  | 1798 | 6 | 
|---|
|  | 1799 |  | 
|---|
|  | 1800 | $ seq 6 | sed -e 1d -e 3d -e 5d | 
|---|
|  | 1801 | 2 | 
|---|
|  | 1802 | 4 | 
|---|
|  | 1803 | 6 | 
|---|
|  | 1804 | @end group | 
|---|
|  | 1805 | @end example | 
|---|
|  | 1806 | @codequoteundirected off | 
|---|
|  | 1807 | @codequotebacktick off | 
|---|
|  | 1808 |  | 
|---|
|  | 1809 | A semicolon (@samp{;}) may be used to separate most simple commands: | 
|---|
|  | 1810 |  | 
|---|
|  | 1811 | @codequoteundirected on | 
|---|
|  | 1812 | @codequotebacktick on | 
|---|
|  | 1813 | @example | 
|---|
|  | 1814 | @group | 
|---|
|  | 1815 | $ seq 6 | sed '1d;3d;5d' | 
|---|
|  | 1816 | 2 | 
|---|
|  | 1817 | 4 | 
|---|
|  | 1818 | 6 | 
|---|
|  | 1819 | @end group | 
|---|
|  | 1820 | @end example | 
|---|
|  | 1821 | @codequoteundirected off | 
|---|
|  | 1822 | @codequotebacktick off | 
|---|
|  | 1823 |  | 
|---|
|  | 1824 | The @code{@{},@code{@}},@code{b},@code{t},@code{T},@code{:} commands can | 
|---|
|  | 1825 | be separated with a semicolon (this is a non-portable @value{SSED} extension). | 
|---|
|  | 1826 |  | 
|---|
|  | 1827 | @codequoteundirected on | 
|---|
|  | 1828 | @codequotebacktick on | 
|---|
|  | 1829 | @example | 
|---|
|  | 1830 | @group | 
|---|
|  | 1831 | $ seq 4 | sed '@{1d;3d@}' | 
|---|
|  | 1832 | 2 | 
|---|
|  | 1833 | 4 | 
|---|
|  | 1834 |  | 
|---|
|  | 1835 | $ seq 6 | sed '@{1d;3d@};5d' | 
|---|
|  | 1836 | 2 | 
|---|
|  | 1837 | 4 | 
|---|
|  | 1838 | 6 | 
|---|
|  | 1839 | @end group | 
|---|
|  | 1840 | @end example | 
|---|
|  | 1841 | @codequoteundirected off | 
|---|
|  | 1842 | @codequotebacktick off | 
|---|
|  | 1843 |  | 
|---|
|  | 1844 | Labels used in @code{b},@code{t},@code{T},@code{:} commands are read | 
|---|
|  | 1845 | until a semicolon.  Leading and trailing whitespace is ignored.  In | 
|---|
|  | 1846 | the examples below the label is @samp{x}.  The first example works | 
|---|
|  | 1847 | with @value{SSED}.  The second is a portable equivalent.  For more | 
|---|
|  | 1848 | information about branching and labels, @pxref{Branching and flow | 
|---|
|  | 1849 | control}. | 
|---|
|  | 1850 |  | 
|---|
|  | 1851 | @codequoteundirected on | 
|---|
|  | 1852 | @codequotebacktick on | 
|---|
|  | 1853 | @example | 
|---|
|  | 1854 | @group | 
|---|
|  | 1855 | $ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d' | 
|---|
|  | 1856 | 1 | 
|---|
|  | 1857 | =2 | 
|---|
|  | 1858 |  | 
|---|
|  | 1859 | $ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d' | 
|---|
|  | 1860 | 1 | 
|---|
|  | 1861 | =2 | 
|---|
|  | 1862 | @end group | 
|---|
|  | 1863 | @end example | 
|---|
|  | 1864 | @codequoteundirected off | 
|---|
|  | 1865 | @codequotebacktick off | 
|---|
|  | 1866 |  | 
|---|
|  | 1867 |  | 
|---|
|  | 1868 |  | 
|---|
|  | 1869 | @subsection Commands Requiring a newline | 
|---|
|  | 1870 |  | 
|---|
|  | 1871 | The following commands cannot be separated by a semicolon and | 
|---|
|  | 1872 | require a newline: | 
|---|
|  | 1873 |  | 
|---|
|  | 1874 | @table @asis | 
|---|
|  | 1875 |  | 
|---|
|  | 1876 | @item @code{a},@code{c},@code{i} (append/change/insert) | 
|---|
|  | 1877 |  | 
|---|
|  | 1878 | All characters following @code{a},@code{c},@code{i} commands are taken | 
|---|
|  | 1879 | as the text to append/change/insert.  Using a semicolon leads to | 
|---|
|  | 1880 | undesirable results: | 
|---|
|  | 1881 |  | 
|---|
|  | 1882 | @codequoteundirected on | 
|---|
|  | 1883 | @codequotebacktick on | 
|---|
|  | 1884 | @example | 
|---|
|  | 1885 | @group | 
|---|
|  | 1886 | $ seq 2 | sed '1aHello ; 2d' | 
|---|
|  | 1887 | 1 | 
|---|
|  | 1888 | Hello ; 2d | 
|---|
|  | 1889 | 2 | 
|---|
|  | 1890 | @end group | 
|---|
|  | 1891 | @end example | 
|---|
|  | 1892 | @codequoteundirected off | 
|---|
|  | 1893 | @codequotebacktick off | 
|---|
|  | 1894 |  | 
|---|
|  | 1895 | Separate the commands using @option{-e} or a newline: | 
|---|
|  | 1896 |  | 
|---|
|  | 1897 | @codequoteundirected on | 
|---|
|  | 1898 | @codequotebacktick on | 
|---|
|  | 1899 | @example | 
|---|
|  | 1900 | @group | 
|---|
|  | 1901 | $ seq 2 | sed -e 1aHello -e 2d | 
|---|
|  | 1902 | 1 | 
|---|
|  | 1903 | Hello | 
|---|
|  | 1904 |  | 
|---|
|  | 1905 | $ seq 2 | sed '1aHello | 
|---|
|  | 1906 | 2d' | 
|---|
|  | 1907 | 1 | 
|---|
|  | 1908 | Hello | 
|---|
|  | 1909 | @end group | 
|---|
|  | 1910 | @end example | 
|---|
|  | 1911 | @codequoteundirected off | 
|---|
|  | 1912 | @codequotebacktick off | 
|---|
|  | 1913 |  | 
|---|
|  | 1914 | Note that specifying the text to add (@samp{Hello}) immediately | 
|---|
|  | 1915 | after @code{a},@code{c},@code{i} is itself a @value{SSED} extension. | 
|---|
|  | 1916 | A portable, POSIX-compliant alternative is: | 
|---|
|  | 1917 |  | 
|---|
|  | 1918 | @codequoteundirected on | 
|---|
|  | 1919 | @codequotebacktick on | 
|---|
|  | 1920 | @example | 
|---|
|  | 1921 | @group | 
|---|
|  | 1922 | $ seq 2 | sed '1a\ | 
|---|
|  | 1923 | Hello | 
|---|
|  | 1924 | 2d' | 
|---|
|  | 1925 | 1 | 
|---|
|  | 1926 | Hello | 
|---|
|  | 1927 | @end group | 
|---|
|  | 1928 | @end example | 
|---|
|  | 1929 | @codequoteundirected off | 
|---|
|  | 1930 | @codequotebacktick off | 
|---|
|  | 1931 |  | 
|---|
|  | 1932 | @item @code{#} (comment) | 
|---|
|  | 1933 |  | 
|---|
|  | 1934 | All characters following @samp{#} until the next newline are ignored. | 
|---|
|  | 1935 |  | 
|---|
|  | 1936 | @codequoteundirected on | 
|---|
|  | 1937 | @codequotebacktick on | 
|---|
|  | 1938 | @example | 
|---|
|  | 1939 | @group | 
|---|
|  | 1940 | $ seq 3 | sed '# this is a comment ; 2d' | 
|---|
|  | 1941 | 1 | 
|---|
|  | 1942 | 2 | 
|---|
|  | 1943 | 3 | 
|---|
|  | 1944 |  | 
|---|
|  | 1945 |  | 
|---|
|  | 1946 | $ seq 3 | sed '# this is a comment | 
|---|
|  | 1947 | 2d' | 
|---|
|  | 1948 | 1 | 
|---|
|  | 1949 | 3 | 
|---|
|  | 1950 | @end group | 
|---|
|  | 1951 | @end example | 
|---|
|  | 1952 | @codequoteundirected off | 
|---|
|  | 1953 | @codequotebacktick off | 
|---|
|  | 1954 |  | 
|---|
|  | 1955 | @item @code{r},@code{R},@code{w},@code{W} (reading and writing files) | 
|---|
|  | 1956 |  | 
|---|
|  | 1957 | The @code{r},@code{R},@code{w},@code{W} commands parse the filename | 
|---|
|  | 1958 | until end of the line.  If whitespace, comments or semicolons are found, | 
|---|
|  | 1959 | they will be included in the filename, leading to unexpected results: | 
|---|
|  | 1960 |  | 
|---|
|  | 1961 | @codequoteundirected on | 
|---|
|  | 1962 | @codequotebacktick on | 
|---|
|  | 1963 | @example | 
|---|
|  | 1964 | @group | 
|---|
|  | 1965 | $ seq 2 | sed '1w hello.txt ; 2d' | 
|---|
|  | 1966 | 1 | 
|---|
|  | 1967 | 2 | 
|---|
|  | 1968 |  | 
|---|
|  | 1969 | $ ls -log | 
|---|
|  | 1970 | total 4 | 
|---|
|  | 1971 | -rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d | 
|---|
|  | 1972 |  | 
|---|
|  | 1973 | $ cat 'hello.txt ; 2d' | 
|---|
|  | 1974 | 1 | 
|---|
|  | 1975 | @end group | 
|---|
|  | 1976 | @end example | 
|---|
|  | 1977 | @codequoteundirected off | 
|---|
|  | 1978 | @codequotebacktick off | 
|---|
|  | 1979 |  | 
|---|
|  | 1980 | Note that @command{sed} silently ignores read/write errors in | 
|---|
|  | 1981 | @code{r},@code{R},@code{w},@code{W} commands (such as missing files). | 
|---|
|  | 1982 | In the following example, @command{sed} tries to read a file named | 
|---|
|  | 1983 | @samp{@file{hello.txt ; N}}. The file is missing, and the error is silently | 
|---|
|  | 1984 | ignored: | 
|---|
|  | 1985 |  | 
|---|
|  | 1986 | @codequoteundirected on | 
|---|
|  | 1987 | @codequotebacktick on | 
|---|
|  | 1988 | @example | 
|---|
|  | 1989 | @group | 
|---|
|  | 1990 | $ echo x | sed '1rhello.txt ; N' | 
|---|
|  | 1991 | x | 
|---|
|  | 1992 | @end group | 
|---|
|  | 1993 | @end example | 
|---|
|  | 1994 | @codequoteundirected off | 
|---|
|  | 1995 | @codequotebacktick off | 
|---|
|  | 1996 |  | 
|---|
|  | 1997 | @item @code{e} (command execution) | 
|---|
|  | 1998 |  | 
|---|
|  | 1999 | Any characters following the @code{e} command until the end of the line | 
|---|
|  | 2000 | will be sent to the shell.  If whitespace, comments or semicolons are found, | 
|---|
|  | 2001 | they will be included in the shell command, leading to unexpected results: | 
|---|
|  | 2002 |  | 
|---|
|  | 2003 | @codequoteundirected on | 
|---|
|  | 2004 | @codequotebacktick on | 
|---|
|  | 2005 | @example | 
|---|
|  | 2006 | @group | 
|---|
|  | 2007 | $ echo a | sed '1e touch foo#bar' | 
|---|
|  | 2008 | a | 
|---|
|  | 2009 |  | 
|---|
|  | 2010 | $ ls -1 | 
|---|
|  | 2011 | foo#bar | 
|---|
|  | 2012 |  | 
|---|
|  | 2013 | $ echo a | sed '1e touch foo ; s/a/b/' | 
|---|
|  | 2014 | sh: 1: s/a/b/: not found | 
|---|
|  | 2015 | a | 
|---|
|  | 2016 | @end group | 
|---|
|  | 2017 | @end example | 
|---|
|  | 2018 | @codequoteundirected off | 
|---|
|  | 2019 | @codequotebacktick off | 
|---|
|  | 2020 |  | 
|---|
|  | 2021 |  | 
|---|
|  | 2022 | @item @code{s///[we]} (substitute with @code{e} or @code{w} flags) | 
|---|
|  | 2023 |  | 
|---|
|  | 2024 | In a substitution command, the @code{w} flag writes the substitution | 
|---|
|  | 2025 | result to a file, and the @code{e} flag executes the substitution result | 
|---|
|  | 2026 | as a shell command.  As with the @code{r/R/w/W/e} commands, these | 
|---|
|  | 2027 | must be terminated with a newline.  If whitespace, comments or semicolons | 
|---|
|  | 2028 | are found, they will be included in the shell command or filename, leading to | 
|---|
|  | 2029 | unexpected results: | 
|---|
|  | 2030 |  | 
|---|
|  | 2031 | @codequoteundirected on | 
|---|
|  | 2032 | @codequotebacktick on | 
|---|
|  | 2033 | @example | 
|---|
|  | 2034 | @group | 
|---|
|  | 2035 | $ echo a | sed 's/a/b/w1.txt#foo' | 
|---|
|  | 2036 | b | 
|---|
|  | 2037 |  | 
|---|
|  | 2038 | $ ls -1 | 
|---|
|  | 2039 | 1.txt#foo | 
|---|
|  | 2040 | @end group | 
|---|
|  | 2041 | @end example | 
|---|
|  | 2042 | @codequoteundirected off | 
|---|
|  | 2043 | @codequotebacktick off | 
|---|
|  | 2044 |  | 
|---|
|  | 2045 | @end table | 
|---|
|  | 2046 |  | 
|---|
|  | 2047 |  | 
|---|
|  | 2048 | @node sed addresses | 
|---|
|  | 2049 | @chapter Addresses: selecting lines | 
|---|
|  | 2050 |  | 
|---|
|  | 2051 | @menu | 
|---|
|  | 2052 | * Addresses overview::                Addresses overview | 
|---|
|  | 2053 | * Numeric Addresses::                 selecting lines by numbers | 
|---|
|  | 2054 | * Regexp Addresses::                  selecting lines by text matching | 
|---|
|  | 2055 | * Range Addresses::                   selecting a range of lines | 
|---|
|  | 2056 | * Zero Address::                      Using address @code{0} | 
|---|
|  | 2057 | @end menu | 
|---|
|  | 2058 |  | 
|---|
|  | 2059 | @node Addresses overview | 
|---|
|  | 2060 | @section Addresses overview | 
|---|
|  | 2061 |  | 
|---|
|  | 2062 | @cindex addresses, numeric | 
|---|
|  | 2063 | @cindex numeric addresses | 
|---|
|  | 2064 | Addresses determine on which line(s) the @command{sed} command will be | 
|---|
|  | 2065 | executed. The following command replaces the first occurrence of @samp{hello} | 
|---|
|  | 2066 | with @samp{world} only on line 144: | 
|---|
|  | 2067 |  | 
|---|
|  | 2068 | @codequoteundirected on | 
|---|
|  | 2069 | @codequotebacktick on | 
|---|
|  | 2070 | @example | 
|---|
|  | 2071 | sed '144s/hello/world/' input.txt > output.txt | 
|---|
|  | 2072 | @end example | 
|---|
|  | 2073 | @codequoteundirected off | 
|---|
|  | 2074 | @codequotebacktick off | 
|---|
|  | 2075 |  | 
|---|
|  | 2076 |  | 
|---|
|  | 2077 |  | 
|---|
|  | 2078 | If no address is specified, the command is performed on all lines. | 
|---|
|  | 2079 | The following command replaces @samp{hello} with @samp{world}, | 
|---|
|  | 2080 | targeting every line of the input file. | 
|---|
|  | 2081 | However, note that it modifies only the first instance of @samp{hello} | 
|---|
|  | 2082 | on each line. | 
|---|
|  | 2083 | Use the @samp{g} modifier to affect every instance on each affected line. | 
|---|
|  | 2084 |  | 
|---|
|  | 2085 | @codequoteundirected on | 
|---|
|  | 2086 | @codequotebacktick on | 
|---|
|  | 2087 | @example | 
|---|
|  | 2088 | sed 's/hello/world/' input.txt > output.txt | 
|---|
|  | 2089 | @end example | 
|---|
|  | 2090 | @codequoteundirected off | 
|---|
|  | 2091 | @codequotebacktick off | 
|---|
|  | 2092 |  | 
|---|
|  | 2093 |  | 
|---|
|  | 2094 |  | 
|---|
|  | 2095 | @cindex addresses, regular expression | 
|---|
|  | 2096 | @cindex regular expression addresses | 
|---|
|  | 2097 | Addresses can contain regular expressions to match lines based | 
|---|
|  | 2098 | on content instead of line numbers. The following command replaces | 
|---|
|  | 2099 | @samp{hello} with @samp{world} only on lines | 
|---|
|  | 2100 | containing the string @samp{apple}: | 
|---|
|  | 2101 |  | 
|---|
|  | 2102 | @codequoteundirected on | 
|---|
|  | 2103 | @codequotebacktick on | 
|---|
|  | 2104 | @example | 
|---|
|  | 2105 | sed '/apple/s/hello/world/' input.txt > output.txt | 
|---|
|  | 2106 | @end example | 
|---|
|  | 2107 | @codequoteundirected off | 
|---|
|  | 2108 | @codequotebacktick off | 
|---|
|  | 2109 |  | 
|---|
|  | 2110 |  | 
|---|
|  | 2111 |  | 
|---|
|  | 2112 | @cindex addresses, range | 
|---|
|  | 2113 | @cindex range addresses | 
|---|
|  | 2114 | An address range is specified with two addresses separated by a comma | 
|---|
|  | 2115 | (@code{,}). Addresses can be numeric, regular expressions, or a mix of | 
|---|
|  | 2116 | both. | 
|---|
|  | 2117 | The following command replaces @samp{hello} with @samp{world} | 
|---|
|  | 2118 | only on lines 4 to 17 (inclusive): | 
|---|
|  | 2119 |  | 
|---|
|  | 2120 | @codequoteundirected on | 
|---|
|  | 2121 | @codequotebacktick on | 
|---|
|  | 2122 | @example | 
|---|
|  | 2123 | sed '4,17s/hello/world/' input.txt > output.txt | 
|---|
|  | 2124 | @end example | 
|---|
|  | 2125 | @codequoteundirected off | 
|---|
|  | 2126 | @codequotebacktick off | 
|---|
|  | 2127 |  | 
|---|
|  | 2128 |  | 
|---|
|  | 2129 |  | 
|---|
|  | 2130 | @cindex Excluding lines | 
|---|
|  | 2131 | @cindex Selecting non-matching lines | 
|---|
|  | 2132 | @cindex addresses, negating | 
|---|
|  | 2133 | @cindex addresses, excluding | 
|---|
|  | 2134 | Appending the @code{!} character to the end of an address | 
|---|
|  | 2135 | specification (before the command letter) negates the sense of the | 
|---|
|  | 2136 | match.  That is, if the @code{!} character follows an address or an | 
|---|
|  | 2137 | address range, then only lines which do @emph{not} match the addresses | 
|---|
|  | 2138 | will be selected. The following command replaces @samp{hello} | 
|---|
|  | 2139 | with @samp{world} only on lines @emph{not} containing the string | 
|---|
|  | 2140 | @samp{apple}: | 
|---|
|  | 2141 |  | 
|---|
|  | 2142 | @example | 
|---|
|  | 2143 | sed '/apple/!s/hello/world/' input.txt > output.txt | 
|---|
|  | 2144 | @end example | 
|---|
|  | 2145 |  | 
|---|
|  | 2146 | The following command replaces @samp{hello} with | 
|---|
|  | 2147 | @samp{world} only on lines 1 to 3 and from line 18 to the last line of the | 
|---|
|  | 2148 | input file (i.e. excluding lines 4 to 17): | 
|---|
|  | 2149 |  | 
|---|
|  | 2150 | @example | 
|---|
|  | 2151 | sed '4,17!s/hello/world/' input.txt > output.txt | 
|---|
|  | 2152 | @end example | 
|---|
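The effect of @code{!} is easier to see on a small input. The following is a minimal sketch; the variable names and the sample input are illustrative only:

```shell
# Print only the lines outside the range 2,5.
outside=$(seq 6 | sed -n '2,5!p')
echo "$outside"

# Substitute only on lines that do NOT contain "apple".
kept=$(printf '%s\n' 'hello apple' 'hello pear' | sed '/apple/!s/hello/world/')
echo "$kept"
```

The first command prints lines 1 and 6; the second leaves the @samp{apple} line untouched and changes only the other line.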
|  | 2153 |  | 
|---|
|  | 2154 |  | 
|---|
|  | 2155 |  | 
|---|
|  | 2156 |  | 
|---|
|  | 2157 |  | 
|---|
|  | 2158 | @node Numeric Addresses | 
|---|
|  | 2159 | @section Selecting lines by numbers | 
|---|
|  | 2160 | @cindex Addresses, in @command{sed} scripts | 
|---|
|  | 2161 | @cindex Line selection | 
|---|
|  | 2162 | @cindex Selecting lines to process | 
|---|
|  | 2163 |  | 
|---|
|  | 2164 | Addresses in a @command{sed} script can be in any of the following forms: | 
|---|
|  | 2165 | @table @code | 
|---|
|  | 2166 | @item @var{number} | 
|---|
|  | 2167 | @cindex Address, numeric | 
|---|
|  | 2168 | @cindex Line, selecting by number | 
|---|
|  | 2169 | Specifying a line number will match only that line in the input. | 
|---|
|  | 2170 | (Note that @command{sed} counts lines continuously across all input files | 
|---|
|  | 2171 | unless @option{-i} or @option{-s} options are specified.) | 
|---|
|  | 2172 |  | 
|---|
|  | 2173 | @item $ | 
|---|
|  | 2174 | @cindex Address, last line | 
|---|
|  | 2175 | @cindex Last line, selecting | 
|---|
|  | 2176 | @cindex Line, selecting last | 
|---|
|  | 2177 | This address matches the last line of the last file of input, or | 
|---|
|  | 2178 | the last line of each file when the @option{-i} or @option{-s} options | 
|---|
|  | 2179 | are specified. | 
|---|
|  | 2180 |  | 
|---|
|  | 2181 |  | 
|---|
|  | 2182 | @item @var{first}~@var{step} | 
|---|
|  | 2183 | @cindex GNU extensions, @samp{@var{n}~@var{m}} addresses | 
|---|
|  | 2184 | This GNU extension matches every @var{step}th line | 
|---|
|  | 2185 | starting with line @var{first}. | 
|---|
|  | 2186 | In particular, lines will be selected when there exists | 
|---|
|  | 2187 | a non-negative @var{n} such that the current line-number equals | 
|---|
|  | 2188 | @var{first} + (@var{n} * @var{step}). | 
|---|
|  | 2189 | Thus, one would use @code{1~2} to select the odd-numbered lines and | 
|---|
|  | 2190 | @code{0~2} for even-numbered lines; | 
|---|
|  | 2191 | to pick every third line starting with the second, @samp{2~3} would be used; | 
|---|
|  | 2192 | to pick every fifth line starting with the tenth, use @samp{10~5}; | 
|---|
|  | 2193 | and @samp{50~0} is just an obscure way of saying @code{50}. | 
|---|
|  | 2194 |  | 
|---|
|  | 2195 | The following commands demonstrate the step address usage: | 
|---|
|  | 2196 |  | 
|---|
|  | 2197 | @example | 
|---|
|  | 2198 | $ seq 10 | sed -n '0~4p' | 
|---|
|  | 2199 | 4 | 
|---|
|  | 2200 | 8 | 
|---|
|  | 2201 |  | 
|---|
|  | 2202 | $ seq 10 | sed -n '1~3p' | 
|---|
|  | 2203 | 1 | 
|---|
|  | 2204 | 4 | 
|---|
|  | 2205 | 7 | 
|---|
|  | 2206 | 10 | 
|---|
|  | 2207 | @end example | 
|---|
|  | 2208 |  | 
|---|
|  | 2209 |  | 
|---|
|  | 2210 | @end table | 
|---|
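The @code{$} address and the effect of @option{-s} on line counting can be sketched as follows. The temporary directory and file names are hypothetical, and @option{-s} is assumed to be available (it is a GNU extension):

```shell
# Print only the last line of the input.
last=$(seq 5 | sed -n '$p')
echo "$last"

# Two small input files (names are illustrative).
tmp=$(mktemp -d)
seq 1 3 > "$tmp/a.txt"
seq 4 6 > "$tmp/b.txt"

# By default lines are counted across files, so $ is the last line
# of the last file.
joined=$(sed -n '$p' "$tmp/a.txt" "$tmp/b.txt")
echo "$joined"

# With -s each file is treated separately, so $ matches once per file.
separate=$(sed -s -n '$p' "$tmp/a.txt" "$tmp/b.txt")
echo "$separate"

rm -r "$tmp"
```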
|  | 2211 |  | 
|---|
|  | 2212 |  | 
|---|
|  | 2213 |  | 
|---|
|  | 2214 | @node Regexp Addresses | 
|---|
|  | 2215 | @section Selecting lines by text matching | 
|---|
|  | 2216 |  | 
|---|
|  | 2217 | @value{SSED} supports the following regular expression addresses. | 
|---|
|  | 2218 | The default regular expression is | 
|---|
|  | 2219 | @ref{BRE syntax, , Basic Regular Expression (BRE)}. | 
|---|
|  | 2220 | If the @option{-E} or @option{-r} option is used, the regular expression should be | 
|---|
|  | 2221 | in @ref{ERE syntax, , Extended Regular Expression (ERE)} syntax. | 
|---|
|  | 2222 | @xref{BRE vs ERE}. | 
|---|
|  | 2223 |  | 
|---|
|  | 2224 | @table @code | 
|---|
|  | 2225 | @item /@var{regexp}/ | 
|---|
|  | 2226 | @cindex Address, as a regular expression | 
|---|
|  | 2227 | @cindex Line, selecting by regular expression match | 
|---|
|  | 2228 | This will select any line which matches the regular expression @var{regexp}. | 
|---|
|  | 2229 | If @var{regexp} itself includes any @code{/} characters, | 
|---|
|  | 2230 | each must be escaped by a backslash (@code{\}). | 
|---|
|  | 2231 |  | 
|---|
|  | 2232 | The following command prints lines in @file{/etc/passwd} | 
|---|
|  | 2233 | which end with @samp{bash}@footnote{ | 
|---|
|  | 2234 | There are of course many other ways to do the same, | 
|---|
|  | 2235 | e.g. | 
|---|
|  | 2236 | @example | 
|---|
|  | 2237 | grep 'bash$' /etc/passwd | 
|---|
|  | 2238 | awk -F: '$7 == "/bin/bash"' /etc/passwd | 
|---|
|  | 2239 | @end example | 
|---|
|  | 2240 | }: | 
|---|
|  | 2241 |  | 
|---|
|  | 2242 | @example | 
|---|
|  | 2243 | sed -n '/bash$/p' /etc/passwd | 
|---|
|  | 2244 | @end example | 
|---|
|  | 2245 |  | 
|---|
|  | 2246 | @cindex empty regular expression | 
|---|
|  | 2247 | @cindex @value{SSEDEXT}, modifiers and the empty regular expression | 
|---|
|  | 2248 | The empty regular expression @samp{//} repeats the last regular | 
|---|
|  | 2249 | expression match (the same holds if the empty regular expression is | 
|---|
|  | 2250 | passed to the @code{s} command).  Note that modifiers to regular expressions | 
|---|
|  | 2251 | are evaluated when the regular expression is compiled, thus it is invalid to | 
|---|
|  | 2252 | specify them together with the empty regular expression. | 
|---|
|  | 2253 |  | 
|---|
|  | 2254 | @item \%@var{regexp}% | 
|---|
|  | 2255 | (The @code{%} may be replaced by any other single character.) | 
|---|
|  | 2256 |  | 
|---|
|  | 2257 | @cindex Slash character, in regular expressions | 
|---|
|  | 2258 | This also matches the regular expression @var{regexp}, | 
|---|
|  | 2259 | but allows one to use a different delimiter than @code{/}. | 
|---|
|  | 2260 | This is particularly useful if the @var{regexp} itself contains | 
|---|
|  | 2261 | a lot of slashes, since it avoids the tedious escaping of every @code{/}. | 
|---|
|  | 2262 | If @var{regexp} itself includes any delimiter characters, | 
|---|
|  | 2263 | each must be escaped by a backslash (@code{\}). | 
|---|
|  | 2264 |  | 
|---|
|  | 2265 | The following commands are equivalent. They print lines | 
|---|
|  | 2266 | which start with @samp{/home/alice/documents/}: | 
|---|
|  | 2267 |  | 
|---|
|  | 2268 | @example | 
|---|
|  | 2269 | sed -n '/^\/home\/alice\/documents\//p' | 
|---|
|  | 2270 | sed -n '\%^/home/alice/documents/%p' | 
|---|
|  | 2271 | sed -n '\;^/home/alice/documents/;p' | 
|---|
|  | 2272 | @end example | 
|---|
|  | 2273 |  | 
|---|
|  | 2274 |  | 
|---|
|  | 2275 | @item /@var{regexp}/I | 
|---|
|  | 2276 | @itemx \%@var{regexp}%I | 
|---|
|  | 2277 | @cindex GNU extensions, @code{I} modifier | 
|---|
|  | 2278 | @cindex case insensitive, regular expression | 
|---|
|  | 2279 | The @code{I} modifier to regular-expression matching is a GNU | 
|---|
|  | 2280 | extension which causes the @var{regexp} to be matched in | 
|---|
|  | 2281 | a case-insensitive manner. | 
|---|
|  | 2282 |  | 
|---|
|  | 2283 | In many other programming languages, a lower case @code{i} is used | 
|---|
|  | 2284 | for case-insensitive regular expression matching. However, in @command{sed} | 
|---|
|  | 2285 | the @code{i} is used for the insert command (@pxref{insert command}). | 
|---|
|  | 2286 |  | 
|---|
|  | 2287 | Observe the difference between the following examples. | 
|---|
|  | 2288 |  | 
|---|
|  | 2289 | In this example, @code{/b/I} is the address: regular expression with @code{I} | 
|---|
|  | 2290 | modifier. @code{d} is the delete command: | 
|---|
|  | 2291 |  | 
|---|
|  | 2292 | @example | 
|---|
|  | 2293 | $ printf "%s\n" a b c | sed '/b/Id' | 
|---|
|  | 2294 | a | 
|---|
|  | 2295 | c | 
|---|
|  | 2296 | @end example | 
|---|
|  | 2297 |  | 
|---|
|  | 2298 | Here, @code{/b/} is the address: a regular expression. | 
|---|
|  | 2299 | @code{i} is the insert command. | 
|---|
|  | 2300 | @code{d} is the value to insert. | 
|---|
|  | 2301 | A line with @samp{d} is then inserted above the matched line: | 
|---|
|  | 2302 |  | 
|---|
|  | 2303 | @example | 
|---|
|  | 2304 | $ printf "%s\n" a b c | sed '/b/id' | 
|---|
|  | 2305 | a | 
|---|
|  | 2306 | d | 
|---|
|  | 2307 | b | 
|---|
|  | 2308 | c | 
|---|
|  | 2309 | @end example | 
|---|
|  | 2310 |  | 
|---|
|  | 2311 | @item /@var{regexp}/M | 
|---|
|  | 2312 | @itemx \%@var{regexp}%M | 
|---|
|  | 2313 | @cindex @value{SSEDEXT}, @code{M} modifier | 
|---|
|  | 2314 | The @code{M} modifier to regular-expression matching is a @value{SSED} | 
|---|
|  | 2315 | extension which directs @value{SSED} to match the regular expression | 
|---|
|  | 2316 | in @cite{multi-line} mode.  The modifier causes @code{^} and @code{$} to | 
|---|
|  | 2317 | match respectively (in addition to the normal behavior) the empty string | 
|---|
|  | 2318 | after a newline, and the empty string before a newline.  There are | 
|---|
|  | 2319 | special character sequences | 
|---|
|  | 2320 | @ifclear PERL | 
|---|
|  | 2321 | (@code{\`} and @code{\'}) | 
|---|
|  | 2322 | @end ifclear | 
|---|
|  | 2323 | which always match the beginning or the end of the buffer. | 
|---|
|  | 2324 | In addition, | 
|---|
|  | 2325 | the period character does not match a new-line character in | 
|---|
|  | 2326 | multi-line mode. | 
|---|
|  | 2327 | @end table | 
|---|
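The empty regular expression and the @code{M} modifier described above have no example of their own. The following is a minimal sketch, assuming GNU @command{sed} (the @code{M} modifier is an extension); the variable names are illustrative:

```shell
# The empty regex in s//X/ reuses the last regex applied at run time,
# here the address /foo/.
reuse=$(printf '%s\n' foo bar | sed -n '/foo/s//X/p')
echo "$reuse"

# N appends the next input line, so the pattern space holds
# "a", a newline, then "b".
# Without M, ^ and $ anchor only at the ends of the buffer: no match.
plain=$(printf '%s\n' a b | sed -n 'N;/^b$/p')
# With M, ^ and $ also match around the embedded newline.
multi=$(printf '%s\n' a b | sed -n 'N;/^b$/Mp')
echo "$multi"
```

Here @samp{reuse} holds @samp{X}, @samp{plain} is empty, and @samp{multi} holds both lines of the pattern space.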
|  | 2328 |  | 
|---|
|  | 2329 |  | 
|---|
|  | 2330 | @cindex regex addresses and pattern space | 
|---|
|  | 2331 | @cindex regex addresses and input lines | 
|---|
|  | 2332 | Regex addresses operate on the content of the current | 
|---|
|  | 2333 | pattern space. If the pattern space is changed (for example with @code{s///} | 
|---|
|  | 2334 | command) the regular expression matching will operate on the changed text. | 
|---|
|  | 2335 |  | 
|---|
|  | 2336 | In the following example, automatic printing is disabled with | 
|---|
|  | 2337 | @option{-n}.  The @code{s/2/X/} command replaces the first @samp{2} | 
|---|
|  | 2338 | on a line with @samp{X}. The command @code{/[0-9]/p} matches | 
|---|
|  | 2339 | lines with digits and prints them. | 
|---|
|  | 2340 | Because the second line is changed before the @code{/[0-9]/} address is | 
|---|
|  | 2341 | evaluated, it no longer matches and is not printed: | 
|---|
|  | 2342 |  | 
|---|
|  | 2343 | @codequoteundirected on | 
|---|
|  | 2344 | @codequotebacktick on | 
|---|
|  | 2345 | @example | 
|---|
|  | 2346 | @group | 
|---|
|  | 2347 | $ seq 3 | sed -n 's/2/X/ ; /[0-9]/p' | 
|---|
|  | 2348 | 1 | 
|---|
|  | 2349 | 3 | 
|---|
|  | 2350 | @end group | 
|---|
|  | 2351 | @end example | 
|---|
|  | 2352 | @codequoteundirected off | 
|---|
|  | 2353 | @codequotebacktick off | 
|---|
|  | 2354 |  | 
|---|
|  | 2355 |  | 
|---|
|  | 2356 | @node Range Addresses | 
|---|
|  | 2357 | @section Range Addresses | 
|---|
|  | 2358 |  | 
|---|
|  | 2359 | @cindex Range of lines | 
|---|
|  | 2360 | @cindex Several lines, selecting | 
|---|
|  | 2361 | An address range can be specified by specifying two addresses | 
|---|
|  | 2362 | separated by a comma (@code{,}).  An address range matches lines | 
|---|
|  | 2363 | starting from where the first address matches, and continues | 
|---|
|  | 2364 | until the second address matches (inclusively): | 
|---|
|  | 2365 |  | 
|---|
|  | 2366 | @example | 
|---|
|  | 2367 | $ seq 10 | sed -n '4,6p' | 
|---|
|  | 2368 | 4 | 
|---|
|  | 2369 | 5 | 
|---|
|  | 2370 | 6 | 
|---|
|  | 2371 | @end example | 
|---|
|  | 2372 |  | 
|---|
|  | 2373 | If the second address is a @var{regexp}, then checking for the | 
|---|
|  | 2374 | ending match will start with the line @emph{following} the | 
|---|
|  | 2375 | line which matched the first address: a range will always | 
|---|
|  | 2376 | span at least two lines (except of course if the input stream | 
|---|
|  | 2377 | ends). | 
|---|
|  | 2378 |  | 
|---|
|  | 2379 | @example | 
|---|
|  | 2380 | $ seq 10 | sed -n '4,/[0-9]/p' | 
|---|
|  | 2381 | 4 | 
|---|
|  | 2382 | 5 | 
|---|
|  | 2383 | @end example | 
|---|
|  | 2384 |  | 
|---|
|  | 2385 | If the second address is a @var{number} less than (or equal to) | 
|---|
|  | 2386 | the line matching the first address, then only the one line is | 
|---|
|  | 2387 | matched: | 
|---|
|  | 2388 |  | 
|---|
|  | 2389 | @example | 
|---|
|  | 2390 | $ seq 10 | sed -n '4,1p' | 
|---|
|  | 2391 | 4 | 
|---|
|  | 2392 | @end example | 
|---|
|  | 2393 |  | 
|---|
|  | 2394 | @anchor{Zero Address Regex Range} | 
|---|
|  | 2395 | @cindex Special addressing forms | 
|---|
|  | 2396 | @cindex Range with start address of zero | 
|---|
|  | 2397 | @cindex Zero, as range start address | 
|---|
|  | 2398 | @cindex @var{addr1},+N | 
|---|
|  | 2399 | @cindex @var{addr1},~N | 
|---|
|  | 2400 | @cindex GNU extensions, special two-address forms | 
|---|
|  | 2401 | @cindex GNU extensions, @code{0} address | 
|---|
|  | 2402 | @cindex GNU extensions, 0,@var{addr2} addressing | 
|---|
|  | 2403 | @cindex GNU extensions, @var{addr1},+@var{N} addressing | 
|---|
|  | 2404 | @cindex GNU extensions, @var{addr1},~@var{N} addressing | 
|---|
|  | 2405 | @value{SSED} also supports some special two-address forms; all these | 
|---|
|  | 2406 | are GNU extensions: | 
|---|
|  | 2407 | @table @code | 
|---|
|  | 2408 | @item 0,/@var{regexp}/ | 
|---|
|  | 2409 | A line number of @code{0} can be used in an address specification like | 
|---|
|  | 2410 | @code{0,/@var{regexp}/} so that @command{sed} will try to match | 
|---|
|  | 2411 | @var{regexp} in the first input line too.  In other words, | 
|---|
|  | 2412 | @code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/}, | 
|---|
|  | 2413 | except that if @var{regexp} matches the very first line of input the | 
|---|
|  | 2414 | @code{0,/@var{regexp}/} form will consider it to end the range, whereas | 
|---|
|  | 2415 | the @code{1,/@var{regexp}/} form will match the beginning of its range and | 
|---|
|  | 2416 | hence make the range span up to the @emph{second} occurrence of the | 
|---|
|  | 2417 | regular expression. | 
|---|
|  | 2418 |  | 
|---|
|  | 2419 | The following examples demonstrate the difference between starting | 
|---|
|  | 2420 | with address 1 and 0: | 
|---|
|  | 2421 |  | 
|---|
|  | 2422 | @example | 
|---|
|  | 2423 | $ seq 10 | sed -n '1,/[0-9]/p' | 
|---|
|  | 2424 | 1 | 
|---|
|  | 2425 | 2 | 
|---|
|  | 2426 |  | 
|---|
|  | 2427 | $ seq 10 | sed -n '0,/[0-9]/p' | 
|---|
|  | 2428 | 1 | 
|---|
|  | 2429 | @end example | 
|---|
|  | 2430 |  | 
|---|
|  | 2431 |  | 
|---|
|  | 2432 | @item @var{addr1},+@var{N} | 
|---|
|  | 2433 | Matches @var{addr1} and the @var{N} lines following @var{addr1}. | 
|---|
|  | 2434 |  | 
|---|
|  | 2435 | @example | 
|---|
|  | 2436 | $ seq 10 | sed -n '6,+2p' | 
|---|
|  | 2437 | 6 | 
|---|
|  | 2438 | 7 | 
|---|
|  | 2439 | 8 | 
|---|
|  | 2440 | @end example | 
|---|
|  | 2441 |  | 
|---|
|  | 2442 | @var{addr1} can be a line number or a regular expression. | 
|---|
|  | 2443 |  | 
|---|
|  | 2444 | @item @var{addr1},~@var{N} | 
|---|
|  | 2445 | Matches @var{addr1} and the lines following @var{addr1} | 
|---|
|  | 2446 | until the next line whose input line number is a multiple of @var{N}. | 
|---|
|  | 2447 | The following command prints starting at line 6, until the next line which | 
|---|
|  | 2448 | is a multiple of 4 (i.e. line 8): | 
|---|
|  | 2449 |  | 
|---|
|  | 2450 | @example | 
|---|
|  | 2451 | $ seq 10 | sed -n '6,~4p' | 
|---|
|  | 2452 | 6 | 
|---|
|  | 2453 | 7 | 
|---|
|  | 2454 | 8 | 
|---|
|  | 2455 | @end example | 
|---|
|  | 2456 |  | 
|---|
|  | 2457 | @var{addr1} can be a line number or a regular expression. | 
|---|
|  | 2458 |  | 
|---|
|  | 2459 | @end table | 
|---|
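The examples above use a line number as @var{addr1}. A regular expression works the same way, as the following sketch suggests (assuming GNU @command{sed}, since both forms are extensions; variable names are illustrative):

```shell
# Start at the line matching /3/ and include the one line after it.
plus=$(seq 10 | sed -n '/3/,+1p')
echo "$plus"

# Start at the line matching /5/ and continue to the next line whose
# number is a multiple of 4 (line 8).
tilde=$(seq 10 | sed -n '/5/,~4p')
echo "$tilde"
```

The first command prints lines 3 and 4; the second prints lines 5 through 8.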
|  | 2460 |  | 
|---|
|  | 2461 |  | 
|---|
|  | 2462 |  | 
|---|
|  | 2463 | @node Zero Address | 
|---|
|  | 2464 | @section Zero Address | 
|---|
|  | 2465 | @cindex Zero Address | 
|---|
|  | 2466 | As a @value{SSED} extension, the @code{0} address can be used in two cases: | 
|---|
|  | 2467 | @enumerate | 
|---|
|  | 2468 | @item | 
|---|
|  | 2469 | In a regex range address, as @code{0,/@var{regexp}/} | 
|---|
|  | 2470 | (@pxref{Zero Address Regex Range}). | 
|---|
|  | 2471 | @item | 
|---|
|  | 2472 | With the @code{r} command, inserting a file before the first line | 
|---|
|  | 2473 | (@pxref{Adding a header to multiple files}). | 
|---|
|  | 2474 | @end enumerate | 
|---|
|  | 2475 |  | 
|---|
|  | 2476 | Note that these are the only places where the @code{0} address makes | 
|---|
|  | 2477 | sense; commands which are given the @code{0} address in any | 
|---|
|  | 2478 | other way will give an error. | 
|---|
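The difference between an accepted and a rejected use of @code{0} can be checked from the shell. This is a sketch assuming GNU @command{sed}; it relies only on the exit status, not on the exact wording of the error message:

```shell
# 0 as the start of a regex range is accepted.
ok=$(seq 3 | sed -n '0,/1/p')
echo "$ok"

# 0 as a standalone address is rejected: sed reports an error
# and exits with a non-zero status.
if seq 3 | sed -n '0p' 2>/dev/null; then
  status=accepted
else
  status=rejected
fi
echo "$status"
```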
|  | 2479 |  | 
|---|
|  | 2480 |  | 
|---|
|  | 2481 |  | 
|---|
|  | 2482 | @node sed regular expressions | 
|---|
|  | 2483 | @chapter Regular Expressions: selecting text | 
|---|
|  | 2484 |  | 
|---|
|  | 2485 | @menu | 
|---|
|  | 2486 | * Regular Expressions Overview:: Overview of Regular expression in @command{sed} | 
|---|
|  | 2487 | * BRE vs ERE::               Basic (BRE) and extended (ERE) regular expression | 
|---|
|  | 2488 | syntax | 
|---|
|  | 2489 | * BRE syntax::               Overview of basic regular expression syntax | 
|---|
|  | 2490 | * ERE syntax::               Overview of extended regular expression syntax | 
|---|
|  | 2491 | * Character Classes and Bracket Expressions:: | 
|---|
|  | 2492 | * regexp extensions::        Additional regular expression commands | 
|---|
|  | 2493 | * Back-references and Subexpressions:: Back-references and Subexpressions | 
|---|
|  | 2494 | * Escapes::                  Specifying special characters | 
|---|
|  | 2495 | * Locale Considerations::    Multibyte characters and locale considerations | 
|---|
|  | 2496 | @end menu | 
|---|
|  | 2497 |  | 
|---|
|  | 2498 | @node Regular Expressions Overview | 
|---|
|  | 2499 | @section Overview of regular expression in @command{sed} | 
|---|
|  | 2500 |  | 
|---|
|  | 2501 | @c NOTE: Keep examples in the 'overview' section | 
|---|
|  | 2502 | @c neutral in regards to BRE/ERE - to ease understanding. | 
|---|
|  | 2503 |  | 
|---|
|  | 2504 |  | 
|---|
|  | 2505 | To use @command{sed} effectively, one should understand regular | 
|---|
|  | 2506 | expressions (@dfn{regexp} for short).  A regular expression | 
|---|
|  | 2507 | is a pattern that is matched against a | 
|---|
|  | 2508 | subject string from left to right.  Most characters are | 
|---|
|  | 2509 | @dfn{ordinary}: they stand for | 
|---|
|  | 2510 | themselves in a pattern, and match the corresponding characters. | 
|---|
|  | 2511 | Regular expressions in @command{sed} are specified between two | 
|---|
|  | 2512 | slashes. | 
|---|
|  | 2513 |  | 
|---|
|  | 2514 | The following command prints lines containing the string @samp{hello}: | 
|---|
|  | 2515 |  | 
|---|
|  | 2516 | @example | 
|---|
|  | 2517 | sed -n '/hello/p' | 
|---|
|  | 2518 | @end example | 
|---|
|  | 2519 |  | 
|---|
|  | 2520 | The above example is equivalent to this @command{grep} command: | 
|---|
|  | 2521 |  | 
|---|
|  | 2522 | @example | 
|---|
|  | 2523 | grep 'hello' | 
|---|
|  | 2524 | @end example | 
|---|
|  | 2525 |  | 
|---|
|  | 2526 | The power of regular expressions comes from the ability to include | 
|---|
|  | 2527 | alternatives and repetitions in the pattern.  These are encoded in the | 
|---|
|  | 2528 | pattern by the use of @dfn{special characters}, which do not stand for | 
|---|
|  | 2529 | themselves but instead are interpreted in some special way. | 
|---|
|  | 2530 |  | 
|---|
|  | 2531 | The character @code{^} (caret) in a regular expression matches the | 
|---|
|  | 2532 | beginning of the line. The character @code{.} (dot) matches any single | 
|---|
|  | 2533 | character. The following @command{sed} command matches and prints | 
|---|
|  | 2534 | lines which start with the letter @samp{b}, followed by any single character, | 
|---|
|  | 2535 | followed by the letter @samp{d}: | 
|---|
|  | 2536 |  | 
|---|
|  | 2537 | @example | 
|---|
|  | 2538 | $ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p' | 
|---|
|  | 2539 | bad | 
|---|
|  | 2540 | bed | 
|---|
|  | 2541 | bid | 
|---|
|  | 2542 | body | 
|---|
|  | 2543 | @end example | 
|---|
|  | 2544 |  | 
|---|
|  | 2545 | The following sections explain the meaning and usage of special | 
|---|
|  | 2546 | characters in regular expressions. | 
|---|
|  | 2547 |  | 
|---|
|  | 2548 | @node BRE vs ERE | 
|---|
|  | 2549 | @section Basic (BRE) and extended (ERE) regular expressions | 
|---|
|  | 2550 |  | 
|---|
|  | 2551 | Basic and extended regular expressions are two variations on the | 
|---|
|  | 2552 | syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the | 
|---|
|  | 2553 | default in @command{sed} (and similarly in @command{grep}). | 
|---|
|  | 2554 | Use the POSIX-specified @option{-E} option (@option{-r}, | 
|---|
|  | 2555 | @option{--regexp-extended}) to enable Extended Regular Expression (ERE) syntax. | 
|---|
|  | 2556 |  | 
|---|
|  | 2557 | In @value{SSED}, the only difference between basic and extended regular | 
|---|
|  | 2558 | expressions is in the behavior of a few special characters: @samp{?}, | 
|---|
|  | 2559 | @samp{+}, parentheses, braces (@samp{@{@}}), and @samp{|}. | 
|---|
|  | 2560 |  | 
|---|
|  | 2561 | With basic (BRE) syntax, these characters do not have special meaning | 
|---|
|  | 2562 | unless prefixed with a backslash (@samp{\}); with extended (ERE) syntax | 
|---|
|  | 2563 | it is reversed: these characters are special unless they are prefixed | 
|---|
|  | 2564 | with a backslash (@samp{\}). | 
|---|
|  | 2565 |  | 
|---|
|  | 2566 | @multitable @columnfractions .28 .36 .35 | 
|---|
|  | 2567 |  | 
|---|
|  | 2568 | @headitem Desired pattern | 
|---|
|  | 2569 | @tab Basic (BRE) Syntax | 
|---|
|  | 2570 | @tab Extended (ERE) Syntax | 
|---|
|  | 2571 |  | 
|---|
|  | 2572 | @item literal @samp{+} (plus sign) | 
|---|
|  | 2573 |  | 
|---|
|  | 2574 | @tab | 
|---|
|  | 2575 | @exampleindent 0 | 
|---|
|  | 2576 | @codequoteundirected on | 
|---|
|  | 2577 | @codequotebacktick on | 
|---|
|  | 2578 | @example | 
|---|
|  | 2579 | $ echo 'a+b=c' > foo | 
|---|
|  | 2580 | $ sed -n '/a+b/p' foo | 
|---|
|  | 2581 | a+b=c | 
|---|
|  | 2582 | @end example | 
|---|
|  | 2583 | @codequotebacktick off | 
|---|
|  | 2584 | @codequoteundirected off | 
|---|
|  | 2585 |  | 
|---|
|  | 2586 | @tab | 
|---|
|  | 2587 | @exampleindent 0 | 
|---|
|  | 2588 | @codequoteundirected on | 
|---|
|  | 2589 | @codequotebacktick on | 
|---|
|  | 2590 | @example | 
|---|
|  | 2591 | $ echo 'a+b=c' > foo | 
|---|
|  | 2592 | $ sed -E -n '/a\+b/p' foo | 
|---|
|  | 2593 | a+b=c | 
|---|
|  | 2594 | @end example | 
|---|
|  | 2595 | @codequotebacktick off | 
|---|
|  | 2596 | @codequoteundirected off | 
|---|
|  | 2597 |  | 
|---|
|  | 2598 |  | 
|---|
|  | 2599 | @item One or more @samp{a} characters followed by @samp{b} | 
|---|
|  | 2600 | (plus sign as special meta-character) | 
|---|
|  | 2601 |  | 
|---|
|  | 2602 | @tab | 
|---|
|  | 2603 | @exampleindent 0 | 
|---|
|  | 2604 | @codequoteundirected on | 
|---|
|  | 2605 | @codequotebacktick on | 
|---|
|  | 2606 | @example | 
|---|
|  | 2607 | $ echo aab > foo | 
|---|
|  | 2608 | $ sed -n '/a\+b/p' foo | 
|---|
|  | 2609 | aab | 
|---|
|  | 2610 | @end example | 
|---|
|  | 2611 | @codequotebacktick off | 
|---|
|  | 2612 | @codequoteundirected off | 
|---|
|  | 2613 |  | 
|---|
|  | 2614 | @tab | 
|---|
|  | 2615 | @exampleindent 0 | 
|---|
|  | 2616 | @codequoteundirected on | 
|---|
|  | 2617 | @codequotebacktick on | 
|---|
|  | 2618 | @example | 
|---|
|  | 2619 | $ echo aab > foo | 
|---|
|  | 2620 | $ sed -E -n '/a+b/p' foo | 
|---|
|  | 2621 | aab | 
|---|
|  | 2622 | @end example | 
|---|
|  | 2623 | @codequotebacktick off | 
|---|
|  | 2624 | @codequoteundirected off | 
|---|
|  | 2625 |  | 
|---|
|  | 2626 | @end multitable | 
|---|
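The same reversal applies to the other characters listed above.  As a
minimal sketch, the following commands show grouping and an interval in
both syntaxes.  The variable names @code{bre} and @code{ere} and the
sample input are illustrative only.

```shell
# BRE (default): parentheses and braces are special only when escaped.
bre=$(printf '%s\n' ab abab ababab | sed -n '/^\(ab\)\{2\}$/p')

# ERE (-E): the same pattern, written without the backslashes.
ere=$(printf '%s\n' ab abab ababab | sed -E -n '/^(ab){2}$/p')

# Both commands select only the line made of exactly two "ab" groups.
printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

Both variables should hold @samp{abab}: the anchors reject the shorter
and longer lines.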
|  | 2627 |  | 
|---|
|  | 2628 |  | 
|---|
|  | 2629 |  | 
|---|
|  | 2630 |  | 
|---|
|  | 2631 | @node BRE syntax | 
|---|
|  | 2632 | @section Overview of basic regular expression syntax | 
|---|
|  | 2633 |  | 
|---|
|  | 2634 | Here is a brief description | 
|---|
|  | 2635 | of regular expression syntax as used in @command{sed}. | 
|---|
|  | 2636 |  | 
|---|
|  | 2637 | @table @code | 
|---|
|  | 2638 | @item @var{char} | 
|---|
|  | 2639 | A single ordinary character matches itself. | 
|---|
|  | 2640 |  | 
|---|
|  | 2641 | @item * | 
|---|
|  | 2642 | @cindex GNU extensions, to basic regular expressions | 
|---|
|  | 2643 | Matches a sequence of zero or more instances of matches for the | 
|---|
|  | 2644 | preceding regular expression, which must be an ordinary character, a | 
|---|
|  | 2645 | special character preceded by @code{\}, a @code{.}, a grouped regexp | 
|---|
|  | 2646 | (see below), or a bracket expression.  As a GNU extension, a | 
|---|
|  | 2647 | postfixed regular expression can also be followed by @code{*}; for | 
|---|
|  | 2648 | example, @code{a**} is equivalent to @code{a*}.  POSIX | 
|---|
|  | 2649 | 1003.1-2001 says that @code{*} stands for itself when it appears at | 
|---|
|  | 2650 | the start of a regular expression or subexpression, but many | 
|---|
|  | 2651 | non-GNU implementations do not support this and portable | 
|---|
|  | 2652 | scripts should instead use @code{\*} in these contexts. | 
|---|
|  | 2653 | @item . | 
|---|
|  | 2654 | Matches any character, including newline. | 
|---|
|  | 2655 |  | 
|---|
|  | 2656 | @item ^ | 
|---|
|  | 2657 | Matches the null string at beginning of the pattern space, i.e. what | 
|---|
|  | 2658 | appears after the circumflex must appear at the beginning of the | 
|---|
|  | 2659 | pattern space. | 
|---|
|  | 2660 |  | 
|---|
|  | 2661 | In most scripts, pattern space is initialized to the content of each | 
|---|
|  | 2662 | line (@pxref{Execution Cycle, , How @code{sed} works}).  So, it is a | 
|---|
|  | 2663 | useful simplification to think of @code{^#include} as matching only | 
|---|
|  | 2664 | lines where @samp{#include} is the first thing on the line---if there is | 
|---|
|  | 2665 | any preceding space, for example, the match fails.  This simplification is | 
|---|
|  | 2666 | valid as long as the original content of pattern space is not modified, | 
|---|
|  | 2667 | for example with an @code{s} command. | 
|---|
|  | 2668 |  | 
|---|
|  | 2669 | @code{^} acts as a special character only at the beginning of the | 
|---|
|  | 2670 | regular expression or subexpression (that is, after @code{\(} or | 
|---|
|  | 2671 | @code{\|}).  Portable scripts should avoid @code{^} at the beginning of | 
|---|
|  | 2672 | a subexpression, though, as POSIX allows implementations that | 
|---|
|  | 2673 | treat @code{^} as an ordinary character in that context. | 
|---|
|  | 2674 |  | 
|---|
|  | 2675 | @item $ | 
|---|
|  | 2676 | Like @code{^}, but matches the null string at the end of pattern space. | 
|---|
|  | 2677 | @code{$} also acts as a special character only at the end | 
|---|
|  | 2678 | of the regular expression or subexpression (that is, before @code{\)} | 
|---|
|  | 2679 | or @code{\|}), and its use at the end of a subexpression is not | 
|---|
|  | 2680 | portable. | 
|---|
|  | 2681 |  | 
|---|
|  | 2682 |  | 
|---|
|  | 2683 | @item [@var{list}] | 
|---|
|  | 2684 | @itemx [^@var{list}] | 
|---|
|  | 2685 | Matches any single character in @var{list}: for example, | 
|---|
|  | 2686 | @code{[aeiou]} matches any single vowel.  A list may include | 
|---|
|  | 2687 | sequences like @code{@var{char1}-@var{char2}}, which | 
|---|
|  | 2688 | matches any character between (inclusive) @var{char1} | 
|---|
|  | 2689 | and @var{char2}. | 
|---|
|  | 2690 | @xref{Character Classes and Bracket Expressions}. | 
|---|
|  | 2691 |  | 
|---|
|  | 2692 | @item \+ | 
|---|
|  | 2693 | @cindex GNU extensions, to basic regular expressions | 
|---|
|  | 2694 | As @code{*}, but matches one or more.  It is a GNU extension. | 
|---|
|  | 2695 |  | 
|---|
|  | 2696 | @item \? | 
|---|
|  | 2697 | @cindex GNU extensions, to basic regular expressions | 
|---|
|  | 2698 | As @code{*}, but only matches zero or one.  It is a GNU extension. | 
|---|
|  | 2699 |  | 
|---|
|  | 2700 | @item \@{@var{i}\@} | 
|---|
|  | 2701 | As @code{*}, but matches exactly @var{i} sequences (@var{i} is a | 
|---|
|  | 2702 | decimal integer; for portability, keep it between 0 and 255 | 
|---|
|  | 2703 | inclusive). | 
|---|
|  | 2704 |  | 
|---|
|  | 2705 | @item \@{@var{i},@var{j}\@} | 
|---|
|  | 2706 | Matches between @var{i} and @var{j} sequences, inclusive. | 
|---|
|  | 2707 |  | 
|---|
|  | 2708 | @item \@{@var{i},\@} | 
|---|
|  | 2709 | Matches @var{i} or more sequences. | 
|---|
|  | 2710 |  | 
|---|
|  | 2711 | @item \(@var{regexp}\) | 
|---|
|  | 2712 | Groups the inner @var{regexp} as a whole.  This is used to: | 
|---|
|  | 2713 |  | 
|---|
|  | 2714 | @itemize @bullet | 
|---|
|  | 2715 | @item | 
|---|
|  | 2716 | @cindex GNU extensions, to basic regular expressions | 
|---|
|  | 2717 | Apply postfix operators, like @code{\(abcd\)*}: | 
|---|
|  | 2718 | this will search for zero or more whole sequences | 
|---|
|  | 2719 | of @samp{abcd}, while @code{abcd*} would search | 
|---|
|  | 2720 | for @samp{abc} followed by zero or more occurrences | 
|---|
|  | 2721 | of @samp{d}.  Note that support for @code{\(abcd\)*} is | 
|---|
|  | 2722 | required by POSIX 1003.1-2001, but many non-GNU | 
|---|
|  | 2723 | implementations do not support it and hence it is not universally | 
|---|
|  | 2724 | portable. | 
|---|
|  | 2725 |  | 
|---|
|  | 2726 | @item | 
|---|
|  | 2727 | Use back references (see below). | 
|---|
|  | 2728 | @end itemize | 
|---|
|  | 2729 |  | 
|---|
|  | 2730 |  | 
|---|
|  | 2731 | @item @var{regexp1}\|@var{regexp2} | 
|---|
|  | 2732 | @cindex GNU extensions, to basic regular expressions | 
|---|
|  | 2733 | Matches either @var{regexp1} or @var{regexp2}.  Use | 
|---|
|  | 2734 | parentheses to use complex alternative regular expressions. | 
|---|
|  | 2735 | The matching process tries each alternative in turn, from | 
|---|
|  | 2736 | left to right, and the first one that succeeds is used. | 
|---|
|  | 2737 | It is a GNU extension. | 
|---|
|  | 2738 |  | 
|---|
|  | 2739 | @item @var{regexp1}@var{regexp2} | 
|---|
|  | 2740 | Matches the concatenation of @var{regexp1} and @var{regexp2}. | 
|---|
|  | 2741 | Concatenation binds more tightly than @code{\|}, @code{^}, and | 
|---|
|  | 2742 | @code{$}, but less tightly than the other regular expression | 
|---|
|  | 2743 | operators. | 
|---|
|  | 2744 |  | 
|---|
|  | 2745 | @item \@var{digit} | 
|---|
|  | 2746 | Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized | 
|---|
|  | 2747 | subexpression in the regular expression.  This is called a @dfn{back | 
|---|
|  | 2748 | reference}.  Subexpressions are implicitly numbered by counting | 
|---|
|  | 2749 | occurrences of @code{\(} left-to-right. | 
|---|
|  | 2750 |  | 
|---|
|  | 2751 | @item \n | 
|---|
|  | 2752 | Matches the newline character. | 
|---|
|  | 2753 |  | 
|---|
|  | 2754 | @item \@var{char} | 
|---|
|  | 2755 | Matches @var{char}, where @var{char} is one of @code{$}, | 
|---|
|  | 2756 | @code{*}, @code{.}, @code{[}, @code{\}, or @code{^}. | 
|---|
|  | 2757 | Note that the only C-like | 
|---|
|  | 2758 | backslash sequences that you can portably assume to be | 
|---|
|  | 2759 | interpreted are @code{\n} and @code{\\}; in particular | 
|---|
|  | 2760 | @code{\t} is not portable, and matches a @samp{t} under most | 
|---|
|  | 2761 | implementations of @command{sed}, rather than a tab character. | 
|---|
|  | 2762 |  | 
|---|
|  | 2763 | @end table | 
|---|
|  | 2764 |  | 
|---|
|  | 2765 | @cindex Greedy regular expression matching | 
|---|
|  | 2766 | Note that the regular expression matcher is greedy, i.e., matches | 
|---|
|  | 2767 | are attempted from left to right and, if two or more matches are | 
|---|
|  | 2768 | possible starting at the same character, it selects the longest. | 
|---|
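Greedy matching is easiest to see with @code{.*}.  The sketch below
uses a made-up input line and the illustrative variable names
@code{greedy} and @code{first_tag}; it contrasts @code{.*} with a
negated bracket expression, a common way to stop at the first
delimiter.

```shell
line='<b>one</b> <b>two</b>'

# Greedy: .* extends to the last '>' on the line, so the whole line is replaced.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>/X/')

# [^>]* cannot cross a '>', so only the first tag is replaced.
first_tag=$(printf '%s\n' "$line" | sed 's/<[^>]*>/X/')

printf '%s\n%s\n' "$greedy" "$first_tag"
```

The first command should print @samp{X}, the second
@samp{Xone</b> <b>two</b>}.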
|  | 2769 |  | 
|---|
|  | 2770 | @noindent | 
|---|
|  | 2771 | Examples: | 
|---|
|  | 2772 | @table @samp | 
|---|
|  | 2773 | @item abcdef | 
|---|
|  | 2774 | Matches @samp{abcdef}. | 
|---|
|  | 2775 |  | 
|---|
|  | 2776 | @item a*b | 
|---|
|  | 2777 | Matches zero or more @samp{a}s followed by a single | 
|---|
|  | 2778 | @samp{b}.  For example, @samp{b} or @samp{aaaaab}. | 
|---|
|  | 2779 |  | 
|---|
|  | 2780 | @item a\?b | 
|---|
|  | 2781 | Matches @samp{b} or @samp{ab}. | 
|---|
|  | 2782 |  | 
|---|
|  | 2783 | @item a\+b\+ | 
|---|
|  | 2784 | Matches one or more @samp{a}s followed by one or more | 
|---|
|  | 2785 | @samp{b}s: @samp{ab} is the shortest possible match, but | 
|---|
|  | 2786 | other examples are @samp{aaaab} or @samp{abbbbb} or | 
|---|
|  | 2787 | @samp{aaaaaabbbbbbb}. | 
|---|
|  | 2788 |  | 
|---|
|  | 2789 | @item .* | 
|---|
|  | 2790 | @itemx .\+ | 
|---|
|  | 2791 | These two both match all the characters in a string; | 
|---|
|  | 2792 | however, the first matches every string (including the empty | 
|---|
|  | 2793 | string), while the second matches only strings containing | 
|---|
|  | 2794 | at least one character. | 
|---|
|  | 2795 |  | 
|---|
|  | 2796 | @item ^main.*(.*) | 
|---|
|  | 2797 | This matches a string starting with @samp{main}, | 
|---|
|  | 2798 | followed by an opening and closing | 
|---|
|  | 2799 | parenthesis.  The @samp{n}, @samp{(} and @samp{)} need not | 
|---|
|  | 2800 | be adjacent. | 
|---|
|  | 2801 |  | 
|---|
|  | 2802 | @item ^# | 
|---|
|  | 2803 | This matches a string beginning with @samp{#}. | 
|---|
|  | 2804 |  | 
|---|
|  | 2805 | @item \\$ | 
|---|
|  | 2806 | This matches a string ending with a single backslash.  The | 
|---|
|  | 2807 | regexp contains two backslashes for escaping. | 
|---|
|  | 2808 |  | 
|---|
|  | 2809 | @item \$ | 
|---|
|  | 2810 | In contrast, this matches a single literal dollar sign, | 
|---|
|  | 2811 | because the @samp{$} is escaped. | 
|---|
|  | 2812 |  | 
|---|
|  | 2813 | @item [a-zA-Z0-9] | 
|---|
|  | 2814 | In the C locale, this matches any single ASCII letter or digit. | 
|---|
|  | 2815 |  | 
|---|
|  | 2816 | @item [^ @kbd{@key{TAB}}]\+ | 
|---|
|  | 2817 | (Here @kbd{@key{TAB}} stands for a single tab character.) | 
|---|
|  | 2818 | This matches a string of one or more | 
|---|
|  | 2819 | characters, none of which is a space or a tab. | 
|---|
|  | 2820 | Usually this means a word. | 
|---|
|  | 2821 |  | 
|---|
|  | 2822 | @item ^\(.*\)\n\1$ | 
|---|
|  | 2823 | This matches a string consisting of two equal substrings separated by | 
|---|
|  | 2824 | a newline. | 
|---|
|  | 2825 |  | 
|---|
|  | 2826 | @item .\@{9\@}A$ | 
|---|
|  | 2827 | This matches nine characters followed by an @samp{A} at the end of a line. | 
|---|
|  | 2828 |  | 
|---|
|  | 2829 | @item ^.\@{15\@}A | 
|---|
|  | 2830 | This matches the start of a string that contains 16 characters, | 
|---|
|  | 2831 | the last of which is an @samp{A}. | 
|---|
|  | 2832 |  | 
|---|
|  | 2833 | @end table | 
|---|
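The back-reference example @code{^\(.*\)\n\1$} only makes sense when
pattern space contains a newline.  As a sketch, the @code{N} command
can be used to join two input lines first; the inputs and the variable
names @code{dup} and @code{differ} are illustrative.

```shell
# N appends the next input line, so pattern space holds "line1\nline2".
# The regexp then matches only when both lines are identical.
dup=$(printf '%s\n' foo foo | sed -n 'N;/^\(.*\)\n\1$/p')
differ=$(printf '%s\n' foo bar | sed -n 'N;/^\(.*\)\n\1$/p')

printf 'identical pair: [%s]\n' "$dup"
printf 'different pair: [%s]\n' "$differ"
```

The first command should print both @samp{foo} lines; the second
should print nothing, because @code{\1} must repeat the text matched
by the group.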
|  | 2834 |  | 
|---|
|  | 2835 |  | 
|---|
|  | 2836 | @node ERE syntax | 
|---|
|  | 2837 | @section Overview of extended regular expression syntax | 
|---|
|  | 2838 | @cindex Extended regular expressions, syntax | 
|---|
|  | 2839 |  | 
|---|
|  | 2840 | The only difference between basic and extended regular expressions is in | 
|---|
|  | 2841 | the behavior of a few characters: @samp{?}, @samp{+}, parentheses, | 
|---|
|  | 2842 | braces (@samp{@{@}}), and @samp{|}.  While basic regular expressions | 
|---|
|  | 2843 | require these to be escaped if you want them to behave as special | 
|---|
|  | 2844 | characters, when using extended regular expressions you must escape | 
|---|
|  | 2845 | them if you want them @emph{to match a literal character}.  @samp{|} | 
|---|
|  | 2846 | is special here because @samp{\|} is a GNU extension; standard | 
|---|
|  | 2847 | basic regular expressions do not provide its functionality. | 
|---|
|  | 2848 |  | 
|---|
|  | 2849 | @noindent | 
|---|
|  | 2850 | Examples: | 
|---|
|  | 2851 | @table @code | 
|---|
|  | 2852 | @item abc? | 
|---|
|  | 2853 | becomes @samp{abc\?} when using extended regular expressions.  It matches | 
|---|
|  | 2854 | the literal string @samp{abc?}. | 
|---|
|  | 2855 |  | 
|---|
|  | 2856 | @item c\+ | 
|---|
|  | 2857 | becomes @samp{c+} when using extended regular expressions.  It matches | 
|---|
|  | 2858 | one or more @samp{c}s. | 
|---|
|  | 2859 |  | 
|---|
|  | 2860 | @item a\@{3,\@} | 
|---|
|  | 2861 | becomes @samp{a@{3,@}} when using extended regular expressions.  It matches | 
|---|
|  | 2862 | three or more @samp{a}s. | 
|---|
|  | 2863 |  | 
|---|
|  | 2864 | @item \(abc\)\@{2,3\@} | 
|---|
|  | 2865 | becomes @samp{(abc)@{2,3@}} when using extended regular expressions.  It | 
|---|
|  | 2866 | matches either @samp{abcabc} or @samp{abcabcabc}. | 
|---|
|  | 2867 |  | 
|---|
|  | 2868 | @item \(abc*\)\1 | 
|---|
|  | 2869 | becomes @samp{(abc*)\1} when using extended regular expressions. | 
|---|
|  | 2870 | Backreferences must still be escaped when using extended regular | 
|---|
|  | 2871 | expressions. | 
|---|
|  | 2872 |  | 
|---|
|  | 2873 | @item a\|b | 
|---|
|  | 2874 | becomes @samp{a|b} when using extended regular expressions.  It matches | 
|---|
|  | 2875 | @samp{a} or @samp{b}. | 
|---|
|  | 2876 | @end table | 
|---|
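The conversions above can be checked by running both forms on the
same input.  This sketch assumes GNU @command{sed}, since @code{\|} in
basic regular expressions is a GNU extension; the input and the
variable names are illustrative.

```shell
input='cat cot cut'

# BRE with the GNU \| extension: grouping and alternation are escaped.
bre=$(printf '%s\n' "$input" | sed 's/c\(a\|o\)t/X/g')

# ERE: the same pattern with unescaped parentheses and bar.
ere=$(printf '%s\n' "$input" | sed -E 's/c(a|o)t/X/g')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should print @samp{X X cut}.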
|  | 2877 |  | 
|---|
|  | 2878 | @node Character Classes and Bracket Expressions | 
|---|
|  | 2879 | @section Character Classes and Bracket Expressions | 
|---|
|  | 2880 |  | 
|---|
|  | 2881 | @c The 'character class' section is shamelessly copied from grep's manual. | 
|---|
|  | 2882 |  | 
|---|
|  | 2883 | @cindex bracket expression | 
|---|
|  | 2884 | @cindex character class | 
|---|
|  | 2885 | A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and | 
|---|
|  | 2886 | @samp{]}. | 
|---|
|  | 2887 | It matches any single character in that list; | 
|---|
|  | 2888 | if the first character of the list is the caret @samp{^}, | 
|---|
|  | 2889 | then it matches any character @strong{not} in the list. | 
|---|
|  | 2890 | For example, the following command replaces the strings | 
|---|
|  | 2891 | @samp{gray} or @samp{grey} with @samp{blue}: | 
|---|
|  | 2892 |  | 
|---|
|  | 2893 | @example | 
|---|
|  | 2894 | sed 's/gr[ae]y/blue/' | 
|---|
|  | 2895 | @end example | 
|---|
|  | 2896 |  | 
|---|
|  | 2897 | @c TODO: fix 'ref' to look good in both HTML and PDF | 
|---|
|  | 2898 | Bracket expressions can be used in both | 
|---|
|  | 2899 | @ref{BRE syntax,,basic} and @ref{ERE syntax,,extended} | 
|---|
|  | 2900 | regular expressions (that is, with or without the @option{-E}/@option{-r} | 
|---|
|  | 2901 | options). | 
|---|
|  | 2902 |  | 
|---|
|  | 2903 | @cindex range expression | 
|---|
|  | 2904 | Within a bracket expression, a @dfn{range expression} consists of two | 
|---|
|  | 2905 | characters separated by a hyphen. | 
|---|
|  | 2906 | It matches any single character that | 
|---|
|  | 2907 | sorts between the two characters, inclusive. | 
|---|
|  | 2908 | In the default C locale, the sorting sequence is the native character | 
|---|
|  | 2909 | order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}. | 
|---|
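Because the meaning of a range depends on the locale's sorting
sequence, scripts that rely on ranges often pin the locale.  A minimal
sketch, with made-up input and an illustrative variable name:

```shell
# LC_ALL=C selects the native character order, so [a-d] means a, b, c, or d.
out=$(printf '%s\n' apple cherry fig | LC_ALL=C sed -n '/^[a-d]/p')

printf '%s\n' "$out"
```

This should print @samp{apple} and @samp{cherry}, but not @samp{fig}.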
|  | 2910 |  | 
|---|
|  | 2911 |  | 
|---|
|  | 2912 | Finally, certain named classes of characters are predefined within | 
|---|
|  | 2913 | bracket expressions, as follows. | 
|---|
|  | 2914 |  | 
|---|
|  | 2915 | These named classes must be used @emph{inside} a bracket expression, | 
|---|
|  | 2916 | in addition to their own brackets.  Correct usage: | 
|---|
|  | 2917 | @example | 
|---|
|  | 2918 | $ echo 1 | sed 's/[[:digit:]]/X/' | 
|---|
|  | 2919 | X | 
|---|
|  | 2920 | @end example | 
|---|
|  | 2921 |  | 
|---|
|  | 2922 | Incorrect usage is rejected by newer @command{sed} versions. | 
|---|
|  | 2923 | Older versions accepted it but treated it as a single bracket expression | 
|---|
|  | 2924 | (which is equivalent to @samp{[dgit:]}, | 
|---|
|  | 2925 | that is, only the characters @samp{d}, @samp{g}, @samp{i}, @samp{t}, and @samp{:}): | 
|---|
|  | 2926 | @example | 
|---|
|  | 2927 | # current GNU sed versions - incorrect usage rejected | 
|---|
|  | 2928 | $ echo 1 | sed 's/[:digit:]/X/' | 
|---|
|  | 2929 | sed: character class syntax is [[:space:]], not [:space:] | 
|---|
|  | 2930 |  | 
|---|
|  | 2931 | # older GNU sed versions | 
|---|
|  | 2932 | $ echo 1 | sed 's/[:digit:]/X/' | 
|---|
|  | 2933 | 1 | 
|---|
|  | 2934 | @end example | 
|---|
|  | 2935 |  | 
|---|
|  | 2936 |  | 
|---|
|  | 2937 | @cindex classes of characters | 
|---|
|  | 2938 | @cindex character classes | 
|---|
|  | 2939 | @cindex named character classes | 
|---|
|  | 2940 | @table @samp | 
|---|
|  | 2941 |  | 
|---|
|  | 2942 | @item [:alnum:] | 
|---|
|  | 2943 | @opindex alnum @r{character class} | 
|---|
|  | 2944 | @cindex alphanumeric characters | 
|---|
|  | 2945 | Alphanumeric characters: | 
|---|
|  | 2946 | @samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII | 
|---|
|  | 2947 | character encoding, this is the same as @samp{[0-9A-Za-z]}. | 
|---|
|  | 2948 |  | 
|---|
|  | 2949 | @item [:alpha:] | 
|---|
|  | 2950 | @opindex alpha @r{character class} | 
|---|
|  | 2951 | @cindex alphabetic characters | 
|---|
|  | 2952 | Alphabetic characters: | 
|---|
|  | 2953 | @samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII | 
|---|
|  | 2954 | character encoding, this is the same as @samp{[A-Za-z]}. | 
|---|
|  | 2955 |  | 
|---|
|  | 2956 | @item [:blank:] | 
|---|
|  | 2957 | @opindex blank @r{character class} | 
|---|
|  | 2958 | @cindex blank characters | 
|---|
|  | 2959 | Blank characters: | 
|---|
|  | 2960 | space and tab. | 
|---|
|  | 2961 |  | 
|---|
|  | 2962 | @item [:cntrl:] | 
|---|
|  | 2963 | @opindex cntrl @r{character class} | 
|---|
|  | 2964 | @cindex control characters | 
|---|
|  | 2965 | Control characters. | 
|---|
|  | 2966 | In ASCII, these characters have octal codes 000 | 
|---|
|  | 2967 | through 037, and 177 (DEL). | 
|---|
|  | 2968 | In other character sets, these are | 
|---|
|  | 2969 | the equivalent characters, if any. | 
|---|
|  | 2970 |  | 
|---|
|  | 2971 | @item [:digit:] | 
|---|
|  | 2972 | @opindex digit @r{character class} | 
|---|
|  | 2973 | @cindex digit characters | 
|---|
|  | 2974 | @cindex numeric characters | 
|---|
|  | 2975 | Digits: @code{0 1 2 3 4 5 6 7 8 9}. | 
|---|
|  | 2976 |  | 
|---|
|  | 2977 | @item [:graph:] | 
|---|
|  | 2978 | @opindex graph @r{character class} | 
|---|
|  | 2979 | @cindex graphic characters | 
|---|
|  | 2980 | Graphical characters: | 
|---|
|  | 2981 | @samp{[:alnum:]} and @samp{[:punct:]}. | 
|---|
|  | 2982 |  | 
|---|
|  | 2983 | @item [:lower:] | 
|---|
|  | 2984 | @opindex lower @r{character class} | 
|---|
|  | 2985 | @cindex lower-case letters | 
|---|
|  | 2986 | Lower-case letters; in the @samp{C} locale and ASCII character | 
|---|
|  | 2987 | encoding, this is | 
|---|
|  | 2988 | @code{a b c d e f g h i j k l m n o p q r s t u v w x y z}. | 
|---|
|  | 2989 |  | 
|---|
|  | 2990 | @item [:print:] | 
|---|
|  | 2991 | @opindex print @r{character class} | 
|---|
|  | 2992 | @cindex printable characters | 
|---|
|  | 2993 | Printable characters: | 
|---|
|  | 2994 | @samp{[:alnum:]}, @samp{[:punct:]}, and space. | 
|---|
|  | 2995 |  | 
|---|
|  | 2996 | @item [:punct:] | 
|---|
|  | 2997 | @opindex punct @r{character class} | 
|---|
|  | 2998 | @cindex punctuation characters | 
|---|
|  | 2999 | Punctuation characters; in the @samp{C} locale and ASCII character | 
|---|
|  | 3000 | encoding, this is | 
|---|
|  | 3001 | @code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}. | 
|---|
|  | 3002 |  | 
|---|
|  | 3003 | @item [:space:] | 
|---|
|  | 3004 | @opindex space @r{character class} | 
|---|
|  | 3005 | @cindex space characters | 
|---|
|  | 3006 | @cindex whitespace characters | 
|---|
|  | 3007 | Space characters: in the @samp{C} locale, this is | 
|---|
|  | 3008 | tab, newline, vertical tab, form feed, carriage return, and space. | 
|---|
|  | 3009 |  | 
|---|
|  | 3010 |  | 
|---|
|  | 3011 | @item [:upper:] | 
|---|
|  | 3012 | @opindex upper @r{character class} | 
|---|
|  | 3013 | @cindex upper-case letters | 
|---|
|  | 3014 | Upper-case letters: in the @samp{C} locale and ASCII character | 
|---|
|  | 3015 | encoding, this is | 
|---|
|  | 3016 | @code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}. | 
|---|
|  | 3017 |  | 
|---|
|  | 3018 | @item [:xdigit:] | 
|---|
|  | 3019 | @opindex xdigit @r{character class} | 
|---|
|  | 3020 | @cindex xdigit class | 
|---|
|  | 3021 | @cindex hexadecimal digits | 
|---|
|  | 3022 | Hexadecimal digits: | 
|---|
|  | 3023 | @code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}. | 
|---|
|  | 3024 |  | 
|---|
|  | 3025 | @end table | 
|---|
|  | 3026 | Note that the brackets in these class names are | 
|---|
|  | 3027 | part of the symbolic names, and must be included in addition to | 
|---|
|  | 3028 | the brackets delimiting the bracket expression. | 
|---|
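Several named classes can share one bracket expression, and a class
can be negated with a leading @samp{^}.  The following sketch uses a
made-up input string and illustrative variable names:

```shell
# Two classes in one bracket expression: upper-case letters or digits.
masked=$(printf '%s\n' 'aB3-c' | sed 's/[[:upper:][:digit:]]/X/g')

# Negated class: anything that is not alphanumeric.
cleaned=$(printf '%s\n' 'aB3-c' | sed 's/[^[:alnum:]]/_/g')

printf '%s\n%s\n' "$masked" "$cleaned"
```

The expected results are @samp{aXX-c} and @samp{aB3_c}.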
|  | 3029 |  | 
|---|
|  | 3030 | Most meta-characters lose their special meaning inside bracket expressions: | 
|---|
|  | 3031 |  | 
|---|
|  | 3032 | @table @samp | 
|---|
|  | 3033 | @item ] | 
|---|
|  | 3034 | ends the bracket expression if it's not the first list item. | 
|---|
|  | 3035 | So, if you want to make the @samp{]} character a list item, | 
|---|
|  | 3036 | you must put it first. | 
|---|
|  | 3037 |  | 
|---|
|  | 3038 | @item - | 
|---|
|  | 3039 | represents the range if it's not first or last in a list or the ending point | 
|---|
|  | 3040 | of a range. | 
|---|
|  | 3041 |  | 
|---|
|  | 3042 | @item ^ | 
|---|
|  | 3043 | represents the characters not in the list. | 
|---|
|  | 3044 | If you want to make the @samp{^} | 
|---|
|  | 3045 | character a list item, place it anywhere but first. | 
|---|
|  | 3046 | @end table | 
|---|
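The placement rules above can be tried directly.  In this sketch each
command puts one of the three characters where it is taken literally;
the inputs and variable names are illustrative.

```shell
# ']' placed first in the list is a literal list item.
close=$(printf '%s\n' 'a]b' | sed 's/[]]/X/')

# '^' placed anywhere but first is a literal list item.
caret=$(printf '%s\n' 'a^b' | sed 's/[b^]/X/g')

# '-' placed last is a literal list item, not a range operator.
dash=$(printf '%s\n' 'a-b' | sed 's/[a-]/X/g')

printf '%s\n%s\n%s\n' "$close" "$caret" "$dash"
```

The three commands should print @samp{aXb}, @samp{aXX}, and
@samp{XXb}, respectively.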
|  | 3047 |  | 
|---|
|  | 3048 | @c TODO: incorporate this paragraph (copied verbatim from BRE section). | 
|---|
|  | 3049 |  | 
|---|
|  | 3050 | @cindex @code{POSIXLY_CORRECT} behavior, bracket expressions | 
|---|
|  | 3051 | The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\} | 
|---|
|  | 3052 | are normally not special within @var{list}.  For example, @code{[\*]} | 
|---|
|  | 3053 | matches either @samp{\} or @samp{*}, because the @code{\} is not | 
|---|
|  | 3054 | special here.  However, strings like @code{[.ch.]}, @code{[=a=]}, and | 
|---|
|  | 3055 | @code{[:space:]} are special within @var{list} and represent collating | 
|---|
|  | 3056 | symbols, equivalence classes, and character classes, respectively, and | 
|---|
|  | 3057 | @code{[} is therefore special within @var{list} when it is followed by | 
|---|
|  | 3058 | @code{.}, @code{=}, or @code{:}.  Also, when not in | 
|---|
|  | 3059 | @env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and | 
|---|
|  | 3060 | @code{\t} are recognized within @var{list}.  @xref{Escapes}. | 
|---|
|  | 3061 | @c ******** | 
|---|
|  | 3062 |  | 
|---|
|  | 3063 |  | 
|---|
|  | 3064 | @c TODO: improve explanation about collation classes and equivalence classes | 
|---|
|  | 3065 | @c       perhaps dedicate a section to Locales ?? | 
|---|
|  | 3066 |  | 
|---|
|  | 3067 | @table @samp | 
|---|
|  | 3068 | @item [. | 
|---|
|  | 3069 | represents the open collating symbol. | 
|---|
|  | 3070 |  | 
|---|
|  | 3071 | @item .] | 
|---|
|  | 3072 | represents the close collating symbol. | 
|---|
|  | 3073 |  | 
|---|
|  | 3074 | @item [= | 
|---|
|  | 3075 | represents the open equivalence class. | 
|---|
|  | 3076 |  | 
|---|
|  | 3077 | @item =] | 
|---|
|  | 3078 | represents the close equivalence class. | 
|---|
|  | 3079 |  | 
|---|
|  | 3080 | @item [: | 
|---|
|  | 3081 | represents the open character class symbol, and should be followed by a | 
|---|
|  | 3082 | valid character class name. | 
|---|
|  | 3083 |  | 
|---|
|  | 3084 | @item :] | 
|---|
|  | 3085 | represents the close character class symbol. | 
|---|
|  | 3086 | @end table | 
|---|
|  | 3087 |  | 
|---|
|  | 3088 |  | 
|---|
|  | 3089 | @node regexp extensions | 
|---|
|  | 3090 | @section Regular expression extensions | 
|---|
|  | 3091 |  | 
|---|
|  | 3092 | The following sequences have special meaning inside regular expressions | 
|---|
|  | 3093 | (used in @ref{Regexp Addresses,,addresses} and the @code{s} command). | 
|---|
|  | 3094 |  | 
|---|
|  | 3095 | These can be used in both | 
|---|
|  | 3096 | @ref{BRE syntax,,basic} and @ref{ERE syntax,,extended} | 
|---|
|  | 3097 | regular expressions (that is, with or without the @option{-E}/@option{-r} | 
|---|
|  | 3098 | options). | 
|---|
|  | 3099 |  | 
|---|
|  | 3100 | @table @code | 
|---|
|  | 3101 | @item \w | 
|---|
|  | 3102 | Matches any ``word'' character.  A ``word'' character is any | 
|---|
|  | 3103 | letter or digit or the underscore character. | 
|---|
|  | 3104 |  | 
|---|
|  | 3105 | @example | 
|---|
|  | 3106 | $ echo "abc %-= def." | sed 's/\w/X/g' | 
|---|
|  | 3107 | XXX %-= XXX. | 
|---|
|  | 3108 | @end example | 
|---|
|  | 3109 |  | 
|---|
|  | 3110 |  | 
|---|
|  | 3111 | @item \W | 
|---|
|  | 3112 | Matches any ``non-word'' character. | 
|---|
|  | 3113 |  | 
|---|
|  | 3114 | @example | 
|---|
|  | 3115 | $ echo "abc %-= def." | sed 's/\W/X/g' | 
|---|
|  | 3116 | abcXXXXXdefX | 
|---|
|  | 3117 | @end example | 
|---|
|  | 3118 |  | 
|---|
|  | 3119 |  | 
|---|
|  | 3120 | @item \b | 
|---|
|  | 3121 | Matches a word boundary; that is, it matches if the character | 
|---|
|  | 3122 | to the left is a ``word'' character and the character to the | 
|---|
|  | 3123 | right is a ``non-word'' character, or vice-versa. | 
|---|
|  | 3124 |  | 
|---|
|  | 3125 | @example | 
|---|
|  | 3126 | $ echo "abc %-= def." | sed 's/\b/X/g' | 
|---|
|  | 3127 | XabcX %-= XdefX. | 
|---|
|  | 3128 | @end example | 
|---|
|  | 3129 |  | 
|---|
|  | 3130 |  | 
|---|
|  | 3131 | @item \B | 
|---|
|  | 3132 | Matches everywhere but on a word boundary; that is, it matches | 
|---|
|  | 3133 | if the character to the left and the character to the right | 
|---|
|  | 3134 | are either both ``word'' characters or both ``non-word'' | 
|---|
|  | 3135 | characters. | 
|---|
|  | 3136 |  | 
|---|
|  | 3137 | @example | 
|---|
|  | 3138 | $ echo "abc %-= def." | sed 's/\B/X/g' | 
|---|
|  | 3139 | aXbXc X%X-X=X dXeXf.X | 
|---|
|  | 3140 | @end example | 
|---|
|  | 3141 |  | 
|---|
|  | 3142 |  | 
|---|
|  | 3143 | @item \s | 
|---|
|  | 3144 | Matches whitespace characters (spaces and tabs). | 
|---|
|  | 3145 | Newlines embedded in the pattern or hold space also match. | 
|---|
|  | 3146 |  | 
|---|
|  | 3147 | @example | 
|---|
|  | 3148 | $ echo "abc %-= def." | sed 's/\s/X/g' | 
|---|
|  | 3149 | abcX%-=Xdef. | 
|---|
|  | 3150 | @end example | 
|---|
|  | 3151 |  | 
|---|
|  | 3152 |  | 
|---|
|  | 3153 | @item \S | 
|---|
|  | 3154 | Matches non-whitespace characters. | 
|---|
|  | 3155 |  | 
|---|
|  | 3156 | @example | 
|---|
|  | 3157 | $ echo "abc %-= def." | sed 's/\S/X/g' | 
|---|
|  | 3158 | XXX XXX XXXX | 
|---|
|  | 3159 | @end example | 
|---|
|  | 3160 |  | 
|---|
|  | 3161 |  | 
|---|
|  | 3162 | @item \< | 
|---|
|  | 3163 | Matches the beginning of a word. | 
|---|
|  | 3164 |  | 
|---|
|  | 3165 | @example | 
|---|
|  | 3166 | $ echo "abc %-= def." | sed 's/\</X/g' | 
|---|
|  | 3167 | Xabc %-= Xdef. | 
|---|
|  | 3168 | @end example | 
|---|
|  | 3169 |  | 
|---|
|  | 3170 |  | 
|---|
|  | 3171 | @item \> | 
|---|
|  | 3172 | Matches the end of a word. | 
|---|
|  | 3173 |  | 
|---|
|  | 3174 | @example | 
|---|
|  | 3175 | $ echo "abc %-= def." | sed 's/\>/X/g' | 
|---|
|  | 3176 | abcX %-= defX. | 
|---|
|  | 3177 | @end example | 
|---|
|  | 3178 |  | 
|---|
|  | 3179 |  | 
|---|
|  | 3180 | @item \` | 
|---|
|  | 3181 | Matches only at the start of pattern space.  This is different | 
|---|
|  | 3182 | from @code{^} in multi-line mode. | 
|---|
|  | 3183 |  | 
|---|
|  | 3184 | Compare the following two examples: | 
|---|
|  | 3185 |  | 
|---|
|  | 3186 | @example | 
|---|
|  | 3187 | $ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm' | 
|---|
|  | 3188 | Xa | 
|---|
|  | 3189 | Xb | 
|---|
|  | 3190 | Xc | 
|---|
|  | 3191 |  | 
|---|
|  | 3192 | $ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm' | 
|---|
|  | 3193 | Xa | 
|---|
|  | 3194 | b | 
|---|
|  | 3195 | c | 
|---|
|  | 3196 | @end example | 
|---|
|  | 3197 |  | 
|---|
|  | 3198 | @item \' | 
|---|
|  | 3199 | Matches only at the end of pattern space.  This is different | 
|---|
|  | 3200 | from @code{$} in multi-line mode. | 
|---|
|  | 3201 |  | 
|---|
|  | 3202 |  | 
|---|
|  | 3203 |  | 
|---|
|  | 3204 | @end table | 
|---|
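The same comparison can be made for @code{\'} and @code{$}.  The
following is a minimal sketch, assuming GNU @command{sed} (both
@code{\'} and the @code{M} flag are GNU extensions) and reusing the
three-line input from the example above; the shell variable names are
only illustrative:

```shell
# Join the three input lines into one pattern space with N;N, then
# append X wherever the anchor matches.

# In multi-line mode (the M flag), $ matches at the end of every line.
dollar=$(printf 'a\nb\nc\n' | sed 'N;N;s/$/X/gm')
printf '%s\n' "$dollar"

# \' matches only at the very end of the pattern space.  The script is
# double-quoted here because it contains a single quote.
buffer_end=$(printf 'a\nb\nc\n' | sed "N;N;s/\'/X/gm")
printf '%s\n' "$buffer_end"
```

With @code{$} every line should gain a trailing @samp{X}; with
@code{\'} only the last line should.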
|  | 3205 |  | 
|---|
|  | 3206 |  | 
|---|
|  | 3207 | @node Back-references and Subexpressions | 
|---|
|  | 3208 | @section Back-references and Subexpressions | 
|---|
|  | 3209 | @cindex subexpression | 
|---|
|  | 3210 | @cindex back-reference | 
|---|
|  | 3211 |  | 
|---|
|  | 3212 | @dfn{Back-references} are regular expression commands which refer to a | 
|---|
|  | 3213 | previous part of the matched regular expression.  Back-references are | 
|---|
|  | 3214 | specified with backslash and a single digit (e.g. @samp{\1}).  The | 
|---|
|  | 3215 | part of the regular expression they refer to is called a | 
|---|
|  | 3216 | @dfn{subexpression}, and is designated with parentheses. | 
|---|
|  | 3217 |  | 
|---|
|  | 3218 | Back-references and subexpressions are used in two cases: in the | 
|---|
|  | 3219 | regular expression search pattern, and in the @var{replacement} part | 
|---|
|  | 3220 | of the @command{s} command (@pxref{Regexp Addresses,,Regular | 
|---|
|  | 3221 | Expression Addresses} and @ref{The "s" Command}). | 
|---|
|  | 3222 |  | 
|---|
|  | 3223 | In a regular expression pattern, back-references are used to match | 
|---|
|  | 3224 | the same content as a previously matched subexpression.  In the | 
|---|
|  | 3225 | following example, the subexpression is @samp{.} - any single | 
|---|
|  | 3226 | character (being surrounded by parentheses makes it a | 
|---|
|  | 3227 | subexpression). The back-reference @samp{\1} asks to match the same | 
|---|
|  | 3228 | content (same character) as the subexpression. | 
|---|
|  | 3229 |  | 
|---|
|  | 3230 | The command below matches words starting with any character, | 
|---|
|  | 3231 | followed by the letter @samp{o}, followed by the same character as the | 
|---|
|  | 3232 | first. | 
|---|
|  | 3233 |  | 
|---|
|  | 3234 | @example | 
|---|
|  | 3235 | $ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words | 
|---|
|  | 3236 | bob | 
|---|
|  | 3237 | mom | 
|---|
|  | 3238 | non | 
|---|
|  | 3239 | pop | 
|---|
|  | 3240 | sos | 
|---|
|  | 3241 | tot | 
|---|
|  | 3242 | wow | 
|---|
|  | 3243 | @end example | 
|---|
|  | 3244 |  | 
|---|
|  | 3245 | Multiple subexpressions are automatically numbered from | 
|---|
|  | 3246 | left-to-right. This command searches for 6-letter | 
|---|
|  | 3247 | palindromes (the first three letters are 3 subexpressions, | 
|---|
|  | 3248 | followed by 3 back-references in reverse order): | 
|---|
|  | 3249 |  | 
|---|
|  | 3250 | @example | 
|---|
|  | 3251 | $ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words | 
|---|
|  | 3252 | redder | 
|---|
|  | 3253 | @end example | 
|---|
|  | 3254 |  | 
|---|
|  | 3255 | In the @command{s} command, back-references can be | 
|---|
|  | 3256 | used in the @var{replacement} part to refer back to subexpressions in | 
|---|
|  | 3257 | the @var{regexp} part. | 
|---|
|  | 3258 |  | 
|---|
|  | 3259 | The following example uses two subexpressions in the regular | 
|---|
|  | 3260 | expression to match two space-separated words. The back-references in | 
|---|
|  | 3261 | the @var{replacement} part print the words in a different order: | 
|---|
|  | 3262 |  | 
|---|
|  | 3263 | @example | 
|---|
|  | 3264 | $ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./' | 
|---|
|  | 3265 | The name is Bond, James Bond. | 
|---|
|  | 3266 | @end example | 
|---|
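The examples in this section use extended regular expressions
(@option{-E}).  In basic regular expression syntax, which is the
default, the grouping parentheses are written @samp{\(} and @samp{\)},
while the back-references themselves look the same.  The sketch below
runs the previous command in both syntaxes, and also uses a
back-reference inside the search pattern to collapse a doubled word.
The sample sentence and the variable names are made up for
illustration, and @code{\b} and @code{\w} are GNU extensions:

```shell
# Basic syntax: groups are written \( ... \).
bre=$(echo "James Bond" | sed 's/\(.*\) \(.*\)/The name is \2, \1 \2./')
# Extended syntax (-E): groups are written ( ... ).
ere=$(echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./')
printf '%s\n%s\n' "$bre" "$ere"

# \1 in the pattern must match the same text as the first group, so
# this finds a word that is immediately repeated and keeps one copy.
dedup=$(echo "this is is a test" | sed -E 's/\b(\w+) \1\b/\1/')
printf '%s\n' "$dedup"
```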
|  | 3267 |  | 
|---|
|  | 3268 |  | 
|---|
|  | 3269 | When used with alternation, if the group does not participate in the | 
|---|
|  | 3270 | match then the back-reference makes the whole match fail.  For | 
|---|
|  | 3271 | example, @samp{a(.)|b\1} will not match @samp{ba}.  When multiple | 
|---|
|  | 3272 | regular expressions are given with @option{-e} or from a file | 
|---|
|  | 3273 | (@samp{-f @var{file}}), back-references are local to each expression. | 
|---|
|  | 3274 |  | 
|---|
|  | 3275 |  | 
|---|
| [599] | 3276 | @node Escapes | 
|---|
| [3613] | 3277 | @section Escape Sequences - specifying special characters | 
|---|
| [599] | 3278 |  | 
|---|
| [3613] | 3279 | @cindex GNU extensions, special escapes | 
|---|
| [599] | 3280 | Until this chapter, we have only encountered escapes of the form | 
|---|
|  | 3281 | @samp{\^}, which tell @command{sed} not to interpret the circumflex | 
|---|
|  | 3282 | as a special character, but rather to take it literally.  For | 
|---|
|  | 3283 | example, @samp{\*} matches a single asterisk rather than zero | 
|---|
|  | 3284 | or more backslashes. | 
|---|
|  | 3285 |  | 
|---|
|  | 3286 | @cindex @code{POSIXLY_CORRECT} behavior, escapes | 
|---|
|  | 3287 | This chapter introduces another kind of escape@footnote{All | 
|---|
| [3613] | 3288 | the escapes introduced here are GNU | 
|---|
| [599] | 3289 | extensions, with the exception of @code{\n}.  In basic regular | 
|---|
|  | 3290 | expression mode, setting @code{POSIXLY_CORRECT} disables them inside | 
|---|
|  | 3291 | bracket expressions.}---that | 
|---|
|  | 3292 | is, escapes that are applied to a character or sequence of characters | 
|---|
|  | 3293 | that ordinarily are taken literally, and that @command{sed} replaces | 
|---|
|  | 3294 | with a special character.  This provides a way | 
|---|
|  | 3295 | of encoding non-printable characters in patterns in a visible manner. | 
|---|
|  | 3296 | There is no restriction on the appearance of non-printing characters | 
|---|
|  | 3297 | in a @command{sed} script but when a script is being prepared in the | 
|---|
|  | 3298 | shell or by text editing, it is usually easier to use one of | 
|---|
|  | 3299 | the following escape sequences than the binary character it | 
|---|
|  | 3300 | represents. | 
|---|
|  | 3301 |  | 
|---|
|  | 3302 | The list of these escapes is: | 
|---|
|  | 3303 |  | 
|---|
|  | 3304 | @table @code | 
|---|
|  | 3305 | @item \a | 
|---|
|  | 3306 | Produces or matches a @sc{bel} character, that is an ``alert'' (@sc{ascii} 7). | 
|---|
|  | 3307 |  | 
|---|
|  | 3308 | @item \f | 
|---|
|  | 3309 | Produces or matches a form feed (@sc{ascii} 12). | 
|---|
|  | 3310 |  | 
|---|
|  | 3311 | @item \n | 
|---|
|  | 3312 | Produces or matches a newline (@sc{ascii} 10). | 
|---|
|  | 3313 |  | 
|---|
|  | 3314 | @item \r | 
|---|
|  | 3315 | Produces or matches a carriage return (@sc{ascii} 13). | 
|---|
|  | 3316 |  | 
|---|
|  | 3317 | @item \t | 
|---|
|  | 3318 | Produces or matches a horizontal tab (@sc{ascii} 9). | 
|---|
|  | 3319 |  | 
|---|
|  | 3320 | @item \v | 
|---|
|  | 3321 | Produces or matches a so called ``vertical tab'' (@sc{ascii} 11). | 
|---|
|  | 3322 |  | 
|---|
|  | 3323 | @item \c@var{x} | 
|---|
|  | 3324 | Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is | 
|---|
|  | 3325 | any character.  The precise effect of @samp{\c@var{x}} is as follows: | 
|---|
|  | 3326 | if @var{x} is a lower case letter, it is converted to upper case. | 
|---|
|  | 3327 | Then bit 6 of the character (hex 40) is inverted.  Thus @samp{\cz} becomes | 
|---|
|  | 3328 | hex 1A, but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B. | 
|---|
|  | 3329 |  | 
|---|
|  | 3330 | @item \d@var{xxx} | 
|---|
|  | 3331 | Produces or matches a character whose decimal @sc{ascii} value is @var{xxx}. | 
|---|
|  | 3332 |  | 
|---|
|  | 3333 | @item \o@var{xxx} | 
|---|
|  | 3334 | Produces or matches a character whose octal @sc{ascii} value is @var{xxx}. | 
|---|
|  | 3335 |  | 
|---|
|  | 3336 | @item \x@var{xx} | 
|---|
|  | 3337 | Produces or matches a character whose hexadecimal @sc{ascii} value is @var{xx}. | 
|---|
|  | 3338 | @end table | 
|---|
|  | 3339 |  | 
|---|
|  | 3340 | @samp{\b} (backspace) was omitted because of the conflict with | 
|---|
|  | 3341 | the existing ``word boundary'' meaning. | 
|---|
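A short sketch of a few of these escapes in use, assuming GNU
@command{sed}.  The letter @samp{b} (@sc{ascii} 98) is chosen because
it has no special meaning in a regular expression; the variable names
are illustrative only:

```shell
# \t in the regexp matches a horizontal tab.
tab_in=$(printf 'a\tb\n' | sed 's/\t/<TAB>/')
printf '%s\n' "$tab_in"

# \t in the replacement produces a horizontal tab.
tab_out=$(printf 'a b\n' | sed 's/ /\t/')
printf '%s\n' "$tab_out"

# The same character written in hexadecimal, octal and decimal.
hex=$(echo abc | sed 's/\x62/X/')
oct=$(echo abc | sed 's/\o142/X/')
dec=$(echo abc | sed 's/\d098/X/')
printf '%s %s %s\n' "$hex" "$oct" "$dec"
```

All three numeric forms should select the same character, so the last
line is expected to show the same result three times.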
|  | 3342 |  | 
|---|
| [3613] | 3343 | @subsection Escaping Precedence | 
|---|
| [599] | 3344 |  | 
|---|
| [3613] | 3345 | @value{SSED} processes escape sequences @emph{before} passing | 
|---|
|  | 3346 | the text on to the regular-expression matching of the @command{s///} command | 
|---|
|  | 3347 | and address matching. Thus the following two commands are equivalent | 
|---|
|  | 3348 | (@samp{0x5e} is the hexadecimal @sc{ascii} value of the character @samp{^}): | 
|---|
|  | 3349 |  | 
|---|
|  | 3350 | @codequoteundirected on | 
|---|
|  | 3351 | @codequotebacktick on | 
|---|
|  | 3352 | @example | 
|---|
|  | 3353 | @group | 
|---|
|  | 3354 | $ echo 'a^c' | sed 's/^/b/' | 
|---|
|  | 3355 | ba^c | 
|---|
|  | 3356 |  | 
|---|
|  | 3357 | $ echo 'a^c' | sed 's/\x5e/b/' | 
|---|
|  | 3358 | ba^c | 
|---|
|  | 3359 | @end group | 
|---|
|  | 3360 | @end example | 
|---|
|  | 3361 | @codequoteundirected off | 
|---|
|  | 3362 | @codequotebacktick off | 
|---|
|  | 3363 |  | 
|---|
|  | 3364 | As are the following (@samp{0x5b},@samp{0x5d} are the hexadecimal | 
|---|
|  | 3365 | @sc{ascii} values of @samp{[},@samp{]}, respectively): | 
|---|
|  | 3366 |  | 
|---|
|  | 3367 | @codequoteundirected on | 
|---|
|  | 3368 | @codequotebacktick on | 
|---|
|  | 3369 | @example | 
|---|
|  | 3370 | @group | 
|---|
|  | 3371 | $ echo abc | sed 's/[a]/x/' | 
|---|
|  | 3372 | xbc | 
|---|
|  | 3373 | $ echo abc | sed 's/\x5ba\x5d/x/' | 
|---|
|  | 3374 | xbc | 
|---|
|  | 3375 | @end group | 
|---|
|  | 3376 | @end example | 
|---|
|  | 3377 | @codequoteundirected off | 
|---|
|  | 3378 | @codequotebacktick off | 
|---|
|  | 3379 |  | 
|---|
|  | 3380 | However, it is recommended to avoid such special characters | 
|---|
|  | 3381 | due to unexpected edge-cases. For example, the following | 
|---|
|  | 3382 | are not equivalent: | 
|---|
|  | 3383 |  | 
|---|
|  | 3384 | @codequoteundirected on | 
|---|
|  | 3385 | @codequotebacktick on | 
|---|
|  | 3386 | @example | 
|---|
|  | 3387 | @group | 
|---|
|  | 3388 | $ echo 'a^c' | sed 's/\^/b/' | 
|---|
|  | 3389 | abc | 
|---|
|  | 3390 |  | 
|---|
|  | 3391 | $ echo 'a^c' | sed 's/\\\x5e/b/' | 
|---|
|  | 3392 | a^c | 
|---|
|  | 3393 | @end group | 
|---|
|  | 3394 | @end example | 
|---|
|  | 3395 | @codequoteundirected off | 
|---|
|  | 3396 | @codequotebacktick off | 
|---|
|  | 3397 |  | 
|---|
|  | 3398 | @c also: this fails in different places: | 
|---|
|  | 3399 | @c   $ sed 's/[//' | 
|---|
|  | 3400 | @c   sed: -e expression #1, char 5: unterminated `s' command | 
|---|
|  | 3401 | @c   $ sed 's/\x5b//' | 
|---|
|  | 3402 | @c   sed: -e expression #1, char 8: Invalid regular expression | 
|---|
|  | 3403 | @c | 
|---|
|  | 3404 | @c which is OK but confusing to explain why (the first | 
|---|
|  | 3405 | @c fails in compile.c:snarf_char_class while the second | 
|---|
|  | 3406 | @c is passed to the regex engine and then fails). | 
|---|
|  | 3407 |  | 
|---|
|  | 3408 |  | 
|---|
|  | 3409 | @node Locale Considerations | 
|---|
|  | 3410 | @section Multibyte characters and Locale Considerations | 
|---|
|  | 3411 |  | 
|---|
|  | 3412 | @value{SSED} processes valid multibyte characters in multibyte locales | 
|---|
|  | 3413 | (e.g. @code{UTF-8}).@footnote{Some regexp edge cases depend on the | 
|---|
|  | 3414 | operating system and libc implementation. The examples shown are known | 
|---|
|  | 3415 | to work as expected on GNU/Linux systems using glibc.} | 
|---|
|  | 3416 |  | 
|---|
|  | 3417 | @noindent The following example uses the Greek letter Capital Sigma | 
|---|
|  | 3418 | (@value{ucsigma}, | 
|---|
|  | 3419 | Unicode code point @code{0x03A3}). In a @code{UTF-8} locale, | 
|---|
|  | 3420 | @command{sed} correctly processes the Sigma as one character despite | 
|---|
|  | 3421 | it being 2 octets (bytes): | 
|---|
|  | 3422 |  | 
|---|
|  | 3423 | @codequoteundirected on | 
|---|
|  | 3424 | @codequotebacktick on | 
|---|
|  | 3425 | @example | 
|---|
|  | 3426 | @group | 
|---|
|  | 3427 | $ locale | grep LANG | 
|---|
|  | 3428 | LANG=en_US.UTF-8 | 
|---|
|  | 3429 |  | 
|---|
|  | 3430 | $ printf 'a\u03A3b' | 
|---|
|  | 3431 | a@value{ucsigma}b | 
|---|
|  | 3432 |  | 
|---|
|  | 3433 | $ printf 'a\u03A3b' | sed 's/./X/g' | 
|---|
|  | 3434 | XXX | 
|---|
|  | 3435 |  | 
|---|
|  | 3436 | $ printf 'a\u03A3b' | od -tx1 -An | 
|---|
|  | 3437 | 61 ce a3 62 | 
|---|
|  | 3438 | @end group | 
|---|
|  | 3439 | @end example | 
|---|
|  | 3440 | @codequoteundirected off | 
|---|
|  | 3441 | @codequotebacktick off | 
|---|
|  | 3442 |  | 
|---|
|  | 3443 | @noindent | 
|---|
|  | 3444 | To force @command{sed} to process octets separately, use the @code{C} locale | 
|---|
|  | 3445 | (also known as the @code{POSIX} locale): | 
|---|
|  | 3446 |  | 
|---|
|  | 3447 | @codequoteundirected on | 
|---|
|  | 3448 | @codequotebacktick on | 
|---|
|  | 3449 | @example | 
|---|
|  | 3450 | $ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g' | 
|---|
|  | 3451 | XXXX | 
|---|
|  | 3452 | @end example | 
|---|
|  | 3453 | @codequoteundirected off | 
|---|
|  | 3454 | @codequotebacktick off | 
|---|
|  | 3455 |  | 
|---|
|  | 3456 | @subsection Invalid multibyte characters | 
|---|
|  | 3457 |  | 
|---|
|  | 3458 | @command{sed}'s regular expressions @emph{do not} match | 
|---|
|  | 3459 | invalid multibyte sequences in a multibyte locale. | 
|---|
|  | 3460 |  | 
|---|
|  | 3461 | @noindent | 
|---|
|  | 3462 | In the following examples, the octet @code{0xCE} is | 
|---|
|  | 3463 | an incomplete multibyte character (shown here as @value{unicodeFFFD}). | 
|---|
|  | 3464 | The regular expression @samp{.} does not match it: | 
|---|
|  | 3465 |  | 
|---|
|  | 3466 | @codequoteundirected on | 
|---|
|  | 3467 | @codequotebacktick on | 
|---|
|  | 3468 | @example | 
|---|
|  | 3469 | @group | 
|---|
|  | 3470 | $ printf 'a\xCEb\n' | 
|---|
|  | 3471 | a@value{unicodeFFFD}b | 
|---|
|  | 3472 |  | 
|---|
|  | 3473 | $ printf 'a\xCEb\n' | sed 's/./X/g' | 
|---|
|  | 3474 | X@value{unicodeFFFD}X | 
|---|
|  | 3475 |  | 
|---|
|  | 3476 | $ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An | 
|---|
|  | 3477 | 58  ce  58  0a | 
|---|
|  | 3478 | X      X   \n | 
|---|
|  | 3479 | @end group | 
|---|
|  | 3480 | @end example | 
|---|
|  | 3481 | @codequoteundirected off | 
|---|
|  | 3482 | @codequotebacktick off | 
|---|
|  | 3483 |  | 
|---|
|  | 3484 | @noindent Similarly, the ``catch-all'' regular expression @samp{.*} does not | 
|---|
|  | 3485 | match the entire line: | 
|---|
|  | 3486 |  | 
|---|
|  | 3487 | @codequoteundirected on | 
|---|
|  | 3488 | @codequotebacktick on | 
|---|
|  | 3489 | @example | 
|---|
|  | 3490 | @group | 
|---|
|  | 3491 | $ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An | 
|---|
|  | 3492 | ce  63  0a | 
|---|
|  | 3493 | c  \n | 
|---|
|  | 3494 | @end group | 
|---|
|  | 3495 | @end example | 
|---|
|  | 3496 | @codequoteundirected off | 
|---|
|  | 3497 | @codequotebacktick off | 
|---|
|  | 3498 |  | 
|---|
|  | 3499 | @noindent | 
|---|
|  | 3500 | @value{SSED} offers the special @command{z} command to clear the | 
|---|
|  | 3501 | current pattern space regardless of invalid multibyte characters | 
|---|
|  | 3502 | (i.e. it works like @code{s/.*//} but also removes invalid multibyte | 
|---|
|  | 3503 | characters): | 
|---|
|  | 3504 |  | 
|---|
|  | 3505 | @codequoteundirected on | 
|---|
|  | 3506 | @codequotebacktick on | 
|---|
|  | 3507 | @example | 
|---|
|  | 3508 | @group | 
|---|
|  | 3509 | $ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An | 
|---|
|  | 3510 | 0a | 
|---|
|  | 3511 | \n | 
|---|
|  | 3512 | @end group | 
|---|
|  | 3513 | @end example | 
|---|
|  | 3514 | @codequoteundirected off | 
|---|
|  | 3515 | @codequotebacktick off | 
|---|
|  | 3516 |  | 
|---|
|  | 3517 | @noindent Alternatively, force the @code{C} locale to process | 
|---|
|  | 3518 | each octet separately (every octet is a valid character in the @code{C} | 
|---|
|  | 3519 | locale): | 
|---|
|  | 3520 |  | 
|---|
|  | 3521 | @codequoteundirected on | 
|---|
|  | 3522 | @codequotebacktick on | 
|---|
|  | 3523 | @example | 
|---|
|  | 3524 | @group | 
|---|
|  | 3525 | $ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An | 
|---|
|  | 3526 | 0a | 
|---|
|  | 3527 | \n | 
|---|
|  | 3528 | @end group | 
|---|
|  | 3529 | @end example | 
|---|
|  | 3530 | @codequoteundirected off | 
|---|
|  | 3531 | @codequotebacktick off | 
|---|
|  | 3532 |  | 
|---|
|  | 3533 |  | 
|---|
|  | 3534 | @command{sed}'s inability to process invalid multibyte characters | 
|---|
|  | 3535 | can be used to detect such invalid sequences in a file. | 
|---|
|  | 3536 | In the following examples, the @code{\xCE\xCE} is an invalid | 
|---|
|  | 3537 | multibyte sequence, while @code{\xCE\xA3} is a valid multibyte sequence | 
|---|
|  | 3538 | (of the Greek Sigma character). | 
|---|
|  | 3539 |  | 
|---|
|  | 3540 | @noindent | 
|---|
|  | 3541 | The following @command{sed} program removes all valid | 
|---|
|  | 3542 | characters using @code{s/.//g}.  Any content left in the pattern space | 
|---|
|  | 3543 | (the invalid characters) is added to the hold space using the | 
|---|
|  | 3544 | @code{H} command. On the last line (@code{$}), the hold space is retrieved | 
|---|
|  | 3545 | (@code{x}), newlines are removed (@code{s/\n//g}), and any remaining | 
|---|
|  | 3546 | octets are printed unambiguously (@code{l}).  Thus, any invalid | 
|---|
|  | 3547 | multibyte sequences are printed as octal values: | 
|---|
|  | 3548 |  | 
|---|
|  | 3549 | @codequoteundirected on | 
|---|
|  | 3550 | @codequotebacktick on | 
|---|
|  | 3551 | @example | 
|---|
|  | 3552 | @group | 
|---|
|  | 3553 | $ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt | 
|---|
|  | 3554 |  | 
|---|
|  | 3555 | $ cat invalid.txt | 
|---|
|  | 3556 | ab | 
|---|
|  | 3557 | c | 
|---|
|  | 3558 | @value{unicodeFFFD}@value{unicodeFFFD}de | 
|---|
|  | 3559 | @value{ucsigma}f | 
|---|
|  | 3560 |  | 
|---|
|  | 3561 | $ sed -n 's/.//g ; H ; $@{x;s/\n//g;l@}' invalid.txt | 
|---|
|  | 3562 | \316\316$ | 
|---|
|  | 3563 | @end group | 
|---|
|  | 3564 | @end example | 
|---|
|  | 3565 | @codequoteundirected off | 
|---|
|  | 3566 | @codequotebacktick off | 
|---|
|  | 3567 |  | 
|---|
|  | 3568 | @noindent With a few more commands, @command{sed} can print | 
|---|
|  | 3569 | the exact line number corresponding to each invalid character (line 3). | 
|---|
|  | 3570 | These characters can then be removed by forcing the @code{C} locale | 
|---|
|  | 3571 | and using octal escape sequences: | 
|---|
|  | 3572 |  | 
|---|
|  | 3573 | @codequoteundirected on | 
|---|
|  | 3574 | @codequotebacktick on | 
|---|
|  | 3575 | @example | 
|---|
|  | 3576 | $ sed -n 's/.//g;=;l' invalid.txt | paste - -  | awk '$2!="$"' | 
|---|
|  | 3577 | 3       \316\316$ | 
|---|
|  | 3578 |  | 
|---|
|  | 3579 | $ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt | 
|---|
|  | 3580 | @end example | 
|---|
|  | 3581 | @codequoteundirected off | 
|---|
|  | 3582 | @codequotebacktick off | 
|---|
|  | 3583 |  | 
|---|
|  | 3584 | @subsection Upper/Lower case conversion | 
|---|
|  | 3585 |  | 
|---|
|  | 3586 |  | 
|---|
|  | 3587 | @value{SSED}'s substitute command (@code{s}) supports upper/lower | 
|---|
|  | 3588 | case conversions using @code{\U},@code{\L} codes. | 
|---|
|  | 3589 | These conversions support multibyte characters: | 
|---|
|  | 3590 |  | 
|---|
|  | 3591 | @codequoteundirected on | 
|---|
|  | 3592 | @codequotebacktick on | 
|---|
|  | 3593 | @example | 
|---|
|  | 3594 | $ printf 'ABC\u03a3\n' | 
|---|
|  | 3595 | ABC@value{ucsigma} | 
|---|
|  | 3596 |  | 
|---|
|  | 3597 | $ printf 'ABC\u03a3\n' | sed 's/.*/\L&/' | 
|---|
|  | 3598 | abc@value{lcsigma} | 
|---|
|  | 3599 | @end example | 
|---|
|  | 3600 | @codequoteundirected off | 
|---|
|  | 3601 | @codequotebacktick off | 
|---|
|  | 3602 |  | 
|---|
|  | 3603 | @noindent | 
|---|
|  | 3604 | @xref{The "s" Command}. | 
|---|
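As a further sketch, the conversions can be applied to individual
subexpressions.  The example assumes GNU @command{sed} and uses
@code{\E}, which ends a conversion started by @code{\U} or @code{\L}
and is described with the @command{s} command rather than in this
section.  The input is plain @sc{ascii}, so the result should not
depend on the locale; the variable name is illustrative:

```shell
# \U upper-cases the first word, \E stops that conversion so the
# space is copied unchanged, and \L lower-cases the second word.
mixed=$(echo 'Hello World' | sed -E 's/(\w+) (\w+)/\U\1\E \L\2/')
printf '%s\n' "$mixed"
```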
|  | 3605 |  | 
|---|
|  | 3606 |  | 
|---|
|  | 3607 | @subsection Multibyte regexp character classes | 
|---|
|  | 3608 |  | 
|---|
|  | 3609 | @c TODO: fix following paragraphs (copied verbatim from 'bracket | 
|---|
|  | 3610 | @c expression' section). | 
|---|
|  | 3611 |  | 
|---|
|  | 3612 | In other locales, the sorting sequence is not specified, and | 
|---|
|  | 3613 | @samp{[a-d]} might be equivalent to @samp{[abcd]} or to | 
|---|
|  | 3614 | @samp{[aBbCcDd]}, or it might fail to match any character, or the set of | 
|---|
|  | 3615 | characters that it matches might even be erratic. | 
|---|
|  | 3616 | To obtain the traditional interpretation | 
|---|
|  | 3617 | of bracket expressions, you can use the @samp{C} locale by setting the | 
|---|
|  | 3618 | @env{LC_ALL} environment variable to the value @samp{C}. | 
|---|
|  | 3619 |  | 
|---|
|  | 3620 | @example | 
|---|
|  | 3621 | # TODO: is there any real-world system/locale where 'A' | 
|---|
|  | 3622 | #       is replaced by '-' ? | 
|---|
|  | 3623 | $ echo A | sed 's/[a-z]/-/' | 
|---|
|  | 3624 | A | 
|---|
|  | 3625 | @end example | 
|---|
|  | 3626 |  | 
|---|
|  | 3627 | Their interpretation depends on the @env{LC_CTYPE} locale; | 
|---|
|  | 3628 | for example, @samp{[[:alnum:]]} means the character class of numbers and letters | 
|---|
|  | 3629 | in the current locale. | 
|---|
|  | 3630 |  | 
|---|
|  | 3631 | TODO: show example of collation | 
|---|
|  | 3632 |  | 
|---|
|  | 3633 | @codequoteundirected on | 
|---|
|  | 3634 | @codequotebacktick on | 
|---|
|  | 3635 | @example | 
|---|
|  | 3636 | # TODO: this works on glibc systems, not on musl-libc/freebsd/macosx. | 
|---|
|  | 3637 | $ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g' | 
|---|
|  | 3638 | clichX | 
|---|
|  | 3639 | @end example | 
|---|
|  | 3640 | @codequoteundirected off | 
|---|
|  | 3641 | @codequotebacktick off | 
|---|
|  | 3642 |  | 
|---|
|  | 3643 |  | 
|---|
|  | 3644 | @node advanced sed | 
|---|
|  | 3645 | @chapter Advanced @command{sed}: cycles and buffers | 
|---|
|  | 3646 |  | 
|---|
|  | 3647 | @menu | 
|---|
|  | 3648 | * Execution Cycle::          How @command{sed} works | 
|---|
|  | 3649 | * Hold and Pattern Buffers:: | 
|---|
|  | 3650 | * Multiline techniques::     Using D,G,H,N,P to process multiple lines | 
|---|
|  | 3651 | * Branching and flow control:: | 
|---|
|  | 3652 | @end menu | 
|---|
|  | 3653 |  | 
|---|
|  | 3654 | @node Execution Cycle | 
|---|
|  | 3655 | @section How @command{sed} Works | 
|---|
|  | 3656 |  | 
|---|
|  | 3657 | @cindex Buffer spaces, pattern and hold | 
|---|
|  | 3658 | @cindex Spaces, pattern and hold | 
|---|
|  | 3659 | @cindex Pattern space, definition | 
|---|
|  | 3660 | @cindex Hold space, definition | 
|---|
|  | 3661 | @command{sed} maintains two data buffers: the active @emph{pattern} space, | 
|---|
|  | 3662 | and the auxiliary @emph{hold} space. Both are initially empty. | 
|---|
|  | 3663 |  | 
|---|
|  | 3664 | @command{sed} operates by performing the following cycle on each | 
|---|
|  | 3665 | line of input: first, @command{sed} reads one line from the input | 
|---|
|  | 3666 | stream, removes any trailing newline, and places it in the pattern space. | 
|---|
|  | 3667 | Then commands are executed; each command can have an address associated | 
|---|
|  | 3668 | with it: addresses are a kind of condition code, and a command is only | 
|---|
|  | 3669 | executed if the condition is verified before the command is to be | 
|---|
|  | 3670 | executed. | 
|---|
|  | 3671 |  | 
|---|
|  | 3672 | When the end of the script is reached, unless the @option{-n} option | 
|---|
|  | 3673 | is in use, the contents of pattern space are printed out to the output | 
|---|
|  | 3674 | stream, adding back the trailing newline if it was removed.@footnote{Actually, | 
|---|
|  | 3675 | if @command{sed} prints a line without the terminating newline, it will | 
|---|
|  | 3676 | nevertheless print the missing newline as soon as more text is sent to | 
|---|
|  | 3677 | the same output stream, which gives the ``least expected surprise'' | 
|---|
|  | 3678 | even though it does not make commands like @samp{sed -n p} exactly | 
|---|
|  | 3679 | identical to @command{cat}.} Then the next cycle starts for the next | 
|---|
|  | 3680 | input line. | 
|---|
|  | 3681 |  | 
|---|
|  | 3682 | Unless special commands (like @samp{D}) are used, the pattern space is | 
|---|
|  | 3683 | deleted between two cycles. The hold space, on the other hand, keeps | 
|---|
|  | 3684 | its data between cycles (see commands @samp{h}, @samp{H}, @samp{x}, | 
|---|
|  | 3685 | @samp{g}, @samp{G} to move data between both buffers). | 
|---|
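The difference between the two buffers can be seen in a small sketch
that prints its input in reverse line order.  It relies only on the
behavior described above: the pattern space is replaced by each new
input line, while the hold space keeps whatever was stored in it
during earlier cycles.  The variable name is illustrative:

```shell
# 1!G  on every line but the first, append the hold space (the lines
#      seen so far, newest first) to the current line;
# h    copy the result to the hold space, where it survives the cycle;
# $p   on the last line, print the accumulated pattern space.
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```

Without the @code{h} command nothing would carry over from one cycle
to the next, since the pattern space is overwritten when the next line
is read.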
|  | 3686 |  | 
|---|
|  | 3687 | @node Hold and Pattern Buffers | 
|---|
|  | 3688 | @section Hold and Pattern Buffers | 
|---|
|  | 3689 |  | 
|---|
|  | 3690 | TODO | 
|---|
|  | 3691 |  | 
|---|
|  | 3692 | @node Multiline techniques | 
|---|
|  | 3693 | @section Multiline techniques - using D,G,H,N,P to process multiple lines | 
|---|
|  | 3694 |  | 
|---|
|  | 3695 | Multiple lines can be processed as one buffer using the | 
|---|
|  | 3696 | @code{D},@code{G},@code{H},@code{N},@code{P} commands. They are similar to | 
|---|
|  | 3697 | their lowercase counterparts (@code{d},@code{g}, | 
|---|
|  | 3698 | @code{h},@code{n},@code{p}), except that these commands append or | 
|---|
|  | 3699 | subtract data while respecting embedded newlines - allowing adding and | 
|---|
|  | 3700 | removing lines from the pattern and hold spaces. | 
|---|
|  | 3701 |  | 
|---|
|  | 3702 | They operate as follows: | 
|---|
| [599] | 3703 | @table @code | 
|---|
| [3613] | 3704 | @item D | 
|---|
|  | 3705 | @emph{deletes} the text in the pattern space up to the first newline, | 
|---|
|  | 3706 | and restarts the cycle. | 
|---|
| [599] | 3707 |  | 
|---|
| [3613] | 3708 | @item G | 
|---|
|  | 3709 | @emph{appends} line from the hold space to the pattern space, with a | 
|---|
|  | 3710 | newline before it. | 
|---|
| [599] | 3711 |  | 
|---|
| [3613] | 3712 | @item H | 
|---|
|  | 3713 | @emph{appends} line from the pattern space to the hold space, with a | 
|---|
|  | 3714 | newline before it. | 
|---|
| [599] | 3715 |  | 
|---|
| [3613] | 3716 | @item N | 
|---|
|  | 3717 | @emph{appends} a newline and the next line of input to the pattern space. | 
|---|
| [599] | 3718 |  | 
|---|
| [3613] | 3719 | @item P | 
|---|
|  | 3720 | @emph{prints} line from the pattern space until the first newline. | 
|---|
| [599] | 3721 |  | 
|---|
| [3613] | 3722 | @end table | 
|---|
| [599] | 3723 |  | 
|---|
| [3613] | 3724 |  | 
|---|
|  | 3725 | The following example illustrates the operation of @code{N} and | 
|---|
|  | 3726 | @code{D} commands: | 
|---|
|  | 3727 |  | 
|---|
|  | 3728 | @codequoteundirected on | 
|---|
|  | 3729 | @codequotebacktick on | 
|---|
|  | 3730 | @example | 
|---|
|  | 3731 | @group | 
|---|
|  | 3732 | $ seq 6 | sed -n 'N;l;D' | 
|---|
|  | 3733 | 1\n2$ | 
|---|
|  | 3734 | 2\n3$ | 
|---|
|  | 3735 | 3\n4$ | 
|---|
|  | 3736 | 4\n5$ | 
|---|
|  | 3737 | 5\n6$ | 
|---|
|  | 3738 | @end group | 
|---|
|  | 3739 | @end example | 
|---|
|  | 3740 | @codequoteundirected off | 
|---|
|  | 3741 | @codequotebacktick off | 
|---|
|  | 3742 |  | 
|---|
|  | 3743 | @enumerate | 
|---|
|  | 3744 | @item | 
|---|
|  | 3745 | @command{sed} starts by reading the first line into the pattern space | 
|---|
|  | 3746 | (i.e. @samp{1}). | 
|---|
|  | 3747 | @item | 
|---|
|  | 3748 | At the beginning of every cycle, the @code{N} | 
|---|
|  | 3749 | command appends a newline and the next line to the pattern space | 
|---|
|  | 3750 | (i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle). | 
|---|
|  | 3751 | @item | 
|---|
|  | 3752 | The @code{l} command prints the content of the pattern space | 
|---|
|  | 3753 | unambiguously. | 
|---|
|  | 3754 | @item | 
|---|
|  | 3755 | The @code{D} command then removes the content of pattern | 
|---|
|  | 3756 | space up to the first newline (leaving @samp{2} at the end of | 
|---|
|  | 3757 | the first cycle). | 
|---|
|  | 3758 | @item | 
|---|
|  | 3759 | At the next cycle the @code{N} command appends a | 
|---|
|  | 3760 | newline and the next input line to the pattern space | 
|---|
|  | 3761 | (e.g. @samp{2}, @samp{\n}, @samp{3}). | 
|---|
|  | 3762 | @end enumerate | 
|---|
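Adding @code{P} to the same @code{N} @dots{} @code{D} loop turns it
into a sliding two-line window over the input.  The sketch below is a
common idiom rather than anything specific to this manual: it compares
each line with the one that follows and prints only the first line of
every run of identical lines.  The input data and the variable name
are made up for illustration:

```shell
# $!N   append the next input line, except on the last line;
# /^\(.*\)\n\1$/!P
#       unless both lines in the window are identical, print the
#       first one (everything up to the newline);
# D     delete the first line of the window and restart the cycle
#       with the remaining line, without reading new input.
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"
```

No @option{-n} is needed here: @code{D} always ends the cycle without
the automatic print, so @code{P} is the only command that produces
output.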
|  | 3763 |  | 
|---|
|  | 3764 |  | 
|---|
|  | 3765 | @cindex processing paragraphs | 
|---|
|  | 3766 | @cindex paragraphs, processing | 
|---|
|  | 3767 | A common technique to process blocks of text such as paragraphs | 
|---|
|  | 3768 | (instead of line-by-line) is using the following construct: | 
|---|
|  | 3769 |  | 
|---|
|  | 3770 | @codequoteundirected on | 
|---|
|  | 3771 | @codequotebacktick on | 
|---|
|  | 3772 | @example | 
|---|
|  | 3773 | sed '/./@{H;$!d@} ; x ; s/REGEXP/REPLACEMENT/' | 
|---|
|  | 3774 | @end example | 
|---|
|  | 3775 | @codequoteundirected off | 
|---|
|  | 3776 | @codequotebacktick off | 
|---|
|  | 3777 |  | 
|---|
|  | 3778 | @enumerate | 
|---|
|  | 3779 | @item | 
|---|
|  | 3780 | The first expression, @code{/./@{H;$!d@}} operates on all non-empty lines, | 
|---|
|  | 3781 | and adds the current line (in the pattern space) to the hold space. | 
|---|
|  | 3782 | On all lines except the last, the pattern space is deleted and the cycle is | 
|---|
|  | 3783 | restarted. | 
|---|
|  | 3784 |  | 
|---|
|  | 3785 | @item | 
|---|
|  | 3786 | The other expressions @code{x} and @code{s} are executed only on empty | 
|---|
|  | 3787 | lines (i.e. paragraph separators). The @code{x} command fetches the | 
|---|
|  | 3788 | accumulated lines from the hold space back to the pattern space. The | 
|---|
|  | 3789 | @code{s///} command then operates on all the text in the paragraph | 
|---|
|  | 3790 | (including the embedded newlines). | 
|---|
|  | 3791 | @end enumerate | 
|---|
|  | 3792 |  | 
|---|
|  | 3793 | The following example demonstrates this technique: | 
|---|
|  | 3794 | @codequoteundirected on | 
|---|
|  | 3795 | @codequotebacktick on | 
|---|
|  | 3796 | @example | 
|---|
|  | 3797 | @group | 
|---|
|  | 3798 | $ cat input.txt | 
|---|
|  | 3799 | a a a aa aaa | 
|---|
|  | 3800 | aaaa aaaa aa | 
|---|
|  | 3801 | aaaa aaa aaa | 
|---|
|  | 3802 |  | 
|---|
|  | 3803 | bbbb bbb bbb | 
|---|
|  | 3804 | bb bb bbb bb | 
|---|
|  | 3805 | bbbbbbbb bbb | 
|---|
|  | 3806 |  | 
|---|
|  | 3807 | ccc ccc cccc | 
|---|
|  | 3808 | cccc ccccc c | 
|---|
|  | 3809 | cc cc cc cc | 
|---|
|  | 3810 |  | 
|---|
|  | 3811 | $ sed '/./@{H;$!d@} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt | 
|---|
|  | 3812 |  | 
|---|
|  | 3813 | START--> | 
|---|
|  | 3814 | a a a aa aaa | 
|---|
|  | 3815 | aaaa aaaa aa | 
|---|
|  | 3816 | aaaa aaa aaa | 
|---|
|  | 3817 | <--END | 
|---|
|  | 3818 |  | 
|---|
|  | 3819 | START--> | 
|---|
|  | 3820 | bbbb bbb bbb | 
|---|
|  | 3821 | bb bb bbb bb | 
|---|
|  | 3822 | bbbbbbbb bbb | 
|---|
|  | 3823 | <--END | 
|---|
|  | 3824 |  | 
|---|
|  | 3825 | START--> | 
|---|
|  | 3826 | ccc ccc cccc | 
|---|
|  | 3827 | cccc ccccc c | 
|---|
|  | 3828 | cc cc cc cc | 
|---|
|  | 3829 | <--END | 
|---|
|  | 3830 | @end group | 
|---|
|  | 3831 | @end example | 
|---|
|  | 3832 | @codequoteundirected off | 
|---|
|  | 3833 | @codequotebacktick off | 
|---|
|  | 3834 |  | 
|---|
|  | 3835 | For more annotated examples, @pxref{Text search across multiple lines} | 
|---|
|  | 3836 | and @ref{Line length adjustment}. | 
|---|
|  | 3837 |  | 
|---|
|  | 3838 | @node Branching and flow control | 
|---|
|  | 3839 | @section Branching and Flow Control | 
|---|
|  | 3840 |  | 
|---|
|  | 3841 | The branching commands @code{b}, @code{t}, and @code{T} enable | 
|---|
|  | 3842 | changing the flow of @command{sed} programs. | 
|---|
|  | 3843 |  | 
|---|
|  | 3844 | By default, @command{sed} reads an input line into the pattern buffer, | 
|---|
|  | 3845 | then continues to process all commands in order. | 
|---|
|  | 3846 | Commands without addresses affect all lines. | 
|---|
|  | 3847 | Commands with addresses affect only matching lines. | 
|---|
|  | 3848 | @xref{Execution Cycle} and @ref{Addresses overview}. | 
|---|
|  | 3849 |  | 
|---|
|  | 3850 | @command{sed} does not support a typical @code{if/then} construct. | 
|---|
|  | 3851 | Instead, some commands can be used as conditionals or to change the | 
|---|
|  | 3852 | default flow control: | 
|---|
|  | 3853 |  | 
|---|
|  | 3854 | @table @code | 
|---|
|  | 3855 |  | 
|---|
|  | 3856 | @item d | 
|---|
|  | 3857 | delete (clear) the current pattern space, | 
|---|
|  | 3858 | and restart the program cycle without processing the rest of the commands | 
|---|
|  | 3859 | and without printing the pattern space. | 
|---|
|  | 3860 |  | 
|---|
|  | 3861 | @item D | 
|---|
|  | 3862 | delete the contents of the pattern space @emph{up to the first newline}, | 
|---|
|  | 3863 | and restart the program cycle without processing the rest of | 
|---|
|  | 3864 | the commands and without printing the pattern space. | 
|---|
|  | 3865 |  | 
|---|
|  | 3866 | @item [addr]X | 
|---|
|  | 3867 | @itemx [addr]@{ X ; X ; X @} | 
|---|
|  | 3868 | @itemx /regexp/X | 
|---|
|  | 3869 | @itemx /regexp/@{ X ; X ; X @} | 
|---|
|  | 3870 | Addresses and regular expressions can be used as an @code{if/then} | 
|---|
|  | 3871 | conditional: If @var{[addr]} matches the current pattern space, | 
|---|
|  | 3872 | execute the command(s). | 
|---|
|  | 3873 | For example: The command @code{/^#/d} means: | 
|---|
|  | 3874 | @emph{if} the current pattern space matches the regular expression @code{^#} (a line | 
|---|
|  | 3875 | starting with a hash), @emph{then} execute the @code{d} command: | 
|---|
|  | 3876 | delete the line without printing it, and restart the program cycle | 
|---|
|  | 3877 | immediately. | 
|---|
|  | 3878 |  | 
|---|
|  | 3879 | @item b | 
|---|
|  | 3880 | branch unconditionally (that is: always jump to a label, skipping | 
|---|
|  | 3881 | or repeating other commands, without restarting a new cycle). Combined | 
|---|
|  | 3882 | with an address, the branch can be conditionally executed on matched | 
|---|
|  | 3883 | lines. | 
|---|
|  | 3884 |  | 
|---|
|  | 3885 | @item t | 
|---|
|  | 3886 | branch conditionally (that is: jump to a label) @emph{only if} a | 
|---|
|  | 3887 | @code{s///} command has succeeded since the last input line was read | 
|---|
|  | 3888 | or another conditional branch was taken. | 
|---|
|  | 3889 |  | 
|---|
|  | 3890 | @item T | 
|---|
|  | 3891 | similar but opposite to the @code{t} command: branch only if | 
|---|
|  | 3892 | there have been @emph{no} successful substitutions since the last | 
|---|
|  | 3893 | input line was read. | 
|---|
| [599] | 3894 | @end table | 
|---|
|  | 3895 |  | 
|---|
| [3613] | 3896 |  | 
|---|
|  | 3897 | The following two @command{sed} programs are equivalent.  The first | 
|---|
|  | 3898 | (contrived) example uses the @code{b} command to skip the @code{s///} | 
|---|
|  | 3899 | command on lines containing @samp{1}.  The second example uses an | 
|---|
|  | 3900 | address with negation (@samp{!})  to perform substitution only on | 
|---|
|  | 3901 | desired lines.  The @code{y///} command is still executed on all | 
|---|
|  | 3902 | lines: | 
|---|
|  | 3903 |  | 
|---|
|  | 3904 | @codequoteundirected on | 
|---|
|  | 3905 | @codequotebacktick on | 
|---|
|  | 3906 | @example | 
|---|
|  | 3907 | @group | 
|---|
|  | 3908 | $ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/' | 
|---|
|  | 3909 | a4 | 
|---|
|  | 3910 | z5 | 
|---|
|  | 3911 | z6 | 
|---|
|  | 3912 |  | 
|---|
|  | 3913 | $ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/' | 
|---|
|  | 3914 | a4 | 
|---|
|  | 3915 | z5 | 
|---|
|  | 3916 | z6 | 
|---|
|  | 3917 | @end group | 
|---|
|  | 3918 | @end example | 
|---|
|  | 3919 | @codequoteundirected off | 
|---|
|  | 3920 | @codequotebacktick off | 
|---|
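The conditional branches @code{t} and @code{T} from the table above can be sketched in the same way. This is a minimal illustration rather than part of the original examples: the function names and the label @samp{end} are hypothetical, and @code{T} is a GNU extension, so the second function assumes @value{SSED}.

```shell
# Hypothetical helper names; both read lines on standard input.

# t: jump to the label "end" only if the s/// succeeded, so the marker
# is appended only to lines that were NOT changed.
mark_unchanged() {
  sed -e 's/a/z/' -e 'tend' -e 's/$/ (unchanged)/' -e ':end'
}

# T (a GNU extension): jump to "end" only if the s/// did NOT succeed,
# so the marker is appended only to lines that WERE changed.
mark_changed() {
  sed -e 's/a/z/' -e 'Tend' -e 's/$/ (changed)/' -e ':end'
}

printf '%s\n' a1 b2 a3 | mark_unchanged
printf '%s\n' a1 b2 a3 | mark_changed
```

In the first call only @samp{b2} receives the marker; in the second, only the two lines on which the substitution succeeded receive it.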
|  | 3921 |  | 
|---|
|  | 3922 |  | 
|---|
|  | 3923 |  | 
|---|
|  | 3924 | @subsection Branching and Cycles | 
|---|
|  | 3925 | @cindex labels | 
|---|
|  | 3926 | @cindex omitting labels | 
|---|
|  | 3927 | @cindex cycle, restarting | 
|---|
|  | 3928 | @cindex restarting a cycle | 
|---|
|  | 3929 | The @code{b}, @code{t} and @code{T} commands can be followed by a label | 
|---|
|  | 3930 | (typically a single letter). Labels are defined with a colon followed by | 
|---|
|  | 3931 | one or more letters (e.g. @samp{:x}). If the label is omitted the | 
|---|
|  | 3932 | branch commands restart the cycle.  Note the difference between | 
|---|
|  | 3933 | branching to a label and restarting the cycle: when a cycle is | 
|---|
|  | 3934 | restarted, @command{sed} first prints the current content of the | 
|---|
|  | 3935 | pattern space, then reads the next input line into the pattern space; | 
|---|
|  | 3936 | jumping to a label (even if it is at the beginning of the program) | 
|---|
|  | 3937 | does not print the pattern space and does not read the next input line. | 
|---|
|  | 3938 |  | 
|---|
|  | 3939 | The following program is a no-op. The @code{b} command (the only command | 
|---|
|  | 3940 | in the program) does not have a label, and thus simply restarts the cycle. | 
|---|
|  | 3941 | On each cycle, the pattern space is printed and the next input line is read: | 
|---|
|  | 3942 |  | 
|---|
|  | 3943 | @example | 
|---|
|  | 3944 | @group | 
|---|
|  | 3945 | $ seq 3 | sed b | 
|---|
|  | 3946 | 1 | 
|---|
|  | 3947 | 2 | 
|---|
|  | 3948 | 3 | 
|---|
|  | 3949 | @end group | 
|---|
|  | 3950 | @end example | 
|---|
|  | 3951 |  | 
|---|
|  | 3952 | @cindex infinite loop, branching | 
|---|
|  | 3953 | @cindex branching, infinite loop | 
|---|
|  | 3954 | The following example is an infinite loop: it doesn't terminate and | 
|---|
|  | 3955 | doesn't print anything. The @code{b} command jumps to the @samp{x} | 
|---|
|  | 3956 | label, and a new cycle is never started: | 
|---|
|  | 3957 |  | 
|---|
|  | 3958 | @codequoteundirected on | 
|---|
|  | 3959 | @codequotebacktick on | 
|---|
|  | 3960 | @example | 
|---|
|  | 3961 | @group | 
|---|
|  | 3962 | $ seq 3 | sed ':x ; bx' | 
|---|
|  | 3963 |  | 
|---|
|  | 3964 | # The above command requires GNU sed (which supports additional | 
|---|
|  | 3965 | # commands following a label, without a newline). A portable equivalent: | 
|---|
|  | 3966 | #     sed -e ':x' -e bx | 
|---|
|  | 3967 | @end group | 
|---|
|  | 3968 | @end example | 
|---|
|  | 3969 | @codequoteundirected off | 
|---|
|  | 3970 | @codequotebacktick off | 
|---|
|  | 3971 |  | 
|---|
|  | 3972 | @cindex branching and n, N | 
|---|
|  | 3973 | @cindex n, and branching | 
|---|
|  | 3974 | @cindex N, and branching | 
|---|
|  | 3975 | Branching is often complemented with the @code{n} or @code{N} commands: | 
|---|
|  | 3976 | both commands read the next input line into the pattern space without waiting | 
|---|
|  | 3977 | for the cycle to restart. Before reading the next input line, @code{n} | 
|---|
|  | 3978 | prints the current pattern space then empties it, while @code{N} | 
|---|
|  | 3979 | appends a newline and the next input line to the pattern space. | 
|---|
|  | 3980 |  | 
|---|
|  | 3981 | Consider the following two examples: | 
|---|
|  | 3982 |  | 
|---|
|  | 3983 | @codequoteundirected on | 
|---|
|  | 3984 | @codequotebacktick on | 
|---|
|  | 3985 | @example | 
|---|
|  | 3986 | @group | 
|---|
|  | 3987 | $ seq 3 | sed ':x ; n ; bx' | 
|---|
|  | 3988 | 1 | 
|---|
|  | 3989 | 2 | 
|---|
|  | 3990 | 3 | 
|---|
|  | 3991 |  | 
|---|
|  | 3992 | $ seq 3 | sed ':x ; N ; bx' | 
|---|
|  | 3993 | 1 | 
|---|
|  | 3994 | 2 | 
|---|
|  | 3995 | 3 | 
|---|
|  | 3996 | @end group | 
|---|
|  | 3997 | @end example | 
|---|
|  | 3998 | @codequoteundirected off | 
|---|
|  | 3999 | @codequotebacktick off | 
|---|
|  | 4000 |  | 
|---|
|  | 4001 | @itemize | 
|---|
|  | 4002 | @item | 
|---|
|  | 4003 | Neither example loops forever, despite never starting a new cycle. | 
|---|
|  | 4004 |  | 
|---|
|  | 4005 | @item | 
|---|
|  | 4006 | In the first example, the @code{n} command first prints the content | 
|---|
|  | 4007 | of the pattern space, empties the pattern space, then reads the next | 
|---|
|  | 4008 | input line. | 
|---|
|  | 4009 |  | 
|---|
|  | 4010 | @item | 
|---|
|  | 4011 | In the second example, the @code{N} command appends the next input | 
|---|
|  | 4012 | line to the pattern space (with a newline).  Lines are accumulated in | 
|---|
|  | 4013 | the pattern space until there are no more input lines to read, then | 
|---|
|  | 4014 | the @code{N} command terminates the @command{sed} program. When the | 
|---|
|  | 4015 | program terminates, the end-of-cycle actions are performed, and the | 
|---|
|  | 4016 | entire pattern space is printed. | 
|---|
|  | 4017 |  | 
|---|
|  | 4018 | @item | 
|---|
|  | 4019 | The second example requires @value{SSED}, | 
|---|
|  | 4020 | because it uses the non-POSIX-standard behavior of @code{N}. | 
|---|
|  | 4021 | See the ``@code{N} command on the last line'' paragraph | 
|---|
|  | 4022 | in @ref{Reporting Bugs}. | 
|---|
|  | 4023 |  | 
|---|
|  | 4024 | @item | 
|---|
|  | 4025 | To further examine the difference between the two examples, | 
|---|
|  | 4026 | try the following commands: | 
|---|
|  | 4027 | @codequoteundirected on | 
|---|
|  | 4028 | @codequotebacktick on | 
|---|
|  | 4029 | @example | 
|---|
|  | 4030 | @group | 
|---|
|  | 4031 | printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx' | 
|---|
|  | 4032 | printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx' | 
|---|
|  | 4033 | printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx' | 
|---|
|  | 4034 | printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx' | 
|---|
|  | 4035 | @end group | 
|---|
|  | 4036 | @end example | 
|---|
|  | 4037 | @codequoteundirected off | 
|---|
|  | 4038 | @codequotebacktick off | 
|---|
|  | 4039 |  | 
|---|
|  | 4040 | @end itemize | 
|---|
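A portable way to get the effect of the second example is to guard both @code{N} and the branch with the @samp{$!} address, so that @code{N} never runs on the last line. The following is a minimal sketch under that assumption; the function name @code{slurp} is hypothetical, and the final @code{s} command is added only to make the accumulated newlines visible.

```shell
# Accumulate all input lines in the pattern space without relying on
# the GNU behavior of N on the last line.
slurp() {
  # :x      loop label
  # $!N     on every line but the last, append the next input line
  # $!bx    keep looping until the last line has been read
  # s/\n/,/g  show the embedded newlines as commas
  sed -e ':x' -e '$!N' -e '$!bx' -e 's/\n/,/g'
}

printf '%s\n' 1 2 3 | slurp
```

Because the branch is skipped once the last line has been appended, the program falls through to the end of the script and the whole pattern space is printed once, as @samp{1,2,3}.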
|  | 4041 |  | 
|---|
|  | 4042 |  | 
|---|
|  | 4043 |  | 
|---|
|  | 4044 | @subsection Branching example: joining lines | 
|---|
|  | 4045 |  | 
|---|
|  | 4046 | @cindex joining lines with branching | 
|---|
|  | 4047 | @cindex branching, joining lines | 
|---|
|  | 4048 | @cindex quoted-printable lines, joining | 
|---|
|  | 4049 | @cindex joining quoted-printable lines | 
|---|
|  | 4050 | @cindex t, joining lines with | 
|---|
|  | 4051 | @cindex b, joining lines with | 
|---|
|  | 4052 | @cindex b, versus t | 
|---|
|  | 4053 | @cindex t, versus b | 
|---|
|  | 4054 | As a real-world example of using branching, consider the case of | 
|---|
|  | 4055 | @uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files, | 
|---|
|  | 4056 | typically used to encode email messages. | 
|---|
|  | 4057 | In these files long lines are split and marked with a @dfn{soft line break} | 
|---|
|  | 4058 | consisting of a single @samp{=} character at the end of the line: | 
|---|
|  | 4059 |  | 
|---|
|  | 4060 | @example | 
|---|
|  | 4061 | @group | 
|---|
|  | 4062 | $ cat jaques.txt | 
|---|
|  | 4063 | All the wor= | 
|---|
|  | 4064 | ld's a stag= | 
|---|
|  | 4065 | e, | 
|---|
|  | 4066 | And all the= | 
|---|
|  | 4067 | men and wo= | 
|---|
|  | 4068 | men merely = | 
|---|
|  | 4069 | players: | 
|---|
|  | 4070 | They have t= | 
|---|
|  | 4071 | heir exits = | 
|---|
|  | 4072 | and their e= | 
|---|
|  | 4073 | ntrances; | 
|---|
|  | 4074 | And one man= | 
|---|
|  | 4075 | in his tim= | 
|---|
|  | 4076 | e plays man= | 
|---|
|  | 4077 | y parts. | 
|---|
|  | 4078 | @end group | 
|---|
|  | 4079 | @end example | 
|---|
|  | 4080 |  | 
|---|
|  | 4081 |  | 
|---|
|  | 4082 | The following program uses an address match @samp{/=$/} as a | 
|---|
|  | 4083 | conditional: If the current pattern space ends with a @samp{=}, it | 
|---|
|  | 4084 | reads the next input line using @code{N}, removes all @samp{=} | 
|---|
|  | 4085 | characters that are followed by a newline (along with the newline), and unconditionally | 
|---|
|  | 4086 | branches (@code{b}) to the beginning of the program without restarting | 
|---|
|  | 4087 | a new cycle. If the pattern space does not end with @samp{=}, the | 
|---|
|  | 4088 | default action is performed: the pattern space is printed and a new | 
|---|
|  | 4089 | cycle is started: | 
|---|
|  | 4090 |  | 
|---|
|  | 4091 | @codequoteundirected on | 
|---|
|  | 4092 | @codequotebacktick on | 
|---|
|  | 4093 | @example | 
|---|
|  | 4094 | @group | 
|---|
|  | 4095 | $ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt | 
|---|
|  | 4096 | All the world's a stage, | 
|---|
|  | 4097 | And all the men and women merely players: | 
|---|
|  | 4098 | They have their exits and their entrances; | 
|---|
|  | 4099 | And one man in his time plays many parts. | 
|---|
|  | 4100 | @end group | 
|---|
|  | 4101 | @end example | 
|---|
|  | 4102 | @codequoteundirected off | 
|---|
|  | 4103 | @codequotebacktick off | 
|---|
|  | 4104 |  | 
|---|
|  | 4105 | Here's an alternative program with a slightly different approach: On | 
|---|
|  | 4106 | all lines except the last, @code{N} appends the next input line to the pattern | 
|---|
|  | 4107 | space.  A substitution command then removes soft line breaks | 
|---|
|  | 4108 | (@samp{=} at the end of a line, i.e. followed by a newline) by replacing | 
|---|
|  | 4109 | them with an empty string. | 
|---|
|  | 4110 | @emph{If} the substitution was successful (meaning the pattern space contained | 
|---|
|  | 4111 | a line which should be joined), the conditional branch command @code{t} jumps | 
|---|
|  | 4112 | to the beginning of the program without completing or restarting the cycle. | 
|---|
|  | 4113 | If the substitution failed (meaning there were no soft line breaks), | 
|---|
|  | 4114 | the @code{t} command will @emph{not} branch. Then, @code{P} will | 
|---|
|  | 4115 | print the pattern space content up to the first newline, and @code{D} | 
|---|
|  | 4116 | will delete the pattern space content up to the first newline. | 
|---|
|  | 4117 | (To learn more about the @code{N}, @code{P} and @code{D} commands, | 
|---|
|  | 4118 | @pxref{Multiline techniques}). | 
|---|
|  | 4119 |  | 
|---|
|  | 4120 |  | 
|---|
|  | 4121 | @codequoteundirected on | 
|---|
|  | 4122 | @codequotebacktick on | 
|---|
|  | 4123 | @example | 
|---|
|  | 4124 | @group | 
|---|
|  | 4125 | $ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt | 
|---|
|  | 4126 | All the world's a stage, | 
|---|
|  | 4127 | And all the men and women merely players: | 
|---|
|  | 4128 | They have their exits and their entrances; | 
|---|
|  | 4129 | And one man in his time plays many parts. | 
|---|
|  | 4130 | @end group | 
|---|
|  | 4131 | @end example | 
|---|
|  | 4132 | @codequoteundirected off | 
|---|
|  | 4133 | @codequotebacktick off | 
|---|
|  | 4134 |  | 
|---|
|  | 4135 |  | 
|---|
|  | 4136 | For more line-joining examples, @pxref{Joining lines}. | 
|---|
|  | 4137 |  | 
|---|
|  | 4138 |  | 
|---|
| [599] | 4139 | @node Examples | 
|---|
|  | 4140 | @chapter Some Sample Scripts | 
|---|
|  | 4141 |  | 
|---|
|  | 4142 | Here are some @command{sed} scripts to guide you in the art of mastering | 
|---|
|  | 4143 | @command{sed}. | 
|---|
|  | 4144 |  | 
|---|
|  | 4145 | @menu | 
|---|
| [3613] | 4146 |  | 
|---|
|  | 4147 | Useful one-liners: | 
|---|
|  | 4148 | * Joining lines:: | 
|---|
|  | 4149 |  | 
|---|
| [599] | 4150 | Some exotic examples: | 
|---|
|  | 4151 | * Centering lines:: | 
|---|
|  | 4152 | * Increment a number:: | 
|---|
|  | 4153 | * Rename files to lower case:: | 
|---|
|  | 4154 | * Print bash environment:: | 
|---|
|  | 4155 | * Reverse chars of lines:: | 
|---|
| [3613] | 4156 | * Text search across multiple lines:: | 
|---|
|  | 4157 | * Line length adjustment:: | 
|---|
|  | 4158 | * Adding a header to multiple files:: | 
|---|
| [599] | 4159 |  | 
|---|
|  | 4160 | Emulating standard utilities: | 
|---|
|  | 4161 | * tac::                             Reverse lines of files | 
|---|
|  | 4162 | * cat -n::                          Numbering lines | 
|---|
|  | 4163 | * cat -b::                          Numbering non-blank lines | 
|---|
|  | 4164 | * wc -c::                           Counting chars | 
|---|
|  | 4165 | * wc -w::                           Counting words | 
|---|
|  | 4166 | * wc -l::                           Counting lines | 
|---|
|  | 4167 | * head::                            Printing the first lines | 
|---|
|  | 4168 | * tail::                            Printing the last lines | 
|---|
|  | 4169 | * uniq::                            Make duplicate lines unique | 
|---|
|  | 4170 | * uniq -d::                         Print duplicated lines of input | 
|---|
|  | 4171 | * uniq -u::                         Remove all duplicated lines | 
|---|
|  | 4172 | * cat -s::                          Squeezing blank lines | 
|---|
|  | 4173 | @end menu | 
|---|
|  | 4174 |  | 
|---|
| [3613] | 4175 | @node Joining lines | 
|---|
|  | 4176 | @section Joining lines | 
|---|
|  | 4177 |  | 
|---|
|  | 4178 | This section uses @code{N}, @code{D} and @code{P} commands to process | 
|---|
|  | 4179 | multiple lines, and the @code{b} and @code{t} commands for branching. | 
|---|
|  | 4180 | @xref{Multiline techniques} and @ref{Branching and flow control}. | 
|---|
|  | 4181 |  | 
|---|
|  | 4182 | Join specific lines (e.g. if lines 2 and 3 need to be joined): | 
|---|
|  | 4183 |  | 
|---|
|  | 4184 | @codequoteundirected on | 
|---|
|  | 4185 | @codequotebacktick on | 
|---|
|  | 4186 | @example | 
|---|
|  | 4187 | $ cat lines.txt | 
|---|
|  | 4188 | hello | 
|---|
|  | 4189 | hel | 
|---|
|  | 4190 | lo | 
|---|
|  | 4191 | hello | 
|---|
|  | 4192 |  | 
|---|
|  | 4193 | $ sed '2@{N;s/\n//;@}' lines.txt | 
|---|
|  | 4194 | hello | 
|---|
|  | 4195 | hello | 
|---|
|  | 4196 | hello | 
|---|
|  | 4197 | @end example | 
|---|
|  | 4198 | @codequoteundirected off | 
|---|
|  | 4199 | @codequotebacktick off | 
|---|
|  | 4200 |  | 
|---|
|  | 4201 | Join backslash-continued lines: | 
|---|
|  | 4202 |  | 
|---|
|  | 4203 | @codequoteundirected on | 
|---|
|  | 4204 | @codequotebacktick on | 
|---|
|  | 4205 | @example | 
|---|
|  | 4206 | $ cat 1.txt | 
|---|
|  | 4207 | this \ | 
|---|
|  | 4208 | is \ | 
|---|
|  | 4209 | a \ | 
|---|
|  | 4210 | long \ | 
|---|
|  | 4211 | line | 
|---|
|  | 4212 | and another \ | 
|---|
|  | 4213 | line | 
|---|
|  | 4214 |  | 
|---|
|  | 4215 | $ sed -e ':x /\\$/ @{ N; s/\\\n//g ; bx @}'  1.txt | 
|---|
|  | 4216 | this is a long line | 
|---|
|  | 4217 | and another line | 
|---|
|  | 4218 |  | 
|---|
|  | 4219 |  | 
|---|
|  | 4220 | # Note: the above requires GNU sed. | 
|---|
|  | 4221 | #       Non-GNU seds need newlines after ':' and 'b'. | 
|---|
|  | 4222 | @end example | 
|---|
|  | 4223 | @codequoteundirected off | 
|---|
|  | 4224 | @codequotebacktick off | 
|---|
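For @command{sed} implementations other than @value{SSED}, the same program can be written with one command per @option{-e} option, so that the label and the branch each end at a newline. This is a hedged sketch, not a tested recipe for every implementation; the function name @code{join_continued} is hypothetical, and the input is a shortened stand-in for @file{1.txt}.

```shell
# One command per -e: sed joins the fragments with newlines, which is
# what non-GNU seds expect after ':' labels and 'b' commands.
join_continued() {
  # /\\$/   the pattern space ends with a backslash
  # N       append the next input line
  # s/\\\n//g  remove each backslash-newline pair
  # bx      loop again in case the joined line is also continued
  sed -e ':x' -e '/\\$/{' -e 'N' -e 's/\\\n//g' -e 'bx' -e '}' "$@"
}

printf '%s\n' 'this \' 'is \' 'a line' 'and another \' 'line' |
  join_continued
```

If the last input line itself ends with a backslash, @code{N} has no next line to read and the result depends on the implementation; guarding it as @samp{$!N} is one way to avoid that case.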
|  | 4225 |  | 
|---|
|  | 4226 | Join lines that start with whitespace (e.g. SMTP headers): | 
|---|
|  | 4227 |  | 
|---|
|  | 4228 | @codequoteundirected on | 
|---|
|  | 4229 | @codequotebacktick on | 
|---|
|  | 4230 | @example | 
|---|
|  | 4231 | @group | 
|---|
|  | 4232 | $ cat 2.txt | 
|---|
|  | 4233 | Subject: Hello | 
|---|
|  | 4234 |   World | 
|---|
|  | 4235 | Content-Type: multipart/alternative; | 
|---|
|  | 4236 |   boundary=94eb2c190cc6370f06054535da6a | 
|---|
|  | 4237 | Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT) | 
|---|
|  | 4238 | Authentication-Results: mx.gnu.org; | 
|---|
|  | 4239 |   dkim=pass header.i=@@gnu.org; | 
|---|
|  | 4240 |   spf=pass | 
|---|
|  | 4241 | Message-ID: <abcdef@@gnu.org> | 
|---|
|  | 4242 | From: John Doe <jdoe@@gnu.org> | 
|---|
|  | 4243 | To: Jane Smith <jsmith@@gnu.org> | 
|---|
|  | 4244 |  | 
|---|
|  | 4245 | $ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt | 
|---|
|  | 4246 | Subject: Hello World | 
|---|
|  | 4247 | Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a | 
|---|
|  | 4248 | Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT) | 
|---|
|  | 4249 | Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass | 
|---|
|  | 4250 | Message-ID: <abcdef@@gnu.org> | 
|---|
|  | 4251 | From: John Doe <jdoe@@gnu.org> | 
|---|
|  | 4252 | To: Jane Smith <jsmith@@gnu.org> | 
|---|
|  | 4253 |  | 
|---|
|  | 4254 | # A portable (non-GNU) variation: | 
|---|
|  | 4255 | #   sed -e :a -e '$!N;s/\n  */ /;ta' -e 'P;D' | 
|---|
|  | 4256 | @end group | 
|---|
|  | 4257 | @end example | 
|---|
|  | 4258 | @codequoteundirected off | 
|---|
|  | 4259 | @codequotebacktick off | 
|---|
|  | 4260 |  | 
|---|
|  | 4261 |  | 
|---|
| [599] | 4262 | @node Centering lines | 
|---|
|  | 4263 | @section Centering Lines | 
|---|
|  | 4264 |  | 
|---|
|  | 4265 | This script centers all lines of a file within a width of 80 columns. | 
|---|
|  | 4266 | To change that width, the number in @code{\@{@dots{}\@}} must be | 
|---|
|  | 4267 | replaced, and the number of added spaces also must be changed. | 
|---|
|  | 4268 |  | 
|---|
|  | 4269 | Note how the buffer commands are used to separate parts in | 
|---|
|  | 4270 | the regular expressions to be matched---this is a common | 
|---|
|  | 4271 | technique. | 
|---|
|  | 4272 |  | 
|---|
|  | 4273 | @c start------------------------------------------- | 
|---|
|  | 4274 | @example | 
|---|
|  | 4275 | #!/usr/bin/sed -f | 
|---|
|  | 4276 |  | 
|---|
|  | 4277 | @group | 
|---|
|  | 4278 | # Put 80 spaces in the buffer | 
|---|
|  | 4279 | 1 @{ | 
|---|
|  | 4280 | x | 
|---|
|  | 4281 | s/^$/          / | 
|---|
|  | 4282 | s/^.*$/&&&&&&&&/ | 
|---|
|  | 4283 | x | 
|---|
|  | 4284 | @} | 
|---|
|  | 4285 | @end group | 
|---|
|  | 4286 |  | 
|---|
|  | 4287 | @group | 
|---|
| [3613] | 4288 | # delete leading and trailing spaces | 
|---|
|  | 4289 | y/@kbd{@key{TAB}}/ / | 
|---|
| [599] | 4290 | s/^ *// | 
|---|
|  | 4291 | s/ *$// | 
|---|
|  | 4292 | @end group | 
|---|
|  | 4293 |  | 
|---|
|  | 4294 | @group | 
|---|
|  | 4295 | # add a newline and 80 spaces to end of line | 
|---|
|  | 4296 | G | 
|---|
|  | 4297 | @end group | 
|---|
|  | 4298 |  | 
|---|
|  | 4299 | @group | 
|---|
|  | 4300 | # keep first 81 chars (80 + a newline) | 
|---|
|  | 4301 | s/^\(.\@{81\@}\).*$/\1/ | 
|---|
|  | 4302 | @end group | 
|---|
|  | 4303 |  | 
|---|
|  | 4304 | @group | 
|---|
|  | 4305 | # \2 matches half of the spaces, which are moved to the beginning | 
|---|
|  | 4306 | s/^\(.*\)\n\(.*\)\2/\2\1/ | 
|---|
|  | 4307 | @end group | 
|---|
|  | 4308 | @end example | 
|---|
|  | 4309 | @c end--------------------------------------------- | 
|---|
|  | 4310 |  | 
|---|
|  | 4311 | @node Increment a number | 
|---|
|  | 4312 | @section Increment a Number | 
|---|
|  | 4313 |  | 
|---|
|  | 4314 | This script is one of a few that demonstrate how to do arithmetic | 
|---|
|  | 4315 | in @command{sed}.  This is indeed possible,@footnote{@command{sed} guru Greg | 
|---|
|  | 4316 | Ubben wrote an implementation of the @command{dc} @sc{rpn} calculator! | 
|---|
|  | 4317 | It is distributed together with sed.} but must be done manually. | 
|---|
|  | 4318 |  | 
|---|
|  | 4319 | To increment a number, add 1 to its last digit, replacing | 
|---|
|  | 4320 | it with the following digit.  There is one exception: when the digit | 
|---|
|  | 4321 | is a nine, the preceding digits must also be incremented until a digit | 
|---|
|  | 4322 | other than nine is reached. | 
|---|
|  | 4323 |  | 
|---|
|  | 4324 | This solution by Bruno Haible is very clever because | 
|---|
|  | 4325 | it uses a single buffer; if you don't have this limitation, the | 
|---|
|  | 4326 | algorithm used in @ref{cat -n, Numbering lines}, is faster. | 
|---|
|  | 4327 | It works by replacing trailing nines with an underscore, then | 
|---|
|  | 4328 | using multiple @code{s} commands to increment the last digit, | 
|---|
|  | 4329 | and then again substituting underscores with zeros. | 
|---|
|  | 4330 |  | 
|---|
|  | 4331 | @c start------------------------------------------- | 
|---|
|  | 4332 | @example | 
|---|
|  | 4333 | #!/usr/bin/sed -f | 
|---|
|  | 4334 |  | 
|---|
|  | 4335 | /[^0-9]/ d | 
|---|
|  | 4336 |  | 
|---|
|  | 4337 | @group | 
|---|
| [3613] | 4338 | # replace all trailing 9s by _ (any other character except digits could | 
|---|
| [599] | 4339 | # be used) | 
|---|
|  | 4340 | :d | 
|---|
|  | 4341 | s/9\(_*\)$/_\1/ | 
|---|
|  | 4342 | td | 
|---|
|  | 4343 | @end group | 
|---|
|  | 4344 |  | 
|---|
|  | 4345 | @group | 
|---|
|  | 4346 | # incr last digit only.  The first line adds a most-significant | 
|---|
|  | 4347 | # digit of 1 if we have to add a digit. | 
|---|
|  | 4348 | @end group | 
|---|
|  | 4349 |  | 
|---|
|  | 4350 | @group | 
|---|
|  | 4351 | s/^\(_*\)$/1\1/; tn | 
|---|
|  | 4352 | s/8\(_*\)$/9\1/; tn | 
|---|
|  | 4353 | s/7\(_*\)$/8\1/; tn | 
|---|
|  | 4354 | s/6\(_*\)$/7\1/; tn | 
|---|
|  | 4355 | s/5\(_*\)$/6\1/; tn | 
|---|
|  | 4356 | s/4\(_*\)$/5\1/; tn | 
|---|
|  | 4357 | s/3\(_*\)$/4\1/; tn | 
|---|
|  | 4358 | s/2\(_*\)$/3\1/; tn | 
|---|
|  | 4359 | s/1\(_*\)$/2\1/; tn | 
|---|
|  | 4360 | s/0\(_*\)$/1\1/; tn | 
|---|
|  | 4361 | @end group | 
|---|
|  | 4362 |  | 
|---|
|  | 4363 | @group | 
|---|
|  | 4364 | :n | 
|---|
|  | 4365 | y/_/0/ | 
|---|
|  | 4366 | @end group | 
|---|
|  | 4367 | @end example | 
|---|
|  | 4368 | @c end--------------------------------------------- | 
|---|
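As a usage sketch, the script above can be tried without creating a script file by storing it in a shell variable. The variable name @code{incr_script} and the sample inputs are assumptions made for illustration; the @command{sed} commands themselves are copied from the script, with the comments omitted.

```shell
# The script from this section, stored in a (hypothetically named)
# shell variable so it can be passed to sed without a script file.
incr_script='
/[^0-9]/ d
:d
s/9\(_*\)$/_\1/
td
s/^\(_*\)$/1\1/; tn
s/8\(_*\)$/9\1/; tn
s/7\(_*\)$/8\1/; tn
s/6\(_*\)$/7\1/; tn
s/5\(_*\)$/6\1/; tn
s/4\(_*\)$/5\1/; tn
s/3\(_*\)$/4\1/; tn
s/2\(_*\)$/3\1/; tn
s/1\(_*\)$/2\1/; tn
s/0\(_*\)$/1\1/; tn
:n
y/_/0/
'

# 41 has no trailing nines; 9 needs a new leading digit;
# 199 carries through two trailing nines.
printf '%s\n' 41 9 199 | sed "$incr_script"
```

The three inputs exercise the three paths through the script: a plain last-digit increment (@samp{42}), the case handled by the first @code{s} command after the loop (@samp{10}), and the underscore placeholder loop (@samp{200}).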
|  | 4369 |  | 
|---|
|  | 4370 | @node Rename files to lower case | 
|---|
|  | 4371 | @section Rename Files to Lower Case | 
|---|
|  | 4372 |  | 
|---|
|  | 4373 | This is a pretty strange use of @command{sed}.  We transform text | 
|---|
|  | 4374 | into shell commands, then just feed them to the shell. | 
|---|
|  | 4375 | Don't worry, even worse hacks are done when using @command{sed}; I have | 
|---|
|  | 4376 | seen a script converting the output of @command{date} into a @command{bc} | 
|---|
|  | 4377 | program! | 
|---|
| [3613] | 4378 |  | 
|---|
| [599] | 4379 | The main body of this is the @command{sed} script, which remaps the name | 
|---|
| [3613] | 4380 | from lower to upper (or vice-versa) and even checks | 
|---|
| [599] | 4381 | if the remapped name is the same as the original name. | 
|---|
|  | 4382 | Note how the script is parameterized using shell | 
|---|
|  | 4383 | variables and proper quoting. | 
|---|
|  | 4384 |  | 
|---|
|  | 4385 | @c start------------------------------------------- | 
|---|
|  | 4386 | @example | 
|---|
|  | 4387 | @group | 
|---|
|  | 4388 | #! /bin/sh | 
|---|
| [3613] | 4389 | # rename files to lower/upper case... | 
|---|
| [599] | 4390 | # | 
|---|
| [3613] | 4391 | # usage: | 
|---|
|  | 4392 | #    move-to-lower * | 
|---|
|  | 4393 | #    move-to-upper * | 
|---|
| [599] | 4394 | # or | 
|---|
|  | 4395 | #    move-to-lower -R . | 
|---|
|  | 4396 | #    move-to-upper -R . | 
|---|
|  | 4397 | # | 
|---|
|  | 4398 | @end group | 
|---|
|  | 4399 |  | 
|---|
|  | 4400 | @group | 
|---|
|  | 4401 | help() | 
|---|
|  | 4402 | @{ | 
|---|
| [3613] | 4403 | cat << eof | 
|---|
| [599] | 4404 | Usage: $0 [-n] [-R] [-h] files... | 
|---|
|  | 4405 | @end group | 
|---|
|  | 4406 |  | 
|---|
|  | 4407 | @group | 
|---|
|  | 4408 | -n      do nothing, only see what would be done | 
|---|
|  | 4409 | -R      recursive (use find) | 
|---|
|  | 4410 | -h      this message | 
|---|
|  | 4411 | files   files to remap to lower case | 
|---|
|  | 4412 | @end group | 
|---|
|  | 4413 |  | 
|---|
|  | 4414 | @group | 
|---|
|  | 4415 | Examples: | 
|---|
|  | 4416 | $0 -n *        (see if everything is ok, then...) | 
|---|
|  | 4417 | $0 * | 
|---|
|  | 4418 | @end group | 
|---|
|  | 4419 |  | 
|---|
|  | 4420 | $0 -R . | 
|---|
|  | 4421 |  | 
|---|
|  | 4422 | @group | 
|---|
|  | 4423 | eof | 
|---|
|  | 4424 | @} | 
|---|
|  | 4425 | @end group | 
|---|
|  | 4426 |  | 
|---|
|  | 4427 | @group | 
|---|
|  | 4428 | apply_cmd='sh' | 
|---|
|  | 4429 | finder='echo "$@@" | tr " " "\n"' | 
|---|
|  | 4430 | files_only= | 
|---|
|  | 4431 | @end group | 
|---|
|  | 4432 |  | 
|---|
|  | 4433 | @group | 
|---|
|  | 4434 | while : | 
|---|
|  | 4435 | do | 
|---|
| [3613] | 4436 | case "$1" in | 
|---|
| [599] | 4437 | -n) apply_cmd='cat' ;; | 
|---|
|  | 4438 | -R) finder='find "$@@" -type f';; | 
|---|
|  | 4439 | -h) help ; exit 1 ;; | 
|---|
|  | 4440 | *) break ;; | 
|---|
|  | 4441 | esac | 
|---|
|  | 4442 | shift | 
|---|
|  | 4443 | done | 
|---|
|  | 4444 | @end group | 
|---|
|  | 4445 |  | 
|---|
|  | 4446 | @group | 
|---|
|  | 4447 | if [ -z "$1" ]; then | 
|---|
|  | 4448 | echo Usage: $0 [-h] [-n] [-R] files... | 
|---|
|  | 4449 | exit 1 | 
|---|
|  | 4450 | fi | 
|---|
|  | 4451 | @end group | 
|---|
|  | 4452 |  | 
|---|
|  | 4453 | @group | 
|---|
|  | 4454 | LOWER='abcdefghijklmnopqrstuvwxyz' | 
|---|
|  | 4455 | UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ' | 
|---|
|  | 4456 | @end group | 
|---|
|  | 4457 |  | 
|---|
|  | 4458 | @group | 
|---|
|  | 4459 | case `basename $0` in | 
|---|
|  | 4460 | *upper*) TO=$UPPER; FROM=$LOWER ;; | 
|---|
|  | 4461 | *)       FROM=$UPPER; TO=$LOWER ;; | 
|---|
|  | 4462 | esac | 
|---|
|  | 4463 | @end group | 
|---|
| [3613] | 4464 |  | 
|---|
| [599] | 4465 | eval $finder | sed -n ' | 
|---|
|  | 4466 |  | 
|---|
|  | 4467 | @group | 
|---|
|  | 4468 | # remove all trailing slashes | 
|---|
|  | 4469 | s/\/*$// | 
|---|
|  | 4470 | @end group | 
|---|
|  | 4471 |  | 
|---|
|  | 4472 | @group | 
|---|
|  | 4473 | # add ./ if there is no path, only a filename | 
|---|
|  | 4474 | /\//! s/^/.\// | 
|---|
|  | 4475 | @end group | 
|---|
|  | 4476 |  | 
|---|
|  | 4477 | @group | 
|---|
|  | 4478 | # save path+filename | 
|---|
|  | 4479 | h | 
|---|
|  | 4480 | @end group | 
|---|
|  | 4481 |  | 
|---|
|  | 4482 | @group | 
|---|
|  | 4483 | # remove path | 
|---|
|  | 4484 | s/.*\/// | 
|---|
|  | 4485 | @end group | 
|---|
|  | 4486 |  | 
|---|
|  | 4487 | @group | 
|---|
|  | 4488 | # do conversion only on filename | 
|---|
|  | 4489 | y/'$FROM'/'$TO'/ | 
|---|
|  | 4490 | @end group | 
|---|
|  | 4491 |  | 
|---|
|  | 4492 | @group | 
|---|
|  | 4493 | # now line contains original path+file, while | 
|---|
|  | 4494 | # hold space contains the new filename | 
|---|
|  | 4495 | x | 
|---|
|  | 4496 | @end group | 
|---|
|  | 4497 |  | 
|---|
|  | 4498 | @group | 
|---|
|  | 4499 | # add converted file name to line, which now contains | 
|---|
|  | 4500 | # path/file-name\nconverted-file-name | 
|---|
|  | 4501 | G | 
|---|
|  | 4502 | @end group | 
|---|
|  | 4503 |  | 
|---|
|  | 4504 | @group | 
|---|
|  | 4505 | # check if converted file name is equal to original file name, | 
|---|
| [3613] | 4506 | # if it is, do not print anything | 
|---|
| [599] | 4507 | /^.*\/\(.*\)\n\1/b | 
|---|
|  | 4508 | @end group | 
|---|
|  | 4509 |  | 
|---|
|  | 4510 | @group | 
|---|
| [3613] | 4511 | # escape special characters for the shell | 
|---|
|  | 4512 | s/["$`\\]/\\&/g | 
|---|
|  | 4513 | @end group | 
|---|
|  | 4514 |  | 
|---|
|  | 4515 | @group | 
|---|
| [599] | 4516 | # now, transform path/fromfile\n, into | 
|---|
|  | 4517 | # mv path/fromfile path/tofile and print it | 
|---|
|  | 4518 | s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p | 
|---|
|  | 4519 | @end group | 
|---|
|  | 4520 |  | 
|---|
|  | 4521 | ' | $apply_cmd | 
|---|
|  | 4522 | @end example | 
|---|
|  | 4523 | @c end--------------------------------------------- | 
|---|
|  | 4524 |  | 
|---|
|  | 4525 | @node Print bash environment | 
|---|
|  | 4526 | @section Print @command{bash} Environment | 
|---|
|  | 4527 |  | 
|---|
|  | 4528 | This script strips the definition of the shell functions | 
|---|
|  | 4529 | from the output of the @command{set} Bourne-shell command. | 
|---|
|  | 4530 |  | 
|---|
|  | 4531 | @c start------------------------------------------- | 
|---|
|  | 4532 | @example | 
|---|
|  | 4533 | #!/bin/sh | 
|---|
|  | 4534 |  | 
|---|
|  | 4535 | @group | 
|---|
|  | 4536 | set | sed -n ' | 
|---|
|  | 4537 | :x | 
|---|
|  | 4538 | @end group | 
|---|
|  | 4539 |  | 
|---|
|  | 4540 | @group | 
|---|
|  | 4541 | @ifinfo | 
|---|
|  | 4542 | # if no occurrence of "=()" print and load next line | 
|---|
|  | 4543 | @end ifinfo | 
|---|
|  | 4544 | @ifnotinfo | 
|---|
|  | 4545 | # if no occurrence of @samp{=()} print and load next line | 
|---|
|  | 4546 | @end ifnotinfo | 
|---|
|  | 4547 | /=()/! @{ p; b; @} | 
|---|
|  | 4548 | / () $/! @{ p; b; @} | 
|---|
|  | 4549 | @end group | 
|---|
|  | 4550 |  | 
|---|
|  | 4551 | @group | 
|---|
|  | 4552 | # possible start of functions section | 
|---|
|  | 4553 | # save the line in case this is a var like FOO="() " | 
|---|
|  | 4554 | h | 
|---|
|  | 4555 | @end group | 
|---|
|  | 4556 |  | 
|---|
|  | 4557 | @group | 
|---|
|  | 4558 | # if the next line has a brace, we quit because | 
|---|
|  | 4559 | # nothing comes after functions | 
|---|
|  | 4560 | n | 
|---|
|  | 4561 | /^@{/ q | 
|---|
|  | 4562 | @end group | 
|---|
|  | 4563 |  | 
|---|
|  | 4564 | @group | 
|---|
|  | 4565 | # print the old line | 
|---|
|  | 4566 | x; p | 
|---|
|  | 4567 | @end group | 
|---|
|  | 4568 |  | 
|---|
|  | 4569 | @group | 
|---|
|  | 4570 | # work on the new line now | 
|---|
|  | 4571 | x; bx | 
|---|
|  | 4572 | ' | 
|---|
|  | 4573 | @end group | 
|---|
|  | 4574 | @end example | 
|---|
|  | 4575 | @c end--------------------------------------------- | 
|---|
|  | 4576 |  | 
|---|
|  | 4577 | @node Reverse chars of lines | 
|---|
|  | 4578 | @section Reverse Characters of Lines | 
|---|
|  | 4579 |  | 
|---|
|  | 4580 | This script can be used to reverse the position of characters | 
|---|
|  | 4581 | in lines.  The technique moves two characters at a time, hence | 
|---|
|  | 4582 | it is faster than more intuitive implementations. | 
|---|
|  | 4583 |  | 
|---|
|  | 4584 | Note the @code{tx} command before the definition of the label. | 
|---|
|  | 4585 | This is often needed to reset the flag that is tested by | 
|---|
|  | 4586 | the @code{t} command. | 
|---|
|  | 4587 |  | 
|---|
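The effect of resetting the flag can be observed in isolation.  The
following is only a minimal sketch and not part of the script in this
section; the function names and the labels @code{reset} and @code{done}
are made up for illustration.  Both functions run a substitution that
succeeds followed by one that fails; only the second one clears the
flag in between.

```shell
#!/bin/sh
# Hypothetical helper names; they only illustrate the t-flag behaviour.

# No reset: the flag set by s/a/b/ is still set when "t done" is
# reached, so sed branches to "done" although s/x/y/ did not match.
without_reset() {
  sed -e 's/a/b/' -e 's/x/y/' -e 't done' -e 's/$/ (no jump)/' -e ':done'
}

# "t reset" branches to the very next command and clears the flag.
# Afterwards "t done" only reflects the outcome of s/x/y/.
with_reset() {
  sed -e 's/a/b/' -e 't reset' -e ':reset' -e 's/x/y/' \
      -e 't done' -e 's/$/ (no jump)/' -e ':done'
}

echo a | without_reset
echo a | with_reset
```

The first call prints @samp{b}, because the branch skips the final
substitution; the second prints @samp{b (no jump)}.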
|  | 4588 | Imaginative readers will find uses for this script.  An example | 
|---|
|  | 4589 | is reversing the output of @command{banner}.@footnote{This requires | 
|---|
|  | 4590 | another script to pad the output of banner; for example | 
|---|
|  | 4591 |  | 
|---|
|  | 4592 | @example | 
|---|
|  | 4593 | #! /bin/sh | 
|---|
|  | 4594 |  | 
|---|
|  | 4595 | banner -w $1 $2 $3 $4 | | 
|---|
|  | 4596 | sed -e :a -e '/^.\@{0,'$1'\@}$/ @{ s/$/ /; ba; @}' | | 
|---|
|  | 4597 | ~/sedscripts/reverseline.sed | 
|---|
|  | 4598 | @end example | 
|---|
|  | 4599 | } | 
|---|
|  | 4600 |  | 
|---|
|  | 4601 | @c start------------------------------------------- | 
|---|
|  | 4602 | @example | 
|---|
|  | 4603 | #!/usr/bin/sed -f | 
|---|
|  | 4604 |  | 
|---|
|  | 4605 | /../! b | 
|---|
|  | 4606 |  | 
|---|
|  | 4607 | @group | 
|---|
|  | 4608 | # Reverse a line.  Begin embedding the line between two newlines | 
|---|
|  | 4609 | s/^.*$/\ | 
|---|
|  | 4610 | &\ | 
|---|
|  | 4611 | / | 
|---|
|  | 4612 | @end group | 
|---|
|  | 4613 |  | 
|---|
|  | 4614 | @group | 
|---|
|  | 4615 | # Move first character to the end.  The regexp matches until | 
|---|
|  | 4616 | # there are zero or one characters between the markers | 
|---|
|  | 4617 | tx | 
|---|
|  | 4618 | :x | 
|---|
|  | 4619 | s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/ | 
|---|
|  | 4620 | tx | 
|---|
|  | 4621 | @end group | 
|---|
|  | 4622 |  | 
|---|
|  | 4623 | @group | 
|---|
|  | 4624 | # Remove the newline markers | 
|---|
|  | 4625 | s/\n//g | 
|---|
|  | 4626 | @end group | 
|---|
|  | 4627 | @end example | 
|---|
|  | 4628 | @c end--------------------------------------------- | 
|---|
|  | 4629 |  | 
|---|
| [3613] | 4630 |  | 
|---|
|  | 4631 | @node Text search across multiple lines | 
|---|
|  | 4632 | @section Text search across multiple lines | 
|---|
|  | 4633 |  | 
|---|
|  | 4634 | This section uses @code{N} and @code{D} commands to search for | 
|---|
|  | 4635 | consecutive words spanning multiple lines. @xref{Multiline techniques}. | 
|---|
|  | 4636 |  | 
|---|
|  | 4637 | These examples deal with finding doubled occurrences of words in a document. | 
|---|
|  | 4638 |  | 
|---|
|  | 4639 | Finding doubled words in a single line is easy using GNU @command{grep} | 
|---|
|  | 4640 | and similarly with @value{SSED}: | 
|---|
|  | 4641 |  | 
|---|
|  | 4642 | @c NOTE: in all examples, 'the@ the' is used to prevent | 
|---|
|  | 4643 | @c 'make syntax-check' from complaining about double words. | 
|---|
|  | 4644 | @codequoteundirected on | 
|---|
|  | 4645 | @codequotebacktick on | 
|---|
|  | 4646 | @example | 
|---|
|  | 4647 | @group | 
|---|
|  | 4648 | $ cat two-cities-dup1.txt | 
|---|
|  | 4649 | It was the best of times, | 
|---|
|  | 4650 | it was the worst of times, | 
|---|
|  | 4651 | it was the@ the age of wisdom, | 
|---|
|  | 4652 | it was the age of foolishness, | 
|---|
|  | 4653 |  | 
|---|
|  | 4654 | $ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt | 
|---|
|  | 4655 | it was the@ the age of wisdom, | 
|---|
|  | 4656 |  | 
|---|
|  | 4657 | $ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt | 
|---|
|  | 4658 | 3:it was the@ the age of wisdom, | 
|---|
|  | 4659 |  | 
|---|
|  | 4660 | $ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt | 
|---|
|  | 4661 | it was the@ the age of wisdom, | 
|---|
|  | 4662 |  | 
|---|
|  | 4663 | $ sed -En '/\b(\w+)\s+\1\b/@{=;p@}' two-cities-dup1.txt | 
|---|
|  | 4664 | 3 | 
|---|
|  | 4665 | it was the@ the age of wisdom, | 
|---|
|  | 4666 | @end group | 
|---|
|  | 4667 | @end example | 
|---|
|  | 4668 | @codequoteundirected off | 
|---|
|  | 4669 | @codequotebacktick off | 
|---|
|  | 4670 |  | 
|---|
|  | 4671 | @itemize @bullet | 
|---|
|  | 4672 | @item | 
|---|
|  | 4673 | The regular expression @samp{\b\w+\s+} searches for word-boundary (@samp{\b}), | 
|---|
|  | 4674 | followed by one-or-more word-characters (@samp{\w+}), followed by whitespace | 
|---|
|  | 4675 | (@samp{\s+}). @xref{regexp extensions}. | 
|---|
|  | 4676 |  | 
|---|
|  | 4677 | @item | 
|---|
|  | 4678 | Adding parentheses around the @samp{(\w+)} expression creates a subexpression. | 
|---|
|  | 4679 | The regular expression pattern @samp{(PATTERN)\s+\1} defines a subexpression | 
|---|
|  | 4680 | (in the parentheses) followed by a back-reference, separated by whitespace. | 
|---|
|  | 4681 | A successful match means the @var{PATTERN} was repeated twice in succession. | 
|---|
|  | 4682 | @xref{Back-references and Subexpressions}. | 
|---|
|  | 4683 |  | 
|---|
|  | 4684 | @item | 
|---|
|  | 4685 | The word-boundary expression (@samp{\b}) at both ends ensures partial | 
|---|
|  | 4686 | words are not matched (e.g. @samp{the then} is not a desired match). | 
|---|
|  | 4687 | @c Thanks to Jim for pointing this out in | 
|---|
|  | 4688 | @c https://lists.gnu.org/archive/html/sed-devel/2016-12/msg00041.html | 
|---|
|  | 4689 |  | 
|---|
|  | 4690 | @item | 
|---|
|  | 4691 | The @option{-E} option enables extended regular expression syntax, alleviating | 
|---|
|  | 4692 | the need to add backslashes before the parenthesis. @xref{ERE syntax}. | 
|---|
|  | 4693 |  | 
|---|
|  | 4694 | @end itemize | 
|---|
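The role of the two word-boundary anchors can be checked with a few
sample lines.  This is a small sketch assuming GNU @command{grep}; the
function name and the sample text are invented for the illustration.

```shell
#!/bin/sh
# Assumes GNU grep: \b, \w and \s are GNU extensions.
# find_doubled_words is a hypothetical helper name.
find_doubled_words() {
  grep -E '\b(\w+)\s+\1\b'
}

# "the then" is rejected by the trailing \b and "bathe the" by the
# leading \b; the other two lines contain a genuinely doubled word.
printf '%s\n' 'the then' 'the the' 'bathe the baby' 'it is is so' |
  find_doubled_words
```

Only the second and fourth sample lines are printed.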
|  | 4695 |  | 
|---|
|  | 4696 | When the doubled words span two lines, the above regular expression | 
|---|
|  | 4697 | will not find them, as @command{grep} and @command{sed} operate line-by-line. | 
|---|
|  | 4698 |  | 
|---|
|  | 4699 | By using @command{N} and @command{D} commands, @command{sed} can apply | 
|---|
|  | 4700 | regular expressions on multiple lines (that is, multiple lines are stored | 
|---|
|  | 4701 | in the pattern space, and the regular expression works on it): | 
|---|
|  | 4702 |  | 
|---|
|  | 4703 | @c NOTE: use 'the@*the' instead of a real new line to prevent | 
|---|
|  | 4704 | @c 'make syntax-check' from complaining about doubled-words. | 
|---|
|  | 4705 | @codequoteundirected on | 
|---|
|  | 4706 | @codequotebacktick on | 
|---|
|  | 4707 | @example | 
|---|
|  | 4708 | $ cat two-cities-dup2.txt | 
|---|
|  | 4709 | It was the best of times, it was the | 
|---|
|  | 4710 | worst of times, it was the@*the age of wisdom, | 
|---|
|  | 4711 | it was the age of foolishness, | 
|---|
|  | 4712 |  | 
|---|
|  | 4713 | $ sed -En '@{N; /\b(\w+)\s+\1\b/@{=;p@} ; D@}'  two-cities-dup2.txt | 
|---|
|  | 4714 | 3 | 
|---|
|  | 4715 | worst of times, it was the@*the age of wisdom, | 
|---|
|  | 4716 | @end example | 
|---|
|  | 4717 | @codequoteundirected off | 
|---|
|  | 4718 | @codequotebacktick off | 
|---|
|  | 4719 |  | 
|---|
|  | 4720 | @itemize @bullet | 
|---|
|  | 4721 | @item | 
|---|
|  | 4722 | The @command{N} command appends the next line to the pattern space | 
|---|
|  | 4723 | (thus ensuring it contains two consecutive lines in every cycle). | 
|---|
|  | 4724 |  | 
|---|
|  | 4725 | @item | 
|---|
|  | 4726 | The regular expression uses @samp{\s+} as the word separator, which matches | 
|---|
|  | 4727 | both spaces and newlines. | 
|---|
|  | 4728 |  | 
|---|
|  | 4729 | @item | 
|---|
|  | 4730 | If the regular expression matches, the entire pattern space is printed | 
|---|
|  | 4731 | with @command{p}. No lines are printed by default due to the @option{-n} option. | 
|---|
|  | 4732 |  | 
|---|
|  | 4733 | @item | 
|---|
|  | 4734 | The @command{D} command removes the first line from the pattern space (up until the | 
|---|
|  | 4735 | first newline), readying it for the next cycle. | 
|---|
|  | 4736 | @end itemize | 
|---|
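The two-line window described above can be made visible with the
@code{l} command, which prints the pattern space unambiguously.  The
sketch below assumes GNU @command{sed}; the function name is
hypothetical.

```shell
#!/bin/sh
# Assumes GNU sed.  "l" shows an embedded newline as \n and marks the
# end of the pattern space with $.  show_window is a made-up name.
show_window() {
  sed -n 'N; l; D'
}

printf '%s\n' one two three four | show_window
```

Each output line shows one cycle: @samp{one\ntwo$}, then
@samp{two\nthree$}, then @samp{three\nfour$}.  Every input line except
the first and the last therefore takes part in two consecutive windows.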
|  | 4737 |  | 
|---|
|  | 4738 | See the GNU @command{coreutils} manual for an alternative solution using | 
|---|
|  | 4739 | @command{tr -s} and @command{uniq} at | 
|---|
|  | 4740 | @c NOTE: cheating and keeping the URL line shorter than 80 characters | 
|---|
|  | 4741 | @c by using 'gnu.org' and '/s/'. | 
|---|
|  | 4742 | @url{https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html}. | 
|---|
|  | 4743 |  | 
|---|
|  | 4744 | @node Line length adjustment | 
|---|
|  | 4745 | @section Line length adjustment | 
|---|
|  | 4746 |  | 
|---|
|  | 4747 | This section uses @code{N} and @code{P} commands to read and write | 
|---|
|  | 4748 | lines, and the @code{b} command for branching. | 
|---|
|  | 4749 | @xref{Multiline techniques} and @ref{Branching and flow control}. | 
|---|
|  | 4750 |  | 
|---|
|  | 4751 | This (somewhat contrived) example deals with formatting and wrapping | 
|---|
|  | 4752 | lines of text of the following input file: | 
|---|
|  | 4753 |  | 
|---|
|  | 4754 | @example | 
|---|
|  | 4755 | @group | 
|---|
|  | 4756 | $ cat two-cities-mix.txt | 
|---|
|  | 4757 | It was the best of times, it was | 
|---|
|  | 4758 | the worst of times, it | 
|---|
|  | 4759 | was the age of | 
|---|
|  | 4760 | wisdom, | 
|---|
|  | 4761 | it | 
|---|
|  | 4762 | was | 
|---|
|  | 4763 | the age | 
|---|
|  | 4764 | of foolishness, | 
|---|
|  | 4765 | @end group | 
|---|
|  | 4766 | @end example | 
|---|
|  | 4767 |  | 
|---|
|  | 4768 | @exdent The following sed program wraps lines at 40 characters: | 
|---|
|  | 4769 | @codequoteundirected on | 
|---|
|  | 4770 | @codequotebacktick on | 
|---|
|  | 4771 | @example | 
|---|
|  | 4772 | @group | 
|---|
|  | 4773 | $ cat wrap40.sed | 
|---|
|  | 4774 | # outer loop | 
|---|
|  | 4775 | :x | 
|---|
|  | 4776 |  | 
|---|
|  | 4777 | # Append a newline followed by the next input line to the pattern buffer | 
|---|
|  | 4778 | N | 
|---|
|  | 4779 |  | 
|---|
|  | 4780 | # Replace all newlines in the pattern buffer with spaces | 
|---|
|  | 4781 | s/\n/ /g | 
|---|
|  | 4782 |  | 
|---|
|  | 4783 |  | 
|---|
|  | 4784 | # Inner loop | 
|---|
|  | 4785 | :y | 
|---|
|  | 4786 |  | 
|---|
|  | 4787 | # Add a newline after the first 40 characters | 
|---|
|  | 4788 | s/(.@{40,40@})/\1\n/ | 
|---|
|  | 4789 |  | 
|---|
|  | 4790 | # If there is a newline in the pattern buffer | 
|---|
|  | 4791 | # (i.e. the previous substitution added a newline) | 
|---|
|  | 4792 | /\n/ @{ | 
|---|
|  | 4793 | # There are newlines in the pattern buffer - | 
|---|
|  | 4794 | # print the content until the first newline. | 
|---|
|  | 4795 | P | 
|---|
|  | 4796 |  | 
|---|
|  | 4797 | # Remove the printed characters and the first newline | 
|---|
|  | 4798 | s/.*\n// | 
|---|
|  | 4799 |  | 
|---|
|  | 4800 | # branch to label 'y' - repeat inner loop | 
|---|
|  | 4801 | by | 
|---|
|  | 4802 | @} | 
|---|
|  | 4803 |  | 
|---|
|  | 4804 | # No newlines in the pattern buffer - Branch to label 'x' (outer loop) | 
|---|
|  | 4805 | # and read the next input line | 
|---|
|  | 4806 | bx | 
|---|
|  | 4807 | @end group | 
|---|
|  | 4808 | @end example | 
|---|
|  | 4809 | @codequoteundirected off | 
|---|
|  | 4810 | @codequotebacktick off | 
|---|
|  | 4811 |  | 
|---|
|  | 4812 |  | 
|---|
|  | 4813 |  | 
|---|
|  | 4814 | @exdent The wrapped output: | 
|---|
|  | 4815 | @codequoteundirected on | 
|---|
|  | 4816 | @codequotebacktick on | 
|---|
|  | 4817 | @example | 
|---|
|  | 4818 | @group | 
|---|
|  | 4819 | $ sed -E -f wrap40.sed two-cities-mix.txt | 
|---|
|  | 4820 | It was the best of times, it was the wor | 
|---|
|  | 4821 | st of times, it was the age of wisdom, i | 
|---|
|  | 4822 | t was the age of foolishness, | 
|---|
|  | 4823 | @end group | 
|---|
|  | 4824 | @end example | 
|---|
|  | 4825 | @codequoteundirected off | 
|---|
|  | 4826 | @codequotebacktick off | 
|---|
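The same program can be tried with other widths by letting the shell
substitute the number into the interval expression.  This is a sketch
only: the function name @code{wrap_at} and its argument convention are
assumptions, and the width is not validated.

```shell
#!/bin/sh
# Hypothetical helper: wrap_at WIDTH [FILE...]
# WIDTH is assumed to be a positive integer; it is not validated.
# Assumes GNU sed (-E, and \n in the replacement text).
wrap_at() {
  width=$1
  shift
  # Double quotes let the shell expand $width; \1 and \n are passed
  # to sed unchanged.
  sed -E ":x
N
s/\n/ /g
:y
s/(.{$width})/\1\n/
/\n/ {
  P
  s/.*\n//
  by
}
bx" "$@"
}

printf '%s\n' 'abcdef' 'ghij' | wrap_at 4
```

With a width of 4 the two input lines are joined and printed as
@samp{abcd}, @samp{ef g} and @samp{hij}.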
|  | 4827 |  | 
|---|
|  | 4828 |  | 
|---|
|  | 4829 |  | 
|---|
|  | 4830 |  | 
|---|
|  | 4831 | @node Adding a header to multiple files | 
|---|
|  | 4832 | @section Adding a header to multiple files | 
|---|
|  | 4833 |  | 
|---|
|  | 4834 | @value{SSED} can be used to safely modify multiple files at once. | 
|---|
|  | 4835 |  | 
|---|
|  | 4836 | @exdent Add a single line to the beginning of source code files: | 
|---|
|  | 4837 |  | 
|---|
|  | 4838 | @codequoteundirected on | 
|---|
|  | 4839 | @codequotebacktick on | 
|---|
|  | 4840 | @example | 
|---|
|  | 4841 | sed -i '1i/* Copyright (C) FOO BAR */' *.c | 
|---|
|  | 4842 | @end example | 
|---|
|  | 4843 | @codequoteundirected off | 
|---|
|  | 4844 | @codequotebacktick off | 
|---|
|  | 4845 |  | 
|---|
|  | 4846 | @exdent Adding a few lines is possible using @samp{\n} in the text: | 
|---|
|  | 4847 |  | 
|---|
|  | 4848 | @codequoteundirected on | 
|---|
|  | 4849 | @codequotebacktick on | 
|---|
|  | 4850 | @example | 
|---|
|  | 4851 | sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c | 
|---|
|  | 4852 | @end example | 
|---|
|  | 4853 | @codequoteundirected off | 
|---|
|  | 4854 | @codequotebacktick off | 
|---|
|  | 4855 |  | 
|---|
|  | 4856 | To add multiple lines from another file, use @code{0rFILE}. | 
|---|
|  | 4857 | A typical use case is adding a license notice header to all files: | 
|---|
|  | 4858 |  | 
|---|
|  | 4859 | @codequoteundirected on | 
|---|
|  | 4860 | @codequotebacktick on | 
|---|
|  | 4861 | @example | 
|---|
|  | 4862 | ## Create the header file: | 
|---|
|  | 4863 | $ cat<<'EOF'>LIC.TXT | 
|---|
|  | 4864 | /* | 
|---|
|  | 4865 | Copyright (C) 1989-2021 FOO BAR | 
|---|
|  | 4866 |  | 
|---|
|  | 4867 | This program is free software; you can redistribute it and/or modify | 
|---|
|  | 4868 | it under the terms of the GNU General Public License as published by | 
|---|
|  | 4869 | the Free Software Foundation; either version 3, or (at your option) | 
|---|
|  | 4870 | any later version. | 
|---|
|  | 4871 |  | 
|---|
|  | 4872 | This program is distributed in the hope that it will be useful, | 
|---|
|  | 4873 | but WITHOUT ANY WARRANTY; without even the implied warranty of | 
|---|
|  | 4874 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the | 
|---|
|  | 4875 | GNU General Public License for more details. | 
|---|
|  | 4876 |  | 
|---|
|  | 4877 | You should have received a copy of the GNU General Public License | 
|---|
|  | 4878 | along with this program; If not, see <https://www.gnu.org/licenses/>. | 
|---|
|  | 4879 | */ | 
|---|
|  | 4880 | EOF | 
|---|
|  | 4881 |  | 
|---|
|  | 4882 | ## Add the file at the beginning of all source code files: | 
|---|
|  | 4883 | $ sed -i '0rLIC.TXT' *.cpp *.h | 
|---|
|  | 4884 | @end example | 
|---|
|  | 4885 | @codequoteundirected off | 
|---|
|  | 4886 | @codequotebacktick off | 
|---|
|  | 4887 |  | 
|---|
|  | 4888 |  | 
|---|
|  | 4889 | With script files (e.g. @file{.sh}, @file{.py}, @file{.pl} files) | 
|---|
|  | 4890 | the license notice typically appears @emph{after} the first line (the | 
|---|
|  | 4891 | ``shebang'' @samp{#!} line). The @code{1rFILE} command will add @file{FILE} | 
|---|
|  | 4892 | @emph{after} the first line: | 
|---|
|  | 4893 |  | 
|---|
|  | 4894 | @codequoteundirected on | 
|---|
|  | 4895 | @codequotebacktick on | 
|---|
|  | 4896 | @example | 
|---|
|  | 4897 | ## Create the header file: | 
|---|
|  | 4898 | $ cat<<'EOF'>LIC.TXT | 
|---|
|  | 4899 | ## | 
|---|
|  | 4900 | ## Copyright (C) 1989-2021 FOO BAR | 
|---|
|  | 4901 | ## | 
|---|
|  | 4902 | ## This program is free software; you can redistribute it and/or modify | 
|---|
|  | 4903 | ## it under the terms of the GNU General Public License as published by | 
|---|
|  | 4904 | ## the Free Software Foundation; either version 3, or (at your option) | 
|---|
|  | 4905 | ## any later version. | 
|---|
|  | 4906 | ## | 
|---|
|  | 4907 | ## This program is distributed in the hope that it will be useful, | 
|---|
|  | 4908 | ## but WITHOUT ANY WARRANTY; without even the implied warranty of | 
|---|
|  | 4909 | ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the | 
|---|
|  | 4910 | ## GNU General Public License for more details. | 
|---|
|  | 4911 | ## | 
|---|
|  | 4912 | ## You should have received a copy of the GNU General Public License | 
|---|
|  | 4913 | ## along with this program; If not, see <https://www.gnu.org/licenses/>. | 
|---|
|  | 4914 | ## | 
|---|
|  | 4915 | ## | 
|---|
|  | 4916 | EOF | 
|---|
|  | 4917 |  | 
|---|
|  | 4918 | ## Add the file after the first line of all script files: | 
|---|
|  | 4919 | $ sed -i '1rLIC.TXT' *.py *.sh | 
|---|
|  | 4920 | @end example | 
|---|
|  | 4921 | @codequoteundirected off | 
|---|
|  | 4922 | @codequotebacktick off | 
|---|
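The placement can be verified on a throwaway copy before touching real
files.  The following sketch assumes GNU @command{sed} and
@command{mktemp}; the function name and the file names are invented for
the example.

```shell
#!/bin/sh
# Hypothetical helper: add_notice_after_shebang NOTICE FILE...
# Assumes GNU sed and that every FILE starts with a "#!" line.
# With -i, line numbers restart for each file, so address 1 is the
# first line of every FILE.
add_notice_after_shebang() {
  notice=$1
  shift
  sed -i "1r $notice" "$@"
}

# Try it in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '## Copyright (C) FOO BAR' > "$workdir/LIC.TXT"
printf '%s\n' '#!/bin/sh' 'echo hello' > "$workdir/hello.sh"

add_notice_after_shebang "$workdir/LIC.TXT" "$workdir/hello.sh"
cat "$workdir/hello.sh"
rm -rf "$workdir"
```

The notice ends up between the @samp{#!} line and the rest of the
script.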
|  | 4923 |  | 
|---|
|  | 4924 | The above @command{sed} commands can be combined with @command{find} | 
|---|
|  | 4925 | to locate files in all subdirectories, @command{xargs} to run additional | 
|---|
|  | 4926 | commands on selected files and @command{grep} to filter out files that already | 
|---|
|  | 4927 | contain a copyright notice: | 
|---|
|  | 4928 |  | 
|---|
|  | 4929 | @codequoteundirected on | 
|---|
|  | 4930 | @codequotebacktick on | 
|---|
|  | 4931 | @example | 
|---|
|  | 4932 | find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \ | 
|---|
|  | 4933 | | xargs grep -Li copyright \ | 
|---|
|  | 4934 | | xargs -r sed -i '0rLIC.TXT' | 
|---|
|  | 4935 | @end example | 
|---|
|  | 4936 | @codequoteundirected off | 
|---|
|  | 4937 | @codequotebacktick off | 
|---|
|  | 4938 |  | 
|---|
|  | 4939 | @exdent Or a slightly safer version (handling files with spaces and newlines): | 
|---|
|  | 4940 |  | 
|---|
|  | 4941 | @codequoteundirected on | 
|---|
|  | 4942 | @codequotebacktick on | 
|---|
|  | 4943 | @example | 
|---|
|  | 4944 | find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \ | 
|---|
|  | 4945 | | xargs -0 grep -Z -Li copyright \ | 
|---|
|  | 4946 | | xargs -0 -r sed -i '0rLIC.TXT' | 
|---|
|  | 4947 | @end example | 
|---|
|  | 4948 | @codequoteundirected off | 
|---|
|  | 4949 | @codequotebacktick off | 
|---|
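The filtering step can be rehearsed in a scratch directory.  The sketch
below is an illustration under stated assumptions: it needs GNU
@command{find}, @command{grep}, @command{xargs} and @command{sed}, the
function name is made up, and it uses the @code{1i} form shown earlier
in this section instead of @code{0r}.  As a consequence it leaves empty
files untouched, since they have no first line.

```shell
#!/bin/sh
# Hypothetical helper: add_line_if_no_copyright DIR
# Adds a one-line notice to every *.c and *.h file below DIR that does
# not already mention "copyright" (case-insensitive).
add_line_if_no_copyright() {
  find "$1" \( -iname '*.c' -o -iname '*.h' \) -type f -print0 |
    xargs -0 -r grep -Z -Li copyright |
    xargs -0 -r sed -i '1i/* Copyright (C) FOO BAR */'
}

workdir=$(mktemp -d)
printf '%s\n' 'int main(void) { return 0; }' > "$workdir/new file.c"
printf '%s\n' '/* Copyright (C) 2001 Someone */' 'int x;' > "$workdir/old.c"

add_line_if_no_copyright "$workdir"
head -n 1 "$workdir/new file.c"
head -n 1 "$workdir/old.c"
rm -rf "$workdir"
```

Only @file{new file.c} receives the new first line; @file{old.c} keeps
its existing notice.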
|  | 4950 |  | 
|---|
|  | 4951 | Note: using the @code{0} address with @code{r} command requires @value{SSED} | 
|---|
|  | 4952 | version 4.9 or later. @xref{Zero Address}. | 
|---|
|  | 4953 |  | 
|---|
|  | 4954 |  | 
|---|
|  | 4955 |  | 
|---|
| [599] | 4956 | @node tac | 
|---|
|  | 4957 | @section Reverse Lines of Files | 
|---|
|  | 4958 |  | 
|---|
|  | 4959 | This one begins a series of totally useless (yet interesting) | 
|---|
|  | 4960 | scripts emulating various Unix commands.  This, in particular, | 
|---|
|  | 4961 | is a @command{tac} workalike. | 
|---|
|  | 4962 |  | 
|---|
| [3613] | 4963 | Note that on implementations other than GNU @command{sed} | 
|---|
| [599] | 4964 | this script might easily overflow internal buffers. | 
|---|
|  | 4965 |  | 
|---|
|  | 4966 | @c start------------------------------------------- | 
|---|
|  | 4967 | @example | 
|---|
|  | 4968 | #!/usr/bin/sed -nf | 
|---|
|  | 4969 |  | 
|---|
|  | 4970 | # reverse all lines of input, i.e. first line becomes last, ... | 
|---|
|  | 4971 |  | 
|---|
|  | 4972 | @group | 
|---|
|  | 4973 | # from the second line, the buffer (which contains all previous lines) | 
|---|
|  | 4974 | # is *appended* to current line, so, the order will be reversed | 
|---|
|  | 4975 | 1! G | 
|---|
|  | 4976 | @end group | 
|---|
|  | 4977 |  | 
|---|
|  | 4978 | @group | 
|---|
|  | 4979 | # on the last line we're done -- print everything | 
|---|
|  | 4980 | $ p | 
|---|
|  | 4981 | @end group | 
|---|
|  | 4982 |  | 
|---|
|  | 4983 | @group | 
|---|
|  | 4984 | # store everything on the buffer again | 
|---|
|  | 4985 | h | 
|---|
|  | 4986 | @end group | 
|---|
|  | 4987 | @end example | 
|---|
|  | 4988 | @c end--------------------------------------------- | 
|---|
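The script above can be condensed to a one-liner, and the growth of the
buffer can be watched with the @code{l} command.  This is a small
sketch assuming GNU @command{sed}; both function names are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; assume GNU sed.

# Same commands as the script above, on one line.
sed_tac() {
  sed -n '1!G; $p; h' "$@"
}

# Show the pattern space after each G: it holds the current line
# followed by all earlier lines, newest first.
show_tac_buffer() {
  sed -n '1!G; l; h' "$@"
}

printf '%s\n' first second third | sed_tac
printf '%s\n' first second third | show_tac_buffer
```

The first command prints the three lines in reverse order.  The second
prints @samp{first$}, @samp{second\nfirst$} and
@samp{third\nsecond\nfirst$}, which shows that the whole input is
accumulated before anything is printed.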
|  | 4989 |  | 
|---|
|  | 4990 | @node cat -n | 
|---|
|  | 4991 | @section Numbering Lines | 
|---|
|  | 4992 |  | 
|---|
|  | 4993 | This script replaces @samp{cat -n}; in fact it formats its output | 
|---|
| [3613] | 4994 | exactly like GNU @command{cat} does. | 
|---|
| [599] | 4995 |  | 
|---|
|  | 4996 | Of course this is completely useless, for two reasons:  first, | 
|---|
|  | 4997 | because somebody else did it in C, second, because the following | 
|---|
|  | 4998 | Bourne-shell script could be used for the same purpose and would | 
|---|
|  | 4999 | be much faster: | 
|---|
|  | 5000 |  | 
|---|
|  | 5001 | @c start------------------------------------------- | 
|---|
|  | 5002 | @example | 
|---|
|  | 5003 | @group | 
|---|
|  | 5004 | #! /bin/sh | 
|---|
|  | 5005 | sed -e "=" $@@ | sed -e ' | 
|---|
|  | 5006 | s/^/      / | 
|---|
|  | 5007 | N | 
|---|
|  | 5008 | s/^ *\(......\)\n/\1  / | 
|---|
|  | 5009 | ' | 
|---|
|  | 5010 | @end group | 
|---|
|  | 5011 | @end example | 
|---|
|  | 5012 | @c end--------------------------------------------- | 
|---|
|  | 5013 |  | 
|---|
|  | 5014 | It uses @command{sed} to print the line number, then groups lines two | 
|---|
|  | 5015 | by two using @code{N}.  Of course, this script does not teach as much as | 
|---|
|  | 5016 | the one presented below. | 
|---|
|  | 5017 |  | 
|---|
|  | 5018 | The algorithm used for incrementing uses both buffers, so the line | 
|---|
|  | 5019 | is printed as soon as possible and then discarded.  The number | 
|---|
|  | 5020 | is split so that changing digits go in a buffer and unchanged ones go | 
|---|
|  | 5021 | in the other; the changed digits are modified in a single step | 
|---|
|  | 5022 | (using a @code{y} command).  The line number for the next line | 
|---|
|  | 5023 | is then composed and stored in the hold space, to be used in the | 
|---|
|  | 5024 | next iteration. | 
|---|
|  | 5025 |  | 
|---|
|  | 5026 | @c start------------------------------------------- | 
|---|
|  | 5027 | @example | 
|---|
|  | 5028 | #!/usr/bin/sed -nf | 
|---|
|  | 5029 |  | 
|---|
|  | 5030 | @group | 
|---|
|  | 5031 | # Prime the pump on the first line | 
|---|
|  | 5032 | x | 
|---|
|  | 5033 | /^$/ s/^.*$/1/ | 
|---|
|  | 5034 | @end group | 
|---|
|  | 5035 |  | 
|---|
|  | 5036 | @group | 
|---|
|  | 5037 | # Add the correct line number before the pattern | 
|---|
|  | 5038 | G | 
|---|
|  | 5039 | h | 
|---|
|  | 5040 | @end group | 
|---|
|  | 5041 |  | 
|---|
|  | 5042 | @group | 
|---|
|  | 5043 | # Format it and print it | 
|---|
|  | 5044 | s/^/      / | 
|---|
|  | 5045 | s/^ *\(......\)\n/\1  /p | 
|---|
|  | 5046 | @end group | 
|---|
|  | 5047 |  | 
|---|
|  | 5048 | @group | 
|---|
|  | 5049 | # Get the line number from hold space; add a zero | 
|---|
|  | 5050 | # if we're going to add a digit on the next line | 
|---|
|  | 5051 | g | 
|---|
|  | 5052 | s/\n.*$// | 
|---|
|  | 5053 | /^9*$/ s/^/0/ | 
|---|
|  | 5054 | @end group | 
|---|
|  | 5055 |  | 
|---|
|  | 5056 | @group | 
|---|
|  | 5057 | # separate changing/unchanged digits with an x | 
|---|
|  | 5058 | s/.9*$/x&/ | 
|---|
|  | 5059 | @end group | 
|---|
|  | 5060 |  | 
|---|
|  | 5061 | @group | 
|---|
|  | 5062 | # keep changing digits in hold space | 
|---|
|  | 5063 | h | 
|---|
|  | 5064 | s/^.*x// | 
|---|
|  | 5065 | y/0123456789/1234567890/ | 
|---|
|  | 5066 | x | 
|---|
|  | 5067 | @end group | 
|---|
|  | 5068 |  | 
|---|
|  | 5069 | @group | 
|---|
|  | 5070 | # keep unchanged digits in pattern space | 
|---|
|  | 5071 | s/x.*$// | 
|---|
|  | 5072 | @end group | 
|---|
|  | 5073 |  | 
|---|
|  | 5074 | @group | 
|---|
|  | 5075 | # compose the new number, remove the newline implicitly added by G | 
|---|
|  | 5076 | G | 
|---|
|  | 5077 | s/\n// | 
|---|
|  | 5078 | h | 
|---|
|  | 5079 | @end group | 
|---|
|  | 5080 | @end example | 
|---|
|  | 5081 | @c end--------------------------------------------- | 
|---|
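The increment step can be studied on its own by feeding it plain
numbers.  The sketch below reuses only the increment commands of the
script above; the function name and the one-number-per-line input
convention are assumptions made for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads one non-negative decimal number per line
# and prints that number plus one, using only the increment steps.
increment() {
  sed '
# add a leading zero if every digit is a 9 (a new digit is needed)
/^9*$/ s/^/0/
# mark the changing part: the last non-9 digit and the 9s after it
s/.9*$/x&/
# hold space keeps the incremented changing digits
h
s/^.*x//
y/0123456789/1234567890/
x
# pattern space keeps the unchanged leading digits
s/x.*$//
# join both parts again
G
s/\n//
'
}

printf '%s\n' 9 128 199 | increment
```

For the three sample inputs this prints 10, 129 and 200.  For 199, the
marker is placed before the whole number, all three digits go through
the @code{y} command at once, and nothing is left in the unchanged
part.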
|  | 5082 |  | 
|---|
|  | 5083 | @node cat -b | 
|---|
|  | 5084 | @section Numbering Non-blank Lines | 
|---|
|  | 5085 |  | 
|---|
|  | 5086 | Emulating @samp{cat -b} is almost the same as @samp{cat -n}---we only | 
|---|
|  | 5087 | have to select which lines are to be numbered and which are not. | 
|---|
|  | 5088 |  | 
|---|
|  | 5089 | The part that is common to this script and the previous one is | 
|---|
|  | 5090 | not commented to show how important it is to comment @command{sed} | 
|---|
|  | 5091 | scripts properly... | 
|---|
|  | 5092 |  | 
|---|
|  | 5093 | @c start------------------------------------------- | 
|---|
|  | 5094 | @example | 
|---|
|  | 5095 | #!/usr/bin/sed -nf | 
|---|
|  | 5096 |  | 
|---|
|  | 5097 | @group | 
|---|
|  | 5098 | /^$/ @{ | 
|---|
|  | 5099 | p | 
|---|
|  | 5100 | b | 
|---|
|  | 5101 | @} | 
|---|
|  | 5102 | @end group | 
|---|
|  | 5103 |  | 
|---|
|  | 5104 | @group | 
|---|
|  | 5105 | # Same as cat -n from now | 
|---|
|  | 5106 | x | 
|---|
|  | 5107 | /^$/ s/^.*$/1/ | 
|---|
|  | 5108 | G | 
|---|
|  | 5109 | h | 
|---|
|  | 5110 | s/^/      / | 
|---|
|  | 5111 | s/^ *\(......\)\n/\1  /p | 
|---|
|  | 5112 | x | 
|---|
|  | 5113 | s/\n.*$// | 
|---|
|  | 5114 | /^9*$/ s/^/0/ | 
|---|
|  | 5115 | s/.9*$/x&/ | 
|---|
|  | 5116 | h | 
|---|
|  | 5117 | s/^.*x// | 
|---|
|  | 5118 | y/0123456789/1234567890/ | 
|---|
|  | 5119 | x | 
|---|
|  | 5120 | s/x.*$// | 
|---|
|  | 5121 | G | 
|---|
|  | 5122 | s/\n// | 
|---|
|  | 5123 | h | 
|---|
|  | 5124 | @end group | 
|---|
|  | 5125 | @end example | 
|---|
|  | 5126 | @c end--------------------------------------------- | 
|---|
|  | 5127 |  | 
|---|
|  | 5128 | @node wc -c | 
|---|
|  | 5129 | @section Counting Characters | 
|---|
|  | 5130 |  | 
|---|
|  | 5131 | This script shows another way to do arithmetic with @command{sed}. | 
|---|
|  | 5132 | In this case we have to add possibly large numbers, so implementing | 
|---|
|  | 5133 | this by successive increments would not be feasible (and possibly | 
|---|
|  | 5134 | even more complicated to contrive than this script). | 
|---|
|  | 5135 |  | 
|---|
|  | 5136 | The approach is to map numbers to letters, kind of an abacus | 
|---|
|  | 5137 | implemented with @command{sed}.  @samp{a}s are units, @samp{b}s are | 
|---|
|  | 5138 | tens and so on: we simply add the number of characters | 
|---|
|  | 5139 | on the current line as units, and then propagate the carry | 
|---|
|  | 5140 | to tens, hundreds, and so on. | 
|---|
|  | 5141 |  | 
|---|
|  | 5142 | As usual, running totals are kept in hold space. | 
|---|
|  | 5143 |  | 
|---|
|  | 5144 | On the last line, we convert the abacus form back to decimal. | 
|---|
|  | 5145 | For the sake of variety, this is done with a loop rather than | 
|---|
|  | 5146 | with some 80 @code{s} commands@footnote{Some implementations | 
|---|
|  | 5147 | have a limit of 199 commands per script}: first we | 
|---|
|  | 5148 | convert units, removing @samp{a}s from the number; then we | 
|---|
|  | 5149 | rotate letters so that tens become @samp{a}s, and so on | 
|---|
|  | 5150 | until no more letters remain. | 
|---|
|  | 5151 |  | 
|---|
|  | 5152 | @c start------------------------------------------- | 
|---|
|  | 5153 | @example | 
|---|
|  | 5154 | #!/usr/bin/sed -nf | 
|---|
|  | 5155 |  | 
|---|
|  | 5156 | @group | 
|---|
|  | 5157 | # Add n+1 a's to hold space (+1 is for the newline) | 
|---|
|  | 5158 | s/./a/g | 
|---|
|  | 5159 | H | 
|---|
|  | 5160 | x | 
|---|
|  | 5161 | s/\n/a/ | 
|---|
|  | 5162 | @end group | 
|---|
|  | 5163 |  | 
|---|
|  | 5164 | @group | 
|---|
|  | 5165 | # Do the carry.  The t's and b's are not necessary, | 
|---|
|  | 5166 | # but they do speed up the thing | 
|---|
|  | 5167 | t a | 
|---|
|  | 5168 | : a;  s/aaaaaaaaaa/b/g; t b; b done | 
|---|
|  | 5169 | : b;  s/bbbbbbbbbb/c/g; t c; b done | 
|---|
|  | 5170 | : c;  s/cccccccccc/d/g; t d; b done | 
|---|
|  | 5171 | : d;  s/dddddddddd/e/g; t e; b done | 
|---|
|  | 5172 | : e;  s/eeeeeeeeee/f/g; t f; b done | 
|---|
|  | 5173 | : f;  s/ffffffffff/g/g; t g; b done | 
|---|
|  | 5174 | : g;  s/gggggggggg/h/g; t h; b done | 
|---|
|  | 5175 | : h;  s/hhhhhhhhhh//g | 
|---|
|  | 5176 | @end group | 
|---|
|  | 5177 |  | 
|---|
|  | 5178 | @group | 
|---|
|  | 5179 | : done | 
|---|
|  | 5180 | $! @{ | 
|---|
|  | 5181 | h | 
|---|
|  | 5182 | b | 
|---|
|  | 5183 | @} | 
|---|
|  | 5184 | @end group | 
|---|
|  | 5185 |  | 
|---|
|  | 5186 | # On the last line, convert back to decimal | 
|---|
|  | 5187 |  | 
|---|
|  | 5188 | @group | 
|---|
|  | 5189 | : loop | 
|---|
|  | 5190 | /a/! s/[b-h]*/&0/ | 
|---|
|  | 5191 | s/aaaaaaaaa/9/ | 
|---|
|  | 5192 | s/aaaaaaaa/8/ | 
|---|
|  | 5193 | s/aaaaaaa/7/ | 
|---|
|  | 5194 | s/aaaaaa/6/ | 
|---|
|  | 5195 | s/aaaaa/5/ | 
|---|
|  | 5196 | s/aaaa/4/ | 
|---|
|  | 5197 | s/aaa/3/ | 
|---|
|  | 5198 | s/aa/2/ | 
|---|
|  | 5199 | s/a/1/ | 
|---|
|  | 5200 | @end group | 
|---|
|  | 5201 |  | 
|---|
|  | 5202 | @group | 
|---|
|  | 5203 | : next | 
|---|
|  | 5204 | y/bcdefgh/abcdefg/ | 
|---|
|  | 5205 | /[a-h]/ b loop | 
|---|
|  | 5206 | p | 
|---|
|  | 5207 | @end group | 
|---|
|  | 5208 | @end example | 
|---|
|  | 5209 | @c end--------------------------------------------- | 
|---|
|  | 5210 |  | 
|---|
|  | 5211 | @node wc -w | 
|---|
|  | 5212 | @section Counting Words | 
|---|
|  | 5213 |  | 
|---|
|  | 5214 | This script is almost the same as the previous one, once each | 
|---|
|  | 5215 | of the words on the line is converted to a single @samp{a} | 
|---|
|  | 5216 | (in the previous script each letter was changed to an @samp{a}). | 
|---|
|  | 5217 |  | 
|---|
|  | 5218 | Interestingly, real @command{wc} programs have optimized | 
|---|
|  | 5219 | loops for @samp{wc -c}, so they are much slower at counting | 
|---|
|  | 5220 | words than at counting characters.  This script's bottleneck, | 
|---|
|  | 5221 | by contrast, is arithmetic, so the word-counting script | 
|---|
|  | 5222 | is the faster one (it has to manage smaller numbers). | 
|---|
|  | 5223 |  | 
|---|
|  | 5224 | Again, the common parts are not commented to show the importance | 
|---|
|  | 5225 | of commenting @command{sed} scripts. | 
|---|
|  | 5226 |  | 
|---|
|  | 5227 | @c start------------------------------------------- | 
|---|
|  | 5228 | @example | 
|---|
|  | 5229 | #!/usr/bin/sed -nf | 
|---|
|  | 5230 |  | 
|---|
|  | 5231 | @group | 
|---|
|  | 5232 | # Convert words to a's | 
|---|
| [3613] | 5233 | s/[ @kbd{@key{TAB}}][ @kbd{@key{TAB}}]*/ /g | 
|---|
| [599] | 5234 | s/^/ / | 
|---|
|  | 5235 | s/ [^ ][^ ]*/a /g | 
|---|
|  | 5236 | s/ //g | 
|---|
|  | 5237 | @end group | 
|---|
|  | 5238 |  | 
|---|
|  | 5239 | @group | 
|---|
|  | 5240 | # Append them to hold space | 
|---|
|  | 5241 | H | 
|---|
|  | 5242 | x | 
|---|
|  | 5243 | s/\n// | 
|---|
|  | 5244 | @end group | 
|---|
|  | 5245 |  | 
|---|
|  | 5246 | @group | 
|---|
|  | 5247 | # From here on it is the same as in wc -c. | 
|---|
|  | 5248 | /aaaaaaaaaa/! bx;   s/aaaaaaaaaa/b/g | 
|---|
|  | 5249 | /bbbbbbbbbb/! bx;   s/bbbbbbbbbb/c/g | 
|---|
|  | 5250 | /cccccccccc/! bx;   s/cccccccccc/d/g | 
|---|
|  | 5251 | /dddddddddd/! bx;   s/dddddddddd/e/g | 
|---|
|  | 5252 | /eeeeeeeeee/! bx;   s/eeeeeeeeee/f/g | 
|---|
|  | 5253 | /ffffffffff/! bx;   s/ffffffffff/g/g | 
|---|
|  | 5254 | /gggggggggg/! bx;   s/gggggggggg/h/g | 
|---|
|  | 5255 | s/hhhhhhhhhh//g | 
|---|
|  | 5256 | :x | 
|---|
|  | 5257 | $! @{ h; b; @} | 
|---|
|  | 5258 | :y | 
|---|
|  | 5259 | /a/! s/[b-h]*/&0/ | 
|---|
|  | 5260 | s/aaaaaaaaa/9/ | 
|---|
|  | 5261 | s/aaaaaaaa/8/ | 
|---|
|  | 5262 | s/aaaaaaa/7/ | 
|---|
|  | 5263 | s/aaaaaa/6/ | 
|---|
|  | 5264 | s/aaaaa/5/ | 
|---|
|  | 5265 | s/aaaa/4/ | 
|---|
|  | 5266 | s/aaa/3/ | 
|---|
|  | 5267 | s/aa/2/ | 
|---|
|  | 5268 | s/a/1/ | 
|---|
|  | 5269 | y/bcdefgh/abcdefg/ | 
|---|
|  | 5270 | /[a-h]/ by | 
|---|
|  | 5271 | p | 
|---|
|  | 5272 | @end group | 
|---|
|  | 5273 | @end example | 
|---|
|  | 5274 | @c end--------------------------------------------- | 
|---|
|  | 5275 |  | 
|---|
|  | 5276 | @node wc -l | 
|---|
|  | 5277 | @section Counting Lines | 
|---|
|  | 5278 |  | 
|---|
|  | 5279 | Nothing unusual is needed here, because @command{sed} gives us | 
|---|
|  | 5280 | @samp{wc -l} functionality for free: | 
|---|
|  | 5281 |  | 
|---|
|  | 5282 | @c start------------------------------------------- | 
|---|
|  | 5283 | @example | 
|---|
|  | 5284 | @group | 
|---|
|  | 5285 | #!/usr/bin/sed -nf | 
|---|
|  | 5286 | $= | 
|---|
|  | 5287 | @end group | 
|---|
|  | 5288 | @end example | 
|---|
|  | 5289 | @c end--------------------------------------------- | 
|---|
|  | 5290 |  | 
|---|
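As a quick check of the script above, the same @code{$=} command can be wrapped in a shell function and fed a known input.  This is only a sketch; the function name @code{count_lines} is made up for illustration.

```shell
# count_lines: a hypothetical wrapper around the `$=` script shown above.
# -n suppresses normal output; `$=` prints the line number of the last line.
count_lines() {
    sed -n '$='
}

printf 'one\ntwo\nthree\n' | count_lines    # prints 3
```

Unlike @samp{wc -l}, this prints nothing at all on empty input, because there is no last line for @code{$} to match.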
|  | 5291 | @node head | 
|---|
|  | 5292 | @section Printing the First Lines | 
|---|
|  | 5293 |  | 
|---|
|  | 5294 | This script is probably the simplest useful @command{sed} script. | 
|---|
|  | 5295 | It displays the first 10 lines of input; the number of displayed | 
|---|
|  | 5296 | lines is right before the @code{q} command. | 
|---|
|  | 5297 |  | 
|---|
|  | 5298 | @c start------------------------------------------- | 
|---|
|  | 5299 | @example | 
|---|
|  | 5300 | @group | 
|---|
|  | 5301 | #!/usr/bin/sed -f | 
|---|
|  | 5302 | 10q | 
|---|
|  | 5303 | @end group | 
|---|
|  | 5304 | @end example | 
|---|
|  | 5305 | @c end--------------------------------------------- | 
|---|
|  | 5306 |  | 
|---|
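Since the line count is simply the address in front of @code{q}, the script is easy to parameterize from the shell.  The sketch below assumes a hypothetical function name, @code{first_lines}, and a positive integer argument.

```shell
# first_lines N: print the first N lines of standard input.
# N is a hypothetical parameter (a positive integer); it is substituted
# right before the `q` command, exactly where the script above has 10.
first_lines() {
    sed "${1}q"
}

printf '%s\n' a b c d e | first_lines 3    # prints a, b and c
```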
|  | 5307 | @node tail | 
|---|
|  | 5308 | @section Printing the Last Lines | 
|---|
|  | 5309 |  | 
|---|
|  | 5310 | Printing the last @var{n} lines rather than the first is more complex | 
|---|
|  | 5311 | but indeed possible.  @var{n} is encoded in the second line, before | 
|---|
|  | 5312 | the bang character. | 
|---|
|  | 5313 |  | 
|---|
|  | 5314 | This script is similar to the @command{tac} script in that it keeps the | 
|---|
|  | 5315 | final output in the hold space and prints it at the end: | 
|---|
|  | 5316 |  | 
|---|
|  | 5317 | @c start------------------------------------------- | 
|---|
|  | 5318 | @example | 
|---|
|  | 5319 | #!/usr/bin/sed -nf | 
|---|
|  | 5320 |  | 
|---|
|  | 5321 | @group | 
|---|
|  | 5322 | 1! @{; H; g; @} | 
|---|
|  | 5323 | 1,10 !s/[^\n]*\n// | 
|---|
|  | 5324 | $p | 
|---|
|  | 5325 | h | 
|---|
|  | 5326 | @end group | 
|---|
|  | 5327 | @end example | 
|---|
|  | 5328 | @c end--------------------------------------------- | 
|---|
|  | 5329 |  | 
|---|
|  | 5330 | Essentially, the script keeps a window of 10 lines and slides it | 
|---|
|  | 5331 | by adding a line and deleting the oldest (the substitution command | 
|---|
|  | 5332 | on the second line works like a @code{D} command but does not | 
|---|
|  | 5333 | restart the loop). | 
|---|
|  | 5334 |  | 
|---|
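To experiment with other window sizes, the hold-space script above can be wrapped in a shell function that substitutes the window size for the hard-coded 10.  This is a minimal sketch: the name @code{last_lines} is hypothetical, and the regexp relies on @code{\n} being accepted inside a bracket expression, as the original script does.

```shell
# last_lines N: print the last N lines of standard input (N >= 1).
# Same commands as the script above, with the hard-coded 10 replaced by
# the hypothetical parameter N.  Assumes a sed (such as GNU sed) that
# accepts \n inside a bracket expression.
last_lines() {
    sed -n \
        -e '1!{H;g;}' \
        -e "1,${1}"'!s/[^\n]*\n//' \
        -e '$p' \
        -e 'h'
}

printf '%s\n' a b c d e | last_lines 2    # prints d and e
```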
|  | 5335 | The ``sliding window'' technique is a very powerful way to write | 
|---|
|  | 5336 | efficient and complex @command{sed} scripts, because commands like | 
|---|
|  | 5337 | @code{P} would require a lot of work if implemented manually. | 
|---|
|  | 5338 |  | 
|---|
|  | 5339 | To introduce the technique, which is fully demonstrated in the | 
|---|
|  | 5340 | rest of this chapter and is based on the @code{N}, @code{P} | 
|---|
|  | 5341 | and @code{D} commands, here is an implementation of @command{tail} | 
|---|
|  | 5342 | using a simple ``sliding window.'' | 
|---|
|  | 5343 |  | 
|---|
|  | 5344 | This looks complicated, but in fact it works the same way as | 
|---|
|  | 5345 | the previous script: once the appropriate number of lines has | 
|---|
|  | 5346 | been read, however, we stop using the hold space to keep inter-line | 
|---|
|  | 5347 | state, and instead use @code{N} and @code{D} to slide pattern | 
|---|
|  | 5348 | space by one line: | 
|---|
|  | 5349 |  | 
|---|
|  | 5350 | @c start------------------------------------------- | 
|---|
|  | 5351 | @example | 
|---|
|  | 5352 | #!/usr/bin/sed -f | 
|---|
|  | 5353 |  | 
|---|
|  | 5354 | @group | 
|---|
|  | 5355 | 1h | 
|---|
|  | 5356 | 2,10 @{; H; g; @} | 
|---|
|  | 5357 | $q | 
|---|
|  | 5358 | 1,9d | 
|---|
|  | 5359 | N | 
|---|
|  | 5360 | D | 
|---|
|  | 5361 | @end group | 
|---|
|  | 5362 | @end example | 
|---|
|  | 5363 | @c end--------------------------------------------- | 
|---|
|  | 5364 |  | 
|---|
|  | 5365 | Note how the first, second and fourth lines are inactive after | 
|---|
|  | 5366 | the first ten lines of input.  After that, all the script does | 
|---|
|  | 5367 | is exit on the last line of input, append the next input | 
|---|
|  | 5368 | line to pattern space, and remove the first line. | 
|---|
|  | 5369 |  | 
|---|
|  | 5370 | @node uniq | 
|---|
|  | 5371 | @section Make Duplicate Lines Unique | 
|---|
|  | 5372 |  | 
|---|
|  | 5373 | This is an example of the art of using the @code{N}, @code{P} | 
|---|
|  | 5374 | and @code{D} commands, probably the most difficult to master. | 
|---|
|  | 5375 |  | 
|---|
|  | 5376 | @c start------------------------------------------- | 
|---|
|  | 5377 | @example | 
|---|
|  | 5378 | @group | 
|---|
|  | 5379 | #!/usr/bin/sed -f | 
|---|
|  | 5380 | h | 
|---|
|  | 5381 | @end group | 
|---|
|  | 5382 |  | 
|---|
|  | 5383 | @group | 
|---|
|  | 5384 | :b | 
|---|
|  | 5385 | # On the last line, print and exit | 
|---|
|  | 5386 | $b | 
|---|
|  | 5387 | N | 
|---|
|  | 5388 | /^\(.*\)\n\1$/ @{ | 
|---|
|  | 5389 | # The two lines are identical.  Undo the effect of | 
|---|
|  | 5390 | # the N command. | 
|---|
|  | 5391 | g | 
|---|
|  | 5392 | bb | 
|---|
|  | 5393 | @} | 
|---|
|  | 5394 | @end group | 
|---|
|  | 5395 |  | 
|---|
|  | 5396 | @group | 
|---|
|  | 5397 | # If the @code{N} command had added the last line, print and exit | 
|---|
|  | 5398 | $b | 
|---|
|  | 5399 | @end group | 
|---|
|  | 5400 |  | 
|---|
|  | 5401 | @group | 
|---|
|  | 5402 | # The lines are different; print the first and go | 
|---|
|  | 5403 | # back working on the second. | 
|---|
|  | 5404 | P | 
|---|
|  | 5405 | D | 
|---|
|  | 5406 | @end group | 
|---|
|  | 5407 | @end example | 
|---|
|  | 5408 | @c end--------------------------------------------- | 
|---|
|  | 5409 |  | 
|---|
| [3613] | 5410 | As you can see, we maintain a 2-line window using @code{P} and @code{D}. | 
|---|
| [599] | 5411 | This technique is often used in advanced @command{sed} scripts. | 
|---|
|  | 5412 |  | 
|---|
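The same two-line window can also be expressed more compactly.  The sketch below is a commonly seen variant rather than the script above: it skips the hold space and lets @code{D} restart the cycle with the remaining line.  The function name @code{squeeze_dups} is made up for illustration.

```shell
# squeeze_dups: a compact variant of the two-line window technique.
#   $!N               append the next line, except on the last line
#   /^\(.*\)\n\1$/!P  if the two lines differ, print the first one
#   D                 drop the first line and restart with what is left
squeeze_dups() {
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf '%s\n' a a b b b c | squeeze_dups    # prints a, b and c
```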
|  | 5413 | @node uniq -d | 
|---|
|  | 5414 | @section Print Duplicated Lines of Input | 
|---|
|  | 5415 |  | 
|---|
|  | 5416 | This script prints only duplicated lines, like @samp{uniq -d}. | 
|---|
|  | 5417 |  | 
|---|
|  | 5418 | @c start------------------------------------------- | 
|---|
|  | 5419 | @example | 
|---|
|  | 5420 | #!/usr/bin/sed -nf | 
|---|
|  | 5421 |  | 
|---|
|  | 5422 | @group | 
|---|
|  | 5423 | $b | 
|---|
|  | 5424 | N | 
|---|
|  | 5425 | /^\(.*\)\n\1$/ @{ | 
|---|
|  | 5426 | # Print the first of the duplicated lines | 
|---|
|  | 5427 | s/.*\n// | 
|---|
|  | 5428 | p | 
|---|
|  | 5429 | @end group | 
|---|
|  | 5430 |  | 
|---|
|  | 5431 | @group | 
|---|
|  | 5432 | # Loop until we get a different line | 
|---|
|  | 5433 | :b | 
|---|
|  | 5434 | $b | 
|---|
|  | 5435 | N | 
|---|
|  | 5436 | /^\(.*\)\n\1$/ @{ | 
|---|
|  | 5437 | s/.*\n// | 
|---|
|  | 5438 | bb | 
|---|
|  | 5439 | @} | 
|---|
|  | 5440 | @} | 
|---|
|  | 5441 | @end group | 
|---|
|  | 5442 |  | 
|---|
|  | 5443 | @group | 
|---|
|  | 5444 | # The last line cannot be followed by duplicates | 
|---|
|  | 5445 | $b | 
|---|
|  | 5446 | @end group | 
|---|
|  | 5447 |  | 
|---|
|  | 5448 | @group | 
|---|
|  | 5449 | # Found a different one.  Leave it alone in the pattern space | 
|---|
|  | 5450 | # and go back to the top, hunting its duplicates | 
|---|
|  | 5451 | D | 
|---|
|  | 5452 | @end group | 
|---|
|  | 5453 | @end example | 
|---|
|  | 5454 | @c end--------------------------------------------- | 
|---|
|  | 5455 |  | 
|---|
|  | 5456 | @node uniq -u | 
|---|
|  | 5457 | @section Remove All Duplicated Lines | 
|---|
|  | 5458 |  | 
|---|
|  | 5459 | This script prints only unique lines, like @samp{uniq -u}. | 
|---|
|  | 5460 |  | 
|---|
|  | 5461 | @c start------------------------------------------- | 
|---|
|  | 5462 | @example | 
|---|
|  | 5463 | #!/usr/bin/sed -f | 
|---|
|  | 5464 |  | 
|---|
|  | 5465 | @group | 
|---|
|  | 5466 | # Search for a duplicate line --- until that, print what you find. | 
|---|
|  | 5467 | $b | 
|---|
|  | 5468 | N | 
|---|
|  | 5469 | /^\(.*\)\n\1$/ ! @{ | 
|---|
|  | 5470 | P | 
|---|
|  | 5471 | D | 
|---|
|  | 5472 | @} | 
|---|
|  | 5473 | @end group | 
|---|
|  | 5474 |  | 
|---|
|  | 5475 | @group | 
|---|
|  | 5476 | :c | 
|---|
|  | 5477 | # Got two equal lines in pattern space.  At the | 
|---|
|  | 5478 | # end of the file we simply exit | 
|---|
|  | 5479 | $d | 
|---|
|  | 5480 | @end group | 
|---|
|  | 5481 |  | 
|---|
|  | 5482 | @group | 
|---|
|  | 5483 | # Else, we keep reading lines with @code{N} until we | 
|---|
|  | 5484 | # find a different one | 
|---|
|  | 5485 | s/.*\n// | 
|---|
|  | 5486 | N | 
|---|
|  | 5487 | /^\(.*\)\n\1$/ @{ | 
|---|
|  | 5488 | bc | 
|---|
|  | 5489 | @} | 
|---|
|  | 5490 | @end group | 
|---|
|  | 5491 |  | 
|---|
|  | 5492 | @group | 
|---|
|  | 5493 | # Remove the last instance of the duplicate line | 
|---|
|  | 5494 | # and go back to the top | 
|---|
|  | 5495 | D | 
|---|
|  | 5496 | @end group | 
|---|
|  | 5497 | @end example | 
|---|
|  | 5498 | @c end--------------------------------------------- | 
|---|
|  | 5499 |  | 
|---|
|  | 5500 | @node cat -s | 
|---|
|  | 5501 | @section Squeezing Blank Lines | 
|---|
|  | 5502 |  | 
|---|
|  | 5503 | As a final example, here are three scripts, of increasing complexity | 
|---|
|  | 5504 | and speed, that implement the same function as @samp{cat -s}, that is, | 
|---|
|  | 5505 | squeezing blank lines. | 
|---|
|  | 5506 |  | 
|---|
|  | 5507 | The first leaves a blank line at the beginning and end if there are | 
|---|
|  | 5508 | some already. | 
|---|
|  | 5509 |  | 
|---|
|  | 5510 | @c start------------------------------------------- | 
|---|
|  | 5511 | @example | 
|---|
|  | 5512 | #!/usr/bin/sed -f | 
|---|
|  | 5513 |  | 
|---|
|  | 5514 | @group | 
|---|
|  | 5515 | # on empty lines, join with next | 
|---|
|  | 5516 | # Note there is a star in the regexp | 
|---|
|  | 5517 | :x | 
|---|
|  | 5518 | /^\n*$/ @{ | 
|---|
|  | 5519 | N | 
|---|
|  | 5520 | bx | 
|---|
|  | 5521 | @} | 
|---|
|  | 5522 | @end group | 
|---|
|  | 5523 |  | 
|---|
|  | 5524 | @group | 
|---|
|  | 5525 | # now, squeeze all '\n', this can be also done by: | 
|---|
|  | 5526 | # s/^\(\n\)*/\1/ | 
|---|
|  | 5527 | s/\n*/\ | 
|---|
|  | 5528 | / | 
|---|
|  | 5529 | @end group | 
|---|
|  | 5530 | @end example | 
|---|
|  | 5531 | @c end--------------------------------------------- | 
|---|
|  | 5532 |  | 
|---|
|  | 5533 | This one is a bit more complex and removes all empty lines | 
|---|
|  | 5534 | at the beginning.  It does leave a single blank line at end | 
|---|
|  | 5535 | if one was there. | 
|---|
|  | 5536 |  | 
|---|
|  | 5537 | @c start------------------------------------------- | 
|---|
|  | 5538 | @example | 
|---|
|  | 5539 | #!/usr/bin/sed -f | 
|---|
|  | 5540 |  | 
|---|
|  | 5541 | @group | 
|---|
|  | 5542 | # delete all leading empty lines | 
|---|
|  | 5543 | 1,/^./@{ | 
|---|
|  | 5544 | /./!d | 
|---|
|  | 5545 | @} | 
|---|
|  | 5546 | @end group | 
|---|
|  | 5547 |  | 
|---|
|  | 5548 | @group | 
|---|
|  | 5549 | # on an empty line we remove it and all the following | 
|---|
|  | 5550 | # empty lines, but one | 
|---|
|  | 5551 | :x | 
|---|
|  | 5552 | /./!@{ | 
|---|
|  | 5553 | N | 
|---|
|  | 5554 | s/^\n$// | 
|---|
|  | 5555 | tx | 
|---|
|  | 5556 | @} | 
|---|
|  | 5557 | @end group | 
|---|
|  | 5558 | @end example | 
|---|
|  | 5559 | @c end--------------------------------------------- | 
|---|
|  | 5560 |  | 
|---|
|  | 5561 | This removes leading and trailing blank lines.  It is also the | 
|---|
|  | 5562 | fastest.  Note that loops are completely done with @code{n} and | 
|---|
|  | 5563 | @code{b}, without relying on @command{sed} to restart the | 
|---|
| [3613] | 5564 | script automatically at the end of a line. | 
|---|
| [599] | 5565 |  | 
|---|
|  | 5566 | @c start------------------------------------------- | 
|---|
|  | 5567 | @example | 
|---|
|  | 5568 | #!/usr/bin/sed -nf | 
|---|
|  | 5569 |  | 
|---|
|  | 5570 | @group | 
|---|
|  | 5571 | # delete all (leading) blanks | 
|---|
|  | 5572 | /./!d | 
|---|
|  | 5573 | @end group | 
|---|
|  | 5574 |  | 
|---|
|  | 5575 | @group | 
|---|
|  | 5576 | # get here: so there is a non empty | 
|---|
|  | 5577 | :x | 
|---|
|  | 5578 | # print it | 
|---|
|  | 5579 | p | 
|---|
|  | 5580 | # get next | 
|---|
|  | 5581 | n | 
|---|
| [3613] | 5582 | # got chars? print it again, etc... | 
|---|
| [599] | 5583 | /./bx | 
|---|
|  | 5584 | @end group | 
|---|
|  | 5585 |  | 
|---|
|  | 5586 | @group | 
|---|
|  | 5587 | # no, don't have chars: got an empty line | 
|---|
|  | 5588 | :z | 
|---|
|  | 5589 | # get next, if last line we finish here so no trailing | 
|---|
|  | 5590 | # empty lines are written | 
|---|
|  | 5591 | n | 
|---|
|  | 5592 | # also empty? then ignore it, and get next... this will | 
|---|
|  | 5593 | # remove ALL empty lines | 
|---|
|  | 5594 | /./!bz | 
|---|
|  | 5595 | @end group | 
|---|
|  | 5596 |  | 
|---|
|  | 5597 | @group | 
|---|
|  | 5598 | # all empty lines were deleted/ignored, but we have a non empty.  As | 
|---|
|  | 5599 | # what we want to do is to squeeze, insert a blank line artificially | 
|---|
|  | 5600 | i\ | 
|---|
|  | 5601 | @end group | 
|---|
|  | 5602 |  | 
|---|
|  | 5603 | bx | 
|---|
|  | 5604 | @end example | 
|---|
|  | 5605 | @c end--------------------------------------------- | 
|---|
|  | 5606 |  | 
|---|
|  | 5607 | @node Limitations | 
|---|
|  | 5608 | @chapter @value{SSED}'s Limitations and Non-limitations | 
|---|
|  | 5609 |  | 
|---|
| [3613] | 5610 | @cindex GNU extensions, unlimited line length | 
|---|
| [599] | 5611 | @cindex Portability, line length limitations | 
|---|
|  | 5612 | For those who want to write portable @command{sed} scripts, | 
|---|
|  | 5613 | be aware that some implementations have been known to | 
|---|
|  | 5614 | limit line lengths (for the pattern and hold spaces) | 
|---|
|  | 5615 | to be no more than 4000 bytes. | 
|---|
|  | 5616 | The @sc{posix} standard specifies that conforming @command{sed} | 
|---|
|  | 5617 | implementations shall support line lengths of at least 8192 bytes. | 
|---|
|  | 5618 | @value{SSED} has no built-in limit on line length; | 
|---|
|  | 5619 | as long as it can @code{malloc()} more (virtual) memory, | 
|---|
|  | 5620 | you can feed or construct lines as long as you like. | 
|---|
|  | 5621 |  | 
|---|
|  | 5622 | However, recursion is used to handle subpatterns and indefinite | 
|---|
|  | 5623 | repetition.  This means that the available stack space may limit | 
|---|
|  | 5624 | the size of the buffer that can be processed by certain patterns. | 
|---|
|  | 5625 |  | 
|---|
|  | 5626 |  | 
|---|
| [3613] | 5627 | @node Other Resources | 
|---|
|  | 5628 | @chapter Other Resources for Learning About @command{sed} | 
|---|
| [599] | 5629 |  | 
|---|
| [3613] | 5630 | For up-to-date information about @value{SSED} please | 
|---|
|  | 5631 | visit @uref{https://www.gnu.org/software/sed/}. | 
|---|
| [599] | 5632 |  | 
|---|
| [3613] | 5633 | Send general questions and suggestions to @email{sed-devel@@gnu.org}. | 
|---|
|  | 5634 | Visit the mailing list archives for past discussions at | 
|---|
|  | 5635 | @uref{https://lists.gnu.org/archive/html/sed-devel/}. | 
|---|
| [599] | 5636 |  | 
|---|
| [3613] | 5637 | @cindex Additional reading about @command{sed} | 
|---|
|  | 5638 | The following resources provide information about @command{sed} | 
|---|
|  | 5639 | (both @value{SSED} and other variations). Note that these are not maintained by | 
|---|
|  | 5640 | @value{SSED} developers. | 
|---|
| [599] | 5641 |  | 
|---|
| [3613] | 5642 | @itemize @bullet | 
|---|
| [599] | 5643 |  | 
|---|
|  | 5644 | @item | 
|---|
| [3613] | 5645 | sed @code{$HOME}: @uref{http://sed.sf.net} | 
|---|
| [599] | 5646 |  | 
|---|
|  | 5647 | @item | 
|---|
| [3613] | 5648 | sed FAQ: @uref{http://sed.sf.net/sedfaq.html} | 
|---|
| [599] | 5649 |  | 
|---|
|  | 5650 | @item | 
|---|
| [3613] | 5651 | seder's grabbag: @uref{http://sed.sf.net/grabbag} | 
|---|
| [599] | 5652 |  | 
|---|
|  | 5653 | @item | 
|---|
| [3613] | 5654 | The @code{sed-users} mailing list maintained by Sven Guckes: | 
|---|
|  | 5655 | @uref{http://groups.yahoo.com/group/sed-users/} | 
|---|
|  | 5656 | (note this is @emph{not} the @value{SSED} mailing list). | 
|---|
| [599] | 5657 |  | 
|---|
| [3613] | 5658 | @end itemize | 
|---|
| [599] | 5659 |  | 
|---|
|  | 5660 | @node Reporting Bugs | 
|---|
|  | 5661 | @chapter Reporting Bugs | 
|---|
|  | 5662 |  | 
|---|
|  | 5663 | @cindex Bugs, reporting | 
|---|
| [3613] | 5664 | Email bug reports to @email{bug-sed@@gnu.org}. | 
|---|
| [599] | 5665 | Also, please include the output of @samp{sed --version} in the body | 
|---|
|  | 5666 | of your report if at all possible. | 
|---|
|  | 5667 |  | 
|---|
|  | 5668 | Please do not send a bug report like this: | 
|---|
|  | 5669 |  | 
|---|
|  | 5670 | @example | 
|---|
| [3613] | 5671 | @i{@i{@r{while building frobme-1.3.4}}} | 
|---|
|  | 5672 | $ configure | 
|---|
| [599] | 5673 | @error{} sed: file sedscr line 1: Unknown option to 's' | 
|---|
|  | 5674 | @end example | 
|---|
|  | 5675 |  | 
|---|
|  | 5676 | If @value{SSED} doesn't configure your favorite package, take a | 
|---|
|  | 5677 | few extra minutes to identify the specific problem and make a stand-alone | 
|---|
|  | 5678 | test case.  Unlike other programs such as C compilers, making such test | 
|---|
|  | 5679 | cases for @command{sed} is quite simple. | 
|---|
|  | 5680 |  | 
|---|
|  | 5681 | A stand-alone test case includes all the data necessary to perform the | 
|---|
|  | 5682 | test, and the specific invocation of @command{sed} that causes the problem. | 
|---|
|  | 5683 | The smaller a stand-alone test case is, the better.  A test case should | 
|---|
|  | 5684 | not involve something as far removed from @command{sed} as ``try to configure | 
|---|
|  | 5685 | frobme-1.3.4''.  Yes, that is in principle enough information to look | 
|---|
|  | 5686 | for the bug, but that is not a very practical prospect. | 
|---|
|  | 5687 |  | 
|---|
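As a rough illustration of the shape such a test case can take, the sketch below keeps the input, the exact @command{sed} invocation and the expected output together.  The substitution used here is only a placeholder, not a real bug.

```shell
# Hypothetical stand-alone test case: the input, the exact sed invocation
# and the expected output are all in one place, with nothing else needed.
input='one two'
expected='one 2'
actual=$(printf '%s\n' "$input" | sed 's/two/2/')

if [ "$actual" = "$expected" ]; then
    echo "ok"
else
    echo "expected '$expected', got '$actual'"
fi
```

A report built this way can be run by anyone with a single copy and paste.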
|  | 5688 | Here are a few commonly reported bugs that are not bugs. | 
|---|
|  | 5689 |  | 
|---|
|  | 5690 | @table @asis | 
|---|
| [3613] | 5691 | @anchor{N_command_last_line} | 
|---|
| [599] | 5692 | @item @code{N} command on the last line | 
|---|
|  | 5693 | @cindex Portability, @code{N} command on the last line | 
|---|
|  | 5694 | @cindex Non-bugs, @code{N} command on the last line | 
|---|
|  | 5695 |  | 
|---|
|  | 5696 | Most versions of @command{sed} exit without printing anything when | 
|---|
|  | 5697 | the @code{N} command is issued on the last line of a file. | 
|---|
|  | 5698 | @value{SSED} prints pattern space before exiting unless of course | 
|---|
|  | 5699 | the @option{-n} command-line option has been specified.  This choice is | 
|---|
|  | 5700 | by design. | 
|---|
|  | 5701 |  | 
|---|
| [3613] | 5702 | Default behavior (GNU extension, not POSIX-conforming): | 
|---|
|  | 5703 | @example | 
|---|
|  | 5704 | $ seq 3 | sed N | 
|---|
|  | 5705 | 1 | 
|---|
|  | 5706 | 2 | 
|---|
|  | 5707 | 3 | 
|---|
|  | 5708 | @end example | 
|---|
|  | 5709 | @noindent | 
|---|
|  | 5710 | To force POSIX-conforming behavior: | 
|---|
|  | 5711 | @example | 
|---|
|  | 5712 | $ seq 3 | sed --posix N | 
|---|
|  | 5713 | 1 | 
|---|
|  | 5714 | 2 | 
|---|
|  | 5715 | @end example | 
|---|
|  | 5716 |  | 
|---|
| [599] | 5717 | For example, the behavior of | 
|---|
|  | 5718 | @example | 
|---|
|  | 5719 | sed N foo bar | 
|---|
|  | 5720 | @end example | 
|---|
|  | 5721 | @noindent | 
|---|
|  | 5722 | would depend on whether foo has an even or an odd number of | 
|---|
|  | 5723 | lines@footnote{which is the actual ``bug'' that prompted the | 
|---|
|  | 5724 | change in behavior}.  Or, when writing a script to read the | 
|---|
|  | 5725 | next few lines following a pattern match, traditional | 
|---|
|  | 5726 | implementations of @command{sed} would force you to write | 
|---|
|  | 5727 | something like | 
|---|
|  | 5728 | @example | 
|---|
|  | 5729 | /foo/@{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N @} | 
|---|
|  | 5730 | @end example | 
|---|
|  | 5731 | @noindent | 
|---|
|  | 5732 | instead of just | 
|---|
|  | 5733 | @example | 
|---|
|  | 5734 | /foo/@{ N;N;N;N;N;N;N;N;N; @} | 
|---|
|  | 5735 | @end example | 
|---|
| [3613] | 5736 |  | 
|---|
| [599] | 5737 | @cindex @code{POSIXLY_CORRECT} behavior, @code{N} command | 
|---|
|  | 5738 | In any case, the simplest workaround is to use @code{$d;N} in | 
|---|
|  | 5739 | scripts that rely on the traditional behavior, or to set | 
|---|
|  | 5740 | the @code{POSIXLY_CORRECT} variable to a non-empty value. | 
|---|
|  | 5741 |  | 
|---|
|  | 5742 | @item Regex syntax clashes (problems with backslashes) | 
|---|
| [3613] | 5743 | @cindex GNU extensions, to basic regular expressions | 
|---|
| [599] | 5744 | @cindex Non-bugs, regex syntax clashes | 
|---|
|  | 5745 | @command{sed} uses the @sc{posix} basic regular expression syntax.  According to | 
|---|
|  | 5746 | the standard, the meaning of some escape sequences is undefined in | 
|---|
|  | 5747 | this syntax;  notable in the case of @command{sed} are @code{\|}, | 
|---|
|  | 5748 | @code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<}, | 
|---|
|  | 5749 | @code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}. | 
|---|
|  | 5750 |  | 
|---|
| [3613] | 5751 | As in all GNU programs that use @sc{posix} basic regular | 
|---|
| [599] | 5752 | expressions, @command{sed} interprets these escape sequences as special | 
|---|
|  | 5753 | characters.  So, @code{x\+} matches one or more occurrences of @samp{x}. | 
|---|
|  | 5754 | @code{abc\|def} matches either @samp{abc} or @samp{def}. | 
|---|
|  | 5755 |  | 
|---|
|  | 5756 | This syntax may cause problems when running scripts written for other | 
|---|
|  | 5757 | @command{sed}s.  Some @command{sed} programs have been written with the | 
|---|
|  | 5758 | assumption that @code{\|} and @code{\+} match the literal characters | 
|---|
|  | 5759 | @code{|} and @code{+}.  Such scripts must be modified by removing the | 
|---|
|  | 5760 | spurious backslashes if they are to be used with modern implementations | 
|---|
|  | 5761 | of @command{sed}, like | 
|---|
| [3613] | 5762 | GNU @command{sed}. | 
|---|
| [599] | 5763 |  | 
|---|
|  | 5764 | On the other hand, some scripts use @code{s|abc\|def||g} to remove occurrences | 
|---|
|  | 5765 | of @emph{either} @code{abc} or @code{def}.  While this worked until | 
|---|
|  | 5766 | @command{sed} 4.0.x, newer versions interpret this as removing the | 
|---|
|  | 5767 | string @code{abc|def}.  This is again undefined behavior according to | 
|---|
| [3613] | 5768 | POSIX, and this interpretation is arguably more robust: older | 
|---|
| [599] | 5769 | @command{sed}s, for example, required that the regex matcher parsed | 
|---|
|  | 5770 | @code{\/} as @code{/} in the common case of escaping a slash, which is | 
|---|
|  | 5771 | again undefined behavior; the new behavior avoids this, and this is good | 
|---|
|  | 5772 | because the regex matcher is only partially under our control. | 
|---|
|  | 5773 |  | 
|---|
| [3613] | 5774 | @cindex GNU extensions, special escapes | 
|---|
| [599] | 5775 | In addition, this version of @command{sed} supports several escape characters | 
|---|
|  | 5776 | (some of which are multi-character) to insert non-printable characters | 
|---|
|  | 5777 | in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r}, | 
|---|
|  | 5778 | @code{\t}, @code{\v}, @code{\x}).  These can cause similar problems | 
|---|
|  | 5779 | with scripts written for other @command{sed}s. | 
|---|
|  | 5780 |  | 
|---|
|  | 5781 | @item @option{-i} clobbers read-only files | 
|---|
|  | 5782 | @cindex In-place editing | 
|---|
|  | 5783 | @cindex @value{SSEDEXT}, in-place editing | 
|---|
|  | 5784 | @cindex Non-bugs, in-place editing | 
|---|
|  | 5785 |  | 
|---|
|  | 5786 | In short, @samp{sed -i} will let you delete the contents of | 
|---|
|  | 5787 | a read-only file, and in general the @option{-i} option | 
|---|
|  | 5788 | (@pxref{Invoking sed, , Invocation}) lets you clobber | 
|---|
|  | 5789 | protected files.  This is not a bug, but rather a consequence | 
|---|
| [3613] | 5790 | of how the Unix file system works. | 
|---|
| [599] | 5791 |  | 
|---|
|  | 5792 | The permissions on a file say what can happen to the data | 
|---|
|  | 5793 | in that file, while the permissions on a directory say what can | 
|---|
|  | 5794 | happen to the list of files in that directory.  @samp{sed -i} | 
|---|
|  | 5795 | will never open for writing a file that is already on disk. | 
|---|
|  | 5796 | Rather, it will work on a temporary file that is finally renamed | 
|---|
|  | 5797 | to the original name: if you rename or delete files, you're actually | 
|---|
|  | 5798 | modifying the contents of the directory, so the operation depends on | 
|---|
|  | 5799 | the permissions of the directory, not of the file.  For this same | 
|---|
| [3613] | 5800 | reason, @command{sed} does not let you use @option{-i} on a writable file | 
|---|
|  | 5801 | in a read-only directory, and will break hard or symbolic links when | 
|---|
|  | 5802 | @option{-i} is used on such a file. | 
|---|
| [599] | 5803 |  | 
|---|
|  | 5804 | @item @code{0a} does not work (gives an error) | 
|---|
| [3613] | 5805 | @cindex @code{0} address | 
|---|
|  | 5806 | @cindex GNU extensions, @code{0} address | 
|---|
|  | 5807 | @cindex Non-bugs, @code{0} address | 
|---|
|  | 5808 |  | 
|---|
| [599] | 5809 | There is no line 0.  0 is a special address that is only used to treat | 
|---|
|  | 5810 | addresses like @code{0,/@var{RE}/} as active when the script starts: if | 
|---|
| [3613] | 5811 | you write @code{1,/abc/d} and the first line includes the string @samp{abc}, | 
|---|
| [599] | 5812 | then that match would be ignored because address ranges must span at least | 
|---|
|  | 5813 | two lines (barring the end of the file); but what you probably wanted is | 
|---|
|  | 5814 | to delete every line up to the first one including @samp{abc}, and this | 
|---|
|  | 5815 | is obtained with @code{0,/abc/d}. | 
|---|
|  | 5816 |  | 
|---|
|  | 5817 | @ifclear PERL | 
|---|
|  | 5818 | @item @code{[a-z]} is case insensitive | 
|---|
| [3613] | 5819 | @cindex Non-bugs, localization-related | 
|---|
|  | 5820 |  | 
|---|
| [599] | 5821 | You are encountering problems with locales.  POSIX mandates that @code{[a-z]} | 
|---|
|  | 5822 | uses the current locale's collation order -- in C parlance, that means using | 
|---|
|  | 5823 | @code{strcoll(3)} instead of @code{strcmp(3)}.  Some locales have a | 
|---|
| [3613] | 5824 | case-insensitive collation order, others don't. | 
|---|
| [599] | 5825 |  | 
|---|
|  | 5826 | Another problem is that @code{[a-z]} tries to use collation symbols. | 
|---|
| [3613] | 5827 | This only happens if you are on the GNU system, using | 
|---|
|  | 5828 | GNU libc's regular expression matcher instead of compiling the | 
|---|
|  | 5829 | one supplied with GNU @command{sed}.  In a Danish locale, for example, | 
|---|
| [599] | 5830 | the regular expression @code{^[a-z]$} matches the string @samp{aa}, | 
|---|
|  | 5831 | because this is a single collating symbol that comes after @samp{a} | 
|---|
|  | 5832 | and before @samp{b}; @samp{ll} behaves similarly in Spanish | 
|---|
|  | 5833 | locales, or @samp{ij} in Dutch locales. | 
|---|
|  | 5834 |  | 
|---|
|  | 5835 | To work around these problems, which may cause bugs in shell scripts, set | 
|---|
|  | 5836 | the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}. | 
|---|
|  | 5837 |  | 
|---|
| [3613] | 5838 | @item @code{s/.*//} does not clear pattern space | 
|---|
|  | 5839 | @cindex Non-bugs, localization-related | 
|---|
|  | 5840 | @cindex @value{SSEDEXT}, emptying pattern space | 
|---|
|  | 5841 | @cindex Emptying pattern space | 
|---|
| [599] | 5842 |  | 
|---|
| [3613] | 5843 | This happens if your input stream includes invalid multibyte | 
|---|
|  | 5844 | sequences.  @sc{posix} mandates that such sequences | 
|---|
|  | 5845 | are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear | 
|---|
|  | 5846 | pattern space as you would expect.  In fact, there is no way to clear | 
|---|
|  | 5847 | @command{sed}'s buffers in the middle of the script in most multibyte locales | 
|---|
|  | 5848 | (including UTF-8 locales).  For this reason, @value{SSED} provides a @code{z} | 
|---|
|  | 5849 | command (for ``zap'') as an extension. | 
|---|
| [599] | 5850 |  | 
|---|
| [3613] | 5851 | To work around these problems, which may cause bugs in shell scripts, set | 
|---|
|  | 5852 | the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}. | 
|---|
|  | 5853 | @end ifclear | 
|---|
| [599] | 5854 | @end table | 
|---|
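The difference between @code{1,/abc/d} and @code{0,/abc/d} described in the table above can be checked with a four-line input whose first line already contains @samp{abc}.  This is a small sketch that assumes GNU @command{sed}, since the @code{0} address is an extension; the helper name @code{sample} and the variable names are made up for illustration.

```shell
# Input whose first line already matches the end pattern.
sample() { printf '%s\n' abc x abc y; }

# 1,/abc/d: the end pattern is first checked on line 2, so the range
# runs through the second "abc" and only "y" survives.
from_one=$(sample | sed '1,/abc/d')

# 0,/abc/d (GNU extension): the end pattern may match on line 1,
# so only the first line is deleted.
from_zero=$(sample | sed '0,/abc/d')

# Unquoted echo joins the surviving lines with spaces for display.
printf '1,/abc/d -> %s\n' "$(echo $from_one)"
printf '0,/abc/d -> %s\n' "$(echo $from_zero)"
```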
|  | 5855 |  | 
|---|
|  | 5856 |  | 
|---|
|  | 5857 |  | 
|---|
|  | 5858 |  | 
|---|
| [3613] | 5859 | @page | 
|---|
|  | 5860 | @node GNU Free Documentation License | 
|---|
|  | 5861 | @appendix GNU Free Documentation License | 
|---|
| [599] | 5862 |  | 
|---|
| [3613] | 5863 | @include fdl.texi | 
|---|
| [599] | 5864 |  | 
|---|
|  | 5865 |  | 
|---|
|  | 5866 | @page | 
|---|
|  | 5867 | @node Concept Index | 
|---|
|  | 5868 | @unnumbered Concept Index | 
|---|
|  | 5869 |  | 
|---|
|  | 5870 | This is a general index of all issues discussed in this manual, with the | 
|---|
|  | 5871 | exception of the @command{sed} commands and command-line options. | 
|---|
|  | 5872 |  | 
|---|
|  | 5873 | @printindex cp | 
|---|
|  | 5874 |  | 
|---|
|  | 5875 | @page | 
|---|
|  | 5876 | @node Command and Option Index | 
|---|
|  | 5877 | @unnumbered Command and Option Index | 
|---|
|  | 5878 |  | 
|---|
|  | 5879 | This is an alphabetical list of all @command{sed} commands and command-line | 
|---|
|  | 5880 | options. | 
|---|
|  | 5881 |  | 
|---|
|  | 5882 | @printindex fn | 
|---|
|  | 5883 |  | 
|---|
|  | 5884 | @contents | 
|---|
|  | 5885 | @bye | 
|---|
|  | 5886 |  | 
|---|
|  | 5887 | @c XXX FIXME: the term "cycle" is never defined... | 
|---|