| 1 | \input texinfo @c -*-texinfo-*-
|
|---|
| 2 | @c
|
|---|
| 3 | @c -- Stuff that needs adding: ----------------------------------------------
|
|---|
| 4 | @c (nothing!)
|
|---|
| 5 | @c --------------------------------------------------------------------------
|
|---|
| 6 | @c Check for consistency: regexps in @code, text that they match in @samp.
|
|---|
| 7 | @c
|
|---|
| 8 | @c Tips:
|
|---|
| 9 | @c @command for command
|
|---|
| 10 | @c @samp for command fragments: @samp{cat -s}
|
|---|
| 11 | @c @code for sed commands and flags
|
|---|
| 12 | @c Use ``quote'' not `quote' or "quote".
|
|---|
| 13 | @c
|
|---|
| 14 | @c %**start of header
|
|---|
| 15 | @setfilename sed.info
|
|---|
| 16 | @settitle sed, a stream editor
|
|---|
| 17 | @c %**end of header
|
|---|
| 18 |
|
|---|
| 19 | @c @smallbook
|
|---|
| 20 |
|
|---|
| 21 | @include version.texi
|
|---|
| 22 |
|
|---|
| 23 | @c Combine indices.
|
|---|
| 24 | @syncodeindex ky cp
|
|---|
| 25 | @syncodeindex pg cp
|
|---|
| 26 | @syncodeindex tp cp
|
|---|
| 27 |
|
|---|
| 28 | @defcodeindex op
|
|---|
| 29 | @syncodeindex op fn
|
|---|
| 30 |
|
|---|
| 31 | @include config.texi
|
|---|
| 32 |
|
|---|
| 33 | @copying
|
|---|
| 34 | This file documents version @value{VERSION} of
|
|---|
| 35 | @value{SSED}, a stream editor.
|
|---|
| 36 |
|
|---|
| 37 | Copyright @copyright{} 1998--2022 Free Software Foundation, Inc.
|
|---|
| 38 |
|
|---|
| 39 | @quotation
|
|---|
| 40 | Permission is granted to copy, distribute and/or modify this document
|
|---|
| 41 | under the terms of the GNU Free Documentation License, Version 1.3
|
|---|
| 42 | or any later version published by the Free Software Foundation;
|
|---|
| 43 | with no Invariant Sections, no Front-Cover Texts, and no
|
|---|
| 44 | Back-Cover Texts. A copy of the license is included in the
|
|---|
| 45 | section entitled ``GNU Free Documentation License''.
|
|---|
| 46 | @end quotation
|
|---|
| 47 | @end copying
|
|---|
| 48 |
|
|---|
| 49 | @setchapternewpage off
|
|---|
| 50 |
|
|---|
| 51 | @titlepage
|
|---|
| 52 | @title @value{SSED}, a stream editor
|
|---|
| 53 | @subtitle version @value{VERSION}, @value{UPDATED}
|
|---|
| 54 | @author by Ken Pizzini, Paolo Bonzini, Jim Meyering, Assaf Gordon
|
|---|
| 55 |
|
|---|
| 56 | @page
|
|---|
| 57 | @vskip 0pt plus 1filll
|
|---|
| 58 | @insertcopying
|
|---|
| 59 | @end titlepage
|
|---|
| 60 |
|
|---|
| 61 | @contents
|
|---|
| 62 |
|
|---|
| 63 | @ifnottex
|
|---|
| 64 | @node Top
|
|---|
| 65 | @top @value{SSED}
|
|---|
| 66 |
|
|---|
| 67 | @insertcopying
|
|---|
| 68 | @end ifnottex
|
|---|
| 69 |
|
|---|
| 70 | @menu
|
|---|
| 71 | * Introduction:: Introduction
|
|---|
| 72 | * Invoking sed:: Invocation
|
|---|
| 73 | * sed scripts:: @command{sed} scripts
|
|---|
| 74 | * sed addresses:: Addresses: selecting lines
|
|---|
| 75 | * sed regular expressions:: Regular expressions: selecting text
|
|---|
| 76 | * advanced sed:: Advanced @command{sed}: cycles and buffers
|
|---|
| 77 | * Examples:: Some sample scripts
|
|---|
| 78 | * Limitations:: Limitations and (non-)limitations of @value{SSED}
|
|---|
| 79 | * Other Resources:: Other resources for learning about @command{sed}
|
|---|
| 80 | * Reporting Bugs:: Reporting bugs
|
|---|
| 81 | * GNU Free Documentation License:: Copying and sharing this manual
|
|---|
| 82 | * Concept Index:: A menu with all the topics in this manual.
|
|---|
| 83 | * Command and Option Index:: A menu with all @command{sed} commands and
|
|---|
| 84 | command-line options.
|
|---|
| 85 | @end menu
|
|---|
| 86 |
|
|---|
| 87 |
|
|---|
| 88 | @node Introduction
|
|---|
| 89 | @chapter Introduction
|
|---|
| 90 |
|
|---|
| 91 | @cindex Stream editor
|
|---|
| 92 | @command{sed} is a stream editor.
|
|---|
| 93 | A stream editor is used to perform basic text
|
|---|
| 94 | transformations on an input stream
|
|---|
| 95 | (a file or input from a pipeline).
|
|---|
| 96 | While in some ways similar to an editor which
|
|---|
| 97 | permits scripted edits (such as @command{ed}),
|
|---|
| 98 | @command{sed} works by making only one pass over the
|
|---|
| 99 | input(s), and is consequently more efficient.
|
|---|
| 100 | But it is @command{sed}'s ability to filter text in a pipeline
|
|---|
| 101 | which particularly distinguishes it from other types of
|
|---|
| 102 | editors.
|
|---|
| 103 |
|
|---|
| 104 |
|
|---|
| 105 | @node Invoking sed
|
|---|
| 106 | @chapter Running sed
|
|---|
| 107 |
|
|---|
| 108 | This chapter covers how to run @command{sed}. Details of @command{sed}
|
|---|
| 109 | scripts and individual @command{sed} commands are discussed in the
|
|---|
| 110 | next chapter.
|
|---|
| 111 |
|
|---|
| 112 | @menu
|
|---|
| 113 | * Overview::
|
|---|
| 114 | * Command-Line Options::
|
|---|
| 115 | * Exit status::
|
|---|
| 116 | @end menu
|
|---|
| 117 |
|
|---|
| 118 |
|
|---|
| 119 | @node Overview
|
|---|
| 120 | @section Overview
|
|---|
| 121 | Normally @command{sed} is invoked like this:
|
|---|
| 122 |
|
|---|
| 123 | @example
|
|---|
| 124 | sed SCRIPT INPUTFILE...
|
|---|
| 125 | @end example
|
|---|
| 126 |
|
|---|
| 127 | For example, to change every @samp{hello} to @samp{world}
|
|---|
| 128 | in the file @file{input.txt}:
|
|---|
| 129 |
|
|---|
| 130 | @example
|
|---|
| 131 | sed 's/hello/world/g' input.txt > output.txt
|
|---|
| 132 | @end example
|
|---|
| 133 |
|
|---|
| 134 | Without the @samp{g} (global) modifier, @command{sed} affects
|
|---|
| 135 | only the first instance per line.
|
|---|
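The effect of the @samp{g} modifier can be seen on a line that contains the
pattern twice. The following is a small sketch, assuming a POSIX shell; the
variable names are only illustrative:

```shell
# Without g, only the first match on each line is replaced.
first_only=$(echo 'hello hello' | sed 's/hello/world/')
# With g, every match on the line is replaced.
all=$(echo 'hello hello' | sed 's/hello/world/g')

printf '%s\n' "$first_only"   # world hello
printf '%s\n' "$all"          # world world
```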
| 136 |
|
|---|
| 137 | @cindex stdin
|
|---|
| 138 | @cindex standard input
|
|---|
| 139 | If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
|
|---|
| 140 | @command{sed} filters the contents of the standard input. The following
|
|---|
| 141 | commands are equivalent:
|
|---|
| 142 |
|
|---|
| 143 | @example
|
|---|
| 144 | sed 's/hello/world/g' input.txt > output.txt
|
|---|
| 145 | sed 's/hello/world/g' < input.txt > output.txt
|
|---|
| 146 | cat input.txt | sed 's/hello/world/g' - > output.txt
|
|---|
| 147 | @end example
|
|---|
| 148 |
|
|---|
| 149 | @cindex stdout
|
|---|
| 150 | @cindex output
|
|---|
| 151 | @cindex standard output
|
|---|
| 152 | @cindex -i, example
|
|---|
| 153 | @command{sed} writes output to standard output. Use @option{-i} to edit
|
|---|
| 154 | files in-place instead of printing to standard output.
|
|---|
| 155 | See also the @code{W} and @code{s///w} commands for writing output to
|
|---|
| 156 | other files. The following command modifies @file{file.txt} and
|
|---|
| 157 | does not produce any output:
|
|---|
| 158 |
|
|---|
| 159 | @example
|
|---|
| 160 | sed -i 's/hello/world/' file.txt
|
|---|
| 161 | @end example
|
|---|
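As a rough check of this behavior, the sketch below runs @option{-i} on a
scratch file and captures whatever reaches standard output. It assumes
@value{SSED} (other implementations may require an argument after
@option{-i}) and the @command{mktemp} utility; the file name is hypothetical:

```shell
tmp=$(mktemp -d)
printf 'hello there\n' > "$tmp/file.txt"

# Capture standard output; with -i nothing should be printed.
printed=$(sed -i 's/hello/world/' "$tmp/file.txt")
# The file itself now holds the edited text.
contents=$(cat "$tmp/file.txt")

printf 'stdout: [%s]\n' "$printed"
printf 'file:   [%s]\n' "$contents"
rm -rf "$tmp"
```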
| 162 |
|
|---|
| 163 | @cindex -n, example
|
|---|
| 164 | @cindex p, example
|
|---|
| 165 | @cindex suppressing output
|
|---|
| 166 | @cindex output, suppressing
|
|---|
| 167 | By default @command{sed} prints all processed input (except input
|
|---|
| 168 | that has been deleted by commands such as @code{d}).
|
|---|
| 169 | Use @option{-n} to suppress output, and the @code{p} command
|
|---|
| 170 | to print specific lines. The following command prints only line 45
|
|---|
| 171 | of the input file:
|
|---|
| 172 |
|
|---|
| 173 | @example
|
|---|
| 174 | sed -n '45p' file.txt
|
|---|
| 175 | @end example
|
|---|
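The interaction between automatic printing, @option{-n}, @code{p} and
@code{d} can be sketched on a three-line input. This is only an illustration,
assuming a POSIX shell; the variable names are arbitrary:

```shell
# -n suppresses automatic printing, so only the explicit p produces output.
only_second=$(printf '1\n2\n3\n' | sed -n '2p')
# d deletes line 2; the remaining lines are printed automatically.
without_second=$(printf '1\n2\n3\n' | sed '2d')
# Without -n, p prints line 2 once and automatic printing prints it again.
doubled=$(printf '1\n2\n3\n' | sed '2p')

printf '%s\n' "$only_second"
printf '%s\n' "$without_second"
printf '%s\n' "$doubled"
```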
| 176 |
|
|---|
| 177 |
|
|---|
| 178 |
|
|---|
| 179 | @cindex multiple files
|
|---|
| 180 | @cindex -s, example
|
|---|
| 181 | @command{sed} treats multiple input files as one long stream.
|
|---|
| 182 | The following example prints the first line of the first file
|
|---|
| 183 | (@file{one.txt}) and the last line of the last file (@file{three.txt}).
|
|---|
| 184 | Use @option{-s} to treat each file as a separate stream instead.
|
|---|
| 185 |
|
|---|
| 186 | @example
|
|---|
| 187 | sed -n '1p ; $p' one.txt two.txt three.txt
|
|---|
| 188 | @end example
|
|---|
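To compare the two modes side by side, the sketch below creates three small
files and runs the same script with and without @option{-s}. It assumes
@value{SSED} (for @option{-s}) and @command{mktemp}; the file contents are
made up for the illustration:

```shell
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/one.txt"
printf 'b1\nb2\n' > "$tmp/two.txt"
printf 'c1\nc2\n' > "$tmp/three.txt"

# One long stream: line 1 of the first file, last line of the last file.
joined=$(cd "$tmp" && sed -n '1p ; $p' one.txt two.txt three.txt)
# Separate streams: first and last line of every file.
separate=$(cd "$tmp" && sed -s -n '1p ; $p' one.txt two.txt three.txt)

printf '%s\n' "$joined"
printf '%s\n' "$separate"
rm -rf "$tmp"
```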
| 189 |
|
|---|
| 190 |
|
|---|
| 191 | @cindex -e, example
|
|---|
| 192 | @cindex --expression, example
|
|---|
| 193 | @cindex -f, example
|
|---|
| 194 | @cindex --file, example
|
|---|
| 195 | @cindex script parameter
|
|---|
| 196 | @cindex parameters, script
|
|---|
| 197 | Without @option{-e} or @option{-f} options, @command{sed} uses
|
|---|
| 198 | the first non-option parameter as the @var{script}, and the following
|
|---|
| 199 | non-option parameters as input files.
|
|---|
| 200 | If @option{-e} or @option{-f} options are used to specify a @var{script},
|
|---|
| 201 | all non-option parameters are taken as input files.
|
|---|
| 202 | Options @option{-e} and @option{-f} can be combined, and can appear
|
|---|
| 203 | multiple times (in which case the final effective @var{script} will be the
|
|---|
| 204 | concatenation of all the individual @var{script}s).
|
|---|
| 205 |
|
|---|
| 206 | The following examples are equivalent:
|
|---|
| 207 |
|
|---|
| 208 | @example
|
|---|
| 209 | sed 's/hello/world/' input.txt > output.txt
|
|---|
| 210 |
|
|---|
| 211 | sed -e 's/hello/world/' input.txt > output.txt
|
|---|
| 212 | sed --expression='s/hello/world/' input.txt > output.txt
|
|---|
| 213 |
|
|---|
| 214 | echo 's/hello/world/' > myscript.sed
|
|---|
| 215 | sed -f myscript.sed input.txt > output.txt
|
|---|
| 216 | sed --file=myscript.sed input.txt > output.txt
|
|---|
| 217 | @end example
|
|---|
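Because the individual scripts are concatenated in the order given, the order
of @option{-e} options can change the result. A small sketch, assuming a POSIX
shell (the variable names are illustrative):

```shell
# s/a/b/ runs first, then s/b/c/ rewrites its result.
first=$(echo a | sed -e 's/a/b/' -e 's/b/c/')
# s/b/c/ runs first and finds nothing to change; s/a/b/ then applies.
second=$(echo a | sed -e 's/b/c/' -e 's/a/b/')

printf '%s\n' "$first"    # c
printf '%s\n' "$second"   # b
```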
| 218 |
|
|---|
| 219 |
|
|---|
| 220 | @node Command-Line Options
|
|---|
| 221 | @section Command-Line Options
|
|---|
| 222 |
|
|---|
| 223 | The full format for invoking @command{sed} is:
|
|---|
| 224 |
|
|---|
| 225 | @example
|
|---|
| 226 | sed OPTIONS... [SCRIPT] [INPUTFILE...]
|
|---|
| 227 | @end example
|
|---|
| 228 |
|
|---|
| 229 | @command{sed} may be invoked with the following command-line options:
|
|---|
| 230 |
|
|---|
| 231 | @table @code
|
|---|
| 232 | @item --version
|
|---|
| 233 | @opindex --version
|
|---|
| 234 | @cindex Version, printing
|
|---|
| 235 | Print out the version of @command{sed} that is being run and a copyright notice,
|
|---|
| 236 | then exit.
|
|---|
| 237 |
|
|---|
| 238 | @item --help
|
|---|
| 239 | @opindex --help
|
|---|
| 240 | @cindex Usage summary, printing
|
|---|
| 241 | Print a usage message briefly summarizing these command-line options
|
|---|
| 242 | and the bug-reporting address,
|
|---|
| 243 | then exit.
|
|---|
| 244 |
|
|---|
| 245 | @item -n
|
|---|
| 246 | @itemx --quiet
|
|---|
| 247 | @itemx --silent
|
|---|
| 248 | @opindex -n
|
|---|
| 249 | @opindex --quiet
|
|---|
| 250 | @opindex --silent
|
|---|
| 251 | @cindex Disabling autoprint, from command line
|
|---|
| 252 | By default, @command{sed} prints out the pattern space
|
|---|
| 253 | at the end of each cycle through the script (@pxref{Execution Cycle, ,
|
|---|
| 254 | How @code{sed} works}).
|
|---|
| 255 | These options disable this automatic printing,
|
|---|
| 256 | and @command{sed} only produces output when explicitly told to
|
|---|
| 257 | via the @code{p} command.
|
|---|
| 258 |
|
|---|
| 259 | @item --debug
|
|---|
| 260 | @opindex --debug
|
|---|
| 261 | @cindex @value{SSEDEXT}, debug
|
|---|
| 262 | Print the input sed program in canonical form,
|
|---|
| 263 | and annotate program execution.
|
|---|
| 264 | @codequotebacktick on
|
|---|
| 265 | @codequoteundirected on
|
|---|
| 266 | @example
|
|---|
| 267 | $ echo 1 | sed '\%1%s21232'
|
|---|
| 268 | 3
|
|---|
| 269 |
|
|---|
| 270 | $ echo 1 | sed --debug '\%1%s21232'
|
|---|
| 271 | SED PROGRAM:
|
|---|
| 272 | /1/ s/1/3/
|
|---|
| 273 | INPUT: 'STDIN' line 1
|
|---|
| 274 | PATTERN: 1
|
|---|
| 275 | COMMAND: /1/ s/1/3/
|
|---|
| 276 | PATTERN: 3
|
|---|
| 277 | END-OF-CYCLE:
|
|---|
| 278 | 3
|
|---|
| 279 | @end example
|
|---|
| 280 | @codequotebacktick off
|
|---|
| 281 | @codequoteundirected off
|
|---|
| 282 |
|
|---|
| 283 |
|
|---|
| 284 | @item -e @var{script}
|
|---|
| 285 | @itemx --expression=@var{script}
|
|---|
| 286 | @opindex -e
|
|---|
| 287 | @opindex --expression
|
|---|
| 288 | @cindex Script, from command line
|
|---|
| 289 | Add the commands in @var{script} to the set of commands to be
|
|---|
| 290 | run while processing the input.
|
|---|
| 291 |
|
|---|
| 292 | @item -f @var{script-file}
|
|---|
| 293 | @itemx --file=@var{script-file}
|
|---|
| 294 | @opindex -f
|
|---|
| 295 | @opindex --file
|
|---|
| 296 | @cindex Script, from a file
|
|---|
| 297 | Add the commands contained in the file @var{script-file}
|
|---|
| 298 | to the set of commands to be run while processing the input.
|
|---|
| 299 |
|
|---|
| 300 | @item -i[@var{SUFFIX}]
|
|---|
| 301 | @itemx --in-place[=@var{SUFFIX}]
|
|---|
| 302 | @opindex -i
|
|---|
| 303 | @opindex --in-place
|
|---|
| 304 | @cindex In-place editing, activating
|
|---|
| 305 | @cindex @value{SSEDEXT}, in-place editing
|
|---|
| 306 | This option specifies that files are to be edited in-place.
|
|---|
| 307 | @value{SSED} does this by creating a temporary file and
|
|---|
| 308 | sending output to this file rather than to the standard
|
|---|
| 309 | output.@footnote{This applies to commands such as @code{=},
|
|---|
| 310 | @code{a}, @code{c}, @code{i}, @code{l}, @code{p}. You can
|
|---|
| 311 | still write to the standard output by using the @code{w}
|
|---|
| 312 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file
|
|---|
| 313 | or @code{W} commands together with the @file{/dev/stdout}
|
|---|
| 314 | special file}.
|
|---|
| 315 |
|
|---|
| 316 | This option implies @option{-s}.
|
|---|
| 317 |
|
|---|
| 318 | When the end of the file is reached, the temporary file is
|
|---|
| 319 | renamed to the output file's original name. The extension,
|
|---|
| 320 | if supplied, is used to modify the name of the old file
|
|---|
| 321 | before renaming the temporary file, thereby making a backup
|
|---|
| 322 | copy@footnote{Note that @value{SSED} creates the backup
|
|---|
| 323 | file whether or not any output is actually changed.}.
|
|---|
| 324 |
|
|---|
| 325 | @cindex In-place editing, Perl-style backup file names
|
|---|
| 326 | This rule is followed: if the extension doesn't contain a @code{*},
|
|---|
| 327 | then it is appended to the end of the current filename as a
|
|---|
| 328 | suffix; if the extension does contain one or more @code{*}
|
|---|
| 329 | characters, then @emph{each} asterisk is replaced with the
|
|---|
| 330 | current filename. This allows you to add a prefix to the
|
|---|
| 331 | backup file, instead of (or in addition to) a suffix, or
|
|---|
| 332 | even to place backup copies of the original files into another
|
|---|
| 333 | directory (provided the directory already exists).
|
|---|
| 334 |
|
|---|
| 335 | If no extension is supplied, the original file is
|
|---|
| 336 | overwritten without making a backup.
|
|---|
| 337 |
|
|---|
| 338 | Because @option{-i} takes an optional argument, it should
|
|---|
| 339 | not be followed by other short options:
|
|---|
| 340 | @table @code
|
|---|
| 341 | @item sed -Ei '...' FILE
|
|---|
| 342 | Same as @option{-E -i} with no backup suffix: @file{FILE} will be
|
|---|
| 343 | edited in-place without creating a backup.
|
|---|
| 344 |
|
|---|
| 345 | @item sed -iE '...' FILE
|
|---|
| 346 | This is equivalent to @option{--in-place=E}, creating @file{FILEE} as a backup
|
|---|
| 347 | of @file{FILE}.
|
|---|
| 348 | @end table
|
|---|
| 349 |
|
|---|
| 350 | Be cautious when combining @option{-n} with @option{-i}: the former disables
|
|---|
| 351 | automatic printing of lines and the latter changes the file in-place
|
|---|
| 352 | without a backup. If the script has no explicit @code{p} command,
|
|---|
| 353 | nothing is written and the output file will be empty:
|
|---|
| 354 | @codequotebacktick on
|
|---|
| 355 | @codequoteundirected on
|
|---|
| 356 | @example
|
|---|
| 357 | # WRONG USAGE: 'FILE' will be truncated.
|
|---|
| 358 | sed -ni 's/foo/bar/' FILE
|
|---|
| 359 | @end example
|
|---|
| 360 | @codequotebacktick off
|
|---|
| 361 | @codequoteundirected off
|
|---|
| 362 |
|
|---|
| 363 | @item -l @var{N}
|
|---|
| 364 | @itemx --line-length=@var{N}
|
|---|
| 365 | @opindex -l
|
|---|
| 366 | @opindex --line-length
|
|---|
| 367 | @cindex Line length, setting
|
|---|
| 368 | Specify the default line-wrap length for the @code{l} command.
|
|---|
| 369 | A length of 0 (zero) means to never wrap long lines. If
|
|---|
| 370 | not specified, it is taken to be 70.
|
|---|
| 371 |
|
|---|
| 372 | @item --posix
|
|---|
| 373 | @opindex --posix
|
|---|
| 374 | @cindex @value{SSEDEXT}, disabling
|
|---|
| 375 | @value{SSED} includes several extensions to POSIX
|
|---|
| 376 | sed. In order to simplify writing portable scripts, this
|
|---|
| 377 | option disables all the extensions that this manual documents,
|
|---|
| 378 | including additional commands.
|
|---|
| 379 | @cindex @code{POSIXLY_CORRECT} behavior, enabling
|
|---|
| 380 | Most of the extensions accept @command{sed} programs that
|
|---|
| 381 | are outside the syntax mandated by POSIX, but some
|
|---|
| 382 | of them (such as the behavior of the @command{N} command
|
|---|
| 383 | described in @ref{Reporting Bugs}) actually violate the
|
|---|
| 384 | standard. If you want to disable only the latter kind of
|
|---|
| 385 | extension, you can set the @code{POSIXLY_CORRECT} variable
|
|---|
| 386 | to a non-empty value.
|
|---|
| 387 |
|
|---|
| 388 | @item -b
|
|---|
| 389 | @itemx --binary
|
|---|
| 390 | @opindex -b
|
|---|
| 391 | @opindex --binary
|
|---|
| 392 | This option is available on every platform, but is only effective where the
|
|---|
| 393 | operating system makes a distinction between text files and binary files.
|
|---|
| 394 | When such a distinction is made---as is the case for MS-DOS, Windows, and
|
|---|
| 395 | Cygwin---text files are composed of lines separated by a carriage return
|
|---|
| 396 | @emph{and} a line feed character, and @command{sed} does not see the
|
|---|
| 397 | ending CR. When this option is specified, @command{sed} will open
|
|---|
| 398 | input files in binary mode, thus not requesting this special processing
|
|---|
| 399 | and considering lines to end at a line feed.
|
|---|
| 400 |
|
|---|
| 401 | @item --follow-symlinks
|
|---|
| 402 | @opindex --follow-symlinks
|
|---|
| 403 | This option is available only on platforms that support
|
|---|
| 404 | symbolic links and has an effect only if option @option{-i}
|
|---|
| 405 | is specified. In this case, if the file that is specified
|
|---|
| 406 | on the command line is a symbolic link, @command{sed} will
|
|---|
| 407 | follow the link and edit the ultimate destination of the
|
|---|
| 408 | link. The default behavior is to break the symbolic link,
|
|---|
| 409 | so that the link destination will not be modified.
|
|---|
| 410 |
|
|---|
| 411 | @item -E
|
|---|
| 412 | @itemx -r
|
|---|
| 413 | @itemx --regexp-extended
|
|---|
| 414 | @opindex -E
|
|---|
| 415 | @opindex -r
|
|---|
| 416 | @opindex --regexp-extended
|
|---|
| 417 | @cindex Extended regular expressions, choosing
|
|---|
| 418 | @cindex GNU extensions, extended regular expressions
|
|---|
| 419 | Use extended regular expressions rather than basic
|
|---|
| 420 | regular expressions. Extended regexps are those that
|
|---|
| 421 | @command{egrep} accepts; they can be clearer because they
|
|---|
| 422 | usually have fewer backslashes.
|
|---|
| 423 | Historically this was a GNU extension,
|
|---|
| 424 | but the @option{-E}
|
|---|
| 425 | extension has since been added to the POSIX standard
|
|---|
| 426 | (http://austingroupbugs.net/view.php?id=528),
|
|---|
| 427 | so use @option{-E} for portability.
|
|---|
| 428 | GNU sed has accepted @option{-E} as an undocumented option for years,
|
|---|
| 429 | and *BSD seds have accepted @option{-E} for years as well,
|
|---|
| 430 | but scripts that use @option{-E} might not port to other older systems.
|
|---|
| 431 | @xref{ERE syntax, , Extended regular expressions}.
|
|---|
| 432 |
|
|---|
| 433 |
|
|---|
| 434 | @item -s
|
|---|
| 435 | @itemx --separate
|
|---|
| 436 | @opindex -s
|
|---|
| 437 | @opindex --separate
|
|---|
| 438 | @cindex Working on separate files
|
|---|
| 439 | By default, @command{sed} will consider the files specified on the
|
|---|
| 440 | command line as a single continuous long stream. This @value{SSED}
|
|---|
| 441 | extension allows the user to consider them as separate files:
|
|---|
| 442 | range addresses (such as @samp{/abc/,/def/}) are not allowed
|
|---|
| 443 | to span several files, line numbers are relative to the start
|
|---|
| 444 | of each file, @code{$} refers to the last line of each file,
|
|---|
| 445 | and files invoked from the @code{R} commands are rewound at the
|
|---|
| 446 | start of each file.
|
|---|
| 447 |
|
|---|
| 448 | @item --sandbox
|
|---|
| 449 | @opindex --sandbox
|
|---|
| 450 | @cindex Sandbox mode
|
|---|
| 451 | In sandbox mode, @code{e/w/r} commands are rejected: programs containing
|
|---|
| 452 | them will be aborted without being run. Sandbox mode ensures @command{sed}
|
|---|
| 453 | operates only on the input files designated on the command line, and
|
|---|
| 454 | cannot run external programs.
|
|---|
| 455 |
|
|---|
| 456 |
|
|---|
| 457 | @item -u
|
|---|
| 458 | @itemx --unbuffered
|
|---|
| 459 | @opindex -u
|
|---|
| 460 | @opindex --unbuffered
|
|---|
| 461 | @cindex Unbuffered I/O, choosing
|
|---|
| 462 | Buffer both input and output as minimally as practical.
|
|---|
| 463 | (This is particularly useful if the input is coming from
|
|---|
| 464 | the likes of @samp{tail -f}, and you wish to see the transformed
|
|---|
| 465 | output as soon as possible.)
|
|---|
| 466 |
|
|---|
| 467 | @item -z
|
|---|
| 468 | @itemx --null-data
|
|---|
| 469 | @itemx --zero-terminated
|
|---|
| 470 | @opindex -z
|
|---|
| 471 | @opindex --null-data
|
|---|
| 472 | @opindex --zero-terminated
|
|---|
| 473 | Treat the input as a set of lines, each terminated by a zero byte
|
|---|
| 474 | (the ASCII @samp{NUL} character) instead of a newline. This option can
|
|---|
| 475 | be used with commands like @samp{sort -z} and @samp{find -print0}
|
|---|
| 476 | to process arbitrary file names.
|
|---|
| 477 | @end table
|
|---|
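The backup naming rule described under @option{-i} can be tried out as
follows. This is a sketch that assumes @value{SSED} and @command{mktemp}; the
file names and the @samp{orig_} prefix are hypothetical. The commands run
inside the scratch directory so that the file names contain no directory part:

```shell
tmp=$(mktemp -d)
(
  cd "$tmp" || exit 1
  printf 'hello\n' > a.txt
  printf 'hello\n' > b.txt
  # No asterisk: the suffix is appended, so the backup is a.txt.bak.
  sed -i.bak 's/hello/world/' a.txt
  # One asterisk: it is replaced by the file name, so the backup is orig_b.txt.
  sed -i'orig_*' 's/hello/world/' b.txt
)
edited_a=$(cat "$tmp/a.txt")
backup_a=$(cat "$tmp/a.txt.bak")
edited_b=$(cat "$tmp/b.txt")
backup_b=$(cat "$tmp/orig_b.txt")
printf '%s %s %s %s\n' "$edited_a" "$backup_a" "$edited_b" "$backup_b"
rm -rf "$tmp"
```

In both cases the backup keeps the original text and the named file receives
the edited text.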
| 478 |
|
|---|
| 479 | If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file}
|
|---|
| 480 | options are given on the command-line,
|
|---|
| 481 | then the first non-option argument on the command line is
|
|---|
| 482 | taken to be the @var{script} to be executed.
|
|---|
| 483 |
|
|---|
| 484 | @cindex Files to be processed as input
|
|---|
| 485 | If any command-line parameters remain after processing the above,
|
|---|
| 486 | these parameters are interpreted as the names of input files to
|
|---|
| 487 | be processed.
|
|---|
| 488 | @cindex Standard input, processing as input
|
|---|
| 489 | A file name of @samp{-} refers to the standard input stream.
|
|---|
| 490 | The standard input will be processed if no file names are specified.
|
|---|
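A file name of @samp{-} can be mixed with ordinary file names. The sketch
below, which assumes @command{mktemp} and uses a hypothetical file name, reads
one line from a file and then one line from standard input:

```shell
tmp=$(mktemp -d)
printf 'from file\n' > "$tmp/a.txt"

# The file is read first, then standard input (named by "-").
mixed=$(printf 'from stdin\n' | sed 's/^/> /' "$tmp/a.txt" -)

printf '%s\n' "$mixed"
rm -rf "$tmp"
```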
| 491 |
|
|---|
| 492 | @node Exit status
|
|---|
| 493 | @section Exit status
|
|---|
| 494 | @cindex exit status
|
|---|
| 495 | An exit status of zero indicates success, and a nonzero value
|
|---|
| 496 | indicates failure. @value{SSED} returns the following exit status
|
|---|
| 497 | values:
|
|---|
| 498 |
|
|---|
| 499 | @table @asis
|
|---|
| 500 | @item 0
|
|---|
| 501 | Successful completion.
|
|---|
| 502 |
|
|---|
| 503 | @item 1
|
|---|
| 504 | Invalid command, invalid syntax, invalid regular expression or a
|
|---|
| 505 | @value{SSED} extension command used with @option{--posix}.
|
|---|
| 506 |
|
|---|
| 507 | @item 2
|
|---|
| 508 | One or more of the input files specified on the command line could not be
|
|---|
| 509 | opened (e.g. if a file is not found, or read permission is denied).
|
|---|
| 510 | Processing continued with other files.
|
|---|
| 511 |
|
|---|
| 512 | @item 4
|
|---|
| 513 | An I/O error or a serious processing error occurred during runtime;
|
|---|
| 514 | @value{SSED} aborted immediately.
|
|---|
| 515 | @end table
|
|---|
| 516 |
|
|---|
| 517 | @cindex Q, example
|
|---|
| 518 | @cindex exit status, example
|
|---|
| 519 | Additionally, the commands @code{q} and @code{Q} can be used to terminate
|
|---|
| 520 | @command{sed} with a custom exit code value (this is a @value{SSED} extension):
|
|---|
| 521 |
|
|---|
| 522 | @example
|
|---|
| 523 | $ echo | sed 'Q42' ; echo $?
|
|---|
| 524 | 42
|
|---|
| 525 | @end example
|
|---|
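The exit statuses 0, 1 and 2 listed above can be observed with a sketch like
the following. It assumes @value{SSED} and @command{mktemp}; the file names
are hypothetical, and @samp{k} is used only because it is not a @command{sed}
command:

```shell
tmp=$(mktemp -d)
printf 'x\n' > "$tmp/ok.txt"

# A valid script on a readable file.
status_ok=0
sed -n 'p' "$tmp/ok.txt" > /dev/null 2>&1 || status_ok=$?

# An invalid command: the script is rejected before any input is read.
status_bad_script=0
sed 'k' "$tmp/ok.txt" > /dev/null 2>&1 || status_bad_script=$?

# The first file does not exist; the second one is still processed.
status_missing=0
out_missing=$(sed -n 'p' "$tmp/missing.txt" "$tmp/ok.txt" 2> /dev/null) \
  || status_missing=$?

printf '%s %s %s [%s]\n' \
  "$status_ok" "$status_bad_script" "$status_missing" "$out_missing"
rm -rf "$tmp"
```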
| 526 |
|
|---|
| 527 |
|
|---|
| 528 | @node sed scripts
|
|---|
| 529 | @chapter @command{sed} scripts
|
|---|
| 530 |
|
|---|
| 531 |
|
|---|
| 532 | @menu
|
|---|
| 533 | * sed script overview:: @command{sed} script overview
|
|---|
| 534 | * sed commands list:: @command{sed} commands summary
|
|---|
| 535 | * The "s" Command:: @command{sed}'s Swiss Army Knife
|
|---|
| 536 | * Common Commands:: Often used commands
|
|---|
| 537 | * Other Commands:: Less frequently used commands
|
|---|
| 538 | * Programming Commands:: Commands for @command{sed} gurus
|
|---|
| 539 | * Extended Commands:: Commands specific to @value{SSED}
|
|---|
| 540 | * Multiple commands syntax:: Extension for easier scripting
|
|---|
| 541 | @end menu
|
|---|
| 542 |
|
|---|
| 543 | @node sed script overview
|
|---|
| 544 | @section @command{sed} script overview
|
|---|
| 545 |
|
|---|
| 546 | @cindex @command{sed} script structure
|
|---|
| 547 | @cindex Script structure
|
|---|
| 548 |
|
|---|
| 549 | A @command{sed} program consists of one or more @command{sed} commands,
|
|---|
| 550 | passed in by one or more of the
|
|---|
| 551 | @option{-e}, @option{-f}, @option{--expression}, and @option{--file}
|
|---|
| 552 | options, or the first non-option argument if none of these
|
|---|
| 553 | options are used.
|
|---|
| 554 | This document will refer to ``the'' @command{sed} script;
|
|---|
| 555 | this is understood to mean the in-order concatenation
|
|---|
| 556 | of all of the @var{script}s and @var{script-file}s passed in.
|
|---|
| 557 | @xref{Overview}.
|
|---|
| 558 |
|
|---|
| 559 |
|
|---|
| 560 | @cindex @command{sed} commands syntax
|
|---|
| 561 | @cindex syntax, @command{sed} commands
|
|---|
| 562 | @cindex addresses, syntax
|
|---|
| 563 | @cindex syntax, addresses
|
|---|
| 564 | @command{sed} commands follow this syntax:
|
|---|
| 565 |
|
|---|
| 566 | @example
|
|---|
| 567 | [addr]@var{X}[options]
|
|---|
| 568 | @end example
|
|---|
| 569 |
|
|---|
| 570 | @var{X} is a single-letter @command{sed} command.
|
|---|
| 571 | @c TODO: add @pxref{commands} when there is a command-list section.
|
|---|
| 572 | @code{[addr]} is an optional line address. If @code{[addr]} is specified,
|
|---|
| 573 | the command @var{X} will be executed only on the matched lines.
|
|---|
| 574 | @code{[addr]} can be a single line number, a regular expression,
|
|---|
| 575 | or a range of lines (@pxref{sed addresses}).
|
|---|
| 576 | Additional @code{[options]} are used for some @command{sed} commands.
|
|---|
| 577 |
|
|---|
| 578 | @cindex @command{d}, example
|
|---|
| 579 | @cindex address range, example
|
|---|
| 580 | @cindex example, address range
|
|---|
| 581 | The following example deletes lines 30 to 35 in the input.
|
|---|
| 582 | @code{30,35} is an address range. @command{d} is the delete command:
|
|---|
| 583 |
|
|---|
| 584 | @example
|
|---|
| 585 | sed '30,35d' input.txt > output.txt
|
|---|
| 586 | @end example
|
|---|
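The three kinds of address mentioned above (a line number, a regular
expression, and a range) can be compared on the same input. A minimal sketch,
assuming a POSIX shell; the sample words are arbitrary:

```shell
input=$(printf 'one\ntwo\nthree\nfour\nfive\n')

# Line-number address: print only line 2.
by_number=$(printf '%s\n' "$input" | sed -n '2p')
# Regular-expression address: print lines that start with "f".
by_regex=$(printf '%s\n' "$input" | sed -n '/^f/p')
# Range address: delete lines 2 through 4.
by_range=$(printf '%s\n' "$input" | sed '2,4d')

printf '%s\n' "$by_number"
printf '%s\n' "$by_regex"
printf '%s\n' "$by_range"
```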
| 587 |
|
|---|
| 588 | @cindex @command{q}, example
|
|---|
| 589 | @cindex regular expression, example
|
|---|
| 590 | @cindex example, regular expression
|
|---|
| 591 | The following example prints all input until a line
|
|---|
| 592 | starting with the string @samp{foo} is found. If such a line is found,
|
|---|
| 593 | @command{sed} will terminate with exit status 42.
|
|---|
| 594 | If no such line is found (and no other error occurs), @command{sed}
|
|---|
| 595 | will exit with status 0.
|
|---|
| 596 | @code{/^foo/} is a regular-expression address.
|
|---|
| 597 | @command{q} is the quit command. @code{42} is the command option.
|
|---|
| 598 |
|
|---|
| 599 | @example
|
|---|
| 600 | sed '/^foo/q42' input.txt > output.txt
|
|---|
| 601 | @end example
|
|---|
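Both outcomes can be checked with a sketch such as the one below, which
assumes @value{SSED} (the exit code argument to @code{q} is an extension) and
uses made-up input. Note that @code{q} prints the matching line before
quitting, since automatic printing is not disabled here:

```shell
# A line starting with "foo" exists: output stops there, status is 42.
found_status=0
found_out=$(printf 'alpha\nfoo bar\nomega\n' | sed '/^foo/q42') \
  || found_status=$?

# No such line: all input is printed, status is 0.
missing_status=0
missing_out=$(printf 'alpha\nomega\n' | sed '/^foo/q42') \
  || missing_status=$?

printf '%s\n' "$found_out" "$found_status"
printf '%s\n' "$missing_out" "$missing_status"
```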
| 602 |
|
|---|
| 603 |
|
|---|
| 604 | @cindex multiple @command{sed} commands
|
|---|
| 605 | @cindex @command{sed} commands, multiple
|
|---|
| 606 | @cindex newline, command separator
|
|---|
| 607 | @cindex semicolons, command separator
|
|---|
| 608 | @cindex ;, command separator
|
|---|
| 609 | @cindex -e, example
|
|---|
| 610 | @cindex -f, example
|
|---|
| 611 | Commands within a @var{script} or @var{script-file} can be
|
|---|
| 612 | separated by semicolons (@code{;}) or newlines (ASCII 10).
|
|---|
| 613 | Multiple scripts can be specified with @option{-e} or @option{-f}
|
|---|
| 614 | options.
|
|---|
| 615 |
|
|---|
| 616 | The following examples are all equivalent. They perform two @command{sed}
|
|---|
| 617 | operations: deleting any lines matching the regular expression @code{/^foo/},
|
|---|
| 618 | and replacing all occurrences of the string @samp{hello} with @samp{world}:
|
|---|
| 619 |
|
|---|
| 620 | @example
|
|---|
| 621 | sed '/^foo/d ; s/hello/world/g' input.txt > output.txt
|
|---|
| 622 |
|
|---|
| 623 | sed -e '/^foo/d' -e 's/hello/world/g' input.txt > output.txt
|
|---|
| 624 |
|
|---|
| 625 | echo '/^foo/d' > script.sed
|
|---|
| 626 | echo 's/hello/world/g' >> script.sed
|
|---|
| 627 | sed -f script.sed input.txt > output.txt
|
|---|
| 628 |
|
|---|
| 629 | echo 's/hello/world/g' > script2.sed
|
|---|
| 630 | sed -e '/^foo/d' -f script2.sed input.txt > output.txt
|
|---|
| 631 | @end example
|
|---|
| 632 |
|
|---|
| 633 |
|
|---|
| 634 | @cindex @command{a}, and semicolons
|
|---|
| 635 | @cindex @command{c}, and semicolons
|
|---|
| 636 | @cindex @command{i}, and semicolons
|
|---|
| 637 | Commands @command{a}, @command{c}, @command{i}, due to their syntax,
|
|---|
| 638 | cannot be followed by semicolons working as command separators and
|
|---|
| 639 | thus should be terminated
|
|---|
| 640 | with newlines or be placed at the end of a @var{script} or @var{script-file}.
|
|---|
| 641 | Commands can also be preceded with optional non-significant
|
|---|
| 642 | whitespace characters.
|
|---|
| 643 | @xref{Multiple commands syntax}.
|
|---|
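The restriction on @code{a}, @code{c} and @code{i} can be illustrated with
the one-line form of @code{a}. This sketch assumes @value{SSED}, since the
one-line @samp{a text} syntax is an extension:

```shell
# The semicolon and everything after it become part of the appended text.
swallowed=$(printf 'x\n' | sed '1a hello ; s/x/y/')
# Separate -e options end the a command, so s/x/y/ runs as a command.
separated=$(printf 'x\n' | sed -e '1a hello' -e 's/x/y/')

printf '%s\n' "$swallowed"
printf '%s\n' "$separated"
```

In the first case the substitution never runs; in the second the line is
changed to @samp{y} and @samp{hello} is appended after it.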
| 644 |
|
|---|
| 645 |
|
|---|
| 646 |
|
|---|
| 647 | @node sed commands list
|
|---|
| 648 | @section @command{sed} commands summary
|
|---|
| 649 |
|
|---|
| 650 | The following commands are supported in @value{SSED}.
|
|---|
| 651 | Some are standard POSIX commands, while others are @value{SSEDEXT}.
|
|---|
| 652 | Details and examples for each command are in the following sections.
|
|---|
| 653 | (Mnemonics) are shown in parentheses.
|
|---|
| 654 |
|
|---|
| 655 | @table @code
|
|---|
| 656 |
|
|---|
| 657 | @item a\
|
|---|
| 658 | @itemx @var{text}
|
|---|
| 659 | Append @var{text} after a line.
|
|---|
| 660 |
|
|---|
| 661 | @item a @var{text}
|
|---|
| 662 | Append @var{text} after a line (alternative syntax).
|
|---|
| 663 |
|
|---|
| 664 | @item b @var{label}
|
|---|
| 665 | Branch unconditionally to @var{label}.
|
|---|
| 666 | The @var{label} may be omitted, in which case the next cycle is started.
|
|---|
| 667 |
|
|---|
| 668 | @item c\
|
|---|
| 669 | @itemx @var{text}
|
|---|
| 670 | Replace (change) lines with @var{text}.
|
|---|
| 671 |
|
|---|
| 672 | @item c @var{text}
|
|---|
| 673 | Replace (change) lines with @var{text} (alternative syntax).
|
|---|
| 674 |
|
|---|
| 675 | @item d
|
|---|
| 676 | Delete the pattern space;
|
|---|
| 677 | immediately start next cycle.
|
|---|
| 678 |
|
|---|
| 679 | @item D
|
|---|
| 680 | If pattern space contains newlines, delete text in the pattern
|
|---|
| 681 | space up to the first newline, and restart cycle with the resultant
|
|---|
| 682 | pattern space, without reading a new line of input.
|
|---|
| 683 |
|
|---|
| 684 | If pattern space contains no newline, start a normal new cycle as if
|
|---|
| 685 | the @code{d} command was issued.
|
|---|
| 686 | @c TODO: add a section about D+N and D+n commands
|
|---|
| 687 |
|
|---|
| 688 | @item e
|
|---|
| 689 | Executes the command that is found in pattern space and
|
|---|
| 690 | replaces the pattern space with the output; a trailing newline
|
|---|
| 691 | is suppressed.
|
|---|
| 692 |
|
|---|
| 693 | @item e @var{command}
|
|---|
| 694 | Executes @var{command} and sends its output to the output stream.
|
|---|
| 695 | The command can run across multiple lines, all but the last ending with
|
|---|
| 696 | a backslash.
|
|---|
| 697 |
|
|---|
| 698 | @item F
|
|---|
| 699 | (filename) Print the file name of the current input file (with a trailing
|
|---|
| 700 | newline).
|
|---|
| 701 |
|
|---|
| 702 | @item g
|
|---|
| 703 | Replace the contents of the pattern space with the contents of the hold space.
|
|---|
| 704 |
|
|---|
| 705 | @item G
|
|---|
| 706 | Append a newline to the contents of the pattern space,
|
|---|
| 707 | and then append the contents of the hold space to that of the pattern space.
|
|---|
| 708 |
|
|---|
| 709 | @item h
|
|---|
| 710 | (hold) Replace the contents of the hold space with the contents of the
|
|---|
| 711 | pattern space.
|
|---|
| 712 |
|
|---|
| 713 | @item H
|
|---|
| 714 | Append a newline to the contents of the hold space,
|
|---|
| 715 | and then append the contents of the pattern space to that of the hold space.
|
|---|
| 716 |
|
|---|
| 717 | @item i\
|
|---|
| 718 | @itemx @var{text}
|
|---|
| 719 | Insert @var{text} before a line.
|
|---|
| 720 |
|
|---|
| 721 | @item i @var{text}
|
|---|
| 722 | Insert @var{text} before a line (alternative syntax).
|
|---|
| 723 |
|
|---|
| 724 | @item l
|
|---|
| 725 | Print the pattern space in an unambiguous form.
|
|---|
| 726 |
|
|---|
| 727 | @item n
|
|---|
| 728 | (next) If auto-print is not disabled, print the pattern space,
|
|---|
| 729 | then, regardless, replace the pattern space with the next line of input.
|
|---|
| 730 | If there is no more input then @command{sed} exits without processing
|
|---|
| 731 | any more commands.
|
|---|
| 732 |
|
|---|
| 733 | @item N
|
|---|
| 734 | Add a newline to the pattern space,
|
|---|
| 735 | then append the next line of input to the pattern space.
|
|---|
| 736 | If there is no more input then @command{sed} exits without processing
|
|---|
| 737 | any more commands.
|
|---|
| 738 |
|
|---|
| 739 | @item p
|
|---|
| 740 | Print the pattern space.
|
|---|
| 741 | @c useful with @option{-n}
|
|---|
| 742 |
|
|---|
| 743 | @item P
|
|---|
| 744 | Print the pattern space, up to the first newline.
|
|---|
| 745 |
|
|---|
| 746 | @item q@var{[exit-code]}
|
|---|
| 747 | (quit) Exit @command{sed} without processing any more commands or input.
|
|---|
| 748 |
|
|---|
| 749 | @item Q@var{[exit-code]}
|
|---|
| 750 | (quit) This command is the same as @code{q}, but will not print the
|
|---|
| 751 | contents of pattern space. Like @code{q}, it provides the
|
|---|
| 752 | ability to return an exit code to the caller.
|
|---|
| 753 | @c useful to quit on a conditional without printing
|
|---|
| 754 |
|
|---|
| 755 | @item r @var{filename}
|
|---|
| 756 | Queue the contents of @var{filename} to be read and inserted into the output stream at the end of the current cycle.
|
|---|
| 757 |
|
|---|
| 758 | @item R @var{filename}
|
|---|
| 759 | Queue a line of @var{filename} to be read and
|
|---|
| 760 | inserted into the output stream at the end of the current cycle,
|
|---|
| 761 | or when the next input line is read.
|
|---|
| 762 | @c useful to interleave files
|
|---|
| 763 |
|
|---|
| 764 | @item s@var{/regexp/replacement/[flags]}
|
|---|
| 765 | (substitute) Match the regular-expression against the content of the
|
|---|
| 766 | pattern space. If found, replace matched string with
|
|---|
| 767 | @var{replacement}.
|
|---|
| 768 |
|
|---|
| 769 | @item t @var{label}
|
|---|
| 770 | (test) Branch to @var{label} only if there has been a successful
|
|---|
| 771 | @code{s}ubstitution since the last input line was read or conditional
|
|---|
| 772 | branch was taken. The @var{label} may be omitted, in which case the
|
|---|
| 773 | next cycle is started.
|
|---|
| 774 |
|
|---|
| 775 | @item T @var{label}
|
|---|
| 776 | (test) Branch to @var{label} only if there have been no successful
|
|---|
| 777 | @code{s}ubstitutions since the last input line was read or
|
|---|
| 778 | conditional branch was taken. The @var{label} may be omitted,
|
|---|
| 779 | in which case the next cycle is started.
|
|---|
| 780 |
|
|---|
| 781 | @item v @var{[version]}
|
|---|
| 782 | (version) This command does nothing, but makes @command{sed} fail if
|
|---|
| 783 | @value{SSED} extensions are not supported, or if the requested version
|
|---|
| 784 | is not available.
|
|---|
| 785 |
|
|---|
| 786 | @item w @var{filename}
|
|---|
| 787 | Write the pattern space to @var{filename}.
|
|---|
| 788 |
|
|---|
| 789 | @item W @var{filename}
|
|---|
| 790 | Write to @var{filename} the portion of the pattern space up to
|
|---|
| 791 | the first newline.
|
|---|
| 792 |
|
|---|
| 793 | @item x
|
|---|
| 794 | Exchange the contents of the hold and pattern spaces.
|
|---|
| 795 |
|
|---|
| 796 |
|
|---|
| 797 | @item y/@var{source-chars}/@var{dest-chars}/
|
|---|
| 798 | Transliterate any characters in the pattern space which match
|
|---|
| 799 | any of the @var{source-chars} with the corresponding character
|
|---|
| 800 | in @var{dest-chars}.
|
|---|
| 801 |
|
|---|
| 802 |
|
|---|
| 803 | @item z
|
|---|
| 804 | (zap) This command empties the content of pattern space.
|
|---|
| 805 |
|
|---|
| 806 | @item #
|
|---|
| 807 | A comment, until the next newline.
|
|---|
| 808 |
|
|---|
| 809 |
|
|---|
| 810 | @item @{ @var{cmd ; cmd ...} @}
|
|---|
| 811 | Group several commands together.
|
|---|
| 812 | @c useful for multiple commands on same address
|
|---|
| 813 |
|
|---|
| 814 | @item =
|
|---|
| 815 | Print the current input line number (with a trailing newline).
|
|---|
| 816 |
|
|---|
| 817 | @item : @var{label}
|
|---|
| 818 | Specify the location of @var{label} for branch commands (@code{b},
|
|---|
| 819 | @code{t}, @code{T}).
|
|---|
| 820 |
|
|---|
| 821 | @end table
|
|---|
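
As a rough illustration of how several of the commands summarized above combine, the following sketch uses the hold space (@code{G}, @code{h}), line joining (@code{N}), and a @code{t} loop. It assumes @value{SSED} and a POSIX shell; the sample inputs and the variable names are invented for this example.

```shell
# Reverse the order of lines: on every line but the first, append the
# hold space to the pattern space (G), save the result (h), and print
# the accumulated text on the last line ($p).
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"     # 3, 2, 1 on separate lines

# Join each pair of lines: N appends the next input line,
# then s replaces the embedded newline with a space.
joined=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')
printf '%s\n' "$joined"       # "a b" and "c d"

# Insert thousands separators: t branches back to label a for as long
# as the substitution keeps succeeding.
grouped=$(echo 1234567 | sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta')
printf '%s\n' "$grouped"      # 1,234,567
```

Each command is described in detail in the sections that follow.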
| 822 |
|
|---|
| 823 |
|
|---|
| 824 | @node The "s" Command
|
|---|
| 825 | @section The @code{s} Command
|
|---|
| 826 |
|
|---|
| 827 | The @code{s} command (as in substitute) is probably the most important
|
|---|
| 828 | in @command{sed} and has a lot of different options. The syntax of
|
|---|
| 829 | the @code{s} command is
|
|---|
| 830 | @samp{s/@var{regexp}/@var{replacement}/@var{flags}}.
|
|---|
| 831 |
|
|---|
| 832 | Its basic concept is simple: the @code{s} command attempts to match
|
|---|
| 833 | the pattern space against the supplied regular expression @var{regexp};
|
|---|
| 834 | if the match is successful, then that portion of the
|
|---|
| 835 | pattern space which was matched is replaced with @var{replacement}.
|
|---|
| 836 |
|
|---|
| 837 | For details about @var{regexp} syntax, see @ref{Regexp Addresses,,Regular
|
|---|
| 838 | Expression Addresses}.
|
|---|
| 839 |
|
|---|
| 840 | @cindex Backreferences, in regular expressions
|
|---|
| 841 | @cindex Parenthesized substrings
|
|---|
| 842 | The @var{replacement} can contain @code{\@var{n}} (@var{n} being
|
|---|
| 843 | a number from 1 to 9, inclusive) references, which refer to
|
|---|
| 844 | the portion of the match which is contained between the @var{n}th
|
|---|
| 845 | @code{\(} and its matching @code{\)}.
|
|---|
| 846 | Also, the @var{replacement} can contain unescaped @code{&}
|
|---|
| 847 | characters which reference the whole matched portion
|
|---|
| 848 | of the pattern space.
|
|---|
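
A minimal sketch of back-references and @code{&} in the @var{replacement}, assuming a POSIX shell; the input strings and variable names are only illustrative.

```shell
# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(echo 'John Smith' | sed 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2, \1/')
echo "$swapped"    # Smith, John

# An unescaped & stands for the whole matched text.
wrapped=$(echo 'abc' | sed 's/b/[&]/')
echo "$wrapped"    # a[b]c

# \& inserts a literal ampersand instead.
literal=$(echo 'abc' | sed 's/b/\&/')
echo "$literal"    # a&c
```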
| 849 |
|
|---|
| 850 | @c TODO: xref to backreference section mention @var{\'}.
|
|---|
| 851 |
|
|---|
| 852 | The @code{/}
|
|---|
| 853 | characters may be uniformly replaced by any other single
|
|---|
| 854 | character within any given @code{s} command. The @code{/}
|
|---|
| 855 | character (or whatever other character is used in its stead)
|
|---|
| 856 | can appear in the @var{regexp} or @var{replacement}
|
|---|
| 857 | only if it is preceded by a @code{\} character.
|
|---|
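
The delimiter rule can be sketched as follows, assuming a POSIX shell; the path and the variable names are made up for the example.

```shell
# With the default delimiter, every / in the regexp or the
# replacement must be preceded by a backslash.
escaped=$(echo '/usr/bin' | sed 's/\/usr/\/opt/')
echo "$escaped"    # /opt/bin

# Choosing another delimiter (a comma here) avoids the escaping.
comma=$(echo '/usr/bin' | sed 's,/usr,/opt,')
echo "$comma"      # /opt/bin

# The chosen delimiter itself then needs a backslash to be matched literally.
semi=$(echo 'a,b' | sed 's,\,,;,')
echo "$semi"       # a;b
```

Picking a delimiter that does not occur in the data usually gives the most readable command.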
| 858 |
|
|---|
| 859 |
|
|---|
| 860 |
|
|---|
| 861 | @cindex @value{SSEDEXT}, case modifiers in @code{s} commands
|
|---|
| 862 | Finally, as a @value{SSED} extension, you can include a
|
|---|
| 863 | special sequence made of a backslash and one of the letters
|
|---|
| 864 | @code{L}, @code{l}, @code{U}, @code{u}, or @code{E}.
|
|---|
| 865 | The meaning is as follows:
|
|---|
| 866 |
|
|---|
| 867 | @table @code
|
|---|
| 868 | @item \L
|
|---|
| 869 | Turn the replacement
|
|---|
| 870 | to lowercase until a @code{\U} or @code{\E} is found.
|
|---|
| 871 |
|
|---|
| 872 | @item \l
|
|---|
| 873 | Turn the
|
|---|
| 874 | next character to lowercase.
|
|---|
| 875 |
|
|---|
| 876 | @item \U
|
|---|
| 877 | Turn the replacement to uppercase
|
|---|
| 878 | until a @code{\L} or @code{\E} is found.
|
|---|
| 879 |
|
|---|
| 880 | @item \u
|
|---|
| 881 | Turn the next character
|
|---|
| 882 | to uppercase.
|
|---|
| 883 |
|
|---|
| 884 | @item \E
|
|---|
| 885 | Stop case conversion started by @code{\L} or @code{\U}.
|
|---|
| 886 | @end table
|
|---|
| 887 |
|
|---|
| 888 | When the @code{g} flag is being used, case conversion does not
|
|---|
| 889 | propagate from one occurrence of the regular expression to
|
|---|
| 890 | another. For example, when the following command is executed
|
|---|
| 891 | with @samp{a-b-} in pattern space:
|
|---|
| 892 | @example
|
|---|
| 893 | s/\(b\?\)-/x\u\1/g
|
|---|
| 894 | @end example
|
|---|
| 895 |
|
|---|
| 896 | @noindent
|
|---|
| 897 | the output is @samp{axxB}. When replacing the first @samp{-},
|
|---|
| 898 | the @samp{\u} sequence only affects the empty replacement of
|
|---|
| 899 | @samp{\1}. It does not affect the @code{x} character that is
|
|---|
| 900 | added to pattern space when replacing @code{b-} with @code{xB}.
|
|---|
| 901 |
|
|---|
| 902 | On the other hand, @code{\l} and @code{\u} do affect the remainder
|
|---|
| 903 | of the replacement text if they are followed by an empty substitution.
|
|---|
| 904 | With @samp{a-b-} in pattern space, the following command:
|
|---|
| 905 | @example
|
|---|
| 906 | s/\(b\?\)-/\u\1x/g
|
|---|
| 907 | @end example
|
|---|
| 908 |
|
|---|
| 909 | @noindent
|
|---|
| 910 | will replace @samp{-} with @samp{X} (uppercase) and @samp{b-} with
|
|---|
| 911 | @samp{Bx}. If this behavior is undesirable, you can prevent it by
|
|---|
| 912 | adding a @samp{\E} sequence---after @samp{\1} in this case.
|
|---|
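
The case-conversion sequences can be tried out with a short sketch like the one below. It assumes @value{SSED}, since these sequences are an extension, and a POSIX shell; the variable names are illustrative. The last two commands are the ones discussed above, applied to @samp{a-b-}.

```shell
# \U uppercases until \E; \u uppercases only the next character.
titled=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\U\1\E \u\2/')
echo "$titled"     # HELLO World

# \L lowercases the rest of the replacement.
lowered=$(echo 'HELLO' | sed 's/.*/\L&/')
echo "$lowered"    # hello

# \u after the literal x: the x itself is never affected.
first=$(echo 'a-b-' | sed 's/\(b\?\)-/x\u\1/g')
echo "$first"      # axxB

# \u before an empty \1 carries over to the following x.
second=$(echo 'a-b-' | sed 's/\(b\?\)-/\u\1x/g')
echo "$second"     # aXBx
```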
| 913 |
|
|---|
| 914 | To include a literal @code{\}, @code{&}, or newline in the final
|
|---|
| 915 | replacement, be sure to precede the desired @code{\}, @code{&},
|
|---|
| 916 | or newline in the @var{replacement} with a @code{\}.
|
|---|
| 917 |
|
|---|
| 918 | @findex s command, option flags
|
|---|
| 919 | @cindex Substitution of text, options
|
|---|
| 920 | The @code{s} command can be followed by zero or more of the
|
|---|
| 921 | following @var{flags}:
|
|---|
| 922 |
|
|---|
| 923 | @table @code
|
|---|
| 924 | @item g
|
|---|
| 925 | @cindex Global substitution
|
|---|
| 926 | @cindex Replacing all text matching regexp in a line
|
|---|
| 927 | Apply the replacement to @emph{all} matches to the @var{regexp},
|
|---|
| 928 | not just the first.
|
|---|
| 929 |
|
|---|
| 930 | @item @var{number}
|
|---|
| 931 | @cindex Replacing only @var{n}th match of regexp in a line
|
|---|
| 932 | Only replace the @var{number}th match of the @var{regexp}.
|
|---|
| 933 |
|
|---|
| 934 | @cindex GNU extensions, @code{g} and @var{number} modifier interaction in @code{s} command
|
|---|
| 936 | @cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command
|
|---|
| 937 | Note: the @sc{posix} standard does not specify what should happen
|
|---|
| 938 | when you mix the @code{g} and @var{number} modifiers,
|
|---|
| 939 | and currently there is no widely agreed upon meaning
|
|---|
| 940 | across @command{sed} implementations.
|
|---|
| 941 | For @value{SSED}, the interaction is defined to be:
|
|---|
| 942 | ignore matches before the @var{number}th,
|
|---|
| 943 | and then match and replace all matches from
|
|---|
| 944 | the @var{number}th on.
|
|---|
| 945 |
|
|---|
| 946 | @item p
|
|---|
| 947 | @cindex Text, printing after substitution
|
|---|
| 948 | If the substitution was made, then print the new pattern space.
|
|---|
| 949 |
|
|---|
| 950 | Note: when both the @code{p} and @code{e} options are specified,
|
|---|
| 951 | the relative ordering of the two produces very different results.
|
|---|
| 952 | In general, @code{ep} (evaluate then print) is what you want,
|
|---|
| 953 | but operating the other way round can be useful for debugging.
|
|---|
| 954 | For this reason, the current version of @value{SSED} interprets
|
|---|
| 955 | specially the presence of @code{p} options both before and after
|
|---|
| 956 | @code{e}, printing the pattern space before and after evaluation,
|
|---|
| 957 | while in general flags for the @code{s} command show their
|
|---|
| 958 | effect just once. This behavior, although documented, might
|
|---|
| 959 | change in future versions.
|
|---|
| 960 |
|
|---|
| 961 | @item w @var{filename}
|
|---|
| 962 | @cindex Text, writing to a file after substitution
|
|---|
| 963 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file
|
|---|
| 964 | @cindex @value{SSEDEXT}, @file{/dev/stderr} file
|
|---|
| 965 | If the substitution was made, then write out the result to the named file.
|
|---|
| 966 | As a @value{SSED} extension, two special values of @var{filename} are
|
|---|
| 967 | supported: @file{/dev/stderr}, which writes the result to the standard
|
|---|
| 968 | error, and @file{/dev/stdout}, which writes to the standard
|
|---|
| 969 | output.@footnote{This is equivalent to @code{p} unless the @option{-i}
|
|---|
| 970 | option is being used.}
|
|---|
| 971 |
|
|---|
| 972 | @item e
|
|---|
| 973 | @cindex Evaluate Bourne-shell commands, after substitution
|
|---|
| 974 | @cindex Subprocesses
|
|---|
| 975 | @cindex @value{SSEDEXT}, evaluating Bourne-shell commands
|
|---|
| 976 | @cindex @value{SSEDEXT}, subprocesses
|
|---|
| 977 | This command allows one to pipe input from a shell command
|
|---|
| 978 | into pattern space. If a substitution was made, the command
|
|---|
| 979 | that is found in pattern space is executed and pattern space
|
|---|
| 980 | is replaced with its output. A trailing newline is suppressed;
|
|---|
| 981 | results are undefined if the command to be executed contains
|
|---|
| 982 | a @sc{nul} character. This is a @value{SSED} extension.
|
|---|
| 983 |
|
|---|
| 984 | @item I
|
|---|
| 985 | @itemx i
|
|---|
| 986 | @cindex GNU extensions, @code{I} modifier
|
|---|
| 987 | @cindex Case-insensitive matching
|
|---|
| 988 | The @code{I} modifier to regular-expression matching is a GNU
|
|---|
| 989 | extension which makes @command{sed} match @var{regexp} in a
|
|---|
| 990 | case-insensitive manner.
|
|---|
| 991 |
|
|---|
| 992 | @item M
|
|---|
| 993 | @itemx m
|
|---|
| 994 | @cindex @value{SSEDEXT}, @code{M} modifier
|
|---|
| 995 | The @code{M} modifier to regular-expression matching is a @value{SSED}
|
|---|
| 996 | extension which directs @value{SSED} to match the regular expression
|
|---|
| 997 | in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
|
|---|
| 998 | match respectively (in addition to the normal behavior) the empty string
|
|---|
| 999 | after a newline, and the empty string before a newline. There are
|
|---|
| 1000 | special character sequences
|
|---|
| 1001 | @ifclear PERL
|
|---|
| 1002 | (@code{\`} and @code{\'})
|
|---|
| 1003 | @end ifclear
|
|---|
| 1004 | which always match the beginning or the end of the buffer.
|
|---|
| 1005 | In addition,
|
|---|
| 1006 | the period character does not match a new-line character in
|
|---|
| 1007 | multi-line mode.
|
|---|
| 1008 |
|
|---|
| 1009 |
|
|---|
| 1010 | @end table
|
|---|
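
A few of the flags above, sketched on small inputs. This assumes @value{SSED} (the @code{I} and @code{M} flags and the combination of @var{number} with @code{g} are extensions) and a POSIX shell; inputs and variable names are invented for the example.

```shell
# number: replace only the 2nd match.
second_only=$(echo 'a a a a' | sed 's/a/X/2')
echo "$second_only"    # a X a a

# number plus g: replace the 2nd match and all later ones.
from_second=$(echo 'a a a a' | sed 's/a/X/2g')
echo "$from_second"    # a X X X

# p with -n: print only lines on which a substitution was made.
changed=$(printf 'apple\nberry\n' | sed -n 's/apple/APPLE/p')
echo "$changed"        # APPLE

# I: match case-insensitively.
nocase=$(echo 'Hello HELLO hello' | sed 's/hello/X/Ig')
echo "$nocase"         # X X X

# M: with two lines in pattern space (joined by N), ^ also matches
# the empty string right after the embedded newline.
multi=$(printf 'a\nb\n' | sed 'N;s/^/> /Mg')
printf '%s\n' "$multi" # "> a" and "> b"
```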
| 1011 |
|
|---|
| 1012 | @node Common Commands
|
|---|
| 1013 | @section Often-Used Commands
|
|---|
| 1014 |
|
|---|
| 1015 | If you use @command{sed} at all, you will quite likely want to know
|
|---|
| 1016 | these commands.
|
|---|
| 1017 |
|
|---|
| 1018 | @table @code
|
|---|
| 1019 | @item #
|
|---|
| 1020 | [No addresses allowed.]
|
|---|
| 1021 |
|
|---|
| 1022 | @findex # (comments)
|
|---|
| 1023 | @cindex Comments, in scripts
|
|---|
| 1024 | The @code{#} character begins a comment;
|
|---|
| 1025 | the comment continues until the next newline.
|
|---|
| 1026 |
|
|---|
| 1027 | @cindex Portability, comments
|
|---|
| 1028 | If you are concerned about portability, be aware that
|
|---|
| 1029 | some implementations of @command{sed} (which are not @sc{posix}
|
|---|
| 1030 | conforming) may only support a single one-line comment,
|
|---|
| 1031 | and then only when the very first character of the script is a @code{#}.
|
|---|
| 1032 |
|
|---|
| 1033 | @findex -n, forcing from within a script
|
|---|
| 1034 | @cindex Caveat --- #n on first line
|
|---|
| 1035 | Warning: if the first two characters of the @command{sed} script
|
|---|
| 1036 | are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
|
|---|
| 1037 | If you want to put a comment in the first line of your script
|
|---|
| 1038 | and that comment begins with the letter @samp{n}
|
|---|
| 1039 | and you do not want this behavior,
|
|---|
| 1040 | then be sure to either use a capital @samp{N},
|
|---|
| 1041 | or place at least one space before the @samp{n}.
|
|---|
| 1042 |
|
|---|
| 1043 | @item q [@var{exit-code}]
|
|---|
| 1044 | @findex q (quit) command
|
|---|
| 1045 | @cindex @value{SSEDEXT}, returning an exit code
|
|---|
| 1046 | @cindex Quitting
|
|---|
| 1047 | Exit @command{sed} without processing any more commands or input.
|
|---|
| 1048 |
|
|---|
| 1049 | Example: stop after printing the second line:
|
|---|
| 1050 | @example
|
|---|
| 1051 | $ seq 3 | sed 2q
|
|---|
| 1052 | 1
|
|---|
| 1053 | 2
|
|---|
| 1054 | @end example
|
|---|
| 1055 |
|
|---|
| 1056 | This command accepts only one address.
|
|---|
| 1057 | Note that the current pattern space is printed if auto-print is
|
|---|
| 1058 | not disabled with the @option{-n} option.  The ability to return
|
|---|
| 1059 | an exit code from the @command{sed} script is a @value{SSED} extension.
|
|---|
| 1060 |
|
|---|
| 1061 | See also the @value{SSED} extension @code{Q} command which quits silently
|
|---|
| 1062 | without printing the current pattern space.
|
|---|
| 1063 |
|
|---|
| 1064 | @item d
|
|---|
| 1065 | @findex d (delete) command
|
|---|
| 1066 | @cindex Text, deleting
|
|---|
| 1067 | Delete the pattern space;
|
|---|
| 1068 | immediately start next cycle.
|
|---|
| 1069 |
|
|---|
| 1070 | Example: delete the second input line:
|
|---|
| 1071 | @example
|
|---|
| 1072 | $ seq 3 | sed 2d
|
|---|
| 1073 | 1
|
|---|
| 1074 | 3
|
|---|
| 1075 | @end example
|
|---|
| 1076 |
|
|---|
| 1077 | @item p
|
|---|
| 1078 | @findex p (print) command
|
|---|
| 1079 | @cindex Text, printing
|
|---|
| 1080 | Print out the pattern space (to the standard output).
|
|---|
| 1081 | This command is usually only used in conjunction with the @option{-n}
|
|---|
| 1082 | command-line option.
|
|---|
| 1083 |
|
|---|
| 1084 | Example: print only the second input line:
|
|---|
| 1085 | @example
|
|---|
| 1086 | $ seq 3 | sed -n 2p
|
|---|
| 1087 | 2
|
|---|
| 1088 | @end example
|
|---|
| 1089 |
|
|---|
| 1090 | @item n
|
|---|
| 1091 | @findex n (next-line) command
|
|---|
| 1092 | @cindex Next input line, replace pattern space with
|
|---|
| 1093 | @cindex Read next input line
|
|---|
| 1094 | If auto-print is not disabled, print the pattern space,
|
|---|
| 1095 | then, regardless, replace the pattern space with the next line of input.
|
|---|
| 1096 | If there is no more input then @command{sed} exits without processing
|
|---|
| 1097 | any more commands.
|
|---|
| 1098 |
|
|---|
| 1099 | This command is useful to skip lines (e.g. process every Nth line).
|
|---|
| 1100 |
|
|---|
| 1101 | Example: perform substitution on every 3rd line (i.e. two @code{n} commands
|
|---|
| 1102 | skip two lines):
|
|---|
| 1103 | @codequoteundirected on
|
|---|
| 1104 | @codequotebacktick on
|
|---|
| 1105 | @example
|
|---|
| 1106 | $ seq 6 | sed 'n;n;s/./x/'
|
|---|
| 1107 | 1
|
|---|
| 1108 | 2
|
|---|
| 1109 | x
|
|---|
| 1110 | 4
|
|---|
| 1111 | 5
|
|---|
| 1112 | x
|
|---|
| 1113 | @end example
|
|---|
| 1114 |
|
|---|
| 1115 | @value{SSED} provides an extension address syntax of @var{first}~@var{step}
|
|---|
| 1116 | to achieve the same result:
|
|---|
| 1117 |
|
|---|
| 1118 | @example
|
|---|
| 1119 | $ seq 6 | sed '0~3s/./x/'
|
|---|
| 1120 | 1
|
|---|
| 1121 | 2
|
|---|
| 1122 | x
|
|---|
| 1123 | 4
|
|---|
| 1124 | 5
|
|---|
| 1125 | x
|
|---|
| 1126 | @end example
|
|---|
| 1127 |
|
|---|
| 1128 | @codequotebacktick off
|
|---|
| 1129 | @codequoteundirected off
|
|---|
| 1130 |
|
|---|
| 1131 |
|
|---|
| 1132 | @item @{ @var{commands} @}
|
|---|
| 1133 | @findex @{@} command grouping
|
|---|
| 1134 | @cindex Grouping commands
|
|---|
| 1135 | @cindex Command groups
|
|---|
| 1136 | A group of commands may be enclosed between
|
|---|
| 1137 | @code{@{} and @code{@}} characters.
|
|---|
| 1138 | This is particularly useful when you want a group of commands
|
|---|
| 1139 | to be triggered by a single address (or address-range) match.
|
|---|
| 1140 |
|
|---|
| 1141 | Example: perform substitution then print the second input line:
|
|---|
| 1142 | @codequoteundirected on
|
|---|
| 1143 | @codequotebacktick on
|
|---|
| 1144 | @example
|
|---|
| 1145 | $ seq 3 | sed -n '2@{s/2/X/ ; p@}'
|
|---|
| 1146 | X
|
|---|
| 1147 | @end example
|
|---|
| 1148 | @codequoteundirected off
|
|---|
| 1149 | @codequotebacktick off
|
|---|
| 1150 |
|
|---|
| 1151 | @end table
|
|---|
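
The following sketch exercises some details of the commands above: the @code{#n} first line, the exit code of @code{q}, the silent @code{Q}, and a group applied to an address range. It assumes @value{SSED} and a POSIX shell; the variable names are illustrative only.

```shell
# A script whose first line is exactly "#n" behaves as if -n were given.
only_second=$(printf '1\n2\n3\n' | sed '#n
2p')
echo "$only_second"          # 2

# q with an exit code: line 2 is still auto-printed before quitting.
status=0
printed=$(printf '1\n2\n3\n' | sed '2q5') || status=$?
printf '%s\n' "$printed"     # 1 and 2
echo "exit status: $status"  # exit status: 5

# Q quits without printing the current pattern space.
quiet=$(printf '1\n2\n3\n' | sed '2Q')
echo "$quiet"                # 1

# Braces apply several commands to one address range.
grouped=$(printf '1\n2\n3\n4\n' | sed -n '2,3{s/^/line /;p}')
printf '%s\n' "$grouped"     # "line 2" and "line 3"
```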
| 1152 |
|
|---|
| 1153 |
|
|---|
| 1154 | @node Other Commands
|
|---|
| 1155 | @section Less Frequently-Used Commands
|
|---|
| 1156 |
|
|---|
| 1157 | Though perhaps less frequently used than those in the previous
|
|---|
| 1158 | section, some very small yet useful @command{sed} scripts can be built with
|
|---|
| 1159 | these commands.
|
|---|
| 1160 |
|
|---|
| 1161 | @table @code
|
|---|
| 1162 | @item y/@var{source-chars}/@var{dest-chars}/
|
|---|
| 1163 | @findex y (transliterate) command
|
|---|
| 1164 | @cindex Transliteration
|
|---|
| 1165 | Transliterate any characters in the pattern space which match
|
|---|
| 1166 | any of the @var{source-chars} with the corresponding character
|
|---|
| 1167 | in @var{dest-chars}.
|
|---|
| 1168 |
|
|---|
| 1169 | Example: transliterate @samp{a-j} into @samp{0-9}:
|
|---|
| 1170 | @codequoteundirected on
|
|---|
| 1171 | @codequotebacktick on
|
|---|
| 1172 | @example
|
|---|
| 1173 | $ echo hello world | sed 'y/abcdefghij/0123456789/'
|
|---|
| 1174 | 74llo worl3
|
|---|
| 1175 | @end example
|
|---|
| 1176 | @codequoteundirected off
|
|---|
| 1177 | @codequotebacktick off
|
|---|
| 1178 |
|
|---|
| 1179 | (The @code{/} characters may be uniformly replaced by
|
|---|
| 1180 | any other single character within any given @code{y} command.)
|
|---|
| 1181 |
|
|---|
| 1182 | Instances of the @code{/} (or whatever other character is used in its stead),
|
|---|
| 1183 | @code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars}
|
|---|
| 1184 | lists, provided that each instance is escaped by a @code{\}.
|
|---|
| 1185 | The @var{source-chars} and @var{dest-chars} lists @emph{must}
|
|---|
| 1186 | contain the same number of characters (after de-escaping).
|
|---|
| 1187 |
|
|---|
| 1188 | See the @command{tr} command from GNU coreutils for similar functionality.
|
|---|
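
The delimiter and escaping rules for @code{y} can be sketched as follows, assuming a POSIX shell; the inputs and variable names are made up for the example.

```shell
# Map lowercase to uppercase; both lists contain 26 characters.
upper=$(echo 'hello' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')
echo "$upper"      # HELLO

# A comma as delimiter lets / appear unescaped in the lists.
dashed=$(echo 'a/b/c' | sed 'y,/,-,')
echo "$dashed"     # a-b-c

# With the default delimiter, the same / must be escaped.
dashed_again=$(echo 'a/b/c' | sed 'y/\//-/')
echo "$dashed_again"   # a-b-c
```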
| 1189 |
|
|---|
| 1190 | @item a @var{text}
|
|---|
| 1191 | Append @var{text} after a line.  This is a GNU extension
|
|---|
| 1192 | to the standard @code{a} command; see below for details.
|
|---|
| 1193 |
|
|---|
| 1194 | Example: Add @samp{hello} after the second line:
|
|---|
| 1195 | @codequoteundirected on
|
|---|
| 1196 | @codequotebacktick on
|
|---|
| 1197 | @example
|
|---|
| 1198 | $ seq 3 | sed '2a hello'
|
|---|
| 1199 | 1
|
|---|
| 1200 | 2
|
|---|
| 1201 | hello
|
|---|
| 1202 | 3
|
|---|
| 1203 | @end example
|
|---|
| 1204 | @codequoteundirected off
|
|---|
| 1205 | @codequotebacktick off
|
|---|
| 1206 |
|
|---|
| 1207 | Leading whitespace after the @code{a} command is ignored.
|
|---|
| 1208 | The text to add is read until the end of the line.
|
|---|
| 1209 |
|
|---|
| 1210 |
|
|---|
| 1211 | @item a\
|
|---|
| 1212 | @itemx @var{text}
|
|---|
| 1213 | @findex a (append text lines) command
|
|---|
| 1214 | @cindex Appending text after a line
|
|---|
| 1215 | @cindex Text, appending
|
|---|
| 1216 | Append @var{text} after a line.
|
|---|
| 1217 |
|
|---|
| 1218 | Example: Add @samp{hello} after the second line
|
|---|
| 1219 | (@print{} indicates printed output lines):
|
|---|
| 1220 | @codequoteundirected on
|
|---|
| 1221 | @codequotebacktick on
|
|---|
| 1222 | @example
|
|---|
| 1223 | $ seq 3 | sed '2a\
|
|---|
| 1224 | hello'
|
|---|
| 1225 | @print{}1
|
|---|
| 1226 | @print{}2
|
|---|
| 1227 | @print{}hello
|
|---|
| 1228 | @print{}3
|
|---|
| 1229 | @end example
|
|---|
| 1230 | @codequoteundirected off
|
|---|
| 1231 | @codequotebacktick off
|
|---|
| 1232 |
|
|---|
| 1233 | The @code{a} command queues the lines of text which follow this command
|
|---|
| 1234 | (each but the last ending with a @code{\},
|
|---|
| 1235 | which are removed from the output)
|
|---|
| 1236 | to be output at the end of the current cycle,
|
|---|
| 1237 | or when the next input line is read.
|
|---|
| 1238 |
|
|---|
| 1239 | @cindex @value{SSEDEXT}, two addresses supported by most commands
|
|---|
| 1240 | As a GNU extension, this command accepts two addresses.
|
|---|
| 1241 |
|
|---|
| 1242 | Escape sequences in @var{text} are processed, so you should
|
|---|
| 1243 | use @code{\\} in @var{text} to print a single backslash.
|
|---|
| 1244 |
|
|---|
| 1245 | Commands resume after the first line of @var{text} that does not end
|
|---|
| 1246 | with a backslash (@code{\}), which is @samp{world} in the following example:
|
|---|
| 1247 | @codequoteundirected on
|
|---|
| 1248 | @codequotebacktick on
|
|---|
| 1249 | @example
|
|---|
| 1250 | $ seq 3 | sed '2a\
|
|---|
| 1251 | hello\
|
|---|
| 1252 | world
|
|---|
| 1253 | 3s/./X/'
|
|---|
| 1254 | @print{}1
|
|---|
| 1255 | @print{}2
|
|---|
| 1256 | @print{}hello
|
|---|
| 1257 | @print{}world
|
|---|
| 1258 | @print{}X
|
|---|
| 1259 | @end example
|
|---|
| 1260 | @codequoteundirected off
|
|---|
| 1261 | @codequotebacktick off
|
|---|
| 1262 |
|
|---|
| 1263 | As a GNU extension, the @code{a} command and @var{text} can be
|
|---|
| 1264 | separated into two @code{-e} parameters, enabling easier scripting:
|
|---|
| 1265 | @codequoteundirected on
|
|---|
| 1266 | @codequotebacktick on
|
|---|
| 1267 | @example
|
|---|
| 1268 | $ seq 3 | sed -e '2a\' -e hello
|
|---|
| 1269 | 1
|
|---|
| 1270 | 2
|
|---|
| 1271 | hello
|
|---|
| 1272 | 3
|
|---|
| 1273 |
|
|---|
| 1274 | $ sed -e '2a\' -e "$VAR"
|
|---|
| 1275 | @end example
|
|---|
| 1276 | @codequoteundirected off
|
|---|
| 1277 | @codequotebacktick off
|
|---|
| 1278 |
|
|---|
| 1279 | @item i @var{text}
|
|---|
| 1280 | Insert @var{text} before a line.  This is a GNU extension
|
|---|
| 1281 | to the standard @code{i} command; see below for details.
|
|---|
| 1282 |
|
|---|
| 1283 | Example: Insert @samp{hello} before the second line:
|
|---|
| 1284 | @codequoteundirected on
|
|---|
| 1285 | @codequotebacktick on
|
|---|
| 1286 | @example
|
|---|
| 1287 | $ seq 3 | sed '2i hello'
|
|---|
| 1288 | 1
|
|---|
| 1289 | hello
|
|---|
| 1290 | 2
|
|---|
| 1291 | 3
|
|---|
| 1292 | @end example
|
|---|
| 1293 | @codequoteundirected off
|
|---|
| 1294 | @codequotebacktick off
|
|---|
| 1295 |
|
|---|
| 1296 | Leading whitespace after the @code{i} command is ignored.
|
|---|
| 1297 | The text to add is read until the end of the line.
|
|---|
| 1298 |
|
|---|
| 1299 | @anchor{insert command}
|
|---|
| 1300 | @item i\
|
|---|
| 1301 | @itemx @var{text}
|
|---|
| 1302 | @findex i (insert text lines) command
|
|---|
| 1303 | @cindex Inserting text before a line
|
|---|
| 1304 | @cindex Text, insertion
|
|---|
| 1305 | Immediately output the lines of text which follow this command.
|
|---|
| 1306 |
|
|---|
| 1307 | Example: Insert @samp{hello} before the second line
|
|---|
| 1308 | (@print{} indicates printed output lines):
|
|---|
| 1309 | @codequoteundirected on
|
|---|
| 1310 | @codequotebacktick on
|
|---|
| 1311 | @example
|
|---|
| 1312 | $ seq 3 | sed '2i\
|
|---|
| 1313 | hello'
|
|---|
| 1314 | @print{}1
|
|---|
| 1315 | @print{}hello
|
|---|
| 1316 | @print{}2
|
|---|
| 1317 | @print{}3
|
|---|
| 1318 | @end example
|
|---|
| 1319 | @codequoteundirected off
|
|---|
| 1320 | @codequotebacktick off
|
|---|
| 1321 |
|
|---|
| 1322 | @cindex @value{SSEDEXT}, two addresses supported by most commands
|
|---|
| 1323 | As a GNU extension, this command accepts two addresses.
|
|---|
| 1324 |
|
|---|
| 1325 | Escape sequences in @var{text} are processed, so you should
|
|---|
| 1326 | use @code{\\} in @var{text} to print a single backslash.
|
|---|
| 1327 |
|
|---|
| 1328 | Commands resume after the first line of @var{text} that does not end
|
|---|
| 1329 | with a backslash (@code{\}), which is @samp{world} in the following example:
|
|---|
| 1330 | @codequoteundirected on
|
|---|
| 1331 | @codequotebacktick on
|
|---|
| 1332 | @example
|
|---|
| 1333 | $ seq 3 | sed '2i\
|
|---|
| 1334 | hello\
|
|---|
| 1335 | world
|
|---|
| 1336 | s/./X/'
|
|---|
| 1337 | @print{}X
|
|---|
| 1338 | @print{}hello
|
|---|
| 1339 | @print{}world
|
|---|
| 1340 | @print{}X
|
|---|
| 1341 | @print{}X
|
|---|
| 1342 | @end example
|
|---|
| 1343 | @codequoteundirected off
|
|---|
| 1344 | @codequotebacktick off
|
|---|
| 1345 |
|
|---|
| 1346 | As a GNU extension, the @code{i} command and @var{text} can be
|
|---|
| 1347 | separated into two @code{-e} parameters, enabling easier scripting:
|
|---|
| 1348 | @codequoteundirected on
|
|---|
| 1349 | @codequotebacktick on
|
|---|
| 1350 | @example
|
|---|
| 1351 | $ seq 3 | sed -e '2i\' -e hello
|
|---|
| 1352 | 1
|
|---|
| 1353 | hello
|
|---|
| 1354 | 2
|
|---|
| 1355 | 3
|
|---|
| 1356 |
|
|---|
| 1357 | $ sed -e '2i\' -e "$VAR"
|
|---|
| 1358 | @end example
|
|---|
| 1359 | @codequoteundirected off
|
|---|
| 1360 | @codequotebacktick off
|
|---|
| 1361 |
|
|---|
| 1362 | @item c @var{text}
|
|---|
| 1363 | Replace the line(s) with @var{text}.  This is a GNU extension
|
|---|
| 1364 | to the standard @code{c} command; see below for details.
|
|---|
| 1365 |
|
|---|
| 1366 | Example: Replace the 2nd to 9th lines with the word @samp{hello}:
|
|---|
| 1367 | @codequoteundirected on
|
|---|
| 1368 | @codequotebacktick on
|
|---|
| 1369 | @example
|
|---|
| 1370 | $ seq 10 | sed '2,9c hello'
|
|---|
| 1371 | 1
|
|---|
| 1372 | hello
|
|---|
| 1373 | 10
|
|---|
| 1374 | @end example
|
|---|
| 1375 | @codequoteundirected off
|
|---|
| 1376 | @codequotebacktick off
|
|---|
| 1377 |
|
|---|
| 1378 | Leading whitespace after the @code{c} command is ignored.
|
|---|
| 1379 | The text to add is read until the end of the line.
|
|---|
| 1380 |
|
|---|
| 1381 | @item c\
|
|---|
| 1382 | @itemx @var{text}
|
|---|
| 1383 | @findex c (change to text lines) command
|
|---|
| 1384 | @cindex Replacing selected lines with other text
|
|---|
| 1385 | Delete the lines matching the address or address-range,
|
|---|
| 1386 | and output the lines of text which follow this command.
|
|---|
| 1387 |
|
|---|
| 1388 | Example: Replace 2nd to 4th lines with the words @samp{hello} and
|
|---|
| 1389 | @samp{world} (@print{} indicates printed output lines):
|
|---|
| 1390 | @codequoteundirected on
|
|---|
| 1391 | @codequotebacktick on
|
|---|
| 1392 | @example
|
|---|
| 1393 | $ seq 5 | sed '2,4c\
|
|---|
| 1394 | hello\
|
|---|
| 1395 | world'
|
|---|
| 1396 | @print{}1
|
|---|
| 1397 | @print{}hello
|
|---|
| 1398 | @print{}world
|
|---|
| 1399 | @print{}5
|
|---|
| 1400 | @end example
|
|---|
| 1401 | @codequoteundirected off
|
|---|
| 1402 | @codequotebacktick off
|
|---|
| 1403 |
|
|---|
| 1404 | If no addresses are given, each line is replaced.
|
|---|
| 1405 |
|
|---|
| 1406 | A new cycle is started after this command is done,
|
|---|
| 1407 | since the pattern space will have been deleted.
|
|---|
| 1408 | In the following example, the @code{c} command starts a
|
|---|
| 1409 | new cycle and the substitution command is not performed
|
|---|
| 1410 | on the replaced text:
|
|---|
| 1411 |
|
|---|
| 1412 | @codequoteundirected on
|
|---|
| 1413 | @codequotebacktick on
|
|---|
| 1414 | @example
|
|---|
| 1415 | $ seq 3 | sed '2c\
|
|---|
| 1416 | hello
|
|---|
| 1417 | s/./X/'
|
|---|
| 1418 | @print{}X
|
|---|
| 1419 | @print{}hello
|
|---|
| 1420 | @print{}X
|
|---|
| 1421 | @end example
|
|---|
| 1422 | @codequoteundirected off
|
|---|
| 1423 | @codequotebacktick off
|
|---|
| 1424 |
|
|---|
| 1425 | As a GNU extension, the @code{c} command and @var{text} can be
|
|---|
| 1426 | separated into two @code{-e} parameters, enabling easier scripting:
|
|---|
| 1427 | @codequoteundirected on
|
|---|
| 1428 | @codequotebacktick on
|
|---|
| 1429 | @example
|
|---|
| 1430 | $ seq 3 | sed -e '2c\' -e hello
|
|---|
| 1431 | 1
|
|---|
| 1432 | hello
|
|---|
| 1433 | 3
|
|---|
| 1434 |
|
|---|
| 1435 | $ sed -e '2c\' -e "$VAR"
|
|---|
| 1436 | @end example
|
|---|
| 1437 | @codequoteundirected off
|
|---|
| 1438 | @codequotebacktick off
|
|---|
| 1439 |
|
|---|
| 1440 |
|
|---|
| 1441 | @item =
|
|---|
| 1442 | @findex = (print line number) command
|
|---|
| 1443 | @cindex Printing line number
|
|---|
| 1444 | @cindex Line number, printing
|
|---|
| 1445 | Print out the current input line number (with a trailing newline).
|
|---|
| 1446 |
|
|---|
| 1447 | @codequoteundirected on
|
|---|
| 1448 | @codequotebacktick on
|
|---|
| 1449 | @example
|
|---|
| 1450 | $ printf '%s\n' aaa bbb ccc | sed =
|
|---|
| 1451 | 1
|
|---|
| 1452 | aaa
|
|---|
| 1453 | 2
|
|---|
| 1454 | bbb
|
|---|
| 1455 | 3
|
|---|
| 1456 | ccc
|
|---|
| 1457 | @end example
|
|---|
| 1458 | @codequoteundirected off
|
|---|
| 1459 | @codequotebacktick off
|
|---|
| 1460 |
|
|---|
| 1461 | @cindex @value{SSEDEXT}, two addresses supported by most commands
|
|---|
| 1462 | As a GNU extension, this command accepts two addresses.
|
|---|
| 1463 |
|
|---|
| 1464 |
|
|---|
| 1465 |
|
|---|
| 1466 |
|
|---|
| 1467 | @item l @var{n}
|
|---|
| 1468 | @findex l (list unambiguously) command
|
|---|
| 1469 | @cindex List pattern space
|
|---|
| 1470 | @cindex Printing text unambiguously
|
|---|
| 1471 | @cindex Line length, setting
|
|---|
| 1472 | @cindex @value{SSEDEXT}, setting line length
|
|---|
| 1473 | Print the pattern space in an unambiguous form:
|
|---|
| 1474 | non-printable characters (and the @code{\} character)
|
|---|
| 1475 | are printed in C-style escaped form; long lines are split,
|
|---|
| 1476 | with a trailing @code{\} character to indicate the split;
|
|---|
| 1477 | the end of each line is marked with a @code{$}.
|
|---|
| 1478 |
|
|---|
| 1479 | @var{n} specifies the desired line-wrap length;
|
|---|
| 1480 | a length of 0 (zero) means to never wrap long lines. If omitted,
|
|---|
| 1481 | the default as specified on the command line is used. The @var{n}
|
|---|
| 1482 | parameter is a @value{SSED} extension.
|
|---|
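
The escaping described above can be illustrated with a minimal sketch.
It assumes @value{SSED} and a POSIX shell; the variable name @code{listed}
is purely illustrative.

```shell
# Assumes GNU sed. -n suppresses autoprint so only the output of l appears.
# The input line holds: a, TAB, b, backslash, c.
listed=$(printf 'a\tb\\c\n' | sed -n 'l')
printf '%s\n' "$listed"
```

The listing should read @samp{a\tb\\c$}: the tab and the backslash are
escaped, and @samp{$} marks the end of the line.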
| 1483 |
|
|---|
| 1484 | @item r @var{filename}
|
|---|
| 1485 |
|
|---|
| 1486 | @findex r (read file) command
|
|---|
| 1487 | @cindex Read text from a file
|
|---|
| 1488 | Reads file @var{filename}. Example:
|
|---|
| 1489 |
|
|---|
| 1490 | @codequoteundirected on
|
|---|
| 1491 | @codequotebacktick on
|
|---|
| 1492 | @example
|
|---|
| 1493 | $ seq 3 | sed '2r/etc/hostname'
|
|---|
| 1494 | 1
|
|---|
| 1495 | 2
|
|---|
| 1496 | fencepost.gnu.org
|
|---|
| 1497 | 3
|
|---|
| 1498 | @end example
|
|---|
| 1499 | @codequoteundirected off
|
|---|
| 1500 | @codequotebacktick off
|
|---|
| 1501 |
|
|---|
| 1502 | @cindex @value{SSEDEXT}, @file{/dev/stdin} file
|
|---|
| 1503 | Queue the contents of @var{filename} to be read and
|
|---|
| 1504 | inserted into the output stream at the end of the current cycle,
|
|---|
| 1505 | or when the next input line is read.
|
|---|
| 1506 | Note that if @var{filename} cannot be read, it is treated as
|
|---|
| 1507 | if it were an empty file, without any error indication.
|
|---|
| 1508 |
|
|---|
| 1509 | As a @value{SSED} extension, the special value @file{/dev/stdin}
|
|---|
| 1510 | is supported for the file name, which reads the contents of the
|
|---|
| 1511 | standard input.
|
|---|
| 1512 |
|
|---|
| 1513 | @cindex @value{SSEDEXT}, two addresses supported by most commands
|
|---|
| 1514 | As a GNU extension, this command accepts two addresses. The
|
|---|
| 1515 | file will then be reread and inserted on each of the addressed lines.
|
|---|
| 1516 |
|
|---|
| 1517 | As a @value{SSED} extension, the @code{r} command accepts a zero address,
|
|---|
| 1518 | inserting a file @emph{before} the first line of the input
|
|---|
| 1519 | (@pxref{Adding a header to multiple files}).
|
|---|
| 1520 |
|
|---|
| 1521 | @item w @var{filename}
|
|---|
| 1522 | @findex w (write file) command
|
|---|
| 1523 | @cindex Write to a file
|
|---|
| 1524 | @cindex @value{SSEDEXT}, @file{/dev/stdout} file
|
|---|
| 1525 | @cindex @value{SSEDEXT}, @file{/dev/stderr} file
|
|---|
| 1526 | Write the pattern space to @var{filename}.
|
|---|
| 1527 | As a @value{SSED} extension, two special values of @var{filename} are
|
|---|
| 1528 | supported: @file{/dev/stderr}, which writes the result to the standard
|
|---|
| 1529 | error, and @file{/dev/stdout}, which writes to the standard
|
|---|
| 1530 | output.@footnote{This is equivalent to @code{p} unless the @option{-i}
|
|---|
| 1531 | option is being used.}
|
|---|
| 1532 |
|
|---|
| 1533 | The file will be created (or truncated) before the first input line is
|
|---|
| 1534 | read; all @code{w} commands (including instances of the @code{w} flag
|
|---|
| 1535 | on successful @code{s} commands) which refer to the same @var{filename}
|
|---|
| 1536 | are output without closing and reopening the file.
|
|---|
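
As a hedged illustration of the @code{w} command, the sketch below writes
two addressed lines to a temporary file. It assumes @value{SSED} and the
@command{mktemp} utility; the variable names are illustrative only.

```shell
# Assumes GNU sed and mktemp; the temporary file name is illustrative.
tmp=$(mktemp)
# Write lines 2 and 3 to the file; -n keeps them off standard output.
printed=$(seq 4 | sed -n "2,3w $tmp")
saved=$(cat "$tmp")
rm -f "$tmp"
printf 'printed: [%s]\nsaved:\n%s\n' "$printed" "$saved"
```

Nothing is printed by @command{sed} itself, and the file receives the
lines @samp{2} and @samp{3}.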
| 1537 |
|
|---|
| 1538 | @item D
|
|---|
| 1539 | @findex D (delete first line) command
|
|---|
| 1540 | @cindex Delete first line from pattern space
|
|---|
| 1541 | If the pattern space contains no newline, start a normal new cycle as if
|
|---|
| 1542 | the @code{d} command were issued. Otherwise, delete text in the pattern
|
|---|
| 1543 | space up to the first newline, and restart the cycle with the resultant
|
|---|
| 1544 | pattern space, without reading a new line of input.
|
|---|
| 1545 |
|
|---|
| 1546 | @item N
|
|---|
| 1547 | @findex N (append Next line) command
|
|---|
| 1548 | @cindex Next input line, append to pattern space
|
|---|
| 1549 | @cindex Append next input line to pattern space
|
|---|
| 1550 | Add a newline to the pattern space,
|
|---|
| 1551 | then append the next line of input to the pattern space.
|
|---|
| 1552 | If there is no more input then @command{sed} exits without processing
|
|---|
| 1553 | any more commands.
|
|---|
| 1554 |
|
|---|
| 1555 | When @option{-z} is used, a zero byte (the ASCII @samp{NUL} character) is
|
|---|
| 1556 | added between the lines (instead of a newline).
|
|---|
| 1557 |
|
|---|
| 1558 | If there is no next input line, @value{SSED} prints the pattern space before
|
|---|
| 1559 | exiting (unless @option{-n} is used). This is a GNU extension which can be disabled with @option{--posix}.
|
|---|
| 1560 | @xref{N_command_last_line,,N command on the last line}.
|
|---|
| 1561 |
|
|---|
| 1562 |
|
|---|
| 1563 | @item P
|
|---|
| 1564 | @findex P (print first line) command
|
|---|
| 1565 | @cindex Print first line from pattern space
|
|---|
| 1566 | Print out the portion of the pattern space up to the first newline.
|
|---|
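
The @code{N}, @code{P} and @code{D} commands are typically used together.
The following is a minimal sketch, assuming @value{SSED} (the @samp{\n}
escape in a regular expression is relied upon); the variable names are
illustrative.

```shell
# Join each pair of input lines with a comma.
pairs=$(seq 6 | sed 'N;s/\n/,/')
printf '%s\n' "$pairs"

# Drop consecutive duplicate lines, similar to uniq(1):
#   $!N  append the next line unless this is the last one
#   P    print the first line unless both lines are identical
#   D    delete the first line and restart the cycle without reading input
deduped=$(printf '%s\n' a a b c c | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"
```

The first command prints @samp{1,2}, @samp{3,4} and @samp{5,6}; the second
prints @samp{a}, @samp{b} and @samp{c}.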
| 1567 |
|
|---|
| 1568 | @item h
|
|---|
| 1569 | @findex h (hold) command
|
|---|
| 1570 | @cindex Copy pattern space into hold space
|
|---|
| 1571 | @cindex Replace hold space with copy of pattern space
|
|---|
| 1572 | @cindex Hold space, copying pattern space into
|
|---|
| 1573 | Replace the contents of the hold space with the contents of the pattern space.
|
|---|
| 1574 |
|
|---|
| 1575 | @item H
|
|---|
| 1576 | @findex H (append Hold) command
|
|---|
| 1577 | @cindex Append pattern space to hold space
|
|---|
| 1578 | @cindex Hold space, appending from pattern space
|
|---|
| 1579 | Append a newline to the contents of the hold space,
|
|---|
| 1580 | and then append the contents of the pattern space to that of the hold space.
|
|---|
| 1581 |
|
|---|
| 1582 | @item g
|
|---|
| 1583 | @findex g (get) command
|
|---|
| 1584 | @cindex Copy hold space into pattern space
|
|---|
| 1585 | @cindex Replace pattern space with copy of hold space
|
|---|
| 1586 | @cindex Hold space, copy into pattern space
|
|---|
| 1587 | Replace the contents of the pattern space with the contents of the hold space.
|
|---|
| 1588 |
|
|---|
| 1589 | @item G
|
|---|
| 1590 | @findex G (appending Get) command
|
|---|
| 1591 | @cindex Append hold space to pattern space
|
|---|
| 1592 | @cindex Hold space, appending to pattern space
|
|---|
| 1593 | Append a newline to the contents of the pattern space,
|
|---|
| 1594 | and then append the contents of the hold space to that of the pattern space.
|
|---|
| 1595 |
|
|---|
| 1596 | @item x
|
|---|
| 1597 | @findex x (eXchange) command
|
|---|
| 1598 | @cindex Exchange hold space with pattern space
|
|---|
| 1599 | @cindex Hold space, exchange with pattern space
|
|---|
| 1600 | Exchange the contents of the hold and pattern spaces.
|
|---|
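
The hold space commands are easiest to understand from small scripts.
The sketch below assumes @value{SSED} and a POSIX shell; the variable
names are illustrative.

```shell
# Reverse the order of lines (like tac):
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the accumulated text back into the hold space
#   $p   print the result on the last line
reversed=$(seq 3 | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"

# Join all lines with commas:
#   H    append each line to the hold space
#   $!d  on every line but the last, start the next cycle
#   x    on the last line, fetch the accumulated text
joined=$(seq 3 | sed -n 'H;$!d;x;s/\n/,/g;s/^,//;p')
printf '%s\n' "$joined"
```

The first command prints @samp{3}, @samp{2}, @samp{1} on separate lines;
the second prints @samp{1,2,3}.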
| 1601 |
|
|---|
| 1602 | @end table
|
|---|
| 1603 |
|
|---|
| 1604 |
|
|---|
| 1605 | @node Programming Commands
|
|---|
| 1606 | @section Commands for @command{sed} gurus
|
|---|
| 1607 |
|
|---|
| 1608 | In most cases, use of these commands indicates that you are
|
|---|
| 1609 | probably better off programming in something like @command{awk}
|
|---|
| 1610 | or Perl. But occasionally one is committed to sticking
|
|---|
| 1611 | with @command{sed}, and these commands can enable one to write
|
|---|
| 1612 | quite convoluted scripts.
|
|---|
| 1613 |
|
|---|
| 1614 | @cindex Flow of control in scripts
|
|---|
| 1615 | @table @code
|
|---|
| 1616 | @item : @var{label}
|
|---|
| 1617 | [No addresses allowed.]
|
|---|
| 1618 |
|
|---|
| 1619 | @findex : (label) command
|
|---|
| 1620 | @cindex Labels, in scripts
|
|---|
| 1621 | Specify the location of @var{label} for branch commands.
|
|---|
| 1622 | In all other respects, a no-op.
|
|---|
| 1623 |
|
|---|
| 1624 | @item b @var{label}
|
|---|
| 1625 | @findex b (branch) command
|
|---|
| 1626 | @cindex Branch to a label, unconditionally
|
|---|
| 1627 | @cindex Goto, in scripts
|
|---|
| 1628 | Unconditionally branch to @var{label}.
|
|---|
| 1629 | The @var{label} may be omitted, in which case the next cycle is started.
|
|---|
| 1630 |
|
|---|
| 1631 | @item t @var{label}
|
|---|
| 1632 | @findex t (test and branch if successful) command
|
|---|
| 1633 | @cindex Branch to a label, if @code{s///} succeeded
|
|---|
| 1634 | @cindex Conditional branch
|
|---|
| 1635 | Branch to @var{label} only if there has been a successful @code{s}ubstitution
|
|---|
| 1636 | since the last input line was read or conditional branch was taken.
|
|---|
| 1637 | The @var{label} may be omitted, in which case the next cycle is started.
|
|---|
| 1638 |
|
|---|
| 1639 | @end table
|
|---|
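
The branch commands above can be sketched as follows. This is a minimal
illustration, not a recommended idiom; the label name @samp{again} and the
variable names are arbitrary.

```shell
# Loop with t: replace two spaces with one until no substitution succeeds.
squeezed=$(echo 'a    b' | sed -e ':again' -e 's/  / /' -e 't again')
printf '%s\n' "$squeezed"

# b without a label skips the rest of the script for the matching line.
marked=$(seq 3 | sed -e '/2/b' -e 's/^/#/')
printf '%s\n' "$marked"
```

The first command prints @samp{a b}; the second prints @samp{#1},
@samp{2} and @samp{#3}.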
| 1640 |
|
|---|
| 1641 | @node Extended Commands
|
|---|
| 1642 | @section Commands Specific to @value{SSED}
|
|---|
| 1643 |
|
|---|
| 1644 | These commands are specific to @value{SSED}, so you
|
|---|
| 1645 | must use them with care and only when you are sure that
|
|---|
| 1646 | hindering portability is not evil. They allow you to check
|
|---|
| 1647 | for @value{SSED} extensions or to do tasks that are required
|
|---|
| 1648 | quite often, yet are unsupported by standard @command{sed}s.
|
|---|
| 1649 |
|
|---|
| 1650 | @table @code
|
|---|
| 1651 | @item e [@var{command}]
|
|---|
| 1652 | @findex e (evaluate) command
|
|---|
| 1653 | @cindex Evaluate Bourne-shell commands
|
|---|
| 1654 | @cindex Subprocesses
|
|---|
| 1655 | @cindex @value{SSEDEXT}, evaluating Bourne-shell commands
|
|---|
| 1656 | @cindex @value{SSEDEXT}, subprocesses
|
|---|
| 1657 | This command allows one to pipe input from a shell command
|
|---|
| 1658 | into pattern space. Without parameters, the @code{e} command
|
|---|
| 1659 | executes the command that is found in pattern space and
|
|---|
| 1660 | replaces the pattern space with the output; a trailing newline
|
|---|
| 1661 | is suppressed.
|
|---|
| 1662 |
|
|---|
| 1663 | If a parameter is specified, instead, the @code{e} command
|
|---|
| 1664 | interprets it as a command and sends its output to the output stream.
|
|---|
| 1665 | The command can run across multiple lines, all but the last ending with
|
|---|
| 1666 | a backslash.
|
|---|
| 1667 |
|
|---|
| 1668 | In both cases, the results are undefined if the command to be
|
|---|
| 1669 | executed contains a @sc{nul} character.
|
|---|
| 1670 |
|
|---|
| 1671 | Note that, unlike the @code{r} command, the output of the command will
|
|---|
| 1672 | be printed immediately; the @code{r} command instead delays the output
|
|---|
| 1673 | to the end of the current cycle.
|
|---|
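
Both forms of the @code{e} command can be illustrated with a short sketch.
It assumes a @value{SSED} build that permits command execution (that is,
@option{--sandbox} is not in effect); the variable names are illustrative.

```shell
# Without a parameter: the pattern space is run as a command
# and replaced by its output.
ran=$(echo 'echo hello' | sed 'e')
printf '%s\n' "$ran"

# With a parameter: the command's output is emitted immediately,
# before the pattern space is printed.
before=$(echo x | sed '1e echo before')
printf '%s\n' "$before"
```

The first command prints @samp{hello}; the second prints @samp{before}
followed by @samp{x}.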
| 1674 |
|
|---|
| 1675 | @item F
|
|---|
| 1676 | @findex F (File name) command
|
|---|
| 1677 | @cindex Printing file name
|
|---|
| 1678 | @cindex File name, printing
|
|---|
| 1679 | Print out the file name of the current input file (with a trailing
|
|---|
| 1680 | newline).
|
|---|
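
A minimal sketch of the @code{F} command, assuming @value{SSED} and
@command{mktemp}; the temporary file and variable names are illustrative.

```shell
# Create a throwaway input file, then ask sed for its name on line 1.
tmp=$(mktemp)
echo hi > "$tmp"
name=$(sed -n '1F' "$tmp")
rm -f "$tmp"
printf '%s\n' "$name"
```

The printed name is the path that was passed on the command line.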
| 1681 |
|
|---|
| 1682 | @item Q [@var{exit-code}]
|
|---|
| 1683 | This command accepts only one address.
|
|---|
| 1684 |
|
|---|
| 1685 | @findex Q (silent Quit) command
|
|---|
| 1686 | @cindex @value{SSEDEXT}, quitting silently
|
|---|
| 1687 | @cindex @value{SSEDEXT}, returning an exit code
|
|---|
| 1688 | @cindex Quitting
|
|---|
| 1689 | This command is the same as @code{q}, but will not print the
|
|---|
| 1690 | contents of pattern space. Like @code{q}, it provides the
|
|---|
| 1691 | ability to return an exit code to the caller.
|
|---|
| 1692 |
|
|---|
| 1693 | This command can be useful because the only alternative ways
|
|---|
| 1694 | to accomplish this apparently trivial function are to use
|
|---|
| 1695 | the @option{-n} option (which can unnecessarily complicate
|
|---|
| 1696 | your script) or to resort to the following snippet, which
|
|---|
| 1697 | wastes time by reading the whole file without any visible effect:
|
|---|
| 1698 |
|
|---|
| 1699 | @example
|
|---|
| 1700 | :eat
|
|---|
| 1701 | $d @i{@r{Quit silently on the last line}}
|
|---|
| 1702 | N @i{@r{Read another line, silently}}
|
|---|
| 1703 | g @i{@r{Overwrite pattern space each time to save memory}}
|
|---|
| 1704 | b eat
|
|---|
| 1705 | @end example
|
|---|
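
The difference between @code{q} and @code{Q} can be seen in a short
sketch, assuming @value{SSED}; the variable names and the exit code
@samp{7} are arbitrary.

```shell
# q prints the current pattern space before quitting; Q does not.
with_q=$(seq 5 | sed '3q')
with_Q=$(seq 5 | sed '3Q')
echo "q: $(echo $with_q)"
echo "Q: $(echo $with_Q)"

# Q can also return an exit code to the caller.
status=0
seq 5 | sed '2Q7' >/dev/null || status=$?
echo "exit status: $status"
```

Here @code{3q} prints lines 1 to 3, @code{3Q} prints only lines 1 and 2,
and the last command exits with status 7.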
| 1706 |
|
|---|
| 1707 | @item R @var{filename}
|
|---|
| 1708 | @findex R (read line) command
|
|---|
| 1709 | @cindex Read text from a file
|
|---|
| 1710 | @cindex @value{SSEDEXT}, reading a file a line at a time
|
|---|
| 1711 | @cindex @value{SSEDEXT}, @code{R} command
|
|---|
| 1712 | @cindex @value{SSEDEXT}, @file{/dev/stdin} file
|
|---|
| 1713 | Queue a line of @var{filename} to be read and
|
|---|
| 1714 | inserted into the output stream at the end of the current cycle,
|
|---|
| 1715 | or when the next input line is read.
|
|---|
| 1716 | Note that if @var{filename} cannot be read, or if its end is
|
|---|
| 1717 | reached, no line is appended, without any error indication.
|
|---|
| 1718 |
|
|---|
| 1719 | As with the @code{r} command, the special value @file{/dev/stdin}
|
|---|
| 1720 | is supported for the file name, which reads a line from the
|
|---|
| 1721 | standard input.
|
|---|
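
As an illustration, the following sketch interleaves the lines of a file
with the input. It assumes @value{SSED} and @command{mktemp}; the file
and variable names are illustrative.

```shell
# Prepare a two-line file, then queue one line of it after each input line.
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
mixed=$(seq 3 | sed "R $tmp")
rm -f "$tmp"
printf '%s\n' "$mixed"
```

The output is @samp{1}, @samp{a}, @samp{2}, @samp{b}, @samp{3}: once the
end of the file is reached, nothing more is appended.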
| 1722 |
|
|---|
| 1723 | @item T @var{label}
|
|---|
| 1724 | @findex T (test and branch if failed) command
|
|---|
| 1725 | @cindex @value{SSEDEXT}, branch if @code{s///} failed
|
|---|
| 1726 | @cindex Branch to a label, if @code{s///} failed
|
|---|
| 1727 | @cindex Conditional branch
|
|---|
| 1728 | Branch to @var{label} only if there have been no successful
|
|---|
| 1729 | @code{s}ubstitutions since the last input line was read or
|
|---|
| 1730 | conditional branch was taken. The @var{label} may be omitted,
|
|---|
| 1731 | in which case the next cycle is started.
|
|---|
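
A minimal sketch of the @code{T} command, assuming @value{SSED}; the
sample text and variable name are illustrative.

```shell
# If the first substitution fails, T skips the rest of the script,
# so only changed lines receive the suffix.
tagged=$(printf 'apple\nbanana\n' |
  sed -e 's/apple/APPLE/' -e 'T' -e 's/$/ (changed)/')
printf '%s\n' "$tagged"
```

This prints @samp{APPLE (changed)} followed by an unmodified
@samp{banana}.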
| 1732 |
|
|---|
| 1733 | @item v @var{version}
|
|---|
| 1734 | @findex v (version) command
|
|---|
| 1735 | @cindex @value{SSEDEXT}, checking for their presence
|
|---|
| 1736 | @cindex Requiring @value{SSED}
|
|---|
| 1737 | This command does nothing, but makes @command{sed} fail if
|
|---|
| 1738 | @value{SSED} extensions are not supported, simply because other
|
|---|
| 1739 | versions of @command{sed} do not implement it. In addition, you
|
|---|
| 1740 | can specify the version of @command{sed} that your script
|
|---|
| 1741 | requires, such as @code{4.0.5}. The default is @code{4.0}
|
|---|
| 1742 | because that is the first version that implemented this command.
|
|---|
| 1743 |
|
|---|
| 1744 | This command enables all @value{SSEDEXT} even if
|
|---|
| 1745 | @env{POSIXLY_CORRECT} is set in the environment.
|
|---|
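
The version check can be sketched as follows, assuming the installed
@command{sed} is @value{SSED} 4.2 or later; the version numbers and
variable names are chosen only for illustration.

```shell
# A satisfied requirement is a no-op: the input passes through unchanged.
ok=$(echo ok | sed 'v 4.2')
printf '%s\n' "$ok"

# An unsatisfiable requirement makes sed fail before processing any input.
status=0
echo ok | sed 'v 99.0' >/dev/null 2>&1 || status=$?
echo "exit status for v 99.0: $status"
```

The second invocation is expected to exit with a nonzero status.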
| 1746 |
|
|---|
| 1747 | @item W @var{filename}
|
|---|
| 1748 | @findex W (write first line) command
|
|---|
| 1749 | @cindex Write first line to a file
|
|---|
| 1750 | @cindex @value{SSEDEXT}, writing first line to a file
|
|---|
| 1751 | Write to the given filename the portion of the pattern space up to
|
|---|
| 1752 | the first newline. Everything said under the @code{w} command about
|
|---|
| 1753 | file handling holds here too.
|
|---|
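
A minimal sketch of the @code{W} command, assuming @value{SSED} and
@command{mktemp}; the file and variable names are illustrative.

```shell
# N builds a two-line pattern space; W writes only its first line.
tmp=$(mktemp)
printf 'first\nsecond\n' | sed -n "N;W $tmp"
saved=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$saved"
```

The file receives only @samp{first}.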
| 1754 |
|
|---|
| 1755 | @item z
|
|---|
| 1756 | @findex z (Zap) command
|
|---|
| 1757 | @cindex @value{SSEDEXT}, emptying pattern space
|
|---|
| 1758 | @cindex Emptying pattern space
|
|---|
| 1759 | This command empties the content of pattern space. It is
|
|---|
| 1760 | usually the same as @samp{s/.*//}, but is more efficient
|
|---|
| 1761 | and works in the presence of invalid multibyte sequences
|
|---|
| 1762 | in the input stream. @sc{posix} mandates that such sequences
|
|---|
| 1763 | are @emph{not} matched by @samp{.}, so that there is no portable
|
|---|
| 1764 | way to clear @command{sed}'s buffers in the middle of the
|
|---|
| 1765 | script in most multibyte locales (including UTF-8 locales).
|
|---|
| 1766 | @end table
|
|---|
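
As a brief illustration of the @code{z} command described above, the
following sketch (assuming @value{SSED}; the variable name is
illustrative) empties the second line of the input:

```shell
# Line 2 is emptied but still printed, so an empty line remains in the output.
zapped=$(seq 3 | sed '2z')
printf '%s\n' "$zapped"
```

The output is @samp{1}, an empty line, and @samp{3}.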
| 1767 |
|
|---|
| 1768 |
|
|---|
| 1769 | @node Multiple commands syntax
|
|---|
| 1770 | @section Multiple commands syntax
|
|---|
| 1771 |
|
|---|
| 1772 | @c POSIX says:
|
|---|
| 1773 | @c Editing commands other than {...}, a, b, c, i, r, t, w, :, and #
|
|---|
| 1774 | @c can be followed by a <semicolon>, optional <blank> characters, and
|
|---|
| 1775 | @c another editing command. However, when an s editing command is used
|
|---|
| 1776 | @c with the w flag, following it with another command in this manner
|
|---|
| 1777 | @c produces undefined results.
|
|---|
| 1778 |
|
|---|
| 1779 | There are several methods to specify multiple commands in a @command{sed}
|
|---|
| 1780 | program.
|
|---|
| 1781 |
|
|---|
| 1782 | Using newlines is most natural when running a @command{sed} script from a file
|
|---|
| 1783 | (using the @option{-f} option).
|
|---|
| 1784 |
|
|---|
| 1785 | On the command line, all @command{sed} commands may be separated by newlines.
|
|---|
| 1786 | Alternatively, you may specify each command as an argument to an @option{-e}
|
|---|
| 1787 | option:
|
|---|
| 1788 |
|
|---|
| 1789 | @codequoteundirected on
|
|---|
| 1790 | @codequotebacktick on
|
|---|
| 1791 | @example
|
|---|
| 1792 | @group
|
|---|
| 1793 | $ seq 6 | sed '1d
|
|---|
| 1794 | 3d
|
|---|
| 1795 | 5d'
|
|---|
| 1796 | 2
|
|---|
| 1797 | 4
|
|---|
| 1798 | 6
|
|---|
| 1799 |
|
|---|
| 1800 | $ seq 6 | sed -e 1d -e 3d -e 5d
|
|---|
| 1801 | 2
|
|---|
| 1802 | 4
|
|---|
| 1803 | 6
|
|---|
| 1804 | @end group
|
|---|
| 1805 | @end example
|
|---|
| 1806 | @codequoteundirected off
|
|---|
| 1807 | @codequotebacktick off
|
|---|
| 1808 |
|
|---|
| 1809 | A semicolon (@samp{;}) may be used to separate most simple commands:
|
|---|
| 1810 |
|
|---|
| 1811 | @codequoteundirected on
|
|---|
| 1812 | @codequotebacktick on
|
|---|
| 1813 | @example
|
|---|
| 1814 | @group
|
|---|
| 1815 | $ seq 6 | sed '1d;3d;5d'
|
|---|
| 1816 | 2
|
|---|
| 1817 | 4
|
|---|
| 1818 | 6
|
|---|
| 1819 | @end group
|
|---|
| 1820 | @end example
|
|---|
| 1821 | @codequoteundirected off
|
|---|
| 1822 | @codequotebacktick off
|
|---|
| 1823 |
|
|---|
| 1824 | The @code{@{},@code{@}},@code{b},@code{t},@code{T},@code{:} commands can
|
|---|
| 1825 | be separated with a semicolon (this is a non-portable @value{SSED} extension).
|
|---|
| 1826 |
|
|---|
| 1827 | @codequoteundirected on
|
|---|
| 1828 | @codequotebacktick on
|
|---|
| 1829 | @example
|
|---|
| 1830 | @group
|
|---|
| 1831 | $ seq 4 | sed '@{1d;3d@}'
|
|---|
| 1832 | 2
|
|---|
| 1833 | 4
|
|---|
| 1834 |
|
|---|
| 1835 | $ seq 6 | sed '@{1d;3d@};5d'
|
|---|
| 1836 | 2
|
|---|
| 1837 | 4
|
|---|
| 1838 | 6
|
|---|
| 1839 | @end group
|
|---|
| 1840 | @end example
|
|---|
| 1841 | @codequoteundirected off
|
|---|
| 1842 | @codequotebacktick off
|
|---|
| 1843 |
|
|---|
| 1844 | Labels used in @code{b},@code{t},@code{T},@code{:} commands are read
|
|---|
| 1845 | until a semicolon. Leading and trailing whitespace is ignored. In
|
|---|
| 1846 | the examples below the label is @samp{x}. The first example works
|
|---|
| 1847 | with @value{SSED}. The second is a portable equivalent. For more
|
|---|
| 1848 | information about branching and labels, see @ref{Branching and flow
|
|---|
| 1849 | control}.
|
|---|
| 1850 |
|
|---|
| 1851 | @codequoteundirected on
|
|---|
| 1852 | @codequotebacktick on
|
|---|
| 1853 | @example
|
|---|
| 1854 | @group
|
|---|
| 1855 | $ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
|
|---|
| 1856 | 1
|
|---|
| 1857 | =2
|
|---|
| 1858 |
|
|---|
| 1859 | $ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
|
|---|
| 1860 | 1
|
|---|
| 1861 | =2
|
|---|
| 1862 | @end group
|
|---|
| 1863 | @end example
|
|---|
| 1864 | @codequoteundirected off
|
|---|
| 1865 | @codequotebacktick off
|
|---|
| 1866 |
|
|---|
| 1867 |
|
|---|
| 1868 |
|
|---|
| 1869 | @subsection Commands Requiring a newline
|
|---|
| 1870 |
|
|---|
| 1871 | The following commands cannot be separated by a semicolon and
|
|---|
| 1872 | require a newline:
|
|---|
| 1873 |
|
|---|
| 1874 | @table @asis
|
|---|
| 1875 |
|
|---|
| 1876 | @item @code{a},@code{c},@code{i} (append/change/insert)
|
|---|
| 1877 |
|
|---|
| 1878 | All characters following @code{a},@code{c},@code{i} commands are taken
|
|---|
| 1879 | as the text to append/change/insert. Using a semicolon leads to
|
|---|
| 1880 | undesirable results:
|
|---|
| 1881 |
|
|---|
| 1882 | @codequoteundirected on
|
|---|
| 1883 | @codequotebacktick on
|
|---|
| 1884 | @example
|
|---|
| 1885 | @group
|
|---|
| 1886 | $ seq 2 | sed '1aHello ; 2d'
|
|---|
| 1887 | 1
|
|---|
| 1888 | Hello ; 2d
|
|---|
| 1889 | 2
|
|---|
| 1890 | @end group
|
|---|
| 1891 | @end example
|
|---|
| 1892 | @codequoteundirected off
|
|---|
| 1893 | @codequotebacktick off
|
|---|
| 1894 |
|
|---|
| 1895 | Separate the commands using @option{-e} or a newline:
|
|---|
| 1896 |
|
|---|
| 1897 | @codequoteundirected on
|
|---|
| 1898 | @codequotebacktick on
|
|---|
| 1899 | @example
|
|---|
| 1900 | @group
|
|---|
| 1901 | $ seq 2 | sed -e 1aHello -e 2d
|
|---|
| 1902 | 1
|
|---|
| 1903 | Hello
|
|---|
| 1904 |
|
|---|
| 1905 | $ seq 2 | sed '1aHello
|
|---|
| 1906 | 2d'
|
|---|
| 1907 | 1
|
|---|
| 1908 | Hello
|
|---|
| 1909 | @end group
|
|---|
| 1910 | @end example
|
|---|
| 1911 | @codequoteundirected off
|
|---|
| 1912 | @codequotebacktick off
|
|---|
| 1913 |
|
|---|
| 1914 | Note that specifying the text to add (@samp{Hello}) immediately
|
|---|
| 1915 | after @code{a},@code{c},@code{i} is itself a @value{SSED} extension.
|
|---|
| 1916 | A portable, POSIX-compliant alternative is:
|
|---|
| 1917 |
|
|---|
| 1918 | @codequoteundirected on
|
|---|
| 1919 | @codequotebacktick on
|
|---|
| 1920 | @example
|
|---|
| 1921 | @group
|
|---|
| 1922 | $ seq 2 | sed '1a\
|
|---|
| 1923 | Hello
|
|---|
| 1924 | 2d'
|
|---|
| 1925 | 1
|
|---|
| 1926 | Hello
|
|---|
| 1927 | @end group
|
|---|
| 1928 | @end example
|
|---|
| 1929 | @codequoteundirected off
|
|---|
| 1930 | @codequotebacktick off
|
|---|
| 1931 |
|
|---|
| 1932 | @item @code{#} (comment)
|
|---|
| 1933 |
|
|---|
| 1934 | All characters following @samp{#} until the next newline are ignored.
|
|---|
| 1935 |
|
|---|
| 1936 | @codequoteundirected on
|
|---|
| 1937 | @codequotebacktick on
|
|---|
| 1938 | @example
|
|---|
| 1939 | @group
|
|---|
| 1940 | $ seq 3 | sed '# this is a comment ; 2d'
|
|---|
| 1941 | 1
|
|---|
| 1942 | 2
|
|---|
| 1943 | 3
|
|---|
| 1944 |
|
|---|
| 1945 |
|
|---|
| 1946 | $ seq 3 | sed '# this is a comment
|
|---|
| 1947 | 2d'
|
|---|
| 1948 | 1
|
|---|
| 1949 | 3
|
|---|
| 1950 | @end group
|
|---|
| 1951 | @end example
|
|---|
| 1952 | @codequoteundirected off
|
|---|
| 1953 | @codequotebacktick off
|
|---|
| 1954 |
|
|---|
| 1955 | @item @code{r},@code{R},@code{w},@code{W} (reading and writing files)
|
|---|
| 1956 |
|
|---|
| 1957 | The @code{r},@code{R},@code{w},@code{W} commands parse the filename
|
|---|
| 1958 | until the end of the line. If whitespace, comments or semicolons are found,
|
|---|
| 1959 | they will be included in the filename, leading to unexpected results:
|
|---|
| 1960 |
|
|---|
| 1961 | @codequoteundirected on
|
|---|
| 1962 | @codequotebacktick on
|
|---|
| 1963 | @example
|
|---|
| 1964 | @group
|
|---|
| 1965 | $ seq 2 | sed '1w hello.txt ; 2d'
|
|---|
| 1966 | 1
|
|---|
| 1967 | 2
|
|---|
| 1968 |
|
|---|
| 1969 | $ ls -log
|
|---|
| 1970 | total 4
|
|---|
| 1971 | -rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d
|
|---|
| 1972 |
|
|---|
| 1973 | $ cat 'hello.txt ; 2d'
|
|---|
| 1974 | 1
|
|---|
| 1975 | @end group
|
|---|
| 1976 | @end example
|
|---|
| 1977 | @codequoteundirected off
|
|---|
| 1978 | @codequotebacktick off
|
|---|
| 1979 |
|
|---|
| 1980 | Note that @command{sed} silently ignores read/write errors in
|
|---|
| 1981 | @code{r},@code{R},@code{w},@code{W} commands (such as missing files).
|
|---|
| 1982 | In the following example, @command{sed} tries to read a file named
|
|---|
| 1983 | @samp{@file{hello.txt ; N}}. The file is missing, and the error is silently
|
|---|
| 1984 | ignored:
|
|---|
| 1985 |
|
|---|
| 1986 | @codequoteundirected on
|
|---|
| 1987 | @codequotebacktick on
|
|---|
| 1988 | @example
|
|---|
| 1989 | @group
|
|---|
| 1990 | $ echo x | sed '1rhello.txt ; N'
|
|---|
| 1991 | x
|
|---|
| 1992 | @end group
|
|---|
| 1993 | @end example
|
|---|
| 1994 | @codequoteundirected off
|
|---|
| 1995 | @codequotebacktick off
|
|---|
| 1996 |
|
|---|
| 1997 | @item @code{e} (command execution)
|
|---|
| 1998 |
|
|---|
| 1999 | Any characters following the @code{e} command until the end of the line
|
|---|
| 2000 | will be sent to the shell. If whitespace, comments or semicolons are found,
|
|---|
| 2001 | they will be included in the shell command, leading to unexpected results:
|
|---|
| 2002 |
|
|---|
| 2003 | @codequoteundirected on
|
|---|
| 2004 | @codequotebacktick on
|
|---|
| 2005 | @example
|
|---|
| 2006 | @group
|
|---|
| 2007 | $ echo a | sed '1e touch foo#bar'
|
|---|
| 2008 | a
|
|---|
| 2009 |
|
|---|
| 2010 | $ ls -1
|
|---|
| 2011 | foo#bar
|
|---|
| 2012 |
|
|---|
| 2013 | $ echo a | sed '1e touch foo ; s/a/b/'
|
|---|
| 2014 | sh: 1: s/a/b/: not found
|
|---|
| 2015 | a
|
|---|
| 2016 | @end group
|
|---|
| 2017 | @end example
|
|---|
| 2018 | @codequoteundirected off
|
|---|
| 2019 | @codequotebacktick off
|
|---|
| 2020 |
|
|---|
| 2021 |
|
|---|
| 2022 | @item @code{s///[we]} (substitute with @code{e} or @code{w} flags)
|
|---|
| 2023 |
|
|---|
| 2024 | In a substitution command, the @code{w} flag writes the substitution
|
|---|
| 2025 | result to a file, and the @code{e} flag executes the substitution result
|
|---|
| 2026 | as a shell command. As with the @code{r/R/w/W/e} commands, these
|
|---|
| 2027 | must be terminated with a newline. If whitespace, comments or semicolons
|
|---|
| 2028 | are found, they will be included in the shell command or filename, leading to
|
|---|
| 2029 | unexpected results:
|
|---|
| 2030 |
|
|---|
| 2031 | @codequoteundirected on
|
|---|
| 2032 | @codequotebacktick on
|
|---|
| 2033 | @example
|
|---|
| 2034 | @group
|
|---|
| 2035 | $ echo a | sed 's/a/b/w1.txt#foo'
|
|---|
| 2036 | b
|
|---|
| 2037 |
|
|---|
| 2038 | $ ls -1
|
|---|
| 2039 | 1.txt#foo
|
|---|
| 2040 | @end group
|
|---|
| 2041 | @end example
|
|---|
| 2042 | @codequoteundirected off
|
|---|
| 2043 | @codequotebacktick off
|
|---|
| 2044 |
|
|---|
| 2045 | @end table
|
|---|
| 2046 |
|
|---|
| 2047 |
|
|---|
| 2048 | @node sed addresses
|
|---|
| 2049 | @chapter Addresses: selecting lines
|
|---|
| 2050 |
|
|---|
| 2051 | @menu
|
|---|
| 2052 | * Addresses overview:: Addresses overview
|
|---|
| 2053 | * Numeric Addresses:: selecting lines by numbers
|
|---|
| 2054 | * Regexp Addresses:: selecting lines by text matching
|
|---|
| 2055 | * Range Addresses:: selecting a range of lines
|
|---|
| 2056 | * Zero Address:: Using address @code{0}
|
|---|
| 2057 | @end menu
|
|---|
| 2058 |
|
|---|
| 2059 | @node Addresses overview
|
|---|
| 2060 | @section Addresses overview
|
|---|
| 2061 |
|
|---|
| 2062 | @cindex addresses, numeric
|
|---|
| 2063 | @cindex numeric addresses
|
|---|
| 2064 | Addresses determine on which line(s) the @command{sed} command will be
|
|---|
| 2065 | executed. The following command replaces the first occurrence of @samp{hello}
|
|---|
| 2066 | with @samp{world} only on line 144:
|
|---|
| 2067 |
|
|---|
| 2068 | @codequoteundirected on
|
|---|
| 2069 | @codequotebacktick on
|
|---|
| 2070 | @example
|
|---|
| 2071 | sed '144s/hello/world/' input.txt > output.txt
|
|---|
| 2072 | @end example
|
|---|
| 2073 | @codequoteundirected off
|
|---|
| 2074 | @codequotebacktick off
|
|---|
| 2075 |
|
|---|
| 2076 |
|
|---|
| 2077 |
|
|---|
| 2078 | If no address is specified, the command is performed on all lines.
|
|---|
| 2079 | The following command replaces @samp{hello} with @samp{world},
|
|---|
| 2080 | targeting every line of the input file.
|
|---|
| 2081 | However, note that it modifies only the first instance of @samp{hello}
|
|---|
| 2082 | on each line.
|
|---|
| 2083 | Use the @code{g} flag to affect every instance on each affected line.
|
|---|
| 2084 |
|
|---|
| 2085 | @codequoteundirected on
|
|---|
| 2086 | @codequotebacktick on
|
|---|
| 2087 | @example
|
|---|
| 2088 | sed 's/hello/world/' input.txt > output.txt
|
|---|
| 2089 | @end example
|
|---|
| 2090 | @codequoteundirected off
|
|---|
| 2091 | @codequotebacktick off
|
|---|
| 2092 |
|
|---|
| 2093 |
|
|---|
| 2094 |
|
|---|
| 2095 | @cindex addresses, regular expression
|
|---|
| 2096 | @cindex regular expression addresses
|
|---|
| 2097 | Addresses can contain regular expressions to match lines based
|
|---|
| 2098 | on content instead of line numbers. The following command replaces
|
|---|
| 2099 | @samp{hello} with @samp{world} only on lines
|
|---|
| 2100 | containing the string @samp{apple}:
|
|---|
| 2101 |
|
|---|
| 2102 | @codequoteundirected on
|
|---|
| 2103 | @codequotebacktick on
|
|---|
| 2104 | @example
|
|---|
| 2105 | sed '/apple/s/hello/world/' input.txt > output.txt
|
|---|
| 2106 | @end example
|
|---|
| 2107 | @codequoteundirected off
|
|---|
| 2108 | @codequotebacktick off
|
|---|
| 2109 |
|
|---|
| 2110 |
|
|---|
| 2111 |
|
|---|
| 2112 | @cindex addresses, range
|
|---|
| 2113 | @cindex range addresses
|
|---|
| 2114 | An address range is specified with two addresses separated by a comma
|
|---|
| 2115 | (@code{,}). Addresses can be numeric, regular expressions, or a mix of
|
|---|
| 2116 | both.
|
|---|
| 2117 | The following command replaces @samp{hello} with @samp{world}
|
|---|
| 2118 | only on lines 4 to 17 (inclusive):
|
|---|
| 2119 |
|
|---|
| 2120 | @codequoteundirected on
|
|---|
| 2121 | @codequotebacktick on
|
|---|
| 2122 | @example
|
|---|
| 2123 | sed '4,17s/hello/world/' input.txt > output.txt
|
|---|
| 2124 | @end example
|
|---|
| 2125 | @codequoteundirected off
|
|---|
| 2126 | @codequotebacktick off
|
|---|
| 2127 |
|
|---|
| 2128 |
|
|---|
| 2129 |
|
|---|
| 2130 | @cindex Excluding lines
|
|---|
| 2131 | @cindex Selecting non-matching lines
|
|---|
| 2132 | @cindex addresses, negating
|
|---|
| 2133 | @cindex addresses, excluding
|
|---|
| 2134 | Appending the @code{!} character to the end of an address
|
|---|
| 2135 | specification (before the command letter) negates the sense of the
|
|---|
| 2136 | match. That is, if the @code{!} character follows an address or an
|
|---|
| 2137 | address range, then only lines which do @emph{not} match the addresses
|
|---|
| 2138 | will be selected. The following command replaces @samp{hello}
|
|---|
| 2139 | with @samp{world} only on lines @emph{not} containing the string
|
|---|
| 2140 | @samp{apple}:
|
|---|
| 2141 |
|
|---|
| 2142 | @example
|
|---|
| 2143 | sed '/apple/!s/hello/world/' input.txt > output.txt
|
|---|
| 2144 | @end example
|
|---|
| 2145 |
|
|---|
| 2146 | The following command replaces @samp{hello} with
|
|---|
| 2147 | @samp{world} only on lines 1 to 3 and from line 18 to the last line of the
|
|---|
| 2148 | input file (i.e. excluding lines 4 to 17):
|
|---|
| 2149 |
|
|---|
| 2150 | @example
|
|---|
| 2151 | sed '4,17!s/hello/world/' input.txt > output.txt
|
|---|
| 2152 | @end example
|
|---|
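The address forms described above can be compared side by side on a
small input. The following is a minimal sketch, not part of the original
examples: the three-line sample text and the shell variable names are
made up for illustration.

```shell
# Hypothetical sample input: three lines, two of which mention "apple".
input='hello apple
hello pear
hello apple pie'

# No address: the substitution runs on every line.
all=$(printf '%s\n' "$input" | sed 's/hello/world/')

# Numeric address: only line 2 is changed.
second=$(printf '%s\n' "$input" | sed '2s/hello/world/')

# Regexp address: only lines containing "apple" are changed.
apple=$(printf '%s\n' "$input" | sed '/apple/s/hello/world/')

# Negated regexp address: only lines NOT containing "apple" are changed.
not_apple=$(printf '%s\n' "$input" | sed '/apple/!s/hello/world/')

printf '%s\n' "$all" "$second" "$apple" "$not_apple"
```

With this input, the numeric address @code{2} and the negated address
@code{/apple/!} happen to select the same single line.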
| 2153 |
|
|---|
| 2154 |
|
|---|
| 2155 |
|
|---|
| 2156 |
|
|---|
| 2157 |
|
|---|
| 2158 | @node Numeric Addresses
|
|---|
| 2159 | @section Selecting lines by numbers
|
|---|
| 2160 | @cindex Addresses, in @command{sed} scripts
|
|---|
| 2161 | @cindex Line selection
|
|---|
| 2162 | @cindex Selecting lines to process
|
|---|
| 2163 |
|
|---|
| 2164 | Addresses in a @command{sed} script can be in any of the following forms:
|
|---|
| 2165 | @table @code
|
|---|
| 2166 | @item @var{number}
|
|---|
| 2167 | @cindex Address, numeric
|
|---|
| 2168 | @cindex Line, selecting by number
|
|---|
| 2169 | Specifying a line number will match only that line in the input.
|
|---|
| 2170 | (Note that @command{sed} counts lines continuously across all input files
|
|---|
| 2171 | unless @option{-i} or @option{-s} options are specified.)
|
|---|
| 2172 |
|
|---|
| 2173 | @item $
|
|---|
| 2174 | @cindex Address, last line
|
|---|
| 2175 | @cindex Last line, selecting
|
|---|
| 2176 | @cindex Line, selecting last
|
|---|
| 2177 | This address matches the last line of the last file of input, or
|
|---|
| 2178 | the last line of each file when the @option{-i} or @option{-s} options
|
|---|
| 2179 | are specified.
|
|---|
| 2180 |
|
|---|
| 2181 |
|
|---|
| 2182 | @item @var{first}~@var{step}
|
|---|
| 2183 | @cindex GNU extensions, @samp{@var{n}~@var{m}} addresses
|
|---|
| 2184 | This GNU extension matches every @var{step}th line
|
|---|
| 2185 | starting with line @var{first}.
|
|---|
| 2186 | In particular, lines will be selected when there exists
|
|---|
| 2187 | a non-negative @var{n} such that the current line-number equals
|
|---|
| 2188 | @var{first} + (@var{n} * @var{step}).
|
|---|
| 2189 | Thus, one would use @code{1~2} to select the odd-numbered lines and
|
|---|
| 2190 | @code{0~2} for even-numbered lines;
|
|---|
| 2191 | to pick every third line starting with the second, @samp{2~3} would be used;
|
|---|
| 2192 | to pick every fifth line starting with the tenth, use @samp{10~5};
|
|---|
| 2193 | and @samp{50~0} is just an obscure way of saying @code{50}.
|
|---|
| 2194 |
|
|---|
| 2195 | The following commands demonstrate the step address usage:
|
|---|
| 2196 |
|
|---|
| 2197 | @example
|
|---|
| 2198 | $ seq 10 | sed -n '0~4p'
|
|---|
| 2199 | 4
|
|---|
| 2200 | 8
|
|---|
| 2201 |
|
|---|
| 2202 | $ seq 10 | sed -n '1~3p'
|
|---|
| 2203 | 1
|
|---|
| 2204 | 4
|
|---|
| 2205 | 7
|
|---|
| 2206 | 10
|
|---|
| 2207 | @end example
|
|---|
| 2208 |
|
|---|
| 2209 |
|
|---|
| 2210 | @end table
|
|---|
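The numeric address forms in the table above can be tried as follows.
This is an illustrative sketch that assumes GNU @command{sed} (for the
@code{~} step form and the @option{-s} option); the temporary directory
and the file names @file{a} and @file{b} are made up for the example.

```shell
# Last line of the whole input.
last=$(seq 5 | sed -n '$p')

# Every third line starting with the second.
every_third=$(seq 10 | sed -n '2~3p')

# A step of zero selects just the one line named by "first".
zero_step=$(seq 10 | sed -n '10~0p')

# With several files, $ is the last line of the last file,
# unless -s makes sed treat each file separately.
tmp=$(mktemp -d)
seq 2 > "$tmp/a"
seq 3 > "$tmp/b"
joined=$(sed -n '$p' "$tmp/a" "$tmp/b")
separate=$(sed -s -n '$p' "$tmp/a" "$tmp/b")
rm -r "$tmp"

printf '%s\n' "$last" "$every_third" "$zero_step" "$joined" "$separate"
```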
| 2211 |
|
|---|
| 2212 |
|
|---|
| 2213 |
|
|---|
| 2214 | @node Regexp Addresses
|
|---|
| 2215 | @section Selecting lines by text matching
|
|---|
| 2216 |
|
|---|
| 2217 | @value{SSED} supports the following regular expression addresses.
|
|---|
| 2218 | The default regular expression is
|
|---|
| 2219 | @ref{BRE syntax, , Basic Regular Expression (BRE)}.
|
|---|
| 2220 | If the @option{-E} or @option{-r} option is used, the regular expression should be
|
|---|
| 2221 | in @ref{ERE syntax, , Extended Regular Expression (ERE)} syntax.
|
|---|
| 2222 | @xref{BRE vs ERE}.
|
|---|
| 2223 |
|
|---|
| 2224 | @table @code
|
|---|
| 2225 | @item /@var{regexp}/
|
|---|
| 2226 | @cindex Address, as a regular expression
|
|---|
| 2227 | @cindex Line, selecting by regular expression match
|
|---|
| 2228 | This will select any line which matches the regular expression @var{regexp}.
|
|---|
| 2229 | If @var{regexp} itself includes any @code{/} characters,
|
|---|
| 2230 | each must be escaped by a backslash (@code{\}).
|
|---|
| 2231 |
|
|---|
| 2232 | The following command prints lines in @file{/etc/passwd}
|
|---|
| 2233 | which end with @samp{bash}@footnote{
|
|---|
| 2234 | There are of course many other ways to do the same,
|
|---|
| 2235 | e.g.
|
|---|
| 2236 | @example
|
|---|
| 2237 | grep 'bash$' /etc/passwd
|
|---|
| 2238 | awk -F: '$7 == "/bin/bash"' /etc/passwd
|
|---|
| 2239 | @end example
|
|---|
| 2240 | }:
|
|---|
| 2241 |
|
|---|
| 2242 | @example
|
|---|
| 2243 | sed -n '/bash$/p' /etc/passwd
|
|---|
| 2244 | @end example
|
|---|
| 2245 |
|
|---|
| 2246 | @cindex empty regular expression
|
|---|
| 2247 | @cindex @value{SSEDEXT}, modifiers and the empty regular expression
|
|---|
| 2248 | The empty regular expression @samp{//} repeats the last regular
|
|---|
| 2249 | expression match (the same holds if the empty regular expression is
|
|---|
| 2250 | passed to the @code{s} command). Note that modifiers to regular expressions
|
|---|
| 2251 | are evaluated when the regular expression is compiled, thus it is invalid to
|
|---|
| 2252 | specify them together with the empty regular expression.
|
|---|
| 2253 |
|
|---|
| 2254 | @item \%@var{regexp}%
|
|---|
| 2255 | (The @code{%} may be replaced by any other single character.)
|
|---|
| 2256 |
|
|---|
| 2257 | @cindex Slash character, in regular expressions
|
|---|
| 2258 | This also matches the regular expression @var{regexp},
|
|---|
| 2259 | but allows one to use a different delimiter than @code{/}.
|
|---|
| 2260 | This is particularly useful if the @var{regexp} itself contains
|
|---|
| 2261 | a lot of slashes, since it avoids the tedious escaping of every @code{/}.
|
|---|
| 2262 | If @var{regexp} itself includes any delimiter characters,
|
|---|
| 2263 | each must be escaped by a backslash (@code{\}).
|
|---|
| 2264 |
|
|---|
| 2265 | The following commands are equivalent. They print lines
|
|---|
| 2266 | which start with @samp{/home/alice/documents/}:
|
|---|
| 2267 |
|
|---|
| 2268 | @example
|
|---|
| 2269 | sed -n '/^\/home\/alice\/documents\//p'
|
|---|
| 2270 | sed -n '\%^/home/alice/documents/%p'
|
|---|
| 2271 | sed -n '\;^/home/alice/documents/;p'
|
|---|
| 2272 | @end example
|
|---|
| 2273 |
|
|---|
| 2274 |
|
|---|
| 2275 | @item /@var{regexp}/I
|
|---|
| 2276 | @itemx \%@var{regexp}%I
|
|---|
| 2277 | @cindex GNU extensions, @code{I} modifier
|
|---|
| 2278 | @cindex case insensitive, regular expression
|
|---|
| 2279 | The @code{I} modifier to regular-expression matching is a GNU
|
|---|
| 2280 | extension which causes the @var{regexp} to be matched in
|
|---|
| 2281 | a case-insensitive manner.
|
|---|
| 2282 |
|
|---|
| 2283 | In many other programming languages, a lower case @code{i} is used
|
|---|
| 2284 | for case-insensitive regular expression matching. However, in @command{sed}
|
|---|
| 2285 | the @code{i} is used for the insert command (@pxref{insert command}).
|
|---|
| 2286 |
|
|---|
| 2287 | Observe the difference between the following examples.
|
|---|
| 2288 |
|
|---|
| 2289 | In this example, @code{/b/I} is the address: a regular expression with the
|
|---|
| 2290 | @code{I} modifier. @code{d} is the delete command:
|
|---|
| 2291 |
|
|---|
| 2292 | @example
|
|---|
| 2293 | $ printf "%s\n" a b c | sed '/b/Id'
|
|---|
| 2294 | a
|
|---|
| 2295 | c
|
|---|
| 2296 | @end example
|
|---|
| 2297 |
|
|---|
| 2298 | Here, @code{/b/} is the address: a regular expression.
|
|---|
| 2299 | @code{i} is the insert command.
|
|---|
| 2300 | @code{d} is the value to insert.
|
|---|
| 2301 | A line with @samp{d} is then inserted above the matched line:
|
|---|
| 2302 |
|
|---|
| 2303 | @example
|
|---|
| 2304 | $ printf "%s\n" a b c | sed '/b/id'
|
|---|
| 2305 | a
|
|---|
| 2306 | d
|
|---|
| 2307 | b
|
|---|
| 2308 | c
|
|---|
| 2309 | @end example
|
|---|
| 2310 |
|
|---|
| 2311 | @item /@var{regexp}/M
|
|---|
| 2312 | @itemx \%@var{regexp}%M
|
|---|
| 2313 | @cindex @value{SSEDEXT}, @code{M} modifier
|
|---|
| 2314 | The @code{M} modifier to regular-expression matching is a @value{SSED}
|
|---|
| 2315 | extension which directs @value{SSED} to match the regular expression
|
|---|
| 2316 | in @cite{multi-line} mode. The modifier causes @code{^} and @code{$} to
|
|---|
| 2317 | match respectively (in addition to the normal behavior) the empty string
|
|---|
| 2318 | after a newline, and the empty string before a newline. There are
|
|---|
| 2319 | special character sequences
|
|---|
| 2320 | @ifclear PERL
|
|---|
| 2321 | (@code{\`} and @code{\'})
|
|---|
| 2322 | @end ifclear
|
|---|
| 2323 | which always match the beginning or the end of the buffer.
|
|---|
| 2324 | In addition,
|
|---|
| 2325 | the period character does not match a new-line character in
|
|---|
| 2326 | multi-line mode.
|
|---|
| 2327 | @end table
|
|---|
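The empty regular expression and the @code{M} modifier described in the
table above can be illustrated with the following sketch. It assumes GNU
@command{sed}; the sample input and the shell variable names are made up
for the example.

```shell
# The empty regexp in s reuses the last regexp used, here the address /foo/.
reuse=$(printf '%s\n' 'foo bar' 'baz' | sed -n '/foo/s//X/p')

# N appends the next input line, so the pattern space holds "a", newline, "b".
# Without M, ^ and $ match only at the ends of the buffer, so /^b$/ fails.
plain=$(printf '%s\n' a b | sed -n 'N; /^b$/p')

# With M, ^ and $ also match just after and just before the embedded newline.
multi=$(printf '%s\n' a b | sed -n 'N; /^b$/Mp')

printf 'reuse=[%s]\nplain=[%s]\nmulti=[%s]\n' "$reuse" "$plain" "$multi"
```

Here @code{reuse} holds @samp{X bar}, @code{plain} is empty, and
@code{multi} holds both input lines.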
| 2328 |
|
|---|
| 2329 |
|
|---|
| 2330 | @cindex regex addresses and pattern space
|
|---|
| 2331 | @cindex regex addresses and input lines
|
|---|
| 2332 | Regex addresses operate on the content of the current
|
|---|
| 2333 | pattern space. If the pattern space is changed (for example with @code{s///}
|
|---|
| 2334 | command) the regular expression matching will operate on the changed text.
|
|---|
| 2335 |
|
|---|
| 2336 | In the following example, automatic printing is disabled with
|
|---|
| 2337 | @option{-n}. The @code{s/2/X/} command replaces the first @samp{2}
|
|---|
| 2338 | on each line with @samp{X}. The command @code{/[0-9]/p} matches
|
|---|
| 2339 | lines with digits and prints them.
|
|---|
| 2340 | Because the second line is changed before the @code{/[0-9]/} address
|
|---|
| 2341 | is evaluated, it no longer matches and is not printed:
|
|---|
| 2342 |
|
|---|
| 2343 | @codequoteundirected on
|
|---|
| 2344 | @codequotebacktick on
|
|---|
| 2345 | @example
|
|---|
| 2346 | @group
|
|---|
| 2347 | $ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
|
|---|
| 2348 | 1
|
|---|
| 2349 | 3
|
|---|
| 2350 | @end group
|
|---|
| 2351 | @end example
|
|---|
| 2352 | @codequoteundirected off
|
|---|
| 2353 | @codequotebacktick off
|
|---|
| 2354 |
|
|---|
| 2355 |
|
|---|
| 2356 | @node Range Addresses
|
|---|
| 2357 | @section Range Addresses
|
|---|
| 2358 |
|
|---|
| 2359 | @cindex Range of lines
|
|---|
| 2360 | @cindex Several lines, selecting
|
|---|
| 2361 | An address range is specified by giving two addresses
|
|---|
| 2362 | separated by a comma (@code{,}). An address range matches lines
|
|---|
| 2363 | starting from where the first address matches, and continues
|
|---|
| 2364 | until the second address matches (inclusively):
|
|---|
| 2365 |
|
|---|
| 2366 | @example
|
|---|
| 2367 | $ seq 10 | sed -n '4,6p'
|
|---|
| 2368 | 4
|
|---|
| 2369 | 5
|
|---|
| 2370 | 6
|
|---|
| 2371 | @end example
|
|---|
| 2372 |
|
|---|
| 2373 | If the second address is a @var{regexp}, then checking for the
|
|---|
| 2374 | ending match will start with the line @emph{following} the
|
|---|
| 2375 | line which matched the first address: a range will always
|
|---|
| 2376 | span at least two lines (except of course if the input stream
|
|---|
| 2377 | ends).
|
|---|
| 2378 |
|
|---|
| 2379 | @example
|
|---|
| 2380 | $ seq 10 | sed -n '4,/[0-9]/p'
|
|---|
| 2381 | 4
|
|---|
| 2382 | 5
|
|---|
| 2383 | @end example
|
|---|
| 2384 |
|
|---|
| 2385 | If the second address is a @var{number} less than (or equal to)
|
|---|
| 2386 | the line matching the first address, then only the one line is
|
|---|
| 2387 | matched:
|
|---|
| 2388 |
|
|---|
| 2389 | @example
|
|---|
| 2390 | $ seq 10 | sed -n '4,1p'
|
|---|
| 2391 | 4
|
|---|
| 2392 | @end example
|
|---|
| 2393 |
|
|---|
| 2394 | @anchor{Zero Address Regex Range}
|
|---|
| 2395 | @cindex Special addressing forms
|
|---|
| 2396 | @cindex Range with start address of zero
|
|---|
| 2397 | @cindex Zero, as range start address
|
|---|
| 2398 | @cindex @var{addr1},+N
|
|---|
| 2399 | @cindex @var{addr1},~N
|
|---|
| 2400 | @cindex GNU extensions, special two-address forms
|
|---|
| 2401 | @cindex GNU extensions, @code{0} address
|
|---|
| 2402 | @cindex GNU extensions, 0,@var{addr2} addressing
|
|---|
| 2403 | @cindex GNU extensions, @var{addr1},+@var{N} addressing
|
|---|
| 2404 | @cindex GNU extensions, @var{addr1},~@var{N} addressing
|
|---|
| 2405 | @value{SSED} also supports some special two-address forms; all these
|
|---|
| 2406 | are GNU extensions:
|
|---|
| 2407 | @table @code
|
|---|
| 2408 | @item 0,/@var{regexp}/
|
|---|
| 2409 | A line number of @code{0} can be used in an address specification like
|
|---|
| 2410 | @code{0,/@var{regexp}/} so that @command{sed} will try to match
|
|---|
| 2411 | @var{regexp} in the first input line too. In other words,
|
|---|
| 2412 | @code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
|
|---|
| 2413 | except that if @var{regexp} matches the very first line of input, the
|
|---|
| 2414 | @code{0,/@var{regexp}/} form will consider it to end the range, whereas
|
|---|
| 2415 | the @code{1,/@var{regexp}/} form will match the beginning of its range and
|
|---|
| 2416 | hence make the range span up to the @emph{second} occurrence of the
|
|---|
| 2417 | regular expression.
|
|---|
| 2418 |
|
|---|
| 2419 | The following examples demonstrate the difference between starting
|
|---|
| 2420 | with address 1 and 0:
|
|---|
| 2421 |
|
|---|
| 2422 | @example
|
|---|
| 2423 | $ seq 10 | sed -n '1,/[0-9]/p'
|
|---|
| 2424 | 1
|
|---|
| 2425 | 2
|
|---|
| 2426 |
|
|---|
| 2427 | $ seq 10 | sed -n '0,/[0-9]/p'
|
|---|
| 2428 | 1
|
|---|
| 2429 | @end example
|
|---|
| 2430 |
|
|---|
| 2431 |
|
|---|
| 2432 | @item @var{addr1},+@var{N}
|
|---|
| 2433 | Matches @var{addr1} and the @var{N} lines following @var{addr1}.
|
|---|
| 2434 |
|
|---|
| 2435 | @example
|
|---|
| 2436 | $ seq 10 | sed -n '6,+2p'
|
|---|
| 2437 | 6
|
|---|
| 2438 | 7
|
|---|
| 2439 | 8
|
|---|
| 2440 | @end example
|
|---|
| 2441 |
|
|---|
| 2442 | @var{addr1} can be a line number or a regular expression.
|
|---|
| 2443 |
|
|---|
| 2444 | @item @var{addr1},~@var{N}
|
|---|
| 2445 | Matches @var{addr1} and the lines following @var{addr1}
|
|---|
| 2446 | until the next line whose input line number is a multiple of @var{N}.
|
|---|
| 2447 | The following command prints starting at line 6, until the next line which
|
|---|
| 2448 | is a multiple of 4 (i.e. line 8):
|
|---|
| 2449 |
|
|---|
| 2450 | @example
|
|---|
| 2451 | $ seq 10 | sed -n '6,~4p'
|
|---|
| 2452 | 6
|
|---|
| 2453 | 7
|
|---|
| 2454 | 8
|
|---|
| 2455 | @end example
|
|---|
| 2456 |
|
|---|
| 2457 | @var{addr1} can be a line number or a regular expression.
|
|---|
| 2458 |
|
|---|
| 2459 | @end table
|
|---|
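The examples in the table above start each range with a line number.
Since @var{addr1} can also be a regular expression, the following sketch
shows the same forms with a regexp start address. It assumes GNU
@command{sed} for the @code{+N} and @code{~N} forms; the shell variable
names are made up for the example.

```shell
# Regexp to regexp: from the first line matching 3 through the next line matching 5.
re_range=$(seq 10 | sed -n '/3/,/5/p')

# Regexp start with +N: the matching line and the one line after it.
plus=$(seq 10 | sed -n '/3/,+1p')

# Regexp start with ~N: through the next line whose number is a multiple of 5.
mult=$(seq 10 | sed -n '/7/,~5p')

printf '%s\n' "$re_range" "$plus" "$mult"
```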
| 2460 |
|
|---|
| 2461 |
|
|---|
| 2462 |
|
|---|
| 2463 | @node Zero Address
|
|---|
| 2464 | @section Zero Address
|
|---|
| 2465 | @cindex Zero Address
|
|---|
| 2466 | As a @value{SSED} extension, the @code{0} address can be used in two cases:
|
|---|
| 2467 | @enumerate
|
|---|
| 2468 | @item
|
|---|
| 2469 | In a regex range address such as @code{0,/@var{regexp}/}
|
|---|
| 2470 | (@pxref{Zero Address Regex Range}).
|
|---|
| 2471 | @item
|
|---|
| 2472 | With the @code{r} command, inserting a file before the first line
|
|---|
| 2473 | (@pxref{Adding a header to multiple files}).
|
|---|
| 2474 | @end enumerate
|
|---|
| 2475 |
|
|---|
| 2476 | Note that these are the only places where the @code{0} address makes
|
|---|
| 2477 | sense; commands that are given the @code{0} address in any
|
|---|
| 2478 | other way will give an error.
|
|---|
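The difference between an accepted and a rejected use of the @code{0}
address can be checked from the shell. The following is a small sketch
assuming GNU @command{sed}; the shell variable names are made up for the
example, and the error message is discarded because its exact wording is
not relied upon here.

```shell
# 0,/regexp/ is accepted: the range may end on the very first line.
first=$(seq 3 | sed -n '0,/1/p')

# A bare 0 address with an ordinary command is rejected,
# and sed exits with a non-zero status.
if seq 3 | sed -n '0p' 2>/dev/null; then
  bare_zero=accepted
else
  bare_zero=rejected
fi

printf '%s\n' "$first" "$bare_zero"
```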
| 2479 |
|
|---|
| 2480 |
|
|---|
| 2481 |
|
|---|
| 2482 | @node sed regular expressions
|
|---|
| 2483 | @chapter Regular Expressions: selecting text
|
|---|
| 2484 |
|
|---|
| 2485 | @menu
|
|---|
| 2486 | * Regular Expressions Overview:: Overview of Regular expression in @command{sed}
|
|---|
| 2487 | * BRE vs ERE:: Basic (BRE) and extended (ERE) regular expression
|
|---|
| 2488 | syntax
|
|---|
| 2489 | * BRE syntax:: Overview of basic regular expression syntax
|
|---|
| 2490 | * ERE syntax:: Overview of extended regular expression syntax
|
|---|
| 2491 | * Character Classes and Bracket Expressions::
|
|---|
| 2492 | * regexp extensions:: Additional regular expression commands
|
|---|
| 2493 | * Back-references and Subexpressions:: Back-references and Subexpressions
|
|---|
| 2494 | * Escapes:: Specifying special characters
|
|---|
| 2495 | * Locale Considerations:: Multibyte characters and locale considerations
|
|---|
| 2496 | @end menu
|
|---|
| 2497 |
|
|---|
| 2498 | @node Regular Expressions Overview
|
|---|
| 2499 | @section Overview of regular expressions in @command{sed}
|
|---|
| 2500 |
|
|---|
| 2501 | @c NOTE: Keep examples in the 'overview' section
|
|---|
| 2502 | @c neutral in regards to BRE/ERE - to ease understanding.
|
|---|
| 2503 |
|
|---|
| 2504 |
|
|---|
| 2505 | To know how to use @command{sed}, people should understand regular
|
|---|
| 2506 | expressions (@dfn{regexp} for short). A regular expression
|
|---|
| 2507 | is a pattern that is matched against a
|
|---|
| 2508 | subject string from left to right. Most characters are
|
|---|
| 2509 | @dfn{ordinary}: they stand for
|
|---|
| 2510 | themselves in a pattern, and match the corresponding characters.
|
|---|
| 2511 | Regular expressions in @command{sed} are specified between two
|
|---|
| 2512 | slashes.
|
|---|
| 2513 |
|
|---|
| 2514 | The following command prints lines containing the string @samp{hello}:
|
|---|
| 2515 |
|
|---|
| 2516 | @example
|
|---|
| 2517 | sed -n '/hello/p'
|
|---|
| 2518 | @end example
|
|---|
| 2519 |
|
|---|
| 2520 | The above example is equivalent to this @command{grep} command:
|
|---|
| 2521 |
|
|---|
| 2522 | @example
|
|---|
| 2523 | grep 'hello'
|
|---|
| 2524 | @end example
|
|---|
| 2525 |
|
|---|
| 2526 | The power of regular expressions comes from the ability to include
|
|---|
| 2527 | alternatives and repetitions in the pattern. These are encoded in the
|
|---|
| 2528 | pattern by the use of @dfn{special characters}, which do not stand for
|
|---|
| 2529 | themselves but instead are interpreted in some special way.
|
|---|
| 2530 |
|
|---|
| 2531 | The character @code{^} (caret) in a regular expression matches the
|
|---|
| 2532 | beginning of the line. The character @code{.} (dot) matches any single
|
|---|
| 2533 | character. The following @command{sed} command matches and prints
|
|---|
| 2534 | lines which start with the letter @samp{b}, followed by any single character,
|
|---|
| 2535 | followed by the letter @samp{d}:
|
|---|
| 2536 |
|
|---|
| 2537 | @example
|
|---|
| 2538 | $ printf "%s\n" abode bad bed bit bid byte body | sed -n '/^b.d/p'
|
|---|
| 2539 | bad
|
|---|
| 2540 | bed
|
|---|
| 2541 | bid
|
|---|
| 2542 | body
|
|---|
| 2543 | @end example
|
|---|
| 2544 |
|
|---|
| 2545 | The following sections explain the meaning and usage of special
|
|---|
| 2546 | characters in regular expressions.
|
|---|
| 2547 |
|
|---|
| 2548 | @node BRE vs ERE
|
|---|
| 2549 | @section Basic (BRE) and extended (ERE) regular expression
|
|---|
| 2550 |
|
|---|
| 2551 | Basic and extended regular expressions are two variations on the
|
|---|
| 2552 | syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the
|
|---|
| 2553 | default in @command{sed} (and similarly in @command{grep}).
|
|---|
| 2554 | Use the POSIX-specified @option{-E} option (@option{-r},
|
|---|
| 2555 | @option{--regexp-extended}) to enable Extended Regular Expression (ERE) syntax.
|
|---|
| 2556 |
|
|---|
| 2557 | In @value{SSED}, the only difference between basic and extended regular
|
|---|
| 2558 | expressions is in the behavior of a few special characters: @samp{?},
|
|---|
| 2559 | @samp{+}, parentheses, braces (@samp{@{@}}), and @samp{|}.
|
|---|
| 2560 |
|
|---|
| 2561 | With basic (BRE) syntax, these characters do not have special meaning
|
|---|
| 2562 | unless prefixed with a backslash (@samp{\}); while with extended (ERE) syntax
|
|---|
| 2563 | it is reversed: these characters are special unless they are prefixed
|
|---|
| 2564 | with backslash (@samp{\}).
|
|---|
| 2565 |
|
|---|
| 2566 | @multitable @columnfractions .28 .36 .35
|
|---|
| 2567 |
|
|---|
| 2568 | @headitem Desired pattern
|
|---|
| 2569 | @tab Basic (BRE) Syntax
|
|---|
| 2570 | @tab Extended (ERE) Syntax
|
|---|
| 2571 |
|
|---|
| 2572 | @item literal @samp{+} (plus sign)
|
|---|
| 2573 |
|
|---|
| 2574 | @tab
|
|---|
| 2575 | @exampleindent 0
|
|---|
| 2576 | @codequoteundirected on
|
|---|
| 2577 | @codequotebacktick on
|
|---|
| 2578 | @example
|
|---|
| 2579 | $ echo 'a+b=c' > foo
|
|---|
| 2580 | $ sed -n '/a+b/p' foo
|
|---|
| 2581 | a+b=c
|
|---|
| 2582 | @end example
|
|---|
| 2583 | @codequotebacktick off
|
|---|
| 2584 | @codequoteundirected off
|
|---|
| 2585 |
|
|---|
| 2586 | @tab
|
|---|
| 2587 | @exampleindent 0
|
|---|
| 2588 | @codequoteundirected on
|
|---|
| 2589 | @codequotebacktick on
|
|---|
| 2590 | @example
|
|---|
| 2591 | $ echo 'a+b=c' > foo
|
|---|
| 2592 | $ sed -E -n '/a\+b/p' foo
|
|---|
| 2593 | a+b=c
|
|---|
| 2594 | @end example
|
|---|
| 2595 | @codequotebacktick off
|
|---|
| 2596 | @codequoteundirected off
|
|---|
| 2597 |
|
|---|
| 2598 |
|
|---|
| 2599 | @item One or more @samp{a} characters followed by @samp{b}
|
|---|
| 2600 | (plus sign as special meta-character)
|
|---|
| 2601 |
|
|---|
| 2602 | @tab
|
|---|
| 2603 | @exampleindent 0
|
|---|
| 2604 | @codequoteundirected on
|
|---|
| 2605 | @codequotebacktick on
|
|---|
| 2606 | @example
|
|---|
| 2607 | $ echo aab > foo
|
|---|
| 2608 | $ sed -n '/a\+b/p' foo
|
|---|
| 2609 | aab
|
|---|
| 2610 | @end example
|
|---|
| 2611 | @codequotebacktick off
|
|---|
| 2612 | @codequoteundirected off
|
|---|
| 2613 |
|
|---|
| 2614 | @tab
|
|---|
| 2615 | @exampleindent 0
|
|---|
| 2616 | @codequoteundirected on
|
|---|
| 2617 | @codequotebacktick on
|
|---|
| 2618 | @example
|
|---|
| 2619 | $ echo aab > foo
|
|---|
| 2620 | $ sed -E -n '/a+b/p' foo
|
|---|
| 2621 | aab
|
|---|
| 2622 | @end example
|
|---|
| 2623 | @codequotebacktick off
|
|---|
| 2624 | @codequoteundirected off
|
|---|
| 2625 |
|
|---|
| 2626 | @end multitable
|
|---|
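The table above shows the plus sign only. The same reversal applies to
the other characters listed earlier; the following sketch illustrates it
for parentheses, braces and the vertical bar. It assumes GNU
@command{sed} (alternation in basic syntax is a GNU extension), and the
sample strings and shell variable names are made up for the example.

```shell
# Grouping and interval: backslashes are required in BRE, omitted in ERE.
bre_group=$(echo 'abab' | sed 's/\(ab\)\{2\}/X/')
ere_group=$(echo 'abab' | sed -E 's/(ab){2}/X/')

# Literal parentheses: written plainly in BRE, escaped in ERE.
bre_paren=$(echo 'f(x)' | sed 's/(x)/[x]/')
ere_paren=$(echo 'f(x)' | sed -E 's/\(x\)/[x]/')

# Alternation: \| in BRE (GNU extension), | in ERE.
bre_alt=$(echo 'cat' | sed 's/cat\|dog/pet/')
ere_alt=$(echo 'cat' | sed -E 's/cat|dog/pet/')

printf '%s\n' "$bre_group" "$ere_group" "$bre_paren" "$ere_paren" \
  "$bre_alt" "$ere_alt"
```

Each pair of commands produces the same result: @samp{X}, @samp{f[x]}
and @samp{pet} respectively.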
| 2627 |
|
|---|
| 2628 |
|
|---|
| 2629 |
|
|---|
| 2630 |
|
|---|
| 2631 | @node BRE syntax
|
|---|
| 2632 | @section Overview of basic regular expression syntax
|
|---|
| 2633 |
|
|---|
| 2634 | Here is a brief description
|
|---|
| 2635 | of regular expression syntax as used in @command{sed}.
|
|---|
| 2636 |
|
|---|
| 2637 | @table @code
|
|---|
| 2638 | @item @var{char}
|
|---|
| 2639 | A single ordinary character matches itself.
|
|---|
| 2640 |
|
|---|
| 2641 | @item *
|
|---|
| 2642 | @cindex GNU extensions, to basic regular expressions
|
|---|
| 2643 | Matches a sequence of zero or more instances of matches for the
|
|---|
| 2644 | preceding regular expression, which must be an ordinary character, a
|
|---|
| 2645 | special character preceded by @code{\}, a @code{.}, a grouped regexp
|
|---|
| 2646 | (see below), or a bracket expression. As a GNU extension, a
|
|---|
| 2647 | postfixed regular expression can also be followed by @code{*}; for
|
|---|
| 2648 | example, @code{a**} is equivalent to @code{a*}. POSIX
|
|---|
| 2649 | 1003.1-2001 says that @code{*} stands for itself when it appears at
|
|---|
| 2650 | the start of a regular expression or subexpression, but many
|
|---|
| 2651 | non-GNU implementations do not support this and portable
|
|---|
| 2652 | scripts should instead use @code{\*} in these contexts.
|
|---|
| 2653 | @item .
|
|---|
| 2654 | Matches any character, including newline.
|
|---|
| 2655 |
|
|---|
| 2656 | @item ^
|
|---|
| 2657 | Matches the null string at beginning of the pattern space, i.e. what
|
|---|
| 2658 | appears after the circumflex must appear at the beginning of the
|
|---|
| 2659 | pattern space.
|
|---|
| 2660 |
|
|---|
| 2661 | In most scripts, pattern space is initialized to the content of each
|
|---|
| 2662 | line (@pxref{Execution Cycle, , How @code{sed} works}). So, it is a
|
|---|
| 2663 | useful simplification to think of @code{^#include} as matching only
|
|---|
| 2664 | lines where @samp{#include} is the first thing on the line---if there is
|
|---|
| 2665 | any preceding space, for example, the match fails. This simplification is
|
|---|
| 2666 | valid as long as the original content of pattern space is not modified,
|
|---|
| 2667 | for example with an @code{s} command.
|
|---|
| 2668 |
|
|---|
| 2669 | @code{^} acts as a special character only at the beginning of the
|
|---|
| 2670 | regular expression or subexpression (that is, after @code{\(} or
|
|---|
| 2671 | @code{\|}). Portable scripts should avoid @code{^} at the beginning of
|
|---|
| 2672 | a subexpression, though, as POSIX allows implementations that
|
|---|
| 2673 | treat @code{^} as an ordinary character in that context.
|
|---|
| 2674 |
|
|---|
| 2675 | @item $
|
|---|
| 2676 | It is the same as @code{^}, but refers to the end of the pattern space.
|
|---|
| 2677 | @code{$} also acts as a special character only at the end
|
|---|
| 2678 | of the regular expression or subexpression (that is, before @code{\)}
|
|---|
| 2679 | or @code{\|}), and its use at the end of a subexpression is not
|
|---|
| 2680 | portable.
|
|---|
| 2681 |
|
|---|
| 2682 |
|
|---|
| 2683 | @item [@var{list}]
|
|---|
| 2684 | @itemx [^@var{list}]
|
|---|
| 2685 | Matches any single character in @var{list}: for example,
|
|---|
| 2686 | @code{[aeiou]} matches all vowels. A list may include
|
|---|
| 2687 | sequences like @code{@var{char1}-@var{char2}}, which
|
|---|
| 2688 | matches any character between (inclusive) @var{char1}
|
|---|
| 2689 | and @var{char2}.
|
|---|
| 2690 | @xref{Character Classes and Bracket Expressions}.
|
|---|
| 2691 |
|
|---|
| 2692 | @item \+
|
|---|
| 2693 | @cindex GNU extensions, to basic regular expressions
|
|---|
| 2694 | As @code{*}, but matches one or more. It is a GNU extension.
|
|---|
| 2695 |
|
|---|
| 2696 | @item \?
|
|---|
| 2697 | @cindex GNU extensions, to basic regular expressions
|
|---|
| 2698 | As @code{*}, but only matches zero or one. It is a GNU extension.
|
|---|
| 2699 |
|
|---|
| 2700 | @item \@{@var{i}\@}
|
|---|
| 2701 | As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
|
|---|
| 2702 | decimal integer; for portability, keep it between 0 and 255
|
|---|
| 2703 | inclusive).
|
|---|
| 2704 |
|
|---|
| 2705 | @item \@{@var{i},@var{j}\@}
|
|---|
| 2706 | Matches between @var{i} and @var{j}, inclusive, sequences.
|
|---|
| 2707 |
|
|---|
| 2708 | @item \@{@var{i},\@}
|
|---|
| 2709 | Matches @var{i} or more sequences.
|
|---|
| 2710 |
|
|---|
| 2711 | @item \(@var{regexp}\)
|
|---|
| 2712 | Groups the inner @var{regexp} as a whole. This is used to:
|
|---|
| 2713 |
|
|---|
| 2714 | @itemize @bullet
|
|---|
| 2715 | @item
|
|---|
| 2716 | @cindex GNU extensions, to basic regular expressions
|
|---|
| 2717 | Apply postfix operators, like @code{\(abcd\)*}:
|
|---|
| 2718 | this will search for zero or more whole sequences
|
|---|
| 2719 | of @samp{abcd}, while @code{abcd*} would search
|
|---|
| 2720 | for @samp{abc} followed by zero or more occurrences
|
|---|
| 2721 | of @samp{d}. Note that support for @code{\(abcd\)*} is
|
|---|
| 2722 | required by POSIX 1003.1-2001, but many non-GNU
|
|---|
| 2723 | implementations do not support it and hence it is not universally
|
|---|
| 2724 | portable.
|
|---|
| 2725 |
|
|---|
| 2726 | @item
|
|---|
| 2727 | Use back references (see below).
|
|---|
| 2728 | @end itemize
|
|---|
| 2729 |
|
|---|
| 2730 |
|
|---|
| 2731 | @item @var{regexp1}\|@var{regexp2}
|
|---|
| 2732 | @cindex GNU extensions, to basic regular expressions
|
|---|
| 2733 | Matches either @var{regexp1} or @var{regexp2}. Use
|
|---|
| 2734 | parentheses to use complex alternative regular expressions.
|
|---|
| 2735 | The matching process tries each alternative in turn, from
|
|---|
| 2736 | left to right, and the first one that succeeds is used.
|
|---|
| 2737 | It is a GNU extension.
|
|---|
| 2738 |
|
|---|
| 2739 | @item @var{regexp1}@var{regexp2}
|
|---|
| 2740 | Matches the concatenation of @var{regexp1} and @var{regexp2}.
|
|---|
| 2741 | Concatenation binds more tightly than @code{\|}, @code{^}, and
|
|---|
| 2742 | @code{$}, but less tightly than the other regular expression
|
|---|
| 2743 | operators.
|
|---|
| 2744 |
|
|---|
| 2745 | @item \@var{digit}
|
|---|
| 2746 | Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
|
|---|
| 2747 | subexpression in the regular expression. This is called a @dfn{back
|
|---|
| 2748 | reference}. Subexpressions are implicitly numbered by counting
|
|---|
| 2749 | occurrences of @code{\(} left-to-right.
|
|---|
| 2750 |
|
|---|
| 2751 | @item \n
|
|---|
| 2752 | Matches the newline character.
|
|---|
| 2753 |
|
|---|
| 2754 | @item \@var{char}
|
|---|
| 2755 | Matches @var{char}, where @var{char} is one of @code{$},
|
|---|
| 2756 | @code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
|
|---|
| 2757 | Note that the only C-like
|
|---|
| 2758 | backslash sequences that you can portably assume to be
|
|---|
| 2759 | interpreted are @code{\n} and @code{\\}; in particular
|
|---|
| 2760 | @code{\t} is not portable, and matches a @samp{t} under most
|
|---|
| 2761 | implementations of @command{sed}, rather than a tab character.
|
|---|
| 2762 |
|
|---|
| 2763 | @end table
|
|---|
| 2764 |
|
|---|
| 2765 | @cindex Greedy regular expression matching
|
|---|
| 2766 | Note that the regular expression matcher is greedy, i.e., matches
|
|---|
| 2767 | are attempted from left to right and, if two or more matches are
|
|---|
| 2768 | possible starting at the same character, it selects the longest.
|
|---|
| 2769 |
|
|---|
| 2770 | @noindent
|
|---|
| 2771 | Examples:
|
|---|
| 2772 | @table @samp
|
|---|
| 2773 | @item abcdef
|
|---|
| 2774 | Matches @samp{abcdef}.
|
|---|
| 2775 |
|
|---|
| 2776 | @item a*b
|
|---|
| 2777 | Matches zero or more @samp{a}s followed by a single
|
|---|
| 2778 | @samp{b}. For example, @samp{b} or @samp{aaaaab}.
|
|---|
| 2779 |
|
|---|
| 2780 | @item a\?b
|
|---|
| 2781 | Matches @samp{b} or @samp{ab}.
|
|---|
| 2782 |
|
|---|
| 2783 | @item a\+b\+
|
|---|
| 2784 | Matches one or more @samp{a}s followed by one or more
|
|---|
| 2785 | @samp{b}s: @samp{ab} is the shortest possible match, but
|
|---|
| 2786 | other examples are @samp{aaaab} or @samp{abbbbb} or
|
|---|
| 2787 | @samp{aaaaaabbbbbbb}.
|
|---|
| 2788 |
|
|---|
| 2789 | @item .*
|
|---|
| 2790 | @itemx .\+
|
|---|
| 2791 | These two both match all the characters in a string;
|
|---|
| 2792 | however, the first matches every string (including the empty
|
|---|
| 2793 | string), while the second matches only strings containing
|
|---|
| 2794 | at least one character.
|
|---|
| 2795 |
|
|---|
| 2796 | @item ^main.*(.*)
|
|---|
| 2797 | This matches a string starting with @samp{main},
|
|---|
| 2798 | followed by an opening and closing
|
|---|
| 2799 | parenthesis. The @samp{n}, @samp{(} and @samp{)} need not
|
|---|
| 2800 | be adjacent.
|
|---|
| 2801 |
|
|---|
| 2802 | @item ^#
|
|---|
| 2803 | This matches a string beginning with @samp{#}.
|
|---|
| 2804 |
|
|---|
| 2805 | @item \\$
|
|---|
| 2806 | This matches a string ending with a single backslash. The
|
|---|
| 2807 | regexp contains two backslashes for escaping.
|
|---|
| 2808 |
|
|---|
| 2809 | @item \$
|
|---|
| 2810 | Instead, this matches a string consisting of a single dollar sign,
|
|---|
| 2811 | because it is escaped.
|
|---|
| 2812 |
|
|---|
| 2813 | @item [a-zA-Z0-9]
|
|---|
| 2814 | In the C locale, this matches any ASCII letters or digits.
|
|---|
| 2815 |
|
|---|
| 2816 | @item [^ @kbd{@key{TAB}}]\+
|
|---|
| 2817 | (Here @kbd{@key{TAB}} stands for a single tab character.)
|
|---|
| 2818 | This matches a string of one or more
|
|---|
| 2819 | characters, none of which is a space or a tab.
|
|---|
| 2820 | Usually this means a word.
|
|---|
| 2821 |
|
|---|
| 2822 | @item ^\(.*\)\n\1$
|
|---|
| 2823 | This matches a string consisting of two equal substrings separated by
|
|---|
| 2824 | a newline.
|
|---|
| 2825 |
|
|---|
| 2826 | @item .\@{9\@}A$
|
|---|
| 2827 | This matches nine characters followed by an @samp{A} at the end of a line.
|
|---|
| 2828 |
|
|---|
| 2829 | @item ^.\@{15\@}A
|
|---|
| 2830 | This matches the start of a string that contains 16 characters,
|
|---|
| 2831 | the last of which is an @samp{A}.
|
|---|
| 2832 |
|
|---|
| 2833 | @end table
|
|---|
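The leftmost-longest rule described above can be checked directly. The following is a minimal sketch, assuming a POSIX shell and a POSIX-conforming @command{sed}; the variable names are illustrative only.

```shell
# Longest: starting at the first "a", .* extends as far as it can
# while still letting the final X match, so "aXbX" is replaced.
longest=$(echo 'aXbXc' | sed 's/a.*X/-/')
echo "$longest"     # prints: -c

# Leftmost: a* can match the empty string at the very start of the
# line, and that earlier match wins over the later run of "aaa".
leftmost=$(echo 'xaaa' | sed 's/a*/-/')
echo "$leftmost"    # prints: -xaaa
```

The second command is a common surprise: a pattern that can match the empty string always succeeds at the first position it is tried.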
| 2834 |
|
|---|
| 2835 |
|
|---|
| 2836 | @node ERE syntax
|
|---|
| 2837 | @section Overview of extended regular expression syntax
|
|---|
| 2838 | @cindex Extended regular expressions, syntax
|
|---|
| 2839 |
|
|---|
| 2840 | The only difference between basic and extended regular expressions is in
|
|---|
| 2841 | the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
|
|---|
| 2842 | braces (@samp{@{@}}), and @samp{|}. While basic regular expressions
|
|---|
| 2843 | require these to be escaped if you want them to behave as special
|
|---|
| 2844 | characters, when using extended regular expressions you must escape
|
|---|
| 2845 | them if you want them @emph{to match a literal character}. @samp{|}
|
|---|
| 2846 | is special here because @samp{\|} is a GNU extension -- standard
|
|---|
| 2847 | basic regular expressions do not provide its functionality.
|
|---|
| 2848 |
|
|---|
| 2849 | @noindent
|
|---|
| 2850 | Examples:
|
|---|
| 2851 | @table @code
|
|---|
| 2852 | @item abc?
|
|---|
| 2853 | becomes @samp{abc\?} when using extended regular expressions. It matches
|
|---|
| 2854 | the literal string @samp{abc?}.
|
|---|
| 2855 |
|
|---|
| 2856 | @item c\+
|
|---|
| 2857 | becomes @samp{c+} when using extended regular expressions. It matches
|
|---|
| 2858 | one or more @samp{c}s.
|
|---|
| 2859 |
|
|---|
| 2860 | @item a\@{3,\@}
|
|---|
| 2861 | becomes @samp{a@{3,@}} when using extended regular expressions. It matches
|
|---|
| 2862 | three or more @samp{a}s.
|
|---|
| 2863 |
|
|---|
| 2864 | @item \(abc\)\@{2,3\@}
|
|---|
| 2865 | becomes @samp{(abc)@{2,3@}} when using extended regular expressions. It
|
|---|
| 2866 | matches either @samp{abcabc} or @samp{abcabcabc}.
|
|---|
| 2867 |
|
|---|
| 2868 | @item \(abc*\)\1
|
|---|
| 2869 | becomes @samp{(abc*)\1} when using extended regular expressions.
|
|---|
| 2870 | Backreferences must still be escaped when using extended regular
|
|---|
| 2871 | expressions.
|
|---|
| 2872 |
|
|---|
| 2873 | @item a\|b
|
|---|
| 2874 | becomes @samp{a|b} when using extended regular expressions. It matches
|
|---|
| 2875 | @samp{a} or @samp{b}.
|
|---|
| 2876 | @end table
|
|---|
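The correspondence in the table above can be tried on the command line. This is a small sketch, assuming a @command{sed} that accepts @option{-E} (GNU @command{sed} does); the variable names are illustrative only.

```shell
# The same repetition written in basic (escaped) and extended
# (unescaped) syntax; both replace "abcabc" with X.
bre=$(echo 'abcabc' | sed 's/\(abc\)\{2,3\}/X/')
ere=$(echo 'abcabc' | sed -E 's/(abc){2,3}/X/')
echo "$bre $ere"    # prints: X X

# With -E, an escaped ? is a literal question mark ...
literal=$(echo 'abc?' | sed -E 's/abc\?/X/')
echo "$literal"     # prints: X

# ... while an unescaped ? makes the preceding "c" optional, so
# only "ab" is consumed and the "?" in the input is left alone.
optional=$(echo 'ab?' | sed -E 's/abc?/X/')
echo "$optional"    # prints: X?
```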
| 2877 |
|
|---|
| 2878 | @node Character Classes and Bracket Expressions
|
|---|
| 2879 | @section Character Classes and Bracket Expressions
|
|---|
| 2880 |
|
|---|
| 2881 | @c The 'character class' section is shamelessly copied from grep's manual.
|
|---|
| 2882 |
|
|---|
| 2883 | @cindex bracket expression
|
|---|
| 2884 | @cindex character class
|
|---|
| 2885 | A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
|
|---|
| 2886 | @samp{]}.
|
|---|
| 2887 | It matches any single character in that list;
|
|---|
| 2888 | if the first character of the list is the caret @samp{^},
|
|---|
| 2889 | then it matches any character @strong{not} in the list.
|
|---|
| 2890 | For example, the following command replaces the strings
|
|---|
| 2891 | @samp{gray} or @samp{grey} with @samp{blue}:
|
|---|
| 2892 |
|
|---|
| 2893 | @example
|
|---|
| 2894 | sed 's/gr[ae]y/blue/'
|
|---|
| 2895 | @end example
|
|---|
| 2896 |
|
|---|
| 2897 | @c TODO: fix 'ref' to look good in both HTML and PDF
|
|---|
| 2898 | Bracket expressions can be used in both
|
|---|
| 2899 | @ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
|
|---|
| 2900 | regular expressions (that is, with or without the @option{-E}/@option{-r}
|
|---|
| 2901 | options).
|
|---|
| 2902 |
|
|---|
| 2903 | @cindex range expression
|
|---|
| 2904 | Within a bracket expression, a @dfn{range expression} consists of two
|
|---|
| 2905 | characters separated by a hyphen.
|
|---|
| 2906 | It matches any single character that
|
|---|
| 2907 | sorts between the two characters, inclusive.
|
|---|
| 2908 | In the default C locale, the sorting sequence is the native character
|
|---|
| 2909 | order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
|
|---|
| 2910 |
|
|---|
| 2911 |
|
|---|
| 2912 | Finally, certain named classes of characters are predefined within
|
|---|
| 2913 | bracket expressions, as follows.
|
|---|
| 2914 |
|
|---|
| 2915 | These named classes must be used @emph{inside} brackets
|
|---|
| 2916 | themselves. Correct usage:
|
|---|
| 2917 | @example
|
|---|
| 2918 | $ echo 1 | sed 's/[[:digit:]]/X/'
|
|---|
| 2919 | X
|
|---|
| 2920 | @end example
|
|---|
| 2921 |
|
|---|
| 2922 | Incorrect usage is rejected by newer @command{sed} versions.
|
|---|
| 2923 | Older versions accepted it but treated it as a single bracket expression
|
|---|
| 2924 | (which is equivalent to @samp{[dgit:]},
|
|---|
| 2925 | that is, only the characters @var{d/g/i/t/:}):
|
|---|
| 2926 | @example
|
|---|
| 2927 | # current GNU sed versions - incorrect usage rejected
|
|---|
| 2928 | $ echo 1 | sed 's/[:digit:]/X/'
|
|---|
| 2929 | sed: character class syntax is [[:space:]], not [:space:]
|
|---|
| 2930 |
|
|---|
| 2931 | # older GNU sed versions
|
|---|
| 2932 | $ echo 1 | sed 's/[:digit:]/X/'
|
|---|
| 2933 | 1
|
|---|
| 2934 | @end example
|
|---|
| 2935 |
|
|---|
| 2936 |
|
|---|
| 2937 | @cindex classes of characters
|
|---|
| 2938 | @cindex character classes
|
|---|
| 2939 | @cindex named character classes
|
|---|
| 2940 | @table @samp
|
|---|
| 2941 |
|
|---|
| 2942 | @item [:alnum:]
|
|---|
| 2943 | @opindex alnum @r{character class}
|
|---|
| 2944 | @cindex alphanumeric characters
|
|---|
| 2945 | Alphanumeric characters:
|
|---|
| 2946 | @samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
|
|---|
| 2947 | character encoding, this is the same as @samp{[0-9A-Za-z]}.
|
|---|
| 2948 |
|
|---|
| 2949 | @item [:alpha:]
|
|---|
| 2950 | @opindex alpha @r{character class}
|
|---|
| 2951 | @cindex alphabetic characters
|
|---|
| 2952 | Alphabetic characters:
|
|---|
| 2953 | @samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
|
|---|
| 2954 | character encoding, this is the same as @samp{[A-Za-z]}.
|
|---|
| 2955 |
|
|---|
| 2956 | @item [:blank:]
|
|---|
| 2957 | @opindex blank @r{character class}
|
|---|
| 2958 | @cindex blank characters
|
|---|
| 2959 | Blank characters:
|
|---|
| 2960 | space and tab.
|
|---|
| 2961 |
|
|---|
| 2962 | @item [:cntrl:]
|
|---|
| 2963 | @opindex cntrl @r{character class}
|
|---|
| 2964 | @cindex control characters
|
|---|
| 2965 | Control characters.
|
|---|
| 2966 | In ASCII, these characters have octal codes 000
|
|---|
| 2967 | through 037, and 177 (DEL).
|
|---|
| 2968 | In other character sets, these are
|
|---|
| 2969 | the equivalent characters, if any.
|
|---|
| 2970 |
|
|---|
| 2971 | @item [:digit:]
|
|---|
| 2972 | @opindex digit @r{character class}
|
|---|
| 2973 | @cindex digit characters
|
|---|
| 2974 | @cindex numeric characters
|
|---|
| 2975 | Digits: @code{0 1 2 3 4 5 6 7 8 9}.
|
|---|
| 2976 |
|
|---|
| 2977 | @item [:graph:]
|
|---|
| 2978 | @opindex graph @r{character class}
|
|---|
| 2979 | @cindex graphic characters
|
|---|
| 2980 | Graphical characters:
|
|---|
| 2981 | @samp{[:alnum:]} and @samp{[:punct:]}.
|
|---|
| 2982 |
|
|---|
| 2983 | @item [:lower:]
|
|---|
| 2984 | @opindex lower @r{character class}
|
|---|
| 2985 | @cindex lower-case letters
|
|---|
| 2986 | Lower-case letters; in the @samp{C} locale and ASCII character
|
|---|
| 2987 | encoding, this is
|
|---|
| 2988 | @code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.
|
|---|
| 2989 |
|
|---|
| 2990 | @item [:print:]
|
|---|
| 2991 | @opindex print @r{character class}
|
|---|
| 2992 | @cindex printable characters
|
|---|
| 2993 | Printable characters:
|
|---|
| 2994 | @samp{[:alnum:]}, @samp{[:punct:]}, and space.
|
|---|
| 2995 |
|
|---|
| 2996 | @item [:punct:]
|
|---|
| 2997 | @opindex punct @r{character class}
|
|---|
| 2998 | @cindex punctuation characters
|
|---|
| 2999 | Punctuation characters; in the @samp{C} locale and ASCII character
|
|---|
| 3000 | encoding, this is
|
|---|
| 3001 | @code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.
|
|---|
| 3002 |
|
|---|
| 3003 | @item [:space:]
|
|---|
| 3004 | @opindex space @r{character class}
|
|---|
| 3005 | @cindex space characters
|
|---|
| 3006 | @cindex whitespace characters
|
|---|
| 3007 | Space characters: in the @samp{C} locale, this is
|
|---|
| 3008 | tab, newline, vertical tab, form feed, carriage return, and space.
|
|---|
| 3009 |
|
|---|
| 3010 |
|
|---|
| 3011 | @item [:upper:]
|
|---|
| 3012 | @opindex upper @r{character class}
|
|---|
| 3013 | @cindex upper-case letters
|
|---|
| 3014 | Upper-case letters: in the @samp{C} locale and ASCII character
|
|---|
| 3015 | encoding, this is
|
|---|
| 3016 | @code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.
|
|---|
| 3017 |
|
|---|
| 3018 | @item [:xdigit:]
|
|---|
| 3019 | @opindex xdigit @r{character class}
|
|---|
| 3020 | @cindex xdigit class
|
|---|
| 3021 | @cindex hexadecimal digits
|
|---|
| 3022 | Hexadecimal digits:
|
|---|
| 3023 | @code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.
|
|---|
| 3024 |
|
|---|
| 3025 | @end table
|
|---|
| 3026 | Note that the brackets in these class names are
|
|---|
| 3027 | part of the symbolic names, and must be included in addition to
|
|---|
| 3028 | the brackets delimiting the bracket expression.
|
|---|
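Several named classes can share one bracket expression with ordinary list items, and can be negated like any other list. The sketch below assumes a POSIX shell and ASCII input; the variable names are illustrative only.

```shell
# A single class inside a bracket expression.
digits=$(echo 'ab 12' | sed 's/[[:digit:]]/D/g')
echo "$digits"      # prints: ab DD

# Two classes plus a literal underscore in the same list.
mixed=$(echo 'Ab_9z' | sed 's/[[:upper:][:digit:]_]/#/g')
echo "$mixed"       # prints: #b##z

# A leading ^ negates the whole list, class included.
negated=$(echo 'ab 12' | sed 's/[^[:alpha:]]/-/g')
echo "$negated"     # prints: ab---
```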
| 3029 |
|
|---|
| 3030 | Most meta-characters lose their special meaning inside bracket expressions:
|
|---|
| 3031 |
|
|---|
| 3032 | @table @samp
|
|---|
| 3033 | @item ]
|
|---|
| 3034 | ends the bracket expression if it's not the first list item.
|
|---|
| 3035 | So, if you want to make the @samp{]} character a list item,
|
|---|
| 3036 | you must put it first.
|
|---|
| 3037 |
|
|---|
| 3038 | @item -
|
|---|
| 3039 | represents the range if it's not first or last in a list or the ending point
|
|---|
| 3040 | of a range.
|
|---|
| 3041 |
|
|---|
| 3042 | @item ^
|
|---|
| 3043 | represents the characters not in the list.
|
|---|
| 3044 | If you want to make the @samp{^}
|
|---|
| 3045 | character a list item, place it anywhere but first.
|
|---|
| 3046 | @end table
|
|---|
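The placement rules above can be combined in a single list. The following sketch, assuming a POSIX shell and ASCII input, puts @samp{]} first, @samp{^} in the middle and @samp{-} last so that all three are taken literally; the variable names are illustrative only.

```shell
# "]" first, "^" not first, "-" last: all three are list items.
specials=$(echo 'a]b^c-d' | sed 's/[]^-]/_/g')
echo "$specials"    # prints: a_b_c_d

# In a negated list, "]" goes right after the "^".
not_close=$(echo 'a]b' | sed 's/[^]]/_/g')
echo "$not_close"   # prints: _]_

# Between two characters, "-" forms a range instead.
range=$(echo 'a-z' | sed 's/[a-c]/_/g')
echo "$range"       # prints: _-z
```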
| 3047 |
|
|---|
| 3048 | @c TODO: incorporate this paragraph (copied verbatim from BRE section).
|
|---|
| 3049 |
|
|---|
| 3050 | @cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
|
|---|
| 3051 | The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
|
|---|
| 3052 | are normally not special within @var{list}. For example, @code{[\*]}
|
|---|
| 3053 | matches either @samp{\} or @samp{*}, because the @code{\} is not
|
|---|
| 3054 | special here. However, strings like @code{[.ch.]}, @code{[=a=]}, and
|
|---|
| 3055 | @code{[:space:]} are special within @var{list} and represent collating
|
|---|
| 3056 | symbols, equivalence classes, and character classes, respectively, and
|
|---|
| 3057 | @code{[} is therefore special within @var{list} when it is followed by
|
|---|
| 3058 | @code{.}, @code{=}, or @code{:}. Also, when not in
|
|---|
| 3059 | @env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
|
|---|
| 3060 | @code{\t} are recognized within @var{list}. @xref{Escapes}.
|
|---|
| 3061 | @c ********
|
|---|
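Both behaviors described in the paragraph above can be observed from the shell. This is a sketch that assumes GNU @command{sed} running without @env{POSIXLY_CORRECT} set; the variable names are illustrative only. @command{printf} with @samp{%s} is used for the first input so that the shell does not interpret the backslash itself.

```shell
# Inside a list, the backslash before "*" is an ordinary character,
# so [\*] matches either a backslash or an asterisk.
either=$(printf '%s\n' 'a\b*c' | sed 's/[\*]/_/g')
echo "$either"      # prints: a_b_c

# GNU sed still recognizes special escapes such as \t inside a
# list, so [\t] matches a tab here rather than "\" or "t".
tab=$(printf 'a\tb\n' | sed 's/[\t]/_/')
echo "$tab"         # prints: a_b
```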
| 3062 |
|
|---|
| 3063 |
|
|---|
| 3064 | @c TODO: improve explanation about collation classes and equivalence classes
|
|---|
| 3065 | @c perhaps dedicate a section to Locales ??
|
|---|
| 3066 |
|
|---|
| 3067 | @table @samp
|
|---|
| 3068 | @item [.
|
|---|
| 3069 | represents the open collating symbol.
|
|---|
| 3070 |
|
|---|
| 3071 | @item .]
|
|---|
| 3072 | represents the close collating symbol.
|
|---|
| 3073 |
|
|---|
| 3074 | @item [=
|
|---|
| 3075 | represents the open equivalence class.
|
|---|
| 3076 |
|
|---|
| 3077 | @item =]
|
|---|
| 3078 | represents the close equivalence class.
|
|---|
| 3079 |
|
|---|
| 3080 | @item [:
|
|---|
| 3081 | represents the open character class symbol, and should be followed by a
|
|---|
| 3082 | valid character class name.
|
|---|
| 3083 |
|
|---|
| 3084 | @item :]
|
|---|
| 3085 | represents the close character class symbol.
|
|---|
| 3086 | @end table
|
|---|
| 3087 |
|
|---|
| 3088 |
|
|---|
| 3089 | @node regexp extensions
|
|---|
| 3090 | @section regular expression extensions
|
|---|
| 3091 |
|
|---|
| 3092 | The following sequences have special meaning inside regular expressions
|
|---|
| 3093 | (used in @ref{Regexp Addresses,,addresses} and the @code{s} command).
|
|---|
| 3094 |
|
|---|
| 3095 | These can be used in both
|
|---|
| 3096 | @ref{BRE syntax,,basic} and @ref{ERE syntax,,extended}
|
|---|
| 3097 | regular expressions (that is, with or without the @option{-E}/@option{-r}
|
|---|
| 3098 | options).
|
|---|
| 3099 |
|
|---|
| 3100 | @table @code
|
|---|
| 3101 | @item \w
|
|---|
| 3102 | Matches any ``word'' character. A ``word'' character is any
|
|---|
| 3103 | letter or digit or the underscore character.
|
|---|
| 3104 |
|
|---|
| 3105 | @example
|
|---|
| 3106 | $ echo "abc %-= def." | sed 's/\w/X/g'
|
|---|
| 3107 | XXX %-= XXX.
|
|---|
| 3108 | @end example
|
|---|
| 3109 |
|
|---|
| 3110 |
|
|---|
| 3111 | @item \W
|
|---|
| 3112 | Matches any ``non-word'' character.
|
|---|
| 3113 |
|
|---|
| 3114 | @example
|
|---|
| 3115 | $ echo "abc %-= def." | sed 's/\W/X/g'
|
|---|
| 3116 | abcXXXXXdefX
|
|---|
| 3117 | @end example
|
|---|
| 3118 |
|
|---|
| 3119 |
|
|---|
| 3120 | @item \b
|
|---|
| 3121 | Matches a word boundary; that is, it matches if the character
|
|---|
| 3122 | to the left is a ``word'' character and the character to the
|
|---|
| 3123 | right is a ``non-word'' character, or vice-versa.
|
|---|
| 3124 |
|
|---|
| 3125 | @example
|
|---|
| 3126 | $ echo "abc %-= def." | sed 's/\b/X/g'
|
|---|
| 3127 | XabcX %-= XdefX.
|
|---|
| 3128 | @end example
|
|---|
| 3129 |
|
|---|
| 3130 |
|
|---|
| 3131 | @item \B
|
|---|
| 3132 | Matches everywhere but on a word boundary; that is, it matches
|
|---|
| 3133 | if the character to the left and the character to the right
|
|---|
| 3134 | are either both ``word'' characters or both ``non-word''
|
|---|
| 3135 | characters.
|
|---|
| 3136 |
|
|---|
| 3137 | @example
|
|---|
| 3138 | $ echo "abc %-= def." | sed 's/\B/X/g'
|
|---|
| 3139 | aXbXc X%X-X=X dXeXf.X
|
|---|
| 3140 | @end example
|
|---|
| 3141 |
|
|---|
| 3142 |
|
|---|
| 3143 | @item \s
|
|---|
| 3144 | Matches whitespace characters (spaces and tabs).
|
|---|
| 3145 | Newlines embedded in the pattern/hold spaces will also match:
|
|---|
| 3146 |
|
|---|
| 3147 | @example
|
|---|
| 3148 | $ echo "abc %-= def." | sed 's/\s/X/g'
|
|---|
| 3149 | abcX%-=Xdef.
|
|---|
| 3150 | @end example
|
|---|
| 3151 |
|
|---|
| 3152 |
|
|---|
| 3153 | @item \S
|
|---|
| 3154 | Matches non-whitespace characters.
|
|---|
| 3155 |
|
|---|
| 3156 | @example
|
|---|
| 3157 | $ echo "abc %-= def." | sed 's/\S/X/g'
|
|---|
| 3158 | XXX XXX XXXX
|
|---|
| 3159 | @end example
|
|---|
| 3160 |
|
|---|
| 3161 |
|
|---|
| 3162 | @item \<
|
|---|
| 3163 | Matches the beginning of a word.
|
|---|
| 3164 |
|
|---|
| 3165 | @example
|
|---|
| 3166 | $ echo "abc %-= def." | sed 's/\</X/g'
|
|---|
| 3167 | Xabc %-= Xdef.
|
|---|
| 3168 | @end example
|
|---|
| 3169 |
|
|---|
| 3170 |
|
|---|
| 3171 | @item \>
|
|---|
| 3172 | Matches the end of a word.
|
|---|
| 3173 |
|
|---|
| 3174 | @example
|
|---|
| 3175 | $ echo "abc %-= def." | sed 's/\>/X/g'
|
|---|
| 3176 | abcX %-= defX.
|
|---|
| 3177 | @end example
|
|---|
| 3178 |
|
|---|
| 3179 |
|
|---|
| 3180 | @item \`
|
|---|
| 3181 | Matches only at the start of pattern space. This is different
|
|---|
| 3182 | from @code{^} in multi-line mode.
|
|---|
| 3183 |
|
|---|
| 3184 | Compare the following two examples:
|
|---|
| 3185 |
|
|---|
| 3186 | @example
|
|---|
| 3187 | $ printf "a\nb\nc\n" | sed 'N;N;s/^/X/gm'
|
|---|
| 3188 | Xa
|
|---|
| 3189 | Xb
|
|---|
| 3190 | Xc
|
|---|
| 3191 |
|
|---|
| 3192 | $ printf "a\nb\nc\n" | sed 'N;N;s/\`/X/gm'
|
|---|
| 3193 | Xa
|
|---|
| 3194 | b
|
|---|
| 3195 | c
|
|---|
| 3196 | @end example
|
|---|
| 3197 |
|
|---|
| 3198 | @item \'
|
|---|
| 3199 | Matches only at the end of pattern space. This is different
|
|---|
| 3200 | from @code{$} in multi-line mode.
|
|---|
| 3201 |
|
|---|
| 3202 |
|
|---|
| 3203 |
|
|---|
| 3204 | @end table
|
|---|
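The difference between @code{$} and the end-of-pattern-space anchor in multi-line mode can be shown in the same way as for the start anchors above. This sketch assumes GNU @command{sed} (the anchor and the @code{M} flag are GNU extensions); the variable names are illustrative only. The second script is double-quoted so that it can contain an apostrophe, which is why the backslash is doubled.

```shell
# With the M flag, $ matches at the end of every embedded line.
per_line=$(printf 'a\nb\nc\n' | sed 'N;N;s/$/X/gm')
echo "$per_line"    # prints three lines: aX, bX, cX

# \' matches only at the very end of the pattern space.
# sed receives the script: N;N;s/\'/X/gm
buffer_end=$(printf 'a\nb\nc\n' | sed "N;N;s/\\'/X/gm")
echo "$buffer_end"  # prints three lines: a, b, cX
```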
| 3205 |
|
|---|
| 3206 |
|
|---|
| 3207 | @node Back-references and Subexpressions
|
|---|
| 3208 | @section Back-references and Subexpressions
|
|---|
| 3209 | @cindex subexpression
|
|---|
| 3210 | @cindex back-reference
|
|---|
| 3211 |
|
|---|
| 3212 | @dfn{Back-references} are regular expression commands which refer to a
|
|---|
| 3213 | previous part of the matched regular expression. Back-references are
|
|---|
| 3214 | specified with backslash and a single digit (e.g. @samp{\1}). The
|
|---|
| 3215 | part of the regular expression they refer to is called a
|
|---|
| 3216 | @dfn{subexpression}, and is designated with parentheses.
|
|---|
| 3217 |
|
|---|
| 3218 | Back-references and subexpressions are used in two cases: in the
|
|---|
| 3219 | regular expression search pattern, and in the @var{replacement} part
|
|---|
| 3220 | of the @command{s} command (@pxref{Regexp Addresses,,Regular
|
|---|
| 3221 | Expression Addresses} and @ref{The "s" Command}).
|
|---|
| 3222 |
|
|---|
| 3223 | In a regular expression pattern, back-references are used to match
|
|---|
| 3224 | the same content as a previously matched subexpression. In the
|
|---|
| 3225 | following example, the subexpression is @samp{.} - any single
|
|---|
| 3226 | character (being surrounded by parentheses makes it a
|
|---|
| 3227 | subexpression). The back-reference @samp{\1} asks to match the same
|
|---|
| 3228 | content (same character) as the sub-expression.
|
|---|
| 3229 |
|
|---|
| 3230 | The command below matches words starting with any character,
|
|---|
| 3231 | followed by the letter @samp{o}, followed by the same character as the
|
|---|
| 3232 | first.
|
|---|
| 3233 |
|
|---|
| 3234 | @example
|
|---|
| 3235 | $ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
|
|---|
| 3236 | bob
|
|---|
| 3237 | mom
|
|---|
| 3238 | non
|
|---|
| 3239 | pop
|
|---|
| 3240 | sos
|
|---|
| 3241 | tot
|
|---|
| 3242 | wow
|
|---|
| 3243 | @end example
|
|---|
| 3244 |
|
|---|
| 3245 | Multiple subexpressions are automatically numbered from
|
|---|
| 3246 | left-to-right. This command searches for 6-letter
|
|---|
| 3247 | palindromes (the first three letters are 3 subexpressions,
|
|---|
| 3248 | followed by 3 back-references in reverse order):
|
|---|
| 3249 |
|
|---|
| 3250 | @example
|
|---|
| 3251 | $ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
|
|---|
| 3252 | redder
|
|---|
| 3253 | @end example
|
|---|
| 3254 |
|
|---|
| 3255 | In the @command{s} command, back-references can be
|
|---|
| 3256 | used in the @var{replacement} part to refer back to subexpressions in
|
|---|
| 3257 | the @var{regexp} part.
|
|---|
| 3258 |
|
|---|
| 3259 | The following example uses two subexpressions in the regular
|
|---|
| 3260 | expression to match two space-separated words. The back-references in
|
|---|
| 3261 | the @var{replacement} part print the words in a different order:
|
|---|
| 3262 |
|
|---|
| 3263 | @example
|
|---|
| 3264 | $ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
|
|---|
| 3265 | The name is Bond, James Bond.
|
|---|
| 3266 | @end example
|
|---|
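The two uses of back-references described above, in the pattern and in the replacement, can also be tried without a dictionary file. This sketch assumes GNU @command{sed}, since it relies on @option{-E} together with the @code{\b} and @code{\w} extensions; the sample inputs and variable names are illustrative only.

```shell
# In the pattern, \1 must repeat what the group matched; in the
# replacement, \1 reinserts it. Together they collapse a doubled word.
dedup=$(echo 'this is is a test' | sed -E 's/\b(\w+) \1\b/\1/')
echo "$dedup"       # prints: this is a test

# Three groups, then three back-references in reverse order:
# only the six-letter palindrome is printed.
palindrome=$(printf 'redder\nreader\n' | sed -E -n '/^(.)(.)(.)\3\2\1$/p')
echo "$palindrome"  # prints: redder
```

The leading @code{\b} in the first command keeps the match from starting in the middle of @samp{this}.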
| 3267 |
|
|---|
| 3268 |
|
|---|
| 3269 | When used with alternation, if the group does not participate in the
|
|---|
| 3270 | match then the back-reference makes the whole match fail. For
|
|---|
| 3271 | example, @samp{a(.)|b\1} will not match @samp{ba}. When multiple
|
|---|
| 3272 | regular expressions are given with @option{-e} or from a file
|
|---|
| 3273 | (@samp{-f @var{file}}), back-references are local to each expression.
|
|---|
| 3274 |
|
|---|
| 3275 |
|
|---|
| 3276 | @node Escapes
|
|---|
| 3277 | @section Escape Sequences - specifying special characters
|
|---|
| 3278 |
|
|---|
| 3279 | @cindex GNU extensions, special escapes
|
|---|
| 3280 | Until this chapter, we have only encountered escapes of the form
|
|---|
| 3281 | @samp{\^}, which tell @command{sed} not to interpret the circumflex
|
|---|
| 3282 | as a special character, but rather to take it literally. For
|
|---|
| 3283 | example, @samp{\*} matches a single asterisk rather than zero
|
|---|
| 3284 | or more backslashes.
|
|---|
| 3285 |
|
|---|
| 3286 | @cindex @code{POSIXLY_CORRECT} behavior, escapes
|
|---|
| 3287 | This chapter introduces another kind of escape@footnote{All
|
|---|
| 3288 | the escapes introduced here are GNU
|
|---|
| 3289 | extensions, with the exception of @code{\n}. In basic regular
|
|---|
| 3290 | expression mode, setting @code{POSIXLY_CORRECT} disables them inside
|
|---|
| 3291 | bracket expressions.}---that
|
|---|
| 3292 | is, escapes that are applied to a character or sequence of characters
|
|---|
| 3293 | that ordinarily are taken literally, and that @command{sed} replaces
|
|---|
| 3294 | with a special character. This provides a way
|
|---|
| 3295 | of encoding non-printable characters in patterns in a visible manner.
|
|---|
| 3296 | There is no restriction on the appearance of non-printing characters
|
|---|
| 3297 | in a @command{sed} script but when a script is being prepared in the
|
|---|
| 3298 | shell or by text editing, it is usually easier to use one of
|
|---|
| 3299 | the following escape sequences than the binary character it
|
|---|
| 3300 | represents.
|
|---|
| 3301 |
|
|---|
| 3302 | The list of these escapes is:
|
|---|
| 3303 |
|
|---|
| 3304 | @table @code
|
|---|
| 3305 | @item \a
|
|---|
| 3306 | Produces or matches a @sc{bel} character, that is an ``alert'' (@sc{ascii} 7).
|
|---|
| 3307 |
|
|---|
| 3308 | @item \f
|
|---|
| 3309 | Produces or matches a form feed (@sc{ascii} 12).
|
|---|
| 3310 |
|
|---|
| 3311 | @item \n
|
|---|
| 3312 | Produces or matches a newline (@sc{ascii} 10).
|
|---|
| 3313 |
|
|---|
| 3314 | @item \r
|
|---|
| 3315 | Produces or matches a carriage return (@sc{ascii} 13).
|
|---|
| 3316 |
|
|---|
| 3317 | @item \t
|
|---|
| 3318 | Produces or matches a horizontal tab (@sc{ascii} 9).
|
|---|
| 3319 |
|
|---|
| 3320 | @item \v
|
|---|
| 3321 | Produces or matches a so-called ``vertical tab'' (@sc{ascii} 11).
|
|---|
| 3322 |
|
|---|
| 3323 | @item \c@var{x}
|
|---|
| 3324 | Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is
|
|---|
| 3325 | any character. The precise effect of @samp{\c@var{x}} is as follows:
|
|---|
| 3326 | if @var{x} is a lower case letter, it is converted to upper case.
|
|---|
| 3327 | Then bit 6 of the character (hex 40) is inverted. Thus @samp{\cz} becomes
|
|---|
| 3328 | hex 1A, but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B.
|
|---|
| 3329 |
|
|---|
| 3330 | @item \d@var{xxx}
|
|---|
| 3331 | Produces or matches a character whose decimal @sc{ascii} value is @var{xxx}.
|
|---|
| 3332 |
|
|---|
| 3333 | @item \o@var{xxx}
|
|---|
| 3334 | Produces or matches a character whose octal @sc{ascii} value is @var{xxx}.
|
|---|
| 3335 |
|
|---|
| 3336 | @item \x@var{xx}
|
|---|
| 3337 | Produces or matches a character whose hexadecimal @sc{ascii} value is @var{xx}.
|
|---|
| 3338 | @end table
|
|---|
| 3339 |
|
|---|
| 3340 | @samp{\b} (backspace) was omitted because of the conflict with
|
|---|
| 3341 | the existing ``word boundary'' meaning.
|
|---|
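A few of the escapes listed above can be exercised as follows. This is a sketch assuming GNU @command{sed}, since all of these escapes except @code{\n} are GNU extensions; the variable names are illustrative only.

```shell
# \t in the regexp matches a tab in the input.
tab_in=$(printf 'a\tb\n' | sed 's/\t/<TAB>/')
echo "$tab_in"      # prints: a<TAB>b

# Hexadecimal 20, octal 040 and decimal 032 all name the space.
hex=$(echo 'a b' | sed 's/\x20/_/')
oct=$(echo 'a b' | sed 's/\o040/_/')
dec=$(echo 'a b' | sed 's/\d032/_/')
echo "$hex $oct $dec"   # prints: a_b a_b a_b

# The same escapes produce characters in the replacement:
# the comma is replaced by a tab.
tab_out=$(echo 'a,b' | sed 's/,/\t/')
printf '%s\n' "$tab_out" | od -An -c   # shows a, a tab, then b
```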
| 3342 |
|
|---|
| 3343 | @subsection Escaping Precedence
|
|---|
| 3344 |
|
|---|
| 3345 | @value{SSED} processes escape sequences @emph{before} passing
|
|---|
| 3346 | the text on to the regular-expression matching of the @command{s///} command
|
|---|
| 3347 | and address matching. Thus the following two commands are equivalent
|
|---|
| 3348 | (@samp{0x5e} is the hexadecimal @sc{ascii} value of the character @samp{^}):
|
|---|
| 3349 |
|
|---|
| 3350 | @codequoteundirected on
|
|---|
| 3351 | @codequotebacktick on
|
|---|
| 3352 | @example
|
|---|
| 3353 | @group
|
|---|
| 3354 | $ echo 'a^c' | sed 's/^/b/'
|
|---|
| 3355 | ba^c
|
|---|
| 3356 |
|
|---|
| 3357 | $ echo 'a^c' | sed 's/\x5e/b/'
|
|---|
| 3358 | ba^c
|
|---|
| 3359 | @end group
|
|---|
| 3360 | @end example
|
|---|
| 3361 | @codequoteundirected off
|
|---|
| 3362 | @codequotebacktick off
|
|---|
| 3363 |
|
|---|
| 3364 | As are the following (@samp{0x5b},@samp{0x5d} are the hexadecimal
|
|---|
| 3365 | @sc{ascii} values of @samp{[},@samp{]}, respectively):
|
|---|
| 3366 |
|
|---|
| 3367 | @codequoteundirected on
|
|---|
| 3368 | @codequotebacktick on
|
|---|
| 3369 | @example
|
|---|
| 3370 | @group
|
|---|
| 3371 | $ echo abc | sed 's/[a]/x/'
|
|---|
| 3372 | xbc
|
|---|
| 3373 | $ echo abc | sed 's/\x5ba\x5d/x/'
|
|---|
| 3374 | xbc
|
|---|
| 3375 | @end group
|
|---|
| 3376 | @end example
|
|---|
| 3377 | @codequoteundirected off
|
|---|
| 3378 | @codequotebacktick off
|
|---|
| 3379 |
|
|---|
| 3380 | However, it is recommended to avoid such special characters
|
|---|
| 3381 | due to unexpected edge-cases. For example, the following
|
|---|
| 3382 | are not equivalent:
|
|---|
| 3383 |
|
|---|
| 3384 | @codequoteundirected on
|
|---|
| 3385 | @codequotebacktick on
|
|---|
| 3386 | @example
|
|---|
| 3387 | @group
|
|---|
| 3388 | $ echo 'a^c' | sed 's/\^/b/'
|
|---|
| 3389 | abc
|
|---|
| 3390 |
|
|---|
| 3391 | $ echo 'a^c' | sed 's/\\\x5e/b/'
|
|---|
| 3392 | a^c
|
|---|
| 3393 | @end group
|
|---|
| 3394 | @end example
|
|---|
| 3395 | @codequoteundirected off
|
|---|
| 3396 | @codequotebacktick off
|
|---|
| 3397 |
|
|---|
| 3398 | @c also: this fails in different places:
|
|---|
| 3399 | @c $ sed 's/[//'
|
|---|
| 3400 | @c sed: -e expression #1, char 5: unterminated `s' command
|
|---|
| 3401 | @c $ sed 's/\x5b//'
|
|---|
| 3402 | @c sed: -e expression #1, char 8: Invalid regular expression
|
|---|
| 3403 | @c
|
|---|
| 3404 | @c which is OK but confusing to explain why (the first
|
|---|
| 3405 | @c fails in compile.c:snarf_char_class while the second
|
|---|
| 3406 | @c is passed to the regex engine and then fails).
|
|---|
| 3407 |
|
|---|
| 3408 |
|
|---|
| 3409 | @node Locale Considerations
|
|---|
| 3410 | @section Multibyte characters and Locale Considerations
|
|---|
| 3411 |
|
|---|
| 3412 | @value{SSED} processes valid multibyte characters in multibyte locales
|
|---|
| 3413 | (e.g. @code{UTF-8}). @footnote{Some regexp edge-cases depend on the
|
|---|
| 3414 | operating system and libc implementation. The examples shown are known
|
|---|
| 3415 | to work as-expected on GNU/Linux systems using glibc.}
|
|---|
| 3416 |
|
|---|
| 3417 | @noindent The following example uses the Greek letter Capital Sigma
|
|---|
| 3418 | (@value{ucsigma},
|
|---|
| 3419 | Unicode code point @code{0x03A3}). In a @code{UTF-8} locale,
|
|---|
| 3420 | @command{sed} correctly processes the Sigma as one character despite
|
|---|
| 3421 | it being 2 octets (bytes):
|
|---|
| 3422 |
|
|---|
| 3423 | @codequoteundirected on
|
|---|
| 3424 | @codequotebacktick on
|
|---|
| 3425 | @example
|
|---|
| 3426 | @group
|
|---|
| 3427 | $ locale | grep LANG
|
|---|
| 3428 | LANG=en_US.UTF-8
|
|---|
| 3429 |
|
|---|
| 3430 | $ printf 'a\u03A3b'
|
|---|
| 3431 | a@value{ucsigma}b
|
|---|
| 3432 |
|
|---|
| 3433 | $ printf 'a\u03A3b' | sed 's/./X/g'
|
|---|
| 3434 | XXX
|
|---|
| 3435 |
|
|---|
| 3436 | $ printf 'a\u03A3b' | od -tx1 -An
|
|---|
| 3437 | 61 ce a3 62
|
|---|
| 3438 | @end group
|
|---|
| 3439 | @end example
|
|---|
| 3440 | @codequoteundirected off
|
|---|
| 3441 | @codequotebacktick off
|
|---|
| 3442 |
|
|---|
| 3443 | @noindent
|
|---|
| 3444 | To force @command{sed} to process octets separately, use the @code{C} locale
|
|---|
| 3445 | (also known as the @code{POSIX} locale):
|
|---|
| 3446 |
|
|---|
| 3447 | @codequoteundirected on
|
|---|
| 3448 | @codequotebacktick on
|
|---|
| 3449 | @example
|
|---|
| 3450 | $ printf 'a\u03A3b' | LC_ALL=C sed 's/./X/g'
|
|---|
| 3451 | XXXX
|
|---|
| 3452 | @end example
|
|---|
| 3453 | @codequoteundirected off
|
|---|
| 3454 | @codequotebacktick off
|
|---|
| 3455 |
|
|---|
| 3456 | @subsection Invalid multibyte characters
|
|---|
| 3457 |
|
|---|
| 3458 | @command{sed}'s regular expressions @emph{do not} match
|
|---|
| 3459 | invalid multibyte sequences in a multibyte locale.
|
|---|
| 3460 |
|
|---|
| 3461 | @noindent
|
|---|
| 3462 | In the following examples, the octet @code{0xCE} is
|
|---|
| 3463 | an incomplete multibyte character (shown here as @value{unicodeFFFD}).
|
|---|
| 3464 | The regular expression @samp{.} does not match it:
|
|---|
| 3465 |
|
|---|
| 3466 | @codequoteundirected on
|
|---|
| 3467 | @codequotebacktick on
|
|---|
| 3468 | @example
|
|---|
| 3469 | @group
|
|---|
| 3470 | $ printf 'a\xCEb\n'
|
|---|
| 3471 | a@value{unicodeFFFD}b
|
|---|
| 3472 |
|
|---|
| 3473 | $ printf 'a\xCEb\n' | sed 's/./X/g'
|
|---|
| 3474 | X@value{unicodeFFFD}X
|
|---|
| 3475 |
|
|---|
| 3476 | $ printf 'a\xCEc\n' | sed 's/./X/g' | od -tx1c -An
|
|---|
| 3477 | 58 ce 58 0a
|
|---|
| 3478 | X X \n
|
|---|
| 3479 | @end group
|
|---|
| 3480 | @end example
|
|---|
| 3481 | @codequoteundirected off
|
|---|
| 3482 | @codequotebacktick off
|
|---|
| 3483 |
|
|---|
| 3484 | @noindent Similarly, the ``catch-all'' regular expression @samp{.*} does not
|
|---|
| 3485 | match the entire line:
|
|---|
| 3486 |
|
|---|
| 3487 | @codequoteundirected on
|
|---|
| 3488 | @codequotebacktick on
|
|---|
| 3489 | @example
|
|---|
| 3490 | @group
|
|---|
| 3491 | $ printf 'a\xCEc\n' | sed 's/.*//' | od -tx1c -An
|
|---|
| 3492 | ce 63 0a
|
|---|
| 3493 | c \n
|
|---|
| 3494 | @end group
|
|---|
| 3495 | @end example
|
|---|
| 3496 | @codequoteundirected off
|
|---|
| 3497 | @codequotebacktick off
|
|---|
| 3498 |
|
|---|
| 3499 | @noindent
|
|---|
| 3500 | @value{SSED} offers the special @command{z} command to clear the
|
|---|
| 3501 | current pattern space regardless of invalid multibyte characters
|
|---|
| 3502 | (i.e. it works like @code{s/.*//} but also removes invalid multibyte
|
|---|
| 3503 | characters):
|
|---|
| 3504 |
|
|---|
| 3505 | @codequoteundirected on
|
|---|
| 3506 | @codequotebacktick on
|
|---|
| 3507 | @example
|
|---|
| 3508 | @group
|
|---|
| 3509 | $ printf 'a\xCEc\n' | sed 'z' | od -tx1c -An
|
|---|
| 3510 | 0a
|
|---|
| 3511 | \n
|
|---|
| 3512 | @end group
|
|---|
| 3513 | @end example
|
|---|
| 3514 | @codequoteundirected off
|
|---|
| 3515 | @codequotebacktick off
|
|---|
| 3516 |
|
|---|
| 3517 | @noindent Alternatively, force the @code{C} locale to process
|
|---|
| 3518 | each octet separately (every octet is a valid character in the @code{C}
|
|---|
| 3519 | locale):
|
|---|
| 3520 |
|
|---|
| 3521 | @codequoteundirected on
|
|---|
| 3522 | @codequotebacktick on
|
|---|
| 3523 | @example
|
|---|
| 3524 | @group
|
|---|
| 3525 | $ printf 'a\xCEc\n' | LC_ALL=C sed 's/.*//' | od -tx1c -An
|
|---|
| 3526 | 0a
|
|---|
| 3527 | \n
|
|---|
| 3528 | @end group
|
|---|
| 3529 | @end example
|
|---|
| 3530 | @codequoteundirected off
|
|---|
| 3531 | @codequotebacktick off
|
|---|
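
The per-octet behavior of the @code{C} locale can be sketched with the same input used above. This is only an illustration: it writes the octet as the octal escape @samp{\316} (the same value as @code{0xCE}), on the assumption that not every shell's @command{printf} accepts @samp{\x} escapes, and the variable name @code{result} is made up for the example.

```shell
# In the C locale every octet is a single character, so '.' also
# matches the stray 0xCE octet (written here in octal as \316).
result=$(printf 'a\316c\n' | LC_ALL=C sed 's/./X/g')
printf '%s\n' "$result"
```

All three octets should be replaced, so the pipeline is expected to print @samp{XXX}.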
| 3532 |
|
|---|
| 3533 |
|
|---|
| 3534 | @command{sed}'s inability to process invalid multibyte characters
|
|---|
| 3535 | can be used to detect such invalid sequences in a file.
|
|---|
| 3536 | In the following examples, the @code{\xCE\xCE} is an invalid
|
|---|
| 3537 | multibyte sequence, while @code{\xCE\xA3} is a valid multibyte sequence
|
|---|
| 3538 | (of the Greek Sigma character).
|
|---|
| 3539 |
|
|---|
| 3540 | @noindent
|
|---|
| 3541 | The following @command{sed} program removes all valid
|
|---|
| 3542 | characters using @code{s/.//g}. Any content left in the pattern space
|
|---|
| 3543 | (the invalid characters) is added to the hold space using the
|
|---|
| 3544 | @code{H} command. On the last line (@code{$}), the hold space is retrieved
|
|---|
| 3545 | (@code{x}), newlines are removed (@code{s/\n//g}), and any remaining
|
|---|
| 3546 | octets are printed unambiguously (@code{l}). Thus, any invalid
|
|---|
| 3547 | multibyte sequences are printed as octal values:
|
|---|
| 3548 |
|
|---|
| 3549 | @codequoteundirected on
|
|---|
| 3550 | @codequotebacktick on
|
|---|
| 3551 | @example
|
|---|
| 3552 | @group
|
|---|
| 3553 | $ printf 'ab\nc\n\xCE\xCEde\n\xCE\xA3f\n' > invalid.txt
|
|---|
| 3554 |
|
|---|
| 3555 | $ cat invalid.txt
|
|---|
| 3556 | ab
|
|---|
| 3557 | c
|
|---|
| 3558 | @value{unicodeFFFD}@value{unicodeFFFD}de
|
|---|
| 3559 | @value{ucsigma}f
|
|---|
| 3560 |
|
|---|
| 3561 | $ sed -n 's/.//g ; H ; $@{x;s/\n//g;l@}' invalid.txt
|
|---|
| 3562 | \316\316$
|
|---|
| 3563 | @end group
|
|---|
| 3564 | @end example
|
|---|
| 3565 | @codequoteundirected off
|
|---|
| 3566 | @codequotebacktick off
|
|---|
| 3567 |
|
|---|
| 3568 | @noindent With a few more commands, @command{sed} can print
|
|---|
| 3569 | the exact line number corresponding to each invalid character (line 3).
|
|---|
| 3570 | These characters can then be removed by forcing the @code{C} locale
|
|---|
| 3571 | and using octal escape sequences:
|
|---|
| 3572 |
|
|---|
| 3573 | @codequoteundirected on
|
|---|
| 3574 | @codequotebacktick on
|
|---|
| 3575 | @example
|
|---|
| 3576 | $ sed -n 's/.//g;=;l' invalid.txt | paste - - | awk '$2!="$"'
|
|---|
| 3577 | 3 \316\316$
|
|---|
| 3578 |
|
|---|
| 3579 | $ LC_ALL=C sed '3s/\o316\o316//' invalid.txt > fixed.txt
|
|---|
| 3580 | @end example
|
|---|
| 3581 | @codequoteundirected off
|
|---|
| 3582 | @codequotebacktick off
|
|---|
| 3583 |
|
|---|
| 3584 | @subsection Upper/Lower case conversion
|
|---|
| 3585 |
|
|---|
| 3586 |
|
|---|
| 3587 | @value{SSED}'s substitute command (@code{s}) supports upper/lower
|
|---|
| 3588 | case conversions using the @code{\U} and @code{\L} codes.
|
|---|
| 3589 | These conversions support multibyte characters:
|
|---|
| 3590 |
|
|---|
| 3591 | @codequoteundirected on
|
|---|
| 3592 | @codequotebacktick on
|
|---|
| 3593 | @example
|
|---|
| 3594 | $ printf 'ABC\u03a3\n'
|
|---|
| 3595 | ABC@value{ucsigma}
|
|---|
| 3596 |
|
|---|
| 3597 | $ printf 'ABC\u03a3\n' | sed 's/.*/\L&/'
|
|---|
| 3598 | abc@value{lcsigma}
|
|---|
| 3599 | @end example
|
|---|
| 3600 | @codequoteundirected off
|
|---|
| 3601 | @codequotebacktick off
|
|---|
| 3602 |
|
|---|
| 3603 | @noindent
|
|---|
| 3604 | @xref{The "s" Command}.
|
|---|
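
A minimal sketch of the case-conversion codes on plain ASCII input, assuming @value{SSED} (these codes are GNU extensions). @code{\U} converts until @code{\E}, while @code{\u} converts only the next character. The variable names @code{shout} and @code{title} are hypothetical.

```shell
# \U upper-cases until \E; \u upper-cases only the next character.
# Both are GNU extensions; the input is plain ASCII on purpose.
shout=$(echo 'hello world' | sed 's/^\([a-z]*\)/\U\1\E/')
title=$(echo 'hello world' | sed -E 's/[a-z]+/\u&/g')
printf '%s\n%s\n' "$shout" "$title"
```

The first command should print @samp{HELLO world} and the second @samp{Hello World}.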
| 3605 |
|
|---|
| 3606 |
|
|---|
| 3607 | @subsection Multibyte regexp character classes
|
|---|
| 3608 |
|
|---|
| 3609 | @c TODO: fix following paragraphs (copied verbatim from 'bracket
|
|---|
| 3610 | @c expression' section).
|
|---|
| 3611 |
|
|---|
| 3612 | In locales other than @samp{C}, the sorting sequence is not specified, and
|
|---|
| 3613 | @samp{[a-d]} might be equivalent to @samp{[abcd]} or to
|
|---|
| 3614 | @samp{[aBbCcDd]}, or it might fail to match any character, or the set of
|
|---|
| 3615 | characters that it matches might even be erratic.
|
|---|
| 3616 | To obtain the traditional interpretation
|
|---|
| 3617 | of bracket expressions, you can use the @samp{C} locale by setting the
|
|---|
| 3618 | @env{LC_ALL} environment variable to the value @samp{C}.
|
|---|
| 3619 |
|
|---|
| 3620 | @example
|
|---|
| 3621 | # TODO: is there any real-world system/locale where 'A'
|
|---|
| 3622 | # is replaced by '-' ?
|
|---|
| 3623 | $ echo A | sed 's/[a-z]/-/'
|
|---|
| 3624 | A
|
|---|
| 3625 | @end example
|
|---|
| 3626 |
|
|---|
| 3627 | The interpretation of named character classes depends on the @env{LC_CTYPE} locale;
|
|---|
| 3628 | for example, @samp{[[:alnum:]]} means the character class of numbers and letters
|
|---|
| 3629 | in the current locale.
|
|---|
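
As a sketch of the traditional interpretation, the commands below force the @samp{C} locale, where a range such as @samp{[a-d]} covers only the four lowercase letters and @samp{[[:upper:]]} covers only @samp{A} through @samp{Z}. The variable names are illustrative only.

```shell
# With LC_ALL=C the range [a-d] is exactly the four lowercase
# letters, and [[:upper:]] is exactly A-Z.
no_range=$(echo 'aBcD' | LC_ALL=C sed 's/[a-d]//g')
no_upper=$(echo 'aBcD' | LC_ALL=C sed 's/[[:upper:]]//g')
printf '%s\n%s\n' "$no_range" "$no_upper"
```

The expected output is @samp{BD} followed by @samp{ac}. In other locales the first result may differ.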
| 3630 |
|
|---|
| 3631 | @c TODO: show example of collation
|
|---|
| 3632 |
|
|---|
| 3633 | @codequoteundirected on
|
|---|
| 3634 | @codequotebacktick on
|
|---|
| 3635 | @example
|
|---|
| 3636 | # TODO: this works on glibc systems, not on musl-libc/freebsd/macosx.
|
|---|
| 3637 | $ printf 'cliché\n' | LC_ALL=fr_FR.utf8 sed 's/[[=e=]]/X/g'
|
|---|
| 3638 | clichX
|
|---|
| 3639 | @end example
|
|---|
| 3640 | @codequoteundirected off
|
|---|
| 3641 | @codequotebacktick off
|
|---|
| 3642 |
|
|---|
| 3643 |
|
|---|
| 3644 | @node advanced sed
|
|---|
| 3645 | @chapter Advanced @command{sed}: cycles and buffers
|
|---|
| 3646 |
|
|---|
| 3647 | @menu
|
|---|
| 3648 | * Execution Cycle:: How @command{sed} works
|
|---|
| 3649 | * Hold and Pattern Buffers::
|
|---|
| 3650 | * Multiline techniques:: Using D,G,H,N,P to process multiple lines
|
|---|
| 3651 | * Branching and flow control::
|
|---|
| 3652 | @end menu
|
|---|
| 3653 |
|
|---|
| 3654 | @node Execution Cycle
|
|---|
| 3655 | @section How @command{sed} Works
|
|---|
| 3656 |
|
|---|
| 3657 | @cindex Buffer spaces, pattern and hold
|
|---|
| 3658 | @cindex Spaces, pattern and hold
|
|---|
| 3659 | @cindex Pattern space, definition
|
|---|
| 3660 | @cindex Hold space, definition
|
|---|
| 3661 | @command{sed} maintains two data buffers: the active @emph{pattern} space,
|
|---|
| 3662 | and the auxiliary @emph{hold} space. Both are initially empty.
|
|---|
| 3663 |
|
|---|
| 3664 | @command{sed} operates by performing the following cycle on each
|
|---|
| 3665 | line of input: first, @command{sed} reads one line from the input
|
|---|
| 3666 | stream, removes any trailing newline, and places it in the pattern space.
|
|---|
| 3667 | Then commands are executed; each command can have an address associated
|
|---|
| 3668 | with it: addresses are a kind of condition code, and a command is only
|
|---|
| 3669 | executed if the condition is verified before the command is to be
|
|---|
| 3670 | executed.
|
|---|
| 3671 |
|
|---|
| 3672 | When the end of the script is reached, unless the @option{-n} option
|
|---|
| 3673 | is in use, the contents of pattern space are printed out to the output
|
|---|
| 3674 | stream, adding back the trailing newline if it was removed.@footnote{Actually,
|
|---|
| 3675 | if @command{sed} prints a line without the terminating newline, it will
|
|---|
| 3676 | nevertheless print the missing newline as soon as more text is sent to
|
|---|
| 3677 | the same output stream, which gives the ``least expected surprise''
|
|---|
| 3678 | even though it does not make commands like @samp{sed -n p} exactly
|
|---|
| 3679 | identical to @command{cat}.} Then the next cycle starts for the next
|
|---|
| 3680 | input line.
|
|---|
| 3681 |
|
|---|
| 3682 | Unless special commands (like @samp{D}) are used, the pattern space is
|
|---|
| 3683 | deleted between two cycles. The hold space, on the other hand, keeps
|
|---|
| 3684 | its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
|
|---|
| 3685 | @samp{g}, @samp{G} to move data between both buffers).
|
|---|
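
The difference between the two buffers can be sketched with a short pipeline, assuming GNU @command{sed} and @command{seq}. The pattern space is replaced on every cycle, while the hold space accumulates all lines until the last one is reached. The variable name @code{joined} is hypothetical.

```shell
# h copies the first line into the hold space; H appends every later
# line to it. On the last line, x swaps the buffers, the embedded
# newlines become commas, and p prints the result.
joined=$(seq 3 | sed -n '1h;1!H;${x;s/\n/,/g;p}')
printf '%s\n' "$joined"
```

This should print @samp{1,2,3}. Nothing is printed on earlier cycles because @option{-n} suppresses the automatic output.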
| 3686 |
|
|---|
| 3687 | @node Hold and Pattern Buffers
|
|---|
| 3688 | @section Hold and Pattern Buffers
|
|---|
| 3689 |
|
|---|
| 3690 | TODO
|
|---|
| 3691 |
|
|---|
| 3692 | @node Multiline techniques
|
|---|
| 3693 | @section Multiline techniques - using D,G,H,N,P to process multiple lines
|
|---|
| 3694 |
|
|---|
| 3695 | Multiple lines can be processed as one buffer using the
|
|---|
| 3696 | @code{D}, @code{G}, @code{H}, @code{N}, and @code{P} commands. They are similar to
|
|---|
| 3697 | their lowercase counterparts (@code{d}, @code{g},
|
|---|
| 3698 | @code{h}, @code{n}, @code{p}), except that these commands append or
|
|---|
| 3699 | subtract data while respecting embedded newlines, which allows adding and
|
|---|
| 3700 | removing lines from the pattern and hold spaces.
|
|---|
| 3701 |
|
|---|
| 3702 | They operate as follows:
|
|---|
| 3703 | @table @code
|
|---|
| 3704 | @item D
|
|---|
| 3705 | @emph{deletes} text from the pattern space up to the first newline,
|
|---|
| 3706 | and restarts the cycle.
|
|---|
| 3707 |
|
|---|
| 3708 | @item G
|
|---|
| 3709 | @emph{appends} the contents of the hold space to the pattern space, with a
|
|---|
| 3710 | newline before it.
|
|---|
| 3711 |
|
|---|
| 3712 | @item H
|
|---|
| 3713 | @emph{appends} the contents of the pattern space to the hold space, with a
|
|---|
| 3714 | newline before it.
|
|---|
| 3715 |
|
|---|
| 3716 | @item N
|
|---|
| 3717 | @emph{appends} a newline and the next input line to the pattern space.
|
|---|
| 3718 |
|
|---|
| 3719 | @item P
|
|---|
| 3720 | @emph{prints} text from the pattern space up to the first newline.
|
|---|
| 3721 |
|
|---|
| 3722 | @end table
|
|---|
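
Two of the commands in the table can be tried in isolation. The following is a small sketch, assuming GNU @command{sed}. The variable names @code{doubled} and @code{first} are made up for the example.

```shell
# G appends a newline plus the (empty) hold space, double-spacing the input.
doubled=$(seq 3 | sed G)
# N joins both lines in the pattern space; P prints only the part
# before the first newline.
first=$(printf 'one\ntwo\n' | sed -n 'N;P')
printf '%s\n---\n%s\n' "$doubled" "$first"
```

The first command should print each number followed by an empty line. The second should print only @samp{one}.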
| 3723 |
|
|---|
| 3724 |
|
|---|
| 3725 | The following example illustrates the operation of @code{N} and
|
|---|
| 3726 | @code{D} commands:
|
|---|
| 3727 |
|
|---|
| 3728 | @codequoteundirected on
|
|---|
| 3729 | @codequotebacktick on
|
|---|
| 3730 | @example
|
|---|
| 3731 | @group
|
|---|
| 3732 | $ seq 6 | sed -n 'N;l;D'
|
|---|
| 3733 | 1\n2$
|
|---|
| 3734 | 2\n3$
|
|---|
| 3735 | 3\n4$
|
|---|
| 3736 | 4\n5$
|
|---|
| 3737 | 5\n6$
|
|---|
| 3738 | @end group
|
|---|
| 3739 | @end example
|
|---|
| 3740 | @codequoteundirected off
|
|---|
| 3741 | @codequotebacktick off
|
|---|
| 3742 |
|
|---|
| 3743 | @enumerate
|
|---|
| 3744 | @item
|
|---|
| 3745 | @command{sed} starts by reading the first line into the pattern space
|
|---|
| 3746 | (i.e. @samp{1}).
|
|---|
| 3747 | @item
|
|---|
| 3748 | At the beginning of every cycle, the @code{N}
|
|---|
| 3749 | command appends a newline and the next line to the pattern space
|
|---|
| 3750 | (i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle).
|
|---|
| 3751 | @item
|
|---|
| 3752 | The @code{l} command prints the content of the pattern space
|
|---|
| 3753 | unambiguously.
|
|---|
| 3754 | @item
|
|---|
| 3755 | The @code{D} command then removes the content of pattern
|
|---|
| 3756 | space up to the first newline (leaving @samp{2} at the end of
|
|---|
| 3757 | the first cycle).
|
|---|
| 3758 | @item
|
|---|
| 3759 | At the next cycle the @code{N} command appends a
|
|---|
| 3760 | newline and the next input line to the pattern space
|
|---|
| 3761 | (e.g. @samp{2}, @samp{\n}, @samp{3}).
|
|---|
| 3762 | @end enumerate
|
|---|
| 3763 |
|
|---|
| 3764 |
|
|---|
| 3765 | @cindex processing paragraphs
|
|---|
| 3766 | @cindex paragraphs, processing
|
|---|
| 3767 | A common technique to process blocks of text such as paragraphs
|
|---|
| 3768 | (instead of line-by-line) is using the following construct:
|
|---|
| 3769 |
|
|---|
| 3770 | @codequoteundirected on
|
|---|
| 3771 | @codequotebacktick on
|
|---|
| 3772 | @example
|
|---|
| 3773 | sed '/./@{H;$!d@} ; x ; s/REGEXP/REPLACEMENT/'
|
|---|
| 3774 | @end example
|
|---|
| 3775 | @codequoteundirected off
|
|---|
| 3776 | @codequotebacktick off
|
|---|
| 3777 |
|
|---|
| 3778 | @enumerate
|
|---|
| 3779 | @item
|
|---|
| 3780 | The first expression, @code{/./@{H;$!d@}} operates on all non-empty lines,
|
|---|
| 3781 | and adds the current line (in the pattern space) to the hold space.
|
|---|
| 3782 | On all lines except the last, the pattern space is deleted and the cycle is
|
|---|
| 3783 | restarted.
|
|---|
| 3784 |
|
|---|
| 3785 | @item
|
|---|
| 3786 | The other expressions @code{x} and @code{s} are executed only on empty
|
|---|
| 3787 | lines (i.e. paragraph separators) and on the last line. The @code{x} command fetches the
|
|---|
| 3788 | accumulated lines from the hold space back to the pattern space. The
|
|---|
| 3789 | @code{s///} command then operates on all the text in the paragraph
|
|---|
| 3790 | (including the embedded newlines).
|
|---|
| 3791 | @end enumerate
|
|---|
| 3792 |
|
|---|
| 3793 | The following example demonstrates this technique:
|
|---|
| 3794 | @codequoteundirected on
|
|---|
| 3795 | @codequotebacktick on
|
|---|
| 3796 | @example
|
|---|
| 3797 | @group
|
|---|
| 3798 | $ cat input.txt
|
|---|
| 3799 | a a a aa aaa
|
|---|
| 3800 | aaaa aaaa aa
|
|---|
| 3801 | aaaa aaa aaa
|
|---|
| 3802 |
|
|---|
| 3803 | bbbb bbb bbb
|
|---|
| 3804 | bb bb bbb bb
|
|---|
| 3805 | bbbbbbbb bbb
|
|---|
| 3806 |
|
|---|
| 3807 | ccc ccc cccc
|
|---|
| 3808 | cccc ccccc c
|
|---|
| 3809 | cc cc cc cc
|
|---|
| 3810 |
|
|---|
| 3811 | $ sed '/./@{H;$!d@} ; x ; s/^/\nSTART-->/ ; s/$/\n<--END/' input.txt
|
|---|
| 3812 |
|
|---|
| 3813 | START-->
|
|---|
| 3814 | a a a aa aaa
|
|---|
| 3815 | aaaa aaaa aa
|
|---|
| 3816 | aaaa aaa aaa
|
|---|
| 3817 | <--END
|
|---|
| 3818 |
|
|---|
| 3819 | START-->
|
|---|
| 3820 | bbbb bbb bbb
|
|---|
| 3821 | bb bb bbb bb
|
|---|
| 3822 | bbbbbbbb bbb
|
|---|
| 3823 | <--END
|
|---|
| 3824 |
|
|---|
| 3825 | START-->
|
|---|
| 3826 | ccc ccc cccc
|
|---|
| 3827 | cccc ccccc c
|
|---|
| 3828 | cc cc cc cc
|
|---|
| 3829 | <--END
|
|---|
| 3830 | @end group
|
|---|
| 3831 | @end example
|
|---|
| 3832 | @codequoteundirected off
|
|---|
| 3833 | @codequotebacktick off
|
|---|
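
The same construct can select whole paragraphs instead of editing them. The sketch below, which assumes GNU @command{sed}, prints only the paragraphs that contain the word @samp{foo}. The sample input and the variable name @code{match} are hypothetical.

```shell
# Collect each paragraph in the hold space; at a blank line (or at the
# last line) swap it back and print it only if it contains 'foo'.
# s/^\n// drops the newline that H put in front of the first line.
match=$(printf 'a1\na2\n\nb1 foo\nb2\n' |
  sed -n '/./{H;$!d}; x; /foo/{s/^\n//;p}')
printf '%s\n' "$match"
```

Only the second paragraph (@samp{b1 foo} and @samp{b2}) should be printed.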
| 3834 |
|
|---|
| 3835 | For more annotated examples, see @ref{Text search across multiple lines}
|
|---|
| 3836 | and @ref{Line length adjustment}.
|
|---|
| 3837 |
|
|---|
| 3838 | @node Branching and flow control
|
|---|
| 3839 | @section Branching and Flow Control
|
|---|
| 3840 |
|
|---|
| 3841 | The branching commands @code{b}, @code{t}, and @code{T} enable
|
|---|
| 3842 | changing the flow of @command{sed} programs.
|
|---|
| 3843 |
|
|---|
| 3844 | By default, @command{sed} reads an input line into the pattern buffer,
|
|---|
| 3845 | then processes all commands in order.
|
|---|
| 3846 | Commands without addresses affect all lines.
|
|---|
| 3847 | Commands with addresses affect only matching lines.
|
|---|
| 3848 | @xref{Execution Cycle} and @ref{Addresses overview}.
|
|---|
| 3849 |
|
|---|
| 3850 | @command{sed} does not support a typical @code{if/then} construct.
|
|---|
| 3851 | Instead, some commands can be used as conditionals or to change the
|
|---|
| 3852 | default flow control:
|
|---|
| 3853 |
|
|---|
| 3854 | @table @code
|
|---|
| 3855 |
|
|---|
| 3856 | @item d
|
|---|
| 3857 | delete (clear) the current pattern space,
|
|---|
| 3858 | and restart the program cycle without processing the rest of the commands
|
|---|
| 3859 | and without printing the pattern space.
|
|---|
| 3860 |
|
|---|
| 3861 | @item D
|
|---|
| 3862 | delete the contents of the pattern space @emph{up to the first newline},
|
|---|
| 3863 | and restart the program cycle without processing the rest of
|
|---|
| 3864 | the commands and without printing the pattern space.
|
|---|
| 3865 |
|
|---|
| 3866 | @item [addr]X
|
|---|
| 3867 | @itemx [addr]@{ X ; X ; X @}
|
|---|
| 3868 | @itemx /regexp/X
|
|---|
| 3869 | @itemx /regexp/@{ X ; X ; X @}
|
|---|
| 3870 | Addresses and regular expressions can be used as an @code{if/then}
|
|---|
| 3871 | conditional: If @var{[addr]} matches the current pattern space,
|
|---|
| 3872 | execute the command(s).
|
|---|
| 3873 | For example: The command @code{/^#/d} means:
|
|---|
| 3874 | @emph{if} the current pattern space matches the regular expression @code{^#} (a line
|
|---|
| 3875 | starting with a hash), @emph{then} execute the @code{d} command:
|
|---|
| 3876 | delete the line without printing it, and restart the program cycle
|
|---|
| 3877 | immediately.
|
|---|
| 3878 |
|
|---|
| 3879 | @item b
|
|---|
| 3880 | branch unconditionally (that is: always jump to a label, skipping
|
|---|
| 3881 | or repeating other commands, without restarting a new cycle). Combined
|
|---|
| 3882 | with an address, the branch can be conditionally executed on matched
|
|---|
| 3883 | lines.
|
|---|
| 3884 |
|
|---|
| 3885 | @item t
|
|---|
| 3886 | branch conditionally (that is: jump to a label) @emph{only if} a
|
|---|
| 3887 | @code{s///} command has succeeded since the last input line was read
|
|---|
| 3888 | or another conditional branch was taken.
|
|---|
| 3889 |
|
|---|
| 3890 | @item T
|
|---|
| 3891 | similar but opposite to the @code{t} command: branch only if
|
|---|
| 3892 | there have been @emph{no} successful substitutions since the last
|
|---|
| 3893 | input line was read.
|
|---|
| 3894 | @end table
|
|---|
| 3895 |
|
|---|
| 3896 |
|
|---|
| 3897 | The following two @command{sed} programs are equivalent. The first
|
|---|
| 3898 | (contrived) example uses the @code{b} command to skip the @code{s///}
|
|---|
| 3899 | command on lines containing @samp{1}. The second example uses an
|
|---|
| 3900 | address with negation (@samp{!}) to perform substitution only on
|
|---|
| 3901 | desired lines. The @code{y///} command is still executed on all
|
|---|
| 3902 | lines:
|
|---|
| 3903 |
|
|---|
| 3904 | @codequoteundirected on
|
|---|
| 3905 | @codequotebacktick on
|
|---|
| 3906 | @example
|
|---|
| 3907 | @group
|
|---|
| 3908 | $ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
|
|---|
| 3909 | a4
|
|---|
| 3910 | z5
|
|---|
| 3911 | z6
|
|---|
| 3912 |
|
|---|
| 3913 | $ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
|
|---|
| 3914 | a4
|
|---|
| 3915 | z5
|
|---|
| 3916 | z6
|
|---|
| 3917 | @end group
|
|---|
| 3918 | @end example
|
|---|
| 3919 | @codequoteundirected off
|
|---|
| 3920 | @codequotebacktick off
|
|---|
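
The conditional branches @code{t} and @code{T} can be sketched in the same spirit. This is an illustration assuming GNU @command{sed}, not a canonical idiom. The label @samp{a} and the variable names are arbitrary.

```shell
# t loop: keep replacing two spaces with one until s/// stops matching.
squeezed=$(echo 'a    b' | sed ':a;s/  / /;ta')
# T (a GNU extension) without a label skips to the end of the script
# when the preceding s/// did not match, so only changed lines get tagged.
tagged=$(printf '%s\n' a1 b2 | sed -e 's/a/X/' -e 'T' -e 's/$/ (changed)/')
printf '%s\n%s\n' "$squeezed" "$tagged"
```

The first command should print @samp{a b}. The second should print @samp{X1 (changed)} followed by an unchanged @samp{b2}.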
| 3921 |
|
|---|
| 3922 |
|
|---|
| 3923 |
|
|---|
| 3924 | @subsection Branching and Cycles
|
|---|
| 3925 | @cindex labels
|
|---|
| 3926 | @cindex omitting labels
|
|---|
| 3927 | @cindex cycle, restarting
|
|---|
| 3928 | @cindex restarting a cycle
|
|---|
| 3929 | The @code{b}, @code{t}, and @code{T} commands can be followed by a label
|
|---|
| 3930 | (typically a single letter). Labels are defined with a colon followed by
|
|---|
| 3931 | one or more letters (e.g. @samp{:x}). If the label is omitted, the
|
|---|
| 3932 | branch commands restart the cycle. Note the difference between
|
|---|
| 3933 | branching to a label and restarting the cycle: when a cycle is
|
|---|
| 3934 | restarted, @command{sed} first prints the current content of the
|
|---|
| 3935 | pattern space, then reads the next input line into the pattern space.
|
|---|
| 3936 | Jumping to a label (even if it is at the beginning of the program)
|
|---|
| 3937 | does not print the pattern space and does not read the next input line.
|
|---|
| 3938 |
|
|---|
| 3939 | The following program is a no-op. The @code{b} command (the only command
|
|---|
| 3940 | in the program) does not have a label, and thus simply restarts the cycle.
|
|---|
| 3941 | On each cycle, the pattern space is printed and the next input line is read:
|
|---|
| 3942 |
|
|---|
| 3943 | @example
|
|---|
| 3944 | @group
|
|---|
| 3945 | $ seq 3 | sed b
|
|---|
| 3946 | 1
|
|---|
| 3947 | 2
|
|---|
| 3948 | 3
|
|---|
| 3949 | @end group
|
|---|
| 3950 | @end example
|
|---|
| 3951 |
|
|---|
| 3952 | @cindex infinite loop, branching
|
|---|
| 3953 | @cindex branching, infinite loop
|
|---|
| 3954 | The following example is an infinite loop: it doesn't terminate and
|
|---|
| 3955 | doesn't print anything. The @code{b} command jumps to the @samp{x}
|
|---|
| 3956 | label, and a new cycle is never started:
|
|---|
| 3957 |
|
|---|
| 3958 | @codequoteundirected on
|
|---|
| 3959 | @codequotebacktick on
|
|---|
| 3960 | @example
|
|---|
| 3961 | @group
|
|---|
| 3962 | $ seq 3 | sed ':x ; bx'
|
|---|
| 3963 |
|
|---|
| 3964 | # The above command requires GNU sed (which supports additional
|
|---|
| 3965 | # commands following a label, without a newline). A portable equivalent:
|
|---|
| 3966 | # sed -e ':x' -e bx
|
|---|
| 3967 | @end group
|
|---|
| 3968 | @end example
|
|---|
| 3969 | @codequoteundirected off
|
|---|
| 3970 | @codequotebacktick off
|
|---|
| 3971 |
|
|---|
| 3972 | @cindex branching and n, N
|
|---|
| 3973 | @cindex n, and branching
|
|---|
| 3974 | @cindex N, and branching
|
|---|
| 3975 | Branching is often complemented with the @code{n} or @code{N} commands:
|
|---|
| 3976 | both commands read the next input line into the pattern space without waiting
|
|---|
| 3977 | for the cycle to restart. Before reading the next input line, @code{n}
|
|---|
| 3978 | prints the current pattern space then empties it, while @code{N}
|
|---|
| 3979 | appends a newline and the next input line to the pattern space.
|
|---|
| 3980 |
|
|---|
| 3981 | Consider the following two examples:
|
|---|
| 3982 |
|
|---|
| 3983 | @codequoteundirected on
|
|---|
| 3984 | @codequotebacktick on
|
|---|
| 3985 | @example
|
|---|
| 3986 | @group
|
|---|
| 3987 | $ seq 3 | sed ':x ; n ; bx'
|
|---|
| 3988 | 1
|
|---|
| 3989 | 2
|
|---|
| 3990 | 3
|
|---|
| 3991 |
|
|---|
| 3992 | $ seq 3 | sed ':x ; N ; bx'
|
|---|
| 3993 | 1
|
|---|
| 3994 | 2
|
|---|
| 3995 | 3
|
|---|
| 3996 | @end group
|
|---|
| 3997 | @end example
|
|---|
| 3998 | @codequoteundirected off
|
|---|
| 3999 | @codequotebacktick off
|
|---|
| 4000 |
|
|---|
| 4001 | @itemize
|
|---|
| 4002 | @item
|
|---|
| 4003 | Neither example loops forever, despite never starting a new cycle.
|
|---|
| 4004 |
|
|---|
| 4005 | @item
|
|---|
| 4006 | In the first example, the @code{n} command first prints the content
|
|---|
| 4007 | of the pattern space, empties the pattern space then reads the next
|
|---|
| 4008 | input line.
|
|---|
| 4009 |
|
|---|
| 4010 | @item
|
|---|
| 4011 | In the second example, the @code{N} command appends the next input
|
|---|
| 4012 | line to the pattern space (with a newline). Lines are accumulated in
|
|---|
| 4013 | the pattern space until there are no more input lines to read, then
|
|---|
| 4014 | the @code{N} command terminates the @command{sed} program. When the
|
|---|
| 4015 | program terminates, the end-of-cycle actions are performed, and the
|
|---|
| 4016 | entire pattern space is printed.
|
|---|
| 4017 |
|
|---|
| 4018 | @item
|
|---|
| 4019 | The second example requires @value{SSED},
|
|---|
| 4020 | because it uses the non-POSIX-standard behavior of @code{N}.
|
|---|
| 4021 | See the ``@code{N} command on the last line'' paragraph
|
|---|
| 4022 | in @ref{Reporting Bugs}.
|
|---|
| 4023 |
|
|---|
| 4024 | @item
|
|---|
| 4025 | To further examine the difference between the two examples,
|
|---|
| 4026 | try the following commands:
|
|---|
| 4027 | @codequoteundirected on
|
|---|
| 4028 | @codequotebacktick on
|
|---|
| 4029 | @example
|
|---|
| 4030 | @group
|
|---|
| 4031 | printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
|
|---|
| 4032 | printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
|
|---|
| 4033 | printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
|
|---|
| 4034 | printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
|
|---|
| 4035 | @end group
|
|---|
| 4036 | @end example
|
|---|
| 4037 | @codequoteundirected off
|
|---|
| 4038 | @codequotebacktick off
|
|---|
| 4039 |
|
|---|
| 4040 | @end itemize
|
|---|
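
A common application of looping over @code{N} is to gather the whole input in the pattern space before editing it. The following sketch splits the program into @option{-e} fragments so that it should not depend on GNU-specific label parsing. The label @samp{a} and the variable name @code{csv} are arbitrary choices.

```shell
# :a marks the loop start, N appends the next line, and '$!ba' loops
# back on every line except the last. Once all input is in the pattern
# space, a single s///g rewrites the embedded newlines.
csv=$(seq 4 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')
printf '%s\n' "$csv"
```

This should print @samp{1,2,3,4}. Because the entire input is held in memory, the approach is best kept for small inputs.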
| 4041 |
|
|---|
| 4042 |
|
|---|
| 4043 |
|
|---|
| 4044 | @subsection Branching example: joining lines
|
|---|
| 4045 |
|
|---|
| 4046 | @cindex joining lines with branching
|
|---|
| 4047 | @cindex branching, joining lines
|
|---|
| 4048 | @cindex quoted-printable lines, joining
|
|---|
| 4049 | @cindex joining quoted-printable lines
|
|---|
| 4050 | @cindex t, joining lines with
|
|---|
| 4051 | @cindex b, joining lines with
|
|---|
| 4052 | @cindex b, versus t
|
|---|
| 4053 | @cindex t, versus b
|
|---|
| 4054 | As a real-world example of using branching, consider the case of
|
|---|
| 4055 | @uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files,
|
|---|
| 4056 | typically used to encode email messages.
|
|---|
| 4057 | In these files long lines are split and marked with a @dfn{soft line break}
|
|---|
| 4058 | consisting of a single @samp{=} character at the end of the line:
|
|---|
| 4059 |
|
|---|
| 4060 | @example
|
|---|
| 4061 | @group
|
|---|
| 4062 | $ cat jaques.txt
|
|---|
| 4063 | All the wor=
|
|---|
| 4064 | ld's a stag=
|
|---|
| 4065 | e,
|
|---|
| 4066 | And all the=
|
|---|
| 4067 | men and wo=
|
|---|
| 4068 | men merely =
|
|---|
| 4069 | players:
|
|---|
| 4070 | They have t=
|
|---|
| 4071 | heir exits =
|
|---|
| 4072 | and their e=
|
|---|
| 4073 | ntrances;
|
|---|
| 4074 | And one man=
|
|---|
| 4075 | in his tim=
|
|---|
| 4076 | e plays man=
|
|---|
| 4077 | y parts.
|
|---|
| 4078 | @end group
|
|---|
| 4079 | @end example
|
|---|
| 4080 |
|
|---|
| 4081 |
|
|---|
| 4082 | The following program uses an address match @samp{/=$/} as a
|
|---|
| 4083 | conditional: If the current pattern space ends with a @samp{=}, it
|
|---|
| 4084 | reads the next input line using @code{N}, removes every @samp{=}
|
|---|
| 4085 | character that is followed by a newline (along with that newline), and unconditionally
|
|---|
| 4086 | branches (@code{b}) to the beginning of the program without restarting
|
|---|
| 4087 | a new cycle. If the pattern space does not end with @samp{=}, the
|
|---|
| 4088 | default action is performed: the pattern space is printed and a new
|
|---|
| 4089 | cycle is started:
|
|---|
| 4090 |
|
|---|
| 4091 | @codequoteundirected on
|
|---|
| 4092 | @codequotebacktick on
|
|---|
| 4093 | @example
|
|---|
| 4094 | @group
|
|---|
| 4095 | $ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt
|
|---|
| 4096 | All the world's a stage,
|
|---|
| 4097 | And all the men and women merely players:
|
|---|
| 4098 | They have their exits and their entrances;
|
|---|
| 4099 | And one man in his time plays many parts.
|
|---|
| 4100 | @end group
|
|---|
| 4101 | @end example
|
|---|
| 4102 | @codequoteundirected off
|
|---|
| 4103 | @codequotebacktick off
|
|---|
| 4104 |
|
|---|
| 4105 | Here's an alternative program with a slightly different approach: On
|
|---|
| 4106 | all lines except the last, @code{N} appends the next input line to the pattern
|
|---|
| 4107 | space. A substitution command then removes soft line breaks
|
|---|
| 4108 | (@samp{=} at the end of a line, i.e. followed by a newline) by replacing
|
|---|
| 4109 | them with an empty string.
|
|---|
| 4110 | @emph{If} the substitution was successful (meaning the pattern space contained
|
|---|
| 4111 | a line which should be joined), the conditional branch command @code{t} jumps
|
|---|
| 4112 | to the beginning of the program without completing or restarting the cycle.
|
|---|
| 4113 | If the substitution failed (meaning there were no soft line breaks),
|
|---|
| 4114 | the @code{t} command will @emph{not} branch. Then, @code{P} will
|
|---|
| 4115 | print the pattern space content until the first newline, and @code{D}
|
|---|
| 4116 | will delete the pattern space content until the first newline.
|
|---|
| 4117 | (To learn more about the @code{N}, @code{P} and @code{D} commands,
|
|---|
| 4118 | @pxref{Multiline techniques}.)
|
|---|
| 4119 |
|
|---|
| 4120 |
|
|---|
| 4121 | @codequoteundirected on
|
|---|
| 4122 | @codequotebacktick on
|
|---|
| 4123 | @example
|
|---|
| 4124 | @group
|
|---|
| 4125 | $ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
|
|---|
| 4126 | All the world's a stage,
|
|---|
| 4127 | And all the men and women merely players:
|
|---|
| 4128 | They have their exits and their entrances;
|
|---|
| 4129 | And one man in his time plays many parts.
|
|---|
| 4130 | @end group
|
|---|
| 4131 | @end example
|
|---|
| 4132 | @codequoteundirected off
|
|---|
| 4133 | @codequotebacktick off
|
|---|
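
To check that the @code{b} and @code{t} variants behave alike, both can be run against a shortened stand-in for @file{jaques.txt}. This is a sketch assuming GNU @command{sed} and @command{mktemp}. The temporary file and the variable names are hypothetical.

```shell
# A shortened stand-in for jaques.txt (hypothetical temporary file).
qp=$(mktemp)
printf '%s\n' 'All the wor=' "ld's a stag=" 'e,' 'And all the=' ' men' > "$qp"

with_b=$(sed ':x ; /=$/ { N ; s/=\n//g ; bx }' "$qp")
with_t=$(sed ':x ; $!N ; s/=\n// ; tx ; P ; D' "$qp")
rm -f "$qp"

printf '%s\n' "$with_b"
[ "$with_b" = "$with_t" ] && echo 'both programs agree'
```

Both programs should produce the two joined lines. They differ only in how the loop is driven: the first tests the address @samp{/=$/} before reading more input, the second always reads ahead and lets @code{t} decide.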
| 4134 |
|
|---|
| 4135 |
|
|---|
| 4136 | For more line-joining examples, see @ref{Joining lines}.
|
|---|
| 4137 |
|
|---|
| 4138 |
|
|---|
| 4139 | @node Examples
|
|---|
| 4140 | @chapter Some Sample Scripts
|
|---|
| 4141 |
|
|---|
| 4142 | Here are some @command{sed} scripts to guide you in the art of mastering
|
|---|
| 4143 | @command{sed}.
|
|---|
| 4144 |
|
|---|
| 4145 | @menu
|
|---|
| 4146 |
|
|---|
| 4147 | Useful one-liners:
|
|---|
| 4148 | * Joining lines::
|
|---|
| 4149 |
|
|---|
| 4150 | Some exotic examples:
|
|---|
| 4151 | * Centering lines::
|
|---|
| 4152 | * Increment a number::
|
|---|
| 4153 | * Rename files to lower case::
|
|---|
| 4154 | * Print bash environment::
|
|---|
| 4155 | * Reverse chars of lines::
|
|---|
| 4156 | * Text search across multiple lines::
|
|---|
| 4157 | * Line length adjustment::
|
|---|
| 4158 | * Adding a header to multiple files::
|
|---|
| 4159 |
|
|---|
| 4160 | Emulating standard utilities:
|
|---|
| 4161 | * tac:: Reverse lines of files
|
|---|
| 4162 | * cat -n:: Numbering lines
|
|---|
| 4163 | * cat -b:: Numbering non-blank lines
|
|---|
| 4164 | * wc -c:: Counting chars
|
|---|
| 4165 | * wc -w:: Counting words
|
|---|
| 4166 | * wc -l:: Counting lines
|
|---|
| 4167 | * head:: Printing the first lines
|
|---|
| 4168 | * tail:: Printing the last lines
|
|---|
| 4169 | * uniq:: Make duplicate lines unique
|
|---|
| 4170 | * uniq -d:: Print duplicated lines of input
|
|---|
| 4171 | * uniq -u:: Remove all duplicated lines
|
|---|
| 4172 | * cat -s:: Squeezing blank lines
|
|---|
| 4173 | @end menu
|
|---|
| 4174 |
|
|---|
| 4175 | @node Joining lines
|
|---|
| 4176 | @section Joining lines
|
|---|
| 4177 |
|
|---|
| 4178 | This section uses @code{N}, @code{D} and @code{P} commands to process
|
|---|
| 4179 | multiple lines, and the @code{b} and @code{t} commands for branching.
|
|---|
| 4180 | @xref{Multiline techniques} and @ref{Branching and flow control}.
|
|---|
| 4181 |
|
|---|
| 4182 | Join specific lines (e.g. if lines 2 and 3 need to be joined):
|
|---|
| 4183 |
|
|---|
| 4184 | @codequoteundirected on
|
|---|
| 4185 | @codequotebacktick on
|
|---|
| 4186 | @example
|
|---|
| 4187 | $ cat lines.txt
|
|---|
| 4188 | hello
|
|---|
| 4189 | hel
|
|---|
| 4190 | lo
|
|---|
| 4191 | hello
|
|---|
| 4192 |
|
|---|
| 4193 | $ sed '2@{N;s/\n//;@}' lines.txt
|
|---|
| 4194 | hello
|
|---|
| 4195 | hello
|
|---|
| 4196 | hello
|
|---|
| 4197 | @end example
|
|---|
| 4198 | @codequoteundirected off
|
|---|
| 4199 | @codequotebacktick off
|
|---|
| 4200 |
|
|---|
| 4201 | Join backslash-continued lines:
|
|---|
| 4202 |
|
|---|
| 4203 | @codequoteundirected on
|
|---|
| 4204 | @codequotebacktick on
|
|---|
| 4205 | @example
|
|---|
| 4206 | $ cat 1.txt
|
|---|
| 4207 | this \
|
|---|
| 4208 | is \
|
|---|
| 4209 | a \
|
|---|
| 4210 | long \
|
|---|
| 4211 | line
|
|---|
| 4212 | and another \
|
|---|
| 4213 | line
|
|---|
| 4214 |
|
|---|
| 4215 | $ sed -e ':x /\\$/ @{ N; s/\\\n//g ; bx @}' 1.txt
|
|---|
| 4216 | this is a long line
|
|---|
| 4217 | and another line
|
|---|
| 4218 |
|
|---|
| 4219 |
|
|---|
| 4220 | # The above requires GNU sed;
|
|---|
| 4221 | # non-GNU seds need newlines after ':' and 'b'
|
|---|
| 4222 | @end example
|
|---|
| 4223 | @codequoteundirected off
|
|---|
| 4224 | @codequotebacktick off
|
|---|
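For @command{sed} implementations that do not accept further commands
after a label on the same line, the loop can be sketched with the label
in its own @option{-e} fragment. This is a minimal sketch rather than
the method shown above: the function name @code{join_continued} is
hypothetical, it branches with @code{t} instead of @code{b}, and it
reads standard input.

```shell
# Hypothetical helper: join backslash-continued lines read from stdin.
# The label sits in its own -e fragment, so nothing follows ':a'.
join_continued() (
  # If the line ends with a backslash, append the next line (N),
  # delete the backslash-newline pair, and loop while that succeeded.
  sed -e :a -e '/\\$/N; s/\\\n//; ta'
)

printf 'this \\\nis \\\na \\\nlong \\\nline\nand another \\\nline\n' |
  join_continued
```

The @code{t} command only loops after a successful substitution, so a
line without a trailing backslash is printed unchanged.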
| 4225 |
|
|---|
| 4226 | Join lines that start with whitespace (e.g. SMTP headers):
|
|---|
| 4227 |
|
|---|
| 4228 | @codequoteundirected on
|
|---|
| 4229 | @codequotebacktick on
|
|---|
| 4230 | @example
|
|---|
| 4231 | @group
|
|---|
| 4232 | $ cat 2.txt
|
|---|
| 4233 | Subject: Hello
|
|---|
| 4234 | World
|
|---|
| 4235 | Content-Type: multipart/alternative;
|
|---|
| 4236 | boundary=94eb2c190cc6370f06054535da6a
|
|---|
| 4237 | Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
|
|---|
| 4238 | Authentication-Results: mx.gnu.org;
|
|---|
| 4239 | dkim=pass header.i=@@gnu.org;
|
|---|
| 4240 | spf=pass
|
|---|
| 4241 | Message-ID: <abcdef@@gnu.org>
|
|---|
| 4242 | From: John Doe <jdoe@@gnu.org>
|
|---|
| 4243 | To: Jane Smith <jsmith@@gnu.org>
|
|---|
| 4244 |
|
|---|
| 4245 | $ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
|
|---|
| 4246 | Subject: Hello World
|
|---|
| 4247 | Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
|
|---|
| 4248 | Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
|
|---|
| 4249 | Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass
|
|---|
| 4250 | Message-ID: <abcdef@@gnu.org>
|
|---|
| 4251 | From: John Doe <jdoe@@gnu.org>
|
|---|
| 4252 | To: Jane Smith <jsmith@@gnu.org>
|
|---|
| 4253 |
|
|---|
| 4254 | # A portable (non-GNU) variation:
|
|---|
| 4255 | # sed -e :a -e '$!N;s/\n */ /;ta' -e 'P;D'
|
|---|
| 4256 | @end group
|
|---|
| 4257 | @end example
|
|---|
| 4258 | @codequoteundirected off
|
|---|
| 4259 | @codequotebacktick off
|
|---|
| 4260 |
|
|---|
| 4261 |
|
|---|
| 4262 | @node Centering lines
|
|---|
| 4263 | @section Centering Lines
|
|---|
| 4264 |
|
|---|
| 4265 | This script centers all lines of a file within a width of 80 columns.
|
|---|
| 4266 | To change that width, replace the number in @code{\@{@dots{}\@}}
|
|---|
| 4267 | and change the number of added spaces accordingly.
|
|---|
| 4268 |
|
|---|
| 4269 | Note how the buffer commands are used to separate parts in
|
|---|
| 4270 | the regular expressions to be matched---this is a common
|
|---|
| 4271 | technique.
|
|---|
| 4272 |
|
|---|
| 4273 | @c start-------------------------------------------
|
|---|
| 4274 | @example
|
|---|
| 4275 | #!/usr/bin/sed -f
|
|---|
| 4276 |
|
|---|
| 4277 | @group
|
|---|
| 4278 | # Put 80 spaces in the buffer
|
|---|
| 4279 | 1 @{
|
|---|
| 4280 | x
|
|---|
| 4281 | s/^$/          /
|
|---|
| 4282 | s/^.*$/&&&&&&&&/
|
|---|
| 4283 | x
|
|---|
| 4284 | @}
|
|---|
| 4285 | @end group
|
|---|
| 4286 |
|
|---|
| 4287 | @group
|
|---|
| 4288 | # delete leading and trailing spaces
|
|---|
| 4289 | y/@kbd{@key{TAB}}/ /
|
|---|
| 4290 | s/^ *//
|
|---|
| 4291 | s/ *$//
|
|---|
| 4292 | @end group
|
|---|
| 4293 |
|
|---|
| 4294 | @group
|
|---|
| 4295 | # add a newline and 80 spaces to end of line
|
|---|
| 4296 | G
|
|---|
| 4297 | @end group
|
|---|
| 4298 |
|
|---|
| 4299 | @group
|
|---|
| 4300 | # keep first 81 chars (80 + a newline)
|
|---|
| 4301 | s/^\(.\@{81\@}\).*$/\1/
|
|---|
| 4302 | @end group
|
|---|
| 4303 |
|
|---|
| 4304 | @group
|
|---|
| 4305 | # \2 matches half of the spaces, which are moved to the beginning
|
|---|
| 4306 | s/^\(.*\)\n\(.*\)\2/\2\1/
|
|---|
| 4307 | @end group
|
|---|
| 4308 | @end example
|
|---|
| 4309 | @c end---------------------------------------------
|
|---|
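The script can be tried from a shell as sketched below. The wrapper
name @code{center80} is hypothetical, the @key{TAB} translation is left
out to keep the sketch short, and the @samp{.\@{81\@}} step assumes
that @samp{.} also matches a newline in the pattern space, as it does
in GNU @command{sed}.

```shell
# Hypothetical wrapper around the centering script (80 columns, no TAB
# handling).  Reads standard input.
center80() (
  sed '
# put 80 spaces (10 spaces repeated 8 times) in the hold space
1 {
  x
  s/^$/          /
  s/^.*$/&&&&&&&&/
  x
}
# delete leading and trailing spaces
s/^ *//
s/ *$//
# append a newline and the 80 spaces, then keep the first 81 characters
G
s/^\(.\{81\}\).*$/\1/
# move half of the remaining spaces to the front of the line
s/^\(.*\)\n\(.*\)\2/\2\1/
'
)

printf 'hello\n' | center80
```

For the five-character input @samp{hello}, 75 padding spaces survive
the truncation, so the text is shifted right by 37 columns.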
| 4310 |
|
|---|
| 4311 | @node Increment a number
|
|---|
| 4312 | @section Increment a Number
|
|---|
| 4313 |
|
|---|
| 4314 | This script is one of a few that demonstrate how to do arithmetic
|
|---|
| 4315 | in @command{sed}. This is indeed possible,@footnote{@command{sed} guru Greg
|
|---|
| 4316 | Ubben wrote an implementation of the @command{dc} @sc{rpn} calculator!
|
|---|
| 4317 | It is distributed together with @command{sed}.} but must be done manually.
|
|---|
| 4318 |
|
|---|
| 4319 | To increment a number, add 1 to its last digit, replacing
|
|---|
| 4320 | it with the following digit. There is one exception: when the digit
|
|---|
| 4321 | is a nine, it becomes a zero and the preceding digit must be incremented
|
|---|
| 4322 | in turn, repeating for as long as nines are encountered.
|
|---|
| 4323 |
|
|---|
| 4324 | This solution by Bruno Haible is clever because
|
|---|
| 4325 | it uses a single buffer; if you don't have this limitation, the
|
|---|
| 4326 | algorithm used in @ref{cat -n, Numbering lines}, is faster.
|
|---|
| 4327 | It works by replacing trailing nines with underscores, then
|
|---|
| 4328 | using multiple @code{s} commands to increment the last digit,
|
|---|
| 4329 | and finally replacing the underscores with zeros.
|
|---|
| 4330 |
|
|---|
| 4331 | @c start-------------------------------------------
|
|---|
| 4332 | @example
|
|---|
| 4333 | #!/usr/bin/sed -f
|
|---|
| 4334 |
|
|---|
| 4335 | /[^0-9]/ d
|
|---|
| 4336 |
|
|---|
| 4337 | @group
|
|---|
| 4338 | # replace all trailing 9s with _ (any other non-digit character could
|
|---|
| 4339 | # be used)
|
|---|
| 4340 | :d
|
|---|
| 4341 | s/9\(_*\)$/_\1/
|
|---|
| 4342 | td
|
|---|
| 4343 | @end group
|
|---|
| 4344 |
|
|---|
| 4345 | @group
|
|---|
| 4346 | # increment the last digit only. The first line adds a most-significant
|
|---|
| 4347 | # digit of 1 if we have to add a digit.
|
|---|
| 4348 | @end group
|
|---|
| 4349 |
|
|---|
| 4350 | @group
|
|---|
| 4351 | s/^\(_*\)$/1\1/; tn
|
|---|
| 4352 | s/8\(_*\)$/9\1/; tn
|
|---|
| 4353 | s/7\(_*\)$/8\1/; tn
|
|---|
| 4354 | s/6\(_*\)$/7\1/; tn
|
|---|
| 4355 | s/5\(_*\)$/6\1/; tn
|
|---|
| 4356 | s/4\(_*\)$/5\1/; tn
|
|---|
| 4357 | s/3\(_*\)$/4\1/; tn
|
|---|
| 4358 | s/2\(_*\)$/3\1/; tn
|
|---|
| 4359 | s/1\(_*\)$/2\1/; tn
|
|---|
| 4360 | s/0\(_*\)$/1\1/; tn
|
|---|
| 4361 | @end group
|
|---|
| 4362 |
|
|---|
| 4363 | @group
|
|---|
| 4364 | :n
|
|---|
| 4365 | y/_/0/
|
|---|
| 4366 | @end group
|
|---|
| 4367 | @end example
|
|---|
| 4368 | @c end---------------------------------------------
|
|---|
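To watch the carry handling at work, the script can be wrapped in a
shell function and fed a few numbers. This is only a sketch: the
function name @code{increment} is hypothetical, and the script body is
repeated inline so that the fragment is self-contained.

```shell
# Hypothetical wrapper: increment each decimal number read from stdin.
increment() (
  sed '
/[^0-9]/ d
# turn trailing 9s into underscores, one per iteration
:d
s/9\(_*\)$/_\1/
td
# bump the last remaining digit (or prepend 1 if only underscores remain)
s/^\(_*\)$/1\1/; tn
s/8\(_*\)$/9\1/; tn
s/7\(_*\)$/8\1/; tn
s/6\(_*\)$/7\1/; tn
s/5\(_*\)$/6\1/; tn
s/4\(_*\)$/5\1/; tn
s/3\(_*\)$/4\1/; tn
s/2\(_*\)$/3\1/; tn
s/1\(_*\)$/2\1/; tn
s/0\(_*\)$/1\1/; tn
# the underscores stand for the 9s that wrapped around to 0
:n
y/_/0/
'
)

printf '41\n199\n999\n' | increment
```

The input @samp{199} first becomes @samp{1__}, then @samp{2__}, and is
printed as @samp{200}; @samp{999} becomes @samp{___}, then
@samp{1___}, and is printed as @samp{1000}.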
| 4369 |
|
|---|
| 4370 | @node Rename files to lower case
|
|---|
| 4371 | @section Rename Files to Lower Case
|
|---|
| 4372 |
|
|---|
| 4373 | This is a rather unusual use of @command{sed}. We transform text
|
|---|
| 4374 | into shell commands, then feed them to the shell.
|
|---|
| 4375 | Don't worry, even worse hacks are done when using @command{sed}; I have
|
|---|
| 4376 | seen a script converting the output of @command{date} into a @command{bc}
|
|---|
| 4377 | program!
|
|---|
| 4378 |
|
|---|
| 4379 | The main body is the @command{sed} script, which remaps the name
|
|---|
| 4380 | from upper to lower case (or vice versa) and even checks
|
|---|
| 4381 | whether the remapped name is the same as the original name.
|
|---|
| 4382 | Note how the script is parameterized using shell
|
|---|
| 4383 | variables and proper quoting.
|
|---|
| 4384 |
|
|---|
| 4385 | @c start-------------------------------------------
|
|---|
| 4386 | @example
|
|---|
| 4387 | @group
|
|---|
| 4388 | #! /bin/sh
|
|---|
| 4389 | # rename files to lower/upper case...
|
|---|
| 4390 | #
|
|---|
| 4391 | # usage:
|
|---|
| 4392 | # move-to-lower *
|
|---|
| 4393 | # move-to-upper *
|
|---|
| 4394 | # or
|
|---|
| 4395 | # move-to-lower -R .
|
|---|
| 4396 | # move-to-upper -R .
|
|---|
| 4397 | #
|
|---|
| 4398 | @end group
|
|---|
| 4399 |
|
|---|
| 4400 | @group
|
|---|
| 4401 | help()
|
|---|
| 4402 | @{
|
|---|
| 4403 | cat << eof
|
|---|
| 4404 | Usage: $0 [-n] [-R] [-h] files...
|
|---|
| 4405 | @end group
|
|---|
| 4406 |
|
|---|
| 4407 | @group
|
|---|
| 4408 | -n do nothing, only see what would be done
|
|---|
| 4409 | -R recursive (use find)
|
|---|
| 4410 | -h this message
|
|---|
| 4411 | files files to remap to lower case
|
|---|
| 4412 | @end group
|
|---|
| 4413 |
|
|---|
| 4414 | @group
|
|---|
| 4415 | Examples:
|
|---|
| 4416 | $0 -n * (see if everything is ok, then...)
|
|---|
| 4417 | $0 *
|
|---|
| 4418 | @end group
|
|---|
| 4419 |
|
|---|
| 4420 | $0 -R .
|
|---|
| 4421 |
|
|---|
| 4422 | @group
|
|---|
| 4423 | eof
|
|---|
| 4424 | @}
|
|---|
| 4425 | @end group
|
|---|
| 4426 |
|
|---|
| 4427 | @group
|
|---|
| 4428 | apply_cmd='sh'
|
|---|
| 4429 | finder='echo "$@@" | tr " " "\n"'
|
|---|
| 4430 | files_only=
|
|---|
| 4431 | @end group
|
|---|
| 4432 |
|
|---|
| 4433 | @group
|
|---|
| 4434 | while :
|
|---|
| 4435 | do
|
|---|
| 4436 | case "$1" in
|
|---|
| 4437 | -n) apply_cmd='cat' ;;
|
|---|
| 4438 | -R) finder='find "$@@" -type f';;
|
|---|
| 4439 | -h) help ; exit 1 ;;
|
|---|
| 4440 | *) break ;;
|
|---|
| 4441 | esac
|
|---|
| 4442 | shift
|
|---|
| 4443 | done
|
|---|
| 4444 | @end group
|
|---|
| 4445 |
|
|---|
| 4446 | @group
|
|---|
| 4447 | if [ -z "$1" ]; then
|
|---|
| 4448 | echo Usage: $0 [-h] [-n] [-R] files...
|
|---|
| 4449 | exit 1
|
|---|
| 4450 | fi
|
|---|
| 4451 | @end group
|
|---|
| 4452 |
|
|---|
| 4453 | @group
|
|---|
| 4454 | LOWER='abcdefghijklmnopqrstuvwxyz'
|
|---|
| 4455 | UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
|
|---|
| 4456 | @end group
|
|---|
| 4457 |
|
|---|
| 4458 | @group
|
|---|
| 4459 | case `basename $0` in
|
|---|
| 4460 | *upper*) TO=$UPPER; FROM=$LOWER ;;
|
|---|
| 4461 | *) FROM=$UPPER; TO=$LOWER ;;
|
|---|
| 4462 | esac
|
|---|
| 4463 | @end group
|
|---|
| 4464 |
|
|---|
| 4465 | eval $finder | sed -n '
|
|---|
| 4466 |
|
|---|
| 4467 | @group
|
|---|
| 4468 | # remove all trailing slashes
|
|---|
| 4469 | s/\/*$//
|
|---|
| 4470 | @end group
|
|---|
| 4471 |
|
|---|
| 4472 | @group
|
|---|
| 4473 | # add ./ if there is no path, only a filename
|
|---|
| 4474 | /\//! s/^/.\//
|
|---|
| 4475 | @end group
|
|---|
| 4476 |
|
|---|
| 4477 | @group
|
|---|
| 4478 | # save path+filename
|
|---|
| 4479 | h
|
|---|
| 4480 | @end group
|
|---|
| 4481 |
|
|---|
| 4482 | @group
|
|---|
| 4483 | # remove path
|
|---|
| 4484 | s/.*\///
|
|---|
| 4485 | @end group
|
|---|
| 4486 |
|
|---|
| 4487 | @group
|
|---|
| 4488 | # do conversion only on filename
|
|---|
| 4489 | y/'$FROM'/'$TO'/
|
|---|
| 4490 | @end group
|
|---|
| 4491 |
|
|---|
| 4492 | @group
|
|---|
| 4493 | # now line contains original path+file, while
|
|---|
| 4494 | # hold space contains the new filename
|
|---|
| 4495 | x
|
|---|
| 4496 | @end group
|
|---|
| 4497 |
|
|---|
| 4498 | @group
|
|---|
| 4499 | # add converted file name to line, which now contains
|
|---|
| 4500 | # path/file-name\nconverted-file-name
|
|---|
| 4501 | G
|
|---|
| 4502 | @end group
|
|---|
| 4503 |
|
|---|
| 4504 | @group
|
|---|
| 4505 | # check if converted file name is equal to original file name,
|
|---|
| 4506 | # if it is, do not print anything
|
|---|
| 4507 | /^.*\/\(.*\)\n\1/b
|
|---|
| 4508 | @end group
|
|---|
| 4509 |
|
|---|
| 4510 | @group
|
|---|
| 4511 | # escape special characters for the shell
|
|---|
| 4512 | s/["$`\\]/\\&/g
|
|---|
| 4513 | @end group
|
|---|
| 4514 |
|
|---|
| 4515 | @group
|
|---|
| 4516 | # now, transform path/fromfile\n, into
|
|---|
| 4517 | # mv path/fromfile path/tofile and print it
|
|---|
| 4518 | s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
|
|---|
| 4519 | @end group
|
|---|
| 4520 |
|
|---|
| 4521 | ' | $apply_cmd
|
|---|
| 4522 | @end example
|
|---|
| 4523 | @c end---------------------------------------------
|
|---|
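The core of the script can be examined in isolation, without renaming
anything, by printing the generated @command{mv} commands. The sketch
below makes some assumptions: the function name @code{to_lower_mv} is
hypothetical, the two alphabets are written out instead of being
spliced in from @code{$FROM} and @code{$TO}, and file names are read
from standard input, one per line.

```shell
# Hypothetical helper: print the mv commands that would lower-case the
# file names read from stdin.  Nothing is renamed.
to_lower_mv() (
  sed -n '
# remove trailing slashes; add ./ if there is no path
s/\/*$//
/\//! s/^/.\//
# save path+filename, then reduce the line to the bare filename
h
s/.*\///
# convert only the filename
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# build "path/file-name\nconverted-file-name"
x
G
# nothing to do if the name is already lower case
/^.*\/\(.*\)\n\1/b
# escape characters that are special inside double quotes
s/["$`\\]/\\&/g
s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
'
)

printf 'Foo.TXT\nalready_lower\nsrc/README\n' | to_lower_mv
```

Only names that actually change produce a command, which is the same
filtering that the @option{-n} dry run of the full script displays.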
| 4524 |
|
|---|
| 4525 | @node Print bash environment
|
|---|
| 4526 | @section Print @command{bash} Environment
|
|---|
| 4527 |
|
|---|
| 4528 | This script strips the definition of the shell functions
|
|---|
| 4529 | from the output of the @command{set} Bourne-shell command.
|
|---|
| 4530 |
|
|---|
| 4531 | @c start-------------------------------------------
|
|---|
| 4532 | @example
|
|---|
| 4533 | #!/bin/sh
|
|---|
| 4534 |
|
|---|
| 4535 | @group
|
|---|
| 4536 | set | sed -n '
|
|---|
| 4537 | :x
|
|---|
| 4538 | @end group
|
|---|
| 4539 |
|
|---|
| 4540 | @group
|
|---|
| 4541 | @ifinfo
|
|---|
| 4542 | # if no occurrence of "=()" print and load next line
|
|---|
| 4543 | @end ifinfo
|
|---|
| 4544 | @ifnotinfo
|
|---|
| 4545 | # if no occurrence of @samp{=()} print and load next line
|
|---|
| 4546 | @end ifnotinfo
|
|---|
| 4547 | /=()/! @{ p; b; @}
|
|---|
| 4548 | / () $/! @{ p; b; @}
|
|---|
| 4549 | @end group
|
|---|
| 4550 |
|
|---|
| 4551 | @group
|
|---|
| 4552 | # possible start of functions section
|
|---|
| 4553 | # save the line in case this is a var like FOO="() "
|
|---|
| 4554 | h
|
|---|
| 4555 | @end group
|
|---|
| 4556 |
|
|---|
| 4557 | @group
|
|---|
| 4558 | # if the next line has a brace, we quit because
|
|---|
| 4559 | # nothing comes after functions
|
|---|
| 4560 | n
|
|---|
| 4561 | /^@{/ q
|
|---|
| 4562 | @end group
|
|---|
| 4563 |
|
|---|
| 4564 | @group
|
|---|
| 4565 | # print the old line
|
|---|
| 4566 | x; p
|
|---|
| 4567 | @end group
|
|---|
| 4568 |
|
|---|
| 4569 | @group
|
|---|
| 4570 | # work on the new line now
|
|---|
| 4571 | x; bx
|
|---|
| 4572 | '
|
|---|
| 4573 | @end group
|
|---|
| 4574 | @end example
|
|---|
| 4575 | @c end---------------------------------------------
|
|---|
| 4576 |
|
|---|
| 4577 | @node Reverse chars of lines
|
|---|
| 4578 | @section Reverse Characters of Lines
|
|---|
| 4579 |
|
|---|
| 4580 | This script can be used to reverse the position of characters
|
|---|
| 4581 | in lines. The technique moves two characters at a time, hence
|
|---|
| 4582 | it is faster than more intuitive implementations.
|
|---|
| 4583 |
|
|---|
| 4584 | Note the @code{tx} command before the definition of the label.
|
|---|
| 4585 | This is often needed to reset the flag that is tested by
|
|---|
| 4586 | the @code{t} command.
|
|---|
| 4587 |
|
|---|
| 4588 | Imaginative readers will find uses for this script. An example
|
|---|
| 4589 | is reversing the output of @command{banner}.@footnote{This requires
|
|---|
| 4590 | another script to pad the output of banner; for example
|
|---|
| 4591 |
|
|---|
| 4592 | @example
|
|---|
| 4593 | #! /bin/sh
|
|---|
| 4594 |
|
|---|
| 4595 | banner -w $1 $2 $3 $4 |
|
|---|
| 4596 | sed -e :a -e '/^.\@{0,'$1'\@}$/ @{ s/$/ /; ba; @}' |
|
|---|
| 4597 | ~/sedscripts/reverseline.sed
|
|---|
| 4598 | @end example
|
|---|
| 4599 | }
|
|---|
| 4600 |
|
|---|
| 4601 | @c start-------------------------------------------
|
|---|
| 4602 | @example
|
|---|
| 4603 | #!/usr/bin/sed -f
|
|---|
| 4604 |
|
|---|
| 4605 | /../! b
|
|---|
| 4606 |
|
|---|
| 4607 | @group
|
|---|
| 4608 | # Reverse a line. Begin by embedding the line between two newlines
|
|---|
| 4609 | s/^.*$/\
|
|---|
| 4610 | &\
|
|---|
| 4611 | /
|
|---|
| 4612 | @end group
|
|---|
| 4613 |
|
|---|
| 4614 | @group
|
|---|
| 4615 | # Swap the outermost characters between the markers, moving the markers
|
|---|
| 4616 | # inward. Repeat until zero or one characters are left between them
|
|---|
| 4617 | tx
|
|---|
| 4618 | :x
|
|---|
| 4619 | s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
|
|---|
| 4620 | tx
|
|---|
| 4621 | @end group
|
|---|
| 4622 |
|
|---|
| 4623 | @group
|
|---|
| 4624 | # Remove the newline markers
|
|---|
| 4625 | s/\n//g
|
|---|
| 4626 | @end group
|
|---|
| 4627 | @end example
|
|---|
| 4628 | @c end---------------------------------------------
|
|---|
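A quick way to try the script is to wrap it in a shell function, as in
the following sketch. The function name @code{reverse_chars} is
hypothetical; the script lines are kept unindented because the
multi-line @code{s} command inserts everything that follows each
backslash-newline literally.

```shell
# Hypothetical wrapper: reverse the characters of every line on stdin.
reverse_chars() (
  sed '
# lines shorter than two characters are already reversed
/../! b
# embed the line between two newline markers
s/^.*$/\
&\
/
# reset the flag tested by t, then swap the outermost characters
# between the markers until at most one character is left there
tx
:x
s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
tx
# remove the newline markers
s/\n//g
'
)

printf 'abc\nabcd\nx\n' | reverse_chars
```

Each pass through the loop places two characters, so a line of
@var{n} characters needs about @var{n}/2 substitutions.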
| 4629 |
|
|---|
| 4630 |
|
|---|
| 4631 | @node Text search across multiple lines
|
|---|
| 4632 | @section Text search across multiple lines
|
|---|
| 4633 |
|
|---|
| 4634 | This section uses @code{N} and @code{D} commands to search for
|
|---|
| 4635 | consecutive words spanning multiple lines. @xref{Multiline techniques}.
|
|---|
| 4636 |
|
|---|
| 4637 | These examples deal with finding doubled occurrences of words in a document.
|
|---|
| 4638 |
|
|---|
| 4639 | Finding doubled words in a single line is easy using GNU @command{grep}
|
|---|
| 4640 | and similarly with @value{SSED}:
|
|---|
| 4641 |
|
|---|
| 4642 | @c NOTE: in all examples, 'the@ the' is used to prevent
|
|---|
| 4643 | @c 'make syntax-check' from complaining about double words.
|
|---|
| 4644 | @codequoteundirected on
|
|---|
| 4645 | @codequotebacktick on
|
|---|
| 4646 | @example
|
|---|
| 4647 | @group
|
|---|
| 4648 | $ cat two-cities-dup1.txt
|
|---|
| 4649 | It was the best of times,
|
|---|
| 4650 | it was the worst of times,
|
|---|
| 4651 | it was the@ the age of wisdom,
|
|---|
| 4652 | it was the age of foolishness,
|
|---|
| 4653 |
|
|---|
| 4654 | $ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
|
|---|
| 4655 | it was the@ the age of wisdom,
|
|---|
| 4656 |
|
|---|
| 4657 | $ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
|
|---|
| 4658 | 3:it was the@ the age of wisdom,
|
|---|
| 4659 |
|
|---|
| 4660 | $ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
|
|---|
| 4661 | it was the@ the age of wisdom,
|
|---|
| 4662 |
|
|---|
| 4663 | $ sed -En '/\b(\w+)\s+\1\b/@{=;p@}' two-cities-dup1.txt
|
|---|
| 4664 | 3
|
|---|
| 4665 | it was the@ the age of wisdom,
|
|---|
| 4666 | @end group
|
|---|
| 4667 | @end example
|
|---|
| 4668 | @codequoteundirected off
|
|---|
| 4669 | @codequotebacktick off
|
|---|
| 4670 |
|
|---|
| 4671 | @itemize @bullet
|
|---|
| 4672 | @item
|
|---|
| 4673 | The regular expression @samp{\b\w+\s+} searches for word-boundary (@samp{\b}),
|
|---|
| 4674 | followed by one-or-more word-characters (@samp{\w+}), followed by whitespace
|
|---|
| 4675 | (@samp{\s+}). @xref{regexp extensions}.
|
|---|
| 4676 |
|
|---|
| 4677 | @item
|
|---|
| 4678 | Adding parentheses around the @samp{(\w+)} expression creates a subexpression.
|
|---|
| 4679 | The regular expression pattern @samp{(PATTERN)\s+\1} defines a subexpression
|
|---|
| 4680 | (in the parentheses) followed by a back-reference, separated by whitespace.
|
|---|
| 4681 | A successful match means the @var{PATTERN} was repeated twice in succession.
|
|---|
| 4682 | @xref{Back-references and Subexpressions}.
|
|---|
| 4683 |
|
|---|
| 4684 | @item
|
|---|
| 4685 | The word-boundary expression (@samp{\b}) at both ends ensures partial
|
|---|
| 4686 | words are not matched (e.g. @samp{the then} is not a desired match).
|
|---|
| 4687 | @c Thanks to Jim for pointing this out in
|
|---|
| 4688 | @c https://lists.gnu.org/archive/html/sed-devel/2016-12/msg00041.html
|
|---|
| 4689 |
|
|---|
| 4690 | @item
|
|---|
| 4691 | The @option{-E} option enables extended regular expression syntax, alleviating
|
|---|
| 4692 | the need to add backslashes before the parentheses. @xref{ERE syntax}.
|
|---|
| 4693 |
|
|---|
| 4694 | @end itemize
|
|---|
| 4695 |
|
|---|
| 4696 | When the doubled word spans two lines, the above regular expression
|
|---|
| 4697 | will not find it, as @command{grep} and @command{sed} operate line-by-line.
|
|---|
| 4698 |
|
|---|
| 4699 | By using the @code{N} and @code{D} commands, @command{sed} can apply
|
|---|
| 4700 | regular expressions on multiple lines (that is, multiple lines are stored
|
|---|
| 4701 | in the pattern space, and the regular expression works on it):
|
|---|
| 4702 |
|
|---|
| 4703 | @c NOTE: use 'the@*the' instead of a real new line to prevent
|
|---|
| 4704 | @c 'make syntax-check' to complain about doubled-words.
|
|---|
| 4705 | @codequoteundirected on
|
|---|
| 4706 | @codequotebacktick on
|
|---|
| 4707 | @example
|
|---|
| 4708 | $ cat two-cities-dup2.txt
|
|---|
| 4709 | It was the best of times, it was the
|
|---|
| 4710 | worst of times, it was the@*the age of wisdom,
|
|---|
| 4711 | it was the age of foolishness,
|
|---|
| 4712 |
|
|---|
| 4713 | $ sed -En '@{N; /\b(\w+)\s+\1\b/@{=;p@} ; D@}' two-cities-dup2.txt
|
|---|
| 4714 | 3
|
|---|
| 4715 | worst of times, it was the@*the age of wisdom,
|
|---|
| 4716 | @end example
|
|---|
| 4717 | @codequoteundirected off
|
|---|
| 4718 | @codequotebacktick off
|
|---|
| 4719 |
|
|---|
| 4720 | @itemize @bullet
|
|---|
| 4721 | @item
|
|---|
| 4722 | The @code{N} command appends the next line to the pattern space
|
|---|
| 4723 | (thus ensuring it contains two consecutive lines in every cycle).
|
|---|
| 4724 |
|
|---|
| 4725 | @item
|
|---|
| 4726 | The regular expression uses @samp{\s+} as the word separator, which matches
|
|---|
| 4727 | both spaces and newlines.
|
|---|
| 4728 |
|
|---|
| 4729 | @item
|
|---|
| 4730 | When the regular expression matches, the entire pattern space is printed
|
|---|
| 4731 | with @code{p}. No lines are printed by default due to the @option{-n} option.
|
|---|
| 4732 |
|
|---|
| 4733 | @item
|
|---|
| 4734 | The @code{D} command removes the first line from the pattern space (up to and
|
|---|
| 4735 | including the first newline), readying it for the next cycle.
|
|---|
| 4736 | @end itemize
|
|---|
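The steps above can be packaged as a small filter. This is a sketch
under stated assumptions: the function name @code{find_doubled_words}
is hypothetical, the @command{sed} program is the one shown in the
example, and the @samp{\b}, @samp{\w} and @samp{\s} operators require
GNU @command{sed}.

```shell
# Hypothetical helper: report doubled words, including those split
# across two consecutive lines.  Requires GNU sed.
find_doubled_words() (
  # N keeps two lines in the pattern space, = prints the number of the
  # last line read, p prints both lines, D drops the first line.
  sed -En '{N; /\b(\w+)\s+\1\b/{=;p} ; D}'
)

printf '%s\n' \
  'It was the best of times, it was the' \
  'worst of times, it was the' \
  'the age of wisdom,' \
  'it was the age of foolishness,' |
  find_doubled_words
```

The reported number is that of the second line of the pair, since
@code{=} prints the number of the line most recently read.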
| 4737 |
|
|---|
| 4738 | See the GNU @command{coreutils} manual for an alternative solution using
|
|---|
| 4739 | @command{tr -s} and @command{uniq} at
|
|---|
| 4740 | @c NOTE: cheating and keeping the URL line shorter than 80 characters
|
|---|
| 4741 | @c by using 'gnu.org' and '/s/'.
|
|---|
| 4742 | @url{https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html}.
|
|---|
| 4743 |
|
|---|
| 4744 | @node Line length adjustment
|
|---|
| 4745 | @section Line length adjustment
|
|---|
| 4746 |
|
|---|
| 4747 | This section uses @code{N} and @code{P} commands to read and write
|
|---|
| 4748 | lines, and the @code{b} command for branching.
|
|---|
| 4749 | @xref{Multiline techniques} and @ref{Branching and flow control}.
|
|---|
| 4750 |
|
|---|
| 4751 | This (somewhat contrived) example deals with formatting and wrapping
|
|---|
| 4752 | lines of text of the following input file:
|
|---|
| 4753 |
|
|---|
| 4754 | @example
|
|---|
| 4755 | @group
|
|---|
| 4756 | $ cat two-cities-mix.txt
|
|---|
| 4757 | It was the best of times, it was
|
|---|
| 4758 | the worst of times, it
|
|---|
| 4759 | was the age of
|
|---|
| 4760 | wisdom,
|
|---|
| 4761 | it
|
|---|
| 4762 | was
|
|---|
| 4763 | the age
|
|---|
| 4764 | of foolishness,
|
|---|
| 4765 | @end group
|
|---|
| 4766 | @end example
|
|---|
| 4767 |
|
|---|
| 4768 | @exdent The following @command{sed} program wraps lines at 40 characters:
|
|---|
| 4769 | @codequoteundirected on
|
|---|
| 4770 | @codequotebacktick on
|
|---|
| 4771 | @example
|
|---|
| 4772 | @group
|
|---|
| 4773 | $ cat wrap40.sed
|
|---|
| 4774 | # outer loop
|
|---|
| 4775 | :x
|
|---|
| 4776 |
|
|---|
| 4777 | # Append a newline followed by the next input line to the pattern buffer
|
|---|
| 4778 | N
|
|---|
| 4779 |
|
|---|
| 4780 | # Replace all newlines in the pattern buffer with spaces
|
|---|
| 4781 | s/\n/ /g
|
|---|
| 4782 |
|
|---|
| 4783 |
|
|---|
| 4784 | # Inner loop
|
|---|
| 4785 | :y
|
|---|
| 4786 |
|
|---|
| 4787 | # Add a newline after the first 40 characters
|
|---|
| 4788 | s/(.@{40,40@})/\1\n/
|
|---|
| 4789 |
|
|---|
| 4790 | # If there is a newline in the pattern buffer
|
|---|
| 4791 | # (i.e. the previous substitution added a newline)
|
|---|
| 4792 | /\n/ @{
|
|---|
| 4793 | # There are newlines in the pattern buffer -
|
|---|
| 4794 | # print the content until the first newline.
|
|---|
| 4795 | P
|
|---|
| 4796 |
|
|---|
| 4797 | # Remove the printed characters and the first newline
|
|---|
| 4798 | s/.*\n//
|
|---|
| 4799 |
|
|---|
| 4800 | # branch to label 'y' - repeat inner loop
|
|---|
| 4801 | by
|
|---|
| 4802 | @}
|
|---|
| 4803 |
|
|---|
| 4804 | # No newlines in the pattern buffer - Branch to label 'x' (outer loop)
|
|---|
| 4805 | # and read the next input line
|
|---|
| 4806 | bx
|
|---|
| 4807 | @end group
|
|---|
| 4808 | @end example
|
|---|
| 4809 | @codequoteundirected off
|
|---|
| 4810 | @codequotebacktick off
|
|---|
| 4811 |
|
|---|
| 4812 |
|
|---|
| 4813 |
|
|---|
| 4814 | @exdent The wrapped output:
|
|---|
| 4815 | @codequoteundirected on
|
|---|
| 4816 | @codequotebacktick on
|
|---|
| 4817 | @example
|
|---|
| 4818 | @group
|
|---|
| 4819 | $ sed -E -f wrap40.sed two-cities-mix.txt
|
|---|
| 4820 | It was the best of times, it was the wor
|
|---|
| 4821 | st of times, it was the age of wisdom, i
|
|---|
| 4822 | t was the age of foolishness,
|
|---|
| 4823 | @end group
|
|---|
| 4824 | @end example
|
|---|
| 4825 | @codequoteundirected off
|
|---|
| 4826 | @codequotebacktick off
|
|---|
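The wrapping width can be made a parameter by splicing a shell variable
into the interval expression. The following is a minimal sketch, not
part of the program above: the function name @code{wrap_at} and its
single argument are hypothetical, and, like @file{wrap40.sed}, it
relies on GNU @command{sed} both for @samp{\n} in the replacement and
for printing the pattern space when @code{N} runs out of input.

```shell
# Hypothetical helper: wrap_at WIDTH  (reads stdin, requires GNU sed)
wrap_at() (
  width=$1
  sed -E '
# outer loop: append the next input line, joining with a space
:x
N
s/\n/ /g
# inner loop: split off and print the first WIDTH characters
:y
s/(.{'"$width"'})/\1\n/
/\n/ {
P
s/.*\n//
by
}
bx
'
)

printf '%s\n' 'It was the best of times, it was' 'the worst of times, it' \
  'was the age of' 'wisdom,' 'it' 'was' 'the age' 'of foolishness,' |
  wrap_at 40
```

With a width of 40 this reproduces the wrapped output shown above.
Input consisting of a single line is printed unwrapped, because
@code{N} finds no second line and ends the script.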
| 4827 |
|
|---|
| 4828 |
|
|---|
| 4829 |
|
|---|
| 4830 |
|
|---|
| 4831 | @node Adding a header to multiple files
|
|---|
| 4832 | @section Adding a header to multiple files
|
|---|
| 4833 |
|
|---|
| 4834 | @value{SSED} can be used to safely modify multiple files at once.
|
|---|
| 4835 |
|
|---|
| 4836 | @exdent Add a single line to the beginning of source code files:
|
|---|
| 4837 |
|
|---|
| 4838 | @codequoteundirected on
|
|---|
| 4839 | @codequotebacktick on
|
|---|
| 4840 | @example
|
|---|
| 4841 | sed -i '1i/* Copyright (C) FOO BAR */' *.c
|
|---|
| 4842 | @end example
|
|---|
| 4843 | @codequoteundirected off
|
|---|
| 4844 | @codequotebacktick off
|
|---|
| 4845 |
|
|---|
| 4846 | @exdent Adding a few lines is possible using @samp{\n} in the text:
|
|---|
| 4847 |
|
|---|
| 4848 | @codequoteundirected on
|
|---|
| 4849 | @codequotebacktick on
|
|---|
| 4850 | @example
|
|---|
| 4851 | sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c
|
|---|
| 4852 | @end example
|
|---|
| 4853 | @codequoteundirected off
|
|---|
| 4854 | @codequotebacktick off
|
|---|
| 4855 |
|
|---|
| 4856 | To add multiple lines from another file, use @code{0rFILE}.
|
|---|
| 4857 | A typical use case is adding a license notice header to all files:
|
|---|
| 4858 |
|
|---|
| 4859 | @codequoteundirected on
|
|---|
| 4860 | @codequotebacktick on
|
|---|
| 4861 | @example
|
|---|
| 4862 | ## Create the header file:
|
|---|
| 4863 | $ cat<<'EOF'>LIC.TXT
|
|---|
| 4864 | /*
|
|---|
| 4865 | Copyright (C) 1989-2021 FOO BAR
|
|---|
| 4866 |
|
|---|
| 4867 | This program is free software; you can redistribute it and/or modify
|
|---|
| 4868 | it under the terms of the GNU General Public License as published by
|
|---|
| 4869 | the Free Software Foundation; either version 3, or (at your option)
|
|---|
| 4870 | any later version.
|
|---|
| 4871 |
|
|---|
| 4872 | This program is distributed in the hope that it will be useful,
|
|---|
| 4873 | but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|---|
| 4874 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|---|
| 4875 | GNU General Public License for more details.
|
|---|
| 4876 |
|
|---|
| 4877 | You should have received a copy of the GNU General Public License
|
|---|
| 4878 | along with this program; If not, see <https://www.gnu.org/licenses/>.
|
|---|
| 4879 | */
|
|---|
| 4880 | EOF
|
|---|
| 4881 |
|
|---|
| 4882 | ## Add the file at the beginning of all source code files:
|
|---|
| 4883 | $ sed -i '0rLIC.TXT' *.cpp *.h
|
|---|
| 4884 | @end example
|
|---|
| 4885 | @codequoteundirected off
|
|---|
| 4886 | @codequotebacktick off
|
|---|
| 4887 |
|
|---|
| 4888 |
|
|---|
| 4889 | With script files (e.g. @file{.sh}, @file{.py}, @file{.pl} files)
|
|---|
| 4890 | the license notice typically appears @emph{after} the first line (the
|
|---|
| 4891 | ``shebang'' @samp{#!} line). The @code{1rFILE} command will add @file{FILE}
|
|---|
| 4892 | @emph{after} the first line:
|
|---|
| 4893 |
|
|---|
| 4894 | @codequoteundirected on
|
|---|
| 4895 | @codequotebacktick on
|
|---|
| 4896 | @example
|
|---|
| 4897 | ## Create the header file:
|
|---|
| 4898 | $ cat<<'EOF'>LIC.TXT
|
|---|
| 4899 | ##
|
|---|
| 4900 | ## Copyright (C) 1989-2021 FOO BAR
|
|---|
| 4901 | ##
|
|---|
| 4902 | ## This program is free software; you can redistribute it and/or modify
|
|---|
| 4903 | ## it under the terms of the GNU General Public License as published by
|
|---|
| 4904 | ## the Free Software Foundation; either version 3, or (at your option)
|
|---|
| 4905 | ## any later version.
|
|---|
| 4906 | ##
|
|---|
| 4907 | ## This program is distributed in the hope that it will be useful,
|
|---|
| 4908 | ## but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|---|
| 4909 | ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|---|
| 4910 | ## GNU General Public License for more details.
|
|---|
| 4911 | ##
|
|---|
| 4912 | ## You should have received a copy of the GNU General Public License
|
|---|
| 4913 | ## along with this program; If not, see <https://www.gnu.org/licenses/>.
|
|---|
| 4914 | ##
|
|---|
| 4915 | ##
|
|---|
| 4916 | EOF
|
|---|
| 4917 |
|
|---|
| 4918 | ## Add the file after the first line of all script files:
|
|---|
| 4919 | $ sed -i '1rLIC.TXT' *.py *.sh
|
|---|
| 4920 | @end example
|
|---|
| 4921 | @codequoteundirected off
|
|---|
| 4922 | @codequotebacktick off
|
|---|
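The effect of @code{1rFILE} is easy to verify on a scratch copy before
touching real sources. In the sketch below the function name
@code{add_header_after_shebang}, the temporary directory and the file
names are all hypothetical; the @option{-i} option is used as in the
examples above, which assumes GNU @command{sed}.

```shell
# Hypothetical helper: add_header_after_shebang HEADER FILE...
# Inserts the contents of HEADER after line 1 of each FILE, in place.
add_header_after_shebang() (
  header=$1
  shift
  # The file name given to r extends to the end of the script text.
  sed -i "1r $header" "$@"
)

# Try it on a throwaway script in a temporary directory.
scratch=$(mktemp -d)
printf '#!/bin/sh\necho hi\n' > "$scratch/demo.sh"
printf '## Copyright (C) FOO BAR\n' > "$scratch/LIC.TXT"
add_header_after_shebang "$scratch/LIC.TXT" "$scratch/demo.sh"
cat "$scratch/demo.sh"
rm -rf "$scratch"
```

Because @option{-i} treats every file separately, the address @code{1}
refers to the first line of each file rather than of the whole input.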
| 4923 |
|
|---|
| 4924 | The above @command{sed} commands can be combined with @command{find}
|
|---|
| 4925 | to locate files in all subdirectories, @command{xargs} to run additional
|
|---|
| 4926 | commands on selected files and @command{grep} to filter out files that already
|
|---|
| 4927 | contain a copyright notice:
|
|---|
| 4928 |
|
|---|
| 4929 | @codequoteundirected on
|
|---|
| 4930 | @codequotebacktick on
|
|---|
| 4931 | @example
|
|---|
| 4932 | find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
|
|---|
| 4933 | | xargs grep -Li copyright \
|
|---|
| 4934 | | xargs -r sed -i '0rLIC.TXT'
|
|---|
| 4935 | @end example
|
|---|
| 4936 | @codequoteundirected off
|
|---|
| 4937 | @codequotebacktick off
|
|---|
| 4938 |
|
|---|
| 4939 | @exdent Or a slightly safer version (handling file names with spaces and newlines):
|
|---|
| 4940 |
|
|---|
| 4941 | @codequoteundirected on
|
|---|
| 4942 | @codequotebacktick on
|
|---|
| 4943 | @example
|
|---|
| 4944 | find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
|
|---|
| 4945 | | xargs -0 grep -Z -Li copyright \
|
|---|
| 4946 | | xargs -0 -r sed -i '0rLIC.TXT'
|
|---|
| 4947 | @end example
|
|---|
| 4948 | @codequoteundirected off
|
|---|
| 4949 | @codequotebacktick off
|
|---|
| 4950 |
|
|---|
| 4951 | Note: using the @code{0} address with the @code{r} command requires @value{SSED}
|
|---|
| 4952 | version 4.9 or later. @xref{Zero Address}.
|
|---|
| 4953 |
|
|---|
| 4954 |
|
|---|
| 4955 |
|
|---|
| 4956 | @node tac
|
|---|
| 4957 | @section Reverse Lines of Files
|
|---|
| 4958 |
|
|---|
| 4959 | This one begins a series of totally useless (yet interesting)
|
|---|
| 4960 | scripts emulating various Unix commands. This, in particular,
|
|---|
| 4961 | is a @command{tac} workalike.
|
|---|
| 4962 |
|
|---|
| 4963 | Note that on implementations other than GNU @command{sed}
|
|---|
| 4964 | this script might easily overflow internal buffers.
|
|---|
| 4965 |
|
|---|
| 4966 | @c start-------------------------------------------
|
|---|
| 4967 | @example
|
|---|
| 4968 | #!/usr/bin/sed -nf
|
|---|
| 4969 |
|
|---|
| 4970 | # reverse all lines of input, i.e. the first line becomes the last, ...
|
|---|
| 4971 |
|
|---|
| 4972 | @group
|
|---|
| 4973 | # from the second line, the buffer (which contains all previous lines)
|
|---|
| 4974 | # is *appended* to current line, so, the order will be reversed
|
|---|
| 4975 | 1! G
|
|---|
| 4976 | @end group
|
|---|
| 4977 |
|
|---|
| 4978 | @group
|
|---|
| 4979 | # on the last line we're done -- print everything
|
|---|
| 4980 | $ p
|
|---|
| 4981 | @end group
|
|---|
| 4982 |
|
|---|
| 4983 | @group
|
|---|
| 4984 | # store everything on the buffer again
|
|---|
| 4985 | h
|
|---|
| 4986 | @end group
|
|---|
| 4987 | @end example
|
|---|
| 4988 | @c end---------------------------------------------
|
|---|
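The three commands also fit on a command line. A minimal sketch,
assuming a hypothetical function name @code{tac_sed} that reads
standard input:

```shell
# Hypothetical helper: print the lines of stdin in reverse order.
tac_sed() (
  # 1!G  from line 2 on, append the hold space (all earlier lines,
  #      already reversed) after the current line
  # $p   on the last line, print the accumulated text
  # h    save the pattern space for the next cycle
  sed -n -e '1!G' -e '$p' -e 'h'
)

printf 'a\nb\nc\n' | tac_sed
```

The whole input is held in memory until the last line is reached, which
is why small internal buffers are a concern for this script.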
| 4989 |
|
|---|
| 4990 | @node cat -n
|
|---|
| 4991 | @section Numbering Lines
|
|---|
| 4992 |
|
|---|
| 4993 | This script replaces @samp{cat -n}; in fact it formats its output
|
|---|
| 4994 | exactly like GNU @command{cat} does.
|
|---|
| 4995 |
|
|---|
| 4996 | Of course this is completely useless, for two reasons: first,
|
|---|
| 4997 | because somebody else did it in C; second, because the following
|
|---|
| 4998 | Bourne-shell script could be used for the same purpose and would
|
|---|
| 4999 | be much faster:
|
|---|
| 5000 |
|
|---|
| 5001 | @c start-------------------------------------------
|
|---|
| 5002 | @example
|
|---|
| 5003 | @group
|
|---|
| 5004 | #! /bin/sh
|
|---|
| 5005 | sed -e "=" $@@ | sed -e '
|
|---|
| 5006 | s/^/      /
|
|---|
| 5007 | N
|
|---|
| 5008 | s/^ *\(......\)\n/\1 /
|
|---|
| 5009 | '
|
|---|
| 5010 | @end group
|
|---|
| 5011 | @end example
|
|---|
| 5012 | @c end---------------------------------------------
|
|---|
| 5013 |
|
|---|
| 5014 | It uses @command{sed} to print the line number, then groups lines two
|
|---|
| 5015 | by two using @code{N}. Of course, this script does not teach as much as
|
|---|
| 5016 | the one presented below.
|
|---|
| 5017 |
|
|---|
| 5018 | The algorithm used for incrementing uses both buffers, so the line
|
|---|
| 5019 | is printed as soon as possible and then discarded. The number
|
|---|
| 5020 | is split so that changing digits go in a buffer and unchanged ones go
|
|---|
| 5021 | in the other; the changed digits are modified in a single step
|
|---|
| 5022 | (using a @code{y} command). The line number for the next line
|
|---|
| 5023 | is then composed and stored in the hold space, to be used in the
|
|---|
| 5024 | next iteration.
|
|---|
| 5025 |
|
|---|
| 5026 | @c start-------------------------------------------
|
|---|
| 5027 | @example
|
|---|
| 5028 | #!/usr/bin/sed -nf
|
|---|
| 5029 |
|
|---|
| 5030 | @group
|
|---|
| 5031 | # Prime the pump on the first line
|
|---|
| 5032 | x
|
|---|
| 5033 | /^$/ s/^.*$/1/
|
|---|
| 5034 | @end group
|
|---|
| 5035 |
|
|---|
| 5036 | @group
|
|---|
| 5037 | # Add the correct line number before the pattern
|
|---|
| 5038 | G
|
|---|
| 5039 | h
|
|---|
| 5040 | @end group
|
|---|
| 5041 |
|
|---|
| 5042 | @group
|
|---|
| 5043 | # Format it and print it
|
|---|
| 5044 | s/^/      /
|
|---|
| 5045 | s/^ *\(......\)\n/\1 /p
|
|---|
| 5046 | @end group
|
|---|
| 5047 |
|
|---|
| 5048 | @group
|
|---|
| 5049 | # Get the line number from hold space; add a zero
|
|---|
| 5050 | # if we're going to add a digit on the next line
|
|---|
| 5051 | g
|
|---|
| 5052 | s/\n.*$//
|
|---|
| 5053 | /^9*$/ s/^/0/
|
|---|
| 5054 | @end group
|
|---|
| 5055 |
|
|---|
| 5056 | @group
|
|---|
| 5057 | # separate changing/unchanged digits with an x
|
|---|
| 5058 | s/.9*$/x&/
|
|---|
| 5059 | @end group
|
|---|
| 5060 |
|
|---|
| 5061 | @group
|
|---|
| 5062 | # keep changing digits in hold space
|
|---|
| 5063 | h
|
|---|
| 5064 | s/^.*x//
|
|---|
| 5065 | y/0123456789/1234567890/
|
|---|
| 5066 | x
|
|---|
| 5067 | @end group
|
|---|
| 5068 |
|
|---|
| 5069 | @group
|
|---|
| 5070 | # keep unchanged digits in pattern space
|
|---|
| 5071 | s/x.*$//
|
|---|
| 5072 | @end group
|
|---|
| 5073 |
|
|---|
| 5074 | @group
|
|---|
| 5075 | # compose the new number, remove the newline implicitly added by G
|
|---|
| 5076 | G
|
|---|
| 5077 | s/\n//
|
|---|
| 5078 | h
|
|---|
| 5079 | @end group
|
|---|
| 5080 | @end example
|
|---|
| 5081 | @c end---------------------------------------------
|
|---|
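The increment step can be tried in isolation. The sketch below applies the same commands to numbers read from standard input instead of from the hold space; the function name @code{sed_increment} is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# sed_increment is a hypothetical helper: it adds one to each
# decimal number on its input, using the commands from the script above.
sed_increment() {
  sed -e '/^9*$/ s/^/0/' \
      -e 's/.9*$/x&/' \
      -e 'h' \
      -e 's/^.*x//' \
      -e 'y/0123456789/1234567890/' \
      -e 'x' \
      -e 's/x.*$//' \
      -e 'G' \
      -e 's/\n//'
}

# 129 is split as "1" and "29"; only "29" is rewritten, to "30".
printf '7\n129\n199\n999\n' | sed_increment
```

For @samp{129}, the marker is placed before the last non-nine digit, so the pattern space becomes @samp{1x29}; the part after the marker is rotated by @code{y} and glued back onto the untouched prefix.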
| 5082 |
|
|---|
| 5083 | @node cat -b
|
|---|
| 5084 | @section Numbering Non-blank Lines
|
|---|
| 5085 |
|
|---|
| 5086 | Emulating @samp{cat -b} is almost the same as @samp{cat -n}---we only
|
|---|
| 5087 | have to select which lines are to be numbered and which are not.
|
|---|
| 5088 |
|
|---|
| 5089 | The part that is common to this script and the previous one is
|
|---|
| 5090 | not commented to show how important it is to comment @command{sed}
|
|---|
| 5091 | scripts properly...
|
|---|
| 5092 |
|
|---|
| 5093 | @c start-------------------------------------------
|
|---|
| 5094 | @example
|
|---|
| 5095 | #!/usr/bin/sed -nf
|
|---|
| 5096 |
|
|---|
| 5097 | @group
|
|---|
| 5098 | /^$/ @{
|
|---|
| 5099 | p
|
|---|
| 5100 | b
|
|---|
| 5101 | @}
|
|---|
| 5102 | @end group
|
|---|
| 5103 |
|
|---|
| 5104 | @group
|
|---|
| 5105 | # Same as cat -n from now
|
|---|
| 5106 | x
|
|---|
| 5107 | /^$/ s/^.*$/1/
|
|---|
| 5108 | G
|
|---|
| 5109 | h
|
|---|
| 5110 | s/^/      /
|
|---|
| 5111 | s/^ *\(......\)\n/\1 /p
|
|---|
| 5112 | x
|
|---|
| 5113 | s/\n.*$//
|
|---|
| 5114 | /^9*$/ s/^/0/
|
|---|
| 5115 | s/.9*$/x&/
|
|---|
| 5116 | h
|
|---|
| 5117 | s/^.*x//
|
|---|
| 5118 | y/0123456789/1234567890/
|
|---|
| 5119 | x
|
|---|
| 5120 | s/x.*$//
|
|---|
| 5121 | G
|
|---|
| 5122 | s/\n//
|
|---|
| 5123 | h
|
|---|
| 5124 | @end group
|
|---|
| 5125 | @end example
|
|---|
| 5126 | @c end---------------------------------------------
|
|---|
| 5127 |
|
|---|
| 5128 | @node wc -c
|
|---|
| 5129 | @section Counting Characters
|
|---|
| 5130 |
|
|---|
| 5131 | This script shows another way to do arithmetic with @command{sed}.
|
|---|
| 5132 | In this case we have to add possibly large numbers, so implementing
|
|---|
| 5133 | this by successive increments would not be feasible (and possibly
|
|---|
| 5134 | even more complicated to contrive than this script).
|
|---|
| 5135 |
|
|---|
| 5136 | The approach is to map numbers to letters, kind of an abacus
|
|---|
| 5137 | implemented with @command{sed}. @samp{a}s are units, @samp{b}s are
|
|---|
| 5138 | tens and so on: we simply add the number of characters
|
|---|
| 5139 | on the current line as units, and then propagate the carry
|
|---|
| 5140 | to tens, hundreds, and so on.
|
|---|
| 5141 |
|
|---|
| 5142 | As usual, running totals are kept in hold space.
|
|---|
| 5143 |
|
|---|
| 5144 | On the last line, we convert the abacus form back to decimal.
|
|---|
| 5145 | For the sake of variety, this is done with a loop rather than
|
|---|
| 5146 | with some 80 @code{s} commands@footnote{Some implementations
|
|---|
| 5147 | have a limit of 199 commands per script}: first we
|
|---|
| 5148 | convert units, removing @samp{a}s from the number; then we
|
|---|
| 5149 | rotate letters so that tens become @samp{a}s, and so on
|
|---|
| 5150 | until no more letters remain.
|
|---|
| 5151 |
|
|---|
| 5152 | @c start-------------------------------------------
|
|---|
| 5153 | @example
|
|---|
| 5154 | #!/usr/bin/sed -nf
|
|---|
| 5155 |
|
|---|
| 5156 | @group
|
|---|
| 5157 | # Add n+1 a's to hold space (+1 is for the newline)
|
|---|
| 5158 | s/./a/g
|
|---|
| 5159 | H
|
|---|
| 5160 | x
|
|---|
| 5161 | s/\n/a/
|
|---|
| 5162 | @end group
|
|---|
| 5163 |
|
|---|
| 5164 | @group
|
|---|
| 5165 | # Do the carry. The t's and b's are not necessary,
|
|---|
| 5166 | # but they do speed up the thing
|
|---|
| 5167 | t a
|
|---|
| 5168 | : a; s/aaaaaaaaaa/b/g; t b; b done
|
|---|
| 5169 | : b; s/bbbbbbbbbb/c/g; t c; b done
|
|---|
| 5170 | : c; s/cccccccccc/d/g; t d; b done
|
|---|
| 5171 | : d; s/dddddddddd/e/g; t e; b done
|
|---|
| 5172 | : e; s/eeeeeeeeee/f/g; t f; b done
|
|---|
| 5173 | : f; s/ffffffffff/g/g; t g; b done
|
|---|
| 5174 | : g; s/gggggggggg/h/g; t h; b done
|
|---|
| 5175 | : h; s/hhhhhhhhhh//g
|
|---|
| 5176 | @end group
|
|---|
| 5177 |
|
|---|
| 5178 | @group
|
|---|
| 5179 | : done
|
|---|
| 5180 | $! @{
|
|---|
| 5181 | h
|
|---|
| 5182 | b
|
|---|
| 5183 | @}
|
|---|
| 5184 | @end group
|
|---|
| 5185 |
|
|---|
| 5186 | # On the last line, convert back to decimal
|
|---|
| 5187 |
|
|---|
| 5188 | @group
|
|---|
| 5189 | : loop
|
|---|
| 5190 | /a/! s/[b-h]*/&0/
|
|---|
| 5191 | s/aaaaaaaaa/9/
|
|---|
| 5192 | s/aaaaaaaa/8/
|
|---|
| 5193 | s/aaaaaaa/7/
|
|---|
| 5194 | s/aaaaaa/6/
|
|---|
| 5195 | s/aaaaa/5/
|
|---|
| 5196 | s/aaaa/4/
|
|---|
| 5197 | s/aaa/3/
|
|---|
| 5198 | s/aa/2/
|
|---|
| 5199 | s/a/1/
|
|---|
| 5200 | @end group
|
|---|
| 5201 |
|
|---|
| 5202 | @group
|
|---|
| 5203 | : next
|
|---|
| 5204 | y/bcdefgh/abcdefg/
|
|---|
| 5205 | /[a-h]/ b loop
|
|---|
| 5206 | p
|
|---|
| 5207 | @end group
|
|---|
| 5208 | @end example
|
|---|
| 5209 | @c end---------------------------------------------
|
|---|
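The carry and the conversion back to decimal can be watched on a single line of input. The following is a reduced sketch, not the full script: it carries only as far as @samp{d} (enough for totals below 10000), and the helper names @code{unary} and @code{abacus_to_decimal} are made up for this example. It assumes a @command{sed} that accepts indented labels and comments, as GNU @command{sed} does.

```shell
#!/bin/sh
# unary N: hypothetical helper that prints a line of N letters "a".
unary() {
  printf "%${1}s\n" '' | tr ' ' 'a'
}

# abacus_to_decimal: hypothetical helper that reads a run of "a"
# and prints how many there were, in decimal.
abacus_to_decimal() {
  sed -n '
    # Carry: ten of one letter become one of the next letter.
    s/aaaaaaaaaa/b/g
    s/bbbbbbbbbb/c/g
    s/cccccccccc/d/g

    :loop
    # No units left in this position: the digit is zero.
    /a/! s/[b-h]*/&0/
    s/aaaaaaaaa/9/
    s/aaaaaaaa/8/
    s/aaaaaaa/7/
    s/aaaaaa/6/
    s/aaaaa/5/
    s/aaaa/4/
    s/aaa/3/
    s/aa/2/
    s/a/1/
    # Shift every letter down one place and repeat.
    y/bcdefgh/abcdefg/
    /[a-h]/ b loop
    p
  '
}

unary 105 | abacus_to_decimal
```

For 105 units the carry leaves @samp{caaaaa}; the loop then turns the units into @samp{5}, finds no units in the tens position and inserts @samp{0}, and finally converts the single hundred.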
| 5210 |
|
|---|
| 5211 | @node wc -w
|
|---|
| 5212 | @section Counting Words
|
|---|
| 5213 |
|
|---|
| 5214 | This script is almost the same as the previous one, once each
|
|---|
| 5215 | of the words on the line is converted to a single @samp{a}
|
|---|
| 5216 | (in the previous script each letter was changed to an @samp{a}).
|
|---|
| 5217 |
|
|---|
| 5218 | It is interesting that real @command{wc} programs have optimized
|
|---|
| 5219 | loops for @samp{wc -c}, so they are much slower at counting
|
|---|
| 5220 | words than at counting characters. This script's bottleneck,
|
|---|
| 5221 | instead, is arithmetic, and hence the word-counting one
|
|---|
| 5222 | is faster (it has to manage smaller numbers).
|
|---|
| 5223 |
|
|---|
| 5224 | Again, the common parts are not commented to show the importance
|
|---|
| 5225 | of commenting @command{sed} scripts.
|
|---|
| 5226 |
|
|---|
| 5227 | @c start-------------------------------------------
|
|---|
| 5228 | @example
|
|---|
| 5229 | #!/usr/bin/sed -nf
|
|---|
| 5230 |
|
|---|
| 5231 | @group
|
|---|
| 5232 | # Convert words to a's
|
|---|
| 5233 | s/[ @kbd{@key{TAB}}][ @kbd{@key{TAB}}]*/ /g
|
|---|
| 5234 | s/^/ /
|
|---|
| 5235 | s/ [^ ][^ ]*/a /g
|
|---|
| 5236 | s/ //g
|
|---|
| 5237 | @end group
|
|---|
| 5238 |
|
|---|
| 5239 | @group
|
|---|
| 5240 | # Append them to hold space
|
|---|
| 5241 | H
|
|---|
| 5242 | x
|
|---|
| 5243 | s/\n//
|
|---|
| 5244 | @end group
|
|---|
| 5245 |
|
|---|
| 5246 | @group
|
|---|
| 5247 | # From here on it is the same as in wc -c.
|
|---|
| 5248 | /aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
|
|---|
| 5249 | /bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
|
|---|
| 5250 | /cccccccccc/! bx; s/cccccccccc/d/g
|
|---|
| 5251 | /dddddddddd/! bx; s/dddddddddd/e/g
|
|---|
| 5252 | /eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
|
|---|
| 5253 | /ffffffffff/! bx; s/ffffffffff/g/g
|
|---|
| 5254 | /gggggggggg/! bx; s/gggggggggg/h/g
|
|---|
| 5255 | s/hhhhhhhhhh//g
|
|---|
| 5256 | :x
|
|---|
| 5257 | $! @{ h; b; @}
|
|---|
| 5258 | :y
|
|---|
| 5259 | /a/! s/[b-h]*/&0/
|
|---|
| 5260 | s/aaaaaaaaa/9/
|
|---|
| 5261 | s/aaaaaaaa/8/
|
|---|
| 5262 | s/aaaaaaa/7/
|
|---|
| 5263 | s/aaaaaa/6/
|
|---|
| 5264 | s/aaaaa/5/
|
|---|
| 5265 | s/aaaa/4/
|
|---|
| 5266 | s/aaa/3/
|
|---|
| 5267 | s/aa/2/
|
|---|
| 5268 | s/a/1/
|
|---|
| 5269 | y/bcdefgh/abcdefg/
|
|---|
| 5270 | /[a-h]/ by
|
|---|
| 5271 | p
|
|---|
| 5272 | @end group
|
|---|
| 5273 | @end example
|
|---|
| 5274 | @c end---------------------------------------------
|
|---|
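The only new part of this script is the conversion of words to @samp{a}s, which can be tried on its own. In this sketch the function name @code{words_to_unary} is hypothetical, and the tab character is produced with @command{printf} so that it stays visible in the source.

```shell
#!/bin/sh
# words_to_unary is a hypothetical helper: it prints one "a" for
# every blank-separated word on each input line.
words_to_unary() {
  tab=$(printf '\t')
  sed -e "s/[ $tab][ $tab]*/ /g" \
      -e 's/^/ /' \
      -e 's/ [^ ][^ ]*/a /g' \
      -e 's/ //g'
}

# Four words, separated by a mix of spaces and a tab.
printf 'hello  brave new\tworld\n' | words_to_unary
```

The space added at the start of the line lets a single pattern, a space followed by non-spaces, match the first word in the same way as every other word.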
| 5275 |
|
|---|
| 5276 | @node wc -l
|
|---|
| 5277 | @section Counting Lines
|
|---|
| 5278 |
|
|---|
| 5279 | No strange things are done now, because @command{sed} gives us
|
|---|
| 5280 | @samp{wc -l} functionality for free!!! Look:
|
|---|
| 5281 |
|
|---|
| 5282 | @c start-------------------------------------------
|
|---|
| 5283 | @example
|
|---|
| 5284 | @group
|
|---|
| 5285 | #!/usr/bin/sed -nf
|
|---|
| 5286 | $=
|
|---|
| 5287 | @end group
|
|---|
| 5288 | @end example
|
|---|
| 5289 | @c end---------------------------------------------
|
|---|
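A quick way to see this at work from a shell, as a small sketch (the function name @code{count_lines} is an assumption for illustration):

```shell
#!/bin/sh
# count_lines is a hypothetical wrapper around the one-command script.
count_lines() {
  sed -n '$=' "$@"
}

printf 'a\nb\nc\n' | count_lines
```

One difference from @samp{wc -l} is worth knowing: on empty input there is no last line, so the @code{=} command never runs and nothing at all is printed.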
| 5290 |
|
|---|
| 5291 | @node head
|
|---|
| 5292 | @section Printing the First Lines
|
|---|
| 5293 |
|
|---|
| 5294 | This script is probably the simplest useful @command{sed} script.
|
|---|
| 5295 | It displays the first 10 lines of input; the number of displayed
|
|---|
| 5296 | lines is right before the @code{q} command.
|
|---|
| 5297 |
|
|---|
| 5298 | @c start-------------------------------------------
|
|---|
| 5299 | @example
|
|---|
| 5300 | @group
|
|---|
| 5301 | #!/usr/bin/sed -f
|
|---|
| 5302 | 10q
|
|---|
| 5303 | @end group
|
|---|
| 5304 | @end example
|
|---|
| 5305 | @c end---------------------------------------------
|
|---|
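Since the line count is simply the address of the @code{q} command, it is easy to make it a parameter. A minimal sketch, with @code{sed_head} as a hypothetical function name:

```shell
#!/bin/sh
# sed_head N [FILE...]: hypothetical helper that prints the first N lines.
sed_head() {
  n=$1
  shift
  sed "${n}q" "$@"
}

printf '%s\n' 1 2 3 4 5 | sed_head 2
```

The @code{q} command prints the current line before quitting, so line @var{n} itself is included in the output.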
| 5306 |
|
|---|
| 5307 | @node tail
|
|---|
| 5308 | @section Printing the Last Lines
|
|---|
| 5309 |
|
|---|
| 5310 | Printing the last @var{n} lines rather than the first is more complex
|
|---|
| 5311 | but indeed possible. @var{n} is encoded in the second line, before
|
|---|
| 5312 | the bang character.
|
|---|
| 5313 |
|
|---|
| 5314 | This script is similar to the @command{tac} script in that it keeps the
|
|---|
| 5315 | final output in the hold space and prints it at the end:
|
|---|
| 5316 |
|
|---|
| 5317 | @c start-------------------------------------------
|
|---|
| 5318 | @example
|
|---|
| 5319 | #!/usr/bin/sed -nf
|
|---|
| 5320 |
|
|---|
| 5321 | @group
|
|---|
| 5322 | 1! @{; H; g; @}
|
|---|
| 5323 | 1,10 !s/[^\n]*\n//
|
|---|
| 5324 | $p
|
|---|
| 5325 | h
|
|---|
| 5326 | @end group
|
|---|
| 5327 | @end example
|
|---|
| 5328 | @c end---------------------------------------------
|
|---|
| 5329 |
|
|---|
| 5330 | Mainly, the script keeps a window of 10 lines and slides it
|
|---|
| 5331 | by adding a line and deleting the oldest (the substitution command
|
|---|
| 5332 | on the second line works like a @code{D} command but does not
|
|---|
| 5333 | restart the loop).
|
|---|
| 5334 |
|
|---|
| 5335 | The ``sliding window'' technique is a very powerful way to write
|
|---|
| 5336 | efficient and complex @command{sed} scripts, because commands like
|
|---|
| 5337 | @code{P} would require a lot of work if implemented manually.
|
|---|
| 5338 |
|
|---|
| 5339 | To introduce the technique, which is fully demonstrated in the
|
|---|
| 5340 | rest of this chapter and is based on the @code{N}, @code{P}
|
|---|
| 5341 | and @code{D} commands, here is an implementation of @command{tail}
|
|---|
| 5342 | using a simple ``sliding window.''
|
|---|
| 5343 |
|
|---|
| 5344 | This looks complicated, but in fact it works the same way as
|
|---|
| 5345 | the last script: after we have kicked in the appropriate number
|
|---|
| 5346 | of lines, however, we stop using the hold space to keep inter-line
|
|---|
| 5347 | state, and instead use @code{N} and @code{D} to slide pattern
|
|---|
| 5348 | space by one line:
|
|---|
| 5349 |
|
|---|
| 5350 | @c start-------------------------------------------
|
|---|
| 5351 | @example
|
|---|
| 5352 | #!/usr/bin/sed -f
|
|---|
| 5353 |
|
|---|
| 5354 | @group
|
|---|
| 5355 | 1h
|
|---|
| 5356 | 2,10 @{; H; g; @}
|
|---|
| 5357 | $q
|
|---|
| 5358 | 1,9d
|
|---|
| 5359 | N
|
|---|
| 5360 | D
|
|---|
| 5361 | @end group
|
|---|
| 5362 | @end example
|
|---|
| 5363 | @c end---------------------------------------------
|
|---|
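The window size appears in two addresses of the script above (@code{10} and @code{9}), so both must change together. The sketch below makes the size a parameter; the function name @code{sed_tail} is hypothetical, the brace syntax assumes GNU @command{sed}, and the window must be at least 2 lines for the addresses to remain valid.

```shell
#!/bin/sh
# sed_tail N [FILE...]: hypothetical helper that prints the last N lines
# using the sliding-window script above. N must be 2 or more.
sed_tail() {
  n=$1
  shift
  sed -e '1h' \
      -e "2,${n}{H;g;}" \
      -e '$q' \
      -e "1,$((n - 1))d" \
      -e 'N' \
      -e 'D' "$@"
}

printf '%s\n' 1 2 3 4 5 | sed_tail 3
```

With five input lines and a window of three, the hold space is used only for lines 1 to 3; from then on each cycle appends one line with @code{N} and drops one with @code{D}.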
| 5364 |
|
|---|
| 5365 | Note how the first, second and fourth lines are inactive after
|
|---|
| 5366 | the first ten lines of input. After that, all the script does
|
|---|
| 5367 | is exit on the last line of input, append the next input
|
|---|
| 5368 | line to pattern space, and remove the first line.
|
|---|
| 5369 |
|
|---|
| 5370 | @node uniq
|
|---|
| 5371 | @section Make Duplicate Lines Unique
|
|---|
| 5372 |
|
|---|
| 5373 | This is an example of the art of using the @code{N}, @code{P}
|
|---|
| 5374 | and @code{D} commands, probably the most difficult to master.
|
|---|
| 5375 |
|
|---|
| 5376 | @c start-------------------------------------------
|
|---|
| 5377 | @example
|
|---|
| 5378 | @group
|
|---|
| 5379 | #!/usr/bin/sed -f
|
|---|
| 5380 | h
|
|---|
| 5381 | @end group
|
|---|
| 5382 |
|
|---|
| 5383 | @group
|
|---|
| 5384 | :b
|
|---|
| 5385 | # On the last line, print and exit
|
|---|
| 5386 | $b
|
|---|
| 5387 | N
|
|---|
| 5388 | /^\(.*\)\n\1$/ @{
|
|---|
| 5389 | # The two lines are identical. Undo the effect of
|
|---|
| 5390 | # the N command.
|
|---|
| 5391 | g
|
|---|
| 5392 | bb
|
|---|
| 5393 | @}
|
|---|
| 5394 | @end group
|
|---|
| 5395 |
|
|---|
| 5396 | @group
|
|---|
| 5397 | # If the @code{N} command had added the last line, print and exit
|
|---|
| 5398 | $b
|
|---|
| 5399 | @end group
|
|---|
| 5400 |
|
|---|
| 5401 | @group
|
|---|
| 5402 | # The lines are different; print the first and go
|
|---|
| 5403 | # back working on the second.
|
|---|
| 5404 | P
|
|---|
| 5405 | D
|
|---|
| 5406 | @end group
|
|---|
| 5407 | @end example
|
|---|
| 5408 | @c end---------------------------------------------
|
|---|
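A short run makes the behavior of the script concrete. This sketch embeds the same commands in a shell function; the name @code{sed_uniq} is hypothetical, and the indented label assumes GNU @command{sed}.

```shell
#!/bin/sh
# sed_uniq is a hypothetical wrapper around the script above.
sed_uniq() {
  sed '
    h
    :b
    # On the last line, print and exit.
    $b
    N
    /^\(.*\)\n\1$/ {
      # Identical lines: restore the saved copy and look again.
      g
      bb
    }
    # If N added the last line, print both and exit.
    $b
    # Different lines: print the first, keep working on the second.
    P
    D
  ' "$@"
}

printf 'a\na\nb\nb\nb\nc\n' | sed_uniq
```

Each run of equal lines collapses to one copy: whenever @code{N} brings in a duplicate, @code{g} throws it away by reloading the line saved with @code{h}.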
| 5409 |
|
|---|
| 5410 | As you can see, we maintain a 2-line window using @code{P} and @code{D}.
|
|---|
| 5411 | This technique is often used in advanced @command{sed} scripts.
|
|---|
| 5412 |
|
|---|
| 5413 | @node uniq -d
|
|---|
| 5414 | @section Print Duplicated Lines of Input
|
|---|
| 5415 |
|
|---|
| 5416 | This script prints only duplicated lines, like @samp{uniq -d}.
|
|---|
| 5417 |
|
|---|
| 5418 | @c start-------------------------------------------
|
|---|
| 5419 | @example
|
|---|
| 5420 | #!/usr/bin/sed -nf
|
|---|
| 5421 |
|
|---|
| 5422 | @group
|
|---|
| 5423 | $b
|
|---|
| 5424 | N
|
|---|
| 5425 | /^\(.*\)\n\1$/ @{
|
|---|
| 5426 | # Print the first of the duplicated lines
|
|---|
| 5427 | s/.*\n//
|
|---|
| 5428 | p
|
|---|
| 5429 | @end group
|
|---|
| 5430 |
|
|---|
| 5431 | @group
|
|---|
| 5432 | # Loop until we get a different line
|
|---|
| 5433 | :b
|
|---|
| 5434 | $b
|
|---|
| 5435 | N
|
|---|
| 5436 | /^\(.*\)\n\1$/ @{
|
|---|
| 5437 | s/.*\n//
|
|---|
| 5438 | bb
|
|---|
| 5439 | @}
|
|---|
| 5440 | @}
|
|---|
| 5441 | @end group
|
|---|
| 5442 |
|
|---|
| 5443 | @group
|
|---|
| 5444 | # The last line cannot be followed by duplicates
|
|---|
| 5445 | $b
|
|---|
| 5446 | @end group
|
|---|
| 5447 |
|
|---|
| 5448 | @group
|
|---|
| 5449 | # Found a different one. Leave it alone in the pattern space
|
|---|
| 5450 | # and go back to the top, hunting its duplicates
|
|---|
| 5451 | D
|
|---|
| 5452 | @end group
|
|---|
| 5453 | @end example
|
|---|
| 5454 | @c end---------------------------------------------
|
|---|
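As a check of the script above, here it is wrapped in a shell function and run on a small sample. The name @code{sed_uniq_d} is hypothetical, and the indented label again assumes GNU @command{sed}.

```shell
#!/bin/sh
# sed_uniq_d is a hypothetical wrapper around the script above.
sed_uniq_d() {
  sed -n '
    $b
    N
    /^\(.*\)\n\1$/ {
      # Print the first of the duplicated lines.
      s/.*\n//
      p
      # Skip any further copies of it.
      :b
      $b
      N
      /^\(.*\)\n\1$/ {
        s/.*\n//
        bb
      }
    }
    # The last line cannot be followed by duplicates.
    $b
    # A different line: keep it and start over.
    D
  ' "$@"
}

printf 'a\na\nb\nc\nc\nc\nd\n' | sed_uniq_d
```

Only @samp{a} and @samp{c} are printed, once each, however many times they are repeated.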
| 5455 |
|
|---|
| 5456 | @node uniq -u
|
|---|
| 5457 | @section Remove All Duplicated Lines
|
|---|
| 5458 |
|
|---|
| 5459 | This script prints only unique lines, like @samp{uniq -u}.
|
|---|
| 5460 |
|
|---|
| 5461 | @c start-------------------------------------------
|
|---|
| 5462 | @example
|
|---|
| 5463 | #!/usr/bin/sed -f
|
|---|
| 5464 |
|
|---|
| 5465 | @group
|
|---|
| 5466 | # Search for a duplicate line --- until that, print what you find.
|
|---|
| 5467 | $b
|
|---|
| 5468 | N
|
|---|
| 5469 | /^\(.*\)\n\1$/ ! @{
|
|---|
| 5470 | P
|
|---|
| 5471 | D
|
|---|
| 5472 | @}
|
|---|
| 5473 | @end group
|
|---|
| 5474 |
|
|---|
| 5475 | @group
|
|---|
| 5476 | :c
|
|---|
| 5477 | # Got two equal lines in pattern space. At the
|
|---|
| 5478 | # end of the file we simply exit
|
|---|
| 5479 | $d
|
|---|
| 5480 | @end group
|
|---|
| 5481 |
|
|---|
| 5482 | @group
|
|---|
| 5483 | # Else, we keep reading lines with @code{N} until we
|
|---|
| 5484 | # find a different one
|
|---|
| 5485 | s/.*\n//
|
|---|
| 5486 | N
|
|---|
| 5487 | /^\(.*\)\n\1$/ @{
|
|---|
| 5488 | bc
|
|---|
| 5489 | @}
|
|---|
| 5490 | @end group
|
|---|
| 5491 |
|
|---|
| 5492 | @group
|
|---|
| 5493 | # Remove the last instance of the duplicate line
|
|---|
| 5494 | # and go back to the top
|
|---|
| 5495 | D
|
|---|
| 5496 | @end group
|
|---|
| 5497 | @end example
|
|---|
| 5498 | @c end---------------------------------------------
|
|---|
| 5499 |
|
|---|
| 5500 | @node cat -s
|
|---|
| 5501 | @section Squeezing Blank Lines
|
|---|
| 5502 |
|
|---|
| 5503 | As a final example, here are three scripts, of increasing complexity
|
|---|
| 5504 | and speed, that implement the same function as @samp{cat -s}, that is
|
|---|
| 5505 | squeezing blank lines.
|
|---|
| 5506 |
|
|---|
| 5507 | The first leaves a blank line at the beginning and end if there are
|
|---|
| 5508 | some already.
|
|---|
| 5509 |
|
|---|
| 5510 | @c start-------------------------------------------
|
|---|
| 5511 | @example
|
|---|
| 5512 | #!/usr/bin/sed -f
|
|---|
| 5513 |
|
|---|
| 5514 | @group
|
|---|
| 5515 | # on empty lines, join with next
|
|---|
| 5516 | # Note there is a star in the regexp
|
|---|
| 5517 | :x
|
|---|
| 5518 | /^\n*$/ @{
|
|---|
| 5519 | N
|
|---|
| 5520 | bx
|
|---|
| 5521 | @}
|
|---|
| 5522 | @end group
|
|---|
| 5523 |
|
|---|
| 5524 | @group
|
|---|
| 5525 | # now, squeeze all '\n', this can be also done by:
|
|---|
| 5526 | # s/^\(\n\)*/\1/
|
|---|
| 5527 | s/\n\n*/\
|
|---|
| 5528 | /
|
|---|
| 5529 | @end group
|
|---|
| 5530 | @end example
|
|---|
| 5531 | @c end---------------------------------------------
|
|---|
| 5532 |
|
|---|
| 5533 | This one is a bit more complex and removes all empty lines
|
|---|
| 5534 | at the beginning. It does leave a single blank line at end
|
|---|
| 5535 | if one was there.
|
|---|
| 5536 |
|
|---|
| 5537 | @c start-------------------------------------------
|
|---|
| 5538 | @example
|
|---|
| 5539 | #!/usr/bin/sed -f
|
|---|
| 5540 |
|
|---|
| 5541 | @group
|
|---|
| 5542 | # delete all leading empty lines
|
|---|
| 5543 | 1,/^./@{
|
|---|
| 5544 | /./!d
|
|---|
| 5545 | @}
|
|---|
| 5546 | @end group
|
|---|
| 5547 |
|
|---|
| 5548 | @group
|
|---|
| 5549 | # on an empty line we remove it and all the following
|
|---|
| 5550 | # empty lines, but one
|
|---|
| 5551 | :x
|
|---|
| 5552 | /./!@{
|
|---|
| 5553 | N
|
|---|
| 5554 | s/^\n$//
|
|---|
| 5555 | tx
|
|---|
| 5556 | @}
|
|---|
| 5557 | @end group
|
|---|
| 5558 | @end example
|
|---|
| 5559 | @c end---------------------------------------------
|
|---|
| 5560 |
|
|---|
| 5561 | This removes leading and trailing blank lines. It is also the
|
|---|
| 5562 | fastest. Note that loops are completely done with @code{n} and
|
|---|
| 5563 | @code{b}, without relying on @command{sed} to restart the
|
|---|
| 5564 | script automatically at the end of a line.
|
|---|
| 5565 |
|
|---|
| 5566 | @c start-------------------------------------------
|
|---|
| 5567 | @example
|
|---|
| 5568 | #!/usr/bin/sed -nf
|
|---|
| 5569 |
|
|---|
| 5570 | @group
|
|---|
| 5571 | # delete all (leading) blanks
|
|---|
| 5572 | /./!d
|
|---|
| 5573 | @end group
|
|---|
| 5574 |
|
|---|
| 5575 | @group
|
|---|
| 5576 | # get here: so there is a non empty
|
|---|
| 5577 | :x
|
|---|
| 5578 | # print it
|
|---|
| 5579 | p
|
|---|
| 5580 | # get next
|
|---|
| 5581 | n
|
|---|
| 5582 | # got chars? print it again, etc...
|
|---|
| 5583 | /./bx
|
|---|
| 5584 | @end group
|
|---|
| 5585 |
|
|---|
| 5586 | @group
|
|---|
| 5587 | # no, don't have chars: got an empty line
|
|---|
| 5588 | :z
|
|---|
| 5589 | # get next, if last line we finish here so no trailing
|
|---|
| 5590 | # empty lines are written
|
|---|
| 5591 | n
|
|---|
| 5592 | # also empty? then ignore it, and get next... this will
|
|---|
| 5593 | # remove ALL empty lines
|
|---|
| 5594 | /./!bz
|
|---|
| 5595 | @end group
|
|---|
| 5596 |
|
|---|
| 5597 | @group
|
|---|
| 5598 | # all empty lines were deleted/ignored, but we have a non empty. As
|
|---|
| 5599 | # what we want to do is to squeeze, insert a blank line artificially
|
|---|
| 5600 | i\
|
|---|
| 5601 | @end group
|
|---|
| 5602 |
|
|---|
| 5603 | bx
|
|---|
| 5604 | @end example
|
|---|
| 5605 | @c end---------------------------------------------
|
|---|
| 5606 |
|
|---|
| 5607 | @node Limitations
|
|---|
| 5608 | @chapter @value{SSED}'s Limitations and Non-limitations
|
|---|
| 5609 |
|
|---|
| 5610 | @cindex GNU extensions, unlimited line length
|
|---|
| 5611 | @cindex Portability, line length limitations
|
|---|
| 5612 | For those who want to write portable @command{sed} scripts,
|
|---|
| 5613 | be aware that some implementations have been known to
|
|---|
| 5614 | limit line lengths (for the pattern and hold spaces)
|
|---|
| 5615 | to be no more than 4000 bytes.
|
|---|
| 5616 | The @sc{posix} standard specifies that conforming @command{sed}
|
|---|
| 5617 | implementations shall support at least 8192 byte line lengths.
|
|---|
| 5618 | @value{SSED} has no built-in limit on line length;
|
|---|
| 5619 | as long as it can @code{malloc()} more (virtual) memory,
|
|---|
| 5620 | you can feed or construct lines as long as you like.
|
|---|
| 5621 |
|
|---|
| 5622 | However, recursion is used to handle subpatterns and indefinite
|
|---|
| 5623 | repetition. This means that the available stack space may limit
|
|---|
| 5624 | the size of the buffer that can be processed by certain patterns.
|
|---|
| 5625 |
|
|---|
| 5626 |
|
|---|
| 5627 | @node Other Resources
|
|---|
| 5628 | @chapter Other Resources for Learning About @command{sed}
|
|---|
| 5629 |
|
|---|
| 5630 | For up to date information about @value{SSED} please
|
|---|
| 5631 | visit @uref{https://www.gnu.org/software/sed/}.
|
|---|
| 5632 |
|
|---|
| 5633 | Send general questions and suggestions to @email{sed-devel@@gnu.org}.
|
|---|
| 5634 | Visit the mailing list archives for past discussions at
|
|---|
| 5635 | @uref{https://lists.gnu.org/archive/html/sed-devel/}.
|
|---|
| 5636 |
|
|---|
| 5637 | @cindex Additional reading about @command{sed}
|
|---|
| 5638 | The following resources provide information about @command{sed}
|
|---|
| 5639 | (both @value{SSED} and other variations). Note that these are not maintained by
|
|---|
| 5640 | @value{SSED} developers.
|
|---|
| 5641 |
|
|---|
| 5642 | @itemize @bullet
|
|---|
| 5643 |
|
|---|
| 5644 | @item
|
|---|
| 5645 | sed @code{$HOME}: @uref{http://sed.sf.net}
|
|---|
| 5646 |
|
|---|
| 5647 | @item
|
|---|
| 5648 | sed FAQ: @uref{http://sed.sf.net/sedfaq.html}
|
|---|
| 5649 |
|
|---|
| 5650 | @item
|
|---|
| 5651 | seder's grabbag: @uref{http://sed.sf.net/grabbag}
|
|---|
| 5652 |
|
|---|
| 5653 | @item
|
|---|
| 5654 | The @code{sed-users} mailing list maintained by Sven Guckes:
|
|---|
| 5655 | @uref{http://groups.yahoo.com/group/sed-users/}
|
|---|
| 5656 | (note this is @emph{not} the @value{SSED} mailing list).
|
|---|
| 5657 |
|
|---|
| 5658 | @end itemize
|
|---|
| 5659 |
|
|---|
| 5660 | @node Reporting Bugs
|
|---|
| 5661 | @chapter Reporting Bugs
|
|---|
| 5662 |
|
|---|
| 5663 | @cindex Bugs, reporting
|
|---|
| 5664 | Email bug reports to @email{bug-sed@@gnu.org}.
|
|---|
| 5665 | Also, please include the output of @samp{sed --version} in the body
|
|---|
| 5666 | of your report if at all possible.
|
|---|
| 5667 |
|
|---|
| 5668 | Please do not send a bug report like this:
|
|---|
| 5669 |
|
|---|
| 5670 | @example
|
|---|
| 5671 | @i{@i{@r{while building frobme-1.3.4}}}
|
|---|
| 5672 | $ configure
|
|---|
| 5673 | @error{} sed: file sedscr line 1: Unknown option to 's'
|
|---|
| 5674 | @end example
|
|---|
| 5675 |
|
|---|
| 5676 | If @value{SSED} doesn't configure your favorite package, take a
|
|---|
| 5677 | few extra minutes to identify the specific problem and make a stand-alone
|
|---|
| 5678 | test case. Unlike other programs such as C compilers, making such test
|
|---|
| 5679 | cases for @command{sed} is quite simple.
|
|---|
| 5680 |
|
|---|
| 5681 | A stand-alone test case includes all the data necessary to perform the
|
|---|
| 5682 | test, and the specific invocation of @command{sed} that causes the problem.
|
|---|
| 5683 | The smaller a stand-alone test case is, the better. A test case should
|
|---|
| 5684 | not involve something as far removed from @command{sed} as ``try to configure
|
|---|
| 5685 | frobme-1.3.4''. Yes, that is in principle enough information to look
|
|---|
| 5686 | for the bug, but that is not a very practical prospect.
|
|---|
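A stand-alone test case along these lines might look like the sketch below: the input, the exact invocation, and the expected output all fit in a few lines. The substitution and the sample text are placeholders chosen for illustration, not a real bug.

```shell
#!/bin/sh
# Placeholder test case: replace the input, the command, and the
# expected text with the ones that show the problem.
sed --version | sed 1q

expected='hello sed'
actual=$(printf 'hello world\n' | sed 's/world/sed/')

if [ "$actual" = "$expected" ]; then
  echo 'ok'
else
  printf 'expected: %s\nactual:   %s\n' "$expected" "$actual"
fi
```

Anyone can paste such a script into a shell and see the same result, with no package to download or configure.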
| 5687 |
|
|---|
| 5688 | Here are a few commonly reported bugs that are not bugs.
|
|---|
| 5689 |
|
|---|
| 5690 | @table @asis
|
|---|
| 5691 | @anchor{N_command_last_line}
|
|---|
| 5692 | @item @code{N} command on the last line
|
|---|
| 5693 | @cindex Portability, @code{N} command on the last line
|
|---|
| 5694 | @cindex Non-bugs, @code{N} command on the last line
|
|---|
| 5695 |
|
|---|
| 5696 | Most versions of @command{sed} exit without printing anything when
|
|---|
| 5697 | the @code{N} command is issued on the last line of a file.
|
|---|
| 5698 | @value{SSED} prints pattern space before exiting unless of course
|
|---|
| 5699 | the @option{-n} command switch has been specified. This choice is
|
|---|
| 5700 | by design.
|
|---|
| 5701 |
|
|---|
| 5702 | Default behavior (GNU extension, non-POSIX conforming):
|
|---|
| 5703 | @example
|
|---|
| 5704 | $ seq 3 | sed N
|
|---|
| 5705 | 1
|
|---|
| 5706 | 2
|
|---|
| 5707 | 3
|
|---|
| 5708 | @end example
|
|---|
| 5709 | @noindent
|
|---|
| 5710 | To force POSIX-conforming behavior:
|
|---|
| 5711 | @example
|
|---|
| 5712 | $ seq 3 | sed --posix N
|
|---|
| 5713 | 1
|
|---|
| 5714 | 2
|
|---|
| 5715 | @end example
|
|---|
| 5716 |
|
|---|
| 5717 | With the traditional behavior, for example, the result of
|
|---|
| 5718 | @example
|
|---|
| 5719 | sed N foo bar
|
|---|
| 5720 | @end example
|
|---|
| 5721 | @noindent
|
|---|
| 5722 | would depend on whether foo has an even or an odd number of
|
|---|
| 5723 | lines@footnote{which is the actual ``bug'' that prompted the
|
|---|
| 5724 | change in behavior}. Or, when writing a script to read the
|
|---|
| 5725 | next few lines following a pattern match, traditional
|
|---|
| 5726 | implementations of @command{sed} would force you to write
|
|---|
| 5727 | something like
|
|---|
| 5728 | @example
|
|---|
| 5729 | /foo/@{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N @}
|
|---|
| 5730 | @end example
|
|---|
| 5731 | @noindent
|
|---|
| 5732 | instead of just
|
|---|
| 5733 | @example
|
|---|
| 5734 | /foo/@{ N;N;N;N;N;N;N;N;N; @}
|
|---|
| 5735 | @end example
|
|---|
| 5736 |
|
|---|
| 5737 | @cindex @code{POSIXLY_CORRECT} behavior, @code{N} command
|
|---|
| 5738 | In any case, the simplest workaround is to use @code{$d;N} in
|
|---|
| 5739 | scripts that rely on the traditional behavior, or to set
|
|---|
| 5740 | the @code{POSIXLY_CORRECT} variable to a non-empty value.
|
|---|
| 5741 |
|
|---|
| 5742 | @item Regex syntax clashes (problems with backslashes)
|
|---|
| 5743 | @cindex GNU extensions, to basic regular expressions
|
|---|
| 5744 | @cindex Non-bugs, regex syntax clashes
|
|---|
| 5745 | @command{sed} uses the @sc{posix} basic regular expression syntax. According to
|
|---|
| 5746 | the standard, the meaning of some escape sequences is undefined in
|
|---|
| 5747 | this syntax; notable in the case of @command{sed} are @code{\|},
|
|---|
| 5748 | @code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<},
|
|---|
| 5749 | @code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.
|
|---|
| 5750 |
|
|---|
| 5751 | As in all GNU programs that use @sc{posix} basic regular
|
|---|
| 5752 | expressions, @command{sed} interprets these escape sequences as special
|
|---|
| 5753 | characters. So, @code{x\+} matches one or more occurrences of @samp{x}.
|
|---|
| 5754 | @code{abc\|def} matches either @samp{abc} or @samp{def}.
|
|---|
| 5755 |
|
|---|
| 5756 | This syntax may cause problems when running scripts written for other
|
|---|
| 5757 | @command{sed}s. Some @command{sed} programs have been written with the
|
|---|
| 5758 | assumption that @code{\|} and @code{\+} match the literal characters
|
|---|
| 5759 | @code{|} and @code{+}. Such scripts must be modified by removing the
|
|---|
| 5760 | spurious backslashes if they are to be used with modern implementations
|
|---|
| 5761 | of @command{sed}, like
|
|---|
| 5762 | GNU @command{sed}.
|
|---|
| 5763 |
|
|---|
| 5764 | On the other hand, some scripts use @code{s|abc\|def||g} to remove occurrences
|
|---|
| 5765 | of @emph{either} @code{abc} or @code{def}. While this worked until
|
|---|
| 5766 | @command{sed} 4.0.x, newer versions interpret this as removing the
|
|---|
| 5767 | string @code{abc|def}. This is again undefined behavior according to
|
|---|
| 5768 | POSIX, and this interpretation is arguably more robust: older
|
|---|
| 5769 | @command{sed}s, for example, required that the regex matcher parse
|
|---|
| 5770 | @code{\/} as @code{/} in the common case of escaping a slash, which is
|
|---|
| 5771 | again undefined behavior; the new behavior avoids this, and this is good
|
|---|
| 5772 | because the regex matcher is only partially under our control.
|
|---|
| 5773 |
|
|---|
| 5774 | @cindex GNU extensions, special escapes
|
|---|
| 5775 | In addition, this version of @command{sed} supports several escape characters
|
|---|
| 5776 | (some of which are multi-character) to insert non-printable characters
|
|---|
| 5777 | in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r},
|
|---|
| 5778 | @code{\t}, @code{\v}, @code{\x}). These can cause similar problems
|
|---|
| 5779 | with scripts written for other @command{sed}s.
|
|---|
| 5780 |
|
|---|
| 5781 | @item @option{-i} clobbers read-only files
|
|---|
| 5782 | @cindex In-place editing
|
|---|
| 5783 | @cindex @value{SSEDEXT}, in-place editing
|
|---|
| 5784 | @cindex Non-bugs, in-place editing
|
|---|
| 5785 |
|
|---|
| 5786 | In short, @samp{sed -i} will let you delete the contents of
|
|---|
| 5787 | a read-only file, and in general the @option{-i} option
|
|---|
| 5788 | (@pxref{Invoking sed, , Invocation}) lets you clobber
|
|---|
| 5789 | protected files. This is not a bug, but rather a consequence
|
|---|
| 5790 | of how the Unix file system works.
|
|---|
| 5791 |
|
|---|
| 5792 | The permissions on a file say what can happen to the data
|
|---|
| 5793 | in that file, while the permissions on a directory say what can
|
|---|
| 5794 | happen to the list of files in that directory. @samp{sed -i}
|
|---|
| 5795 | will not ever open for writing a file that is already on disk.
|
|---|
| 5796 | Rather, it will work on a temporary file that is finally renamed
|
|---|
| 5797 | to the original name: if you rename or delete files, you're actually
|
|---|
| 5798 | modifying the contents of the directory, so the operation depends on
|
|---|
| 5799 | the permissions of the directory, not of the file. For this same
|
|---|
| 5800 | reason, @command{sed} does not let you use @option{-i} on a writable file
|
|---|
| 5801 | in a read-only directory, and will break hard or symbolic links when
|
|---|
| 5802 | @option{-i} is used on such a file.
|
|---|
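
The behavior described above can be reproduced in a scratch directory. This is only a sketch, assuming GNU @command{sed}, a non-root user, and the @command{mktemp} utility; the file names @file{f} and @file{g} are hypothetical.

```shell
dir=$(mktemp -d)                 # scratch directory, writable by us
printf 'hello\n' > "$dir/f"
ln "$dir/f" "$dir/g"             # g is a hard link to f
chmod 444 "$dir/f"               # the file itself is now read-only

# Succeeds anyway: sed writes a temporary file and renames it over f,
# which only needs write permission on the directory.
sed -i 's/hello/bye/' "$dir/f"

cat "$dir/f"                     # prints: bye
cat "$dir/g"                     # prints: hello (the hard link was broken)
rm -rf "$dir"
```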
| 5803 |
|
|---|
| 5804 | @item @code{0a} does not work (gives an error)
|
|---|
| 5805 | @cindex @code{0} address
|
|---|
| 5806 | @cindex GNU extensions, @code{0} address
|
|---|
| 5807 | @cindex Non-bugs, @code{0} address
|
|---|
| 5808 |
|
|---|
| 5809 | There is no line 0. @code{0} is a special address that is only used to treat
|
|---|
| 5810 | addresses like @code{0,/@var{RE}/} as active when the script starts: if
|
|---|
| 5811 | you write @code{1,/abc/d} and the first line includes the string @samp{abc},
|
|---|
| 5812 | then that match would be ignored because address ranges must span at least
|
|---|
| 5813 | two lines (barring the end of the file); but what you probably wanted is
|
|---|
| 5814 | to delete every line up to the first one including @samp{abc}, and this
|
|---|
| 5815 | is obtained with @code{0,/abc/d}.
|
|---|
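
A small sketch of the difference, assuming GNU @command{sed} (the @code{0,/@var{RE}/} form is a GNU extension) and an illustrative four-line input:

```shell
# 1,/abc/ : the end regexp is first tried on line 2, so the match on
# line 1 is ignored and lines 1-3 are deleted.
printf 'abc\nx\nabc\ny\n' | sed '1,/abc/d'    # prints: y

# 0,/abc/ : the range is already active on line 1, so it ends there.
printf 'abc\nx\nabc\ny\n' | sed '0,/abc/d'    # prints: x, abc, y (one per line)

# Address 0 on its own is rejected.
sed '0a text' </dev/null 2>/dev/null || echo 'error: invalid use of address 0'
```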
| 5816 |
|
|---|
| 5817 | @ifclear PERL
|
|---|
| 5818 | @item @code{[a-z]} is case insensitive
|
|---|
| 5819 | @cindex Non-bugs, localization-related
|
|---|
| 5820 |
|
|---|
| 5821 | You are encountering problems with locales. POSIX mandates that @code{[a-z]}
|
|---|
| 5822 | uses the current locale's collation order -- in C parlance, that means using
|
|---|
| 5823 | @code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
|
|---|
| 5824 | case-insensitive collation order, others don't.
|
|---|
| 5825 |
|
|---|
| 5826 | Another problem is that @code{[a-z]} tries to use collation symbols.
|
|---|
| 5827 | This only happens if you are on the GNU system, using
|
|---|
| 5828 | GNU libc's regular expression matcher instead of compiling the
|
|---|
| 5829 | one supplied with GNU @command{sed}. In a Danish locale, for example,
|
|---|
| 5830 | the regular expression @code{^[a-z]$} matches the string @samp{aa},
|
|---|
| 5831 | because this is a single collating symbol that comes after @samp{a}
|
|---|
| 5832 | and before @samp{b}; @samp{ll} behaves similarly in Spanish
|
|---|
| 5833 | locales, or @samp{ij} in Dutch locales.
|
|---|
| 5834 |
|
|---|
| 5835 | To work around these problems, which may cause bugs in shell scripts, set
|
|---|
| 5836 | the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
|
|---|
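
One way to apply this workaround to a single command is sketched below. It uses @env{LC_ALL}, which overrides both @env{LC_COLLATE} and @env{LC_CTYPE}; this assumes that forcing every locale category to @samp{C} for that command is acceptable. The input string is illustrative.

```shell
# In the C locale, [a-z] matches exactly the 26 ASCII lowercase letters.
printf 'aBc\n' | LC_ALL=C sed 's/[a-z]//g'    # prints: B
```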
| 5837 |
|
|---|
| 5838 | @item @code{s/.*//} does not clear pattern space
|
|---|
| 5839 | @cindex Non-bugs, localization-related
|
|---|
| 5840 | @cindex @value{SSEDEXT}, emptying pattern space
|
|---|
| 5841 | @cindex Emptying pattern space
|
|---|
| 5842 |
|
|---|
| 5843 | This happens if your input stream includes invalid multibyte
|
|---|
| 5844 | sequences. @sc{posix} mandates that such sequences
|
|---|
| 5845 | are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
|
|---|
| 5846 | pattern space as you would expect. In fact, there is no way to clear
|
|---|
| 5847 | @command{sed}'s buffers in the middle of the script in most multibyte locales
|
|---|
| 5848 | (including UTF-8 locales). For this reason, @value{SSED} provides a @code{z}
|
|---|
| 5849 | command (for ``zap'') as an extension.
|
|---|
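
A minimal sketch of the @code{z} command, assuming GNU @command{sed}; the input lines and the replacement text @samp{cleared} are illustrative only.

```shell
# z empties pattern space whatever bytes it holds; the following s
# command then sees an empty line and fills it in.
printf 'one\ntwo\n' | sed 'z;s/^$/cleared/'   # prints: cleared (twice)
```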
| 5850 |
|
|---|
| 5851 | To work around these problems, which may cause bugs in shell scripts, set
|
|---|
| 5852 | the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
|
|---|
| 5853 | @end ifclear
|
|---|
| 5854 | @end table
|
|---|
| 5855 |
|
|---|
| 5856 |
|
|---|
| 5857 |
|
|---|
| 5858 |
|
|---|
| 5859 | @page
|
|---|
| 5860 | @node GNU Free Documentation License
|
|---|
| 5861 | @appendix GNU Free Documentation License
|
|---|
| 5862 |
|
|---|
| 5863 | @include fdl.texi
|
|---|
| 5864 |
|
|---|
| 5865 |
|
|---|
| 5866 | @page
|
|---|
| 5867 | @node Concept Index
|
|---|
| 5868 | @unnumbered Concept Index
|
|---|
| 5869 |
|
|---|
| 5870 | This is a general index of all issues discussed in this manual, with the
|
|---|
| 5871 | exception of the @command{sed} commands and command-line options.
|
|---|
| 5872 |
|
|---|
| 5873 | @printindex cp
|
|---|
| 5874 |
|
|---|
| 5875 | @page
|
|---|
| 5876 | @node Command and Option Index
|
|---|
| 5877 | @unnumbered Command and Option Index
|
|---|
| 5878 |
|
|---|
| 5879 | This is an alphabetical list of all @command{sed} commands and command-line
|
|---|
| 5880 | options.
|
|---|
| 5881 |
|
|---|
| 5882 | @printindex fn
|
|---|
| 5883 |
|
|---|
| 5884 | @contents
|
|---|
| 5885 | @bye
|
|---|
| 5886 |
|
|---|
| 5887 | @c XXX FIXME: the term "cycle" is never defined...
|
|---|