| 1 | ! Gawk.Hlp
|
|---|
| 2 | ! Pat Rankin, Jun'90
|
|---|
| 3 | ! revised, Jun'91
|
|---|
| 4 | ! revised, Jul'92
|
|---|
| 5 | ! revised, Jan'95
|
|---|
| 6 | ! revised, Apr'97
|
|---|
| 7 | ! revised, Jan'03
|
|---|
| 8 | ! Online help for GAWK.
|
|---|
| 9 | !
|
|---|
| 10 | 1 GAWK
|
|---|
| 11 | GAWK is GNU awk, the Free Software Foundation's implementation of
|
|---|
| 12 | the awk programming language. awk is an interpretive language which
|
|---|
| 13 | can handle many data-reformatting jobs with just a few lines of code.
|
|---|
| 14 | It has powerful string manipulation and pattern matching capabilities
|
|---|
| 15 | built in. This version is compatible with POSIX 1003.2 awk.
|
|---|
| 16 |
|
|---|
| 17 | The VMS version of GAWK supports both the original UN*X-style command
|
|---|
| 18 | interface and a DCL interface. The only setup requirement for GAWK
|
|---|
| 19 | is to define it as a 'foreign' command: a DCL symbol with a value
|
|---|
| 20 | which begins with '$'.
|
|---|
| 21 | $ GAWK :== $disk:[directory]GAWK
|
|---|
| 22 | 2 GNU_syntax
|
|---|
| 23 | GAWK's UN*X-style interface uses the 'dash' convention for specifying
|
|---|
| 24 | options and uses spaces to separate multiple arguments.
|
|---|
| 25 |
|
|---|
| 26 | There are two main alternatives, depending on how the awk program is
|
|---|
| 27 | to be passed to GAWK. Both alternatives share most options.
|
|---|
| 28 |
|
|---|
| 29 | Usage: $ gawk [-W opts] [-F fs] [-v var=val] -f progfile [--] file ...
|
|---|
| 30 | or $ gawk [-W opts] [-F fs] [-v var=val] [--] "program" file ...
|
|---|
| 31 |
|
|---|
| 32 | The options are case-sensitive. On VMS, the DCL command interpreter
|
|---|
| 33 | converts unquoted text into uppercase before passing it to the running
|
|---|
| 34 | program. However, GAWK is written in 'C' and the C Run-Time Library
|
|---|
| 35 | (VAXCRTL or DECC$SHR) converts unquoted text into *lowercase*.
|
|---|
| 36 | Therefore, the -Fval and -W options must be enclosed in quotes.
|
|---|
| 37 | 3 options
|
|---|
| 38 | -f file use the specified file as the awk program source; if more
|
|---|
| 39 | than one instance of -f is used, each file will be read
|
|---|
| 40 | in succession
|
|---|
| 41 | -Fstring define a value for the FS variable (field separator)
|
|---|
| 42 | -v var=val assign a value of 'val' to the variable 'var'
|
|---|
| 43 | -W 'options' additional gawk-specific options; multiple values may
|
|---|
| 44 | be separated by commas, or by spaces if they're quoted,
|
|---|
| 45 | or multiple occurrences of -W may be used.
|
|---|
| 46 | -W compat use awk "compatibility mode" to disable GAWK extensions
|
|---|
| 47 | and get the behavior of UN*X awk.
|
|---|
| 48 | -W copyright [or -W copyleft] display an abbreviated version of
|
|---|
| 49 | the GNU copyright information
|
|---|
| 50 | -W help list command line options (same as -W usage)
|
|---|
| 51 | -W lint warn about suspect or non-portable awk program code
|
|---|
| 52 | -W lint-old warn about constructs not available in original awk
|
|---|
| 53 | -W posix compatibility mode with additional restrictions
|
|---|
| 54 | -W re-interval evaluate '{' and '}' as intervals in regular expressions
|
|---|
| 55 | -W traditional suppress POSIX and GNU regular expression extensions
|
|---|
| 56 | -W usage list command line options (same as -W help)
|
|---|
| 57 | -W version display program version number
|
|---|
| 58 | -- don't check further arguments for leading dash
|
|---|
| 59 | 3 program_text
|
|---|
| 60 | If the '-f file' option is not used on the command line, then the
|
|---|
| 61 | first "non-dash" argument is assumed to be a string of text containing
|
|---|
| 62 | the awk source program. Here is a complete sample program:
|
|---|
| 63 | $ gawk -- "BEGIN {print ""\nHello, World!\n""}"
|
|---|
| 64 | This program would print a blank line (based on first "\n"), followed
|
|---|
| 65 | by a line reading "Hello, World!", followed by another blank line
|
|---|
| 66 | (since awk's 'print' statement includes the trailing 'newline').
|
|---|
| 67 |
|
|---|
| 68 | On VMS, to include a quote character inside of a quoted string, two
|
|---|
| 69 | successive quotes ("") must be used.
|
|---|
| 70 | 3 data_files
|
|---|
| 71 | After all dash-options are examined, and after the program text if
|
|---|
| 72 | there were no occurrences of the -f option, remaining (space separated)
|
|---|
| 73 | command line arguments are considered to be data files for the awk
|
|---|
| 74 | program to process. If any of these actually contains an equals sign
|
|---|
| 75 | (=), then it is interpreted as a variable assignment instead of a data
|
|---|
| 76 | file. The syntax is 'variable_name=value'. For example, the command
|
|---|
| 77 | $ gawk -f myprog.awk infile.one flag=2 start=0 infile.two
|
|---|
| 78 | would read file 'infile.one' for the program in 'myprog.awk', then it
|
|---|
| 79 | would set 'flag' to 2 and 'start' to 0, and finally it would read file
|
|---|
| 80 | 'infile.two' for the program. Note that in a case like this, the two
|
|---|
| 81 | assignments actually occur after the first file has been processed,
|
|---|
| 82 | not at program startup when the command line is first scanned.
|
|---|
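A minimal sketch of the assignment timing described above. It is written for a POSIX shell rather than DCL (an assumption; on VMS the quoting rules under 'program_text' apply), and it assumes an awk is available under the command name `awk`. The scratch directory and the file names 'one' and 'two' are hypothetical stand-ins for 'infile.one' and 'infile.two'.

```shell
# Hypothetical scratch files standing in for infile.one and infile.two.
dir=$(mktemp -d)
printf 'a\n' > "$dir/one"
printf 'b\n' > "$dir/two"

# flag=2 sits between the two file names, so it takes effect only
# after the first file has been read.
result=$(awk '{ print $0, "flag=" flag }' "$dir/one" flag=2 "$dir/two")
printf '%s\n' "$result"
rm -rf "$dir"
```

The record from the first file is printed with an empty 'flag'; the record from the second file is printed with 'flag=2'.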
| 83 | 3 IO_redirection
|
|---|
| 84 | The command parsing in the VMS implementation of GAWK does some
|
|---|
| 85 | emulation of a UN*X-style shell, where certain characters on the
|
|---|
| 86 | command line have special meaning. In particular, the symbols '<',
|
|---|
| 87 | '>', '|', '*', and '?' receive special handling before the main part
|
|---|
| 88 | of the program has a chance to see them. The symbols '<' and '>'
|
|---|
| 89 | perform some file manipulation from the command line:
|
|---|
| 90 |
|
|---|
| 91 | <ifile open file 'ifile' (readonly) as 'stdin' [SYS$INPUT]
|
|---|
| 92 | >nfile create 'nfile' as 'stdout' [SYS$OUTPUT], in stream-lf format
|
|---|
| 93 | >>ofile append to 'ofile' for 'stdout'; create it if necessary
|
|---|
| 94 | >&efile point 'stderr' [SYS$ERROR] at 'efile', but don't open it yet
|
|---|
| 95 | >$vfile create 'vfile' as 'stdout', using RMS attributes appropriate
|
|---|
| 96 | for a standard text file (variable length records with
|
|---|
| 97 | implied carriage control)
|
|---|
| 98 | >+bfile create 'bfile' as 'stdout' using binary mode
|
|---|
| 99 | 2>&1 route error messages into the regular output stream
|
|---|
| 100 | 1>&2 send output data to the error destination
|
|---|
| 101 | <<sentinel error; reading stdin until 'sentinel' not supported
|
|---|
| 102 | <-, >- error; closure of stdin or stdout from cmd line not supported
|
|---|
| 103 | >>$vfile incorrect; would be interpreted as file "$vfile" in stream-lf
|
|---|
| 104 | format rather than as file "vfile" in RMS 'text' format
|
|---|
| 105 | | error; command line pipes not supported
|
|---|
| 106 | 3 wildcard_expansion
|
|---|
| 107 | The command parsing in the VMS implementation of GAWK does some
|
|---|
| 108 | emulation of a UN*X-style shell, where certain characters on the
|
|---|
| 109 | command line have special meaning. In particular, the symbols '<',
|
|---|
| 110 | '>', '*', '%', and '?' receive special handling before the main part
|
|---|
| 111 | of the program has a chance to see them. The symbols '*', '%' and '?'
|
|---|
| 112 | are used as wildcards in filenames. '*' and '%' have their usual VMS
|
|---|
| 113 | meanings of multiple character and single character wildcards,
|
|---|
| 114 | respectively, and '?' is also treated as a single character wildcard.
|
|---|
| 115 | Wildcard expansion only works for filenames specified in native VMS
|
|---|
| 116 | filename syntax (eg, "[-.sibling]*"), not for ones specified in
|
|---|
| 117 | pseudo-Unix syntax (eg, "../sibling/*").
|
|---|
| 118 |
|
|---|
| 119 | When a command line argument that should be a filename contains any
|
|---|
| 120 | of the wildcard characters, a directory lookup is attempted for files
|
|---|
| 121 | which match the specified pattern. If one or more matching files are
|
|---|
| 122 | found, those filenames are put into the command line in place of the
|
|---|
| 123 | original pattern. If no matching files are found, the original
|
|---|
| 124 | pattern is left in place.
|
|---|
| 125 | 2 DCL_syntax
|
|---|
| 126 | GAWK's DCL-style interface is more or less a standard DCL command, with
|
|---|
| 127 | one required parameter. Multiple values--when present--are separated
|
|---|
| 128 | by commas.
|
|---|
| 129 |
|
|---|
| 130 | There are two main alternatives, depending on how the awk program is
|
|---|
| 131 | to be passed to GAWK. Both alternatives share most options.
|
|---|
| 132 |
|
|---|
| 133 | Usage: GAWK /COMMANDS="awk program text" data_file[,data_file,...]
|
|---|
| 134 | or GAWK /INPUT=awk_file data_file[,"Var=value",data_file,...]
|
|---|
| 135 | ( or GAWK /INPUT=(awk_file1,awk_file2,...) data_file[,...] )
|
|---|
| 136 | 3 Parameter
|
|---|
| 137 | data_file[,data_file,...] (data_file data_file ...)
|
|---|
| 138 | data_file[,"Var=value",...,data_file,...] (data_file Var=value &c)
|
|---|
| 139 |
|
|---|
| 140 | Data file(s) for the awk program to process. If any of these
|
|---|
| 141 | actually contains an equals sign (=), then it is interpreted as
|
|---|
| 142 | a variable assignment instead of a data file. The syntax is
|
|---|
| 143 | "variable_name=value". Quotes are required for non-file parameters.
|
|---|
| 144 |
|
|---|
| 145 | For example, the command
|
|---|
| 146 | $ gawk/input=myprog.awk infile.one,"flag=2","start=0",infile.two
|
|---|
| 147 | would read file 'infile.one' for the program in 'myprog.awk', then it
|
|---|
| 148 | would set 'flag' to 2 and 'start' to 0, and finally it would read file
|
|---|
| 149 | 'infile.two' for the program. Note that in a case like this, the two
|
|---|
| 150 | assignments actually occur after the first file has been processed,
|
|---|
| 151 | not at program startup when the command line is first scanned.
|
|---|
| 152 |
|
|---|
| 153 | Wildcard file lookups are attempted on data file specifications. See
|
|---|
| 154 | subtopic 'GAWK GNU_syntax wildcard_expansion' for details.
|
|---|
| 155 |
|
|---|
| 156 | At least one data_file parameter value is required. An exception is
|
|---|
| 157 | made if /usage, /version, or /copyright is specified *and* if GAWK is
|
|---|
| 158 | defined as a 'foreign' command rather than a 'native' DCL command.
|
|---|
| 159 | 3 Qualifiers
|
|---|
| 160 | /COMMANDS
|
|---|
| 161 | /COMMANDS="awk program text" (-- "awk program text")
|
|---|
| 162 |
|
|---|
| 163 | For short programs, it is possible to include the complete program
|
|---|
| 164 | on the command line. The quotes are required. Here is a complete
|
|---|
| 165 | sample program:
|
|---|
| 166 | $ gawk/commands="BEGIN {print ""\nHello, World!\n""}" NL:
|
|---|
| 167 | This program would print a blank line (based on first "\n"), followed
|
|---|
| 168 | by a line reading "Hello, World!", followed by another blank line
|
|---|
| 169 | (since awk's 'print' statement includes the trailing 'newline').
|
|---|
| 170 |
|
|---|
| 171 | To include a quote character inside of a quoted string, two
|
|---|
| 172 | successive quotes ("") must be used.
|
|---|
| 173 |
|
|---|
| 174 | Either /COMMANDS or /INPUT (but not both) must be supplied.
|
|---|
| 175 | /INPUT
|
|---|
| 176 | /INPUT=(awk_file1,awk_file2) (-f awk_file1 -f awk_file2)
|
|---|
| 177 |
|
|---|
| 178 | Used to specify one or more files containing the source code of
|
|---|
| 179 | the awk program. If more than one file is used, separate them
|
|---|
| 180 | with commas and enclose the list in parentheses.
|
|---|
| 181 |
|
|---|
| 182 | Multiple source files are processed in order as if they had been
|
|---|
| 183 | concatenated together.
|
|---|
| 184 |
|
|---|
| 185 | Either /INPUT or /COMMANDS (but not both) must be supplied.
|
|---|
| 186 | /FIELD_SEPARATOR
|
|---|
| 187 | /FIELD_SEPARATOR="FS_value" (-F"FS_value")
|
|---|
| 188 |
|
|---|
| 189 | Assign a value to the built in variable FS (field separator).
|
|---|
| 190 | /VARIABLES
|
|---|
| 191 | /VARIABLES=("Var1=val1","Var2=val2",...) (-v Var1=val1 -v Var2=val2)
|
|---|
| 192 |
|
|---|
| 193 | Assign value(s) to the specified variable(s).
|
|---|
| 194 | /REG_EXPR
|
|---|
| 195 | /REG_EXPR={AWK | EGREP | POSIX} (-a vs -e options [obsolete])
|
|---|
| 196 |
|
|---|
| 197 | This qualifier is obsolete and has no effect.
|
|---|
| 198 | /STRICT
|
|---|
| 199 | /[NO]STRICT (-"W compat" option)
|
|---|
| 200 |
|
|---|
| 201 | Use strict awk compatibility mode (/strict) and suppress GAWK
|
|---|
| 202 | extensions. The default is /NOSTRICT.
|
|---|
| 203 | /POSIX
|
|---|
| 204 | /[NO]POSIX (-"W posix" option)
|
|---|
| 205 |
|
|---|
| 206 | Use POSIX compatibility mode (/posix) and suppress GAWK extensions.
|
|---|
| 207 | The default is /NOPOSIX. Slightly more restrictive than /strict.
|
|---|
| 208 | /LINT
|
|---|
| 209 | /[NO]LINT (-"W lint" option)
|
|---|
| 210 |
|
|---|
| 211 | Check the awk program carefully for potential problems that might
|
|---|
| 212 | be encountered if it were to be used with other awk implementations,
|
|---|
| 213 | and print warnings for anything found. The default is /NOLINT.
|
|---|
| 214 | /VERSION
|
|---|
| 215 | /VERSION (-"W version" option)
|
|---|
| 216 |
|
|---|
| 217 | Print GAWK's version number.
|
|---|
| 218 | /COPYRIGHT
|
|---|
| 219 | /COPYRIGHT (-"W copyright" or -"W copyleft" option)
|
|---|
| 220 |
|
|---|
| 221 | Print a brief version of GAWK's copyright notice.
|
|---|
| 222 | /USAGE
|
|---|
| 223 | /USAGE (comparable to -"W usage" or -"W help" option)
|
|---|
| 224 |
|
|---|
| 225 | Print a compact summary of the command line options.
|
|---|
| 226 |
|
|---|
| 227 | After the 'usage' message is printed, GAWK terminates regardless
|
|---|
| 228 | of any other command line options.
|
|---|
| 229 | /OUTPUT
|
|---|
| 230 | /OUTPUT=out_file (>$out_file)
|
|---|
| 231 |
|
|---|
| 232 | Write program output into 'out_file'. The default is SYS$OUTPUT.
|
|---|
| 233 | 2 awk_language
|
|---|
| 234 | An awk program consists of one or more pattern-action pairs, sometimes
|
|---|
| 235 | referred to as "rules". For each record of an input (data) file, the
|
|---|
| 236 | rules are checked sequentially. Any pattern which matches the input
|
|---|
| 237 | record triggers that rule's action. Actions are instructions which
|
|---|
| 238 | resemble statements in the 'C' programming language. Patterns come
|
|---|
| 239 | in several varieties, including field comparisons, regular expression
|
|---|
| 240 | matching, and special cases defined by reserved keywords.
|
|---|
| 241 |
|
|---|
| 242 | All awk keywords and variables are case-sensitive. Text matching is
|
|---|
| 243 | also sensitive to character case unless the builtin variable IGNORECASE
|
|---|
| 244 | is set to a non-zero value.
|
|---|
| 245 | 3 rules
|
|---|
| 246 | The syntax for a pattern-action 'rule' is simply
|
|---|
| 247 | PATTERN { ACTION }
|
|---|
| 248 | where the braces ({}) are required punctuation for the action.
|
|---|
| 249 | Semicolons (;) or 'newlines' (ie, having the text on a separate line)
|
|---|
| 250 | delimit multiple rules and also multiple actions within a given rule.
|
|---|
| 251 | Either the pattern or the action may be omitted; an empty pattern
|
|---|
| 252 | matches every record of the input file; a missing action (not an empty
|
|---|
| 253 | action inside of braces) is an implicit request to print the current
|
|---|
| 254 | record; an empty action (ie, {}) is legal but not very useful.
|
|---|
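The three rule forms described above can be sketched as follows. This is an illustration only, shown for a POSIX shell (an assumption; DCL quoting differs) with an awk assumed to be available as `awk`; the three-line input is made up for the example.

```shell
result=$(printf 'one\ntwo\nthree\n' | awk '
/t/                           # no action: matching records are printed
{ n++ }                       # no pattern: runs for every record
END { print "records:", n }   # runs once, after the last record
')
printf '%s\n' "$result"
```

Only 'two' and 'three' contain a 't', so those two records are echoed, and the END rule then reports that three records were read.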
| 255 | 3 patterns
|
|---|
| 256 | There are several types of patterns available for awk rules.
|
|---|
| 257 |
|
|---|
| 258 | expression an 'expression' is something to be evaluated (perhaps
|
|---|
| 259 | a comparison or function call) which will
|
|---|
| 260 | be considered true if non-zero (for numeric
|
|---|
| 261 | results) or if non-null (for strings)
|
|---|
| 262 | /regular_expression/ slashes (/) delimit a regular expression
|
|---|
| 263 | which is used as a pattern
|
|---|
| 264 | pattern1, pattern2 a pair of patterns separated by a comma (,),
|
|---|
| 265 | which causes a range of records to trigger
|
|---|
| 266 | the associated action; the records which
|
|---|
| 267 | match the patterns are included in the range
|
|---|
| 268 | <null> an omitted pattern (in this text, the string '<null>'
|
|---|
| 269 | is displayed, but in an awk program, it
|
|---|
| 270 | would really be blank) matches every record
|
|---|
| 271 | BEGIN keyword for specifying a rule to be executed prior to
|
|---|
| 272 | reading the 1st record of the 1st input file
|
|---|
| 273 | END keyword for specifying a rule to be executed after
|
|---|
| 274 | handling the last input record of last file
|
|---|
| 275 | 4 examples
|
|---|
| 276 | Some example patterns (mostly with the corresponding actions omitted)
|
|---|
| 277 |
|
|---|
| 278 | NF > 0 # comparison expression: matches non-null records
|
|---|
| 279 | $0 # implied comparison: also matches non-null records
|
|---|
| 280 | $2 > 1000 && sum <= 999999 # slightly more elaborate expression
|
|---|
| 281 | /x/ # regular expression matching any record with an 'x' in it
|
|---|
| 282 | /^ / # reg-expr matching records beginning with a space
|
|---|
| 283 | $1 == "start", $NF == "stop" # range pattern for input in which
|
|---|
| 284 | some data lines begin with 'start' and/or end with
|
|---|
| 285 | 'stop' in order to collect groups of records
|
|---|
| 286 | { sum += $1 } # null pattern: its action (add field #1 to
|
|---|
| 287 | variable 'sum') would be executed for every record
|
|---|
| 288 | BEGIN { sum = 0 } # keyword 'BEGIN': perform this action before
|
|---|
| 289 | reading the input file (note: initialization to 0 is
|
|---|
| 290 | unnecessary in awk)
|
|---|
| 291 | END { print "total =", sum } # keyword 'END': perform this
|
|---|
| 292 | action after the last input record has been processed
|
|---|
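A short sketch of the range pattern from the list above, run against made-up input. It assumes a POSIX shell and an awk available as `awk`; on VMS the same program text would need DCL quoting.

```shell
# The rule has no action, so every record inside a start..stop range
# is printed, including the two records that delimit the range.
result=$(printf 'x\nstart a\nb\nc stop\ny\n' |
    awk '$1 == "start", $NF == "stop"')
printf '%s\n' "$result"
```

The records 'x' and 'y' fall outside the range and are skipped; 'start a', 'b', and 'c stop' are printed.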
| 293 | 3 actions
|
|---|
| 294 | An 'action' is something to do when a given record has matched the
|
|---|
| 295 | corresponding pattern in a rule. In general, actions resemble 'C'
|
|---|
| 296 | statements and expressions. The action in a rule must be enclosed
|
|---|
| 297 | in braces ({}).
|
|---|
| 298 |
|
|---|
| 299 | Each action can contain more than one statement or expression to be
|
|---|
| 300 | executed, provided that they're separated by semicolons (;) and/or
|
|---|
| 301 | on separate lines.
|
|---|
| 302 |
|
|---|
| 303 | An omitted action is equivalent to
|
|---|
| 304 | { print $0 }
|
|---|
| 305 | which prints the current record.
|
|---|
| 306 | 3 operators
|
|---|
| 307 | Relational operators
|
|---|
| 308 | == compare for equality
|
|---|
| 309 | != compare for inequality
|
|---|
| 310 | <, <=, >, >= numerical or lexical comparison (less than, less or
|
|---|
| 311 | equal, greater than, greater or equal, respectively)
|
|---|
| 312 | ~ match against a regular expression
|
|---|
| 313 | !~ match against a regular expression, but accept failed matches
|
|---|
| 314 | instead of successful ones
|
|---|
| 315 | Arithmetic operators
|
|---|
| 316 | + addition
|
|---|
| 317 | - subtraction
|
|---|
| 318 | * multiplication
|
|---|
| 319 | / division
|
|---|
| 320 | % remainder
|
|---|
| 321 | ^, ** exponentiation ('**' is a synonym for '^', unless POSIX
|
|---|
| 322 | compatibility is specified, in which case it's invalid)
|
|---|
| 323 | Boolean operators (aka Logical operators)
|
|---|
| 324 | a value is considered false if it's 0 or a null string,
|
|---|
| 325 | it is true otherwise; the result of a boolean operation
|
|---|
| 326 | (and also of a comparison operation) will be 0 when false
|
|---|
| 327 | or 1 when true
|
|---|
| 328 | || or [expression (a || b) is true if either a is true or b
|
|---|
| 329 | is true or both a and b are true; it is false otherwise;
|
|---|
| 330 | b is not evaluated unless a is false (ie, short-circuit)]
|
|---|
| 331 | && and [expression (a && b) is true if both a and b are true;
|
|---|
| 332 | it is false otherwise; b is only evaluated if a is true]
|
|---|
| 333 | ! not [expression (!a) is true if a is false, false otherwise]
|
|---|
| 334 | in array membership; the keyword 'in' tests whether the value
|
|---|
| 335 | on the left represents a current subscript in the array
|
|---|
| 336 | named on the right
|
|---|
| 337 | Conditional operator
|
|---|
| 338 | ? : the conditional operator takes three operands; the first is
|
|---|
| 339 | an expression to evaluate, the second is the expression to
|
|---|
| 340 | use if the first was true, the third is the expression to
|
|---|
| 341 | use if it was false [simple example (a < b ? b : a) gives
|
|---|
| 342 | the maximum of a and b]
|
|---|
| 343 | Assignment operators
|
|---|
| 344 | = store the value on the right into the variable or array slot
|
|---|
| 345 | on the left [expression (a = b) stores the value of b in a]
|
|---|
| 346 | +=, -=, *=, /=, %=, ^=, **= perform the indicated arithmetic
|
|---|
| 347 | operation using the current value of the variable or array
|
|---|
| 348 | element of the left side and the expression on the right
|
|---|
| 349 | side, then store the result in the left side
|
|---|
| 350 | ++ increment by 1 [expression (++a) gets the current value of
|
|---|
| 351 | a and adds 1 to it, stores that back in a, and returns the
|
|---|
| 352 | new value; expression (a++) gets the current value of a,
|
|---|
| 353 | adds 1 to it, stores that back in a, but returns the
|
|---|
| 354 | original value of a]
|
|---|
| 355 | -- decrement by 1 (analogous to increment)
|
|---|
| 356 | String operators
|
|---|
| 357 | there is no explicit operator for string concatenation;
|
|---|
| 358 | two values and/or variables side-by-side are implicitly
|
|---|
| 359 | concatenated into a string (numeric values are first
|
|---|
| 360 | converted into their string equivalents)
|
|---|
| 361 | Conversion between numeric and string values
|
|---|
| 362 | there is no explicit operator for conversion; adding 0
|
|---|
| 363 | to a string will force it to be converted to a number
|
|---|
| 364 | (the numeric value will be 0 if the string does not
|
|---|
| 365 | represent an integer or floating point number); the
|
|---|
| 366 | reverse, converting a number into a string, is done by
|
|---|
| 367 | concatenating a null string ("") to it [the expression
|
|---|
| 368 | (5.75 "") evaluates to "5.75"]
|
|---|
| 369 | Field 'operator'
|
|---|
| 370 | $ prefixing a number or variable with a dollar sign ($)
|
|---|
| 371 | causes the appropriate record field to be returned [($2)
|
|---|
| 372 | gives the second field of the record, ($NF) gives the
|
|---|
| 373 | last field (since the builtin variable NF is set to the
|
|---|
| 374 | number of fields in the current record)]
|
|---|
| 375 | Array subscript operator
|
|---|
| 376 | , multi-dimensional arrays are simulated by using comma (,)
|
|---|
| 377 | separated array indices; the actual index is generated
|
|---|
| 378 | by replacing commas with the value of builtin SUBSEP,
|
|---|
| 379 | then concatenating the expression into a string index
|
|---|
| 380 | [comma is also used to separate arguments in function
|
|---|
| 381 | calls and user-defined function definitions]
|
|---|
| 382 | [comma is *also* used to indicate a range pattern in an
|
|---|
| 383 | awk rule]
|
|---|
| 384 | Escape 'operator'
|
|---|
| 385 | \ In quoted character strings, the backslash (\) character
|
|---|
| 386 | causes the following character to be interpreted in a
|
|---|
| 387 | special manner [string "one\ntwo" has an embedded newline
|
|---|
| 388 | character (linefeed on VMS, but treated as if it were both
|
|---|
| 389 | carriage-return and linefeed); string "\033[" has an ASCII
|
|---|
| 390 | 'escape' character (which has octal value 033) followed by
|
|---|
| 391 | a 'left-bracket' character]
|
|---|
| 392 | Backslash is also used in regular expressions
|
|---|
| 393 | Redirection operators
|
|---|
| 394 | < Read-from -- valid with 'getline'
|
|---|
| 395 | > Write-to (create new file) -- valid with 'print' and 'printf'
|
|---|
| 396 | >> Append-to (create file if it doesn't already exist)
|
|---|
| 397 | | Pipe-from/to -- valid with 'getline', 'print', and 'printf'
|
|---|
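Several of the operators above can be tried in a single BEGIN rule. The following is a sketch under stated assumptions: a POSIX shell, an awk available as `awk`, and variable names ('a', 'b', 'grid', 'hit', 'miss') chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    a = 3; b = 7
    max = (a < b ? b : a)        # conditional: maximum of a and b
    print max
    print a " and " b            # concatenation by juxtaposition
    print "3.5" + 0, "abc" + 0   # adding 0 forces numeric conversion
    grid[1, 2] = "x"             # stored under the index 1 SUBSEP 2
    hit = ((1, 2) in grid)       # membership test yields 1 or 0
    miss = ((2, 1) in grid)
    print hit, miss
}')
printf '%s\n' "$result"
```

The membership tests do not create array elements, so '(2, 1) in grid' stays false after it has been evaluated.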
| 398 | 4 precedence
|
|---|
| 399 | Operator precedence, listed from highest to lowest. Assignment,
|
|---|
| 400 | conditional, and exponentiation operators group from right to left;
|
|---|
| 401 | all others group from left to right. Parentheses may be used to
|
|---|
| 402 | override the normal order.
|
|---|
| 403 |
|
|---|
| 404 | field ($)
|
|---|
| 405 | increment (++), decrement (--)
|
|---|
| 406 | exponentiation (^, **)
|
|---|
| 407 | unary plus (+), unary minus (-), boolean not (!)
|
|---|
| 408 | multiplication (*), division (/), remainder (%)
|
|---|
| 409 | addition (+), subtraction (-)
|
|---|
| 410 | concatenation (no special symbol; implied by context)
|
|---|
| 411 | relational (==, !=, <, >=, etc), and redirection (<, >, >>, |)
|
|---|
| 412 | Relational and redirection operators have the same precedence
|
|---|
| 413 | and use similar symbols; context distinguishes between them
|
|---|
| 414 | matching (~, !~)
|
|---|
| 415 | array membership ('in')
|
|---|
| 416 | boolean and (&&)
|
|---|
| 417 | boolean or (||)
|
|---|
| 418 | conditional (? :)
|
|---|
| 419 | assignment (=, +=, etc)
|
|---|
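A few expressions that depend on the ordering above, as a hedged sketch (POSIX shell assumed, awk assumed to be available as `awk`):

```shell
result=$(awk 'BEGIN {
    print 2 + 3 * 4      # multiplication binds tighter than addition
    print 2 ^ 3 ^ 2      # exponentiation groups right to left: 2 ^ 9
    print 1 " " 2 + 3    # addition binds tighter than concatenation
}')
printf '%s\n' "$result"
```

In the last line the sum 2 + 3 is formed first and the result is then concatenated, giving '1 5' rather than '1 2' plus 3.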
| 420 | 4 escaped_characters
|
|---|
| 421 | Inside of a quoted string or constant regular expression, the
|
|---|
| 422 | backslash (\) character gives special meaning to the character(s)
|
|---|
| 423 | after it. Special character letters are case sensitive.
|
|---|
| 424 | \\ results in one backslash in the string
|
|---|
| 425 | \a is an 'alert' (<ctrl/G>, the ASCII <bell> character)
|
|---|
| 426 | \b is a backspace (BS, <ctrl/H>)
|
|---|
| 427 | \f is a form feed (FF, <ctrl/L>)
|
|---|
| 428 | \n is a 'newline' (LF, <ctrl/J>) [line feed treated as CR+LF]
|
|---|
| 429 | \r is a carriage return (CR, <ctrl/M>) [re-positions at the
|
|---|
| 430 | beginning of the current line]
|
|---|
| 431 | \t tab (HT, <ctrl/I>)
|
|---|
| 432 | \v vertical tab (VT, <ctrl/K>)
|
|---|
| 433 | \### is an arbitrary character, where '###' represents 1 to 3
|
|---|
| 434 | octal (ie, 0 thru 7) digits
|
|---|
| 435 | \x## is an alternate arbitrary character, where '##' represents
|
|---|
| 436 | 1 or more hexadecimal (ie, 0 thru 9 and/or A through F
|
|---|
| 437 | and/or a through f) digits; if more than two digits
|
|---|
| 438 | follow, the result is undefined; not recognized if POSIX
|
|---|
| 439 | compatibility mode is specified.
|
|---|
| 440 |
|
|---|
| 441 | When a regular expression is represented in string form ("regex"
|
|---|
| 442 | as opposed to /regex/), backslashes need to be paired. The first
|
|---|
| 443 | one quotes the second during string processing, and the second one
|
|---|
| 444 | remains to be used to quote whatever follows in regular expression
|
|---|
| 445 | processing. For example, to match variable `xxx' against a period
|
|---|
| 446 | character, use (xxx ~ "\\.") or (xxx ~ /\./); if you tried to use
|
|---|
| 447 | (xxx ~ "\."), after string processing it would operate as (xxx ~ /./)
|
|---|
| 448 | and end up matching any single character rather than just a period.
|
|---|
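The difference between the string and constant forms can be checked with a small sketch. It assumes a POSIX shell, where single quotes pass the backslashes to awk unchanged (DCL quoting differs), and an awk available as `awk`; the variable names are illustrative.

```shell
result=$(awk 'BEGIN {
    xxx = "a"
    as_string   = (xxx ~ "\\.")   # string form: paired backslashes
    as_constant = (xxx ~ /\./)    # constant form: single backslash
    any_char    = (xxx ~ /./)     # unescaped dot matches any character
    print as_string, as_constant, any_char
}')
printf '%s\n' "$result"
```

Because 'a' is not a period, the first two tests fail and only the unescaped dot matches.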
| 449 | 3 statements
|
|---|
| 450 | A statement refers to a unit of instruction found in the action
|
|---|
| 451 | part of an awk rule, and also found in the definition of a function.
|
|---|
| 452 | The distinction between action, statement, and expression usually
|
|---|
| 453 | won't matter to an awk programmer.
|
|---|
| 454 |
|
|---|
| 455 | Compound statements consist of multiple statements separated by
|
|---|
| 456 | semicolons or newlines and enclosed within braces ({}). They are
|
|---|
| 457 | sometimes referred to as 'blocks'.
|
|---|
| 458 | 4 expressions
|
|---|
| 459 | An expression such as 'a = 10' or 'n += i++' is a valid statement.
|
|---|
| 460 |
|
|---|
| 461 | Function invocations such as 'reformat_field($3)' are also valid
|
|---|
| 462 | statements.
|
|---|
| 463 | 4 if-then-else
|
|---|
| 464 | A conditional statement in awk uses the same syntax as for the 'C'
|
|---|
| 465 | programming language: the 'if' keyword, followed by an expression
|
|---|
| 466 | in parentheses, followed by a statement--or block of statements
|
|---|
| 467 | enclosed within braces ({})--which will be executed if the expression
|
|---|
| 468 | is true but skipped if it's false. This can optionally be followed
|
|---|
| 469 | by the 'else' keyword and another statement--or block of statements--
|
|---|
| 470 | which will be executed if (and only if) the expression was false.
|
|---|
| 471 | 5 examples
|
|---|
| 472 | Simple example showing a statement used to control how many numbers
|
|---|
| 473 | are printed on a given line.
|
|---|
| 474 | if ( ++i <= 10 ) #check whether this would be the 11th
|
|---|
| 475 | printf(" %5d", k) #print on current line if not
|
|---|
| 476 | else {
|
|---|
| 477 | printf("\n %5d", k) #print on next line if so
|
|---|
| 478 | i = 1 #and reset the counter
|
|---|
| 479 | }
|
|---|
| 480 | Another example ('next' is described under 'action-controls')
|
|---|
| 481 | if ($1 > $2) { print "rejected"; next } else diff = $2 - $1
|
|---|
| 482 | 4 loops
|
|---|
| 483 | Three types of loop statements are available in awk. Each uses
|
|---|
| 484 | the same syntax as 'C'. The simplest of the three is the 'while'
|
|---|
| 485 | statement. It consists of the 'while' keyword, followed by an
|
|---|
| 486 | expression enclosed within parentheses, followed by a statement--or
|
|---|
| 487 | block of statements in braces ({})--which will be executed if the
|
|---|
| 488 | expression evaluates to true. The expression is evaluated before
|
|---|
| 489 | attempting to execute the statement; if it's true, the statement is
|
|---|
| 490 | executed (the entire block of statements if there is a block) and
|
|---|
| 491 | then the expression is re-evaluated.
|
|---|
| 492 |
|
|---|
| 493 | The second type of loop is the do-while loop. It consists of the
|
|---|
| 494 | 'do' keyword, followed by a statement (usually a block of statements
|
|---|
| 495 | enclosed within braces), followed by the 'while' keyword, followed
|
|---|
| 496 | by a test expression enclosed within parentheses. The statement--or
|
|---|
| 497 | block--is always executed at least once. Then the test expression
|
|---|
| 498 | is evaluated, and the statement(s) re-executed if the result was
|
|---|
| 499 | true (followed by re-evaluation of the test, and so on).
|
|---|
| 500 |
|
|---|
| 501 | The most complex of the three loops is the 'for' statement, and it
|
|---|
| 502 | has a second variant that is not found in 'C'. The ordinary for-loop
|
|---|
| 503 | consists of the 'for' keyword, followed by three semicolon-separated
|
|---|
| 504 | expressions enclosed within parentheses, followed by a statement or
|
|---|
| 505 | brace-enclosed block of statements. The first of the three
|
|---|
| 506 | expressions is an initialization clause; it is done before starting
|
|---|
| 507 | the loop. The second expression is used as a test, just like the
|
|---|
| 508 | expression in a while-loop. It is checked before attempting to
|
|---|
| 509 | execute the statement block, and then re-checked after each execution
|
|---|
| 510 | (if any) of the block. The third expression is an 'increment' clause;
|
|---|
| 511 | it is evaluated after an execution of the statement block and before
|
|---|
| 512 | re-evaluation of the test (2nd) expression. Normally, the increment
|
|---|
| 513 | clause will change a variable used in the test clause, in such a
|
|---|
| 514 | fashion that the test clause will eventually evaluate to false and
|
|---|
| 515 | cause the loop to finish.
|
|---|
| 516 |
|
|---|
| 517 | Note to 'C' programmers: the comma (,) operator commonly used in
|
|---|
| 518 | 'C' for-loop expressions is not valid in awk.
|
|---|
| 519 |
|
|---|
| 520 | The awk-specific variant of the for-loop is used for processing
|
|---|
| 521 | arrays. Its syntax is the 'for' keyword, followed by variable_name 'in'
|
|---|
| 522 | array_name (where 'var in array' is enclosed in parentheses),
|
|---|
| 523 | followed by a statement (or block). Each valid subscript value for
|
|---|
| 524 | the array in question is successively placed--in no particular
|
|---|
| 525 | order--into the specified 'index' variable.
|
|---|
| 526 | 5 while_example
|
|---|
| 527 | # strip fields from the input record until there's nothing left
|
|---|
| 528 | while (NF > 0) {
|
|---|
| 529 | $1 = "" #this will affect the value of $0
|
|---|
| 530 | $0 = $0 #this causes $0 and NF to be re-evaluated
|
|---|
| 531 | print
|
|---|
| 532 | }
|
|---|
| 533 | 5 do_while_example
|
|---|
| 534 | # This is a variation of the while_example; it gives a slightly
|
|---|
| 535 | # different display due to the order of operation.
|
|---|
| 536 | # echo input record until all fields have been stripped
|
|---|
| 537 | do {
|
|---|
| 538 | print #output $0
|
|---|
| 539 | $1 = "" #this will affect the value of $0
|
|---|
| 540 | $0 = $0 #this causes $0 and NF to be re-evaluated
|
|---|
| 541 | } while (NF > 0)
|
|---|
| 542 | 5 for_example
|
|---|
| 543 | # echo command line arguments (won't include option switches)
|
|---|
| 544 | for ( i = 0; i < ARGC; i++ ) print ARGV[i]
|
|---|
| 545 |
|
|---|
| 546 | # display contents of builtin environment array
|
|---|
| 547 | for (itm in ENVIRON)
|
|---|
| 548 | print itm, ENVIRON[itm]
|
|---|
| 549 | 4 loop-controls
|
|---|
| 550 | There are two special statements--both from 'C'--for changing the
|
|---|
| 551 | behavior of loop execution. The 'continue' statement is useful in
|
|---|
| 552 | a compound (block) statement; when executed, it effectively skips
|
|---|
| 553 | the rest of the block so that the increment-expression (only for
|
|---|
| 554 | for-loops) and loop-termination expression can be re-evaluated.
|
|---|
| 555 |
|
|---|
| 556 | The 'break' statement, when executed, effectively skips the rest
|
|---|
| 557 | of the block and also treats the test expression as if it were
|
|---|
| 558 | false (instead of actually re-evaluating it). In this case, the
|
|---|
| 559 | increment-expression of a for-loop is also skipped.
|
|---|
| 560 |
|
|---|
| 561 | Inside nested loops, both 'break' and 'continue' only apply to the
|
|---|
| 562 | innermost loop. When in compatibility mode, 'break' or 'continue'
|
|---|
| 563 | may be used outside of a loop; either will be treated like 'next'
|
|---|
| 564 | (see action-controls).
|
|---|
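The behavior of 'continue' and 'break' described above can be sketched in a single for-loop. This is only an illustration: it assumes a POSIX shell with the interpreter reachable as `awk` (on VMS, substitute the GAWK foreign command and DCL quoting), and the loop bounds are arbitrary.

```shell
# Assumes a POSIX shell with awk on the PATH; the numbers are illustrative only.
out=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even values; i++ still runs
        if (i > 7) break           # leave the loop; i++ is skipped
        printf "%d ", i
    }
    print "stopped at", i
}')
printf '%s\n' "$out"
```

The final value of i is 9 rather than 10 because 'break' bypasses the increment clause.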
| 565 | 4 action-controls
|
|---|
| 566 | There are two special statements for controlling statement execution.
|
|---|
| 567 | The 'next' statement, when executed, causes the rest of the current
|
|---|
| 568 | action and all further pattern-action rules to be skipped, so that
|
|---|
| 569 | the next input record will be immediately processed. This is useful
|
|---|
| 570 | if any early action knows that the current record will fail all the
|
|---|
| 571 | remaining patterns; skipping those rules will reduce processing time.
|
|---|
| 572 | An extended form, 'next file', is also available. It causes the
|
|---|
| 573 | remainder of the current file to be skipped, and then either the
|
|---|
| 574 | next input file will be processed, if any, or the END action will be
|
|---|
| 575 | performed. 'next file' is not available in traditional awk.
|
|---|
| 576 |
|
|---|
| 577 | The 'exit' statement causes GAWK execution to terminate. All open
|
|---|
| 578 | files are closed, and no further processing is done. The END rule,
|
|---|
| 579 | if any, is executed. 'exit' takes an optional numeric value as an
|
|---|
| 580 | argument which is used as an exit status value, so that some sort
|
|---|
| 581 | of indication of why execution has stopped can be passed on to the
|
|---|
| 582 | user's environment.
|
|---|
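A small sketch of 'next' and 'exit' working together, assuming a POSIX shell with `awk` on the PATH; the input records and the status value 3 are made up for the example.

```shell
# Assumes a POSIX shell with awk on the PATH; the input lines are made up.
status=0
out=$(printf '%s\n' '# comment' 'alpha' 'STOP' 'beta' | awk '
/^#/     { next }                 # skip this record; later rules are not tried
/^STOP$/ { exit 3 }               # stop reading input; the END rule still runs
         { print "data:", $0 }
END      { print "records read:", NR }
') || status=$?
printf '%s\nexit status: %s\n' "$out" "$status"
```

The record 'beta' is never read, the END rule reports three records, and the value given to 'exit' is passed back to the shell.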
| 583 | 4 other_statements
|
|---|
| 584 | The delete statement is used to remove an element from an array.
|
|---|
| 585 | The syntax is 'delete' keyword followed by array name, followed
|
|---|
| 586 | by index value enclosed in square brackets ([]). 'delete' may
|
|---|
| 587 | also be used on an array name, without any index specified, to delete
|
|---|
| 588 | all its elements in a single operation.
|
|---|
| 589 |
|
|---|
| 590 | The return statement is used in user-defined functions. The syntax
|
|---|
| 591 | is the keyword 'return' optionally followed by a string or numeric
|
|---|
| 592 | expression.
|
|---|
| 593 |
|
|---|
| 594 | See also subtopic 'functions IO_functions' for a description of
|
|---|
| 595 | 'print', 'printf', and 'getline'.
|
|---|
| 596 | 3 fields
|
|---|
| 597 | When an input record is read, it is automatically split into fields
|
|---|
| 598 | based on the current values of FS (builtin variable defining field
|
|---|
| 599 | separator expression) and RS (builtin variable defining record
|
|---|
| 600 | separator character). The default value of FS is an expression
|
|---|
| 601 | which matches one or more spaces and tabs; the default for RS is
|
|---|
| 602 | newline. If the FIELDWIDTHS variable is set to a space separated
|
|---|
| 603 | list of numbers (as in ``FIELDWIDTHS = "2 3 2"'') then the input
|
|---|
| 604 | is treated as if it had fixed-width fields of the indicated sizes
|
|---|
| 605 | and the FS value will be ignored.
|
|---|
| 606 |
|
|---|
| 607 | The field prefix operator ($) is used to reference a particular
|
|---|
| 608 | field. For example, $3 designates the third field of the current
|
|---|
| 609 | record. The entire record can be referenced via $0 (and it holds
|
|---|
| 610 | the actual input record, not the values of $1, $2, ... concatenated
|
|---|
| 611 | together, so multiple spaces--when present--remain intact, unless
|
|---|
| 612 | a new value gets assigned).
|
|---|
| 613 |
|
|---|
| 614 | The builtin variable NF holds the number of fields in the current
|
|---|
| 615 | record. $NF is therefore the value of the last field. Attempts to
|
|---|
| 616 | access fields beyond NF result in null values (if a record contained
|
|---|
| 617 | 3 fields, the value of $5 would be "").
|
|---|
| 618 |
|
|---|
| 619 | Assigning a new value to $0 causes all the other field values (and NF)
|
|---|
| 620 | to be re-evaluated. Changing a specific field will cause $0 to receive
|
|---|
| 621 | a new value once it's re-evaluated, but until then the other existing
|
|---|
| 622 | fields remain unchanged.
|
|---|
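The points above about $0, NF, and field assignment can be seen in a short run. This sketch assumes a POSIX shell with `awk` on the PATH and the default FS and OFS; the input record is invented.

```shell
# Assumes a POSIX shell with awk on the PATH and default FS/OFS.
out=$(printf '%s\n' 'one   two three' | awk '{
    print NF            # 3 fields, split on runs of blanks
    print $NF           # the last field
    print ($5 == "")    # fields beyond NF are null, so this prints 1
    print $0            # the record as read, inner spacing intact
    $2 = "TWO"          # assigning to a field rebuilds $0 using OFS
    print $0
}')
printf '%s\n' "$out"
```

After the assignment to $2, the multiple spaces in the original record are gone because $0 has been reassembled from the fields.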
| 623 | 3 variables
|
|---|
| 624 | Variables in awk can hold both numeric and string values and do not
|
|---|
| 625 | have to be pre-declared. In fact, there is no way to explicitly
|
|---|
| 626 | declare them at all. Variable names consist of a leading letter
|
|---|
| 627 | (either upper or lower case, which are distinct from each other)
|
|---|
| 628 | or underscore (_) character followed by any number of letters,
|
|---|
| 629 | digits, or underscores.
|
|---|
| 630 |
|
|---|
| 631 | When a variable that didn't previously exist is referenced, it is
|
|---|
| 632 | created and given a null value. A null value is treated as 0 when
|
|---|
| 633 | used as a number, and is a string of zero characters in length if
|
|---|
| 634 | used as a string.
|
|---|
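As a brief illustration of undeclared variables and case-sensitive names, under the assumption of a POSIX shell with `awk` on the PATH (the variable names are hypothetical):

```shell
# Assumes a POSIX shell with awk on the PATH; variable names are hypothetical.
out=$(awk 'BEGIN {
    print total + 0          # never assigned: acts as 0 in arithmetic
    print "[" label "]"      # and as an empty string when concatenated
    Count = 1; count = 2     # names are case-sensitive, so these are distinct
    print Count, count
}')
printf '%s\n' "$out"
```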
| 635 | 4 builtin_variables
|
|---|
| 636 | GAWK maintains several 'built-in' variables. All have default values;
|
|---|
| 637 | some are updated automatically. All the builtins have uppercase-only
|
|---|
| 638 | names.
|
|---|
| 639 |
|
|---|
| 640 | These builtin variables control how awk behaves
|
|---|
| 641 | FS input field separator; default is a single space, which is
|
|---|
| 642 | treated as if it were a regular expression for matching
|
|---|
| 643 | one or more spaces and/or tabs; a value of " " also has a
|
|---|
| 644 | second special-case side-effect of causing leading blanks
|
|---|
| 645 | to be ignored instead of producing a null first field;
|
|---|
| 646 | initial value can be specified on the command line with
|
|---|
| 647 | the -F option (or /field_separator); the value can be a
|
|---|
| 648 | regular expression
|
|---|
| 649 | RS input record separator; default value is a newline ("\n");
|
|---|
| 650 | the value can be multiple characters or a regular expression
|
|---|
| 651 | OFS output field separator; value to place between variables in
|
|---|
| 652 | a 'print' statement; default is one space; can be arbitrary
|
|---|
| 653 | string
|
|---|
| 654 | ORS output record separator; value to implicitly terminate 'print'
|
|---|
| 655 | statement with; default is newline ("\n"); can be arbitrary
|
|---|
| 656 | string
|
|---|
| 657 | OFMT default output format used for printing numbers; default
|
|---|
| 658 | value is "%.6g"
|
|---|
| 659 | CONVFMT conversion format used for number-to-string conversions;
|
|---|
| 660 | default value is also "%.6g", like OFMT; not used when the
|
|---|
| 661 | number has a value which may be represented internally as
|
|---|
| 662 | an exact integer (typically within -2147483648 to 2147483647)
|
|---|
| 663 | SUBSEP subscript separator for array indices; used when an array
|
|---|
| 664 | subscript is specified as a comma separated list of values:
|
|---|
| 665 | the comma is replaced by SUBSEP and the resulting index
|
|---|
| 666 | is a concatenation of the values and SUBSEP(s); default
|
|---|
| 667 | value is "\034"; value may be arbitrary string
|
|---|
| 668 | IGNORECASE string and regular expression matching flag; if true
|
|---|
| 669 | (non-zero) matching ignores differences between upper and
|
|---|
| 670 | lower case letters; affects the '~' and '!~' operators,
|
|---|
| 671 | the 'index', 'match', 'split', 'sub', and 'gsub' functions,
|
|---|
| 672 | and field splitting based on FS; default value is false (0);
|
|---|
| 673 | has no effect if GAWK is in strict compatibility mode
|
|---|
| 674 | FIELDWIDTHS space or tab separated list of width sizes; takes
|
|---|
| 675 | precedence over FS when set, but is cleared if FS has a
|
|---|
| 676 | value assigned to it; [note: the current implementation
|
|---|
| 677 | of fixed-field input is considered experimental and is
|
|---|
| 678 | expected to evolve over time]
|
|---|
| 679 |
|
|---|
| 680 | These builtin variables provide useful information
|
|---|
| 681 | NF number of fields in the current record
|
|---|
| 682 | NR record number (accumulated over all files when more than one
|
|---|
| 683 | input file is processed by the same program)
|
|---|
| 684 | FNR current record number of the current input file; reset to 0
|
|---|
| 685 | each time an input file is completed
|
|---|
| 686 | RT record terminator, the input text which matched RS; not
|
|---|
| 687 | available when the `-W traditional' option is used
|
|---|
| 688 | RSTART starting position of substring matched by last invocation
|
|---|
| 689 | of the 'match' function; set to 0 if a match fails and at
|
|---|
| 690 | the start of each input record
|
|---|
| 691 | RLENGTH length of substring matched by the last invocation of the
|
|---|
| 692 | 'match' function; set to -1 if a match fails
|
|---|
| 693 | FILENAME name of the input file currently being processed; the
|
|---|
| 694 | special name "-" is used to represent the standard input
|
|---|
| 695 | ENVIRON array of miscellaneous user environment values; the VMS
|
|---|
| 696 | implementation of GAWK provides values for ["USER"] (the
|
|---|
| 697 | username), ["PATH"] (current default directory), ["HOME"]
|
|---|
| 698 | (the user's login directory), and ["TERM"] (terminal type
|
|---|
| 699 | if available) [all info provided by C RTL's environ]
|
|---|
| 700 | ERRNO information about the cause of failure for 'getline' or
|
|---|
| 701 | 'close'; "0" if no such failure has occurred.
|
|---|
| 702 | ARGC number of elements in the ARGV array, counting [0] which is
|
|---|
| 703 | the program name (ie, "gawk")
|
|---|
| 704 | ARGV array of command-line arguments (in [0] to [ARGC-1]); the
|
|---|
| 705 | program name (ie, "gawk") is held in ARGV[0]; command line
|
|---|
| 706 | parameters (data files and "var=value" expressions, but not
|
|---|
| 707 | program options or the awk program text string if present)
|
|---|
| 708 | are stored in ARGV[1] through ARGV[ARGC-1]; the awk program
|
|---|
| 709 | can change values of ARGC and ARGV[] during execution in
|
|---|
| 710 | order to alter which files are processed or which between-
|
|---|
| 711 | file assignments are made
|
|---|
| 712 | ARGIND current index into ARGV[]
|
|---|
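A few of these variables in action, as a rough sketch: it assumes a POSIX shell with `awk` on the PATH, and the colon-separated input is invented for the example.

```shell
# Assumes a POSIX shell with awk on the PATH; the input data is made up.
out=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN { FS = ":"; OFS = "-" }
{
    $1 = $1              # assigning to a field rebuilds $0 with OFS
    print NR, NF, $0     # the commas in print also insert OFS
}
END { print "total", NR }')
printf '%s\n' "$out"
```

FS controls how each record is split on input, while OFS only appears in output: between 'print' arguments and in a rebuilt $0.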
| 713 | 4 arrays
|
|---|
| 714 | awk supports associative arrays to collect data into tables. Array
|
|---|
| 715 | elements can be either numeric or string, as can the indices used to
|
|---|
| 716 | access them. Each array must have a unique name, but a given array
|
|---|
| 717 | can hold both string and numeric elements at the same time. Arrays
|
|---|
| 718 | are one-dimensional only, but multi-dimensional arrays can be
|
|---|
| 719 | simulated using comma (,) separated indices, whereby a single index
|
|---|
| 720 | value gets created by replacing commas with SUBSEP and concatenating
|
|---|
| 721 | the resulting expression into a single string.
|
|---|
| 722 |
|
|---|
| 723 | Referencing an array element is done with the expression
|
|---|
| 724 | Array[Index]
|
|---|
| 725 | where 'Array' represents the array's name and 'Index' represents a
|
|---|
| 726 | value or expression used for a subscript. If the requested array
|
|---|
| 727 | element did not exist, it will be created and assigned an initial
|
|---|
| 728 | null value. To check whether an element exists without creating it,
|
|---|
| 729 | use the 'in' boolean operator.
|
|---|
| 730 | Index in Array
|
|---|
| 731 | would check 'Array' for element 'Index' and return 1 if it existed
|
|---|
| 732 | or 0 otherwise. To remove an element from an array, use the 'delete'
|
|---|
| 733 | statement
|
|---|
| 734 | delete Array[Index]
|
|---|
| 735 | To remove all array elements at once, use
|
|---|
| 736 | delete Array
|
|---|
| 737 | Note: the latter is a gawk extension; also, there is no way to
|
|---|
| 738 | delete an ordinary variable or an array as a whole; 'delete' only
|
|---|
| 739 | removes array elements, leaving the (possibly empty) array in place.
|
|---|
| 740 |
|
|---|
| 741 | To process all elements of an array (in succession) when their
|
|---|
| 742 | subscripts might be unknown, use the 'in' variant of the for-loop
|
|---|
| 743 | for (Index in Array) { ... }
|
|---|
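The array operations described above, including a simulated two-dimensional index, might look like the following. It is a sketch that assumes a POSIX shell with `awk` on the PATH; the array names count and grid are hypothetical.

```shell
# Assumes a POSIX shell with awk on the PATH; array names are hypothetical.
out=$(awk 'BEGIN {
    count["apple"] = 2
    count["pear"]  = 5
    if ("apple" in count)    print "apple present"
    if (!("plum" in count))  print "plum absent"    # the test creates no element
    delete count["apple"]
    n = 0
    for (k in count) n++                            # only "pear" is left
    print n, "element left"
    grid[1, 2] = "x"                                # index is 1 SUBSEP 2
    for (k in grid) {
        split(k, idx, SUBSEP)                       # recover the two subscripts
        print idx[1], idx[2], grid[k]
    }
}')
printf '%s\n' "$out"
```

Splitting the combined index on SUBSEP is one way to get the original subscripts back inside a for-in loop.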
| 744 | 3 functions
|
|---|
| 745 | awk supports both built-in and user-defined functions. A function
|
|---|
| 746 | may be considered a 'black-box' which accepts zero or more input
|
|---|
| 747 | parameters, performs some calculations or other manipulations based
|
|---|
| 748 | on them, and returns a single result.
|
|---|
| 749 |
|
|---|
| 750 | The syntax for calling a function consists of the function name
|
|---|
| 751 | immediately followed by an open parenthesis (left parenthesis '('),
|
|---|
| 752 | followed by an argument list, followed by a closing parenthesis
|
|---|
| 753 | (right parenthesis ')'). The argument list is a sequence of values
|
|---|
| 754 | (numbers, strings, variables, array references, or expressions
|
|---|
| 755 | involving the above and/or nested function calls), separated by
|
|---|
| 756 | commas and optional white space.
|
|---|
| 757 |
|
|---|
| 758 | The parentheses are required punctuation, except for the 'print' and
|
|---|
| 759 | 'printf' builtin IO functions, where they're optional, and for the
|
|---|
| 760 | builtin IO function 'getline', where they're not allowed. Some
|
|---|
| 761 | functions support optional [trailing] arguments which can be simply
|
|---|
| 762 | omitted (along with the corresponding comma if applicable).
|
|---|
| 763 | 4 numeric_functions
|
|---|
| 764 | Builtin numeric functions
|
|---|
| 765 | int(n) returns the value of 'n' with any fraction truncated
|
|---|
| 766 | [truncation of negative values is towards 0]
|
|---|
| 767 | sqrt(n) the square root of n
|
|---|
| 768 | exp(n) the exponential of n ('e' raised to the 'n'th power)
|
|---|
| 769 | log(n) natural logarithm of n
|
|---|
| 770 | sin(n) sine of n (in radians)
|
|---|
| 771 | cos(n) cosine of n (radians)
|
|---|
| 772 | atan2(m,n) arctangent of m/n (radians)
|
|---|
| 773 | rand() random number in the range 0 to 1 (exclusive)
|
|---|
| 774 | srand(s) sets the random number 'seed' to s, so that a sequence
|
|---|
| 775 | of 'random' numbers can be repeated; returns the
|
|---|
| 776 | previous seed value; srand() [argument omitted] sets
|
|---|
| 777 | the seed to an 'unpredictable' value (based on date
|
|---|
| 778 | and time, for instance, so should be unrepeatable)
|
|---|
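A short sketch exercising several of the numeric functions, assuming a POSIX shell with `awk` on the PATH and the default OFMT; the seed value 42 is arbitrary.

```shell
# Assumes a POSIX shell with awk on the PATH; the seed value is arbitrary.
out=$(awk 'BEGIN {
    print int(3.9), int(-3.9)        # truncation is toward zero
    print sqrt(16), exp(0), log(1)
    print atan2(0, -1)               # pi, shown in the default %.6g format
    srand(42); a = rand()
    srand(42); b = rand()            # same seed, so the sequence repeats
    same = (a == b)
    in_range = (a >= 0 && a < 1)
    print same, in_range
}')
printf '%s\n' "$out"
```

Reseeding with the same value is what makes a 'random' sequence repeatable, which is handy for testing.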
| 779 | 4 string_functions
|
|---|
| 780 | Builtin string functions
|
|---|
| 781 | index(s,t) search string s for substring t; result is 1-based
|
|---|
| 782 | offset of t within s, or 0 if not found
|
|---|
| 783 | length(s) returns the length of string s; either 'length()'
|
|---|
| 784 | with its argument omitted or 'length' without any
|
|---|
| 785 | parenthesized argument list will return length of $0
|
|---|
| 786 | match(s,r) search string s for regular expression r; the offset
|
|---|
| 787 | of the longest, left-most substring which matches
|
|---|
| 788 | is returned, or 0 if no match was found; the builtin
|
|---|
| 789 | variables RSTART and RLENGTH are also set [RSTART to
|
|---|
| 790 | the return value and RLENGTH to the size of the
|
|---|
| 791 | matching substring, or to -1 if no match was found]
|
|---|
| 792 | split(s,a,f) break string s into components based on field
|
|---|
| 793 | separator f and store them in array a (into elements
|
|---|
| 794 | [1], [2], and so on); the last argument is optional,
|
|---|
| 795 | if omitted, the value of FS is used; the return value
|
|---|
| 796 | is the number of components found
|
|---|
| 797 | sprintf(f,e,...) format expression(s) e using format string f and
|
|---|
| 798 | return the result as a string; formatting is similar
|
|---|
| 799 | to the printf function
|
|---|
| 800 | sub(r,t,s) search string target s for regular expression r, and
|
|---|
| 801 | if a match is found, replace the matching text with
|
|---|
| 802 | substring t, then store the result back in s; if s
|
|---|
| 803 | is omitted, use $0 for the string; the result is
|
|---|
| 804 | either 1 if a match+substitution was made, or 0
|
|---|
| 805 | otherwise; if substring t contains the character
|
|---|
| 806 | '&', the text which matched the regular expression
|
|---|
| 807 | is used instead of '&' [to suppress this feature
|
|---|
| 808 | of '&', 'quote' it with a backslash (\); since this
|
|---|
| 809 | will be inside a quoted string which will receive
|
|---|
| 810 | 'backslash' processing before being passed to sub(),
|
|---|
| 811 | *two* consecutive backslashes will be needed "\\&"]
|
|---|
| 812 | gsub(r,t,s) similar to sub(), but gsub() replaces all nonoverlapping
|
|---|
| 813 | substrings instead of just the first, and the return
|
|---|
| 814 | value is the number of substitutions made
|
|---|
| 815 | gensub(r,t,n,s) search string s ($0 if omitted) for regexp r and
|
|---|
| 816 | replace the n'th occurrence with substring t; the
|
|---|
| 817 | result is the new string and s (or $0) remains
|
|---|
| 818 | unchanged; if n begins with letter "g" or "G" then
|
|---|
| 819 | all matches are replaced instead of just the n'th;
|
|---|
| 820 | if r has parenthesized subexpressions in it, t may
|
|---|
| 821 | contain the special sequences \\0, \\1, through \\9
|
|---|
| 822 | which expand into the value of the corresponding
|
|---|
| 823 | subexpression; this function is a gawk extension
|
|---|
| 824 | substr(s,p,l) extract a substring l characters long starting at
|
|---|
| 825 | offset p in string s; l is optional, if omitted then
|
|---|
| 826 | the remainder of the string (p thru end) is returned
|
|---|
| 827 | tolower(s) return a copy of string s in which every uppercase
|
|---|
| 828 | letter has been converted into lowercase
|
|---|
| 829 | toupper(s) analogous to tolower(); convert lowercase to uppercase
|
|---|
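The portable string functions can be tried together as below. This is a sketch assuming a POSIX shell with `awk` on the PATH; gensub() is left out because it is a gawk extension and the `awk` found on a given system may not provide it.

```shell
# Assumes a POSIX shell with awk on the PATH; sample strings are made up.
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "world")
    print length(s)
    pos = match(s, /o+/)             # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = split("a:b:c", part, ":")
    print n, part[1], part[3]
    print substr(s, 1, 5)
    t = s
    sub(/world/, "[&]", t)           # & stands for the matched text
    print t
    n = gsub(/o/, "0", t)            # returns the number of replacements
    print n, t
    print toupper(s)
}')
printf '%s\n' "$out"
```

Note that sub() and gsub() modify their third argument in place, so the original string s is copied into t first.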
| 830 | 4 time_functions
|
|---|
| 831 | Builtin time functions
|
|---|
| 832 | systime() return the current time of day as the number of seconds
|
|---|
| 833 | since some reference point; on VMS the reference point
|
|---|
| 834 | is January 1, 1970, at 12 AM local time (not UTC)
|
|---|
| 835 | strftime(f,t) format time value t using format f; if t is omitted,
|
|---|
| 836 | the default is systime()
|
|---|
| 837 | 5 time_formats
|
|---|
| 838 | Formatting directives similar to the 'printf' & 'sprintf' functions
|
|---|
| 839 | (each is introduced in the format string by preceding it with a
|
|---|
| 840 | percent sign (%)); the directive is substituted by the corresponding
|
|---|
| 841 | value
|
|---|
| 842 | a abbreviated weekday name (Sun,Mon,Tue,Wed,Thu,Fri,Sat)
|
|---|
| 843 | A full weekday name
|
|---|
| 844 | b abbreviated month name (Jan,Feb,...)
|
|---|
| 845 | B full month name
|
|---|
| 846 | c date and time (Unix-style "aaa bbb dd HH:MM:SS YYYY" format)
|
|---|
| 847 | C century prefix (19 or 20) [not century number, ie 20th]
|
|---|
| 848 | d day of month as two digit decimal number (01-31)
|
|---|
| 849 | D date in mm/dd/yy format
|
|---|
| 850 | e day of month with leading space instead of leading 0 ( 1-31)
|
|---|
| 851 | E ignored; following format character used
|
|---|
| 852 | H hour (24 hour clock) as two digit number (00-23)
|
|---|
| 853 | h abbreviated month name (Jan,Feb,...) [same as %b]
|
|---|
| 854 | I hour (12 hour clock) as two digit number (01-12)
|
|---|
| 855 | j day of year as three digit number (001-366)
|
|---|
| 856 | m month as two digit number (01-12)
|
|---|
| 857 | M minute as two digit number (00-59)
|
|---|
| 858 | n 'newline' (ie, treat %n as \n)
|
|---|
| 859 | O ignored; following format character used
|
|---|
| 860 | p AM/PM designation for 12 hour clock
|
|---|
| 861 | r time in AM/PM format ("II:MM:SS p")
|
|---|
| 862 | R time without seconds ("HH:MM")
|
|---|
| 863 | S second as two digit number (00-59)
|
|---|
| 864 | t tab (ie, treat %t as \t)
|
|---|
| 865 | T time ("HH:MM:SS")
|
|---|
| 866 | U week of year (00-53) [first Sunday is first day of week 1]
|
|---|
| 867 | V date (VMS-style "dd-bbb-YYYY" with 'bbb' forced to uppercase)
|
|---|
| 868 | w weekday as decimal digit (0 [Sunday] through 6 [Saturday])
|
|---|
| 869 | W week of year (00-53) [first _Monday_ is first day of week 1]
|
|---|
| 870 | x date ("aaa bbb dd YYYY")
|
|---|
| 871 | X time ("HH:MM:SS")
|
|---|
| 872 | y year without century (00-99)
|
|---|
| 873 | Y year with century (19yy-20yy)
|
|---|
| 874 | Z time zone name (always "local" for VMS)
|
|---|
| 875 | % literal percent sign (%)
|
|---|
| 876 | 4 IO_functions
|
|---|
| 877 | Builtin I/O functions
|
|---|
| 878 | print x,... print the values of one or more expressions; if none
|
|---|
| 879 | are listed, $0 is used; parentheses are optional;
|
|---|
| 880 | when multiple values are printed, the current value
|
|---|
| 881 | of builtin OFS (default is 1 space) is used to
|
|---|
| 882 | separate them; the print line is implicitly
|
|---|
| 883 | terminated with the current value of ORS (default
|
|---|
| 884 | is newline); print does not have a return value
|
|---|
| 885 | printf(f,x,...) print the values of one or more expressions, using
|
|---|
| 886 | the specified format string; null strings are used
|
|---|
| 887 | to supply missing values (if any); no separators between
|
|---|
| 888 | fields or trailing newline characters are printed, so they
|
|---|
| 889 | should be specified within the format string; the
|
|---|
| 890 | argument-enclosing parentheses are optional;
|
|---|
| 891 | printf does not have a return value
|
|---|
| 892 | getline v read a record into variable v; if v is omitted, $0 is
|
|---|
| 893 | used (and NF, NR, and FNR are updated); if v is
|
|---|
| 894 | specified, then field-splitting won't be performed;
|
|---|
| 895 | note: parentheses around the argument are *not*
|
|---|
| 896 | allowed; return value is 1 for successful read, 0
|
|---|
| 897 | if end of file is encountered, or -1 if some sort
|
|---|
| 898 | of error occurred; [see 'redirection' for several
|
|---|
| 899 | variants]
|
|---|
| 900 | close(s) close a file or pipe specified by the string s; the
|
|---|
| 901 | string used should have the same value as the one
|
|---|
| 902 | used in a getline or print/printf redirection
|
|---|
| 903 | fflush(s) flush output stream s; if s is omitted, stdout is
|
|---|
| 904 | flushed; if it is specified but its value is an
|
|---|
| 905 | empty string, all output streams are flushed
|
|---|
| 906 | system(s) pass string s to be executed by the operating system;
|
|---|
| 907 | the command string is executed in a subprocess
|
|---|
| 908 | 5 redirection
|
|---|
| 909 | Both getline and print/printf support variant forms which use
|
|---|
| 910 | redirection and pipes.
|
|---|
| 911 |
|
|---|
| 912 | To read from a file (instead of from the primary input file), use
|
|---|
| 913 | getline var < "file"
|
|---|
| 914 | or getline < "file" (read into $0)
|
|---|
| 915 | where the string "file" represents either an actual file name (in
|
|---|
| 916 | quotes) or a variable which contains a file name string value or an
|
|---|
| 917 | expression which evaluates to a string filename.
|
|---|
| 918 |
|
|---|
| 919 | To create a pipe executing some command and read the result into
|
|---|
| 920 | a variable (or into $0), use
|
|---|
| 921 | "command" | getline var
|
|---|
| 922 | or "command" | getline (read into $0)
|
|---|
| 923 | where "command" is a literal string containing an operating system
|
|---|
| 924 | command or a variable with a string value representing such a
|
|---|
| 925 | command.
|
|---|
| 926 |
|
|---|
| 927 | To output into a file other than the primary output, use
|
|---|
| 928 | print x,... > "file" (or >> "file")
|
|---|
| 929 | or printf(f,x,...) > "file" (or >> "file")
|
|---|
| 930 | similar to the 'getline' example above. '>>' causes output to be
|
|---|
| 931 | appended to an existing file if it exists, or creates the file if
|
|---|
| 932 | it doesn't already exist. '>' always creates a new file. The
|
|---|
| 933 | alternate redirection method of '>$' (for RMS text file attributes)
|
|---|
| 934 | is *only* available on the command line, not with 'print' or
|
|---|
| 935 | 'printf' in the current release.
|
|---|
| 936 |
|
|---|
| 937 | To output an error message, use 'print' or 'printf' and redirect
|
|---|
| 938 | the output to file "/dev/stderr" (or equivalently to "SYS$ERROR:"
|
|---|
| 939 | on VMS). 'stderr' will normally be the user's terminal, even if
|
|---|
| 940 | ordinary output is being redirected into a file.
|
|---|
| 941 |
|
|---|
| 942 | To feed awk output into another command, use
|
|---|
| 943 | print x,... | "command" (similarly for 'printf')
|
|---|
| 944 | similar to the second 'getline' example. In this case, output
|
|---|
| 945 | from awk will be passed as input to the specified operating system
|
|---|
| 946 | command. The command must be capable of reading input from 'stdin'
|
|---|
| 947 | ("SYS$INPUT:" on VMS) in order to receive data in this manner.
|
|---|
| 948 |
|
|---|
| 949 | The 'close' function operates on the "file" or "command" argument
|
|---|
| 950 | specified here (either a literal string or a variable or expression
|
|---|
| 951 | resulting in a string value). It completely closes the file or
|
|---|
| 952 | pipe so that further references to the same file or command string
|
|---|
| 953 | would re-open that file or command at the beginning. Closing a
|
|---|
| 954 | pipe or redirection also releases some file-oriented resources.
|
|---|
| 955 |
|
|---|
| 956 | Note: the VMS implementation of GAWK uses temporary files to
|
|---|
| 957 | simulate pipes, so a command must finish before 'getline' can get
|
|---|
| 958 | any input from it, and 'close' must be called for an output pipe
|
|---|
| 959 | before any data can be passed to the specified command.
|
|---|
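The write, close, and read-back cycle described above can be sketched as follows. The example assumes a POSIX shell with `awk` on the PATH and a Unix-style `echo` command for the pipe; the scratch file name is hypothetical, and on VMS the file specification and command would differ.

```shell
# Assumes a POSIX shell with awk on the PATH; the scratch file is hypothetical.
tmp="${TMPDIR:-/tmp}/awk_redirect_demo.$$"
out=$(awk -v file="$tmp" 'BEGIN {
    print "first"  > file        # ">" creates the file on first use
    print "second" > file        # same open stream, so this follows "first"
    close(file)                  # finish writing before reading it back
    while ((getline line < file) > 0)
        n++
    close(file)
    print n, line                # line holds the last record read
    cmd = "echo from-pipe"
    cmd | getline reply          # read one record of command output
    close(cmd)
    print reply
}')
rm -f "$tmp"
printf '%s\n' "$out"
```

Keeping the file name and the command in variables guarantees that 'close' receives exactly the same string that was used in the redirection.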
| 960 | 5 formats
|
|---|
| 961 | Formatting characters used by the 'printf' and 'sprintf' functions
|
|---|
| 962 | (each is introduced in the format string by preceding it with a
|
|---|
| 963 | percent sign (%))
|
|---|
| 964 | % include a literal percent sign (%) in the result
|
|---|
| 965 | c format the next argument as a single ASCII character
|
|---|
| 966 | (prints first character of string argument, or corresponding
|
|---|
| 967 | ASCII character if numeric argument, e.g. 65 is 'A')
|
|---|
| 968 | s format the next argument as a string (numeric arguments are
|
|---|
| 969 | converted into strings on demand)
|
|---|
| 970 | d decimal number (ie, integer value in base 10)
|
|---|
| 971 | i integer (equivalent to decimal)
|
|---|
| 972 | o octal number (integer in base 8)
|
|---|
| 973 | x hexadecimal number (integer in base 16) [lowercase]
|
|---|
| 974 | X hexadecimal number [digits 'A' thru 'F' in uppercase]
|
|---|
| 975 | f floating point number (digits, decimal point, fraction digits)
|
|---|
| 976 | e exponential (scientific notation) number (digit, decimal
|
|---|
| 977 | point, fraction digits, letter 'e', sign '+' or '-',
|
|---|
| 978 | exponent digits)
|
|---|
| 979 | g 'fractional' number in either 'e' or 'f' format, whichever
|
|---|
| 980 | produces shorter result
|
|---|
| 981 |
|
|---|
| 982 | Several optional modifiers can be placed between the initiating
|
|---|
| 983 | percent sign and the format character (doesn't apply to %%).
|
|---|
| 984 | - left justify (only matters when width specifier is present)
|
|---|
| 985 | (space) for numeric specifiers, prefix nonnegative values with
|
|---|
| 986 | a space and negative values with a minus sign
|
|---|
| 987 | + for numeric specifiers, prefix nonnegative values with a plus
|
|---|
| 988 | sign and negative values with a minus sign
|
|---|
| 989 | # alternate form applicable to several of the format characters
|
|---|
| 990 | (o, x, X, e, E, f, g, G)
|
|---|
| 991 | NN width ['NN' represents 1 or more decimal digits]; actually
|
|---|
| 992 | minimum width to use, longer items will not be truncated; a
|
|---|
| 993 | leading 0 will cause right-justified numbers to be padded on
|
|---|
| 994 | the left with zeroes instead of spaces when they're aligned
|
|---|
| 995 | .MM precision [decimal point followed by 1 or more digits]; used
|
|---|
| 996 | as maximum width for strings (causing truncation if they're
|
|---|
| 997 | actually longer) or as number of fraction digits for 'f' or
|
|---|
| 998 | 'e' numeric formats, or number of significant digits for 'g'
|
|---|
| 999 | numeric format
|
|---|
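To see how the format characters and modifiers combine, here is a sketch that assumes a POSIX shell with `awk` on the PATH; the values are arbitrary, and the brackets are there only to make the padding visible.

```shell
# Assumes a POSIX shell with awk on the PATH; the values are arbitrary.
out=$(awk 'BEGIN {
    printf("[%5d] [%-5d] [%05d]\n", 42, 42, 42)          # width, left justify, zero pad
    printf("[%+d] [%.2f] [%e]\n", 42, 3.14159, 1234.5)   # sign, precision, exponent
    printf("[%.3s] [%8s] [%x] [%X] [%o] [%c] [%%]\n", "abcdef", "abc", 255, 255, 8, 65)
}')
printf '%s\n' "$out"
```

Each printf call supplies its own "\n" because, unlike 'print', printf adds no record terminator.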
| 1000 | 4 user_defined_functions
|
|---|
| 1001 | User-defined functions may be created as needed to simplify awk
|
|---|
| 1002 | programs or to collect commonly used code into one place. The
|
|---|
| 1003 | general syntax of a user-defined function is the 'function' keyword
|
|---|
| 1004 | followed by a unique function name, followed by a comma-separated
|
|---|
| 1005 | parameter list enclosed in parentheses, followed by statement(s)
|
|---|
| 1006 | enclosed within braces ({}). A 'return' statement is customary
|
|---|
| 1007 | but is not required.
|
|---|
| 1008 | function FuncName(arg1,arg2) {
|
|---|
| 1009 | # arbitrary statements
|
|---|
| 1010 | return (arg1 + arg2) / 2
|
|---|
| 1011 | }
|
|---|
| 1012 | If a function does not use 'return' to specify an output value, the
|
|---|
| 1013 | result received by the caller will be unpredictable.
|
|---|
| 1014 |
|
|---|
| 1015 | Functions may be placed in an awk program before, between, or after
|
|---|
| 1016 | the pattern-action rules. The abbreviation 'func' may be used in
|
|---|
| 1017 | place of 'function', unless POSIX compatibility mode is in effect.
|
|---|
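A complete program using user-defined functions might look like this sketch. It assumes a POSIX shell with `awk` on the PATH; mean() mirrors the FuncName example above, and clamp() is a hypothetical helper added to show more than one 'return' in a function.

```shell
# Assumes a POSIX shell with awk on the PATH; function names are hypothetical.
out=$(awk '
function mean(a, b) {
    return (a + b) / 2
}
function clamp(v, lo, hi) {    # limit v to the range lo..hi
    if (v < lo) return lo
    if (v > hi) return hi
    return v
}
BEGIN {
    print mean(3, 4)
    print clamp(12, 0, 10), clamp(-1, 0, 10), clamp(5, 0, 10)
}')
printf '%s\n' "$out"
```

The functions are defined before the BEGIN rule here, but they could equally follow it.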
| 1018 | 3 regular_expressions
|
|---|
| 1019 | A regular expression is a shorthand way of specifying a 'wildcard'
|
|---|
| 1020 | type of string comparison. Regular expression matching is
|
|---|
| 1021 | fundamental to awk's operation.
|
|---|
| 1022 |
|
|---|
| 1023 | Meta symbols
|
|---|
| 1024 | ^ matches beginning of line or beginning of string; note that
|
|---|
| 1025 | embedded newlines ('\n') create multi-line strings, so
|
|---|
| 1026 | beginning of line is not necessarily beginning of string
|
|---|
| 1027 | $ matches end of line or end of string
|
|---|
| 1028 | . any single character (except newline)
|
|---|
| 1029 | [ ] set of characters; [ABC] matches either 'A' or 'B' or 'C'; a
|
|---|
| 1030 | dash (other than first or last of the set) denotes a range
|
|---|
| 1031 | of characters: [A-Z] matches any upper case letter; if the
|
|---|
| 1032 | first character of the set is '^', then the sense of match
|
|---|
| 1033 | is reversed: [^0-9] matches any non-digit; several
|
|---|
| 1034 | characters need to be quoted with backslash (\) if they
|
|---|
| 1035 | occur in a set: '\', ']', '-', and '^'; within sets,
|
|---|
| 1036 | various special character class designations are recognized,
|
|---|
| 1037 | such as [:digit:] and [:punct:], as per POSIX
|
|---|
| 1038 | | alternation (similar to boolean 'or'); match either of two
|
|---|
| 1039 | patterns [for example "^start|stop$" matches leading 'start'
|
|---|
| 1040 | or trailing 'stop']
|
|---|
| 1041 | ( ) grouping, alter normal precedence [for example, "^(start|stop)$"
|
|---|
| 1042 | matches lines reading either 'start' or 'stop']
|
|---|
| 1043 | * repeated matching; when placed after a pattern, indicates that
|
|---|
| 1044 | the pattern should match any number of times [for example,
|
|---|
| 1045 | "[a-z][0-9]*" matches a lower case letter followed by zero or
|
|---|
| 1046 | more digits]
|
|---|
| 1047 | + repeated matching; when placed after a pattern, indicates that
|
|---|
| 1048 | the pattern should match one or more times ["[0-9]+" matches
|
|---|
| 1049 | any non-empty sequence of digits]
|
|---|
| 1050 | ? optional matching; indicates that the pattern can match zero or
|
|---|
| 1051 | one times ["[a-z][0-9]?" matches lower case letter alone or
|
|---|
| 1052 | followed by a single digit]
|
|---|
| 1053 | { } interval specification; {n} to match n times or {m,n} to match
|
|---|
| 1054 | at least m but not more than n times; only functional when
|
|---|
| 1055 | either the `-W posix' or `-W re-interval' option is used
|
|---|
| 1056 | \ quote; prevent the character which follows from having special
|
|---|
| 1057 | meaning; if the regexp is specified as a string, then the
|
|---|
| 1058 | backslash itself will need to be quoted by preceding it with
|
|---|
| 1059 | another backslash
|
|---|
| 1060 |
|
|---|
| 1061 | A regular expression which matches a string or line will match against
|
|---|
| 1062 | the first (left-most) substring which meets the pattern and include
|
|---|
| 1063 | the longest sequence of characters which still meets that pattern.
|
|---|
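A few of the points above (grouping versus bare alternation, the left-most longest match, and doubling the backslash when the regexp is given as a string) can be checked with a short program. This is an illustrative sketch; the helper names is_keyword and first_digits are made up for the example.

```awk
# is_keyword(): whole-string match using grouping, as in "^(start|stop)$".
function is_keyword(s) { return s ~ /^(start|stop)$/ }

# first_digits(): left-most, longest run of digits in s, or "" if there is none.
function first_digits(s) { return match(s, /[0-9]+/) ? substr(s, RSTART, RLENGTH) : "" }

BEGIN {
    print is_keyword("stop")                 # prints 1
    print is_keyword("start here")           # prints 0: grouping anchors both ends
    print ("start here" ~ /^start|stop$/)    # prints 1: leading 'start' is enough
    print first_digits("xx123yy45")          # prints 123
    print ("a.b" ~ "a\\.b")                  # prints 1: doubled backslash quotes the dot
    print ("axb" ~ "a\\.b")                  # prints 0
}
```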
| 1064 | 3 comments
|
|---|
| 1065 | Comments in awk programs are introduced with '#'. Anything after
|
|---|
| 1066 | '#' on a line is ignored by GAWK. It's a good idea to include an
|
|---|
| 1067 | explanation of what an awk program is doing and also who wrote it
|
|---|
| 1068 | and when.
|
|---|
| 1069 | 3 further_information
|
|---|
| 1070 | For complete documentation on GAWK, see "Effective AWK Programming"
|
|---|
| 1071 | by Arnold Robbins. The second edition (ISBN 1-57831-000-8) is jointly
|
|---|
| 1072 | published by SSC and the FSF (http://www.ssc.com).
|
|---|
| 1073 |
|
|---|
| 1074 | Source text for it is present in the file GAWK.TEXI. A PostScript
|
|---|
| 1075 | version is available via anonymous FTP from host gnudist.gnu.org in
|
|---|
| 1076 | directory /gnu/gawk, file gawk-{version}-doc.tar.gz where {version}
|
|---|
| 1077 | would be the current version number, such as 3.0.6.
|
|---|
| 1078 |
|
|---|
| 1079 | Another source of documentation is "The AWK Programming Language"
|
|---|
| 1080 | by Aho, Weinberger, and Kernighan (1988), published by Addison-Wesley.
|
|---|
| 1081 | ISBN code is 0-201-07981-X.
|
|---|
| 1082 |
|
|---|
| 1083 | Each of these works contains both a reference on the awk language
|
|---|
| 1084 | and a tutorial on awk's use, with many sample programs.
|
|---|
| 1085 | 3 authors
|
|---|
| 1086 | The awk programming language was originally created by Alfred V. Aho,
|
|---|
| 1087 | Peter J. Weinberger, and Brian W. Kernighan in 1977. The language
|
|---|
| 1088 | was revised and enhanced in a new version which was released in 1985.
|
|---|
| 1089 |
|
|---|
| 1090 | GAWK, the GNU implementation of awk, was written in 1986 by Paul Rubin
|
|---|
| 1091 | and Jay Fenlason, with advice from Richard Stallman, and with
|
|---|
| 1092 | contributions from John Woods. In 1988 and 1989, David Trueman and
|
|---|
| 1093 | Arnold Robbins revised GAWK for compatibility with the newer awk.
|
|---|
| 1094 | Arnold Robbins is the current maintainer.
|
|---|
| 1095 |
|
|---|
| 1096 | GAWK version 2.11.1 was ported to VMS by Pat Rankin in November, 1989,
|
|---|
| 1097 | with further revisions in the Spring of 1990. The VMS port was
|
|---|
| 1098 | incorporated into the official GNU distribution of version 2.13 in
|
|---|
| 1099 | Spring 1991. (Version 2.12 was never publicly released.)
|
|---|
| 1100 | 2 release_notes
|
|---|
| 1101 | GAWK 3.1.2 changes how a command line consisting of a single token
|
|---|
| 1102 | is parsed; in earlier versions that case often yielded a "missing
|
|---|
| 1103 | required element" error.
|
|---|
| 1104 |
|
|---|
| 1105 | [Note for 3.1.x: these release notes haven't been updated in quite
|
|---|
| 1106 | some time. Most of the information is still applicable though.]
|
|---|
| 1107 |
|
|---|
| 1108 | GAWK 3.0.3 tested under VAX/VMS V6.2 and Alpha/VMS V6.2, April, 1997;
|
|---|
| 1109 | should be compatible with VMS versions V4.6 and later. Current source
|
|---|
| 1110 | code is compatible with DEC's DEC C v5.x or VAX C v3.2; also compiles
|
|---|
| 1111 | successfully with GNU C (tested with gcc-vms 2.7.1).
|
|---|
| 1112 | 3 AWK_LIBRARY
|
|---|
| 1113 | GAWK uses a built-in search path when looking for a program file
|
|---|
| 1114 | specified by the -f option (or the /input qualifier) when that file
|
|---|
| 1115 | name does not include a device and/or directory. GAWK will first
|
|---|
| 1116 | look in the current default directory, then if the file wasn't found
|
|---|
| 1117 | it will look in the directory specified by the translation of logical
|
|---|
| 1118 | name "AWK_LIBRARY".
|
|---|
| 1119 | 3 known_problems
|
|---|
| 1120 | There are several known problems with GAWK running on VMS. Some can
|
|---|
| 1121 | be ignored, others require work-arounds.
|
|---|
| 1122 | 4 file_formats
|
|---|
| 1123 | If a file having the RMS attribute "Fortran carriage control" is
|
|---|
| 1124 | read as input, it will generate an empty first record if the first
|
|---|
| 1125 | actual record begins with a space (leading space becomes a newline).
|
|---|
| 1126 | Also, the last record of the file will give a "record not terminated"
|
|---|
| 1127 | warning. Both of these minor problems are due to the way that the
|
|---|
| 1128 | C Run-Time Library (VAXCRTL) converts record attributes.
|
|---|
| 1129 |
|
|---|
| 1130 | Another poor feature without a work-around is that there's no way to
|
|---|
| 1131 | specify "append if possible, create with RMS text attributes if not"
|
|---|
| 1132 | with the current command line I/O redirection. '>>$' isn't supported.
|
|---|
| 1133 | Ditto for binary output; '>>+' isn't supported.
|
|---|
| 1134 | 4 RS_peculiarities
|
|---|
| 1135 | Changing the record separator to something other than newline ('\n')
|
|---|
| 1136 | will produce anomalous results for ordinary files. For example,
|
|---|
| 1137 | using RS = "\f" and FS = "\n" with the following input
|
|---|
| 1138 | |rec 1, line 1
|
|---|
| 1139 | |rec 1, line 2
|
|---|
| 1140 | |^L (form feed)
|
|---|
| 1141 | |rec 2, line 1
|
|---|
| 1142 | |rec 2, line 2
|
|---|
| 1143 | |^L (form feed)
|
|---|
| 1144 | |rec 3, line 1
|
|---|
| 1145 | |rec 3, line 2
|
|---|
| 1146 | |(end of file)
|
|---|
| 1147 | will produce two fields for record 1, but three fields each for
|
|---|
| 1148 | records 2 and 3. This is because the form-feed record delimiter is
|
|---|
| 1149 | on its own line, so awk sees a newline after it. Since newline is
|
|---|
| 1150 | now a field separator, records 2 and 3 will have null first fields.
|
|---|
| 1151 | The following awk code will work around this problem by inserting
|
|---|
| 1152 | a null first field in the first record, so that all records can be
|
|---|
| 1153 | handled the same by subsequent processing.
|
|---|
| 1154 | # fix up for first record (RS != "\n")
|
|---|
| 1155 | FNR == 1 { if ( $0 == "" ) #leading separator
|
|---|
| 1156 | next #skip its null record
|
|---|
| 1157 | else #otherwise,
|
|---|
| 1158 | $0 = FS $0 #realign fields
|
|---|
| 1159 | }
|
|---|
| 1160 | There is a second problem with this same example. It will always
|
|---|
| 1161 | trigger a "record not terminated" warning when it reaches the end of
|
|---|
| 1162 | file. In the sample shown, there is no final separator; however, if
|
|---|
| 1163 | a trailing form-feed were present, it would produce a spurious final
|
|---|
| 1164 | record with two null fields. This occurs because the I/O system
|
|---|
| 1165 | sees an implicit newline at the end of the last record, so awk sees
|
|---|
| 1166 | a pair of null fields separated by that newline. The following code
|
|---|
| 1167 | fragment will fix that provided there are no null records (in this
|
|---|
| 1168 | case, that would be two consecutive lines containing just form-feeds).
|
|---|
| 1169 | # fix up for last record (RS != "\n")
|
|---|
| 1170 | $0 == FS { next } #drop spurious final record
|
|---|
| 1171 | Note that the "record not terminated" warning will persist.
|
|---|
| 1172 | 4 cmd_inconsistency
|
|---|
| 1173 | The DCL qualifier /OUTPUT is internally equivalent to '>$' output
|
|---|
| 1174 | redirection, but the qualifier /INPUT corresponds to the -f option
|
|---|
| 1175 | rather than to '<' input redirection.
|
|---|
| 1176 | 4 exit
|
|---|
| 1177 | The exit statement can optionally pass a final status value to the
|
|---|
| 1178 | operating system. GAWK expects a UN*X-style value instead of a
|
|---|
| 1179 | VMS status value, so 0 indicates success and non-zero indicates
|
|---|
| 1180 | failure. The final exit status will be 1 (VMS success) if 0 is
|
|---|
| 1181 | used, or an even value (VMS non-success) if non-zero is used.
|
|---|
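A program can therefore keep to the UN*X convention and leave the translation to GAWK. The sketch below assumes a hypothetical helper, exit_code, which turns a true/false result into 0 or 1. According to the description above, `exit 0' would be reported to VMS as status 1 (success) and `exit 1' as an even (non-success) value.

```awk
# exit_code(): hypothetical helper; 0 means success, 1 means failure (UN*X style).
function exit_code(ok) { return ok ? 0 : 1 }

BEGIN {
    status = exit_code(1 + 1 == 2)   # the condition holds, so status is 0
    if (status != 0)
        exit status                  # non-zero: passed on to VMS as a non-success value
    print "checks passed"
}
```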
| 1182 | !3 changes
|
|---|
| 1183 | 3 prior_changes
|
|---|
| 1184 | Changes between version 3.0.6 and 2.15.6:
|
|---|
| 1185 |
|
|---|
| 1186 | General
|
|---|
| 1187 | RS can contain multiple characters or be a regexp
|
|---|
| 1188 | Regular expression interval support added
|
|---|
| 1189 | gensub() and fflush() functions added
|
|---|
| 1190 | memory leak(s) introduced in 3.0.2 or 3.0.1 fixed
|
|---|
| 1191 | the user manual has been substantially revised
|
|---|
| 1192 |
|
|---|
| 1193 | VMS-specific
|
|---|
| 1194 | Switched to build with DEC C by default
|
|---|
| 1195 | Changes between version 2.15.6 and 2.14:
|
|---|
| 1196 |
|
|---|
| 1197 | General
|
|---|
| 1198 | Many obscure bugs fixed
|
|---|
| 1199 | `delete' may operate on an entire array
|
|---|
| 1200 | ARGIND and ERRNO builtin variables added
|
|---|
| 1201 |
|
|---|
| 1202 | VMS-specific
|
|---|
| 1203 | `>+ file' binary-mode output redirection added
|
|---|
| 1204 | /variable=(foo=42) fixed
|
|---|
| 1205 | Floating point number formatting improved
|
|---|
| 1206 |
|
|---|
| 1207 | Changes between version 2.14 and 2.13.2:
|
|---|
| 1208 |
|
|---|
| 1209 | General
|
|---|
| 1210 | 'next file' construct added
|
|---|
| 1211 | 'continue' outside of any loop is treated as 'next'
|
|---|
| 1212 | Assorted bug fixes and efficiency improvements
|
|---|
| 1213 | _The_GAWK_Manual_ updated
|
|---|
| 1214 | Test suite expanded
|
|---|
| 1215 |
|
|---|
| 1216 | VMS-specific
|
|---|
| 1217 | VMS POSIX support added
|
|---|
| 1218 | Disk I/O throughput enhanced
|
|---|
| 1219 | Pipe emulation improved and incorrect interaction with user-mode
|
|---|
| 1220 | redefinition of SYS$OUTPUT eliminated
|
|---|
| 1221 |
|
|---|
| 1222 | Changes between version 2.13 and 2.11.1: (2.12 was not released)
|
|---|
| 1223 |
|
|---|
| 1224 | General
|
|---|
| 1225 | CONVFMT and FIELDWIDTHS builtin control variables added
|
|---|
| 1226 | systime() and strftime() date/time functions added
|
|---|
| 1227 | 'lint' and 'posix' run-time options added
|
|---|
| 1228 | '-W' command line option syntax supersedes '-c', '-C', and '-V'
|
|---|
| 1229 | '-a' and '-e' regular expression options made obsolete
|
|---|
| 1230 | Various bug fixes and efficiency improvements
|
|---|
| 1231 | More platforms supported ('officially' including VMS)
|
|---|
| 1232 |
|
|---|
| 1233 | VMS-specific
|
|---|
| 1234 | %g printf format fixed
|
|---|
| 1235 | Handling of '\' on command line modified; no longer necessary to
|
|---|
| 1236 | double it up
|
|---|
| 1237 | Problem redirecting stderr (>&efile) at same time as stdin (<ifile)
|
|---|
| 1238 | or stdout (>ofile) has been fixed
|
|---|
| 1239 | ``2>&1'' and ``1>&2'' redirection constructs added
|
|---|
| 1240 | Interaction between command line I/O redirection and gawk pipes
|
|---|
| 1241 | fixed; also, name used for pseudo-pipe temporary file expanded
|
|---|
| 1242 | 3 license
|
|---|
| 1243 | GAWK is covered by the "GNU General Public License", the gist of which
|
|---|
| 1244 | is that if you supply this software to a third party, you are expressly
|
|---|
| 1245 | forbidden to prevent them from supplying it to a fourth party, and if
|
|---|
| 1246 | you supply binaries you must make the source code available to them
|
|---|
| 1247 | at no additional cost. Any revisions or modified versions are also
|
|---|
| 1248 | covered by the same license. There is no warranty, express or implied,
|
|---|
| 1249 | for this software. It is provided "as is."
|
|---|
| 1250 |
|
|---|
| 1251 | [Disclaimer: This is just an informal summary with no legal basis;
|
|---|
| 1252 | refer to the actual GNU General Public License for specific details.]
|
|---|
| 1253 | !2 examples
|
|---|
| 1254 | !
|
|---|