source: trunk/essentials/sys-apps/gawk/vms/gawk.hlp

Last change on this file was 3076, checked in by bird, 18 years ago

gawk 3.1.5

1! Gawk.Hlp
2! Pat Rankin, Jun'90
3! revised, Jun'91
4! revised, Jul'92
5! revised, Jan'95
6! revised, Apr'97
7! revised, Jan'03
8! Online help for GAWK.
9!
101 GAWK
11 GAWK is GNU awk, the Free Software Foundation's implementation of
12 the awk programming language. awk is an interpretive language which
13 can handle many data-reformatting jobs with just a few lines of code.
14 It has powerful string manipulation and pattern matching capabilities
15 built in. This version is compatible with POSIX 1003.2 awk.
16
17 The VMS version of GAWK supports both the original UN*X-style command
18 interface and a DCL interface. The only setup requirement for GAWK
19 is to define it as a 'foreign' command: a DCL symbol with a value
20 which begins with '$'.
21 $ GAWK :== $disk:[directory]GAWK
222 GNU_syntax
23 GAWK's UN*X-style interface uses the 'dash' convention for specifying
24 options and uses spaces to separate multiple arguments.
25
26 There are two main alternatives, depending on how the awk program is
27 to be passed to GAWK. Both alternatives share most options.
28
29 Usage: $ gawk [-W opts] [-F fs] [-v var=val] -f progfile [--] file ...
30 or $ gawk [-W opts] [-F fs] [-v var=val] [--] "program" file ...
31
32 The options are case-sensitive. On VMS, the DCL command interpreter
33 converts unquoted text into uppercase before passing it to the running
34 program. However, GAWK is written in 'C' and the C Run-Time Library
35 (VAXCRTL or DECC$SHR) converts unquoted text into *lowercase*.
36 Therefore, the -Fval and -W options must be enclosed in quotes.
373 options
38 -f file use the specified file as the awk program source; if more
39 than one instance of -f is used, each file will be read
40 in succession
41 -Fstring define a value for the FS variable (field separator)
42 -v var=val assign a value of 'val' to the variable 'var'
43 -W 'options' additional gawk-specific options; multiple values may
44 be separated by commas, or by spaces if they're quoted,
45 or multiple occurrences of -W may be used.
46 -W compat use awk "compatibility mode" to disable GAWK extensions
47 and get the behavior of UN*X awk.
48 -W copyright [or -W copyleft] display an abbreviated version of
49 the GNU copyright information
50 -W help list command line options (same as -W usage)
51 -W lint warn about suspect or non-portable awk program code
52 -W lint-old warn about constructs not available in original awk
53 -W posix compatibility mode with additional restrictions
54 -W re-interval evaluate '{' and '}' as intervals in regular expressions
55 -W traditional suppress POSIX and GNU regular expression extensions
56 -W usage list command line options (same as -W help)
57 -W version display program version number
58 -- don't check further arguments for leading dash
593 program_text
60 If the '-f file' option is not used on the command line, then the
61 first "non-dash" argument is assumed to be a string of text containing
62 the awk source program. Here is a complete sample program:
63 $ gawk -- "BEGIN {print ""\nHello, World!\n""}"
64 This program would print a blank line (based on first "\n"), followed
65 by a line reading "Hello, World!", followed by another blank line
66 (since awk's 'print' statement includes the trailing 'newline').
67
68 On VMS, to include a quote character inside of a quoted string, two
69 successive quotes ("") must be used.
703 data_files
71 After all dash-options are examined, and after the program text if
72 there were no occurrences of the -f option, remaining (space separated)
73 command line arguments are considered to be data files for the awk
74 program to process. If any of these actually contains an equals sign
75 (=), then it is interpreted as a variable assignment instead of a data
76 file. The syntax is 'variable_name=value'. For example, the command
77 $ gawk -f myprog.awk infile.one flag=2 start=0 infile.two
78 would read file 'infile.one' for the program in 'myprog.awk', then it
79 would set 'flag' to 2 and 'start' to 0, and finally it would read file
80 'infile.two' for the program. Note that in a case like this, the two
81 assignments actually occur after the first file has been processed,
82 not at program startup when the command line is first scanned.
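
The timing of command line assignments can be checked with a small experiment. The sketch below is only an illustration: it assumes a POSIX shell and an interpreter installed under the name `awk` (on VMS the same arguments would follow `$ gawk`), and the scratch file names are hypothetical.

```shell
# Scratch files (hypothetical names) in a temporary directory.
dir=$(mktemp -d)
printf 'first\n'  > "$dir/one.txt"
printf 'second\n' > "$dir/two.txt"

# 'flag' is still unset while one.txt is read; it is set to 2
# only when the argument scan reaches it, just before two.txt.
out=$(awk '{ print "[" flag "]", $0 }' "$dir/one.txt" flag=2 "$dir/two.txt")
printf '%s\n' "$out"
rm -r "$dir"
```

The record from the first file is printed with an empty 'flag'; the record from the second file sees the value 2.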
833 IO_redirection
84 The command parsing in the VMS implementation of GAWK does some
85 emulation of a UN*X-style shell, where certain characters on the
86 command line have special meaning. In particular, the symbols '<',
87 '>', '|', '*', and '?' receive special handling before the main part
88 of the program has a chance to see them. The symbols '<' and '>'
89 perform some file manipulation from the command line:
90
91 <ifile open file 'ifile' (readonly) as 'stdin' [SYS$INPUT]
92 >nfile create 'nfile' as 'stdout' [SYS$OUTPUT], in stream-lf format
93 >>ofile append to 'ofile' for 'stdout'; create it if necessary
94 >&efile point 'stderr' [SYS$ERROR] at 'efile', but don't open it yet
95 >$vfile create 'vfile' as 'stdout', using RMS attributes appropriate
96 for a standard text file (variable length records with
97 implied carriage control)
98 >+bfile create 'bfile' as 'stdout' using binary mode
99 2>&1 route error messages into the regular output stream
100 1>&2 send output data to the error destination
101 <<sentinel error; reading stdin until 'sentinel' not supported
102 <-, >- error; closure of stdin or stdout from cmd line not supported
103 >>$vfile incorrect; would be interpreted as file "$vfile" in stream-lf
104 format rather than as file "vfile" in RMS 'text' format
105 | error; command line pipes not supported
1063 wildcard_expansion
107 The command parsing in the VMS implementation of GAWK does some
108 emulation of a UN*X-style shell, where certain characters on the
109 command line have special meaning. In particular, the symbols '<',
110 '>', '*', '%', and '?' receive special handling before the main part
111 of the program has a chance to see them. The symbols '*', '%' and '?'
112 are used as wildcards in filenames. '*' and '%' have their usual VMS
113 meanings of multiple character and single character wildcards,
114 respectively, and '?' is also treated as a single character wildcard.
115 Wildcard expansion only works for filenames specified in native VMS
116 filename syntax (eg, "[-.sibling]*"), not for ones specified in
117 pseudo-Unix syntax (eg, "../sibling/*").
118
119 When a command line argument that should be a filename contains any
120 of the wildcard characters, a directory lookup is attempted for files
121 which match the specified pattern. If one or more matching files are
122 found, those filenames are put into the command line in place of the
123 original pattern. If no matching files are found, the original
124 pattern is left in place.
1252 DCL_syntax
126 GAWK's DCL-style interface is more or less a standard DCL command, with
127 one required parameter. Multiple values--when present--are separated
128 by commas.
129
130 There are two main alternatives, depending on how the awk program is
131 to be passed to GAWK. Both alternatives share most options.
132
133 Usage: GAWK /COMMANDS="awk program text" data_file[,data_file,...]
134 or GAWK /INPUT=awk_file data_file[,"Var=value",data_file,...]
135 ( or GAWK /INPUT=(awk_file1,awk_file2,...) data_file[,...] )
1363 Parameter
137 data_file[,data_file,...] (data_file data_file ...)
138 data_file[,"Var=value",...,data_file,...] (data_file Var=value &c)
139
140 Data file(s) for the awk program to process. If any of these
141 actually contains an equals sign (=), then it is interpreted as
142 a variable assignment instead of a data file. The syntax is
143 "variable_name=value". Quotes are required for non-file parameters.
144
145 For example, the command
146 $ gawk/input=myprog.awk infile.one,"flag=2","start=0",infile.two
147 would read file 'infile.one' for the program in 'myprog.awk', then it
148 would set 'flag' to 2 and 'start' to 0, and finally it would read file
149 'infile.two' for the program. Note that in a case like this, the two
150 assignments actually occur after the first file has been processed,
151 not at program startup when the command line is first scanned.
152
153 Wildcard file lookups are attempted on data file specifications. See
154 subtopic 'GAWK GNU_syntax wildcard_expansion' for details.
155
156 At least one data_file parameter value is required. An exception is
157 made if /usage, /version, or /copyright is specified *and* if GAWK is
158 defined as a 'foreign' command rather than a 'native' DCL command.
1593 Qualifiers
160/COMMANDS
161 /COMMANDS="awk program text" (-- "awk program text")
162
163 For short programs, it is possible to include the complete program
164 on the command line. The quotes are required. Here is a complete
165 sample program:
166 $ gawk/commands="BEGIN {print ""\nHello, World!\n""}" NL:
167 This program would print a blank line (based on first "\n"), followed
168 by a line reading "Hello, World!", followed by another blank line
169 (since awk's 'print' statement includes the trailing 'newline').
170
171 To include a quote character inside of a quoted string, two
172 successive quotes ("") must be used.
173
174 Either /COMMANDS or /INPUT (but not both) must be supplied.
175/INPUT
176 /INPUT=(awk_file1,awk_file2) (-f awk_file1 -f awk_file2)
177
178 Used to specify one or more files containing the source code of
179 the awk program. If more than one file is used, separate them
180 with commas and enclose the list in parentheses.
181
182 Multiple source files are processed in order as if they had been
183 concatenated together.
184
185 Either /INPUT or /COMMANDS (but not both) must be supplied.
186/FIELD_SEPARATOR
187 /FIELD_SEPARATOR="FS_value" (-F"FS_value")
188
189 Assign a value to the built in variable FS (field separator).
190/VARIABLES
191 /VARIABLES=("Var1=val1","Var2=val2",...) (-v Var1=val1 -v Var2=val2)
192
193 Assign value(s) to the specified variable(s).
194/REG_EXPR
195 /REG_EXPR={AWK | EGREP | POSIX} (-a vs -e options [obsolete])
196
197 This qualifier is obsolete and has no effect.
198/STRICT
199 /[NO]STRICT (-"W compat" option)
200
201 Use strict awk compatibility mode (/strict) and suppress GAWK
202 extensions. The default is /NOSTRICT.
203/POSIX
204 /[NO]POSIX (-"W posix" option)
205
206 Use POSIX compatibility mode (/posix) and suppress GAWK extensions.
207 The default is /NOPOSIX. Slightly more restrictive than /strict.
208/LINT
209 /[NO]LINT (-"W lint" option)
210
211 Check the awk program carefully for potential problems that might
212 be encountered if it were to be used with other awk implementations,
213 and print warnings for anything found. The default is /NOLINT.
214/VERSION
215 /VERSION (-"W version" option)
216
217 Print GAWK's version number.
218/COPYRIGHT
219 /COPYRIGHT (-"W copyright" or -"W copyleft" option)
220
221 Print a brief version of GAWK's copyright notice.
222/USAGE
223 /USAGE (comparable to -"W usage" or -"W help" option)
224
225 Print a compact summary of the command line options.
226
227 After the 'usage' message is printed, GAWK terminates regardless
228 of any other command line options.
229/OUTPUT
230 /OUTPUT=out_file (>$out_file)
231
232 Write program output into 'out_file'. The default is SYS$OUTPUT.
2332 awk_language
234 An awk program consists of one or more pattern-action pairs, sometimes
235 referred to as "rules". For each record of an input (data) file, the
236 rules are checked sequentially. Any pattern which matches the input
237 record triggers that rule's action. Actions are instructions which
238 resemble statements in the 'C' programming language. Patterns come
239 in several varieties, including field comparisons, regular expression
240 matching, and special cases defined by reserved keywords.
241
242 All awk keywords and variables are case-sensitive. Text matching is
243 also sensitive to character case unless the builtin variable IGNORECASE
244 is set to a non-zero value.
2453 rules
246 The syntax for a pattern-action 'rule' is simply
247 PATTERN { ACTION }
248 where the braces ({}) are required punctuation for the action.
249 Semicolons (;) or 'newlines' (ie, having the text on a separate line)
250 delimit multiple rules and also multiple actions within a given rule.
251 Either the pattern or the action may be omitted; an empty pattern
252 matches every record of the input file; a missing action (not an empty
253 action inside of braces) is an implicit request to print the current
254 record; an empty action (ie, {}) is legal but not very useful.
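
The three cases described above (missing action, missing pattern, empty action) can be compared side by side. This is a minimal sketch assuming a POSIX shell and an interpreter named `awk`; the sample records are invented for the illustration.

```shell
data='apple\nbanana\ncherry\n'

# Pattern only: the missing action defaults to printing the record.
out1=$(printf "$data" | awk '/an/')

# Action only: the empty pattern matches every record.
out2=$(printf "$data" | awk '{ print NR ": " $0 }')

# Empty action in braces: the record matches, but nothing is done.
out3=$(printf "$data" | awk '/an/ {}')

printf '%s\n---\n%s\n---\n%s\n' "$out1" "$out2" "$out3"
```

Only the first form prints 'banana' by itself; the third form is legal but produces no output.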
2553 patterns
256 There are several types of patterns available for awk rules.
257
258 expression an 'expression' is something to be evaluated (perhaps
259 a comparison or function call) which will
260 be considered true if non-zero (for numeric
261 results) or if non-null (for strings)
262 /regular_expression/ slashes (/) delimit a regular expression
263 which is used as a pattern
264 pattern1, pattern2 a pair of patterns separated by a comma (,),
265 which causes a range of records to trigger
266 the associated action; the records which
267 match the patterns are included in the range
268 <null> an omitted pattern (in this text, the string '<null>'
269 is displayed, but in an awk program, it
270 would really be blank) matches every record
271 BEGIN keyword for specifying a rule to be executed prior to
272 reading the 1st record of the 1st input file
273 END keyword for specifying a rule to be executed after
274 handling the last input record of last file
2754 examples
276 Some example patterns (mostly with the corresponding actions omitted)
277
278 NF > 0 # comparison expression: matches non-null records
279 $0 # implied comparison: also matches non-null records
280 $2 > 1000 && sum <= 999999 # slightly more elaborate expression
281 /x/ # regular expression matching any record with an 'x' in it
282 /^ / # reg-expr matching records beginning with a space
283 $1 == "start", $NF == "stop" # range pattern for input in which
284 some data lines begin with 'start' and/or end with
285 'stop' in order to collect groups of records
286 { sum += $1 } # null pattern: its action (add field #1 to
287 variable 'sum') would be executed for every record
288 BEGIN { sum = 0 } # keyword 'BEGIN': perform this action before
289 reading the input file (note: initialization to 0 is
290 unnecessary in awk)
291 END { print "total =", sum } # keyword 'END': perform this
292 action after the last input record has been processed
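
As a sketch of how the range pattern and the END keyword combine, the following run selects the records from 'start' through 'stop' inclusive and counts them. It assumes a POSIX shell and an interpreter named `awk`; the input records are made up for the illustration.

```shell
out=$(printf 'skip 1\nstart 2\nmid 3\nend stop\nskip 5\n' |
      awk '$1 == "start", $NF == "stop" { n++; print }
           END { print "records in range =", n }')
printf '%s\n' "$out"
```

The records that match the first and second patterns are themselves part of the range, so three records are printed before the total.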
2933 actions
294 An 'action' is something to do when a given record has matched the
295 corresponding pattern in a rule. In general, actions resemble 'C'
296 statements and expressions. The action in a rule must be enclosed
297 in braces ({}).
298
299 Each action can contain more than one statement or expression to be
300 executed, provided that they're separated by semicolons (;) and/or
301 on separate lines.
302
303 An omitted action is equivalent to
304 { print $0 }
305 which prints the current record.
3063 operators
307 Relational operators
308 == compare for equality
309 != compare for inequality
310 <, <=, >, >= numerical or lexical comparison (less than, less or
311 equal, greater than, greater or equal, respectively)
312 ~ match against a regular expression
313 !~ match against a regular expression, but accept failed matches
314 instead of successful ones
315 Arithmetic operators
316 + addition
317 - subtraction
318 * multiplication
319 / division
320 % remainder
321 ^, ** exponentiation ('**' is a synonym for '^', unless POSIX
322 compatibility is specified, in which case it's invalid)
323 Boolean operators (aka Logical operators)
324 a value is considered false if it's 0 or a null string,
325 it is true otherwise; the result of a boolean operation
326 (and also of a comparison operation) will be 0 when false
327 or 1 when true
328 || or [expression (a || b) is true if either a is true or b
329 is true or both a and b are true; it is false otherwise;
330 b is not evaluated unless a is false (ie, short-circuit)]
331 && and [expression (a && b) is true if both a and b are true;
332 it is false otherwise; b is only evaluated if a is true]
333 ! not [expression (!a) is true if a is false, false otherwise]
334 in array membership; the keyword 'in' tests whether the value
335 on the left represents a current subscript in the array
336 named on the right
337 Conditional operator
338 ? : the conditional operator takes three operands; the first is
339 an expression to evaluate, the second is the expression to
340 use if the first was true, the third is the expression to
341 use if it was false [simple example (a < b ? b : a) gives
342 the maximum of a and b]
343 Assignment operators
344 = store the value on the right into the variable or array slot
345 on the left [expression (a = b) stores the value of b in a]
346 +=, -=, *=, /=, %=, ^=, **= perform the indicated arithmetic
347 operation using the current value of the variable or array
348 element of the left side and the expression on the right
349 side, then store the result in the left side
350 ++ increment by 1 [expression (++a) gets the current value of
351 a and adds 1 to it, stores that back in a, and returns the
352 new value; expression (a++) gets the current value of a,
353 adds 1 to it, stores that back in a, but returns the
354 original value of a]
355 -- decrement by 1 (analogous to increment)
356 String operators
357 there is no explicit operator for string concatenation;
358 two values and/or variables side-by-side are implicitly
359 concatenated into a string (numeric values are first
360 converted into their string equivalents)
361 Conversion between numeric and string values
362 there is no explicit operator for conversion; adding 0
363 to a string will force it to be converted to a number
364 (the numeric value will be 0 if the string does not
365 begin with an integer or floating point number); the
366 reverse, converting a number into a string, is done by
367 concatenating a null string ("") to it [the expression
368 (5.75 "") evaluates to "5.75"]
369 Field 'operator'
370 $ prefixing a number or variable with a dollar sign ($)
371 causes the appropriate record field to be returned [($2)
372 gives the second field of the record, ($NF) gives the
373 last field (since the builtin variable NF is set to the
374 number of fields in the current record)]
375 Array subscript operator
376 , multi-dimensional arrays are simulated by using comma (,)
377 separated array indices; the actual index is generated
378 by replacing commas with the value of builtin SUBSEP,
379 then concatenating the expression into a string index
380 [comma is also used to separate arguments in function
381 calls and user-defined function definitions]
382 [comma is *also* used to indicate a range pattern in an
383 awk rule]
384 Escape 'operator'
385 \ In quoted character strings, the backslash (\) character
386 causes the following character to be interpreted in a
387 special manner [string "one\ntwo" has an embedded newline
388 character (linefeed on VMS, but treated as if it were both
389 carriage-return and linefeed); string "\033[" has an ASCII
390 'escape' character (which has octal value 033) followed by
391 a 'left-bracket' character]
392 Backslash is also used in regular expressions
393 Redirection operators
394 < Read-from -- valid with 'getline'
395 > Write-to (create new file) -- valid with 'print' and 'printf'
396 >> Append-to (create file if it doesn't already exist)
397 | Pipe-from/to -- valid with 'getline', 'print', and 'printf'
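
Several of the operators above can be exercised in a single BEGIN rule. The following is a minimal sketch, assuming a POSIX shell and an interpreter named `awk`; the variable and array names are hypothetical.

```shell
out=$(awk 'BEGIN {
    a = 3; b = 7
    print (a < b ? b : a)      # conditional operator: maximum of a and b
    s = a b                    # juxtaposition concatenates the two values
    print s
    print "42" + 1             # string converted to a number by arithmetic
    print "abc" + 0            # no leading number, so the value is 0
    print 5.75 ""              # number converted to a string
    grid[2, 3] = "x"           # stored under the index 2 SUBSEP 3
    key = 2 SUBSEP 3
    r1 = ((2, 3) in grid); r2 = (key in grid); r3 = ((3, 2) in grid)
    print r1, r2, r3
}')
printf '%s\n' "$out"
```

The last line shows that a comma separated subscript and the explicit SUBSEP form name the same element, while a membership test on a missing subscript yields 0.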
3984 precedence
399 Operator precedence, listed from highest to lowest. Assignment,
400 conditional, and exponentiation operators group from right to left;
401 all others group from left to right. Parentheses may be used to
402 override the normal order.
403
404 field ($)
405 increment (++), decrement (--)
406 exponentiation (^, **)
407 unary plus (+), unary minus (-), boolean not (!)
408 multiplication (*), division (/), remainder (%)
409 addition (+), subtraction (-)
410 concatenation (no special symbol; implied by context)
411 relational (==, !=, <, >=, etc), and redirection (<, >, >>, |)
412 Relational and redirection operators have the same precedence
413 and use similar symbols; context distinguishes between them
414 matching (~, !~)
415 array membership ('in')
416 boolean and (&&)
417 boolean or (||)
418 conditional (? :)
419 assignment (=, +=, etc)
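
A few expressions whose results depend on the ordering above are sketched below, assuming a POSIX shell and an interpreter named `awk`.

```shell
out=$(awk 'BEGIN {
    print -2 ^ 2          # exponentiation binds tighter than unary minus
    print 2 ^ 3 ^ 2       # exponentiation groups right to left: 2 ^ 9
    print 1 " " 2 + 3     # addition is done before concatenation
    x = 2 + 3 * 4         # multiplication before addition, assignment last
    print x
}')
printf '%s\n' "$out"
```

When in doubt, parentheses make the intended grouping explicit, for example (-2) ^ 2.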
4204 escaped_characters
421 Inside of a quoted string or constant regular expression, the
422 backslash (\) character gives special meaning to the character(s)
423 after it. Special character letters are case sensitive.
424 \\ results in one backslash in the string
425 \a is an 'alert' (<ctrl/G>, the ASCII <bell> character)
426 \b is a backspace (BS, <ctrl/H>)
427 \f is a form feed (FF, <ctrl/L>)
428 \n 'newline' (LF, <ctrl/J>) [line feed treated as CR+LF]
429 \r carriage return (CR, <ctrl/M>) [re-positions at the
430 beginning of the current line]
431 \t tab (HT, <ctrl/I>)
432 \v vertical tab (VT, <ctrl/K>)
433 \### is an arbitrary character, where '###' represents 1 to 3
434 octal (ie, 0 thru 7) digits
435 \x## is an alternate arbitrary character, where '##' represents
436 1 or more hexadecimal (ie, 0 thru 9 and/or A through F
437 and/or a through f) digits; if more than two digits
438 follow, the result is undefined; not recognized if POSIX
439 compatibility mode is specified.
440
441 When a regular expression is represented in string form ("regex"
442 as opposed to /regex/), backslashes need to be paired. The first
443 one quotes the second during string processing, and the second one
444 remains to be used to quote whatever follows in regular expression
445 processing. For example, to match variable `xxx' against a period
446 character, use (xxx ~ "\\.") or (xxx ~ /\./); if you tried to use
447 (xxx ~ "\."), after string processing it would operate as (xxx ~ /./)
448 and end up matching any single character rather than just a period.
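
The difference between the string form and the constant form can be seen by testing both against input that does and does not contain a period. This is a minimal sketch assuming a POSIX shell and an interpreter named `awk`; the variable names are hypothetical.

```shell
out=$(awk 'BEGIN {
    dyn = ("axb" ~ "a\\.b")   # string form, paired backslashes: literal period
    lit = ("axb" ~ /a\.b/)    # constant form, one backslash: literal period
    hit = ("a.b" ~ "a\\.b")   # a real period does match
    any = ("axb" ~ "a.b")     # unquoted period matches any character
    print dyn, lit, hit, any
}')
printf '%s\n' "$out"
```

The first two tests fail because 'x' is not a period; the last succeeds because an unquoted period matches any single character.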
4493 statements
450 A statement refers to a unit of instruction found in the action
451 part of an awk rule, and also found in the definition of a function.
452 The distinction between action, statement, and expression usually
453 won't matter to an awk programmer.
454
455 Compound statements consist of multiple statements separated by
456 semicolons or newlines and enclosed within braces ({}). They are
457 sometimes referred to as 'blocks'.
4584 expressions
459 An expression such as 'a = 10' or 'n += i++' is a valid statement.
460
461 Function invocations such as 'reformat_field($3)' are also valid
462 statements.
4634 if-then-else
464 A conditional statement in awk uses the same syntax as for the 'C'
465 programming language: the 'if' keyword, followed by an expression
466 in parentheses, followed by a statement--or block of statements
467 enclosed within braces ({})--which will be executed if the expression
468 is true but skipped if it's false. This can optionally be followed
469 by the 'else' keyword and another statement--or block of statements--
470 which will be executed if (and only if) the expression was false.
4715 examples
472 Simple example showing a statement used to control how many numbers
473 are printed on a given line.
474 if ( ++i <= 10 ) #check whether this would be the 11th
475 printf(" %5d", k) #print on current line if not
476 else {
477 printf("\n %5d", k) #print on next line if so
478 i = 1 #and reset the counter
479 }
480 Another example ('next' is described under 'action-controls')
481 if ($1 > $2) { print "rejected"; next } else diff = $2 - $1
4824 loops
483 Three types of loop statements are available in awk. Each uses
484 the same syntax as 'C'. The simplest of the three is the 'while'
485 statement. It consists of the 'while' keyword, followed by an
486 expression enclosed within parentheses, followed by a statement--or
487 block of statements in braces ({})--which will be executed if the
488 expression evaluates to true. The expression is evaluated before
489 attempting to execute the statement; if it's true, the statement is
490 executed (the entire block of statements if there is a block) and
491 then the expression is re-evaluated.
492
493 The second type of loop is the do-while loop. It consists of the
494 'do' keyword, followed by a statement (usually a block of statements
495 enclosed within braces), followed by the 'while' keyword, followed
496 by a test expression enclosed within parentheses. The statement--or
497 block--is always executed at least once. Then the test expression
498 is evaluated, and the statement(s) re-executed if the result was
499 true (followed by re-evaluation of the test, and so on).
500
501 The most complex of the three loops is the 'for' statement, and it
502 has a second variant that is not found in 'C'. The ordinary for-loop
503 consists of the 'for' keyword, followed by three semicolon-separated
504 expressions enclosed within parentheses, followed by a statement or
505 brace-enclosed block of statements. The first of the three
506 expressions is an initialization clause; it is done before starting
507 the loop. The second expression is used as a test, just like the
508 expression in a while-loop. It is checked before attempting to
509 execute the statement block, and then re-checked after each execution
510 (if any) of the block. The third expression is an 'increment' clause;
511 it is evaluated after an execution of the statement block and before
512 re-evaluation of the test (2nd) expression. Normally, the increment
513 clause will change a variable used in the test clause, in such a
514 fashion that the test clause will eventually evaluate to false and
515 cause the loop to finish.
516
517 Note to 'C' programmers: the comma (,) operator commonly used in
518 'C' for-loop expressions is not valid in awk.
519
520 The awk-specific variant of the for-loop is used for processing
521 arrays. Its syntax is 'for' keyword, followed by variable_name 'in'
522 array_name (where 'var in array' is enclosed in parentheses),
523 followed by a statement (or block). Each valid subscript value for
524 the array in question is successively placed--in no particular
525 order--into the specified 'index' variable.
5265 while_example
527 # strip fields from the input record until there's nothing left
528 while (NF > 0) {
529 $1 = "" #this will affect the value of $0
530 $0 = $0 #this causes $0 and NF to be re-evaluated
531 print
532 }
5335 do_while_example
534 # This is a variation of the while_example; it gives a slightly
535 # different display due to the order of operation.
536 # echo input record until all fields have been stripped
537 do {
538 print #output $0
539 $1 = "" #this will affect the value of $0
540 $0 = $0 #this causes $0 and NF to be re-evaluated
541 } while (NF > 0)
5425 for_example
543 # echo command line arguments (won't include option switches)
544 for ( i = 0; i < ARGC; i++ ) print ARGV[i]
545
546 # display contents of builtin environment array
547 for (itm in ENVIRON)
548 print itm, ENVIRON[itm]
5494 loop-controls
550 There are two special statements--both from 'C'--for changing the
551 behavior of loop execution. The 'continue' statement is useful in
552 a compound (block) statement; when executed, it effectively skips
553 the rest of the block so that the increment-expression (only for
554 for-loops) and loop-termination expression can be re-evaluated.
555
556 The 'break' statement, when executed, effectively skips the rest
557 of the block and also treats the test expression as if it were
558 false (instead of actually re-evaluating it). In this case, the
559 increment-expression of a for-loop is also skipped.
560
561 Inside nested loops, both 'break' and 'continue' only apply to the
562 innermost loop. When in compatibility mode, 'break' or 'continue'
563 may be used outside of a loop; either will be treated like 'next'
564 (see action-controls).
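
The following sketch shows 'continue' and 'break' inside one for-loop. It assumes a POSIX shell and an interpreter named `awk`.

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2) continue     # odd: skip to the increment, then the test
        if (i > 6) break        # leave the loop; the increment is skipped
        printf "%d ", i
    }
    print "stopped at", i
}')
printf '%s\n' "$out"
```

Because 'break' skips the increment-expression, the loop variable still holds 8 after the loop ends.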
5654 action-controls
566 There are two special statements for controlling statement execution.
567 The 'next' statement, when executed, causes the rest of the current
568 action and all further pattern-action rules to be skipped, so that
569 the next input record will be immediately processed. This is useful
570 if any early action knows that the current record will fail all the
571 remaining patterns; skipping those rules will reduce processing time.
572 An extended form, 'next file', is also available. It causes the
573 remainder of the current file to be skipped, and then either the
574 next input file will be processed, if any, or the END action will be
575 performed. 'next file' is not available in traditional awk.
576
577 The 'exit' statement causes GAWK execution to terminate. No further
578 input is processed; the END rule, if any, is executed, and all open
579 files are closed. 'exit' takes an optional numeric value as an
580 argument which is used as an exit status value, so that some sort
581 of indication of why execution has stopped can be passed on to the
582 user's environment.
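
A short run illustrates both statements: 'next' discards a comment line, and 'exit' stops reading input but still lets the END rule run. This is a sketch assuming a POSIX shell and an interpreter named `awk`; the input records and the status value 3 are arbitrary choices for the illustration.

```shell
status=0
out=$(printf '# header\nalpha\nquit\nbeta\n' |
      awk '# comment line: skip every remaining rule for this record
           /^#/         { next }
           # stop reading input; the END rule is still executed
           $1 == "quit" { exit 3 }
                        { print "data:", $0 }
           END          { print "END rule ran" }') || status=$?
printf '%s\nexit status: %s\n' "$out" "$status"
```

The record 'beta' is never read, and the value given to 'exit' becomes the exit status seen by the shell.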
5834 other_statements
584 The delete statement is used to remove an element from an array.
585 The syntax is 'delete' keyword followed by array name, followed
586 by index value enclosed in square brackets ([]). 'delete' may
587 also be used on an array name, without any index specified, to delete
588 all its elements in a single operation.
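
Both forms of 'delete' are sketched below, assuming a POSIX shell and an interpreter named `awk` that accepts the whole-array form; the array name is hypothetical.

```shell
out=$(awk 'BEGIN {
    seen["a"] = 1; seen["b"] = 2; seen["c"] = 3
    delete seen["b"]                  # remove a single element
    has_b = ("b" in seen)
    n = 0; for (k in seen) n++        # count what is left
    print has_b, n
    delete seen                       # remove all remaining elements
    n = 0; for (k in seen) n++
    print n
}')
printf '%s\n' "$out"
```

After the first 'delete' the subscript "b" is no longer a member and two elements remain; after the second the array is empty.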
589
590 The return statement is used in user-defined functions. The syntax
591 is the keyword 'return' optionally followed by a string or numeric
592 expression.
593
594 See also subtopic 'functions IO_functions' for a description of
595 'print', 'printf', and 'getline'.
5963 fields
597 When an input record is read, it is automatically split into fields
598 based on the current values of FS (builtin variable defining field
599 separator expression) and RS (builtin variable defining record
600 separator character). The default value of FS is an expression
601 which matches one or more spaces and tabs; the default for RS is
602 newline. If the FIELDWIDTHS variable is set to a space separated
603 list of numbers (as in ``FIELDWIDTHS = "2 3 2"'') then the input
604 is treated as if it had fixed-width fields of the indicated sizes
605 and the FS value will be ignored.
606
607 The field prefix operator ($) is used to reference a particular
608 field. For example, $3 designates the third field of the current
609 record. The entire record can be referenced via $0 (and it holds
610 the actual input record, not the values of $1, $2, ... concatenated
611 together, so multiple spaces--when present--remain intact, unless
612 a new value gets assigned).
613
614 The builtin variable NF holds the number of fields in the current
615 record. $NF is therefore the value of the last field. Attempts to
616 access fields beyond NF result in null values (if a record contained
617 3 fields, the value of $5 would be "").
618
619 Assigning a new value to $0 causes all the other field values (and NF)
620 to be re-evaluated. Changing a specific field will cause $0 to receive
621 a new value once it's re-evaluated, but until then the other existing
622 fields remain unchanged.
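
The behavior of NF, $NF, out-of-range fields, and $0 can be observed on a single record that contains runs of spaces. This is a minimal sketch assuming a POSIX shell and an interpreter named `awk`, with default FS and OFS.

```shell
out=$(printf 'a   b   c\n' | awk '{
    print NF              # number of fields
    print $NF             # last field
    print "[" $5 "]"      # beyond NF: null value
    print $0              # actual input record, spacing intact
    $2 = "X"              # assigning to a field ...
    print $0              # ... rebuilds the record using OFS
}')
printf '%s\n' "$out"
```

The multiple spaces survive until a field is assigned; after that the record is rebuilt with single OFS separators.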
6233 variables
624 Variables in awk can hold both numeric and string values and do not
625 have to be pre-declared. In fact, there is no way to explicitly
626 declare them at all. Variable names consist of a leading letter
627 (either upper or lower case, which are distinct from each other)
628 or underscore (_) character followed by any number of letters,
629 digits, or underscores.
630
631 When a variable that didn't previously exist is referenced, it is
632 created and given a null value. A null value is treated as 0 when
633 used as a number, and is a string of zero characters in length if
634 used as a string.
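
The dual nature of a null value, and the case-sensitivity of names, can be sketched as follows. The example assumes a POSIX shell and an interpreter named `awk`; the variable names are hypothetical.

```shell
out=$(awk 'BEGIN {
    print "[" unset_var "]"      # used as a string: zero characters
    print unset_var + 5          # used as a number: treated as 0
    print length(unset_var)
    Total = 1; total = 2         # two distinct variables
    print Total, total
}')
printf '%s\n' "$out"
```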
6354 builtin_variables
636 GAWK maintains several 'built-in' variables. All have default values;
637 some are updated automatically. All the builtins have uppercase-only
638 names.
639
640 These builtin variables control how awk behaves
641 FS input field separator; default is a single space, which is
642 treated as if it were a regular expression for matching
643 one or more spaces and/or tabs; a value of " " also has a
644 second special-case side-effect of causing leading blanks
645 to be ignored instead of producing a null first field;
646 initial value can be specified on the command line with
647 the -F option (or /field_separator); the value can be a
648 regular expression
649 RS input record separator; default value is a newline ("\n");
650 the value can be multiple characters or a regular expression
651 OFS output field separator; value to place between variables in
652 a 'print' statement; default is one space; can be arbitrary
653 string
654 ORS output record separator; value to implicitly terminate 'print'
655 statement with; default is newline ("\n"); can be arbitrary
656 string
657 OFMT default output format used for printing numbers; default
658 value is "%.6g"
659 CONVFMT conversion format used for number-to-string conversions;
660 default value is also "%.6g", like OFMT; not used when the
661 number has a value which may be represented internally as
662 an exact integer (typically within -2147483648 to 2147483647)
663 SUBSEP subscript separator for array indices; used when an array
664 subscript is specified as a comma separated list of values:
665 the comma is replaced by SUBSEP and the resulting index
666 is a concatenation of the values and SUBSEP(s); default
667 value is "\034"; value may be arbitrary string
668 IGNORECASE string and regular expression matching flag; if true
669 (non-zero) matching ignores differences between upper and
670 lower case letters; affects the '~' and '!~' operators,
671 the 'index', 'match', 'split', 'sub', and 'gsub' functions,
672 and field splitting based on FS; default value is false (0);
673 has no effect if GAWK is in strict compatibility mode
674 FIELDWIDTHS space or tab separated list of width sizes; takes
675 precedence over FS when set, but is cleared if FS has a
676 value assigned to it; [note: the current implementation
677 of fixed-field input is considered experimental and is
678 expected to evolve over time]
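 The following sketch shows a few of the control variables in action.
 The values chosen for OFS, ORS, and CONVFMT are arbitrary examples,
 not recommended settings:

```awk
BEGIN {
    OFS = "-"; ORS = "!\n"
    $0 = "a b c"
    $1 = $1                # any field assignment rebuilds $0 using OFS
    rebuilt = $0           # "a-b-c"
    print $1, $2           # prints "a-b!" and then a newline

    x = 3.14159265
    s1 = x ""              # default CONVFMT "%.6g" gives "3.14159"
    CONVFMT = "%.2f"
    y = 2.71828
    s2 = y ""              # now "2.72"
    s3 = 17 ""             # integer values bypass CONVFMT: "17"
}
```

 Concatenating a number with the empty string is the usual way to
 force a number-to-string conversion, which is where CONVFMT applies.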
679
680 These builtin variables provide useful information
681 NF number of fields in the current record
682 NR record number (accumulated over all files when more than one
683 input file is processed by the same program)
684 FNR current record number of the current input file; reset to 0
685 each time an input file is completed
686 RT record terminator, the input text which matched RS; not
687 available when the `-W traditional' option is used
688 RSTART starting position of substring matched by last invocation
689 of the 'match' function; set to 0 if a match fails and at
690 the start of each input record
691 RLENGTH length of substring matched by the last invocation of the
692 'match' function; set to -1 if a match fails
693 FILENAME name of the input file currently being processed; the
694 special name "-" is used to represent the standard input
695 ENVIRON array of miscellaneous user environment values; the VMS
696 implementation of GAWK provides values for ["USER"] (the
697 username), ["PATH"] (current default directory), ["HOME"]
698 (the user's login directory), and ["TERM"] (terminal type
699 if available) [all info provided by C RTL's environ]
700 ERRNO information about the cause of failure for 'getline' or
701 'close'; "0" if no such failure has occurred.
702 ARGC number of elements in the ARGV array, counting [0] which is
703 the program name (ie, "gawk")
704 ARGV array of command-line arguments (in [0] to [ARGC-1]); the
705 program name (ie, "gawk") is held in ARGV[0]; command line
706 parameters (data files and "var=value" expressions, but not
707 program options or the awk program text string if present)
708 are stored in ARGV[1] through ARGV[ARGC-1]; the awk program
709 can change values of ARGC and ARGV[] during execution in
710 order to alter which files are processed or which between-
711 file assignments are made
712 ARGIND current index into ARGV[]
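 To see what a program actually receives in ARGC and ARGV, a short
 loop such as the following sketch can be used. The exact text of
 ARGV[0] depends on how the program was started, so no particular
 output is assumed here:

```awk
BEGIN {
    # List every command-line argument, starting with the program name
    for (i = 0; i < ARGC; i++)
        printf "ARGV[%d] = \"%s\"\n", i, ARGV[i]
    argcount = ARGC
}
```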
7134 arrays
714 awk supports associative arrays to collect data into tables. Array
715 elements can be either numeric or string, as can the indices used to
716 access them. Each array must have a unique name, but a given array
717 can hold both string and numeric elements at the same time. Arrays
718 are one-dimensional only, but multi-dimensional arrays can be
719 simulated using comma (,) separated indices, whereby a single index
720 value gets created by replacing commas with SUBSEP and concatenating
721 the resulting expression into a single string.
722
723 Referencing an array element is done with the expression
724 Array[Index]
725 where 'Array' represents the array's name and 'Index' represents a
726 value or expression used for a subscript. If the requested array
727 element did not exist, it will be created and assigned an initial
728 null value. To check whether an element exists without creating it,
729 use the 'in' boolean operator.
730 Index in Array
731 would check 'Array' for element 'Index' and return 1 if it existed
732 or 0 otherwise. To remove an element from an array, use the 'delete'
733 statement
734 delete Array[Index]
735 To remove all array elements at once, use
736 delete Array
737 Note: the latter is a gawk extension; also, there is no way to
738 delete an ordinary variable or to make an array stop being an
739 array; 'delete' only removes array elements.
740
741 To process all elements of an array (in succession) when their
742 subscripts might be unknown, use the 'in' variant of the for-loop
743 for (Index in Array) { ... }
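 The array operations above can be sketched as follows. The array
 names and contents are invented for the example:

```awk
BEGIN {
    count["apple"] = 3
    count["pear"]  = 5

    before = ("plum" in count)   # 0: testing with 'in' creates nothing
    dummy = count["plum"]        # a plain reference creates the element
    after = ("plum" in count)    # 1
    delete count["plum"]         # remove it again

    grid[1,2] = "x"              # stored under the index 1 SUBSEP 2
    found = ((1,2) in grid)      # 1
    for (k in grid) {
        split(k, part, SUBSEP)   # recover the two subscripts
        print part[1], part[2]   # 1 2
    }

    n = 0
    for (k in count)             # visit every remaining element
        n++
    print n                      # 2
}
```

 The order in which 'for (k in count)' visits elements is not
 specified, so programs should not rely on it.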
7443 functions
745 awk supports both built-in and user-defined functions. A function
746 may be considered a 'black-box' which accepts zero or more input
747 parameters, performs some calculations or other manipulations based
748 on them, and returns a single result.
749
750 The syntax for calling a function consists of the function name
751 immediately followed by an open parenthesis (left parenthesis '('),
752 followed by an argument list, followed by a closing parenthesis
753 (right parenthesis ')'). The argument list is a sequence of values
754 (numbers, strings, variables, array references, or expressions
755 involving the above and/or nested function calls), separated by
756 commas and optional white space.
757
758 The parentheses are required punctuation, except for the 'print' and
759 'printf' builtin IO functions, where they're optional, and for the
760 builtin IO function 'getline', where they're not allowed. Some
761 functions support optional [trailing] arguments which can be simply
762 omitted (along with the corresponding comma if applicable).
7634 numeric_functions
764 Builtin numeric functions
765 int(n) returns the value of 'n' with any fraction truncated
766 [truncation of negative values is towards 0]
767 sqrt(n) the square root of n
768 exp(n) the exponential of n ('e' raised to the 'n'th power)
769 log(n) natural logarithm of n
770 sin(n) sine of n (in radians)
771 cos(n) cosine of n (radians)
772 atan2(m,n) arctangent of m/n (radians)
773 rand() random number r in the range 0 <= r < 1
774 srand(s) sets the random number 'seed' to s, so that a sequence
775 of 'random' numbers can be repeated; returns the
776 previous seed value; srand() [argument omitted] sets
777 the seed to an 'unpredictable' value (based on date
778 and time, for instance, so should be unrepeatable)
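 A brief sketch of the numeric functions; the seed value 42 and the
 die-roll range are arbitrary choices made for the illustration:

```awk
BEGIN {
    print int(3.7), int(-3.7)      # 3 -3 (truncation is towards 0)
    pi = atan2(0, -1)              # a common way to obtain pi
    printf "%.5f\n", pi            # 3.14159

    srand(42); a = rand()          # same seed ...
    srand(42); b = rand()          # ... gives the same sequence
    same = (a == b)                # 1

    roll = int(rand() * 6) + 1     # whole number from 1 through 6
}
```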
7794 string_functions
780 Builtin string functions
781 index(s,t) search string s for substring t; result is 1-based
782 offset of t within s, or 0 if not found
783 length(s) returns the length of string s; either 'length()'
784 with its argument omitted or 'length' without any
785 parenthesized argument list will return length of $0
786 match(s,r) search string s for regular expression r; the offset
787 of the longest, left-most substring which matches
788 is returned, or 0 if no match was found; the builtin
789 variables RSTART and RLENGTH are also set [RSTART to
790 the return value and RLENGTH to the size of the
791 matching substring, or to -1 if no match was found]
792 split(s,a,f) break string s into components based on field
793 separator f and store them in array a (into elements
794 [1], [2], and so on); the last argument is optional,
795 if omitted, the value of FS is used; the return value
796 is the number of components found
797 sprintf(f,e,...) format expression(s) e using format string f and
798 return the result as a string; formatting is similar
799 to the printf function
800 sub(r,t,s) search string target s for regular expression r, and
801 if a match is found, replace the matching text with
802 substring t, then store the result back in s; if s
803 is omitted, use $0 for the string; the result is
804 either 1 if a match+substitution was made, or 0
805 otherwise; if substring t contains the character
806 '&', the text which matched the regular expression
807 is used instead of '&' [to suppress this feature
808 of '&', 'quote' it with a backslash (\); since this
809 will be inside a quoted string which will receive
810 'backslash' processing before being passed to sub(),
811 *two* consecutive backslashes will be needed "\\&"]
812 gsub(r,t,s) similar to sub(), but gsub() replaces all nonoverlapping
813 substrings instead of just the first, and the return
814 value is the number of substitutions made
815 gensub(r,t,n,s) search string s ($0 if omitted) for regexp r and
816 replace the n'th occurrence with substring t; the
817 result is the new string and s (or $0) remains
818 unchanged; if n begins with letter "g" or "G" then
819 all matches are replaced instead of just the n'th;
820 if r has parenthesized subexpressions in it, t may
821 contain the special sequences \\0, \\1, through \\9
822 which expand into the value of the corresponding
823 subexpression; this function is a gawk extension
824 substr(s,p,l) extract a substring l characters long starting at
825 offset p in string s; l is optional, if omitted then
826 the remainder of the string (p thru end) is returned
827 tolower(s) return a copy of string s in which every uppercase
828 letter has been converted into lowercase
829 toupper(s) analogous to tolower(); convert lowercase to uppercase
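 Several of the string functions are shown together in the sketch
 below. The sample strings are invented, and only functions common
 to POSIX awk are used (gensub is omitted because it is specific to
 gawk):

```awk
BEGIN {
    s = "hello world"
    pos = index(s, "world")          # 7
    head = substr(s, 1, 5)           # "hello"
    tail = substr(s, 7)              # "world" (length omitted)

    match("xaaxaaa", /a+/)           # left-most match wins
    start = RSTART; len = RLENGTH    # 2 and 2

    n = split("a:b:c", parts, ":")   # n = 3, parts[1] = "a", ...

    t = s
    made = gsub(/o/, "0", t)         # made = 2, t = "hell0 w0rld"

    u = s
    sub(/world/, "[&]", u)           # '&' is the matched text: "hello [world]"
    v = s
    sub(/world/, "\\&", v)           # quoted '&' is literal: "hello &"

    print toupper(s)                 # HELLO WORLD
}
```

 sub() and gsub() modify their third argument in place, which is why
 the sketch copies s into t, u, and v before changing them.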
8304 time_functions
831 Builtin time functions
832 systime() return the current time of day as the number of seconds
833 since some reference point; on VMS the reference point
834 is January 1, 1970, at 12 AM local time (not UTC)
835 strftime(f,t) format time value t using format f; if t is omitted,
836 the default is systime()
8375 time_formats
838 Formatting directives similar to the 'printf' & 'sprintf' functions
839 (each is introduced in the format string by preceding it with a
840 percent sign (%)); the directive is substituted by the corresponding
841 value
842 a abbreviated weekday name (Sun,Mon,Tue,Wed,Thu,Fri,Sat)
843 A full weekday name
844 b abbreviated month name (Jan,Feb,...)
845 B full month name
846 c date and time (Unix-style "aaa bbb dd HH:MM:SS YYYY" format)
847 C century prefix (19 or 20) [not century number, ie 20th]
848 d day of month as two digit decimal number (01-31)
849 D date in mm/dd/yy format
850 e day of month with leading space instead of leading 0 ( 1-31)
851 E ignored; following format character used
852 H hour (24 hour clock) as two digit number (00-23)
853 h abbreviated month name (Jan,Feb,...) [same as %b]
854 I hour (12 hour clock) as two digit number (01-12)
855 j day of year as three digit number (001-366)
856 m month as two digit number (01-12)
857 M minute as two digit number (00-59)
858 n 'newline' (ie, treat %n as \n)
859 O ignored; following format character used
860 p AM/PM designation for 12 hour clock
861 r time in AM/PM format ("II:MM:SS p")
862 R time without seconds ("HH:MM")
863 S second as two digit number (00-59)
864 t tab (ie, treat %t as \t)
865 T time ("HH:MM:SS")
866 U week of year (00-53) [first Sunday is first day of week 1]
867 V date (VMS-style "dd-bbb-YYYY" with 'bbb' forced to uppercase)
868 w weekday as decimal digit (0 [Sunday] through 6 [Saturday])
869 W week of year (00-53) [first _Monday_ is first day of week 1]
870 x date ("aaa bbb dd YYYY")
871 X time ("HH:MM:SS")
872 y year without century (00-99)
873 Y year with century (19yy-20yy)
874 Z time zone name (always "local" for VMS)
875 % literal percent sign (%)
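 A small sketch combining systime() and strftime(). Both functions
 are gawk extensions, and the format strings are arbitrary examples;
 the output depends on the current date and time, so none is shown:

```awk
BEGIN {
    now = systime()                              # seconds since the reference point
    stamp = strftime("%Y-%m-%d %H:%M:%S", now)   # e.g. year-month-day and 24 hour time
    print "raw time value:", now
    print "formatted:     ", stamp
    print "day of year:   ", strftime("%j", now)
}
```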
8764 IO_functions
877 Builtin I/O functions
878 print x,... print the values of one or more expressions; if none
879 are listed, $0 is used; parentheses are optional;
880 when multiple values are printed, the current value
881 of builtin OFS (default is 1 space) is used to
882 separate them; the print line is implicitly
883 terminated with the current value of ORS (default
884 is newline); print does not have a return value
885 printf(f,x,...) print the values of one or more expressions, using
886 the specified format string; null strings are used
887 to supply missing values (if any); no between field
888 or trailing newline characters are printed, they
889 should be specified within the format string; the
890 argument-enclosing parentheses are optional;
891 printf does not have a return value
892 getline v read a record into variable v; if v is omitted, $0 is
893 used (and NF, NR, and FNR are updated); if v is
894 specified, then field-splitting won't be performed;
895 note: parentheses around the argument are *not*
896 allowed; return value is 1 for successful read, 0
897 if end of file is encountered, or -1 if some sort
898 of error occurred; [see 'redirection' for several
899 variants]
900 close(s) close a file or pipe specified by the string s; the
901 string used should have the same value as the one
902 used in a getline or print/printf redirection
903 fflush(s) flush output stream s; if s is omitted, stdout is
904 flushed; if it is specified but its value is an
905 empty string, all output streams are flushed
906 system(s) pass string s to be executed by the operating system;
907 the command string is executed in a subprocess
9085 redirection
909 Both getline and print/printf support variant forms which use
910 redirection and pipes.
911
912 To read from a file (instead of from the primary input file), use
913 getline var < "file"
914 or getline < "file" (read into $0)
915 where the string "file" represents either an actual file name (in
916 quotes) or a variable which contains a file name string value or an
917 expression which evaluates to a string filename.
918
919 To create a pipe executing some command and read the result into
920 a variable (or into $0), use
921 "command" | getline var
922 or "command" | getline (read into $0)
923 where "command" is a literal string containing an operating system
924 command or a variable with a string value representing such a
925 command.
926
927 To output into a file other than the primary output, use
928 print x,... > "file" (or >> "file")
929 or printf(f,x,...) > "file" (or >> "file")
930 similar to the 'getline' example above. '>>' causes output to be
931 appended to an existing file if it exists, or create the file if
932 it doesn't already exist. '>' always creates a new file. The
933 alternate redirection method of '>$' (for RMS text file attributes)
934 is *only* available on the command line, not with 'print' or
935 'printf' in the current release.
936
937 To output an error message, use 'print' or 'printf' and redirect
938 the output to file "/dev/stderr" (or equivalently to "SYS$ERROR:"
939 on VMS). 'stderr' will normally be the user's terminal, even if
940 ordinary output is being redirected into a file.
941
942 To feed awk output into another command, use
943 print x,... | "command" (similarly for 'printf')
944 similar to the second 'getline' example. In this case, output
945 from awk will be passed as input to the specified operating system
946 command. The command must be capable of reading input from 'stdin'
947 ("SYS$INPUT:" on VMS) in order to receive data in this manner.
948
949 The 'close' function operates on the "file" or "command" argument
950 specified here (either a literal string or a variable or expression
951 resulting in a string value). It completely closes the file or
952 pipe so that further references to the same file or command string
953 would re-open that file or command at the beginning. Closing a
954 pipe or redirection also releases some file-oriented resources.
955
956 Note: the VMS implementation of GAWK uses temporary files to
957 simulate pipes, so a command must finish before 'getline' can get
958 any input from it, and 'close' must be called for an output pipe
959 before any data can be passed to the specified command.
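 The file-oriented forms of redirection can be tried with the sketch
 below. The file name "gawk_demo.tmp" is a hypothetical scratch file
 created in the current directory; pipes are left out because the
 commands available differ between operating systems:

```awk
BEGIN {
    tmp = "gawk_demo.tmp"            # hypothetical scratch file name
    print "first line"  > tmp        # '>' creates the file when first opened
    print "second line" > tmp        # same open file, so this line follows
    close(tmp)                       # finish writing before reading back

    n = 0
    while ((getline line < tmp) > 0) # 1 = record read, 0 = end, -1 = error
        n++
    close(tmp)

    print n " lines read back"       # 2 lines read back
    print "warning: demo only" > "/dev/stderr"
}
```

 Comparing the result of getline with 0 matters: a bare
 'while (getline line < tmp)' would loop forever if the file could
 not be opened, because -1 counts as true.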
9605 formats
961 Formatting characters used by the 'printf' and 'sprintf' functions
962 (each is introduced in the format string by preceding it with a
963 percent sign (%))
964 % include a literal percent sign (%) in the result
965 c format the next argument as a single ASCII character
966 (prints first character of string argument, or corresponding
967 ASCII character if numeric argument, e.g. 65 is 'A')
968 s format the next argument as a string (numeric arguments are
969 converted into strings on demand)
970 d decimal number (ie, integer value in base 10)
971 i integer (equivalent to decimal)
972 o octal number (integer in base 8)
973 x hexadecimal number (integer in base 16) [lowercase]
974 X hexadecimal number [digits 'A' thru 'F' in uppercase]
975 f floating point number (digits, decimal point, fraction digits)
976 e exponential (scientific notation) number (digit, decimal
977 point, fraction digits, letter 'e', sign '+' or '-',
978 exponent digits)
979 g 'fractional' number in either 'e' or 'f' format, whichever
980 produces shorter result
981
982 Several optional modifiers can be placed between the initiating
983 percent sign and the format character (doesn't apply to %%).
984 - left justify (only matters when width specifier is present)
985 (space) for numeric specifiers, prefix nonnegative values with
986 a space and negative values with a minus sign
987 + for numeric specifiers, prefix nonnegative values with a plus
988 sign and negative values with a minus sign
989 # alternate form applicable to several of the format characters
990 (o, x, X, e, E, f, g, G)
991 NN width ['NN' represents 1 or more decimal digits]; actually
992 minimum width to use, longer items will not be truncated; a
993 leading 0 will cause right-justified numbers to be padded on
994 the left with zeroes instead of spaces when they're aligned
995 .MM precision [decimal point followed by 1 or more digits]; used
996 as maximum width for strings (causing truncation if they're
997 actually longer) or as number of fraction digits for 'f' or
998 'e' numeric formats, or number of significant digits for 'g'
999 numeric format
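 The format characters and modifiers can be compared side by side
 with sprintf, as in this sketch. The values being formatted are
 arbitrary:

```awk
BEGIN {
    a = sprintf("%5d|%-5d|%05d", 42, 42, 42)   # "   42|42   |00042"
    b = sprintf("%.2f %e", 3.14159, 31415.9)   # "3.14 3.141590e+04"
    c = sprintf("%x %X %o", 255, 255, 8)       # "ff FF 10"
    d = sprintf("%c%c", 65, "hello")           # "Ah"
    e = sprintf("%.3s|%4s|", "abcdef", "hi")   # "abc|  hi|"
    f = sprintf("%+d|% d", 5, 5)               # "+5| 5"
    g = sprintf("%5.1f%%", 99.5)               # " 99.5%"
    print a; print b; print c; print d
    print e; print f; print g
}
```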
10004 user_defined_functions
1001 User-defined functions may be created as needed to simplify awk
1002 programs or to collect commonly used code into one place. The
1003 general syntax of a user-defined function is the 'function' keyword
1004 followed by a unique function name, followed by a comma-separated
1005 parameter list enclosed in parentheses, followed by statement(s)
1006 enclosed within braces ({}). A 'return' statement is customary
1007 but is not required.
1008 function FuncName(arg1,arg2) {
1009 # arbitrary statements
1010 return (arg1 + arg2) / 2
1011 }
1012 If a function does not use 'return' to specify an output value, the
1013 result received by the caller will be unpredictable.
1014
1015 Functions may be placed in an awk program before, between, or after
1016 the pattern-action rules. The abbreviation 'func' may be used in
1017 place of 'function', unless POSIX compatibility mode is in effect.
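 A slightly fuller sketch of user-defined functions follows. The
 function names avg, fact, and join are invented for the example.
 Listing extra parameters that callers never supply is the usual awk
 idiom for obtaining local variables:

```awk
# Average of two numbers.
function avg(a, b) {
    return (a + b) / 2
}

# Recursive factorial; a function may call itself.
function fact(n) {
    return (n <= 1) ? 1 : n * fact(n - 1)
}

# Join arr[1..n] with sep.  The extra parameters i and out are never
# passed by callers; declaring them makes them local to the function.
function join(arr, n, sep,    i, out) {
    for (i = 1; i <= n; i++)
        out = out (i > 1 ? sep : "") arr[i]
    return out
}

BEGIN {
    print avg(3, 5)              # 4
    print fact(5)                # 120
    n = split("x y z", w, " ")
    print join(w, n, "+")        # x+y+z
}
```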
10183 regular_expressions
1019 A regular expression is a shorthand way of specifying a 'wildcard'
1020 type of string comparison. Regular expression matching is very
1021 fundamental to awk's operation.
1022
1023 Meta symbols
1024 ^ matches beginning of line or beginning of string; note that
1025 embedded newlines ('\n') create multi-line strings, so
1026 beginning of line is not necessarily beginning of string
1027 $ matches end of line or end of string
1028 . any single character (except newline)
1029 [ ] set of characters; [ABC] matches either 'A' or 'B' or 'C'; a
1030 dash (other than first or last of the set) denotes a range
1031 of characters: [A-Z] matches any upper case letter; if the
1032 first character of the set is '^', then the sense of match
1033 is reversed: [^0-9] matches any non-digit; several
1034 characters need to be quoted with backslash (\) if they
1035 occur in a set: '\', ']', '-', and '^'; within sets,
1036 various special character class designations are recognized,
1037 such as [:digit:] and [:punct:], as per POSIX
1038 | alternation (similar to boolean 'or'); match either of two
1039 patterns [for example "^start|stop$" matches leading 'start'
1040 or trailing 'stop']
1041 ( ) grouping, alter normal precedence [for example, "^(start|stop)$"
1042 matches lines reading either 'start' or 'stop']
1043 * repeated matching; when placed after a pattern, indicates that
1044 the pattern should match any number of times [for example,
1045 "[a-z][0-9]*" matches a lower case letter followed by zero or
1046 more digits]
1047 + repeated matching; when placed after a pattern, indicates that
1048 the pattern should match one or more times ["[0-9]+" matches
1049 any non-empty sequence of digits]
1050 ? optional matching; indicates that the pattern can match zero or
1051 one times ["[a-z][0-9]?" matches lower case letter alone or
1052 followed by a single digit]
1053 { } interval specification; {n} to match n times or {m,n} to match
1054 at least m but not more than n times; only functional when
1055 either the `-W posix' or `-W re-interval' options are used
1056 \ quote; prevent the character which follows from having special
1057 meaning; if the regexp is specified as a string, then the
1058 backslash itself will need to be quoted by preceding it with
1059 another backslash
1060
1061 A regular expression which matches a string or line will match against
1062 the first (left-most) substring which meets the pattern and include
1063 the longest sequence of characters which still meets that pattern.
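 The meta symbols can be tried out with the '~' matching operator,
 which yields 1 for a match and 0 otherwise. The test strings in
 this sketch are invented:

```awk
BEGIN {
    r1 = ("start here" ~ /^start|stop$/)   # 1: leading 'start'
    r2 = ("nonstop" ~ /^start|stop$/)      # 1: trailing 'stop'
    r3 = ("nonstop" ~ /^(start|stop)$/)    # 0: whole string must match
    r4 = ("a42" ~ /^[a-z][0-9]*$/)         # 1: letter, then any digits
    r5 = ("x12" ~ /^[a-z][0-9]?$/)         # 0: at most one digit allowed

    # A regexp written as a string needs its backslash doubled
    r6 = ("3.5" ~ "^[0-9]+\\.[0-9]+$")     # 1
    r7 = ("3x5" ~ "^[0-9]+\\.[0-9]+$")     # 0: '.' was quoted

    # Left-most match is chosen first, then the longest at that spot
    match("xaaxaaa", /a+/)
    print RSTART, RLENGTH                  # 2 2
    print r1, r2, r3, r4, r5, r6, r7       # 1 1 0 1 0 1 0
}
```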
10643 comments
1065 Comments in awk programs are introduced with '#'. Anything after
1066 '#' on a line is ignored by GAWK. It's a good idea to include an
1067 explanation of what an awk program is doing and also who wrote it
1068 and when.
10693 further_information
1070 For complete documentation on GAWK, see "Effective AWK Programming"
1071 by Arnold Robbins. The second edition (ISBN 1-57831-000-8) is jointly
1072 published by SSC and the FSF (http://www.ssc.com).
1073
1074 Source text for it is present in the file GAWK.TEXI. A postscript
1075 version is available via anonymous FTP from host gnudist.gnu.org in
1076 directory /gnu/gawk, file gawk-{version}-doc.tar.gz where {version}
1077 would be the current version number, such as 3.0.6.
1078
1079 Another source of documentation is "The AWK Programming Language"
1080 by Aho, Weinberger, and Kernighan (1988), published by Addison-Wesley.
1081 ISBN code is 0-201-07981-X.
1082
1083 Each of these works contains both a reference on the awk language
1084 and a tutorial on awk's use, with many sample programs.
10853 authors
1086 The awk programming language was originally created by Alfred V. Aho,
1087 Peter J. Weinberger, and Brian W. Kernighan in 1977. The language
1088 was revised and enhanced in a new version which was released in 1985.
1089
1090 GAWK, the GNU implementation of awk, was written in 1986 by Paul Rubin
1091 and Jay Fenlason, with advice from Richard Stallman, and with
1092 contributions from John Woods. In 1988 and 1989, David Trueman and
1093 Arnold Robbins revised GAWK for compatibility with the newer awk.
1094 Arnold Robbins is the current maintainer.
1095
1096 GAWK version 2.11.1 was ported to VMS by Pat Rankin in November, 1989,
1097 with further revisions in the Spring of 1990. The VMS port was
1098 incorporated into the official GNU distribution of version 2.13 in
1099 Spring 1991. (Version 2.12 was never publicly released.)
11002 release_notes
1101 GAWK 3.1.2 handles parsing of the command line differently than
1102 earlier versions for the case where there is a single token, which
1103 often yielded a "missing required element" error in earlier versions.
1104
1105 [Note for 3.1.x: these release notes haven't been updated in quite
1106 some time. Most of the information is still applicable though.]
1107
1108 GAWK 3.0.3 tested under VAX/VMS V6.2 and Alpha/VMS V6.2, April, 1997;
1109 should be compatible with VMS versions V4.6 and later. Current source
1110 code is compatible with DEC's DEC C v5.x or VAX C v3.2; also compiles
1111 successfully with GNU C (tested with gcc-vms 2.7.1).
11123 AWK_LIBRARY
1113 GAWK uses a built in search path when looking for a program file
1114 specified by the -f option (or the /input qualifier) when that file
1115 name does not include a device and/or directory. GAWK will first
1116 look in the current default directory, then if the file wasn't found
1117 it will look in the directory specified by the translation of logical
1118 name "AWK_LIBRARY".
11193 known_problems
1120 There are several known problems with GAWK running on VMS. Some can
1121 be ignored, others require work-arounds.
11224 file_formats
1123 If a file having the RMS attribute "Fortran carriage control" is
1124 read as input, it will generate an empty first record if the first
1125 actual record begins with a space (leading space becomes a newline).
1126 Also, the last record of the file will give a "record not terminated"
1127 warning. Both of these minor problems are due to the way that the
1128 C Run-Time Library (VAXCRTL) converts record attributes.
1129
1130 Another poor feature without a work-around is that there's no way to
1131 specify "append if possible, create with RMS text attributes if not"
1132 with the current command line I/O redirection. '>>$' isn't supported.
1133 Ditto for binary output; '>>+' isn't supported.
11344 RS_peculiarities
1135 Changing the record separator to something other than newline ('\n')
1136 will produce anomalous results for ordinary files. For example,
1137 using RS = "\f" and FS = "\n" with the following input
1138 |rec 1, line 1
1139 |rec 1, line 2
1140 |^L (form feed)
1141 |rec 2, line 1
1142 |rec 2, line 2
1143 |^L (form feed)
1144 |rec 3, line 1
1145 |rec 3, line 2
1146 |(end of file)
1147 will produce two fields for record 1, but three fields each for
1148 records 2 and 3. This is because the form-feed record delimiter is
1149 on its own line, so awk sees a newline after it. Since newline is
1150 now a field separator, records 2 and 3 will have null first fields.
1151 The following awk code will work around this problem by inserting
1152 a null first field in the first record, so that all records can be
1153 handled the same by subsequent processing.
1154 # fix up for first record (RS != "\n")
1155 FNR == 1 { if ( $0 == "" ) #leading separator
1156 next #skip its null record
1157 else #otherwise,
1158 $0 = FS $0 #realign fields
1159 }
1160 There is a second problem with this same example. It will always
1161 trigger a "record not terminated" warning when it reaches the end of
1162 file. In the sample shown, there is no final separator; however, if
1163 a trailing form-feed were present, it would produce a spurious final
1164 record with two null fields. This occurs because the I/O system
1165 sees an implicit newline at the end of the last record, so awk sees
1166 a pair of null fields separated by that newline. The following code
1167 fragment will fix that provided there are no null records (in this
1168 case, that would be two consecutive lines containing just form-feeds).
1169 # fix up for last record (RS != "\n")
1170 $0 == FS { next } #drop spurious final record
1171 Note that the "record not terminated" warning will persist.
11724 cmd_inconsistency
1173 The DCL qualifier /OUTPUT is internally equivalent to '>$' output
1174 redirection, but the qualifier /INPUT corresponds to the -f option
1175 rather than to '<' input redirection.
11764 exit
1177 The exit statement can optionally pass a final status value to the
1178 operating system. GAWK expects a UN*X-style value instead of a
1179 VMS status value, so 0 indicates success and non-zero indicates
1180 failure. The final exit status will be 1 (VMS success) if 0 is
1181 used, or even (VMS non-success) if non-zero is used.
1182!3 changes
11833 prior_changes
1184 Changes between version 3.0.6 and 2.15.6
1185
1186 General
1187 RS can contain multiple characters or be a regexp
1188 Regular expression interval support added
1189 gensub() and fflush() functions added
1190 memory leak(s) introduced in 3.0.2 or 3.0.1 fixed
1191 the user manual has been substantially revised
1192
1193 VMS-specific
1194 Switched to build with DEC C by default
1195 Changes between version 2.15.6 and 2.14
1196
1197 General
1198 Many obscure bugs fixed
1199 `delete' may operate on an entire array
1200 ARGIND and ERRNO builtin variables added
1201
1202 VMS-specific
1203 `>+ file' binary-mode output redirection added
1204 /variable=(foo=42) fixed
1205 Floating point number formatting improved
1206
1207 Changes between version 2.14 and 2.13.2:
1208
1209 General
1210 'next file' construct added
1211 'continue' outside of any loop is treated as 'next'
1212 Assorted bug fixes and efficiency improvements
1213 _The_GAWK_Manual_ updated
1214 Test suite expanded
1215
1216 VMS-specific
1217 VMS POSIX support added
1218 Disk I/O throughput enhanced
1219 Pipe emulation improved and incorrect interaction with user-mode
1220 redefinition of SYS$OUTPUT eliminated
1221
1222 Changes between version 2.13 and 2.11.1: (2.12 was not released)
1223
1224 General
1225 CONVFMT and FIELDWIDTHS builtin control variables added
1226 systime() and strftime() date/time functions added
1227 'lint' and 'posix' run-time options added
1228 '-W' command line option syntax supersedes '-c', '-C', and '-V'
1229 '-a' and '-e' regular expression options made obsolete
1230 Various bug fixes and efficiency improvements
1231 More platforms supported ('officially' including VMS)
1232
1233 VMS-specific
1234 %g printf format fixed
1235 Handling of '\' on command line modified; no longer necessary to
1236 double it up
1237 Problem redirecting stderr (>&efile) at same time as stdin (<ifile)
1238 or stdout (>ofile) has been fixed
1239 ``2>&1'' and ``1>&2'' redirection constructs added
1240 Interaction between command line I/O redirection and gawk pipes
1241 fixed; also, name used for pseudo-pipe temporary file expanded
12423 license
1243 GAWK is covered by the "GNU General Public License", the gist of which
1244 is that if you supply this software to a third party, you are expressly
1245 forbidden to prevent them from supplying it to a fourth party, and if
1246 you supply binaries you must make the source code available to them
1247 at no additional cost. Any revisions or modified versions are also
1248 covered by the same license. There is no warranty, express or implied,
1249 for this software. It is provided "as is."
1250
1251 [Disclaimer: This is just an informal summary with no legal basis;
1252 refer to the actual GNU General Public License for specific details.]
1253!2 examples
1254!