.ds PX \s-1POSIX\s+1
.ds UX \s-1UNIX\s+1
.ds AN \s-1ANSI\s+1
.ds GN \s-1GNU\s+1
.ds AK \s-1AWK\s+1
.ds EP \fIGAWK: Effective AWK Programming\fP
.if !\n(.g \{\
. if !\w|\*(lq| \{\
. ds lq ``
. if \w'\(lq' .ds lq "\(lq
. \}
. if !\w|\*(rq| \{\
. ds rq ''
. if \w'\(rq' .ds rq "\(rq
. \}
.\}
17.TH GAWK 1 "June 26 2005" "Free Software Foundation" "Utility Commands"
18.SH NAME
19gawk \- pattern scanning and processing language
20.SH SYNOPSIS
21.B gawk
22[ \*(PX or \*(GN style options ]
23.B \-f
24.I program-file
25[
26.B \-\^\-
27] file .\|.\|.
28.br
29.B gawk
30[ \*(PX or \*(GN style options ]
31[
32.B \-\^\-
33]
34.I program-text
35file .\|.\|.
36.sp
37.B pgawk
38[ \*(PX or \*(GN style options ]
39.B \-f
40.I program-file
41[
42.B \-\^\-
43] file .\|.\|.
44.br
45.B pgawk
46[ \*(PX or \*(GN style options ]
47[
48.B \-\^\-
49]
50.I program-text
51file .\|.\|.
52.SH DESCRIPTION
53.I Gawk
54is the \*(GN Project's implementation of the \*(AK programming language.
55It conforms to the definition of the language in
56the \*(PX 1003.2 Command Language And Utilities Standard.
57This version in turn is based on the description in
58.IR "The AWK Programming Language" ,
59by Aho, Kernighan, and Weinberger,
60with the additional features found in the System V Release 4 version
61of \*(UX
62.IR awk .
63.I Gawk
64also provides more recent Bell Laboratories
65.I awk
66extensions, and a number of \*(GN-specific extensions.
67.PP
68.I Pgawk
69is the profiling version of
70.IR gawk .
71It is identical in every way to
72.IR gawk ,
73except that programs run more slowly,
74and it automatically produces an execution profile in the file
75.B awkprof.out
76when done.
77See the
78.B \-\^\-profile
79option, below.
80.PP
81The command line consists of options to
82.I gawk
83itself, the \*(AK program text (if not supplied via the
84.B \-f
85or
86.B \-\^\-file
87options), and values to be made
88available in the
89.B ARGC
90and
91.B ARGV
92pre-defined \*(AK variables.
93.SH OPTION FORMAT
94.PP
95.I Gawk
96options may be either traditional \*(PX one letter options,
97or \*(GN style long options. \*(PX options start with a single \*(lq\-\*(rq,
98while long options start with \*(lq\-\^\-\*(rq.
99Long options are provided for both \*(GN-specific features and
100for \*(PX-mandated features.
101.PP
102Following the \*(PX standard,
103.IR gawk -specific
104options are supplied via arguments to the
105.B \-W
106option. Multiple
107.B \-W
options may be supplied.
109Each
110.B \-W
111option has a corresponding long option, as detailed below.
112Arguments to long options are either joined with the option
113by an
114.B =
115sign, with no intervening spaces, or they may be provided in the
116next command line argument.
117Long options may be abbreviated, as long as the abbreviation
118remains unique.
119.SH OPTIONS
120.PP
121.I Gawk
122accepts the following options, listed alphabetically.
123.TP
124.PD 0
125.BI \-F " fs"
126.TP
127.PD
128.BI \-\^\-field-separator " fs"
129Use
130.I fs
131for the input field separator (the value of the
132.B FS
133predefined
134variable).
135.TP
136.PD 0
137\fB\-v\fI var\fB\^=\^\fIval\fR
138.TP
139.PD
140\fB\-\^\-assign \fIvar\fB\^=\^\fIval\fR
141Assign the value
142.I val
143to the variable
144.IR var ,
145before execution of the program begins.
146Such variable values are available to the
147.B BEGIN
148block of an \*(AK program.
149.TP
150.PD 0
151.BI \-f " program-file"
152.TP
153.PD
154.BI \-\^\-file " program-file"
155Read the \*(AK program source from the file
156.IR program-file ,
157instead of from the first command line argument.
158Multiple
159.B \-f
160(or
161.BR \-\^\-file )
162options may be used.
163.TP
164.PD 0
165.BI \-mf " NNN"
166.TP
167.PD
168.BI \-mr " NNN"
169Set various memory limits to the value
170.IR NNN .
171The
172.B f
173flag sets the maximum number of fields, and the
174.B r
175flag sets the maximum record size. These two flags and the
176.B \-m
177option are from the Bell Laboratories research version of \*(UX
178.IR awk .
179They are ignored by
180.IR gawk ,
181since
182.I gawk
183has no pre-defined limits.
184.TP
185.PD 0
186.B "\-W compat"
187.TP
188.PD 0
189.B "\-W traditional"
190.TP
191.PD 0
192.B \-\^\-compat
193.TP
194.PD
195.B \-\^\-traditional
196Run in
197.I compatibility
198mode. In compatibility mode,
199.I gawk
200behaves identically to \*(UX
201.IR awk ;
202none of the \*(GN-specific extensions are recognized.
203The use of
204.B \-\^\-traditional
205is preferred over the other forms of this option.
206See
207.BR "GNU EXTENSIONS" ,
208below, for more information.
209.TP
210.PD 0
211.B "\-W copyleft"
212.TP
213.PD 0
214.B "\-W copyright"
215.TP
216.PD 0
217.B \-\^\-copyleft
218.TP
219.PD
220.B \-\^\-copyright
221Print the short version of the \*(GN copyright information message on
222the standard output and exit successfully.
223.TP
224.PD 0
225\fB\-W dump-variables\fR[\fB=\fIfile\fR]
226.TP
227.PD
228\fB\-\^\-dump-variables\fR[\fB=\fIfile\fR]
229Print a sorted list of global variables, their types and final values to
230.IR file .
231If no
232.I file
233is provided,
234.I gawk
235uses a file named
236.I awkvars.out
237in the current directory.
238.sp .5
239Having a list of all the global variables is a good way to look for
240typographical errors in your programs.
241You would also use this option if you have a large program with a lot of
242functions, and you want to be sure that your functions don't
243inadvertently use global variables that you meant to be local.
244(This is a particularly easy mistake to make with simple variable
245names like
246.BR i ,
247.BR j ,
248and so on.)
249.TP
250.PD 0
251.BI "\-W exec " file
252.TP
253.PD
254.BI \-\^\-exec " file"
Similar to
.BR \-f ;
however, this option is the last one processed.
258This should be used with
259.B #!
260scripts, particularly for CGI applications, to avoid
261passing in options or source code (!) on the command line
262from a URL.
263This option disables command-line variable assignments.
264.TP
265.PD 0
266.B "\-W gen\-po"
267.TP
268.PD
269.B \-\^\-gen\-po
270Scan and parse the \*(AK program, and generate a \*(GN
271.B \&.po
272format file on standard output with entries for all localizable
273strings in the program. The program itself is not executed.
274See the \*(GN
275.I gettext
276distribution for more information on
277.B \&.po
278files.
279.TP
280.PD 0
281.B "\-W help"
282.TP
283.PD 0
284.B "\-W usage"
285.TP
286.PD 0
287.B \-\^\-help
288.TP
289.PD
290.B \-\^\-usage
291Print a relatively short summary of the available options on
292the standard output.
293(Per the
294.IR "GNU Coding Standards" ,
295these options cause an immediate, successful exit.)
296.TP
297.PD 0
298.BR "\-W lint" [ =\fIvalue\fR ]
299.TP
300.PD
301.BR \-\^\-lint [ =\fIvalue\fR ]
302Provide warnings about constructs that are
303dubious or non-portable to other \*(AK implementations.
304With an optional argument of
305.BR fatal ,
306lint warnings become fatal errors.
307This may be drastic, but its use will certainly encourage the
308development of cleaner \*(AK programs.
309With an optional argument of
310.BR invalid ,
311only warnings about things that are
312actually invalid are issued. (This is not fully implemented yet.)
313.TP
314.PD 0
315.B "\-W lint\-old"
316.TP
317.PD
318.B \-\^\-lint\-old
319Provide warnings about constructs that are
320not portable to the original version of Unix
321.IR awk .
322.TP
323.PD 0
324.B "\-W non\-decimal\-data"
325.TP
326.PD
327.B "\-\^\-non\-decimal\-data"
328Recognize octal and hexadecimal values in input data.
329.I "Use this option with great caution!"
330.ig
331.\" This option is left undocumented, on purpose.
332.TP
333.PD 0
334.B "\-W nostalgia"
335.TP
336.PD
337.B \-\^\-nostalgia
338Provide a moment of nostalgia for long time
339.I awk
340users.
341..
342.TP
343.PD 0
344.B "\-W posix"
345.TP
346.PD
347.B \-\^\-posix
348This turns on
349.I compatibility
350mode, with the following additional restrictions:
351.RS
352.TP "\w'\(bu'u+1n"
353\(bu
354.B \ex
355escape sequences are not recognized.
356.TP
357\(bu
358Only space and tab act as field separators when
359.B FS
is set to a single space; newline does not.
361.TP
362\(bu
363You cannot continue lines after
364.B ?
365and
366.BR : .
367.TP
368\(bu
369The synonym
370.B func
371for the keyword
372.B function
373is not recognized.
374.TP
375\(bu
376The operators
377.B **
378and
379.B **=
380cannot be used in place of
381.B ^
382and
383.BR ^= .
384.TP
385\(bu
386The
387.B fflush()
388function is not available.
389.RE
390.TP
391.PD 0
392\fB\-W profile\fR[\fB=\fIprof_file\fR]
393.TP
394.PD
395\fB\-\^\-profile\fR[\fB=\fIprof_file\fR]
396Send profiling data to
397.IR prof_file .
398The default is
399.BR awkprof.out .
400When run with
401.IR gawk ,
402the profile is just a \*(lqpretty printed\*(rq version of the program.
403When run with
404.IR pgawk ,
405the profile contains execution counts of each statement in the program
406in the left margin and function call counts for each user-defined function.
407.TP
408.PD 0
409.B "\-W re\-interval"
410.TP
411.PD
412.B \-\^\-re\-interval
413Enable the use of
414.I "interval expressions"
415in regular expression matching
416(see
417.BR "Regular Expressions" ,
418below).
419Interval expressions were not traditionally available in the
420\*(AK language. The \*(PX standard added them, to make
421.I awk
422and
423.I egrep
424consistent with each other.
425However, their use is likely
426to break old \*(AK programs, so
427.I gawk
428only provides them if they are requested with this option, or when
429.B \-\^\-posix
430is specified.
431.TP
432.PD 0
433.BI "\-W source " program-text
434.TP
435.PD
436.BI \-\^\-source " program-text"
437Use
438.I program-text
439as \*(AK program source code.
440This option allows the easy intermixing of library functions (used via the
441.B \-f
442and
443.B \-\^\-file
444options) with source code entered on the command line.
445It is intended primarily for medium to large \*(AK programs used
446in shell scripts.
447.TP
448.PD 0
449.B "\-W version"
450.TP
451.PD
452.B \-\^\-version
453Print version information for this particular copy of
454.I gawk
455on the standard output.
456This is useful mainly for knowing if the current copy of
457.I gawk
458on your system
459is up to date with respect to whatever the Free Software Foundation
460is distributing.
461This is also useful when reporting bugs.
462(Per the
463.IR "GNU Coding Standards" ,
464these options cause an immediate, successful exit.)
465.TP
466.PD 0
467.B \-\^\-
468Signal the end of options. This is useful to allow further arguments to the
469\*(AK program itself to start with a \*(lq\-\*(rq.
470This is mainly for consistency with the argument parsing convention used
471by most other \*(PX programs.
472.PP
473In compatibility mode,
474any other options are flagged as invalid, but are otherwise ignored.
475In normal operation, as long as program text has been supplied, unknown
476options are passed on to the \*(AK program in the
477.B ARGV
478array for processing. This is particularly useful for running \*(AK
479programs via the \*(lq#!\*(rq executable interpreter mechanism.
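.PP
As a minimal sketch of the two options used most often, the commands below
combine
.B \-F
and
.BR \-v .
Only options that traditional awk also accepts are used, so the command is
written as awk; the variable names factor and greeting are hypothetical
names chosen for this illustration.
.PP
.nf
```shell
# Split on ":" and scale the second field by a factor passed in with -v.
scaled=$(echo "widget:4" | awk -F: -v factor=3 '{ print $1, $2 * factor }')
echo "$scaled"

# A -v assignment is already visible inside the BEGIN block.
begin_value=$(awk -v greeting=hello 'BEGIN { print greeting }')
echo "$begin_value"
```
.fi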
480.SH AWK PROGRAM EXECUTION
481.PP
482An \*(AK program consists of a sequence of pattern-action statements
483and optional function definitions.
484.RS
485.PP
\fIpattern\fB { \fIaction statements\fB }\fR
.br
\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
489.RE
490.PP
491.I Gawk
492first reads the program source from the
493.IR program-file (s)
494if specified,
495from arguments to
496.BR \-\^\-source ,
497or from the first non-option argument on the command line.
498The
499.B \-f
500and
501.B \-\^\-source
502options may be used multiple times on the command line.
503.I Gawk
504reads the program text as if all the
505.IR program-file s
506and command line source texts
507had been concatenated together. This is useful for building libraries
508of \*(AK functions, without having to include them in each new \*(AK
509program that uses them. It also provides the ability to mix library
510functions with command line programs.
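.PP
The concatenation of several program files can be sketched as follows.
The file names lib.awk and main.awk and the function twice are
hypothetical; the files are created in a scratch directory only for the
sake of the example.
.PP
.nf
```shell
# Create a small library file and a main program in a scratch directory.
dir=$(mktemp -d)
echo 'function twice(x) { return 2 * x }' > "$dir/lib.awk"
echo 'BEGIN { print twice(21) }' > "$dir/main.awk"

# The two files are read as if they had been concatenated into one program,
# so the main program can call the function defined in the library.
combined=$(awk -f "$dir/lib.awk" -f "$dir/main.awk")
echo "$combined"
rm -r "$dir"
```
.fi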
511.PP
512The environment variable
513.B AWKPATH
514specifies a search path to use when finding source files named with
515the
516.B \-f
517option. If this variable does not exist, the default path is
518\fB".:/usr/local/share/awk"\fR.
519(The actual directory may vary, depending upon how
520.I gawk
521was built and installed.)
522If a file name given to the
523.B \-f
524option contains a \*(lq/\*(rq character, no path search is performed.
525.PP
526.I Gawk
527executes \*(AK programs in the following order.
528First,
529all variable assignments specified via the
530.B \-v
531option are performed.
532Next,
533.I gawk
534compiles the program into an internal form.
535Then,
536.I gawk
537executes the code in the
538.B BEGIN
539block(s) (if any),
540and then proceeds to read
541each file named in the
542.B ARGV
543array.
544If there are no files named on the command line,
545.I gawk
546reads the standard input.
547.PP
548If a filename on the command line has the form
549.IB var = val
550it is treated as a variable assignment. The variable
551.I var
552will be assigned the value
553.IR val .
554(This happens after any
555.B BEGIN
556block(s) have been run.)
557Command line variable assignment
558is most useful for dynamically assigning values to the variables
559\*(AK uses to control how input is broken into fields and records.
560It is also useful for controlling state if multiple passes are needed over
561a single data file.
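.PP
A short sketch of the multiple-pass technique: the same file is named
twice, with a command line assignment before each occurrence.
The variable name pass is hypothetical.
The BEGIN block runs before either assignment, so it sees an empty value.
.PP
.nf
```shell
data=$(mktemp)
{ echo alpha; echo beta; } > "$data"

# "pass" is set to 1 before the first reading of the file and to 2 before
# the second; neither assignment has happened yet when BEGIN runs.
passes=$(awk 'BEGIN { print "BEGIN sees [" pass "]" } { print pass ": " $0 }' pass=1 "$data" pass=2 "$data")
echo "$passes"
rm "$data"
```
.fi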
562.PP
563If the value of a particular element of
564.B ARGV
565is empty (\fB""\fR),
566.I gawk
567skips over it.
568.PP
569For each record in the input,
570.I gawk
571tests to see if it matches any
572.I pattern
573in the \*(AK program.
574For each pattern that the record matches, the associated
575.I action
576is executed.
577The patterns are tested in the order they occur in the program.
578.PP
579Finally, after all the input is exhausted,
580.I gawk
581executes the code in the
582.B END
583block(s) (if any).
584.SH VARIABLES, RECORDS AND FIELDS
585\*(AK variables are dynamic; they come into existence when they are
586first used. Their values are either floating-point numbers or strings,
587or both,
588depending upon how they are used. \*(AK also has one dimensional
589arrays; arrays with multiple dimensions may be simulated.
590Several pre-defined variables are set as a program
591runs; these will be described as needed and summarized below.
592.SS Records
593Normally, records are separated by newline characters. You can control how
594records are separated by assigning values to the built-in variable
595.BR RS .
596If
597.B RS
598is any single character, that character separates records.
599Otherwise,
600.B RS
601is a regular expression. Text in the input that matches this
602regular expression separates the record.
603However, in compatibility mode,
604only the first character of its string
605value is used for separating records.
606If
607.B RS
608is set to the null string, then records are separated by
609blank lines.
610When
611.B RS
612is set to the null string, the newline character always acts as
613a field separator, in addition to whatever value
614.B FS
615may have.
616.SS Fields
617.PP
618As each input record is read,
619.I gawk
620splits the record into
621.IR fields ,
622using the value of the
623.B FS
624variable as the field separator.
625If
626.B FS
627is a single character, fields are separated by that character.
628If
629.B FS
630is the null string, then each individual character becomes a
631separate field.
632Otherwise,
633.B FS
634is expected to be a full regular expression.
635In the special case that
636.B FS
637is a single space, fields are separated
638by runs of spaces and/or tabs and/or newlines.
(But see the discussion of
.BR \-\^\-posix ,
above.)
642.B NOTE:
643The value of
644.B IGNORECASE
645(see below) also affects how fields are split when
646.B FS
647is a regular expression, and how records are separated when
648.B RS
649is a regular expression.
650.PP
651If the
652.B FIELDWIDTHS
653variable is set to a space separated list of numbers, each field is
654expected to have fixed width, and
655.I gawk
656splits up the record using the specified widths. The value of
657.B FS
658is ignored.
659Assigning a new value to
660.B FS
661overrides the use of
662.BR FIELDWIDTHS ,
663and restores the default behavior.
664.PP
665Each field in the input record may be referenced by its position,
666.BR $1 ,
667.BR $2 ,
668and so on.
669.B $0
670is the whole record.
671Fields need not be referenced by constants:
.RS
.PP
.ft B
n = 5
.br
print $n
.ft R
.RE
680.PP
681prints the fifth field in the input record.
682.PP
683The variable
684.B NF
685is set to the total number of fields in the input record.
686.PP
687References to non-existent fields (i.e. fields after
688.BR $NF )
produce the null string. However, assigning to a non-existent field
690(e.g.,
691.BR "$(NF+2) = 5" )
692increases the value of
693.BR NF ,
694creates any intervening fields with the null string as their value, and
695causes the value of
696.B $0
697to be recomputed, with the fields being separated by the value of
698.BR OFS .
699References to negative numbered fields cause a fatal error.
700Decrementing
701.B NF
702causes the values of fields past the new value to be lost, and the value of
703.B $0
704to be recomputed, with the fields being separated by the value of
705.BR OFS .
706.PP
707Assigning a value to an existing field
708causes the whole record to be rebuilt when
709.B $0
710is referenced.
711Similarly, assigning a value to
712.B $0
713causes the record to be resplit, creating new
714values for the fields.
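.PP
The effect of assigning to a field past the end of the record can be
sketched as follows, on a made-up three-field input line.
The intervening field is created empty, and the record is rebuilt using
.BR OFS .
.PP
.nf
```shell
# Assigning to field NF+2 creates an empty field NF+1 and rebuilds $0.
rebuilt=$(echo "a b c" | awk 'BEGIN { OFS = "-" } { $(NF + 2) = "e"; print $0; print NF }')
echo "$rebuilt"
```
.fi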
715.SS Built-in Variables
716.PP
717.IR Gawk\^ "'s"
718built-in variables are:
719.PP
720.TP "\w'\fBFIELDWIDTHS\fR'u+1n"
721.B ARGC
722The number of command line arguments (does not include options to
723.IR gawk ,
724or the program source).
725.TP
726.B ARGIND
727The index in
728.B ARGV
729of the current file being processed.
730.TP
731.B ARGV
Array of command line arguments. The array is indexed from
0 to
.B ARGC
\- 1.
736Dynamically changing the contents of
737.B ARGV
738can control the files used for data.
739.TP
740.B BINMODE
741On non-POSIX systems, specifies use of \*(lqbinary\*(rq mode for all file I/O.
742Numeric values of 1, 2, or 3, specify that input files, output files, or
743all files, respectively, should use binary I/O.
744String values of \fB"r"\fR, or \fB"w"\fR specify that input files, or output files,
745respectively, should use binary I/O.
746String values of \fB"rw"\fR or \fB"wr"\fR specify that all files
747should use binary I/O.
748Any other string value is treated as \fB"rw"\fR, but generates a warning message.
749.TP
750.B CONVFMT
751The conversion format for numbers, \fB"%.6g"\fR, by default.
752.TP
753.B ENVIRON
754An array containing the values of the current environment.
755The array is indexed by the environment variables, each element being
756the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
757.BR /home/arnold ).
758Changing this array does not affect the environment seen by programs which
759.I gawk
760spawns via redirection or the
761.B system()
762function.
763.TP
764.B ERRNO
765If a system error occurs either doing a redirection for
766.BR getline ,
767during a read for
768.BR getline ,
769or during a
770.BR close() ,
771then
772.B ERRNO
773will contain
774a string describing the error.
775The value is subject to translation in non-English locales.
776.TP
777.B FIELDWIDTHS
A whitespace-separated list of field widths. When set,
779.I gawk
780parses the input into fields of fixed width, instead of using the
781value of the
782.B FS
783variable as the field separator.
784.TP
785.B FILENAME
786The name of the current input file.
787If no files are specified on the command line, the value of
788.B FILENAME
789is \*(lq\-\*(rq.
790However,
791.B FILENAME
792is undefined inside the
793.B BEGIN
794block
795(unless set by
796.BR getline ).
797.TP
798.B FNR
799The input record number in the current input file.
800.TP
801.B FS
802The input field separator, a space by default. See
803.BR Fields ,
804above.
805.TP
806.B IGNORECASE
807Controls the case-sensitivity of all regular expression
808and string operations. If
809.B IGNORECASE
810has a non-zero value, then string comparisons and
811pattern matching in rules,
812field splitting with
813.BR FS ,
814record separating with
815.BR RS ,
816regular expression
817matching with
818.B ~
819and
820.BR !~ ,
821and the
822.BR gensub() ,
823.BR gsub() ,
824.BR index() ,
825.BR match() ,
826.BR split() ,
827and
828.B sub()
829built-in functions all ignore case when doing regular expression
830operations.
831.B NOTE:
832Array subscripting is
833.I not
834affected.
835However, the
836.B asort()
837and
838.B asorti()
839functions are affected.
840.sp .5
841Thus, if
842.B IGNORECASE
843is not equal to zero,
844.B /aB/
845matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
846and \fB"AB"\fP.
847As with all \*(AK variables, the initial value of
848.B IGNORECASE
849is zero, so all regular expression and string
850operations are normally case-sensitive.
851Under Unix, the full ISO 8859-1 Latin-1 character set is used
852when ignoring case.
As of
.I gawk
3.1.4, the case equivalencies are fully locale-aware, based on
the C
.B <ctype.h>
facilities such as
.B isalpha()
and
.BR toupper() .
862.TP
863.B LINT
864Provides dynamic control of the
865.B \-\^\-lint
866option from within an \*(AK program.
867When true,
868.I gawk
869prints lint warnings. When false, it does not.
870When assigned the string value \fB"fatal"\fP,
871lint warnings become fatal errors, exactly like
872.BR \-\^\-lint=fatal .
873Any other true value just prints warnings.
874.TP
875.B NF
876The number of fields in the current input record.
877.TP
878.B NR
879The total number of input records seen so far.
880.TP
881.B OFMT
882The output format for numbers, \fB"%.6g"\fR, by default.
883.TP
884.B OFS
885The output field separator, a space by default.
886.TP
887.B ORS
888The output record separator, by default a newline.
889.TP
890.B PROCINFO
891The elements of this array provide access to information about the
892running \*(AK program.
893On some systems,
894there may be elements in the array, \fB"group1"\fP through
895\fB"group\fIn\fB"\fR for some
896.IR n ,
897which is the number of supplementary groups that the process has.
898Use the
899.B in
900operator to test for these elements.
901The following elements are guaranteed to be available:
902.RS
903.TP \w'\fBPROCINFO["pgrpid"]\fR'u+1n
904\fBPROCINFO["egid"]\fP
905the value of the
906.IR getegid (2)
907system call.
908.TP
909\fBPROCINFO["euid"]\fP
910the value of the
911.IR geteuid (2)
912system call.
913.TP
914\fBPROCINFO["FS"]\fP
915\fB"FS"\fP if field splitting with
916.B FS
917is in effect, or \fB"FIELDWIDTHS"\fP if field splitting with
918.B FIELDWIDTHS
919is in effect.
920.TP
921\fBPROCINFO["gid"]\fP
922the value of the
923.IR getgid (2)
924system call.
925.TP
926\fBPROCINFO["pgrpid"]\fP
927the process group ID of the current process.
928.TP
929\fBPROCINFO["pid"]\fP
930the process ID of the current process.
931.TP
932\fBPROCINFO["ppid"]\fP
933the parent process ID of the current process.
934.TP
935\fBPROCINFO["uid"]\fP
936the value of the
937.IR getuid (2)
938system call.
939.TP
940\fBPROCINFO["version"]\fP
941The version of
942.IR gawk .
943This is available from
944version 3.1.4 and later.
945.RE
946.TP
947.B RS
948The input record separator, by default a newline.
949.TP
950.B RT
951The record terminator.
952.I Gawk
953sets
954.B RT
955to the input text that matched the character or regular expression
956specified by
957.BR RS .
958.TP
959.B RSTART
The index of the first character matched by
.BR match() ;
0 if no match.
(This implies that character indices start at one.)
964.TP
965.B RLENGTH
The length of the string matched by
.BR match() ;
\-1 if no match.
969.TP
970.B SUBSEP
971The character used to separate multiple subscripts in array
972elements, by default \fB"\e034"\fR.
973.TP
974.B TEXTDOMAIN
975The text domain of the \*(AK program; used to find the localized
976translations for the program's strings.
977.SS Arrays
978.PP
979Arrays are subscripted with an expression between square brackets
980.RB ( [ " and " ] ).
981If the expression is an expression list
982.RI ( expr ", " expr " .\|.\|.)"
983then the array subscript is a string consisting of the
984concatenation of the (string) value of each expression,
985separated by the value of the
986.B SUBSEP
987variable.
988This facility is used to simulate multiply dimensioned
989arrays. For example:
990.PP
.RS
.ft B
i = "A";\^ j = "B";\^ k = "C"
.br
x[i, j, k] = "hello, world\en"
.ft R
.RE
998.PP
999assigns the string \fB"hello, world\en"\fR to the element of the array
1000.B x
1001which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in \*(AK
1002are associative, i.e. indexed by string values.
1003.PP
1004The special operator
1005.B in
1006may be used in an
1007.B if
1008or
1009.B while
1010statement to see if an array has an index consisting of a particular
1011value.
1012.PP
.RS
.ft B
.nf
if (val in array)
	print array[val]
.fi
.ft
.RE
1021.PP
1022If the array has multiple subscripts, use
1023.BR "(i, j) in array" .
1024.PP
1025The
1026.B in
1027construct may also be used in a
1028.B for
1029loop to iterate over all the elements of an array.
1030.PP
1031An element may be deleted from an array using the
1032.B delete
1033statement.
1034The
1035.B delete
1036statement may also be used to delete the entire contents of an array,
1037just by specifying the array name without a subscript.
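.PP
The array operations described in this section can be put together in one
small sketch.
The array name x and the helper names key, part and n are hypothetical.
The subscript is taken apart again with
.B split()
and
.BR SUBSEP .
.PP
.nf
```shell
array_demo=$(awk 'BEGIN {
    x["A", "B"] = "hello"
    # Membership test for a multiple subscript.
    if (("A", "B") in x) print "present"
    # Iterate over the subscripts and split each one on SUBSEP.
    for (key in x) { n = split(key, part, SUBSEP); print n, part[1], part[2] }
    # Remove the element and test again.
    delete x["A", "B"]
    if (!(("A", "B") in x)) print "deleted"
}')
echo "$array_demo"
```
.fi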
1038.SS Variable Typing And Conversion
1039.PP
1040Variables and fields
1041may be (floating point) numbers, or strings, or both. How the
value of a variable is interpreted depends upon its context. If used in
a numeric expression, it will be treated as a number; if used as a string,
it will be treated as a string.
1045.PP
1046To force a variable to be treated as a number, add 0 to it; to force it
1047to be treated as a string, concatenate it with the null string.
1048.PP
1049When a string must be converted to a number, the conversion is accomplished
1050using
1051.IR strtod (3).
1052A number is converted to a string by using the value of
1053.B CONVFMT
1054as a format string for
1055.IR sprintf (3),
1056with the numeric value of the variable as the argument.
1057However, even though all numbers in \*(AK are floating-point,
1058integral values are
1059.I always
1060converted as integers. Thus, given
1061.PP
.RS
.ft B
.nf
CONVFMT = "%2.2f"
a = 12
b = a ""
.fi
.ft R
.RE
1071.PP
1072the variable
1073.B b
1074has a string value of \fB"12"\fR and not \fB"12.00"\fR.
1075.PP
1076.I Gawk
1077performs comparisons as follows:
1078If two variables are numeric, they are compared numerically.
1079If one value is numeric and the other has a string value that is a
1080\*(lqnumeric string,\*(rq then comparisons are also done numerically.
1081Otherwise, the numeric value is converted to a string and a string
1082comparison is performed.
1083Two strings are compared, of course, as strings.
1084Note that the POSIX standard applies the concept of
1085\*(lqnumeric string\*(rq everywhere, even to string constants.
1086However, this is
1087clearly incorrect, and
1088.I gawk
1089does not do this.
1090(Fortunately, this is fixed in the next version of the standard.)
1091.PP
Note that string constants, such as \fB"57"\fP, are
.I not
numeric strings; they are string constants.
1095The idea of \*(lqnumeric string\*(rq
1096only applies to fields,
1097.B getline
1098input,
1099.BR FILENAME ,
1100.B ARGV
1101elements,
1102.B ENVIRON
1103elements and the elements of an array created by
1104.B split()
1105that are numeric strings.
1106The basic idea is that
1107.IR "user input" ,
1108and only user input, that looks numeric,
1109should be treated that way.
1110.PP
Uninitialized variables have the numeric value 0 and the string value \fB""\fP
1112(the null, or empty, string).
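.PP
The comparison rules can be checked with a short sketch.
The variable names as_fields, as_constants, forced and u are hypothetical.
Fields that look numeric compare as numbers, string constants compare as
strings, and concatenating the null string forces a string comparison.
.PP
.nf
```shell
# Fields 10 and 9 are numeric strings, so the first test is numeric.
# The constants "10" and "9" are plain strings, and "10" sorts before "9".
cmp=$(echo "10 9" | awk '{ as_fields = ($1 > $2); as_constants = ("10" > "9"); forced = (($1 "") > ($2 "")); print as_fields, as_constants, forced }')
echo "$cmp"

# An uninitialized variable is 0 as a number and empty as a string.
unset_value=$(awk 'BEGIN { print u + 0, "[" u "]" }')
echo "$unset_value"
```
.fi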
1113.SS Octal and Hexadecimal Constants
Starting with version 3.1 of
.IR gawk ,
you may use C-style octal and hexadecimal constants in your \*(AK
program source code.
For example, the octal value
.B 011
is equal to decimal
.BR 9 ,
and the hexadecimal value
.B 0x11
is equal to decimal
.BR 17 .
1125.SS String Constants
1126.PP
1127String constants in \*(AK are sequences of characters enclosed
1128between double quotes (\fB"\fR). Within strings, certain
1129.I "escape sequences"
1130are recognized, as in C. These are:
1131.PP
1132.TP "\w'\fB\e\^\fIddd\fR'u+1n"
1133.B \e\e
1134A literal backslash.
1135.TP
1136.B \ea
1137The \*(lqalert\*(rq character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
1138.TP
1139.B \eb
1140backspace.
1141.TP
1142.B \ef
1143form-feed.
1144.TP
1145.B \en
1146newline.
1147.TP
1148.B \er
1149carriage return.
1150.TP
1151.B \et
1152horizontal tab.
1153.TP
1154.B \ev
1155vertical tab.
1156.TP
1157.BI \ex "\^hex digits"
1158The character represented by the string of hexadecimal digits following
1159the
1160.BR \ex .
1161As in \*(AN C, all following hexadecimal digits are considered part of
1162the escape sequence.
1163(This feature should tell us something about language design by committee.)
1164E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
1165.TP
1166.BI \e ddd
1167The character represented by the 1-, 2-, or 3-digit sequence of octal
1168digits.
1169E.g., \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
1170.TP
1171.BI \e c
1172The literal character
1173.IR c\^ .
1174.PP
1175The escape sequences may also be used inside constant regular expressions
1176(e.g.,
1177.B "/[\ \et\ef\en\er\ev]/"
1178matches whitespace characters).
1179.PP
1180In compatibility mode, the characters represented by octal and
1181hexadecimal escape sequences are treated literally when used in
1182regular expression constants. Thus,
1183.B /a\e52b/
1184is equivalent to
1185.BR /a\e*b/ .
1186.SH PATTERNS AND ACTIONS
1187\*(AK is a line-oriented language. The pattern comes first, and then the
1188action. Action statements are enclosed in
1189.B {
1190and
1191.BR } .
1192Either the pattern may be missing, or the action may be missing, but,
1193of course, not both. If the pattern is missing, the action is
1194executed for every single record of input.
1195A missing action is equivalent to
.RS
.PP
.B "{ print }"
.RE
1200.PP
1201which prints the entire record.
1202.PP
1203Comments begin with the \*(lq#\*(rq character, and continue until the
1204end of the line.
1205Blank lines may be used to separate statements.
Normally, a statement ends with a newline; however, this is not the
1207case for lines ending in
1208a \*(lq,\*(rq,
1209.BR { ,
1210.BR ? ,
1211.BR : ,
1212.BR && ,
1213or
1214.BR || .
1215Lines ending in
1216.B do
1217or
1218.B else
1219also have their statements automatically continued on the following line.
1220In other cases, a line can be continued by ending it with a \*(lq\e\*(rq,
1221in which case the newline will be ignored.
1222.PP
1223Multiple statements may
1224be put on one line by separating them with a \*(lq;\*(rq.
1225This applies to both the statements within the action part of a
1226pattern-action pair (the usual case),
1227and to the pattern-action statements themselves.
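.PP
A brief sketch of the two degenerate forms, on made-up input: a pattern
with no action prints the matching records, and an action with no pattern
runs for every record.
The counter name n is hypothetical.
.PP
.nf
```shell
# Pattern only: the missing action defaults to printing the record.
listing=$( { echo "x 1"; echo "y 2"; echo "z 3"; } | awk '$2 > 1')
echo "$listing"

# Action only: two statements on one line, separated by a semicolon.
names=$( { echo "x 1"; echo "y 2"; echo "z 3"; } | awk '{ n++; print n ":" $1 }')
echo "$names"
```
.fi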
.SS Patterns
\*(AK patterns may be one of the following:
.PP
.RS
.nf
.B BEGIN
.B END
.BI / "regular expression" /
.I "relational expression"
.IB pattern " && " pattern
.IB pattern " || " pattern
.IB pattern " ? " pattern " : " pattern
.BI ( pattern )
.BI ! " pattern"
.IB pattern1 ", " pattern2
.fi
.RE
.PP
.B BEGIN
and
.B END
are two special kinds of patterns which are not tested against
the input.
The action parts of all
.B BEGIN
patterns are merged as if all the statements had
been written in a single
.B BEGIN
block. They are executed before any
of the input is read. Similarly, all the
.B END
blocks are merged,
and executed when all the input is exhausted (or when an
.B exit
statement is executed).
.B BEGIN
and
.B END
patterns cannot be combined with other patterns in pattern expressions.
.B BEGIN
and
.B END
patterns cannot have missing action parts.
.PP
For
.BI / "regular expression" /
patterns, the associated statement is executed for each input record that matches
the regular expression.
Regular expressions are the same as those in
.IR egrep (1),
and are summarized below.
.PP
A
.I "relational expression"
may use any of the operators defined below in the section on actions.
These generally test whether certain fields match certain regular expressions.
.PP
The
.BR && ,
.BR || ,
and
.B !
operators are logical AND, logical OR, and logical NOT, respectively, as in C.
They do short-circuit evaluation, also as in C, and are used for combining
more primitive pattern expressions. As in most languages, parentheses
may be used to change the order of evaluation.
.PP
The
.B ?\^:
operator is like the same operator in C. If the first pattern is true
then the pattern used for testing is the second pattern, otherwise it is
the third. Only one of the second and third patterns is evaluated.
.PP
The
.IB pattern1 ", " pattern2
form of an expression is called a
.IR "range pattern" .
It matches all input records starting with a record that matches
.IR pattern1 ,
and continuing until a record that matches
.IR pattern2 ,
inclusive. It does not combine with any other sort of pattern expression.
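.PP
A range pattern and an
.B END
pattern can be combined in one program, as in the sketch below.
It assumes a \*(PX shell with an
.I awk
on the search path; the function name
.B range_demo
and the sample input are made up for illustration.
.PP
.RS
.nf
```shell
# Print the records from the first "begin" line through the next
# "end" line, inclusive, then report how many records were read.
range_demo() {
    awk '/^begin/, /^end/ { print }
         END { print NR " records" }'
}
range_demo <<EOF
skip
begin
body
end
tail
EOF
```
.fi
.RE
.PP
Only the three records inside the range are printed by the first rule;
the
.B END
rule then runs once, after the input is exhausted.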
.SS Regular Expressions
Regular expressions are the extended kind found in
.IR egrep .
They are composed of characters as follows:
.TP "\w'\fB[^\fIabc.\|.\|.\fB]\fR'u+2n"
.I c
matches the non-metacharacter
.IR c .
.TP
.I \ec
matches the literal character
.IR c .
.TP
.B .
matches any character
.I including
newline.
.TP
.B ^
matches the beginning of a string.
.TP
.B $
matches the end of a string.
.TP
.BI [ abc.\|.\|. ]
character list, matches any of the characters
.IR abc.\|.\|. .
.TP
.BI [^ abc.\|.\|. ]
negated character list, matches any character except
.IR abc.\|.\|. .
.TP
.IB r1 | r2
alternation: matches either
.I r1
or
.IR r2 .
.TP
.I r1r2
concatenation: matches
.IR r1 ,
and then
.IR r2 .
.TP
.IB r\^ +
matches one or more
.IR r\^ "'s."
.TP
.IB r *
matches zero or more
.IR r\^ "'s."
.TP
.IB r\^ ?
matches zero or one
.IR r\^ "'s."
.TP
.BI ( r )
grouping: matches
.IR r .
.TP
.PD 0
.IB r { n }
.TP
.PD 0
.IB r { n ,}
.TP
.PD
.IB r { n , m }
One or two numbers inside braces denote an
.IR "interval expression" .
If there is one number in the braces, the preceding regular expression
.I r
is repeated
.I n
times. If there are two numbers separated by a comma,
.I r
is repeated
.I n
to
.I m
times.
If there is one number followed by a comma, then
.I r
is repeated at least
.I n
times.
.sp .5
Interval expressions are only available if either
.B \-\^\-posix
or
.B \-\^\-re\-interval
is specified on the command line.
.TP
.B \ey
matches the empty string at either the beginning or the
end of a word.
.TP
.B \eB
matches the empty string within a word.
.TP
.B \e<
matches the empty string at the beginning of a word.
.TP
.B \e>
matches the empty string at the end of a word.
.TP
.B \ew
matches any word-constituent character (letter, digit, or underscore).
.TP
.B \eW
matches any character that is not word-constituent.
.TP
.B \e`
matches the empty string at the beginning of a buffer (string).
.TP
.B \e'
matches the empty string at the end of a buffer.
.PP
The escape sequences that are valid in string constants (see above)
are also valid in regular expressions.
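.PP
Several of the basic operators (anchors, grouping, alternation,
.BR + ,
and
.BR ? )
can be seen working together in the following sketch.
It assumes a \*(PX shell with an
.I awk
on the search path; the function name
.B match_groups
is hypothetical.
.PP
.RS
.nf
```shell
# Keep records made of one or more "ab" or "cd" groups,
# optionally followed by a single "x".
match_groups() { awk '/^(ab|cd)+x?$/'; }
{ echo abab; echo cdx; echo abc; } | match_groups
```
.fi
.RE
.PP
The record
.B abc
is rejected because the trailing
.B c
is neither a complete group nor the optional
.BR x .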
.PP
.I "Character classes"
are a new feature introduced in the \*(PX standard.
A character class is a special notation for describing
lists of characters that have a specific attribute, but where the
actual characters themselves can vary from country to country and/or
from character set to character set. For example, the notion of what
is an alphabetic character differs in the USA and in France.
.PP
A character class is only valid in a regular expression
.I inside
the brackets of a character list. Character classes consist of
.BR [: ,
a keyword denoting the class, and
.BR :] .
The character
classes defined by the \*(PX standard are:
.TP "\w'\fB[:alnum:]\fR'u+2n"
.B [:alnum:]
Alphanumeric characters.
.TP
.B [:alpha:]
Alphabetic characters.
.TP
.B [:blank:]
Space or tab characters.
.TP
.B [:cntrl:]
Control characters.
.TP
.B [:digit:]
Numeric characters.
.TP
.B [:graph:]
Characters that are both printable and visible.
(A space is printable, but not visible, while an
.B a
is both.)
.TP
.B [:lower:]
Lower-case alphabetic characters.
.TP
.B [:print:]
Printable characters (characters that are not control characters).
.TP
.B [:punct:]
Punctuation characters (characters that are not letters, digits,
control characters, or space characters).
.TP
.B [:space:]
Space characters (such as space, tab, and formfeed, to name a few).
.TP
.B [:upper:]
Upper-case alphabetic characters.
.TP
.B [:xdigit:]
Characters that are hexadecimal digits.
.PP
For example, before the \*(PX standard, to match alphanumeric
characters, you would have had to write
.BR /[A\-Za\-z0\-9]/ .
If your character set had other alphabetic characters in it, this would not
match them, and if your character set collated differently from
\s-1ASCII\s+1, this might not even match the
\s-1ASCII\s+1 alphanumeric characters.
With the \*(PX character classes, you can write
.BR /[[:alnum:]]/ ,
and this matches
the alphabetic and numeric characters in your character set.
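.PP
A small sketch of a character class in use follows.
It assumes a \*(PX shell and an
.I awk
that implements the \*(PX character classes (some older
.I awk
implementations do not); the function name
.B only_alnum
is hypothetical.
.PP
.RS
.nf
```shell
# Keep records that consist only of alphanumeric characters,
# as defined by the current locale.
only_alnum() { awk '/^[[:alnum:]]+$/'; }
{ echo abc123; echo a-b; } | only_alnum
```
.fi
.RE
.PP
Note the doubled brackets: the class name is only valid inside
the brackets of a character list.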
.PP
Two additional special sequences can appear in character lists.
These apply to non-\s-1ASCII\s+1 character sets, which can have single symbols
(called
.IR "collating elements" )
that are represented with more than one
character, as well as several characters that are equivalent for
.IR collating ,
or sorting, purposes. (E.g., in French, a plain \*(lqe\*(rq
and a grave-accented e\` are equivalent.)
.TP
Collating Symbols
A collating symbol is a multi-character collating element enclosed in
.B [.
and
.BR .] .
For example, if
.B ch
is a collating element, then
.B [[.ch.]]
is a regular expression that matches this collating element, while
.B [ch]
is a regular expression that matches either
.B c
or
.BR h .
.TP
Equivalence Classes
An equivalence class is a locale-specific name for a list of
characters that are equivalent. The name is enclosed in
.B [=
and
.BR =] .
For example, the name
.B e
might be used to represent all of
\*(lqe,\*(rq \*(lqe\h'-\w:e:u'\',\*(rq and \*(lqe\h'-\w:e:u'\`.\*(rq
In this case,
.B [[=e=]]
is a regular expression
that matches any of
.BR e ,
.BR "e\h'-\w:e:u'\'" ,
or
.BR "e\h'-\w:e:u'\`" .
.PP
These features are very valuable in non-English-speaking locales.
The library functions that
.I gawk
uses for regular expression matching
currently only recognize \*(PX character classes; they do not recognize
collating symbols or equivalence classes.
.PP
The
.BR \ey ,
.BR \eB ,
.BR \e< ,
.BR \e> ,
.BR \ew ,
.BR \eW ,
.BR \e` ,
and
.B \e'
operators are specific to
.IR gawk ;
they are extensions based on facilities in the \*(GN regular expression libraries.
.PP
The various command line options
control how
.I gawk
interprets characters in regular expressions.
.TP
No options
In the default case,
.I gawk
provides all the facilities of
\*(PX regular expressions and the \*(GN regular expression operators described above.
However, interval expressions are not supported.
.TP
.B \-\^\-posix
Only \*(PX regular expressions are supported; the \*(GN operators are not special.
(E.g.,
.B \ew
matches a literal
.BR w ).
Interval expressions are allowed.
.TP
.B \-\^\-traditional
Traditional Unix
.I awk
regular expressions are matched. The \*(GN operators
are not special, interval expressions are not available, and neither
are the \*(PX character classes
.RB ( [[:alnum:]]
and so on).
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regular expression metacharacters.
.TP
.B \-\^\-re\-interval
Allow interval expressions in regular expressions, even if
.B \-\^\-traditional
has been provided.
.SS Actions
Action statements are enclosed in braces,
.B {
and
.BR } .
Action statements consist of the usual assignment, conditional, and looping
statements found in most languages. The operators, control statements,
and input/output statements
available are patterned after those in C.
.SS Operators
.PP
The operators in \*(AK, in order of decreasing precedence, are
.PP
.TP "\w'\fB*= /= %= ^=\fR'u+1n"
.BR ( \&.\|.\|. )
Grouping
.TP
.B $
Field reference.
.TP
.B "++ \-\^\-"
Increment and decrement, both prefix and postfix.
.TP
.B ^
Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
the assignment operator).
.TP
.B "+ \- !"
Unary plus, unary minus, and logical negation.
.TP
.B "* / %"
Multiplication, division, and modulus.
.TP
.B "+ \-"
Addition and subtraction.
.TP
.I space
String concatenation.
.TP
.PD 0
.B "< >"
.TP
.PD 0
.B "<= >="
.TP
.PD
.B "!= =="
The regular relational operators.
.TP
.B "~ !~"
Regular expression match, negated match.
.B NOTE:
Do not use a constant regular expression
.RB ( /foo/ )
on the left-hand side of a
.B ~
or
.BR !~ .
Only use one on the right-hand side. The expression
.BI "/foo/ ~ " exp
has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
This is usually
.I not
what was intended.
.TP
.B in
Array membership.
.TP
.B &&
Logical AND.
.TP
.B ||
Logical OR.
.TP
.B ?:
The C conditional expression. This has the form
.IB expr1 " ? " expr2 " : " expr3\c
\&.
If
.I expr1
is true, the value of the expression is
.IR expr2 ,
otherwise it is
.IR expr3 .
Only one of
.I expr2
and
.I expr3
is evaluated.
.TP
.PD 0
.B "= += \-="
.TP
.PD
.B "*= /= %= ^="
Assignment. Both absolute assignment
.BI ( var " = " value )
and operator-assignment (the other forms) are supported.
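.PP
The precedence order in the table above has a few consequences that
are easy to overlook; the sketch below shows three of them.
It assumes a \*(PX shell with an
.I awk
on the search path, and the function name
.B precedence_demo
is hypothetical.
.PP
.RS
.nf
```shell
precedence_demo() {
    awk 'BEGIN {
        print 2 + 3 * 4        # * binds tighter than +
        print -2 ^ 2           # ^ binds tighter than unary minus
        print 1 " " 2 + 3      # + binds tighter than concatenation
        x = (1 > 2) ? "yes" : "no"
        print x
    }'
}
precedence_demo
```
.fi
.RE
.PP
The four lines printed are 14, \-4, \*(lq1 5\*(rq, and \*(lqno\*(rq.
When in doubt, parenthesize.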
.SS Control Statements
.PP
The control statements are
as follows:
.PP
.RS
.nf
\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
\fBwhile (\fIcondition\fB) \fIstatement \fR
\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
\fBbreak\fR
\fBcontinue\fR
\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
\fBdelete \fIarray\^\fR
\fBexit\fR [ \fIexpression\fR ]
\fB{ \fIstatements \fB}\fR
.fi
.RE
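.PP
As an illustration of both
.B for
forms and of
.BR delete ,
the sketch below counts distinct words before and after removing one
array element.
It assumes a \*(PX shell with an
.I awk
on the search path; the function name
.B count_words
and the array name
.B seen
are hypothetical.
.PP
.RS
.nf
```shell
count_words() {
    awk '{ for (i = 1; i <= NF; i++) seen[$i]++ }
         END {
             n = 0
             for (w in seen) n++      # distinct words
             delete seen["the"]       # remove one element
             m = 0
             for (w in seen) m++
             print n, m
         }'
}
echo "the cat and the dog" | count_words
```
.fi
.RE
.PP
The example only counts elements because the order in which
.B "for (var in array)"
visits them is not something a program should rely on.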
.SS "I/O Statements"
.PP
The input/output statements are as follows:
.PP
.TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
\fBclose(\fIfile \fR[\fB, \fIhow\fR]\fB)\fR
Close file, pipe or co-process.
The optional
.I how
should only be used when closing one end of a
two-way pipe to a co-process.
It must be a string value, either
\fB"to"\fR or \fB"from"\fR.
.TP
.B getline
Set
.B $0
from next input record; set
.BR NF ,
.BR NR ,
.BR FNR .
.TP
.BI "getline <" file
Set
.B $0
from next record of
.IR file ;
set
.BR NF .
.TP
.BI getline " var"
Set
.I var
from next input record; set
.BR NR ,
.BR FNR .
.TP
.BI getline " var" " <" file
Set
.I var
from next record of
.IR file .
.TP
\fIcommand\fB | getline \fR[\fIvar\fR]
Run
.I command
piping the output either into
.B $0
or
.IR var ,
as above.
.TP
\fIcommand\fB |& getline \fR[\fIvar\fR]
Run
.I command
as a co-process
piping the output either into
.B $0
or
.IR var ,
as above.
Co-processes are a
.I gawk
extension.
.TP
.B next
Stop processing the current input record. The next input record
is read and processing starts over with the first pattern in the
\*(AK program. If the end of the input data is reached, the
.B END
block(s), if any, are executed.
.TP
.B "nextfile"
Stop processing the current input file. The next input record read
comes from the next input file.
.B FILENAME
and
.B ARGIND
are updated,
.B FNR
is reset to 1, and processing starts over with the first pattern in the
\*(AK program. If the end of the input data is reached, the
.B END
block(s), if any, are executed.
.TP
.B print
Prints the current record.
The output record is terminated with the value of the
.B ORS
variable.
.TP
.BI print " expr-list"
Prints expressions.
Each expression is separated by the value of the
.B OFS
variable.
The output record is terminated with the value of the
.B ORS
variable.
.TP
.BI print " expr-list" " >" file
Prints expressions on
.IR file .
Each expression is separated by the value of the
.B OFS
variable. The output record is terminated with the value of the
.B ORS
variable.
.TP
.BI printf " fmt, expr-list"
Format and print.
.TP
.BI printf " fmt, expr-list" " >" file
Format and print on
.IR file .
.TP
.BI system( cmd-line )
Execute the command
.IR cmd-line ,
and return the exit status.
(This may not be available on non-\*(PX systems.)
.TP
\&\fBfflush(\fR[\fIfile\^\fR]\fB)\fR
Flush any buffers associated with the open output file or pipe
.IR file .
If
.I file
is missing, then standard output is flushed.
If
.I file
is the null string,
then all open output files and pipes
have their buffers flushed.
.PP
Additional output redirections are allowed for
.B print
and
.BR printf .
.TP
.BI "print .\|.\|. >>" " file"
appends output to the
.IR file .
.TP
.BI "print .\|.\|. |" " command"
writes on a pipe.
.TP
.BI "print .\|.\|. |&" " command"
sends data to a co-process.
.PP
The
.BR getline
command returns 1 on success, 0 on end of file, and \-1 on an error.
Upon an error,
.B ERRNO
contains a string describing the problem.
.PP
.B NOTE:
If using a pipe or co-process to
.BR getline ,
or from
.B print
or
.B printf
within a loop, you
.I must
use
.B close()
to create new instances of the command.
\*(AK does not automatically close pipes or co-processes when
they return EOF.
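.PP
The need for
.B close()
inside a loop can be seen in the following sketch, which reads the
output of the same command twice.
It assumes a \*(PX shell with an
.I awk
on the search path; the function name
.B run_twice
and the command string are made up for illustration.
.PP
.RS
.nf
```shell
run_twice() {
    awk 'BEGIN {
        cmd = "echo hello"
        for (i = 1; i <= 2; i++) {
            # getline returns a positive value while records remain
            while ((cmd | getline line) > 0)
                print i ": " line
            close(cmd)    # without this, the second pass reads nothing
        }
    }'
}
run_twice
```
.fi
.RE
.PP
Comparing the return value of
.B getline
with 0 also keeps the loop from spinning forever if an error (\-1) occurs.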
.SS The \fIprintf\fP\^ Statement
.PP
The \*(AK versions of the
.B printf
statement and
.B sprintf()
function
(see below)
accept the following conversion specification formats:
.TP "\w'\fB%g\fR, \fB%G\fR'u+2n"
.B %c
An \s-1ASCII\s+1 character.
If the argument used for
.B %c
is numeric, it is treated as a character and printed.
Otherwise, the argument is assumed to be a string, and only the first
character of that string is printed.
.TP
.BR "%d" "," " %i"
A decimal number (the integer part).
.TP
.BR %e ", " %E
A floating point number of the form
.BR [\-]d.dddddde[+\^\-]dd .
The
.B %E
format uses
.B E
instead of
.BR e .
.TP
.B %f
A floating point number of the form
.BR [\-]ddd.dddddd .
.TP
.BR %g ", " %G
Use
.B %e
or
.B %f
conversion, whichever is shorter, with nonsignificant zeros suppressed.
The
.B %G
format uses
.B %E
instead of
.BR %e .
.TP
.B %o
An unsigned octal number (also an integer).
.TP
.PD
.B %u
An unsigned decimal number (again, an integer).
.TP
.B %s
A character string.
.TP
.BR %x ", " %X
An unsigned hexadecimal number (an integer).
The
.B %X
format uses
.B ABCDEF
instead of
.BR abcdef .
.TP
.B %%
A single
.B %
character; no argument is converted.
.PP
.BR NOTE :
When using the integer format-control letters for values that are
outside the range of a C
.B long
integer,
.I gawk
switches to the
.B %g
format specifier. If
.B \-\^\-lint
is provided on the command line,
.I gawk
warns about this. Other versions of
.I awk
may print invalid values or do something else entirely.
.PP
Optional, additional parameters may lie between the
.B %
and the control letter:
.TP
.IB count $
Use the
.IR count "'th"
argument at this point in the formatting.
This is called a
.I "positional specifier"
and
is intended primarily for use in translated versions of
format strings, not in the original text of an \*(AK program.
It is a
.I gawk
extension.
.TP
.B \-
The expression should be left-justified within its field.
.TP
.I space
For numeric conversions, prefix positive values with a space, and
negative values with a minus sign.
.TP
.B +
The plus sign, used before the width modifier (see below),
says to always supply a sign for numeric conversions, even if the data
to be formatted is positive. The
.B +
overrides the space modifier.
.TP
.B #
Use an \*(lqalternate form\*(rq for certain control letters.
For
.BR %o ,
supply a leading zero.
For
.BR %x ,
and
.BR %X ,
supply a leading
.BR 0x
or
.BR 0X
for
a nonzero result.
For
.BR %e ,
.BR %E ,
and
.BR %f ,
the result always contains a
decimal point.
For
.BR %g ,
and
.BR %G ,
trailing zeros are not removed from the result.
.TP
.B 0
A leading
.B 0
(zero) acts as a flag that indicates output should be
padded with zeroes instead of spaces.
This applies even to non-numeric output formats.
This flag only has an effect when the field width is wider than the
value to be printed.
.TP
.I width
The field should be padded to this width. The field is normally padded
with spaces. If the
.B 0
flag has been used, it is padded with zeroes.
.TP
.BI \&. prec
A number that specifies the precision to use when printing.
For the
.BR %e ,
.BR %E ,
and
.BR %f
formats, this specifies the
number of digits you want printed to the right of the decimal point.
For the
.BR %g ,
and
.B %G
formats, it specifies the maximum number
of significant digits. For the
.BR %d ,
.BR %o ,
.BR %i ,
.BR %u ,
.BR %x ,
and
.B %X
formats, it specifies the minimum number of
digits to print. For
.BR %s ,
it specifies the maximum number of
characters from the string that should be printed.
.PP
The dynamic
.I width
and
.I prec
capabilities of the \*(AN C
.B printf()
routines are supported.
A
.B *
in place of either the
.B width
or
.B prec
specifications causes their values to be taken from
the argument list to
.B printf
or
.BR sprintf() .
To use a positional specifier with a dynamic width or precision,
supply the
.IB count $
after the
.B *
in the format string.
For example, \fB"%3$*2$.*1$s"\fP.
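.PP
The more common modifiers can be tried out with the sketch below.
It assumes a \*(PX shell with an
.I awk
on the search path that supports the dynamic
.B *
width; the function name
.B format_demo
is hypothetical, and
.B sprintf()
is used with
.B print
only to keep the example free of escape sequences.
.PP
.RS
.nf
```shell
format_demo() {
    awk 'BEGIN {
        # width, left-justify, zero-pad, forced sign
        print sprintf("[%5d] [%-5d] [%05d] [%+d]", 42, 42, 42, 42)
        # precision for %f and %s, then a width taken from the arguments
        print sprintf("[%.2f] [%.3s] [%*d]", 3.14159, "abcdef", 4, 7)
    }'
}
format_demo
```
.fi
.RE
.PP
The brackets make the padding visible: each bracketed field in the
first line is five characters wide except the last, which only adds a sign.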
.SS Special File Names
.PP
When doing I/O redirection from either
.B print
or
.B printf
into a file,
or via
.B getline
from a file,
.I gawk
recognizes certain special filenames internally. These filenames
allow access to open file descriptors inherited from
.IR gawk\^ "'s"
parent process (usually the shell).
These file names may also be used on the command line to name data files.
The filenames are:
.TP "\w'\fB/dev/stdout\fR'u+1n"
.B /dev/stdin
The standard input.
.TP
.B /dev/stdout
The standard output.
.TP
.B /dev/stderr
The standard error output.
.TP
.BI /dev/fd/\^ n
The file associated with the open file descriptor
.IR n .
.PP
These are particularly useful for error messages. For example:
.PP
.RS
.ft B
print "You blew it!" > "/dev/stderr"
.ft R
.RE
.PP
whereas you would otherwise have to use
.PP
.RS
.ft B
print "You blew it!" | "cat 1>&2"
.ft R
.RE
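.PP
A slightly fuller sketch of the same idea separates normal output from
diagnostics.
It assumes a \*(PX shell and an
.I awk
that recognizes
.B /dev/stderr
(or a system that provides that file); the function name
.B split_streams
is hypothetical.
.PP
.RS
.nf
```shell
# Non-negative values go to standard output;
# complaints go to standard error.
split_streams() {
    awk '{
        if ($1 < 0)
            print "negative value: " $1 > "/dev/stderr"
        else
            print $1
    }'
}
{ echo 1; echo -2; } | split_streams
```
.fi
.RE
.PP
Redirecting standard error in the calling shell then captures only the
diagnostic messages.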
.PP
The following special filenames may be used with the
.B |&
co-process operator for creating TCP/IP network connections.
.TP "\w'\fB/inet/tcp/\fIlport\fB/\fIrhost\fB/\fIrport\fR'u+2n"
.BI /inet/tcp/ lport / rhost / rport
File for TCP/IP connection on local port
.I lport
to
remote host
.I rhost
on remote port
.IR rport .
Use a port of
.B 0
to have the system pick a port.
.TP
.BI /inet/udp/ lport / rhost / rport
Similar, but use UDP/IP instead of TCP/IP.
.TP
.BI /inet/raw/ lport / rhost / rport
.\" Similar, but use raw IP sockets.
Reserved for future use.
.PP
Other special filenames provide access to information about the running
.I gawk
process.
.B "These filenames are now obsolete."
Use the
.B PROCINFO
array to obtain the information they provide.
The filenames are:
.TP "\w'\fB/dev/stdout\fR'u+1n"
.B /dev/pid
Reading this file returns the process ID of the current process,
in decimal, terminated with a newline.
.TP
.B /dev/ppid
Reading this file returns the parent process ID of the current process,
in decimal, terminated with a newline.
.TP
.B /dev/pgrpid
Reading this file returns the process group ID of the current process,
in decimal, terminated with a newline.
.TP
.B /dev/user
Reading this file returns a single record terminated with a newline.
The fields are separated with spaces.
.B $1
is the value of the
.IR getuid (2)
system call,
.B $2
is the value of the
.IR geteuid (2)
system call,
.B $3
is the value of the
.IR getgid (2)
system call, and
.B $4
is the value of the
.IR getegid (2)
system call.
If there are any additional fields, they are the group IDs returned by
.IR getgroups (2).
Multiple groups may not be supported on all systems.
.SS Numeric Functions
.PP
\*(AK has the following built-in arithmetic functions:
.PP
.TP "\w'\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR'u+1n"
.BI atan2( y , " x" )
Returns the arctangent of
.I y/x
in radians.
.TP
.BI cos( expr )
Returns the cosine of
.IR expr ,
which is in radians.
.TP
.BI exp( expr )
The exponential function.
.TP
.BI int( expr )
Truncates to integer.
.TP
.BI log( expr )
The natural logarithm function.
.TP
.B rand()
Returns a random number
.IR N ,
between 0 and 1,
such that 0 \(<= \fIN\fP < 1.
.TP
.BI sin( expr )
Returns the sine of
.IR expr ,
which is in radians.
.TP
.BI sqrt( expr )
The square root function.
.TP
\&\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR
Uses
.I expr
as a new seed for the random number generator. If no
.I expr
is provided, the time of day is used.
The return value is the previous seed for the random
number generator.
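.PP
A few of these functions are exercised in the sketch below.
It assumes a \*(PX shell with an
.I awk
on the search path; the function name
.B numeric_demo
is hypothetical.
.PP
.RS
.nf
```shell
numeric_demo() {
    awk 'BEGIN {
        # int() truncates toward zero, for negative values too
        print int(3.9), int(-3.9), sqrt(16), exp(0), log(1)
        # atan2(0, -1) is pi; show four decimal places
        print sprintf("%.4f", atan2(0, -1))
        srand(1)              # fixed seed for repeatable runs
        r = rand()
        ok = (r >= 0 && r < 1)
        print ok
    }'
}
numeric_demo
```
.fi
.RE
.PP
The value of
.B r
itself is not shown because the sequence produced for a given seed
differs between
.I awk
implementations.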
.SS String Functions
.PP
.I Gawk
has the following built-in string functions:
.PP
.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
\fBasort(\fIs \fR[\fB, \fId\fR]\fB)\fR
Returns the number of elements in the source
array
.IR s .
The contents of
.I s
are sorted using
.IR gawk\^ "'s"
normal rules for
comparing values, and the indexes of the
sorted values of
.I s
are replaced with sequential
integers starting with 1. If the optional
destination array
.I d
is specified, then
.I s
is first duplicated into
.IR d ,
and then
.I d
is sorted, leaving the indexes of the
source array
.I s
unchanged.
.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
\fBasorti(\fIs \fR[\fB, \fId\fR]\fB)\fR
Returns the number of elements in the source
array
.IR s .
The behavior is the same as that of
.BR asort() ,
except that the array
.I indices
are used for sorting, not the array values.
When done, the array is indexed numerically, and
the values are those of the original indices.
The original values are lost; thus provide
a second array if you wish to preserve the original.
.TP
\fBgensub(\fIr\fB, \fIs\fB, \fIh \fR[\fB, \fIt\fR]\fB)\fR
Search the target string
.I t
for matches of the regular expression
.IR r .
If
.I h
is a string beginning with
.B g
or
.BR G ,
then replace all matches of
.I r
with
.IR s .
Otherwise,
.I h
is a number indicating which match of
.I r
to replace.
If
.I t
is not supplied,
.B $0
is used instead.
Within the replacement text
.IR s ,
the sequence
.BI \e n\fR,
where
.I n
is a digit from 1 to 9, may be used to indicate just the text that
matched the
.IR n 'th
parenthesized subexpression. The sequence
.B \e0
represents the entire matched text, as does the character
.BR & .
Unlike
.B sub()
and
.BR gsub() ,
the modified string is returned as the result of the function,
and the original target string is
.I not
changed.
.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
\fBgsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
For each substring matching the regular expression
.I r
in the string
.IR t ,
substitute the string
.IR s ,
and return the number of substitutions.
If
.I t
is not supplied, use
.BR $0 .
An
.B &
in the replacement text is replaced with the text that was actually matched.
Use
.B \e&
to get a literal
.BR & .
(This must be typed as \fB"\e\e&"\fP;
see \*(EP
for a fuller discussion of the rules for
.BR &'s
and backslashes in the replacement text of
.BR sub() ,
.BR gsub() ,
and
.BR gensub() .)
.TP
.BI index( s , " t" )
Returns the index of the string
.I t
in the string
.IR s ,
or 0 if
.I t
is not present.
(This implies that character indices start at one.)
.TP
\fBlength(\fR[\fIs\fR]\fB)\fR
Returns the length of the string
.IR s ,
or the length of
.B $0
if
.I s
is not supplied.
Starting with version 3.1.5,
as a non-standard extension, with an array argument,
.B length()
returns the number of elements in the array.
.TP
\fBmatch(\fIs\fB, \fIr \fR[\fB, \fIa\fR]\fB)\fR
Returns the position in
.I s
where the regular expression
.I r
occurs, or 0 if
.I r
is not present, and sets the values of
.B RSTART
and
.BR RLENGTH .
Note that the argument order is the same as for the
.B ~
operator:
.IB str " ~"
.IR re .
.ft R
If array
.I a
is provided,
.I a
is cleared and then elements 1 through
.I n
are filled with the portions of
.I s
that match the corresponding parenthesized
subexpression in
.IR r .
The 0'th element of
.I a
contains the portion
of
.I s
matched by the entire regular expression
.IR r .
Subscripts
\fBa[\fIn\^\fB, "start"]\fR,
and
\fBa[\fIn\^\fB, "length"]\fR
provide the starting index in the string and length
respectively, of each matching substring.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIr\fR]\fB)\fR
Splits the string
.I s
into the array
.I a
on the regular expression
.IR r ,
and returns the number of fields. If
.I r
is omitted,
.B FS
is used instead.
The array
.I a
is cleared first.
Splitting behaves identically to field splitting, described above.
.TP
.BI sprintf( fmt , " expr-list" )
Prints
.I expr-list
according to
.IR fmt ,
and returns the resulting string.
.TP
.BI strtonum( str )
Examines
.IR str ,
and returns its numeric value.
If
.I str
begins
with a leading
.BR 0 ,
.B strtonum()
assumes that
.I str
is an octal number.
If
.I str
begins
with a leading
.B 0x
or
.BR 0X ,
.B strtonum()
assumes that
.I str
is a hexadecimal number.
.TP
\fBsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
Just like
.BR gsub() ,
but only the first matching substring is replaced.
.TP
\fBsubstr(\fIs\fB, \fIi \fR[\fB, \fIn\fR]\fB)\fR
Returns the at most
.IR n -character
substring of
.I s
starting at
.IR i .
If
.I n
is omitted, the rest of
.I s
is used.
.TP
.BI tolower( str )
Returns a copy of the string
.IR str ,
with all the upper-case characters in
.I str
translated to their corresponding lower-case counterparts.
Non-alphabetic characters are left unchanged.
.TP
.BI toupper( str )
Returns a copy of the string
.IR str ,
with all the lower-case characters in
.I str
translated to their corresponding upper-case counterparts.
Non-alphabetic characters are left unchanged.
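.PP
The sketch below exercises the string functions that are not
.I gawk
extensions, so it should behave the same in other
.I awk
implementations.
It assumes a \*(PX shell with an
.I awk
on the search path; the function name
.B string_demo
and the variable names are hypothetical.
.PP
.RS
.nf
```shell
string_demo() {
    awk 'BEGIN {
        s = "hello world"
        # character indices start at one
        print index(s, "wor"), length(s), substr(s, 7, 3)
        n = split("a:b:c", parts, ":")
        print n, parts[3]
        t = s
        k = gsub(/o/, "0", t)      # modifies t, returns the count
        print k, t
        print toupper(substr(s, 1, 1)) substr(s, 2)
        if (match(s, /w[a-z]+/))
            print RSTART, RLENGTH
    }'
}
string_demo
```
.fi
.RE
.PP
Copying
.B s
into
.B t
before the call to
.B gsub()
keeps the original string intact, since
.B sub()
and
.B gsub()
change their target in place.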
2533.SS Time Functions
2534Since one of the primary uses of \*(AK programs is processing log files
2535that contain time stamp information,
2536.I gawk
2537provides the following functions for obtaining time stamps and
2538formatting them.
2539.PP
2540.TP "\w'\fBsystime()\fR'u+1n"
2541\fBmktime(\fIdatespec\fB)\fR
2542Turns
2543.I datespec
2544into a time stamp of the same form as returned by
2545.BR systime() .
2546The
2547.I datespec
2548is a string of the form
2549.IR "YYYY MM DD HH MM SS[ DST]" .
2550The contents of the string are six or seven numbers representing respectively
2551the full year including century,
2552the month from 1 to 12,
2553the day of the month from 1 to 31,
2554the hour of the day from 0 to 23,
2555the minute from 0 to 59,
2556and the second from 0 to 60,
2557and an optional daylight saving flag.
2558The values of these numbers need not be within the ranges specified;
2559for example, an hour of \-1 means 1 hour before midnight.
2560The origin-zero Gregorian calendar is assumed,
2561with year 0 preceding year 1 and year \-1 preceding year 0.
2562The time is assumed to be in the local timezone.
2563If the daylight saving flag is positive,
2564the time is assumed to be daylight saving time;
2565if zero, the time is assumed to be standard time;
2566and if negative (the default),
2567.B mktime()
2568attempts to determine whether daylight saving time is in effect
2569for the specified time.
2570If
2571.I datespec
2572does not contain enough elements or if the resulting time
2573is out of range,
2574.B mktime()
2575returns \-1.
2576.TP
2577\fBstrftime(\fR[\fIformat \fR[\fB, \fItimestamp\fR]]\fB)\fR
2578Formats
2579.I timestamp
2580according to the specification in
.IR format .
2582The
2583.I timestamp
2584should be of the same form as returned by
2585.BR systime() .
2586If
2587.I timestamp
2588is missing, the current time of day is used.
2589If
2590.I format
2591is missing, a default format equivalent to the output of
2592.IR date (1)
2593is used.
2594See the specification for the
2595.B strftime()
2596function in \*(AN C for the format conversions that are
2597guaranteed to be available.
2598A public-domain version of
2599.IR strftime (3)
2600and a man page for it come with
2601.IR gawk ;
2602if that version was used to build
2603.IR gawk ,
2604then all of the conversions described in that man page are available to
.IR gawk .
2606.TP
2607.B systime()
2608Returns the current time of day as the number of seconds since the Epoch
2609(1970-01-01 00:00:00 UTC on \*(PX systems).
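.PP
The time functions are typically used together.
The following is a minimal sketch, assuming only the \*(AN C
conversions
.BR %Y ,
.BR %m ,
.BR %d ,
.BR %H ,
.BR %M ,
.BR %S ,
and
.BR %A ;
the dates are arbitrary, and the output depends on the local
timezone and locale.
.PP
.RS
.ft B
.nf
BEGIN {
    now = systime()
    print strftime("%Y\-%m\-%d %H:%M:%S", now)

    # 1 January 2005, 00:00:00 local time; the DST flag is omitted
    t = mktime("2005 01 01 00 00 00")
    print strftime("%Y\-%m\-%d is a %A", t)

    # An hour of \-1 is normalized to one hour before midnight
    print strftime("%Y\-%m\-%d %H:%M", mktime("2005 01 01 \-1 00 00"))
}
.fi
.ft R
.RE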
.SS Bit Manipulation Functions
2611Starting with version 3.1 of
2612.IR gawk ,
2613the following bit manipulation functions are available.
2614They work by converting double-precision floating point
2615values to
2616.B "unsigned long"
2617integers, doing the operation, and then converting the
2618result back to floating point.
2619The functions are:
2620.TP "\w'\fBrshift(\fIval\fB, \fIcount\fB)\fR'u+2n"
2621\fBand(\fIv1\fB, \fIv2\fB)\fR
2622Return the bitwise AND of the values provided by
2623.I v1
2624and
2625.IR v2 .
2626.TP
2627\fBcompl(\fIval\fB)\fR
2628Return the bitwise complement of
2629.IR val .
2630.TP
2631\fBlshift(\fIval\fB, \fIcount\fB)\fR
2632Return the value of
2633.IR val ,
2634shifted left by
2635.I count
2636bits.
2637.TP
2638\fBor(\fIv1\fB, \fIv2\fB)\fR
2639Return the bitwise OR of the values provided by
2640.I v1
2641and
2642.IR v2 .
2643.TP
2644\fBrshift(\fIval\fB, \fIcount\fB)\fR
2645Return the value of
2646.IR val ,
2647shifted right by
2648.I count
2649bits.
2650.TP
2651\fBxor(\fIv1\fB, \fIv2\fB)\fR
2652Return the bitwise XOR of the values provided by
2653.I v1
2654and
2655.IR v2 .
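.PP
For example, with the small integers 12 (binary 1100) and 10 (binary 1010),
chosen here only for illustration:
.PP
.RS
.ft B
.nf
BEGIN {
    print and(12, 10)       # 1100 AND 1010 = 1000, prints 8
    print or(12, 10)        # 1100 OR  1010 = 1110, prints 14
    print xor(12, 10)       # 1100 XOR 1010 = 0110, prints 6
    print lshift(1, 4)      # prints 16
    print rshift(256, 2)    # prints 64
}
.fi
.ft R
.RE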
2656.PP
2657.SS Internationalization Functions
2658Starting with version 3.1 of
2659.IR gawk ,
2660the following functions may be used from within your AWK program for
2661translating strings at run-time.
2662For full details, see \*(EP.
2663.TP
2664\fBbindtextdomain(\fIdirectory \fR[\fB, \fIdomain\fR]\fB)\fR
2665Specifies the directory where
2666.I gawk
2667looks for the
2668.B \&.mo
2669files, in case they
2670will not or cannot be placed in the ``standard'' locations
2671(e.g., during testing).
2672It returns the directory where
2673.I domain
2674is ``bound.''
2675.sp .5
2676The default
2677.I domain
2678is the value of
2679.BR TEXTDOMAIN .
2680If
2681.I directory
2682is the null string (\fB""\fR), then
2683.B bindtextdomain()
2684returns the current binding for the
2685given
2686.IR domain .
2687.TP
2688\fBdcgettext(\fIstring \fR[\fB, \fIdomain \fR[\fB, \fIcategory\fR]]\fB)\fR
2689Returns the translation of
2690.I string
2691in
2692text domain
2693.I domain
2694for locale category
2695.IR category .
2696The default value for
2697.I domain
2698is the current value of
2699.BR TEXTDOMAIN .
2700The default value for
2701.I category
2702is \fB"LC_MESSAGES"\fR.
2703.sp .5
2704If you supply a value for
2705.IR category ,
2706it must be a string equal to
2707one of the known locale categories described
2708in \*(EP.
2709You must also supply a text domain. Use
2710.B TEXTDOMAIN
2711if you want to use the current domain.
2712.TP
\fBdcngettext(\fIstring1\fB, \fIstring2\fB, \fInumber \fR[\fB, \fIdomain \fR[\fB, \fIcategory\fR]]\fB)\fR
2714Returns the plural form used for
2715.I number
2716of the translation of
2717.I string1
2718and
2719.I string2
2720in
2721text domain
2722.I domain
2723for locale category
2724.IR category .
2725The default value for
2726.I domain
2727is the current value of
2728.BR TEXTDOMAIN .
2729The default value for
2730.I category
2731is \fB"LC_MESSAGES"\fR.
2732.sp .5
2733If you supply a value for
2734.IR category ,
2735it must be a string equal to
2736one of the known locale categories described
2737in \*(EP.
2738You must also supply a text domain. Use
2739.B TEXTDOMAIN
2740if you want to use the current domain.
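.PP
A minimal sketch of how these functions might be combined follows.
The text domain
.B myprog
and the directory
.B ./locale
are hypothetical.
If no message catalog is found, the strings should simply come back
untranslated, with
.B dcngettext()
choosing between the singular and plural arguments.
.PP
.RS
.ft B
.nf
BEGIN {
    TEXTDOMAIN = "myprog"           # hypothetical text domain
    bindtextdomain("./locale")      # hypothetical catalog directory
    n = 3
    # Plural\-aware message: singular form, plural form, count
    printf dcngettext("%d file\en", "%d files\en", n), n
    # Explicit domain and category, as required when a category is given
    print dcgettext("hello, world", TEXTDOMAIN, "LC_MESSAGES")
}
.fi
.ft R
.RE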
2741.SH USER-DEFINED FUNCTIONS
2742Functions in \*(AK are defined as follows:
2743.PP
2744.RS
2745\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
2746.RE
2747.PP
2748Functions are executed when they are called from within expressions
2749in either patterns or actions. Actual parameters supplied in the function
2750call are used to instantiate the formal parameters declared in the function.
Arrays are passed by reference; other variables are passed by value.
2752.PP
2753Since functions were not originally part of the \*(AK language, the provision
2754for local variables is rather clumsy: They are declared as extra parameters
2755in the parameter list. The convention is to separate local variables from
2756real parameters by extra spaces in the parameter list. For example:
2757.PP
2758.RS
2759.ft B
2760.nf
function f(p, q,     a, b)   # a and b are local
2762{
2763 \&.\|.\|.
2764}
2765
2766/abc/ { .\|.\|. ; f(1, 2) ; .\|.\|. }
2767.fi
2768.ft R
2769.RE
2770.PP
2771The left parenthesis in a function call is required
2772to immediately follow the function name,
2773without any intervening white space.
2774This is to avoid a syntactic ambiguity with the concatenation operator.
2775This restriction does not apply to the built-in functions listed above.
2776.PP
2777Functions may call each other and may be recursive.
2778Function parameters used as local variables are initialized
2779to the null string and the number zero upon function invocation.
2780.PP
2781Use
2782.BI return " expr"
2783to return a value from a function. The return value is undefined if no
2784value is provided, or if the function returns by \*(lqfalling off\*(rq the
2785end.
2786.PP
2787If
2788.B \-\^\-lint
2789has been provided,
2790.I gawk
2791warns about calls to undefined functions at parse time,
2792instead of at run time.
2793Calling an undefined function at run time is a fatal error.
2794.PP
2795The word
2796.B func
2797may be used in place of
2798.BR function .
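.PP
The rules above can be seen together in the following sketch.
The function and variable names are made up for the illustration.
.PP
.RS
.ft B
.nf
# fill() changes the caller's array; i is a local variable
function fill(arr, n,     i)
{
    for (i = 1; i <= n; i++)
        arr[i] = i * i
    n = 0                   # changes only the local copy of n
}

# Functions may be recursive
function fact(k)
{
    return (k <= 1) ? 1 : k * fact(k \- 1)
}

BEGIN {
    count = 3
    fill(squares, count)
    print squares[3], count     # prints 9 3
    print fact(5)               # prints 120
}
.fi
.ft R
.RE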
2799.SH DYNAMICALLY LOADING NEW FUNCTIONS
2800Beginning with version 3.1 of
2801.IR gawk ,
2802you can dynamically add new built-in functions to the running
2803.I gawk
2804interpreter.
2805The full details are beyond the scope of this manual page;
2806see \*(EP for the details.
2807.PP
2808.TP 8
2809\fBextension(\fIobject\fB, \fIfunction\fB)\fR
2810Dynamically link the shared object file named by
2811.IR object ,
2812and invoke
2813.I function
2814in that object, to perform initialization.
2815These should both be provided as strings.
2816Returns the value returned by
2817.IR function .
2818.PP
2819.ft B
2820This function is provided and documented in \*(EP,
2821but everything about this feature is likely to change
2822in the next release.
2823We STRONGLY recommend that you do not use this feature
2824for anything that you aren't willing to redo.
2825.ft R
2826.SH SIGNALS
2827.I pgawk
2828accepts two signals.
2829.B SIGUSR1
2830causes it to dump a profile and function call stack to the
2831profile file, which is either
2832.BR awkprof.out ,
2833or whatever file was named with the
2834.B \-\^\-profile
2835option. It then continues to run.
2836.B SIGHUP
2837causes it to dump the profile and function call stack and then exit.
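.PP
From a shell, the signals might be sent as in the following sketch.
The program and data file names are hypothetical, and
.B $!
is the shell's process ID of the most recent background job.
The two
.B kill
commands are alternatives.
.PP
.RS
.ft B
.nf
pgawk \-f myprog.awk big.log &
kill \-USR1 $!    # dump the profile so far; pgawk keeps running
kill \-HUP $!     # dump the profile and make pgawk exit
.fi
.ft R
.RE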
2838.SH EXAMPLES
2839.nf
2840Print and sort the login names of all users:
2841
2842.ft B
2843 BEGIN { FS = ":" }
2844 { print $1 | "sort" }
2845
2846.ft R
2847Count lines in a file:
2848
2849.ft B
2850 { nlines++ }
2851 END { print nlines }
2852
2853.ft R
2854Precede each line by its number in the file:
2855
2856.ft B
2857 { print FNR, $0 }
2858
2859.ft R
2860Concatenate and line number (a variation on a theme):
2861
2862.ft B
    { print NR, $0 }

.ft R
2865Run an external command for particular lines of data:
2866
2867.ft B
2868 tail -f access_log |
2869 awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'
2870.ft R
2871.fi
2872.SH INTERNATIONALIZATION
2873.PP
2874String constants are sequences of characters enclosed in double
2875quotes. In non-English speaking environments, it is possible to mark
2876strings in the \*(AK program as requiring translation to the native
2877natural language. Such strings are marked in the \*(AK program with
2878a leading underscore (\*(lq_\*(rq). For example,
2879.sp
2880.RS
2881.ft B
2882gawk 'BEGIN { print "hello, world" }'
2883.RE
2884.sp
2885.ft R
2886always prints
2887.BR "hello, world" .
2888But,
2889.sp
2890.RS
2891.ft B
2892gawk 'BEGIN { print _"hello, world" }'
2893.RE
2894.sp
2895.ft R
2896might print
2897.B "bonjour, monde"
2898in France.
2899.PP
2900There are several steps involved in producing and running a localizable
2901\*(AK program.
2902.TP "\w'4.'u+2n"
29031.
2904Add a
2905.B BEGIN
2906action to assign a value to the
2907.B TEXTDOMAIN
2908variable to set the text domain to a name associated with your program.
2909.sp
2910.ti +5n
2911.ft B
2912BEGIN { TEXTDOMAIN = "myprog" }
2913.ft R
2914.sp
2915This allows
2916.I gawk
2917to find the
2918.B \&.mo
2919file associated with your program.
2920Without this step,
2921.I gawk
2922uses the
2923.B messages
2924text domain,
2925which likely does not contain translations for your program.
2926.TP
29272.
2928Mark all strings that should be translated with leading underscores.
2929.TP
29303.
2931If necessary, use the
2932.B dcgettext()
2933and/or
2934.B bindtextdomain()
2935functions in your program, as appropriate.
2936.TP
29374.
2938Run
2939.B "gawk \-\^\-gen\-po \-f myprog.awk > myprog.po"
2940to generate a
2941.B \&.po
2942file for your program.
2943.TP
29445.
2945Provide appropriate translations, and build and install a corresponding
2946.B \&.mo
2947file.
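.PP
Steps 4 and 5 might look like the following at a shell prompt.
This is only a sketch:
.B msgfmt
is the catalog compiler from \*(GN
.I gettext
(an assumption, since this page does not name a tool),
.B myprog
is the example text domain from step 1,
and where the resulting
.B \&.mo
file must be installed depends on the system and the locale.
.PP
.RS
.ft B
.nf
gawk \-\^\-gen\-po \-f myprog.awk > myprog.po
# ... edit myprog.po and fill in the translations ...
msgfmt \-o myprog.mo myprog.po
.fi
.ft R
.RE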
2948.PP
2949The internationalization features are described in full detail in \*(EP.
2950.SH POSIX COMPATIBILITY
2951A primary goal for
2952.I gawk
2953is compatibility with the \*(PX standard, as well as with the
2954latest version of \*(UX
2955.IR awk .
2956To this end,
2957.I gawk
2958incorporates the following user visible
2959features which are not described in the \*(AK book,
2960but are part of the Bell Laboratories version of
2961.IR awk ,
2962and are in the \*(PX standard.
2963.PP
2964The book indicates that command line variable assignment happens when
2965.I awk
2966would otherwise open the argument as a file, which is after the
2967.B BEGIN
2968block is executed. However, in earlier implementations, when such an
2969assignment appeared before any file names, the assignment would happen
2970.I before
2971the
2972.B BEGIN
2973block was run. Applications came to depend on this \*(lqfeature.\*(rq
2974When
2975.I awk
2976was changed to match its documentation, the
2977.B \-v
2978option for assigning variables before program execution was added to
2979accommodate applications that depended upon the old behavior.
2980(This feature was agreed upon by both the Bell Laboratories and the \*(GN developers.)
2981.PP
2982The
2983.B \-W
2984option for implementation specific features is from the \*(PX standard.
2985.PP
2986When processing arguments,
2987.I gawk
2988uses the special option \*(lq\-\^\-\*(rq to signal the end of
2989arguments.
2990In compatibility mode, it warns about but otherwise ignores
2991undefined options.
2992In normal operation, such arguments are passed on to the \*(AK program for
2993it to process.
2994.PP
2995The \*(AK book does not define the return value of
2996.BR srand() .
2997The \*(PX standard
2998has it return the seed it was using, to allow keeping track
2999of random number sequences. Therefore
3000.B srand()
3001in
3002.I gawk
3003also returns its current seed.
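.PP
For example, in the following sketch (the seed values are arbitrary),
the second call returns the seed set by the first:
.PP
.RS
.ft B
.nf
BEGIN {
    srand(42)           # set the seed; the return value is ignored here
    prev = srand(7)     # returns the seed that was in use before this call
    print prev          # prints 42
}
.fi
.ft R
.RE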
3004.PP
3005Other new features are:
3006The use of multiple
3007.B \-f
3008options (from MKS
3009.IR awk );
3010the
3011.B ENVIRON
3012array; the
.B \ea
and
.B \ev
3016escape sequences (done originally in
3017.I gawk
3018and fed back into the Bell Laboratories version); the
3019.B tolower()
3020and
3021.B toupper()
3022built-in functions (from the Bell Laboratories version); and the \*(AN C conversion specifications in
3023.B printf
3024(done first in the Bell Laboratories version).
3025.SH HISTORICAL FEATURES
3026There are two features of historical \*(AK implementations that
3027.I gawk
3028supports.
3029First, it is possible to call the
3030.B length()
3031built-in function not only with no argument, but even without parentheses!
3032Thus,
3033.RS
3034.PP
3035.ft B
3036a = length # Holy Algol 60, Batman!
3037.ft R
3038.RE
3039.PP
3040is the same as either of
3041.RS
3042.PP
3043.ft B
3044a = length()
3045.br
3046a = length($0)
3047.ft R
3048.RE
3049.PP
3050This feature is marked as \*(lqdeprecated\*(rq in the \*(PX standard, and
3051.I gawk
3052issues a warning about its use if
3053.B \-\^\-lint
3054is specified on the command line.
3055.PP
3056The other feature is the use of either the
3057.B continue
3058or the
3059.B break
3060statements outside the body of a
3061.BR while ,
3062.BR for ,
3063or
3064.B do
3065loop. Traditional \*(AK implementations have treated such usage as
3066equivalent to the
3067.B next
3068statement.
3069.I Gawk
3070supports this usage if
3071.B \-\^\-traditional
3072has been specified.
3073.SH GNU EXTENSIONS
3074.I Gawk
3075has a number of extensions to \*(PX
3076.IR awk .
3077They are described in this section. All the extensions described here
3078can be disabled by
3079invoking
3080.I gawk
3081with the
3082.B \-\^\-traditional
3083option.
3084.PP
3085The following features of
3086.I gawk
3087are not available in
3088\*(PX
3089.IR awk .
3090.\" Environment vars and startup stuff
3091.TP "\w'\(bu'u+1n"
3092\(bu
The path search for files named via the
.B \-f
option, controlled by the
.B AWKPATH
environment variable.
3098.\" POSIX and language recognition issues
3099.TP
3100\(bu
3101The
3102.B \ex
3103escape sequence.
3104(Disabled with
3105.BR \-\^\-posix .)
3106.TP
3107\(bu
3108The
3109.B fflush()
3110function.
3111(Disabled with
3112.BR \-\^\-posix .)
3113.TP
3114\(bu
3115The ability to continue lines after
3116.B ?
3117and
3118.BR : .
3119(Disabled with
3120.BR \-\^\-posix .)
3121.TP
3122\(bu
3123Octal and hexadecimal constants in AWK programs.
3124.\" Special variables
3125.TP
3126\(bu
3127The
3128.BR ARGIND ,
3129.BR BINMODE ,
3130.BR ERRNO ,
3131.BR LINT ,
3132.B RT
3133and
3134.B TEXTDOMAIN
special variables.
3136.TP
3137\(bu
3138The
3139.B IGNORECASE
variable and its side-effects.
3141.TP
3142\(bu
3143The
3144.B FIELDWIDTHS
3145variable and fixed-width field splitting.
3146.TP
3147\(bu
3148The
3149.B PROCINFO
array.
3151.\" I/O stuff
3152.TP
3153\(bu
3154The use of
3155.B RS
3156as a regular expression.
3157.TP
3158\(bu
The special file names available for I/O redirection.
3160.TP
3161\(bu
3162The
3163.B |&
3164operator for creating co-processes.
3165.\" Changes to standard awk functions
3166.TP
3167\(bu
3168The ability to split out individual characters using the null string
3169as the value of
3170.BR FS ,
3171and as the third argument to
3172.BR split() .
3173.TP
3174\(bu
3175The optional second argument to the
3176.B close()
3177function.
3178.TP
3179\(bu
3180The optional third argument to the
3181.B match()
3182function.
3183.TP
3184\(bu
3185The ability to use positional specifiers with
3186.B printf
3187and
3188.BR sprintf() .
3189.\" New keywords or changes to keywords
3190.TP
3191\(bu
3192The use of
3193.BI delete " array"
3194to delete the entire contents of an array.
3195.TP
3196\(bu
3197The use of
3198.B "nextfile"
3199to abandon processing of the current input file.
3200.\" New functions
3201.TP
3202\(bu
3203The
3204.BR and() ,
3205.BR asort() ,
3206.BR asorti() ,
3207.BR bindtextdomain() ,
3208.BR compl() ,
3209.BR dcgettext() ,
3210.BR dcngettext() ,
3211.BR gensub() ,
3212.BR lshift() ,
3213.BR mktime() ,
3214.BR or() ,
3215.BR rshift() ,
3216.BR strftime() ,
3217.BR strtonum() ,
3218.B systime()
3219and
3220.B xor()
3221functions.
3222.\" I18N stuff
3223.TP
3224\(bu
3225Localizable strings.
3226.\" Extending gawk
3227.TP
3228\(bu
3229Adding new built-in functions dynamically with the
3230.B extension()
3231function.
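.PP
Two of these extensions, splitting on the null string and
.BI delete " array" \fR,\fP
are shown in the following sketch; the string and the array name are arbitrary.
.PP
.RS
.ft B
.nf
BEGIN {
    n = split("gawk", ch, "")   # one element per character
    print n, ch[1], ch[4]       # prints 4 g k
    delete ch                   # remove every element at once
    n = 0
    for (i in ch)
        n++
    print n                     # prints 0
}
.fi
.ft R
.RE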
3232.PP
3233The \*(AK book does not define the return value of the
3234.B close()
3235function.
3236.IR Gawk\^ "'s"
3237.B close()
3238returns the value from
3239.IR fclose (3),
3240or
3241.IR pclose (3),
3242when closing an output file or pipe, respectively.
3243It returns the process's exit status when closing an input pipe.
3244The return value is \-1 if the named file, pipe
3245or co-process was not opened with a redirection.
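.PP
A short sketch of checking these return values follows.
The command
.B sort
and the name
.B nosuchfile
are chosen only for illustration.
.PP
.RS
.ft B
.nf
BEGIN {
    cmd = "sort"
    print "b" | cmd
    print "a" | cmd
    status = close(cmd)         # value from pclose(3) for an output pipe
    print "close() returned " status
    print close("nosuchfile")   # never opened: prints \-1
}
.fi
.ft R
.RE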
3246.PP
3247When
3248.I gawk
3249is invoked with the
3250.B \-\^\-traditional
3251option,
3252if the
3253.I fs
3254argument to the
3255.B \-F
3256option is \*(lqt\*(rq, then
3257.B FS
3258is set to the tab character.
3259Note that typing
3260.B "gawk \-F\et \&.\|.\|."
simply causes the shell to quote the \*(lqt,\*(rq and does not pass
3262\*(lq\et\*(rq to the
3263.B \-F
3264option.
3265Since this is a rather ugly special case, it is not the default behavior.
3266This behavior also does not occur if
3267.B \-\^\-posix
3268has been specified.
3269To really get a tab character as the field separator, it is best to use
3270single quotes:
3271.BR "gawk \-F'\et' \&.\|.\|." .
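.PP
For instance, the following command line, a sketch that assumes a shell whose
.IR printf (1)
understands the \*(lq\et\*(rq and \*(lq\en\*(rq escapes,
prints the second tab-separated field, the letter b:
.PP
.RS
.ft B
.nf
printf 'a\etb\en' | gawk \-F'\et' '{ print $2 }'
.fi
.ft R
.RE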
3272.ig
3273.PP
3274If
3275.I gawk
3276was compiled for debugging, it
3277accepts the following additional options:
3278.TP
3279.PD 0
3280.B \-Wparsedebug
3281.TP
3282.PD
3283.B \-\^\-parsedebug
3284Turn on
3285.IR yacc (1)
3286or
3287.IR bison (1)
3288debugging output during program parsing.
3289This option should only be of interest to the
3290.I gawk
3291maintainers, and may not even be compiled into
3292.IR gawk .
3293..
3294.PP
3295If
3296.I gawk
3297is
3298.I configured
3299with the
3300.B \-\^\-enable\-switch
3301option to the
3302.I configure
3303command, then it accepts an additional control-flow statement:
3304.RS
3305.nf
3306\fBswitch (\fIexpression\fB) {
3307\fBcase \fIvalue\fB|\fIregex\fB : \fIstatement
3308\&.\^.\^.
3309\fR[ \fBdefault: \fIstatement \fR]
3310\fB}\fR
3311.fi
3312.RE
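.PP
A sketch of such a statement follows.
It assumes a
.I gawk
built with
.BR \-\^\-enable\-switch ;
the use of
.B break
to leave the
.B switch
body follows the C convention and is an assumption, since it is not
described on this page.
.PP
.RS
.ft B
.nf
{
    switch ($1) {
    case "start":               # string value
        print "starting"
        break
    case /^[0\-9]+$/:            # regular expression
        print "number"
        break
    default:
        print "other"
    }
}
.fi
.ft R
.RE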
3313.SH ENVIRONMENT VARIABLES
3314The
3315.B AWKPATH
3316environment variable can be used to provide a list of directories that
3317.I gawk
3318searches when looking for files named via the
3319.B \-f
3320and
3321.B \-\^\-file
3322options.
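.PP
For example, on a \*(UX-style system, where the directories in the list
are separated by colons, one might write the following.
The directory and file names are hypothetical.
.PP
.RS
.ft B
.nf
AWKPATH="$HOME/awklib:." gawk \-f util.awk \-f main.awk data.txt
.fi
.ft R
.RE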
3323.PP
3324If
3325.B POSIXLY_CORRECT
3326exists in the environment, then
3327.I gawk
3328behaves exactly as if
3329.B \-\^\-posix
3330had been specified on the command line.
3331If
3332.B \-\^\-lint
3333has been specified,
3334.I gawk
3335issues a warning message to this effect.
3336.SH SEE ALSO
3337.IR egrep (1),
3338.IR getpid (2),
3339.IR getppid (2),
3340.IR getpgrp (2),
3341.IR getuid (2),
3342.IR geteuid (2),
3343.IR getgid (2),
3344.IR getegid (2),
3345.IR getgroups (2)
3346.PP
3347.IR "The AWK Programming Language" ,
3348Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
3349Addison-Wesley, 1988. ISBN 0-201-07981-X.
3350.PP
3351\*(EP,
3352Edition 3.0, published by the Free Software Foundation, 2001.
3353.SH BUGS
3354The
3355.B \-F
3356option is not necessary given the command line variable assignment feature;
3357it remains only for backwards compatibility.
3358.PP
3359Syntactically invalid single character programs tend to overflow
3360the parse stack, generating a rather unhelpful message. Such programs
3361are surprisingly difficult to diagnose in the completely general case,
3362and the effort to do so really is not worth it.
3363.SH AUTHORS
3364The original version of \*(UX
3365.I awk
3366was designed and implemented by Alfred Aho,
3367Peter Weinberger, and Brian Kernighan of Bell Laboratories. Brian Kernighan
3368continues to maintain and enhance it.
3369.PP
3370Paul Rubin and Jay Fenlason,
3371of the Free Software Foundation, wrote
3372.IR gawk ,
3373to be compatible with the original version of
3374.I awk
3375distributed in Seventh Edition \*(UX.
3376John Woods contributed a number of bug fixes.
3377David Trueman, with contributions
3378from Arnold Robbins, made
3379.I gawk
3380compatible with the new version of \*(UX
3381.IR awk .
3382Arnold Robbins is the current maintainer.
3383.PP
3384The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
3385Scott Deifik is the current DOS maintainer. Pat Rankin did the
3386port to VMS, and Michal Jaegermann did the port to the Atari ST.
3387The port to OS/2 was done by Kai Uwe Rommel, with contributions and
3388help from Darrel Hankerson. Fred Fish supplied support for the Amiga,
3389Stephen Davies provided the Tandem port,
3390and Martin Brown provided the BeOS port.
3391.SH VERSION INFORMATION
3392This man page documents
3393.IR gawk ,
3394version 3.1.5.
3395.SH BUG REPORTS
3396If you find a bug in
3397.IR gawk ,
3398please send electronic mail to
3399.BR bug-gawk@gnu.org .
3400Please include your operating system and its revision, the version of
3401.I gawk
3402(from
3403.BR "gawk \-\^\-version" ),
3404what C compiler you used to compile it, and a test program
3405and data that are as small as possible for reproducing the problem.
3406.PP
3407Before sending a bug report, please do two things. First, verify that
3408you have the latest version of
3409.IR gawk .
3410Many bugs (usually subtle ones) are fixed at each release, and if
3411yours is out of date, the problem may already have been solved.
3412Second, please read this man page and the reference manual carefully to
3413be sure that what you think is a bug really is, instead of just a quirk
3414in the language.
3415.PP
3416Whatever you do, do
3417.B NOT
3418post a bug report in
3419.BR comp.lang.awk .
3420While the
3421.I gawk
3422developers occasionally read this newsgroup, posting bug reports there
is an unreliable way to report bugs. Instead, please use the electronic mail
address given above.
3425.PP
3426If you're using a GNU/Linux system or BSD-based system,
3427you may wish to submit a bug report to the vendor of your distribution.
3428That's fine, but please send a copy to the official email address as well,
3429since there's no guarantee that the bug will be forwarded to the
3430.I gawk
3431maintainer.
3432.SH ACKNOWLEDGEMENTS
3433Brian Kernighan of Bell Laboratories
3434provided valuable assistance during testing and debugging.
3435We thank him.
3436.SH COPYING PERMISSIONS
3437Copyright \(co 1989, 1991, 1992, 1993, 1994, 1995, 1996,
34381997, 1998, 1999, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
3439.PP
3440Permission is granted to make and distribute verbatim copies of
3441this manual page provided the copyright notice and this permission
3442notice are preserved on all copies.
3443.ig
3444Permission is granted to process this file through troff and print the
3445results, provided the printed document carries copying permission
3446notice identical to this one except for the removal of this paragraph
3447(this paragraph not being relevant to the printed manual page).
3448..
3449.PP
3450Permission is granted to copy and distribute modified versions of this
3451manual page under the conditions for verbatim copying, provided that
3452the entire resulting derived work is distributed under the terms of a
3453permission notice identical to this one.
3454.PP
3455Permission is granted to copy and distribute translations of this
3456manual page into another language, under the above conditions for
3457modified versions, except that this permission notice may be stated in
3458a translation approved by the Foundation.