Diffstat (limited to 'gawk.1')
-rw-r--r-- gawk.1 280
1 files changed, 218 insertions, 62 deletions
diff --git a/gawk.1 b/gawk.1
index 67a76c3b..b933c53e 100644
--- a/gawk.1
+++ b/gawk.1
@@ -9,9 +9,9 @@ gawk \- pattern scanning and processing language
] [
.B \-D
] [
-.B \-i
-] [
.B \-v
+] [
+.B \-V
]
..
[
@@ -33,9 +33,9 @@ gawk \- pattern scanning and processing language
] [
.B \-D
] [
-.B \-i
-] [
.B \-v
+] [
+.B \-V
]
..
[
@@ -76,13 +76,7 @@ Use
for the input field separator (the value of the
.B FS
predefined
-variable). For compatibility with \s-1UNIX\s+1
-.IR awk ,
-if
-.I fs
-is ``t'', then
-.B FS
-will be set to the tab character.
+variable).
.TP
.BI \-f " program-file"
Read the AWK program source from the file
@@ -130,6 +124,17 @@ type your program, and end it with a
.B ^D
(control-d).
.PP
+The environment variable
+.B AWKPATH
+specifies a search path to use when finding source files named with
+the
+.B \-f
+option. If this variable does not exist, the default path is
+\fB".:/usr/lib/awk:/usr/local/lib/awk"\fR.
+If a file name given to the
+.B \-f
+option contains a ``/'' character, no path search is performed.
+.PP
.I Gawk
compiles the program into an internal form,
and then proceeds to read
@@ -184,6 +189,11 @@ In the special case that
.B FS
is a single blank, fields are separated
by runs of blanks and/or tabs.
+Note that the value of
+.B IGNORECASE
+(see below) will also affect how fields are split when
+.B FS
+is a regular expression.
.PP
Each field in the input line may be referenced by its position,
.BR $1 ,
@@ -223,12 +233,12 @@ to be recomputed, with the fields being separated by the value of
AWK's built-in variables are:
.PP
.RS
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B ARGC
the number of command line arguments (does not include options to
.IR gawk ,
or the program source).
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B ARGV
array of command line arguments. The array is indexed from
0 to
@@ -237,7 +247,7 @@ array of command line arguments. The array is indexed from
Dynamically changing the contents of
.B ARGV
can control the files used for data.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B ENVIRON
An array containing the values of the current environment.
The array is indexed by the environment variables, each element being
@@ -248,36 +258,64 @@ Changing this array does not affect the environment seen by programs which
spawns via redirection or the
.B system
function.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B FILENAME
the name of the current input file.
If no files are specified on the command line, the value of
.B FILENAME
is ``\-''.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B FNR
the input record number in the current input file.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B FS
the input field separator, a blank by default.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
+.B IGNORECASE
+Controls the case-sensitivity of all regular expression operations. If
+.B IGNORECASE
+has a non-zero value, then pattern matching in rules,
+field splitting with
+.BR FS ,
+regular expression
+matching with
+.B ~
+and
+.BR !~ ,
+and the
+.BR gsub() ,
+.BR match() ,
+.BR split() ,
+and
+.B sub()
+pre-defined functions will all ignore case when doing regular expression
+operations. Thus, if
+.B IGNORECASE
+is not equal to zero,
+.B /aB/
+matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
+and \fB"AB"\fP.
+As with all AWK variables, the initial value of
+.B IGNORECASE
+is zero, so all regular expression operations are normally case-sensitive.
+.TP \l'\fBIGNORECASE\fR'
.B NF
the number of fields in the current input record.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B NR
the total number of input records seen so far.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B OFMT
the output format for numbers,
.B %.6g
by default.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B OFS
the output field separator, a blank by default.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B ORS
the output record separator, by default a newline.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B RS
the input record separator, by default a newline.
.B RS
@@ -292,17 +330,17 @@ is set to the null string, then the newline character always acts as
a field separator, in addition to whatever value
.B FS
may have.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B RSTART
the index of the first character matched by
.BR match() ;
0 if no match.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B RLENGTH
the length of the string matched by
.BR match() ;
\-1 if no match.
-.TP \l'\fBFILENAME\fR'
+.TP \l'\fBIGNORECASE\fR'
.B SUBSEP
the character used to separate multiple subscripts in array
elements, by default \fB"\e034"\fR.
@@ -740,6 +778,11 @@ functions accept the following conversion specification formats:
.TP
.B %c
An ASCII character.
+If the argument used for
+.B %c
+is numeric, it is treated as a character and printed.
+Otherwise, the argument is assumed to be a string, and only the first
+character of that string is printed.
.TP
.B %d
A decimal number (the integer part).
@@ -803,6 +846,53 @@ However, they may be simulated by using
the AWK concatenation operation to build up
a format specification dynamically.
.PP
+When doing I/O redirection from either
+.B print
+or
+.B printf
+into a file,
+or via
+.B getline
+from a file,
+.I gawk
+recognizes certain special filenames internally. These filenames
+allow access to open file descriptors inherited from
+.IR gawk 's
+parent process (usually the shell). The filenames are:
+.RS
+.TP
+.B /dev/stdin
+The standard input.
+.TP
+.B /dev/stdout
+The standard output.
+.TP
+.B /dev/stderr
+The standard error output.
+.TP
+.BI /dev/fd/\^ n
+The file denoted by the open file descriptor
+.IR n .
+.RE
+.PP
+These are particularly useful for error messages. For example:
+.PP
+.RS
+.ft B
+print "You blew it!" > "/dev/stderr"
+.ft R
+.RE
+.PP
+whereas you would otherwise have to use
+.PP
+.RS
+.ft B
+print "You blew it!" | "cat 1>&2"
+.ft R
+.RE
+.PP
+These file names may also be used on the command line to name data files.
+.PP
AWK has the following pre-defined arithmetic functions:
.PP
.RS
@@ -922,6 +1012,22 @@ If
is omitted, the rest of
.I s
is used.
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+.BI tolower( str )
+returns a copy of the string
+.IR str ,
+with all the upper-case characters in
+.I str
+translated to their corresponding lower-case counterparts.
+Non-alphabetic characters are left unchanged.
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+.BI toupper( str )
+returns a copy of the string
+.IR str ,
+with all the lower-case characters in
+.I str
+translated to their corresponding upper-case counterparts.
+Non-alphabetic characters are left unchanged.
.RE
.PP
String constants in AWK are sequences of characters enclosed
@@ -931,6 +1037,9 @@ are recognized, as in C. These are:
.PP
.RS
.TP \l'\fB\e\fIddd\fR'
+.B \ea
+The ``alert'' character; usually the ASCII BEL character.
+.TP \l'\fB\e\fIddd\fR'
.B \eb
backspace.
.TP \l'\fB\e\fIddd\fR'
@@ -949,10 +1058,24 @@ horizontal tab.
.B \ev
vertical tab.
.TP \l'\fB\e\fIddd\fR'
+.BI \ex "\^hex digits"
+The character represented by the string of hexadecimal digits following
+the
+.BR \ex .
+As in ANSI C, all following hexadecimal digits are considered part of
+the escape sequence.
+(This feature should tell us something about language design by committee.)
+E.g., "\ex1B" is the ASCII ESC (escape) character.
+.TP \l'\fB\e\fIddd\fR'
.BI \e ddd
The character represented by the 1-, 2-, or 3-digit sequence of octal
digits. E.g. "\e033" is the ASCII ESC (escape) character.
.RE
+.PP
+The escape sequences may also be used inside constant regular expressions
+(e.g.,
+.B "/[\ \et\ef\en\er\ev]/"
+matches whitespace characters).
.SH FUNCTIONS
Functions in AWK are defined as follows:
.PP
@@ -1064,10 +1187,8 @@ array.
.I Gawk
has some extensions to System V
.IR awk .
-They are described in this section.
-All features described in this section may change at some time in
-the future, or may go away entirely. They can be disabled either by
-compiling
+They are described in this section. All the extensions described here
+can be disabled by compiling
.I gawk
with
.BR \-DSTRICT ,
@@ -1075,25 +1196,51 @@ or by invoking
.I gawk
with the name
.IR awk .
-You should not write programs that depend upon them.
-.PP
-The environment variable
-.B AWKPATH
-specifies a search path to use when finding source files named with
-the
-.B \-f
-option. If this variable does not exist, the default path is
-\fB".:/usr/lib/awk:/usr/local/lib/awk"\fR.
-If a file name given to the
-.B \-f
-option contains a ``/'' character, no path search is performed.
+If the underlying operating system supports the
+.B /dev/fd
+directory and corresponding files, then
+.I gawk
+can be compiled with
+.B \-DNO_DEV_FD
+to disable the special filename processing.
.PP
-Two new relational operators are defined,
-.BR ~~ ,
+The following features of
+.I gawk
+are not available in
+System V
+.IR awk .
+.RS
+.TP \l'\(bu'
+\(bu
+The
+.BR \ea ,
+.BR \ev ,
+or
+.B \ex
+escape sequences are not recognized.
+.TP \l'\(bu'
+\(bu
+The special file names available for I/O redirection are not recognized.
+.TP \l'\(bu'
+\(bu
+The
+.B tolower
and
-.BR !~~ .
-These perform case independent regular expression match and no-match
-operations, respectively.
+.B toupper
+built-in string functions are not available.
+.TP \l'\(bu'
+\(bu
+The
+.B IGNORECASE
+variable and its side-effects are not available.
+.TP \l'\(bu'
+\(bu
+No path search is performed for files named via the
+.B \-f
+option. Therefore the
+.B AWKPATH
+environment variable is not special.
+.RE
.PP
The AWK book does not define the return value of the
.B close
@@ -1106,8 +1253,25 @@ or
.IR pclose (3),
when closing a file or pipe, respectively.
.PP
+When
+.I gawk
+is invoked as
+.IR awk ,
+if the
+.I fs
+argument to the
+.B \-F
+option is ``t'', then
+.B FS
+will be set to the tab character.
+Since this is a rather ugly special case, it is not the default behavior.
+.PP
+The rest of the features described in this section may change at some time in
+the future, or may go away entirely.
+You should not write programs that depend upon them.
+.PP
.I Gawk
-accepts the following additional arguments:
+accepts the following additional options:
.ig
.TP
.B \-D
@@ -1131,18 +1295,6 @@ maintainers, and may not even be compiled into
.IR gawk .
..
.TP
-.B \-i
-Ignore case when doing regular expression operations.
-This causes
-.B ~
-and
-.B !~
-to behave like the new operators
-.B ~~
-and
-.BR !~~ ,
-described above.
-.TP
.B \-v
Print version information for this particular copy of
.I gawk
@@ -1152,6 +1304,9 @@ This is useful mainly for knowing if the current copy of
on your system
is up to date with respect to whatever the Free Software Foundation
is distributing.
+.TP
+.B \-V
+Print the GNU copyright information message on the error output.
.SH BUGS
The
.B \-F
@@ -1164,12 +1319,13 @@ was designed and implemented by Alfred Aho,
Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
continues to maintain and enhance it.
.PP
-Paul Rubin and Jay Fenlason, with John Woods,
-all of the Free Software Foundation, wrote
+Paul Rubin and Jay Fenlason,
+of the Free Software Foundation, wrote
.IR gawk ,
to be compatible with the original version of
.I awk
distributed in Seventh Edition \s-1UNIX\s+1.
+John Woods contributed a number of bug fixes.
David Trueman of Dalhousie University, with contributions
from Arnold Robbins at Emory University, made
.I gawk