Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 687
1 files changed, 376 insertions, 311 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index ac973b9b..e702f407 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -55,6 +55,7 @@
@set VERSION 4.1
@set PATCHLEVEL 2
+@set GAWKINETTITLE TCP/IP Internetworking with @command{gawk}
@ifset FOR_PRINT
@set TITLE Effective awk Programming
@end ifset
@@ -472,7 +473,7 @@ particular records in a file and perform operations upon them.
@command{gawk}.
* Internationalization:: Getting @command{gawk} to speak your
language.
-* Debugger:: The @code{gawk} debugger.
+* Debugger:: The @command{gawk} debugger.
* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
@command{gawk}.
* Dynamic Extensions:: Adding new built-in functions to
@@ -955,7 +956,7 @@ particular records in a file and perform operations upon them.
* Internal File Ops:: The code for internal file operations.
* Using Internal File Ops:: How to use an external extension.
* Extension Samples:: The sample extensions that ship with
- @code{gawk}.
+ @command{gawk}.
* Extension Sample File Functions:: The file functions sample.
* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
* Extension Sample Fork:: An interface to @code{fork()} and
@@ -1496,7 +1497,7 @@ In May 1997, J@"urgen Kahrs felt the need for network access
from @command{awk}, and with a little help from me, set about adding
features to do this for @command{gawk}. At that time, he also
wrote the bulk of
-@cite{TCP/IP Internetworking with @command{gawk}}
+@cite{@value{GAWKINETTITLE}}
(a separate document, available as part of the @command{gawk} distribution).
His code finally became part of the main @command{gawk} distribution
with @command{gawk} @value{PVERSION} 3.1.
@@ -4677,7 +4678,7 @@ $ @kbd{gawk -f test2}
@print{} This is script test2.
@end example
-@code{gawk} runs the @file{test2} script, which includes @file{test1}
+@command{gawk} runs the @file{test2} script, which includes @file{test1}
using the @code{@@include}
keyword. So, to include external @command{awk} source files, you just
use @code{@@include} followed by the name of the file to be included,
@@ -4886,7 +4887,7 @@ This seems to have been a long-undocumented feature in Unix @command{awk}.
Similarly, you may use @code{print} or @code{printf} statements in the
@var{init} and @var{increment} parts of a @code{for} loop. This is another
-long-undocumented ``feature'' of Unix @code{awk}.
+long-undocumented ``feature'' of Unix @command{awk}.
@end ignore
@@ -5178,13 +5179,12 @@ letters or numbers. @value{COMMONEXT}
@quotation CAUTION
In ISO C, the escape sequence continues until the first nonhexadecimal
digit is seen.
-@c FIXME: Add exact version here.
For many years, @command{gawk} would continue incorporating
hexadecimal digits into the value until a non-hexadecimal digit
or the end of the string was encountered.
However, using more than two hexadecimal digits produced
undefined results.
-As of @value{PVERSION} @strong{FIXME:} 4.3.0, only two digits
+As of @value{PVERSION} 4.2, only two digits
are processed.
@end quotation
@@ -14508,7 +14508,7 @@ respectively, should use binary I/O. A string value of @code{"rw"} or
@code{"wr"} indicates that all files should use binary I/O. Any other
string value is treated the same as @code{"rw"}, but causes @command{gawk}
to generate a warning message. @code{BINMODE} is described in more
-detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}),
+detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions})
also supports this variable, but only using numeric values.
@cindex @code{CONVFMT} variable
@@ -14516,7 +14516,7 @@ also supports this variable, but only using numeric values.
@cindex numbers, converting, to strings
@cindex strings, converting, numbers to
@item @code{CONVFMT}
-This string controls conversion of numbers to
+A string that controls the conversion of numbers to
strings (@pxref{Conversion}).
It works by being passed, in effect, as the first argument to the
@code{sprintf()} function
@@ -14591,7 +14591,7 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment.
@cindex regular expressions, case sensitivity
@item IGNORECASE #
If @code{IGNORECASE} is nonzero or non-null, then all string comparisons
-and all regular expression matching are case independent. Thus, regexp
+and all regular expression matching are case-independent. Thus, regexp
matching with @samp{~} and @samp{!~}, as well as the @code{gensub()},
@code{gsub()}, @code{index()}, @code{match()}, @code{patsplit()},
@code{split()}, and @code{sub()}
@@ -14617,7 +14617,7 @@ Any other true value prints nonfatal warnings.
Assigning a false value to @code{LINT} turns off the lint warnings.
This variable is a @command{gawk} extension. It is not special
-in other @command{awk} implementations. Unlike the other special variables,
+in other @command{awk} implementations. Unlike with the other special variables,
changing @code{LINT} does affect the production of lint warnings,
even if @command{gawk} is in compatibility mode. Much as
the @option{--lint} and @option{--traditional} options independently
@@ -14629,7 +14629,7 @@ of @command{awk} being executed.
@cindex numbers, converting, to strings
@cindex strings, converting, numbers to
@item OFMT
-Controls conversion of numbers to
+A string that controls the conversion of numbers to
strings (@pxref{Conversion}) for
printing with the @code{print} statement. It works by being passed
as the first argument to the @code{sprintf()} function
@@ -14644,7 +14644,7 @@ strings in general expressions; this is now done by @code{CONVFMT}.
@cindex separators, field
@cindex field separators
@item OFS
-This is the output field separator (@pxref{Output Separators}). It is
+The output field separator (@pxref{Output Separators}). It is
output between the fields printed by a @code{print} statement. Its
default value is @w{@code{" "}}, a string consisting of a single space.
@@ -14662,7 +14662,7 @@ The working precision of arbitrary-precision floating-point numbers,
@cindex @code{ROUNDMODE} variable
@item ROUNDMODE #
The rounding mode to use for arbitrary-precision arithmetic on
-numbers, by default @code{"N"} (@samp{roundTiesToEven} in
+numbers, by default @code{"N"} (@code{roundTiesToEven} in
the IEEE 754 standard; @pxref{Setting the rounding mode}).
@cindex @code{RS} variable
@@ -14691,7 +14691,7 @@ just the first character of @code{RS}'s value is used.
@item @code{SUBSEP}
The subscript separator. It has the default value of
@code{"\034"} and is used to separate the parts of the indices of a
-multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}}
+multidimensional array. Thus, the expression @samp{@w{foo["A", "B"]}}
really accesses @code{foo["A\034B"]}
(@pxref{Multidimensional}).
@@ -14709,7 +14709,7 @@ The default value of @code{TEXTDOMAIN} is @code{"messages"}.
@end table
@node Auto-set
-@subsection Built-In Variables That Convey Information
+@subsection Built-in Variables That Convey Information
@cindex predefined variables, conveying information
@cindex variables, predefined conveying information
@@ -14867,12 +14867,12 @@ input file.
@item @code{NF}
The number of fields in the current input record.
@code{NF} is set each time a new record is read, when a new field is
-created or when @code{$0} changes (@pxref{Fields}).
+created, or when @code{$0} changes (@pxref{Fields}).
Unlike most of the variables described in this @value{SUBSECTION},
assigning a value to @code{NF} has the potential to affect
@command{awk}'s internal workings. In particular, assignments
-to @code{NF} can be used to create or remove fields from the
+to @code{NF} can be used to create fields in or remove fields from the
current record. @xref{Changing Fields}.
@cindex @code{FUNCTAB} array
@@ -14922,7 +14922,7 @@ or @code{"FPAT"} if field matching with @code{FPAT} is in effect.
@item PROCINFO["identifiers"]
@cindex program identifiers
A subarray, indexed by the names of all identifiers used in the text of
-the AWK program. An @dfn{identifier} is simply the name of a variable
+the @command{awk} program. An @dfn{identifier} is simply the name of a variable
(be it scalar or array), built-in function, user-defined function, or
extension function. For each identifier, the value of the element is
one of the following:
@@ -14942,7 +14942,7 @@ The identifier is an extension function loaded via
The identifier is a scalar.
@item "untyped"
-The identifier is untyped (could be used as a scalar or array,
+The identifier is untyped (could be used as a scalar or an array;
@command{gawk} doesn't know yet).
@item "user"
@@ -15063,7 +15063,7 @@ is the length of the matched string, or @minus{}1 if no match is found.
@cindex @code{RSTART} variable
@item @code{RSTART}
-The start-index in characters of the substring that is matched by the
+The start index in characters of the substring that is matched by the
@code{match()} function
(@pxref{String Functions}).
@code{RSTART} is set by invoking the @code{match()} function. Its value
@@ -15130,7 +15130,7 @@ function multiply(variable, amount)
@quotation NOTE
In order to avoid severe time-travel paradoxes,@footnote{Not to mention difficult
implementation issues.} neither @code{FUNCTAB} nor @code{SYMTAB}
-are available as elements within the @code{SYMTAB} array.
+is available as an element within the @code{SYMTAB} array.
@end quotation
@end table
@@ -15350,7 +15350,7 @@ When designing your program, you should choose options that don't
conflict with @command{gawk}'s, because it will process any options
that it accepts before passing the rest of the command line on to
your program. Using @samp{#!} with the @option{-E} option may help
-(@DBXREF{Executable Scripts}
+(@DBPXREF{Executable Scripts}
and
@ifnotdocbook
@DBPXREF{Options}).
@@ -15364,15 +15364,15 @@ and
@itemize @value{BULLET}
@item
-Pattern-action pairs make up the basic elements of an @command{awk}
+Pattern--action pairs make up the basic elements of an @command{awk}
program. Patterns are either normal expressions, range expressions,
-regexp constants, one of the special keywords @code{BEGIN}, @code{END},
-@code{BEGINFILE}, @code{ENDFILE}, or empty. The action executes if
+or regexp constants; one of the special keywords @code{BEGIN}, @code{END},
+@code{BEGINFILE}, or @code{ENDFILE}; or empty. The action executes if
the current record matches the pattern. Empty (missing) patterns match
all records.
@item
-I/O from @code{BEGIN} and @code{END} rules have certain constraints.
+I/O from @code{BEGIN} and @code{END} rules has certain constraints.
This is also true, only more so, for @code{BEGINFILE} and @code{ENDFILE}
rules. The latter two give you ``hooks'' into @command{gawk}'s file
processing, allowing you to recover from a file that otherwise would
@@ -15402,12 +15402,12 @@ iteration of a loop (or get out of a @code{switch}).
@item
@code{next} and @code{nextfile} let you read the next record and start
-over at the top of your program, or skip to the next input file and
+over at the top of your program or skip to the next input file and
start over, respectively.
@item
The @code{exit} statement terminates your program. When executed
-from an action (or function body) it transfers control to the
+from an action (or function body), it transfers control to the
@code{END} statements. From an @code{END} statement body, it exits
immediately. You may pass an optional numeric value to be used
as @command{awk}'s exit status.
@@ -15510,15 +15510,17 @@ the declaration.
indices---e.g., @samp{15 .. 27}---but the size of the array is still fixed when
the array is declared.)
-A contiguous array of four elements might look like the following example,
-conceptually, if the element values are 8, @code{"foo"},
-@code{""}, and 30
+@c 1/2015: Do not put the numeric values into @code. Array element
+@c values are no different from scalar variable values.
+A contiguous array of four elements might look like
@ifnotdocbook
-as shown in @ref{figure-array-elements}:
+@ref{figure-array-elements},
@end ifnotdocbook
@ifdocbook
-as shown in @inlineraw{docbook, <xref linkend="figure-array-elements"/>}:
+@inlineraw{docbook, <xref linkend="figure-array-elements"/>},
@end ifdocbook
+conceptually, if the element values are eight, @code{"foo"},
+@code{""}, and 30.
@ifnotdocbook
@float Figure,figure-array-elements
@@ -15543,7 +15545,7 @@ as shown in @inlineraw{docbook, <xref linkend="figure-array-elements"/>}:
@noindent
Only the values are stored; the indices are implicit from the order of
-the values. Here, 8 is the value at index zero, because 8 appears in the
+the values. Here, eight is the value at index zero, because eight appears in the
position with zero elements before it.
@cindex arrays, indexing
@@ -15555,19 +15557,21 @@ that each array is a collection of pairs---an index and its corresponding
array element value:
@ifnotdocbook
-@example
-@r{Index} 3 @r{Value} 30
-@r{Index} 1 @r{Value} "foo"
-@r{Index} 0 @r{Value} 8
-@r{Index} 2 @r{Value} ""
-@end example
+@c extra empty column to indent it right
+@multitable @columnfractions .1 .1 .1
+@headitem @tab Index @tab Value
+@item @tab @code{3} @tab @code{30}
+@item @tab @code{1} @tab @code{"foo"}
+@item @tab @code{0} @tab @code{8}
+@item @tab @code{2} @tab @code{""}
+@end multitable
@end ifnotdocbook
@docbook
<informaltable>
<tgroup cols="2">
-<colspec colname="1" align="center"/>
-<colspec colname="2" align="center"/>
+<colspec colname="1" align="left"/>
+<colspec colname="2" align="left"/>
<thead>
<row>
<entry>Index</entry>
@@ -15613,20 +15617,22 @@ at any time. For example, suppose a tenth element is added to the array
whose value is @w{@code{"number ten"}}. The result is:
@ifnotdocbook
-@example
-@r{Index} 10 @r{Value} "number ten"
-@r{Index} 3 @r{Value} 30
-@r{Index} 1 @r{Value} "foo"
-@r{Index} 0 @r{Value} 8
-@r{Index} 2 @r{Value} ""
-@end example
+@c extra empty column to indent it right
+@multitable @columnfractions .1 .1 .2
+@headitem @tab Index @tab Value
+@item @tab @code{10} @tab @code{"number ten"}
+@item @tab @code{3} @tab @code{30}
+@item @tab @code{1} @tab @code{"foo"}
+@item @tab @code{0} @tab @code{8}
+@item @tab @code{2} @tab @code{""}
+@end multitable
@end ifnotdocbook
@docbook
<informaltable>
<tgroup cols="2">
-<colspec colname="1" align="center"/>
-<colspec colname="2" align="center"/>
+<colspec colname="1" align="left"/>
+<colspec colname="2" align="left"/>
<thead>
<row>
<entry>Index</entry>
@@ -15678,19 +15684,20 @@ an index. For example, the following is an array that translates words from
English to French:
@ifnotdocbook
-@example
-@r{Index} "dog" @r{Value} "chien"
-@r{Index} "cat" @r{Value} "chat"
-@r{Index} "one" @r{Value} "un"
-@r{Index} 1 @r{Value} "un"
-@end example
+@multitable @columnfractions .1 .1 .1
+@headitem @tab Index @tab Value
+@item @tab @code{"dog"} @tab @code{"chien"}
+@item @tab @code{"cat"} @tab @code{"chat"}
+@item @tab @code{"one"} @tab @code{"un"}
+@item @tab @code{1} @tab @code{"un"}
+@end multitable
@end ifnotdocbook
@docbook
<informaltable>
<tgroup cols="2">
-<colspec colname="1" align="center"/>
-<colspec colname="2" align="center"/>
+<colspec colname="1" align="left"/>
+<colspec colname="2" align="left"/>
<thead>
<row>
<entry>Index</entry>
@@ -15732,7 +15739,7 @@ numbers and strings as indices.
There are some subtleties to how numbers work when used as
array subscripts; this is discussed in more detail in
@ref{Numeric Array Subscripts}.)
-Here, the number @code{1} isn't double quoted, because @command{awk}
+Here, the number @code{1} isn't double-quoted, because @command{awk}
automatically converts it to a string.
@cindex @command{gawk}, @code{IGNORECASE} variable in
@@ -15757,7 +15764,7 @@ is independent of the number of elements in the array.
@cindex elements of arrays
The principal way to use an array is to refer to one of its elements.
-An array reference is an expression as follows:
+An @dfn{array reference} is an expression as follows:
@example
@var{array}[@var{index-expression}]
@@ -15767,8 +15774,11 @@ An array reference is an expression as follows:
Here, @var{array} is the name of an array. The expression @var{index-expression} is
the index of the desired element of the array.
+@c 1/2015: Having the 4.3 in @samp is a little iffy. It's essentially
+@c an expression though, so leave be. It's too early in the discussion
+@c to mention that it's really a string.
The value of the array reference is the current value of that array
-element. For example, @code{foo[4.3]} is an expression for the element
+element. For example, @code{foo[4.3]} is an expression referencing the element
of array @code{foo} at index @samp{4.3}.
@cindex arrays, unassigned elements
@@ -15860,7 +15870,7 @@ assign to that element of the array.
The following program takes a list of lines, each beginning with a line
number, and prints them out in order of line number. The line numbers
-are not in order when they are first read---instead they
+are not in order when they are first read---instead, they
are scrambled. This program sorts the lines by making an array using
the line numbers as subscripts. The program then prints out the lines
in sorted order of their numbers. It is a very simple program and gets
@@ -15954,7 +15964,7 @@ program has previously used, with the variable @var{var} set to that index.
The following program uses this form of the @code{for} statement. The
first rule scans the input records and notes which words appear (at
least once) in the input, by storing a one into the array @code{used} with
-the word as index. The second rule scans the elements of @code{used} to
+the word as the index. The second rule scans the elements of @code{used} to
find all the distinct words that appear in the input. It prints each
word that is more than 10 characters long and also prints the number of
such words.
@@ -16051,7 +16061,7 @@ and will vary from one version of @command{awk} to the next.
Often, though, you may wish to do something simple, such as
``traverse the array by comparing the indices in ascending order,''
or ``traverse the array by comparing the values in descending order.''
-@command{gawk} provides two mechanisms which give you this control.
+@command{gawk} provides two mechanisms that give you this control:
@itemize @value{BULLET}
@item
@@ -16108,21 +16118,26 @@ across different environments.} which @command{gawk} uses internally
to perform the sorting.
@item "@@ind_str_desc"
-String indices ordered from high to low.
+Like @code{"@@ind_str_asc"}, but the
+string indices are ordered from high to low.
@item "@@ind_num_desc"
-Numeric indices ordered from high to low.
+Like @code{"@@ind_num_asc"}, but the
+numeric indices are ordered from high to low.
@item "@@val_type_desc"
-Element values, based on type, ordered from high to low.
+Like @code{"@@val_type_asc"}, but the
+element values, based on type, are ordered from high to low.
Subarrays, if present, come out first.
@item "@@val_str_desc"
-Element values, treated as strings, ordered from high to low.
+Like @code{"@@val_str_asc"}, but the
+element values, treated as strings, are ordered from high to low.
Subarrays, if present, come out first.
@item "@@val_num_desc"
-Element values, treated as numbers, ordered from high to low.
+Like @code{"@@val_num_asc"}, but the
+element values, treated as numbers, are ordered from high to low.
Subarrays, if present, come out first.
@end table
@@ -16345,7 +16360,7 @@ for (i in frequencies)
@noindent
This example removes all the elements from the array @code{frequencies}.
Once an element is deleted, a subsequent @code{for} statement to scan the array
-does not report that element and the @code{in} operator to check for
+does not report that element, and using the @code{in} operator to check for
the presence of that element returns zero (i.e., false):
@example
@@ -16605,7 +16620,7 @@ a[1][2] = 2
This simulates a true two-dimensional array. Each subarray element can
contain another subarray as a value, which in turn can hold other arrays
as well. In this way, you can create arrays of three or more dimensions.
-The indices can be any @command{awk} expression, including scalars
+The indices can be any @command{awk} expressions, including scalars
separated by commas (i.e., a regular @command{awk} simulated
multidimensional subscript). So the following is valid in
@command{gawk}:
@@ -16617,7 +16632,7 @@ a[1][3][1, "name"] = "barney"
Each subarray and the main array can be of different length. In fact, the
elements of an array or its subarray do not all have to have the same
type. This means that the main array and any of its subarrays can be
-non-rectangular, or jagged in structure. You can assign a scalar value to
+nonrectangular, or jagged in structure. You can assign a scalar value to
the index @code{4} of the main array @code{a}, even though @code{a[1]}
is itself an array and not a scalar:
@@ -16641,7 +16656,8 @@ a[4][5][6][7] = "An element in a four-dimensional array"
@noindent
This removes the scalar value from index @code{4} and then inserts a
-subarray of subarray of subarray containing a scalar. You can also
+three-level nested subarray
+containing a scalar. You can also
delete an entire subarray or subarray of subarrays:
@example
@@ -16652,7 +16668,7 @@ a[4][5] = "An element in subarray a[4]"
But recall that you can not delete the main array @code{a} and then use it
as a scalar.
-The built-in functions which take array arguments can also be used
+The built-in functions that take array arguments can also be used
with subarrays. For example, the following code fragment uses @code{length()}
(@pxref{String Functions})
to determine the number of elements in the main array @code{a} and
@@ -16682,7 +16698,7 @@ can be nested to scan all the
elements of an array of arrays if it is rectangular in structure. In order
to print the contents (scalar values) of a two-dimensional array of arrays
(i.e., in which each first-level element is itself an
-array, not necessarily of the same length)
+array, not necessarily of the same length),
you could use the following code:
@example
@@ -16782,9 +16798,9 @@ versions of @command{awk}.
@item
Standard @command{awk} simulates multidimensional arrays by separating
-subscript values with a comma. The values are concatenated into a
+subscript values with commas. The values are concatenated into a
single string, separated by the value of @code{SUBSEP}. The fact
-that such a subscript was created in this way is not retained; thus
+that such a subscript was created in this way is not retained; thus,
changing @code{SUBSEP} may have unexpected consequences. You can use
@samp{(@var{sub1}, @var{sub2}, @dots{}) in @var{array}} to see if such
a multidimensional subscript exists in @var{array}.
@@ -16793,7 +16809,7 @@ a multidimensional subscript exists in @var{array}.
@command{gawk} provides true arrays of arrays. You use a separate
set of square brackets for each dimension in such an array:
@code{data[row][col]}, for example. Array elements may thus be either
-scalar values (number or string) or another array.
+scalar values (number or string) or other arrays.
@item
Use the @code{isarray()} built-in function to determine if an array
@@ -16818,6 +16834,9 @@ Besides the built-in functions, @command{awk} has provisions for
writing new functions that the rest of a program can use.
The second half of this @value{CHAPTER} describes these
@dfn{user-defined} functions.
+Finally, we explore indirect function calls, a @command{gawk}-specific
+extension that lets you determine at runtime what function is to
+be called.
@menu
* Built-in:: Summarizes the built-in functions.
@@ -16827,7 +16846,7 @@ The second half of this @value{CHAPTER} describes these
@end menu
@node Built-in
-@section Built-In Functions
+@section Built-in Functions
@dfn{Built-in} functions are always available for
your @command{awk} program to call. This @value{SECTION} defines all
@@ -16850,7 +16869,7 @@ but are summarized here for your convenience.
@end menu
@node Calling Built-in
-@subsection Calling Built-In Functions
+@subsection Calling Built-in Functions
To call one of @command{awk}'s built-in functions, write the name of
the function followed
@@ -16901,7 +16920,7 @@ j = atan2(++i, i *= 2)
@end example
If the order of evaluation is left to right, then @code{i} first becomes
-6, and then 12, and @code{atan2()} is called with the two arguments 6
+six, and then 12, and @code{atan2()} is called with the two arguments six
and 12. But if the order of evaluation is right to left, @code{i}
first becomes 10, then 11, and @code{atan2()} is called with the
two arguments 11 and 10.
@@ -16982,7 +17001,7 @@ In fact, @command{gawk} uses the BSD @code{random()} function, which is
considerably better than @code{rand()}, to produce random numbers.}
Often random integers are needed instead. Following is a user-defined function
-that can be used to obtain a random non-negative integer less than @var{n}:
+that can be used to obtain a random nonnegative integer less than @var{n}:
@example
function randint(n)
@@ -17077,7 +17096,7 @@ implementations.
The functions in this @value{SECTION} look at or change the text of one
or more strings.
-@code{gawk} understands locales (@pxref{Locales}), and does all
+@command{gawk} understands locales (@pxref{Locales}) and does all
string processing in terms of @emph{characters}, not @emph{bytes}.
This distinction is particularly important to understand for locales
where one character may be represented by multiple bytes. Thus, for
@@ -17166,7 +17185,7 @@ a[2] = "de"
a[3] = "sac"
@end example
-The @code{asorti()} function works similarly to @code{asort()}, however,
+The @code{asorti()} function works similarly to @code{asort()}; however,
the @emph{indices} are sorted, instead of the values. Thus, in the
previous example, starting with the same initial set of indices and
values in @code{a}, calling @samp{asorti(a)} would yield:
@@ -17281,7 +17300,7 @@ If @var{find} is not found, @code{index()} returns zero.
With BWK @command{awk} and @command{gawk},
it is a fatal error to use a regexp constant for @var{find}.
Other implementations allow it, simply treating the regexp
-constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER}.
+constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER}
@item @code{length(}[@var{string}]@code{)}
@cindexawkfunc{length}
@@ -17364,7 +17383,7 @@ If @option{--posix} is supplied, using an array argument is a fatal error
@cindex string, regular expression match
@cindex match regexp in string
Search @var{string} for the
-longest, leftmost substring matched by the regular expression,
+longest, leftmost substring matched by the regular expression
@var{regexp} and return the character position (index)
at which that substring begins (one, if it starts at the beginning of
@var{string}). If no match is found, return zero.
@@ -17376,7 +17395,7 @@ In the latter case, the string is treated as a regexp to be matched.
discussion of the difference between the two forms, and the
implications for writing your program correctly.
-The order of the first two arguments is backwards from most other string
+The order of the first two arguments is the opposite of most other string
functions that work with regular expressions, such as
@code{sub()} and @code{gsub()}. It might help to remember that
for @code{match()}, the order is the same as for the @samp{~} operator:
@@ -17465,7 +17484,7 @@ $ @kbd{echo foooobazbarrrrr |}
@end example
There may not be subscripts for the start and index for every parenthesized
-subexpression, because they may not all have matched text; thus they
+subexpression, because they may not all have matched text; thus, they
should be tested for with the @code{in} operator
(@pxref{Reference to Elements}).
@@ -17512,13 +17531,13 @@ a regexp describing where to split @var{string} (much as @code{FS} can
be a regexp describing where to split input records).
If @var{fieldsep} is omitted, the value of @code{FS} is used.
@code{split()} returns the number of elements created.
-@var{seps} is a @command{gawk} extension with @code{@var{seps}[@var{i}]}
+@var{seps} is a @command{gawk} extension, with @code{@var{seps}[@var{i}]}
being the separator string
between @code{@var{array}[@var{i}]} and @code{@var{array}[@var{i}+1]}.
If @var{fieldsep} is a single
-space then any leading whitespace goes into @code{@var{seps}[0]} and
+space, then any leading whitespace goes into @code{@var{seps}[0]} and
any trailing
-whitespace goes into @code{@var{seps}[@var{n}]} where @var{n} is the
+whitespace goes into @code{@var{seps}[@var{n}]}, where @var{n} is the
return value of
@code{split()} (i.e., the number of elements in @var{array}).
@@ -17531,7 +17550,7 @@ split("cul-de-sac", a, "-", seps)
@noindent
@cindex strings splitting, example
-splits the string @samp{cul-de-sac} into three fields using @samp{-} as the
+splits the string @code{"cul-de-sac"} into three fields using @samp{-} as the
separator. It sets the contents of the array @code{a} as follows:
@example
@@ -17556,19 +17575,18 @@ As with input field-splitting, when the value of @var{fieldsep} is
the elements of
@var{array} but not in @var{seps}, and the elements
are separated by runs of whitespace.
-Also, as with input field-splitting, if @var{fieldsep} is the null string, each
+Also, as with input field splitting, if @var{fieldsep} is the null string, each
individual character in the string is split into its own array element.
@value{COMMONEXT}
Note, however, that @code{RS} has no effect on the way @code{split()}
-works. Even though @samp{RS = ""} causes newline to also be an input
+works. Even though @samp{RS = ""} causes the newline character to also be an input
field separator, this does not affect how @code{split()} splits strings.
@cindex dark corner, @code{split()} function
Modern implementations of @command{awk}, including @command{gawk}, allow
-the third argument to be a regexp constant (@code{/abc/}) as well as a
-string.
-@value{DARKCORNER}
+the third argument to be a regexp constant (@w{@code{/}@dots{}@code{/}})
+as well as a string. @value{DARKCORNER}
The POSIX standard allows this as well.
@DBXREF{Computed Regexps} for a
discussion of the difference between using a string constant or a regexp constant,
@@ -17705,7 +17723,7 @@ an @samp{&}:
@cindex @code{sub()} function, arguments of
@cindex @code{gsub()} function, arguments of
As mentioned, the third argument to @code{sub()} must
-be a variable, field or array element.
+be a variable, field, or array element.
Some versions of @command{awk} allow the third argument to
be an expression that is not an lvalue. In such a case, @code{sub()}
still searches for the pattern and returns zero or one, but the result of
@@ -17897,8 +17915,8 @@ example, @code{"a\qb"} is treated as @code{"aqb"}.
At the runtime level, the various functions handle sequences of
@samp{\} and @samp{&} differently. The situation is (sadly) somewhat complex.
-Historically, the @code{sub()} and @code{gsub()} functions treated the two
-character sequence @samp{\&} specially; this sequence was replaced in
+Historically, the @code{sub()} and @code{gsub()} functions treated the
+two-character sequence @samp{\&} specially; this sequence was replaced in
the generated text with a single @samp{&}. Any other @samp{\} within
the @var{replacement} string that did not precede an @samp{&} was passed
through unchanged. This is illustrated in @ref{table-sub-escapes}.
@@ -17956,7 +17974,7 @@ _bigskip}
@end float
@noindent
-This table shows both the lexical-level processing, where
+This table shows the lexical-level processing, where
an odd number of backslashes becomes an even number at the runtime level,
as well as the runtime processing done by @code{sub()}.
(For the sake of simplicity, the rest of the following tables only show the
@@ -17977,7 +17995,7 @@ This is shown in
@ref{table-sub-proposed}.
@float Table,table-sub-proposed
-@caption{GNU @command{awk} rules for @code{sub()} and backslash}
+@caption{@command{gawk} rules for @code{sub()} and backslash}
@tex
\vbox{\bigskip
% We need more characters for escape and tab ...
@@ -18022,7 +18040,7 @@ _bigskip}
@end float
In a nutshell, at the runtime level, there are now three special sequences
-of characters (@samp{\\\&}, @samp{\\&} and @samp{\&}) whereas historically
+of characters (@samp{\\\&}, @samp{\\&}, and @samp{\&}) whereas historically
there was only one. However, as in the historical case, any @samp{\} that
is not part of one of these three sequences is not special and appears
in the output literally.
@@ -18088,7 +18106,7 @@ The only case where the difference is noticeable is the last one: @samp{\\\\}
is seen as @samp{\\} and produces @samp{\} instead of @samp{\\}.
Starting with @value{PVERSION} 3.1.4, @command{gawk} followed the POSIX rules
-when @option{--posix} is specified (@pxref{Options}). Otherwise,
+when @option{--posix} was specified (@pxref{Options}). Otherwise,
it continued to follow the proposed rules, as
that had been its behavior for many years.
@@ -18156,7 +18174,7 @@ _bigskip}
@end ifnottex
@end float
-Because of the complexity of the lexical and runtime level processing
+Because of the complexity of the lexical- and runtime-level processing
and the special cases for @code{sub()} and @code{gsub()},
we recommend the use of @command{gawk} and @code{gensub()} when you have
to do substitutions.
@@ -18182,6 +18200,7 @@ for more information.
When closing a coprocess, it is occasionally useful to first close
one end of the two-way pipe and then to close the other. This is done
by providing a second argument to @code{close()}. This second argument
+(@var{how})
should be one of the two string values @code{"to"} or @code{"from"},
indicating which end of the pipe to close. Case in the string does
not matter.
@@ -18208,7 +18227,7 @@ every little bit of information as soon as it is ready. However, sometimes
it is necessary to force a program to @dfn{flush} its buffers (i.e.,
write the information to its destination, even if a buffer is not full).
This is the purpose of the @code{fflush()} function---@command{gawk} also
-buffers its output and the @code{fflush()} function forces
+buffers its output, and the @code{fflush()} function forces
@command{gawk} to flush its buffers.
@cindex extensions, common@comma{} @code{fflush()} function
@@ -18229,7 +18248,7 @@ would flush only the standard output if there was no argument,
and flush all output files and pipes if the argument was the null
string. This was changed in order to be compatible with Brian
Kernighan's @command{awk}, in the hope that standardizing this
-feature in POSIX would then be easier (which indeed helped).
+feature in POSIX would then be easier (which indeed proved to be the case).
With @command{gawk},
you can use @samp{fflush("/dev/stdout")} if you wish to flush
@@ -18240,7 +18259,7 @@ only the standard output.
@c @cindex warnings, automatic
@cindex troubleshooting, @code{fflush()} function
@code{fflush()} returns zero if the buffer is successfully flushed;
-otherwise, it returns non-zero. (@command{gawk} returns @minus{}1.)
+otherwise, it returns a nonzero value. (@command{gawk} returns @minus{}1.)
In the case where all buffers are flushed, the return value is zero
only if all buffers were flushed successfully. Otherwise, it is
@minus{}1, and @command{gawk} warns about the problem @var{filename}.
@@ -18258,8 +18277,8 @@ In such a case, @code{fflush()} returns @minus{}1, as well.
@cindex buffering, interactive vs.@: noninteractive
-As a side point, buffering issues can be even more confusing, depending
-upon whether your program is @dfn{interactive} (i.e., communicating
+As a side point, buffering issues can be even more confusing if
+your program is @dfn{interactive} (i.e., communicating
with a user sitting at a keyboard).@footnote{A program is interactive
if the standard output is connected to a terminal device. On modern
systems, this means your keyboard and screen.}
@@ -18309,8 +18328,8 @@ it is all buffered and sent down the pipe to @command{cat} in one shot.
@cindex buffering, interactive vs.@: noninteractive
-As a side point, buffering issues can be even more confusing, depending
-upon whether your program is @dfn{interactive} (i.e., communicating
+As a side point, buffering issues can be even more confusing if
+your program is @dfn{interactive} (i.e., communicating
with a user sitting at a keyboard).@footnote{A program is interactive
if the standard output is connected to a terminal device. On modern
systems, this means your keyboard and screen.}
@@ -18354,7 +18373,7 @@ it is all buffered and sent down the pipe to @command{cat} in one shot.
@cindexawkfunc{system}
@cindex invoke shell command
@cindex interacting with other programs
-Execute the operating-system
+Execute the operating system
command @var{command} and then return to the @command{awk} program.
Return @var{command}'s exit status.
@@ -18534,9 +18553,9 @@ you would see the latter (undesirable) output.
@cindex files, log@comma{} timestamps in
@cindex @command{gawk}, timestamps
@cindex POSIX @command{awk}, timestamps and
-@code{awk} programs are commonly used to process log files
+@command{awk} programs are commonly used to process log files
containing timestamp information, indicating when a
-particular log record was written. Many programs log their timestamp
+particular log record was written. Many programs log their timestamps
in the form returned by the @code{time()} system call, which is the
number of seconds since a particular epoch. On POSIX-compliant systems,
it is the number of seconds since
@@ -18597,7 +18616,7 @@ The values of these numbers need not be within the ranges specified;
for example, an hour of @minus{}1 means 1 hour before midnight.
The origin-zero Gregorian calendar is assumed, with year 0 preceding
year 1 and year @minus{}1 preceding year 0.
-The time is assumed to be in the local timezone.
+The time is assumed to be in the local time zone.
If the daylight-savings flag is positive, the time is assumed to be
daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), @code{mktime()} attempts to determine
@@ -18757,12 +18776,12 @@ Equivalent to specifying @samp{%H:%M:%S}.
The weekday as a decimal number (1--7). Monday is day one.
@item %U
-The week number of the year (the first Sunday as the first day of week one)
+The week number of the year (with the first Sunday as the first day of week one)
as a decimal number (00--53).
@c @cindex ISO 8601
@item %V
-The week number of the year (the first Monday as the first
+The week number of the year (with the first Monday as the first
day of week one) as a decimal number (01--53).
The method for determining the week number is as specified by ISO 8601.
(To wit: if the week containing January 1 has four or more days in the
@@ -18773,7 +18792,7 @@ and the next week is week one.)
The weekday as a decimal number (0--6). Sunday is day zero.
@item %W
-The week number of the year (the first Monday as the first day of week one)
+The week number of the year (with the first Monday as the first day of week one)
as a decimal number (00--53).
@item %x
@@ -18793,8 +18812,8 @@ The full year as a decimal number (e.g., 2015).
@c @cindex RFC 822
@c @cindex RFC 1036
@item %z
-The timezone offset in a +HHMM format (e.g., the format necessary to
-produce RFC 822/RFC 1036 date headers).
+The time zone offset in a @samp{+@var{HHMM}} format (e.g., the format
+necessary to produce RFC 822/RFC 1036 date headers).
@item %Z
The time zone name or abbreviation; no characters if
@@ -18934,7 +18953,7 @@ The operations are described in @ref{table-bitwise-ops}.
@ifnottex
@ifnotdocbook
@display
- Bit Operator
+ Bit operator
| AND | OR | XOR
|---+---+---+---+---+---
Operands | 0 | 1 | 0 | 1 | 0 | 1
@@ -18992,7 +19011,7 @@ Operands | 0 | 1 | 0 | 1 | 0 | 1
<tbody>
<row>
<entry colsep="0"></entry>
-<entry spanname="optitle"><emphasis role="bold">Bit Operator</emphasis></entry>
+<entry spanname="optitle"><emphasis role="bold">Bit operator</emphasis></entry>
</row>
<row rowsep="1">
@@ -19056,10 +19075,9 @@ of a given value.
Finally, two other common operations are to shift the bits left or right.
For example, if you have a bit string @samp{10111001} and you shift it
right by three bits, you end up with @samp{00010111}.@footnote{This example
-shows that 0's come in on the left side. For @command{gawk}, this is
+shows that zeros come in on the left side. For @command{gawk}, this is
always true, but in some languages, it's possible to have the left side
-fill with 1's.}
-@c Purposely decided to use 0's and 1's here. 2/2001.
+fill with ones.}
If you start over again with @samp{10111001} and shift it left by three
bits, you end up with @samp{11001000}. The following list describes
@command{gawk}'s built-in functions that implement the bitwise operations.
@@ -19113,7 +19131,7 @@ that illustrates the use of these functions:
@example
@group
@c file eg/lib/bits2str.awk
-# bits2str --- turn a byte into readable 1's and 0's
+# bits2str --- turn a byte into readable ones and zeros
function bits2str(bits, data, mask)
@{
@@ -19187,15 +19205,16 @@ $ @kbd{gawk -f testbits.awk}
@cindex converting, numbers to strings
@cindex number as string of bits
The @code{bits2str()} function turns a binary number into a string.
-The number @code{1} represents a binary value where the rightmost bit
-is set to 1. Using this mask,
+Initializing @code{mask} to one creates
+a binary value where the rightmost bit
+is set to one. Using this mask,
the function repeatedly checks the rightmost bit.
ANDing the mask with the value indicates whether the
-rightmost bit is 1 or not. If so, a @code{"1"} is concatenated onto the front
+rightmost bit is one or not. If so, a @code{"1"} is concatenated onto the front
of the string.
Otherwise, a @code{"0"} is added.
The value is then shifted right by one bit and the loop continues
-until there are no more 1 bits.
+until there are no more one bits.
If the initial value is zero, it returns a simple @code{"0"}.
Otherwise, at the end, it pads the value with zeros to represent multiples
@@ -19219,7 +19238,7 @@ that traverses every element of an array of arrays
@cindexgawkfunc{isarray}
@cindex scalar or array
@item isarray(@var{x})
-Return a true value if @var{x} is an array. Otherwise return false.
+Return a true value if @var{x} is an array. Otherwise, return false.
@end table
@code{isarray()} is meant for use in two circumstances. The first is when
@@ -19280,7 +19299,7 @@ The default value for @var{category} is @code{"LC_MESSAGES"}.
Return the plural form used for @var{number} of the
translation of @var{string1} and @var{string2} in text domain
@var{domain} for locale category @var{category}. @var{string1} is the
-English singular variant of a message, and @var{string2} the English plural
+English singular variant of a message, and @var{string2} is the English plural
variant of the same message.
The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
The default value for @var{category} is @code{"LC_MESSAGES"}.
@@ -19309,7 +19328,7 @@ them (i.e., to tell @command{awk} what they should do).
@subsection Function Definition Syntax
@quotation
-@i{It's entirely fair to say that the @command{awk} syntax for local
+@i{It's entirely fair to say that the awk syntax for local
variable definitions is appallingly awful.}
@author Brian Kernighan
@end quotation
@@ -19351,14 +19370,23 @@ the call.
A function cannot have two parameters with the same name, nor may it
have a parameter with the same name as the function itself.
-In addition, according to the POSIX standard, function parameters
+
+@quotation CAUTION
+According to the POSIX standard, function parameters
cannot have the same name as one of the special predefined variables
-(@pxref{Built-in Variables}). Not all versions of @command{awk} enforce
-this restriction.
+(@pxref{Built-in Variables}), nor may a function parameter have the
+same name as another function.
+
+Not all versions of @command{awk} enforce
+these restrictions.
+@command{gawk} always enforces the first restriction.
+With @option{--posix} (@pxref{Options}),
+it also enforces the second restriction.
+@end quotation
Local variables act like the empty string if referenced where a string
value is required, and like zero if referenced where a numeric value
-is required. This is the same as regular variables that have never been
+is required. This is the same as the behavior of regular variables that have never been
assigned a value. (There is more to understand about local variables;
@pxref{Dynamic Typing}.)
@@ -19392,7 +19420,7 @@ During execution of the function body, the arguments and local variable
values hide, or @dfn{shadow}, any variables of the same names used in the
rest of the program. The shadowed variables are not accessible in the
function definition, because there is no way to name them while their
-names have been taken away for the local variables. All other variables
+names have been taken away for the arguments and local variables. All other variables
used in the @command{awk} program can be referenced or set normally in the
function's body.
@@ -19459,7 +19487,7 @@ function myprint(num)
@end example
@noindent
-To illustrate, here is an @command{awk} rule that uses our @code{myprint}
+To illustrate, here is an @command{awk} rule that uses our @code{myprint()}
function:
@example
@@ -19500,13 +19528,13 @@ in an array and start over with a new list of elements
(@pxref{Delete}).
Instead of having
to repeat this loop everywhere that you need to clear out
-an array, your program can just call @code{delarray}.
+an array, your program can just call @code{delarray()}.
(This guarantees portability. The use of @samp{delete @var{array}} to delete
the contents of an entire array is a relatively recent@footnote{Late in 2012.}
addition to the POSIX standard.)
The following is an example of a recursive function. It takes a string
-as an input parameter and returns the string in backwards order.
+as an input parameter and returns the string in reverse order.
Recursive functions must always have a test that stops the recursion.
In this case, the recursion terminates when the input string is
already empty:
@@ -19603,7 +19631,7 @@ an error.
@cindex local variables, in a function
@cindex variables, local to a function
-Unlike many languages,
+Unlike in many languages,
there is no way to make a variable local to a @code{@{} @dots{} @code{@}} block in
@command{awk}, but you can make a variable local to a function. It is
good practice to do so whenever a variable is needed only in that
@@ -19612,7 +19640,7 @@ function.
To make a variable local to a function, simply declare the variable as
an argument after the actual function arguments
(@pxref{Definition Syntax}).
-Look at the following example where variable
+Look at the following example, where variable
@code{i} is a global variable used by both functions @code{foo()} and
@code{bar()}:
@@ -19653,7 +19681,7 @@ foo's i=3
top's i=3
@end example
-If you want @code{i} to be local to both @code{foo()} and @code{bar()} do as
+If you want @code{i} to be local to both @code{foo()} and @code{bar()}, do as
follows (the extra space before @code{i} is a coding convention to
indicate that @code{i} is a local variable, not an argument):
@@ -19741,7 +19769,7 @@ declare explicitly whether the arguments are passed @dfn{by value} or
@dfn{by reference}.
Instead, the passing convention is determined at runtime when
-the function is called according to the following rule:
+the function is called, according to the following rule:
if the argument is an array variable, then it is passed by reference.
Otherwise, the argument is passed by value.
@@ -19818,7 +19846,7 @@ prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because
@cindex undefined functions
@cindex functions, undefined
Some @command{awk} implementations allow you to call a function that
-has not been defined. They only report a problem at runtime when the
+has not been defined. They only report a problem at runtime, when the
program actually tries to call the function. For example:
@example
@@ -19877,15 +19905,15 @@ makes the returned value undefined, and therefore, unpredictable.
In practice, though, all versions of @command{awk} simply return the
null string, which acts like zero if used in a numeric context.
-A @code{return} statement with no value expression is assumed at the end of
-every function definition. So if control reaches the end of the function
-body, then technically, the function returns an unpredictable value.
+A @code{return} statement without an @var{expression} is assumed at the end of
+every function definition. So, if control reaches the end of the function
+body, then technically the function returns an unpredictable value.
In practice, it returns the empty string. @command{awk}
does @emph{not} warn you if you use the return value of such a function.
Sometimes, you want to write a function for what it does, not for
what it returns. Such a function corresponds to a @code{void} function
-in C, C++ or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not
+in C, C++, or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not
return any value; simply bear in mind that you should not be using the
return value of such a function.
@@ -20004,13 +20032,15 @@ function calls, you can specify the name of the function to call as a
string variable, and then call the function. Let's look at an example.
Suppose you have a file with your test scores for the classes you
-are taking. The first field is the class name. The following fields
+are taking, and
+you wish to get the sum and the average of
+your test scores.
+The first field is the class name. The following fields
are the functions to call to process the data, up to a ``marker''
field @samp{data:}. Following the marker, to the end of the record,
are the various numeric test scores.
-Here is the initial file; you wish to get the sum and the average of
-your test scores:
+Here is the initial file:
@example
@c file eg/data/class_data1
@@ -20093,9 +20123,9 @@ function sum(first, last, ret, i)
@c endfile
@end example
-These two functions expect to work on fields; thus the parameters
+These two functions expect to work on fields; thus, the parameters
@code{first} and @code{last} indicate where in the fields to start and end.
-Otherwise they perform the expected computations and are not unusual:
+Otherwise, they perform the expected computations and are not unusual:
@example
@c file eg/prog/indirectcall.awk
@@ -20154,8 +20184,8 @@ The ability to use indirect function calls is more powerful than you may
think at first. The C and C++ languages provide ``function pointers,'' which
are a mechanism for calling a function chosen at runtime. One of the most
well-known uses of this ability is the C @code{qsort()} function, which sorts
-an array using the famous ``quick sort'' algorithm
-(see @uref{http://en.wikipedia.org/wiki/Quick_sort, the Wikipedia article}
+an array using the famous ``quicksort'' algorithm
+(see @uref{http://en.wikipedia.org/wiki/Quicksort, the Wikipedia article}
for more information). To use this function, you supply a pointer to a comparison
function. This mechanism allows you to sort arbitrary data in an arbitrary
fashion.
@@ -20174,11 +20204,11 @@ We can do something similar using @command{gawk}, like this:
# January 2009
@c endfile
-
@end ignore
@c file eg/lib/quicksort.awk
-# quicksort --- C.A.R. Hoare's quick sort algorithm. See Wikipedia
-# or almost any algorithms or computer science text
+
+# quicksort --- C.A.R. Hoare's quicksort algorithm. See Wikipedia
+# or almost any algorithms or computer science text.
@c endfile
@ignore
@c file eg/lib/quicksort.awk
@@ -20216,7 +20246,7 @@ function quicksort_swap(data, i, j, temp)
The @code{quicksort()} function receives the @code{data} array, the starting and ending
indices to sort (@code{left} and @code{right}), and the name of a function that
-performs a ``less than'' comparison. It then implements the quick sort algorithm.
+performs a ``less than'' comparison. It then implements the quicksort algorithm.
To make use of the sorting function, we return to our previous example. The
first thing to do is write some comparison functions:
@@ -20407,7 +20437,7 @@ for (i = 1; i <= n; i++)
@end example
@noindent
-@code{gawk} looks up the actual function to call only once.
+@command{gawk} looks up the actual function to call only once.
@node Functions Summary
@section Summary
@@ -20503,7 +20533,7 @@ It contains the following chapters:
your own @command{awk} functions. Writing functions is important, because
it allows you to encapsulate algorithms and program tasks in a single
place. It simplifies programming, making program development more
-manageable, and making programs more readable.
+manageable and making programs more readable.
@cindex Kernighan, Brian
@cindex Plauger, P.J.@:
@@ -20632,7 +20662,7 @@ often use variable names like these for their own purposes.
The example programs shown in this @value{CHAPTER} all start the names of their
private variables with an underscore (@samp{_}). Users generally don't use
leading underscores in their variable names, so this convention immediately
-decreases the chances that the variable name will be accidentally shared
+decreases the chances that the variable names will be accidentally shared
with the user's program.
@cindex @code{_} (underscore), in names of private variables
@@ -20650,8 +20680,8 @@ show how our own @command{awk} programming style has evolved and to
provide some basis for this discussion.}
As a final note on variable naming, if a function makes global variables
-available for use by a main program, it is a good convention to start that
-variable's name with a capital letter---for
+available for use by a main program, it is a good convention to start those
+variables' names with a capital letter---for
example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables
(@pxref{Getopt Function}).
The leading capital letter indicates that it is global, while the fact that
@@ -20662,7 +20692,7 @@ not one of @command{awk}'s predefined variables, such as @code{FS}.
It is also important that @emph{all} variables in library
functions that do not need to save state are, in fact, declared
local.@footnote{@command{gawk}'s @option{--dump-variables} command-line
-option is useful for verifying this.} If this is not done, the variable
+option is useful for verifying this.} If this is not done, the variables
could accidentally be used in the user's program, leading to bugs that
are very difficult to track down:
@@ -20860,7 +20890,7 @@ Following is the function:
@example
@c file eg/lib/assert.awk
-# assert --- assert that a condition is true. Otherwise exit.
+# assert --- assert that a condition is true. Otherwise, exit.
@c endfile
@ignore
@@ -20896,7 +20926,7 @@ is false, it prints a message to standard error, using the @code{string}
parameter to describe the failed condition. It then sets the variable
@code{_assert_exit} to one and executes the @code{exit} statement.
The @code{exit} statement jumps to the @code{END} rule. If the @code{END}
-rules finds @code{_assert_exit} to be true, it exits immediately.
+rule finds @code{_assert_exit} to be true, it exits immediately.
The purpose of the test in the @code{END} rule is to
keep any other @code{END} rules from running. When an assertion fails, the
@@ -21188,7 +21218,7 @@ all the strings in an array into one long string. The following function,
the application programs
(@pxref{Sample Programs}).
-Good function design is important; this function needs to be general but it
+Good function design is important; this function needs to be general, but it
should also have a reasonable default behavior. It is called with an array
as well as the beginning and ending indices of the elements in the array to be
merged. This assumes that the array indices are numeric---a reasonable
@@ -21336,7 +21366,7 @@ allowed the user to supply an optional timestamp value to use instead
of the current time.
@node Readfile Function
-@subsection Reading a Whole File At Once
+@subsection Reading a Whole File at Once
Often, it is convenient to have the entire contents of a file available
in memory as a single string. A straightforward but naive way to
@@ -21393,13 +21423,13 @@ function readfile(file, tmp, save_rs)
It works by setting @code{RS} to @samp{^$}, a regular expression that
will never match if the file has contents. @command{gawk} reads data from
-the file into @code{tmp} attempting to match @code{RS}. The match fails
+the file into @code{tmp}, attempting to match @code{RS}. The match fails
after each read, but fails quickly, such that @command{gawk} fills
@code{tmp} with the entire contents of the file.
(@DBXREF{Records} for information on @code{RT} and @code{RS}.)
In the case that @code{file} is empty, the return value is the null
-string. Thus calling code may use something like:
+string. Thus, calling code may use something like:
@example
contents = readfile("/some/path")
@@ -21410,7 +21440,7 @@ if (length(contents) == 0)
This tests the result to see if it is empty or not. An equivalent
test would be @samp{contents == ""}.
-@xref{Extension Sample Readfile}, for an extension function that
+@DBXREF{Extension Sample Readfile} for an extension function that
also reads an entire file into memory.
@node Shell Quoting
@@ -21517,8 +21547,8 @@ The @code{BEGIN} and @code{END} rules are each executed exactly once, at
the beginning and end of your @command{awk} program, respectively
(@pxref{BEGIN/END}).
We (the @command{gawk} authors) once had a user who mistakenly thought that the
-@code{BEGIN} rule is executed at the beginning of each @value{DF} and the
-@code{END} rule is executed at the end of each @value{DF}.
+@code{BEGIN} rules were executed at the beginning of each @value{DF} and the
+@code{END} rules were executed at the end of each @value{DF}.
When informed
that this was not the case, the user requested that we add new special
@@ -21558,7 +21588,7 @@ END @{ endfile(FILENAME) @}
This file must be loaded before the user's ``main'' program, so that the
rule it supplies is executed first.
-This rule relies on @command{awk}'s @code{FILENAME} variable that
+This rule relies on @command{awk}'s @code{FILENAME} variable, which
automatically changes for each new @value{DF}. The current @value{FN} is
saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does
not equal @code{_oldfilename}, then a new @value{DF} is being processed and
@@ -21574,7 +21604,7 @@ first @value{DF}.
The program also supplies an @code{END} rule to do the final processing for
the last file. Because this @code{END} rule comes before any @code{END} rules
supplied in the ``main'' program, @code{endfile()} is called first. Once
-again the value of multiple @code{BEGIN} and @code{END} rules should be clear.
+again, the value of multiple @code{BEGIN} and @code{END} rules should be clear.
@cindex @code{beginfile()} user-defined function
@cindex @code{endfile()} user-defined function
@@ -21622,7 +21652,7 @@ how it simplifies writing the main program.
You are probably wondering, if @code{beginfile()} and @code{endfile()}
functions can do the job, why does @command{gawk} have
-@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
+@code{BEGINFILE} and @code{ENDFILE} patterns?
Good question. Normally, if @command{awk} cannot open a file, this
causes an immediate fatal error. In this case, there is no way for a
@@ -21631,6 +21661,7 @@ calling it relies on the file being open and at the first record. Thus,
the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
files that cannot be processed. @code{ENDFILE} exists for symmetry,
and because it provides an easy way to do per-file cleanup processing.
+For more information, refer to @ref{BEGINFILE/ENDFILE}.
@docbook
</sidebar>
@@ -21645,7 +21676,7 @@ and because it provides an easy way to do per-file cleanup processing.
You are probably wondering, if @code{beginfile()} and @code{endfile()}
functions can do the job, why does @command{gawk} have
-@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
+@code{BEGINFILE} and @code{ENDFILE} patterns?
Good question. Normally, if @command{awk} cannot open a file, this
causes an immediate fatal error. In this case, there is no way for a
@@ -21654,6 +21685,7 @@ calling it relies on the file being open and at the first record. Thus,
the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
files that cannot be processed. @code{ENDFILE} exists for symmetry,
and because it provides an easy way to do per-file cleanup processing.
+For more information, refer to @ref{BEGINFILE/ENDFILE}.
@end cartouche
@end ifnotdocbook
@@ -21661,7 +21693,7 @@ and because it provides an easy way to do per-file cleanup processing.
@subsection Rereading the Current File
@cindex files, reading
-Another request for a new built-in function was for a @code{rewind()}
+Another request for a new built-in function was for a
function that would make it possible to reread the current file.
The requesting user didn't want to have to use @code{getline}
(@pxref{Getline})
@@ -21670,7 +21702,7 @@ inside a loop.
However, as long as you are not in the @code{END} rule, it is
quite easy to arrange to immediately close the current input file
and then start over with it from the top.
-For lack of a better name, we'll call it @code{rewind()}:
+For lack of a better name, we'll call the function @code{rewind()}:
@cindex @code{rewind()} user-defined function
@example
@@ -21763,16 +21795,16 @@ See also @ref{ARGC and ARGV}.
Because @command{awk} variable names only allow the English letters,
the regular expression check purposely does not use character classes
such as @samp{[:alpha:]} and @samp{[:alnum:]}
-(@pxref{Bracket Expressions})
+(@pxref{Bracket Expressions}).
@node Empty Files
-@subsection Checking for Zero-length Files
+@subsection Checking for Zero-Length Files
All known @command{awk} implementations silently skip over zero-length files.
This is a by-product of @command{awk}'s implicit
read-a-record-and-match-against-the-rules loop: when @command{awk}
tries to read a record from an empty file, it immediately receives an
-end of file indication, closes the file, and proceeds on to the next
+end-of-file indication, closes the file, and proceeds on to the next
command-line @value{DF}, @emph{without} executing any user-level
@command{awk} program code.
@@ -21837,7 +21869,7 @@ Occasionally, you might not want @command{awk} to process command-line
variable assignments
(@pxref{Assignment Options}).
In particular, if you have a @value{FN} that contains an @samp{=} character,
-@command{awk} treats the @value{FN} as an assignment, and does not process it.
+@command{awk} treats the @value{FN} as an assignment and does not process it.
Some users have suggested an additional command-line option for @command{gawk}
to disable command-line assignments. However, some simple programming with
@@ -22199,8 +22231,8 @@ BEGIN @{
@c endfile
@end example
-The rest of the @code{BEGIN} rule is a simple test program. Here is the
-result of two sample runs of the test program:
+The rest of the @code{BEGIN} rule is a simple test program. Here are the
+results of two sample runs of the test program:
@example
$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x}
@@ -22258,7 +22290,7 @@ use @code{getopt()} to process their arguments.
The @code{PROCINFO} array
(@pxref{Built-in Variables})
provides access to the current user's real and effective user and group ID
-numbers, and if available, the user's supplementary group set.
+numbers, and, if available, the user's supplementary group set.
However, because these are numbers, they do not provide very useful
information to the average user. There needs to be some way to find the
user information associated with the user and group ID numbers. This
@@ -22278,7 +22310,7 @@ kept. Instead, it provides the @code{<pwd.h>} header file
and several C language subroutines for obtaining user information.
The primary function is @code{getpwent()}, for ``get password entry.''
The ``password'' comes from the original user database file,
-@file{/etc/passwd}, which stores user information, along with the
+@file{/etc/passwd}, which stores user information along with the
encrypted passwords (hence the name).
@cindex @command{pwcat} program
@@ -22377,7 +22409,7 @@ The user's encrypted password. This may not be available on some systems.
@item User-ID
The user's numeric user ID number.
-(On some systems, it's a C @code{long}, and not an @code{int}. Thus
+(On some systems, it's a C @code{long}, and not an @code{int}. Thus,
we cast it to @code{long} for all cases.)
@item Group-ID
@@ -22504,7 +22536,7 @@ The code that checks for using @code{FPAT}, using @code{using_fpat}
and @code{PROCINFO["FS"]}, is similar.
The main part of the function uses a loop to read database lines, split
-the line into fields, and then store the line into each array as necessary.
+the lines into fields, and then store the lines into each array as necessary.
When the loop is done, @code{@w{_pw_init()}} cleans up by closing the pipeline,
setting @code{@w{_pw_inited}} to one, and restoring @code{FS}
(and @code{FIELDWIDTHS} or @code{FPAT}
@@ -22721,7 +22753,7 @@ it is usually empty or set to @samp{*}.
@item Group ID Number
The group's numeric group ID number;
the association of name to number must be unique within the file.
-(On some systems it's a C @code{long}, and not an @code{int}. Thus
+(On some systems it's a C @code{long}, and not an @code{int}. Thus,
we cast it to @code{long} for all cases.)
@item Group Member List
@@ -22835,32 +22867,32 @@ The @code{@w{_gr_init()}} function first saves @code{FS},
@code{$0}, and then sets @code{FS} and @code{RS} to the correct values for
scanning the group information.
It also takes care to note whether @code{FIELDWIDTHS} or @code{FPAT}
-is being used, and to restore the appropriate field splitting mechanism.
+is being used, and to restore the appropriate field-splitting mechanism.
-The group information is stored is several associative arrays.
+The group information is stored in several associative arrays.
The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number
(@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}).
There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}),
which is a space-separated list of groups to which each user belongs.
-Unlike the user database, it is possible to have multiple records in the
+Unlike in the user database, it is possible to have multiple records in the
database for the same group. This is common when a group has a large number
of members. A pair of such entries might look like the following:
@example
-tvpeople:*:101:johny,jay,arsenio
+tvpeople:*:101:johnny,jay,arsenio
tvpeople:*:101:david,conan,tom,joan
@end example
For this reason, @code{_gr_init()} looks to see if a group name or
-group ID number is already seen. If it is, the usernames are
-simply concatenated onto the previous list of users.@footnote{There is actually a
+group ID number has already been seen. If so, the usernames are
+simply concatenated onto the previous list of users.@footnote{There is a
subtle problem with the code just presented. Suppose that
the first time there were no names. This code adds the names with
a leading comma. It also doesn't check that there is a @code{$4}.}
Finally, @code{_gr_init()} closes the pipeline to @command{grcat}, restores
-@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} if necessary), @code{RS}, and @code{$0},
+@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT}, if necessary), @code{RS}, and @code{$0},
initializes @code{_gr_count} to zero
(it is used later), and makes @code{_gr_inited} nonzero.
@@ -22960,12 +22992,12 @@ uses these functions.
@DBREF{Arrays of Arrays} described how @command{gawk}
provides arrays of arrays. In particular, any element of
-an array may be either a scalar, or another array. The
+an array may be either a scalar or another array. The
@code{isarray()} function (@pxref{Type Functions})
lets you distinguish an array
from a scalar.
The following function, @code{walk_array()}, recursively traverses
-an array, printing each element's indices and value.
+an array, printing the element indices and values.
You call it with the array and a string representing the name
of the array:
@@ -23037,24 +23069,24 @@ The functions presented here fit into the following categories:
@c nested list
@table @asis
@item General problems
-Number-to-string conversion, assertions, rounding, random number
+Number-to-string conversion, testing assertions, rounding, random number
generation, converting characters to numbers, joining strings, getting
easily usable time-of-day information, and reading a whole file in
-one shot.
+one shot
@item Managing @value{DF}s
Noting @value{DF} boundaries, rereading the current file, checking for
readable files, checking for zero-length files, and treating assignments
-as @value{FN}s.
+as @value{FN}s
@item Processing command-line options
-An @command{awk} version of the standard C @code{getopt()} function.
+An @command{awk} version of the standard C @code{getopt()} function
@item Reading the user and group databases
-Two sets of routines that parallel the C library versions.
+Two sets of routines that parallel the C library versions
@item Traversing arrays of arrays
-A simple function to traverse an array of arrays to any depth.
+A simple function to traverse an array of arrays to any depth
@end table
@c end nested list
@@ -23149,10 +23181,10 @@ in this @value{CHAPTER}.
The second presents @command{awk}
versions of several common POSIX utilities.
These are programs that you are hopefully already familiar with,
-and therefore, whose problems are understood.
+and therefore whose problems are understood.
By reimplementing these programs in @command{awk},
you can focus on the @command{awk}-related aspects of solving
-the programming problem.
+the programming problems.
The third is a grab bag of interesting programs.
These solve a number of different data-manipulation and management
@@ -23212,7 +23244,7 @@ It should be noted that these programs are not necessarily intended to
replace the installed versions on your system.
Nor may all of these programs be fully compliant with the most recent
POSIX standard. This is not a problem; their
-purpose is to illustrate @command{awk} language programming for ``real world''
+purpose is to illustrate @command{awk} language programming for ``real-world''
tasks.
The programs are presented in alphabetical order.
@@ -23241,7 +23273,7 @@ but you may supply a command-line option to change the field
@dfn{delimiter} (i.e., the field-separator character). @command{cut}'s
definition of fields is less general than @command{awk}'s.
-A common use of @command{cut} might be to pull out just the login name of
+A common use of @command{cut} might be to pull out just the login names of
logged-on users from the output of @command{who}. For example, the following
pipeline generates a sorted, unique list of the logged-on users:
@@ -23750,7 +23782,7 @@ successful or unsuccessful match. If the line does not match, the
@code{next} statement just moves on to the next record.
A number of additional tests are made, but they are only done if we
-are not counting lines. First, if the user only wants exit status
+are not counting lines. First, if the user only wants the exit status
(@code{no_print} is true), then it is enough to know that @emph{one}
line in this file matched, and we can skip on to the next file with
@code{nextfile}. Similarly, if we are only printing @value{FN}s, we can
@@ -23791,7 +23823,7 @@ if necessary:
@end example
The @code{END} rule takes care of producing the correct exit status. If
-there are no matches, the exit status is one; otherwise it is zero:
+there are no matches, the exit status is one; otherwise, it is zero:
@example
@c file eg/prog/egrep.awk
@@ -23843,7 +23875,8 @@ Here is a simple version of @command{id} written in @command{awk}.
It uses the user database library functions
(@pxref{Passwd Functions})
and the group database library functions
-(@pxref{Group Functions}):
+(@pxref{Group Functions})
+from @ref{Library Functions}.
The program is fairly straightforward. All the work is done in the
@code{BEGIN} rule. The user and group ID numbers are obtained from
@@ -23970,8 +24003,8 @@ By default,
the output files are named @file{xaa}, @file{xab}, and so on. Each file has
1,000 lines in it, with the likely exception of the last file. To change the
number of lines in each file, supply a number on the command line
-preceded with a minus (e.g., @samp{-500} for files with 500 lines in them
-instead of 1,000). To change the name of the output files to something like
+preceded with a minus sign (e.g., @samp{-500} for files with 500 lines in them
+instead of 1,000). To change the names of the output files to something like
@file{myfileaa}, @file{myfileab}, and so on, supply an additional
argument that specifies the @value{FN} prefix.
@@ -24810,7 +24843,7 @@ checking and setting of defaults: the delay, the count, and the message to
print. If the user supplied a message without the ASCII BEL
character (known as the ``alert'' character, @code{"\a"}), then it is added to
the message. (On many systems, printing the ASCII BEL generates an
-audible alert. Thus when the alarm goes off, the system calls attention
+audible alert. Thus, when the alarm goes off, the system calls attention
to itself in case the user is not looking at the computer.)
Just for a change, this program uses a @code{switch} statement
(@pxref{Switch Statement}), but the processing could be done with a series of
@@ -24979,7 +25012,7 @@ to @command{gawk}.
@c at least theoretically
The following program was written to
prove that character transliteration could be done with a user-level
-function. This program is not as complete as the system @command{tr} utility
+function. This program is not as complete as the system @command{tr} utility,
but it does most of the job.
The @command{translate} program was written long before @command{gawk}
@@ -24991,13 +25024,13 @@ takes three arguments:
@table @code
@item from
-A list of characters from which to translate.
+A list of characters from which to translate
@item to
-A list of characters to which to translate.
+A list of characters to which to translate
@item target
-The string on which to do the translation.
+The string on which to do the translation
@end table
Associative arrays make the translation part fairly easy. @code{t_ar} holds
@@ -25006,7 +25039,7 @@ loop goes through @code{from}, one character at a time. For each character
in @code{from}, if the character appears in @code{target},
it is replaced with the corresponding @code{to} character.
-The @code{translate()} function calls @code{stranslate()} using @code{$0}
+The @code{translate()} function calls @code{stranslate()}, using @code{$0}
as the target. The main program sets two global variables, @code{FROM} and
@code{TO}, from the command line, and then changes @code{ARGV} so that
@command{awk} reads from the standard input.
@@ -25028,7 +25061,7 @@ Finally, the processing rule simply calls @code{translate()} for each record:
@c endfile
@end ignore
@c file eg/prog/translate.awk
-# Bugs: does not handle things like: tr A-Z a-z, it has
+# Bugs: does not handle things like tr A-Z a-z; it has
# to be spelled out. However, if `to' is shorter than `from',
# the last character in `to' is used for the rest of `from'.
@@ -25104,7 +25137,7 @@ for inspiration.
@cindex printing, mailing labels
@cindex mailing labels@comma{} printing
-Here is a ``real world''@footnote{``Real world'' is defined as
+Here is a ``real-world''@footnote{``Real world'' is defined as
``a program actually used to get something done.''}
program. This
script reads lists of names and
@@ -25113,7 +25146,7 @@ on it, two across and 10 down. The addresses are guaranteed to be no more
than five lines of data. Each address is separated from the next by a blank
line.
-The basic idea is to read 20 labels worth of data. Each line of each label
+The basic idea is to read 20 labels' worth of data. Each line of each label
is stored in the @code{line} array. The single rule takes care of filling
the @code{line} array and printing the page when 20 labels have been read.
@@ -25136,12 +25169,12 @@ of lines on the page
Most of the work is done in the @code{printpage()} function.
The label lines are stored sequentially in the @code{line} array. But they
-have to print horizontally; @code{line[1]} next to @code{line[6]},
+have to print horizontally: @code{line[1]} next to @code{line[6]},
@code{line[2]} next to @code{line[7]}, and so on. Two loops
accomplish this. The outer loop, controlled by @code{i}, steps through
every 10 lines of data; this is each row of labels. The inner loop,
controlled by @code{j}, goes through the lines within the row.
-As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in
+As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}th line in
the row, and @samp{i+j+5} is the entry next to it. The output ends up
looking something like this:
@@ -25259,8 +25292,8 @@ END @{
@}
@end example
-The program relies on @command{awk}'s default field splitting
-mechanism to break each line up into ``words,'' and uses an
+The program relies on @command{awk}'s default field-splitting
+mechanism to break each line up into ``words'' and uses an
associative array named @code{freq}, indexed by each word, to count
the number of times the word occurs. In the @code{END} rule,
it prints the counts.
@@ -25365,7 +25398,7 @@ to use the @command{sort} program.
@cindex lines, duplicate@comma{} removing
The @command{uniq} program
-(@pxref{Uniq Program}),
+(@pxref{Uniq Program})
removes duplicate lines from @emph{sorted} data.
Suppose, however, you need to remove duplicate lines from a @value{DF} but
@@ -25452,7 +25485,7 @@ Texinfo input file into separate files.
@cindex Texinfo
This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo},
-the GNU project's document formatting language.
+the GNU Project's document formatting language.
A single Texinfo source file can be used to produce both
printed documentation, with @TeX{}, and online documentation.
@ifnotinfo
@@ -25511,7 +25544,7 @@ The Texinfo file looks something like this:
@example
@dots{}
-This program has a @@code@{BEGIN@} rule,
+This program has a @@code@{BEGIN@} rule
that prints a nice message:
@@example
@@ -25540,7 +25573,7 @@ exits with a zero exit status, signifying OK:
@cindex @code{extract.awk} program
@example
@c file eg/prog/extract.awk
-# extract.awk --- extract files and run programs from texinfo files
+# extract.awk --- extract files and run programs from Texinfo files
@c endfile
@ignore
@c file eg/prog/extract.awk
@@ -25581,12 +25614,12 @@ The second rule handles moving data into files. It verifies that a
@value{FN} is given in the directive. If the file named is not the
current file, then the current file is closed. Keeping the current file
open until a new file is encountered allows the use of the @samp{>}
-redirection for printing the contents, keeping open file management
+redirection for printing the contents, keeping open-file management
simple.
The @code{for} loop does the work. It reads lines using @code{getline}
(@pxref{Getline}).
-For an unexpected end of file, it calls the @code{@w{unexpected_eof()}}
+For an unexpected end-of-file, it calls the @code{@w{unexpected_eof()}}
function. If the line is an ``endfile'' line, then it breaks out of
the loop.
If the line is an @samp{@@group} or @samp{@@end group} line, then it
@@ -25688,7 +25721,7 @@ END @{
@cindex @command{sed} utility
@cindex stream editors
-The @command{sed} utility is a stream editor, a program that reads a
+The @command{sed} utility is a @dfn{stream editor}, a program that reads a
stream of data, makes changes to it, and passes it on.
It is often used to make global changes to a large file or to a stream
of data generated by a pipeline of commands.
@@ -25833,7 +25866,7 @@ includes don't accidentally include a library function twice.
@command{igawk} should behave just like @command{gawk} externally. This
means it should accept all of @command{gawk}'s command-line arguments,
including the ability to have multiple source files specified via
-@option{-f}, and the ability to mix command-line and library source files.
+@option{-f} and the ability to mix command-line and library source files.
The program is written using the POSIX Shell (@command{sh}) command
language.@footnote{Fully explaining the @command{sh} language is beyond
@@ -25872,7 +25905,7 @@ Run the expanded program with @command{gawk} and any other original command-line
arguments that the user supplied (such as the @value{DF} names).
@end enumerate
-This program uses shell variables extensively: for storing command-line arguments,
+This program uses shell variables extensively: for storing command-line arguments and
the text of the @command{awk} program that will expand the user's program, for the
user's original program, and for the expanded program. Doing so removes some
potential problems that might arise were we to use temporary files instead,
@@ -26189,22 +26222,7 @@ Save the results of this processing in the shell variable
The last step is to call @command{gawk} with the expanded program,
along with the original
-options and command-line arguments that the user supplied.
-
-@c this causes more problems than it solves, so leave it out.
-@ignore
-The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk}
-to handle an interesting case. Suppose that the user's program only has
-a @code{BEGIN} rule and there are no @value{DF}s to read.
-The program should exit without reading any @value{DF}s.
-However, suppose that an included library file defines an @code{END}
-rule of its own. In this case, @command{gawk} will hang, reading standard
-input. In order to avoid this, @file{/dev/null} is explicitly added to the
-command line. Reading from @file{/dev/null} always returns an immediate
-end of file indication.
-
-@c Hmm. Add /dev/null if $# is 0? Still messes up ARGV. Sigh.
-@end ignore
+options and command-line arguments that the user supplied:
@example
@c file eg/prog/igawk.sh
@@ -26270,8 +26288,8 @@ the same letters
Column 2, Problem C, of Jon Bentley's @cite{Programming Pearls}, Second
Edition, presents an elegant algorithm. The idea is to give words that
are anagrams a common signature, sort all the words together by their
-signature, and then print them. Dr.@: Bentley observes that taking the
-letters in each word and sorting them produces that common signature.
+signatures, and then print them. Dr.@: Bentley observes that taking the
+letters in each word and sorting them produces those common signatures.
The following program uses arrays of arrays to bring together
words with the same signature and array sorting to print the words
@@ -26280,8 +26298,8 @@ in sorted order:
@cindex @code{anagram.awk} program
@example
@c file eg/prog/anagram.awk
-# anagram.awk --- An implementation of the anagram finding algorithm
-# from Jon Bentley's "Programming Pearls", 2nd edition.
+# anagram.awk --- An implementation of the anagram-finding algorithm
+# from Jon Bentley's "Programming Pearls," 2nd edition.
# Addison Wesley, 2000, ISBN 0-201-65788-0.
# Column 2, Problem C, section 2.8, pp 18-20.
@c endfile
@@ -26329,7 +26347,7 @@ sorts the letters, and then joins them back together:
@example
@c file eg/prog/anagram.awk
-# word2key --- split word apart into letters, sort, joining back together
+# word2key --- split word apart into letters, sort, and join back together
function word2key(word, a, i, n, result)
@{
@@ -26524,12 +26542,13 @@ characters. The ability to use @code{split()} with the empty string as
the separator can considerably simplify such tasks.
@item
-The library functions from @ref{Library Functions}, proved their
-usefulness for a number of real (if small) programs.
+The examples here demonstrate the usefulness of the library
+functions from @DBREF{Library Functions}
+for a number of real (if small) programs.
@item
Besides reinventing POSIX wheels, other programs solved a selection of
-interesting problems, such as finding duplicates words in text, printing
+interesting problems, such as finding duplicate words in text, printing
mailing labels, and finding anagrams.
@end itemize
@@ -26725,18 +26744,18 @@ a violent psychopath who knows where you live.}
This @value{CHAPTER} discusses advanced features in @command{gawk}.
It's a bit of a ``grab bag'' of items that are otherwise unrelated
to each other.
-First, a command-line option allows @command{gawk} to recognize
+First, we look at a command-line option that allows @command{gawk} to recognize
nondecimal numbers in input data, not just in @command{awk}
programs.
Then, @command{gawk}'s special features for sorting arrays are presented.
Next, two-way I/O, discussed briefly in earlier parts of this
@value{DOCUMENT}, is described in full detail, along with the basics
-of TCP/IP networking. Finally, @command{gawk}
+of TCP/IP networking. Finally, we see how @command{gawk}
can @dfn{profile} an @command{awk} program, making it possible to tune
it for performance.
@c FULLXREF ON
-A number of advanced features require separate @value{CHAPTER}s of their
+Additional advanced features are discussed in separate @value{CHAPTER}s of their
own:
@itemize @value{BULLET}
@@ -26830,7 +26849,8 @@ This option may disappear in a future version of @command{gawk}.
@node Array Sorting
@section Controlling Array Traversal and Array Sorting
-@command{gawk} lets you control the order in which a @samp{for (i in array)}
+@command{gawk} lets you control the order in which a
+@samp{for (@var{indx} in @var{array})}
loop traverses an array.
In addition, two built-in functions, @code{asort()} and @code{asorti()},
@@ -26846,7 +26866,7 @@ to order the elements during sorting.
@node Controlling Array Traversal
@subsection Controlling Array Traversal
-By default, the order in which a @samp{for (i in array)} loop
+By default, the order in which a @samp{for (@var{indx} in @var{array})} loop
scans an array is not defined; it is generally based upon
the internal implementation of arrays inside @command{awk}.
@@ -26875,23 +26895,23 @@ function comp_func(i1, v1, i2, v2)
@}
@end example
-Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
+Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2}
are the corresponding values of the two elements being compared.
-Either @var{v1} or @var{v2}, or both, can be arrays if the array being
+Either @code{v1} or @code{v2}, or both, can be arrays if the array being
traversed contains subarrays as values.
(@DBXREF{Arrays of Arrays} for more information about subarrays.)
The three possible return values are interpreted as follows:
@table @code
@item comp_func(i1, v1, i2, v2) < 0
-Index @var{i1} comes before index @var{i2} during loop traversal.
+Index @code{i1} comes before index @code{i2} during loop traversal.
@item comp_func(i1, v1, i2, v2) == 0
-Indices @var{i1} and @var{i2}
-come together but the relative order with respect to each other is undefined.
+Indices @code{i1} and @code{i2}
+come together, but the relative order with respect to each other is undefined.
@item comp_func(i1, v1, i2, v2) > 0
-Index @var{i1} comes after index @var{i2} during loop traversal.
+Index @code{i1} comes after index @code{i2} during loop traversal.
@end table
Our first comparison function can be used to scan an array in
@@ -27052,7 +27072,7 @@ As already mentioned, the order of the indices is arbitrary if two
elements compare equal. This is usually not a problem, but letting
the tied elements come out in arbitrary order can be an issue, especially
when comparing item values. The partial ordering of the equal elements
-may change the next time the array is traversed, if other elements are added or
+may change the next time the array is traversed, if other elements are added to or
removed from the array. One way to resolve ties when comparing elements
with otherwise equal values is to include the indices in the comparison
rules. Note that doing this may make the loop traversal less efficient,
@@ -27095,7 +27115,7 @@ equivalent or distinct.
Another point to keep in mind is that in the case of subarrays,
the element values can themselves be arrays; a production comparison
function should use the @code{isarray()} function
-(@pxref{Type Functions}),
+(@pxref{Type Functions})
to check for this, and choose a defined sorting order for subarrays.
All sorting based on @code{PROCINFO["sorted_in"]}
@@ -27103,7 +27123,7 @@ is disabled in POSIX mode,
because the @code{PROCINFO} array is not special in that case.
As a side note, sorting the array indices before traversing
-the array has been reported to add 15% to 20% overhead to the
+the array has been reported to add a 15% to 20% overhead to the
execution time of @command{awk} programs. For this reason,
sorted array traversal is not the default.
@@ -27162,7 +27182,7 @@ However, the @code{source} array is not affected.
Often, what's needed is to sort on the values of the @emph{indices}
instead of the values of the elements. To do that, use the
@code{asorti()} function. The interface and behavior are identical to
-that of @code{asort()}, except that the index values are used for sorting,
+that of @code{asort()}, except that the index values are used for sorting
and become the values of the result array:
@example
@@ -27197,8 +27217,8 @@ it chooses}, taking into account just the indices, just the values,
or both. This is extremely powerful.
Once the array is sorted, @code{asort()} takes the @emph{values} in
-their final order, and uses them to fill in the result array, whereas
-@code{asorti()} takes the @emph{indices} in their final order, and uses
+their final order and uses them to fill in the result array, whereas
+@code{asorti()} takes the @emph{indices} in their final order and uses
them to fill in the result array.
@cindex reference counting, sorting arrays
@@ -27495,7 +27515,7 @@ service name.
@cindex @command{gawk}, @code{ERRNO} variable in
@cindex @code{ERRNO} variable
@quotation NOTE
-Failure in opening a two-way socket will result in a non-fatal error
+Failure in opening a two-way socket will result in a nonfatal error
being returned to the calling code. The value of @code{ERRNO} indicates
the error (@pxref{Auto-set}).
@end quotation
@@ -27512,19 +27532,19 @@ BEGIN @{
@end example
This program reads the current date and time from the local system's
-TCP @samp{daytime} server.
+TCP @code{daytime} server.
It then prints the results and closes the connection.
Because this topic is extensive, the use of @command{gawk} for
TCP/IP programming is documented separately.
@ifinfo
See
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}},
+@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}},
@end ifinfo
@ifnotinfo
See
@uref{http://www.gnu.org/software/gawk/manual/gawkinet/,
-@cite{TCP/IP Internetworking with @command{gawk}}},
+@cite{@value{GAWKINETTITLE}}},
which comes as part of the @command{gawk} distribution,
@end ifnotinfo
for a much more complete introduction and discussion, as well as
@@ -27600,9 +27620,9 @@ junk
@end example
Here is the @file{awkprof.out} that results from running the
-@command{gawk} profiler on this program and data. (This example also
+@command{gawk} profiler on this program and data (this example also
illustrates that @command{awk} programmers sometimes get up very early
-in the morning to work.)
+in the morning to work):
@cindex @code{BEGIN} pattern, and profiling
@cindex @code{END} pattern, and profiling
@@ -27662,8 +27682,8 @@ They are as follows:
@item
The program is printed in the order @code{BEGIN} rules,
@code{BEGINFILE} rules,
-pattern/action rules,
-@code{ENDFILE} rules, @code{END} rules and functions, listed
+pattern--action rules,
+@code{ENDFILE} rules, @code{END} rules, and functions, listed
alphabetically.
Multiple @code{BEGIN} and @code{END} rules retain their
separate identities, as do
@@ -27671,7 +27691,7 @@ multiple @code{BEGINFILE} and @code{ENDFILE} rules.
@cindex patterns, counts, in a profile
@item
-Pattern-action rules have two counts.
+Pattern--action rules have two counts.
The first count, to the left of the rule, shows how many times
the rule's pattern was @emph{tested}.
The second count, to the right of the rule's opening left brace
@@ -27738,13 +27758,13 @@ the target of a redirection isn't a scalar, it gets parenthesized.
@command{gawk} supplies leading comments in
front of the @code{BEGIN} and @code{END} rules,
the @code{BEGINFILE} and @code{ENDFILE} rules,
-the pattern/action rules, and the functions.
+the pattern--action rules, and the functions.
@end itemize
The profiled version of your program may not look exactly like what you
typed when you wrote it. This is because @command{gawk} creates the
-profiled version by ``pretty printing'' its internal representation of
+profiled version by ``pretty-printing'' its internal representation of
the program. The advantage to this is that @command{gawk} can produce
a standard representation.
Also, things such as:
@@ -27827,16 +27847,16 @@ If you use the @code{HUP} signal instead of the @code{USR1} signal,
@cindex @code{SIGQUIT} signal (MS-Windows)
@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
When @command{gawk} runs on MS-Windows systems, it uses the
-@code{INT} and @code{QUIT} signals for producing the profile and, in
+@code{INT} and @code{QUIT} signals for producing the profile, and in
the case of the @code{INT} signal, @command{gawk} exits. This is
because these systems don't support the @command{kill} command, so the
only signals you can deliver to a program are those generated by the
keyboard. The @code{INT} signal is generated by the
-@kbd{Ctrl-@key{C}} or @kbd{Ctrl-@key{BREAK}} key, while the
-@code{QUIT} signal is generated by the @kbd{Ctrl-@key{\}} key.
+@kbd{Ctrl-c} or @kbd{Ctrl-BREAK} key, while the
+@code{QUIT} signal is generated by the @kbd{Ctrl-\} key.
Finally, @command{gawk} also accepts another option, @option{--pretty-print}.
-When called this way, @command{gawk} ``pretty prints'' the program into
+When called this way, @command{gawk} ``pretty-prints'' the program into
@file{awkprof.out}, without any execution counts.
@quotation NOTE
@@ -27890,7 +27910,7 @@ optionally, close off one side of the two-way communications.
@item
By using special @value{FN}s with the @samp{|&} operator, you can open a
-TCP/IP (or UDP/IP) connection to remote hosts in the Internet. @command{gawk}
+TCP/IP (or UDP/IP) connection to remote hosts on the Internet. @command{gawk}
supports both IPv4 and IPv6.
@item
@@ -27900,7 +27920,7 @@ you tune them more easily. Sending the @code{USR1} signal while profiling cause
@command{gawk} to dump the profile and keep going, including a function call stack.
@item
-You can also just ``pretty print'' the program. This currently also runs
+You can also just ``pretty-print'' the program. This currently also runs
the program, but that will change in the next major release.
@end itemize
@@ -31062,7 +31082,7 @@ Allowing completely alphabetic strings to have valid numeric
values is also a very severe departure from historical practice.
@end itemize
-The second problem is that the @code{gawk} maintainer feels that this
+The second problem is that the @command{gawk} maintainer feels that this
interpretation of the standard, which requires a certain amount of
``language lawyering'' to arrive at in the first place, was not even
intended by the standard developers. In other words, ``we see how you
@@ -31221,7 +31241,7 @@ When @option{--sandbox} is specified, extensions are disabled
* Finding Extensions:: How @command{gawk} finds compiled extensions.
* Extension Example:: Example C code for an extension.
* Extension Samples:: The sample extensions that ship with
- @code{gawk}.
+ @command{gawk}.
* gawkextlib:: The @code{gawkextlib} project.
* Extension summary:: Extension summary.
* Extension Exercises:: Exercises.
@@ -32185,7 +32205,7 @@ If the concept of a ``record terminator'' makes sense, then
@code{*rt_start} should be set to point to the data to be used for
@code{RT}, and @code{*rt_len} should be set to the length of the
data. Otherwise, @code{*rt_len} should be set to zero.
-@code{gawk} makes its own copy of this data, so the
+@command{gawk} makes its own copy of this data, so the
extension must manage this storage.
@end table
@@ -32231,7 +32251,7 @@ When writing an input parser, you should think about (and document)
how it is expected to interact with @command{awk} code. You may want
it to always be called, and take effect as appropriate (as the
@code{readdir} extension does). Or you may want it to take effect
-based upon the value of an @code{awk} variable, as the XML extension
+based upon the value of an @command{awk} variable, as the XML extension
from the @code{gawkextlib} project does (@pxref{gawkextlib}).
In the latter case, code in a @code{BEGINFILE} section
can look at @code{FILENAME} and @code{ERRNO} to decide whether or
@@ -33014,7 +33034,7 @@ converts it to a string. Using non-integral values is possible, but
requires that you understand how such values are converted to strings
(@pxref{Conversion}); thus using integral values is safest.
-As with @emph{all} strings passed into @code{gawk} from an extension,
+As with @emph{all} strings passed into @command{gawk} from an extension,
the string value of @code{index} must come from @code{gawk_malloc()},
@code{gawk_calloc()} or @code{gawk_realloc()}, and
@command{gawk} releases the storage.
@@ -35721,6 +35741,11 @@ The @code{isarray()} function to check if a variable is an array or not
The @code{bindtextdomain()}, @code{dcgettext()} and @code{dcngettext()}
functions for internationalization
(@pxref{Programmer i18n}).
+
+@item
+The @code{div()} function for doing integer
+division and remainder
+(@pxref{Numeric Functions}).
@end itemize
@item
@@ -35854,8 +35879,14 @@ Ultrix
@end itemize
@item
-@c FIXME: Verify the version here.
-Support for MirBSD was removed at @command{gawk} @value{PVERSION} 4.2.
+Support for the following systems was removed from the code
+for @command{gawk} @value{PVERSION} 4.2:
+
+@c nested table
+@itemize @value{MINUS}
+@item
+MirBSD
+@end itemize
@end itemize
@@ -36469,6 +36500,40 @@ with a minimum of two
The dynamic extension interface was completely redone
(@pxref{Dynamic Extensions}).
+@item
+Support for Ultrix was removed.
+
+@end itemize
+
+Version 4.2 introduced the following changes:
+
+@itemize @bullet
+@item
+Changes to @code{ENVIRON} are reflected into @command{gawk}'s
+environment and that of programs that it runs.
+@xref{Auto-set}.
+
+@item
+The @option{--pretty-print} option no longer runs the @command{awk}
+program too.
+@xref{Options}.
+
+@item
+The @command{igawk} program and its manual page are no longer
+installed when @command{gawk} is built.
+@xref{Igawk Program}.
+
+@item
+The @code{div()} function.
+@xref{Numeric Functions}.
+
+@item
+The maximum number of hexadecimal digits in @samp{\x} escapes
+is now two.
+@xref{Escape Sequences}.
+
+@item
+Support for MirBSD was removed.
@end itemize
@c XXX ADD MORE STUFF HERE
@@ -37116,10 +37181,10 @@ The generated Info file for this @value{DOCUMENT}.
@item doc/gawkinet.texi
The Texinfo source file for
@ifinfo
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}.
+@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}}.
@end ifinfo
@ifnotinfo
-@cite{TCP/IP Internetworking with @command{gawk}}.
+@cite{@value{GAWKINETTITLE}}.
@end ifnotinfo
It should be processed with @TeX{}
(via @command{texi2dvi} or @command{texi2pdf})
@@ -37128,7 +37193,7 @@ with @command{makeinfo} to produce an Info or HTML file.
@item doc/gawkinet.info
The generated Info file for
-@cite{TCP/IP Internetworking with @command{gawk}}.
+@cite{@value{GAWKINETTITLE}}.
@item doc/igawk.1
The @command{troff} source for a manual page describing the @command{igawk}
@@ -37367,7 +37432,7 @@ can be configured and compiled.
@cindex @option{--disable-lint} configuration option
@cindex configuration option, @code{--disable-lint}
@item --disable-lint
-Disable all lint checking within @code{gawk}. The
+Disable all lint checking within @command{gawk}. The
@option{--lint} and @option{--lint-old} options
(@pxref{Options})
are accepted, but silently do nothing.