Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 1641
1 files changed, 912 insertions, 729 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index cc215c6a..68d35876 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -51,10 +51,11 @@
@c applies to and all the info about who's publishing this edition
@c These apply across the board.
-@set UPDATE-MONTH September, 2014
+@set UPDATE-MONTH February, 2015
@set VERSION 4.1
@set PATCHLEVEL 2
+@set GAWKINETTITLE TCP/IP Internetworking with @command{gawk}
@ifset FOR_PRINT
@set TITLE Effective awk Programming
@end ifset
@@ -197,9 +198,9 @@
@ifclear FOR_PRINT
@set FN file name
-@set FFN File Name
+@set FFN File name
@set DF data file
-@set DDF Data File
+@set DDF Data file
@set PVERSION version
@end ifclear
@ifset FOR_PRINT
@@ -298,7 +299,7 @@ Fax: +1-617-542-2652
Email: <email>gnu@@gnu.org</email>
URL: <ulink url="http://www.gnu.org">http://www.gnu.org/</ulink></literallayout>
-<literallayout class="normal">Copyright &copy; 1989, 1991, 1992, 1993, 1996&ndash;2005, 2007, 2009&ndash;2014
+<literallayout class="normal">Copyright &copy; 1989, 1991, 1992, 1993, 1996&ndash;2005, 2007, 2009&ndash;2015
Free Software Foundation, Inc.
All Rights Reserved.</literallayout>
@end docbook
@@ -633,6 +634,7 @@ particular records in a file and perform operations upon them.
* Special Caveats:: Things to watch out for.
* Close Files And Pipes:: Closing Input and Output Files and
Pipes.
+* Nonfatal:: Enabling Nonfatal Output.
* Output Summary:: Output summary.
* Output Exercises:: Exercises.
* Values:: Constants, Variables, and Regular
@@ -1302,7 +1304,7 @@ October 2014
<affiliation><jobtitle>Nof Ayalon</jobtitle></affiliation>
<affiliation><jobtitle>Israel</jobtitle></affiliation>
</author>
- <date>December 2014</date>
+ <date>February 2015</date>
</prefaceinfo>
@end docbook
@@ -1498,7 +1500,7 @@ In May 1997, J@"urgen Kahrs felt the need for network access
from @command{awk}, and with a little help from me, set about adding
features to do this for @command{gawk}. At that time, he also
wrote the bulk of
-@cite{TCP/IP Internetworking with @command{gawk}}
+@cite{@value{GAWKINETTITLE}}
(a separate document, available as part of the @command{gawk} distribution).
His code finally became part of the main @command{gawk} distribution
with @command{gawk} @value{PVERSION} 3.1.
@@ -1521,7 +1523,7 @@ is often referred to as ``new @command{awk}.''
By analogy, the original version of @command{awk} is
referred to as ``old @command{awk}.''
-Today, on most systems, when you run the @command{awk} utility
+On most current systems, when you run the @command{awk} utility
you get some version of new @command{awk}.@footnote{Only
Solaris systems still use an old @command{awk} for the
default @command{awk} utility. A more modern @command{awk} lives in
@@ -1752,15 +1754,39 @@ and how to compile and use it on different
non-POSIX systems. It also describes how to report bugs
in @command{gawk} and where to get other freely
available @command{awk} implementations.
-@end itemize
@ifset FOR_PRINT
-@itemize @value{MINUS}
@item
@ref{Copying},
presents the license that covers the @command{gawk} source code.
+@end ifset
+
+@ifclear FOR_PRINT
+@item
+@ref{Notes},
+describes how to disable @command{gawk}'s extensions, as
+well as how to contribute new code to @command{gawk},
+and some possible future directions for @command{gawk} development.
+
+@item
+@ref{Basic Concepts},
+provides some very cursory background material for those who
+are completely unfamiliar with computer programming.
+
+The @ref{Glossary}, defines most, if not all, of the significant terms used
+throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with,
+try looking them up here.
+
+@item
+@ref{Copying}, and
+@ref{GNU Free Documentation License},
+present the licenses that cover the @command{gawk} source code
+and this @value{DOCUMENT}, respectively.
+@end ifclear
+@end itemize
@end itemize
+@ifset FOR_PRINT
The version of this @value{DOCUMENT} distributed with @command{gawk}
contains additional appendices and other end material.
To save space, we have omitted them from the
@@ -1798,32 +1824,6 @@ Some of the chapters have exercise sections; these have also been
omitted from the print edition but are available online.
@end ifset
-@ifclear FOR_PRINT
-@itemize @value{MINUS}
-@item
-@ref{Notes},
-describes how to disable @command{gawk}'s extensions, as
-well as how to contribute new code to @command{gawk},
-and some possible future directions for @command{gawk} development.
-
-@item
-@ref{Basic Concepts},
-provides some very cursory background material for those who
-are completely unfamiliar with computer programming.
-
-The @ref{Glossary}, defines most, if not all, of the significant terms used
-throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with,
-try looking them up here.
-
-@item
-@ref{Copying}, and
-@ref{GNU Free Documentation License},
-present the licenses that cover the @command{gawk} source code
-and this @value{DOCUMENT}, respectively.
-@end itemize
-@end ifclear
-@end itemize
-
@c FULLXREF OFF
@node Conventions
@@ -1865,15 +1865,23 @@ $ @kbd{echo hello on stderr 1>&2}
@end example
@ifnotinfo
-In the text, command names appear in @code{this font}, while code segments
+In the text, almost anything related to programming, such as
+command names,
+variable and function names, and string, numeric, and regexp constants,
+appears in @code{this font}. Code fragments
appear in the same font and quoted, @samp{like this}.
+Things that are replaced by the user or programmer
+appear in @var{this font}.
Options look like this: @option{-f}.
+@value{FFN}s are indicated like this: @file{/path/to/ourfile}.
+@ifclear FOR_PRINT
Some things are
emphasized @emph{like this}, and if a point needs to be made
-strongly, it is done @strong{like this}. The first occurrence of
+strongly, it is done @strong{like this}.
+@end ifclear
+The first occurrence of
a new term is usually its @dfn{definition} and appears in the same
font as the previous occurrence of ``definition'' in this sentence.
-Finally, @value{FN}s are indicated like this: @file{/path/to/ourfile}.
@end ifnotinfo
Characters that you type at the keyboard look @kbd{like this}. In particular,
@@ -2286,14 +2294,14 @@ which they raised and educated me.
Finally, I also must acknowledge my gratitude to G-d, for the many opportunities
He has sent my way, as well as for the gifts He has given me with which to
take advantage of those opportunities.
-@iftex
+@ifnotdocbook
@sp 2
@noindent
Arnold Robbins @*
Nof Ayalon @*
Israel @*
-December 2014
-@end iftex
+February 2015
+@end ifnotdocbook
@ifnotinfo
@part @value{PART1}The @command{awk} Language
@@ -4544,6 +4552,8 @@ wait for input before returning with an error.
Controls the number of times @command{gawk} attempts to
retry a two-way TCP/IP (socket) connection before giving up.
@xref{TCP/IP Networking}.
+Note that when nonfatal I/O is enabled (@pxref{Nonfatal}),
+@command{gawk} only tries to open a TCP/IP socket once.
@item POSIXLY_CORRECT
Causes @command{gawk} to switch to POSIX-compatibility
@@ -5180,13 +5190,12 @@ letters or numbers. @value{COMMONEXT}
@quotation CAUTION
In ISO C, the escape sequence continues until the first nonhexadecimal
digit is seen.
-@c FIXME: Add exact version here.
For many years, @command{gawk} would continue incorporating
hexadecimal digits into the value until a non-hexadecimal digit
or the end of the string was encountered.
However, using more than two hexadecimal digits produced
undefined results.
-As of @value{PVERSION} @strong{FIXME:} 4.3.0, only two digits
+As of @value{PVERSION} 4.2, only two digits
are processed.
@end quotation
@@ -7669,7 +7678,7 @@ variable @code{FIELDWIDTHS}. Each number specifies the width of the field,
@emph{including} columns between fields. If you want to ignore the columns
between fields, you can specify the width as a separate field that is
subsequently ignored.
-It is a fatal error to supply a field width that is not a positive number.
+It is a fatal error to supply a field width that has a negative value.
The following data is the output of the Unix @command{w} utility. It is useful
to illustrate the use of @code{FIELDWIDTHS}:
@@ -8967,6 +8976,7 @@ and discusses the @code{close()} built-in function.
@command{gawk} allows access to inherited file
descriptors.
* Close Files And Pipes:: Closing Input and Output Files and Pipes.
+* Nonfatal:: Enabling Nonfatal Output.
* Output Summary:: Output summary.
* Output Exercises:: Exercises.
@end menu
@@ -9445,12 +9455,12 @@ represent
spaces in the output. Here are the possible modifiers, in the order in
which they may appear:
-@table @code
+@table @asis
@cindex differences in @command{awk} and @command{gawk}, @code{print}/@code{printf} statements
@cindex @code{printf} statement, positional specifiers
@c the code{} does NOT start a secondary
@cindex positional specifiers, @code{printf} statement
-@item @var{N}$
+@item @code{@var{N}$}
An integer constant followed by a @samp{$} is a @dfn{positional specifier}.
Normally, format specifications are applied to arguments in the order
given in the format string. With a positional specifier, the format
@@ -9473,7 +9483,7 @@ messages at runtime.
which describes how and why to use positional specifiers.
For now, we ignore them.
-@item - @r{(Minus)}
+@item @code{-} (Minus)
The minus sign, used before the width modifier (see later on in
this list),
says to left-justify
@@ -9491,13 +9501,13 @@ prints @samp{foo@bullet{}}.
For numeric conversions, prefix positive values with a space and
negative values with a minus sign.
-@item +
+@item @code{+}
The plus sign, used before the width modifier (see later on in
this list),
says to always supply a sign for numeric conversions, even if the data
to format is positive. The @samp{+} overrides the space modifier.
-@item #
+@item @code{#}
Use an ``alternative form'' for certain control letters.
For @samp{%o}, supply a leading zero.
For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for
@@ -9506,14 +9516,14 @@ For @samp{%e}, @samp{%E}, @samp{%f}, and @samp{%F}, the result always
contains a decimal point.
For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result.
-@item 0
+@item @code{0}
A leading @samp{0} (zero) acts as a flag indicating that output should be
padded with zeros instead of spaces.
This applies only to the numeric output formats.
This flag only has an effect when the field width is wider than the
value to print.
-@item '
+@item @code{'}
A single quote or apostrophe character is a POSIX extension to ISO C.
It indicates that the integer part of a floating-point value, or the
entire part of an integer decimal value, should have a thousands-separator
@@ -9566,7 +9576,7 @@ prints @samp{foobar}.
Preceding the @var{width} with a minus sign causes the output to be
padded with spaces on the right, instead of on the left.
-@item .@var{prec}
+@item @code{.@var{prec}}
A period followed by an integer constant
specifies the precision to use when printing.
The meaning of the precision varies by control letter:
@@ -10473,6 +10483,71 @@ when closing a pipe.
@end ifnotdocbook
+@node Nonfatal
+@section Enabling Nonfatal Output
+
+This @value{SECTION} describes a @command{gawk}-specific feature.
+
+In standard @command{awk}, output with @code{print} or @code{printf}
+to a nonexistent file, or some other I/O error (such as filling up the
+disk), is a fatal error.
+
+@example
+$ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'}
+@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No such file or directory)
+@end example
+
+@command{gawk} makes it possible to detect that an error has
+occurred, allowing you to possibly recover from the error, or
+at least print an error message of your choosing before exiting.
+You can do this in one of two ways:
+
+@itemize @bullet
+@item
+For all output files, by assigning any value to @code{PROCINFO["NONFATAL"]}.
+
+@item
+On a per-file basis, by assigning any value to
+@code{PROCINFO[@var{filename}, "NONFATAL"]}.
+Here, @var{filename} is the name of the file to which
+you wish output to be nonfatal.
+@end itemize
+
+Once you have enabled nonfatal output, you must check @code{ERRNO}
+after every relevant @code{print} or @code{printf} statement to
+see if something went wrong. It is also a good idea to initialize
+@code{ERRNO} to zero before attempting the output. For example:
+
+@example
+$ @kbd{gawk '}
+> @kbd{BEGIN @{}
+> @kbd{ PROCINFO["NONFATAL"] = 1}
+> @kbd{ ERRNO = 0}
+> @kbd{ print "hi" > "/no/such/file"}
+> @kbd{ if (ERRNO) @{}
+> @kbd{ print("Output failed:", ERRNO) > "/dev/stderr"}
+> @kbd{ exit 1}
+> @kbd{ @}}
+> @kbd{@}'}
+@error{} Output failed: No such file or directory
+@end example
+
+Here, @command{gawk} did not produce a fatal error; instead,
+it let the @command{awk} program code detect the problem and handle it.
+
+This mechanism also works for standard output and standard error.
+For standard output, you may use @code{PROCINFO["-", "NONFATAL"]}
+or @code{PROCINFO["/dev/stdout", "NONFATAL"]}. For standard error, use
+@code{PROCINFO["/dev/stderr", "NONFATAL"]}.
+
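+To make output nonfatal for just one file, while leaving all other
+output errors fatal, use the per-file form instead. The following
+sketch assumes, as before, that @file{/no/such/file} does not exist.
+It stores the @value{FN} in a variable so that the @code{PROCINFO}
+subscript and the redirection are guaranteed to be the same string:
+
+@example
+$ @kbd{gawk '}
+> @kbd{BEGIN @{}
+> @kbd{    file = "/no/such/file"}
+> @kbd{    PROCINFO[file, "NONFATAL"] = 1}
+> @kbd{    ERRNO = 0}
+> @kbd{    print "hi" > file}
+> @kbd{    if (ERRNO) @{}
+> @kbd{        print("Output to", file, "failed:", ERRNO) > "/dev/stderr"}
+> @kbd{        exit 1}
+> @kbd{    @}}
+> @kbd{@}'}
+@error{} Output to /no/such/file failed: No such file or directory
+@end example
+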
+When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}),
+@command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES}
+environment variable (@pxref{Other Environment Variables}) allows you to
+override @command{gawk}'s built-in default number of attempts. However,
+once nonfatal I/O is enabled for a given socket, @command{gawk} only
+tries once, relying on @command{awk}-level code to notice that there
+was a problem.
+
@node Output Summary
@section Summary
@@ -10501,6 +10576,12 @@ Use @code{close()} to close open file, pipe, and coprocess redirections.
For coprocesses, it is possible to close only one direction of the
communications.
+@item
+Normally, output errors with @code{print} or @code{printf} are fatal.
+@command{gawk} lets you make output errors nonfatal either for
+all files or on a per-file basis. You must then check for errors
+after every relevant output statement.
+
@end itemize
@c EXCLUDE START
@@ -11815,6 +11896,7 @@ has the value four, but it changes the value of @code{foo} to five.
In other words, the operator returns the old value of the variable,
but with the side effect of incrementing it.
+@c FIXME: Use @sup here for superscript
The post-increment @samp{foo++} is nearly the same as writing @samp{(foo
+= 1) - 1}. It is not perfectly equivalent because all numbers in
@command{awk} are floating point---in floating point, @samp{foo + 1 - 1} does
@@ -12035,6 +12117,9 @@ the string constant @code{"0"} is actually true, because it is non-null.
@i{The Guide is definitive. Reality is frequently inaccurate.}
@author Douglas Adams, @cite{The Hitchhiker's Guide to the Galaxy}
@end quotation
+@c 2/2015: Antonio Colombo points out that this is really from
+@c The Restaurant at the End of the Universe. But I'm going to
+@c leave it alone.
@cindex comparison expressions
@cindex expressions, comparison
@@ -13205,6 +13290,7 @@ $ @kbd{awk '$1 ~ /li/ @{ print $2 @}' mail-list}
@cindex regexp constants, as patterns
@cindex patterns, regexp constants as
+A regexp constant as a pattern is also a special case of an expression
pattern. The expression @code{/li/} has the value one if @samp{li}
appears in the current input record. Thus, as a pattern, @code{/li/}
matches any record containing @samp{li}.
@@ -14156,12 +14242,12 @@ numbers:
# find smallest divisor of num
@{
num = $1
- for (div = 2; div * div <= num; div++) @{
- if (num % div == 0)
+ for (divisor = 2; divisor * divisor <= num; divisor++) @{
+ if (num % divisor == 0)
break
@}
- if (num % div == 0)
- printf "Smallest divisor of %d is %d\n", num, div
+ if (num % divisor == 0)
+ printf "Smallest divisor of %d is %d\n", num, divisor
else
printf "%d is prime\n", num
@}
@@ -14182,12 +14268,12 @@ an @code{if}:
# find smallest divisor of num
@{
num = $1
- for (div = 2; ; div++) @{
- if (num % div == 0) @{
- printf "Smallest divisor of %d is %d\n", num, div
+ for (divisor = 2; ; divisor++) @{
+ if (num % divisor == 0) @{
+ printf "Smallest divisor of %d is %d\n", num, divisor
break
@}
- if (div * div > num) @{
+ if (divisor * divisor > num) @{
printf "%d is prime\n", num
break
@}
@@ -14629,12 +14715,13 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment.
@cindex regular expressions, case sensitivity
@item IGNORECASE #
If @code{IGNORECASE} is nonzero or non-null, then all string comparisons
-and all regular expression matching are case-independent. Thus, regexp
-matching with @samp{~} and @samp{!~}, as well as the @code{gensub()},
-@code{gsub()}, @code{index()}, @code{match()}, @code{patsplit()},
-@code{split()}, and @code{sub()}
-functions, record termination with @code{RS}, and field splitting with
-@code{FS} and @code{FPAT}, all ignore case when doing their particular regexp operations.
+and all regular expression matching are case-independent.
+This applies to
+regexp matching with @samp{~} and @samp{!~},
+the @code{gensub()}, @code{gsub()}, @code{index()}, @code{match()},
+@code{patsplit()}, @code{split()}, and @code{sub()} functions,
+record termination with @code{RS}, and field splitting with
+@code{FS} and @code{FPAT}.
However, the value of @code{IGNORECASE} does @emph{not} affect array subscripting
and it does not affect field splitting when using a single-character
field separator.
@@ -15551,7 +15638,7 @@ In most other languages, arrays must be @dfn{declared} before use,
including a specification of
how many elements or components they contain. In such languages, the
declaration causes a contiguous block of memory to be allocated for that
-many elements. Usually, an index in the array must be a positive integer.
+many elements. Usually, an index in the array must be a nonnegative integer.
For example, the index zero specifies the first element in the array, which is
actually stored at the beginning of the block of memory. Index one
specifies the second element, which is stored in memory right after the
@@ -15731,7 +15818,7 @@ Now the array is @dfn{sparse}, which just means some indices are missing.
It has elements 0--3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or 9.
Another consequence of associative arrays is that the indices don't
-have to be positive integers. Any number, or even a string, can be
+have to be nonnegative integers. Any number, or even a string, can be
an index. For example, the following is an array that translates words from
English to French:
@@ -15994,7 +16081,7 @@ END @{
In programs that use arrays, it is often necessary to use a loop that
executes once for each element of an array. In other languages, where
-arrays are contiguous and indices are limited to positive integers,
+arrays are contiguous and indices are limited to nonnegative integers,
this is easy: all the valid indices can be found by counting from
the lowest index up to the highest. This technique won't do the job
in @command{awk}, because any number or string can be an array index.
@@ -17116,7 +17203,7 @@ for generating random numbers to the value @var{x}.
Each seed value leads to a particular sequence of random
numbers.@footnote{Computer-generated random numbers really are not truly
-random. They are technically known as ``pseudorandom.'' This means
+random. They are technically known as @dfn{pseudorandom}. This means
that although the numbers in a sequence appear to be random, you can in
fact generate the same sequence of random numbers over and over again.}
Thus, if the seed is set to the same value a second time,
@@ -18634,6 +18721,7 @@ which is sufficient to represent times through
2038-01-19 03:14:07 UTC. Many systems support a wider range of timestamps,
including negative timestamps that represent times before the
epoch.
+@c FIXME: Use @sup here for superscript
@cindex @command{date} utility, GNU
@cindex time, retrieving
@@ -19257,15 +19345,16 @@ $ @kbd{gawk -f testbits.awk}
@cindex converting, numbers to strings
@cindex number as string of bits
The @code{bits2str()} function turns a binary number into a string.
-The number @code{1} represents a binary value where the rightmost bit
-is set to 1. Using this mask,
+Initializing @code{mask} to one creates
+a binary value where the rightmost bit
+is set to one. Using this mask,
the function repeatedly checks the rightmost bit.
ANDing the mask with the value indicates whether the
-rightmost bit is 1 or not. If so, a @code{"1"} is concatenated onto the front
+rightmost bit is one or not. If so, a @code{"1"} is concatenated onto the front
of the string.
Otherwise, a @code{"0"} is added.
The value is then shifted right by one bit and the loop continues
-until there are no more 1 bits.
+until no more bits are set to one.
If the initial value is zero, it returns a simple @code{"0"}.
Otherwise, at the end, it pads the value with zeros to represent multiples
@@ -19289,7 +19378,7 @@ that traverses every element of an array of arrays
@cindexgawkfunc{isarray}
@cindex scalar or array
@item isarray(@var{x})
-Return a true value if @var{x} is an array. Otherwise return false.
+Return a true value if @var{x} is an array. Otherwise, return false.
@end table
@code{isarray()} is meant for use in two circumstances. The first is when
@@ -19350,7 +19439,7 @@ The default value for @var{category} is @code{"LC_MESSAGES"}.
Return the plural form used for @var{number} of the
translation of @var{string1} and @var{string2} in text domain
@var{domain} for locale category @var{category}. @var{string1} is the
-English singular variant of a message, and @var{string2} the English plural
+English singular variant of a message, and @var{string2} is the English plural
variant of the same message.
The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
The default value for @var{category} is @code{"LC_MESSAGES"}.
@@ -19379,7 +19468,7 @@ them (i.e., to tell @command{awk} what they should do).
@subsection Function Definition Syntax
@quotation
-@i{It's entirely fair to say that the @command{awk} syntax for local
+@i{It's entirely fair to say that the awk syntax for local
variable definitions is appallingly awful.}
@author Brian Kernighan
@end quotation
@@ -19437,7 +19526,7 @@ it also enforces the second restriction.
Local variables act like the empty string if referenced where a string
value is required, and like zero if referenced where a numeric value
-is required. This is the same as regular variables that have never been
+is required. This is the same as the behavior of regular variables that have never been
assigned a value. (There is more to understand about local variables;
@pxref{Dynamic Typing}.)
@@ -19471,7 +19560,7 @@ During execution of the function body, the arguments and local variable
values hide, or @dfn{shadow}, any variables of the same names used in the
rest of the program. The shadowed variables are not accessible in the
function definition, because there is no way to name them while their
-names have been taken away for the local variables. All other variables
+names have been taken away for the arguments and local variables. All other variables
used in the @command{awk} program can be referenced or set normally in the
function's body.
@@ -19538,7 +19627,7 @@ function myprint(num)
@end example
@noindent
-To illustrate, here is an @command{awk} rule that uses our @code{myprint}
+To illustrate, here is an @command{awk} rule that uses our @code{myprint()}
function:
@example
@@ -19579,13 +19668,13 @@ in an array and start over with a new list of elements
(@pxref{Delete}).
Instead of having
to repeat this loop everywhere that you need to clear out
-an array, your program can just call @code{delarray}.
+an array, your program can just call @code{delarray()}.
(This guarantees portability. The use of @samp{delete @var{array}} to delete
the contents of an entire array is a relatively recent@footnote{Late in 2012.}
addition to the POSIX standard.)
The following is an example of a recursive function. It takes a string
-as an input parameter and returns the string in backwards order.
+as an input parameter and returns the string in reverse order.
Recursive functions must always have a test that stops the recursion.
In this case, the recursion terminates when the input string is
already empty:
@@ -19682,7 +19771,7 @@ an error.
@cindex local variables, in a function
@cindex variables, local to a function
-Unlike many languages,
+Unlike in many languages,
there is no way to make a variable local to a @code{@{} @dots{} @code{@}} block in
@command{awk}, but you can make a variable local to a function. It is
good practice to do so whenever a variable is needed only in that
@@ -19691,7 +19780,7 @@ function.
To make a variable local to a function, simply declare the variable as
an argument after the actual function arguments
(@pxref{Definition Syntax}).
-Look at the following example where variable
+Look at the following example, where variable
@code{i} is a global variable used by both functions @code{foo()} and
@code{bar()}:
@@ -19732,7 +19821,7 @@ foo's i=3
top's i=3
@end example
-If you want @code{i} to be local to both @code{foo()} and @code{bar()} do as
+If you want @code{i} to be local to both @code{foo()} and @code{bar()}, do as
follows (the extra space before @code{i} is a coding convention to
indicate that @code{i} is a local variable, not an argument):
@@ -19820,7 +19909,7 @@ declare explicitly whether the arguments are passed @dfn{by value} or
@dfn{by reference}.
Instead, the passing convention is determined at runtime when
-the function is called according to the following rule:
+the function is called, according to the following rule:
if the argument is an array variable, then it is passed by reference.
Otherwise, the argument is passed by value.
@@ -19897,7 +19986,7 @@ prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because
@cindex undefined functions
@cindex functions, undefined
Some @command{awk} implementations allow you to call a function that
-has not been defined. They only report a problem at runtime when the
+has not been defined. They only report a problem at runtime, when the
program actually tries to call the function. For example:
@example
@@ -19956,15 +20045,15 @@ makes the returned value undefined, and therefore, unpredictable.
In practice, though, all versions of @command{awk} simply return the
null string, which acts like zero if used in a numeric context.
-A @code{return} statement with no value expression is assumed at the end of
-every function definition. So if control reaches the end of the function
-body, then technically, the function returns an unpredictable value.
+A @code{return} statement without an @var{expression} is assumed at the end of
+every function definition. So, if control reaches the end of the function
+body, then technically the function returns an unpredictable value.
In practice, it returns the empty string. @command{awk}
does @emph{not} warn you if you use the return value of such a function.
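+
+For example, in the following sketch the function @code{noret()}
+(a made-up name) has no @code{return} statement, yet the program
+still uses its result, both as a string and as a number. In
+@command{gawk}, the result behaves like an uninitialized variable:
+
+@example
+$ @kbd{gawk 'function noret() @{ @}}
+> @kbd{BEGIN @{ x = noret(); print "empty:", (x == ""), "numeric:", x + 0 @}'}
+@print{} empty: 1 numeric: 0
+@end example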
Sometimes, you want to write a function for what it does, not for
what it returns. Such a function corresponds to a @code{void} function
-in C, C++ or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not
+in C, C++, or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not
return any value; simply bear in mind that you should not be using the
return value of such a function.
@@ -20083,13 +20172,15 @@ function calls, you can specify the name of the function to call as a
string variable, and then call the function. Let's look at an example.
Suppose you have a file with your test scores for the classes you
-are taking. The first field is the class name. The following fields
+are taking, and
+you wish to get the sum and the average of
+your test scores.
+The first field is the class name. The following fields
are the functions to call to process the data, up to a ``marker''
field @samp{data:}. Following the marker, to the end of the record,
are the various numeric test scores.
-Here is the initial file; you wish to get the sum and the average of
-your test scores:
+Here is the initial file:
@example
@c file eg/data/class_data1
@@ -20172,9 +20263,9 @@ function sum(first, last, ret, i)
@c endfile
@end example
-These two functions expect to work on fields; thus the parameters
+These two functions expect to work on fields; thus, the parameters
@code{first} and @code{last} indicate where in the fields to start and end.
-Otherwise they perform the expected computations and are not unusual:
+Otherwise, they perform the expected computations and are not unusual:
@example
@c file eg/prog/indirectcall.awk
@@ -20233,8 +20324,8 @@ The ability to use indirect function calls is more powerful than you may
think at first. The C and C++ languages provide ``function pointers,'' which
are a mechanism for calling a function chosen at runtime. One of the most
well-known uses of this ability is the C @code{qsort()} function, which sorts
-an array using the famous ``quick sort'' algorithm
-(see @uref{http://en.wikipedia.org/wiki/Quick_sort, the Wikipedia article}
+an array using the famous ``quicksort'' algorithm
+(see @uref{http://en.wikipedia.org/wiki/Quicksort, the Wikipedia article}
for more information). To use this function, you supply a pointer to a comparison
function. This mechanism allows you to sort arbitrary data in an arbitrary
fashion.
@@ -20253,11 +20344,11 @@ We can do something similar using @command{gawk}, like this:
# January 2009
@c endfile
-
@end ignore
@c file eg/lib/quicksort.awk
-# quicksort --- C.A.R. Hoare's quick sort algorithm. See Wikipedia
-# or almost any algorithms or computer science text
+
+# quicksort --- C.A.R. Hoare's quicksort algorithm. See Wikipedia
+# or almost any algorithms or computer science text.
@c endfile
@ignore
@c file eg/lib/quicksort.awk
@@ -20295,7 +20386,7 @@ function quicksort_swap(data, i, j, temp)
The @code{quicksort()} function receives the @code{data} array, the starting and ending
indices to sort (@code{left} and @code{right}), and the name of a function that
-performs a ``less than'' comparison. It then implements the quick sort algorithm.
+performs a ``less than'' comparison. It then implements the quicksort algorithm.
To make use of the sorting function, we return to our previous example. The
first thing to do is write some comparison functions:
@@ -20406,67 +20497,7 @@ $ @kbd{gawk -f quicksort.awk -f indirectcall.awk class_data2}
@end example
Another example where indirect function calls are useful can be found in
-processing arrays. @DBREF{Walking Arrays} presented a simple function
-for ``walking'' an array of arrays. That function simply printed the
-name and value of each scalar array element. However, it is easy to
-generalize that function, by passing in the name of a function to call
-when walking an array. The modified function looks like this:
-
-@example
-@c file eg/lib/processarray.awk
-function process_array(arr, name, process, do_arrays, i, new_name)
-@{
- for (i in arr) @{
- new_name = (name "[" i "]")
- if (isarray(arr[i])) @{
- if (do_arrays)
- @@process(new_name, arr[i])
- process_array(arr[i], new_name, process, do_arrays)
- @} else
- @@process(new_name, arr[i])
- @}
-@}
-@c endfile
-@end example
-
-The arguments are as follows:
-
-@table @code
-@item arr
-The array.
-
-@item name
-The name of the array (a string).
-
-@item process
-The name of the function to call.
-
-@item do_arrays
-If this is true, the function can handle elements that are subarrays.
-@end table
-
-If subarrays are to be processed, that is done before walking them further.
-
-When run with the following scaffolding, the function produces the same
-results as does the earlier @code{walk_array()} function:
-
-@example
-BEGIN @{
- a[1] = 1
- a[2][1] = 21
- a[2][2] = 22
- a[3] = 3
- a[4][1][1] = 411
- a[4][2] = 42
-
- process_array(a, "a", "do_print", 0)
-@}
-
-function do_print(name, element)
-@{
- printf "%s = %s\n", name, element
-@}
-@end example
+processing arrays. This is described in @ref{Walking Arrays}.
Remember that you must supply a leading @samp{@@} in front of an indirect function call.
@@ -20582,7 +20613,7 @@ It contains the following chapters:
your own @command{awk} functions. Writing functions is important, because
it allows you to encapsulate algorithms and program tasks in a single
place. It simplifies programming, making program development more
-manageable, and making programs more readable.
+manageable and making programs more readable.
@cindex Kernighan, Brian
@cindex Plauger, P.J.@:
@@ -20711,7 +20742,7 @@ often use variable names like these for their own purposes.
The example programs shown in this @value{CHAPTER} all start the names of their
private variables with an underscore (@samp{_}). Users generally don't use
leading underscores in their variable names, so this convention immediately
-decreases the chances that the variable name will be accidentally shared
+decreases the chances that the variable names will be accidentally shared
with the user's program.
@cindex @code{_} (underscore), in names of private variables
@@ -20729,8 +20760,8 @@ show how our own @command{awk} programming style has evolved and to
provide some basis for this discussion.}
As a final note on variable naming, if a function makes global variables
-available for use by a main program, it is a good convention to start that
-variable's name with a capital letter---for
+available for use by a main program, it is a good convention to start those
+variables' names with a capital letter---for
example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables
(@pxref{Getopt Function}).
The leading capital letter indicates that it is global, while the fact that
@@ -20741,7 +20772,7 @@ not one of @command{awk}'s predefined variables, such as @code{FS}.
It is also important that @emph{all} variables in library
functions that do not need to save state are, in fact, declared
local.@footnote{@command{gawk}'s @option{--dump-variables} command-line
-option is useful for verifying this.} If this is not done, the variable
+option is useful for verifying this.} If this is not done, the variables
could accidentally be used in the user's program, leading to bugs that
are very difficult to track down:
@@ -20939,7 +20970,7 @@ Following is the function:
@example
@c file eg/lib/assert.awk
-# assert --- assert that a condition is true. Otherwise exit.
+# assert --- assert that a condition is true. Otherwise, exit.
@c endfile
@ignore
@@ -20975,7 +21006,7 @@ is false, it prints a message to standard error, using the @code{string}
parameter to describe the failed condition. It then sets the variable
@code{_assert_exit} to one and executes the @code{exit} statement.
The @code{exit} statement jumps to the @code{END} rule. If the @code{END}
-rules finds @code{_assert_exit} to be true, it exits immediately.
+rule finds @code{_assert_exit} to be true, it exits immediately.
The purpose of the test in the @code{END} rule is to
keep any other @code{END} rules from running. When an assertion fails, the
@@ -21267,7 +21298,7 @@ all the strings in an array into one long string. The following function,
the application programs
(@pxref{Sample Programs}).
-Good function design is important; this function needs to be general but it
+Good function design is important; this function needs to be general, but it
should also have a reasonable default behavior. It is called with an array
as well as the beginning and ending indices of the elements in the array to be
merged. This assumes that the array indices are numeric---a reasonable
@@ -21415,7 +21446,7 @@ allowed the user to supply an optional timestamp value to use instead
of the current time.
@node Readfile Function
-@subsection Reading a Whole File At Once
+@subsection Reading a Whole File at Once
Often, it is convenient to have the entire contents of a file available
in memory as a single string. A straightforward but naive way to
@@ -21472,13 +21503,13 @@ function readfile(file, tmp, save_rs)
It works by setting @code{RS} to @samp{^$}, a regular expression that
will never match if the file has contents. @command{gawk} reads data from
-the file into @code{tmp} attempting to match @code{RS}. The match fails
+the file into @code{tmp}, attempting to match @code{RS}. The match fails
after each read, but fails quickly, such that @command{gawk} fills
@code{tmp} with the entire contents of the file.
(@DBXREF{Records} for information on @code{RT} and @code{RS}.)
In the case that @code{file} is empty, the return value is the null
-string. Thus calling code may use something like:
+string. Thus, calling code may use something like:
@example
contents = readfile("/some/path")
@@ -21489,7 +21520,7 @@ if (length(contents) == 0)
This tests the result to see if it is empty or not. An equivalent
test would be @samp{contents == ""}.
-@xref{Extension Sample Readfile}, for an extension function that
+@DBXREF{Extension Sample Readfile} for an extension function that
also reads an entire file into memory.
@node Shell Quoting
@@ -21596,8 +21627,8 @@ The @code{BEGIN} and @code{END} rules are each executed exactly once, at
the beginning and end of your @command{awk} program, respectively
(@pxref{BEGIN/END}).
We (the @command{gawk} authors) once had a user who mistakenly thought that the
-@code{BEGIN} rule is executed at the beginning of each @value{DF} and the
-@code{END} rule is executed at the end of each @value{DF}.
+@code{BEGIN} rules were executed at the beginning of each @value{DF} and the
+@code{END} rules were executed at the end of each @value{DF}.
When informed
that this was not the case, the user requested that we add new special
@@ -21637,7 +21668,7 @@ END @{ endfile(FILENAME) @}
This file must be loaded before the user's ``main'' program, so that the
rule it supplies is executed first.
-This rule relies on @command{awk}'s @code{FILENAME} variable that
+This rule relies on @command{awk}'s @code{FILENAME} variable, which
automatically changes for each new @value{DF}. The current @value{FN} is
saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does
not equal @code{_oldfilename}, then a new @value{DF} is being processed and
@@ -21653,7 +21684,7 @@ first @value{DF}.
The program also supplies an @code{END} rule to do the final processing for
the last file. Because this @code{END} rule comes before any @code{END} rules
supplied in the ``main'' program, @code{endfile()} is called first. Once
-again the value of multiple @code{BEGIN} and @code{END} rules should be clear.
+again, the value of multiple @code{BEGIN} and @code{END} rules should be clear.
@cindex @code{beginfile()} user-defined function
@cindex @code{endfile()} user-defined function
@@ -21701,7 +21732,7 @@ how it simplifies writing the main program.
You are probably wondering, if @code{beginfile()} and @code{endfile()}
functions can do the job, why does @command{gawk} have
-@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
+@code{BEGINFILE} and @code{ENDFILE} patterns?
Good question. Normally, if @command{awk} cannot open a file, this
causes an immediate fatal error. In this case, there is no way for a
@@ -21710,6 +21741,7 @@ calling it relies on the file being open and at the first record. Thus,
the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
files that cannot be processed. @code{ENDFILE} exists for symmetry,
and because it provides an easy way to do per-file cleanup processing.
+For more information, refer to @ref{BEGINFILE/ENDFILE}.
@docbook
</sidebar>
@@ -21724,7 +21756,7 @@ and because it provides an easy way to do per-file cleanup processing.
You are probably wondering, if @code{beginfile()} and @code{endfile()}
functions can do the job, why does @command{gawk} have
-@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
+@code{BEGINFILE} and @code{ENDFILE} patterns?
Good question. Normally, if @command{awk} cannot open a file, this
causes an immediate fatal error. In this case, there is no way for a
@@ -21733,6 +21765,7 @@ calling it relies on the file being open and at the first record. Thus,
the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
files that cannot be processed. @code{ENDFILE} exists for symmetry,
and because it provides an easy way to do per-file cleanup processing.
+For more information, refer to @ref{BEGINFILE/ENDFILE}.
@end cartouche
@end ifnotdocbook
@@ -21740,7 +21773,7 @@ and because it provides an easy way to do per-file cleanup processing.
@subsection Rereading the Current File
@cindex files, reading
-Another request for a new built-in function was for a @code{rewind()}
+Another request for a new built-in function was for a
function that would make it possible to reread the current file.
The requesting user didn't want to have to use @code{getline}
(@pxref{Getline})
@@ -21749,7 +21782,7 @@ inside a loop.
However, as long as you are not in the @code{END} rule, it is
quite easy to arrange to immediately close the current input file
and then start over with it from the top.
-For lack of a better name, we'll call it @code{rewind()}:
+For lack of a better name, we'll call the function @code{rewind()}:
@cindex @code{rewind()} user-defined function
@example
@@ -21842,16 +21875,16 @@ See also @ref{ARGC and ARGV}.
Because @command{awk} variable names only allow the English letters,
the regular expression check purposely does not use character classes
such as @samp{[:alpha:]} and @samp{[:alnum:]}
-(@pxref{Bracket Expressions})
+(@pxref{Bracket Expressions}).
@node Empty Files
-@subsection Checking for Zero-length Files
+@subsection Checking for Zero-Length Files
All known @command{awk} implementations silently skip over zero-length files.
This is a by-product of @command{awk}'s implicit
read-a-record-and-match-against-the-rules loop: when @command{awk}
tries to read a record from an empty file, it immediately receives an
-end of file indication, closes the file, and proceeds on to the next
+end-of-file indication, closes the file, and proceeds on to the next
command-line @value{DF}, @emph{without} executing any user-level
@command{awk} program code.
@@ -21916,7 +21949,7 @@ Occasionally, you might not want @command{awk} to process command-line
variable assignments
(@pxref{Assignment Options}).
In particular, if you have a @value{FN} that contains an @samp{=} character,
-@command{awk} treats the @value{FN} as an assignment, and does not process it.
+@command{awk} treats the @value{FN} as an assignment and does not process it.
Some users have suggested an additional command-line option for @command{gawk}
to disable command-line assignments. However, some simple programming with
@@ -22278,8 +22311,8 @@ BEGIN @{
@c endfile
@end example
-The rest of the @code{BEGIN} rule is a simple test program. Here is the
-result of two sample runs of the test program:
+The rest of the @code{BEGIN} rule is a simple test program. Here are the
+results of two sample runs of the test program:
@example
$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x}
@@ -22337,7 +22370,7 @@ use @code{getopt()} to process their arguments.
The @code{PROCINFO} array
(@pxref{Built-in Variables})
provides access to the current user's real and effective user and group ID
-numbers, and if available, the user's supplementary group set.
+numbers, and, if available, the user's supplementary group set.
However, because these are numbers, they do not provide very useful
information to the average user. There needs to be some way to find the
user information associated with the user and group ID numbers. This
@@ -22357,7 +22390,7 @@ kept. Instead, it provides the @code{<pwd.h>} header file
and several C language subroutines for obtaining user information.
The primary function is @code{getpwent()}, for ``get password entry.''
The ``password'' comes from the original user database file,
-@file{/etc/passwd}, which stores user information, along with the
+@file{/etc/passwd}, which stores user information along with the
encrypted passwords (hence the name).
@cindex @command{pwcat} program
@@ -22456,7 +22489,7 @@ The user's encrypted password. This may not be available on some systems.
@item User-ID
The user's numeric user ID number.
-(On some systems, it's a C @code{long}, and not an @code{int}. Thus
+(On some systems, it's a C @code{long}, and not an @code{int}. Thus,
we cast it to @code{long} for all cases.)
@item Group-ID
@@ -22583,7 +22616,7 @@ The code that checks for using @code{FPAT}, using @code{using_fpat}
and @code{PROCINFO["FS"]}, is similar.
The main part of the function uses a loop to read database lines, split
-the line into fields, and then store the line into each array as necessary.
+the lines into fields, and then store the lines into each array as necessary.
When the loop is done, @code{@w{_pw_init()}} cleans up by closing the pipeline,
setting @code{@w{_pw_inited}} to one, and restoring @code{FS}
(and @code{FIELDWIDTHS} or @code{FPAT}
@@ -22800,7 +22833,7 @@ it is usually empty or set to @samp{*}.
@item Group ID Number
The group's numeric group ID number;
the association of name to number must be unique within the file.
-(On some systems it's a C @code{long}, and not an @code{int}. Thus
+(On some systems it's a C @code{long}, and not an @code{int}. Thus,
we cast it to @code{long} for all cases.)
@item Group Member List
@@ -22914,32 +22947,32 @@ The @code{@w{_gr_init()}} function first saves @code{FS},
@code{$0}, and then sets @code{FS} and @code{RS} to the correct values for
scanning the group information.
It also takes care to note whether @code{FIELDWIDTHS} or @code{FPAT}
-is being used, and to restore the appropriate field splitting mechanism.
+is being used, and to restore the appropriate field-splitting mechanism.
-The group information is stored is several associative arrays.
+The group information is stored in several associative arrays.
The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number
(@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}).
There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}),
which is a space-separated list of groups to which each user belongs.
-Unlike the user database, it is possible to have multiple records in the
+Unlike in the user database, it is possible to have multiple records in the
database for the same group. This is common when a group has a large number
of members. A pair of such entries might look like the following:
@example
-tvpeople:*:101:johny,jay,arsenio
+tvpeople:*:101:johnny,jay,arsenio
tvpeople:*:101:david,conan,tom,joan
@end example
For this reason, @code{_gr_init()} looks to see if a group name or
-group ID number is already seen. If it is, the usernames are
-simply concatenated onto the previous list of users.@footnote{There is actually a
+group ID number is already seen. If so, the usernames are
+simply concatenated onto the previous list of users.@footnote{There is a
subtle problem with the code just presented. Suppose that
the first time there were no names. This code adds the names with
a leading comma. It also doesn't check that there is a @code{$4}.}
Finally, @code{_gr_init()} closes the pipeline to @command{grcat}, restores
-@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} if necessary), @code{RS}, and @code{$0},
+@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT}, if necessary), @code{RS}, and @code{$0},
initializes @code{_gr_count} to zero
(it is used later), and makes @code{_gr_inited} nonzero.
@@ -23039,12 +23072,12 @@ uses these functions.
@DBREF{Arrays of Arrays} described how @command{gawk}
provides arrays of arrays. In particular, any element of
-an array may be either a scalar, or another array. The
+an array may be either a scalar or another array. The
@code{isarray()} function (@pxref{Type Functions})
lets you distinguish an array
from a scalar.
The following function, @code{walk_array()}, recursively traverses
-an array, printing each element's indices and value.
+an array, printing the element indices and values.
You call it with the array and a string representing the name
of the array:
@@ -23095,6 +23128,66 @@ $ @kbd{gawk -f walk_array.awk}
@print{} a[4][2] = 42
@end example
+The function just presented simply prints the
+name and value of each scalar array element. However, it is easy to
+generalize it, by passing in the name of a function to call
+when walking an array. The modified function looks like this:
+
+@example
+@c file eg/lib/processarray.awk
+function process_array(arr, name, process, do_arrays, i, new_name)
+@{
+ for (i in arr) @{
+ new_name = (name "[" i "]")
+ if (isarray(arr[i])) @{
+ if (do_arrays)
+ @@process(new_name, arr[i])
+ process_array(arr[i], new_name, process, do_arrays)
+ @} else
+ @@process(new_name, arr[i])
+ @}
+@}
+@c endfile
+@end example
+
+The arguments are as follows:
+
+@table @code
+@item arr
+The array.
+
+@item name
+The name of the array (a string).
+
+@item process
+The name of the function to call.
+
+@item do_arrays
+If this is true, the function can handle elements that are subarrays.
+@end table
+
+If subarrays are to be processed, that is done before walking them further.
+
+When run with the following scaffolding, the function produces the same
+results as does the earlier version of @code{walk_array()}:
+
+@example
+BEGIN @{
+ a[1] = 1
+ a[2][1] = 21
+ a[2][2] = 22
+ a[3] = 3
+ a[4][1][1] = 411
+ a[4][2] = 42
+
+ process_array(a, "a", "do_print", 0)
+@}
+
+function do_print(name, element)
+@{
+ printf "%s = %s\n", name, element
+@}
+@end example
@node Library Functions Summary
@section Summary
@@ -23116,24 +23209,24 @@ The functions presented here fit into the following categories:
@c nested list
@table @asis
@item General problems
-Number-to-string conversion, assertions, rounding, random number
+Number-to-string conversion, testing assertions, rounding, random number
generation, converting characters to numbers, joining strings, getting
easily usable time-of-day information, and reading a whole file in
-one shot.
+one shot
@item Managing @value{DF}s
Noting @value{DF} boundaries, rereading the current file, checking for
readable files, checking for zero-length files, and treating assignments
-as @value{FN}s.
+as @value{FN}s
@item Processing command-line options
-An @command{awk} version of the standard C @code{getopt()} function.
+An @command{awk} version of the standard C @code{getopt()} function
@item Reading the user and group databases
-Two sets of routines that parallel the C library versions.
+Two sets of routines that parallel the C library versions
@item Traversing arrays of arrays
-A simple function to traverse an array of arrays to any depth.
+Two functions that traverse an array of arrays to any depth
@end table
@c end nested list
@@ -23228,10 +23321,10 @@ in this @value{CHAPTER}.
The second presents @command{awk}
versions of several common POSIX utilities.
These are programs that you are hopefully already familiar with,
-and therefore, whose problems are understood.
+and therefore whose problems are understood.
By reimplementing these programs in @command{awk},
you can focus on the @command{awk}-related aspects of solving
-the programming problem.
+the programming problems.
The third is a grab bag of interesting programs.
These solve a number of different data-manipulation and management
@@ -23291,7 +23384,7 @@ It should be noted that these programs are not necessarily intended to
replace the installed versions on your system.
Nor may all of these programs be fully compliant with the most recent
POSIX standard. This is not a problem; their
-purpose is to illustrate @command{awk} language programming for ``real world''
+purpose is to illustrate @command{awk} language programming for ``real-world''
tasks.
The programs are presented in alphabetical order.
@@ -23320,7 +23413,7 @@ but you may supply a command-line option to change the field
@dfn{delimiter} (i.e., the field-separator character). @command{cut}'s
definition of fields is less general than @command{awk}'s.
-A common use of @command{cut} might be to pull out just the login name of
+A common use of @command{cut} might be to pull out just the login names of
logged-on users from the output of @command{who}. For example, the following
pipeline generates a sorted, unique list of the logged-on users:
@@ -23829,7 +23922,7 @@ successful or unsuccessful match. If the line does not match, the
@code{next} statement just moves on to the next record.
A number of additional tests are made, but they are only done if we
-are not counting lines. First, if the user only wants exit status
+are not counting lines. First, if the user only wants the exit status
(@code{no_print} is true), then it is enough to know that @emph{one}
line in this file matched, and we can skip on to the next file with
@code{nextfile}. Similarly, if we are only printing @value{FN}s, we can
@@ -23870,7 +23963,7 @@ if necessary:
@end example
The @code{END} rule takes care of producing the correct exit status. If
-there are no matches, the exit status is one; otherwise it is zero:
+there are no matches, the exit status is one; otherwise, it is zero:
@example
@c file eg/prog/egrep.awk
@@ -23922,7 +24015,8 @@ Here is a simple version of @command{id} written in @command{awk}.
It uses the user database library functions
(@pxref{Passwd Functions})
and the group database library functions
-(@pxref{Group Functions}):
+(@pxref{Group Functions})
+from @ref{Library Functions}.
The program is fairly straightforward. All the work is done in the
@code{BEGIN} rule. The user and group ID numbers are obtained from
@@ -24049,8 +24143,8 @@ By default,
the output files are named @file{xaa}, @file{xab}, and so on. Each file has
1,000 lines in it, with the likely exception of the last file. To change the
number of lines in each file, supply a number on the command line
-preceded with a minus (e.g., @samp{-500} for files with 500 lines in them
-instead of 1,000). To change the name of the output files to something like
+preceded with a minus sign (e.g., @samp{-500} for files with 500 lines in them
+instead of 1,000). To change the names of the output files to something like
@file{myfileaa}, @file{myfileab}, and so on, supply an additional
argument that specifies the @value{FN} prefix.
@@ -24889,7 +24983,7 @@ checking and setting of defaults: the delay, the count, and the message to
print. If the user supplied a message without the ASCII BEL
character (known as the ``alert'' character, @code{"\a"}), then it is added to
the message. (On many systems, printing the ASCII BEL generates an
-audible alert. Thus when the alarm goes off, the system calls attention
+audible alert. Thus, when the alarm goes off, the system calls attention
to itself in case the user is not looking at the computer.)
Just for a change, this program uses a @code{switch} statement
(@pxref{Switch Statement}), but the processing could be done with a series of
@@ -25058,7 +25152,7 @@ to @command{gawk}.
@c at least theoretically
The following program was written to
prove that character transliteration could be done with a user-level
-function. This program is not as complete as the system @command{tr} utility
+function. This program is not as complete as the system @command{tr} utility,
but it does most of the job.
The @command{translate} program was written long before @command{gawk}
@@ -25070,13 +25164,13 @@ takes three arguments:
@table @code
@item from
-A list of characters from which to translate.
+A list of characters from which to translate
@item to
-A list of characters to which to translate.
+A list of characters to which to translate
@item target
-The string on which to do the translation.
+The string on which to do the translation
@end table
Associative arrays make the translation part fairly easy. @code{t_ar} holds
@@ -25085,7 +25179,7 @@ loop goes through @code{from}, one character at a time. For each character
in @code{from}, if the character appears in @code{target},
it is replaced with the corresponding @code{to} character.
-The @code{translate()} function calls @code{stranslate()} using @code{$0}
+The @code{translate()} function calls @code{stranslate()}, using @code{$0}
as the target. The main program sets two global variables, @code{FROM} and
@code{TO}, from the command line, and then changes @code{ARGV} so that
@command{awk} reads from the standard input.
@@ -25107,7 +25201,7 @@ Finally, the processing rule simply calls @code{translate()} for each record:
@c endfile
@end ignore
@c file eg/prog/translate.awk
-# Bugs: does not handle things like: tr A-Z a-z, it has
+# Bugs: does not handle things like tr A-Z a-z; it has
# to be spelled out. However, if `to' is shorter than `from',
# the last character in `to' is used for the rest of `from'.
@@ -25183,7 +25277,7 @@ for inspiration.
@cindex printing, mailing labels
@cindex mailing labels@comma{} printing
-Here is a ``real world''@footnote{``Real world'' is defined as
+Here is a ``real-world''@footnote{``Real world'' is defined as
``a program actually used to get something done.''}
program. This
script reads lists of names and
@@ -25192,7 +25286,7 @@ on it, two across and 10 down. The addresses are guaranteed to be no more
than five lines of data. Each address is separated from the next by a blank
line.
-The basic idea is to read 20 labels worth of data. Each line of each label
+The basic idea is to read 20 labels' worth of data. Each line of each label
is stored in the @code{line} array. The single rule takes care of filling
the @code{line} array and printing the page when 20 labels have been read.
@@ -25215,12 +25309,12 @@ of lines on the page
Most of the work is done in the @code{printpage()} function.
The label lines are stored sequentially in the @code{line} array. But they
-have to print horizontally; @code{line[1]} next to @code{line[6]},
+have to print horizontally: @code{line[1]} next to @code{line[6]},
@code{line[2]} next to @code{line[7]}, and so on. Two loops
accomplish this. The outer loop, controlled by @code{i}, steps through
every 10 lines of data; this is each row of labels. The inner loop,
controlled by @code{j}, goes through the lines within the row.
-As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in
+As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}th line in
the row, and @samp{i+j+5} is the entry next to it. The output ends up
looking something like this:
@@ -25338,8 +25432,8 @@ END @{
@}
@end example
-The program relies on @command{awk}'s default field splitting
-mechanism to break each line up into ``words,'' and uses an
+The program relies on @command{awk}'s default field-splitting
+mechanism to break each line up into ``words'' and uses an
associative array named @code{freq}, indexed by each word, to count
the number of times the word occurs. In the @code{END} rule,
it prints the counts.
@@ -25444,7 +25538,7 @@ to use the @command{sort} program.
@cindex lines, duplicate@comma{} removing
The @command{uniq} program
-(@pxref{Uniq Program}),
+(@pxref{Uniq Program})
removes duplicate lines from @emph{sorted} data.
Suppose, however, you need to remove duplicate lines from a @value{DF} but
@@ -25531,7 +25625,7 @@ Texinfo input file into separate files.
@cindex Texinfo
This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo},
-the GNU project's document formatting language.
+the GNU Project's document formatting language.
A single Texinfo source file can be used to produce both
printed documentation, with @TeX{}, and online documentation.
@ifnotinfo
@@ -25590,7 +25684,7 @@ The Texinfo file looks something like this:
@example
@dots{}
-This program has a @@code@{BEGIN@} rule,
+This program has a @@code@{BEGIN@} rule
that prints a nice message:
@@example
@@ -25619,7 +25713,7 @@ exits with a zero exit status, signifying OK:
@cindex @code{extract.awk} program
@example
@c file eg/prog/extract.awk
-# extract.awk --- extract files and run programs from texinfo files
+# extract.awk --- extract files and run programs from Texinfo files
@c endfile
@ignore
@c file eg/prog/extract.awk
@@ -25660,12 +25754,12 @@ The second rule handles moving data into files. It verifies that a
@value{FN} is given in the directive. If the file named is not the
current file, then the current file is closed. Keeping the current file
open until a new file is encountered allows the use of the @samp{>}
-redirection for printing the contents, keeping open file management
+redirection for printing the contents, keeping open-file management
simple.
The @code{for} loop does the work. It reads lines using @code{getline}
(@pxref{Getline}).
-For an unexpected end of file, it calls the @code{@w{unexpected_eof()}}
+For an unexpected end-of-file, it calls the @code{@w{unexpected_eof()}}
function. If the line is an ``endfile'' line, then it breaks out of
the loop.
If the line is an @samp{@@group} or @samp{@@end group} line, then it
@@ -25767,7 +25861,7 @@ END @{
@cindex @command{sed} utility
@cindex stream editors
-The @command{sed} utility is a stream editor, a program that reads a
+The @command{sed} utility is a @dfn{stream editor}, a program that reads a
stream of data, makes changes to it, and passes it on.
It is often used to make global changes to a large file or to a stream
of data generated by a pipeline of commands.
@@ -25912,7 +26006,7 @@ includes don't accidentally include a library function twice.
@command{igawk} should behave just like @command{gawk} externally. This
means it should accept all of @command{gawk}'s command-line arguments,
including the ability to have multiple source files specified via
-@option{-f}, and the ability to mix command-line and library source files.
+@option{-f} and the ability to mix command-line and library source files.
The program is written using the POSIX Shell (@command{sh}) command
language.@footnote{Fully explaining the @command{sh} language is beyond
@@ -25951,7 +26045,7 @@ Run the expanded program with @command{gawk} and any other original command-line
arguments that the user supplied (such as the @value{DF} names).
@end enumerate
-This program uses shell variables extensively: for storing command-line arguments,
+This program uses shell variables extensively: for storing command-line arguments and
the text of the @command{awk} program that will expand the user's program, for the
user's original program, and for the expanded program. Doing so removes some
potential problems that might arise were we to use temporary files instead,
@@ -26268,22 +26362,7 @@ Save the results of this processing in the shell variable
The last step is to call @command{gawk} with the expanded program,
along with the original
-options and command-line arguments that the user supplied.
-
-@c this causes more problems than it solves, so leave it out.
-@ignore
-The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk}
-to handle an interesting case. Suppose that the user's program only has
-a @code{BEGIN} rule and there are no @value{DF}s to read.
-The program should exit without reading any @value{DF}s.
-However, suppose that an included library file defines an @code{END}
-rule of its own. In this case, @command{gawk} will hang, reading standard
-input. In order to avoid this, @file{/dev/null} is explicitly added to the
-command line. Reading from @file{/dev/null} always returns an immediate
-end of file indication.
-
-@c Hmm. Add /dev/null if $# is 0? Still messes up ARGV. Sigh.
-@end ignore
+options and command-line arguments that the user supplied:
@example
@c file eg/prog/igawk.sh
@@ -26349,8 +26428,8 @@ the same letters
Column 2, Problem C, of Jon Bentley's @cite{Programming Pearls}, Second
Edition, presents an elegant algorithm. The idea is to give words that
are anagrams a common signature, sort all the words together by their
-signature, and then print them. Dr.@: Bentley observes that taking the
-letters in each word and sorting them produces that common signature.
+signatures, and then print them. Dr.@: Bentley observes that taking the
+letters in each word and sorting them produces those common signatures.
The following program uses arrays of arrays to bring together
words with the same signature and array sorting to print the words
@@ -26359,8 +26438,8 @@ in sorted order:
@cindex @code{anagram.awk} program
@example
@c file eg/prog/anagram.awk
-# anagram.awk --- An implementation of the anagram finding algorithm
-# from Jon Bentley's "Programming Pearls", 2nd edition.
+# anagram.awk --- An implementation of the anagram-finding algorithm
+# from Jon Bentley's "Programming Pearls," 2nd edition.
# Addison Wesley, 2000, ISBN 0-201-65788-0.
# Column 2, Problem C, section 2.8, pp 18-20.
@c endfile
@@ -26408,7 +26487,7 @@ sorts the letters, and then joins them back together:
@example
@c file eg/prog/anagram.awk
-# word2key --- split word apart into letters, sort, joining back together
+# word2key --- split word apart into letters, sort, and join back together
function word2key(word, a, i, n, result)
@{
@@ -26603,12 +26682,13 @@ characters. The ability to use @code{split()} with the empty string as
the separator can considerably simplify such tasks.
@item
-The library functions from @ref{Library Functions}, proved their
-usefulness for a number of real (if small) programs.
+The examples here demonstrate the usefulness of the library
+functions from @DBREF{Library Functions}
+for a number of real (if small) programs.
@item
Besides reinventing POSIX wheels, other programs solved a selection of
-interesting problems, such as finding duplicates words in text, printing
+interesting problems, such as finding duplicate words in text, printing
mailing labels, and finding anagrams.
@end itemize
@@ -26804,18 +26884,18 @@ a violent psychopath who knows where you live.}
This @value{CHAPTER} discusses advanced features in @command{gawk}.
It's a bit of a ``grab bag'' of items that are otherwise unrelated
to each other.
-First, a command-line option allows @command{gawk} to recognize
+First, we look at a command-line option that allows @command{gawk} to recognize
nondecimal numbers in input data, not just in @command{awk}
programs.
Then, @command{gawk}'s special features for sorting arrays are presented.
Next, two-way I/O, discussed briefly in earlier parts of this
@value{DOCUMENT}, is described in full detail, along with the basics
-of TCP/IP networking. Finally, @command{gawk}
+of TCP/IP networking. Finally, we see how @command{gawk}
can @dfn{profile} an @command{awk} program, making it possible to tune
it for performance.
@c FULLXREF ON
-A number of advanced features require separate @value{CHAPTER}s of their
+Additional advanced features are discussed in separate @value{CHAPTER}s of their
own:
@itemize @value{BULLET}
@@ -26909,7 +26989,8 @@ This option may disappear in a future version of @command{gawk}.
@node Array Sorting
@section Controlling Array Traversal and Array Sorting
-@command{gawk} lets you control the order in which a @samp{for (i in array)}
+@command{gawk} lets you control the order in which a
+@samp{for (@var{indx} in @var{array})}
loop traverses an array.
In addition, two built-in functions, @code{asort()} and @code{asorti()},
@@ -26925,7 +27006,7 @@ to order the elements during sorting.
@node Controlling Array Traversal
@subsection Controlling Array Traversal
-By default, the order in which a @samp{for (i in array)} loop
+By default, the order in which a @samp{for (@var{indx} in @var{array})} loop
scans an array is not defined; it is generally based upon
the internal implementation of arrays inside @command{awk}.
@@ -26954,23 +27035,23 @@ function comp_func(i1, v1, i2, v2)
@}
@end example
-Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
+Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2}
are the corresponding values of the two elements being compared.
-Either @var{v1} or @var{v2}, or both, can be arrays if the array being
+Either @code{v1} or @code{v2}, or both, can be arrays if the array being
traversed contains subarrays as values.
(@DBXREF{Arrays of Arrays} for more information about subarrays.)
The three possible return values are interpreted as follows:
@table @code
@item comp_func(i1, v1, i2, v2) < 0
-Index @var{i1} comes before index @var{i2} during loop traversal.
+Index @code{i1} comes before index @code{i2} during loop traversal.
@item comp_func(i1, v1, i2, v2) == 0
-Indices @var{i1} and @var{i2}
-come together but the relative order with respect to each other is undefined.
+Indices @code{i1} and @code{i2}
+come together, but the relative order with respect to each other is undefined.
@item comp_func(i1, v1, i2, v2) > 0
-Index @var{i1} comes after index @var{i2} during loop traversal.
+Index @code{i1} comes after index @code{i2} during loop traversal.
@end table
Our first comparison function can be used to scan an array in
@@ -27131,7 +27212,7 @@ As already mentioned, the order of the indices is arbitrary if two
elements compare equal. This is usually not a problem, but letting
the tied elements come out in arbitrary order can be an issue, especially
when comparing item values. The partial ordering of the equal elements
-may change the next time the array is traversed, if other elements are added or
+may change the next time the array is traversed, if other elements are added to or
removed from the array. One way to resolve ties when comparing elements
with otherwise equal values is to include the indices in the comparison
rules. Note that doing this may make the loop traversal less efficient,
@@ -27174,7 +27255,7 @@ equivalent or distinct.
Another point to keep in mind is that in the case of subarrays,
the element values can themselves be arrays; a production comparison
function should use the @code{isarray()} function
-(@pxref{Type Functions}),
+(@pxref{Type Functions})
to check for this, and choose a defined sorting order for subarrays.
All sorting based on @code{PROCINFO["sorted_in"]}
@@ -27182,7 +27263,7 @@ is disabled in POSIX mode,
because the @code{PROCINFO} array is not special in that case.
As a side note, sorting the array indices before traversing
-the array has been reported to add 15% to 20% overhead to the
+the array has been reported to add a 15% to 20% overhead to the
execution time of @command{awk} programs. For this reason,
sorted array traversal is not the default.
@@ -27241,7 +27322,7 @@ However, the @code{source} array is not affected.
Often, what's needed is to sort on the values of the @emph{indices}
instead of the values of the elements. To do that, use the
@code{asorti()} function. The interface and behavior are identical to
-that of @code{asort()}, except that the index values are used for sorting,
+that of @code{asort()}, except that the index values are used for sorting
and become the values of the result array:
@example
@@ -27276,8 +27357,8 @@ it chooses}, taking into account just the indices, just the values,
or both. This is extremely powerful.
Once the array is sorted, @code{asort()} takes the @emph{values} in
-their final order, and uses them to fill in the result array, whereas
-@code{asorti()} takes the @emph{indices} in their final order, and uses
+their final order and uses them to fill in the result array, whereas
+@code{asorti()} takes the @emph{indices} in their final order and uses
them to fill in the result array.
@cindex reference counting, sorting arrays
@@ -27574,7 +27655,7 @@ service name.
@cindex @command{gawk}, @code{ERRNO} variable in
@cindex @code{ERRNO} variable
@quotation NOTE
-Failure in opening a two-way socket will result in a non-fatal error
+Failure in opening a two-way socket will result in a nonfatal error
being returned to the calling code. The value of @code{ERRNO} indicates
the error (@pxref{Auto-set}).
@end quotation
@@ -27591,19 +27672,19 @@ BEGIN @{
@end example
This program reads the current date and time from the local system's
-TCP @samp{daytime} server.
+TCP @code{daytime} server.
It then prints the results and closes the connection.
Because this topic is extensive, the use of @command{gawk} for
TCP/IP programming is documented separately.
@ifinfo
See
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}},
+@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}},
@end ifinfo
@ifnotinfo
See
@uref{http://www.gnu.org/software/gawk/manual/gawkinet/,
-@cite{TCP/IP Internetworking with @command{gawk}}},
+@cite{@value{GAWKINETTITLE}}},
which comes as part of the @command{gawk} distribution,
@end ifnotinfo
for a much more complete introduction and discussion, as well as
@@ -27679,9 +27760,9 @@ junk
@end example
Here is the @file{awkprof.out} that results from running the
-@command{gawk} profiler on this program and data. (This example also
+@command{gawk} profiler on this program and data (this example also
illustrates that @command{awk} programmers sometimes get up very early
-in the morning to work.)
+in the morning to work):
@cindex @code{BEGIN} pattern, and profiling
@cindex @code{END} pattern, and profiling
@@ -27741,8 +27822,8 @@ They are as follows:
@item
The program is printed in the order @code{BEGIN} rules,
@code{BEGINFILE} rules,
-pattern/action rules,
-@code{ENDFILE} rules, @code{END} rules and functions, listed
+pattern--action rules,
+@code{ENDFILE} rules, @code{END} rules, and functions, listed
alphabetically.
Multiple @code{BEGIN} and @code{END} rules retain their
separate identities, as do
@@ -27750,7 +27831,7 @@ multiple @code{BEGINFILE} and @code{ENDFILE} rules.
@cindex patterns, counts, in a profile
@item
-Pattern-action rules have two counts.
+Pattern--action rules have two counts.
The first count, to the left of the rule, shows how many times
the rule's pattern was @emph{tested}.
The second count, to the right of the rule's opening left brace
@@ -27817,13 +27898,13 @@ the target of a redirection isn't a scalar, it gets parenthesized.
@command{gawk} supplies leading comments in
front of the @code{BEGIN} and @code{END} rules,
the @code{BEGINFILE} and @code{ENDFILE} rules,
-the pattern/action rules, and the functions.
+the pattern--action rules, and the functions.
@end itemize
The profiled version of your program may not look exactly like what you
typed when you wrote it. This is because @command{gawk} creates the
-profiled version by ``pretty printing'' its internal representation of
+profiled version by ``pretty-printing'' its internal representation of
the program. The advantage to this is that @command{gawk} can produce
a standard representation.
Also, things such as:
@@ -27906,16 +27987,16 @@ If you use the @code{HUP} signal instead of the @code{USR1} signal,
@cindex @code{SIGQUIT} signal (MS-Windows)
@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
When @command{gawk} runs on MS-Windows systems, it uses the
-@code{INT} and @code{QUIT} signals for producing the profile and, in
+@code{INT} and @code{QUIT} signals for producing the profile, and in
the case of the @code{INT} signal, @command{gawk} exits. This is
because these systems don't support the @command{kill} command, so the
only signals you can deliver to a program are those generated by the
keyboard. The @code{INT} signal is generated by the
-@kbd{Ctrl-@key{C}} or @kbd{Ctrl-@key{BREAK}} key, while the
-@code{QUIT} signal is generated by the @kbd{Ctrl-@key{\}} key.
+@kbd{Ctrl-c} or @kbd{Ctrl-BREAK} key, while the
+@code{QUIT} signal is generated by the @kbd{Ctrl-\} key.
Finally, @command{gawk} also accepts another option, @option{--pretty-print}.
-When called this way, @command{gawk} ``pretty prints'' the program into
+When called this way, @command{gawk} ``pretty-prints'' the program into
@file{awkprof.out}, without any execution counts.
@quotation NOTE
@@ -27969,7 +28050,7 @@ optionally, close off one side of the two-way communications.
@item
By using special @value{FN}s with the @samp{|&} operator, you can open a
-TCP/IP (or UDP/IP) connection to remote hosts in the Internet. @command{gawk}
+TCP/IP (or UDP/IP) connection to remote hosts on the Internet. @command{gawk}
supports both IPv4 and IPv6.
@item
@@ -27979,7 +28060,7 @@ you tune them more easily. Sending the @code{USR1} signal while profiling cause
@command{gawk} to dump the profile and keep going, including a function call stack.
@item
-You can also just ``pretty print'' the program. This currently also runs
+You can also just ``pretty-print'' the program. This currently also runs
the program, but that will change in the next major release.
@end itemize
@@ -28028,7 +28109,7 @@ a requirement.
@cindex localization
@dfn{Internationalization} means writing (or modifying) a program once,
in such a way that it can use multiple languages without requiring
-further source-code changes.
+further source code changes.
@dfn{Localization} means providing the data necessary for an
internationalized program to work in a particular language.
Most typically, these terms refer to features such as the language
@@ -28043,7 +28124,7 @@ monetary values are printed and read.
@cindex @command{gettext} library
@command{gawk} uses GNU @command{gettext} to provide its internationalization
features.
-The facilities in GNU @command{gettext} focus on messages; strings printed
+The facilities in GNU @command{gettext} focus on messages: strings printed
by a program, either directly or via formatting with @code{printf} or
@code{sprintf()}.@footnote{For some operating systems, the @command{gawk}
port doesn't support GNU @command{gettext}.
@@ -28234,7 +28315,7 @@ All of the above. (Not too useful in the context of @command{gettext}.)
@section Internationalizing @command{awk} Programs
@cindex @command{awk} programs, internationalizing
-@command{gawk} provides the following variables and functions for
+@command{gawk} provides the following variables for
internationalization:
@table @code
@@ -28250,7 +28331,12 @@ value is @code{"messages"}.
String constants marked with a leading underscore
are candidates for translation at runtime.
String constants without a leading underscore are not translated.
+@end table
+
+@command{gawk} provides the following functions for
+internationalization:
+@table @code
@cindexgawkfunc{dcgettext}
@item @code{dcgettext(@var{string}} [@code{,} @var{domain} [@code{,} @var{category}]]@code{)}
Return the translation of @var{string} in
@@ -28307,15 +28393,7 @@ If @var{directory} is the null string (@code{""}), then
given @var{domain}.
@end table
-To use these facilities in your @command{awk} program, follow the steps
-outlined in
-@ifnotinfo
-the previous @value{SECTION},
-@end ifnotinfo
-@ifinfo
-@ref{Explaining gettext},
-@end ifinfo
-like so:
+To use these facilities in your @command{awk} program, follow these steps:
@enumerate
@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
@@ -28598,7 +28676,7 @@ the null string (@code{""}) as its value, leaving the original string constant a
the result.
@item
-By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}
+By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()},
and @code{bindtextdomain()}, the @command{awk} program can be made to run, but
all the messages are output in the original language.
For example:
@@ -28782,11 +28860,11 @@ using the GNU @command{gettext} package.
(GNU @command{gettext} is described in
complete detail in
@ifinfo
-@inforef{Top, , GNU @command{gettext} utilities, gettext, GNU gettext tools}.)
+@inforef{Top, , GNU @command{gettext} utilities, gettext, GNU @command{gettext} utilities}.)
@end ifinfo
@ifnotinfo
@uref{http://www.gnu.org/software/gettext/manual/,
-@cite{GNU gettext tools}}.)
+@cite{GNU @command{gettext} utilities}}.)
@end ifnotinfo
As of this writing, the latest version of GNU @command{gettext} is
@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.4.tar.gz,
@@ -28802,7 +28880,7 @@ and fatal errors in the local language.
@itemize @value{BULLET}
@item
Internationalization means writing a program such that it can use multiple
-languages without requiring source-code changes. Localization means
+languages without requiring source code changes. Localization means
providing the data necessary for an internationalized program to work
in a particular language.
@@ -28819,9 +28897,9 @@ file, and the @file{.po} files are compiled into @file{.gmo} files for
use at runtime.
@item
-You can use position specifications with @code{sprintf()} and
+You can use positional specifications with @code{sprintf()} and
@code{printf} to rearrange the placement of argument values in formatted
-strings and output. This is useful for the translations of format
+strings and output. This is useful for the translation of format
control strings.
@item
@@ -28877,8 +28955,7 @@ the discussion of debugging in @command{gawk}.
@subsection Debugging in General
(If you have used debuggers in other languages, you may want to skip
-ahead to the next section on the specific features of the @command{gawk}
-debugger.)
+ahead to @ref{Awk Debugging}.)
Of course, a debugging program cannot remove bugs for you, because it has
no way of knowing what you or your users consider a ``bug'' versus a
@@ -28969,10 +29046,10 @@ and usually find the errant code quite quickly.
@end table
@node Awk Debugging
-@subsection Awk Debugging
+@subsection @command{awk} Debugging
Debugging an @command{awk} program has some specific aspects that are
-not shared with other programming languages.
+not shared with programs written in other languages.
First of all, the fact that @command{awk} programs usually take input
line by line from a file or files and operate on those lines using specific
@@ -28988,7 +29065,7 @@ to look at the individual primitive instructions carried out
by the higher-level @command{awk} commands.
@node Sample Debugging Session
-@section Sample Debugging Session
+@section Sample @command{gawk} Debugging Session
@cindex sample debugging session
In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample
@@ -29007,8 +29084,8 @@ as our example.
@cindex debugger, how to start
Starting the debugger is almost exactly like running @command{gawk} normally,
-except you have to pass an additional option @option{--debug}, or the
-corresponding short option @option{-D}. The file(s) containing the
+except you have to pass an additional option, @option{--debug}, or the
+corresponding short option, @option{-D}. The file(s) containing the
program and any supporting code are given on the command line as arguments
to one or more @option{-f} options. (@command{gawk} is not designed
to debug command-line programs, only programs contained in files.)
@@ -29021,7 +29098,7 @@ $ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk -1 inputfile}
@noindent
where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}.
(Experienced users of GDB or similar debuggers should note that
-this syntax is slightly different from what they are used to.
+this syntax is slightly different from what you are used to.
With the @command{gawk} debugger, you give the arguments for running the program
in the command line to the debugger rather than as part of the @code{run}
command at the debugger prompt.)
@@ -29175,10 +29252,10 @@ gawk> @kbd{n}
@end example
This tells us that @command{gawk} is now ready to execute line 66, which
-decides whether to give the lines the special ``field skipping'' treatment
+decides whether to give the lines the special ``field-skipping'' treatment
indicated by the @option{-1} command-line option. (Notice that we skipped
-from where we were before at line 63 to here, because the condition in line 63
-@samp{if (fcount == 0 && charcount == 0)} was false.)
+from where we were before, at line 63, to here, because the condition
+in line 63, @samp{if (fcount == 0 && charcount == 0)}, was false.)
Continuing to step, we now get to the splitting of the current and
last records:
@@ -29252,7 +29329,7 @@ gawk> @kbd{n}
Well, here we are at our error (sorry to spoil the suspense). What we
had in mind was to join the fields starting from the second one to make
-the virtual record to compare, and if the first field was numbered zero,
+the virtual record to compare, and if the first field were numbered zero,
this would work. Let's look at what we've got:
@example
@@ -29261,7 +29338,7 @@ gawk> @kbd{p cline clast}
@print{} clast = "awk is a wonderful program!"
@end example
-Hey, those look pretty familiar! They're just our original, unaltered,
+Hey, those look pretty familiar! They're just our original, unaltered
input records. A little thinking (the human brain is still the best
debugging tool), and we realize that we were off by one!
@@ -29311,11 +29388,11 @@ Miscellaneous
@end itemize
Each of these are discussed in the following subsections.
-In the following descriptions, commands which may be abbreviated
+In the following descriptions, commands that may be abbreviated
show the abbreviation on a second description line.
A debugger command name may also be truncated if that partial
name is unambiguous. The debugger has the built-in capability to
-automatically repeat the previous command just by hitting @key{Enter}.
+automatically repeat the previous command just by hitting @kbd{Enter}.
This works for the commands @code{list}, @code{next}, @code{nexti},
@code{step}, @code{stepi}, and @code{continue} executed without any
argument.
@@ -29365,7 +29442,7 @@ Set a breakpoint at entry to (the first instruction of)
function @var{function}.
@end table
-Each breakpoint is assigned a number which can be used to delete it from
+Each breakpoint is assigned a number that can be used to delete it from
the breakpoint list using the @code{delete} command.
With a breakpoint, you may also supply a condition. This is an
@@ -29417,7 +29494,7 @@ watchpoint is made unconditional).
@cindex breakpoint, delete by number
@item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
@itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
-Delete specified breakpoints or a range of breakpoints. Deletes
+Delete specified breakpoints or a range of breakpoints. Delete
all defined breakpoints if no argument is supplied.
@cindex debugger commands, @code{disable}
@@ -29426,7 +29503,7 @@ all defined breakpoints if no argument is supplied.
@cindex breakpoint, how to disable or enable
@item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}]
Disable specified breakpoints or a range of breakpoints. Without
-any argument, disables all breakpoints.
+any argument, disable all breakpoints.
@cindex debugger commands, @code{e} (@code{enable})
@cindex debugger commands, @code{enable}
@@ -29436,18 +29513,18 @@ any argument, disables all breakpoints.
@item @code{enable} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
@itemx @code{e} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
Enable specified breakpoints or a range of breakpoints. Without
-any argument, enables all breakpoints.
-Optionally, you can specify how to enable the breakpoint:
+any argument, enable all breakpoints.
+Optionally, you can specify how to enable the breakpoints:
@c nested table
@table @code
@item del
-Enable the breakpoint(s) temporarily, then delete it when
-the program stops at the breakpoint.
+Enable the breakpoints temporarily, then delete each one when
+the program stops at it.
@item once
-Enable the breakpoint(s) temporarily, then disable it when
-the program stops at the breakpoint.
+Enable the breakpoints temporarily, then disable each one when
+the program stops at it.
@end table
@cindex debugger commands, @code{ignore}
@@ -29515,7 +29592,7 @@ gawk>
@item @code{continue} [@var{count}]
@itemx @code{c} [@var{count}]
Resume program execution. If continued from a breakpoint and @var{count} is
-specified, ignores the breakpoint at that location the next @var{count} times
+specified, ignore the breakpoint at that location the next @var{count} times
before stopping.
@cindex debugger commands, @code{finish}
@@ -29569,7 +29646,7 @@ automatic display variables, and debugger options.
@item @code{step} [@var{count}]
@itemx @code{s} [@var{count}]
Continue execution until control reaches a different source line in the
-current stack frame. @code{step} steps inside any function called within
+current stack frame, stepping inside any function called within
the line. If the argument @var{count} is supplied, steps that many times before
stopping, unless it encounters a breakpoint or watchpoint.
@@ -29682,7 +29759,7 @@ or field.
String values must be enclosed between double quotes (@code{"}@dots{}@code{"}).
You can also set special @command{awk} variables, such as @code{FS},
-@code{NF}, @code{NR}, and son on.
+@code{NF}, @code{NR}, and so on.
@cindex debugger commands, @code{w} (@code{watch})
@cindex debugger commands, @code{watch}
@@ -29694,7 +29771,7 @@ You can also set special @command{awk} variables, such as @code{FS},
Add variable @var{var} (or field @code{$@var{n}}) to the watch list.
The debugger then stops whenever
the value of the variable or field changes. Each watched item is assigned a
-number which can be used to delete it from the watch list using the
+number that can be used to delete it from the watch list using the
@code{unwatch} command.
With a watchpoint, you may also supply a condition. This is an
@@ -29722,11 +29799,11 @@ watch list.
@node Execution Stack
@subsection Working with the Stack
-Whenever you run a program which contains any function calls,
+Whenever you run a program that contains any function calls,
@command{gawk} maintains a stack of all of the function calls leading up
to where the program is right now. You can see how you got to where you are,
and also move around in the stack to see what the state of things was in the
-functions which called the one you are in. The commands for doing this are:
+functions that called the one you are in. The commands for doing this are:
@table @asis
@cindex debugger commands, @code{bt} (@code{backtrace})
@@ -29761,8 +29838,8 @@ Then select and print the frame.
@item @code{frame} [@var{n}]
@itemx @code{f} [@var{n}]
Select and print stack frame @var{n}. Frame 0 is the currently executing,
-or @dfn{innermost}, frame (function call), frame 1 is the frame that
-called the innermost one. The highest numbered frame is the one for the
+or @dfn{innermost}, frame (function call); frame 1 is the frame that
+called the innermost one. The highest-numbered frame is the one for the
main program. The printed information consists of the frame number,
function and argument names, source file, and the source line.
@@ -29778,7 +29855,7 @@ Then select and print the frame.
Besides looking at the values of variables, there is often a need to get
other sorts of information about the state of your program and of the
-debugging environment itself. The @command{gawk} debugger has one command which
+debugging environment itself. The @command{gawk} debugger has one command that
provides this information, appropriately called @code{info}. @code{info}
is used with one of a number of arguments that tell it exactly what
you want to know:
@@ -29866,12 +29943,12 @@ The available options are:
@table @asis
@item @code{history_size}
@cindex debugger history size
-The maximum number of lines to keep in the history file @file{./.gawk_history}.
-The default is 100.
+Set the maximum number of lines to keep in the history file
+@file{./.gawk_history}. The default is 100.
@item @code{listsize}
@cindex debugger default list amount
-The number of lines that @code{list} prints. The default is 15.
+Specify the number of lines that @code{list} prints. The default is 15.
@item @code{outfile}
@cindex redirect @command{gawk} output, in debugger
@@ -29881,7 +29958,7 @@ standard output.
@item @code{prompt}
@cindex debugger prompt
-The debugger prompt. The default is @samp{@w{gawk> }}.
+Change the debugger prompt. The default is @samp{@w{gawk> }}.
@item @code{save_history} [@code{on} | @code{off}]
@cindex debugger history file
@@ -29892,7 +29969,7 @@ The default is @code{on}.
@cindex save debugger options
Save current options to file @file{./.gawkrc} upon exit.
The default is @code{on}.
-Options are read back in to the next session upon startup.
+Options are read back into the next session upon startup.
@item @code{trace} [@code{on} | @code{off}]
@cindex instruction tracing, in debugger
@@ -29915,7 +29992,7 @@ command in the file. Also, the list of commands may include additional
@code{source} commands; however, the @command{gawk} debugger will not source the
same file more than once in order to avoid infinite recursion.
-In addition to, or instead of the @code{source} command, you can use
+In addition to, or instead of, the @code{source} command, you can use
the @option{-D @var{file}} or @option{--debug=@var{file}} command-line
options to execute commands from a file non-interactively
(@pxref{Options}).
@@ -29924,16 +30001,16 @@ options to execute commands from a file non-interactively
@node Miscellaneous Debugger Commands
@subsection Miscellaneous Commands
-There are a few more commands which do not fit into the
+There are a few more commands that do not fit into the
previous categories, as follows:
@table @asis
@cindex debugger commands, @code{dump}
@cindex @code{dump} debugger command
@item @code{dump} [@var{filename}]
-Dump bytecode of the program to standard output or to the file
+Dump byte code of the program to standard output or to the file
named in @var{filename}. This prints a representation of the internal
-instructions which @command{gawk} executes to implement the @command{awk}
+instructions that @command{gawk} executes to implement the @command{awk}
commands in a program. This can be very enlightening, as the following
partial dump of Davide Brini's obfuscated code
(@pxref{Signature Program}) demonstrates:
@@ -30030,7 +30107,7 @@ Print lines centered around line number @var{n} in
source file @var{filename}. This command may change the current source file.
@item @var{function}
-Print lines centered around beginning of the
+Print lines centered around the beginning of the
function @var{function}. This command may change the current source file.
@end table
@@ -30042,16 +30119,16 @@ function @var{function}. This command may change the current source file.
@item @code{quit}
@itemx @code{q}
Exit the debugger. Debugging is great fun, but sometimes we all have
-to tend to other obligations in life, and sometimes we find the bug,
+to tend to other obligations in life, and sometimes we find the bug
and are free to go on to the next one! As we saw earlier, if you are
-running a program, the debugger warns you if you accidentally type
+running a program, the debugger warns you when you type
@samp{q} or @samp{quit}, to make sure you really want to quit.
@cindex debugger commands, @code{trace}
@cindex @code{trace} debugger command
@item @code{trace} [@code{on} | @code{off}]
-Turn on or off a continuous printing of instructions which are about to
-be executed, along with printing the @command{awk} line which they
+Turn on or off continuous printing of the instructions that are about to
+be executed, along with the @command{awk} lines they
implement. The default is @code{off}.
It is to be hoped that most of the ``opcodes'' in these instructions are
@@ -30067,7 +30144,7 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while
If @command{gawk} is compiled with
@uref{http://cnswww.cns.cwru.edu/php/chet/readline/readline.html,
-the @code{readline} library}, you can take advantage of that library's
+the GNU Readline library}, you can take advantage of that library's
command completion and history expansion features. The following types
of completion are available:
@@ -30104,7 +30181,7 @@ and
We hope you find the @command{gawk} debugger useful and enjoyable to work with,
but as with any program, especially in its early releases, it still has
-some limitations. A few which are worth being aware of are:
+some limitations. A few worth being aware of are:
@itemize @value{BULLET}
@item
@@ -30120,13 +30197,13 @@ If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands}
(or if you are already familiar with @command{gawk} internals),
you will realize that much of the internal manipulation of data
in @command{gawk}, as in many interpreters, is done on a stack.
-@code{Op_push}, @code{Op_pop}, and the like, are the ``bread and butter'' of
+@code{Op_push}, @code{Op_pop}, and the like are the ``bread and butter'' of
most @command{gawk} code.
Unfortunately, as of now, the @command{gawk}
debugger does not allow you to examine the stack's contents.
That is, the intermediate results of expression evaluation are on the
-stack, but cannot be printed. Rather, only variables which are defined
+stack, but cannot be printed. Rather, only variables that are defined
in the program can be printed. Of course, a workaround for
this is to use more explicit variables at the debugging stage and then
change back to obscure, perhaps more optimal code later.
@@ -30140,12 +30217,12 @@ programmer, you are expected to know the meaning of
@item
The @command{gawk} debugger is designed to be used by running a program (with all its
parameters) on the command line, as described in @ref{Debugger Invocation}.
-There is no way (as of now) to attach or ``break in'' to a running program.
-This seems reasonable for a language which is used mainly for quickly
+There is no way (as of now) to attach to or ``break into'' a running program.
+This seems reasonable for a language that is used mainly for quickly
executing, short programs.
@item
-The @command{gawk} debugger only accepts source supplied with the @option{-f} option.
+The @command{gawk} debugger only accepts source code supplied with the @option{-f} option.
@end itemize
@ignore
@@ -30159,8 +30236,8 @@ be added, and of course feel free to try to add them yourself!
@itemize @value{BULLET}
@item
Programs rarely work correctly the first time. Finding bugs
-is @dfn{debugging} and a program that helps you find bugs is a
-@dfn{debugger}. @command{gawk} has a built-in debugger that works very
+is called debugging, and a program that helps you find bugs is a
+debugger. @command{gawk} has a built-in debugger that works very
similarly to the GNU Debugger, GDB.
@item
@@ -30180,7 +30257,7 @@ breakpoints, execution, viewing and changing data, working with the stack,
getting information, and other tasks.
@item
-If the @code{readline} library is available when @command{gawk} is
+If the GNU Readline library is available when @command{gawk} is
compiled, it is used by the debugger to provide command-line history
and editing.
@@ -30244,7 +30321,7 @@ paper and pencil (and/or a calculator). In theory, numbers can have an
arbitrary number of digits on either side (or both sides) of the decimal
point, and the results of a computation are always exact.
-Some modern system can do decimal arithmetic in hardware, but usually you
+Some modern systems can do decimal arithmetic in hardware, but usually you
need a special software library to provide access to these instructions.
There are also libraries that do decimal arithmetic entirely in software.
@@ -30262,8 +30339,8 @@ The disadvantage is that their range is limited.
@cindex integers, unsigned
In computers, integer values come in two flavors: @dfn{signed} and
@dfn{unsigned}. Signed values may be negative or positive, whereas
-unsigned values are always positive (i.e., greater than or equal
-to zero).
+unsigned values are always greater than or equal
+to zero.
In computer systems, integer arithmetic is exact, but the possible
range of values is limited. Integer arithmetic is generally faster than
@@ -30300,8 +30377,35 @@ signed. The possible ranges of values are shown in @ref{table-numeric-ranges}.
@item 32-bit unsigned integer @tab 0 @tab 4,294,967,295
@item 64-bit signed integer @tab @minus{}9,223,372,036,854,775,808 @tab 9,223,372,036,854,775,807
@item 64-bit unsigned integer @tab 0 @tab 18,446,744,073,709,551,615
-@item Single-precision floating point (approximate) @tab @code{1.175494e-38} @tab @code{3.402823e+38}
-@item Double-precision floating point (approximate) @tab @code{2.225074e-308} @tab @code{1.797693e+308}
+@iftex
+@item Single-precision floating point (approximate) @tab @math{1.175494 @cdot 10^{-38}} @tab @math{3.402823 @cdot 10^{38}}
+@item Double-precision floating point (approximate) @tab @math{2.225074 @cdot 10^{-308}} @tab @math{1.797693 @cdot 10^{308}}
+@end iftex
+@ifnottex
+@ifnotdocbook
+@item Single-precision floating point (approximate) @tab 1.175494e-38 @tab 3.402823e38
+@item Double-precision floating point (approximate) @tab 2.225074e-308 @tab 1.797693e308
+@end ifnotdocbook
+@end ifnottex
+@ifdocbook
+@item Single-precision floating point (approximate) @tab
+@c FIXME: Use @sup here for superscript
+@docbook
+1.175494 &sdot; 10<superscript>-38</superscript>
+@end docbook
+@tab
+@docbook
+3.402823 &sdot; 10<superscript>38</superscript>
+@end docbook
+@item Double-precision floating point (approximate) @tab
+@docbook
+2.225074 &sdot; 10<superscript>-308</superscript>
+@end docbook
+@tab
+@docbook
+1.797693 &sdot; 10<superscript>308</superscript>
+@end docbook
+@end ifdocbook
@end multitable
@end float
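As a cross-check, a C program can print these limits from the implementation's own @code{<stdint.h>} and @code{<float.h>} headers. The following is a minimal sketch. It assumes a C99 compiler on a system with IEEE 754 floating point, and the function name @code{print_numeric_ranges()} is hypothetical. @code{FLT_MIN} and @code{DBL_MIN} are the smallest positive @emph{normalized} values. Subnormal values can be smaller still.

```c
#include <float.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print the ranges from table-numeric-ranges
   as reported by this C implementation's <stdint.h> and <float.h>.  */
void print_numeric_ranges(void)
{
    printf("32-bit signed:   %ld .. %ld\n", (long) INT32_MIN, (long) INT32_MAX);
    printf("32-bit unsigned: 0 .. %lu\n", (unsigned long) UINT32_MAX);
    printf("64-bit signed:   %lld .. %lld\n",
           (long long) INT64_MIN, (long long) INT64_MAX);
    printf("64-bit unsigned: 0 .. %llu\n", (unsigned long long) UINT64_MAX);
    /* FLT_MIN and DBL_MIN are the smallest positive normalized values.  */
    printf("single: %e .. %e\n", FLT_MIN, FLT_MAX);
    printf("double: %e .. %e\n", DBL_MIN, DBL_MAX);
}
```

On such a system, the single-precision line prints @samp{1.175494e-38 .. 3.402823e+38} and the double-precision line prints @samp{2.225074e-308 .. 1.797693e+308}.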
@@ -30310,7 +30414,7 @@ signed. The possible ranges of values are shown in @ref{table-numeric-ranges}.
The rest of this @value{CHAPTER} uses a number of terms. Here are some
informal definitions that should help you work your way through the material
-here.
+here:
@table @dfn
@item Accuracy
@@ -30331,7 +30435,7 @@ A special value representing infinity. Operations involving another
number and infinity produce infinity.
@item NaN
-``Not A Number.''@footnote{Thanks to Michael Brennan for this description,
+``Not a number.''@footnote{Thanks to Michael Brennan for this description,
which we have paraphrased, and for the examples.} A special value that
results from attempting a calculation that has no answer as a real number.
In such a case, programs can either receive a floating-point exception,
@@ -30374,8 +30478,8 @@ formula:
@end display
@noindent
-Here, @var{prec} denotes the binary precision
-(measured in bits) and @var{dps} (short for decimal places)
+Here, @emph{prec} denotes the binary precision
+(measured in bits) and @emph{dps} (short for decimal places)
is the decimal digits.
@item Rounding mode
@@ -30383,7 +30487,7 @@ How numbers are rounded up or down when necessary.
More details are provided later.
@item Significand
-A floating-point value consists the significand multiplied by 10
+A floating-point value consists of the significand multiplied by 10
to the power of the exponent. For example, in @code{1.2345e67},
the significand is @code{1.2345}.
@@ -30407,7 +30511,7 @@ to allow greater precisions and larger exponent ranges.
(@command{awk} uses only the 64-bit double-precision format.)
@ref{table-ieee-formats} lists the precision and exponent
-field values for the basic IEEE 754 binary formats:
+field values for the basic IEEE 754 binary formats.
@float Table,table-ieee-formats
@caption{Basic IEEE format values}
@@ -30471,12 +30575,12 @@ for more information.
@author Teen Talk Barbie, July 1992
@end quotation
-This @value{SECTION} provides a high level overview of the issues
+This @value{SECTION} provides a high-level overview of the issues
involved when doing lots of floating-point arithmetic.@footnote{There
is a very nice @uref{http://www.validlab.com/goldberg/paper.pdf,
paper on floating-point arithmetic} by David Goldberg, ``What Every
-Computer Scientist Should Know About Floating-point Arithmetic,''
-@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. This is
+Computer Scientist Should Know About Floating-Point Arithmetic,''
+@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03): 5-48. This is
worth reading if you are interested in the details, but it does require
a background in computer science.}
The discussion applies to both hardware and arbitrary-precision
@@ -30545,7 +30649,7 @@ $ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425}
Often the error is so small you do not even notice it, and if you do,
you can always specify how much precision you would like in your output.
-Usually this is a format string like @code{"%.15g"}, which when
+Usually this is a format string like @code{"%.15g"}, which, when
used in the previous example, produces an output identical to the input.
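The same effect can be seen from C. A format such as @code{"%.15g"} is usually enough for display. However, 17 significant digits are needed to guarantee that any double-precision value reads back exactly. Here is a minimal sketch, assuming IEEE 754 doubles and a C99 library. The helper @code{round_trips()} is hypothetical. It formats a value, converts the text back with @code{strtod()}, and compares the result with the original:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Format x with `digits` significant digits using "%.*g", convert the
   text back with strtod(), and report whether exactly the same double
   was recovered.  */
bool round_trips(double x, int digits)
{
    char buf[64];  /* ample for up to about 40 significant digits */

    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

For example, @code{round_trips(0.1 + 0.2, 15)} is false. The 15-digit text @samp{0.3} reads back as a slightly different double. In contrast, @code{round_trips(0.1 + 0.2, 17)} is true.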
@node Comparing FP Values
@@ -30584,7 +30688,7 @@ else
The loss of accuracy during a single computation with floating-point
numbers usually isn't enough to worry about. However, if you compute a
-value which is the result of a sequence of floating-point operations,
+value that is the result of a sequence of floating-point operations,
the error can accumulate and greatly affect the computation itself.
Here is an attempt to compute the value of @value{PI} using one of its
many series representations:
@@ -30637,7 +30741,7 @@ no easy answers. The standard rules of algebra often do not apply
when using floating-point arithmetic.
Among other things, the distributive and associative laws
do not hold completely, and order of operation may be important
-for your computation. Rounding error, cumulative precision loss
+for your computation. Rounding error, cumulative precision loss,
and underflow are often troublesome.
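For instance, here is a minimal C sketch showing that the associative law can fail for addition. It assumes IEEE 754 double-precision arithmetic without extra intermediate precision, as with SSE2 on x86-64. It also assumes you do not compile with @option{-ffast-math}, which would let the compiler reorder the additions. The name @code{sums_agree()} is hypothetical:

```c
#include <stdbool.h>

/* Report whether (a + b) + c and a + (b + c) give the same double.
   In exact arithmetic they always would; in IEEE 754 double precision
   the rounding of the intermediate sum can make them differ.  */
bool sums_agree(double a, double b, double c)
{
    double left = (a + b) + c;
    double right = a + (b + c);

    return left == right;
}
```

Under these assumptions, @code{sums_agree(0.1, 0.2, 0.3)} returns false. The expression @code{(0.1 + 0.2) + 0.3} rounds to 0.6000000000000001, while @code{0.1 + (0.2 + 0.3)} rounds to the double nearest 0.6.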
When @command{gawk} tests the expressions @samp{0.1 + 12.2} and
@@ -30677,7 +30781,8 @@ by our earlier attempt to compute the value of @value{PI}.
Extra precision can greatly enhance the stability and the accuracy
of your computation in such cases.
-Repeated addition is not necessarily equivalent to multiplication
+Additionally, you should understand that
+repeated addition is not necessarily equivalent to multiplication
in floating-point arithmetic. In the example in
@ref{Errors accumulate}:
@@ -30740,7 +30845,7 @@ to emulate an IEEE 754 binary format.
@float Table,table-predefined-precision-strings
@caption{Predefined precision strings for @code{PREC}}
@multitable {@code{"double"}} {12345678901234567890123456789012345}
-@headitem @code{PREC} @tab IEEE 754 Binary Format
+@headitem @code{PREC} @tab IEEE 754 binary format
@item @code{"half"} @tab 16-bit half-precision
@item @code{"single"} @tab Basic 32-bit single precision
@item @code{"double"} @tab Basic 64-bit double precision
@@ -30772,7 +30877,6 @@ than the default and cannot use a command-line assignment to @code{PREC},
you should either specify the constant as a string, or as a rational
number, whenever possible. The following example illustrates the
differences among various ways to print a floating-point constant:
-@end quotation
@example
$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'}
@@ -30784,22 +30888,23 @@ $ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'}
$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'}
@print{} 0.1000000000000000000000000
@end example
+@end quotation
@node Setting the rounding mode
@subsection Setting the Rounding Mode
The @code{ROUNDMODE} variable provides
-program level control over the rounding mode.
+program-level control over the rounding mode.
The correspondence between @code{ROUNDMODE} and the IEEE
rounding modes is shown in @ref{table-gawk-rounding-modes}.
@float Table,table-gawk-rounding-modes
@caption{@command{gawk} rounding modes}
@multitable @columnfractions .45 .30 .25
-@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE}
+@headitem Rounding mode @tab IEEE name @tab @code{ROUNDMODE}
@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"}
-@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"}
-@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"}
+@item Round toward positive infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"}
+@item Round toward negative infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"}
@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"}
@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"}
@end multitable
@@ -30860,8 +30965,8 @@ distributes upward and downward rounds of exact halves, which might
cause any accumulating round-off error to cancel itself out. This is the
default rounding mode for IEEE 754 computing functions and operators.
-The other rounding modes are rarely used. Round toward positive infinity
-(@code{roundTowardPositive}) and round toward negative infinity
+The other rounding modes are rarely used. Rounding toward positive infinity
+(@code{roundTowardPositive}) and toward negative infinity
(@code{roundTowardNegative}) are often used to implement interval
arithmetic, where you adjust the rounding mode to calculate upper and
lower bounds for the range of output. The @code{roundTowardZero} mode can
@@ -30903,6 +31008,7 @@ the following computes
@end docbook
the result of which is beyond the
limits of ordinary hardware double-precision floating-point values:
+@c FIXME: Use @sup here for superscript
@example
$ @kbd{gawk -M 'BEGIN @{}
@@ -30918,17 +31024,17 @@ If instead you were to compute the same value using arbitrary-precision
floating-point values, the precision needed for correct output (using
the formula
@iftex
-@math{prec = 3.322 @cdot dps}),
+@math{prec = 3.322 @cdot dps})
would be @math{3.322 @cdot 183231},
@end iftex
@ifnottex
@ifnotdocbook
-@samp{prec = 3.322 * dps}),
+@samp{prec = 3.322 * dps})
would be 3.322 x 183231,
@end ifnotdocbook
@end ifnottex
@docbook
-<emphasis>prec</emphasis> = 3.322 &sdot; <emphasis>dps</emphasis>),
+<emphasis>prec</emphasis> = 3.322 &sdot; <emphasis>dps</emphasis>)
would be
<emphasis>prec</emphasis> = 3.322 &sdot; 183231, @c
@end docbook
@@ -30966,7 +31072,7 @@ interface to process arbitrary-precision integers or mixed-mode numbers
as needed by an operation or function. In such a case, the precision is
set to the minimum value necessary for exact conversion, and the working
precision is not used for this purpose. If this is not what you need or
-want, you can employ a subterfuge, and convert the integer to floating
+want, you can employ a subterfuge and convert the integer to floating
point first, like this:
@example
@@ -31103,7 +31209,7 @@ word sizes. See
@node POSIX Floating Point Problems
@section Standards Versus Existing Practice
-Historically, @command{awk} has converted any non-numeric looking string
+Historically, @command{awk} has converted any nonnumeric-looking string
to the numeric value zero, when required. Furthermore, the original
definition of the language and the original POSIX standards specified that
@command{awk} only understands decimal numbers (base 10), and not octal
@@ -31120,8 +31226,8 @@ notation (e.g., @code{0xDEADBEEF}). (Note: data values, @emph{not}
source code constants.)
@item
-Support for the special IEEE 754 floating-point values ``Not A Number''
-(NaN), positive Infinity (``inf''), and negative Infinity (``@minus{}inf'').
+Support for the special IEEE 754 floating-point values ``not a number''
+(NaN), positive infinity (``inf''), and negative infinity (``@minus{}inf'').
In particular, the format for these values is as specified by the ISO 1999
C standard, which ignores case and can allow implementation-dependent additional
characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}.
@@ -31142,21 +31248,21 @@ values is also a very severe departure from historical practice.
@end itemize
The second problem is that the @command{gawk} maintainer feels that this
-interpretation of the standard, which requires a certain amount of
+interpretation of the standard, which required a certain amount of
``language lawyering'' to arrive at in the first place, was not even
-intended by the standard developers. In other words, ``we see how you
+intended by the standard developers. In other words, ``We see how you
got where you are, but we don't think that that's where you want to be.''
Recognizing these issues, but attempting to provide compatibility
with the earlier versions of the standard,
the 2008 POSIX standard added explicit wording to allow, but not require,
that @command{awk} support hexadecimal floating-point values and
-special values for ``Not A Number'' and infinity.
+special values for ``not a number'' and infinity.
Although the @command{gawk} maintainer continues to feel that
providing those features is inadvisable,
nevertheless, on systems that support IEEE floating point, it seems
-reasonable to provide @emph{some} way to support NaN and Infinity values.
+reasonable to provide @emph{some} way to support NaN and infinity values.
The solution implemented in @command{gawk} is as follows:
@itemize @value{BULLET}
@@ -31176,7 +31282,7 @@ $ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
@end example
@item
-Without @option{--posix}, @command{gawk} interprets the four strings
+Without @option{--posix}, @command{gawk} interprets the four string values
@samp{+inf},
@samp{-inf},
@samp{+nan},
@@ -31198,7 +31304,7 @@ $ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
@end example
@command{gawk} ignores case in the four special values.
-Thus @samp{+nan} and @samp{+NaN} are the same.
+Thus, @samp{+nan} and @samp{+NaN} are the same.
@end itemize
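For comparison, the ISO C 1999 @code{strtod()} function accepts @samp{inf}, @samp{infinity}, and @samp{nan} in any letter case, with or without a sign. It also converts hexadecimal strings such as @samp{0xDeadBeef}, which @command{gawk} converts in input data only when @option{--posix} is given. The sketch below illustrates the C library's behavior, not @command{gawk}'s own conversion code. The helper names are hypothetical:

```c
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>

/* Convert text with the C99 strtod(), which accepts "inf", "infinity",
   and "nan" in any letter case (optionally signed) as well as
   hexadecimal forms such as "0xDeadBeef".  */
double parse_number(const char *text)
{
    return strtod(text, NULL);
}

/* Classify results without <math.h>: NaN is the only value that
   compares unequal to itself, and infinities lie beyond +/-DBL_MAX.  */
bool is_nan(double x)     { return x != x; }
bool is_pos_inf(double x) { return x > DBL_MAX; }
bool is_neg_inf(double x) { return x < -DBL_MAX; }
```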
@node Floating point summary
@@ -31211,9 +31317,9 @@ values. Standard @command{awk} uses double-precision
floating-point values.
@item
-In the early 1990s, Barbie mistakenly said ``Math class is tough!''
+In the early 1990s Barbie mistakenly said, ``Math class is tough!''
Although math isn't tough, floating-point arithmetic isn't the same
-as pencil and paper math, and care must be taken:
+as pencil-and-paper math, and care must be taken:
@c nested list
@itemize @value{MINUS}
@@ -31246,11 +31352,11 @@ arithmetic. Use @code{PREC} to set the precision in bits, and
@item
With @option{-M}, @command{gawk} performs
arbitrary-precision integer arithmetic using the GMP library.
-This is faster and more space efficient than using MPFR for
+This is faster and more space-efficient than using MPFR for
the same calculations.
@item
-There are several ``dark corners'' with respect to floating-point
+There are several areas with respect to floating-point
numbers where @command{gawk} disagrees with the POSIX standard.
It pays to be aware of them.
@@ -31258,7 +31364,7 @@ It pays to be aware of them.
Overall, there is no need to be unduly suspicious about the results from
floating-point arithmetic. The lesson to remember is that floating-point
arithmetic is always more complex than arithmetic using pencil and
-paper. In order to take advantage of the power of computer floating point,
+paper. In order to take advantage of the power of floating-point arithmetic,
you need to know its limitations and work within them. For most casual
use of floating-point arithmetic, you will often get the expected result
if you simply round the display of your final results to the correct number
@@ -31319,7 +31425,7 @@ Extensions are useful because they allow you (of course) to extend
@command{gawk}'s functionality. For example, they can provide access to
system calls (such as @code{chdir()} to change directory) and to other
C library routines that could be of use. As with most software,
-``the sky is the limit;'' if you can imagine something that you might
+``the sky is the limit''; if you can imagine something that you might
want to do and can write in C or C++, you can write an extension to do it!
Extensions are written in C or C++, using the @dfn{application programming
@@ -31327,7 +31433,7 @@ interface} (API) defined for this purpose by the @command{gawk}
developers. The rest of this @value{CHAPTER} explains
the facilities that the API provides and how to use
them, and presents a small example extension. In addition, it documents
-the sample extensions included in the @command{gawk} distribution,
+the sample extensions included in the @command{gawk} distribution
and describes the @code{gawkextlib} project.
@ifclear FOR_PRINT
@xref{Extension Design}, for a discussion of the extension mechanism
@@ -31480,7 +31586,7 @@ Some other bits and pieces:
@itemize @value{BULLET}
@item
The API provides access to @command{gawk}'s @code{do_@var{xxx}} values,
-reflecting command-line options, like @code{do_lint}, @code{do_profiling}
+reflecting command-line options, like @code{do_lint}, @code{do_profiling},
and so on (@pxref{Extension API Variables}).
These are informational: an extension cannot affect their values
inside @command{gawk}. In addition, attempting to assign to them
@@ -31525,7 +31631,7 @@ This (rather large) @value{SECTION} describes the API in detail.
@node Extension API Functions Introduction
@subsection Introduction
-Access to facilities within @command{gawk} are made available
+Access to facilities within @command{gawk} is achieved
by calling through function pointers passed into your extension.
API function pointers are provided for the following kinds of operations:
@@ -31553,7 +31659,7 @@ Output wrappers
Two-way processors
@end itemize
-All of these are discussed in detail, later in this @value{CHAPTER}.
+All of these are discussed in detail later in this @value{CHAPTER}.
@item
Printing fatal, warning, and ``lint'' warning messages.
@@ -31591,7 +31697,7 @@ Creating a new array
Clearing an array
@item
-Flattening an array for easy C style looping over all its indices and elements
+Flattening an array for easy C-style looping over all its indices and elements
@end itemize
@item
@@ -31607,8 +31713,9 @@ The following types, macros, and/or functions are referenced
in @file{gawkapi.h}. For correct use, you must therefore include the
corresponding standard header file @emph{before} including @file{gawkapi.h}:
+@c FIXME: Make this a float at some point.
@multitable {@code{memset()}, @code{memcpy()}} {@code{<sys/types.h>}}
-@headitem C Entity @tab Header File
+@headitem C entity @tab Header file
@item @code{EOF} @tab @code{<stdio.h>}
@item Values for @code{errno} @tab @code{<errno.h>}
@item @code{FILE} @tab @code{<stdio.h>}
@@ -31634,7 +31741,7 @@ Doing so, however, is poor coding practice.
Although the API only uses ISO C 90 features, there is an exception; the
``constructor'' functions use the @code{inline} keyword. If your compiler
does not support this keyword, you should either place
-@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a
+@samp{-Dinline=''} on your command line or use the GNU Autotools and include a
@file{config.h} file in your extensions.
@item
@@ -31642,7 +31749,7 @@ All pointers filled in by @command{gawk} point to memory
managed by @command{gawk} and should be treated by the extension as
read-only. Memory for @emph{all} strings passed into @command{gawk}
from the extension @emph{must} come from calling one of
-@code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()},
+@code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()},
and is managed by @command{gawk} from then on.
@item
@@ -31656,7 +31763,7 @@ characters are allowed.
By intent, strings are maintained using the current multibyte encoding (as
defined by @env{LC_@var{xxx}} environment variables) and not using wide
characters. This matches how @command{gawk} stores strings internally
-and also how characters are likely to be input and output from files.
+and also how characters are likely to be input into and output from files.
@end quotation
@item
@@ -31701,6 +31808,8 @@ general-purpose use. Additional, more specialized, data structures are
introduced in subsequent @value{SECTION}s, together with the functions
that use them.
+The general-purpose types and structures are as follows:
+
@table @code
@item typedef void *awk_ext_id_t;
A value of this type is received from @command{gawk} when an extension is loaded.
@@ -31717,7 +31826,7 @@ while allowing @command{gawk} to use them as it needs to.
@itemx @ @ @ @ awk_false = 0,
@itemx @ @ @ @ awk_true
@itemx @} awk_bool_t;
-A simple boolean type.
+A simple Boolean type.
@item typedef struct awk_string @{
@itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */
@@ -31763,7 +31872,7 @@ The @code{val_type} member indicates what kind of value the
@itemx #define array_cookie@ @ @ u.a
@itemx #define scalar_cookie@ @ u.scl
@itemx #define value_cookie@ @ @ u.vc
-These macros make accessing the fields of the @code{awk_value_t} more
+Using these macros makes accessing the fields of the @code{awk_value_t} more
readable.
@item typedef void *awk_scalar_t;
@@ -31786,7 +31895,7 @@ indicates what is in the @code{union}.
Representing numbers is easy---the API uses a C @code{double}. Strings
require more work. Because @command{gawk} allows embedded @sc{nul} bytes
in string values, a string must be represented as a pair containing a
-data-pointer and length. This is the @code{awk_string_t} type.
+data pointer and length. This is the @code{awk_string_t} type.
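The following minimal stand-alone sketch shows why the length must travel with the pointer. The type @code{counted_string} is a hypothetical stand-in modeled on the pointer-and-length pair described for @code{awk_string_t}. It is not the declaration from @file{gawkapi.h}. Code that relied on @code{strlen()} would stop at the first @sc{nul} byte, whereas the stored length covers all of the data:

```c
#include <stddef.h>

/* Hypothetical stand-in modeled on the pointer-and-length pair of
   awk_string_t; this is NOT the declaration from gawkapi.h.  */
typedef struct {
    char *str;   /* data; may contain NUL bytes */
    size_t len;  /* number of bytes of data in str */
} counted_string;

/* Count the bytes equal to c, walking all s->len bytes instead of
   stopping at the first NUL as strlen()-based code would.  */
size_t count_byte(const counted_string *s, char c)
{
    size_t n = 0;

    for (size_t i = 0; i < s->len; i++)
        if (s->str[i] == c)
            n++;
    return n;
}
```

Consider the five bytes @samp{a}, @samp{b}, @sc{nul}, @samp{a}, @samp{b}. For this data, @code{strlen()} reports 2, but @code{count_byte()} finds both @samp{a} characters.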
Identifiers (i.e., the names of global variables) can be associated
with either scalar values or with arrays. In addition, @command{gawk}
@@ -31799,12 +31908,12 @@ of the @code{union} as if they were fields in a @code{struct}; this
is a common coding practice in C. Such code is easier to write and to
read, but it remains @emph{your} responsibility to make sure that
the @code{val_type} member correctly reflects the type of the value in
-the @code{awk_value_t}.
+the @code{awk_value_t} struct.
Conceptually, the first three members of the @code{union} (number, string,
and array) are all that is needed for working with @command{awk} values.
However, because the API provides routines for accessing and changing
-the value of global scalar variables only by using the variable's name,
+the value of a global scalar variable only by using the variable's name,
there is a performance penalty: @command{gawk} must find the variable
each time it is accessed and changed. This turns out to be a real issue,
not just a theoretical one.
@@ -31822,7 +31931,9 @@ See also the entry for ``Cookie'' in the @ref{Glossary}.
object for that variable, and then use
the cookie for getting the variable's value or for changing the variable's
value.
-This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro.
+The @code{awk_scalar_t} type holds a scalar cookie, and the
+@code{scalar_cookie} macro provides access to the value of that type
+in the @code{awk_value_t} struct.
Given a scalar cookie, @command{gawk} can directly retrieve or
modify the value, as required, without having to find it first.
@@ -31831,8 +31942,8 @@ If you know that you wish to
use the same numeric or string @emph{value} for one or more variables,
you can create the value once, retaining a @dfn{value cookie} for it,
and then pass in that value cookie whenever you wish to set the value of a
-variable. This saves both storage space within the running @command{gawk}
-process as well as the time needed to create the value.
+variable. This saves storage space within the running @command{gawk}
+process and reduces the time needed to create the value.
@node Memory Allocation Functions
@subsection Memory Allocation Functions and Convenience Macros
@@ -31860,13 +31971,13 @@ be passed to @command{gawk}.
@item void gawk_free(void *ptr);
Call the correct version of @code{free()} to release storage that was
-allocated with @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}.
+allocated with @code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}.
@end table
The API has to provide these functions because it is possible
for an extension to be compiled and linked against a different
version of the C library than was used for the @command{gawk}
-executable.@footnote{This is more common on MS-Windows systems, but
+executable.@footnote{This is more common on MS-Windows systems, but it
can happen on Unix-like systems as well.} If @command{gawk} were
to use its version of @code{free()} when the memory came from an
unrelated version of @code{malloc()}, unexpected behavior would
@@ -31876,7 +31987,7 @@ Two convenience macros may be used for allocating storage
from @code{gawk_malloc()} and
@code{gawk_realloc()}. If the allocation fails, they cause @command{gawk}
to exit with a fatal error message. They should be used as if they were
-procedure calls that do not return a value.
+procedure calls that do not return a value:
@table @code
@item #define emalloc(pointer, type, size, message) @dots{}
@@ -31913,7 +32024,7 @@ make_malloced_string(message, strlen(message), & result);
@end example
@item #define erealloc(pointer, type, size, message) @dots{}
-This is like @code{emalloc()}, but it calls @code{gawk_realloc()},
+This is like @code{emalloc()}, but it calls @code{gawk_realloc()}
instead of @code{gawk_malloc()}.
The arguments are the same as for the @code{emalloc()} macro.
@end table
@@ -31928,28 +32039,28 @@ the way that extension code would use them:
@table @code
@item static inline awk_value_t *
-@itemx make_const_string(const char *string, size_t length, awk_value_t *result)
+@itemx make_const_string(const char *string, size_t length, awk_value_t *result);
This function creates a string value in the @code{awk_value_t} variable
pointed to by @code{result}. It expects @code{string} to be a C string constant
(or other string data), and automatically creates a @emph{copy} of the data
for storage in @code{result}. It returns @code{result}.
@item static inline awk_value_t *
-@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result)
+@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result);
This function creates a string value in the @code{awk_value_t} variable
pointed to by @code{result}. It expects @code{string} to be a @samp{char *}
-value pointing to data previously obtained from @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}. The idea here
+value pointing to data previously obtained from @code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}. The idea here
is that the data is passed directly to @command{gawk}, which assumes
responsibility for it. It returns @code{result}.
@item static inline awk_value_t *
-@itemx make_null_string(awk_value_t *result)
+@itemx make_null_string(awk_value_t *result);
This specialized function creates a null string (the ``undefined'' value)
in the @code{awk_value_t} variable pointed to by @code{result}.
It returns @code{result}.
@item static inline awk_value_t *
-@itemx make_number(double num, awk_value_t *result)
+@itemx make_number(double num, awk_value_t *result);
This function simply creates a numeric value in the @code{awk_value_t} variable
pointed to by @code{result}.
@end table
@@ -31989,7 +32100,7 @@ The fields are:
@table @code
@item const char *name;
The name of the new function.
-@command{awk} level code calls the function by this name.
+@command{awk}-level code calls the function by this name.
This is a regular C string.
Function names must obey the rules for @command{awk}
@@ -32003,7 +32114,7 @@ This is a pointer to the C function that provides the extension's
functionality.
The function must fill in @code{*result} with either a number
or a string. @command{gawk} takes ownership of any string memory.
-As mentioned earlier, string memory @strong{must} come from one of
+As mentioned earlier, string memory @emph{must} come from one of
@code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}.
The @code{num_actual_args} argument tells the C function how many
@@ -32055,20 +32166,20 @@ The @code{exit_status} parameter is the exit status value that
@command{gawk} intends to pass to the @code{exit()} system call.
@item arg0
-A pointer to private data which @command{gawk} saves in order to pass to
+A pointer to private data that @command{gawk} saves in order to pass to
the function pointed to by @code{funcp}.
@end table
@end table
-Exit callback functions are called in last-in-first-out (LIFO)
+Exit callback functions are called in last-in, first-out (LIFO)
order---that is, in the reverse order in which they are registered with
@command{gawk}.
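The calling order can be illustrated with a small, self-contained C sketch. It does not use @file{gawkapi.h}. The names @code{register_exit_cb()}, @code{run_exit_cbs()}, and @code{log_cb()}, and the fixed-size stack, are hypothetical stand-ins chosen for illustration. They are not @command{gawk}'s internal implementation. Only the callback shape (private data plus exit status) and the last-in, first-out order are meant to match the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: private data plus the intended exit status. */
typedef void (*exit_cb_t)(void *data, int exit_status);

#define MAX_EXIT_CBS 16

static exit_cb_t cb_funcs[MAX_EXIT_CBS];
static void *cb_args[MAX_EXIT_CBS];
static size_t cb_count = 0;

/* Push a callback and the private data to hand back to it later. */
static void register_exit_cb(exit_cb_t funcp, void *arg0)
{
    assert(cb_count < MAX_EXIT_CBS);
    cb_funcs[cb_count] = funcp;
    cb_args[cb_count] = arg0;
    cb_count++;
}

/* Pop callbacks until none remain; the most recently registered runs first. */
static void run_exit_cbs(int exit_status)
{
    while (cb_count > 0) {
        cb_count--;
        cb_funcs[cb_count](cb_args[cb_count], exit_status);
    }
}

/* Example callback: append the character that data points to onto a log. */
static char call_log[MAX_EXIT_CBS + 1];
static size_t log_len = 0;

static void log_cb(void *data, int exit_status)
{
    (void) exit_status;               /* unused in this example */
    call_log[log_len++] = *(const char *) data;
    call_log[log_len] = '\0';
}
```

Suppose callbacks that log @samp{A}, @samp{B}, and @samp{C} are registered in that order. Calling @code{run_exit_cbs(0)} then produces the log @samp{CBA}.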
@node Extension Version String
@subsubsection Registering An Extension Version String
-You can register a version string which indicates the name and
-version of your extension, with @command{gawk}, as follows:
+You can register a version string that indicates the name and
+version of your extension with @command{gawk}, as follows:
@table @code
@item void register_ext_version(const char *version);
@@ -32090,7 +32201,7 @@ of @code{RS} to find the end of the record, and then uses @code{FS}
Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}).
If you want, you can provide your own custom input parser. An input
-parser's job is to return a record to the @command{gawk} record processing
+parser's job is to return a record to the @command{gawk} record-processing
code, along with indicators for the value and length of the data to be
used for @code{RT}, if any.
@@ -32108,9 +32219,9 @@ It should not change any state (variable values, etc.) within @command{gawk}.
@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf);
When @command{gawk} decides to hand control of the file over to the
input parser, it calls this function. This function in turn must fill
-in certain fields in the @code{awk_input_buf_t} structure, and ensure
+in certain fields in the @code{awk_input_buf_t} structure and ensure
that certain conditions are true. It should then return true. If an
-error of some kind occurs, it should not fill in any fields, and should
+error of some kind occurs, it should not fill in any fields and should
return false; then @command{gawk} will not use the input parser.
The details are presented shortly.
@end table
@@ -32203,7 +32314,7 @@ in the @code{struct stat}, or any combination of these factors.
Once @code{@var{XXX}_can_take_file()} has returned true, and
@command{gawk} has decided to use your input parser, it calls
-@code{@var{XXX}_take_control_of()}. That function then fills one of
+@code{@var{XXX}_take_control_of()}. That function then fills
either the @code{get_record} field or the @code{read_func} field in
the @code{awk_input_buf_t}. It must also ensure that @code{fd} is @emph{not}
set to @code{INVALID_HANDLE}. The following list describes the fields that
@@ -32225,21 +32336,21 @@ records. Said function is the core of the input parser. Its behavior
is described in the text following this list.
@item ssize_t (*read_func)();
-This function pointer should point to function that has the
+This function pointer should point to a function that has the
same behavior as the standard POSIX @code{read()} system call.
It is an alternative to the @code{get_record} pointer. Its behavior
is also described in the text following this list.
@item void (*close_func)(struct awk_input *iobuf);
This function pointer should point to a function that does
-the ``tear down.'' It should release any resources allocated by
+the ``teardown.'' It should release any resources allocated by
@code{@var{XXX}_take_control_of()}. It may also close the file. If it
does so, it should set the @code{fd} field to @code{INVALID_HANDLE}.
If @code{fd} is still not @code{INVALID_HANDLE} after the call to this
function, @command{gawk} calls the regular @code{close()} system call.
-Having a ``tear down'' function is optional. If your input parser does
+Having a ``teardown'' function is optional. If your input parser does
not need it, do not set this field. Then, @command{gawk} calls the
regular @code{close()} system call on the file descriptor, so it should
be valid.
@@ -32250,7 +32361,7 @@ input records. The parameters are as follows:
@table @code
@item char **out
-This is a pointer to a @code{char *} variable which is set to point
+This is a pointer to a @code{char *} variable that is set to point
to the record. @command{gawk} makes its own copy of the data, so
the extension must manage this storage.
@@ -32303,17 +32414,17 @@ set this field explicitly.
You must choose one method or the other: either a function that
returns a record, or one that returns raw data. In particular,
if you supply a function to get a record, @command{gawk} will
-call it, and never call the raw read function.
+call it, and will never call the raw read function.
@end quotation
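The following self-contained C sketch illustrates the record-returning approach. It splits an in-memory buffer into newline-terminated records. It is hypothetical: @code{struct line_parser} and @code{get_line_record()} were invented for this example. The signature is also simplified, because the @code{get_record} callback described above takes additional parameters, which are omitted here. The sketch shows the obligations listed above:

@itemize @bullet
@item
Point @code{*out} at storage that the parser keeps ownership of.
@item
Return the record length.
@item
Describe the data to use for @code{RT}.
@end itemize

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>    /* EOF */
#include <string.h>

/* Hypothetical parser state: a buffer the parser owns and a read offset. */
struct line_parser {
    char *buf;
    size_t len;
    size_t pos;
};

/*
 * Return the length of the next record, or EOF when the input is exhausted.
 * *out points into p->buf; the parser, not the caller, owns that storage.
 * *rt_start and *rt_len describe the terminator (the value for RT), or
 * NULL and 0 when the final record has no terminator.
 */
static int get_line_record(struct line_parser *p, char **out,
                           char **rt_start, size_t *rt_len)
{
    char *start, *nl;
    size_t reclen;

    if (p->pos >= p->len)
        return EOF;

    start = p->buf + p->pos;
    nl = memchr(start, '\n', p->len - p->pos);
    if (nl != NULL) {
        reclen = (size_t) (nl - start);
        *rt_start = nl;
        *rt_len = 1;
        p->pos += reclen + 1;           /* skip past the newline too */
    } else {
        reclen = p->len - p->pos;       /* last record, unterminated */
        *rt_start = NULL;
        *rt_len = 0;
        p->pos = p->len;
    }
    *out = start;
    return (int) reclen;
}
```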
@command{gawk} ships with a sample extension that reads directories,
-returning records for each entry in the directory (@pxref{Extension
+returning records for each entry in a directory (@pxref{Extension
Sample Readdir}). You may wish to use that code as a guide for writing
your own input parser.
When writing an input parser, you should think about (and document)
how it is expected to interact with @command{awk} code. You may want
-it to always be called, and take effect as appropriate (as the
+it to always be called, and to take effect as appropriate (as the
@code{readdir} extension does). Or you may want it to take effect
based upon the value of an @command{awk} variable, as the XML extension
from the @code{gawkextlib} project does (@pxref{gawkextlib}).
@@ -32423,7 +32534,7 @@ a pointer to any private data associated with the file.
These pointers should be set to point to functions that perform
the same tasks as the corresponding @code{<stdio.h>} functions, if appropriate.
@command{gawk} uses these function pointers for all output.
-@command{gawk} initializes the pointers to point to internal, ``pass through''
+@command{gawk} initializes the pointers to point to internal ``pass-through''
functions that just call the regular @code{<stdio.h>} functions, so an
extension only needs to redefine those functions that are appropriate for
what it does.
@@ -32434,7 +32545,7 @@ upon the @code{name} and @code{mode} fields, and any additional state
(such as @command{awk} variable values) that is appropriate.
When @command{gawk} calls @code{@var{XXX}_take_control_of()}, that function should fill
-in the other fields, as appropriate, except for @code{fp}, which it should just
+in the other fields as appropriate, except for @code{fp}, which it should just
use normally.
You register your output wrapper with the following function:
@@ -32474,14 +32585,14 @@ The fields are as follows:
The name of the two-way processor.
@item awk_bool_t (*can_take_two_way)(const char *name);
-This function returns true if it wants to take over two-way I/O for this @value{FN}.
+The function pointed to by this field should return true if it wants to take over two-way I/O for this @value{FN}.
It should not change any state (variable
values, etc.) within @command{gawk}.
@item awk_bool_t (*take_control_of)(const char *name,
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf,
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf);
-This function should fill in the @code{awk_input_buf_t} and
+The function pointed to by this field should fill in the @code{awk_input_buf_t} and
@code{awk_output_buf_t} structures pointed to by @code{inbuf} and
@code{outbuf}, respectively. These structures were described earlier.
@@ -32510,7 +32621,7 @@ Register the two-way processor pointed to by @code{two_way_processor} with
You can print different kinds of warning messages from your
extension, as described here. Note that for these functions,
-you must pass in the extension id received from @command{gawk}
+you must pass in the extension ID received from @command{gawk}
when the extension was loaded:@footnote{Because the API uses only ISO C 90
features, it cannot make use of the ISO C 99 variadic macro feature to hide
that parameter. More's the pity.}
@@ -32563,7 +32674,7 @@ matches what you requested, the function returns true and fills
in the @code{awk_value_t} result.
Otherwise, the function returns false, and the @code{val_type}
member indicates the type of the actual value. You may then
-print an error message, or reissue the request for the actual
+print an error message or reissue the request for the actual
value type, as appropriate. This behavior is summarized in
@ref{table-value-types-returned}.
@@ -32596,32 +32707,32 @@ value type, as appropriate. This behavior is summarized in
<entry><para><emphasis role="bold">String</emphasis></para></entry>
<entry><para>String</para></entry>
<entry><para>String</para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
+ <entry><para>False</para></entry>
+ <entry><para>False</para></entry>
</row>
<row>
<entry></entry>
<entry><para><emphasis role="bold">Number</emphasis></para></entry>
<entry><para>Number if can be converted, else false</para></entry>
<entry><para>Number</para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
+ <entry><para>False</para></entry>
+ <entry><para>False</para></entry>
</row>
<row>
<entry><para><emphasis role="bold">Type</emphasis></para></entry>
<entry><para><emphasis role="bold">Array</emphasis></para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
+ <entry><para>False</para></entry>
+ <entry><para>False</para></entry>
<entry><para>Array</para></entry>
- <entry><para>false</para></entry>
+ <entry><para>False</para></entry>
</row>
<row>
<entry><para><emphasis role="bold">Requested</emphasis></para></entry>
<entry><para><emphasis role="bold">Scalar</emphasis></para></entry>
<entry><para>Scalar</para></entry>
<entry><para>Scalar</para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
+ <entry><para>False</para></entry>
+ <entry><para>False</para></entry>
</row>
<row>
<entry></entry>
@@ -32633,11 +32744,11 @@ value type, as appropriate. This behavior is summarized in
</row>
<row>
<entry></entry>
- <entry><para><emphasis role="bold">Value Cookie</emphasis></para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para>
- </entry><entry><para>false</para></entry>
+ <entry><para><emphasis role="bold">Value cookie</emphasis></para></entry>
+ <entry><para>False</para></entry>
+ <entry><para>False</para></entry>
+ <entry><para>False</para>
+ </entry><entry><para>False</para></entry>
</row>
</tbody>
</tgroup>
@@ -32655,12 +32766,12 @@ value type, as appropriate. This behavior is summarized in
@end tex
@multitable @columnfractions .166 .166 .198 .15 .15 .166
@headitem @tab @tab String @tab Number @tab Array @tab Undefined
-@item @tab @b{String} @tab String @tab String @tab false @tab false
-@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false
-@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false
-@item @b{Requested} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false
+@item @tab @b{String} @tab String @tab String @tab False @tab False
+@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab False @tab False
+@item @b{Type} @tab @b{Array} @tab False @tab False @tab Array @tab False
+@item @b{Requested} @tab @b{Scalar} @tab Scalar @tab Scalar @tab False @tab False
@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined
-@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false
+@item @tab @b{Value cookie} @tab False @tab False @tab False @tab False
@end multitable
@end ifnotdocbook
@end ifnotplaintext
@@ -32671,21 +32782,21 @@ value type, as appropriate. This behavior is summarized in
+------------+------------+-----------+-----------+
| String | Number | Array | Undefined |
+-----------+-----------+------------+------------+-----------+-----------+
-| | String | String | String | false | false |
+| | String | String | String | False | False |
| |-----------+------------+------------+-----------+-----------+
-| | Number | Number if | Number | false | false |
+| | Number | Number if | Number | False | False |
| | | can be | | | |
| | | converted, | | | |
| | | else false | | | |
| |-----------+------------+------------+-----------+-----------+
-| Type | Array | false | false | Array | false |
+| Type | Array | False | False | Array | False |
| Requested |-----------+------------+------------+-----------+-----------+
-| | Scalar | Scalar | Scalar | false | false |
+| | Scalar | Scalar | Scalar | False | False |
| |-----------+------------+------------+-----------+-----------+
| | Undefined | String | Number | Array | Undefined |
| |-----------+------------+------------+-----------+-----------+
-| | Value | false | false | false | false |
-| | Cookie | | | | |
+| | Value | False | False | False | False |
+| | cookie | | | | |
+-----------+-----------+------------+------------+-----------+-----------+
@end example
@end ifplaintext
@@ -32702,16 +32813,16 @@ passed to your extension function. They are:
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
Fill in the @code{awk_value_t} structure pointed to by @code{result}
-with the @code{count}'th argument. Return true if the actual
-type matches @code{wanted}, false otherwise. In the latter
+with the @code{count}th argument. Return true if the actual
+type matches @code{wanted}, and false otherwise. In the latter
case, @code{result@w{->}val_type} indicates the actual type
-(@pxref{table-value-types-returned}). Counts are zero based---the first
+(@pxref{table-value-types-returned}). Counts are zero-based---the first
argument is numbered zero, the second one, and so on. @code{wanted}
indicates the type of value expected.
@item awk_bool_t set_argument(size_t count, awk_array_t array);
Convert a parameter that was undefined into an array; this provides
-call-by-reference for arrays. Return false if @code{count} is too big,
+call by reference for arrays. Return false if @code{count} is too big,
or if the argument's type is not undefined. @DBXREF{Array Manipulation}
for more information on creating arrays.
@end table
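The value-type table shown earlier (@pxref{table-value-types-returned}) can also be read as a decision procedure. The following self-contained C sketch encodes it. The @code{enum val_type} constants and @code{lookup_result()} are illustrative names. They are not the @code{awk_valtype_t} values from @file{gawkapi.h}. The @code{convertible} flag stands in for @command{gawk}'s own decision about whether a string value can be converted to a number.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative type names for the rows and columns of the table. */
enum val_type {
    T_STRING, T_NUMBER, T_ARRAY, T_UNDEFINED, T_SCALAR, T_VALUE_COOKIE
};

/*
 * Given the requested type and the actual type of a value, store the
 * type that is returned in *got and return true, or return false,
 * following the table.  convertible matters only when a number is
 * requested and the actual value is a string.
 */
static bool lookup_result(enum val_type requested, enum val_type actual,
                          bool convertible, enum val_type *got)
{
    switch (requested) {
    case T_UNDEFINED:                 /* accept whatever is there */
        *got = actual;
        return true;
    case T_VALUE_COOKIE:              /* every cell in this row is false */
        return false;
    case T_ARRAY:
        if (actual != T_ARRAY)
            return false;
        *got = T_ARRAY;
        return true;
    case T_STRING:
    case T_SCALAR:                    /* strings and numbers both qualify */
        if (actual != T_STRING && actual != T_NUMBER)
            return false;
        *got = requested;
        return true;
    case T_NUMBER:
        if (actual == T_NUMBER || (actual == T_STRING && convertible)) {
            *got = T_NUMBER;
            return true;
        }
        return false;
    }
    return false;
}
```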
@@ -32735,8 +32846,9 @@ allows you to create and release cached values.
The following routines provide the ability to access and update
global @command{awk}-level variables by name. In compiler terminology,
identifiers of different kinds are termed @dfn{symbols}, thus the ``sym''
-in the routines' names. The data structure which stores information
+in the routines' names. The data structure that stores information
about symbols is termed a @dfn{symbol table}.
+The functions are as follows:
@table @code
@item awk_bool_t sym_lookup(const char *name,
@@ -32745,14 +32857,14 @@ about symbols is termed a @dfn{symbol table}.
Fill in the @code{awk_value_t} structure pointed to by @code{result}
with the value of the variable named by the string @code{name}, which is
a regular C string. @code{wanted} indicates the type of value expected.
-Return true if the actual type matches @code{wanted}, false otherwise.
+Return true if the actual type matches @code{wanted}, and false otherwise.
In the latter case, @code{result->val_type} indicates the actual type
(@pxref{table-value-types-returned}).
@item awk_bool_t sym_update(const char *name, awk_value_t *value);
Update the variable named by the string @code{name}, which is a regular
C string. The variable is added to @command{gawk}'s symbol table
-if it is not there. Return true if everything worked, false otherwise.
+if it is not there. Return true if everything worked, and false otherwise.
Changing types (scalar to array or vice versa) of an existing variable
is @emph{not} allowed, nor may this routine be used to update an array.
@@ -32777,7 +32889,7 @@ populate it.
A @dfn{scalar cookie} is an opaque handle that provides access
to a global variable or array. It is an optimization that
avoids looking up variables in @command{gawk}'s symbol table every time
-access is needed. This was discussed earlier in @ref{General Data Types}.
+access is needed. This was discussed earlier, in @ref{General Data Types}.
The following functions let you work with scalar cookies:
@@ -32893,7 +33005,7 @@ and carefully check the return values from the API functions.
@subsubsection Creating and Using Cached Values
The routines in this section allow you to create and release
-cached values. As with scalar cookies, in theory, cached values
+cached values. Like scalar cookies, in theory, cached values
are not necessary. You can create numbers and strings using
the functions in @ref{Constructor Functions}. You can then
assign those values to variables using @code{sym_update()}
@@ -32971,7 +33083,7 @@ Using value cookies in this way saves considerable storage, as all of
@code{VAR1} through @code{VAR100} share the same value.
You might be wondering, ``Is this sharing problematic?
-What happens if @command{awk} code assigns a new value to @code{VAR1},
+What happens if @command{awk} code assigns a new value to @code{VAR1}?
Are all the others changed too?''
That's a great question. The answer is that no, it's not a problem.
@@ -33075,7 +33187,7 @@ modify them.
@node Array Functions
@subsubsection Array Functions
-The following functions relate to individual array elements.
+The following functions relate to individual array elements:
@table @code
@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);
@@ -33094,13 +33206,13 @@ Return false if @code{wanted} does not match the actual type or if
@code{index} is not in the array (@pxref{table-value-types-returned}).
The value for @code{index} can be numeric, in which case @command{gawk}
-converts it to a string. Using non-integral values is possible, but
+converts it to a string. Using nonintegral values is possible, but
requires that you understand how such values are converted to strings
-(@pxref{Conversion}); thus using integral values is safest.
+(@pxref{Conversion}); thus, using integral values is safest.
As with @emph{all} strings passed into @command{gawk} from an extension,
the string value of @code{index} must come from @code{gawk_malloc()},
-@code{gawk_calloc()} or @code{gawk_realloc()}, and
+@code{gawk_calloc()}, or @code{gawk_realloc()}, and
@command{gawk} releases the storage.
@item awk_bool_t set_array_element(awk_array_t a_cookie,
@@ -33156,7 +33268,7 @@ flatten an array and work with it.
@item awk_bool_t release_flattened_array(awk_array_t a_cookie,
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data);
When done with a flattened array, release the storage using this function.
-You must pass in both the original array cookie, and the address of
+You must pass in both the original array cookie and the address of
the created @code{awk_flat_array_t} structure.
The function returns true upon success, false otherwise.
@end table
@@ -33166,7 +33278,7 @@ The function returns true upon success, false otherwise.
To @dfn{flatten} an array is to create a structure that
represents the full array in a fashion that makes it easy
-for C code to traverse the entire array. Test code
+for C code to traverse the entire array. Some of the code
in @file{extension/testext.c} does this, and also serves
as a nice example showing how to use the APIs.
@@ -33223,9 +33335,9 @@ dump_array_and_delete(int nargs, awk_value_t *result)
@end example
The function then proceeds in steps, as follows. First, retrieve
-the name of the array, passed as the first argument. Then
-retrieve the array itself. If either operation fails, print
-error messages and return:
+the name of the array, passed as the first argument, followed by
+the array itself. If either operation fails, print an
+error message and return:
@example
/* get argument named array as flat array and print it */
@@ -33261,7 +33373,7 @@ and print it:
@end example
The third step is to actually flatten the array, and then
-to double check that the count in the @code{awk_flat_array_t}
+to double-check that the count in the @code{awk_flat_array_t}
is the same as the count just retrieved:
@example
@@ -33282,7 +33394,7 @@ is the same as the count just retrieved:
The fourth step is to retrieve the index of the element
to be deleted, which was passed as the second argument.
Remember that argument counts passed to @code{get_argument()}
-are zero-based, thus the second argument is numbered one:
+are zero-based, and thus the second argument is numbered one:
@example
if (! get_argument(1, AWK_STRING, & value3)) @{
@@ -33297,7 +33409,7 @@ element values. In addition, upon finding the element with the
index that is supposed to be deleted, the function sets the
@code{AWK_ELEMENT_DELETE} bit in the @code{flags} field
of the element. When the array is released, @command{gawk}
-traverses the flattened array, and deletes any elements which
+traverses the flattened array and deletes any elements that
have this flag bit set:
@example
@@ -33641,10 +33753,10 @@ The API versions are available at compile time as constants:
@table @code
@item GAWK_API_MAJOR_VERSION
-The major version of the API.
+The major version of the API
@item GAWK_API_MINOR_VERSION
-The minor version of the API.
+The minor version of the API
@end table
The minor version increases when new functions are added to the API. Such
@@ -33662,14 +33774,14 @@ constant integers:
@table @code
@item api->major_version
-The major version of the running @command{gawk}.
+The major version of the running @command{gawk}
@item api->minor_version
-The minor version of the running @command{gawk}.
+The minor version of the running @command{gawk}
@end table
It is up to the extension to decide if there are API incompatibilities.
-Typically a check like this is enough:
+Typically, a check like this is enough:
@example
if (api->major_version != GAWK_API_MAJOR_VERSION
@@ -33683,7 +33795,7 @@ if (api->major_version != GAWK_API_MAJOR_VERSION
@end example
Such code is included in the boilerplate @code{dl_load_func()} macro
-provided in @file{gawkapi.h} (discussed later, in
+provided in @file{gawkapi.h} (discussed in
@ref{Extension API Boilerplate}).
@node Extension API Informational Variables
@@ -33730,7 +33842,7 @@ as described here. The boilerplate needed is also provided in comments
in the @file{gawkapi.h} header file:
@example
-/* Boiler plate code: */
+/* Boilerplate code: */
int plugin_is_GPL_compatible;
static gawk_api_t *const api;
@@ -33789,7 +33901,7 @@ to @code{NULL}, or to point to a string giving the name and version of
your extension.
@item static awk_ext_func_t func_table[] = @{ @dots{} @};
-This is an array of one or more @code{awk_ext_func_t} structures
+This is an array of one or more @code{awk_ext_func_t} structures,
as described earlier (@pxref{Extension Functions}).
It can then be looped over for multiple calls to
@code{add_ext_func()}.
@@ -33920,7 +34032,7 @@ the @code{stat()} fails. It fills in the following elements:
@table @code
@item "name"
-The name of the file that was @code{stat()}'ed.
+The name of the file that was @code{stat()}ed.
@item "dev"
@itemx "ino"
@@ -33976,7 +34088,7 @@ interprocess communications).
The file is a directory.
@item "fifo"
-The file is a named-pipe (also known as a FIFO).
+The file is a named pipe (also known as a FIFO).
@item "file"
The file is just a regular file.
@@ -33999,7 +34111,7 @@ For some other systems, @dfn{a priori} knowledge is used to provide
a value. Where no value can be determined, it defaults to 512.
@end table
-Several additional elements may be present depending upon the operating
+Several additional elements may be present, depending upon the operating
system and the type of the file. You can test for them in your @command{awk}
program by using the @code{in} operator
(@pxref{Reference to Elements}):
@@ -34029,7 +34141,7 @@ edited slightly for presentation. See @file{extension/filefuncs.c}
in the @command{gawk} distribution for the complete version.}
The file includes a number of standard header files, and then includes
-the @file{gawkapi.h} header file which provides the API definitions.
+the @file{gawkapi.h} header file, which provides the API definitions.
Those are followed by the necessary variable declarations
to make use of the API macros and boilerplate code
(@pxref{Extension API Boilerplate}):
@@ -34070,9 +34182,9 @@ int plugin_is_GPL_compatible;
@cindex programming conventions, @command{gawk} extensions
By convention, for an @command{awk} function @code{foo()}, the C function
that implements it is called @code{do_foo()}. The function should have
-two arguments: the first is an @code{int} usually called @code{nargs},
+two arguments. The first is an @code{int}, usually called @code{nargs},
that represents the number of actual arguments for the function.
-The second is a pointer to an @code{awk_value_t}, usually named
+The second is a pointer to an @code{awk_value_t} structure, usually named
@code{result}:
@example
@@ -34118,7 +34230,7 @@ Finally, the function returns the return value to the @command{awk} level:
The @code{stat()} extension is more involved. First comes a function
that turns a numeric mode into a printable representation
-(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
+(e.g., octal @code{0644} becomes @samp{-rw-r--r--}). This is omitted here for brevity:
@example
/* format_mode --- turn a stat mode field into something readable */
@@ -34174,9 +34286,9 @@ array_set_numeric(awk_array_t array, const char *sub, double num)
The following function does most of the work to fill in
the @code{awk_array_t} result array with values obtained
-from a valid @code{struct stat}. It is done in a separate function
+from a valid @code{struct stat}. This work is done in a separate function
to support the @code{stat()} function for @command{gawk} and also
-to support the @code{fts()} extension which is included in
+to support the @code{fts()} extension, which is included in
the same file but whose code is not shown here
(@pxref{Extension Sample File Functions}).
@@ -34297,8 +34409,8 @@ the @code{stat()} system call instead of the @code{lstat()} system
call. This is done by using a function pointer: @code{statfunc}.
@code{statfunc} is initialized to point to @code{lstat()} (instead
of @code{stat()}) to get the file information, in case the file is a
-symbolic link. However, if there were three arguments, @code{statfunc}
-is set point to @code{stat()}, instead.
+symbolic link. However, if the third argument is included, @code{statfunc}
+is set to point to @code{stat()}, instead.
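The same selection can be sketched in plain POSIX C, outside of @command{gawk}. The names @code{stat_path()} and @code{stat_func_t} are hypothetical. The real @code{do_stat()} also checks its arguments, updates @code{ERRNO}, and fills in the result array.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* stat() and lstat() share this signature, so one pointer can hold either. */
typedef int (*stat_func_t)(const char *path, struct stat *buf);

/* Use lstat() by default; switch to stat() when asked to follow links. */
static int stat_path(const char *name, struct stat *sbuf, bool follow)
{
    stat_func_t statfunc = lstat;

    if (follow)
        statfunc = stat;
    return statfunc(name, sbuf);
}
```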
Here is the @code{do_stat()} function, which starts with
variable declarations and argument checking:
@@ -34354,7 +34466,7 @@ Next, it gets the information for the file. If the called function
/* always empty out the array */
clear_array(array);
- /* stat the file, if error, set ERRNO and return */
+ /* stat the file; if error, set ERRNO and return */
ret = statfunc(name, & sbuf);
if (ret < 0) @{
update_ERRNO_int(errno);
@@ -34376,7 +34488,9 @@ Finally, it's necessary to provide the ``glue'' that loads the
new function(s) into @command{gawk}.
The @code{filefuncs} extension also provides an @code{fts()}
-function, which we omit here. For its sake there is an initialization
+function, which we omit here
+(@pxref{Extension Sample File Functions}).
+For its sake, there is an initialization
function:
@example
@@ -34501,9 +34615,9 @@ $ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk}
@section The Sample Extensions in the @command{gawk} Distribution
@cindex extensions distributed with @command{gawk}
-This @value{SECTION} provides brief overviews of the sample extensions
+This @value{SECTION} provides a brief overview of the sample extensions
that come in the @command{gawk} distribution. Some of them are intended
-for production use (e.g., the @code{filefuncs}, @code{readdir} and
+for production use (e.g., the @code{filefuncs}, @code{readdir}, and
@code{inplace} extensions). Others mainly provide example code that
shows how to use the extension API.
@@ -34539,14 +34653,14 @@ This is how you load the extension.
@item @code{result = chdir("/some/directory")}
The @code{chdir()} function is a direct hook to the @code{chdir()}
system call to change the current directory. It returns zero
-upon success or less than zero upon error. In the latter case, it updates
-@code{ERRNO}.
+upon success or a value less than zero upon error.
+In the latter case, it updates @code{ERRNO}.
@cindex @code{stat()} extension function
@item @code{result = stat("/some/path", statdata} [@code{, follow}]@code{)}
The @code{stat()} function provides a hook into the
@code{stat()} system call.
-It returns zero upon success or less than zero upon error.
+It returns zero upon success or a value less than zero upon error.
In the latter case, it updates @code{ERRNO}.
By default, it uses the @code{lstat()} system call. However, if passed
@@ -34573,10 +34687,10 @@ array with information retrieved from the filesystem, as follows:
@item @code{"major"} @tab @code{st_major} @tab Device files
@item @code{"minor"} @tab @code{st_minor} @tab Device files
@item @code{"blksize"} @tab @code{st_blksize} @tab All
-@item @code{"pmode"} @tab A human-readable version of the mode value, such as printed by
-@command{ls}. For example, @code{"-rwxr-xr-x"} @tab All
+@item @code{"pmode"} @tab A human-readable version of the mode value, like that printed by
+@command{ls} (for example, @code{"-rwxr-xr-x"}) @tab All
@item @code{"linkval"} @tab The value of the symbolic link @tab Symbolic links
-@item @code{"type"} @tab The type of the file as a string. One of
+@item @code{"type"} @tab The type of the file as a string---one of
@code{"file"},
@code{"blockdev"},
@code{"chardev"},
@@ -34586,15 +34700,15 @@ array with information retrieved from the filesystem, as follows:
@code{"symlink"},
@code{"door"},
or
-@code{"unknown"}.
-Not all systems support all file types. @tab All
+@code{"unknown"}
+(not all systems support all file types) @tab All
@end multitable
@cindex @code{fts()} extension function
@item @code{flags = or(FTS_PHYSICAL, ...)}
@itemx @code{result = fts(pathlist, flags, filedata)}
Walk the file trees provided in @code{pathlist} and fill in the
-@code{filedata} array as described next. @code{flags} is the bitwise
+@code{filedata} array, as described next. @code{flags} is the bitwise
OR of several predefined values, also described in a moment.
Return zero if there were no errors; otherwise, return @minus{}1.
@end table
@@ -34650,7 +34764,8 @@ During a traversal, do not cross onto a different mounted filesystem.
@end table
@item filedata
-The @code{filedata} array is first cleared. Then, @code{fts()} creates
+The @code{filedata} array holds the results.
+@code{fts()} first clears it. Then it creates
an element in @code{filedata} for every element in @code{pathlist}.
The index is the name of the directory or file given in @code{pathlist}.
The element for this index is itself an array. There are two cases:
@@ -34692,7 +34807,7 @@ for a file: @code{"path"}, @code{"stat"}, and @code{"error"}.
@end table
The @code{fts()} function returns zero if there were no errors.
-Otherwise it returns @minus{}1.
+Otherwise, it returns @minus{}1.
@quotation NOTE
The @code{fts()} extension does not exactly mimic the
@@ -34734,14 +34849,14 @@ The arguments to @code{fnmatch()} are:
@table @code
@item pattern
-The @value{FN} wildcard to match.
+The @value{FN} wildcard to match
@item string
-The @value{FN} string.
+The @value{FN} string
@item flag
Either zero, or the bitwise OR of one or more of the
-flags in the @code{FNM} array.
+flags in the @code{FNM} array
@end table
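These arguments correspond to those of the POSIX @code{fnmatch()} C routine, which returns zero on a match. For comparison, the following self-contained C sketch calls that routine directly. The wrapper @code{wildcard_match()} is hypothetical, and @code{FNM_PATHNAME} appears only as one example flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/* True when string matches the shell wildcard pattern under flags. */
static bool wildcard_match(const char *pattern, const char *string, int flags)
{
    return fnmatch(pattern, string, flags) == 0;
}
```

Without flags, @samp{*} also matches a slash, so @samp{*.c} matches @file{src/main.c}. With @code{FNM_PATHNAME}, a slash must be matched explicitly, so the same call fails.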
The flags are as follows:
@@ -34778,14 +34893,14 @@ This is how you load the extension.
@cindex @code{fork()} extension function
@item pid = fork()
This function creates a new process. The return value is zero in the
-child and the process-ID number of the child in the parent, or @minus{}1
+child and the process ID number of the child in the parent, or @minus{}1
upon error. In the latter case, @code{ERRNO} indicates the problem.
In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are
updated to reflect the correct values.
@cindex @code{waitpid()} extension function
@item ret = waitpid(pid)
-This function takes a numeric argument, which is the process-ID to
+This function takes a numeric argument, which is the process ID to
wait for. The return value is that of the
@code{waitpid()} system call.
@@ -34813,8 +34928,8 @@ else
@subsection Enabling In-Place File Editing
@cindex @code{inplace} extension
-The @code{inplace} extension emulates GNU @command{sed}'s @option{-i} option
-which performs ``in place'' editing of each input file.
+The @code{inplace} extension emulates GNU @command{sed}'s @option{-i} option,
+which performs ``in-place'' editing of each input file.
It uses the bundled @file{inplace.awk} include file to invoke the extension
properly:
@@ -34828,11 +34943,16 @@ properly:
# Please set INPLACE_SUFFIX to make a backup copy. For example, you may
# want to set INPLACE_SUFFIX to .bak on the command line or in a BEGIN rule.
+# N.B. We call inplace_end() in the BEGINFILE and END rules so that any
+# actions in an ENDFILE rule will be redirected as expected.
+
BEGINFILE @{
- inplace_begin(FILENAME, INPLACE_SUFFIX)
+ if (_inplace_filename != "")
+ inplace_end(_inplace_filename, INPLACE_SUFFIX)
+ inplace_begin(_inplace_filename = FILENAME, INPLACE_SUFFIX)
@}
-ENDFILE @{
+END @{
inplace_end(FILENAME, INPLACE_SUFFIX)
@}
@end group
@@ -34847,6 +34967,10 @@ If @code{INPLACE_SUFFIX} is not an empty string, the original file is
linked to a backup @value{FN} created by appending that suffix. Finally,
the temporary file is renamed to the original @value{FN}.
+The @code{_inplace_filename} variable keeps track of the
+current @value{FN}, so that the first @code{BEGINFILE} rule does not
+call @code{inplace_end()} before any file has been processed.
+
If any error occurs, the extension issues a fatal error to terminate
processing immediately without damaging the original file.
@@ -34910,14 +35034,14 @@ they are read, with each entry returned as a record.
The record consists of three fields. The first two are the inode number and the
@value{FN}, separated by a forward slash character.
On systems where the directory entry contains the file type, the record
-has a third field (also separated by a slash) which is a single letter
+has a third field (also separated by a slash), which is a single letter
indicating the type of the file. The letters and their corresponding file
types are shown in @ref{table-readdir-file-types}.
@float Table,table-readdir-file-types
@caption{File types returned by the @code{readdir} extension}
@multitable @columnfractions .1 .9
-@headitem Letter @tab File Type
+@headitem Letter @tab File type
@item @code{b} @tab Block device
@item @code{c} @tab Character device
@item @code{d} @tab Directory
@@ -34945,7 +35069,7 @@ Here is an example:
@@load "readdir"
@dots{}
BEGIN @{ FS = "/" @}
-@{ print "file name is", $2 @}
+@{ print "@value{FN} is", $2 @}
@end example
@node Extension Sample Revout
@@ -34966,8 +35090,7 @@ BEGIN @{
@}
@end example
-The output from this program is:
-@samp{cinap t'nod}.
+The output from this program is @samp{cinap t'nod}.
@node Extension Sample Rev2way
@subsection Two-Way I/O Example
@@ -35022,7 +35145,7 @@ success, or zero upon failure.
@code{reada()} is the inverse of @code{writea()};
it reads the file named as its first argument, filling in
the array named as the second argument. It clears the array first.
-Here too, the return value is one on success and zero upon failure.
+Here too, the return value is one on success, or zero upon failure.
@end table
The array created by @code{reada()} is identical to that written by
@@ -35110,7 +35233,7 @@ it tries to use @code{GetSystemTimeAsFileTime()}.
Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative,
or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}.
Otherwise, return zero after sleeping for the indicated amount of time.
-Note that @var{seconds} may be a floating-point (non-integral) value.
+Note that @var{seconds} may be a floating-point (nonintegral) value.
Implementation details: depending on platform availability, this function
tries to use @code{nanosleep()} or @code{select()} to implement the delay.
@end table
@@ -35137,10 +35260,13 @@ project provides a number of @command{gawk} extensions, including one for
processing XML files. This is the evolution of the original @command{xgawk}
(XML @command{gawk}) project.
-As of this writing, there are six extensions:
+As of this writing, there are seven extensions:
@itemize @value{BULLET}
@item
+@code{errno} extension
+
+@item
GD graphics library extension
@item
@@ -35151,7 +35277,7 @@ PostgreSQL extension
@item
MPFR library extension
-(this provides access to a number of MPFR functions which @command{gawk}'s
+(this provides access to a number of MPFR functions that @command{gawk}'s
native MPFR support does not)
@item
@@ -35205,7 +35331,7 @@ make install @ii{Install the extensions}
If you have installed @command{gawk} in the standard way, then you
will likely not need the @option{--with-gawk} option when configuring
-@code{gawkextlib}. You may also need to use the @command{sudo} utility
+@code{gawkextlib}. You may need to use the @command{sudo} utility
to install both @command{gawk} and @code{gawkextlib}, depending upon
how your system works.
@@ -35230,7 +35356,7 @@ named @code{plugin_is_GPL_compatible}.
@item
Communication between @command{gawk} and an extension is two-way.
-@command{gawk} passes a @code{struct} to the extension which contains
+@command{gawk} passes a @code{struct} to the extension that contains
various data fields and function pointers. The extension can then call
into @command{gawk} via the supplied function pointers to accomplish
certain tasks.
@@ -35243,7 +35369,7 @@ By convention, implementation functions are named @code{do_@var{XXXX}()}
for some @command{awk}-level function @code{@var{XXXX}()}.
@item
-The API is defined in a header file named @file{gawkpi.h}. You must include
+The API is defined in a header file named @file{gawkapi.h}. You must include
a number of standard header files @emph{before} including it in your source file.
@item
@@ -35288,7 +35414,7 @@ getting the count of elements in an array;
creating a new array;
clearing an array;
and
-flattening an array for easy C style looping over all its indices and elements)
+flattening an array for easy C-style looping over all its indices and elements)
@end itemize
@item
@@ -35296,7 +35422,7 @@ The API defines a number of standard data types for representing
@command{awk} values, array elements, and arrays.
@item
-The API provide convenience functions for constructing values.
+The API provides convenience functions for constructing values.
It also provides memory management functions to ensure compatibility
between memory allocated by @command{gawk} and memory allocated by an
extension.
@@ -35322,8 +35448,8 @@ file make this easier to do.
@item
The @command{gawk} distribution includes a number of small but useful
-sample extensions. The @code{gawkextlib} project includes several more,
-larger, extensions. If you wish to write an extension and contribute it
+sample extensions. The @code{gawkextlib} project includes several more
+(larger) extensions. If you wish to write an extension and contribute it
to the community of @command{gawk} users, the @code{gawkextlib} project
is the place to do so.
@@ -35451,81 +35577,81 @@ cross-references to further details:
@itemize @value{BULLET}
@item
The requirement for @samp{;} to separate rules on a line
-(@pxref{Statements/Lines}).
+(@pxref{Statements/Lines})
@item
User-defined functions and the @code{return} statement
-(@pxref{User-defined}).
+(@pxref{User-defined})
@item
The @code{delete} statement (@pxref{Delete}).
@item
The @code{do}-@code{while} statement
-(@pxref{Do Statement}).
+(@pxref{Do Statement})
@item
The built-in functions @code{atan2()}, @code{cos()}, @code{sin()}, @code{rand()}, and
-@code{srand()} (@pxref{Numeric Functions}).
+@code{srand()} (@pxref{Numeric Functions})
@item
The built-in functions @code{gsub()}, @code{sub()}, and @code{match()}
-(@pxref{String Functions}).
+(@pxref{String Functions})
@item
The built-in functions @code{close()} and @code{system()}
-(@pxref{I/O Functions}).
+(@pxref{I/O Functions})
@item
The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART},
-and @code{SUBSEP} predefined variables (@pxref{Built-in Variables}).
+and @code{SUBSEP} predefined variables (@pxref{Built-in Variables})
@item
-Assignable @code{$0} (@pxref{Changing Fields}).
+Assignable @code{$0} (@pxref{Changing Fields})
@item
The conditional expression using the ternary operator @samp{?:}
-(@pxref{Conditional Exp}).
+(@pxref{Conditional Exp})
@item
-The expression @samp{@var{index-variable} in @var{array}} outside of @code{for}
-statements (@pxref{Reference to Elements}).
+The expression @samp{@var{indx} in @var{array}} outside of @code{for}
+statements (@pxref{Reference to Elements})
@item
The exponentiation operator @samp{^}
(@pxref{Arithmetic Ops}) and its assignment operator
-form @samp{^=} (@pxref{Assignment Ops}).
+form @samp{^=} (@pxref{Assignment Ops})
@item
C-compatible operator precedence, which breaks some old @command{awk}
-programs (@pxref{Precedence}).
+programs (@pxref{Precedence})
@item
Regexps as the value of @code{FS}
(@pxref{Field Separators}) and as the
third argument to the @code{split()} function
(@pxref{String Functions}), rather than using only the first character
-of @code{FS}.
+of @code{FS}
@item
Dynamic regexps as operands of the @samp{~} and @samp{!~} operators
-(@pxref{Computed Regexps}).
+(@pxref{Computed Regexps})
@item
The escape sequences @samp{\b}, @samp{\f}, and @samp{\r}
-(@pxref{Escape Sequences}).
+(@pxref{Escape Sequences})
@item
Redirection of input for the @code{getline} function
-(@pxref{Getline}).
+(@pxref{Getline})
@item
Multiple @code{BEGIN} and @code{END} rules
-(@pxref{BEGIN/END}).
+(@pxref{BEGIN/END})
@item
Multidimensional arrays
-(@pxref{Multidimensional}).
+(@pxref{Multidimensional})
@end itemize
@node SVR4
@@ -35537,54 +35663,54 @@ The System V Release 4 (1989) version of Unix @command{awk} added these features
@itemize @value{BULLET}
@item
-The @code{ENVIRON} array (@pxref{Built-in Variables}).
+The @code{ENVIRON} array (@pxref{Built-in Variables})
@c gawk and MKS awk
@item
Multiple @option{-f} options on the command line
-(@pxref{Options}).
+(@pxref{Options})
@c MKS awk
@item
The @option{-v} option for assigning variables before program execution begins
-(@pxref{Options}).
+(@pxref{Options})
@c GNU, Bell Laboratories & MKS together
@item
-The @option{--} signal for terminating command-line options.
+The @option{--} signal for terminating command-line options
@item
The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences
-(@pxref{Escape Sequences}).
+(@pxref{Escape Sequences})
@c GNU, for ANSI C compat
@item
A defined return value for the @code{srand()} built-in function
-(@pxref{Numeric Functions}).
+(@pxref{Numeric Functions})
@item
The @code{toupper()} and @code{tolower()} built-in string functions
for case translation
-(@pxref{String Functions}).
+(@pxref{String Functions})
@item
A cleaner specification for the @samp{%c} format-control letter in the
@code{printf} function
-(@pxref{Control Letters}).
+(@pxref{Control Letters})
@item
The ability to dynamically pass the field width and precision (@code{"%*.*d"})
in the argument list of @code{printf} and @code{sprintf()}
-(@pxref{Control Letters}).
+(@pxref{Control Letters})
@item
The use of regexp constants, such as @code{/foo/}, as expressions, where
they are equivalent to using the matching operator, as in @samp{$0 ~ /foo/}
-(@pxref{Using Constant Regexps}).
+(@pxref{Using Constant Regexps})
@item
Processing of escape sequences inside command-line variable assignments
-(@pxref{Assignment Options}).
+(@pxref{Assignment Options})
@end itemize
@node POSIX
@@ -35598,23 +35724,23 @@ introduced the following changes into the language:
@itemize @value{BULLET}
@item
The use of @option{-W} for implementation-specific options
-(@pxref{Options}).
+(@pxref{Options})
@item
The use of @code{CONVFMT} for controlling the conversion of numbers
-to strings (@pxref{Conversion}).
+to strings (@pxref{Conversion})
@item
The concept of a numeric string and tighter comparison rules to go
-with it (@pxref{Typing and Comparison}).
+with it (@pxref{Typing and Comparison})
@item
The use of predefined variables as function parameter names is forbidden
-(@pxref{Definition Syntax}).
+(@pxref{Definition Syntax})
@item
More complete documentation of many of the previously undocumented
-features of the language.
+features of the language
@end itemize
In 2012, a number of extensions that had been commonly available for
@@ -35623,15 +35749,15 @@ many years were finally added to POSIX. They are:
@itemize @value{BULLET}
@item
The @code{fflush()} built-in function for flushing buffered output
-(@pxref{I/O Functions}).
+(@pxref{I/O Functions})
@item
The @code{nextfile} statement
-(@pxref{Nextfile Statement}).
+(@pxref{Nextfile Statement})
@item
The ability to delete all of an array at once with @samp{delete @var{array}}
-(@pxref{Delete}).
+(@pxref{Delete})
@end itemize
@@ -35661,22 +35787,22 @@ originally appeared in his version of @command{awk}:
The @samp{**} and @samp{**=} operators
(@pxref{Arithmetic Ops}
and
-@ref{Assignment Ops}).
+@ref{Assignment Ops})
@item
The use of @code{func} as an abbreviation for @code{function}
-(@pxref{Definition Syntax}).
+(@pxref{Definition Syntax})
@item
The @code{fflush()} built-in function for flushing buffered output
-(@pxref{I/O Functions}).
+(@pxref{I/O Functions})
@ignore
@item
The @code{SYMTAB} array, that allows access to @command{awk}'s internal symbol
table. This feature was never documented for his @command{awk}, largely because
it is somewhat shakily implemented. For instance, you cannot access arrays
-or array elements through it.
+or array elements through it
@end ignore
@end itemize
@@ -35706,7 +35832,7 @@ Additional predefined variables:
@itemize @value{MINUS}
@item
The
-@code{ARGIND}
+@code{ARGIND},
@code{BINMODE},
@code{ERRNO},
@code{FIELDWIDTHS},
@@ -35718,7 +35844,7 @@ The
and
@code{TEXTDOMAIN}
variables
-(@pxref{Built-in Variables}).
+(@pxref{Built-in Variables})
@end itemize
@item
@@ -35726,15 +35852,15 @@ Special files in I/O redirections:
@itemize @value{MINUS}
@item
-The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and
+The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr}, and
@file{/dev/fd/@var{N}} special @value{FN}s
-(@pxref{Special Files}).
+(@pxref{Special Files})
@item
The @file{/inet}, @file{/inet4}, and @samp{/inet6} special files for
TCP/IP networking using @samp{|&} to specify which version of the
IP protocol to use
-(@pxref{TCP/IP Networking}).
+(@pxref{TCP/IP Networking})
@end itemize
@item
@@ -35743,37 +35869,41 @@ Changes and/or additions to the language:
@itemize @value{MINUS}
@item
The @samp{\x} escape sequence
-(@pxref{Escape Sequences}).
+(@pxref{Escape Sequences})
@item
Full support for both POSIX and GNU regexps
-(@pxref{Regexp}).
+(@pxref{Regexp})
@item
The ability for @code{FS} and for the third
argument to @code{split()} to be null strings
-(@pxref{Single Character Fields}).
+(@pxref{Single Character Fields})
@item
The ability for @code{RS} to be a regexp
-(@pxref{Records}).
+(@pxref{Records})
@item
The ability to use octal and hexadecimal constants in @command{awk}
program source code
-(@pxref{Nondecimal-numbers}).
+(@pxref{Nondecimal-numbers})
@item
The @samp{|&} operator for two-way I/O to a coprocess
-(@pxref{Two-way I/O}).
+(@pxref{Two-way I/O})
@item
Indirect function calls
-(@pxref{Indirect Calls}).
+(@pxref{Indirect Calls})
@item
Directories on the command line produce a warning and are skipped
-(@pxref{Command-line directories}).
+(@pxref{Command-line directories})
+
+@item
+Output with @code{print} and @code{printf} need not be fatal
+(@pxref{Nonfatal})
@end itemize
@item
@@ -35782,11 +35912,11 @@ New keywords:
@itemize @value{MINUS}
@item
The @code{BEGINFILE} and @code{ENDFILE} special patterns
-(@pxref{BEGINFILE/ENDFILE}).
+(@pxref{BEGINFILE/ENDFILE})
@item
The @code{switch} statement
-(@pxref{Switch Statement}).
+(@pxref{Switch Statement})
@end itemize
@item
@@ -35796,30 +35926,30 @@ Changes to standard @command{awk} functions:
@item
The optional second argument to @code{close()} that allows closing one end
of a two-way pipe to a coprocess
-(@pxref{Two-way I/O}).
+(@pxref{Two-way I/O})
@item
-POSIX compliance for @code{gsub()} and @code{sub()} with @option{--posix}.
+POSIX compliance for @code{gsub()} and @code{sub()} with @option{--posix}
@item
The @code{length()} function accepts an array argument
and returns the number of elements in the array
-(@pxref{String Functions}).
+(@pxref{String Functions})
@item
The optional third argument to the @code{match()} function
for capturing text-matching subexpressions within a regexp
-(@pxref{String Functions}).
+(@pxref{String Functions})
@item
Positional specifiers in @code{printf} formats for
making translations easier
-(@pxref{Printf Ordering}).
+(@pxref{Printf Ordering})
@item
The @code{split()} function's additional optional fourth
-argument which is an array to hold the text of the field separators
-(@pxref{String Functions}).
+argument, which is an array to hold the text of the field separators
+(@pxref{String Functions})
@end itemize
@item
@@ -35829,16 +35959,16 @@ Additional functions only in @command{gawk}:
@item
The @code{gensub()}, @code{patsplit()}, and @code{strtonum()} functions
for more powerful text manipulation
-(@pxref{String Functions}).
+(@pxref{String Functions})
@item
The @code{asort()} and @code{asorti()} functions for sorting arrays
-(@pxref{Array Sorting}).
+(@pxref{Array Sorting})
@item
The @code{mktime()}, @code{systime()}, and @code{strftime()}
functions for working with timestamps
-(@pxref{Time Functions}).
+(@pxref{Time Functions})
@item
The
@@ -35850,17 +35980,22 @@ The
and
@code{xor()}
functions for bit manipulation
-(@pxref{Bitwise Functions}).
+(@pxref{Bitwise Functions})
@c In 4.1, and(), or() and xor() grew the ability to take > 2 arguments
@item
The @code{isarray()} function to check if a variable is an array or not
-(@pxref{Type Functions}).
+(@pxref{Type Functions})
@item
-The @code{bindtextdomain()}, @code{dcgettext()} and @code{dcngettext()}
+The @code{bindtextdomain()}, @code{dcgettext()}, and @code{dcngettext()}
functions for internationalization
-(@pxref{Programmer i18n}).
+(@pxref{Programmer i18n})
+
+@item
+The @code{div()} function for doing integer
+division and remainder
+(@pxref{Numeric Functions})
@end itemize
@item
@@ -35870,12 +36005,12 @@ Changes and/or additions in the command-line options:
@item
The @env{AWKPATH} environment variable for specifying a path search for
the @option{-f} command-line option
-(@pxref{Options}).
+(@pxref{Options})
@item
The @env{AWKLIBPATH} environment variable for specifying a path search for
the @option{-l} command-line option
-(@pxref{Options}).
+(@pxref{Options})
@item
The
@@ -35904,7 +36039,7 @@ The
and
@option{-V}
short options. Also, the
-ability to use GNU-style long-named options that start with @option{--}
+ability to use GNU-style long-named options that start with @option{--},
and the
@option{--assign},
@option{--bignum},
@@ -35984,7 +36119,7 @@ GCC for VAX and Alpha has not been tested for a while.
@end itemize
@item
-Support for the following obsolete systems was removed from the code
+Support for the following obsolete system was removed from the code
for @command{gawk} @value{PVERSION} 4.1:
@c nested table
@@ -35994,8 +36129,14 @@ Ultrix
@end itemize
@item
-@c FIXME: Verify the version here.
-Support for MirBSD was removed at @command{gawk} @value{PVERSION} 4.2.
+Support for the following systems was removed from the code
+for @command{gawk} @value{PVERSION} 4.2:
+
+@c nested table
+@itemize @value{MINUS}
+@item
+MirBSD
+@end itemize
@end itemize
@@ -36609,6 +36750,44 @@ with a minimum of two
The dynamic extension interface was completely redone
(@pxref{Dynamic Extensions}).
+@item
+Support for Ultrix was removed.
+
+@end itemize
+
+Version 4.2 introduced the following changes:
+
+@itemize @bullet
+@item
+Changes to @code{ENVIRON} are reflected into @command{gawk}'s
+environment and that of programs that it runs.
+@xref{Auto-set}.
+
+@item
+The @option{--pretty-print} option no longer runs the @command{awk}
+program too.
+@xref{Options}.
+
+@item
+The @command{igawk} program and its manual page are no longer
+installed when @command{gawk} is built.
+@xref{Igawk Program}.
+
+@item
+The @code{div()} function.
+@xref{Numeric Functions}.
+
+@item
+The maximum number of hexadecimal digits in @samp{\x} escapes
+is now two.
+@xref{Escape Sequences}.
+
+@item
+Nonfatal output with @code{print} and @code{printf}.
+@xref{Nonfatal}.
+
+@item
+Support for MirBSD was removed.
@end itemize
@c XXX ADD MORE STUFF HERE
@@ -36624,9 +36803,9 @@ by @command{gawk}, Brian Kernighan's @command{awk}, and @command{mawk},
the three most widely used freely available versions of @command{awk}
(@pxref{Other Versions}).
-@multitable {@file{/dev/stderr} special file} {BWK Awk} {Mawk} {GNU Awk} {Now standard}
-@headitem Feature @tab BWK Awk @tab Mawk @tab GNU Awk @tab Now standard
-@item @samp{\x} Escape sequence @tab X @tab X @tab X @tab
+@multitable {@file{/dev/stderr} special file} {BWK @command{awk}} {@command{mawk}} {@command{gawk}} {Now standard}
+@headitem Feature @tab BWK @command{awk} @tab @command{mawk} @tab @command{gawk} @tab Now standard
+@item @samp{\x} escape sequence @tab X @tab X @tab X @tab
@item @code{FS} as null string @tab X @tab X @tab X @tab
@item @file{/dev/stdin} special file @tab X @tab X @tab X @tab
@item @file{/dev/stdout} special file @tab X @tab X @tab X @tab
@@ -36657,7 +36836,7 @@ in the machine's native character set. Thus, on ASCII-based systems,
@samp{[a-z]} matched all the lowercase letters, and only the lowercase
letters, as the numeric values for the letters from @samp{a} through
@samp{z} were contiguous. (On an EBCDIC system, the range @samp{[a-z]}
-includes additional, non-alphabetic characters as well.)
+includes additional nonalphabetic characters as well.)
Almost all introductory Unix literature explained range expressions
as working in this fashion, and in particular, would teach that the
@@ -36682,7 +36861,7 @@ What does that mean?
In many locales, @samp{A} and @samp{a} are both less than @samp{B}.
In other words, these locales sort characters in dictionary order,
and @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]};
-instead it might be equivalent to @samp{[ABCXYabcdxyz]}, for example.
+instead, it might be equivalent to @samp{[ABCXYabcdxyz]}, for example.
This point needs to be emphasized: much literature teaches that you should
use @samp{[a-z]} to match a lowercase character. But on systems with
@@ -36711,23 +36890,23 @@ is perfectly valid in ASCII, but is not valid in many Unicode locales,
such as @code{en_US.UTF-8}.
Early versions of @command{gawk} used regexp matching code that was not
-locale aware, so ranges had their traditional interpretation.
+locale-aware, so ranges had their traditional interpretation.
When @command{gawk} switched to using locale-aware regexp matchers,
the problems began; especially as both GNU/Linux and commercial Unix
vendors started implementing non-ASCII locales, @emph{and making them
the default}. Perhaps the most frequently asked question became something
-like ``why does @samp{[A-Z]} match lowercase letters?!?''
+like, ``Why does @samp{[A-Z]} match lowercase letters?!?''
@cindex Berry, Karl
This situation existed for close to 10 years, if not more, and
the @command{gawk} maintainer grew weary of trying to explain that
-@command{gawk} was being nicely standards compliant, and that the issue
+@command{gawk} was being nicely standards-compliant, and that the issue
was in the user's locale. During the development of @value{PVERSION} 4.0,
he modified @command{gawk} to always treat ranges in the original,
pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).@footnote{And
thus was born the Campaign for Rational Range Interpretation (or
-RRI). A number of GNU tools have either implemented this change,
+RRI). A number of GNU tools have already implemented this change,
or will soon. Thanks to Karl Berry for coining the phrase ``Rational
Range Interpretation.''}
@@ -36741,9 +36920,10 @@ and
By using this lovely technical term, the standard gives license
to implementors to implement ranges in whatever way they choose.
-The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all
-cases: the default regexp matching; with @option{--traditional} and with
-@option{--posix}; in all cases, @command{gawk} remains POSIX compliant.
+The @command{gawk} maintainer chose to apply the pre-POSIX meaning
+both with the default regexp matching and when @option{--traditional} or
+@option{--posix} is used.
+In all cases, @command{gawk} remains POSIX-compliant.
@node Contributors
@appendixsec Major Contributors to @command{gawk}
@@ -36789,7 +36969,7 @@ to around 90 pages.
Richard Stallman
helped finish the implementation and the initial draft of this
@value{DOCUMENT}.
-He is also the founder of the FSF and the GNU project.
+He is also the founder of the FSF and the GNU Project.
@item
@cindex Woods, John
@@ -36953,28 +37133,28 @@ John Haque made the following contributions:
@itemize @value{MINUS}
@item
The modifications to convert @command{gawk}
-into a byte-code interpreter, including the debugger.
+into a byte-code interpreter, including the debugger
@item
-The addition of true arrays of arrays.
+The addition of true arrays of arrays
@item
-The additional modifications for support of arbitrary-precision arithmetic.
+The additional modifications for support of arbitrary-precision arithmetic
@item
The initial text of
-@ref{Arbitrary Precision Arithmetic}.
+@ref{Arbitrary Precision Arithmetic}
@item
The work to merge the three versions of @command{gawk}
-into one, for the 4.1 release.
+into one, for the 4.1 release
@item
-Improved array internals for arrays indexed by integers.
+Improved array internals for arrays indexed by integers
@item
-The improved array sorting features were driven by John together
-with Pat Rankin.
+The improved array sorting features were also driven by John, together
+with Pat Rankin
@end itemize
@cindex Papadopoulos, Panos
@@ -37015,10 +37195,10 @@ helping David Trueman, and as the primary maintainer since around 1994.
@itemize @value{BULLET}
@item
The @command{awk} language has evolved over time. The first release
-was with V7 Unix circa 1978. In 1987, for System V Release 3.1,
+was with V7 Unix, circa 1978. In 1987, for System V Release 3.1,
major additions, including user-defined functions, were made to the language.
Additional changes were made for System V Release 4, in 1989.
-Since then, further minor changes happen under the auspices of the
+Since then, further minor changes have happened under the auspices of the
POSIX standard.
@item
@@ -37034,7 +37214,7 @@ options.
The interaction of POSIX locales and regexp matching in @command{gawk} has been confusing over
the years. Today, @command{gawk} implements Rational Range Interpretation, where
ranges of the form @samp{[a-z]} match @emph{only} the characters numerically between
-@samp{a} through @samp{z} in the machine's native character set. Usually this is ASCII
+@samp{a} through @samp{z} in the machine's native character set. Usually this is ASCII,
but it can be EBCDIC on IBM S/390 systems.
@item
@@ -37119,7 +37299,7 @@ will be less busy, and you can usually find one closer to your site.
@command{gawk} is distributed as several @code{tar} files compressed with
different compression programs: @command{gzip}, @command{bzip2},
and @command{xz}. For simplicity, the rest of these instructions assume
-you are using the one compressed with the GNU Zip program, @code{gzip}.
+you are using the one compressed with the GNU Gzip program (@command{gzip}).
Once you have the distribution (e.g.,
@file{gawk-@value{VERSION}.@value{PATCHLEVEL}.tar.gz}),
@@ -37170,12 +37350,12 @@ operating systems:
@table @asis
@item Various @samp{.c}, @samp{.y}, and @samp{.h} files
-The actual @command{gawk} source code.
+These files contain the actual @command{gawk} source code.
@end table
@table @file
@item ABOUT-NLS
-Information about GNU @command{gettext} and translations.
+A file containing information about GNU @command{gettext} and translations.
@item AUTHORS
A file with some information about the authorship of @command{gawk}.
@@ -37205,7 +37385,7 @@ An older list of changes to @command{gawk}.
The GNU General Public License.
@item POSIX.STD
-A description of behaviors in the POSIX standard for @command{awk} which
+A description of behaviors in the POSIX standard for @command{awk} that
are left undefined, or where @command{gawk} may not comply fully, as well
as a list of things that the POSIX standard should describe but does not.
@@ -37256,10 +37436,10 @@ The generated Info file for this @value{DOCUMENT}.
@item doc/gawkinet.texi
The Texinfo source file for
@ifinfo
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}.
+@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}}.
@end ifinfo
@ifnotinfo
-@cite{TCP/IP Internetworking with @command{gawk}}.
+@cite{@value{GAWKINETTITLE}}.
@end ifnotinfo
It should be processed with @TeX{}
(via @command{texi2dvi} or @command{texi2pdf})
@@ -37268,7 +37448,7 @@ with @command{makeinfo} to produce an Info or HTML file.
@item doc/gawkinet.info
The generated Info file for
-@cite{TCP/IP Internetworking with @command{gawk}}.
+@cite{@value{GAWKINETTITLE}}.
@item doc/igawk.1
The @command{troff} source for a manual page describing the @command{igawk}
@@ -37515,14 +37695,17 @@ Similarly, setting the @code{LINT} variable
(@pxref{User-modified})
has no effect on the running @command{awk} program.
-When used with GCC's automatic dead-code-elimination, this option
+When used with the GNU Compiler Collection's (GCC's)
+automatic dead-code elimination, this option
cuts almost 23K bytes off the size of the @command{gawk}
executable on GNU/Linux x86_64 systems. Results on other systems and
with other compilers are likely to vary.
Using this option may bring you some slight performance improvement.
+@quotation CAUTION
Using this option will cause some of the tests in the test suite
to fail. This option may be removed at a later date.
+@end quotation
@cindex @option{--disable-nls} configuration option
@cindex configuration option, @code{--disable-nls}
@@ -37619,10 +37802,10 @@ running MS-DOS, any version of MS-Windows, or OS/2.
running MS-DOS and any version of MS-Windows.
@end ifset
In this @value{SECTION}, the term ``Windows32''
-refers to any of Microsoft Windows-95/98/ME/NT/2000/XP/Vista/7/8.
+refers to any of Microsoft Windows 95/98/ME/NT/2000/XP/Vista/7/8.
The limitations of MS-DOS (and MS-DOS shells under the other operating
-systems) has meant that various ``DOS extenders'' are often used with
+systems) have meant that various ``DOS extenders'' are often used with
programs such as @command{gawk}. The varying capabilities of Microsoft
Windows 3.1 and Windows32 can add to the confusion. For an overview
of the considerations, refer to @file{README_d/README.pc} in
@@ -37881,7 +38064,7 @@ Under MS-Windows, OS/2 and MS-DOS,
Under MS-Windows and MS-DOS,
@end ifset
@command{gawk} (and many other text programs) silently
-translate end-of-line @samp{\r\n} to @samp{\n} on input and @samp{\n}
+translates end-of-line @samp{\r\n} to @samp{\n} on input and @samp{\n}
to @samp{\r\n} on output. A special @code{BINMODE} variable @value{COMMONEXT}
allows control over these translations and is interpreted as follows:
@@ -37915,7 +38098,7 @@ Setting @code{BINMODE} for standard input or
standard output is accomplished by using an
appropriate @samp{-v BINMODE=@var{N}} option on the command line.
@code{BINMODE} is set at the time a file or pipe is opened and cannot be
-changed mid-stream.
+changed midstream.
The name @code{BINMODE} was chosen to match @command{mawk}
(@pxref{Other Versions}).
@@ -37971,8 +38154,8 @@ moved into the @code{BEGIN} rule.
@command{gawk} can be built and used ``out of the box'' under MS-Windows
if you are using the @uref{http://www.cygwin.com, Cygwin environment}.
-This environment provides an excellent simulation of GNU/Linux, using the
-GNU tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make,
+This environment provides an excellent simulation of GNU/Linux, using
+Bash, GCC, GNU Make,
and other GNU programs. Compilation and installation for Cygwin is the
same as for a Unix system:
@@ -37991,7 +38174,7 @@ and then the @samp{make} proceeds as usual.
@appendixsubsubsec Using @command{gawk} In The MSYS Environment
In the MSYS environment under MS-Windows, @command{gawk} automatically
-uses binary mode for reading and writing files. Thus there is no
+uses binary mode for reading and writing files. Thus, there is no
need to use the @code{BINMODE} variable.
This can cause problems with other Unix-like components that have
@@ -38055,7 +38238,7 @@ With ODS-5 volumes and extended parsing enabled, the case of the target
parameter may need to be exact.
@command{gawk} has been tested under VAX/VMS 7.3 and Alpha/VMS 7.3-1
-using Compaq C V6.4, and Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3.
+using Compaq C V6.4, and under Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3.
The most recent builds used HP C V7.3 on Alpha VMS 8.3 and both
Alpha and IA64 VMS 8.4 used HP C 7.3.@footnote{The IA64 architecture
is also known as ``Itanium.''}
@@ -38103,7 +38286,7 @@ For VAX:
/name=(as_is,short)
@end example
-Compile time macros need to be defined before the first VMS-supplied
+Compile-time macros need to be defined before the first VMS-supplied
header file is included, as follows:
@example
@@ -38150,7 +38333,7 @@ If your @command{gawk} was installed by a PCSI kit into the
@file{GNV$GNU:[vms_help]gawk.hlp}.
The PCSI kit also installs a @file{GNV$GNU:[vms_bin]gawk_verb.cld} file
-which can be used to add @command{gawk} and @command{awk} as DCL commands.
+that can be used to add @command{gawk} and @command{awk} as DCL commands.
For just the current process you can use:
@@ -38159,7 +38342,7 @@ $ @kbd{set command gnv$gnu:[vms_bin]gawk_verb.cld}
@end example
Or the system manager can use @file{GNV$GNU:[vms_bin]gawk_verb.cld} to
-add the @command{gawk} and @command{awk} to the system wide @samp{DCLTABLES}.
+add the @command{gawk} and @command{awk} to the system-wide @samp{DCLTABLES}.
The DCL syntax is documented in the @file{gawk.hlp} file.
@@ -38225,14 +38408,14 @@ The @code{exit} value is a Unix-style value and is encoded into a VMS exit
status value when the program exits.
The VMS severity bits will be set based on the @code{exit} value.
-A failure is indicated by 1 and VMS sets the @code{ERROR} status.
-A fatal error is indicated by 2 and VMS sets the @code{FATAL} status.
+A failure is indicated by 1, and VMS sets the @code{ERROR} status.
+A fatal error is indicated by 2, and VMS sets the @code{FATAL} status.
All other values will have the @code{SUCCESS} status. The exit value is
encoded to comply with VMS coding standards and will have the
@code{C_FACILITY_NO} of @code{0x350000} with the constant @code{0xA000}
added to the number shifted over by 3 bits to make room for the severity codes.
-To extract the actual @command{gawk} exit code from the VMS status use:
+To extract the actual @command{gawk} exit code from the VMS status, use:
@example
unix_status = (vms_status .and. &x7f8) / 8
@@ -38251,7 +38434,7 @@ VAX/VMS floating point uses unbiased rounding. @xref{Round Function}.
VMS reports time values in GMT unless one of the @code{SYS$TIMEZONE_RULE}
or @code{TZ} logical names is set. Older versions of VMS, such as VAX/VMS
-7.3 do not set these logical names.
+7.3, do not set these logical names.
@c @cindex directory search
@c @cindex path, search
@@ -38269,7 +38452,7 @@ translation and not a multitranslation @code{RMS} searchlist.
The VMS GNV package provides a build environment similar to POSIX with ports
of a collection of open source tools. The @command{gawk} found in the GNV
-base kit is an older port. Currently the GNV project is being reorganized
+base kit is an older port. Currently, the GNV project is being reorganized
to supply individual PCSI packages for each component.
See @w{@uref{https://sourceforge.net/p/gnv/wiki/InstallingGNVPackages/}.}
@@ -38342,7 +38525,7 @@ recommend compiling and using the current version.
@cindex debugging @command{gawk}, bug reports
@cindex troubleshooting, @command{gawk}, bug reports
If you have problems with @command{gawk} or think that you have found a bug,
-report it to the developers; we cannot promise to do anything
+report it to the developers; we cannot promise to do anything,
but we might well want to fix it.
Before reporting a bug, make sure you have really found a genuine bug.
@@ -38352,7 +38535,7 @@ to do something or not, report that too; it's a bug in the documentation!
Before reporting a bug or trying to fix it yourself, try to isolate it
to the smallest possible @command{awk} program and input @value{DF} that
-reproduces the problem. Then send us the program and @value{DF},
+reproduce the problem. Then send us the program and @value{DF},
some idea of what kind of Unix system you're using,
the compiler you used to compile @command{gawk}, and the exact results
@command{gawk} gave you. Also say what you expected to occur; this helps
@@ -38367,7 +38550,7 @@ You can get this information with the command @samp{gawk --version}.
Once you have a precise problem description, send email to
@EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org}.
-The @command{gawk} maintainers subscribe to this address and
+The @command{gawk} maintainers subscribe to this address, and
thus they will receive your bug report.
Although you can send mail to the maintainers directly,
the bug reporting address is preferred because the
@@ -38394,8 +38577,8 @@ bug reporting system, you should also send a copy to
This is for two reasons. First, although some distributions forward
bug reports ``upstream'' to the GNU mailing list, many don't, so there is a good
chance that the @command{gawk} maintainers won't even see the bug report! Second,
-mail to the GNU list is archived, and having everything at the GNU project
-keeps things self-contained and not dependant on other organizations.
+mail to the GNU list is archived, and having everything at the GNU Project
+keeps things self-contained and not dependent on other organizations.
@end quotation
Non-bug suggestions are always welcome as well. If you have questions
@@ -38404,7 +38587,7 @@ features, ask on the bug list; we will try to help you out if we can.
If you find bugs in one of the non-Unix ports of @command{gawk},
send an email to the bug list, with a copy to the
-person who maintains that port. They are named in the following list,
+person who maintains that port. The maintainers are named in the following list,
as well as in the @file{README} file in the @command{gawk} distribution.
Information in the @file{README} file should be considered authoritative
if it conflicts with this @value{DOCUMENT}.
@@ -38419,19 +38602,19 @@ The people maintaining the various @command{gawk} ports are:
@cindex Robbins, Arnold
@cindex Zaretskii, Eli
@multitable {MS-Windows with MinGW} {123456789012345678901234567890123456789001234567890}
-@item Unix and POSIX systems @tab Arnold Robbins, @EMAIL{arnold@@skeeve.com,arnold at skeeve dot com}.
+@item Unix and POSIX systems @tab Arnold Robbins, @EMAIL{arnold@@skeeve.com,arnold at skeeve dot com}
-@item MS-DOS with DJGPP @tab Scott Deifik, @EMAIL{scottd.mail@@sbcglobal.net,scottd dot mail at sbcglobal dot net}.
+@item MS-DOS with DJGPP @tab Scott Deifik, @EMAIL{scottd.mail@@sbcglobal.net,scottd dot mail at sbcglobal dot net}
-@item MS-Windows with MinGW @tab Eli Zaretskii, @EMAIL{eliz@@gnu.org,eliz at gnu dot org}.
+@item MS-Windows with MinGW @tab Eli Zaretskii, @EMAIL{eliz@@gnu.org,eliz at gnu dot org}
@c Leave this in the print version on purpose.
@c OS/2 is not mentioned anywhere else in the print version though.
-@item OS/2 @tab Andreas Buening, @EMAIL{andreas.buening@@nexgo.de,andreas dot buening at nexgo dot de}.
+@item OS/2 @tab Andreas Buening, @EMAIL{andreas.buening@@nexgo.de,andreas dot buening at nexgo dot de}
-@item VMS @tab John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net}.
+@item VMS @tab John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net}
-@item z/OS (OS/390) @tab Dave Pitts, @EMAIL{dpitts@@cozx.com,dpitts at cozx dot com}.
+@item z/OS (OS/390) @tab Dave Pitts, @EMAIL{dpitts@@cozx.com,dpitts at cozx dot com}
@end multitable
If your bug is also reproducible under Unix, send a copy of your
@@ -38450,7 +38633,7 @@ Date: Wed, 4 Sep 1996 08:11:48 -0700 (PDT)
@cindex Brennan, Michael
@ifnotdocbook
@quotation
-@i{It's kind of fun to put comments like this in your awk code.}@*
+@i{It's kind of fun to put comments like this in your awk code:}@*
@ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course}
@author Michael Brennan
@end quotation
@@ -38491,7 +38674,7 @@ It is available in several archive formats:
@end table
@cindex @command{git} utility
-You can also retrieve it from Git Hub:
+You can also retrieve it from GitHub:
@example
git clone git://github.com/onetrueawk/awk bwkawk
@@ -38551,7 +38734,7 @@ for a list of extensions in @command{mawk} that are not in POSIX @command{awk}.
@item @command{awka}
Written by Andrew Sumner,
@command{awka} translates @command{awk} programs into C, compiles them,
-and links them with a library of functions that provides the core
+and links them with a library of functions that provide the core
@command{awk} functionality.
It also has a number of extensions.
@@ -38572,17 +38755,17 @@ since approximately 2001.
Nelson H.F.@: Beebe at the University of Utah has modified
BWK @command{awk} to provide timing and profiling information.
It is different from @command{gawk} with the @option{--profile} option
-(@pxref{Profiling}),
+(@pxref{Profiling})
in that it uses CPU-based profiling, not line-count
profiling. You may find it at either
@uref{ftp://ftp.math.utah.edu/pub/pawk/pawk-20030606.tar.gz}
or
@uref{http://www.math.utah.edu/pub/pawk/pawk-20030606.tar.gz}.
-@item Busybox Awk
-@cindex Busybox Awk
-@cindex source code, Busybox Awk
-Busybox is a GPL-licensed program providing small versions of many
+@item BusyBox @command{awk}
+@cindex BusyBox @command{awk}
+@cindex source code, BusyBox @command{awk}
+BusyBox is a GPL-licensed program providing small versions of many
applications within a single executable. It is aimed at embedded systems.
It includes a full implementation of POSIX @command{awk}. When building
it, be careful not to do @samp{make install} as it will overwrite
@@ -38594,7 +38777,7 @@ information, see the @uref{http://busybox.net, project's home page}.
@cindex source code, Solaris @command{awk}
@item The OpenSolaris POSIX @command{awk}
The versions of @command{awk} in @file{/usr/xpg4/bin} and
-@file{/usr/xpg6/bin} on Solaris are more-or-less POSIX-compliant.
+@file{/usr/xpg6/bin} on Solaris are more or less POSIX-compliant.
They are based on the @command{awk} from Mortice Kern Systems for PCs.
We were able to make this code compile and work under GNU/Linux
with 1--2 hours of work. Making it more generally portable (using
@@ -38635,9 +38818,9 @@ features to Python. See @uref{https://github.com/alecthomas/pawk}
for more information. (This is not related to Nelson Beebe's
modified version of BWK @command{awk}, described earlier.)
-@item @w{QSE Awk}
-@cindex QSE Awk
-@cindex source code, QSE Awk
+@item @w{QSE @command{awk}}
+@cindex QSE @command{awk}
+@cindex source code, QSE @command{awk}
This is an embeddable @command{awk} interpreter. For more information,
see @uref{http://code.google.com/p/qse/} and @uref{http://awk.info/?tools/qse}.
@@ -38656,7 +38839,7 @@ since approximately 2008.
@item Other versions
See also the ``Versions and implementations'' section of the
@uref{http://en.wikipedia.org/wiki/Awk_language#Versions_and_implementations,
-Wikipedia article} for information on additional versions.
+Wikipedia article} on @command{awk} for information on additional versions.
@end table
@@ -38665,7 +38848,7 @@ Wikipedia article} for information on additional versions.
@itemize @value{BULLET}
@item
-The @command{gawk} distribution is available from GNU project's main
+The @command{gawk} distribution is available from the GNU Project's main
distribution site, @code{ftp.gnu.org}. The canonical build recipe is:
@example
@@ -38677,22 +38860,22 @@ cd gawk-@value{VERSION}.@value{PATCHLEVEL}
@item
@command{gawk} may be built on non-POSIX systems as well. The currently
-supported systems are MS-Windows using DJGPP, MSYS, MinGW and Cygwin,
+supported systems are MS-Windows using DJGPP, MSYS, MinGW, and Cygwin,
@ifclear FOR_PRINT
OS/2 using EMX,
@end ifclear
and both Vax/VMS and OpenVMS.
-Instructions for each system are included in this @value{CHAPTER}.
+Instructions for each system are included in this @value{APPENDIX}.
@item
Bug reports should be sent via email to @email{bug-gawk@@gnu.org}.
-Bug reports should be in English, and should include the version of @command{gawk},
-how it was compiled, and a short program and @value{DF} which demonstrate
+Bug reports should be in English and should include the version of @command{gawk},
+how it was compiled, and a short program and @value{DF} that demonstrate
the problem.
@item
There are a number of other freely available @command{awk}
-implementations. Many are POSIX compliant; others are less so.
+implementations. Many are POSIX-compliant; others are less so.
@end itemize