Diffstat (limited to 'doc/gawktexi.in')
-rw-r--r-- doc/gawktexi.in 1941
1 files changed, 889 insertions, 1052 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index 87fa41ca..e127f428 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -50,8 +50,9 @@
@set VERSION 4.1
@set PATCHLEVEL 2
+@set GAWKINETTITLE TCP/IP Internetworking with @command{gawk}
@ifset FOR_PRINT
-@set TITLE Effective Awk Programming
+@set TITLE Effective awk Programming
@end ifset
@ifclear FOR_PRINT
@set TITLE GAWK: Effective AWK Programming
@@ -202,7 +203,7 @@
@set FFN Filename
@set DF datafile
@set DDF Datafile
-@set PVERSION Version
+@set PVERSION version
@end ifset
@c For HTML, spell out email addresses, to avoid problems with
@@ -299,7 +300,7 @@ All Rights Reserved.</literallayout>
@end docbook
@ifnotdocbook
-Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2014 @*
+Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2015 @*
Free Software Foundation, Inc.
@end ifnotdocbook
@sp 2
@@ -467,7 +468,7 @@ particular records in a file and perform operations upon them.
@command{gawk}.
* Internationalization:: Getting @command{gawk} to speak your
language.
-* Debugger:: The @code{gawk} debugger.
+* Debugger:: The @command{gawk} debugger.
* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
@command{gawk}.
* Dynamic Extensions:: Adding new built-in functions to
@@ -951,7 +952,7 @@ particular records in a file and perform operations upon them.
* Internal File Ops:: The code for internal file operations.
* Using Internal File Ops:: How to use an external extension.
* Extension Samples:: The sample extensions that ship with
- @code{gawk}.
+ @command{gawk}.
* Extension Sample File Functions:: The file functions sample.
* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
* Extension Sample Fork:: An interface to @code{fork()} and
@@ -1166,7 +1167,7 @@ interface to network protocols via special @file{/inet} files.
The programs in this book make clear that an AWK program is
typically much smaller and faster to develop than
a counterpart written in C.
-Consequently, there is often a payoff to prototype an
+Consequently, there is often a payoff to prototyping an
algorithm or design in AWK to get it running quickly and expose
problems early. Often, the interpreted performance is adequate
and the AWK prototype becomes the product.
@@ -1243,15 +1244,15 @@ March 2001
Some things don't change. Thirteen years ago I wrote:
``If you use AWK or want to learn how, then read this book.''
-True then and still true today.
+True then, and still true today.
-Learning to use a programming language is more than mastering the
+Learning to use a programming language is about more than mastering the
syntax. One needs to acquire an understanding of how to use the
features of the language to solve practical programming problems.
A focus of this book is many examples that show how to use AWK.
Some things do change. Our computers are much faster and have more memory.
-Consequently, speed and storage inefficiencies of a high level language
+Consequently, speed and storage inefficiencies of a high-level language
matter less. Prototyping in AWK and then rewriting in C for performance
reasons happens less, because more often the prototype is fast enough.
@@ -1259,12 +1260,12 @@ Of course, there are computing operations that are best done in C or C++.
With @command{gawk} 4.1 and later, you do not have to choose between writing
your program in AWK or in C/C++. You can write most of your
program in AWK and the aspects that require C/C++ capabilities can be written
-in C/C++ and then the pieces glued together when the @command{gawk} module loads
+in C/C++, and then the pieces glued together when the @command{gawk} module loads
the C/C++ module as a dynamic plug-in.
@c Chapter 16
@ref{Dynamic Extensions},
has all the
-details, and as expected, many examples to help you learn the ins and outs.
+details, and, as expected, many examples to help you learn the ins and outs.
I enjoy programming in AWK and had fun (re)reading this book.
I think you will too.
@@ -1339,7 +1340,7 @@ Generate reports
Validate data
@item
-Produce indexes and perform other document preparation tasks
+Produce indexes and perform other document-preparation tasks
@item
Experiment with algorithms that you can adapt later to other computer
@@ -1458,23 +1459,23 @@ help from me, thoroughly reworked @command{gawk} for compatibility
with the newer @command{awk}.
Circa 1994, I became the primary maintainer.
Current development focuses on bug fixes,
-performance improvements, standards compliance and, occasionally, new features.
+performance improvements, standards compliance, and, occasionally, new features.
In May 1997, J@"urgen Kahrs felt the need for network access
from @command{awk}, and with a little help from me, set about adding
features to do this for @command{gawk}. At that time, he also
wrote the bulk of
-@cite{TCP/IP Internetworking with @command{gawk}}
+@cite{@value{GAWKINETTITLE}}
(a separate document, available as part of the @command{gawk} distribution).
His code finally became part of the main @command{gawk} distribution
with @command{gawk} @value{PVERSION} 3.1.
John Haque rewrote the @command{gawk} internals, in the process providing
an @command{awk}-level debugger. This version became available as
-@command{gawk} @value{PVERSION} 4.0, in 2011.
+@command{gawk} @value{PVERSION} 4.0 in 2011.
@DBXREF{Contributors}
-for a full list of those who made important contributions to @command{gawk}.
+for a full list of those who have made important contributions to @command{gawk}.
@node Names
@unnumberedsec A Rose by Any Other Name
@@ -1487,7 +1488,7 @@ is often referred to as ``new @command{awk}.''
By analogy, the original version of @command{awk} is
referred to as ``old @command{awk}.''
-Today, on most systems, when you run the @command{awk} utility,
+Today, on most systems, when you run the @command{awk} utility
you get some version of new @command{awk}.@footnote{Only
Solaris systems still use an old @command{awk} for the
default @command{awk} utility. A more modern @command{awk} lives in
@@ -1547,7 +1548,9 @@ the POSIX standard for @command{awk}.
This @value{DOCUMENT} has the difficult task of being both a tutorial and a reference.
If you are a novice, feel free to skip over details that seem too complex.
You should also ignore the many cross-references; they are for the
-expert user and for the online Info and HTML versions of the @value{DOCUMENT}.
+expert user and for the Info and
+@uref{http://www.gnu.org/software/gawk/manual/, HTML}
+versions of the @value{DOCUMENT}.
@end ifnotinfo
There are sidebars
@@ -1580,7 +1583,7 @@ This @value{DOCUMENT} is split into several parts, as follows:
@itemize @value{BULLET}
@item
-Part I describes the @command{awk} language and @command{gawk} program in detail.
+Part I describes the @command{awk} language and the @command{gawk} program in detail.
It starts with the basics, and continues through all of the features of @command{awk}.
It contains the following chapters:
@@ -1627,10 +1630,10 @@ doing something when a record is matched, and the predefined variables
@item
@ref{Arrays},
-covers @command{awk}'s one-and-only data structure: associative arrays.
-Deleting array elements and whole arrays is also described, as well as
-sorting arrays in @command{gawk}. It also describes how @command{gawk}
-provides arrays of arrays.
+covers @command{awk}'s one-and-only data structure: the associative array.
+Deleting array elements and whole arrays is described, as well as
+sorting arrays in @command{gawk}. The @value{CHAPTER} also describes how
+@command{gawk} provides arrays of arrays.
@item
@ref{Functions},
@@ -1642,17 +1645,17 @@ as well as how to define your own functions. It also discusses how
@item
Part II shows how to use @command{awk} and @command{gawk} for problem solving.
There is lots of code here for you to read and learn from.
-It contains the following chapters:
+This part contains the following chapters:
@c nested
@itemize @value{MINUS}
@item
-@ref{Library Functions}, which provides a number of functions meant to
+@ref{Library Functions}, provides a number of functions meant to
be used from main @command{awk} programs.
@item
@ref{Sample Programs},
-which provides many sample @command{awk} programs.
+provides many sample @command{awk} programs.
@end itemize
Reading these two chapters allows you to see @command{awk}
@@ -1705,7 +1708,7 @@ including the GNU General Public License:
@item
@ref{Language History},
describes how the @command{awk} language has evolved since
-its first release to present. It also describes how @command{gawk}
+its first release to the present. It also describes how @command{gawk}
has acquired features over time.
@item
@@ -1748,7 +1751,7 @@ are completely unfamiliar with computer programming.
@item
@uref{http://www.gnu.org/software/gawk/manual/html_node/Glossary.html,
The Glossary}
-defines most, if not all of, the significant terms used
+defines most, if not all, of the significant terms used
throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with,
try looking them up here.
@@ -1775,7 +1778,7 @@ and some possible future directions for @command{gawk} development.
provides some very cursory background material for those who
are completely unfamiliar with computer programming.
-The @ref{Glossary}, defines most, if not all of, the significant terms used
+The @ref{Glossary}, defines most, if not all, of the significant terms used
throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with,
try looking them up here.
@@ -1818,7 +1821,7 @@ This typically represents the command's standard output.
Output from the command, usually its standard output, appears
@code{like this}.
@end ifset
-Error messages, and other output on the command's standard error, are preceded
+Error messages and other output on the command's standard error are preceded
by the glyph ``@error{}''. For example:
@example
@@ -1845,7 +1848,7 @@ there are special characters called ``control characters.'' These are
characters that you type by holding down both the @kbd{CONTROL} key and
another key, at the same time. For example, a @kbd{Ctrl-d} is typed
by first pressing and holding the @kbd{CONTROL} key, next
-pressing the @kbd{d} key and finally releasing both keys.
+pressing the @kbd{d} key, and finally releasing both keys.
For the sake of brevity, throughout this @value{DOCUMENT}, we refer to
Brian Kernighan's version of @command{awk} as ``BWK @command{awk}.''
@@ -1881,7 +1884,7 @@ the picture of a flashlight in the margin, as shown here.
@value{DARKCORNER}
@end iftex
@ifnottex
-``(d.c.)''.
+``(d.c.).''
@end ifnottex
@ifclear FOR_PRINT
They also appear in the index under the heading ``dark corner.''
@@ -1916,12 +1919,12 @@ Emacs editor. GNU Emacs is the most widely used version of Emacs today.
@cindex GPL (General Public License)
@cindex General Public License, See GPL
@cindex documentation, online
-The GNU@footnote{GNU stands for ``GNU's not Unix.''}
+The GNU@footnote{GNU stands for ``GNU's Not Unix.''}
Project is an ongoing effort on the part of the Free Software
Foundation to create a complete, freely distributable, POSIX-compliant
computing environment.
-The FSF uses the ``GNU General Public License'' (GPL) to ensure that
-their software's
+The FSF uses the GNU General Public License (GPL) to ensure that
+its software's
source code is always available to the end user.
@ifclear FOR_PRINT
A copy of the GPL is included
@@ -1981,7 +1984,7 @@ version of @command{awk}.
I started working with that version in the fall of 1988.
As work on it progressed,
the FSF published several preliminary versions (numbered 0.@var{x}).
-In 1996, Edition 1.0 was released with @command{gawk} 3.0.0.
+In 1996, edition 1.0 was released with @command{gawk} 3.0.0.
The FSF published the first two editions under
the title @cite{The GNU Awk User's Guide}.
@ifset FOR_PRINT
@@ -1993,7 +1996,7 @@ the third edition in 2001.
This edition maintains the basic structure of the previous editions.
For FSF edition 4.0, the content was thoroughly reviewed and updated. All
references to @command{gawk} versions prior to 4.0 were removed.
-Of significant note for that edition was @ref{Debugger}.
+Of significant note for that edition was the addition of @ref{Debugger}.
For FSF edition
@ifclear FOR_PRINT
@@ -2008,7 +2011,7 @@ and the major new additions are @ref{Arbitrary Precision Arithmetic},
and @ref{Dynamic Extensions}.
This @value{DOCUMENT} will undoubtedly continue to evolve. If you
-find an error in this @value{DOCUMENT}, please report it! @DBXREF{Bugs}
+find an error in the @value{DOCUMENT}, please report it! @DBXREF{Bugs}
for information on submitting problem reports electronically.
@ifset FOR_PRINT
@@ -2018,7 +2021,7 @@ for information on submitting problem reports electronically.
You may have a newer version of @command{gawk} than the
one described here. To find out what has changed,
you should first look at the @file{NEWS} file in the @command{gawk}
-distribution, which provides a high-level summary of what changed in
+distribution, which provides a high-level summary of the changes in
each release.
You can then look at the @uref{http://www.gnu.org/software/gawk/manual/,
@@ -2072,7 +2075,7 @@ The initial draft of @cite{The GAWK Manual} had the following acknowledgments:
Many people need to be thanked for their assistance in producing this
manual. Jay Fenlason contributed many ideas and sample programs. Richard
Mlynarik and Robert Chassell gave helpful comments on drafts of this
-manual. The paper @cite{A Supplemental Document for @command{awk}} by John W.@:
+manual. The paper @cite{A Supplemental Document for AWK} by John W.@:
Pierce of the Chemistry Department at UC San Diego, pinpointed several
issues relevant both to @command{awk} implementation and to this manual, that
would otherwise have escaped us.
@@ -2083,12 +2086,18 @@ I would like to acknowledge Richard M.@: Stallman, for his vision of a
better world and for his courage in founding the FSF and starting the
GNU Project.
+@ifclear FOR_PRINT
Earlier editions of this @value{DOCUMENT} had the following acknowledgements:
+@end ifclear
+@ifset FOR_PRINT
+The previous edition of this @value{DOCUMENT} had
+the following acknowledgements:
+@end ifset
@quotation
The following people (in alphabetical order)
provided helpful comments on various
-versions of this book,
+versions of this book:
Rick Adams,
Dr.@: Nelson H.F. Beebe,
Karl Berry,
@@ -2116,7 +2125,7 @@ Robert J.@: Chassell provided much valuable advice on
the use of Texinfo.
He also deserves special thanks for
convincing me @emph{not} to title this @value{DOCUMENT}
-@cite{How To Gawk Politely}.
+@cite{How to Gawk Politely}.
Karl Berry helped significantly with the @TeX{} part of Texinfo.
@cindex Hartholz, Marshall
@@ -2200,9 +2209,9 @@ a number of people. @DBXREF{Contributors} for the full list.
@ifset FOR_PRINT
@cindex Oram, Andy
-Thanks to Andy Oram, of O'Reilly Media, for initiating
+Thanks to Andy Oram of O'Reilly Media for initiating
the fourth edition and for his support during the work.
-Thanks to Jasmine Kwityn for her copy-editing work.
+Thanks to Jasmine Kwityn for her copyediting work.
@end ifset
Thanks to Michael Brennan for the Forewords.
@@ -2210,7 +2219,7 @@ Thanks to Michael Brennan for the Forewords.
@cindex Duman, Patrice
@cindex Berry, Karl
Thanks to Patrice Dumas for the new @command{makeinfo} program.
-Thanks to Karl Berry who continues to work to keep
+Thanks to Karl Berry, who continues to work to keep
the Texinfo markup language sane.
@cindex Kernighan, Brian
@@ -2220,8 +2229,8 @@ Robert P.J.@: Day, Michael Brennan, and Brian Kernighan kindly acted as
reviewers for the 2015 edition of this @value{DOCUMENT}. Their feedback
helped improve the final work.
-I would like to thank Brian Kernighan for invaluable assistance during the
-testing and debugging of @command{gawk}, and for ongoing
+I would also like to thank Brian Kernighan for his invaluable assistance during the
+testing and debugging of @command{gawk}, and for his ongoing
help and advice in clarifying numerous points about the language.
We could not have done nearly as good a job on either @command{gawk}
or its documentation without his help.
@@ -2332,9 +2341,9 @@ an advanced feature that we will ignore for now;
pattern to search for and one action to perform
upon finding the pattern.
-Syntactically, a rule consists of a pattern followed by an action. The
-action is enclosed in braces to separate it from the pattern.
-Newlines usually separate rules. Therefore, an @command{awk}
+Syntactically, a rule consists of a @dfn{pattern} followed by an
+@dfn{action}. The action is enclosed in braces to separate it from the
+pattern. Newlines usually separate rules. Therefore, an @command{awk}
program looks like this:
@example
@@ -2408,8 +2417,8 @@ awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
@end example
@noindent
-where @var{program} consists of a series of @var{patterns} and
-@var{actions}, as described earlier.
+where @var{program} consists of a series of patterns and
+actions, as described earlier.
@cindex single quote (@code{'})
@cindex @code{'} (single quote)
@@ -2428,12 +2437,12 @@ programs from shell scripts, because it avoids the need for a separate
file for the @command{awk} program. A self-contained shell script is more
reliable because there are no other files to misplace.
-Later in this chapter,
+Later in this chapter, in
@ifdocbook
the section
@end ifdocbook
@ref{Very Simple},
-presents several short,
+we'll see examples of several short,
self-contained programs.
@node Read Terminal
@@ -2454,10 +2463,10 @@ awk '@var{program}'
which usually means whatever you type on the keyboard. This continues
until you indicate end-of-file by typing @kbd{Ctrl-d}.
@ifset FOR_PRINT
-(On other operating systems, the end-of-file character may be different.)
+(On non-POSIX operating systems, the end-of-file character may be different.)
@end ifset
@ifclear FOR_PRINT
-(On other operating systems, the end-of-file character may be different.
+(On non-POSIX operating systems, the end-of-file character may be different.
For example, on OS/2, it is @kbd{Ctrl-z}.)
@end ifclear
@@ -2557,11 +2566,9 @@ for programs that are provided on the @command{awk} command line.
(Also, placing the program in a file allows us to use a literal single quote in the program
text, instead of the magic @samp{\47}.)
-@c STARTOFRANGE sq1x
@cindex single quote (@code{'}) in @command{gawk} command lines
-@c STARTOFRANGE qs2x
@cindex @code{'} (single quote) in @command{gawk} command lines
-If you want to clearly identify your @command{awk} program files as such,
+If you want to clearly identify an @command{awk} program file as such,
you can add the extension @file{.awk} to the @value{FN}. This doesn't
affect the execution of the @command{awk} program but it does make
``housekeeping'' easier.
@@ -2719,7 +2726,7 @@ The next @value{SUBSECTION} describes the shell's quoting rules.
@end quotation
@node Quoting
-@subsection Shell-Quoting Issues
+@subsection Shell Quoting Issues
@cindex shell quoting, rules for
@menu
@@ -2856,7 +2863,7 @@ $ @kbd{awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}'}
@noindent
This program consists of three concatenated quoted strings. The first and the
-third are single quoted, the second is double quoted.
+third are single-quoted, and the second is double-quoted.
This can be ``simplified'' to:
@@ -2877,8 +2884,6 @@ $ @kbd{awk "BEGIN @{ print \"Here is a single quote <'>\" @}"}
@end example
@noindent
-@c ENDOFRANGE sq1x
-@c ENDOFRANGE qs2x
This option is also painful, because double quotes, backslashes, and dollar signs
are very common in more advanced @command{awk} programs.
@@ -2895,7 +2900,7 @@ $ @kbd{awk 'BEGIN @{ print "Here is a double quote <\42>" @}'}
@end example
@noindent
-This works nicely, except that you should comment clearly what the
+This works nicely, but you should comment clearly what the
escapes mean.
A fourth option is to use command-line variable assignment, like this:
@@ -2906,11 +2911,11 @@ $ @kbd{awk -v sq="'" 'BEGIN @{ print "Here is a single quote <" sq ">" @}'}
@end example
(Here, the two string constants and the value of @code{sq} are concatenated
-into a single string which is printed by @code{print}.)
+into a single string that is printed by @code{print}.)
If you really need both single and double quotes in your @command{awk}
program, it is probably best to move it into a separate file, where
-the shell won't be part of the picture, and you can say what you mean.
+the shell won't be part of the picture and you can say what you mean.
@node DOS Quoting
@subsubsection Quoting in MS-Windows Batch Files
@@ -3009,7 +3014,7 @@ of green crates shipped, the number of red boxes shipped, the number of
orange bags shipped, and the number of blue packages shipped,
respectively. There are 16 entries, covering the 12 months of last year
and the first four months of the current year.
-An empty line separates the data for the two years.
+An empty line separates the data for the two years:
@example
@c file eg/data/inventory-shipped
@@ -3043,7 +3048,7 @@ The following command runs a simple @command{awk} program that searches the
input file @file{mail-list} for the character string @samp{li} (a
grouping of characters is usually called a @dfn{string};
the term @dfn{string} is based on similar usage in English, such
-as ``a string of pearls,'' or ``a string of cars in a train''):
+as ``a string of pearls'' or ``a string of cars in a train''):
@example
awk '/li/ @{ print $0 @}' mail-list
@@ -3090,7 +3095,7 @@ omitting the @code{print} statement but retaining the braces makes an
empty action that does nothing (i.e., no lines are printed).
@cindex @command{awk} programs, one-line examples
-Many practical @command{awk} programs are just a line or two. Following is a
+Many practical @command{awk} programs are just a line or two long. Following is a
collection of useful, short programs to get you started. Some of these
programs contain constructs that haven't been covered yet. (The description
of the program will give you a good idea of what is going on, but you'll
@@ -3111,7 +3116,7 @@ Print every line that is longer than 80 characters:
awk 'length($0) > 80' data
@end example
-The sole rule has a relational expression as its pattern and it has no
+The sole rule has a relational expression as its pattern and has no
action---so it uses the default action, printing the record.
@item
@@ -3198,7 +3203,7 @@ Print the even-numbered lines in the @value{DF}:
awk 'NR % 2 == 0' data
@end example
-If you use the expression @samp{NR % 2 == 1} instead,
+If you used the expression @samp{NR % 2 == 1} instead,
the program would print the odd-numbered lines.
@end itemize
@@ -3214,8 +3219,13 @@ no actions run.
After processing all the rules that match the line (and perhaps there are none),
@command{awk} reads the next line. (However,
-@pxref{Next Statement},
+@DBPXREF{Next Statement}
+@ifdocbook
+and @DBREF{Nextfile Statement}.)
+@end ifdocbook
+@ifnotdocbook
and also @pxref{Nextfile Statement}.)
+@end ifnotdocbook
This continues until the program reaches the end of the file.
For example, the following @command{awk} program contains two rules:
@@ -3480,7 +3490,7 @@ performing bit manipulation, for runtime string translation (internationalizatio
determining the type of a variable,
and array sorting.
-As we develop our presentation of the @command{awk} language, we introduce
+As we develop our presentation of the @command{awk} language, we will introduce
most of the variables and many of the functions. They are described
systematically in @DBREF{Built-in Variables} and in
@ref{Built-in}.
@@ -3534,7 +3544,7 @@ and Perl.}
@c FIXME: Review this chapter for summary of builtin functions called.
@itemize @value{BULLET}
@item
-Programs in @command{awk} consist of @var{pattern}-@var{action} pairs.
+Programs in @command{awk} consist of @var{pattern}--@var{action} pairs.
@item
An @var{action} without a @var{pattern} always runs. The default
@@ -3563,7 +3573,7 @@ part of a larger shell script (or MS-Windows batch file).
You may use backslash continuation to continue a source line.
Lines are automatically continued after
a comma, open brace, question mark, colon,
-@samp{||}, @samp{&&}, @code{do} and @code{else}.
+@samp{||}, @samp{&&}, @code{do}, and @code{else}.
@end itemize
@node Invoking Gawk
@@ -3638,20 +3648,16 @@ warning that the program is empty.
@node Options
@section Command-Line Options
-@c STARTOFRANGE ocl
@cindex options, command-line
-@c STARTOFRANGE clo
@cindex command line, options
-@c STARTOFRANGE gnulo
@cindex GNU long options
-@c STARTOFRANGE longo
@cindex options, long
Options begin with a dash and consist of a single character.
GNU-style long options consist of two dashes and a keyword.
The keyword can be abbreviated, as long as the abbreviation allows the option
-to be uniquely identified. If the option takes an argument, then the
-keyword is either immediately followed by an equals sign (@samp{=}) and the
+to be uniquely identified. If the option takes an argument, either the
+keyword is immediately followed by an equals sign (@samp{=}) and the
argument's value, or the keyword and the argument's value are separated
by whitespace.
If a particular option with a value is given more than once, it is the
@@ -3678,7 +3684,7 @@ Set the @code{FS} variable to @var{fs}
@cindex @option{-f} option
@cindex @option{--file} option
@cindex @command{awk} programs, location of
-Read @command{awk} program source from @var{source-file}
+Read the @command{awk} program source from @var{source-file}
instead of in the first nonoption argument.
This option may be given multiple times; the @command{awk}
program consists of the concatenation of the contents of
@@ -3733,8 +3739,6 @@ by the user that could start with @samp{-}.
It is also useful for passing options on to the @command{awk}
program; see @ref{Getopt Function}.
@end table
-@c ENDOFRANGE gnulo
-@c ENDOFRANGE longo
The following list describes @command{gawk}-specific options:
@@ -3746,14 +3750,14 @@ The following list describes @command{gawk}-specific options:
@cindex @option{--characters-as-bytes} option
Cause @command{gawk} to treat all input data as single-byte characters.
In addition, all output written with @code{print} or @code{printf}
-are treated as single-byte characters.
+is treated as single-byte characters.
Normally, @command{gawk} follows the POSIX standard and attempts to process
its input data according to the current locale (@pxref{Locales}). This can often involve
converting multibyte characters into wide characters (internally), and
can lead to problems or confusion if the input data does not contain valid
-multibyte characters. This option is an easy way to tell @command{gawk}:
-``hands off my data!''.
+multibyte characters. This option is an easy way to tell @command{gawk},
+``Hands off my data!''
@item @option{-c}
@itemx @option{--traditional}
@@ -3810,7 +3814,7 @@ Enable debugging of @command{awk} programs
By default, the debugger reads commands interactively from the keyboard
(standard input).
The optional @var{file} argument allows you to specify a file with a list
-of commands for the debugger to execute non-interactively.
+of commands for the debugger to execute noninteractively.
No space is allowed between the @option{-D} and @var{file}, if
@var{file} is supplied.
@@ -3870,7 +3874,7 @@ with @samp{#!} scripts (@pxref{Executable Scripts}), like so:
@cindex portable object files, generating
@cindex files, portable object, generating
Analyze the source program and
-generate a GNU @command{gettext} Portable Object Template file on standard
+generate a GNU @command{gettext} portable object template file on standard
output for all string constants that have been marked for translation.
@xref{Internationalization},
for information about this option.
@@ -3882,7 +3886,7 @@ for information about this option.
@cindex GNU long options, printing list of
@cindex options, printing list of
@cindex printing, list of options
-Print a ``usage'' message summarizing the short and long style options
+Print a ``usage'' message summarizing the short- and long-style options
that @command{gawk} accepts and then exit.
@item @option{-i} @var{source-file}
@@ -3892,7 +3896,7 @@ that @command{gawk} accepts and then exit.
@cindex @command{awk} programs, location of
Read an @command{awk} source library from @var{source-file}. This option
is completely equivalent to using the @code{@@include} directive inside
-your program. This option is very similar to the @option{-f} option,
+your program. It is very similar to the @option{-f} option,
but there are two important differences. First, when @option{-i} is
used, the program source is not loaded if it has been previously
loaded, whereas with @option{-f}, @command{gawk} always loads the file.
@@ -3977,7 +3981,7 @@ when parsing numeric input data (@pxref{Locales}).
@cindex @option{-o} option
@cindex @option{--pretty-print} option
Enable pretty-printing of @command{awk} programs.
-By default, output program is created in a file named @file{awkprof.out}
+By default, the output program is created in a file named @file{awkprof.out}
(@pxref{Profiling}).
The optional @var{file} argument allows you to specify a different
@value{FN} for the output.
@@ -4021,7 +4025,7 @@ in the left margin, and function call counts for each function.
Operate in strict POSIX mode. This disables all @command{gawk}
extensions (just like @option{--traditional}) and
disables all extensions not allowed by POSIX.
-@xref{Common Extensions}, for a summary of the extensions
+@DBXREF{Common Extensions} for a summary of the extensions
in @command{gawk} that are disabled by this option.
Also,
the following additional
@@ -4142,7 +4146,7 @@ source of data.)
Because it is clumsy using the standard @command{awk} mechanisms to mix
source file and command-line @command{awk} programs, @command{gawk}
provides the @option{-e} option. This does not require you to
-pre-empt the standard input for your source code; it allows you to easily
+preempt the standard input for your source code; it allows you to easily
mix command-line and library source code (@pxref{AWKPATH Variable}).
As with @option{-f}, the @option{-e} and @option{-i}
options may also be used multiple times on the command line.
@@ -4188,8 +4192,6 @@ setenv POSIXLY_CORRECT true
Having @env{POSIXLY_CORRECT} set is not recommended for daily use,
but it is good for testing the portability of your programs to other
environments.
-@c ENDOFRANGE ocl
-@c ENDOFRANGE clo
@node Other Arguments
@section Other Command-Line Arguments
@@ -4332,7 +4334,7 @@ file, unless the file is in the current directory.
But with @command{gawk}, if the @value{FN} supplied to the @option{-f}
or @option{-i} options
does not contain a directory separator @samp{/}, then @command{gawk} searches a list of
-directories (called the @dfn{search path}), one by one, looking for a
+directories (called the @dfn{search path}) one by one, looking for a
file with the specified name.
The search path is a string consisting of directory names
@@ -4373,9 +4375,9 @@ as an entry in the path or write a null entry in the path.
Different past versions of @command{gawk} would also look explicitly in
the current directory, either before or after the path search. As of
-@value{PVERSION} 4.1.2, this no longer happens, and if you wish to look
+@value{PVERSION} 4.1.2, this no longer happens; if you wish to look
in the current directory, you must include @file{.} either as a separate
-entry, or as a null entry in the search path.
+entry or as a null entry in the search path.
@end quotation
The default value for @env{AWKPATH} is
@@ -4491,7 +4493,7 @@ If this variable exists, @command{gawk} includes the @value{FN}
and line number within the @command{gawk} source code
from which warning and/or fatal messages
are generated. Its purpose is to help isolate the source of a
-message, as there are multiple places which produce the
+message, as there are multiple places that produce the
same warning or error message.
@item GAWK_NO_DFA
@@ -4507,16 +4509,16 @@ This specifies the amount by which @command{gawk} should grow its
internal evaluation stack, when needed.
@item INT_CHAIN_MAX
-The intended maximum number of items @command{gawk} will maintain on a
+This specifies the intended maximum number of items @command{gawk} will maintain on a
hash chain for managing arrays indexed by integers.
@item STR_CHAIN_MAX
-The intended maximum number of items @command{gawk} will maintain on a
+This specifies the intended maximum number of items @command{gawk} will maintain on a
hash chain for managing arrays indexed by strings.
@item TIDYMEM
If this variable exists, @command{gawk} uses the @code{mtrace()} library
-calls from GNU LIBC to help track down possible memory leaks.
+calls from the GNU C library to help track down possible memory leaks.
@end table
@node Exit Status
@@ -4553,7 +4555,7 @@ The @code{@@include} keyword can be used to read external @command{awk} source
files. This gives you the ability to split large @command{awk} source files
into smaller, more manageable pieces, and also lets you reuse common @command{awk}
code from various @command{awk} scripts. In other words, you can group
-together @command{awk} functions, used to carry out specific tasks,
+together @command{awk} functions used to carry out specific tasks
into external files. These files can be used just like function libraries,
using the @code{@@include} keyword in conjunction with the @env{AWKPATH}
environment variable. Note that source files may also be included
@@ -4588,7 +4590,7 @@ $ @kbd{gawk -f test2}
@print{} This is script test2.
@end example
-@code{gawk} runs the @file{test2} script, which includes @file{test1}
+@command{gawk} runs the @file{test2} script, which includes @file{test1}
using the @code{@@include}
keyword. So, to include external @command{awk} source files, you just
use @code{@@include} followed by the name of the file to be included,
@@ -4643,11 +4645,12 @@ of the @env{AWKPATH} variable in command-line file searches
This is very helpful in constructing @command{gawk} function libraries.
If you have a large script with useful, general-purpose @command{awk}
functions, you can break it down into library files and put those files
-in a special directory. You can then include those ``libraries,'' using
-either the full pathnames of the files, or by setting the @env{AWKPATH}
+in a special directory. You can then include those ``libraries,''
+either by using the full pathnames of the files, or by setting the @env{AWKPATH}
environment variable accordingly and then using @code{@@include} with
-just the file part of the full pathname. Of course, you can have more
-than one directory to keep library files; the more complex the working
+just the file part of the full pathname. Of course,
+you can keep library files in more than one directory;
+the more complex the working
environment is, the more directories you may need to organize the files
to be included.
@@ -4660,8 +4663,8 @@ In particular, @code{@@include} is very useful for writing CGI scripts
to be run from web pages.
As mentioned in @ref{AWKPATH Variable}, the current directory is always
-searched first for source files, before searching in @env{AWKPATH},
-and this also applies to files named with @code{@@include}.
+searched first for source files, before searching in @env{AWKPATH};
+this also applies to files named with @code{@@include}.
@node Loading Shared Libraries
@section Loading Dynamic Extensions into Your Program
@@ -4715,8 +4718,8 @@ It also describes the @code{ordchr} extension.
@cindex features, deprecated
@cindex obsolete features
This @value{SECTION} describes features and/or command-line options from
-previous releases of @command{gawk} that are either not available in the
-current version or that are still supported but deprecated (meaning that
+previous releases of @command{gawk} that either are not available in the
+current version or are still supported but deprecated (meaning that
they will @emph{not} be in the next release).
The process-related special files @file{/dev/pid}, @file{/dev/ppid},
@@ -4796,7 +4799,7 @@ This seems to have been a long-undocumented feature in Unix @command{awk}.
Similarly, you may use @code{print} or @code{printf} statements in the
@var{init} and @var{increment} parts of a @code{for} loop. This is another
-long-undocumented ``feature'' of Unix @code{awk}.
+long-undocumented ``feature'' of Unix @command{awk}.
@end ignore
@@ -4813,7 +4816,7 @@ to run @command{awk}.
@item
The three standard options for all versions of @command{awk} are
-@option{-f}, @option{-F} and @option{-v}. @command{gawk} supplies these
+@option{-f}, @option{-F}, and @option{-v}. @command{gawk} supplies these
and many others, as well as corresponding GNU-style long options.
@item
@@ -4850,13 +4853,12 @@ and @option{-f} command-line options.
@item
@command{gawk} allows you to load additional functions written in C
or C++ using the @code{@@load} statement and/or the @option{-l} option.
-(This advanced feature is described later on in @ref{Dynamic Extensions}.)
+(This advanced feature is described later, in @ref{Dynamic Extensions}.)
@end itemize
@node Regexp
@chapter Regular Expressions
@cindex regexp
-@c STARTOFRANGE regexp
@cindex regular expressions
A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
@@ -5063,7 +5065,7 @@ Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT).
@cindex @code{\} (backslash), @code{\v} escape sequence
@cindex backslash (@code{\}), @code{\v} escape sequence
@item \v
-Vertical tab, @kbd{Ctrl-k}, ASCII code 11 (VT).
+Vertical TAB, @kbd{Ctrl-k}, ASCII code 11 (VT).
@cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence
@cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence
@@ -5137,7 +5139,7 @@ characters @samp{a+b}.
@cindex @code{\} (backslash), in escape sequences
@cindex portability
For complete portability, do not use a backslash before any character not
-shown in the previous list and that is not an operator.
+shown in the previous list or that is not an operator.
@c 11/2014: Moved so as to not stack sidebars
@sidebar Backslash Before Regular Characters
@@ -5216,7 +5218,6 @@ escape sequences literally when used in regexp constants. Thus,
@node Regexp Operators
@section Regular Expression Operators
-@c STARTOFRANGE regexpo
@cindex regular expressions, operators
@cindex metacharacters in regular expressions
@@ -5234,7 +5235,7 @@ are recognized and converted into corresponding real characters as
the very first step in processing regexps.
Here is a list of metacharacters. All characters that are not escape
-sequences and that are not listed in the following stand for themselves:
+sequences and that are not listed here stand for themselves:
@c Use @asis so the docbook comes out ok. Sigh.
@table @asis
@@ -5357,7 +5358,7 @@ just @samp{p} if no @samp{h}s are present.
There are two subtle points to understand about how @samp{*} works.
First, the @samp{*} applies only to the single preceding regular expression
component (e.g., in @samp{ph*}, it applies just to the @samp{h}).
-To cause @samp{*} to apply to a larger sub-expression, use parentheses:
+To cause @samp{*} to apply to a larger subexpression, use parentheses:
@samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph}, and so on.
Second, @samp{*} finds as many repetitions as possible. If the text
@@ -5396,10 +5397,10 @@ is repeated at least @var{n} times:
Matches @samp{whhhy}, but not @samp{why} or @samp{whhhhy}.
@item wh@{3,5@}y
-Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy}, only.
+Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy} only.
@item wh@{2,@}y
-Matches @samp{whhy} or @samp{whhhy}, and so on.
+Matches @samp{whhy}, @samp{whhhy}, and so on.
@end table
@cindex POSIX @command{awk}, interval expressions in
@@ -5448,11 +5449,9 @@ usage as a syntax error.
If @command{gawk} is in compatibility mode (@pxref{Options}), interval
expressions are not available in regular expressions.
-@c ENDOFRANGE regexpo
@node Bracket Expressions
@section Using Bracket Expressions
-@c STARTOFRANGE charlist
@cindex bracket expressions
@cindex bracket expressions, range expressions
@cindex range expressions (regexps)
@@ -5528,7 +5527,7 @@ POSIX standard.
(a space is printable but not visible, whereas an @samp{a} is both)
@item @code{[:lower:]} @tab Lowercase alphabetic characters
@item @code{[:print:]} @tab Printable characters (characters that are not control characters)
-@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits
+@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits,
control characters, or space characters)
@item @code{[:space:]} @tab Space characters (such as space, TAB, and formfeed, to name a few)
@item @code{[:upper:]} @tab Uppercase alphabetic characters
@@ -5548,11 +5547,11 @@ and numeric characters in your character set.
@c Date: Tue, 01 Jul 2014 07:39:51 +0200
@c From: Hermann Peifer <peifer@gmx.eu>
Some utilities that match regular expressions provide a nonstandard
-@code{[:ascii:]} character class; @command{awk} does not. However, you
-can simulate such a construct using @code{[\x00-\x7F]}. This matches
+@samp{[:ascii:]} character class; @command{awk} does not. However, you
+can simulate such a construct using @samp{[\x00-\x7F]}. This matches
all values numerically between zero and 127, which is the defined
range of the ASCII character set. Use a complemented character list
-(@code{[^\x00-\x7F]}) to match any single-byte characters that are not
+(@samp{[^\x00-\x7F]}) to match any single-byte characters that are not
in the ASCII range.
@cindex bracket expressions, collating elements
@@ -5581,8 +5580,8 @@ Locale-specific names for a list of
characters that are equal. The name is enclosed between
@samp{[=} and @samp{=]}.
For example, the name @samp{e} might be used to represent all of
-``e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp
-that matches any of @samp{e}, @samp{@'e}, or @samp{@`e}.
+``e,'' ``@^e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp
+that matches any of @samp{e}, @samp{@^e}, @samp{@'e}, or @samp{@`e}.
@end table
These features are very valuable in non-English-speaking locales.
@@ -5596,7 +5595,6 @@ expression matching currently recognize only POSIX character classes;
they do not recognize collating symbols or equivalence classes.
@end quotation
@c maybe one day ...
-@c ENDOFRANGE charlist
@node Leftmost Longest
@section How Much Text Matches?
@@ -5612,7 +5610,7 @@ echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
This example uses the @code{sub()} function to make a change to the input
record. (@code{sub()} replaces the first instance of any text matched
by the first argument with the string provided as the second argument;
-@pxref{String Functions}). Here, the regexp @code{/a+/} indicates ``one
+@pxref{String Functions}.) Here, the regexp @code{/a+/} indicates ``one
or more @samp{a} characters,'' and the replacement text is @samp{<A>}.
The input contains four @samp{a} characters.
@@ -5640,9 +5638,7 @@ and also @pxref{Field Separators}).
@node Computed Regexps
@section Using Dynamic Regexps
-@c STARTOFRANGE dregexp
@cindex regular expressions, computed
-@c STARTOFRANGE regexpd
@cindex regular expressions, dynamic
@cindex @code{~} (tilde), @code{~} operator
@cindex tilde (@code{~}), @code{~} operator
@@ -5668,14 +5664,14 @@ and tests whether the input record matches this regexp.
@quotation NOTE
When using the @samp{~} and @samp{!~}
-operators, there is a difference between a regexp constant
+operators, be aware that there is a difference between a regexp constant
enclosed in slashes and a string constant enclosed in double quotes.
If you are going to use a string constant, you have to understand that
the string is, in essence, scanned @emph{twice}: the first time when
@command{awk} reads your program, and the second time when it goes to
match the string on the lefthand side of the operator with the pattern
on the right. This is true of any string-valued expression (such as
-@code{digits_regexp}, shown previously), not just string constants.
+@code{digits_regexp}, shown in the previous example), not just string constants.
@end quotation
@cindex regexp constants, slashes vs.@: quotes
@@ -5749,17 +5745,13 @@ $ @kbd{awk '$0 ~ /[ \t\n]/'}
@command{gawk} does not have this problem, and it isn't likely to
occur often in practice, but it's worth noting for future reference.
@end sidebar
-@c ENDOFRANGE dregexp
-@c ENDOFRANGE regexpd
@node GNU Regexp Operators
@section @command{gawk}-Specific Regexp Operators
@c This section adapted (long ago) from the regex-0.12 manual
-@c STARTOFRANGE regexpg
@cindex regular expressions, operators, @command{gawk}
-@c STARTOFRANGE gregexp
@cindex @command{gawk}, regular expressions, operators
@cindex operators, GNU-specific
@cindex regular expressions, operators, for words
@@ -5835,7 +5827,7 @@ matches either @samp{ball} or @samp{balls}, as a separate word.
@item \B
Matches the empty string that occurs between two
word-constituent characters. For example,
-@code{/\Brat\B/} matches @samp{crate} but it does not match @samp{dirty rat}.
+@code{/\Brat\B/} matches @samp{crate}, but it does not match @samp{dirty rat}.
@samp{\B} is essentially the opposite of @samp{\y}.
@end table
@@ -5854,14 +5846,14 @@ The operators are:
@cindex backslash (@code{\}), @code{\`} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\`} operator (@command{gawk})
Matches the empty string at the
-beginning of a buffer (string).
+beginning of a buffer (string)
@c @cindex operators, @code{\'} (@command{gawk})
@cindex backslash (@code{\}), @code{\'} operator (@command{gawk})
@cindex @code{\} (backslash), @code{\'} operator (@command{gawk})
@item \'
Matches the empty string at the
-end of a buffer (string).
+end of a buffer (string)
@end table
@cindex @code{^} (caret), regexp operator
@@ -5924,15 +5916,11 @@ Allow interval expressions in regexps, if @option{--traditional}
has been provided.
Otherwise, interval expressions are available by default.
@end table
-@c ENDOFRANGE gregexp
-@c ENDOFRANGE regexpg
@node Case-sensitivity
@section Case Sensitivity in Matching
-@c STARTOFRANGE regexpcs
@cindex regular expressions, case sensitivity
-@c STARTOFRANGE csregexp
@cindex case sensitivity, regexps and
Case is normally significant in regular expressions, both when matching
ordinary characters (i.e., not metacharacters) and inside bracket
@@ -6024,8 +6012,6 @@ the right thing.}
The value of @code{IGNORECASE} has no effect if @command{gawk} is in
compatibility mode (@pxref{Options}).
Case is always significant in compatibility mode.
-@c ENDOFRANGE csregexp
-@c ENDOFRANGE regexpcs
@node Regexp Summary
@section Summary
@@ -6072,12 +6058,10 @@ versions, use @code{tolower()} or @code{toupper()}.
@end itemize
-@c ENDOFRANGE regexp
@node Reading Files
@chapter Reading Input Files
-@c STARTOFRANGE infir
@cindex reading input files
@cindex input files, reading
@cindex input files
@@ -6102,7 +6086,7 @@ This makes it more convenient for programs to work on the parts of a record.
@cindex @code{getline} command
On rare occasions, you may need to use the @code{getline} command.
-The @code{getline} command is valuable, both because it
+The @code{getline} command is valuable both because it
can do explicit input from any number of files, and because the files
used with it do not have to be named on the @command{awk} command line
(@pxref{Getline}).
@@ -6128,9 +6112,7 @@ used with it do not have to be named on the @command{awk} command line
@node Records
@section How Input Is Split into Records
-@c STARTOFRANGE inspl
@cindex input, splitting into records
-@c STARTOFRANGE recspl
@cindex records, splitting input into
@cindex @code{NR} variable
@cindex @code{FNR} variable
@@ -6155,8 +6137,8 @@ never automatically reset to zero.
Records are separated by a character called the @dfn{record separator}.
By default, the record separator is the newline character.
This is why records are, by default, single lines.
-A different character can be used for the record separator by
-assigning the character to the predefined variable @code{RS}.
+To use a different character for the record separator,
+simply assign that character to the predefined variable @code{RS}.
@cindex newlines, as record separators
@cindex @code{RS} variable
@@ -6179,8 +6161,8 @@ awk 'BEGIN @{ RS = "u" @}
@noindent
changes the value of @code{RS} to @samp{u}, before reading any input.
-This is a string whose first character is the letter ``u''; as a result, records
-are separated by the letter ``u.'' Then the input file is read, and the second
+The new value is a string whose first character is the letter ``u''; as a result, records
+are separated by the letter ``u''. Then the input file is read, and the second
rule in the @command{awk} program (the action with no pattern) prints each
record. Because each @code{print} statement adds a newline at the end of
its output, this @command{awk} program copies the input
@@ -6241,8 +6223,8 @@ Bill 555-1675 bill.drowning@@hotmail.com A
@end example
@noindent
-It contains no @samp{u} so there is no reason to split the record,
-unlike the others which have one or more occurrences of the @samp{u}.
+It contains no @samp{u}, so there is no reason to split the record,
+unlike the others, which each have one or more occurrences of the @samp{u}.
In fact, this record is treated as part of the previous record;
the newline separating them in the output
is the original newline in the @value{DF}, not the one added by
@@ -6337,7 +6319,7 @@ contains the same single character. However, when @code{RS} is a
regular expression, @code{RT} contains
the actual input text that matched the regular expression.
-If the input file ended without any text that matches @code{RS},
+If the input file ends without any text matching @code{RS},
@command{gawk} sets @code{RT} to the null string.
The following example illustrates both of these features.
@@ -6430,8 +6412,6 @@ character as a record separator. However, this is a special case:
whole files. If you are using @command{gawk}, see @DBREF{Extension Sample
Readfile} for another option.
@end sidebar
-@c ENDOFRANGE inspl
-@c ENDOFRANGE recspl
@node Fields
@section Examining Fields
@@ -6439,7 +6419,6 @@ Readfile} for another option.
@cindex examining fields
@cindex fields
@cindex accessing fields
-@c STARTOFRANGE fiex
@cindex fields, examining
@cindex POSIX @command{awk}, field separators and
@cindex field separators, POSIX and
@@ -6464,11 +6443,11 @@ simple @command{awk} programs so powerful.
@cindex @code{$} (dollar sign), @code{$} field operator
@cindex dollar sign (@code{$}), @code{$} field operator
@cindex field operators@comma{} dollar sign as
-You use a dollar-sign (@samp{$})
+You use a dollar sign (@samp{$})
to refer to a field in an @command{awk} program,
followed by the number of the field you want. Thus, @code{$1}
refers to the first field, @code{$2} to the second, and so on.
-(Unlike the Unix shells, the field numbers are not limited to single digits.
+(Unlike in the Unix shells, the field numbers are not limited to single digits.
@code{$127} is the 127th field in the record.)
For example, suppose the following is a line of input:
@@ -6494,7 +6473,7 @@ If you try to reference a field beyond the last
one (such as @code{$8} when the record has only seven fields), you get
the empty string. (If used in a numeric operation, you get zero.)
-The use of @code{$0}, which looks like a reference to the ``zero-th'' field, is
+The use of @code{$0}, which looks like a reference to the ``zeroth'' field, is
a special case: it represents the whole input record. Use it
when you are not interested in specific fields.
Here are some more examples:
@@ -6520,7 +6499,6 @@ $ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list}
@print{} Julie F
@print{} Samuel A
@end example
-@c ENDOFRANGE fiex
@node Nonconstant Fields
@section Nonconstant Field Numbers
@@ -6550,13 +6528,13 @@ awk '@{ print $(2*2) @}' mail-list
@end example
@command{awk} evaluates the expression @samp{(2*2)} and uses
-its value as the number of the field to print. The @samp{*} sign
+its value as the number of the field to print. The @samp{*}
represents multiplication, so the expression @samp{2*2} evaluates to four.
The parentheses are used so that the multiplication is done before the
@samp{$} operation; they are necessary whenever there is a binary
operator@footnote{A @dfn{binary operator}, such as @samp{*} for
multiplication, is one that takes two operands. The distinction
-is required, because @command{awk} also has unary (one-operand)
+is required because @command{awk} also has unary (one-operand)
and ternary (three-operand) operators.}
in the field-number expression. This example, then, prints the
type of relationship (the fourth field) for every line of the file
@@ -6581,7 +6559,6 @@ evaluating @code{NF} and using its value as a field number.
@node Changing Fields
@section Changing the Contents of a Field
-@c STARTOFRANGE ficon
@cindex fields, changing contents of
The contents of a field, as seen by @command{awk}, can be changed within an
@command{awk} program; this changes what @command{awk} perceives as the
@@ -6737,7 +6714,7 @@ rebuild @code{$0} when @code{NF} is decremented.
Finally, there are times when it is convenient to force
@command{awk} to rebuild the entire record, using the current
-value of the fields and @code{OFS}. To do this, use the
+values of the fields and @code{OFS}. To do this, use the
seemingly innocuous assignment:
@example
@@ -6761,7 +6738,7 @@ such as @code{sub()} and @code{gsub()}
It is important to remember that @code{$0} is the @emph{full}
record, exactly as it was read from the input. This includes
any leading or trailing whitespace, and the exact whitespace (or other
-characters) that separate the fields.
+characters) that separates the fields.
It is a common error to try to change the field separators
in a record simply by setting @code{FS} and @code{OFS}, and then
@@ -6773,7 +6750,6 @@ itself. Instead, you must force the record to be rebuilt, typically
with a statement such as @samp{$1 = $1}, as described earlier.
@end sidebar
-@c ENDOFRANGE ficon
@node Field Separators
@section Specifying How Fields Are Separated
@@ -6789,9 +6765,7 @@ with a statement such as @samp{$1 = $1}, as described earlier.
@cindex @code{FS} variable
@cindex fields, separating
-@c STARTOFRANGE fisepr
@cindex field separators
-@c STARTOFRANGE fisepg
@cindex fields, separating
The @dfn{field separator}, which is either a single character or a regular
expression, controls the way @command{awk} splits an input record into fields.
@@ -6857,7 +6831,7 @@ John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
@end example
@noindent
-The same program would extract @samp{@bullet{}LXIX}, instead of
+The same program would extract @samp{@bullet{}LXIX} instead of
@samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
If you were expecting the program to print the
address, you would be surprised. The moral is to choose your data layout and
@@ -6891,9 +6865,7 @@ rules.
@node Regexp Field Splitting
@subsection Using Regular Expressions to Separate Fields
-@c STARTOFRANGE regexpfs
@cindex regular expressions, as field separators
-@c STARTOFRANGE fsregexp
@cindex field separators, regular expressions as
The previous @value{SUBSECTION}
discussed the use of single characters or simple strings as the
@@ -6997,8 +6969,6 @@ $ @kbd{echo 'xxAA xxBxx C' |}
@print{} -->xxBxx<--
@print{} -->C<--
@end example
-@c ENDOFRANGE regexpfs
-@c ENDOFRANGE fsregexp
@node Single Character Fields
@subsection Making Each Character a Separate Field
@@ -7122,7 +7092,7 @@ choosing your field and record separators.
@cindex Unix @command{awk}, password files@comma{} field separators and
Perhaps the most common use of a single character as the field separator
occurs when processing the Unix system password file. On many Unix
-systems, each user has a separate entry in the system password file, one
+systems, each user has a separate entry in the system password file, with one
line per user. The information in these lines is separated by colons.
The first field is the user's login name and the second is the user's
encrypted or shadow password. (A shadow password is indicated by the
@@ -7163,7 +7133,7 @@ When you do this, @code{$1} is the same as @code{$0}.
According to the POSIX standard, @command{awk} is supposed to behave
as if each record is split into fields at the time it is read.
In particular, this means that if you change the value of @code{FS}
-after a record is read, the value of the fields (i.e., how they were split)
+after a record is read, the values of the fields (i.e., how they were split)
should reflect the old value of @code{FS}, not the new one.
@cindex dark corner, field separators
@@ -7176,10 +7146,7 @@ using the @emph{current} value of @code{FS}!
@value{DARKCORNER}
This behavior can be difficult
to diagnose. The following example illustrates the difference
-between the two methods.
-(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.''
-Its behavior is also defined by the POSIX standard.}
-command prints just the first line of @file{/etc/passwd}.)
+between the two methods:
@example
sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}'
@@ -7199,6 +7166,10 @@ prints the full first line of the file, something like:
@example
root:x:0:0:Root:/:
@end example
+
+(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.''
+Its behavior is also defined by the POSIX standard.}
+command prints just the first line of @file{/etc/passwd}.)
@end sidebar
@node Field Splitting Summary
@@ -7259,8 +7230,6 @@ do it for you (e.g., @samp{FS = "[c]"}). In this case, @code{IGNORECASE}
will take effect.
@end sidebar
-@c ENDOFRANGE fisepr
-@c ENDOFRANGE fisepg
@node Constant Size
@section Reading Fixed-Width Data
@@ -7375,7 +7344,7 @@ In order to tell which kind of field splitting is in effect,
use @code{PROCINFO["FS"]}
(@pxref{Auto-set}).
The value is @code{"FS"} if regular field splitting is being used,
-or it is @code{"FIELDWIDTHS"} if fixed-width field splitting is being used:
+or @code{"FIELDWIDTHS"} if fixed-width field splitting is being used:
@example
if (PROCINFO["FS"] == "FS")
@@ -7411,14 +7380,14 @@ what they are, and not by what they are not.
The most notorious such case
is so-called @dfn{comma-separated values} (CSV) data. Many spreadsheet programs,
for example, can export their data into text files, where each record is
-terminated with a newline, and fields are separated by commas. If only
-commas separated the data, there wouldn't be an issue. The problem comes when
+terminated with a newline, and fields are separated by commas. If
+commas only separated the data, there wouldn't be an issue. The problem comes when
one of the fields contains an @emph{embedded} comma.
In such cases, most programs embed the field in double quotes.@footnote{The
CSV format lacked a formal standard definition for many years.
@uref{http://www.ietf.org/rfc/rfc4180.txt, RFC 4180}
standardizes the most common practices.}
-So we might have data like this:
+So, we might have data like this:
@example
@c file eg/misc/addresses.csv
@@ -7504,8 +7473,8 @@ of cases, and the @command{gawk} developers are satisfied with that.
@end quotation
As written, the regexp used for @code{FPAT} requires that each field
-have a least one character. A straightforward modification
-(changing changed the first @samp{+} to @samp{*}) allows fields to be empty:
+contain at least one character. A straightforward modification
+(changing the first @samp{+} to @samp{*}) allows fields to be empty:
@example
FPAT = "([^,]*)|(\"[^\"]+\")"
@@ -7515,20 +7484,17 @@ Finally, the @code{patsplit()} function makes the same functionality
available for splitting regular strings (@pxref{String Functions}).
To recap, @command{gawk} provides three independent methods
-to split input records into fields. @command{gawk} uses whichever
-mechanism was last chosen based on which of the three
-variables---@code{FS}, @code{FIELDWIDTHS}, and @code{FPAT}---was
+to split input records into fields.
+The mechanism used is based on which of the three
+variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was
last assigned to.
@node Multiple Line
@section Multiple-Line Records
@cindex multiple-line records
-@c STARTOFRANGE recm
@cindex records, multiline
-@c STARTOFRANGE imr
@cindex input, multiline records
-@c STARTOFRANGE frm
@cindex files, reading, multiline records
@cindex input, files, See input files
In some databases, a single line cannot conveniently hold all the
@@ -7563,7 +7529,7 @@ at the end of the record and one or more blank lines after the record.
In addition, a regular expression always matches the longest possible
sequence when there is a choice
(@pxref{Leftmost Longest}).
-So the next record doesn't start until
+So, the next record doesn't start until
the first nonblank line that follows---no matter how many blank lines
appear in a row, they are considered one record separator.
@@ -7578,10 +7544,10 @@ In the second case, this special processing is not done.
@cindex field separator, in multiline records
@cindex @code{FS}, in multiline records
Now that the input is separated into records, the second step is to
-separate the fields in the record. One way to do this is to divide each
+separate the fields in the records. One way to do this is to divide each
of the lines into fields in the normal manner. This happens by default
as the result of a special feature. When @code{RS} is set to the empty
-string, @emph{and} @code{FS} is set to a single character,
+string @emph{and} @code{FS} is set to a single character,
the newline character @emph{always} acts as a field separator.
This is in addition to whatever field separations result from
@code{FS}.@footnote{When @code{FS} is the null string (@code{""})
@@ -7596,7 +7562,7 @@ want the newline character to separate fields, because there is no way to
prevent it. However, you can work around this by using the @code{split()}
function to break up the record manually
(@pxref{String Functions}).
-If you have a single character field separator, you can work around
+If you have a single-character field separator, you can work around
the special feature in a different way, by making @code{FS} into a
regexp for that single character. For example, if the field
separator is a percent character, instead of
@@ -7604,10 +7570,10 @@ separator is a percent character, instead of
Another way to separate fields is to
put each field on a separate line: to do this, just set the
-variable @code{FS} to the string @code{"\n"}. (This single
-character separator matches a single newline.)
+variable @code{FS} to the string @code{"\n"}.
+(This single-character separator matches a single newline.)
A practical example of a @value{DF} organized this way might be a mailing
-list, where each entry is separated by blank lines. Consider a mailing
+list, where blank lines separate the entries. Consider a mailing
list in a file named @file{addresses}, which looks like this:
@example
@@ -7695,20 +7661,15 @@ If not in compatibility mode (@pxref{Options}), @command{gawk} sets
@code{RT} to the input text that matched the value specified by @code{RS}.
But if the input file ended without any text that matches @code{RS},
then @command{gawk} sets @code{RT} to the null string.
-@c ENDOFRANGE recm
-@c ENDOFRANGE imr
-@c ENDOFRANGE frm
@node Getline
@section Explicit Input with @code{getline}
-@c STARTOFRANGE getl
@cindex @code{getline} command, explicit input with
-@c STARTOFRANGE inex
@cindex input, explicit
So far we have been getting our input data from @command{awk}'s main
input stream---either the standard input (usually your keyboard, sometimes
-the output from another program) or from the
+the output from another program) or the
files specified on the command line. The @command{awk} language has a
special built-in command called @code{getline} that
can be used to read input under your explicit control.
@@ -7892,7 +7853,7 @@ free
@end example
The @code{getline} command used in this way sets only the variables
-@code{NR}, @code{FNR}, and @code{RT} (and of course, @var{var}).
+@code{NR}, @code{FNR}, and @code{RT} (and, of course, @var{var}).
The record is not
split into fields, so the values of the fields (including @code{$0}) and
the value of @code{NF} do not change.
@@ -7907,7 +7868,7 @@ the value of @code{NF} do not change.
@cindex left angle bracket (@code{<}), @code{<} operator (I/O)
@cindex operators, input/output
Use @samp{getline < @var{file}} to read the next record from @var{file}.
-Here @var{file} is a string-valued expression that
+Here, @var{file} is a string-valued expression that
specifies the @value{FN}. @samp{< @var{file}} is called a @dfn{redirection}
because it directs input to come from a different place.
For example, the following
@@ -8085,7 +8046,7 @@ of a construct like @samp{@w{"echo "} "date" | getline}.
-Most versions, including the current version, treat it at as
+Most versions, including the current version, treat it as
@samp{@w{("echo "} "date") | getline}.
(This is also how BWK @command{awk} behaves.)
-Some versions changed and treated it as
+Some versions instead treat it as
@samp{@w{"echo "} ("date" | getline)}.
(This is how @command{mawk} behaves.)
In short, @emph{always} use explicit parentheses, and then you won't
@@ -8133,7 +8094,7 @@ program to be portable to other @command{awk} implementations.
@cindex operators, input/output
@cindex differences in @command{awk} and @command{gawk}, input/output operators
-Input into @code{getline} from a pipe is a one-way operation.
+Reading input into @code{getline} from a pipe is a one-way operation.
The command that is started with @samp{@var{command} | getline} only
sends data @emph{to} your @command{awk} program.
@@ -8143,7 +8104,7 @@ for processing and then read the results back.
communications are possible. This is done with the @samp{|&}
operator.
Typically, you write data to the coprocess first and then
-read results back, as shown in the following:
+read the results back, as shown in the following:
@example
print "@var{some query}" |& "db_server"
@@ -8226,7 +8187,7 @@ also @pxref{Auto-set}.)
@item
Using @code{FILENAME} with @code{getline}
(@samp{getline < FILENAME})
-is likely to be a source for
+is likely to be a source of
confusion. @command{awk} opens a separate input stream from the
current input file. However, by not using a variable, @code{$0}
and @code{NF} are still updated. If you're doing this, it's
@@ -8234,9 +8195,15 @@ probably by accident, and you should reconsider what it is you're
trying to accomplish.
@item
-@DBREF{Getline Summary} presents a table summarizing the
+@ifdocbook
+The next section
+@end ifdocbook
+@ifnotdocbook
+@ref{Getline Summary},
+@end ifnotdocbook
+presents a table summarizing the
@code{getline} variants and which variables they can affect.
-It is worth noting that those variants which do not use redirection
+It is worth noting that those variants that do not use redirection
can cause @code{FILENAME} to be updated if they cause
@command{awk} to start reading a new input file.
@@ -8245,7 +8212,7 @@ can cause @code{FILENAME} to be updated if they cause
If the variable being assigned is an expression with side effects,
different versions of @command{awk} behave differently upon encountering
end-of-file. Some versions don't evaluate the expression; many versions
-(including @command{gawk}) do. Here is an example, due to Duncan Moore:
+(including @command{gawk}) do. Here is an example, courtesy of Duncan Moore:
@ignore
Date: Sun, 01 Apr 2012 11:49:33 +0100
@@ -8262,7 +8229,7 @@ BEGIN @{
@noindent
Here, the side effect is the @samp{++c}. Is @code{c} incremented if
-end of file is encountered, before the element in @code{a} is assigned?
+end-of-file is encountered before the element in @code{a} is assigned?
@command{gawk} treats @code{getline} like a function call, and evaluates
the expression @samp{a[++c]} before attempting to read from @file{f}.
@@ -8294,9 +8261,6 @@ Note: for each variant, @command{gawk} sets the @code{RT} predefined variable.
@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{gawk}
@end multitable
@end float
-@c ENDOFRANGE getl
-@c ENDOFRANGE inex
-@c ENDOFRANGE infir
@node Read Timeout
@section Reading Input with a Timeout
@@ -8307,8 +8271,8 @@ This @value{SECTION} describes a feature that is specific to @command{gawk}.
You may specify a timeout in milliseconds for reading input from the keyboard,
a pipe, or two-way communication, including TCP/IP sockets. This can be done
-on a per input, command, or connection basis, by setting a special element
-in the @code{PROCINFO} array (@pxref{Auto-set}):
+on a per-input, per-command, or per-connection basis, by setting a special
+element in the @code{PROCINFO} array (@pxref{Auto-set}):
@example
PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds}
@@ -8339,7 +8303,7 @@ while ((getline < "/dev/stdin") > 0)
@end example
@command{gawk} terminates the read operation if input does not
-arrive after waiting for the timeout period, returns failure
+arrive after waiting for the timeout period, returns failure,
and sets @code{ERRNO} to an appropriate string value.
A negative or zero value for the timeout is the same as specifying
no timeout at all.
@@ -8349,7 +8313,7 @@ loop that reads input records and matches them against patterns,
like so:
@example
-$ @kbd{ gawk 'BEGIN @{ PROCINFO["-", "READ_TIMEOUT"] = 5000 @}}
+$ @kbd{gawk 'BEGIN @{ PROCINFO["-", "READ_TIMEOUT"] = 5000 @}}
> @kbd{@{ print "You entered: " $0 @}'}
@kbd{gawk}
@print{} You entered: gawk
@@ -8389,7 +8353,7 @@ If the @code{PROCINFO} element is not present and the
@command{gawk} uses its value to initialize the timeout value.
The exclusive use of the environment variable to specify timeout
has the disadvantage of not being able to control it
-on a per command or connection basis.
+on a per-command or per-connection basis.
@command{gawk} considers a timeout event to be an error even though
the attempt to read from the underlying device may
@@ -8455,7 +8419,7 @@ The possibilities are as follows:
@item
After splitting the input into records, @command{awk} further splits
-the record into individual fields, named @code{$1}, @code{$2}, and so
+the records into individual fields, named @code{$1}, @code{$2}, and so
on. @code{$0} is the whole record, and @code{NF} indicates how many
fields there are. The default way to split fields is between whitespace
characters.
@@ -8471,12 +8435,12 @@ thing. Decrementing @code{NF} throws away fields and rebuilds the record.
@item
Field splitting is more complicated than record splitting:
-@multitable @columnfractions .40 .45 .15
+@multitable @columnfractions .40 .40 .20
@headitem Field separator value @tab Fields are split @dots{} @tab @command{awk} / @command{gawk}
@item @code{FS == " "} @tab On runs of whitespace @tab @command{awk}
@item @code{FS == @var{any single character}} @tab On that character @tab @command{awk}
@item @code{FS == @var{regexp}} @tab On text matching the regexp @tab @command{awk}
-@item @code{FS == ""} @tab Each individual character is a separate field @tab @command{gawk}
+@item @code{FS == ""} @tab Such that each individual character is a separate field @tab @command{gawk}
@item @code{FIELDWIDTHS == @var{list of columns}} @tab Based on character position @tab @command{gawk}
@item @code{FPAT == @var{regexp}} @tab On the text surrounding text matching the regexp @tab @command{gawk}
@end multitable
@@ -8493,11 +8457,11 @@ This can also be done using command-line variable assignment.
Use @code{PROCINFO["FS"]} to see how fields are being split.
@item
-Use @code{getline} in its various forms to read additional records,
+Use @code{getline} in its various forms to read additional records
from the default input stream, from a file, or from a pipe or coprocess.
@item
-Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to timeout
+Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to time out
for @var{file}.
@item
@@ -8531,7 +8495,6 @@ That can be fixed by making one simple change. What is it?
@node Printing
@chapter Printing Output
-@c STARTOFRANGE prnt
@cindex printing
@cindex output, printing, See printing
One of the most common programming actions is to @dfn{print}, or output,
@@ -8547,7 +8510,6 @@ columns, whether to use exponential notation or not, and so on.
For printing with specifications, you need the @code{printf} statement
(@pxref{Printf}).
-@c STARTOFRANGE prnts
@cindex @code{print} statement
@cindex @code{printf} statement
Besides basic and formatted printing, this @value{CHAPTER}
@@ -8609,7 +8571,7 @@ space is printed between any two items.
Note that the @code{print} statement is a statement and not an
expression---you can't use it in the pattern part of a
-@var{pattern}-@var{action} statement, for example.
+pattern--action statement, for example.
@node Print Examples
@section @code{print} Statement Examples
@@ -8728,7 +8690,6 @@ You can continue either a @code{print} or
@code{printf} statement simply by putting a newline after any comma
(@pxref{Statements/Lines}).
@end quotation
-@c ENDOFRANGE prnts
@node Output Separators
@section Output Separators
@@ -8801,7 +8762,7 @@ runs together on a single line.
@cindex numeric, output format
@cindex formats@comma{} numeric output
When printing numeric values with the @code{print} statement,
-@command{awk} internally converts the number to a string of characters
+@command{awk} internally converts each number to a string of characters
and prints that string. @command{awk} uses the @code{sprintf()} function
to do this conversion
(@pxref{String Functions}).
@@ -8841,7 +8802,6 @@ if @code{OFMT} contains anything but a floating-point conversion specification.
@node Printf
@section Using @code{printf} Statements for Fancier Printing
-@c STARTOFRANGE printfs
@cindex @code{printf} statement
@cindex output, formatted
@cindex formatting output
@@ -8873,7 +8833,7 @@ printf @var{format}, @var{item1}, @var{item2}, @dots{}
@noindent
As for @code{print}, the entire list of arguments may optionally be
enclosed in parentheses. Here too, the parentheses are necessary if any
-of the item expressions use the @samp{>} relational operator; otherwise,
+of the item expressions uses the @samp{>} relational operator; otherwise,
it can be confused with an output redirection (@pxref{Redirection}).
@cindex format specifiers
@@ -8904,7 +8864,7 @@ $ @kbd{awk 'BEGIN @{}
@end example
@noindent
-Here, neither the @samp{+} nor the @samp{OUCH!} appear in
+Here, neither the @samp{+} nor the @samp{OUCH!} appears in
the output message.
@node Control Letters
@@ -8951,8 +8911,8 @@ The two control letters are equivalent.
(The @samp{%i} specification is for compatibility with ISO C.)
@item @code{%e}, @code{%E}
-Print a number in scientific (exponential) notation;
-for example:
+Print a number in scientific (exponential) notation.
+For example:
@example
printf "%4.3e\n", 1950
@@ -8989,7 +8949,7 @@ The special ``not a number'' value formats as @samp{-nan} or @samp{nan}
(@pxref{Math Definitions}).
@item @code{%F}
-Like @samp{%f} but the infinity and ``not a number'' values are spelled
+Like @samp{%f}, but the infinity and ``not a number'' values are spelled
using uppercase letters.
The @samp{%F} format is a POSIX extension to ISO C; not all systems
@@ -9039,7 +8999,6 @@ values or do something else entirely.
@node Format Modifiers
@subsection Modifiers for @code{printf} Formats
-@c STARTOFRANGE pfm
@cindex @code{printf} statement, modifiers
@cindex modifiers@comma{} in format specifiers
A format specification can also include @dfn{modifiers} that can control
@@ -9078,7 +9037,7 @@ messages at runtime.
which describes how and why to use positional specifiers.
For now, we ignore them.
-@item - (Minus)
+@item - @r{(Minus)}
The minus sign, used before the width modifier (see later on in
this list),
says to left-justify
@@ -9234,7 +9193,7 @@ printf "%" w "." p "s\n", s
@end example
@noindent
-This is not particularly easy to read but it does work.
+This is not particularly easy to read, but it does work.
@c @cindex lint checks
@cindex troubleshooting, fatal errors, @code{printf} format strings
@@ -9245,7 +9204,6 @@ format strings. These are not valid in @command{awk}. Most @command{awk}
implementations silently ignore them. If @option{--lint} is provided
on the command line (@pxref{Options}), @command{gawk} warns about their
use. If @option{--posix} is supplied, their use is a fatal error.
-@c ENDOFRANGE pfm
@node Printf Examples
@subsection Examples Using @code{printf}
@@ -9281,7 +9239,7 @@ $ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list}
@end example
In this case, the phone numbers had to be printed as strings because
-the numbers are separated by a dash. Printing the phone numbers as
+the numbers are separated by dashes. Printing the phone numbers as
numbers would have produced just the first three digits: @samp{555}.
This would have been pretty confusing.
@@ -9326,14 +9284,11 @@ awk 'BEGIN @{ format = "%-10s %s\n"
@{ printf format, $1, $2 @}' mail-list
@end example
-@c ENDOFRANGE printfs
@node Redirection
@section Redirecting Output of @code{print} and @code{printf}
-@c STARTOFRANGE outre
@cindex output redirection
-@c STARTOFRANGE reout
@cindex redirection of output
@cindex @option{--sandbox} option, output redirection with @code{print}, @code{printf}
So far, the output from @code{print} and @code{printf} has gone
@@ -9344,7 +9299,7 @@ This is called @dfn{redirection}.
@quotation NOTE
When @option{--sandbox} is specified (@pxref{Options}),
-redirecting output to files, pipes and coprocesses is disabled.
+redirecting output to files, pipes, and coprocesses is disabled.
@end quotation
A redirection appears after the @code{print} or @code{printf} statement.
@@ -9397,7 +9352,7 @@ Each output file contains one name or number per line.
@cindex @code{>} (right angle bracket), @code{>>} operator (I/O)
@cindex right angle bracket (@code{>}), @code{>>} operator (I/O)
@item print @var{items} >> @var{output-file}
-This redirection prints the items into the pre-existing output file
+This redirection prints the items into the preexisting output file
named @var{output-file}. The difference between this and the
single-@samp{>} redirection is that the old contents (if any) of
@var{output-file} are not erased. Instead, the @command{awk} output is
@@ -9436,7 +9391,7 @@ The unsorted list is written with an ordinary redirection, while
the sorted list is written by piping through the @command{sort} utility.
The next example uses redirection to mail a message to the mailing
-list @samp{bug-system}. This might be useful when trouble is encountered
+list @code{bug-system}. This might be useful when trouble is encountered
in an @command{awk} script run periodically for system maintenance:
@example
@@ -9467,15 +9422,23 @@ This redirection prints the items to the input of @var{command}.
The difference between this and the
single-@samp{|} redirection is that the output from @var{command}
can be read with @code{getline}.
-Thus @var{command} is a @dfn{coprocess}, which works together with,
-but subsidiary to, the @command{awk} program.
+Thus, @var{command} is a @dfn{coprocess}, which works together with
+but is subsidiary to the @command{awk} program.
This feature is a @command{gawk} extension, and is not available in
POSIX @command{awk}.
-@DBXREF{Getline/Coprocess}
+@ifnotdocbook
+@xref{Getline/Coprocess},
for a brief discussion.
-@DBXREF{Two-way I/O}
+@xref{Two-way I/O},
for a more complete discussion.
+@end ifnotdocbook
+@ifdocbook
+@DBXREF{Getline/Coprocess}
+for a brief discussion and
+@DBREF{Two-way I/O}
+for a more complete discussion.
+@end ifdocbook
@end table
Redirecting output using @samp{>}, @samp{>>}, @samp{|}, or @samp{|&}
@@ -9500,7 +9463,7 @@ This is indeed how redirections must be used from the shell. But in
@command{awk}, it isn't necessary. In this kind of case, a program should
use @samp{>} for all the @code{print} statements, because the output file
-is only opened once. (It happens that if you mix @samp{>} and @samp{>>}
-that output is produced in the expected order. However, mixing the operators
+is only opened once. (It happens that if you mix @samp{>} and @samp{>>},
+output is produced in the expected order. However, mixing the operators
for the same file is definitely poor style, and is confusing to readers
of your program.)
@@ -9550,11 +9513,9 @@ It then sends the list to the shell for execution.
@DBXREF{Shell Quoting} for a function that can help in generating
command lines to be fed to the shell.
@end sidebar
-@c ENDOFRANGE outre
-@c ENDOFRANGE reout
@node Special FD
-@section Special Files for Standard Pre-Opened Data Streams
+@section Special Files for Standard Preopened Data Streams
@cindex standard input
@cindex input, standard
@cindex standard output
@@ -9567,7 +9528,7 @@ command lines to be fed to the shell.
Running programs conventionally have three input and output streams
already available to them for reading and writing. These are known
as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard
-error output}. These open streams (and any other open file or pipe)
+error output}. These open streams (and any other open files or pipes)
are often referred to by the technical term @dfn{file descriptors}.
These streams are, by default, connected to your keyboard and screen, but
@@ -9605,7 +9566,7 @@ that is connected to your keyboard and screen. It represents the
``terminal,''@footnote{The ``tty'' in @file{/dev/tty} stands for
``Teletype,'' a serial terminal.} which on modern systems is a keyboard
and screen, not a serial console.)
-This generally has the same effect but not always: although the
+This generally has the same effect, but not always: although the
standard error stream is usually the screen, it can be redirected; when
that happens, writing to the screen is not correct. In fact, if
@command{awk} is run from a background job, it may not have a
@@ -9650,7 +9611,7 @@ print "Serious error detected!" > "/dev/stderr"
@cindex troubleshooting, quotes with file names
Note the use of quotes around the @value{FN}.
-Like any other redirection, the value must be a string.
+As with any other redirection, the value must be a string.
It is a common error to omit the quotes, which leads
to confusing results.
@@ -9661,7 +9622,6 @@ invoked with the @option{--traditional} option (@pxref{Options}).
@node Special Files
@section Special @value{FFN}s in @command{gawk}
-@c STARTOFRANGE gfn
@cindex @command{gawk}, file names in
Besides access to standard input, standard output, and standard error,
@@ -9677,7 +9637,7 @@ TCP/IP networking.
@end menu
@node Other Inherited Files
-@subsection Accessing Other Open Files With @command{gawk}
+@subsection Accessing Other Open Files with @command{gawk}
Besides the @code{/dev/stdin}, @code{/dev/stdout}, and @code{/dev/stderr}
special @value{FN}s mentioned earlier, @command{gawk} provides syntax
@@ -9734,7 +9694,7 @@ special @value{FN}s that @command{gawk} provides:
@cindex compatibility mode (@command{gawk}), file names
@cindex file names, in compatibility mode
@item
-Recognition of the @value{FN}s for the three standard pre-opened
+Recognition of the @value{FN}s for the three standard preopened
files is disabled only in POSIX mode.
@item
@@ -9747,23 +9707,18 @@ compatibility mode (either @option{--traditional} or @option{--posix};
interprets these special @value{FN}s.
For example, using @samp{/dev/fd/4}
for output actually writes on file descriptor 4, and not on a new
-file descriptor that is @code{dup()}'ed from file descriptor 4. Most of
+file descriptor that is @code{dup()}ed from file descriptor 4. Most of
the time this does not matter; however, it is important to @emph{not}
close any of the files related to file descriptors 0, 1, and 2.
Doing so results in unpredictable behavior.
@end itemize
-@c ENDOFRANGE gfn
@node Close Files And Pipes
@section Closing Input and Output Redirections
@cindex files, output, See output files
-@c STARTOFRANGE ifc
@cindex input files, closing
-@c STARTOFRANGE ofc
@cindex output, files@comma{} closing
-@c STARTOFRANGE pc
@cindex pipe, closing
-@c STARTOFRANGE cc
@cindex coprocesses, closing
@cindex @code{getline} command, coprocesses@comma{} using from
@@ -9969,18 +9924,14 @@ This value is zero if the close succeeds, or @minus{}1 if
it fails.
The POSIX standard is very vague; it says that @code{close()}
-returns zero on success and nonzero otherwise. In general,
+returns zero on success and a nonzero value otherwise. In general,
different implementations vary in what they report when closing
-pipes; thus the return value cannot be used portably.
+pipes; thus, the return value cannot be used portably.
@value{DARKCORNER}
In POSIX mode (@pxref{Options}), @command{gawk} just returns zero
when closing a pipe.
@end sidebar
-@c ENDOFRANGE ifc
-@c ENDOFRANGE ofc
-@c ENDOFRANGE pc
-@c ENDOFRANGE cc
@node Nonfatal
@section Enabling Nonfatal Output
@@ -10051,8 +10002,8 @@ for numeric values for the @code{print} statement.
@item
The @code{printf} statement provides finer-grained control over output,
-with format control letters for different data types and various flags
-that modify the behavior of the format control letters.
+with format-control letters for different data types and various flags
+that modify the behavior of the format-control letters.
@item
Output from both @code{print} and @code{printf} may be redirected to
@@ -10107,11 +10058,9 @@ BEGIN @{ print "Serious error detected!" > /dev/stderr @}
@end enumerate
@c EXCLUDE END
-@c ENDOFRANGE prnt
@node Expressions
@chapter Expressions
-@c STARTOFRANGE exps
@cindex expressions
Expressions are the basic building blocks of @command{awk} patterns
@@ -10122,7 +10071,7 @@ can assign a new value to a variable or a field by using an assignment operator.
An expression can serve as a pattern or action statement on its own.
Most other kinds of
statements contain one or more expressions that specify the data on which to
-operate. As in other languages, expressions in @command{awk} include
+operate. As in other languages, expressions in @command{awk} can include
variables, array references, constants, and function calls, as well as
combinations of these with various operators.
@@ -10141,7 +10090,7 @@ combinations of these with various operators.
Expressions are built up from values and the operations performed
upon them. This @value{SECTION} describes the elementary objects
-which provide the values used in expressions.
+that provide the values used in expressions.
@menu
* Constants:: String, numeric and regexp constants.
@@ -10154,7 +10103,6 @@ which provide the values used in expressions.
@node Constants
@subsection Constant Expressions
-@c STARTOFRANGE cnst
@cindex constants, types of
The simplest type of expression is the @dfn{constant}, which always has
@@ -10192,7 +10140,7 @@ have the same value:
@end example
@cindex string constants
-A string constant consists of a sequence of characters enclosed in
+A @dfn{string constant} consists of a sequence of characters enclosed in
double quotation marks. For example:
@example
@@ -10204,7 +10152,7 @@ double quotation marks. For example:
@cindex strings, length limitations
represents the string whose contents are @samp{parrot}. Strings in
@command{gawk} can be of any length, and they can contain any of the possible
-eight-bit ASCII characters including ASCII @sc{nul} (character code zero).
+eight-bit ASCII characters, including ASCII @sc{nul} (character code zero).
Other @command{awk}
implementations may have difficulty with some character codes.
@@ -10219,15 +10167,15 @@ In @command{awk}, all numbers are in decimal (i.e., base 10). Many other
programming languages allow you to specify numbers in other bases, often
octal (base 8) and hexadecimal (base 16).
In octal, the numbers go 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, and so on.
-Just as @samp{11}, in decimal, is 1 times 10 plus 1, so
-@samp{11}, in octal, is 1 times 8, plus 1. This equals 9 in decimal.
+Just as @samp{11} in decimal is 1 times 10 plus 1, so
+@samp{11} in octal is 1 times 8 plus 1. This equals 9 in decimal.
In hexadecimal, there are 16 digits. Because the everyday decimal
number system only has ten digits (@samp{0}--@samp{9}), the letters
@samp{a} through @samp{f} are used to represent the rest.
(Case in the letters is usually irrelevant; hexadecimal @samp{a} and @samp{A}
have the same value.)
-Thus, @samp{11}, in
-hexadecimal, is 1 times 16 plus 1, which equals 17 in decimal.
+Thus, @samp{11} in
+hexadecimal is 1 times 16 plus 1, which equals 17 in decimal.
Just by looking at plain @samp{11}, you can't tell what base it's in.
So, in C, C++, and other languages derived from C,
@@ -10238,13 +10186,13 @@ and hexadecimal numbers start with a leading @samp{0x} or @samp{0X}:
@table @code
@item 11
-Decimal value 11.
+Decimal value 11
@item 011
-Octal 11, decimal value 9.
+Octal 11, decimal value 9
@item 0x11
-Hexadecimal 11, decimal value 17.
+Hexadecimal 11, decimal value 17
@end table
This example shows the difference:
@@ -10272,11 +10220,11 @@ you can use the @code{strtonum()} function
(@pxref{String Functions})
to convert the data into a number.
Most of the time, you will want to use octal or hexadecimal constants
-when working with the built-in bit manipulation functions;
+when working with the built-in bit-manipulation functions;
see @DBREF{Bitwise Functions}
for more information.
-Unlike some early C implementations, @samp{8} and @samp{9} are not
+Unlike in some early C implementations, @samp{8} and @samp{9} are not
valid in octal constants. For example, @command{gawk} treats @samp{018}
as decimal 18:
@@ -10311,19 +10259,17 @@ $ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'}
@node Regexp Constants
@subsubsection Regular Expression Constants
-@c STARTOFRANGE rec
@cindex regexp constants
@cindex @code{~} (tilde), @code{~} operator
@cindex tilde (@code{~}), @code{~} operator
@cindex @code{!} (exclamation point), @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator
-A regexp constant is a regular expression description enclosed in
+A @dfn{regexp constant} is a regular expression description enclosed in
slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in
@command{awk} programs are constant, but the @samp{~} and @samp{!~}
matching operators can also match computed or dynamic regexps
(which are typically just ordinary strings or variables that contain a regexp,
-but could be a more complex expression).
-@c ENDOFRANGE cnst
+but could be more complex expressions).
@node Using Constant Regexps
@subsection Using Regular Expression Constants
@@ -10403,7 +10349,7 @@ the third argument of @code{split()} to be a regexp constant, but some
older implementations do not.
@value{DARKCORNER}
Because some built-in functions accept regexp constants as arguments,
-it can be confusing when attempting to use regexp constants as arguments
+confusion can arise when attempting to use regexp constants as arguments
to user-defined functions (@pxref{User-defined}). For example:
@example
@@ -10429,19 +10375,18 @@ function mysub(pat, repl, str, global)
In this example, the programmer wants to pass a regexp constant to the
user-defined function @code{mysub()}, which in turn passes it on to
either @code{sub()} or @code{gsub()}. However, what really happens is that
-the @code{pat} parameter is either one or zero, depending upon whether
+the @code{pat} parameter is assigned a value of either one or zero, depending upon whether
or not @code{$0} matches @code{/hi/}.
@command{gawk} issues a warning when it sees a regexp constant used as
a parameter to a user-defined function, because passing a truth value in
this way is probably not what was intended.
-@c ENDOFRANGE rec
@node Variables
@subsection Variables
@cindex variables, user-defined
@cindex user-defined, variables
-Variables are ways of storing values at one point in your program for
+@dfn{Variables} are ways of storing values at one point in your program for
use later in another part of your program. They can be manipulated
entirely within the program text, and they can also be assigned values
on the @command{awk} command line.
@@ -10469,17 +10414,17 @@ are distinct variables.
A variable name is a valid expression by itself; it represents the
variable's current value. Variables are given new values with
@dfn{assignment operators}, @dfn{increment operators}, and
-@dfn{decrement operators}.
-@xref{Assignment Ops}.
+@dfn{decrement operators}
+(@pxref{Assignment Ops}).
In addition, the @code{sub()} and @code{gsub()} functions can
change a variable's value, and the @code{match()}, @code{split()},
and @code{patsplit()} functions can change the contents of their
-array parameters. @xref{String Functions}.
+array parameters (@pxref{String Functions}).
@cindex variables, built-in
@cindex variables, initializing
A few variables have special built-in meanings, such as @code{FS} (the
-field separator), and @code{NF} (the number of fields in the current input
+field separator) and @code{NF} (the number of fields in the current input
record). @DBXREF{Built-in Variables} for a list of the predefined variables.
These predefined variables can be used and assigned just like all other
variables, but their values are also used or changed automatically by
@@ -10707,7 +10652,7 @@ point, so the default behavior was restored to use a period as the
decimal point character. You can use the @option{--use-lc-numeric}
option (@pxref{Options}) to force @command{gawk} to use the locale's
decimal point character. (@command{gawk} also uses the locale's decimal
-point character when in POSIX mode, either via @option{--posix}, or the
+point character when in POSIX mode, either via @option{--posix} or the
@env{POSIXLY_CORRECT} environment variable, as shown previously.)
@ref{table-locale-affects} describes the cases in which the locale's decimal
@@ -10725,7 +10670,7 @@ features have not been described yet.
@end multitable
@end float
-Finally, modern day formal standards and IEEE standard floating-point
+Finally, modern-day formal standards and the IEEE standard floating-point
representation can have an unusual but important effect on the way
@command{gawk} converts some special string values to numbers. The details
are presented in @ref{POSIX Floating Point Problems}.
@@ -10733,7 +10678,7 @@ are presented in @ref{POSIX Floating Point Problems}.
@node All Operators
@section Operators: Doing Something with Values
-This @value{SECTION} introduces the @dfn{operators} which make use
+This @value{SECTION} introduces the @dfn{operators} that make use
of the values provided by constants and variables.
@menu
@@ -10911,7 +10856,7 @@ print "something meaningful" > file name
@noindent
This produces a syntax error with some versions of Unix
@command{awk}.@footnote{It happens that BWK
-@command{awk}, @command{gawk} and @command{mawk} all ``get it right,''
+@command{awk}, @command{gawk}, and @command{mawk} all ``get it right,''
but you should not rely on this.}
It is necessary to use the following:
@@ -11000,11 +10945,8 @@ you're never quite sure what you'll get.
@node Assignment Ops
@subsection Assignment Expressions
-@c STARTOFRANGE asop
@cindex assignment operators
-@c STARTOFRANGE opas
@cindex operators, assignment
-@c STARTOFRANGE exas
@cindex expressions, assignment
@cindex @code{=} (equals sign), @code{=} operator
@cindex equals sign (@code{=}), @code{=} operator
@@ -11164,7 +11106,7 @@ and
@ifdocbook
@DBREF{Numeric Functions}
@end ifdocbook
-for more information).
+for more information.)
This example illustrates an important fact about assignment
operators: the lefthand expression is only evaluated @emph{once}.
@@ -11200,17 +11142,17 @@ to a number.
@caption{Arithmetic assignment operators}
@multitable @columnfractions .30 .70
@headitem Operator @tab Effect
-@item @var{lvalue} @code{+=} @var{increment} @tab Add @var{increment} to the value of @var{lvalue}
-@item @var{lvalue} @code{-=} @var{decrement} @tab Subtract @var{decrement} from the value of @var{lvalue}
-@item @var{lvalue} @code{*=} @var{coefficient} @tab Multiply the value of @var{lvalue} by @var{coefficient}
-@item @var{lvalue} @code{/=} @var{divisor} @tab Divide the value of @var{lvalue} by @var{divisor}
-@item @var{lvalue} @code{%=} @var{modulus} @tab Set @var{lvalue} to its remainder by @var{modulus}
+@item @var{lvalue} @code{+=} @var{increment} @tab Add @var{increment} to the value of @var{lvalue}.
+@item @var{lvalue} @code{-=} @var{decrement} @tab Subtract @var{decrement} from the value of @var{lvalue}.
+@item @var{lvalue} @code{*=} @var{coefficient} @tab Multiply the value of @var{lvalue} by @var{coefficient}.
+@item @var{lvalue} @code{/=} @var{divisor} @tab Divide the value of @var{lvalue} by @var{divisor}.
+@item @var{lvalue} @code{%=} @var{modulus} @tab Set @var{lvalue} to its remainder by @var{modulus}.
@cindex common extensions, @code{**=} operator
@cindex extensions, common@comma{} @code{**=} operator
@cindex @command{awk} language, POSIX version
@cindex POSIX @command{awk}
-@item @var{lvalue} @code{^=} @var{power} @tab
-@item @var{lvalue} @code{**=} @var{power} @tab Raise @var{lvalue} to the power @var{power} @value{COMMONEXT}
+@item @var{lvalue} @code{^=} @var{power} @tab Raise @var{lvalue} to the power @var{power}.
+@item @var{lvalue} @code{**=} @var{power} @tab Raise @var{lvalue} to the power @var{power}. @value{COMMONEXT}
@end multitable
@end float
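As a rough sketch of the table above, the following shell snippet applies each arithmetic assignment operator in turn to a single variable. The variable name @code{x}, the starting value, and the shell variable @code{out} are illustrative assumptions, not part of the original text; the common-extension @code{**=} form is left out so that the program stays portable.

```shell
# Apply each arithmetic assignment operator to one variable in turn.
# The name x and the starting value 10 are illustrative only.
out=$(awk 'BEGIN {
    x = 10
    x += 5      # x is now 15
    x -= 3      # x is now 12
    x *= 2      # x is now 24
    x /= 4      # x is now 6
    x %= 4      # x is now 2
    x ^= 3      # x is now 8
    print x
}')
printf '%s\n' "$out"
```

Each statement changes @code{x} in place, so the final @code{print} shows the result of the whole chain.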
@@ -11258,16 +11200,11 @@ awk '/[=]=/' /dev/null
@command{gawk} does not have this problem; BWK @command{awk}
and @command{mawk} also do not.
@end sidebar
-@c ENDOFRANGE exas
-@c ENDOFRANGE opas
-@c ENDOFRANGE asop
@node Increment Ops
@subsection Increment and Decrement Operators
-@c STARTOFRANGE inop
@cindex increment operators
-@c STARTOFRANGE opde
@cindex operators, decrement/increment
@dfn{Increment} and @dfn{decrement operators} increase or decrease the value of
a variable by one. An assignment operator can do the same thing, so
@@ -11315,7 +11252,6 @@ just like variables. (Use @samp{$(i++)} when you want to do a field reference
and a variable increment at the same time. The parentheses are necessary
because of the precedence of the field reference operator @samp{$}.)
-@c STARTOFRANGE deop
@cindex decrement operators
The decrement operator @samp{--} works just like @samp{++}, except that
it subtracts one instead of adding it. As with @samp{++}, it can be used before
@@ -11355,8 +11291,8 @@ like @samp{@var{lvalue}++}, but instead of adding, it subtracts.)
@cindex evaluation order
@cindex Marx, Groucho
@quotation
-@i{Doctor, doctor! It hurts when I do this!@*
-So don't do that!}
+@i{Doctor, it hurts when I do this!@*
+Then don't do that!}
@author Groucho Marx
@end quotation
@@ -11380,7 +11316,7 @@ print b
@cindex side effects
In other words, when do the various side effects prescribed by the
postfix operators (@samp{b++}) take effect?
-When side effects happen is @dfn{implementation defined}.
+When side effects happen is @dfn{implementation-defined}.
In other words, it is up to the particular version of @command{awk}.
The result for the first example may be 12 or 13, and for the second, it
may be 22 or 23.
@@ -11391,15 +11327,12 @@ You should avoid such things in your own programs.
@c You'll sleep better at night and be able to look at yourself
@c in the mirror in the morning.
@end sidebar
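One way to stay clear of the problem described in the sidebar is to give each increment its own statement, so that there is only one side effect per expression. The following is a minimal sketch under that assumption; the variable names are hypothetical.

```shell
# One side effect per statement: the results do not depend on
# the order in which a particular awk evaluates subexpressions.
out=$(awk 'BEGIN {
    x = 5
    y = x++     # y gets the old value 5, then x becomes 6
    z = ++x     # x becomes 7 first, then z gets 7
    print x, y, z
}')
printf '%s\n' "$out"
```

Written this way, the postfix form yields the old value and the prefix form yields the new one, with no ambiguity about when the variable changes.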
-@c ENDOFRANGE inop
-@c ENDOFRANGE opde
-@c ENDOFRANGE deop
@node Truth Values and Conditions
@section Truth Values and Conditions
-In certain contexts, expression values also serve as ``truth values''; (i.e.,
-they determine what should happen next as the program runs). This
+In certain contexts, expression values also serve as ``truth values''; i.e.,
+they determine what should happen next as the program runs. This
@value{SECTION} describes how @command{awk} defines ``true'' and ``false''
and how values are compared.
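The rule can be tried out with a small sketch such as the following, which feeds four constants through the conditional operator. The choice of constants is an illustrative assumption.

```shell
# Print "true" or "false" for a few constant values.
out=$(awk 'BEGIN {
    print (0   ? "true" : "false")    # numeric zero
    print (""  ? "true" : "false")    # null string
    print ("0" ? "true" : "false")    # non-null string constant
    print (3   ? "true" : "false")    # nonzero number
}')
printf '%s\n' "$out"
```

The third line is the one that tends to surprise people: the string constant @code{"0"} is not null, so it counts as true.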
@@ -11458,19 +11391,15 @@ the string constant @code{"0"} is actually true, because it is non-null.
@author Douglas Adams, @cite{The Hitchhiker's Guide to the Galaxy}
@end quotation
-@c STARTOFRANGE comex
@cindex comparison expressions
-@c STARTOFRANGE excom
@cindex expressions, comparison
@cindex expressions, matching, See comparison expressions
@cindex matching, expressions, See comparison expressions
@cindex relational operators, See comparison operators
@cindex operators, relational, See operators@comma{} comparison
-@c STARTOFRANGE varting
@cindex variable typing
-@c STARTOFRANGE vartypc
@cindex variables, types of, comparison expressions and
-Unlike other programming languages, @command{awk} variables do not have a
+Unlike in other programming languages, in @command{awk} variables do not have a
fixed type. Instead, they can be either a number or a string, depending
upon the value that is assigned to them.
We look now at how variables are typed, and how @command{awk}
@@ -11499,20 +11428,20 @@ Variable typing follows these rules:
@itemize @value{BULLET}
@item
-A numeric constant or the result of a numeric operation has the @var{numeric}
+A numeric constant or the result of a numeric operation has the @dfn{numeric}
attribute.
@item
-A string constant or the result of a string operation has the @var{string}
+A string constant or the result of a string operation has the @dfn{string}
attribute.
@item
Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements,
@code{ENVIRON} elements, and the elements of an array created by
@code{match()}, @code{split()}, and @code{patsplit()} that are numeric
-strings have the @var{strnum} attribute. Otherwise, they have
-the @var{string} attribute. Uninitialized variables also have the
-@var{strnum} attribute.
+strings have the @dfn{strnum} attribute. Otherwise, they have
+the @dfn{string} attribute. Uninitialized variables also have the
+@dfn{strnum} attribute.
@item
Attributes propagate across assignments but are not changed by
@@ -11656,13 +11585,13 @@ constant, then a string comparison is performed. Otherwise, a
numeric comparison is performed.
This point bears additional emphasis: All user input is made of characters,
-and so is first and foremost of @var{string} type; input strings
-that look numeric are additionally given the @var{strnum} attribute.
+and so is first and foremost of string type; input strings
+that look numeric are additionally given the strnum attribute.
Thus, the six-character input string @w{@samp{ +3.14}} receives the
-@var{strnum} attribute. In contrast, the eight characters
+strnum attribute. In contrast, the eight characters
@w{@code{" +3.14"}} appearing in program text comprise a string constant.
The following examples print @samp{1} when the comparison between
-the two different constants is true, @samp{0} otherwise:
+the two different constants is true, and @samp{0} otherwise:
@c 22.9.2014: Tested with mawk and BWK awk, got same results.
@example
@@ -11792,7 +11721,7 @@ $ @kbd{echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}'}
@noindent
the result is @samp{false} because both @code{$1} and @code{$2}
are user input. They are numeric strings---therefore both have
-the @var{strnum} attribute, dictating a numeric comparison.
+the strnum attribute, dictating a numeric comparison.
The purpose of the comparison rules and the use of numeric strings is
to attempt to produce the behavior that is ``least surprising,'' while
still ``doing the right thing.''
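The difference between the two kinds of comparison can be sketched as follows. The input values and the shell variable names are assumptions chosen for illustration; concatenating a field with the null string is used here only as a simple way to obtain a value with the string attribute.

```shell
# Both fields come from input and look numeric, so they are compared as numbers.
numeric=$(echo "10 9" | awk '{ print (($1 < $2) ? "true" : "false") }')

# Concatenation is a string operation, so these operands are compared as strings.
string=$(echo "10 9" | awk '{ print ((($1 "") < ($2 "")) ? "true" : "false") }')

printf '%s %s\n' "$numeric" "$string"
```

As numbers, 10 is not less than 9; as strings, @code{"10"} sorts before @code{"9"} because the comparison proceeds character by character.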
@@ -11851,7 +11780,7 @@ characters sort, as defined by the locale (for more discussion,
@pxref{Locales}). This order is usually very different
from the results obtained when doing straight character-by-character
comparison.@footnote{Technically, string comparison is supposed
-to behave the same way as if the strings are compared with the C
+to behave the same way as if the strings were compared with the C
@code{strcoll()} function.}
Because this behavior differs considerably from existing practice,
@@ -11868,19 +11797,13 @@ $ @kbd{gawk --posix 'BEGIN @{ printf("ABC < abc = %s\n",}
@print{} ABC < abc = FALSE
@end example
-@c ENDOFRANGE comex
-@c ENDOFRANGE excom
-@c ENDOFRANGE vartypc
-@c ENDOFRANGE varting
@node Boolean Ops
@subsection Boolean Expressions
@cindex and Boolean-logic operator
@cindex or Boolean-logic operator
@cindex not Boolean-logic operator
-@c STARTOFRANGE exbo
@cindex expressions, Boolean
-@c STARTOFRANGE boex
@cindex Boolean expressions
@cindex operators, Boolean, See Boolean expressions
@cindex Boolean operators, See Boolean expressions
@@ -11964,7 +11887,7 @@ BEGIN @{ if (! ("HOME" in ENVIRON))
@cindex vertical bar (@code{|}), @code{||} operator
The @samp{&&} and @samp{||} operators are called @dfn{short-circuit}
operators because of the way they work. Evaluation of the full expression
-is ``short-circuited'' if the result can be determined part way through
+is ``short-circuited'' if the result can be determined partway through
its evaluation.
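A short sketch makes the short-circuit behavior visible by counting how often the righthand operand is actually evaluated. The function @code{bump()} and the variable @code{calls} are hypothetical names introduced for this illustration.

```shell
# bump() records each call, so skipped evaluations can be detected.
out=$(awk '
function bump() { calls++; return 1 }
BEGIN {
    r1 = (0 && bump())   # left side false: bump() is skipped
    r2 = (1 || bump())   # left side true: bump() is skipped
    r3 = (1 && bump())   # left side true: bump() must run
    print r1, r2, r3, calls
}')
printf '%s\n' "$out"
```

Only the third expression needs its righthand side, so @code{calls} ends up as one.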
@cindex line continuations
@@ -12026,8 +11949,6 @@ next record, and start processing the rules over again at the top.
The reason it's there is to avoid printing the bracketing
@samp{START} and @samp{END} lines.
@end quotation
-@c ENDOFRANGE exbo
-@c ENDOFRANGE boex
@node Conditional Exp
@subsection Conditional Expressions
@@ -12038,8 +11959,8 @@ The reason it's there is to avoid printing the bracketing
A @dfn{conditional expression} is a special kind of expression that has
three operands. It allows you to use one expression's value to select
one of two other expressions.
-The conditional expression is the same as in the C language,
-as shown here:
+The conditional expression in @command{awk} is the same as in the C
+language, as shown here:
@example
@var{selector} ? @var{if-true-exp} : @var{if-false-exp}
@@ -12048,8 +11969,8 @@ as shown here:
@noindent
There are three subexpressions. The first, @var{selector}, is always
computed first. If it is ``true'' (not zero or not null), then
-@var{if-true-exp} is computed next and its value becomes the value of
-the whole expression. Otherwise, @var{if-false-exp} is computed next
+@var{if-true-exp} is computed next, and its value becomes the value of
+the whole expression. Otherwise, @var{if-false-exp} is computed next,
and its value becomes the value of the whole expression.
For example, the following expression produces the absolute value of @code{x}:
@@ -12097,7 +12018,7 @@ ask for it by name at any point in the program. For
example, the function @code{sqrt()} computes the square root of a number.
@cindex functions, built-in
-A fixed set of functions are @dfn{built-in}, which means they are
+A fixed set of functions are @dfn{built in}, which means they are
available in every @command{awk} program. The @code{sqrt()} function is one
of these. @DBXREF{Built-in} for a list of built-in
functions and their descriptions. In addition, you can define
@@ -12206,9 +12127,7 @@ $ @kbd{awk -f matchit.awk}
@node Precedence
@section Operator Precedence (How Operators Nest)
-@c STARTOFRANGE prec
@cindex precedence
-@c STARTOFRANGE oppr
@cindex operators, precedence
@dfn{Operator precedence} determines how operators are grouped when
@@ -12273,7 +12192,7 @@ Increment, decrement.
@cindex @code{*} (asterisk), @code{**} operator
@cindex asterisk (@code{*}), @code{**} operator
@item @code{^ **}
-Exponentiation. These operators group right-to-left.
+Exponentiation. These operators group right to left.
@cindex @code{+} (plus sign), @code{+} operator
@cindex plus sign (@code{+}), @code{+} operator
@@ -12339,7 +12258,7 @@ statements belong to the statement level, not to expressions. The
redirection does not produce an expression that could be the operand of
another operator. As a result, it does not make sense to use a
redirection operator near another operator of lower precedence without
-parentheses. Such combinations (e.g., @samp{print foo > a ? b : c}),
+parentheses. Such combinations (e.g., @samp{print foo > a ? b : c})
result in syntax errors.
The correct way to write this statement is @samp{print foo > (a ? b : c)}.
@@ -12357,17 +12276,17 @@ Array membership.
@cindex @code{&} (ampersand), @code{&&} operator
@cindex ampersand (@code{&}), @code{&&} operator
@item @code{&&}
-Logical ``and''.
+Logical ``and.''
@cindex @code{|} (vertical bar), @code{||} operator
@cindex vertical bar (@code{|}), @code{||} operator
@item @code{||}
-Logical ``or''.
+Logical ``or.''
@cindex @code{?} (question mark), @code{?:} operator
@cindex question mark (@code{?}), @code{?:} operator
@item @code{?:}
-Conditional. This operator groups right-to-left.
+Conditional. This operator groups right to left.
@cindex @code{+} (plus sign), @code{+=} operator
@cindex plus sign (@code{+}), @code{+=} operator
@@ -12384,7 +12303,7 @@ Conditional. This operator groups right-to-left.
@cindex @code{^} (caret), @code{^=} operator
@cindex caret (@code{^}), @code{^=} operator
@item @code{= += -= *= /= %= ^= **=}
-Assignment. These operators group right-to-left.
+Assignment. These operators group right to left.
@end table
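The right-to-left grouping noted for exponentiation, the conditional operator, and assignment can be checked with a small sketch like this one. The constants are illustrative assumptions, chosen so that left-to-right grouping would give different answers for the first and last lines.

```shell
# Each expression would yield a different result if it grouped left to right,
# except the assignment, which could not be parsed that way at all.
out=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2                    # groups as 2 ^ (3 ^ 2)
    a = b = 5                          # groups as a = (b = 5)
    print a, b
    print (1 ? "x" : 0 ? "y" : "z")    # groups as 1 ? "x" : (0 ? "y" : "z")
}')
printf '%s\n' "$out"
```

Grouped the other way, the first line would print 64 and the last would print @samp{y}.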
@cindex POSIX @command{awk}, @code{**} operator and
@@ -12393,8 +12312,6 @@ Assignment. These operators group right-to-left.
The @samp{|&}, @samp{**}, and @samp{**=} operators are not specified by POSIX.
For maximum portability, do not use them.
@end quotation
-@c ENDOFRANGE prec
-@c ENDOFRANGE oppr
@node Locales
@section Where You Are Makes a Difference
@@ -12460,8 +12377,8 @@ Locales can influence the conversions.
@item
@command{awk} provides the usual arithmetic operators (addition,
subtraction, multiplication, division, modulus), and unary plus and minus.
-It also provides comparison operators, boolean operators, array membership
-testing, and regexp
+It also provides comparison operators, Boolean operators, an array membership
+testing operator, and regexp
matching operators. String concatenation is accomplished by placing
two expressions next to each other; there is no explicit operator.
The three-operand @samp{?:} operator provides an ``if-else'' test within
@@ -12472,7 +12389,7 @@ Assignment operators provide convenient shorthands for common arithmetic
operations.
@item
-In @command{awk}, a value is considered to be true if it is non-zero
+In @command{awk}, a value is considered to be true if it is nonzero
@emph{or} non-null. Otherwise, the value is false.
@item
@@ -12481,7 +12398,7 @@ lifetime. The type determines how it behaves in comparisons (string
or numeric).
@item
-Function calls return a value which may be used as part of a larger
+Function calls return a value that may be used as part of a larger
expression. Expressions used to pass parameter values are fully
evaluated before the function is called. @command{awk} provides
built-in and user-defined functions; this is described in
@@ -12498,11 +12415,9 @@ program, and occasionally the format for data read as input.
@end itemize
-@c ENDOFRANGE exps
@node Patterns and Actions
@chapter Patterns, Actions, and Variables
-@c STARTOFRANGE pat
@cindex patterns
As you have already seen, each @command{awk} statement consists of
@@ -12510,7 +12425,7 @@ a pattern with an associated action. This @value{CHAPTER} describes how
you build patterns and actions, what kinds of things you can do within
actions, and @command{awk}'s predefined variables.
-The pattern-action rules and the statements available for use
+The pattern--action rules and the statements available for use
within actions form the core of @command{awk} programming.
In a sense, everything covered
up to here has been the foundation
@@ -12701,7 +12616,7 @@ patterns. Likewise, the special patterns @code{BEGIN}, @code{END},
which never match any input record, are not expressions and cannot
appear inside Boolean patterns.
-The precedence of the different operators which can appear in
+The precedence of the different operators that can appear in
patterns is described in @ref{Precedence}.
@node Ranges
@@ -12727,7 +12642,7 @@ prints every record in @file{myfile} between @samp{on}/@samp{off} pairs, inclusi
A range pattern starts out by matching @var{begpat} against every
input record. When a record matches @var{begpat}, the range pattern is
-@dfn{turned on} and the range pattern matches this record as well. As long as
+@dfn{turned on}, and the range pattern matches this record as well. As long as
the range pattern stays turned on, it automatically matches every input
record read. The range pattern also matches @var{endpat} against every
input record; when this succeeds, the range pattern is @dfn{turned off} again
@@ -12798,9 +12713,7 @@ a range pattern. @value{DARKCORNER}
@node BEGIN/END
@subsection The @code{BEGIN} and @code{END} Special Patterns
-@c STARTOFRANGE beg
@cindex @code{BEGIN} pattern
-@c STARTOFRANGE end
@cindex @code{END} pattern
All the patterns described so far are for matching input records.
The @code{BEGIN} and @code{END} special patterns are different.
@@ -12873,7 +12786,7 @@ using library functions.
for a number of useful library functions.
If an @command{awk} program has only @code{BEGIN} rules and no
-other rules, then the program exits after the @code{BEGIN} rule is
+other rules, then the program exits after the @code{BEGIN} rules are
run.@footnote{The original version of @command{awk} kept
reading and ignoring input until the end of the file was seen.} However, if an
@code{END} rule exists, then the input is read, even if there are
@@ -12901,7 +12814,7 @@ Another way is simply to assign a value to @code{$0}.
@cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and
@cindex @code{BEGIN} pattern, @code{print} statement and
@cindex @code{END} pattern, @code{print} statement and
-The second point is similar to the first but from the other direction.
+The second point is similar to the first, but from the other direction.
Traditionally, due largely to implementation issues, @code{$0} and
@code{NF} were @emph{undefined} inside an @code{END} rule.
The POSIX standard specifies that @code{NF} is available in an @code{END}
@@ -12938,8 +12851,6 @@ are not valid in an @code{END} rule, because all the input has been read.
@ifdocbook
@DBREF{Nextfile Statement}.)
@end ifdocbook
-@c ENDOFRANGE beg
-@c ENDOFRANGE end
@node BEGINFILE/ENDFILE
@subsection The @code{BEGINFILE} and @code{ENDFILE} Special Patterns
@@ -12992,7 +12903,7 @@ fatal error.
@item
If you have written extensions that modify the record handling (by
-inserting an ``input parser,'' @pxref{Input Parsers}), you can invoke
+inserting an ``input parser''; @pxref{Input Parsers}), you can invoke
them at this point, before @command{gawk} has started processing the file.
(This is a @emph{very} advanced feature, currently used only by the
@uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.)
@@ -13003,8 +12914,8 @@ the last record in an input file. For the last input file,
it will be called before any @code{END} rules.
The @code{ENDFILE} rule is executed even for empty input files.
-Normally, when an error occurs when reading input in the normal input
-processing loop, the error is fatal. However, if an @code{ENDFILE}
+Normally, when an error occurs when reading input in the normal
+input-processing loop, the error is fatal. However, if an @code{ENDFILE}
rule is present, the error becomes non-fatal, and instead @code{ERRNO}
is set. This makes it possible to catch and process I/O errors at the
level of the @command{awk} program.
@@ -13013,7 +12924,7 @@ level of the @command{awk} program.
The @code{next} statement (@pxref{Next Statement}) is not allowed inside
either a @code{BEGINFILE} or an @code{ENDFILE} rule. The @code{nextfile}
statement is allowed only inside a
-@code{BEGINFILE} rule, but not inside an @code{ENDFILE} rule.
+@code{BEGINFILE} rule, not inside an @code{ENDFILE} rule.
@cindex @code{getline} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and
The @code{getline} statement (@pxref{Getline}) is restricted inside
@@ -13060,7 +12971,6 @@ awk '@{ print $1 @}' mail-list
@noindent
prints the first field of every record.
-@c ENDOFRANGE pat
@node Using Shell Variables
@section Using Shell Variables in Programs
@@ -13090,11 +13000,11 @@ awk "/$pattern/ "'@{ nmatches++ @}
@noindent
The @command{awk} program consists of two pieces of quoted text
that are concatenated together to form the program.
-The first part is double quoted, which allows substitution of
+The first part is double-quoted, which allows substitution of
the @code{pattern} shell variable inside the quotes.
-The second part is single quoted.
+The second part is single-quoted.
-Variable substitution via quoting works, but can be potentially
+Variable substitution via quoting works, but can potentially be
messy. It requires a good understanding of the shell's quoting rules
(@pxref{Quoting}),
and it's often difficult to correctly
@@ -13209,11 +13119,8 @@ For deleting array elements.
@node Statements
@section Control Statements in Actions
-@c STARTOFRANGE csta
@cindex control statements
-@c STARTOFRANGE acs
@cindex statements, control, in actions
-@c STARTOFRANGE accs
@cindex actions, control statements in
@dfn{Control statements}, such as @code{if}, @code{while}, and so on,
@@ -13356,13 +13263,13 @@ The body of this loop is a compound statement enclosed in braces,
containing two statements.
The loop works in the following manner: first, the value of @code{i} is set to one.
Then, the @code{while} statement tests whether @code{i} is less than or equal to
-three. This is true when @code{i} equals one, so the @code{i}-th
+three. This is true when @code{i} equals one, so the @code{i}th
field is printed. Then the @samp{i++} increments the value of @code{i}
and the loop repeats. The loop terminates when @code{i} reaches four.
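The walk-through above can be reproduced with a sketch along these lines. The sample input is an assumption; it has four fields so that the loop visibly stops after the third.

```shell
# Print the first three fields of each record, one per line.
out=$(echo "alpha beta gamma delta" | awk '{
    i = 1
    while (i <= 3) {
        print $i
        i++
    }
}')
printf '%s\n' "$out"
```

The fourth field is never printed, because the condition is false once @code{i} reaches four.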
A newline is not required between the condition and the
body; however, using one makes the program clearer unless the body is a
-compound statement or else is very simple. The newline after the open-brace
+compound statement or else is very simple. The newline after the open brace
that begins the compound statement is not required either, but the
program is harder to read without it.
@@ -13392,9 +13299,9 @@ while (@var{condition})
@end example
@noindent
-This statement does not execute @var{body} even once if the @var{condition}
-is false to begin with.
-The following is an example of a @code{do} statement:
+This statement does not execute the @var{body} even once if the
+@var{condition} is false to begin with. The following is an example of
+a @code{do} statement:
@example
@{
@@ -13461,7 +13368,7 @@ their assignments as separate statements preceding the @code{for} loop.)
The same is true of the @var{increment} part. Incrementing additional
variables requires separate statements at the end of the loop.
The C compound expression, using C's comma operator, is useful in
-this context but it is not supported in @command{awk}.
+this context, but it is not supported in @command{awk}.
Most often, @var{increment} is an increment expression, as in the previous
example. But this is not required; it can be any expression
@@ -13552,7 +13459,7 @@ default:
Control flow in
the @code{switch} statement works as it does in C. Once a match to a given
case is made, the case statement bodies execute until a @code{break},
-@code{continue}, @code{next}, @code{nextfile} or @code{exit} is encountered,
+@code{continue}, @code{next}, @code{nextfile}, or @code{exit} is encountered,
or the end of the @code{switch} statement itself. For example:
@example
@@ -13726,7 +13633,12 @@ body of a loop. Historical versions of @command{awk} treated a @code{continue}
statement outside a loop the same way they treated a @code{break}
statement outside a loop: as if it were a @code{next}
statement
+@ifset FOR_PRINT
+(discussed in the following section).
+@end ifset
+@ifclear FOR_PRINT
(@pxref{Next Statement}).
+@end ifclear
@value{DARKCORNER}
Recent versions of BWK @command{awk} no longer work this way, nor
does @command{gawk}.
@@ -13854,7 +13766,7 @@ See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}.
@cindex @code{nextfile} statement, user-defined functions and
@cindex Brian Kernighan's @command{awk}
@cindex @command{mawk} utility
-The current version of BWK @command{awk}, and @command{mawk}
+The current version of BWK @command{awk} and @command{mawk}
also support @code{nextfile}. However, they don't allow the
@code{nextfile} statement inside function bodies (@pxref{User-defined}).
@command{gawk} does; a @code{nextfile} inside a function body reads the
@@ -13892,7 +13804,7 @@ any @code{ENDFILE} rules; they do not execute.
In such a case,
if you don't want the @code{END} rule to do its job, set a variable
-to nonzero before the @code{exit} statement and check that variable in
+to a nonzero value before the @code{exit} statement and check that variable in
the @code{END} rule.
@DBXREF{Assert Function}
for an example that does this.
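The technique might look roughly like the following sketch. The shell function @code{check}, the variable @code{failed}, and the sample records are all hypothetical; a plain @code{exit} with no status is used to keep the illustration simple.

```shell
# The END rule still runs after exit, so a flag tells it what happened.
check() {
    awk '
    $1 == "bad" { failed = 1; exit }   # set the flag, then leave the main loop
    { count++ }
    END {
        if (failed)
            print "stopped early"
        else
            print count " records"
    }'
}
early=$(printf 'ok\nbad\nok\n' | check)
full=$(printf 'ok\nok\n' | check)
printf '%s\n%s\n' "$early" "$full"
```

The first run hits the @samp{bad} record and skips the normal summary; the second reads all of its input and reports the record count.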
@@ -13931,15 +13843,10 @@ Negative values, and values of 127 or greater, may not produce consistent
results across different operating systems.
@end quotation
-@c ENDOFRANGE csta
-@c ENDOFRANGE acs
-@c ENDOFRANGE accs
@node Built-in Variables
@section Predefined Variables
-@c STARTOFRANGE bvar
@cindex predefined variables
-@c STARTOFRANGE varb
@cindex variables, predefined
Most @command{awk} variables are available to use for your own
@@ -13965,10 +13872,8 @@ their areas of activity.
@end menu
@node User-modified
-@subsection Built-In Variables That Control @command{awk}
-@c STARTOFRANGE bvaru
+@subsection Built-in Variables That Control @command{awk}
@cindex predefined variables, user-modifiable
-@c STARTOFRANGE nmbv
@cindex user-modifiable variables
The following is an alphabetical list of variables that you can change to
@@ -13996,7 +13901,7 @@ respectively, should use binary I/O. A string value of @code{"rw"} or
@code{"wr"} indicates that all files should use binary I/O. Any other
string value is treated the same as @code{"rw"}, but causes @command{gawk}
to generate a warning message. @code{BINMODE} is described in more
-detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}),
+detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions})
also supports this variable, but only using numeric values.
@cindex @code{CONVFMT} variable
@@ -14004,7 +13909,7 @@ also supports this variable, but only using numeric values.
@cindex numbers, converting, to strings
@cindex strings, converting, numbers to
@item @code{CONVFMT}
-This string controls conversion of numbers to
+A string that controls the conversion of numbers to
strings (@pxref{Conversion}).
It works by being passed, in effect, as the first argument to the
@code{sprintf()} function
@@ -14079,7 +13984,7 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment.
@cindex regular expressions, case sensitivity
@item IGNORECASE #
If @code{IGNORECASE} is nonzero or non-null, then all string comparisons
-and all regular expression matching are case independent. Thus, regexp
+and all regular expression matching are case-independent. Thus, regexp
matching with @samp{~} and @samp{!~}, as well as the @code{gensub()},
@code{gsub()}, @code{index()}, @code{match()}, @code{patsplit()},
@code{split()}, and @code{sub()}
@@ -14105,7 +14010,7 @@ Any other true value prints nonfatal warnings.
Assigning a false value to @code{LINT} turns off the lint warnings.
This variable is a @command{gawk} extension. It is not special
-in other @command{awk} implementations. Unlike the other special variables,
+in other @command{awk} implementations. Unlike with the other special variables,
changing @code{LINT} does affect the production of lint warnings,
even if @command{gawk} is in compatibility mode. Much as
the @option{--lint} and @option{--traditional} options independently
@@ -14117,7 +14022,7 @@ of @command{awk} being executed.
@cindex numbers, converting, to strings
@cindex strings, converting, numbers to
@item OFMT
-Controls conversion of numbers to
+A string that controls conversion of numbers to
strings (@pxref{Conversion}) for
printing with the @code{print} statement. It works by being passed
as the first argument to the @code{sprintf()} function
@@ -14132,7 +14037,7 @@ strings in general expressions; this is now done by @code{CONVFMT}.
@cindex separators, field
@cindex field separators
@item OFS
-This is the output field separator (@pxref{Output Separators}). It is
+The output field separator (@pxref{Output Separators}). It is
output between the fields printed by a @code{print} statement. Its
default value is @w{@code{" "}}, a string consisting of a single space.
@@ -14150,7 +14055,7 @@ The working precision of arbitrary-precision floating-point numbers,
@cindex @code{ROUNDMODE} variable
@item ROUNDMODE #
The rounding mode to use for arbitrary-precision arithmetic on
-numbers, by default @code{"N"} (@samp{roundTiesToEven} in
+numbers, by default @code{"N"} (@code{roundTiesToEven} in
the IEEE 754 standard; @pxref{Setting the rounding mode}).
@cindex @code{RS} variable
@@ -14179,7 +14084,7 @@ just the first character of @code{RS}'s value is used.
@item @code{SUBSEP}
The subscript separator. It has the default value of
@code{"\034"} and is used to separate the parts of the indices of a
-multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}}
+multidimensional array. Thus, the expression @samp{@w{foo["A", "B"]}}
really accesses @code{foo["A\034B"]}
(@pxref{Multidimensional}).
@@ -14195,17 +14100,11 @@ marked string constants in the source text, as well as for the
(@pxref{Internationalization}).
The default value of @code{TEXTDOMAIN} is @code{"messages"}.
@end table
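Three of the variables from the list above, @code{OFS}, @code{CONVFMT}, and @code{SUBSEP}, can be seen at work in a sketch like the following. The values assigned and the names @code{x}, @code{s}, @code{foo}, and @code{found} are illustrative assumptions.

```shell
# Change OFS and CONVFMT, then look up a two-subscript element by its joined key.
out=$(awk 'BEGIN {
    OFS = "-"
    print "a", "b"                        # fields are joined with OFS
    CONVFMT = "%.2f"
    x = 3.14159
    s = x ""                              # number-to-string conversion uses CONVFMT
    print s
    foo["A", "B"] = 1
    found = (("A" SUBSEP "B") in foo)     # same element as foo["A", "B"]
    print found
}')
printf '%s\n' "$out"
```

The last line prints 1 because the two subscripts are stored as a single string with @code{SUBSEP} between them.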
-@c ENDOFRANGE bvar
-@c ENDOFRANGE varb
-@c ENDOFRANGE bvaru
-@c ENDOFRANGE nmbv
@node Auto-set
-@subsection Built-In Variables That Convey Information
+@subsection Built-in Variables That Convey Information
-@c STARTOFRANGE bvconi
@cindex predefined variables, conveying information
-@c STARTOFRANGE vbconi
@cindex variables, predefined conveying information
The following is an alphabetical list of variables that @command{awk}
sets automatically on certain occasions in order to provide
@@ -14361,12 +14260,12 @@ input file.
@item @code{NF}
The number of fields in the current input record.
@code{NF} is set each time a new record is read, when a new field is
-created or when @code{$0} changes (@pxref{Fields}).
+created, or when @code{$0} changes (@pxref{Fields}).
Unlike most of the variables described in this @value{SUBSECTION},
assigning a value to @code{NF} has the potential to affect
@command{awk}'s internal workings. In particular, assignments
-to @code{NF} can be used to create or remove fields from the
+to @code{NF} can be used to create fields in or remove fields from the
current record. @xref{Changing Fields}.
@cindex @code{FUNCTAB} array
@@ -14416,7 +14315,7 @@ or @code{"FPAT"} if field matching with @code{FPAT} is in effect.
@item PROCINFO["identifiers"]
@cindex program identifiers
A subarray, indexed by the names of all identifiers used in the text of
-the AWK program. An @dfn{identifier} is simply the name of a variable
+the @command{awk} program. An @dfn{identifier} is simply the name of a variable
(be it scalar or array), built-in function, user-defined function, or
extension function. For each identifier, the value of the element is
one of the following:
@@ -14436,7 +14335,7 @@ The identifier is an extension function loaded via
The identifier is a scalar.
@item "untyped"
-The identifier is untyped (could be used as a scalar or array,
+The identifier is untyped (could be used as a scalar or an array;
@command{gawk} doesn't know yet).
@item "user"
@@ -14557,7 +14456,7 @@ is the length of the matched string, or @minus{}1 if no match is found.
@cindex @code{RSTART} variable
@item @code{RSTART}
-The start-index in characters of the substring that is matched by the
+The start index in characters of the substring that is matched by the
@code{match()} function
(@pxref{String Functions}).
@code{RSTART} is set by invoking the @code{match()} function. Its value
@@ -14624,11 +14523,9 @@ function multiply(variable, amount)
@quotation NOTE
In order to avoid severe time-travel paradoxes,@footnote{Not to mention difficult
implementation issues.} neither @code{FUNCTAB} nor @code{SYMTAB}
-are available as elements within the @code{SYMTAB} array.
+is available as an element within the @code{SYMTAB} array.
@end quotation
@end table
-@c ENDOFRANGE bvconi
-@c ENDOFRANGE vbconi
@sidebar Changing @code{NR} and @code{FNR}
@cindex @code{NR} variable, changing
@@ -14800,7 +14697,7 @@ When designing your program, you should choose options that don't
conflict with @command{gawk}'s, because it will process any options
that it accepts before passing the rest of the command line on to
your program. Using @samp{#!} with the @option{-E} option may help
-(@DBXREF{Executable Scripts}
+(@DBPXREF{Executable Scripts}
and
@ifnotdocbook
@DBPXREF{Options}).
@@ -14814,15 +14711,15 @@ and
@itemize @value{BULLET}
@item
-Pattern-action pairs make up the basic elements of an @command{awk}
+Pattern--action pairs make up the basic elements of an @command{awk}
program. Patterns are either normal expressions, range expressions,
-regexp constants, one of the special keywords @code{BEGIN}, @code{END},
-@code{BEGINFILE}, @code{ENDFILE}, or empty. The action executes if
+or regexp constants; one of the special keywords @code{BEGIN}, @code{END},
+@code{BEGINFILE}, or @code{ENDFILE}; or empty. The action executes if
the current record matches the pattern. Empty (missing) patterns match
all records.
@item
-I/O from @code{BEGIN} and @code{END} rules have certain constraints.
+I/O from @code{BEGIN} and @code{END} rules has certain constraints.
This is also true, only more so, for @code{BEGINFILE} and @code{ENDFILE}
rules. The latter two give you ``hooks'' into @command{gawk}'s file
processing, allowing you to recover from a file that otherwise would
@@ -14852,12 +14749,12 @@ iteration of a loop (or get out of a @code{switch}).
@item
@code{next} and @code{nextfile} let you read the next record and start
-over at the top of your program, or skip to the next input file and
+over at the top of your program or skip to the next input file and
start over, respectively.
@item
The @code{exit} statement terminates your program. When executed
-from an action (or function body) it transfers control to the
+from an action (or function body), it transfers control to the
@code{END} statements. From an @code{END} statement body, it exits
immediately. You may pass an optional numeric value to be used
as @command{awk}'s exit status.
@@ -14875,7 +14772,6 @@ control how @command{awk} will process the provided @value{DF}s.
@node Arrays
@chapter Arrays in @command{awk}
-@c STARTOFRANGE arrs
@cindex arrays
An @dfn{array} is a table of values called @dfn{elements}. The
@@ -14961,15 +14857,17 @@ the declaration.
indices---e.g., @samp{15 .. 27}---but the size of the array is still fixed when
the array is declared.)
-A contiguous array of four elements might look like the following example,
-conceptually, if the element values are 8, @code{"foo"},
-@code{""}, and 30
+@c 1/2015: Do not put the numeric values into @code. Array element
+@c values are no different than scalar variable values.
+A contiguous array of four elements might look like
@ifnotdocbook
-as shown in @ref{figure-array-elements}:
+@ref{figure-array-elements},
@end ifnotdocbook
@ifdocbook
-as shown in @inlineraw{docbook, <xref linkend="figure-array-elements"/>}:
+@inlineraw{docbook, <xref linkend="figure-array-elements"/>},
@end ifdocbook
+conceptually, if the element values are eight, @code{"foo"},
+@code{""}, and 30.
@ifnotdocbook
@float Figure,figure-array-elements
@@ -14994,12 +14892,10 @@ as shown in @inlineraw{docbook, <xref linkend="figure-array-elements"/>}:
@noindent
Only the values are stored; the indices are implicit from the order of
-the values. Here, 8 is the value at index zero, because 8 appears in the
+the values. Here, eight is the value at index zero, because eight appears in the
position with zero elements before it.
-@c STARTOFRANGE arrin
@cindex arrays, indexing
-@c STARTOFRANGE inarr
@cindex indexing arrays
@cindex associative arrays
@cindex arrays, associative
@@ -15008,19 +14904,21 @@ that each array is a collection of pairs---an index and its corresponding
array element value:
@ifnotdocbook
-@example
-@r{Index} 3 @r{Value} 30
-@r{Index} 1 @r{Value} "foo"
-@r{Index} 0 @r{Value} 8
-@r{Index} 2 @r{Value} ""
-@end example
+@c extra empty column to indent it right
+@multitable @columnfractions .1 .1 .1
+@headitem @tab Index @tab Value
+@item @tab @code{3} @tab @code{30}
+@item @tab @code{1} @tab @code{"foo"}
+@item @tab @code{0} @tab @code{8}
+@item @tab @code{2} @tab @code{""}
+@end multitable
@end ifnotdocbook
@docbook
<informaltable>
<tgroup cols="2">
-<colspec colname="1" align="center"/>
-<colspec colname="2" align="center"/>
+<colspec colname="1" align="left"/>
+<colspec colname="2" align="left"/>
<thead>
<row>
<entry>Index</entry>
@@ -15066,20 +14964,22 @@ at any time. For example, suppose a tenth element is added to the array
whose value is @w{@code{"number ten"}}. The result is:
@ifnotdocbook
-@example
-@r{Index} 10 @r{Value} "number ten"
-@r{Index} 3 @r{Value} 30
-@r{Index} 1 @r{Value} "foo"
-@r{Index} 0 @r{Value} 8
-@r{Index} 2 @r{Value} ""
-@end example
+@c extra empty column to indent it right
+@multitable @columnfractions .1 .1 .2
+@headitem @tab Index @tab Value
+@item @tab @code{10} @tab @code{"number ten"}
+@item @tab @code{3} @tab @code{30}
+@item @tab @code{1} @tab @code{"foo"}
+@item @tab @code{0} @tab @code{8}
+@item @tab @code{2} @tab @code{""}
+@end multitable
@end ifnotdocbook
@docbook
<informaltable>
<tgroup cols="2">
-<colspec colname="1" align="center"/>
-<colspec colname="2" align="center"/>
+<colspec colname="1" align="left"/>
+<colspec colname="2" align="left"/>
<thead>
<row>
<entry>Index</entry>
@@ -15131,19 +15031,20 @@ an index. For example, the following is an array that translates words from
English to French:
@ifnotdocbook
-@example
-@r{Index} "dog" @r{Value} "chien"
-@r{Index} "cat" @r{Value} "chat"
-@r{Index} "one" @r{Value} "un"
-@r{Index} 1 @r{Value} "un"
-@end example
+@multitable @columnfractions .1 .1 .1
+@headitem @tab Index @tab Value
+@item @tab @code{"dog"} @tab @code{"chien"}
+@item @tab @code{"cat"} @tab @code{"chat"}
+@item @tab @code{"one"} @tab @code{"un"}
+@item @tab @code{1} @tab @code{"un"}
+@end multitable
@end ifnotdocbook
@docbook
<informaltable>
<tgroup cols="2">
-<colspec colname="1" align="center"/>
-<colspec colname="2" align="center"/>
+<colspec colname="1" align="left"/>
+<colspec colname="2" align="left"/>
<thead>
<row>
<entry>Index</entry>
@@ -15185,7 +15086,7 @@ numbers and strings as indices.
There are some subtleties to how numbers work when used as
array subscripts; this is discussed in more detail in
@ref{Numeric Array Subscripts}.)
-Here, the number @code{1} isn't double quoted, because @command{awk}
+Here, the number @code{1} isn't double-quoted, because @command{awk}
automatically converts it to a string.
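The conversion described above can be sketched as follows. This is a minimal illustration, assuming a POSIX-compatible `awk` on the `PATH`; the function name `subscript_demo` and the array name `fr` are made up for the example and are not part of the manual.

```shell
# Hypothetical helper: shows that awk stores the subscript 1 as the string "1".
subscript_demo() {
    awk 'BEGIN {
        fr["dog"] = "chien"
        fr["one"] = "un"
        fr[1] = "un"                  # numeric subscript, converted to "1"
        if ("1" in fr)
            print "found via string subscript"
        n = 0
        for (k in fr)                 # count the elements actually stored
            n++
        print n, "elements"
    }'
}

subscript_demo
```

The lookup with `"1"` succeeds and only three elements exist, which suggests that `fr[1]` and `fr["1"]` name the same element.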
@cindex @command{gawk}, @code{IGNORECASE} variable in
@@ -15202,8 +15103,6 @@ that array's indices are consecutive integers starting at one.
@command{awk}'s arrays are efficient---the time to access an element
is independent of the number of elements in the array.
-@c ENDOFRANGE arrin
-@c ENDOFRANGE inarr
@node Reference to Elements
@subsection Referring to an Array Element
@@ -15212,7 +15111,7 @@ is independent of the number of elements in the array.
@cindex elements of arrays
The principal way to use an array is to refer to one of its elements.
-An array reference is an expression as follows:
+An @dfn{array reference} is an expression as follows:
@example
@var{array}[@var{index-expression}]
@@ -15222,8 +15121,11 @@ An array reference is an expression as follows:
Here, @var{array} is the name of an array. The expression @var{index-expression} is
the index of the desired element of the array.
+@c 1/2015: Having the 4.3 in @samp is a little iffy. It's essentially
+@c an expression though, so leave be. It's too early in the discussion
+@c to mention that it's really a string.
The value of the array reference is the current value of that array
-element. For example, @code{foo[4.3]} is an expression for the element
+element. For example, @code{foo[4.3]} is an expression referencing the element
of array @code{foo} at index @samp{4.3}.
@cindex arrays, unassigned elements
@@ -15315,7 +15217,7 @@ assign to that element of the array.
The following program takes a list of lines, each beginning with a line
number, and prints them out in order of line number. The line numbers
-are not in order when they are first read---instead they
+are not in order when they are first read---instead, they
are scrambled. This program sorts the lines by making an array using
the line numbers as subscripts. The program then prints out the lines
in sorted order of their numbers. It is a very simple program and gets
@@ -15409,7 +15311,7 @@ program has previously used, with the variable @var{var} set to that index.
The following program uses this form of the @code{for} statement. The
first rule scans the input records and notes which words appear (at
least once) in the input, by storing a one into the array @code{used} with
-the word as index. The second rule scans the elements of @code{used} to
+the word as the index. The second rule scans the elements of @code{used} to
find all the distinct words that appear in the input. It prints each
word that is more than 10 characters long and also prints the number of
such words.
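The program itself lies outside this hunk's context. A sketch consistent with the description above might look like the following; the wrapper name `long_words` and the variable names `w` and `count` are assumptions made for illustration, and the wording of the final message is likewise illustrative.

```shell
# Hypothetical helper: lists distinct input words longer than 10 characters.
long_words() {
    awk '
    # First rule: record each word seen, using the word as the index.
    {
        for (i = 1; i <= NF; i++)
            used[$i] = 1
    }
    # Second rule: scan the distinct words and report the long ones.
    END {
        for (w in used)
            if (length(w) > 10) {
                count++
                print w
            }
        print count + 0, "words longer than 10 characters"
    }'
}

printf 'a considerably long word\nconsiderably short\n' | long_words
```

Because the array is indexed by the word, a word that appears several times in the input is still reported only once.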
@@ -15506,7 +15408,7 @@ and will vary from one version of @command{awk} to the next.
Often, though, you may wish to do something simple, such as
``traverse the array by comparing the indices in ascending order,''
or ``traverse the array by comparing the values in descending order.''
-@command{gawk} provides two mechanisms which give you this control.
+@command{gawk} provides two mechanisms that give you this control:
@itemize @value{BULLET}
@item
@@ -15563,21 +15465,26 @@ across different environments.} which @command{gawk} uses internally
to perform the sorting.
@item "@@ind_str_desc"
-String indices ordered from high to low.
+Like @code{"@@ind_str_asc"}, but the
+string indices are ordered from high to low.
@item "@@ind_num_desc"
-Numeric indices ordered from high to low.
+Like @code{"@@ind_num_asc"}, but the
+numeric indices are ordered from high to low.
@item "@@val_type_desc"
-Element values, based on type, ordered from high to low.
+Like @code{"@@val_type_asc"}, but the
+element values, based on type, are ordered from high to low.
Subarrays, if present, come out first.
@item "@@val_str_desc"
-Element values, treated as strings, ordered from high to low.
+Like @code{"@@val_str_asc"}, but the
+element values, treated as strings, are ordered from high to low.
Subarrays, if present, come out first.
@item "@@val_num_desc"
-Element values, treated as numbers, ordered from high to low.
+Like @code{"@@val_num_asc"}, but the
+element values, treated as numbers, are ordered from high to low.
Subarrays, if present, come out first.
@end table
@@ -15800,7 +15707,7 @@ for (i in frequencies)
@noindent
This example removes all the elements from the array @code{frequencies}.
Once an element is deleted, a subsequent @code{for} statement to scan the array
-does not report that element and the @code{in} operator to check for
+does not report that element and using the @code{in} operator to check for
the presence of that element returns zero (i.e., false):
@example
@@ -16060,7 +15967,7 @@ a[1][2] = 2
This simulates a true two-dimensional array. Each subarray element can
contain another subarray as a value, which in turn can hold other arrays
as well. In this way, you can create arrays of three or more dimensions.
-The indices can be any @command{awk} expression, including scalars
+The indices can be any @command{awk} expressions, including scalars
separated by commas (i.e., a regular @command{awk} simulated
multidimensional subscript). So the following is valid in
@command{gawk}:
@@ -16072,7 +15979,7 @@ a[1][3][1, "name"] = "barney"
Each subarray and the main array can be of different length. In fact, the
elements of an array or its subarray do not all have to have the same
type. This means that the main array and any of its subarrays can be
-non-rectangular, or jagged in structure. You can assign a scalar value to
+nonrectangular, or jagged in structure. You can assign a scalar value to
the index @code{4} of the main array @code{a}, even though @code{a[1]}
is itself an array and not a scalar:
@@ -16096,7 +16003,8 @@ a[4][5][6][7] = "An element in a four-dimensional array"
@noindent
This removes the scalar value from index @code{4} and then inserts a
-subarray of subarray of subarray containing a scalar. You can also
+three-level nested subarray
+containing a scalar. You can also
delete an entire subarray or subarray of subarrays:
@example
@@ -16107,7 +16015,7 @@ a[4][5] = "An element in subarray a[4]"
But recall that you can not delete the main array @code{a} and then use it
as a scalar.
-The built-in functions which take array arguments can also be used
+The built-in functions that take array arguments can also be used
with subarrays. For example, the following code fragment uses @code{length()}
(@pxref{String Functions})
to determine the number of elements in the main array @code{a} and
@@ -16137,7 +16045,7 @@ can be nested to scan all the
elements of an array of arrays if it is rectangular in structure. In order
to print the contents (scalar values) of a two-dimensional array of arrays
(i.e., in which each first-level element is itself an
-array, not necessarily of the same length)
+array, not necessarily of the same length),
you could use the following code:
@example
@@ -16237,9 +16145,9 @@ versions of @command{awk}.
@item
Standard @command{awk} simulates multidimensional arrays by separating
-subscript values with a comma. The values are concatenated into a
+subscript values with commas. The values are concatenated into a
single string, separated by the value of @code{SUBSEP}. The fact
-that such a subscript was created in this way is not retained; thus
+that such a subscript was created in this way is not retained; thus,
changing @code{SUBSEP} may have unexpected consequences. You can use
@samp{(@var{sub1}, @var{sub2}, @dots{}) in @var{array}} to see if such
a multidimensional subscript exists in @var{array}.
@@ -16248,7 +16156,7 @@ a multidimensional subscript exists in @var{array}.
@command{gawk} provides true arrays of arrays. You use a separate
set of square brackets for each dimension in such an array:
@code{data[row][col]}, for example. Array elements may thus be either
-scalar values (number or string) or another array.
+scalar values (number or string) or other arrays.
@item
Use the @code{isarray()} built-in function to determine if an array
@@ -16256,14 +16164,11 @@ element is itself a subarray.
@end itemize
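The simulated multidimensional subscripts summarized above can be sketched with standard `awk` features only. This assumes a POSIX-compatible `awk`; the names `subsep_demo`, `grid`, and `parts` are hypothetical.

```shell
# Hypothetical helper: stores and recovers a two-part subscript.
subsep_demo() {
    awk 'BEGIN {
        grid[1, 2] = "x"              # stored under the single string 1 SUBSEP 2
        if ((1, 2) in grid)
            print "found"
        for (k in grid) {
            split(k, parts, SUBSEP)   # recover the original subscript values
            print parts[1], parts[2]
        }
    }'
}

subsep_demo
```

The `split()` call only works while `SUBSEP` still has the value it had when the element was created, which is the hazard the summary item warns about.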
-@c ENDOFRANGE arrs
@node Functions
@chapter Functions
-@c STARTOFRANGE funcbi
@cindex functions, built-in
-@c STARTOFRANGE bifunc
@cindex built-in functions
This @value{CHAPTER} describes @command{awk}'s built-in functions,
which fall into three categories: numeric, string, and I/O.
@@ -16276,6 +16181,9 @@ Besides the built-in functions, @command{awk} has provisions for
writing new functions that the rest of a program can use.
The second half of this @value{CHAPTER} describes these
@dfn{user-defined} functions.
+Finally, we explore indirect function calls, a @command{gawk}-specific
+extension that lets you determine at runtime what function is to
+be called.
@menu
* Built-in:: Summarizes the built-in functions.
@@ -16285,7 +16193,7 @@ The second half of this @value{CHAPTER} describes these
@end menu
@node Built-in
-@section Built-In Functions
+@section Built-in Functions
@dfn{Built-in} functions are always available for
your @command{awk} program to call. This @value{SECTION} defines all
@@ -16308,7 +16216,7 @@ but are summarized here for your convenience.
@end menu
@node Calling Built-in
-@subsection Calling Built-In Functions
+@subsection Calling Built-in Functions
To call one of @command{awk}'s built-in functions, write the name of
the function followed
@@ -16359,7 +16267,7 @@ j = atan2(++i, i *= 2)
@end example
If the order of evaluation is left to right, then @code{i} first becomes
-6, and then 12, and @code{atan2()} is called with the two arguments 6
+six, and then 12, and @code{atan2()} is called with the two arguments six
and 12. But if the order of evaluation is right to left, @code{i}
first becomes 10, then 11, and @code{atan2()} is called with the
two arguments 11 and 10.
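One way to avoid depending on the order of evaluation is to compute each argument in its own statement. The sketch below assumes a starting value of 5 for `i`, which is what the values quoted above imply; the names `atan2_demo`, `x`, and `y` are hypothetical.

```shell
# Hypothetical helper: fixes the evaluation order with separate statements.
atan2_demo() {
    awk 'BEGIN {
        i = 5
        y = ++i        # i is now 6
        x = (i *= 2)   # i is now 12
        printf "%.4f\n", atan2(y, x)
    }'
}

atan2_demo
```

Written this way, the call should always receive 6 and 12, whichever order a given implementation uses for function arguments.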
@@ -16440,7 +16348,7 @@ In fact, @command{gawk} uses the BSD @code{random()} function, which is
considerably better than @code{rand()}, to produce random numbers.}
Often random integers are needed instead. Following is a user-defined function
-that can be used to obtain a random non-negative integer less than @var{n}:
+that can be used to obtain a random nonnegative integer less than @var{n}:
@example
function randint(n)
@@ -16535,7 +16443,7 @@ implementations.
The functions in this @value{SECTION} look at or change the text of one
or more strings.
-@code{gawk} understands locales (@pxref{Locales}), and does all
+@command{gawk} understands locales (@pxref{Locales}) and does all
string processing in terms of @emph{characters}, not @emph{bytes}.
This distinction is particularly important to understand for locales
where one character may be represented by multiple bytes. Thus, for
@@ -16624,7 +16532,7 @@ a[2] = "de"
a[3] = "sac"
@end example
-The @code{asorti()} function works similarly to @code{asort()}, however,
+The @code{asorti()} function works similarly to @code{asort()}; however,
the @emph{indices} are sorted, instead of the values. Thus, in the
previous example, starting with the same initial set of indices and
values in @code{a}, calling @samp{asorti(a)} would yield:
@@ -16739,7 +16647,7 @@ If @var{find} is not found, @code{index()} returns zero.
With BWK @command{awk} and @command{gawk},
it is a fatal error to use a regexp constant for @var{find}.
Other implementations allow it, simply treating the regexp
-constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER}.
+constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER}
@item @code{length(}[@var{string}]@code{)}
@cindexawkfunc{length}
@@ -16822,7 +16730,7 @@ If @option{--posix} is supplied, using an array argument is a fatal error
@cindex string, regular expression match
@cindex match regexp in string
Search @var{string} for the
-longest, leftmost substring matched by the regular expression,
+longest, leftmost substring matched by the regular expression
@var{regexp} and return the character position (index)
at which that substring begins (one, if it starts at the beginning of
@var{string}). If no match is found, return zero.
@@ -16834,7 +16742,7 @@ In the latter case, the string is treated as a regexp to be matched.
discussion of the difference between the two forms, and the
implications for writing your program correctly.
-The order of the first two arguments is backwards from most other string
+The order of the first two arguments is the opposite of most other string
functions that work with regular expressions, such as
@code{sub()} and @code{gsub()}. It might help to remember that
for @code{match()}, the order is the same as for the @samp{~} operator:
@@ -16923,7 +16831,7 @@ $ @kbd{echo foooobazbarrrrr |}
@end example
There may not be subscripts for the start and index for every parenthesized
-subexpression, because they may not all have matched text; thus they
+subexpression, because they may not all have matched text; thus, they
should be tested for with the @code{in} operator
(@pxref{Reference to Elements}).
@@ -16970,13 +16878,13 @@ a regexp describing where to split @var{string} (much as @code{FS} can
be a regexp describing where to split input records).
If @var{fieldsep} is omitted, the value of @code{FS} is used.
@code{split()} returns the number of elements created.
-@var{seps} is a @command{gawk} extension with @code{@var{seps}[@var{i}]}
+@var{seps} is a @command{gawk} extension, with @code{@var{seps}[@var{i}]}
being the separator string
between @code{@var{array}[@var{i}]} and @code{@var{array}[@var{i}+1]}.
If @var{fieldsep} is a single
-space then any leading whitespace goes into @code{@var{seps}[0]} and
+space, then any leading whitespace goes into @code{@var{seps}[0]} and
any trailing
-whitespace goes into @code{@var{seps}[@var{n}]} where @var{n} is the
+whitespace goes into @code{@var{seps}[@var{n}]}, where @var{n} is the
return value of
@code{split()} (i.e., the number of elements in @var{array}).
@@ -16989,7 +16897,7 @@ split("cul-de-sac", a, "-", seps)
@noindent
@cindex strings splitting, example
-splits the string @samp{cul-de-sac} into three fields using @samp{-} as the
+splits the string @code{"cul-de-sac"} into three fields using @samp{-} as the
separator. It sets the contents of the array @code{a} as follows:
@example
@@ -17014,19 +16922,18 @@ As with input field-splitting, when the value of @var{fieldsep} is
the elements of
@var{array} but not in @var{seps}, and the elements
are separated by runs of whitespace.
-Also, as with input field-splitting, if @var{fieldsep} is the null string, each
+Also, as with input field splitting, if @var{fieldsep} is the null string, each
individual character in the string is split into its own array element.
@value{COMMONEXT}
Note, however, that @code{RS} has no effect on the way @code{split()}
-works. Even though @samp{RS = ""} causes newline to also be an input
+works. Even though @samp{RS = ""} causes the newline character to also be an input
field separator, this does not affect how @code{split()} splits strings.
@cindex dark corner, @code{split()} function
Modern implementations of @command{awk}, including @command{gawk}, allow
-the third argument to be a regexp constant (@code{/abc/}) as well as a
-string.
-@value{DARKCORNER}
+the third argument to be a regexp constant (@w{@code{/}@dots{}@code{/}})
+as well as a string. @value{DARKCORNER}
The POSIX standard allows this as well.
@DBXREF{Computed Regexps} for a
discussion of the difference between using a string constant or a regexp constant,
@@ -17163,7 +17070,7 @@ an @samp{&}:
@cindex @code{sub()} function, arguments of
@cindex @code{gsub()} function, arguments of
As mentioned, the third argument to @code{sub()} must
-be a variable, field or array element.
+be a variable, field, or array element.
Some versions of @command{awk} allow the third argument to
be an expression that is not an lvalue. In such a case, @code{sub()}
still searches for the pattern and returns zero or one, but the result of
@@ -17322,8 +17229,8 @@ example, @code{"a\qb"} is treated as @code{"aqb"}.
At the runtime level, the various functions handle sequences of
@samp{\} and @samp{&} differently. The situation is (sadly) somewhat complex.
-Historically, the @code{sub()} and @code{gsub()} functions treated the two
-character sequence @samp{\&} specially; this sequence was replaced in
+Historically, the @code{sub()} and @code{gsub()} functions treated the
+two-character sequence @samp{\&} specially; this sequence was replaced in
the generated text with a single @samp{&}. Any other @samp{\} within
the @var{replacement} string that did not precede an @samp{&} was passed
through unchanged. This is illustrated in @ref{table-sub-escapes}.
@@ -17381,7 +17288,7 @@ _bigskip}
@end float
@noindent
-This table shows both the lexical-level processing, where
+This table shows the lexical-level processing, where
an odd number of backslashes becomes an even number at the runtime level,
as well as the runtime processing done by @code{sub()}.
(For the sake of simplicity, the rest of the following tables only show the
@@ -17402,7 +17309,7 @@ This is shown in
@ref{table-sub-proposed}.
@float Table,table-sub-proposed
-@caption{GNU @command{awk} rules for @code{sub()} and backslash}
+@caption{@command{gawk} rules for @code{sub()} and backslash}
@tex
\vbox{\bigskip
% We need more characters for escape and tab ...
@@ -17447,7 +17354,7 @@ _bigskip}
@end float
In a nutshell, at the runtime level, there are now three special sequences
-of characters (@samp{\\\&}, @samp{\\&} and @samp{\&}) whereas historically
+of characters (@samp{\\\&}, @samp{\\&}, and @samp{\&}) whereas historically
there was only one. However, as in the historical case, any @samp{\} that
is not part of one of these three sequences is not special and appears
in the output literally.
@@ -17513,7 +17420,7 @@ The only case where the difference is noticeable is the last one: @samp{\\\\}
is seen as @samp{\\} and produces @samp{\} instead of @samp{\\}.
Starting with @value{PVERSION} 3.1.4, @command{gawk} followed the POSIX rules
-when @option{--posix} is specified (@pxref{Options}). Otherwise,
+when @option{--posix} was specified (@pxref{Options}). Otherwise,
it continued to follow the proposed rules, as
that had been its behavior for many years.
@@ -17581,7 +17488,7 @@ _bigskip}
@end ifnottex
@end float
-Because of the complexity of the lexical and runtime level processing
+Because of the complexity of the lexical- and runtime-level processing
and the special cases for @code{sub()} and @code{gsub()},
we recommend the use of @command{gawk} and @code{gensub()} when you have
to do substitutions.
@@ -17607,6 +17514,7 @@ for more information.
When closing a coprocess, it is occasionally useful to first close
one end of the two-way pipe and then to close the other. This is done
by providing a second argument to @code{close()}. This second argument
+(@var{how})
should be one of the two string values @code{"to"} or @code{"from"},
indicating which end of the pipe to close. Case in the string does
not matter.
@@ -17633,7 +17541,7 @@ every little bit of information as soon as it is ready. However, sometimes
it is necessary to force a program to @dfn{flush} its buffers (i.e.,
write the information to its destination, even if a buffer is not full).
This is the purpose of the @code{fflush()} function---@command{gawk} also
-buffers its output and the @code{fflush()} function forces
+buffers its output, and the @code{fflush()} function forces
@command{gawk} to flush its buffers.
@cindex extensions, common@comma{} @code{fflush()} function
@@ -17654,7 +17562,7 @@ would flush only the standard output if there was no argument,
and flush all output files and pipes if the argument was the null
string. This was changed in order to be compatible with Brian
Kernighan's @command{awk}, in the hope that standardizing this
-feature in POSIX would then be easier (which indeed helped).
+feature in POSIX would then be easier (which indeed proved to be the case).
With @command{gawk},
you can use @samp{fflush("/dev/stdout")} if you wish to flush
@@ -17665,7 +17573,7 @@ only the standard output.
@c @cindex warnings, automatic
@cindex troubleshooting, @code{fflush()} function
@code{fflush()} returns zero if the buffer is successfully flushed;
-otherwise, it returns non-zero. (@command{gawk} returns @minus{}1.)
+otherwise, it returns a nonzero value. (@command{gawk} returns @minus{}1.)
In the case where all buffers are flushed, the return value is zero
only if all buffers were flushed successfully. Otherwise, it is
@minus{}1, and @command{gawk} warns about the problem @var{filename}.
@@ -17678,8 +17586,8 @@ In such a case, @code{fflush()} returns @minus{}1, as well.
@sidebar Interactive Versus Noninteractive Buffering
@cindex buffering, interactive vs.@: noninteractive
-As a side point, buffering issues can be even more confusing, depending
-upon whether your program is @dfn{interactive} (i.e., communicating
+As a side point, buffering issues can be even more confusing if
+your program is @dfn{interactive} (i.e., communicating
with a user sitting at a keyboard).@footnote{A program is interactive
if the standard output is connected to a terminal device. On modern
systems, this means your keyboard and screen.}
@@ -17722,7 +17630,7 @@ it is all buffered and sent down the pipe to @command{cat} in one shot.
@cindexawkfunc{system}
@cindex invoke shell command
@cindex interacting with other programs
-Execute the operating-system
+Execute the operating system
command @var{command} and then return to the @command{awk} program.
Return @var{command}'s exit status.
@@ -17826,18 +17734,14 @@ you would see the latter (undesirable) output.
@subsection Time Functions
@cindex time functions
-@c STARTOFRANGE tst
@cindex timestamps
-@c STARTOFRANGE logftst
@cindex log files, timestamps in
-@c STARTOFRANGE filogtst
@cindex files, log@comma{} timestamps in
-@c STARTOFRANGE gawtst
@cindex @command{gawk}, timestamps
@cindex POSIX @command{awk}, timestamps and
-@code{awk} programs are commonly used to process log files
+@command{awk} programs are commonly used to process log files
containing timestamp information, indicating when a
-particular log record was written. Many programs log their timestamp
+particular log record was written. Many programs log their timestamps
in the form returned by the @code{time()} system call, which is the
number of seconds since a particular epoch. On POSIX-compliant systems,
it is the number of seconds since
@@ -17898,7 +17802,7 @@ The values of these numbers need not be within the ranges specified;
for example, an hour of @minus{}1 means 1 hour before midnight.
The origin-zero Gregorian calendar is assumed, with year 0 preceding
year 1 and year @minus{}1 preceding year 0.
-The time is assumed to be in the local timezone.
+The time is assumed to be in the local time zone.
If the daylight-savings flag is positive, the time is assumed to be
daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), @code{mktime()} attempts to determine
@@ -17910,7 +17814,6 @@ is out of range, @code{mktime()} returns @minus{}1.
@cindex @command{gawk}, @code{PROCINFO} array in
@cindex @code{PROCINFO} array
@item @code{strftime(}[@var{format} [@code{,} @var{timestamp} [@code{,} @var{utc-flag}] ] ]@code{)}
-@c STARTOFRANGE strf
@cindexgawkfunc{strftime}
@cindex format time string
Format the time specified by @var{timestamp}
@@ -18059,12 +17962,12 @@ Equivalent to specifying @samp{%H:%M:%S}.
The weekday as a decimal number (1--7). Monday is day one.
@item %U
-The week number of the year (the first Sunday as the first day of week one)
+The week number of the year (with the first Sunday as the first day of week one)
as a decimal number (00--53).
@c @cindex ISO 8601
@item %V
-The week number of the year (the first Monday as the first
+The week number of the year (with the first Monday as the first
day of week one) as a decimal number (01--53).
The method for determining the week number is as specified by ISO 8601.
(To wit: if the week containing January 1 has four or more days in the
@@ -18075,7 +17978,7 @@ and the next week is week one.)
The weekday as a decimal number (0--6). Sunday is day zero.
@item %W
-The week number of the year (the first Monday as the first day of week one)
+The week number of the year (with the first Monday as the first day of week one)
as a decimal number (00--53).
@item %x
@@ -18095,8 +17998,8 @@ The full year as a decimal number (e.g., 2015).
@c @cindex RFC 822
@c @cindex RFC 1036
@item %z
-The timezone offset in a +HHMM format (e.g., the format necessary to
-produce RFC 822/RFC 1036 date headers).
+The time zone offset in a @samp{+@var{HHMM}} format (e.g., the format
+necessary to produce RFC 822/RFC 1036 date headers).
@item %Z
The time zone name or abbreviation; no characters if
@@ -18159,7 +18062,6 @@ The time as a decimal timestamp in seconds since the epoch.
The date in VMS format (e.g., @samp{20-JUN-1991}).
@end ignore
@end table
-@c ENDOFRANGE strf
Additionally, the alternative representations are recognized but their
normal representations are used.
@@ -18210,23 +18112,14 @@ gawk 'BEGIN @{
exit exitval
@}' "$@@"
@end example
-@c ENDOFRANGE tst
-@c ENDOFRANGE logftst
-@c ENDOFRANGE filogtst
-@c ENDOFRANGE gawtst
@node Bitwise Functions
@subsection Bit-Manipulation Functions
@cindex bit-manipulation functions
-@c STARTOFRANGE bit
@cindex bitwise, operations
-@c STARTOFRANGE and
@cindex AND bitwise operation
-@c STARTOFRANGE oro
@cindex OR bitwise operation
-@c STARTOFRANGE xor
@cindex XOR bitwise operation
-@c STARTOFRANGE opbit
@cindex operations, bitwise
@quotation
@i{I can explain it for you, but I can't understand it for you.}
@@ -18246,7 +18139,7 @@ The operations are described in @ref{table-bitwise-ops}.
@ifnottex
@ifnotdocbook
@display
-                Bit Operator
+                Bit operator
          |  AND  |   OR  |  XOR
          |---+---+---+---+---+---
Operands  | 0 | 1 | 0 | 1 | 0 | 1
@@ -18304,7 +18197,7 @@ Operands | 0 | 1 | 0 | 1 | 0 | 1
<tbody>
<row>
<entry colsep="0"></entry>
-<entry spanname="optitle"><emphasis role="bold">Bit Operator</emphasis></entry>
+<entry spanname="optitle"><emphasis role="bold">Bit operator</emphasis></entry>
</row>
<row rowsep="1">
@@ -18368,10 +18261,9 @@ of a given value.
Finally, two other common operations are to shift the bits left or right.
For example, if you have a bit string @samp{10111001} and you shift it
right by three bits, you end up with @samp{00010111}.@footnote{This example
-shows that 0's come in on the left side. For @command{gawk}, this is
+shows that zeros come in on the left side. For @command{gawk}, this is
always true, but in some languages, it's possible to have the left side
-fill with 1's.}
-@c Purposely decided to use 0's and 1's here. 2/2001.
+fill with ones.}
If you start over again with @samp{10111001} and shift it left by three
bits, you end up with @samp{11001000}. The following list describes
@command{gawk}'s built-in functions that implement the bitwise operations.
@@ -18425,7 +18317,7 @@ that illustrates the use of these functions:
@example
@group
@c file eg/lib/bits2str.awk
-# bits2str --- turn a byte into readable 1's and 0's
+# bits2str --- turn a byte into readable ones and zeros
function bits2str(bits, data, mask)
@{
@@ -18499,15 +18391,16 @@ $ @kbd{gawk -f testbits.awk}
@cindex converting, numbers to strings
@cindex number as string of bits
The @code{bits2str()} function turns a binary number into a string.
-The number @code{1} represents a binary value where the rightmost bit
-is set to 1. Using this mask,
+Initializing @code{mask} to one creates
+a binary value where the rightmost bit
+is set to one. Using this mask,
the function repeatedly checks the rightmost bit.
ANDing the mask with the value indicates whether the
-rightmost bit is 1 or not. If so, a @code{"1"} is concatenated onto the front
+rightmost bit is one or not. If so, a @code{"1"} is concatenated onto the front
of the string.
Otherwise, a @code{"0"} is added.
The value is then shifted right by one bit and the loop continues
-until there are no more 1 bits.
+until there are no more one bits.
If the initial value is zero, it returns a simple @code{"0"}.
Otherwise, at the end, it pads the value with zeros to represent multiples
@@ -18518,11 +18411,6 @@ decimal and octal values for the same numbers
(@pxref{Nondecimal-numbers}),
and then demonstrates the
results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions.
-@c ENDOFRANGE bit
-@c ENDOFRANGE and
-@c ENDOFRANGE oro
-@c ENDOFRANGE xor
-@c ENDOFRANGE opbit
@node Type Functions
@subsection Getting Type Information
@@ -18536,7 +18424,7 @@ that traverses every element of an array of arrays
@cindexgawkfunc{isarray}
@cindex scalar or array
@item isarray(@var{x})
-Return a true value if @var{x} is an array. Otherwise return false.
+Return a true value if @var{x} is an array. Otherwise, return false.
@end table
@code{isarray()} is meant for use in two circumstances. The first is when
@@ -18597,20 +18485,16 @@ The default value for @var{category} is @code{"LC_MESSAGES"}.
Return the plural form used for @var{number} of the
translation of @var{string1} and @var{string2} in text domain
@var{domain} for locale category @var{category}. @var{string1} is the
-English singular variant of a message, and @var{string2} the English plural
+English singular variant of a message, and @var{string2} is the English plural
variant of the same message.
The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
The default value for @var{category} is @code{"LC_MESSAGES"}.
@end table
-@c ENDOFRANGE funcbi
-@c ENDOFRANGE bifunc
@node User-defined
@section User-Defined Functions
-@c STARTOFRANGE udfunc
@cindex user-defined functions
-@c STARTOFRANGE funcud
@cindex functions, user-defined
Complicated @command{awk} programs can often be simplified by defining
your own functions. User-defined functions can be called just like
@@ -18630,12 +18514,11 @@ them (i.e., to tell @command{awk} what they should do).
@subsection Function Definition Syntax
@quotation
-@i{It's entirely fair to say that the @command{awk} syntax for local
+@i{It's entirely fair to say that the awk syntax for local
variable definitions is appallingly awful.}
@author Brian Kernighan
@end quotation
-@c STARTOFRANGE fdef
@cindex functions, defining
Definitions of functions can appear anywhere between the rules of an
@command{awk} program. Thus, the general form of an @command{awk} program is
@@ -18673,14 +18556,23 @@ the call.
A function cannot have two parameters with the same name, nor may it
have a parameter with the same name as the function itself.
-In addition, according to the POSIX standard, function parameters
+
+@quotation CAUTION
+According to the POSIX standard, function parameters
cannot have the same name as one of the special predefined variables
-(@pxref{Built-in Variables}). Not all versions of @command{awk} enforce
-this restriction.
+(@pxref{Built-in Variables}), nor may a function parameter have the
+same name as another function.
+
+Not all versions of @command{awk} enforce
+these restrictions.
+@command{gawk} always enforces the first restriction.
+With @option{--posix} (@pxref{Options}),
+it also enforces the second restriction.
+@end quotation
Local variables act like the empty string if referenced where a string
value is required, and like zero if referenced where a numeric value
-is required. This is the same as regular variables that have never been
+is required. This is the same as the behavior of regular variables that have never been
assigned a value. (There is more to understand about local variables;
@pxref{Dynamic Typing}.)
@@ -18714,7 +18606,7 @@ During execution of the function body, the arguments and local variable
values hide, or @dfn{shadow}, any variables of the same names used in the
rest of the program. The shadowed variables are not accessible in the
function definition, because there is no way to name them while their
-names have been taken away for the local variables. All other variables
+names have been taken away for the arguments and local variables. All other variables
used in the @command{awk} program can be referenced or set normally in the
function's body.
@@ -18781,7 +18673,7 @@ function myprint(num)
@end example
@noindent
-To illustrate, here is an @command{awk} rule that uses our @code{myprint}
+To illustrate, here is an @command{awk} rule that uses our @code{myprint()}
function:
@example
@@ -18822,13 +18714,13 @@ in an array and start over with a new list of elements
(@pxref{Delete}).
Instead of having
to repeat this loop everywhere that you need to clear out
-an array, your program can just call @code{delarray}.
+an array, your program can just call @code{delarray()}.
(This guarantees portability. The use of @samp{delete @var{array}} to delete
the contents of an entire array is a relatively recent@footnote{Late in 2012.}
addition to the POSIX standard.)
The following is an example of a recursive function. It takes a string
-as an input parameter and returns the string in backwards order.
+as an input parameter and returns the string in reverse order.
Recursive functions must always have a test that stops the recursion.
In this case, the recursion terminates when the input string is
already empty:
@@ -18882,12 +18774,10 @@ You might think that @code{ctime()} could use @code{PROCINFO["strftime"]}
for its format string. That would be a mistake, because @code{ctime()} is
supposed to return the time formatted in a standard fashion, and user-level
code could have changed @code{PROCINFO["strftime"]}.
-@c ENDOFRANGE fdef
@node Function Caveats
@subsection Calling User-Defined Functions
-@c STARTOFRANGE fudc
@cindex functions, user-defined, calling
@dfn{Calling a function} means causing the function to run and do its job.
A function call is an expression and its value is the value returned by
@@ -18927,7 +18817,7 @@ an error.
@cindex local variables, in a function
@cindex variables, local to a function
-Unlike many languages,
+Unlike in many languages,
there is no way to make a variable local to a @code{@{} @dots{} @code{@}} block in
@command{awk}, but you can make a variable local to a function. It is
good practice to do so whenever a variable is needed only in that
@@ -18936,7 +18826,7 @@ function.
To make a variable local to a function, simply declare the variable as
an argument after the actual function arguments
(@pxref{Definition Syntax}).
-Look at the following example where variable
+Look at the following example, where variable
@code{i} is a global variable used by both functions @code{foo()} and
@code{bar()}:
@@ -18977,7 +18867,7 @@ foo's i=3
top's i=3
@end example
-If you want @code{i} to be local to both @code{foo()} and @code{bar()} do as
+If you want @code{i} to be local to both @code{foo()} and @code{bar()}, do as
follows (the extra space before @code{i} is a coding convention to
indicate that @code{i} is a local variable, not an argument):
@@ -19065,7 +18955,7 @@ declare explicitly whether the arguments are passed @dfn{by value} or
@dfn{by reference}.
Instead, the passing convention is determined at runtime when
-the function is called according to the following rule:
+the function is called, according to the following rule:
if the argument is an array variable, then it is passed by reference.
Otherwise, the argument is passed by value.
@@ -19142,7 +19032,7 @@ prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because
@cindex undefined functions
@cindex functions, undefined
Some @command{awk} implementations allow you to call a function that
-has not been defined. They only report a problem at runtime when the
+has not been defined. They only report a problem at runtime, when the
program actually tries to call the function. For example:
@example
@@ -19179,7 +19069,6 @@ or the @code{nextfile} statement
@end ifnotdocbook
inside a user-defined function.
@command{gawk} does not have this limitation.
-@c ENDOFRANGE fudc
@node Return Statement
@subsection The @code{return} Statement
@@ -19202,15 +19091,15 @@ makes the returned value undefined, and therefore, unpredictable.
In practice, though, all versions of @command{awk} simply return the
null string, which acts like zero if used in a numeric context.
-A @code{return} statement with no value expression is assumed at the end of
-every function definition. So if control reaches the end of the function
-body, then technically, the function returns an unpredictable value.
+A @code{return} statement without an @var{expression} is assumed at the end of
+every function definition. So, if control reaches the end of the function
+body, then technically the function returns an unpredictable value.
In practice, it returns the empty string. @command{awk}
does @emph{not} warn you if you use the return value of such a function.
Sometimes, you want to write a function for what it does, not for
what it returns. Such a function corresponds to a @code{void} function
-in C, C++ or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not
+in C, C++, or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not
return any value; simply bear in mind that you should not be using the
return value of such a function.
@@ -19307,7 +19196,6 @@ does report the second error.
Usually, such things aren't a big issue, but it's worth
being aware of them.
-@c ENDOFRANGE udfunc
@node Indirect Calls
@section Indirect Function Calls
@@ -19330,13 +19218,15 @@ function calls, you can specify the name of the function to call as a
string variable, and then call the function. Let's look at an example.
Suppose you have a file with your test scores for the classes you
-are taking. The first field is the class name. The following fields
+are taking, and
+you wish to get the sum and the average of
+your test scores.
+The first field is the class name. The following fields
are the functions to call to process the data, up to a ``marker''
field @samp{data:}. Following the marker, to the end of the record,
are the various numeric test scores.
-Here is the initial file; you wish to get the sum and the average of
-your test scores:
+Here is the initial file:
@example
@c file eg/data/class_data1
@@ -19419,9 +19309,9 @@ function sum(first, last, ret, i)
@c endfile
@end example
-These two functions expect to work on fields; thus the parameters
+These two functions expect to work on fields; thus, the parameters
@code{first} and @code{last} indicate where in the fields to start and end.
-Otherwise they perform the expected computations and are not unusual:
+Otherwise, they perform the expected computations and are not unusual:
@example
@c file eg/prog/indirectcall.awk
@@ -19480,8 +19370,8 @@ The ability to use indirect function calls is more powerful than you may
think at first. The C and C++ languages provide ``function pointers,'' which
are a mechanism for calling a function chosen at runtime. One of the most
well-known uses of this ability is the C @code{qsort()} function, which sorts
-an array using the famous ``quick sort'' algorithm
-(see @uref{http://en.wikipedia.org/wiki/Quick_sort, the Wikipedia article}
+an array using the famous ``quicksort'' algorithm
+(see @uref{http://en.wikipedia.org/wiki/Quicksort, the Wikipedia article}
for more information). To use this function, you supply a pointer to a comparison
function. This mechanism allows you to sort arbitrary data in an arbitrary
fashion.
@@ -19500,11 +19390,11 @@ We can do something similar using @command{gawk}, like this:
# January 2009
@c endfile
-
@end ignore
@c file eg/lib/quicksort.awk
-# quicksort --- C.A.R. Hoare's quick sort algorithm. See Wikipedia
-# or almost any algorithms or computer science text
+
+# quicksort --- C.A.R. Hoare's quicksort algorithm. See Wikipedia
+# or almost any algorithms or computer science text.
@c endfile
@ignore
@c file eg/lib/quicksort.awk
@@ -19542,7 +19432,7 @@ function quicksort_swap(data, i, j, temp)
The @code{quicksort()} function receives the @code{data} array, the starting and ending
indices to sort (@code{left} and @code{right}), and the name of a function that
-performs a ``less than'' comparison. It then implements the quick sort algorithm.
+performs a ``less than'' comparison. It then implements the quicksort algorithm.
To make use of the sorting function, we return to our previous example. The
first thing to do is write some comparison functions:
@@ -19733,7 +19623,7 @@ for (i = 1; i <= n; i++)
@end example
@noindent
-@code{gawk} looks up the actual function to call only once.
+@command{gawk} looks up the actual function to call only once.
@node Functions Summary
@section Summary
@@ -19800,7 +19690,6 @@ program. This is equivalent to function pointers in C and C++.
@end itemize
-@c ENDOFRANGE funcud
@ifnotinfo
@part @value{PART2}Problem Solving with @command{awk}
@@ -19822,18 +19711,15 @@ It contains the following chapters:
@node Library Functions
@chapter A Library of @command{awk} Functions
-@c STARTOFRANGE libf
@cindex libraries of @command{awk} functions
-@c STARTOFRANGE flib
@cindex functions, library
-@c STARTOFRANGE fudlib
@cindex functions, user-defined, library of
@DBREF{User-defined} describes how to write
your own @command{awk} functions. Writing functions is important, because
it allows you to encapsulate algorithms and program tasks in a single
place. It simplifies programming, making program development more
-manageable, and making programs more readable.
+manageable and making programs more readable.
@cindex Kernighan, Brian
@cindex Plauger, P.J.@:
@@ -19962,7 +19848,7 @@ often use variable names like these for their own purposes.
The example programs shown in this @value{CHAPTER} all start the names of their
private variables with an underscore (@samp{_}). Users generally don't use
leading underscores in their variable names, so this convention immediately
-decreases the chances that the variable name will be accidentally shared
+decreases the chances that the variable names will be accidentally shared
with the user's program.
@cindex @code{_} (underscore), in names of private variables
@@ -19980,8 +19866,8 @@ show how our own @command{awk} programming style has evolved and to
provide some basis for this discussion.}
As a final note on variable naming, if a function makes global variables
-available for use by a main program, it is a good convention to start that
-variable's name with a capital letter---for
+available for use by a main program, it is a good convention to start those
+variables' names with a capital letter---for
example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables
(@pxref{Getopt Function}).
The leading capital letter indicates that it is global, while the fact that
@@ -19992,7 +19878,7 @@ not one of @command{awk}'s predefined variables, such as @code{FS}.
It is also important that @emph{all} variables in library
functions that do not need to save state are, in fact, declared
local.@footnote{@command{gawk}'s @option{--dump-variables} command-line
-option is useful for verifying this.} If this is not done, the variable
+option is useful for verifying this.} If this is not done, the variables
could accidentally be used in the user's program, leading to bugs that
are very difficult to track down:
@@ -20149,13 +20035,9 @@ be tested with @command{gawk} and the results compared to the built-in
@node Assert Function
@subsection Assertions
-@c STARTOFRANGE asse
@cindex assertions
-@c STARTOFRANGE assef
@cindex @code{assert()} function (C library)
-@c STARTOFRANGE libfass
@cindex libraries of @command{awk} functions, assertions
-@c STARTOFRANGE flibass
@cindex functions, library, assertions
@cindex @command{awk} programs, lengthy, assertions
When writing large programs, it is often useful to know
@@ -20194,7 +20076,7 @@ Following is the function:
@example
@c file eg/lib/assert.awk
-# assert --- assert that a condition is true. Otherwise exit.
+# assert --- assert that a condition is true. Otherwise, exit.
@c endfile
@ignore
@@ -20230,7 +20112,7 @@ is false, it prints a message to standard error, using the @code{string}
parameter to describe the failed condition. It then sets the variable
@code{_assert_exit} to one and executes the @code{exit} statement.
The @code{exit} statement jumps to the @code{END} rule. If the @code{END}
-rules finds @code{_assert_exit} to be true, it exits immediately.
+rule finds @code{_assert_exit} to be true, it exits immediately.
The purpose of the test in the @code{END} rule is to
keep any other @code{END} rules from running. When an assertion fails, the
@@ -20271,10 +20153,6 @@ most likely causing the program to hang as it waits for input.
There is a simple workaround to this:
make sure that such a @code{BEGIN} rule always ends
with an @code{exit} statement.
-@c ENDOFRANGE asse
-@c ENDOFRANGE assef
-@c ENDOFRANGE flibass
-@c ENDOFRANGE libfass
@node Round Function
@subsection Rounding Numbers
@@ -20526,7 +20404,7 @@ all the strings in an array into one long string. The following function,
the application programs
(@pxref{Sample Programs}).
-Good function design is important; this function needs to be general but it
+Good function design is important; this function needs to be general, but it
should also have a reasonable default behavior. It is called with an array
as well as the beginning and ending indices of the elements in the array to be
merged. This assumes that the array indices are numeric---a reasonable
@@ -20674,7 +20552,7 @@ allowed the user to supply an optional timestamp value to use instead
of the current time.
@node Readfile Function
-@subsection Reading a Whole File At Once
+@subsection Reading a Whole File at Once
Often, it is convenient to have the entire contents of a file available
in memory as a single string. A straightforward but naive way to
@@ -20731,13 +20609,13 @@ function readfile(file, tmp, save_rs)
It works by setting @code{RS} to @samp{^$}, a regular expression that
will never match if the file has contents. @command{gawk} reads data from
-the file into @code{tmp} attempting to match @code{RS}. The match fails
+the file into @code{tmp}, attempting to match @code{RS}. The match fails
after each read, but fails quickly, such that @command{gawk} fills
@code{tmp} with the entire contents of the file.
(@DBXREF{Records} for information on @code{RT} and @code{RS}.)
In the case that @code{file} is empty, the return value is the null
-string. Thus calling code may use something like:
+string. Thus, calling code may use something like:
@example
contents = readfile("/some/path")
@@ -20748,7 +20626,7 @@ if (length(contents) == 0)
This tests the result to see if it is empty or not. An equivalent
test would be @samp{contents == ""}.
-@xref{Extension Sample Readfile}, for an extension function that
+@DBXREF{Extension Sample Readfile} for an extension function that
also reads an entire file into memory.
@node Shell Quoting
@@ -20832,11 +20710,8 @@ function shell_quote(s, # parameter
@node Data File Management
@section @value{DDF} Management
-@c STARTOFRANGE dataf
@cindex files, managing
-@c STARTOFRANGE libfdataf
@cindex libraries of @command{awk} functions, managing, data files
-@c STARTOFRANGE flibdataf
@cindex functions, library, managing data files
This @value{SECTION} presents functions that are useful for managing
command-line @value{DF}s.
@@ -20858,8 +20733,8 @@ The @code{BEGIN} and @code{END} rules are each executed exactly once, at
the beginning and end of your @command{awk} program, respectively
(@pxref{BEGIN/END}).
We (the @command{gawk} authors) once had a user who mistakenly thought that the
-@code{BEGIN} rule is executed at the beginning of each @value{DF} and the
-@code{END} rule is executed at the end of each @value{DF}.
+@code{BEGIN} rules were executed at the beginning of each @value{DF} and the
+@code{END} rules were executed at the end of each @value{DF}.
When informed
that this was not the case, the user requested that we add new special
@@ -20899,7 +20774,7 @@ END @{ endfile(FILENAME) @}
This file must be loaded before the user's ``main'' program, so that the
rule it supplies is executed first.
-This rule relies on @command{awk}'s @code{FILENAME} variable that
+This rule relies on @command{awk}'s @code{FILENAME} variable, which
automatically changes for each new @value{DF}. The current @value{FN} is
saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does
not equal @code{_oldfilename}, then a new @value{DF} is being processed and
@@ -20915,7 +20790,7 @@ first @value{DF}.
The program also supplies an @code{END} rule to do the final processing for
the last file. Because this @code{END} rule comes before any @code{END} rules
supplied in the ``main'' program, @code{endfile()} is called first. Once
-again the value of multiple @code{BEGIN} and @code{END} rules should be clear.
+again, the value of multiple @code{BEGIN} and @code{END} rules should be clear.
@cindex @code{beginfile()} user-defined function
@cindex @code{endfile()} user-defined function
@@ -20958,7 +20833,7 @@ how it simplifies writing the main program.
You are probably wondering, if @code{beginfile()} and @code{endfile()}
functions can do the job, why does @command{gawk} have
-@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
+@code{BEGINFILE} and @code{ENDFILE} patterns?
Good question. Normally, if @command{awk} cannot open a file, this
causes an immediate fatal error. In this case, there is no way for a
@@ -20967,13 +20842,14 @@ calling it relies on the file being open and at the first record. Thus,
the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
files that cannot be processed. @code{ENDFILE} exists for symmetry,
and because it provides an easy way to do per-file cleanup processing.
+For more information, refer to @ref{BEGINFILE/ENDFILE}.
@end sidebar
@node Rewind Function
@subsection Rereading the Current File
@cindex files, reading
-Another request for a new built-in function was for a @code{rewind()}
+Another request for a new built-in function was for a
function that would make it possible to reread the current file.
The requesting user didn't want to have to use @code{getline}
(@pxref{Getline})
@@ -20982,7 +20858,7 @@ inside a loop.
However, as long as you are not in the @code{END} rule, it is
quite easy to arrange to immediately close the current input file
and then start over with it from the top.
-For lack of a better name, we'll call it @code{rewind()}:
+For lack of a better name, we'll call the function @code{rewind()}:
@cindex @code{rewind()} user-defined function
@example
@@ -21075,16 +20951,16 @@ See also @ref{ARGC and ARGV}.
Because @command{awk} variable names only allow the English letters,
the regular expression check purposely does not use character classes
such as @samp{[:alpha:]} and @samp{[:alnum:]}
-(@pxref{Bracket Expressions})
+(@pxref{Bracket Expressions}).
@node Empty Files
-@subsection Checking for Zero-length Files
+@subsection Checking for Zero-Length Files
All known @command{awk} implementations silently skip over zero-length files.
This is a by-product of @command{awk}'s implicit
read-a-record-and-match-against-the-rules loop: when @command{awk}
tries to read a record from an empty file, it immediately receives an
-end of file indication, closes the file, and proceeds on to the next
+end-of-file indication, closes the file, and proceeds on to the next
command-line @value{DF}, @emph{without} executing any user-level
@command{awk} program code.
@@ -21149,7 +21025,7 @@ Occasionally, you might not want @command{awk} to process command-line
variable assignments
(@pxref{Assignment Options}).
In particular, if you have a @value{FN} that contains an @samp{=} character,
-@command{awk} treats the @value{FN} as an assignment, and does not process it.
+@command{awk} treats the @value{FN} as an assignment and does not process it.
Some users have suggested an additional command-line option for @command{gawk}
to disable command-line assignments. However, some simple programming with
@@ -21199,22 +21075,14 @@ The use of @code{No_command_assign} allows you to disable command-line
assignments at invocation time, by giving the variable a true value.
When not set, it is initially zero (i.e., false), so the command-line arguments
are left alone.
-@c ENDOFRANGE dataf
-@c ENDOFRANGE flibdataf
-@c ENDOFRANGE libfdataf
@node Getopt Function
@section Processing Command-Line Options
-@c STARTOFRANGE libfclo
@cindex libraries of @command{awk} functions, command-line options
-@c STARTOFRANGE flibclo
@cindex functions, library, command-line options
-@c STARTOFRANGE clop
@cindex command-line options, processing
-@c STARTOFRANGE oclp
@cindex options, command-line, processing
-@c STARTOFRANGE clibf
@cindex functions, library, C library
@cindex arguments, processing
Most utilities on POSIX-compatible systems take options on
@@ -21519,8 +21387,8 @@ BEGIN @{
@c endfile
@end example
-The rest of the @code{BEGIN} rule is a simple test program. Here is the
-result of two sample runs of the test program:
+The rest of the @code{BEGIN} rule is a simple test program. Here are the
+results of two sample runs of the test program:
@example
$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x}
@@ -21566,27 +21434,19 @@ further options
Several of the sample programs presented in
@ref{Sample Programs},
use @code{getopt()} to process their arguments.
-@c ENDOFRANGE libfclo
-@c ENDOFRANGE flibclo
-@c ENDOFRANGE clop
-@c ENDOFRANGE oclp
@node Passwd Functions
@section Reading the User Database
-@c STARTOFRANGE libfudata
@cindex libraries of @command{awk} functions, user database, reading
-@c STARTOFRANGE flibudata
@cindex functions, library, user database@comma{} reading
-@c STARTOFRANGE udatar
@cindex user database@comma{} reading
-@c STARTOFRANGE dataur
@cindex database, users@comma{} reading
@cindex @code{PROCINFO} array
The @code{PROCINFO} array
(@pxref{Built-in Variables})
provides access to the current user's real and effective user and group ID
-numbers, and if available, the user's supplementary group set.
+numbers, and, if available, the user's supplementary group set.
However, because these are numbers, they do not provide very useful
information to the average user. There needs to be some way to find the
user information associated with the user and group ID numbers. This
@@ -21606,7 +21466,7 @@ kept. Instead, it provides the @code{<pwd.h>} header file
and several C language subroutines for obtaining user information.
The primary function is @code{getpwent()}, for ``get password entry.''
The ``password'' comes from the original user database file,
-@file{/etc/passwd}, which stores user information, along with the
+@file{/etc/passwd}, which stores user information along with the
encrypted passwords (hence the name).
@cindex @command{pwcat} program
@@ -21705,7 +21565,7 @@ The user's encrypted password. This may not be available on some systems.
@item User-ID
The user's numeric user ID number.
-(On some systems, it's a C @code{long}, and not an @code{int}. Thus
+(On some systems, it's a C @code{long}, and not an @code{int}. Thus,
we cast it to @code{long} for all cases.)
@item Group-ID
@@ -21832,7 +21692,7 @@ The code that checks for using @code{FPAT}, using @code{using_fpat}
and @code{PROCINFO["FS"]}, is similar.
The main part of the function uses a loop to read database lines, split
-the line into fields, and then store the line into each array as necessary.
+the lines into fields, and then store the lines into each array as necessary.
When the loop is done, @code{@w{_pw_init()}} cleans up by closing the pipeline,
setting @code{@w{_pw_inited}} to one, and restoring @code{FS}
(and @code{FIELDWIDTHS} or @code{FPAT}
@@ -21927,21 +21787,13 @@ and such a change would clutter up the code.
The @command{id} program in @DBREF{Id Program}
uses these functions.
-@c ENDOFRANGE libfudata
-@c ENDOFRANGE flibudata
-@c ENDOFRANGE udatar
-@c ENDOFRANGE dataur
@node Group Functions
@section Reading the Group Database
-@c STARTOFRANGE libfgdata
@cindex libraries of @command{awk} functions, group database, reading
-@c STARTOFRANGE flibgdata
@cindex functions, library, group database@comma{} reading
-@c STARTOFRANGE gdatar
@cindex group database, reading
-@c STARTOFRANGE datagr
@cindex database, group, reading
@cindex @code{PROCINFO} array, and group membership
@cindex @code{getgrent()} function (C library)
@@ -22057,7 +21909,7 @@ it is usually empty or set to @samp{*}.
@item Group ID Number
The group's numeric group ID number;
the association of name to number must be unique within the file.
-(On some systems it's a C @code{long}, and not an @code{int}. Thus
+(On some systems it's a C @code{long}, and not an @code{int}. Thus,
we cast it to @code{long} for all cases.)
@item Group Member List
@@ -22171,32 +22023,32 @@ The @code{@w{_gr_init()}} function first saves @code{FS},
@code{$0}, and then sets @code{FS} and @code{RS} to the correct values for
scanning the group information.
It also takes care to note whether @code{FIELDWIDTHS} or @code{FPAT}
-is being used, and to restore the appropriate field splitting mechanism.
+is being used, and to restore the appropriate field-splitting mechanism.
-The group information is stored is several associative arrays.
+The group information is stored in several associative arrays.
The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number
(@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}).
There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}),
which is a space-separated list of groups to which each user belongs.
-Unlike the user database, it is possible to have multiple records in the
+Unlike in the user database, it is possible to have multiple records in the
database for the same group. This is common when a group has a large number
of members. A pair of such entries might look like the following:
@example
-tvpeople:*:101:johny,jay,arsenio
+tvpeople:*:101:johnny,jay,arsenio
tvpeople:*:101:david,conan,tom,joan
@end example
For this reason, @code{_gr_init()} looks to see if a group name or
-group ID number is already seen. If it is, the usernames are
-simply concatenated onto the previous list of users.@footnote{There is actually a
+group ID number has already been seen. If so, the usernames are
+simply concatenated onto the previous list of users.@footnote{There is a
subtle problem with the code just presented. Suppose that
the first time there were no names. This code adds the names with
a leading comma. It also doesn't check that there is a @code{$4}.}
Finally, @code{_gr_init()} closes the pipeline to @command{grcat}, restores
-@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} if necessary), @code{RS}, and @code{$0},
+@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT}, if necessary), @code{RS}, and @code{$0},
initializes @code{_gr_count} to zero
(it is used later), and makes @code{_gr_inited} nonzero.
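The merging step and the footnote's leading-comma caveat can be sketched as follows. This is a minimal illustration, not the library's `_gr_init()`: the helper name `merge_groups` is hypothetical, it reads group-format records from standard input instead of from `grcat`, and it keys on the group name only.

```sh
# merge_groups: hypothetical helper, not part of the gawk library.
# Reads name:password:gid:members records on standard input and prints
# one name:gid:members line per group, joining repeated records.
merge_groups() {
    awk -F: '
    {
        if (!($1 in members)) {
            order[++n] = $1          # remember first-seen order
            gid[$1] = $3
            members[$1] = ""
        }
        if ($4 != "") {
            # add a comma only when there is something to separate,
            # which avoids the leading comma mentioned in the footnote
            if (members[$1] != "")
                members[$1] = members[$1] ","
            members[$1] = members[$1] $4
        }
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s:%s:%s\n", order[i], gid[order[i]], members[order[i]]
    }'
}
```

Fed the two `tvpeople` records shown earlier, this prints a single line, `tvpeople:101:johnny,jay,arsenio,david,conan,tom,joan`.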
@@ -22264,7 +22116,6 @@ function getgrent()
@}
@c endfile
@end example
-@c ENDOFRANGE clibf
@cindex @code{endgrent()} function (C library)
The @code{endgrent()} function resets @code{_gr_count} to zero so that @code{getgrent()} can
@@ -22297,12 +22148,12 @@ uses these functions.
@DBREF{Arrays of Arrays} described how @command{gawk}
provides arrays of arrays. In particular, any element of
-an array may be either a scalar, or another array. The
+an array may be either a scalar or another array. The
@code{isarray()} function (@pxref{Type Functions})
lets you distinguish an array
from a scalar.
The following function, @code{walk_array()}, recursively traverses
-an array, printing each element's indices and value.
+an array, printing the element indices and values.
You call it with the array and a string representing the name
of the array:
@@ -22353,10 +22204,6 @@ $ @kbd{gawk -f walk_array.awk}
@print{} a[4][2] = 42
@end example
-@c ENDOFRANGE libfgdata
-@c ENDOFRANGE flibgdata
-@c ENDOFRANGE gdatar
-@c ENDOFRANGE libf
@node Library Functions Summary
@section Summary
@@ -22378,24 +22225,24 @@ The functions presented here fit into the following categories:
@c nested list
@table @asis
@item General problems
-Number-to-string conversion, assertions, rounding, random number
+Number-to-string conversion, testing assertions, rounding, random number
generation, converting characters to numbers, joining strings, getting
easily usable time-of-day information, and reading a whole file in
-one shot.
+one shot
@item Managing @value{DF}s
Noting @value{DF} boundaries, rereading the current file, checking for
readable files, checking for zero-length files, and treating assignments
-as @value{FN}s.
+as @value{FN}s
@item Processing command-line options
-An @command{awk} version of the standard C @code{getopt()} function.
+An @command{awk} version of the standard C @code{getopt()} function
@item Reading the user and group databases
-Two sets of routines that parallel the C library versions.
+Two sets of routines that parallel the C library versions
@item Traversing arrays of arrays
-A simple function to traverse an array of arrays to any depth.
+A simple function to traverse an array of arrays to any depth
@end table
@c end nested list
@@ -22470,13 +22317,9 @@ output identical to that of the original version.
@end enumerate
@c EXCLUDE END
-@c ENDOFRANGE flib
-@c ENDOFRANGE fudlib
-@c ENDOFRANGE datagr
@node Sample Programs
@chapter Practical @command{awk} Programs
-@c STARTOFRANGE awkpex
@cindex @command{awk} programs, examples of
@c FULLXREF ON
@@ -22494,10 +22337,10 @@ in this @value{CHAPTER}.
The second presents @command{awk}
versions of several common POSIX utilities.
These are programs that you are hopefully already familiar with,
-and therefore, whose problems are understood.
+and whose problems are therefore understood.
By reimplementing these programs in @command{awk},
you can focus on the @command{awk}-related aspects of solving
-the programming problem.
+the programming problems.
The third is a grab bag of interesting programs.
These solve a number of different data-manipulation and management
@@ -22546,7 +22389,6 @@ cut.awk -- -c1-8 myfiles > results
@node Clones
@section Reinventing Wheels for Fun and Profit
-@c STARTOFRANGE posimawk
@cindex POSIX, programs@comma{} implementing in @command{awk}
This @value{SECTION} presents a number of POSIX utilities implemented in
@@ -22558,7 +22400,7 @@ It should be noted that these programs are not necessarily intended to
replace the installed versions on your system.
Nor may all of these programs be fully compliant with the most recent
POSIX standard. This is not a problem; their
-purpose is to illustrate @command{awk} language programming for ``real world''
+purpose is to illustrate @command{awk} language programming for ``real-world''
tasks.
The programs are presented in alphabetical order.
@@ -22577,11 +22419,8 @@ The programs are presented in alphabetical order.
@subsection Cutting Out Fields and Columns
@cindex @command{cut} utility
-@c STARTOFRANGE cut
@cindex @command{cut} utility
-@c STARTOFRANGE ficut
@cindex fields, cutting
-@c STARTOFRANGE colcut
@cindex columns, cutting
The @command{cut} utility selects, or ``cuts,'' characters or fields
from its standard input and sends them to its standard output.
@@ -22590,7 +22429,7 @@ but you may supply a command-line option to change the field
@dfn{delimiter} (i.e., the field-separator character). @command{cut}'s
definition of fields is less general than @command{awk}'s.
-A common use of @command{cut} might be to pull out just the login name of
+A common use of @command{cut} might be to pull out just the login names of
logged-on users from the output of @command{who}. For example, the following
pipeline generates a sorted, unique list of the logged-on users:
@@ -22889,21 +22728,14 @@ other @command{awk} implementations to use @code{substr()}
it is also extremely painful.
The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem
of picking the input line apart by characters.
-@c ENDOFRANGE cut
-@c ENDOFRANGE ficut
-@c ENDOFRANGE colcut
@node Egrep Program
@subsection Searching for Regular Expressions in Files
-@c STARTOFRANGE regexps
@cindex regular expressions, searching for
-@c STARTOFRANGE sfregexp
@cindex searching, files for regular expressions
-@c STARTOFRANGE fsregexp
@cindex files, searching for regular expressions
-@c STARTOFRANGE egrep
@cindex @command{egrep} utility
The @command{egrep} utility searches files for patterns. It uses regular
expressions that are almost identical to those available in @command{awk}
@@ -23106,7 +22938,7 @@ successful or unsuccessful match. If the line does not match, the
@code{next} statement just moves on to the next record.
A number of additional tests are made, but they are only done if we
-are not counting lines. First, if the user only wants exit status
+are not counting lines. First, if the user only wants the exit status
(@code{no_print} is true), then it is enough to know that @emph{one}
line in this file matched, and we can skip on to the next file with
@code{nextfile}. Similarly, if we are only printing @value{FN}s, we can
@@ -23147,7 +22979,7 @@ if necessary:
@end example
The @code{END} rule takes care of producing the correct exit status. If
-there are no matches, the exit status is one; otherwise it is zero:
+there are no matches, the exit status is one; otherwise, it is zero:
@example
@c file eg/prog/egrep.awk
@@ -23171,17 +23003,12 @@ function usage()
@c endfile
@end example
-@c ENDOFRANGE regexps
-@c ENDOFRANGE sfregexp
-@c ENDOFRANGE fsregexp
-@c ENDOFRANGE egrep
@node Id Program
@subsection Printing Out User Information
@cindex printing, user information
@cindex users, information about, printing
-@c STARTOFRANGE id
@cindex @command{id} utility
The @command{id} utility lists a user's real and effective user ID numbers,
real and effective group ID numbers, and the user's group set, if any.
@@ -23204,7 +23031,8 @@ Here is a simple version of @command{id} written in @command{awk}.
It uses the user database library functions
(@pxref{Passwd Functions})
and the group database library functions
-(@pxref{Group Functions}):
+(@pxref{Group Functions})
+from @ref{Library Functions}.
The program is fairly straightforward. All the work is done in the
@code{BEGIN} rule. The user and group ID numbers are obtained from
@@ -23310,16 +23138,13 @@ code that is used repeatedly, making the whole program
shorter and cleaner. In particular, moving the check for
the empty string into this function saves several lines of code.
-@c ENDOFRANGE id
@node Split Program
@subsection Splitting a Large File into Pieces
@c FIXME: One day, update to current POSIX version of split
-@c STARTOFRANGE filspl
@cindex files, splitting
-@c STARTOFRANGE split
@cindex @code{split} utility
The @command{split} program splits large text files into smaller pieces.
Usage is as follows:@footnote{This is the traditional usage. The
@@ -23334,8 +23159,8 @@ By default,
the output files are named @file{xaa}, @file{xab}, and so on. Each file has
1,000 lines in it, with the likely exception of the last file. To change the
number of lines in each file, supply a number on the command line
-preceded with a minus (e.g., @samp{-500} for files with 500 lines in them
-instead of 1,000). To change the name of the output files to something like
+preceded with a minus sign (e.g., @samp{-500} for files with 500 lines in them
+instead of 1,000). To change the names of the output files to something like
@file{myfileaa}, @file{myfileab}, and so on, supply an additional
argument that specifies the @value{FN} prefix.
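The naming scheme just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the manual's `split.awk`: the helper name `split_lines` is hypothetical, it takes the line count and prefix as plain arguments rather than parsing `-500`, it assumes a positive count, and it stops after the 676 two-letter suffixes are used up.

```sh
# split_lines COUNT PREFIX: hypothetical helper.
# Copies standard input into PREFIXaa, PREFIXab, ... with COUNT lines each.
split_lines() {
    awk -v count="$1" -v prefix="$2" '
    BEGIN { letters = "abcdefghijklmnopqrstuvwxyz" }
    (NR - 1) % count == 0 {
        # first line of a new piece: close the old file, name the next
        if (out != "")
            close(out)
        k = int((NR - 1) / count)       # zero-based piece number
        if (k >= 26 * 26) {
            print "split_lines: too many pieces" > "/dev/stderr"
            exit 1
        }
        out = prefix substr(letters, int(k / 26) + 1, 1) substr(letters, k % 26 + 1, 1)
    }
    { print > out }'
}
```

For example, `split_lines 500 myfile < bigfile` would write `myfileaa`, `myfileab`, and so on, each with 500 lines except possibly the last.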
@@ -23454,15 +23279,12 @@ You might want to consider how to eliminate the use of
way as to solve the EBCDIC issue as well.
@end ifset
-@c ENDOFRANGE filspl
-@c ENDOFRANGE split
@node Tee Program
@subsection Duplicating Output into Multiple Files
@cindex files, multiple@comma{} duplicating output into
@cindex output, duplicating into files
-@c STARTOFRANGE tee
@cindex @code{tee} utility
The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies
its standard input to its standard output and also duplicates it to the
@@ -23575,18 +23397,14 @@ END @{
@}
@c endfile
@end example
-@c ENDOFRANGE tee
@node Uniq Program
@subsection Printing Nonduplicated Lines of Text
@c FIXME: One day, update to current POSIX version of uniq
-@c STARTOFRANGE prunt
@cindex printing, unduplicated lines of text
-@c STARTOFRANGE tpul
@cindex text@comma{} printing, unduplicated lines of
-@c STARTOFRANGE uniq
@cindex @command{uniq} utility
The @command{uniq} utility reads sorted lines of data on its standard
input, and by default removes duplicate lines. In other words, it only
@@ -23855,26 +23673,17 @@ suggestion.
@end ifset
-@c ENDOFRANGE prunt
-@c ENDOFRANGE tpul
-@c ENDOFRANGE uniq
@node Wc Program
@subsection Counting Things
@c FIXME: One day, update to current POSIX version of wc
-@c STARTOFRANGE count
@cindex counting
-@c STARTOFRANGE infco
@cindex input files, counting elements in
-@c STARTOFRANGE woco
@cindex words, counting
-@c STARTOFRANGE chco
@cindex characters, counting
-@c STARTOFRANGE lico
@cindex lines, counting
-@c STARTOFRANGE wc
@cindex @command{wc} utility
The @command{wc} (word count) utility counts lines, words, and characters in
one or more input files. Its usage is as follows:
@@ -24044,13 +23853,6 @@ END @{
@}
@c endfile
@end example
-@c ENDOFRANGE count
-@c ENDOFRANGE infco
-@c ENDOFRANGE lico
-@c ENDOFRANGE woco
-@c ENDOFRANGE chco
-@c ENDOFRANGE wc
-@c ENDOFRANGE posimawk
@node Miscellaneous Programs
@section A Grab Bag of @command{awk} Programs
@@ -24181,9 +23983,7 @@ Aharon Robbins <arnold@skeeve.com> wrote:
@author Erik Quanstrom
@end quotation
-@c STARTOFRANGE tialarm
@cindex time, alarm clock example program
-@c STARTOFRANGE alaex
@cindex alarm clock example program
The following program is a simple ``alarm clock'' program.
You give it a time of day and an optional message. At the specified time,
@@ -24199,7 +23999,7 @@ checking and setting of defaults: the delay, the count, and the message to
print. If the user supplied a message without the ASCII BEL
character (known as the ``alert'' character, @code{"\a"}), then it is added to
the message. (On many systems, printing the ASCII BEL generates an
-audible alert. Thus when the alarm goes off, the system calls attention
+audible alert. Thus, when the alarm goes off, the system calls attention
to itself in case the user is not looking at the computer.)
Just for a change, this program uses a @code{switch} statement
(@pxref{Switch Statement}), but the processing could be done with a series of
@@ -24335,15 +24135,11 @@ seconds are necessary:
@}
@c endfile
@end example
-@c ENDOFRANGE tialarm
-@c ENDOFRANGE alaex
@node Translate Program
@subsection Transliterating Characters
-@c STARTOFRANGE chtra
@cindex characters, transliterating
-@c STARTOFRANGE tr
@cindex @command{tr} utility
The system @command{tr} utility transliterates characters. For example, it is
often used to map uppercase letters into lowercase for further processing:
@@ -24372,7 +24168,7 @@ to @command{gawk}.
@c at least theoretically
The following program was written to
prove that character transliteration could be done with a user-level
-function. This program is not as complete as the system @command{tr} utility
+function. This program is not as complete as the system @command{tr} utility,
but it does most of the job.
The @command{translate} program was written long before @command{gawk}
@@ -24384,13 +24180,13 @@ takes three arguments:
@table @code
@item from
-A list of characters from which to translate.
+A list of characters from which to translate
@item to
-A list of characters to which to translate.
+A list of characters to which to translate
@item target
-The string on which to do the translation.
+The string on which to do the translation
@end table
Associative arrays make the translation part fairly easy. @code{t_ar} holds
@@ -24399,7 +24195,7 @@ loop goes through @code{from}, one character at a time. For each character
in @code{from}, if the character appears in @code{target},
it is replaced with the corresponding @code{to} character.
-The @code{translate()} function calls @code{stranslate()} using @code{$0}
+The @code{translate()} function calls @code{stranslate()}, using @code{$0}
as the target. The main program sets two global variables, @code{FROM} and
@code{TO}, from the command line, and then changes @code{ARGV} so that
@command{awk} reads from the standard input.
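The mapping idea can be sketched compactly. This is an assumption-laden illustration, not the manual's `translate.awk`: the helper name `xlate` and the array name `map` are hypothetical, and the two character lists are passed with `-v`, so backslashes in them would be interpreted as escape sequences.

```sh
# xlate FROM TO: hypothetical helper.
# Replaces each character of FROM found in the input with the
# character at the same position in TO.
xlate() {
    awk -v from="$1" -v to="$2" '
    BEGIN {
        lf = length(from); lt = length(to)
        for (i = 1; i <= lf; i++) {
            # pad a short "to" list with its last character
            j = (i <= lt) ? i : lt
            map[substr(from, i, 1)] = substr(to, j, 1)
        }
    }
    {
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            out = out ((c in map) ? map[c] : c)
        }
        print out
    }'
}
```

With this sketch, `echo hello | xlate elo ipa` prints `hippa`. As with the program described above, ranges such as `a-z` are not expanded and must be spelled out.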
@@ -24421,7 +24217,7 @@ Finally, the processing rule simply calls @code{translate()} for each record:
@c endfile
@end ignore
@c file eg/prog/translate.awk
-# Bugs: does not handle things like: tr A-Z a-z, it has
+# Bugs: does not handle things like tr A-Z a-z; it has
# to be spelled out. However, if `to' is shorter than `from',
# the last character in `to' is used for the rest of `from'.
@@ -24491,17 +24287,13 @@ such as @samp{a-z}, as allowed by the @command{tr} utility.
Look at the code for @file{cut.awk} (@pxref{Cut Program})
for inspiration.
-@c ENDOFRANGE chtra
-@c ENDOFRANGE tr
@node Labels Program
@subsection Printing Mailing Labels
-@c STARTOFRANGE prml
@cindex printing, mailing labels
-@c STARTOFRANGE mlprint
@cindex mailing labels@comma{} printing
-Here is a ``real world''@footnote{``Real world'' is defined as
+Here is a ``real-world''@footnote{``Real world'' is defined as
``a program actually used to get something done.''}
program. This
script reads lists of names and
@@ -24510,7 +24302,7 @@ on it, two across and 10 down. The addresses are guaranteed to be no more
than five lines of data. Each address is separated from the next by a blank
line.
-The basic idea is to read 20 labels worth of data. Each line of each label
+The basic idea is to read 20 labels' worth of data. Each line of each label
is stored in the @code{line} array. The single rule takes care of filling
the @code{line} array and printing the page when 20 labels have been read.
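A simplified sketch of the reading step and of a two-across layout follows. It is not `labels.awk`: the helper name `two_across` is hypothetical, the 40-column label width is an assumption, the use of paragraph mode (`RS = ""`) to read blank-line-separated addresses is this sketch's choice, and the 20-label page break and top and bottom margins are left out.

```sh
# two_across: hypothetical helper.
# Reads blank-line-separated addresses of up to five lines each and
# prints them side by side, two labels per row.
two_across() {
    awk '
    BEGIN { RS = ""; FS = "\n" }    # one record per address, one field per line
    {
        # pad every label to exactly five lines
        for (i = 1; i <= 5; i++)
            line[++nlines] = (i <= NF) ? $i : ""
    }
    END {
        # line[i+j] is a line of the left label; line[i+j+5] is
        # the matching line of the label to its right
        for (i = 1; i <= nlines; i += 10)
            for (j = 0; j < 5; j++)
                printf "%-40s%s\n", line[i+j], line[i+j+5]
    }'
}
```

Padding each label to five lines keeps the index arithmetic fixed, so a short address never shifts the labels that follow it.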
@@ -24533,12 +24325,12 @@ of lines on the page
Most of the work is done in the @code{printpage()} function.
The label lines are stored sequentially in the @code{line} array. But they
-have to print horizontally; @code{line[1]} next to @code{line[6]},
+have to print horizontally: @code{line[1]} next to @code{line[6]},
@code{line[2]} next to @code{line[7]}, and so on. Two loops
accomplish this. The outer loop, controlled by @code{i}, steps through
every 10 lines of data; this is each row of labels. The inner loop,
controlled by @code{j}, goes through the lines within the row.
-As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in
+As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}th line in
the row, and @samp{i+j+5} is the entry next to it. The output ends up
looking something like this:
@@ -24563,7 +24355,6 @@ that there are two blank lines at the top and two blank lines at the bottom.
The @code{END} rule arranges to flush the final page of labels; there may
not have been an even multiple of 20 labels in the data:
-@c STARTOFRANGE labels
@cindex @code{labels.awk} program
@example
@c file eg/prog/labels.awk
@@ -24628,14 +24419,10 @@ END @{
@}
@c endfile
@end example
-@c ENDOFRANGE prml
-@c ENDOFRANGE mlprint
-@c ENDOFRANGE labels
@node Word Sorting
@subsection Generating Word-Usage Counts
-@c STARTOFRANGE worus
@cindex words, usage counts@comma{} generating
When working with large amounts of text, it can be interesting to know
@@ -24661,8 +24448,8 @@ END @{
@}
@end example
-The program relies on @command{awk}'s default field splitting
-mechanism to break each line up into ``words,'' and uses an
+The program relies on @command{awk}'s default field-splitting
+mechanism to break each line up into ``words'' and uses an
associative array named @code{freq}, indexed by each word, to count
the number of times the word occurs. In the @code{END} rule,
it prints the counts.
@@ -24697,7 +24484,6 @@ to remove punctuation characters. Finally, we solve the third problem
by using the system @command{sort} utility to process the output of the
@command{awk} script. Here is the new version of the program:
-@c STARTOFRANGE wordfreq
@cindex @code{wordfreq.awk} program
@example
@c file eg/prog/wordfreq.awk
@@ -24762,16 +24548,13 @@ This way of sorting must be used on systems that do not
have true pipes at the command-line (or batch-file) level.
See the general operating system documentation for more information on how
to use the @command{sort} program.
-@c ENDOFRANGE worus
-@c ENDOFRANGE wordfreq
@node History Sorting
@subsection Removing Duplicates from Unsorted Text
-@c STARTOFRANGE lidu
@cindex lines, duplicate@comma{} removing
The @command{uniq} program
-(@pxref{Uniq Program}),
+(@pxref{Uniq Program})
removes duplicate lines from @emph{sorted} data.
Suppose, however, you need to remove duplicate lines from a @value{DF} but
@@ -24793,7 +24576,6 @@ Each element of @code{lines} is a unique command, and the indices of
The @code{END} rule simply prints out the lines, in order:
@cindex Rakitzis, Byron
-@c STARTOFRANGE histsort
@cindex @code{histsort.awk} program
@example
@c file eg/prog/histsort.awk
@@ -24836,15 +24618,11 @@ print data[lines[i]], lines[i]
@noindent
This works because @code{data[$0]} is incremented each time a line is
seen.
-@c ENDOFRANGE lidu
-@c ENDOFRANGE histsort
@node Extract Program
@subsection Extracting Programs from Texinfo Source Files
-@c STARTOFRANGE texse
@cindex Texinfo, extracting programs from source files
-@c STARTOFRANGE fitex
@cindex files, Texinfo@comma{} extracting programs from
@ifnotinfo
Both this chapter and the previous chapter
@@ -24863,7 +24641,7 @@ Texinfo input file into separate files.
@cindex Texinfo
This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo},
-the GNU project's document formatting language.
+the GNU Project's document formatting language.
A single Texinfo source file can be used to produce both
printed documentation, with @TeX{}, and online documentation.
@ifnotinfo
@@ -24922,7 +24700,7 @@ The Texinfo file looks something like this:
@example
@dots{}
-This program has a @@code@{BEGIN@} rule,
+This program has a @@code@{BEGIN@} rule
that prints a nice message:
@@example
@@ -24948,11 +24726,10 @@ The first rule handles calling @code{system()}, checking that a command is
given (@code{NF} is at least three) and also checking that the command
exits with a zero exit status, signifying OK:
-@c STARTOFRANGE extract
@cindex @code{extract.awk} program
@example
@c file eg/prog/extract.awk
-# extract.awk --- extract files and run programs from texinfo files
+# extract.awk --- extract files and run programs from Texinfo files
@c endfile
@ignore
@c file eg/prog/extract.awk
@@ -24993,12 +24770,12 @@ The second rule handles moving data into files. It verifies that a
@value{FN} is given in the directive. If the file named is not the
current file, then the current file is closed. Keeping the current file
open until a new file is encountered allows the use of the @samp{>}
-redirection for printing the contents, keeping open file management
+redirection for printing the contents, keeping open-file management
simple.
The @code{for} loop does the work. It reads lines using @code{getline}
(@pxref{Getline}).
-For an unexpected end of file, it calls the @code{@w{unexpected_eof()}}
+For an unexpected end-of-file, it calls the @code{@w{unexpected_eof()}}
function. If the line is an ``endfile'' line, then it breaks out of
the loop.
If the line is an @samp{@@group} or @samp{@@end group} line, then it
@@ -25094,16 +24871,13 @@ END @{
@}
@c endfile
@end example
-@c ENDOFRANGE texse
-@c ENDOFRANGE fitex
-@c ENDOFRANGE extract
@node Simple Sed
@subsection A Simple Stream Editor
@cindex @command{sed} utility
@cindex stream editors
-The @command{sed} utility is a stream editor, a program that reads a
+The @command{sed} utility is a @dfn{stream editor}, a program that reads a
stream of data, makes changes to it, and passes it on.
It is often used to make global changes to a large file or to a stream
of data generated by a pipeline of commands.
@@ -25126,7 +24900,6 @@ additional arguments are treated as @value{DF} names to process. If none
are provided, the standard input is used:
@cindex Brennan, Michael
-@c STARTOFRANGE awksed
@cindex @command{awksed.awk} program
@c @cindex simple stream editor
@c @cindex stream editor, simple
@@ -25203,14 +24976,11 @@ The @code{usage()} function prints an error message and exits.
Finally, the single rule handles the printing scheme outlined earlier,
using @code{print} or @code{printf} as appropriate, depending upon the
value of @code{RT}.
-@c ENDOFRANGE awksed
@node Igawk Program
@subsection An Easy Way to Use Library Functions
-@c STARTOFRANGE libfex
@cindex libraries of @command{awk} functions, example program for using
-@c STARTOFRANGE flibex
@cindex functions, library, example program for using
In @ref{Include Files}, we saw how @command{gawk} provides a built-in
file-inclusion capability. However, this is a @command{gawk} extension.
@@ -25252,7 +25022,7 @@ includes don't accidentally include a library function twice.
@command{igawk} should behave just like @command{gawk} externally. This
means it should accept all of @command{gawk}'s command-line arguments,
including the ability to have multiple source files specified via
-@option{-f}, and the ability to mix command-line and library source files.
+@option{-f} and the ability to mix command-line and library source files.
The program is written using the POSIX Shell (@command{sh}) command
language.@footnote{Fully explaining the @command{sh} language is beyond
@@ -25291,7 +25061,7 @@ Run the expanded program with @command{gawk} and any other original command-line
arguments that the user supplied (such as the @value{DF} names).
@end enumerate
-This program uses shell variables extensively: for storing command-line arguments,
+This program uses shell variables extensively: for storing command-line arguments and
the text of the @command{awk} program that will expand the user's program, for the
user's original program, and for the expanded program. Doing so removes some
potential problems that might arise were we to use temporary files instead,
@@ -25349,7 +25119,6 @@ program.
The program is as follows:
-@c STARTOFRANGE igawk
@cindex @code{igawk.sh} program
@example
@c file eg/prog/igawk.sh
@@ -25609,22 +25378,7 @@ Save the results of this processing in the shell variable
The last step is to call @command{gawk} with the expanded program,
along with the original
-options and command-line arguments that the user supplied.
-
-@c this causes more problems than it solves, so leave it out.
-@ignore
-The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk}
-to handle an interesting case. Suppose that the user's program only has
-a @code{BEGIN} rule and there are no @value{DF}s to read.
-The program should exit without reading any @value{DF}s.
-However, suppose that an included library file defines an @code{END}
-rule of its own. In this case, @command{gawk} will hang, reading standard
-input. In order to avoid this, @file{/dev/null} is explicitly added to the
-command line. Reading from @file{/dev/null} always returns an immediate
-end of file indication.
-
-@c Hmm. Add /dev/null if $# is 0? Still messes up ARGV. Sigh.
-@end ignore
+options and command-line arguments that the user supplied:
@example
@c file eg/prog/igawk.sh
@@ -25674,10 +25428,6 @@ features to a program; they can often be layered on top.@footnote{@command{gawk}
does @code{@@include} processing itself in order to support the use
of @command{awk} programs as Web CGI scripts.}
-@c ENDOFRANGE libfex
-@c ENDOFRANGE flibex
-@c ENDOFRANGE awkpex
-@c ENDOFRANGE igawk
@node Anagram Program
@subsection Finding Anagrams from a Dictionary
@@ -25694,19 +25444,18 @@ the same letters
Column 2, Problem C, of Jon Bentley's @cite{Programming Pearls}, Second
Edition, presents an elegant algorithm. The idea is to give words that
are anagrams a common signature, sort all the words together by their
-signature, and then print them. Dr.@: Bentley observes that taking the
-letters in each word and sorting them produces that common signature.
+signatures, and then print them. Dr.@: Bentley observes that taking the
+letters in each word and sorting them produces those common signatures.
The following program uses arrays of arrays to bring together
words with the same signature and array sorting to print the words
in sorted order:
-@c STARTOFRANGE anagram
@cindex @code{anagram.awk} program
@example
@c file eg/prog/anagram.awk
-# anagram.awk --- An implementation of the anagram finding algorithm
-# from Jon Bentley's "Programming Pearls", 2nd edition.
+# anagram.awk --- An implementation of the anagram-finding algorithm
+# from Jon Bentley's "Programming Pearls," 2nd edition.
# Addison Wesley, 2000, ISBN 0-201-65788-0.
# Column 2, Problem C, section 2.8, pp 18-20.
@c endfile
@@ -25754,7 +25503,7 @@ sorts the letters, and then joins them back together:
@example
@c file eg/prog/anagram.awk
-# word2key --- split word apart into letters, sort, joining back together
+# word2key --- split word apart into letters, sort, and join back together
function word2key(word, a, i, n, result)
@{
@@ -25810,7 +25559,6 @@ babery yabber
@dots{}
@end example
-@c ENDOFRANGE anagram
@node Signature Program
@subsection And Now for Something Completely Different
@@ -25950,12 +25698,13 @@ characters. The ability to use @code{split()} with the empty string as
the separator can considerably simplify such tasks.
@item
-The library functions from @ref{Library Functions}, proved their
-usefulness for a number of real (if small) programs.
+The examples here demonstrate the usefulness of the library
+functions from @DBREF{Library Functions}
+for a number of real (if small) programs.
@item
Besides reinventing POSIX wheels, other programs solved a selection of
-interesting problems, such as finding duplicates words in text, printing
+interesting problems, such as finding duplicate words in text, printing
mailing labels, and finding anagrams.
@end itemize
@@ -26130,9 +25879,7 @@ It contains the following chapters:
@node Advanced Features
@chapter Advanced Features of @command{gawk}
-@c STARTOFRANGE gawadv
@cindex @command{gawk}, features, advanced
-@c STARTOFRANGE advgaw
@cindex advanced features, @command{gawk}
@ignore
Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com>
@@ -26153,18 +25900,18 @@ a violent psychopath who knows where you live.}
This @value{CHAPTER} discusses advanced features in @command{gawk}.
It's a bit of a ``grab bag'' of items that are otherwise unrelated
to each other.
-First, a command-line option allows @command{gawk} to recognize
+First, we look at a command-line option that allows @command{gawk} to recognize
nondecimal numbers in input data, not just in @command{awk}
programs.
Then, @command{gawk}'s special features for sorting arrays are presented.
Next, two-way I/O, discussed briefly in earlier parts of this
@value{DOCUMENT}, is described in full detail, along with the basics
-of TCP/IP networking. Finally, @command{gawk}
+of TCP/IP networking. Finally, we see how @command{gawk}
can @dfn{profile} an @command{awk} program, making it possible to tune
it for performance.
@c FULLXREF ON
-A number of advanced features require separate @value{CHAPTER}s of their
+Additional advanced features are discussed in separate @value{CHAPTER}s of their
own:
@itemize @value{BULLET}
@@ -26258,7 +26005,8 @@ This option may disappear in a future version of @command{gawk}.
@node Array Sorting
@section Controlling Array Traversal and Array Sorting
-@command{gawk} lets you control the order in which a @samp{for (i in array)}
+@command{gawk} lets you control the order in which a
+@samp{for (@var{indx} in @var{array})}
loop traverses an array.
In addition, two built-in functions, @code{asort()} and @code{asorti()},
@@ -26274,7 +26022,7 @@ to order the elements during sorting.
@node Controlling Array Traversal
@subsection Controlling Array Traversal
-By default, the order in which a @samp{for (i in array)} loop
+By default, the order in which a @samp{for (@var{indx} in @var{array})} loop
scans an array is not defined; it is generally based upon
the internal implementation of arrays inside @command{awk}.
@@ -26303,23 +26051,23 @@ function comp_func(i1, v1, i2, v2)
@}
@end example
-Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
+Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2}
are the corresponding values of the two elements being compared.
-Either @var{v1} or @var{v2}, or both, can be arrays if the array being
+Either @code{v1} or @code{v2}, or both, can be arrays if the array being
traversed contains subarrays as values.
(@DBXREF{Arrays of Arrays} for more information about subarrays.)
The three possible return values are interpreted as follows:
@table @code
@item comp_func(i1, v1, i2, v2) < 0
-Index @var{i1} comes before index @var{i2} during loop traversal.
+Index @code{i1} comes before index @code{i2} during loop traversal.
@item comp_func(i1, v1, i2, v2) == 0
-Indices @var{i1} and @var{i2}
-come together but the relative order with respect to each other is undefined.
+Indices @code{i1} and @code{i2}
+come together, but the relative order with respect to each other is undefined.
@item comp_func(i1, v1, i2, v2) > 0
-Index @var{i1} comes after index @var{i2} during loop traversal.
+Index @code{i1} comes after index @code{i2} during loop traversal.
@end table
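To make the sign convention concrete before turning to complete examples,
here is a minimal sketch of a comparison function. The name
@code{by_value_then_index} is hypothetical and not part of @command{gawk}.
The sketch assumes scalar element values and string indices; it orders
elements by ascending value and falls back on the indices when two
values tie, so that distinct elements never compare equal:
@example
function by_value_then_index(i1, v1, i2, v2)
@{
    if (v1 < v2)
        return -1    # i1 comes before i2
    if (v1 > v2)
        return 1     # i1 comes after i2
    # Equal values: let the indices decide
    return (i1 < i2) ? -1 : (i1 != i2)
@}
@end example
@noindent
Assigning the function's name to @code{PROCINFO["sorted_in"]}, as in
@samp{PROCINFO["sorted_in"] = "by_value_then_index"}, selects it for
subsequent @samp{for (@var{indx} in @var{array})} loops.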
Our first comparison function can be used to scan an array in
@@ -26480,7 +26228,7 @@ As already mentioned, the order of the indices is arbitrary if two
elements compare equal. This is usually not a problem, but letting
the tied elements come out in arbitrary order can be an issue, especially
when comparing item values. The partial ordering of the equal elements
-may change the next time the array is traversed, if other elements are added or
+may change the next time the array is traversed, if other elements are added to or
removed from the array. One way to resolve ties when comparing elements
with otherwise equal values is to include the indices in the comparison
rules. Note that doing this may make the loop traversal less efficient,
@@ -26523,7 +26271,7 @@ equivalent or distinct.
Another point to keep in mind is that in the case of subarrays,
the element values can themselves be arrays; a production comparison
function should use the @code{isarray()} function
-(@pxref{Type Functions}),
+(@pxref{Type Functions})
to check for this, and choose a defined sorting order for subarrays.
All sorting based on @code{PROCINFO["sorted_in"]}
@@ -26531,7 +26279,7 @@ is disabled in POSIX mode,
because the @code{PROCINFO} array is not special in that case.
As a side note, sorting the array indices before traversing
-the array has been reported to add 15% to 20% overhead to the
+the array has been reported to add a 15% to 20% overhead to the
execution time of @command{awk} programs. For this reason,
sorted array traversal is not the default.
@@ -26590,7 +26338,7 @@ However, the @code{source} array is not affected.
Often, what's needed is to sort on the values of the @emph{indices}
instead of the values of the elements. To do that, use the
@code{asorti()} function. The interface and behavior are identical to
-that of @code{asort()}, except that the index values are used for sorting,
+that of @code{asort()}, except that the index values are used for sorting
and become the values of the result array:
@example
@@ -26625,8 +26373,8 @@ it chooses}, taking into account just the indices, just the values,
or both. This is extremely powerful.
Once the array is sorted, @code{asort()} takes the @emph{values} in
-their final order, and uses them to fill in the result array, whereas
-@code{asorti()} takes the @emph{indices} in their final order, and uses
+their final order and uses them to fill in the result array, whereas
+@code{asorti()} takes the @emph{indices} in their final order and uses
them to fill in the result array.
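The difference can be seen in a short sketch that uses the default
ordering (the array contents and the names @code{data}, @code{byval},
and @code{byidx} are made up for illustration). Passing a second array
leaves the source array untouched:
@example
BEGIN @{
    data["x"] = "banana"; data["a"] = "cherry"; data["m"] = "apple"
    n = asort(data, byval)    # byval[1..n]: the values, sorted
    asorti(data, byidx)       # byidx[1..n]: the indices, sorted
    for (i = 1; i <= n; i++)
        print i, byval[i], byidx[i]
@}
@print{} 1 apple a
@print{} 2 banana m
@print{} 3 cherry x
@end example
@noindent
The two result arrays are sorted independently, so @code{byval[i]} and
@code{byidx[i]} need not come from the same element of @code{data}.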
@cindex reference counting, sorting arrays
@@ -26842,7 +26590,6 @@ using regular pipes.
@section Using @command{gawk} for Network Programming
@cindex advanced features, network programming
@cindex networks, programming
-@c STARTOFRANGE tcpip
@cindex TCP/IP
@cindex @code{/inet/@dots{}} special files (@command{gawk})
@cindex files, @code{/inet/@dots{}} (@command{gawk})
@@ -26924,7 +26671,7 @@ service name.
@cindex @command{gawk}, @code{ERRNO} variable in
@cindex @code{ERRNO} variable
@quotation NOTE
-Failure in opening a two-way socket will result in a non-fatal error
+Failure in opening a two-way socket will result in a nonfatal error
being returned to the calling code. The value of @code{ERRNO} indicates
the error (@pxref{Auto-set}).
@end quotation
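As a hedged sketch of checking for such a failure (the local system may
or may not run a TCP @code{daytime} server, and the variable names are
illustrative), the return value of @code{getline} can be tested
before the result is used:
@example
BEGIN @{
    service = "/inet/tcp/0/localhost/daytime"
    if ((service |& getline line) > 0)
        print line
    else
        # No data was read; ERRNO may describe the failure
        print "cannot read from " service ": " ERRNO > "/dev/stderr"
    close(service)
@}
@end example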
@@ -26941,31 +26688,28 @@ BEGIN @{
@end example
This program reads the current date and time from the local system's
-TCP @samp{daytime} server.
+TCP @code{daytime} server.
It then prints the results and closes the connection.
Because this topic is extensive, the use of @command{gawk} for
TCP/IP programming is documented separately.
@ifinfo
See
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}},
+@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}},
@end ifinfo
@ifnotinfo
See
@uref{http://www.gnu.org/software/gawk/manual/gawkinet/,
-@cite{TCP/IP Internetworking with @command{gawk}}},
+@cite{@value{GAWKINETTITLE}}},
which comes as part of the @command{gawk} distribution,
@end ifnotinfo
for a much more complete introduction and discussion, as well as
extensive examples.
-@c ENDOFRANGE tcpip
@node Profiling
@section Profiling Your @command{awk} Programs
-@c STARTOFRANGE awkp
@cindex @command{awk} programs, profiling
-@c STARTOFRANGE proawk
@cindex profiling @command{awk} programs
@cindex @code{awkprof.out} file
@cindex files, @code{awkprof.out}
@@ -27032,9 +26776,9 @@ junk
@end example
Here is the @file{awkprof.out} that results from running the
-@command{gawk} profiler on this program and data. (This example also
+@command{gawk} profiler on this program and data (this example also
illustrates that @command{awk} programmers sometimes get up very early
-in the morning to work.)
+in the morning to work):
@cindex @code{BEGIN} pattern, and profiling
@cindex @code{END} pattern, and profiling
@@ -27094,8 +26838,8 @@ They are as follows:
@item
The program is printed in the order @code{BEGIN} rules,
@code{BEGINFILE} rules,
-pattern/action rules,
-@code{ENDFILE} rules, @code{END} rules and functions, listed
+pattern--action rules,
+@code{ENDFILE} rules, @code{END} rules, and functions, listed
alphabetically.
Multiple @code{BEGIN} and @code{END} rules retain their
separate identities, as do
@@ -27103,7 +26847,7 @@ multiple @code{BEGINFILE} and @code{ENDFILE} rules.
@cindex patterns, counts, in a profile
@item
-Pattern-action rules have two counts.
+Pattern--action rules have two counts.
The first count, to the left of the rule, shows how many times
the rule's pattern was @emph{tested}.
The second count, to the right of the rule's opening left brace
@@ -27170,13 +26914,13 @@ the target of a redirection isn't a scalar, it gets parenthesized.
@command{gawk} supplies leading comments in
front of the @code{BEGIN} and @code{END} rules,
the @code{BEGINFILE} and @code{ENDFILE} rules,
-the pattern/action rules, and the functions.
+the pattern--action rules, and the functions.
@end itemize
The profiled version of your program may not look exactly like what you
typed when you wrote it. This is because @command{gawk} creates the
-profiled version by ``pretty printing'' its internal representation of
+profiled version by ``pretty-printing'' its internal representation of
the program. The advantage to this is that @command{gawk} can produce
a standard representation.
Also, things such as:
@@ -27259,16 +27003,16 @@ If you use the @code{HUP} signal instead of the @code{USR1} signal,
@cindex @code{SIGQUIT} signal (MS-Windows)
@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
When @command{gawk} runs on MS-Windows systems, it uses the
-@code{INT} and @code{QUIT} signals for producing the profile and, in
+@code{INT} and @code{QUIT} signals for producing the profile, and in
the case of the @code{INT} signal, @command{gawk} exits. This is
because these systems don't support the @command{kill} command, so the
only signals you can deliver to a program are those generated by the
keyboard. The @code{INT} signal is generated by the
-@kbd{Ctrl-@key{C}} or @kbd{Ctrl-@key{BREAK}} key, while the
-@code{QUIT} signal is generated by the @kbd{Ctrl-@key{\}} key.
+@kbd{Ctrl-c} or @kbd{Ctrl-BREAK} key, while the
+@code{QUIT} signal is generated by the @kbd{Ctrl-\} key.
Finally, @command{gawk} also accepts another option, @option{--pretty-print}.
-When called this way, @command{gawk} ``pretty prints'' the program into
+When called this way, @command{gawk} ``pretty-prints'' the program into
@file{awkprof.out}, without any execution counts.
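For instance, assuming a program in a hypothetical file @file{myprog.awk}
and input in @file{mydata}, the two modes might be invoked like this.
Both options accept an optional @samp{=@var{file}} argument that names the
output file in place of @file{awkprof.out}:
@example
$ @kbd{gawk --profile=myprog.prof -f myprog.awk mydata}
$ @kbd{gawk --pretty-print=myprog.pretty -f myprog.awk mydata}
@end example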
@quotation NOTE
@@ -27292,9 +27036,6 @@ that the profiling output does. This makes it easy to pretty-print your
code once development is completed, and then use the result as the final
version of your program.
-@c ENDOFRANGE awkp
-@c ENDOFRANGE proawk
-
@node Advanced Features Summary
@section Summary
@@ -27325,7 +27066,7 @@ optionally, close off one side of the two-way communications.
@item
By using special @value{FN}s with the @samp{|&} operator, you can open a
-TCP/IP (or UDP/IP) connection to remote hosts in the Internet. @command{gawk}
+TCP/IP (or UDP/IP) connection to remote hosts on the Internet. @command{gawk}
supports both IPv4 and IPv6.
@item
@@ -27335,13 +27076,11 @@ you tune them more easily. Sending the @code{USR1} signal while profiling cause
@command{gawk} to dump the profile and keep going, including a function call stack.
@item
-You can also just ``pretty print'' the program. This currently also runs
+You can also just ``pretty-print'' the program. This currently also runs
the program, but that will change in the next major release.
@end itemize
-@c ENDOFRANGE advgaw
-@c ENDOFRANGE gawadv
@node Internationalization
@chapter Internationalization with @command{gawk}
@@ -27354,7 +27093,6 @@ countries, they were able to sell more systems.
As a result, internationalization and localization
of programs and software systems became a common practice.
-@c STARTOFRANGE inloc
@cindex internationalization, localization
@cindex @command{gawk}, internationalization and, See internationalization
@cindex internationalization, localization, @command{gawk} and
@@ -27387,7 +27125,7 @@ a requirement.
@cindex localization
@dfn{Internationalization} means writing (or modifying) a program once,
in such a way that it can use multiple languages without requiring
-further source-code changes.
+further source code changes.
@dfn{Localization} means providing the data necessary for an
internationalized program to work in a particular language.
Most typically, these terms refer to features such as the language
@@ -27399,11 +27137,10 @@ monetary values are printed and read.
@section GNU @command{gettext}
@cindex internationalizing a program
-@c STARTOFRANGE gettex
@cindex @command{gettext} library
@command{gawk} uses GNU @command{gettext} to provide its internationalization
features.
-The facilities in GNU @command{gettext} focus on messages; strings printed
+The facilities in GNU @command{gettext} focus on messages: strings printed
by a program, either directly or via formatting with @code{printf} or
@code{sprintf()}.@footnote{For some operating systems, the @command{gawk}
port doesn't support GNU @command{gettext}.
@@ -27451,7 +27188,6 @@ lookup of the translations.
@cindex @code{.po} files
@cindex files, @code{.po}
-@c STARTOFRANGE portobfi
@cindex portable object files
@cindex files, portable object
@item
@@ -27463,7 +27199,6 @@ For example, there might be a @file{fr.po} for a French translation.
@cindex @code{.gmo} files
@cindex files, @code{.gmo}
@cindex message object files
-@c STARTOFRANGE portmsgfi
@cindex files, message object
@item
Each language's @file{.po} file is converted into a binary
@@ -27591,14 +27326,12 @@ before or after the day in a date, local month abbreviations, and so on.
@item LC_ALL
All of the above. (Not too useful in the context of @command{gettext}.)
@end table
-@c ENDOFRANGE gettex
@node Programmer i18n
@section Internationalizing @command{awk} Programs
-@c STARTOFRANGE inap
@cindex @command{awk} programs, internationalizing
-@command{gawk} provides the following variables and functions for
+@command{gawk} provides the following variables for
internationalization:
@table @code
@@ -27614,7 +27347,12 @@ value is @code{"messages"}.
String constants marked with a leading underscore
are candidates for translation at runtime.
String constants without a leading underscore are not translated.
+@end table
+@command{gawk} provides the following functions for
+internationalization:
+
+@table @code
@cindexgawkfunc{dcgettext}
@item @code{dcgettext(@var{string}} [@code{,} @var{domain} [@code{,} @var{category}]]@code{)}
Return the translation of @var{string} in
@@ -27671,15 +27409,7 @@ If @var{directory} is the null string (@code{""}), then
given @var{domain}.
@end table
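As a minimal sketch of how these pieces fit together (the text domain
@code{"guide"}, the directory @code{"."}, and the message strings are
assumptions chosen for illustration), a program might look like this:
@example
BEGIN @{
    TEXTDOMAIN = "guide"       # text domain for this program
    bindtextdomain(".")        # look for translations under "."
    print _"Don't Panic"       # marked for translation at runtime
    # Explicit lookup, naming the domain and the category
    print dcgettext("The Answer Is", "guide", "LC_MESSAGES")
@}
@end example
@noindent
If no translation is found, the original strings are printed unchanged.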
-To use these facilities in your @command{awk} program, follow the steps
-outlined in
-@ifnotinfo
-the previous @value{SECTION},
-@end ifnotinfo
-@ifinfo
-@ref{Explaining gettext},
-@end ifinfo
-like so:
+To use these facilities in your @command{awk} program, follow these steps:
@enumerate
@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
@@ -27828,8 +27558,6 @@ to provide you translations that you can also then distribute.
@DBXREF{I18N Example}
for the full list of steps to go through to create and test
translations for @command{guide}.
-@c ENDOFRANGE portobfi
-@c ENDOFRANGE portmsgfi
@node Printf Ordering
@subsection Rearranging @code{printf} Arguments
@@ -27964,7 +27692,7 @@ the null string (@code{""}) as its value, leaving the original string constant a
the result.
@item
-By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}
+By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()},
and @code{bindtextdomain()}, the @command{awk} program can be made to run, but
all the messages are output in the original language.
For example:
@@ -28005,7 +27733,6 @@ However, because the positional specifications are primarily for use in
@emph{translated} format strings, and because non-GNU @command{awk}s never
retrieve the translated string, this should not be a problem in practice.
@end itemize
-@c ENDOFRANGE inap
@node I18N Example
@section A Simple Internationalization Example
@@ -28149,15 +27876,15 @@ using the GNU @command{gettext} package.
(GNU @command{gettext} is described in
complete detail in
@ifinfo
-@inforef{Top, , GNU @command{gettext} utilities, gettext, GNU gettext tools}.)
+@inforef{Top, , GNU @command{gettext} utilities, gettext, GNU @command{gettext} utilities}.)
@end ifinfo
@ifnotinfo
@uref{http://www.gnu.org/software/gettext/manual/,
-@cite{GNU gettext tools}}.)
+@cite{GNU @command{gettext} utilities}}.)
@end ifnotinfo
As of this writing, the latest version of GNU @command{gettext} is
-@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.3.tar.gz,
-@value{PVERSION} 0.19.3}.
+@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.4.tar.gz,
+@value{PVERSION} 0.19.4}.
If a translation of @command{gawk}'s messages exists,
then @command{gawk} produces usage messages, warnings,
@@ -28169,7 +27896,7 @@ and fatal errors in the local language.
@itemize @value{BULLET}
@item
Internationalization means writing a program such that it can use multiple
-languages without requiring source-code changes. Localization means
+languages without requiring source code changes. Localization means
providing the data necessary for an internationalized program to work
in a particular language.
@@ -28186,9 +27913,9 @@ file, and the @file{.po} files are compiled into @file{.gmo} files for
use at runtime.
@item
-You can use position specifications with @code{sprintf()} and
+You can use positional specifications with @code{sprintf()} and
@code{printf} to rearrange the placement of argument values in formatted
-strings and output. This is useful for the translations of format
+strings and output. This is useful for the translation of format
control strings.
@item
@@ -28201,7 +27928,6 @@ a number of translations for its messages.
@end itemize
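As a small illustration of the positional specifications mentioned in the
summary (the strings are arbitrary), @samp{%@var{n}$} selects the
@var{n}th argument regardless of the order in which the arguments are given:
@example
$ @kbd{gawk 'BEGIN @{ printf "%2$s, %1$s\n", "world", "hello" @}'}
@print{} hello, world
@end example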
-@c ENDOFRANGE inloc
@node Debugger
@chapter Debugging @command{awk} Programs
@@ -28245,8 +27971,7 @@ the discussion of debugging in @command{gawk}.
@subsection Debugging in General
(If you have used debuggers in other languages, you may want to skip
-ahead to the next section on the specific features of the @command{gawk}
-debugger.)
+ahead to @ref{Awk Debugging}.)
Of course, a debugging program cannot remove bugs for you, because it has
no way of knowing what you or your users consider a ``bug'' versus a
@@ -28337,10 +28062,10 @@ and usually find the errant code quite quickly.
@end table
@node Awk Debugging
-@subsection Awk Debugging
+@subsection @command{awk} Debugging
Debugging an @command{awk} program has some specific aspects that are
-not shared with other programming languages.
+not shared with programs written in other languages.
First of all, the fact that @command{awk} programs usually take input
line by line from a file or files and operate on those lines using specific
@@ -28356,7 +28081,7 @@ to look at the individual primitive instructions carried out
by the higher-level @command{awk} commands.
@node Sample Debugging Session
-@section Sample Debugging Session
+@section Sample @command{gawk} Debugging Session
@cindex sample debugging session
In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample
@@ -28375,8 +28100,8 @@ as our example.
@cindex debugger, how to start
Starting the debugger is almost exactly like running @command{gawk} normally,
-except you have to pass an additional option @option{--debug}, or the
-corresponding short option @option{-D}. The file(s) containing the
+except you have to pass an additional option, @option{--debug}, or the
+corresponding short option, @option{-D}. The file(s) containing the
program and any supporting code are given on the command line as arguments
to one or more @option{-f} options. (@command{gawk} is not designed
to debug command-line programs, only programs contained in files.)
@@ -28389,7 +28114,7 @@ $ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk -1 inputfile}
@noindent
where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}.
(Experienced users of GDB or similar debuggers should note that
-this syntax is slightly different from what they are used to.
+this syntax is slightly different from what you are used to.
With the @command{gawk} debugger, you give the arguments for running the program
in the command line to the debugger rather than as part of the @code{run}
command at the debugger prompt.)
@@ -28543,10 +28268,10 @@ gawk> @kbd{n}
@end example
This tells us that @command{gawk} is now ready to execute line 66, which
-decides whether to give the lines the special ``field skipping'' treatment
+decides whether to give the lines the special ``field-skipping'' treatment
indicated by the @option{-1} command-line option. (Notice that we skipped
-from where we were before at line 63 to here, because the condition in line 63
-@samp{if (fcount == 0 && charcount == 0)} was false.)
+from where we were before, at line 63, to here, because the condition
+in line 63, @samp{if (fcount == 0 && charcount == 0)}, was false.)
Continuing to step, we now get to the splitting of the current and
last records:
@@ -28620,7 +28345,7 @@ gawk> @kbd{n}
Well, here we are at our error (sorry to spoil the suspense). What we
had in mind was to join the fields starting from the second one to make
-the virtual record to compare, and if the first field was numbered zero,
+the virtual record to compare, and if the first field were numbered zero,
this would work. Let's look at what we've got:
@example
@@ -28629,7 +28354,7 @@ gawk> @kbd{p cline clast}
@print{} clast = "awk is a wonderful program!"
@end example
-Hey, those look pretty familiar! They're just our original, unaltered,
+Hey, those look pretty familiar! They're just our original, unaltered
input records. A little thinking (the human brain is still the best
debugging tool), and we realize that we were off by one!
@@ -28679,11 +28404,11 @@ Miscellaneous
@end itemize
Each of these is discussed in the following subsections.
-In the following descriptions, commands which may be abbreviated
+In the following descriptions, commands that may be abbreviated
show the abbreviation on a second description line.
A debugger command name may also be truncated if that partial
name is unambiguous. The debugger has the built-in capability to
-automatically repeat the previous command just by hitting @key{Enter}.
+automatically repeat the previous command just by hitting @kbd{Enter}.
This works for the commands @code{list}, @code{next}, @code{nexti},
@code{step}, @code{stepi}, and @code{continue} executed without any
argument.
@@ -28733,7 +28458,7 @@ Set a breakpoint at entry to (the first instruction of)
function @var{function}.
@end table
-Each breakpoint is assigned a number which can be used to delete it from
+Each breakpoint is assigned a number that can be used to delete it from
the breakpoint list using the @code{delete} command.
With a breakpoint, you may also supply a condition. This is an
@@ -28785,7 +28510,7 @@ watchpoint is made unconditional).
@cindex breakpoint, delete by number
@item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
@itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
-Delete specified breakpoints or a range of breakpoints. Deletes
+Delete specified breakpoints or a range of breakpoints. Delete
all defined breakpoints if no argument is supplied.
@cindex debugger commands, @code{disable}
@@ -28794,7 +28519,7 @@ all defined breakpoints if no argument is supplied.
@cindex breakpoint, how to disable or enable
@item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}]
Disable specified breakpoints or a range of breakpoints. Without
-any argument, disables all breakpoints.
+any argument, disable all breakpoints.
@cindex debugger commands, @code{e} (@code{enable})
@cindex debugger commands, @code{enable}
@@ -28804,18 +28529,18 @@ any argument, disables all breakpoints.
@item @code{enable} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
@itemx @code{e} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
Enable specified breakpoints or a range of breakpoints. Without
-any argument, enables all breakpoints.
-Optionally, you can specify how to enable the breakpoint:
+any argument, enable all breakpoints.
+Optionally, you can specify how to enable the breakpoints:
@c nested table
@table @code
@item del
-Enable the breakpoint(s) temporarily, then delete it when
-the program stops at the breakpoint.
+Enable the breakpoints temporarily, then delete each one when
+the program stops at it.
@item once
-Enable the breakpoint(s) temporarily, then disable it when
-the program stops at the breakpoint.
+Enable the breakpoints temporarily, then disable each one when
+the program stops at it.
@end table
@cindex debugger commands, @code{ignore}
@@ -28883,7 +28608,7 @@ gawk>
@item @code{continue} [@var{count}]
@itemx @code{c} [@var{count}]
Resume program execution. If continued from a breakpoint and @var{count} is
-specified, ignores the breakpoint at that location the next @var{count} times
+specified, ignore the breakpoint at that location the next @var{count} times
before stopping.
@cindex debugger commands, @code{finish}
@@ -28937,7 +28662,7 @@ automatic display variables, and debugger options.
@item @code{step} [@var{count}]
@itemx @code{s} [@var{count}]
Continue execution until control reaches a different source line in the
-current stack frame. @code{step} steps inside any function called within
+current stack frame, stepping inside any function called within
the line. If the argument @var{count} is supplied, step that many times before
stopping, unless a breakpoint or watchpoint is encountered.
@@ -29050,7 +28775,7 @@ or field.
String values must be enclosed between double quotes (@code{"}@dots{}@code{"}).
You can also set special @command{awk} variables, such as @code{FS},
-@code{NF}, @code{NR}, and son on.
+@code{NF}, @code{NR}, and so on.
@cindex debugger commands, @code{w} (@code{watch})
@cindex debugger commands, @code{watch}
@@ -29062,7 +28787,7 @@ You can also set special @command{awk} variables, such as @code{FS},
Add variable @var{var} (or field @code{$@var{n}}) to the watch list.
The debugger then stops whenever
the value of the variable or field changes. Each watched item is assigned a
-number which can be used to delete it from the watch list using the
+number that can be used to delete it from the watch list using the
@code{unwatch} command.
With a watchpoint, you may also supply a condition. This is an
@@ -29090,11 +28815,11 @@ watch list.
@node Execution Stack
@subsection Working with the Stack
-Whenever you run a program which contains any function calls,
+Whenever you run a program that contains any function calls,
@command{gawk} maintains a stack of all of the function calls leading up
to where the program is right now. You can see how you got to where you are,
and also move around in the stack to see what the state of things was in the
-functions which called the one you are in. The commands for doing this are:
+functions that called the one you are in. The commands for doing this are:
@table @asis
@cindex debugger commands, @code{bt} (@code{backtrace})
@@ -29129,8 +28854,8 @@ Then select and print the frame.
@item @code{frame} [@var{n}]
@itemx @code{f} [@var{n}]
Select and print stack frame @var{n}. Frame 0 is the currently executing,
-or @dfn{innermost}, frame (function call), frame 1 is the frame that
-called the innermost one. The highest numbered frame is the one for the
+or @dfn{innermost}, frame (function call); frame 1 is the frame that
+called the innermost one. The highest-numbered frame is the one for the
main program. The printed information consists of the frame number,
function and argument names, source file, and the source line.
@@ -29146,7 +28871,7 @@ Then select and print the frame.
Besides looking at the values of variables, there is often a need to get
other sorts of information about the state of your program and of the
-debugging environment itself. The @command{gawk} debugger has one command which
+debugging environment itself. The @command{gawk} debugger has one command that
provides this information, appropriately called @code{info}. @code{info}
is used with one of a number of arguments that tell it exactly what
you want to know:
@@ -29234,12 +28959,12 @@ The available options are:
@table @asis
@item @code{history_size}
@cindex debugger history size
-The maximum number of lines to keep in the history file @file{./.gawk_history}.
-The default is 100.
+Set the maximum number of lines to keep in the history file
+@file{./.gawk_history}. The default is 100.
@item @code{listsize}
@cindex debugger default list amount
-The number of lines that @code{list} prints. The default is 15.
+Specify the number of lines that @code{list} prints. The default is 15.
@item @code{outfile}
@cindex redirect @command{gawk} output, in debugger
@@ -29249,7 +28974,7 @@ standard output.
@item @code{prompt}
@cindex debugger prompt
-The debugger prompt. The default is @samp{@w{gawk> }}.
+Change the debugger prompt. The default is @samp{@w{gawk> }}.
@item @code{save_history} [@code{on} | @code{off}]
@cindex debugger history file
@@ -29260,7 +28985,7 @@ The default is @code{on}.
@cindex save debugger options
Save current options to file @file{./.gawkrc} upon exit.
The default is @code{on}.
-Options are read back in to the next session upon startup.
+Options are read back into the next session upon startup.
@item @code{trace} [@code{on} | @code{off}]
@cindex instruction tracing, in debugger
@@ -29283,7 +29008,7 @@ command in the file. Also, the list of commands may include additional
@code{source} commands; however, the @command{gawk} debugger will not source the
same file more than once in order to avoid infinite recursion.
-In addition to, or instead of the @code{source} command, you can use
+In addition to, or instead of, the @code{source} command, you can use
the @option{-D @var{file}} or @option{--debug=@var{file}} command-line
options to execute commands from a file non-interactively
(@pxref{Options}).
@@ -29292,16 +29017,16 @@ options to execute commands from a file non-interactively
@node Miscellaneous Debugger Commands
@subsection Miscellaneous Commands
-There are a few more commands which do not fit into the
+There are a few more commands that do not fit into the
previous categories, as follows:
@table @asis
@cindex debugger commands, @code{dump}
@cindex @code{dump} debugger command
@item @code{dump} [@var{filename}]
-Dump bytecode of the program to standard output or to the file
+Dump byte code of the program to standard output or to the file
named in @var{filename}. This prints a representation of the internal
-instructions which @command{gawk} executes to implement the @command{awk}
+instructions that @command{gawk} executes to implement the @command{awk}
commands in a program. This can be very enlightening, as the following
partial dump of Davide Brini's obfuscated code
(@pxref{Signature Program}) demonstrates:
@@ -29398,7 +29123,7 @@ Print lines centered around line number @var{n} in
source file @var{filename}. This command may change the current source file.
@item @var{function}
-Print lines centered around beginning of the
+Print lines centered around the beginning of the
function @var{function}. This command may change the current source file.
@end table
@@ -29410,16 +29135,16 @@ function @var{function}. This command may change the current source file.
@item @code{quit}
@itemx @code{q}
Exit the debugger. Debugging is great fun, but sometimes we all have
-to tend to other obligations in life, and sometimes we find the bug,
+to tend to other obligations in life, and sometimes we find the bug
and are free to go on to the next one! As we saw earlier, if you are
-running a program, the debugger warns you if you accidentally type
+running a program, the debugger warns you when you type
@samp{q} or @samp{quit}, to make sure you really want to quit.
@cindex debugger commands, @code{trace}
@cindex @code{trace} debugger command
@item @code{trace} [@code{on} | @code{off}]
-Turn on or off a continuous printing of instructions which are about to
-be executed, along with printing the @command{awk} line which they
+Turn on or off continuous printing of the instructions that are about to
+be executed, along with the @command{awk} lines they
implement. The default is @code{off}.
It is to be hoped that most of the ``opcodes'' in these instructions are
@@ -29435,7 +29160,7 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while
If @command{gawk} is compiled with
@uref{http://cnswww.cns.cwru.edu/php/chet/readline/readline.html,
-the @code{readline} library}, you can take advantage of that library's
+the GNU Readline library}, you can take advantage of that library's
command completion and history expansion features. The following types
of completion are available:
@@ -29472,7 +29197,7 @@ and
We hope you find the @command{gawk} debugger useful and enjoyable to work with,
but as with any program, especially in its early releases, it still has
-some limitations. A few which are worth being aware of are:
+some limitations. A few that are worth being aware of are:
@itemize @value{BULLET}
@item
@@ -29488,13 +29213,13 @@ If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands}
(or if you are already familiar with @command{gawk} internals),
you will realize that much of the internal manipulation of data
in @command{gawk}, as in many interpreters, is done on a stack.
-@code{Op_push}, @code{Op_pop}, and the like, are the ``bread and butter'' of
+@code{Op_push}, @code{Op_pop}, and the like are the ``bread and butter'' of
most @command{gawk} code.
Unfortunately, as of now, the @command{gawk}
debugger does not allow you to examine the stack's contents.
That is, the intermediate results of expression evaluation are on the
-stack, but cannot be printed. Rather, only variables which are defined
+stack, but cannot be printed. Rather, only variables that are defined
in the program can be printed. Of course, a workaround for
this is to use more explicit variables at the debugging stage and then
change back to obscure, perhaps more optimal code later.
@@ -29508,12 +29233,12 @@ programmer, you are expected to know the meaning of
@item
The @command{gawk} debugger is designed to be used by running a program (with all its
parameters) on the command line, as described in @ref{Debugger Invocation}.
-There is no way (as of now) to attach or ``break in'' to a running program.
-This seems reasonable for a language which is used mainly for quickly
+There is no way (as of now) to attach or ``break into'' a running program.
+This seems reasonable for a language that is used mainly for quickly
executing, short programs.
@item
-The @command{gawk} debugger only accepts source supplied with the @option{-f} option.
+The @command{gawk} debugger only accepts source code supplied with the @option{-f} option.
@end itemize
@ignore
@@ -29527,8 +29252,8 @@ be added, and of course feel free to try to add them yourself!
@itemize @value{BULLET}
@item
Programs rarely work correctly the first time. Finding bugs
-is @dfn{debugging} and a program that helps you find bugs is a
-@dfn{debugger}. @command{gawk} has a built-in debugger that works very
+is called debugging, and a program that helps you find bugs is a
+debugger. @command{gawk} has a built-in debugger that works very
similarly to the GNU Debugger, GDB.
@item
@@ -29548,7 +29273,7 @@ breakpoints, execution, viewing and changing data, working with the stack,
getting information, and other tasks.
@item
-If the @code{readline} library is available when @command{gawk} is
+If the GNU Readline library is available when @command{gawk} is
compiled, it is used by the debugger to provide command-line history
and editing.
@@ -29805,7 +29530,7 @@ is available like so:
@example
$ @kbd{gawk --version}
@print{} GNU Awk 4.1.2, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2)
-@print{} Copyright (C) 1989, 1991-2014 Free Software Foundation.
+@print{} Copyright (C) 1989, 1991-2015 Free Software Foundation.
@dots{}
@end example
@@ -30459,7 +30184,7 @@ When asked about the algorithm used, Katie replied:
@quotation
It's not that well known but it's not that obscure either.
It's Euler's modification to Newton's method for calculating pi.
-Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.htm}.
+Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.html}.
The algorithm I wrote simply expands the multiply by 2 and works from
the innermost expression outwards. I used this to program HP calculators
@@ -30509,7 +30234,7 @@ Allowing completely alphabetic strings to have valid numeric
values is also a very severe departure from historical practice.
@end itemize
-The second problem is that the @code{gawk} maintainer feels that this
+The second problem is that the @command{gawk} maintainer feels that this
interpretation of the standard, which requires a certain amount of
``language lawyering'' to arrive at in the first place, was not even
intended by the standard developers. In other words, ``we see how you
@@ -30668,7 +30393,7 @@ When @option{--sandbox} is specified, extensions are disabled
* Finding Extensions:: How @command{gawk} finds compiled extensions.
* Extension Example:: Example C code for an extension.
* Extension Samples:: The sample extensions that ship with
- @code{gawk}.
+ @command{gawk}.
* gawkextlib:: The @code{gawkextlib} project.
* Extension summary:: Extension summary.
* Extension Exercises:: Exercises.
@@ -31632,7 +31357,7 @@ If the concept of a ``record terminator'' makes sense, then
@code{*rt_start} should be set to point to the data to be used for
@code{RT}, and @code{*rt_len} should be set to the length of the
data. Otherwise, @code{*rt_len} should be set to zero.
-@code{gawk} makes its own copy of this data, so the
+@command{gawk} makes its own copy of this data, so the
extension must manage this storage.
@end table
@@ -31678,7 +31403,7 @@ When writing an input parser, you should think about (and document)
how it is expected to interact with @command{awk} code. You may want
it to always be called, and take effect as appropriate (as the
@code{readdir} extension does). Or you may want it to take effect
-based upon the value of an @code{awk} variable, as the XML extension
+based upon the value of an @command{awk} variable, as the XML extension
from the @code{gawkextlib} project does (@pxref{gawkextlib}).
In the latter case, code in a @code{BEGINFILE} section
can look at @code{FILENAME} and @code{ERRNO} to decide whether or
@@ -32461,7 +32186,7 @@ converts it to a string. Using non-integral values is possible, but
requires that you understand how such values are converted to strings
(@pxref{Conversion}); thus using integral values is safest.
-As with @emph{all} strings passed into @code{gawk} from an extension,
+As with @emph{all} strings passed into @command{gawk} from an extension,
the string value of @code{index} must come from @code{gawk_malloc()},
@code{gawk_calloc()} or @code{gawk_realloc()}, and
@command{gawk} releases the storage.
@@ -34747,9 +34472,7 @@ online documentation}.
@node V7/SVR3.1
@appendixsec Major Changes Between V7 and SVR3.1
-@c STARTOFRANGE gawkv
@cindex @command{awk}, versions of
-@c STARTOFRANGE gawkv1
@cindex @command{awk}, versions of, changes between V7 and SVR3.1
The @command{awk} language evolved considerably between the release of
@@ -34836,7 +34559,6 @@ Multiple @code{BEGIN} and @code{END} rules
Multidimensional arrays
(@pxref{Multidimensional}).
@end itemize
-@c ENDOFRANGE gawkv1
@node SVR4
@appendixsec Changes Between SVR3.1 and SVR4
@@ -34951,7 +34673,6 @@ not permitted by the POSIX standard.
The 2008 POSIX standard can be found online at
@url{http://www.opengroup.org/onlinepubs/9699919799/}.
-@c ENDOFRANGE gawkv
@node BTL
@appendixsec Extensions in Brian Kernighan's @command{awk}
@@ -34997,11 +34718,8 @@ available in his @command{awk}.
@node POSIX/GNU
@appendixsec Extensions in @command{gawk} Not in POSIX @command{awk}
-@c STARTOFRANGE fripls
@cindex compatibility mode (@command{gawk}), extensions
-@c STARTOFRANGE exgnot
@cindex extensions, in @command{gawk}, not in POSIX @command{awk}
-@c STARTOFRANGE posnot
@cindex POSIX, @command{gawk} extensions not included in
The GNU implementation, @command{gawk}, adds a large number of features.
They can all be disabled with either the @option{--traditional} or
@@ -35330,9 +35048,6 @@ MirBSD
@c XXX ADD MORE STUFF HERE
-@c ENDOFRANGE fripls
-@c ENDOFRANGE exgnot
-@c ENDOFRANGE posnot
@c This does not need to be in the formal book.
@ifclear FOR_PRINT
@@ -36419,9 +36134,7 @@ the appropriate credit where credit is due.
@c last two commas are part of see also
@cindex operating systems, See Also GNU/Linux@comma{} PC operating systems@comma{} Unix
-@c STARTOFRANGE gligawk
@cindex @command{gawk}, installing
-@c STARTOFRANGE ingawk
@cindex installing @command{gawk}
This appendix provides instructions for installing @command{gawk} on the
various platforms that are supported by the developers. The primary
@@ -36531,7 +36244,6 @@ a local expert.
@node Distribution contents
@appendixsubsec Contents of the @command{gawk} Distribution
-@c STARTOFRANGE gawdis
@cindex @command{gawk}, distribution
The @command{gawk} distribution has a number of C source files,
@@ -36629,10 +36341,10 @@ The generated Info file for this @value{DOCUMENT}.
@item doc/gawkinet.texi
The Texinfo source file for
@ifinfo
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}.
+@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}}.
@end ifinfo
@ifnotinfo
-@cite{TCP/IP Internetworking with @command{gawk}}.
+@cite{@value{GAWKINETTITLE}}.
@end ifnotinfo
It should be processed with @TeX{}
(via @command{texi2dvi} or @command{texi2pdf})
@@ -36641,7 +36353,7 @@ with @command{makeinfo} to produce an Info or HTML file.
@item doc/gawkinet.info
The generated Info file for
-@cite{TCP/IP Internetworking with @command{gawk}}.
+@cite{@value{GAWKINETTITLE}}.
@item doc/igawk.1
The @command{troff} source for a manual page describing the @command{igawk}
@@ -36730,7 +36442,6 @@ directory to run your version of @command{gawk} against the test suite.
If @command{gawk} successfully passes @samp{make check}, then you can
be confident of a successful port.
@end table
-@c ENDOFRANGE gawdis
@node Unix Installation
@appendixsec Compiling and Installing @command{gawk} on Unix-Like Systems
@@ -36881,7 +36592,7 @@ can be configured and compiled.
@cindex @option{--disable-lint} configuration option
@cindex configuration option, @code{--disable-lint}
@item --disable-lint
-Disable all lint checking within @code{gawk}. The
+Disable all lint checking within @command{gawk}. The
@option{--lint} and @option{--lint-old} options
(@pxref{Options})
are accepted, but silently do nothing.
@@ -37195,9 +36906,7 @@ multibyte functionality is not available.
@node PC Using
@appendixsubsubsec Using @command{gawk} on PC Operating Systems
-@c STARTOFRANGE opgawx
@cindex operating systems, PC, @command{gawk} on
-@c STARTOFRANGE pcgawon
@cindex PC operating systems, @command{gawk} on
Under MS-DOS and MS-Windows, the Cygwin and MinGW environments support
@@ -37705,8 +37414,6 @@ $ @kbd{gawk :== $sys$common:[syshlp.examples.tcpip.snmp]gawk.exe}
This is apparently @value{PVERSION} 2.15.6, which is extremely old. We
recommend compiling and using the current version.
-@c ENDOFRANGE opgawx
-@c ENDOFRANGE pcgawon
@node Bugs
@appendixsec Reporting Problems and Bugs
@@ -37717,9 +37424,7 @@ recommend compiling and using the current version.
@end quotation
@c the radio show, not the book. :-)
-@c STARTOFRANGE dbugg
@cindex debugging @command{gawk}, bug reports
-@c STARTOFRANGE tblgawb
@cindex troubleshooting, @command{gawk}, bug reports
If you have problems with @command{gawk} or think that you have found a bug,
report it to the developers; we cannot promise to do anything
@@ -37816,12 +37521,9 @@ The people maintaining the various @command{gawk} ports are:
If your bug is also reproducible under Unix, send a copy of your
report to the @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org} email list as well.
-@c ENDOFRANGE dbugg
-@c ENDOFRANGE tblgawb
@node Other Versions
@appendixsec Other Freely Available @command{awk} Implementations
-@c STARTOFRANGE awkim
@cindex @command{awk}, implementations
@ignore
From: emory!amc.com!brennan (Michael Brennan)
@@ -37881,7 +37583,7 @@ git clone git://github.com/onetrueawk/awk bwkawk
@end example
@noindent
-This command creates a copy of the @uref{http://www.git-scm.com, Git}
+This command creates a copy of the @uref{http://git-scm.com, Git}
repository in a directory named @file{bwkawk}. If you leave that argument
off the @command{git} command line, the repository copy is created in a
directory named @file{awk}.
@@ -37946,7 +37648,7 @@ To get @command{awka}, go to @url{http://sourceforge.net/projects/awka}.
@c andrewsumner@@yahoo.net
The project seems to be frozen; no new code changes have been made
-since approximately 2003.
+since approximately 2001.
@cindex Beebe, Nelson H.F.@:
@cindex @command{pawk} (profiling version of Brian Kernighan's @command{awk})
@@ -38042,7 +37744,6 @@ See also the ``Versions and implementations'' section of the
Wikipedia article} for information on additional versions.
@end table
-@c ENDOFRANGE awkim
@node Installation summary
@appendixsec Summary
@@ -38080,15 +37781,11 @@ implementations. Many are POSIX compliant; others are less so.
@end itemize
-@c ENDOFRANGE gligawk
-@c ENDOFRANGE ingawk
@ifclear FOR_PRINT
@node Notes
@appendix Implementation Notes
-@c STARTOFRANGE gawii
@cindex @command{gawk}, implementation issues
-@c STARTOFRANGE impis
@cindex implementation issues, @command{gawk}
This appendix contains information mainly of interest to implementers and
@@ -38164,7 +37861,7 @@ However, if you want to modify @command{gawk} and contribute back your
changes, you will probably wish to work with the development version.
To do so, you will need to access the @command{gawk} source code
repository. The code is maintained using the
-@uref{http://git-scm.com/, Git distributed version control system}.
+@uref{http://git-scm.com, Git distributed version control system}.
You will need to install it if your system doesn't have it.
Once you have done so, use the command:
@@ -38193,11 +37890,8 @@ that has a Git plug-in for working with Git repositories.
@node Adding Code
@appendixsubsec Adding New Features
-@c STARTOFRANGE adfgaw
@cindex adding, features to @command{gawk}
-@c STARTOFRANGE fadgaw
@cindex features, adding to @command{gawk}
-@c STARTOFRANGE gawadf
@cindex @command{gawk}, features, adding
You are free to add any new features you like to @command{gawk}.
However, if you want your changes to be incorporated into the @command{gawk}
@@ -38232,7 +37926,7 @@ for information on getting the latest version of @command{gawk}.)
@item
@ifnotinfo
-Follow the @uref{http://www.gnu.org/prep/standards/, @cite{GNU Coding Standards}}.
+Follow the @cite{GNU Coding Standards}.
@end ifnotinfo
@ifinfo
See @inforef{Top, , Version, standards, GNU Coding Standards}.
@@ -38241,7 +37935,7 @@ This document describes how GNU software should be written. If you haven't
read it, please do so, preferably @emph{before} starting to modify @command{gawk}.
(The @cite{GNU Coding Standards} are available from
the GNU Project's
-@uref{http://www.gnu.org/prep/standards_toc.html, website}.
+@uref{http://www.gnu.org/prep/standards/, website}.
Texinfo, Info, and DVI versions are also available.)
@cindex @command{gawk}, coding style in
@@ -38364,9 +38058,6 @@ Although this sounds like a lot of work, please remember that while you
may write the new code, I have to maintain it and support it. If it
isn't possible for me to do that with a minimum of extra work, then I
probably will not.
-@c ENDOFRANGE adfgaw
-@c ENDOFRANGE gawadf
-@c ENDOFRANGE fadgaw
@node New Ports
@appendixsubsec Porting @command{gawk} to a New Operating System
@@ -38500,7 +38191,6 @@ coding style and brace layout that suits your taste.
@node Derived Files
@appendixsubsec Why Generated Files Are Kept In Git
-@c STARTOFRANGE gawkgit
@cindex Git, use of for @command{gawk} source code
@c From emails written March 22, 2012, to the gawk developers list.
@@ -38689,7 +38379,6 @@ wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.ta
@noindent
to retrieve a snapshot of the given branch.
-@c ENDOFRANGE gawkgit
@node Future Extensions
@appendixsec Probable Future Extensions
@@ -39070,13 +38759,10 @@ of @command{gawk}, but it @emph{will} be removed in the next major release.
@end itemize
-@c ENDOFRANGE impis
-@c ENDOFRANGE gawii
@node Basic Concepts
@appendix Basic Programming Concepts
@cindex programming, concepts
-@c STARTOFRANGE procon
@cindex programming, concepts
This @value{APPENDIX} attempts to define some of the basic concepts
@@ -39314,7 +39000,6 @@ standard for C. This standard became an ISO standard in 1990.
In 1999, a revised ISO C standard was approved and released.
Where it makes sense, POSIX @command{awk} is compatible with 1999 ISO C.
-@c ENDOFRANGE procon
@node Glossary
@unnumbered Glossary
@@ -39365,6 +39050,21 @@ languages.
These standards often become international standards as well. See also
``ISO.''
+@item Argument
+An argument can be two different things. It can be an option or a
+@value{FN} passed to a command while invoking it from the command line, or
+it can be something passed to a @dfn{function} inside a program, e.g.,
+inside @command{awk}.
+
+In the latter case, an argument can be passed to a function in two ways.
+Either it is given to the called function by value, i.e., a copy of the
+value of the variable is made available to the called function, but the
+original variable cannot be modified by the function itself; or it is
+given by reference, i.e., a pointer to the variable of interest is passed to
+the function, which can then directly modify it. In @command{awk}
+scalars are passed by value, and arrays are passed by reference.
+See ``Pass By Value/Reference.''
+
@item Array
A grouping of multiple values under the same name.
Most languages just provide sequential arrays.
@@ -39406,6 +39106,25 @@ The GNU version of the standard shell
@end ifinfo
See also ``Bourne Shell.''
+@item Binary
+Base-two notation, where the digits are @code{0}--@code{1}. Since
+electronic circuitry works ``naturally'' in base 2 (just think of Off/On),
+everything inside a computer is calculated using base 2. Each digit
+represents the presence (or absence) of a power of 2 and is called a
+@dfn{bit}. So, for example, the base-two number @code{10101} is
+the same as decimal 21, ((1 x 16) + (1 x 4) + (1 x 1)).
+
+Since base-two numbers quickly become
+very long to read and write, they are usually grouped by 3 (i.e., they are
+read as octal numbers), or by 4 (i.e., they are read as hexadecimal
+numbers). There is no direct way to insert base 2 numbers in a C program.
+If the need arises, such numbers are usually inserted as octal or hexadecimal
+numbers. The number of base-two digits that fit into registers used for
+representing integer numbers in computers is a rough indication of the
+computing power of the computer itself. Most computers nowadays use 64
+bits for representing integer numbers in their registers, but 32-bit,
+16-bit and 8-bit registers have been widely used in the past.
+@xref{Nondecimal-numbers}.
@item Bit
Short for ``Binary Digit.''
All values in computer memory ultimately reduce to binary digits: values
@@ -39437,6 +39156,19 @@ The characters @samp{@{} and @samp{@}}. Braces are used in
@command{awk} for delimiting actions, compound statements, and function
bodies.
+@item Bracket Expression
+Inside a @dfn{regular expression}, an expression included in square
+brackets, meant to designate a single character as belonging to a
+specified character class. A bracket expression can contain a list of one
+or more characters, like @samp{[abc]}, a range of characters, like
+@samp{[A-Z]}, or a name, delimited by @samp{:}, that designates a known set
+of characters, like @samp{[:digit:]}. The form of bracket expression
+enclosed between @samp{:} is independent of the underlying representation
+of the characters themselves, which could utilize the ASCII, EBCDIC, or
+Unicode codesets, depending on the architecture of the computer system, and on
+localization.
+See also ``Regular Expression.''
+
@item Built-in Function
The @command{awk} language provides built-in functions that perform various
numerical, I/O-related, and string computations. Examples are
@@ -39490,9 +39222,25 @@ points out similarities between @command{awk} and C when appropriate.
In general, @command{gawk} attempts to be as similar to the 1990 version
of ISO C as makes sense.
+@item C Shell
+The C Shell (@command{csh} or its improved version, @command{tcsh}) is a Unix shell that was
+created by Bill Joy in the late 1970s. The C shell was differentiated from
+other shells by its interactive features and overall style, which
+looks more like C. The C Shell is not backward compatible with the Bourne
+Shell, so special attention is required when converting scripts
+written for other Unix shells to the C shell, especially with regard to the management of
+shell variables.
+See also ``Bourne Shell.''
+
@item C++
A popular object-oriented programming language derived from C.
+@item Character Class
+See ``Bracket Expression.''
+
+@item Character List
+See ``Bracket Expression.''
+
@cindex ASCII
@cindex ISO 8859-1
@cindex ISO Latin-1
@@ -39516,7 +39264,7 @@ A preprocessor for @command{pic} that reads descriptions of molecules
and produces @command{pic} input for drawing them.
It was written in @command{awk}
by Brian Kernighan and Jon Bentley, and is available from
-@uref{http://netlib.sandia.gov/netlib/typesetting/chem.gz}.
+@uref{http://netlib.org/typesetting/chem}.
@item Comparison Expression
A relation that is either true or false, such as @samp{a < b}.
@@ -39532,11 +39280,23 @@ machine-executable object code. The object code is then executed
directly by the computer.
See also ``Interpreter.''
+@item Complemented Bracket Expression
+The negation of a @dfn{bracket expression}. All that is @emph{not}
+described by a given bracket expression. The symbol @samp{^} precedes
+the negated bracket expression. E.g.: @samp{[^[:digit:]]}
+designates whatever character is not a digit. @samp{[^bad]}
+designates whatever character is not one of the letters @samp{b}, @samp{a},
+or @samp{d}.
+See ``Bracket Expression.''
+
@item Compound Statement
A series of @command{awk} statements, enclosed in curly braces. Compound
statements may be nested.
(@xref{Statements}.)
+@item Computed Regexps
+See ``Dynamic Regular Expressions.''
+
@item Concatenation
Concatenating two strings means sticking them together, one after another,
producing a new string. For example, the string @samp{foo} concatenated with
@@ -39551,6 +39311,13 @@ expression is the value of @var{expr2}; otherwise the value is
@var{expr3}. In either case, only one of @var{expr2} and @var{expr3}
is evaluated. (@xref{Conditional Exp}.)
+@item Control Statement
+A control statement is an instruction to perform a given operation or a set
+of operations inside an @command{awk} program, if a given condition is
+true. Control statements are: @code{if}, @code{for}, @code{while}, and
+@code{do}
+(@pxref{Statements}).
+
@cindex McIlroy, Doug
@cindex cookie
@item Cookie
@@ -39705,6 +39472,11 @@ Format strings control the appearance of output in the
are controlled by the format strings contained in the predefined variables
@code{CONVFMT} and @code{OFMT}. (@xref{Control Letters}.)
+@item Fortran
+Shorthand for FORmula TRANslator, one of the first programming languages
+available for scientific calculations. It was created by John Backus,
+and has been available since 1957. It is still in use today.
+
@item Free Documentation License
This document describes the terms under which this @value{DOCUMENT}
is published and may be copied. (@xref{GNU Free Documentation License}.)
@@ -39722,10 +39494,21 @@ Emacs editor. GNU Emacs is the most widely used version of Emacs today.
See ``Free Software Foundation.''
@item Function
-A specialized group of statements used to encapsulate general
-or program-specific tasks. @command{awk} has a number of built-in
-functions, and also allows you to define your own.
-(@xref{Functions}.)
+A part of an @command{awk} program that can be invoked from every point of
+the program, to perform a task. @command{awk} has several built-in
+functions.
+Users can define their own functions in every part of the program.
+Functions can be recursive, i.e., they may invoke themselves.
+@xref{Functions}.
+In @command{gawk} it is also possible to have functions shared
+among different programs, and included where required using the
+@code{@@include} directive
+(@pxref{Include Files}).
+In @command{gawk} the name of the function that should be invoked
+can be generated at run time, i.e., dynamically.
+The @command{gawk} extension API provides constructor functions
+(@pxref{Constructor Functions}).
+
@item @command{gawk}
The GNU implementation of @command{awk}.
@@ -39849,6 +39632,12 @@ meaning. Keywords are reserved and may not be used as variable names.
and
@code{while}.
+@item Korn Shell
+The Korn Shell (@command{ksh}) is a Unix shell which was developed by David Korn at Bell
+Laboratories in the early 1980s. The Korn Shell is backward-compatible with the Bourne
+shell and includes many features of the C shell.
+See also ``Bourne Shell.''
+
@cindex LGPL (Lesser General Public License)
@cindex Lesser General Public License (LGPL)
@cindex GNU Lesser General Public License
@@ -39888,6 +39677,14 @@ Characters used within a regexp that do not stand for themselves.
Instead, they denote regular expression operations, such as repetition,
grouping, or alternation.
+@item Nesting
+Nesting is where information is organized in layers, or where objects
+contain other similar objects.
+In @command{gawk} the @code{@@include}
+directive can be nested. The ``natural'' nesting of arithmetic and
+logical operations can be changed using parentheses
+(@pxref{Precedence}).
+
@item No-op
An operation that does nothing.
@@ -39908,6 +39705,11 @@ Octal numbers are written in C using a leading @samp{0},
to indicate their base. Thus, @code{013} is 11 ((1 x 8) + 3).
@xref{Nondecimal-numbers}.
+@item Output Record
+A single chunk of data that is written out by @command{awk}. Usually, an
+@command{awk} output record consists of one or more lines of text.
+@xref{Records}.
+
@item Pattern
Patterns tell @command{awk} which input records are interesting to which
rules.
@@ -39922,6 +39724,9 @@ An acronym describing what is possibly the most frequent
source of computer usage problems. (Problem Exists Between
Keyboard And Chair.)
+@item Plug-in
+See ``Extensions.''
+
@item POSIX
The name for a series of standards
that specify a Portable Operating System interface. The ``IX'' denotes
@@ -39946,6 +39751,9 @@ A sequence of consecutive lines from the input file(s). A pattern
can specify ranges of input lines for @command{awk} to process or it can
specify single lines. (@xref{Pattern Overview}.)
+@item Record
+See ``Input record'' and ``Output record.''
+
@item Recursion
When a function calls itself, either directly or indirectly.
If this is clear, stop, and proceed to the next entry.
@@ -39963,6 +39771,15 @@ operators.
(@xref{Getline},
and @ref{Redirection}.)
+@item Reference Counts
+An internal mechanism in @command{gawk} to minimize the amount of memory
+needed to store the value of string variables. If the value assumed by
+a variable is used in more than one place, only one copy of the value
+itself is kept, and the associated reference count is increased when the
+same value is used by an additional variable, and decreased when the related
+variable is no longer in use. When the reference count goes to zero,
+the memory space used to store the value of the variable is freed.
+
@item Regexp
See ``Regular Expression.''
@@ -39980,6 +39797,15 @@ slashes, such as @code{/foo/}. This regular expression is chosen
when you write the @command{awk} program and cannot be changed during
its execution. (@xref{Regexp Usage}.)
+@item Regular Expression Operators
+See ``Metacharacters.''
+
+@item Rounding
+Rounding the result of an arithmetic operation can be tricky.
+More than one way of rounding exists, and in @command{gawk}
+it is possible to choose which method should be used in a program.
+@xref{Setting the rounding mode}.
+
@item Rule
A segment of an @command{awk} program that specifies how to process single
input records. A rule consists of a @dfn{pattern} and an @dfn{action}.
@@ -40039,6 +39865,12 @@ A @value{FN} interpreted internally by @command{gawk}, instead of being handed
directly to the underlying operating system---for example, @file{/dev/stderr}.
(@xref{Special Files}.)
+@item Statement
+An expression inside an @command{awk} program in the action part
+of a pattern--action rule, or inside an
+@command{awk} function. A statement can be a variable assignment,
+an array operation, a loop, etc.
+
@item Stream Editor
A program that reads records from an input stream and processes them one
or more at a time. This is in contrast with batch programs, which may
@@ -40089,9 +39921,14 @@ This is standard time in Greenwich, England, which is used as a
reference time for day and date calculations.
See also ``Epoch'' and ``GMT.''
+@item Variable
+A name for a value. In @command{awk}, variables may be either scalars
+or arrays.
+
@item Whitespace
A sequence of space, TAB, or newline characters occurring inside an input
record or a string.
+
@end table
@end ifclear