aboutsummaryrefslogtreecommitdiffstats
path: root/doc/gawk.texi
diff options
context:
space:
mode:
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r--doc/gawk.texi4066
1 files changed, 2281 insertions, 1785 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 52c526a4..20087fa7 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -37,11 +37,13 @@
@ifnotdocbook
@set BULLET @bullet{}
@set MINUS @minus{}
+@set NUL @sc{nul}
@end ifnotdocbook
@ifdocbook
@set BULLET
@set MINUS
+@set NUL NUL
@end ifdocbook
@set xref-automatic-section-title
@@ -51,11 +53,16 @@
@c applies to and all the info about who's publishing this edition
@c These apply across the board.
-@set UPDATE-MONTH August, 2014
+@set UPDATE-MONTH September, 2014
@set VERSION 4.1
-@set PATCHLEVEL 1
+@set PATCHLEVEL 2
+@ifset FOR_PRINT
+@set TITLE Effective AWK Programming
+@end ifset
+@ifclear FOR_PRINT
@set TITLE GAWK: Effective AWK Programming
+@end ifclear
@set SUBTITLE A User's Guide for GNU Awk
@set EDITION 4.1
@@ -165,6 +172,19 @@
@end macro
@end ifdocbook
+@c hack for docbook, where comma shouldn't always follow an @ref{}
+@ifdocbook
+@macro DBREF{text}
+@ref{\text\}
+@end macro
+@end ifdocbook
+
+@ifnotdocbook
+@macro DBREF{text}
+@ref{\text\},
+@end macro
+@end ifnotdocbook
+
@ifclear FOR_PRINT
@set FN file name
@set FFN File Name
@@ -526,10 +546,10 @@ particular records in a file and perform operations upon them.
* Escape Sequences:: How to write nonprinting characters.
* Regexp Operators:: Regular Expression Operators.
* Bracket Expressions:: What can go between @samp{[...]}.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
* Leftmost Longest:: How much text matches.
* Computed Regexps:: Using Dynamic Regexps.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
* Regexp Summary:: Regular expressions summary.
* Records:: Controlling how data is split into
records.
@@ -545,8 +565,8 @@ particular records in a file and perform operations upon them.
* Regexp Field Splitting:: Using regexps as the field separator.
* Single Character Fields:: Making each character a separate
field.
-* Command Line Field Separator:: Setting @code{FS} from the
- command line.
+* Command Line Field Separator:: Setting @code{FS} from the command
+ line.
* Full Line Fields:: Making the full line be a single
field.
* Field Splitting Summary:: Some final points and a summary table.
@@ -590,17 +610,19 @@ particular records in a file and perform operations upon them.
* Printf Examples:: Several examples.
* Redirection:: How to redirect output to multiple
files and pipes.
+* Special FD:: Special files for I/O.
* Special Files:: File name interpretation in
@command{gawk}. @command{gawk} allows
access to inherited file descriptors.
-* Special FD:: Special files for I/O.
+* Other Inherited Files:: Accessing other open files with
+ @command{gawk}.
* Special Network:: Special files for network
communications.
* Special Caveats:: Things to watch out for.
* Close Files And Pipes:: Closing Input and Output Files and
Pipes.
* Output Summary:: Output summary.
-* Output exercises:: Exercises.
+* Output Exercises:: Exercises.
* Values:: Constants, Variables, and Regular
Expressions.
* Constants:: String, numeric and regexp constants.
@@ -686,7 +708,7 @@ particular records in a file and perform operations upon them.
record.
* Nextfile Statement:: Stop processing the current file.
* Exit Statement:: Stop execution of @command{awk}.
-* Built-in Variables:: Summarizes the built-in variables.
+* Built-in Variables:: Summarizes the predefined variables.
* User-modified:: Built-in variables that you change to
control @command{awk}.
* Auto-set:: Built-in variables where @command{awk}
@@ -706,12 +728,12 @@ particular records in a file and perform operations upon them.
elements.
* Controlling Scanning:: Controlling the order in which arrays
are scanned.
-* Delete:: The @code{delete} statement removes an
- element from an array.
* Numeric Array Subscripts:: How to use numbers as subscripts in
@command{awk}.
* Uninitialized Subscripts:: Using Uninitialized variables as
subscripts.
+* Delete:: The @code{delete} statement removes an
+ element from an array.
* Multidimensional:: Emulating multidimensional arrays in
@command{awk}.
* Multiscanning:: Scanning multidimensional arrays.
@@ -770,6 +792,8 @@ particular records in a file and perform operations upon them.
* Getlocaltime Function:: A function to get formatted times.
* Readfile Function:: A function to read an entire file at
once.
+* Shell Quoting:: A function to quote strings for the
+ shell.
* Data File Management:: Functions for managing command-line
data files.
* Filetrans Function:: A function for handling data file
@@ -787,7 +811,7 @@ particular records in a file and perform operations upon them.
information.
* Walking Arrays:: A function to walk arrays of arrays.
* Library Functions Summary:: Summary of library functions.
-* Library exercises:: Exercises.
+* Library Exercises:: Exercises.
* Running Examples:: How to run these examples.
* Clones:: Clones of common utilities.
* Cut Program:: The @command{cut} utility.
@@ -884,7 +908,6 @@ particular records in a file and perform operations upon them.
* Extension API Description:: A full description of the API.
* Extension API Functions Introduction:: Introduction to the API functions.
* General Data Types:: The data types.
-* Requesting Values:: How to get a value.
* Memory Allocation Functions:: Functions for allocating memory.
* Constructor Functions:: Functions for creating values.
* Registration Functions:: Functions to register things with
@@ -897,6 +920,7 @@ particular records in a file and perform operations upon them.
* Two-way processors:: Registering a two-way processor.
* Printing Messages:: Functions for printing messages.
* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}.
+* Requesting Values:: How to get a value.
* Accessing Parameters:: Functions for accessing parameters.
* Symbol Table Access:: Functions for accessing global
variables.
@@ -935,9 +959,9 @@ particular records in a file and perform operations upon them.
processor.
* Extension Sample Read write array:: Serializing an array to a file.
* Extension Sample Readfile:: Reading an entire file into a string.
-* Extension Sample API Tests:: Tests for the API.
* Extension Sample Time:: An interface to @code{gettimeofday()}
and @code{sleep()}.
+* Extension Sample API Tests:: Tests for the API.
* gawkextlib:: The @code{gawkextlib} project.
* Extension summary:: Extension summary.
* Extension Exercises:: Exercises.
@@ -1073,7 +1097,7 @@ books on Unix, I found the gray AWK book, a.k.a.@: Aho, Kernighan and
Weinberger, @cite{The AWK Programming Language}, Addison-Wesley,
1988. AWK's simple programming paradigm---find a pattern in the
input and then perform an action---often reduced complex or tedious
-data manipulations to few lines of code. I was excited to try my
+data manipulations to a few lines of code. I was excited to try my
hand at programming in AWK.
Alas, the @command{awk} on my computer was a limited version of the
@@ -1207,7 +1231,7 @@ March, 2001
<affiliation><jobtitle>Nof Ayalon</jobtitle></affiliation>
<affiliation><jobtitle>ISRAEL</jobtitle></affiliation>
</author>
- <date>June, 2014</date>
+ <date>December, 2014</date>
</prefaceinfo>
@end docbook
@@ -1220,8 +1244,7 @@ language that makes it easy to handle simple data-reformatting jobs.
The GNU implementation of @command{awk} is called @command{gawk}; if you
invoke it with the proper options or environment variables
-(@pxref{Options}), it is fully
-compatible with
+it is fully compatible with
the POSIX@footnote{The 2008 POSIX standard is accessible online at
@w{@url{http://www.opengroup.org/onlinepubs/9699919799/}.}}
specification of the @command{awk} language
@@ -1229,7 +1252,7 @@ and with the Unix version of @command{awk} maintained
by Brian Kernighan.
This means that all
properly written @command{awk} programs should work with @command{gawk}.
-Thus, we usually don't distinguish between @command{gawk} and other
+So most of the time, we don't distinguish between @command{gawk} and other
@command{awk} implementations.
@cindex @command{awk}, POSIX and, See Also POSIX @command{awk}
@@ -1238,7 +1261,7 @@ Thus, we usually don't distinguish between @command{gawk} and other
@cindex @command{gawk}, @command{awk} and
@cindex @command{awk}, @command{gawk} and
@cindex @command{awk}, uses for
-Using @command{awk} allows you to:
+Using @command{awk} you can:
@itemize @value{BULLET}
@item
@@ -1276,15 +1299,15 @@ Sort data
Perform simple network communications
@item
-Profile and debug @command{awk} programs.
+Profile and debug @command{awk} programs
@item
-Extend the language with functions written in C or C++.
+Extend the language with functions written in C or C++
@end itemize
This @value{DOCUMENT} teaches you about the @command{awk} language and
how you can use it effectively. You should already be familiar with basic
-system commands, such as @command{cat} and @command{ls},@footnote{These commands
+system commands, such as @command{cat} and @command{ls},@footnote{These utilities
are available on POSIX-compliant systems, as well as on traditional
Unix-based systems. If you are using some other operating system, you still need to
be familiar with the ideas of I/O redirection and pipes.} as well as basic shell
@@ -1306,10 +1329,9 @@ Microsoft Windows
@ifclear FOR_PRINT
(all versions) and OS/2 PCs,
@end ifclear
-and OpenVMS.
-(Some other, obsolete systems to which @command{gawk} was once ported
-are no longer supported and the code for those systems
-has been removed.)
+and OpenVMS.@footnote{Some other, obsolete systems to which @command{gawk}
+was once ported are no longer supported and the code for those systems
+has been removed.}
@menu
* History:: The history of @command{gawk} and
@@ -1386,13 +1408,13 @@ The version in System V Release 4 (1989) added some new features and cleaned
up the behavior in some of the ``dark corners'' of the language.
The specification for @command{awk} in the POSIX Command Language
and Utilities standard further clarified the language.
-Both the @command{gawk} designers and the original Bell Laboratories @command{awk}
-designers provided feedback for the POSIX specification.
+Both the @command{gawk} designers and the original @command{awk} designers at Bell Laboratories
+provided feedback for the POSIX specification.
@cindex Rubin, Paul
@cindex Fenlason, Jay
@cindex Trueman, David
-Paul Rubin wrote the GNU implementation, @command{gawk}, in 1986.
+Paul Rubin wrote @command{gawk} in 1986.
Jay Fenlason completed it, with advice from Richard Stallman. John Woods
contributed parts of the code as well. In 1988 and 1989, David Trueman, with
help from me, thoroughly reworked @command{gawk} for compatibility
@@ -1415,7 +1437,7 @@ an @command{awk}-level debugger. This version became available as
@command{gawk} @value{PVERSION} 4.0, in 2011.
@xref{Contributors},
-for a complete list of those who made important contributions to @command{gawk}.
+for a full list of those who made important contributions to @command{gawk}.
@node Names
@unnumberedsec A Rose by Any Other Name
@@ -1424,29 +1446,27 @@ for a complete list of those who made important contributions to @command{gawk}.
The @command{awk} language has evolved over the years. Full details are
provided in @ref{Language History}.
The language described in this @value{DOCUMENT}
-is often referred to as ``new @command{awk}'' (@command{nawk}).
+is often referred to as ``new @command{awk}''.
+By analogy, the original version of @command{awk} is
+referred to as ``old @command{awk}.''
-@cindex @command{awk}, versions of
-@cindex @command{nawk} utility
-@cindex @command{oawk} utility
-For some time after new @command{awk} was introduced, there were
-systems with multiple versions of @command{awk}. Some systems had
-an @command{awk} utility that implemented the original version of the
-@command{awk} language and a @command{nawk} utility for the new version.
-Others had an @command{oawk} version for the ``old @command{awk}''
-language and plain @command{awk} for the new one. Still others only
-had one version, which is usually the new one.
-
-Today, only Solaris systems still use an old @command{awk} for the
-default @command{awk} utility. (A more modern @command{awk} lives in
-@file{/usr/xpg6/bin} on these systems.) All other modern systems use
-some version of new @command{awk}.@footnote{Many of these systems use
-@command{gawk} for their @command{awk} implementation!}
-
-It is likely that you already have some version of new @command{awk} on
-your system, which is what you should use when running your programs.
-(Of course, if you're reading this @value{DOCUMENT}, chances are good
-that you have @command{gawk}!)
+Today, on most systems, when you run the @command{awk} utility,
+you get some version of new @command{awk}.@footnote{Only
+Solaris systems still use an old @command{awk} for the
+default @command{awk} utility. A more modern @command{awk} lives in
+@file{/usr/xpg6/bin} on these systems.} If your system's standard
+@command{awk} is the old one, you will see something like this
+if you try the test program:
+
+@example
+$ @kbd{awk 1 /dev/null}
+@error{} awk: syntax error near line 1
+@error{} awk: bailing out near line 1
+@end example
+
+@noindent
+In this case, you should find a version of new @command{awk},
+or just install @command{gawk}!
Throughout this @value{DOCUMENT}, whenever we refer to a language feature
that should be available in any complete implementation of POSIX @command{awk},
@@ -1483,8 +1503,8 @@ entry ``differences in @command{awk} and @command{gawk}.''}
@ifset FOR_PRINT
implementations.
@end ifset
-Finally, any @command{gawk} features that are not in
-the POSIX standard for @command{awk} are noted.
+Finally, it notes any @command{gawk} features that are not in
+the POSIX standard for @command{awk}.
@ifnotinfo
This @value{DOCUMENT} has the difficult task of being both a tutorial and a reference.
@@ -1497,11 +1517,13 @@ There are sidebars
scattered throughout the @value{DOCUMENT}.
They add a more complete explanation of points that are relevant, but not likely
to be of interest on first reading.
+@ifclear FOR_PRINT
All appear in the index, under the heading ``sidebar.''
+@end ifclear
Most of the time, the examples use complete @command{awk} programs.
Some of the more advanced sections show only the part of the @command{awk}
-program that illustrates the concept currently being described.
+program that illustrates the concept being described.
While this @value{DOCUMENT} is aimed principally at people who have not been
exposed
@@ -1549,7 +1571,7 @@ for getting most things done in a program.
@ref{Patterns and Actions},
describes how to write patterns for matching records, actions for
-doing something when a record is matched, and the built-in variables
+doing something when a record is matched, and the predefined variables
@command{awk} and @command{gawk} use.
@ref{Arrays},
@@ -1559,9 +1581,9 @@ sorting arrays in @command{gawk}. It also describes how @command{gawk}
provides arrays of arrays.
@ref{Functions},
-describes the built-in functions @command{awk} and
-@command{gawk} provide, as well as how to define
-your own functions.
+describes the built-in functions @command{awk} and @command{gawk} provide,
+as well as how to define your own functions. It also discusses how
+@command{gawk} lets you call functions indirectly.
Part II shows how to use @command{awk} and @command{gawk} for problem solving.
There is lots of code here for you to read and learn from.
@@ -1580,21 +1602,21 @@ Part III focuses on features specific to @command{gawk}.
It contains the following chapters:
@ref{Advanced Features},
-describes a number of @command{gawk}-specific advanced features.
+describes a number of advanced features.
Of particular note
-are the abilities to have two-way communications with another process,
+are the abilities to control the order of array traversal,
+have two-way communications with another process,
perform TCP/IP networking, and
profile your @command{awk} programs.
@ref{Internationalization},
-describes special features in @command{gawk} for translating program
+describes special features for translating program
messages into different languages at runtime.
-@ref{Debugger}, describes the @command{awk} debugger.
+@ref{Debugger}, describes the @command{gawk} debugger.
@ref{Arbitrary Precision Arithmetic},
-describes advanced arithmetic facilities provided by
-@command{gawk}.
+describes advanced arithmetic facilities.
@ref{Dynamic Extensions}, describes how to add new variables and
functions to @command{gawk} by writing extensions in C or C++.
@@ -1634,9 +1656,10 @@ printed edition. You may find them online, as follows:
@uref{http://www.gnu.org/software/gawk/manual/html_node/Notes.html,
The appendix on implementation notes}
-describes how to disable @command{gawk}'s extensions, as
-well as how to contribute new code to @command{gawk},
-and some possible future directions for @command{gawk} development.
+describes how to disable @command{gawk}'s extensions, how to contribute
+new code to @command{gawk}, where to find information on some possible
+future directions for @command{gawk} development, and the design decisions
+behind the extension API.
@uref{http://www.gnu.org/software/gawk/manual/html_node/Basic-Concepts.html,
The appendix on basic concepts}
@@ -1652,6 +1675,9 @@ try looking them up here.
@uref{http://www.gnu.org/software/gawk/manual/html_node/GNU-Free-Documentation-License.html,
The GNU FDL}
is the license that covers this @value{DOCUMENT}.
+
+Some of the chapters have exercise sections; these have also been
+omitted from the print edition but are available online.
@end ifset
@ifclear FOR_PRINT
@@ -1773,9 +1799,10 @@ the picture of a flashlight in the margin, as shown here.
They also appear in the index under the heading ``dark corner.''
@end ifclear
-As noted by the opening quote, though, any coverage of dark corners is,
-by definition, incomplete.
+But, as noted by the opening quote, any coverage of dark
+corners is by definition incomplete.
+@cindex c.e., See common extensions
Extensions to the standard @command{awk} language that are supported by
more than one @command{awk} implementation are marked
@ifclear FOR_PRINT
@@ -1847,9 +1874,7 @@ available for download from the Internet.
@ifnotinfo
The @value{DOCUMENT} you are reading is actually free---at least, the
information in it is free to anyone. The machine-readable
-source code for the @value{DOCUMENT} comes with @command{gawk}; anyone
-may take this @value{DOCUMENT} to a copying machine and make as many
-copies as they like.
+source code for the @value{DOCUMENT} comes with @command{gawk}.
@ifclear FOR_PRINT
(Take a moment to check the Free Documentation
License in @ref{GNU Free Documentation License}.)
@@ -1857,7 +1882,7 @@ License in @ref{GNU Free Documentation License}.)
@end ifnotinfo
@cindex Close, Diane
-The @value{DOCUMENT} itself has gone through a number of previous editions.
+The @value{DOCUMENT} itself has gone through multiple previous editions.
Paul Rubin wrote the very first draft of @cite{The GAWK Manual};
it was around 40 pages in size.
Diane Close and Richard Stallman improved it, yielding a
@@ -1873,15 +1898,14 @@ The FSF published the first two editions under
the title @cite{The GNU Awk User's Guide}.
@ifset FOR_PRINT
SSC published two editions of the @value{DOCUMENT} under the
-title @cite{Effective awk Programming}, and in O'Reilly published
+title @cite{Effective awk Programming}, and O'Reilly published
the third edition in 2001.
@end ifset
This edition maintains the basic structure of the previous editions.
-For FSF edition 4.0, the content has been thoroughly reviewed
-and updated. All references to @command{gawk} versions prior to 4.0 have been
-removed.
-Of significant note for this edition was @ref{Debugger}.
+For FSF edition 4.0, the content was thoroughly reviewed and updated. All
+references to @command{gawk} versions prior to 4.0 were removed.
+Of significant note for that edition was @ref{Debugger}.
For FSF edition
@ifclear FOR_PRINT
@@ -1895,8 +1919,7 @@ the content has been reorganized into parts,
and the major new additions are @ref{Arbitrary Precision Arithmetic},
and @ref{Dynamic Extensions}.
-This @value{DOCUMENT} will undoubtedly continue to evolve. An electronic
-version comes with the @command{gawk} distribution from the FSF. If you
+This @value{DOCUMENT} will undoubtedly continue to evolve. If you
find an error in this @value{DOCUMENT}, please report it! @xref{Bugs},
for information on submitting problem reports electronically.
@@ -1905,7 +1928,7 @@ for information on submitting problem reports electronically.
@unnumberedsec How to Stay Current
It may be you have a version of @command{gawk} which is newer than the
-one described in this @value{DOCUMENT}. To find out what has changed,
+one described here. To find out what has changed,
you should first look at the @file{NEWS} file in the @command{gawk}
distribution, which provides a high level summary of what changed in
each release.
@@ -2077,15 +2100,24 @@ Andrew Schorr,
Corinna Vinschen,
and Eli Zaretskii
(in alphabetical order)
-make up the current
-@command{gawk} ``crack portability team.'' Without their hard work and
-help, @command{gawk} would not be nearly the fine program it is today. It
-has been and continues to be a pleasure working with this team of fine
-people.
+make up the current @command{gawk} ``crack portability team.'' Without
+their hard work and help, @command{gawk} would not be nearly the robust,
+portable program it is today. It has been and continues to be a pleasure
+working with this team of fine people.
Notable code and documentation contributions were made by
a number of people. @xref{Contributors}, for the full list.
+@ifset FOR_PRINT
+@cindex Oram, Andy
+Thanks to Andy Oram, of O'Reilly Media, for initiating
+the fourth edition and for his support during the work.
+@end ifset
+
+Thanks to Michael Brennan for the Foreword.
+
+@cindex Duman, Patrice
+@cindex Berry, Karl
Thanks to Patrice Dumas for the new @command{makeinfo} program.
Thanks to Karl Berry who continues to work to keep
the Texinfo markup language sane.
@@ -2127,7 +2159,7 @@ take advantage of those opportunities.
Arnold Robbins @*
Nof Ayalon @*
ISRAEL @*
-May, 2014
+December, 2014
@end iftex
@ifnotinfo
@@ -2343,27 +2375,22 @@ For example, on OS/2, it is @kbd{Ctrl-z}.)
As an example, the following program prints a friendly piece of advice
(from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}),
to keep you from worrying about the complexities of computer
-programming (@code{BEGIN} is a feature we haven't discussed yet):
+programming:
@example
-$ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"}
+$ @kbd{awk 'BEGIN @{ print "Don\47t Panic!" @}'}
@print{} Don't Panic!
@end example
-@cindex shell quoting, double quote
-@cindex double quote (@code{"}) in shell commands
-@cindex @code{"} (double quote) in shell commands
-@cindex @code{\} (backslash) in shell commands
-@cindex backslash (@code{\}) in shell commands
-This program does not read any input. The @samp{\} before each of the
-inner double quotes is necessary because of the shell's quoting
-rules---in particular because it mixes both single quotes and
-double quotes.@footnote{Although we generally recommend the use of single
-quotes around the program text, double quotes are needed here in order to
-put the single quote into the message.}
+@command{awk} executes statements associated with @code{BEGIN} before
+reading any input. If there are no other statements in your program,
+as is the case here, @command{awk} just stops, instead of trying to read
+input it doesn't know how to process.
+The @samp{\47} is a magic way (explained later) of getting a single quote into
+the program, without having to engage in ugly shell quoting tricks.
@quotation NOTE
-As a side note, if you use Bash as your shell, you should execute the
+If you use Bash as your shell, you should execute the
command @samp{set +H} before running this program interactively, to
disable the C shell-style command history, which treats @samp{!} as a
special character. We recommend putting this command into your personal
@@ -2393,7 +2420,7 @@ $ @kbd{awk '@{ print @}'}
@cindex @command{awk} programs, running
@cindex @command{awk} programs, lengthy
@cindex files, @command{awk} programs in
-Sometimes your @command{awk} programs can be very long. In this case, it is
+Sometimes @command{awk} programs are very long. In these cases, it is
more convenient to put the program into a separate file. In order to tell
@command{awk} to use that file for its program, you type:
@@ -2423,7 +2450,7 @@ awk -f advice
does the same thing as this one:
@example
-awk "BEGIN @{ print \"Don't Panic!\" @}"
+awk 'BEGIN @{ print "Don\47t Panic!" @}'
@end example
@cindex quoting in @command{gawk} command lines
@@ -2435,6 +2462,8 @@ specify with @option{-f}, because most @value{FN}s don't contain any of the shel
special characters. Notice that in @file{advice}, the @command{awk}
program did not have single quotes around it. The quotes are only needed
for programs that are provided on the @command{awk} command line.
+(Also, placing the program in a file allows us to use a literal single quote in the program
+text, instead of the magic @samp{\47}.)
@c STARTOFRANGE sq1x
@cindex single quote (@code{'}) in @command{gawk} command lines
@@ -2467,16 +2496,7 @@ BEGIN @{ print "Don't Panic!" @}
@noindent
After making this file executable (with the @command{chmod} utility),
simply type @samp{advice}
-at the shell and the system arranges to run @command{awk}@footnote{The
-line beginning with @samp{#!} lists the full @value{FN} of an interpreter
-to run and a single optional initial command-line argument to pass to that
-interpreter. The operating system then runs the interpreter with the given
-argument and the full argument list of the executed program. The first argument
-in the list is the full @value{FN} of the @command{awk} program.
-The rest of the
-argument list contains either options to @command{awk}, or @value{DF}s,
-or both. Note that on many systems @command{awk} may be found in
-@file{/usr/bin} instead of in @file{/bin}. Caveat Emptor.} as if you had
+at the shell and the system arranges to run @command{awk} as if you had
typed @samp{awk -f advice}:
@example
@@ -2494,14 +2514,32 @@ Self-contained @command{awk} scripts are useful when you want to write a
program that users can invoke without their having to know that the program is
written in @command{awk}.
-@cindex sidebar, Portability Issues with @samp{#!}
+@cindex sidebar, Understanding @samp{#!}
@ifdocbook
@docbook
-<sidebar><title>Portability Issues with @samp{#!}</title>
+<sidebar><title>Understanding @samp{#!}</title>
@end docbook
@cindex portability, @code{#!} (executable scripts)
+@command{awk} is an @dfn{interpreted} language. This means that the
+@command{awk} utility reads your program and then processes your data
+according to the instructions in your program. (This is different
+from a @dfn{compiled} language such as C, where your program is first
+compiled into machine code that is executed directly by your system's
+processor.) The @command{awk} utility is thus termed an @dfn{interpreter}.
+Many modern languages are interperted.
+
+The line beginning with @samp{#!} lists the full @value{FN} of an
+interpreter to run and a single optional initial command-line argument
+to pass to that interpreter. The operating system then runs the
+interpreter with the given argument and the full argument list of the
+executed program. The first argument in the list is the full @value{FN}
+of the @command{awk} program. The rest of the argument list contains
+either options to @command{awk}, or @value{DF}s, or both. (Note that on
+many systems @command{awk} may be found in @file{/usr/bin} instead of
+in @file{/bin}.)
+
Some systems limit the length of the interpreter name to 32 characters.
Often, this can be dealt with by using a symbolic link.
@@ -2513,8 +2551,7 @@ of some sort from @command{awk}.
@cindex @code{ARGC}/@code{ARGV} variables, portability and
@cindex portability, @code{ARGV} variable
-Finally,
-the value of @code{ARGV[0]}
+Finally, the value of @code{ARGV[0]}
(@pxref{Built-in Variables})
varies depending upon your operating system.
Some systems put @samp{awk} there, some put the full pathname
@@ -2530,11 +2567,29 @@ to provide your script name.
@ifnotdocbook
@cartouche
-@center @b{Portability Issues with @samp{#!}}
+@center @b{Understanding @samp{#!}}
@cindex portability, @code{#!} (executable scripts)
+@command{awk} is an @dfn{interpreted} language. This means that the
+@command{awk} utility reads your program and then processes your data
+according to the instructions in your program. (This is different
+from a @dfn{compiled} language such as C, where your program is first
+compiled into machine code that is executed directly by your system's
+processor.) The @command{awk} utility is thus termed an @dfn{interpreter}.
+Many modern languages are interperted.
+
+The line beginning with @samp{#!} lists the full @value{FN} of an
+interpreter to run and a single optional initial command-line argument
+to pass to that interpreter. The operating system then runs the
+interpreter with the given argument and the full argument list of the
+executed program. The first argument in the list is the full @value{FN}
+of the @command{awk} program. The rest of the argument list contains
+either options to @command{awk}, or @value{DF}s, or both. (Note that on
+many systems @command{awk} may be found in @file{/usr/bin} instead of
+in @file{/bin}.)
+
Some systems limit the length of the interpreter name to 32 characters.
Often, this can be dealt with by using a symbolic link.
@@ -2546,8 +2601,7 @@ of some sort from @command{awk}.
@cindex @code{ARGC}/@code{ARGV} variables, portability and
@cindex portability, @code{ARGV} variable
-Finally,
-the value of @code{ARGV[0]}
+Finally, the value of @code{ARGV[0]}
(@pxref{Built-in Variables})
varies depending upon your operating system.
Some systems put @samp{awk} there, some put the full pathname
@@ -2713,8 +2767,14 @@ Thus, the example seen
@ifnotinfo
previously
@end ifnotinfo
-in @ref{Read Terminal},
-is applicable:
+in @ref{Read Terminal}:
+
+@example
+awk 'BEGIN @{ print "Don\47t Panic!" @}'
+@end example
+
+@noindent
+could instead be written this way:
@example
$ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"}
@@ -2744,7 +2804,7 @@ awk -F"" '@var{program}' @var{files} # wrong!
@end example
@noindent
-In the second case, @command{awk} will attempt to use the text of the program
+In the second case, @command{awk} attempts to use the text of the program
as the value of @code{FS}, and the first @value{FN} as the text of the program!
This results in syntax errors at best, and confusing behavior at worst.
@end itemize
@@ -2809,6 +2869,9 @@ $ awk -v sq="'" 'BEGIN @{ print "Here is a single quote <" sq ">" @}'
@print{} Here is a single quote <'>
@end example
+(Here, the two string constants and the value of @code{sq} are concatenated
+into a single string which is printed by @code{print}.)
+
If you really need both single and double quotes in your @command{awk}
program, it is probably best to move it into a separate file, where
the shell won't be part of the picture, and you can say what you mean.
@@ -2872,7 +2935,7 @@ The second @value{DF}, called @file{inventory-shipped}, contains
information about monthly shipments. In both files,
each line is considered to be one @dfn{record}.
-In the @value{DF} @file{mail-list}, each record contains the name of a person,
+In @file{mail-list}, each record contains the name of a person,
his/her phone number, his/her email-address, and a code for their relationship
with the author of the list.
The columns are aligned using spaces.
@@ -2910,6 +2973,7 @@ of green crates shipped, the number of red boxes shipped, the number of
orange bags shipped, and the number of blue packages shipped,
respectively. There are 16 entries, covering the 12 months of last year
and the first four months of the current year.
+An empty line separates the data for the two years.
@example
@c file eg/data/inventory-shipped
@@ -3005,22 +3069,25 @@ you can come up with different ways to do the same things shown here:
@itemize @value{BULLET}
@item
-Print the length of the longest input line:
+Print every line that is longer than 80 characters:
@example
-awk '@{ if (length($0) > max) max = length($0) @}
- END @{ print max @}' data
+awk 'length($0) > 80' data
@end example
+The sole rule has a relational expression as its pattern and it has no
+action---so it uses the default action, printing the record.
+
@item
-Print every line that is longer than 80 characters:
+Print the length of the longest input line:
@example
-awk 'length($0) > 80' data
+awk '@{ if (length($0) > max) max = length($0) @}
+ END @{ print max @}' data
@end example
-The sole rule has a relational expression as its pattern and it has no
-action---so it uses the default action, printing the record.
+The code associated with @code{END} executes after all
+input has been read; it's the other side of the coin to @code{BEGIN}.
@cindex @command{expand} utility
@item
@@ -3028,10 +3095,10 @@ Print the length of the longest line in @file{data}:
@example
expand data | awk '@{ if (x < length($0)) x = length($0) @}
- END @{ print "maximum line length is " x @}'
+ END @{ print "maximum line length is " x @}'
@end example
-This example differs slightly from the first example in this list:
+This example differs slightly from the previous one:
The input is processed by the @command{expand} utility to change TABs
into spaces, so the widths compared are actually the right-margin columns,
as opposed to the number of input characters on each line.
@@ -3060,7 +3127,7 @@ Print the total number of bytes used by @var{files}:
@example
ls -l @var{files} | awk '@{ x += $5 @}
- END @{ print "total bytes: " x @}'
+ END @{ print "total bytes: " x @}'
@end example
@item
@@ -3104,7 +3171,7 @@ the program would print the odd-numbered lines.
@cindex @command{awk} programs
The @command{awk} utility reads the input files one line at a
-time. For each line, @command{awk} tries the patterns of each of the rules.
+time. For each line, @command{awk} tries the patterns of each rule.
If several patterns match, then several actions execute in the order in
which they appear in the @command{awk} program. If no patterns match, then
no actions run.
@@ -3112,7 +3179,7 @@ no actions run.
After processing all the rules that match the line (and perhaps there are none),
@command{awk} reads the next line. (However,
@pxref{Next Statement},
-and also @pxref{Nextfile Statement}).
+and also @pxref{Nextfile Statement}.)
This continues until the program reaches the end of the file.
For example, the following @command{awk} program contains two rules:
@@ -3161,8 +3228,8 @@ features that haven't been covered yet, so don't worry if you don't
understand all the details:
@example
-LC_ALL=C ls -l | awk '$6 == "Nov" @{ sum += $5 @}
- END @{ print sum @}'
+ls -l | awk '$6 == "Nov" @{ sum += $5 @}
+ END @{ print sum @}'
@end example
@cindex @command{ls} utility
@@ -3186,13 +3253,12 @@ the file was last modified. Its output looks like this:
@noindent
@cindex line continuations, with C shell
The first field contains read-write permissions, the second field contains
-the number of links to the file, and the third field identifies the owner of
-the file. The fourth field identifies the group of the file.
-The fifth field contains the size of the file in bytes. The
+the number of links to the file, and the third field identifies the file's owner.
+The fourth field identifies the file's group.
+The fifth field contains the file's size in bytes. The
sixth, seventh, and eighth fields contain the month, day, and time,
respectively, that the file was last modified. Finally, the ninth field
-contains the @value{FN}.@footnote{The @samp{LC_ALL=C} is
-needed to produce this traditional-style output from @command{ls}.}
+contains the @value{FN}.
@c @cindex automatic initialization
@cindex initialization, automatic
@@ -3380,7 +3446,7 @@ and array sorting.
As we develop our presentation of the @command{awk} language, we introduce
most of the variables and many of the functions. They are described
-systematically in @ref{Built-in Variables}, and
+systematically in @ref{Built-in Variables}, and in
@ref{Built-in}.
@node When
@@ -3415,9 +3481,7 @@ eight-bit microprocessors,
and a microcode assembler for a special-purpose Prolog
computer.
While the original @command{awk}'s capabilities were strained by tasks
-of such complexity, modern versions are more capable. Even BWK @command{awk}
-has fewer predefined limits, and those
-that it has are much larger than they used to be.
+of such complexity, modern versions are more capable.
@cindex @command{awk} programs, complex
If you find yourself writing @command{awk} scripts of more than, say,
@@ -3431,11 +3495,16 @@ and Perl.}
@node Intro Summary
@section Summary
+@c FIXME: Review this chapter for summary of builtin functions called.
@itemize @value{BULLET}
@item
Programs in @command{awk} consist of @var{pattern}-@var{action} pairs.
@item
+An @var{action} without a @var{pattern} always runs. The default
+@var{action} for a pattern without one is @samp{@{ print $0 @}}.
+
+@item
Use either
@samp{awk '@var{program}' @var{files}}
or
@@ -3593,13 +3662,13 @@ The @option{-v} option can only set one variable, but it can be used
more than once, setting another variable each time, like this:
@samp{awk @w{-v foo=1} @w{-v bar=2} @dots{}}.
-@cindex built-in variables, @code{-v} option@comma{} setting with
-@cindex variables, built-in, @code{-v} option@comma{} setting with
+@cindex predefined variables, @code{-v} option@comma{} setting with
+@cindex variables, predefined @code{-v} option@comma{} setting with
@quotation CAUTION
Using @option{-v} to set the values of the built-in
variables may lead to surprising results. @command{awk} will reset the
values of those variables as it needs to, possibly ignoring any
-predefined value you may have given.
+initial value you may have given.
@end quotation
@item -W @var{gawk-opt}
@@ -3682,7 +3751,7 @@ Print the short version of the General Public License and then exit.
@cindex variables, global, printing list of
Print a sorted list of global variables, their types, and final values
to @var{file}. If no @var{file} is provided, print this
-list to the file named @file{awkvars.out} in the current directory.
+list to a file named @file{awkvars.out} in the current directory.
No space is allowed between the @option{-d} and @var{file}, if
@var{file} is supplied.
@@ -3778,7 +3847,7 @@ that @command{gawk} accepts and then exit.
@cindex @option{-i} option
@cindex @option{--include} option
@cindex @command{awk} programs, location of
-Read @command{awk} source library from @var{source-file}. This option
+Read an @command{awk} source library from @var{source-file}. This option
is completely equivalent to using the @code{@@include} directive inside
your program. This option is very similar to the @option{-f} option,
but there are two important differences. First, when @option{-i} is
@@ -3802,7 +3871,7 @@ environment variable. The correct library suffix for your platform will be
supplied by default, so it need not be specified in the extension name.
The extension initialization routine should be named @code{dl_load()}.
An alternative is to use the @code{@@load} keyword inside the program to load
-a shared library. This feature is described in detail in @ref{Dynamic Extensions}.
+a shared library. This advanced feature is described in detail in @ref{Dynamic Extensions}.
@item @option{-L}[@var{value}]
@itemx @option{--lint}[@code{=}@var{value}]
@@ -3851,6 +3920,8 @@ values in input data
@quotation CAUTION
This option can severely break old programs.
Use with care.
+
+This option may disappear in a future version of @command{gawk}.
@end quotation
@item @option{-N}
@@ -4014,6 +4085,7 @@ if they had been concatenated together into one big file. This is
useful for creating libraries of @command{awk} functions. These functions
can be written once and then retrieved from a standard place, instead
of having to be included into each individual program.
+The @option{-i} option is similar in this regard.
(As mentioned in
@ref{Definition Syntax},
function names must be unique.)
@@ -4087,15 +4159,18 @@ Any additional arguments on the command line are normally treated as
input files to be processed in the order specified. However, an
argument that has the form @code{@var{var}=@var{value}}, assigns
the value @var{value} to the variable @var{var}---it does not specify a
-file at all.
-(See
-@ref{Assignment Options}.)
+file at all. (See @ref{Assignment Options}.) In the following example,
+@var{count=1} is a variable assignment, not a @value{FN}:
+
+@example
+awk -f program.awk file1 count=1 file2
+@end example
@cindex @command{gawk}, @code{ARGIND} variable in
@cindex @code{ARGIND} variable, command-line arguments
@cindex @code{ARGV} array, indexing into
@cindex @code{ARGC}/@code{ARGV} variables, command-line arguments
-All these arguments are made available to your @command{awk} program in the
+All the command-line arguments are made available to your @command{awk} program in the
@code{ARGV} array (@pxref{Built-in Variables}). Command-line options
and the program text (if present) are omitted from @code{ARGV}.
All other arguments, including variable assignments, are
@@ -4103,6 +4178,11 @@ included. As each element of @code{ARGV} is processed, @command{gawk}
sets the variable @code{ARGIND} to the index in @code{ARGV} of the
current element.
+@c FIXME: One day, move the ARGC and ARGV node closer to here.
+Changing @code{ARGC} and @code{ARGV} in your @command{awk} program lets
+you control how @command{awk} processes the input files; this is described
+in more detail in @ref{ARGC and ARGV}.
+
@cindex input files, variable assignments and
@cindex variable assignments and input files
The distinction between @value{FN} arguments and variable-assignment
@@ -4221,15 +4301,15 @@ separated by colons@footnote{Semicolons on MS-Windows and MS-DOS.}. @command{ga
@samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk}
may use a different directory; it
will depend upon how @command{gawk} was built and installed. The actual
-directory is the value of @samp{$(datadir)} generated when
+directory is the value of @code{$(datadir)} generated when
@command{gawk} was configured. You probably don't need to worry about this,
though.}
The search path feature is particularly helpful for building libraries
of useful @command{awk} functions. The library files can be placed in a
standard directory in the default path and then specified on
-the command line with a short @value{FN}. Otherwise, the full @value{FN}
-would have to be typed for each file.
+the command line with a short @value{FN}. Otherwise, you would have to
+type the full @value{FN} for each file.
By using the @option{-i} option, or the @option{-e} and @option{-f} options, your command-line
@command{awk} programs can use facilities in @command{awk} library files
@@ -4238,25 +4318,23 @@ Path searching is not done if @command{gawk} is in compatibility mode.
This is true for both @option{--traditional} and @option{--posix}.
@xref{Options}.
-If the source code is not found after the initial search, the path is searched
+If the source code file is not found after the initial search, the path is searched
again after adding the default @samp{.awk} suffix to the @value{FN}.
-@quotation NOTE
-@c 4/2014:
-@c using @samp{.} to get quotes, since @file{} no longer supplies them.
-To include
-the current directory in the path, either place
-@samp{.} explicitly in the path or write a null entry in the
-path. (A null entry is indicated by starting or ending the path with a
-colon or by placing two colons next to each other [@samp{::}].)
-This path search mechanism is similar
+@command{gawk}'s path search mechanism is similar
to the shell's.
(See @uref{http://www.gnu.org/software/bash/manual/,
-@cite{The Bourne-Again SHell manual}.})
+@cite{The Bourne-Again SHell manual}}.)
+It treats a null entry in the path as indicating the current
+directory.
+(A null entry is indicated by starting or ending the path with a
+colon or by placing two colons next to each other [@samp{::}].)
-However, @command{gawk} always looks in the current directory @emph{before}
-searching @env{AWKPATH}, so there is no real reason to include
-the current directory in the search path.
+@quotation NOTE
+@command{gawk} always looks in the current directory @emph{before}
+searching @env{AWKPATH}. Thus, while you can include the current directory
+in the search path, either explicitly or with a null entry, there is no
+real reason to do so.
@c Prior to 4.0, gawk searched the current directory after the
@c path search, but it's not worth documenting it.
@end quotation
@@ -4297,16 +4375,6 @@ behavior, but they are more specialized. Those in the following
list are meant to be used by regular users.
@table @env
-@item POSIXLY_CORRECT
-Causes @command{gawk} to switch to POSIX compatibility
-mode, disabling all traditional and GNU extensions.
-@xref{Options}.
-
-@item GAWK_SOCK_RETRIES
-Controls the number of times @command{gawk} attempts to
-retry a two-way TCP/IP (socket) connection before giving up.
-@xref{TCP/IP Networking}.
-
@item GAWK_MSEC_SLEEP
Specifies the interval between connection retries,
in milliseconds. On systems that do not support
@@ -4317,6 +4385,16 @@ the value is rounded up to an integral number of seconds.
Specifies the time, in milliseconds, for @command{gawk} to
wait for input before returning with an error.
@xref{Read Timeout}.
+
+@item GAWK_SOCK_RETRIES
+Controls the number of times @command{gawk} attempts to
+retry a two-way TCP/IP (socket) connection before giving up.
+@xref{TCP/IP Networking}.
+
+@item POSIXLY_CORRECT
+Causes @command{gawk} to switch to POSIX compatibility
+mode, disabling all traditional and GNU extensions.
+@xref{Options}.
@end table
The environment variables in the following list are meant
@@ -4331,7 +4409,7 @@ file as the size of the memory buffer to allocate for I/O. Otherwise,
the value should be a number, and @command{gawk} uses that number as
the size of the buffer to allocate. (When this variable is not set,
@command{gawk} uses the smaller of the file's size and the ``default''
-blocksize, which is usually the filesystems I/O blocksize.)
+blocksize, which is usually the filesystem's I/O blocksize.)
@item AWK_HASH
If this variable exists with a value of @samp{gst}, @command{gawk}
@@ -4346,10 +4424,11 @@ for debugging problems on filesystems on non-POSIX operating systems
where I/O is performed in records, not in blocks.
@item GAWK_MSG_SRC
-If this variable exists, @command{gawk} includes the source file
-name and line number from which warning and/or fatal messages
+If this variable exists, @command{gawk} includes the file
+name and line number within the @command{gawk} source code
+from which warning and/or fatal messages
are generated. Its purpose is to help isolate the source of a
-message, since there can be multiple places which produce the
+message, since there are multiple places which produce the
same warning or error message.
@item GAWK_NO_DFA
@@ -4365,11 +4444,11 @@ This specifies the amount by which @command{gawk} should grow its
internal evaluation stack, when needed.
@item INT_CHAIN_MAX
-The average number of items @command{gawk} will maintain on a
+The intended maximum number of items @command{gawk} will maintain on a
hash chain for managing arrays indexed by integers.
@item STR_CHAIN_MAX
-The average number of items @command{gawk} will maintain on a
+The intended maximum number of items @command{gawk} will maintain on a
hash chain for managing arrays indexed by strings.
@item TIDYMEM
@@ -4404,6 +4483,9 @@ to @code{EXIT_FAILURE}.
This @value{SECTION} describes a feature that is specific to @command{gawk}.
+@cindex @code{@@include} directive
+@cindex file inclusion, @code{@@include} directive
+@cindex including files, @code{@@include} directive
The @code{@@include} keyword can be used to read external @command{awk} source
files. This gives you the ability to split large @command{awk} source files
into smaller, more manageable pieces, and also lets you reuse common @command{awk}
@@ -4439,8 +4521,8 @@ produces the following result:
@example
$ @kbd{gawk -f test2}
-@print{} This is file test1.
-@print{} This is file test2.
+@print{} This is script test1.
+@print{} This is script test2.
@end example
@code{gawk} runs the @file{test2} script which includes @file{test1}
@@ -4470,9 +4552,9 @@ following results:
@example
$ @kbd{gawk -f test3}
-@print{} This is file test1.
-@print{} This is file test2.
-@print{} This is file test3.
+@print{} This is script test1.
+@print{} This is script test2.
+@print{} This is script test3.
@end example
The @value{FN} can, of course, be a pathname. For example:
@@ -4523,6 +4605,9 @@ and this also applies to files named with @code{@@include}.
This @value{SECTION} describes a feature that is specific to @command{gawk}.
+@cindex @code{@@load} directive
+@cindex loading extensions, @code{@@load} directive
+@cindex extensions, loading, @code{@@load} directive
The @code{@@load} keyword can be used to read external @command{awk} extensions
(stored as system shared libraries).
This allows you to link in compiled code that may offer superior
@@ -4556,6 +4641,7 @@ that requires access to an extension.
@ref{Dynamic Extensions}, describes how to write extensions (in C or C++)
that can be loaded with either @code{@@load} or the @option{-l} option.
+It also describes the @code{ordchr} extension.
@node Obsolete
@section Obsolete Options and/or Features
@@ -4624,15 +4710,15 @@ awk '@{ sum += $1 @} END @{ print sum @}'
@end example
@command{gawk} actually supports this but it is purposely undocumented
-because it is considered bad style. The correct way to write such a program
-is either
+because it is bad style. The correct way to write such a program
+is either:
@example
awk '@{ sum += $1 @} ; END @{ print sum @}'
@end example
@noindent
-or
+or:
@example
awk '@{ sum += $1 @}
@@ -4640,8 +4726,7 @@ awk '@{ sum += $1 @}
@end example
@noindent
-@xref{Statements/Lines}, for a fuller
-explanation.
+@xref{Statements/Lines}, for a fuller explanation.
You can insert newlines after the @samp{;} in @code{for} loops.
This seems to have been a long-undocumented feature in Unix @command{awk}.
@@ -4681,7 +4766,8 @@ affects how @command{awk} processes input.
@item
You can use a single minus sign (@samp{-}) to refer to standard input
-on the command line.
+on the command line. @command{gawk} also lets you use the special
+@value{FN} @file{/dev/stdin}.
@item
@command{gawk} pays attention to a number of environment variables.
@@ -4724,7 +4810,7 @@ The simplest regular expression is a sequence of letters, numbers, or
both. Such a regexp matches any string that contains that sequence.
Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
Therefore, the pattern @code{/foo/} matches any input record containing
-the three characters @samp{foo} @emph{anywhere} in the record. Other
+the three adjacent characters @samp{foo} @emph{anywhere} in the record. Other
kinds of regexps let you specify more complicated classes of strings.
@ifnotinfo
@@ -4738,10 +4824,10 @@ regular expressions work, we present more complicated instances.
* Escape Sequences:: How to write nonprinting characters.
* Regexp Operators:: Regular Expression Operators.
* Bracket Expressions:: What can go between @samp{[...]}.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
* Leftmost Longest:: How much text matches.
* Computed Regexps:: Using Dynamic Regexps.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
* Regexp Summary:: Regular expressions summary.
@end menu
@@ -4870,7 +4956,7 @@ such as TAB or newline. While there is nothing to stop you from entering most
unprintable characters directly in a string constant or regexp constant,
they may look ugly.
-The following table lists
+The following list presents
all the escape sequences used in @command{awk} and
what they represent. Unless noted otherwise, all these escape
sequences apply to both string constants and regexp constants:
@@ -4952,8 +5038,11 @@ However, using more than two hexadecimal digits produces
@item \/
A literal slash (necessary for regexp constants only).
This sequence is used when you want to write a regexp
-constant that contains a slash. Because the regexp is delimited by
-slashes, you need to escape the slash that is part of the pattern,
+constant that contains a slash
+(such as @code{/.*:\/home\/[[:alnum:]]+:.*/}; the @samp{[[:alnum:]]}
+notation is discussed shortly, in @ref{Bracket Expressions}).
+Because the regexp is delimited by
+slashes, you need to escape any slash that is part of the pattern,
in order to tell @command{awk} to keep processing the rest of the regexp.
@cindex @code{\} (backslash), @code{\"} escape sequence
@@ -4961,8 +5050,10 @@ in order to tell @command{awk} to keep processing the rest of the regexp.
@item \"
A literal double quote (necessary for string constants only).
This sequence is used when you want to write a string
-constant that contains a double quote. Because the string is delimited by
-double quotes, you need to escape the quote that is part of the string,
+constant that contains a double quote
+(such as @code{"He said \"hi!\" to her."}).
+Because the string is delimited by
+double quotes, you need to escape any quote that is part of the string,
in order to tell @command{awk} to keep processing the rest of the string.
@end table
@@ -4981,13 +5072,13 @@ characters @samp{a+b}.
@cindex @code{\} (backslash), in escape sequences
@cindex portability
For complete portability, do not use a backslash before any character not
-shown in the previous list.
+shown in the previous list and that is not an operator.
To summarize:
@itemize @value{BULLET}
@item
-The escape sequences in the table above are always processed first,
+The escape sequences in the list above are always processed first,
for both string constants and regexp constants. This happens very early,
as soon as @command{awk} reads your program.
@@ -5154,13 +5245,13 @@ The escape sequences described
@ifnotinfo
earlier
@end ifnotinfo
-in @ref{Escape Sequences},
+in @DBREF{Escape Sequences}
are valid inside a regexp. They are introduced by a @samp{\} and
are recognized and converted into corresponding real characters as
the very first step in processing regexps.
Here is a list of metacharacters. All characters that are not escape
-sequences and that are not listed in the table stand for themselves:
+sequences and that are not listed in the following stand for themselves:
@c Use @asis so the docbook comes out ok. Sigh.
@table @asis
@@ -5217,10 +5308,10 @@ with @samp{A}.
@cindex POSIX @command{awk}, period (@code{.})@comma{} using
In strict POSIX mode (@pxref{Options}),
-@samp{.} does not match the @sc{nul}
+@samp{.} does not match the @value{NUL}
character, which is a character with all bits equal to zero.
-Otherwise, @sc{nul} is just another character. Other versions of @command{awk}
-may not be able to match the @sc{nul} character.
+Otherwise, @value{NUL} is just another character. Other versions of @command{awk}
+may not be able to match the @value{NUL} character.
@cindex @code{[]} (square brackets), regexp operator
@cindex square brackets (@code{[]}), regexp operator
@@ -5251,12 +5342,11 @@ or @samp{k}.
@cindex vertical bar (@code{|})
@item @code{|}
This is the @dfn{alternation operator} and it is used to specify
-alternatives.
-The @samp{|} has the lowest precedence of all the regular
-expression operators.
-For example, @samp{^P|[[:digit:]]}
-matches any string that matches either @samp{^P} or @samp{[[:digit:]]}. This
-means it matches any string that starts with @samp{P} or contains a digit.
+alternatives. The @samp{|} has the lowest precedence of all the regular
+expression operators. For example, @samp{^P|[aeiouy]} matches any string
+that matches either @samp{^P} or @samp{[aeiouy]}. This means it matches
+any string that starts with @samp{P} or contains (anywhere within it)
+a lowercase English vowel.
The alternation applies to the largest possible regexps on either side.
@@ -5280,14 +5370,15 @@ applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
of one @samp{p} followed by any number of @samp{h}s. This also matches
just @samp{p} if no @samp{h}s are present.
-The @samp{*} repeats the @emph{smallest} possible preceding expression.
-(Use parentheses if you want to repeat a larger expression.) It finds
-as many repetitions as possible. For example,
-@samp{awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample}
-prints every record in @file{sample} containing a string of the form
-@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.
-Notice the escaping of the parentheses by preceding them
-with backslashes.
+There are two subtle points to understand about how @samp{*} works.
+First, the @samp{*} applies only to the single preceding regular expression
+component (e.g., in @samp{ph*}, it applies just to the @samp{h}).
+To cause @samp{*} to apply to a larger sub-expression, use parentheses:
+@samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph} and so on.
+
+Second, @samp{*} finds as many repetitions as possible. If the text
+to be matched is @samp{phhhhhhhhhhhhhhooey}, @samp{ph*} matches all of
+the @samp{h}s.
@cindex @code{+} (plus sign), regexp operator
@cindex plus sign (@code{+}), regexp operator
@@ -5296,12 +5387,6 @@ This symbol is similar to @samp{*}, except that the preceding expression must be
matched at least once. This means that @samp{wh+y}
would match @samp{why} and @samp{whhy}, but not @samp{wy}, whereas
@samp{wh*y} would match all three.
-The following is a simpler
-way of writing the last @samp{*} example:
-
-@example
-awk '/\(c[ad]+r x\)/ @{ print @}' sample
-@end example
@cindex @code{?} (question mark), regexp operator
@cindex question mark (@code{?}), regexp operator
@@ -5396,7 +5481,7 @@ Within a bracket expression, a @dfn{range expression} consists of two
characters separated by a hyphen. It matches any single character that
sorts between the two characters, based upon the system's native character
set. For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
-(See @ref{Ranges and Locales}, for an explanation of how the POSIX
+(See @DBREF{Ranges and Locales} for an explanation of how the POSIX
standard and @command{gawk} have changed over time. This is mainly
of historical interest.)
@@ -5415,12 +5500,15 @@ bracket expression, put a @samp{\} in front of it. For example:
@noindent
matches either @samp{d} or @samp{]}.
+Additionally, if you place @samp{]} right after the opening
+@samp{[}, the closing bracket is treated as one of the
+characters to be matched.
@cindex POSIX @command{awk}, bracket expressions and
@cindex Extended Regular Expressions (EREs)
@cindex EREs (Extended Regular Expressions)
@cindex @command{egrep} utility
-This treatment of @samp{\} in bracket expressions
+The treatment of @samp{\} in bracket expressions
is compatible with other @command{awk}
implementations and is also mandated by POSIX.
The regular expressions in @command{awk} are a superset
@@ -5526,6 +5614,204 @@ they do not recognize collating symbols or equivalence classes.
@c maybe one day ...
@c ENDOFRANGE charlist
+@node Leftmost Longest
+@section How Much Text Matches?
+
+@cindex regular expressions, leftmost longest match
+@c @cindex matching, leftmost longest
+Consider the following:
+
+@example
+echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
+@end example
+
+This example uses the @code{sub()} function to make a change to the input
+record. (@code{sub()} replaces the first instance of any text matched
+by the first argument with the string provided as the second argument;
+@pxref{String Functions}). Here, the regexp @code{/a+/} indicates ``one
+or more @samp{a} characters,'' and the replacement text is @samp{<A>}.
+
+The input contains four @samp{a} characters.
+@command{awk} (and POSIX) regular expressions always match
+the leftmost, @emph{longest} sequence of input characters that can
+match. Thus, all four @samp{a} characters are
+replaced with @samp{<A>} in this example:
+
+@example
+$ @kbd{echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'}
+@print{} <A>bcd
+@end example
+
+For simple match/no-match tests, this is not so important. But when doing
+text matching and substitutions with the @code{match()}, @code{sub()}, @code{gsub()},
+and @code{gensub()} functions, it is very important.
+@ifinfo
+@xref{String Functions},
+for more information on these functions.
+@end ifinfo
+Understanding this principle is also important for regexp-based record
+and field splitting (@pxref{Records},
+and also @pxref{Field Separators}).
+
+@node Computed Regexps
+@section Using Dynamic Regexps
+
+@c STARTOFRANGE dregexp
+@cindex regular expressions, computed
+@c STARTOFRANGE regexpd
+@cindex regular expressions, dynamic
+@cindex @code{~} (tilde), @code{~} operator
+@cindex tilde (@code{~}), @code{~} operator
+@cindex @code{!} (exclamation point), @code{!~} operator
+@cindex exclamation point (@code{!}), @code{!~} operator
+@c @cindex operators, @code{~}
+@c @cindex operators, @code{!~}
+The righthand side of a @samp{~} or @samp{!~} operator need not be a
+regexp constant (i.e., a string of characters between slashes). It may
+be any expression. The expression is evaluated and converted to a string
+if necessary; the contents of the string are then used as the
+regexp. A regexp computed in this way is called a @dfn{dynamic
+regexp} or a @dfn{computed regexp}:
+
+@example
+BEGIN @{ digits_regexp = "[[:digit:]]+" @}
+$0 ~ digits_regexp @{ print @}
+@end example
+
+@noindent
+This sets @code{digits_regexp} to a regexp that describes one or more digits,
+and tests whether the input record matches this regexp.
+
+@quotation NOTE
+When using the @samp{~} and @samp{!~}
+operators, there is a difference between a regexp constant
+enclosed in slashes and a string constant enclosed in double quotes.
+If you are going to use a string constant, you have to understand that
+the string is, in essence, scanned @emph{twice}: the first time when
+@command{awk} reads your program, and the second time when it goes to
+match the string on the lefthand side of the operator with the pattern
+on the right. This is true of any string-valued expression (such as
+@code{digits_regexp}, shown previously), not just string constants.
+@end quotation
+
+@cindex regexp constants, slashes vs.@: quotes
+@cindex @code{\} (backslash), in regexp constants
+@cindex backslash (@code{\}), in regexp constants
+@cindex @code{"} (double quote), in regexp constants
+@cindex double quote (@code{"}), in regexp constants
+What difference does it make if the string is
+scanned twice? The answer has to do with escape sequences, and particularly
+with backslashes. To get a backslash into a regular expression inside a
+string, you have to type two backslashes.
+
+For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
+Only one backslash is needed. To do the same thing with a string,
+you have to type @code{"\\*"}. The first backslash escapes the
+second one so that the string actually contains the
+two characters @samp{\} and @samp{*}.
+
+@cindex troubleshooting, regexp constants vs.@: string constants
+@cindex regexp constants, vs.@: string constants
+@cindex string constants, vs.@: regexp constants
+Given that you can use both regexp and string constants to describe
+regular expressions, which should you use? The answer is ``regexp
+constants,'' for several reasons:
+
+@itemize @value{BULLET}
+@item
+String constants are more complicated to write and
+more difficult to read. Using regexp constants makes your programs
+less error-prone. Not understanding the difference between the two
+kinds of constants is a common source of errors.
+
+@item
+It is more efficient to use regexp constants. @command{awk} can note
+that you have supplied a regexp and store it internally in a form that
+makes pattern matching more efficient. When using a string constant,
+@command{awk} must first convert the string into this internal form and
+then perform the pattern matching.
+
+@item
+Using regexp constants is better form; it shows clearly that you
+intend a regexp match.
+@end itemize
+
+@cindex sidebar, Using @code{\n} in Bracket Expressions of Dynamic Regexps
+@ifdocbook
+@docbook
+<sidebar><title>Using @code{\n} in Bracket Expressions of Dynamic Regexps</title>
+@end docbook
+
+@cindex regular expressions, dynamic, with embedded newlines
+@cindex newlines, in dynamic regexps
+
+Some older versions of @command{awk} do not allow the newline
+character to be used inside a bracket expression for a dynamic regexp:
+
+@example
+$ @kbd{awk '$0 ~ "[ \t\n]"'}
+@error{} awk: newline in character class [
+@error{} ]...
+@error{} source line number 1
+@error{} context is
+@error{} $0 ~ "[ >>> \t\n]" <<<
+@end example
+
+@cindex newlines, in regexp constants
+But a newline in a regexp constant works with no problem:
+
+@example
+$ @kbd{awk '$0 ~ /[ \t\n]/'}
+@kbd{here is a sample line}
+@print{} here is a sample line
+@kbd{Ctrl-d}
+@end example
+
+@command{gawk} does not have this problem, and it isn't likely to
+occur often in practice, but it's worth noting for future reference.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Using @code{\n} in Bracket Expressions of Dynamic Regexps}
+
+
+@cindex regular expressions, dynamic, with embedded newlines
+@cindex newlines, in dynamic regexps
+
+Some older versions of @command{awk} do not allow the newline
+character to be used inside a bracket expression for a dynamic regexp:
+
+@example
+$ @kbd{awk '$0 ~ "[ \t\n]"'}
+@error{} awk: newline in character class [
+@error{} ]...
+@error{} source line number 1
+@error{} context is
+@error{} $0 ~ "[ >>> \t\n]" <<<
+@end example
+
+@cindex newlines, in regexp constants
+But a newline in a regexp constant works with no problem:
+
+@example
+$ @kbd{awk '$0 ~ /[ \t\n]/'}
+@kbd{here is a sample line}
+@print{} here is a sample line
+@kbd{Ctrl-d}
+@end example
+
+@command{gawk} does not have this problem, and it isn't likely to
+occur often in practice, but it's worth noting for future reference.
+@end cartouche
+@end ifnotdocbook
+@c ENDOFRANGE dregexp
+@c ENDOFRANGE regexpd
+
@node GNU Regexp Operators
@section @command{gawk}-Specific Regexp Operators
@@ -5689,7 +5975,7 @@ are allowed.
Traditional Unix @command{awk} regexps are matched. The GNU operators
are not special, and interval expressions are not available.
The POSIX character classes (@samp{[[:alnum:]]}, etc.) are supported,
-as BWK @command{awk} does support them.
+as BWK @command{awk} supports them.
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
@@ -5801,204 +6087,6 @@ Case is always significant in compatibility mode.
@c ENDOFRANGE csregexp
@c ENDOFRANGE regexpcs
-@node Leftmost Longest
-@section How Much Text Matches?
-
-@cindex regular expressions, leftmost longest match
-@c @cindex matching, leftmost longest
-Consider the following:
-
-@example
-echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
-@end example
-
-This example uses the @code{sub()} function (which we haven't discussed yet;
-@pxref{String Functions})
-to make a change to the input record. Here, the regexp @code{/a+/}
-indicates ``one or more @samp{a} characters,'' and the replacement
-text is @samp{<A>}.
-
-The input contains four @samp{a} characters.
-@command{awk} (and POSIX) regular expressions always match
-the leftmost, @emph{longest} sequence of input characters that can
-match. Thus, all four @samp{a} characters are
-replaced with @samp{<A>} in this example:
-
-@example
-$ @kbd{echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'}
-@print{} <A>bcd
-@end example
-
-For simple match/no-match tests, this is not so important. But when doing
-text matching and substitutions with the @code{match()}, @code{sub()}, @code{gsub()},
-and @code{gensub()} functions, it is very important.
-@ifinfo
-@xref{String Functions},
-for more information on these functions.
-@end ifinfo
-Understanding this principle is also important for regexp-based record
-and field splitting (@pxref{Records},
-and also @pxref{Field Separators}).
-
-@node Computed Regexps
-@section Using Dynamic Regexps
-
-@c STARTOFRANGE dregexp
-@cindex regular expressions, computed
-@c STARTOFRANGE regexpd
-@cindex regular expressions, dynamic
-@cindex @code{~} (tilde), @code{~} operator
-@cindex tilde (@code{~}), @code{~} operator
-@cindex @code{!} (exclamation point), @code{!~} operator
-@cindex exclamation point (@code{!}), @code{!~} operator
-@c @cindex operators, @code{~}
-@c @cindex operators, @code{!~}
-The righthand side of a @samp{~} or @samp{!~} operator need not be a
-regexp constant (i.e., a string of characters between slashes). It may
-be any expression. The expression is evaluated and converted to a string
-if necessary; the contents of the string are then used as the
-regexp. A regexp computed in this way is called a @dfn{dynamic
-regexp} or a @dfn{computed regexp}:
-
-@example
-BEGIN @{ digits_regexp = "[[:digit:]]+" @}
-$0 ~ digits_regexp @{ print @}
-@end example
-
-@noindent
-This sets @code{digits_regexp} to a regexp that describes one or more digits,
-and tests whether the input record matches this regexp.
-
-@quotation NOTE
-When using the @samp{~} and @samp{!~}
-operators, there is a difference between a regexp constant
-enclosed in slashes and a string constant enclosed in double quotes.
-If you are going to use a string constant, you have to understand that
-the string is, in essence, scanned @emph{twice}: the first time when
-@command{awk} reads your program, and the second time when it goes to
-match the string on the lefthand side of the operator with the pattern
-on the right. This is true of any string-valued expression (such as
-@code{digits_regexp}, shown previously), not just string constants.
-@end quotation
-
-@cindex regexp constants, slashes vs.@: quotes
-@cindex @code{\} (backslash), in regexp constants
-@cindex backslash (@code{\}), in regexp constants
-@cindex @code{"} (double quote), in regexp constants
-@cindex double quote (@code{"}), in regexp constants
-What difference does it make if the string is
-scanned twice? The answer has to do with escape sequences, and particularly
-with backslashes. To get a backslash into a regular expression inside a
-string, you have to type two backslashes.
-
-For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
-Only one backslash is needed. To do the same thing with a string,
-you have to type @code{"\\*"}. The first backslash escapes the
-second one so that the string actually contains the
-two characters @samp{\} and @samp{*}.
-
-@cindex troubleshooting, regexp constants vs.@: string constants
-@cindex regexp constants, vs.@: string constants
-@cindex string constants, vs.@: regexp constants
-Given that you can use both regexp and string constants to describe
-regular expressions, which should you use? The answer is ``regexp
-constants,'' for several reasons:
-
-@itemize @value{BULLET}
-@item
-String constants are more complicated to write and
-more difficult to read. Using regexp constants makes your programs
-less error-prone. Not understanding the difference between the two
-kinds of constants is a common source of errors.
-
-@item
-It is more efficient to use regexp constants. @command{awk} can note
-that you have supplied a regexp and store it internally in a form that
-makes pattern matching more efficient. When using a string constant,
-@command{awk} must first convert the string into this internal form and
-then perform the pattern matching.
-
-@item
-Using regexp constants is better form; it shows clearly that you
-intend a regexp match.
-@end itemize
-
-@cindex sidebar, Using @code{\n} in Bracket Expressions of Dynamic Regexps
-@ifdocbook
-@docbook
-<sidebar><title>Using @code{\n} in Bracket Expressions of Dynamic Regexps</title>
-@end docbook
-
-@cindex regular expressions, dynamic, with embedded newlines
-@cindex newlines, in dynamic regexps
-
-Some versions of @command{awk} do not allow the newline
-character to be used inside a bracket expression for a dynamic regexp:
-
-@example
-$ @kbd{awk '$0 ~ "[ \t\n]"'}
-@error{} awk: newline in character class [
-@error{} ]...
-@error{} source line number 1
-@error{} context is
-@error{} >>> <<<
-@end example
-
-@cindex newlines, in regexp constants
-But a newline in a regexp constant works with no problem:
-
-@example
-$ @kbd{awk '$0 ~ /[ \t\n]/'}
-@kbd{here is a sample line}
-@print{} here is a sample line
-@kbd{Ctrl-d}
-@end example
-
-@command{gawk} does not have this problem, and it isn't likely to
-occur often in practice, but it's worth noting for future reference.
-
-@docbook
-</sidebar>
-@end docbook
-@end ifdocbook
-
-@ifnotdocbook
-@cartouche
-@center @b{Using @code{\n} in Bracket Expressions of Dynamic Regexps}
-
-
-@cindex regular expressions, dynamic, with embedded newlines
-@cindex newlines, in dynamic regexps
-
-Some versions of @command{awk} do not allow the newline
-character to be used inside a bracket expression for a dynamic regexp:
-
-@example
-$ @kbd{awk '$0 ~ "[ \t\n]"'}
-@error{} awk: newline in character class [
-@error{} ]...
-@error{} source line number 1
-@error{} context is
-@error{} >>> <<<
-@end example
-
-@cindex newlines, in regexp constants
-But a newline in a regexp constant works with no problem:
-
-@example
-$ @kbd{awk '$0 ~ /[ \t\n]/'}
-@kbd{here is a sample line}
-@print{} here is a sample line
-@kbd{Ctrl-d}
-@end example
-
-@command{gawk} does not have this problem, and it isn't likely to
-occur often in practice, but it's worth noting for future reference.
-@end cartouche
-@end ifnotdocbook
-@c ENDOFRANGE dregexp
-@c ENDOFRANGE regexpd
-
@node Regexp Summary
@section Summary
@@ -6028,20 +6116,20 @@ Within bracket expressions, POSIX character classes let you specify
certain groups of characters in a locale-independent fashion.
@item
-@command{gawk}'s @code{IGNORECASE} variable lets you control the
-case sensitivity of regexp matching. In other @command{awk}
-versions, use @code{tolower()} or @code{toupper()}.
-
-@item
Regular expressions match the leftmost longest text in the string being
matched. This matters for cases where you need to know the extent of
the match, such as for text substitution and when the record separator
is a regexp.
@item
-Matching expressions may use dynamic regexps; that is, string values
+Matching expressions may use dynamic regexps, that is, string values
treated as regular expressions.
+@item
+@command{gawk}'s @code{IGNORECASE} variable lets you control the
+case sensitivity of regexp matching. In other @command{awk}
+versions, use @code{tolower()} or @code{toupper()}.
+
@end itemize
@c ENDOFRANGE regexp
@@ -6060,7 +6148,7 @@ standard input (by default, this is the keyboard, but often it is a pipe from an
command) or from files whose names you specify on the @command{awk}
command line. If you specify input files, @command{awk} reads them
in order, processing all the data from one before going on to the next.
-The name of the current input file can be found in the built-in variable
+The name of the current input file can be found in the predefined variable
@code{FILENAME}
(@pxref{Built-in Variables}).
@@ -6106,16 +6194,13 @@ used with it do not have to be named on the @command{awk} command line
@cindex records, splitting input into
@cindex @code{NR} variable
@cindex @code{FNR} variable
-The @command{awk} utility divides the input for your @command{awk}
-program into records and fields.
-@command{awk} keeps track of the number of records that have
-been read
-so far
-from the current input file. This value is stored in a
-built-in variable called @code{FNR}. It is reset to zero when a new
-file is started. Another built-in variable, @code{NR}, records the total
-number of input records read so far from all @value{DF}s. It starts at zero,
-but is never automatically reset to zero.
+@command{awk} divides the input for your program into records and fields.
+It keeps track of the number of records that have been read so far from
+the current input file. This value is stored in a predefined variable
+called @code{FNR} which is reset to zero every time a new file is started.
+Another predefined variable, @code{NR}, records the total number of input
+records read so far from all @value{DF}s. It starts at zero, but is
+never automatically reset to zero.
@menu
* awk split records:: How standard @command{awk} splits records.
@@ -6131,7 +6216,7 @@ Records are separated by a character called the @dfn{record separator}.
By default, the record separator is the newline character.
This is why records are, by default, single lines.
A different character can be used for the record separator by
-assigning the character to the built-in variable @code{RS}.
+assigning the character to the predefined variable @code{RS}.
@cindex newlines, as record separators
@cindex @code{RS} variable
@@ -6242,7 +6327,8 @@ Using an unusual character such as @samp{/} is more likely to
produce correct behavior in the majority of cases, but there
are no guarantees. The moral is: Know Your Data.
-There is one unusual case, that occurs when @command{gawk} is
+When using regular characters as the record separator,
+there is one unusual case that occurs when @command{gawk} is
being fully POSIX-compliant (@pxref{Options}).
Then, the following (extreme) pipeline prints a surprising @samp{1}:
@@ -6331,7 +6417,7 @@ $ @kbd{echo record 1 AAAA record 2 BBBB record 3 |}
@noindent
The square brackets delineate the contents of @code{RT}, letting you
-see the leading and trailing whitespace. The final value of @code{RT}
+see the leading and trailing whitespace. The final value of
@code{RT} is a newline.
@xref{Simple Sed}, for a more useful example
of @code{RS} as a regexp and @code{RT}.
@@ -6350,7 +6436,7 @@ metacharacters match the beginning and end of a @emph{string}, and not
the beginning and end of a @emph{line}. As a result, something like
@samp{RS = "^[[:upper:]]"} can only match at the beginning of a file.
This is because @command{gawk} views the input file as one long string
-that happens to contain newline characters in it.
+that happens to contain newline characters.
It is thus best to avoid anchor characters in the value of @code{RS}.
@end quotation
@@ -6360,7 +6446,7 @@ variable are @command{gawk} extensions; they are not available in
compatibility mode
(@pxref{Options}).
In compatibility mode, only the first character of the value of
-@code{RS} is used to determine the end of the record.
+@code{RS} determines the end of the record.
@cindex sidebar, @code{RS = "\0"} Is Not Portable
@ifdocbook
@@ -6375,7 +6461,7 @@ a value that you know doesn't occur in the input file. This is hard
to do in a general way, such that a program always works for arbitrary
input files.
-You might think that for text files, the @sc{nul} character, which
+You might think that for text files, the @value{NUL} character, which
consists of a character with all bits equal to zero, is a good
value to use for @code{RS} in this case:
@@ -6384,27 +6470,28 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
@end example
@cindex differences in @command{awk} and @command{gawk}, strings, storing
-@command{gawk} in fact accepts this, and uses the @sc{nul}
+@command{gawk} in fact accepts this, and uses the @value{NUL}
character for the record separator.
This works for certain special files, such as @file{/proc/environ} on
-GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
+GNU/Linux systems, where the @value{NUL} character is in fact the record separator.
However, this usage is @emph{not} portable
to most other @command{awk} implementations.
@cindex dark corner, strings, storing
Almost all other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings. C strings use the
-@sc{nul} character as the string terminator. In effect, this means that
+@value{NUL} character as the string terminator. In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER}
-It happens that recent versions of @command{mawk} can use the @sc{nul}
+It happens that recent versions of @command{mawk} can use the @value{NUL}
character as a record separator. However, this is a special case:
-@command{mawk} does not allow embedded @sc{nul} characters in strings.
+@command{mawk} does not allow embedded @value{NUL} characters in strings.
+(This may change in a future version of @command{mawk}.)
@cindex records, treating files as
@cindex treating files, as single records
-@xref{Readfile Function}, for an interesting, portable way to read
+@xref{Readfile Function}, for an interesting way to read
whole files. If you are using @command{gawk}, see @ref{Extension Sample
Readfile}, for another option.
@@ -6425,7 +6512,7 @@ a value that you know doesn't occur in the input file. This is hard
to do in a general way, such that a program always works for arbitrary
input files.
-You might think that for text files, the @sc{nul} character, which
+You might think that for text files, the @value{NUL} character, which
consists of a character with all bits equal to zero, is a good
value to use for @code{RS} in this case:
@@ -6434,27 +6521,28 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
@end example
@cindex differences in @command{awk} and @command{gawk}, strings, storing
-@command{gawk} in fact accepts this, and uses the @sc{nul}
+@command{gawk} in fact accepts this, and uses the @value{NUL}
character for the record separator.
This works for certain special files, such as @file{/proc/environ} on
-GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
+GNU/Linux systems, where the @value{NUL} character is in fact the record separator.
However, this usage is @emph{not} portable
to most other @command{awk} implementations.
@cindex dark corner, strings, storing
Almost all other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings. C strings use the
-@sc{nul} character as the string terminator. In effect, this means that
+@value{NUL} character as the string terminator. In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER}
-It happens that recent versions of @command{mawk} can use the @sc{nul}
+It happens that recent versions of @command{mawk} can use the @value{NUL}
character as a record separator. However, this is a special case:
-@command{mawk} does not allow embedded @sc{nul} characters in strings.
+@command{mawk} does not allow embedded @value{NUL} characters in strings.
+(This may change in a future version of @command{mawk}.)
@cindex records, treating files as
@cindex treating files, as single records
-@xref{Readfile Function}, for an interesting, portable way to read
+@xref{Readfile Function}, for an interesting way to read
whole files. If you are using @command{gawk}, see @ref{Extension Sample
Readfile}, for another option.
@end cartouche
@@ -6514,7 +6602,7 @@ field.
@cindex @code{NF} variable
@cindex fields, number of
-@code{NF} is a built-in variable whose value is the number of fields
+@code{NF} is a predefined variable whose value is the number of fields
in the current record. @command{awk} automatically updates the value
of @code{NF} each time it reads a record. No matter how many fields
there are, the last field in a record can be represented by @code{$NF}.
@@ -6536,15 +6624,11 @@ $ @kbd{awk '$1 ~ /li/ @{ print $0 @}' mail-list}
@noindent
This example prints each record in the file @file{mail-list} whose first
-field contains the string @samp{li}. The operator @samp{~} is called a
-@dfn{matching operator}
-(@pxref{Regexp Usage});
-it tests whether a string (here, the field @code{$1}) matches a given regular
-expression.
+field contains the string @samp{li}.
-By contrast, the following example
-looks for @samp{li} in @emph{the entire record} and prints the first
-field and the last field for each matching input record:
+By contrast, the following example looks for @samp{li} in @emph{the
+entire record} and prints the first and last fields for each matching
+input record:
@example
$ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list}
@@ -6607,7 +6691,7 @@ implementations may behave differently.)
As mentioned in @ref{Fields},
@command{awk} stores the current record's number of fields in the built-in
-variable @code{NF} (also @pxref{Built-in Variables}). The expression
+variable @code{NF} (also @pxref{Built-in Variables}). Thus, the expression
@code{$NF} is not a special feature---it is the direct consequence of
evaluating @code{NF} and using its value as a field number.
@@ -6667,8 +6751,8 @@ It is also possible to also assign contents to fields that are out
of range. For example:
@example
-$ awk '@{ $6 = ($5 + $4 + $3 + $2)
-> print $6 @}' inventory-shipped
+$ @kbd{awk '@{ $6 = ($5 + $4 + $3 + $2)}
+> @kbd{ print $6 @}' inventory-shipped}
@print{} 168
@print{} 297
@print{} 301
@@ -6757,7 +6841,7 @@ Here is an example:
@example
$ echo a b c d e f | awk '@{ print "NF =", NF;
-> NF = 3; print $0 @}'
+> NF = 3; print $0 @}'
@print{} NF = 6
@print{} a b c
@end example
@@ -6765,7 +6849,7 @@ $ echo a b c d e f | awk '@{ print "NF =", NF;
@cindex portability, @code{NF} variable@comma{} decrementing
@quotation CAUTION
Some versions of @command{awk} don't
-rebuild @code{$0} when @code{NF} is decremented. Caveat emptor.
+rebuild @code{$0} when @code{NF} is decremented.
@end quotation
Finally, there are times when it is convenient to force
@@ -6801,7 +6885,7 @@ record, exactly as it was read from the input. This includes
any leading or trailing whitespace, and the exact whitespace (or other
characters) that separate the fields.
-It is a not-uncommon error to try to change the field separators
+It is a common error to try to change the field separators
in a record simply by setting @code{FS} and @code{OFS}, and then
expecting a plain @samp{print} or @samp{print $0} to print the
modified record.
@@ -6826,7 +6910,7 @@ record, exactly as it was read from the input. This includes
any leading or trailing whitespace, and the exact whitespace (or other
characters) that separate the fields.
-It is a not-uncommon error to try to change the field separators
+It is a common error to try to change the field separators
in a record simply by setting @code{FS} and @code{OFS}, and then
expecting a plain @samp{print} or @samp{print $0} to print the
modified record.
@@ -6876,7 +6960,7 @@ is split into three fields: @samp{m}, @samp{@bullet{}g}, and
Note the leading spaces in the values of the second and third fields.
@cindex troubleshooting, @command{awk} uses @code{FS} not @code{IFS}
-The field separator is represented by the built-in variable @code{FS}.
+The field separator is represented by the predefined variable @code{FS}.
Shell programmers take note: @command{awk} does @emph{not} use the
name @code{IFS} that is used by the POSIX-compliant shells (such as
the Unix Bourne shell, @command{sh}, or Bash).
@@ -7030,9 +7114,10 @@ $ @kbd{echo ' a b c d' | awk '@{ print; $2 = $2; print @}'}
The first @code{print} statement prints the record as it was read,
with leading whitespace intact. The assignment to @code{$2} rebuilds
@code{$0} by concatenating @code{$1} through @code{$NF} together,
-separated by the value of @code{OFS}. Because the leading whitespace
-was ignored when finding @code{$1}, it is not part of the new @code{$0}.
-Finally, the last @code{print} statement prints the new @code{$0}.
+separated by the value of @code{OFS} (which is a space by default).
+Because the leading whitespace was ignored when finding @code{$1},
+it is not part of the new @code{$0}. Finally, the last @code{print}
+statement prints the new @code{$0}.
@cindex @code{FS}, containing @code{^}
@cindex @code{^} (caret), in @code{FS}
@@ -7054,7 +7139,7 @@ also works this way. For example:
@example
$ @kbd{echo 'xxAA xxBxx C' |}
> @kbd{gawk -F '(^x+)|( +)' '@{ for (i = 1; i <= NF; i++)}
-> @kbd{printf "-->%s<--\n", $i @}'}
+> @kbd{ printf "-->%s<--\n", $i @}'}
@print{} --><--
@print{} -->AA<--
@print{} -->xxBxx<--
@@ -7117,15 +7202,10 @@ awk -F, '@var{program}' @var{input-files}
@noindent
sets @code{FS} to the @samp{,} character. Notice that the option uses
an uppercase @samp{F} instead of a lowercase @samp{f}. The latter
-option (@option{-f}) specifies a file
-containing an @command{awk} program. Case is significant in command-line
-options:
-the @option{-F} and @option{-f} options have nothing to do with each other.
-You can use both options at the same time to set the @code{FS} variable
-@emph{and} get an @command{awk} program from a file.
+option (@option{-f}) specifies a file containing an @command{awk} program.
The value used for the argument to @option{-F} is processed in exactly the
-same way as assignments to the built-in variable @code{FS}.
+same way as assignments to the predefined variable @code{FS}.
Any special characters in the field separator must be escaped
appropriately. For example, to use a @samp{\} as the field separator
on the command line, you would have to type:
@@ -7236,7 +7316,7 @@ to @code{FS} (the backslash is stripped). This creates a regexp meaning
If instead you want fields to be separated by a literal period followed
by any single character, use @samp{FS = "\\.."}.
-The following table summarizes how fields are split, based on the value
+The following list summarizes how fields are split, based on the value
of @code{FS} (@samp{==} means ``is equal to''):
@table @code
@@ -7257,8 +7337,7 @@ Leading and trailing matches of @var{regexp} delimit empty fields.
@item FS == ""
Each individual character in the record becomes a separate field.
-(This is a @command{gawk} extension; it is not specified by the
-POSIX standard.)
+(This is a common extension; it is not specified by the POSIX standard.)
@end table
@cindex sidebar, Changing @code{FS} Does Not Affect the Fields
@@ -7805,7 +7884,7 @@ BEGIN @{ RS = "" ; FS = "\n" @}
Running the program produces the following output:
@example
-$ awk -f addrs.awk addresses
+$ @kbd{awk -f addrs.awk addresses}
@print{} Name is: Jane Doe
@print{} Address is: 123 Main Street
@print{} City and State are: Anywhere, SE 12345-6789
@@ -7817,12 +7896,9 @@ $ awk -f addrs.awk addresses
@dots{}
@end example
-@xref{Labels Program}, for a more realistic
-program that deals with address lists.
-The following
-table
-summarizes how records are split, based on the
-value of
+@xref{Labels Program}, for a more realistic program that deals with
+address lists. The following list summarizes how records are split,
+based on the value of
@ifinfo
@code{RS}.
(@samp{==} means ``is equal to.'')
@@ -7857,8 +7933,8 @@ POSIX standard.)
@cindex @command{gawk}, @code{RT} variable in
@cindex @code{RT} variable
-In all cases, @command{gawk} sets @code{RT} to the input text that matched the
-value specified by @code{RS}.
+If not in compatibility mode (@pxref{Options}), @command{gawk} sets
+@code{RT} to the input text that matched the value specified by @code{RS}.
But if the input file ended without any text that matches @code{RS},
then @command{gawk} sets @code{RT} to the null string.
@c ENDOFRANGE recm
@@ -7904,7 +7980,7 @@ and have a good knowledge of how @command{awk} works.
@cindex @code{getline} command, return values
@cindex @option{--sandbox} option, input redirection with @code{getline}
-The @code{getline} command returns one if it finds a record and zero if
+The @code{getline} command returns 1 if it finds a record and 0 if
it encounters the end of the file. If there is some error in getting
a record, such as a file that cannot be opened, then @code{getline}
returns @minus{}1. In this case, @command{gawk} sets the variable
@@ -7944,32 +8020,56 @@ finished processing the current record, but want to do some special
processing on the next record @emph{right now}. For example:
@example
+# Remove text between /* and */, inclusive
@{
- if ((t = index($0, "/*")) != 0) @{
- # value of `tmp' will be "" if t is 1
- tmp = substr($0, 1, t - 1)
- u = index(substr($0, t + 2), "*/")
- offset = t + 2
- while (u == 0) @{
- if (getline <= 0) @{
- m = "unexpected EOF or error"
- m = (m ": " ERRNO)
- print m > "/dev/stderr"
+ if ((i = index($0, "/*")) != 0) @{
+ out = substr($0, 1, i - 1) # leading part of the string
+ rest = substr($0, i + 2) # ... */ ...
+ j = index(rest, "*/") # is */ in trailing part?
+ if (j > 0) @{
+ rest = substr(rest, j + 2) # remove comment
+ @} else @{
+ while (j == 0) @{
+ # get more text
+ if (getline <= 0) @{
+ print("unexpected EOF or error:", ERRNO) > "/dev/stderr"
exit
- @}
- u = index($0, "*/")
- offset = 0
- @}
- # substr() expression will be "" if */
- # occurred at end of line
- $0 = tmp substr($0, offset + u + 2)
- @}
- print $0
+ @}
+ # build up the line using string concatenation
+ rest = rest $0
+ j = index(rest, "*/") # is */ in trailing part?
+ if (j != 0) @{
+ rest = substr(rest, j + 2)
+ break
+ @}
+ @}
+ @}
+ # build up the output line using string concatenation
+ $0 = out rest
+ @}
+ print $0
@}
@end example
+@c 8/2014: Here is some sample input:
+@ignore
+mon/*comment*/key
+rab/*commen
+t*/bit
+horse /*comment*/more text
+part 1 /*comment*/part 2 /*comment*/part 3
+no comment
+@end ignore
+
This @command{awk} program deletes C-style comments (@samp{/* @dots{}
-*/}) from the input. By replacing the @samp{print $0} with other
+*/}) from the input.
+It uses a number of features we haven't covered yet, including
+string concatenation
+(@pxref{Concatenation})
+and the @code{index()} and @code{substr()} built-in
+functions
+(@pxref{String Functions}).
+By replacing the @samp{print $0} with other
statements, you could perform more complicated processing on the
decommented input, such as searching for matches of a regular
expression. (This program has a subtle problem---it does not work if one
@@ -8091,7 +8191,7 @@ from the file
@var{file}, and put it in the variable @var{var}. As above, @var{file}
is a string-valued expression that specifies the file from which to read.
-In this version of @code{getline}, none of the built-in variables are
+In this version of @code{getline}, none of the predefined variables are
changed and the record is not split into fields. The only variable
changed is @var{var}.@footnote{This is not quite true. @code{RT} could
be changed if @code{RS} is a regular expression.}
@@ -8201,7 +8301,7 @@ bletch
@end example
@noindent
-Notice that this program ran the command @command{who} and printed the previous result.
+Notice that this program ran the command @command{who} and printed the result.
(If you try this program yourself, you will of course get different results,
depending upon who is logged in on your system.)
@@ -8226,7 +8326,7 @@ Unfortunately, @command{gawk} has not been consistent in its treatment
of a construct like @samp{@w{"echo "} "date" | getline}.
Most versions, including the current version, treat it at as
@samp{@w{("echo "} "date") | getline}.
-(This how BWK @command{awk} behaves.)
+(This is also how BWK @command{awk} behaves.)
Some versions changed and treated it as
@samp{@w{"echo "} ("date" | getline)}.
(This is how @command{mawk} behaves.)
@@ -8253,8 +8353,8 @@ BEGIN @{
@}
@end example
-In this version of @code{getline}, none of the built-in variables are
-changed and the record is not split into fields.
+In this version of @code{getline}, none of the predefined variables are
+changed and the record is not split into fields. However, @code{RT} is set.
@ifinfo
@c Thanks to Paul Eggert for initial wording here
@@ -8315,7 +8415,7 @@ When you use @samp{@var{command} |& getline @var{var}}, the output from
the coprocess @var{command} is sent through a two-way pipe to @code{getline}
and into the variable @var{var}.
-In this version of @code{getline}, none of the built-in variables are
+In this version of @code{getline}, none of the predefined variables are
changed and the record is not split into fields. The only variable
changed is @var{var}.
However, @code{RT} is set.
@@ -8362,7 +8462,7 @@ causes @command{awk} to set the value of @code{FILENAME}. Normally,
@code{FILENAME} does not have a value inside @code{BEGIN} rules, because you
have not yet started to process the command-line @value{DF}s.
@value{DARKCORNER}
-(@xref{BEGIN/END},
+(See @ref{BEGIN/END};
also @pxref{Auto-set}.)
@item
@@ -8376,7 +8476,7 @@ probably by accident, and you should reconsider what it is you're
trying to accomplish.
@item
-@ref{Getline Summary}, presents a table summarizing the
+@DBREF{Getline Summary} presents a table summarizing the
@code{getline} variants and which variables they can affect.
It is worth noting that those variants which do not use redirection
can cause @code{FILENAME} to be updated if they cause
@@ -8409,7 +8509,7 @@ end of file is encountered, before the element in @code{a} is assigned?
@command{gawk} treats @code{getline} like a function call, and evaluates
the expression @samp{a[++c]} before attempting to read from @file{f}.
However, some versions of @command{awk} only evaluate the expression once they
-know that there is a string value to be assigned. Caveat Emptor.
+know that there is a string value to be assigned.
@end itemize
@node Getline Summary
@@ -8418,22 +8518,22 @@ know that there is a string value to be assigned. Caveat Emptor.
@ref{table-getline-variants}
summarizes the eight variants of @code{getline},
-listing which built-in variables are set by each one,
+listing which predefined variables are set by each one,
and whether the variant is standard or a @command{gawk} extension.
-Note: for each variant, @command{gawk} sets the @code{RT} built-in variable.
+Note: for each variant, @command{gawk} sets the @code{RT} predefined variable.
@float Table,table-getline-variants
@caption{@code{getline} Variants and What They Set}
@multitable @columnfractions .33 .38 .27
-@headitem Variant @tab Effect @tab Standard / Extension
-@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, @code{NR}, and @code{RT} @tab Standard
-@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, @code{NR}, and @code{RT} @tab Standard
-@item @code{getline <} @var{file} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Standard
-@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} and @code{RT} @tab Standard
-@item @var{command} @code{| getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Standard
-@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} and @code{RT} @tab Standard
-@item @var{command} @code{|& getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Extension
-@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab Extension
+@headitem Variant @tab Effect @tab @command{awk} / @command{gawk}
+@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, @code{NR}, and @code{RT} @tab @command{awk}
+@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, @code{NR}, and @code{RT} @tab @command{awk}
+@item @code{getline <} @var{file} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{awk}
+@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} and @code{RT} @tab @command{awk}
+@item @var{command} @code{| getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{awk}
+@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{awk}
+@item @var{command} @code{|& getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab @command{gawk}
+@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{gawk}
@end multitable
@end float
@c ENDOFRANGE getl
@@ -8450,7 +8550,7 @@ This @value{SECTION} describes a feature that is specific to @command{gawk}.
You may specify a timeout in milliseconds for reading input from the keyboard,
a pipe, or two-way communication, including TCP/IP sockets. This can be done
on a per input, command or connection basis, by setting a special element
-in the @code{PROCINFO} (@pxref{Auto-set}) array:
+in the @code{PROCINFO} array (@pxref{Auto-set}):
@example
PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds}
@@ -8482,7 +8582,7 @@ while ((getline < "/dev/stdin") > 0)
@command{gawk} terminates the read operation if input does not
arrive after waiting for the timeout period, returns failure
-and sets the @code{ERRNO} variable to an appropriate string value.
+and sets @code{ERRNO} to an appropriate string value.
A negative or zero value for the timeout is the same as specifying
no timeout at all.
@@ -8589,6 +8689,10 @@ The possibilities are as follows:
@end multitable
@item
+@code{FNR} indicates how many records have been read from the current input file;
+@code{NR} indicates how many records have been read in total.
+
+@item
@command{gawk} sets @code{RT} to the text matched by @code{RS}.
@item
@@ -8599,7 +8703,7 @@ fields there are. The default way to split fields is between whitespace
characters.
@item
-Fields may be referenced using a variable, as in @samp{$NF}. Fields
+Fields may be referenced using a variable, as in @code{$NF}. Fields
may also be assigned values, which causes the value of @code{$0} to be
recomputed when it is later referenced. Assigning to a field with a number
greater than @code{NF} creates the field and rebuilds the record, using
@@ -8609,16 +8713,17 @@ thing. Decrementing @code{NF} throws away fields and rebuilds the record.
@item
Field splitting is more complicated than record splitting.
-@multitable @columnfractions .40 .40 .20
+@multitable @columnfractions .40 .45 .15
@headitem Field separator value @tab Fields are split @dots{} @tab @command{awk} / @command{gawk}
@item @code{FS == " "} @tab On runs of whitespace @tab @command{awk}
@item @code{FS == @var{any single character}} @tab On that character @tab @command{awk}
@item @code{FS == @var{regexp}} @tab On text matching the regexp @tab @command{awk}
@item @code{FS == ""} @tab Each individual character is a separate field @tab @command{gawk}
@item @code{FIELDWIDTHS == @var{list of columns}} @tab Based on character position @tab @command{gawk}
-@item @code{FPAT == @var{regexp}} @tab On text around text matching the regexp @tab @command{gawk}
+@item @code{FPAT == @var{regexp}} @tab On the text surrounding text matching the regexp @tab @command{gawk}
@end multitable
+@item
Using @samp{FS = "\n"} causes the entire record to be a single field
(assuming that newlines separate records).
@@ -8627,11 +8732,11 @@ Using @samp{FS = "\n"} causes the entire record to be a single field
This can also be done using command-line variable assignment.
@item
-@code{PROCINFO["FS"]} can be used to see how fields are being split.
+Use @code{PROCINFO["FS"]} to see how fields are being split.
@item
Use @code{getline} in its various forms to read additional records,
-from the default input stream, from a file, or from a pipe or co-process.
+from the default input stream, from a file, or from a pipe or coprocess.
@item
Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to timeout
@@ -8643,6 +8748,7 @@ Directories on the command line are fatal for standard @command{awk};
@end itemize
+@c EXCLUDE START
@node Input Exercises
@section Exercises
@@ -8659,9 +8765,10 @@ including abstentions, for each item.
comments (@samp{/* @dots{} */}) from the input. That program
does not work if one comment ends on one line and another one
starts later on the same line.
-Write a program that does handle multiple comments on the line.
+That can be fixed by making one simple change. What is it?
@end enumerate
+@c EXCLUDE END
@node Printing
@chapter Printing Output
@@ -8698,18 +8805,19 @@ and discusses the @code{close()} built-in function.
* Printf:: The @code{printf} statement.
* Redirection:: How to redirect output to multiple files and
pipes.
+* Special FD:: Special files for I/O.
* Special Files:: File name interpretation in @command{gawk}.
@command{gawk} allows access to inherited file
descriptors.
* Close Files And Pipes:: Closing Input and Output Files and Pipes.
* Output Summary:: Output summary.
-* Output exercises:: Exercises.
+* Output Exercises:: Exercises.
@end menu
@node Print
@section The @code{print} Statement
-The @code{print} statement is used for producing output with simple, standardized
+Use the @code{print} statement to produce output with simple, standardized
formatting. You specify only the strings or numbers to print, in a
list separated by commas. They are output, separated by single spaces,
followed by a newline. The statement looks like this:
@@ -8733,7 +8841,7 @@ expression. Numeric values are converted to strings and then printed.
@cindex text, printing
The simple statement @samp{print} with no items is equivalent to
@samp{print $0}: it prints the entire current record. To print a blank
-line, use @samp{print ""}, where @code{""} is the empty string.
+line, use @samp{print ""}.
To print a fixed piece of text, use a string constant, such as
@w{@code{"Don't Panic"}}, as one item. If you forget to use the
double-quote characters, your text is taken as an @command{awk}
@@ -8741,8 +8849,8 @@ expression, and you will probably get an error. Keep in mind that a
space is printed between any two items.
Note that the @code{print} statement is a statement and not an
-expression---you can't use it the pattern part of a pattern-action
-statement, for example.
+expression---you can't use it in the pattern part of a
+@var{pattern}-@var{action} statement, for example.
@node Print Examples
@section @code{print} Statement Examples
@@ -8753,9 +8861,22 @@ newline, the newline is output along with the rest of the string. A
single @code{print} statement can make any number of lines this way.
@cindex newlines, printing
-The following is an example of printing a string that contains embedded newlines
+The following is an example of printing a string that contains embedded
+@ifinfo
+newlines
+(the @samp{\n} is an escape sequence, used to represent the newline
+character; @pxref{Escape Sequences}):
+@end ifinfo
+@ifhtml
+newlines
(the @samp{\n} is an escape sequence, used to represent the newline
character; @pxref{Escape Sequences}):
+@end ifhtml
+@ifnotinfo
+@ifnothtml
+newlines:
+@end ifnothtml
+@end ifnotinfo
@example
$ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'}
@@ -8859,7 +8980,7 @@ of items separated by commas. In the output, the items are normally
separated by single spaces. However, this doesn't need to be the case;
a single space is simply the default. Any string of
characters may be used as the @dfn{output field separator} by setting the
-built-in variable @code{OFS}. The initial value of this variable
+predefined variable @code{OFS}. The initial value of this variable
is the string @w{@code{" "}}---that is, a single space.
The output from an entire @code{print} statement is called an
@@ -8935,13 +9056,13 @@ more fully in
@cindexawkfunc{sprintf}
@cindex @code{OFMT} variable
@cindex output, format specifier@comma{} @code{OFMT}
-The built-in variable @code{OFMT} contains the default format specification
+The predefined variable @code{OFMT} contains the format specification
that @code{print} uses with @code{sprintf()} when it wants to convert a
number to a string for printing.
The default value of @code{OFMT} is @code{"%.6g"}.
The way @code{print} prints numbers can be changed
-by supplying different format specifications
-as the value of @code{OFMT}, as shown in the following example:
+by supplying a different format specification
+for the value of @code{OFMT}, as shown in the following example:
@example
$ @kbd{awk 'BEGIN @{}
@@ -8971,9 +9092,7 @@ With @code{printf} you can
specify the width to use for each item, as well as various
formatting choices for numbers (such as what output base to use, whether to
print an exponent, whether to print a sign, and how many digits to print
-after the decimal point). You do this by supplying a string, called
-the @dfn{format string}, that controls how and where to print the other
-arguments.
+after the decimal point).
@menu
* Basic Printf:: Syntax of the @code{printf} statement.
@@ -8993,10 +9112,10 @@ printf @var{format}, @var{item1}, @var{item2}, @dots{}
@end example
@noindent
-The entire list of arguments may optionally be enclosed in parentheses. The
-parentheses are necessary if any of the item expressions use the @samp{>}
-relational operator; otherwise, it can be confused with an output redirection
-(@pxref{Redirection}).
+As print @code{print}, the entire list of arguments may optionally be
+enclosed in parentheses. Here too, the parentheses are necessary if any
+of the item expressions use the @samp{>} relational operator; otherwise,
+it can be confused with an output redirection (@pxref{Redirection}).
@cindex format specifiers
The difference between @code{printf} and @code{print} is the @var{format}
@@ -9019,10 +9138,10 @@ on @code{printf} statements. For example:
@example
$ @kbd{awk 'BEGIN @{}
> @kbd{ORS = "\nOUCH!\n"; OFS = "+"}
-> @kbd{msg = "Dont Panic!"}
+> @kbd{msg = "Don\47t Panic!"}
> @kbd{printf "%s\n", msg}
> @kbd{@}'}
-@print{} Dont Panic!
+@print{} Don't Panic!
@end example
@noindent
@@ -9044,7 +9163,7 @@ the field width. Here is a list of the format-control letters:
@c @asis for docbook to come out right
@table @asis
@item @code{%c}
-Print a number as an ASCII character; thus, @samp{printf "%c",
+Print a number as a character; thus, @samp{printf "%c",
65} outputs the letter @samp{A}. The output for a string value is
the first character of the string.
@@ -9070,7 +9189,7 @@ a single byte (0--255).
@item @code{%d}, @code{%i}
Print a decimal integer.
The two control letters are equivalent.
-(The @samp{%i} specification is for compatibility with ISO C.)
+(The @code{%i} specification is for compatibility with ISO C.)
@item @code{%e}, @code{%E}
Print a number in scientific (exponential) notation;
@@ -9085,7 +9204,7 @@ prints @samp{1.950e+03}, with a total of four significant figures, three of
which follow the decimal point.
(The @samp{4.3} represents two modifiers,
discussed in the next @value{SUBSECTION}.)
-@samp{%E} uses @samp{E} instead of @samp{e} in the output.
+@code{%E} uses @samp{E} instead of @samp{e} in the output.
@item @code{%f}
Print a number in floating-point notation.
@@ -9111,16 +9230,16 @@ The special ``not a number'' value formats as @samp{-nan} or @samp{nan}
(@pxref{Math Definitions}).
@item @code{%F}
-Like @samp{%f} but the infinity and ``not a number'' values are spelled
+Like @code{%f} but the infinity and ``not a number'' values are spelled
using uppercase letters.
-The @samp{%F} format is a POSIX extension to ISO C; not all systems
-support it. On those that don't, @command{gawk} uses @samp{%f} instead.
+The @code{%F} format is a POSIX extension to ISO C; not all systems
+support it. On those that don't, @command{gawk} uses @code{%f} instead.
@item @code{%g}, @code{%G}
Print a number in either scientific notation or in floating-point
notation, whichever uses fewer characters; if the result is printed in
-scientific notation, @samp{%G} uses @samp{E} instead of @samp{e}.
+scientific notation, @code{%G} uses @samp{E} instead of @samp{e}.
@item @code{%o}
Print an unsigned octal integer
@@ -9136,7 +9255,7 @@ are floating-point; it is provided primarily for compatibility with C.)
@item @code{%x}, @code{%X}
Print an unsigned hexadecimal integer;
-@samp{%X} uses the letters @samp{A} through @samp{F}
+@code{%X} uses the letters @samp{A} through @samp{F}
instead of @samp{a} through @samp{f}
(@pxref{Nondecimal-numbers}).
@@ -9151,7 +9270,7 @@ argument and it ignores any modifiers.
@quotation NOTE
When using the integer format-control letters for values that are
outside the range of the widest C integer type, @command{gawk} switches to
-the @samp{%g} format specifier. If @option{--lint} is provided on the
+the @code{%g} format specifier. If @option{--lint} is provided on the
command line (@pxref{Options}), @command{gawk}
warns about this. Other versions of @command{awk} may print invalid
values or do something else entirely.
@@ -9167,7 +9286,7 @@ values or do something else entirely.
A format specification can also include @dfn{modifiers} that can control
how much of the item's value is printed, as well as how much space it gets.
The modifiers come between the @samp{%} and the format-control letter.
-We will use the bullet symbol ``@bullet{}'' in the following examples to
+We use the bullet symbol ``@bullet{}'' in the following examples to
represent
spaces in the output. Here are the possible modifiers, in the order in
which they may appear:
@@ -9198,7 +9317,7 @@ It is in fact a @command{gawk} extension, intended for use in translating
messages at runtime.
@xref{Printf Ordering},
which describes how and why to use positional specifiers.
-For now, we will not use them.
+For now, we ignore them.
@item -
The minus sign, used before the width modifier (see later on in
@@ -9226,15 +9345,15 @@ to format is positive. The @samp{+} overrides the space modifier.
@item #
Use an ``alternate form'' for certain control letters.
-For @samp{%o}, supply a leading zero.
-For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for
+For @code{%o}, supply a leading zero.
+For @code{%x} and @code{%X}, supply a leading @code{0x} or @samp{0X} for
a nonzero result.
-For @samp{%e}, @samp{%E}, @samp{%f}, and @samp{%F}, the result always
+For @code{%e}, @code{%E}, @code{%f}, and @code{%F}, the result always
contains a decimal point.
-For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result.
+For @code{%g} and @code{%G}, trailing zeros are not removed from the result.
@item 0
-A leading @samp{0} (zero) acts as a flag that indicates that output should be
+A leading @samp{0} (zero) acts as a flag indicating that output should be
padded with zeros instead of spaces.
This applies only to the numeric output formats.
This flag only has an effect when the field width is wider than the
@@ -9420,7 +9539,7 @@ the @command{awk} program:
@example
awk 'BEGIN @{ print "Name Number"
print "---- ------" @}
- @{ printf "%-10s %s\n", $1, $2 @}' mail-list
+ @{ printf "%-10s %s\n", $1, $2 @}' mail-list
@end example
The above example mixes @code{print} and @code{printf} statements in
@@ -9430,7 +9549,7 @@ same results:
@example
awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number"
printf "%-10s %s\n", "----", "------" @}
- @{ printf "%-10s %s\n", $1, $2 @}' mail-list
+ @{ printf "%-10s %s\n", $1, $2 @}' mail-list
@end example
@noindent
@@ -9445,7 +9564,7 @@ emphasized by storing it in a variable, like this:
awk 'BEGIN @{ format = "%-10s %s\n"
printf format, "Name", "Number"
printf format, "----", "------" @}
- @{ printf format, $1, $2 @}' mail-list
+ @{ printf format, $1, $2 @}' mail-list
@end example
@c ENDOFRANGE printfs
@@ -9466,7 +9585,7 @@ This is called @dfn{redirection}.
@quotation NOTE
When @option{--sandbox} is specified (@pxref{Options}),
-redirecting output to files and pipes is disabled.
+redirecting output to files, pipes and coprocesses is disabled.
@end quotation
A redirection appears after the @code{print} or @code{printf} statement.
@@ -9563,17 +9682,11 @@ in an @command{awk} script run periodically for system maintenance:
@example
report = "mail bug-system"
-print "Awk script failed:", $0 | report
-m = ("at record number " FNR " of " FILENAME)
-print m | report
+print("Awk script failed:", $0) | report
+print("at record number", FNR, "of", FILENAME) | report
close(report)
@end example
-The message is built using string concatenation and saved in the variable
-@code{m}. It's then sent down the pipeline to the @command{mail} program.
-(The parentheses group the items to concatenate---see
-@ref{Concatenation}.)
-
The @code{close()} function is called here because it's a good idea to close
the pipe as soon as all the intended output has been sent to it.
@xref{Close Files And Pipes},
@@ -9680,6 +9793,9 @@ The program builds up a list of command lines,
using the @command{mv} utility to rename the files.
It then sends the list to the shell for execution.
+@xref{Shell Quoting}, for a function that can help in generating
+command lines to be fed to the shell.
+
@docbook
</sidebar>
@end docbook
@@ -9711,28 +9827,16 @@ uppercase characters converted to lowercase
The program builds up a list of command lines,
using the @command{mv} utility to rename the files.
It then sends the list to the shell for execution.
+
+@xref{Shell Quoting}, for a function that can help in generating
+command lines to be fed to the shell.
@end cartouche
@end ifnotdocbook
@c ENDOFRANGE outre
@c ENDOFRANGE reout
-@node Special Files
-@section Special @value{FFN}s in @command{gawk}
-@c STARTOFRANGE gfn
-@cindex @command{gawk}, file names in
-
-@command{gawk} provides a number of special @value{FN}s that it interprets
-internally. These @value{FN}s provide access to standard file descriptors
-and TCP/IP networking.
-
-@menu
-* Special FD:: Special files for I/O.
-* Special Network:: Special files for network communications.
-* Special Caveats:: Things to watch out for.
-@end menu
-
@node Special FD
-@subsection Special Files for Standard Descriptors
+@section Special Files for Standard Pre-Opened Data Streams
@cindex standard input
@cindex input, standard
@cindex standard output
@@ -9743,9 +9847,12 @@ and TCP/IP networking.
@cindex files, descriptors, See file descriptors
Running programs conventionally have three input and output streams
-already available to them for reading and writing. These are known as
-the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
-output}. These streams are, by default, connected to your keyboard and screen, but
+already available to them for reading and writing. These are known
+as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard
+error output}. These open streams (and any other open file or pipe)
+are often referred to by the technical term @dfn{file descriptors}.
+
+These streams are, by default, connected to your keyboard and screen, but
they are often redirected with the shell, via the @samp{<}, @samp{<<},
@samp{>}, @samp{>>}, @samp{>&}, and @samp{|} operators. Standard error
is typically used for writing error messages; the reason there are two separate
@@ -9754,7 +9861,7 @@ redirected separately.
@cindex differences in @command{awk} and @command{gawk}, error messages
@cindex error handling
-In other implementations of @command{awk}, the only way to write an error
+In traditional implementations of @command{awk}, the only way to write an error
message to standard error in an @command{awk} program is as follows:
@example
@@ -9780,19 +9887,19 @@ that is connected to your keyboard and screen. It represents the
``terminal,''@footnote{The ``tty'' in @file{/dev/tty} stands for
``Teletype,'' a serial terminal.} which on modern systems is a keyboard
and screen, not a serial console.)
-This usually has the same effect but not always: although the
+This generally has the same effect but not always: although the
standard error stream is usually the screen, it can be redirected; when
that happens, writing to the screen is not correct. In fact, if
@command{awk} is run from a background job, it may not have a
terminal at all.
Then opening @file{/dev/tty} fails.
-@command{gawk} provides special @value{FN}s for accessing the three standard
-streams. @value{COMMONEXT} It also provides syntax for accessing
-any other inherited open files. If the @value{FN} matches
-one of these special names when @command{gawk} redirects input or output,
-then it directly uses the stream that the @value{FN} stands for.
-These special @value{FN}s work for all operating systems that @command{gawk}
+@command{gawk}, BWK @command{awk} and @command{mawk} provide
+special @value{FN}s for accessing the three standard streams.
+If the @value{FN} matches one of these special names when @command{gawk}
+(or one of the others) redirects input or output, then it directly uses
+the descriptor that the @value{FN} stands for. These special
+@value{FN}s work for all operating systems that @command{gawk}
has been ported to, not just those that are POSIX-compliant:
@cindex common extensions, @code{/dev/stdin} special file
@@ -9814,19 +9921,10 @@ The standard output (file descriptor 1).
@item /dev/stderr
The standard error output (file descriptor 2).
-
-@item /dev/fd/@var{N}
-The file associated with file descriptor @var{N}. Such a file must
-be opened by the program initiating the @command{awk} execution (typically
-the shell). Unless special pains are taken in the shell from which
-@command{gawk} is invoked, only descriptors 0, 1, and 2 are available.
@end table
-The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
-are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
-respectively. However, they are more self-explanatory.
-The proper way to write an error message in a @command{gawk} program
-is to use @file{/dev/stderr}, like this:
+With these facilities,
+the proper way to write an error message then becomes:
@example
print "Serious error detected!" > "/dev/stderr"
@@ -9838,14 +9936,51 @@ Like any other redirection, the value must be a string.
It is a common error to omit the quotes, which leads
to confusing results.
-Finally, using the @code{close()} function on a @value{FN} of the
+@command{gawk} does not treat these @value{FN}s as special when
+in POSIX compatibility mode. However, since BWK @command{awk}
+supports them, @command{gawk} does support them even when
+invoked with the @option{--traditional} option (@pxref{Options}).
+
+@node Special Files
+@section Special @value{FFN}s in @command{gawk}
+@c STARTOFRANGE gfn
+@cindex @command{gawk}, file names in
+
+Besides access to standard input, stanard output, and standard error,
+@command{gawk} provides access to any open file descriptor.
+Additionally, there are special @value{FN}s reserved for
+TCP/IP networking.
+
+@menu
+* Other Inherited Files:: Accessing other open files with
+ @command{gawk}.
+* Special Network:: Special files for network communications.
+* Special Caveats:: Things to watch out for.
+@end menu
+
+@node Other Inherited Files
+@subsection Accessing Other Open Files With @command{gawk}
+
+Besides the @code{/dev/stdin}, @code{/dev/stdout}, and @code{/dev/stderr}
+special @value{FN}s mentioned earlier, @command{gawk} provides syntax
+for accessing any other inherited open file:
+
+@table @file
+@item /dev/fd/@var{N}
+The file associated with file descriptor @var{N}. Such a file must
+be opened by the program initiating the @command{awk} execution (typically
+the shell). Unless special pains are taken in the shell from which
+@command{gawk} is invoked, only descriptors 0, 1, and 2 are available.
+@end table
+
+The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
+are essentially aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and
+@file{/dev/fd/2}, respectively. However, those names are more self-explanatory.
+
+Note that using @code{close()} on a @value{FN} of the
form @code{"/dev/fd/@var{N}"}, for file descriptor numbers
above two, does actually close the given file descriptor.
-The @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
-special files are also recognized internally by several other
-versions of @command{awk}.
-
@node Special Network
@subsection Special Files for Network Communications
@cindex networks, support for
@@ -9874,15 +10009,20 @@ Full discussion is delayed until
@node Special Caveats
@subsection Special @value{FFN} Caveats
-Here is a list of things to bear in mind when using the
+Here are some things to bear in mind when using the
special @value{FN}s that @command{gawk} provides:
@itemize @value{BULLET}
@cindex compatibility mode (@command{gawk}), file names
@cindex file names, in compatibility mode
@item
-Recognition of these special @value{FN}s is disabled if @command{gawk} is in
-compatibility mode (@pxref{Options}).
+Recognition of the @value{FN}s for the three standard pre-opened
+files is disabled only in POSIX mode.
+
+@item
+Recognition of the other special @value{FN}s is disabled if @command{gawk} is in
+compatibility mode (either @option{--traditional} or @option{--posix};
+@pxref{Options}).
@item
@command{gawk} @emph{always}
@@ -10052,7 +10192,8 @@ to a string indicating the error.
Note also that @samp{close(FILENAME)} has no ``magic'' effects on the
implicit loop that reads through the files named on the command line.
It is, more likely, a close of a file that was never opened with a
-redirection, so @command{awk} silently does nothing.
+redirection, so @command{awk} silently does nothing, except return
+a negative value.
@cindex @code{|} (vertical bar), @code{|&} operator (I/O), pipes@comma{} closing
When using the @samp{|&} operator to communicate with a coprocess,
@@ -10064,10 +10205,10 @@ the first argument is the name of the command or special file used
to start the coprocess.
The second argument should be a string, with either of the values
@code{"to"} or @code{"from"}. Case does not matter.
-As this is an advanced feature, a more complete discussion is
+As this is an advanced feature, discussion is
delayed until
@ref{Two-way I/O},
-which discusses it in more detail and gives an example.
+which describes it in more detail and gives an example.
@cindex sidebar, Using @code{close()}'s Return Value
@ifdocbook
@@ -10098,7 +10239,7 @@ retval = close(command) # syntax error in many Unix awks
The return value is @minus{}1 if the argument names something
that was never opened with a redirection, or if there is
a system problem closing the file or process.
-In these cases, @command{gawk} sets the built-in variable
+In these cases, @command{gawk} sets the predefined variable
@code{ERRNO} to a string describing the problem.
In @command{gawk},
@@ -10154,7 +10295,7 @@ retval = close(command) # syntax error in many Unix awks
The return value is @minus{}1 if the argument names something
that was never opened with a redirection, or if there is
a system problem closing the file or process.
-In these cases, @command{gawk} sets the built-in variable
+In these cases, @command{gawk} sets the predefined variable
@code{ERRNO} to a string describing the problem.
In @command{gawk},
@@ -10201,20 +10342,21 @@ that modify the behavior of the format control letters.
@item
Output from both @code{print} and @code{printf} may be redirected to
-files, pipes, and co-processes.
+files, pipes, and coprocesses.
@item
@command{gawk} provides special file names for access to standard input,
output and error, and for network communications.
@item
-Use @code{close()} to close open file, pipe and co-process redirections.
-For co-processes, it is possible to close only one direction of the
+Use @code{close()} to close open file, pipe and coprocess redirections.
+For coprocesses, it is possible to close only one direction of the
communications.
@end itemize
-@node Output exercises
+@c EXCLUDE START
+@node Output Exercises
@section Exercises
@enumerate
@@ -10243,6 +10385,7 @@ BEGIN @{ print "Serious error detected!" > /dev/stderr @}
@end example
@end enumerate
+@c EXCLUDE END
@c ENDOFRANGE prnt
@@ -10341,7 +10484,7 @@ double-quotation marks. For example:
@cindex strings, length limitations
represents the string whose contents are @samp{parrot}. Strings in
@command{gawk} can be of any length, and they can contain any of the possible
-eight-bit ASCII characters including ASCII @sc{nul} (character code zero).
+eight-bit ASCII characters including ASCII @value{NUL} (character code zero).
Other @command{awk}
implementations may have difficulty with some character codes.
@@ -10486,7 +10629,8 @@ A regexp constant is a regular expression description enclosed in
slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in
@command{awk} programs are constant, but the @samp{~} and @samp{!~}
matching operators can also match computed or dynamic regexps
-(which are just ordinary strings or variables that contain a regexp).
+(which are typically just ordinary strings or variables that contain a regexp,
+but could be a more complex expression).
@c ENDOFRANGE cnst
@node Using Constant Regexps
@@ -10520,7 +10664,7 @@ if (/barfly/ || /camelot/)
@noindent
are exactly equivalent.
One rather bizarre consequence of this rule is that the following
-Boolean expression is valid, but does not do what the user probably
+Boolean expression is valid, but does not do what its author probably
intended:
@example
@@ -10566,10 +10710,9 @@ Modern implementations of @command{awk}, including @command{gawk}, allow
the third argument of @code{split()} to be a regexp constant, but some
older implementations do not.
@value{DARKCORNER}
-This can lead to confusion when attempting to use regexp constants
-as arguments to user-defined functions
-(@pxref{User-defined}).
-For example:
+Because some built-in functions accept regexp constants as arguments,
+it can be confusing when attempting to use regexp constants as arguments
+to user-defined functions (@pxref{User-defined}). For example:
@example
function mysub(pat, repl, str, global)
@@ -10624,7 +10767,11 @@ on the @command{awk} command line.
Variables let you give names to values and refer to them later. Variables
have already been used in many of the examples. The name of a variable
must be a sequence of letters, digits, or underscores, and it may not begin
-with a digit. Case is significant in variable names; @code{a} and @code{A}
+with a digit.
+Here, a @dfn{letter} is any one of the 52 upper- and lowercase
+English letters. Other characters that may be defined as letters
+in non-English locales are not valid in variable names.
+Case is significant in variable names; @code{a} and @code{A}
are distinct variables.
A variable name is a valid expression by itself; it represents the
@@ -10633,24 +10780,24 @@ variable's current value. Variables are given new values with
@dfn{decrement operators}.
@xref{Assignment Ops}.
In addition, the @code{sub()} and @code{gsub()} functions can
-change a variable's value, and the @code{match()}, @code{patsplit()}
-and @code{split()} functions can change the contents of their
+change a variable's value, and the @code{match()}, @code{split()}
+and @code{patsplit()} functions can change the contents of their
array parameters. @xref{String Functions}.
@cindex variables, built-in
@cindex variables, initializing
A few variables have special built-in meanings, such as @code{FS} (the
field separator), and @code{NF} (the number of fields in the current input
-record). @xref{Built-in Variables}, for a list of the built-in variables.
-These built-in variables can be used and assigned just like all other
+record). @xref{Built-in Variables}, for a list of the predefined variables.
+These predefined variables can be used and assigned just like all other
variables, but their values are also used or changed automatically by
-@command{awk}. All built-in variables' names are entirely uppercase.
+@command{awk}. All predefined variables' names are entirely uppercase.
Variables in @command{awk} can be assigned either numeric or string values.
The kind of value a variable holds can change over the life of a program.
By default, variables are initialized to the empty string, which
is zero if converted to a number. There is no need to explicitly
-``initialize'' a variable in @command{awk},
+initialize a variable in @command{awk},
which is what you would do in C and in most other traditional languages.
@node Assignment Options
@@ -10770,7 +10917,7 @@ Strings that can't be interpreted as valid numbers convert to zero.
@cindex @code{CONVFMT} variable
The exact manner in which numbers are converted into strings is controlled
-by the @command{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}).
+by the @command{awk} predefined variable @code{CONVFMT} (@pxref{Built-in Variables}).
Numbers are converted using the @code{sprintf()} function
with @code{CONVFMT} as the format
specifier
@@ -10887,8 +11034,8 @@ $ @kbd{echo 4,321 | LC_ALL=en_DK.utf-8 gawk '@{ print $1 + 1 @}'}
@noindent
The @code{en_DK.utf-8} locale is for English in Denmark, where the comma acts as
the decimal point separator. In the normal @code{"C"} locale, @command{gawk}
-treats @samp{4,321} as @samp{4}, while in the Danish locale, it's treated
-as the full number, 4.321.
+treats @samp{4,321} as 4, while in the Danish locale, it's treated
+as the full number including the fractional part, 4.321.
Some earlier versions of @command{gawk} fully complied with this aspect
of the standard. However, many users in non-English locales complained
@@ -11444,7 +11591,7 @@ awk '/[=]=/' /dev/null
@end example
@command{gawk} does not have this problem; BWK @command{awk}
-and @command{mawk} also do not (@pxref{Other Versions}).
+and @command{mawk} also do not.
@docbook
</sidebar>
@@ -11490,7 +11637,7 @@ awk '/[=]=/' /dev/null
@end example
@command{gawk} does not have this problem; BWK @command{awk}
-and @command{mawk} also do not (@pxref{Other Versions}).
+and @command{mawk} also do not.
@end cartouche
@end ifnotdocbook
@c ENDOFRANGE exas
@@ -11802,7 +11949,7 @@ attribute.
@item
Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements,
@code{ENVIRON} elements, and the elements of an array created by
-@code{patsplit()}, @code{split()} and @code{match()} that are numeric
+@code{match()}, @code{split()} and @code{patsplit()} that are numeric
strings have the @var{strnum} attribute. Otherwise, they have
the @var{string} attribute. Uninitialized variables also have the
@var{strnum} attribute.
@@ -11957,22 +12104,23 @@ Thus, the six-character input string @w{@samp{ +3.14}} receives the
The following examples print @samp{1} when the comparison between
the two different constants is true, @samp{0} otherwise:
+@c 22.9.2014: Tested with mawk and BWK awk, got same results.
@example
-$ @kbd{echo ' +3.14' | gawk '@{ print $0 == " +3.14" @}'} @ii{True}
+$ @kbd{echo ' +3.14' | awk '@{ print($0 == " +3.14") @}'} @ii{True}
@print{} 1
-$ @kbd{echo ' +3.14' | gawk '@{ print $0 == "+3.14" @}'} @ii{False}
+$ @kbd{echo ' +3.14' | awk '@{ print($0 == "+3.14") @}'} @ii{False}
@print{} 0
-$ @kbd{echo ' +3.14' | gawk '@{ print $0 == "3.14" @}'} @ii{False}
+$ @kbd{echo ' +3.14' | awk '@{ print($0 == "3.14") @}'} @ii{False}
@print{} 0
-$ @kbd{echo ' +3.14' | gawk '@{ print $0 == 3.14 @}'} @ii{True}
+$ @kbd{echo ' +3.14' | awk '@{ print($0 == 3.14) @}'} @ii{True}
@print{} 1
-$ @kbd{echo ' +3.14' | gawk '@{ print $1 == " +3.14" @}'} @ii{False}
+$ @kbd{echo ' +3.14' | awk '@{ print($1 == " +3.14") @}'} @ii{False}
@print{} 0
-$ @kbd{echo ' +3.14' | gawk '@{ print $1 == "+3.14" @}'} @ii{True}
+$ @kbd{echo ' +3.14' | awk '@{ print($1 == "+3.14") @}'} @ii{True}
@print{} 1
-$ @kbd{echo ' +3.14' | gawk '@{ print $1 == "3.14" @}'} @ii{False}
+$ @kbd{echo ' +3.14' | awk '@{ print($1 == "3.14") @}'} @ii{False}
@print{} 0
-$ @kbd{echo ' +3.14' | gawk '@{ print $1 == 3.14 @}'} @ii{True}
+$ @kbd{echo ' +3.14' | awk '@{ print($1 == 3.14) @}'} @ii{True}
@print{} 1
@end example
@@ -12046,9 +12194,8 @@ part of the test always succeeds. Because the operators are
so similar, this kind of error is very difficult to spot when
scanning the source code.
-@cindex @command{gawk}, comparison operators and
-The following table of expressions illustrates the kind of comparison
-@command{gawk} performs, as well as what the result of the comparison is:
+The following list of expressions illustrates the kinds of comparisons
+@command{awk} performs, as well as what the result of each comparison is:
@table @code
@item 1.5 <= 2.0
@@ -12121,7 +12268,7 @@ dynamic regexp (@pxref{Regexp Usage}; also
@cindex @command{awk}, regexp constants and
@cindex regexp constants
-In modern implementations of @command{awk}, a constant regular
+A constant regular
expression in slashes by itself is also an expression. The regexp
@code{/@var{regexp}/} is an abbreviation for the following comparison expression:
@@ -12141,7 +12288,7 @@ where this is discussed in more detail.
The POSIX standard says that string comparison is performed based
on the locale's @dfn{collating order}. This is the order in which
characters sort, as defined by the locale (for more discussion,
-@pxref{Ranges and Locales}). This order is usually very different
+@pxref{Locales}). This order is usually very different
from the results obtained when doing straight character-by-character
comparison.@footnote{Technically, string comparison is supposed
to behave the same way as if the strings are compared with the C
@@ -12221,7 +12368,7 @@ no substring @samp{foo} in the record.
True if at least one of @var{boolean1} or @var{boolean2} is true.
For example, the following statement prints all records in the input
that contain @emph{either} @samp{edu} or
-@samp{li} or both:
+@samp{li}:
@example
if ($0 ~ /edu/ || $0 ~ /li/) print
@@ -12230,6 +12377,9 @@ if ($0 ~ /edu/ || $0 ~ /li/) print
The subexpression @var{boolean2} is evaluated only if @var{boolean1}
is false. This can make a difference when @var{boolean2} contains
expressions that have side effects.
+(Thus, this test never really distinguishes records that contain both
+@samp{edu} and @samp{li}---as soon as @samp{edu} is matched,
+the full test succeeds.)
@item ! @var{boolean}
True if @var{boolean} is false. For example,
@@ -12239,7 +12389,7 @@ variable is not defined:
@example
BEGIN @{ if (! ("HOME" in ENVIRON))
- print "no home!" @}
+ print "no home!" @}
@end example
(The @code{in} operator is described in
@@ -12258,7 +12408,7 @@ is ``short-circuited'' if the result can be determined part way through
its evaluation.
@cindex line continuations
-Statements that use @samp{&&} or @samp{||} can be continued simply
+Statements that end with @samp{&&} or @samp{||} can be continued simply
by putting a newline after them. But you cannot put a newline in front
of either of these operators without using backslash continuation
(@pxref{Statements/Lines}).
@@ -12277,7 +12427,7 @@ program is one way to print lines in between special bracketing lines:
@example
$1 == "START" @{ interested = ! interested; next @}
-interested == 1 @{ print @}
+interested @{ print @}
$1 == "END" @{ interested = ! interested; next @}
@end example
@@ -12297,6 +12447,16 @@ bogus input data, but the point is to illustrate the use of `!',
so we'll leave well enough alone.
@end ignore
+Most commonly, the @samp{!} operator is used in the conditions of
+@code{if} and @code{while} statements, where it often makes more
+sense to phrase the logic in the negative:
+
+@example
+if (! @var{some condition} || @var{some other condition}) @{
+ @var{@dots{} do whatever processing @dots{}}
+@}
+@end example
+
@cindex @code{next} statement
@quotation NOTE
The @code{next} statement is discussed in
@@ -12528,7 +12688,7 @@ expression because the first @samp{$} has higher precedence than the
@samp{++}; to avoid the problem the expression can be rewritten as
@samp{$($0++)--}.
-This table presents @command{awk}'s operators, in order of highest
+This list presents @command{awk}'s operators, in order of highest
to lowest precedence:
@c @asis for docbook to come out right
@@ -12685,8 +12845,8 @@ system about the local character set and language. The ISO C standard
defines a default @code{"C"} locale, which is an environment that is
typical of what many C programmers are used to.
-Once upon a time, the locale setting used to affect regexp matching
-(@pxref{Ranges and Locales}), but this is no longer true.
+Once upon a time, the locale setting used to affect regexp matching,
+but this is no longer true (@pxref{Ranges and Locales}).
Locales can affect record splitting. For the normal case of @samp{RS =
"\n"}, the locale is largely irrelevant. For other single-character
@@ -12698,7 +12858,7 @@ character}, to find the record terminator.
Locales can affect how dates and times are formatted (@pxref{Time
Functions}). For example, a common way to abbreviate the date September
4, 2015 in the United States is ``9/4/15.'' In many countries in
-Europe, however, it is abbreviated ``4.9.15.'' Thus, the @samp{%x}
+Europe, however, it is abbreviated ``4.9.15.'' Thus, the @code{%x}
specification in a @code{"US"} locale might produce @samp{9/4/15},
while in a @code{"EUROPE"} locale, it might produce @samp{4.9.15}.
@@ -12740,7 +12900,8 @@ Locales can influence the conversions.
@item
@command{awk} provides the usual arithmetic operators (addition,
subtraction, multiplication, division, modulus), and unary plus and minus.
-It also provides comparison operators, boolean operators, and regexp
+It also provides comparison operators, boolean operators, array membership
+testing, and regexp
matching operators. String concatenation is accomplished by placing
two expressions next to each other; there is no explicit operator.
The three-operand @samp{?:} operator provides an ``if-else'' test within
@@ -12755,7 +12916,7 @@ In @command{awk}, a value is considered to be true if it is non-zero
@emph{or} non-null. Otherwise, the value is false.
@item
-A value's type is set upon each assignment and may change over its
+A variable's type is set upon each assignment and may change over its
lifetime. The type determines how it behaves in comparisons (string
or numeric).
@@ -12787,7 +12948,7 @@ program, and occasionally the format for data read as input.
As you have already seen, each @command{awk} statement consists of
a pattern with an associated action. This @value{CHAPTER} describes how
you build patterns and actions, what kinds of things you can do within
-actions, and @command{awk}'s built-in variables.
+actions, and @command{awk}'s predefined variables.
The pattern-action rules and the statements available for use
within actions form the core of @command{awk} programming.
@@ -12802,7 +12963,7 @@ building something useful.
* Action Overview:: What goes into an action.
* Statements:: Describes the various control statements in
detail.
-* Built-in Variables:: Summarizes the built-in variables.
+* Built-in Variables:: Summarizes the predefined variables.
* Pattern Action Summary:: Patterns and Actions summary.
@end menu
@@ -12835,7 +12996,7 @@ is nonzero (if a number) or non-null (if a string).
(@xref{Expression Patterns}.)
@item @var{begpat}, @var{endpat}
-A pair of patterns separated by a comma, specifying a range of records.
+A pair of patterns separated by a comma, specifying a @dfn{range} of records.
The range includes both the initial record that matches @var{begpat} and
the final record that matches @var{endpat}.
(@xref{Ranges}.)
@@ -12917,7 +13078,7 @@ Contrast this with the following regular expression match, which
accepts any record with a first field that contains @samp{li}:
@example
-$ @kbd{awk '$1 ~ /foo/ @{ print $2 @}' mail-list}
+$ @kbd{awk '$1 ~ /li/ @{ print $2 @}' mail-list}
@print{} 555-5553
@print{} 555-6699
@end example
@@ -12925,8 +13086,8 @@ $ @kbd{awk '$1 ~ /foo/ @{ print $2 @}' mail-list}
@cindex regexp constants, as patterns
@cindex patterns, regexp constants as
A regexp constant as a pattern is also a special case of an expression
-pattern. The expression @code{/li/} has the value one if @samp{li}
-appears in the current input record. Thus, as a pattern, @code{/li/}
+pattern. The expression @samp{/li/} has the value one if @samp{li}
+appears in the current input record. Thus, as a pattern, @samp{/li/}
matches any record containing @samp{li}.
@cindex Boolean expressions, as patterns
@@ -13108,7 +13269,7 @@ input is read. For example:
@example
$ @kbd{awk '}
> @kbd{BEGIN @{ print "Analysis of \"li\"" @}}
-> @kbd{/li/ @{ ++n @}}
+> @kbd{/li/ @{ ++n @}}
> @kbd{END @{ print "\"li\" appears in", n, "records." @}' mail-list}
@print{} Analysis of "li"
@print{} "li" appears in 4 records.
@@ -13188,9 +13349,10 @@ The POSIX standard specifies that @code{NF} is available in an @code{END}
rule. It contains the number of fields from the last input record.
Most probably due to an oversight, the standard does not say that @code{$0}
is also preserved, although logically one would think that it should be.
-In fact, @command{gawk} does preserve the value of @code{$0} for use in
-@code{END} rules. Be aware, however, that BWK @command{awk}, and possibly
-other implementations, do not.
+In fact, all of BWK @command{awk}, @command{mawk}, and @command{gawk}
+preserve the value of @code{$0} for use in @code{END} rules. Be aware,
+however, that some other implementations and many older versions
+of Unix @command{awk} do not.
The third point follows from the first two. The meaning of @samp{print}
inside a @code{BEGIN} or @code{END} rule is the same as always:
@@ -13285,8 +13447,8 @@ level of the @command{awk} program.
@cindex @code{next} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and
The @code{next} statement (@pxref{Next Statement}) is not allowed inside
-either a @code{BEGINFILE} or and @code{ENDFILE} rule. The @code{nextfile}
-statement (@pxref{Nextfile Statement}) is allowed only inside a
+either a @code{BEGINFILE} or an @code{ENDFILE} rule. The @code{nextfile}
+statement is allowed only inside a
@code{BEGINFILE} rule, but not inside an @code{ENDFILE} rule.
@cindex @code{getline} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and
@@ -13350,7 +13512,7 @@ There are two ways to get the value of the shell variable
into the body of the @command{awk} program.
@cindex shells, quoting
-The most common method is to use shell quoting to substitute
+A common method is to use shell quoting to substitute
the variable's value into the program inside the script.
For example, consider the following program:
@@ -13607,20 +13769,21 @@ If the @var{condition} is true, it executes the statement @var{body}.
is not zero and not a null string.)
@end ifinfo
After @var{body} has been executed,
-@var{condition} is tested again, and if it is still true, @var{body} is
-executed again. This process repeats until the @var{condition} is no longer
-true. If the @var{condition} is initially false, the body of the loop is
-never executed and @command{awk} continues with the statement following
+@var{condition} is tested again, and if it is still true, @var{body}
+executes again. This process repeats until the @var{condition} is no longer
+true. If the @var{condition} is initially false, the body of the loop
+never executes and @command{awk} continues with the statement following
the loop.
This example prints the first three fields of each record, one per line:
@example
-awk '@{
- i = 1
- while (i <= 3) @{
- print $i
- i++
- @}
+awk '
+@{
+ i = 1
+ while (i <= 3) @{
+ print $i
+ i++
+ @}
@}' inventory-shipped
@end example
@@ -13654,14 +13817,14 @@ do
while (@var{condition})
@end example
-Even if the @var{condition} is false at the start, the @var{body} is
-executed at least once (and only once, unless executing @var{body}
+Even if the @var{condition} is false at the start, the @var{body}
+executes at least once (and only once, unless executing @var{body}
makes @var{condition} true). Contrast this with the corresponding
@code{while} statement:
@example
while (@var{condition})
- @var{body}
+ @var{body}
@end example
@noindent
@@ -13671,11 +13834,11 @@ The following is an example of a @code{do} statement:
@example
@{
- i = 1
- do @{
- print $0
- i++
- @} while (i <= 10)
+ i = 1
+ do @{
+ print $0
+ i++
+ @} while (i <= 10)
@}
@end example
@@ -13712,9 +13875,10 @@ compares it against the desired number of iterations.
For example:
@example
-awk '@{
- for (i = 1; i <= 3; i++)
- print $i
+awk '
+@{
+ for (i = 1; i <= 3; i++)
+ print $i
@}' inventory-shipped
@end example
@@ -13742,7 +13906,7 @@ between 1 and 100:
@example
for (i = 1; i <= 100; i *= 2)
- print i
+ print i
@end example
If there is nothing to be done, any of the three expressions in the
@@ -14033,9 +14197,8 @@ the beginning, in the following manner:
@example
NF != 4 @{
- err = sprintf("%s:%d: skipped: NF != 4\n", FILENAME, FNR)
- print err > "/dev/stderr"
- next
+ printf("%s:%d: skipped: NF != 4\n", FILENAME, FNR) > "/dev/stderr"
+ next
@}
@end example
@@ -14063,7 +14226,7 @@ The @code{next} statement is not allowed inside @code{BEGINFILE} and
@cindex functions, user-defined, @code{next}/@code{nextfile} statements and
According to the POSIX standard, the behavior is undefined if the
@code{next} statement is used in a @code{BEGIN} or @code{END} rule.
-@command{gawk} treats it as a syntax error. Although POSIX permits it,
+@command{gawk} treats it as a syntax error. Although POSIX does not disallow it,
most other @command{awk} implementations don't allow the @code{next}
statement inside function bodies (@pxref{User-defined}). Just as with any
other @code{next} statement, a @code{next} statement inside a function
@@ -14089,7 +14252,8 @@ starts over with the first rule in the program.
If the @code{nextfile} statement causes the end of the input to be reached,
then the code in any @code{END} rules is executed. An exception to this is
when @code{nextfile} is invoked during execution of any statement in an
-@code{END} rule; In this case, it causes the program to stop immediately. @xref{BEGIN/END}.
+@code{END} rule; in this case, it causes the program to stop immediately.
+@xref{BEGIN/END}.
The @code{nextfile} statement is useful when there are many @value{DF}s
to process but it isn't necessary to process every record in every file.
@@ -14099,13 +14263,10 @@ would have to continue scanning the unwanted records. The @code{nextfile}
statement accomplishes this much more efficiently.
In @command{gawk}, execution of @code{nextfile} causes additional things
-to happen:
-any @code{ENDFILE} rules are executed except in the case as
-mentioned below,
-@code{ARGIND} is incremented,
-and
-any @code{BEGINFILE} rules are executed.
-(@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.)
+to happen: any @code{ENDFILE} rules are executed if @command{gawk} is
+not currently in an @code{END} or @code{BEGINFILE} rule, @code{ARGIND} is
+incremented, and any @code{BEGINFILE} rules are executed. (@code{ARGIND}
+hasn't been introduced yet. @xref{Built-in Variables}.)
With @command{gawk}, @code{nextfile} is useful inside a @code{BEGINFILE}
rule to skip over a file that would otherwise cause @command{gawk}
@@ -14120,7 +14281,7 @@ opened with redirections. It is not related to the main processing that
@quotation NOTE
For many years, @code{nextfile} was a
-@command{gawk} extension. As of September, 2012, it was accepted for
+common extension. In September, 2012, it was accepted for
inclusion into the POSIX standard.
See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}.
@end quotation
@@ -14129,8 +14290,8 @@ See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}.
@cindex @code{nextfile} statement, user-defined functions and
@cindex Brian Kernighan's @command{awk}
@cindex @command{mawk} utility
-The current version of BWK @command{awk}, and @command{mawk} (@pxref{Other
-Versions}) also support @code{nextfile}. However, they don't allow the
+The current version of BWK @command{awk}, and @command{mawk}
+also support @code{nextfile}. However, they don't allow the
@code{nextfile} statement inside function bodies (@pxref{User-defined}).
@command{gawk} does; a @code{nextfile} inside a function body reads the
next record and starts processing it with the first rule in the program,
@@ -14162,8 +14323,8 @@ the program to stop immediately.
An @code{exit} statement that is not part of a @code{BEGIN} or @code{END}
rule stops the execution of any further automatic rules for the current
record, skips reading any remaining input records, and executes the
-@code{END} rule if there is one.
-Any @code{ENDFILE} rules are also skipped; they are not executed.
+@code{END} rule if there is one. @command{gawk} also skips
+any @code{ENDFILE} rules; they do not execute.
In such a case,
if you don't want the @code{END} rule to do its job, set a variable
@@ -14211,11 +14372,11 @@ results across different operating systems.
@c ENDOFRANGE accs
@node Built-in Variables
-@section Built-in Variables
+@section Predefined Variables
@c STARTOFRANGE bvar
-@cindex built-in variables
+@cindex predefined variables
@c STARTOFRANGE varb
-@cindex variables, built-in
+@cindex variables, predefined
Most @command{awk} variables are available to use for your own
purposes; they never change unless your program assigns values to
@@ -14226,8 +14387,8 @@ to tell @command{awk} how to do certain things. Others are set
automatically by @command{awk}, so that they carry information from the
internal workings of @command{awk} to your program.
-@cindex @command{gawk}, built-in variables and
-This @value{SECTION} documents all of @command{gawk}'s built-in variables,
+@cindex @command{gawk}, predefined variables and
+This @value{SECTION} documents all of @command{gawk}'s predefined variables,
most of which are also documented in the @value{CHAPTER}s describing
their areas of activity.
@@ -14242,7 +14403,7 @@ their areas of activity.
@node User-modified
@subsection Built-in Variables That Control @command{awk}
@c STARTOFRANGE bvaru
-@cindex built-in variables, user-modifiable
+@cindex predefined variables, user-modifiable
@c STARTOFRANGE nmbv
@cindex user-modifiable variables
@@ -14271,7 +14432,7 @@ respectively, should use binary I/O. A string value of @code{"rw"} or
@code{"wr"} indicates that all files should use binary I/O. Any other
string value is treated the same as @code{"rw"}, but causes @command{gawk}
to generate a warning message. @code{BINMODE} is described in more
-detail in @ref{PC Using}. @command{mawk} @pxref{Other Versions}),
+detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}),
also supports this variable, but only using numeric values.
@cindex @code{CONVFMT} variable
@@ -14398,7 +14559,7 @@ printing with the @code{print} statement. It works by being passed
as the first argument to the @code{sprintf()} function
(@pxref{String Functions}).
Its default value is @code{"%.6g"}. Earlier versions of @command{awk}
-also used @code{OFMT} to specify the format for converting numbers to
+used @code{OFMT} to specify the format for converting numbers to
strings in general expressions; this is now done by @code{CONVFMT}.
@cindex @code{sprintf()} function, @code{OFMT} variable and
@@ -14479,9 +14640,9 @@ The default value of @code{TEXTDOMAIN} is @code{"messages"}.
@subsection Built-in Variables That Convey Information
@c STARTOFRANGE bvconi
-@cindex built-in variables, conveying information
+@cindex predefined variables, conveying information
@c STARTOFRANGE vbconi
-@cindex variables, built-in, conveying information
+@cindex variables, predefined conveying information
The following is an alphabetical list of variables that @command{awk}
sets automatically on certain occasions in order to provide
information to your program.
@@ -14550,8 +14711,8 @@ successive instances of the same @value{FN} on the command line.
@cindex file names, distinguishing
While you can change the value of @code{ARGIND} within your @command{awk}
-program, @command{gawk} automatically sets it to a new value when the
-next file is opened.
+program, @command{gawk} automatically sets it to a new value when it
+opens the next file.
@cindex @code{ENVIRON} array
@cindex environment variables, in @code{ENVIRON} array
@@ -14616,10 +14777,10 @@ can give @code{FILENAME} a value.
@cindex @code{FNR} variable
@item @code{FNR}
-The current record number in the current file. @code{FNR} is
-incremented each time a new record is read
-(@pxref{Records}). It is reinitialized
-to zero each time a new input file is started.
+The current record number in the current file. @command{awk} increments
+@code{FNR} each time it reads a new record (@pxref{Records}).
+@command{awk} resets @code{FNR} to zero each time it starts a new
+input file.
@cindex @code{NF} variable
@item @code{NF}
@@ -14638,7 +14799,7 @@ current record. @xref{Changing Fields}.
@cindex differences in @command{awk} and @command{gawk}, @code{FUNCTAB} variable
@item @code{FUNCTAB #}
An array whose indices and corresponding values are the names of all
-the user-defined or extension functions in the program.
+the built-in, user-defined and extension functions in the program.
@quotation NOTE
Attempting to use the @code{delete} statement with the @code{FUNCTAB}
@@ -14651,7 +14812,7 @@ array causes a fatal error. Any attempt to assign to an element of
The number of input records @command{awk} has processed since
the beginning of the program's execution
(@pxref{Records}).
-@code{NR} is incremented each time a new record is read.
+@command{awk} increments @code{NR} each time it reads a new record.
@cindex @command{gawk}, @code{PROCINFO} array in
@cindex @code{PROCINFO} array
@@ -14679,16 +14840,22 @@ or @code{"FPAT"} if field matching with @code{FPAT} is in effect.
@item PROCINFO["identifiers"]
@cindex program identifiers
-A subarray, indexed by the names of all identifiers used in the
-text of the AWK program. For each identifier, the value of the element is one of the following:
+A subarray, indexed by the names of all identifiers used in the text of
+the AWK program. An @dfn{identifier} is simply the name of a variable
+(be it scalar or array), built-in function, user-defined function, or
+extension function. For each identifier, the value of the element is
+one of the following:
@table @code
@item "array"
The identifier is an array.
+@item "builtin"
+The identifier is a built-in function.
+
@item "extension"
The identifier is an extension function loaded via
-@code{@@load}.
+@code{@@load} or @option{-l}.
@item "scalar"
The identifier is a scalar.
@@ -14725,7 +14892,7 @@ The parent process ID of the current process.
@item PROCINFO["sorted_in"]
If this element exists in @code{PROCINFO}, its value controls the
order in which array indices will be processed by
-@samp{for (@var{index} in @var{array})} loops.
+@samp{for (@var{indx} in @var{array})} loops.
Since this is an advanced feature, we defer the
full description until later; see
@ref{Scanning an Array}.
@@ -14746,7 +14913,7 @@ The version of @command{gawk}.
The following additional elements in the array
are available to provide information about the MPFR and GMP libraries
-if your version of @command{gawk} supports arbitrary precision numbers
+if your version of @command{gawk} supports arbitrary precision arithmetic
(@pxref{Arbitrary Precision Arithmetic}):
@table @code
@@ -14795,14 +14962,14 @@ The @code{PROCINFO} array has the following additional uses:
@itemize @value{BULLET}
@item
-It may be used to cause coprocesses to communicate over pseudo-ttys
-instead of through two-way pipes; this is discussed further in
-@ref{Two-way I/O}.
-
-@item
It may be used to provide a timeout when reading from any
open input file, pipe, or coprocess.
@xref{Read Timeout}, for more information.
+
+@item
+It may be used to cause coprocesses to communicate over pseudo-ttys
+instead of through two-way pipes; this is discussed further in
+@ref{Two-way I/O}.
@end itemize
@cindex @code{RLENGTH} variable
@@ -14833,9 +15000,13 @@ the record separator. It is set every time a record is read.
@cindex @code{SYMTAB} array
@cindex differences in @command{awk} and @command{gawk}, @code{SYMTAB} variable
@item @code{SYMTAB #}
-An array whose indices are the names of all currently defined
-global variables and arrays in the program. The array may be used
-for indirect access to read or write the value of a variable:
+An array whose indices are the names of all defined global variables and
+arrays in the program. @code{SYMTAB} makes @command{gawk}'s symbol table
+visible to the @command{awk} programmer. It is built as @command{gawk}
+parses the program and is complete before the program starts to run.
+
+The array may be used for indirect access to read or write the value of
+a variable:
@example
foo = 5
@@ -14968,7 +15139,7 @@ changed.
@cindex arguments, command-line
@cindex command line, arguments
-@ref{Auto-set},
+@DBREF{Auto-set}
presented the following program describing the information contained in @code{ARGC}
and @code{ARGV}:
@@ -15090,6 +15261,12 @@ following @option{-v} are passed on to the @command{awk} program.
(@xref{Getopt Function}, for an @command{awk} library function that
parses command-line options.)
+When designing your program, you should choose options that don't
+conflict with @command{gawk}'s, since it will process any options
+that it accepts before passing the rest of the command line on to
+your program. Using @samp{#!} with the @option{-E} option may help
+(@pxref{Executable Scripts}, and @pxref{Options}).
+
@node Pattern Action Summary
@section Summary
@@ -15124,7 +15301,7 @@ input and output statements, and deletion statements.
The control statements in @command{awk} are @code{if}-@code{else},
@code{while}, @code{for}, and @code{do}-@code{while}. @command{gawk}
adds the @code{switch} statement. There are two flavors of @code{for}
-statement: one for for performing general looping, and the other iterating
+statement: one for performing general looping, and the other for iterating
through an array.
@item
@@ -15141,12 +15318,17 @@ The @code{exit} statement terminates your program. When executed
from an action (or function body) it transfers control to the
@code{END} statements. From an @code{END} statement body, it exits
immediately. You may pass an optional numeric value to be used
-at @command{awk}'s exit status.
+as @command{awk}'s exit status.
@item
-Some built-in variables provide control over @command{awk}, mainly for I/O.
+Some predefined variables provide control over @command{awk}, mainly for I/O.
Other variables convey information from @command{awk} to your program.
+@item
+@code{ARGC} and @code{ARGV} make the command-line arguments available
+to your program. Manipulating them from a @code{BEGIN} rule lets you
+control how @command{awk} will process the provided @value{DF}s.
+
@end itemize
@node Arrays
@@ -15167,24 +15349,13 @@ The @value{CHAPTER} moves on to discuss @command{gawk}'s facility
for sorting arrays, and ends with a brief description of @command{gawk}'s
ability to support true arrays of arrays.
-@cindex variables, names of
-@cindex functions, names of
-@cindex arrays, names of, and names of functions/variables
-@cindex names, arrays/variables
-@cindex namespace issues
-@command{awk} maintains a single set
-of names that may be used for naming variables, arrays, and functions
-(@pxref{User-defined}).
-Thus, you cannot have a variable and an array with the same name in the
-same @command{awk} program.
-
@menu
* Array Basics:: The basics of arrays.
-* Delete:: The @code{delete} statement removes an element
- from an array.
* Numeric Array Subscripts:: How to use numbers as subscripts in
@command{awk}.
* Uninitialized Subscripts:: Using Uninitialized variables as subscripts.
+* Delete:: The @code{delete} statement removes an element
+ from an array.
* Multidimensional:: Emulating multidimensional arrays in
@command{awk}.
* Arrays of Arrays:: True multidimensional arrays.
@@ -15230,7 +15401,7 @@ as a variable) in the same @command{awk} program.
Arrays in @command{awk} superficially resemble arrays in other programming
languages, but there are fundamental differences. In @command{awk}, it
isn't necessary to specify the size of an array before starting to use it.
-Additionally, any number or string in @command{awk}, not just consecutive integers,
+Additionally, any number or string, not just consecutive integers,
may be used as an array index.
In most other languages, arrays must be @dfn{declared} before use,
@@ -15343,7 +15514,10 @@ array element value:
@end docbook
@noindent
-The pairs are shown in jumbled order because their order is irrelevant.
+The pairs are shown in jumbled order because their order is
+irrelevant.@footnote{The ordering will vary among @command{awk}
+implementations, which typically use hash tables to store array elements
+and values.}
One advantage of associative arrays is that new pairs can be added
at any time. For example, suppose a tenth element is added to the array
@@ -15465,8 +15639,9 @@ English to French:
Here we decided to translate the number one in both spelled-out and
numeric form---thus illustrating that a single array can have both
numbers and strings as indices.
-(In fact, array subscripts are always strings; this is discussed
-in more detail in
+(In fact, array subscripts are always strings.
+There are some subtleties to how numbers work when used as
+array subscripts; this is discussed in more detail in
@ref{Numeric Array Subscripts}.)
Here, the number @code{1} isn't double-quoted, since @command{awk}
automatically converts it to a string.
@@ -15553,6 +15728,8 @@ This expression tests whether the particular index @var{indx} exists,
without the side effect of creating that element if it is not present.
The expression has the value one (true) if @code{@var{array}[@var{indx}]}
exists and zero (false) if it does not exist.
+(We use @var{indx} here, since @samp{index} is the name of a built-in
+function.)
For example, this statement tests whether the array @code{frequencies}
contains the index @samp{2}:
@@ -15606,14 +15783,14 @@ begin with a number:
@example
@c file eg/misc/arraymax.awk
@{
- if ($1 > max)
- max = $1
- arr[$1] = $0
+ if ($1 > max)
+ max = $1
+ arr[$1] = $0
@}
END @{
- for (x = 1; x <= max; x++)
- print arr[x]
+ for (x = 1; x <= max; x++)
+ print arr[x]
@}
@c endfile
@end example
@@ -15653,9 +15830,9 @@ program's @code{END} rule, as follows:
@example
END @{
- for (x = 1; x <= max; x++)
- if (x in arr)
- print arr[x]
+ for (x = 1; x <= max; x++)
+ if (x in arr)
+ print arr[x]
@}
@end example
@@ -15677,7 +15854,7 @@ an array:
@example
for (@var{var} in @var{array})
- @var{body}
+ @var{body}
@end example
@noindent
@@ -15750,7 +15927,7 @@ BEGIN @{
@}
@end example
-Here is what happens when run with @command{gawk}:
+Here is what happens when run with @command{gawk} (and @command{mawk}):
@example
$ @kbd{gawk -f loopcheck.awk}
@@ -15868,7 +16045,8 @@ does not affect the loop.
For example:
@example
-$ @kbd{gawk 'BEGIN @{}
+$ @kbd{gawk '}
+> @kbd{BEGIN @{}
> @kbd{ a[4] = 4}
> @kbd{ a[3] = 3}
> @kbd{ for (i in a)}
@@ -15876,7 +16054,8 @@ $ @kbd{gawk 'BEGIN @{}
> @kbd{@}'}
@print{} 4 4
@print{} 3 3
-$ @kbd{gawk 'BEGIN @{}
+$ @kbd{gawk '}
+> @kbd{BEGIN @{}
> @kbd{ PROCINFO["sorted_in"] = "@@ind_str_asc"}
> @kbd{ a[4] = 4}
> @kbd{ a[3] = 3}
@@ -15925,118 +16104,6 @@ the @code{delete} statement.
In addition, @command{gawk} provides built-in functions for
sorting arrays; see @ref{Array Sorting Functions}.
-@node Delete
-@section The @code{delete} Statement
-@cindex @code{delete} statement
-@cindex deleting elements in arrays
-@cindex arrays, elements, deleting
-@cindex elements in arrays, deleting
-
-To remove an individual element of an array, use the @code{delete}
-statement:
-
-@example
-delete @var{array}[@var{index-expression}]
-@end example
-
-Once an array element has been deleted, any value the element once
-had is no longer available. It is as if the element had never
-been referred to or been given a value.
-The following is an example of deleting elements in an array:
-
-@example
-for (i in frequencies)
- delete frequencies[i]
-@end example
-
-@noindent
-This example removes all the elements from the array @code{frequencies}.
-Once an element is deleted, a subsequent @code{for} statement to scan the array
-does not report that element and the @code{in} operator to check for
-the presence of that element returns zero (i.e., false):
-
-@example
-delete foo[4]
-if (4 in foo)
- print "This will never be printed"
-@end example
-
-@cindex null strings, and deleting array elements
-It is important to note that deleting an element is @emph{not} the
-same as assigning it a null value (the empty string, @code{""}).
-For example:
-
-@example
-foo[4] = ""
-if (4 in foo)
- print "This is printed, even though foo[4] is empty"
-@end example
-
-@cindex lint checking, array elements
-It is not an error to delete an element that does not exist.
-However, if @option{--lint} is provided on the command line
-(@pxref{Options}),
-@command{gawk} issues a warning message when an element that
-is not in the array is deleted.
-
-@cindex common extensions, @code{delete} to delete entire arrays
-@cindex extensions, common@comma{} @code{delete} to delete entire arrays
-@cindex arrays, deleting entire contents
-@cindex deleting entire arrays
-@cindex @code{delete} @var{array}
-@cindex differences in @command{awk} and @command{gawk}, array elements, deleting
-All the elements of an array may be deleted with a single statement
-by leaving off the subscript in the @code{delete} statement,
-as follows:
-
-
-@example
-delete @var{array}
-@end example
-
-Using this version of the @code{delete} statement is about three times
-more efficient than the equivalent loop that deletes each element one
-at a time.
-
-@cindex Brian Kernighan's @command{awk}
-@quotation NOTE
-For many years,
-using @code{delete} without a subscript was a @command{gawk} extension.
-As of September, 2012, it was accepted for
-inclusion into the POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=544,
-the Austin Group website}. This form of the @code{delete} statement is also supported
-by BWK @command{awk} and @command{mawk}, as well as
-by a number of other implementations (@pxref{Other Versions}).
-@end quotation
-
-@cindex portability, deleting array elements
-@cindex Brennan, Michael
-The following statement provides a portable but nonobvious way to clear
-out an array:@footnote{Thanks to Michael Brennan for pointing this out.}
-
-@example
-split("", array)
-@end example
-
-@cindex @code{split()} function, array elements@comma{} deleting
-The @code{split()} function
-(@pxref{String Functions})
-clears out the target array first. This call asks it to split
-apart the null string. Because there is no data to split out, the
-function simply clears the array and then returns.
-
-@quotation CAUTION
-Deleting an array does not change its type; you cannot
-delete an array and then use the array's name as a scalar
-(i.e., a regular variable). For example, the following does not work:
-
-@example
-a[1] = 3
-delete a
-a = 3
-@end example
-@end quotation
-
@node Numeric Array Subscripts
@section Using Numbers to Subscript Arrays
@@ -16048,7 +16115,7 @@ An important aspect to remember about arrays is that @emph{array subscripts
are always strings}. When a numeric value is used as a subscript,
it is converted to a string value before being used for subscripting
(@pxref{Conversion}).
-This means that the value of the built-in variable @code{CONVFMT} can
+This means that the value of the predefined variable @code{CONVFMT} can
affect how your program accesses elements of an array. For example:
@example
@@ -16077,7 +16144,7 @@ since @code{"12.15"} is different from @code{"12.153"}.
@cindex integer array indices
According to the rules for conversions
(@pxref{Conversion}), integer
-values are always converted to strings as integers, no matter what the
+values always convert to strings as integers, no matter what the
value of @code{CONVFMT} may happen to be. So the usual case of
the following works:
@@ -16100,7 +16167,7 @@ and
all refer to the same element!
As with many things in @command{awk}, the majority of the time
-things work as one would expect them to. But it is useful to have a precise
+things work as you would expect them to. But it is useful to have a precise
knowledge of the actual rules since they can sometimes have a subtle
effect on your programs.
@@ -16121,7 +16188,7 @@ $ @kbd{echo 'line 1}
> @kbd{line 2}
> @kbd{line 3' | awk '@{ l[lines] = $0; ++lines @}}
> @kbd{END @{}
-> @kbd{for (i = lines-1; i >= 0; --i)}
+> @kbd{for (i = lines - 1; i >= 0; i--)}
> @kbd{print l[i]}
> @kbd{@}'}
@print{} line 3
@@ -16145,7 +16212,7 @@ The following version of the program works correctly:
@example
@{ l[lines++] = $0 @}
END @{
- for (i = lines - 1; i >= 0; --i)
+ for (i = lines - 1; i >= 0; i--)
print l[i]
@}
@end example
@@ -16164,6 +16231,119 @@ Even though it is somewhat unusual, the null string
if @option{--lint} is provided
on the command line (@pxref{Options}).
+@node Delete
+@section The @code{delete} Statement
+@cindex @code{delete} statement
+@cindex deleting elements in arrays
+@cindex arrays, elements, deleting
+@cindex elements in arrays, deleting
+
+To remove an individual element of an array, use the @code{delete}
+statement:
+
+@example
+delete @var{array}[@var{index-expression}]
+@end example
+
+Once an array element has been deleted, any value the element once
+had is no longer available. It is as if the element had never
+been referred to or been given a value.
+The following is an example of deleting elements in an array:
+
+@example
+for (i in frequencies)
+ delete frequencies[i]
+@end example
+
+@noindent
+This example removes all the elements from the array @code{frequencies}.
+Once an element is deleted, a subsequent @code{for} statement to scan the array
+does not report that element and the @code{in} operator to check for
+the presence of that element returns zero (i.e., false):
+
+@example
+delete foo[4]
+if (4 in foo)
+ print "This will never be printed"
+@end example
+
+@cindex null strings, and deleting array elements
+It is important to note that deleting an element is @emph{not} the
+same as assigning it a null value (the empty string, @code{""}).
+For example:
+
+@example
+foo[4] = ""
+if (4 in foo)
+ print "This is printed, even though foo[4] is empty"
+@end example
+
+@cindex lint checking, array elements
+It is not an error to delete an element that does not exist.
+However, if @option{--lint} is provided on the command line
+(@pxref{Options}),
+@command{gawk} issues a warning message when an element that
+is not in the array is deleted.
+
+@cindex common extensions, @code{delete} to delete entire arrays
+@cindex extensions, common@comma{} @code{delete} to delete entire arrays
+@cindex arrays, deleting entire contents
+@cindex deleting entire arrays
+@cindex @code{delete} @var{array}
+@cindex differences in @command{awk} and @command{gawk}, array elements, deleting
+All the elements of an array may be deleted with a single statement
+by leaving off the subscript in the @code{delete} statement,
+as follows:
+
+
+@example
+delete @var{array}
+@end example
+
+Using this version of the @code{delete} statement is about three times
+more efficient than the equivalent loop that deletes each element one
+at a time.
+
+This form of the @code{delete} statement is also supported
+by BWK @command{awk} and @command{mawk}, as well as
+by a number of other implementations.
+
+@cindex Brian Kernighan's @command{awk}
+@quotation NOTE
+For many years, using @code{delete} without a subscript was a common
+extension. In September, 2012, it was accepted for inclusion into the
+POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=544,
+the Austin Group website}.
+@end quotation
+
+@cindex portability, deleting array elements
+@cindex Brennan, Michael
+The following statement provides a portable but nonobvious way to clear
+out an array:@footnote{Thanks to Michael Brennan for pointing this out.}
+
+@example
+split("", array)
+@end example
+
+@cindex @code{split()} function, array elements@comma{} deleting
+The @code{split()} function
+(@pxref{String Functions})
+clears out the target array first. This call asks it to split
+apart the null string. Because there is no data to split out, the
+function simply clears the array and then returns.
+
+@quotation CAUTION
+Deleting all the elements from an array does not change its type; you cannot
+clear an array and then use the array's name as a scalar
+(i.e., a regular variable). For example, the following does not work:
+
+@example
+a[1] = 3
+delete a
+a = 3
+@end example
+@end quotation
+
@node Multidimensional
@section Multidimensional Arrays
@@ -16175,7 +16355,7 @@ on the command line (@pxref{Options}).
@cindex arrays, multidimensional
A multidimensional array is an array in which an element is identified
by a sequence of indices instead of a single index. For example, a
-two-dimensional array requires two indices. The usual way (in most
+two-dimensional array requires two indices. The usual way (in many
languages, including @command{awk}) to refer to an element of a
two-dimensional array named @code{grid} is with
@code{grid[@var{x},@var{y}]}.
@@ -16350,8 +16530,9 @@ a[1][3][1, "name"] = "barney"
Each subarray and the main array can be of different length. In fact, the
elements of an array or its subarray do not all have to have the same
type. This means that the main array and any of its subarrays can be
-non-rectangular, or jagged in structure. One can assign a scalar value to
-the index @code{4} of the main array @code{a}:
+non-rectangular, or jagged in structure. You can assign a scalar value to
+the index @code{4} of the main array @code{a}, even though @code{a[1]}
+is itself an array and not a scalar:
@example
a[4] = "An element in a jagged array"
@@ -16433,6 +16614,8 @@ for (i in array) @{
print array[i][j]
@}
@}
+ else
+ print array[i]
@}
@end example
@@ -16717,8 +16900,9 @@ Often random integers are needed instead. Following is a user-defined function
that can be used to obtain a random non-negative integer less than @var{n}:
@example
-function randint(n) @{
- return int(n * rand())
+function randint(n)
+@{
+ return int(n * rand())
@}
@end example
@@ -16738,8 +16922,7 @@ function roll(n) @{ return 1 + int(rand() * n) @}
# Roll 3 six-sided dice and
# print total number of points.
@{
- printf("%d points\n",
- roll(6)+roll(6)+roll(6))
+ printf("%d points\n", roll(6) + roll(6) + roll(6))
@}
@end example
@@ -16828,7 +17011,7 @@ doing index calculations, particularly if you are used to C.
In the following list, optional parameters are enclosed in square brackets@w{ ([ ]).}
Several functions perform string substitution; the full discussion is
provided in the description of the @code{sub()} function, which comes
-towards the end since the list is presented in alphabetic order.
+towards the end since the list is presented alphabetically.
Those functions that are specific to @command{gawk} are marked with a
pound sign (@samp{#}). They are not available in compatibility mode
@@ -16872,6 +17055,7 @@ When comparing strings, @code{IGNORECASE} affects the sorting
(@pxref{Array Sorting Functions}). If the
@var{source} array contains subarrays as values (@pxref{Arrays of
Arrays}), they will come last, after all scalar values.
+Subarrays are @emph{not} recursively sorted.
For example, if the contents of @code{a} are as follows:
@@ -17008,7 +17192,10 @@ $ @kbd{awk 'BEGIN @{ print index("peanut", "an") @}'}
@noindent
If @var{find} is not found, @code{index()} returns zero.
-It is a fatal error to use a regexp constant for @var{find}.
+With BWK @command{awk} and @command{gawk},
+it is a fatal error to use a regexp constant for @var{find}.
+Other implementations allow it, simply treating the regexp
+constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER}.
@item @code{length(}[@var{string}]@code{)}
@cindexawkfunc{length}
@@ -17112,8 +17299,8 @@ for @code{match()}, the order is the same as for the @samp{~} operator:
@cindex @code{RSTART} variable, @code{match()} function and
@cindex @code{RLENGTH} variable, @code{match()} function and
@cindex @code{match()} function, @code{RSTART}/@code{RLENGTH} variables
-The @code{match()} function sets the built-in variable @code{RSTART} to
-the index. It also sets the built-in variable @code{RLENGTH} to the
+The @code{match()} function sets the predefined variable @code{RSTART} to
+the index. It also sets the predefined variable @code{RLENGTH} to the
length in characters of the matched substring. If no match is found,
@code{RSTART} is set to zero, and @code{RLENGTH} to @minus{}1.
@@ -17122,13 +17309,12 @@ For example:
@example
@c file eg/misc/findpat.awk
@{
- if ($1 == "FIND")
- regex = $2
- else @{
- where = match($0, regex)
- if (where != 0)
- print "Match of", regex, "found at",
- where, "in", $0
+ if ($1 == "FIND")
+ regex = $2
+ else @{
+ where = match($0, regex)
+ if (where != 0)
+ print "Match of", regex, "found at", where, "in", $0
@}
@}
@c endfile
@@ -17224,7 +17410,7 @@ Any leading separator will be in @code{@var{seps}[0]}.
The @code{patsplit()} function splits strings into pieces in a
manner similar to the way input lines are split into fields using @code{FPAT}
-(@pxref{Splitting By Content}.
+(@pxref{Splitting By Content}).
Before splitting the string, @code{patsplit()} deletes any previously existing
elements in the arrays @var{array} and @var{seps}.
@@ -17237,8 +17423,7 @@ and store the pieces in @var{array} and the separator strings in the
@code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so
forth. The string value of the third argument, @var{fieldsep}, is
a regexp describing where to split @var{string} (much as @code{FS} can
-be a regexp describing where to split input records;
-@pxref{Regexp Field Splitting}).
+be a regexp describing where to split input records).
If @var{fieldsep} is omitted, the value of @code{FS} is used.
@code{split()} returns the number of elements created.
@var{seps} is a @command{gawk} extension with @code{@var{seps}[@var{i}]}
@@ -17533,6 +17718,59 @@ Nonalphabetic characters are left unchanged. For example,
@code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}.
@end table
+@cindex sidebar, Matching the Null String
+@ifdocbook
+@docbook
+<sidebar><title>Matching the Null String</title>
+@end docbook
+
+@cindex matching, null strings
+@cindex null strings, matching
+@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
+@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
+
+In @command{awk}, the @samp{*} operator can match the null string.
+This is particularly important for the @code{sub()}, @code{gsub()},
+and @code{gensub()} functions. For example:
+
+@example
+$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
+@print{} XaXbXcX
+@end example
+
+@noindent
+Although this makes a certain amount of sense, it can be surprising.
+
+@docbook
+</sidebar>
+@end docbook
+@end ifdocbook
+
+@ifnotdocbook
+@cartouche
+@center @b{Matching the Null String}
+
+
+@cindex matching, null strings
+@cindex null strings, matching
+@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
+@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
+
+In @command{awk}, the @samp{*} operator can match the null string.
+This is particularly important for the @code{sub()}, @code{gsub()},
+and @code{gensub()} functions. For example:
+
+@example
+$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
+@print{} XaXbXcX
+@end example
+
+@noindent
+Although this makes a certain amount of sense, it can be surprising.
+@end cartouche
+@end ifnotdocbook
+
+
@node Gory Details
@subsubsection More About @samp{\} and @samp{&} with @code{sub()}, @code{gsub()}, and @code{gensub()}
@@ -17546,7 +17784,7 @@ Nonalphabetic characters are left unchanged. For example,
@cindex ampersand (@code{&}), @code{gsub()}/@code{gensub()}/@code{sub()} functions and
@quotation CAUTION
-This section has been known to cause headaches.
+This subsubsection has been reported to cause headaches.
You might want to skip it upon first reading.
@end quotation
@@ -17837,58 +18075,6 @@ and the special cases for @code{sub()} and @code{gsub()},
we recommend the use of @command{gawk} and @code{gensub()} when you have
to do substitutions.
-@cindex sidebar, Matching the Null String
-@ifdocbook
-@docbook
-<sidebar><title>Matching the Null String</title>
-@end docbook
-
-@cindex matching, null strings
-@cindex null strings, matching
-@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
-@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
-
-In @command{awk}, the @samp{*} operator can match the null string.
-This is particularly important for the @code{sub()}, @code{gsub()},
-and @code{gensub()} functions. For example:
-
-@example
-$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
-@print{} XaXbXcX
-@end example
-
-@noindent
-Although this makes a certain amount of sense, it can be surprising.
-
-@docbook
-</sidebar>
-@end docbook
-@end ifdocbook
-
-@ifnotdocbook
-@cartouche
-@center @b{Matching the Null String}
-
-
-@cindex matching, null strings
-@cindex null strings, matching
-@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching
-@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching
-
-In @command{awk}, the @samp{*} operator can match the null string.
-This is particularly important for the @code{sub()}, @code{gsub()},
-and @code{gensub()} functions. For example:
-
-@example
-$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'}
-@print{} XaXbXcX
-@end example
-
-@noindent
-Although this makes a certain amount of sense, it can be surprising.
-@end cartouche
-@end ifnotdocbook
-
@node I/O Functions
@subsection Input/Output Functions
@cindex input/output functions
@@ -17941,10 +18127,9 @@ buffers its output and the @code{fflush()} function forces
@cindex extensions, common@comma{} @code{fflush()} function
@cindex Brian Kernighan's @command{awk}
-@code{fflush()} was added to BWK @command{awk} in
-April of 1992. For two decades, it was not part of the POSIX standard.
-As of December, 2012, it was accepted for inclusion into the POSIX
-standard.
+Brian Kernighan added @code{fflush()} to his @command{awk} in April
+of 1992. For two decades, it was a common extension. In December,
+2012, it was accepted for inclusion into the POSIX standard.
See @uref{http://austingroupbugs.net/view.php?id=634, the Austin Group website}.
POSIX standardizes @code{fflush()} as follows: If there
@@ -18341,7 +18526,7 @@ is out of range, @code{mktime()} returns @minus{}1.
@cindex @command{gawk}, @code{PROCINFO} array in
@cindex @code{PROCINFO} array
-@item @code{strftime(} [@var{format} [@code{,} @var{timestamp} [@code{,} @var{utc-flag}] ] ]@code{)}
+@item @code{strftime(}[@var{format} [@code{,} @var{timestamp} [@code{,} @var{utc-flag}] ] ]@code{)}
@c STARTOFRANGE strf
@cindexgawkfunc{strftime}
@cindex format time string
@@ -18447,7 +18632,7 @@ of its ISO week number is 2013, even though its year is 2012.
The full year of the ISO week number, as a decimal number.
@item %h
-Equivalent to @samp{%b}.
+Equivalent to @code{%b}.
@item %H
The hour (24-hour clock) as a decimal number (00--23).
@@ -18516,7 +18701,7 @@ The locale's ``appropriate'' date representation.
@item %X
The locale's ``appropriate'' time representation.
-(This is @samp{%T} in the @code{"C"} locale.)
+(This is @code{%T} in the @code{"C"} locale.)
@item %y
The year modulo 100 as a decimal number (00--99).
@@ -18537,7 +18722,7 @@ no time zone is determinable.
@item %Ec %EC %Ex %EX %Ey %EY %Od %Oe %OH
@itemx %OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy
``Alternate representations'' for the specifications
-that use only the second letter (@samp{%c}, @samp{%C},
+that use only the second letter (@code{%c}, @code{%C},
and so on).@footnote{If you don't understand any of this, don't worry about
it; these facilities are meant to make it easier to ``internationalize''
programs.
@@ -18608,7 +18793,7 @@ the string. For example:
@example
$ date '+Today is %A, %B %d, %Y.'
-@print{} Today is Monday, May 05, 2014.
+@print{} Today is Monday, September 22, 2014.
@end example
Here is the @command{gawk} version of the @command{date} utility.
@@ -18800,19 +18985,18 @@ For example, if you have a bit string @samp{10111001} and you shift it
right by three bits, you end up with @samp{00010111}.@footnote{This example
shows that 0's come in on the left side. For @command{gawk}, this is
always true, but in some languages, it's possible to have the left side
-fill with 1's. Caveat emptor.}
+fill with 1's.}
@c Purposely decided to use 0's and 1's here. 2/2001.
-If you start over
-again with @samp{10111001} and shift it left by three bits, you end up
-with @samp{11001000}.
-@command{gawk} provides built-in functions that implement the
-bitwise operations just described. They are:
+If you start over again with @samp{10111001} and shift it left by three
+bits, you end up with @samp{11001000}. The following list describes
+@command{gawk}'s built-in functions that implement the bitwise operations.
+Optional parameters are enclosed in square brackets ([ ]):
@cindex @command{gawk}, bitwise operations in
@table @code
@cindexgawkfunc{and}
@cindex bitwise AND
-@item @code{and(@var{v1}, @var{v2}} [@code{,} @dots{}]@code{)}
+@item @code{and(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)}
Return the bitwise AND of the arguments. There must be at least two.
@cindexgawkfunc{compl}
@@ -18827,7 +19011,7 @@ Return the value of @var{val}, shifted left by @var{count} bits.
@cindexgawkfunc{or}
@cindex bitwise OR
-@item @code{or(@var{v1}, @var{v2}} [@code{,} @dots{}]@code{)}
+@item @code{or(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)}
Return the bitwise OR of the arguments. There must be at least two.
@cindexgawkfunc{rshift}
@@ -18837,7 +19021,7 @@ Return the value of @var{val}, shifted right by @var{count} bits.
@cindexgawkfunc{xor}
@cindex bitwise XOR
-@item @code{xor(@var{v1}, @var{v2}} [@code{,} @dots{}]@code{)}
+@item @code{xor(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)}
Return the bitwise XOR of the arguments. There must be at least two.
@end table
@@ -18960,7 +19144,7 @@ results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions.
@command{gawk} provides a single function that lets you distinguish
an array from a scalar variable. This is necessary for writing code
-that traverses every element of an array of arrays.
+that traverses every element of an array of arrays
(@pxref{Arrays of Arrays}).
@table @code
@@ -18976,12 +19160,14 @@ an array or not. The second is inside the body of a user-defined function
(not discussed yet; @pxref{User-defined}), to test if a parameter is an
array or not.
-Note, however, that using @code{isarray()} at the global level to test
+@quotation NOTE
+Using @code{isarray()} at the global level to test
variables makes no sense. Since you are the one writing the program, you
are supposed to know if your variables are arrays or not. And in fact,
due to the way @command{gawk} works, if you pass the name of a variable
that has not been previously used to @code{isarray()}, @command{gawk}
-will end up turning it into a scalar.
+ends up turning it into a scalar.
+@end quotation
@node I18N Functions
@subsection String-Translation Functions
@@ -19058,6 +19244,12 @@ them, i.e., to tell @command{awk} what they should do.
@node Definition Syntax
@subsection Function Definition Syntax
+@quotation
+@i{It's entirely fair to say that the @command{awk} syntax for local
+variable definitions is appallingly awful.}
+@author Brian Kernighan
+@end quotation
+
@c STARTOFRANGE fdef
@cindex functions, defining
Definitions of functions can appear anywhere between the rules of an
@@ -19084,6 +19276,8 @@ The definition of a function named @var{name} looks like this:
Here, @var{name} is the name of the function to define. A valid function
name is like a valid variable name: a sequence of letters, digits, and
underscores that doesn't start with a digit.
+Here too, only the 52 upper- and lowercase English letters may
+be used in a function name.
Within a single @command{awk} program, any particular name can only be
used as a variable, array, or function.
@@ -19095,9 +19289,9 @@ the call.
A function cannot have two parameters with the same name, nor may it
have a parameter with the same name as the function itself.
In addition, according to the POSIX standard, function parameters
-cannot have the same name as one of the special built-in variables
+cannot have the same name as one of the special predefined variables
(@pxref{Built-in Variables}). Not all versions of @command{awk} enforce
-this restriction.)
+this restriction.
Local variables act like the empty string if referenced where a string
value is required, and like zero if referenced where a numeric value
@@ -19234,7 +19428,7 @@ extra whitespace signifies the start of the local variable list):
function delarray(a, i)
@{
for (i in a)
- delete a[i]
+ delete a[i]
@}
@end example
@@ -19245,7 +19439,7 @@ Instead of having
to repeat this loop everywhere that you need to clear out
an array, your program can just call @code{delarray}.
(This guarantees portability. The use of @samp{delete @var{array}} to delete
-the contents of an entire array is a recent@footnote{Late in 2012.}
+the contents of an entire array is a relatively recent@footnote{Late in 2012.}
addition to the POSIX standard.)
The following is an example of a recursive function. It takes a string
@@ -19275,7 +19469,7 @@ $ @kbd{echo "Don't Panic!" |}
@print{} !cinaP t'noD
@end example
-The C @code{ctime()} function takes a timestamp and returns it in a string,
+The C @code{ctime()} function takes a timestamp and returns it as a string,
formatted in a well-known fashion.
The following example uses the built-in @code{strftime()} function
(@pxref{Time Functions})
@@ -19290,13 +19484,19 @@ to create an @command{awk} version of @code{ctime()}:
function ctime(ts, format)
@{
- format = PROCINFO["strftime"]
+ format = "%a %b %e %H:%M:%S %Z %Y"
+
if (ts == 0)
ts = systime() # use current time as default
return strftime(format, ts)
@}
@c endfile
@end example
+
+You might think that @code{ctime()} could use @code{PROCINFO["strftime"]}
+for its format string. That would be a mistake, since @code{ctime()} is
+supposed to return the time formatted in a standard fashion, and user-level
+code could have changed @code{PROCINFO["strftime"]}.
@c ENDOFRANGE fdef
@node Function Caveats
@@ -19732,7 +19932,7 @@ being aware of them.
@cindex pointers to functions
@cindex differences in @command{awk} and @command{gawk}, indirect function calls
-This section describes a @command{gawk}-specific extension.
+This section describes an advanced, @command{gawk}-specific extension.
Often, you may wish to defer the choice of function to call until runtime.
For example, you may have different kinds of records, each of which
@@ -19778,8 +19978,11 @@ To process the data, you might write initially:
@noindent
This style of programming works, but can be awkward. With @dfn{indirect}
function calls, you tell @command{gawk} to use the @emph{value} of a
-variable as the name of the function to call.
+variable as the @emph{name} of the function to call.
+@cindex @code{@@}-notation for indirect function calls
+@cindex indirect function calls, @code{@@}-notation
+@cindex function calls, indirect, @code{@@}-notation for
The syntax is similar to that of a regular function call: an identifier
immediately followed by a left parenthesis, any arguments, and then
a closing right parenthesis, with the addition of a leading @samp{@@}
@@ -19837,7 +20040,6 @@ Otherwise they perform the expected computations and are not unusual.
@example
@c file eg/prog/indirectcall.awk
# For each record, print the class name and the requested statistics
-
@{
class_name = $1
gsub(/_/, " ", class_name) # Replace _ with spaces
@@ -19866,7 +20068,7 @@ saving it in @code{start}.
The last part of the code loops through each function name (from @code{$2} up to
the marker, @samp{data:}), calling the function named by the field. The indirect
function call itself occurs as a parameter in the call to @code{printf}.
-(The @code{printf} format string uses @samp{%s} as the format specifier so that we
+(The @code{printf} format string uses @code{%s} as the format specifier so that we
can use functions that return strings, as well as numbers. Note that the result
from the indirect call is concatenated with the empty string, in order to force
it to be a string value.)
@@ -19943,7 +20145,7 @@ function quicksort(data, left, right, less_than, i, last)
# quicksort_swap --- helper function for quicksort, should really be inline
-function quicksort_swap(data, i, j, temp)
+function quicksort_swap(data, i, j, temp)
@{
temp = data[i]
data[i] = data[j]
@@ -20064,12 +20266,77 @@ $ @kbd{gawk -f quicksort.awk -f indirectcall.awk class_data2}
@print{} rsort: <100.0 95.6 93.4 87.1>
@end example
+Another example where indirect functions calls are useful can be found in
+processing arrays. @DBREF{Walking Arrays} presented a simple function
+for ``walking'' an array of arrays. That function simply printed the
+name and value of each scalar array element. However, it is easy to
+generalize that function, by passing in the name of a function to call
+when walking an array. The modified function looks like this:
+
+@example
+@c file eg/lib/processarray.awk
+function process_array(arr, name, process, do_arrays, i, new_name)
+@{
+ for (i in arr) @{
+ new_name = (name "[" i "]")
+ if (isarray(arr[i])) @{
+ if (do_arrays)
+ @@process(new_name, arr[i])
+ process_array(arr[i], new_name, process, do_arrays)
+ @} else
+ @@process(new_name, arr[i])
+ @}
+@}
+@c endfile
+@end example
+
+The arguments are as follows:
+
+@table @code
+@item arr
+The array.
+
+@item name
+The name of the array (a string).
+
+@item process
+The name of the function to call.
+
+@item do_arrays
+If this is true, the function can handle elements that are subarrays.
+@end table
+
+If subarrays are to be processed, that is done before walking them further.
+
+When run with the following scaffolding, the function produces the same
+results as does the earlier @code{walk_array()} function:
+
+@example
+BEGIN @{
+ a[1] = 1
+ a[2][1] = 21
+ a[2][2] = 22
+ a[3] = 3
+ a[4][1][1] = 411
+ a[4][2] = 42
+
+ process_array(a, "a", "do_print", 0)
+@}
+
+function do_print(name, element)
+@{
+ printf "%s = %s\n", name, element
+@}
+@end example
+
Remember that you must supply a leading @samp{@@} in front of an indirect function call.
-Unfortunately, indirect function calls cannot be used with the built-in functions. However,
-you can generally write ``wrapper'' functions which call the built-in ones, and those can
-be called indirectly. (Other than, perhaps, the mathematical functions, there is not a lot
-of reason to try to call the built-in functions indirectly.)
+Starting with @value{PVERSION} 4.1.2 of @command{gawk}, indirect function
+calls may also be used with built-in functions and with extension functions
+(@pxref{Dynamic Extensions}). The only thing you cannot do is pass a regular
+expression constant to a built-in function through an indirect function
+call.@footnote{This may change in a future version; recheck the documentation that
+comes with your version of @command{gawk} to see if it has.}
@command{gawk} does its best to make indirect function calls efficient.
For example, in the following case:
@@ -20080,7 +20347,7 @@ for (i = 1; i <= n; i++)
@end example
@noindent
-@code{gawk} will look up the actual function to call only once.
+@code{gawk} looks up the actual function to call only once.
@node Functions Summary
@section Summary
@@ -20092,10 +20359,11 @@ functions.
@item
POSIX @command{awk} provides three kinds of built-in functions: numeric,
-string, and I/O. @command{gawk} provides functions that work with values
-representing time, do bit manipulation, sort arrays, and internationalize
-and localize programs. @command{gawk} also provides several extensions to
-some of standard functions, typically in the form of additional arguments.
+string, and I/O. @command{gawk} provides functions that sort arrays, work
+with values representing time, do bit manipulation, determine variable
+type (array vs.@: scalar), and internationalize and localize programs.
+@command{gawk} also provides several extensions to some of standard
+functions, typically in the form of additional arguments.
@item
Functions accept zero or more arguments and return a value. The
@@ -20120,6 +20388,8 @@ from the real parameters by extra whitespace.
User-defined functions may call other user-defined (and built-in)
functions and may call themselves recursively. Function parameters
``hide'' any global variables of the same names.
+You cannot use the name of a reserved variable (such as @code{ARGC})
+as the name of a parameter in user-defined functions.
@item
Scalar values are passed to user-defined functions by value. Array
@@ -20138,7 +20408,7 @@ either scalar or array.
@item
@command{gawk} provides indirect function calls using a special syntax.
-By setting a variable to the name of a user-defined function, you can
+By setting a variable to the name of a function, you can
determine at runtime what function will be called at that point in the
program. This is equivalent to function pointers in C and C++.
@@ -20173,7 +20443,7 @@ It contains the following chapters:
@c STARTOFRANGE fudlib
@cindex functions, user-defined, library of
-@ref{User-defined}, describes how to write
+@DBREF{User-defined} describes how to write
your own @command{awk} functions. Writing functions is important, because
it allows you to encapsulate algorithms and program tasks in a single
place. It simplifies programming, making program development more
@@ -20206,7 +20476,7 @@ use these functions.
The functions are presented here in a progression from simple to complex.
@cindex Texinfo
-@ref{Extract Program},
+@DBREF{Extract Program}
presents a program that you can use to extract the source code for
these example library functions and programs from the Texinfo source
for this @value{DOCUMENT}.
@@ -20270,7 +20540,7 @@ comparisons use only lowercase letters.
* Group Functions:: Functions for getting group information.
* Walking Arrays:: A function to walk arrays of arrays.
* Library Functions Summary:: Summary of library functions.
-* Library exercises:: Exercises.
+* Library Exercises:: Exercises.
@end menu
@node Library Names
@@ -20330,7 +20600,7 @@ example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables
(@pxref{Getopt Function}).
The leading capital letter indicates that it is global, while the fact that
the variable name is not all capital letters indicates that the variable is
-not one of @command{awk}'s built-in variables, such as @code{FS}.
+not one of @command{awk}'s predefined variables, such as @code{FS}.
@cindex @option{--dump-variables} option, using for library functions
It is also important that @emph{all} variables in library
@@ -20344,8 +20614,9 @@ are very difficult to track down:
function lib_func(x, y, l1, l2)
@{
@dots{}
- @var{use variable} some_var # some_var should be local
- @dots{} # but is not by oversight
+ # some_var should be local but by oversight is not
+ @var{use variable} some_var
+ @dots{}
@}
@end example
@@ -20357,7 +20628,7 @@ A different convention, common in the Tcl community, is to use a single
associative array to hold the values needed by the library function(s), or
``package.'' This significantly decreases the number of actual global names
in use. For example, the functions described in
-@ref{Passwd Functions},
+@DBREF{Passwd Functions}
might have used array elements @code{@w{PW_data["inited"]}}, @code{@w{PW_data["total"]}},
@code{@w{PW_data["count"]}}, and @code{@w{PW_data["awklib"]}}, instead of
@code{@w{_pw_inited}}, @code{@w{_pw_awklib}}, @code{@w{_pw_total}},
@@ -20386,6 +20657,7 @@ programming use.
* Join Function:: A function to join an array into a string.
* Getlocaltime Function:: A function to get formatted times.
* Readfile Function:: A function to read an entire file at once.
+* Shell Quoting:: A function to quote strings for the shell.
@end menu
@node Strtonum Function
@@ -20418,8 +20690,9 @@ function mystrtonum(str, ret, n, i, k, c)
ret = 0
for (i = 1; i <= n; i++) @{
c = substr(str, i, 1)
- if ((k = index("01234567", c)) > 0)
- k-- # adjust for 1-basing in awk
+ # index() returns 0 if c not in string,
+ # includes c == "0"
+ k = index("1234567", c)
ret = ret * 8 + k
@}
@@ -20431,6 +20704,8 @@ function mystrtonum(str, ret, n, i, k, c)
for (i = 1; i <= n; i++) @{
c = substr(str, i, 1)
c = tolower(c)
+ # index() returns 0 if c not in string,
+ # includes c == "0"
k = index("123456789abcdef", c)
ret = ret * 16 + k
@@ -20453,7 +20728,7 @@ function mystrtonum(str, ret, n, i, k, c)
# a[5] = "123.45"
# a[6] = "1.e3"
# a[7] = "1.32"
-# a[7] = "1.32E2"
+# a[8] = "1.32E2"
#
# for (i = 1; i in a; i++)
# print a[i], strtonum(a[i]), mystrtonum(a[i])
@@ -20464,9 +20739,12 @@ function mystrtonum(str, ret, n, i, k, c)
The function first looks for C-style octal numbers (base 8).
If the input string matches a regular expression describing octal
numbers, then @code{mystrtonum()} loops through each character in the
-string. It sets @code{k} to the index in @code{"01234567"} of the current
-octal digit. Since the return value is one-based, the @samp{k--}
-adjusts @code{k} so it can be used in computing the return value.
+string. It sets @code{k} to the index in @code{"1234567"} of the current
+octal digit.
+The return value will either be the same number as the digit, or zero
+if the character is not there, which will be true for a @samp{0}.
+This is safe, since the regexp test in the @code{if} ensures that
+only octal values are converted.
Similar logic applies to the code that checks for and converts a
hexadecimal value, which starts with @samp{0x} or @samp{0X}.
@@ -20499,7 +20777,7 @@ that a condition or set of conditions is true. Before proceeding with a
particular computation, you make a statement about what you believe to be
the case. Such a statement is known as an
@dfn{assertion}. The C language provides an @code{<assert.h>} header file
-and corresponding @code{assert()} macro that the programmer can use to make
+and corresponding @code{assert()} macro that a programmer can use to make
assertions. If an assertion fails, the @code{assert()} macro arranges to
print a diagnostic message describing the condition that should have
been true but was not, and then it kills the program. In C, using
@@ -20832,8 +21110,7 @@ function chr(c)
@c endfile
#### test code ####
-# BEGIN \
-# @{
+# BEGIN @{
# for (;;) @{
# printf("enter a character: ")
# if (getline var <= 0)
@@ -20918,7 +21195,7 @@ more difficult than they really need to be.}
@cindex timestamps, formatted
@cindex time, managing
The @code{systime()} and @code{strftime()} functions described in
-@ref{Time Functions},
+@DBREF{Time Functions}
provide the minimum functionality necessary for dealing with the time of day
in human readable form. While @code{strftime()} is extensive, the control
formats are not necessarily easy to remember or intuitively obvious when
@@ -20970,7 +21247,7 @@ function getlocaltime(time, ret, now, i)
now = systime()
# return date(1)-style output
- ret = strftime(PROCINFO["strftime"], now)
+ ret = strftime("%a %b %e %H:%M:%S %Z %Y", now)
# clear out target array
delete time
@@ -21004,7 +21281,7 @@ function getlocaltime(time, ret, now, i)
The string indices are easier to use and read than the various formats
required by @code{strftime()}. The @code{alarm} program presented in
-@ref{Alarm Program},
+@DBREF{Alarm Program}
uses this function.
A more general design for the @code{getlocaltime()} function would have
allowed the user to supply an optional timestamp value to use instead
@@ -21085,6 +21362,87 @@ if (length(contents) == 0)
This tests the result to see if it is empty or not. An equivalent
test would be @samp{contents == ""}.
+@xref{Extension Sample Readfile}, for an extension function that
+also reads an entire file into memory.
+
+@node Shell Quoting
+@subsection Quoting Strings to Pass to The Shell
+
+@c included by permission
+@ignore
+Date: Sun, 27 Jul 2014 17:16:16 -0700
+Message-ID: <CAKuGj+iCF_obaCLDUX60aSAgbfocFVtguG39GyeoNxTFby5sqQ@mail.gmail.com>
+Subject: Useful awk function
+From: Mike Brennan <mike@madronabluff.com>
+To: Arnold Robbins <arnold@skeeve.com>
+@end ignore
+
+Michael Brennan offers the following programming pattern,
+which he uses frequently:
+
+@example
+#! /bin/sh
+
+awkp='
+ @dots{}
+ '
+
+@var{input_program} | awk "$awkp" | /bin/sh
+@end example
+
+For example, a program of his named @command{flac-edit} has this form:
+
+@example
+$ @kbd{flac-edit -song="Whoope! That's Great" file.flac}
+@end example
+
+It generates the following output, which is to be piped to
+the shell (@file{/bin/sh}):
+
+@example
+chmod +w file.flac
+metaflac --remove-tag=TITLE file.flac
+LANG=en_US.88591 metaflac --set-tag=TITLE='Whoope! That'"'"'s Great' file.flac
+chmod -w file.flac
+@end example
+
+Note the need for shell quoting. The function @code{shell_quote()}
+does it. @code{SINGLE} is the one-character string @code{"'"} and
+@code{QSINGLE} is the three-character string @code{"\"'\""}.
+
+@example
+@c file eg/lib/shellquote.awk
+# shell_quote --- quote an argument for passing to the shell
+@c endfile
+@ignore
+@c file eg/lib/shellquote.awk
+#
+# Michael Brennan
+# brennan@@madronabluff.com
+# September 2014
+@c endfile
+@end ignore
+@c file eg/lib/shellquote.awk
+
+function shell_quote(s, # parameter
+ SINGLE, QSINGLE, i, X, n, ret) # locals
+@{
+ if (s == "")
+ return "\"\""
+
+ SINGLE = "\x27" # single quote
+ QSINGLE = "\"\x27\""
+ n = split(s, X, SINGLE)
+
+ ret = SINGLE X[1] SINGLE
+ for (i = 2; i <= n; i++)
+ ret = ret QSINGLE SINGLE X[i] SINGLE
+
+ return ret
+@}
+@c endfile
+@end example
+
@node Data File Management
@section @value{DDF} Management
@@ -21142,15 +21500,14 @@ Besides solving the problem in only nine(!) lines of code, it does so
@c # Arnold Robbins, arnold@@skeeve.com, Public Domain
@c # January 1992
-FILENAME != _oldfilename \
-@{
+FILENAME != _oldfilename @{
if (_oldfilename != "")
endfile(_oldfilename)
_oldfilename = FILENAME
beginfile(FILENAME)
@}
-END @{ endfile(FILENAME) @}
+END @{ endfile(FILENAME) @}
@end example
This file must be loaded before the user's ``main'' program, so that the
@@ -21203,11 +21560,11 @@ FNR == 1 @{
beginfile(FILENAME)
@}
-END @{ endfile(_filename_) @}
+END @{ endfile(_filename_) @}
@c endfile
@end example
-@ref{Wc Program},
+@DBREF{Wc Program}
shows how this library function can be used and
how it simplifies writing the main program.
@@ -21302,24 +21659,12 @@ function rewind( i)
@c endfile
@end example
-This code relies on the @code{ARGIND} variable
-(@pxref{Auto-set}),
-which is specific to @command{gawk}.
-If you are not using
-@command{gawk}, you can use ideas presented in
-@ifnotinfo
-the previous @value{SECTION}
-@end ifnotinfo
-@ifinfo
-@ref{Filetrans Function},
-@end ifinfo
-to either update @code{ARGIND} on your own
-or modify this code as appropriate.
-
-The @code{rewind()} function also relies on the @code{nextfile} keyword
-(@pxref{Nextfile Statement}). Because of this, you should not call it
-from an @code{ENDFILE} rule. (This isn't necessary anyway, since as soon
-as an @code{ENDFILE} rule finishes @command{gawk} goes to the next file!)
+The @code{rewind()} function relies on the @code{ARGIND} variable
+(@pxref{Auto-set}), which is specific to @command{gawk}. It also
+relies on the @code{nextfile} keyword (@pxref{Nextfile Statement}).
+Because of this, you should not call it from an @code{ENDFILE} rule.
+(This isn't necessary anyway, since as soon as an @code{ENDFILE} rule
+finishes @command{gawk} goes to the next file!)
@node File Checking
@subsection Checking for Readable @value{DDF}s
@@ -21352,7 +21697,7 @@ the following program to your @command{awk} program:
BEGIN @{
for (i = 1; i < ARGC; i++) @{
- if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
+ if (ARGV[i] ~ /^[a-zA-Z_][a-zA-Z0-9_]*=.*/ \
|| ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
continue # assignment or standard input
else if ((getline junk < ARGV[i]) < 0) # unreadable
@@ -21370,6 +21715,11 @@ Removing the element from @code{ARGV} with @code{delete}
skips the file (since it's no longer in the list).
See also @ref{ARGC and ARGV}.
+The regular expression check purposely does not use character classes
+such as @samp{[:alpha:]} and @samp{[:alnum:]}
+(@pxref{Bracket Expressions})
+since @command{awk} variable names only allow the English letters.
+
@node Empty Files
@subsection Checking for Zero-length Files
@@ -21466,7 +21816,7 @@ a library file does the trick:
function disable_assigns(argc, argv, i)
@{
for (i = 1; i < argc; i++)
- if (argv[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/)
+ if (argv[i] ~ /^[a-zA-Z_][a-zA-Z0-9_]*=.*/)
argv[i] = ("./" argv[i])
@}
@@ -21707,8 +22057,7 @@ it is not an option, and it ends option processing. Continuing on:
i = index(options, thisopt)
if (i == 0) @{
if (Opterr)
- printf("%c -- invalid option\n",
- thisopt) > "/dev/stderr"
+ printf("%c -- invalid option\n", thisopt) > "/dev/stderr"
if (_opti >= length(argv[Optind])) @{
Optind++
_opti = 0
@@ -21839,12 +22188,18 @@ In both runs, the first @option{--} terminates the arguments to
etc., as its own options.
@quotation NOTE
-After @code{getopt()} is through, it is the responsibility of the
-user level code to clear out all the elements of @code{ARGV} from 1
+After @code{getopt()} is through,
+user level code must clear out all the elements of @code{ARGV} from 1
to @code{Optind}, so that @command{awk} does not try to process the
command-line options as @value{FN}s.
@end quotation
+Using @samp{#!} with the @option{-E} option may help avoid
+conflicts between your program's options and @command{gawk}'s options,
+since @option{-E} causes @command{gawk} to abandon processing of
+further options
+(@pxref{Executable Scripts}, and @pxref{Options}).
+
Several of the sample programs presented in
@ref{Sample Programs},
use @code{getopt()} to process their arguments.
@@ -22089,13 +22444,14 @@ The @code{BEGIN} rule sets a private variable to the directory where
routine, we have chosen to put it in @file{/usr/local/libexec/awk};
however, you might want it to be in a different directory on your system.
-The function @code{_pw_init()} keeps three copies of the user information
-in three associative arrays. The arrays are indexed by username
+The function @code{_pw_init()} fills three copies of the user information
+into three associative arrays. The arrays are indexed by username
(@code{_pw_byname}), by user ID number (@code{_pw_byuid}), and by order of
occurrence (@code{_pw_bycount}).
The variable @code{_pw_inited} is used for efficiency, since @code{_pw_init()}
needs to be called only once.
+@cindex @code{PROCINFO} array, testing the field splitting
@cindex @code{getline} command, @code{_pw_init()} function
Because this function uses @code{getline} to read information from
@command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}.
@@ -22103,13 +22459,8 @@ It notes in the variable @code{using_fw} whether field splitting
with @code{FIELDWIDTHS} is in effect or not.
Doing so is necessary, since these functions could be called
from anywhere within a user's program, and the user may have his
-or her
-own way of splitting records and fields.
-
-@cindex @code{PROCINFO} array, testing the field splitting
-The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which
-is @code{"FIELDWIDTHS"} if field splitting is being done with
-@code{FIELDWIDTHS}. This makes it possible to restore the correct
+or her own way of splitting records and fields.
+This makes it possible to restore the correct
field-splitting mechanism later. The test can only be true for
@command{gawk}. It is false if using @code{FS} or @code{FPAT},
or on some other @command{awk} implementation.
@@ -22211,7 +22562,7 @@ once. If you are worried about squeezing every last cycle out of your
this is not necessary, since most @command{awk} programs are I/O-bound,
and such a change would clutter up the code.
-The @command{id} program in @ref{Id Program},
+The @command{id} program in @DBREF{Id Program}
uses these functions.
@c ENDOFRANGE libfudata
@c ENDOFRANGE flibudata
@@ -22237,7 +22588,7 @@ uses these functions.
@cindex group file
@cindex files, group
Much of the discussion presented in
-@ref{Passwd Functions},
+@DBREF{Passwd Functions}
applies to the group database as well. Although there has traditionally
been a well-known file (@file{/etc/group}) in a well-known format, the POSIX
standard only provides a set of C library routines
@@ -22390,8 +22741,7 @@ There are several, modeled after the C library functions of the same names:
@c line break on _gr_init for smallbook
@c file eg/lib/groupawk.in
-BEGIN \
-@{
+BEGIN @{
# Change to suit your system
_gr_awklib = "/usr/local/libexec/awk/"
@}
@@ -22424,8 +22774,7 @@ function _gr_init( oldfs, oldrs, olddol0, grcat,
n = split($4, a, "[ \t]*,[ \t]*")
for (i = 1; i <= n; i++)
if (a[i] in _gr_groupsbyuser)
- _gr_groupsbyuser[a[i]] = \
- _gr_groupsbyuser[a[i]] " " $1
+ _gr_groupsbyuser[a[i]] = gr_groupsbyuser[a[i]] " " $1
else
_gr_groupsbyuser[a[i]] = $1
@@ -22577,13 +22926,13 @@ Most of the work is in scanning the database and building the various
associative arrays. The functions that the user calls are themselves very
simple, relying on @command{awk}'s associative arrays to do work.
-The @command{id} program in @ref{Id Program},
+The @command{id} program in @DBREF{Id Program}
uses these functions.
@node Walking Arrays
@section Traversing Arrays of Arrays
-@ref{Arrays of Arrays}, described how @command{gawk}
+@DBREF{Arrays of Arrays} described how @command{gawk}
provides arrays of arrays. In particular, any element of
an array may be either a scalar, or another array. The
@code{isarray()} function (@pxref{Type Functions})
@@ -22633,12 +22982,12 @@ When run, the program produces the following output:
@example
$ @kbd{gawk -f walk_array.awk}
-@print{} a[4][1][1] = 411
-@print{} a[4][2] = 42
@print{} a[1] = 1
@print{} a[2][1] = 21
@print{} a[2][2] = 22
@print{} a[3] = 3
+@print{} a[4][1][1] = 411
+@print{} a[4][2] = 42
@end example
@c ENDOFRANGE libfgdata
@@ -22652,8 +23001,8 @@ $ @kbd{gawk -f walk_array.awk}
@itemize @value{BULLET}
@item
Reading programs is an excellent way to learn Good Programming.
-The functions provided in this @value{CHAPTER} and the next are intended
-to serve that purpose.
+The functions and programs provided in this @value{CHAPTER} and the next
+are intended to serve that purpose.
@item
When writing general-purpose library functions, put some thought into how
@@ -22689,7 +23038,8 @@ A simple function to traverse an array of arrays to any depth.
@end itemize
-@node Library exercises
+@c EXCLUDE START
+@node Library Exercises
@section Exercises
@enumerate
@@ -22737,7 +23087,7 @@ As a related challenge, revise that code to handle the case where
an intervening value in @code{ARGV} is a variable assignment.
@item
-@ref{Walking Arrays}, presented a function that walked a multidimensional
+@DBREF{Walking Arrays} presented a function that walked a multidimensional
array to print it out. However, walking an array and processing
each element is a general-purpose operation. Generalize the
@code{walk_array()} function by adding an additional parameter named
@@ -22755,6 +23105,7 @@ Test your new version by printing the array; you should end up with
output identical to that of the original version.
@end enumerate
+@c EXCLUDE END
@c ENDOFRANGE flib
@c ENDOFRANGE fudlib
@@ -22938,22 +23289,16 @@ supplied:
# Requires getopt() and join() library functions
@group
-function usage( e1, e2)
+function usage()
@{
- e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
- e2 = "usage: cut [-c list] [files...]"
- print e1 > "/dev/stderr"
- print e2 > "/dev/stderr"
+ print("usage: cut [-f list] [-d c] [-s] [files...]") > "/dev/stderr"
+ print("usage: cut [-c list] [files...]") > "/dev/stderr"
exit 1
@}
@end group
@c endfile
@end example
-@noindent
-The variables @code{e1} and @code{e2} are used so that the function
-fits nicely on the @value{PAGE}.
-
@cindex @code{BEGIN} pattern, running @command{awk} programs and
@cindex @code{FS} variable, running @command{awk} programs and
Next comes a @code{BEGIN} rule that parses the command-line options.
@@ -22968,8 +23313,7 @@ string:
@example
@c file eg/prog/cut.awk
-BEGIN \
-@{
+BEGIN @{
FS = "\t" # default
OFS = FS
while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) @{
@@ -23249,7 +23593,7 @@ and the file transition library program
The program begins with a descriptive comment and then a @code{BEGIN} rule
that processes the command-line arguments with @code{getopt()}. The @option{-i}
(ignore case) option is particularly easy with @command{gawk}; we just use the
-@code{IGNORECASE} built-in variable
+@code{IGNORECASE} predefined variable
(@pxref{Built-in Variables}):
@cindex @code{egrep.awk} program
@@ -23444,8 +23788,7 @@ there are no matches, the exit status is one; otherwise it is zero:
@example
@c file eg/prog/egrep.awk
-END \
-@{
+END @{
exit (total == 0)
@}
@c endfile
@@ -23456,30 +23799,15 @@ and then exits:
@example
@c file eg/prog/egrep.awk
-function usage( e)
+function usage()
@{
- e = "Usage: egrep [-csvil] [-e pat] [files ...]"
- e = e "\n\tegrep [-csvil] pat [files ...]"
- print e > "/dev/stderr"
+ print("Usage: egrep [-csvil] [-e pat] [files ...]") > "/dev/stderr"
+ print("\n\tegrep [-csvil] pat [files ...]") > "/dev/stderr"
exit 1
@}
@c endfile
@end example
-The variable @code{e} is used so that the function fits nicely
-on the printed page.
-
-@cindex @code{END} pattern, backslash continuation and
-@cindex @code{\} (backslash), continuing lines and
-@cindex backslash (@code{\}), continuing lines and
-Just a note on programming style: you may have noticed that the @code{END}
-rule uses backslash continuation, with the open brace on a line by
-itself. This is so that it more closely resembles the way functions
-are written. Many of the examples
-in this @value{CHAPTER}
-use this style. You can decide for yourself if you like writing
-your @code{BEGIN} and @code{END} rules this way
-or not.
@c ENDOFRANGE regexps
@c ENDOFRANGE sfregexp
@c ENDOFRANGE fsregexp
@@ -23537,6 +23865,7 @@ numbers:
# May 1993
# Revised February 1996
# Revised May 2014
+# Revised September 2014
@c endfile
@end ignore
@@ -23546,8 +23875,7 @@ numbers:
# egid=5(blat) groups=9(nine),2(two),1(one)
@group
-BEGIN \
-@{
+BEGIN @{
uid = PROCINFO["uid"]
euid = PROCINFO["euid"]
gid = PROCINFO["gid"]
@@ -23556,26 +23884,22 @@ BEGIN \
printf("uid=%d", uid)
pw = getpwuid(uid)
- if (pw != "")
- pr_first_field(pw)
+ pr_first_field(pw)
if (euid != uid) @{
printf(" euid=%d", euid)
pw = getpwuid(euid)
- if (pw != "")
- pr_first_field(pw)
+ pr_first_field(pw)
@}
printf(" gid=%d", gid)
pw = getgrgid(gid)
- if (pw != "")
- pr_first_field(pw)
+ pr_first_field(pw)
if (egid != gid) @{
printf(" egid=%d", egid)
pw = getgrgid(egid)
- if (pw != "")
- pr_first_field(pw)
+ pr_first_field(pw)
@}
for (i = 1; ("group" i) in PROCINFO; i++) @{
@@ -23584,8 +23908,7 @@ BEGIN \
group = PROCINFO["group" i]
printf("%d", group)
pw = getgrgid(group)
- if (pw != "")
- pr_first_field(pw)
+ pr_first_field(pw)
if (("group" (i+1)) in PROCINFO)
printf(",")
@}
@@ -23595,8 +23918,10 @@ BEGIN \
function pr_first_field(str, a)
@{
- split(str, a, ":")
- printf("(%s)", a[1])
+ if (str != "") @{
+ split(str, a, ":")
+ printf("(%s)", a[1])
+ @}
@}
@c endfile
@end example
@@ -23619,7 +23944,8 @@ tested, and the loop body never executes.
The @code{pr_first_field()} function simply isolates out some
code that is used repeatedly, making the whole program
-slightly shorter and cleaner.
+shorter and cleaner. In particular, moving the check for
+the empty string into this function saves several lines of code.
@c ENDOFRANGE id
@@ -23746,24 +24072,25 @@ The @code{usage()} function simply prints an error message and exits:
@example
@c file eg/prog/split.awk
-function usage( e)
+function usage()
@{
- e = "usage: split [-num] [file] [outname]"
- print e > "/dev/stderr"
+ print("usage: split [-num] [file] [outname]") > "/dev/stderr"
exit 1
@}
@c endfile
@end example
-@noindent
-The variable @code{e} is used so that the function
-fits nicely on the @value{PAGE}.
-
This program is a bit sloppy; it relies on @command{awk} to automatically close the last file
instead of doing it in an @code{END} rule.
It also assumes that letters are contiguous in the character set,
which isn't true for EBCDIC systems.
+@ifset FOR_PRINT
+You might want to consider how to eliminate the use of
+@code{ord()} and @code{chr()}; this can be done in such a
+way as to solve the EBCDIC issue as well.
+@end ifset
+
@c ENDOFRANGE filspl
@c ENDOFRANGE split
@@ -23817,8 +24144,7 @@ Finally, @command{awk} is forced to read the standard input by setting
@c endfile
@end ignore
@c file eg/prog/tee.awk
-BEGIN \
-@{
+BEGIN @{
for (i = 1; i < ARGC; i++)
copy[i] = ARGV[i]
@@ -23880,8 +24206,7 @@ Finally, the @code{END} rule cleans up by closing all the output files:
@example
@c file eg/prog/tee.awk
-END \
-@{
+END @{
for (i in copy)
close(copy[i])
@}
@@ -23913,10 +24238,10 @@ The options for @command{uniq} are:
@table @code
@item -d
-Print only repeated lines.
+Print only repeated (duplicated) lines.
@item -u
-Print only nonrepeated lines.
+Print only nonrepeated (unique) lines.
@item -c
Count lines. This option overrides @option{-d} and @option{-u}. Both repeated
@@ -23985,10 +24310,9 @@ standard output, @file{/dev/stdout}:
@end ignore
@c file eg/prog/uniq.awk
-function usage( e)
+function usage()
@{
- e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
- print e > "/dev/stderr"
+ print("Usage: uniq [-udc [-n]] [+n] [ in [ out ]]") > "/dev/stderr"
exit 1
@}
@@ -23998,8 +24322,7 @@ function usage( e)
# -n skip n fields
# +n skip n characters, skip fields first
-BEGIN \
-@{
+BEGIN @{
count = 1
outputfile = "/dev/stdout"
opts = "udc0:1:2:3:4:5:6:7:8:9:"
@@ -24011,7 +24334,7 @@ BEGIN \
else if (c == "c")
do_count++
else if (index("0123456789", c) != 0) @{
- # getopt requires args to options
+ # getopt() requires args to options
# this messes us up for things like -5
if (Optarg ~ /^[[:digit:]]+$/)
fcount = (c Optarg) + 0
@@ -24043,22 +24366,20 @@ BEGIN \
@end example
The following function, @code{are_equal()}, compares the current line,
-@code{$0}, to the
-previous line, @code{last}. It handles skipping fields and characters.
-If no field count and no character count are specified, @code{are_equal()}
-simply returns one or zero depending upon the result of a simple string
-comparison of @code{last} and @code{$0}. Otherwise, things get more
-complicated.
-If fields have to be skipped, each line is broken into an array using
-@code{split()}
-(@pxref{String Functions});
-the desired fields are then joined back into a line using @code{join()}.
-The joined lines are stored in @code{clast} and @code{cline}.
-If no fields are skipped, @code{clast} and @code{cline} are set to
-@code{last} and @code{$0}, respectively.
-Finally, if characters are skipped, @code{substr()} is used to strip off the
-leading @code{charcount} characters in @code{clast} and @code{cline}. The
-two strings are then compared and @code{are_equal()} returns the result:
+@code{$0}, to the previous line, @code{last}. It handles skipping fields
+and characters. If no field count and no character count are specified,
+@code{are_equal()} returns one or zero depending upon the result of a
+simple string comparison of @code{last} and @code{$0}.
+
+Otherwise, things get more complicated. If fields have to be skipped,
+each line is broken into an array using @code{split()} (@pxref{String
+Functions}); the desired fields are then joined back into a line
+using @code{join()}. The joined lines are stored in @code{clast} and
+@code{cline}. If no fields are skipped, @code{clast} and @code{cline}
+are set to @code{last} and @code{$0}, respectively. Finally, if
+characters are skipped, @code{substr()} is used to strip off the leading
+@code{charcount} characters in @code{clast} and @code{cline}. The two
+strings are then compared and @code{are_equal()} returns the result:
@example
@c file eg/prog/uniq.awk
@@ -24148,6 +24469,29 @@ END @{
@}
@c endfile
@end example
+
+@c FIXME: Include this?
+@ignore
+This program does not follow our recommended convention of naming
+global variables with a leading capital letter. Doing that would
+make the program a little easier to follow.
+@end ignore
+
+@ifset FOR_PRINT
+The logic for choosing which lines to print represents a @dfn{state
+machine}, which is ``a device that can be in one of a set number of stable
+conditions depending on its previous condition and on the present values
+of its inputs.''@footnote{This is the definition returned from entering
+@code{define: state machine} into Google.}
+Brian Kernighan suggests that
+``an alternative approach to state machines is to just read
+the input into an array, then use indexing. It's almost always
+easier code, and for most inputs where you would use this, just
+as fast.'' Consider how to rewrite the logic to follow this
+suggestion.
+@end ifset
+
+
@c ENDOFRANGE prunt
@c ENDOFRANGE tpul
@c ENDOFRANGE uniq
@@ -24178,7 +24522,7 @@ one or more input files. Its usage is as follows:
If no files are specified on the command line, @command{wc} reads its standard
input. If there are multiple files, it also prints total counts for all
-the files. The options and their meanings are shown in the following list:
+the files. The options and their meanings are as follows:
@table @code
@item -l
@@ -24518,8 +24862,7 @@ Here is the program:
@c file eg/prog/alarm.awk
# usage: alarm time [ "message" [ count [ delay ] ] ]
-BEGIN \
-@{
+BEGIN @{
# Initial argument sanity checking
usage1 = "usage: alarm time ['message' [count [delay]]]"
usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
@@ -24660,8 +25003,8 @@ character of the ``to'' list is used for the remaining characters in the
Once upon a time,
@c early or mid-1989!
-a user proposed that a transliteration function should
-be added to @command{gawk}.
+a user proposed adding a transliteration function
+to @command{gawk}.
@c Wishing to avoid gratuitous new features,
@c at least theoretically
The following program was written to
@@ -24669,15 +25012,12 @@ prove that character transliteration could be done with a user-level
function. This program is not as complete as the system @command{tr} utility
but it does most of the job.
-The @command{translate} program demonstrates one of the few weaknesses
-of standard @command{awk}: dealing with individual characters is very
-painful, requiring repeated use of the @code{substr()}, @code{index()},
-and @code{gsub()} built-in functions
-(@pxref{String Functions}).@footnote{This
-program was written before @command{gawk} acquired the ability to
-split each character in a string into separate array elements.}
-There are two functions. The first, @code{stranslate()}, takes three
-arguments:
+The @command{translate} program was written long before @command{gawk}
+acquired the ability to split each character in a string into separate
+array elements. Thus, it makes repeated use of the @code{substr()},
+@code{index()}, and @code{gsub()} built-in functions (@pxref{String
+Functions}). There are two functions. The first, @code{stranslate()},
+takes three arguments:
@table @code
@item from
@@ -24696,7 +25036,7 @@ loop goes through @code{from}, one character at a time. For each character
in @code{from}, if the character appears in @code{target},
it is replaced with the corresponding @code{to} character.
-The @code{translate()} function simply calls @code{stranslate()} using @code{$0}
+The @code{translate()} function calls @code{stranslate()} using @code{$0}
as the target. The main program sets two global variables, @code{FROM} and
@code{TO}, from the command line, and then changes @code{ARGV} so that
@command{awk} reads from the standard input.
@@ -24782,6 +25122,12 @@ An obvious improvement to this program would be to set up the
@code{t_ar} array only once, in a @code{BEGIN} rule. However, this
assumes that the ``from'' and ``to'' lists
will never change throughout the lifetime of the program.
+
+Another obvious improvement is to enable the use of ranges,
+such as @samp{a-z}, as allowed by the @command{tr} utility.
+Look at the code for @file{cut.awk} (@pxref{Cut Program})
+for inspiration.
+
@c ENDOFRANGE chtra
@c ENDOFRANGE tr
@@ -24825,7 +25171,7 @@ of lines on the page
Most of the work is done in the @code{printpage()} function.
The label lines are stored sequentially in the @code{line} array. But they
have to print horizontally; @code{line[1]} next to @code{line[6]},
-@code{line[2]} next to @code{line[7]}, and so on. Two loops are used to
+@code{line[2]} next to @code{line[7]}, and so on. Two loops
accomplish this. The outer loop, controlled by @code{i}, steps through
every 10 lines of data; this is each row of labels. The inner loop,
controlled by @code{j}, goes through the lines within the row.
@@ -24914,8 +25260,7 @@ function printpage( i, j)
Count++
@}
-END \
-@{
+END @{
printpage()
@}
@c endfile
@@ -24940,7 +25285,7 @@ in a useful format.
At first glance, a program like this would seem to do the job:
@example
-# Print list of word frequencies
+# wordfreq-first-try.awk --- print list of word frequencies
@{
for (i = 1; i <= NF; i++)
@@ -25157,16 +25502,16 @@ Texinfo input file into separate files.
This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo},
the GNU project's document formatting language.
A single Texinfo source file can be used to produce both
-printed and online documentation.
+printed documentation, with @TeX{}, and online documentation.
@ifnotinfo
-Texinfo is fully documented in the book
+(Texinfo is fully documented in the book
@cite{Texinfo---The GNU Documentation Format},
available from the Free Software Foundation,
-and also available @uref{http://www.gnu.org/software/texinfo/manual/texinfo/, online}.
+and also available @uref{http://www.gnu.org/software/texinfo/manual/texinfo/, online}.)
@end ifnotinfo
@ifinfo
-The Texinfo language is described fully, starting with
-@inforef{Top, , Texinfo, texinfo,Texinfo---The GNU Documentation Format}.
+(The Texinfo language is described fully, starting with
+@inforef{Top, , Texinfo, texinfo,Texinfo---The GNU Documentation Format}.)
@end ifinfo
For our purposes, it is enough to know three things about Texinfo input
@@ -25244,8 +25589,7 @@ exits with a zero exit status, signifying OK:
@cindex @code{extract.awk} program
@example
@c file eg/prog/extract.awk
-# extract.awk --- extract files and run programs
-# from texinfo files
+# extract.awk --- extract files and run programs from texinfo files
@c endfile
@ignore
@c file eg/prog/extract.awk
@@ -25259,8 +25603,7 @@ exits with a zero exit status, signifying OK:
BEGIN @{ IGNORECASE = 1 @}
-/^@@c(omment)?[ \t]+system/ \
-@{
+/^@@c(omment)?[ \t]+system/ @{
if (NF < 3) @{
e = ("extract: " FILENAME ":" FNR)
e = (e ": badly formed `system' line")
@@ -25317,8 +25660,7 @@ line. That line is then printed to the output file:
@example
@c file eg/prog/extract.awk
-/^@@c(omment)?[ \t]+file/ \
-@{
+/^@@c(omment)?[ \t]+file/ @{
if (NF != 3) @{
e = ("extract: " FILENAME ":" FNR ": badly formed `file' line")
print e > "/dev/stderr"
@@ -25378,7 +25720,7 @@ The @code{END} rule handles the final cleanup, closing the open file:
function unexpected_eof()
@{
printf("extract: %s:%d: unexpected EOF or error\n",
- FILENAME, FNR) > "/dev/stderr"
+ FILENAME, FNR) > "/dev/stderr"
exit 1
@}
@end group
@@ -25406,7 +25748,7 @@ While @command{sed} is a complicated program in its own right, its most common
use is to perform global substitutions in the middle of a pipeline:
@example
-command1 < orig.data | sed 's/old/new/g' | command2 > result
+@var{command1} < orig.data | sed 's/old/new/g' | @var{command2} > result
@end example
Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp
@@ -25638,6 +25980,7 @@ should be the @command{awk} program. If there are no command-line
arguments left, @command{igawk} prints an error message and exits.
Otherwise, the first argument is appended to @code{program}.
In any case, after the arguments have been processed,
+the shell variable
@code{program} contains the complete text of the original @command{awk}
program.
@@ -25760,8 +26103,8 @@ the path, and an attempt is made to open the generated @value{FN}.
The only way to test if a file can be read in @command{awk} is to go
ahead and try to read it with @code{getline}; this is what @code{pathto()}
does.@footnote{On some very old versions of @command{awk}, the test
-@samp{getline junk < t} can loop forever if the file exists but is empty.
-Caveat emptor.} If the file can be read, it is closed and the @value{FN}
+@samp{getline junk < t} can loop forever if the file exists but is empty.}
+If the file can be read, it is closed and the @value{FN}
is returned:
@ignore
@@ -25961,12 +26304,10 @@ in C or C++, and it is frequently easier to do certain kinds of string
and argument manipulation using the shell than it is in @command{awk}.
Finally, @command{igawk} shows that it is not always necessary to add new
-features to a program; they can often be layered on top.
-@ignore
-With @command{igawk},
-there is no real reason to build @code{@@include} processing into
-@command{gawk} itself.
-@end ignore
+features to a program; they can often be layered on top.@footnote{@command{gawk}
+does @code{@@include} processing itself in order to support the use
+of @command{awk} programs as Web CGI scripts.}
+
@c ENDOFRANGE libfex
@c ENDOFRANGE flibex
@c ENDOFRANGE awkpex
@@ -25984,12 +26325,11 @@ One word is an anagram of another if both words contain
the same letters
(for example, ``babbling'' and ``blabbing'').
-An elegant algorithm is presented in Column 2, Problem C of
-Jon Bentley's @cite{Programming Pearls}, second edition.
-The idea is to give words that are anagrams a common signature,
-sort all the words together by their signature, and then print them.
-Dr.@: Bentley observes that taking the letters in each word and
-sorting them produces that common signature.
+Column 2, Problem C of Jon Bentley's @cite{Programming Pearls}, second
+edition, presents an elegant algorithm. The idea is to give words that
+are anagrams a common signature, sort all the words together by their
+signature, and then print them. Dr.@: Bentley observes that taking the
+letters in each word and sorting them produces that common signature.
The following program uses arrays of arrays to bring together
words with the same signature and array sorting to print the words
@@ -26223,7 +26563,7 @@ BEGIN {
@itemize @value{BULLET}
@item
-The functions provided in this @value{CHAPTER} and the previous one
+The programs provided in this @value{CHAPTER}
continue on the theme that reading programs is an excellent way to learn
Good Programming.
@@ -26254,13 +26594,14 @@ mailing labels, and finding anagrams.
@end itemize
+@c EXCLUDE START
@node Programs Exercises
@section Exercises
@enumerate
@item
Rewrite @file{cut.awk} (@pxref{Cut Program})
-using @code{split()} with @code{""} as the seperator.
+using @code{split()} with @code{""} as the separator.
@item
In @ref{Egrep Program}, we mentioned that @samp{egrep -i} could be
@@ -26277,17 +26618,27 @@ information is printed. Modify the @command{awk} version
same way.
@item
-The @code{split.awk} program (@pxref{Split Program}) uses
-the @code{chr()} and @code{ord()} functions to move through the
-letters of the alphabet.
-Modify the program to instead use only the @command{awk}
-built-in functions, such as @code{index()} and @code{substr()}.
-
-@item
The @code{split.awk} program (@pxref{Split Program}) assumes
that letters are contiguous in the character set,
which isn't true for EBCDIC systems.
Fix this problem.
+(Hint: Consider a different way to work through the alphabet,
+without relying on @code{ord()} and @code{chr()}.)
+
+@item
+In @file{uniq.awk} (@pxref{Uniq Program}, the
+logic for choosing which lines to print represents a @dfn{state
+machine}, which is ``a device that can be in one of a set number of stable
+conditions depending on its previous condition and on the present values
+of its inputs.''@footnote{This is the definition returned from entering
+@code{define: state machine} into Google.}
+Brian Kernighan suggests that
+``an alternative approach to state machines is to just read
+the input into an array, then use indexing. It's almost always
+easier code, and for most inputs where you would use this, just
+as fast.'' Rewrite the logic to follow this
+suggestion.
+
@item
Why can't the @file{wc.awk} program (@pxref{Wc Program}) just
@@ -26383,6 +26734,7 @@ Modify @file{anagram.awk} (@pxref{Anagram Program}), to avoid
the use of the external @command{sort} utility.
@end enumerate
+@c EXCLUDE END
@ifnotinfo
@part @value{PART3}Moving Beyond Standard @command{awk} With @command{gawk}
@@ -26488,13 +26840,11 @@ discusses the ability to dynamically add new built-in functions to
@cindex constants, nondecimal
If you run @command{gawk} with the @option{--non-decimal-data} option,
-you can have nondecimal constants in your input data:
+you can have nondecimal values in your input data:
-@c line break here for small book format
@example
$ @kbd{echo 0123 123 0x123 |}
-> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",}
-> @kbd{$1, $2, $3 @}'}
+> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n", $1, $2, $3 @}'}
@print{} 83, 123, 291
@end example
@@ -26535,6 +26885,8 @@ Instead, use the @code{strtonum()} function to convert your data
(@pxref{String Functions}).
This makes your programs easier to write and easier to read, and
leads to less surprising results.
+
+This option may disappear in a future version of @command{gawk}.
@end quotation
@node Array Sorting
@@ -26564,12 +26916,14 @@ Often, though, it is desirable to be able to loop over the elements
in a particular order that you, the programmer, choose. @command{gawk}
lets you do this.
-@ref{Controlling Scanning}, describes how you can assign special,
+@DBREF{Controlling Scanning} describes how you can assign special,
pre-defined values to @code{PROCINFO["sorted_in"]} in order to
control the order in which @command{gawk} traverses an array
during a @code{for} loop.
-In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name.
+In addition, the value of @code{PROCINFO["sorted_in"]} can be a
+function name.@footnote{This is why the predefined sorting orders
+start with an @samp{@@} character, which cannot be part of an identifier.}
This lets you traverse an array based on any custom criterion.
The array elements are ordered according to the return value of this
function. The comparison function should be defined with at least
@@ -26701,7 +27055,7 @@ according to login name. The following program sorts records
by a specific field position and can be used for this purpose:
@example
-# sort.awk --- simple program to sort by field position
+# passwd-sort.awk --- simple program to sort by field position
# field position is specified by the global variable POS
function cmp_field(i1, v1, i2, v2)
@@ -26760,7 +27114,7 @@ As mentioned above, the order of the indices is arbitrary if two
elements compare equal. This is usually not a problem, but letting
the tied elements come out in arbitrary order can be an issue, especially
when comparing item values. The partial ordering of the equal elements
-may change during the next loop traversal, if other elements are added or
+may change the next time the array is traversed, if other elements are added or
removed from the array. One way to resolve ties when comparing elements
with otherwise equal values is to include the indices in the comparison
rules. Note that doing this may make the loop traversal less efficient,
@@ -26929,7 +27283,6 @@ come into play; comparisons are based on character values only.@footnote{This
is true because locale-based comparison occurs only when in POSIX
compatibility mode, and since @code{asort()} and @code{asorti()} are
@command{gawk} extensions, they are not available in that case.}
-Caveat Emptor.
@node Two-way I/O
@section Two-Way Communications with Another Process
@@ -26995,7 +27348,7 @@ for example, @file{/tmp} will not do, as another user might happen
to be using a temporary file with the same name.@footnote{Michael
Brennan suggests the use of @command{rand()} to generate unique
@value{FN}s. This is a valid point; nevertheless, temporary files
-remain more difficult than two-way pipes.} @c 8/2014
+remain more difficult to use than two-way pipes.} @c 8/2014
@cindex coprocesses
@cindex input/output, two-way
@@ -27132,13 +27485,27 @@ using regular pipes.
@cindex @code{/inet6/@dots{}} special files (@command{gawk})
@cindex files, @code{/inet6/@dots{}} (@command{gawk})
@cindex @code{EMISTERED}
+@ifnotdocbook
@quotation
-@code{EMISTERED}:@*
+@code{EMRED}:@*
@ @ @ @ @i{A host is a host from coast to coast,@*
-@ @ @ @ and no-one can talk to host that's close,@*
+@ @ @ @ and nobody talks to a host that's close,@*
@ @ @ @ unless the host that isn't close@*
-@ @ @ @ is busy hung or dead.}
+@ @ @ @ is busy, hung, or dead.}
+@author Mike O'Brien (aka Mr.@: Protocol)
@end quotation
+@end ifnotdocbook
+
+@docbook
+<blockquote>
+<attribution>Mike O'Brien (aka Mr.&nbsp;Protocol)</attribution>
+<literallayout class="normal"><literal>EMISTERED</literal>:
+&nbsp;&nbsp;&nbsp;&nbsp;<emphasis>A host is a host from coast to coast,</emphasis>
+&nbsp;&nbsp;&nbsp;&nbsp;<emphasis>and no-one can talk to host that's close,</emphasis>
+&nbsp;&nbsp;&nbsp;&nbsp;<emphasis>unless the host that isn't close</emphasis>
+&nbsp;&nbsp;&nbsp;&nbsp;<emphasis>is busy, hung, or dead.</emphasis></literallayout>
+</blockquote>
+@end docbook
In addition to being able to open a two-way pipeline to a coprocess
on the same system
@@ -27167,7 +27534,7 @@ the system default, most likely IPv4.
@item protocol
The protocol to use over IP. This must be either @samp{tcp}, or
@samp{udp}, for a TCP or UDP IP connection,
-respectively. The use of TCP is recommended for most applications.
+respectively. TCP should be used for most applications.
@item local-port
@cindex @code{getaddrinfo()} function (C library)
@@ -27200,10 +27567,10 @@ Consider the following very simple example:
@example
BEGIN @{
- Service = "/inet/tcp/0/localhost/daytime"
- Service |& getline
- print $0
- close(Service)
+ Service = "/inet/tcp/0/localhost/daytime"
+ Service |& getline
+ print $0
+ close(Service)
@}
@end example
@@ -27306,9 +27673,9 @@ in the morning to work.)
@cindex @code{BEGIN} pattern, and profiling
@cindex @code{END} pattern, and profiling
@example
- # gawk profile, created Thu Feb 27 05:16:21 2014
+ # gawk profile, created Mon Sep 29 05:16:21 2014
- # BEGIN block(s)
+ # BEGIN rule(s)
BEGIN @{
1 print "First BEGIN rule"
@@ -27335,7 +27702,7 @@ in the morning to work.)
@}
@}
- # END block(s)
+ # END rule(s)
END @{
1 print "First END rule"
@@ -27463,7 +27830,7 @@ come out as:
@end example
@noindent
-which is correct, but possibly surprising.
+which is correct, but possibly unexpected.
@cindex profiling @command{awk} programs, dynamically
@cindex @command{gawk} program, dynamic profiling
@@ -27495,7 +27862,7 @@ $ @kbd{kill -USR1 13992}
@noindent
As usual, the profiled version of the program is written to
-@file{awkprof.out}, or to a different file if one specified with
+@file{awkprof.out}, or to a different file if one was specified with
the @option{--profile} option.
Along with the regular profile, as shown earlier, the profile file
@@ -27555,6 +27922,7 @@ The @option{--non-decimal-data} option causes @command{gawk} to treat
octal- and hexadecimal-looking input data as octal and hexadecimal.
This option should be used with caution or not at all; use of @code{strtonum()}
is preferable.
+Note that this option may disappear in a future version of @command{gawk}.
@item
You can take over complete control of sorting in @samp{for (@var{indx} in @var{array})}
@@ -27568,15 +27936,15 @@ those functions sort arrays. Or you may provide one of the predefined control
strings that work for @code{PROCINFO["sorted_in"]}.
@item
-You can use the @samp{|&} operator to create a two-way pipe to a co-process.
-You read from the co-process with @code{getline} and write to it with @code{print}
-or @code{printf}. Use @code{close()} to close off the co-process completely, or
+You can use the @samp{|&} operator to create a two-way pipe to a coprocess.
+You read from the coprocess with @code{getline} and write to it with @code{print}
+or @code{printf}. Use @code{close()} to close off the coprocess completely, or
optionally, close off one side of the two-way communications.
@item
-By using special ``@value{FN}s'' with the @samp{|&} operator, you can open a
+By using special @value{FN}s with the @samp{|&} operator, you can open a
TCP/IP (or UDP/IP) connection to remote hosts in the Internet. @command{gawk}
-supports both IPv4 an IPv6.
+supports both IPv4 and IPv6.
@item
You can generate statement count profiles of your program. This can help you
@@ -27814,7 +28182,7 @@ In June 2001 Bruno Haible wrote:
This information is accessed via the
POSIX character classes in regular expressions,
such as @code{/[[:alnum:]]/}
-(@pxref{Regexp Operators}).
+(@pxref{Bracket Expressions}).
@cindex monetary information, localization
@cindex currency symbols, localization
@@ -27897,7 +28265,7 @@ default arguments.
Return the plural form used for @var{number} of the
translation of @var{string1} and @var{string2} in text domain
@var{domain} for locale category @var{category}. @var{string1} is the
-English singular variant of a message, and @var{string2} the English plural
+English singular variant of a message, and @var{string2} is the English plural
variant of the same message.
The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
The default value for @var{category} is @code{"LC_MESSAGES"}.
@@ -27985,9 +28353,11 @@ This example would be better done with @code{dcngettext()}:
@example
if (groggy)
- message = dcngettext("%d customer disturbing me\n", "%d customers disturbing me\n", "adminprog")
+ message = dcngettext("%d customer disturbing me\n",
+ "%d customers disturbing me\n", "adminprog")
else
- message = dcngettext("enjoying %d customer\n", "enjoying %d customers\n", "adminprog")
+ message = dcngettext("enjoying %d customer\n",
+ "enjoying %d customers\n", "adminprog")
printf(message, ncustomers)
@end example
@@ -28059,7 +28429,7 @@ First, use the @option{--gen-pot} command-line option to create
the initial @file{.pot} file:
@example
-$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
+gawk --gen-pot -f guide.awk > guide.pot
@end example
@cindex @code{xgettext} utility
@@ -28123,11 +28493,11 @@ example, @samp{string} is the first argument and @samp{length(string)} is the se
@example
$ @kbd{gawk 'BEGIN @{}
-> @kbd{string = "Dont Panic"}
+> @kbd{string = "Don\47t Panic"}
> @kbd{printf "%2$d characters live in \"%1$s\"\n",}
> @kbd{string, length(string)}
> @kbd{@}'}
-@print{} 10 characters live in "Dont Panic"
+@print{} 11 characters live in "Don't Panic"
@end example
If present, positional specifiers come first in the format specification,
@@ -28339,7 +28709,8 @@ msgstr "Like, the scoop is"
@cindex GNU/Linux
The next step is to make the directory to hold the binary message object
file and then to create the @file{guide.mo} file.
-We pretend that our file is to be used in the @code{en_US.UTF-8} locale.
+We pretend that our file is to be used in the @code{en_US.UTF-8} locale,
+since we have to use a locale name known to the C @command{gettext} routines.
The directory layout shown here is standard for GNU @command{gettext} on
GNU/Linux systems. Other versions of @command{gettext} may use a different
layout:
@@ -28360,8 +28731,8 @@ $ @kbd{mkdir en_US.UTF-8 en_US.UTF-8/LC_MESSAGES}
The @command{msgfmt} utility does the conversion from human-readable
@file{.po} file to machine-readable @file{.mo} file.
By default, @command{msgfmt} creates a file named @file{messages}.
-This file must be renamed and placed in the proper directory so that
-@command{gawk} can find it:
+This file must be renamed and placed in the proper directory (using
+the @option{-o} option) so that @command{gawk} can find it:
@example
$ @kbd{msgfmt guide-mellow.po -o en_US.UTF-8/LC_MESSAGES/guide.mo}
@@ -28404,8 +28775,8 @@ complete detail in
@cite{GNU gettext tools}}.)
@end ifnotinfo
As of this writing, the latest version of GNU @command{gettext} is
-@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.1.tar.gz,
-@value{PVERSION} 0.19.1}.
+@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.2.tar.gz,
+@value{PVERSION} 0.19.2}.
If a translation of @command{gawk}'s messages exists,
then @command{gawk} produces usage messages, warnings,
@@ -28493,7 +28864,7 @@ the discussion of debugging in @command{gawk}.
@subsection Debugging in General
(If you have used debuggers in other languages, you may want to skip
-ahead to the next section on the specific features of the @command{awk}
+ahead to the next section on the specific features of the @command{gawk}
debugger.)
Of course, a debugging program cannot remove bugs for you, since it has
@@ -28533,7 +28904,7 @@ is going wrong (or, for that matter, to better comprehend a perfectly
functional program that you or someone else wrote).
@node Debugging Terms
-@subsection Additional Debugging Concepts
+@subsection Debugging Concepts
Before diving in to the details, we need to introduce several
important concepts that apply to just about all debuggers.
@@ -28622,8 +28993,8 @@ as our example.
@cindex starting the debugger
@cindex debugger, how to start
-Starting the debugger is almost exactly like running @command{gawk},
-except you have to pass an additional option @option{--debug} or the
+Starting the debugger is almost exactly like running @command{gawk} normally,
+except you have to pass an additional option @option{--debug}, or the
corresponding short option @option{-D}. The file(s) containing the
program and any supporting code are given on the command line as arguments
to one or more @option{-f} options. (@command{gawk} is not designed
@@ -28631,7 +29002,7 @@ to debug command-line programs, only programs contained in files.)
In our case, we invoke the debugger like this:
@example
-$ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile}
+$ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk -1 inputfile}
@end example
@noindent
@@ -28641,6 +29012,7 @@ this syntax is slightly different from what they are used to.
With the @command{gawk} debugger, you give the arguments for running the program
in the command line to the debugger rather than as part of the @code{run}
command at the debugger prompt.)
+The @option{-1} is an option to @file{uniq.awk}.
Instead of immediately running the program on @file{inputfile}, as
@command{gawk} would ordinarily do, the debugger merely loads all
@@ -28693,7 +29065,7 @@ the breakpoint, use the @code{b} (breakpoint) command:
@example
gawk> @kbd{b are_equal}
-@print{} Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 64
+@print{} Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 63
@end example
The debugger tells us the file and line number where the breakpoint is.
@@ -28705,8 +29077,8 @@ gawk> @kbd{r}
@print{} Starting program:
@print{} Stopping in Rule ...
@print{} Breakpoint 1, are_equal(n, m, clast, cline, alast, aline)
- at `awklib/eg/prog/uniq.awk':64
-@print{} 64 if (fcount == 0 && charcount == 0)
+ at `awklib/eg/prog/uniq.awk':63
+@print{} 63 if (fcount == 0 && charcount == 0)
gawk>
@end example
@@ -28718,12 +29090,12 @@ listing of the current stack frames:
@example
gawk> @kbd{bt}
@print{} #0 are_equal(n, m, clast, cline, alast, aline)
- at `awklib/eg/prog/uniq.awk':69
-@print{} #1 in main() at `awklib/eg/prog/uniq.awk':89
+ at `awklib/eg/prog/uniq.awk':68
+@print{} #1 in main() at `awklib/eg/prog/uniq.awk':88
@end example
This tells us that @code{are_equal()} was called by the main program at
-line 89 of @file{uniq.awk}. (This is not a big surprise, since this
+line 88 of @file{uniq.awk}. (This is not a big surprise, since this
is the only call to @code{are_equal()} in the program, but in more complex
programs, knowing who called a function and with what parameters can be
the key to finding the source of the problem.)
@@ -28747,7 +29119,7 @@ A more useful variable to display might be the current record:
@example
gawk> @kbd{p $0}
-@print{} $0 = string ("gawk is a wonderful program!")
+@print{} $0 = "gawk is a wonderful program!"
@end example
@noindent
@@ -28756,7 +29128,7 @@ our test input above. Let's look at @code{NR}:
@example
gawk> @kbd{p NR}
-@print{} NR = number (2)
+@print{} NR = 2
@end example
@noindent
@@ -28775,7 +29147,7 @@ OK, let's just check that that rule worked correctly:
@example
gawk> @kbd{p last}
-@print{} last = string ("awk is a wonderful program!")
+@print{} last = "awk is a wonderful program!"
@end example
Everything we have done so far has verified that the program has worked as
@@ -28786,13 +29158,13 @@ be inside this function. To investigate further, we must begin
@example
gawk> @kbd{n}
-@print{} 67 if (fcount > 0) @{
+@print{} 66 if (fcount > 0) @{
@end example
-This tells us that @command{gawk} is now ready to execute line 67, which
+This tells us that @command{gawk} is now ready to execute line 66, which
decides whether to give the lines the special ``field skipping'' treatment
-indicated by the @option{-f} command-line option. (Notice that we skipped
-from where we were before at line 64 to here, since the condition in line 64
+indicated by the @option{-1} command-line option. (Notice that we skipped
+from where we were before at line 63 to here, since the condition in line 63
@samp{if (fcount == 0 && charcount == 0)} was false.)
Continuing to step, we now get to the splitting of the current and
@@ -28800,9 +29172,9 @@ last records:
@example
gawk> @kbd{n}
-@print{} 68 n = split(last, alast)
+@print{} 67 n = split(last, alast)
gawk> @kbd{n}
-@print{} 69 m = split($0, aline)
+@print{} 68 m = split($0, aline)
@end example
At this point, we should be curious to see what our records were split
@@ -28810,10 +29182,10 @@ into, so we try to look:
@example
gawk> @kbd{p n m alast aline}
-@print{} n = number (5)
-@print{} m = number (5)
+@print{} n = 5
+@print{} m = untyped variable
@print{} alast = array, 5 elements
-@print{} aline = array, 5 elements
+@print{} aline = untyped variable
@end example
@noindent
@@ -28821,7 +29193,9 @@ gawk> @kbd{p n m alast aline}
@command{awk}'s @code{print} statement.)
This is kind of disappointing, though. All we found out is that there
-are five elements in each of our arrays. Useful enough (we now know that
+are five elements in @code{alast}; @code{m} and @code{aline} don't have
+values since we are at line 68 but haven't executed it yet.
+This information is useful enough (we now know that
none of the words were accidentally left out), but what if we want to see
inside the array?
@@ -28837,7 +29211,7 @@ Oops!
@example
gawk> @kbd{p alast[1]}
-@print{} alast["1"] = string ("awk")
+@print{} alast["1"] = "awk"
@end example
This would be kind of slow for a 100-member array, though, so
@@ -28846,11 +29220,11 @@ not to be mentioned):
@example
gawk> @kbd{p @@alast}
-@print{} alast["1"] = string ("awk")
-@print{} alast["2"] = string ("is")
-@print{} alast["3"] = string ("a")
-@print{} alast["4"] = string ("wonderful")
-@print{} alast["5"] = string ("program!")
+@print{} alast["1"] = "awk"
+@print{} alast["2"] = "is"
+@print{} alast["3"] = "a"
+@print{} alast["4"] = "wonderful"
+@print{} alast["5"] = "program!"
@end example
It looks like we got this far OK. Let's take another step
@@ -28858,9 +29232,9 @@ or two:
@example
gawk> @kbd{n}
-@print{} 70 clast = join(alast, fcount, n)
+@print{} 69 clast = join(alast, fcount, n)
gawk> @kbd{n}
-@print{} 71 cline = join(aline, fcount, m)
+@print{} 70 cline = join(aline, fcount, m)
@end example
Well, here we are at our error (sorry to spoil the suspense). What we
@@ -28870,8 +29244,8 @@ this would work. Let's look at what we've got:
@example
gawk> @kbd{p cline clast}
-@print{} cline = string ("gawk is a wonderful program!")
-@print{} clast = string ("awk is a wonderful program!")
+@print{} cline = "gawk is a wonderful program!"
+@print{} clast = "awk is a wonderful program!"
@end example
Hey, those look pretty familiar! They're just our original, unaltered,
@@ -29013,7 +29387,8 @@ Delete breakpoint(s) set at entry to function @var{function}.
@cindex breakpoint condition
@item @code{condition} @var{n} @code{"@var{expression}"}
Add a condition to existing breakpoint or watchpoint @var{n}. The
-condition is an @command{awk} expression that the debugger evaluates
+condition is an @command{awk} expression @emph{enclosed in double quotes}
+that the debugger evaluates
whenever the breakpoint or watchpoint is reached. If the condition is true, then
the debugger stops execution and prompts for a command. Otherwise,
the debugger continues executing the program. If the condition expression is
@@ -29201,7 +29576,7 @@ see the output shown under @code{dump} in @ref{Miscellaneous Debugger Commands}.
@item @code{until} [[@var{filename}@code{:}]@var{n} | @var{function}]
@itemx @code{u} [[@var{filename}@code{:}]@var{n} | @var{function}]
Without any argument, continue execution until a line past the current
-line in current stack frame is reached. With an argument,
+line in the current stack frame is reached. With an argument,
continue execution until the specified location is reached, or the current
stack frame returns.
@end table
@@ -29265,7 +29640,7 @@ gawk> @kbd{print $3}
@noindent
This prints the third field in the input record (if the specified field does not
exist, it prints @samp{Null field}). A variable can be an array element, with
-the subscripts being constant values. To print the contents of an array,
+the subscripts being constant string values. To print the contents of an array,
prefix the name of the array with the @samp{@@} symbol:
@example
@@ -29331,7 +29706,7 @@ watch list.
@end table
@node Execution Stack
-@subsection Dealing with the Stack
+@subsection Working with the Stack
Whenever you run a program which contains any function calls,
@command{gawk} maintains a stack of all of the function calls leading up
@@ -29342,16 +29717,22 @@ functions which called the one you are in. The commands for doing this are:
@table @asis
@cindex debugger commands, @code{bt} (@code{backtrace})
@cindex debugger commands, @code{backtrace}
+@cindex debugger commands, @code{where} (@code{backtrace})
@cindex @code{backtrace} debugger command
@cindex @code{bt} debugger command (alias for @code{backtrace})
+@cindex @code{where} debugger command
+@cindex @code{where} debugger command (alias for @code{backtrace})
@cindex call stack, display in debugger
@cindex traceback, display in debugger
@item @code{backtrace} [@var{count}]
@itemx @code{bt} [@var{count}]
+@itemx @code{where} [@var{count}]
Print a backtrace of all function calls (stack frames), or innermost @var{count}
frames if @var{count} > 0. Print the outermost @var{count} frames if
@var{count} < 0. The backtrace displays the name and arguments to each
function, the source @value{FN}, and the line number.
+The alias @code{where} for @code{backtrace} is provided for long-time
+GDB users who may be used to that command.
@cindex debugger commands, @code{down}
@cindex @code{down} debugger command
@@ -29401,7 +29782,7 @@ The value for @var{what} should be one of the following:
@table @code
@item args
@cindex show function arguments, in debugger
-Arguments of the selected frame.
+List arguments of the selected frame.
@item break
@cindex show breakpoints
@@ -29413,7 +29794,7 @@ List all items in the automatic display list.
@item frame
@cindex describe call stack frame, in debugger
-Description of the selected stack frame.
+Give a description of the selected stack frame.
@item functions
@cindex list function definitions, in debugger
@@ -29422,11 +29803,11 @@ line numbers.
@item locals
@cindex show local variables, in debugger
-Local variables of the selected frame.
+List local variables of the selected frame.
@item source
@cindex show name of current source file, in debugger
-The name of the current source file. Each time the program stops, the
+Print the name of the current source file. Each time the program stops, the
current source file is the file containing the current instruction.
When the debugger first starts, the current source file is the first file
included via the @option{-f} option. The
@@ -29543,6 +29924,7 @@ commands in a program. This can be very enlightening, as the following
partial dump of Davide Brini's obfuscated code
(@pxref{Signature Program}) demonstrates:
+@c FIXME: This will need updating if num-handler branch is ever merged in.
@smallexample
gawk> @kbd{dump}
@print{} # BEGIN
@@ -29616,7 +29998,7 @@ are as follows:
@c nested table
@table @asis
-@item @code{-}
+@item @code{-} (Minus)
Print lines before the lines last printed.
@item @code{+}
@@ -29704,7 +30086,7 @@ and
@end table
@node Limitations
-@section Limitations and Future Plans
+@section Limitations
We hope you find the @command{gawk} debugger useful and enjoyable to work with,
but as with any program, especially in its early releases, it still has
@@ -29718,7 +30100,9 @@ responds @samp{syntax error}. When you do figure out what your mistake was,
though, you'll feel like a real guru.
@item
-If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands},
+@c NOTE: no comma after the ref{} on purpose, due to following
+@c parenthetical remark.
+If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands}
(or if you are already familiar with @command{gawk} internals),
you will realize that much of the internal manipulation of data
in @command{gawk}, as in many interpreters, is done on a stack.
@@ -29750,8 +30134,10 @@ executing, short programs.
The @command{gawk} debugger only accepts source supplied with the @option{-f} option.
@end itemize
+@ignore
Look forward to a future release when these and other missing features may
be added, and of course feel free to try to add them yourself!
+@end ignore
@node Debugging Summary
@section Summary
@@ -29794,9 +30180,8 @@ and editing.
@cindex floating-point, numbers@comma{} arbitrary precision
This @value{CHAPTER} introduces some basic concepts relating to
-how computers do arithmetic and briefly lists the features in
-@command{gawk} for performing arbitrary precision floating point
-computations. It then proceeds to describe floating-point arithmetic,
+how computers do arithmetic and defines some important terms.
+It then proceeds to describe floating-point arithmetic,
which is what @command{awk} uses for all its computations, including a
discussion of arbitrary precision floating point arithmetic, which is
a feature available only in @command{gawk}. It continues on to present
@@ -29891,10 +30276,12 @@ Computers work with integer and floating point values of different
ranges. Integer values are usually either 32 or 64 bits in size. Single
precision floating point values occupy 32 bits, whereas double precision
floating point values occupy 64 bits. Floating point values are always
-signed. The possible ranges of values are shown in the following table.
+signed. The possible ranges of values are shown in @ref{table-numeric-ranges}.
+@float Table,table-numeric-ranges
+@caption{Value Ranges for Different Numeric Representations}
@multitable @columnfractions .34 .33 .33
-@headitem Numeric representation @tab Miniumum value @tab Maximum value
+@headitem Numeric representation @tab Minimum value @tab Maximum value
@item 32-bit signed integer @tab @minus{}2,147,483,648 @tab 2,147,483,647
@item 32-bit unsigned integer @tab 0 @tab 4,294,967,295
@item 64-bit signed integer @tab @minus{}9,223,372,036,854,775,808 @tab 9,223,372,036,854,775,807
@@ -29902,6 +30289,7 @@ signed. The possible ranges of values are shown in the following table.
@item Single precision floating point (approximate) @tab @code{1.175494e-38} @tab @code{3.402823e+38}
@item Double precision floating point (approximate) @tab @code{2.225074e-308} @tab @code{1.797693e+308}
@end multitable
+@end float
@node Math Definitions
@section Other Stuff To Know
@@ -29929,14 +30317,12 @@ A special value representing infinity. Operations involving another
number and infinity produce infinity.
@item NaN
-``Not A Number.''@footnote{Thanks
-to Michael Brennan for this description, which I have paraphrased, and
-for the examples}.
-A special value that results from attempting a
-calculation that has no answer as a real number. In such a case,
-programs can either receive a floating-point exception, or get @code{NaN}
-back as the result. The IEEE 754 standard recommends that systems return
-@code{NaN}. Some examples:
+``Not A Number.''@footnote{Thanks to Michael Brennan for this description,
+which we have paraphrased, and for the examples.} A special value that
+results from attempting a calculation that has no answer as a real number.
+In such a case, programs can either receive a floating-point exception,
+or get @code{NaN} back as the result. The IEEE 754 standard recommends
+that systems return @code{NaN}. Some examples:
@table @code
@item sqrt(-1)
@@ -30010,9 +30396,9 @@ to allow greater precisions and larger exponent ranges.
field values for the basic IEEE 754 binary formats:
@float Table,table-ieee-formats
-@caption{Basic IEEE Format Context Values}
+@caption{Basic IEEE Format Values}
@multitable @columnfractions .20 .20 .20 .20 .20
-@headitem Name @tab Total bits @tab Precision @tab emin @tab emax
+@headitem Name @tab Total bits @tab Precision @tab Minimum exponent @tab Maximum exponent
@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127
@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023
@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383
@@ -30025,18 +30411,18 @@ one extra bit of significand.
@end quotation
@node MPFR features
-@section Arbitrary Precison Arithmetic Features In @command{gawk}
+@section Arbitrary Precision Arithmetic Features In @command{gawk}
-By default, @command{gawk} uses the double precision floating point values
+By default, @command{gawk} uses the double precision floating-point values
supplied by the hardware of the system it runs on. However, if it was
-compiled to do, @command{gawk} uses the @uref{http://www.mpfr.org, GNU
-MPFR} and @uref{http://gmplib.org, GNU MP} (GMP) libraries for arbitrary
+compiled to do so, @command{gawk} uses the @uref{http://www.mpfr.org
+GNU MPFR} and @uref{http://gmplib.org, GNU MP} (GMP) libraries for arbitrary
precision arithmetic on numbers. You can see if MPFR support is available
like so:
@example
$ @kbd{gawk --version}
-@print{} GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2)
+@print{} GNU Awk 4.1.2, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2)
@print{} Copyright (C) 1989, 1991-2014 Free Software Foundation.
@dots{}
@end example
@@ -30056,17 +30442,18 @@ results. With the @option{-M} command-line option,
all floating-point arithmetic operators and numeric functions
can yield results to any desired precision level supported by MPFR.
-Two built-in variables, @code{PREC} and @code{ROUNDMODE},
+Two predefined variables, @code{PREC} and @code{ROUNDMODE},
provide control over the working precision and the rounding mode.
The precision and the rounding mode are set globally for every operation
to follow.
-@xref{Auto-set}, for more information.
+@xref{Setting precision}, and @ref{Setting the rounding mode},
+for more information.
@node FP Math Caution
@section Floating Point Arithmetic: Caveat Emptor!
@quotation
-Math class is tough!
+@i{Math class is tough!}
@author Teen Talk Barbie, July 1992
@end quotation
@@ -30174,6 +30561,10 @@ else
# not ok
@end example
+@noindent
+(We assume that you have a simple absolute value function named
+@code{abs()} defined elsewhere in your program.)
+
@node Errors accumulate
@subsubsection Errors Accumulate
@@ -30260,7 +30651,7 @@ It is easy to forget that the finite number of bits used to store the value
is often just an approximation after proper rounding.
The test for equality succeeds if and only if @emph{all} bits in the two operands
are exactly the same. Since this is not necessarily true after floating-point
-computations with a particular precision and effective rounding rule,
+computations with a particular precision and effective rounding mode,
a straight test for equality may not work. Instead, compare the
two numbers to see if they are within the desirable delta of each other.
@@ -30327,7 +30718,7 @@ $ @kbd{gawk -f pi2.awk}
the precision or accuracy of individual numbers. Performing an arithmetic
operation or calling a built-in function rounds the result to the current
working precision. The default working precision is 53 bits, which you can
-modify using the built-in variable @code{PREC}. You can also set the
+modify using the predefined variable @code{PREC}. You can also set the
value to one of the predefined case-insensitive strings
shown in @ref{table-predefined-precision-strings},
to emulate an IEEE 754 binary format.
@@ -30359,7 +30750,7 @@ Be wary of floating-point constants! When reading a floating-point
constant from program source code, @command{gawk} uses the default
precision (that of a C @code{double}), unless overridden by an assignment
to the special variable @code{PREC} on the command line, to store it
-internally as a MPFR number. Changing the precision using @code{PREC}
+internally as an MPFR number. Changing the precision using @code{PREC}
in the program text does @emph{not} change the precision of a constant.
If you need to represent a floating-point constant at a higher precision
@@ -30497,15 +30888,15 @@ the following computes
5<superscript>4<superscript>3<superscript>2</superscript></superscript></superscript>, @c
@end docbook
the result of which is beyond the
-limits of ordinary hardware double-precision floating point values:
+limits of ordinary hardware double precision floating point values:
@example
$ @kbd{gawk -M 'BEGIN @{}
> @kbd{x = 5^4^3^2}
-> @kbd{print "# of digits =", length(x)}
+> @kbd{print "number of digits =", length(x)}
> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)}
> @kbd{@}'}
-@print{} # of digits = 183231
+@print{} number of digits = 183231
@print{} 62060698786608744707 ... 92256259918212890625
@end example
@@ -30607,7 +30998,7 @@ using this user-defined function:
@end ignore
@c file eg/lib/div.awk
-function div(numerator, denominator, result, i)
+function div(numerator, denominator, result)
@{
split("", result)
@@ -30621,6 +31012,80 @@ function div(numerator, denominator, result, i)
@c endfile
@end example
+The following example program, contributed by Katie Wasserman,
+uses @code{div()} to
+compute the digits of @value{PI} to as many places as you
+choose to set:
+
+@example
+@c file eg/prog/pi.awk
+# pi.awk --- compute the digits of pi
+@c endfile
+@c endfile
+@ignore
+@c file eg/prog/pi.awk
+#
+# Katie Wasserman, katie@@wass.net
+# August 2014
+@c endfile
+@end ignore
+@c file eg/prog/pi.awk
+
+BEGIN @{
+ digits = 100000
+ two = 2 * 10 ^ digits
+ pi = two
+ for (m = digits * 4; m > 0; --m) @{
+ d = m * 2 + 1
+ x = pi * m
+ div(x, d, result)
+ pi = result["quotient"]
+ pi = pi + two
+ @}
+ print pi
+@}
+@c endfile
+@end example
+
+@ignore
+Date: Wed, 20 Aug 2014 10:19:11 -0400
+To: arnold@skeeve.com
+From: Katherine Wasserman <katie@wass.net>
+Subject: Re: computation of digits of pi?
+
+Arnold,
+
+>The program that you sent to compute the digits of pi using div(). Is
+>that some standard algorithm that every math student knows? If so,
+>what's it called?
+
+It's not that well known but it's not that obscure either
+
+It's Euler's modification to Newton's method for calculating pi.
+
+Take a look at lines (23) - (25) here: http://mathworld.wolfram.com/PiFormulas.htm
+
+The algorithm I wrote simply expands the multiply by 2 and works from the innermost expression outwards. I used this to program HP calculators because it's quite easy to modify for tiny memory devices with smallish word sizes.
+
+http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899
+
+-Katie
+@end ignore
+
+When asked about the algorithm used, Katie replied:
+
+@quotation
+It's not that well known but it's not that obscure either.
+It's Euler's modification to Newton's method for calculating pi.
+Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.htm}.
+
+The algorithm I wrote simply expands the multiply by 2 and works from
+the innermost expression outwards. I used this to program HP calculators
+because it's quite easy to modify for tiny memory devices with smallish
+word sizes. See
+@uref{http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899}.
+@end quotation
+
@node POSIX Floating Point Problems
@section Standards Versus Existing Practice
@@ -30728,7 +31193,7 @@ Thus @samp{+nan} and @samp{+NaN} are the same.
@itemize @value{BULLET}
@item
Most computer arithmetic is done using either integers or floating-point
-values. The default for @command{awk} is to use double-precision
+values. Standard @command{awk} uses double precision
floating-point values.
@item
@@ -30847,7 +31312,7 @@ Extensions are written in C or C++, using the @dfn{Application Programming
Interface} (API) defined for this purpose by the @command{gawk}
developers. The rest of this @value{CHAPTER} explains
the facilities that the API provides and how to use
-them, and presents a small sample extension. In addition, it documents
+them, and presents a small example extension. In addition, it documents
the sample extensions included in the @command{gawk} distribution,
and describes the @code{gawkextlib} project.
@ifclear FOR_PRINT
@@ -30863,10 +31328,14 @@ goals and design.
@node Plugin License
@section Extension Licensing
-Every dynamic extension should define the global symbol
-@code{plugin_is_GPL_compatible} to assert that it has been licensed under
-a GPL-compatible license. If this symbol does not exist, @command{gawk}
-emits a fatal error and exits when it tries to load your extension.
+Every dynamic extension must be distributed under a license that is
+compatible with the GNU GPL (@pxref{Copying}).
+
+In order for the extension to tell @command{gawk} that it is
+properly licensed, the extension must define the global symbol
+@code{plugin_is_GPL_compatible}. If this symbol does not exist,
+@command{gawk} emits a fatal error and exits when it tries to load
+your extension.
The declared type of the symbol should be @code{int}. It does not need
to be in any allocated section, though. The code merely asserts that
@@ -30881,7 +31350,7 @@ int plugin_is_GPL_compatible;
Communication between
@command{gawk} and an extension is two-way. First, when an extension
-is loaded, it is passed a pointer to a @code{struct} whose fields are
+is loaded, @command{gawk} passes it a pointer to a @code{struct} whose fields are
function pointers.
@ifnotdocbook
This is shown in @ref{figure-load-extension}.
@@ -30917,29 +31386,29 @@ This is shown in @inlineraw{docbook, <xref linkend="figure-load-extension"/>}.
The extension can call functions inside @command{gawk} through these
function pointers, at runtime, without needing (link-time) access
to @command{gawk}'s symbols. One of these function pointers is to a
-function for ``registering'' new built-in functions.
+function for ``registering'' new functions.
@ifnotdocbook
-This is shown in @ref{figure-load-new-function}.
+This is shown in @ref{figure-register-new-function}.
@end ifnotdocbook
@ifdocbook
-This is shown in @inlineraw{docbook, <xref linkend="figure-load-new-function"/>}.
+This is shown in @inlineraw{docbook, <xref linkend="figure-register-new-function"/>}.
@end ifdocbook
@ifnotdocbook
-@float Figure,figure-load-new-function
-@caption{Loading The New Function}
+@float Figure,figure-register-new-function
+@caption{Registering A New Function}
@ifinfo
-@center @image{api-figure2, , , Loading The New Function, txt}
+@center @image{api-figure2, , , Registering A New Function, txt}
@end ifinfo
@ifnotinfo
-@center @image{api-figure2, , , Loading The New Function}
+@center @image{api-figure2, , , Registering A New Function}
@end ifnotinfo
@end float
@end ifnotdocbook
@docbook
-<figure id="figure-load-new-function" float="0">
-<title>Loading The New Function</title>
+<figure id="figure-register-new-function" float="0">
+<title>Registering A New Function</title>
<mediaobject>
<imageobject role="web"><imagedata fileref="api-figure2.png" format="PNG"/></imageobject>
</mediaobject>
@@ -30989,8 +31458,8 @@ and understandable.
Although all of this sounds somewhat complicated, the result is that
extension code is quite straightforward to write and to read. You can
-see this in the sample extensions @file{filefuncs.c} (@pxref{Extension
-Example}) and also the @file{testext.c} code for testing the APIs.
+see this in the sample extension @file{filefuncs.c} (@pxref{Extension
+Example}) and also in the @file{testext.c} code for testing the APIs.
Some other bits and pieces:
@@ -31024,13 +31493,13 @@ This (rather large) @value{SECTION} describes the API in detail.
@menu
* Extension API Functions Introduction:: Introduction to the API functions.
* General Data Types:: The data types.
-* Requesting Values:: How to get a value.
* Memory Allocation Functions:: Functions for allocating memory.
* Constructor Functions:: Functions for creating values.
* Registration Functions:: Functions to register things with
@command{gawk}.
* Printing Messages:: Functions for printing messages.
* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}.
+* Requesting Values:: How to get a value.
* Accessing Parameters:: Functions for accessing parameters.
* Symbol Table Access:: Functions for accessing global
variables.
@@ -31049,6 +31518,9 @@ API function pointers are provided for the following kinds of operations:
@itemize @value{BULLET}
@item
+Allocating, reallocating, and releasing memory.
+
+@item
Registration functions. You may register:
@itemize @value{MINUS}
@item
@@ -31081,9 +31553,6 @@ Symbol table access: retrieving a global variable, creating one,
or changing one.
@item
-Allocating, reallocating, and releasing memory.
-
-@item
Creating and releasing cached values; this provides an
efficient way to use values for multiple variables and
can be a big performance win.
@@ -31151,15 +31620,15 @@ does not support this keyword, you should either place
All pointers filled in by @command{gawk} point to memory
managed by @command{gawk} and should be treated by the extension as
read-only. Memory for @emph{all} strings passed into @command{gawk}
-from the extension @emph{must} come from calling the API-provided function
-pointers @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()},
+from the extension @emph{must} come from calling one of
+@code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()},
and is managed by @command{gawk} from then on.
@item
The API defines several simple @code{struct}s that map values as seen
from @command{awk}. A value can be a @code{double}, a string, or an
array (as in multidimensional arrays, or when creating a new array).
-String values maintain both pointer and length since embedded @sc{nul}
+String values maintain both pointer and length since embedded @value{NUL}
characters are allowed.
@quotation NOTE
@@ -31172,7 +31641,7 @@ and also how characters are likely to be input and output from files.
@item
When retrieving a value (such as a parameter or that of a global variable
or array element), the extension requests a specific type (number, string,
-scalars, value cookie, array, or ``undefined''). When the request is
+scalar, value cookie, array, or ``undefined''). When the request is
``undefined,'' the returned value will have the real underlying type.
However, if the request and actual type don't match, the access function
@@ -31235,8 +31704,8 @@ A simple boolean type.
This represents a mutable string. @command{gawk}
owns the memory pointed to if it supplied
the value. Otherwise, it takes ownership of the memory pointed to.
-@strong{Such memory must come from calling the API-provided function
-pointers @code{api_malloc()}, @code{api_calloc()}, or @code{api_realloc()}!}
+@strong{Such memory must come from calling one of the
+@code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()} functions!}
As mentioned earlier, strings are maintained using the current
multibyte encoding.
@@ -31291,7 +31760,7 @@ Scalar values in @command{awk} are either numbers or strings. The
indicates what is in the @code{union}.
Representing numbers is easy---the API uses a C @code{double}. Strings
-require more work. Since @command{gawk} allows embedded @sc{nul} bytes
+require more work. Since @command{gawk} allows embedded @value{NUL} bytes
in string values, a string must be represented as a pair containing a
data-pointer and length. This is the @code{awk_string_t} type.
@@ -31331,7 +31800,7 @@ the cookie for getting the variable's value or for changing the variable's
value.
This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro.
Given a scalar cookie, @command{gawk} can directly retrieve or
-modify the value, as required, without having to first find it.
+modify the value, as required, without having to find it first.
The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar.
If you know that you wish to
@@ -31341,149 +31810,6 @@ and then pass in that value cookie whenever you wish to set the value of a
variable. This saves both storage space within the running @command{gawk}
process as well as the time needed to create the value.
-@node Requesting Values
-@subsection Requesting Values
-
-All of the functions that return values from @command{gawk}
-work in the same way. You pass in an @code{awk_valtype_t} value
-to indicate what kind of value you expect. If the actual value
-matches what you requested, the function returns true and fills
-in the @code{awk_value_t} result.
-Otherwise, the function returns false, and the @code{val_type}
-member indicates the type of the actual value. You may then
-print an error message, or reissue the request for the actual
-value type, as appropriate. This behavior is summarized in
-@ref{table-value-types-returned}.
-
-@c FIXME: Try to do this with spans...
-
-@float Table,table-value-types-returned
-@caption{API Value Types Returned}
-@docbook
-<informaltable>
-<tgroup cols="2">
- <colspec colwidth="50*"/><colspec colwidth="50*"/>
- <thead>
- <row><entry></entry><entry><para>Type of Actual Value:</para></entry></row>
- </thead>
- <tbody>
- <row><entry></entry><entry></entry></row>
- </tbody>
-</tgroup>
-<tgroup cols="6">
- <colspec colwidth="16.6*"/>
- <colspec colwidth="16.6*"/>
- <colspec colwidth="19.8*"/>
- <colspec colwidth="15*"/>
- <colspec colwidth="15*"/>
- <colspec colwidth="16.6*"/>
- <thead>
- <row>
- <entry></entry>
- <entry></entry>
- <entry><para>String</para></entry>
- <entry><para>Number</para></entry>
- <entry><para>Array</para></entry>
- <entry><para>Undefined</para></entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry></entry>
- <entry><para><emphasis role="bold">String</emphasis></para></entry>
- <entry><para>String</para></entry>
- <entry><para>String</para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
- </row>
- <row>
- <entry></entry>
- <entry><para><emphasis role="bold">Number</emphasis></para></entry>
- <entry><para>Number if can be converted, else false</para></entry>
- <entry><para>Number</para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
- </row>
- <row>
- <entry><para><emphasis role="bold">Type</emphasis></para></entry>
- <entry><para><emphasis role="bold">Array</emphasis></para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
- <entry><para>Array</para></entry>
- <entry><para>false</para></entry>
- </row>
- <row>
- <entry><para><emphasis role="bold">Requested:</emphasis></para></entry>
- <entry><para><emphasis role="bold">Scalar</emphasis></para></entry>
- <entry><para>Scalar</para></entry>
- <entry><para>Scalar</para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
- </row>
- <row>
- <entry></entry>
- <entry><para><emphasis role="bold">Undefined</emphasis></para></entry>
- <entry><para>String</para></entry>
- <entry><para>Number</para></entry>
- <entry><para>Array</para></entry>
- <entry><para>Undefined</para></entry>
- </row>
- <row>
- <entry></entry>
- <entry><para><emphasis role="bold">Value Cookie</emphasis></para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para></entry>
- <entry><para>false</para>
- </entry><entry><para>false</para></entry>
- </row>
- </tbody>
-</tgroup>
-</informaltable>
-@end docbook
-
-@ifnotplaintext
-@ifnotdocbook
-@multitable @columnfractions .50 .50
-@headitem @tab Type of Actual Value:
-@end multitable
-@multitable @columnfractions .166 .166 .198 .15 .15 .166
-@headitem @tab @tab String @tab Number @tab Array @tab Undefined
-@item @tab @b{String} @tab String @tab String @tab false @tab false
-@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false
-@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false
-@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false
-@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined
-@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false
-@end multitable
-@end ifnotdocbook
-@end ifnotplaintext
-@ifplaintext
-@example
- +-------------------------------------------------+
- | Type of Actual Value: |
- +------------+------------+-----------+-----------+
- | String | Number | Array | Undefined |
-+-----------+-----------+------------+------------+-----------+-----------+
-| | String | String | String | false | false |
-| |-----------+------------+------------+-----------+-----------+
-| | Number | Number if | Number | false | false |
-| | | can be | | | |
-| | | converted, | | | |
-| | | else false | | | |
-| |-----------+------------+------------+-----------+-----------+
-| Type | Array | false | false | Array | false |
-| Requested |-----------+------------+------------+-----------+-----------+
-| | Scalar | Scalar | Scalar | false | false |
-| |-----------+------------+------------+-----------+-----------+
-| | Undefined | String | Number | Array | Undefined |
-| |-----------+------------+------------+-----------+-----------+
-| | Value | false | false | false | false |
-| | Cookie | | | | |
-+-----------+-----------+------------+------------+-----------+-----------+
-@end example
-@end ifplaintext
-@end float
-
@node Memory Allocation Functions
@subsection Memory Allocation Functions and Convenience Macros
@cindex allocating memory for extensions
@@ -31492,22 +31818,24 @@ value type, as appropriate. This behavior is summarized in
The API provides a number of @dfn{memory allocation} functions for
allocating memory that can be passed to @command{gawk}, as well as a number of
convenience macros.
+This @value{SUBSECTION} presents them all as function prototypes, in
+the way that extension code would use them.
@table @code
@item void *gawk_malloc(size_t size);
-Call @command{gawk}-provided @code{api_malloc()} to allocate storage that may
+Call the correct version of @code{malloc()} to allocate storage that may
be passed to @command{gawk}.
@item void *gawk_calloc(size_t nmemb, size_t size);
-Call @command{gawk}-provided @code{api_calloc()} to allocate storage that may
+Call the correct version of @code{calloc()} to allocate storage that may
be passed to @command{gawk}.
@item void *gawk_realloc(void *ptr, size_t size);
-Call @command{gawk}-provided @code{api_realloc()} to allocate storage that may
+Call the correct version of @code{realloc()} to allocate storage that may
be passed to @command{gawk}.
@item void gawk_free(void *ptr);
-Call @command{gawk}-provided @code{api_free()} to release storage that was
+Call the correct version of @code{free()} to release storage that was
allocated with @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}.
@end table
@@ -31521,8 +31849,8 @@ unrelated version of @code{malloc()}, unexpected behavior would
likely result.
Two convenience macros may be used for allocating storage
-from the API-provided function pointers @code{api_malloc()} and
-@code{api_realloc()}. If the allocation fails, they cause @command{gawk}
+from @code{gawk_malloc()} and
+@code{gawk_realloc()}. If the allocation fails, they cause @command{gawk}
to exit with a fatal error message. They should be used as if they were
procedure calls that do not return a value.
@@ -31536,7 +31864,7 @@ The arguments to this macro are as follows:
The pointer variable to point at the allocated storage.
@item type
-The type of the pointer variable, used to create a cast for the call to @code{api_malloc()}.
+The type of the pointer variable, used to create a cast for the call to @code{gawk_malloc()}.
@item size
The total number of bytes to be allocated.
@@ -31560,8 +31888,8 @@ make_malloced_string(message, strlen(message), & result);
@end example
@item #define erealloc(pointer, type, size, message) @dots{}
-This is like @code{emalloc()}, but it calls @code{api_realloc()},
-instead of @code{api_malloc()}.
+This is like @code{emalloc()}, but it calls @code{gawk_realloc()},
+instead of @code{gawk_malloc()}.
The arguments are the same as for the @code{emalloc()} macro.
@end table
@@ -31585,7 +31913,7 @@ for storage in @code{result}. It returns @code{result}.
@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result)
This function creates a string value in the @code{awk_value_t} variable
pointed to by @code{result}. It expects @code{string} to be a @samp{char *}
-value pointing to data previously obtained from the api-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}. The idea here
+value pointing to data previously obtained from @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}. The idea here
is that the data is passed directly to @command{gawk}, which assumes
responsibility for it. It returns @code{result}.
@@ -31640,17 +31968,18 @@ The name of the new function.
This is a regular C string.
Function names must obey the rules for @command{awk}
-identifiers. That is, they must begin with either a letter
+identifiers. That is, they must begin with either an English letter
or an underscore, which may be followed by any number of
letters, digits, and underscores.
Letter case in function names is significant.
@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
-This is a pointer to the C function that provides the desired
+This is a pointer to the C function that provides the extension's
functionality.
-The function must fill in the result with either a number
+The function must fill in @code{*result} with either a number
or a string. @command{gawk} takes ownership of any string memory.
-As mentioned earlier, string memory @strong{must} come from the api-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}.
+As mentioned earlier, string memory @strong{must} come from one of @code{gawk_malloc()},
+@code{gawk_calloc()} or @code{gawk_realloc()}.
The @code{num_actual_args} argument tells the C function how many
actual parameters were passed from the calling @command{awk} code.
@@ -31661,7 +31990,7 @@ This is for the convenience of the calling code inside @command{gawk}.
@item size_t num_expected_args;
This is the number of arguments the function expects to receive.
Each extension function may decide what to do if the number of
-arguments isn't what it expected. Following @command{awk} functions, it
+arguments isn't what it expected. As with real @command{awk} functions, it
is likely OK to ignore extra arguments.
@end table
@@ -31915,7 +32244,7 @@ If the concept of a ``record terminator'' makes sense, then
@code{RT}, and @code{*rt_len} should be set to the length of the
data. Otherwise, @code{*rt_len} should be set to zero.
@code{gawk} makes its own copy of this data, so the
-extension must manage the storage.
+extension must manage this storage.
@end table
The return value is the length of the buffer pointed to by
@@ -32194,10 +32523,148 @@ into a (possibly translated) string using the C @code{strerror()} function.
Set @code{ERRNO} directly to the string value of @code{ERRNO}.
@command{gawk} makes a copy of the value of @code{string}.
-@item void unset_ERRNO();
+@item void unset_ERRNO(void);
Unset @code{ERRNO}.
@end table
+@node Requesting Values
+@subsection Requesting Values
+
+All of the functions that return values from @command{gawk}
+work in the same way. You pass in an @code{awk_valtype_t} value
+to indicate what kind of value you expect. If the actual value
+matches what you requested, the function returns true and fills
+in the @code{awk_value_t} result.
+Otherwise, the function returns false, and the @code{val_type}
+member indicates the type of the actual value. You may then
+print an error message, or reissue the request for the actual
+value type, as appropriate. This behavior is summarized in
+@ref{table-value-types-returned}.
+
+@float Table,table-value-types-returned
+@caption{API Value Types Returned}
+@docbook
+<informaltable>
+<tgroup cols="6">
+ <colspec colwidth="16.6*"/>
+ <colspec colwidth="16.6*"/>
+ <colspec colwidth="19.8*" colname="c3"/>
+ <colspec colwidth="15*" colname="c4"/>
+ <colspec colwidth="15*" colname="c5"/>
+ <colspec colwidth="16.6*" colname="c6"/>
+ <spanspec spanname="hspan" namest="c3" nameend="c6" align="center"/>
+ <thead>
+ <row><entry></entry><entry spanname="hspan"><para>Type of Actual Value:</para></entry></row>
+ <row>
+ <entry></entry>
+ <entry></entry>
+ <entry><para>String</para></entry>
+ <entry><para>Number</para></entry>
+ <entry><para>Array</para></entry>
+ <entry><para>Undefined</para></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry></entry>
+ <entry><para><emphasis role="bold">String</emphasis></para></entry>
+ <entry><para>String</para></entry>
+ <entry><para>String</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ </row>
+ <row>
+ <entry></entry>
+ <entry><para><emphasis role="bold">Number</emphasis></para></entry>
+ <entry><para>Number if can be converted, else false</para></entry>
+ <entry><para>Number</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ </row>
+ <row>
+ <entry><para><emphasis role="bold">Type</emphasis></para></entry>
+ <entry><para><emphasis role="bold">Array</emphasis></para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>Array</para></entry>
+ <entry><para>false</para></entry>
+ </row>
+ <row>
+ <entry><para><emphasis role="bold">Requested:</emphasis></para></entry>
+ <entry><para><emphasis role="bold">Scalar</emphasis></para></entry>
+ <entry><para>Scalar</para></entry>
+ <entry><para>Scalar</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ </row>
+ <row>
+ <entry></entry>
+ <entry><para><emphasis role="bold">Undefined</emphasis></para></entry>
+ <entry><para>String</para></entry>
+ <entry><para>Number</para></entry>
+ <entry><para>Array</para></entry>
+ <entry><para>Undefined</para></entry>
+ </row>
+ <row>
+ <entry></entry>
+ <entry><para><emphasis role="bold">Value Cookie</emphasis></para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para>
+ </entry><entry><para>false</para></entry>
+ </row>
+ </tbody>
+</tgroup>
+</informaltable>
+@end docbook
+
+@ifnotplaintext
+@ifnotdocbook
+@multitable @columnfractions .50 .50
+@headitem @tab Type of Actual Value:
+@end multitable
+@c 10/2014: Thanks to Karl Berry for this bit to reduce the space:
+@tex
+\vglue-1.1\baselineskip
+@end tex
+@multitable @columnfractions .166 .166 .198 .15 .15 .166
+@headitem @tab @tab String @tab Number @tab Array @tab Undefined
+@item @tab @b{String} @tab String @tab String @tab false @tab false
+@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false
+@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false
+@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false
+@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined
+@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false
+@end multitable
+@end ifnotdocbook
+@end ifnotplaintext
+@ifplaintext
+@example
+ +-------------------------------------------------+
+ | Type of Actual Value: |
+ +------------+------------+-----------+-----------+
+ | String | Number | Array | Undefined |
++-----------+-----------+------------+------------+-----------+-----------+
+| | String | String | String | false | false |
+| |-----------+------------+------------+-----------+-----------+
+| | Number | Number if | Number | false | false |
+| | | can be | | | |
+| | | converted, | | | |
+| | | else false | | | |
+| |-----------+------------+------------+-----------+-----------+
+| Type | Array | false | false | Array | false |
+| Requested |-----------+------------+------------+-----------+-----------+
+| | Scalar | Scalar | Scalar | false | false |
+| |-----------+------------+------------+-----------+-----------+
+| | Undefined | String | Number | Array | Undefined |
+| |-----------+------------+------------+-----------+-----------+
+| | Value | false | false | false | false |
+| | Cookie | | | | |
++-----------+-----------+------------+------------+-----------+-----------+
+@end example
+@end ifplaintext
+@end float
+
@node Accessing Parameters
@subsection Accessing and Updating Parameters
@@ -32252,7 +32719,7 @@ about symbols is termed a @dfn{symbol table}.
Fill in the @code{awk_value_t} structure pointed to by @code{result}
with the value of the variable named by the string @code{name}, which is
a regular C string. @code{wanted} indicates the type of value expected.
-Return true if the actual type matches @code{wanted}, false otherwise
+Return true if the actual type matches @code{wanted}, false otherwise.
In the latter case, @code{result->val_type} indicates the actual type
(@pxref{table-value-types-returned}).
@@ -32271,7 +32738,7 @@ An extension can look up the value of @command{gawk}'s special variables.
However, with the exception of the @code{PROCINFO} array, an extension
cannot change any of those variables.
-@quotation NOTE
+@quotation CAUTION
It is possible for the lookup of @code{PROCINFO} to fail. This happens if
the @command{awk} program being run does not reference @code{PROCINFO};
in this case @command{gawk} doesn't bother to create the array and
@@ -32293,14 +32760,14 @@ The following functions let you work with scalar cookies.
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
Retrieve the current value of a scalar cookie.
-Once you have obtained a scalar_cookie using @code{sym_lookup()}, you can
+Once you have obtained a scalar cookie using @code{sym_lookup()}, you can
use this function to get its value more efficiently.
Return false if the value cannot be retrieved.
@item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value);
Update the value associated with a scalar cookie. Return false if
the new value is not of type @code{AWK_STRING} or @code{AWK_NUMBER}.
-Here too, the built-in variables may not be updated.
+Here too, the predefined variables may not be updated.
@end table
It is not obvious at first glance how to work with scalar cookies or
@@ -32355,7 +32822,7 @@ my_extension_init()
/* install initial value */
sym_update("MAGIC_VAR", make_number(42.0, & value));
- /* get cookie */
+ /* get the cookie */
sym_lookup("MAGIC_VAR", AWK_SCALAR, & value);
/* save the cookie */
@@ -32404,7 +32871,8 @@ assign those values to variables using @code{sym_update()}
or @code{sym_update_scalar()}, as you like.
However, you can understand the point of cached values if you remember that
-@emph{every} string value's storage @emph{must} come from @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}.
+@emph{every} string value's storage @emph{must} come from @code{gawk_malloc()},
+@code{gawk_calloc()} or @code{gawk_realloc()}.
If you have 20 variables, all of which have the same string value, you
must create 20 identical copies of the string.@footnote{Numeric values
are clearly less problematic, requiring only a C @code{double} to store.}
@@ -32475,7 +32943,7 @@ Using value cookies in this way saves considerable storage, since all of
You might be wondering, ``Is this sharing problematic?
What happens if @command{awk} code assigns a new value to @code{VAR1},
-are all the others be changed too?''
+are all the others changed too?''
That's a great question. The answer is that no, it's not a problem.
Internally, @command{gawk} uses @dfn{reference-counted strings}. This means
@@ -32530,7 +32998,7 @@ with the @code{<stdio.h>} library routines.
@itemx @ @ @ @ struct awk_element *next;
@itemx @ @ @ @ enum @{
@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */
-@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if should be deleted */
+@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension */
@itemx @ @ @ @ @} flags;
@itemx @ @ @ @ awk_value_t index;
@itemx @ @ @ @ awk_value_t value;
@@ -32550,8 +33018,8 @@ an extension to create a linked list of new elements that can then be
added to an array in a loop that traverses the list.
@item enum @{ @dots{} @} flags;
-A set of flag values that convey information between @command{gawk}
-and the extension. Currently there is only one: @code{AWK_ELEMENT_DELETE}.
+A set of flag values that convey information between the extension
+and @command{gawk}. Currently there is only one: @code{AWK_ELEMENT_DELETE}.
Setting it causes @command{gawk} to delete the
element from the original array upon release of the flattened array.
@@ -32562,8 +33030,8 @@ The index and value of the element, respectively.
@end table
@item typedef struct awk_flat_array @{
-@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */
-@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */
+@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* for use by gawk */
+@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* for use by gawk */
@itemx @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */
@itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */
@itemx @} awk_flat_array_t;
@@ -32582,7 +33050,7 @@ The following functions relate to individual array elements.
@table @code
@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);
-For the array represented by @code{a_cookie}, return in @code{*count}
+For the array represented by @code{a_cookie}, place in @code{*count}
the number of elements it contains. A subarray counts as a single element.
Return false if there is an error.
@@ -32602,7 +33070,8 @@ requires that you understand how such values are converted to strings
(@pxref{Conversion}); thus using integral values is safest.
As with @emph{all} strings passed into @code{gawk} from an extension,
-the string value of @code{index} must come from the API-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()} and
+the string value of @code{index} must come from @code{gawk_malloc()},
+@code{gawk_calloc()} or @code{gawk_realloc()}, and
@command{gawk} releases the storage.
@item awk_bool_t set_array_element(awk_array_t a_cookie,
@@ -32629,7 +33098,7 @@ not exist in the array.
The following functions relate to arrays as a whole:
@table @code
-@item awk_array_t create_array();
+@item awk_array_t create_array(void);
Create a new array to which elements may be added.
@xref{Creating Arrays}, for a discussion of how to
create a new array and add elements to it.
@@ -32646,7 +33115,13 @@ For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t}
structure and fill it in. Set the pointer whose address is passed as @code{data}
to point to this structure.
Return true upon success, or false otherwise.
-@xref{Flattening Arrays}, for a discussion of how to
+@ifset FOR_PRINT
+See the next section
+@end ifset
+@ifclear FOR_PRINT
+@xref{Flattening Arrays},
+@end ifclear
+for a discussion of how to
flatten an array and work with it.
@item awk_bool_t release_flattened_array(awk_array_t a_cookie,
@@ -32666,6 +33141,7 @@ for C code to traverse the entire array. Test code
in @file{extension/testext.c} does this, and also serves
as a nice example showing how to use the APIs.
+We walk through that part of the code one step at a time.
First, the @command{gawk} script that drives the test extension:
@example
@@ -32804,8 +33280,7 @@ have this flag bit set:
valrep2str(& flat_array->elements[i].value));
if (strcmp(value3.str_value.str,
- flat_array->elements[i].index.str_value.str)
- == 0) @{
+ flat_array->elements[i].index.str_value.str) == 0) @{
flat_array->elements[i].flags |= AWK_ELEMENT_DELETE;
printf("dump_array_and_delete: marking element \"%s\" "
"for deletion\n",
@@ -32909,7 +33384,9 @@ of the array cookie after the call to @code{set_element()}.
The following C code is a simple test extension to create an array
with two regular elements and with a subarray. The leading @code{#include}
-directives and boilerplate variable declarations are omitted for brevity.
+directives and boilerplate variable declarations
+(@pxref{Extension API Boilerplate})
+are omitted for brevity.
The first step is to create a new array and then install it
in the symbol table:
@@ -33155,7 +33632,7 @@ This variable is true if @command{gawk} was invoked with @option{--traditional}
@end table
The value of @code{do_lint} can change if @command{awk} code
-modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}).
+modifies the @code{LINT} predefined variable (@pxref{Built-in Variables}).
The others should not change during execution.
@node Extension API Boilerplate
@@ -33188,12 +33665,12 @@ static awk_bool_t (*init_func)(void) = NULL;
/* OR: */
static awk_bool_t
-init_my_module(void)
+init_my_extension(void)
@{
@dots{}
@}
-static awk_bool_t (*init_func)(void) = init_my_module;
+static awk_bool_t (*init_func)(void) = init_my_extension;
dl_load_func(func_table, some_name, "name_space_in_quotes")
@end example
@@ -33236,8 +33713,8 @@ It can then be looped over for multiple calls to
@c Use @var{OR} for docbook
@item static awk_bool_t (*init_func)(void) = NULL;
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @var{OR}
-@itemx static awk_bool_t init_my_module(void) @{ @dots{} @}
-@itemx static awk_bool_t (*init_func)(void) = init_my_module;
+@itemx static awk_bool_t init_my_extension(void) @{ @dots{} @}
+@itemx static awk_bool_t (*init_func)(void) = init_my_extension;
If you need to do some initialization work, you should define a
function that does it (creates variables, opens files, etc.)
and then define the @code{init_func} pointer to point to your
@@ -33304,8 +33781,8 @@ path with a list of directories to search for compiled extensions.
Two useful functions that are not in @command{awk} are @code{chdir()} (so
that an @command{awk} program can change its directory) and @code{stat()}
(so that an @command{awk} program can gather information about a file).
-This @value{SECTION} implements these functions for @command{gawk}
-in an extension.
+In order to illustrate the API in action, this @value{SECTION} implements
+these functions for @command{gawk} in an extension.
@menu
* Internal File Description:: What the new functions will do.
@@ -33327,8 +33804,7 @@ straightforward. It takes one argument, the new directory to change to:
newdir = "/home/arnold/funstuff"
ret = chdir(newdir)
if (ret < 0) @{
- printf("could not change to %s: %s\n",
- newdir, ERRNO) > "/dev/stderr"
+ printf("could not change to %s: %s\n", newdir, ERRNO) > "/dev/stderr"
exit 1
@}
@dots{}
@@ -33516,7 +33992,7 @@ The second is a pointer to an @code{awk_value_t}, usually named
@code{result}.
@example
-/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
+/* do_chdir --- provide dynamically loaded chdir() function for gawk */
static awk_value_t *
do_chdir(int nargs, awk_value_t *result)
@@ -33725,13 +34201,22 @@ for success:
@}
@}
- array_set(array, "type", make_const_string(type, strlen(type), &tmp));
+ array_set(array, "type", make_const_string(type, strlen(type), & tmp));
return 0;
@}
@end example
-Finally, here is the @code{do_stat()} function. It starts with
+The third argument to @code{stat()} was not discussed previously. This
+argument is optional. If present, it causes @code{do_stat()} to use
+the @code{stat()} system call instead of the @code{lstat()} system
+call. This is done by using a function pointer: @code{statfunc}.
+@code{statfunc} is initialized to point to @code{lstat()} (instead
+of @code{stat()}) to get the file information, in case the file is a
+symbolic link. However, if there were three arguments, @code{statfunc}
+is set point to @code{stat()}, instead.
+
+Here is the @code{do_stat()} function. It starts with
variable declarations and argument checking:
@ignore
@@ -33762,16 +34247,10 @@ do_stat(int nargs, awk_value_t *result)
@}
@end example
-The third argument to @code{stat()} was not discussed previously. This argument
-is optional. If present, it causes @code{stat()} to use the @code{stat()}
-system call instead of the @code{lstat()} system call.
-
Then comes the actual work. First, the function gets the arguments.
-Next, it gets the information for the file.
-The code use @code{lstat()} (instead of @code{stat()})
-to get the file information,
-in case the file is a symbolic link.
-If there's an error, it sets @code{ERRNO} and returns:
+Next, it gets the information for the file. If the called function
+(@code{lstat()} or @code{stat()}) returns an error, the code sets
+@code{ERRNO} and returns:
@example
/* file is first arg, array to hold results is second */
@@ -33800,7 +34279,7 @@ If there's an error, it sets @code{ERRNO} and returns:
@end example
The tedious work is done by @code{fill_stat_array()}, shown
-earlier. When done, return the result from @code{fill_stat_array()}:
+earlier. When done, the function returns the result from @code{fill_stat_array()}:
@example
ret = fill_stat_array(name, array, & sbuf);
@@ -33863,7 +34342,7 @@ of the @file{gawkapi.h} header file,
the following steps@footnote{In practice, you would probably want to
use the GNU Autotools---Automake, Autoconf, Libtool, and @command{gettext}---to
configure and build your libraries. Instructions for doing so are beyond
-the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to
+the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for Internet links to
the tools.} create a GNU/Linux shared library:
@example
@@ -33891,14 +34370,14 @@ BEGIN @{
for (i in data)
printf "data[\"%s\"] = %s\n", i, data[i]
print "testff.awk modified:",
- strftime("%m %d %y %H:%M:%S", data["mtime"])
+ strftime("%m %d %Y %H:%M:%S", data["mtime"])
print "\nInfo for JUNK"
ret = stat("JUNK", data)
print "ret =", ret
for (i in data)
printf "data[\"%s\"] = %s\n", i, data[i]
- print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
+ print "JUNK modified:", strftime("%m %d %Y %H:%M:%S", data["mtime"])
@}
@end example
@@ -33912,25 +34391,26 @@ $ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk}
@print{} Info for testff.awk
@print{} ret = 0
@print{} data["blksize"] = 4096
-@print{} data["mtime"] = 1350838628
+@print{} data["devbsize"] = 512
+@print{} data["mtime"] = 1412004710
@print{} data["mode"] = 33204
@print{} data["type"] = file
@print{} data["dev"] = 2053
@print{} data["gid"] = 1000
-@print{} data["ino"] = 1719496
-@print{} data["ctime"] = 1350838628
+@print{} data["ino"] = 10358899
+@print{} data["ctime"] = 1412004710
@print{} data["blocks"] = 8
@print{} data["nlink"] = 1
@print{} data["name"] = testff.awk
-@print{} data["atime"] = 1350838632
+@print{} data["atime"] = 1412004716
@print{} data["pmode"] = -rw-rw-r--
-@print{} data["size"] = 662
+@print{} data["size"] = 666
@print{} data["uid"] = 1000
-@print{} testff.awk modified: 10 21 12 18:57:08
-@print{}
+@print{} testff.awk modified: 09 29 2014 18:31:50
+@print{}
@print{} Info for JUNK
@print{} ret = -1
-@print{} JUNK modified: 01 01 70 02:00:00
+@print{} JUNK modified: 01 01 1970 02:00:00
@end example
@node Extension Samples
@@ -33955,9 +34435,9 @@ Others mainly provide example code that shows how to use the extension API.
* Extension Sample Rev2way:: Reversing data sample two-way processor.
* Extension Sample Read write array:: Serializing an array to a file.
* Extension Sample Readfile:: Reading an entire file into a string.
-* Extension Sample API Tests:: Tests for the API.
* Extension Sample Time:: An interface to @code{gettimeofday()}
and @code{sleep()}.
+* Extension Sample API Tests:: Tests for the API.
@end menu
@node Extension Sample File Functions
@@ -33967,7 +34447,7 @@ The @code{filefuncs} extension provides three different functions, as follows:
The usage is:
@table @asis
-@item @@load "filefuncs"
+@item @code{@@load "filefuncs"}
This is how you load the extension.
@cindex @code{chdir()} extension function
@@ -34030,7 +34510,7 @@ Not all systems support all file types. @tab All
@itemx @code{result = fts(pathlist, flags, filedata)}
Walk the file trees provided in @code{pathlist} and fill in the
@code{filedata} array as described below. @code{flags} is the bitwise
-OR of several predefined constant values, also described below.
+OR of several predefined values, also described below.
Return zero if there were no errors, otherwise return @minus{}1.
@end table
@@ -34075,10 +34555,10 @@ Immediately follow a symbolic link named in @code{pathlist},
whether or not @code{FTS_LOGICAL} is set.
@item FTS_SEEDOT
-By default, the @code{fts()} routines do not return entries for @file{.} (dot)
-and @file{..} (dot-dot). This option causes entries for dot-dot to also
-be included. (The extension always includes an entry for dot,
-see below.)
+By default, the C library @code{fts()} routines do not return entries for
+@file{.} (dot) and @file{..} (dot-dot). This option causes entries for
+dot-dot to also be included. (The extension always includes an entry
+for dot, see below.)
@item FTS_XDEV
During a traversal, do not cross onto a different mounted filesystem.
@@ -34132,8 +34612,8 @@ Otherwise it returns @minus{}1.
@quotation NOTE
The @code{fts()} extension does not exactly mimic the
interface of the C library @code{fts()} routines, choosing instead to
-provide an interface that is based on associative arrays, which should
-be more comfortable to use from an @command{awk} program. This includes the
+provide an interface that is based on associative arrays, which is
+more comfortable to use from an @command{awk} program. This includes the
lack of a comparison function, since @command{gawk} already provides
powerful array sorting facilities. While an @code{fts_read()}-like
interface could have been provided, this felt less natural than simply
@@ -34141,7 +34621,8 @@ creating a multidimensional array to represent the file hierarchy and
its information.
@end quotation
-See @file{test/fts.awk} in the @command{gawk} distribution for an example.
+See @file{test/fts.awk} in the @command{gawk} distribution for an example
+use of the @code{fts()} extension function.
@node Extension Sample Fnmatch
@subsection Interface To @code{fnmatch()}
@@ -34349,7 +34830,7 @@ indicating the type of the file. The letters are file types are shown
in @ref{table-readdir-file-types}.
@float Table,table-readdir-file-types
-@caption{File Types Returned By @code{readdir()}}
+@caption{File Types Returned By The @code{readdir} Extension}
@multitable @columnfractions .1 .9
@headitem Letter @tab File Type
@item @code{b} @tab Block device
@@ -34441,6 +34922,9 @@ The @code{rwarray} extension adds two functions,
named @code{writea()} and @code{reada()}, as follows:
@table @code
+@item @@load "rwarray"
+This is how you load the extension.
+
@cindex @code{writea()} extension function
@item ret = writea(file, array)
This function takes a string argument, which is the name of the file
@@ -34516,17 +35000,6 @@ if (contents == "" && ERRNO != "") @{
@}
@end example
-@node Extension Sample API Tests
-@subsection API Tests
-@cindex @code{testext} extension
-
-The @code{testext} extension exercises parts of the extension API that
-are not tested by the other samples. The @file{extension/testext.c}
-file contains both the C code for the extension and @command{awk}
-test code inside C comments that run the tests. The testing framework
-extracts the @command{awk} code and runs the tests. See the source file
-for more information.
-
@node Extension Sample Time
@subsection Extension Time Functions
@@ -34557,6 +35030,17 @@ Implementation details: depending on platform availability, this function
tries to use @code{nanosleep()} or @code{select()} to implement the delay.
@end table
+@node Extension Sample API Tests
+@subsection API Tests
+@cindex @code{testext} extension
+
+The @code{testext} extension exercises parts of the extension API that
+are not tested by the other samples. The @file{extension/testext.c}
+file contains both the C code for the extension and @command{awk}
+test code inside C comments that run the tests. The testing framework
+extracts the @command{awk} code and runs the tests. See the source file
+for more information.
+
@node gawkextlib
@section The @code{gawkextlib} Project
@cindex @code{gawkextlib}
@@ -34572,8 +35056,7 @@ As of this writing, there are five extensions:
@itemize @value{BULLET}
@item
-XML parser extension, using the @uref{http://expat.sourceforge.net, Expat}
-XML parsing library.
+GD graphics library extension.
@item
PDF extension.
@@ -34582,17 +35065,14 @@ PDF extension.
PostgreSQL extension.
@item
-GD graphics library extension.
-
-@item
MPFR library extension.
This provides access to a number of MPFR functions which @command{gawk}'s
native MPFR support does not.
-@end itemize
-The @code{time} extension described earlier (@pxref{Extension Sample
-Time}) was originally from this project but has been moved in to the
-main @command{gawk} distribution.
+@item
+XML parser extension, using the @uref{http://expat.sourceforge.net, Expat}
+XML parsing library.
+@end itemize
@cindex @command{git} utility
You can check out the code for the @code{gawkextlib} project
@@ -34669,7 +35149,7 @@ certain tasks.
@item
One of these tasks is to ``register'' the name and implementation of
-a new @command{awk}-level function with @command{gawk}. The implementation
+new @command{awk}-level functions with @command{gawk}. The implementation
takes the form of a C function pointer with a defined signature.
By convention, implementation functions are named @code{do_@var{XXXX}()}
for some @command{awk}-level function @code{@var{XXXX}()}.
@@ -34683,6 +35163,9 @@ API function pointers are provided for the following kinds of operations:
@itemize @value{BULLET}
@item
+Allocating, reallocating, and releasing memory.
+
+@item
Registration functions. You may register
extension functions,
exit callbacks,
@@ -34706,9 +35189,6 @@ Symbol table access: retrieving a global variable, creating one,
or changing one.
@item
-Allocating, reallocating, and releasing memory.
-
-@item
Creating and releasing cached values; this provides an
efficient way to use values for multiple variables and
can be a big performance win.
@@ -34720,7 +35200,7 @@ getting the count of elements in an array;
creating a new array;
clearing an array;
and
-flattening an array for easy C style looping over all its indices and elements
+flattening an array for easy C style looping over all its indices and elements.
@end itemize
@item
@@ -34740,7 +35220,7 @@ treated as read-only by the extension.
@item
@emph{All} memory passed from an extension to @command{gawk} must come from
the API's memory allocation functions. @command{gawk} takes responsibility for
-the memory and will release it when appropriate.
+the memory and releases it when appropriate.
@item
The API provides information about the running version of @command{gawk} so
@@ -34757,10 +35237,11 @@ The @command{gawk} distribution includes a number of small but useful
sample extensions. The @code{gawkextlib} project includes several more,
larger, extensions. If you wish to write an extension and contribute it
to the community of @command{gawk} users, the @code{gawkextlib} project
-should be the place to do so.
+is the place to do so.
@end itemize
+@c EXCLUDE START
@node Extension Exercises
@section Exercises
@@ -34783,6 +35264,7 @@ Write a wrapper script that provides an interface similar to
@ref{Extension Sample Inplace}.
@end enumerate
+@c EXCLUDE END
@ifnotinfo
@part @value{PART4}Appendices
@@ -34837,7 +35319,7 @@ which follows the POSIX specification. Many long-time @command{awk}
users learned @command{awk} programming with the original @command{awk}
implementation in Version 7 Unix. (This implementation was the basis for
@command{awk} in Berkeley Unix, through 4.3-Reno. Subsequent versions
-of Berkeley Unix, and some systems derived from 4.4BSD-Lite, used various
+of Berkeley Unix, and, for a while, some systems derived from 4.4BSD-Lite, used various
versions of @command{gawk} for their @command{awk}.) This @value{CHAPTER}
briefly describes the evolution of the @command{awk} language, with
cross-references to other parts of the @value{DOCUMENT} where you can
@@ -34910,7 +35392,7 @@ The built-in functions @code{close()} and @code{system()}
@item
The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART},
-and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}).
+and @code{SUBSEP} predefined variables (@pxref{Built-in Variables}).
@item
Assignable @code{$0} (@pxref{Changing Fields}).
@@ -34941,14 +35423,11 @@ of @code{FS}.
@item
Dynamic regexps as operands of the @samp{~} and @samp{!~} operators
-(@pxref{Regexp Usage}).
+(@pxref{Computed Regexps}).
@item
The escape sequences @samp{\b}, @samp{\f}, and @samp{\r}
(@pxref{Escape Sequences}).
-(Some vendors have updated their old versions of @command{awk} to
-recognize @samp{\b}, @samp{\f}, and @samp{\r}, but this is not
-something you can rely on.)
@item
Redirection of input for the @code{getline} function
@@ -34987,7 +35466,7 @@ The @option{-v} option for assigning variables before program execution begins
@c GNU, Bell Laboratories & MKS together
@item
-The @option{--} option for terminating command-line options.
+The @option{--} signal for terminating command-line options.
@item
The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences
@@ -35004,13 +35483,13 @@ for case translation
(@pxref{String Functions}).
@item
-A cleaner specification for the @samp{%c} format-control letter in the
+A cleaner specification for the @code{%c} format-control letter in the
@code{printf} function
(@pxref{Control Letters}).
@item
The ability to dynamically pass the field width and precision (@code{"%*.*d"})
-in the argument list of the @code{printf} function
+in the argument list of @code{printf} and @code{sprintf()}
(@pxref{Control Letters}).
@item
@@ -35045,8 +35524,8 @@ The concept of a numeric string and tighter comparison rules to go
with it (@pxref{Typing and Comparison}).
@item
-The use of built-in variables as function parameter names is forbidden
-(@pxref{Definition Syntax}.
+The use of predefined variables as function parameter names is forbidden
+(@pxref{Definition Syntax}).
@item
More complete documentation of many of the previously undocumented
@@ -35141,7 +35620,7 @@ in the current version of @command{gawk}.
@itemize @value{BULLET}
@item
-Additional built-in variables:
+Additional predefined variables:
@itemize @value{MINUS}
@item
@@ -35225,14 +35704,6 @@ The @code{BEGINFILE} and @code{ENDFILE} special patterns.
(@pxref{BEGINFILE/ENDFILE}).
@item
-The ability to delete all of an array at once with @samp{delete @var{array}}
-(@pxref{Delete}).
-
-@item
-The @code{nextfile} statement
-(@pxref{Nextfile Statement}).
-
-@item
The @code{switch} statement
(@pxref{Switch Statement}).
@end itemize
@@ -35247,7 +35718,7 @@ of a two-way pipe to a coprocess
(@pxref{Two-way I/O}).
@item
-POSIX compliance for @code{gsub()} and @code{sub()}.
+POSIX compliance for @code{gsub()} and @code{sub()} with @option{--posix}.
@item
The @code{length()} function accepts an array argument
@@ -35275,6 +35746,20 @@ Additional functions only in @command{gawk}:
@itemize @value{MINUS}
@item
+The @code{gensub()}, @code{patsplit()}, and @code{strtonum()} functions
+for more powerful text manipulation
+(@pxref{String Functions}).
+
+@item
+The @code{asort()} and @code{asorti()} functions for sorting arrays
+(@pxref{Array Sorting}).
+
+@item
+The @code{mktime()}, @code{systime()}, and @code{strftime()}
+functions for working with timestamps
+(@pxref{Time Functions}).
+
+@item
The
@code{and()},
@code{compl()},
@@ -35288,30 +35773,15 @@ functions for bit manipulation
@c In 4.1, and(), or() and xor() grew the ability to take > 2 arguments
@item
-The @code{asort()} and @code{asorti()} functions for sorting arrays
-(@pxref{Array Sorting}).
+The @code{isarray()} function to check if a variable is an array or not
+(@pxref{Type Functions}).
@item
The @code{bindtextdomain()}, @code{dcgettext()} and @code{dcngettext()}
functions for internationalization
(@pxref{Programmer i18n}).
-
-@item
-The @code{fflush()} function from BWK @command{awk}
-(@pxref{I/O Functions}).
-
-@item
-The @code{gensub()}, @code{patsplit()}, and @code{strtonum()} functions
-for more powerful text manipulation
-(@pxref{String Functions}).
-
-@item
-The @code{mktime()}, @code{systime()}, and @code{strftime()}
-functions for working with timestamps
-(@pxref{Time Functions}).
@end itemize
-
@item
Changes and/or additions in the command-line options:
@@ -35434,7 +35904,7 @@ GCC for VAX and Alpha has not been tested for a while.
@item
Support for the following obsolete systems was removed from the code
-and the documentation for @command{gawk} @value{PVERSION} 4.1:
+for @command{gawk} @value{PVERSION} 4.1:
@c nested table
@itemize @value{MINUS}
@@ -35442,6 +35912,10 @@ and the documentation for @command{gawk} @value{PVERSION} 4.1:
Ultrix
@end itemize
+@item
+@c FIXME: Verify the version here.
+Support for MirBSD was removed at @command{gawk} @value{PVERSION} 4.2.
+
@end itemize
@c XXX ADD MORE STUFF HERE
@@ -36067,33 +36541,29 @@ The dynamic extension interface was completely redone
@cindex extensions, Brian Kernighan's @command{awk}
@cindex extensions, @command{mawk}
-This @value{SECTION} summarizes the common extensions supported
+The following table summarizes the common extensions supported
by @command{gawk}, Brian Kernighan's @command{awk}, and @command{mawk},
the three most widely-used freely available versions of @command{awk}
(@pxref{Other Versions}).
-@multitable {@file{/dev/stderr} special file} {BWK Awk} {Mawk} {GNU Awk}
-@headitem Feature @tab BWK Awk @tab Mawk @tab GNU Awk
-@item @samp{\x} Escape sequence @tab X @tab X @tab X
-@item @code{FS} as null string @tab X @tab X @tab X
-@item @file{/dev/stdin} special file @tab X @tab X @tab X
-@item @file{/dev/stdout} special file @tab X @tab X @tab X
-@item @file{/dev/stderr} special file @tab X @tab X @tab X
-@item @code{delete} without subscript @tab X @tab X @tab X
-@item @code{fflush()} function @tab X @tab X @tab X
-@item @code{length()} of an array @tab X @tab X @tab X
-@item @code{nextfile} statement @tab X @tab X @tab X
-@item @code{**} and @code{**=} operators @tab X @tab @tab X
-@item @code{func} keyword @tab X @tab @tab X
-@item @code{BINMODE} variable @tab @tab X @tab X
-@item @code{RS} as regexp @tab @tab X @tab X
-@item Time related functions @tab @tab X @tab X
+@multitable {@file{/dev/stderr} special file} {BWK Awk} {Mawk} {GNU Awk} {Now standard}
+@headitem Feature @tab BWK Awk @tab Mawk @tab GNU Awk @tab Now standard
+@item @samp{\x} Escape sequence @tab X @tab X @tab X @tab
+@item @code{FS} as null string @tab X @tab X @tab X @tab
+@item @file{/dev/stdin} special file @tab X @tab X @tab X @tab
+@item @file{/dev/stdout} special file @tab X @tab X @tab X @tab
+@item @file{/dev/stderr} special file @tab X @tab X @tab X @tab
+@item @code{delete} without subscript @tab X @tab X @tab X @tab X
+@item @code{fflush()} function @tab X @tab X @tab X @tab X
+@item @code{length()} of an array @tab X @tab X @tab X @tab
+@item @code{nextfile} statement @tab X @tab X @tab X @tab X
+@item @code{**} and @code{**=} operators @tab X @tab @tab X @tab
+@item @code{func} keyword @tab X @tab @tab X @tab
+@item @code{BINMODE} variable @tab @tab X @tab X @tab
+@item @code{RS} as regexp @tab @tab X @tab X @tab
+@item Time related functions @tab @tab X @tab X @tab
@end multitable
-(Technically speaking, as of late 2012, @code{fflush()}, @samp{delete @var{array}},
-and @code{nextfile} are no longer extensions, since they have been added
-to POSIX.)
-
@node Ranges and Locales
@appendixsec Regexp Ranges and Locales: A Long Sad Story
@@ -36130,6 +36600,7 @@ In the @code{"C"} and @code{"POSIX"} locales, a range expression like
But outside those locales, the ordering was defined to be based on
@dfn{collation order}.
+What does that mean?
In many locales, @samp{A} and @samp{a} are both less than @samp{B}.
In other words, these locales sort characters in dictionary order,
and @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]};
@@ -36137,7 +36608,7 @@ instead it might be equivalent to @samp{[ABCXYabcdxyz]}, for example.
This point needs to be emphasized: Much literature teaches that you should
use @samp{[a-z]} to match a lowercase character. But on systems with
-non-ASCII locales, this also matched all of the uppercase characters
+non-ASCII locales, this also matches all of the uppercase characters
except @samp{A} or @samp{Z}! This was a continuous cause of confusion, even well
into the twenty-first century.
@@ -36327,7 +36798,7 @@ the various PC platforms.
@cindex Zoulas, Christos
Christos Zoulas
provided the @code{extension()}
-built-in function for dynamically adding new modules.
+built-in function for dynamically adding new functions.
(This was obsoleted at @command{gawk} 4.1.)
@item
@@ -36443,6 +36914,11 @@ The development of the extension API first released with
Arnold Robbins and Andrew Schorr, with notable contributions from
the rest of the development team.
+@cindex Malmberg, John E.
+@item
+John Malmberg contributed significant improvements to the
+OpenVMS port and the related documentation.
+
@item
@cindex Colombo, Antonio
Antonio Giovanni Colombo rewrote a number of examples in the early
@@ -36504,7 +36980,7 @@ various platforms that are supported by the developers. The primary
developer supports GNU/Linux (and Unix), whereas the other ports are
contributed.
@xref{Bugs},
-for the electronic mail addresses of the people who did
+for the electronic mail addresses of the people who maintain
the respective ports.
@menu
@@ -36823,11 +37299,10 @@ Unix-derived systems, GNU/Linux, BSD-based systems, and the Cygwin
environment for MS-Windows.
After you have extracted the @command{gawk} distribution, @command{cd}
-to @file{gawk-@value{VERSION}.@value{PATCHLEVEL}}. Like most GNU software,
-@command{gawk} is configured
-automatically for your system by running the @command{configure} program.
-This program is a Bourne shell script that is generated automatically using
-GNU Autoconf.
+to @file{gawk-@value{VERSION}.@value{PATCHLEVEL}}. As with most GNU
+software, you configure @command{gawk} for your system by running the
+@command{configure} program. This program is a Bourne shell script that
+is generated automatically using GNU Autoconf.
@ifnotinfo
(The Autoconf software is
described fully in
@@ -36922,8 +37397,8 @@ Similarly, setting the @code{LINT} variable
has no effect on the running @command{awk} program.
When used with GCC's automatic dead-code-elimination, this option
-cuts almost 200K bytes off the size of the @command{gawk}
-executable on GNU/Linux x86 systems. Results on other systems and
+cuts almost 23K bytes off the size of the @command{gawk}
+executable on GNU/Linux x86_64 systems. Results on other systems and
with other compilers are likely to vary.
Using this option may bring you some slight performance improvement.
@@ -37016,7 +37491,8 @@ various non-Unix systems.
@cindex PC operating systems@comma{} @command{gawk} on, installing
@cindex operating systems, PC@comma{} @command{gawk} on, installing
-This @value{SECTION} covers installation and usage of @command{gawk} on x86 machines
+This @value{SECTION} covers installation and usage of @command{gawk}
+on Intel architecture machines
@ifclear FOR_PRINT
running MS-DOS, any version of MS-Windows, or OS/2.
@end ifclear
@@ -37118,7 +37594,8 @@ MS-DOS and Windows32 versions. A list of targets is printed if the
build @command{gawk} using the DJGPP tools, enter @samp{make djgpp}.
(The DJGPP tools needed for the build may be found at
@uref{ftp://ftp.delorie.com/pub/djgpp/current/v2gnu/}.) To build a
-native MS-Windows binary of @command{gawk}, type @samp{make mingw32}.
+native MS-Windows binary of @command{gawk} using the MinGW tools,
+type @samp{make mingw32}.
@ifclear FOR_PRINT
@cindex compiling @command{gawk} with EMX for OS/2
@@ -37248,8 +37725,7 @@ The MS-DOS and MS-Windows versions of @command{gawk} search for
program files as described in @ref{AWKPATH Variable}. However,
semicolons (rather than colons) separate elements in the @env{AWKPATH}
variable. If @env{AWKPATH} is not set or is empty, then the default
-search path for MS-Windows and MS-DOS versions is
-@samp{@w{.;c:/lib/awk;c:/gnu/lib/awk}}.
+search path is @samp{@w{.;c:/lib/awk;c:/gnu/lib/awk}}.
@ifclear FOR_PRINT
@cindex @command{gawk}, OS/2 version of
@@ -37294,12 +37770,12 @@ allows control over these translations and is interpreted as follows:
@itemize @value{BULLET}
@item
-If @code{BINMODE} is @code{"r"}, or one,
+If @code{BINMODE} is @code{"r"} or one,
then
binary mode is set on read (i.e., no translations on reads).
@item
-If @code{BINMODE} is @code{"w"}, or two,
+If @code{BINMODE} is @code{"w"} or two,
then
binary mode is set on write (i.e., no translations on writes).
@@ -37387,7 +37863,7 @@ same as for a Unix system:
tar -xvpzf gawk-@value{VERSION}.@value{PATCHLEVEL}.tar.gz
cd gawk-@value{VERSION}.@value{PATCHLEVEL}
./configure
-make
+make && make check
@end example
When compared to GNU/Linux on the same system, the @samp{configure}
@@ -37403,10 +37879,10 @@ need to use the @code{BINMODE} variable.
This can cause problems with other Unix-like components that have
been ported to MS-Windows that expect @command{gawk} to do automatic
-translation of @code{"\r\n"}, since it won't. Caveat Emptor!
+translation of @code{"\r\n"}, since it won't.
@node VMS Installation
-@appendixsubsec How to Compile and Install @command{gawk} on Vax/VMS and OpenVMS
+@appendixsubsec Compiling and Installing @command{gawk} on Vax/VMS and OpenVMS
@c based on material from Pat Rankin <rankin@eql.caltech.edu>
@c now rankin@pactechdata.com
@@ -37511,7 +37987,7 @@ For VAX:
@end example
Compile time macros need to be defined before the first VMS-supplied
-header file is included.
+header file is included, as follows:
@example
#if (__CRTL_VER >= 70200000) && !defined (__VAX)
@@ -37527,6 +38003,11 @@ header file is included.
#endif
@end example
+If you are writing your own extensions to run on VMS, you must supply these
+definitions yourself. The @file{config.h} file created when building @command{gawk}
+on VMS does this for you; if instead you use that file or a similar one, then you
+must remember to include it before any VMS-supplied header files.
+
@node VMS Installation Details
@appendixsubsubsec Installing @command{gawk} on VMS
@@ -37623,12 +38104,12 @@ other dash-type options (or multiple parameters such as @value{DF}s to
process) are present, there is no ambiguity and @option{--} can be omitted.
@cindex exit status, of VMS
-The @code{exit} value is a Unix-style value and is encoded to a VMS exit
+The @code{exit} value is a Unix-style value and is encoded into a VMS exit
status value when the program exits.
The VMS severity bits will be set based on the @code{exit} value.
A failure is indicated by 1 and VMS sets the @code{ERROR} status.
-A fatal error is indicated by 2 and VMS will set the @code{FATAL} status.
+A fatal error is indicated by 2 and VMS sets the @code{FATAL} status.
All other values will have the @code{SUCCESS} status. The exit value is
encoded to comply with VMS coding standards and will have the
@code{C_FACILITY_NO} of @code{0x350000} with the constant @code{0xA000}
@@ -37644,7 +38125,7 @@ unix_status = (vms_status .and. &x7f8) / 8
A C program that uses @code{exec()} to call @command{gawk} will get the original
Unix-style exit value.
-Older versions of @command{gawk} treated a Unix exit code 0 as 1, a failure
+Older versions of @command{gawk} for VMS treated a Unix exit code 0 as 1, a failure
as 2, a fatal error as 4, and passed all the other numbers through.
This violated the VMS exit status coding requirements.
@@ -37678,8 +38159,8 @@ See @w{@uref{https://sourceforge.net/p/gnv/wiki/InstallingGNVPackages/}.}
The normal build procedure for @command{gawk} produces a program that
is suitable for use with GNV.
-The @file{vms/gawk_build_steps.txt} in the source documents the procedure
-for building a VMS PCSI kit that is compatible with GNV.
+The file @file{vms/gawk_build_steps.txt} in the distribution documents
+the procedure for building a VMS PCSI kit that is compatible with GNV.
@ignore
@c The VMS POSIX product, also known as POSIX for OpenVMS, is long defunct
@@ -37751,8 +38232,8 @@ If you have problems with @command{gawk} or think that you have found a bug,
please report it to the developers; we cannot promise to do anything
but we might well want to fix it.
-Before reporting a bug, make sure you have actually found a real bug.
-Carefully reread the documentation and see if it really says you can do
+Before reporting a bug, please make sure you have really found a genuine bug.
+Carefully reread the documentation and see if it says you can do
what you're trying to do. If it's not clear whether you should be able
to do something or not, report that too; it's a bug in the documentation!
@@ -37770,17 +38251,15 @@ You can get this information with the command @samp{gawk --version}.
@cindex @code{bug-gawk@@gnu.org} bug reporting address
@cindex email address for bug reports, @code{bug-gawk@@gnu.org}
@cindex bug reports, email address, @code{bug-gawk@@gnu.org}
-Once you have a precise problem, send email to
+Once you have a precise problem description, send email to
@EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org}.
-@cindex Robbins, Arnold
The @command{gawk} maintainers subscribe to this address and
thus they will receive your bug report.
-If necessary, the primary maintainer can be reached directly at
-@EMAIL{arnold@@skeeve.com,arnold at skeeve dot com}.
-The bug reporting address is preferred since the
+Although you can send mail to the maintainers directly,
+the bug reporting address is preferred since the
email list is archived at the GNU Project.
-@emph{All email should be in English. This is the only language
+@emph{All email must be in English. This is the only language
understood in common by all the maintainers.}
@cindex @code{comp.lang.awk} newsgroup
@@ -37789,7 +38268,7 @@ Do @emph{not} try to report bugs in @command{gawk} by
posting to the Usenet/Internet newsgroup @code{comp.lang.awk}.
While the @command{gawk} developers do occasionally read this newsgroup,
there is no guarantee that we will see your posting. The steps described
-above are the official recognized ways for reporting bugs.
+above are the only official recognized way for reporting bugs.
Really.
@end quotation
@@ -37801,35 +38280,34 @@ bug reporting system, @emph{please} also send a copy to
This is for two reasons. First, while some distributions forward
bug reports ``upstream'' to the GNU mailing list, many don't, so there is a good
-chance that the @command{gawk} maintainer won't even see the bug report! Second,
+chance that the @command{gawk} maintainers won't even see the bug report! Second,
mail to the GNU list is archived, and having everything at the GNU project
-keeps things self-contained and not dependant on other web sites.
+keeps things self-contained and not dependant on other organizations.
@end quotation
Non-bug suggestions are always welcome as well. If you have questions
about things that are unclear in the documentation or are just obscure
-features, ask me; I will try to help you out, although I
-may not have the time to fix the problem. You can send me electronic
-mail at the Internet address noted previously.
-
-If you find bugs in one of the non-Unix ports of @command{gawk}, please send
-an electronic mail message to the person who maintains that port. They
-are named in the following list, as well as in the @file{README} file
-in the @command{gawk} distribution. Information in the @file{README}
-file should be considered authoritative if it conflicts with this
-@value{DOCUMENT}.
+features, ask on the bug list; we will try to help you out if we can.
-The people maintaining the non-Unix ports of @command{gawk} are
-as follows:
+If you find bugs in one of the non-Unix ports of @command{gawk}, please
+send an electronic mail message to the bug list, with a copy to the
+person who maintains that port. They are named in the following list,
+as well as in the @file{README} file in the @command{gawk} distribution.
+Information in the @file{README} file should be considered authoritative
+if it conflicts with this @value{DOCUMENT}.
+
+The people maintaining the various @command{gawk} ports are:
@c put the index entries outside the table, for docbook
-@cindex Deifik, Scott
-@cindex Zaretskii, Eli
@cindex Buening, Andreas
-@cindex Rankin, Pat
+@cindex Deifik, Scott
@cindex Malmberg, John
@cindex Pitts, Dave
+@cindex Robbins, Arnold
+@cindex Zaretskii, Eli
@multitable {MS-Windows with MinGW} {123456789012345678901234567890123456789001234567890}
+@item Unix and POSIX systems @tab Arnold Robbins, @EMAIL{arnold@@skeeve.com,arnold at skeeve dot com}.
+
@item MS-DOS with DJGPP @tab Scott Deifik, @EMAIL{scottd.mail@@sbcglobal.net,scottd dot mail at sbcglobal dot net}.
@item MS-Windows with MinGW @tab Eli Zaretskii, @EMAIL{eliz@@gnu.org,eliz at gnu dot org}.
@@ -37838,8 +38316,7 @@ as follows:
@c OS/2 is not mentioned anywhere else in the print version though.
@item OS/2 @tab Andreas Buening, @EMAIL{andreas.buening@@nexgo.de,andreas dot buening at nexgo dot de}.
-@item VMS @tab Pat Rankin, @EMAIL{r.pat.rankin@@gmail.com,r.pat.rankin at gmail.com}, and
-John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net}.
+@item VMS @tab John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net}.
@item z/OS (OS/390) @tab Dave Pitts, @EMAIL{dpitts@@cozx.com,dpitts at cozx dot com}.
@end multitable
@@ -37862,11 +38339,22 @@ Date: Wed, 4 Sep 1996 08:11:48 -0700 (PDT)
@end ignore
@cindex Brennan, Michael
+@ifnotdocbook
@quotation
@i{It's kind of fun to put comments like this in your awk code.}@*
@ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course}
@author Michael Brennan
@end quotation
+@end ifnotdocbook
+
+@docbook
+<blockquote><attribution>Michael Brennan</attribution>
+<literallayout><emphasis>It's kind of fun to put comments like this in your awk code.</emphasis>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<literal>// Do C++ comments work? answer: yes! of course</literal></literallayout>
+</blockquote>
+@end docbook
+
+
There are a number of other freely available @command{awk} implementations.
This @value{SECTION} briefly describes where to get them:
@@ -37972,7 +38460,7 @@ since approximately 2003.
@item @command{pawk}
Nelson H.F.@: Beebe at the University of Utah has modified
BWK @command{awk} to provide timing and profiling information.
-It is different from @command{gawk} with the @option{--profile} option.
+It is different from @command{gawk} with the @option{--profile} option
(@pxref{Profiling}),
in that it uses CPU-based profiling, not line-count
profiling. You may find it at either
@@ -37997,7 +38485,7 @@ information, see the @uref{http://busybox.net, project's home page}.
The versions of @command{awk} in @file{/usr/xpg4/bin} and
@file{/usr/xpg6/bin} on Solaris are more-or-less POSIX-compliant.
They are based on the @command{awk} from Mortice Kern Systems for PCs.
-This author was able to make this code compile and work under GNU/Linux
+We were able to make this code compile and work under GNU/Linux
with 1--2 hours of work. Making it more generally portable (using
GNU Autoconf and/or Automake) would take more work, and this
has not been done, at least to our knowledge.
@@ -38173,7 +38661,7 @@ as well as any considerations you should bear in mind.
@appendixsubsec Accessing The @command{gawk} Git Repository
As @command{gawk} is Free Software, the source code is always available.
-@ref{Gawk Distribution}, describes how to get and build the formal,
+@DBREF{Gawk Distribution} describes how to get and build the formal,
released versions of @command{gawk}.
@cindex @command{git} utility
@@ -38224,7 +38712,7 @@ make it possible to include them:
@enumerate 1
@item
Before building the new feature into @command{gawk} itself,
-consider writing it as an extension module
+consider writing it as an extension
(@pxref{Dynamic Extensions}).
If that's not possible, continue with the rest of the steps in this list.
@@ -38469,7 +38957,7 @@ and
@item
Be willing to continue to maintain the port.
Non-Unix operating systems are supported by volunteers who maintain
-the code needed to compile and run @command{gawk} on their systems. If noone
+the code needed to compile and run @command{gawk} on their systems. If no-one
volunteers to maintain a port, it becomes unsupported and it may
be necessary to remove it from the distribution.
@@ -38531,7 +39019,7 @@ the derived files, because that keeps the repository less cluttered,
and it is easier to see the substantive changes when comparing versions
and trying to understand what changed between commits.
-However, there are two reasons why the @command{gawk} maintainer
+However, there are several reasons why the @command{gawk} maintainer
likes to have everything in the repository.
First, because it is then easy to reproduce any given version completely,
@@ -38600,6 +39088,14 @@ the maintainer is no different than Jane User who wants to try to build
Thus, the maintainer thinks that it's not just important, but critical,
that for any given branch, the above incantation @emph{just works}.
+@c Added 9/2014:
+A third reason to have all the files is that without them, using @samp{git
+bisect} to try to find the commit that introduced a bug is exceedingly
+difficult. The maintainer tried to do that on another project that
+requires running bootstrapping scripts just to create @command{configure}
+and so on; it was really painful. When the repository is self-contained,
+using @command{git bisect} in it is very easy.
+
@c So - that's my reasoning and philosophy.
What are some of the consequences and/or actions to take?
@@ -38947,7 +39443,7 @@ Pat Rankin suggested the solution that was adopted.
@appendixsubsec Other Design Decisions
As an arbitrary design decision, extensions can read the values of
-built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot
+predefined variables and arrays (such as @code{ARGV} and @code{FS}), but cannot
change them, with the exception of @code{PROCINFO}.
The reason for this is to prevent an extension function from affecting
@@ -39688,11 +40184,11 @@ See ``Free Documentation License.''
@item Field
When @command{awk} reads an input record, it splits the record into pieces
separated by whitespace (or by a separator regexp that you can
-change by setting the built-in variable @code{FS}). Such pieces are
+change by setting the predefined variable @code{FS}). Such pieces are
called fields. If the pieces are of fixed length, you can use the built-in
variable @code{FIELDWIDTHS} to describe their lengths.
If you wish to specify the contents of fields instead of the field
-separator, you can use the built-in variable @code{FPAT} to do so.
+separator, you can use the predefined variable @code{FPAT} to do so.
(@xref{Field Separators},
@ref{Constant Size},
and
@@ -39711,7 +40207,7 @@ See also ``Double Precision'' and ``Single Precision.''
Format strings control the appearance of output in the
@code{strftime()} and @code{sprintf()} functions, and in the
@code{printf} statement as well. Also, data conversions from numbers to strings
-are controlled by the format strings contained in the built-in variables
+are controlled by the format strings contained in the predefined variables
@code{CONVFMT} and @code{OFMT}. (@xref{Control Letters}.)
@item Free Documentation License
@@ -41390,6 +41886,7 @@ Consistency issues:
Use --foo, not -Wfoo when describing long options
Use "Bell Laboratories", but not "Bell Labs".
Use "behavior" instead of "behaviour".
+ Use "coprocess" instead of "co-process".
Use "zeros" instead of "zeroes".
Use "nonzero" not "non-zero".
Use "runtime" not "run time" or "run-time".
@@ -41494,4 +41991,3 @@ But to use it you have to say
which sorta sucks.
TODO:
------