aboutsummaryrefslogtreecommitdiffstats
path: root/doc/gawk.texi
diff options
context:
space:
mode:
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r--doc/gawk.texi2020
1 files changed, 1341 insertions, 679 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 0d780286..539ea53d 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -40,9 +40,9 @@
@c applies to and all the info about who's publishing this edition
@c These apply across the board.
-@set UPDATE-MONTH February, 2014
+@set UPDATE-MONTH April, 2014
@set VERSION 4.1
-@set PATCHLEVEL 0
+@set PATCHLEVEL 1
@set FSF
@@ -116,11 +116,19 @@
@end ifnottex
@ifnottex
+@ifnotdocbook
@macro ii{text}
@i{\text\}
@end macro
+@end ifnotdocbook
@end ifnottex
+@ifdocbook
+@macro ii{text}
+@inlineraw{docbook,<lineannotation>\text\</lineannotation>}
+@end macro
+@end ifdocbook
+
@c For HTML, spell out email addresses, to avoid problems with
@c address harvesters for spammers.
@ifhtml
@@ -134,9 +142,36 @@
@end macro
@end ifnothtml
+@c Indexing macros
+@ifinfo
+
+@macro cindexawkfunc{name}
+@cindex @code{\name\}
+@end macro
+
+@macro cindexgawkfunc{name}
+@cindex @code{\name\}
+@end macro
+
+@end ifinfo
+
+@ifnotinfo
+
+@macro cindexawkfunc{name}
+@cindex @code{\name\()} function
+@end macro
+
+@macro cindexgawkfunc{name}
+@cindex @code{\name\()} function (@command{gawk})
+@end macro
+@end ifnotinfo
+
@ignore
Some comments on the layout for TeX.
1. Use at least texinfo.tex 2014-01-30.15
+2. When using @docbook, if the last line is part of a paragraph, end
+it with a space and @c so that the lines won't run together. This is a
+quirk of the language / makeinfo, and isn't going to change.
@end ignore
@c merge the function and variable indexes into the concept index
@@ -152,6 +187,10 @@ Some comments on the layout for TeX.
@syncodeindex fn cp
@syncodeindex vr cp
@end ifxml
+@ifdocbook
+@synindex fn cp
+@synindex vr cp
+@end ifdocbook
@c If "finalout" is commented out, the printed output will show
@c black boxes that mark lines that are too long. Thus, it is
@@ -163,10 +202,26 @@ Some comments on the layout for TeX.
@end iftex
@copying
-Copyright @copyright{} 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999,
-2000, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013,
-2014
+@docbook
+<para>Published by:</para>
+
+<literallayout class="normal">Free Software Foundation
+51 Franklin Street, Fifth Floor
+Boston, MA 02110-1301 USA
+Phone: +1-617-542-5942
+Fax: +1-617-542-2652
+Email: <email>gnu@@gnu.org</email>
+URL: <ulink url="http://www.gnu.org">http://www.gnu.org/</ulink></literallayout>
+
+<literallayout class="normal">Copyright &copy; 1989, 1991, 1992, 1993, 1996&ndash;2005, 2007, 2009&ndash;2014
Free Software Foundation, Inc.
+All Rights Reserved.</literallayout>
+@end docbook
+
+@ifnotdocbook
+Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2014 @*
+Free Software Foundation, Inc.
+@end ifnotdocbook
@sp 2
This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}},
@@ -214,6 +269,7 @@ supports it in developing GNU and promoting software freedom.''
@subtitle @value{UPDATE-MONTH}
@author Arnold D. Robbins
+@ifnotdocbook
@c Include the Distribution inside the titlepage environment so
@c that headings are turned off. Headings on and off do not work.
@@ -238,6 +294,7 @@ URL: @uref{http://www.gnu.org/} @*
ISBN 1-882114-28-0 @*
@sp 2
@insertcopying
+@end ifnotdocbook
@end titlepage
@c Thanks to Bob Chassell for directions on doing dedications.
@@ -262,6 +319,18 @@ ISBN 1-882114-28-0 @*
@headings on
@end iftex
+@docbook
+<dedication>
+<simplelist>
+<member>To Miriam, for making me complete.</member>
+<member>To Chana, for the joy you bring us.</member>
+<member>To Rivka, for the exponential increase.</member>
+<member>To Nachum, for the added dimension.</member>
+<member>To Malka, for the new beginning.</member>
+</simplelist>
+</dedication>
+@end docbook
+
@iftex
@headings off
@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
@@ -270,6 +339,7 @@ ISBN 1-882114-28-0 @*
@ifnottex
@ifnotxml
+@ifnotdocbook
@node Top
@top General Introduction
@c Preface node should come right after the Top
@@ -281,6 +351,7 @@ particular records in a file and perform operations upon them.
@insertcopying
+@end ifnotdocbook
@end ifnotxml
@end ifnottex
@@ -742,6 +813,7 @@ particular records in a file and perform operations upon them.
* Extension API Functions Introduction:: Introduction to the API functions.
* General Data Types:: The data types.
* Requesting Values:: How to get a value.
+* Memory Allocation Functions:: Functions for allocating memory.
* Constructor Functions:: Functions for creating values.
* Registration Functions:: Functions to register things with
@command{gawk}.
@@ -804,7 +876,8 @@ particular records in a file and perform operations upon them.
version of @command{awk}.
* POSIX/GNU:: The extensions in @command{gawk} not
in POSIX @command{awk}.
-* Feature History:: The history of the features in @command{gawk}.
+* Feature History:: The history of the features in
+ @command{gawk}.
* Common Extensions:: Common Extensions Summary.
* Ranges and Locales:: How locales used to affect regexp
ranges.
@@ -976,21 +1049,37 @@ and the AWK prototype becomes the product.
The new @command{pgawk} (profiling @command{gawk}), produces
program execution counts.
I recently experimented with an algorithm that for
-@math{n} lines of input, exhibited
+@ifnotdocbook
+@math{n}
+@end ifnotdocbook
+@ifdocbook
+@i{n}
+@end ifdocbook
+lines of input, exhibited
@tex
$\sim\! Cn^2$
@end tex
@ifnottex
+@ifnotdocbook
~ C n^2
+@end ifnotdocbook
@end ifnottex
+@docbook
+<emphasis>&sim; Cn<superscript>2</superscript></emphasis> @c
+@end docbook
performance, while
theory predicted
@tex
$\sim\! Cn\log n$
@end tex
@ifnottex
+@ifnotdocbook
~ C n log n
+@end ifnotdocbook
@end ifnottex
+@docbook
+<emphasis>&sim; Cn log n</emphasis> @c
+@end docbook
behavior. A few minutes poring
over the @file{awkprof.out} profile pinpointed the problem to
a single line of code. @command{pgawk} is a welcome addition to
@@ -1000,6 +1089,7 @@ Arnold has distilled over a decade of experience writing and
using AWK programs, and developing @command{gawk}, into this book. If you use
AWK or want to learn how, then read this book.
+@cindex Brennan, Michael
@display
Michael Brennan
Author of @command{mawk}
@@ -1024,6 +1114,7 @@ Such jobs are often easier with @command{awk}.
The @command{awk} utility interprets a special-purpose programming language
that makes it easy to handle simple data-reformatting jobs.
+@cindex Brian Kernighan's @command{awk}
The GNU implementation of @command{awk} is called @command{gawk}; if you
invoke it with the proper options or environment variables
(@pxref{Options}), it is fully
@@ -1774,7 +1865,7 @@ significant editorial help for this @value{DOCUMENT} for the
3.1 release of @command{gawk}.
@end quotation
-@cindex Beebe, Nelson
+@cindex Beebe, Nelson H.F.@:
@cindex Buening, Andreas
@cindex Collado, Manuel
@cindex Colombo, Antonio
@@ -2067,11 +2158,11 @@ $ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"}
@print{} Don't Panic!
@end example
-@cindex quoting
-@cindex double quote (@code{"})
-@cindex @code{"} (double quote)
-@cindex @code{\} (backslash)
-@cindex backslash (@code{\})
+@cindex shell quoting, double quote
+@cindex double quote (@code{"}) in shell commands
+@cindex @code{"} (double quote) in shell commands
+@cindex @code{\} (backslash) in shell commands
+@cindex backslash (@code{\}) in shell commands
This program does not read any input. The @samp{\} before each of the
inner double quotes is necessary because of the shell's quoting
rules---in particular because it mixes both single quotes and
@@ -2111,8 +2202,7 @@ awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{}
@end example
@cindex @option{-f} option
-@cindex command line, options
-@cindex options, command-line
+@cindex command line, option @option{-f}
The @option{-f} instructs the @command{awk} utility to get the @command{awk} program
from the file @var{source-file}. Any file name can be used for
@var{source-file}. For example, you could put the program:
@@ -2135,7 +2225,7 @@ does the same thing as this one:
awk "BEGIN @{ print \"Don't Panic!\" @}"
@end example
-@cindex quoting
+@cindex quoting in @command{gawk} command lines
@noindent
This was explained earlier
(@pxref{Read Terminal}).
@@ -2146,9 +2236,9 @@ program did not have single quotes around it. The quotes are only needed
for programs that are provided on the @command{awk} command line.
@c STARTOFRANGE sq1x
-@cindex single quote (@code{'})
+@cindex single quote (@code{'}) in @command{gawk} command lines
@c STARTOFRANGE qs2x
-@cindex @code{'} (single quote)
+@cindex @code{'} (single quote) in @command{gawk} command lines
If you want to clearly identify your @command{awk} program files as such,
you can add the extension @file{.awk} to the file name. This doesn't
affect the execution of the @command{awk} program but it does make
@@ -2297,7 +2387,7 @@ programs, but this usually isn't very useful; the purpose of a
comment is to help you or another person understand the program
when reading it at a later time.
-@cindex quoting
+@cindex quoting, for small awk programs
@cindex single quote (@code{'}), vs.@: apostrophe
@cindex @code{'} (single quote), vs.@: apostrophe
@quotation CAUTION
@@ -2338,7 +2428,7 @@ The next @value{SUBSECTION} describes the shell's quoting rules.
@node Quoting
@subsection Shell-Quoting Issues
-@cindex quoting, rules for
+@cindex shell quoting, rules for
@menu
* DOS Quoting:: Quoting in Windows Batch Files.
@@ -2373,10 +2463,10 @@ that character. The shell removes the backslash and passes the quoted
character on to the command.
@item
-@cindex @code{\} (backslash)
-@cindex backslash (@code{\})
-@cindex single quote (@code{'})
-@cindex @code{'} (single quote)
+@cindex @code{\} (backslash), in shell commands
+@cindex backslash (@code{\}), in shell commands
+@cindex single quote (@code{'}), in shell commands
+@cindex @code{'} (single quote), in shell commands
Single quotes protect everything between the opening and closing quotes.
The shell does no interpretation of the quoted text, passing it on verbatim
to the command.
@@ -2386,8 +2476,8 @@ Refer back to
for an example of what happens if you try.
@item
-@cindex double quote (@code{"})
-@cindex @code{"} (double quote)
+@cindex double quote (@code{"}), in shell commands
+@cindex @code{"} (double quote), in shell commands
Double quotes protect most things between the opening and closing quotes.
The shell does at least variable and command substitution on the quoted text.
Different shells may do additional kinds of processing on double-quoted text.
@@ -2424,7 +2514,7 @@ awk -F "" '@var{program}' @var{files} # correct
@end example
@noindent
-@cindex null strings, quoting and
+@cindex null strings in @command{gawk} arguments, quoting and
Don't use this:
@example
@@ -2437,7 +2527,7 @@ as the value of @code{FS}, and the first file name as the text of the program!
This results in syntax errors at best, and confusing behavior at worst.
@end itemize
-@cindex quoting, tricks for
+@cindex quoting in @command{gawk} command lines, tricks for
Mixing single and double quotes is difficult. You have to resort
to shell quoting tricks, like this:
@@ -2552,40 +2642,39 @@ gawk "@{ print \"\042\" $0 \"\042\" @}" @var{file}
@c For gawk >= 4.0, update these data files. No-one has such slow modems!
@cindex input files, examples
-@cindex @code{BBS-list} file
+@cindex @code{mail-list} file
Many of the examples in this @value{DOCUMENT} take their input from two sample
-data files. The first, @file{BBS-list}, represents a list of
-computer bulletin board systems together with information about those systems.
+data files. The first, @file{mail-list}, represents a list of peoples' names
+together with their email addresses and information about those people.
The second data file, called @file{inventory-shipped}, contains
information about monthly shipments. In both files,
each line is considered to be one @dfn{record}.
-In the data file @file{BBS-list}, each record contains the name of a computer
-bulletin board, its phone number, the board's baud rate(s), and a code for
-the number of hours it is operational. An @samp{A} in the last column
-means the board operates 24 hours a day. A @samp{B} in the last
-column means the board only operates on evening and weekend hours.
-A @samp{C} means the board operates only on weekends:
+In the data file @file{mail-list}, each record contains the name of a person,
+his/her phone number, his/her email-address, and a code for their relationship
+with the author of the list. An @samp{A} in the last column
+means that the person is an acquaintance. An @samp{F} in the last
+column means that the person is a friend.
+An @samp{R} means that the person is a relative:
-@c 2e: Update the baud rates to reflect today's faster modems
@example
@c system if test ! -d eg ; then mkdir eg ; fi
@c system if test ! -d eg/lib ; then mkdir eg/lib ; fi
@c system if test ! -d eg/data ; then mkdir eg/data ; fi
@c system if test ! -d eg/prog ; then mkdir eg/prog ; fi
@c system if test ! -d eg/misc ; then mkdir eg/misc ; fi
-@c file eg/data/BBS-list
-aardvark 555-5553 1200/300 B
-alpo-net 555-3412 2400/1200/300 A
-barfly 555-7685 1200/300 A
-bites 555-1675 2400/1200/300 A
-camelot 555-0542 300 C
-core 555-2912 1200/300 C
-fooey 555-1234 2400/1200/300 B
-foot 555-6699 1200/300 B
-macfoo 555-6480 1200/300 A
-sdace 555-3430 2400/1200/300 A
-sabafoo 555-2127 1200/300 C
+@c file eg/data/mail-list
+Amelia 555-5553 amelia.zodiacusque@@gmail.com F
+Anthony 555-3412 anthony.asserturo@@hotmail.com A
+Becky 555-7685 becky.algebrarum@@gmail.com A
+Bill 555-1675 bill.drowning@@hotmail.com A
+Broderick 555-0542 broderick.aliquotiens@@yahoo.com R
+Camilla 555-2912 camilla.infusarum@@skynet.be R
+Fabius 555-1234 fabius.undevicesimus@@ucb.edu F
+Julie 555-6699 julie.perscrutabor@@skeeve.com F
+Martin 555-6480 martin.codicibus@@hotmail.com A
+Samuel 555-3430 samuel.lanceolis@@shu.edu A
+Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R
@c endfile
@end example
@@ -2627,23 +2716,23 @@ in the directory @file{awklib/eg/data}.
@section Some Simple Examples
The following command runs a simple @command{awk} program that searches the
-input file @file{BBS-list} for the character string @samp{foo} (a
+input file @file{mail-list} for the character string @samp{li} (a
grouping of characters is usually called a @dfn{string};
the term @dfn{string} is based on similar usage in English, such
as ``a string of pearls,'' or ``a string of cars in a train''):
@example
-awk '/foo/ @{ print $0 @}' BBS-list
+awk '/li/ @{ print $0 @}' mail-list
@end example
@noindent
-When lines containing @samp{foo} are found, they are printed because
+When lines containing @samp{li} are found, they are printed because
@w{@samp{print $0}} means print the current line. (Just @samp{print} by
itself means the same thing, so we could have written that
instead.)
-You will notice that slashes (@samp{/}) surround the string @samp{foo}
-in the @command{awk} program. The slashes indicate that @samp{foo}
+You will notice that slashes (@samp{/}) surround the string @samp{li}
+in the @command{awk} program. The slashes indicate that @samp{li}
is the pattern to search for. This type of pattern is called a
@dfn{regular expression}, which is covered in more detail later
(@pxref{Regexp}).
@@ -2655,11 +2744,11 @@ interpret any of it as special shell characters.
Here is what this program prints:
@example
-$ @kbd{awk '/foo/ @{ print $0 @}' BBS-list}
-@print{} fooey 555-1234 2400/1200/300 B
-@print{} foot 555-6699 1200/300 B
-@print{} macfoo 555-6480 1200/300 A
-@print{} sabafoo 555-2127 1200/300 C
+$ @kbd{awk '/li/ @{ print $0 @}' mail-list}
+@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F
+@print{} Broderick 555-0542 broderick.aliquotiens@@yahoo.com R
+@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F
+@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A
@end example
@cindex actions, default
@@ -2672,7 +2761,7 @@ action is to print all lines that match the pattern.
@cindex actions, empty
Thus, we could leave out the action (the @code{print} statement and the curly
braces) in the previous example and the result would be the same:
-@command{awk} prints all lines matching the pattern @samp{foo}. By comparison,
+@command{awk} prints all lines matching the pattern @samp{li}. By comparison,
omitting the @code{print} statement but retaining the curly braces makes an
empty action that does nothing (i.e., no lines are printed).
@@ -2817,29 +2906,23 @@ This program prints every line that contains the string
strings, it is printed twice, once by each rule.
This is what happens if we run this program on our two sample data files,
-@file{BBS-list} and @file{inventory-shipped}:
+@file{mail-list} and @file{inventory-shipped}:
@example
$ @kbd{awk '/12/ @{ print $0 @}}
-> @kbd{/21/ @{ print $0 @}' BBS-list inventory-shipped}
-@print{} aardvark 555-5553 1200/300 B
-@print{} alpo-net 555-3412 2400/1200/300 A
-@print{} barfly 555-7685 1200/300 A
-@print{} bites 555-1675 2400/1200/300 A
-@print{} core 555-2912 1200/300 C
-@print{} fooey 555-1234 2400/1200/300 B
-@print{} foot 555-6699 1200/300 B
-@print{} macfoo 555-6480 1200/300 A
-@print{} sdace 555-3430 2400/1200/300 A
-@print{} sabafoo 555-2127 1200/300 C
-@print{} sabafoo 555-2127 1200/300 C
+> @kbd{/21/ @{ print $0 @}' mail-list inventory-shipped}
+@print{} Anthony 555-3412 anthony.asserturo@@hotmail.com A
+@print{} Camilla 555-2912 camilla.infusarum@@skynet.be R
+@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F
+@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R
+@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R
@print{} Jan 21 36 64 620
@print{} Apr 21 70 74 514
@end example
@noindent
-Note how the line beginning with @samp{sabafoo}
-in @file{BBS-list} was printed twice, once for each rule.
+Note how the line beginning with @samp{Jean-Paul}
+in @file{mail-list} was printed twice, once for each rule.
@node More Complex
@section A More Complex Example
@@ -2918,7 +3001,7 @@ separate rule, like this:
@example
awk '/12/ @{ print $0 @}
- /21/ @{ print $0 @}' BBS-list inventory-shipped
+ /21/ @{ print $0 @}' mail-list inventory-shipped
@end example
@cindex @command{gawk}, newlines in
@@ -3033,8 +3116,8 @@ noticed because it is ``hidden'' inside the comment. Thus, the
@code{BEGIN} is noted as a syntax error.
@cindex statements, multiple
-@cindex @code{;} (semicolon)
-@cindex semicolon (@code{;})
+@cindex @code{;} (semicolon), separating statements in actions
+@cindex semicolon (@code{;}), separating statements in actions
When @command{awk} statements within one rule are short, you might want to put
more than one of them on a line. This is accomplished by separating the statements
with a semicolon (@samp{;}).
@@ -3094,6 +3177,7 @@ used once, and thrown away. Because @command{awk} programs are interpreted, you
can avoid the (usually lengthy) compilation part of the typical
edit-compile-test-debug cycle of software development.
+@cindex Brian Kernighan's @command{awk}
Complex programs have been written in @command{awk}, including a complete
retargetable assembler for eight-bit microprocessors (@pxref{Glossary}, for
more information), and a microcode assembler for a special-purpose Prolog
@@ -3156,10 +3240,19 @@ There are two ways to run @command{awk}---with an explicit program or with
one or more program files. Here are templates for both of them; items
enclosed in [@dots{}] in these templates are optional:
+@ifnotdocbook
@example
awk @r{[@var{options}]} -f progfile @r{[@code{--}]} @var{file} @dots{}
awk @r{[@var{options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{}
@end example
+@end ifnotdocbook
+
+@c FIXME - find a better way to mark this up in docbook
+@docbook
+<screen>awk [<replaceable>options</replaceable>] -f progfile [<literal>--</literal>] <replaceable>file</replaceable> &#8230;
+awk [<replaceable>options</replaceable>] [<literal>--</literal>] '<replaceable>program</replaceable>' <replaceable>file</replaceable> &#8230;
+</screen>
+@end docbook
@cindex GNU long options
@cindex long options
@@ -3325,6 +3418,7 @@ Print the short version of the General Public License and then exit.
@itemx --dump-variables@r{[}=@var{file}@r{]}
@cindex @option{-d} option
@cindex @option{--dump-variables} option
+@cindex dump all variables of a program
@cindex @file{awkvars.out} file
@cindex files, @file{awkvars.out}
@cindex variables, global, printing list of
@@ -3478,7 +3572,7 @@ care to search for all occurrences of each inappropriate construct. As
@cindex @option{--bignum} option
Force arbitrary precision arithmetic on numbers. This option has no effect
if @command{gawk} is not compiled to use the GNU MPFR and MP libraries
-(@pxref{Arbitrary Precision Arithmetic}).
+(@pxref{Gawk and MPFR}).
@item -n
@itemx --non-decimal-data
@@ -3731,6 +3825,7 @@ file at all.
@cindex @command{gawk}, @code{ARGIND} variable in
@cindex @code{ARGIND} variable, command-line arguments
+@cindex @code{ARGV} array, indexing into
@cindex @code{ARGC}/@code{ARGV} variables, command-line arguments
All these arguments are made available to your @command{awk} program in the
@code{ARGV} array (@pxref{Built-in Variables}). Command-line options
@@ -3741,6 +3836,7 @@ sets the variable @code{ARGIND} to the index in @code{ARGV} of the
current element.
@cindex input files, variable assignments and
+@cindex variable assignments and input files
The distinction between file name arguments and variable-assignment
arguments is made when @command{awk} is about to open the next input file.
At that point in execution, it checks the file name to see whether
@@ -3818,6 +3914,7 @@ this file name itself.)
@node Environment Variables
@section The Environment Variables @command{gawk} Uses
+@cindex environment variables used by @command{gawk}
A number of environment variables influence how @command{gawk}
behaves.
@@ -3833,8 +3930,7 @@ behaves.
@node AWKPATH Variable
@subsection The @env{AWKPATH} Environment Variable
@cindex @env{AWKPATH} environment variable
-@cindex directories, searching
-@cindex search paths
+@cindex directories, searching for source files
@cindex search paths, for source files
@cindex differences in @command{awk} and @command{gawk}, @code{AWKPATH} environment variable
@ifinfo
@@ -3846,12 +3942,12 @@ implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.
But in @command{gawk}, if the file name supplied to the @option{-f}
or @option{-i} options
-does not contain a @samp{/}, then @command{gawk} searches a list of
+does not contain a directory separator @samp{/}, then @command{gawk} searches a list of
directories (called the @dfn{search path}), one by one, looking for a
file with the specified name.
The search path is a string consisting of directory names
-separated by colons. @command{gawk} gets its search path from the
+separated by colons@footnote{Semicolons on MS-Windows and MS-DOS.}. @command{gawk} gets its search path from the
@env{AWKPATH} environment variable. If that variable does not exist,
@command{gawk} uses a default path,
@samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk}
@@ -3909,8 +4005,7 @@ found, and @command{gawk} no longer needs to use @env{AWKPATH}.
@node AWKLIBPATH Variable
@subsection The @env{AWKLIBPATH} Environment Variable
@cindex @env{AWKLIBPATH} environment variable
-@cindex directories, searching
-@cindex search paths
+@cindex directories, searching for shared libraries
@cindex search paths, for shared libraries
@cindex differences in @command{awk} and @command{gawk}, @code{AWKLIBPATH} environment variable
@@ -4195,7 +4290,6 @@ they will @emph{not} be in the next release).
@c update this section for each release!
-@cindex @code{PROCINFO} array
The process-related special files @file{/dev/pid}, @file{/dev/ppid},
@file{/dev/pgrpid}, and @file{/dev/user} were deprecated in @command{gawk}
3.1, but still worked. As of version 4.0, they are no longer
@@ -4221,6 +4315,7 @@ in case some option becomes obsolete in a future version of @command{gawk}.
@author Obi-Wan
@end quotation
+@cindex shells, sea
This @value{SECTION} intentionally left
blank.
@@ -4233,7 +4328,7 @@ blank.
@table @code
@item -W nostalgia
@itemx --nostalgia
-Print the message @code{"awk: bailing out near line 1"} and dump core.
+Print the message @samp{awk: bailing out near line 1} and dump core.
This option was inspired by the common behavior of very early versions of
Unix @command{awk} and by a t--shirt.
The message is @emph{not} subject to translation in non-English locales.
@@ -4279,7 +4374,7 @@ long-undocumented ``feature'' of Unix @code{awk}.
@node Regexp
@chapter Regular Expressions
-@cindex regexp, See regular expressions
+@cindex regexp
@c STARTOFRANGE regexp
@cindex regular expressions
@@ -4288,8 +4383,8 @@ set of strings.
Because regular expressions are such a fundamental part of @command{awk}
programming, their format and use deserve a separate @value{CHAPTER}.
-@cindex forward slash (@code{/})
-@cindex @code{/} (forward slash)
+@cindex forward slash (@code{/}) to enclose regular expressions
+@cindex @code{/} (forward slash) to enclose regular expressions
A regular expression enclosed in slashes (@samp{/})
is an @command{awk} pattern that matches every input record whose text
belongs to that set.
@@ -4326,14 +4421,14 @@ slashes. Then the regular expression is tested against the
entire text of each record. (Normally, it only needs
to match some part of the text in order to succeed.) For example, the
following prints the second field of each record that contains the string
-@samp{foo} anywhere in it:
+@samp{li} anywhere in it:
@example
-$ @kbd{awk '/foo/ @{ print $2 @}' BBS-list}
-@print{} 555-1234
+$ @kbd{awk '/li/ @{ print $2 @}' mail-list}
+@print{} 555-5553
+@print{} 555-0542
@print{} 555-6699
-@print{} 555-6480
-@print{} 555-2127
+@print{} 555-3430
@end example
@cindex regular expressions, operators
@@ -4345,9 +4440,9 @@ $ @kbd{awk '/foo/ @{ print $2 @}' BBS-list}
@cindex @code{!} (exclamation point), @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator
@c @cindex operators, @code{!~}
-@cindex @code{if} statement
-@cindex @code{while} statement
-@cindex @code{do}-@code{while} statement
+@cindex @code{if} statement, use of regexps in
+@cindex @code{while} statement, use of regexps in
+@cindex @code{do}-@code{while} statement, use of regexps in
@c @cindex statements, @code{if}
@c @cindex statements, @code{while}
@c @cindex statements, @code{do}
@@ -4406,6 +4501,7 @@ $ @kbd{awk '$1 !~ /J/' inventory-shipped}
@end example
@cindex regexp constants
+@cindex constant regexps
@cindex regular expressions, constants, See regexp constants
When a regexp is enclosed in slashes, such as @code{/foo/}, we call it
a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and
@@ -4414,7 +4510,7 @@ a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and
@node Escape Sequences
@section Escape Sequences
-@cindex escape sequences
+@cindex escape sequences, in strings
@cindex backslash (@code{\}), in escape sequences
@cindex @code{\} (backslash), in escape sequences
Some characters cannot be included literally in string constants
@@ -4584,6 +4680,7 @@ leaves what happens as undefined. There are two choices:
@c @cindex automatic warnings
@c @cindex warnings, automatic
+@cindex Brian Kernighan's @command{awk}
@table @asis
@item Strip the backslash out
This is what Brian Kernighan's @command{awk} and @command{gawk} both do.
@@ -4597,6 +4694,7 @@ two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.)
@cindex @command{gawk}, escape sequences
@cindex Unix @command{awk}, backslashes in escape sequences
+@cindex @command{mawk} utility
@item Leave the backslash alone
Some other @command{awk} implementations do this.
In such implementations, typing @code{"a\qc"} is the same as typing
@@ -4625,6 +4723,7 @@ leaves what happens as undefined. There are two choices:
@c @cindex automatic warnings
@c @cindex warnings, automatic
+@cindex Brian Kernighan's @command{awk}
@table @asis
@item Strip the backslash out
This is what Brian Kernighan's @command{awk} and @command{gawk} both do.
@@ -4638,6 +4737,7 @@ two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.)
@cindex @command{gawk}, escape sequences
@cindex Unix @command{awk}, backslashes in escape sequences
+@cindex @command{mawk} utility
@item Leave the backslash alone
Some other @command{awk} implementations do this.
In such implementations, typing @code{"a\qc"} is the same as typing
@@ -4704,6 +4804,7 @@ escape sequences literally when used in regexp constants. Thus,
@section Regular Expression Operators
@c STARTOFRANGE regexpo
@cindex regular expressions, operators
+@cindex metacharacters in regular expressions
You can combine regular expressions with special characters,
called @dfn{regular expression operators} or @dfn{metacharacters}, to
@@ -4722,8 +4823,8 @@ Here is a list of metacharacters. All characters that are not escape
sequences and that are not listed in the table stand for themselves:
@table @code
-@cindex backslash (@code{\})
-@cindex @code{\} (backslash)
+@cindex backslash (@code{\}), regexp operator
+@cindex @code{\} (backslash), regexp operator
@item \
This is used to suppress the special meaning of a character when
matching. For example, @samp{\$}
@@ -4748,8 +4849,8 @@ The condition is not true in the following example:
if ("line1\nLINE 2" ~ /^L/) @dots{}
@end example
-@cindex @code{$} (dollar sign)
-@cindex dollar sign (@code{$})
+@cindex @code{$} (dollar sign), regexp operator
+@cindex dollar sign (@code{$}), regexp operator
@item $
This is similar to @samp{^}, but it matches only at the end of a string.
For example, @samp{p$}
@@ -4761,8 +4862,8 @@ The condition in the following example is not true:
if ("line1\nLINE 2" ~ /1$/) @dots{}
@end example
-@cindex @code{.} (period)
-@cindex period (@code{.})
+@cindex @code{.} (period), regexp operator
+@cindex period (@code{.}), regexp operator
@item . @r{(period)}
This matches any single character,
@emph{including} the newline character. For example, @samp{.P}
@@ -4778,11 +4879,12 @@ character, which is a character with all bits equal to zero.
Otherwise, @sc{nul} is just another character. Other versions of @command{awk}
may not be able to match the @sc{nul} character.
-@cindex @code{[]} (square brackets)
-@cindex square brackets (@code{[]})
+@cindex @code{[]} (square brackets), regexp operator
+@cindex square brackets (@code{[]}), regexp operator
@cindex bracket expressions
@cindex character sets, See Also bracket expressions
@cindex character lists, See bracket expressions
+@cindex character classes, See bracket expressions
@item [@dots{}]
This is called a @dfn{bracket expression}.@footnote{In other literature,
you may see a bracket expression referred to as either a
@@ -4815,8 +4917,8 @@ means it matches any string that starts with @samp{P} or contains a digit.
The alternation applies to the largest possible regexps on either side.
-@cindex @code{()} (parentheses)
-@cindex parentheses @code{()}
+@cindex @code{()} (parentheses), regexp operator
+@cindex parentheses @code{()}, regexp operator
@item (@dots{})
Parentheses are used for grouping in regular expressions, as in
arithmetic. They can be used to concatenate regular expressions
@@ -4844,8 +4946,8 @@ prints every record in @file{sample} containing a string of the form
Notice the escaping of the parentheses by preceding them
with backslashes.
-@cindex @code{+} (plus sign)
-@cindex plus sign (@code{+})
+@cindex @code{+} (plus sign), regexp operator
+@cindex plus sign (@code{+}), regexp operator
@item +
This symbol is similar to @samp{*}, except that the preceding expression must be
matched at least once. This means that @samp{wh+y}
@@ -4858,14 +4960,14 @@ way of writing the last @samp{*} example:
awk '/\(c[ad]+r x\)/ @{ print @}' sample
@end example
-@cindex @code{?} (question mark) regexp operator
-@cindex question mark (@code{?}) regexp operator
+@cindex @code{?} (question mark), regexp operator
+@cindex question mark (@code{?}), regexp operator
@item ?
This symbol is similar to @samp{*}, except that the preceding expression can be
matched either once or not at all. For example, @samp{fe?d}
matches @samp{fed} and @samp{fd}, but nothing else.
-@cindex interval expressions
+@cindex interval expressions, regexp operator
@item @{@var{n}@}
@itemx @{@var{n},@}
@itemx @{@var{n},@var{m}@}
@@ -4942,6 +5044,7 @@ expressions are not available in regular expressions.
@cindex bracket expressions
@cindex bracket expressions, range expressions
@cindex range expressions (regexps)
+@cindex character lists in regular expression
As mentioned earlier, a bracket expression matches any character amongst
those listed between the opening and closing square brackets.
@@ -5181,10 +5284,10 @@ Matches the empty string at the
end of a buffer (string).
@end table
-@cindex @code{^} (caret)
-@cindex caret (@code{^})
-@cindex @code{?} (question mark) regexp operator
-@cindex question mark (@code{?}) regexp operator
+@cindex @code{^} (caret), regexp operator
+@cindex caret (@code{^}), regexp operator
+@cindex @code{?} (question mark), regexp operator
+@cindex question mark (@code{?}), regexp operator
Because @samp{^} and @samp{$} always work in terms of the beginning
and end of strings, these operators don't add any new capabilities
for @command{awk}. They are provided for compatibility with other
@@ -5205,7 +5308,7 @@ lesser of two evils.
@c
@c Should really do this with file inclusion.
@cindex regular expressions, @command{gawk}, command-line options
-@cindex @command{gawk}, command-line options
+@cindex @command{gawk}, command-line options, and regular expressions
The various command-line options
(@pxref{Options})
control how @command{gawk} interprets characters in regexps:
@@ -5228,6 +5331,7 @@ Only POSIX regexps are supported; the GNU operators are not special
(e.g., @samp{\w} matches a literal @samp{w}). Interval expressions
are allowed.
+@cindex Brian Kernighan's @command{awk}
@item @code{--traditional}
Traditional Unix @command{awk} regexps are matched. The GNU operators
are not special, and interval expressions are not available.
@@ -5283,7 +5387,7 @@ This works in any POSIX-compliant @command{awk}.
@cindex tilde (@code{~}), @code{~} operator
@cindex @code{!} (exclamation point), @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator
-@cindex @code{IGNORECASE} variable
+@cindex @code{IGNORECASE} variable, with @code{~} and @code{!~} operators
@cindex @command{gawk}, @code{IGNORECASE} variable in
@c @cindex variables, @code{IGNORECASE}
Another method, specific to @command{gawk}, is to set the variable
@@ -5548,6 +5652,7 @@ occur often in practice, but it's worth noting for future reference.
@chapter Reading Input Files
@c STARTOFRANGE infir
+@cindex reading input files
@cindex input files, reading
@cindex input files
@cindex @code{FILENAME} variable
@@ -5634,68 +5739,79 @@ To do this, use the special @code{BEGIN} pattern
(@pxref{BEGIN/END}).
For example:
-@cindex @code{BEGIN} pattern
@example
-awk 'BEGIN @{ RS = "/" @}
- @{ print $0 @}' BBS-list
+awk 'BEGIN @{ RS = "u" @}
+ @{ print $0 @}' mail-list
@end example
@noindent
-changes the value of @code{RS} to @code{"/"}, before reading any input.
-This is a string whose first character is a slash; as a result, records
-are separated by slashes. Then the input file is read, and the second
+changes the value of @code{RS} to @samp{u}, before reading any input.
+This is a string whose first character is the letter ``u;'' as a result, records
+are separated by the letter ``u.'' Then the input file is read, and the second
rule in the @command{awk} program (the action with no pattern) prints each
record. Because each @code{print} statement adds a newline at the end of
its output, this @command{awk} program copies the input
-with each slash changed to a newline. Here are the results of running
-the program on @file{BBS-list}:
-
-@example
-$ @kbd{awk 'BEGIN @{ RS = "/" @}}
-> @kbd{@{ print $0 @}' BBS-list}
-@print{} aardvark 555-5553 1200
-@print{} 300 B
-@print{} alpo-net 555-3412 2400
-@print{} 1200
-@print{} 300 A
-@print{} barfly 555-7685 1200
-@print{} 300 A
-@print{} bites 555-1675 2400
-@print{} 1200
-@print{} 300 A
-@print{} camelot 555-0542 300 C
-@print{} core 555-2912 1200
-@print{} 300 C
-@print{} fooey 555-1234 2400
-@print{} 1200
-@print{} 300 B
-@print{} foot 555-6699 1200
-@print{} 300 B
-@print{} macfoo 555-6480 1200
-@print{} 300 A
-@print{} sdace 555-3430 2400
-@print{} 1200
-@print{} 300 A
-@print{} sabafoo 555-2127 1200
-@print{} 300 C
-@print{}
+with each @samp{u} changed to a newline. Here are the results of running
+the program on @file{mail-list}:
+
+@example
+$ @kbd{awk 'BEGIN @{ RS = "u" @}}
+> @kbd{@{ print $0 @}' mail-list}
+@print{} Amelia 555-5553 amelia.zodiac
+@print{} sq
+@print{} e@@gmail.com F
+@print{} Anthony 555-3412 anthony.assert
+@print{} ro@@hotmail.com A
+@print{} Becky 555-7685 becky.algebrar
+@print{} m@@gmail.com A
+@print{} Bill 555-1675 bill.drowning@@hotmail.com A
+@print{} Broderick 555-0542 broderick.aliq
+@print{} otiens@@yahoo.com R
+@print{} Camilla 555-2912 camilla.inf
+@print{} sar
+@print{} m@@skynet.be R
+@print{} Fabi
+@print{} s 555-1234 fabi
+@print{} s.
+@print{} ndevicesim
+@print{} s@@
+@print{} cb.ed
+@print{} F
+@print{} J
+@print{} lie 555-6699 j
+@print{} lie.perscr
+@print{} tabor@@skeeve.com F
+@print{} Martin 555-6480 martin.codicib
+@print{} s@@hotmail.com A
+@print{} Sam
+@print{} el 555-3430 sam
+@print{} el.lanceolis@@sh
+@print{} .ed
+@print{} A
+@print{} Jean-Pa
+@print{} l 555-2127 jeanpa
+@print{} l.campanor
+@print{} m@@ny
+@print{} .ed
+@print{} R
+@print{}
@end example
@noindent
-Note that the entry for the @samp{camelot} BBS is not split.
+Note that the entry for the name @samp{Bill} is not split.
In the original data file
(@pxref{Sample Data Files}),
the line looks like this:
@example
-camelot 555-0542 300 C
+Bill 555-1675 bill.drowning@@hotmail.com A
@end example
@noindent
-It has one baud rate only, so there are no slashes in the record,
-unlike the others which have two or more baud rates.
-In fact, this record is treated as part of the record
-for the @samp{core} BBS; the newline separating them in the output
+It contains no @samp{u} so there is no reason to split the record,
+unlike the others which have one or more occurrences of the @samp{u}.
+In fact, this record is treated as part of the previous record;
+the newline separating them in the output
is the original newline in the data file, not the one added by
@command{awk} when it printed the record!
@@ -5706,14 +5822,17 @@ using the variable-assignment feature
(@pxref{Other Arguments}):
@example
-awk '@{ print $0 @}' RS="/" BBS-list
+awk '@{ print $0 @}' RS="u" mail-list
@end example
@noindent
-This sets @code{RS} to @samp{/} before processing @file{BBS-list}.
+This sets @code{RS} to @samp{u} before processing @file{mail-list}.
-Using an unusual character such as @samp{/} for the record separator
-produces correct behavior in the vast majority of cases.
+Using an alphabetic character such as @samp{u} for the record separator
+is highly likely to produce strange results.
+Using an unusual character such as @samp{/} is more likely to
+produce correct behavior in the majority of cases, but there
+are no guarantees. The moral is: Know Your Data.
There is one unusual case, that occurs when @command{gawk} is
being fully POSIX-compliant (@pxref{Options}).
@@ -5735,6 +5854,7 @@ Reaching the end of an input file terminates the current input record,
even if the last character in the file is not the character in @code{RS}.
@value{DARKCORNER}
+@cindex empty strings
@cindex null strings
@cindex strings, empty, See null strings
The empty string @code{""} (a string without any characters)
@@ -5871,7 +5991,7 @@ character as a record separator. However, this is a special case:
@command{mawk} does not allow embedded @sc{nul} characters in strings.
@cindex records, treating files as
-@cindex files, as single records
+@cindex treating files, as single records
The best way to treat a whole file as a single record is to
simply read the file in, one record at a time, concatenating each
record onto the end of the previous ones.
@@ -5922,7 +6042,7 @@ character as a record separator. However, this is a special case:
@command{mawk} does not allow embedded @sc{nul} characters in strings.
@cindex records, treating files as
-@cindex files, as single records
+@cindex treating files, as single records
The best way to treat a whole file as a single record is to
simply read the file in, one record at a time, concatenating each
record onto the end of the previous ones.
@@ -6000,31 +6120,29 @@ when you are not interested in specific fields.
Here are some more examples:
@example
-$ @kbd{awk '$1 ~ /foo/ @{ print $0 @}' BBS-list}
-@print{} fooey 555-1234 2400/1200/300 B
-@print{} foot 555-6699 1200/300 B
-@print{} macfoo 555-6480 1200/300 A
-@print{} sabafoo 555-2127 1200/300 C
+$ @kbd{awk '$1 ~ /li/ @{ print $0 @}' mail-list}
+@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F
+@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F
@end example
@noindent
-This example prints each record in the file @file{BBS-list} whose first
-field contains the string @samp{foo}. The operator @samp{~} is called a
+This example prints each record in the file @file{mail-list} whose first
+field contains the string @samp{li}. The operator @samp{~} is called a
@dfn{matching operator}
(@pxref{Regexp Usage});
it tests whether a string (here, the field @code{$1}) matches a given regular
expression.
By contrast, the following example
-looks for @samp{foo} in @emph{the entire record} and prints the first
+looks for @samp{li} in @emph{the entire record} and prints the first
field and the last field for each matching input record:
@example
-$ @kbd{awk '/foo/ @{ print $1, $NF @}' BBS-list}
-@print{} fooey B
-@print{} foot B
-@print{} macfoo A
-@print{} sabafoo C
+$ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list}
+@print{} Amelia F
+@print{} Broderick R
+@print{} Julie F
+@print{} Samuel A
@end example
@c ENDOFRANGE fiex
@@ -6052,7 +6170,7 @@ the record has fewer than 20 fields, so this prints a blank line.
Here is another example of using expressions as field numbers:
@example
-awk '@{ print $(2*2) @}' BBS-list
+awk '@{ print $(2*2) @}' mail-list
@end example
@command{awk} evaluates the expression @samp{(2*2)} and uses
@@ -6061,8 +6179,8 @@ represents multiplication, so the expression @samp{2*2} evaluates to four.
The parentheses are used so that the multiplication is done before the
@samp{$} operation; they are necessary whenever there is a binary
operator in the field-number expression. This example, then, prints the
-hours of operation (the fourth field) for every line of the file
-@file{BBS-list}. (All of the @command{awk} operators are listed, in
+type of relationship (the fourth field) for every line of the file
+@file{mail-list}. (All of the @command{awk} operators are listed, in
order of decreasing precedence, in
@ref{Precedence}.)
@@ -6504,7 +6622,7 @@ was ignored when finding @code{$1}, it is not part of the new @code{$0}.
Finally, the last @code{print} statement prints the new @code{$0}.
@cindex @code{FS}, containing @code{^}
-@cindex @code{^}, in @code{FS}
+@cindex @code{^} (caret), in @code{FS}
@cindex dark corner, @code{^}, in @code{FS}
There is an additional subtlety to be aware of when using regular expressions
for field splitting.
@@ -6515,6 +6633,7 @@ different @command{awk} versions answer this question differently, and you
should not rely on any specific behavior in your programs.
@value{DARKCORNER}
+@cindex Brian Kernighan's @command{awk}
As a point of information, Brian Kernighan's @command{awk} allows @samp{^}
to match only at the beginning of the record. @command{gawk}
also works this way. For example:
@@ -6558,7 +6677,7 @@ $ @kbd{echo a b | gawk 'BEGIN @{ FS = "" @}}
@end example
@cindex dark corner, @code{FS} as null string
-@cindex FS variable, as null string
+@cindex @code{FS} variable, as null string
Traditionally, the behavior of @code{FS} equal to @code{""} was not defined.
In this case, most versions of Unix @command{awk} simply treat the entire record
as only having one field.
@@ -6570,10 +6689,8 @@ behaves this way.
@node Command Line Field Separator
@subsection Setting @code{FS} from the Command Line
-@cindex @option{-F} option
-@cindex options, command-line
-@cindex command line, options
-@cindex field separators, on command line
+@cindex @option{-F} option, command line
+@cindex field separator, on command line
@cindex command line, @code{FS} on@comma{} setting
@cindex @code{FS} variable, setting from command line
@@ -6623,66 +6740,59 @@ figures that you really want your fields to be separated with TABs and
not @samp{t}s. Use @samp{-v FS="t"} or @samp{-F"[t]"} on the command line
if you really do want to separate your fields with @samp{t}s.
-As an example, let's use an @command{awk} program file called @file{baud.awk}
-that contains the pattern @code{/300/} and the action @samp{print $1}:
+As an example, let's use an @command{awk} program file called @file{edu.awk}
+that contains the pattern @code{/edu/} and the action @samp{print $1}:
@example
-/300/ @{ print $1 @}
+/edu/ @{ print $1 @}
@end example
Let's also set @code{FS} to be the @samp{-} character and run the
-program on the file @file{BBS-list}. The following command prints a
-list of the names of the bulletin boards that operate at 300 baud and
+program on the file @file{mail-list}. The following command prints a
+list of the names of the people that work at or attend a university, and
the first three digits of their phone numbers:
@c tweaked to make the tex output look better in @smallbook
@example
-$ @kbd{awk -F- -f baud.awk BBS-list}
-@print{} aardvark 555
-@print{} alpo
-@print{} barfly 555
-@print{} bites 555
-@print{} camelot 555
-@print{} core 555
-@print{} fooey 555
-@print{} foot 555
-@print{} macfoo 555
-@print{} sdace 555
-@print{} sabafoo 555
+$ @kbd{awk -F- -f edu.awk mail-list}
+@print{} Fabius 555
+@print{} Samuel 555
+@print{} Jean
@end example
@noindent
-Note the second line of output. The second line
+Note the third line of output. The third line
in the original file looked like this:
@example
-alpo-net 555-3412 2400/1200/300 A
+Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R
@end example
-The @samp{-} as part of the system's name was used as the field
+The @samp{-} as part of the person's name was used as the field
separator, instead of the @samp{-} in the phone number that was
originally intended. This demonstrates why you have to be careful in
choosing your field and record separators.
@cindex Unix @command{awk}, password files@comma{} field separators and
-Perhaps the most common use of a single character as the field
-separator occurs when processing the Unix system password file.
-On many Unix systems, each user has a separate entry in the system password
-file, one line per user. The information in these lines is separated
-by colons. The first field is the user's login name and the second is
-the user's (encrypted or shadow) password. A password file entry might look
-like this:
+Perhaps the most common use of a single character as the field separator
+occurs when processing the Unix system password file. On many Unix
+systems, each user has a separate entry in the system password file, one
+line per user. The information in these lines is separated by colons.
+The first field is the user's login name and the second is the user's
+encrypted or shadow password. (A shadow password is indicated by the
+presence of a single @samp{x} in the second field.) A password file
+entry might look like this:
@cindex Robbins, Arnold
@example
-arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/bash
+arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash
@end example
The following program searches the system password file and prints
-the entries for users who have no password:
+the entries for users whose full name is not indicated:
@example
-awk -F: '$2 == ""' /etc/passwd
+awk -F: '$5 == ""' /etc/passwd
@end example
@node Full Line Fields
@@ -6745,7 +6855,7 @@ POSIX standard.)
@cindex POSIX @command{awk}, field separators and
-@cindex field separators, POSIX and
+@cindex field separator, POSIX and
According to the POSIX standard, @command{awk} is supposed to behave
as if each record is split into fields at the time it is read.
In particular, this means that if you change the value of @code{FS}
@@ -6798,7 +6908,7 @@ root:nSijPlPhZZwgE:0:0:Root:/:
@cindex POSIX @command{awk}, field separators and
-@cindex field separators, POSIX and
+@cindex field separator, POSIX and
According to the POSIX standard, @command{awk} is supposed to behave
as if each record is split into fields at the time it is read.
In particular, this means that if you change the value of @code{FS}
@@ -7158,6 +7268,7 @@ available for splitting regular strings (@pxref{String Functions}).
@node Multiple Line
@section Multiple-Line Records
+@cindex multiple-line records
@c STARTOFRANGE recm
@cindex records, multiline
@c STARTOFRANGE imr
@@ -7209,7 +7320,8 @@ after the last record, the final newline is removed from the record.
In the second case, this special processing is not done.
@value{DARKCORNER}
-@cindex field separators, in multiline records
+@cindex field separator, in multiline records
+@cindex @code{FS}, in multiline records
Now that the input is separated into records, the second step is to
separate the fields in the record. One way to do this is to divide each
of the lines into fields in the normal manner. This happens by default
@@ -7357,7 +7469,7 @@ and study the @code{getline} command @emph{after} you have reviewed the
rest of this @value{DOCUMENT} and have a good knowledge of how @command{awk} works.
@cindex @command{gawk}, @code{ERRNO} variable in
-@cindex @code{ERRNO} variable
+@cindex @code{ERRNO} variable, with @command{getline} command
@cindex differences in @command{awk} and @command{gawk}, @code{getline} command
@cindex @code{getline} command, return values
@cindex @option{--sandbox} option, input redirection with @code{getline}
@@ -7453,6 +7565,7 @@ rule in the program. @xref{Next Statement}.
@node Getline/Variable
@subsection Using @code{getline} into a Variable
+@cindex @code{getline} into a variable
@cindex variables, @code{getline} command into@comma{} using
You can use @samp{getline @var{var}} to read the next record from
@@ -7504,6 +7617,7 @@ the value of @code{NF} do not change.
@node Getline/File
@subsection Using @code{getline} from a File
+@cindex @code{getline} from a file
@cindex input redirection
@cindex redirection of input
@cindex @code{<} (left angle bracket), @code{<} operator (I/O)
@@ -7552,8 +7666,6 @@ from the file
@var{file}, and put it in the variable @var{var}. As above, @var{file}
is a string-valued expression that specifies the file from which to read.
-@cindex @command{gawk}, @code{RT} variable in
-@cindex @code{RT} variable
In this version of @code{getline}, none of the built-in variables are
changed and the record is not split into fields. The only variable
changed is @var{var}.@footnote{This is not quite true. @code{RT} could
@@ -7578,7 +7690,6 @@ Note here how the name of the extra input file is not built into
the program; it is taken directly from the data, specifically from the second field on
the @samp{@@include} line.
-@cindex @code{close()} function
The @code{close()} function is called to ensure that if two identical
@samp{@@include} lines appear in the input, the entire specified file is
included twice.
@@ -7595,6 +7706,7 @@ that does handle nested @samp{@@include} statements.
@subsection Using @code{getline} from a Pipe
@c From private email, dated October 2, 1988. Used by permission, March 2013.
+@cindex Kernighan, Brian
@quotation
@i{Omniscience has much to recommend it.
Failing that, attention to details would be useful.}
@@ -7604,7 +7716,7 @@ Failing that, attention to details would be useful.}
@cindex @code{|} (vertical bar), @code{|} operator (I/O)
@cindex vertical bar (@code{|}), @code{|} operator (I/O)
@cindex input pipeline
-@cindex pipes, input
+@cindex pipe, input
@cindex operators, input/output
The output of a command can also be piped into @code{getline}, using
@samp{@var{command} | getline}. In
@@ -7628,7 +7740,6 @@ produced by running the rest of the line as a shell command:
@end example
@noindent
-@cindex @code{close()} function
The @code{close()} function is called to ensure that if two identical
@samp{@@execute} lines appear in the input, the command is run for
each one.
@@ -7682,6 +7793,8 @@ because the concatenation operator is not parenthesized. You should
write it as @samp{(@w{"echo "} "date") | getline} if you want your program
to be portable to all @command{awk} implementations.
+@cindex Brian Kernighan's @command{awk}
+@cindex @command{mawk} utility
@quotation NOTE
Unfortunately, @command{gawk} has not been consistent in its treatment
of a construct like @samp{@w{"echo "} "date" | getline}.
@@ -8006,6 +8119,7 @@ indefinitely until some other process opens it for writing.
@node Command line directories
@section Directories On The Command Line
+@cindex differences in @command{awk} and @command{gawk}, command line directories
@cindex directories, command line
@cindex command line, directories on
@@ -8249,13 +8363,29 @@ program by using a new value of @code{OFS}.
@example
$ @kbd{awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}}
-> @kbd{@{ print $1, $2 @}' BBS-list}
-@print{} aardvark;555-5553
-@print{}
-@print{} alpo-net;555-3412
-@print{}
-@print{} barfly;555-7685
-@dots{}
+> @kbd{@{ print $1, $2 @}' mail-list}
+@print{} Amelia;555-5553
+@print{}
+@print{} Anthony;555-3412
+@print{}
+@print{} Becky;555-7685
+@print{}
+@print{} Bill;555-1675
+@print{}
+@print{} Broderick;555-0542
+@print{}
+@print{} Camilla;555-2912
+@print{}
+@print{} Fabius;555-1234
+@print{}
+@print{} Julie;555-6699
+@print{}
+@print{} Martin;555-6480
+@print{}
+@print{} Samuel;555-3430
+@print{}
+@print{} Jean-Paul;555-2127
+@print{}
@end example
If the value of @code{ORS} does not contain a newline, the program's output
@@ -8277,7 +8407,7 @@ numbers can be formatted. The different format specifications are discussed
more fully in
@ref{Control Letters}.
-@cindex @code{sprintf()} function
+@cindexawkfunc{sprintf}
@cindex @code{OFMT} variable
@cindex output, format specifier@comma{} @code{OFMT}
The built-in variable @code{OFMT} contains the default format specification
@@ -8343,7 +8473,7 @@ parentheses are necessary if any of the item expressions use the @samp{>}
relational operator; otherwise, it can be confused with an output redirection
(@pxref{Redirection}).
-@cindex format strings
+@cindex format specifiers
The difference between @code{printf} and @code{print} is the @var{format}
argument. This is an expression whose value is taken as a string; it
specifies how to output each of the other arguments. It is called the
@@ -8729,30 +8859,30 @@ The following simple example shows
how to use @code{printf} to make an aligned table:
@example
-awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list
@end example
@noindent
This command
-prints the names of the bulletin boards (@code{$1}) in the file
-@file{BBS-list} as a string of 10 characters that are left-justified. It also
+prints the names of the people (@code{$1}) in the file
+@file{mail-list} as a string of 10 characters that are left-justified. It also
prints the phone numbers (@code{$2}) next on the line. This
produces an aligned two-column table of names and phone numbers,
as shown here:
@example
-$ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list}
-@print{} aardvark 555-5553
-@print{} alpo-net 555-3412
-@print{} barfly 555-7685
-@print{} bites 555-1675
-@print{} camelot 555-0542
-@print{} core 555-2912
-@print{} fooey 555-1234
-@print{} foot 555-6699
-@print{} macfoo 555-6480
-@print{} sdace 555-3430
-@print{} sabafoo 555-2127
+$ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list}
+@print{} Amelia 555-5553
+@print{} Anthony 555-3412
+@print{} Becky 555-7685
+@print{} Bill 555-1675
+@print{} Broderick 555-0542
+@print{} Camilla 555-2912
+@print{} Fabius 555-1234
+@print{} Julie 555-6699
+@print{} Martin 555-6480
+@print{} Samuel 555-3430
+@print{} Jean-Paul 555-2127
@end example
In this case, the phone numbers had to be printed as strings because
@@ -8773,7 +8903,7 @@ the @command{awk} program:
@example
awk 'BEGIN @{ print "Name Number"
print "---- ------" @}
- @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+ @{ printf "%-10s %s\n", $1, $2 @}' mail-list
@end example
The above example mixes @code{print} and @code{printf} statements in
@@ -8783,7 +8913,7 @@ same results:
@example
awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number"
printf "%-10s %s\n", "----", "------" @}
- @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+ @{ printf "%-10s %s\n", $1, $2 @}' mail-list
@end example
@noindent
@@ -8798,7 +8928,7 @@ emphasized by storing it in a variable, like this:
awk 'BEGIN @{ format = "%-10s %s\n"
printf format, "Name", "Number"
printf format, "----", "------" @}
- @{ printf format, $1, $2 @}' BBS-list
+ @{ printf format, $1, $2 @}' mail-list
@end example
@c !!! exercise
@@ -8855,20 +8985,20 @@ before the first output is written to it. Subsequent writes to the same
@var{output-file} do not erase @var{output-file}, but append to it.
(This is different from how you use redirections in shell scripts.)
If @var{output-file} does not exist, it is created. For example, here
-is how an @command{awk} program can write a list of BBS names to one
+is how an @command{awk} program can write a list of peoples' names to one
file named @file{name-list}, and a list of phone numbers to another file
named @file{phone-list}:
@example
$ @kbd{awk '@{ print $2 > "phone-list"}
-> @kbd{print $1 > "name-list" @}' BBS-list}
+> @kbd{print $1 > "name-list" @}' mail-list}
$ @kbd{cat phone-list}
@print{} 555-5553
@print{} 555-3412
@dots{}
$ @kbd{cat name-list}
-@print{} aardvark
-@print{} alpo-net
+@print{} Amelia
+@print{} Anthony
@dots{}
@end example
@@ -8886,7 +9016,7 @@ appended to the file.
If @var{output-file} does not exist, then it is created.
@cindex @code{|} (vertical bar), @code{|} operator (I/O)
-@cindex pipes, output
+@cindex pipe, output
@cindex output, pipes
@item print @var{items} | @var{command}
It is possible to send output to another program through a pipe
@@ -8897,7 +9027,7 @@ to another process created to execute @var{command}.
The redirection argument @var{command} is actually an @command{awk}
expression. Its value is converted to a string whose contents give
the shell command to be run. For example, the following produces two
-files, one unsorted list of BBS names, and one list sorted in reverse
+files, one unsorted list of peoples' names, and one list sorted in reverse
alphabetical order:
@ignore
@@ -8910,7 +9040,7 @@ alone for now and let's hope no-one notices.
@example
awk '@{ print $1 > "names.unsorted"
command = "sort -r > names.sorted"
- print $1 | command @}' BBS-list
+ print $1 | command @}' mail-list
@end example
The unsorted list is written with an ordinary redirection, while
@@ -9158,9 +9288,9 @@ has been ported to, not just those that are POSIX-compliant:
@cindex extensions, common@comma{} @code{/dev/stdout} special file
@cindex extensions, common@comma{} @code{/dev/stderr} special file
@cindex file names, standard streams in @command{gawk}
-@cindex @code{/dev/@dots{}} special files (@command{gawk})
+@cindex @code{/dev/@dots{}} special files
@cindex files, @code{/dev/@dots{}} special files
-@cindex @code{/dev/fd/@var{N}} special files
+@cindex @code{/dev/fd/@var{N}} special files (@command{gawk})
@table @file
@item /dev/stdin
The standard input (file descriptor 0).
@@ -9261,7 +9391,7 @@ Doing so results in unpredictable behavior.
@c STARTOFRANGE ofc
@cindex output, files@comma{} closing
@c STARTOFRANGE pc
-@cindex pipes, closing
+@cindex pipe, closing
@c STARTOFRANGE cc
@cindex coprocesses, closing
@cindex @code{getline} command, coprocesses@comma{} using from
@@ -9279,7 +9409,7 @@ the file name or command associated with it, and subsequent
writes to the same file or command are appended to the previous writes.
The file or pipe stays open until @command{awk} exits.
-@cindex @code{close()} function
+@cindexawkfunc{close}
This implies that special steps are necessary in order to read the same
file again from the beginning, or to rerun a shell command (rather than
reading more output from the same command). The @code{close()} function
@@ -9364,6 +9494,7 @@ a separate message.
@cindex differences in @command{awk} and @command{gawk}, @code{close()} function
@cindex portability, @code{close()} function and
+@cindex @code{close()} function, portability
If you use more files than the system allows you to have open,
@command{gawk} attempts to multiplex the available open files among
your data files. @command{gawk}'s ability to do this depends upon the
@@ -9448,7 +9579,7 @@ retval = close(command) # syntax error in many Unix awks
@end example
@cindex @command{gawk}, @code{ERRNO} variable in
-@cindex @code{ERRNO} variable
+@cindex @code{ERRNO} variable, with @command{close()} function
@command{gawk} treats @code{close()} as a function.
The return value is @minus{}1 if the argument names something
that was never opened with a redirection, or if there is
@@ -9504,7 +9635,7 @@ retval = close(command) # syntax error in many Unix awks
@end example
@cindex @command{gawk}, @code{ERRNO} variable in
-@cindex @code{ERRNO} variable
+@cindex @code{ERRNO} variable, with @command{close()} function
@command{gawk} treats @code{close()} as a function.
The return value is @minus{}1 if the argument names something
that was never opened with a redirection, or if there is
@@ -9604,7 +9735,8 @@ have different forms, but are stored identically internally.
@node Scalar Constants
@subsubsection Numeric and String Constants
-@cindex numeric, constants
+@cindex constants, numeric
+@cindex numeric constants
A @dfn{numeric constant} stands for a number. This number can be an
integer, a decimal fraction, or a number in scientific (exponential)
notation.@footnote{The internal representation of all numbers,
@@ -9630,7 +9762,7 @@ double-quotation marks. For example:
@noindent
@cindex differences in @command{awk} and @command{gawk}, strings
-@cindex strings, length of
+@cindex strings, length limitations
represents the string whose contents are @samp{parrot}. Strings in
@command{gawk} can be of any length, and they can contain any of the possible
eight-bit ASCII characters including ASCII @sc{nul} (character code zero).
@@ -9846,9 +9978,9 @@ upon the contents of the current input record.
@cindex differences in @command{awk} and @command{gawk}, regexp constants
@cindex dark corner, regexp constants, as arguments to user-defined functions
-@cindex @code{gensub()} function (@command{gawk})
-@cindex @code{sub()} function
-@cindex @code{gsub()} function
+@cindexgawkfunc{gensub}
+@cindexawkfunc{sub}
+@cindexawkfunc{gsub}
Constant regular expressions are also used as the first argument for
the @code{gensub()}, @code{sub()}, and @code{gsub()} functions, as the
second argument of the @code{match()} function,
@@ -9981,7 +10113,7 @@ its position among the input file arguments---after the processing of the
preceding input file argument. For example:
@example
-awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
+awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list
@end example
@noindent
@@ -9990,10 +10122,10 @@ the first file is read, the command line sets the variable @code{n}
equal to four. This causes the fourth field to be printed in lines from
@file{inventory-shipped}. After the first file has finished,
but before the second file is started, @code{n} is set to two, so that the
-second field is printed in lines from @file{BBS-list}:
+second field is printed in lines from @file{mail-list}:
@example
-$ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list}
+$ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list}
@print{} 15
@print{} 24
@dots{}
@@ -10316,9 +10448,9 @@ specific operator to represent it. Instead, concatenation is performed by
writing expressions next to one another, with no operator. For example:
@example
-$ @kbd{awk '@{ print "Field number one: " $1 @}' BBS-list}
-@print{} Field number one: aardvark
-@print{} Field number one: alpo-net
+$ @kbd{awk '@{ print "Field number one: " $1 @}' mail-list}
+@print{} Field number one: Amelia
+@print{} Field number one: Anthony
@dots{}
@end example
@@ -10326,9 +10458,9 @@ Without the space in the string constant after the @samp{:}, the line
runs together. For example:
@example
-$ @kbd{awk '@{ print "Field number one:" $1 @}' BBS-list}
-@print{} Field number one:aardvark
-@print{} Field number one:alpo-net
+$ @kbd{awk '@{ print "Field number one:" $1 @}' mail-list}
+@print{} Field number one:Amelia
+@print{} Field number one:Anthony
@dots{}
@end example
@@ -10345,6 +10477,8 @@ name = "name"
print "something meaningful" > file name
@end example
+@cindex Brian Kernighan's @command{awk}
+@cindex @command{mawk} utility
@noindent
This produces a syntax error with some versions of Unix
@command{awk}.@footnote{It happens that Brian Kernighan's
@@ -11401,10 +11535,10 @@ The Boolean operators are:
@item @var{boolean1} && @var{boolean2}
True if both @var{boolean1} and @var{boolean2} are true. For example,
the following statement prints the current input record if it contains
-both @samp{2400} and @samp{foo}:
+both @samp{edu} and @samp{li}:
@example
-if ($0 ~ /2400/ && $0 ~ /foo/) print
+if ($0 ~ /edu/ && $0 ~ /li/) print
@end example
@cindex side effects, Boolean operators
@@ -11417,11 +11551,11 @@ no substring @samp{foo} in the record.
@item @var{boolean1} || @var{boolean2}
True if at least one of @var{boolean1} or @var{boolean2} is true.
For example, the following statement prints all records in the input
-that contain @emph{either} @samp{2400} or
-@samp{foo} or both:
+that contain @emph{either} @samp{edu} or
+@samp{li} or both:
@example
-if ($0 ~ /2400/ || $0 ~ /foo/) print
+if ($0 ~ /edu/ || $0 ~ /li/) print
@end example
The subexpression @var{boolean2} is evaluated only if @var{boolean1}
@@ -12016,7 +12150,7 @@ slashes (@code{/@var{regexp}/}), or any expression whose string value
is used as a dynamic regular expression
(@pxref{Computed Regexps}).
The following example prints the second field of each input record
-whose first field is precisely @samp{foo}:
+whose first field is precisely @samp{li}:
@cindex @code{/} (forward slash), patterns and
@cindex forward slash (@code{/}), patterns and
@@ -12025,68 +12159,65 @@ whose first field is precisely @samp{foo}:
@cindex @code{!} (exclamation point), @code{!~} operator
@cindex exclamation point (@code{!}), @code{!~} operator
@example
-$ @kbd{awk '$1 == "foo" @{ print $2 @}' BBS-list}
+$ @kbd{awk '$1 == "li" @{ print $2 @}' mail-list}
@end example
@noindent
-(There is no output, because there is no BBS site with the exact name @samp{foo}.)
+(There is no output, because there is no person with the exact name @samp{li}.)
Contrast this with the following regular expression match, which
-accepts any record with a first field that contains @samp{foo}:
+accepts any record with a first field that contains @samp{li}:
@example
-$ @kbd{awk '$1 ~ /foo/ @{ print $2 @}' BBS-list}
-@print{} 555-1234
+$ @kbd{awk '$1 ~ /foo/ @{ print $2 @}' mail-list}
+@print{} 555-5553
@print{} 555-6699
-@print{} 555-6480
-@print{} 555-2127
@end example
@cindex regexp constants, as patterns
@cindex patterns, regexp constants as
A regexp constant as a pattern is also a special case of an expression
-pattern. The expression @code{/foo/} has the value one if @samp{foo}
-appears in the current input record. Thus, as a pattern, @code{/foo/}
-matches any record containing @samp{foo}.
+pattern. The expression @code{/li/} has the value one if @samp{li}
+appears in the current input record. Thus, as a pattern, @code{/li/}
+matches any record containing @samp{li}.
@cindex Boolean expressions, as patterns
Boolean expressions are also commonly used as patterns.
Whether the pattern
matches an input record depends on whether its subexpressions match.
For example, the following command prints all the records in
-@file{BBS-list} that contain both @samp{2400} and @samp{foo}:
+@file{mail-list} that contain both @samp{edu} and @samp{li}:
@example
-$ @kbd{awk '/2400/ && /foo/' BBS-list}
-@print{} fooey 555-1234 2400/1200/300 B
+$ @kbd{awk '/edu/ && /li/' mail-list}
+@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A
@end example
The following command prints all records in
-@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}
+@file{mail-list} that contain @emph{either} @samp{edu} or @samp{li}
(or both, of course):
@example
-$ @kbd{awk '/2400/ || /foo/' BBS-list}
-@print{} alpo-net 555-3412 2400/1200/300 A
-@print{} bites 555-1675 2400/1200/300 A
-@print{} fooey 555-1234 2400/1200/300 B
-@print{} foot 555-6699 1200/300 B
-@print{} macfoo 555-6480 1200/300 A
-@print{} sdace 555-3430 2400/1200/300 A
-@print{} sabafoo 555-2127 1200/300 C
+$ @kbd{awk '/edu/ || /li/' mail-list}
+@print{} Amelia 555-5553 amelia.zodiacusque@@gmail.com F
+@print{} Broderick 555-0542 broderick.aliquotiens@@yahoo.com R
+@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F
+@print{} Julie 555-6699 julie.perscrutabor@@skeeve.com F
+@print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A
+@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R
@end example
The following command prints all records in
-@file{BBS-list} that do @emph{not} contain the string @samp{foo}:
+@file{mail-list} that do @emph{not} contain the string @samp{li}:
@example
-$ @kbd{awk '! /foo/' BBS-list}
-@print{} aardvark 555-5553 1200/300 B
-@print{} alpo-net 555-3412 2400/1200/300 A
-@print{} barfly 555-7685 1200/300 A
-@print{} bites 555-1675 2400/1200/300 A
-@print{} camelot 555-0542 300 C
-@print{} core 555-2912 1200/300 C
-@print{} sdace 555-3430 2400/1200/300 A
+$ @kbd{awk '! /li/' mail-list}
+@print{} Anthony 555-3412 anthony.asserturo@@hotmail.com A
+@print{} Becky 555-7685 becky.algebrarum@@gmail.com A
+@print{} Bill 555-1675 bill.drowning@@hotmail.com A
+@print{} Camilla 555-2912 camilla.infusarum@@skynet.be R
+@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F
+@print{} Martin 555-6480 martin.codicibus@@hotmail.com A
+@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R
@end example
@cindex @code{BEGIN} pattern, Boolean patterns and
@@ -12190,6 +12321,11 @@ $ @kbd{echo Yes | gawk '(/1/,/2/) || /Yes/'}
@error{} gawk: cmd. line:1: ^ syntax error
@end example
+@cindex range patterns, line continuation and
+As a minor point of interest, although it is poor style,
+POSIX allows you to put a newline after the comma in
+a range pattern. @value{DARKCORNER}
+
@node BEGIN/END
@subsection The @code{BEGIN} and @code{END} Special Patterns
@@ -12214,28 +12350,30 @@ programmers.
@node Using BEGIN/END
@subsubsection Startup and Cleanup Actions
+@cindex @code{BEGIN} pattern
+@cindex @code{END} pattern
A @code{BEGIN} rule is executed once only, before the first input record
is read. Likewise, an @code{END} rule is executed once only, after all the
input is read. For example:
@example
$ @kbd{awk '}
-> @kbd{BEGIN @{ print "Analysis of \"foo\"" @}}
-> @kbd{/foo/ @{ ++n @}}
-> @kbd{END @{ print "\"foo\" appears", n, "times." @}' BBS-list}
-@print{} Analysis of "foo"
-@print{} "foo" appears 4 times.
+> @kbd{BEGIN @{ print "Analysis of \"li\"" @}}
+> @kbd{/li/ @{ ++n @}}
+> @kbd{END @{ print "\"li\" appears in", n, "records." @}' mail-list}
+@print{} Analysis of "li"
+@print{} "li" appears in 4 records.
@end example
@cindex @code{BEGIN} pattern, operators and
@cindex @code{END} pattern, operators and
-This program finds the number of records in the input file @file{BBS-list}
-that contain the string @samp{foo}. The @code{BEGIN} rule prints a title
+This program finds the number of records in the input file @file{mail-list}
+that contain the string @samp{li}. The @code{BEGIN} rule prints a title
for the report. There is no need to use the @code{BEGIN} rule to
initialize the counter @code{n} to zero, since @command{awk} does this
automatically (@pxref{Variables}).
The second rule increments the variable @code{n} every time a
-record containing the pattern @samp{foo} is read. The @code{END} rule
+record containing the pattern @samp{li} is read. The @code{END} rule
prints the value of @code{n} at the end of the run.
The special patterns @code{BEGIN} and @code{END} cannot be used in ranges
@@ -12288,6 +12426,7 @@ to give @code{$0} a real value is to execute a @code{getline} command
without a variable (@pxref{Getline}).
Another way is simply to assign a value to @code{$0}.
+@cindex Brian Kernighan's @command{awk}
@cindex differences in @command{awk} and @command{gawk}, @code{BEGIN}/@code{END} patterns
@cindex POSIX @command{awk}, @code{BEGIN}/@code{END} patterns
@cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and
@@ -12356,7 +12495,7 @@ you can bypass the fatal error and move on to the next file on the
command line.
@cindex @command{gawk}, @code{ERRNO} variable in
-@cindex @code{ERRNO} variable
+@cindex @code{ERRNO} variable, with @code{BEGINFILE} pattern
@cindex @code{nextfile} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and
You do this by checking if the @code{ERRNO} variable is not the empty
string; if so, then @command{gawk} was not able to open the file. In
@@ -12398,7 +12537,7 @@ both @code{BEGINFILE} and @code{ENDFILE}. Only the @samp{getline
In most other @command{awk} implementations, or if @command{gawk} is in
compatibility mode (@pxref{Options}), they are not special.
-@c FIXME: For 4.1 maybe deal with this?
+@c FIXME: For 4.2 maybe deal with this?
@ignore
Date: Tue, 17 May 2011 02:06:10 PDT
From: rankin@pactechdata.com (Pat Rankin)
@@ -12429,7 +12568,7 @@ An empty (i.e., nonexistent) pattern is considered to match @emph{every}
input record. For example, the program:
@example
-awk '@{ print $1 @}' BBS-list
+awk '@{ print $1 @}' mail-list
@end example
@noindent
@@ -12682,6 +12821,7 @@ the first thing on its line.
@subsection The @code{while} Statement
@cindex @code{while} statement
@cindex loops
+@cindex loops, @code{while}
@cindex loops, See Also @code{while} statement
In programming, a @dfn{loop} is a part of a program that can
@@ -12742,6 +12882,7 @@ program is harder to read without it.
@node Do Statement
@subsection The @code{do}-@code{while} Statement
@cindex @code{do}-@code{while} statement
+@cindex loops, @code{do}-@code{while}
The @code{do} loop is a variation of the @code{while} looping statement.
The @code{do} loop executes the @var{body} once and then repeats the
@@ -12787,6 +12928,7 @@ occasionally is there a real use for a @code{do} statement.
@node For Statement
@subsection The @code{for} Statement
@cindex @code{for} statement
+@cindex loops, @code{for}, iterative
The @code{for} statement makes it more convenient to count iterations of a
loop. The general form of the @code{for} statement looks like this:
@@ -12893,6 +13035,8 @@ for more information on this version of the @code{for} loop.
@cindex @code{case} keyword
@cindex @code{default} keyword
+This @value{SECTION} describes a @command{gawk}-specific feature.
+
The @code{switch} statement allows the evaluation of an expression and
the execution of statements based on a @code{case} match. Case statements
are checked for a match in the order they are defined. If no suitable
@@ -12957,6 +13101,7 @@ it is not available.
@subsection The @code{break} Statement
@cindex @code{break} statement
@cindex loops, exiting
+@cindex loops, @code{break} statement and
The @code{break} statement jumps out of the innermost @code{for},
@code{while}, or @code{do} loop that encloses it. The following example
@@ -13016,6 +13161,7 @@ This is discussed in @ref{Switch Statement}.
@cindex POSIX @command{awk}, @code{break} statement and
@cindex dark corner, @code{break} statement
@cindex @command{gawk}, @code{break} statement in
+@cindex Brian Kernighan's @command{awk}
The @code{break} statement has no meaning when
used outside the body of a loop or @code{switch}.
However, although it was never documented,
@@ -13080,6 +13226,7 @@ This program loops forever once @code{x} reaches 5.
@cindex POSIX @command{awk}, @code{continue} statement and
@cindex dark corner, @code{continue} statement
@cindex @command{gawk}, @code{continue} statement in
+@cindex Brian Kernighan's @command{awk}
The @code{continue} statement has no special meaning with respect to the
@code{switch} statement, nor does it have any meaning when used outside the
body of a loop. Historical versions of @command{awk} treated a @code{continue}
@@ -13217,8 +13364,10 @@ See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}.
@cindex functions, user-defined, @code{next}/@code{nextfile} statements and
@cindex @code{nextfile} statement, user-defined functions and
-The current version of the Brian Kernighan's @command{awk} (@pxref{Other
-Versions}) also supports @code{nextfile}. However, it doesn't allow the
+@cindex Brian Kernighan's @command{awk}
+@cindex @command{mawk} utility
+The current version of the Brian Kernighan's @command{awk}, and @command{mawk} (@pxref{Other
+Versions}) also support @code{nextfile}. However, they don't allow the
@code{nextfile} statement inside function bodies (@pxref{User-defined}).
@command{gawk} does; a @code{nextfile} inside a function body reads the
next record and starts processing it with the first rule in the program,
@@ -13455,8 +13604,8 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment.
@cindex @command{gawk}, @code{IGNORECASE} variable in
@cindex @code{IGNORECASE} variable
@cindex differences in @command{awk} and @command{gawk}, @code{IGNORECASE} variable
-@cindex case sensitivity, string comparisons and
-@cindex case sensitivity, regexps and
+@cindex case sensitivity, and string comparisons
+@cindex case sensitivity, and regexps
@cindex regular expressions, case sensitivity
@item IGNORECASE #
If @code{IGNORECASE} is nonzero or non-null, then all string comparisons
@@ -13621,16 +13770,16 @@ In the following example:
$ @kbd{awk 'BEGIN @{}
> @kbd{for (i = 0; i < ARGC; i++)}
> @kbd{print ARGV[i]}
-> @kbd{@}' inventory-shipped BBS-list}
+> @kbd{@}' inventory-shipped mail-list}
@print{} awk
@print{} inventory-shipped
-@print{} BBS-list
+@print{} mail-list
@end example
@noindent
@code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]}
contains @samp{inventory-shipped}, and @code{ARGV[2]} contains
-@samp{BBS-list}. The value of @code{ARGC} is three, one more than the
+@samp{mail-list}. The value of @code{ARGC} is three, one more than the
index of the last element in @code{ARGV}, because the elements are numbered
from zero.
@@ -13673,7 +13822,7 @@ or if @command{gawk} is in compatibility mode
it is not special.
@cindex @code{ENVIRON} array
-@cindex environment variables
+@cindex environment variables, in @code{ENVIRON} array
@item ENVIRON
An associative array containing the values of the environment. The array
indices are the environment variable names; the elements are the values of
@@ -13796,10 +13945,12 @@ The following elements (listed alphabetically)
are guaranteed to be available:
@table @code
+@cindex effective group ID of @command{gawk} user
@item PROCINFO["egid"]
The value of the @code{getegid()} system call.
@item PROCINFO["euid"]
+@cindex effective user ID of @command{gawk} user
The value of the @code{geteuid()} system call.
@item PROCINFO["FS"]
@@ -13809,6 +13960,7 @@ This is
or @code{"FPAT"} if field matching with @code{FPAT} is in effect.
@item PROCINFO["identifiers"]
+@cindex program identifiers
A subarray, indexed by the names of all identifiers used in the
text of the AWK program. For each identifier, the value of the element is one of the following:
@@ -13837,15 +13989,19 @@ after it has finished parsing the program; they are @emph{not} updated
while the program runs.
@item PROCINFO["gid"]
+@cindex group ID of @command{gawk} user
The value of the @code{getgid()} system call.
@item PROCINFO["pgrpid"]
+@cindex process group idIDof @command{gawk} process
The process group ID of the current process.
@item PROCINFO["pid"]
+@cindex process ID of @command{gawk} process
The process ID of the current process.
@item PROCINFO["ppid"]
+@cindex parent process ID of @command{gawk} process
The parent process ID of the current process.
@item PROCINFO["sorted_in"]
@@ -13865,25 +14021,31 @@ Assigning a new value to this element changes the default.
The value of the @code{getuid()} system call.
@item PROCINFO["version"]
+@cindex version of @command{gawk}
+@cindex @command{gawk} version
The version of @command{gawk}.
@end table
The following additional elements in the array
are available to provide information about the MPFR and GMP libraries
if your version of @command{gawk} supports arbitrary precision numbers
-(@pxref{Arbitrary Precision Arithmetic}):
+(@pxref{Gawk and MPFR}):
@table @code
+@cindex version of GNU MPFR library
@item PROCINFO["mpfr_version"]
The version of the GNU MPFR library.
@item PROCINFO["gmp_version"]
+@cindex version of GNU MP library
The version of the GNU MP library.
@item PROCINFO["prec_max"]
+@cindex maximum precision supported by MPFR library
The maximum precision supported by MPFR.
@item PROCINFO["prec_min"]
+@cindex minimum precision supported by MPFR library
The minimum precision required by MPFR.
@end table
@@ -13894,12 +14056,15 @@ of @command{gawk} supports dynamic loading of extension functions
@table @code
@item PROCINFO["api_major"]
+@cindex version of @command{gawk} extension API
+@cindex extension API, version number
The major version of the extension API.
@item PROCINFO["api_minor"]
The minor version of the extension API.
@end table
+@cindex supplementary groups of @command{gawk} process
On some systems, there may be elements in the array, @code{"group1"}
through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of
supplementary groups that the process has. Use the @code{in} operator
@@ -13907,7 +14072,7 @@ to test for these elements
(@pxref{Reference to Elements}).
@cindex @command{gawk}, @code{PROCINFO} array in
-@cindex @code{PROCINFO} array
+@cindex @code{PROCINFO} array, uses
The @code{PROCINFO} array has the following additional uses:
@itemize @bullet
@@ -13979,7 +14144,7 @@ if an element in @code{SYMTAB} is an array.
Also, you may not use the @code{delete} statement with the
@code{SYMTAB} array.
-You may use an index for @code{SYMTAB} that is not a predefined identifer:
+You may use an index for @code{SYMTAB} that is not a predefined identifier:
@example
SYMTAB["xxx"] = 5
@@ -14093,7 +14258,7 @@ changed.
@node ARGC and ARGV
@subsection Using @code{ARGC} and @code{ARGV}
-@cindex @code{ARGC}/@code{ARGV} variables
+@cindex @code{ARGC}/@code{ARGV} variables, how to use
@cindex arguments, command-line
@cindex command line, arguments
@@ -14105,16 +14270,16 @@ and @code{ARGV}:
$ @kbd{awk 'BEGIN @{}
> @kbd{for (i = 0; i < ARGC; i++)}
> @kbd{print ARGV[i]}
-> @kbd{@}' inventory-shipped BBS-list}
+> @kbd{@}' inventory-shipped mail-list}
@print{} awk
@print{} inventory-shipped
-@print{} BBS-list
+@print{} mail-list
@end example
@noindent
In this example, @code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]}
contains @samp{inventory-shipped}, and @code{ARGV[2]} contains
-@samp{BBS-list}.
+@samp{mail-list}.
Notice that the @command{awk} program is not entered in @code{ARGV}. The
other command-line options, with their arguments, are also not
entered. This includes variable assignments done with the @option{-v}
@@ -14238,7 +14403,7 @@ ability to support true multidimensional arrays.
@cindex variables, names of
@cindex functions, names of
-@cindex arrays, names of
+@cindex arrays, names of, and names of functions/variables
@cindex names, arrays/variables
@cindex namespace issues
@command{awk} maintains a single set
@@ -14414,10 +14579,9 @@ Here, the number @code{1} isn't double-quoted, since @command{awk}
automatically converts it to a string.
@cindex @command{gawk}, @code{IGNORECASE} variable in
-@cindex @code{IGNORECASE} variable
@cindex case sensitivity, array indices and
-@cindex arrays, @code{IGNORECASE} variable and
-@cindex @code{IGNORECASE} variable, array subscripts and
+@cindex arrays, and @code{IGNORECASE} variable
+@cindex @code{IGNORECASE} variable, and array indices
The value of @code{IGNORECASE} has no effect upon array subscripting.
The identical string value used to store an array element must be used
to retrieve it.
@@ -14433,8 +14597,9 @@ is independent of the number of elements in the array.
@node Reference to Elements
@subsection Referring to an Array Element
-@cindex arrays, elements, referencing
-@cindex elements in arrays
+@cindex arrays, referencing elements
+@cindex array members
+@cindex elements of arrays
The principal way to use an array is to refer to one of its elements.
An array reference is an expression as follows:
@@ -14451,11 +14616,16 @@ The value of the array reference is the current value of that array
element. For example, @code{foo[4.3]} is an expression for the element
of array @code{foo} at index @samp{4.3}.
+@cindex arrays, unassigned elements
+@cindex unassigned array elements
+@cindex empty array elements
A reference to an array element that has no recorded value yields a value of
@code{""}, the null string. This includes elements
that have not been assigned any value as well as elements that have been
deleted (@pxref{Delete}).
+@cindex non-existent array elements
+@cindex arrays, elements that don't exist
@quotation NOTE
A reference to an element that does not exist @emph{automatically} creates
that array element, with the null string as its value. (In some cases,
@@ -14475,7 +14645,7 @@ if it didn't exist before!
@end quotation
@c @cindex arrays, @code{in} operator and
-@cindex @code{in} operator
+@cindex @code{in} operator, testing if array element exists
To determine whether an element exists in an array at a certain index, use
the following expression:
@@ -14510,8 +14680,8 @@ if (frequencies[2] != "")
@node Assigning Elements
@subsection Assigning Array Elements
-@cindex arrays, elements, assigning
-@cindex elements in arrays, assigning
+@cindex arrays, elements, assigning values
+@cindex elements in arrays, assigning values
Array elements can be assigned values just like
@command{awk} variables:
@@ -14528,6 +14698,7 @@ assign to that element of the array.
@node Array Example
@subsection Basic Array Example
+@cindex arrays, an example of using
The following program takes a list of lines, each beginning with a line
number, and prints them out in order of line number. The line numbers
@@ -14597,7 +14768,9 @@ END @{
@node Scanning an Array
@subsection Scanning All Elements of an Array
@cindex elements in arrays, scanning
+@cindex scanning arrays
@cindex arrays, scanning
+@cindex loops, @code{for}, array scanning
In programs that use arrays, it is often necessary to use a loop that
executes once for each element of an array. In other languages, where
@@ -14614,7 +14787,7 @@ for (@var{var} in @var{array})
@end example
@noindent
-@cindex @code{in} operator
+@cindex @code{in} operator, use in loops
This loop executes @var{body} once for each index in @var{array} that the
program has previously used, with the variable @var{var} set to that index.
@@ -14653,8 +14826,9 @@ END @{
@xref{Word Sorting},
for a more detailed example of this type.
-@cindex arrays, elements, order of
-@cindex elements in arrays, order of
+@cindex arrays, elements, order of access by @code{in} operator
+@cindex elements in arrays, order of access by @code{in} operator
+@cindex @code{in} operator, order of array access
The order in which elements of the array are accessed by this statement
is determined by the internal arrangement of the array elements within
@command{awk} and normally cannot be controlled or changed. This can lead to
@@ -14672,6 +14846,8 @@ determines the order in which the array is traversed.
This order is usually based on the internal implementation of arrays
and will vary from one version of @command{awk} to the next.
+@cindex array scanning order, controlling
+@cindex controlling array scanning order
Often, though, you may wish to do something simple, such as
``traverse the array by comparing the indices in ascending order,''
or ``traverse the array by comparing the values in descending order.''
@@ -14688,6 +14864,7 @@ to use for comparison of array elements. This advanced feature
is described later, in @ref{Array Sorting}.
@end itemize
+@cindex @code{PROCINFO}, values of @code{sorted_in}
The following special values for @code{PROCINFO["sorted_in"]} are available:
@table @code
@@ -14848,7 +15025,7 @@ if (4 in foo)
print "This will never be printed"
@end example
-@cindex null strings, array elements and
+@cindex null strings, and deleting array elements
It is important to note that deleting an element is @emph{not} the
same as assigning it a null value (the empty string, @code{""}).
For example:
@@ -14870,6 +15047,7 @@ is not in the array is deleted.
@cindex extensions, common@comma{} @code{delete} to delete entire arrays
@cindex arrays, deleting entire contents
@cindex deleting entire arrays
+@cindex @code{delete} @var{array}
@cindex differences in @command{awk} and @command{gawk}, array elements, deleting
All the elements of an array may be deleted with a single statement
by leaving off the subscript in the @code{delete} statement,
@@ -14884,6 +15062,7 @@ Using this version of the @code{delete} statement is about three times
more efficient than the equivalent loop that deletes each element one
at a time.
+@cindex Brian Kernighan's @command{awk}
@quotation NOTE
For many years,
using @code{delete} without a subscript was a @command{gawk} extension.
@@ -14926,9 +15105,9 @@ a = 3
@section Using Numbers to Subscript Arrays
@cindex numbers, as array subscripts
-@cindex arrays, subscripts
+@cindex arrays, numeric subscripts
@cindex subscripts in arrays, numbers as
-@cindex @code{CONVFMT} variable, array subscripts and
+@cindex @code{CONVFMT} variable, and array subscripts
An important aspect to remember about arrays is that @emph{array subscripts
are always strings}. When a numeric value is used as a subscript,
it is converted to a string value before being used for subscripting
@@ -14958,7 +15137,8 @@ string value from @code{xyz}---this time @code{"12.15"}---because the value of
@code{CONVFMT} only allows two significant digits. This test fails,
since @code{"12.15"} is different from @code{"12.153"}.
-@cindex converting, during subscripting
+@cindex converting integer array subscripts
+@cindex integer array indices
According to the rules for conversions
(@pxref{Conversion}), integer
values are always converted to strings as integers, no matter what the
@@ -15064,7 +15244,7 @@ languages, including @command{awk}) to refer to an element of a
two-dimensional array named @code{grid} is with
@code{grid[@var{x},@var{y}]}.
-@cindex @code{SUBSEP} variable, multidimensional arrays
+@cindex @code{SUBSEP} variable, and multidimensional arrays
Multidimensional arrays are supported in @command{awk} through
concatenation of indices into one string.
@command{awk} converts the indices into strings
@@ -15096,6 +15276,7 @@ combined strings that are ambiguous. Suppose that @code{SUBSEP} is
"b@@c"]}} are indistinguishable because both are actually
stored as @samp{foo["a@@b@@c"]}.
+@cindex @code{in} operator, index existence in multidimensional arrays
To test whether a particular index sequence exists in a
multidimensional array, use the same operator (@code{in}) that is
used for single dimensional arrays. Write the whole sequence of indices
@@ -15161,6 +15342,7 @@ multidimensional @emph{way of accessing} an array.
@cindex subscripts in arrays, multidimensional, scanning
@cindex arrays, multidimensional, scanning
+@cindex scanning multidimensional arrays
However, if your program has an array that is always accessed as
multidimensional, you can get the effect of scanning it by combining
the scanning @code{for} statement
@@ -15202,6 +15384,7 @@ separate indices is recovered.
@node Arrays of Arrays
@section Arrays of Arrays
+@cindex arrays of arrays
@command{gawk} goes beyond standard @command{awk}'s multidimensional
array access and provides true arrays of
@@ -15461,6 +15644,7 @@ two arguments 11 and 10.
@node Numeric Functions
@subsection Numeric Functions
+@cindex numeric functions
The following list describes all of
the built-in functions that work with numbers.
@@ -15468,22 +15652,26 @@ Optional parameters are enclosed in square brackets@w{ ([ ]):}
@table @code
@item atan2(@var{y}, @var{x})
-@cindex @code{atan2()} function
+@cindexawkfunc{atan2}
+@cindex arctangent
Return the arctangent of @code{@var{y} / @var{x}} in radians.
You can use @samp{pi = atan2(0, -1)} to retrieve the value of @value{PI}.
@item cos(@var{x})
-@cindex @code{cos()} function
+@cindexawkfunc{cos}
+@cindex cosine
Return the cosine of @var{x}, with @var{x} in radians.
@item exp(@var{x})
-@cindex @code{exp()} function
+@cindexawkfunc{exp}
+@cindex exponent
Return the exponential of @var{x} (@code{e ^ @var{x}}) or report
an error if @var{x} is out of range. The range of values @var{x} can have
depends on your machine's floating-point representation.
@item int(@var{x})
-@cindex @code{int()} function
+@cindexawkfunc{int}
+@cindex round to nearest integer
Return the nearest integer to @var{x}, located between @var{x} and zero and
truncated toward zero.
@@ -15491,12 +15679,13 @@ For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)}
is @minus{}3, and @code{int(-3)} is @minus{}3 as well.
@item log(@var{x})
-@cindex @code{log()} function
+@cindexawkfunc{log}
+@cindex logarithm
Return the natural logarithm of @var{x}, if @var{x} is positive;
otherwise, report an error.
@item rand()
-@cindex @code{rand()} function
+@cindexawkfunc{rand}
@cindex random numbers, @code{rand()}/@code{srand()} functions
Return a random number. The values of @code{rand()} are
uniformly distributed between zero and one.
@@ -15538,7 +15727,7 @@ function roll(n) @{ return 1 + int(rand() * n) @}
@}
@end example
-@cindex numbers, random
+@cindex seeding random number generator
@cindex random numbers, seed of
@quotation CAUTION
In most @command{awk} implementations, including @command{gawk},
@@ -15554,17 +15743,19 @@ use @code{srand()}.
@end quotation
@item sin(@var{x})
-@cindex @code{sin()} function
+@cindexawkfunc{sin}
+@cindex sine
Return the sine of @var{x}, with @var{x} in radians.
@item sqrt(@var{x})
-@cindex @code{sqrt()} function
+@cindexawkfunc{sqrt}
+@cindex square root
Return the positive square root of @var{x}.
@command{gawk} prints a warning message
if @var{x} is negative. Thus, @code{sqrt(4)} is 2.
@item srand(@r{[}@var{x}@r{]})
-@cindex @code{srand()} function
+@cindexawkfunc{srand}
Set the starting point, or seed,
for generating random numbers to the value @var{x}.
@@ -15594,6 +15785,7 @@ sequences of random numbers.
@node String Functions
@subsection String-Manipulation Functions
+@cindex string-manipulation functions
The functions in this @value{SECTION} look at or change the text of one
or more strings.
@@ -15622,11 +15814,11 @@ pound sign@w{ (@samp{#}):}
@table @code
@item asort(@var{source} @r{[}, @var{dest} @r{[}, @var{how} @r{]} @r{]}) #
@itemx asorti(@var{source} @r{[}, @var{dest} @r{[}, @var{how} @r{]} @r{]}) #
-@cindex @code{asorti()} function (@command{gawk})
+@cindexgawkfunc{asorti}
+@cindex sort array
@cindex arrays, elements, retrieving number of
-@cindex @code{asort()} function (@command{gawk})
-@cindex @command{gawk}, @code{IGNORECASE} variable in
-@cindex @code{IGNORECASE} variable
+@cindexgawkfunc{asort}
+@cindex sort array indices
These two functions are similar in behavior, so they are described
together.
@@ -15644,7 +15836,9 @@ sequential integers starting with one. If the optional array @var{dest}
is specified, then @var{source} is duplicated into @var{dest}. @var{dest}
is then sorted, leaving the indices of @var{source} unchanged.
-When comparing strings, @code{IGNORECASE} affects the sorting. If the
+@cindex @command{gawk}, @code{IGNORECASE} variable in
+When comparing strings, @code{IGNORECASE} affects the sorting
+(@pxref{Array Sorting Functions}). If the
@var{source} array contains subarrays as values (@pxref{Arrays of
Arrays}), they will come last, after all scalar values.
@@ -15687,7 +15881,9 @@ a[3] = "middle"
are not available in compatibility mode (@pxref{Options}).
@item gensub(@var{regexp}, @var{replacement}, @var{how} @r{[}, @var{target}@r{]}) #
-@cindex @code{gensub()} function (@command{gawk})
+@cindexgawkfunc{gensub}
+@cindex search and replace in strings
+@cindex substitute in string
Search the target string @var{target} for matches of the regular
expression @var{regexp}. If @var{how} is a string beginning with
@samp{g} or @samp{G} (short for ``global''), then replace all matches of @var{regexp} with
@@ -15696,7 +15892,7 @@ which match of @var{regexp} to replace. If no @var{target} is supplied,
use @code{$0}. It returns the modified string as the result
of the function and the original target string is @emph{not} changed.
-@code{gensub()} is a general substitution function. It's purpose is
+@code{gensub()} is a general substitution function. Its purpose is
to provide more features than the standard @code{sub()} and @code{gsub()}
functions.
@@ -15750,7 +15946,7 @@ is the original unchanged value of @var{target}.
in compatibility mode (@pxref{Options}).
@item gsub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]})
-@cindex @code{gsub()} function
+@cindexawkfunc{gsub}
Search @var{target} for
@emph{all} of the longest, leftmost, @emph{nonoverlapping} matching
substrings it can find and replace them with @var{replacement}.
@@ -15772,8 +15968,9 @@ As in @code{sub()}, the characters @samp{&} and @samp{\} are special,
and the third argument must be assignable.
@item index(@var{in}, @var{find})
-@cindex @code{index()} function
-@cindex searching
+@cindexawkfunc{index}
+@cindex search in string
+@cindex find substring in string
Search the string @var{in} for the first occurrence of the string
@var{find}, and return the position in characters where that occurrence
begins in the string @var{in}. Consider the following example:
@@ -15790,7 +15987,9 @@ If @var{find} is not found, @code{index()} returns zero.
It is a fatal error to use a regexp constant for @var{find}.
@item length(@r{[}@var{string}@r{]})
-@cindex @code{length()} function
+@cindexawkfunc{length}
+@cindex string length
+@cindex length of string
Return the number of characters in @var{string}. If
@var{string} is a number, the length of the digit string representing
that number is returned. For example, @code{length("abcde")} is five. By
@@ -15798,6 +15997,8 @@ contrast, @code{length(15 * 35)} works out to three. In this example, 15 * 35 =
525, and 525 is then converted to the string @code{"525"}, which has
three characters.
+@cindex length of input record
+@cindex input record, length of
If no argument is supplied, @code{length()} returns the length of @code{$0}.
@c @cindex historical features
@@ -15836,6 +16037,8 @@ warning about this.
@cindex common extensions, @code{length()} applied to an array
@cindex extensions, common@comma{} @code{length()} applied to an array
@cindex differences between @command{gawk} and @command{awk}
+@cindex number of array elements
+@cindex array, number of elements
With @command{gawk} and several other @command{awk} implementations, when given an
array argument, the @code{length()} function returns the number of elements
in the array. @value{COMMONEXT}
@@ -15849,7 +16052,9 @@ If @option{--posix} is supplied, using an array argument is a fatal error
(@pxref{Arrays}).
@item match(@var{string}, @var{regexp} @r{[}, @var{array}@r{]})
-@cindex @code{match()} function
+@cindexawkfunc{match}
+@cindex string, regular expression match
+@cindex match regexp in string
Search @var{string} for the
longest, leftmost substring matched by the regular expression,
@var{regexp} and return the character position, or @dfn{index},
@@ -15964,7 +16169,8 @@ The @var{array} argument to @code{match()} is a
using a third argument is a fatal error.
@item patsplit(@var{string}, @var{array} @r{[}, @var{fieldpat} @r{[}, @var{seps} @r{]} @r{]}) #
-@cindex @code{patsplit()} function (@command{gawk})
+@cindexgawkfunc{patsplit}
+@cindex split string into array
Divide
@var{string} into pieces defined by @var{fieldpat}
and store the pieces in @var{array} and the separator strings in the
@@ -15995,7 +16201,7 @@ The @code{patsplit()} function is a
it is not available.
@item split(@var{string}, @var{array} @r{[}, @var{fieldsep} @r{[}, @var{seps} @r{]} @r{]})
-@cindex @code{split()} function
+@cindexawkfunc{split}
Divide @var{string} into pieces separated by @var{fieldsep}
and store the pieces in @var{array} and the separator strings in the
@var{seps} array. The first piece is stored in
@@ -16024,7 +16230,7 @@ split("cul-de-sac", a, "-", seps)
@end example
@noindent
-@cindex strings, splitting
+@cindex strings splitting, example
splits the string @samp{cul-de-sac} into three fields using @samp{-} as the
separator. It sets the contents of the array @code{a} as follows:
@@ -16080,7 +16286,8 @@ If @var{string} does not match @var{fieldsep} at all (but is not null),
@var{string}.
@item sprintf(@var{format}, @var{expression1}, @dots{})
-@cindex @code{sprintf()} function
+@cindexawkfunc{sprintf}
+@cindex formatting strings
Return (without printing) the string that @code{printf} would
have printed out with the same arguments
(@pxref{Printf}).
@@ -16093,7 +16300,8 @@ pival = sprintf("pi = %.2f (approx.)", 22/7)
@noindent
assigns the string @w{@samp{pi = 3.14 (approx.)}} to the variable @code{pival}.
-@cindex @code{strtonum()} function (@command{gawk})
+@cindexgawkfunc{strtonum}
+@cindex convert string to number
@item strtonum(@var{str}) #
Examine @var{str} and return its numeric value. If @var{str}
begins with a leading @samp{0}, @code{strtonum()} assumes that @var{str}
@@ -16116,12 +16324,12 @@ you use the @option{--non-decimal-data} option, which isn't recommended.
Note also that @code{strtonum()} uses the current locale's decimal point
for recognizing numbers (@pxref{Locales}).
-@cindex differences in @command{awk} and @command{gawk}, @code{strtonum()} function (@command{gawk})
@code{strtonum()} is a @command{gawk} extension; it is not available
in compatibility mode (@pxref{Options}).
@item sub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]})
-@cindex @code{sub()} function
+@cindexawkfunc{sub}
+@cindex replace in string
Search @var{target}, which is treated as a string, for the
leftmost, longest substring matched by the regular expression @var{regexp}.
Modify the entire string
@@ -16221,7 +16429,8 @@ Finally, if the @var{regexp} is not a regexp constant, it is converted into a
string, and then the value of that string is treated as the regexp to match.
@item substr(@var{string}, @var{start} @r{[}, @var{length}@r{]})
-@cindex @code{substr()} function
+@cindexawkfunc{substr}
+@cindex substring
Return a @var{length}-character-long substring of @var{string},
starting at character number @var{start}. The first character of a
string is character number one.@footnote{This is different from
@@ -16235,6 +16444,7 @@ suffix is also returned
if @var{length} is greater than the number of characters remaining
in the string, counting from character @var{start}.
+@cindex Brian Kernighan's @command{awk}
If @var{start} is less than one, @code{substr()} treats it as
if it was one. (POSIX doesn't specify what to do in this case:
Brian Kernighan's @command{awk} acts this way, and therefore @command{gawk}
@@ -16277,16 +16487,18 @@ string = substr(string, 1, 2) "CDE" substr(string, 6)
@end example
@cindex case sensitivity, converting case
-@cindex converting, case
+@cindex strings, converting letter case
@item tolower(@var{string})
-@cindex @code{tolower()} function
+@cindexawkfunc{tolower}
+@cindex convert string to lower case
Return a copy of @var{string}, with each uppercase character
in the string replaced with its corresponding lowercase character.
Nonalphabetic characters are left unchanged. For example,
@code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}.
@item toupper(@var{string})
-@cindex @code{toupper()} function
+@cindexawkfunc{toupper}
+@cindex convert string to upper case
Return a copy of @var{string}, with each lowercase character
in the string replaced with its corresponding uppercase character.
Nonalphabetic characters are left unchanged. For example,
@@ -16314,6 +16526,7 @@ and builds an internal copy of it that can be executed.
Then there is the runtime level, which is when @command{awk} actually scans the
replacement string to determine what to generate.
+@cindex Brian Kernighan's @command{awk}
At both levels, @command{awk} looks for a defined set of characters that
can come after a backslash. At the lexical level, it looks for the
escape sequences listed in @ref{Escape Sequences}.
@@ -16711,14 +16924,16 @@ Although this makes a certain amount of sense, it can be surprising.
@node I/O Functions
@subsection Input/Output Functions
+@cindex input/output functions
The following functions relate to input/output (I/O).
Optional parameters are enclosed in square brackets ([ ]):
@table @code
@item close(@var{filename} @r{[}, @var{how}@r{]})
-@cindex @code{close()} function
+@cindexawkfunc{close}
@cindex files, closing
+@cindex close file or coprocess
Close the file @var{filename} for input or output. Alternatively, the
argument may be a shell command that was used for creating a coprocess, or
for redirecting to or from a pipe; then the coprocess or pipe is closed.
@@ -16735,7 +16950,8 @@ not matter.
which discusses this feature in more detail and gives an example.
@item fflush(@r{[}@var{filename}@r{]})
-@cindex @code{fflush()} function
+@cindexawkfunc{fflush}
+@cindex flush buffered output
Flush any buffered output associated with @var{filename}, which is either a
file opened for writing or a shell command for redirecting output to
a pipe or coprocess.
@@ -16753,11 +16969,12 @@ This is the purpose of the @code{fflush()} function---@command{gawk} also
buffers its output and the @code{fflush()} function forces
@command{gawk} to flush its buffers.
-@code{fflush()} was added to Brian Kernighan's
-version of @command{awk} in 1994.
-For over two decades, it was not part of the POSIX standard.
-As of December, 2012, it was accepted for
-inclusion into the POSIX standard.
+@cindex extensions, common@comma{} @code{fflush()} function
+@cindex Brian Kernighan's @command{awk}
+@code{fflush()} was added to Brian Kernighan's version of @command{awk} in
+April of 1992. For two decades, it was not part of the POSIX standard.
+As of December, 2012, it was accepted for inclusion into the POSIX
+standard.
See @uref{http://austingroupbugs.net/view.php?id=634, the Austin Group website}.
POSIX standardizes @code{fflush()} as follows: If there
@@ -16793,7 +17010,8 @@ or if @var{filename} is not an open file, pipe, or coprocess.
In such a case, @code{fflush()} returns @minus{}1, as well.
@item system(@var{command})
-@cindex @code{system()} function
+@cindexawkfunc{system}
+@cindex invoke shell command
@cindex interacting with other programs
Execute the operating-system
command @var{command} and then return to the @command{awk} program.
@@ -17068,6 +17286,7 @@ you would see the latter (undesirable) output.
@node Time Functions
@subsection Time Functions
+@cindex time functions
@c STARTOFRANGE tst
@cindex timestamps
@@ -17087,7 +17306,18 @@ it is the number of seconds since
1970-01-01 00:00:00 UTC, not counting leap seconds.@footnote{@xref{Glossary},
especially the entries ``Epoch'' and ``UTC.''}
All known POSIX-compliant systems support timestamps from 0 through
-@math{2^{31} - 1}, which is sufficient to represent times through
+@iftex
+@math{2^{31} - 1},
+@end iftex
+@ifnottex
+@ifnotdocbook
+2^31 - 1,
+@end ifnotdocbook
+@end ifnottex
+@docbook
+2<superscript>31</superscript> &minus; 1, @c
+@end docbook
+which is sufficient to represent times through
2038-01-19 03:14:07 UTC. Many systems support a wider range of timestamps,
including negative timestamps that represent times before the
epoch.
@@ -17106,7 +17336,8 @@ Optional parameters are enclosed in square brackets ([ ]):
@table @code
@item mktime(@var{datespec})
-@cindex @code{mktime()} function (@command{gawk})
+@cindexgawkfunc{mktime}
+@cindex generate time values
Turn @var{datespec} into a timestamp in the same form
as is returned by @code{systime()}. It is similar to the function of the
same name in ISO C. The argument, @var{datespec}, is a string of the form
@@ -17136,7 +17367,8 @@ is out of range, @code{mktime()} returns @minus{}1.
@cindex @code{PROCINFO} array
@item strftime(@r{[}@var{format} @r{[}, @var{timestamp} @r{[}, @var{utc-flag}@r{]]]})
@c STARTOFRANGE strf
-@cindex @code{strftime()} function (@command{gawk})
+@cindexgawkfunc{strftime}
+@cindex format time string
Format the time specified by @var{timestamp}
based on the contents of the @var{format} string and return the result.
It is similar to the function of the same name in ISO C.
@@ -17153,11 +17385,12 @@ The default string value is
@code{@w{"%a %b %e %H:%M:%S %Z %Y"}}. This format string produces
output that is equivalent to that of the @command{date} utility.
You can assign a new value to @code{PROCINFO["strftime"]} to
-change the default format.
+change the default format; see below for the various format directives.
@item systime()
-@cindex @code{systime()} function (@command{gawk})
+@cindexgawkfunc{systime}
@cindex timestamps
+@cindex current system time
Return the current time as the number of seconds since
the system epoch. On POSIX systems, this is the number of seconds
since 1970-01-01 00:00:00 UTC, not counting leap seconds.
@@ -17451,6 +17684,7 @@ gawk 'BEGIN @{
@node Bitwise Functions
@subsection Bit-Manipulation Functions
+@cindex bit-manipulation functions
@c STARTOFRANGE bit
@cindex bitwise, operations
@c STARTOFRANGE and
@@ -17613,27 +17847,33 @@ bitwise operations just described. They are:
@cindex @command{gawk}, bitwise operations in
@table @code
-@cindex @code{and()} function (@command{gawk})
+@cindexgawkfunc{and}
+@cindex bitwise AND
@item and(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
Return the bitwise AND of the arguments. There must be at least two.
-@cindex @code{compl()} function (@command{gawk})
+@cindexgawkfunc{compl}
+@cindex bitwise complement
@item compl(@var{val})
Return the bitwise complement of @var{val}.
-@cindex @code{lshift()} function (@command{gawk})
+@cindexgawkfunc{lshift}
+@cindex left shift
@item lshift(@var{val}, @var{count})
Return the value of @var{val}, shifted left by @var{count} bits.
-@cindex @code{or()} function (@command{gawk})
+@cindexgawkfunc{or}
+@cindex bitwise OR
@item or(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
Return the bitwise OR of the arguments. There must be at least two.
-@cindex @code{rshift()} function (@command{gawk})
+@cindexgawkfunc{rshift}
+@cindex right shift
@item rshift(@var{val}, @var{count})
Return the value of @var{val}, shifted right by @var{count} bits.
-@cindex @code{xor()} function (@command{gawk})
+@cindexgawkfunc{xor}
+@cindex bitwise XOR
@item xor(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
Return the bitwise XOR of the arguments. There must be at least two.
@end table
@@ -17725,6 +17965,7 @@ $ @kbd{gawk -f testbits.awk}
@cindex strings, converting
@cindex numbers, converting
@cindex converting, numbers to strings
+@cindex number as string of bits
The @code{bits2str()} function turns a binary number into a string.
The number @code{1} represents a binary value where the rightmost bit
is set to 1. Using this mask,
@@ -17760,7 +18001,8 @@ that traverses every element of a true multidimensional array
(@pxref{Arrays of Arrays}).
@table @code
-@cindex @code{isarray()} function (@command{gawk})
+@cindexgawkfunc{isarray}
+@cindex scalar or array
@item isarray(@var{x})
Return a true value if @var{x} is an array. Otherwise return false.
@end table
@@ -17768,7 +18010,7 @@ Return a true value if @var{x} is an array. Otherwise return false.
@code{isarray()} is meant for use in two circumstances. The first is when
traversing a multidimensional array: you can test if an element is itself
an array or not. The second is inside the body of a user-defined function
-(not discussed yet; @pxref{User-defined}), to test if a paramater is an
+(not discussed yet; @pxref{User-defined}), to test if a parameter is an
array or not.
Note, however, that using @code{isarray()} at the global level to test
@@ -17782,6 +18024,7 @@ will end up turning it into a scalar.
@subsection String-Translation Functions
@cindex @command{gawk}, string-translation functions
@cindex functions, string-translation
+@cindex string-translation functions
@cindex internationalization
@cindex @command{awk} programs, internationalizing
@@ -17793,7 +18036,8 @@ for the full story.
Optional parameters are enclosed in square brackets ([ ]):
@table @code
-@cindex @code{bindtextdomain()} function (@command{gawk})
+@cindexgawkfunc{bindtextdomain}
+@cindex set directory of message catalogs
@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]})
Set the directory in which
@command{gawk} will look for message translation files, in case they
@@ -17806,14 +18050,15 @@ If @var{directory} is the null string (@code{""}), then
@code{bindtextdomain()} returns the current binding for the
given @var{domain}.
-@cindex @code{dcgettext()} function (@command{gawk})
+@cindexgawkfunc{dcgettext}
+@cindex translate string
@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
Return the translation of @var{string} in
text domain @var{domain} for locale category @var{category}.
The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
The default value for @var{category} is @code{"LC_MESSAGES"}.
-@cindex @code{dcngettext()} function (@command{gawk})
+@cindexgawkfunc{dcngettext}
@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
Return the plural form used for @var{number} of the
translation of @var{string1} and @var{string2} in text domain
@@ -17830,7 +18075,7 @@ The default value for @var{category} is @code{"LC_MESSAGES"}.
@section User-Defined Functions
@c STARTOFRANGE udfunc
-@cindex user-defined, functions
+@cindex user-defined functions
@c STARTOFRANGE funcud
@cindex functions, user-defined
Complicated @command{awk} programs can often be simplified by defining
@@ -17889,7 +18134,7 @@ have a parameter with the same name as the function itself.
In addition, according to the POSIX standard, function parameters cannot have the same
name as one of the special built-in variables
(@pxref{Built-in Variables}. Not all versions of @command{awk}
-enforce this restriction.
+enforce this restriction.)
The @var{body-of-function} consists of @command{awk} statements. It is the
most important part of the definition, because it says what the function
@@ -17916,6 +18161,7 @@ conventional to place some extra space between the arguments and
the local variables, in order to document how your function is supposed to be used.
@cindex variables, shadowing
+@cindex shadowing of variable values
During execution of the function body, the arguments and local variable
values hide, or @dfn{shadow}, any variables of the same names used in the
rest of the program. The shadowed variables are not accessible in the
@@ -17936,7 +18182,7 @@ function. When this happens, we say the function is @dfn{recursive}.
The act of a function calling itself is called @dfn{recursion}.
All the built-in functions return a value to their caller.
-User-defined functions can do also, using the @code{return} statement,
+User-defined functions can do so also, using the @code{return} statement,
which is described in detail in @ref{Return Statement}.
Many of the subsequent examples in this @value{SECTION} use
the @code{return} statement.
@@ -17974,6 +18220,7 @@ keyword @code{function} when defining a function.
@node Function Example
@subsection Function Definition Examples
+@cindex function definition example
Here is an example of a user-defined function, called @code{myprint()}, that
takes a number and prints it in a specific format:
@@ -18028,7 +18275,8 @@ Instead of having
to repeat this loop everywhere that you need to clear out
an array, your program can just call @code{delarray}.
(This guarantees portability. The use of @samp{delete @var{array}} to delete
-the contents of an entire array is a nonstandard extension.)
+the contents of an entire array is a recent@footnote{Late in 2012.}
+addition to the POSIX standard.)
The following is an example of a recursive function. It takes a string
as an input parameter and returns the string in backwards order.
@@ -18084,7 +18332,10 @@ function ctime(ts, format)
@subsection Calling User-Defined Functions
@c STARTOFRANGE fudc
-This section describes how to call a user-defined function.
+@cindex functions, user-defined, calling
+@dfn{Calling a function} means causing the function to run and do its job.
+A function call is an expression and its value is the value returned by
+the function.
@menu
* Calling A Function:: Don't use spaces.
@@ -18095,11 +18346,6 @@ This section describes how to call a user-defined function.
@node Calling A Function
@subsubsection Writing A Function Call
-@cindex functions, user-defined, calling
-@dfn{Calling a function} means causing the function to run and do its job.
-A function call is an expression and its value is the value returned by
-the function.
-
A function call consists of the function name followed by the arguments
in parentheses. @command{awk} expressions are what you write in the
call for the arguments. Each time the call is executed, these
@@ -18123,8 +18369,8 @@ an error.
@node Variable Scope
@subsubsection Controlling Variable Scope
-@cindex local variables
-@cindex variables, local
+@cindex local variables, in a function
+@cindex variables, local to a function
There is no way to make a variable local to a @code{@{ @dots{} @}} block in
@command{awk}, but you can make a variable local to a function. It is
good practice to do so whenever a variable is needed only in that
@@ -18569,7 +18815,7 @@ character:
@example
the_func = "sum"
-result = @@the_func() # calls the `sum' function
+result = @@the_func() # calls the sum() function
@end example
Here is a full program that processes the previously shown data,
@@ -18690,8 +18936,9 @@ We can do something similar using @command{gawk}, like this:
@ignore
@c file eg/lib/quicksort.awk
#
-# Arnold Robbins, arnold@skeeve.com, Public Domain
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
# January 2009
+
@c endfile
@end ignore
@@ -18764,7 +19011,7 @@ or equal to), which yields data sorted in descending order.
Next comes a sorting function. It is parameterized with the starting and
ending field numbers and the comparison function. It builds an array with
-the data and calls @code{quicksort} appropriately, and then formats the
+the data and calls @code{quicksort()} appropriately, and then formats the
results as a single string:
@example
@@ -18902,6 +19149,8 @@ it allows you to encapsulate algorithms and program tasks in a single
place. It simplifies programming, making program development more
manageable, and making programs more readable.
+@cindex Kernighan, Brian
+@cindex Plauger, P.J.@:
In their seminal 1976 book, @cite{Software Tools},@footnote{Sadly, over 35
years later, many of the lessons taught by this book have yet to be
learned by a vast number of practicing programmers.} Brian Kernighan
@@ -19031,7 +19280,7 @@ with the user's program.
@cindex underscore (@code{_}), in names of private variables
In addition, several of the library functions use a prefix that helps
indicate what function or set of functions use the variables---for example,
-@code{_pw_byname} in the user database routines
+@code{_pw_byname()} in the user database routines
(@pxref{Passwd Functions}).
This convention is recommended, since it even further decreases the
chance of inadvertent conflict among variable names. Note that this
@@ -19050,7 +19299,7 @@ The leading capital letter indicates that it is global, while the fact that
the variable name is not all capital letters indicates that the variable is
not one of @command{awk}'s built-in variables, such as @code{FS}.
-@cindex @option{--dump-variables} option
+@cindex @option{--dump-variables} option, using for library functions
It is also important that @emph{all} variables in library
functions that do not need to save state are, in fact, declared
local.@footnote{@command{gawk}'s @option{--dump-variables} command-line
@@ -19345,9 +19594,9 @@ with an @code{exit} statement.
The way @code{printf} and @code{sprintf()}
(@pxref{Printf})
perform rounding often depends upon the system's C @code{sprintf()}
-subroutine. On many machines, @code{sprintf()} rounding is ``unbiased,''
-which means it doesn't always round a trailing @samp{.5} up, contrary
-to naive expectations. In unbiased rounding, @samp{.5} rounds to even,
+subroutine. On many machines, @code{sprintf()} rounding is @dfn{unbiased},
+which means it doesn't always round a trailing .5 up, contrary
+to naive expectations. In unbiased rounding, .5 rounds to even,
rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4. This means
that if you are using a format that does rounding (e.g., @code{"%.0f"}),
you should check what your system does. The following function does
@@ -19396,7 +19645,7 @@ function round(x, ival, aval, fraction)
@c don't include test harness in the file that gets installed
# test harness
-@{ print $0, round($0) @}
+# @{ print $0, round($0) @}
@end example
@node Cliff Random Function
@@ -19463,6 +19712,7 @@ reason to build them into the @command{awk} interpreter:
@cindex @code{ord()} user-defined function
@cindex @code{chr()} user-defined function
+@cindex @code{_ord_init()} user-defined function
@example
@c file eg/lib/ord.awk
# ord.awk --- do ord and chr
@@ -19509,8 +19759,9 @@ function _ord_init( low, high, i, t)
@cindex character sets (machine character encodings)
@cindex ASCII
@cindex EBCDIC
+@cindex Unicode
@cindex mark parity
-Some explanation of the numbers used by @code{chr()} is worthwhile.
+Some explanation of the numbers used by @code{_ord_init()} is worthwhile.
The most prominent character set in use today is ASCII.@footnote{This
is changing; many systems use Unicode, a very large character set
that includes ASCII as a subset. On systems with full Unicode support,
@@ -19521,7 +19772,7 @@ Although an
defines characters that use the values from 0 to 127.@footnote{ASCII
has been extended in many countries to use the values from 128 to 255
for country-specific characters. If your system uses these extensions,
-you can simplify @code{_ord_init} to loop from 0 to 255.}
+you can simplify @code{_ord_init()} to loop from 0 to 255.}
In the now distant past,
at least one minicomputer manufacturer
@c Pr1me, blech
@@ -20195,7 +20446,7 @@ END @{
Occasionally, you might not want @command{awk} to process command-line
variable assignments
(@pxref{Assignment Options}).
-In particular, if you have a file name that contain an @samp{=} character,
+In particular, if you have a file name that contains an @samp{=} character,
@command{awk} treats the file name as an assignment, and does not process it.
Some users have suggested an additional command-line option for @command{gawk}
@@ -20865,7 +21116,7 @@ from anywhere within a user's program, and the user may have his
or her
own way of splitting records and fields.
-@cindex @code{PROCINFO} array
+@cindex @code{PROCINFO} array, testing the field splitting
The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which
is @code{"FIELDWIDTHS"} if field splitting is being done with
@code{FIELDWIDTHS}. This makes it possible to restore the correct
@@ -20874,7 +21125,7 @@ field-splitting mechanism later. The test can only be true for
or on some other @command{awk} implementation.
The code that checks for using @code{FPAT}, using @code{using_fpat}
-and @code{PROCINFO["FS"]} is similar.
+and @code{PROCINFO["FS"]}, is similar.
The main part of the function uses a loop to read database lines, split
the line into fields, and then store the line into each array as necessary.
@@ -20904,10 +21155,9 @@ function getpwnam(name)
@end example
@cindex @code{getpwuid()} function (C library)
-Similarly,
-the @code{getpwuid} function takes a user ID number argument. If that
-user number is in the database, it returns the appropriate line. Otherwise, it
-returns the null string:
+Similarly, the @code{getpwuid()} function takes a user ID number
+argument. If that user number is in the database, it returns the
+appropriate line. Otherwise, it returns the null string:
@cindex @code{getpwuid()} user-defined function
@example
@@ -20989,7 +21239,7 @@ uses these functions.
@cindex group database, reading
@c STARTOFRANGE datagr
@cindex database, group, reading
-@cindex @code{PROCINFO} array
+@cindex @code{PROCINFO} array, and group membership
@cindex @code{getgrent()} function (C library)
@cindex @code{getgrent()} user-defined function
@cindex groups@comma{} information about
@@ -21411,7 +21661,7 @@ index and value, use the indirect function call syntax
and the value.
When calling @code{walk_array()}, you would pass the name of a user-defined
-function that expects to receive and index and a value, and then processes
+function that expects to receive an index and a value, and then processes
the element.
@@ -21765,7 +22015,7 @@ complete field list, including filler fields:
@example
@c file eg/prog/cut.awk
-function set_charlist( field, i, j, f, g, t,
+function set_charlist( field, i, j, f, g, n, m, t,
filler, last, len)
@{
field = 1 # count total fields
@@ -21862,6 +22112,7 @@ of picking the input line apart by characters.
@cindex searching, files for regular expressions
@c STARTOFRANGE fsregexp
@cindex files, searching for regular expressions
+@c STARTOFRANGE egrep
@cindex @command{egrep} utility
The @command{egrep} utility searches files for patterns. It uses regular
expressions that are almost identical to those available in @command{awk}
@@ -22147,12 +22398,14 @@ or not.
@c ENDOFRANGE regexps
@c ENDOFRANGE sfregexp
@c ENDOFRANGE fsregexp
+@c ENDOFRANGE egrep
@node Id Program
@subsection Printing out User Information
@cindex printing, user information
@cindex users, information about, printing
+@c STARTOFRANGE id
@cindex @command{id} utility
The @command{id} utility lists a user's real and effective user ID numbers,
real and effective group ID numbers, and the user's group set, if any.
@@ -22165,7 +22418,7 @@ $ @kbd{id}
@print{} uid=500(arnold) gid=500(arnold) groups=6(disk),7(lp),19(floppy)
@end example
-@cindex @code{PROCINFO} array
+@cindex @code{PROCINFO} array, and user and group ID numbers
This information is part of what is provided by @command{gawk}'s
@code{PROCINFO} array (@pxref{Built-in Variables}).
However, the @command{id} utility provides a more palatable output than just
@@ -22266,7 +22519,6 @@ BEGIN \
@c endfile
@end example
-@cindex @code{in} operator
The test in the @code{for} loop is worth noting.
Any supplementary groups in the @code{PROCINFO} array have the
indices @code{"group1"} through @code{"group@var{N}"} for some
@@ -22276,7 +22528,7 @@ there are.
This loop works by starting at one, concatenating the value with
@code{"group"}, and then using @code{in} to see if that value is
-in the array. Eventually, @code{i} is incremented past
+in the array (@pxref{Reference to Elements}). Eventually, @code{i} is incremented past
the last group in the array and the loop exits.
The loop is also correct if there are @emph{no} supplementary
@@ -22289,6 +22541,7 @@ The POSIX version of @command{id} takes arguments that control which
information is printed. Modify this version to accept the same
arguments and perform in the same way.
@end ignore
+@c ENDOFRANGE id
@node Split Program
@subsection Splitting a Large File into Pieces
@@ -22297,6 +22550,7 @@ arguments and perform in the same way.
@c STARTOFRANGE filspl
@cindex files, splitting
+@c STARTOFRANGE split
@cindex @code{split} utility
The @command{split} program splits large text files into smaller pieces.
Usage is as follows:@footnote{This is the traditional usage. The
@@ -22440,12 +22694,14 @@ which isn't true for EBCDIC systems.
@c Exercise: Fix these problems.
@c BFD...
@c ENDOFRANGE filspl
+@c ENDOFRANGE split
@node Tee Program
@subsection Duplicating Output into Multiple Files
@cindex files, multiple@comma{} duplicating output into
@cindex output, duplicating into files
+@c STARTOFRANGE tee
@cindex @code{tee} utility
The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies
its standard input to its standard output and also duplicates it to the
@@ -22560,6 +22816,7 @@ END \
@}
@c endfile
@end example
+@c ENDOFRANGE tee
@node Uniq Program
@subsection Printing Nonduplicated Lines of Text
@@ -22570,6 +22827,7 @@ END \
@cindex printing, unduplicated lines of text
@c STARTOFRANGE tpul
@cindex text@comma{} printing, unduplicated lines of
+@c STARTOFRANGE uniq
@cindex @command{uniq} utility
The @command{uniq} utility reads sorted lines of data on its standard
input, and by default removes duplicate lines. In other words, it only
@@ -22821,6 +23079,7 @@ END @{
@end example
@c ENDOFRANGE prunt
@c ENDOFRANGE tpul
+@c ENDOFRANGE uniq
@node Wc Program
@subsection Counting Things
@@ -22837,6 +23096,7 @@ END @{
@cindex characters, counting
@c STARTOFRANGE lico
@cindex lines, counting
+@c STARTOFRANGE wc
@cindex @command{wc} utility
The @command{wc} (word count) utility counts lines, words, and characters in
one or more input files. Its usage is as follows:
@@ -23019,6 +23279,7 @@ END @{
@c ENDOFRANGE lico
@c ENDOFRANGE woco
@c ENDOFRANGE chco
+@c ENDOFRANGE wc
@c ENDOFRANGE posimawk
@node Miscellaneous Programs
@@ -23313,6 +23574,7 @@ seconds are necessary:
@c STARTOFRANGE chtra
@cindex characters, transliterating
+@c STARTOFRANGE tr
@cindex @command{tr} utility
The system @command{tr} utility transliterates characters. For example, it is
often used to map uppercase letters into lowercase for further processing:
@@ -23461,6 +23723,7 @@ An obvious improvement to this program would be to set up the
assumes that the ``from'' and ``to'' lists
will never change throughout the lifetime of the program.
@c ENDOFRANGE chtra
+@c ENDOFRANGE tr
@node Labels Program
@subsection Printing Mailing Labels
@@ -23520,6 +23783,7 @@ that there are two blank lines at the top and two blank lines at the bottom.
The @code{END} rule arranges to flush the final page of labels; there may
not have been an even multiple of 20 labels in the data:
+@c STARTOFRANGE labels
@cindex @code{labels.awk} program
@example
@c file eg/prog/labels.awk
@@ -23587,6 +23851,7 @@ END \
@end example
@c ENDOFRANGE prml
@c ENDOFRANGE mlprint
+@c ENDOFRANGE labels
@node Word Sorting
@subsection Generating Word-Usage Counts
@@ -23653,6 +23918,7 @@ to remove punctuation characters. Finally, we solve the third problem
by using the system @command{sort} utility to process the output of the
@command{awk} script. Here is the new version of the program:
+@c STARTOFRANGE wordfreq
@cindex @code{wordfreq.awk} program
@example
@c file eg/prog/wordfreq.awk
@@ -23714,6 +23980,7 @@ have true pipes at the command-line (or batch-file) level.
See the general operating system documentation for more information on how
to use the @command{sort} program.
@c ENDOFRANGE worus
+@c ENDOFRANGE wordfreq
@node History Sorting
@subsection Removing Duplicates from Unsorted Text
@@ -23743,6 +24010,7 @@ Each element of @code{lines} is a unique command, and the indices of
The @code{END} rule simply prints out the lines, in order:
@cindex Rakitzis, Byron
+@c STARTOFRANGE histsort
@cindex @code{histsort.awk} program
@example
@c file eg/prog/histsort.awk
@@ -23785,6 +24053,7 @@ print data[lines[i]], lines[i]
This works because @code{data[$0]} is incremented each time a line is
seen.
@c ENDOFRANGE lidu
+@c ENDOFRANGE histsort
@node Extract Program
@subsection Extracting Programs from Texinfo Source Files
@@ -23895,6 +24164,7 @@ The first rule handles calling @code{system()}, checking that a command is
given (@code{NF} is at least three) and also checking that the command
exits with a zero exit status, signifying OK:
+@c STARTOFRANGE extract
@cindex @code{extract.awk} program
@example
@c file eg/prog/extract.awk
@@ -24053,6 +24323,7 @@ END @{
@end example
@c ENDOFRANGE texse
@c ENDOFRANGE fitex
+@c ENDOFRANGE extract
@node Simple Sed
@subsection A Simple Stream Editor
@@ -24082,6 +24353,7 @@ additional arguments are treated as data file names to process. If none
are provided, the standard input is used:
@cindex Brennan, Michael
+@c STARTOFRANGE awksed
@cindex @command{awksed.awk} program
@c @cindex simple stream editor
@c @cindex stream editor, simple
@@ -24178,6 +24450,7 @@ Exercise: what are the advantages and disadvantages of this version versus sed?
Others?
@end ignore
+@c ENDOFRANGE awksed
@node Igawk Program
@subsection An Easy Way to Use Library Functions
@@ -24321,6 +24594,7 @@ program.
The program is as follows:
+@c STARTOFRANGE igawk
@cindex @code{igawk.sh} program
@example
@c file eg/prog/igawk.sh
@@ -24493,7 +24767,7 @@ BEGIN @{
@c endfile
@end example
-The stack is initialized with @code{ARGV[1]}, which will be @file{/dev/stdin}.
+The stack is initialized with @code{ARGV[1]}, which will be @samp{/dev/stdin}.
The main loop comes next. Input lines are read in succession. Lines that
do not start with @samp{@@include} are printed verbatim.
If the line does start with @samp{@@include}, the file name is in @code{$2}.
@@ -24680,10 +24954,12 @@ statements for the desired library functions.
@c ENDOFRANGE libfex
@c ENDOFRANGE flibex
@c ENDOFRANGE awkpex
+@c ENDOFRANGE igawk
@node Anagram Program
@subsection Finding Anagrams From A Dictionary
+@cindex anagrams, finding
An interesting programming challenge is to
search for @dfn{anagrams} in a
word list (such as
@@ -24703,6 +24979,7 @@ The following program uses arrays of arrays to bring together
words with the same signature and array sorting to print the words
in sorted order.
+@c STARTOFRANGE anagram
@cindex @code{anagram.awk} program
@example
@c file eg/prog/anagram.awk
@@ -24810,10 +25087,13 @@ babels beslab
babery yabber
@dots{}
@end example
+@c ENDOFRANGE anagram
@node Signature Program
@subsection And Now For Something Completely Different
+@cindex signature program
+@cindex Brini, Davide
The following program was written by Davide Brini
@c (@email{dave_br@@gmx.com})
and is published on @uref{http://backreference.org/2011/02/03/obfuscated-awk/,
@@ -24945,12 +25225,15 @@ It contains the following chapters:
@item
@ref{Dynamic Extensions}.
+@end itemize
@end ifdocbook
@end ignore
@node Advanced Features
@chapter Advanced Features of @command{gawk}
+@ifset WITH_NETWORK_CHAPTER
@cindex advanced features, network connections, See Also networks@comma{} connections
+@end ifset
@c STARTOFRANGE gawadv
@cindex @command{gawk}, features, advanced
@c STARTOFRANGE advgaw
@@ -25357,9 +25640,9 @@ sorted array traversal is not the default.
@subsection Sorting Array Values and Indices with @command{gawk}
@cindex arrays, sorting
-@cindex @code{asort()} function (@command{gawk})
+@cindexgawkfunc{asort}
@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting
-@cindex @code{asorti()} function (@command{gawk})
+@cindexgawkfunc{asorti}
@cindex @code{asorti()} function (@command{gawk}), arrays@comma{} sorting
@cindex sort function, arrays, sorting
In most @command{awk} implementations, sorting an array requires writing
@@ -25454,9 +25737,8 @@ both arrays use the values.
@c Document It And Call It A Feature. Sigh.
@cindex @command{gawk}, @code{IGNORECASE} variable in
-@cindex @code{IGNORECASE} variable
-@cindex arrays, sorting, @code{IGNORECASE} variable and
-@cindex @code{IGNORECASE} variable, array sorting and
+@cindex arrays, sorting, and @code{IGNORECASE} variable
+@cindex @code{IGNORECASE} variable, and array sorting functions
Because @code{IGNORECASE} affects string comparisons, the value
of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
Note also that the locale's sorting order does @emph{not}
@@ -25535,7 +25817,7 @@ open a @emph{two-way} pipe to another process. The second process is
termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}.
The two-way connection is created using the @samp{|&} operator
(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
-different from the same operator in the C shell.}
+different from the same operator in the C shell and in Bash.}
@example
do @{
@@ -25625,7 +25907,7 @@ As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
command ensures traditional Unix (ASCII) sorting from @command{sort}.
@cindex @command{gawk}, @code{PROCINFO} array in
-@cindex @code{PROCINFO} array
+@cindex @code{PROCINFO} array, and communications via ptys
You may also use pseudo-ttys (ptys) for
two-way communication instead of pipes, if your system supports them.
This is done on a per-command basis, by setting a special element
@@ -25823,52 +26105,60 @@ foo
junk
@end example
-Here is the @file{awkprof.out} that results from running the @command{gawk}
-profiler on this program and data (this example also illustrates that @command{awk}
-programmers sometimes have to work late):
+Here is the @file{awkprof.out} that results from running the
+@command{gawk} profiler on this program and data. (This example also
+illustrates that @command{awk} programmers sometimes get up very early
+in the morning to work.)
-@cindex @code{BEGIN} pattern
-@cindex @code{END} pattern
+@cindex @code{BEGIN} pattern, and profiling
+@cindex @code{END} pattern, and profiling
@example
- # gawk profile, created Sun Aug 13 00:00:15 2000
+ # gawk profile, created Thu Feb 27 05:16:21 2014
- # BEGIN block(s)
+ # BEGIN block(s)
- BEGIN @{
- 1 print "First BEGIN rule"
- 1 print "Second BEGIN rule"
- @}
+ BEGIN @{
+ 1 print "First BEGIN rule"
+ @}
- # Rule(s)
+ BEGIN @{
+ 1 print "Second BEGIN rule"
+ @}
- 5 /foo/ @{ # 2
- 2 print "matched /foo/, gosh"
- 6 for (i = 1; i <= 3; i++) @{
- 6 sing()
- @}
- @}
+ # Rule(s)
- 5 @{
- 5 if (/foo/) @{ # 2
- 2 print "if is true"
- 3 @} else @{
- 3 print "else is true"
- @}
- @}
+ 5 /foo/ @{ # 2
+ 2 print "matched /foo/, gosh"
+ 6 for (i = 1; i <= 3; i++) @{
+ 6 sing()
+ @}
+ @}
- # END block(s)
+ 5 @{
+ 5 if (/foo/) @{ # 2
+ 2 print "if is true"
+ 3 @} else @{
+ 3 print "else is true"
+ @}
+ @}
- END @{
- 1 print "First END rule"
- 1 print "Second END rule"
- @}
+ # END block(s)
- # Functions, listed alphabetically
+ END @{
+ 1 print "First END rule"
+ @}
- 6 function sing(dummy)
- @{
- 6 print "I gotta be me!"
- @}
+ END @{
+ 1 print "Second END rule"
+ @}
+
+
+ # Functions, listed alphabetically
+
+ 6 function sing(dummy)
+ @{
+ 6 print "I gotta be me!"
+ @}
@end example
This example illustrates many of the basic features of profiling output.
@@ -25876,15 +26166,16 @@ They are as follows:
@itemize @bullet
@item
-The program is printed in the order @code{BEGIN} rule,
-@code{BEGINFILE} rule,
+The program is printed in the order @code{BEGIN} rules,
+@code{BEGINFILE} rules,
pattern/action rules,
-@code{ENDFILE} rule, @code{END} rule and functions, listed
+@code{ENDFILE} rules, @code{END} rules and functions, listed
alphabetically.
-Multiple @code{BEGIN} and @code{END} rules are merged together,
-as are multiple @code{BEGINFILE} and @code{ENDFILE} rules.
+Multiple @code{BEGIN} and @code{END} rules retain their
+separate identities, as do
+multiple @code{BEGINFILE} and @code{ENDFILE} rules.
-@cindex patterns, counts
+@cindex patterns, counts, in a profile
@item
Pattern-action rules have two counts.
The first count, to the left of the rule, shows how many times
@@ -25904,7 +26195,7 @@ is a count showing how many times the condition was true.
The count for the @code{else}
indicates how many times the test failed.
-@cindex loops, count for header
+@cindex loops, count for header, in a profile
@item
The count for a loop header (such as @code{for}
or @code{while}) shows how many times the loop test was executed.
@@ -25912,8 +26203,8 @@ or @code{while}) shows how many times the loop test was executed.
statement in a rule to determine how many times the rule was executed.
If the first statement is a loop, the count is misleading.)
-@cindex functions, user-defined, counts
-@cindex user-defined, functions, counts
+@cindex functions, user-defined, counts, in a profile
+@cindex user-defined, functions, counts, in a profile
@item
For user-defined functions, the count next to the @code{function}
keyword indicates how many times the function was called.
@@ -25927,8 +26218,8 @@ The layout uses ``K&R'' style with TABs.
Braces are used everywhere, even when
the body of an @code{if}, @code{else}, or loop is only a single statement.
-@cindex @code{()} (parentheses)
-@cindex parentheses @code{()}
+@cindex @code{()} (parentheses), in a profile
+@cindex parentheses @code{()}, in a profile
@item
Parentheses are used only where needed, as indicated by the structure
of the program and the precedence rules.
@@ -25963,8 +26254,8 @@ typed when you wrote it. This is because @command{gawk} creates the
profiled version by ``pretty printing'' its internal representation of
the program. The advantage to this is that @command{gawk} can produce
a standard representation. The disadvantage is that all source-code
-comments are lost, as are the distinctions among multiple @code{BEGIN},
-@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as:
+comments are lost.
+Also, things such as:
@example
/foo/
@@ -25984,6 +26275,7 @@ which is correct, but possibly surprising.
@cindex profiling @command{awk} programs, dynamically
@cindex @command{gawk} program, dynamic profiling
+@cindex dynamic profiling
Besides creating profiles when a program has completed,
@command{gawk} can produce a profile while it is running.
This is useful if your @command{awk} program goes into an
@@ -25997,9 +26289,9 @@ $ @kbd{gawk --profile -f myprog &}
@end example
@cindex @command{kill} command@comma{} dynamic profiling
-@cindex @code{USR1} signal
-@cindex @code{SIGUSR1} signal
-@cindex signals, @code{USR1}/@code{SIGUSR1}
+@cindex @code{USR1} signal, for dynamic profiling
+@cindex @code{SIGUSR1} signal, for dynamic profiling
+@cindex signals, @code{USR1}/@code{SIGUSR1}, for profiling
@noindent
The shell prints a job number and process ID number; in this case, 13992.
Use the @command{kill} command to send the @code{USR1} signal
@@ -26030,9 +26322,9 @@ You may send @command{gawk} the @code{USR1} signal as many times as you like.
Each time, the profile and function call trace are appended to the output
profile file.
-@cindex @code{HUP} signal
-@cindex @code{SIGHUP} signal
-@cindex signals, @code{HUP}/@code{SIGHUP}
+@cindex @code{HUP} signal, for dynamic profiling
+@cindex @code{SIGHUP} signal, for dynamic profiling
+@cindex signals, @code{HUP}/@code{SIGHUP}, for profiling
If you use the @code{HUP} signal instead of the @code{USR1} signal,
@command{gawk} produces the profile and the function call trace and then exits.
@@ -26054,6 +26346,11 @@ keyboard. The @code{INT} signal is generated by the
Finally, @command{gawk} also accepts another option, @option{--pretty-print}.
When called this way, @command{gawk} ``pretty prints'' the program into
@file{awkprof.out}, without any execution counts.
+
+@quotation NOTE
+The @option{--pretty-print} option still runs your program.
+This will change in the next major release.
+@end quotation
@c ENDOFRANGE advgaw
@c ENDOFRANGE gawadv
@c ENDOFRANGE awkp
@@ -26165,6 +26462,7 @@ lookup of the translations.
@cindex @code{.po} files
@cindex files, @code{.po}
+@c STARTOFRANGE portobfi
@cindex portable object files
@cindex files, portable object
@item
@@ -26176,6 +26474,7 @@ For example, there might be a @file{fr.po} for a French translation.
@cindex @code{.gmo} files
@cindex files, @code{.gmo}
@cindex message object files
+@c STARTOFRANGE portmsgfi
@cindex files, message object
@item
Each language's @file{.po} file is converted into a binary
@@ -26323,7 +26622,7 @@ String constants marked with a leading underscore
are candidates for translation at runtime.
String constants without a leading underscore are not translated.
-@cindex @code{dcgettext()} function (@command{gawk})
+@cindexgawkfunc{dcgettext}
@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
Return the translation of @var{string} in
text domain @var{domain} for locale category @var{category}.
@@ -26349,7 +26648,7 @@ chosen to be simple and to allow for reasonable @command{awk}-style
default arguments.
@end quotation
-@cindex @code{dcngettext()} function (@command{gawk})
+@cindexgawkfunc{dcngettext}
@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
Return the plural form used for @var{number} of the
translation of @var{string1} and @var{string2} in text domain
@@ -26365,7 +26664,7 @@ The same remarks about argument order as for the @code{dcgettext()} function app
@cindex files, @code{.gmo}, specifying directory of
@cindex message object files, specifying directory of
@cindex files, message object, specifying directory of
-@cindex @code{bindtextdomain()} function (@command{gawk})
+@cindexgawkfunc{bindtextdomain}
@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]})
Change the directory in which
@code{gettext} looks for @file{.gmo} files, in case they
@@ -26467,7 +26766,7 @@ and use translations from @command{awk}.
@cindex portable object files
@cindex files, portable object
Once a program's translatable strings have been marked, they must
-be extracted to create the initial @file{.po} file.
+be extracted to create the initial @file{.pot} file.
As part of translation, it is often helpful to rearrange the order
in which arguments to @code{printf} are output.
@@ -26516,6 +26815,8 @@ second argument to @code{dcngettext()}.@footnote{The
@xref{I18N Example},
for the full list of steps to go through to create and test
translations for @command{guide}.
+@c ENDOFRANGE portobfi
+@c ENDOFRANGE portmsgfi
@node Printf Ordering
@subsection Rearranging @code{printf} Arguments
@@ -26562,7 +26863,7 @@ example, @samp{string} is the first argument and @samp{length(string)} is the se
@example
$ @kbd{gawk 'BEGIN @{}
> @kbd{string = "Dont Panic"}
-> @kbd{printf _"%2$d characters live in \"%1$s\"\n",}
+> @kbd{printf "%2$d characters live in \"%1$s\"\n",}
> @kbd{string, length(string)}
> @kbd{@}'}
@print{} 10 characters live in "Dont Panic"
@@ -26596,7 +26897,7 @@ This is somewhat counterintuitive.
and those with positional specifiers in the same string:
@example
-$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'}
+$ @kbd{gawk 'BEGIN @{ printf "%d %3$s\n", 1, 2, "hi" @}'}
@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none
@end example
@@ -26937,6 +27238,7 @@ The following list defines terms used throughout the rest of
this @value{CHAPTER}.
@table @dfn
+@cindex stack frame
@item Stack Frame
Programs generally call functions during the course of their execution.
One function can call another, or a function can call itself (recursion).
@@ -26958,6 +27260,7 @@ invoked. Commands that print the call stack print information about
each stack frame (as detailed later on).
@item Breakpoint
+@cindex breakpoint
During debugging, you often wish to let the program run until it
reaches a certain point, and then continue execution from there one
statement (or instruction) at a time. The way to do this is to set
@@ -26967,6 +27270,7 @@ take over control of the program's execution. You can add and remove
as many breakpoints as you like.
@item Watchpoint
+@cindex watchpoint
A watchpoint is similar to a breakpoint. The difference is that
breakpoints are oriented around the code: stop when a certain point in the
code is reached. A watchpoint, however, specifies that program execution
@@ -26998,6 +27302,7 @@ by the higher-level @command{awk} commands.
@node Sample Debugging Session
@section Sample Debugging Session
+@cindex sample debugging session
In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample
debugging session. We will use the @command{awk} implementation of the
@@ -27011,13 +27316,16 @@ as our example.
@node Debugger Invocation
@subsection How to Start the Debugger
+@cindex starting the debugger
+@cindex debugger, how to start
-Starting the debugger is almost exactly like running @command{awk}, except you have to
-pass an additional option @option{--debug} or the corresponding short option @option{-D}.
-The file(s) containing the program and any supporting code are given on the command
-line as arguments to one or more @option{-f} options. (@command{gawk} is not designed
-to debug command-line programs, only programs contained in files.) In our case,
-we invoke the debugger like this:
+Starting the debugger is almost exactly like running @command{gawk},
+except you have to pass an additional option @option{--debug} or the
+corresponding short option @option{-D}. The file(s) containing the
+program and any supporting code are given on the command line as arguments
+to one or more @option{-f} options. (@command{gawk} is not designed
+to debug command-line programs, only programs contained in files.)
+In our case, we invoke the debugger like this:
@example
$ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile}
@@ -27150,7 +27458,7 @@ gawk> @kbd{p NR}
@noindent
So we can see that @code{are_equal()} was only called for the second record
-of the file. Of course, this is because our program contained a rule for
+of the file. Of course, this is because our program contains a rule for
@samp{NR == 1}:
@example
@@ -27350,21 +27658,24 @@ controlling breakpoints are:
@cindex debugger commands, @code{break}
@cindex @code{break} debugger command
@cindex @code{b} debugger command (alias for @code{break})
+@cindex set breakpoint
+@cindex breakpoint, setting
@item @code{break} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}]
@itemx @code{b} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}]
Without any argument, set a breakpoint at the next instruction
to be executed in the selected stack frame.
Arguments can be one of the following:
+@c @asis for docbook
@c nested table
-@table @var
-@item n
+@table @asis
+@item @var{n}
Set a breakpoint at line number @var{n} in the current source file.
-@item filename@code{:}n
+@item @var{filename}@code{:}@var{n}
Set a breakpoint at line number @var{n} in source file @var{filename}.
-@item function
+@item @var{function}
Set a breakpoint at entry to (the first instruction of)
function @var{function}.
@end table
@@ -27380,6 +27691,8 @@ it continues executing the program.
@cindex debugger commands, @code{clear}
@cindex @code{clear} debugger command
+@cindex delete breakpoint at location
+@cindex breakpoint at location, how to delete
@item @code{clear} [[@var{filename}@code{:}]@var{n} | @var{function}]
Without any argument, delete any breakpoint at the next instruction
to be executed in the selected stack frame. If the program stops at
@@ -27387,19 +27700,20 @@ a breakpoint, this deletes that breakpoint so that the program
does not stop at that location again. Arguments can be one of the following:
@c nested table
-@table @var
-@item n
+@table @asis
+@item @var{n}
Delete breakpoint(s) set at line number @var{n} in the current source file.
-@item filename@code{:}n
+@item @var{filename}@code{:}@var{n}
Delete breakpoint(s) set at line number @var{n} in source file @var{filename}.
-@item function
+@item @var{function}
Delete breakpoint(s) set at entry to function @var{function}.
@end table
@cindex debugger commands, @code{condition}
@cindex @code{condition} debugger command
+@cindex breakpoint condition
@item @code{condition} @var{n} @code{"@var{expression}"}
Add a condition to existing breakpoint or watchpoint @var{n}. The
condition is an @command{awk} expression that the debugger evaluates
@@ -27413,6 +27727,8 @@ watchpoint is made unconditional.
@cindex debugger commands, @code{delete}
@cindex @code{delete} debugger command
@cindex @code{d} debugger command (alias for @code{delete})
+@cindex delete breakpoint by number
+@cindex breakpoint, delete by number
@item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
@itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
Delete specified breakpoints or a range of breakpoints. Deletes
@@ -27420,6 +27736,8 @@ all defined breakpoints if no argument is supplied.
@cindex debugger commands, @code{disable}
@cindex @code{disable} debugger command
+@cindex disable breakpoint
+@cindex breakpoint, how to disable or enable
@item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}]
Disable specified breakpoints or a range of breakpoints. Without
any argument, disables all breakpoints.
@@ -27428,6 +27746,7 @@ any argument, disables all breakpoints.
@cindex debugger commands, @code{enable}
@cindex @code{enable} debugger command
@cindex @code{e} debugger command (alias for @code{enable})
+@cindex enable breakpoint
@item @code{enable} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
@itemx @code{e} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}]
Enable specified breakpoints or a range of breakpoints. Without
@@ -27447,6 +27766,7 @@ the program stops at the breakpoint.
@cindex debugger commands, @code{ignore}
@cindex @code{ignore} debugger command
+@cindex ignore breakpoint
@item @code{ignore} @var{n} @var{count}
Ignore breakpoint number @var{n} the next @var{count} times it is
hit.
@@ -27455,6 +27775,7 @@ hit.
@cindex debugger commands, @code{tbreak}
@cindex @code{tbreak} debugger command
@cindex @code{t} debugger command (alias for @code{tbreak})
+@cindex temporary breakpoint
@item @code{tbreak} [[@var{filename}@code{:}]@var{n} | @var{function}]
@itemx @code{t} [[@var{filename}@code{:}]@var{n} | @var{function}]
Set a temporary breakpoint (enabled for only one stop).
@@ -27475,6 +27796,8 @@ execution of the program than we saw in our earlier example:
@cindex @code{silent} debugger command
@cindex debugger commands, @code{end}
@cindex @code{end} debugger command
+@cindex breakpoint commands
+@cindex commands to execute at breakpoint
@item @code{commands} [@var{n}]
@itemx @code{silent}
@itemx @dots{}
@@ -27502,6 +27825,7 @@ gawk>
@cindex debugger commands, @code{c} (@code{continue})
@cindex debugger commands, @code{continue}
+@cindex continue program, in debugger
@item @code{continue} [@var{count}]
@itemx @code{c} [@var{count}]
Resume program execution. If continued from a breakpoint and @var{count} is
@@ -27518,6 +27842,7 @@ Print the returned value.
@cindex debugger commands, @code{next}
@cindex @code{next} debugger command
@cindex @code{n} debugger command (alias for @code{next})
+@cindex single-step execution, in the debugger
@item @code{next} [@var{count}]
@itemx @code{n} [@var{count}]
Continue execution to the next source line, stepping over function calls.
@@ -27612,6 +27937,7 @@ items on the list.
@cindex debugger commands, @code{eval}
@cindex @code{eval} debugger command
+@cindex evaluate expressions, in debugger
@item @code{eval "@var{awk statements}"}
Evaluate @var{awk statements} in the context of the running program.
You can do anything that an @command{awk} program would do: assign
@@ -27629,6 +27955,7 @@ parameters defined by the program.
@cindex debugger commands, @code{print}
@cindex @code{print} debugger command
@cindex @code{p} debugger command (alias for @code{print})
+@cindex print variables, in debugger
@item @code{print} @var{var1}[@code{,} @var{var2} @dots{}]
@itemx @code{p} @var{var1}[@code{,} @var{var2} @dots{}]
Print the value of a @command{gawk} variable or field.
@@ -27662,6 +27989,7 @@ No newline is printed unless one is specified.
@cindex debugger commands, @code{set}
@cindex @code{set} debugger command
+@cindex assign values to variables, in debugger
@item @code{set} @var{var}@code{=}@var{value}
Assign a constant (number or string) value to an @command{awk} variable
or field.
@@ -27674,6 +28002,7 @@ You can also set special @command{awk} variables, such as @code{FS},
@cindex debugger commands, @code{watch}
@cindex @code{watch} debugger command
@cindex @code{w} debugger command (alias for @code{watch})
+@cindex set watchpoint
@item @code{watch} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}]
@itemx @code{w} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}]
Add variable @var{var} (or field @code{$@var{n}}) to the watch list.
@@ -27690,12 +28019,14 @@ then the debugger stops execution and prompts for a command. Otherwise,
@cindex debugger commands, @code{undisplay}
@cindex @code{undisplay} debugger command
+@cindex stop automatic display, in debugger
@item @code{undisplay} [@var{n}]
Remove item number @var{n} (or all items, if no argument) from the
automatic display list.
@cindex debugger commands, @code{unwatch}
@cindex @code{unwatch} debugger command
+@cindex delete watchpoint
@item @code{unwatch} [@var{n}]
Remove item number @var{n} (or all items, if no argument) from the
watch list.
@@ -27716,6 +28047,8 @@ functions which called the one you are in. The commands for doing this are:
@cindex debugger commands, @code{backtrace}
@cindex @code{backtrace} debugger command
@cindex @code{bt} debugger command (alias for @code{backtrace})
+@cindex call stack, display in debugger
+@cindex traceback, display in debugger
@item @code{backtrace} [@var{count}]
@itemx @code{bt} [@var{count}]
Print a backtrace of all function calls (stack frames), or innermost @var{count}
@@ -27769,25 +28102,32 @@ The value for @var{what} should be one of the following:
@c nested table
@table @code
@item args
+@cindex show function arguments, in debugger
Arguments of the selected frame.
@item break
+@cindex show breakpoints
List all currently set breakpoints.
@item display
+@cindex automatic displays, in debugger
List all items in the automatic display list.
@item frame
+@cindex describe call stack frame, in debugger
Description of the selected stack frame.
@item functions
+@cindex list function definitions, in debugger
List all function definitions including source file names and
line numbers.
@item locals
+@cindex show local variables, in debugger
Local variables of the selected frame.
@item source
+@cindex show name of current source file, in debugger
The name of the current source file. Each time the program stops, the
current source file is the file containing the current instruction.
When the debugger first starts, the current source file is the first file
@@ -27796,12 +28136,15 @@ included via the @option{-f} option. The
be used at any time to change the current source.
@item sources
+@cindex show all source files, in debugger
List all program sources.
@item variables
+@cindex list all global variables, in debugger
List all global variables.
@item watch
+@cindex show watchpoints
List all items in the watch list.
@end table
@end table
@@ -27815,6 +28158,8 @@ from a file. The commands are:
@cindex debugger commands, @code{option}
@cindex @code{option} debugger command
@cindex @code{o} debugger command (alias for @code{option})
+@cindex display debugger options
+@cindex debugger options
@item @code{option} [@var{name}[@code{=}@var{value}]]
@itemx @code{o} [@var{name}[@code{=}@var{value}]]
Without an argument, display the available debugger options
@@ -27826,30 +28171,37 @@ The available options are:
@c nested table
@table @code
@item history_size
+@cindex debugger history size
The maximum number of lines to keep in the history file @file{./.gawk_history}.
The default is 100.
@item listsize
+@cindex debugger default list amount
The number of lines that @code{list} prints. The default is 15.
@item outfile
+@cindex redirect @command{gawk} output, in debugger
Send @command{gawk} output to a file; debugger output still goes
to standard output. An empty string (@code{""}) resets output to
standard output.
@item prompt
+@cindex debugger prompt
The debugger prompt. The default is @samp{@w{gawk> }}.
@item save_history @r{[}on @r{|} off@r{]}
+@cindex debugger history file
Save command history to file @file{./.gawk_history}.
The default is @code{on}.
@item save_options @r{[}on @r{|} off@r{]}
+@cindex save debugger options
Save current options to file @file{./.gawkrc} upon exit.
The default is @code{on}.
Options are read back in to the next session upon startup.
@item trace @r{[}on @r{|} off@r{]}
+@cindex instruction tracing, in debugger
Turn instruction tracing on or off. The default is @code{off}.
@end table
@@ -27858,6 +28210,7 @@ Save the commands from the current session to the given file name,
so that they can be replayed using the @command{source} command.
@item @code{source} @var{filename}
+@cindex debugger, read commands from a file
Run command(s) from a file; an error in any command does not
terminate execution of subsequent commands. Comments (lines starting
with @samp{#}) are allowed in a command file.
@@ -27956,8 +28309,8 @@ about the command @var{command}.
@cindex debugger commands, @code{list}
@cindex @code{list} debugger command
@cindex @code{l} debugger command (alias for @code{list})
-@item @code{list} [@code{-} | @code{+} | @var{n} | @var{filename@code{:}n} | @var{n}--@var{m} | @var{function}]
-@itemx @code{l} [@code{-} | @code{+} | @var{n} | @var{filename@code{:}n} | @var{n}--@var{m} | @var{function}]
+@item @code{list} [@code{-} | @code{+} | @var{n} | @var{filename}@code{:}@var{n} | @var{n}--@var{m} | @var{function}]
+@itemx @code{l} [@code{-} | @code{+} | @var{n} | @var{filename}@code{:}@var{n} | @var{n}--@var{m} | @var{function}]
Print the specified lines (default 15) from the current source file
or the file named @var{filename}. The possible arguments to @code{list}
are as follows:
@@ -27977,7 +28330,7 @@ Print lines centered around line number @var{n}.
@item @var{n}--@var{m}
Print lines from @var{n} to @var{m}.
-@item @var{filename@code{:}n}
+@item @var{filename}@code{:}@var{n}
Print lines centered around line number @var{n} in
source file @var{filename}. This command may change the current source file.
@@ -27990,6 +28343,7 @@ function @var{function}. This command may change the current source file.
@cindex debugger commands, @code{quit}
@cindex @code{quit} debugger command
@cindex @code{q} debugger command (alias for @code{quit})
+@cindex exit the debugger
@item @code{quit}
@itemx @code{q}
Exit the debugger. Debugging is great fun, but sometimes we all have
@@ -28013,6 +28367,8 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while
@node Readline Support
@section Readline Support
+@cindex command completion, in debugger
+@cindex history expansion, in debugger
If @command{gawk} is compiled with the @code{readline} library, you
can take advantage of that library's command completion and history expansion
@@ -28100,9 +28456,7 @@ be added, and of course feel free to try to add them yourself!
@cindex arbitrary precision
@cindex multiple precision
@cindex infinite precision
-@cindex floating-point numbers, arbitrary precision
-@cindex MPFR
-@cindex GMP
+@cindex floating-point, numbers@comma{} arbitrary precision
@cindex Knuth, Donald
@quotation
@@ -28446,23 +28800,38 @@ then the answer is
@math{2^{53}}.
@end iftex
@ifnottex
+@ifnotdocbook
2^53.
+@end ifnotdocbook
@end ifnottex
+@docbook
+2<superscript>53</superscript>. @c
+@end docbook
The next representable number is the even number
@iftex
@math{2^{53} + 2},
@end iftex
@ifnottex
+@ifnotdocbook
2^53 + 2,
+@end ifnotdocbook
@end ifnottex
+@docbook
+2<superscript>53</superscript> &plus; 2, @c
+@end docbook
meaning it is unlikely that you will be able to make
@command{gawk} print
@iftex
@math{2^{53} + 1}
@end iftex
@ifnottex
+@ifnotdocbook
2^53 + 1
+@end ifnotdocbook
@end ifnottex
+@docbook
+2<superscript>53</superscript> &plus; 1 @c
+@end docbook
in integer format.
The range of integers exactly representable by a 64-bit double
is
@@ -28470,8 +28839,13 @@ is
@math{[-2^{53}, 2^{53}]}.
@end iftex
@ifnottex
+@ifnotdocbook
[@minus{}2^53, 2^53].
+@end ifnotdocbook
@end ifnottex
+@docbook
+[&minus;2<superscript>53</superscript>, 2<superscript>53</superscript>]. @c
+@end docbook
If you ever see an integer outside this range in @command{awk}
using 64-bit doubles, you have reason to be very suspicious about
the accuracy of the output. Here is a simple program with erroneous output:
@@ -28695,8 +29069,13 @@ number is then
@math{s @cdot 2^e}.
@end iftex
@ifnottex
+@ifnotdocbook
@var{s * 2^e}.
+@end ifnotdocbook
@end ifnottex
+@docbook
+<emphasis>s &sdot; 2<superscript>e</superscript></emphasis>. @c
+@end docbook
The first bit of a non-zero binary significand
is always one, so the significand in an IEEE-754 format only includes the
fractional part, leaving the leading one implicit.
@@ -28866,6 +29245,8 @@ when you change the rounding mode.
@node Gawk and MPFR
@section @command{gawk} + MPFR = Powerful Arithmetic
+@cindex MPFR
+@cindex GMP
The rest of this @value{CHAPTER} describes how to use the arbitrary precision
(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric
@@ -28878,12 +29259,17 @@ The easiest way to find out is to look at the output of
the following command:
@example
-$ @kbd{gawk --version}
-@print{} GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2)
-@print{} Copyright (C) 1989, 1991-2013 Free Software Foundation.
+$ @kbd{./gawk --version}
+@print{} GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2)
+@print{} Copyright (C) 1989, 1991-2014 Free Software Foundation.
@dots{}
@end example
+@noindent
+(You may see different version numbers than what's shown here. That's OK;
+what's important is to see that GNU MPFR and GNU MP are listed in
+the output.)
+
@command{gawk} uses the
@uref{http://www.mpfr.org, GNU MPFR}
and
@@ -28937,8 +29323,13 @@ numbers are not implemented.}
(@math{emax = 2^{30} - 1, emin = -emax})
@end iftex
@ifnottex
+@ifnotdocbook
(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax})
+@end ifnotdocbook
@end ifnottex
+@docbook
+(<emphasis>emax</emphasis> = 2<superscript>30</superscript> &minus; 1, <emphasis>emin</emphasis> = &minus;<emphasis>emax</emphasis>) @c
+@end docbook
for all floating-point contexts.
There is no explicit mechanism to adjust the exponent range.
MPFR does not implement subnormal numbers by default,
@@ -28970,6 +29361,7 @@ your program.
@node Setting Precision
@subsection Setting the Working Precision
@cindex @code{PREC} variable
+@cindex setting working precision
@command{gawk} uses a global working precision; it does not keep track of
the precision or accuracy of individual numbers. Performing an arithmetic
@@ -29009,8 +29401,15 @@ formula:
@math{prec = 3.322 @cdot dps}
@end iftex
@ifnottex
+@ifnotdocbook
@var{prec} = 3.322 * @var{dps}
+@end ifnotdocbook
@end ifnottex
+@docbook
+<para>
+<emphasis>prec</emphasis> = 3.322 &sdot; <emphasis>dps</emphasis> @c
+</para>
+@end docbook
@noindent
Here, @var{prec} denotes the binary precision
@@ -29045,6 +29444,7 @@ issues that occur because numbers are stored internally in binary.
@node Setting Rounding Mode
@subsection Setting the Rounding Mode
@cindex @code{ROUNDMODE} variable
+@cindex setting rounding mode
The @code{ROUNDMODE} variable provides
program level control over the rounding mode.
@@ -29112,6 +29512,7 @@ In the first case, the number is stored with the default precision of 53 bits.
@node Changing Precision
@subsection Changing the Precision of a Number
+@cindex changing precision of a number
@cindex Laurie, Dirk
@quotation
@@ -29229,7 +29630,8 @@ the problem at hand is often the correct approach in such situations.
@node Arbitrary Precision Integers
@section Arbitrary Precision Integer Arithmetic with @command{gawk}
-@cindex integer, arbitrary precision
+@cindex integers, arbitrary precision
+@cindex arbitrary precision integers
If one of the options @option{--bignum} or @option{-M} is specified,
@command{gawk} performs all
@@ -29243,8 +29645,13 @@ For example, the following computes
@math{5^{4^{3^{2}}}},
@end iftex
@ifnottex
+@ifnotdocbook
5^4^3^2,
+@end ifnotdocbook
@end ifnottex
+@docbook
+5<superscript>4<superscript>3<superscript>2</superscript></superscript></superscript>, @c
+@end docbook
the result of which is beyond the
limits of ordinary @command{gawk} numbers:
@@ -29266,9 +29673,16 @@ floating-point values instead, the precision needed for correct output
would be @math{3.322 @cdot 183231},
@end iftex
@ifnottex
+@ifnotdocbook
@samp{prec = 3.322 * dps}),
would be 3.322 x 183231,
+@end ifnotdocbook
@end ifnottex
+@docbook
+<emphasis>prec</emphasis> = 3.322 &sdot; <emphasis>dps</emphasis>),
+would be
+<emphasis>prec</emphasis> = 3.322 &sdot; 183231, @c
+@end docbook
or 608693.
The result from an arithmetic operation with an integer and a floating-point value
@@ -29317,7 +29731,7 @@ to begin with:
gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}'
@end example
-Note that for the particular example above, there is likely best
+Note that for the particular example above, it is likely best
to just use the following:
@example
@@ -29326,6 +29740,7 @@ gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
@node Dynamic Extensions
@chapter Writing Extensions for @command{gawk}
+@cindex dynamically loaded extensions
It is possible to add new functions written in C or C++ to @command{gawk} using
dynamically loaded libraries. This facility is available on systems
@@ -29360,6 +29775,7 @@ When @option{--sandbox} is specified, extensions are disabled
@node Extension Intro
@section Introduction
+@cindex plug-in
An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of
external compiled code that @command{gawk} can load at runtime to
provide additional functionality, over and above the built-in capabilities
@@ -29405,8 +29821,14 @@ Communication between
@command{gawk} and an extension is two-way. First, when an extension
is loaded, it is passed a pointer to a @code{struct} whose fields are
function pointers.
+@ifnotdocbook
This is shown in @ref{load-extension}.
+@end ifnotdocbook
+@ifdocbook
+This is shown in @inlineraw{docbook, <xref linkend="load-extension"/>}.
+@end ifdocbook
+@ifnotdocbook
@float Figure,load-extension
@caption{Loading The Extension}
@c FIXME: One day, it should not be necessary to have two cases,
@@ -29419,13 +29841,27 @@ This is shown in @ref{load-extension}.
@center @image{api-figure1, , , Loading the extension}
@end ifnotinfo
@end float
+@end ifnotdocbook
+
+@docbook
+<figure id="load-extension">
+<title>Loading the extension</title>
+<graphic fileref="api-figure1.eps"/>
+</figure>
+@end docbook
The extension can call functions inside @command{gawk} through these
function pointers, at runtime, without needing (link-time) access
to @command{gawk}'s symbols. One of these function pointers is to a
function for ``registering'' new built-in functions.
+@ifnotdocbook
This is shown in @ref{load-new-function}.
+@end ifnotdocbook
+@ifdocbook
+This is shown in @inlineraw{docbook, <xref linkend="load-new-function"/>}.
+@end ifdocbook
+@ifnotdocbook
@float Figure,load-new-function
@caption{Loading The New Function}
@ifinfo
@@ -29435,14 +29871,28 @@ This is shown in @ref{load-new-function}.
@center @image{api-figure2, , , Loading the new function}
@end ifnotinfo
@end float
+@end ifnotdocbook
+
+@docbook
+<figure id="load-new-function">
+<title>Loading the new function</title>
+<graphic fileref="api-figure2.eps"/>
+</figure>
+@end docbook
In the other direction, the extension registers its new functions
with @command{gawk} by passing function pointers to the functions that
provide the new feature (@code{do_chdir()}, for example). @command{gawk}
associates the function pointer with a name and can then call it, using a
defined calling convention.
+@ifnotdocbook
This is shown in @ref{call-new-function}.
+@end ifnotdocbook
+@ifdocbook
+This is shown in @inlineraw{docbook, <xref linkend="call-new-function"/>}.
+@end ifdocbook
+@ifnotdocbook
@float Figure,call-new-function
@caption{Calling The New Function}
@ifinfo
@@ -29452,6 +29902,14 @@ This is shown in @ref{call-new-function}.
@center @image{api-figure3, , , Calling the new function}
@end ifnotinfo
@end float
+@end ifnotdocbook
+
+@docbook
+<figure id="call-new-function">
+<title>Calling The New Function</title>
+<graphic fileref="api-figure3.eps"/>
+</figure>
+@end docbook
The @code{do_@var{xxx}()} function, in turn, then uses the function
pointers in the API @code{struct} to do its work, such as updating
@@ -29488,6 +29946,7 @@ happen, but we all know how @emph{that} goes.)
@node Extension API Description
@section API Description
+@cindex extension API
This (rather large) @value{SECTION} describes the API in detail.
@@ -29495,6 +29954,7 @@ This (rather large) @value{SECTION} describes the API in detail.
* Extension API Functions Introduction:: Introduction to the API functions.
* General Data Types:: The data types.
* Requesting Values:: How to get a value.
+* Memory Allocation Functions:: Functions for allocating memory.
* Constructor Functions:: Functions for creating values.
* Registration Functions:: Functions to register things with
@command{gawk}.
@@ -29550,6 +30010,9 @@ Symbol table access: retrieving a global variable, creating one,
or changing one.
@item
+Allocating, reallocating, and releasing memory.
+
+@item
Creating and releasing cached values; this provides an
efficient way to use values for multiple variables and
can be a big performance win.
@@ -29588,10 +30051,8 @@ corresponding standard header file @emph{before} including @file{gawkapi.h}:
@item @code{EOF} @tab @code{<stdio.h>}
@item @code{FILE} @tab @code{<stdio.h>}
@item @code{NULL} @tab @code{<stddef.h>}
-@item @code{malloc()} @tab @code{<stdlib.h>}
@item @code{memcpy()} @tab @code{<string.h>}
@item @code{memset()} @tab @code{<string.h>}
-@item @code{realloc()} @tab @code{<stdlib.h>}
@item @code{size_t} @tab @code{<sys/types.h>}
@item @code{struct stat} @tab @code{<sys/stat.h>}
@end multitable
@@ -29621,8 +30082,9 @@ does not support this keyword, you should either place
All pointers filled in by @command{gawk} are to memory
managed by @command{gawk} and should be treated by the extension as
read-only. Memory for @emph{all} strings passed into @command{gawk}
-from the extension @emph{must} come from @code{malloc()} and is managed
-by @command{gawk} from then on.
+from the extension @emph{must} come from calling the API-provided function
+pointers @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()},
+and is managed by @command{gawk} from then on.
@item
The API defines several simple @code{struct}s that map values as seen
@@ -29662,6 +30124,8 @@ the macros as if they were functions.
@node General Data Types
@subsection General Purpose Data Types
+@cindex Robbins, Arnold
+@cindex Ramey, Chet
@quotation
@i{I have a true love/hate relationship with unions.}
@author Arnold Robbins
@@ -29702,7 +30166,8 @@ A simple boolean type.
This represents a mutable string. @command{gawk}
owns the memory pointed to if it supplied
the value. Otherwise, it takes ownership of the memory pointed to.
-@strong{Such memory must come from @code{malloc()}!}
+@strong{Such memory must come from calling the API-provided function
+pointers @code{api_malloc()}, @code{api_calloc()}, or @code{api_realloc()}!}
As mentioned earlier, strings are maintained using the current
multibyte encoding.
@@ -29818,7 +30283,94 @@ print an error message, or reissue the request for the actual
value type, as appropriate. This behavior is summarized in
@ref{table-value-types-returned}.
+@c FIXME: Try to do this with spans...
+@ifdocbook
+@anchor{table-value-types-returned}
+@end ifdocbook
+@docbook
+<informaltable>
+<tgroup cols="2">
+ <colspec colwidth="50*"/><colspec colwidth="50*"/>
+ <thead>
+ <row><entry></entry><entry><para>Type of Actual Value:</para></entry></row>
+ </thead>
+ <tbody>
+ <row><entry></entry><entry></entry></row>
+ </tbody>
+</tgroup>
+<tgroup cols="6">
+ <colspec colwidth="16.6*"/>
+ <colspec colwidth="16.6*"/>
+ <colspec colwidth="19.8*"/>
+ <colspec colwidth="15*"/>
+ <colspec colwidth="15*"/>
+ <colspec colwidth="16.6*"/>
+ <thead>
+ <row>
+ <entry></entry>
+ <entry></entry>
+ <entry><para>String</para></entry>
+ <entry><para>Number</para></entry>
+ <entry><para>Array</para></entry>
+ <entry><para>Undefined</para></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry></entry>
+ <entry><para><emphasis role="bold">String</emphasis></para></entry>
+ <entry><para>String</para></entry>
+ <entry><para>String</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ </row>
+ <row>
+ <entry></entry>
+ <entry><para><emphasis role="bold">Number</emphasis></para></entry>
+ <entry><para>Number if can be converted, else false</para></entry>
+ <entry><para>Number</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ </row>
+ <row>
+ <entry><para><emphasis role="bold">Type</emphasis></para></entry>
+ <entry><para><emphasis role="bold">Array</emphasis></para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>Array</para></entry>
+ <entry><para>false</para></entry>
+ </row>
+ <row>
+ <entry><para><emphasis role="bold">Requested:</emphasis></para></entry>
+ <entry><para><emphasis role="bold">Scalar</emphasis></para></entry>
+ <entry><para>Scalar</para></entry>
+ <entry><para>Scalar</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ </row>
+ <row>
+ <entry></entry>
+ <entry><para><emphasis role="bold">Undefined</emphasis></para></entry>
+ <entry><para>String</para></entry>
+ <entry><para>Number</para></entry>
+ <entry><para>Array</para></entry>
+ <entry><para>Undefined</para></entry>
+ </row>
+ <row>
+ <entry></entry>
+ <entry><para><emphasis role="bold">Value Cookie</emphasis></para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para></entry>
+ <entry><para>false</para>
+ </entry><entry><para>false</para></entry>
+ </row>
+ </tbody>
+</tgroup>
+</informaltable>
+@end docbook
+
@ifnotplaintext
+@ifnotdocbook
@float Table,table-value-types-returned
@caption{Value Types Returned}
@multitable @columnfractions .50 .50
@@ -29834,6 +30386,7 @@ value type, as appropriate. This behavior is summarized in
@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false
@end multitable
@end float
+@end ifnotdocbook
@end ifnotplaintext
@ifplaintext
@float Table,table-value-types-returned
@@ -29864,45 +30417,46 @@ value type, as appropriate. This behavior is summarized in
@end float
@end ifplaintext
-@node Constructor Functions
-@subsection Constructor Functions and Convenience Macros
+@node Memory Allocation Functions
+@subsection Memory Allocation Functions and Convenience Macros
+@cindex allocating memory for extensions
+@cindex extensions, allocating memory
-The API provides a number of @dfn{constructor} functions for creating
-string and numeric values, as well as a number of convenience macros.
-This @value{SUBSECTION} presents them all as function prototypes, in
-the way that extension code would use them.
+The API provides a number of @dfn{memory allocation} functions for
+allocating memory that can be passed to @command{gawk}, as well as a number of
+convenience macros.
@table @code
-@item static inline awk_value_t *
-@itemx make_const_string(const char *string, size_t length, awk_value_t *result)
-This function creates a string value in the @code{awk_value_t} variable
-pointed to by @code{result}. It expects @code{string} to be a C string constant
-(or other string data), and automatically creates a @emph{copy} of the data
-for storage in @code{result}. It returns @code{result}.
+@item void *gawk_malloc(size_t size);
+Call @command{gawk}-provided @code{api_malloc()} to allocate storage that may
+be passed to @command{gawk}.
-@item static inline awk_value_t *
-@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result)
-This function creates a string value in the @code{awk_value_t} variable
-pointed to by @code{result}. It expects @code{string} to be a @samp{char *}
-value pointing to data previously obtained from @code{malloc()}. The idea here
-is that the data is passed directly to @command{gawk}, which assumes
-responsibility for it. It returns @code{result}.
+@item void *gawk_calloc(size_t nmemb, size_t size);
+Call @command{gawk}-provided @code{api_calloc()} to allocate storage that may
+be passed to @command{gawk}.
-@item static inline awk_value_t *
-@itemx make_null_string(awk_value_t *result)
-This specialized function creates a null string (the ``undefined'' value)
-in the @code{awk_value_t} variable pointed to by @code{result}.
-It returns @code{result}.
+@item void *gawk_realloc(void *ptr, size_t size);
+Call @command{gawk}-provided @code{api_realloc()} to allocate storage that may
+be passed to @command{gawk}.
-@item static inline awk_value_t *
-@itemx make_number(double num, awk_value_t *result)
-This function simply creates a numeric value in the @code{awk_value_t} variable
-pointed to by @code{result}.
+@item void gawk_free(void *ptr);
+Call @command{gawk}-provided @code{api_free()} to release storage that was
+allocated with @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}.
@end table
-Two convenience macros may be used for allocating storage from @code{malloc()}
-and @code{realloc()}. If the allocation fails, they cause @command{gawk} to
-exit with a fatal error message. They should be used as if they were
+The API has to provide these functions because it is possible
+for an extension to be compiled and linked against a different
+version of the C library than was used for the @command{gawk}
+executable.@footnote{This is more common on MS-Windows systems, but
+can happen on Unix-like systems as well.} If @command{gawk} were
+to use its version of @code{free()} when the memory came from an
+unrelated version of @code{malloc()}, unexpected behavior would
+likely result.
+
+Two convenience macros may be used for allocating storage
+from the API-provided function pointers @code{api_malloc()} and
+@code{api_realloc()}. If the allocation fails, they cause @command{gawk}
+to exit with a fatal error message. They should be used as if they were
procedure calls that do not return a value.
@table @code
@@ -29914,7 +30468,7 @@ The arguments to this macro are as follows:
The pointer variable to point at the allocated storage.
@item type
-The type of the pointer variable, used to create a cast for the call to @code{malloc()}.
+The type of the pointer variable, used to create a cast for the call to @code{api_malloc()}.
@item size
The total number of bytes to be allocated.
@@ -29938,13 +30492,51 @@ make_malloced_string(message, strlen(message), & result);
@end example
@item #define erealloc(pointer, type, size, message) @dots{}
-This is like @code{emalloc()}, but it calls @code{realloc()},
-instead of @code{malloc()}.
+This is like @code{emalloc()}, but it calls @code{api_realloc()},
+instead of @code{api_malloc()}.
The arguments are the same as for the @code{emalloc()} macro.
@end table
+@node Constructor Functions
+@subsection Constructor Functions
+
+The API provides a number of @dfn{constructor} functions for creating
+string and numeric values, as well as a number of convenience macros.
+This @value{SUBSECTION} presents them all as function prototypes, in
+the way that extension code would use them.
+
+@table @code
+@item static inline awk_value_t *
+@itemx make_const_string(const char *string, size_t length, awk_value_t *result)
+This function creates a string value in the @code{awk_value_t} variable
+pointed to by @code{result}. It expects @code{string} to be a C string constant
+(or other string data), and automatically creates a @emph{copy} of the data
+for storage in @code{result}. It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result)
+This function creates a string value in the @code{awk_value_t} variable
+pointed to by @code{result}. It expects @code{string} to be a @samp{char *}
+value pointing to data previously obtained from the api-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}. The idea here
+is that the data is passed directly to @command{gawk}, which assumes
+responsibility for it. It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_null_string(awk_value_t *result)
+This specialized function creates a null string (the ``undefined'' value)
+in the @code{awk_value_t} variable pointed to by @code{result}.
+It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_number(double num, awk_value_t *result)
+This function simply creates a numeric value in the @code{awk_value_t} variable
+pointed to by @code{result}.
+@end table
+
@node Registration Functions
@subsection Registration Functions
+@cindex register extension
+@cindex extension registration
This @value{SECTION} describes the API functions for
registering parts of your extension with @command{gawk}.
@@ -29989,8 +30581,8 @@ Letter case in function names is significant.
This is a pointer to the C function that provides the desired
functionality.
The function must fill in the result with either a number
-or a string. @command{awk} takes ownership of any string memory.
-As mentioned earlier, string memory @strong{must} come from @code{malloc()}.
+or a string. @command{gawk} takes ownership of any string memory.
+As mentioned earlier, string memory @strong{must} come from the api-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}.
The @code{num_actual_args} argument tells the C function how many
actual parameters were passed from the calling @command{awk} code.
@@ -30066,6 +30658,7 @@ is invoked with the @option{--version} option.
@node Input Parsers
@subsubsection Customized Input Parsers
+@cindex customized input parser
By default, @command{gawk} reads text files as its input. It uses the value
of @code{RS} to find the end of the record, and then uses @code{FS}
@@ -30313,7 +30906,9 @@ Register the input parser pointed to by @code{input_parser} with
@node Output Wrappers
@subsubsection Customized Output Wrappers
+@cindex customized output wrapper
+@cindex output wrapper
An @dfn{output wrapper} is the mirror image of an input parser.
It allows an extension to take over the output to a file opened
with the @samp{>} or @samp{>>} I/O redirection operators (@pxref{Redirection}).
@@ -30427,6 +31022,7 @@ Register the output wrapper pointed to by @code{output_wrapper} with
@node Two-way processors
@subsubsection Customized Two-way Processors
+@cindex customized two-way processor
A @dfn{two-way processor} combines an input parser and an output wrapper for
two-way I/O with the @samp{|&} operator (@pxref{Redirection}). It makes identical
@@ -30484,6 +31080,8 @@ Register the two-way processor pointed to by @code{two_way_processor} with
@node Printing Messages
@subsection Printing Messages
+@cindex printing messages from extensions
+@cindex messages from extensions
You can print different kinds of warning messages from your
extension, as described below. Note that for these functions,
@@ -30557,6 +31155,7 @@ for more information on creating arrays.
@node Symbol Table Access
@subsection Symbol Table Access
+@cindex accessing global variables from extensions
Two sets of routines provide access to global variables, and one set
allows you to create and release cached values.
@@ -30602,6 +31201,13 @@ An extension can look up the value of @command{gawk}'s special variables.
However, with the exception of the @code{PROCINFO} array, an extension
cannot change any of those variables.
+@quotation NOTE
+It is possible for the lookup of @code{PROCINFO} to fail. This happens if
+the @command{awk} program being run does not reference @code{PROCINFO};
+in this case @command{gawk} doesn't bother to create the array and
+populate it.
+@end quotation
+
@node Symbol table by cookie
@subsubsection Variable Access and Update by Cookie
@@ -30728,7 +31334,7 @@ assign those values to variables using @code{sym_update()}
or @code{sym_update_scalar()}, as you like.
However, you can understand the point of cached values if you remember that
-@emph{every} string value's storage @emph{must} come from @code{malloc()}.
+@emph{every} string value's storage @emph{must} come from @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()}.
If you have 20 variables, all of which have the same string value, you
must create 20 identical copies of the string.@footnote{Numeric values
are clearly less problematic, requiring only a C @code{double} to store.}
@@ -30814,6 +31420,7 @@ you should release any cached values that you created, using
@node Array Manipulation
@subsection Array Manipulation
+@cindex array manipulation in extensions
The primary data structure@footnote{Okay, the only data structure.} in @command{awk}
is the associative array (@pxref{Arrays}).
@@ -30925,7 +31532,7 @@ requires that you understand how such values are converted to strings
(@pxref{Conversion}); thus using integral values is safest.
As with @emph{all} strings passed into @code{gawk} from an extension,
-the string value of @code{index} must come from @code{malloc()}, and
+the string value of @code{index} must come from the API-provided functions @code{api_malloc()}, @code{api_calloc()} or @code{api_realloc()} and
@command{gawk} releases the storage.
@item awk_bool_t set_array_element(awk_array_t a_cookie,
@@ -31393,6 +32000,8 @@ information about how @command{gawk} was invoked.
@node Extension Versioning
@subsubsection API Version Constants and Variables
+@cindex API version
+@cindex extension API version
The API provides both a ``major'' and a ``minor'' version number.
The API versions are available at compile time as constants:
@@ -31446,6 +32055,8 @@ provided in @file{gawkapi.h} (discussed later, in
@node Extension API Informational Variables
@subsubsection Informational Variables
+@cindex API informational variables
+@cindex extension API informational variables
The API provides access to several variables that describe
whether the corresponding command-line options were enabled when
@@ -31591,6 +32202,8 @@ the version string with @command{gawk}.
@node Finding Extensions
@section How @command{gawk} Finds Extensions
+@cindex extension search path
+@cindex finding extensions
Compiled extensions have to be installed in a directory where
@command{gawk} can find them. If @command{gawk} is configured and
@@ -31601,6 +32214,7 @@ path with a list of directories to search for compiled extensions.
@node Extension Example
@section Example: Some File Functions
+@cindex extension example
@quotation
@i{No matter where you go, there you are.}
@@ -32059,7 +32673,7 @@ do_stat(int nargs, awk_value_t *result)
awk_array_t array;
int ret;
struct stat sbuf;
- /* default is stat() */
+ /* default is lstat() */
int (*statfunc)(const char *path, struct stat *sbuf) = lstat;
assert(result != NULL);
@@ -32245,6 +32859,7 @@ $ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk}
@node Extension Samples
@section The Sample Extensions In The @command{gawk} Distribution
+@cindex extensions distributed with @command{gawk}
This @value{SECTION} provides brief overviews of the sample extensions
that come in the @command{gawk} distribution. Some of them are intended
@@ -32287,7 +32902,7 @@ upon success or less than zero upon error. In the latter case it updates
@code{ERRNO}.
@cindex @code{stat()} extension function
-@item result = stat("/some/path", statdata [, follow])
+@item result = stat("/some/path", statdata @r{[}, follow@r{]})
The @code{stat()} function provides a hook into the
@code{stat()} system call.
It returns zero upon success or less than zero upon error.
@@ -32497,19 +33112,23 @@ See @file{test/fts.awk} in the @command{gawk} distribution for an example.
@node Extension Sample Fnmatch
@subsection Interface To @code{fnmatch()}
-@cindex @code{fnmatch()} extension function
This extension provides an interface to the C library
@code{fnmatch()} function. The usage is:
-@example
-@@load "fnmatch"
+@table @code
+@item @@load "fnmatch"
+This is how you load the extension.
-result = fnmatch(pattern, string, flags)
-@end example
+@cindex @code{fnmatch()} extension function
+@item result = fnmatch(pattern, string, flags)
+The return value is zero on success, @code{FNM_NOMATCH}
+if the string did not match the pattern, or
+a different non-zero value if an error occurred.
+@end table
-The @code{fnmatch} extension adds a single function named
-@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of
-flag values named @code{FNM}.
+Besides the @code{fnmatch()} function, the @code{fnmatch} extension
+adds one constant (@code{FNM_NOMATCH}), and an array of flag values
+named @code{FNM}.
The arguments to @code{fnmatch()} are:
@@ -32525,10 +33144,6 @@ Either zero, or the bitwise OR of one or more of the
flags in the @code{FNM} array.
@end table
-The return value is zero on success, @code{FNM_NOMATCH}
-if the string did not match the pattern, or
-a different non-zero value if an error occurred.
-
The flags are follows:
@multitable @columnfractions .25 .75
@@ -32572,15 +33187,15 @@ This is how you load the extension.
@cindex @code{fork()} extension function
@item pid = fork()
-This function creates a new process. The return value is the zero in the
-child and the process-id number of the child in the parent, or @minus{}1
+This function creates a new process. The return value is zero in the
+child and the process-ID number of the child in the parent, or @minus{}1
upon error. In the latter case, @code{ERRNO} indicates the problem.
In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are
updated to reflect the correct values.
@cindex @code{waitpid()} extension function
@item ret = waitpid(pid)
-This function takes a numeric argument, which is the process-id to
+This function takes a numeric argument, which is the process-ID to
wait for. The return value is that of the
@code{waitpid()} system call.
@@ -32836,7 +33451,7 @@ ret = reada("arraydump.bin", array)
@subsection Reading An Entire File
The @code{readfile} extension adds a single function
-named @code{readfile()}:
+named @code{readfile()}, and an input parser:
@table @code
@item @@load "readfile"
@@ -32847,6 +33462,12 @@ This is how you load the extension.
The argument is the name of the file to read. The return value is a
string containing the entire contents of the requested file. Upon error,
the function returns the empty string and sets @code{ERRNO}.
+
+@item BEGIN @{ PROCINFO["readfile"] = 1 @}
+In addition, the extension adds an input parser that is activated if
+@code{PROCINFO["readfile"]} exists.
+When activated, each input file is returned in its entirety as @code{$0}.
+@code{RT} is set to the null string.
@end table
Here is an example:
@@ -32905,6 +33526,8 @@ tries to use @code{nanosleep()} or @code{select()} to implement the delay.
@node gawkextlib
@section The @code{gawkextlib} Project
+@cindex @code{gawkextlib}
+@cindex extensions, where to find
@cindex @code{gawkextlib} project
The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}}
@@ -32938,6 +33561,7 @@ The @code{time} extension described earlier (@pxref{Extension Sample
Time}) was originally from this project but has been moved in to the
main @command{gawk} distribution.
+@cindex @command{git} utility
You can check out the code for the @code{gawkextlib} project
using the @uref{http://git-scm.com, GIT} distributed source
code control system. The command is as follows:
@@ -34204,7 +34828,7 @@ The @option{-i} and @option{--include} options
load @command{awk} library files.
@item
-The @option{-l} and @option{--load} options for load compiled dynamic extensions.
+The @option{-l} and @option{--load} options load compiled dynamic extensions.
@item
The @option{-M} and @option{--bignum} options enable MPFR.
@@ -34225,7 +34849,7 @@ Support for high precision arithmetic with MPFR.
@item
The @code{and()}, @code{or()} and @code{xor()} functions
-allow any number of arguments,
+changed to allow any number of arguments,
with a minimum of two
(@pxref{Bitwise Functions}).
@@ -34250,18 +34874,18 @@ the three most widely-used freely available versions of @command{awk}
@multitable {@file{/dev/stderr} special file} {BWK Awk} {Mawk} {GNU Awk}
@headitem Feature @tab BWK Awk @tab Mawk @tab GNU Awk
@item @samp{\x} Escape sequence @tab X @tab X @tab X
-@item @code{RS} as regexp @tab @tab X @tab X
@item @code{FS} as null string @tab X @tab X @tab X
@item @file{/dev/stdin} special file @tab X @tab X @tab X
@item @file{/dev/stdout} special file @tab X @tab X @tab X
@item @file{/dev/stderr} special file @tab X @tab X @tab X
-@item @code{**} and @code{**=} operators @tab X @tab @tab X
-@item @code{fflush()} function @tab X @tab X @tab X
-@item @code{func} keyword @tab X @tab @tab X
-@item @code{nextfile} statement @tab X @tab X @tab X
@item @code{delete} without subscript @tab X @tab X @tab X
+@item @code{fflush()} function @tab X @tab X @tab X
@item @code{length()} of an array @tab X @tab X @tab X
+@item @code{nextfile} statement @tab X @tab X @tab X
+@item @code{**} and @code{**=} operators @tab X @tab @tab X
+@item @code{func} keyword @tab X @tab @tab X
@item @code{BINMODE} variable @tab @tab X @tab X
+@item @code{RS} as regexp @tab @tab X @tab X
@item Time related functions @tab @tab X @tab X
@end multitable
@@ -34292,7 +34916,7 @@ as working in this fashion, and in particular, would teach that the
that @samp{[A-Z]} was the ``correct'' way to match uppercase letters.
And indeed, this was true.@footnote{And Life was good.}
-The 1993 POSIX standard introduced the idea of locales (@pxref{Locales}).
+The 1992 POSIX standard introduced the idea of locales (@pxref{Locales}).
Since many locales include other letters besides the plain twenty-six
letters of the American English alphabet, the POSIX standard added
character classes (@pxref{Bracket Expressions}) as a way to match
@@ -34331,6 +34955,7 @@ This output is unexpected, since the @samp{bc} at the end of
This result is due to the locale setting (and thus you may not see
it on your system).
+@cindex Unicode
Similar considerations apply to other ranges. For example, @samp{["-/]}
is perfectly valid in ASCII, but is not valid in many Unicode locales,
such as @samp{en_US.UTF-8}.
@@ -34344,16 +34969,17 @@ vendors started implementing non-ASCII locales, @emph{and making them
the default}. Perhaps the most frequently asked question became something
like ``why does @samp{[A-Z]} match lowercase letters?!?''
+@cindex Berry, Karl
This situation existed for close to 10 years, if not more, and
the @command{gawk} maintainer grew weary of trying to explain that
@command{gawk} was being nicely standards-compliant, and that the issue
was in the user's locale. During the development of version 4.0,
he modified @command{gawk} to always treat ranges in the original,
pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).@footnote{And
-thus was born the Campain for Rational Range Interpretation (or RRI). A number
-of GNU tools, such as @command{grep} and @command{sed}, have either
-implemented this change, or will soon. Thanks to Karl Berry for coining the phrase
-``Rational Range Interpretation.''}
+thus was born the Campaign for Rational Range Interpretation (or
+RRI). A number of GNU tools have either implemented this change,
+or will soon. Thanks to Karl Berry for coining the phrase ``Rational
+Range Interpretation.''}
Fortunately, shortly before the final release of @command{gawk} 4.0,
the maintainer learned that the 2008 standard had changed the
@@ -34366,7 +34992,7 @@ and
By using this lovely technical term, the standard gives license
to implementors to implement ranges in whatever way they choose.
The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all
-cases: the default regexp matching; with @option{--traditional}, and with
+cases: the default regexp matching; with @option{--traditional} and with
@option{--posix}; in all cases, @command{gawk} remains POSIX compliant.
@node Contributors
@@ -34566,6 +35192,11 @@ environments.
Anders Wallin helped keep the VMS port going for several years.
@item
+@cindex Gordon, Assaf
+Assaf Gordon contributed the code to implement the
+@option{--sandbox} option.
+
+@item
@cindex Haque, John
John Haque made the following contributions:
@@ -34609,6 +35240,11 @@ Arnold Robbins and Andrew Schorr, with notable contributions from
the rest of the development team.
@item
+@cindex Colombo, Antonio
+Antonio Giovanni Colombo rewrote a number of examples in the early
+chapters that were severely dated, for which I am incredibly grateful.
+
+@item
@cindex Robbins, Arnold
Arnold Robbins
has been working on @command{gawk} since 1988, at first
@@ -35007,7 +35643,7 @@ please send in a bug report (@pxref{Bugs}).
Of course, once you've built @command{gawk}, it is likely that you will
wish to install it. To do so, you need to run the command @samp{make
-check}, as a user with the appropriate permissions. How to do this
+install}, as a user with the appropriate permissions. How to do this
varies by system, but on many systems you can use the @command{sudo}
command to do so. The command then becomes @samp{sudo make install}. It
is likely that you will be asked for your password, and you will have
@@ -35333,11 +35969,10 @@ multibyte functionality is not available.
@c STARTOFRANGE pcgawon
@cindex PC operating systems, @command{gawk} on
-With the exception of the Cygwin environment,
-the @samp{|&} operator and TCP/IP networking
-(@pxref{TCP/IP Networking})
-are not supported for MS-DOS or MS-Windows. EMX (OS/2 only) does support
-at least the @samp{|&} operator.
+Under MS-DOS and MS-Windows, the Cygwin and MinGW environments support
+both the @samp{|&} operator and TCP/IP networking
+(@pxref{TCP/IP Networking}).
+EMX (OS/2 only) supports at least the @samp{|&} operator.
@cindex search paths
@cindex search paths, for source files
@@ -35467,7 +36102,7 @@ moved into the @code{BEGIN} rule.
@command{gawk} can be built and used ``out of the box'' under MS-Windows
if you are using the @uref{http://www.cygwin.com, Cygwin environment}.
-This environment provides an excellent simulation of Unix, using the
+This environment provides an excellent simulation of GNU/Linux, using the
GNU tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make,
and other GNU programs. Compilation and installation for Cygwin is the
same as for a Unix system:
@@ -35483,13 +36118,6 @@ When compared to GNU/Linux on the same system, the @samp{configure}
step on Cygwin takes considerably longer. However, it does finish,
and then the @samp{make} proceeds as usual.
-@quotation NOTE
-The @samp{|&} operator and TCP/IP networking
-(@pxref{TCP/IP Networking})
-are fully supported in the Cygwin environment. This is not true
-for any other environment on MS-Windows.
-@end quotation
-
@node MSYS
@appendixsubsubsec Using @command{gawk} In The MSYS Environment
@@ -35557,21 +36185,14 @@ can better handle @code{ODS-5} volumes with upper- and lowercase filenames.
With @code{ODS-5} volumes and extended parsing enabled, the case of the target
parameter may need to be exact.
-Older versions of @command{gawk} could be built with VAX C or
-GNU C on VAX/VMS, as well as with DEC C, but that is no longer
-supported. DEC C (also briefly known as ``Compaq C'' and now known
-as ``HP C,'' but referred to here as ``DEC C'') is required. Both
-@code{vmsbuild.com} and @code{descrip.mms} contain some obsolete support
-for the older compilers but are set up to use DEC C by default.
-
@command{gawk} has been tested under VAX/VMS 7.3 and Alpha/VMS 7.3-1
using Compaq C V6.4, and Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3.
The most recent builds used HP C V7.3 on Alpha VMS 8.3 and both
Alpha and IA64 VMS 8.4 used HP C 7.3.@footnote{The IA64 architecture
is also known as ``Itanium.''}
-Work is currently being done for a procedure to build @command{gawk} and create
-a PCSI kit for compatible with the GNV product.
+The @file{[.vms]gawk_build_steps.txt} provides information on how to build
+@command{gawk} into a PCSI kit that is compatible with the GNV product.
@node VMS Dynamic Extensions
@appendixsubsubsec Compiling @command{gawk} Dynamic Extensions on VMS
@@ -35668,7 +36289,7 @@ add the @command{gawk} and @command{awk} to the system wide @samp{DCLTABLES}.
The DCL syntax is documented in the @file{gawk.hlp} file.
-Optionally, @file{gawk.hlp} entry can be loaded into a VMS help library:
+Optionally, the @file{gawk.hlp} entry can be loaded into a VMS help library:
@example
$ @kbd{LIBRARY/HELP sys$help:helplib [.vms]gawk.hlp}
@@ -35923,22 +36544,23 @@ file should be considered authoritative if it conflicts with this
The people maintaining the non-Unix ports of @command{gawk} are
as follows:
-@multitable {MS-Windows with MINGW} {123456789012345678901234567890123456789001234567890}
+@c put the index entries outside the table, for docbook
@cindex Deifik, Scott
+@cindex Zaretskii, Eli
+@cindex Buening, Andreas
+@cindex Rankin, Pat
+@cindex Malmberg, John
+@cindex Pitts, Dave
+@multitable {MS-Windows with MINGW} {123456789012345678901234567890123456789001234567890}
@item MS-DOS with DJGPP @tab Scott Deifik, @EMAIL{scottd.mail@@sbcglobal.net,scottd dot mail at sbcglobal dot net}.
-@cindex Zaretskii, Eli
@item MS-Windows with MINGW @tab Eli Zaretskii, @EMAIL{eliz@@gnu.org,eliz at gnu dot org}.
-@cindex Buening, Andreas
@item OS/2 @tab Andreas Buening, @EMAIL{andreas.buening@@nexgo.de,andreas dot buening at nexgo dot de}.
-@cindex Rankin, Pat
-@cindex Malmberg, John
@item VMS @tab Pat Rankin, @EMAIL{r.pat.rankin@@gmail.com,r.pat.rankin at gmail.com}, and
John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net}.
-@cindex Pitts, Dave
@item z/OS (OS/390) @tab Dave Pitts, @EMAIL{dpitts@@cozx.com,dpitts at cozx dot com}.
@end multitable
@@ -35973,7 +36595,7 @@ This @value{SECTION} briefly describes where to get them:
@cindex Kernighan, Brian
@cindex source code, Brian Kernighan's @command{awk}
@cindex @command{awk}, versions of, See Also Brian Kernighan's @command{awk}
-@cindex Brian Kernighan's @command{awk}
+@cindex Brian Kernighan's @command{awk}, source code
@item Unix @command{awk}
Brian Kernighan, one of the original designers of Unix @command{awk},
has made his implementation of
@@ -35993,6 +36615,7 @@ It is available in several archive formats:
@uref{http://www.cs.princeton.edu/~bwk/btl.mirror/awk.zip}
@end table
+@cindex @command{git} utility
You can also retrieve it from Git Hub:
@example
@@ -36012,7 +36635,7 @@ from GCC (the GNU Compiler Collection) works quite nicely.
for a list of extensions in this @command{awk} that are not in POSIX @command{awk}.
@cindex Brennan, Michael
-@cindex @command{mawk} program
+@cindex @command{mawk} utility
@cindex source code, @command{mawk}
@item @command{mawk}
Michael Brennan wrote an independent implementation of @command{awk},
@@ -36058,7 +36681,7 @@ To get @command{awka}, go to @url{http://sourceforge.net/projects/awka}.
The project seems to be frozen; no new code changes have been made
since approximately 2003.
-@cindex Beebe, Nelson
+@cindex Beebe, Nelson H.F.@:
@cindex @command{pawk} (profiling version of Brian Kernighan's @command{awk})
@cindex source code, @command{pawk}
@item @command{pawk}
@@ -36121,6 +36744,7 @@ This is an embeddable @command{awk} interpreter derived from
@uref{http://repo.hu/projects/libmawk/}.
@item @code{pawk}
+@cindex source code, @command{pawk} (Python version)
@cindex @code{pawk}, @command{awk}-like facilities for Python
This is a Python module that claims to bring @command{awk}-like
features to Python. See @uref{https://github.com/alecthomas/pawk}
@@ -36226,6 +36850,7 @@ As @command{gawk} is Free Software, the source code is always available.
@ref{Gawk Distribution}, describes how to get and build the formal,
released versions of @command{gawk}.
+@cindex @command{git} utility
However, if you want to modify @command{gawk} and contribute back your
changes, you will probably wish to work with the development version.
To do so, you will need to access the @command{gawk} source code
@@ -36401,6 +37026,7 @@ If possible, please update the @command{man} page as well.
You will also have to sign paperwork for your documentation changes.
+@cindex @command{git} utility
@item
Submit changes as unified diffs.
Use @samp{diff -u -r -N} to compare
@@ -36534,6 +37160,8 @@ coding style and brace layout that suits your taste.
@node Derived Files
@appendixsubsec Why Generated Files Are Kept In @command{git}
+@c STARTOFRANGE gawkgit
+@cindex @command{git}, use of for @command{gawk} source code
@c From emails written March 22, 2012, to the gawk developers list.
If you look at the @command{gawk} source in the @command{git}
@@ -36713,7 +37341,7 @@ wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.ta
@noindent
to retrieve a snapshot of the given branch.
-
+@c ENDOFRANGE gawkgit
@node Future Extensions
@appendixsec Probable Future Extensions
@@ -37084,8 +37712,15 @@ other introductory texts that you should refer to instead.)
@cindex processing data
At the most basic level, the job of a program is to process
-some input data and produce results. See @ref{figure-general-flow}.
+some input data and produce results.
+@ifnotdocbook
+See @ref{figure-general-flow}.
+@end ifnotdocbook
+@ifdocbook
+See @inlineraw{docbook, <xref linkend="figure-general-flow"/>}.
+@end ifdocbook
+@ifnotdocbook
@float Figure,figure-general-flow
@caption{General Program Flow}
@ifinfo
@@ -37095,6 +37730,14 @@ some input data and produce results. See @ref{figure-general-flow}.
@center @image{general-program, , , General program flow}
@end ifnotinfo
@end float
+@end ifnotdocbook
+
+@docbook
+<figure id="figure-general-flow">
+<title>General Program Flow</title>
+<graphic fileref="general-program.eps"/>
+</figure>
+@end docbook
@cindex compiled programs
@cindex interpreted programs
@@ -37110,9 +37753,15 @@ instructions in your program to process the data.
@cindex programming, basic steps
When you write a program, it usually consists
-of the following, very basic set of steps, as shown
-in @ref{figure-process-flow}:
+of the following, very basic set of steps,
+@ifnotdocbook
+as shown in @ref{figure-process-flow}:
+@end ifnotdocbook
+@ifdocbook
+as shown in @inlineraw{docbook <xref linkend="figure-process-flow"/>}:
+@end ifdocbook
+@ifnotdocbook
@float Figure,figure-process-flow
@caption{Basic Program Steps}
@ifinfo
@@ -37122,6 +37771,14 @@ in @ref{figure-process-flow}:
@center @image{process-flow, , , Basic Program Stages}
@end ifnotinfo
@end float
+@end ifnotdocbook
+
+@docbook
+<figure id="figure-process-flow">
+<title>Basic Program Stages</title>
+<graphic fileref="process-flow.eps"/>
+</figure>
+@end docbook
@table @asis
@item Initialization
@@ -37292,7 +37949,7 @@ better written in another language.
You can get it from @uref{http://awk.info/?awk100/aaa}.
@cindex Ada programming language
-@cindex Programming languages, Ada
+@cindex programming languages, Ada
@item Ada
A programming language originally defined by the U.S.@: Department of
Defense for embedded programming. It was designed to enforce good
@@ -37360,9 +38017,6 @@ The GNU version of the standard shell
@end ifinfo
See also ``Bourne Shell.''
-@item BBS
-See ``Bulletin Board System.''
-
@item Bit
Short for ``Binary Digit.''
All values in computer memory ultimately reduce to binary digits: values
@@ -37437,11 +38091,6 @@ Changing some of them affects @command{awk}'s running environment.
@item Braces
See ``Curly Braces.''
-@item Bulletin Board System
-A computer system allowing users to log in and read and/or leave messages
-for other users of the system, much like leaving paper notes on a bulletin
-board.
-
@item C
The system programming language that most GNU software is written in. The
@command{awk} programming language has C-like syntax, and this @value{DOCUMENT}
@@ -37468,6 +38117,8 @@ The @uref{http://www.unicode.org, Unicode character set} is
becoming increasingly popular and standard, and is particularly
widely used on GNU/Linux systems.
+@cindex Kernighan, Brian
+@cindex Bentley, Jon
@cindex @command{chem} utility
@item CHEM
A preprocessor for @command{pic} that reads descriptions of molecules
@@ -37604,7 +38255,7 @@ ordinary expression. It could be a string constant, such as
(@xref{Computed Regexps}.)
@item Environment
-A collection of strings, of the form @var{name@code{=}val}, that each
+A collection of strings, of the form @var{name}@code{=}@code{val}, that each
program has available to it. Users generally place values into the
environment in order to provide information to various programs. Typical
examples are the environment variables @env{HOME} and @env{PATH}.
@@ -37773,7 +38424,7 @@ information about the name of the organization and its language-independent
three-letter acronym.
@cindex Java programming language
-@cindex Programming languages, Java
+@cindex programming languages, Java
@item Java
A modern programming language originally developed by Sun Microsystems
(now Oracle) supporting Object-Oriented programming. Although usually
@@ -38060,7 +38711,12 @@ record or a string.
@c The GNU General Public License.
@node Copying
@unnumbered GNU General Public License
+@ifnotdocbook
@center Version 3, 29 June 2007
+@end ifnotdocbook
+@docbook
+<subtitle>Version 3, 29 June 2007</subtitle>
+@end docbook
@c This file is intended to be included within another document,
@c hence no sectioning command or @node.
@@ -38785,10 +39441,17 @@ first, please read @url{http://www.gnu.org/philosophy/why-not-lgpl.html}.
@c The GNU Free Documentation License.
@node GNU Free Documentation License
@unnumbered GNU Free Documentation License
+@ifnotdocbook
+@center Version 1.3, 3 November 2008
+@end ifnotdocbook
+
+@docbook
+<subtitle>Version 1.3, 3 November 2008</subtitle>
+@end docbook
+
@cindex FDL (Free Documentation License)
@cindex Free Documentation License (FDL)
@cindex GNU Free Documentation License
-@center Version 1.3, 3 November 2008
@c This file is intended to be included within another document,
@c hence no sectioning command or @node.
@@ -39293,8 +39956,10 @@ to permit their use in free software.
@c ispell-local-pdict: "ispell-dict"
@c End:
+@ifnotdocbook
@node Index
@unnumbered Index
+@end ifnotdocbook
@printindex cp
@bye
@@ -39405,8 +40070,6 @@ Suggestions:
% Next edition:
% 1. Standardize the error messages from the functions and programs
% in the two sample code chapters.
-% 2. Nuke the BBS stuff and use something that won't be obsolete
-% 3. Turn the advanced notes into sidebars by using @cartouche
Better sidebars can almost sort of be done with:
@@ -39438,4 +40101,3 @@ But to use it you have to say
}
which sorta sucks.
-