Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 133
1 files changed, 101 insertions, 32 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 81b36ae5..7fc342c3 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -165,6 +165,19 @@
@end macro
@end ifdocbook
+@c hack for docbook, where comma shouldn't always follow an @ref{}
+@ifdocbook
+@macro DBREF{text}
+@ref{\text\}
+@end macro
+@end ifdocbook
+
+@ifnotdocbook
+@macro DBREF{text}
+@ref{\text\},
+@end macro
+@end ifnotdocbook
+
@ifclear FOR_PRINT
@set FN file name
@set FFN File Name
@@ -1622,7 +1635,7 @@ available @command{awk} implementations.
@ifset FOR_PRINT
-@ref{Copying},
+@DBREF{Copying}
presents the license that covers the @command{gawk} source code.
The version of this @value{DOCUMENT} distributed with @command{gawk}
@@ -3403,7 +3416,7 @@ and array sorting.
As we develop our presentation of the @command{awk} language, we introduce
most of the variables and many of the functions. They are described
-systematically in @ref{Built-in Variables}, and
+systematically in @ref{Built-in Variables}, and in
@ref{Built-in}.
@node When
@@ -5196,7 +5209,7 @@ The escape sequences described
@ifnotinfo
earlier
@end ifnotinfo
-in @ref{Escape Sequences},
+in @DBREF{Escape Sequences}
are valid inside a regexp. They are introduced by a @samp{\} and
are recognized and converted into corresponding real characters as
the very first step in processing regexps.
@@ -5432,7 +5445,7 @@ Within a bracket expression, a @dfn{range expression} consists of two
characters separated by a hyphen. It matches any single character that
sorts between the two characters, based upon the system's native character
set. For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
-(See @ref{Ranges and Locales}, for an explanation of how the POSIX
+(See @DBREF{Ranges and Locales} for an explanation of how the POSIX
standard and @command{gawk} have changed over time. This is mainly
of historical interest.)
@@ -8013,6 +8026,16 @@ processing on the next record @emph{right now}. For example:
@}
@end example
+@c 8/2014: Here is some sample input:
+@ignore
+mon/*comment*/key
+rab/*commen
+t*/bit
+horse /*comment*/more text
+part 1 /*comment*/part 2 /*comment*/part 3
+no comment
+@end ignore
+
This @command{awk} program deletes C-style comments (@samp{/* @dots{}
*/}) from the input.
It uses a number of features we haven't covered yet, including
@@ -8428,7 +8451,7 @@ probably by accident, and you should reconsider what it is you're
trying to accomplish.
@item
-@ref{Getline Summary}, presents a table summarizing the
+@DBREF{Getline Summary} presents a table summarizing the
@code{getline} variants and which variables they can affect.
It is worth noting that those variants which do not use redirection
can cause @code{FILENAME} to be updated if they cause
@@ -15033,7 +15056,7 @@ changed.
@cindex arguments, command-line
@cindex command line, arguments
-@ref{Auto-set},
+@DBREF{Auto-set}
presented the following program describing the information contained in @code{ARGC}
and @code{ARGV}:
@@ -19809,7 +19832,7 @@ being aware of them.
@cindex pointers to functions
@cindex differences in @command{awk} and @command{gawk}, indirect function calls
-This section describes a @command{gawk}-specific extension.
+This section describes an advanced, @command{gawk}-specific extension.
Often, you may wish to defer the choice of function to call until runtime.
For example, you may have different kinds of records, each of which
@@ -19855,7 +19878,7 @@ To process the data, you might write initially:
@noindent
This style of programming works, but can be awkward. With @dfn{indirect}
function calls, you tell @command{gawk} to use the @emph{value} of a
-variable as the name of the function to call.
+variable as the @emph{name} of the function to call.
@cindex @code{@@}-notation for indirect function calls
@cindex indirect function calls, @code{@@}-notation
@@ -19917,7 +19940,6 @@ Otherwise they perform the expected computations and are not unusual.
@example
@c file eg/prog/indirectcall.awk
# For each record, print the class name and the requested statistics
-
@{
class_name = $1
gsub(/_/, " ", class_name) # Replace _ with spaces
@@ -20146,10 +20168,12 @@ $ @kbd{gawk -f quicksort.awk -f indirectcall.awk class_data2}
Remember that you must supply a leading @samp{@@} in front of an indirect function call.
-Unfortunately, indirect function calls cannot be used with the built-in functions. However,
-you can generally write ``wrapper'' functions which call the built-in ones, and those can
-be called indirectly. (Other than, perhaps, the mathematical functions, there is not a lot
-of reason to try to call the built-in functions indirectly.)
+Starting with @value{PVERSION} 4.1.2 of @command{gawk}, indirect function
+calls may also be used with built-in functions and with extension functions
+(@pxref{Dynamic Extensions}). The only thing you cannot do is pass a regular
+expression constant to a built-in function through an indirect function
+call.@footnote{This may change in a future version; recheck the documentation that
+comes with your version of @command{gawk} to see if it has.}
@command{gawk} does its best to make indirect function calls efficient.
For example, in the following case:
@@ -20160,7 +20184,7 @@ for (i = 1; i <= n; i++)
@end example
@noindent
-@code{gawk} will look up the actual function to call only once.
+@code{gawk} looks up the actual function to call only once.
@node Functions Summary
@section Summary
@@ -20200,6 +20224,8 @@ from the real parameters by extra whitespace.
User-defined functions may call other user-defined (and built-in)
functions and may call themselves recursively. Function parameters
``hide'' any global variables of the same names.
+You cannot use the name of a reserved variable (such as @code{ARGC})
+as the name of a parameter in user-defined functions.
@item
Scalar values are passed to user-defined functions by value. Array
@@ -20218,7 +20244,7 @@ either scalar or array.
@item
@command{gawk} provides indirect function calls using a special syntax.
-By setting a variable to the name of a user-defined function, you can
+By setting a variable to the name of a function, you can
determine at runtime what function will be called at that point in the
program. This is equivalent to function pointers in C and C++.
@@ -20253,7 +20279,7 @@ It contains the following chapters:
@c STARTOFRANGE fudlib
@cindex functions, user-defined, library of
-@ref{User-defined}, describes how to write
+@DBREF{User-defined} describes how to write
your own @command{awk} functions. Writing functions is important, because
it allows you to encapsulate algorithms and program tasks in a single
place. It simplifies programming, making program development more
@@ -20286,7 +20312,7 @@ use these functions.
The functions are presented here in a progression from simple to complex.
@cindex Texinfo
-@ref{Extract Program},
+@DBREF{Extract Program}
presents a program that you can use to extract the source code for
these example library functions and programs from the Texinfo source
for this @value{DOCUMENT}.
@@ -20437,7 +20463,7 @@ A different convention, common in the Tcl community, is to use a single
associative array to hold the values needed by the library function(s), or
``package.'' This significantly decreases the number of actual global names
in use. For example, the functions described in
-@ref{Passwd Functions},
+@DBREF{Passwd Functions}
might have used array elements @code{@w{PW_data["inited"]}}, @code{@w{PW_data["total"]}},
@code{@w{PW_data["count"]}}, and @code{@w{PW_data["awklib"]}}, instead of
@code{@w{_pw_inited}}, @code{@w{_pw_awklib}}, @code{@w{_pw_total}},
@@ -21000,7 +21026,7 @@ more difficult than they really need to be.}
@cindex timestamps, formatted
@cindex time, managing
The @code{systime()} and @code{strftime()} functions described in
-@ref{Time Functions},
+@DBREF{Time Functions}
provide the minimum functionality necessary for dealing with the time of day
in human readable form. While @code{strftime()} is extensive, the control
formats are not necessarily easy to remember or intuitively obvious when
@@ -21086,7 +21112,7 @@ function getlocaltime(time, ret, now, i)
The string indices are easier to use and read than the various formats
required by @code{strftime()}. The @code{alarm} program presented in
-@ref{Alarm Program},
+@DBREF{Alarm Program}
uses this function.
A more general design for the @code{getlocaltime()} function would have
allowed the user to supply an optional timestamp value to use instead
@@ -21118,10 +21144,13 @@ This function reads from @code{file} one record at a time, building
up the full contents of the file in the local variable @code{contents}.
It works, but is not necessarily
@c 8/2014. Thanks to BWK for pointing this out:
-efficient.@footnote{Execution time grows quadratically in the size of
+efficient.
+@ignore
+@footnote{Execution time grows quadratically in the size of
the input; for each record, @command{awk} has to allocate a bigger
internal buffer for @code{contents}, copy the old contents into it,
and then append the contents of the new record.}
+@end ignore
The following function, based on a suggestion by Denis Shirokov,
reads the entire contents of the named file in one shot:
@@ -21294,7 +21323,7 @@ END @{ endfile(_filename_) @}
@c endfile
@end example
-@ref{Wc Program},
+@DBREF{Wc Program}
shows how this library function can be used and
how it simplifies writing the main program.
@@ -22297,7 +22326,7 @@ once. If you are worried about squeezing every last cycle out of your
this is not necessary, since most @command{awk} programs are I/O-bound,
and such a change would clutter up the code.
-The @command{id} program in @ref{Id Program},
+The @command{id} program in @DBREF{Id Program}
uses these functions.
@c ENDOFRANGE libfudata
@c ENDOFRANGE flibudata
@@ -22323,7 +22352,7 @@ uses these functions.
@cindex group file
@cindex files, group
Much of the discussion presented in
-@ref{Passwd Functions},
+@DBREF{Passwd Functions}
applies to the group database as well. Although there has traditionally
been a well-known file (@file{/etc/group}) in a well-known format, the POSIX
standard only provides a set of C library routines
@@ -22662,13 +22691,13 @@ Most of the work is in scanning the database and building the various
associative arrays. The functions that the user calls are themselves very
simple, relying on @command{awk}'s associative arrays to do work.
-The @command{id} program in @ref{Id Program},
+The @command{id} program in @DBREF{Id Program}
uses these functions.
@node Walking Arrays
@section Traversing Arrays of Arrays
-@ref{Arrays of Arrays}, described how @command{gawk}
+@DBREF{Arrays of Arrays} described how @command{gawk}
provides arrays of arrays. In particular, any element of
an array may be either a scalar, or another array. The
@code{isarray()} function (@pxref{Type Functions})
@@ -22823,7 +22852,7 @@ As a related challenge, revise that code to handle the case where
an intervening value in @code{ARGV} is a variable assignment.
@item
-@ref{Walking Arrays}, presented a function that walked a multidimensional
+@DBREF{Walking Arrays} presented a function that walked a multidimensional
array to print it out. However, walking an array and processing
each element is a general-purpose operation. Generalize the
@code{walk_array()} function by adding an additional parameter named
@@ -23836,6 +23865,11 @@ This program is a bit sloppy; it relies on @command{awk} to automatically close
instead of doing it in an @code{END} rule.
It also assumes that letters are contiguous in the character set,
which isn't true for EBCDIC systems.
+@ifset FOR_PRINT
+You might want to consider how to eliminate the use of
+@code{ord()} and @code{chr()}; this can be done in such a
+way as to solve the EBCDIC issue as well.
+@end ifset
@c ENDOFRANGE filspl
@c ENDOFRANGE split
@@ -24081,7 +24115,7 @@ BEGIN @{
else if (c == "c")
do_count++
else if (index("0123456789", c) != 0) @{
- # getopt requires args to options
+ # getopt() requires args to options
# this messes us up for things like -5
if (Optarg ~ /^[[:digit:]]+$/)
fcount = (c Optarg) + 0
@@ -24218,6 +24252,22 @@ END @{
@}
@c endfile
@end example
+
+@ifset FOR_PRINT
+The logic for choosing which lines to print represents a @dfn{state
+machine}, which is ``a device that can be in one of a set number of stable
+conditions depending on its previous condition and on the present values
+of its inputs.''@footnote{This is the definition returned from entering
+@code{define: state machine} into Google.}
+Brian Kernighan suggests that
+``an alternative approach to state machines is to just read
+the input into an array, then use indexing. It's almost always
+easier code, and for most inputs where you would use this, just
+as fast.'' Consider how to rewrite the logic to follow this
+suggestion.
+@end ifset
+
+
@c ENDOFRANGE prunt
@c ENDOFRANGE tpul
@c ENDOFRANGE uniq
@@ -24743,7 +24793,7 @@ of standard @command{awk}: dealing with individual characters is very
painful, requiring repeated use of the @code{substr()}, @code{index()},
and @code{gsub()} built-in functions
(@pxref{String Functions}).@footnote{This
-program was written before @command{gawk} acquired the ability to
+program was also written before @command{gawk} acquired the ability to
split each character in a string into separate array elements.}
There are two functions. The first, @code{stranslate()}, takes three
arguments:
@@ -26357,6 +26407,23 @@ The @code{split.awk} program (@pxref{Split Program}) assumes
that letters are contiguous in the character set,
which isn't true for EBCDIC systems.
Fix this problem.
+(Hint: Consider a different way to work through the alphabet,
+without relying on @code{ord()} and @code{chr()}.)
+
+@item
+In @file{uniq.awk} (@pxref{Uniq Program}), the
+logic for choosing which lines to print represents a @dfn{state
+machine}, which is ``a device that can be in one of a set number of stable
+conditions depending on its previous condition and on the present values
+of its inputs.''@footnote{This is the definition returned from entering
+@code{define: state machine} into Google.}
+Brian Kernighan suggests that
+``an alternative approach to state machines is to just read
+the input into an array, then use indexing. It's almost always
+easier code, and for most inputs where you would use this, just
+as fast.'' Rewrite the logic to follow this
+suggestion.
+
@item
Why can't the @file{wc.awk} program (@pxref{Wc Program}) just
@@ -26634,7 +26701,7 @@ Often, though, it is desirable to be able to loop over the elements
in a particular order that you, the programmer, choose. @command{gawk}
lets you do this.
-@ref{Controlling Scanning}, describes how you can assign special,
+@DBREF{Controlling Scanning} describes how you can assign special,
pre-defined values to @code{PROCINFO["sorted_in"]} in order to
control the order in which @command{gawk} traverses an array
during a @code{for} loop.
@@ -29790,7 +29857,9 @@ responds @samp{syntax error}. When you do figure out what your mistake was,
though, you'll feel like a real guru.
@item
-If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands},
+@c NOTE: no comma after the ref{} on purpose, due to following
+@c parenthetical remark.
+If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands}
(or if you are already familiar with @command{gawk} internals),
you will realize that much of the internal manipulation of data
in @command{gawk}, as in many interpreters, is done on a stack.
@@ -38251,7 +38320,7 @@ as well as any considerations you should bear in mind.
@appendixsubsec Accessing The @command{gawk} Git Repository
As @command{gawk} is Free Software, the source code is always available.
-@ref{Gawk Distribution}, describes how to get and build the formal,
+@DBREF{Gawk Distribution} describes how to get and build the formal,
released versions of @command{gawk}.
@cindex @command{git} utility