author | Arnold D. Robbins <arnold@skeeve.com> | 2014-09-04 09:49:44 +0300 |
---|---|---|
committer | Arnold D. Robbins <arnold@skeeve.com> | 2014-09-04 09:49:44 +0300 |
commit | 7448f28d356fc5cd8d9117111baea3a549e0930e | |
tree | f24d82d8b121d8321cfc0366dabad236d3f3dac3 /doc/gawk.texi | |
parent | a205df7903bce201577df4f7049c190e283f1ea4 | |
parent | 8beb9796b17b6ca48eb62df8fd3d31421e43c761 | |
Merge branch 'gawk-4.1-stable'
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 133 |
1 files changed, 101 insertions, 32 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 81b36ae5..7fc342c3 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -165,6 +165,19 @@ @end macro @end ifdocbook +@c hack for docbook, where comma shouldn't always follow an @ref{} +@ifdocbook +@macro DBREF{text} +@ref{\text\} +@end macro +@end ifdocbook + +@ifnotdocbook +@macro DBREF{text} +@ref{\text\}, +@end macro +@end ifnotdocbook + @ifclear FOR_PRINT @set FN file name @set FFN File Name @@ -1622,7 +1635,7 @@ available @command{awk} implementations. @ifset FOR_PRINT -@ref{Copying}, +@DBREF{Copying} presents the license that covers the @command{gawk} source code. The version of this @value{DOCUMENT} distributed with @command{gawk} @@ -3403,7 +3416,7 @@ and array sorting. As we develop our presentation of the @command{awk} language, we introduce most of the variables and many of the functions. They are described -systematically in @ref{Built-in Variables}, and +systematically in @ref{Built-in Variables}, and in @ref{Built-in}. @node When @@ -5196,7 +5209,7 @@ The escape sequences described @ifnotinfo earlier @end ifnotinfo -in @ref{Escape Sequences}, +in @DBREF{Escape Sequences} are valid inside a regexp. They are introduced by a @samp{\} and are recognized and converted into corresponding real characters as the very first step in processing regexps. @@ -5432,7 +5445,7 @@ Within a bracket expression, a @dfn{range expression} consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, based upon the system's native character set. For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}. -(See @ref{Ranges and Locales}, for an explanation of how the POSIX +(See @DBREF{Ranges and Locales} for an explanation of how the POSIX standard and @command{gawk} have changed over time. This is mainly of historical interest.) @@ -8013,6 +8026,16 @@ processing on the next record @emph{right now}. 
For example: @} @end example +@c 8/2014: Here is some sample input: +@ignore +mon/*comment*/key +rab/*commen +t*/bit +horse /*comment*/more text +part 1 /*comment*/part 2 /*comment*/part 3 +no comment +@end ignore + This @command{awk} program deletes C-style comments (@samp{/* @dots{} */}) from the input. It uses a number of features we haven't covered yet, including @@ -8428,7 +8451,7 @@ probably by accident, and you should reconsider what it is you're trying to accomplish. @item -@ref{Getline Summary}, presents a table summarizing the +@DBREF{Getline Summary} presents a table summarizing the @code{getline} variants and which variables they can affect. It is worth noting that those variants which do not use redirection can cause @code{FILENAME} to be updated if they cause @@ -15033,7 +15056,7 @@ changed. @cindex arguments, command-line @cindex command line, arguments -@ref{Auto-set}, +@DBREF{Auto-set} presented the following program describing the information contained in @code{ARGC} and @code{ARGV}: @@ -19809,7 +19832,7 @@ being aware of them. @cindex pointers to functions @cindex differences in @command{awk} and @command{gawk}, indirect function calls -This section describes a @command{gawk}-specific extension. +This section describes an advanced, @command{gawk}-specific extension. Often, you may wish to defer the choice of function to call until runtime. For example, you may have different kinds of records, each of which @@ -19855,7 +19878,7 @@ To process the data, you might write initially: @noindent This style of programming works, but can be awkward. With @dfn{indirect} function calls, you tell @command{gawk} to use the @emph{value} of a -variable as the name of the function to call. +variable as the @emph{name} of the function to call. @cindex @code{@@}-notation for indirect function calls @cindex indirect function calls, @code{@@}-notation @@ -19917,7 +19940,6 @@ Otherwise they perform the expected computations and are not unusual. 
@example @c file eg/prog/indirectcall.awk # For each record, print the class name and the requested statistics - @{ class_name = $1 gsub(/_/, " ", class_name) # Replace _ with spaces @@ -20146,10 +20168,12 @@ $ @kbd{gawk -f quicksort.awk -f indirectcall.awk class_data2} Remember that you must supply a leading @samp{@@} in front of an indirect function call. -Unfortunately, indirect function calls cannot be used with the built-in functions. However, -you can generally write ``wrapper'' functions which call the built-in ones, and those can -be called indirectly. (Other than, perhaps, the mathematical functions, there is not a lot -of reason to try to call the built-in functions indirectly.) +Starting with @value{PVERSION} 4.1.2 of @command{gawk}, indirect function +calls may also be used with built-in functions and with extension functions +(@pxref{Dynamic Extensions}). The only thing you cannot do is pass a regular +expression constant to a built-in function through an indirect function +call.@footnote{This may change in a future version; recheck the documentation that +comes with your version of @command{gawk} to see if it has.} @command{gawk} does its best to make indirect function calls efficient. For example, in the following case: @@ -20160,7 +20184,7 @@ for (i = 1; i <= n; i++) @end example @noindent -@code{gawk} will look up the actual function to call only once. +@code{gawk} looks up the actual function to call only once. @node Functions Summary @section Summary @@ -20200,6 +20224,8 @@ from the real parameters by extra whitespace. User-defined functions may call other user-defined (and built-in) functions and may call themselves recursively. Function parameters ``hide'' any global variables of the same names. +You cannot use the name of a reserved variable (such as @code{ARGC}) +as the name of a parameter in user-defined functions. @item Scalar values are passed to user-defined functions by value. Array @@ -20218,7 +20244,7 @@ either scalar or array. 
@item @command{gawk} provides indirect function calls using a special syntax. -By setting a variable to the name of a user-defined function, you can +By setting a variable to the name of a function, you can determine at runtime what function will be called at that point in the program. This is equivalent to function pointers in C and C++. @@ -20253,7 +20279,7 @@ It contains the following chapters: @c STARTOFRANGE fudlib @cindex functions, user-defined, library of -@ref{User-defined}, describes how to write +@DBREF{User-defined} describes how to write your own @command{awk} functions. Writing functions is important, because it allows you to encapsulate algorithms and program tasks in a single place. It simplifies programming, making program development more @@ -20286,7 +20312,7 @@ use these functions. The functions are presented here in a progression from simple to complex. @cindex Texinfo -@ref{Extract Program}, +@DBREF{Extract Program} presents a program that you can use to extract the source code for these example library functions and programs from the Texinfo source for this @value{DOCUMENT}. @@ -20437,7 +20463,7 @@ A different convention, common in the Tcl community, is to use a single associative array to hold the values needed by the library function(s), or ``package.'' This significantly decreases the number of actual global names in use. 
For example, the functions described in -@ref{Passwd Functions}, +@DBREF{Passwd Functions} might have used array elements @code{@w{PW_data["inited"]}}, @code{@w{PW_data["total"]}}, @code{@w{PW_data["count"]}}, and @code{@w{PW_data["awklib"]}}, instead of @code{@w{_pw_inited}}, @code{@w{_pw_awklib}}, @code{@w{_pw_total}}, @@ -21000,7 +21026,7 @@ more difficult than they really need to be.} @cindex timestamps, formatted @cindex time, managing The @code{systime()} and @code{strftime()} functions described in -@ref{Time Functions}, +@DBREF{Time Functions} provide the minimum functionality necessary for dealing with the time of day in human readable form. While @code{strftime()} is extensive, the control formats are not necessarily easy to remember or intuitively obvious when @@ -21086,7 +21112,7 @@ function getlocaltime(time, ret, now, i) The string indices are easier to use and read than the various formats required by @code{strftime()}. The @code{alarm} program presented in -@ref{Alarm Program}, +@DBREF{Alarm Program} uses this function. A more general design for the @code{getlocaltime()} function would have allowed the user to supply an optional timestamp value to use instead @@ -21118,10 +21144,13 @@ This function reads from @code{file} one record at a time, building up the full contents of the file in the local variable @code{contents}. It works, but is not necessarily @c 8/2014. Thanks to BWK for pointing this out: -efficient.@footnote{Execution time grows quadratically in the size of +efficient. 
+@ignore +@footnote{Execution time grows quadratically in the size of the input; for each record, @command{awk} has to allocate a bigger internal buffer for @code{contents}, copy the old contents into it, and then append the contents of the new record.} +@end ignore The following function, based on a suggestion by Denis Shirokov, reads the entire contents of the named file in one shot: @@ -21294,7 +21323,7 @@ END @{ endfile(_filename_) @} @c endfile @end example -@ref{Wc Program}, +@DBREF{Wc Program} shows how this library function can be used and how it simplifies writing the main program. @@ -22297,7 +22326,7 @@ once. If you are worried about squeezing every last cycle out of your this is not necessary, since most @command{awk} programs are I/O-bound, and such a change would clutter up the code. -The @command{id} program in @ref{Id Program}, +The @command{id} program in @DBREF{Id Program} uses these functions. @c ENDOFRANGE libfudata @c ENDOFRANGE flibudata @@ -22323,7 +22352,7 @@ uses these functions. @cindex group file @cindex files, group Much of the discussion presented in -@ref{Passwd Functions}, +@DBREF{Passwd Functions} applies to the group database as well. Although there has traditionally been a well-known file (@file{/etc/group}) in a well-known format, the POSIX standard only provides a set of C library routines @@ -22662,13 +22691,13 @@ Most of the work is in scanning the database and building the various associative arrays. The functions that the user calls are themselves very simple, relying on @command{awk}'s associative arrays to do work. -The @command{id} program in @ref{Id Program}, +The @command{id} program in @DBREF{Id Program} uses these functions. @node Walking Arrays @section Traversing Arrays of Arrays -@ref{Arrays of Arrays}, described how @command{gawk} +@DBREF{Arrays of Arrays} described how @command{gawk} provides arrays of arrays. In particular, any element of an array may be either a scalar, or another array. 
The @code{isarray()} function (@pxref{Type Functions}) @@ -22823,7 +22852,7 @@ As a related challenge, revise that code to handle the case where an intervening value in @code{ARGV} is a variable assignment. @item -@ref{Walking Arrays}, presented a function that walked a multidimensional +@DBREF{Walking Arrays} presented a function that walked a multidimensional array to print it out. However, walking an array and processing each element is a general-purpose operation. Generalize the @code{walk_array()} function by adding an additional parameter named @@ -23836,6 +23865,11 @@ This program is a bit sloppy; it relies on @command{awk} to automatically close instead of doing it in an @code{END} rule. It also assumes that letters are contiguous in the character set, which isn't true for EBCDIC systems. +@ifset FOR_PRINT +You might want to consider how to eliminate the use of +@code{ord()} and @code{chr()}; this can be done in such a +way as to solve the EBCDIC issue as well. +@end ifset @c ENDOFRANGE filspl @c ENDOFRANGE split @@ -24081,7 +24115,7 @@ BEGIN @{ else if (c == "c") do_count++ else if (index("0123456789", c) != 0) @{ - # getopt requires args to options + # getopt() requires args to options # this messes us up for things like -5 if (Optarg ~ /^[[:digit:]]+$/) fcount = (c Optarg) + 0 @@ -24218,6 +24252,22 @@ END @{ @} @c endfile @end example + +@ifset FOR_PRINT +The logic for choosing which lines to print represents a @dfn{state +machine}, which is ``a device that can be in one of a set number of stable +conditions depending on its previous condition and on the present values +of its inputs.''@footnote{This is the definition returned from entering +@code{define: state machine} into Google.} +Brian Kernighan suggests that +``an alternative approach to state mechines is to just read +the input into an array, then use indexing. 
It's almost always +easier code, and for most inputs where you would use this, just +as fast.'' Consider how to rewrite the logic to follow this +suggestion. +@end ifset + + @c ENDOFRANGE prunt @c ENDOFRANGE tpul @c ENDOFRANGE uniq @@ -24743,7 +24793,7 @@ of standard @command{awk}: dealing with individual characters is very painful, requiring repeated use of the @code{substr()}, @code{index()}, and @code{gsub()} built-in functions (@pxref{String Functions}).@footnote{This -program was written before @command{gawk} acquired the ability to +program was also written before @command{gawk} acquired the ability to split each character in a string into separate array elements.} There are two functions. The first, @code{stranslate()}, takes three arguments: @@ -26357,6 +26407,23 @@ The @code{split.awk} program (@pxref{Split Program}) assumes that letters are contiguous in the character set, which isn't true for EBCDIC systems. Fix this problem. +(Hint: Consider a different way to work through the alphabet, +without relying on @code{ord()} and @code{chr()}.) + +@item +In @file{uniq.awk} (@pxref{Uniq Program}, the +logic for choosing which lines to print represents a @dfn{state +machine}, which is ``a device that can be in one of a set number of stable +conditions depending on its previous condition and on the present values +of its inputs.''@footnote{This is the definition returned from entering +@code{define: state machine} into Google.} +Brian Kernighan suggests that +``an alternative approach to state mechines is to just read +the input into an array, then use indexing. It's almost always +easier code, and for most inputs where you would use this, just +as fast.'' Rewrite the logic to follow this +suggestion. + @item Why can't the @file{wc.awk} program (@pxref{Wc Program}) just @@ -26634,7 +26701,7 @@ Often, though, it is desirable to be able to loop over the elements in a particular order that you, the programmer, choose. @command{gawk} lets you do this. 
-@ref{Controlling Scanning}, describes how you can assign special, +@DBREF{Controlling Scanning} describes how you can assign special, pre-defined values to @code{PROCINFO["sorted_in"]} in order to control the order in which @command{gawk} traverses an array during a @code{for} loop. @@ -29790,7 +29857,9 @@ responds @samp{syntax error}. When you do figure out what your mistake was, though, you'll feel like a real guru. @item -If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands}, +@c NOTE: no comma after the ref{} on purpose, due to following +@c parenthetical remark. +If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands} (or if you are already familiar with @command{gawk} internals), you will realize that much of the internal manipulation of data in @command{gawk}, as in many interpreters, is done on a stack. @@ -38251,7 +38320,7 @@ as well as any considerations you should bear in mind. @appendixsubsec Accessing The @command{gawk} Git Repository As @command{gawk} is Free Software, the source code is always available. -@ref{Gawk Distribution}, describes how to get and build the formal, +@DBREF{Gawk Distribution} describes how to get and build the formal, released versions of @command{gawk}. @cindex @command{git} utility |
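The new uniq.awk exercise in this commit quotes Brian Kernighan's suggestion to replace the state machine with a different approach. Under that approach, the program reads the input into an array and then uses indexing. Below is a minimal sketch of that idea, written as a POSIX shell function around portable awk. The function name uniq_by_index and its optional count argument are hypothetical and do not come from the gawk manual. The sketch covers only plain output and "uniq -c"-style counts, not the field- and character-skipping options of uniq.awk.

```shell
#!/bin/sh
# Hypothetical sketch, not the manual's uniq.awk: read all input into an
# array, then walk the array by index to find runs of equal adjacent lines.
# Usage: uniq_by_index [count] < input
#   A nonzero count prefixes each line with its run length, like "uniq -c".
uniq_by_index() {
    awk -v count="${1:-0}" '
    { line[NR] = $0 }                    # slurp every record first
    END {
        i = 1
        while (i <= NR) {
            # advance j to the last line of the run that starts at i;
            # appending "" forces a string comparison, so 1 and 01 differ
            j = i
            while (j < NR && (line[j + 1] "") == (line[i] ""))
                j++
            if (count + 0)
                printf "%4d %s\n", j - i + 1, line[i]
            else
                print line[i]
            i = j + 1                    # jump to the start of the next run
        }
    }'
}
```

The cost of this design is holding the whole input in memory. The trade-off is that the run detection becomes a simple index comparison rather than a set of state flags. Kernighan's quoted remark is that, for most inputs where one would use this, it is just as fast.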
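The revised split.awk exercise hints at working through the alphabet without ord() and chr(). One possible sketch assumes two-letter output-file suffixes ("aa", "ab", and so on). It looks up each letter's position in an explicit string of letters with index() and substr(). Because positions come from that string rather than from character codes, gaps between letters in EBCDIC no longer matter. The helper name next_suffix is hypothetical and is not part of the manual's split.awk.

```shell
#!/bin/sh
# Hypothetical helper: advance a two-letter file-name suffix ("aa" -> "ab",
# "az" -> "ba") by looking letters up in an explicit string, so the gaps
# between letters in EBCDIC do not matter. Prints an empty line after "zz".
next_suffix() {
    awk -v s="$1" '
    function next_suffix(suf,    letters, i, j) {
        letters = "abcdefghijklmnopqrstuvwxyz"
        i = index(letters, substr(suf, 1, 1))   # position of first letter
        j = index(letters, substr(suf, 2, 1))   # position of second letter
        if (j < 26)
            j++                                 # "ab" -> "ac"
        else {
            j = 1                               # "az" -> "ba": wrap and carry
            i++
        }
        if (i > 26)
            return ""                           # no suffix follows "zz"
        return substr(letters, i, 1) substr(letters, j, 1)
    }
    BEGIN { print next_suffix(s) }'
}
```

For example, next_suffix aa prints ab, and next_suffix az prints ba. An empty result after zz signals that the suffixes are exhausted.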