author | Arnold D. Robbins <arnold@skeeve.com> | 2014-11-12 22:29:14 +0200 |
---|---|---|
committer | Arnold D. Robbins <arnold@skeeve.com> | 2014-11-16 20:00:46 +0200 |
commit | d4397f45eb710a3c24b7b24aa895e8b9323aff4f (patch) | |
tree | be0024b4a2333793703df5233ec1cfeea6668511 | |
parent | b027c0d5d49cddfb46565d2d572ecf3828b80b1a (diff) | |
Copyedits. Through Part II.
-rw-r--r-- | NOTES | 15 |
-rw-r--r-- | doc/gawktexi.in | 258 |
2 files changed, 144 insertions, 129 deletions
@@ -5,15 +5,20 @@ to be humorous. Page 10 - references to 'Chapter 10' and 'Chapter 11' have been left alone since they are links and I can't do it that way in texinfo anyway. -Appendices vs. Appendixes - I have left it as the former; the latter +Appendices vs. Appendixes: I have left it as the former; the latter looks totally wrong to me. -Numbers. I use the style where values from zero to nine are spelled -out and from 10 up they're written with digits. I forget what the -Chicago Manual of Style calls this. So I've rejected those changes. +Numbers: I use the style where values from zero to nine are spelled +out and from 10 up they're written with digits. (I forget what the +Chicago Manual of Style calls this.) So I've rejected those changes. C heads - I have not lowercased them; this would be incorrect for the Texinfo, so I've marked them as Rejected but with a reply in the PDF to please do this during production. -At page 222. +Literal layout blocks not being indented - I used literal layout to get +the brackets, which indicate optional stuff, in Roman. I think that if you +simply fix the style sheets to indent those blocks, we should be in better +shape. + +At page 321. diff --git a/doc/gawktexi.in b/doc/gawktexi.in index 01fa8565..65e8b8f3 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -18770,7 +18770,7 @@ function ctime(ts, format) @end example You might think that @code{ctime()} could use @code{PROCINFO["strftime"]} -for its format string. That would be a mistake, since @code{ctime()} is +for its format string. That would be a mistake, because @code{ctime()} is supposed to return the time formatted in a standard fashion, and user-level code could have changed @code{PROCINFO["strftime"]}. @c ENDOFRANGE fdef @@ -18791,7 +18791,7 @@ the function. @end menu @node Calling A Function -@subsubsection Writing A Function Call +@subsubsection Writing a Function Call A function call consists of the function name followed by the arguments in parentheses. 
@command{awk} expressions are what you write in the @@ -18806,7 +18806,7 @@ foo(x y, "lose", 4 * z) @quotation CAUTION Whitespace characters (spaces and TABs) are not allowed -between the function name and the open-parenthesis of the argument list. +between the function name and the opening parenthesis of the argument list. If you write whitespace by mistake, @command{awk} might think that you mean to concatenate a variable with an expression in parentheses. However, it notices that you used a function name and not a variable name, and reports @@ -18869,7 +18869,7 @@ top's i=3 @end example If you want @code{i} to be local to both @code{foo()} and @code{bar()} do as -follows (the extra-space before @code{i} is a coding convention to +follows (the extra space before @code{i} is a coding convention to indicate that @code{i} is a local variable, not an argument): @example @@ -18949,21 +18949,16 @@ At level 2, index 2 is found in a @end example @node Pass By Value/Reference -@subsubsection Passing Function Arguments By Value Or By Reference +@subsubsection Passing Function Arguments by Value Or by Reference In @command{awk}, when you declare a function, there is no way to declare explicitly whether the arguments are passed @dfn{by value} or @dfn{by reference}. -Instead the passing convention is determined at runtime when +Instead, the passing convention is determined at runtime when the function is called according to the following rule: - -@itemize -@item -If the argument is an array variable, then it is passed by reference, -@item -Otherwise the argument is passed by value. -@end itemize +if the argument is an array variable, then it is passed by reference. +Otherwise, the argument is passed by value. 
@cindex call by value Passing an argument by value means that when a function is called, it @@ -19066,7 +19061,13 @@ If @option{--lint} is specified Some @command{awk} implementations generate a runtime error if you use either the @code{next} statement or the @code{nextfile} statement -(@pxref{Next Statement}, also @pxref{Nextfile Statement}) +(@pxref{Next Statement}, and +@ifdocbook +@ref{Nextfile Statement}) +@end ifdocbook +@ifnotdocbook +@pxref{Nextfile Statement}) +@end ifnotdocbook inside a user-defined function. @command{gawk} does not have this limitation. @c ENDOFRANGE fudc @@ -19122,8 +19123,8 @@ function maxelt(vec, i, ret) @noindent You call @code{maxelt()} with one argument, which is an array name. The local variables @code{i} and @code{ret} are not intended to be arguments; -while there is nothing to stop you from passing more than one argument -to @code{maxelt()}, the results would be strange. The extra space before +there is nothing to stop you from passing more than one argument +to @code{maxelt()} but the results would be strange. The extra space before @code{i} in the function parameter list indicates that @code{i} and @code{ret} are local variables. You should follow this convention when defining functions. @@ -19260,8 +19261,8 @@ variable as the @emph{name} of the function to call. @cindex indirect function calls, @code{@@}-notation @cindex function calls, indirect, @code{@@}-notation for The syntax is similar to that of a regular function call: an identifier -immediately followed by a left parenthesis, any arguments, and then -a closing right parenthesis, with the addition of a leading @samp{@@} +immediately followed by an opening parenthesis, any arguments, and then +a closing parenthesis, with the addition of a leading @samp{@@} character: @example @@ -19270,7 +19271,7 @@ result = @@the_func() # calls the sum() function @end example Here is a full program that processes the previously shown data, -using indirect function calls. 
+using indirect function calls: @example @c file eg/prog/indirectcall.awk @@ -19311,7 +19312,7 @@ function sum(first, last, ret, i) These two functions expect to work on fields; thus the parameters @code{first} and @code{last} indicate where in the fields to start and end. -Otherwise they perform the expected computations and are not unusual. +Otherwise they perform the expected computations and are not unusual: @example @c file eg/prog/indirectcall.awk @@ -19637,7 +19638,7 @@ functions. POSIX @command{awk} provides three kinds of built-in functions: numeric, string, and I/O. @command{gawk} provides functions that sort arrays, work with values representing time, do bit manipulation, determine variable -type (array vs.@: scalar), and internationalize and localize programs. +type (array versus scalar), and internationalize and localize programs. @command{gawk} also provides several extensions to some of standard functions, typically in the form of additional arguments. @@ -19693,7 +19694,7 @@ program. This is equivalent to function pointers in C and C++. @c ENDOFRANGE funcud @ifnotinfo -@part @value{PART2}Problem Solving With @command{awk} +@part @value{PART2}Problem Solving with @command{awk} @end ifnotinfo @ifdocbook @@ -19703,10 +19704,10 @@ It contains the following chapters: @itemize @value{BULLET} @item -@ref{Library Functions}. +@ref{Library Functions} @item -@ref{Sample Programs}. +@ref{Sample Programs} @end itemize @end ifdocbook @@ -19767,9 +19768,9 @@ and would like to contribute them to the @command{awk} user community, see @cindex portability, example programs The programs in this @value{CHAPTER} and in @ref{Sample Programs}, -freely use features that are @command{gawk}-specific. +freely use @command{gawk}-specific features. Rewriting these programs for different implementations of @command{awk} -is pretty straightforward. 
+is pretty straightforward: @itemize @value{BULLET} @item @@ -19839,7 +19840,7 @@ Library functions often need to have global variables that they can use to preserve state information between calls to the function---for example, @code{getopt()}'s variable @code{_opti} (@pxref{Getopt Function}). -Such variables are called @dfn{private}, since the only functions that need to +Such variables are called @dfn{private}, as the only functions that need to use them are the ones in the library. When writing a library function, you should try to choose names for your @@ -19861,10 +19862,10 @@ In addition, several of the library functions use a prefix that helps indicate what function or set of functions use the variables---for example, @code{_pw_byname()} in the user database routines (@pxref{Passwd Functions}). -This convention is recommended, since it even further decreases the +This convention is recommended, as it even further decreases the chance of inadvertent conflict among variable names. Note that this convention is used equally well for variable names and for private -function names.@footnote{While all the library routines could have +function names.@footnote{Although all the library routines could have been rewritten to use this convention, this was not done, in order to show how our own @command{awk} programming style has evolved and to provide some basis for this discussion.} @@ -19937,7 +19938,7 @@ programming use. @end menu @node Strtonum Function -@subsection Converting Strings To Numbers +@subsection Converting Strings to Numbers The @code{strtonum()} function (@pxref{String Functions}) is a @command{gawk} extension. The following function @@ -20019,7 +20020,7 @@ string. It sets @code{k} to the index in @code{"1234567"} of the current octal digit. The return value will either be the same number as the digit, or zero if the character is not there, which will be true for a @samp{0}. 
-This is safe, since the regexp test in the @code{if} ensures that +This is safe, because the regexp test in the @code{if} ensures that only octal values are converted. Similar logic applies to the code that checks for and converts a @@ -20366,7 +20367,7 @@ is always 1. This means that on those systems, characters have numeric values from 128 to 255. Finally, large mainframe systems use the EBCDIC character set, which uses all 256 values. -While there are other character sets in use on some older systems, +There are other character sets in use on some older systems, but they are not really worth worrying about: @example @@ -20420,7 +20421,7 @@ Good function design is important; this function needs to be general but it should also have a reasonable default behavior. It is called with an array as well as the beginning and ending indices of the elements in the array to be merged. This assumes that the array indices are numeric---a reasonable -assumption since the array was likely created with @code{split()} +assumption, as the array was likely created with @code{split()} (@pxref{String Functions}): @cindex @code{join()} user-defined function @@ -20473,7 +20474,7 @@ more difficult than they really need to be.} The @code{systime()} and @code{strftime()} functions described in @DBREF{Time Functions} provide the minimum functionality necessary for dealing with the time of day -in human readable form. While @code{strftime()} is extensive, the control +in human-readable form. Although @code{strftime()} is extensive, the control formats are not necessarily easy to remember or intuitively obvious when reading a program. @@ -20564,7 +20565,7 @@ allowed the user to supply an optional timestamp value to use instead of the current time. @node Readfile Function -@subsection Reading A Whole File At Once +@subsection Reading a Whole File At Once Often, it is convenient to have the entire contents of a file available in memory as a single string. 
A straightforward but naive way to @@ -20624,7 +20625,7 @@ will never match if the file has contents. @command{gawk} reads data from the file into @code{tmp} attempting to match @code{RS}. The match fails after each read, but fails quickly, such that @command{gawk} fills @code{tmp} with the entire contents of the file. -(@xref{Records}, for information on @code{RT} and @code{RS}.) +(@DBXREF{Records} for information on @code{RT} and @code{RS}.) In the case that @code{file} is empty, the return value is the null string. Thus calling code may use something like: @@ -20642,7 +20643,7 @@ test would be @samp{contents == ""}. also reads an entire file into memory. @node Shell Quoting -@subsection Quoting Strings to Pass to The Shell +@subsection Quoting Strings to Pass to the Shell @c included by permission @ignore @@ -20684,7 +20685,7 @@ chmod -w file.flac Note the need for shell quoting. The function @code{shell_quote()} does it. @code{SINGLE} is the one-character string @code{"'"} and -@code{QSINGLE} is the three-character string @code{"\"'\""}. +@code{QSINGLE} is the three-character string @code{"\"'\""}: @example @c file eg/lib/shellquote.awk @@ -20744,7 +20745,7 @@ command-line @value{DF}s. @cindex files, managing, data file boundaries @cindex files, initialization and cleanup -The @code{BEGIN} and @code{END} rules are each executed exactly once at +The @code{BEGIN} and @code{END} rules are each executed exactly once, at the beginning and end of your @command{awk} program, respectively (@pxref{BEGIN/END}). 
We (the @command{gawk} authors) once had a user who mistakenly thought that the @@ -20816,7 +20817,7 @@ The following version solves the problem: @example @c file eg/lib/ftrans.awk -# ftrans.awk --- handle data file transitions +# ftrans.awk --- handle datafile transitions # # user supplies beginfile() and endfile() functions @c endfile @@ -20844,7 +20845,7 @@ END @{ endfile(_filename_) @} shows how this library function can be used and how it simplifies writing the main program. -@sidebar So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}? +@sidebar So Why Does @command{gawk} Have @code{BEGINFILE} and @code{ENDFILE}? You are probably wondering, if @code{beginfile()} and @code{endfile()} functions can do the job, why does @command{gawk} have @@ -20852,7 +20853,7 @@ functions can do the job, why does @command{gawk} have Good question. Normally, if @command{awk} cannot open a file, this causes an immediate fatal error. In this case, there is no way for a -user-defined function to deal with the problem, since the mechanism for +user-defined function to deal with the problem, as the mechanism for calling it relies on the file being open and at the first record. Thus, the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch files that cannot be processed. @code{ENDFILE} exists for symmetry, @@ -20910,8 +20911,8 @@ The @code{rewind()} function relies on the @code{ARGIND} variable (@pxref{Auto-set}), which is specific to @command{gawk}. It also relies on the @code{nextfile} keyword (@pxref{Nextfile Statement}). Because of this, you should not call it from an @code{ENDFILE} rule. -(This isn't necessary anyway, since as soon as an @code{ENDFILE} rule -finishes @command{gawk} goes to the next file!) +(This isn't necessary anyway, because @command{gawk} goes to the next +file as soon as an @code{ENDFILE} rule finishes!) 
@node File Checking @subsection Checking for Readable @value{DDF}s @@ -20959,13 +20960,13 @@ BEGIN @{ @cindex troubleshooting, @code{getline} function This works, because the @code{getline} won't be fatal. Removing the element from @code{ARGV} with @code{delete} -skips the file (since it's no longer in the list). +skips the file (because it's no longer in the list). See also @ref{ARGC and ARGV}. -The regular expression check purposely does not use character classes +Because @command{awk} variable names only allow the English letters, +the regular expression check purposely does not use character classes such as @samp{[:alpha:]} and @samp{[:alnum:]} (@pxref{Bracket Expressions}) -since @command{awk} variable names only allow the English letters. @node Empty Files @subsection Checking for Zero-length Files @@ -21107,12 +21108,12 @@ are left alone. @c STARTOFRANGE clibf @cindex functions, library, C library @cindex arguments, processing -Most utilities on POSIX compatible systems take options on +Most utilities on POSIX-compatible systems take options on the command line that can be used to change the way a program behaves. @command{awk} is an example of such a program (@pxref{Options}). -Often, options take @dfn{arguments}; i.e., data that the program needs to -correctly obey the command-line option. For example, @command{awk}'s +Often, options take @dfn{arguments} (i.e., data that the program needs to +correctly obey the command-line option). For example, @command{awk}'s @option{-F} option requires a string to use as the field separator. The first occurrence on the command line of either @option{--} or a string that does not begin with @samp{-} ends the options. @@ -21216,7 +21217,7 @@ necessary for accessing individual characters (@pxref{String Functions}).@footnote{This function was written before @command{gawk} acquired the ability to split strings into single characters using @code{""} as the separator. 
-We have left it alone, since using @code{substr()} is more portable.} +We have left it alone, as using @code{substr()} is more portable.} The discussion that follows walks through the code a bit at a time: @@ -21384,9 +21385,9 @@ next element in @code{argv}. If neither condition is true, then only on the next call to @code{getopt()}. The @code{BEGIN} rule initializes both @code{Opterr} and @code{Optind} to one. -@code{Opterr} is set to one, since the default behavior is for @code{getopt()} +@code{Opterr} is set to one, because the default behavior is for @code{getopt()} to print a diagnostic message upon seeing an invalid option. @code{Optind} -is set to one, since there's no reason to look at the program name, which is +is set to one, because there's no reason to look at the program name, which is in @code{ARGV[0]}: @example @@ -21436,16 +21437,22 @@ etc., as its own options. @quotation NOTE After @code{getopt()} is through, -user level code must clear out all the elements of @code{ARGV} from 1 +user-level code must clear out all the elements of @code{ARGV} from 1 to @code{Optind}, so that @command{awk} does not try to process the command-line options as @value{FN}s. @end quotation Using @samp{#!} with the @option{-E} option may help avoid conflicts between your program's options and @command{gawk}'s options, -since @option{-E} causes @command{gawk} to abandon processing of +as @option{-E} causes @command{gawk} to abandon processing of further options -(@pxref{Executable Scripts}, and @pxref{Options}). +(@DBPXREF{Executable Scripts} and +@ifnotdocbook +@pxref{Options}). +@end ifnotdocbook +@ifdocbook +@ref{Options}). +@end ifdocbook Several of the sample programs presented in @ref{Sample Programs}, @@ -21475,7 +21482,7 @@ However, because these are numbers, they do not provide very useful information to the average user. There needs to be some way to find the user information associated with the user and group ID numbers. 
This @value{SECTION} presents a suite of functions for retrieving information from the -user database. @xref{Group Functions}, +user database. @DBXREF{Group Functions} for a similar suite that retrieves information from the group database. @cindex @code{getpwent()} function (C library) @@ -21494,7 +21501,7 @@ The ``password'' comes from the original user database file, encrypted passwords (hence the name). @cindex @command{pwcat} program -While an @command{awk} program could simply read @file{/etc/passwd} +Although an @command{awk} program could simply read @file{/etc/passwd} directly, this file may not contain complete information about the system's set of users.@footnote{It is often the case that password information is stored in a network database.} To be sure you are able to @@ -21589,12 +21596,12 @@ The user's encrypted password. This may not be available on some systems. @item User-ID The user's numeric user ID number. -(On some systems it's a C @code{long}, and not an @code{int}. Thus +(On some systems, it's a C @code{long}, and not an @code{int}. Thus we cast it to @code{long} for all cases.) @item Group-ID The user's numeric group ID number. -(Similar comments about @code{long} vs.@: @code{int} apply here.) +(Similar comments about @code{long} versus @code{int} apply here.) @item Full name The user's full name, and perhaps other information associated with the @@ -21695,7 +21702,7 @@ The function @code{_pw_init()} fills three copies of the user information into three associative arrays. The arrays are indexed by username (@code{_pw_byname}), by user ID number (@code{_pw_byuid}), and by order of occurrence (@code{_pw_bycount}). -The variable @code{_pw_inited} is used for efficiency, since @code{_pw_init()} +The variable @code{_pw_inited} is used for efficiency, as @code{_pw_init()} needs to be called only once. 
@cindex @code{PROCINFO} array, testing the field splitting @@ -21704,7 +21711,7 @@ Because this function uses @code{getline} to read information from @command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}. It notes in the variable @code{using_fw} whether field splitting with @code{FIELDWIDTHS} is in effect or not. -Doing so is necessary, since these functions could be called +Doing so is necessary, as these functions could be called from anywhere within a user's program, and the user may have his or her own way of splitting records and fields. This makes it possible to restore the correct @@ -21806,7 +21813,7 @@ In turn, calling @code{_pw_init()} is not too expensive, because the once. If you are worried about squeezing every last cycle out of your @command{awk} program, the check of @code{_pw_inited} could be moved out of @code{_pw_init()} and duplicated in all the other functions. In practice, -this is not necessary, since most @command{awk} programs are I/O-bound, +this is not necessary, as most @command{awk} programs are I/O-bound, and such a change would clutter up the code. The @command{id} program in @DBREF{Id Program} @@ -21945,7 +21952,7 @@ the association of name to number must be unique within the file. we cast it to @code{long} for all cases.) @item Group Member List -A comma-separated list of user names. These users are members of the group. +A comma-separated list of usernames. These users are members of the group. Modern Unix systems allow users to be members of several groups simultaneously. If your system does, then there are elements @code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO} @@ -22060,7 +22067,7 @@ is being used, and to restore the appropriate field splitting mechanism. The group information is stored is several associative arrays. The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number (@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}). 
-There is an additional array indexed by user name (@code{@w{_gr_groupsbyuser}}), +There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}), which is a space-separated list of groups to which each user belongs. Unlike the user database, it is possible to have multiple records in the @@ -22073,7 +22080,7 @@ tvpeople:*:101:david,conan,tom,joan @end example For this reason, @code{_gr_init()} looks to see if a group name or -group ID number is already seen. If it is, then the user names are +group ID number is already seen. If it is, the usernames are simply concatenated onto the previous list of users.@footnote{There is actually a subtle problem with the code just presented. Suppose that the first time there were no names. This code adds the names with @@ -22119,7 +22126,7 @@ function getgrgid(gid) @cindex @code{getgruser()} function (C library) The @code{getgruser()} function does not have a C counterpart. It takes a -user name and returns the list of groups that have the user as a member: +username and returns the list of groups that have the user as a member: @cindex @code{getgruser()} function, user-defined @example @@ -22262,7 +22269,7 @@ The functions presented here fit into the following categories: @c nested list @table @asis @item General problems -Number to string conversion, assertions, rounding, random number +Number-to-string conversion, assertions, rounding, random number generation, converting characters to numbers, joining strings, getting easily usable time-of-day information, and reading a whole file in one shot. @@ -22458,7 +22465,7 @@ The programs are presented in alphabetical order. @end menu @node Cut Program -@subsection Cutting out Fields and Columns +@subsection Cutting Out Fields and Columns @cindex @command{cut} utility @c STARTOFRANGE cut @@ -22735,7 +22742,7 @@ function set_charlist( field, i, j, f, g, n, m, t, @c endfile @end example -Next is the rule that actually processes the data. 
If the @option{-s} option +Next is the rule that processes the data. If the @option{-s} option is given, then @code{suppress} is true. The first @code{if} statement makes sure that the input record does have the field separator. If @command{cut} is processing fields, @code{suppress} is true, and the field @@ -22767,9 +22774,9 @@ written out between the fields: @end example This version of @command{cut} relies on @command{gawk}'s @code{FIELDWIDTHS} -variable to do the character-based cutting. While it is possible in +variable to do the character-based cutting. It is possible in other @command{awk} implementations to use @code{substr()} -(@pxref{String Functions}), +(@pxref{String Functions}), but it is also extremely painful. The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem of picking the input line apart by characters. @@ -22914,7 +22921,7 @@ matched lines in the output: @c endfile @end example -The last two lines are commented out, since they are not needed in +The last two lines are commented out, as they are not needed in @command{gawk}. They should be uncommented if you have to use another version of @command{awk}. @@ -22924,7 +22931,7 @@ into lowercase if the @option{-i} option is specified.@footnote{It also introduces a subtle bug; if a match happens, we output the translated line, not the original.} The rule is -commented out since it is not necessary with @command{gawk}: +commented out as it is not necessary with @command{gawk}: @example @c file eg/prog/egrep.awk @@ -23061,7 +23068,7 @@ function usage() @c ENDOFRANGE egrep @node Id Program -@subsection Printing out User Information +@subsection Printing Out User Information @cindex printing, user information @cindex users, information about, printing @@ -23176,7 +23183,7 @@ function pr_first_field(str, a) The test in the @code{for} loop is worth noting. 
Any supplementary groups in the @code{PROCINFO} array have the indices @code{"group1"} through @code{"group@var{N}"} for some -@var{N}, i.e., the total number of supplementary groups. +@var{N} (i.e., the total number of supplementary groups). However, we don't know in advance how many of these groups there are. @@ -23216,10 +23223,10 @@ aims to demonstrate.} By default, the output files are named @file{xaa}, @file{xab}, and so on. Each file has -1000 lines in it, with the likely exception of the last file. To change the +1,000 lines in it, with the likely exception of the last file. To change the number of lines in each file, supply a number on the command line -preceded with a minus; e.g., @samp{-500} for files with 500 lines in them -instead of 1000. To change the name of the output files to something like +preceded with a minus (e.g., @samp{-500} for files with 500 lines in them +instead of 1,000). To change the name of the output files to something like @file{myfileaa}, @file{myfileab}, and so on, supply an additional argument that specifies the @value{FN} prefix. @@ -23267,7 +23274,7 @@ BEGIN @{ @} # test argv in case reading from stdin instead of file if (i in ARGV) - i++ # skip data file name + i++ # skip datafile name if (i in ARGV) @{ outfile = ARGV[i] ARGV[i] = "" @@ -23361,8 +23368,8 @@ truncating them and starting over. The @code{BEGIN} rule first makes a copy of all the command-line arguments into an array named @code{copy}. -@code{ARGV[0]} is not copied, since it is not needed. -@code{tee} cannot use @code{ARGV} directly, since @command{awk} attempts to +@code{ARGV[0]} is not needed, so it is not copied. +@code{tee} cannot use @code{ARGV} directly, because @command{awk} attempts to process each @value{FN} in @code{ARGV} as input data. @cindex flag variables @@ -23411,7 +23418,7 @@ BEGIN @{ @c endfile @end example -The following single rule does all the work. Since there is no pattern, it is +The following single rule does all the work. 
Because there is no pattern, it is executed for each line of input. The body of the rule simply prints the line into each file on the command line, and then to the standard output: @@ -23442,7 +23449,7 @@ for (i in copy) @end example @noindent -This is more concise but it is also less efficient. The @samp{if} is +This is more concise, but it is also less efficient. The @samp{if} is tested for each record and for each output file. By duplicating the loop body, the @samp{if} is only tested once for each input record. If there are @var{N} input records and @var{M} output files, the first method only @@ -23662,10 +23669,10 @@ The second rule does the work. The variable @code{equal} is one or zero, depending upon the results of @code{are_equal()}'s comparison. If @command{uniq} is counting repeated lines, and the lines are equal, then it increments the @code{count} variable. Otherwise, it prints the line and resets @code{count}, -since the two lines are not equal. +because the two lines are not equal. If @command{uniq} is not counting, and if the lines are equal, @code{count} is incremented. -Nothing is printed, since the point is to remove duplicates. +Nothing is printed, as the point is to remove duplicates. Otherwise, if @command{uniq} is counting repeated lines and more than one line is seen, or if @command{uniq} is counting nonrepeated lines and only one line is seen, then the line is printed, and @code{count} @@ -23786,7 +23793,7 @@ Count only characters. @end table Implementing @command{wc} in @command{awk} is particularly elegant, -since @command{awk} does a lot of the work for us; it splits lines into +because @command{awk} does a lot of the work for us; it splits lines into words (i.e., fields) and counts them, it counts lines (i.e., records), and it can easily tell us how long a line is. @@ -23891,7 +23898,7 @@ function endfile(file) @end example There is one rule that is executed for each line. 
It adds the length of -the record, plus one, to @code{chars}.@footnote{Since @command{gawk} +the record, plus one, to @code{chars}.@footnote{Because @command{gawk} understands multibyte locales, this code counts characters, not bytes.} Adding one plus the record length is needed because the newline character separating records (the value @@ -24239,8 +24246,8 @@ often used to map uppercase letters into lowercase for further processing: @command{tr} requires two lists of characters.@footnote{On some older systems, including Solaris, the system version of @command{tr} may require that the lists be written as range expressions enclosed in square brackets -(@samp{[a-z]}) and quoted, to prevent the shell from attempting a file -name expansion. This is not a feature.} When processing the input, the +(@samp{[a-z]}) and quoted, to prevent the shell from attempting a +@value{FN} expansion. This is not a feature.} When processing the input, the first character in the first list is replaced with the first character in the second list, the second character in the first list is replaced with the second character in the second list, and so on. If there are @@ -24355,9 +24362,9 @@ BEGIN @{ @c endfile @end example -While it is possible to do character transliteration in a user-level -function, it is not necessarily efficient, and we (the @command{gawk} -authors) started to consider adding a built-in function. However, +It is possible to do character transliteration in a user-level +function, but it is not necessarily efficient, and we (the @command{gawk} +developers) started to consider adding a built-in function. However, shortly after writing this program, we learned that Brian Kernighan had added the @code{toupper()} and @code{tolower()} functions to his @command{awk} (@pxref{String Functions}). These functions handle the @@ -24401,7 +24408,7 @@ the @code{line} array and printing the page when 20 labels have been read. 
The @code{BEGIN} rule simply sets @code{RS} to the empty string, so that @command{awk} splits records at blank lines (@pxref{Records}). -It sets @code{MAXLINES} to 100, since 100 is the maximum number +It sets @code{MAXLINES} to 100, because 100 is the maximum number of lines on the page @iftex (@math{20 @cdot 5 = 100}). @@ -24558,9 +24565,9 @@ useful on real text files: @item The @command{awk} language considers upper- and lowercase characters to be distinct. Therefore, ``bartender'' and ``Bartender'' are not treated -as the same word. This is undesirable, since in normal text, words -are capitalized if they begin sentences, and a frequency analyzer should not -be sensitive to capitalization. +as the same word. This is undesirable, because words are capitalized +if they begin sentences in normal text, and a frequency analyzer should +not be sensitive to capitalization. @item Words are detected using the @command{awk} convention that fields are @@ -24741,7 +24748,7 @@ The nodes and @ref{Sample Programs}, are the top level nodes for a large number of @command{awk} programs. @end ifinfo -If you want to experiment with these programs, it is tedious to have to type +If you want to experiment with these programs, it is tedious to type them in by hand. Here we present a program that can extract parts of a Texinfo input file into separate files. @@ -24819,7 +24826,7 @@ It also prints some final advice: @@example @@c file examples/messages.awk -END @@@{ print "Always avoid bored archeologists!" @@@} +END @@@{ print "Always avoid bored archaeologists!" @@@} @@c end file @@end example @dots{} @@ -24991,7 +24998,7 @@ The @command{sed} utility is a stream editor, a program that reads a stream of data, makes changes to it, and passes it on. It is often used to make global changes to a large file or to a stream of data generated by a pipeline of commands. 
-While @command{sed} is a complicated program in its own right, its most common +Although @command{sed} is a complicated program in its own right, its most common use is to perform global substitutions in the middle of a pipeline: @example @@ -25000,7 +25007,7 @@ use is to perform global substitutions in the middle of a pipeline: Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp @samp{old} on each input line and globally replace it with the text -@samp{new}, i.e., all the occurrences on a line. This is similar to +@samp{new} (i.e., all the occurrences on a line). This is similar to @command{awk}'s @code{gsub()} function (@pxref{String Functions}). @@ -25084,7 +25091,7 @@ not treated as @value{FN}s (@pxref{ARGC and ARGV}). The @code{usage()} function prints an error message and exits. -Finally, the single rule handles the printing scheme outlined above, +Finally, the single rule handles the printing scheme outlined earlier, using @code{print} or @code{printf} as appropriate, depending upon the value of @code{RT}. @c ENDOFRANGE awksed @@ -25128,8 +25135,8 @@ BEGIN @{ The following program, @file{igawk.sh}, provides this service. It simulates @command{gawk}'s searching of the @env{AWKPATH} variable -and also allows @dfn{nested} includes; i.e., a file that is included -with @code{@@include} can contain further @code{@@include} statements. +and also allows @dfn{nested} includes (i.e., a file that is included +with @code{@@include} can contain further @code{@@include} statements). @command{igawk} makes an effort to only include files once, so that nested includes don't accidentally include a library function twice. @@ -25159,10 +25166,10 @@ Literal text, provided with @option{-e} or @option{--source}. This text is just appended directly. @item -Source @value{FN}s, provided with @option{-f}. We use a neat trick and append -@samp{@@include @var{filename}} to the shell variable's contents. 
Since the file-inclusion -program works the way @command{gawk} does, this gets the text -of the file included into the program at the correct point. +Source @value{FN}s, provided with @option{-f}. We use a neat trick and +append @samp{@@include @var{filename}} to the shell variable's contents. +Because the file-inclusion program works the way @command{gawk} does, this +gets the text of the file included in the program at the correct point. @end enumerate @item @@ -25461,9 +25468,10 @@ EOF @c endfile @end example -The shell construct @samp{@var{command} << @var{marker}} is called a @dfn{here document}. -Everything in the shell script up to the @var{marker} is fed to @var{command} as input. -The shell processes the contents of the here document for variable and command substitution +The shell construct @samp{@var{command} << @var{marker}} is called +a @dfn{here document}. Everything in the shell script up to the +@var{marker} is fed to @var{command} as input. The shell processes +the contents of the here document for variable and command substitution (and possibly other things as well, depending upon the shell). The shell construct @samp{$(@dots{})} is called @dfn{command substitution}. @@ -25478,14 +25486,16 @@ It's done in these steps: @enumerate @item Run @command{gawk} with the @code{@@include}-processing program (the -value of the @code{expand_prog} shell variable) on standard input. +value of the @code{expand_prog} shell variable) reading standard input. @item -Standard input is the contents of the user's program, from the shell variable @code{program}. -Its contents are fed to @command{gawk} via a here document. +Standard input is the contents of the user's program, +from the shell variable @code{program}. +Feed its contents to @command{gawk} via a here document. @item -The results of this processing are saved in the shell variable @code{processed_program} by using command substitution. 
+Save the results of this processing in the shell variable +@code{processed_program} by using command substitution. @end enumerate The last step is to call @command{gawk} with the expanded program, @@ -25561,7 +25571,7 @@ of @command{awk} programs as Web CGI scripts.} @c ENDOFRANGE igawk @node Anagram Program -@subsection Finding Anagrams From A Dictionary +@subsection Finding Anagrams from a Dictionary @cindex anagrams, finding An interesting programming challenge is to @@ -25570,17 +25580,17 @@ word list (such as @file{/usr/share/dict/words} on many GNU/Linux systems). One word is an anagram of another if both words contain the same letters -(for example, ``babbling'' and ``blabbing''). +(e.g., ``babbling'' and ``blabbing''). -Column 2, Problem C of Jon Bentley's @cite{Programming Pearls}, second -edition, presents an elegant algorithm. The idea is to give words that +Column 2, Problem C, of Jon Bentley's @cite{Programming Pearls}, Second +Edition, presents an elegant algorithm. The idea is to give words that are anagrams a common signature, sort all the words together by their signature, and then print them. Dr.@: Bentley observes that taking the letters in each word and sorting them produces that common signature. The following program uses arrays of arrays to bring together words with the same signature and array sorting to print the words -in sorted order. +in sorted order: @c STARTOFRANGE anagram @cindex @code{anagram.awk} program @@ -25652,7 +25662,7 @@ function word2key(word, a, i, n, result) Finally, the @code{END} rule traverses the array and prints out the anagram lists. 
It sends the output -to the system @command{sort} command, since otherwise +to the system @command{sort} command because otherwise the anagrams would appear in arbitrary order: @example @@ -25694,7 +25704,7 @@ babery yabber @c ENDOFRANGE anagram @node Signature Program -@subsection And Now For Something Completely Different +@subsection And Now for Something Completely Different @cindex signature program @cindex Brini, Davide @@ -37347,9 +37357,9 @@ recommend compiling and using the current version. @node Bugs @appendixsec Reporting Problems and Bugs -@cindex archeologists +@cindex archaeologists @quotation -@i{There is nothing more dangerous than a bored archeologist.} +@i{There is nothing more dangerous than a bored archaeologist.} @author The Hitchhiker's Guide to the Galaxy @end quotation @c the radio show, not the book. :-) |