Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 1971 |
1 files changed, 905 insertions, 1066 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index e0023245..81568fe7 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -55,8 +55,9 @@ @set VERSION 4.1 @set PATCHLEVEL 2 +@set GAWKINETTITLE TCP/IP Internetworking with @command{gawk} @ifset FOR_PRINT -@set TITLE Effective Awk Programming +@set TITLE Effective awk Programming @end ifset @ifclear FOR_PRINT @set TITLE GAWK: Effective AWK Programming @@ -207,7 +208,7 @@ @set FFN Filename @set DF datafile @set DDF Datafile -@set PVERSION Version +@set PVERSION version @end ifset @c For HTML, spell out email addresses, to avoid problems with @@ -304,7 +305,7 @@ All Rights Reserved.</literallayout> @end docbook @ifnotdocbook -Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2014 @* +Copyright @copyright{} 1989, 1991, 1992, 1993, 1996--2005, 2007, 2009--2015 @* Free Software Foundation, Inc. @end ifnotdocbook @sp 2 @@ -472,7 +473,7 @@ particular records in a file and perform operations upon them. @command{gawk}. * Internationalization:: Getting @command{gawk} to speak your language. -* Debugger:: The @code{gawk} debugger. +* Debugger:: The @command{gawk} debugger. * Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with @command{gawk}. * Dynamic Extensions:: Adding new built-in functions to @@ -956,7 +957,7 @@ particular records in a file and perform operations upon them. * Internal File Ops:: The code for internal file operations. * Using Internal File Ops:: How to use an external extension. * Extension Samples:: The sample extensions that ship with - @code{gawk}. + @command{gawk}. * Extension Sample File Functions:: The file functions sample. * Extension Sample Fnmatch:: An interface to @code{fnmatch()}. * Extension Sample Fork:: An interface to @code{fork()} and @@ -1171,7 +1172,7 @@ interface to network protocols via special @file{/inet} files. The programs in this book make clear that an AWK program is typically much smaller and faster to develop than a counterpart written in C. 
-Consequently, there is often a payoff to prototype an +Consequently, there is often a payoff to prototyping an algorithm or design in AWK to get it running quickly and expose problems early. Often, the interpreted performance is adequate and the AWK prototype becomes the product. @@ -1248,15 +1249,15 @@ March 2001 Some things don't change. Thirteen years ago I wrote: ``If you use AWK or want to learn how, then read this book.'' -True then and still true today. +True then, and still true today. -Learning to use a programming language is more than mastering the +Learning to use a programming language is about more than mastering the syntax. One needs to acquire an understanding of how to use the features of the language to solve practical programming problems. A focus of this book is many examples that show how to use AWK. Some things do change. Our computers are much faster and have more memory. -Consequently, speed and storage inefficiencies of a high level language +Consequently, speed and storage inefficiencies of a high-level language matter less. Prototyping in AWK and then rewriting in C for performance reasons happens less, because more often the prototype is fast enough. @@ -1264,12 +1265,12 @@ Of course, there are computing operations that are best done in C or C++. With @command{gawk} 4.1 and later, you do not have to choose between writing your program in AWK or in C/C++. You can write most of your program in AWK and the aspects that require C/C++ capabilities can be written -in C/C++ and then the pieces glued together when the @command{gawk} module loads +in C/C++, and then the pieces glued together when the @command{gawk} module loads the C/C++ module as a dynamic plug-in. @c Chapter 16 @ref{Dynamic Extensions}, has all the -details, and as expected, many examples to help you learn the ins and outs. +details, and, as expected, many examples to help you learn the ins and outs. I enjoy programming in AWK and had fun (re)reading this book. 
I think you will too. @@ -1344,7 +1345,7 @@ Generate reports Validate data @item -Produce indexes and perform other document preparation tasks +Produce indexes and perform other document-preparation tasks @item Experiment with algorithms that you can adapt later to other computer @@ -1491,23 +1492,23 @@ help from me, thoroughly reworked @command{gawk} for compatibility with the newer @command{awk}. Circa 1994, I became the primary maintainer. Current development focuses on bug fixes, -performance improvements, standards compliance and, occasionally, new features. +performance improvements, standards compliance, and, occasionally, new features. In May 1997, J@"urgen Kahrs felt the need for network access from @command{awk}, and with a little help from me, set about adding features to do this for @command{gawk}. At that time, he also wrote the bulk of -@cite{TCP/IP Internetworking with @command{gawk}} +@cite{@value{GAWKINETTITLE}} (a separate document, available as part of the @command{gawk} distribution). His code finally became part of the main @command{gawk} distribution with @command{gawk} @value{PVERSION} 3.1. John Haque rewrote the @command{gawk} internals, in the process providing an @command{awk}-level debugger. This version became available as -@command{gawk} @value{PVERSION} 4.0, in 2011. +@command{gawk} @value{PVERSION} 4.0 in 2011. @DBXREF{Contributors} -for a full list of those who made important contributions to @command{gawk}. +for a full list of those who have made important contributions to @command{gawk}. 
@node Names @unnumberedsec A Rose by Any Other Name @@ -1520,7 +1521,7 @@ is often referred to as ``new @command{awk}.'' By analogy, the original version of @command{awk} is referred to as ``old @command{awk}.'' -Today, on most systems, when you run the @command{awk} utility, +Today, on most systems, when you run the @command{awk} utility you get some version of new @command{awk}.@footnote{Only Solaris systems still use an old @command{awk} for the default @command{awk} utility. A more modern @command{awk} lives in @@ -1580,7 +1581,9 @@ the POSIX standard for @command{awk}. This @value{DOCUMENT} has the difficult task of being both a tutorial and a reference. If you are a novice, feel free to skip over details that seem too complex. You should also ignore the many cross-references; they are for the -expert user and for the online Info and HTML versions of the @value{DOCUMENT}. +expert user and for the Info and +@uref{http://www.gnu.org/software/gawk/manual/, HTML} +versions of the @value{DOCUMENT}. @end ifnotinfo There are sidebars @@ -1613,7 +1616,7 @@ This @value{DOCUMENT} is split into several parts, as follows: @itemize @value{BULLET} @item -Part I describes the @command{awk} language and @command{gawk} program in detail. +Part I describes the @command{awk} language and the @command{gawk} program in detail. It starts with the basics, and continues through all of the features of @command{awk}. It contains the following chapters: @@ -1660,10 +1663,10 @@ doing something when a record is matched, and the predefined variables @item @ref{Arrays}, -covers @command{awk}'s one-and-only data structure: associative arrays. -Deleting array elements and whole arrays is also described, as well as -sorting arrays in @command{gawk}. It also describes how @command{gawk} -provides arrays of arrays. +covers @command{awk}'s one-and-only data structure: the associative array. +Deleting array elements and whole arrays is described, as well as +sorting arrays in @command{gawk}. 
The @value{CHAPTER} also describes how +@command{gawk} provides arrays of arrays. @item @ref{Functions}, @@ -1675,17 +1678,17 @@ as well as how to define your own functions. It also discusses how @item Part II shows how to use @command{awk} and @command{gawk} for problem solving. There is lots of code here for you to read and learn from. -It contains the following chapters: +This part contains the following chapters: @c nested @itemize @value{MINUS} @item -@ref{Library Functions}, which provides a number of functions meant to +@ref{Library Functions}, provides a number of functions meant to be used from main @command{awk} programs. @item @ref{Sample Programs}, -which provides many sample @command{awk} programs. +provides many sample @command{awk} programs. @end itemize Reading these two chapters allows you to see @command{awk} @@ -1738,7 +1741,7 @@ including the GNU General Public License: @item @ref{Language History}, describes how the @command{awk} language has evolved since -its first release to present. It also describes how @command{gawk} +its first release to the present. It also describes how @command{gawk} has acquired features over time. @item @@ -1781,7 +1784,7 @@ are completely unfamiliar with computer programming. @item @uref{http://www.gnu.org/software/gawk/manual/html_node/Glossary.html, The Glossary} -defines most, if not all of, the significant terms used +defines most, if not all, of the significant terms used throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with, try looking them up here. @@ -1808,7 +1811,7 @@ and some possible future directions for @command{gawk} development. provides some very cursory background material for those who are completely unfamiliar with computer programming. -The @ref{Glossary}, defines most, if not all of, the significant terms used +The @ref{Glossary}, defines most, if not all, of the significant terms used throughout the @value{DOCUMENT}. 
If you find terms that you aren't familiar with, try looking them up here. @@ -1851,7 +1854,7 @@ This typically represents the command's standard output. Output from the command, usually its standard output, appears @code{like this}. @end ifset -Error messages, and other output on the command's standard error, are preceded +Error messages and other output on the command's standard error are preceded by the glyph ``@error{}''. For example: @example @@ -1878,7 +1881,7 @@ there are special characters called ``control characters.'' These are characters that you type by holding down both the @kbd{CONTROL} key and another key, at the same time. For example, a @kbd{Ctrl-d} is typed by first pressing and holding the @kbd{CONTROL} key, next -pressing the @kbd{d} key and finally releasing both keys. +pressing the @kbd{d} key, and finally releasing both keys. For the sake of brevity, throughout this @value{DOCUMENT}, we refer to Brian Kernighan's version of @command{awk} as ``BWK @command{awk}.'' @@ -1914,7 +1917,7 @@ the picture of a flashlight in the margin, as shown here. @value{DARKCORNER} @end iftex @ifnottex -``(d.c.)''. +``(d.c.).'' @end ifnottex @ifclear FOR_PRINT They also appear in the index under the heading ``dark corner.'' @@ -1949,12 +1952,12 @@ Emacs editor. GNU Emacs is the most widely used version of Emacs today. @cindex GPL (General Public License) @cindex General Public License, See GPL @cindex documentation, online -The GNU@footnote{GNU stands for ``GNU's not Unix.''} +The GNU@footnote{GNU stands for ``GNU's Not Unix.''} Project is an ongoing effort on the part of the Free Software Foundation to create a complete, freely distributable, POSIX-compliant computing environment. -The FSF uses the ``GNU General Public License'' (GPL) to ensure that -their software's +The FSF uses the GNU General Public License (GPL) to ensure that +its software's source code is always available to the end user. 
@ifclear FOR_PRINT A copy of the GPL is included @@ -2014,7 +2017,7 @@ version of @command{awk}. I started working with that version in the fall of 1988. As work on it progressed, the FSF published several preliminary versions (numbered 0.@var{x}). -In 1996, Edition 1.0 was released with @command{gawk} 3.0.0. +In 1996, edition 1.0 was released with @command{gawk} 3.0.0. The FSF published the first two editions under the title @cite{The GNU Awk User's Guide}. @ifset FOR_PRINT @@ -2026,7 +2029,7 @@ the third edition in 2001. This edition maintains the basic structure of the previous editions. For FSF edition 4.0, the content was thoroughly reviewed and updated. All references to @command{gawk} versions prior to 4.0 were removed. -Of significant note for that edition was @ref{Debugger}. +Of significant note for that edition was the addition of @ref{Debugger}. For FSF edition @ifclear FOR_PRINT @@ -2041,7 +2044,7 @@ and the major new additions are @ref{Arbitrary Precision Arithmetic}, and @ref{Dynamic Extensions}. This @value{DOCUMENT} will undoubtedly continue to evolve. If you -find an error in this @value{DOCUMENT}, please report it! @DBXREF{Bugs} +find an error in the @value{DOCUMENT}, please report it! @DBXREF{Bugs} for information on submitting problem reports electronically. @ifset FOR_PRINT @@ -2051,7 +2054,7 @@ for information on submitting problem reports electronically. You may have a newer version of @command{gawk} than the one described here. To find out what has changed, you should first look at the @file{NEWS} file in the @command{gawk} -distribution, which provides a high-level summary of what changed in +distribution, which provides a high-level summary of the changes in each release. You can then look at the @uref{http://www.gnu.org/software/gawk/manual/, @@ -2105,7 +2108,7 @@ The initial draft of @cite{The GAWK Manual} had the following acknowledgments: Many people need to be thanked for their assistance in producing this manual. 
Jay Fenlason contributed many ideas and sample programs. Richard Mlynarik and Robert Chassell gave helpful comments on drafts of this -manual. The paper @cite{A Supplemental Document for @command{awk}} by John W.@: +manual. The paper @cite{A Supplemental Document for AWK} by John W.@: Pierce of the Chemistry Department at UC San Diego, pinpointed several issues relevant both to @command{awk} implementation and to this manual, that would otherwise have escaped us. @@ -2116,12 +2119,18 @@ I would like to acknowledge Richard M.@: Stallman, for his vision of a better world and for his courage in founding the FSF and starting the GNU Project. +@ifclear FOR_PRINT Earlier editions of this @value{DOCUMENT} had the following acknowledgements: +@end ifclear +@ifset FOR_PRINT +The previous edition of this @value{DOCUMENT} had +the following acknowledgements: +@end ifset @quotation The following people (in alphabetical order) provided helpful comments on various -versions of this book, +versions of this book: Rick Adams, Dr.@: Nelson H.F. Beebe, Karl Berry, @@ -2149,7 +2158,7 @@ Robert J.@: Chassell provided much valuable advice on the use of Texinfo. He also deserves special thanks for convincing me @emph{not} to title this @value{DOCUMENT} -@cite{How To Gawk Politely}. +@cite{How to Gawk Politely}. Karl Berry helped significantly with the @TeX{} part of Texinfo. @cindex Hartholz, Marshall @@ -2233,9 +2242,9 @@ a number of people. @DBXREF{Contributors} for the full list. @ifset FOR_PRINT @cindex Oram, Andy -Thanks to Andy Oram, of O'Reilly Media, for initiating +Thanks to Andy Oram of O'Reilly Media for initiating the fourth edition and for his support during the work. -Thanks to Jasmine Kwityn for her copy-editing work. +Thanks to Jasmine Kwityn for her copyediting work. @end ifset Thanks to Michael Brennan for the Forewords. @@ -2243,7 +2252,7 @@ Thanks to Michael Brennan for the Forewords. 
@cindex Duman, Patrice @cindex Berry, Karl Thanks to Patrice Dumas for the new @command{makeinfo} program. -Thanks to Karl Berry who continues to work to keep +Thanks to Karl Berry, who continues to work to keep the Texinfo markup language sane. @cindex Kernighan, Brian @@ -2253,8 +2262,8 @@ Robert P.J.@: Day, Michael Brennan, and Brian Kernighan kindly acted as reviewers for the 2015 edition of this @value{DOCUMENT}. Their feedback helped improve the final work. -I would like to thank Brian Kernighan for invaluable assistance during the -testing and debugging of @command{gawk}, and for ongoing +I would also like to thank Brian Kernighan for his invaluable assistance during the +testing and debugging of @command{gawk}, and for his ongoing help and advice in clarifying numerous points about the language. We could not have done nearly as good a job on either @command{gawk} or its documentation without his help. @@ -2365,9 +2374,9 @@ an advanced feature that we will ignore for now; pattern to search for and one action to perform upon finding the pattern. -Syntactically, a rule consists of a pattern followed by an action. The -action is enclosed in braces to separate it from the pattern. -Newlines usually separate rules. Therefore, an @command{awk} +Syntactically, a rule consists of a @dfn{pattern} followed by an +@dfn{action}. The action is enclosed in braces to separate it from the +pattern. Newlines usually separate rules. Therefore, an @command{awk} program looks like this: @example @@ -2441,8 +2450,8 @@ awk '@var{program}' @var{input-file1} @var{input-file2} @dots{} @end example @noindent -where @var{program} consists of a series of @var{patterns} and -@var{actions}, as described earlier. +where @var{program} consists of a series of patterns and +actions, as described earlier. 
@cindex single quote (@code{'}) @cindex @code{'} (single quote) @@ -2461,12 +2470,12 @@ programs from shell scripts, because it avoids the need for a separate file for the @command{awk} program. A self-contained shell script is more reliable because there are no other files to misplace. -Later in this chapter, +Later in this chapter, in @ifdocbook the section @end ifdocbook @ref{Very Simple}, -presents several short, +we'll see examples of several short, self-contained programs. @node Read Terminal @@ -2487,10 +2496,10 @@ awk '@var{program}' which usually means whatever you type on the keyboard. This continues until you indicate end-of-file by typing @kbd{Ctrl-d}. @ifset FOR_PRINT -(On other operating systems, the end-of-file character may be different.) +(On non-POSIX operating systems, the end-of-file character may be different.) @end ifset @ifclear FOR_PRINT -(On other operating systems, the end-of-file character may be different. +(On non-POSIX operating systems, the end-of-file character may be different. For example, on OS/2, it is @kbd{Ctrl-z}.) @end ifclear @@ -2590,11 +2599,9 @@ for programs that are provided on the @command{awk} command line. (Also, placing the program in a file allows us to use a literal single quote in the program text, instead of the magic @samp{\47}.) -@c STARTOFRANGE sq1x @cindex single quote (@code{'}) in @command{gawk} command lines -@c STARTOFRANGE qs2x @cindex @code{'} (single quote) in @command{gawk} command lines -If you want to clearly identify your @command{awk} program files as such, +If you want to clearly identify an @command{awk} program file as such, you can add the extension @file{.awk} to the @value{FN}. This doesn't affect the execution of the @command{awk} program but it does make ``housekeeping'' easier. @@ -2808,7 +2815,7 @@ The next @value{SUBSECTION} describes the shell's quoting rules. 
@end quotation @node Quoting -@subsection Shell-Quoting Issues +@subsection Shell Quoting Issues @cindex shell quoting, rules for @menu @@ -2945,7 +2952,7 @@ $ @kbd{awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}'} @noindent This program consists of three concatenated quoted strings. The first and the -third are single quoted, the second is double quoted. +third are single-quoted, and the second is double-quoted. This can be ``simplified'' to: @@ -2966,8 +2973,6 @@ $ @kbd{awk "BEGIN @{ print \"Here is a single quote <'>\" @}"} @end example @noindent -@c ENDOFRANGE sq1x -@c ENDOFRANGE qs2x This option is also painful, because double quotes, backslashes, and dollar signs are very common in more advanced @command{awk} programs. @@ -2984,7 +2989,7 @@ $ @kbd{awk 'BEGIN @{ print "Here is a double quote <\42>" @}'} @end example @noindent -This works nicely, except that you should comment clearly what the +This works nicely, but you should comment clearly what the escapes mean. A fourth option is to use command-line variable assignment, like this: @@ -2995,11 +3000,11 @@ $ @kbd{awk -v sq="'" 'BEGIN @{ print "Here is a single quote <" sq ">" @}'} @end example (Here, the two string constants and the value of @code{sq} are concatenated -into a single string which is printed by @code{print}.) +into a single string that is printed by @code{print}.) If you really need both single and double quotes in your @command{awk} program, it is probably best to move it into a separate file, where -the shell won't be part of the picture, and you can say what you mean. +the shell won't be part of the picture and you can say what you mean. @node DOS Quoting @subsubsection Quoting in MS-Windows Batch Files @@ -3098,7 +3103,7 @@ of green crates shipped, the number of red boxes shipped, the number of orange bags shipped, and the number of blue packages shipped, respectively. There are 16 entries, covering the 12 months of last year and the first four months of the current year. 
-An empty line separates the data for the two years. +An empty line separates the data for the two years: @example @c file eg/data/inventory-shipped @@ -3132,7 +3137,7 @@ The following command runs a simple @command{awk} program that searches the input file @file{mail-list} for the character string @samp{li} (a grouping of characters is usually called a @dfn{string}; the term @dfn{string} is based on similar usage in English, such -as ``a string of pearls,'' or ``a string of cars in a train''): +as ``a string of pearls'' or ``a string of cars in a train''): @example awk '/li/ @{ print $0 @}' mail-list @@ -3179,7 +3184,7 @@ omitting the @code{print} statement but retaining the braces makes an empty action that does nothing (i.e., no lines are printed). @cindex @command{awk} programs, one-line examples -Many practical @command{awk} programs are just a line or two. Following is a +Many practical @command{awk} programs are just a line or two long. Following is a collection of useful, short programs to get you started. Some of these programs contain constructs that haven't been covered yet. (The description of the program will give you a good idea of what is going on, but you'll @@ -3200,7 +3205,7 @@ Print every line that is longer than 80 characters: awk 'length($0) > 80' data @end example -The sole rule has a relational expression as its pattern and it has no +The sole rule has a relational expression as its pattern and has no action---so it uses the default action, printing the record. @item @@ -3287,7 +3292,7 @@ Print the even-numbered lines in the @value{DF}: awk 'NR % 2 == 0' data @end example -If you use the expression @samp{NR % 2 == 1} instead, +If you used the expression @samp{NR % 2 == 1} instead, the program would print the odd-numbered lines. @end itemize @@ -3303,8 +3308,13 @@ no actions run. After processing all the rules that match the line (and perhaps there are none), @command{awk} reads the next line. 
(However, -@pxref{Next Statement}, +@DBPXREF{Next Statement} +@ifdocbook +and @DBREF{Nextfile Statement}.) +@end ifdocbook +@ifnotdocbook and also @pxref{Nextfile Statement}.) +@end ifnotdocbook This continues until the program reaches the end of the file. For example, the following @command{awk} program contains two rules: @@ -3569,7 +3579,7 @@ performing bit manipulation, for runtime string translation (internationalizatio determining the type of a variable, and array sorting. -As we develop our presentation of the @command{awk} language, we introduce +As we develop our presentation of the @command{awk} language, we will introduce most of the variables and many of the functions. They are described systematically in @DBREF{Built-in Variables} and in @ref{Built-in}. @@ -3623,7 +3633,7 @@ and Perl.} @c FIXME: Review this chapter for summary of builtin functions called. @itemize @value{BULLET} @item -Programs in @command{awk} consist of @var{pattern}-@var{action} pairs. +Programs in @command{awk} consist of @var{pattern}--@var{action} pairs. @item An @var{action} without a @var{pattern} always runs. The default @@ -3652,7 +3662,7 @@ part of a larger shell script (or MS-Windows batch file). You may use backslash continuation to continue a source line. Lines are automatically continued after a comma, open brace, question mark, colon, -@samp{||}, @samp{&&}, @code{do} and @code{else}. +@samp{||}, @samp{&&}, @code{do}, and @code{else}. @end itemize @node Invoking Gawk @@ -3727,20 +3737,16 @@ warning that the program is empty. @node Options @section Command-Line Options -@c STARTOFRANGE ocl @cindex options, command-line -@c STARTOFRANGE clo @cindex command line, options -@c STARTOFRANGE gnulo @cindex GNU long options -@c STARTOFRANGE longo @cindex options, long Options begin with a dash and consist of a single character. GNU-style long options consist of two dashes and a keyword. 
The keyword can be abbreviated, as long as the abbreviation allows the option -to be uniquely identified. If the option takes an argument, then the -keyword is either immediately followed by an equals sign (@samp{=}) and the +to be uniquely identified. If the option takes an argument, either the +keyword is immediately followed by an equals sign (@samp{=}) and the argument's value, or the keyword and the argument's value are separated by whitespace. If a particular option with a value is given more than once, it is the @@ -3767,7 +3773,7 @@ Set the @code{FS} variable to @var{fs} @cindex @option{-f} option @cindex @option{--file} option @cindex @command{awk} programs, location of -Read @command{awk} program source from @var{source-file} +Read the @command{awk} program source from @var{source-file} instead of in the first nonoption argument. This option may be given multiple times; the @command{awk} program consists of the concatenation of the contents of @@ -3822,8 +3828,6 @@ by the user that could start with @samp{-}. It is also useful for passing options on to the @command{awk} program; see @ref{Getopt Function}. @end table -@c ENDOFRANGE gnulo -@c ENDOFRANGE longo The following list describes @command{gawk}-specific options: @@ -3835,14 +3839,14 @@ The following list describes @command{gawk}-specific options: @cindex @option{--characters-as-bytes} option Cause @command{gawk} to treat all input data as single-byte characters. In addition, all output written with @code{print} or @code{printf} -are treated as single-byte characters. +is treated as single-byte characters. Normally, @command{gawk} follows the POSIX standard and attempts to process its input data according to the current locale (@pxref{Locales}). This can often involve converting multibyte characters into wide characters (internally), and can lead to problems or confusion if the input data does not contain valid -multibyte characters. 
This option is an easy way to tell @command{gawk}: -``hands off my data!''. +multibyte characters. This option is an easy way to tell @command{gawk}, +``Hands off my data!'' @item @option{-c} @itemx @option{--traditional} @@ -3899,7 +3903,7 @@ Enable debugging of @command{awk} programs By default, the debugger reads commands interactively from the keyboard (standard input). The optional @var{file} argument allows you to specify a file with a list -of commands for the debugger to execute non-interactively. +of commands for the debugger to execute noninteractively. No space is allowed between the @option{-D} and @var{file}, if @var{file} is supplied. @@ -3959,7 +3963,7 @@ with @samp{#!} scripts (@pxref{Executable Scripts}), like so: @cindex portable object files, generating @cindex files, portable object, generating Analyze the source program and -generate a GNU @command{gettext} Portable Object Template file on standard +generate a GNU @command{gettext} portable object template file on standard output for all string constants that have been marked for translation. @xref{Internationalization}, for information about this option. @@ -3971,7 +3975,7 @@ for information about this option. @cindex GNU long options, printing list of @cindex options, printing list of @cindex printing, list of options -Print a ``usage'' message summarizing the short and long style options +Print a ``usage'' message summarizing the short- and long-style options that @command{gawk} accepts and then exit. @item @option{-i} @var{source-file} @@ -3981,7 +3985,7 @@ that @command{gawk} accepts and then exit. @cindex @command{awk} programs, location of Read an @command{awk} source library from @var{source-file}. This option is completely equivalent to using the @code{@@include} directive inside -your program. This option is very similar to the @option{-f} option, +your program. It is very similar to the @option{-f} option, but there are two important differences. 
First, when @option{-i} is used, the program source is not loaded if it has been previously loaded, whereas with @option{-f}, @command{gawk} always loads the file. @@ -4066,7 +4070,7 @@ when parsing numeric input data (@pxref{Locales}). @cindex @option{-o} option @cindex @option{--pretty-print} option Enable pretty-printing of @command{awk} programs. -By default, output program is created in a file named @file{awkprof.out} +By default, the output program is created in a file named @file{awkprof.out} (@pxref{Profiling}). The optional @var{file} argument allows you to specify a different @value{FN} for the output. @@ -4110,7 +4114,7 @@ in the left margin, and function call counts for each function. Operate in strict POSIX mode. This disables all @command{gawk} extensions (just like @option{--traditional}) and disables all extensions not allowed by POSIX. -@xref{Common Extensions}, for a summary of the extensions +@DBXREF{Common Extensions} for a summary of the extensions in @command{gawk} that are disabled by this option. Also, the following additional @@ -4231,7 +4235,7 @@ source of data.) Because it is clumsy using the standard @command{awk} mechanisms to mix source file and command-line @command{awk} programs, @command{gawk} provides the @option{-e} option. This does not require you to -pre-empt the standard input for your source code; it allows you to easily +preempt the standard input for your source code; it allows you to easily mix command-line and library source code (@pxref{AWKPATH Variable}). As with @option{-f}, the @option{-e} and @option{-i} options may also be used multiple times on the command line. @@ -4277,8 +4281,6 @@ setenv POSIXLY_CORRECT true Having @env{POSIXLY_CORRECT} set is not recommended for daily use, but it is good for testing the portability of your programs to other environments. 
-@c ENDOFRANGE ocl -@c ENDOFRANGE clo @node Other Arguments @section Other Command-Line Arguments @@ -4421,7 +4423,7 @@ file, unless the file is in the current directory. But with @command{gawk}, if the @value{FN} supplied to the @option{-f} or @option{-i} options does not contain a directory separator @samp{/}, then @command{gawk} searches a list of -directories (called the @dfn{search path}), one by one, looking for a +directories (called the @dfn{search path}) one by one, looking for a file with the specified name. The search path is a string consisting of directory names @@ -4462,9 +4464,9 @@ as an entry in the path or write a null entry in the path. Different past versions of @command{gawk} would also look explicitly in the current directory, either before or after the path search. As of -@value{PVERSION} 4.1.2, this no longer happens, and if you wish to look +@value{PVERSION} 4.1.2, this no longer happens; if you wish to look in the current directory, you must include @file{.} either as a separate -entry, or as a null entry in the search path. +entry or as a null entry in the search path. @end quotation The default value for @env{AWKPATH} is @@ -4580,7 +4582,7 @@ If this variable exists, @command{gawk} includes the @value{FN} and line number within the @command{gawk} source code from which warning and/or fatal messages are generated. Its purpose is to help isolate the source of a -message, as there are multiple places which produce the +message, as there are multiple places that produce the same warning or error message. @item GAWK_NO_DFA @@ -4596,16 +4598,16 @@ This specifies the amount by which @command{gawk} should grow its internal evaluation stack, when needed. @item INT_CHAIN_MAX -The intended maximum number of items @command{gawk} will maintain on a +This specifies intended maximum number of items @command{gawk} will maintain on a hash chain for managing arrays indexed by integers. 
@item STR_CHAIN_MAX -The intended maximum number of items @command{gawk} will maintain on a +This specifies intended maximum number of items @command{gawk} will maintain on a hash chain for managing arrays indexed by strings. @item TIDYMEM If this variable exists, @command{gawk} uses the @code{mtrace()} library -calls from GNU LIBC to help track down possible memory leaks. +calls from the GNU C library to help track down possible memory leaks. @end table @node Exit Status @@ -4642,7 +4644,7 @@ The @code{@@include} keyword can be used to read external @command{awk} source files. This gives you the ability to split large @command{awk} source files into smaller, more manageable pieces, and also lets you reuse common @command{awk} code from various @command{awk} scripts. In other words, you can group -together @command{awk} functions, used to carry out specific tasks, +together @command{awk} functions used to carry out specific tasks into external files. These files can be used just like function libraries, using the @code{@@include} keyword in conjunction with the @env{AWKPATH} environment variable. Note that source files may also be included @@ -4677,7 +4679,7 @@ $ @kbd{gawk -f test2} @print{} This is script test2. @end example -@code{gawk} runs the @file{test2} script, which includes @file{test1} +@command{gawk} runs the @file{test2} script, which includes @file{test1} using the @code{@@include} keyword. So, to include external @command{awk} source files, you just use @code{@@include} followed by the name of the file to be included, @@ -4732,11 +4734,12 @@ of the @env{AWKPATH} variable in command-line file searches This is very helpful in constructing @command{gawk} function libraries. If you have a large script with useful, general-purpose @command{awk} functions, you can break it down into library files and put those files -in a special directory. 
You can then include those ``libraries,'' using -either the full pathnames of the files, or by setting the @env{AWKPATH} +in a special directory. You can then include those ``libraries,'' +either by using the full pathnames of the files, or by setting the @env{AWKPATH} environment variable accordingly and then using @code{@@include} with -just the file part of the full pathname. Of course, you can have more -than one directory to keep library files; the more complex the working +just the file part of the full pathname. Of course, +you can keep library files in more than one directory; +the more complex the working environment is, the more directories you may need to organize the files to be included. @@ -4749,8 +4752,8 @@ In particular, @code{@@include} is very useful for writing CGI scripts to be run from web pages. As mentioned in @ref{AWKPATH Variable}, the current directory is always -searched first for source files, before searching in @env{AWKPATH}, -and this also applies to files named with @code{@@include}. +searched first for source files, before searching in @env{AWKPATH}; +this also applies to files named with @code{@@include}. @node Loading Shared Libraries @section Loading Dynamic Extensions into Your Program @@ -4804,8 +4807,8 @@ It also describes the @code{ordchr} extension. @cindex features, deprecated @cindex obsolete features This @value{SECTION} describes features and/or command-line options from -previous releases of @command{gawk} that are either not available in the -current version or that are still supported but deprecated (meaning that +previous releases of @command{gawk} that either are not available in the +current version or are still supported but deprecated (meaning that they will @emph{not} be in the next release). The process-related special files @file{/dev/pid}, @file{/dev/ppid}, @@ -4885,7 +4888,7 @@ This seems to have been a long-undocumented feature in Unix @command{awk}. 
Similarly, you may use @code{print} or @code{printf} statements in the @var{init} and @var{increment} parts of a @code{for} loop. This is another -long-undocumented ``feature'' of Unix @code{awk}. +long-undocumented ``feature'' of Unix @command{awk}. @end ignore @@ -4902,7 +4905,7 @@ to run @command{awk}. @item The three standard options for all versions of @command{awk} are -@option{-f}, @option{-F} and @option{-v}. @command{gawk} supplies these +@option{-f}, @option{-F}, and @option{-v}. @command{gawk} supplies these and many others, as well as corresponding GNU-style long options. @item @@ -4939,13 +4942,12 @@ and @option{-f} command-line options. @item @command{gawk} allows you to load additional functions written in C or C++ using the @code{@@load} statement and/or the @option{-l} option. -(This advanced feature is described later on in @ref{Dynamic Extensions}.) +(This advanced feature is described later, in @ref{Dynamic Extensions}.) @end itemize @node Regexp @chapter Regular Expressions @cindex regexp -@c STARTOFRANGE regexp @cindex regular expressions A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a @@ -5152,7 +5154,7 @@ Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT). @cindex @code{\} (backslash), @code{\v} escape sequence @cindex backslash (@code{\}), @code{\v} escape sequence @item \v -Vertical tab, @kbd{Ctrl-k}, ASCII code 11 (VT). +Vertical TAB, @kbd{Ctrl-k}, ASCII code 11 (VT). @cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence @cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence @@ -5226,7 +5228,7 @@ characters @samp{a+b}. @cindex @code{\} (backslash), in escape sequences @cindex portability For complete portability, do not use a backslash before any character not -shown in the previous list and that is not an operator. +shown in the previous list or that is not an operator. 
@c 11/2014: Moved so as to not stack sidebars @cindex sidebar, Backslash Before Regular Characters @@ -5388,7 +5390,6 @@ escape sequences literally when used in regexp constants. Thus, @node Regexp Operators @section Regular Expression Operators -@c STARTOFRANGE regexpo @cindex regular expressions, operators @cindex metacharacters in regular expressions @@ -5406,7 +5407,7 @@ are recognized and converted into corresponding real characters as the very first step in processing regexps. Here is a list of metacharacters. All characters that are not escape -sequences and that are not listed in the following stand for themselves: +sequences and that are not listed here stand for themselves: @c Use @asis so the docbook comes out ok. Sigh. @table @asis @@ -5529,7 +5530,7 @@ just @samp{p} if no @samp{h}s are present. There are two subtle points to understand about how @samp{*} works. First, the @samp{*} applies only to the single preceding regular expression component (e.g., in @samp{ph*}, it applies just to the @samp{h}). -To cause @samp{*} to apply to a larger sub-expression, use parentheses: +To cause @samp{*} to apply to a larger subexpression, use parentheses: @samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph}, and so on. Second, @samp{*} finds as many repetitions as possible. If the text @@ -5568,10 +5569,10 @@ is repeated at least @var{n} times: Matches @samp{whhhy}, but not @samp{why} or @samp{whhhhy}. @item wh@{3,5@}y -Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy}, only. +Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy} only. @item wh@{2,@}y -Matches @samp{whhy} or @samp{whhhy}, and so on. +Matches @samp{whhy}, @samp{whhhy}, and so on. @end table @cindex POSIX @command{awk}, interval expressions in @@ -5620,11 +5621,9 @@ usage as a syntax error. If @command{gawk} is in compatibility mode (@pxref{Options}), interval expressions are not available in regular expressions. 
-@c ENDOFRANGE regexpo @node Bracket Expressions @section Using Bracket Expressions -@c STARTOFRANGE charlist @cindex bracket expressions @cindex bracket expressions, range expressions @cindex range expressions (regexps) @@ -5700,7 +5699,7 @@ POSIX standard. (a space is printable but not visible, whereas an @samp{a} is both) @item @code{[:lower:]} @tab Lowercase alphabetic characters @item @code{[:print:]} @tab Printable characters (characters that are not control characters) -@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits +@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits, control characters, or space characters) @item @code{[:space:]} @tab Space characters (such as space, TAB, and formfeed, to name a few) @item @code{[:upper:]} @tab Uppercase alphabetic characters @@ -5720,11 +5719,11 @@ and numeric characters in your character set. @c Date: Tue, 01 Jul 2014 07:39:51 +0200 @c From: Hermann Peifer <peifer@gmx.eu> Some utilities that match regular expressions provide a nonstandard -@code{[:ascii:]} character class; @command{awk} does not. However, you -can simulate such a construct using @code{[\x00-\x7F]}. This matches +@samp{[:ascii:]} character class; @command{awk} does not. However, you +can simulate such a construct using @samp{[\x00-\x7F]}. This matches all values numerically between zero and 127, which is the defined range of the ASCII character set. Use a complemented character list -(@code{[^\x00-\x7F]}) to match any single-byte characters that are not +(@samp{[^\x00-\x7F]}) to match any single-byte characters that are not in the ASCII range. @cindex bracket expressions, collating elements @@ -5753,8 +5752,8 @@ Locale-specific names for a list of characters that are equal. The name is enclosed between @samp{[=} and @samp{=]}. 
For example, the name @samp{e} might be used to represent all of -``e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp -that matches any of @samp{e}, @samp{@'e}, or @samp{@`e}. +``e,'' ``@^e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp +that matches any of @samp{e}, @samp{@^e}, @samp{@'e}, or @samp{@`e}. @end table These features are very valuable in non-English-speaking locales. @@ -5768,7 +5767,6 @@ expression matching currently recognize only POSIX character classes; they do not recognize collating symbols or equivalence classes. @end quotation @c maybe one day ... -@c ENDOFRANGE charlist @node Leftmost Longest @section How Much Text Matches? @@ -5784,7 +5782,7 @@ echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}' This example uses the @code{sub()} function to make a change to the input record. (@code{sub()} replaces the first instance of any text matched by the first argument with the string provided as the second argument; -@pxref{String Functions}). Here, the regexp @code{/a+/} indicates ``one +@pxref{String Functions}.) Here, the regexp @code{/a+/} indicates ``one or more @samp{a} characters,'' and the replacement text is @samp{<A>}. The input contains four @samp{a} characters. @@ -5812,9 +5810,7 @@ and also @pxref{Field Separators}). @node Computed Regexps @section Using Dynamic Regexps -@c STARTOFRANGE dregexp @cindex regular expressions, computed -@c STARTOFRANGE regexpd @cindex regular expressions, dynamic @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator @@ -5840,14 +5836,14 @@ and tests whether the input record matches this regexp. @quotation NOTE When using the @samp{~} and @samp{!~} -operators, there is a difference between a regexp constant +operators, be aware that there is a difference between a regexp constant enclosed in slashes and a string constant enclosed in double quotes. 
If you are going to use a string constant, you have to understand that the string is, in essence, scanned @emph{twice}: the first time when @command{awk} reads your program, and the second time when it goes to match the string on the lefthand side of the operator with the pattern on the right. This is true of any string-valued expression (such as -@code{digits_regexp}, shown previously), not just string constants. +@code{digits_regexp}, shown in the previous example), not just string constants. @end quotation @cindex regexp constants, slashes vs.@: quotes @@ -5965,17 +5961,13 @@ $ @kbd{awk '$0 ~ /[ \t\n]/'} occur often in practice, but it's worth noting for future reference. @end cartouche @end ifnotdocbook -@c ENDOFRANGE dregexp -@c ENDOFRANGE regexpd @node GNU Regexp Operators @section @command{gawk}-Specific Regexp Operators @c This section adapted (long ago) from the regex-0.12 manual -@c STARTOFRANGE regexpg @cindex regular expressions, operators, @command{gawk} -@c STARTOFRANGE gregexp @cindex @command{gawk}, regular expressions, operators @cindex operators, GNU-specific @cindex regular expressions, operators, for words @@ -6051,7 +6043,7 @@ matches either @samp{ball} or @samp{balls}, as a separate word. @item \B Matches the empty string that occurs between two word-constituent characters. For example, -@code{/\Brat\B/} matches @samp{crate} but it does not match @samp{dirty rat}. +@code{/\Brat\B/} matches @samp{crate}, but it does not match @samp{dirty rat}. @samp{\B} is essentially the opposite of @samp{\y}. @end table @@ -6070,14 +6062,14 @@ The operators are: @cindex backslash (@code{\}), @code{\`} operator (@command{gawk}) @cindex @code{\} (backslash), @code{\`} operator (@command{gawk}) Matches the empty string at the -beginning of a buffer (string). 
+beginning of a buffer (string) @c @cindex operators, @code{\'} (@command{gawk}) @cindex backslash (@code{\}), @code{\'} operator (@command{gawk}) @cindex @code{\} (backslash), @code{\'} operator (@command{gawk}) @item \' Matches the empty string at the -end of a buffer (string). +end of a buffer (string) @end table @cindex @code{^} (caret), regexp operator @@ -6140,15 +6132,11 @@ Allow interval expressions in regexps, if @option{--traditional} has been provided. Otherwise, interval expressions are available by default. @end table -@c ENDOFRANGE gregexp -@c ENDOFRANGE regexpg @node Case-sensitivity @section Case Sensitivity in Matching -@c STARTOFRANGE regexpcs @cindex regular expressions, case sensitivity -@c STARTOFRANGE csregexp @cindex case sensitivity, regexps and Case is normally significant in regular expressions, both when matching ordinary characters (i.e., not metacharacters) and inside bracket @@ -6240,8 +6228,6 @@ the right thing.} The value of @code{IGNORECASE} has no effect if @command{gawk} is in compatibility mode (@pxref{Options}). Case is always significant in compatibility mode. -@c ENDOFRANGE csregexp -@c ENDOFRANGE regexpcs @node Regexp Summary @section Summary @@ -6288,12 +6274,10 @@ versions, use @code{tolower()} or @code{toupper()}. @end itemize -@c ENDOFRANGE regexp @node Reading Files @chapter Reading Input Files -@c STARTOFRANGE infir @cindex reading input files @cindex input files, reading @cindex input files @@ -6318,7 +6302,7 @@ This makes it more convenient for programs to work on the parts of a record. @cindex @code{getline} command On rare occasions, you may need to use the @code{getline} command. -The @code{getline} command is valuable, both because it +The @code{getline} command is valuable both because it can do explicit input from any number of files, and because the files used with it do not have to be named on the @command{awk} command line (@pxref{Getline}). 
@@ -6344,9 +6328,7 @@ used with it do not have to be named on the @command{awk} command line @node Records @section How Input Is Split into Records -@c STARTOFRANGE inspl @cindex input, splitting into records -@c STARTOFRANGE recspl @cindex records, splitting input into @cindex @code{NR} variable @cindex @code{FNR} variable @@ -6371,8 +6353,8 @@ never automatically reset to zero. Records are separated by a character called the @dfn{record separator}. By default, the record separator is the newline character. This is why records are, by default, single lines. -A different character can be used for the record separator by -assigning the character to the predefined variable @code{RS}. +To use a different character for the record separator, +simply assign that character to the predefined variable @code{RS}. @cindex newlines, as record separators @cindex @code{RS} variable @@ -6395,8 +6377,8 @@ awk 'BEGIN @{ RS = "u" @} @noindent changes the value of @code{RS} to @samp{u}, before reading any input. -This is a string whose first character is the letter ``u''; as a result, records -are separated by the letter ``u.'' Then the input file is read, and the second +The new value is a string whose first character is the letter ``u''; as a result, records +are separated by the letter ``u''. Then the input file is read, and the second rule in the @command{awk} program (the action with no pattern) prints each record. Because each @code{print} statement adds a newline at the end of its output, this @command{awk} program copies the input @@ -6457,8 +6439,8 @@ Bill 555-1675 bill.drowning@@hotmail.com A @end example @noindent -It contains no @samp{u} so there is no reason to split the record, -unlike the others which have one or more occurrences of the @samp{u}. +It contains no @samp{u}, so there is no reason to split the record, +unlike the others, which each have one or more occurrences of the @samp{u}. 
In fact, this record is treated as part of the previous record; the newline separating them in the output is the original newline in the @value{DF}, not the one added by @@ -6553,7 +6535,7 @@ contains the same single character. However, when @code{RS} is a regular expression, @code{RT} contains the actual input text that matched the regular expression. -If the input file ended without any text that matches @code{RS}, +If the input file ends without any text matching @code{RS}, @command{gawk} sets @code{RT} to the null string. The following example illustrates both of these features. @@ -6703,8 +6685,6 @@ whole files. If you are using @command{gawk}, see @DBREF{Extension Sample Readfile} for another option. @end cartouche @end ifnotdocbook -@c ENDOFRANGE inspl -@c ENDOFRANGE recspl @node Fields @section Examining Fields @@ -6712,7 +6692,6 @@ Readfile} for another option. @cindex examining fields @cindex fields @cindex accessing fields -@c STARTOFRANGE fiex @cindex fields, examining @cindex POSIX @command{awk}, field separators and @cindex field separators, POSIX and @@ -6737,11 +6716,11 @@ simple @command{awk} programs so powerful. @cindex @code{$} (dollar sign), @code{$} field operator @cindex dollar sign (@code{$}), @code{$} field operator @cindex field operators@comma{} dollar sign as -You use a dollar-sign (@samp{$}) +You use a dollar sign (@samp{$}) to refer to a field in an @command{awk} program, followed by the number of the field you want. Thus, @code{$1} refers to the first field, @code{$2} to the second, and so on. -(Unlike the Unix shells, the field numbers are not limited to single digits. +(Unlike in the Unix shells, the field numbers are not limited to single digits. @code{$127} is the 127th field in the record.) For example, suppose the following is a line of input: @@ -6767,7 +6746,7 @@ If you try to reference a field beyond the last one (such as @code{$8} when the record has only seven fields), you get the empty string. 
(If used in a numeric operation, you get zero.) -The use of @code{$0}, which looks like a reference to the ``zero-th'' field, is +The use of @code{$0}, which looks like a reference to the ``zeroth'' field, is a special case: it represents the whole input record. Use it when you are not interested in specific fields. Here are some more examples: @@ -6793,7 +6772,6 @@ $ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list} @print{} Julie F @print{} Samuel A @end example -@c ENDOFRANGE fiex @node Nonconstant Fields @section Nonconstant Field Numbers @@ -6823,13 +6801,13 @@ awk '@{ print $(2*2) @}' mail-list @end example @command{awk} evaluates the expression @samp{(2*2)} and uses -its value as the number of the field to print. The @samp{*} sign +its value as the number of the field to print. The @samp{*} represents multiplication, so the expression @samp{2*2} evaluates to four. The parentheses are used so that the multiplication is done before the @samp{$} operation; they are necessary whenever there is a binary operator@footnote{A @dfn{binary operator}, such as @samp{*} for multiplication, is one that takes two operands. The distinction -is required, because @command{awk} also has unary (one-operand) +is required because @command{awk} also has unary (one-operand) and ternary (three-operand) operators.} in the field-number expression. This example, then, prints the type of relationship (the fourth field) for every line of the file @@ -6854,7 +6832,6 @@ evaluating @code{NF} and using its value as a field number. @node Changing Fields @section Changing the Contents of a Field -@c STARTOFRANGE ficon @cindex fields, changing contents of The contents of a field, as seen by @command{awk}, can be changed within an @command{awk} program; this changes what @command{awk} perceives as the @@ -7010,7 +6987,7 @@ rebuild @code{$0} when @code{NF} is decremented. 
Finally, there are times when it is convenient to force @command{awk} to rebuild the entire record, using the current -value of the fields and @code{OFS}. To do this, use the +values of the fields and @code{OFS}. To do this, use the seemingly innocuous assignment: @example @@ -7039,7 +7016,7 @@ such as @code{sub()} and @code{gsub()} It is important to remember that @code{$0} is the @emph{full} record, exactly as it was read from the input. This includes any leading or trailing whitespace, and the exact whitespace (or other -characters) that separate the fields. +characters) that separates the fields. It is a common error to try to change the field separators in a record simply by setting @code{FS} and @code{OFS}, and then @@ -7064,7 +7041,7 @@ with a statement such as @samp{$1 = $1}, as described earlier. It is important to remember that @code{$0} is the @emph{full} record, exactly as it was read from the input. This includes any leading or trailing whitespace, and the exact whitespace (or other -characters) that separate the fields. +characters) that separates the fields. It is a common error to try to change the field separators in a record simply by setting @code{FS} and @code{OFS}, and then @@ -7077,7 +7054,6 @@ with a statement such as @samp{$1 = $1}, as described earlier. @end cartouche @end ifnotdocbook -@c ENDOFRANGE ficon @node Field Separators @section Specifying How Fields Are Separated @@ -7093,9 +7069,7 @@ with a statement such as @samp{$1 = $1}, as described earlier. @cindex @code{FS} variable @cindex fields, separating -@c STARTOFRANGE fisepr @cindex field separators -@c STARTOFRANGE fisepg @cindex fields, separating The @dfn{field separator}, which is either a single character or a regular expression, controls the way @command{awk} splits an input record into fields. @@ -7161,7 +7135,7 @@ John Q. 
Smith, LXIX, 29 Oak St., Walamazoo, MI 42139 @end example @noindent -The same program would extract @samp{@bullet{}LXIX}, instead of +The same program would extract @samp{@bullet{}LXIX} instead of @samp{@bullet{}29@bullet{}Oak@bullet{}St.}. If you were expecting the program to print the address, you would be surprised. The moral is to choose your data layout and @@ -7195,9 +7169,7 @@ rules. @node Regexp Field Splitting @subsection Using Regular Expressions to Separate Fields -@c STARTOFRANGE regexpfs @cindex regular expressions, as field separators -@c STARTOFRANGE fsregexp @cindex field separators, regular expressions as The previous @value{SUBSECTION} discussed the use of single characters or simple strings as the @@ -7301,8 +7273,6 @@ $ @kbd{echo 'xxAA xxBxx C' |} @print{} -->xxBxx<-- @print{} -->C<-- @end example -@c ENDOFRANGE regexpfs -@c ENDOFRANGE fsregexp @node Single Character Fields @subsection Making Each Character a Separate Field @@ -7426,7 +7396,7 @@ choosing your field and record separators. @cindex Unix @command{awk}, password files@comma{} field separators and Perhaps the most common use of a single character as the field separator occurs when processing the Unix system password file. On many Unix -systems, each user has a separate entry in the system password file, one +systems, each user has a separate entry in the system password file, with one line per user. The information in these lines is separated by colons. The first field is the user's login name and the second is the user's encrypted or shadow password. (A shadow password is indicated by the @@ -7472,7 +7442,7 @@ When you do this, @code{$1} is the same as @code{$0}. According to the POSIX standard, @command{awk} is supposed to behave as if each record is split into fields at the time it is read. 
In particular, this means that if you change the value of @code{FS} -after a record is read, the value of the fields (i.e., how they were split) +after a record is read, the values of the fields (i.e., how they were split) should reflect the old value of @code{FS}, not the new one. @cindex dark corner, field separators @@ -7485,10 +7455,7 @@ using the @emph{current} value of @code{FS}! @value{DARKCORNER} This behavior can be difficult to diagnose. The following example illustrates the difference -between the two methods. -(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.'' -Its behavior is also defined by the POSIX standard.} -command prints just the first line of @file{/etc/passwd}.) +between the two methods: @example sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}' @@ -7509,6 +7476,10 @@ prints the full first line of the file, something like: root:x:0:0:Root:/: @end example +(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.'' +Its behavior is also defined by the POSIX standard.} +command prints just the first line of @file{/etc/passwd}.) + @docbook </sidebar> @end docbook @@ -7525,7 +7496,7 @@ root:x:0:0:Root:/: According to the POSIX standard, @command{awk} is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of @code{FS} -after a record is read, the value of the fields (i.e., how they were split) +after a record is read, the values of the fields (i.e., how they were split) should reflect the old value of @code{FS}, not the new one. @cindex dark corner, field separators @@ -7538,10 +7509,7 @@ using the @emph{current} value of @code{FS}! @value{DARKCORNER} This behavior can be difficult to diagnose. The following example illustrates the difference -between the two methods. 
-(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.'' -Its behavior is also defined by the POSIX standard.} -command prints just the first line of @file{/etc/passwd}.) +between the two methods: @example sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}' @@ -7561,6 +7529,10 @@ prints the full first line of the file, something like: @example root:x:0:0:Root:/: @end example + +(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.'' +Its behavior is also defined by the POSIX standard.} +command prints just the first line of @file{/etc/passwd}.) @end cartouche @end ifnotdocbook @@ -7658,8 +7630,6 @@ will take effect. @end cartouche @end ifnotdocbook -@c ENDOFRANGE fisepr -@c ENDOFRANGE fisepg @node Constant Size @section Reading Fixed-Width Data @@ -7774,7 +7744,7 @@ In order to tell which kind of field splitting is in effect, use @code{PROCINFO["FS"]} (@pxref{Auto-set}). The value is @code{"FS"} if regular field splitting is being used, -or it is @code{"FIELDWIDTHS"} if fixed-width field splitting is being used: +or @code{"FIELDWIDTHS"} if fixed-width field splitting is being used: @example if (PROCINFO["FS"] == "FS") @@ -7810,14 +7780,14 @@ what they are, and not by what they are not. The most notorious such case is so-called @dfn{comma-separated values} (CSV) data. Many spreadsheet programs, for example, can export their data into text files, where each record is -terminated with a newline, and fields are separated by commas. If only -commas separated the data, there wouldn't be an issue. The problem comes when +terminated with a newline, and fields are separated by commas. If +commas only separated the data, there wouldn't be an issue. The problem comes when one of the fields contains an @emph{embedded} comma. In such cases, most programs embed the field in double quotes.@footnote{The CSV format lacked a formal standard definition for many years. 
@uref{http://www.ietf.org/rfc/rfc4180.txt, RFC 4180} standardizes the most common practices.} -So we might have data like this: +So, we might have data like this: @example @c file eg/misc/addresses.csv @@ -7903,8 +7873,8 @@ of cases, and the @command{gawk} developers are satisfied with that. @end quotation As written, the regexp used for @code{FPAT} requires that each field -have a least one character. A straightforward modification -(changing changed the first @samp{+} to @samp{*}) allows fields to be empty: +contain at least one character. A straightforward modification +(changing the first @samp{+} to @samp{*}) allows fields to be empty: @example FPAT = "([^,]*)|(\"[^\"]+\")" @@ -7914,20 +7884,17 @@ Finally, the @code{patsplit()} function makes the same functionality available for splitting regular strings (@pxref{String Functions}). To recap, @command{gawk} provides three independent methods -to split input records into fields. @command{gawk} uses whichever -mechanism was last chosen based on which of the three -variables---@code{FS}, @code{FIELDWIDTHS}, and @code{FPAT}---was +to split input records into fields. +The mechanism used is based on which of the three +variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was last assigned to. @node Multiple Line @section Multiple-Line Records @cindex multiple-line records -@c STARTOFRANGE recm @cindex records, multiline -@c STARTOFRANGE imr @cindex input, multiline records -@c STARTOFRANGE frm @cindex files, reading, multiline records @cindex input, files, See input files In some databases, a single line cannot conveniently hold all the @@ -7962,7 +7929,7 @@ at the end of the record and one or more blank lines after the record. In addition, a regular expression always matches the longest possible sequence when there is a choice (@pxref{Leftmost Longest}). 
-So the next record doesn't start until +So, the next record doesn't start until the first nonblank line that follows---no matter how many blank lines appear in a row, they are considered one record separator. @@ -7977,10 +7944,10 @@ In the second case, this special processing is not done. @cindex field separator, in multiline records @cindex @code{FS}, in multiline records Now that the input is separated into records, the second step is to -separate the fields in the record. One way to do this is to divide each +separate the fields in the records. One way to do this is to divide each of the lines into fields in the normal manner. This happens by default as the result of a special feature. When @code{RS} is set to the empty -string, @emph{and} @code{FS} is set to a single character, +string @emph{and} @code{FS} is set to a single character, the newline character @emph{always} acts as a field separator. This is in addition to whatever field separations result from @code{FS}.@footnote{When @code{FS} is the null string (@code{""}) @@ -7995,7 +7962,7 @@ want the newline character to separate fields, because there is no way to prevent it. However, you can work around this by using the @code{split()} function to break up the record manually (@pxref{String Functions}). -If you have a single character field separator, you can work around +If you have a single-character field separator, you can work around the special feature in a different way, by making @code{FS} into a regexp for that single character. For example, if the field separator is a percent character, instead of @@ -8003,10 +7970,10 @@ separator is a percent character, instead of Another way to separate fields is to put each field on a separate line: to do this, just set the -variable @code{FS} to the string @code{"\n"}. (This single -character separator matches a single newline.) +variable @code{FS} to the string @code{"\n"}. +(This single-character separator matches a single newline.) 
A practical example of a @value{DF} organized this way might be a mailing -list, where each entry is separated by blank lines. Consider a mailing +list, where blank lines separate the entries. Consider a mailing list in a file named @file{addresses}, which looks like this: @example @@ -8094,20 +8061,15 @@ If not in compatibility mode (@pxref{Options}), @command{gawk} sets @code{RT} to the input text that matched the value specified by @code{RS}. But if the input file ended without any text that matches @code{RS}, then @command{gawk} sets @code{RT} to the null string. -@c ENDOFRANGE recm -@c ENDOFRANGE imr -@c ENDOFRANGE frm @node Getline @section Explicit Input with @code{getline} -@c STARTOFRANGE getl @cindex @code{getline} command, explicit input with -@c STARTOFRANGE inex @cindex input, explicit So far we have been getting our input data from @command{awk}'s main input stream---either the standard input (usually your keyboard, sometimes -the output from another program) or from the +the output from another program) or the files specified on the command line. The @command{awk} language has a special built-in command called @code{getline} that can be used to read input under your explicit control. @@ -8291,7 +8253,7 @@ free @end example The @code{getline} command used in this way sets only the variables -@code{NR}, @code{FNR}, and @code{RT} (and of course, @var{var}). +@code{NR}, @code{FNR}, and @code{RT} (and, of course, @var{var}). The record is not split into fields, so the values of the fields (including @code{$0}) and the value of @code{NF} do not change. @@ -8306,7 +8268,7 @@ the value of @code{NF} do not change. @cindex left angle bracket (@code{<}), @code{<} operator (I/O) @cindex operators, input/output Use @samp{getline < @var{file}} to read the next record from @var{file}. -Here @var{file} is a string-valued expression that +Here, @var{file} is a string-valued expression that specifies the @value{FN}. 
@samp{< @var{file}} is called a @dfn{redirection} because it directs input to come from a different place. For example, the following @@ -8484,7 +8446,7 @@ of a construct like @samp{@w{"echo "} "date" | getline}. Most versions, including the current version, treat it at as @samp{@w{("echo "} "date") | getline}. (This is also how BWK @command{awk} behaves.) -Some versions changed and treated it as +Some versions instead treat it as @samp{@w{"echo "} ("date" | getline)}. (This is how @command{mawk} behaves.) In short, @emph{always} use explicit parentheses, and then you won't @@ -8532,7 +8494,7 @@ program to be portable to other @command{awk} implementations. @cindex operators, input/output @cindex differences in @command{awk} and @command{gawk}, input/output operators -Input into @code{getline} from a pipe is a one-way operation. +Reading input into @code{getline} from a pipe is a one-way operation. The command that is started with @samp{@var{command} | getline} only sends data @emph{to} your @command{awk} program. @@ -8542,7 +8504,7 @@ for processing and then read the results back. communications are possible. This is done with the @samp{|&} operator. Typically, you write data to the coprocess first and then -read results back, as shown in the following: +read the results back, as shown in the following: @example print "@var{some query}" |& "db_server" @@ -8625,7 +8587,7 @@ also @pxref{Auto-set}.) @item Using @code{FILENAME} with @code{getline} (@samp{getline < FILENAME}) -is likely to be a source for +is likely to be a source of confusion. @command{awk} opens a separate input stream from the current input file. However, by not using a variable, @code{$0} and @code{NF} are still updated. If you're doing this, it's @@ -8633,9 +8595,15 @@ probably by accident, and you should reconsider what it is you're trying to accomplish. 
@item -@DBREF{Getline Summary} presents a table summarizing the +@ifdocbook +The next section +@end ifdocbook +@ifnotdocbook +@ref{Getline Summary}, +@end ifnotdocbook +presents a table summarizing the @code{getline} variants and which variables they can affect. -It is worth noting that those variants which do not use redirection +It is worth noting that those variants that do not use redirection can cause @code{FILENAME} to be updated if they cause @command{awk} to start reading a new input file. @@ -8644,7 +8612,7 @@ can cause @code{FILENAME} to be updated if they cause If the variable being assigned is an expression with side effects, different versions of @command{awk} behave differently upon encountering end-of-file. Some versions don't evaluate the expression; many versions -(including @command{gawk}) do. Here is an example, due to Duncan Moore: +(including @command{gawk}) do. Here is an example, courtesy of Duncan Moore: @ignore Date: Sun, 01 Apr 2012 11:49:33 +0100 @@ -8661,7 +8629,7 @@ BEGIN @{ @noindent Here, the side effect is the @samp{++c}. Is @code{c} incremented if -end of file is encountered, before the element in @code{a} is assigned? +end-of-file is encountered before the element in @code{a} is assigned? @command{gawk} treats @code{getline} like a function call, and evaluates the expression @samp{a[++c]} before attempting to read from @file{f}. @@ -8693,9 +8661,6 @@ Note: for each variant, @command{gawk} sets the @code{RT} predefined variable. @item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{gawk} @end multitable @end float -@c ENDOFRANGE getl -@c ENDOFRANGE inex -@c ENDOFRANGE infir @node Read Timeout @section Reading Input with a Timeout @@ -8706,8 +8671,8 @@ This @value{SECTION} describes a feature that is specific to @command{gawk}. You may specify a timeout in milliseconds for reading input from the keyboard, a pipe, or two-way communication, including TCP/IP sockets. 
This can be done -on a per input, command, or connection basis, by setting a special element -in the @code{PROCINFO} array (@pxref{Auto-set}): +on a per-input, per-command, or per-connection basis, by setting a special +element in the @code{PROCINFO} array (@pxref{Auto-set}): @example PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds} @@ -8738,7 +8703,7 @@ while ((getline < "/dev/stdin") > 0) @end example @command{gawk} terminates the read operation if input does not -arrive after waiting for the timeout period, returns failure +arrive after waiting for the timeout period, returns failure, and sets @code{ERRNO} to an appropriate string value. A negative or zero value for the timeout is the same as specifying no timeout at all. @@ -8748,7 +8713,7 @@ loop that reads input records and matches them against patterns, like so: @example -$ @kbd{ gawk 'BEGIN @{ PROCINFO["-", "READ_TIMEOUT"] = 5000 @}} +$ @kbd{gawk 'BEGIN @{ PROCINFO["-", "READ_TIMEOUT"] = 5000 @}} > @kbd{@{ print "You entered: " $0 @}'} @kbd{gawk} @print{} You entered: gawk @@ -8788,7 +8753,7 @@ If the @code{PROCINFO} element is not present and the @command{gawk} uses its value to initialize the timeout value. The exclusive use of the environment variable to specify timeout has the disadvantage of not being able to control it -on a per command or connection basis. +on a per-command or per-connection basis. @command{gawk} considers a timeout event to be an error even though the attempt to read from the underlying device may @@ -8854,7 +8819,7 @@ The possibilities are as follows: @item After splitting the input into records, @command{awk} further splits -the record into individual fields, named @code{$1}, @code{$2}, and so +the records into individual fields, named @code{$1}, @code{$2}, and so on. @code{$0} is the whole record, and @code{NF} indicates how many fields there are. The default way to split fields is between whitespace characters. @@ -8870,12 +8835,12 @@ thing. 
Decrementing @code{NF} throws away fields and rebuilds the record. @item Field splitting is more complicated than record splitting: -@multitable @columnfractions .40 .45 .15 +@multitable @columnfractions .40 .40 .20 @headitem Field separator value @tab Fields are split @dots{} @tab @command{awk} / @command{gawk} @item @code{FS == " "} @tab On runs of whitespace @tab @command{awk} @item @code{FS == @var{any single character}} @tab On that character @tab @command{awk} @item @code{FS == @var{regexp}} @tab On text matching the regexp @tab @command{awk} -@item @code{FS == ""} @tab Each individual character is a separate field @tab @command{gawk} +@item @code{FS == ""} @tab Such that each individual character is a separate field @tab @command{gawk} @item @code{FIELDWIDTHS == @var{list of columns}} @tab Based on character position @tab @command{gawk} @item @code{FPAT == @var{regexp}} @tab On the text surrounding text matching the regexp @tab @command{gawk} @end multitable @@ -8892,11 +8857,11 @@ This can also be done using command-line variable assignment. Use @code{PROCINFO["FS"]} to see how fields are being split. @item -Use @code{getline} in its various forms to read additional records, +Use @code{getline} in its various forms to read additional records from the default input stream, from a file, or from a pipe or coprocess. @item -Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to timeout +Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to time out for @var{file}. @item @@ -8930,7 +8895,6 @@ That can be fixed by making one simple change. What is it? @node Printing @chapter Printing Output -@c STARTOFRANGE prnt @cindex printing @cindex output, printing, See printing One of the most common programming actions is to @dfn{print}, or output, @@ -8946,7 +8910,6 @@ columns, whether to use exponential notation or not, and so on. For printing with specifications, you need the @code{printf} statement (@pxref{Printf}). 
-@c STARTOFRANGE prnts @cindex @code{print} statement @cindex @code{printf} statement Besides basic and formatted printing, this @value{CHAPTER} @@ -9008,7 +8971,7 @@ space is printed between any two items. Note that the @code{print} statement is a statement and not an expression---you can't use it in the pattern part of a -@var{pattern}-@var{action} statement, for example. +pattern--action statement, for example. @node Print Examples @section @code{print} Statement Examples @@ -9127,7 +9090,6 @@ You can continue either a @code{print} or @code{printf} statement simply by putting a newline after any comma (@pxref{Statements/Lines}). @end quotation -@c ENDOFRANGE prnts @node Output Separators @section Output Separators @@ -9200,7 +9162,7 @@ runs together on a single line. @cindex numeric, output format @cindex formats@comma{} numeric output When printing numeric values with the @code{print} statement, -@command{awk} internally converts the number to a string of characters +@command{awk} internally converts each number to a string of characters and prints that string. @command{awk} uses the @code{sprintf()} function to do this conversion (@pxref{String Functions}). @@ -9240,7 +9202,6 @@ if @code{OFMT} contains anything but a floating-point conversion specification. @node Printf @section Using @code{printf} Statements for Fancier Printing -@c STARTOFRANGE printfs @cindex @code{printf} statement @cindex output, formatted @cindex formatting output @@ -9272,7 +9233,7 @@ printf @var{format}, @var{item1}, @var{item2}, @dots{} @noindent As for @code{print}, the entire list of arguments may optionally be enclosed in parentheses. Here too, the parentheses are necessary if any -of the item expressions use the @samp{>} relational operator; otherwise, +of the item expressions uses the @samp{>} relational operator; otherwise, it can be confused with an output redirection (@pxref{Redirection}). 
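The effect of the parentheses can be seen in a small sketch, run in a scratch directory so that the stray output file is harmless. The shell variable names (@code{paren}, @code{bare}, @code{file_contents}) are hypothetical and exist only to capture the results.

```shell
# Work in a throwaway directory, because the second command creates a file.
cd "$(mktemp -d)"

# Parenthesized argument list: '>' is the relational operator,
# so the comparison 2 > 1 is formatted and printed on standard output.
paren=$(awk 'BEGIN { printf("%d\n", 2 > 1) }')

# No parentheses: '> 1' is taken as an output redirection,
# so the value 2 is written to a file named "1" and nothing reaches stdout.
bare=$(awk 'BEGIN { printf "%d\n", 2 > 1 }')
file_contents=$(cat 1)

printf 'paren=<%s> bare=<%s> file=<%s>\n' "$paren" "$bare" "$file_contents"
```

The second form fails silently as far as the terminal is concerned, which is why parenthesizing the argument list is the safer habit whenever an item contains @samp{>}.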
@cindex format specifiers @@ -9303,7 +9264,7 @@ $ @kbd{awk 'BEGIN @{} @end example @noindent -Here, neither the @samp{+} nor the @samp{OUCH!} appear in +Here, neither the @samp{+} nor the @samp{OUCH!} appears in the output message. @node Control Letters @@ -9350,8 +9311,8 @@ The two control letters are equivalent. (The @samp{%i} specification is for compatibility with ISO C.) @item @code{%e}, @code{%E} -Print a number in scientific (exponential) notation; -for example: +Print a number in scientific (exponential) notation. +For example: @example printf "%4.3e\n", 1950 @@ -9388,7 +9349,7 @@ The special ``not a number'' value formats as @samp{-nan} or @samp{nan} (@pxref{Math Definitions}). @item @code{%F} -Like @samp{%f} but the infinity and ``not a number'' values are spelled +Like @samp{%f}, but the infinity and ``not a number'' values are spelled using uppercase letters. The @samp{%F} format is a POSIX extension to ISO C; not all systems @@ -9438,7 +9399,6 @@ values or do something else entirely. @node Format Modifiers @subsection Modifiers for @code{printf} Formats -@c STARTOFRANGE pfm @cindex @code{printf} statement, modifiers @cindex modifiers@comma{} in format specifiers A format specification can also include @dfn{modifiers} that can control @@ -9477,7 +9437,7 @@ messages at runtime. which describes how and why to use positional specifiers. For now, we ignore them. -@item - (Minus) +@item - @r{(Minus)} The minus sign, used before the width modifier (see later on in this list), says to left-justify @@ -9633,7 +9593,7 @@ printf "%" w "." p "s\n", s @end example @noindent -This is not particularly easy to read but it does work. +This is not particularly easy to read, but it does work. @c @cindex lint checks @cindex troubleshooting, fatal errors, @code{printf} format strings @@ -9644,7 +9604,6 @@ format strings. These are not valid in @command{awk}. Most @command{awk} implementations silently ignore them. 
If @option{--lint} is provided on the command line (@pxref{Options}), @command{gawk} warns about their use. If @option{--posix} is supplied, their use is a fatal error. -@c ENDOFRANGE pfm @node Printf Examples @subsection Examples Using @code{printf} @@ -9680,7 +9639,7 @@ $ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list} @end example In this case, the phone numbers had to be printed as strings because -the numbers are separated by a dash. Printing the phone numbers as +the numbers are separated by dashes. Printing the phone numbers as numbers would have produced just the first three digits: @samp{555}. This would have been pretty confusing. @@ -9725,14 +9684,11 @@ awk 'BEGIN @{ format = "%-10s %s\n" @{ printf format, $1, $2 @}' mail-list @end example -@c ENDOFRANGE printfs @node Redirection @section Redirecting Output of @code{print} and @code{printf} -@c STARTOFRANGE outre @cindex output redirection -@c STARTOFRANGE reout @cindex redirection of output @cindex @option{--sandbox} option, output redirection with @code{print}, @code{printf} So far, the output from @code{print} and @code{printf} has gone @@ -9743,7 +9699,7 @@ This is called @dfn{redirection}. @quotation NOTE When @option{--sandbox} is specified (@pxref{Options}), -redirecting output to files, pipes and coprocesses is disabled. +redirecting output to files, pipes, and coprocesses is disabled. @end quotation A redirection appears after the @code{print} or @code{printf} statement. @@ -9796,7 +9752,7 @@ Each output file contains one name or number per line. @cindex @code{>} (right angle bracket), @code{>>} operator (I/O) @cindex right angle bracket (@code{>}), @code{>>} operator (I/O) @item print @var{items} >> @var{output-file} -This redirection prints the items into the pre-existing output file +This redirection prints the items into the preexisting output file named @var{output-file}. 
The difference between this and the single-@samp{>} redirection is that the old contents (if any) of @var{output-file} are not erased. Instead, the @command{awk} output is @@ -9835,7 +9791,7 @@ The unsorted list is written with an ordinary redirection, while the sorted list is written by piping through the @command{sort} utility. The next example uses redirection to mail a message to the mailing -list @samp{bug-system}. This might be useful when trouble is encountered +list @code{bug-system}. This might be useful when trouble is encountered in an @command{awk} script run periodically for system maintenance: @example @@ -9866,15 +9822,23 @@ This redirection prints the items to the input of @var{command}. The difference between this and the single-@samp{|} redirection is that the output from @var{command} can be read with @code{getline}. -Thus @var{command} is a @dfn{coprocess}, which works together with, -but subsidiary to, the @command{awk} program. +Thus, @var{command} is a @dfn{coprocess}, which works together with +but is subsidiary to the @command{awk} program. This feature is a @command{gawk} extension, and is not available in POSIX @command{awk}. -@DBXREF{Getline/Coprocess} +@ifnotdocbook +@xref{Getline/Coprocess}, for a brief discussion. -@DBXREF{Two-way I/O} +@xref{Two-way I/O}, +for a more complete discussion. +@end ifnotdocbook +@ifdocbook +@DBXREF{Getline/Coprocess} +for a brief discussion and +@DBREF{Two-way I/O} for a more complete discussion. +@end ifdocbook @end table Redirecting output using @samp{>}, @samp{>>}, @samp{|}, or @samp{|&} @@ -9899,7 +9863,7 @@ This is indeed how redirections must be used from the shell. But in @command{awk}, it isn't necessary. In this kind of case, a program should use @samp{>} for all the @code{print} statements, because the output file is only opened once. (It happens that if you mix @samp{>} and @samp{>>} -that output is produced in the expected order. 
However, mixing the operators +output is produced in the expected order. However, mixing the operators for the same file is definitely poor style, and is confusing to readers of your program.) @@ -9990,11 +9954,9 @@ It then sends the list to the shell for execution. command lines to be fed to the shell. @end cartouche @end ifnotdocbook -@c ENDOFRANGE outre -@c ENDOFRANGE reout @node Special FD -@section Special Files for Standard Pre-Opened Data Streams +@section Special Files for Standard Preopened Data Streams @cindex standard input @cindex input, standard @cindex standard output @@ -10007,7 +9969,7 @@ command lines to be fed to the shell. Running programs conventionally have three input and output streams already available to them for reading and writing. These are known as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard -error output}. These open streams (and any other open file or pipe) +error output}. These open streams (and any other open files or pipes) are often referred to by the technical term @dfn{file descriptors}. These streams are, by default, connected to your keyboard and screen, but @@ -10045,7 +10007,7 @@ that is connected to your keyboard and screen. It represents the ``terminal,''@footnote{The ``tty'' in @file{/dev/tty} stands for ``Teletype,'' a serial terminal.} which on modern systems is a keyboard and screen, not a serial console.) -This generally has the same effect but not always: although the +This generally has the same effect, but not always: although the standard error stream is usually the screen, it can be redirected; when that happens, writing to the screen is not correct. In fact, if @command{awk} is run from a background job, it may not have a @@ -10090,7 +10052,7 @@ print "Serious error detected!" > "/dev/stderr" @cindex troubleshooting, quotes with file names Note the use of quotes around the @value{FN}. -Like any other redirection, the value must be a string. 
+Like with any other redirection, the value must be a string. It is a common error to omit the quotes, which leads to confusing results. @@ -10101,7 +10063,6 @@ invoked with the @option{--traditional} option (@pxref{Options}). @node Special Files @section Special @value{FFN}s in @command{gawk} -@c STARTOFRANGE gfn @cindex @command{gawk}, file names in Besides access to standard input, standard output, and standard error, @@ -10117,7 +10078,7 @@ TCP/IP networking. @end menu @node Other Inherited Files -@subsection Accessing Other Open Files With @command{gawk} +@subsection Accessing Other Open Files with @command{gawk} Besides the @code{/dev/stdin}, @code{/dev/stdout}, and @code{/dev/stderr} special @value{FN}s mentioned earlier, @command{gawk} provides syntax @@ -10174,7 +10135,7 @@ special @value{FN}s that @command{gawk} provides: @cindex compatibility mode (@command{gawk}), file names @cindex file names, in compatibility mode @item -Recognition of the @value{FN}s for the three standard pre-opened +Recognition of the @value{FN}s for the three standard preopened files is disabled only in POSIX mode. @item @@ -10187,23 +10148,18 @@ compatibility mode (either @option{--traditional} or @option{--posix}; interprets these special @value{FN}s. For example, using @samp{/dev/fd/4} for output actually writes on file descriptor 4, and not on a new -file descriptor that is @code{dup()}'ed from file descriptor 4. Most of +file descriptor that is @code{dup()}ed from file descriptor 4. Most of the time this does not matter; however, it is important to @emph{not} close any of the files related to file descriptors 0, 1, and 2. Doing so results in unpredictable behavior. 
@end itemize -@c ENDOFRANGE gfn @node Close Files And Pipes @section Closing Input and Output Redirections @cindex files, output, See output files -@c STARTOFRANGE ifc @cindex input files, closing -@c STARTOFRANGE ofc @cindex output, files@comma{} closing -@c STARTOFRANGE pc @cindex pipe, closing -@c STARTOFRANGE cc @cindex coprocesses, closing @cindex @code{getline} command, coprocesses@comma{} using from @@ -10414,9 +10370,9 @@ This value is zero if the close succeeds, or @minus{}1 if it fails. The POSIX standard is very vague; it says that @code{close()} -returns zero on success and nonzero otherwise. In general, +returns zero on success and a nonzero value otherwise. In general, different implementations vary in what they report when closing -pipes; thus the return value cannot be used portably. +pipes; thus, the return value cannot be used portably. @value{DARKCORNER} In POSIX mode (@pxref{Options}), @command{gawk} just returns zero when closing a pipe. @@ -10471,19 +10427,15 @@ This value is zero if the close succeeds, or @minus{}1 if it fails. The POSIX standard is very vague; it says that @code{close()} -returns zero on success and nonzero otherwise. In general, +returns zero on success and a nonzero value otherwise. In general, different implementations vary in what they report when closing -pipes; thus the return value cannot be used portably. +pipes; thus, the return value cannot be used portably. @value{DARKCORNER} In POSIX mode (@pxref{Options}), @command{gawk} just returns zero when closing a pipe. @end cartouche @end ifnotdocbook -@c ENDOFRANGE ifc -@c ENDOFRANGE ofc -@c ENDOFRANGE pc -@c ENDOFRANGE cc @node Nonfatal @section Enabling Nonfatal Output @@ -10554,8 +10506,8 @@ for numeric values for the @code{print} statement. @item The @code{printf} statement provides finer-grained control over output, -with format control letters for different data types and various flags -that modify the behavior of the format control letters. 
+with format-control letters for different data types and various flags +that modify the behavior of the format-control letters. @item Output from both @code{print} and @code{printf} may be redirected to @@ -10610,11 +10562,9 @@ BEGIN @{ print "Serious error detected!" > /dev/stderr @} @end enumerate @c EXCLUDE END -@c ENDOFRANGE prnt @node Expressions @chapter Expressions -@c STARTOFRANGE exps @cindex expressions Expressions are the basic building blocks of @command{awk} patterns @@ -10625,7 +10575,7 @@ can assign a new value to a variable or a field by using an assignment operator. An expression can serve as a pattern or action statement on its own. Most other kinds of statements contain one or more expressions that specify the data on which to -operate. As in other languages, expressions in @command{awk} include +operate. As in other languages, expressions in @command{awk} can include variables, array references, constants, and function calls, as well as combinations of these with various operators. @@ -10644,7 +10594,7 @@ combinations of these with various operators. Expressions are built up from values and the operations performed upon them. This @value{SECTION} describes the elementary objects -which provide the values used in expressions. +that provide the values used in expressions. @menu * Constants:: String, numeric and regexp constants. @@ -10657,7 +10607,6 @@ which provide the values used in expressions. @node Constants @subsection Constant Expressions -@c STARTOFRANGE cnst @cindex constants, types of The simplest type of expression is the @dfn{constant}, which always has @@ -10695,7 +10644,7 @@ have the same value: @end example @cindex string constants -A string constant consists of a sequence of characters enclosed in +A @dfn{string constant} consists of a sequence of characters enclosed in double quotation marks. For example: @example @@ -10707,7 +10656,7 @@ double quotation marks. 
For example: @cindex strings, length limitations represents the string whose contents are @samp{parrot}. Strings in @command{gawk} can be of any length, and they can contain any of the possible -eight-bit ASCII characters including ASCII @sc{nul} (character code zero). +eight-bit ASCII characters, including ASCII @sc{nul} (character code zero). Other @command{awk} implementations may have difficulty with some character codes. @@ -10722,15 +10671,15 @@ In @command{awk}, all numbers are in decimal (i.e., base 10). Many other programming languages allow you to specify numbers in other bases, often octal (base 8) and hexadecimal (base 16). In octal, the numbers go 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, and so on. -Just as @samp{11}, in decimal, is 1 times 10 plus 1, so -@samp{11}, in octal, is 1 times 8, plus 1. This equals 9 in decimal. +Just as @samp{11} in decimal is 1 times 10 plus 1, so +@samp{11} in octal is 1 times 8 plus 1. This equals 9 in decimal. In hexadecimal, there are 16 digits. Because the everyday decimal number system only has ten digits (@samp{0}--@samp{9}), the letters @samp{a} through @samp{f} are used to represent the rest. (Case in the letters is usually irrelevant; hexadecimal @samp{a} and @samp{A} have the same value.) -Thus, @samp{11}, in -hexadecimal, is 1 times 16 plus 1, which equals 17 in decimal. +Thus, @samp{11} in +hexadecimal is 1 times 16 plus 1, which equals 17 in decimal. Just by looking at plain @samp{11}, you can't tell what base it's in. So, in C, C++, and other languages derived from C, @@ -10741,13 +10690,13 @@ and hexadecimal numbers start with a leading @samp{0x} or @samp{0X}: @table @code @item 11 -Decimal value 11. +Decimal value 11 @item 011 -Octal 11, decimal value 9. +Octal 11, decimal value 9 @item 0x11 -Hexadecimal 11, decimal value 17. 
+Hexadecimal 11, decimal value 17 @end table This example shows the difference: @@ -10775,11 +10724,11 @@ you can use the @code{strtonum()} function (@pxref{String Functions}) to convert the data into a number. Most of the time, you will want to use octal or hexadecimal constants -when working with the built-in bit manipulation functions; +when working with the built-in bit-manipulation functions; see @DBREF{Bitwise Functions} for more information. -Unlike some early C implementations, @samp{8} and @samp{9} are not +Unlike in some early C implementations, @samp{8} and @samp{9} are not valid in octal constants. For example, @command{gawk} treats @samp{018} as decimal 18: @@ -10843,19 +10792,17 @@ $ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'} @node Regexp Constants @subsubsection Regular Expression Constants -@c STARTOFRANGE rec @cindex regexp constants @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator -A regexp constant is a regular expression description enclosed in +A @dfn{regexp constant} is a regular expression description enclosed in slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in @command{awk} programs are constant, but the @samp{~} and @samp{!~} matching operators can also match computed or dynamic regexps (which are typically just ordinary strings or variables that contain a regexp, -but could be a more complex expression). -@c ENDOFRANGE cnst +but could be more complex expressions). @node Using Constant Regexps @subsection Using Regular Expression Constants @@ -10935,7 +10882,7 @@ the third argument of @code{split()} to be a regexp constant, but some older implementations do not. 
@value{DARKCORNER} Because some built-in functions accept regexp constants as arguments, -it can be confusing when attempting to use regexp constants as arguments +confusion can arise when attempting to use regexp constants as arguments to user-defined functions (@pxref{User-defined}). For example: @example @@ -10961,19 +10908,18 @@ function mysub(pat, repl, str, global) In this example, the programmer wants to pass a regexp constant to the user-defined function @code{mysub()}, which in turn passes it on to either @code{sub()} or @code{gsub()}. However, what really happens is that -the @code{pat} parameter is either one or zero, depending upon whether +the @code{pat} parameter is assigned a value of either one or zero, depending upon whether or not @code{$0} matches @code{/hi/}. @command{gawk} issues a warning when it sees a regexp constant used as a parameter to a user-defined function, because passing a truth value in this way is probably not what was intended. -@c ENDOFRANGE rec @node Variables @subsection Variables @cindex variables, user-defined @cindex user-defined, variables -Variables are ways of storing values at one point in your program for +@dfn{Variables} are ways of storing values at one point in your program for use later in another part of your program. They can be manipulated entirely within the program text, and they can also be assigned values on the @command{awk} command line. @@ -11001,17 +10947,17 @@ are distinct variables. A variable name is a valid expression by itself; it represents the variable's current value. Variables are given new values with @dfn{assignment operators}, @dfn{increment operators}, and -@dfn{decrement operators}. -@xref{Assignment Ops}. +@dfn{decrement operators} +(@pxref{Assignment Ops}). In addition, the @code{sub()} and @code{gsub()} functions can change a variable's value, and the @code{match()}, @code{split()}, and @code{patsplit()} functions can change the contents of their -array parameters. 
@xref{String Functions}. +array parameters (@pxref{String Functions}). @cindex variables, built-in @cindex variables, initializing A few variables have special built-in meanings, such as @code{FS} (the -field separator), and @code{NF} (the number of fields in the current input +field separator) and @code{NF} (the number of fields in the current input record). @DBXREF{Built-in Variables} for a list of the predefined variables. These predefined variables can be used and assigned just like all other variables, but their values are also used or changed automatically by @@ -11268,7 +11214,7 @@ point, so the default behavior was restored to use a period as the decimal point character. You can use the @option{--use-lc-numeric} option (@pxref{Options}) to force @command{gawk} to use the locale's decimal point character. (@command{gawk} also uses the locale's decimal -point character when in POSIX mode, either via @option{--posix}, or the +point character when in POSIX mode, either via @option{--posix} or the @env{POSIXLY_CORRECT} environment variable, as shown previously.) @ref{table-locale-affects} describes the cases in which the locale's decimal @@ -11286,7 +11232,7 @@ features have not been described yet. @end multitable @end float -Finally, modern day formal standards and IEEE standard floating-point +Finally, modern-day formal standards and the IEEE standard floating-point representation can have an unusual but important effect on the way @command{gawk} converts some special string values to numbers. The details are presented in @ref{POSIX Floating Point Problems}. @@ -11294,7 +11240,7 @@ are presented in @ref{POSIX Floating Point Problems}. @node All Operators @section Operators: Doing Something with Values -This @value{SECTION} introduces the @dfn{operators} which make use +This @value{SECTION} introduces the @dfn{operators} that make use of the values provided by constants and variables. 
@menu @@ -11472,7 +11418,7 @@ print "something meaningful" > file name @noindent This produces a syntax error with some versions of Unix @command{awk}.@footnote{It happens that BWK -@command{awk}, @command{gawk} and @command{mawk} all ``get it right,'' +@command{awk}, @command{gawk}, and @command{mawk} all ``get it right,'' but you should not rely on this.} It is necessary to use the following: @@ -11561,11 +11507,8 @@ you're never quite sure what you'll get. @node Assignment Ops @subsection Assignment Expressions -@c STARTOFRANGE asop @cindex assignment operators -@c STARTOFRANGE opas @cindex operators, assignment -@c STARTOFRANGE exas @cindex expressions, assignment @cindex @code{=} (equals sign), @code{=} operator @cindex equals sign (@code{=}), @code{=} operator @@ -11725,7 +11668,7 @@ and @ifdocbook @DBREF{Numeric Functions} @end ifdocbook -for more information). +for more information.) This example illustrates an important fact about assignment operators: the lefthand expression is only evaluated @emph{once}. @@ -11761,17 +11704,17 @@ to a number. @caption{Arithmetic assignment operators} @multitable @columnfractions .30 .70 @headitem Operator @tab Effect -@item @var{lvalue} @code{+=} @var{increment} @tab Add @var{increment} to the value of @var{lvalue} -@item @var{lvalue} @code{-=} @var{decrement} @tab Subtract @var{decrement} from the value of @var{lvalue} -@item @var{lvalue} @code{*=} @var{coefficient} @tab Multiply the value of @var{lvalue} by @var{coefficient} -@item @var{lvalue} @code{/=} @var{divisor} @tab Divide the value of @var{lvalue} by @var{divisor} -@item @var{lvalue} @code{%=} @var{modulus} @tab Set @var{lvalue} to its remainder by @var{modulus} +@item @var{lvalue} @code{+=} @var{increment} @tab Add @var{increment} to the value of @var{lvalue}. +@item @var{lvalue} @code{-=} @var{decrement} @tab Subtract @var{decrement} from the value of @var{lvalue}. 
+@item @var{lvalue} @code{*=} @var{coefficient} @tab Multiply the value of @var{lvalue} by @var{coefficient}. +@item @var{lvalue} @code{/=} @var{divisor} @tab Divide the value of @var{lvalue} by @var{divisor}. +@item @var{lvalue} @code{%=} @var{modulus} @tab Set @var{lvalue} to its remainder by @var{modulus}. @cindex common extensions, @code{**=} operator @cindex extensions, common@comma{} @code{**=} operator @cindex @command{awk} language, POSIX version @cindex POSIX @command{awk} -@item @var{lvalue} @code{^=} @var{power} @tab -@item @var{lvalue} @code{**=} @var{power} @tab Raise @var{lvalue} to the power @var{power} @value{COMMONEXT} +@item @var{lvalue} @code{^=} @var{power} @tab Raise @var{lvalue} to the power @var{power}. +@item @var{lvalue} @code{**=} @var{power} @tab Raise @var{lvalue} to the power @var{power}. @value{COMMONEXT} @end multitable @end float @@ -11871,16 +11814,11 @@ awk '/[=]=/' /dev/null and @command{mawk} also do not. @end cartouche @end ifnotdocbook -@c ENDOFRANGE exas -@c ENDOFRANGE opas -@c ENDOFRANGE asop @node Increment Ops @subsection Increment and Decrement Operators -@c STARTOFRANGE inop @cindex increment operators -@c STARTOFRANGE opde @cindex operators, decrement/increment @dfn{Increment} and @dfn{decrement operators} increase or decrease the value of a variable by one. An assignment operator can do the same thing, so @@ -11928,7 +11866,6 @@ just like variables. (Use @samp{$(i++)} when you want to do a field reference and a variable increment at the same time. The parentheses are necessary because of the precedence of the field reference operator @samp{$}.) -@c STARTOFRANGE deop @cindex decrement operators The decrement operator @samp{--} works just like @samp{++}, except that it subtracts one instead of adding it. As with @samp{++}, it can be used before @@ -11973,8 +11910,8 @@ like @samp{@var{lvalue}++}, but instead of adding, it subtracts.) @cindex evaluation order @cindex Marx, Groucho @quotation -@i{Doctor, doctor! 
It hurts when I do this!@* -So don't do that!} +@i{Doctor, it hurts when I do this!@* +Then don't do that!} @author Groucho Marx @end quotation @@ -11998,7 +11935,7 @@ print b @cindex side effects In other words, when do the various side effects prescribed by the postfix operators (@samp{b++}) take effect? -When side effects happen is @dfn{implementation defined}. +When side effects happen is @dfn{implementation-defined}. In other words, it is up to the particular version of @command{awk}. The result for the first example may be 12 or 13, and for the second, it may be 22 or 23. @@ -12025,8 +11962,8 @@ You should avoid such things in your own programs. @cindex evaluation order @cindex Marx, Groucho @quotation -@i{Doctor, doctor! It hurts when I do this!@* -So don't do that!} +@i{Doctor, it hurts when I do this!@* +Then don't do that!} @author Groucho Marx @end quotation @@ -12050,7 +11987,7 @@ print b @cindex side effects In other words, when do the various side effects prescribed by the postfix operators (@samp{b++}) take effect? -When side effects happen is @dfn{implementation defined}. +When side effects happen is @dfn{implementation-defined}. In other words, it is up to the particular version of @command{awk}. The result for the first example may be 12 or 13, and for the second, it may be 22 or 23. @@ -12062,15 +11999,12 @@ You should avoid such things in your own programs. @c in the mirror in the morning. @end cartouche @end ifnotdocbook -@c ENDOFRANGE inop -@c ENDOFRANGE opde -@c ENDOFRANGE deop @node Truth Values and Conditions @section Truth Values and Conditions -In certain contexts, expression values also serve as ``truth values''; (i.e., -they determine what should happen next as the program runs). This +In certain contexts, expression values also serve as ``truth values''; i.e., +they determine what should happen next as the program runs. This @value{SECTION} describes how @command{awk} defines ``true'' and ``false'' and how values are compared. 
@@ -12129,19 +12063,15 @@ the string constant @code{"0"} is actually true, because it is non-null. @author Douglas Adams, @cite{The Hitchhiker's Guide to the Galaxy} @end quotation -@c STARTOFRANGE comex @cindex comparison expressions -@c STARTOFRANGE excom @cindex expressions, comparison @cindex expressions, matching, See comparison expressions @cindex matching, expressions, See comparison expressions @cindex relational operators, See comparison operators @cindex operators, relational, See operators@comma{} comparison -@c STARTOFRANGE varting @cindex variable typing -@c STARTOFRANGE vartypc @cindex variables, types of, comparison expressions and -Unlike other programming languages, @command{awk} variables do not have a +Unlike in other programming languages, in @command{awk} variables do not have a fixed type. Instead, they can be either a number or a string, depending upon the value that is assigned to them. We look now at how variables are typed, and how @command{awk} @@ -12170,20 +12100,20 @@ Variable typing follows these rules: @itemize @value{BULLET} @item -A numeric constant or the result of a numeric operation has the @var{numeric} +A numeric constant or the result of a numeric operation has the @dfn{numeric} attribute. @item -A string constant or the result of a string operation has the @var{string} +A string constant or the result of a string operation has the @dfn{string} attribute. @item Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements, @code{ENVIRON} elements, and the elements of an array created by @code{match()}, @code{split()}, and @code{patsplit()} that are numeric -strings have the @var{strnum} attribute. Otherwise, they have -the @var{string} attribute. Uninitialized variables also have the -@var{strnum} attribute. +strings have the @dfn{strnum} attribute. Otherwise, they have +the @dfn{string} attribute. Uninitialized variables also have the +@dfn{strnum} attribute. 
@item Attributes propagate across assignments but are not changed by @@ -12327,13 +12257,13 @@ constant, then a string comparison is performed. Otherwise, a numeric comparison is performed. This point bears additional emphasis: All user input is made of characters, -and so is first and foremost of @var{string} type; input strings -that look numeric are additionally given the @var{strnum} attribute. +and so is first and foremost of string type; input strings +that look numeric are additionally given the strnum attribute. Thus, the six-character input string @w{@samp{ +3.14}} receives the -@var{strnum} attribute. In contrast, the eight characters +strnum attribute. In contrast, the eight characters @w{@code{" +3.14"}} appearing in program text comprise a string constant. The following examples print @samp{1} when the comparison between -the two different constants is true, @samp{0} otherwise: +the two different constants is true, and @samp{0} otherwise: @c 22.9.2014: Tested with mawk and BWK awk, got same results. @example @@ -12463,7 +12393,7 @@ $ @kbd{echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}'} @noindent the result is @samp{false} because both @code{$1} and @code{$2} are user input. They are numeric strings---therefore both have -the @var{strnum} attribute, dictating a numeric comparison. +the strnum attribute, dictating a numeric comparison. The purpose of the comparison rules and the use of numeric strings is to attempt to produce the behavior that is ``least surprising,'' while still ``doing the right thing.'' @@ -12522,7 +12452,7 @@ characters sort, as defined by the locale (for more discussion, @pxref{Locales}). 
This order is usually very different from the results obtained when doing straight character-by-character comparison.@footnote{Technically, string comparison is supposed -to behave the same way as if the strings are compared with the C +to behave the same way as if the strings were compared with the C @code{strcoll()} function.} Because this behavior differs considerably from existing practice, @@ -12539,19 +12469,13 @@ $ @kbd{gawk --posix 'BEGIN @{ printf("ABC < abc = %s\n",} @print{} ABC < abc = FALSE @end example -@c ENDOFRANGE comex -@c ENDOFRANGE excom -@c ENDOFRANGE vartypc -@c ENDOFRANGE varting @node Boolean Ops @subsection Boolean Expressions @cindex and Boolean-logic operator @cindex or Boolean-logic operator @cindex not Boolean-logic operator -@c STARTOFRANGE exbo @cindex expressions, Boolean -@c STARTOFRANGE boex @cindex Boolean expressions @cindex operators, Boolean, See Boolean expressions @cindex Boolean operators, See Boolean expressions @@ -12635,7 +12559,7 @@ BEGIN @{ if (! ("HOME" in ENVIRON)) @cindex vertical bar (@code{|}), @code{||} operator The @samp{&&} and @samp{||} operators are called @dfn{short-circuit} operators because of the way they work. Evaluation of the full expression -is ``short-circuited'' if the result can be determined part way through +is ``short-circuited'' if the result can be determined partway through its evaluation. @cindex line continuations @@ -12697,8 +12621,6 @@ next record, and start processing the rules over again at the top. The reason it's there is to avoid printing the bracketing @samp{START} and @samp{END} lines. @end quotation -@c ENDOFRANGE exbo -@c ENDOFRANGE boex @node Conditional Exp @subsection Conditional Expressions @@ -12709,8 +12631,8 @@ The reason it's there is to avoid printing the bracketing A @dfn{conditional expression} is a special kind of expression that has three operands. It allows you to use one expression's value to select one of two other expressions. 
-The conditional expression is the same as in the C language, -as shown here: +The conditional expression in @command{awk} is the same as in the C +language, as shown here: @example @var{selector} ? @var{if-true-exp} : @var{if-false-exp} @@ -12719,8 +12641,8 @@ as shown here: @noindent There are three subexpressions. The first, @var{selector}, is always computed first. If it is ``true'' (not zero or not null), then -@var{if-true-exp} is computed next and its value becomes the value of -the whole expression. Otherwise, @var{if-false-exp} is computed next +@var{if-true-exp} is computed next, and its value becomes the value of +the whole expression. Otherwise, @var{if-false-exp} is computed next, and its value becomes the value of the whole expression. For example, the following expression produces the absolute value of @code{x}: @@ -12768,7 +12690,7 @@ ask for it by name at any point in the program. For example, the function @code{sqrt()} computes the square root of a number. @cindex functions, built-in -A fixed set of functions are @dfn{built-in}, which means they are +A fixed set of functions are @dfn{built in}, which means they are available in every @command{awk} program. The @code{sqrt()} function is one of these. @DBXREF{Built-in} for a list of built-in functions and their descriptions. In addition, you can define @@ -12877,9 +12799,7 @@ $ @kbd{awk -f matchit.awk} @node Precedence @section Operator Precedence (How Operators Nest) -@c STARTOFRANGE prec @cindex precedence -@c STARTOFRANGE oppr @cindex operators, precedence @dfn{Operator precedence} determines how operators are grouped when @@ -12944,7 +12864,7 @@ Increment, decrement. @cindex @code{*} (asterisk), @code{**} operator @cindex asterisk (@code{*}), @code{**} operator @item @code{^ **} -Exponentiation. These operators group right-to-left. +Exponentiation. These operators group right to left. 
@cindex @code{+} (plus sign), @code{+} operator @cindex plus sign (@code{+}), @code{+} operator @@ -13010,7 +12930,7 @@ statements belong to the statement level, not to expressions. The redirection does not produce an expression that could be the operand of another operator. As a result, it does not make sense to use a redirection operator near another operator of lower precedence without -parentheses. Such combinations (e.g., @samp{print foo > a ? b : c}), +parentheses. Such combinations (e.g., @samp{print foo > a ? b : c}) result in syntax errors. The correct way to write this statement is @samp{print foo > (a ? b : c)}. @@ -13028,17 +12948,17 @@ Array membership. @cindex @code{&} (ampersand), @code{&&} operator @cindex ampersand (@code{&}), @code{&&} operator @item @code{&&} -Logical ``and''. +Logical ``and.'' @cindex @code{|} (vertical bar), @code{||} operator @cindex vertical bar (@code{|}), @code{||} operator @item @code{||} -Logical ``or''. +Logical ``or.'' @cindex @code{?} (question mark), @code{?:} operator @cindex question mark (@code{?}), @code{?:} operator @item @code{?:} -Conditional. This operator groups right-to-left. +Conditional. This operator groups right to left. @cindex @code{+} (plus sign), @code{+=} operator @cindex plus sign (@code{+}), @code{+=} operator @@ -13055,7 +12975,7 @@ Conditional. This operator groups right-to-left. @cindex @code{^} (caret), @code{^=} operator @cindex caret (@code{^}), @code{^=} operator @item @code{= += -= *= /= %= ^= **=} -Assignment. These operators group right-to-left. +Assignment. These operators group right to left. @end table @cindex POSIX @command{awk}, @code{**} operator and @@ -13064,8 +12984,6 @@ Assignment. These operators group right-to-left. The @samp{|&}, @samp{**}, and @samp{**=} operators are not specified by POSIX. For maximum portability, do not use them. 
@end quotation -@c ENDOFRANGE prec -@c ENDOFRANGE oppr @node Locales @section Where You Are Makes a Difference @@ -13131,8 +13049,8 @@ Locales can influence the conversions. @item @command{awk} provides the usual arithmetic operators (addition, subtraction, multiplication, division, modulus), and unary plus and minus. -It also provides comparison operators, boolean operators, array membership -testing, and regexp +It also provides comparison operators, Boolean operators, an array membership +testing operator, and regexp matching operators. String concatenation is accomplished by placing two expressions next to each other; there is no explicit operator. The three-operand @samp{?:} operator provides an ``if-else'' test within @@ -13143,7 +13061,7 @@ Assignment operators provide convenient shorthands for common arithmetic operations. @item -In @command{awk}, a value is considered to be true if it is non-zero +In @command{awk}, a value is considered to be true if it is nonzero @emph{or} non-null. Otherwise, the value is false. @item @@ -13152,7 +13070,7 @@ lifetime. The type determines how it behaves in comparisons (string or numeric). @item -Function calls return a value which may be used as part of a larger +Function calls return a value that may be used as part of a larger expression. Expressions used to pass parameter values are fully evaluated before the function is called. @command{awk} provides built-in and user-defined functions; this is described in @@ -13169,11 +13087,9 @@ program, and occasionally the format for data read as input. @end itemize -@c ENDOFRANGE exps @node Patterns and Actions @chapter Patterns, Actions, and Variables -@c STARTOFRANGE pat @cindex patterns As you have already seen, each @command{awk} statement consists of @@ -13181,7 +13097,7 @@ a pattern with an associated action. This @value{CHAPTER} describes how you build patterns and actions, what kinds of things you can do within actions, and @command{awk}'s predefined variables. 
-The pattern-action rules and the statements available for use +The pattern--action rules and the statements available for use within actions form the core of @command{awk} programming. In a sense, everything covered up to here has been the foundation @@ -13372,7 +13288,7 @@ patterns. Likewise, the special patterns @code{BEGIN}, @code{END}, which never match any input record, are not expressions and cannot appear inside Boolean patterns. -The precedence of the different operators which can appear in +The precedence of the different operators that can appear in patterns is described in @ref{Precedence}. @node Ranges @@ -13398,7 +13314,7 @@ prints every record in @file{myfile} between @samp{on}/@samp{off} pairs, inclusi A range pattern starts out by matching @var{begpat} against every input record. When a record matches @var{begpat}, the range pattern is -@dfn{turned on} and the range pattern matches this record as well. As long as +@dfn{turned on}, and the range pattern matches this record as well. As long as the range pattern stays turned on, it automatically matches every input record read. The range pattern also matches @var{endpat} against every input record; when this succeeds, the range pattern is @dfn{turned off} again @@ -13469,9 +13385,7 @@ a range pattern. @value{DARKCORNER} @node BEGIN/END @subsection The @code{BEGIN} and @code{END} Special Patterns -@c STARTOFRANGE beg @cindex @code{BEGIN} pattern -@c STARTOFRANGE end @cindex @code{END} pattern All the patterns described so far are for matching input records. The @code{BEGIN} and @code{END} special patterns are different. @@ -13544,7 +13458,7 @@ using library functions. for a number of useful library functions. 
If an @command{awk} program has only @code{BEGIN} rules and no -other rules, then the program exits after the @code{BEGIN} rule is +other rules, then the program exits after the @code{BEGIN} rules are run.@footnote{The original version of @command{awk} kept reading and ignoring input until the end of the file was seen.} However, if an @code{END} rule exists, then the input is read, even if there are @@ -13572,7 +13486,7 @@ Another way is simply to assign a value to @code{$0}. @cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and @cindex @code{BEGIN} pattern, @code{print} statement and @cindex @code{END} pattern, @code{print} statement and -The second point is similar to the first but from the other direction. +The second point is similar to the first, but from the other direction. Traditionally, due largely to implementation issues, @code{$0} and @code{NF} were @emph{undefined} inside an @code{END} rule. The POSIX standard specifies that @code{NF} is available in an @code{END} @@ -13609,8 +13523,6 @@ are not valid in an @code{END} rule, because all the input has been read. @ifdocbook @DBREF{Nextfile Statement}.) @end ifdocbook -@c ENDOFRANGE beg -@c ENDOFRANGE end @node BEGINFILE/ENDFILE @subsection The @code{BEGINFILE} and @code{ENDFILE} Special Patterns @@ -13663,7 +13575,7 @@ fatal error. @item If you have written extensions that modify the record handling (by -inserting an ``input parser,'' @pxref{Input Parsers}), you can invoke +inserting an ``input parser''; @pxref{Input Parsers}), you can invoke them at this point, before @command{gawk} has started processing the file. (This is a @emph{very} advanced feature, currently used only by the @uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.) @@ -13674,8 +13586,8 @@ the last record in an input file. For the last input file, it will be called before any @code{END} rules. The @code{ENDFILE} rule is executed even for empty input files. 
-Normally, when an error occurs when reading input in the normal input -processing loop, the error is fatal. However, if an @code{ENDFILE} +Normally, when an error occurs when reading input in the normal +input-processing loop, the error is fatal. However, if an @code{ENDFILE} rule is present, the error becomes non-fatal, and instead @code{ERRNO} is set. This makes it possible to catch and process I/O errors at the level of the @command{awk} program. @@ -13684,7 +13596,7 @@ level of the @command{awk} program. The @code{next} statement (@pxref{Next Statement}) is not allowed inside either a @code{BEGINFILE} or an @code{ENDFILE} rule. The @code{nextfile} statement is allowed only inside a -@code{BEGINFILE} rule, but not inside an @code{ENDFILE} rule. +@code{BEGINFILE} rule, not inside an @code{ENDFILE} rule. @cindex @code{getline} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and The @code{getline} statement (@pxref{Getline}) is restricted inside @@ -13731,7 +13643,6 @@ awk '@{ print $1 @}' mail-list @noindent prints the first field of every record. -@c ENDOFRANGE pat @node Using Shell Variables @section Using Shell Variables in Programs @@ -13761,11 +13672,11 @@ awk "/$pattern/ "'@{ nmatches++ @} @noindent The @command{awk} program consists of two pieces of quoted text that are concatenated together to form the program. -The first part is double quoted, which allows substitution of +The first part is double-quoted, which allows substitution of the @code{pattern} shell variable inside the quotes. -The second part is single quoted. +The second part is single-quoted. -Variable substitution via quoting works, but can be potentially +Variable substitution via quoting works, but can potentially be messy. It requires a good understanding of the shell's quoting rules (@pxref{Quoting}), and it's often difficult to correctly @@ -13880,11 +13791,8 @@ For deleting array elements. 
@node Statements @section Control Statements in Actions -@c STARTOFRANGE csta @cindex control statements -@c STARTOFRANGE acs @cindex statements, control, in actions -@c STARTOFRANGE accs @cindex actions, control statements in @dfn{Control statements}, such as @code{if}, @code{while}, and so on, @@ -14027,13 +13935,13 @@ The body of this loop is a compound statement enclosed in braces, containing two statements. The loop works in the following manner: first, the value of @code{i} is set to one. Then, the @code{while} statement tests whether @code{i} is less than or equal to -three. This is true when @code{i} equals one, so the @code{i}-th +three. This is true when @code{i} equals one, so the @code{i}th field is printed. Then the @samp{i++} increments the value of @code{i} and the loop repeats. The loop terminates when @code{i} reaches four. A newline is not required between the condition and the body; however, using one makes the program clearer unless the body is a -compound statement or else is very simple. The newline after the open-brace +compound statement or else is very simple. The newline after the open brace that begins the compound statement is not required either, but the program is harder to read without it. @@ -14063,9 +13971,9 @@ while (@var{condition}) @end example @noindent -This statement does not execute @var{body} even once if the @var{condition} -is false to begin with. -The following is an example of a @code{do} statement: +This statement does not execute the @var{body} even once if the +@var{condition} is false to begin with. The following is an example of +a @code{do} statement: @example @{ @@ -14132,7 +14040,7 @@ their assignments as separate statements preceding the @code{for} loop.) The same is true of the @var{increment} part. Incrementing additional variables requires separate statements at the end of the loop. The C compound expression, using C's comma operator, is useful in -this context but it is not supported in @command{awk}. 
+this context, but it is not supported in @command{awk}. Most often, @var{increment} is an increment expression, as in the previous example. But this is not required; it can be any expression @@ -14223,7 +14131,7 @@ default: Control flow in the @code{switch} statement works as it does in C. Once a match to a given case is made, the case statement bodies execute until a @code{break}, -@code{continue}, @code{next}, @code{nextfile} or @code{exit} is encountered, +@code{continue}, @code{next}, @code{nextfile}, or @code{exit} is encountered, or the end of the @code{switch} statement itself. For example: @example @@ -14397,7 +14305,12 @@ body of a loop. Historical versions of @command{awk} treated a @code{continue} statement outside a loop the same way they treated a @code{break} statement outside a loop: as if it were a @code{next} statement +@ifset FOR_PRINT +(discussed in the following section). +@end ifset +@ifclear FOR_PRINT (@pxref{Next Statement}). +@end ifclear @value{DARKCORNER} Recent versions of BWK @command{awk} no longer work this way, nor does @command{gawk}. @@ -14525,7 +14438,7 @@ See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}. @cindex @code{nextfile} statement, user-defined functions and @cindex Brian Kernighan's @command{awk} @cindex @command{mawk} utility -The current version of BWK @command{awk}, and @command{mawk} +The current version of BWK @command{awk} and @command{mawk} also support @code{nextfile}. However, they don't allow the @code{nextfile} statement inside function bodies (@pxref{User-defined}). @command{gawk} does; a @code{nextfile} inside a function body reads the @@ -14563,7 +14476,7 @@ any @code{ENDFILE} rules; they do not execute. In such a case, if you don't want the @code{END} rule to do its job, set a variable -to nonzero before the @code{exit} statement and check that variable in +to a nonzero value before the @code{exit} statement and check that variable in the @code{END} rule. 
@DBXREF{Assert Function} for an example that does this. @@ -14602,15 +14515,10 @@ Negative values, and values of 127 or greater, may not produce consistent results across different operating systems. @end quotation -@c ENDOFRANGE csta -@c ENDOFRANGE acs -@c ENDOFRANGE accs @node Built-in Variables @section Predefined Variables -@c STARTOFRANGE bvar @cindex predefined variables -@c STARTOFRANGE varb @cindex variables, predefined Most @command{awk} variables are available to use for your own @@ -14636,10 +14544,8 @@ their areas of activity. @end menu @node User-modified -@subsection Built-In Variables That Control @command{awk} -@c STARTOFRANGE bvaru +@subsection Built-in Variables That Control @command{awk} @cindex predefined variables, user-modifiable -@c STARTOFRANGE nmbv @cindex user-modifiable variables The following is an alphabetical list of variables that you can change to @@ -14667,7 +14573,7 @@ respectively, should use binary I/O. A string value of @code{"rw"} or @code{"wr"} indicates that all files should use binary I/O. Any other string value is treated the same as @code{"rw"}, but causes @command{gawk} to generate a warning message. @code{BINMODE} is described in more -detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}), +detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}) also supports this variable, but only using numeric values. @cindex @code{CONVFMT} variable @@ -14675,7 +14581,7 @@ also supports this variable, but only using numeric values. @cindex numbers, converting, to strings @cindex strings, converting, numbers to @item @code{CONVFMT} -This string controls conversion of numbers to +A string that controls the conversion of numbers to strings (@pxref{Conversion}). It works by being passed, in effect, as the first argument to the @code{sprintf()} function @@ -14750,7 +14656,7 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment. 
@cindex regular expressions, case sensitivity @item IGNORECASE # If @code{IGNORECASE} is nonzero or non-null, then all string comparisons -and all regular expression matching are case independent. Thus, regexp +and all regular expression matching are case-independent. Thus, regexp matching with @samp{~} and @samp{!~}, as well as the @code{gensub()}, @code{gsub()}, @code{index()}, @code{match()}, @code{patsplit()}, @code{split()}, and @code{sub()} @@ -14776,7 +14682,7 @@ Any other true value prints nonfatal warnings. Assigning a false value to @code{LINT} turns off the lint warnings. This variable is a @command{gawk} extension. It is not special -in other @command{awk} implementations. Unlike the other special variables, +in other @command{awk} implementations. Unlike with the other special variables, changing @code{LINT} does affect the production of lint warnings, even if @command{gawk} is in compatibility mode. Much as the @option{--lint} and @option{--traditional} options independently @@ -14788,7 +14694,7 @@ of @command{awk} being executed. @cindex numbers, converting, to strings @cindex strings, converting, numbers to @item OFMT -Controls conversion of numbers to +A string that controls conversion of numbers to strings (@pxref{Conversion}) for printing with the @code{print} statement. It works by being passed as the first argument to the @code{sprintf()} function @@ -14803,7 +14709,7 @@ strings in general expressions; this is now done by @code{CONVFMT}. @cindex separators, field @cindex field separators @item OFS -This is the output field separator (@pxref{Output Separators}). It is +The output field separator (@pxref{Output Separators}). It is output between the fields printed by a @code{print} statement. Its default value is @w{@code{" "}}, a string consisting of a single space. 
@@ -14821,7 +14727,7 @@ The working precision of arbitrary-precision floating-point numbers, @cindex @code{ROUNDMODE} variable @item ROUNDMODE # The rounding mode to use for arbitrary-precision arithmetic on -numbers, by default @code{"N"} (@samp{roundTiesToEven} in +numbers, by default @code{"N"} (@code{roundTiesToEven} in the IEEE 754 standard; @pxref{Setting the rounding mode}). @cindex @code{RS} variable @@ -14850,7 +14756,7 @@ just the first character of @code{RS}'s value is used. @item @code{SUBSEP} The subscript separator. It has the default value of @code{"\034"} and is used to separate the parts of the indices of a -multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}} +multidimensional array. Thus, the expression @samp{@w{foo["A", "B"]}} really accesses @code{foo["A\034B"]} (@pxref{Multidimensional}). @@ -14866,17 +14772,11 @@ marked string constants in the source text, as well as for the (@pxref{Internationalization}). The default value of @code{TEXTDOMAIN} is @code{"messages"}. @end table -@c ENDOFRANGE bvar -@c ENDOFRANGE varb -@c ENDOFRANGE bvaru -@c ENDOFRANGE nmbv @node Auto-set -@subsection Built-In Variables That Convey Information +@subsection Built-in Variables That Convey Information -@c STARTOFRANGE bvconi @cindex predefined variables, conveying information -@c STARTOFRANGE vbconi @cindex variables, predefined conveying information The following is an alphabetical list of variables that @command{awk} sets automatically on certain occasions in order to provide @@ -15032,12 +14932,12 @@ input file. @item @code{NF} The number of fields in the current input record. @code{NF} is set each time a new record is read, when a new field is -created or when @code{$0} changes (@pxref{Fields}). +created, or when @code{$0} changes (@pxref{Fields}). Unlike most of the variables described in this @value{SUBSECTION}, assigning a value to @code{NF} has the potential to affect @command{awk}'s internal workings. 
In particular, assignments -to @code{NF} can be used to create or remove fields from the +to @code{NF} can be used to create fields in or remove fields from the current record. @xref{Changing Fields}. @cindex @code{FUNCTAB} array @@ -15087,7 +14987,7 @@ or @code{"FPAT"} if field matching with @code{FPAT} is in effect. @item PROCINFO["identifiers"] @cindex program identifiers A subarray, indexed by the names of all identifiers used in the text of -the AWK program. An @dfn{identifier} is simply the name of a variable +the @command{awk} program. An @dfn{identifier} is simply the name of a variable (be it scalar or array), built-in function, user-defined function, or extension function. For each identifier, the value of the element is one of the following: @@ -15107,7 +15007,7 @@ The identifier is an extension function loaded via The identifier is a scalar. @item "untyped" -The identifier is untyped (could be used as a scalar or array, +The identifier is untyped (could be used as a scalar or an array; @command{gawk} doesn't know yet). @item "user" @@ -15228,7 +15128,7 @@ is the length of the matched string, or @minus{}1 if no match is found. @cindex @code{RSTART} variable @item @code{RSTART} -The start-index in characters of the substring that is matched by the +The start index in characters of the substring that is matched by the @code{match()} function (@pxref{String Functions}). @code{RSTART} is set by invoking the @code{match()} function. Its value @@ -15295,11 +15195,9 @@ function multiply(variable, amount) @quotation NOTE In order to avoid severe time-travel paradoxes,@footnote{Not to mention difficult implementation issues.} neither @code{FUNCTAB} nor @code{SYMTAB} -are available as elements within the @code{SYMTAB} array. +is available as an element within the @code{SYMTAB} array. 
@end quotation @end table -@c ENDOFRANGE bvconi -@c ENDOFRANGE vbconi @cindex sidebar, Changing @code{NR} and @code{FNR} @ifdocbook @@ -15517,7 +15415,7 @@ When designing your program, you should choose options that don't conflict with @command{gawk}'s, because it will process any options that it accepts before passing the rest of the command line on to your program. Using @samp{#!} with the @option{-E} option may help -(@DBXREF{Executable Scripts} +(@DBPXREF{Executable Scripts} and @ifnotdocbook @DBPXREF{Options}). @@ -15531,15 +15429,15 @@ and @itemize @value{BULLET} @item -Pattern-action pairs make up the basic elements of an @command{awk} +Pattern--action pairs make up the basic elements of an @command{awk} program. Patterns are either normal expressions, range expressions, -regexp constants, one of the special keywords @code{BEGIN}, @code{END}, -@code{BEGINFILE}, @code{ENDFILE}, or empty. The action executes if +or regexp constants; one of the special keywords @code{BEGIN}, @code{END}, +@code{BEGINFILE}, or @code{ENDFILE}; or empty. The action executes if the current record matches the pattern. Empty (missing) patterns match all records. @item -I/O from @code{BEGIN} and @code{END} rules have certain constraints. +I/O from @code{BEGIN} and @code{END} rules has certain constraints. This is also true, only more so, for @code{BEGINFILE} and @code{ENDFILE} rules. The latter two give you ``hooks'' into @command{gawk}'s file processing, allowing you to recover from a file that otherwise would @@ -15569,12 +15467,12 @@ iteration of a loop (or get out of a @code{switch}). @item @code{next} and @code{nextfile} let you read the next record and start -over at the top of your program, or skip to the next input file and +over at the top of your program or skip to the next input file and start over, respectively. @item The @code{exit} statement terminates your program. 
When executed -from an action (or function body) it transfers control to the +from an action (or function body), it transfers control to the @code{END} statements. From an @code{END} statement body, it exits immediately. You may pass an optional numeric value to be used as @command{awk}'s exit status. @@ -15592,7 +15490,6 @@ control how @command{awk} will process the provided @value{DF}s. @node Arrays @chapter Arrays in @command{awk} -@c STARTOFRANGE arrs @cindex arrays An @dfn{array} is a table of values called @dfn{elements}. The @@ -15678,15 +15575,17 @@ the declaration. indices---e.g., @samp{15 .. 27}---but the size of the array is still fixed when the array is declared.) -A contiguous array of four elements might look like the following example, -conceptually, if the element values are 8, @code{"foo"}, -@code{""}, and 30 +@c 1/2015: Do not put the numeric values into @code. Array element +@c values are no different than scalar variable values. +A contiguous array of four elements might look like @ifnotdocbook -as shown in @ref{figure-array-elements}: +@ref{figure-array-elements}, @end ifnotdocbook @ifdocbook -as shown in @inlineraw{docbook, <xref linkend="figure-array-elements"/>}: +@inlineraw{docbook, <xref linkend="figure-array-elements"/>}, @end ifdocbook +conceptually, if the element values are eight, @code{"foo"}, +@code{""}, and 30. @ifnotdocbook @float Figure,figure-array-elements @@ -15711,12 +15610,10 @@ as shown in @inlineraw{docbook, <xref linkend="figure-array-elements"/>}: @noindent Only the values are stored; the indices are implicit from the order of -the values. Here, 8 is the value at index zero, because 8 appears in the +the values. Here, eight is the value at index zero, because eight appears in the position with zero elements before it. 
-@c STARTOFRANGE arrin @cindex arrays, indexing -@c STARTOFRANGE inarr @cindex indexing arrays @cindex associative arrays @cindex arrays, associative @@ -15725,19 +15622,21 @@ that each array is a collection of pairs---an index and its corresponding array element value: @ifnotdocbook -@example -@r{Index} 3 @r{Value} 30 -@r{Index} 1 @r{Value} "foo" -@r{Index} 0 @r{Value} 8 -@r{Index} 2 @r{Value} "" -@end example +@c extra empty column to indent it right +@multitable @columnfractions .1 .1 .1 +@headitem @tab Index @tab Value +@item @tab @code{3} @tab @code{30} +@item @tab @code{1} @tab @code{"foo"} +@item @tab @code{0} @tab @code{8} +@item @tab @code{2} @tab @code{""} +@end multitable @end ifnotdocbook @docbook <informaltable> <tgroup cols="2"> -<colspec colname="1" align="center"/> -<colspec colname="2" align="center"/> +<colspec colname="1" align="left"/> +<colspec colname="2" align="left"/> <thead> <row> <entry>Index</entry> @@ -15783,20 +15682,22 @@ at any time. For example, suppose a tenth element is added to the array whose value is @w{@code{"number ten"}}. The result is: @ifnotdocbook -@example -@r{Index} 10 @r{Value} "number ten" -@r{Index} 3 @r{Value} 30 -@r{Index} 1 @r{Value} "foo" -@r{Index} 0 @r{Value} 8 -@r{Index} 2 @r{Value} "" -@end example +@c extra empty column to indent it right +@multitable @columnfractions .1 .1 .2 +@headitem @tab Index @tab Value +@item @tab @code{10} @tab @code{"number ten"} +@item @tab @code{3} @tab @code{30} +@item @tab @code{1} @tab @code{"foo"} +@item @tab @code{0} @tab @code{8} +@item @tab @code{2} @tab @code{""} +@end multitable @end ifnotdocbook @docbook <informaltable> <tgroup cols="2"> -<colspec colname="1" align="center"/> -<colspec colname="2" align="center"/> +<colspec colname="1" align="left"/> +<colspec colname="2" align="left"/> <thead> <row> <entry>Index</entry> @@ -15848,19 +15749,20 @@ an index. 
For example, the following is an array that translates words from English to French: @ifnotdocbook -@example -@r{Index} "dog" @r{Value} "chien" -@r{Index} "cat" @r{Value} "chat" -@r{Index} "one" @r{Value} "un" -@r{Index} 1 @r{Value} "un" -@end example +@multitable @columnfractions .1 .1 .1 +@headitem @tab Index @tab Value +@item @tab @code{"dog"} @tab @code{"chien"} +@item @tab @code{"cat"} @tab @code{"chat"} +@item @tab @code{"one"} @tab @code{"un"} +@item @tab @code{1} @tab @code{"un"} +@end multitable @end ifnotdocbook @docbook <informaltable> <tgroup cols="2"> -<colspec colname="1" align="center"/> -<colspec colname="2" align="center"/> +<colspec colname="1" align="left"/> +<colspec colname="2" align="left"/> <thead> <row> <entry>Index</entry> @@ -15902,7 +15804,7 @@ numbers and strings as indices. There are some subtleties to how numbers work when used as array subscripts; this is discussed in more detail in @ref{Numeric Array Subscripts}.) -Here, the number @code{1} isn't double quoted, because @command{awk} +Here, the number @code{1} isn't double-quoted, because @command{awk} automatically converts it to a string. @cindex @command{gawk}, @code{IGNORECASE} variable in @@ -15919,8 +15821,6 @@ that array's indices are consecutive integers starting at one. @command{awk}'s arrays are efficient---the time to access an element is independent of the number of elements in the array. -@c ENDOFRANGE arrin -@c ENDOFRANGE inarr @node Reference to Elements @subsection Referring to an Array Element @@ -15929,7 +15829,7 @@ is independent of the number of elements in the array. @cindex elements of arrays The principal way to use an array is to refer to one of its elements. -An array reference is an expression as follows: +An @dfn{array reference} is an expression as follows: @example @var{array}[@var{index-expression}] @@ -15939,8 +15839,11 @@ An array reference is an expression as follows: Here, @var{array} is the name of an array. 
The expression @var{index-expression} is the index of the desired element of the array. +@c 1/2015: Having the 4.3 in @samp is a little iffy. It's essentially +@c an expression though, so leave be. It's to early in the discussion +@c to mention that it's really a string. The value of the array reference is the current value of that array -element. For example, @code{foo[4.3]} is an expression for the element +element. For example, @code{foo[4.3]} is an expression referencing the element of array @code{foo} at index @samp{4.3}. @cindex arrays, unassigned elements @@ -16032,7 +15935,7 @@ assign to that element of the array. The following program takes a list of lines, each beginning with a line number, and prints them out in order of line number. The line numbers -are not in order when they are first read---instead they +are not in order when they are first read---instead, they are scrambled. This program sorts the lines by making an array using the line numbers as subscripts. The program then prints out the lines in sorted order of their numbers. It is a very simple program and gets @@ -16126,7 +16029,7 @@ program has previously used, with the variable @var{var} set to that index. The following program uses this form of the @code{for} statement. The first rule scans the input records and notes which words appear (at least once) in the input, by storing a one into the array @code{used} with -the word as index. The second rule scans the elements of @code{used} to +the word as the index. The second rule scans the elements of @code{used} to find all the distinct words that appear in the input. It prints each word that is more than 10 characters long and also prints the number of such words. @@ -16223,7 +16126,7 @@ and will vary from one version of @command{awk} to the next. 
Often, though, you may wish to do something simple, such as ``traverse the array by comparing the indices in ascending order,'' or ``traverse the array by comparing the values in descending order.'' -@command{gawk} provides two mechanisms which give you this control. +@command{gawk} provides two mechanisms that give you this control: @itemize @value{BULLET} @item @@ -16280,21 +16183,26 @@ across different environments.} which @command{gawk} uses internally to perform the sorting. @item "@@ind_str_desc" -String indices ordered from high to low. +Like @code{"@@ind_str_asc"}, but the +string indices are ordered from high to low. @item "@@ind_num_desc" -Numeric indices ordered from high to low. +Like @code{"@@ind_num_asc"}, but the +numeric indices are ordered from high to low. @item "@@val_type_desc" -Element values, based on type, ordered from high to low. +Like @code{"@@val_type_asc"}, but the +element values, based on type, are ordered from high to low. Subarrays, if present, come out first. @item "@@val_str_desc" -Element values, treated as strings, ordered from high to low. +Like @code{"@@val_str_asc"}, but the +element values, treated as strings, are ordered from high to low. Subarrays, if present, come out first. @item "@@val_num_desc" -Element values, treated as numbers, ordered from high to low. +Like @code{"@@val_num_asc"}, but the +element values, treated as numbers, are ordered from high to low. Subarrays, if present, come out first. @end table @@ -16517,7 +16425,7 @@ for (i in frequencies) @noindent This example removes all the elements from the array @code{frequencies}. Once an element is deleted, a subsequent @code{for} statement to scan the array -does not report that element and the @code{in} operator to check for +does not report that element and using the @code{in} operator to check for the presence of that element returns zero (i.e., false): @example @@ -16777,7 +16685,7 @@ a[1][2] = 2 This simulates a true two-dimensional array. 
Each subarray element can contain another subarray as a value, which in turn can hold other arrays as well. In this way, you can create arrays of three or more dimensions. -The indices can be any @command{awk} expression, including scalars +The indices can be any @command{awk} expressions, including scalars separated by commas (i.e., a regular @command{awk} simulated multidimensional subscript). So the following is valid in @command{gawk}: @@ -16789,7 +16697,7 @@ a[1][3][1, "name"] = "barney" Each subarray and the main array can be of different length. In fact, the elements of an array or its subarray do not all have to have the same type. This means that the main array and any of its subarrays can be -non-rectangular, or jagged in structure. You can assign a scalar value to +nonrectangular, or jagged in structure. You can assign a scalar value to the index @code{4} of the main array @code{a}, even though @code{a[1]} is itself an array and not a scalar: @@ -16813,7 +16721,8 @@ a[4][5][6][7] = "An element in a four-dimensional array" @noindent This removes the scalar value from index @code{4} and then inserts a -subarray of subarray of subarray containing a scalar. You can also +three-level nested subarray +containing a scalar. You can also delete an entire subarray or subarray of subarrays: @example @@ -16824,7 +16733,7 @@ a[4][5] = "An element in subarray a[4]" But recall that you can not delete the main array @code{a} and then use it as a scalar. -The built-in functions which take array arguments can also be used +The built-in functions that take array arguments can also be used with subarrays. For example, the following code fragment uses @code{length()} (@pxref{String Functions}) to determine the number of elements in the main array @code{a} and @@ -16854,7 +16763,7 @@ can be nested to scan all the elements of an array of arrays if it is rectangular in structure. 
In order to print the contents (scalar values) of a two-dimensional array of arrays (i.e., in which each first-level element is itself an -array, not necessarily of the same length) +array, not necessarily of the same length), you could use the following code: @example @@ -16954,9 +16863,9 @@ versions of @command{awk}. @item Standard @command{awk} simulates multidimensional arrays by separating -subscript values with a comma. The values are concatenated into a +subscript values with commas. The values are concatenated into a single string, separated by the value of @code{SUBSEP}. The fact -that such a subscript was created in this way is not retained; thus +that such a subscript was created in this way is not retained; thus, changing @code{SUBSEP} may have unexpected consequences. You can use @samp{(@var{sub1}, @var{sub2}, @dots{}) in @var{array}} to see if such a multidimensional subscript exists in @var{array}. @@ -16965,7 +16874,7 @@ a multidimensional subscript exists in @var{array}. @command{gawk} provides true arrays of arrays. You use a separate set of square brackets for each dimension in such an array: @code{data[row][col]}, for example. Array elements may thus be either -scalar values (number or string) or another array. +scalar values (number or string) or other arrays. @item Use the @code{isarray()} built-in function to determine if an array @@ -16973,14 +16882,11 @@ element is itself a subarray. @end itemize -@c ENDOFRANGE arrs @node Functions @chapter Functions -@c STARTOFRANGE funcbi @cindex functions, built-in -@c STARTOFRANGE bifunc @cindex built-in functions This @value{CHAPTER} describes @command{awk}'s built-in functions, which fall into three categories: numeric, string, and I/O. @@ -16993,6 +16899,9 @@ Besides the built-in functions, @command{awk} has provisions for writing new functions that the rest of a program can use. The second half of this @value{CHAPTER} describes these @dfn{user-defined} functions. 
+Finally, we explore indirect function calls, a @command{gawk}-specific +extension that lets you determine at runtime what function is to +be called. @menu * Built-in:: Summarizes the built-in functions. @@ -17002,7 +16911,7 @@ The second half of this @value{CHAPTER} describes these @end menu @node Built-in -@section Built-In Functions +@section Built-in Functions @dfn{Built-in} functions are always available for your @command{awk} program to call. This @value{SECTION} defines all @@ -17025,7 +16934,7 @@ but are summarized here for your convenience. @end menu @node Calling Built-in -@subsection Calling Built-In Functions +@subsection Calling Built-in Functions To call one of @command{awk}'s built-in functions, write the name of the function followed @@ -17076,7 +16985,7 @@ j = atan2(++i, i *= 2) @end example If the order of evaluation is left to right, then @code{i} first becomes -6, and then 12, and @code{atan2()} is called with the two arguments 6 +six, and then 12, and @code{atan2()} is called with the two arguments six and 12. But if the order of evaluation is right to left, @code{i} first becomes 10, then 11, and @code{atan2()} is called with the two arguments 11 and 10. @@ -17157,7 +17066,7 @@ In fact, @command{gawk} uses the BSD @code{random()} function, which is considerably better than @code{rand()}, to produce random numbers.} Often random integers are needed instead. Following is a user-defined function -that can be used to obtain a random non-negative integer less than @var{n}: +that can be used to obtain a random nonnegative integer less than @var{n}: @example function randint(n) @@ -17252,7 +17161,7 @@ implementations. The functions in this @value{SECTION} look at or change the text of one or more strings. -@code{gawk} understands locales (@pxref{Locales}), and does all +@command{gawk} understands locales (@pxref{Locales}) and does all string processing in terms of @emph{characters}, not @emph{bytes}. 
This distinction is particularly important to understand for locales where one character may be represented by multiple bytes. Thus, for @@ -17341,7 +17250,7 @@ a[2] = "de" a[3] = "sac" @end example -The @code{asorti()} function works similarly to @code{asort()}, however, +The @code{asorti()} function works similarly to @code{asort()}; however, the @emph{indices} are sorted, instead of the values. Thus, in the previous example, starting with the same initial set of indices and values in @code{a}, calling @samp{asorti(a)} would yield: @@ -17456,7 +17365,7 @@ If @var{find} is not found, @code{index()} returns zero. With BWK @command{awk} and @command{gawk}, it is a fatal error to use a regexp constant for @var{find}. Other implementations allow it, simply treating the regexp -constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER}. +constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER} @item @code{length(}[@var{string}]@code{)} @cindexawkfunc{length} @@ -17539,7 +17448,7 @@ If @option{--posix} is supplied, using an array argument is a fatal error @cindex string, regular expression match @cindex match regexp in string Search @var{string} for the -longest, leftmost substring matched by the regular expression, +longest, leftmost substring matched by the regular expression @var{regexp} and return the character position (index) at which that substring begins (one, if it starts at the beginning of @var{string}). If no match is found, return zero. @@ -17551,7 +17460,7 @@ In the latter case, the string is treated as a regexp to be matched. discussion of the difference between the two forms, and the implications for writing your program correctly. -The order of the first two arguments is backwards from most other string +The order of the first two arguments is the opposite of most other string functions that work with regular expressions, such as @code{sub()} and @code{gsub()}. 
It might help to remember that for @code{match()}, the order is the same as for the @samp{~} operator: @@ -17640,7 +17549,7 @@ $ @kbd{echo foooobazbarrrrr |} @end example There may not be subscripts for the start and index for every parenthesized -subexpression, because they may not all have matched text; thus they +subexpression, because they may not all have matched text; thus, they should be tested for with the @code{in} operator (@pxref{Reference to Elements}). @@ -17687,13 +17596,13 @@ a regexp describing where to split @var{string} (much as @code{FS} can be a regexp describing where to split input records). If @var{fieldsep} is omitted, the value of @code{FS} is used. @code{split()} returns the number of elements created. -@var{seps} is a @command{gawk} extension with @code{@var{seps}[@var{i}]} +@var{seps} is a @command{gawk} extension, with @code{@var{seps}[@var{i}]} being the separator string between @code{@var{array}[@var{i}]} and @code{@var{array}[@var{i}+1]}. If @var{fieldsep} is a single -space then any leading whitespace goes into @code{@var{seps}[0]} and +space, then any leading whitespace goes into @code{@var{seps}[0]} and any trailing -whitespace goes into @code{@var{seps}[@var{n}]} where @var{n} is the +whitespace goes into @code{@var{seps}[@var{n}]}, where @var{n} is the return value of @code{split()} (i.e., the number of elements in @var{array}). @@ -17706,7 +17615,7 @@ split("cul-de-sac", a, "-", seps) @noindent @cindex strings splitting, example -splits the string @samp{cul-de-sac} into three fields using @samp{-} as the +splits the string @code{"cul-de-sac"} into three fields using @samp{-} as the separator. It sets the contents of the array @code{a} as follows: @example @@ -17731,19 +17640,18 @@ As with input field-splitting, when the value of @var{fieldsep} is the elements of @var{array} but not in @var{seps}, and the elements are separated by runs of whitespace. 
-Also, as with input field-splitting, if @var{fieldsep} is the null string, each +Also, as with input field splitting, if @var{fieldsep} is the null string, each individual character in the string is split into its own array element. @value{COMMONEXT} Note, however, that @code{RS} has no effect on the way @code{split()} -works. Even though @samp{RS = ""} causes newline to also be an input +works. Even though @samp{RS = ""} causes the newline character to also be an input field separator, this does not affect how @code{split()} splits strings. @cindex dark corner, @code{split()} function Modern implementations of @command{awk}, including @command{gawk}, allow -the third argument to be a regexp constant (@code{/abc/}) as well as a -string. -@value{DARKCORNER} +the third argument to be a regexp constant (@w{@code{/}@dots{}@code{/}}) +as well as a string. @value{DARKCORNER} The POSIX standard allows this as well. @DBXREF{Computed Regexps} for a discussion of the difference between using a string constant or a regexp constant, @@ -17880,7 +17788,7 @@ an @samp{&}: @cindex @code{sub()} function, arguments of @cindex @code{gsub()} function, arguments of As mentioned, the third argument to @code{sub()} must -be a variable, field or array element. +be a variable, field, or array element. Some versions of @command{awk} allow the third argument to be an expression that is not an lvalue. In such a case, @code{sub()} still searches for the pattern and returns zero or one, but the result of @@ -18072,8 +17980,8 @@ example, @code{"a\qb"} is treated as @code{"aqb"}. At the runtime level, the various functions handle sequences of @samp{\} and @samp{&} differently. The situation is (sadly) somewhat complex. 
-Historically, the @code{sub()} and @code{gsub()} functions treated the two -character sequence @samp{\&} specially; this sequence was replaced in +Historically, the @code{sub()} and @code{gsub()} functions treated the +two-character sequence @samp{\&} specially; this sequence was replaced in the generated text with a single @samp{&}. Any other @samp{\} within the @var{replacement} string that did not precede an @samp{&} was passed through unchanged. This is illustrated in @ref{table-sub-escapes}. @@ -18131,7 +18039,7 @@ _bigskip} @end float @noindent -This table shows both the lexical-level processing, where +This table shows the lexical-level processing, where an odd number of backslashes becomes an even number at the runtime level, as well as the runtime processing done by @code{sub()}. (For the sake of simplicity, the rest of the following tables only show the @@ -18152,7 +18060,7 @@ This is shown in @ref{table-sub-proposed}. @float Table,table-sub-proposed -@caption{GNU @command{awk} rules for @code{sub()} and backslash} +@caption{@command{gawk} rules for @code{sub()} and backslash} @tex \vbox{\bigskip % We need more characters for escape and tab ... @@ -18197,7 +18105,7 @@ _bigskip} @end float In a nutshell, at the runtime level, there are now three special sequences -of characters (@samp{\\\&}, @samp{\\&} and @samp{\&}) whereas historically +of characters (@samp{\\\&}, @samp{\\&}, and @samp{\&}) whereas historically there was only one. However, as in the historical case, any @samp{\} that is not part of one of these three sequences is not special and appears in the output literally. @@ -18263,7 +18171,7 @@ The only case where the difference is noticeable is the last one: @samp{\\\\} is seen as @samp{\\} and produces @samp{\} instead of @samp{\\}. Starting with @value{PVERSION} 3.1.4, @command{gawk} followed the POSIX rules -when @option{--posix} is specified (@pxref{Options}). Otherwise, +when @option{--posix} was specified (@pxref{Options}). 
Otherwise, it continued to follow the proposed rules, as that had been its behavior for many years. @@ -18331,7 +18239,7 @@ _bigskip} @end ifnottex @end float -Because of the complexity of the lexical and runtime level processing +Because of the complexity of the lexical- and runtime-level processing and the special cases for @code{sub()} and @code{gsub()}, we recommend the use of @command{gawk} and @code{gensub()} when you have to do substitutions. @@ -18357,6 +18265,7 @@ for more information. When closing a coprocess, it is occasionally useful to first close one end of the two-way pipe and then to close the other. This is done by providing a second argument to @code{close()}. This second argument +(@var{how}) should be one of the two string values @code{"to"} or @code{"from"}, indicating which end of the pipe to close. Case in the string does not matter. @@ -18383,7 +18292,7 @@ every little bit of information as soon as it is ready. However, sometimes it is necessary to force a program to @dfn{flush} its buffers (i.e., write the information to its destination, even if a buffer is not full). This is the purpose of the @code{fflush()} function---@command{gawk} also -buffers its output and the @code{fflush()} function forces +buffers its output, and the @code{fflush()} function forces @command{gawk} to flush its buffers. @cindex extensions, common@comma{} @code{fflush()} function @@ -18404,7 +18313,7 @@ would flush only the standard output if there was no argument, and flush all output files and pipes if the argument was the null string. This was changed in order to be compatible with Brian Kernighan's @command{awk}, in the hope that standardizing this -feature in POSIX would then be easier (which indeed helped). +feature in POSIX would then be easier (which indeed proved to be the case). With @command{gawk}, you can use @samp{fflush("/dev/stdout")} if you wish to flush @@ -18415,7 +18324,7 @@ only the standard output. 
@c @cindex warnings, automatic @cindex troubleshooting, @code{fflush()} function @code{fflush()} returns zero if the buffer is successfully flushed; -otherwise, it returns non-zero. (@command{gawk} returns @minus{}1.) +otherwise, it returns a nonzero value. (@command{gawk} returns @minus{}1.) In the case where all buffers are flushed, the return value is zero only if all buffers were flushed successfully. Otherwise, it is @minus{}1, and @command{gawk} warns about the problem @var{filename}. @@ -18433,8 +18342,8 @@ In such a case, @code{fflush()} returns @minus{}1, as well. @cindex buffering, interactive vs.@: noninteractive -As a side point, buffering issues can be even more confusing, depending -upon whether your program is @dfn{interactive} (i.e., communicating +As a side point, buffering issues can be even more confusing if +your program is @dfn{interactive} (i.e., communicating with a user sitting at a keyboard).@footnote{A program is interactive if the standard output is connected to a terminal device. On modern systems, this means your keyboard and screen.} @@ -18484,8 +18393,8 @@ it is all buffered and sent down the pipe to @command{cat} in one shot. @cindex buffering, interactive vs.@: noninteractive -As a side point, buffering issues can be even more confusing, depending -upon whether your program is @dfn{interactive} (i.e., communicating +As a side point, buffering issues can be even more confusing if +your program is @dfn{interactive} (i.e., communicating with a user sitting at a keyboard).@footnote{A program is interactive if the standard output is connected to a terminal device. On modern systems, this means your keyboard and screen.} @@ -18529,7 +18438,7 @@ it is all buffered and sent down the pipe to @command{cat} in one shot. @cindexawkfunc{system} @cindex invoke shell command @cindex interacting with other programs -Execute the operating-system +Execute the operating system command @var{command} and then return to the @command{awk} program. 
Return @var{command}'s exit status. @@ -18704,18 +18613,14 @@ you would see the latter (undesirable) output. @subsection Time Functions @cindex time functions -@c STARTOFRANGE tst @cindex timestamps -@c STARTOFRANGE logftst @cindex log files, timestamps in -@c STARTOFRANGE filogtst @cindex files, log@comma{} timestamps in -@c STARTOFRANGE gawtst @cindex @command{gawk}, timestamps @cindex POSIX @command{awk}, timestamps and -@code{awk} programs are commonly used to process log files +@command{awk} programs are commonly used to process log files containing timestamp information, indicating when a -particular log record was written. Many programs log their timestamp +particular log record was written. Many programs log their timestamps in the form returned by the @code{time()} system call, which is the number of seconds since a particular epoch. On POSIX-compliant systems, it is the number of seconds since @@ -18776,7 +18681,7 @@ The values of these numbers need not be within the ranges specified; for example, an hour of @minus{}1 means 1 hour before midnight. The origin-zero Gregorian calendar is assumed, with year 0 preceding year 1 and year @minus{}1 preceding year 0. -The time is assumed to be in the local timezone. +The time is assumed to be in the local time zone. If the daylight-savings flag is positive, the time is assumed to be daylight savings time; if zero, the time is assumed to be standard time; and if negative (the default), @code{mktime()} attempts to determine @@ -18788,7 +18693,6 @@ is out of range, @code{mktime()} returns @minus{}1. @cindex @command{gawk}, @code{PROCINFO} array in @cindex @code{PROCINFO} array @item @code{strftime(}[@var{format} [@code{,} @var{timestamp} [@code{,} @var{utc-flag}] ] ]@code{)} -@c STARTOFRANGE strf @cindexgawkfunc{strftime} @cindex format time string Format the time specified by @var{timestamp} @@ -18937,12 +18841,12 @@ Equivalent to specifying @samp{%H:%M:%S}. The weekday as a decimal number (1--7). Monday is day one. 
@item %U -The week number of the year (the first Sunday as the first day of week one) +The week number of the year (with the first Sunday as the first day of week one) as a decimal number (00--53). @c @cindex ISO 8601 @item %V -The week number of the year (the first Monday as the first +The week number of the year (with the first Monday as the first day of week one) as a decimal number (01--53). The method for determining the week number is as specified by ISO 8601. (To wit: if the week containing January 1 has four or more days in the @@ -18953,7 +18857,7 @@ and the next week is week one.) The weekday as a decimal number (0--6). Sunday is day zero. @item %W -The week number of the year (the first Monday as the first day of week one) +The week number of the year (with the first Monday as the first day of week one) as a decimal number (00--53). @item %x @@ -18973,8 +18877,8 @@ The full year as a decimal number (e.g., 2015). @c @cindex RFC 822 @c @cindex RFC 1036 @item %z -The timezone offset in a +HHMM format (e.g., the format necessary to -produce RFC 822/RFC 1036 date headers). +The time zone offset in a @samp{+@var{HHMM}} format (e.g., the format +necessary to produce RFC 822/RFC 1036 date headers). @item %Z The time zone name or abbreviation; no characters if @@ -19037,7 +18941,6 @@ The time as a decimal timestamp in seconds since the epoch. The date in VMS format (e.g., @samp{20-JUN-1991}). @end ignore @end table -@c ENDOFRANGE strf Additionally, the alternative representations are recognized but their normal representations are used. 
@@ -19088,23 +18991,14 @@ gawk 'BEGIN @{ exit exitval @}' "$@@" @end example -@c ENDOFRANGE tst -@c ENDOFRANGE logftst -@c ENDOFRANGE filogtst -@c ENDOFRANGE gawtst @node Bitwise Functions @subsection Bit-Manipulation Functions @cindex bit-manipulation functions -@c STARTOFRANGE bit @cindex bitwise, operations -@c STARTOFRANGE and @cindex AND bitwise operation -@c STARTOFRANGE oro @cindex OR bitwise operation -@c STARTOFRANGE xor @cindex XOR bitwise operation -@c STARTOFRANGE opbit @cindex operations, bitwise @quotation @i{I can explain it for you, but I can't understand it for you.} @@ -19124,7 +19018,7 @@ The operations are described in @ref{table-bitwise-ops}. @ifnottex @ifnotdocbook @display - Bit Operator + Bit operator | AND | OR | XOR |---+---+---+---+---+--- Operands | 0 | 1 | 0 | 1 | 0 | 1 @@ -19182,7 +19076,7 @@ Operands | 0 | 1 | 0 | 1 | 0 | 1 <tbody> <row> <entry colsep="0"></entry> -<entry spanname="optitle"><emphasis role="bold">Bit Operator</emphasis></entry> +<entry spanname="optitle"><emphasis role="bold">Bit operator</emphasis></entry> </row> <row rowsep="1"> @@ -19246,10 +19140,9 @@ of a given value. Finally, two other common operations are to shift the bits left or right. For example, if you have a bit string @samp{10111001} and you shift it right by three bits, you end up with @samp{00010111}.@footnote{This example -shows that 0's come in on the left side. For @command{gawk}, this is +shows that zeros come in on the left side. For @command{gawk}, this is always true, but in some languages, it's possible to have the left side -fill with 1's.} -@c Purposely decided to use 0's and 1's here. 2/2001. +fill with ones.} If you start over again with @samp{10111001} and shift it left by three bits, you end up with @samp{11001000}. The following list describes @command{gawk}'s built-in functions that implement the bitwise operations. 
@@ -19303,7 +19196,7 @@ that illustrates the use of these functions: @example @group @c file eg/lib/bits2str.awk -# bits2str --- turn a byte into readable 1's and 0's +# bits2str --- turn a byte into readable ones and zeros function bits2str(bits, data, mask) @{ @@ -19377,15 +19270,16 @@ $ @kbd{gawk -f testbits.awk} @cindex converting, numbers to strings @cindex number as string of bits The @code{bits2str()} function turns a binary number into a string. -The number @code{1} represents a binary value where the rightmost bit -is set to 1. Using this mask, +Initializing @code{mask} to one creates +a binary value where the rightmost bit +is set to one. Using this mask, the function repeatedly checks the rightmost bit. ANDing the mask with the value indicates whether the -rightmost bit is 1 or not. If so, a @code{"1"} is concatenated onto the front +rightmost bit is one or not. If so, a @code{"1"} is concatenated onto the front of the string. Otherwise, a @code{"0"} is added. The value is then shifted right by one bit and the loop continues -until there are no more 1 bits. +until there are no more one bits. If the initial value is zero, it returns a simple @code{"0"}. Otherwise, at the end, it pads the value with zeros to represent multiples @@ -19396,11 +19290,6 @@ decimal and octal values for the same numbers (@pxref{Nondecimal-numbers}), and then demonstrates the results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions. -@c ENDOFRANGE bit -@c ENDOFRANGE and -@c ENDOFRANGE oro -@c ENDOFRANGE xor -@c ENDOFRANGE opbit @node Type Functions @subsection Getting Type Information @@ -19414,7 +19303,7 @@ that traverses every element of an array of arrays @cindexgawkfunc{isarray} @cindex scalar or array @item isarray(@var{x}) -Return a true value if @var{x} is an array. Otherwise return false. +Return a true value if @var{x} is an array. Otherwise, return false. @end table @code{isarray()} is meant for use in two circumstances. 
The first is when @@ -19475,20 +19364,16 @@ The default value for @var{category} is @code{"LC_MESSAGES"}. Return the plural form used for @var{number} of the translation of @var{string1} and @var{string2} in text domain @var{domain} for locale category @var{category}. @var{string1} is the -English singular variant of a message, and @var{string2} the English plural +English singular variant of a message, and @var{string2} is the English plural variant of the same message. The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. The default value for @var{category} is @code{"LC_MESSAGES"}. @end table -@c ENDOFRANGE funcbi -@c ENDOFRANGE bifunc @node User-defined @section User-Defined Functions -@c STARTOFRANGE udfunc @cindex user-defined functions -@c STARTOFRANGE funcud @cindex functions, user-defined Complicated @command{awk} programs can often be simplified by defining your own functions. User-defined functions can be called just like @@ -19508,12 +19393,11 @@ them (i.e., to tell @command{awk} what they should do). @subsection Function Definition Syntax @quotation -@i{It's entirely fair to say that the @command{awk} syntax for local +@i{It's entirely fair to say that the awk syntax for local variable definitions is appallingly awful.} @author Brian Kernighan @end quotation -@c STARTOFRANGE fdef @cindex functions, defining Definitions of functions can appear anywhere between the rules of an @command{awk} program. Thus, the general form of an @command{awk} program is @@ -19551,14 +19435,23 @@ the call. A function cannot have two parameters with the same name, nor may it have a parameter with the same name as the function itself. -In addition, according to the POSIX standard, function parameters + +@quotation CAUTION +According to the POSIX standard, function parameters cannot have the same name as one of the special predefined variables -(@pxref{Built-in Variables}). Not all versions of @command{awk} enforce -this restriction. 
+(@pxref{Built-in Variables}), nor may a function parameter have the +same name as another function. + +Not all versions of @command{awk} enforce +these restrictions. +@command{gawk} always enforces the first restriction. +With @option{--posix} (@pxref{Options}), +it also enforces the second restriction. +@end quotation Local variables act like the empty string if referenced where a string value is required, and like zero if referenced where a numeric value -is required. This is the same as regular variables that have never been +is required. This is the same as the behavior of regular variables that have never been assigned a value. (There is more to understand about local variables; @pxref{Dynamic Typing}.) @@ -19592,7 +19485,7 @@ During execution of the function body, the arguments and local variable values hide, or @dfn{shadow}, any variables of the same names used in the rest of the program. The shadowed variables are not accessible in the function definition, because there is no way to name them while their -names have been taken away for the local variables. All other variables +names have been taken away for the arguments and local variables. All other variables used in the @command{awk} program can be referenced or set normally in the function's body. @@ -19659,7 +19552,7 @@ function myprint(num) @end example @noindent -To illustrate, here is an @command{awk} rule that uses our @code{myprint} +To illustrate, here is an @command{awk} rule that uses our @code{myprint()} function: @example @@ -19700,13 +19593,13 @@ in an array and start over with a new list of elements (@pxref{Delete}). Instead of having to repeat this loop everywhere that you need to clear out -an array, your program can just call @code{delarray}. +an array, your program can just call @code{delarray()}. (This guarantees portability. The use of @samp{delete @var{array}} to delete the contents of an entire array is a relatively recent@footnote{Late in 2012.} addition to the POSIX standard.) 
The following is an example of a recursive function. It takes a string -as an input parameter and returns the string in backwards order. +as an input parameter and returns the string in reverse order. Recursive functions must always have a test that stops the recursion. In this case, the recursion terminates when the input string is already empty: @@ -19760,12 +19653,10 @@ You might think that @code{ctime()} could use @code{PROCINFO["strftime"]} for its format string. That would be a mistake, because @code{ctime()} is supposed to return the time formatted in a standard fashion, and user-level code could have changed @code{PROCINFO["strftime"]}. -@c ENDOFRANGE fdef @node Function Caveats @subsection Calling User-Defined Functions -@c STARTOFRANGE fudc @cindex functions, user-defined, calling @dfn{Calling a function} means causing the function to run and do its job. A function call is an expression and its value is the value returned by @@ -19805,7 +19696,7 @@ an error. @cindex local variables, in a function @cindex variables, local to a function -Unlike many languages, +Unlike in many languages, there is no way to make a variable local to a @code{@{} @dots{} @code{@}} block in @command{awk}, but you can make a variable local to a function. It is good practice to do so whenever a variable is needed only in that @@ -19814,7 +19705,7 @@ function. To make a variable local to a function, simply declare the variable as an argument after the actual function arguments (@pxref{Definition Syntax}). 
-Look at the following example where variable +Look at the following example, where variable @code{i} is a global variable used by both functions @code{foo()} and @code{bar()}: @@ -19855,7 +19746,7 @@ foo's i=3 top's i=3 @end example -If you want @code{i} to be local to both @code{foo()} and @code{bar()} do as +If you want @code{i} to be local to both @code{foo()} and @code{bar()}, do as follows (the extra space before @code{i} is a coding convention to indicate that @code{i} is a local variable, not an argument): @@ -19943,7 +19834,7 @@ declare explicitly whether the arguments are passed @dfn{by value} or @dfn{by reference}. Instead, the passing convention is determined at runtime when -the function is called according to the following rule: +the function is called, according to the following rule: if the argument is an array variable, then it is passed by reference. Otherwise, the argument is passed by value. @@ -20020,7 +19911,7 @@ prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because @cindex undefined functions @cindex functions, undefined Some @command{awk} implementations allow you to call a function that -has not been defined. They only report a problem at runtime when the +has not been defined. They only report a problem at runtime, when the program actually tries to call the function. For example: @example @@ -20057,7 +19948,6 @@ or the @code{nextfile} statement @end ifnotdocbook inside a user-defined function. @command{gawk} does not have this limitation. -@c ENDOFRANGE fudc @node Return Statement @subsection The @code{return} Statement @@ -20080,15 +19970,15 @@ makes the returned value undefined, and therefore, unpredictable. In practice, though, all versions of @command{awk} simply return the null string, which acts like zero if used in a numeric context. -A @code{return} statement with no value expression is assumed at the end of -every function definition. 
So if control reaches the end of the function -body, then technically, the function returns an unpredictable value. +A @code{return} statement without an @var{expression} is assumed at the end of +every function definition. So, if control reaches the end of the function +body, then technically the function returns an unpredictable value. In practice, it returns the empty string. @command{awk} does @emph{not} warn you if you use the return value of such a function. Sometimes, you want to write a function for what it does, not for what it returns. Such a function corresponds to a @code{void} function -in C, C++ or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not +in C, C++, or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not return any value; simply bear in mind that you should not be using the return value of such a function. @@ -20185,7 +20075,6 @@ does report the second error. Usually, such things aren't a big issue, but it's worth being aware of them. -@c ENDOFRANGE udfunc @node Indirect Calls @section Indirect Function Calls @@ -20208,13 +20097,15 @@ function calls, you can specify the name of the function to call as a string variable, and then call the function. Let's look at an example. Suppose you have a file with your test scores for the classes you -are taking. The first field is the class name. The following fields +are taking, and +you wish to get the sum and the average of +your test scores. +The first field is the class name. The following fields are the functions to call to process the data, up to a ``marker'' field @samp{data:}. Following the marker, to the end of the record, are the various numeric test scores. 
-Here is the initial file; you wish to get the sum and the average of -your test scores: +Here is the initial file: @example @c file eg/data/class_data1 @@ -20297,9 +20188,9 @@ function sum(first, last, ret, i) @c endfile @end example -These two functions expect to work on fields; thus the parameters +These two functions expect to work on fields; thus, the parameters @code{first} and @code{last} indicate where in the fields to start and end. -Otherwise they perform the expected computations and are not unusual: +Otherwise, they perform the expected computations and are not unusual: @example @c file eg/prog/indirectcall.awk @@ -20358,8 +20249,8 @@ The ability to use indirect function calls is more powerful than you may think at first. The C and C++ languages provide ``function pointers,'' which are a mechanism for calling a function chosen at runtime. One of the most well-known uses of this ability is the C @code{qsort()} function, which sorts -an array using the famous ``quick sort'' algorithm -(see @uref{http://en.wikipedia.org/wiki/Quick_sort, the Wikipedia article} +an array using the famous ``quicksort'' algorithm +(see @uref{http://en.wikipedia.org/wiki/Quicksort, the Wikipedia article} for more information). To use this function, you supply a pointer to a comparison function. This mechanism allows you to sort arbitrary data in an arbitrary fashion. @@ -20378,11 +20269,11 @@ We can do something similar using @command{gawk}, like this: # January 2009 @c endfile - @end ignore @c file eg/lib/quicksort.awk -# quicksort --- C.A.R. Hoare's quick sort algorithm. See Wikipedia -# or almost any algorithms or computer science text + +# quicksort --- C.A.R. Hoare's quicksort algorithm. See Wikipedia +# or almost any algorithms or computer science text. 
@c endfile @ignore @c file eg/lib/quicksort.awk @@ -20420,7 +20311,7 @@ function quicksort_swap(data, i, j, temp) The @code{quicksort()} function receives the @code{data} array, the starting and ending indices to sort (@code{left} and @code{right}), and the name of a function that -performs a ``less than'' comparison. It then implements the quick sort algorithm. +performs a ``less than'' comparison. It then implements the quicksort algorithm. To make use of the sorting function, we return to our previous example. The first thing to do is write some comparison functions: @@ -20611,7 +20502,7 @@ for (i = 1; i <= n; i++) @end example @noindent -@code{gawk} looks up the actual function to call only once. +@command{gawk} looks up the actual function to call only once. @node Functions Summary @section Summary @@ -20678,7 +20569,6 @@ program. This is equivalent to function pointers in C and C++. @end itemize -@c ENDOFRANGE funcud @ifnotinfo @part @value{PART2}Problem Solving with @command{awk} @@ -20700,18 +20590,15 @@ It contains the following chapters: @node Library Functions @chapter A Library of @command{awk} Functions -@c STARTOFRANGE libf @cindex libraries of @command{awk} functions -@c STARTOFRANGE flib @cindex functions, library -@c STARTOFRANGE fudlib @cindex functions, user-defined, library of @DBREF{User-defined} describes how to write your own @command{awk} functions. Writing functions is important, because it allows you to encapsulate algorithms and program tasks in a single place. It simplifies programming, making program development more -manageable, and making programs more readable. +manageable and making programs more readable. @cindex Kernighan, Brian @cindex Plauger, P.J.@: @@ -20840,7 +20727,7 @@ often use variable names like these for their own purposes. The example programs shown in this @value{CHAPTER} all start the names of their private variables with an underscore (@samp{_}). 
Users generally don't use leading underscores in their variable names, so this convention immediately -decreases the chances that the variable name will be accidentally shared +decreases the chances that the variable names will be accidentally shared with the user's program. @cindex @code{_} (underscore), in names of private variables @@ -20858,8 +20745,8 @@ show how our own @command{awk} programming style has evolved and to provide some basis for this discussion.} As a final note on variable naming, if a function makes global variables -available for use by a main program, it is a good convention to start that -variable's name with a capital letter---for +available for use by a main program, it is a good convention to start those +variables' names with a capital letter---for example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables (@pxref{Getopt Function}). The leading capital letter indicates that it is global, while the fact that @@ -20870,7 +20757,7 @@ not one of @command{awk}'s predefined variables, such as @code{FS}. 
It is also important that @emph{all} variables in library functions that do not need to save state are, in fact, declared local.@footnote{@command{gawk}'s @option{--dump-variables} command-line -option is useful for verifying this.} If this is not done, the variable +option is useful for verifying this.} If this is not done, the variables could accidentally be used in the user's program, leading to bugs that are very difficult to track down: @@ -21027,13 +20914,9 @@ be tested with @command{gawk} and the results compared to the built-in @node Assert Function @subsection Assertions -@c STARTOFRANGE asse @cindex assertions -@c STARTOFRANGE assef @cindex @code{assert()} function (C library) -@c STARTOFRANGE libfass @cindex libraries of @command{awk} functions, assertions -@c STARTOFRANGE flibass @cindex functions, library, assertions @cindex @command{awk} programs, lengthy, assertions When writing large programs, it is often useful to know @@ -21072,7 +20955,7 @@ Following is the function: @example @c file eg/lib/assert.awk -# assert --- assert that a condition is true. Otherwise exit. +# assert --- assert that a condition is true. Otherwise, exit. @c endfile @ignore @@ -21108,7 +20991,7 @@ is false, it prints a message to standard error, using the @code{string} parameter to describe the failed condition. It then sets the variable @code{_assert_exit} to one and executes the @code{exit} statement. The @code{exit} statement jumps to the @code{END} rule. If the @code{END} -rules finds @code{_assert_exit} to be true, it exits immediately. +rule finds @code{_assert_exit} to be true, it exits immediately. The purpose of the test in the @code{END} rule is to keep any other @code{END} rules from running. When an assertion fails, the @@ -21149,10 +21032,6 @@ most likely causing the program to hang as it waits for input. There is a simple workaround to this: make sure that such a @code{BEGIN} rule always ends with an @code{exit} statement. 
-@c ENDOFRANGE asse -@c ENDOFRANGE assef -@c ENDOFRANGE flibass -@c ENDOFRANGE libfass @node Round Function @subsection Rounding Numbers @@ -21404,7 +21283,7 @@ all the strings in an array into one long string. The following function, the application programs (@pxref{Sample Programs}). -Good function design is important; this function needs to be general but it +Good function design is important; this function needs to be general, but it should also have a reasonable default behavior. It is called with an array as well as the beginning and ending indices of the elements in the array to be merged. This assumes that the array indices are numeric---a reasonable @@ -21552,7 +21431,7 @@ allowed the user to supply an optional timestamp value to use instead of the current time. @node Readfile Function -@subsection Reading a Whole File At Once +@subsection Reading a Whole File at Once Often, it is convenient to have the entire contents of a file available in memory as a single string. A straightforward but naive way to @@ -21609,13 +21488,13 @@ function readfile(file, tmp, save_rs) It works by setting @code{RS} to @samp{^$}, a regular expression that will never match if the file has contents. @command{gawk} reads data from -the file into @code{tmp} attempting to match @code{RS}. The match fails +the file into @code{tmp}, attempting to match @code{RS}. The match fails after each read, but fails quickly, such that @command{gawk} fills @code{tmp} with the entire contents of the file. (@DBXREF{Records} for information on @code{RT} and @code{RS}.) In the case that @code{file} is empty, the return value is the null -string. Thus calling code may use something like: +string. Thus, calling code may use something like: @example contents = readfile("/some/path") @@ -21626,7 +21505,7 @@ if (length(contents) == 0) This tests the result to see if it is empty or not. An equivalent test would be @samp{contents == ""}. 
-@xref{Extension Sample Readfile}, for an extension function that +@DBXREF{Extension Sample Readfile} for an extension function that also reads an entire file into memory. @node Shell Quoting @@ -21710,11 +21589,8 @@ function shell_quote(s, # parameter @node Data File Management @section @value{DDF} Management -@c STARTOFRANGE dataf @cindex files, managing -@c STARTOFRANGE libfdataf @cindex libraries of @command{awk} functions, managing, data files -@c STARTOFRANGE flibdataf @cindex functions, library, managing data files This @value{SECTION} presents functions that are useful for managing command-line @value{DF}s. @@ -21736,8 +21612,8 @@ The @code{BEGIN} and @code{END} rules are each executed exactly once, at the beginning and end of your @command{awk} program, respectively (@pxref{BEGIN/END}). We (the @command{gawk} authors) once had a user who mistakenly thought that the -@code{BEGIN} rule is executed at the beginning of each @value{DF} and the -@code{END} rule is executed at the end of each @value{DF}. +@code{BEGIN} rules were executed at the beginning of each @value{DF} and the +@code{END} rules were executed at the end of each @value{DF}. When informed that this was not the case, the user requested that we add new special @@ -21777,7 +21653,7 @@ END @{ endfile(FILENAME) @} This file must be loaded before the user's ``main'' program, so that the rule it supplies is executed first. -This rule relies on @command{awk}'s @code{FILENAME} variable that +This rule relies on @command{awk}'s @code{FILENAME} variable, which automatically changes for each new @value{DF}. The current @value{FN} is saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does not equal @code{_oldfilename}, then a new @value{DF} is being processed and @@ -21793,7 +21669,7 @@ first @value{DF}. The program also supplies an @code{END} rule to do the final processing for the last file. 
Because this @code{END} rule comes before any @code{END} rules supplied in the ``main'' program, @code{endfile()} is called first. Once -again the value of multiple @code{BEGIN} and @code{END} rules should be clear. +again, the value of multiple @code{BEGIN} and @code{END} rules should be clear. @cindex @code{beginfile()} user-defined function @cindex @code{endfile()} user-defined function @@ -21841,7 +21717,7 @@ how it simplifies writing the main program. You are probably wondering, if @code{beginfile()} and @code{endfile()} functions can do the job, why does @command{gawk} have -@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})? +@code{BEGINFILE} and @code{ENDFILE} patterns? Good question. Normally, if @command{awk} cannot open a file, this causes an immediate fatal error. In this case, there is no way for a @@ -21850,6 +21726,7 @@ calling it relies on the file being open and at the first record. Thus, the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch files that cannot be processed. @code{ENDFILE} exists for symmetry, and because it provides an easy way to do per-file cleanup processing. +For more information, refer to @ref{BEGINFILE/ENDFILE}. @docbook </sidebar> @@ -21864,7 +21741,7 @@ and because it provides an easy way to do per-file cleanup processing. You are probably wondering, if @code{beginfile()} and @code{endfile()} functions can do the job, why does @command{gawk} have -@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})? +@code{BEGINFILE} and @code{ENDFILE} patterns? Good question. Normally, if @command{awk} cannot open a file, this causes an immediate fatal error. In this case, there is no way for a @@ -21873,6 +21750,7 @@ calling it relies on the file being open and at the first record. Thus, the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch files that cannot be processed. 
@code{ENDFILE} exists for symmetry, and because it provides an easy way to do per-file cleanup processing. +For more information, refer to @ref{BEGINFILE/ENDFILE}. @end cartouche @end ifnotdocbook @@ -21880,7 +21758,7 @@ and because it provides an easy way to do per-file cleanup processing. @subsection Rereading the Current File @cindex files, reading -Another request for a new built-in function was for a @code{rewind()} +Another request for a new built-in function was for a function that would make it possible to reread the current file. The requesting user didn't want to have to use @code{getline} (@pxref{Getline}) @@ -21889,7 +21767,7 @@ inside a loop. However, as long as you are not in the @code{END} rule, it is quite easy to arrange to immediately close the current input file and then start over with it from the top. -For lack of a better name, we'll call it @code{rewind()}: +For lack of a better name, we'll call the function @code{rewind()}: @cindex @code{rewind()} user-defined function @example @@ -21982,16 +21860,16 @@ See also @ref{ARGC and ARGV}. Because @command{awk} variable names only allow the English letters, the regular expression check purposely does not use character classes such as @samp{[:alpha:]} and @samp{[:alnum:]} -(@pxref{Bracket Expressions}) +(@pxref{Bracket Expressions}). @node Empty Files -@subsection Checking for Zero-length Files +@subsection Checking for Zero-Length Files All known @command{awk} implementations silently skip over zero-length files. This is a by-product of @command{awk}'s implicit read-a-record-and-match-against-the-rules loop: when @command{awk} tries to read a record from an empty file, it immediately receives an -end of file indication, closes the file, and proceeds on to the next +end-of-file indication, closes the file, and proceeds on to the next command-line @value{DF}, @emph{without} executing any user-level @command{awk} program code. 
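The behavior described above is easy to observe. The following sketch is an illustration under stated assumptions, not the manual's own test: it assumes a shell with @command{mktemp -d} available, and the names @code{dir}, @code{empty}, @code{full}, and @code{seen} are hypothetical. It creates one zero-length file and one file with a single record, and then counts how many files produce a first record.

```shell
# Sketch only: show that a zero-length file runs no pattern-action rules.
dir=$(mktemp -d)
: > "$dir/empty"                 # zero-length file
printf 'one\n' > "$dir/full"     # file with exactly one record

# FNR == 1 matches once for each file that yields at least one record.
seen=$(awk 'FNR == 1 { n++ } END { print n }' "$dir/empty" "$dir/full")

echo "$seen"
rm -r "$dir"
```

The count is 1 rather than 2: the empty file is opened and closed without any rule being run for it, which is why a program cannot detect it from within ordinary pattern-action rules.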
@@ -22056,7 +21934,7 @@ Occasionally, you might not want @command{awk} to process command-line variable assignments (@pxref{Assignment Options}). In particular, if you have a @value{FN} that contains an @samp{=} character, -@command{awk} treats the @value{FN} as an assignment, and does not process it. +@command{awk} treats the @value{FN} as an assignment and does not process it. Some users have suggested an additional command-line option for @command{gawk} to disable command-line assignments. However, some simple programming with @@ -22106,22 +21984,14 @@ The use of @code{No_command_assign} allows you to disable command-line assignments at invocation time, by giving the variable a true value. When not set, it is initially zero (i.e., false), so the command-line arguments are left alone. -@c ENDOFRANGE dataf -@c ENDOFRANGE flibdataf -@c ENDOFRANGE libfdataf @node Getopt Function @section Processing Command-Line Options -@c STARTOFRANGE libfclo @cindex libraries of @command{awk} functions, command-line options -@c STARTOFRANGE flibclo @cindex functions, library, command-line options -@c STARTOFRANGE clop @cindex command-line options, processing -@c STARTOFRANGE oclp @cindex options, command-line, processing -@c STARTOFRANGE clibf @cindex functions, library, C library @cindex arguments, processing Most utilities on POSIX-compatible systems take options on @@ -22426,8 +22296,8 @@ BEGIN @{ @c endfile @end example -The rest of the @code{BEGIN} rule is a simple test program. Here is the -result of two sample runs of the test program: +The rest of the @code{BEGIN} rule is a simple test program. Here are the +results of two sample runs of the test program: @example $ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x} @@ -22473,27 +22343,19 @@ further options Several of the sample programs presented in @ref{Sample Programs}, use @code{getopt()} to process their arguments. 
-@c ENDOFRANGE libfclo -@c ENDOFRANGE flibclo -@c ENDOFRANGE clop -@c ENDOFRANGE oclp @node Passwd Functions @section Reading the User Database -@c STARTOFRANGE libfudata @cindex libraries of @command{awk} functions, user database, reading -@c STARTOFRANGE flibudata @cindex functions, library, user database@comma{} reading -@c STARTOFRANGE udatar @cindex user database@comma{} reading -@c STARTOFRANGE dataur @cindex database, users@comma{} reading @cindex @code{PROCINFO} array The @code{PROCINFO} array (@pxref{Built-in Variables}) provides access to the current user's real and effective user and group ID -numbers, and if available, the user's supplementary group set. +numbers, and, if available, the user's supplementary group set. However, because these are numbers, they do not provide very useful information to the average user. There needs to be some way to find the user information associated with the user and group ID numbers. This @@ -22513,7 +22375,7 @@ kept. Instead, it provides the @code{<pwd.h>} header file and several C language subroutines for obtaining user information. The primary function is @code{getpwent()}, for ``get password entry.'' The ``password'' comes from the original user database file, -@file{/etc/passwd}, which stores user information, along with the +@file{/etc/passwd}, which stores user information along with the encrypted passwords (hence the name). @cindex @command{pwcat} program @@ -22612,7 +22474,7 @@ The user's encrypted password. This may not be available on some systems. @item User-ID The user's numeric user ID number. -(On some systems, it's a C @code{long}, and not an @code{int}. Thus +(On some systems, it's a C @code{long}, and not an @code{int}. Thus, we cast it to @code{long} for all cases.) @item Group-ID @@ -22739,7 +22601,7 @@ The code that checks for using @code{FPAT}, using @code{using_fpat} and @code{PROCINFO["FS"]}, is similar. 
The main part of the function uses a loop to read database lines, split -the line into fields, and then store the line into each array as necessary. +the lines into fields, and then store the lines into each array as necessary. When the loop is done, @code{@w{_pw_init()}} cleans up by closing the pipeline, setting @code{@w{_pw_inited}} to one, and restoring @code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} @@ -22834,21 +22696,13 @@ and such a change would clutter up the code. The @command{id} program in @DBREF{Id Program} uses these functions. -@c ENDOFRANGE libfudata -@c ENDOFRANGE flibudata -@c ENDOFRANGE udatar -@c ENDOFRANGE dataur @node Group Functions @section Reading the Group Database -@c STARTOFRANGE libfgdata @cindex libraries of @command{awk} functions, group database, reading -@c STARTOFRANGE flibgdata @cindex functions, library, group database@comma{} reading -@c STARTOFRANGE gdatar @cindex group database, reading -@c STARTOFRANGE datagr @cindex database, group, reading @cindex @code{PROCINFO} array, and group membership @cindex @code{getgrent()} function (C library) @@ -22964,7 +22818,7 @@ it is usually empty or set to @samp{*}. @item Group ID Number The group's numeric group ID number; the association of name to number must be unique within the file. -(On some systems it's a C @code{long}, and not an @code{int}. Thus +(On some systems it's a C @code{long}, and not an @code{int}. Thus, we cast it to @code{long} for all cases.) @item Group Member List @@ -23078,32 +22932,32 @@ The @code{@w{_gr_init()}} function first saves @code{FS}, @code{$0}, and then sets @code{FS} and @code{RS} to the correct values for scanning the group information. It also takes care to note whether @code{FIELDWIDTHS} or @code{FPAT} -is being used, and to restore the appropriate field splitting mechanism. +is being used, and to restore the appropriate field-splitting mechanism. -The group information is stored is several associative arrays. 
+The group information is stored in several associative arrays. The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number (@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}). There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}), which is a space-separated list of groups to which each user belongs. -Unlike the user database, it is possible to have multiple records in the +Unlike in the user database, it is possible to have multiple records in the database for the same group. This is common when a group has a large number of members. A pair of such entries might look like the following: @example -tvpeople:*:101:johny,jay,arsenio +tvpeople:*:101:johnny,jay,arsenio tvpeople:*:101:david,conan,tom,joan @end example For this reason, @code{_gr_init()} looks to see if a group name or -group ID number is already seen. If it is, the usernames are -simply concatenated onto the previous list of users.@footnote{There is actually a +group ID number is already seen. If so, the usernames are +simply concatenated onto the previous list of users.@footnote{There is a subtle problem with the code just presented. Suppose that the first time there were no names. This code adds the names with a leading comma. It also doesn't check that there is a @code{$4}.} Finally, @code{_gr_init()} closes the pipeline to @command{grcat}, restores -@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} if necessary), @code{RS}, and @code{$0}, +@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT}, if necessary), @code{RS}, and @code{$0}, initializes @code{_gr_count} to zero (it is used later), and makes @code{_gr_inited} nonzero. @@ -23171,7 +23025,6 @@ function getgrent() @} @c endfile @end example -@c ENDOFRANGE clibf @cindex @code{endgrent()} function (C library) The @code{endgrent()} function resets @code{_gr_count} to zero so that @code{getgrent()} can @@ -23204,12 +23057,12 @@ uses these functions. 
@DBREF{Arrays of Arrays} described how @command{gawk} provides arrays of arrays. In particular, any element of -an array may be either a scalar, or another array. The +an array may be either a scalar or another array. The @code{isarray()} function (@pxref{Type Functions}) lets you distinguish an array from a scalar. The following function, @code{walk_array()}, recursively traverses -an array, printing each element's indices and value. +an array, printing the element indices and values. You call it with the array and a string representing the name of the array: @@ -23260,10 +23113,6 @@ $ @kbd{gawk -f walk_array.awk} @print{} a[4][2] = 42 @end example -@c ENDOFRANGE libfgdata -@c ENDOFRANGE flibgdata -@c ENDOFRANGE gdatar -@c ENDOFRANGE libf @node Library Functions Summary @section Summary @@ -23285,24 +23134,24 @@ The functions presented here fit into the following categories: @c nested list @table @asis @item General problems -Number-to-string conversion, assertions, rounding, random number +Number-to-string conversion, testing assertions, rounding, random number generation, converting characters to numbers, joining strings, getting easily usable time-of-day information, and reading a whole file in -one shot. +one shot @item Managing @value{DF}s Noting @value{DF} boundaries, rereading the current file, checking for readable files, checking for zero-length files, and treating assignments -as @value{FN}s. +as @value{FN}s @item Processing command-line options -An @command{awk} version of the standard C @code{getopt()} function. +An @command{awk} version of the standard C @code{getopt()} function @item Reading the user and group databases -Two sets of routines that parallel the C library versions. +Two sets of routines that parallel the C library versions @item Traversing arrays of arrays -A simple function to traverse an array of arrays to any depth. 
+A simple function to traverse an array of arrays to any depth @end table @c end nested list @@ -23377,13 +23226,9 @@ output identical to that of the original version. @end enumerate @c EXCLUDE END -@c ENDOFRANGE flib -@c ENDOFRANGE fudlib -@c ENDOFRANGE datagr @node Sample Programs @chapter Practical @command{awk} Programs -@c STARTOFRANGE awkpex @cindex @command{awk} programs, examples of @c FULLXREF ON @@ -23401,10 +23246,10 @@ in this @value{CHAPTER}. The second presents @command{awk} versions of several common POSIX utilities. These are programs that you are hopefully already familiar with, -and therefore, whose problems are understood. +and therefore whose problems are understood. By reimplementing these programs in @command{awk}, you can focus on the @command{awk}-related aspects of solving -the programming problem. +the programming problems. The third is a grab bag of interesting programs. These solve a number of different data-manipulation and management @@ -23453,7 +23298,6 @@ cut.awk -- -c1-8 myfiles > results @node Clones @section Reinventing Wheels for Fun and Profit -@c STARTOFRANGE posimawk @cindex POSIX, programs@comma{} implementing in @command{awk} This @value{SECTION} presents a number of POSIX utilities implemented in @@ -23465,7 +23309,7 @@ It should be noted that these programs are not necessarily intended to replace the installed versions on your system. Nor may all of these programs be fully compliant with the most recent POSIX standard. This is not a problem; their -purpose is to illustrate @command{awk} language programming for ``real world'' +purpose is to illustrate @command{awk} language programming for ``real-world'' tasks. The programs are presented in alphabetical order. @@ -23484,11 +23328,8 @@ The programs are presented in alphabetical order. 
@subsection Cutting Out Fields and Columns @cindex @command{cut} utility -@c STARTOFRANGE cut @cindex @command{cut} utility -@c STARTOFRANGE ficut @cindex fields, cutting -@c STARTOFRANGE colcut @cindex columns, cutting The @command{cut} utility selects, or ``cuts,'' characters or fields from its standard input and sends them to its standard output. @@ -23497,7 +23338,7 @@ but you may supply a command-line option to change the field @dfn{delimiter} (i.e., the field-separator character). @command{cut}'s definition of fields is less general than @command{awk}'s. -A common use of @command{cut} might be to pull out just the login name of +A common use of @command{cut} might be to pull out just the login names of logged-on users from the output of @command{who}. For example, the following pipeline generates a sorted, unique list of the logged-on users: @@ -23796,21 +23637,14 @@ other @command{awk} implementations to use @code{substr()} it is also extremely painful. The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem of picking the input line apart by characters. -@c ENDOFRANGE cut -@c ENDOFRANGE ficut -@c ENDOFRANGE colcut @node Egrep Program @subsection Searching for Regular Expressions in Files -@c STARTOFRANGE regexps @cindex regular expressions, searching for -@c STARTOFRANGE sfregexp @cindex searching, files for regular expressions -@c STARTOFRANGE fsregexp @cindex files, searching for regular expressions -@c STARTOFRANGE egrep @cindex @command{egrep} utility The @command{egrep} utility searches files for patterns. It uses regular expressions that are almost identical to those available in @command{awk} @@ -24013,7 +23847,7 @@ successful or unsuccessful match. If the line does not match, the @code{next} statement just moves on to the next record. A number of additional tests are made, but they are only done if we -are not counting lines. First, if the user only wants exit status +are not counting lines. 
First, if the user only wants the exit status (@code{no_print} is true), then it is enough to know that @emph{one} line in this file matched, and we can skip on to the next file with @code{nextfile}. Similarly, if we are only printing @value{FN}s, we can @@ -24054,7 +23888,7 @@ if necessary: @end example The @code{END} rule takes care of producing the correct exit status. If -there are no matches, the exit status is one; otherwise it is zero: +there are no matches, the exit status is one; otherwise, it is zero: @example @c file eg/prog/egrep.awk @@ -24078,17 +23912,12 @@ function usage() @c endfile @end example -@c ENDOFRANGE regexps -@c ENDOFRANGE sfregexp -@c ENDOFRANGE fsregexp -@c ENDOFRANGE egrep @node Id Program @subsection Printing Out User Information @cindex printing, user information @cindex users, information about, printing -@c STARTOFRANGE id @cindex @command{id} utility The @command{id} utility lists a user's real and effective user ID numbers, real and effective group ID numbers, and the user's group set, if any. @@ -24111,7 +23940,8 @@ Here is a simple version of @command{id} written in @command{awk}. It uses the user database library functions (@pxref{Passwd Functions}) and the group database library functions -(@pxref{Group Functions}): +(@pxref{Group Functions}) +from @ref{Library Functions}. The program is fairly straightforward. All the work is done in the @code{BEGIN} rule. The user and group ID numbers are obtained from @@ -24217,16 +24047,13 @@ code that is used repeatedly, making the whole program shorter and cleaner. In particular, moving the check for the empty string into this function saves several lines of code. -@c ENDOFRANGE id @node Split Program @subsection Splitting a Large File into Pieces @c FIXME: One day, update to current POSIX version of split -@c STARTOFRANGE filspl @cindex files, splitting -@c STARTOFRANGE split @cindex @code{split} utility The @command{split} program splits large text files into smaller pieces. 
Usage is as follows:@footnote{This is the traditional usage. The @@ -24241,8 +24068,8 @@ By default, the output files are named @file{xaa}, @file{xab}, and so on. Each file has 1,000 lines in it, with the likely exception of the last file. To change the number of lines in each file, supply a number on the command line -preceded with a minus (e.g., @samp{-500} for files with 500 lines in them -instead of 1,000). To change the name of the output files to something like +preceded with a minus sign (e.g., @samp{-500} for files with 500 lines in them +instead of 1,000). To change the names of the output files to something like @file{myfileaa}, @file{myfileab}, and so on, supply an additional argument that specifies the @value{FN} prefix. @@ -24361,15 +24188,12 @@ You might want to consider how to eliminate the use of way as to solve the EBCDIC issue as well. @end ifset -@c ENDOFRANGE filspl -@c ENDOFRANGE split @node Tee Program @subsection Duplicating Output into Multiple Files @cindex files, multiple@comma{} duplicating output into @cindex output, duplicating into files -@c STARTOFRANGE tee @cindex @code{tee} utility The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies its standard input to its standard output and also duplicates it to the @@ -24482,18 +24306,14 @@ END @{ @} @c endfile @end example -@c ENDOFRANGE tee @node Uniq Program @subsection Printing Nonduplicated Lines of Text @c FIXME: One day, update to current POSIX version of uniq -@c STARTOFRANGE prunt @cindex printing, unduplicated lines of text -@c STARTOFRANGE tpul @cindex text@comma{} printing, unduplicated lines of -@c STARTOFRANGE uniq @cindex @command{uniq} utility The @command{uniq} utility reads sorted lines of data on its standard input, and by default removes duplicate lines. In other words, it only @@ -24762,26 +24582,17 @@ suggestion. 
@end ifset -@c ENDOFRANGE prunt -@c ENDOFRANGE tpul -@c ENDOFRANGE uniq @node Wc Program @subsection Counting Things @c FIXME: One day, update to current POSIX version of wc -@c STARTOFRANGE count @cindex counting -@c STARTOFRANGE infco @cindex input files, counting elements in -@c STARTOFRANGE woco @cindex words, counting -@c STARTOFRANGE chco @cindex characters, counting -@c STARTOFRANGE lico @cindex lines, counting -@c STARTOFRANGE wc @cindex @command{wc} utility The @command{wc} (word count) utility counts lines, words, and characters in one or more input files. Its usage is as follows: @@ -24951,13 +24762,6 @@ END @{ @} @c endfile @end example -@c ENDOFRANGE count -@c ENDOFRANGE infco -@c ENDOFRANGE lico -@c ENDOFRANGE woco -@c ENDOFRANGE chco -@c ENDOFRANGE wc -@c ENDOFRANGE posimawk @node Miscellaneous Programs @section A Grab Bag of @command{awk} Programs @@ -25088,9 +24892,7 @@ Aharon Robbins <arnold@skeeve.com> wrote: @author Erik Quanstrom @end quotation -@c STARTOFRANGE tialarm @cindex time, alarm clock example program -@c STARTOFRANGE alaex @cindex alarm clock example program The following program is a simple ``alarm clock'' program. You give it a time of day and an optional message. At the specified time, @@ -25106,7 +24908,7 @@ checking and setting of defaults: the delay, the count, and the message to print. If the user supplied a message without the ASCII BEL character (known as the ``alert'' character, @code{"\a"}), then it is added to the message. (On many systems, printing the ASCII BEL generates an -audible alert. Thus when the alarm goes off, the system calls attention +audible alert. Thus, when the alarm goes off, the system calls attention to itself in case the user is not looking at the computer.) 
Just for a change, this program uses a @code{switch} statement (@pxref{Switch Statement}), but the processing could be done with a series of @@ -25242,15 +25044,11 @@ seconds are necessary: @} @c endfile @end example -@c ENDOFRANGE tialarm -@c ENDOFRANGE alaex @node Translate Program @subsection Transliterating Characters -@c STARTOFRANGE chtra @cindex characters, transliterating -@c STARTOFRANGE tr @cindex @command{tr} utility The system @command{tr} utility transliterates characters. For example, it is often used to map uppercase letters into lowercase for further processing: @@ -25279,7 +25077,7 @@ to @command{gawk}. @c at least theoretically The following program was written to prove that character transliteration could be done with a user-level -function. This program is not as complete as the system @command{tr} utility +function. This program is not as complete as the system @command{tr} utility, but it does most of the job. The @command{translate} program was written long before @command{gawk} @@ -25291,13 +25089,13 @@ takes three arguments: @table @code @item from -A list of characters from which to translate. +A list of characters from which to translate @item to -A list of characters to which to translate. +A list of characters to which to translate @item target -The string on which to do the translation. +The string on which to do the translation @end table Associative arrays make the translation part fairly easy. @code{t_ar} holds @@ -25306,7 +25104,7 @@ loop goes through @code{from}, one character at a time. For each character in @code{from}, if the character appears in @code{target}, it is replaced with the corresponding @code{to} character. -The @code{translate()} function calls @code{stranslate()} using @code{$0} +The @code{translate()} function calls @code{stranslate()}, using @code{$0} as the target. 
The main program sets two global variables, @code{FROM} and @code{TO}, from the command line, and then changes @code{ARGV} so that @command{awk} reads from the standard input. @@ -25328,7 +25126,7 @@ Finally, the processing rule simply calls @code{translate()} for each record: @c endfile @end ignore @c file eg/prog/translate.awk -# Bugs: does not handle things like: tr A-Z a-z, it has +# Bugs: does not handle things like tr A-Z a-z; it has # to be spelled out. However, if `to' is shorter than `from', # the last character in `to' is used for the rest of `from'. @@ -25398,17 +25196,13 @@ such as @samp{a-z}, as allowed by the @command{tr} utility. Look at the code for @file{cut.awk} (@pxref{Cut Program}) for inspiration. -@c ENDOFRANGE chtra -@c ENDOFRANGE tr @node Labels Program @subsection Printing Mailing Labels -@c STARTOFRANGE prml @cindex printing, mailing labels -@c STARTOFRANGE mlprint @cindex mailing labels@comma{} printing -Here is a ``real world''@footnote{``Real world'' is defined as +Here is a ``real-world''@footnote{``Real world'' is defined as ``a program actually used to get something done.''} program. This script reads lists of names and @@ -25417,7 +25211,7 @@ on it, two across and 10 down. The addresses are guaranteed to be no more than five lines of data. Each address is separated from the next by a blank line. -The basic idea is to read 20 labels worth of data. Each line of each label +The basic idea is to read 20 labels' worth of data. Each line of each label is stored in the @code{line} array. The single rule takes care of filling the @code{line} array and printing the page when 20 labels have been read. @@ -25440,12 +25234,12 @@ of lines on the page Most of the work is done in the @code{printpage()} function. The label lines are stored sequentially in the @code{line} array. 
But they -have to print horizontally; @code{line[1]} next to @code{line[6]}, +have to print horizontally: @code{line[1]} next to @code{line[6]}, @code{line[2]} next to @code{line[7]}, and so on. Two loops accomplish this. The outer loop, controlled by @code{i}, steps through every 10 lines of data; this is each row of labels. The inner loop, controlled by @code{j}, goes through the lines within the row. -As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in +As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}th line in the row, and @samp{i+j+5} is the entry next to it. The output ends up looking something like this: @@ -25470,7 +25264,6 @@ that there are two blank lines at the top and two blank lines at the bottom. The @code{END} rule arranges to flush the final page of labels; there may not have been an even multiple of 20 labels in the data: -@c STARTOFRANGE labels @cindex @code{labels.awk} program @example @c file eg/prog/labels.awk @@ -25535,14 +25328,10 @@ END @{ @} @c endfile @end example -@c ENDOFRANGE prml -@c ENDOFRANGE mlprint -@c ENDOFRANGE labels @node Word Sorting @subsection Generating Word-Usage Counts -@c STARTOFRANGE worus @cindex words, usage counts@comma{} generating When working with large amounts of text, it can be interesting to know @@ -25568,8 +25357,8 @@ END @{ @} @end example -The program relies on @command{awk}'s default field splitting -mechanism to break each line up into ``words,'' and uses an +The program relies on @command{awk}'s default field-splitting +mechanism to break each line up into ``words'' and uses an associative array named @code{freq}, indexed by each word, to count the number of times the word occurs. In the @code{END} rule, it prints the counts. @@ -25604,7 +25393,6 @@ to remove punctuation characters. Finally, we solve the third problem by using the system @command{sort} utility to process the output of the @command{awk} script. 
Here is the new version of the program: -@c STARTOFRANGE wordfreq @cindex @code{wordfreq.awk} program @example @c file eg/prog/wordfreq.awk @@ -25669,16 +25457,13 @@ This way of sorting must be used on systems that do not have true pipes at the command-line (or batch-file) level. See the general operating system documentation for more information on how to use the @command{sort} program. -@c ENDOFRANGE worus -@c ENDOFRANGE wordfreq @node History Sorting @subsection Removing Duplicates from Unsorted Text -@c STARTOFRANGE lidu @cindex lines, duplicate@comma{} removing The @command{uniq} program -(@pxref{Uniq Program}), +(@pxref{Uniq Program}) removes duplicate lines from @emph{sorted} data. Suppose, however, you need to remove duplicate lines from a @value{DF} but @@ -25700,7 +25485,6 @@ Each element of @code{lines} is a unique command, and the indices of The @code{END} rule simply prints out the lines, in order: @cindex Rakitzis, Byron -@c STARTOFRANGE histsort @cindex @code{histsort.awk} program @example @c file eg/prog/histsort.awk @@ -25743,15 +25527,11 @@ print data[lines[i]], lines[i] @noindent This works because @code{data[$0]} is incremented each time a line is seen. -@c ENDOFRANGE lidu -@c ENDOFRANGE histsort @node Extract Program @subsection Extracting Programs from Texinfo Source Files -@c STARTOFRANGE texse @cindex Texinfo, extracting programs from source files -@c STARTOFRANGE fitex @cindex files, Texinfo@comma{} extracting programs from @ifnotinfo Both this chapter and the previous chapter @@ -25770,7 +25550,7 @@ Texinfo input file into separate files. @cindex Texinfo This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo}, -the GNU project's document formatting language. +the GNU Project's document formatting language. A single Texinfo source file can be used to produce both printed documentation, with @TeX{}, and online documentation. 
@ifnotinfo @@ -25829,7 +25609,7 @@ The Texinfo file looks something like this: @example @dots{} -This program has a @@code@{BEGIN@} rule, +This program has a @@code@{BEGIN@} rule that prints a nice message: @@example @@ -25855,11 +25635,10 @@ The first rule handles calling @code{system()}, checking that a command is given (@code{NF} is at least three) and also checking that the command exits with a zero exit status, signifying OK: -@c STARTOFRANGE extract @cindex @code{extract.awk} program @example @c file eg/prog/extract.awk -# extract.awk --- extract files and run programs from texinfo files +# extract.awk --- extract files and run programs from Texinfo files @c endfile @ignore @c file eg/prog/extract.awk @@ -25900,12 +25679,12 @@ The second rule handles moving data into files. It verifies that a @value{FN} is given in the directive. If the file named is not the current file, then the current file is closed. Keeping the current file open until a new file is encountered allows the use of the @samp{>} -redirection for printing the contents, keeping open file management +redirection for printing the contents, keeping open-file management simple. The @code{for} loop does the work. It reads lines using @code{getline} (@pxref{Getline}). -For an unexpected end of file, it calls the @code{@w{unexpected_eof()}} +For an unexpected end-of-file, it calls the @code{@w{unexpected_eof()}} function. If the line is an ``endfile'' line, then it breaks out of the loop. If the line is an @samp{@@group} or @samp{@@end group} line, then it @@ -26001,16 +25780,13 @@ END @{ @} @c endfile @end example -@c ENDOFRANGE texse -@c ENDOFRANGE fitex -@c ENDOFRANGE extract @node Simple Sed @subsection A Simple Stream Editor @cindex @command{sed} utility @cindex stream editors -The @command{sed} utility is a stream editor, a program that reads a +The @command{sed} utility is a @dfn{stream editor}, a program that reads a stream of data, makes changes to it, and passes it on. 
It is often used to make global changes to a large file or to a stream of data generated by a pipeline of commands. @@ -26033,7 +25809,6 @@ additional arguments are treated as @value{DF} names to process. If none are provided, the standard input is used: @cindex Brennan, Michael -@c STARTOFRANGE awksed @cindex @command{awksed.awk} program @c @cindex simple stream editor @c @cindex stream editor, simple @@ -26110,14 +25885,11 @@ The @code{usage()} function prints an error message and exits. Finally, the single rule handles the printing scheme outlined earlier, using @code{print} or @code{printf} as appropriate, depending upon the value of @code{RT}. -@c ENDOFRANGE awksed @node Igawk Program @subsection An Easy Way to Use Library Functions -@c STARTOFRANGE libfex @cindex libraries of @command{awk} functions, example program for using -@c STARTOFRANGE flibex @cindex functions, library, example program for using In @ref{Include Files}, we saw how @command{gawk} provides a built-in file-inclusion capability. However, this is a @command{gawk} extension. @@ -26159,7 +25931,7 @@ includes don't accidentally include a library function twice. @command{igawk} should behave just like @command{gawk} externally. This means it should accept all of @command{gawk}'s command-line arguments, including the ability to have multiple source files specified via -@option{-f}, and the ability to mix command-line and library source files. +@option{-f} and the ability to mix command-line and library source files. The program is written using the POSIX Shell (@command{sh}) command language.@footnote{Fully explaining the @command{sh} language is beyond @@ -26198,7 +25970,7 @@ Run the expanded program with @command{gawk} and any other original command-line arguments that the user supplied (such as the @value{DF} names). 
@end enumerate -This program uses shell variables extensively: for storing command-line arguments, +This program uses shell variables extensively: for storing command-line arguments and the text of the @command{awk} program that will expand the user's program, for the user's original program, and for the expanded program. Doing so removes some potential problems that might arise were we to use temporary files instead, @@ -26256,7 +26028,6 @@ program. The program is as follows: -@c STARTOFRANGE igawk @cindex @code{igawk.sh} program @example @c file eg/prog/igawk.sh @@ -26516,22 +26287,7 @@ Save the results of this processing in the shell variable The last step is to call @command{gawk} with the expanded program, along with the original -options and command-line arguments that the user supplied. - -@c this causes more problems than it solves, so leave it out. -@ignore -The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk} -to handle an interesting case. Suppose that the user's program only has -a @code{BEGIN} rule and there are no @value{DF}s to read. -The program should exit without reading any @value{DF}s. -However, suppose that an included library file defines an @code{END} -rule of its own. In this case, @command{gawk} will hang, reading standard -input. In order to avoid this, @file{/dev/null} is explicitly added to the -command line. Reading from @file{/dev/null} always returns an immediate -end of file indication. - -@c Hmm. Add /dev/null if $# is 0? Still messes up ARGV. Sigh. 
-@end ignore +options and command-line arguments that the user supplied: @example @c file eg/prog/igawk.sh @@ -26581,10 +26337,6 @@ features to a program; they can often be layered on top.@footnote{@command{gawk} does @code{@@include} processing itself in order to support the use of @command{awk} programs as Web CGI scripts.} -@c ENDOFRANGE libfex -@c ENDOFRANGE flibex -@c ENDOFRANGE awkpex -@c ENDOFRANGE igawk @node Anagram Program @subsection Finding Anagrams from a Dictionary @@ -26601,19 +26353,18 @@ the same letters Column 2, Problem C, of Jon Bentley's @cite{Programming Pearls}, Second Edition, presents an elegant algorithm. The idea is to give words that are anagrams a common signature, sort all the words together by their -signature, and then print them. Dr.@: Bentley observes that taking the -letters in each word and sorting them produces that common signature. +signatures, and then print them. Dr.@: Bentley observes that taking the +letters in each word and sorting them produces those common signatures. The following program uses arrays of arrays to bring together words with the same signature and array sorting to print the words in sorted order: -@c STARTOFRANGE anagram @cindex @code{anagram.awk} program @example @c file eg/prog/anagram.awk -# anagram.awk --- An implementation of the anagram finding algorithm -# from Jon Bentley's "Programming Pearls", 2nd edition. +# anagram.awk --- An implementation of the anagram-finding algorithm +# from Jon Bentley's "Programming Pearls," 2nd edition. # Addison Wesley, 2000, ISBN 0-201-65788-0. # Column 2, Problem C, section 2.8, pp 18-20. 
@c endfile @@ -26661,7 +26412,7 @@ sorts the letters, and then joins them back together: @example @c file eg/prog/anagram.awk -# word2key --- split word apart into letters, sort, joining back together +# word2key --- split word apart into letters, sort, and join back together function word2key(word, a, i, n, result) @{ @@ -26717,7 +26468,6 @@ babery yabber @dots{} @end example -@c ENDOFRANGE anagram @node Signature Program @subsection And Now for Something Completely Different @@ -26857,12 +26607,13 @@ characters. The ability to use @code{split()} with the empty string as the separator can considerably simplify such tasks. @item -The library functions from @ref{Library Functions}, proved their -usefulness for a number of real (if small) programs. +The examples here demonstrate the usefulness of the library +functions from @DBREF{Library Functions} +for a number of real (if small) programs. @item Besides reinventing POSIX wheels, other programs solved a selection of -interesting problems, such as finding duplicates words in text, printing +interesting problems, such as finding duplicate words in text, printing mailing labels, and finding anagrams. @end itemize @@ -27037,9 +26788,7 @@ It contains the following chapters: @node Advanced Features @chapter Advanced Features of @command{gawk} -@c STARTOFRANGE gawadv @cindex @command{gawk}, features, advanced -@c STARTOFRANGE advgaw @cindex advanced features, @command{gawk} @ignore Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com> @@ -27060,18 +26809,18 @@ a violent psychopath who knows where you live.} This @value{CHAPTER} discusses advanced features in @command{gawk}. It's a bit of a ``grab bag'' of items that are otherwise unrelated to each other. -First, a command-line option allows @command{gawk} to recognize +First, we look at a command-line option that allows @command{gawk} to recognize nondecimal numbers in input data, not just in @command{awk} programs. 
Then, @command{gawk}'s special features for sorting arrays are presented. Next, two-way I/O, discussed briefly in earlier parts of this @value{DOCUMENT}, is described in full detail, along with the basics -of TCP/IP networking. Finally, @command{gawk} +of TCP/IP networking. Finally, we see how @command{gawk} can @dfn{profile} an @command{awk} program, making it possible to tune it for performance. @c FULLXREF ON -A number of advanced features require separate @value{CHAPTER}s of their +Additional advanced features are discussed in separate @value{CHAPTER}s of their own: @itemize @value{BULLET} @@ -27165,7 +26914,8 @@ This option may disappear in a future version of @command{gawk}. @node Array Sorting @section Controlling Array Traversal and Array Sorting -@command{gawk} lets you control the order in which a @samp{for (i in array)} +@command{gawk} lets you control the order in which a +@samp{for (@var{indx} in @var{array})} loop traverses an array. In addition, two built-in functions, @code{asort()} and @code{asorti()}, @@ -27181,7 +26931,7 @@ to order the elements during sorting. @node Controlling Array Traversal @subsection Controlling Array Traversal -By default, the order in which a @samp{for (i in array)} loop +By default, the order in which a @samp{for (@var{indx} in @var{array})} loop scans an array is not defined; it is generally based upon the internal implementation of arrays inside @command{awk}. @@ -27210,23 +26960,23 @@ function comp_func(i1, v1, i2, v2) @} @end example -Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2} +Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2} are the corresponding values of the two elements being compared. -Either @var{v1} or @var{v2}, or both, can be arrays if the array being +Either @code{v1} or @code{v2}, or both, can be arrays if the array being traversed contains subarrays as values. (@DBXREF{Arrays of Arrays} for more information about subarrays.) 
The three possible return values are interpreted as follows: @table @code @item comp_func(i1, v1, i2, v2) < 0 -Index @var{i1} comes before index @var{i2} during loop traversal. +Index @code{i1} comes before index @code{i2} during loop traversal. @item comp_func(i1, v1, i2, v2) == 0 -Indices @var{i1} and @var{i2} -come together but the relative order with respect to each other is undefined. +Indices @code{i1} and @code{i2} +come together, but the relative order with respect to each other is undefined. @item comp_func(i1, v1, i2, v2) > 0 -Index @var{i1} comes after index @var{i2} during loop traversal. +Index @code{i1} comes after index @code{i2} during loop traversal. @end table Our first comparison function can be used to scan an array in @@ -27387,7 +27137,7 @@ As already mentioned, the order of the indices is arbitrary if two elements compare equal. This is usually not a problem, but letting the tied elements come out in arbitrary order can be an issue, especially when comparing item values. The partial ordering of the equal elements -may change the next time the array is traversed, if other elements are added or +may change the next time the array is traversed, if other elements are added to or removed from the array. One way to resolve ties when comparing elements with otherwise equal values is to include the indices in the comparison rules. Note that doing this may make the loop traversal less efficient, @@ -27430,7 +27180,7 @@ equivalent or distinct. Another point to keep in mind is that in the case of subarrays, the element values can themselves be arrays; a production comparison function should use the @code{isarray()} function -(@pxref{Type Functions}), +(@pxref{Type Functions}) to check for this, and choose a defined sorting order for subarrays. All sorting based on @code{PROCINFO["sorted_in"]} @@ -27438,7 +27188,7 @@ is disabled in POSIX mode, because the @code{PROCINFO} array is not special in that case. 
As a side note, sorting the array indices before traversing -the array has been reported to add 15% to 20% overhead to the +the array has been reported to add a 15% to 20% overhead to the execution time of @command{awk} programs. For this reason, sorted array traversal is not the default. @@ -27497,7 +27247,7 @@ However, the @code{source} array is not affected. Often, what's needed is to sort on the values of the @emph{indices} instead of the values of the elements. To do that, use the @code{asorti()} function. The interface and behavior are identical to -that of @code{asort()}, except that the index values are used for sorting, +that of @code{asort()}, except that the index values are used for sorting and become the values of the result array: @example @@ -27532,8 +27282,8 @@ it chooses}, taking into account just the indices, just the values, or both. This is extremely powerful. Once the array is sorted, @code{asort()} takes the @emph{values} in -their final order, and uses them to fill in the result array, whereas -@code{asorti()} takes the @emph{indices} in their final order, and uses +their final order and uses them to fill in the result array, whereas +@code{asorti()} takes the @emph{indices} in their final order and uses them to fill in the result array. @cindex reference counting, sorting arrays @@ -27749,7 +27499,6 @@ using regular pipes. @section Using @command{gawk} for Network Programming @cindex advanced features, network programming @cindex networks, programming -@c STARTOFRANGE tcpip @cindex TCP/IP @cindex @code{/inet/@dots{}} special files (@command{gawk}) @cindex files, @code{/inet/@dots{}} (@command{gawk}) @@ -27831,7 +27580,7 @@ service name. @cindex @command{gawk}, @code{ERRNO} variable in @cindex @code{ERRNO} variable @quotation NOTE -Failure in opening a two-way socket will result in a non-fatal error +Failure in opening a two-way socket will result in a nonfatal error being returned to the calling code. 
The value of @code{ERRNO} indicates the error (@pxref{Auto-set}). @end quotation @@ -27848,31 +27597,28 @@ BEGIN @{ @end example This program reads the current date and time from the local system's -TCP @samp{daytime} server. +TCP @code{daytime} server. It then prints the results and closes the connection. Because this topic is extensive, the use of @command{gawk} for TCP/IP programming is documented separately. @ifinfo See -@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}, +@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}}, @end ifinfo @ifnotinfo See @uref{http://www.gnu.org/software/gawk/manual/gawkinet/, -@cite{TCP/IP Internetworking with @command{gawk}}}, +@cite{@value{GAWKINETTITLE}}}, which comes as part of the @command{gawk} distribution, @end ifnotinfo for a much more complete introduction and discussion, as well as extensive examples. -@c ENDOFRANGE tcpip @node Profiling @section Profiling Your @command{awk} Programs -@c STARTOFRANGE awkp @cindex @command{awk} programs, profiling -@c STARTOFRANGE proawk @cindex profiling @command{awk} programs @cindex @code{awkprof.out} file @cindex files, @code{awkprof.out} @@ -27939,9 +27685,9 @@ junk @end example Here is the @file{awkprof.out} that results from running the -@command{gawk} profiler on this program and data. (This example also +@command{gawk} profiler on this program and data (this example also illustrates that @command{awk} programmers sometimes get up very early -in the morning to work.) +in the morning to work): @cindex @code{BEGIN} pattern, and profiling @cindex @code{END} pattern, and profiling @@ -28001,8 +27747,8 @@ They are as follows: @item The program is printed in the order @code{BEGIN} rules, @code{BEGINFILE} rules, -pattern/action rules, -@code{ENDFILE} rules, @code{END} rules and functions, listed +pattern--action rules, +@code{ENDFILE} rules, @code{END} rules, and functions, listed alphabetically. 
Multiple @code{BEGIN} and @code{END} rules retain their separate identities, as do @@ -28010,7 +27756,7 @@ multiple @code{BEGINFILE} and @code{ENDFILE} rules. @cindex patterns, counts, in a profile @item -Pattern-action rules have two counts. +Pattern--action rules have two counts. The first count, to the left of the rule, shows how many times the rule's pattern was @emph{tested}. The second count, to the right of the rule's opening left brace @@ -28077,13 +27823,13 @@ the target of a redirection isn't a scalar, it gets parenthesized. @command{gawk} supplies leading comments in front of the @code{BEGIN} and @code{END} rules, the @code{BEGINFILE} and @code{ENDFILE} rules, -the pattern/action rules, and the functions. +the pattern--action rules, and the functions. @end itemize The profiled version of your program may not look exactly like what you typed when you wrote it. This is because @command{gawk} creates the -profiled version by ``pretty printing'' its internal representation of +profiled version by ``pretty-printing'' its internal representation of the program. The advantage to this is that @command{gawk} can produce a standard representation. Also, things such as: @@ -28166,16 +27912,16 @@ If you use the @code{HUP} signal instead of the @code{USR1} signal, @cindex @code{SIGQUIT} signal (MS-Windows) @cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows) When @command{gawk} runs on MS-Windows systems, it uses the -@code{INT} and @code{QUIT} signals for producing the profile and, in +@code{INT} and @code{QUIT} signals for producing the profile, and in the case of the @code{INT} signal, @command{gawk} exits. This is because these systems don't support the @command{kill} command, so the only signals you can deliver to a program are those generated by the keyboard. The @code{INT} signal is generated by the -@kbd{Ctrl-@key{C}} or @kbd{Ctrl-@key{BREAK}} key, while the -@code{QUIT} signal is generated by the @kbd{Ctrl-@key{\}} key. 
+@kbd{Ctrl-c} or @kbd{Ctrl-BREAK} key, while the +@code{QUIT} signal is generated by the @kbd{Ctrl-\} key. Finally, @command{gawk} also accepts another option, @option{--pretty-print}. -When called this way, @command{gawk} ``pretty prints'' the program into +When called this way, @command{gawk} ``pretty-prints'' the program into @file{awkprof.out}, without any execution counts. @quotation NOTE @@ -28199,9 +27945,6 @@ that the profiling output does. This makes it easy to pretty-print your code once development is completed, and then use the result as the final version of your program. -@c ENDOFRANGE awkp -@c ENDOFRANGE proawk - @node Advanced Features Summary @section Summary @@ -28232,7 +27975,7 @@ optionally, close off one side of the two-way communications. @item By using special @value{FN}s with the @samp{|&} operator, you can open a -TCP/IP (or UDP/IP) connection to remote hosts in the Internet. @command{gawk} +TCP/IP (or UDP/IP) connection to remote hosts on the Internet. @command{gawk} supports both IPv4 and IPv6. @item @@ -28242,13 +27985,11 @@ you tune them more easily. Sending the @code{USR1} signal while profiling cause @command{gawk} to dump the profile and keep going, including a function call stack. @item -You can also just ``pretty print'' the program. This currently also runs +You can also just ``pretty-print'' the program. This currently also runs the program, but that will change in the next major release. @end itemize -@c ENDOFRANGE advgaw -@c ENDOFRANGE gawadv @node Internationalization @chapter Internationalization with @command{gawk} @@ -28261,7 +28002,6 @@ countries, they were able to sell more systems. As a result, internationalization and localization of programs and software systems became a common practice. 
-@c STARTOFRANGE inloc @cindex internationalization, localization @cindex @command{gawk}, internationalization and, See internationalization @cindex internationalization, localization, @command{gawk} and @@ -28294,7 +28034,7 @@ a requirement. @cindex localization @dfn{Internationalization} means writing (or modifying) a program once, in such a way that it can use multiple languages without requiring -further source-code changes. +further source code changes. @dfn{Localization} means providing the data necessary for an internationalized program to work in a particular language. Most typically, these terms refer to features such as the language @@ -28306,11 +28046,10 @@ monetary values are printed and read. @section GNU @command{gettext} @cindex internationalizing a program -@c STARTOFRANGE gettex @cindex @command{gettext} library @command{gawk} uses GNU @command{gettext} to provide its internationalization features. -The facilities in GNU @command{gettext} focus on messages; strings printed +The facilities in GNU @command{gettext} focus on messages: strings printed by a program, either directly or via formatting with @code{printf} or @code{sprintf()}.@footnote{For some operating systems, the @command{gawk} port doesn't support GNU @command{gettext}. @@ -28358,7 +28097,6 @@ lookup of the translations. @cindex @code{.po} files @cindex files, @code{.po} -@c STARTOFRANGE portobfi @cindex portable object files @cindex files, portable object @item @@ -28370,7 +28108,6 @@ For example, there might be a @file{fr.po} for a French translation. @cindex @code{.gmo} files @cindex files, @code{.gmo} @cindex message object files -@c STARTOFRANGE portmsgfi @cindex files, message object @item Each language's @file{.po} file is converted into a binary @@ -28498,14 +28235,12 @@ before or after the day in a date, local month abbreviations, and so on. @item LC_ALL All of the above. (Not too useful in the context of @command{gettext}.) 
@end table -@c ENDOFRANGE gettex @node Programmer i18n @section Internationalizing @command{awk} Programs -@c STARTOFRANGE inap @cindex @command{awk} programs, internationalizing -@command{gawk} provides the following variables and functions for +@command{gawk} provides the following variables for internationalization: @table @code @@ -28521,7 +28256,12 @@ value is @code{"messages"}. String constants marked with a leading underscore are candidates for translation at runtime. String constants without a leading underscore are not translated. +@end table + +@command{gawk} provides the following functions for +internationalization: +@table @code @cindexgawkfunc{dcgettext} @item @code{dcgettext(@var{string}} [@code{,} @var{domain} [@code{,} @var{category}]]@code{)} Return the translation of @var{string} in @@ -28578,15 +28318,7 @@ If @var{directory} is the null string (@code{""}), then given @var{domain}. @end table -To use these facilities in your @command{awk} program, follow the steps -outlined in -@ifnotinfo -the previous @value{SECTION}, -@end ifnotinfo -@ifinfo -@ref{Explaining gettext}, -@end ifinfo -like so: +To use these facilities in your @command{awk} program, follow these steps: @enumerate @cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and @@ -28735,8 +28467,6 @@ to provide you translations that you can also then distribute. @DBXREF{I18N Example} for the full list of steps to go through to create and test translations for @command{guide}. -@c ENDOFRANGE portobfi -@c ENDOFRANGE portmsgfi @node Printf Ordering @subsection Rearranging @code{printf} Arguments @@ -28871,7 +28601,7 @@ the null string (@code{""}) as its value, leaving the original string constant a the result. 
@item -By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()} +By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}, and @code{bindtextdomain()}, the @command{awk} program can be made to run, but all the messages are output in the original language. For example: @@ -28912,7 +28642,6 @@ However, because the positional specifications are primarily for use in @emph{translated} format strings, and because non-GNU @command{awk}s never retrieve the translated string, this should not be a problem in practice. @end itemize -@c ENDOFRANGE inap @node I18N Example @section A Simple Internationalization Example @@ -29056,15 +28785,15 @@ using the GNU @command{gettext} package. (GNU @command{gettext} is described in complete detail in @ifinfo -@inforef{Top, , GNU @command{gettext} utilities, gettext, GNU gettext tools}.) +@inforef{Top, , GNU @command{gettext} utilities, gettext, GNU @command{gettext} utilities}.) @end ifinfo @ifnotinfo @uref{http://www.gnu.org/software/gettext/manual/, -@cite{GNU gettext tools}}.) +@cite{GNU @command{gettext} utilities}}.) @end ifnotinfo As of this writing, the latest version of GNU @command{gettext} is -@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.3.tar.gz, -@value{PVERSION} 0.19.3}. +@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.4.tar.gz, +@value{PVERSION} 0.19.4}. If a translation of @command{gawk}'s messages exists, then @command{gawk} produces usage messages, warnings, @@ -29076,7 +28805,7 @@ and fatal errors in the local language. @itemize @value{BULLET} @item Internationalization means writing a program such that it can use multiple -languages without requiring source-code changes. Localization means +languages without requiring source code changes. Localization means providing the data necessary for an internationalized program to work in a particular language. @@ -29093,9 +28822,9 @@ file, and the @file{.po} files are compiled into @file{.gmo} files for use at runtime. 
@item -You can use position specifications with @code{sprintf()} and +You can use positional specifications with @code{sprintf()} and @code{printf} to rearrange the placement of argument values in formatted -strings and output. This is useful for the translations of format +strings and output. This is useful for the translation of format control strings. @item @@ -29108,7 +28837,6 @@ a number of translations for its messages. @end itemize -@c ENDOFRANGE inloc @node Debugger @chapter Debugging @command{awk} Programs @@ -29152,8 +28880,7 @@ the discussion of debugging in @command{gawk}. @subsection Debugging in General (If you have used debuggers in other languages, you may want to skip -ahead to the next section on the specific features of the @command{gawk} -debugger.) +ahead to @ref{Awk Debugging}.) Of course, a debugging program cannot remove bugs for you, because it has no way of knowing what you or your users consider a ``bug'' versus a @@ -29244,10 +28971,10 @@ and usually find the errant code quite quickly. @end table @node Awk Debugging -@subsection Awk Debugging +@subsection @command{awk} Debugging Debugging an @command{awk} program has some specific aspects that are -not shared with other programming languages. +not shared with programs written in other languages. First of all, the fact that @command{awk} programs usually take input line by line from a file or files and operate on those lines using specific @@ -29263,7 +28990,7 @@ to look at the individual primitive instructions carried out by the higher-level @command{awk} commands. @node Sample Debugging Session -@section Sample Debugging Session +@section Sample @command{gawk} Debugging Session @cindex sample debugging session In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample @@ -29282,8 +29009,8 @@ as our example. 
@cindex debugger, how to start Starting the debugger is almost exactly like running @command{gawk} normally, -except you have to pass an additional option @option{--debug}, or the -corresponding short option @option{-D}. The file(s) containing the +except you have to pass an additional option, @option{--debug}, or the +corresponding short option, @option{-D}. The file(s) containing the program and any supporting code are given on the command line as arguments to one or more @option{-f} options. (@command{gawk} is not designed to debug command-line programs, only programs contained in files.) @@ -29296,7 +29023,7 @@ $ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk -1 inputfile} @noindent where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}. (Experienced users of GDB or similar debuggers should note that -this syntax is slightly different from what they are used to. +this syntax is slightly different from what you are used to. With the @command{gawk} debugger, you give the arguments for running the program in the command line to the debugger rather than as part of the @code{run} command at the debugger prompt.) @@ -29450,10 +29177,10 @@ gawk> @kbd{n} @end example This tells us that @command{gawk} is now ready to execute line 66, which -decides whether to give the lines the special ``field skipping'' treatment +decides whether to give the lines the special ``field-skipping'' treatment indicated by the @option{-1} command-line option. (Notice that we skipped -from where we were before at line 63 to here, because the condition in line 63 -@samp{if (fcount == 0 && charcount == 0)} was false.) +from where we were before, at line 63, to here, because the condition +in line 63, @samp{if (fcount == 0 && charcount == 0)}, was false.) Continuing to step, we now get to the splitting of the current and last records: @@ -29527,7 +29254,7 @@ gawk> @kbd{n} Well, here we are at our error (sorry to spoil the suspense). 
What we had in mind was to join the fields starting from the second one to make -the virtual record to compare, and if the first field was numbered zero, +the virtual record to compare, and if the first field were numbered zero, this would work. Let's look at what we've got: @example @@ -29536,7 +29263,7 @@ gawk> @kbd{p cline clast} @print{} clast = "awk is a wonderful program!" @end example -Hey, those look pretty familiar! They're just our original, unaltered, +Hey, those look pretty familiar! They're just our original, unaltered input records. A little thinking (the human brain is still the best debugging tool), and we realize that we were off by one! @@ -29586,11 +29313,11 @@ Miscellaneous @end itemize Each of these are discussed in the following subsections. -In the following descriptions, commands which may be abbreviated +In the following descriptions, commands that may be abbreviated show the abbreviation on a second description line. A debugger command name may also be truncated if that partial name is unambiguous. The debugger has the built-in capability to -automatically repeat the previous command just by hitting @key{Enter}. +automatically repeat the previous command just by hitting @kbd{Enter}. This works for the commands @code{list}, @code{next}, @code{nexti}, @code{step}, @code{stepi}, and @code{continue} executed without any argument. @@ -29640,7 +29367,7 @@ Set a breakpoint at entry to (the first instruction of) function @var{function}. @end table -Each breakpoint is assigned a number which can be used to delete it from +Each breakpoint is assigned a number that can be used to delete it from the breakpoint list using the @code{delete} command. With a breakpoint, you may also supply a condition. This is an @@ -29692,7 +29419,7 @@ watchpoint is made unconditional). 
@cindex breakpoint, delete by number @item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] -Delete specified breakpoints or a range of breakpoints. Deletes +Delete specified breakpoints or a range of breakpoints. Delete all defined breakpoints if no argument is supplied. @cindex debugger commands, @code{disable} @@ -29701,7 +29428,7 @@ all defined breakpoints if no argument is supplied. @cindex breakpoint, how to disable or enable @item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}] Disable specified breakpoints or a range of breakpoints. Without -any argument, disables all breakpoints. +any argument, disable all breakpoints. @cindex debugger commands, @code{e} (@code{enable}) @cindex debugger commands, @code{enable} @@ -29711,18 +29438,18 @@ any argument, disables all breakpoints. @item @code{enable} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{e} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] Enable specified breakpoints or a range of breakpoints. Without -any argument, enables all breakpoints. -Optionally, you can specify how to enable the breakpoint: +any argument, enable all breakpoints. +Optionally, you can specify how to enable the breakpoints: @c nested table @table @code @item del -Enable the breakpoint(s) temporarily, then delete it when -the program stops at the breakpoint. +Enable the breakpoints temporarily, then delete each one when +the program stops at it. @item once -Enable the breakpoint(s) temporarily, then disable it when -the program stops at the breakpoint. +Enable the breakpoints temporarily, then disable each one when +the program stops at it. @end table @cindex debugger commands, @code{ignore} @@ -29790,7 +29517,7 @@ gawk> @item @code{continue} [@var{count}] @itemx @code{c} [@var{count}] Resume program execution. 
If continued from a breakpoint and @var{count} is -specified, ignores the breakpoint at that location the next @var{count} times +specified, ignore the breakpoint at that location the next @var{count} times before stopping. @cindex debugger commands, @code{finish} @@ -29844,7 +29571,7 @@ automatic display variables, and debugger options. @item @code{step} [@var{count}] @itemx @code{s} [@var{count}] Continue execution until control reaches a different source line in the -current stack frame. @code{step} steps inside any function called within +current stack frame, stepping inside any function called within the line. If the argument @var{count} is supplied, steps that many times before stopping, unless it encounters a breakpoint or watchpoint. @@ -29957,7 +29684,7 @@ or field. String values must be enclosed between double quotes (@code{"}@dots{}@code{"}). You can also set special @command{awk} variables, such as @code{FS}, -@code{NF}, @code{NR}, and son on. +@code{NF}, @code{NR}, and so on. @cindex debugger commands, @code{w} (@code{watch}) @cindex debugger commands, @code{watch} @@ -29969,7 +29696,7 @@ You can also set special @command{awk} variables, such as @code{FS}, Add variable @var{var} (or field @code{$@var{n}}) to the watch list. The debugger then stops whenever the value of the variable or field changes. Each watched item is assigned a -number which can be used to delete it from the watch list using the +number that can be used to delete it from the watch list using the @code{unwatch} command. With a watchpoint, you may also supply a condition. This is an @@ -29997,11 +29724,11 @@ watch list. @node Execution Stack @subsection Working with the Stack -Whenever you run a program which contains any function calls, +Whenever you run a program that contains any function calls, @command{gawk} maintains a stack of all of the function calls leading up to where the program is right now. 
You can see how you got to where you are, and also move around in the stack to see what the state of things was in the -functions which called the one you are in. The commands for doing this are: +functions that called the one you are in. The commands for doing this are: @table @asis @cindex debugger commands, @code{bt} (@code{backtrace}) @@ -30036,8 +29763,8 @@ Then select and print the frame. @item @code{frame} [@var{n}] @itemx @code{f} [@var{n}] Select and print stack frame @var{n}. Frame 0 is the currently executing, -or @dfn{innermost}, frame (function call), frame 1 is the frame that -called the innermost one. The highest numbered frame is the one for the +or @dfn{innermost}, frame (function call); frame 1 is the frame that +called the innermost one. The highest-numbered frame is the one for the main program. The printed information consists of the frame number, function and argument names, source file, and the source line. @@ -30053,7 +29780,7 @@ Then select and print the frame. Besides looking at the values of variables, there is often a need to get other sorts of information about the state of your program and of the -debugging environment itself. The @command{gawk} debugger has one command which +debugging environment itself. The @command{gawk} debugger has one command that provides this information, appropriately called @code{info}. @code{info} is used with one of a number of arguments that tell it exactly what you want to know: @@ -30141,12 +29868,12 @@ The available options are: @table @asis @item @code{history_size} @cindex debugger history size -The maximum number of lines to keep in the history file @file{./.gawk_history}. -The default is 100. +Set the maximum number of lines to keep in the history file +@file{./.gawk_history}. The default is 100. @item @code{listsize} @cindex debugger default list amount -The number of lines that @code{list} prints. The default is 15. +Specify the number of lines that @code{list} prints. The default is 15. 
@item @code{outfile} @cindex redirect @command{gawk} output, in debugger @@ -30156,7 +29883,7 @@ standard output. @item @code{prompt} @cindex debugger prompt -The debugger prompt. The default is @samp{@w{gawk> }}. +Change the debugger prompt. The default is @samp{@w{gawk> }}. @item @code{save_history} [@code{on} | @code{off}] @cindex debugger history file @@ -30167,7 +29894,7 @@ The default is @code{on}. @cindex save debugger options Save current options to file @file{./.gawkrc} upon exit. The default is @code{on}. -Options are read back in to the next session upon startup. +Options are read back into the next session upon startup. @item @code{trace} [@code{on} | @code{off}] @cindex instruction tracing, in debugger @@ -30190,7 +29917,7 @@ command in the file. Also, the list of commands may include additional @code{source} commands; however, the @command{gawk} debugger will not source the same file more than once in order to avoid infinite recursion. -In addition to, or instead of the @code{source} command, you can use +In addition to, or instead of, the @code{source} command, you can use the @option{-D @var{file}} or @option{--debug=@var{file}} command-line options to execute commands from a file non-interactively (@pxref{Options}). @@ -30199,16 +29926,16 @@ options to execute commands from a file non-interactively @node Miscellaneous Debugger Commands @subsection Miscellaneous Commands -There are a few more commands which do not fit into the +There are a few more commands that do not fit into the previous categories, as follows: @table @asis @cindex debugger commands, @code{dump} @cindex @code{dump} debugger command @item @code{dump} [@var{filename}] -Dump bytecode of the program to standard output or to the file +Dump byte code of the program to standard output or to the file named in @var{filename}. 
This prints a representation of the internal -instructions which @command{gawk} executes to implement the @command{awk} +instructions that @command{gawk} executes to implement the @command{awk} commands in a program. This can be very enlightening, as the following partial dump of Davide Brini's obfuscated code (@pxref{Signature Program}) demonstrates: @@ -30305,7 +30032,7 @@ Print lines centered around line number @var{n} in source file @var{filename}. This command may change the current source file. @item @var{function} -Print lines centered around beginning of the +Print lines centered around the beginning of the function @var{function}. This command may change the current source file. @end table @@ -30317,16 +30044,16 @@ function @var{function}. This command may change the current source file. @item @code{quit} @itemx @code{q} Exit the debugger. Debugging is great fun, but sometimes we all have -to tend to other obligations in life, and sometimes we find the bug, +to tend to other obligations in life, and sometimes we find the bug and are free to go on to the next one! As we saw earlier, if you are -running a program, the debugger warns you if you accidentally type +running a program, the debugger warns you when you type @samp{q} or @samp{quit}, to make sure you really want to quit. @cindex debugger commands, @code{trace} @cindex @code{trace} debugger command @item @code{trace} [@code{on} | @code{off}] -Turn on or off a continuous printing of instructions which are about to -be executed, along with printing the @command{awk} line which they +Turn on or off continuous printing of the instructions that are about to +be executed, along with the @command{awk} lines they implement. The default is @code{off}. 
It is to be hoped that most of the ``opcodes'' in these instructions are @@ -30342,7 +30069,7 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while If @command{gawk} is compiled with @uref{http://cnswww.cns.cwru.edu/php/chet/readline/readline.html, -the @code{readline} library}, you can take advantage of that library's +the GNU Readline library}, you can take advantage of that library's command completion and history expansion features. The following types of completion are available: @@ -30379,7 +30106,7 @@ and We hope you find the @command{gawk} debugger useful and enjoyable to work with, but as with any program, especially in its early releases, it still has -some limitations. A few which are worth being aware of are: +some limitations. A few that it's worth being aware of are: @itemize @value{BULLET} @item @@ -30395,13 +30122,13 @@ If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands} (or if you are already familiar with @command{gawk} internals), you will realize that much of the internal manipulation of data in @command{gawk}, as in many interpreters, is done on a stack. -@code{Op_push}, @code{Op_pop}, and the like, are the ``bread and butter'' of +@code{Op_push}, @code{Op_pop}, and the like are the ``bread and butter'' of most @command{gawk} code. Unfortunately, as of now, the @command{gawk} debugger does not allow you to examine the stack's contents. That is, the intermediate results of expression evaluation are on the -stack, but cannot be printed. Rather, only variables which are defined +stack, but cannot be printed. Rather, only variables that are defined in the program can be printed. Of course, a workaround for this is to use more explicit variables at the debugging stage and then change back to obscure, perhaps more optimal code later. 
@@ -30415,12 +30142,12 @@ programmer, you are expected to know the meaning of @item The @command{gawk} debugger is designed to be used by running a program (with all its parameters) on the command line, as described in @ref{Debugger Invocation}. -There is no way (as of now) to attach or ``break in'' to a running program. -This seems reasonable for a language which is used mainly for quickly +There is no way (as of now) to attach or ``break into'' a running program. +This seems reasonable for a language that is used mainly for quickly executing, short programs. @item -The @command{gawk} debugger only accepts source supplied with the @option{-f} option. +The @command{gawk} debugger only accepts source code supplied with the @option{-f} option. @end itemize @ignore @@ -30434,8 +30161,8 @@ be added, and of course feel free to try to add them yourself! @itemize @value{BULLET} @item Programs rarely work correctly the first time. Finding bugs -is @dfn{debugging} and a program that helps you find bugs is a -@dfn{debugger}. @command{gawk} has a built-in debugger that works very +is called debugging, and a program that helps you find bugs is a +debugger. @command{gawk} has a built-in debugger that works very similarly to the GNU Debugger, GDB. @item @@ -30455,7 +30182,7 @@ breakpoints, execution, viewing and changing data, working with the stack, getting information, and other tasks. @item -If the @code{readline} library is available when @command{gawk} is +If the GNU Readline library is available when @command{gawk} is compiled, it is used by the debugger to provide command-line history and editing. @@ -30712,7 +30439,7 @@ is available like so: @example $ @kbd{gawk --version} @print{} GNU Awk 4.1.2, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) -@print{} Copyright (C) 1989, 1991-2014 Free Software Foundation. +@print{} Copyright (C) 1989, 1991-2015 Free Software Foundation. 
@dots{} @end example @@ -31366,7 +31093,7 @@ When asked about the algorithm used, Katie replied: @quotation It's not that well known but it's not that obscure either. It's Euler's modification to Newton's method for calculating pi. -Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.htm}. +Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.html}. The algorithm I wrote simply expands the multiply by 2 and works from the innermost expression outwards. I used this to program HP calculators @@ -31416,7 +31143,7 @@ Allowing completely alphabetic strings to have valid numeric values is also a very severe departure from historical practice. @end itemize -The second problem is that the @code{gawk} maintainer feels that this +The second problem is that the @command{gawk} maintainer feels that this interpretation of the standard, which requires a certain amount of ``language lawyering'' to arrive at in the first place, was not even intended by the standard developers. In other words, ``we see how you @@ -31575,7 +31302,7 @@ When @option{--sandbox} is specified, extensions are disabled * Finding Extensions:: How @command{gawk} finds compiled extensions. * Extension Example:: Example C code for an extension. * Extension Samples:: The sample extensions that ship with - @code{gawk}. + @command{gawk}. * gawkextlib:: The @code{gawkextlib} project. * Extension summary:: Extension summary. * Extension Exercises:: Exercises. @@ -32539,7 +32266,7 @@ If the concept of a ``record terminator'' makes sense, then @code{*rt_start} should be set to point to the data to be used for @code{RT}, and @code{*rt_len} should be set to the length of the data. Otherwise, @code{*rt_len} should be set to zero. -@code{gawk} makes its own copy of this data, so the +@command{gawk} makes its own copy of this data, so the extension must manage this storage. 
@end table @@ -32585,7 +32312,7 @@ When writing an input parser, you should think about (and document) how it is expected to interact with @command{awk} code. You may want it to always be called, and take effect as appropriate (as the @code{readdir} extension does). Or you may want it to take effect -based upon the value of an @code{awk} variable, as the XML extension +based upon the value of an @command{awk} variable, as the XML extension from the @code{gawkextlib} project does (@pxref{gawkextlib}). In the latter case, code in a @code{BEGINFILE} section can look at @code{FILENAME} and @code{ERRNO} to decide whether or @@ -33368,7 +33095,7 @@ converts it to a string. Using non-integral values is possible, but requires that you understand how such values are converted to strings (@pxref{Conversion}); thus using integral values is safest. -As with @emph{all} strings passed into @code{gawk} from an extension, +As with @emph{all} strings passed into @command{gawk} from an extension, the string value of @code{index} must come from @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}, and @command{gawk} releases the storage. @@ -35654,9 +35381,7 @@ online documentation}. @node V7/SVR3.1 @appendixsec Major Changes Between V7 and SVR3.1 -@c STARTOFRANGE gawkv @cindex @command{awk}, versions of -@c STARTOFRANGE gawkv1 @cindex @command{awk}, versions of, changes between V7 and SVR3.1 The @command{awk} language evolved considerably between the release of @@ -35743,7 +35468,6 @@ Multiple @code{BEGIN} and @code{END} rules Multidimensional arrays (@pxref{Multidimensional}). @end itemize -@c ENDOFRANGE gawkv1 @node SVR4 @appendixsec Changes Between SVR3.1 and SVR4 @@ -35858,7 +35582,6 @@ not permitted by the POSIX standard. The 2008 POSIX standard can be found online at @url{http://www.opengroup.org/onlinepubs/9699919799/}. 
-@c ENDOFRANGE gawkv @node BTL @appendixsec Extensions in Brian Kernighan's @command{awk} @@ -35904,11 +35627,8 @@ available in his @command{awk}. @node POSIX/GNU @appendixsec Extensions in @command{gawk} Not in POSIX @command{awk} -@c STARTOFRANGE fripls @cindex compatibility mode (@command{gawk}), extensions -@c STARTOFRANGE exgnot @cindex extensions, in @command{gawk}, not in POSIX @command{awk} -@c STARTOFRANGE posnot @cindex POSIX, @command{gawk} extensions not included in The GNU implementation, @command{gawk}, adds a large number of features. They can all be disabled with either the @option{--traditional} or @@ -36237,9 +35957,6 @@ MirBSD @c XXX ADD MORE STUFF HERE -@c ENDOFRANGE fripls -@c ENDOFRANGE exgnot -@c ENDOFRANGE posnot @c This does not need to be in the formal book. @ifclear FOR_PRINT @@ -37326,9 +37043,7 @@ the appropriate credit where credit is due. @c last two commas are part of see also @cindex operating systems, See Also GNU/Linux@comma{} PC operating systems@comma{} Unix -@c STARTOFRANGE gligawk @cindex @command{gawk}, installing -@c STARTOFRANGE ingawk @cindex installing @command{gawk} This appendix provides instructions for installing @command{gawk} on the various platforms that are supported by the developers. The primary @@ -37438,7 +37153,6 @@ a local expert. @node Distribution contents @appendixsubsec Contents of the @command{gawk} Distribution -@c STARTOFRANGE gawdis @cindex @command{gawk}, distribution The @command{gawk} distribution has a number of C source files, @@ -37536,10 +37250,10 @@ The generated Info file for this @value{DOCUMENT}. @item doc/gawkinet.texi The Texinfo source file for @ifinfo -@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}. +@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}}. @end ifinfo @ifnotinfo -@cite{TCP/IP Internetworking with @command{gawk}}. +@cite{@value{GAWKINETTITLE}}. 
@end ifnotinfo It should be processed with @TeX{} (via @command{texi2dvi} or @command{texi2pdf}) @@ -37548,7 +37262,7 @@ with @command{makeinfo} to produce an Info or HTML file. @item doc/gawkinet.info The generated Info file for -@cite{TCP/IP Internetworking with @command{gawk}}. +@cite{@value{GAWKINETTITLE}}. @item doc/igawk.1 The @command{troff} source for a manual page describing the @command{igawk} @@ -37637,7 +37351,6 @@ directory to run your version of @command{gawk} against the test suite. If @command{gawk} successfully passes @samp{make check}, then you can be confident of a successful port. @end table -@c ENDOFRANGE gawdis @node Unix Installation @appendixsec Compiling and Installing @command{gawk} on Unix-Like Systems @@ -37788,7 +37501,7 @@ can be configured and compiled. @cindex @option{--disable-lint} configuration option @cindex configuration option, @code{--disable-lint} @item --disable-lint -Disable all lint checking within @code{gawk}. The +Disable all lint checking within @command{gawk}. The @option{--lint} and @option{--lint-old} options (@pxref{Options}) are accepted, but silently do nothing. @@ -38102,9 +37815,7 @@ multibyte functionality is not available. @node PC Using @appendixsubsubsec Using @command{gawk} on PC Operating Systems -@c STARTOFRANGE opgawx @cindex operating systems, PC, @command{gawk} on -@c STARTOFRANGE pcgawon @cindex PC operating systems, @command{gawk} on Under MS-DOS and MS-Windows, the Cygwin and MinGW environments support @@ -38612,8 +38323,6 @@ $ @kbd{gawk :== $sys$common:[syshlp.examples.tcpip.snmp]gawk.exe} This is apparently @value{PVERSION} 2.15.6, which is extremely old. We recommend compiling and using the current version. -@c ENDOFRANGE opgawx -@c ENDOFRANGE pcgawon @node Bugs @appendixsec Reporting Problems and Bugs @@ -38624,9 +38333,7 @@ recommend compiling and using the current version. @end quotation @c the radio show, not the book. 
:-) -@c STARTOFRANGE dbugg @cindex debugging @command{gawk}, bug reports -@c STARTOFRANGE tblgawb @cindex troubleshooting, @command{gawk}, bug reports If you have problems with @command{gawk} or think that you have found a bug, report it to the developers; we cannot promise to do anything @@ -38723,12 +38430,9 @@ The people maintaining the various @command{gawk} ports are: If your bug is also reproducible under Unix, send a copy of your report to the @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org} email list as well. -@c ENDOFRANGE dbugg -@c ENDOFRANGE tblgawb @node Other Versions @appendixsec Other Freely Available @command{awk} Implementations -@c STARTOFRANGE awkim @cindex @command{awk}, implementations @ignore From: emory!amc.com!brennan (Michael Brennan) @@ -38788,7 +38492,7 @@ git clone git://github.com/onetrueawk/awk bwkawk @end example @noindent -This command creates a copy of the @uref{http://www.git-scm.com, Git} +This command creates a copy of the @uref{http://git-scm.com, Git} repository in a directory named @file{bwkawk}. If you leave that argument off the @command{git} command line, the repository copy is created in a directory named @file{awk}. @@ -38853,7 +38557,7 @@ To get @command{awka}, go to @url{http://sourceforge.net/projects/awka}. @c andrewsumner@@yahoo.net The project seems to be frozen; no new code changes have been made -since approximately 2003. +since approximately 2001. @cindex Beebe, Nelson H.F.@: @cindex @command{pawk} (profiling version of Brian Kernighan's @command{awk}) @@ -38949,7 +38653,6 @@ See also the ``Versions and implementations'' section of the Wikipedia article} for information on additional versions. @end table -@c ENDOFRANGE awkim @node Installation summary @appendixsec Summary @@ -38987,15 +38690,11 @@ implementations. Many are POSIX compliant; others are less so. 
@end itemize -@c ENDOFRANGE gligawk -@c ENDOFRANGE ingawk @ifclear FOR_PRINT @node Notes @appendix Implementation Notes -@c STARTOFRANGE gawii @cindex @command{gawk}, implementation issues -@c STARTOFRANGE impis @cindex implementation issues, @command{gawk} This appendix contains information mainly of interest to implementers and @@ -39071,7 +38770,7 @@ However, if you want to modify @command{gawk} and contribute back your changes, you will probably wish to work with the development version. To do so, you will need to access the @command{gawk} source code repository. The code is maintained using the -@uref{http://git-scm.com/, Git distributed version control system}. +@uref{http://git-scm.com, Git distributed version control system}. You will need to install it if your system doesn't have it. Once you have done so, use the command: @@ -39100,11 +38799,8 @@ that has a Git plug-in for working with Git repositories. @node Adding Code @appendixsubsec Adding New Features -@c STARTOFRANGE adfgaw @cindex adding, features to @command{gawk} -@c STARTOFRANGE fadgaw @cindex features, adding to @command{gawk} -@c STARTOFRANGE gawadf @cindex @command{gawk}, features, adding You are free to add any new features you like to @command{gawk}. However, if you want your changes to be incorporated into the @command{gawk} @@ -39139,7 +38835,7 @@ for information on getting the latest version of @command{gawk}.) @item @ifnotinfo -Follow the @uref{http://www.gnu.org/prep/standards/, @cite{GNU Coding Standards}}. +Follow the @cite{GNU Coding Standards}. @end ifnotinfo @ifinfo See @inforef{Top, , Version, standards, GNU Coding Standards}. @@ -39148,7 +38844,7 @@ This document describes how GNU software should be written. If you haven't read it, please do so, preferably @emph{before} starting to modify @command{gawk}. (The @cite{GNU Coding Standards} are available from the GNU Project's -@uref{http://www.gnu.org/prep/standards_toc.html, website}. 
+@uref{http://www.gnu.org/prep/standards/, website}. Texinfo, Info, and DVI versions are also available.) @cindex @command{gawk}, coding style in @@ -39271,9 +38967,6 @@ Although this sounds like a lot of work, please remember that while you may write the new code, I have to maintain it and support it. If it isn't possible for me to do that with a minimum of extra work, then I probably will not. -@c ENDOFRANGE adfgaw -@c ENDOFRANGE gawadf -@c ENDOFRANGE fadgaw @node New Ports @appendixsubsec Porting @command{gawk} to a New Operating System @@ -39407,7 +39100,6 @@ coding style and brace layout that suits your taste. @node Derived Files @appendixsubsec Why Generated Files Are Kept In Git -@c STARTOFRANGE gawkgit @cindex Git, use of for @command{gawk} source code @c From emails written March 22, 2012, to the gawk developers list. @@ -39596,7 +39288,6 @@ wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.ta @noindent to retrieve a snapshot of the given branch. -@c ENDOFRANGE gawkgit @node Future Extensions @appendixsec Probable Future Extensions @@ -39977,13 +39668,10 @@ of @command{gawk}, but it @emph{will} be removed in the next major release. @end itemize -@c ENDOFRANGE impis -@c ENDOFRANGE gawii @node Basic Concepts @appendix Basic Programming Concepts @cindex programming, concepts -@c STARTOFRANGE procon @cindex programming, concepts This @value{APPENDIX} attempts to define some of the basic concepts @@ -40221,7 +39909,6 @@ standard for C. This standard became an ISO standard in 1990. In 1999, a revised ISO C standard was approved and released. Where it makes sense, POSIX @command{awk} is compatible with 1999 ISO C. -@c ENDOFRANGE procon @node Glossary @unnumbered Glossary @@ -40272,6 +39959,21 @@ languages. These standards often become international standards as well. See also ``ISO.'' +@item Argument +An argument can be two different things. 
It can be an option or a +@value{FN} passed to a command while invoking it from the command line, or +it can be something passed to a @dfn{function} inside a program, e.g. +inside @command{awk}. + +In the latter case, an argument can be passed to a function in two ways. +Either it is given to the called function by value, i.e., a copy of the +value of the variable is made available to the called function, but the +original variable cannot be modified by the function itself; or it is +given by reference, i.e., a pointer to the interested variable is passed to +the function, which can then directly modify it. In @command{awk} +scalars are passed by value, and arrays are passed by reference. +See ``Pass By Value/Reference.'' + @item Array A grouping of multiple values under the same name. Most languages just provide sequential arrays. @@ -40313,6 +40015,25 @@ The GNU version of the standard shell @end ifinfo See also ``Bourne Shell.'' +@item Binary +Base-two notation, where the digits are @code{0}--@code{1}. Since +electronic circuitry works ``naturally'' in base 2 (just think of Off/On), +everything inside a computer is calculated using base 2. Each digit +represents the presence (or absence) of a power of 2 and is called a +@dfn{bit}. So, for example, the base-two number @code{10101} is +the same as decimal 21, ((1 x 16) + (1 x 4) + (1 x 1)). + +Since base-two numbers quickly become +very long to read and write, they are usually grouped by 3 (i.e., they are +read as octal numbers), or by 4 (i.e., they are read as hexadecimal +numbers). There is no direct way to insert base 2 numbers in a C program. +If need arises, such numbers are usually inserted as octal or hexadecimal +numbers. The number of base-two digits that fit into registers used for +representing integer numbers in computers is a rough indication of the +computing power of the computer itself. 
Most computers nowadays use 64 +bits for representing integer numbers in their registers, but 32-bit, +16-bit and 8-bit registers have been widely used in the past. +@xref{Nondecimal-numbers}. @item Bit Short for ``Binary Digit.'' All values in computer memory ultimately reduce to binary digits: values @@ -40344,6 +40065,19 @@ The characters @samp{@{} and @samp{@}}. Braces are used in @command{awk} for delimiting actions, compound statements, and function bodies. +@item Bracket Expression +Inside a @dfn{regular expression}, an expression included in square +brackets, meant to designate a single character as belonging to a +specified character class. A bracket expression can contain a list of one +or more characters, like @samp{[abc]}, a range of characters, like +@samp{[A-Z]}, or a name, delimited by @samp{:}, that designates a known set +of characters, like @samp{[:digit:]}. The form of bracket expression +enclosed between @samp{:} is independent of the underlying representation +of the characters themselves, which could utilize the ASCII, EBCDIC, or +Unicode codesets, depending on the architecture of the computer system, and on +localization. +See also ``Regular Expression.'' + @item Built-in Function The @command{awk} language provides built-in functions that perform various numerical, I/O-related, and string computations. Examples are @@ -40397,9 +40131,25 @@ points out similarities between @command{awk} and C when appropriate. In general, @command{gawk} attempts to be as similar to the 1990 version of ISO C as makes sense. +@item C Shell +The C Shell (@command{csh} or its improved version, @command{tcsh}) is a Unix shell that was +created by Bill Joy in the late 1970s. The C shell was differentiated from +other shells by its interactive features and overall style, which +looks more like C.
The C Shell is not backward compatible with the Bourne +Shell, so special attention is required when converting scripts +written for other Unix shells to the C shell, especially with regard to the management of +shell variables. +See also ``Bourne Shell.'' + @item C++ A popular object-oriented programming language derived from C. +@item Character Class +See ``Bracket Expression.'' + +@item Character List +See ``Bracket Expression.'' + @cindex ASCII @cindex ISO 8859-1 @cindex ISO Latin-1 @@ -40423,7 +40173,7 @@ A preprocessor for @command{pic} that reads descriptions of molecules and produces @command{pic} input for drawing them. It was written in @command{awk} by Brian Kernighan and Jon Bentley, and is available from -@uref{http://netlib.sandia.gov/netlib/typesetting/chem.gz}. +@uref{http://netlib.org/typesetting/chem}. @item Comparison Expression A relation that is either true or false, such as @samp{a < b}. @@ -40439,11 +40189,23 @@ machine-executable object code. The object code is then executed directly by the computer. See also ``Interpreter.'' +@item Complemented Bracket Expression +The negation of a @dfn{bracket expression}. All that is @emph{not} +described by a given bracket expression. The symbol @samp{^} precedes +the negated bracket expression. E.g.: @samp{[^[:digit:]]} +designates whatever character is not a digit. @samp{[^bad]} +designates whatever character is not one of the letters @samp{b}, @samp{a}, +or @samp{d}. +See ``Bracket Expression.'' + @item Compound Statement A series of @command{awk} statements, enclosed in curly braces. Compound statements may be nested. (@xref{Statements}.) +@item Computed Regexps +See ``Dynamic Regular Expressions.'' + @item Concatenation Concatenating two strings means sticking them together, one after another, producing a new string. For example, the string @samp{foo} concatenated with @@ -40458,6 +40220,13 @@ expression is the value of @var{expr2}; otherwise the value is @var{expr3}.
In either case, only one of @var{expr2} and @var{expr3} is evaluated. (@xref{Conditional Exp}.) +@item Control Statement +A control statement is an instruction to perform a given operation or a set +of operations inside an @command{awk} program, if a given condition is +true. Control statements are: @code{if}, @code{for}, @code{while}, and +@code{do} +(@pxref{Statements}). + @cindex McIlroy, Doug @cindex cookie @item Cookie @@ -40612,6 +40381,11 @@ Format strings control the appearance of output in the are controlled by the format strings contained in the predefined variables @code{CONVFMT} and @code{OFMT}. (@xref{Control Letters}.) +@item Fortran +Shorthand for FORmula TRANslator, one of the first programming languages +available for scientific calculations. It was created by John Backus, +and has been available since 1957. It is still in use today. + @item Free Documentation License This document describes the terms under which this @value{DOCUMENT} is published and may be copied. (@xref{GNU Free Documentation License}.) @@ -40629,10 +40403,21 @@ Emacs editor. GNU Emacs is the most widely used version of Emacs today. See ``Free Software Foundation.'' @item Function -A specialized group of statements used to encapsulate general -or program-specific tasks. @command{awk} has a number of built-in -functions, and also allows you to define your own. -(@xref{Functions}.) +A part of an @command{awk} program that can be invoked from every point of +the program, to perform a task. @command{awk} has several built-in +functions. +Users can define their own functions in every part of the program. +Functions can be recursive, i.e., they may invoke themselves. +@xref{Functions}. +In @command{gawk} it is also possible to have functions shared +among different programs, and included where required using the +@code{@@include} directive +(@pxref{Include Files}). +In @command{gawk} the name of the function that should be invoked +can be generated at run time, i.e., dynamically.
+The @command{gawk} extension API provides constructor functions +(@pxref{Constructor Functions}). + @item @command{gawk} The GNU implementation of @command{awk}. @@ -40756,6 +40541,12 @@ meaning. Keywords are reserved and may not be used as variable names. and @code{while}. +@item Korn Shell +The Korn Shell (@command{ksh}) is a Unix shell which was developed by David Korn at Bell +Laboratories in the early 1980s. The Korn Shell is backward-compatible with the Bourne +shell and includes many features of the C shell. +See also ``Bourne Shell.'' + @cindex LGPL (Lesser General Public License) @cindex Lesser General Public License (LGPL) @cindex GNU Lesser General Public License @@ -40795,6 +40586,14 @@ Characters used within a regexp that do not stand for themselves. Instead, they denote regular expression operations, such as repetition, grouping, or alternation. +@item Nesting +Nesting is where information is organized in layers, or where objects +contain other similar objects. +In @command{gawk} the @code{@@include} +directive can be nested. The ``natural'' nesting of arithmetic and +logical operations can be changed using parentheses +(@pxref{Precedence}). + @item No-op An operation that does nothing. @@ -40815,6 +40614,11 @@ Octal numbers are written in C using a leading @samp{0}, to indicate their base. Thus, @code{013} is 11 ((1 x 8) + 3). @xref{Nondecimal-numbers}. +@item Output Record +A single chunk of data that is written out by @command{awk}. Usually, an +@command{awk} output record consists of one or more lines of text. +@xref{Records}. + @item Pattern Patterns tell @command{awk} which input records are interesting to which rules. @@ -40829,6 +40633,9 @@ An acronym describing what is possibly the most frequent source of computer usage problems. (Problem Exists Between Keyboard And Chair.) +@item Plug-in +See ``Extensions.'' + @item POSIX The name for a series of standards that specify a Portable Operating System interface. 
The ``IX'' denotes @@ -40853,6 +40660,9 @@ A sequence of consecutive lines from the input file(s). A pattern can specify ranges of input lines for @command{awk} to process or it can specify single lines. (@xref{Pattern Overview}.) +@item Record +See ``Input record'' and ``Output record.'' + @item Recursion When a function calls itself, either directly or indirectly. If this is clear, stop, and proceed to the next entry. @@ -40870,6 +40680,15 @@ operators. (@xref{Getline}, and @ref{Redirection}.) +@item Reference Counts +An internal mechanism in @command{gawk} to minimize the amount of memory +needed to store the value of string variables. If the value assumed by +a variable is used in more than one place, only one copy of the value +itself is kept, and the associated reference count is increased when the +same value is used by an additional variable, and decreased when the related +variable is no longer in use. When the reference count goes to zero, +the memory space used to store the value of the variable is freed. + @item Regexp See ``Regular Expression.'' @@ -40887,6 +40706,15 @@ slashes, such as @code{/foo/}. This regular expression is chosen when you write the @command{awk} program and cannot be changed during its execution. (@xref{Regexp Usage}.) +@item Regular Expression Operators +See ``Metacharacters.'' + +@item Rounding +Rounding the result of an arithmetic operation can be tricky. +More than one way of rounding exists, and in @command{gawk} +it is possible to choose which method should be used in a program. +@xref{Setting the rounding mode}. + @item Rule A segment of an @command{awk} program that specifies how to process single input records. A rule consists of a @dfn{pattern} and an @dfn{action}. @@ -40946,6 +40774,12 @@ A @value{FN} interpreted internally by @command{gawk}, instead of being handed directly to the underlying operating system---for example, @file{/dev/stderr}. (@xref{Special Files}.)
+@item Statement +An expression inside an @command{awk} program in the action part +of a pattern--action rule, or inside an +@command{awk} function. A statement can be a variable assignment, +an array operation, a loop, etc. + @item Stream Editor A program that reads records from an input stream and processes them one or more at a time. This is in contrast with batch programs, which may @@ -40996,9 +40830,14 @@ This is standard time in Greenwich, England, which is used as a reference time for day and date calculations. See also ``Epoch'' and ``GMT.'' +@item Variable +A name for a value. In @command{awk}, variables may be either scalars +or arrays. + @item Whitespace A sequence of space, TAB, or newline characters occurring inside an input record or a string. + @end table @end ifclear |