author | Juergen Kahrs <Juergen.Kahrs@googlemail.com> | 2013-04-28 19:30:52 +0200 |
---|---|---|
committer | Juergen Kahrs <Juergen.Kahrs@googlemail.com> | 2013-04-28 19:30:52 +0200 |
commit | 11c996c675afa475d46834b2b09039097e25afb5 (patch) | |
tree | 8e720e41b15affe811b21d74bddba14c302612fa /doc/gawk.texi | |
parent | 74db9f3cb12c4c45487b8646473daad7d0df641f (diff) | |
parent | 1dd19986291bdd1129ac08eec40d963a65170422 (diff) | |
Merge remote-tracking branch 'origin/master' into cmake
Conflicts:
README_d/ChangeLog
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 3217 |
1 files changed, 2121 insertions, 1096 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index a5b57712..fcd07a07 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -1,4 +1,10 @@ +% **************************************************** +% * DO NOT MODIFY THIS FILE!!!! * +% * It was generated from gawktexi.in by sidebar.awk * +% * Edit gawktexi.in instead. * +% **************************************************** \input texinfo @c -*-texinfo-*- +@c vim: filetype=texinfo @c %**start of header (This is for running Texinfo on a region.) @setfilename gawk.info @settitle The GNU Awk User's Guide @@ -20,7 +26,7 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH January, 2013 +@set UPDATE-MONTH April, 2013 @set VERSION 4.0 @set PATCHLEVEL 2 @@ -297,10 +303,10 @@ particular records in a file and perform operations upon them. * Library Functions:: A Library of @command{awk} Functions. * Sample Programs:: Many @command{awk} programs with complete explanations. -* Internationalization:: Getting @command{gawk} to speak your - language. * Advanced Features:: Stuff for advanced users, specific to @command{gawk}. +* Internationalization:: Getting @command{gawk} to speak your + language. * Debugger:: The @code{gawk} debugger. * Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with @command{gawk}. @@ -654,17 +660,6 @@ particular records in a file and perform operations upon them. * Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU @code{gettext} works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging @code{printf} arguments. -* I18N Portability:: @command{awk}-level portability - issues. -* I18N Example:: A simple i18n example. 
-* Gawk I18N:: @command{gawk} is also - internationalized. * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal and sorting arrays. @@ -676,6 +671,17 @@ particular records in a file and perform operations upon them. * TCP/IP Networking:: Using @command{gawk} for network programming. * Profiling:: Profiling your @command{awk} programs. +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU @code{gettext} works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging @code{printf} arguments. +* I18N Portability:: @command{awk}-level portability + issues. +* I18N Example:: A simple i18n example. +* Gawk I18N:: @command{gawk} is also + internationalized. * Debugging:: Introduction to @command{gawk} debugger. * Debugging Concepts:: Debugging in General. @@ -736,7 +742,7 @@ particular records in a file and perform operations upon them. * Output Wrappers:: Registering an output wrapper. * Two-way processors:: Registering a two-way processor. * Printing Messages:: Functions for printing messages. -* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. +* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. * Accessing Parameters:: Functions for accessing parameters. * Symbol Table Access:: Functions for accessing global variables. @@ -840,13 +846,14 @@ particular records in a file and perform operations upon them. @command{git} repository. * Future Extensions:: New features that may be implemented one day. -* Implementation Limitations:: Some limitations of the implementation. +* Implementation Limitations:: Some limitations of the + implementation. * Extension Design:: Design notes about the extension API. * Old Extension Problems:: Problems with the old mechanism. * Extension New Mechanism Goals:: Goals for the new mechanism. 
* Extension Other Design Decisions:: Some other design decisions. * Extension Future Growth:: Some room for future growth. -* Old Extension Mechansim:: Some compatibility for old extensions. +* Old Extension Mechanism:: Some compatibility for old extensions. * Basic High Level:: The high level view. * Basic Data Typing:: A very quick intro to data types. @end detailmenu @@ -1007,7 +1014,7 @@ The GNU implementation of @command{awk} is called @command{gawk}; if you invoke it with the proper options or environment variables (@pxref{Options}), it is fully compatible with -the POSIX@footnote{The 2008 POSIX standard can be found online at +the POSIX@footnote{The 2008 POSIX standard is online at @url{http://www.opengroup.org/onlinepubs/9699919799/}.} specification of the @command{awk} language and with the Unix version of @command{awk} maintained @@ -1101,20 +1108,47 @@ has been removed.) @unnumberedsec History of @command{awk} and @command{gawk} @cindex recipe for a programming language @cindex programming language, recipe for -@center Recipe For A Programming Language +@cindex sidebar, Recipe For A Programming Language +@ifdocbook +@docbook +<sidebar><title>Recipe For A Programming Language</title> +@end docbook + @multitable {2 parts} {1 part @code{egrep}} {1 part @code{snobol}} @item @tab 1 part @code{egrep} @tab 1 part @code{snobol} @item @tab 2 parts @code{ed} @tab 3 parts C @end multitable -@quotation Blend all parts well using @code{lex} and @code{yacc}. Document minimally and release. After eight years, add another part @code{egrep} and two more parts C. Document very well and release. 
-@end quotation + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Recipe For A Programming Language} + + + +@multitable {2 parts} {1 part @code{egrep}} {1 part @code{snobol}} +@item @tab 1 part @code{egrep} @tab 1 part @code{snobol} +@item @tab 2 parts @code{ed} @tab 3 parts C +@end multitable + +Blend all parts well using @code{lex} and @code{yacc}. +Document minimally and release. + +After eight years, add another part @code{egrep} and two +more parts C. Document very well and release. +@end cartouche +@end ifnotdocbook @cindex Aho, Alfred @cindex Weinberger, Peter @@ -1233,13 +1267,11 @@ You should also ignore the many cross-references; they are for the expert user and for the online Info and HTML versions of the document. @end ifnotinfo -There are -subsections labeled -as @strong{Advanced Notes} +There are sidebars scattered throughout the @value{DOCUMENT}. They add a more complete explanation of points that are relevant, but not likely to be of interest on first reading. -All appear in the index, under the heading ``advanced features.'' +All appear in the index, under the heading ``sidebar.'' Most of the time, the examples use complete @command{awk} programs. Some of the more advanced sections show only the part of the @command{awk} @@ -1319,10 +1351,6 @@ solving real problems. Part III focuses on features specific to @command{gawk}. It contains the following chapters: -@ref{Internationalization}, -describes special features in @command{gawk} for translating program -messages into different languages at runtime. - @ref{Advanced Features}, describes a number of @command{gawk}-specific advanced features. Of particular note @@ -1330,6 +1358,10 @@ are the abilities to have two-way communications with another process, perform TCP/IP networking, and profile your @command{awk} programs. 
+@ref{Internationalization}, +describes special features in @command{gawk} for translating program +messages into different languages at runtime. + @ref{Debugger}, describes the @command{awk} debugger. @ref{Arbitrary Precision Arithmetic}, @@ -1382,7 +1414,7 @@ and this @value{DOCUMENT}, respectively. @section Typographical Conventions @cindex Texinfo -This @value{DOCUMENT} is written in @uref{http://texinfo.org, Texinfo}, +This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo}, the GNU documentation formatting language. A single Texinfo source file is used to produce both the printed and online versions of the documentation. @@ -1597,10 +1629,13 @@ the title @cite{The GNU Awk User's Guide}. This edition maintains the basic structure of the previous editions. For Edition 4.0, the content has been thoroughly reviewed -and updated. All references to versions prior to 4.0 have been +and updated. All references to @command{gawk} versions prior to 4.0 have been removed. Of significant note for this edition is @ref{Debugger}. +For edition 4.1, the content has been reorganized into parts, +and the major new addition is @ref{Dynamic Extensions}. + @cite{@value{TITLE}} will undoubtedly continue to evolve. An electronic version comes with the @command{gawk} distribution from the FSF. @@ -1723,39 +1758,45 @@ The intrepid members of the GNITS mailing list, and most notably Ulrich Drepper, provided invaluable help and feedback for the design of the internationalization features. -Chuck Toporek, Mary Sheehan, and Claire Coutier of O'Reilly & Associates contributed +Chuck Toporek, Mary Sheehan, and Claire Cloutier of O'Reilly & Associates contributed significant editorial help for this @value{DOCUMENT} for the 3.1 release of @command{gawk}. 
@end quotation @cindex Beebe, Nelson @cindex Buening, Andreas +@cindex Collado, Manuel @cindex Colombo, Antonio @cindex Davies, Stephen @cindex Deifik, Scott -@cindex DuBois, John +@cindex Demaille, Akim @cindex Hankerson, Darrel -@cindex Haque, John @cindex Jaegermann, Michal @cindex Kahrs, J@"urgen @cindex Kasal, Stepan +@cindex Malmberg, John @cindex Pitts, Dave +@cindex Ramey, Chet @cindex Rankin, Pat @cindex Schorr, Andrew @cindex Vinschen, Corinna @cindex Wallin, Anders @cindex Zaretskii, Eli + Dr.@: Nelson Beebe, Andreas Buening, +Dr.@: Manuel Collado, Antonio Colombo, Stephen Davies, Scott Deifik, -John H. DuBois III, +Akim Demaille, Darrel Hankerson, Michal Jaegermann, J@"urgen Kahrs, -Dave Pitts, Stepan Kasal, +John Malmberg, +Dave Pitts, +Chet Ramey, Pat Rankin, Andrew Schorr, Corinna Vinschen, @@ -1768,20 +1809,14 @@ help, @command{gawk} would not be nearly the fine program it is today. It has been and continues to be a pleasure working with this team of fine people. -John Haque contributed the modifications to convert @command{gawk} -into a byte-code interpreter, including the debugger, and the -additional modifications for support of arbitrary precision arithmetic. -Stephen Davies -contributed to the effort to bring the byte-code changes into the mainstream -code base. -Efraim Yawitz contributed the initial text of @ref{Debugger}. -John Haque contributed the initial text of @ref{Arbitrary Precision Arithmetic}. +Notable code and documentation contributions were made by +a number of people. @xref{Contributors}, for the full list. @cindex Kernighan, Brian I would like to thank Brian Kernighan for invaluable assistance during the testing and debugging of @command{gawk}, and for ongoing help and advice in clarifying numerous points about the language. - We could not have done nearly as good a job on either @command{gawk} +We could not have done nearly as good a job on either @command{gawk} or its documentation without his help. 
@cindex Robbins, Miriam @@ -1801,7 +1836,7 @@ take advantage of those opportunities. Arnold Robbins @* Nof Ayalon @* ISRAEL @* -March, 2011 +April, 2013 @iftex @part Part I:@* The @command{awk} Language @@ -2161,8 +2196,45 @@ Self-contained @command{awk} scripts are useful when you want to write a program that users can invoke without their having to know that the program is written in @command{awk}. -@c fakenode --- for prepinfo -@subheading Advanced Notes: Portability Issues with @samp{#!} +@cindex sidebar, Portability Issues with @samp{#!} +@ifdocbook +@docbook +<sidebar><title>Portability Issues with @samp{#!}</title> +@end docbook + +@cindex portability, @code{#!} (executable scripts) + +Some systems limit the length of the interpreter name to 32 characters. +Often, this can be dealt with by using a symbolic link. + +You should not put more than one argument on the @samp{#!} +line after the path to @command{awk}. It does not work. The operating system +treats the rest of the line as a single argument and passes it to @command{awk}. +Doing this leads to confusing behavior---most likely a usage diagnostic +of some sort from @command{awk}. + +@cindex @code{ARGC}/@code{ARGV} variables, portability and +@cindex portability, @code{ARGV} variable +Finally, +the value of @code{ARGV[0]} +(@pxref{Built-in Variables}) +varies depending upon your operating system. +Some systems put @samp{awk} there, some put the full pathname +of @command{awk} (such as @file{/bin/awk}), and some put the name +of your script (@samp{advice}). @value{DARKCORNER} +Don't rely on the value of @code{ARGV[0]} +to provide your script name. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Portability Issues with @samp{#!}} + + @cindex portability, @code{#!} (executable scripts) Some systems limit the length of the interpreter name to 32 characters. 
@@ -2185,6 +2257,8 @@ of @command{awk} (such as @file{/bin/awk}), and some put the name of your script (@samp{advice}). @value{DARKCORNER} Don't rely on the value of @code{ARGV[0]} to provide your script name. +@end cartouche +@end ifnotdocbook @node Comments @subsection Comments in @command{awk} Programs @@ -3177,9 +3251,9 @@ your program. This option is very similar to the @option{-f} option, but there are two important differences. First, when @option{-i} is used, the program source will not be loaded if it has been previously loaded, whereas the @option{-f} will always load the file. -Second, because this option is intended to be used with code libraries, the -@command{awk} command does not recognize such files as constituting main program -input. Thus, after processing an @option{-i} argument, we still expect to +Second, because this option is intended to be used with code libraries, +@command{gawk} does not recognize such files as constituting main program +input. Thus, after processing an @option{-i} argument, @command{gawk} still expects to find the main source code via the @option{-f} option or on the command-line. @item -v @var{var}=@var{val} @@ -3855,6 +3929,8 @@ with the @option{-l} option rather than for source files. If the library is not found, the path is searched again after adding the appropriate shared library suffix for the platform. For example, on GNU/Linux systems, the suffix @samp{.so} is used. +The search path specified is also used for libraries loaded via the +@samp{@@load} keyword (@pxref{Loading Shared Libraries}). @node Other Environment Variables @subsection Other Environment Variables @@ -4100,6 +4176,9 @@ For command-line usage, the @option{-l} option is more convenient, but @samp{@@load} is useful for embedding inside an @command{awk} source file that requires access to a shared library. 
+@ref{Dynamic Extensions}, describes how to write extensions (in C or C++) +that can be loaded with either @samp{@@load} or the @option{-l} option. + @node Obsolete @section Obsolete Options and/or Features @@ -4485,8 +4564,12 @@ A backslash before any other character means to treat that character literally. @end itemize -@c fakenode --- for prepinfo -@subheading Advanced Notes: Backslash Before Regular Characters +@cindex sidebar, Backslash Before Regular Characters +@ifdocbook +@docbook +<sidebar><title>Backslash Before Regular Characters</title> +@end docbook + @cindex portability, backslash in escape sequences @cindex POSIX @command{awk}, backslashes in string constants @cindex backslash (@code{\}), in escape sequences, POSIX and @@ -4518,8 +4601,55 @@ In such implementations, typing @code{"a\qc"} is the same as typing @code{"a\\qc"}. @end table -@c fakenode --- for prepinfo -@subheading Advanced Notes: Escape Sequences for Metacharacters +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Backslash Before Regular Characters} + + +@cindex portability, backslash in escape sequences +@cindex POSIX @command{awk}, backslashes in string constants +@cindex backslash (@code{\}), in escape sequences, POSIX and +@cindex @code{\} (backslash), in escape sequences, POSIX and + +@cindex troubleshooting, backslash before nonspecial character +If you place a backslash in a string constant before something that is +not one of the characters previously listed, POSIX @command{awk} purposely +leaves what happens as undefined. There are two choices: + +@c @cindex automatic warnings +@c @cindex warnings, automatic +@table @asis +@item Strip the backslash out +This is what Brian Kernighan's @command{awk} and @command{gawk} both do. +For example, @code{"a\qc"} is the same as @code{"aqc"}. +(Because this is such an easy bug both to introduce and to miss, +@command{gawk} warns you about it.) 
+Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars +surrounded by whitespace as the field separator. There should be +two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.) +@c I did this! This is why I added the warning. + +@cindex @command{gawk}, escape sequences +@cindex Unix @command{awk}, backslashes in escape sequences +@item Leave the backslash alone +Some other @command{awk} implementations do this. +In such implementations, typing @code{"a\qc"} is the same as typing +@code{"a\\qc"}. +@end table +@end cartouche +@end ifnotdocbook + +@cindex sidebar, Escape Sequences for Metacharacters +@ifdocbook +@docbook +<sidebar><title>Escape Sequences for Metacharacters</title> +@end docbook + @cindex metacharacters, escape sequences for Suppose you use an octal or hexadecimal @@ -4538,6 +4668,36 @@ In compatibility mode (@pxref{Options}), escape sequences literally when used in regexp constants. Thus, @code{/a\52b/} is equivalent to @code{/a\*b/}. +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Escape Sequences for Metacharacters} + + +@cindex metacharacters, escape sequences for + +Suppose you use an octal or hexadecimal +escape to represent a regexp metacharacter. +(See @ref{Regexp Operators}.) +Does @command{awk} treat the character as a literal character or as a regexp +operator? + +@cindex dark corner, escape sequences, for metacharacters +Historically, such characters were taken literally. +@value{DARKCORNER} +However, the POSIX standard indicates that they should be treated +as real metacharacters, which is what @command{gawk} does. +In compatibility mode (@pxref{Options}), +@command{gawk} treats the characters represented by octal and hexadecimal +escape sequences literally when used in regexp constants. Thus, +@code{/a\52b/} is equivalent to @code{/a\*b/}. 
+@end cartouche +@end ifnotdocbook + @node Regexp Operators @section Regular Expression Operators @c STARTOFRANGE regexpo @@ -5305,8 +5465,12 @@ Using regexp constants is better form; it shows clearly that you intend a regexp match. @end itemize -@c fakenode --- for prepinfo -@subheading Advanced Notes: Using @code{\n} in Bracket Expressions of Dynamic Regexps +@cindex sidebar, Using @code{\n} in Bracket Expressions of Dynamic Regexps +@ifdocbook +@docbook +<sidebar><title>Using @code{\n} in Bracket Expressions of Dynamic Regexps</title> +@end docbook + @cindex regular expressions, dynamic, with embedded newlines @cindex newlines, in dynamic regexps @@ -5334,6 +5498,46 @@ $ @kbd{awk '$0 ~ /[ \t\n]/'} @command{gawk} does not have this problem, and it isn't likely to occur often in practice, but it's worth noting for future reference. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Using @code{\n} in Bracket Expressions of Dynamic Regexps} + + +@cindex regular expressions, dynamic, with embedded newlines +@cindex newlines, in dynamic regexps + +Some commercial versions of @command{awk} do not allow the newline +character to be used inside a bracket expression for a dynamic regexp: + +@example +$ @kbd{awk '$0 ~ "[ \t\n]"'} +@error{} awk: newline in character class [ +@error{} ]... +@error{} source line number 1 +@error{} context is +@error{} >>> <<< +@end example + +@cindex newlines, in regexp constants +But a newline in a regexp constant works with no problem: + +@example +$ @kbd{awk '$0 ~ /[ \t\n]/'} +@kbd{here is a sample line} +@print{} here is a sample line +@kbd{@value{CTL}-d} +@end example + +@command{gawk} does not have this problem, and it isn't likely to +occur often in practice, but it's worth noting for future reference. 
+@end cartouche +@end ifnotdocbook @c ENDOFRANGE dregexp @c ENDOFRANGE regexpd @c ENDOFRANGE regexp @@ -5625,10 +5829,12 @@ compatibility mode In compatibility mode, only the first character of the value of @code{RS} is used to determine the end of the record. -@c fakenode --- for prepinfo -@subheading Advanced Notes: @code{RS = "\0"} Is Not Portable +@cindex sidebar, @code{RS = "\0"} Is Not Portable +@ifdocbook +@docbook +<sidebar><title>@code{RS = "\0"} Is Not Portable</title> +@end docbook -@cindex advanced features, @value{DF}s as single record @cindex portability, @value{DF}s as single record There are times when you might want to treat an entire @value{DF} as a single record. The only way to make this happen is to give @code{RS} @@ -5663,6 +5869,53 @@ about.} store strings internally as C-style strings. C strings use the The best way to treat a whole file as a single record is to simply read the file in, one record at a time, concatenating each record onto the end of the previous ones. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{@code{RS = "\0"} Is Not Portable} + + +@cindex portability, @value{DF}s as single record +There are times when you might want to treat an entire @value{DF} as a +single record. The only way to make this happen is to give @code{RS} +a value that you know doesn't occur in the input file. This is hard +to do in a general way, such that a program always works for arbitrary +input files. +@c can you say `understatement' boys and girls? + +You might think that for text files, the @sc{nul} character, which +consists of a character with all bits equal to zero, is a good +value to use for @code{RS} in this case: + +@example +BEGIN @{ RS = "\0" @} # whole file becomes one record? +@end example + +@cindex differences in @command{awk} and @command{gawk}, strings, storing +@command{gawk} in fact accepts this, and uses the @sc{nul} +character for the record separator. 
+However, this usage is @emph{not} portable +to other @command{awk} implementations. + +@cindex dark corner, strings, storing +All other @command{awk} implementations@footnote{At least that we know +about.} store strings internally as C-style strings. C strings use the +@sc{nul} character as the string terminator. In effect, this means that +@samp{RS = "\0"} is the same as @samp{RS = ""}. +@value{DARKCORNER} + +@cindex records, treating files as +@cindex files, as single records +The best way to treat a whole file as a single record is to +simply read the file in, one record at a time, concatenating each +record onto the end of the previous ones. +@end cartouche +@end ifnotdocbook @c ENDOFRANGE inspl @c ENDOFRANGE recspl @@ -5991,8 +6244,37 @@ This also applies to any built-in function that updates @code{$0}, such as @code{sub()} and @code{gsub()} (@pxref{String Functions}). -@c fakenode --- for prepinfo -@subheading Advanced Notes: Understanding @code{$0} +@cindex sidebar, Understanding @code{$0} +@ifdocbook +@docbook +<sidebar><title>Understanding @code{$0}</title> +@end docbook + + +It is important to remember that @code{$0} is the @emph{full} +record, exactly as it was read from the input. This includes +any leading or trailing whitespace, and the exact whitespace (or other +characters) that separate the fields. + +It is a not-uncommon error to try to change the field separators +in a record simply by setting @code{FS} and @code{OFS}, and then +expecting a plain @samp{print} or @samp{print $0} to print the +modified record. + +But this does not work, since nothing was done to change the record +itself. Instead, you must force the record to be rebuilt, typically +with a statement such as @samp{$1 = $1}, as described earlier. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Understanding @code{$0}} + + It is important to remember that @code{$0} is the @emph{full} record, exactly as it was read from the input. 
This includes @@ -6007,6 +6289,8 @@ modified record. But this does not work, since nothing was done to change the record itself. Instead, you must force the record to be rebuilt, typically with a statement such as @samp{$1 = $1}, as described earlier. +@end cartouche +@end ifnotdocbook @c ENDOFRANGE ficon @@ -6423,8 +6707,12 @@ Each individual character in the record becomes a separate field. POSIX standard.) @end table -@c fakenode --- for prepinfo -@subheading Advanced Notes: Changing @code{FS} Does Not Affect the Fields +@cindex sidebar, Changing @code{FS} Does Not Affect the Fields +@ifdocbook +@docbook +<sidebar><title>Changing @code{FS} Does Not Affect the Fields</title> +@end docbook + @cindex POSIX @command{awk}, field separators and @cindex field separators, POSIX and @@ -6468,8 +6756,67 @@ prints something like: root:nSijPlPhZZwgE:0:0:Root:/: @end example -@c fakenode --- for prepinfo -@subheading Advanced Notes: @code{FS} and @code{IGNORECASE} +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Changing @code{FS} Does Not Affect the Fields} + + + +@cindex POSIX @command{awk}, field separators and +@cindex field separators, POSIX and +According to the POSIX standard, @command{awk} is supposed to behave +as if each record is split into fields at the time it is read. +In particular, this means that if you change the value of @code{FS} +after a record is read, the value of the fields (i.e., how they were split) +should reflect the old value of @code{FS}, not the new one. + +@cindex dark corner, field separators +@cindex @command{sed} utility +@cindex stream editors +However, many older implementations of @command{awk} do not work this way. Instead, +they defer splitting the fields until a field is actually +referenced. The fields are split +using the @emph{current} value of @code{FS}! +@value{DARKCORNER} +This behavior can be difficult +to diagnose. 
The following example illustrates the difference +between the two methods. +(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.'' +Its behavior is also defined by the POSIX standard.} +command prints just the first line of @file{/etc/passwd}.) + +@example +sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}' +@end example + +@noindent +which usually prints: + +@example +root +@end example + +@noindent +on an incorrect implementation of @command{awk}, while @command{gawk} +prints something like: + +@example +root:nSijPlPhZZwgE:0:0:Root:/: +@end example +@end cartouche +@end ifnotdocbook + +@cindex sidebar, @code{FS} and @code{IGNORECASE} +@ifdocbook +@docbook +<sidebar><title>@code{FS} and @code{IGNORECASE}</title> +@end docbook + The @code{IGNORECASE} variable (@pxref{User-modified}) @@ -6490,6 +6837,38 @@ alphabetic character while ignoring case, use a regexp that will do it for you. E.g., @samp{FS = "[c]"}. In this case, @code{IGNORECASE} will take effect. +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{@code{FS} and @code{IGNORECASE}} + + + +The @code{IGNORECASE} variable +(@pxref{User-modified}) +affects field splitting @emph{only} when the value of @code{FS} is a regexp. +It has no effect when @code{FS} is a single character, even if +that character is a letter. Thus, in the following code: + +@example +FS = "c" +IGNORECASE = 1 +$0 = "aCa" +print $1 +@end example + +@noindent +The output is @samp{aCa}. If you really want to split fields on an +alphabetic character while ignoring case, use a regexp that will +do it for you. E.g., @samp{FS = "[c]"}. In this case, @code{IGNORECASE} +will take effect. +@end cartouche +@end ifnotdocbook + @c ENDOFRANGE fisepr @c ENDOFRANGE fisepg @@ -7045,7 +7424,7 @@ write a program that does handle multiple comments on the line. @end ignore This form of the @code{getline} command sets @code{NF}, -@code{NR}, @code{FNR}, and the value of @code{$0}. 
+@code{NR}, @code{FNR}, @code{RT}, and the value of @code{$0}. @quotation NOTE The new value of @code{$0} is used to test @@ -7102,7 +7481,8 @@ free @end example The @code{getline} command used in this way sets only the variables -@code{NR} and @code{FNR} (and of course, @var{var}). The record is not +@code{NR}, @code{FNR} and @code{RT} (and of course, @var{var}). +The record is not split into fields, so the values of the fields (including @code{$0}) and the value of @code{NF} do not change. @@ -7137,6 +7517,7 @@ Because the main input stream is not used, the values of @code{NR} and @code{FNR} are not changed. However, the record it reads is split into fields in the normal manner, so the values of @code{$0} and the other fields are changed, resulting in a new value of @code{NF}. +@code{RT} is also set. @cindex POSIX @command{awk}, @code{<} operator and @c Thanks to Paul Eggert for initial wording here @@ -7275,6 +7656,7 @@ depending upon who is logged in on your system.) This variation of @code{getline} splits the record into fields, sets the value of @code{NF}, and recomputes the value of @code{$0}. The values of @code{NR} and @code{FNR} are not changed. +@code{RT} is set. @cindex POSIX @command{awk}, @code{|} I/O operator and @c Thanks to Paul Eggert for initial wording here @@ -7364,7 +7746,7 @@ The values of @code{NR} and because the main input stream is not used. However, the record is split into fields in the normal manner, thus changing the values of @code{$0}, of the other fields, -and of @code{NF}. +and of @code{NF} and @code{RT}. Coprocesses are an advanced feature. They are discussed here only because this is the @value{SECTION} on @code{getline}. @@ -7382,6 +7764,7 @@ and into the variable @var{var}. In this version of @code{getline}, none of the built-in variables are changed and the record is not split into fields. The only variable changed is @var{var}. +However, @code{RT} is set. @ifinfo Coprocesses are an advanced feature. 
They are discussed here only because @@ -7488,14 +7871,14 @@ Note: for each variant, @command{gawk} sets the @code{RT} built-in variable. @caption{@code{getline} Variants and What They Set} @multitable @columnfractions .33 .38 .27 @headitem Variant @tab Effect @tab Standard / Extension -@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, and @code{NR} @tab Standard -@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, and @code{NR} @tab Standard -@item @code{getline <} @var{file} @tab Sets @code{$0} and @code{NF} @tab Standard -@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} @tab Standard -@item @var{command} @code{| getline} @tab Sets @code{$0} and @code{NF} @tab Standard -@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} @tab Standard -@item @var{command} @code{|& getline} @tab Sets @code{$0} and @code{NF} @tab Extension -@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} @tab Extension +@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, @code{NR}, and @code{RT} @tab Standard +@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR}, @code{NR}, and @code{RT} @tab Standard +@item @code{getline <} @var{file} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Standard +@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} and @code{RT} @tab Standard +@item @var{command} @code{| getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Standard +@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} and @code{RT} @tab Standard +@item @var{command} @code{|& getline} @tab Sets @code{$0}, @code{NF}, and @code{RT} @tab Extension +@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab Extension @end multitable @end float @c ENDOFRANGE getl @@ -7515,7 +7898,7 @@ in the @code{PROCINFO} array: PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds} @end example -When set, this will cause @command{gawk} to time 
out and return failure +When set, this causes @command{gawk} to time out and return failure if no data is available to read within the specified timeout period. For example, a TCP client can decide to give up on receiving any response from the server after a certain amount of time: @@ -8609,9 +8992,12 @@ program may have open to just one! In @command{gawk}, there is no such limit. @command{gawk} allows a program to open as many pipelines as the underlying operating system permits. -@c fakenode --- for prepinfo -@subheading Advanced Notes: Piping into @command{sh} -@cindex advanced features, piping into @command{sh} +@cindex sidebar, Piping into @command{sh} +@ifdocbook +@docbook +<sidebar><title>Piping into @command{sh}</title> +@end docbook + @cindex shells, piping commands into A particularly powerful way to use redirection is to build command lines @@ -8633,6 +9019,40 @@ uppercase characters converted to lowercase The program builds up a list of command lines, using the @command{mv} utility to rename the files. It then sends the list to the shell for execution. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Piping into @command{sh}} + + +@cindex shells, piping commands into + +A particularly powerful way to use redirection is to build command lines +and pipe them into the shell, @command{sh}. For example, suppose you +have a list of files brought over from a system where all the @value{FN}s +are stored in uppercase, and you wish to rename them to have names in +all lowercase. The following program is both simple and efficient: + +@c @cindex @command{mv} utility +@example +@{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @} + +END @{ close("sh") @} +@end example + +The @code{tolower()} function returns its argument string with all +uppercase characters converted to lowercase +(@pxref{String Functions}). +The program builds up a list of command lines, +using the @command{mv} utility to rename the files. 
+It then sends the list to the shell for execution. +@end cartouche +@end ifnotdocbook @c ENDOFRANGE outre @c ENDOFRANGE reout @@ -8760,7 +9180,7 @@ to confusing results. Finally, using the @code{close()} function on a @value{FN} of the form @code{"/dev/fd/@var{N}"}, for file descriptor numbers -above two, will actually close the given file descriptor. +above two, does actually close the given file descriptor. The @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} special files are also recognized internally by several other @@ -8987,9 +9407,68 @@ delayed until @ref{Two-way I/O}, which discusses it in more detail and gives an example. -@c fakenode --- for prepinfo -@subheading Advanced Notes: Using @code{close()}'s Return Value -@cindex advanced features, @code{close()} function +@cindex sidebar, Using @code{close()}'s Return Value +@ifdocbook +@docbook +<sidebar><title>Using @code{close()}'s Return Value</title> +@end docbook + +@cindex dark corner, @code{close()} function +@cindex @code{close()} function, return values +@cindex return values@comma{} @code{close()} function +@cindex differences in @command{awk} and @command{gawk}, @code{close()} function +@cindex Unix @command{awk}, @code{close()} function and + +In many versions of Unix @command{awk}, the @code{close()} function +is actually a statement. It is a syntax error to try and use the return +value from @code{close()}: +@value{DARKCORNER} + +@example +command = "@dots{}" +command | getline info +retval = close(command) # syntax error in many Unix awks +@end example + +@cindex @command{gawk}, @code{ERRNO} variable in +@cindex @code{ERRNO} variable +@command{gawk} treats @code{close()} as a function. +The return value is @minus{}1 if the argument names something +that was never opened with a redirection, or if there is +a system problem closing the file or process. +In these cases, @command{gawk} sets the built-in variable +@code{ERRNO} to a string describing the problem. 
+ +In @command{gawk}, +when closing a pipe or coprocess (input or output), +the return value is the exit status of the command.@footnote{ +This is a full 16-bit value as returned by the @code{wait()} +system call. See the system manual pages for information on +how to decode this value.} +Otherwise, it is the return value from the system's @code{close()} or +@code{fclose()} C functions when closing input or output +files, respectively. +This value is zero if the close succeeds, or @minus{}1 if +it fails. + +The POSIX standard is very vague; it says that @code{close()} +returns zero on success and nonzero otherwise. In general, +different implementations vary in what they report when closing +pipes; thus the return value cannot be used portably. +@value{DARKCORNER} +In POSIX mode (@pxref{Options}), @command{gawk} just returns zero +when closing a pipe. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Using @code{close()}'s Return Value} + + @cindex dark corner, @code{close()} function @cindex @code{close()} function, return values @cindex return values@comma{} @code{close()} function @@ -9035,6 +9514,8 @@ pipes; thus the return value cannot be used portably. @value{DARKCORNER} In POSIX mode (@pxref{Options}), @command{gawk} just returns zero when closing a pipe. +@end cartouche +@end ifnotdocbook @c ENDOFRANGE ifc @c ENDOFRANGE ofc @@ -9221,9 +9702,35 @@ If @command{gawk} is in compatibility mode (@pxref{Options}), they are not available. 
-@c fakenode --- for prepinfo -@subheading Advanced Notes: A Constant's Base Does Not Affect Its Value -@cindex advanced features, constants@comma{} values of +@cindex sidebar, A Constant's Base Does Not Affect Its Value +@ifdocbook +@docbook +<sidebar><title>A Constant's Base Does Not Affect Its Value</title> +@end docbook + + +Once a numeric constant has +been converted internally into a number, +@command{gawk} no longer remembers +what the original form of the constant was; the internal value is +always used. This has particular consequences for conversion of +numbers to strings: + +@example +$ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'} +@print{} 0x11 is <17> +@end example + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{A Constant's Base Does Not Affect Its Value} + + Once a numeric constant has been converted internally into a number, @@ -9236,6 +9743,8 @@ numbers to strings: $ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'} @print{} 0x11 is <17> @end example +@end cartouche +@end ifnotdocbook @node Regexp Constants @subsubsection Regular Expression Constants @@ -9616,7 +10125,7 @@ point, so the default behavior was restored to use a period as the decimal point character. You can use the @option{--use-lc-numeric} option (@pxref{Options}) to force @command{gawk} to use the locale's decimal point character. (@command{gawk} also uses the locale's decimal -point character when in POSIX mode, either via @option{--posix}, or the +point character when in POSIX mode, either via @w{@option{--posix}}, or the @env{POSIXLY_CORRECT} environment variable, as shown previously.) @ref{table-locale-affects} describes the cases in which the locale's decimal @@ -10120,9 +10629,12 @@ Only the @samp{^=} operator is specified by POSIX. For maximum portability, do not use the @samp{**=} operator. 
@end quotation -@c fakenode --- for prepinfo -@subheading Advanced Notes: Syntactic Ambiguities Between @samp{/=} and Regular Expressions -@cindex advanced features, regexp constants +@cindex sidebar, Syntactic Ambiguities Between @samp{/=} and Regular Expressions +@ifdocbook +@docbook +<sidebar><title>Syntactic Ambiguities Between @samp{/=} and Regular Expressions</title> +@end docbook + @cindex dark corner, regexp constants, @code{/=} operator and @cindex @code{/} (forward slash), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant @cindex forward slash (@code{/}), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant @@ -10138,7 +10650,7 @@ For maximum portability, do not use the @samp{**=} operator. There is a syntactic ambiguity between the @code{/=} assignment operator and regexp constants whose first character is an @samp{=}. @value{DARKCORNER} -This is most notable in commercial @command{awk} versions. +This is most notable in some commercial @command{awk} versions. For example: @example @@ -10160,6 +10672,56 @@ awk '/[=]=/' /dev/null nor do the other freely available versions described in @ref{Other Versions}. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Syntactic Ambiguities Between @samp{/=} and Regular Expressions} + + +@cindex dark corner, regexp constants, @code{/=} operator and +@cindex @code{/} (forward slash), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant +@cindex forward slash (@code{/}), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant +@cindex regexp constants, @code{/=@dots{}/}, @code{/=} operator and + +@c derived from email from "Nelson H. F. Beebe" <beebe@math.utah.edu> +@c Date: Mon, 1 Sep 1997 13:38:35 -0600 (MDT) + +@cindex dark corner +@cindex ambiguity, syntactic: @code{/=} operator vs. @code{/=@dots{}/} regexp constant +@cindex syntactic ambiguity: @code{/=} operator vs. @code{/=@dots{}/} regexp constant +@cindex @code{/=} operator vs. 
@code{/=@dots{}/} regexp constant +There is a syntactic ambiguity between the @code{/=} assignment +operator and regexp constants whose first character is an @samp{=}. +@value{DARKCORNER} +This is most notable in some commercial @command{awk} versions. +For example: + +@example +$ awk /==/ /dev/null +@error{} awk: syntax error at source line 1 +@error{} context is +@error{} >>> /= <<< +@error{} awk: bailing out at source line 1 +@end example + +@noindent +A workaround is: + +@example +awk '/[=]=/' /dev/null +@end example + +@command{gawk} does not have this problem, +nor do the other +freely available versions described in +@ref{Other Versions}. +@end cartouche +@end ifnotdocbook @c ENDOFRANGE exas @c ENDOFRANGE opas @c ENDOFRANGE asop @@ -10239,9 +10801,64 @@ as the value of the expression. like @samp{@var{lvalue}++}, but instead of adding, it subtracts.) @end table -@c fakenode --- for prepinfo -@subheading Advanced Notes: Operator Evaluation Order -@cindex advanced features, operators@comma{} precedence +@cindex sidebar, Operator Evaluation Order +@ifdocbook +@docbook +<sidebar><title>Operator Evaluation Order</title> +@end docbook + +@cindex precedence +@cindex operators, precedence +@cindex portability, operators +@cindex evaluation order +@cindex Marx, Groucho +@quotation +@i{Doctor, doctor! It hurts when I do this!@* +So don't do that!}@* +Groucho Marx +@end quotation + +@noindent +What happens for something like the following? + +@example +b = 6 +print b += b++ +@end example + +@noindent +Or something even stranger? + +@example +b = 6 +b += ++b + b++ +print b +@end example + +@cindex side effects +In other words, when do the various side effects prescribed by the +postfix operators (@samp{b++}) take effect? +When side effects happen is @dfn{implementation defined}. +In other words, it is up to the particular version of @command{awk}. +The result for the first example may be 12 or 13, and for the second, it +may be 22 or 23. 
+ +In short, doing things like this is not recommended and definitely +not anything that you can rely upon for portability. +You should avoid such things in your own programs. +@c You'll sleep better at night and be able to look at yourself +@c in the mirror in the morning. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Operator Evaluation Order} + + @cindex precedence @cindex operators, precedence @cindex portability, operators @@ -10283,6 +10900,8 @@ not anything that you can rely upon for portability. You should avoid such things in your own programs. @c You'll sleep better at night and be able to look at yourself @c in the mirror in the morning. +@end cartouche +@end ifnotdocbook @c ENDOFRANGE inop @c ENDOFRANGE opde @c ENDOFRANGE deop @@ -12555,7 +13174,7 @@ any @code{ENDFILE} rules are executed except in the case as mentioned below, @code{ARGIND} is incremented, and -any @code{BEGINFILE} rules are executed +any @code{BEGINFILE} rules are executed. (@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.) With @command{gawk}, @code{nextfile} is useful inside a @code{BEGINFILE} @@ -12892,7 +13511,7 @@ character. (@xref{Output Separators}.) @cindex @code{PREC} variable @item PREC # The working precision of arbitrary precision floating-point numbers, -53 by default (@pxref{Setting Precision}). +53 bits by default (@pxref{Setting Precision}). @cindex @code{ROUNDMODE} variable @item ROUNDMODE # @@ -13129,10 +13748,9 @@ current record. @xref{Changing Fields}. @cindex @command{gawk}, @code{FUNCTAB} array in @cindex differences in @command{awk} and @command{gawk}, @code{FUNCTAB} variable @item FUNCTAB # -An array whose indices are the names of all the user-defined -or extension functions in the program. -@strong{NOTE}: The array values cannot currently be used. 
-Also, you may not use the @code{delete} statement with the +An array whose indices and corresponding values are the names of all +the user-defined or extension functions in the program. +@strong{NOTE}: You may not use the @code{delete} statement with the @code{FUNCTAB} array. @cindex @code{NR} variable @@ -13243,6 +13861,19 @@ The maximum precision supported by MPFR. The minimum precision required by MPFR. @end table +The following additional elements in the array are available to provide +information about the version of the extension API, if your version +of @command{gawk} supports dynamic loading of extension functions +(@pxref{Dynamic Extensions}): + +@table @code +@item PROCINFO["api_major"] +The major version of the extension API. + +@item PROCINFO["api_minor"] +The minor version of the extension API. +@end table + On some systems, there may be elements in the array, @code{"group1"} through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of supplementary groups that the process has. Use the @code{in} operator @@ -13251,10 +13882,21 @@ to test for these elements @cindex @command{gawk}, @code{PROCINFO} array in @cindex @code{PROCINFO} array -The @code{PROCINFO} array is also used to cause coprocesses +The @code{PROCINFO} array has the following additional uses: + +@itemize @bullet +@item +It may be +used to cause coprocesses to communicate over pseudo-ttys instead of through two-way pipes; this is discussed further in @ref{Two-way I/O}. +@item +It may be used to provide a timeout when reading from any +open input file, pipe, or coprocess. +@xref{Read Timeout}, for more information. +@end itemize + This array is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode @@ -13345,11 +13987,54 @@ are available as elements within the @code{SYMTAB} array. 
@c ENDOFRANGE bvconi @c ENDOFRANGE vbconi -@c fakenode --- for prepinfo -@subheading Advanced Notes: Changing @code{NR} and @code{FNR} +@cindex sidebar, Changing @code{NR} and @code{FNR} +@ifdocbook +@docbook +<sidebar><title>Changing @code{NR} and @code{FNR}</title> +@end docbook + +@cindex @code{NR} variable, changing +@cindex @code{FNR} variable, changing +@cindex dark corner, @code{FNR}/@code{NR} variables +@command{awk} increments @code{NR} and @code{FNR} +each time it reads a record, instead of setting them to the absolute +value of the number of records read. This means that a program can +change these variables and their new values are incremented for +each record. +@value{DARKCORNER} +The following example shows this: + +@example +$ @kbd{echo '1} +> @kbd{2} +> @kbd{3} +> @kbd{4' | awk 'NR == 2 @{ NR = 17 @}} +> @kbd{@{ print NR @}'} +@print{} 1 +@print{} 17 +@print{} 18 +@print{} 19 +@end example + +@noindent +Before @code{FNR} was added to the @command{awk} language +(@pxref{V7/SVR3.1}), +many @command{awk} programs used this feature to track the number of +records in a file by resetting @code{NR} to zero when @code{FILENAME} +changed. + +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Changing @code{NR} and @code{FNR}} + + @cindex @code{NR} variable, changing @cindex @code{FNR} variable, changing -@cindex advanced features, @code{FNR}/@code{NR} variables @cindex dark corner, @code{FNR}/@code{NR} variables @command{awk} increments @code{NR} and @code{FNR} each time it reads a record, instead of setting them to the absolute @@ -13377,6 +14062,8 @@ Before @code{FNR} was added to the @command{awk} language many @command{awk} programs used this feature to track the number of records in a file by resetting @code{NR} to zero when @code{FILENAME} changed. 
+@end cartouche +@end ifnotdocbook @node ARGC and ARGV @subsection Using @code{ARGC} and @code{ARGV} @@ -13607,7 +14294,7 @@ A contiguous array of four elements might look like the following example, conceptually, if the element values are 8, @code{"foo"}, @code{""}, and 30: -@c @strong{FIXME: NEXT ED:} Use real images here +@c @strong{FIXME: NEXT ED:} Use real images here, and an @float @iftex @c from Karl Berry, much thanks for the help. @tex @@ -13961,7 +14648,7 @@ and will vary from one version of @command{awk} to the next. Often, though, you may wish to do something simple, such as ``traverse the array by comparing the indices in ascending order,'' -or ``traverse the array by on comparing the values in descending order.'' +or ``traverse the array by comparing the values in descending order.'' @command{gawk} provides two mechanisms which give you this control. @itemize @bullet @@ -13971,7 +14658,7 @@ We describe this now. @item Set @code{PROCINFO["sorted_in"]} to the name of a user-defined function -to be used for comparison of array elements. This advanced feature +to use for comparison of array elements. This advanced feature is described later, in @ref{Array Sorting}. @end itemize @@ -14037,8 +14724,7 @@ Subarrays, if present, come out first. The array traversal order is determined before the @code{for} loop starts to run. Changing @code{PROCINFO["sorted_in"]} in the loop body -will not affect the loop. - +does not affect the loop. For example: @example @@ -14620,7 +15306,7 @@ for (i in a) @{ @end example @noindent -@xref{Walking Arrays}, for a user-defined function that will ``walk'' an +@xref{Walking Arrays}, for a user-defined function that ``walks'' an arbitrarily-dimensioned array of arrays. Recall that a reference to an uninitialized array element yields a value @@ -15949,9 +16635,12 @@ and the special cases for @code{sub()} and @code{gsub()}, we recommend the use of @command{gawk} and @code{gensub()} when you have to do substitutions. 
-@c fakenode --- for prepinfo -@subheading Advanced Notes: Matching the Null String -@cindex advanced features, null strings@comma{} matching +@cindex sidebar, Matching the Null String +@ifdocbook +@docbook +<sidebar><title>Matching the Null String</title> +@end docbook + @cindex matching, null strings @cindex null strings, matching @cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching @@ -15969,6 +16658,35 @@ $ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'} @noindent Although this makes a certain amount of sense, it can be surprising. +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Matching the Null String} + + +@cindex matching, null strings +@cindex null strings, matching +@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching +@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching + +In @command{awk}, the @samp{*} operator can match the null string. +This is particularly important for the @code{sub()}, @code{gsub()}, +and @code{gensub()} functions. For example: + +@example +$ @kbd{echo abc | awk '@{ gsub(/m*/, "X"); print @}'} +@print{} XaXbXcX +@end example + +@noindent +Although this makes a certain amount of sense, it can be surprising. 
+@end cartouche +@end ifnotdocbook + @node I/O Functions @subsection Input/Output Functions @@ -16101,9 +16819,12 @@ When @option{--sandbox} is specified, the @code{system()} function is disabled @end table -@c fakenode --- for prepinfo -@subheading Advanced Notes: Interactive Versus Noninteractive Buffering -@cindex advanced features, buffering +@cindex sidebar, Interactive Versus Noninteractive Buffering +@ifdocbook +@docbook +<sidebar><title>Interactive Versus Noninteractive Buffering</title> +@end docbook + @cindex buffering, interactive vs.@: noninteractive As a side point, buffering issues can be even more confusing, depending @@ -16145,9 +16866,65 @@ $ @kbd{awk '@{ print $1 + $2 @}' | cat} Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because it is all buffered and sent down the pipe to @command{cat} in one shot. -@c fakenode --- for prepinfo -@subheading Advanced Notes: Controlling Output Buffering with @code{system()} -@cindex advanced features, buffering +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Interactive Versus Noninteractive Buffering} + + +@cindex buffering, interactive vs.@: noninteractive + +As a side point, buffering issues can be even more confusing, depending +upon whether your program is @dfn{interactive}, i.e., communicating +with a user sitting at a keyboard.@footnote{A program is interactive +if the standard output is connected to a terminal device. On modern +systems, this means your keyboard and screen.} + +@c Thanks to Walter.Mecky@dresdnerbank.de for this example, and for +@c motivating me to write this section. +Interactive programs generally @dfn{line buffer} their output; i.e., they +write out every line. Noninteractive programs wait until they have +a full buffer, which may be many lines of output. 
+Here is an example of the difference: + +@example +$ @kbd{awk '@{ print $1 + $2 @}'} +@kbd{1 1} +@print{} 2 +@kbd{2 3} +@print{} 5 +@kbd{@value{CTL}-d} +@end example + +@noindent +Each line of output is printed immediately. Compare that behavior +with this example: + +@example +$ @kbd{awk '@{ print $1 + $2 @}' | cat} +@kbd{1 1} +@kbd{2 3} +@kbd{@value{CTL}-d} +@print{} 2 +@print{} 5 +@end example + +@noindent +Here, no output is printed until after the @kbd{@value{CTL}-d} is typed, because +it is all buffered and sent down the pipe to @command{cat} in one shot. +@end cartouche +@end ifnotdocbook + +@cindex sidebar, Controlling Output Buffering with @code{system()} +@ifdocbook +@docbook +<sidebar><title>Controlling Output Buffering with @code{system()}</title> +@end docbook + @cindex buffers, flushing @cindex buffering, input/output @cindex output, buffering @@ -16203,6 +16980,73 @@ second print If @command{awk} did not flush its buffers before calling @code{system()}, you would see the latter (undesirable) output. +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{Controlling Output Buffering with @code{system()}} + + +@cindex buffers, flushing +@cindex buffering, input/output +@cindex output, buffering + +The @code{fflush()} function provides explicit control over output buffering for +individual files and pipes. However, its use is not portable to many older +@command{awk} implementations. An alternative method to flush output +buffers is to call @code{system()} with a null string as its argument: + +@example +system("") # flush output +@end example + +@noindent +@command{gawk} treats this use of the @code{system()} function as a special +case and is smart enough not to run a shell (or other command +interpreter) with the empty command. Therefore, with @command{gawk}, this +idiom is not only useful, it is also efficient. 
While this method should work +with other @command{awk} implementations, it does not necessarily avoid +starting an unnecessary shell. (Other implementations may only +flush the buffer associated with the standard output and not necessarily +all buffered output.) + +If you think about what a programmer expects, it makes sense that +@code{system()} should flush any pending output. The following program: + +@example +BEGIN @{ + print "first print" + system("echo system echo") + print "second print" +@} +@end example + +@noindent +must print: + +@example +first print +system echo +second print +@end example + +@noindent +and not: + +@example +system echo +first print +second print +@end example + +If @command{awk} did not flush its buffers before calling @code{system()}, +you would see the latter (undesirable) output. +@end cartouche +@end ifnotdocbook + @node Time Functions @subsection Time Functions @@ -17494,8 +18338,9 @@ If @option{--lint} is specified @cindex portability, @code{next} statement in user-defined functions Some @command{awk} implementations generate a runtime -error if you use the @code{next} statement -(@pxref{Next Statement}) +error if you use either the @code{next} statement +or the @code{nextfile} statement +(@pxref{Next Statement}, also @pxref{Nextfile Statement}) inside a user-defined function. @command{gawk} does not have this limitation. @c ENDOFRANGE fudc @@ -18976,8 +19821,12 @@ END @{ endfile(_filename_) @} shows how this library function can be used and how it simplifies writing the main program. -@c fakenode --- for prepinfo -@subheading Advanced Notes: So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}? +@cindex sidebar, So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}? 
+@ifdocbook +@docbook +<sidebar><title>So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?</title> +@end docbook + You are probably wondering, if @code{beginfile()} and @code{endfile()} functions can do the job, why does @command{gawk} have @@ -18991,6 +19840,31 @@ the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch files that cannot be processed. @code{ENDFILE} exists for symmetry, and because it provides an easy way to do per-file cleanup processing. +@docbook +</sidebar> +@end docbook +@end ifdocbook + +@ifnotdocbook +@cartouche +@center @b{So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?} + + + +You are probably wondering, if @code{beginfile()} and @code{endfile()} +functions can do the job, why does @command{gawk} have +@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})? + +Good question. Normally, if @command{awk} cannot open a file, this +causes an immediate fatal error. In this case, there is no way for a +user-defined function to deal with the problem, since the mechanism for +calling it relies on the file being open and at the first record. Thus, +the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch +files that cannot be processed. @code{ENDFILE} exists for symmetry, +and because it provides an easy way to do per-file cleanup processing. +@end cartouche +@end ifnotdocbook + @node Rewind Function @subsection Rereading the Current File @@ -20419,6 +21293,20 @@ $ @kbd{gawk -f walk_array.awk} @print{} a[3] = 3 @end example +Walking an array and processing each element is a general-purpose +operation. You might want to consider generalizing the @code{walk_array()} +function by adding an additional parameter named @code{process}. + +Then, inside the loop, instead of simply printing the array element's +index and value, use the indirect function call syntax +(@pxref{Indirect Calls}) on @code{process}, passing it the index +and the value. 
+ +When calling @code{walk_array()}, you would pass the name of a user-defined +function that expects to receive an index and a value, and then processes +the element. + + @c ENDOFRANGE libfgdata @c ENDOFRANGE flibgdata @c ENDOFRANGE gdatar @@ -22789,7 +23677,7 @@ them in by hand. Here we present a program that can extract parts of a Texinfo input file into separate files. @cindex Texinfo -This @value{DOCUMENT} is written in @uref{http://texinfo.org, Texinfo}, +This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo}, the GNU project's document formatting language. A single Texinfo source file can be used to produce both printed and online documentation. @@ -23911,10 +24799,10 @@ It contains the following chapters: @itemize @bullet @item -@ref{Internationalization}. +@ref{Advanced Features}. @item -@ref{Advanced Features}. +@ref{Internationalization}. @item @ref{Debugger}. @@ -23927,795 +24815,6 @@ It contains the following chapters: @end ifdocbook @end ignore -@node Internationalization -@chapter Internationalization with @command{gawk} - -Once upon a time, computer makers -wrote software that worked only in English. -Eventually, hardware and software vendors noticed that if their -systems worked in the native languages of non-English-speaking -countries, they were able to sell more systems. -As a result, internationalization and localization -of programs and software systems became a common practice. - -@c STARTOFRANGE inloc -@cindex internationalization, localization -@cindex @command{gawk}, internationalization and, See internationalization -@cindex internationalization, localization, @command{gawk} and -For many years, the ability to provide internationalization -was largely restricted to programs written in C and C++. 
-This @value{CHAPTER} describes the underlying library @command{gawk} -uses for internationalization, as well as how -@command{gawk} makes internationalization -features available at the @command{awk} program level. -Having internationalization available at the @command{awk} level -gives software developers additional flexibility---they are no -longer forced to write in C or C++ when internationalization is -a requirement. - -@menu -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU @code{gettext} works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* I18N Example:: A simple i18n example. -* Gawk I18N:: @command{gawk} is also internationalized. -@end menu - -@node I18N and L10N -@section Internationalization and Localization - -@cindex internationalization -@cindex localization, See internationalization@comma{} localization -@cindex localization -@dfn{Internationalization} means writing (or modifying) a program once, -in such a way that it can use multiple languages without requiring -further source-code changes. -@dfn{Localization} means providing the data necessary for an -internationalized program to work in a particular language. -Most typically, these terms refer to features such as the language -used for printing error messages, the language used to read -responses, and information related to how numerical and -monetary values are printed and read. - -@node Explaining gettext -@section GNU @code{gettext} - -@cindex internationalizing a program -@c STARTOFRANGE gettex -@cindex @code{gettext} library -The facilities in GNU @code{gettext} focus on messages; strings printed -by a program, either directly or via formatting with @code{printf} or -@code{sprintf()}.@footnote{For some operating systems, the @command{gawk} -port doesn't support GNU @code{gettext}. -Therefore, these features are not available -if you are using one of those operating systems. 
Sorry.} - -@cindex portability, @code{gettext} library and -When using GNU @code{gettext}, each application has its own -@dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk}, -that identifies the application. -A complete application may have multiple components---programs written -in C or C++, as well as scripts written in @command{sh} or @command{awk}. -All of the components use the same text domain. - -To make the discussion concrete, assume we're writing an application -named @command{guide}. Internationalization consists of the -following steps, in this order: - -@enumerate -@item -The programmer goes -through the source for all of @command{guide}'s components -and marks each string that is a candidate for translation. -For example, @code{"`-F': option required"} is a good candidate for translation. -A table with strings of option names is not (e.g., @command{gawk}'s -@option{--profile} option should remain the same, no matter what the local -language). - -@cindex @code{textdomain()} function (C library) -@item -The programmer indicates the application's text domain -(@code{"guide"}) to the @code{gettext} library, -by calling the @code{textdomain()} function. - -@cindex @code{.pot} files -@cindex files, @code{.pot} -@cindex portable object template files -@cindex files, portable object template -@item -Messages from the application are extracted from the source code and -collected into a portable object template file (@file{guide.pot}), -which lists the strings and their translations. -The translations are initially empty. -The original (usually English) messages serve as the key for -lookup of the translations. - -@cindex @code{.po} files -@cindex files, @code{.po} -@cindex portable object files -@cindex files, portable object -@item -For each language with a translator, @file{guide.pot} -is copied to a portable object file (@code{.po}) -and translations are created and shipped with the application. 
-For example, there might be a @file{fr.po} for a French translation. - -@cindex @code{.mo} files -@cindex files, @code{.mo} -@cindex message object files -@cindex files, message object -@item -Each language's @file{.po} file is converted into a binary -message object (@file{.mo}) file. -A message object file contains the original messages and their -translations in a binary format that allows fast lookup of translations -at runtime. - -@item -When @command{guide} is built and installed, the binary translation files -are installed in a standard place. - -@cindex @code{bindtextdomain()} function (C library) -@item -For testing and development, it is possible to tell @code{gettext} -to use @file{.mo} files in a different directory than the standard -one by using the @code{bindtextdomain()} function. - -@cindex @code{.mo} files, specifying directory of -@cindex files, @code{.mo}, specifying directory of -@cindex message object files, specifying directory of -@cindex files, message object, specifying directory of -@item -At runtime, @command{guide} looks up each string via a call -to @code{gettext()}. The returned string is the translated string -if available, or the original string if not. - -@item -If necessary, it is possible to access messages from a different -text domain than the one belonging to the application, without -having to switch the application's default text domain back -and forth. -@end enumerate - -@cindex @code{gettext()} function (C library) -In C (or C++), the string marking and dynamic translation lookup -are accomplished by wrapping each string in a call to @code{gettext()}: - -@example -printf("%s", gettext("Don't Panic!\n")); -@end example - -The tools that extract messages from source code pull out all -strings enclosed in calls to @code{gettext()}. 
- -@cindex @code{_} (underscore), @code{_} C macro -@cindex underscore (@code{_}), @code{_} C macro -The GNU @code{gettext} developers, recognizing that typing -@samp{gettext(@dots{})} over and over again is both painful and ugly to look -at, use the macro @samp{_} (an underscore) to make things easier: - -@example -/* In the standard header file: */ -#define _(str) gettext(str) - -/* In the program text: */ -printf("%s", _("Don't Panic!\n")); -@end example - -@cindex internationalization, localization, locale categories -@cindex @code{gettext} library, locale categories -@cindex locale categories -@noindent -This reduces the typing overhead to just three extra characters per string -and is considerably easier to read as well. - -There are locale @dfn{categories} -for different types of locale-related information. -The defined locale categories that @code{gettext} knows about are: - -@table @code -@cindex @code{LC_MESSAGES} locale category -@item LC_MESSAGES -Text messages. This is the default category for @code{gettext} -operations, but it is possible to supply a different one explicitly, -if necessary. (It is almost never necessary to supply a different category.) - -@cindex sorting characters in different languages -@cindex @code{LC_COLLATE} locale category -@item LC_COLLATE -Text-collation information; i.e., how different characters -and/or groups of characters sort in a given language. - -@cindex @code{LC_CTYPE} locale category -@item LC_CTYPE -Character-type information (alphabetic, digit, upper- or lowercase, and -so on). -This information is accessed via the -POSIX character classes in regular expressions, -such as @code{/[[:alnum:]]/} -(@pxref{Regexp Operators}). - -@cindex monetary information, localization -@cindex currency symbols, localization -@cindex @code{LC_MONETARY} locale category -@item LC_MONETARY -Monetary information, such as the currency symbol, and whether the -symbol goes before or after a number. 
- -@cindex @code{LC_NUMERIC} locale category -@item LC_NUMERIC -Numeric information, such as which characters to use for the decimal -point and the thousands separator.@footnote{Americans -use a comma every three decimal places and a period for the decimal -point, while many Europeans do exactly the opposite: -1,234.56 versus 1.234,56.} - -@cindex @code{LC_RESPONSE} locale category -@item LC_RESPONSE -Response information, such as how ``yes'' and ``no'' appear in the -local language, and possibly other information as well. - -@cindex time, localization and -@cindex dates, information related to@comma{} localization -@cindex @code{LC_TIME} locale category -@item LC_TIME -Time- and date-related information, such as 12- or 24-hour clock, month printed -before or after the day in a date, local month abbreviations, and so on. - -@cindex @code{LC_ALL} locale category -@item LC_ALL -All of the above. (Not too useful in the context of @code{gettext}.) -@end table -@c ENDOFRANGE gettex - -@node Programmer i18n -@section Internationalizing @command{awk} Programs -@c STARTOFRANGE inap -@cindex @command{awk} programs, internationalizing - -@command{gawk} provides the following variables and functions for -internationalization: - -@table @code -@cindex @code{TEXTDOMAIN} variable -@item TEXTDOMAIN -This variable indicates the application's text domain. -For compatibility with GNU @code{gettext}, the default -value is @code{"messages"}. - -@cindex internationalization, localization, marked strings -@cindex strings, for localization -@item _"your message here" -String constants marked with a leading underscore -are candidates for translation at runtime. -String constants without a leading underscore are not translated. - -@cindex @code{dcgettext()} function (@command{gawk}) -@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) -Return the translation of @var{string} in -text domain @var{domain} for locale category @var{category}. 
-The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. -The default value for @var{category} is @code{"LC_MESSAGES"}. - -If you supply a value for @var{category}, it must be a string equal to -one of the known locale categories described in -@ifnotinfo -the previous @value{SECTION}. -@end ifnotinfo -@ifinfo -@ref{Explaining gettext}. -@end ifinfo -You must also supply a text domain. Use @code{TEXTDOMAIN} if -you want to use the current domain. - -@quotation CAUTION -The order of arguments to the @command{awk} version -of the @code{dcgettext()} function is purposely different from the order for -the C version. The @command{awk} version's order was -chosen to be simple and to allow for reasonable @command{awk}-style -default arguments. -@end quotation - -@cindex @code{dcngettext()} function (@command{gawk}) -@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) -Return the plural form used for @var{number} of the -translation of @var{string1} and @var{string2} in text domain -@var{domain} for locale category @var{category}. @var{string1} is the -English singular variant of a message, and @var{string2} the English plural -variant of the same message. -The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. -The default value for @var{category} is @code{"LC_MESSAGES"}. - -The same remarks about argument order as for the @code{dcgettext()} function apply. - -@cindex @code{.mo} files, specifying directory of -@cindex files, @code{.mo}, specifying directory of -@cindex message object files, specifying directory of -@cindex files, message object, specifying directory of -@cindex @code{bindtextdomain()} function (@command{gawk}) -@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) -Change the directory in which -@code{gettext} looks for @file{.mo} files, in case they -will not or cannot be placed in the standard locations -(e.g., during testing). 
-Return the directory in which @var{domain} is ``bound.'' - -The default @var{domain} is the value of @code{TEXTDOMAIN}. -If @var{directory} is the null string (@code{""}), then -@code{bindtextdomain()} returns the current binding for the -given @var{domain}. -@end table - -To use these facilities in your @command{awk} program, follow the steps -outlined in -@ifnotinfo -the previous @value{SECTION}, -@end ifnotinfo -@ifinfo -@ref{Explaining gettext}, -@end ifinfo -like so: - -@enumerate -@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and -@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and -@item -Set the variable @code{TEXTDOMAIN} to the text domain of -your program. This is best done in a @code{BEGIN} rule -(@pxref{BEGIN/END}), -or it can also be done via the @option{-v} command-line -option (@pxref{Options}): - -@example -BEGIN @{ - TEXTDOMAIN = "guide" - @dots{} -@} -@end example - -@cindex @code{_} (underscore), translatable string -@cindex underscore (@code{_}), translatable string -@item -Mark all translatable strings with a leading underscore (@samp{_}) -character. It @emph{must} be adjacent to the opening -quote of the string. For example: - -@example -print _"hello, world" -x = _"you goofed" -printf(_"Number of users is %d\n", nusers) -@end example - -@item -If you are creating strings dynamically, you can -still translate them, using the @code{dcgettext()} -built-in function: - -@example -message = nusers " users logged in" -message = dcgettext(message, "adminprog") -print message -@end example - -Here, the call to @code{dcgettext()} supplies a different -text domain (@code{"adminprog"}) in which to find the -message, but it uses the default @code{"LC_MESSAGES"} category. - -@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk}) -@item -During development, you might want to put the @file{.mo} -file in a private directory for testing. 
This is done -with the @code{bindtextdomain()} built-in function: - -@example -BEGIN @{ - TEXTDOMAIN = "guide" # our text domain - if (Testing) @{ - # where to find our files - bindtextdomain("testdir") - # joe is in charge of adminprog - bindtextdomain("../joe/testdir", "adminprog") - @} - @dots{} -@} -@end example - -@end enumerate - -@xref{I18N Example}, -for an example program showing the steps to create -and use translations from @command{awk}. - -@node Translator i18n -@section Translating @command{awk} Programs - -@cindex @code{.po} files -@cindex files, @code{.po} -@cindex portable object files -@cindex files, portable object -Once a program's translatable strings have been marked, they must -be extracted to create the initial @file{.po} file. -As part of translation, it is often helpful to rearrange the order -in which arguments to @code{printf} are output. - -@command{gawk}'s @option{--gen-pot} command-line option extracts -the messages and is discussed next. -After that, @code{printf}'s ability to -rearrange the order for @code{printf} arguments at runtime -is covered. - -@menu -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging @code{printf} arguments. -* I18N Portability:: @command{awk}-level portability issues. -@end menu - -@node String Extraction -@subsection Extracting Marked Strings -@cindex strings, extracting -@cindex marked strings@comma{} extracting -@cindex @code{--gen-pot} option -@cindex command-line options, string extraction -@cindex string extraction (internationalization) -@cindex marked string extraction (internationalization) -@cindex extraction, of marked strings (internationalization) - -@cindex @code{--gen-pot} option -Once your @command{awk} program is working, and all the strings have -been marked and you've set (and perhaps bound) the text domain, -it is time to produce translations. 
-First, use the @option{--gen-pot} command-line option to create -the initial @file{.pot} file: - -@example -$ @kbd{gawk --gen-pot -f guide.awk > guide.pot} -@end example - -@cindex @code{xgettext} utility -When run with @option{--gen-pot}, @command{gawk} does not execute your -program. Instead, it parses it as usual and prints all marked strings -to standard output in the format of a GNU @code{gettext} Portable Object -file. Also included in the output are any constant strings that -appear as the first argument to @code{dcgettext()} or as the first and -second arguments to @code{dcngettext()}.@footnote{The -@command{xgettext} utility that comes with GNU -@code{gettext} can handle @file{.awk} files.} -@xref{I18N Example}, -for the full list of steps to go through to create and test -translations for @command{guide}. - -@node Printf Ordering -@subsection Rearranging @code{printf} Arguments - -@cindex @code{printf} statement, positional specifiers -@cindex positional specifiers, @code{printf} statement -Format strings for @code{printf} and @code{sprintf()} -(@pxref{Printf}) -present a special problem for translation. -Consider the following:@footnote{This example is borrowed -from the GNU @code{gettext} manual.} - -@c line broken here only for smallbook format -@example -printf(_"String `%s' has %d characters\n", - string, length(string)) -@end example - -A possible German translation for this might be: - -@example -"%d Zeichen lang ist die Zeichenkette `%s'\n" -@end example - -The problem should be obvious: the order of the format -specifications is different from the original! -Even though @code{gettext()} can return the translated string -at runtime, -it cannot change the argument order in the call to @code{printf}. - -To solve this problem, @code{printf} format specifiers may have -an additional optional element, which we call a @dfn{positional specifier}.
-For example: - -@example -"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n" -@end example - -Here, the positional specifier consists of an integer count, which indicates which -argument to use, and a @samp{$}. Counts are one-based, and the -format string itself is @emph{not} included. Thus, in the following -example, @samp{string} is the first argument and @samp{length(string)} is the second: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{string = "Dont Panic"} -> @kbd{printf _"%2$d characters live in \"%1$s\"\n",} -> @kbd{string, length(string)} -> @kbd{@}'} -@print{} 10 characters live in "Dont Panic" -@end example - -If present, positional specifiers come first in the format specification, -before the flags, the field width, and/or the precision. - -Positional specifiers can be used with the dynamic field width and -precision capability: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{printf("%*.*s\n", 10, 20, "hello")} -> @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")} -> @kbd{@}'} -@print{} hello -@print{} hello -@end example - -@quotation NOTE -When using @samp{*} with a positional specifier, the @samp{*} -comes first, then the integer position, and then the @samp{$}. -This is somewhat counterintuitive. -@end quotation - -@cindex @code{printf} statement, positional specifiers, mixing with regular formats -@cindex positional specifiers, @code{printf} statement, mixing with regular formats -@cindex format specifiers, mixing regular with positional specifiers -@command{gawk} does not allow you to mix regular format specifiers -and those with positional specifiers in the same string: - -@example -$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'} -@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none -@end example - -@quotation NOTE -There are some pathological cases that @command{gawk} may fail to -diagnose. In such cases, the output may not be what you expect. 
-It's still a bad idea to try mixing them, even if @command{gawk} -doesn't detect it. -@end quotation - -Although positional specifiers can be used directly in @command{awk} programs, -their primary purpose is to help in producing correct translations of -format strings into languages different from the one in which the program -is first written. - -@node I18N Portability -@subsection @command{awk} Portability Issues - -@cindex portability, internationalization and -@cindex internationalization, localization, portability and -@command{gawk}'s internationalization features were purposely chosen to -have as little impact as possible on the portability of @command{awk} -programs that use them to other versions of @command{awk}. -Consider this program: - -@example -BEGIN @{ - TEXTDOMAIN = "guide" - if (Test_Guide) # set with -v - bindtextdomain("/test/guide/messages") - print _"don't panic!" -@} -@end example - -@noindent -As written, it won't work on other versions of @command{awk}. -However, it is actually almost portable, requiring very little -change: - -@itemize @bullet -@cindex @code{TEXTDOMAIN} variable, portability and -@item -Assignments to @code{TEXTDOMAIN} won't have any effect, -since @code{TEXTDOMAIN} is not special in other @command{awk} implementations. - -@item -Non-GNU versions of @command{awk} treat marked strings -as the concatenation of a variable named @code{_} with the string -following it.@footnote{This is good fodder for an ``Obfuscated -@command{awk}'' contest.} Typically, the variable @code{_} has -the null string (@code{""}) as its value, leaving the original string constant as -the result. - -@item -By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()} -and @code{bindtextdomain()}, the @command{awk} program can be made to run, but -all the messages are output in the original language. 
-For example: - -@cindex @code{bindtextdomain()} function (@command{gawk}), portability and -@cindex @code{dcgettext()} function (@command{gawk}), portability and -@cindex @code{dcngettext()} function (@command{gawk}), portability and -@example -@c file eg/lib/libintl.awk -function bindtextdomain(dir, domain) -@{ - return dir -@} - -function dcgettext(string, domain, category) -@{ - return string -@} - -function dcngettext(string1, string2, number, domain, category) -@{ - return (number == 1 ? string1 : string2) -@} -@c endfile -@end example - -@item -The use of positional specifications in @code{printf} or -@code{sprintf()} is @emph{not} portable. -To support @code{gettext()} at the C level, many systems' C versions of -@code{sprintf()} do support positional specifiers. But it works only if -enough arguments are supplied in the function call. Many versions of -@command{awk} pass @code{printf} formats and arguments unchanged to the -underlying C library version of @code{sprintf()}, but only one format and -argument at a time. What happens if a positional specification is -used is anybody's guess. -However, since the positional specifications are primarily for use in -@emph{translated} format strings, and since non-GNU @command{awk}s never -retrieve the translated string, this should not be a problem in practice. -@end itemize -@c ENDOFRANGE inap - -@node I18N Example -@section A Simple Internationalization Example - -Now let's look at a step-by-step example of how to internationalize and -localize a simple @command{awk} program, using @file{guide.awk} as our -original source: - -@example -@c file eg/prog/guide.awk -BEGIN @{ - TEXTDOMAIN = "guide" - bindtextdomain(".") # for testing - print _"Don't Panic" - print _"The Answer Is", 42 - print "Pardon me, Zaphod who?" 
-@} -@c endfile -@end example - -@noindent -Run @samp{gawk --gen-pot} to create the @file{.pot} file: - -@example -$ @kbd{gawk --gen-pot -f guide.awk > guide.pot} -@end example - -@noindent -This produces: - -@example -@c file eg/data/guide.po -#: guide.awk:4 -msgid "Don't Panic" -msgstr "" - -#: guide.awk:5 -msgid "The Answer Is" -msgstr "" - -@c endfile -@end example - -This original portable object template file is saved and reused for each language -into which the application is translated. The @code{msgid} -is the original string and the @code{msgstr} is the translation. - -@quotation NOTE -Strings not marked with a leading underscore do not -appear in the @file{guide.pot} file. -@end quotation - -Next, the messages must be translated. -Here is a translation to a hypothetical dialect of English, -called ``Mellow'':@footnote{Perhaps it would be better if it were -called ``Hippy.'' Ah, well.} - -@example -@group -$ cp guide.pot guide-mellow.po -@var{Add translations to} guide-mellow.po @dots{} -@end group -@end example - -@noindent -Following are the translations: - -@example -@c file eg/data/guide-mellow.po -#: guide.awk:4 -msgid "Don't Panic" -msgstr "Hey man, relax!" - -#: guide.awk:5 -msgid "The Answer Is" -msgstr "Like, the scoop is" - -@c endfile -@end example - -@cindex Linux -@cindex GNU/Linux -The next step is to make the directory to hold the binary message object -file and then to create the @file{guide.mo} file. -The directory layout shown here is standard for GNU @code{gettext} on -GNU/Linux systems. 
Other versions of @code{gettext} may use a different -layout: - -@example -$ @kbd{mkdir en_US en_US/LC_MESSAGES} -@end example - -@cindex @code{.po} files, converting to @code{.mo} -@cindex files, @code{.po}, converting to @code{.mo} -@cindex @code{.mo} files, converting from @code{.po} -@cindex files, @code{.mo}, converting from @code{.po} -@cindex portable object files, converting to message object files -@cindex files, portable object, converting to message object files -@cindex message object files, converting from portable object files -@cindex files, message object, converting from portable object files -@cindex @command{msgfmt} utility -The @command{msgfmt} utility does the conversion from human-readable -@file{.po} file to machine-readable @file{.mo} file. -By default, @command{msgfmt} creates a file named @file{messages}. -This file must be renamed and placed in the proper directory so that -@command{gawk} can find it: - -@example -$ @kbd{msgfmt guide-mellow.po} -$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo} -@end example - -Finally, we run the program to test it: - -@example -$ @kbd{gawk -f guide.awk} -@print{} Hey man, relax! -@print{} Like, the scoop is 42 -@print{} Pardon me, Zaphod who? -@end example - -If the three replacement functions for @code{dcgettext()}, @code{dcngettext()} -and @code{bindtextdomain()} -(@pxref{I18N Portability}) -are in a file named @file{libintl.awk}, -then we can run @file{guide.awk} unchanged as follows: - -@example -$ @kbd{gawk --posix -f guide.awk -f libintl.awk} -@print{} Don't Panic -@print{} The Answer Is 42 -@print{} Pardon me, Zaphod who? -@end example - -@node Gawk I18N -@section @command{gawk} Can Speak Your Language - -@command{gawk} itself has been internationalized -using the GNU @code{gettext} package. -(GNU @code{gettext} is described in -complete detail in -@ifinfo -@inforef{Top, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.) -@end ifinfo -@ifnotinfo -@cite{GNU gettext tools}.) 
-@end ifnotinfo -As of this writing, the latest version of GNU @code{gettext} is -@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}. - -If a translation of @command{gawk}'s messages exists, -then @command{gawk} produces usage messages, warnings, -and fatal errors in the local language. -@c ENDOFRANGE inloc - @node Advanced Features @chapter Advanced Features of @command{gawk} @cindex advanced features, network connections, See Also networks, connections @@ -24750,10 +24849,28 @@ of TCP/IP networking. Finally, @command{gawk} can @dfn{profile} an @command{awk} program, making it possible to tune it for performance. +A number of advanced features require separate @value{CHAPTER}s of their +own: + +@itemize @bullet +@item +@ref{Internationalization}, discusses how to internationalize +your @command{awk} programs, so that they can speak multiple +national languages. + +@item +@ref{Debugger}, describes @command{gawk}'s built-in command-line +debugger for debugging @command{awk} programs. + +@item +@ref{Arbitrary Precision Arithmetic}, describes how you can use +@command{gawk} to perform arbitrary-precision arithmetic. + +@item @ref{Dynamic Extensions}, discusses the ability to dynamically add new built-in functions to -@command{gawk}. As this feature is still immature and likely to change, -its description is relegated to an appendix. +@command{gawk}. +@end itemize @menu * Nondecimal Data:: Allowing nondecimal input data. @@ -24955,7 +25072,6 @@ BEGIN @{ @end example Here are the results when the program is run: -@page @example $ @kbd{gawk -f compdemo.awk} @@ -25804,7 +25920,7 @@ keyboard. The @code{INT} signal is generated by the @kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the @code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key. -Finally, @command{gawk} also accepts another option @option{--pretty-print}. +Finally, @command{gawk} also accepts another option, @option{--pretty-print}. 
When called this way, @command{gawk} ``pretty prints'' the program into @file{awkprof.out}, without any execution counts. @c ENDOFRANGE advgaw @@ -25812,6 +25928,795 @@ When called this way, @command{gawk} ``pretty prints'' the program into @c ENDOFRANGE awkp @c ENDOFRANGE proawk +@node Internationalization +@chapter Internationalization with @command{gawk} + +Once upon a time, computer makers +wrote software that worked only in English. +Eventually, hardware and software vendors noticed that if their +systems worked in the native languages of non-English-speaking +countries, they were able to sell more systems. +As a result, internationalization and localization +of programs and software systems became a common practice. + +@c STARTOFRANGE inloc +@cindex internationalization, localization +@cindex @command{gawk}, internationalization and, See internationalization +@cindex internationalization, localization, @command{gawk} and +For many years, the ability to provide internationalization +was largely restricted to programs written in C and C++. +This @value{CHAPTER} describes the underlying library @command{gawk} +uses for internationalization, as well as how +@command{gawk} makes internationalization +features available at the @command{awk} program level. +Having internationalization available at the @command{awk} level +gives software developers additional flexibility---they are no +longer forced to write in C or C++ when internationalization is +a requirement. + +@menu +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU @code{gettext} works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* I18N Example:: A simple i18n example. +* Gawk I18N:: @command{gawk} is also internationalized. 
+@end menu + +@node I18N and L10N +@section Internationalization and Localization + +@cindex internationalization +@cindex localization, See internationalization@comma{} localization +@cindex localization +@dfn{Internationalization} means writing (or modifying) a program once, +in such a way that it can use multiple languages without requiring +further source-code changes. +@dfn{Localization} means providing the data necessary for an +internationalized program to work in a particular language. +Most typically, these terms refer to features such as the language +used for printing error messages, the language used to read +responses, and information related to how numerical and +monetary values are printed and read. + +@node Explaining gettext +@section GNU @code{gettext} + +@cindex internationalizing a program +@c STARTOFRANGE gettex +@cindex @code{gettext} library +The facilities in GNU @code{gettext} focus on messages: strings printed +by a program, either directly or via formatting with @code{printf} or +@code{sprintf()}.@footnote{For some operating systems, the @command{gawk} +port doesn't support GNU @code{gettext}. +Therefore, these features are not available +if you are using one of those operating systems. Sorry.} + +@cindex portability, @code{gettext} library and +When using GNU @code{gettext}, each application has its own +@dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk}, +that identifies the application. +A complete application may have multiple components---programs written +in C or C++, as well as scripts written in @command{sh} or @command{awk}. +All of the components use the same text domain. + +To make the discussion concrete, assume we're writing an application +named @command{guide}. Internationalization consists of the +following steps, in this order: + +@enumerate +@item +The programmer goes +through the source for all of @command{guide}'s components +and marks each string that is a candidate for translation.
+For example, @code{"`-F': option required"} is a good candidate for translation. +A table with strings of option names is not (e.g., @command{gawk}'s +@option{--profile} option should remain the same, no matter what the local +language). + +@cindex @code{textdomain()} function (C library) +@item +The programmer indicates the application's text domain +(@code{"guide"}) to the @code{gettext} library, +by calling the @code{textdomain()} function. + +@cindex @code{.pot} files +@cindex files, @code{.pot} +@cindex portable object template files +@cindex files, portable object template +@item +Messages from the application are extracted from the source code and +collected into a portable object template file (@file{guide.pot}), +which lists the strings and their translations. +The translations are initially empty. +The original (usually English) messages serve as the key for +lookup of the translations. + +@cindex @code{.po} files +@cindex files, @code{.po} +@cindex portable object files +@cindex files, portable object +@item +For each language with a translator, @file{guide.pot} +is copied to a portable object file (@code{.po}) +and translations are created and shipped with the application. +For example, there might be a @file{fr.po} for a French translation. + +@cindex @code{.mo} files +@cindex files, @code{.mo} +@cindex message object files +@cindex files, message object +@item +Each language's @file{.po} file is converted into a binary +message object (@file{.mo}) file. +A message object file contains the original messages and their +translations in a binary format that allows fast lookup of translations +at runtime. + +@item +When @command{guide} is built and installed, the binary translation files +are installed in a standard place. 
+ +@cindex @code{bindtextdomain()} function (C library) +@item +For testing and development, it is possible to tell @code{gettext} +to use @file{.mo} files in a different directory than the standard +one by using the @code{bindtextdomain()} function. + +@cindex @code{.mo} files, specifying directory of +@cindex files, @code{.mo}, specifying directory of +@cindex message object files, specifying directory of +@cindex files, message object, specifying directory of +@item +At runtime, @command{guide} looks up each string via a call +to @code{gettext()}. The returned string is the translated string +if available, or the original string if not. + +@item +If necessary, it is possible to access messages from a different +text domain than the one belonging to the application, without +having to switch the application's default text domain back +and forth. +@end enumerate + +@cindex @code{gettext()} function (C library) +In C (or C++), the string marking and dynamic translation lookup +are accomplished by wrapping each string in a call to @code{gettext()}: + +@example +printf("%s", gettext("Don't Panic!\n")); +@end example + +The tools that extract messages from source code pull out all +strings enclosed in calls to @code{gettext()}. + +@cindex @code{_} (underscore), @code{_} C macro +@cindex underscore (@code{_}), @code{_} C macro +The GNU @code{gettext} developers, recognizing that typing +@samp{gettext(@dots{})} over and over again is both painful and ugly to look +at, use the macro @samp{_} (an underscore) to make things easier: + +@example +/* In the standard header file: */ +#define _(str) gettext(str) + +/* In the program text: */ +printf("%s", _("Don't Panic!\n")); +@end example + +@cindex internationalization, localization, locale categories +@cindex @code{gettext} library, locale categories +@cindex locale categories +@noindent +This reduces the typing overhead to just three extra characters per string +and is considerably easier to read as well. 
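The C-level steps described above (setting the text domain, optionally binding it to a test directory, and looking up strings through the underscore macro) can be pulled together in one place. What follows is only a minimal sketch, assuming a C library that provides a GNU-style libintl.h. The helper names guide_i18n_init(), guide_greeting(), and guide_is_translated() are hypothetical and are not part of gettext or gawk.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Shorthand for gettext(), as in the macro described above. */
#define _(str) gettext(str)

/*
 * Hypothetical helper (not part of gettext): pick up the user's locale,
 * optionally bind the "guide" text domain to a private directory of
 * .mo files, and make "guide" the default text domain.
 * Returns the name of the current text domain, or NULL on failure.
 */
const char *guide_i18n_init(const char *localedir)
{
    setlocale(LC_ALL, "");                   /* locale from the environment */
    if (localedir != NULL)
        bindtextdomain("guide", localedir);  /* e.g., a testing directory */
    return textdomain("guide");
}

/*
 * Hypothetical helper: a marked string, looked up at runtime.
 * Extraction tools can find it because it is wrapped in _().
 */
const char *guide_greeting(void)
{
    return _("Don't Panic!\n");
}

/*
 * Hypothetical helper: report whether a translation was found.
 * gettext() returns the original string when none is available.
 */
int guide_is_translated(const char *msgid)
{
    assert(msgid != NULL);
    return strcmp(gettext(msgid), msgid) != 0;
}
```

A program would typically call guide_i18n_init() once at startup (passing NULL to keep the standard catalog location) and then print guide_greeting(). Until a guide.mo file for the current locale is installed, the lookup simply yields the original English text.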
+ +There are locale @dfn{categories} +for different types of locale-related information. +The defined locale categories that @code{gettext} knows about are: + +@table @code +@cindex @code{LC_MESSAGES} locale category +@item LC_MESSAGES +Text messages. This is the default category for @code{gettext} +operations, but it is possible to supply a different one explicitly, +if necessary. (It is almost never necessary to supply a different category.) + +@cindex sorting characters in different languages +@cindex @code{LC_COLLATE} locale category +@item LC_COLLATE +Text-collation information; i.e., how different characters +and/or groups of characters sort in a given language. + +@cindex @code{LC_CTYPE} locale category +@item LC_CTYPE +Character-type information (alphabetic, digit, upper- or lowercase, and +so on). +This information is accessed via the +POSIX character classes in regular expressions, +such as @code{/[[:alnum:]]/} +(@pxref{Regexp Operators}). + +@cindex monetary information, localization +@cindex currency symbols, localization +@cindex @code{LC_MONETARY} locale category +@item LC_MONETARY +Monetary information, such as the currency symbol, and whether the +symbol goes before or after a number. + +@cindex @code{LC_NUMERIC} locale category +@item LC_NUMERIC +Numeric information, such as which characters to use for the decimal +point and the thousands separator.@footnote{Americans +use a comma every three decimal places and a period for the decimal +point, while many Europeans do exactly the opposite: +1,234.56 versus 1.234,56.} + +@cindex @code{LC_RESPONSE} locale category +@item LC_RESPONSE +Response information, such as how ``yes'' and ``no'' appear in the +local language, and possibly other information as well. 
+ +@cindex time, localization and +@cindex dates, information related to@comma{} localization +@cindex @code{LC_TIME} locale category +@item LC_TIME +Time- and date-related information, such as 12- or 24-hour clock, month printed +before or after the day in a date, local month abbreviations, and so on. + +@cindex @code{LC_ALL} locale category +@item LC_ALL +All of the above. (Not too useful in the context of @code{gettext}.) +@end table +@c ENDOFRANGE gettex + +@node Programmer i18n +@section Internationalizing @command{awk} Programs +@c STARTOFRANGE inap +@cindex @command{awk} programs, internationalizing + +@command{gawk} provides the following variables and functions for +internationalization: + +@table @code +@cindex @code{TEXTDOMAIN} variable +@item TEXTDOMAIN +This variable indicates the application's text domain. +For compatibility with GNU @code{gettext}, the default +value is @code{"messages"}. + +@cindex internationalization, localization, marked strings +@cindex strings, for localization +@item _"your message here" +String constants marked with a leading underscore +are candidates for translation at runtime. +String constants without a leading underscore are not translated. + +@cindex @code{dcgettext()} function (@command{gawk}) +@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) +Return the translation of @var{string} in +text domain @var{domain} for locale category @var{category}. +The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. +The default value for @var{category} is @code{"LC_MESSAGES"}. + +If you supply a value for @var{category}, it must be a string equal to +one of the known locale categories described in +@ifnotinfo +the previous @value{SECTION}. +@end ifnotinfo +@ifinfo +@ref{Explaining gettext}. +@end ifinfo +You must also supply a text domain. Use @code{TEXTDOMAIN} if +you want to use the current domain. 
+ +@quotation CAUTION +The order of arguments to the @command{awk} version +of the @code{dcgettext()} function is purposely different from the order for +the C version. The @command{awk} version's order was +chosen to be simple and to allow for reasonable @command{awk}-style +default arguments. +@end quotation + +@cindex @code{dcngettext()} function (@command{gawk}) +@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) +Return the plural form used for @var{number} of the +translation of @var{string1} and @var{string2} in text domain +@var{domain} for locale category @var{category}. @var{string1} is the +English singular variant of a message, and @var{string2} the English plural +variant of the same message. +The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. +The default value for @var{category} is @code{"LC_MESSAGES"}. + +The same remarks about argument order as for the @code{dcgettext()} function apply. + +@cindex @code{.mo} files, specifying directory of +@cindex files, @code{.mo}, specifying directory of +@cindex message object files, specifying directory of +@cindex files, message object, specifying directory of +@cindex @code{bindtextdomain()} function (@command{gawk}) +@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) +Change the directory in which +@code{gettext} looks for @file{.mo} files, in case they +will not or cannot be placed in the standard locations +(e.g., during testing). +Return the directory in which @var{domain} is ``bound.'' + +The default @var{domain} is the value of @code{TEXTDOMAIN}. +If @var{directory} is the null string (@code{""}), then +@code{bindtextdomain()} returns the current binding for the +given @var{domain}. 
+@end table + +To use these facilities in your @command{awk} program, follow the steps +outlined in +@ifnotinfo +the previous @value{SECTION}, +@end ifnotinfo +@ifinfo +@ref{Explaining gettext}, +@end ifinfo +like so: + +@enumerate +@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and +@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and +@item +Set the variable @code{TEXTDOMAIN} to the text domain of +your program. This is best done in a @code{BEGIN} rule +(@pxref{BEGIN/END}), +or it can also be done via the @option{-v} command-line +option (@pxref{Options}): + +@example +BEGIN @{ + TEXTDOMAIN = "guide" + @dots{} +@} +@end example + +@cindex @code{_} (underscore), translatable string +@cindex underscore (@code{_}), translatable string +@item +Mark all translatable strings with a leading underscore (@samp{_}) +character. It @emph{must} be adjacent to the opening +quote of the string. For example: + +@example +print _"hello, world" +x = _"you goofed" +printf(_"Number of users is %d\n", nusers) +@end example + +@item +If you are creating strings dynamically, you can +still translate them, using the @code{dcgettext()} +built-in function: + +@example +message = nusers " users logged in" +message = dcgettext(message, "adminprog") +print message +@end example + +Here, the call to @code{dcgettext()} supplies a different +text domain (@code{"adminprog"}) in which to find the +message, but it uses the default @code{"LC_MESSAGES"} category. + +@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk}) +@item +During development, you might want to put the @file{.mo} +file in a private directory for testing. 
This is done +with the @code{bindtextdomain()} built-in function: + +@example +BEGIN @{ + TEXTDOMAIN = "guide" # our text domain + if (Testing) @{ + # where to find our files + bindtextdomain("testdir") + # joe is in charge of adminprog + bindtextdomain("../joe/testdir", "adminprog") + @} + @dots{} +@} +@end example + +@end enumerate + +@xref{I18N Example}, +for an example program showing the steps to create +and use translations from @command{awk}. + +@node Translator i18n +@section Translating @command{awk} Programs + +@cindex @code{.po} files +@cindex files, @code{.po} +@cindex portable object files +@cindex files, portable object +Once a program's translatable strings have been marked, they must +be extracted to create the initial @file{.po} file. +As part of translation, it is often helpful to rearrange the order +in which arguments to @code{printf} are output. + +@command{gawk}'s @option{--gen-pot} command-line option extracts +the messages and is discussed next. +After that, @code{printf}'s ability to +rearrange the order for @code{printf} arguments at runtime +is covered. + +@menu +* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging @code{printf} arguments. +* I18N Portability:: @command{awk}-level portability issues. +@end menu + +@node String Extraction +@subsection Extracting Marked Strings +@cindex strings, extracting +@cindex marked strings@comma{} extracting +@cindex @code{--gen-pot} option +@cindex command-line options, string extraction +@cindex string extraction (internationalization) +@cindex marked string extraction (internationalization) +@cindex extraction, of marked strings (internationalization) + +@cindex @code{--gen-pot} option +Once your @command{awk} program is working, and all the strings have +been marked and you've set (and perhaps bound) the text domain, +it is time to produce translations. 
+First, use the @option{--gen-pot} command-line option to create +the initial @file{.pot} file: + +@example +$ @kbd{gawk --gen-pot -f guide.awk > guide.pot} +@end example + +@cindex @code{xgettext} utility +When run with @option{--gen-pot}, @command{gawk} does not execute your +program. Instead, it parses it as usual and prints all marked strings +to standard output in the format of a GNU @code{gettext} Portable Object +file. Also included in the output are any constant strings that +appear as the first argument to @code{dcgettext()} or as the first and +second arguments to @code{dcngettext()}.@footnote{The +@command{xgettext} utility that comes with GNU +@code{gettext} can handle @file{.awk} files.} +@xref{I18N Example}, +for the full list of steps to go through to create and test +translations for @command{guide}. + +@node Printf Ordering +@subsection Rearranging @code{printf} Arguments + +@cindex @code{printf} statement, positional specifiers +@cindex positional specifiers, @code{printf} statement +Format strings for @code{printf} and @code{sprintf()} +(@pxref{Printf}) +present a special problem for translation. +Consider the following:@footnote{This example is borrowed +from the GNU @code{gettext} manual.} + +@c line broken here only for smallbook format +@example +printf(_"String `%s' has %d characters\n", + string, length(string)) +@end example + +A possible German translation for this might be: + +@example +"%d Zeichen lang ist die Zeichenkette `%s'\n" +@end example + +The problem should be obvious: the order of the format +specifications is different from the original! +Even though @code{gettext()} can return the translated string +at runtime, +it cannot change the argument order in the call to @code{printf}. + +To solve this problem, @code{printf} format specifiers may have +an additional optional element, which we call a @dfn{positional specifier}. 
+For example: + +@example +"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n" +@end example + +Here, the positional specifier consists of an integer count, which indicates which +argument to use, and a @samp{$}. Counts are one-based, and the +format string itself is @emph{not} included. Thus, in the following +example, @samp{string} is the first argument and @samp{length(string)} is the second: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{string = "Dont Panic"} +> @kbd{printf _"%2$d characters live in \"%1$s\"\n",} +> @kbd{string, length(string)} +> @kbd{@}'} +@print{} 10 characters live in "Dont Panic" +@end example + +If present, positional specifiers come first in the format specification, +before the flags, the field width, and/or the precision. + +Positional specifiers can be used with the dynamic field width and +precision capability: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{printf("%*.*s\n", 10, 20, "hello")} +> @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")} +> @kbd{@}'} +@print{} hello +@print{} hello +@end example + +@quotation NOTE +When using @samp{*} with a positional specifier, the @samp{*} +comes first, then the integer position, and then the @samp{$}. +This is somewhat counterintuitive. +@end quotation + +@cindex @code{printf} statement, positional specifiers, mixing with regular formats +@cindex positional specifiers, @code{printf} statement, mixing with regular formats +@cindex format specifiers, mixing regular with positional specifiers +@command{gawk} does not allow you to mix regular format specifiers +and those with positional specifiers in the same string: + +@example +$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'} +@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none +@end example + +@quotation NOTE +There are some pathological cases that @command{gawk} may fail to +diagnose. In such cases, the output may not be what you expect. 
+It's still a bad idea to try mixing them, even if @command{gawk} +doesn't detect it. +@end quotation + +Although positional specifiers can be used directly in @command{awk} programs, +their primary purpose is to help in producing correct translations of +format strings into languages different from the one in which the program +is first written. + +@node I18N Portability +@subsection @command{awk} Portability Issues + +@cindex portability, internationalization and +@cindex internationalization, localization, portability and +@command{gawk}'s internationalization features were purposely chosen to +have as little impact as possible on the portability of @command{awk} +programs that use them to other versions of @command{awk}. +Consider this program: + +@example +BEGIN @{ + TEXTDOMAIN = "guide" + if (Test_Guide) # set with -v + bindtextdomain("/test/guide/messages") + print _"don't panic!" +@} +@end example + +@noindent +As written, it won't work on other versions of @command{awk}. +However, it is actually almost portable, requiring very little +change: + +@itemize @bullet +@cindex @code{TEXTDOMAIN} variable, portability and +@item +Assignments to @code{TEXTDOMAIN} won't have any effect, +since @code{TEXTDOMAIN} is not special in other @command{awk} implementations. + +@item +Non-GNU versions of @command{awk} treat marked strings +as the concatenation of a variable named @code{_} with the string +following it.@footnote{This is good fodder for an ``Obfuscated +@command{awk}'' contest.} Typically, the variable @code{_} has +the null string (@code{""}) as its value, leaving the original string constant as +the result. + +@item +By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()} +and @code{bindtextdomain()}, the @command{awk} program can be made to run, but +all the messages are output in the original language. 
+For example: + +@cindex @code{bindtextdomain()} function (@command{gawk}), portability and +@cindex @code{dcgettext()} function (@command{gawk}), portability and +@cindex @code{dcngettext()} function (@command{gawk}), portability and +@example +@c file eg/lib/libintl.awk +function bindtextdomain(dir, domain) +@{ + return dir +@} + +function dcgettext(string, domain, category) +@{ + return string +@} + +function dcngettext(string1, string2, number, domain, category) +@{ + return (number == 1 ? string1 : string2) +@} +@c endfile +@end example + +@item +The use of positional specifications in @code{printf} or +@code{sprintf()} is @emph{not} portable. +To support @code{gettext()} at the C level, many systems' C versions of +@code{sprintf()} do support positional specifiers. But it works only if +enough arguments are supplied in the function call. Many versions of +@command{awk} pass @code{printf} formats and arguments unchanged to the +underlying C library version of @code{sprintf()}, but only one format and +argument at a time. What happens if a positional specification is +used is anybody's guess. +However, since the positional specifications are primarily for use in +@emph{translated} format strings, and since non-GNU @command{awk}s never +retrieve the translated string, this should not be a problem in practice. +@end itemize +@c ENDOFRANGE inap + +@node I18N Example +@section A Simple Internationalization Example + +Now let's look at a step-by-step example of how to internationalize and +localize a simple @command{awk} program, using @file{guide.awk} as our +original source: + +@example +@c file eg/prog/guide.awk +BEGIN @{ + TEXTDOMAIN = "guide" + bindtextdomain(".") # for testing + print _"Don't Panic" + print _"The Answer Is", 42 + print "Pardon me, Zaphod who?" 
+@} +@c endfile +@end example + +@noindent +Run @samp{gawk --gen-pot} to create the @file{.pot} file: + +@example +$ @kbd{gawk --gen-pot -f guide.awk > guide.pot} +@end example + +@noindent +This produces: + +@example +@c file eg/data/guide.po +#: guide.awk:4 +msgid "Don't Panic" +msgstr "" + +#: guide.awk:5 +msgid "The Answer Is" +msgstr "" + +@c endfile +@end example + +This original portable object template file is saved and reused for each language +into which the application is translated. The @code{msgid} +is the original string and the @code{msgstr} is the translation. + +@quotation NOTE +Strings not marked with a leading underscore do not +appear in the @file{guide.pot} file. +@end quotation + +Next, the messages must be translated. +Here is a translation to a hypothetical dialect of English, +called ``Mellow'':@footnote{Perhaps it would be better if it were +called ``Hippy.'' Ah, well.} + +@example +@group +$ cp guide.pot guide-mellow.po +@var{Add translations to} guide-mellow.po @dots{} +@end group +@end example + +@noindent +Following are the translations: + +@example +@c file eg/data/guide-mellow.po +#: guide.awk:4 +msgid "Don't Panic" +msgstr "Hey man, relax!" + +#: guide.awk:5 +msgid "The Answer Is" +msgstr "Like, the scoop is" + +@c endfile +@end example + +@cindex Linux +@cindex GNU/Linux +The next step is to make the directory to hold the binary message object +file and then to create the @file{guide.mo} file. +The directory layout shown here is standard for GNU @code{gettext} on +GNU/Linux systems. 
Other versions of @code{gettext} may use a different +layout: + +@example +$ @kbd{mkdir en_US en_US/LC_MESSAGES} +@end example + +@cindex @code{.po} files, converting to @code{.mo} +@cindex files, @code{.po}, converting to @code{.mo} +@cindex @code{.mo} files, converting from @code{.po} +@cindex files, @code{.mo}, converting from @code{.po} +@cindex portable object files, converting to message object files +@cindex files, portable object, converting to message object files +@cindex message object files, converting from portable object files +@cindex files, message object, converting from portable object files +@cindex @command{msgfmt} utility +The @command{msgfmt} utility does the conversion from human-readable +@file{.po} file to machine-readable @file{.mo} file. +By default, @command{msgfmt} creates a file named @file{messages}. +This file must be renamed and placed in the proper directory so that +@command{gawk} can find it: + +@example +$ @kbd{msgfmt guide-mellow.po} +$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo} +@end example + +Finally, we run the program to test it: + +@example +$ @kbd{gawk -f guide.awk} +@print{} Hey man, relax! +@print{} Like, the scoop is 42 +@print{} Pardon me, Zaphod who? +@end example + +If the three replacement functions for @code{dcgettext()}, @code{dcngettext()} +and @code{bindtextdomain()} +(@pxref{I18N Portability}) +are in a file named @file{libintl.awk}, +then we can run @file{guide.awk} unchanged as follows: + +@example +$ @kbd{gawk --posix -f guide.awk -f libintl.awk} +@print{} Don't Panic +@print{} The Answer Is 42 +@print{} Pardon me, Zaphod who? +@end example + +@node Gawk I18N +@section @command{gawk} Can Speak Your Language + +@command{gawk} itself has been internationalized +using the GNU @code{gettext} package. +(GNU @code{gettext} is described in +complete detail in +@ifinfo +@inforef{Top, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.) +@end ifinfo +@ifnotinfo +@cite{GNU gettext tools}.) 
+@end ifnotinfo +As of this writing, the latest version of GNU @code{gettext} is +@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.2.1.tar.gz, @value{PVERSION} 0.18.2.1}. + +If a translation of @command{gawk}'s messages exists, +then @command{gawk} produces usage messages, warnings, +and fatal errors in the local language. +@c ENDOFRANGE inloc + @c The original text for this chapter was contributed by Efraim Yawitz. @c FIXME: Add more indexing. @@ -25991,7 +26896,7 @@ $ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile} where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}. (Experienced users of GDB or similar debuggers should note that this syntax is slightly different from what they are used to. -With @command{gawk} debugger, the arguments for running the program are given +With the @command{gawk} debugger, you give the arguments for running the program in the command line to the debugger rather than as part of the @code{run} command at the debugger prompt.) @@ -26835,7 +27740,7 @@ same file more than once in order to avoid infinite recursion. In addition to, or instead of the @code{source} command, you can use the @option{-D @var{file}} or @option{--debug=@var{file}} command-line options to execute commands from a file non-interactively -(@pxref{Options}. +(@pxref{Options}). 
@end table @node Miscellaneous Debugger Commands @@ -26857,54 +27762,52 @@ partial dump of Davide Brini's obfuscated code @smallexample gawk> @kbd{dump} -@print{} # BEGIN -@print{} -@print{} [ 2:0x89faef4] Op_rule : [in_rule = BEGIN] [source_file = brini.awk] -@print{} [ 3:0x89fa428] Op_push_i : "~" [PERM|STRING|STRCUR] -@print{} [ 3:0x89fa464] Op_push_i : "~" [PERM|STRING|STRCUR] -@print{} [ 3:0x89fa450] Op_match : -@print{} [ 3:0x89fa3ec] Op_store_var : O [do_reference = FALSE] -@print{} [ 4:0x89fa48c] Op_push_i : "==" [PERM|STRING|STRCUR] -@print{} [ 4:0x89fa4c8] Op_push_i : "==" [PERM|STRING|STRCUR] -@print{} [ 4:0x89fa4b4] Op_equal : -@print{} [ 4:0x89fa400] Op_store_var : o [do_reference = FALSE] -@print{} [ 5:0x89fa4f0] Op_push : o -@print{} [ 5:0x89fa4dc] Op_plus_i : 0 [PERM|NUMCUR|NUMBER] -@print{} [ 5:0x89fa414] Op_push_lhs : o [do_reference = TRUE] -@print{} [ 5:0x89fa4a0] Op_assign_plus : -@print{} [ :0x89fa478] Op_pop : -@print{} [ 6:0x89fa540] Op_push : O -@print{} [ 6:0x89fa554] Op_push_i : "" [PERM|STRING|STRCUR] -@print{} [ :0x89fa5a4] Op_no_op : -@print{} [ 6:0x89fa590] Op_push : O -@print{} [ :0x89fa5b8] Op_concat : [expr_count = 3] [concat_flag = 0] -@print{} [ 6:0x89fa518] Op_store_var : x [do_reference = FALSE] -@print{} [ 7:0x89fa504] Op_push_loop : [target_continue = 0x89fa568] [target_break = 0x89fa680] -@print{} [ 7:0x89fa568] Op_push_lhs : X [do_reference = TRUE] -@print{} [ 7:0x89fa52c] Op_postincrement : -@print{} [ 7:0x89fa5e0] Op_push : x -@print{} [ 7:0x89fa61c] Op_push : o -@print{} [ 7:0x89fa5f4] Op_plus : -@print{} [ 7:0x89fa644] Op_push : o -@print{} [ 7:0x89fa630] Op_plus : -@print{} [ 7:0x89fa5cc] Op_leq : -@print{} [ :0x89fa57c] Op_jmp_false : [target_jmp = 0x89fa680] -@print{} [ 7:0x89fa694] Op_push_i : "%c" [PERM|STRING|STRCUR] -@print{} [ :0x89fa6d0] Op_no_op : -@print{} [ 7:0x89fa608] Op_assign_concat : c -@print{} [ :0x89fa6a8] Op_jmp : [target_jmp = 0x89fa568] -@print{} [ :0x89fa680] Op_pop_loop : +@print{} # BEGIN 
@print{} -@dots{} +@print{} [ 1:0xfcd340] Op_rule : [in_rule = BEGIN] [source_file = brini.awk] +@print{} [ 1:0xfcc240] Op_push_i : "~" [MALLOC|STRING|STRCUR] +@print{} [ 1:0xfcc2a0] Op_push_i : "~" [MALLOC|STRING|STRCUR] +@print{} [ 1:0xfcc280] Op_match : +@print{} [ 1:0xfcc1e0] Op_store_var : O +@print{} [ 1:0xfcc2e0] Op_push_i : "==" [MALLOC|STRING|STRCUR] +@print{} [ 1:0xfcc340] Op_push_i : "==" [MALLOC|STRING|STRCUR] +@print{} [ 1:0xfcc320] Op_equal : +@print{} [ 1:0xfcc200] Op_store_var : o +@print{} [ 1:0xfcc380] Op_push : o +@print{} [ 1:0xfcc360] Op_plus_i : 0 [MALLOC|NUMCUR|NUMBER] +@print{} [ 1:0xfcc220] Op_push_lhs : o [do_reference = true] +@print{} [ 1:0xfcc300] Op_assign_plus : +@print{} [ :0xfcc2c0] Op_pop : +@print{} [ 1:0xfcc400] Op_push : O +@print{} [ 1:0xfcc420] Op_push_i : "" [MALLOC|STRING|STRCUR] +@print{} [ :0xfcc4a0] Op_no_op : +@print{} [ 1:0xfcc480] Op_push : O +@print{} [ :0xfcc4c0] Op_concat : [expr_count = 3] [concat_flag = 0] +@print{} [ 1:0xfcc3c0] Op_store_var : x +@print{} [ 1:0xfcc440] Op_push_lhs : X [do_reference = true] +@print{} [ 1:0xfcc3a0] Op_postincrement : +@print{} [ 1:0xfcc4e0] Op_push : x +@print{} [ 1:0xfcc540] Op_push : o +@print{} [ 1:0xfcc500] Op_plus : +@print{} [ 1:0xfcc580] Op_push : o +@print{} [ 1:0xfcc560] Op_plus : +@print{} [ 1:0xfcc460] Op_leq : +@print{} [ :0xfcc5c0] Op_jmp_false : [target_jmp = 0xfcc5e0] +@print{} [ 1:0xfcc600] Op_push_i : "%c" [MALLOC|STRING|STRCUR] +@print{} [ :0xfcc660] Op_no_op : +@print{} [ 1:0xfcc520] Op_assign_concat : c +@print{} [ :0xfcc620] Op_jmp : [target_jmp = 0xfcc440] @print{} -@print{} [ 8:0x89fa658] Op_K_printf : [expr_count = 17] [redir_type = ""] -@print{} [ :0x89fa374] Op_no_op : -@print{} [ :0x89fa3d8] Op_atexit : -@print{} [ :0x89fa6bc] Op_stop : -@print{} [ :0x89fa39c] Op_no_op : -@print{} [ :0x89fa3b0] Op_after_beginfile : -@print{} [ :0x89fa388] Op_no_op : -@print{} [ :0x89fa3c4] Op_after_endfile : +@dots{} +@print{} +@print{} [ 2:0xfcc5a0] Op_K_printf : 
[expr_count = 17] [redir_type = ""] +@print{} [ :0xfcc140] Op_no_op : +@print{} [ :0xfcc1c0] Op_atexit : +@print{} [ :0xfcc640] Op_stop : +@print{} [ :0xfcc180] Op_no_op : +@print{} [ :0xfcd150] Op_after_beginfile : +@print{} [ :0xfcc160] Op_no_op : +@print{} [ :0xfcc1a0] Op_after_endfile : gawk> @end smallexample @@ -27090,7 +27993,7 @@ the general attributes of computer arithmetic, along with how this can influence what you see when running @command{awk} programs. This discussion applies to all versions of @command{awk}. -Then the @value{CHAPTER} moves on to @dfn{arbitrary precision +The @value{CHAPTER} then moves on to describe @dfn{arbitrary precision arithmetic}, a feature which is specific to @command{gawk}. @menu @@ -27185,7 +28088,7 @@ which plays a role in how variables are used in comparisons. It is important to note that the string value for a number may not reflect the full value (all the digits) that the numeric value actually contains. -The following program (@file{values.awk}) illustrates this: +The following program, @file{values.awk}, illustrates this: @example @{ @@ -27395,7 +28298,7 @@ Thus @samp{+nan} and @samp{+NaN} are the same. @node Integer Programming @subsection Mixing Integers And Floating-point -As has been mentioned already, @command{gawk} ordinarily uses hardware double +As has been mentioned already, @command{awk} uses hardware double precision with 64-bit IEEE binary floating-point representation for numbers on most systems. A large integer like 9,007,199,254,740,997 has a binary representation that, although finite, is more than 53 bits long; @@ -27438,7 +28341,7 @@ is @ifnottex [@minus{}2^53, 2^53]. @end ifnottex -If you ever see an integer outside this range in @command{gawk} +If you ever see an integer outside this range in @command{awk} using 64-bit doubles, you have reason to be very suspicious about the accuracy of the output. 
Here is a simple program with erroneous output: @@ -27450,7 +28353,7 @@ $ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} @print{} 9007199254740994 @end example -The lesson is to not assume that any large integer printed by @command{gawk} +The lesson is to not assume that any large integer printed by @command{awk} represents an exact result from your computation, especially if it wraps around on your screen. @@ -27460,8 +28363,12 @@ around on your screen. Numerical programming is an extensive area; if you need to develop sophisticated numerical algorithms then @command{gawk} may not be the ideal tool, and this documentation may not be sufficient. -@c FIXME: JOHN: Do you want to cite some actual books? -It might require digesting a book or two to really internalize how to compute +It might require digesting a book or two@footnote{One recommended title is +@cite{Numerical Computing with IEEE Floating Point Arithmetic}, Michael L.@: +Overton, Society for Industrial and Applied Mathematics, 2004. +ISBN: 0-89871-482-6, ISBN-13: 978-0-89871-482-1. See +@uref{http://www.cs.nyu.edu/cs/faculty/overton/book}.} +to really internalize how to compute with ideal accuracy and precision, and the result often depends on the particular application. @@ -27574,7 +28481,7 @@ yield an unexpected result: @example $ @kbd{gawk 'BEGIN @{} -> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1) # loop five times (?)} > @kbd{i++} > @kbd{print i} > @kbd{@}'} @@ -27589,7 +28496,7 @@ Instead of arbitrary precision floating-point arithmetic, often all you need is an adjustment of your logic or a different order for the operations in your calculation. 
The stability and the accuracy of the computation of the constant @value{PI} -in the previous example can be enhanced by using the following +in the earlier example can be enhanced by using the following simple algebraic transformation: @example @@ -27601,7 +28508,7 @@ After making this change, the program does converge to @value{PI} in under 30 iterations: @example -$ @kbd{gawk -f /tmp/pi2.awk} +$ @kbd{gawk -f pi2.awk} @print{} 3.215390309173473 @print{} 3.159659942097501 @print{} 3.146086215131436 @@ -27681,14 +28588,18 @@ The context has the following primary components: @table @dfn @item Precision Precision of the floating-point format in bits. + @item emax -Maximum exponent allowed for this format. +Maximum exponent allowed for the format. + @item emin -Minimum exponent allowed for this format. +Minimum exponent allowed for the format. + @item Underflow behavior The format may or may not support gradual underflow. + @item Rounding -The rounding mode of this context. +The rounding mode of the context. @end table @ref{table-ieee-formats} lists the precision and exponent @@ -27763,7 +28674,7 @@ In this case, the number is rounded to the nearest even digit. So rounding 0.125 to two digits rounds down to 0.12, but rounding 0.6875 to three digits rounds up to 0.688. You probably have already encountered this rounding mode when -using the @code{printf} routine to format floating-point numbers. +using @code{printf} to format floating-point numbers. 
For example: @example @@ -27777,10 +28688,10 @@ BEGIN @{ @end example @noindent -produces the following output when run:@footnote{It +produces the following output when run on the author's system:@footnote{It is possible for the output to be completely different if the C library in your system does not use the IEEE-754 even-rounding -rule to round halfway cases for @code{printf()}.} +rule to round halfway cases for @code{printf}.} @example -3.5 => -4 @@ -27796,7 +28707,7 @@ rule to round halfway cases for @code{printf()}.} The theory behind the rounding mode @code{roundTiesToEven} is that it more or less evenly distributes upward and downward rounds -of exact halves, which might cause the round-off error +of exact halves, which might cause any round-off error to cancel itself out. This is the default rounding mode used in IEEE-754 computing functions and operators. @@ -27837,8 +28748,8 @@ the following command: @example $ @kbd{gawk --version} -@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) -@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. +@print{} GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) +@print{} Copyright (C) 1989, 1991-2013 Free Software Foundation. @dots{} @end example @@ -27870,8 +28781,8 @@ in general, and the limitations of doing arithmetic with ordinary @command{gawk} uses the GNU MPFR library for arbitrary precision floating-point arithmetic. The MPFR library provides precise control over precisions and rounding modes, and gives -correctly rounded, reproducible, platform-independent results. With the -command-line option @option{--bignum} or @option{-M}, +correctly rounded, reproducible, platform-independent results. With one +of the command-line options @option{--bignum} or @option{-M}, all floating-point arithmetic operators and numeric functions can yield results to any desired precision level supported by MPFR. 
Two built-in variables, @code{PREC} and @code{ROUNDMODE}, @@ -27881,11 +28792,11 @@ provide control over the working precision and the rounding mode The precision and the rounding mode are set globally for every operation to follow. -The default working precision for arbitrary precision floating-point values is 53, -and the default value for @code{ROUNDMODE} is @code{"N"}, +The default working precision for arbitrary precision floating-point values is +53 bits, and the default value for @code{ROUNDMODE} is @code{"N"}, which selects the IEEE-754 @code{roundTiesToEven} rounding mode (@pxref{Rounding Mode}).@footnote{The -default precision is 53, since according to the MPFR documentation, +default precision is 53 bits, since according to the MPFR documentation, the library should be able to exactly reproduce all computations with double-precision machine floating-point numbers (@code{double} type in C), except the default exponent range is much wider and subnormal @@ -27932,11 +28843,14 @@ your program. @command{gawk} uses a global working precision; it does not keep track of the precision or accuracy of individual numbers. Performing an arithmetic operation or calling a built-in function rounds the result to the current -working precision. The default working precision is 53, which can be +working precision. The default working precision is 53 bits, which can be modified using the built-in variable @code{PREC}. You can also set the -value to one of the following pre-defined case-insensitive strings -to emulate an IEEE-754 binary format: +value to one of the pre-defined case-insensitive strings +shown in @ref{table-predefined-precision-strings}, +to emulate an IEEE-754 binary format. +@float Table,table-predefined-precision-strings +@caption{Predefined precision strings for @code{PREC}} @multitable {@code{"double"}} {12345678901234567890123456789012345} @headitem @code{PREC} @tab IEEE-754 Binary Format @item @code{"half"} @tab 16-bit half-precision. 
@@ -27945,12 +28859,13 @@ to emulate an IEEE-754 binary format: @item @code{"quad"} @tab Basic 128-bit quadruple precision. @item @code{"oct"} @tab 256-bit octuple precision. @end multitable +@end float The following example illustrates the effects of changing precision on arithmetic operations: @example -$ @kbd{gawk -M -v PREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} +$ @kbd{gawk -M -v PREC=100 'BEGIN @{ x = 1.0e-400; print x + 0} > @kbd{PREC = "double"; print x + 0 @}'} @print{} 1e-400 @print{} 0 @@ -28019,7 +28934,7 @@ rounding modes is shown in @ref{table-gawk-rounding-modes}. @code{ROUNDMODE} has the default value @code{"N"}, which selects the IEEE-754 rounding mode @code{roundTiesToEven}. -@ref{table-gawk-rounding-modes}, lists @code{"A"} to select the IEEE-754 mode +In @ref{table-gawk-rounding-modes}, @code{"A"} is listed to select the IEEE-754 mode @code{roundTiesToAway}. This is only available if your version of the MPFR library supports it; otherwise setting @code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode}, @@ -28062,7 +28977,7 @@ $ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} @print{} 0.1000000000000000000000000 @end example -In the first case, the number is stored with the default precision of 53. +In the first case, the number is stored with the default precision of 53 bits. @node Changing Precision @subsection Changing the Precision of a Number @@ -28125,7 +29040,7 @@ using the machine double precision arithmetic, it decides that they are not equal! (@xref{Floating-point Programming}.) You can get the result you want by increasing the precision; -56 in this case will get the job done: +56 bits in this case will get the job done: @example $ @kbd{gawk -M -v PREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} @@ -28170,7 +29085,7 @@ in floating-point arithmetic. 
In the example in @example $ @kbd{gawk 'BEGIN @{} -> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1) # loop five times (?)} > @kbd{i++} > @kbd{print i} > @kbd{@}'} @@ -28186,7 +29101,7 @@ the problem at hand is often the correct approach in such situations. @section Arbitrary Precision Integer Arithmetic with @command{gawk} @cindex integer, arbitrary precision -If the option @option{--bignum} or @option{-M} is specified, +If one of the options @option{--bignum} or @option{-M} is specified, @command{gawk} performs all integer arithmetic using GMP arbitrary precision integers. Any number that looks like an integer in a program source or data file @@ -28245,9 +29160,9 @@ $ @kbd{gawk -M 'BEGIN @{} @end example The output differs from the actual number, 113,423,713,055,421,844,361,000,443, -because the default precision of 53 is not enough to represent the +because the default precision of 53 bits is not enough to represent the floating-point results exactly. You can either increase the precision -(100 is enough in this case), or replace the floating-point constant +(100 bits is enough in this case), or replace the floating-point constant @samp{2.0} with an integer, to perform all computations using integer arithmetic to get the correct output. @@ -28282,7 +29197,7 @@ gawk -M 'BEGIN @{ n = 13; print n % 2 @}' @node Dynamic Extensions @chapter Writing Extensions for @command{gawk} -It is possible to add new built-in functions to @command{gawk} using +It is possible to add new functions written in C or C++ to @command{gawk} using dynamically loaded libraries. This facility is available on systems that support the C @code{dlopen()} and @code{dlsym()} functions. This @value{CHAPTER} describes how to create extensions @@ -28305,6 +29220,7 @@ When @option{--sandbox} is specified, extensions are disabled * Plugin License:: A note about licensing. * Extension Mechanism Outline:: An outline of how it works. 
* Extension API Description:: A full description of the API. +* Finding Extensions:: How @command{gawk} finds compiled extensions. * Extension Example:: Example C code for an extension. * Extension Samples:: The sample extensions that ship with @code{gawk}. @@ -28328,11 +29244,13 @@ want to do and can write in C or C++, you can write an extension to do it! Extensions are written in C or C++, using the @dfn{Application Programming Interface} (API) defined for this purpose by the @command{gawk} -developers. The rest of this @value{CHAPTER} explains the design -decisions behind the API, the facilities that it provides and how to use +developers. The rest of this @value{CHAPTER} explains +the facilities that the API provides and how to use them, and presents a small sample extension. In addition, it documents the sample extensions included in the @command{gawk} distribution, and describes the @code{gawkextlib} project. +@xref{Extension Design}, for a discussion of the extension mechanism +goals and design. @node Plugin License @section Extension Licensing @@ -28425,7 +29343,7 @@ Some other bits and pieces: The API provides access to @command{gawk}'s @code{do_@var{xxx}} values, reflecting command line options, like @code{do_lint}, @code{do_profiling} and so on (@pxref{Extension API Variables}). -These are informational: an extension cannot affect these +These are informational: an extension cannot affect their values inside @command{gawk}. In addition, attempting to assign to them produces a compile-time error. @@ -28451,15 +29369,13 @@ This (rather large) @value{SECTION} describes the API in detail. * Registration Functions:: Functions to register things with @command{gawk}. * Printing Messages:: Functions for printing messages. -* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. +* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. * Accessing Parameters:: Functions for accessing parameters. 
* Symbol Table Access:: Functions for accessing global variables. * Array Manipulation:: Functions for working with arrays. * Extension API Variables:: Variables provided by the API. * Extension API Boilerplate:: Boilerplate code for using the API. -* Finding Extensions:: How @command{gawk} finds compiled - extensions. @end menu @node Extension API Functions Introduction @@ -28501,8 +29417,7 @@ an array. @item Symbol table access: retrieving a global variable, creating one, -or changing one. This also includes the ability to create a scalar -variable that will be @emph{constant} within @command{awk} code. +or changing one. @item Creating and releasing cached values; this provides an @@ -28511,15 +29426,20 @@ can be a big performance win. @item Manipulating arrays: + @itemize @minus @item Retrieving, adding, deleting, and modifying elements + @item Getting the count of elements in an array + @item Creating a new array + @item Clearing an array + @item Flattening an array for easy C style looping over all its indices and elements @end itemize @@ -28535,10 +29455,13 @@ corresponding standard header file @emph{before} including @file{gawkapi.h}: @multitable {@code{memset()}, @code{memcpy()}} {@code{<sys/types.h>}} @headitem C Entity @tab Header File +@item @code{EOF} @tab @code{<stdio.h>} @item @code{FILE} @tab @code{<stdio.h>} @item @code{NULL} @tab @code{<stddef.h>} @item @code{malloc()} @tab @code{<stdlib.h>} -@item @code{memset()}, @code{memcpy()} @tab @code{<string.h>} +@item @code{memcpy()} @tab @code{<string.h>} +@item @code{memset()} @tab @code{<string.h>} +@item @code{realloc()} @tab @code{<stdlib.h>} @item @code{size_t} @tab @code{<sys/types.h>} @item @code{struct stat} @tab @code{<sys/stat.h>} @end multitable @@ -28547,7 +29470,8 @@ Due to portability concerns, especially to systems that are not fully standards-compliant, it is your responsibility to include the correct files in the correct way. 
This requirement is necessary in order to keep @file{gawkapi.h} clean, instead of becoming -a portability hodge-podge as can be seen in the @command{gawk} source code. +a portability hodge-podge as can be seen in some parts of +the @command{gawk} source code. To pass reasonable integer values for @code{ERRNO}, you will also need to include @code{<errno.h>}. @@ -28571,16 +29495,18 @@ from the extension @emph{must} come from @code{malloc()} and is managed by @command{gawk} from then on. @item -The API defines several simple structs that map values as seen +The API defines several simple @code{struct}s that map values as seen from @command{awk}. A value can be a @code{double}, a string, or an array (as in multidimensional arrays, or when creating a new array). -Strings maintain both pointer and length since embedded @code{NUL} +String values maintain both pointer and length since embedded @code{NUL} characters are allowed. +@quotation NOTE By intent, strings are maintained using the current multibyte encoding (as defined by @env{LC_@var{xxx}} environment variables) and not using wide characters. This matches how @command{gawk} stores strings internally and also how characters are likely to be input and output from files. +@end quotation @item When retrieving a value (such as a parameter or that of a global variable @@ -28591,7 +29517,7 @@ scalars, value cookie, array, or ``undefined''). When the request is However, if the request and actual type don't match, the access function returns ``false'' and fills in the type of the actual value that is there, so that the extension can, e.g., print an error message -(``scalar passed where array expected''). +(such as ``scalar passed where array expected''). @c This is documented in the header file and needs some expanding upon. @c The table there should be presented here @@ -28616,7 +29542,7 @@ Chet Ramey @end quotation The extension API defines a number of simple types and structures for general -purpose use. 
Additional, more specialized, data structures, are introduced +purpose use. Additional, more specialized, data structures are introduced in subsequent @value{SECTION}s, together with the functions that use them. @table @code @@ -28655,7 +29581,7 @@ multibyte encoding. @itemx @ @ @ @ AWK_STRING, @itemx @ @ @ @ AWK_ARRAY, @itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */ -@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */ +@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ @ /* for updating a previously created value */ @itemx @} awk_valtype_t; This @code{enum} indicates the type of a value. It is used in the following @code{struct}. @@ -28848,7 +29774,7 @@ exit with a fatal error message. They should be used as if they were procedure calls that do not return a value. @table @code -@item emalloc(pointer, type, size, message) +@item #define emalloc(pointer, type, size, message) @dots{} The arguments to this macro are as follows: @c nested table @table @code @@ -28879,7 +29805,7 @@ strcpy(message, greet); make_malloced_string(message, strlen(message), & result); @end example -@item erealloc(pointer, type, size, message) +@item #define erealloc(pointer, type, size, message) @dots{} This is like @code{emalloc()}, but it calls @code{realloc()}, instead of @code{malloc()}. The arguments are the same as for the @code{emalloc()} macro. @@ -28925,6 +29851,7 @@ Function names must obey the rules for @command{awk} identifiers. That is, they must begin with either a letter or an underscore, which may be followed by any number of letters, digits, and underscores. +Letter case in function names is significant. @item awk_value_t *(*function)(int num_actual_args, awk_value_t *result); This is a pointer to the C function that provides the desired @@ -28977,8 +29904,8 @@ The parameters are: @item funcp A pointer to the function to be called before @command{gawk} exits. The @code{data} parameter will be the original value of @code{arg0}. 
-The @code{exit_status} parameter is -the exit status value that @command{gawk} will pass to the @code{exit()} system call. +The @code{exit_status} parameter is the exit status value that +@command{gawk} intends to pass to the @code{exit()} system call. @item arg0 A pointer to private data which @command{gawk} saves in order to pass to @@ -29010,7 +29937,7 @@ is invoked with the @option{--version} option. By default, @command{gawk} reads text files as its input. It uses the value of @code{RS} to find the end of the record, and then uses @code{FS} -(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}). +(or @code{FIELDWIDTHS} or @code{FPAT}) to split it into fields (@pxref{Reading Files}). Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}). If you want, you can provide your own custom input parser. An input @@ -29047,7 +29974,7 @@ typedef struct awk_input_parser @{ const char *name; /* name of parser */ awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); - awk_const struct awk_input_parser *awk_const next; /* for use by gawk */ + awk_const struct awk_input_parser *awk_const next; /* for gawk */ @} awk_input_parser_t; @end example @@ -29126,11 +30053,11 @@ in the @code{struct stat}, or any combination of the above. Once @code{@var{XXX}_can_take_file()} has returned true, and @command{gawk} has decided to use your input parser, it calls -@code{@var{XXX}_take_control_of()}. That function then fills in at -least the @code{get_record} field of the @code{awk_input_buf_t}. It must -also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of -the fields that may be filled by @code{@var{XXX}_take_control_of()} -are as follows: +@code{@var{XXX}_take_control_of()}. That function then fills one of +either the @code{get_record} field or the @code{read_func} field in +the @code{awk_input_buf_t}. 
It must also ensure that @code{fd} is @emph{not} +set to @code{INVALID_HANDLE}. All of the fields that may be filled by +@code{@var{XXX}_take_control_of()} are as follows: @table @code @item void *opaque; @@ -29207,8 +30134,8 @@ to zero, so there is no need to set it unless an error occurs. If an error does occur, the function should return @code{EOF} and set @code{*errcode} to a non-zero value. In that case, if @code{*errcode} does not equal @minus{}1, @command{gawk} automatically updates -the @code{ERRNO} variable based on the value of @code{*errcode} (e.g., -setting @samp{*errcode = errno} should do the right thing). +the @code{ERRNO} variable based on the value of @code{*errcode}. +(In general, setting @samp{*errcode = errno} should do the right thing.) As an alternative to supplying a function that returns an input record, you may instead supply a function that simply reads bytes, and let @@ -29257,7 +30184,7 @@ Register the input parser pointed to by @code{input_parser} with An @dfn{output wrapper} is the mirror image of an input parser. It allows an extension to take over the output to a file opened -with the @samp{>} or @samp{>>} operators (@pxref{Redirection}). +with the @samp{>} or @samp{>>} I/O redirection operators (@pxref{Redirection}). The output wrapper is very similar to the input parser structure: @@ -29266,7 +30193,7 @@ typedef struct awk_output_wrapper @{ const char *name; /* name of the wrapper */ awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); - awk_const struct awk_output_wrapper *awk_const next; /* for use by gawk */ + awk_const struct awk_output_wrapper *awk_const next; /* for gawk */ @} awk_output_wrapper_t; @end example @@ -29290,7 +30217,9 @@ fill in appropriate members of the @code{awk_output_buf_t} structure, as described below, and return true if successful, false otherwise. @item awk_const struct output_wrapper *awk_const next; -This is for use by @command{gawk}. 
+This is for use by @command{gawk}; +therefore they are marked @code{awk_const} so that the extension cannot +modify them. @end table The @code{awk_output_buf_t} structure looks like this: @@ -29381,7 +30310,7 @@ typedef struct awk_two_way_processor @{ awk_bool_t (*take_control_of)(const char *name, awk_input_buf_t *inbuf, awk_output_buf_t *outbuf); - awk_const struct awk_two_way_processor *awk_const next; /* for use by gawk */ + awk_const struct awk_two_way_processor *awk_const next; /* for gawk */ @} awk_two_way_processor_t; @end example @@ -29404,7 +30333,9 @@ This function should fill in the @code{awk_input_buf_t} and @code{outbuf}, respectively. These structures were described earlier. @item awk_const struct two_way_processor *awk_const next; -This is for use by @command{gawk}. +This is for use by @command{gawk}; +therefore they are marked @code{awk_const} so that the extension cannot +modify them. @end table As with the input parser and output processor, you provide @@ -29535,10 +30466,14 @@ This routine cannot be used to update any of the predefined variables (such as @code{ARGC} or @code{NF}). @end table +An extension can look up the value of @command{gawk}'s special variables. +However, with the exception of the @code{PROCINFO} array, an extension +cannot change any of those variables. + @node Symbol table by cookie @subsubsection Variable Access and Update by Cookie -A @dfn{scalar cookie} is an opaque handle that provide access +A @dfn{scalar cookie} is an opaque handle that provides access to a global variable or array. It is an optimization that avoids looking up variables in @command{gawk}'s symbol table every time access is needed. This was discussed earlier, in @ref{General Data Types}. @@ -29561,10 +30496,10 @@ Here too, the built-in variables may not be updated. @end table It is not obvious at first glance how to work with scalar cookies or -what their @i{raison d@^etre} really is. 
In theory, the @code{sym_lookup()} +what their @i{raison d'@^etre} really is. In theory, the @code{sym_lookup()} and @code{sym_update()} routines are all you really need to work with -variables. For example, you might have code that looked up the value of -a variable, evaluated a condition, and then possibly changed the value +variables. For example, you might have code that looks up the value of +a variable, evaluates a condition, and then possibly changes the value of the variable based on the result of that evaluation, like so: @example @@ -29736,7 +30671,7 @@ are all the others be changed too?'' That's a great question. The answer is that no, it's not a problem. Internally, @command{gawk} uses reference-counted strings. This means -that many variables can share the same string, and @command{gawk} +that many variables can share the same string value, and @command{gawk} keeps track of the usage. When a variable's value changes, @command{gawk} simply decrements the reference count on the old value and updates the variable to use the new value. @@ -29919,7 +30854,7 @@ To @dfn{flatten} an array is create a structure that represents the full array in a fashion that makes it easy for C code to traverse the entire array. Test code in @file{extension/testext.c} does this, and also serves -as a nice example to show how to use the APIs. +as a nice example showing how to use the APIs. First, the @command{gawk} script that drives the test extension: @@ -29943,7 +30878,7 @@ This code creates an array with @code{split()} (@pxref{String Functions}) and then calls @code{dump_array_and_delete()}. That function looks up the array whose name is passed as the first argument, and deletes the element at the index passed in the second argument. -It then prints the return value and checks if the element +The @command{awk} code then prints the return value and checks if the element was indeed deleted. Here is the C code that implements @code{dump_array_and_delete()}. 
It has been edited slightly for presentation. @@ -30047,7 +30982,7 @@ element values. In addition, upon finding the element with the index that is supposed to be deleted, the function sets the @code{AWK_ELEMENT_DELETE} bit in the @code{flags} field of the element. When the array is released, @command{gawk} -traverses the flattened array, and deletes any element which +traverses the flattened array, and deletes any elements which have this flag bit set: @example @@ -30145,17 +31080,15 @@ into @command{gawk}, you have to retrieve the array cookie from the value passed in to @command{sym_update()} before doing anything else with it, like so: @example -awk_value_t index, value; +awk_value_t val; awk_array_t new_array; -make_const_string("an index", 8, & index); - new_array = create_array(); val.val_type = AWK_ARRAY; val.array_cookie = new_array; /* install array in the symbol table */ -sym_update("array", & index, & val); +sym_update("array", & val); new_array = val.array_cookie; /* YOU MUST DO THIS */ @end example @@ -30525,7 +31458,7 @@ the version string with @command{gawk}. @end enumerate @node Finding Extensions -@subsection How @command{gawk} Finds Extensions +@section How @command{gawk} Finds Extensions Compiled extensions have to be installed in a directory where @command{gawk} can find them. If @command{gawk} is configured and @@ -30985,13 +31918,15 @@ do_stat(int nargs, awk_value_t *result) awk_array_t array; int ret; struct stat sbuf; - int (*statfunc)(const char *path, struct stat *sbuf) = lstat; /* default */ + /* default is lstat() */ + int (*statfunc)(const char *path, struct stat *sbuf) = lstat; assert(result != NULL); if (nargs != 2 && nargs != 3) @{ if (do_lint) - lintwarn(ext_id, _("stat: called with wrong number of arguments")); + lintwarn(ext_id, + _("stat: called with wrong number of arguments")); return make_number(-1, result); @} @end example @@ -31271,7 +32206,7 @@ Corresponds to the @code{st_minor} field in the @code{struct stat}. 
This element is only present for device files. @item @code{statdata["blksize"]} @tab -Corresponds to the @code{st_blksize} field in the @code{struct stat}. +Corresponds to the @code{st_blksize} field in the @code{struct stat}, if this field is present on your system. (It is present on all modern systems that we know of.) @@ -31303,7 +32238,7 @@ Not all systems support all file types. @itemx result = fts(pathlist, flags, filedata) Walk the file trees provided in @code{pathlist} and fill in the @code{filedata} array as described below. @code{flags} is the bitwise -OR of several predefined constant values, also as described below. +OR of several predefined constant values, also described below. Return zero if there were no errors, otherwise return @minus{}1. @end table @@ -31348,9 +32283,9 @@ Immediately follow a symbolic link named in @code{pathlist}, whether or not @code{FTS_LOGICAL} is set. @item FTS_SEEDOT -By default, the @code{fts()} routines do not return entries for @file{.} -and @file{..}. This option causes entries for @file{..} to also -be included. (The extension always includes an entry for @file{.}, +By default, the @code{fts()} routines do not return entries for @file{.} (dot) +and @file{..} (dot-dot). This option causes entries for dot-dot to also +be included. (The extension always includes an entry for dot, see below.) @item FTS_XDEV @@ -31365,7 +32300,7 @@ The element for this index is itself an array. There are two cases. @c nested table @table @emph -@item The path is a file. +@item The path is a file In this case, the array contains two or three elements: @c doubly nested table @@ -31385,7 +32320,7 @@ If some kind of error was encountered, the array will also contain an element named @code{"error"}, which is a string describing the error. @end table -@item The path is a directory. +@item The path is a directory In this case, the array contains one element for each entry in the directory. 
If an entry is a file, that element is as for files, just described. If the entry is a directory, that element is (recursively), @@ -31439,7 +32374,7 @@ The arguments to @code{fnmatch()} are: The filename wildcard to match. @item string -The filename string, +The filename string. @item flag Either zero, or the bitwise OR of one or more of the @@ -31523,10 +32458,10 @@ else @end example @node Extension Sample Inplace -@subsection Enabling in-place file editing. +@subsection Enabling In-Place File Editing -The @code{inplace} extension emulates the @command{sed} @option{-i} option -which performs ``in placed'' editing of each input file. +The @code{inplace} extension emulates GNU @command{sed}'s @option{-i} option +which performs ``in place'' editing of each input file. It uses the bundled @file{inplace.awk} include file to invoke the extension properly: @@ -31576,7 +32511,7 @@ $ @kbd{gawk -i inplace -v INPLACE_SUFFIX=.bak '@{ gsub(/foo/, "bar") @}} @end example We leave it as an exercise to write a wrapper script that presents an -interface similar to the @command{sed} @option{-i} option. +interface similar to @samp{sed -i}. @node Extension Sample Ord @subsection Character and Numeric values: @code{ord()} and @code{chr()} @@ -31585,11 +32520,14 @@ The @code{ordchr} extension adds two functions, named @code{ord()} and @code{chr()}, as follows. @table @code +@item @@load "ordchr" +This is how you load the extension. + @item number = ord(string) Return the numeric value of the first character in @code{string}. @item char = chr(number) -Return the string whose first character is that represented by @code{number}. +Return a string whose first character is that represented by @code{number}. @end table These functions are inspired by the Pascal language functions @@ -31619,8 +32557,8 @@ they are read, with each entry returned as a record. The record consists of three fields. The first two are the inode number and the filename, separated by a forward slash character. 
On systems where the directory entry contains the file type, the record -has a third field which is a single letter indicating the type of the -file: +has a third field (also separated by a slash) which is a single letter +indicating the type of the file: @multitable @columnfractions .1 .9 @headitem Letter @tab File Type @@ -31718,8 +32656,8 @@ The array created by @code{reada()} is identical to that written by @code{writea()} in the sense that the contents are the same. However, due to implementation issues, the array traversal order of the recreated array is likely to be different from that of the original array. As array -traversal order in @command{awk} is by default undefined, this is not -(technically) a problem. If you need to guarantee a particular traversal +traversal order in @command{awk} is by default undefined, this is (technically) +not a problem. If you need to guarantee a particular traversal order, use the array sorting features in @command{gawk} to do so (@pxref{Array Sorting}). @@ -31746,6 +32684,9 @@ The @code{readfile} extension adds a single function named @code{readfile()}: @table @code +@item @@load "readfile" +This is how you load the extension. + @item result = readfile("/some/path") The argument is the name of the file to read. The return value is a string containing the entire contents of the requested file. Upon error, @@ -31780,11 +32721,13 @@ for more information. @cindex time @cindex sleep -These functions can be used by either invoking @command{gawk} +These functions can be used either by invoking @command{gawk} with a command-line argument of @samp{-l time} or by inserting @samp{@@load "time"} in your script. @table @code +@item @@load "time" +This is how you load the extension. 
@cindex @code{gettimeofday} time extension function @item the_time = gettimeofday() @@ -31878,6 +32821,7 @@ make && make check @ii{Build and check that all is OK} If you write an extension that you wish to share with other @command{gawk} users, please consider doing so through the @code{gawkextlib} project. +See the project's web site for more information. @iftex @part Part IV:@* Appendices @@ -32132,6 +33076,24 @@ More complete documentation of many of the previously undocumented features of the language. @end itemize +In 2012, a number of extensions that had been commonly available for +many years were finally added to POSIX. They are: + +@itemize @bullet +@item +The @code{fflush()} built-in function for flushing buffered output +(@pxref{I/O Functions}). + +@item +The @code{nextfile} statement +(@pxref{Nextfile Statement}). + +@item +The ability to delete all of an array at once with @samp{delete @var{array}} +(@pxref{Delete}). + +@end itemize + @xref{Common Extensions}, for a list of common extensions not permitted by the POSIX standard. @@ -32168,7 +33130,6 @@ The use of @code{func} as an abbreviation for @code{function} @item The @code{fflush()} built-in function for flushing buffered output (@pxref{I/O Functions}). -As of December 2012, this function is now standardized by POSIX. @ignore @item @@ -32517,6 +33478,7 @@ the three most widely-used freely available versions of @command{awk} @item @file{/dev/stdout} special file @tab X @tab X @tab X @item @file{/dev/stderr} special file @tab X @tab X @tab X @item @code{**} and @code{**=} operators @tab X @tab @tab X +@item @code{fflush()} function @tab X @tab X @tab X @item @code{func} keyword @tab X @tab @tab X @item @code{nextfile} statement @tab X @tab X @tab X @item @code{delete} without subscript @tab X @tab X @tab X @@ -32624,6 +33586,7 @@ to implementors to implement ranges in whatever way they choose. 
The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all cases: the default regexp matching; with @option{--traditional}, and with @option{--posix}; in all cases, @command{gawk} remains POSIX compliant. + @node Contributors @appendixsec Major Contributors to @command{gawk} @cindex @command{gawk}, list of contributors to @@ -32816,17 +33779,40 @@ Patrick T.J.@: McPhee contributed the code for dynamic loading in Windows32 environments. (This is no longer supported) + @item @cindex Haque, John -John Haque -reworked the @command{gawk} internals to use a byte-code engine, -providing the @command{gawk} debugger for @command{awk} programs. +John Haque made the following contributions: + +@itemize @minus +@item +The modifications to convert @command{gawk} +into a byte-code interpreter, including the debugger. + +@item +The additional modifications for support of arbitrary precision arithmetic. + +@item +The initial text of +@ref{Arbitrary Precision Arithmetic}. + +@item +The work to merge the three versions of @command{gawk} +into one, for the 4.1 release. +@end itemize @item @cindex Yawitz, Efraim Efraim Yawitz contributed the original text for @ref{Debugger}. @item +@cindex Schorr, Andrew +The development of the extension API first released with +@command{gawk} 4.1 was driven primarily by +Arnold Robbins and Andrew Schorr, with notable contributions from +the rest of the development team. + +@item @cindex Robbins, Arnold Arnold Robbins has been working on @command{gawk} since 1988, at first @@ -34152,6 +35138,14 @@ This is an embeddable @command{awk} interpreter derived from @command{mawk}. For more information see @uref{http://repo.hu/projects/libmawk/}. +@item @code{pawk} +@cindex @code{pawk}, @command{awk}-like facilities for Python +This is a Python module that claims to bring @command{awk}-like +features to Python. See @uref{https://github.com/alecthomas/pawk} +for more information. 
(This is not related to Nelson Beebe's +modified version of Brian Kernighan's @command{awk}, +described earlier.) + @item @w{QSE Awk} @cindex QSE Awk @cindex source code, QSE Awk @@ -34190,7 +35184,7 @@ maintainers of @command{gawk}. Everything in it applies specifically to * Future Extensions:: New features that may be implemented one day. * Implementation Limitations:: Some limitations of the implementation. * Extension Design:: Design notes about the extension API. -* Old Extension Mechansim:: Some compatibility for old extensions. +* Old Extension Mechanism:: Some compatibility for old extensions. @end menu @node Compatibility Mode @@ -34271,17 +35265,10 @@ Once you have made changes, you can use @samp{git diff} to produce a patch, and send that to the @command{gawk} maintainer; see @ref{Bugs}, for how to do that. -Finally, if you cannot install Git (e.g., if it hasn't been ported -yet to your operating system), you can use the Git--CVS gateway -to check out a copy using CVS, as follows: - -@example -cvs -d:pserver:anonymous@@pserver.git.sv.gnu.org:/gawk.git co -d gawk master -@end example - -Note that this gateway is flakey; you may have better luck using -a more modern version control system like Bazaar, that has a Git -plug-in for working with Git repositories. +Once upon a time there was Git--CVS gateway for use by people who could +not install Git. However, this gateway no longer works, so you may have +better luck using a more modern version control system like Bazaar, +that has a Git plug-in for working with Git repositories. @node Adding Code @appendixsubsec Adding New Features @@ -34384,7 +35371,7 @@ of @code{switch} statements, instead of just the plain pointer or character value. @item -Use @code{true}, @code{false} for @code{bool} values, +Use @code{true} and @code{false} for @code{bool} values, the @code{NULL} symbolic constant for pointer values, and the character constant @code{'\0'} where appropriate, instead of @code{1} and @code{0}. 
@@ -34839,7 +35826,7 @@ functions. @command{gawk} included some sample extensions, of which a few were really useful. However, it was clear from the outset that the extension -mechanism was bolted onto the side and was not really thought out. +mechanism was bolted onto the side and was not really well thought out. @menu * Old Extension Problems:: Problems with the old mechanism. @@ -34900,7 +35887,7 @@ The API should provide @emph{binary} compatibility across @command{gawk} releases as long as the API itself does not change. @item -The API should enable extensions written in C to have roughly the +The API should enable extensions written in C or C++ to have roughly the same ``appearance'' to @command{awk}-level code as @command{awk} functions do. This means that extensions should have: @@ -35040,7 +36027,7 @@ to know. @item Similarly, the extension passes a ``name space'' into @command{gawk} -when it registers each extension function. This allows a future +when it registers each extension function. This accommodates a possible future mechanism for grouping extension functions and possibly avoiding name conflicts. @end itemize @@ -35048,17 +36035,17 @@ conflicts. Of course, as of this writing, no decisions have been made with respect to any of the above. -@node Old Extension Mechansim +@node Old Extension Mechanism @appendixsec Compatibility For Old Extensions @ref{Dynamic Extensions}, describes the supported API and mechanisms for writing extensions for @command{gawk}. This API was introduced -in @strong{FIXME: VERSION}. However, for many years @command{gawk} +in @value{PVERSION} 4.1. However, for many years @command{gawk} provided an extension mechanism that required knowledge of @command{gawk} internals and that was not as well designed. -In order to provide a transition period, @command{gawk} version -@strong{FIXME: VERSION} continues to support the original extension mechanism. 
+In order to provide a transition period, @command{gawk} @value{PVERSION} +4.1 continues to support the original extension mechanism. This will be true for the life of exactly one major release. This support will be withdrawn, and removed from the source code, at the next major release. @@ -35075,7 +36062,7 @@ Just as in previous versions, you load an old-style extension with the This function in turn finds and loads the shared object file containing the extension and calls its @code{dl_load()} C routine. -Because original-style and new-style extensions use different initialiation +Because original-style and new-style extensions use different initialization routines (@code{dl_load()} versus @code{dlload()}), they may safely be installed in the same directory (to be found by @env{AWKLIBPATH}) without conflict. @@ -35908,6 +36895,11 @@ tested. If the condition is satisfied, the pattern is said to @dfn{match} the input record. A typical pattern might compare the input record against a regular expression. (@xref{Pattern Overview}.) +@item PEBKAC +An acronym describing what is possibly the most frequent +source of computer usage problems. (Problem Exists Between +Keyboard And Chair.) + @item POSIX The name for a series of standards @c being developed by the IEEE @@ -37429,3 +38421,36 @@ Suggestions: % 1. Standardize the error messages from the functions and programs % in the two sample code chapters. % 2. Nuke the BBS stuff and use something that won't be obsolete +% 3. 
Turn the advanced notes into sidebars by using @cartouche
+
+Better sidebars can almost sort of be done with:
+
+ @ifdocbook
+ @macro @sidebar{title, content}
+ @inlinefmt{docbook, <sidebar><title>}
+ \title\
+ @inlinefmt{docbook, </title>}
+ \content\
+ @inlinefmt{docbook, </sidebar>}
+ @end macro
+ @end ifdocbook
+
+
+ @ifnotdocbook
+ @macro @sidebar{title, content}
+ @cartouche
+ @center @b{\title\}
+
+ \content\
+ @end cartouche
+ @end macro
+ @end ifnotdocbook
+
+But to use it you have to say
+
+ @sidebar{Title Here,
+ @include file-with-content
+ }
+
+which sorta sucks.
+