Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 12578
1 files changed, 8116 insertions, 4462 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 8cd7e38e..fb008d74 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -20,9 +20,9 @@
@c applies to and all the info about who's publishing this edition
@c These apply across the board.
-@set UPDATE-MONTH February, 2012
+@set UPDATE-MONTH November, 2012
@set VERSION 4.0
-@set PATCHLEVEL 1
+@set PATCHLEVEL 2
@set FSF
@@ -66,6 +66,15 @@
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@end ifdocbook
+@ifxml
+@set DOCUMENT book
+@set CHAPTER chapter
+@set APPENDIX appendix
+@set SECTION section
+@set SUBSECTION subsection
+@set DARKCORNER (d.c.)
+@set COMMONEXT (c.e.)
+@end ifxml
@ifplaintext
@set DOCUMENT book
@set CHAPTER chapter
@@ -285,22 +294,24 @@ particular records in a file and perform operations upon them.
* Arrays:: The description and use of arrays. Also
includes array-oriented control statements.
* Functions:: Built-in and user-defined functions.
+* Library Functions:: A Library of @command{awk} Functions.
+* Sample Programs:: Many @command{awk} programs with complete
+ explanations.
* Internationalization:: Getting @command{gawk} to speak your
language.
-* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
- @command{gawk}.
* Advanced Features:: Stuff for advanced users, specific to
@command{gawk}.
-* Library Functions:: A Library of @command{awk} Functions.
-* Sample Programs:: Many @command{awk} programs with complete
- explanations.
* Debugger:: The @code{gawk} debugger.
+* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
+ @command{gawk}.
+* Dynamic Extensions:: Adding new built-in functions to
+ @command{gawk}.
* Language History:: The evolution of the @command{awk}
language.
* Installation:: Installing @command{gawk} under various
operating systems.
-* Notes:: Notes about @command{gawk} extensions and
- possible future work.
+* Notes:: Notes about adding things to @command{gawk}
+ and possible future work.
* Basic Concepts:: A very quick introduction to programming
concepts.
* Glossary:: An explanation of some unfamiliar terms.
@@ -310,416 +321,532 @@ particular records in a file and perform operations upon them.
* Index:: Concept and Variable Index.
@detailmenu
-* History:: The history of @command{gawk} and
- @command{awk}.
-* Names:: What name to use to find @command{awk}.
-* This Manual:: Using this @value{DOCUMENT}. Includes
- sample input files that you can use.
-* Conventions:: Typographical Conventions.
-* Manual History:: Brief history of the GNU project and this
- @value{DOCUMENT}.
-* How To Contribute:: Helping to save the world.
-* Acknowledgments:: Acknowledgments.
-* Running gawk:: How to run @command{gawk} programs;
- includes command-line syntax.
-* One-shot:: Running a short throwaway @command{awk}
- program.
-* Read Terminal:: Using no input files (input from terminal
- instead).
-* Long:: Putting permanent @command{awk} programs in
- files.
-* Executable Scripts:: Making self-contained @command{awk}
- programs.
-* Comments:: Adding documentation to @command{gawk}
- programs.
-* Quoting:: More discussion of shell quoting issues.
-* DOS Quoting:: Quoting in Windows Batch Files.
-* Sample Data Files:: Sample data files for use in the
- @command{awk} programs illustrated in this
- @value{DOCUMENT}.
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example using two
- rules.
-* More Complex:: A more complex example.
-* Statements/Lines:: Subdividing or combining statements into
- lines.
-* Other Features:: Other Features of @command{awk}.
-* When:: When to use @command{gawk} and when to use
- other things.
-* Command Line:: How to run @command{awk}.
-* Options:: Command-line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* Naming Standard Input:: How to specify standard input with other
- files.
-* Environment Variables:: The environment variables @command{gawk}
- uses.
-* AWKPATH Variable:: Searching directories for @command{awk}
- programs.
-* Other Environment Variables:: The environment variables.
-* Exit Status:: @command{gawk}'s exit status.
-* Include Files:: Including other files into your program.
-* Obsolete:: Obsolete Options and/or features.
-* Undocumented:: Undocumented Options and Features.
-* Regexp Usage:: How to Use Regular Expressions.
-* Escape Sequences:: How to write nonprinting characters.
-* Regexp Operators:: Regular Expression Operators.
-* Bracket Expressions:: What can go between @samp{[...]}.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
-* Leftmost Longest:: How much text matches.
-* Computed Regexps:: Using Dynamic Regexps.
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Nonconstant Fields:: Nonconstant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Default Field Splitting:: How fields are normally separated.
-* Regexp Field Splitting:: Using regexps as the field separator.
-* Single Character Fields:: Making each character a separate field.
-* Command Line Field Separator:: Setting @code{FS} from the command-line.
-* Field Splitting Summary:: Some final points and a summary table.
-* Constant Size:: Reading constant width data.
-* Splitting By Content:: Defining Fields By Content
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program
- control using the @code{getline} function.
-* Plain Getline:: Using @code{getline} with no arguments.
-* Getline/Variable:: Using @code{getline} into a variable.
-* Getline/File:: Using @code{getline} from a file.
-* Getline/Variable/File:: Using @code{getline} into a variable from a
- file.
-* Getline/Pipe:: Using @code{getline} from a pipe.
-* Getline/Variable/Pipe:: Using @code{getline} into a variable from a
- pipe.
-* Getline/Coprocess:: Using @code{getline} from a coprocess.
-* Getline/Variable/Coprocess:: Using @code{getline} into a variable from a
- coprocess.
-* Getline Notes:: Important things to know about
- @code{getline}.
-* Getline Summary:: Summary of @code{getline} Variants.
-* Read Timeout:: Reading input with a timeout.
-* Command line directories:: What happens if you put a directory on the
- command line.
-* Print:: The @code{print} statement.
-* Print Examples:: Simple examples of @code{print} statements.
-* Output Separators:: The output separators and how to change
- them.
-* OFMT:: Controlling Numeric Output With
- @code{print}.
-* Printf:: The @code{printf} statement.
-* Basic Printf:: Syntax of the @code{printf} statement.
-* Control Letters:: Format-control letters.
-* Format Modifiers:: Format-specification modifiers.
-* Printf Examples:: Several examples.
-* Redirection:: How to redirect output to multiple files
- and pipes.
-* Special Files:: File name interpretation in @command{gawk}.
- @command{gawk} allows access to inherited
- file descriptors.
-* Special FD:: Special files for I/O.
-* Special Network:: Special files for network communications.
-* Special Caveats:: Things to watch out for.
-* Close Files And Pipes:: Closing Input and Output Files and Pipes.
-* Values:: Constants, Variables, and Regular
- Expressions.
-* Constants:: String, numeric and regexp constants.
-* Scalar Constants:: Numeric and string constants.
-* Nondecimal-numbers:: What are octal and hex numbers.
-* Regexp Constants:: Regular Expression constants.
-* Using Constant Regexps:: When and how to use a regexp constant.
-* Variables:: Variables give names to values for later
- use.
-* Using Variables:: Using variables in your programs.
-* Assignment Options:: Setting variables on the command-line and a
- summary of command-line syntax. This is an
- advanced method of input.
-* Conversion:: The conversion of strings to numbers and
- vice versa.
-* All Operators:: @command{gawk}'s operators.
-* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-},
- etc.)
-* Concatenation:: Concatenating strings.
-* Assignment Ops:: Changing the value of a variable or a
- field.
-* Increment Ops:: Incrementing the numeric value of a
- variable.
-* Truth Values and Conditions:: Testing for true and false.
-* Truth Values:: What is ``true'' and what is ``false''.
-* Typing and Comparison:: How variables acquire types and how this
- affects comparison of numbers and strings
- with @samp{<}, etc.
-* Variable Typing:: String type versus numeric type.
-* Comparison Operators:: The comparison operators.
-* POSIX String Comparison:: String comparison with POSIX rules.
-* Boolean Ops:: Combining comparison expressions using
- boolean operators @samp{||} (``or''),
- @samp{&&} (``and'') and @samp{!} (``not'').
-* Conditional Exp:: Conditional expressions select between two
- subexpressions under control of a third
- subexpression.
-* Function Calls:: A function call is an expression.
-* Precedence:: How various operators nest.
-* Locales:: How the locale affects things.
-* Pattern Overview:: What goes into a pattern.
-* Regexp Patterns:: Using regexps as patterns.
-* Expression Patterns:: Any expression can be used as a pattern.
-* Ranges:: Pairs of patterns specify record ranges.
-* BEGIN/END:: Specifying initialization and cleanup
- rules.
-* Using BEGIN/END:: How and why to use BEGIN/END rules.
-* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
-* BEGINFILE/ENDFILE:: Two special patterns for advanced control.
-* Empty:: The empty pattern, which matches every
- record.
-* Using Shell Variables:: How to use shell variables with
- @command{awk}.
-* Action Overview:: What goes into an action.
-* Statements:: Describes the various control statements in
- detail.
-* If Statement:: Conditionally execute some @command{awk}
- statements.
-* While Statement:: Loop until some condition is satisfied.
-* Do Statement:: Do specified action while looping until
- some condition is satisfied.
-* For Statement:: Another looping statement, that provides
- initialization and increment clauses.
-* Switch Statement:: Switch/case evaluation for conditional
- execution of statements based on a value.
-* Break Statement:: Immediately exit the innermost enclosing
- loop.
-* Continue Statement:: Skip to the end of the innermost enclosing
- loop.
-* Next Statement:: Stop processing the current input record.
-* Nextfile Statement:: Stop processing the current file.
-* Exit Statement:: Stop execution of @command{awk}.
-* Built-in Variables:: Summarizes the built-in variables.
-* User-modified:: Built-in variables that you change to
- control @command{awk}.
-* Auto-set:: Built-in variables where @command{awk}
- gives you information.
-* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}.
-* Array Basics:: The basics of arrays.
-* Array Intro:: Introduction to Arrays
-* Reference to Elements:: How to examine one element of an array.
-* Assigning Elements:: How to change an element of an array.
-* Array Example:: Basic Example of an Array
-* Scanning an Array:: A variation of the @code{for} statement. It
- loops through the indices of an array's
- existing elements.
-* Controlling Scanning:: Controlling the order in which arrays are
- scanned.
-* Delete:: The @code{delete} statement removes an
- element from an array.
-* Numeric Array Subscripts:: How to use numbers as subscripts in
- @command{awk}.
-* Uninitialized Subscripts:: Using Uninitialized variables as
- subscripts.
-* Multi-dimensional:: Emulating multidimensional arrays in
- @command{awk}.
-* Multi-scanning:: Scanning multidimensional arrays.
-* Arrays of Arrays:: True multidimensional arrays.
-* Built-in:: Summarizes the built-in functions.
-* Calling Built-in:: How to call built-in functions.
-* Numeric Functions:: Functions that work with numbers, including
- @code{int()}, @code{sin()} and
- @code{rand()}.
-* String Functions:: Functions for string manipulation, such as
- @code{split()}, @code{match()} and
- @code{sprintf()}.
-* Gory Details:: More than you want to know about @samp{\}
- and @samp{&} with @code{sub()},
- @code{gsub()}, and @code{gensub()}.
-* I/O Functions:: Functions for files and shell commands.
-* Time Functions:: Functions for dealing with timestamps.
-* Bitwise Functions:: Functions for bitwise operations.
-* Type Functions:: Functions for type information.
-* I18N Functions:: Functions for string translation.
-* User-defined:: Describes User-defined functions in detail.
-* Definition Syntax:: How to write definitions and what they
- mean.
-* Function Example:: An example function definition and what it
- does.
-* Function Caveats:: Things to watch out for.
-* Calling A Function:: Don't use spaces.
-* Variable Scope:: Controlling variable scope.
-* Pass By Value/Reference:: Passing parameters.
-* Return Statement:: Specifying the value a function returns.
-* Dynamic Typing:: How variable types can change at runtime.
-* Indirect Calls:: Choosing the function to call at runtime.
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU @code{gettext} works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging @code{printf} arguments.
-* I18N Portability:: @command{awk}-level portability issues.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: @command{gawk} is also internationalized.
-* Floating-point Programming:: Effective floating-point programming.
-* Floating-point Representation:: Binary floating-point representation.
-* Floating-point Context:: Floating-point context.
-* Rounding Mode:: Floating-point rounding mode.
-* Arbitrary Precision Floats:: Arbitrary precision floating-point
- arithmetic with @command{gawk}.
-* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode.
-* Floating-point Constants:: Representing floating-point constants.
-* Changing Precision:: Changing the precision of a number.
-* Exact Arithmetic:: Exact arithmetic with floating-point numbers.
-* Integer Programming:: Effective integer programming.
-* Arbitrary Precision Integers:: Arbitrary precision integer
- arithmetic with @command{gawk}.
-* MPFR and GMP Libraries:: Information about the MPFR and GMP libraries.
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal
- and sorting arrays.
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use @code{asort()} and
- @code{asorti()}.
-* Two-way I/O:: Two-way communications with another
- process.
-* TCP/IP Networking:: Using @command{gawk} for network
- programming.
-* Profiling:: Profiling your @command{awk} programs.
-* Library Names:: How to best name private global variables
- in library functions.
-* General Functions:: Functions that are of general use.
-* Strtonum Function:: A replacement for the built-in
- @code{strtonum()} function.
-* Assert Function:: A function for assertions in @command{awk}
- programs.
-* Round Function:: A function for rounding if @code{sprintf()}
- does not do it correctly.
-* Cliff Random Function:: The Cliff Random Number Generator.
-* Ordinal Functions:: Functions for using characters as numbers
- and vice versa.
-* Join Function:: A function to join an array into a string.
-* Gettimeofday Function:: A function to get formatted times.
-* Data File Management:: Functions for managing command-line data
- files.
-* Filetrans Function:: A function for handling data file
- transitions.
-* Rewind Function:: A function for rereading the current file.
-* File Checking:: Checking that data files are readable.
-* Empty Files:: Checking for zero-length files.
-* Ignoring Assigns:: Treating assignments as file names.
-* Getopt Function:: A function for processing command-line
- arguments.
-* Passwd Functions:: Functions for getting user information.
-* Group Functions:: Functions for getting group information.
-* Walking Arrays:: A function to walk arrays of arrays.
-* Running Examples:: How to run these examples.
-* Clones:: Clones of common utilities.
-* Cut Program:: The @command{cut} utility.
-* Egrep Program:: The @command{egrep} utility.
-* Id Program:: The @command{id} utility.
-* Split Program:: The @command{split} utility.
-* Tee Program:: The @command{tee} utility.
-* Uniq Program:: The @command{uniq} utility.
-* Wc Program:: The @command{wc} utility.
-* Miscellaneous Programs:: Some interesting @command{awk} programs.
-* Dupword Program:: Finding duplicated words in a document.
-* Alarm Program:: An alarm clock.
-* Translate Program:: A program similar to the @command{tr}
- utility.
-* Labels Program:: Printing mailing labels.
-* Word Sorting:: A program to produce a word usage count.
-* History Sorting:: Eliminating duplicate entries from a
- history file.
-* Extract Program:: Pulling out programs from Texinfo source
- files.
-* Simple Sed:: A Simple Stream Editor.
-* Igawk Program:: A wrapper for @command{awk} that includes
- files.
-* Anagram Program:: Finding anagrams from a dictionary.
-* Signature Program:: People do amazing things with too much time
- on their hands.
-* Debugging:: Introduction to @command{gawk} Debugger.
-* Debugging Concepts:: Debugging in General.
-* Debugging Terms:: Additional Debugging Concepts.
-* Awk Debugging:: Awk Debugging.
-* Sample Debugging Session:: Sample Debugging Session.
-* Debugger Invocation:: How to Start the Debugger.
-* Finding The Bug:: Finding the Bug.
-* List of Debugger Commands:: Main Commands.
-* Breakpoint Control:: Control of Breakpoints.
-* Debugger Execution Control:: Control of Execution.
-* Viewing And Changing Data:: Viewing and Changing Data.
-* Execution Stack:: Dealing with the Stack.
-* Debugger Info:: Obtaining Information about the Program and
- the Debugger State.
-* Miscellaneous Debugger Commands:: Miscellaneous Commands.
-* Readline Support:: Readline Support.
-* Limitations:: Limitations and Future Plans.
-* V7/SVR3.1:: The major changes between V7 and System V
- Release 3.1.
-* SVR4:: Minor changes between System V Releases 3.1
- and 4.
-* POSIX:: New features from the POSIX standard.
-* BTL:: New features from Brian Kernighan's version
- of @command{awk}.
-* POSIX/GNU:: The extensions in @command{gawk} not in
- POSIX @command{awk}.
-* Common Extensions:: Common Extensions Summary.
-* Ranges and Locales:: How locales used to affect regexp ranges.
-* Contributors:: The major contributors to @command{gawk}.
-* Gawk Distribution:: What is in the @command{gawk} distribution.
-* Getting:: How to get the distribution.
-* Extracting:: How to extract the distribution.
-* Distribution contents:: What is in the distribution.
-* Unix Installation:: Installing @command{gawk} under various
- versions of Unix.
-* Quick Installation:: Compiling @command{gawk} under Unix.
-* Additional Configuration Options:: Other compile-time options.
-* Configuration Philosophy:: How it's all supposed to work.
-* Non-Unix Installation:: Installation on Other Operating Systems.
-* PC Installation:: Installing and Compiling @command{gawk} on
- MS-DOS and OS/2.
-* PC Binary Installation:: Installing a prepared distribution.
-* PC Compiling:: Compiling @command{gawk} for MS-DOS,
- Windows32, and OS/2.
-* PC Testing:: Testing @command{gawk} on PC systems.
-* PC Using:: Running @command{gawk} on MS-DOS, Windows32
- and OS/2.
-* Cygwin:: Building and running @command{gawk} for
- Cygwin.
-* MSYS:: Using @command{gawk} In The MSYS
- Environment.
-* VMS Installation:: Installing @command{gawk} on VMS.
-* VMS Compilation:: How to compile @command{gawk} under VMS.
-* VMS Installation Details:: How to install @command{gawk} under VMS.
-* VMS Running:: How to run @command{gawk} under VMS.
-* VMS Old Gawk:: An old version comes with some VMS systems.
-* Bugs:: Reporting Problems and Bugs.
-* Other Versions:: Other freely available @command{awk}
- implementations.
-* Compatibility Mode:: How to disable certain @command{gawk}
- extensions.
-* Additions:: Making Additions To @command{gawk}.
-* Accessing The Source:: Accessing the Git repository.
-* Adding Code:: Adding code to the main body of
- @command{gawk}.
-* New Ports:: Porting @command{gawk} to a new operating
- system.
-* Dynamic Extensions:: Adding new built-in functions to
- @command{gawk}.
-* Internals:: A brief look at some @command{gawk}
- internals.
-* Plugin License:: A note about licensing.
-* Loading Extensions:: How to load dynamic extensions.
-* Sample Library:: A example of new functions.
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-* Future Extensions:: New features that may be implemented one
- day.
-* Basic High Level:: The high level view.
-* Basic Data Typing:: A very quick intro to data types.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+* History:: The history of @command{gawk} and
+ @command{awk}.
+* Names:: What name to use to find
+ @command{awk}.
+* This Manual:: Using this @value{DOCUMENT}. Includes
+ sample input files that you can use.
+* Conventions:: Typographical Conventions.
+* Manual History:: Brief history of the GNU project and
+ this @value{DOCUMENT}.
+* How To Contribute:: Helping to save the world.
+* Acknowledgments:: Acknowledgments.
+* Running gawk:: How to run @command{gawk} programs;
+ includes command-line syntax.
+* One-shot:: Running a short throwaway
+ @command{awk} program.
+* Read Terminal:: Using no input files (input from
+ terminal instead).
+* Long:: Putting permanent @command{awk}
+ programs in files.
+* Executable Scripts:: Making self-contained @command{awk}
+ programs.
+* Comments:: Adding documentation to @command{gawk}
+ programs.
+* Quoting:: More discussion of shell quoting
+ issues.
+* DOS Quoting:: Quoting in Windows Batch Files.
+* Sample Data Files:: Sample data files for use in the
+ @command{awk} programs illustrated in
+ this @value{DOCUMENT}.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example using
+ two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements
+ into lines.
+* Other Features:: Other Features of @command{awk}.
+* When:: When to use @command{gawk} and when to
+ use other things.
+* Command Line:: How to run @command{awk}.
+* Options:: Command-line options and their
+ meanings.
+* Other Arguments:: Input file names and variable
+ assignments.
+* Naming Standard Input:: How to specify standard input with
+ other files.
+* Environment Variables:: The environment variables
+ @command{gawk} uses.
+* AWKPATH Variable:: Searching directories for
+ @command{awk} programs.
+* AWKLIBPATH Variable:: Searching directories for
+ @command{awk} shared libraries.
+* Other Environment Variables:: The environment variables.
+* Exit Status:: @command{gawk}'s exit status.
+* Include Files:: Including other files into your
+ program.
+* Loading Shared Libraries:: Loading shared libraries into your
+ program.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write nonprinting characters.
+* Regexp Operators:: Regular Expression Operators.
+* Bracket Expressions:: What can go between @samp{[...]}.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+* Records:: Controlling how data is split into
+ records.
+* Fields:: An introduction to fields.
+* Nonconstant Fields:: Nonconstant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change
+ it.
+* Default Field Splitting:: How fields are normally separated.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate
+ field.
+* Command Line Field Separator:: Setting @code{FS} from the
+ command-line.
+* Field Splitting Summary:: Some final points and a summary table.
+* Constant Size:: Reading constant width data.
+* Splitting By Content:: Defining Fields By Content
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program
+ control using the @code{getline}
+ function.
+* Plain Getline:: Using @code{getline} with no
+ arguments.
+* Getline/Variable:: Using @code{getline} into a variable.
+* Getline/File:: Using @code{getline} from a file.
+* Getline/Variable/File:: Using @code{getline} into a variable
+ from a file.
+* Getline/Pipe:: Using @code{getline} from a pipe.
+* Getline/Variable/Pipe:: Using @code{getline} into a variable
+ from a pipe.
+* Getline/Coprocess:: Using @code{getline} from a coprocess.
+* Getline/Variable/Coprocess:: Using @code{getline} into a variable
+ from a coprocess.
+* Getline Notes:: Important things to know about
+ @code{getline}.
+* Getline Summary:: Summary of @code{getline} Variants.
+* Read Timeout:: Reading input with a timeout.
+* Command line directories:: What happens if you put a directory on
+ the command line.
+* Print:: The @code{print} statement.
+* Print Examples:: Simple examples of @code{print}
+ statements.
+* Output Separators:: The output separators and how to
+ change them.
+* OFMT:: Controlling Numeric Output With
+ @code{print}.
+* Printf:: The @code{printf} statement.
+* Basic Printf:: Syntax of the @code{printf} statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+* Redirection:: How to redirect output to multiple
+ files and pipes.
+* Special Files:: File name interpretation in
+ @command{gawk}. @command{gawk} allows
+ access to inherited file descriptors.
+* Special FD:: Special files for I/O.
+* Special Network:: Special files for network
+ communications.
+* Special Caveats:: Things to watch out for.
+* Close Files And Pipes:: Closing Input and Output Files and
+ Pipes.
+* Values:: Constants, Variables, and Regular
+ Expressions.
+* Constants:: String, numeric and regexp constants.
+* Scalar Constants:: Numeric and string constants.
+* Nondecimal-numbers:: What are octal and hex numbers.
+* Regexp Constants:: Regular Expression constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for
+ later use.
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command-line
+ and a summary of command-line syntax.
+ This is an advanced method of input.
+* Conversion:: The conversion of strings to numbers
+ and vice versa.
+* All Operators:: @command{gawk}'s operators.
+* Arithmetic Ops:: Arithmetic operations (@samp{+},
+ @samp{-}, etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a
+ field.
+* Increment Ops:: Incrementing the numeric value of a
+ variable.
+* Truth Values and Conditions:: Testing for true and false.
+* Truth Values:: What is ``true'' and what is
+ ``false''.
+* Typing and Comparison:: How variables acquire types and how
+ this affects comparison of numbers and
+ strings with @samp{<}, etc.
+* Variable Typing:: String type versus numeric type.
+* Comparison Operators:: The comparison operators.
+* POSIX String Comparison:: String comparison with POSIX rules.
+* Boolean Ops:: Combining comparison expressions using
+ boolean operators @samp{||} (``or''),
+ @samp{&&} (``and'') and @samp{!}
+ (``not'').
+* Conditional Exp:: Conditional expressions select between
+ two subexpressions under control of a
+ third subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+* Locales:: How the locale affects things.
+* Pattern Overview:: What goes into a pattern.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a
+ pattern.
+* Ranges:: Pairs of patterns specify record
+ ranges.
+* BEGIN/END:: Specifying initialization and cleanup
+ rules.
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+* BEGINFILE/ENDFILE:: Two special patterns for advanced
+ control.
+* Empty:: The empty pattern, which matches every
+ record.
+* Using Shell Variables:: How to use shell variables with
+ @command{awk}.
+* Action Overview:: What goes into an action.
+* Statements:: Describes the various control
+ statements in detail.
+* If Statement:: Conditionally execute some
+ @command{awk} statements.
+* While Statement:: Loop until some condition is
+ satisfied.
+* Do Statement:: Do specified action while looping
+ until some condition is satisfied.
+* For Statement:: Another looping statement, that
+ provides initialization and increment
+ clauses.
+* Switch Statement:: Switch/case evaluation for conditional
+ execution of statements based on a
+ value.
+* Break Statement:: Immediately exit the innermost
+ enclosing loop.
+* Continue Statement:: Skip to the end of the innermost
+ enclosing loop.
+* Next Statement:: Stop processing the current input
+ record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of @command{awk}.
+* Built-in Variables:: Summarizes the built-in variables.
+* User-modified:: Built-in variables that you change to
+ control @command{awk}.
+* Auto-set:: Built-in variables where @command{awk}
+ gives you information.
+* ARGC and ARGV:: Ways to use @code{ARGC} and
+ @code{ARGV}.
+* Array Basics:: The basics of arrays.
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an
+ array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the @code{for}
+ statement. It loops through the
+ indices of an array's existing
+ elements.
+* Controlling Scanning:: Controlling the order in which arrays
+ are scanned.
+* Delete:: The @code{delete} statement removes an
+ element from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ @command{awk}.
+* Uninitialized Subscripts:: Using Uninitialized variables as
+ subscripts.
+* Multi-dimensional:: Emulating multidimensional arrays in
+ @command{awk}.
+* Multi-scanning:: Scanning multidimensional arrays.
+* Arrays of Arrays:: True multidimensional arrays.
+* Built-in:: Summarizes the built-in functions.
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers,
+ including @code{int()}, @code{sin()}
+ and @code{rand()}.
+* String Functions:: Functions for string manipulation,
+ such as @code{split()}, @code{match()}
+ and @code{sprintf()}.
+* Gory Details:: More than you want to know about
+ @samp{\} and @samp{&} with
+ @code{sub()}, @code{gsub()}, and
+ @code{gensub()}.
+* I/O Functions:: Functions for files and shell
+ commands.
+* Time Functions:: Functions for dealing with timestamps.
+* Bitwise Functions:: Functions for bitwise operations.
+* Type Functions:: Functions for type information.
+* I18N Functions:: Functions for string translation.
+* User-defined:: Describes User-defined functions in
+ detail.
+* Definition Syntax:: How to write definitions and what they
+ mean.
+* Function Example:: An example function definition and
+ what it does.
+* Function Caveats:: Things to watch out for.
+* Calling A Function:: Don't use spaces.
+* Variable Scope:: Controlling variable scope.
+* Pass By Value/Reference:: Passing parameters.
+* Return Statement:: Specifying the value a function
+ returns.
+* Dynamic Typing:: How variable types can change at
+ runtime.
+* Indirect Calls:: Choosing the function to call at
+ runtime.
+* Library Names:: How to best name private global
+ variables in library functions.
+* General Functions:: Functions that are of general use.
+* Strtonum Function:: A replacement for the built-in
+ @code{strtonum()} function.
+* Assert Function:: A function for assertions in
+ @command{awk} programs.
+* Round Function:: A function for rounding if
+ @code{sprintf()} does not do it
+ correctly.
+* Cliff Random Function:: The Cliff Random Number Generator.
+* Ordinal Functions:: Functions for using characters as
+ numbers and vice versa.
+* Join Function:: A function to join an array into a
+ string.
+* Getlocaltime Function:: A function to get formatted times.
+* Data File Management:: Functions for managing command-line
+ data files.
+* Filetrans Function:: A function for handling data file
+ transitions.
+* Rewind Function:: A function for rereading the current
+ file.
+* File Checking:: Checking that data files are readable.
+* Empty Files:: Checking for zero-length files.
+* Ignoring Assigns:: Treating assignments as file names.
+* Getopt Function:: A function for processing command-line
+ arguments.
+* Passwd Functions:: Functions for getting user
+ information.
+* Group Functions:: Functions for getting group
+ information.
+* Walking Arrays:: A function to walk arrays of arrays.
+* Running Examples:: How to run these examples.
+* Clones:: Clones of common utilities.
+* Cut Program:: The @command{cut} utility.
+* Egrep Program:: The @command{egrep} utility.
+* Id Program:: The @command{id} utility.
+* Split Program:: The @command{split} utility.
+* Tee Program:: The @command{tee} utility.
+* Uniq Program:: The @command{uniq} utility.
+* Wc Program:: The @command{wc} utility.
+* Miscellaneous Programs:: Some interesting @command{awk}
+ programs.
+* Dupword Program:: Finding duplicated words in a
+ document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the @command{tr}
+ utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage
+ count.
+* History Sorting:: Eliminating duplicate entries from a
+ history file.
+* Extract Program:: Pulling out programs from Texinfo
+ source files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for @command{awk} that
+ includes files.
+* Anagram Program:: Finding anagrams from a dictionary.
+* Signature Program:: People do amazing things with too much
+ time on their hands.
+* I18N and L10N:: Internationalization and Localization.
+* Explaining gettext:: How GNU @code{gettext} works.
+* Programmer i18n:: Features for the programmer.
+* Translator i18n:: Features for the translator.
+* String Extraction:: Extracting marked strings.
+* Printf Ordering:: Rearranging @code{printf} arguments.
+* I18N Portability:: @command{awk}-level portability
+ issues.
+* I18N Example:: A simple i18n example.
+* Gawk I18N:: @command{gawk} is also
+ internationalized.
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array
+ traversal and sorting arrays.
+* Controlling Array Traversal:: How to use @code{PROCINFO["sorted_in"]}.
+* Array Sorting Functions:: How to use @code{asort()} and
+ @code{asorti()}.
+* Two-way I/O:: Two-way communications with another
+ process.
+* TCP/IP Networking:: Using @command{gawk} for network
+ programming.
+* Profiling:: Profiling your @command{awk} programs.
+* Debugging:: Introduction to @command{gawk}
+ debugger.
+* Debugging Concepts:: Debugging in General.
+* Debugging Terms:: Additional Debugging Concepts.
+* Awk Debugging:: Awk Debugging.
+* Sample Debugging Session:: Sample debugging session.
+* Debugger Invocation:: How to Start the Debugger.
+* Finding The Bug:: Finding the Bug.
+* List of Debugger Commands:: Main debugger commands.
+* Breakpoint Control:: Control of Breakpoints.
+* Debugger Execution Control:: Control of Execution.
+* Viewing And Changing Data:: Viewing and Changing Data.
+* Execution Stack:: Dealing with the Stack.
+* Debugger Info:: Obtaining Information about the
+ Program and the Debugger State.
+* Miscellaneous Debugger Commands:: Miscellaneous Commands.
+* Readline Support:: Readline support.
+* Limitations:: Limitations and future plans.
+* General Arithmetic:: An introduction to computer
+ arithmetic.
+* Floating Point Issues:: Stuff to know about floating-point
+ numbers.
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not
+ Abstract Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+* Integer Programming:: Effective integer programming.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Floating-point Representation:: Binary floating-point representation.
+* Floating-point Context:: Floating-point context.
+* Rounding Mode:: Floating-point rounding mode.
+* Gawk and MPFR:: How @command{gawk} provides
+ arbitrary-precision arithmetic.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
+ Arithmetic with @command{gawk}.
+* Setting Precision:: Setting the working precision.
+* Setting Rounding Mode:: Setting the rounding mode.
+* Floating-point Constants:: Representing floating-point constants.
+* Changing Precision:: Changing the precision of a number.
+* Exact Arithmetic:: Exact arithmetic with floating-point
+ numbers.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic
+ with @command{gawk}.
+* Extension Intro:: What is an extension.
+* Plugin License:: A note about licensing.
+* Extension Design:: Design notes about the extension API.
+* Old Extension Problems:: Problems with the old mechanism.
+* Extension New Mechanism Goals:: Goals for the new mechanism.
+* Extension Other Design Decisions:: Some other design decisions.
+* Extension Mechanism Outline:: An outline of how it works.
+* Extension Future Growth:: Some room for future growth.
+* Extension API Description:: A full description of the API.
+* Extension API Functions Introduction:: Introduction to the API functions.
+* General Data Types:: The data types.
+* Requesting Values:: How to get a value.
+* Constructor Functions:: Functions for creating values.
+* Registration Functions:: Functions to register things with
+ @command{gawk}.
+* Extension Functions:: Registering extension functions.
+* Exit Callback Functions:: Registering an exit callback.
+* Extension Version String:: Registering a version string.
+* Input Parsers:: Registering an input parser.
+* Output Wrappers:: Registering an output wrapper.
+* Two-way processors:: Registering a two-way processor.
+* Printing Messages:: Functions for printing messages.
+* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}.
+* Accessing Parameters:: Functions for accessing parameters.
+* Symbol Table Access:: Functions for accessing global
+ variables.
+* Symbol table by name:: Accessing variables by name.
+* Symbol table by cookie:: Accessing variables by ``cookie''.
+* Cached values:: Creating and using cached values.
+* Array Manipulation:: Functions for working with arrays.
+* Array Data Types:: Data types for working with arrays.
+* Array Functions:: Functions for working with arrays.
+* Flattening Arrays:: How to flatten arrays.
+* Creating Arrays:: How to create and populate arrays.
+* Extension API Variables:: Variables provided by the API.
+* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about
+ @command{gawk}'s invocation.
+* Extension API Boilerplate:: Boilerplate code for using the API.
+* Finding Extensions:: How @command{gawk} finds compiled
+ extensions.
+* Extension Example:: Example C code for an extension.
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
+* Extension Samples:: The sample extensions that ship with
+ @code{gawk}.
+* Extension Sample File Functions:: The file functions sample.
+* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
+* Extension Sample Fork:: An interface to @code{fork()} and
+ other process functions.
+* Extension Sample Ord:: Character to value to character
+ conversions.
+* Extension Sample Readdir:: An interface to @code{readdir()}.
+* Extension Sample Revout:: Reversing output sample output
+ wrapper.
+* Extension Sample Rev2way:: Reversing data sample two-way
+ processor.
+* Extension Sample Read write array:: Serializing an array to a file.
+* Extension Sample Readfile:: Reading an entire file into a string.
+* Extension Sample API Tests:: Tests for the API.
+* Extension Sample Time:: An interface to @code{gettimeofday()}
+ and @code{sleep()}.
+* gawkextlib:: The @code{gawkextlib} project.
+* V7/SVR3.1:: The major changes between V7 and
+ System V Release 3.1.
+* SVR4:: Minor changes between System V
+ Releases 3.1 and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from Brian Kernighan's
+ version of @command{awk}.
+* POSIX/GNU:: The extensions in @command{gawk} not
+ in POSIX @command{awk}.
+* Common Extensions:: Common Extensions Summary.
+* Ranges and Locales:: How locales used to affect regexp
+ ranges.
+* Contributors:: The major contributors to
+ @command{gawk}.
+* Gawk Distribution:: What is in the @command{gawk}
+ distribution.
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+* Unix Installation:: Installing @command{gawk} under
+ various versions of Unix.
+* Quick Installation:: Compiling @command{gawk} under Unix.
+* Additional Configuration Options:: Other compile-time options.
+* Configuration Philosophy:: How it's all supposed to work.
+* Non-Unix Installation:: Installation on Other Operating
+ Systems.
+* PC Installation:: Installing and Compiling
+ @command{gawk} on MS-DOS and OS/2.
+* PC Binary Installation:: Installing a prepared distribution.
+* PC Compiling:: Compiling @command{gawk} for MS-DOS,
+ Windows32, and OS/2.
+* PC Testing:: Testing @command{gawk} on PC systems.
+* PC Using:: Running @command{gawk} on MS-DOS,
+ Windows32 and OS/2.
+* Cygwin:: Building and running @command{gawk}
+ for Cygwin.
+* MSYS:: Using @command{gawk} In The MSYS
+ Environment.
+* VMS Installation:: Installing @command{gawk} on VMS.
+* VMS Compilation:: How to compile @command{gawk} under
+ VMS.
+* VMS Installation Details:: How to install @command{gawk} under
+ VMS.
+* VMS Running:: How to run @command{gawk} under VMS.
+* VMS Old Gawk:: An old version comes with some VMS
+ systems.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available @command{awk}
+ implementations.
+* Compatibility Mode:: How to disable certain @command{gawk}
+ extensions.
+* Additions:: Making Additions To @command{gawk}.
+* Accessing The Source:: Accessing the Git repository.
+* Adding Code:: Adding code to the main body of
+ @command{gawk}.
+* New Ports:: Porting @command{gawk} to a new
+ operating system.
+* Derived Files:: Why derived files are kept in the
+ @command{git} repository.
+* Future Extensions:: New features that may be implemented
+ one day.
+* Implementation Limitations:: Some limitations of the implementation.
+* Basic High Level:: The high level view.
+* Basic Data Typing:: A very quick intro to data types.
@end detailmenu
@end menu
@@ -1125,6 +1252,12 @@ expert should find useful. In particular, the description of POSIX
@ref{Sample Programs},
should be of interest.
+This @value{DOCUMENT} is split into several parts, as follows:
+
+Part I describes the @command{awk} language and @command{gawk} program in detail.
+It starts with the basics, and continues through all of the features of @command{awk}.
+It contains the following chapters:
+
@ref{Getting Started},
provides the essentials you need to know to begin using @command{awk}.
@@ -1168,6 +1301,22 @@ describes the built-in functions @command{awk} and
@command{gawk} provide, as well as how to define
your own functions.
+Part II shows how to use @command{awk} and @command{gawk} for problem solving.
+There is lots of code here for you to read and learn from.
+It contains the following chapters:
+
+@ref{Library Functions}, which provides a number of functions meant to
+be used from main @command{awk} programs.
+
+@ref{Sample Programs},
+which provides many sample @command{awk} programs.
+
+Reading these two chapters allows you to see @command{awk}
+solving real problems.
+
+Part III focuses on features specific to @command{gawk}.
+It contains the following chapters:
+
@ref{Internationalization},
describes special features in @command{gawk} for translating program
messages into different languages at runtime.
@@ -1179,14 +1328,19 @@ are the abilities to have two-way communications with another process,
perform TCP/IP networking, and
profile your @command{awk} programs.
-@ref{Library Functions}, and
-@ref{Sample Programs},
-provide many sample @command{awk} programs.
-Reading them allows you to see @command{awk}
-solving real problems.
-
@ref{Debugger}, describes the @command{awk} debugger.
+@ref{Arbitrary Precision Arithmetic},
+describes advanced arithmetic facilities provided by
+@command{gawk}.
+
+@ref{Dynamic Extensions}, describes how to add new variables and
+functions to @command{gawk} by writing extensions in C.
+
+Part IV provides the appendices, the Glossary, and two licenses that cover
+the @command{gawk} source code and this @value{DOCUMENT}, respectively.
+It contains the following appendices:
+
@ref{Language History},
describes how the @command{awk} language has evolved since
its first release to the present. It also describes how @command{gawk}
@@ -1203,8 +1357,7 @@ available @command{awk} implementations.
@ref{Notes},
describes how to disable @command{gawk}'s extensions, as
well as how to contribute new code to @command{gawk},
-how to write extension libraries, and some possible
-future directions for @command{gawk} development.
+and some possible future directions for @command{gawk} development.
@ref{Basic Concepts},
provides some very cursory background material for those who
@@ -1648,12 +1801,14 @@ Nof Ayalon @*
ISRAEL @*
March, 2011
-@ignore
-@c Try this
@iftex
-@page
-@headings off
-@majorheading I@ @ @ @ The @command{awk} Language and @command{gawk}
+@part Part I:@* The @command{awk} Language
+@end iftex
+
+@ignore
+@ifdocbook
+@part Part I:@* The @command{awk} Language
+
Part I describes the @command{awk} language and @command{gawk} program in detail.
It starts with the basics, and continues through all of the features of @command{awk}
and @command{gawk}. It contains the following chapters:
@@ -1663,6 +1818,9 @@ and @command{gawk}. It contains the following chapters:
@ref{Getting Started}.
@item
+@ref{Invoking Gawk}.
+
+@item
@ref{Regexp}.
@item
@@ -1682,21 +1840,8 @@ and @command{gawk}. It contains the following chapters:
@item
@ref{Functions}.
-
-@item
-@ref{Internationalization}.
-
-@item
-@ref{Advanced Features}.
-
-@item
-@ref{Invoking Gawk}.
@end itemize
-
-@page
-@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
-@oddheading @| @| @strong{@thischapter}@ @ @ @thispage
-@end iftex
+@end ifdocbook
@end ignore
@node Getting Started
@@ -2927,6 +3072,7 @@ things in this @value{CHAPTER} that don't interest you right now.
* Environment Variables:: The environment variables @command{gawk} uses.
* Exit Status:: @command{gawk}'s exit status.
* Include Files:: Including other files into your program.
+* Loading Shared Libraries:: Loading shared libraries into your program.
* Obsolete:: Obsolete Options and/or features.
* Undocumented:: Undocumented Options and Features.
@end menu
@@ -3018,6 +3164,22 @@ This option may be given multiple times; the @command{awk}
program consists of the concatenation of the contents of
each specified @var{source-file}.
+@item -i @var{source-file}
+@itemx --include @var{source-file}
+@cindex @code{-i} option
+@cindex @code{--include} option
+@cindex @command{awk} programs, location of
+Read @command{awk} source library from @var{source-file}. This option is
+completely equivalent to using the @samp{@@include} directive inside
+your program. This option is very
+similar to the @option{-f} option, but there are two important differences.
+First, when @option{-i} is used, the program source is not loaded if it has
+been previously loaded, whereas @option{-f} always loads the file.
+Second, because this option is intended to be used with code libraries, the
+@command{awk} command does not recognize such files as constituting main program
+input. Thus, after processing an @option{-i} argument, @command{gawk} still expects to
+find the main source code via the @option{-f} option or on the command line.
+
@item -v @var{var}=@var{val}
@itemx --assign @var{var}=@var{val}
@cindex @code{-v} option
@@ -3078,6 +3240,9 @@ The following list describes @command{gawk}-specific options:
@cindex @code{-b} option
@cindex @code{--characters-as-bytes} option
Cause @command{gawk} to treat all input data as single-byte characters.
+In addition, all output written with @code{print} or @code{printf}
+is treated as single-byte characters.
+
Normally, @command{gawk} follows the POSIX standard and attempts to process
its input data according to the current locale. This can often involve
converting multibyte characters into wide characters (internally), and
@@ -3209,9 +3374,12 @@ that @command{gawk} accepts and then exit.
@cindex @code{-l} option
@cindex @code{--load} option
@cindex loading, library
-Load a shared library @var{lib}. This searches for the library using the @env{AWKPATH}
-environment variable. The suffix @samp{.so} in the library name is optional.
-The library initialization routine should be named @code{dlload()}.
+Load a shared library @var{lib}. This searches for the library using the @env{AWKLIBPATH}
+environment variable. The correct library suffix for your platform will be
+supplied by default, so it need not be specified in the library name.
+The library initialization routine should be named @code{dl_load()}.
+An alternative is to use the @samp{@@load} keyword inside the program to load
+a shared library.
@item -L @r{[}value@r{]}
@itemx --lint@r{[}=value@r{]}
@@ -3590,6 +3758,8 @@ behaves.
@menu
* AWKPATH Variable:: Searching directories for @command{awk}
programs.
+* AWKLIBPATH Variable:: Searching directories for @command{awk} shared
+ libraries.
* Other Environment Variables:: The environment variables.
@end menu
@@ -3607,7 +3777,8 @@ on the command-line with the @option{-f} option.
In most @command{awk}
implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.
-But in @command{gawk}, if the @value{FN} supplied to the @option{-f} option
+But in @command{gawk}, if the @value{FN} supplied to the @option{-f}
+or @option{-i} options
does not contain a @samp{/}, then @command{gawk} searches a list of
directories (called the @dfn{search path}), one by one, looking for a
file with the specified name.
@@ -3629,13 +3800,16 @@ standard directory in the default path and then specified on
the command line with a short @value{FN}. Otherwise, the full @value{FN}
would have to be typed for each file.
-By using both the @option{--source} and @option{-f} options, your command-line
+By using the @option{-i} option, or the @option{--source} and @option{-f} options, your command-line
@command{awk} programs can use facilities in @command{awk} library files
(@pxref{Library Functions}).
Path searching is not done if @command{gawk} is in compatibility mode.
This is true for both @option{--traditional} and @option{--posix}.
@xref{Options}.
+If the source code is not found after the initial search, the path is searched
+again after adding the default @samp{.awk} suffix to the filename.
+
@quotation NOTE
To include
the current directory in the path, either place
@@ -3665,6 +3839,21 @@ sense: the @env{AWKPATH} environment variable is used to find the program
source files. Once your program is running, all the files have been
found, and @command{gawk} no longer needs to use @env{AWKPATH}.
+@node AWKLIBPATH Variable
+@subsection The @env{AWKLIBPATH} Environment Variable
+@cindex @env{AWKLIBPATH} environment variable
+@cindex directories, searching
+@cindex search paths
+@cindex search paths, for shared libraries
+@cindex differences in @command{awk} and @command{gawk}, @code{AWKLIBPATH} environment variable
+
+The @env{AWKLIBPATH} environment variable is similar to the @env{AWKPATH}
+variable, but it is used to search for shared libraries specified
+with the @option{-l} option rather than for source files. If the library
+is not found, the path is searched again after adding the appropriate
+shared library suffix for the platform. For example, on GNU/Linux systems,
+the suffix @samp{.so} is used.
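
As an illustration only, a shell session might set @env{AWKLIBPATH} before loading an extension by its short name. The directory names below are hypothetical, and the colon-separated list format is assumed to match that of @env{AWKPATH}:

```shell
# Hypothetical directories; substitute wherever your extensions are installed.
AWKLIBPATH="$HOME/lib/gawk:/usr/local/lib/gawk"
export AWKLIBPATH

# With the variable exported, an extension such as ordchr could then be
# loaded by its short name (not run here, since it requires the
# extension to be installed):
#   gawk -l ordchr 'BEGIN { print chr(65) }'
```

Exporting the variable makes it visible to any @command{gawk} process started from that shell.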
+
@node Other Environment Variables
@subsection Other Environment Variables
@@ -3767,7 +3956,8 @@ code from various @command{awk} scripts. In other words, you can group
together @command{awk} functions, used to carry out specific tasks,
into external files. These files can be used just like function libraries,
using the @samp{@@include} keyword in conjunction with the @env{AWKPATH}
-environment variable.
+environment variable. Note that source files may also be included
+using the @option{-i} option.
Let's see an example.
We'll start with two (trivial) @command{awk} scripts, namely
@@ -3873,6 +4063,41 @@ As mentioned in @ref{AWKPATH Variable}, the current directory is always
searched first for source files, before searching in @env{AWKPATH},
and this also applies to files named with @samp{@@include}.
+@node Loading Shared Libraries
+@section Loading Shared Libraries Into Your Program
+
+This @value{SECTION} describes a feature that is specific to @command{gawk}.
+
+The @samp{@@load} keyword can be used to read external @command{awk} shared
+libraries. This allows you to link in compiled code that may offer superior
+performance and/or give you access to extended capabilities not supported
+by the @command{awk} language. The @env{AWKLIBPATH} variable is used to
+search for the shared library. Using @samp{@@load} is completely equivalent
+to using the @option{-l} command-line option.
+
+If the shared library is not initially found in @env{AWKLIBPATH}, another
+search is conducted after appending the platform's default shared library
+suffix to the filename. For example, on GNU/Linux systems, the suffix
+@samp{.so} is used.
+
+@example
+$ @kbd{gawk '@@load "ordchr"; BEGIN @{print chr(65)@}'}
+@print{} A
+@end example
+
+@noindent
+This is equivalent to the following example:
+
+@example
+$ @kbd{gawk -lordchr 'BEGIN @{print chr(65)@}'}
+@print{} A
+@end example
+
+@noindent
+For command-line usage, the @option{-l} option is more convenient,
+but @samp{@@load} is useful for embedding inside an @command{awk} source file
+that requires access to a shared library.
+
@node Obsolete
@section Obsolete Options and/or Features
@@ -3969,31 +4194,6 @@ long-undocumented ``feature'' of Unix @code{awk}.
@end ignore
-@ignore
-@c Try this
-@iftex
-@page
-@headings off
-@majorheading II@ @ @ Using @command{awk} and @command{gawk}
-Part II shows how to use @command{awk} and @command{gawk} for problem solving.
-There is lots of code here for you to read and learn from.
-It contains the following chapters:
-
-@itemize @bullet
-@item
-@ref{Library Functions}.
-
-@item
-@ref{Sample Programs}.
-
-@end itemize
-
-@page
-@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
-@oddheading @| @| @strong{@thischapter}@ @ @ @thispage
-@end iftex
-@end ignore
-
@node Regexp
@chapter Regular Expressions
@cindex regexp, See regular expressions
@@ -5180,7 +5380,6 @@ used with it do not have to be named on the @command{awk} command line
* Getline:: Reading files under explicit program control
using the @code{getline} function.
* Read Timeout:: Reading input with a timeout.
-
* Command line directories:: What happens if you put a directory on the
command line.
@end menu
@@ -5306,16 +5505,22 @@ awk '@{ print $0 @}' RS="/" BBS-list
This sets @code{RS} to @samp{/} before processing @file{BBS-list}.
Using an unusual character such as @samp{/} for the record separator
-produces correct behavior in the vast majority of cases. However,
-the following (extreme) pipeline prints a surprising @samp{1}:
+produces correct behavior in the vast majority of cases.
+
+There is one unusual case, which occurs when @command{gawk} is
+being fully POSIX-compliant (@pxref{Options}).
+Then, the following (extreme) pipeline prints a surprising @samp{1}:
@example
-$ echo | awk 'BEGIN @{ RS = "a" @} ; @{ print NF @}'
+$ echo | gawk --posix 'BEGIN @{ RS = "a" @} ; @{ print NF @}'
@print{} 1
@end example
There is one field, consisting of a newline. The value of the built-in
variable @code{NF} is the number of fields in the current record.
+(In the normal case, @command{gawk} treats the newline as whitespace,
+printing @samp{0} as the result. Most other versions of @command{awk}
+also act this way.)
@cindex dark corner, input files
Reaching the end of an input file terminates the current input record,
@@ -7230,6 +7435,34 @@ trying to accomplish.
It is worth noting that those variants which do not use redirection
can cause @code{FILENAME} to be updated if they cause
@command{awk} to start reading a new input file.
+
+@item
+If the variable being assigned is an expression with side effects,
+different versions of @command{awk} behave differently upon encountering
+end-of-file. Some versions don't evaluate the expression; many versions
+(including @command{gawk}) do. Here is an example, due to Duncan Moore:
+
+@ignore
+Date: Sun, 01 Apr 2012 11:49:33 +0100
+From: Duncan Moore <duncan.moore@@gmx.com>
+@end ignore
+
+@example
+BEGIN @{
+ system("echo 1 > f")
+ while ((getline a[++c] < "f") > 0) @{ @}
+ print c
+@}
+@end example
+
+@noindent
+Here, the side effect is the @samp{++c}. Is @code{c} incremented if
+end of file is encountered, before the element in @code{a} is assigned?
+
+@command{gawk} treats @code{getline} like a function call, and evaluates
+the expression @samp{a[++c]} before attempting to read from @file{f}.
+Other versions of @command{awk} only evaluate the expression once they
+know that there is a string value to be assigned. Caveat emptor.
@end itemize
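
One way to sidestep the difference described in the last item is to read into a plain variable and perform the side effect only after @code{getline} succeeds. The following is a sketch rather than a recommendation from the original text; the variable name @code{line} and the scratch directory are illustrative additions:

```shell
# Work in a scratch directory so the data file "f" does not clobber anything.
workdir=$(mktemp -d)

# Read each record into a plain variable, and increment the counter
# only after getline has reported success.
count=$(cd "$workdir" && awk 'BEGIN {
    system("echo 1 > f")
    while ((getline line < "f") > 0)
        a[++c] = line
    print c
}')
printf '%s\n' "$count"
```

Written this way, the count no longer depends on whether a given @command{awk} evaluates the @code{getline} target before or after detecting end-of-file.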
@node Getline Summary
@@ -7240,9 +7473,10 @@ can cause @code{FILENAME} to be updated if they cause
summarizes the eight variants of @code{getline},
listing which built-in variables are set by each one,
and whether the variant is standard or a @command{gawk} extension.
+Note: for each variant, @command{gawk} sets the @code{RT} built-in variable.
@float Table,table-getline-variants
-@caption{getline Variants and What They Set}
+@caption{@code{getline} Variants and What They Set}
@multitable @columnfractions .33 .38 .27
@headitem Variant @tab Effect @tab Standard / Extension
@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, and @code{NR} @tab Standard
@@ -11481,9 +11715,9 @@ fatal error.
@item
If you have written extensions that modify the record handling (by inserting
-an ``open hook''), you can invoke them at this point, before @command{gawk}
+an ``input parser''), you can invoke them at this point, before @command{gawk}
has started processing the file. (This is a @emph{very} advanced feature,
-currently used only by the @uref{http://xmlgawk.sourceforge.net, XMLgawk project}.)
+currently used only by the @uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.)
@end itemize
The @code{ENDFILE} rule is called when @command{gawk} has finished processing
@@ -12275,44 +12509,46 @@ function body reads the next record and starts processing it with the
first rule in the program.
@node Nextfile Statement
-@subsection Using @command{gawk}'s @code{nextfile} Statement
+@subsection The @code{nextfile} Statement
@cindex @code{nextfile} statement
@cindex differences in @command{awk} and @command{gawk}, @code{next}/@code{nextfile} statements
@cindex common extensions, @code{nextfile} statement
@cindex extensions, common@comma{} @code{nextfile} statement
-@command{gawk} provides the @code{nextfile} statement,
-which is similar to the @code{next} statement. @value{COMMONEXT}
+The @code{nextfile} statement
+is similar to the @code{next} statement.
However, instead of abandoning processing of the current record, the
-@code{nextfile} statement instructs @command{gawk} to stop processing the
+@code{nextfile} statement instructs @command{awk} to stop processing the
current @value{DF}.
-The @code{nextfile} statement is a @command{gawk} extension.
-In most other @command{awk} implementations,
-or if @command{gawk} is in compatibility mode
-(@pxref{Options}),
-@code{nextfile} is not special.
-
Upon execution of the @code{nextfile} statement,
-any @code{ENDFILE} rules are executed except in the case as
-mentioned below, @code{FILENAME} is
+@code{FILENAME} is
updated to the name of the next @value{DF} listed on the command line,
-@code{FNR} is reset to one, @code{ARGIND} is incremented,
-any @code{BEGINFILE} rules are executed, and processing
+@code{FNR} is reset to one,
+and processing
starts over with the first rule in the program.
-(@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.)
If the @code{nextfile} statement causes the end of the input to be reached,
then the code in any @code{END} rules is executed. An exception to this is
-when the @code{nextfile} is invoked during execution of any statement in an
+when @code{nextfile} is invoked during execution of any statement in an
@code{END} rule; in this case, it causes the program to stop immediately. @xref{BEGIN/END}.
The @code{nextfile} statement is useful when there are many @value{DF}s
to process but it isn't necessary to process every record in every file.
-Normally, in order to move on to the next @value{DF}, a program
-has to continue scanning the unwanted records. The @code{nextfile}
+Without @code{nextfile},
+in order to move on to the next @value{DF}, a program
+would have to continue scanning the unwanted records. The @code{nextfile}
statement accomplishes this much more efficiently.
-In addition, @code{nextfile} is useful inside a @code{BEGINFILE}
+In @command{gawk}, execution of @code{nextfile} causes additional things
+to happen:
+any @code{ENDFILE} rules are executed, except in the case
+mentioned below;
+@code{ARGIND} is incremented;
+and
+any @code{BEGINFILE} rules are executed.
+(@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.)
+
+With @command{gawk}, @code{nextfile} is useful inside a @code{BEGINFILE}
rule to skip over a file that would otherwise cause @command{gawk}
to exit with a fatal error. In this case, @code{ENDFILE} rules are not
executed. @xref{BEGINFILE/ENDFILE}.
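
As a small sketch of using @code{nextfile} to skip unneeded records, as described earlier in this section, the following prints only the first record of each @value{DF} and then moves on to the next file. The file names and contents are made up for the illustration, and the @command{awk} in use is assumed to support @code{nextfile}:

```shell
# Create two small, hypothetical data files in a scratch directory.
dir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$dir/one.txt"
printf 'gamma\ndelta\n' > "$dir/two.txt"

# Print the first record of each file, then abandon the rest of that file.
first_lines=$(awk 'FNR == 1 { print; nextfile }' "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$first_lines"
```

Without @code{nextfile}, the same program would have to read and discard every remaining record in each file.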
@@ -12323,6 +12559,13 @@ reserved for closing files, pipes, and coprocesses that are
opened with redirections. It is not related to the main processing that
@command{awk} does with the files listed in @code{ARGV}.
+@quotation NOTE
+For many years, @code{nextfile} was a
+@command{gawk} extension. In September 2012, it was accepted for
+inclusion in the POSIX standard.
+See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}.
+@end quotation
+
@cindex functions, user-defined, @code{next}/@code{nextfile} statements and
@cindex @code{nextfile} statement, user-defined functions and
The current version of Brian Kernighan's @command{awk} (@pxref{Other
@@ -12794,7 +13037,9 @@ does not affect the environment passed on to any programs that
Some operating systems may not have environment variables.
On such systems, the @code{ENVIRON} array is empty (except for
@w{@code{ENVIRON["AWKPATH"]}},
-@pxref{AWKPATH Variable}).
+@pxref{AWKPATH Variable} and
+@w{@code{ENVIRON["AWKLIBPATH"]}},
+@pxref{AWKLIBPATH Variable}).
@cindex @command{gawk}, @code{ERRNO} variable in
@cindex @code{ERRNO} variable
@@ -12870,6 +13115,16 @@ assigning a value to @code{NF} has the potential to affect
to @code{NF} can be used to create or remove fields from the
current record. @xref{Changing Fields}.
+@cindex @code{FUNCTAB} array
+@cindex @command{gawk}, @code{FUNCTAB} array in
+@cindex differences in @command{awk} and @command{gawk}, @code{FUNCTAB} variable
+@item FUNCTAB #
+An array whose indices are the names of all the user-defined
+or extension functions in the program.
+@strong{NOTE}: The array values cannot currently be used.
+Also, you may not use the @code{delete} statement with the
+@code{FUNCTAB} array.
+
@cindex @code{NR} variable
@item NR
The number of input records @command{awk} has processed since
@@ -12899,6 +13154,34 @@ This is
@code{"FIELDWIDTHS"} if field splitting with @code{FIELDWIDTHS} is in effect,
or @code{"FPAT"} if field matching with @code{FPAT} is in effect.
+@item PROCINFO["identifiers"]
+A subarray, indexed by the names of all identifiers used in the
+text of the AWK program. For each identifier, the value of the element is one of the following:
+
+@table @code
+@item "array"
+The identifier is an array.
+
+@item "extension"
+The identifier is an extension function loaded via
+@code{@@load}.
+
+@item "scalar"
+The identifier is a scalar.
+
+@item "untyped"
+The identifier is untyped (could be used as a scalar or array;
+@command{gawk} doesn't know yet).
+
+@item "user"
+The identifier is a user-defined function.
+@end table
+
+@noindent
+The values indicate what @command{gawk} knows about the identifiers
+after it has finished parsing the program; they are @emph{not} updated
+while the program runs.
+
@item PROCINFO["gid"]
The value of the @code{getgid()} system call.
@@ -12997,6 +13280,57 @@ In other @command{awk} implementations,
or if @command{gawk} is in compatibility mode
(@pxref{Options}),
it is not special.
+
+@cindex @command{gawk}, @code{SYMTAB} array in
+@cindex @code{SYMTAB} array
+@cindex differences in @command{awk} and @command{gawk}, @code{SYMTAB} variable
+@item SYMTAB #
+An array whose indices are the names of all currently defined
+global variables and arrays in the program. The array may be used
+for indirect access to read or write the value of a variable:
+
+@example
+foo = 5
+SYMTAB["foo"] = 4
+print foo # prints 4
+@end example
+
+@noindent
+The @code{isarray()} function (@pxref{Type Functions}) may be used to test
+if an element in @code{SYMTAB} is an array.
+Also, you may not use the @code{delete} statement with the
+@code{SYMTAB} array.
+
+You may use an index for @code{SYMTAB} that is not a predefined identifier:
+
+@example
+SYMTAB["xxx"] = 5
+print SYMTAB["xxx"]
+@end example
+
+@noindent
+This works as expected: in this case @code{SYMTAB} acts just like
+a regular array. The only difference is that you can't then delete
+@code{SYMTAB["xxx"]}.
+
+The @code{SYMTAB} array is more interesting than it looks. Andrew Schorr
+points out that it effectively gives @command{awk} data pointers. Consider his
+example:
+
+@example
+# Indirect multiply of any variable by amount, return result
+
+function multiply(variable, amount)
+@{
+ return SYMTAB[variable] *= amount
+@}
+@end example
+
+@quotation NOTE
+In order to avoid severe time-travel paradoxes@footnote{Not to mention difficult
+implementation issues.}, neither @code{FUNCTAB} nor @code{SYMTAB}
+is available as an element within the @code{SYMTAB} array.
+@end quotation
@end table
@c ENDOFRANGE bvconi
@c ENDOFRANGE vbconi
@@ -13815,21 +14149,28 @@ is not in the array is deleted.
@cindex deleting entire arrays
@cindex differences in @command{awk} and @command{gawk}, array elements, deleting
All the elements of an array may be deleted with a single statement
-@value{COMMONEXT}
by leaving off the subscript in the @code{delete} statement,
as follows:
+
@example
delete @var{array}
@end example
-This ability is a @command{gawk} extension; it is not available in
-compatibility mode (@pxref{Options}).
-
Using this version of the @code{delete} statement is about three times
more efficient than the equivalent loop that deletes each element one
at a time.
+@quotation NOTE
+For many years,
+using @code{delete} without a subscript was a @command{gawk} extension.
+As of September, 2012, it was accepted for
+inclusion into the POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=544,
+the Austin Group website}. This form of the @code{delete} statement is also supported
+by Brian Kernighan's @command{awk} and @command{mawk}, as well as
+by a number of other implementations (@pxref{Other Versions}).
+@end quotation
+
@cindex portability, deleting array elements
@cindex Brennan, Michael
The following statement provides a portable but nonobvious way to clear
@@ -15349,7 +15690,7 @@ output literally. The interpretation of @samp{\} and @samp{&} then becomes
as shown in @ref{table-sub-posix-92}.
@float Table,table-sub-posix-92
-@caption{1992 POSIX Rules for sub and gsub Escape Sequence Processing}
+@caption{1992 POSIX Rules for @code{sub()} and @code{gsub()} Escape Sequence Processing}
@c thanks to Karl Berry for formatting this table
@tex
\vbox{\bigskip
@@ -15418,7 +15759,7 @@ to produce a @samp{\} preceding the matched text. This is shown in
@ref{table-sub-proposed}.
@float Table,table-sub-proposed
-@caption{Proposed rules for sub and backslash}
+@caption{Proposed Rules For @code{sub()} And Backslash}
@tex
\vbox{\bigskip
% This table has lots of &'s and \'s, so unspecialize them.
@@ -15480,7 +15821,7 @@ by anything else is not special; the @samp{\} is placed straight into the output
These rules are presented in @ref{table-posix-sub}.
@float Table,table-posix-sub
-@caption{POSIX rules for @code{sub()} and @code{gsub()}}
+@caption{POSIX Rules For @code{sub()} And @code{gsub()}}
@tex
\vbox{\bigskip
% This table has lots of &'s and \'s, so unspecialize them.
@@ -15548,7 +15889,7 @@ appears in the generated text and the @samp{\} does not,
as shown in @ref{table-gensub-escapes}.
@float Table,table-gensub-escapes
-@caption{Escape Sequence Processing for @code{gensub()}}
+@caption{Escape Sequence Processing For @code{gensub()}}
@tex
\vbox{\bigskip
% This table has lots of &'s and \'s, so unspecialize them.
@@ -16388,8 +16729,8 @@ bitwise operations just described. They are:
@cindex @command{gawk}, bitwise operations in
@table @code
@cindex @code{and()} function (@command{gawk})
-@item and(@var{v1}, @var{v2})
-Return the bitwise AND of the values provided by @var{v1} and @var{v2}.
+@item and(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
+Return the bitwise AND of the arguments. There must be at least two.
@cindex @code{compl()} function (@command{gawk})
@item compl(@var{val})
@@ -16400,16 +16741,16 @@ Return the bitwise complement of @var{val}.
Return the value of @var{val}, shifted left by @var{count} bits.
@cindex @code{or()} function (@command{gawk})
-@item or(@var{v1}, @var{v2})
-Return the bitwise OR of the values provided by @var{v1} and @var{v2}.
+@item or(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
+Return the bitwise OR of the arguments. There must be at least two.
@cindex @code{rshift()} function (@command{gawk})
@item rshift(@var{val}, @var{count})
Return the value of @var{val}, shifted right by @var{count} bits.
@cindex @code{xor()} function (@command{gawk})
-@item xor(@var{v1}, @var{v2})
-Return the bitwise XOR of the values provided by @var{v1} and @var{v2}.
+@item xor(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
+Return the bitwise XOR of the arguments. There must be at least two.
@end table
For all of these functions, first the double precision floating-point value is
@@ -16974,6 +17315,47 @@ foo's i=1
top's i=10
@end example
+Besides scalar values (strings and numbers), you may also have
+local arrays. When a parameter name is used as an array, @command{awk}
+treats it as an array, and it is local to the function.
+In addition, recursive calls create new arrays.
+Consider this example:
+
+@example
+function some_func(p1, a)
+@{
+ if (p1++ > 3)
+ return
+
+ a[p1] = p1
+
+ some_func(p1)
+
+ printf("At level %d, index %d %s found in a\n",
+ p1, (p1 - 1), (p1 - 1) in a ? "is" : "is not")
+ printf("At level %d, index %d %s found in a\n",
+ p1, p1, p1 in a ? "is" : "is not")
+ print ""
+@}
+
+BEGIN @{
+ some_func(1)
+@}
+@end example
+
+When run, this program produces the following output:
+
+@example
+At level 4, index 3 is not found in a
+At level 4, index 4 is found in a
+
+At level 3, index 2 is not found in a
+At level 3, index 3 is found in a
+
+At level 2, index 1 is not found in a
+At level 2, index 2 is found in a
+@end example
+
@node Pass By Value/Reference
@subsubsection Passing Function Arguments By Value Or By Reference
@@ -17579,2748 +17961,28 @@ for (i = 1; i <= n; i++)
@c ENDOFRANGE funcud
-@node Internationalization
-@chapter Internationalization with @command{gawk}
-
-Once upon a time, computer makers
-wrote software that worked only in English.
-Eventually, hardware and software vendors noticed that if their
-systems worked in the native languages of non-English-speaking
-countries, they were able to sell more systems.
-As a result, internationalization and localization
-of programs and software systems became a common practice.
-
-@c STARTOFRANGE inloc
-@cindex internationalization, localization
-@cindex @command{gawk}, internationalization and, See internationalization
-@cindex internationalization, localization, @command{gawk} and
-For many years, the ability to provide internationalization
-was largely restricted to programs written in C and C++.
-This @value{CHAPTER} describes the underlying library @command{gawk}
-uses for internationalization, as well as how
-@command{gawk} makes internationalization
-features available at the @command{awk} program level.
-Having internationalization available at the @command{awk} level
-gives software developers additional flexibility---they are no
-longer forced to write in C or C++ when internationalization is
-a requirement.
-
-@menu
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU @code{gettext} works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: @command{gawk} is also internationalized.
-@end menu
-
-@node I18N and L10N
-@section Internationalization and Localization
-
-@cindex internationalization
-@cindex localization, See internationalization@comma{} localization
-@cindex localization
-@dfn{Internationalization} means writing (or modifying) a program once,
-in such a way that it can use multiple languages without requiring
-further source-code changes.
-@dfn{Localization} means providing the data necessary for an
-internationalized program to work in a particular language.
-Most typically, these terms refer to features such as the language
-used for printing error messages, the language used to read
-responses, and information related to how numerical and
-monetary values are printed and read.
-
-@node Explaining gettext
-@section GNU @code{gettext}
-
-@cindex internationalizing a program
-@c STARTOFRANGE gettex
-@cindex @code{gettext} library
-The facilities in GNU @code{gettext} focus on messages; strings printed
-by a program, either directly or via formatting with @code{printf} or
-@code{sprintf()}.@footnote{For some operating systems, the @command{gawk}
-port doesn't support GNU @code{gettext}.
-Therefore, these features are not available
-if you are using one of those operating systems. Sorry.}
-
-@cindex portability, @code{gettext} library and
-When using GNU @code{gettext}, each application has its own
-@dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk},
-that identifies the application.
-A complete application may have multiple components---programs written
-in C or C++, as well as scripts written in @command{sh} or @command{awk}.
-All of the components use the same text domain.
-
-To make the discussion concrete, assume we're writing an application
-named @command{guide}. Internationalization consists of the
-following steps, in this order:
-
-@enumerate
-@item
-The programmer goes
-through the source for all of @command{guide}'s components
-and marks each string that is a candidate for translation.
-For example, @code{"`-F': option required"} is a good candidate for translation.
-A table with strings of option names is not (e.g., @command{gawk}'s
-@option{--profile} option should remain the same, no matter what the local
-language).
-
-@cindex @code{textdomain()} function (C library)
-@item
-The programmer indicates the application's text domain
-(@code{"guide"}) to the @code{gettext} library,
-by calling the @code{textdomain()} function.
-
-@cindex @code{.pot} files
-@cindex files, @code{.pot}
-@cindex portable object template files
-@cindex files, portable object template
-@item
-Messages from the application are extracted from the source code and
-collected into a portable object template file (@file{guide.pot}),
-which lists the strings and their translations.
-The translations are initially empty.
-The original (usually English) messages serve as the key for
-lookup of the translations.
-
-@cindex @code{.po} files
-@cindex files, @code{.po}
-@cindex portable object files
-@cindex files, portable object
-@item
-For each language with a translator, @file{guide.pot}
-is copied to a portable object file (@code{.po})
-and translations are created and shipped with the application.
-For example, there might be a @file{fr.po} for a French translation.
-
-@cindex @code{.mo} files
-@cindex files, @code{.mo}
-@cindex message object files
-@cindex files, message object
-@item
-Each language's @file{.po} file is converted into a binary
-message object (@file{.mo}) file.
-A message object file contains the original messages and their
-translations in a binary format that allows fast lookup of translations
-at runtime.
-
-@item
-When @command{guide} is built and installed, the binary translation files
-are installed in a standard place.
-
-@cindex @code{bindtextdomain()} function (C library)
-@item
-For testing and development, it is possible to tell @code{gettext}
-to use @file{.mo} files in a different directory than the standard
-one by using the @code{bindtextdomain()} function.
-
-@cindex @code{.mo} files, specifying directory of
-@cindex files, @code{.mo}, specifying directory of
-@cindex message object files, specifying directory of
-@cindex files, message object, specifying directory of
-@item
-At runtime, @command{guide} looks up each string via a call
-to @code{gettext()}. The returned string is the translated string
-if available, or the original string if not.
-
-@item
-If necessary, it is possible to access messages from a different
-text domain than the one belonging to the application, without
-having to switch the application's default text domain back
-and forth.
-@end enumerate
-
-@cindex @code{gettext()} function (C library)
-In C (or C++), the string marking and dynamic translation lookup
-are accomplished by wrapping each string in a call to @code{gettext()}:
-
-@example
-printf("%s", gettext("Don't Panic!\n"));
-@end example
-
-The tools that extract messages from source code pull out all
-strings enclosed in calls to @code{gettext()}.
-
-@cindex @code{_} (underscore), @code{_} C macro
-@cindex underscore (@code{_}), @code{_} C macro
-The GNU @code{gettext} developers, recognizing that typing
-@samp{gettext(@dots{})} over and over again is both painful and ugly to look
-at, use the macro @samp{_} (an underscore) to make things easier:
-
-@example
-/* In the standard header file: */
-#define _(str) gettext(str)
-
-/* In the program text: */
-printf("%s", _("Don't Panic!\n"));
-@end example
-
-@cindex internationalization, localization, locale categories
-@cindex @code{gettext} library, locale categories
-@cindex locale categories
-@noindent
-This reduces the typing overhead to just three extra characters per string
-and is considerably easier to read as well.
-
-There are locale @dfn{categories}
-for different types of locale-related information.
-The defined locale categories that @code{gettext} knows about are:
-
-@table @code
-@cindex @code{LC_MESSAGES} locale category
-@item LC_MESSAGES
-Text messages. This is the default category for @code{gettext}
-operations, but it is possible to supply a different one explicitly,
-if necessary. (It is almost never necessary to supply a different category.)
-
-@cindex sorting characters in different languages
-@cindex @code{LC_COLLATE} locale category
-@item LC_COLLATE
-Text-collation information; i.e., how different characters
-and/or groups of characters sort in a given language.
-
-@cindex @code{LC_CTYPE} locale category
-@item LC_CTYPE
-Character-type information (alphabetic, digit, upper- or lowercase, and
-so on).
-This information is accessed via the
-POSIX character classes in regular expressions,
-such as @code{/[[:alnum:]]/}
-(@pxref{Regexp Operators}).
-
-@cindex monetary information, localization
-@cindex currency symbols, localization
-@cindex @code{LC_MONETARY} locale category
-@item LC_MONETARY
-Monetary information, such as the currency symbol, and whether the
-symbol goes before or after a number.
-
-@cindex @code{LC_NUMERIC} locale category
-@item LC_NUMERIC
-Numeric information, such as which characters to use for the decimal
-point and the thousands separator.@footnote{Americans
-use a comma every three decimal places and a period for the decimal
-point, while many Europeans do exactly the opposite:
-1,234.56 versus 1.234,56.}
-
-@cindex @code{LC_RESPONSE} locale category
-@item LC_RESPONSE
-Response information, such as how ``yes'' and ``no'' appear in the
-local language, and possibly other information as well.
-
-@cindex time, localization and
-@cindex dates, information related to@comma{} localization
-@cindex @code{LC_TIME} locale category
-@item LC_TIME
-Time- and date-related information, such as 12- or 24-hour clock, month printed
-before or after the day in a date, local month abbreviations, and so on.
-
-@cindex @code{LC_ALL} locale category
-@item LC_ALL
-All of the above. (Not too useful in the context of @code{gettext}.)
-@end table
-@c ENDOFRANGE gettex
-
-@node Programmer i18n
-@section Internationalizing @command{awk} Programs
-@c STARTOFRANGE inap
-@cindex @command{awk} programs, internationalizing
-
-@command{gawk} provides the following variables and functions for
-internationalization:
-
-@table @code
-@cindex @code{TEXTDOMAIN} variable
-@item TEXTDOMAIN
-This variable indicates the application's text domain.
-For compatibility with GNU @code{gettext}, the default
-value is @code{"messages"}.
-
-@cindex internationalization, localization, marked strings
-@cindex strings, for localization
-@item _"your message here"
-String constants marked with a leading underscore
-are candidates for translation at runtime.
-String constants without a leading underscore are not translated.
-
-@cindex @code{dcgettext()} function (@command{gawk})
-@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
-Return the translation of @var{string} in
-text domain @var{domain} for locale category @var{category}.
-The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
-The default value for @var{category} is @code{"LC_MESSAGES"}.
-
-If you supply a value for @var{category}, it must be a string equal to
-one of the known locale categories described in
-@ifnotinfo
-the previous @value{SECTION}.
-@end ifnotinfo
-@ifinfo
-@ref{Explaining gettext}.
-@end ifinfo
-You must also supply a text domain. Use @code{TEXTDOMAIN} if
-you want to use the current domain.
-
-@quotation CAUTION
-The order of arguments to the @command{awk} version
-of the @code{dcgettext()} function is purposely different from the order for
-the C version. The @command{awk} version's order was
-chosen to be simple and to allow for reasonable @command{awk}-style
-default arguments.
-@end quotation
-
-@cindex @code{dcngettext()} function (@command{gawk})
-@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
-Return the plural form used for @var{number} of the
-translation of @var{string1} and @var{string2} in text domain
-@var{domain} for locale category @var{category}. @var{string1} is the
-English singular variant of a message, and @var{string2} the English plural
-variant of the same message.
-The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
-The default value for @var{category} is @code{"LC_MESSAGES"}.
-
-The same remarks about argument order as for the @code{dcgettext()} function apply.
-
-@cindex @code{.mo} files, specifying directory of
-@cindex files, @code{.mo}, specifying directory of
-@cindex message object files, specifying directory of
-@cindex files, message object, specifying directory of
-@cindex @code{bindtextdomain()} function (@command{gawk})
-@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]})
-Change the directory in which
-@code{gettext} looks for @file{.mo} files, in case they
-will not or cannot be placed in the standard locations
-(e.g., during testing).
-Return the directory in which @var{domain} is ``bound.''
-
-The default @var{domain} is the value of @code{TEXTDOMAIN}.
-If @var{directory} is the null string (@code{""}), then
-@code{bindtextdomain()} returns the current binding for the
-given @var{domain}.
-@end table
-
-To use these facilities in your @command{awk} program, follow the steps
-outlined in
-@ifnotinfo
-the previous @value{SECTION},
-@end ifnotinfo
-@ifinfo
-@ref{Explaining gettext},
-@end ifinfo
-like so:
-
-@enumerate
-@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
-@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and
-@item
-Set the variable @code{TEXTDOMAIN} to the text domain of
-your program. This is best done in a @code{BEGIN} rule
-(@pxref{BEGIN/END}),
-or it can also be done via the @option{-v} command-line
-option (@pxref{Options}):
-
-@example
-BEGIN @{
- TEXTDOMAIN = "guide"
- @dots{}
-@}
-@end example
-
-@cindex @code{_} (underscore), translatable string
-@cindex underscore (@code{_}), translatable string
-@item
-Mark all translatable strings with a leading underscore (@samp{_})
-character. It @emph{must} be adjacent to the opening
-quote of the string. For example:
-
-@example
-print _"hello, world"
-x = _"you goofed"
-printf(_"Number of users is %d\n", nusers)
-@end example
-
-@item
-If you are creating strings dynamically, you can
-still translate them, using the @code{dcgettext()}
-built-in function:
-
-@example
-message = nusers " users logged in"
-message = dcgettext(message, "adminprog")
-print message
-@end example
-
-Here, the call to @code{dcgettext()} supplies a different
-text domain (@code{"adminprog"}) in which to find the
-message, but it uses the default @code{"LC_MESSAGES"} category.
-
-@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk})
-@item
-During development, you might want to put the @file{.mo}
-file in a private directory for testing. This is done
-with the @code{bindtextdomain()} built-in function:
-
-@example
-BEGIN @{
- TEXTDOMAIN = "guide" # our text domain
- if (Testing) @{
- # where to find our files
- bindtextdomain("testdir")
- # joe is in charge of adminprog
- bindtextdomain("../joe/testdir", "adminprog")
- @}
- @dots{}
-@}
-@end example
-
-@end enumerate
-
-@xref{I18N Example},
-for an example program showing the steps to create
-and use translations from @command{awk}.
-
-@node Translator i18n
-@section Translating @command{awk} Programs
-
-@cindex @code{.po} files
-@cindex files, @code{.po}
-@cindex portable object files
-@cindex files, portable object
-Once a program's translatable strings have been marked, they must
-be extracted to create the initial @file{.po} file.
-As part of translation, it is often helpful to rearrange the order
-in which arguments to @code{printf} are output.
-
-@command{gawk}'s @option{--gen-pot} command-line option extracts
-the messages and is discussed next.
-After that, @code{printf}'s ability to
-rearrange the order for @code{printf} arguments at runtime
-is covered.
-
-@menu
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging @code{printf} arguments.
-* I18N Portability:: @command{awk}-level portability issues.
-@end menu
-
-@node String Extraction
-@subsection Extracting Marked Strings
-@cindex strings, extracting
-@cindex marked strings@comma{} extracting
-@cindex @code{--gen-pot} option
-@cindex command-line options, string extraction
-@cindex string extraction (internationalization)
-@cindex marked string extraction (internationalization)
-@cindex extraction, of marked strings (internationalization)
-
-@cindex @code{--gen-pot} option
-Once your @command{awk} program is working, and all the strings have
-been marked and you've set (and perhaps bound) the text domain,
-it is time to produce translations.
-First, use the @option{--gen-pot} command-line option to create
-the initial @file{.pot} file:
-
-@example
-$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
-@end example
-
-@cindex @code{xgettext} utility
-When run with @option{--gen-pot}, @command{gawk} does not execute your
-program. Instead, it parses it as usual and prints all marked strings
-to standard output in the format of a GNU @code{gettext} Portable Object
-file. Also included in the output are any constant strings that
-appear as the first argument to @code{dcgettext()} or as the first and
-second argument to @code{dcngettext()}.@footnote{The
-@command{xgettext} utility that comes with GNU
-@code{gettext} can handle @file{.awk} files.}
-@xref{I18N Example},
-for the full list of steps to go through to create and test
-translations for @command{guide}.
-
-@node Printf Ordering
-@subsection Rearranging @code{printf} Arguments
-
-@cindex @code{printf} statement, positional specifiers
-@cindex positional specifiers, @code{printf} statement
-Format strings for @code{printf} and @code{sprintf()}
-(@pxref{Printf})
-present a special problem for translation.
-Consider the following:@footnote{This example is borrowed
-from the GNU @code{gettext} manual.}
-
-@c line broken here only for smallbook format
-@example
-printf(_"String `%s' has %d characters\n",
- string, length(string)))
-@end example
-
-A possible German translation for this might be:
-
-@example
-"%d Zeichen lang ist die Zeichenkette `%s'\n"
-@end example
-
-The problem should be obvious: the order of the format
-specifications is different from the original!
-Even though @code{gettext()} can return the translated string
-at runtime,
-it cannot change the argument order in the call to @code{printf}.
-
-To solve this problem, @code{printf} format specifiers may have
-an additional optional element, which we call a @dfn{positional specifier}.
-For example:
-
-@example
-"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
-@end example
-
-Here, the positional specifier consists of an integer count, which indicates which
-argument to use, and a @samp{$}. Counts are one-based, and the
-format string itself is @emph{not} included. Thus, in the following
-example, @samp{string} is the first argument and @samp{length(string)} is the second:
-
-@example
-$ @kbd{gawk 'BEGIN @{}
-> @kbd{string = "Dont Panic"}
-> @kbd{printf _"%2$d characters live in \"%1$s\"\n",}
-> @kbd{string, length(string)}
-> @kbd{@}'}
-@print{} 10 characters live in "Dont Panic"
-@end example
-
-If present, positional specifiers come first in the format specification,
-before the flags, the field width, and/or the precision.
-
-Positional specifiers can be used with the dynamic field width and
-precision capability:
-
-@example
-$ @kbd{gawk 'BEGIN @{}
-> @kbd{printf("%*.*s\n", 10, 20, "hello")}
-> @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")}
-> @kbd{@}'}
-@print{} hello
-@print{} hello
-@end example
-
-@quotation NOTE
-When using @samp{*} with a positional specifier, the @samp{*}
-comes first, then the integer position, and then the @samp{$}.
-This is somewhat counterintuitive.
-@end quotation
-
-@cindex @code{printf} statement, positional specifiers, mixing with regular formats
-@cindex positional specifiers, @code{printf} statement, mixing with regular formats
-@cindex format specifiers, mixing regular with positional specifiers
-@command{gawk} does not allow you to mix regular format specifiers
-and those with positional specifiers in the same string:
-
-@example
-$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'}
-@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none
-@end example
-
-@quotation NOTE
-There are some pathological cases that @command{gawk} may fail to
-diagnose. In such cases, the output may not be what you expect.
-It's still a bad idea to try mixing them, even if @command{gawk}
-doesn't detect it.
-@end quotation
-
-Although positional specifiers can be used directly in @command{awk} programs,
-their primary purpose is to help in producing correct translations of
-format strings into languages different from the one in which the program
-is first written.
-
-@node I18N Portability
-@subsection @command{awk} Portability Issues
-
-@cindex portability, internationalization and
-@cindex internationalization, localization, portability and
-@command{gawk}'s internationalization features were purposely chosen to
-have as little impact as possible on the portability of @command{awk}
-programs that use them to other versions of @command{awk}.
-Consider this program:
-
-@example
-BEGIN @{
- TEXTDOMAIN = "guide"
- if (Test_Guide) # set with -v
- bindtextdomain("/test/guide/messages")
- print _"don't panic!"
-@}
-@end example
-
-@noindent
-As written, it won't work on other versions of @command{awk}.
-However, it is actually almost portable, requiring very little
-change:
-
-@itemize @bullet
-@cindex @code{TEXTDOMAIN} variable, portability and
-@item
-Assignments to @code{TEXTDOMAIN} won't have any effect,
-since @code{TEXTDOMAIN} is not special in other @command{awk} implementations.
-
-@item
-Non-GNU versions of @command{awk} treat marked strings
-as the concatenation of a variable named @code{_} with the string
-following it.@footnote{This is good fodder for an ``Obfuscated
-@command{awk}'' contest.} Typically, the variable @code{_} has
-the null string (@code{""}) as its value, leaving the original string constant as
-the result.
-
-@item
-By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}
-and @code{bindtextdomain()}, the @command{awk} program can be made to run, but
-all the messages are output in the original language.
-For example:
-
-@cindex @code{bindtextdomain()} function (@command{gawk}), portability and
-@cindex @code{dcgettext()} function (@command{gawk}), portability and
-@cindex @code{dcngettext()} function (@command{gawk}), portability and
-@example
-@c file eg/lib/libintl.awk
-function bindtextdomain(dir, domain)
-@{
- return dir
-@}
-
-function dcgettext(string, domain, category)
-@{
- return string
-@}
-
-function dcngettext(string1, string2, number, domain, category)
-@{
- return (number == 1 ? string1 : string2)
-@}
-@c endfile
-@end example
-
-@item
-The use of positional specifications in @code{printf} or
-@code{sprintf()} is @emph{not} portable.
-To support @code{gettext()} at the C level, many systems' C versions of
-@code{sprintf()} do support positional specifiers. But it works only if
-enough arguments are supplied in the function call. Many versions of
-@command{awk} pass @code{printf} formats and arguments unchanged to the
-underlying C library version of @code{sprintf()}, but only one format and
-argument at a time. What happens if a positional specification is
-used is anybody's guess.
-However, since the positional specifications are primarily for use in
-@emph{translated} format strings, and since non-GNU @command{awk}s never
-retrieve the translated string, this should not be a problem in practice.
-@end itemize
-@c ENDOFRANGE inap
-
-@node I18N Example
-@section A Simple Internationalization Example
-
-Now let's look at a step-by-step example of how to internationalize and
-localize a simple @command{awk} program, using @file{guide.awk} as our
-original source:
-
-@example
-@c file eg/prog/guide.awk
-BEGIN @{
-    TEXTDOMAIN = "guide"
-    bindtextdomain(".")  # for testing
-    print _"Don't Panic"
-    print _"The Answer Is", 42
-    print "Pardon me, Zaphod who?"
-@}
-@c endfile
-@end example
-
-@noindent
-Run @samp{gawk --gen-pot} to create the @file{.pot} file:
-
-@example
-$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
-@end example
-
-@noindent
-This produces:
-
-@example
-@c file eg/data/guide.po
-#: guide.awk:4
-msgid "Don't Panic"
-msgstr ""
-
-#: guide.awk:5
-msgid "The Answer Is"
-msgstr ""
-
-@c endfile
-@end example
-
-This original portable object template file is saved and reused for each language
-into which the application is translated. The @code{msgid}
-is the original string and the @code{msgstr} is the translation.
-
-@quotation NOTE
-Strings not marked with a leading underscore do not
-appear in the @file{guide.pot} file.
-@end quotation
-
-Next, the messages must be translated.
-Here is a translation to a hypothetical dialect of English,
-called ``Mellow'':@footnote{Perhaps it would be better if it were
-called ``Hippy.'' Ah, well.}
-
-@example
-@group
-$ cp guide.pot guide-mellow.po
-@var{Add translations to} guide-mellow.po @dots{}
-@end group
-@end example
-
-@noindent
-Following are the translations:
-
-@example
-@c file eg/data/guide-mellow.po
-#: guide.awk:4
-msgid "Don't Panic"
-msgstr "Hey man, relax!"
-
-#: guide.awk:5
-msgid "The Answer Is"
-msgstr "Like, the scoop is"
-
-@c endfile
-@end example
-
-@cindex Linux
-@cindex GNU/Linux
-The next step is to make the directory to hold the binary message object
-file and then to create the @file{guide.mo} file.
-The directory layout shown here is standard for GNU @code{gettext} on
-GNU/Linux systems. Other versions of @code{gettext} may use a different
-layout:
-
-@example
-$ @kbd{mkdir en_US en_US/LC_MESSAGES}
-@end example
-
-@cindex @code{.po} files, converting to @code{.mo}
-@cindex files, @code{.po}, converting to @code{.mo}
-@cindex @code{.mo} files, converting from @code{.po}
-@cindex files, @code{.mo}, converting from @code{.po}
-@cindex portable object files, converting to message object files
-@cindex files, portable object, converting to message object files
-@cindex message object files, converting from portable object files
-@cindex files, message object, converting from portable object files
-@cindex @command{msgfmt} utility
-The @command{msgfmt} utility does the conversion from human-readable
-@file{.po} file to machine-readable @file{.mo} file.
-By default, @command{msgfmt} creates a file named @file{messages}.
-This file must be renamed and placed in the proper directory so that
-@command{gawk} can find it:
-
-@example
-$ @kbd{msgfmt guide-mellow.po}
-$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo}
-@end example
-
-Finally, we run the program to test it:
-
-@example
-$ @kbd{gawk -f guide.awk}
-@print{} Hey man, relax!
-@print{} Like, the scoop is 42
-@print{} Pardon me, Zaphod who?
-@end example
-
-If the three replacement functions for @code{dcgettext()}, @code{dcngettext()}
-and @code{bindtextdomain()}
-(@pxref{I18N Portability})
-are in a file named @file{libintl.awk},
-then we can run @file{guide.awk} unchanged as follows:
-
-@example
-$ @kbd{gawk --posix -f guide.awk -f libintl.awk}
-@print{} Don't Panic
-@print{} The Answer Is 42
-@print{} Pardon me, Zaphod who?
-@end example
-
-@node Gawk I18N
-@section @command{gawk} Can Speak Your Language
-
-@command{gawk} itself has been internationalized
-using the GNU @code{gettext} package.
-(GNU @code{gettext} is described in
-complete detail in
-@ifinfo
-@inforef{Top, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.)
-@end ifinfo
-@ifnotinfo
-@cite{GNU gettext tools}.)
-@end ifnotinfo
-As of this writing, the latest version of GNU @code{gettext} is
-@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}.
-
-If a translation of @command{gawk}'s messages exists,
-then @command{gawk} produces usage messages, warnings,
-and fatal errors in the local language.
-@c ENDOFRANGE inloc
-
-@node Arbitrary Precision Arithmetic
-@chapter Arbitrary Precision Arithmetic with @command{gawk}
-@cindex arbitrary precision
-@cindex multiple precision
-@cindex infinite precision
-@cindex floating-point numbers, arbitrary precision
-@cindex MPFR
-@cindex GMP
-
-@cindex Knuth, Donald
-@quotation
-@i{There's a credibility gap: We don't know how much of the computer's answers
-to believe. Novice computer users solve this problem by implicitly trusting
-in the computer as an infallible authority; they tend to believe that all
-digits of a printed answer are significant. Disillusioned computer users have
-just the opposite approach; they are constantly afraid that their answers
-are almost meaningless.}
-
-Donald Knuth@footnote{Donald E.@: Knuth.
-@cite{The Art of Computer Programming}. Volume 2,
-@cite{Seminumerical Algorithms}, third edition,
-1998, ISBN 0-201-89683-4, p.@: 229.}
-@end quotation
-
-This @value{CHAPTER} describes how to use the arbitrary precision
-(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric
-capabilities in @command{gawk} to produce maximally accurate results
-when you need them. But first you should check if your version of
-@command{gawk} supports arbitrary precision arithmetic.
-The easiest way to find out is to look at the output of
-the following command:
-
-@example
-$ @kbd{gawk --version}
-@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3)
-@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation.
-@dots{}
-@end example
-
-@command{gawk} uses the
-@uref{http://www.mpfr.org, GNU MPFR}
-and
-@uref{http://gmplib.org, GNU MP} (GMP)
-libraries for arbitrary precision
-arithmetic on numbers. So if you do not see the names of these libraries
-in the output, then your version of @command{gawk} does not support
-arbitrary precision arithmetic.
-
-Even if you aren't interested in arbitrary precision arithmetic, you
-may still benefit from knowing about how @command{gawk} handles numbers
-in general, and the limitations of doing arithmetic with ordinary
-@command{gawk} numbers.
-
-@menu
-* Floating-point Programming:: Effective Floating-point Programming.
-* Floating-point Representation:: Binary Floating-point Representation.
-* Floating-point Context:: Floating-point Context.
-* Rounding Mode:: Floating-point Rounding Mode.
-* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
- Arithmetic with @command{gawk}.
-* Setting Precision:: Setting the Working Precision.
-* Setting Rounding Mode:: Setting the Rounding Mode.
-* Floating-point Constants:: Representing Floating-point Constants.
-* Changing Precision:: Changing the Precision of a Number.
-* Exact Arithmetic:: Exact Arithmetic with Floating-point Numbers.
-* Integer Programming:: Effective Integer Programming.
-* Arbitrary Precision Integers:: Arbitrary Precision Integer
- Arithmetic with @command{gawk}.
-* MPFR and GMP Libraries:: Information About the MPFR and GMP Libraries.
-@end menu
-
-@node Floating-point Programming
-@section Effective Floating-point Programming
-
-Numerical programming is an extensive area; if you need to develop
-sophisticated numerical algorithms then @command{gawk} may not be
-the ideal tool, and this documentation may not be sufficient.
-@c FIXME: JOHN: Do you want to cite some actual books?
-It might require a book or two to communicate how to compute
-with ideal accuracy and precision,
-and the result often depends on the particular application.
-
-@quotation NOTE
-A floating-point calculation's @dfn{accuracy} is how close it comes
-to the real value. This is as opposed to the @dfn{precision}, which
-usually refers to the number of bits used to represent the number
-(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision,
-the Wikipedia article} for more information).
-@end quotation
-
-Binary floating-point representations and arithmetic are inexact.
-Simple values like 0.1 cannot be precisely represented using
-binary floating-point numbers, and the limited precision of
-floating-point numbers means that slight changes in
-the order of operations or the precision of intermediate storage
-can change the result. To make matters worse with arbitrary precision
-floating-point, you can set the precision before starting a computation,
-but then you cannot be sure of the number of significant decimal places
-in the final result.
-
-Sometimes you need to think more about what you really want
-and what's really happening. Consider the two numbers
-in the following example:
-
-@example
-x = 0.875 # 1/2 + 1/4 + 1/8
-y = 0.425
-@end example
-
-Unlike the number in @code{y}, the number stored in @code{x}
-is exactly representable
-in binary since it can be written as a finite sum of one or
-more fractions whose denominators are all powers of two.
-When @command{gawk} reads a floating-point number from
-program source, it automatically rounds that number to whatever
-precision your machine supports. If you try to print the numeric
-content of a variable using an output format string of @code{"%.17g"},
-it may not produce the same number as you assigned to it:
-
-@example
-$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425}
-> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'}
-@print{} 0.875, 0.42499999999999999
-@end example
-
-Often the error is so small you do not even notice it, and if you do,
-you can always specify how much precision you would like in your output.
-Usually this is a format string like @code{"%.15g"}, which when
-used in the previous example, produces an output identical to the input.
-
-Because the underlying representation can be a little bit off from the exact value,
-comparing floats to see if they are equal is generally not a good idea.
-Here is an example where it does not work as you might expect:
-
-@example
-$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
-@print{} 0
-@end example
-
-The loss of accuracy during a single computation with floating-point numbers
-usually isn't enough to worry about. However, if you compute a value
-which is the result of a sequence of floating-point operations,
-the error can accumulate and greatly affect the computation itself.
-Here is an attempt to compute the value of the constant
-@value{PI} using one of its many series representations:
-
-@example
-BEGIN @{
-    x = 1.0 / sqrt(3.0)
-    n = 6
-    for (i = 1; i < 30; i++) @{
-        n = n * 2.0
-        x = (sqrt(x * x + 1) - 1) / x
-        printf("%.15f\n", n * x)
-    @}
-@}
-@end example
-
-When run, the early errors propagating through later computations
-cause the loop to terminate prematurely after an attempt to divide by zero:
-
-@example
-$ @kbd{gawk -f pi.awk}
-@print{} 3.215390309173475
-@print{} 3.159659942097510
-@print{} 3.146086215131467
-@print{} 3.142714599645573
-@dots{}
-@print{} 3.224515243534819
-@print{} 2.791117213058638
-@print{} 0.000000000000000
-@error{} gawk: pi.awk:6: fatal: division by zero attempted
-@end example
-
-Here is one more example where the inaccuracies in internal representations
-yield an unexpected result:
-
-@example
-$ @kbd{gawk 'BEGIN @{}
-> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
-> @kbd{i++}
-> @kbd{print i}
-> @kbd{@}'}
-@print{} 4
-@end example
-
-Can computation using arbitrary precision help with the previous examples?
-If you are impatient to know, see
-@ref{Exact Arithmetic}.
-
-Instead of arbitrary precision floating-point arithmetic,
-often all you need is an adjustment of your logic
-or a different order for the operations in your calculation.
-The stability and the accuracy of the computation of the constant @value{PI}
-in the previous example can be enhanced by using the following
-simple algebraic transformation:
-
-@example
-(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1)
-@end example
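-
-Multiplying the numerator and denominator of the original update by
-@code{sqrt(x * x + 1) + 1} removes the subtraction of two nearly equal
-quantities, which is where the earlier program lost its accuracy.
-The following is a sketch of that program with only the update of
-@code{x} changed; it is offered as an illustration of the idea,
-not as a tuned algorithm:
-
-@example
-BEGIN @{
-    x = 1.0 / sqrt(3.0)
-    n = 6
-    for (i = 1; i < 30; i++) @{
-        n = n * 2.0
-        # same quantity as (sqrt(x * x + 1) - 1) / x,
-        # but without subtracting nearly equal values
-        x = x / (sqrt(x * x + 1) + 1)
-        printf("%.15f\n", n * x)
-    @}
-@}
-@end example
-
-@noindent
-With this form, @code{x} never collapses to zero, so the loop runs to
-completion and the printed values settle toward @value{PI}.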
-
-There is no need to be unduly suspicious about the results from
-floating-point arithmetic. The lesson to remember is that
-floating-point math is always more complex than the math using
-pencil and paper. In order to take advantage of the power
-of computer floating-point, you need to know its limitations
-and work within them. For most casual use of floating-point arithmetic,
-you will often get the expected result in the end if you simply round
-the display of your final results to the correct number of significant
-decimal digits. Avoid presenting numerical data in a manner that
-implies better precision than is actually the case.
-
-@node Floating-point Representation
-@section Binary Floating-point Representation
-@cindex IEEE-754 format
-
-Although floating-point representations vary from machine to machine,
-the most commonly encountered representation is that defined by the
-IEEE 754 Standard. An IEEE-754 format value has three components:
-
-@itemize @bullet
-@item
-a sign bit telling whether the number is positive or negative,
-
-@item
-an @dfn{exponent} giving its order of magnitude, @var{e},
-
-@item
-and a @dfn{significand}, @var{s},
-specifying the actual digits of the number.
-@end itemize
-
-The value of the
-number is then
@iftex
-@math{s @cdot 2^e}.
+@part Part II:@* Problem Solving With @command{awk}
@end iftex
-@ifnottex
-@var{s * 2^e}.
-@end ifnottex
-The first bit of a non-zero binary significand
-is always one, so the significand in an IEEE-754 format only includes the
-fractional part, leaving the leading one implicit.
-
-Three of the standard IEEE-754 types are 32-bit single precision,
-64-bit double precision and 128-bit quadruple precision.
-The standard also specifies extended precision formats
-to allow greater precisions and larger exponent ranges.
-
-@node Floating-point Context
-@section Floating-point Context
-@cindex context, floating-point
-
-A floating-point context defines the environment for arithmetic operations.
-It governs precision, sets rules for rounding, and limits the range of exponents.
-The context has the following primary components:
-
-@table @code
-@item precision
-Precision of the floating-point format in bits.
-@item emax
-Maximum exponent allowed for this format.
-@item emin
-Minimum exponent allowed for this format.
-@item underflow behavior
-The format may or may not support gradual underflow.
-@item rounding
-The rounding mode of this context.
-@end table
-
-@ref{table-ieee-formats} lists the precision and exponent
-field values for the basic IEEE-754 binary formats:
-
-@float Table,table-ieee-formats
-@caption{Basic IEEE Formats}
-@multitable @columnfractions .20 .20 .20 .20 .20
-@headitem Name @tab Total bits @tab Precision @tab emin @tab emax
-@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127
-@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023
-@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383
-@end multitable
-@end float
-
-@quotation NOTE
-The precision numbers include the implied leading one that gives them
-one extra bit of significand.
-@end quotation
-
-A floating-point context can also determine which signals are treated
-as exceptions, and can set rules for arithmetic with special values.
-Please consult the IEEE-754 standard or other resources for details.
-
-@command{gawk} ordinarily uses the hardware double precision
-representation for numbers. On most systems, this is IEEE-754
-floating-point format, corresponding to 64-bit binary with 53 bits
-of precision.
-
-@quotation NOTE
-In case an underflow occurs, the standard allows, but does not require,
-the result from an arithmetic operation to be a number smaller than
-the smallest nonzero normalized number. Such numbers do
-not have as many significant digits as normal numbers, and are called
-@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero,
-is called @dfn{flush to zero}. The basic IEEE-754 binary formats
-support subnormal numbers.
-@end quotation
-
-@node Rounding Mode
-@section Floating-point Rounding Mode
-@cindex rounding mode, floating-point
-
-The @dfn{rounding mode} specifies the behavior for the results of numerical
-operations when discarding extra precision. Each rounding mode indicates
-how the least significant returned digit of a rounded result is to
-be calculated.
-The @code{ROUNDMODE} variable (@pxref{Setting Rounding Mode}) provides
-program level control over the rounding mode.
-@ref{table-rounding-modes} lists the IEEE-754 defined
-rounding modes:
-
-@float Table,table-rounding-modes
-@caption{Rounding Modes}
-@multitable @columnfractions .45 .30 .25
-@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE}
-@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"}
-@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"}
-@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"}
-@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"}
-@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"}
-@end multitable
-@end float
-
-The default mode @samp{roundTiesToEven} is the most preferred,
-but the least intuitive. This method does the obvious thing for most values,
-by rounding them up or down to the nearest digit.
-For example, rounding 1.132 to two digits yields 1.13,
-and rounding 1.157 yields 1.16.
-
-However, when it comes to rounding a value that is exactly halfway between,
-things do not work the way you probably learned in school.
-In this case, the number is rounded to the nearest even digit.
-So rounding 0.125 to two digits rounds down to 0.12,
-but rounding 0.6875 to three digits rounds up to 0.688.
-You probably have already encountered this rounding mode when
-using the @code{printf} routine to format floating-point numbers.
-For example:
-
-@example
-BEGIN @{
-    x = -4.5
-    for (i = 1; i < 10; i++) @{
-        x += 1.0
-        printf("%4.1f => %2.0f\n", x, x)
-    @}
-@}
-@end example
-
-@noindent
-produces the following output when run@footnote{It
-is possible for the output to be completely different if the
-C library in your system does not use the IEEE-754 even-rounding
-rule to round halfway cases for @code{printf()}.}:
-
-@example
--3.5 => -4
--2.5 => -2
--1.5 => -2
--0.5 => 0
- 0.5 => 0
- 1.5 => 2
- 2.5 => 2
- 3.5 => 4
- 4.5 => 4
-@end example
-
-The theory behind the rounding mode @samp{roundTiesToEven} is that
-it more or less evenly distributes upward and downward rounds
-of exact halves, which might cause the round-off error
-to cancel itself out. This is the default rounding mode used
-in IEEE-754 computing functions and operators.
-
-The other rounding modes are rarely used.
-Round toward positive infinity (@samp{roundTowardPositive})
-and round toward negative infinity (@samp{roundTowardNegative})
-are often used to implement interval arithmetic,
-where you adjust the rounding mode to calculate upper and lower bounds
-for the range of output. The @samp{roundTowardZero}
-mode can be used for converting floating-point numbers to integers.
-The rounding mode @samp{roundTiesToAway} rounds the result to the
-nearest number and selects the number with the larger magnitude
-if a tie occurs.
-
-Some numerical analysts will tell you that your choice of rounding style
-has tremendous impact on the final outcome, and advise you to wait until
-final output for any rounding. Instead, you can often avoid round-off error problems by
-setting the precision initially to some value sufficiently larger than
-the final desired precision, so that the accumulation of round-off error
-does not influence the outcome.
-If you suspect that results from your computation are
-sensitive to accumulation of round-off error,
-one way to be sure is to look for a significant difference in output
-when you change the rounding mode.
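-
-One way to carry out such a check is sketched below. It assumes a
-@command{gawk} built with MPFR support and a program saved in a file
-named @file{pi.awk}; the output @value{FN}s @file{nearest.out} and
-@file{tozero.out} are arbitrary names chosen for this illustration:
-
-@example
-$ @kbd{gawk -M -vROUNDMODE="N" -f pi.awk > nearest.out}
-$ @kbd{gawk -M -vROUNDMODE="Z" -f pi.awk > tozero.out}
-$ @kbd{diff nearest.out tozero.out}
-@end example
-
-@noindent
-If @command{diff} reports differences in digits that you intend to
-present, the computation is probably sensitive to round-off error.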
-
-@node Arbitrary Precision Floats
-@section Arbitrary Precision Floating-point Arithmetic with @command{gawk}
-
-@command{gawk} uses the GNU MPFR library
-for arbitrary precision floating-point arithmetic. The MPFR library
-provides precise control over precisions and rounding modes, and gives
-correctly rounded reproducible platform-independent results. With the
-command-line option @option{--bignum} or @option{-M},
-all floating-point arithmetic operators and numeric functions can yield
-results to any desired precision level supported by MPFR.
-Two built-in
-variables @code{PREC}
-(@pxref{Setting Precision})
-and @code{ROUNDMODE}
-(@pxref{Setting Rounding Mode})
-provide control over the working precision and the rounding mode.
-The precision and the rounding mode are set globally for every operation
-to follow.
-
-The default working precision for arbitrary precision floats is 53 bits,
-and the default value for @code{ROUNDMODE} is @code{"N"},
-which selects the IEEE-754
-@samp{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The
-default precision is 53, since according to the MPFR documentation,
-the library should be able to exactly reproduce all computations with
-double-precision machine floating-point numbers (@code{double} type
-in C), except the default exponent range is much wider and subnormal
-numbers are not implemented.}
-@command{gawk} uses the default exponent range in MPFR
-@iftex
-(@math{emax = 2^{30} - 1, emin = -emax})
-@end iftex
-@ifnottex
-(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax})
-@end ifnottex
-for all floating-point contexts.
-There is no explicit mechanism to adjust the exponent range.
-MPFR does not implement subnormal numbers by default,
-and this behavior cannot be changed in @command{gawk}.
-
-@quotation NOTE
-When emulating an IEEE-754 format (@pxref{Setting Precision}),
-@command{gawk} internally adjusts the exponent range
-to the value defined for the format and also performs computations needed for
-gradual underflow (subnormal numbers).
-@end quotation
-
-@quotation NOTE
-MPFR numbers are variable-size entities, consuming only as much space as
-needed to store the significant digits. Since the performance using MPFR
-numbers pales in comparison to doing math using the underlying machine
-types, you should consider using only as much precision as needed by
-your program.
-@end quotation
-
-@node Setting Precision
-@section Setting the Working Precision
-@cindex @code{PREC} variable
-
-@command{gawk} uses a global working precision; it does not keep track of
-the precision or accuracy of individual numbers. Performing an arithmetic
-operation or calling a built-in function rounds the result to the current
-working precision. The default working precision is 53 bits, which can be
-modified using the built-in variable @code{PREC}. You can also set the
-value to one of the following pre-defined case-insensitive strings
-to emulate an IEEE-754 binary format:
-
-@multitable {@code{"double"}} {12345678901234567890123456789012345}
-@headitem @code{PREC} @tab IEEE-754 Binary Format
-@item @code{"half"} @tab 16-bit half-precision.
-@item @code{"single"} @tab Basic 32-bit single precision.
-@item @code{"double"} @tab Basic 64-bit double precision.
-@item @code{"quad"} @tab Basic 128-bit quadruple precision.
-@item @code{"oct"} @tab 256-bit octuple precision.
-@end multitable
-
-The following example illustrates the effects of changing precision
-on arithmetic operations:
-
-@example
-$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \}
-> @kbd{PREC = "double"; print x + 0 @}'}
-@print{} 1e-400
-@print{} 0
-@end example
-
-Binary and decimal precisions are related approximately according to the
-formula:
-
-@iftex
-@math{prec = 3.322 @cdot dps}
-@end iftex
-@ifnottex
-@var{prec} = 3.322 * @var{dps}
-@end ifnottex
-
-@noindent
-Here, @var{prec} denotes the binary precision
-(measured in bits) and @var{dps} (short for decimal places)
-is the number of decimal digits. We can easily calculate how many decimal
-digits the 53-bit significand of an IEEE double is equivalent to:
-53 / 3.322 which is equal to about 15.95.
-But what does 15.95 digits actually mean? It depends whether you are
-concerned about how many digits you can rely on, or how many digits
-you need.
-
-It is important to know how many bits it takes to uniquely identify
-a double-precision value (the C type @code{double}). If you want to
-convert from @code{double} to decimal and back to @code{double} (e.g.,
-saving a @code{double} representing an intermediate result to a file, and
-later reading it back to restart the computation), then a few more decimal
-digits are required. 17 digits is generally enough for a @code{double}.
-
-It can also be important to know what decimal numbers can be uniquely
-represented with a @code{double}. If you want to convert
-from decimal to @code{double} and back again, 15 digits is the most that
-you can get. Stated differently, you should not present
-the numbers from your floating-point computations with more than 15
-significant digits in them.
-
-Conversely, it takes a precision of 332 bits to hold an approximation
-of constant @value{PI} that is accurate to 100 decimal places.
-You should always add some extra bits in order to avoid the confusing round-off
-issues that occur because numbers are stored internally in binary.
-
-@node Setting Rounding Mode
-@section Setting the Rounding Mode
-@cindex @code{ROUNDMODE} variable
-
-The built-in variable @code{ROUNDMODE} has the default value @code{"N"},
-which selects the IEEE-754 rounding mode @samp{roundTiesToEven}.
-The other possible values for @code{ROUNDMODE} are @code{"U"} for rounding mode
-@samp{roundTowardPositive}, @code{"D"} for @samp{roundTowardNegative},
-and @code{"Z"} for @samp{roundTowardZero}.
-@command{gawk} also accepts @code{"A"} to select the IEEE-754 mode
-@samp{roundTiesToAway}
-if your version of the MPFR library supports it; otherwise setting
-@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode},
-for the meanings of the various rounding modes.
-
-Here is an example of how to change the default rounding behavior of
-@code{printf}'s output:
-
-@example
-$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'}
-@print{} 1.37
-@end example
-
-@node Floating-point Constants
-@section Representing Floating-point Constants
-@cindex constants, floating-point
-
-Be wary of floating-point constants! When reading a floating-point constant
-from program source code, @command{gawk} uses the default precision,
-unless overridden
-by an assignment to the special variable @code{PREC} on the command
-line, to store it internally as an MPFR number.
-Changing the precision using @code{PREC} in the program text does
-not change the precision of a constant. If you need to
-represent a floating-point constant at a higher precision than the
-default and cannot use a command line assignment to @code{PREC},
-you should either specify the constant as a string, or
-a rational number whenever possible. The following example
-illustrates the differences among various ways to
-print a floating-point constant:
-
-@example
-$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'}
-@print{} 0.1000000000000000055511151
-$ @kbd{gawk -M -vPREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'}
-@print{} 0.1000000000000000000000000
-$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'}
-@print{} 0.1000000000000000000000000
-$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'}
-@print{} 0.1000000000000000000000000
-@end example
-
-In the first case, the number is stored with the default precision of 53 bits.
-
-@node Changing Precision
-@section Changing the Precision of a Number
-
-@cindex Laurie, Dirk
-@quotation
-@i{The point is that in any variable-precision package,
-a decision is made on how to treat numbers given as data,
-or arising in intermediate results, which are represented in
-floating-point format to a precision lower than working precision.
-Do we promote them to full membership of the high-precision club,
-or do we treat them and all their associates as second-class citizens?
-Sometimes the first course is proper, sometimes the second, and it takes
-careful analysis to tell which.}
-
-Dirk Laurie@footnote{Dirk Laurie.
-@cite{Variable-precision Arithmetic Considered Perilous -- A Detective Story}.
-Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.}
-@end quotation
-
-@command{gawk} does not implicitly modify the precision of any previously
-computed results when the working precision is changed with an assignment
-to @code{PREC}. The precision of a number is always the one that was
-used at the time of its creation, and there is no way for the user
-to explicitly change it afterwards. However, since the result of a
-floating-point arithmetic operation is always an arbitrary precision
-floating-point value---with a precision set by the value of @code{PREC}---one of the
-following workarounds effectively accomplishes the desired behavior:
-
-@example
-x = x + 0.0
-@end example
-
-@noindent
-or:
-
-@example
-x += 0.0
-@end example
-
-@node Exact Arithmetic
-@section Exact Arithmetic with Floating-point Numbers
-
-@quotation CAUTION
-Never depend on the exactness of floating-point arithmetic,
-even for apparently simple expressions!
-@end quotation
-
-Can arbitrary precision arithmetic give exact results? There are
-no easy answers. The standard rules of algebra often do not apply
-when using floating-point arithmetic.
-Among other things, the distributive and associative laws
-do not hold completely, and order of operation may be important
-for your computation. Rounding error, cumulative precision loss
-and underflow are often troublesome.
-
-When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3}
-for equality
-using the machine double precision arithmetic, it decides that they
-are not equal!
-(@xref{Floating-point Programming}.)
-You can get the result you want by increasing the precision;
-56 in this case will get the job done:
-
-@example
-$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
-@print{} 1
-@end example
-
-If adding more bits is good, perhaps adding even more bits of
-precision is better?
-Here is what happens if we use an even larger value of @code{PREC}:
-
-@example
-$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
-@print{} 0
-@end example
-
-This is not a bug in @command{gawk} or in the MPFR library.
-It is easy to forget that the finite number of bits used to store the value
-is often just an approximation after proper rounding.
-The test for equality succeeds if and only if @emph{all} bits in the two operands
-are exactly the same. Since this is not necessarily true after floating-point
-computations with a particular precision and effective rounding rule,
-a straight test for equality may not work.
-
-So, don't assume that floating-point values can be compared for equality.
-You should also exercise caution when using other forms of comparisons.
-The standard way to compare floating-point numbers is to determine
-how much error (or @dfn{tolerance}) you will allow in a comparison and
-check to see if one value is within this error range of the other.
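-
-A minimal sketch of such a comparison follows. The function name
-@code{approx_equal()} and the tolerance of @code{1e-9} are choices made
-for this illustration; a suitable tolerance depends on the magnitude of
-the values and on the needs of your application:
-
-@example
-# approx_equal --- true if a and b differ by no more than tol
-function approx_equal(a, b, tol,    diff)
-@{
-    diff = a - b
-    if (diff < 0)
-        diff = -diff    # absolute value of the difference
-    return diff <= tol
-@}
-
-BEGIN @{
-    print (0.1 + 12.2 == 12.3)                    # exact test
-    print approx_equal(0.1 + 12.2, 12.3, 1e-9)    # tolerance test
-@}
-@end example
-
-@noindent
-With hardware double precision arithmetic, the exact test prints 0,
-while the tolerance test prints 1.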
-
-In applications where 15 or fewer decimal places suffice,
-hardware double precision arithmetic can be adequate, and is usually much faster.
-But you do need to keep in mind that every floating-point operation
-can suffer a new rounding error with catastrophic consequences as illustrated
-by our attempt to compute the value of the constant @value{PI}
-(@pxref{Floating-point Programming}).
-Extra precision can greatly enhance the stability and the accuracy
-of your computation in such cases.
-
-Repeated addition is not necessarily equivalent to multiplication
-in floating-point arithmetic. In the last example
-(@pxref{Floating-point Programming}),
-you may or may not succeed in getting the correct result by choosing
-an arbitrarily large value for @code{PREC}. Reformulation of
-the problem at hand is often the correct approach in such situations.
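
The difference between repeated addition and multiplication can be seen in a small, self-contained sketch. It is not taken from the original text; it assumes IEEE 754 double precision arithmetic and uses only POSIX @command{awk} features. The function name @code{add_vs_mul} is made up for illustration.

```shell
# add_vs_mul: compare ten additions of 0.1 against one multiplication.
add_vs_mul() {
    awk 'BEGIN {
        sum = 0
        for (i = 1; i <= 10; i++)
            sum += 0.1                  # ten additions, each one rounded
        by_add = (sum == 1.0)
        by_mul = (0.1 * 10 == 1.0)      # a single rounded multiplication
        print by_add, by_mul
    }'
}
add_vs_mul
```

With hardware doubles this prints @samp{0 1}: the accumulated rounding errors leave the sum slightly below 1, while the single multiplication rounds to exactly 1.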
-
-
-@node Integer Programming
-@section Effective Integer Programming
-
-As has been mentioned already, @command{gawk} ordinarily uses hardware double
-precision with 64-bit IEEE binary floating-point representation
-for numbers on most systems. A large integer like 9007199254740997
-has a binary representation that, although finite, is more than 53 bits long;
-it must also be rounded to 53 bits.
-The biggest integer that can be stored in a C @code{double} is usually the same
-as the largest possible value of a @code{double}. If your system @code{double}
-is an IEEE 64-bit @code{double}, this largest possible value is an integer and
-can be represented precisely. What more should one know about integers?
-
-If you want to know the largest integer such that it and
-all smaller integers can be stored in 64-bit doubles without losing precision,
-the answer is
-@iftex
-@math{2^{53}}.
-@end iftex
-@ifnottex
-2^53.
-@end ifnottex
-The next representable number is the even number
-@iftex
-@math{2^{53} + 2},
-@end iftex
-@ifnottex
-2^53 + 2,
-@end ifnottex
-meaning it is unlikely that you will be able to make
-@command{gawk} print
-@iftex
-@math{2^{53} + 1}
-@end iftex
-@ifnottex
-2^53 + 1
-@end ifnottex
-in integer format.
-The range of integers exactly representable by a 64-bit double
-is
-@iftex
-@math{[-2^{53}, 2^{53}]}.
-@end iftex
-@ifnottex
-[@minus{}2^53, 2^53].
-@end ifnottex
-If you ever see an integer outside this range in @command{gawk}
-using 64-bit doubles, you have reason to be very suspicious about
-the accuracy of the output. Here is a simple program with erroneous output:
-
-@example
-$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'}
-@print{} 9007199254740991
-@print{} 9007199254740992
-@print{} 9007199254740992
-@print{} 9007199254740994
-@end example
-
-The lesson is to not assume that any large integer printed by @command{gawk}
-represents an exact result from your computation, especially if it wraps
-around on your screen.
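
One way to act on this advice is to test a result against the exactly representable range before trusting it. The following is a sketch under stated assumptions: @code{is_safe_int} is a hypothetical helper, not a @command{gawk} built-in, and numbers are assumed to be IEEE 754 64-bit doubles. The bounds are deliberately strict, because a stored value of exactly 2^53 cannot be told apart from a rounded 2^53 + 1.

```shell
# safe_int_demo: flag values that may have lost integer precision.
safe_int_demo() {
    awk '
    # is_safe_int is a hypothetical helper: true only when x lies
    # strictly inside (-2^53, 2^53).
    function is_safe_int(x,    limit) {
        limit = 2^53
        return (x > -limit && x < limit)
    }
    BEGIN {
        a = is_safe_int(2^53 - 1)   # exactly representable
        b = is_safe_int(2^53 + 1)   # rounds to 2^53 when stored, so it is flagged
        print a, b
    }'
}
safe_int_demo
```

This prints @samp{1 0}. The check is conservative: it also rejects 2^53 itself, trading one representable value for an unambiguous answer.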
-
-@node Arbitrary Precision Integers
-@section Arbitrary Precision Integer Arithmetic with @command{gawk}
-@cindex integer, arbitrary precision
-
-If the option @option{--bignum} or @option{-M} is specified,
-@command{gawk} performs all
-integer arithmetic using GMP arbitrary precision integers.
-Any number that looks like an integer in a program source or data file
-is stored as an arbitrary precision integer.
-The size of the integer is limited only by your computer's memory.
-The current floating-point context has no effect on operations involving integers.
-For example, the following computes
-@iftex
-@math{5^{4^{3^{2}}}},
-@end iftex
-@ifnottex
-5^4^3^2,
-@end ifnottex
-the result of which is beyond the
-limits of ordinary @command{gawk} numbers:
-
-@example
-$ @kbd{gawk -M 'BEGIN @{}
-> @kbd{x = 5^4^3^2}
-> @kbd{print "# of digits =", length(x)}
-> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)}
-> @kbd{@}'}
-@print{} # of digits = 183231
-@print{} 62060698786608744707 ... 92256259918212890625
-@end example
-
-If you were to compute the same value using arbitrary precision
-floating-point values instead, the precision needed for correct output
-(using the formula
-@iftex
-@math{prec = 3.322 @cdot dps}),
-would be @math{3.322 @cdot 183231},
-@end iftex
-@ifnottex
-@samp{prec = 3.322 * dps}),
-would be 3.322 x 183231,
-@end ifnottex
-or 608693.
-
-The result from an arithmetic operation with an integer and a floating-point value
-is a floating-point value with a precision equal to the working precision.
-The following program calculates the eighth term in
-Sylvester's sequence@footnote{Weisstein, Eric W.
-@cite{Sylvester's Sequence}. From MathWorld--A Wolfram Web Resource.
-@url{http://mathworld.wolfram.com/SylvestersSequence.html}}
-using a recurrence:
-
-@example
-$ @kbd{gawk -M 'BEGIN @{}
-> @kbd{s = 2.0}
-> @kbd{for (i = 1; i <= 7; i++)}
-> @kbd{s = s * (s - 1) + 1}
-> @kbd{print s}
-> @kbd{@}'}
-@print{} 113423713055421845118910464
-@end example
-
-The output differs from the actual number, 113423713055421844361000443,
-because the default precision of 53 is not enough to represent the
-floating-point results exactly. You can either increase the precision
-(100 is enough in this case), or replace the floating-point constant
-@code{2.0} with an integer, to perform all computations using integer
-arithmetic to get the correct output.
-
-It will sometimes be necessary for @command{gawk} to implicitly convert an
-arbitrary precision integer into an arbitrary precision floating-point value.
-This is primarily because the MPFR library does not always provide the
-relevant interface to process arbitrary precision integers or mixed-mode
-numbers as needed by an operation or function.
-In such a case, the precision is set to the minimum value necessary
-for exact conversion, and the working precision is not used for this purpose.
-If this is not what you need or want, you can employ a subterfuge
-like this:
-
-@example
-gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}'
-@end example
-
-You can avoid this issue altogether by specifying the number as a float
-to begin with:
-
-@example
-gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}'
-@end example
-
-Note that for the particular example above, there is unlikely to be a
-reason not to simply use the following:
-
-@example
-gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
-@end example
-
-
-@node MPFR and GMP Libraries
-@section Information About the MPFR and GMP Libraries
-
-There are a few elements available in the @code{PROCINFO} array
-to provide information about the MPFR and GMP libraries.
-@xref{Auto-set}, for more information.
-
-@node Advanced Features
-@chapter Advanced Features of @command{gawk}
-@cindex advanced features, network connections, See Also networks, connections
-@c STARTOFRANGE gawadv
-@cindex @command{gawk}, features, advanced
-@c STARTOFRANGE advgaw
-@cindex advanced features, @command{gawk}
@ignore
-Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com>
-
- Found in Steve English's "signature" line:
-
-"Write documentation as if whoever reads it is a violent psychopath
-who knows where you live."
-@end ignore
-@quotation
-@i{Write documentation as if whoever reads it is
-a violent psychopath who knows where you live.}@*
-Steve English, as quoted by Peter Langston
-@end quotation
-
-This @value{CHAPTER} discusses advanced features in @command{gawk}.
-It's a bit of a ``grab bag'' of items that are otherwise unrelated
-to each other.
-First, a command-line option allows @command{gawk} to recognize
-nondecimal numbers in input data, not just in @command{awk}
-programs.
-Then, @command{gawk}'s special features for sorting arrays are presented.
-Next, two-way I/O, discussed briefly in earlier parts of this
-@value{DOCUMENT}, is described in full detail, along with the basics
-of TCP/IP networking. Finally, @command{gawk}
-can @dfn{profile} an @command{awk} program, making it possible to tune
-it for performance.
-
-@ref{Dynamic Extensions},
-discusses the ability to dynamically add new built-in functions to
-@command{gawk}. As this feature is still immature and likely to change,
-its description is relegated to an appendix.
-
-@menu
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal and
- sorting arrays.
-* Two-way I/O:: Two-way communications with another process.
-* TCP/IP Networking:: Using @command{gawk} for network programming.
-* Profiling:: Profiling your @command{awk} programs.
-@end menu
-
-@node Nondecimal Data
-@section Allowing Nondecimal Input Data
-@cindex @code{--non-decimal-data} option
-@cindex advanced features, @command{gawk}, nondecimal input data
-@cindex input, data@comma{} nondecimal
-@cindex constants, nondecimal
-
-If you run @command{gawk} with the @option{--non-decimal-data} option,
-you can have nondecimal constants in your input data:
-
-@c line break here for small book format
-@example
-$ @kbd{echo 0123 123 0x123 |}
-> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",}
-> @kbd{$1, $2, $3 @}'}
-@print{} 83, 123, 291
-@end example
-
-For this feature to work, write your program so that
-@command{gawk} treats your data as numeric:
-
-@example
-$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
-@print{} 0123 123 0x123
-@end example
-
-@noindent
-The @code{print} statement treats its expressions as strings.
-Although the fields can act as numbers when necessary,
-they are still strings, so @code{print} does not try to treat them
-numerically. You may need to add zero to a field to force it to
-be treated as a number. For example:
-
-@example
-$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
-> @kbd{@{ print $1, $2, $3}
-> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
-@print{} 0123 123 0x123
-@print{} 83 123 291
-@end example
-
-Because it is common to have decimal data with leading zeros, and because
-using this facility could lead to surprising results, the default is to leave it
-disabled. If you want it, you must explicitly request it.
-
-@cindex programming conventions, @code{--non-decimal-data} option
-@cindex @code{--non-decimal-data} option, @code{strtonum()} function and
-@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
-@quotation CAUTION
-@emph{Use of this option is not recommended.}
-It can break old programs very badly.
-Instead, use the @code{strtonum()} function to convert your data
-(@pxref{Nondecimal-numbers}).
-This makes your programs easier to write and easier to read, and
-leads to less surprising results.
-@end quotation
-
-@node Array Sorting
-@section Controlling Array Traversal and Array Sorting
-
-@command{gawk} lets you control the order in which a @samp{for (i in array)}
-loop traverses an array.
-
-In addition, two built-in functions, @code{asort()} and @code{asorti()},
-let you sort arrays based on the array values and indices, respectively.
-These two functions also provide control over the sorting criteria used
-to order the elements during sorting.
-
-@menu
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}.
-@end menu
-
-@node Controlling Array Traversal
-@subsection Controlling Array Traversal
-
-By default, the order in which a @samp{for (i in array)} loop
-scans an array is not defined; it is generally based upon
-the internal implementation of arrays inside @command{awk}.
-
-Often, though, it is desirable to be able to loop over the elements
-in a particular order that you, the programmer, choose. @command{gawk}
-lets you do this.
-
-@ref{Controlling Scanning}, describes how you can assign special,
-pre-defined values to @code{PROCINFO["sorted_in"]} in order to
-control the order in which @command{gawk} will traverse an array
-during a @code{for} loop.
-
-In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name.
-This lets you traverse an array based on any custom criterion.
-The array elements are ordered according to the return value of this
-function. The comparison function should be defined with at least
-four arguments:
-
-@example
-function comp_func(i1, v1, i2, v2)
-@{
- @var{compare elements 1 and 2 in some fashion}
- @var{return < 0; 0; or > 0}
-@}
-@end example
-
-Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
-are the corresponding values of the two elements being compared.
-Either @var{v1} or @var{v2}, or both, can be arrays if the array being
-traversed contains subarrays as values.
-(@xref{Arrays of Arrays}, for more information about subarrays.)
-The three possible return values are interpreted as follows:
-
-@table @code
-@item comp_func(i1, v1, i2, v2) < 0
-Index @var{i1} comes before index @var{i2} during loop traversal.
-
-@item comp_func(i1, v1, i2, v2) == 0
-Indices @var{i1} and @var{i2}
-come together but the relative order with respect to each other is undefined.
-
-@item comp_func(i1, v1, i2, v2) > 0
-Index @var{i1} comes after index @var{i2} during loop traversal.
-@end table
-
-Our first comparison function can be used to scan an array in
-numerical order of the indices:
-
-@example
-function cmp_num_idx(i1, v1, i2, v2)
-@{
- # numerical index comparison, ascending order
- return (i1 - i2)
-@}
-@end example
-
-Our second function traverses an array based on the string order of
-the element values rather than by indices:
-
-@example
-function cmp_str_val(i1, v1, i2, v2)
-@{
- # string value comparison, ascending order
- v1 = v1 ""
- v2 = v2 ""
- if (v1 < v2)
- return -1
- return (v1 != v2)
-@}
-@end example
-
-The third
-comparison function makes all numbers, and numeric strings without
-any leading or trailing spaces, come out first during loop traversal:
-
-@example
-function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
-@{
- # numbers before string value comparison, ascending order
- n1 = v1 + 0
- n2 = v2 + 0
- if (n1 == v1)
- return (n2 == v2) ? (n1 - n2) : -1
- else if (n2 == v2)
- return 1
- return (v1 < v2) ? -1 : (v1 != v2)
-@}
-@end example
-
-Here is a main program to demonstrate how @command{gawk}
-behaves using each of the previous functions:
-
-@example
-BEGIN @{
- data["one"] = 10
- data["two"] = 20
- data[10] = "one"
- data[100] = 100
- data[20] = "two"
-
- f[1] = "cmp_num_idx"
- f[2] = "cmp_str_val"
- f[3] = "cmp_num_str_val"
- for (i = 1; i <= 3; i++) @{
- printf("Sort function: %s\n", f[i])
- PROCINFO["sorted_in"] = f[i]
- for (j in data)
- printf("\tdata[%s] = %s\n", j, data[j])
- print ""
- @}
-@}
-@end example
-
-Here are the results when the program is run:
-@page
-
-@example
-$ @kbd{gawk -f compdemo.awk}
-@print{} Sort function: cmp_num_idx @ii{Sort by numeric index}
-@print{} data[two] = 20
-@print{} data[one] = 10 @ii{Both strings are numerically zero}
-@print{} data[10] = one
-@print{} data[20] = two
-@print{} data[100] = 100
-@print{}
-@print{} Sort function: cmp_str_val @ii{Sort by element values as strings}
-@print{} data[one] = 10
-@print{} data[100] = 100 @ii{String 100 is less than string 20}
-@print{} data[two] = 20
-@print{} data[10] = one
-@print{} data[20] = two
-@print{}
-@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings}
-@print{} data[one] = 10
-@print{} data[two] = 20
-@print{} data[100] = 100
-@print{} data[10] = one
-@print{} data[20] = two
-@end example
-
-Consider sorting the entries of a GNU/Linux system password file
-according to login name. The following program sorts records
-by a specific field position and can be used for this purpose:
-
-@example
-# sort.awk --- simple program to sort by field position
-# field position is specified by the global variable POS
-
-function cmp_field(i1, v1, i2, v2)
-@{
- # comparison by value, as string, and ascending order
- return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
-@}
-
-@{
- for (i = 1; i <= NF; i++)
- a[NR][i] = $i
-@}
-
-END @{
- PROCINFO["sorted_in"] = "cmp_field"
- if (POS < 1 || POS > NF)
- POS = 1
- for (i in a) @{
- for (j = 1; j <= NF; j++)
- printf("%s%c", a[i][j], j < NF ? ":" : "")
- print ""
- @}
-@}
-@end example
-
-The first field in each entry of the password file is the user's login name,
-and the fields are separated by colons.
-Each record defines a subarray,
-with each field as an element in the subarray.
-Running the program produces the
-following output:
-
-@example
-$ @kbd{gawk -vPOS=1 -F: -f sort.awk /etc/passwd}
-@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin
-@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin
-@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin
-@dots{}
-@end example
-
-The comparison should normally always return the same value when given a
-specific pair of array elements as its arguments. If inconsistent
-results are returned then the order is undefined. This behavior can be
-exploited to introduce random order into otherwise seemingly
-ordered data:
-
-@example
-function cmp_randomize(i1, v1, i2, v2)
-@{
- # random order
- return (2 - 4 * rand())
-@}
-@end example
-
-As mentioned above, the order of the indices is arbitrary if two
-elements compare equal. This is usually not a problem, but letting
-the tied elements come out in arbitrary order can be an issue, especially
-when comparing item values. The partial ordering of the equal elements
-may change during the next loop traversal, if other elements are added to or
-removed from the array. One way to resolve ties when comparing elements
-with otherwise equal values is to include the indices in the comparison
-rules. Note that doing this may make the loop traversal less efficient,
-so consider it only if necessary. The following comparison functions
-force a deterministic order, and are based on the fact that the
-indices of two elements are never equal:
-
-@example
-function cmp_numeric(i1, v1, i2, v2)
-@{
- # numerical value (and index) comparison, descending order
- return (v1 != v2) ? (v2 - v1) : (i2 - i1)
-@}
-
-function cmp_string(i1, v1, i2, v2)
-@{
- # string value (and index) comparison, descending order
- v1 = v1 i1
- v2 = v2 i2
- return (v1 > v2) ? -1 : (v1 != v2)
-@}
-@end example
-
-@c Avoid using the term ``stable'' when describing the unpredictable behavior
-@c if two items compare equal. Usually, the goal of a "stable algorithm"
-@c is to maintain the original order of the items, which is a meaningless
-@c concept for a list constructed from a hash.
-
-A custom comparison function can often simplify ordered loop
-traversal, and the sky is really the limit when it comes to
-designing such a function.
-
-When string comparisons are made during a sort, either for element
-values where one or both aren't numbers, or for element indices
-handled as strings, the value of @code{IGNORECASE}
-(@pxref{Built-in Variables}) controls whether
-the comparisons treat corresponding uppercase and lowercase letters as
-equivalent or distinct.
-
-Another point to keep in mind is that in the case of subarrays
-the element values can themselves be arrays; a production comparison
-function should use the @code{isarray()} function
-(@pxref{Type Functions}),
-to check for this, and choose a defined sorting order for subarrays.
-
-All sorting based on @code{PROCINFO["sorted_in"]}
-is disabled in POSIX mode,
-since the @code{PROCINFO} array is not special in that case.
-
-As a side note, sorting the array indices before traversing
-the array has been reported to add 15% to 20% overhead to the
-execution time of @command{awk} programs. For this reason,
-sorted array traversal is not the default.
-
-@c The @command{gawk}
-@c maintainers believe that only the people who wish to use a
-@c feature should have to pay for it.
-
-@node Array Sorting Functions
-@subsection Sorting Array Values and Indices with @command{gawk}
-
-@cindex arrays, sorting
-@cindex @code{asort()} function (@command{gawk})
-@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting
-@cindex sort function, arrays, sorting
-In most @command{awk} implementations, sorting an array requires
-writing a @code{sort()} function.
-While this can be educational for exploring different sorting algorithms,
-usually that's not the point of the program.
-@command{gawk} provides the built-in @code{asort()}
-and @code{asorti()} functions
-(@pxref{String Functions})
-for sorting arrays. For example:
-
-@example
-@var{populate the array} data
-n = asort(data)
-for (i = 1; i <= n; i++)
- @var{do something with} data[i]
-@end example
-
-After the call to @code{asort()}, the array @code{data} is indexed from 1
-to some number @var{n}, the total number of elements in @code{data}.
-(This count is @code{asort()}'s return value.)
-@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
-The comparison is based on the type of the elements
-(@pxref{Typing and Comparison}).
-All numeric values come before all string values,
-which in turn come before all subarrays.
-
-@cindex side effects, @code{asort()} function
-An important side effect of calling @code{asort()} is that
-@emph{the array's original indices are irrevocably lost}.
-As this isn't always desirable, @code{asort()} accepts a
-second argument:
-
-@example
-@var{populate the array} source
-n = asort(source, dest)
-for (i = 1; i <= n; i++)
- @var{do something with} dest[i]
-@end example
-
-In this case, @command{gawk} copies the @code{source} array into the
-@code{dest} array and then sorts @code{dest}, destroying its indices.
-However, the @code{source} array is not affected.
-
-@code{asort()} accepts a third string argument to control comparison of
-array elements. As with @code{PROCINFO["sorted_in"]}, this argument
-may be one of the predefined names that @command{gawk} provides
-(@pxref{Controlling Scanning}), or the name of a user-defined function
-(@pxref{Controlling Array Traversal}).
-
-@quotation NOTE
-In all cases, the sorted element values consist of the original
-array's element values. The ability to control comparison merely
-affects the way in which they are sorted.
-@end quotation
-
-Often, what's needed is to sort on the values of the @emph{indices}
-instead of the values of the elements.
-To do that, use the
-@code{asorti()} function. The interface is identical to that of
-@code{asort()}, except that the index values are used for sorting, and
-become the values of the result array:
-
-@example
-@{ source[$0] = some_func($0) @}
-
-END @{
- n = asorti(source, dest)
- for (i = 1; i <= n; i++) @{
- @ii{Work with sorted indices directly:}
- @var{do something with} dest[i]
- @dots{}
- @ii{Access original array via sorted indices:}
- @var{do something with} source[dest[i]]
- @}
-@}
-@end example
-
-Similar to @code{asort()},
-in all cases, the sorted element values consist of the original
-array's indices. The ability to control comparison merely
-affects the way in which they are sorted.
-
-Sorting the array by replacing the indices provides maximal flexibility.
-To traverse the elements in decreasing order, use a loop that goes from
-@var{n} down to 1, either over the elements or over the indices.@footnote{You
-may also use one of the predefined sorting names that sorts in
-decreasing order.}
-
-@cindex reference counting, sorting arrays
-Copying array indices and elements isn't expensive in terms of memory.
-Internally, @command{gawk} maintains @dfn{reference counts} to data.
-For example, when @code{asort()} copies the first array to the second one,
-there is only one copy of the original array elements' data, even though
-both arrays use the values.
-
-@c Document It And Call It A Feature. Sigh.
-@cindex @command{gawk}, @code{IGNORECASE} variable in
-@cindex @code{IGNORECASE} variable
-@cindex arrays, sorting, @code{IGNORECASE} variable and
-@cindex @code{IGNORECASE} variable, array sorting and
-Because @code{IGNORECASE} affects string comparisons, the value
-of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
-Note also that the locale's sorting order does @emph{not}
-come into play; comparisons are based on character values only.@footnote{This
-is true because locale-based comparison occurs only when in POSIX
-compatibility mode, and since @code{asort()} and @code{asorti()} are
-@command{gawk} extensions, they are not available in that case.}
-Caveat Emptor.
-
-@node Two-way I/O
-@section Two-Way Communications with Another Process
-@cindex Brennan, Michael
-@cindex programmers, attractiveness of
-@smallexample
-@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
-From: brennan@@whidbey.com (Mike Brennan)
-Newsgroups: comp.lang.awk
-Subject: Re: Learn the SECRET to Attract Women Easily
-Date: 4 Aug 1997 17:34:46 GMT
-@c Organization: WhidbeyNet
-@c Lines: 12
-Message-ID: <5s53rm$eca@@news.whidbey.com>
-@c References: <5s20dn$2e1@chronicle.concentric.net>
-@c Reply-To: brennan@whidbey.com
-@c NNTP-Posting-Host: asn202.whidbey.com
-@c X-Newsreader: slrn (0.9.4.1 UNIX)
-@c Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
-
-On 3 Aug 1997 13:17:43 GMT, Want More Dates???
-<tracy78@@kilgrona.com> wrote:
->Learn the SECRET to Attract Women Easily
->
->The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
-
-The scent of awk programmers is a lot more attractive to women than
-the scent of perl programmers.
---
-Mike Brennan
-@c brennan@@whidbey.com
-@end smallexample
-
-@cindex advanced features, @command{gawk}, processes@comma{} communicating with
-@cindex processes, two-way communications with
-It is often useful to be able to
-send data to a separate program for
-processing and then read the result. This can always be
-done with temporary files:
-
-@example
-# Write the data for processing
-tempfile = ("mydata." PROCINFO["pid"])
-while (@var{not done with data})
- print @var{data} | ("subprogram > " tempfile)
-close("subprogram > " tempfile)
-
-# Read the results, remove tempfile when done
-while ((getline newdata < tempfile) > 0)
- @var{process} newdata @var{appropriately}
-close(tempfile)
-system("rm " tempfile)
-@end example
-
-@noindent
-This works, but not elegantly. Among other things, it requires that
-the program be run in a directory that cannot be shared among users;
-for example, @file{/tmp} will not do, as another user might happen
-to be using a temporary file with the same name.
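
The name collision, at least, can be reduced by letting the system choose the file name. The sketch below assumes that a @command{mktemp} utility is installed (it is widely available but not specified by POSIX) and that the path it returns contains no characters special to the shell. The function name @code{sort_via_tempfile} and the sample data are made up for illustration.

```shell
# sort_via_tempfile: send data through sort using a uniquely named temp file.
sort_via_tempfile() {
    awk 'BEGIN {
        # Ask mktemp for a unique, private file name.
        if (("mktemp" | getline tempfile) <= 0)
            exit 1
        close("mktemp")

        # Write the data for processing.
        cmd = "LC_ALL=C sort > " tempfile
        print "pear"  | cmd
        print "apple" | cmd
        print "fig"   | cmd
        close(cmd)                  # wait for sort to finish writing

        # Read the results, remove the temp file when done.
        while ((getline line < tempfile) > 0)
            print "got", line
        close(tempfile)
        system("rm -f " tempfile)
    }'
}
sort_via_tempfile
```

The three words come back in sorted order, each prefixed with @samp{got}. The approach still needs two passes over the data and leaves a file behind if the program is interrupted.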
-
-@cindex coprocesses
-@cindex input/output, two-way
-@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
-@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
-@cindex @command{csh} utility, @code{|&} operator, comparison with
-However, with @command{gawk}, it is possible to
-open a @emph{two-way} pipe to another process. The second process is
-termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}.
-The two-way connection is created using the @samp{|&} operator
-(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
-different from the same operator in the C shell.}
-
-@example
-do @{
- print @var{data} |& "subprogram"
- "subprogram" |& getline results
-@} while (@var{data left to process})
-close("subprogram")
-@end example
-
-The first time an I/O operation is executed using the @samp{|&}
-operator, @command{gawk} creates a two-way pipeline to a child process
-that runs the other program. Output created with @code{print}
-or @code{printf} is written to the program's standard input, and
-output from the program's standard output can be read by the @command{gawk}
-program using @code{getline}.
-As is the case with processes started by @samp{|}, the subprogram
-can be any program, or pipeline of programs, that can be started by
-the shell.
+@ifdocbook
+@part Part II:@* Problem Solving With @command{awk}
-There are some cautionary items to be aware of:
+Part II shows how to use @command{awk} and @command{gawk} for problem solving.
+There is lots of code here for you to read and learn from.
+It contains the following chapters:
@itemize @bullet
@item
-As the code inside @command{gawk} currently stands, the coprocess's
-standard error goes to the same place that the parent @command{gawk}'s
-standard error goes. It is not possible to read the child's
-standard error separately.
+@ref{Library Functions}.
-@cindex deadlocks
-@cindex buffering, input/output
-@cindex @code{getline} command, deadlock and
@item
-I/O buffering may be a problem. @command{gawk} automatically
-flushes all output down the pipe to the coprocess.
-However, if the coprocess does not flush its output,
-@command{gawk} may hang when doing a @code{getline} in order to read
-the coprocess's results. This could lead to a situation
-known as @dfn{deadlock}, where each process is waiting for the
-other one to do something.
+@ref{Sample Programs}.
@end itemize
-
-@cindex @code{close()} function, two-way pipes and
-It is possible to close just one end of the two-way pipe to
-a coprocess, by supplying a second argument to the @code{close()}
-function of either @code{"to"} or @code{"from"}
-(@pxref{Close Files And Pipes}).
-These strings tell @command{gawk} to close the end of the pipe
-that sends data to the coprocess or the end that reads from it,
-respectively.
-
-@cindex @command{sort} utility, coprocesses and
-This is particularly necessary in order to use
-the system @command{sort} utility as part of a coprocess;
-@command{sort} must read @emph{all} of its input
-data before it can produce any output.
-The @command{sort} program does not receive an end-of-file indication
-until @command{gawk} closes the write end of the pipe.
-
-When you have finished writing data to the @command{sort}
-utility, you can close the @code{"to"} end of the pipe, and
-then start reading sorted data via @code{getline}.
-For example:
-
-@example
-BEGIN @{
- command = "LC_ALL=C sort"
- n = split("abcdefghijklmnopqrstuvwxyz", a, "")
-
- for (i = n; i > 0; i--)
- print a[i] |& command
- close(command, "to")
-
- while ((command |& getline line) > 0)
- print "got", line
- close(command)
-@}
-@end example
-
-This program writes the letters of the alphabet in reverse order, one
-per line, down the two-way pipe to @command{sort}. It then closes the
-write end of the pipe, so that @command{sort} receives an end-of-file
-indication. This causes @command{sort} to sort the data and write the
-sorted data back to the @command{gawk} program. Once all of the data
-has been read, @command{gawk} terminates the coprocess and exits.
-
-As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
-command ensures traditional Unix (ASCII) sorting from @command{sort}.
-
-@cindex @command{gawk}, @code{PROCINFO} array in
-@cindex @code{PROCINFO} array
-You may also use pseudo-ttys (ptys) for
-two-way communication instead of pipes, if your system supports them.
-This is done on a per-command basis, by setting a special element
-in the @code{PROCINFO} array
-(@pxref{Auto-set}),
-like so:
-
-@example
-command = "sort -nr" # command, save in convenience variable
-PROCINFO[command, "pty"] = 1 # update PROCINFO
-print @dots{} |& command # start two-way pipe
-@dots{}
-@end example
-
-@noindent
-Using ptys avoids the buffer deadlock issues described earlier, at some
-loss in performance. If your system does not have ptys, or if all the
-system's ptys are in use, @command{gawk} automatically falls back to
-using regular pipes.
-
-@node TCP/IP Networking
-@section Using @command{gawk} for Network Programming
-@cindex advanced features, @command{gawk}, network programming
-@cindex networks, programming
-@c STARTOFRANGE tcpip
-@cindex TCP/IP
-@cindex @code{/inet/@dots{}} special files (@command{gawk})
-@cindex files, @code{/inet/@dots{}} (@command{gawk})
-@cindex @code{/inet4/@dots{}} special files (@command{gawk})
-@cindex files, @code{/inet4/@dots{}} (@command{gawk})
-@cindex @code{/inet6/@dots{}} special files (@command{gawk})
-@cindex files, @code{/inet6/@dots{}} (@command{gawk})
-@cindex @code{EMISTERED}
-@quotation
-@code{EMISTERED}:@*
-@ @ @ @ @i{A host is a host from coast to coast,@*
-@ @ @ @ and no-one can talk to host that's close,@*
-@ @ @ @ unless the host that isn't close@*
-@ @ @ @ is busy hung or dead.}
-@end quotation
-
-In addition to being able to open a two-way pipeline to a coprocess
-on the same system
-(@pxref{Two-way I/O}),
-it is possible to make a two-way connection to
-another process on another system across an IP network connection.
-
-You can think of this as just a @emph{very long} two-way pipeline to
-a coprocess.
-The way @command{gawk} decides that you want to use TCP/IP networking is
-by recognizing special @value{FN}s that begin with one of @samp{/inet/},
-@samp{/inet4/} or @samp{/inet6/}.
-
-The full syntax of the special @value{FN} is
-@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
-The components are:
-
-@table @var
-@item net-type
-Specifies the kind of Internet connection to make.
-Use @samp{/inet4/} to force IPv4, and
-@samp{/inet6/} to force IPv6.
-Plain @samp{/inet/} (which used to be the only option) uses
-the system default, most likely IPv4.
-
-@item protocol
-The protocol to use over IP. This must be either @samp{tcp}, or
-@samp{udp}, for a TCP or UDP IP connection,
-respectively. The use of TCP is recommended for most applications.
-
-@item local-port
-@cindex @code{getaddrinfo()} function (C library)
-The local TCP or UDP port number to use. Use a port number of @samp{0}
-when you want the system to pick a port. This is what you should do
-when writing a TCP or UDP client.
-You may also use a well-known service name, such as @samp{smtp}
-or @samp{http}, in which case @command{gawk} attempts to determine
-the predefined port number using the C @code{getaddrinfo()} function.
-
-@item remote-host
-The IP address or fully-qualified domain name of the Internet
-host to which you want to connect.
-
-@item remote-port
-The TCP or UDP port number to use on the given @var{remote-host}.
-Again, use @samp{0} if you don't care, or else a well-known
-service name.
-@end table
-
-@cindex @command{gawk}, @code{ERRNO} variable in
-@cindex @code{ERRNO} variable
-@quotation NOTE
-Failure in opening a two-way socket will result in a non-fatal error
-being returned to the calling code. The value of @code{ERRNO} indicates
-the error (@pxref{Auto-set}).
-@end quotation
-
-Consider the following very simple example:
-
-@example
-BEGIN @{
- Service = "/inet/tcp/0/localhost/daytime"
- Service |& getline
- print $0
- close(Service)
-@}
-@end example
-
-This program reads the current date and time from the local system's
-TCP @samp{daytime} server.
-It then prints the results and closes the connection.
-
-Because this topic is extensive, the use of @command{gawk} for
-TCP/IP programming is documented separately.
-@ifinfo
-See
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}},
-@end ifinfo
-@ifnotinfo
-See @cite{TCP/IP Internetworking with @command{gawk}},
-which comes as part of the @command{gawk} distribution,
-@end ifnotinfo
-for a much more complete introduction and discussion, as well as
-extensive examples.
-
-@c ENDOFRANGE tcpip
-
-@node Profiling
-@section Profiling Your @command{awk} Programs
-@c STARTOFRANGE awkp
-@cindex @command{awk} programs, profiling
-@c STARTOFRANGE proawk
-@cindex profiling @command{awk} programs
-@cindex profiling @command{gawk}
-@cindex @code{awkprof.out} file
-@cindex files, @code{awkprof.out}
-
-You may produce execution traces of your @command{awk} programs.
-This is done by passing the option @option{--profile} to @command{gawk}.
-When @command{gawk} has finished running, it creates a profile of your program in a file
-named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
-@command{gawk} normally does.
-
-@cindex @code{--profile} option
-As shown in the following example,
-the @option{--profile} option can be used to change the name of the file
-where @command{gawk} will write the profile:
-
-@example
-gawk --profile=myprog.prof -f myprog.awk data1 data2
-@end example
-
-@noindent
-In the above example, @command{gawk} places the profile in
-@file{myprog.prof} instead of in @file{awkprof.out}.
-
-Here is a sample session showing a simple @command{awk} program, its input data, and the
-results from running @command{gawk} with the @option{--profile} option.
-First, the @command{awk} program:
-
-@example
-BEGIN @{ print "First BEGIN rule" @}
-
-END @{ print "First END rule" @}
-
-/foo/ @{
- print "matched /foo/, gosh"
- for (i = 1; i <= 3; i++)
- sing()
-@}
-
-@{
- if (/foo/)
- print "if is true"
- else
- print "else is true"
-@}
-
-BEGIN @{ print "Second BEGIN rule" @}
-
-END @{ print "Second END rule" @}
-
-function sing( dummy)
-@{
- print "I gotta be me!"
-@}
-@end example
-
-Following is the input data:
-
-@example
-foo
-bar
-baz
-foo
-junk
-@end example
-
-Here is the @file{awkprof.out} that results from running the @command{gawk}
-profiler on this program and data (this example also illustrates that @command{awk}
-programmers sometimes have to work late):
-
-@cindex @code{BEGIN} pattern
-@cindex @code{END} pattern
-@example
- # gawk profile, created Sun Aug 13 00:00:15 2000
-
- # BEGIN block(s)
-
- BEGIN @{
- 1 print "First BEGIN rule"
- 1 print "Second BEGIN rule"
- @}
-
- # Rule(s)
-
- 5 /foo/ @{ # 2
- 2 print "matched /foo/, gosh"
- 6 for (i = 1; i <= 3; i++) @{
- 6 sing()
- @}
- @}
-
- 5 @{
- 5 if (/foo/) @{ # 2
- 2 print "if is true"
- 3 @} else @{
- 3 print "else is true"
- @}
- @}
-
- # END block(s)
-
- END @{
- 1 print "First END rule"
- 1 print "Second END rule"
- @}
-
- # Functions, listed alphabetically
-
- 6 function sing(dummy)
- @{
- 6 print "I gotta be me!"
- @}
-@end example
-
-This example illustrates many of the basic features of profiling output.
-They are as follows:
-
-@itemize @bullet
-@item
-The program is printed in the order @code{BEGIN} rule,
-@code{BEGINFILE} rule,
-pattern/action rules,
-@code{ENDFILE} rule, @code{END} rule and functions, listed
-alphabetically.
-Multiple @code{BEGIN} and @code{END} rules are merged together,
-as are multiple @code{BEGINFILE} and @code{ENDFILE} rules.
-
-@cindex patterns, counts
-@item
-Pattern-action rules have two counts.
-The first count, to the left of the rule, shows how many times
-the rule's pattern was @emph{tested}.
-The second count, to the right of the rule's opening left brace
-in a comment,
-shows how many times the rule's action was @emph{executed}.
-The difference between the two indicates how many times the rule's
-pattern evaluated to false.
-
-@item
-Similarly,
-the count for an @code{if}-@code{else} statement shows how many times
-the condition was tested.
-To the right of the opening left brace for the @code{if}'s body
-is a count showing how many times the condition was true.
-The count for the @code{else}
-indicates how many times the test failed.
-
-@cindex loops, count for header
-@item
-The count for a loop header (such as @code{for}
-or @code{while}) shows how many times the loop test was executed.
-(Because of this, you can't just look at the count on the first
-statement in a rule to determine how many times the rule was executed.
-If the first statement is a loop, the count is misleading.)
-
-@cindex functions, user-defined, counts
-@cindex user-defined, functions, counts
-@item
-For user-defined functions, the count next to the @code{function}
-keyword indicates how many times the function was called.
-The counts next to the statements in the body show how many times
-those statements were executed.
-
-@cindex @code{@{@}} (braces)
-@cindex braces (@code{@{@}})
-@item
-The layout uses ``K&R'' style with TABs.
-Braces are used everywhere, even when
-the body of an @code{if}, @code{else}, or loop is only a single statement.
-
-@cindex @code{()} (parentheses)
-@cindex parentheses @code{()}
-@item
-Parentheses are used only where needed, as indicated by the structure
-of the program and the precedence rules.
-@c extra verbiage here satisfies the copyeditor. ugh.
-For example, @samp{(3 + 5) * 4} means add three plus five, then multiply
-the total by four. However, @samp{3 + 5 * 4} has no parentheses, and
-means @samp{3 + (5 * 4)}.
-
-@ignore
-@item
-All string concatenations are parenthesized too.
-(This could be made a bit smarter.)
+@end ifdocbook
@end ignore
-@item
-Parentheses are used around the arguments to @code{print}
-and @code{printf} only when
-the @code{print} or @code{printf} statement is followed by a redirection.
-Similarly, if
-the target of a redirection isn't a scalar, it gets parenthesized.
-
-@item
-@command{gawk} supplies leading comments in
-front of the @code{BEGIN} and @code{END} rules,
-the pattern/action rules, and the functions.
-
-@end itemize
-
-The profiled version of your program may not look exactly like what you
-typed when you wrote it. This is because @command{gawk} creates the
-profiled version by ``pretty printing'' its internal representation of
-the program. The advantage to this is that @command{gawk} can produce
-a standard representation. The disadvantage is that all source-code
-comments are lost, as are the distinctions among multiple @code{BEGIN},
-@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as:
-
-@example
-/foo/
-@end example
-
-@noindent
-come out as:
-
-@example
-/foo/ @{
- print $0
-@}
-@end example
-
-@noindent
-which is correct, but possibly surprising.
-
-@cindex profiling @command{awk} programs, dynamically
-@cindex @command{gawk} program, dynamic profiling
-Besides creating profiles when a program has completed,
-@command{gawk} can produce a profile while it is running.
-This is useful if your @command{awk} program goes into an
-infinite loop and you want to see what has been executed.
-To use this feature, run @command{gawk} with the @option{--profile}
-option in the background:
-
-@example
-$ @kbd{gawk --profile -f myprog &}
-[1] 13992
-@end example
-
-@cindex @command{kill} command@comma{} dynamic profiling
-@cindex @code{USR1} signal
-@cindex @code{SIGUSR1} signal
-@cindex signals, @code{USR1}/@code{SIGUSR1}
-@noindent
-The shell prints a job number and process ID number; in this case, 13992.
-Use the @command{kill} command to send the @code{USR1} signal
-to @command{gawk}:
-
-@example
-$ @kbd{kill -USR1 13992}
-@end example
-
-@noindent
-As usual, the profiled version of the program is written to
-@file{awkprof.out}, or to a different file if one is specified with
-the @option{--profile} option.
-
-Along with the regular profile, as shown earlier, the profile
-includes a trace of any active functions:
-
-@example
-# Function Call Stack:
-
-# 3. baz
-# 2. bar
-# 1. foo
-# -- main --
-@end example
-
-You may send @command{gawk} the @code{USR1} signal as many times as you like.
-Each time, the profile and function call trace are appended to the output
-profile file.
-
-@cindex @code{HUP} signal
-@cindex @code{SIGHUP} signal
-@cindex signals, @code{HUP}/@code{SIGHUP}
-If you use the @code{HUP} signal instead of the @code{USR1} signal,
-@command{gawk} produces the profile and the function call trace and then exits.
-
-@cindex @code{INT} signal (MS-Windows)
-@cindex @code{SIGINT} signal (MS-Windows)
-@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows)
-@cindex @code{QUIT} signal (MS-Windows)
-@cindex @code{SIGQUIT} signal (MS-Windows)
-@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
-When @command{gawk} runs on MS-Windows systems, it uses the
-@code{INT} and @code{QUIT} signals for producing the profile and, in
-the case of the @code{INT} signal, @command{gawk} exits. This is
-because these systems don't support the @command{kill} command, so the
-only signals you can deliver to a program are those generated by the
-keyboard. The @code{INT} signal is generated by the
-@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the
-@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key.
-
-Finally, @command{gawk} also accepts another option, @option{--pretty-print}.
-When called this way, @command{gawk} ``pretty prints'' the program into
-@file{awkprof.out}, without any execution counts.
-@c ENDOFRANGE advgaw
-@c ENDOFRANGE gawadv
-@c ENDOFRANGE awkp
-@c ENDOFRANGE proawk
-
@node Library Functions
@chapter A Library of @command{awk} Functions
@c STARTOFRANGE libf
@@ -20523,7 +18185,7 @@ programming use.
* Ordinal Functions:: Functions for using characters as numbers and
vice versa.
* Join Function:: A function to join an array into a string.
-* Gettimeofday Function:: A function to get formatted times.
+* Getlocaltime Function:: A function to get formatted times.
@end menu
@node Strtonum Function
@@ -21048,7 +18710,7 @@ be nice if @command{awk} had an assignment operator for concatenation.
The lack of an explicit operator for concatenation makes string operations
more difficult than they really need to be.}
-@node Gettimeofday Function
+@node Getlocaltime Function
@subsection Managing the Time of Day
@cindex libraries of @command{awk} functions, managing, time
@@ -21062,14 +18724,14 @@ in human readable form. While @code{strftime()} is extensive, the control
formats are not necessarily easy to remember or intuitively obvious when
reading a program.
-The following function, @code{gettimeofday()}, populates a user-supplied array
+The following function, @code{getlocaltime()}, populates a user-supplied array
with preformatted time information. It returns a string with the current
time formatted in the same way as the @command{date} utility:
-@cindex @code{gettimeofday()} user-defined function
+@cindex @code{getlocaltime()} user-defined function
@example
@c file eg/lib/gettime.awk
-# gettimeofday.awk --- get the time of day in a usable format
+# getlocaltime.awk --- get the time of day in a usable format
@c endfile
@ignore
@c file eg/lib/gettime.awk
@@ -21102,7 +18764,7 @@ time formatted in the same way as the @command{date} utility:
# time["weeknum"] -- week number, Sunday first day
# time["altweeknum"] -- week number, Monday first day
-function gettimeofday(time, ret, now, i)
+function getlocaltime(time, ret, now, i)
@{
# get time once, avoids unnecessary system calls
now = systime()
@@ -21144,7 +18806,7 @@ The string indices are easier to use and read than the various formats
required by @code{strftime()}. The @code{alarm} program presented in
@ref{Alarm Program},
uses this function.
-A more general design for the @code{gettimeofday()} function would have
+A more general design for the @code{getlocaltime()} function would have
allowed the user to supply an optional timestamp value to use instead
of the current time.
@@ -24436,8 +22098,8 @@ it prints the message on the standard output. In addition, you can give it
the number of times to repeat the message as well as a delay between
repetitions.
-This program uses the @code{gettimeofday()} function from
-@ref{Gettimeofday Function}.
+This program uses the @code{getlocaltime()} function from
+@ref{Getlocaltime Function}.
All the work is done in the @code{BEGIN} rule. The first part is argument
checking and setting of defaults: the delay, the count, and the message to
@@ -24456,7 +22118,7 @@ Here is the program:
@c file eg/prog/alarm.awk
# alarm.awk --- set an alarm
#
-# Requires gettimeofday() library function
+# Requires getlocaltime() library function
@c endfile
@ignore
@c file eg/prog/alarm.awk
@@ -24528,7 +22190,7 @@ is how long to wait before setting off the alarm:
minute = atime[2] + 0 # force numeric
# get current broken down time
- gettimeofday(now)
+ getlocaltime(now)
# if time given is 12-hour hours and it's after that
# hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
@@ -26195,6 +23857,1921 @@ BEGIN {
}
@end ignore
+@iftex
+@part Part III:@* Moving Beyond Standard @command{awk} With @command{gawk}
+@end iftex
+
+@ignore
+@ifdocbook
+
+@part Part III:@* Moving Beyond Standard @command{awk} With @command{gawk}
+
+Part III focuses on features specific to @command{gawk}.
+It contains the following chapters:
+
+@itemize @bullet
+@item
+@ref{Internationalization}.
+
+@item
+@ref{Advanced Features}.
+
+@item
+@ref{Debugger}.
+
+@item
+@ref{Arbitrary Precision Arithmetic}.
+
+@item
+@ref{Dynamic Extensions}.
+@end ifdocbook
+@end ignore
+
+@node Internationalization
+@chapter Internationalization with @command{gawk}
+
+Once upon a time, computer makers
+wrote software that worked only in English.
+Eventually, hardware and software vendors noticed that if their
+systems worked in the native languages of non-English-speaking
+countries, they were able to sell more systems.
+As a result, internationalization and localization
+of programs and software systems became a common practice.
+
+@c STARTOFRANGE inloc
+@cindex internationalization, localization
+@cindex @command{gawk}, internationalization and, See internationalization
+@cindex internationalization, localization, @command{gawk} and
+For many years, the ability to provide internationalization
+was largely restricted to programs written in C and C++.
+This @value{CHAPTER} describes the underlying library @command{gawk}
+uses for internationalization, as well as how
+@command{gawk} makes internationalization
+features available at the @command{awk} program level.
+Having internationalization available at the @command{awk} level
+gives software developers additional flexibility---they are no
+longer forced to write in C or C++ when internationalization is
+a requirement.
+
+@menu
+* I18N and L10N:: Internationalization and Localization.
+* Explaining gettext:: How GNU @code{gettext} works.
+* Programmer i18n:: Features for the programmer.
+* Translator i18n:: Features for the translator.
+* I18N Example:: A simple i18n example.
+* Gawk I18N:: @command{gawk} is also internationalized.
+@end menu
+
+@node I18N and L10N
+@section Internationalization and Localization
+
+@cindex internationalization
+@cindex localization, See internationalization@comma{} localization
+@cindex localization
+@dfn{Internationalization} means writing (or modifying) a program once,
+in such a way that it can use multiple languages without requiring
+further source-code changes.
+@dfn{Localization} means providing the data necessary for an
+internationalized program to work in a particular language.
+Most typically, these terms refer to features such as the language
+used for printing error messages, the language used to read
+responses, and information related to how numerical and
+monetary values are printed and read.
+
+@node Explaining gettext
+@section GNU @code{gettext}
+
+@cindex internationalizing a program
+@c STARTOFRANGE gettex
+@cindex @code{gettext} library
+The facilities in GNU @code{gettext} focus on messages: strings printed
+by a program, either directly or via formatting with @code{printf} or
+@code{sprintf()}.@footnote{For some operating systems, the @command{gawk}
+port doesn't support GNU @code{gettext}.
+Therefore, these features are not available
+if you are using one of those operating systems. Sorry.}
+
+@cindex portability, @code{gettext} library and
+When using GNU @code{gettext}, each application has its own
+@dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk},
+that identifies the application.
+A complete application may have multiple components---programs written
+in C or C++, as well as scripts written in @command{sh} or @command{awk}.
+All of the components use the same text domain.
+
+To make the discussion concrete, assume we're writing an application
+named @command{guide}. Internationalization consists of the
+following steps, in this order:
+
+@enumerate
+@item
+The programmer goes
+through the source for all of @command{guide}'s components
+and marks each string that is a candidate for translation.
+For example, @code{"`-F': option required"} is a good candidate for translation.
+A table with strings of option names is not (e.g., @command{gawk}'s
+@option{--profile} option should remain the same, no matter what the local
+language).
+
+@cindex @code{textdomain()} function (C library)
+@item
+The programmer indicates the application's text domain
+(@code{"guide"}) to the @code{gettext} library,
+by calling the @code{textdomain()} function.
+
+@cindex @code{.pot} files
+@cindex files, @code{.pot}
+@cindex portable object template files
+@cindex files, portable object template
+@item
+Messages from the application are extracted from the source code and
+collected into a portable object template file (@file{guide.pot}),
+which lists the strings and their translations.
+The translations are initially empty.
+The original (usually English) messages serve as the key for
+lookup of the translations.
+
+@cindex @code{.po} files
+@cindex files, @code{.po}
+@cindex portable object files
+@cindex files, portable object
+@item
+For each language with a translator, @file{guide.pot}
+is copied to a portable object file (@file{.po})
+and translations are created and shipped with the application.
+For example, there might be a @file{fr.po} for a French translation.
+
+@cindex @code{.mo} files
+@cindex files, @code{.mo}
+@cindex message object files
+@cindex files, message object
+@item
+Each language's @file{.po} file is converted into a binary
+message object (@file{.mo}) file.
+A message object file contains the original messages and their
+translations in a binary format that allows fast lookup of translations
+at runtime.
+
+@item
+When @command{guide} is built and installed, the binary translation files
+are installed in a standard place.
+
+@cindex @code{bindtextdomain()} function (C library)
+@item
+For testing and development, it is possible to tell @code{gettext}
+to use @file{.mo} files in a different directory than the standard
+one by using the @code{bindtextdomain()} function.
+
+@cindex @code{.mo} files, specifying directory of
+@cindex files, @code{.mo}, specifying directory of
+@cindex message object files, specifying directory of
+@cindex files, message object, specifying directory of
+@item
+At runtime, @command{guide} looks up each string via a call
+to @code{gettext()}. The returned string is the translated string
+if available, or the original string if not.
+
+@item
+If necessary, it is possible to access messages from a different
+text domain than the one belonging to the application, without
+having to switch the application's default text domain back
+and forth.
+@end enumerate
+
+@cindex @code{gettext()} function (C library)
+In C (or C++), the string marking and dynamic translation lookup
+are accomplished by wrapping each string in a call to @code{gettext()}:
+
+@example
+printf("%s", gettext("Don't Panic!\n"));
+@end example
+
+The tools that extract messages from source code pull out all
+strings enclosed in calls to @code{gettext()}.
+
+@cindex @code{_} (underscore), @code{_} C macro
+@cindex underscore (@code{_}), @code{_} C macro
+The GNU @code{gettext} developers, recognizing that typing
+@samp{gettext(@dots{})} over and over again is both painful and ugly to look
+at, use the macro @samp{_} (an underscore) to make things easier:
+
+@example
+/* In the standard header file: */
+#define _(str) gettext(str)
+
+/* In the program text: */
+printf("%s", _("Don't Panic!\n"));
+@end example
+
+@cindex internationalization, localization, locale categories
+@cindex @code{gettext} library, locale categories
+@cindex locale categories
+@noindent
+This reduces the typing overhead to just three extra characters per string
+and is considerably easier to read as well.
+
+There are locale @dfn{categories}
+for different types of locale-related information.
+The defined locale categories that @code{gettext} knows about are:
+
+@table @code
+@cindex @code{LC_MESSAGES} locale category
+@item LC_MESSAGES
+Text messages. This is the default category for @code{gettext}
+operations, but it is possible to supply a different one explicitly,
+if necessary. (It is almost never necessary to supply a different category.)
+
+@cindex sorting characters in different languages
+@cindex @code{LC_COLLATE} locale category
+@item LC_COLLATE
+Text-collation information; i.e., how different characters
+and/or groups of characters sort in a given language.
+
+@cindex @code{LC_CTYPE} locale category
+@item LC_CTYPE
+Character-type information (alphabetic, digit, upper- or lowercase, and
+so on).
+This information is accessed via the
+POSIX character classes in regular expressions,
+such as @code{/[[:alnum:]]/}
+(@pxref{Regexp Operators}).
+
+@cindex monetary information, localization
+@cindex currency symbols, localization
+@cindex @code{LC_MONETARY} locale category
+@item LC_MONETARY
+Monetary information, such as the currency symbol, and whether the
+symbol goes before or after a number.
+
+@cindex @code{LC_NUMERIC} locale category
+@item LC_NUMERIC
+Numeric information, such as which characters to use for the decimal
+point and the thousands separator.@footnote{Americans
+use a comma every three decimal places and a period for the decimal
+point, while many Europeans do exactly the opposite:
+1,234.56 versus 1.234,56.}
+
+@cindex @code{LC_RESPONSE} locale category
+@item LC_RESPONSE
+Response information, such as how ``yes'' and ``no'' appear in the
+local language, and possibly other information as well.
+
+@cindex time, localization and
+@cindex dates, information related to@comma{} localization
+@cindex @code{LC_TIME} locale category
+@item LC_TIME
+Time- and date-related information, such as 12- or 24-hour clock, month printed
+before or after the day in a date, local month abbreviations, and so on.
+
+@cindex @code{LC_ALL} locale category
+@item LC_ALL
+All of the above. (Not too useful in the context of @code{gettext}.)
+@end table
+@c ENDOFRANGE gettex
+
+@node Programmer i18n
+@section Internationalizing @command{awk} Programs
+@c STARTOFRANGE inap
+@cindex @command{awk} programs, internationalizing
+
+@command{gawk} provides the following variables and functions for
+internationalization:
+
+@table @code
+@cindex @code{TEXTDOMAIN} variable
+@item TEXTDOMAIN
+This variable indicates the application's text domain.
+For compatibility with GNU @code{gettext}, the default
+value is @code{"messages"}.
+
+@cindex internationalization, localization, marked strings
+@cindex strings, for localization
+@item _"your message here"
+String constants marked with a leading underscore
+are candidates for translation at runtime.
+String constants without a leading underscore are not translated.
+
+@cindex @code{dcgettext()} function (@command{gawk})
+@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
+Return the translation of @var{string} in
+text domain @var{domain} for locale category @var{category}.
+The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
+The default value for @var{category} is @code{"LC_MESSAGES"}.
+
+If you supply a value for @var{category}, it must be a string equal to
+one of the known locale categories described in
+@ifnotinfo
+the previous @value{SECTION}.
+@end ifnotinfo
+@ifinfo
+@ref{Explaining gettext}.
+@end ifinfo
+You must also supply a text domain. Use @code{TEXTDOMAIN} if
+you want to use the current domain.
+
+@quotation CAUTION
+The order of arguments to the @command{awk} version
+of the @code{dcgettext()} function is purposely different from the order for
+the C version. The @command{awk} version's order was
+chosen to be simple and to allow for reasonable @command{awk}-style
+default arguments.
+@end quotation
+
+@cindex @code{dcngettext()} function (@command{gawk})
+@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
+Return the plural form used for @var{number} of the
+translation of @var{string1} and @var{string2} in text domain
+@var{domain} for locale category @var{category}. @var{string1} is the
+English singular variant of a message, and @var{string2} the English plural
+variant of the same message.
+The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
+The default value for @var{category} is @code{"LC_MESSAGES"}.
+
+The same remarks about argument order as for the @code{dcgettext()} function apply.
+
+@cindex @code{.mo} files, specifying directory of
+@cindex files, @code{.mo}, specifying directory of
+@cindex message object files, specifying directory of
+@cindex files, message object, specifying directory of
+@cindex @code{bindtextdomain()} function (@command{gawk})
+@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]})
+Change the directory in which
+@code{gettext} looks for @file{.mo} files, in case they
+will not or cannot be placed in the standard locations
+(e.g., during testing).
+Return the directory in which @var{domain} is ``bound.''
+
+The default @var{domain} is the value of @code{TEXTDOMAIN}.
+If @var{directory} is the null string (@code{""}), then
+@code{bindtextdomain()} returns the current binding for the
+given @var{domain}.
+@end table
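+
+As a brief sketch of how @code{dcgettext()} and @code{dcngettext()} might
+be called (the @code{"guide"} text domain, the messages, and the variable
+@code{nusers} are illustrative assumptions, not part of any real
+application):
+
+@example
+BEGIN @{
+    TEXTDOMAIN = "guide"    # hypothetical text domain
+    nusers = 3              # hypothetical count
+
+    # Supplying a category requires supplying the domain as well;
+    # TEXTDOMAIN selects the current one.
+    greeting = dcgettext("Good morning", TEXTDOMAIN, "LC_MESSAGES")
+    print greeting
+
+    # Choose the singular or plural form according to nusers.
+    fmt = dcngettext("%d user logged in\n", "%d users logged in\n", nusers)
+    printf(fmt, nusers)
+@}
+@end example
+
+@noindent
+If no translation catalog for the domain is installed, both functions
+simply return the original English strings; @code{dcngettext()} then
+returns @var{string2} whenever @var{number} is not one.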
+
+To use these facilities in your @command{awk} program, follow the steps
+outlined in
+@ifnotinfo
+the previous @value{SECTION},
+@end ifnotinfo
+@ifinfo
+@ref{Explaining gettext},
+@end ifinfo
+like so:
+
+@enumerate
+@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
+@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and
+@item
+Set the variable @code{TEXTDOMAIN} to the text domain of
+your program. This is best done in a @code{BEGIN} rule
+(@pxref{BEGIN/END}),
+or it can also be done via the @option{-v} command-line
+option (@pxref{Options}):
+
+@example
+BEGIN @{
+ TEXTDOMAIN = "guide"
+ @dots{}
+@}
+@end example
+
+@cindex @code{_} (underscore), translatable string
+@cindex underscore (@code{_}), translatable string
+@item
+Mark all translatable strings with a leading underscore (@samp{_})
+character. It @emph{must} be adjacent to the opening
+quote of the string. For example:
+
+@example
+print _"hello, world"
+x = _"you goofed"
+printf(_"Number of users is %d\n", nusers)
+@end example
+
+@item
+If you are creating strings dynamically, you can
+still translate them, using the @code{dcgettext()}
+built-in function:
+
+@example
+message = nusers " users logged in"
+message = dcgettext(message, "adminprog")
+print message
+@end example
+
+Here, the call to @code{dcgettext()} supplies a different
+text domain (@code{"adminprog"}) in which to find the
+message, but it uses the default @code{"LC_MESSAGES"} category.
+
+@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk})
+@item
+During development, you might want to put the @file{.mo}
+file in a private directory for testing. This is done
+with the @code{bindtextdomain()} built-in function:
+
+@example
+BEGIN @{
+ TEXTDOMAIN = "guide" # our text domain
+ if (Testing) @{
+ # where to find our files
+ bindtextdomain("testdir")
+ # joe is in charge of adminprog
+ bindtextdomain("../joe/testdir", "adminprog")
+ @}
+ @dots{}
+@}
+@end example
+
+@end enumerate
+
+@xref{I18N Example},
+for an example program showing the steps to create
+and use translations from @command{awk}.
+
+@node Translator i18n
+@section Translating @command{awk} Programs
+
+@cindex @code{.po} files
+@cindex files, @code{.po}
+@cindex portable object files
+@cindex files, portable object
+Once a program's translatable strings have been marked, they must
+be extracted to create the initial @file{.pot} file.
+As part of translation, it is often helpful to rearrange the order
+in which arguments to @code{printf} are output.
+
+@command{gawk}'s @option{--gen-pot} command-line option extracts
+the messages and is discussed next.
+After that, @code{printf}'s ability to
+rearrange the order of its arguments at runtime
+is covered.
+
+@menu
+* String Extraction:: Extracting marked strings.
+* Printf Ordering:: Rearranging @code{printf} arguments.
+* I18N Portability:: @command{awk}-level portability issues.
+@end menu
+
+@node String Extraction
+@subsection Extracting Marked Strings
+@cindex strings, extracting
+@cindex marked strings@comma{} extracting
+@cindex @code{--gen-pot} option
+@cindex command-line options, string extraction
+@cindex string extraction (internationalization)
+@cindex marked string extraction (internationalization)
+@cindex extraction, of marked strings (internationalization)
+
+@cindex @code{--gen-pot} option
+Once your @command{awk} program is working, and all the strings have
+been marked and you've set (and perhaps bound) the text domain,
+it is time to produce translations.
+First, use the @option{--gen-pot} command-line option to create
+the initial @file{.pot} file:
+
+@example
+$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
+@end example
+
+@cindex @code{xgettext} utility
+When run with @option{--gen-pot}, @command{gawk} does not execute your
+program. Instead, it parses it as usual and prints all marked strings
+to standard output in the format of a GNU @code{gettext} Portable Object
+file. Also included in the output are any constant strings that
+appear as the first argument to @code{dcgettext()} or as the first and
+second arguments to @code{dcngettext()}.@footnote{The
+@command{xgettext} utility that comes with GNU
+@code{gettext} can handle @file{.awk} files.}
+@xref{I18N Example},
+for the full list of steps to go through to create and test
+translations for @command{guide}.
+
+@node Printf Ordering
+@subsection Rearranging @code{printf} Arguments
+
+@cindex @code{printf} statement, positional specifiers
+@cindex positional specifiers, @code{printf} statement
+Format strings for @code{printf} and @code{sprintf()}
+(@pxref{Printf})
+present a special problem for translation.
+Consider the following:@footnote{This example is borrowed
+from the GNU @code{gettext} manual.}
+
+@c line broken here only for smallbook format
+@example
+printf(_"String `%s' has %d characters\n",
+          string, length(string))
+@end example
+
+A possible German translation for this might be:
+
+@example
+"%d Zeichen lang ist die Zeichenkette `%s'\n"
+@end example
+
+The problem should be obvious: the order of the format
+specifications is different from the original!
+Even though @code{gettext()} can return the translated string
+at runtime,
+it cannot change the argument order in the call to @code{printf}.
+
+To solve this problem, @code{printf} format specifiers may have
+an additional optional element, which we call a @dfn{positional specifier}.
+For example:
+
+@example
+"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
+@end example
+
+Here, the positional specifier consists of an integer count, which indicates which
+argument to use, and a @samp{$}. Counts are one-based, and the
+format string itself is @emph{not} included. Thus, in the following
+example, @samp{string} is the first argument and @samp{length(string)} is the second:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{    string = "Dont Panic"}
+> @kbd{    printf _"%2$d characters live in \"%1$s\"\n",}
+> @kbd{                        string, length(string)}
+> @kbd{@}'}
+@print{} 10 characters live in "Dont Panic"
+@end example
+
+If present, positional specifiers come first in the format specification,
+before the flags, the field width, and/or the precision.
+
+Positional specifiers can be used with the dynamic field width and
+precision capability:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{    printf("%*.*s\n", 10, 20, "hello")}
+> @kbd{    printf("%3$*2$.*1$s\n", 20, 10, "hello")}
+> @kbd{@}'}
+@print{}      hello
+@print{}      hello
+@end example
+
+@quotation NOTE
+When using @samp{*} with a positional specifier, the @samp{*}
+comes first, then the integer position, and then the @samp{$}.
+This is somewhat counterintuitive.
+@end quotation
+
+@cindex @code{printf} statement, positional specifiers, mixing with regular formats
+@cindex positional specifiers, @code{printf} statement, mixing with regular formats
+@cindex format specifiers, mixing regular with positional specifiers
+@command{gawk} does not allow you to mix regular format specifiers
+and those with positional specifiers in the same string:
+
+@example
+$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'}
+@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none
+@end example
+
+@quotation NOTE
+There are some pathological cases that @command{gawk} may fail to
+diagnose. In such cases, the output may not be what you expect.
+It's still a bad idea to try mixing them, even if @command{gawk}
+doesn't detect it.
+@end quotation
+
+Although positional specifiers can be used directly in @command{awk} programs,
+their primary purpose is to help in producing correct translations of
+format strings into languages different from the one in which the program
+is first written.
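+
+As a minimal sketch of how this plays out, the following program holds
+the German format string in an ordinary variable. The variable
+@code{fmt} is only a stand-in (an assumption made for illustration)
+for the string that @command{gawk} would retrieve from a @file{.mo}
+file at runtime; the @code{printf} call is the same one the original
+program would make:
+
+@example
+BEGIN @{
+    string = "Dont Panic"
+    # fmt stands in for a translated string; the name is hypothetical
+    fmt = "%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
+    # the argument order is unchanged from the original program
+    printf fmt, string, length(string)
+@}
+@end example
+
+@noindent
+Because the positional specifiers select the arguments, the call does
+not have to change when a translation reorders the format.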
+
+@node I18N Portability
+@subsection @command{awk} Portability Issues
+
+@cindex portability, internationalization and
+@cindex internationalization, localization, portability and
+@command{gawk}'s internationalization features were purposely chosen to
+have as little impact as possible on the portability of @command{awk}
+programs that use them to other versions of @command{awk}.
+Consider this program:
+
+@example
+BEGIN @{
+    TEXTDOMAIN = "guide"
+    if (Test_Guide)   # set with -v
+        bindtextdomain("/test/guide/messages")
+    print _"don't panic!"
+@}
+@end example
+
+@noindent
+As written, it won't work on other versions of @command{awk}.
+However, it is actually almost portable, requiring very little
+change:
+
+@itemize @bullet
+@cindex @code{TEXTDOMAIN} variable, portability and
+@item
+Assignments to @code{TEXTDOMAIN} won't have any effect,
+since @code{TEXTDOMAIN} is not special in other @command{awk} implementations.
+
+@item
+Non-GNU versions of @command{awk} treat marked strings
+as the concatenation of a variable named @code{_} with the string
+following it.@footnote{This is good fodder for an ``Obfuscated
+@command{awk}'' contest.} Typically, the variable @code{_} has
+the null string (@code{""}) as its value, leaving the original string constant as
+the result.
+
+@item
+By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}
+and @code{bindtextdomain()}, the @command{awk} program can be made to run, but
+all the messages are output in the original language.
+For example:
+
+@cindex @code{bindtextdomain()} function (@command{gawk}), portability and
+@cindex @code{dcgettext()} function (@command{gawk}), portability and
+@cindex @code{dcngettext()} function (@command{gawk}), portability and
+@example
+@c file eg/lib/libintl.awk
+function bindtextdomain(dir, domain)
+@{
+    return dir
+@}
+
+function dcgettext(string, domain, category)
+@{
+    return string
+@}
+
+function dcngettext(string1, string2, number, domain, category)
+@{
+    return (number == 1 ? string1 : string2)
+@}
+@c endfile
+@end example
+
+@item
+The use of positional specifications in @code{printf} or
+@code{sprintf()} is @emph{not} portable.
+To support @code{gettext()} at the C level, many systems' C versions of
+@code{sprintf()} do support positional specifiers. But it works only if
+enough arguments are supplied in the function call. Many versions of
+@command{awk} pass @code{printf} formats and arguments unchanged to the
+underlying C library version of @code{sprintf()}, but only one format and
+argument at a time. What happens if a positional specification is
+used is anybody's guess.
+However, since the positional specifications are primarily for use in
+@emph{translated} format strings, and since non-GNU @command{awk}s never
+retrieve the translated string, this should not be a problem in practice.
+@end itemize
+@c ENDOFRANGE inap
+
+@node I18N Example
+@section A Simple Internationalization Example
+
+Now let's look at a step-by-step example of how to internationalize and
+localize a simple @command{awk} program, using @file{guide.awk} as our
+original source:
+
+@example
+@c file eg/prog/guide.awk
+BEGIN @{
+    TEXTDOMAIN = "guide"
+    bindtextdomain(".")  # for testing
+    print _"Don't Panic"
+    print _"The Answer Is", 42
+    print "Pardon me, Zaphod who?"
+@}
+@c endfile
+@end example
+
+@noindent
+Run @samp{gawk --gen-pot} to create the @file{.pot} file:
+
+@example
+$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
+@end example
+
+@noindent
+This produces:
+
+@example
+@c file eg/data/guide.po
+#: guide.awk:4
+msgid "Don't Panic"
+msgstr ""
+
+#: guide.awk:5
+msgid "The Answer Is"
+msgstr ""
+
+@c endfile
+@end example
+
+This original portable object template file is saved and reused for each language
+into which the application is translated. The @code{msgid}
+is the original string and the @code{msgstr} is the translation.
+
+@quotation NOTE
+Strings not marked with a leading underscore do not
+appear in the @file{guide.pot} file.
+@end quotation
+
+Next, the messages must be translated.
+Here is a translation to a hypothetical dialect of English,
+called ``Mellow'':@footnote{Perhaps it would be better if it were
+called ``Hippy.'' Ah, well.}
+
+@example
+@group
+$ @kbd{cp guide.pot guide-mellow.po}
+@var{Add translations to} guide-mellow.po @dots{}
+@end group
+@end example
+
+@noindent
+Following are the translations:
+
+@example
+@c file eg/data/guide-mellow.po
+#: guide.awk:4
+msgid "Don't Panic"
+msgstr "Hey man, relax!"
+
+#: guide.awk:5
+msgid "The Answer Is"
+msgstr "Like, the scoop is"
+
+@c endfile
+@end example
+
+@cindex Linux
+@cindex GNU/Linux
+The next step is to make the directory to hold the binary message object
+file and then to create the @file{guide.mo} file.
+The directory layout shown here is standard for GNU @code{gettext} on
+GNU/Linux systems. Other versions of @code{gettext} may use a different
+layout:
+
+@example
+$ @kbd{mkdir en_US en_US/LC_MESSAGES}
+@end example
+
+@cindex @code{.po} files, converting to @code{.mo}
+@cindex files, @code{.po}, converting to @code{.mo}
+@cindex @code{.mo} files, converting from @code{.po}
+@cindex files, @code{.mo}, converting from @code{.po}
+@cindex portable object files, converting to message object files
+@cindex files, portable object, converting to message object files
+@cindex message object files, converting from portable object files
+@cindex files, message object, converting from portable object files
+@cindex @command{msgfmt} utility
+The @command{msgfmt} utility does the conversion from human-readable
+@file{.po} file to machine-readable @file{.mo} file.
+By default, @command{msgfmt} creates a file named @file{messages}.
+This file must be renamed and placed in the proper directory so that
+@command{gawk} can find it:
+
+@example
+$ @kbd{msgfmt guide-mellow.po}
+$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo}
+@end example
+
+Finally, we run the program to test it:
+
+@example
+$ @kbd{gawk -f guide.awk}
+@print{} Hey man, relax!
+@print{} Like, the scoop is 42
+@print{} Pardon me, Zaphod who?
+@end example
+
+If the three replacement functions for @code{dcgettext()}, @code{dcngettext()}
+and @code{bindtextdomain()}
+(@pxref{I18N Portability})
+are in a file named @file{libintl.awk},
+then we can run @file{guide.awk} unchanged as follows:
+
+@example
+$ @kbd{gawk --posix -f guide.awk -f libintl.awk}
+@print{} Don't Panic
+@print{} The Answer Is 42
+@print{} Pardon me, Zaphod who?
+@end example
+
+@node Gawk I18N
+@section @command{gawk} Can Speak Your Language
+
+@command{gawk} itself has been internationalized
+using the GNU @code{gettext} package.
+(GNU @code{gettext} is described in
+complete detail in
+@ifinfo
+@inforef{Top, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.)
+@end ifinfo
+@ifnotinfo
+@cite{GNU gettext tools}.)
+@end ifnotinfo
+As of this writing, the latest version of GNU @code{gettext} is
+@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}.
+
+If a translation of @command{gawk}'s messages exists,
+then @command{gawk} produces usage messages, warnings,
+and fatal errors in the local language.
+@c ENDOFRANGE inloc
+
+@node Advanced Features
+@chapter Advanced Features of @command{gawk}
+@cindex advanced features, network connections, See Also networks, connections
+@c STARTOFRANGE gawadv
+@cindex @command{gawk}, features, advanced
+@c STARTOFRANGE advgaw
+@cindex advanced features, @command{gawk}
+@ignore
+Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com>
+
+ Found in Steve English's "signature" line:
+
+"Write documentation as if whoever reads it is a violent psychopath
+who knows where you live."
+@end ignore
+@quotation
+@i{Write documentation as if whoever reads it is
+a violent psychopath who knows where you live.}@*
+Steve English, as quoted by Peter Langston
+@end quotation
+
+This @value{CHAPTER} discusses advanced features in @command{gawk}.
+It's a bit of a ``grab bag'' of items that are otherwise unrelated
+to each other.
+First, a command-line option allows @command{gawk} to recognize
+nondecimal numbers in input data, not just in @command{awk}
+programs.
+Then, @command{gawk}'s special features for sorting arrays are presented.
+Next, two-way I/O, discussed briefly in earlier parts of this
+@value{DOCUMENT}, is described in full detail, along with the basics
+of TCP/IP networking. Finally, @command{gawk}
+can @dfn{profile} an @command{awk} program, making it possible to tune
+it for performance.
+
+@ref{Dynamic Extensions},
+discusses the ability to dynamically add new built-in functions to
+@command{gawk}. As this feature is still immature and likely to change,
+its description is relegated to an appendix.
+
+@menu
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array traversal and
+ sorting arrays.
+* Two-way I/O:: Two-way communications with another process.
+* TCP/IP Networking:: Using @command{gawk} for network programming.
+* Profiling:: Profiling your @command{awk} programs.
+@end menu
+
+@node Nondecimal Data
+@section Allowing Nondecimal Input Data
+@cindex @code{--non-decimal-data} option
+@cindex advanced features, @command{gawk}, nondecimal input data
+@cindex input, data@comma{} nondecimal
+@cindex constants, nondecimal
+
+If you run @command{gawk} with the @option{--non-decimal-data} option,
+you can have nondecimal constants in your input data:
+
+@c line break here for small book format
+@example
+$ @kbd{echo 0123 123 0x123 |}
+> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",}
+> @kbd{$1, $2, $3 @}'}
+@print{} 83, 123, 291
+@end example
+
+For this feature to work, write your program so that
+@command{gawk} treats your data as numeric:
+
+@example
+$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
+@print{} 0123 123 0x123
+@end example
+
+@noindent
+The @code{print} statement treats its expressions as strings.
+Although the fields can act as numbers when necessary,
+they are still strings, so @code{print} does not try to treat them
+numerically. You may need to add zero to a field to force it to
+be treated as a number. For example:
+
+@example
+$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
+> @kbd{@{ print $1, $2, $3}
+> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
+@print{} 0123 123 0x123
+@print{} 83 123 291
+@end example
+
+Because it is common to have decimal data with leading zeros, and because
+using this facility could lead to surprising results, the default is to leave it
+disabled. If you want it, you must explicitly request it.
+
+@cindex programming conventions, @code{--non-decimal-data} option
+@cindex @code{--non-decimal-data} option, @code{strtonum()} function and
+@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
+@quotation CAUTION
+@emph{Use of this option is not recommended.}
+It can break old programs very badly.
+Instead, use the @code{strtonum()} function to convert your data
+(@pxref{Nondecimal-numbers}).
+This makes your programs easier to write and easier to read, and
+leads to less surprising results.
+@end quotation
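+
+As a brief sketch of that recommendation, the same input line used
+earlier can be converted without any special option by calling
+@code{strtonum()} on each field:
+
+@example
+$ @kbd{echo 0123 123 0x123 |}
+> @kbd{gawk '@{ print strtonum($1), strtonum($2), strtonum($3) @}'}
+@print{} 83 123 291
+@end example
+
+@noindent
+Only the values passed through @code{strtonum()} are interpreted as
+octal or hexadecimal; the rest of the input is handled as usual.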
+
+@node Array Sorting
+@section Controlling Array Traversal and Array Sorting
+
+@command{gawk} lets you control the order in which a @samp{for (i in array)}
+loop traverses an array.
+
+In addition, two built-in functions, @code{asort()} and @code{asorti()},
+let you sort arrays based on the array values and indices, respectively.
+These two functions also provide control over the sorting criteria used
+to order the elements during sorting.
+
+@menu
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}.
+@end menu
+
+@node Controlling Array Traversal
+@subsection Controlling Array Traversal
+
+By default, the order in which a @samp{for (i in array)} loop
+scans an array is not defined; it is generally based upon
+the internal implementation of arrays inside @command{awk}.
+
+Often, though, it is desirable to be able to loop over the elements
+in a particular order that you, the programmer, choose. @command{gawk}
+lets you do this.
+
+@ref{Controlling Scanning}, describes how you can assign special,
+pre-defined values to @code{PROCINFO["sorted_in"]} in order to
+control the order in which @command{gawk} will traverse an array
+during a @code{for} loop.
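+
+For instance, here is a small sketch that uses one of those predefined
+values, @code{"@@ind_str_asc"}, which orders the indices as strings in
+ascending order. The array name and its contents are made up for
+illustration:
+
+@example
+BEGIN @{
+    fruit["pear"] = 3
+    fruit["apple"] = 1
+    fruit["fig"] = 2
+    # visit the indices in ascending string order
+    PROCINFO["sorted_in"] = "@@ind_str_asc"
+    for (i in fruit)
+        print i, fruit[i]
+@}
+@end example
+
+@noindent
+The loop should visit @code{apple}, @code{fig}, and @code{pear}, in
+that order.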
+
+In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name.
+This lets you traverse an array based on any custom criterion.
+The array elements are ordered according to the return value of this
+function. The comparison function should be defined with at least
+four arguments:
+
+@example
+function comp_func(i1, v1, i2, v2)
+@{
+    @var{compare elements 1 and 2 in some fashion}
+    @var{return < 0; 0; or > 0}
+@}
+@end example
+
+Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
+are the corresponding values of the two elements being compared.
+Either @var{v1} or @var{v2}, or both, can be arrays if the array being
+traversed contains subarrays as values.
+(@xref{Arrays of Arrays}, for more information about subarrays.)
+The three possible return values are interpreted as follows:
+
+@table @code
+@item comp_func(i1, v1, i2, v2) < 0
+Index @var{i1} comes before index @var{i2} during loop traversal.
+
+@item comp_func(i1, v1, i2, v2) == 0
+Indices @var{i1} and @var{i2}
+come together, but their order relative to each other is undefined.
+
+@item comp_func(i1, v1, i2, v2) > 0
+Index @var{i1} comes after index @var{i2} during loop traversal.
+@end table
+
+Our first comparison function can be used to scan an array in
+numerical order of the indices:
+
+@example
+function cmp_num_idx(i1, v1, i2, v2)
+@{
+    # numerical index comparison, ascending order
+    return (i1 - i2)
+@}
+@end example
+
+Our second function traverses an array based on the string order of
+the element values rather than by indices:
+
+@example
+function cmp_str_val(i1, v1, i2, v2)
+@{
+    # string value comparison, ascending order
+    v1 = v1 ""
+    v2 = v2 ""
+    if (v1 < v2)
+        return -1
+    return (v1 != v2)
+@}
+@end example
+
+The third
+comparison function makes all numbers, and numeric strings without
+any leading or trailing spaces, come out first during loop traversal:
+
+@example
+function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
+@{
+    # numbers before string value comparison, ascending order
+    n1 = v1 + 0
+    n2 = v2 + 0
+    if (n1 == v1)
+        return (n2 == v2) ? (n1 - n2) : -1
+    else if (n2 == v2)
+        return 1
+    return (v1 < v2) ? -1 : (v1 != v2)
+@}
+@end example
+
+Here is a main program to demonstrate how @command{gawk}
+behaves using each of the previous functions:
+
+@example
+BEGIN @{
+    data["one"] = 10
+    data["two"] = 20
+    data[10] = "one"
+    data[100] = 100
+    data[20] = "two"
+
+    f[1] = "cmp_num_idx"
+    f[2] = "cmp_str_val"
+    f[3] = "cmp_num_str_val"
+    for (i = 1; i <= 3; i++) @{
+        printf("Sort function: %s\n", f[i])
+        PROCINFO["sorted_in"] = f[i]
+        for (j in data)
+            printf("\tdata[%s] = %s\n", j, data[j])
+        print ""
+    @}
+@}
+@end example
+
+Here are the results when the program is run:
+@page
+
+@example
+$ @kbd{gawk -f compdemo.awk}
+@print{} Sort function: cmp_num_idx      @ii{Sort by numeric index}
+@print{}     data[two] = 20
+@print{}     data[one] = 10              @ii{Both strings are numerically zero}
+@print{}     data[10] = one
+@print{}     data[20] = two
+@print{}     data[100] = 100
+@print{}
+@print{} Sort function: cmp_str_val      @ii{Sort by element values as strings}
+@print{}     data[one] = 10
+@print{}     data[100] = 100             @ii{String 100 is less than string 20}
+@print{}     data[two] = 20
+@print{}     data[10] = one
+@print{}     data[20] = two
+@print{}
+@print{} Sort function: cmp_num_str_val  @ii{Sort all numeric values before all strings}
+@print{}     data[one] = 10
+@print{}     data[two] = 20
+@print{}     data[100] = 100
+@print{}     data[10] = one
+@print{}     data[20] = two
+@end example
+
+Consider sorting the entries of a GNU/Linux system password file
+according to login name. The following program sorts records
+by a specific field position and can be used for this purpose:
+
+@example
+# sort.awk --- simple program to sort by field position
+# field position is specified by the global variable POS
+
+function cmp_field(i1, v1, i2, v2)
+@{
+    # comparison by value, as string, and ascending order
+    return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
+@}
+
+@{
+    for (i = 1; i <= NF; i++)
+        a[NR][i] = $i
+@}
+
+END @{
+    PROCINFO["sorted_in"] = "cmp_field"
+    if (POS < 1 || POS > NF)
+        POS = 1
+    for (i in a) @{
+        for (j = 1; j <= NF; j++)
+            printf("%s%c", a[i][j], j < NF ? ":" : "")
+        print ""
+    @}
+@}
+@}
+@end example
+
+The first field in each entry of the password file is the user's login name,
+and the fields are separated by colons.
+Each record defines a subarray,
+with each field as an element in the subarray.
+Running the program produces the
+following output:
+
+@example
+$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd}
+@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin
+@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin
+@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin
+@dots{}
+@end example
+
+The comparison function should normally return the same value when given a
+specific pair of array elements as its arguments. If inconsistent
+results are returned, then the order is undefined. This behavior can be
+exploited to introduce random order into otherwise seemingly
+ordered data:
+
+@example
+function cmp_randomize(i1, v1, i2, v2)
+@{
+    # random order
+    return (2 - 4 * rand())
+@}
+@end example
+
+As mentioned above, the order of the indices is arbitrary if two
+elements compare equal. This is usually not a problem, but letting
+the tied elements come out in arbitrary order can be an issue, especially
+when comparing item values. The partial ordering of the equal elements
+may change during the next loop traversal, if other elements are added to or
+removed from the array. One way to resolve ties when comparing elements
+with otherwise equal values is to include the indices in the comparison
+rules. Note that doing this may make the loop traversal less efficient,
+so consider it only if necessary. The following comparison functions
+force a deterministic order, and are based on the fact that the
+indices of two elements are never equal:
+
+@example
+function cmp_numeric(i1, v1, i2, v2)
+@{
+    # numerical value (and index) comparison, descending order
+    return (v1 != v2) ? (v2 - v1) : (i2 - i1)
+@}
+
+function cmp_string(i1, v1, i2, v2)
+@{
+    # string value (and index) comparison, descending order
+    v1 = v1 i1
+    v2 = v2 i2
+    return (v1 > v2) ? -1 : (v1 != v2)
+@}
+@end example
+
+@c Avoid using the term ``stable'' when describing the unpredictable behavior
+@c if two items compare equal. Usually, the goal of a "stable algorithm"
+@c is to maintain the original order of the items, which is a meaningless
+@c concept for a list constructed from a hash.
+
+A custom comparison function can often simplify ordered loop
+traversal, and the sky is really the limit when it comes to
+designing such a function.
+
+When string comparisons are made during a sort, either for element
+values where one or both aren't numbers, or for element indices
+handled as strings, the value of @code{IGNORECASE}
+(@pxref{Built-in Variables}) controls whether
+the comparisons treat corresponding uppercase and lowercase letters as
+equivalent or distinct.
+
+Another point to keep in mind is that in the case of subarrays
+the element values can themselves be arrays; a production comparison
+function should use the @code{isarray()} function
+(@pxref{Type Functions}),
+to check for this, and choose a defined sorting order for subarrays.
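+
+One possible arrangement, shown here only as a sketch, places every
+subarray after every scalar value and orders subarrays among
+themselves by index. The function name is hypothetical, and the
+policy is just one choice among many:
+
+@example
+function cmp_scalars_first(i1, v1, i2, v2)
+@{
+    # two subarrays: fall back to comparing their indices
+    if (isarray(v1) && isarray(v2))
+        return (i1 < i2) ? -1 : (i1 != i2)
+    # exactly one subarray: it sorts after the scalar
+    if (isarray(v1))
+        return 1
+    if (isarray(v2))
+        return -1
+    # two scalars: ordinary comparison, ascending order
+    return (v1 < v2) ? -1 : (v1 != v2)
+@}
+@end example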
+
+All sorting based on @code{PROCINFO["sorted_in"]}
+is disabled in POSIX mode,
+since the @code{PROCINFO} array is not special in that case.
+
+As a side note, sorting the array indices before traversing
+the array has been reported to add 15% to 20% overhead to the
+execution time of @command{awk} programs. For this reason,
+sorted array traversal is not the default.
+
+@c The @command{gawk}
+@c maintainers believe that only the people who wish to use a
+@c feature should have to pay for it.
+
+@node Array Sorting Functions
+@subsection Sorting Array Values and Indices with @command{gawk}
+
+@cindex arrays, sorting
+@cindex @code{asort()} function (@command{gawk})
+@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting
+@cindex sort function, arrays, sorting
+In most @command{awk} implementations, sorting an array requires
+writing a @code{sort()} function.
+While this can be educational for exploring different sorting algorithms,
+usually that's not the point of the program.
+@command{gawk} provides the built-in @code{asort()}
+and @code{asorti()} functions
+(@pxref{String Functions})
+for sorting arrays. For example:
+
+@example
+@var{populate the array} data
+n = asort(data)
+for (i = 1; i <= n; i++)
+    @var{do something with} data[i]
+@end example
+
+After the call to @code{asort()}, the array @code{data} is indexed from 1
+to some number @var{n}, the total number of elements in @code{data}.
+(This count is @code{asort()}'s return value.)
+@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
+The comparison is based on the type of the elements
+(@pxref{Typing and Comparison}).
+All numeric values come before all string values,
+which in turn come before all subarrays.
+
+@cindex side effects, @code{asort()} function
+An important side effect of calling @code{asort()} is that
+@emph{the array's original indices are irrevocably lost}.
+As this isn't always desirable, @code{asort()} accepts a
+second argument:
+
+@example
+@var{populate the array} source
+n = asort(source, dest)
+for (i = 1; i <= n; i++)
+    @var{do something with} dest[i]
+@end example
+
+In this case, @command{gawk} copies the @code{source} array into the
+@code{dest} array and then sorts @code{dest}, destroying its indices.
+However, the @code{source} array is not affected.
+
+@code{asort()} accepts a third string argument to control comparison of
+array elements. As with @code{PROCINFO["sorted_in"]}, this argument
+may be one of the predefined names that @command{gawk} provides
+(@pxref{Controlling Scanning}), or the name of a user-defined function
+(@pxref{Controlling Array Traversal}).
+
+@quotation NOTE
+In all cases, the sorted element values consist of the original
+array's element values. The ability to control comparison merely
+affects the way in which they are sorted.
+@end quotation
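+
+As a small sketch, the following sorts a copy of an array in
+descending numeric order using the predefined name
+@code{"@@val_num_desc"}. The array names and data are invented for
+the example:
+
+@example
+BEGIN @{
+    data["a"] = 10
+    data["b"] = 2
+    data["c"] = 33
+    # data is left intact; sorted receives the ordered values
+    n = asort(data, sorted, "@@val_num_desc")
+    for (i = 1; i <= n; i++)
+        print sorted[i]
+@}
+@end example
+
+@noindent
+The values should print as 33, 10, and 2, while @code{data} keeps its
+original indices.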
+
+Often, what's needed is to sort on the values of the @emph{indices}
+instead of the values of the elements.
+To do that, use the
+@code{asorti()} function. The interface is identical to that of
+@code{asort()}, except that the index values are used for sorting, and
+become the values of the result array:
+
+@example
+@{ source[$0] = some_func($0) @}
+
+END @{
+    n = asorti(source, dest)
+    for (i = 1; i <= n; i++) @{
+        @ii{Work with sorted indices directly:}
+        @var{do something with} dest[i]
+        @dots{}
+        @ii{Access original array via sorted indices:}
+        @var{do something with} source[dest[i]]
+    @}
+@}
+@end example
+
+Similar to @code{asort()},
+in all cases, the sorted element values consist of the original
+array's indices. The ability to control comparison merely
+affects the way in which they are sorted.
+
+Sorting the array by replacing the indices provides maximal flexibility.
+To traverse the elements in decreasing order, use a loop that goes from
+@var{n} down to 1, either over the elements or over the indices.@footnote{You
+may also use one of the predefined sorting names that sorts in
+decreasing order.}
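+
+A short sketch of such a downward loop over the sorted indices,
+reusing the @code{source} and @code{dest} arrays from the previous
+example:
+
+@example
+n = asorti(source, dest)
+# walk the sorted indices from last to first
+for (i = n; i >= 1; i--)
+    print dest[i], source[dest[i]]
+@end example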
+
+@cindex reference counting, sorting arrays
+Copying array indices and elements isn't expensive in terms of memory.
+Internally, @command{gawk} maintains @dfn{reference counts} to data.
+For example, when @code{asort()} copies the first array to the second one,
+there is only one copy of the original array elements' data, even though
+both arrays use the values.
+
+@c Document It And Call It A Feature. Sigh.
+@cindex @command{gawk}, @code{IGNORECASE} variable in
+@cindex @code{IGNORECASE} variable
+@cindex arrays, sorting, @code{IGNORECASE} variable and
+@cindex @code{IGNORECASE} variable, array sorting and
+Because @code{IGNORECASE} affects string comparisons, the value
+of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
+Note also that the locale's sorting order does @emph{not}
+come into play; comparisons are based on character values only.@footnote{This
+is true because locale-based comparison occurs only when in POSIX
+compatibility mode, and since @code{asort()} and @code{asorti()} are
+@command{gawk} extensions, they are not available in that case.}
+Caveat Emptor.
+
+@node Two-way I/O
+@section Two-Way Communications with Another Process
+@cindex Brennan, Michael
+@cindex programmers, attractiveness of
+@smallexample
+@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
+From: brennan@@whidbey.com (Mike Brennan)
+Newsgroups: comp.lang.awk
+Subject: Re: Learn the SECRET to Attract Women Easily
+Date: 4 Aug 1997 17:34:46 GMT
+@c Organization: WhidbeyNet
+@c Lines: 12
+Message-ID: <5s53rm$eca@@news.whidbey.com>
+@c References: <5s20dn$2e1@chronicle.concentric.net>
+@c Reply-To: brennan@whidbey.com
+@c NNTP-Posting-Host: asn202.whidbey.com
+@c X-Newsreader: slrn (0.9.4.1 UNIX)
+@c Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
+
+On 3 Aug 1997 13:17:43 GMT, Want More Dates???
+<tracy78@@kilgrona.com> wrote:
+>Learn the SECRET to Attract Women Easily
+>
+>The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
+
+The scent of awk programmers is a lot more attractive to women than
+the scent of perl programmers.
+--
+Mike Brennan
+@c brennan@@whidbey.com
+@end smallexample
+
+@cindex advanced features, @command{gawk}, processes@comma{} communicating with
+@cindex processes, two-way communications with
+It is often useful to be able to
+send data to a separate program for
+processing and then read the result. This can always be
+done with temporary files:
+
+@example
+# Write the data for processing
+tempfile = ("mydata." PROCINFO["pid"])
+while (@var{not done with data})
+    print @var{data} | ("subprogram > " tempfile)
+close("subprogram > " tempfile)
+
+# Read the results, remove tempfile when done
+while ((getline newdata < tempfile) > 0)
+    @var{process} newdata @var{appropriately}
+close(tempfile)
+system("rm " tempfile)
+@end example
+
+@noindent
+This works, but not elegantly. Among other things, it requires that
+the program be run in a directory that cannot be shared among users;
+for example, @file{/tmp} will not do, as another user might happen
+to be using a temporary file with the same name.
+
+@cindex coprocesses
+@cindex input/output, two-way
+@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
+@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
+@cindex @command{csh} utility, @code{|&} operator, comparison with
+However, with @command{gawk}, it is possible to
+open a @emph{two-way} pipe to another process. The second process is
+termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}.
+The two-way connection is created using the @samp{|&} operator
+(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
+different from the same operator in the C shell.}
+
+@example
+do @{
+    print @var{data} |& "subprogram"
+    "subprogram" |& getline results
+@} while (@var{data left to process})
+close("subprogram")
+@end example
+
+The first time an I/O operation is executed using the @samp{|&}
+operator, @command{gawk} creates a two-way pipeline to a child process
+that runs the other program. Output created with @code{print}
+or @code{printf} is written to the program's standard input, and
+output from the program's standard output can be read by the @command{gawk}
+program using @code{getline}.
+As is the case with processes started by @samp{|}, the subprogram
+can be any program, or pipeline of programs, that can be started by
+the shell.
+
+There are some cautionary items to be aware of:
+
+@itemize @bullet
+@item
+As the code inside @command{gawk} currently stands, the coprocess's
+standard error goes to the same place that the parent @command{gawk}'s
+standard error goes. It is not possible to read the child's
+standard error separately.
+
+@cindex deadlocks
+@cindex buffering, input/output
+@cindex @code{getline} command, deadlock and
+@item
+I/O buffering may be a problem. @command{gawk} automatically
+flushes all output down the pipe to the coprocess.
+However, if the coprocess does not flush its output,
+@command{gawk} may hang when doing a @code{getline} in order to read
+the coprocess's results. This could lead to a situation
+known as @dfn{deadlock}, where each process is waiting for the
+other one to do something.
+@end itemize
+
+@cindex @code{close()} function, two-way pipes and
+It is possible to close just one end of the two-way pipe to
+a coprocess, by supplying a second argument to the @code{close()}
+function of either @code{"to"} or @code{"from"}
+(@pxref{Close Files And Pipes}).
+These strings tell @command{gawk} to close the end of the pipe
+that sends data to the coprocess or the end that reads from it,
+respectively.
+
+@cindex @command{sort} utility, coprocesses and
+This is particularly necessary in order to use
+the system @command{sort} utility as part of a coprocess;
+@command{sort} must read @emph{all} of its input
+data before it can produce any output.
+The @command{sort} program does not receive an end-of-file indication
+until @command{gawk} closes the write end of the pipe.
+
+When you have finished writing data to the @command{sort}
+utility, you can close the @code{"to"} end of the pipe, and
+then start reading sorted data via @code{getline}.
+For example:
+
+@example
+BEGIN @{
+ command = "LC_ALL=C sort"
+ n = split("abcdefghijklmnopqrstuvwxyz", a, "")
+
+ for (i = n; i > 0; i--)
+ print a[i] |& command
+ close(command, "to")
+
+ while ((command |& getline line) > 0)
+ print "got", line
+ close(command)
+@}
+@end example
+
+This program writes the letters of the alphabet in reverse order, one
+per line, down the two-way pipe to @command{sort}. It then closes the
+write end of the pipe, so that @command{sort} receives an end-of-file
+indication. This causes @command{sort} to sort the data and write the
+sorted data back to the @command{gawk} program. Once all of the data
+has been read, @command{gawk} terminates the coprocess and exits.
+
+As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
+command ensures traditional Unix (ASCII) sorting from @command{sort}.
+
+@cindex @command{gawk}, @code{PROCINFO} array in
+@cindex @code{PROCINFO} array
+You may also use pseudo-ttys (ptys) for
+two-way communication instead of pipes, if your system supports them.
+This is done on a per-command basis, by setting a special element
+in the @code{PROCINFO} array
+(@pxref{Auto-set}),
+like so:
+
+@example
+command = "sort -nr" # command, save in convenience variable
+PROCINFO[command, "pty"] = 1 # update PROCINFO
+print @dots{} |& command # start two-way pipe
+@dots{}
+@end example
+
+@noindent
+Using ptys avoids the buffer deadlock issues described earlier, at some
+loss in performance. If your system does not have ptys, or if all the
+system's ptys are in use, @command{gawk} automatically falls back to
+using regular pipes.
+
+@node TCP/IP Networking
+@section Using @command{gawk} for Network Programming
+@cindex advanced features, @command{gawk}, network programming
+@cindex networks, programming
+@c STARTOFRANGE tcpip
+@cindex TCP/IP
+@cindex @code{/inet/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet/@dots{}} (@command{gawk})
+@cindex @code{/inet4/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet4/@dots{}} (@command{gawk})
+@cindex @code{/inet6/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet6/@dots{}} (@command{gawk})
+@cindex @code{EMISTERED}
+@quotation
+@code{EMISTERED}:@*
+@ @ @ @ @i{A host is a host from coast to coast,@*
+@ @ @ @ and no-one can talk to host that's close,@*
+@ @ @ @ unless the host that isn't close@*
+@ @ @ @ is busy hung or dead.}
+@end quotation
+
+In addition to being able to open a two-way pipeline to a coprocess
+on the same system
+(@pxref{Two-way I/O}),
+it is possible to make a two-way connection to
+another process on another system across an IP network connection.
+
+You can think of this as just a @emph{very long} two-way pipeline to
+a coprocess.
+The way @command{gawk} decides that you want to use TCP/IP networking is
+by recognizing special @value{FN}s that begin with one of @samp{/inet/},
+@samp{/inet4/}, or @samp{/inet6/}.
+
+The full syntax of the special @value{FN} is
+@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
+The components are:
+
+@table @var
+@item net-type
+Specifies the kind of Internet connection to make.
+Use @samp{/inet4/} to force IPv4, and
+@samp{/inet6/} to force IPv6.
+Plain @samp{/inet/} (which used to be the only option) uses
+the system default, most likely IPv4.
+
+@item protocol
+The protocol to use over IP. This must be either @samp{tcp} or
+@samp{udp}, for a TCP or UDP IP connection,
+respectively. The use of TCP is recommended for most applications.
+
+@item local-port
+@cindex @code{getaddrinfo()} function (C library)
+The local TCP or UDP port number to use. Use a port number of @samp{0}
+when you want the system to pick a port. This is what you should do
+when writing a TCP or UDP client.
+You may also use a well-known service name, such as @samp{smtp}
+or @samp{http}, in which case @command{gawk} attempts to determine
+the predefined port number using the C @code{getaddrinfo()} function.
+
+@item remote-host
+The IP address or fully-qualified domain name of the Internet
+host to which you want to connect.
+
+@item remote-port
+The TCP or UDP port number to use on the given @var{remote-host}.
+Again, use @samp{0} if you don't care, or else a well-known
+service name.
+@end table
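+
+To make the layout concrete, here are a few special @value{FN}s built from
+the components just described. This is only a sketch: the host name
+@samp{www.example.com} is a placeholder chosen for illustration, and the
+service names are resolved as described under @var{local-port}:
+
+@example
+"/inet/tcp/0/www.example.com/http"    # TCP client, system-chosen local port
+"/inet4/udp/0/localhost/daytime"      # UDP client, forcing IPv4
+"/inet6/tcp/0/localhost/daytime"      # TCP client, forcing IPv6
+@end example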
+
+@cindex @command{gawk}, @code{ERRNO} variable in
+@cindex @code{ERRNO} variable
+@quotation NOTE
+Failure in opening a two-way socket will result in a non-fatal error
+being returned to the calling code. The value of @code{ERRNO} indicates
+the error (@pxref{Auto-set}).
+@end quotation
+
+Consider the following very simple example:
+
+@example
+BEGIN @{
+ Service = "/inet/tcp/0/localhost/daytime"
+ Service |& getline
+ print $0
+ close(Service)
+@}
+@end example
+
+This program reads the current date and time from the local system's
+TCP @samp{daytime} server.
+It then prints the results and closes the connection.
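+
+A slightly more defensive version of the same program, given here as a
+sketch, checks the value returned by @code{getline} and reports
+@code{ERRNO} if no data could be read (for instance, because no
+@samp{daytime} server is running on the local system). The variable name
+@code{line} and the wording of the message are illustrative only:
+
+@example
+BEGIN @{
+    Service = "/inet/tcp/0/localhost/daytime"
+    if ((Service |& getline line) > 0)
+        print line                      # got a line from the server
+    else                                # failure or end of file
+        print "cannot read from " Service ": " ERRNO > "/dev/stderr"
+    close(Service)
+@}
+@end example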
+
+Because this topic is extensive, the use of @command{gawk} for
+TCP/IP programming is documented separately.
+@ifinfo
+See
+@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}},
+@end ifinfo
+@ifnotinfo
+See @cite{TCP/IP Internetworking with @command{gawk}},
+which comes as part of the @command{gawk} distribution,
+@end ifnotinfo
+for a much more complete introduction and discussion, as well as
+extensive examples.
+
+@c ENDOFRANGE tcpip
+
+@node Profiling
+@section Profiling Your @command{awk} Programs
+@c STARTOFRANGE awkp
+@cindex @command{awk} programs, profiling
+@c STARTOFRANGE proawk
+@cindex profiling @command{awk} programs
+@cindex profiling @command{gawk}
+@cindex @code{awkprof.out} file
+@cindex files, @code{awkprof.out}
+
+You may produce execution traces of your @command{awk} programs.
+This is done by passing the option @option{--profile} to @command{gawk}.
+When @command{gawk} has finished running, it creates a profile of your program in a file
+named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
+@command{gawk} normally does.
+
+@cindex @code{--profile} option
+As shown in the following example,
+the @option{--profile} option can be used to change the name of the file
+where @command{gawk} will write the profile:
+
+@example
+gawk --profile=myprog.prof -f myprog.awk data1 data2
+@end example
+
+@noindent
+In the above example, @command{gawk} places the profile in
+@file{myprog.prof} instead of in @file{awkprof.out}.
+
+Here is a sample session showing a simple @command{awk} program, its input data, and the
+results from running @command{gawk} with the @option{--profile} option.
+First, the @command{awk} program:
+
+@example
+BEGIN @{ print "First BEGIN rule" @}
+
+END @{ print "First END rule" @}
+
+/foo/ @{
+ print "matched /foo/, gosh"
+ for (i = 1; i <= 3; i++)
+ sing()
+@}
+
+@{
+ if (/foo/)
+ print "if is true"
+ else
+ print "else is true"
+@}
+
+BEGIN @{ print "Second BEGIN rule" @}
+
+END @{ print "Second END rule" @}
+
+function sing( dummy)
+@{
+ print "I gotta be me!"
+@}
+@end example
+
+Following is the input data:
+
+@example
+foo
+bar
+baz
+foo
+junk
+@end example
+
+Here is the @file{awkprof.out} that results from running the @command{gawk}
+profiler on this program and data (this example also illustrates that @command{awk}
+programmers sometimes have to work late):
+
+@cindex @code{BEGIN} pattern
+@cindex @code{END} pattern
+@example
+ # gawk profile, created Sun Aug 13 00:00:15 2000
+
+ # BEGIN block(s)
+
+ BEGIN @{
+ 1 print "First BEGIN rule"
+ 1 print "Second BEGIN rule"
+ @}
+
+ # Rule(s)
+
+ 5 /foo/ @{ # 2
+ 2 print "matched /foo/, gosh"
+ 6 for (i = 1; i <= 3; i++) @{
+ 6 sing()
+ @}
+ @}
+
+ 5 @{
+ 5 if (/foo/) @{ # 2
+ 2 print "if is true"
+ 3 @} else @{
+ 3 print "else is true"
+ @}
+ @}
+
+ # END block(s)
+
+ END @{
+ 1 print "First END rule"
+ 1 print "Second END rule"
+ @}
+
+ # Functions, listed alphabetically
+
+ 6 function sing(dummy)
+ @{
+ 6 print "I gotta be me!"
+ @}
+@end example
+
+This example illustrates many of the basic features of profiling output.
+They are as follows:
+
+@itemize @bullet
+@item
+The program is printed in the order @code{BEGIN} rule,
+@code{BEGINFILE} rule,
+pattern/action rules,
+@code{ENDFILE} rule, @code{END} rule and functions, listed
+alphabetically.
+Multiple @code{BEGIN} and @code{END} rules are merged together,
+as are multiple @code{BEGINFILE} and @code{ENDFILE} rules.
+
+@cindex patterns, counts
+@item
+Pattern-action rules have two counts.
+The first count, to the left of the rule, shows how many times
+the rule's pattern was @emph{tested}.
+The second count, to the right of the rule's opening left brace
+in a comment,
+shows how many times the rule's action was @emph{executed}.
+The difference between the two indicates how many times the rule's
+pattern evaluated to false.
+
+@item
+Similarly,
+the count for an @code{if}-@code{else} statement shows how many times
+the condition was tested.
+To the right of the opening left brace for the @code{if}'s body
+is a count showing how many times the condition was true.
+The count for the @code{else}
+indicates how many times the test failed.
+
+@cindex loops, count for header
+@item
+The count for a loop header (such as @code{for}
+or @code{while}) shows how many times the loop test was executed.
+(Because of this, you can't just look at the count on the first
+statement in a rule to determine how many times the rule was executed.
+If the first statement is a loop, the count is misleading.)
+
+@cindex functions, user-defined, counts
+@cindex user-defined, functions, counts
+@item
+For user-defined functions, the count next to the @code{function}
+keyword indicates how many times the function was called.
+The counts next to the statements in the body show how many times
+those statements were executed.
+
+@cindex @code{@{@}} (braces)
+@cindex braces (@code{@{@}})
+@item
+The layout uses ``K&R'' style with TABs.
+Braces are used everywhere, even when
+the body of an @code{if}, @code{else}, or loop is only a single statement.
+
+@cindex @code{()} (parentheses)
+@cindex parentheses @code{()}
+@item
+Parentheses are used only where needed, as indicated by the structure
+of the program and the precedence rules.
+@c extra verbiage here satisfies the copyeditor. ugh.
+For example, @samp{(3 + 5) * 4} means add three plus five, then multiply
+the total by four. However, @samp{3 + 5 * 4} has no parentheses, and
+means @samp{3 + (5 * 4)}.
+
+@ignore
+@item
+All string concatenations are parenthesized too.
+(This could be made a bit smarter.)
+@end ignore
+
+@item
+Parentheses are used around the arguments to @code{print}
+and @code{printf} only when
+the @code{print} or @code{printf} statement is followed by a redirection.
+Similarly, if
+the target of a redirection isn't a scalar, it gets parenthesized.
+
+@item
+@command{gawk} supplies leading comments in
+front of the @code{BEGIN} and @code{END} rules,
+the pattern/action rules, and the functions.
+
+@end itemize
+
+The profiled version of your program may not look exactly like what you
+typed when you wrote it. This is because @command{gawk} creates the
+profiled version by ``pretty printing'' its internal representation of
+the program. The advantage to this is that @command{gawk} can produce
+a standard representation. The disadvantage is that all source-code
+comments are lost, as are the distinctions among multiple @code{BEGIN},
+@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as:
+
+@example
+/foo/
+@end example
+
+@noindent
+come out as:
+
+@example
+/foo/ @{
+ print $0
+@}
+@end example
+
+@noindent
+which is correct, but possibly surprising.
+
+@cindex profiling @command{awk} programs, dynamically
+@cindex @command{gawk} program, dynamic profiling
+Besides creating profiles when a program has completed,
+@command{gawk} can produce a profile while it is running.
+This is useful if your @command{awk} program goes into an
+infinite loop and you want to see what has been executed.
+To use this feature, run @command{gawk} with the @option{--profile}
+option in the background:
+
+@example
+$ @kbd{gawk --profile -f myprog &}
+[1] 13992
+@end example
+
+@cindex @command{kill} command@comma{} dynamic profiling
+@cindex @code{USR1} signal
+@cindex @code{SIGUSR1} signal
+@cindex signals, @code{USR1}/@code{SIGUSR1}
+@noindent
+The shell prints a job number and process ID number; in this case, 13992.
+Use the @command{kill} command to send the @code{USR1} signal
+to @command{gawk}:
+
+@example
+$ @kbd{kill -USR1 13992}
+@end example
+
+@noindent
+As usual, the profiled version of the program is written to
+@file{awkprof.out}, or to a different file if one was specified with
+the @option{--profile} option.
+
+Along with the regular profile, as shown earlier, the profile
+includes a trace of any active functions:
+
+@example
+# Function Call Stack:
+
+# 3. baz
+# 2. bar
+# 1. foo
+# -- main --
+@end example
+
+You may send @command{gawk} the @code{USR1} signal as many times as you like.
+Each time, the profile and function call trace are appended to the output
+profile file.
+
+@cindex @code{HUP} signal
+@cindex @code{SIGHUP} signal
+@cindex signals, @code{HUP}/@code{SIGHUP}
+If you use the @code{HUP} signal instead of the @code{USR1} signal,
+@command{gawk} produces the profile and the function call trace and then exits.
+
+@cindex @code{INT} signal (MS-Windows)
+@cindex @code{SIGINT} signal (MS-Windows)
+@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows)
+@cindex @code{QUIT} signal (MS-Windows)
+@cindex @code{SIGQUIT} signal (MS-Windows)
+@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
+When @command{gawk} runs on MS-Windows systems, it uses the
+@code{INT} and @code{QUIT} signals for producing the profile and, in
+the case of the @code{INT} signal, @command{gawk} exits. This is
+because these systems don't support the @command{kill} command, so the
+only signals you can deliver to a program are those generated by the
+keyboard. The @code{INT} signal is generated by the
+@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the
+@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key.
+
+Finally, @command{gawk} also accepts another option, @option{--pretty-print}.
+When called this way, @command{gawk} ``pretty prints'' the program into
+@file{awkprof.out}, without any execution counts.
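+
+As a sketch, and assuming that @option{--pretty-print} takes an optional
+@value{FN} argument in the same way that @option{--profile} does, the
+following command writes the pretty-printed program to @file{myprog.pp}
+(a name chosen here for illustration) instead of to @file{awkprof.out}:
+
+@example
+gawk --pretty-print=myprog.pp -f myprog.awk
+@end example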
+@c ENDOFRANGE advgaw
+@c ENDOFRANGE gawadv
+@c ENDOFRANGE awkp
+@c ENDOFRANGE proawk
+
@c The original text for this chapter was contributed by Efraim Yawitz.
@c FIXME: Add more indexing.
@@ -27444,15 +27021,4971 @@ The @command{gawk} debugger only accepts source supplied with the @option{-f} op
Look forward to a future release when these and other missing features may
be added, and of course feel free to try to add them yourself!
+@node Arbitrary Precision Arithmetic
+@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk}
+@cindex arbitrary precision
+@cindex multiple precision
+@cindex infinite precision
+@cindex floating-point numbers, arbitrary precision
+@cindex MPFR
+@cindex GMP
+
+@cindex Knuth, Donald
+@quotation
+@i{There's a credibility gap: We don't know how much of the computer's answers
+to believe. Novice computer users solve this problem by implicitly trusting
+in the computer as an infallible authority; they tend to believe that all
+digits of a printed answer are significant. Disillusioned computer users have
+just the opposite approach; they are constantly afraid that their answers
+are almost meaningless.}@*
+Donald Knuth@footnote{Donald E.@: Knuth.
+@cite{The Art of Computer Programming}. Volume 2,
+@cite{Seminumerical Algorithms}, third edition,
+1998, ISBN 0-201-89683-4, p.@: 229.}
+@end quotation
+
+This @value{CHAPTER} discusses issues that you may encounter
+when performing arithmetic. It begins by discussing some of
+the general attributes of computer arithmetic, along with how
+this can influence what you see when running @command{awk} programs.
+This discussion applies to all versions of @command{awk}.
+
+Then the @value{CHAPTER} moves on to @dfn{arbitrary precision
+arithmetic}, a feature which is specific to @command{gawk}.
+
+@menu
+* General Arithmetic:: An introduction to computer arithmetic.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Gawk and MPFR:: How @command{gawk} provides
+ arbitrary-precision arithmetic.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic
+ with @command{gawk}.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
+ @command{gawk}.
+@end menu
+
+@node General Arithmetic
+@section A General Description of Computer Arithmetic
+
+@cindex integers
+@cindex floating-point, numbers
+@cindex numbers, floating-point
+Within computers, there are two kinds of numeric values: @dfn{integers}
+and @dfn{floating-point}.
+In school, integer values were referred to as ``whole'' numbers---that is,
+numbers without any fractional part, such as 1, 42, or @minus{}17.
+The advantage to integer numbers is that they represent values exactly.
+The disadvantage is that their range is limited. On most systems,
+this range is @minus{}2,147,483,648 to 2,147,483,647.
+However, many systems now support a range from
+@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
+
+@cindex unsigned integers
+@cindex integers, unsigned
+Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}.
+Signed values may be negative or positive, with the range of values just
+described.
+Unsigned values are always positive. On most systems,
+the range is from 0 to 4,294,967,295.
+However, many systems now support a range from
+0 to 18,446,744,073,709,551,615.
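+
+These limits are simply powers of two: an @var{n}-bit signed integer ranges
+from @minus{}2^(@var{n} @minus{} 1) to 2^(@var{n} @minus{} 1) @minus{} 1,
+and an @var{n}-bit unsigned integer ranges from 0 to 2^@var{n} @minus{} 1.
+As a quick check, the following sketch computes the two 32-bit upper limits
+(@command{awk} does the arithmetic in floating point, which holds values of
+this size exactly):
+
+@example
+$ @kbd{gawk 'BEGIN @{ printf "%d %d\n", 2^31 - 1, 2^32 - 1 @}'}
+@print{} 2147483647 4294967295
+@end example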
+
+@cindex double precision floating-point
+@cindex single precision floating-point
+Floating-point numbers represent what are called ``real'' numbers; i.e.,
+those that do have a fractional part, such as 3.1415927.
+The advantage to floating-point numbers is that they
+can represent a much larger range of values.
+The disadvantage is that there are numbers that they cannot represent
+exactly.
+@command{awk} uses @dfn{double precision} floating-point numbers, which
+can hold more digits than @dfn{single precision}
+floating-point numbers.
+@c Floating-point issues are discussed more fully in
+@c @ref{Floating Point Issues}.
+
+There are several important issues to be aware of, described next.
+
+@menu
+* Floating Point Issues:: Stuff to know about floating-point numbers.
+* Integer Programming:: Effective integer programming.
+@end menu
+
+@node Floating Point Issues
+@subsection Floating-Point Number Caveats
+
+This @value{SECTION} describes some of the issues
+involved in using floating-point numbers.
+
+There is a very nice
+@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic}
+by David Goldberg,
+``What Every Computer Scientist Should Know About Floating-point Arithmetic,''
+@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48.
+This is worth reading if you are interested in the details,
+but it does require a background in computer science.
+
+@menu
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not Abstract
+ Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+@end menu
+
+@node String Conversion Precision
+@subsubsection The String Value Can Lie
+
+Internally, @command{awk} keeps both the numeric value
+(double precision floating-point) and the string value for a variable.
+Separately, @command{awk} keeps
+track of what type the variable has
+(@pxref{Typing and Comparison}),
+which plays a role in how variables are used in comparisons.
+
+It is important to note that the string value for a number may not
+reflect the full value (all the digits) that the numeric value
+actually contains.
+The following program (@file{values.awk}) illustrates this:
+
+@example
+@{
+ sum = $1 + $2
+ # see it for what it is
+ printf("sum = %.12g\n", sum)
+ # use CONVFMT
+ a = "<" sum ">"
+ print "a =", a
+ # use OFMT
+ print "sum =", sum
+@}
+@end example
+
+@noindent
+This program shows the full value of the sum of @code{$1} and @code{$2}
+using @code{printf}, and then prints the string values obtained
+from both automatic conversion (via @code{CONVFMT}) and
+from printing (via @code{OFMT}).
+
+Here is what happens when the program is run:
+
+@example
+$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk}
+@print{} sum = 4.8888888
+@print{} a = <4.88889>
+@print{} sum = 4.88889
+@end example
+
+This makes it clear that the full numeric value is different from
+what the default string representations show.
+
+@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
+at most six significant digits. For some applications, you might want to
+change it to specify more precision.
+On most modern machines, most of the time,
+17 digits is enough to capture a floating-point number's
+value exactly.@footnote{Pathological cases can require up to
+752 digits (!), but we doubt that you need to worry about this.}
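+
+As a sketch of the effect, the following variation on @file{values.awk}
+sets @code{CONVFMT} to @code{"%.12g"} (a value picked only to match the
+earlier @code{printf} format) before converting the sum to a string:
+
+@example
+$ @kbd{echo 3.654321 1.2345678 |}
+> @kbd{gawk '@{ CONVFMT = "%.12g"; sum = $1 + $2; a = "<" sum ">"; print "a =", a @}'}
+@print{} a = <4.8888888>
+@end example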
+
+@node Unexpected Results
+@subsubsection Floating Point Numbers Are Not Abstract Numbers
+
+@cindex floating-point, numbers
+Unlike numbers in the abstract sense (such as what you studied in high school
+or college arithmetic), numbers stored in computers are limited in certain ways.
+They cannot represent an infinite number of digits, nor can they always
+represent things exactly.
+In particular,
+floating-point numbers cannot
+always represent values exactly. Here is an example:
+
+@example
+$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'}
+515.79
+@print{} 0000051579
+515.80
+@print{} 0000051579
+515.81
+@print{} 0000051580
+515.82
+@print{} 0000051582
+@kbd{@value{CTL}-d}
+@end example
+
+@noindent
+This shows that some values can be represented exactly,
+whereas others are only approximated. This is not a ``bug''
+in @command{awk}, but simply an artifact of how computers
+represent numbers.
+
+@quotation NOTE
+It cannot be emphasized enough that the behavior just
+described is fundamental to modern computers. You will
+see this kind of thing happen in @emph{any} programming
+language using hardware floating-point numbers. It is @emph{not}
+a bug in @command{gawk}, nor is it something that can be ``just
+fixed.''
+@end quotation
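+
+When the intent is to round to the nearest integer rather than to truncate,
+one common approach is to let @code{printf} do the rounding with a
+@samp{%.0f} format instead of @samp{%d}. The following sketch reruns the
+previous input that way:
+
+@example
+$ @kbd{awk '@{ printf("%010.0f\n", $1 * 100) @}'}
+515.79
+@print{} 0000051579
+515.80
+@print{} 0000051580
+515.81
+@print{} 0000051581
+515.82
+@print{} 0000051582
+@kbd{@value{CTL}-d}
+@end example
+
+@noindent
+This changes only how the value is displayed; the stored value is
+still an approximation.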
+
+@cindex negative zero
+@cindex positive zero
+@cindex zero@comma{} negative vs.@: positive
+Another peculiarity of floating-point numbers on modern systems
+is that they often have more than one representation for the number zero!
+In particular, it is possible to represent ``minus zero'' as well as
+regular, or ``positive'' zero.
+
+This example shows that negative and positive zero are distinct values
+when stored internally, but that they are in fact equal to each other,
+as well as to ``regular'' zero:
+
+@example
+$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0}
+> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz}
+> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0}
+> @kbd{@}'}
+@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1
+@print{} mz == 0 -> 1, pz == 0 -> 1
+@end example
+
+It helps to keep this in mind should you process numeric data
+that contains negative zero values; the fact that the zero is negative
+is noted and can affect comparisons.
+
+@node POSIX Floating Point Problems
+@subsubsection Standards Versus Existing Practice
+
+Historically, @command{awk} has converted any non-numeric looking string
+to the numeric value zero, when required. Furthermore, the original
+definition of the language and the original POSIX standards specified that
+@command{awk} only understands decimal numbers (base 10), and not octal
+(base 8) or hexadecimal numbers (base 16).
+
+Changes in the language of the
+2001 and 2004 POSIX standards can be interpreted to imply that @command{awk}
+should support additional features. These features are:
+
+@itemize @bullet
+@item
+Interpretation of floating point data values specified in hexadecimal
+notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not}
+source code constants.)
+
+@item
+Support for the special IEEE 754 floating point values ``Not A Number''
+(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf'').
+In particular, the format for these values is as specified by the ISO 1999
+C standard, which ignores case, can allow additional machine-dependent
+characters after the @samp{nan}, and allows either @samp{inf} or @samp{infinity}.
+@end itemize
+
+The first problem is that both of these are clear changes to historical
+practice:
+
+@itemize @bullet
+@item
+The @command{gawk} maintainer feels that supporting hexadecimal floating
+point values, in particular, is ugly, and was never intended by the
+original designers to be part of the language.
+
+@item
+Allowing completely alphabetic strings to have valid numeric
+values is also a very severe departure from historical practice.
+@end itemize
+
+The second problem is that the @command{gawk} maintainer feels that this
+interpretation of the standard, which requires a certain amount of
+``language lawyering'' to arrive at in the first place, was not even
+intended by the standard developers. In other words, ``we see how you
+got where you are, but we don't think that that's where you want to be.''
+
+Recognizing the above issues, but attempting to provide compatibility
+with the earlier versions of the standard,
+the 2008 POSIX standard added explicit wording to allow, but not require,
+that @command{awk} support hexadecimal floating point values and
+special values for ``Not A Number'' and infinity.
+
+Although the @command{gawk} maintainer continues to feel that
+providing those features is inadvisable,
+nevertheless, on systems that support IEEE floating point, it seems
+reasonable to provide @emph{some} way to support NaN and Infinity values.
+The solution implemented in @command{gawk} is as follows:
+
+@itemize @bullet
+@item
+With the @option{--posix} command-line option, @command{gawk} becomes
+``hands off.'' String values are passed directly to the system library's
+@code{strtod()} function, and if it successfully returns a numeric value,
+that is what's used.@footnote{You asked for it, you got it.}
+By definition, the results are not portable across
+different systems. They are also a little surprising:
+
+@example
+$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'}
+@print{} nan
+$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
+@print{} 3735928559
+@end example
+
+@item
+Without @option{--posix}, @command{gawk} interprets the four strings
+@samp{+inf},
+@samp{-inf},
+@samp{+nan},
+and
+@samp{-nan}
+specially, producing the corresponding special numeric values.
+The leading sign acts as a signal to @command{gawk} (and the user)
+that the value is really numeric. Hexadecimal floating point is
+not supported (unless you also use @option{--non-decimal-data},
+which is @emph{not} recommended). For example:
+
+@example
+$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'}
+@print{} 0
+$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'}
+@print{} nan
+$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
+@print{} 0
+@end example
+
+@command{gawk} does ignore case in the four special values.
+Thus @samp{+nan} and @samp{+NaN} are the same.
+@end itemize
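+
+As a further sketch, the special values take part in comparisons as you
+would expect of infinities. Here the input fields are converted to numbers
+by adding zero, and each comparison yields 1 if it is true (the variable
+names @code{p} and @code{m} are arbitrary):
+
+@example
+$ @kbd{echo +inf -inf |}
+> @kbd{gawk '@{ p = ($1 + 0 > 0); m = ($2 + 0 < 0); print p, m @}'}
+@print{} 1 1
+@end example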
+
+@node Integer Programming
+@subsection Mixing Integers And Floating-point
+
+As has been mentioned already, @command{gawk} ordinarily uses hardware double
+precision with 64-bit IEEE binary floating-point representation
+for numbers on most systems. A large integer like 9,007,199,254,740,997
+has a binary representation that, although finite, is more than 53 bits long;
+it must also be rounded to 53 bits.
+The biggest integer that can be stored in a C @code{double} is usually the same
+as the largest possible value of a @code{double}. If your system @code{double}
+is an IEEE 64-bit @code{double}, this largest possible value is an integer and
+can be represented precisely. What more should one know about integers?
+
+If you want to know the largest integer such that it and
+all smaller integers can be stored in 64-bit doubles without losing precision,
+the answer is
+@iftex
+@math{2^{53}}.
+@end iftex
+@ifnottex
+2^53.
+@end ifnottex
+The next representable number is the even number
+@iftex
+@math{2^{53} + 2},
+@end iftex
+@ifnottex
+2^53 + 2,
+@end ifnottex
+meaning it is unlikely that you will be able to make
+@command{gawk} print
+@iftex
+@math{2^{53} + 1}
+@end iftex
+@ifnottex
+2^53 + 1
+@end ifnottex
+in integer format.
+The range of integers exactly representable by a 64-bit double
+is
+@iftex
+@math{[-2^{53}, 2^{53}]}.
+@end iftex
+@ifnottex
+[@minus{}2^53, 2^53].
+@end ifnottex
+If you ever see an integer outside this range in @command{gawk}
+using 64-bit doubles, you have reason to be very suspicious about
+the accuracy of the output. Here is a simple program with erroneous output:
+
+@example
+$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'}
+@print{} 9007199254740991
+@print{} 9007199254740992
+@print{} 9007199254740992
+@print{} 9007199254740994
+@end example
+
+The lesson is to not assume that any large integer printed by @command{gawk}
+represents an exact result from your computation, especially if it wraps
+around on your screen.
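+
+The same effect can be seen with a direct comparison. In the following
+sketch, which assumes 64-bit IEEE doubles, adding one to 2^53 produces a
+value that cannot be represented, so the result is rounded back to 2^53
+and the comparison is true:
+
+@example
+$ @kbd{gawk 'BEGIN @{ x = 2^53; y = x + 1; print (x == y) @}'}
+@print{} 1
+@end example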
+
+@node Floating-point Programming
+@section Understanding Floating-point Programming
+
+Numerical programming is an extensive area; if you need to develop
+sophisticated numerical algorithms then @command{gawk} may not be
+the ideal tool, and this documentation may not be sufficient.
+@c FIXME: JOHN: Do you want to cite some actual books?
+It might require digesting a book or two to really internalize how to compute
+with ideal accuracy and precision,
+and the result often depends on the particular application.
+
+@quotation NOTE
+A floating-point calculation's @dfn{accuracy} is how close it comes
+to the real value. This is as opposed to the @dfn{precision}, which
+usually refers to the number of bits used to represent the number
+(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision,
+the Wikipedia article} for more information).
+@end quotation
+
+There are two options for doing floating-point calculations:
+hardware floating-point (as used by standard @command{awk} and
+the default for @command{gawk}), and @dfn{arbitrary-precision}
+floating-point, which is software based.
+From this point forward, this @value{CHAPTER}
+aims to provide enough information to understand both, and then
+will focus on @command{gawk}'s facilities for the latter.@footnote{If you
+are interested in other tools that perform arbitrary precision arithmetic,
+you may want to investigate the POSIX @command{bc} tool. See
+@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html,
+the POSIX specification for it}, for more information.}
+
+Binary floating-point representations and arithmetic are inexact.
+Simple values like 0.1 cannot be precisely represented using
+binary floating-point numbers, and the limited precision of
+floating-point numbers means that slight changes in
+the order of operations or the precision of intermediate storage
+can change the result. To make matters worse, with arbitrary precision
+floating-point, you can set the precision before starting a computation,
+but then you cannot be sure of the number of significant decimal places
+in the final result.
+
+Sometimes, before you start to write any code, you should think more
+about what you really want and what's really happening. Consider the
+two numbers in the following example:
+
+@example
+x = 0.875 # 1/2 + 1/4 + 1/8
+y = 0.425
+@end example
+
+Unlike the number in @code{y}, the number stored in @code{x}
+is exactly representable
+in binary since it can be written as a finite sum of one or
+more fractions whose denominators are all powers of two.
+When @command{gawk} reads a floating-point number from
+program source, it automatically rounds that number to whatever
+precision your machine supports. If you try to print the numeric
+content of a variable using an output format string of @code{"%.17g"},
+it may not produce the same number as you assigned to it:
+
+@example
+$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425}
+> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'}
+@print{} 0.875, 0.42499999999999999
+@end example
+
+Often the error is so small you do not even notice it, and if you do,
+you can always specify how much precision you would like in your output.
+Usually this is a format string like @code{"%.15g"}, which, when
+used in the previous example, produces output identical to the input.
+
+Because the underlying representation can be a little bit off from the exact value,
+comparing floating-point values to see if they are equal is generally not a good idea.
+Here is an example where it does not work as you might expect:
+
+@example
+$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 0
+@end example
+
+The loss of accuracy during a single computation with floating-point numbers
+usually isn't enough to worry about. However, if you compute a value
+which is the result of a sequence of floating-point operations,
+the error can accumulate and greatly affect the computation itself.
+Here is an attempt to compute the value of the constant
+@value{PI} using one of its many series representations:
+
+@example
+BEGIN @{
+ x = 1.0 / sqrt(3.0)
+ n = 6
+ for (i = 1; i < 30; i++) @{
+ n = n * 2.0
+ x = (sqrt(x * x + 1) - 1) / x
+ printf("%.15f\n", n * x)
+ @}
+@}
+@end example
+
+When the program is run, early errors propagate through later computations
+and cause the loop to terminate prematurely after an attempt to divide by zero:
+
+@example
+$ @kbd{gawk -f pi.awk}
+@print{} 3.215390309173475
+@print{} 3.159659942097510
+@print{} 3.146086215131467
+@print{} 3.142714599645573
+@dots{}
+@print{} 3.224515243534819
+@print{} 2.791117213058638
+@print{} 0.000000000000000
+@error{} gawk: pi.awk:6: fatal: division by zero attempted
+@end example
+
+Here is an additional example where the inaccuracies in internal representations
+yield an unexpected result:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
+> @kbd{i++}
+> @kbd{print i}
+> @kbd{@}'}
+@print{} 4
+@end example
+
+Can computation using arbitrary precision help with the previous examples?
+If you are impatient to know, see
+@ref{Exact Arithmetic}.
+
+Instead of arbitrary precision floating-point arithmetic,
+often all you need is an adjustment of your logic
+or a different order for the operations in your calculation.
+The stability and the accuracy of the computation of the constant @value{PI}
+in the previous example can be enhanced by using the following
+simple algebraic transformation:
+
+@example
+(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1)
+@end example
+
+@noindent
+After making this change, the program does converge to
+@value{PI} in under 30 iterations:
+
+@example
+$ @kbd{gawk -f /tmp/pi2.awk}
+@print{} 3.215390309173473
+@print{} 3.159659942097501
+@print{} 3.146086215131436
+@print{} 3.142714599645370
+@print{} 3.141873049979825
+@dots{}
+@print{} 3.141592653589797
+@print{} 3.141592653589797
+@end example
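+
+The source of @file{pi2.awk} is not listed here. A plausible reconstruction,
+assuming it differs from the first program only in the line that updates
+@code{x}, is sketched below, wrapped in a shell function whose name
+(@code{pi_stable}) is purely illustrative:
+

```shell
# Hypothetical reconstruction of the reformulated program; POSIX awk only.
pi_stable() {
    awk 'BEGIN {
        x = 1.0 / sqrt(3.0)
        n = 6
        for (i = 1; i < 30; i++) {
            n = n * 2.0
            # Transformed update: no subtraction of two nearly equal values
            x = x / (sqrt(x * x + 1) + 1)
            printf("%.15f\n", n * x)
        }
    }'
}
```

+
+The original update subtracts 1 from a square root that approaches 1 as
+@code{x} shrinks, so most significant bits cancel; the transformed form
+only adds and divides, which is why it stays stable.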
+
+There is no need to be unduly suspicious about the results from
+floating-point arithmetic. The lesson to remember is that
+floating-point arithmetic is always more complex than arithmetic using
+pencil and paper. In order to take advantage of the power
+of computer floating-point, you need to know its limitations
+and work within them. For most casual use of floating-point arithmetic,
+you will often get the expected result in the end if you simply round
+the display of your final results to the correct number of significant
+decimal digits.
+
+As general advice, avoid presenting numerical data in a manner that
+implies better precision than is actually the case.
+
+@menu
+* Floating-point Representation:: Binary floating-point representation.
+* Floating-point Context:: Floating-point context.
+* Rounding Mode:: Floating-point rounding mode.
+@end menu
+
+@node Floating-point Representation
+@subsection Binary Floating-point Representation
+@cindex IEEE-754 format
+
+Although floating-point representations vary from machine to machine,
+the most commonly encountered representation is that defined by the
+IEEE 754 Standard. An IEEE-754 format value has three components:
+
+@itemize @bullet
+@item
+A sign bit telling whether the number is positive or negative.
+
+@item
+An @dfn{exponent}, @var{e}, giving its order of magnitude.
+
+@item
+A @dfn{significand}, @var{s},
+specifying the actual digits of the number.
+@end itemize
+
+The value of the
+number is then
+@iftex
+@math{s @cdot 2^e}.
+@end iftex
+@ifnottex
+@var{s} * 2^@var{e}.
+@end ifnottex
+The significand is stored in @dfn{normalized} format,
+which means that the first bit of a non-zero binary significand is always a one.
+An IEEE-754 format therefore stores only the fractional part of the
+significand, leaving the leading one implicit.
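+
+To make the three components concrete, here is a minimal sketch that splits
+a finite number into a sign, a normalized significand in the range [1, 2),
+and a power-of-two exponent. The function name @code{decompose} and its
+output layout are assumptions made for this illustration; it works on the
+numeric value and does not inspect the stored bit pattern:
+

```shell
# Hypothetical helper: prints "<sign> <significand> * 2^<exponent>".
# Assumes a finite argument; infinities and NaN are not handled.
decompose() {
    awk -v x="$1" 'BEGIN {
        x = x + 0                          # force a numeric value
        if (x == 0) { print "0"; exit }
        sign = (x < 0) ? "-" : "+"
        s = (x < 0) ? -x : x
        e = 0
        while (s >= 2) { s /= 2; e++ }     # scaling by 2 is exact in binary
        while (s < 1)  { s *= 2; e-- }
        printf "%s %.17g * 2^%d\n", sign, s, e
    }'
}
```

+
+For example, @code{decompose 0.875} prints @samp{+ 1.75 * 2^-1}, reflecting
+that 0.875 is 1.75 times one half.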
+
+Three of the standard IEEE-754 types are 32-bit single precision,
+64-bit double precision and 128-bit quadruple precision.
+The standard also specifies extended precision formats
+to allow greater precisions and larger exponent ranges.
+
+@node Floating-point Context
+@subsection Floating-point Context
+@cindex context, floating-point
+
+A floating-point @dfn{context} defines the environment for arithmetic operations.
+It governs precision, sets rules for rounding, and limits the range for exponents.
+The context has the following primary components:
+
+@table @dfn
+@item Precision
+Precision of the floating-point format in bits.
+@item emax
+Maximum exponent allowed for this format.
+@item emin
+Minimum exponent allowed for this format.
+@item Underflow behavior
+The format may or may not support gradual underflow.
+@item Rounding
+The rounding mode of this context.
+@end table
+
+@ref{table-ieee-formats} lists the precision and exponent
+field values for the basic IEEE-754 binary formats:
+
+@float Table,table-ieee-formats
+@caption{Basic IEEE Format Context Values}
+@multitable @columnfractions .20 .20 .20 .20 .20
+@headitem Name @tab Total bits @tab Precision @tab emin @tab emax
+@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127
+@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023
+@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383
+@end multitable
+@end float
+
+@quotation NOTE
+The precision numbers include the implied leading one that gives them
+one extra bit of significand.
+@end quotation
+
+A floating-point context can also determine which signals are treated
+as exceptions, and can set rules for arithmetic with special values.
+Please consult the IEEE-754 standard or other resources for details.
+
+@command{gawk} ordinarily uses the hardware double precision
+representation for numbers. On most systems, this is IEEE-754
+floating-point format, corresponding to 64-bit binary with 53 bits
+of precision.
+
+@quotation NOTE
+In case an underflow occurs, the standard allows, but does not require,
+the result from an arithmetic operation to be a number smaller than
+the smallest nonzero normalized number. Such numbers do
+not have as many significant digits as normal numbers, and are called
+@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero,
+is called @dfn{flush to zero}. The basic IEEE-754 binary formats
+support subnormal numbers.
+@end quotation
+
+@node Rounding Mode
+@subsection Floating-point Rounding Mode
+@cindex rounding mode, floating-point
+
+The @dfn{rounding mode} specifies the behavior for the results of numerical
+operations when discarding extra precision. Each rounding mode indicates
+how the least significant returned digit of a rounded result is to
+be calculated.
+@ref{table-rounding-modes} lists the IEEE-754 defined
+rounding modes:
+
+@float Table,table-rounding-modes
+@caption{IEEE 754 Rounding Modes}
+@multitable @columnfractions .45 .55
+@headitem Rounding Mode @tab IEEE Name
+@item Round to nearest, ties to even @tab @code{roundTiesToEven}
+@item Round toward plus Infinity @tab @code{roundTowardPositive}
+@item Round toward negative Infinity @tab @code{roundTowardNegative}
+@item Round toward zero @tab @code{roundTowardZero}
+@item Round to nearest, ties away from zero @tab @code{roundTiesToAway}
+@end multitable
+@end float
+
+The default mode @code{roundTiesToEven} is the most preferred,
+but the least intuitive. This method does the obvious thing for most values,
+by rounding them up or down to the nearest digit.
+For example, rounding 1.132 to two decimal places yields 1.13,
+and rounding 1.157 yields 1.16.
+
+However, when it comes to rounding a value that is exactly halfway between,
+things do not work the way you probably learned in school.
+In this case, the number is rounded to the nearest even digit.
+So rounding 0.125 to two decimal places rounds down to 0.12,
+but rounding 0.6875 to three decimal places rounds up to 0.688.
+You probably have already encountered this rounding mode when
+using the @code{printf} routine to format floating-point numbers.
+For example:
+
+@example
+BEGIN @{
+ x = -4.5
+ for (i = 1; i < 10; i++) @{
+ x += 1.0
+ printf("%4.1f => %2.0f\n", x, x)
+ @}
+@}
+@end example
+
+@noindent
+produces the following output when run:@footnote{It
+is possible for the output to be completely different if the
+C library in your system does not use the IEEE-754 even-rounding
+rule to round halfway cases for @code{printf()}.}
+
+@example
+-3.5 => -4
+-2.5 => -2
+-1.5 => -2
+-0.5 => 0
+ 0.5 => 0
+ 1.5 => 2
+ 2.5 => 2
+ 3.5 => 4
+ 4.5 => 4
+@end example
+
+The theory behind the rounding mode @code{roundTiesToEven} is that
+it more or less evenly distributes upward and downward rounds
+of exact halves, which might cause the round-off error
+to cancel itself out. This is the default rounding mode used
+in IEEE-754 computing functions and operators.
+
+The other rounding modes are rarely used.
+Round toward positive infinity (@code{roundTowardPositive})
+and round toward negative infinity (@code{roundTowardNegative})
+are often used to implement interval arithmetic,
+where you adjust the rounding mode to calculate upper and lower bounds
+for the range of output. The @code{roundTowardZero}
+mode can be used for converting floating-point numbers to integers.
+The rounding mode @code{roundTiesToAway} rounds the result to the
+nearest number and selects the number with the larger magnitude
+if a tie occurs.
+
+Some numerical analysts will tell you that your choice of rounding style
+has tremendous impact on the final outcome, and advise you to wait until
+final output for any rounding. Instead, you can often avoid round-off error problems by
+setting the precision initially to some value sufficiently larger than
+the final desired precision, so that the accumulation of round-off error
+does not influence the outcome.
+If you suspect that results from your computation are
+sensitive to accumulation of round-off error,
+one way to be sure is to look for a significant difference in output
+when you change the rounding mode.
+
+@node Gawk and MPFR
+@section @command{gawk} + MPFR = Powerful Arithmetic
+
+The rest of this @value{CHAPTER} describes how to use the arbitrary precision
+(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric
+capabilities in @command{gawk} to produce maximally accurate results
+when you need them.
+
+But first you should check if your version of
+@command{gawk} supports arbitrary precision arithmetic.
+The easiest way to find out is to look at the output of
+the following command:
+
+@example
+$ @kbd{gawk --version}
+@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3)
+@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation.
+@dots{}
+@end example
+
+@command{gawk} uses the
+@uref{http://www.mpfr.org, GNU MPFR}
+and
+@uref{http://gmplib.org, GNU MP} (GMP)
+libraries for arbitrary precision
+arithmetic on numbers. So if you do not see the names of these libraries
+in the output, then your version of @command{gawk} does not support
+arbitrary precision arithmetic.
+
+Additionally,
+there are a few elements available in the @code{PROCINFO} array
+to provide information about the MPFR and GMP libraries.
+@xref{Auto-set}, for more information.
+
@ignore
-@c Try this
+Even if you aren't interested in arbitrary precision arithmetic, you
+may still benefit from knowing about how @command{gawk} handles numbers
+in general, and the limitations of doing arithmetic with ordinary
+@command{gawk} numbers.
+@end ignore
+
+
+@node Arbitrary Precision Floats
+@section Arbitrary Precision Floating-point Arithmetic with @command{gawk}
+
+@command{gawk} uses the GNU MPFR library
+for arbitrary precision floating-point arithmetic. The MPFR library
+provides precise control over precisions and rounding modes, and gives
+correctly rounded, reproducible, platform-independent results. With the
+command-line option @option{--bignum} or @option{-M},
+all floating-point arithmetic operators and numeric functions can yield
+results to any desired precision level supported by MPFR.
+Two built-in variables, @code{PREC} and @code{ROUNDMODE},
+provide control over the working precision and the rounding mode
+(@pxref{Setting Precision}, and
+@pxref{Setting Rounding Mode}).
+The precision and the rounding mode are set globally for every operation
+to follow.
+
+The default working precision for arbitrary precision floating-point values is 53,
+and the default value for @code{ROUNDMODE} is @code{"N"},
+which selects the IEEE-754 @code{roundTiesToEven} rounding mode
+(@pxref{Rounding Mode}).@footnote{The
+default precision is 53, since according to the MPFR documentation,
+the library should be able to exactly reproduce all computations with
+double-precision machine floating-point numbers (@code{double} type
+in C), except the default exponent range is much wider and subnormal
+numbers are not implemented.}
+@command{gawk} uses the default exponent range in MPFR
@iftex
-@page
-@headings off
-@majorheading III@ @ @ Appendixes
-Part III provides the appendixes, the Glossary, and two licenses that cover
+(@math{emax = 2^{30} - 1, emin = -emax})
+@end iftex
+@ifnottex
+(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax})
+@end ifnottex
+for all floating-point contexts.
+There is no explicit mechanism to adjust the exponent range.
+MPFR does not implement subnormal numbers by default,
+and this behavior cannot be changed in @command{gawk}.
+
+@quotation NOTE
+When emulating an IEEE-754 format (@pxref{Setting Precision}),
+@command{gawk} internally adjusts the exponent range
+to the value defined for the format and also performs computations needed for
+gradual underflow (subnormal numbers).
+@end quotation
+
+@quotation NOTE
+MPFR numbers are variable-size entities, consuming only as much space as
+needed to store the significant digits. Since the performance using MPFR
+numbers pales in comparison to doing arithmetic using the underlying machine
+types, you should consider using only as much precision as needed by
+your program.
+@end quotation
+
+@menu
+* Setting Precision:: Setting the working precision.
+* Setting Rounding Mode:: Setting the rounding mode.
+* Floating-point Constants:: Representing floating-point constants.
+* Changing Precision:: Changing the precision of a number.
+* Exact Arithmetic:: Exact arithmetic with floating-point numbers.
+@end menu
+
+@node Setting Precision
+@subsection Setting the Working Precision
+@cindex @code{PREC} variable
+
+@command{gawk} uses a global working precision; it does not keep track of
+the precision or accuracy of individual numbers. Performing an arithmetic
+operation or calling a built-in function rounds the result to the current
+working precision. The default working precision is 53, which can be
+modified using the built-in variable @code{PREC}. You can also set the
+value to one of the following pre-defined case-insensitive strings
+to emulate an IEEE-754 binary format:
+
+@multitable {@code{"double"}} {12345678901234567890123456789012345}
+@headitem @code{PREC} @tab IEEE-754 Binary Format
+@item @code{"half"} @tab 16-bit half-precision.
+@item @code{"single"} @tab Basic 32-bit single precision.
+@item @code{"double"} @tab Basic 64-bit double precision.
+@item @code{"quad"} @tab Basic 128-bit quadruple precision.
+@item @code{"oct"} @tab 256-bit octuple precision.
+@end multitable
+
+The following example illustrates the effects of changing precision
+on arithmetic operations:
+
+@example
+$ @kbd{gawk -M -v PREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \}
+> @kbd{PREC = "double"; print x + 0 @}'}
+@print{} 1e-400
+@print{} 0
+@end example
+
+Binary and decimal precisions are related approximately, according to the
+formula:
+
+@iftex
+@math{prec = 3.322 @cdot dps}
+@end iftex
+@ifnottex
+@var{prec} = 3.322 * @var{dps}
+@end ifnottex
+
+@noindent
+Here, @var{prec} denotes the binary precision
+(measured in bits) and @var{dps} (short for decimal places)
+is the number of decimal digits. We can easily calculate how many decimal
+digits the 53-bit significand of an IEEE double is equivalent to:
+53 / 3.322, which is equal to about 15.95.
+But what does 15.95 digits actually mean? It depends whether you are
+concerned about how many digits you can rely on, or how many digits
+you need.
+
+It is important to know how many decimal digits it takes to uniquely identify
+a double-precision value (the C type @code{double}). If you want to
+convert from @code{double} to decimal and back to @code{double} (e.g.,
+saving a @code{double} representing an intermediate result to a file, and
+later reading it back to restart the computation), then a few more decimal
+digits are required. 17 digits is generally enough for a @code{double}.
+
+It can also be important to know what decimal numbers can be uniquely
+represented with a @code{double}. If you want to convert
+from decimal to @code{double} and back again, 15 digits is the most that
+you can get. Stated differently, you should not present
+the numbers from your floating-point computations with more than 15
+significant digits in them.
+
+Conversely, it takes a precision of 332 bits to hold an approximation
+of the constant @value{PI} that is accurate to 100 decimal places.
+
+You should always add some extra bits in order to avoid the confusing round-off
+issues that occur because numbers are stored internally in binary.
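+
+The conversion in both directions can be sketched with two small shell
+functions. The names @code{dps_from_prec} and @code{prec_from_dps} are
+invented for this example, and the factor 3.322 is computed as
+log(10) / log(2) rather than hard-coded. The results are approximations
+and do not include any extra guard bits:
+

```shell
# Approximate decimal digits carried by a binary precision of $1 bits.
dps_from_prec() {
    awk -v prec="$1" 'BEGIN { printf "%.2f\n", prec * log(2) / log(10) }'
}

# Approximate binary precision needed for $1 decimal digits, before
# any extra guard bits are added.
prec_from_dps() {
    awk -v dps="$1" 'BEGIN { printf "%.1f\n", dps * log(10) / log(2) }'
}
```

+
+Here @code{dps_from_prec 53} prints 15.95 and @code{prec_from_dps 100}
+prints 332.2, matching the figures quoted above.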
+
+@node Setting Rounding Mode
+@subsection Setting the Rounding Mode
+@cindex @code{ROUNDMODE} variable
+
+The @code{ROUNDMODE} variable provides
+program level control over the rounding mode.
+The correspondence between @code{ROUNDMODE} and the IEEE
+rounding modes is shown in @ref{table-gawk-rounding-modes}.
+
+@float Table,table-gawk-rounding-modes
+@caption{@command{gawk} Rounding Modes}
+@multitable @columnfractions .45 .30 .25
+@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE}
+@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"}
+@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"}
+@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"}
+@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"}
+@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"}
+@end multitable
+@end float
+
+@code{ROUNDMODE} has the default value @code{"N"},
+which selects the IEEE-754 rounding mode @code{roundTiesToEven}.
+@ref{table-gawk-rounding-modes}, lists @code{"A"} to select the IEEE-754 mode
+@code{roundTiesToAway}. This is only available
+if your version of the MPFR library supports it; otherwise setting
+@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode},
+for the meanings of the various rounding modes.
+
+Here is an example of how to change the default rounding behavior of
+@code{printf}'s output:
+
+@example
+$ @kbd{gawk -M -v ROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'}
+@print{} 1.37
+@end example
+
+@node Floating-point Constants
+@subsection Representing Floating-point Constants
+@cindex constants, floating-point
+
+Be wary of floating-point constants! When reading a floating-point constant
+from program source code, @command{gawk} uses the default precision,
+unless overridden
+by an assignment to the special variable @code{PREC} on the command
+line, to store it internally as an MPFR number.
+Changing the precision using @code{PREC} in the program text does
+@emph{not} change the precision of a constant. If you need to
+represent a floating-point constant at a higher precision than the
+default and cannot use a command line assignment to @code{PREC},
+you should either specify the constant as a string, or
+as a rational number, whenever possible. The following example
+illustrates the differences among various ways to
+print a floating-point constant:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'}
+@print{} 0.1000000000000000055511151
+$ @kbd{gawk -M -v PREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'}
+@print{} 0.1000000000000000000000000
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'}
+@print{} 0.1000000000000000000000000
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'}
+@print{} 0.1000000000000000000000000
+@end example
+
+In the first case, the number is stored with the default precision of 53 bits.
+
+@node Changing Precision
+@subsection Changing the Precision of a Number
+
+@cindex Laurie, Dirk
+@quotation
+@i{The point is that in any variable-precision package,
+a decision is made on how to treat numbers given as data,
+or arising in intermediate results, which are represented in
+floating-point format to a precision lower than working precision.
+Do we promote them to full membership of the high-precision club,
+or do we treat them and all their associates as second-class citizens?
+Sometimes the first course is proper, sometimes the second, and it takes
+careful analysis to tell which.}
+
+Dirk Laurie@footnote{Dirk Laurie.
+@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}.
+Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.}
+@end quotation
+
+@command{gawk} does not implicitly modify the precision of any previously
+computed results when the working precision is changed with an assignment
+to @code{PREC}. The precision of a number is always the one that was
+used at the time of its creation, and there is no way for the user
+to explicitly change it afterwards. However, since the result of a
+floating-point arithmetic operation is always an arbitrary precision
+floating-point value---with a precision set by the value of @code{PREC}---one of the
+following workarounds effectively accomplishes the desired behavior:
+
+@example
+x = x + 0.0
+@end example
+
+@noindent
+or:
+
+@example
+x += 0.0
+@end example
+
+@node Exact Arithmetic
+@subsection Exact Arithmetic with Floating-point Numbers
+
+@quotation CAUTION
+Never depend on the exactness of floating-point arithmetic,
+even for apparently simple expressions!
+@end quotation
+
+Can arbitrary precision arithmetic give exact results? There are
+no easy answers. The standard rules of algebra often do not apply
+when using floating-point arithmetic.
+Among other things, the distributive and associative laws
+do not hold completely, and order of operation may be important
+for your computation. Rounding error, cumulative precision loss
+and underflow are often troublesome.
+
+When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3}
+for equality
+using the machine double precision arithmetic, it decides that they
+are not equal!
+(@xref{Floating-point Programming}.)
+You can get the result you want by increasing the precision;
+56 in this case will get the job done:
+
+@example
+$ @kbd{gawk -M -v PREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 1
+@end example
+
+If adding more bits is good, perhaps adding even more bits of
+precision is better?
+Here is what happens if we use an even larger value of @code{PREC}:
+
+@example
+$ @kbd{gawk -M -v PREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 0
+@end example
+
+This is not a bug in @command{gawk} or in the MPFR library.
+It is easy to forget that the finite number of bits used to store the value
+is often just an approximation after proper rounding.
+The test for equality succeeds if and only if @emph{all} bits in the two operands
+are exactly the same. Since this is not necessarily true after floating-point
+computations with a particular precision and effective rounding rule,
+a straight test for equality may not work.
+
+So, don't assume that floating-point values can be compared for equality.
+You should also exercise caution when using other forms of comparisons.
+The standard way to compare floating-point numbers is to determine
+how much error (or @dfn{tolerance}) you will allow in a comparison and
+check to see if one value is within this error range of the other.
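+
+A tolerance-based comparison might look like the following sketch. The
+@command{awk} function @code{near()}, the wrapper @code{compare_sum}, and the
+tolerance of 1e-9 are all assumptions chosen for illustration; a suitable
+tolerance depends on the magnitudes involved in your own computation:
+

```shell
# Hypothetical example; uses hardware doubles and POSIX awk only.
compare_sum() {
    awk '
    # True when x and y differ by no more than tol (d is a local variable).
    function near(x, y, tol,    d) {
        d = x - y
        if (d < 0)
            d = -d
        return (d <= tol)
    }
    BEGIN {
        print (0.1 + 12.2 == 12.3)            # exact comparison
        print near(0.1 + 12.2, 12.3, 1e-9)    # comparison within a tolerance
    }'
}
```

+
+With IEEE 64-bit doubles, @code{compare_sum} prints 0 for the exact
+comparison and 1 for the comparison within the tolerance.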
+
+In applications where 15 or fewer decimal places suffice,
+hardware double precision arithmetic can be adequate, and is usually much faster.
+But you do need to keep in mind that every floating-point operation
+can suffer a new rounding error with catastrophic consequences, as illustrated
+by our earlier attempt to compute the value of the constant @value{PI}
+(@pxref{Floating-point Programming}).
+Extra precision can greatly enhance the stability and the accuracy
+of your computation in such cases.
+
+Repeated addition is not necessarily equivalent to multiplication
+in floating-point arithmetic. In the example in
+@ref{Floating-point Programming}:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
+> @kbd{i++}
+> @kbd{print i}
+> @kbd{@}'}
+@print{} 4
+@end example
+
+@noindent
+you may or may not succeed in getting the correct result by choosing
+an arbitrarily large value for @code{PREC}. Reformulation of
+the problem at hand is often the correct approach in such situations.
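+
+One possible reformulation, offered as a sketch rather than the only fix,
+drives the loop with an exact integer counter and derives @code{d} from it,
+so that no error accumulates from repeatedly adding 0.1. The function name
+@code{count_steps} is hypothetical:
+

```shell
# Hypothetical reformulation of the loop; POSIX awk only.
count_steps() {
    awk 'BEGIN {
        # k takes the exact integer values 11..15; d is computed fresh each time.
        for (k = 11; k <= 15; k++) {
            d = k / 10
            i++
        }
        print i
    }'
}
```

+
+Calling @code{count_steps} prints 5, the count expected for the five values
+1.1 through 1.5, and it does so without requiring any extra precision.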
+
+@node Arbitrary Precision Integers
+@section Arbitrary Precision Integer Arithmetic with @command{gawk}
+@cindex integer, arbitrary precision
+
+If the option @option{--bignum} or @option{-M} is specified,
+@command{gawk} performs all
+integer arithmetic using GMP arbitrary precision integers.
+Any number that looks like an integer in a program source or data file
+is stored as an arbitrary precision integer.
+The size of the integer is limited only by your computer's memory.
+The current floating-point context has no effect on operations involving integers.
+For example, the following computes
+@iftex
+@math{5^{4^{3^{2}}}},
+@end iftex
+@ifnottex
+5^4^3^2,
+@end ifnottex
+the result of which is beyond the
+limits of ordinary @command{gawk} numbers:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{}
+> @kbd{x = 5^4^3^2}
+> @kbd{print "# of digits =", length(x)}
+> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)}
+> @kbd{@}'}
+@print{} # of digits = 183231
+@print{} 62060698786608744707 ... 92256259918212890625
+@end example
+
+If you were to compute the same value using arbitrary precision
+floating-point values instead, the precision needed for correct output
+(using the formula
+@iftex
+@math{prec = 3.322 @cdot dps}),
+would be @math{3.322 @cdot 183231},
+@end iftex
+@ifnottex
+@samp{prec = 3.322 * dps}),
+would be 3.322 x 183231,
+@end ifnottex
+or 608693.
+
+The result from an arithmetic operation with an integer and a floating-point value
+is a floating-point value with a precision equal to the working precision.
+The following program calculates the eighth term in
+Sylvester's sequence@footnote{Weisstein, Eric W.
+@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource.
+@url{http://mathworld.wolfram.com/SylvestersSequence.html}}
+using a recurrence:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{}
+> @kbd{s = 2.0}
+> @kbd{for (i = 1; i <= 7; i++)}
+> @kbd{s = s * (s - 1) + 1}
+> @kbd{print s}
+> @kbd{@}'}
+@print{} 113423713055421845118910464
+@end example
+
+The output differs from the actual number, 113,423,713,055,421,844,361,000,443,
+because the default precision of 53 is not enough to represent the
+floating-point results exactly. You can either increase the precision
+(100 is enough in this case), or replace the floating-point constant
+@samp{2.0} with an integer, to perform all computations using integer
+arithmetic to get the correct output.
+
+It will sometimes be necessary for @command{gawk} to implicitly convert an
+arbitrary precision integer into an arbitrary precision floating-point value.
+This is primarily because the MPFR library does not always provide the
+relevant interface to process arbitrary precision integers or mixed-mode
+numbers as needed by an operation or function.
+In such a case, the precision is set to the minimum value necessary
+for exact conversion, and the working precision is not used for this purpose.
+If this is not what you need or want, you can employ a subterfuge
+like this:
+
+@example
+gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}'
+@end example
+
+You can avoid this issue altogether by specifying the number as a floating-point value
+to begin with:
+
+@example
+gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}'
+@end example
+
+Note that for the particular example above, it is likely best
+to just use the following:
+
+@example
+gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
+@end example
+
+@node Dynamic Extensions
+@chapter Writing Extensions for @command{gawk}
+
+It is possible to add new built-in functions to @command{gawk} using
+dynamically loaded libraries. This facility is available on systems
+that support the C @code{dlopen()} and @code{dlsym()}
+functions. This @value{CHAPTER} describes how to create extensions
+using code written in C or C++.
+
+If you don't know anything about C programming, you can safely skip this
+@value{CHAPTER}, although you may wish to review the documentation on the
+extensions that come with @command{gawk} (@pxref{Extension Samples}),
+and the @value{SECTION} on the @code{gawkextlib} project (@pxref{gawkextlib}).
+The sample extensions are automatically built and installed when
+@command{gawk} is.
+
+@quotation NOTE
+When @option{--sandbox} is specified, extensions are disabled
+(@pxref{Options}).
+@end quotation
+
+@menu
+* Extension Intro:: What is an extension.
+* Plugin License:: A note about licensing.
+* Extension Design:: Design notes about the extension API.
+* Extension API Description:: A full description of the API.
+* Extension Example:: Example C code for an extension.
+* Extension Samples:: The sample extensions that ship with
+ @code{gawk}.
+* gawkextlib:: The @code{gawkextlib} project.
+@end menu
+
+@node Extension Intro
+@section Introduction
+
+An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of
+external compiled code that @command{gawk} can load at runtime to
+provide additional functionality, over and above the built-in capabilities
+described in the rest of this @value{DOCUMENT}.
+
+Extensions are useful because they allow you (of course) to extend
+@command{gawk}'s functionality. For example, they can provide access to
+system calls (such as @code{chdir()} to change directory) and to other
+C library routines that could be of use. As with most software,
+``the sky is the limit;'' if you can imagine something that you might
+want to do and can write in C or C++, you can write an extension to do it!
+
+Extensions are written in C or C++, using the @dfn{Application Programming
+Interface} (API) defined for this purpose by the @command{gawk}
+developers. The rest of this @value{CHAPTER} explains the design
+decisions behind the API, the facilities that it provides and how to use
+them, and presents a small sample extension. In addition, it documents
+the sample extensions included in the @command{gawk} distribution,
+and describes the @code{gawkextlib} project.
+
+@node Plugin License
+@section Extension Licensing
+
+Every dynamic extension should define the global symbol
+@code{plugin_is_GPL_compatible} to assert that it has been licensed under
+a GPL-compatible license. If this symbol does not exist, @command{gawk}
+emits a fatal error and exits when it tries to load your extension.
+
+The declared type of the symbol should be @code{int}. It does not need
+to be in any allocated section, though. The code merely asserts that
+the symbol exists in the global scope. Something like this is enough:
+
+@example
+int plugin_is_GPL_compatible;
+@end example
+
+@node Extension Design
+@section Extension API Design
+
+The first version of extensions for @command{gawk} was developed in
+the mid-1990s and released with @command{gawk} 3.1 in the late 1990s.
+The basic mechanisms and design remained unchanged for close to 15 years,
+until 2012.
+
+The old extension mechanism used data types and functions from
+@command{gawk} itself, with a ``clever hack'' to install extension
+functions.
+
+@command{gawk} included some sample extensions, of which a few were
+really useful. However, it was clear from the outset that the extension
+mechanism was bolted onto the side and was not really well thought out.
+
+@menu
+* Old Extension Problems:: Problems with the old mechanism.
+* Extension New Mechanism Goals:: Goals for the new mechanism.
+* Extension Other Design Decisions:: Some other design decisions.
+* Extension Mechanism Outline:: An outline of how it works.
+* Extension Future Growth:: Some room for future growth.
+@end menu
+
+@node Old Extension Problems
+@subsection Problems With The Old Mechanism
+
+The old extension mechanism had several problems:
+
+@itemize @bullet
+@item
+It depended heavily upon @command{gawk} internals. Any time the
+@code{NODE} structure@footnote{A critical central data structure
+inside @command{gawk}.} changed, an extension would have to be
+recompiled. Furthermore, writing extensions required understanding
+something about @command{gawk}'s internal functions. There was some
+documentation in this @value{DOCUMENT}, but it was quite minimal.
+
+@item
+Being able to call into @command{gawk} from an extension required linker
+facilities that are common on Unix-derived systems but that did
+not work on Windows systems; users wanting extensions on Windows
+had to statically link them into @command{gawk}, even though Windows supports
+dynamic loading of shared objects.
+
+@item
+The API would change occasionally as @command{gawk} changed; no compatibility
+between versions was ever offered or planned for.
+@end itemize
+
+Despite the drawbacks, the @command{xgawk} project developers forked
+@command{gawk} and developed several significant extensions. They also
+enhanced @command{gawk}'s facilities relating to file inclusion and
+shared object access.
+
+A new API was desired for a long time, but only in 2012 did the
+@command{gawk} maintainer and the @command{xgawk} developers finally
+start working on it together. More information about the @command{xgawk}
+project is provided in @ref{gawkextlib}.
+
+@node Extension New Mechanism Goals
+@subsection Goals For A New Mechanism
+
+Some goals for the new API were:
+
+@itemize @bullet
+@item
+The API should be independent of @command{gawk} internals. Changes in
+@command{gawk} internals should not be visible to the writer of an
+extension function.
+
+@item
+The API should provide @emph{binary} compatibility across @command{gawk}
+releases as long as the API itself does not change.
+
+@item
+The API should enable extensions written in C to have roughly the
+same ``appearance'' to @command{awk}-level code as @command{awk}
+functions do. This means that extensions should have:
+
+@itemize @minus
+@item
+The ability to access function parameters.
+
+@item
+The ability to turn an undefined parameter into an array (call by reference).
+
+@item
+The ability to create, access and update global variables.
+
+@item
+Easy access to all the elements of an array at once (``array flattening'')
+in order to loop over all the elements in an easy fashion for C code.
+
+@item
+The ability to create arrays (including @command{gawk}'s true
+multi-dimensional arrays).
+@end itemize
+@end itemize
+
+Some additional important goals were:
+
+@itemize @bullet
+@item
+The API should use only features in ISO C 90, so that extensions
+can be written using the widest range of C and C++ compilers. The header
+should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"}
+magic so that a C++ compiler could be used. (If using C++, the runtime
+system has to be smart enough to call any constructors and destructors,
+as @command{gawk} is a C program. As of this writing, this has not been
+tested.)
+
+@item
+The API mechanism should not require access to @command{gawk}'s
+symbols@footnote{The @dfn{symbols} are the variables and functions
+defined inside @command{gawk}. Access to these symbols by code
+external to @command{gawk} loaded dynamically at runtime is
+problematic on Windows.} by the compile-time or dynamic linker,
+in order to enable creation of extensions that also work on Windows.
+@end itemize
+
+During development, it became clear that there were other features
+that should be available to extensions, which were also subsequently
+provided:
+
+@itemize @bullet
+@item
+Extensions should have the ability to hook into @command{gawk}'s
+I/O redirection mechanism. In particular, the @command{xgawk}
+developers provided a so-called ``open hook'' to take over reading
+records. During development, this was generalized to allow
+extensions to hook into input processing, output processing, and
+two-way I/O.
+
+@item
+An extension should be able to provide a ``call back'' function
+to perform clean up actions when @command{gawk} exits.
+
+@item
+An extension should be able to provide a version string so that
+@command{gawk}'s @option{--version} option can provide information
+about extensions as well.
+@end itemize
+
+@node Extension Other Design Decisions
+@subsection Other Design Decisions
+
+As an arbitrary design decision, extensions can read the values of
+built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot
+change them, with the exception of @code{PROCINFO}.
+
+The reason for this is to prevent an extension function from affecting
+the flow of an @command{awk} program outside its control. While a real
+@command{awk} function can do what it likes, that is at the discretion
+of the programmer. An extension function should provide a service or
+make a C API available for use within @command{awk}, and not mess with
+@code{FS} or @code{ARGC} and @code{ARGV}.
+
+In addition, it becomes easy to start down a slippery slope. How
+much access to @command{gawk} facilities do extensions need?
+Do they need @code{getline}? What about calling @code{gsub()} or
+compiling regular expressions? What about calling into @command{awk}
+functions? (@emph{That} would be messy.)
+
+In order to avoid these issues, the @command{gawk} developers chose
+to start with the simplest, most basic features that are still truly useful.
+
+Another decision is that although @command{gawk} provides nice things like
+MPFR, and arrays indexed internally by integers, these features are not
+being brought out to the API in order to keep things simple and close to
+traditional @command{awk} semantics. (In fact, arrays indexed internally
+by integers are so transparent that they aren't even documented!)
+
+Additionally, all functions in the API check that their pointer
+input parameters are not @code{NULL}. If they are, they return an error.
+(It is a good idea for extension code to verify that
+pointers received from @command{gawk} are not @code{NULL}.
+Such a thing should not happen, but the @command{gawk} developers
+are only human, and they have been known to occasionally make
+mistakes.)
+
+With time, the API will undoubtedly evolve; the @command{gawk} developers
+expect this to be driven by user needs. For now, the current API seems
+to provide a minimal yet powerful set of features for creating extensions.
+
+@node Extension Mechanism Outline
+@subsection How It Works At A High Level
+
+The requirement to avoid access to @command{gawk}'s symbols is, at first
+glance, a difficult one to meet.
+
+One design, apparently used by Perl and Ruby and maybe others, would
+be to make the mainline @command{gawk} code into a library, with the
+@command{gawk} utility a small C @code{main()} function linked against
+the library.
+
+This seemed like the tail wagging the dog, complicating build and
+installation and making a simple copy of the @command{gawk} executable
+from one system to another (or one place to another on the same
+system!) into a chancy operation.
+
+Pat Rankin suggested the solution that was adopted. Communication between
+@command{gawk} and an extension is two-way. First, when an extension
+is loaded, it is passed a pointer to a @code{struct} whose fields are
+function pointers.
+This is shown in @ref{load-extension}.
+
+@float Figure,load-extension
+@caption{Loading The Extension}
+@ifinfo
+@center @image{api-figure1, , , Loading the extension, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{api-figure1, , , Loading the extension}
+@end ifnotinfo
+@end float
+
+The extension can call functions inside @command{gawk} through these
+function pointers, at runtime, without needing (link-time) access
+to @command{gawk}'s symbols. One of these function pointers is to a
+function for ``registering'' new built-in functions.
+This is shown in @ref{load-new-function}.
+
+@float Figure,load-new-function
+@caption{Loading The New Function}
+@ifinfo
+@center @image{api-figure2, , , Loading the new function, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{api-figure2, , , Loading the new function}
+@end ifnotinfo
+@end float
+
+In the other direction, the extension registers its new functions
+with @command{gawk} by passing function pointers to the functions that
+provide the new feature (@code{do_chdir()}, for example). @command{gawk}
+associates the function pointer with a name and can then call it, using a
+defined calling convention.
+This is shown in @ref{call-new-function}.
+
+@float Figure,call-new-function
+@caption{Calling The New Function}
+@ifinfo
+@center @image{api-figure3, , , Calling the new function, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{api-figure3, , , Calling the new function}
+@end ifnotinfo
+@end float
+
+The @code{do_@var{xxx}()} function, in turn, then uses the function
+pointers in the API @code{struct} to do its work, such as updating
+variables or arrays, printing messages, setting @code{ERRNO}, and so on.
+
+Convenience macros in the @file{gawkapi.h} header file make calling
+through the function pointers look like regular function calls so that
+extension code is quite readable and understandable.
+
+Although all of this sounds somewhat complicated, the result is that
+extension code is quite straightforward to write and to read. You can
+see this in the sample extension @file{filefuncs.c} (@pxref{Extension
+Example}) and also in the @file{testext.c} code for testing the APIs.
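+
+The three steps pictured above can be imitated in a few lines of plain C.
+The sketch below is a toy model only: none of the names (@code{toy_api_t},
+@code{toy_load()}, and so on) come from @file{gawkapi.h}, and the real API
+passes richer arguments. It shows the host handing the extension a
+@code{struct} of function pointers, the extension registering a function
+through one of those pointers, and the host later calling the registered
+function by name:

```c
#include <stddef.h>
#include <string.h>

/* Toy model; every name here is hypothetical, not from gawkapi.h. */
typedef int (*toy_func_t)(int arg);

typedef struct {
    /* Host routine that the extension calls to register a function. */
    int (*add_func)(const char *name, toy_func_t func);
} toy_api_t;

/* ---- "Host" side: a tiny table mapping names to function pointers ---- */
#define TOY_MAX_FUNCS 8
static struct { const char *name; toy_func_t func; } toy_table[TOY_MAX_FUNCS];
static size_t toy_count;

static int toy_add_func(const char *name, toy_func_t func)
{
    if (name == NULL || func == NULL || toy_count >= TOY_MAX_FUNCS)
        return 0;   /* reject NULL pointers and a full table */
    toy_table[toy_count].name = name;
    toy_table[toy_count].func = func;
    toy_count++;
    return 1;
}

static const toy_api_t toy_api = { toy_add_func };

/* ---- "Extension" side: sees only the struct, never host symbols ---- */
static int do_double(int arg)
{
    return 2 * arg;
}

int toy_load(const toy_api_t *api)
{
    return api != NULL && api->add_func("double_it", do_double);
}

/* ---- Host again: look up a registered function by name and call it ---- */
int toy_call(const char *name, int arg, int *result)
{
    size_t i;

    for (i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i].name, name) == 0) {
            *result = toy_table[i].func(arg);
            return 1;
        }
    }
    return 0;       /* no such function registered */
}

/* Host entry point: "load" the extension by passing it the API struct. */
int toy_host_start(void)
{
    return toy_load(&toy_api);
}
```

+The extension side never refers to a host symbol by name; everything it
+needs arrives through the pointer passed to @code{toy_load()}. This is the
+property that lets the real mechanism work without link-time access to
+@command{gawk}'s symbols.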
+
+Some other bits and pieces:
+
+@itemize @bullet
+@item
+The API provides access to @command{gawk}'s @code{do_@var{xxx}} values,
+reflecting command line options, like @code{do_lint}, @code{do_profiling}
+and so on (@pxref{Extension API Variables}).
+These are informational: an extension cannot affect these
+inside @command{gawk}. In addition, attempting to assign to them
+produces a compile-time error.
+
+@item
+The API also provides major and minor version numbers, so that an
+extension can check if the @command{gawk} it is loaded with supports the
+facilities it was compiled with. (Version mismatches ``shouldn't''
+happen, but we all know how @emph{that} goes.)
+@xref{Extension Versioning}, for details.
+@end itemize
+
+@node Extension Future Growth
+@subsection Room For Future Growth
+
+The API can later be expanded, in two ways:
+
+@itemize @bullet
+@item
+@command{gawk} passes an ``extension id'' into the extension when it
+first loads the extension. The extension then passes this id back
+to @command{gawk} with each function call. This mechanism allows
+@command{gawk} to identify the extension calling into it, should it need
+to know.
+
+@item
+Similarly, the extension passes a ``name space'' into @command{gawk}
+when it registers each extension function. This allows a future
+mechanism for grouping extension functions and possibly avoiding name
+conflicts.
+@end itemize
+
+Of course, as of this writing, no decisions have been made with respect
+to any of the above.
+
+@node Extension API Description
+@section API Description
+
+This (rather large) @value{SECTION} describes the API in detail.
+
+@menu
+* Extension API Functions Introduction:: Introduction to the API functions.
+* General Data Types:: The data types.
+* Requesting Values:: How to get a value.
+* Constructor Functions:: Functions for creating values.
+* Registration Functions:: Functions to register things with
+ @command{gawk}.
+* Printing Messages:: Functions for printing messages.
+* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}.
+* Accessing Parameters:: Functions for accessing parameters.
+* Symbol Table Access:: Functions for accessing global
+ variables.
+* Array Manipulation:: Functions for working with arrays.
+* Extension API Variables:: Variables provided by the API.
+* Extension API Boilerplate:: Boilerplate code for using the API.
+* Finding Extensions:: How @command{gawk} finds compiled
+ extensions.
+@end menu
+
+@node Extension API Functions Introduction
+@subsection Introduction
+
+Access to facilities within @command{gawk} is made available
+by calling through function pointers passed into your extension.
+
+API function pointers are provided for the following kinds of operations:
+
+@itemize @bullet
+@item
+Registration functions. You may register:
+@itemize @minus
+@item
+extension functions,
+@item
+exit callbacks,
+@item
+a version string,
+@item
+input parsers,
+@item
+output wrappers,
+@item
+and two-way processors.
+@end itemize
+All of these are discussed in detail, later in this @value{CHAPTER}.
+
+@item
+Printing fatal, warning, and ``lint'' warning messages.
+
+@item
+Updating @code{ERRNO}, or unsetting it.
+
+@item
+Accessing parameters, including converting an undefined parameter into
+an array.
+
+@item
+Symbol table access: retrieving a global variable, creating one,
+or changing one. This also includes the ability to create a scalar
+variable that will be @emph{constant} within @command{awk} code.
+
+@item
+Creating and releasing cached values; this provides an
+efficient way to use values for multiple variables and
+can be a big performance win.
+
+@item
+Manipulating arrays:
+@itemize @minus
+@item
+Retrieving, adding, deleting, and modifying elements
+@item
+Getting the count of elements in an array
+@item
+Creating a new array
+@item
+Clearing an array
+@item
+Flattening an array for easy C-style looping over all its indices and elements
+@end itemize
+@end itemize
+
+Some points about using the API:
+
+@itemize @bullet
+@item
+The following types and/or macros and/or functions are referenced
+in @file{gawkapi.h}. For correct use, you must therefore include the
+corresponding standard header file @emph{before} including @file{gawkapi.h}:
+
+@multitable {C Entity} {@code{<sys/types.h>}}
+@headitem C Entity @tab Header File
+@item @code{FILE} @tab @code{<stdio.h>}
+@item @code{NULL} @tab @code{<stddef.h>}
+@item @code{malloc()} @tab @code{<stdlib.h>}
+@item @code{memset()}, @code{memcpy()} @tab @code{<string.h>}
+@item @code{size_t} @tab @code{<sys/types.h>}
+@item @code{struct stat} @tab @code{<sys/stat.h>}
+@end multitable
+
+Due to portability concerns, especially to systems that are not
+fully standards-compliant, it is your responsibility
+to include the correct files in the correct way. This requirement
+is necessary in order to keep @file{gawkapi.h} clean, instead of becoming
+a portability hodge-podge as can be seen in the @command{gawk} source code.
+
+To pass reasonable integer values for @code{ERRNO}, you will also need to
+include @code{<errno.h>}.
+
+@item
+The @file{gawkapi.h} file may be included more than once without ill effect.
+Doing so, however, is poor coding practice.
+
+@item
+Although the API only uses ISO C 90 features, there is an exception; the
+``constructor'' functions use the @code{inline} keyword. If your compiler
+does not support this keyword, you should either place
+@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a
+@file{config.h} file in your extensions.
+
+@item
+All pointers filled in by @command{gawk} are to memory
+managed by @command{gawk} and should be treated by the extension as
+read-only. Memory for @emph{all} strings passed into @command{gawk}
+from the extension @emph{must} come from @code{malloc()} and is managed
+by @command{gawk} from then on.
+
+@item
+The API defines several simple structs that map values as seen
+from @command{awk}. A value can be a @code{double}, a string, or an
+array (as in multidimensional arrays, or when creating a new array).
+Strings maintain both pointer and length since embedded @code{NUL}
+characters are allowed.
+
+By intent, strings are maintained using the current multibyte encoding (as
+defined by @env{LC_@var{xxx}} environment variables) and not using wide
+characters. This matches how @command{gawk} stores strings internally
+and also how characters are likely to be input and output from files.
+
+@item
+When retrieving a value (such as a parameter or that of a global variable
+or array element), the extension requests a specific type (number, string,
+scalar, value cookie, array, or ``undefined''). When the request is
+``undefined,'' the returned value will have the real underlying type.
+
+However, if the request and actual type don't match, the access function
+returns ``false'' and fills in the type of the actual value that is there,
+so that the extension can, e.g., print an error message
+(``scalar passed where array expected'').
+
+@c This is documented in the header file and needs some expanding upon.
+@c The table there should be presented here
+@end itemize
+
+While you may call the API functions by using the function pointers
+directly, the interface is not so pretty. To make extension code look
+more like regular code, the @file{gawkapi.h} header file defines several
+macros that you should use in your code. This @value{SECTION} presents
+the macros as if they were functions.
+
+@node General Data Types
+@subsection General Purpose Data Types
+
+@quotation
+@i{I have a true love/hate relationship with unions.}@*
+Arnold Robbins
+
+@i{That's the thing about unions: the compiler will arrange things so they
+can accommodate both love and hate.}@*
+Chet Ramey
+@end quotation
+
+The extension API defines a number of simple types and structures for general
+purpose use. Additional, more specialized, data structures are introduced
+in subsequent @value{SECTION}s, together with the functions that use them.
+
+@table @code
+@item typedef void *awk_ext_id_t;
+A value of this type is received from @command{gawk} when an extension is loaded.
+That value must then be passed back to @command{gawk} as the first parameter of
+each API function.
+
+@item #define awk_const @dots{}
+This macro expands to @samp{const} when compiling an extension,
+and to nothing when compiling @command{gawk} itself. This makes
+certain fields in the API data structures unwritable from extension code,
+while allowing @command{gawk} to use them as it needs to.
+
+@item typedef int awk_bool_t;
+A simple boolean type. At the moment, the API does not define special
+``true'' and ``false'' values, although perhaps it should.
+
+@item typedef struct @{
+@itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */
+@itemx @ @ @ @ size_t len;@ @ @ @ @ /* length thereof, in chars */
+@itemx @} awk_string_t;
+This represents a mutable string. @command{gawk}
+owns the memory pointed to if it supplied
+the value. Otherwise, it takes ownership of the memory pointed to.
+@strong{Such memory must come from @code{malloc()}!}
+
+As mentioned earlier, strings are maintained using the current
+multibyte encoding.
+
+@item typedef enum @{
+@itemx @ @ @ @ AWK_UNDEFINED,
+@itemx @ @ @ @ AWK_NUMBER,
+@itemx @ @ @ @ AWK_STRING,
+@itemx @ @ @ @ AWK_ARRAY,
+@itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */
+@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */
+@itemx @} awk_valtype_t;
+This @code{enum} indicates the type of a value.
+It is used in the following @code{struct}.
+
+@item typedef struct @{
+@itemx @ @ @ @ awk_valtype_t val_type;
+@itemx @ @ @ @ union @{
+@itemx @ @ @ @ @ @ @ @ awk_string_t@ @ @ @ @ @ @ s;
+@itemx @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d;
+@itemx @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a;
+@itemx @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl;
+@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t@ vc;
+@itemx @ @ @ @ @} u;
+@itemx @} awk_value_t;
+An ``@command{awk} value.''
+The @code{val_type} member indicates what kind of value the
+@code{union} holds, and each member is of the appropriate type.
+
+@item #define str_value@ @ @ @ @ @ u.s
+@itemx #define num_value@ @ @ @ @ @ u.d
+@itemx #define array_cookie@ @ @ u.a
+@itemx #define scalar_cookie@ @ u.scl
+@itemx #define value_cookie@ @ @ u.vc
+These macros make accessing the fields of the @code{awk_value_t} more
+readable.
+
+@item typedef void *awk_scalar_t;
+Scalars can be represented as an opaque type. These values are obtained from
+@command{gawk} and then passed back into it. This is discussed in a general fashion below,
+and in more detail in @ref{Symbol table by cookie}.
+
+@item typedef void *awk_value_cookie_t;
+A ``value cookie'' is an opaque type representing a cached value.
+This is also discussed in a general fashion below,
+and in more detail in @ref{Cached values}.
+
+@end table
+
+Scalar values in @command{awk} are either numbers or strings. The
+@code{awk_value_t} struct represents values. The @code{val_type} member
+indicates what is in the @code{union}.
+
+Representing numbers is easy---the API uses a C @code{double}. Strings
+require more work. Since @command{gawk} allows embedded @code{NUL} bytes
+in string values, a string must be represented as a pair containing a
+data-pointer and length. This is the @code{awk_string_t} type.
+
+Identifiers (i.e., the names of global variables) can be associated
+with either scalar values or with arrays. In addition, @command{gawk}
+provides true arrays of arrays, where any given array element can
+itself be an array. Discussion of arrays is delayed until
+@ref{Array Manipulation}.
+
+The various macros listed earlier make it easier to use the elements
+of the @code{union} as if they were fields in a @code{struct}; this
+is a common coding practice in C. Such code is easier to write and to
+read; however, it remains @emph{your} responsibility to make sure that
+the @code{val_type} member correctly reflects the type of the value in
+the @code{awk_value_t}.
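+
+This discipline can be sketched with local copies of the declarations
+listed earlier. It is an illustration only: a real extension includes
+@file{gawkapi.h} rather than repeating these declarations, the array and
+cookie members are omitted for brevity, and @code{value_as_number()} and
+@code{set_number()} are hypothetical helpers, not API functions:

```c
#include <stddef.h>

/* Local stand-ins modeled on the listing above; a real extension
 * gets these from gawkapi.h. Array and cookie members are omitted. */
typedef struct {
    char *str;      /* data */
    size_t len;     /* length thereof, in chars */
} awk_string_t;

typedef enum {
    AWK_UNDEFINED,
    AWK_NUMBER,
    AWK_STRING
} awk_valtype_t;

typedef struct {
    awk_valtype_t val_type;
    union {
        awk_string_t s;
        double d;
    } u;
} awk_value_t;

#define str_value u.s
#define num_value u.d

/* Hypothetical helper: read the number only when the tag says the
 * union holds one. Returns 1 on success, 0 on a type mismatch. */
int value_as_number(const awk_value_t *val, double *out)
{
    if (val == NULL || out == NULL || val->val_type != AWK_NUMBER)
        return 0;
    *out = val->num_value;
    return 1;
}

/* Hypothetical helper: set tag and payload together so that they
 * cannot get out of step. */
awk_value_t *set_number(double num, awk_value_t *result)
{
    result->val_type = AWK_NUMBER;
    result->num_value = num;
    return result;
}
```

+Checking @code{val_type} before touching a member of the @code{union} is
+what keeps the convenience macros safe to use.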
+
+Conceptually, the first three members of the @code{union} (number, string,
+and array) are all that is needed for working with @command{awk} values.
+However, since the API provides routines for accessing and changing
+the value of global scalar variables only by using the variable's name,
+there is a performance penalty: @command{gawk} must find the variable
+each time it is accessed and changed. This turns out to be a real issue,
+not just a theoretical one.
+
+Thus, if you know that your extension will spend considerable time
+reading and/or changing the value of one or more scalar variables, you
+can obtain a @dfn{scalar cookie}@footnote{See
+@uref{http://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a
+definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html,
+the ``magic cookie'' entry in the Jargon file} for a nice example. See
+also the entry for ``Cookie'' in the @ref{Glossary}.}
+object for that variable, and then use
+the cookie for getting the variable's value or for changing the variable's
+value.
+This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro.
+Given a scalar cookie, @command{gawk} can directly retrieve or
+modify the value, as required, without having to first find it.
+
+The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar.
+If you know that you wish to
+use the same numeric or string @emph{value} for one or more variables,
+you can create the value once, retaining a @dfn{value cookie} for it,
+and then pass in that value cookie whenever you wish to set the value of a
+variable. This saves both storage space within the running @command{gawk}
+process and the time needed to create the value.
+
+@node Requesting Values
+@subsection Requesting Values
+
+All of the functions that return values from @command{gawk}
+work in the same way. You pass in an @code{awk_valtype_t} value
+to indicate what kind of value you expect. If the actual value
+matches what you requested, the function returns true and fills
+in the @code{awk_value_t} result.
+Otherwise, the function returns false, and the @code{val_type}
+member indicates the type of the actual value. You may then
+print an error message, or reissue the request for the actual
+value type, as appropriate. This behavior is summarized in
+@ref{table-value-types-returned}.
+
+@ifnotplaintext
+@float Table,table-value-types-returned
+@caption{Value Types Returned}
+@multitable @columnfractions .50 .50
+@headitem @tab Type of Actual Value:
+@end multitable
+@multitable @columnfractions .166 .166 .198 .15 .15 .166
+@headitem @tab @tab String @tab Number @tab Array @tab Undefined
+@item @tab @b{String} @tab String @tab String @tab false @tab false
+@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false
+@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false
+@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false
+@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined
+@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false
+@end multitable
+@end float
+@end ifnotplaintext
+@ifplaintext
+@float Table,table-value-types-returned
+@caption{Value Types Returned}
+@example
+ +-------------------------------------------------+
+ | Type of Actual Value: |
+ +------------+------------+-----------+-----------+
+ | String | Number | Array | Undefined |
++-----------+-----------+------------+------------+-----------+-----------+
+| | String | String | String | false | false |
+| |-----------+------------+------------+-----------+-----------+
+| | Number | Number if | Number | false | false |
+| | | can be | | | |
+| | | converted, | | | |
+| | | else false | | | |
+| |-----------+------------+------------+-----------+-----------+
+| Type | Array | false | false | Array | false |
+| Requested |-----------+------------+------------+-----------+-----------+
+| | Scalar | Scalar | Scalar | false | false |
+| |-----------+------------+------------+-----------+-----------+
+| | Undefined | String | Number | Array | Undefined |
+| |-----------+------------+------------+-----------+-----------+
+| | Value | false | false | false | false |
+| | Cookie | | | | |
++-----------+-----------+------------+------------+-----------+-----------+
+@end example
+@end float
+@end ifplaintext
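+
+Read as code, the table amounts to a small decision function. The sketch
+below models the table only, not @command{gawk}'s implementation: the
+@code{enum} is a local stand-in for the one in @file{gawkapi.h},
+@code{resolve_request()} is a hypothetical name, and the
+@code{str_is_numeric} flag stands for whatever test decides that a string
+``can be converted'':

```c
/* Local stand-in for the enum declared in gawkapi.h. */
typedef enum {
    AWK_UNDEFINED,
    AWK_NUMBER,
    AWK_STRING,
    AWK_ARRAY,
    AWK_SCALAR,
    AWK_VALUE_COOKIE
} awk_valtype_t;

/*
 * Hypothetical model of the table. "wanted" is the requested type and
 * "actual" is the type of the stored value (string, number, array, or
 * undefined). Returns 1 (true) and stores the type handed back in
 * *returned when the request succeeds. Returns 0 (false) and stores
 * the actual type otherwise.
 */
int resolve_request(awk_valtype_t wanted, awk_valtype_t actual,
                    int str_is_numeric, awk_valtype_t *returned)
{
    int is_scalar = (actual == AWK_STRING || actual == AWK_NUMBER);
    int ok = 0;

    switch (wanted) {
    case AWK_STRING:        /* strings and numbers both qualify */
        ok = is_scalar;
        break;
    case AWK_NUMBER:        /* strings only if they can be converted */
        ok = (actual == AWK_NUMBER)
             || (actual == AWK_STRING && str_is_numeric);
        break;
    case AWK_ARRAY:
        ok = (actual == AWK_ARRAY);
        break;
    case AWK_SCALAR:
        ok = is_scalar;
        break;
    case AWK_UNDEFINED:     /* "whatever is there": always succeeds */
        *returned = actual;
        return 1;
    case AWK_VALUE_COOKIE:  /* every column of this row is false */
        ok = 0;
        break;
    }
    *returned = ok ? wanted : actual;
    return ok;
}
```

+For instance, requesting @code{AWK_ARRAY} when the value is a number
+yields false with @code{AWK_NUMBER} as the reported type, which is the
+information an extension needs to print a useful error message.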
+
+@node Constructor Functions
+@subsection Constructor Functions and Convenience Macros
+
+The API provides a number of @dfn{constructor} functions for creating
+string and numeric values, as well as a number of convenience macros.
+This @value{SUBSECTION} presents them all as function prototypes, in
+the way that extension code would use them.
+
+@table @code
+@item static inline awk_value_t *
+@itemx make_const_string(const char *string, size_t length, awk_value_t *result)
+This function creates a string value in the @code{awk_value_t} variable
+pointed to by @code{result}. It expects @code{string} to be a C string constant
+(or other string data), and automatically creates a @emph{copy} of the data
+for storage in @code{result}. It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result)
+This function creates a string value in the @code{awk_value_t} variable
+pointed to by @code{result}. It expects @code{string} to be a @samp{char *}
+value pointing to data previously obtained from @code{malloc()}. The idea here
+is that the data is passed directly to @command{gawk}, which assumes
+responsibility for it. It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_null_string(awk_value_t *result)
+This specialized function creates a null string (the ``undefined'' value)
+in the @code{awk_value_t} variable pointed to by @code{result}.
+It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_number(double num, awk_value_t *result)
+This function simply creates a numeric value in the @code{awk_value_t} variable
+pointed to by @code{result}. It returns @code{result}.
+@end table
+
+Two convenience macros may be used for allocating storage from @code{malloc()}
+and @code{realloc()}. If the allocation fails, they cause @command{gawk} to
+exit with a fatal error message. They should be used as if they were
+procedure calls that do not return a value.
+
+@table @code
+@item emalloc(pointer, type, size, message)
+The arguments to this macro are as follows:
+@c nested table
+@table @code
+@item pointer
+The pointer variable to point at the allocated storage.
+
+@item type
+The type of the pointer variable, used to create a cast for the call to @code{malloc()}.
+
+@item size
+The total number of bytes to be allocated.
+
+@item message
+A message to be prefixed to the fatal error message. Typically this is the name
+of the function using the macro.
+@end table
+
+@noindent
+For example, you might allocate a string value like so:
+
+@example
+awk_value_t result;
+char *message;
+const char greet[] = "Don't Panic!";
+
+emalloc(message, char *, sizeof(greet), "myfunc");
+strcpy(message, greet);
+make_malloced_string(message, strlen(message), & result);
+@end example
+
+@item erealloc(pointer, type, size, message)
+This is like @code{emalloc()}, but it calls @code{realloc()},
+instead of @code{malloc()}.
+The arguments are the same as for the @code{emalloc()} macro.
+@end table
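+
+To make the role of the @code{type} argument concrete, here is a minimal
+sketch of how such a macro could be written. It is @emph{not} the
+definition from @file{gawkapi.h}: the name @code{sketch_emalloc()} and the
+helper @code{copy_greeting()} are hypothetical, and the sketch calls
+@code{exit()} where the real macro would have @command{gawk} issue a
+fatal error.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * sketch_emalloc: hypothetical stand-in for emalloc(), NOT gawk's
 * definition. It assigns the cast result of malloc() to `pointer' and,
 * on failure, prints `message' and exits instead of calling fatal().
 */
#define sketch_emalloc(pointer, type, size, message) \
	do { \
		if (((pointer) = (type) malloc(size)) == NULL) { \
			fprintf(stderr, "%s: malloc of %lu bytes failed\n", \
				(message), (unsigned long) (size)); \
			exit(EXIT_FAILURE); \
		} \
	} while (0)

/* copy_greeting: return a malloc()ed copy of a fixed greeting */
static char *
copy_greeting(void)
{
	const char greet[] = "Don't Panic!";
	char *message;

	/* sizeof(greet) includes the terminating NUL byte */
	sketch_emalloc(message, char *, sizeof(greet), "copy_greeting");
	strcpy(message, greet);
	return message;
}
```

+Wrapping the body in @samp{do @{ @dots{} @} while (0)} is what lets such a
+macro be used like a procedure call that does not return a value.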
+
+@node Registration Functions
+@subsection Registration Functions
+
+This @value{SUBSECTION} describes the API functions for
+registering parts of your extension with @command{gawk}.
+
+@menu
+* Extension Functions:: Registering extension functions.
+* Exit Callback Functions:: Registering an exit callback.
+* Extension Version String:: Registering a version string.
+* Input Parsers:: Registering an input parser.
+* Output Wrappers:: Registering an output wrapper.
+* Two-way processors:: Registering a two-way processor.
+@end menu
+
+@node Extension Functions
+@subsubsection Registering An Extension Function
+
+Extension functions are described by the following record:
+
+@example
+typedef struct @{
+@ @ @ @ const char *name;
+@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
+@ @ @ @ size_t num_expected_args;
+@} awk_ext_func_t;
+@end example
+
+The fields are:
+
+@table @code
+@item const char *name;
+The name of the new function.
+@command{awk}-level code calls the function by this name.
+This is a regular C string.
+
+@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
+This is a pointer to the C function that provides the desired
+functionality.
+The function must fill in the result with either a number
+or a string. @command{gawk} takes ownership of any string memory.
+As mentioned earlier, string memory @strong{must} come from @code{malloc()}.
+
+The function must return the value of @code{result}.
+This is for the convenience of the calling code inside @command{gawk}.
+
+@item size_t num_expected_args;
+This is the number of arguments the function expects to receive.
+Each extension function may decide what to do if the number of
+arguments isn't what it expected. As with regular @command{awk} functions, it
+is likely OK to ignore extra arguments.
+@end table
+
+Once you have a record representing your extension function, you register
+it with @command{gawk} using this API function:
+
+@table @code
+@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func);
+This function returns true upon success, false otherwise.
+The @code{namespace} parameter is currently not used; you should pass in an
+empty string (@code{""}). The @code{func} pointer is the address of a
+@code{struct} representing your function, as just described.
+@end table
+
+@node Exit Callback Functions
+@subsubsection Registering An Exit Callback Function
+
+An @dfn{exit callback} function is a function that
+@command{gawk} calls before it exits.
+Such functions are useful if you have general ``clean up'' tasks
+that should be performed in your extension (such as closing database
+connections or other resource deallocations).
+You can register such
+a function with @command{gawk} using the following function.
+
+@table @code
+@item void awk_atexit(void (*funcp)(void *data, int exit_status),
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0);
+The parameters are:
+@c nested table
+@table @code
+@item funcp
+A pointer to the function to be called before @command{gawk} exits. The @code{data}
+parameter will be the original value of @code{arg0}.
+The @code{exit_status} parameter is
+the exit status value that @command{gawk} will pass to the @code{exit()} system call.
+
+@item arg0
+A pointer to private data which @command{gawk} saves in order to pass to
+the function pointed to by @code{funcp}.
+@end table
+@end table
+
+Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in
+the reverse of the order in which they are registered with @command{gawk}.
+
+@node Extension Version String
+@subsubsection Registering An Extension Version String
+
+You can register a version string with @command{gawk} that indicates
+the name and version of your extension, as follows:
+
+@table @code
+@item void register_ext_version(const char *version);
+Register the string pointed to by @code{version} with @command{gawk}.
+@command{gawk} does @emph{not} copy the @code{version} string, so
+it should not be changed.
+@end table
+
+@command{gawk} prints all registered extension version strings when it
+is invoked with the @option{--version} option.
+
+@node Input Parsers
+@subsubsection Customized Input Parsers
+
+By default, @command{gawk} reads text files as its input. It uses the value
+of @code{RS} to find the end of the record, and then uses @code{FS}
+(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}).
+Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}).
+
+If you want, you can provide your own custom input parser. An input
+parser's job is to return a record to the @command{gawk} record processing
+code, along with indicators for the value and length of the data to be
+used for @code{RT}, if any.
+
+To provide an input parser, you must first provide two functions
+(where @var{XXX} is a prefix name for your extension):
+
+@table @code
+@item awk_bool_t @var{XXX}_can_take_file(const awk_input_buf_t *iobuf)
+This function examines the information available in @code{iobuf}
+(which we discuss shortly). Based on the information there, it
+decides if the input parser should be used for this file.
+If so, it should return true. Otherwise, it should return false.
+It should not change any state (variable values, etc.) within @command{gawk}.
+
+@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf)
+When @command{gawk} decides to hand control of the file over to the
+input parser, it calls this function. This function in turn must fill
+in certain fields in the @code{awk_input_buf_t} structure, and ensure
+that certain conditions are true. It should then return true. If an
+error of some kind occurs, it should not fill in any fields, and should
+return false; then @command{gawk} will not use the input parser.
+The details are presented shortly.
+@end table
+
+Your extension should package these functions inside an
+@code{awk_input_parser_t}, which looks like this:
+
+@example
+typedef struct input_parser @{
+ const char *name; /* name of parser */
+ awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
+ awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
+ awk_const struct input_parser *awk_const next; /* for use by gawk */
+@} awk_input_parser_t;
+@end example
+
+The fields are:
+
+@table @code
+@item const char *name;
+The name of the input parser. This is a regular C string.
+
+@item awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
+A pointer to your @code{@var{XXX}_can_take_file()} function.
+
+@item awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
+A pointer to your @code{@var{XXX}_take_control_of()} function.
+
+@item awk_const struct input_parser *awk_const next;
+This pointer is used by @command{gawk}.
+The extension cannot modify it.
+@end table
+
+The steps are as follows:
+
+@enumerate
+@item
+Create a @code{static awk_input_parser_t} variable and initialize it
+appropriately.
+
+@item
+When your extension is loaded, register your input parser with
+@command{gawk} using the @code{register_input_parser()} API function
+(described below).
+@end enumerate
+
+An @code{awk_input_buf_t} looks like this:
+
+@example
+typedef struct awk_input @{
+ const char *name; /* filename */
+ int fd; /* file descriptor */
+#define INVALID_HANDLE (-1)
+ void *opaque; /* private data for input parsers */
+ int (*get_record)(char **out, struct awk_input *iobuf,
+ int *errcode, char **rt_start, size_t *rt_len);
+ void (*close_func)(struct awk_input *iobuf);
+ struct stat sbuf; /* stat buf */
+@} awk_input_buf_t;
+@end example
+
+The fields can be divided into two categories: those for use (initially,
+at least) by @code{@var{XXX}_can_take_file()}, and those for use by
+@code{@var{XXX}_take_control_of()}. The first group of fields and their uses
+are as follows:
+
+@table @code
+@item const char *name;
+The name of the file.
+
+@item int fd;
+A file descriptor for the file. If @command{gawk} was able to
+open the file, then @code{fd} will @emph{not} be equal to
+@code{INVALID_HANDLE}. Otherwise, it will.
+
+@item struct stat sbuf;
+If the file descriptor is valid, then @command{gawk} will have filled
+in this structure via a call to the @code{fstat()} system call.
+@end table
+
+The @code{@var{XXX}_can_take_file()} function should examine these
+fields and decide if the input parser should be used for the file.
+The decision can be made based upon @command{gawk} state (the value
+of a variable defined previously by the extension and set by
+@command{awk} code), the name of the
+file, whether or not the file descriptor is valid, the information
+in the @code{struct stat}, or any combination of the above.
+
+Once @code{@var{XXX}_can_take_file()} has returned true, and
+@command{gawk} has decided to use your input parser, it calls
+@code{@var{XXX}_take_control_of()}. That function then fills in at
+least the @code{get_record} field of the @code{awk_input_buf_t}. It must
+also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of
+the fields that may be filled by @code{@var{XXX}_take_control_of()}
+are as follows:
+
+@table @code
+@item void *opaque;
+This is used to hold any state information needed by the input parser
+for this file. It is ``opaque'' to @command{gawk}. The input parser
+is not required to use this pointer.
+
+@item int@ (*get_record)(char@ **out,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len);
+This function pointer should point to a function that creates the input
+records. Said function is the core of the input parser. Its behavior
+is described below.
+
+@item void (*close_func)(struct awk_input *iobuf);
+This function pointer should point to a function that does
+the ``tear down.'' It should release any resources allocated by
+@code{@var{XXX}_take_control_of()}. It may also close the file. If it
+does so, it should set the @code{fd} field to @code{INVALID_HANDLE}.
+
+If @code{fd} is still not @code{INVALID_HANDLE} after the call to this
+function, @command{gawk} calls the regular @code{close()} system call.
+
+Having a ``tear down'' function is optional. If your input parser does
+not need it, do not set this field. Then, @command{gawk} calls the
+regular @code{close()} system call on the file descriptor, so it should
+be valid.
+@end table
+
+The @code{@var{XXX}_get_record()} function does the work of creating
+input records. The parameters are as follows:
+
+@table @code
+@item char **out
+This is a pointer to a @code{char *} variable which is set to point
+to the record. @command{gawk} makes its own copy of the data, so
+the extension must manage this storage.
+
+@item struct awk_input *iobuf
+This is the @code{awk_input_buf_t} for the file. The fields should be
+used for reading data (@code{fd}) and for managing private state
+(@code{opaque}), if any.
+
+@item int *errcode
+If an error occurs, @code{*errcode} should be set to an appropriate
+code from @code{<errno.h>}.
+
+@item char **rt_start
+@itemx size_t *rt_len
+If the concept of a ``record terminator'' makes sense, then
+@code{*rt_start} should be set to point to the data to be used for
+@code{RT}, and @code{*rt_len} should be set to the length of the
+data. Otherwise, @code{*rt_len} should be set to zero.
+@command{gawk} makes its own copy of this data, so the
+extension must manage the storage.
+@end table
+
+The return value is the length of the buffer pointed to by
+@code{*out}, or @code{EOF} if end-of-file was reached or an
+error occurred.
+
+It is guaranteed that @code{errcode} is a valid pointer, so there is no
+need to test for a @code{NULL} value. @command{gawk} sets @code{*errcode}
+to zero, so there is no need to set it unless an error occurs.
+
+If an error does occur, the function should return @code{EOF} and set
+@code{*errcode} to a non-zero value. In that case, if @code{*errcode}
+does not equal @minus{}1, @command{gawk} automatically updates
+the @code{ERRNO} variable based on the value of @code{*errcode} (e.g.,
+setting @samp{*errcode = errno} should do the right thing).
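+
+The rules above can be sketched with a parser for fixed-length records.
+Everything named @code{rec_} here is hypothetical, as is the record
+length @code{REC_SIZE}; returning a short final record unchanged is a
+choice made for this sketch only. The structure is repeated from the
+earlier listing and @code{awk_bool_t} is replaced by a plain @code{int}
+so that the code stands alone; a real extension would include
+@file{gawkapi.h} instead.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

typedef int awk_bool_t;	/* stand-in; the real type comes from gawkapi.h */

/* Repeated from the structure shown earlier, to keep the sketch standalone */
typedef struct awk_input {
	const char *name;
	int fd;
#define INVALID_HANDLE (-1)
	void *opaque;
	int (*get_record)(char **out, struct awk_input *iobuf,
			int *errcode, char **rt_start, size_t *rt_len);
	void (*close_func)(struct awk_input *iobuf);
	struct stat sbuf;
} awk_input_buf_t;

#define REC_SIZE 8	/* hypothetical fixed record length */

/* rec_get_record: return the next record of up to REC_SIZE bytes, or EOF */
static int
rec_get_record(char **out, struct awk_input *iobuf, int *errcode,
		char **rt_start, size_t *rt_len)
{
	char *buf = (char *) iobuf->opaque;	/* storage owned by the parser */
	size_t have = 0;

	*rt_start = NULL;	/* fixed-size records have no terminator */
	*rt_len = 0;

	while (have < REC_SIZE) {
		ssize_t n = read(iobuf->fd, buf + have, REC_SIZE - have);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			*errcode = errno;	/* lets gawk update ERRNO */
			return EOF;
		}
		if (n == 0)
			break;			/* end of file */
		have += (size_t) n;
	}
	if (have == 0)
		return EOF;	/* *errcode is left at zero */

	*out = buf;
	return (int) have;
}

/* rec_close: release the buffer; fd is left open for gawk to close */
static void
rec_close(struct awk_input *iobuf)
{
	free(iobuf->opaque);
	iobuf->opaque = NULL;
}

/* rec_take_control_of: allocate state and install the callbacks */
static awk_bool_t
rec_take_control_of(awk_input_buf_t *iobuf)
{
	char *buf = malloc(REC_SIZE);

	if (buf == NULL)
		return 0;	/* no fields were changed */

	iobuf->opaque = buf;
	iobuf->get_record = rec_get_record;
	iobuf->close_func = rec_close;
	return 1;
}
```

+Because @command{gawk} copies each record, the single buffer kept in
+@code{opaque} can be reused for every call.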
+
+@command{gawk} ships with a sample extension that reads directories,
+returning records for each entry in the directory (@pxref{Extension
+Sample Readdir}). You may wish to use that code as a guide for writing
+your own input parser.
+
+When writing an input parser, you should think about (and document)
+how it is expected to interact with @command{awk} code. You may want
+it to always be called, and take effect as appropriate (as the
+@code{readdir} extension does). Or you may want it to take effect
+based upon the value of an @code{awk} variable, as the XML extension
+from the @code{gawkextlib} project does (@pxref{gawkextlib}).
+In the latter case, code in a @code{BEGINFILE} section
+can look at @code{FILENAME} and @code{ERRNO} to decide whether or
+not to activate an input parser (@pxref{BEGINFILE/ENDFILE}).
+
+You register your input parser with the following function:
+
+@table @code
+@item void register_input_parser(awk_input_parser_t *input_parser);
+Register the input parser pointed to by @code{input_parser} with
+@command{gawk}.
+@end table
+
+@node Output Wrappers
+@subsubsection Customized Output Wrappers
+
+An @dfn{output wrapper} is the mirror image of an input parser.
+It allows an extension to take over the output to a file opened
+with the @samp{>} or @samp{>>} operators (@pxref{Redirection}).
+
+The output wrapper is very similar to the input parser structure:
+
+@example
+typedef struct output_wrapper @{
+ const char *name; /* name of the wrapper */
+ awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
+ awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
+ awk_const struct output_wrapper *awk_const next; /* for use by gawk */
+@} awk_output_wrapper_t;
+@end example
+
+The members are as follows:
+
+@table @code
+@item const char *name;
+This is the name of the output wrapper.
+
+@item awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
+This points to a function that examines the information in
+the @code{awk_output_buf_t} structure pointed to by @code{outbuf}.
+It should return true if the output wrapper wants to take over the
+file, and false otherwise. It should not change any state (variable
+values, etc.) within @command{gawk}.
+
+@item awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
+The function pointed to by this field is called when @command{gawk}
+decides to let the output wrapper take control of the file. It should
+fill in appropriate members of the @code{awk_output_buf_t} structure,
+as described below, and return true if successful, false otherwise.
+
+@item awk_const struct output_wrapper *awk_const next;
+This is for use by @command{gawk}.
+@end table
+
+The @code{awk_output_buf_t} structure looks like this:
+
+@example
+typedef struct @{
+ const char *name; /* name of output file */
+ const char *mode; /* mode argument to fopen */
+ FILE *fp; /* stdio file pointer */
+ awk_bool_t redirected; /* true if a wrapper is active */
+ void *opaque; /* for use by output wrapper */
+ size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
+ FILE *fp, void *opaque);
+ int (*gawk_fflush)(FILE *fp, void *opaque);
+ int (*gawk_ferror)(FILE *fp, void *opaque);
+ int (*gawk_fclose)(FILE *fp, void *opaque);
+@} awk_output_buf_t;
+@end example
+
+Here too, your extension will define @code{@var{XXX}_can_take_file()}
+and @code{@var{XXX}_take_control_of()} functions that examine and update
+data members in the @code{awk_output_buf_t}.
+The data members are as follows:
+
+@table @code
+@item const char *name;
+The name of the output file.
+
+@item const char *mode;
+The mode string (as would be used in the second argument to @code{fopen()})
+with which the file was opened.
+
+@item FILE *fp;
+The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file
+before attempting to find an output wrapper.
+
+@item awk_bool_t redirected;
+This field must be set to true by the @code{@var{XXX}_take_control_of()} function.
+
+@item void *opaque;
+This pointer is opaque to @command{gawk}. The extension should use it to store
+a pointer to any private data associated with the file.
+
+@item size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ FILE *fp, void *opaque);
+@itemx int (*gawk_fflush)(FILE *fp, void *opaque);
+@itemx int (*gawk_ferror)(FILE *fp, void *opaque);
+@itemx int (*gawk_fclose)(FILE *fp, void *opaque);
+These pointers should be set to point to functions that perform
+the same tasks as the corresponding @code{<stdio.h>} functions, if appropriate.
+@command{gawk} uses these function pointers for all output.
+@command{gawk} initializes the pointers to point to internal, ``pass through''
+functions that just call the regular @code{<stdio.h>} functions, so an
+extension only needs to redefine those functions that are appropriate for
+what it does.
+@end table
+
+The @code{@var{XXX}_can_take_file()} function should make a decision based
+upon the @code{name} and @code{mode} fields, and any additional state
+(such as @command{awk} variable values) that is appropriate.
+
+When @command{gawk} calls @code{@var{XXX}_take_control_of()}, it should fill
+in the other fields, as appropriate, except for @code{fp}, which it should just
+use normally.
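+
+As a sketch of how little an output wrapper needs to override, the
+following code counts the bytes written to a file. The @code{count_}
+names and @code{struct count_state} are hypothetical. The structure is
+repeated from the listing above and @code{awk_bool_t} is replaced by a
+plain @code{int} so that the code stands alone; a real extension would
+include @file{gawkapi.h} instead.

```c
#include <stdio.h>
#include <stdlib.h>

typedef int awk_bool_t;	/* stand-in; the real type comes from gawkapi.h */

/* Repeated from the structure shown earlier, to keep the sketch standalone */
typedef struct {
	const char *name;
	const char *mode;
	FILE *fp;
	awk_bool_t redirected;
	void *opaque;
	size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
			FILE *fp, void *opaque);
	int (*gawk_fflush)(FILE *fp, void *opaque);
	int (*gawk_ferror)(FILE *fp, void *opaque);
	int (*gawk_fclose)(FILE *fp, void *opaque);
} awk_output_buf_t;

/* private per-file state (hypothetical): bytes written so far */
struct count_state {
	size_t bytes;
};

/* count_fwrite: write through to stdio and tally the bytes written */
static size_t
count_fwrite(const void *buf, size_t size, size_t count,
		FILE *fp, void *opaque)
{
	struct count_state *st = (struct count_state *) opaque;
	size_t done = fwrite(buf, size, count, fp);

	st->bytes += done * size;
	return done;
}

/* count_fclose: free the private state, then close the stream */
static int
count_fclose(FILE *fp, void *opaque)
{
	free(opaque);
	return fclose(fp);
}

/* count_take_control_of: install the wrapper on an already open file */
static awk_bool_t
count_take_control_of(awk_output_buf_t *outbuf)
{
	struct count_state *st = calloc(1, sizeof(*st));

	if (st == NULL)
		return 0;

	outbuf->opaque = st;
	outbuf->gawk_fwrite = count_fwrite;
	outbuf->gawk_fclose = count_fclose;
	outbuf->redirected = 1;		/* must be set to true */
	return 1;
}
```

+The @code{gawk_fflush} and @code{gawk_ferror} members are deliberately
+left alone, so they keep whatever @command{gawk} stored in them.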
+
+You register your output wrapper with the following function:
+
+@table @code
+@item void register_output_wrapper(awk_output_wrapper_t *output_wrapper);
+Register the output wrapper pointed to by @code{output_wrapper} with
+@command{gawk}.
+@end table
+
+@node Two-way processors
+@subsubsection Customized Two-way Processors
+
+A @dfn{two-way processor} combines an input parser and an output wrapper for
+two-way I/O with the @samp{|&} operator (@pxref{Redirection}). It makes identical
+use of the @code{awk_input_buf_t} and @code{awk_output_buf_t} structures
+as described earlier.
+
+A two-way processor is represented by the following structure:
+
+@example
+typedef struct two_way_processor @{
+ const char *name; /* name of the two-way processor */
+ awk_bool_t (*can_take_two_way)(const char *name);
+ awk_bool_t (*take_control_of)(const char *name,
+ awk_input_buf_t *inbuf,
+ awk_output_buf_t *outbuf);
+ awk_const struct two_way_processor *awk_const next; /* for use by gawk */
+@} awk_two_way_processor_t;
+@end example
+
+The fields are as follows:
+
+@table @code
+@item const char *name;
+The name of the two-way processor.
+
+@item awk_bool_t (*can_take_two_way)(const char *name);
+This function returns true if it wants to take over two-way I/O for this filename.
+It should not change any state (variable
+values, etc.) within @command{gawk}.
+
+@item awk_bool_t (*take_control_of)(const char *name,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf);
+This function should fill in the @code{awk_input_buf_t} and
+@code{awk_output_buf_t} structures pointed to by @code{inbuf} and
+@code{outbuf}, respectively. These structures were described earlier.
+
+@item awk_const struct two_way_processor *awk_const next;
+This is for use by @command{gawk}.
+@end table
+
+As with the input parser and output wrapper, you provide
+``yes I can take this'' and ``take over for this'' functions,
+@code{@var{XXX}_can_take_two_way()} and @code{@var{XXX}_take_control_of()}.
+
+You register your two-way processor with the following function:
+
+@table @code
+@item void register_two_way_processor(awk_two_way_processor_t *two_way_processor);
+Register the two-way processor pointed to by @code{two_way_processor} with
+@command{gawk}.
+@end table
+
+@node Printing Messages
+@subsection Printing Messages
+
+You can print different kinds of warning messages from your
+extension, as described below. Note that for these functions,
+you must pass in the extension id received from @command{gawk}
+when the extension was loaded.@footnote{Because the API uses only ISO C 90
+features, it cannot make use of the ISO C 99 variadic macro feature to hide
+that parameter. More's the pity.}
+
+@table @code
+@item void fatal(awk_ext_id_t id, const char *format, ...);
+Print a message and then cause @command{gawk} to exit immediately.
+
+@item void warning(awk_ext_id_t id, const char *format, ...);
+Print a warning message.
+
+@item void lintwarn(awk_ext_id_t id, const char *format, ...);
+Print a ``lint warning.'' Normally this is the same as printing a
+warning message, but if @command{gawk} was invoked with @samp{--lint=fatal},
+then lint warnings become fatal error messages.
+@end table
+
+All of these functions are otherwise like the C @code{printf()}
+family of functions, where the @code{format} parameter is a string
+with literal characters and formatting codes intermixed.
+
+@node Updating @code{ERRNO}
+@subsection Updating @code{ERRNO}
+
+The following functions allow you to update the @code{ERRNO}
+variable:
+
+@table @code
+@item void update_ERRNO_int(int errno_val);
+Set @code{ERRNO} to the string equivalent of the error code
+in @code{errno_val}. The value should be one of the defined
+error codes in @code{<errno.h>}, and @command{gawk} turns it
+into a (possibly translated) string using the C @code{strerror()} function.
+
+@item void update_ERRNO_string(const char *string);
+Set @code{ERRNO} directly to the string value of @code{string}.
+@command{gawk} makes a copy of the value of @code{string}.
+
+@item void unset_ERRNO();
+Unset @code{ERRNO}.
+@end table
+
+@node Accessing Parameters
+@subsection Accessing and Updating Parameters
+
+Two functions give you access to the arguments (parameters)
+passed to your extension function. They are:
+
+@table @code
+@item awk_bool_t get_argument(size_t count,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Fill in the @code{awk_value_t} structure pointed to by @code{result}
+with the @code{count}'th argument. Return true if the actual
+type matches @code{wanted}, false otherwise. In the latter
+case, @code{result@w{->}val_type} indicates the actual type
+(@pxref{table-value-types-returned}). Counts are zero based---the first
+argument is numbered zero, the second one, and so on. @code{wanted}
+indicates the type of value expected.
+
+@item awk_bool_t set_argument(size_t count, awk_array_t array);
+Convert a parameter that was undefined into an array; this provides
+call-by-reference for arrays. Return false if @code{count} is too big,
+or if the argument's type is not undefined. @xref{Array Manipulation},
+for more information on creating arrays.
+@end table
+
+@node Symbol Table Access
+@subsection Symbol Table Access
+
+Two sets of routines provide access to global variables, and one set
+allows you to create and release cached values.
+
+@menu
+* Symbol table by name:: Accessing variables by name.
+* Symbol table by cookie:: Accessing variables by ``cookie''.
+* Cached values:: Creating and using cached values.
+@end menu
+
+@node Symbol table by name
+@subsubsection Variable Access and Update by Name
+
+The following routines provide the ability to access and update
+global @command{awk}-level variables by name. In compiler terminology,
+identifiers of different kinds are termed @dfn{symbols}, thus the ``sym''
+in the routines' names. The data structure which stores information
+about symbols is termed a @dfn{symbol table}.
+
+@table @code
+@item awk_bool_t sym_lookup(const char *name,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Fill in the @code{awk_value_t} structure pointed to by @code{result}
+with the value of the variable named by the string @code{name}, which is
+a regular C string. @code{wanted} indicates the type of value expected.
+Return true if the actual type matches @code{wanted}, false otherwise.
+In the latter case, @code{result->val_type} indicates the actual type
+(@pxref{table-value-types-returned}).
+
+@item awk_bool_t sym_update(const char *name, awk_value_t *value);
+Update the variable named by the string @code{name}, which is a regular
+C string. The variable is added to @command{gawk}'s symbol table
+if it is not there. Return true if everything worked, false otherwise.
+
+Changing types (scalar to array or vice versa) of an existing variable
+is @emph{not} allowed, nor may this routine be used to update an array.
+This routine cannot be used to update any of the predefined
+variables (such as @code{ARGC} or @code{NF}).
+
+@item awk_bool_t sym_constant(const char *name, awk_value_t *value);
+Create a variable named by the string @code{name}, which is
+a regular C string, that has the constant value as given by
+@code{value}. @command{awk}-level code cannot change the value of this
+variable.@footnote{There (currently) is no @code{awk}-level feature that
+provides this ability.} The extension may change the value of @code{name}'s
+variable with subsequent calls to this routine, and may also convert
+a variable created by @code{sym_update()} into a constant. However,
+once a variable becomes a constant, it cannot later be reverted into a
+mutable variable.
+@end table
+
+@node Symbol table by cookie
+@subsubsection Variable Access and Update by Cookie
+
+A @dfn{scalar cookie} is an opaque handle that provides access
+to a global variable or array. It is an optimization that
+avoids looking up variables in @command{gawk}'s symbol table every time
+access is needed. This was discussed earlier, in @ref{General Data Types}.
+
+The following functions let you work with scalar cookies.
+
+@table @code
+@item awk_bool_t sym_lookup_scalar(awk_scalar_t cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Retrieve the current value of a scalar cookie.
+Once you have obtained a scalar cookie using @code{sym_lookup()}, you can
+use this function to get its value more efficiently.
+Return false if the value cannot be retrieved.
+
+@item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value);
+Update the value associated with a scalar cookie. Return false if
+the new value is not one of @code{AWK_STRING} or @code{AWK_NUMBER}.
+Here too, the built-in variables may not be updated.
+@end table
+
+It is not obvious at first glance how to work with scalar cookies or
+what their @i{raison d@^etre} really is. In theory, the @code{sym_lookup()}
+and @code{sym_update()} routines are all you really need to work with
+variables. For example, you might have code that looked up the value of
+a variable, evaluated a condition, and then possibly changed the value
+of the variable based on the result of that evaluation, like so:
+
+@example
+/* do_magic --- do something really great */
+
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t value;
+
+ if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value)
+ && some_condition(value.num_value)) @{
+ value.num_value += 42;
+ sym_update("MAGIC_VAR", & value);
+ @}
+
+ return make_number(0.0, result);
+@}
+@end example
+
+@noindent
+This code looks (and is) simple and straightforward. So what's the problem?
+
+Consider what happens if @command{awk}-level code associated with your
+extension calls the @code{magic()} function (implemented in C by @code{do_magic()}),
+once per record, while processing hundreds of thousands or millions of records.
+The @code{MAGIC_VAR} variable is looked up in the symbol table once or twice per function call!
+
+The symbol table lookup is really pure overhead; it is considerably more efficient
+to get a cookie that represents the variable, and use that to get the variable's
+value and update it as needed.@footnote{The difference is measurable and quite real. Trust us.}
+
+Thus, the way to use cookies is as follows. First, install your extension's variable
+in @command{gawk}'s symbol table using @code{sym_update()}, as usual. Then get a
+scalar cookie for the variable using @code{sym_lookup()}:
+
+@example
+static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */
+
+static void
+my_extension_init()
+@{
+ awk_value_t value;
+
+ /* install initial value */
+ sym_update("MAGIC_VAR", make_number(42.0, & value));
+
+ /* get cookie */
+ sym_lookup("MAGIC_VAR", AWK_SCALAR, & value);
+
+ /* save the cookie */
+ magic_var_cookie = value.scalar_cookie;
+ @dots{}
+@}
+@end example
+
+Next, use the routines in this section for retrieving and updating
+the value through the cookie. Thus, @code{do_magic()} now becomes
+something like this:
+
+@example
+/* do_magic --- do something really great */
+
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t value;
+
+ if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value)
+ && some_condition(value.num_value)) @{
+ value.num_value += 42;
+ sym_update_scalar(magic_var_cookie, & value);
+ @}
+ @dots{}
+
+ return make_number(0.0, result);
+@}
+@end example
+
+@quotation NOTE
+The previous code omitted error checking for
+presentation purposes. Your extension code should be more robust
+and carefully check the return values from the API functions.
+@end quotation
+
+@node Cached values
+@subsubsection Creating and Using Cached Values
+
+The routines in this section allow you to create and release
+cached values. As with scalar cookies, in theory, cached values
+are not necessary. You can create numbers and strings using
+the functions in @ref{Constructor Functions}. You can then
+assign those values to variables using @code{sym_update()}
+or @code{sym_update_scalar()}, as you like.
+
+However, you can understand the point of cached values if you remember that
+@emph{every} string value's storage @emph{must} come from @code{malloc()}.
+If you have 20 variables, all of which have the same string value, you
+must create 20 identical copies of the string.@footnote{Numeric values
+are clearly less problematic, requiring only a C @code{double} to store.}
+
+It is clearly more efficient, if possible, to create a value once, and
+then tell @command{gawk} to reuse the value for multiple variables. That
+is what the routines in this section let you do. The functions are as follows:
+
+@table @code
+@item awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result);
+Create a cached string or numeric value from @code{value} for efficient later
+assignment.
+Only @code{AWK_NUMBER} and @code{AWK_STRING} values are allowed. Any other type
+is rejected. While @code{AWK_UNDEFINED} could be allowed, doing so would
+result in inferior performance.
+
+@item awk_bool_t release_value(awk_value_cookie_t vc);
+Release the memory associated with a value cookie obtained
+from @code{create_value()}.
+@end table
+
+You use value cookies in a fashion similar to the way you use scalar cookies.
+In the extension initialization routine, you create the value cookie:
+
+@example
+static awk_value_cookie_t answer_cookie; /* static value cookie */
+
+static void
+my_extension_init()
+@{
+ awk_value_t value;
+ char *long_string;
+ size_t long_string_len;
+
+ /* code from earlier */
+ @dots{}
+ /* @dots{} fill in long_string and long_string_len @dots{} */
+ make_malloced_string(long_string, long_string_len, & value);
+ create_value(& value, & answer_cookie); /* create cookie */
+ @dots{}
+@}
+@end example
+
+Once the value is created, you can use it as the value of any number
+of variables:
+
+@example
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t value;
+
+ @dots{} /* as earlier */
+
+ value.val_type = AWK_VALUE_COOKIE;
+ value.value_cookie = answer_cookie;
+ sym_update("VAR1", & value);
+ sym_update("VAR2", & value);
+ @dots{}
+ sym_update("VAR100", & value);
+ @dots{}
+@}
+@end example
+
+@noindent
+Using value cookies in this way saves considerable storage, since all of
+@code{VAR1} through @code{VAR100} share the same value.
+
+You might be wondering, ``Is this sharing problematic?
+What happens if @command{awk} code assigns a new value to @code{VAR1}?
+Are all the others changed too?''
+
+That's a great question. The answer is that no, it's not a problem.
+Internally, @command{gawk} uses reference-counted strings. This means
+that many variables can share the same string, and @command{gawk}
+keeps track of the usage. When a variable's value changes, @command{gawk}
+simply decrements the reference count on the old value and updates
+the variable to use the new value.
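+
+To make the sharing concrete, here is a minimal, self-contained sketch of
+reference counting in plain C. It is a toy model only: the names
+@code{shared_str_t}, @code{shared_new()}, and @code{var_assign()} are
+hypothetical and are not part of @command{gawk} or of the extension API.
+It shows why giving one variable a new value leaves the other variables
+that share the old value untouched:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: a string shared by reference count.
 * These names are hypothetical; they are not gawk's internal types. */
typedef struct shared_str {
    char *text;
    int refcount;
} shared_str_t;

/* Allocate a shared string holding a copy of text; it has no owners yet. */
static shared_str_t *
shared_new(const char *text)
{
    shared_str_t *s = malloc(sizeof(*s));

    assert(s != NULL);
    s->text = malloc(strlen(text) + 1);
    assert(s->text != NULL);
    strcpy(s->text, text);
    s->refcount = 0;
    return s;
}

/* Point *var at new_value and drop the reference to the old value.
 * The old value is freed only when its last reference goes away. */
static void
var_assign(shared_str_t **var, shared_str_t *new_value)
{
    shared_str_t *old = *var;

    new_value->refcount++;      /* take the new reference first */
    *var = new_value;
    if (old != NULL && --old->refcount == 0) {
        free(old->text);
        free(old);
    }
}
```

+In this model, if three variables share one string and one of them is
+then assigned a different string, the count on the shared string drops
+from three to two, and the other two variables still see the original text.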
+
+Finally, as part of your cleanup action (@pxref{Exit Callback Functions})
+you should release any cached values that you created, using
+@code{release_value()}.
+
+@node Array Manipulation
+@subsection Array Manipulation
+
+The primary data structure@footnote{Okay, the only data structure.} in @command{awk}
+is the associative array (@pxref{Arrays}).
+Extensions need to be able to manipulate @command{awk} arrays.
+The API provides a number of data structures for working with arrays,
+functions for working with individual elements, and functions for
+working with arrays as a whole. This includes the ability to
+``flatten'' an array so that it is easy for C code to traverse
+every element in an array. The array data structures integrate
+nicely with the data structures for values to make it easy to
+both work with and create true arrays of arrays (@pxref{General Data Types}).
+
+@menu
+* Array Data Types:: Data types for working with arrays.
+* Array Functions:: Functions for working with arrays.
+* Flattening Arrays:: How to flatten arrays.
+* Creating Arrays:: How to create and populate arrays.
+@end menu
+
+@node Array Data Types
+@subsubsection Array Data Types
+
+The data types associated with arrays are listed below.
+
+@table @code
+@item typedef void *awk_array_t;
+If you request the value of an array variable, you get back an
+@code{awk_array_t} value. This value is opaque@footnote{It is also
+a ``cookie,'' but the @command{gawk} developers did not wish to overuse this
+term.} to the extension; it uniquely identifies the array but can
+only be used by passing it into API functions or receiving it from API
+functions. This is very similar to the way @samp{FILE *} values are used
+with the @code{<stdio.h>} library routines.
+
+@item typedef struct awk_element @{
+@itemx @ @ @ @ /* convenience linked list pointer, not used by gawk */
+@itemx @ @ @ @ struct awk_element *next;
+@itemx @ @ @ @ enum @{
+@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */
+@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if should be deleted */
+@itemx @ @ @ @ @} flags;
+@itemx @ @ @ @ awk_value_t index;
+@itemx @ @ @ @ awk_value_t value;
+@itemx @} awk_element_t;
+The @code{awk_element_t} is a ``flattened''
+array element. @command{gawk} produces an array of these
+inside the @code{awk_flat_array_t} (see the next item).
+Individual elements may be marked for deletion. New elements must be added
+individually, one at a time, using the separate API for that purpose.
+The fields are as follows:
+
+@c nested table
+@table @code
+@item struct awk_element *next;
+This pointer is for the convenience of extension writers. It allows
+an extension to create a linked list of new elements that can then be
+added to an array in a loop that traverses the list.
+
+@item enum @{ @dots{} @} flags;
+A set of flag values that convey information between @command{gawk}
+and the extension. Currently there is only one: @code{AWK_ELEMENT_DELETE}.
+Setting it causes @command{gawk} to delete the
+element from the original array upon release of the flattened array.
+
+@item index
+@itemx value
+The index and value of the element, respectively.
+@emph{All} memory pointed to by @code{index} and @code{value} belongs to @command{gawk}.
+@end table
+
+@item typedef struct awk_flat_array @{
+@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */
+@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */
+@itemx @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */
+@itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */
+@itemx @} awk_flat_array_t;
+This is a flattened array. When an extension gets one of these
+from @command{gawk}, the @code{elements} array is of actual
+size @code{count}.
+The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk};
+therefore they are marked @code{awk_const} so that the extension cannot
+modify them.
+@end table
+
+@node Array Functions
+@subsubsection Array Functions
+
+The following functions relate to individual array elements.
+
+@table @code
+@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);
+For the array represented by @code{a_cookie}, return in @code{*count}
+the number of elements it contains. A subarray counts as a single element.
+Return false if there is an error.
+
+@item awk_bool_t get_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t *const index,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+For the array represented by @code{a_cookie}, return in @code{*result}
+the value of the element whose index is @code{index}.
+@code{wanted} specifies the type of value you wish to retrieve.
+Return false if @code{wanted} does not match the actual type or if
+@code{index} is not in the array (@pxref{table-value-types-returned}).
+
+The value for @code{index} can be numeric, in which case @command{gawk}
+converts it to a string. Using non-integral values is possible, but
+requires that you understand how such values are converted to strings
+(@pxref{Conversion}); thus using integral values is safest.
+
+As with @emph{all} strings passed into @command{gawk} from an extension,
+the string value of @code{index} must come from @code{malloc()}, and
+@command{gawk} releases the storage.
+
+@item awk_bool_t set_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const value);
+In the array represented by @code{a_cookie}, create or modify
+the element whose index is given by @code{index}.
+The @code{ARGV} and @code{ENVIRON} arrays may not be changed.
+
+@item awk_bool_t set_array_element_by_elem(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_element_t element);
+Like @code{set_array_element()}, but take the @code{index} and @code{value}
+from @code{element}. This is a convenience macro.
+
+@item awk_bool_t del_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t* const index);
+Remove the element with the given index from the array
+represented by @code{a_cookie}.
+Return true if the element was removed, or false if the element did
+not exist in the array.
+@end table
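+
+The conversion of a numeric index to a string, mentioned above for
+@code{get_array_element()}, can be sketched in standalone C. The sketch
+assumes the default @code{CONVFMT} value of @code{"%.6g"} and that
+integral values are converted as integers (@pxref{Conversion}).
+The helper @code{index_to_string()} is hypothetical; it is not an API function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the gawk API: model how a numeric
 * index becomes a string, assuming the default CONVFMT of "%.6g". */
static void
index_to_string(double index, char *buf, size_t buflen)
{
    /* Integral values in a safely representable range print as integers. */
    if (index > -1e15 && index < 1e15
        && (double) (long long) index == index)
        snprintf(buf, buflen, "%lld", (long long) index);
    else
        snprintf(buf, buflen, "%.6g", index);   /* CONVFMT-style conversion */
}
```

+Under these assumptions, the distinct C values @code{0.1 + 0.2} and
+@code{0.3} both yield the index @code{"0.3"}, which is one reason why
+integral indices are the safest choice.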
+
+The following functions relate to arrays as a whole:
+
+@table @code
+@item awk_array_t create_array();
+Create a new array to which elements may be added.
+@xref{Creating Arrays}, for a discussion of how to
+create a new array and add elements to it.
+
+@item awk_bool_t clear_array(awk_array_t a_cookie);
+Clear the array represented by @code{a_cookie}.
+Return false if there was some kind of problem, true otherwise.
+The array remains an array, but after calling this function, it
+has no elements. This is equivalent to using the @code{delete}
+statement (@pxref{Delete}).
+
+@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data);
+For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t}
+structure and fill it in. Set the pointer whose address is passed as @code{data}
+to point to this structure.
+Return true upon success, or false otherwise.
+@xref{Flattening Arrays}, for a discussion of how to
+flatten an array and work with it.
+
+@item awk_bool_t release_flattened_array(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data);
+When done with a flattened array, release the storage using this function.
+You must pass in both the original array cookie, and the address of
+the created @code{awk_flat_array_t} structure.
+The function returns true upon success, false otherwise.
+@end table
+
+@node Flattening Arrays
+@subsubsection Working With All The Elements of an Array
+
+To @dfn{flatten} an array is to create a structure that
+represents the full array in a fashion that makes it easy
+for C code to traverse the entire array. Test code
+in @file{extension/testext.c} does this, and also serves
+as a nice example to show how to use the APIs.
+
+First, the @command{gawk} script that drives the test extension:
+
+@example
+@@load "testext"
+BEGIN @{
+ n = split("blacky rusty sophie raincloud lucky", pets)
+ printf("pets has %d elements\n", length(pets))
+ ret = dump_array_and_delete("pets", "3")
+ printf("dump_array_and_delete(pets) returned %d\n", ret)
+ if ("3" in pets)
+ printf("dump_array_and_delete() did NOT remove index \"3\"!\n")
+ else
+ printf("dump_array_and_delete() did remove index \"3\"!\n")
+ print ""
+@}
+@end example
+
+@noindent
+This code creates an array with @code{split()} (@pxref{String Functions})
+and then calls @code{dump_array_and_delete()}. That function looks up
+the array whose name is passed as the first argument, and
+deletes the element at the index passed in the second argument.
+It then prints the return value and checks if the element
+was indeed deleted. Here is the C code that implements
+@code{dump_array_and_delete()}. It has been edited slightly for
+presentation.
+
+The first part declares variables, sets up the default
+return value in @code{result}, and checks that the function
+was called with the correct number of arguments:
+
+@example
+static awk_value_t *
+dump_array_and_delete(int nargs, awk_value_t *result)
+@{
+ awk_value_t value, value2, value3;
+ awk_flat_array_t *flat_array;
+ size_t count;
+ char *name;
+ int i;
+
+ assert(result != NULL);
+ make_number(0.0, result);
+
+ if (nargs != 2) @{
+ printf("dump_array_and_delete: nargs not right "
+ "(%d should be 2)\n", nargs);
+ goto out;
+ @}
+@end example
+
+The function then proceeds in steps, as follows. First, retrieve
+the name of the array, passed as the first argument. Then
+retrieve the array itself. If either operation fails, print
+error messages and return:
+
+@example
+ /* get argument named array as flat array and print it */
+ if (get_argument(0, AWK_STRING, & value)) @{
+ name = value.str_value.str;
+ if (sym_lookup(name, AWK_ARRAY, & value2))
+ printf("dump_array_and_delete: sym_lookup of %s passed\n",
+ name);
+ else @{
+ printf("dump_array_and_delete: sym_lookup of %s failed\n",
+ name);
+ goto out;
+ @}
+ @} else @{
+ printf("dump_array_and_delete: get_argument(0) failed\n");
+ goto out;
+ @}
+@end example
+
+For testing purposes and to make sure that the C code sees
+the same number of elements as the @command{awk} code,
+the second step is to get the count of elements in the array
+and print it:
+
+@example
+ if (! get_element_count(value2.array_cookie, & count)) @{
+ printf("dump_array_and_delete: get_element_count failed\n");
+ goto out;
+ @}
+
+ printf("dump_array_and_delete: incoming size is %lu\n",
+ (unsigned long) count);
+@end example
+
+The third step is to actually flatten the array, and then
+to double check that the count in the @code{awk_flat_array_t}
+is the same as the count just retrieved:
+
+@example
+ if (! flatten_array(value2.array_cookie, & flat_array)) @{
+ printf("dump_array_and_delete: could not flatten array\n");
+ goto out;
+ @}
+
+ if (flat_array->count != count) @{
+ printf("dump_array_and_delete: flat_array->count (%lu)"
+ " != count (%lu)\n",
+ (unsigned long) flat_array->count,
+ (unsigned long) count);
+ goto out;
+ @}
+@end example
+
+The fourth step is to retrieve the index of the element
+to be deleted, which was passed as the second argument.
+Remember that argument indices passed to @code{get_argument()}
+are zero-based, so the second argument is numbered one:
+
+@example
+ if (! get_argument(1, AWK_STRING, & value3)) @{
+ printf("dump_array_and_delete: get_argument(1) failed\n");
+ goto out;
+ @}
+@end example
+
+The fifth step is where the ``real work'' is done. The function
+loops over every element in the array, printing the index and
+element values. In addition, upon finding the element with the
+index that is supposed to be deleted, the function sets the
+@code{AWK_ELEMENT_DELETE} bit in the @code{flags} field
+of the element. When the array is released, @command{gawk}
+traverses the flattened array, and deletes any elements that
+have this flag bit set:
+
+@example
+ for (i = 0; i < flat_array->count; i++) @{
+ printf("\t%s[\"%.*s\"] = %s\n",
+ name,
+ (int) flat_array->elements[i].index.str_value.len,
+ flat_array->elements[i].index.str_value.str,
+ valrep2str(& flat_array->elements[i].value));
+
+ if (strcmp(value3.str_value.str,
+ flat_array->elements[i].index.str_value.str)
+ == 0) @{
+ flat_array->elements[i].flags |= AWK_ELEMENT_DELETE;
+ printf("dump_array_and_delete: marking element \"%s\" "
+ "for deletion\n",
+ flat_array->elements[i].index.str_value.str);
+ @}
+ @}
+@end example
+
+The sixth step is to release the flattened array. This tells
+@command{gawk} that the extension is no longer using the array,
+and that it should delete any elements marked for deletion.
+@command{gawk} also frees any storage that was allocated,
+so you should not use the pointer (@code{flat_array} in this
+code) once you have called @code{release_flattened_array()}:
+
+@example
+ if (! release_flattened_array(value2.array_cookie, flat_array)) @{
+ printf("dump_array_and_delete: could not release flattened array\n");
+ goto out;
+ @}
+@end example
+
+Finally, since everything was successful, the function sets the
+return value to success, and returns:
+
+@example
+ make_number(1.0, result);
+out:
+ return result;
+@}
+@end example
+
+Here is the output from running this part of the test:
+
+@example
+pets has 5 elements
+dump_array_and_delete: sym_lookup of pets passed
+dump_array_and_delete: incoming size is 5
+ pets["1"] = "blacky"
+ pets["2"] = "rusty"
+ pets["3"] = "sophie"
+dump_array_and_delete: marking element "3" for deletion
+ pets["4"] = "raincloud"
+ pets["5"] = "lucky"
+dump_array_and_delete(pets) returned 1
+dump_array_and_delete() did remove index "3"!
+@end example
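+
+The mark-then-release pattern used by @code{dump_array_and_delete()} can
+also be seen in isolation. The following standalone sketch does not use
+the @command{gawk} API at all; the @code{toy_} names are hypothetical
+stand-ins, and the model simplifies matters by treating the flattened
+array as if it were the array itself. Elements are only flagged while
+traversing, and the actual removal happens in one pass at release time:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for illustration; these are not the gawk API types. */
enum { TOY_ELEMENT_DEFAULT = 0, TOY_ELEMENT_DELETE = 1 };

typedef struct toy_element {
    int flags;
    const char *index;
    const char *value;
} toy_element_t;

/* Flag the element whose index equals wanted.
 * Return 1 if such an element was found, 0 otherwise. */
static int
toy_mark_for_deletion(toy_element_t *elems, size_t count, const char *wanted)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(elems[i].index, wanted) == 0) {
            elems[i].flags |= TOY_ELEMENT_DELETE;
            return 1;
        }
    }
    return 0;
}

/* Model of the release step: drop every flagged element, keep the
 * rest in their original order, and return the new element count. */
static size_t
toy_release(toy_element_t *elems, size_t count)
{
    size_t in, out = 0;

    for (in = 0; in < count; in++) {
        if ((elems[in].flags & TOY_ELEMENT_DELETE) == 0)
            elems[out++] = elems[in];
    }
    return out;
}
```

+Deferring the deletion this way means that the traversal never has to
+cope with the set of elements changing underneath it.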
+
+@node Creating Arrays
+@subsubsection How To Create and Populate Arrays
+
+Besides working with arrays created by @command{awk} code, you can
+create arrays and populate them as you see fit, and then @command{awk}
+code can access them and manipulate them.
+
+There are two important points about creating arrays from extension code:
+
+@enumerate 1
+@item
+You must install a new array into @command{gawk}'s symbol
+table immediately upon creating it. Once you have done so,
+you can then populate the array.
+
+@ignore
+Strictly speaking, this is required only
+for arrays that will have subarrays as elements; however it is
+a good idea to always do this. This restriction may be relaxed
+in a subsequent revision of the API.
+@end ignore
+
+Similarly, if installing a new array as a subarray of an existing array,
+you must add the new array to its parent before adding any elements to it.
+
+Thus, the correct way to build an array is to work ``top down.'' Create
+the array, and immediately install it in @command{gawk}'s symbol table
+using @code{sym_update()}, or install it as an element in a previously
+existing array using @code{set_array_element()}. We show example code shortly.
+
+@item
+Due to @command{gawk} internals, after using @code{sym_update()} to install an array
+into @command{gawk}, you have to retrieve the array cookie from the value
+passed in to @code{sym_update()} before doing anything else with it, like so:
+
+@example
+awk_value_t val;
+awk_array_t new_array;
+
+new_array = create_array();
+val.val_type = AWK_ARRAY;
+val.array_cookie = new_array;
+
+/* install array in the symbol table */
+sym_update("array", & val);
+
+new_array = val.array_cookie; /* YOU MUST DO THIS */
+@end example
+
+If installing an array as a subarray, you must also retrieve the value
+of the array cookie after the call to @code{set_array_element()}.
+@end enumerate
+
+The following C code is a simple test extension to create an array
+with two regular elements and with a subarray. The leading @samp{#include}
+directives and boilerplate variable declarations are omitted for brevity.
+The first step is to create a new array and then install it
+in the symbol table:
+
+@example
+@ignore
+#ifdef HAVE_CONFIG_H
+#include <config.h>
+#endif
+
+#include <stdio.h>
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "gawkapi.h"
+
+static const gawk_api_t *api; /* for convenience macros to work */
+static awk_ext_id_t *ext_id;
+static const char *ext_version = "testarray extension: version 1.0";
+
+int plugin_is_GPL_compatible;
+
+@end ignore
+/* create_new_array --- create a named array */
+
+static void
+create_new_array()
+@{
+ awk_array_t a_cookie;
+ awk_array_t subarray;
+ awk_value_t index, value;
+
+ a_cookie = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = a_cookie;
+
+ if (! sym_update("new_array", & value))
+ printf("create_new_array: sym_update(\"new_array\") failed!\n");
+ a_cookie = value.array_cookie;
+@end example
+
+@noindent
+Note how @code{a_cookie} is reset from the @code{array_cookie} field in
+the @code{value} structure.
+
+The second step is to install two regular values into @code{new_array}:
+
+@example
+ (void) make_const_string("hello", 5, & index);
+ (void) make_const_string("world", 5, & value);
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+
+ (void) make_const_string("answer", 6, & index);
+ (void) make_number(42.0, & value);
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+@end example
+
+The third step is to create the subarray and install it:
+
+@example
+ (void) make_const_string("subarray", 8, & index);
+ subarray = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = subarray;
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+ subarray = value.array_cookie;
+@end example
+
+The final step is to populate the subarray with its own element:
+
+@example
+ (void) make_const_string("foo", 3, & index);
+ (void) make_const_string("bar", 3, & value);
+ if (! set_array_element(subarray, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+@}
+@ignore
+static awk_ext_func_t func_table[] = @{
+ @{ NULL, NULL, 0 @}
+@};
+
+/* init_testarray --- additional initialization function */
+
+static awk_bool_t init_testarray(void)
+@{
+ create_new_array();
+
+ return 1;
+@}
+
+static awk_bool_t (*init_func)(void) = init_testarray;
+
+dl_load_func(func_table, testarray, "")
+@end ignore
+@end example
+
+Here is a sample script that loads the extension
+and then dumps the array:
+
+@example
+@@load "subarray"
+
+function dumparray(name, array, i)
+@{
+ for (i in array)
+ if (isarray(array[i]))
+ dumparray(name "[\"" i "\"]", array[i])
+ else
+ printf("%s[\"%s\"] = %s\n", name, i, array[i])
+@}
+
+BEGIN @{
+ dumparray("new_array", new_array);
+@}
+@end example
+
+Here is the result of running the script:
+
+@example
+$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk}
+@print{} new_array["subarray"]["foo"] = bar
+@print{} new_array["hello"] = world
+@print{} new_array["answer"] = 42
+@end example
+
+@noindent
+(@xref{Finding Extensions}, for more information on the
+@env{AWKLIBPATH} environment variable.)
+
+@node Extension API Variables
+@subsection API Variables
+
+The API provides two sets of variables. The first provides information
+about the version of the API (both the version with which the extension
+was compiled and the version with which @command{gawk} was compiled).
+The second provides information about how @command{gawk} was invoked.
+
+@menu
+* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about
+ @command{gawk}'s invocation.
+@end menu
+
+@node Extension Versioning
+@subsubsection API Version Constants and Variables
+
+The API provides both a ``major'' and a ``minor'' version number.
+The API versions are available at compile time as constants:
+
+@table @code
+@item GAWK_API_MAJOR_VERSION
+The major version of the API.
+
+@item GAWK_API_MINOR_VERSION
+The minor version of the API.
+@end table
+
+The minor version increases when new functions are added to the API. Such
+new functions are always added to the end of the API @code{struct}.
+
+The major version increases (and the minor version is reset to zero) if any
+of the data types change size or member order, or if any of the existing
+functions change signature.
+
+An extension may be compiled against one version
+of the API but loaded by a version of @command{gawk} using a different
+version. For this reason, the major and minor API versions of the
+running @command{gawk} are included in the API @code{struct} as read-only
+constant integers:
+
+@table @code
+@item api->major_version
+The major version of the running @command{gawk}.
+
+@item api->minor_version
+The minor version of the running @command{gawk}.
+@end table
+
+It is up to the extension to decide if there are API incompatibilities.
+Typically a check like this is enough:
+
+@example
+if (api->major_version != GAWK_API_MAJOR_VERSION
+ || api->minor_version < GAWK_API_MINOR_VERSION) @{
+ fprintf(stderr, "foo_extension: version mismatch with gawk!\n");
+ fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n",
+ GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION,
+ api->major_version, api->minor_version);
+ exit(1);
+@}
+@end example
+
+Such code is included in the boilerplate @code{dl_load_func()} macro
+provided in @file{gawkapi.h} (discussed later, in
+@ref{Extension API Boilerplate}).
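+
+The rule behind the check shown above can be written as a small predicate,
+which makes it easy to see which combinations are accepted. This is only
+a sketch; @code{api_versions_compatible()} is a hypothetical helper and
+is not provided by @file{gawkapi.h}:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the same test as the check above, as a predicate.
 * ext_* are the versions the extension was compiled against;
 * gawk_* are the versions reported by the running gawk. */
static bool
api_versions_compatible(int ext_major, int ext_minor,
                        int gawk_major, int gawk_minor)
{
    if (gawk_major != ext_major)
        return false;   /* data types or function signatures changed */

    /* gawk must provide at least every function the extension knows. */
    return gawk_minor >= ext_minor;
}
```

+An extension built against an older minor version is accepted by a newer
+@command{gawk} with the same major version, because new functions are
+only ever added at the end of the API @code{struct}.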
+
+@node Extension API Informational Variables
+@subsubsection Informational Variables
+
+The API provides access to several variables that describe
+whether the corresponding command-line options were enabled when
+@command{gawk} was invoked. The variables are:
+
+@table @code
+@item do_lint
+This variable is true if @command{gawk} was invoked with the @option{--lint} option
+(@pxref{Options}).
+
+@item do_traditional
+This variable is true if @command{gawk} was invoked with the @option{--traditional} option.
+
+@item do_profile
+This variable is true if @command{gawk} was invoked with the @option{--profile} option.
+
+@item do_sandbox
+This variable is true if @command{gawk} was invoked with the @option{--sandbox} option.
+
+@item do_debug
+This variable is true if @command{gawk} was invoked with the @option{--debug} option.
+
+@item do_mpfr
+This variable is true if @command{gawk} was invoked with the @option{--bignum} option.
+@end table
+
+The value of @code{do_lint} can change if @command{awk} code
+modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}).
+The others should not change during execution.
+
+@node Extension API Boilerplate
+@subsection Boilerplate Code
+
+As mentioned earlier (@pxref{Extension Mechanism Outline}), the function
+definitions as presented are really macros. To use these macros, your
+extension must provide a small amount of boilerplate code (variables and
+functions) towards the top of your source file, using pre-defined names
+as described below. The boilerplate needed is also provided in comments
+in the @file{gawkapi.h} header file:
+
+@example
+/* Boiler plate code: */
+int plugin_is_GPL_compatible;
+
+static gawk_api_t *const api;
+static awk_ext_id_t ext_id;
+static const char *ext_version = NULL; /* or @dots{} = "some string" */
+
+static awk_ext_func_t func_table[] = @{
+ @{ "name", do_name, 1 @},
+ /* @dots{} */
+@};
+
+/* EITHER: */
+
+static awk_bool_t (*init_func)(void) = NULL;
+
+/* OR: */
+
+static awk_bool_t
+init_my_module(void)
+@{
+ @dots{}
+@}
+
+static awk_bool_t (*init_func)(void) = init_my_module;
+
+dl_load_func(func_table, some_name, "name_space_in_quotes")
+@end example
+
+These variables and functions are as follows:
+
+@table @code
+@item int plugin_is_GPL_compatible;
+This asserts that the extension is compatible with the GNU GPL
+(@pxref{Copying}). If your extension does not have this, @command{gawk}
+will not load it (@pxref{Plugin License}).
+
+@item static gawk_api_t *const api;
+This global @code{static} variable should be set to point to
+the @code{gawk_api_t} pointer that @command{gawk} passes to your
+@code{dl_load()} function. This variable is used by all of the macros.
+
+@item static awk_ext_id_t ext_id;
+This global static variable should be set to the @code{awk_ext_id_t}
+value that @command{gawk} passes to your @code{dl_load()} function.
+This variable is used by all of the macros.
+
+@item static const char *ext_version = NULL; /* or @dots{} = "some string" */
+This global @code{static} variable should be set either
+to @code{NULL}, or to point to a string giving the name and version of
+your extension.
+
+@item static awk_ext_func_t func_table[] = @{ @dots{} @};
+This is an array of one or more @code{awk_ext_func_t} structures
+as described earlier (@pxref{Extension Functions}).
+It can then be looped over for multiple calls to
+@code{add_ext_func()}.
+
+@item static awk_bool_t (*init_func)(void) = NULL;
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @r{OR}
+@itemx static awk_bool_t init_my_module(void) @{ @dots{} @}
+@itemx static awk_bool_t (*init_func)(void) = init_my_module;
+If you need to do some initialization work, you should define a
+function that does it (creates variables, opens files, etc.)
+and then define the @code{init_func} pointer to point to your
+function.
+The function should return zero (false) upon failure, non-zero
+(success) if everything goes well.
+
+If you don't need to do any initialization, define the pointer and
+initialize it to @code{NULL}.
+
+@item dl_load_func(func_table, some_name, "name_space_in_quotes")
+This macro expands to a @code{dl_load()} function that performs
+all the necessary initializations.
+@end table
+
+The point of all the variables and arrays is to let the
+@code{dl_load()} function (from the @code{dl_load_func()}
+macro) do all the standard work. It does the following:
+
+@enumerate 1
+@item
+Check the API versions. If the extension major version does not match
+@command{gawk}'s, or if the extension minor version is greater than
+@command{gawk}'s, it prints a fatal error message and exits.
+
+@item
+Load the functions defined in @code{func_table}.
+If any of them fails to load, it prints a warning message but
+continues on.
+
+@item
+If the @code{init_func} pointer is not @code{NULL}, call the
+function it points to. If it returns zero (false), print a
+warning message.
+
+@item
+If @code{ext_version} is not @code{NULL}, register
+the version string with @command{gawk}.
+@end enumerate
+
+@node Finding Extensions
+@subsection How @command{gawk} Finds Extensions
+
+Compiled extensions have to be installed in a directory where
+@command{gawk} can find them. If @command{gawk} is configured and
+built in the default fashion, the directory in which to find
+extensions is @file{/usr/local/lib/gawk}. You can also specify a search
+path with a list of directories to search for compiled extensions.
+@xref{AWKLIBPATH Variable}, for more information.
+
+@node Extension Example
+@section Example: Some File Functions
+
+@quotation
+@i{No matter where you go, there you are.} @*
+Buckaroo Banzai
+@end quotation
+
+@c It's enough to show chdir and stat, no need for fts
+
+Two useful functions that are not in @command{awk} are @code{chdir()} (so
+that an @command{awk} program can change its directory) and @code{stat()}
+(so that an @command{awk} program can gather information about a file).
+This @value{SECTION} implements these functions for @command{gawk}
+in an extension.
+
+@menu
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
+@end menu
+
+@node Internal File Description
+@subsection Using @code{chdir()} and @code{stat()}
+
+This @value{SECTION} shows how to use the new functions at
+the @command{awk} level once they've been integrated into the
+running @command{gawk} interpreter. Using @code{chdir()} is very
+straightforward. It takes one argument, the new directory to change to:
+
+@example
+@@load "filefuncs"
+@dots{}
+newdir = "/home/arnold/funstuff"
+ret = chdir(newdir)
+if (ret < 0) @{
+ printf("could not change to %s: %s\n",
+ newdir, ERRNO) > "/dev/stderr"
+ exit 1
+@}
+@dots{}
+@end example
+
+The return value is negative if the @code{chdir()} failed, and
+@code{ERRNO} (@pxref{Built-in Variables}) is set to a string indicating
+the error.
+
+Using @code{stat()} is a bit more complicated. The C @code{stat()}
+function fills in a structure that has a fair amount of information.
+The right way to model this in @command{awk} is to fill in an associative
+array with the appropriate information:
+
+@c broke printf for page breaking
+@example
+file = "/home/arnold/.profile"
+ret = stat(file, fdata)
+if (ret < 0) @{
+ printf("could not stat %s: %s\n",
+ file, ERRNO) > "/dev/stderr"
+ exit 1
+@}
+printf("size of %s is %d bytes\n", file, fdata["size"])
+@end example
+
+The @code{stat()} function always clears the data array, even if
+the @code{stat()} fails. It fills in the following elements:
+
+@table @code
+@item "name"
+The name of the file that was @code{stat()}'ed.
+
+@item "dev"
+@itemx "ino"
+The file's device and inode numbers, respectively.
+
+@item "mode"
+The file's mode, as a numeric value. This includes both the file's
+type and its permissions.
+
+@item "nlink"
+The number of hard links (directory entries) the file has.
+
+@item "uid"
+@itemx "gid"
+The numeric user and group ID numbers of the file's owner.
+
+@item "size"
+The size in bytes of the file.
+
+@item "blocks"
+The number of disk blocks the file actually occupies. This may not
+be a function of the file's size if the file has holes.
+
+@item "atime"
+@itemx "mtime"
+@itemx "ctime"
+The file's last access, modification, and inode update times,
+respectively. These are numeric timestamps, suitable for formatting
+with @code{strftime()}
+(@pxref{Time Functions}).
+
+@item "pmode"
+The file's ``printable mode.'' This is a string representation of
+the file's type and permissions, such as is produced by
+@samp{ls -l}---for example, @code{"drwxr-xr-x"}.
+
+@item "type"
+A printable string representation of the file's type. The value
+is one of the following:
+
+@table @code
+@item "blockdev"
+@itemx "chardev"
+The file is a block or character device (``special file'').
+
+@ignore
+@item "door"
+The file is a Solaris ``door'' (special file used for
+interprocess communications).
+@end ignore
+
+@item "directory"
+The file is a directory.
+
+@item "fifo"
+The file is a named pipe (also known as a FIFO).
+
+@item "file"
+The file is just a regular file.
+
+@item "socket"
+The file is an @code{AF_UNIX} (``Unix domain'') socket in the
+filesystem.
+
+@item "symlink"
+The file is a symbolic link.
+@end table
+@end table
+
+Several additional elements may be present depending upon the operating
+system and the type of the file. You can test for them in your @command{awk}
+program by using the @code{in} operator
+(@pxref{Reference to Elements}):
+
+@table @code
+@item "blksize"
+The preferred block size for I/O to the file. This field is not
+present on all POSIX-like systems in the C @code{stat} structure.
+
+@item "linkval"
+If the file is a symbolic link, this element is the name of the
+file the link points to (i.e., the value of the link).
+
+@item "rdev"
+@itemx "major"
+@itemx "minor"
+If the file is a block or character device file, then these values
+represent the numeric device number and the major and minor components
+of that number, respectively.
+@end table
+
+@node Internal File Ops
+@subsection C Code for @code{chdir()} and @code{stat()}
+
+Here is the C code for these extensions.@footnote{This version is
+edited slightly for presentation. See @file{extension/filefuncs.c}
+in the @command{gawk} distribution for the complete version.}
+
+The file includes a number of standard header files, and then includes
+the @file{gawkapi.h} header file which provides the API definitions.
+Those are followed by the necessary variable declarations
+to make use of the API macros and boilerplate code
+(@pxref{Extension API Boilerplate}).
+
+@c break line for page breaking
+@example
+#ifdef HAVE_CONFIG_H
+#include <config.h>
+#endif
+
+#include <stdio.h>
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "gawkapi.h"
+
+#include "gettext.h"
+#define _(msgid) gettext(msgid)
+#define N_(msgid) msgid
+
+#include "gawkfts.h"
+#include "stack.h"
+
+static const gawk_api_t *api; /* for convenience macros to work */
+static awk_ext_id_t *ext_id;
+static awk_bool_t init_filefuncs(void);
+static awk_bool_t (*init_func)(void) = init_filefuncs;
+static const char *ext_version = "filefuncs extension: version 1.0";
+
+int plugin_is_GPL_compatible;
+@end example
+
+@cindex programming conventions, @command{gawk} internals
+By convention, for an @command{awk} function @code{foo()}, the C function
+that implements it is called @code{do_foo()}. The function should have
+two arguments: the first is an @code{int}, usually called @code{nargs},
+that represents the number of actual arguments for the function.
+The second is a pointer to an @code{awk_value_t}, usually named
+@code{result}.
+
+@example
+/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
+
+static awk_value_t *
+do_chdir(int nargs, awk_value_t *result)
+@{
+ awk_value_t newdir;
+ int ret = -1;
+
+ assert(result != NULL);
+
+ if (do_lint && nargs != 1)
+ lintwarn(ext_id,
+ _("chdir: called with incorrect number of arguments, "
+ "expecting 1"));
+@end example
+
+The @code{newdir}
+variable represents the new directory to change to, retrieved
+with @code{get_argument()}. Note that the first argument is
+numbered zero.
+
+If the argument is retrieved successfully, the function calls the
+@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO}
+is updated.
+
+@example
+ if (get_argument(0, AWK_STRING, & newdir)) @{
+ ret = chdir(newdir.str_value.str);
+ if (ret < 0)
+ update_ERRNO_int(errno);
+ @}
+@end example
+
+Finally, the function returns the return value to the @command{awk} level:
+
+@example
+ return make_number(ret, result);
+@}
+@end example
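+
+The same control flow can be tried outside of @command{gawk}. The
+following is only a sketch: the name @code{try_chdir()} is hypothetical,
+and @code{strerror()} stands in for what the extension does through
+@code{update_ERRNO_int()}.
+
```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* try_chdir: hypothetical helper, not part of filefuncs.c.
   Calls chdir() and, on failure, prints the reason to stderr.
   Returns the value chdir() returned (0, or negative on error). */
int
try_chdir(const char *newdir)
{
    int ret = chdir(newdir);

    if (ret < 0)
        fprintf(stderr, "could not change to %s: %s\n",
                newdir, strerror(errno));
    return ret;
}
```
+
+As in @code{do_chdir()}, the caller sees only the numeric result; the
+error text travels separately.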
+
+The @code{stat()} extension is more involved. First comes a function
+that turns a numeric mode into a printable representation
+(e.g., octal 0644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
+
+@c break line for page breaking
+@example
+/* format_mode --- turn a stat mode field into something readable */
+
+static char *
+format_mode(unsigned long fmode)
+@{
+ @dots{}
+@}
+@end example
+
+Next comes a function for reading symbolic links, which is also
+omitted here for brevity:
+
+@example
+/* read_symlink --- read a symbolic link into an allocated buffer.
+ @dots{} */
+
+static char *
+read_symlink(const char *fname, size_t bufsize, ssize_t *linksize)
+@{
+ @dots{}
+@}
+@end example
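+
+One plausible way to read a link into an allocated buffer is to call
+@code{readlink()} with a growing buffer until the value fits. The sketch
+below assumes that approach; @code{read_link_sketch()} is a hypothetical
+name with a simpler signature than the real @code{read_symlink()}.
+
```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

/* read_link_sketch: hypothetical helper, not the distribution's
   read_symlink(). Returns a malloc()ed, NUL-terminated copy of the
   link's value and stores its length in *linksize, or returns NULL
   if fname cannot be read as a symbolic link. */
char *
read_link_sketch(const char *fname, ssize_t *linksize)
{
    size_t bufsize = 128;

    for (;;) {
        char *buf = malloc(bufsize);
        ssize_t n;

        if (buf == NULL)
            return NULL;

        n = readlink(fname, buf, bufsize);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if ((size_t) n < bufsize) {     /* fits, with room for the NUL */
            buf[n] = '\0';
            *linksize = n;
            return buf;
        }

        free(buf);                      /* possibly truncated: retry larger */
        bufsize *= 2;
    }
}
```
+
+The caller owns the returned buffer, which matches the use of
+@code{make_malloced_string()} for the @code{"linkval"} element.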
+
+Two helper functions simplify entering values in the
+array that will contain the result of the @code{stat()}:
+
+@example
+/* array_set --- set an array element */
+
+static void
+array_set(awk_array_t array, const char *sub, awk_value_t *value)
+@{
+ awk_value_t index;
+
+ set_array_element(array,
+ make_const_string(sub, strlen(sub), & index),
+ value);
+
+@}
+
+/* array_set_numeric --- set an array element with a number */
+
+static void
+array_set_numeric(awk_array_t array, const char *sub, double num)
+@{
+ awk_value_t tmp;
+
+ array_set(array, sub, make_number(num, & tmp));
+@}
+@end example
+
+The following function does most of the work to fill in
+the @code{awk_array_t} result array with values obtained
+from a valid @code{struct stat}. It is done in a separate function
+to support the @code{stat()} function for @command{gawk} and also
+to support the @code{fts()} extension which is included in
+the same file but whose code is not shown here
+(@pxref{Extension Sample File Functions}).
+
+The first part of the function is variable declarations,
+including a table to map file types to strings:
+
+@example
+/* fill_stat_array --- do the work to fill an array with stat info */
+
+static int
+fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf)
+@{
+ char *pmode; /* printable mode */
+ const char *type = "unknown";
+ awk_value_t tmp;
+ static struct ftype_map @{
+ unsigned int mask;
+ const char *type;
+ @} ftype_map[] = @{
+ @{ S_IFREG, "file" @},
+ @{ S_IFBLK, "blockdev" @},
+ @{ S_IFCHR, "chardev" @},
+ @{ S_IFDIR, "directory" @},
+#ifdef S_IFSOCK
+ @{ S_IFSOCK, "socket" @},
+#endif
+#ifdef S_IFIFO
+ @{ S_IFIFO, "fifo" @},
+#endif
+#ifdef S_IFLNK
+ @{ S_IFLNK, "symlink" @},
+#endif
+#ifdef S_IFDOOR /* Solaris weirdness */
+ @{ S_IFDOOR, "door" @},
+#endif /* S_IFDOOR */
+ @};
+ int j, k;
+@end example
+
+The destination array is cleared, and then code fills in
+various elements based on values in the @code{struct stat}:
+
+@example
+ /* empty out the array */
+ clear_array(array);
+
+ /* fill in the array */
+ array_set(array, "name", make_const_string(name, strlen(name),
+ & tmp));
+ array_set_numeric(array, "dev", sbuf->st_dev);
+ array_set_numeric(array, "ino", sbuf->st_ino);
+ array_set_numeric(array, "mode", sbuf->st_mode);
+ array_set_numeric(array, "nlink", sbuf->st_nlink);
+ array_set_numeric(array, "uid", sbuf->st_uid);
+ array_set_numeric(array, "gid", sbuf->st_gid);
+ array_set_numeric(array, "size", sbuf->st_size);
+ array_set_numeric(array, "blocks", sbuf->st_blocks);
+ array_set_numeric(array, "atime", sbuf->st_atime);
+ array_set_numeric(array, "mtime", sbuf->st_mtime);
+ array_set_numeric(array, "ctime", sbuf->st_ctime);
+
+ /* for block and character devices, add rdev,
+ major and minor numbers */
+ if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) @{
+ array_set_numeric(array, "rdev", sbuf->st_rdev);
+ array_set_numeric(array, "major", major(sbuf->st_rdev));
+ array_set_numeric(array, "minor", minor(sbuf->st_rdev));
+ @}
+@end example
+
+@noindent
+The latter part of the function makes selective additions
+to the destination array, depending upon the availability of
+certain members and/or the type of the file. It then returns zero,
+for success:
+
+@example
+#ifdef HAVE_ST_BLKSIZE
+ array_set_numeric(array, "blksize", sbuf->st_blksize);
+#endif /* HAVE_ST_BLKSIZE */
+
+ pmode = format_mode(sbuf->st_mode);
+ array_set(array, "pmode", make_const_string(pmode, strlen(pmode),
+ & tmp));
+
+ /* for symbolic links, add a linkval field */
+ if (S_ISLNK(sbuf->st_mode)) @{
+ char *buf;
+ ssize_t linksize;
+
+ if ((buf = read_symlink(name, sbuf->st_size,
+ & linksize)) != NULL)
+ array_set(array, "linkval",
+ make_malloced_string(buf, linksize, & tmp));
+ else
+ warning(ext_id, _("stat: unable to read symbolic link `%s'"),
+ name);
+ @}
+
+ /* add a type field */
+ type = "unknown"; /* shouldn't happen */
+ for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) @{
+ if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) @{
+ type = ftype_map[j].type;
+ break;
+ @}
+ @}
+
+ array_set(array, "type", make_const_string(type, strlen(type), &tmp));
+
+ return 0;
+@}
+@end example
+
+Finally, here is the @code{do_stat()} function. It starts with
+variable declarations and argument checking:
+
+@ignore
+Changed message for page breaking. Used to be:
+ "stat: called with incorrect number of arguments (%d), should be 2",
+@end ignore
+@example
+/* do_stat --- provide a stat() function for gawk */
+
+static awk_value_t *
+do_stat(int nargs, awk_value_t *result)
+@{
+ awk_value_t file_param, array_param;
+ char *name;
+ awk_array_t array;
+ int ret;
+ struct stat sbuf;
+ int (*statfunc)(const char *path, struct stat *sbuf) = lstat; /* default */
+
+ assert(result != NULL);
+
+ if (nargs != 2 && nargs != 3) @{
+ if (do_lint)
+ lintwarn(ext_id, _("stat: called with wrong number of arguments"));
+ return make_number(-1, result);
+ @}
+@end example
+
+The third argument to @code{stat()} was not discussed previously. This argument
+is optional. If present, it causes @code{stat()} to use the @code{stat()}
+system call instead of the @code{lstat()} system call.
+
+Then comes the actual work. First, the function gets the arguments.
+Next, it gets the information for the file.
+By default, the code uses @code{lstat()} (instead of @code{stat()})
+to get the file information,
+so that a symbolic link is described itself rather than the file it points to.
+If there's an error, it sets @code{ERRNO} and returns:
+
+@example
+ /* file is first arg, array to hold results is second */
+ if ( ! get_argument(0, AWK_STRING, & file_param)
+ || ! get_argument(1, AWK_ARRAY, & array_param)) @{
+ warning(ext_id, _("stat: bad parameters"));
+ return make_number(-1, result);
+ @}
+
+ if (nargs == 3) @{
+ statfunc = stat;
+ @}
+
+ name = file_param.str_value.str;
+ array = array_param.array_cookie;
+
+ /* always empty out the array */
+ clear_array(array);
+
+ /* stat the file, if error, set ERRNO and return */
+ ret = statfunc(name, & sbuf);
+ if (ret < 0) @{
+ update_ERRNO_int(errno);
+ return make_number(ret, result);
+ @}
+@end example
+
+The tedious work is done by @code{fill_stat_array()}, shown
+earlier. When done, return the result from @code{fill_stat_array()}:
+
+@example
+ ret = fill_stat_array(name, array, & sbuf);
+
+ return make_number(ret, result);
+@}
+@end example
+
+@cindex programming conventions, @command{gawk} internals
+Finally, it's necessary to provide the ``glue'' that loads the
+new function(s) into @command{gawk}.
+
+The @code{filefuncs} extension also provides an @code{fts()}
+function, which we omit here. For its sake there is an initialization
+function:
+
+@example
+/* init_filefuncs --- initialization routine */
+
+static awk_bool_t
+init_filefuncs(void)
+@{
+ @dots{}
+@}
+@end example
+
+We are almost done. We need an array of @code{awk_ext_func_t}
+structures for loading each function into @command{gawk}:
+
+@example
+static awk_ext_func_t func_table[] = @{
+ @{ "chdir", do_chdir, 1 @},
+ @{ "stat", do_stat, 2 @},
+ @{ "fts", do_fts, 3 @},
+@};
+@end example
+
+Each extension must have a routine named @code{dl_load()} to load
+everything that needs to be loaded. It is simplest to use the
+@code{dl_load_func()} macro in @code{gawkapi.h}:
+
+@example
+/* define the dl_load() function using the boilerplate macro */
+
+dl_load_func(func_table, filefuncs, "")
+@end example
+
+And that's it! As an exercise, consider adding functions to
+implement system calls such as @code{chown()}, @code{chmod()},
+and @code{umask()}.
+
+@node Using Internal File Ops
+@subsection Integrating The Extensions
+
+@cindex @command{gawk}, interpreter@comma{} adding code to
+Now that the code is written, it must be possible to add it at
+runtime to the running @command{gawk} interpreter. First, the
+code must be compiled. Assuming that the functions are in
+a file named @file{filefuncs.c}, and @var{idir} is the location
+of the @file{gawkapi.h} header file,
+the following steps@footnote{In practice, you would probably want to
+use the GNU Autotools---Automake, Autoconf, Libtool, and Gettext---to
+configure and build your libraries. Instructions for doing so are beyond
+the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to
+the tools.} create a GNU/Linux shared library:
+
+@example
+$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c}
+$ @kbd{ld -o filefuncs.so -shared filefuncs.o -lc}
+@end example
+
+Once the library exists, it is loaded by using the @code{@@load} keyword.
+
+@example
+# file testff.awk
+@@load "filefuncs"
+
+BEGIN @{
+ "pwd" | getline curdir # save current directory
+ close("pwd")
+
+ chdir("/tmp")
+ system("pwd") # test it
+ chdir(curdir) # go back
+
+ print "Info for testff.awk"
+ ret = stat("testff.awk", data)
+ print "ret =", ret
+ for (i in data)
+ printf "data[\"%s\"] = %s\n", i, data[i]
+ print "testff.awk modified:",
+ strftime("%m %d %y %H:%M:%S", data["mtime"])
+
+ print "\nInfo for JUNK"
+ ret = stat("JUNK", data)
+ print "ret =", ret
+ for (i in data)
+ printf "data[\"%s\"] = %s\n", i, data[i]
+ print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
+@}
+@end example
+
+The @env{AWKLIBPATH} environment variable tells
+@command{gawk} where to find shared libraries (@pxref{Finding Extensions}).
+We set it to the current directory and run the program:
+
+@example
+$ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk}
+@print{} /tmp
+@print{} Info for testff.awk
+@print{} ret = 0
+@print{} data["blksize"] = 4096
+@print{} data["mtime"] = 1350838628
+@print{} data["mode"] = 33204
+@print{} data["type"] = file
+@print{} data["dev"] = 2053
+@print{} data["gid"] = 1000
+@print{} data["ino"] = 1719496
+@print{} data["ctime"] = 1350838628
+@print{} data["blocks"] = 8
+@print{} data["nlink"] = 1
+@print{} data["name"] = testff.awk
+@print{} data["atime"] = 1350838632
+@print{} data["pmode"] = -rw-rw-r--
+@print{} data["size"] = 662
+@print{} data["uid"] = 1000
+@print{} testff.awk modified: 10 21 12 18:57:08
+@print{}
+@print{} Info for JUNK
+@print{} ret = -1
+@print{} JUNK modified: 01 01 70 02:00:00
+@end example
+
+@node Extension Samples
+@section The Sample Extensions In The @command{gawk} Distribution
+
+This @value{SECTION} provides brief overviews of the sample extensions
+that come in the @command{gawk} distribution. Some of them are intended
+for production use, such as the @code{filefuncs} and @code{readdir} extensions.
+Others mainly provide example code that shows how to use the extension API.
+
+@menu
+* Extension Sample File Functions:: The file functions sample.
+* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
+* Extension Sample Fork:: An interface to @code{fork()} and other
+ process functions.
+* Extension Sample Ord:: Character to value to character
+ conversions.
+* Extension Sample Readdir:: An interface to @code{readdir()}.
+* Extension Sample Revout:: Reversing output sample output wrapper.
+* Extension Sample Rev2way:: Reversing data sample two-way processor.
+* Extension Sample Read write array:: Serializing an array to a file.
+* Extension Sample Readfile:: Reading an entire file into a string.
+* Extension Sample API Tests:: Tests for the API.
+* Extension Sample Time:: An interface to @code{gettimeofday()}
+ and @code{sleep()}.
+@end menu
+
+@node Extension Sample File Functions
+@subsection File Related Functions
+
+The @code{filefuncs} extension provides three different functions.
+The usage is:
+
+@table @code
+@item @@load "filefuncs"
+This is how you load the extension.
+
+@item result = chdir("/some/directory")
+The @code{chdir()} function is a direct hook to the @code{chdir()}
+system call to change the current directory. It returns zero
+upon success or less than zero upon error. In the latter case it updates
+@code{ERRNO}.
+
+@item result = stat("/some/path", statdata [, follow])
+The @code{stat()} function provides a hook into the
+@code{stat()} system call.
+It returns zero upon success or less than zero upon error.
+In the latter case it updates @code{ERRNO}.
+
+By default, it uses the @code{lstat()} system call. However, if passed
+a third argument, it uses @code{stat()} instead.
+
+In all cases, it clears the @code{statdata} array.
+When the call is successful, @code{stat()} fills the @code{statdata}
+array with information retrieved from the filesystem, as follows:
+
+@c nested table
+@multitable @columnfractions .25 .60
+@item @code{statdata["name"]} @tab
+The name of the file.
+
+@item @code{statdata["dev"]} @tab
+Corresponds to the @code{st_dev} field in the @code{struct stat}.
+
+@item @code{statdata["ino"]} @tab
+Corresponds to the @code{st_ino} field in the @code{struct stat}.
+
+@item @code{statdata["mode"]} @tab
+Corresponds to the @code{st_mode} field in the @code{struct stat}.
+
+@item @code{statdata["nlink"]} @tab
+Corresponds to the @code{st_nlink} field in the @code{struct stat}.
+
+@item @code{statdata["uid"]} @tab
+Corresponds to the @code{st_uid} field in the @code{struct stat}.
+
+@item @code{statdata["gid"]} @tab
+Corresponds to the @code{st_gid} field in the @code{struct stat}.
+
+@item @code{statdata["size"]} @tab
+Corresponds to the @code{st_size} field in the @code{struct stat}.
+
+@item @code{statdata["atime"]} @tab
+Corresponds to the @code{st_atime} field in the @code{struct stat}.
+
+@item @code{statdata["mtime"]} @tab
+Corresponds to the @code{st_mtime} field in the @code{struct stat}.
+
+@item @code{statdata["ctime"]} @tab
+Corresponds to the @code{st_ctime} field in the @code{struct stat}.
+
+@item @code{statdata["rdev"]} @tab
+Corresponds to the @code{st_rdev} field in the @code{struct stat}.
+This element is only present for device files.
+
+@item @code{statdata["major"]} @tab
+The major device number, extracted from the @code{st_rdev} field
+in the @code{struct stat}.
+This element is only present for device files.
+
+@item @code{statdata["minor"]} @tab
+The minor device number, extracted from the @code{st_rdev} field
+in the @code{struct stat}.
+This element is only present for device files.
+
+@item @code{statdata["blksize"]} @tab
+Corresponds to the @code{st_blksize} field in the @code{struct stat},
+if this field is present on your system.
+(It is present on all modern systems that we know of.)
+
+@item @code{statdata["pmode"]} @tab
+A human-readable version of the mode value, such as printed by
+@command{ls}. For example, @code{"-rwxr-xr-x"}.
+
+@item @code{statdata["linkval"]} @tab
+If the named file is a symbolic link, this element will exist
+and its value is the value of the symbolic link (where the
+symbolic link points to).
+
+@item @code{statdata["type"]} @tab
+The type of the file as a string. One of
+@code{"file"},
+@code{"blockdev"},
+@code{"chardev"},
+@code{"directory"},
+@code{"socket"},
+@code{"fifo"},
+@code{"symlink"},
+@code{"door"},
+or
+@code{"unknown"}.
+Not all systems support all file types.
+@end multitable
+
+@item flags = or(FTS_PHYSICAL, ...)
+@itemx result = fts(pathlist, flags, filedata)
+Walk the file trees provided in @code{pathlist} and fill in the
+@code{filedata} array as described below. @code{flags} is the bitwise
+OR of several predefined constant values, also as described below.
+Return zero if there were no errors, otherwise return @minus{}1.
+@end table
+
+The @code{fts()} function provides a hook to the C library @code{fts()}
+routines for traversing file hierarchies. Instead of returning data
+about one file at a time in a stream, it fills in a multi-dimensional
+array with data about each file and directory encountered in the requested
+hierarchies.
+
+The arguments are as follows:
+
+@table @code
+@item pathlist
+An array of filenames. The element values are used; the index values are ignored.
+
+@item flags
+This should be the bitwise OR of one or more of the following
+predefined constant flag values. At least one of
+@code{FTS_LOGICAL} or @code{FTS_PHYSICAL} must be provided; otherwise
+@code{fts()} returns an error value and sets @code{ERRNO}.
+The flags are:
+
+@c nested table
+@table @code
+@item FTS_LOGICAL
+Do a ``logical'' file traversal, where the information returned for
+a symbolic link refers to the linked-to file, and not to the symbolic
+link itself. This flag is mutually exclusive with @code{FTS_PHYSICAL}.
+
+@item FTS_PHYSICAL
+Do a ``physical'' file traversal, where the information returned for a
+symbolic link refers to the symbolic link itself. This flag is mutually
+exclusive with @code{FTS_LOGICAL}.
+
+@item FTS_NOCHDIR
+As a performance optimization, the C library @code{fts()} routines
+change directory as they traverse a file hierarchy. This flag disables
+that optimization.
+
+@item FTS_COMFOLLOW
+Immediately follow a symbolic link named in @code{pathlist},
+whether or not @code{FTS_LOGICAL} is set.
+
+@item FTS_SEEDOT
+By default, the @code{fts()} routines do not return entries for @file{.}
+and @file{..}. This option causes entries for @file{..} to also
+be included. (The extension always includes an entry for @file{.},
+see below.)
+
+@item FTS_XDEV
+During a traversal, do not cross onto a different mounted filesystem.
+@end table
+
+@item filedata
+The @code{filedata} array is first cleared. Then, @code{fts()} creates
+an element in @code{filedata} for every element in @code{pathlist}.
+The index is the name of the directory or file given in @code{pathlist}.
+The element for this index is itself an array. There are two cases.
+
+@c nested table
+@table @emph
+@item The path is a file.
+In this case, the array contains two or three elements:
+
+@c doubly nested table
+@table @code
+@item "path"
+The full path to this file, starting from the ``root'' that was given
+in the @code{pathlist} array.
+
+@item "stat"
+This element is itself an array, containing the same information as provided
+by the @code{stat()} function described earlier for its
+@code{statdata} argument. The element may not be present if
+the @code{stat()} system call for the file failed.
+
+@item "error"
+If some kind of error was encountered, the array will also
+contain an element named @code{"error"}, which is a string describing the error.
+@end table
+
+@item The path is a directory.
+In this case, the array contains one element for each entry in the
+directory. If an entry is a file, that element is as for files, just
+described. If the entry is a directory, that element is (recursively)
+an array describing the subdirectory. If @code{FTS_SEEDOT} was provided
+in the flags, then there will also be an element named @code{".."}. This
+element will be an array containing the data as provided by @code{stat()}.
+
+In addition, there will be an element whose index is @code{"."}.
+This element is an array containing the same two or three elements as
+for a file: @code{"path"}, @code{"stat"}, and @code{"error"}.
+@end table
+@end table
+
+The @code{fts()} function returns zero if there were no errors.
+Otherwise it returns @minus{}1.
+
+@quotation NOTE
+The @code{fts()} extension does not exactly mimic the
+interface of the C library @code{fts()} routines, choosing instead to
+provide an interface that is based on associative arrays, which should
+be more comfortable to use from an @command{awk} program. This includes the
+lack of a comparison function, since @command{gawk} already provides
+powerful array sorting facilities. While an @code{fts_read()}-like
+interface could have been provided, this felt less natural than simply
+creating a multi-dimensional array to represent the file hierarchy and
+its information.
+@end quotation
+
+See @file{test/fts.awk} in the @command{gawk} distribution for an example.
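+
+For comparison, here is a sketch of the stream-style C interface that the
+extension deliberately does not mimic. It assumes a C library that
+provides @file{fts.h}, as the GNU C Library does; the helper name
+@code{count_tree_entries()} is hypothetical.
+
```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stddef.h>

/* count_tree_entries: hypothetical helper. Walks the hierarchy rooted
   at root with the C library fts routines and returns the number of
   entries visited, or -1 if the traversal could not be started. */
long
count_tree_entries(const char *root)
{
    char *paths[2];
    FTS *tree;
    FTSENT *ent;
    long count = 0;

    paths[0] = (char *) root;   /* fts_open() takes char * const * */
    paths[1] = NULL;

    tree = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (tree == NULL)
        return -1;

    /* one entry at a time, in a stream */
    while ((ent = fts_read(tree)) != NULL) {
        if (ent->fts_info == FTS_DP)
            continue;           /* skip the second, post-order visit of a directory */
        count++;
    }

    fts_close(tree);
    return count;
}
```
+
+The C caller sees one entry per call to @code{fts_read()}, whereas the
+@command{awk}-level @code{fts()} hands back the whole hierarchy at once
+in @code{filedata}.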
+
+@node Extension Sample Fnmatch
+@subsection Interface To @code{fnmatch()}
+
+This extension provides an interface to the C library
+@code{fnmatch()} function. The usage is:
+
+@example
+@@load "fnmatch"
+
+result = fnmatch(pattern, string, flags)
+@end example
+
+The @code{fnmatch} extension adds a single function named
+@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of
+flag values named @code{FNM}.
+
+The arguments to @code{fnmatch()} are:
+
+@table @code
+@item pattern
+The filename wildcard to match.
+
+@item string
+The filename string to match against @code{pattern}.
+
+@item flags
+Either zero, or the bitwise OR of one or more of the
+flags in the @code{FNM} array.
+@end table
+
+The return value is zero on success, @code{FNM_NOMATCH}
+if the string did not match the pattern, or
+a different non-zero value if an error occurred.
+
+The flags are as follows:
+
+@multitable @columnfractions .25 .75
+@item @code{FNM["CASEFOLD"]} @tab
+Corresponds to the @code{FNM_CASEFOLD} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["FILE_NAME"]} @tab
+Corresponds to the @code{FNM_FILE_NAME} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["LEADING_DIR"]} @tab
+Corresponds to the @code{FNM_LEADING_DIR} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["NOESCAPE"]} @tab
+Corresponds to the @code{FNM_NOESCAPE} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["PATHNAME"]} @tab
+Corresponds to the @code{FNM_PATHNAME} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["PERIOD"]} @tab
+Corresponds to the @code{FNM_PERIOD} flag as defined in @code{fnmatch()}.
+@end multitable
+
+Here is an example:
+
+@example
+@@load "fnmatch"
+@dots{}
+flags = or(FNM["PERIOD"], FNM["NOESCAPE"])
+if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH)
+ print "no match"
+@end example
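+
+The underlying C library call behaves the same way. The sketch below
+wraps @code{fnmatch()} in a hypothetical helper,
+@code{wildcard_match()}, that separates the three possible outcomes
+described above.
+
```c
#include <fnmatch.h>

/* wildcard_match: hypothetical wrapper around the C library fnmatch().
   Returns 1 on a match, 0 on no match, and -1 for any other result. */
int
wildcard_match(const char *pattern, const char *string, int flags)
{
    int ret = fnmatch(pattern, string, flags);

    if (ret == 0)
        return 1;
    if (ret == FNM_NOMATCH)
        return 0;
    return -1;          /* some other nonzero value: an error */
}
```
+
+For instance, with @code{FNM_PERIOD} set, the pattern @samp{*} does not
+match a string such as @samp{.hidden}, because a leading period must be
+matched explicitly.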
+
+@node Extension Sample Fork
+@subsection Interface To @code{fork()}, @code{wait()} and @code{waitpid()}
+
+The @code{fork} extension adds three functions, as follows.
+
+@table @code
+@item @@load "fork"
+This is how you load the extension.
+
+@item pid = fork()
+This function creates a new process. The return value is zero in the
+child and the process-id number of the child in the parent, or @minus{}1
+upon error. In the latter case, @code{ERRNO} indicates the problem.
+In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are
+updated to reflect the correct values.
+
+@item ret = waitpid(pid)
+This function takes a numeric argument, which is the process-id to
+wait for. The return value is that of the
+@code{waitpid()} system call.
+
+@item ret = wait()
+This function waits for the first child to die.
+The return value is that of the
+@code{wait()} system call.
+@end table
+
+There is no corresponding @code{exec()} function.
+
+Here is an example:
+
+@example
+@@load "fork"
+@dots{}
+if ((pid = fork()) == 0)
+ print "hello from the child"
+else
+ print "hello from the parent"
+@end example
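+
+The system calls behind these functions combine as in the following
+sketch, which forks a child, lets it exit, and collects its status in
+the parent. The helper name @code{child_exit_status()} is hypothetical.
+
```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* child_exit_status: hypothetical helper. Forks a child that exits
   with the given code, waits for it, and returns the exit code that
   the parent observed, or -1 on failure. */
int
child_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: exit immediately */

    /* parent: wait for that specific child */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
+
+The three-way test on the value returned by @code{fork()} (negative,
+zero, positive) corresponds to the three cases listed for the
+@command{awk}-level function.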
+
+@node Extension Sample Ord
+@subsection Character and Numeric Values: @code{ord()} and @code{chr()}
+
+The @code{ordchr} extension adds two functions, named
+@code{ord()} and @code{chr()}, as follows.
+
+@table @code
+@item number = ord(string)
+Return the numeric value of the first character in @code{string}.
+
+@item char = chr(number)
+Return the string whose first character is that represented by @code{number}.
+@end table
+
+These functions are inspired by the Pascal language functions
+of the same name. Here is an example:
+
+@example
+@@load "ordchr"
+@dots{}
+printf("The numeric value of 'A' is %d\n", ord("A"))
+printf("The string value of 65 is %s\n", chr(65))
+@end example
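+
+In C terms, the two conversions amount to very little. The sketch below
+assumes single-byte characters; how the extension treats multibyte
+characters is not covered here, and the names @code{ord_sketch()} and
+@code{chr_sketch()} are hypothetical.
+
```c
/* ord_sketch: numeric value of the first character (byte) of s */
int
ord_sketch(const char *s)
{
    return (unsigned char) s[0];
}

/* chr_sketch: store the one-character string for number in buf */
char *
chr_sketch(int number, char buf[2])
{
    buf[0] = (char) number;
    buf[1] = '\0';
    return buf;
}
```
+
+The cast to @code{unsigned char} keeps values above 127 from becoming
+negative on systems where plain @code{char} is signed.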
+
+@node Extension Sample Readdir
+@subsection Reading Directories
+
+The @code{readdir} extension adds an input parser for directories.
+The usage is as follows:
+
+@example
+@@load "readdir"
+@end example
+
+When this extension is in use, directories named on the command line
+(or with @code{getline}) are not skipped. Instead,
+they are read, with each entry returned as a record.
+
+The record consists of three fields. The first two are the inode number and the
+filename, separated by a forward slash character.
+On systems where the directory entry contains the file type, the record
+has a third field which is a single letter indicating the type of the
+file:
+
+@multitable @columnfractions .1 .9
+@headitem Letter @tab File Type
+@item @code{b} @tab Block device
+@item @code{c} @tab Character device
+@item @code{d} @tab Directory
+@item @code{f} @tab Regular file
+@item @code{l} @tab Symbolic link
+@item @code{p} @tab Named pipe (FIFO)
+@item @code{s} @tab Socket
+@item @code{u} @tab Anything else (unknown)
+@end multitable
+
+On systems without the file type information, the third field is always
+@samp{u}.
+
+@quotation NOTE
+On GNU/Linux systems, there are filesystems that don't support the
+@code{d_type} entry (see the @i{readdir}(3) manual page), and so the file
+type is always @samp{u}. You can use the @code{filefuncs} extension to call
+@code{stat()} in order to get correct type information.
+@end quotation
+
+Here is an example:
+
+@example
+@@load "readdir"
+@dots{}
+BEGIN @{ FS = "/" @}
+@{ print "file name is", $2 @}
+@end example
+
+@node Extension Sample Revout
+@subsection Reversing Output
+
+The @code{revoutput} extension adds a simple output wrapper that reverses
+the characters in each output line. Its main purpose is to show how to
+write an output wrapper, although it may be mildly amusing for the unwary.
+Here is an example:
+
+@example
+@@load "revoutput"
+
+BEGIN @{
+ REVOUT = 1
+ print "hello, world" > "/dev/stdout"
+@}
+@end example
+
+The output from this program is:
+@samp{dlrow ,olleh}.
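+
+The transformation itself is a plain in-place reversal. A minimal sketch,
+with the hypothetical name @code{reverse_line()} and without any of the
+output wrapper machinery, might look like this:
+
```c
#include <string.h>

/* reverse_line: reverse the characters of s in place and return s */
char *
reverse_line(char *s)
{
    size_t i, n = strlen(s);

    /* swap characters from both ends, moving toward the middle */
    for (i = 0; i < n / 2; i++) {
        char tmp = s[i];

        s[i] = s[n - 1 - i];
        s[n - 1 - i] = tmp;
    }
    return s;
}
```
+
+Applied to a writable copy of @samp{hello, world}, it produces
+@samp{dlrow ,olleh}.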
+
+@node Extension Sample Rev2way
+@subsection Two-Way I/O Example
+
+The @code{revtwoway} extension adds a simple two-way processor that
+reverses the characters in each line sent to it for reading back by
+the @command{awk} program. Its main purpose is to show how to write
+a two-way processor, although it may also be mildly amusing.
+The following example shows how to use it:
+
+@example
+@@load "revtwoway"
+
+BEGIN @{
+ cmd = "/magic/mirror"
+ print "hello, world" |& cmd
+ cmd |& getline result
+ print result
+ close(cmd)
+@}
+@end example
+
+@node Extension Sample Read write array
+@subsection Dumping and Restoring An Array
+
+The @code{rwarray} extension adds two functions,
+named @code{writea()} and @code{reada()}, as follows:
+
+@table @code
+@item ret = writea(file, array)
+This function takes a string argument, which is the name of the file
+to which to dump the array, and the array itself as the second argument.
+@code{writea()} understands multidimensional arrays. It returns one on
+success, or zero upon failure.
+
+@item ret = reada(file, array)
+@code{reada()} is the inverse of @code{writea()};
+it reads the file named as its first argument, filling in
+the array named as the second argument. It clears the array first.
+Here too, the return value is one on success and zero upon failure.
+@end table
+
+The array created by @code{reada()} is identical to that written by
+@code{writea()} in the sense that the contents are the same. However,
+due to implementation issues, the array traversal order of the recreated
+array is likely to be different from that of the original array. As array
+traversal order in @command{awk} is by default undefined, this is not
+(technically) a problem. If you need to guarantee a particular traversal
+order, use the array sorting features in @command{gawk} to do so
+(@pxref{Array Sorting}).
+
+The file contains binary data. All integral values are written in network
+byte order. However, double precision floating-point values are written
+as native binary data. Thus, arrays containing only string data can
+theoretically be dumped on systems with one byte order and restored on
+systems with a different one, but this has not been tried.
+
+Here is an example:
+
+@example
+@@load "rwarray"
+@dots{}
+ret = writea("arraydump.bin", array)
+@dots{}
+ret = reada("arraydump.bin", array)
+@end example
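+
+As a fuller sketch, the following program dumps a small array, empties it,
+and then reads it back in. The array name @code{data} and its contents
+are made up for illustration, and the program assumes that the current
+directory is writable. Because the traversal order of the restored array
+is not guaranteed, the loop sets @code{PROCINFO["sorted_in"]} before
+printing:
+
+@example
+@@load "rwarray"
+
+BEGIN @{
+    # Hypothetical sample data
+    data["apple"] = 1
+    data["banana"] = 2
+    data["cherry"] = 3
+
+    # writea() returns one on success, zero on failure
+    if (! writea("arraydump.bin", data)) @{
+        print "writea() failed" > "/dev/stderr"
+        exit 1
+    @}
+
+    delete data      # empty the array before restoring it
+
+    # reada() clears the array and refills it from the file
+    if (! reada("arraydump.bin", data)) @{
+        print "reada() failed" > "/dev/stderr"
+        exit 1
+    @}
+
+    # Request a defined traversal order instead of the default one
+    PROCINFO["sorted_in"] = "@@ind_str_asc"
+    for (i in data)
+        print i, data[i]
+@}
+@end example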
+
+@node Extension Sample Readfile
+@subsection Reading An Entire File
+
+The @code{readfile} extension adds a single function
+named @code{readfile()}:
+
+@table @code
+@item result = readfile("/some/path")
+The argument is the name of the file to read. The return value is a
+string containing the entire contents of the requested file. Upon error,
+the function returns the empty string and sets @code{ERRNO}.
+@end table
+
+Here is an example:
+
+@example
+@@load "readfile"
+@dots{}
+contents = readfile("/path/to/file");
+if (contents == "" && ERRNO != "") @{
+ print("problem reading file", ERRNO) > "/dev/stderr"
+ @dots{}
+@}
+@end example
+
+@node Extension Sample API Tests
+@subsection API Tests
+
+The @code{testext} extension exercises parts of the extension API that
+are not tested by the other samples. The @file{extension/testext.c}
+file contains both the C code for the extension and @command{awk}
+test code inside C comments that run the tests. The testing framework
+extracts the @command{awk} code and runs the tests. See the source file
+for more information.
+
+@node Extension Sample Time
+@subsection Extension Time Functions
+
+@cindex time
+@cindex sleep
+
+These functions can be used by either invoking @command{gawk}
+with a command-line argument of @samp{-l time} or by
+inserting @samp{@@load "time"} in your script.
+
+@table @code
+
+@cindex @code{gettimeofday} time extension function
+@item the_time = gettimeofday()
+Return the time in seconds that has elapsed since 1970-01-01 UTC as a
+floating-point value. If the time is unavailable on this platform, return
+@minus{}1 and set @code{ERRNO}. The returned time should have sub-second
+precision, but the actual precision may vary based on the platform.
+If the standard C @code{gettimeofday()} system call is available on this
+platform, then it simply returns the value. Otherwise, if on Windows,
+it tries to use @code{GetSystemTimeAsFileTime()}.
+
+@cindex @code{sleep} time extension function
+@item result = sleep(@var{seconds})
+Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative,
+or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}.
+Otherwise, return zero after sleeping for the indicated amount of time.
+Note that @var{seconds} may be a floating-point (non-integral) value.
+Implementation details: depending on platform availability, this function
+tries to use @code{nanosleep()} or @code{select()} to implement the delay.
+@end table
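+
+For instance, the following sketch uses both functions to time a short
+delay. The quarter-second interval is arbitrary, and the measured time
+is usually slightly longer than the time requested:
+
+@example
+@@load "time"
+
+BEGIN @{
+    start = gettimeofday()
+
+    # sleep() accepts a non-integral number of seconds
+    if (sleep(0.25) < 0)
+        print "sleep() failed:", ERRNO > "/dev/stderr"
+
+    printf "elapsed: %.3f seconds\n", gettimeofday() - start
+@}
+@end example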
+
+@node gawkextlib
+@section The @code{gawkextlib} Project
+
+The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}}
+project provides a number of @command{gawk} extensions, including one for
+processing XML files. This is the evolution of the original @command{xgawk}
+(XML @command{gawk}) project.
+
+As of this writing, there are four extensions:
+
+@itemize @bullet
+@item
+XML parser extension, using the @uref{http://expat.sourceforge.net, Expat}
+XML parsing library.
+
+@item
+PostgreSQL extension.
+
+@item
+GD graphics library extension.
+
+@item
+MPFR library extension.
+This provides access to a number of MPFR functions which @command{gawk}'s
+native MPFR support does not.
+@end itemize
+
+The @code{time} extension described earlier (@pxref{Extension Sample
+Time}) was originally from this project but has been moved into the
+main @command{gawk} distribution.
+
+You can check out the code for the @code{gawkextlib} project
+using the @uref{http://git-scm.com, Git} distributed source
+code control system. The command is as follows:
+
+@example
+git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code
+@end example
+
+You will need to have the @uref{http://expat.sourceforge.net, Expat}
+XML parser library installed in order to build and use the XML extension.
+
+In addition, you must have the GNU Autotools installed
+(@uref{http://www.gnu.org/software/autoconf, Autoconf},
+@uref{http://www.gnu.org/software/automake, Automake},
+@uref{http://www.gnu.org/software/libtool, Libtool},
+and
+@uref{http://www.gnu.org/software/gettext, Gettext}).
+
+The simple recipe for building and testing @code{gawkextlib} is as follows.
+First, build and install @command{gawk}:
+
+@example
+cd .../path/to/gawk/code
+./configure --prefix=/tmp/newgawk @ii{Install in /tmp/newgawk for now}
+make && make check @ii{Build and check that all is OK}
+make install @ii{Install gawk}
+@end example
+
+Next, build @code{gawkextlib} and test it:
+
+@example
+cd .../path/to/gawkextlib-code
+./update-autotools @ii{Generate configure, etc.}
+ @ii{You may have to run this command twice}
+./configure --with-gawk=/tmp/newgawk @ii{Configure, point at ``installed'' gawk}
+make && make check @ii{Build and check that all is OK}
+@end example
+
+If you write an extension that you wish to share with other
+@command{gawk} users, please consider doing so through the
+@code{gawkextlib} project.
+
+@iftex
+@part Part IV:@* Appendices
+@end iftex
+
+@ignore
+@ifdocbook
+
+@part Part IV:@* Appendices
+
+Part IV provides the appendices, the Glossary, and two licenses that cover
the @command{gawk} source code and this @value{DOCUMENT}, respectively.
-It contains the following appendixes:
+It contains the following appendices:
@itemize @bullet
@item
@@ -27476,11 +32009,7 @@ It contains the following appendixes:
@item
@ref{GNU Free Documentation License}.
@end itemize
-
-@page
-@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
-@oddheading @| @| @strong{@thischapter}@ @ @ @thispage
-@end iftex
+@end ifdocbook
@end ignore
@node Language History
@@ -27498,8 +32027,6 @@ This @value{CHAPTER} briefly describes the
evolution of the @command{awk} language, with cross-references to other parts
of the @value{DOCUMENT} where you can find more information.
-@c FIXME: Try to determine whether it was 3.1 or 3.2 that had new awk.
-
@menu
* V7/SVR3.1:: The major changes between V7 and System V
Release 3.1.
@@ -27914,6 +32441,7 @@ and
@code{xor()}
functions for bit manipulation
(@pxref{Bitwise Functions}).
+@c In 4.1, and(), or() and xor() grew the ability to take > 2 arguments
@item
The @code{asort()} and @code{asorti()} functions for sorting arrays
@@ -27925,11 +32453,6 @@ functions for internationalization
(@pxref{Programmer i18n}).
@item
-The @code{extension()} built-in function and the ability to add
-new functions dynamically
-(@pxref{Dynamic Extensions}).
-
-@item
The @code{fflush()} function from Brian Kernighan's
version of @command{awk}
(@pxref{I/O Functions}).
@@ -27956,29 +32479,70 @@ the @option{-f} command-line option
(@pxref{Options}).
@item
-The ability to use GNU-style long-named options that start with @option{--}
+The @env{AWKLIBPATH} environment variable for specifying a path search for
+the @option{-l} command-line option
+(@pxref{Options}).
+
+@item
+The
+@option{-b},
+@option{-c},
+@option{-C},
+@option{-d},
+@option{-D},
+@option{-e},
+@option{-E},
+@option{-g},
+@option{-h},
+@option{-i},
+@option{-l},
+@option{-L},
+@option{-M},
+@option{-n},
+@option{-N},
+@option{-o},
+@option{-O},
+@option{-p},
+@option{-P},
+@option{-r},
+@option{-S},
+@option{-t},
+and
+@option{-V}
+short options. Also, the
+ability to use GNU-style long-named options that start with @option{--}
and the
+@option{--assign},
+@option{--bignum},
@option{--characters-as-bytes},
-@option{--compat},
+@option{--copyright},
+@option{--debug},
@option{--dump-variables},
-@option{--exec},
+@option{--exec},
+@option{--field-separator},
+@option{--file},
@option{--gen-pot},
+@option{--help},
+@option{--include},
@option{--lint},
@option{--lint-old},
+@option{--load},
@option{--non-decimal-data},
+@option{--optimize},
@option{--posix},
+@option{--pretty-print},
@option{--profile},
@option{--re-interval},
@option{--sandbox},
@option{--source},
@option{--traditional},
+@option{--use-lc-numeric},
and
-@option{--use-lc-numeric}
-options
+@option{--version}
+long options
(@pxref{Options}).
@end itemize
-
@c new ports
@item
@@ -28076,7 +32640,7 @@ Almost all introductory Unix literature explained range expressions
as working in this fashion, and in particular, would teach that the
``correct'' way to match lowercase letters was with @samp{[a-z]}, and
that @samp{[A-Z]} was the ``correct'' way to match uppercase letters.
-And indeed, this was true.
+And indeed, this was true.@footnote{And Life was good.}
The 1993 POSIX standard introduced the idea of locales (@pxref{Locales}).
Since many locales include other letters besides the plain twenty-six
@@ -28094,12 +32658,12 @@ But outside those locales, the ordering was defined to be based on
In many locales, @samp{A} and @samp{a} are both less than @samp{B}.
In other words, these locales sort characters in dictionary order,
and @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]};
-instead it might be equivalent to @samp{[aBbCcdXxYyz]}, for example.
+instead it might be equivalent to @samp{[ABCXYabcdxyz]}, for example.
This point needs to be emphasized: Much literature teaches that you should
use @samp{[a-z]} to match a lowercase character. But on systems with
non-ASCII locales, this also matched all of the uppercase characters
-except @samp{Z}! This was a continuous cause of confusion, even well
+except @samp{A} or @samp{Z}! This was a continuous cause of confusion, even well
into the twenty-first century.
To demonstrate these issues, the following example uses the @code{sub()}
@@ -28135,13 +32699,16 @@ the @command{gawk} maintainer grew weary of trying to explain that
@command{gawk} was being nicely standards-compliant, and that the issue
was in the user's locale. During the development of version 4.0,
he modified @command{gawk} to always treat ranges in the original,
-pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).
+pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).@footnote{And
+thus was born the Campaign for Rational Range Interpretation (or RRI). A number
+of GNU tools, such as @command{grep} and @command{sed}, have either
+implemented this change, or will soon. Thanks to Karl Berry for coining the phrase
+``Rational Range Interpretation.''}
Fortunately, shortly before the final release of @command{gawk} 4.0,
the maintainer learned that the 2008 standard had changed the
definition of ranges, such that outside the @code{"C"} and @code{"POSIX"}
-locales, the meaning of range expressions was
-@emph{undefined}.@footnote{See
+locales, the meaning of range expressions was @emph{undefined}.@footnote{See
@uref{http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05, the standard}
and
@uref{http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05, its rationale}.}
@@ -28151,7 +32718,6 @@ to implementors to implement ranges in whatever way they choose.
The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all
cases: the default regexp matching; with @option{--traditional}, and with
@option{--posix}; in all cases, @command{gawk} remains POSIX compliant.
-
@node Contributors
@appendixsec Major Contributors to @command{gawk}
@cindex @command{gawk}, list of contributors to
@@ -28284,6 +32850,7 @@ the various PC platforms.
Christos Zoulas
provided the @code{extension()}
built-in function for dynamically adding new modules.
+(This was removed at @command{gawk} 4.1.)
@item
@cindex Kahrs, J@"urgen
@@ -29712,9 +34279,8 @@ maintainers of @command{gawk}. Everything in it applies specifically to
* Compatibility Mode:: How to disable certain @command{gawk}
extensions.
* Additions:: Making Additions To @command{gawk}.
-* Dynamic Extensions:: Adding new built-in functions to
- @command{gawk}.
* Future Extensions:: New features that may be implemented one day.
+* Implementation Limitations:: Some limitations of the implementation.
@end menu
@node Compatibility Mode
@@ -29759,6 +34325,8 @@ as well as any considerations you should bear in mind.
@command{gawk}.
* New Ports:: Porting @command{gawk} to a new operating
system.
+* Derived Files:: Why derived files are kept in the
+ @command{git} repository.
@end menu
@node Accessing The Source
@@ -29790,7 +34358,7 @@ git clone http://git.savannah.gnu.org/r/gawk.git
@end example
Once you have made changes, you can use @samp{git diff} to produce a
-patch, and send that to the @command{gawk} maintainer; see @ref{Bugs}
+patch, and send that to the @command{gawk} maintainer; see @ref{Bugs},
for how to do that.
Finally, if you cannot install Git (e.g., if it hasn't been ported
@@ -29801,6 +34369,10 @@ to check out a copy using CVS, as follows:
cvs -d:pserver:anonymous@@pserver.git.sv.gnu.org:/gawk.git co -d gawk master
@end example
+Note that this gateway is flaky; you may have better luck using
+a more modern version control system, such as Bazaar, that has a Git
+plug-in for working with Git repositories.
+
@node Adding Code
@appendixsubsec Adding New Features
@@ -29902,7 +34474,8 @@ of @code{switch} statements, instead of just the
plain pointer or character value.
@item
-Use the @code{TRUE}, @code{FALSE} and @code{NULL} symbolic constants
+Use @code{true} and @code{false} for @code{bool} values,
+the @code{NULL} symbolic constant for pointer values,
and the character constant @code{'\0'} where appropriate, instead of @code{1}
and @code{0}.
@@ -29949,8 +34522,9 @@ You will also have to sign paperwork for your documentation changes.
Submit changes as unified diffs.
Use @samp{diff -u -r -N} to compare
the original @command{gawk} source tree with your version.
-I recommend using the GNU version of @command{diff}.
-Send the output produced by either run of @command{diff} to me when you
+I recommend using the GNU version of @command{diff}, or best of all,
+@samp{git diff} or @samp{git format-patch}.
+Send the output produced by @command{diff} to me when you
submit your changes.
(@xref{Bugs}, for the electronic mail
information.)
@@ -30076,840 +34650,188 @@ operating systems' code that is already there.
In the code that you supply and maintain, feel free to use a
coding style and brace layout that suits your taste.
-@node Dynamic Extensions
-@appendixsec Adding New Built-in Functions to @command{gawk}
-@cindex Robinson, Will
-@cindex robot, the
-@cindex Lost In Space
-@quotation
-@i{Danger Will Robinson! Danger!!@*
-Warning! Warning!}@*
-The Robot
-@end quotation
+@node Derived Files
+@appendixsubsec Why Generated Files Are Kept In @command{git}
-@c STARTOFRANGE gladfgaw
-@cindex @command{gawk}, functions, adding
-@c STARTOFRANGE adfugaw
-@cindex adding, functions to @command{gawk}
-@c STARTOFRANGE fubadgaw
-@cindex functions, built-in, adding to @command{gawk}
-It is possible to add new built-in
-functions to @command{gawk} using dynamically loaded libraries. This
-facility is available on systems (such as GNU/Linux) that support
-the C @code{dlopen()} and @code{dlsym()} functions.
-This @value{SECTION} describes how to write and use dynamically
-loaded extensions for @command{gawk}.
-Experience with programming in
-C or C++ is necessary when reading this @value{SECTION}.
+@c From emails written March 22, 2012, to the gawk developers list.
-@quotation CAUTION
-The facilities described in this @value{SECTION}
-are very much subject to change in a future @command{gawk} release.
-Be aware that you may have to re-do everything,
-at some future time.
-
-If you have written your own dynamic extensions,
-be sure to recompile them for each new @command{gawk} release.
-There is no guarantee of binary compatibility between different
-releases, nor will there ever be such a guarantee.
-@end quotation
+If you look at the @command{gawk} source in the @command{git}
+repository, you will notice that it includes files that are automatically
+generated by GNU infrastructure tools, such as @file{Makefile.in} from
+@command{automake} and even @file{configure} from @command{autoconf}.
-@quotation NOTE
-When @option{--sandbox} is specified, extensions are disabled
-(@pxref{Options}.
-@end quotation
+This is different from many Free Software projects that do not store
+the derived files, because that keeps the repository less cluttered,
+and it is easier to see the substantive changes when comparing versions
+and trying to understand what changed between commits.
-@menu
-* Internals:: A brief look at some @command{gawk} internals.
-* Plugin License:: A note about licensing.
-* Loading Extensions:: How to load dynamic extensions.
-* Sample Library:: A example of new functions.
-@end menu
+However, there are two reasons why the @command{gawk} maintainer
+likes to have everything in the repository.
-@node Internals
-@appendixsubsec A Minimal Introduction to @command{gawk} Internals
-@c STARTOFRANGE gawint
-@cindex @command{gawk}, internals
-
-The truth is that @command{gawk} was not designed for simple extensibility.
-The facilities for adding functions using shared libraries work, but
-are something of a ``bag on the side.'' Thus, this tour is
-brief and simplistic; would-be @command{gawk} hackers are encouraged to
-spend some time reading the source code before trying to write
-extensions based on the material presented here. Of particular note
-are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}.
-Reading @file{awkgram.y} in order to see how the parse tree is built
-would also be of use.
-
-@cindex @code{awk.h} file (internal)
-With the disclaimers out of the way, the following types, structure
-members, functions, and macros are declared in @file{awk.h} and are of
-use when writing extensions. The next @value{SECTION}
-shows how they are used:
+First, because it is then easy to reproduce any given version completely,
+without relying upon the availability of (older, likely obsolete, and
+maybe even impossible to find) other tools.
-@table @code
-@cindex floating-point, numbers, @code{AWKNUM} internal type
-@cindex numbers, floating-point, @code{AWKNUM} internal type
-@cindex @code{AWKNUM} internal type
-@cindex internal type, @code{AWKNUM}
-@item AWKNUM
-An @code{AWKNUM} is the internal type of @command{awk}
-floating-point numbers. Typically, it is a C @code{double}.
-
-@cindex @code{NODE} internal type
-@cindex internal type, @code{NODE}
-@cindex strings, @code{NODE} internal type
-@cindex numbers, @code{NODE} internal type
-@item NODE
-Just about everything is done using objects of type @code{NODE}.
-These contain both strings and numbers, as well as variables and arrays.
-
-@cindex @code{force_number()} internal function
-@cindex internal function, @code{force_number()}
-@cindex numeric, values
-@item AWKNUM force_number(NODE *n)
-This macro forces a value to be numeric. It returns the actual
-numeric value contained in the node.
-It may end up calling an internal @command{gawk} function.
-
-@cindex @code{force_string()} internal function
-@cindex internal function, @code{force_string()}
-@item void force_string(NODE *n)
-This macro guarantees that a @code{NODE}'s string value is current.
-It may end up calling an internal @command{gawk} function.
-It also guarantees that the string is zero-terminated.
-
-@cindex @code{force_wstring()} internal function
-@cindex internal function, @code{force_wstring()}
-@item void force_wstring(NODE *n)
-Similarly, this
-macro guarantees that a @code{NODE}'s wide-string value is current.
-It may end up calling an internal @command{gawk} function.
-It also guarantees that the wide string is zero-terminated.
-
-@cindex parameters@comma{} number of
-@cindex @code{nargs} internal variable
-@cindex internal variable, @code{nargs}
-@item nargs
-Inside an extension function, this is the actual number of
-parameters passed to the current function.
-
-@cindex @code{stptr} internal variable
-@cindex internal variable, @code{stptr}
-@cindex @code{stlen} internal variable
-@cindex internal variable, @code{stlen}
-@item n->stptr
-@itemx n->stlen
-The data and length of a @code{NODE}'s string value, respectively.
-The string is @emph{not} guaranteed to be zero-terminated.
-If you need to pass the string value to a C library function, save
-the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it,
-call the routine, and then restore the value.
-
-@cindex @code{wstptr} internal variable
-@cindex internal variable, @code{wstptr}
-@cindex @code{wstlen} internal variable
-@cindex internal variable, @code{wstlen}
-@item n->wstptr
-@itemx n->wstlen
-The data and length of a @code{NODE}'s wide-string value, respectively.
-Use @code{force_wstring()} to make sure these values are current.
-
-@cindex @code{type} internal variable
-@cindex internal variable, @code{type}
-@item n->type
-The type of the @code{NODE}. This is a C @code{enum}. Values should
-be one of @code{Node_var}, @code{Node_var_new}, or @code{Node_var_array}
-for function parameters.
-
-@cindex @code{vname} internal variable
-@cindex internal variable, @code{vname}
-@item n->vname
-The ``variable name'' of a node. This is not of much use inside
-externally written extensions.
-
-@cindex arrays, associative, clearing
-@cindex @code{assoc_clear()} internal function
-@cindex internal function, @code{assoc_clear()}
-@item void assoc_clear(NODE *n)
-Clears the associative array pointed to by @code{n}.
-Make sure that @samp{n->type == Node_var_array} first.
-
-@cindex arrays, elements, installing
-@cindex @code{assoc_lookup()} internal function
-@cindex internal function, @code{assoc_lookup()}
-@item NODE **assoc_lookup(NODE *symbol, NODE *subs)
-Finds, and installs if necessary, array elements.
-@code{symbol} is the array, @code{subs} is the subscript.
-This is usually a value created with @code{make_string()} (see below).
-
-@cindex strings
-@cindex @code{make_string()} internal function
-@cindex internal function, @code{make_string()}
-@item NODE *make_string(char *s, size_t len)
-Take a C string and turn it into a pointer to a @code{NODE} that
-can be stored appropriately. This is permanent storage; understanding
-of @command{gawk} memory management is helpful.
-
-@cindex numbers
-@cindex @code{make_number()} internal function
-@cindex internal function, @code{make_number()}
-@item NODE *make_number(AWKNUM val)
-Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that
-can be stored appropriately. This is permanent storage; understanding
-of @command{gawk} memory management is helpful.
-
-
-@cindex nodes@comma{} duplicating
-@cindex @code{dupnode()} internal function
-@cindex internal function, @code{dupnode()}
-@item NODE *dupnode(NODE *n)
-Duplicate a node. In most cases, this increments an internal
-reference count instead of actually duplicating the entire @code{NODE};
-understanding of @command{gawk} memory management is helpful.
-
-@cindex memory, releasing
-@cindex @code{unref()} internal function
-@cindex internal function, @code{unref()}
-@item void unref(NODE *n)
-This macro releases the memory associated with a @code{NODE}
-allocated with @code{make_string()} or @code{make_number()}.
-Understanding of @command{gawk} memory management is helpful.
-
-@cindex @code{make_builtin()} internal function
-@cindex internal function, @code{make_builtin()}
-@item void make_builtin(const char *name, NODE *(*func)(NODE *), int count)
-Register a C function pointed to by @code{func} as new built-in
-function @code{name}. @code{name} is a regular C string. @code{count}
-is the maximum number of arguments that the function takes.
-The function should be written in the following manner:
-
-@example
-/* do_xxx --- do xxx function for gawk */
-
-NODE *
-do_xxx(int nargs)
-@{
- @dots{}
-@}
-@end example
+As an extreme example, if you ever even think about trying to compile,
+oh, say, the V7 @command{awk}, you will discover that not only do you
+have to bootstrap the V7 @command{yacc} to do so, but you also need the
+V7 @command{lex}. And the latter is pretty much impossible to bring up
+on a modern GNU/Linux system.@footnote{We tried. It was painful.}
-@cindex arguments, retrieving
-@cindex @code{get_argument()} internal function
-@cindex internal function, @code{get_argument()}
-@item NODE *get_argument(int i)
-This function is called from within a C extension function to get
-the @code{i}-th argument from the function call.
-The first argument is argument zero.
-
-@cindex @code{get_actual_argument()} internal function
-@cindex internal function, @code{get_actual_argument()}
-@item NODE *get_actual_argument(int i,
-@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int@ optional,@ int@ wantarray);
-This function retrieves a particular argument @code{i}. @code{wantarray} is @code{TRUE}
-if the argument should be an array, @code{FALSE} otherwise. If @code{optional} is
-@code{TRUE}, the argument need not have been supplied. If it wasn't, the return
-value is @code{NULL}. It is a fatal error if @code{optional} is @code{TRUE} but
-the argument was not provided.
-
-@cindex @code{get_scalar_argument()} internal macro
-@cindex internal macro, @code{get_scalar_argument()}
-@item get_scalar_argument(i, opt)
-This is a convenience macro that calls @code{get_actual_argument()}.
-
-@cindex @code{get_array_argument()} internal macro
-@cindex internal macro, @code{get_array_argument()}
-@item get_array_argument(i, opt)
-This is a convenience macro that calls @code{get_actual_argument()}.
-
-@cindex functions, return values@comma{} setting
+(Or, let's say @command{gawk} 1.2 required @command{bison} whatever-it-was
+in 1989 and that there was no @file{awkgram.c} file in the repository. Is
+there a guarantee that we could find that @command{bison} version? Or that
+@emph{it} would build?)
-@cindex @code{ERRNO} variable
-@cindex @code{update_ERRNO()} internal function
-@cindex internal function, @code{update_ERRNO()}
-@item void update_ERRNO(void)
-This function is called from within a C extension function to set
-the value of @command{gawk}'s @code{ERRNO} variable, based on the current
-value of the C @code{errno} global variable.
-It is provided as a convenience.
-
-@cindex @code{ERRNO} variable
-@cindex @code{update_ERRNO_saved()} internal function
-@cindex internal function, @code{update_ERRNO_saved()}
-@item void update_ERRNO_saved(int errno_saved)
-This function is called from within a C extension function to set
-the value of @command{gawk}'s @code{ERRNO} variable, based on the error
-value provided as the argument.
-It is provided as a convenience.
-
-@cindex @code{ENVIRON} array
-@cindex @code{PROCINFO} array
-@cindex @code{register_deferred_variable()} internal function
-@cindex internal function, @code{register_deferred_variable()}
-@item void register_deferred_variable(const char *name, NODE *(*load_func)(void))
-This function is called to register a function to be called when a
-reference to an undefined variable with the given name is encountered.
-The callback function will never be called if the variable exists already,
-so, unless the calling code is running at program startup, it should first
-check whether a variable of the given name already exists.
-The argument function must return a pointer to a @code{NODE} containing the
-newly created variable. This function is used to implement the builtin
-@code{ENVIRON} and @code{PROCINFO} arrays, so you can refer to them
-for examples.
-
-@cindex @code{IOBUF} internal structure
-@cindex internal structure, @code{IOBUF}
-@cindex @code{iop_alloc()} internal function
-@cindex internal function, @code{iop_alloc()}
-@cindex @code{get_record()} input method
-@cindex @code{close_func}() input method
-@cindex @code{INVALID_HANDLE} internal constant
-@cindex internal constant, @code{INVALID_HANDLE}
-@cindex XML (eXtensible Markup Language)
-@cindex eXtensible Markup Language (XML)
-@cindex @code{register_open_hook()} internal function
-@cindex internal function, @code{register_open_hook()}
-@item void register_open_hook(void *(*open_func)(IOBUF *))
-This function is called to register a function to be called whenever
-a new data file is opened, leading to the creation of an @code{IOBUF}
-structure in @code{iop_alloc()}. After creating the new @code{IOBUF},
-@code{iop_alloc()} will call (in reverse order of registration, so the last
-function registered is called first) each open hook until one returns
-non-@code{NULL}. If any hook returns a non-@code{NULL} value, that value is assigned
-to the @code{IOBUF}'s @code{opaque} field (which will presumably point
-to a structure containing additional state associated with the input
-processing), and no further open hooks are called.
-
-The function called will most likely want to set the @code{IOBUF}'s
-@code{get_record} method to indicate that future input records should
-be retrieved by calling that method instead of using the standard
-@command{gawk} input processing.
-
-And the function will also probably want to set the @code{IOBUF}'s
-@code{close_func} method to be called when the file is closed to clean
-up any state associated with the input.
-
-Finally, hook functions should be prepared to receive an @code{IOBUF}
-structure where the @code{fd} field is set to @code{INVALID_HANDLE},
-meaning that @command{gawk} was not able to open the file itself. In
-this case, the hook function must be able to successfully open the file
-and place a valid file descriptor there.
-
-Currently, for example, the hook function facility is used to implement
-the XML parser shared library extension. For more info, please look in
-@file{awk.h} and in @file{io.c}.
-@end table
-
-An argument that is supposed to be an array needs to be handled with
-some extra code, in case the array being passed in is actually
-from a function parameter.
-
-The following boilerplate code shows how to do this:
-
-@example
-NODE *the_arg;
-
-/* assume need 3rd arg, 0-based */
-the_arg = get_array_argument(2, FALSE);
-@end example
-
-Again, you should spend time studying the @command{gawk} internals;
-don't just blindly copy this code.
-@c ENDOFRANGE gawint
-
-@node Plugin License
-@appendixsubsec Extension Licensing
+If the repository has all the generated files, then it's easy to just check
+them out and build. (Or @emph{easier}, depending upon how far back we go.
+@code{:-)})
-Every dynamic extension should define the global symbol
-@code{plugin_is_GPL_compatible} to assert that it has been licensed under
-a GPL-compatible license. If this symbol does not exist, @command{gawk}
-will emit a fatal error and exit.
+And that brings us to the second (and stronger) reason why all the files
+really need to be in @command{git}. It boils down to whom you cater
+to---the @command{gawk} developer(s), or the user who just wants to check
+out a version and try it out?
-The declared type of the symbol should be @code{int}. It does not need
-to be in any allocated section, though. The code merely asserts that
-the symbol exists in the global scope. Something like this is enough:
+The @command{gawk} maintainer
+wants it to be possible for any interested @command{awk} user in the
+world to just clone the repository, check out the branch of interest and
+build it, without needing the correct version(s) of the
+autotools.@footnote{There is one GNU program that is (in our opinion)
+severely difficult to bootstrap from the @command{git} repository. For
+example, on the author's old (but still working) PowerPC Macintosh with
+Mac OS X 10.5, it was necessary to bootstrap a ton of software, starting
+with @command{git} itself, in order to try to work with the latest code.
+It's not pleasant, and especially on older systems, it's a big waste
+of time.
-@example
-int plugin_is_GPL_compatible;
-@end example
-
-@node Loading Extensions
-@appendixsubsec Loading a Dynamic Extension
-@cindex loading extension
-@cindex @command{gawk}, functions, loading
-There are two ways to load a dynamically linked library. The first is to use the
-builtin @code{extension()}:
+Starting with the latest tarball was no picnic either. The maintainers
+had dropped @file{.gz} and @file{.bz2} files and only distribute
+@file{.tar.xz} files. It was necessary to bootstrap @command{xz} first!}
+That is the point of the @file{bootstrap.sh} file. It touches the
+various other files in the right order such that
@example
-extension(libname, init_func)
+# The canonical incantation for building GNU software:
+./bootstrap.sh && ./configure && make
@end example
-where @file{libname} is the library to load, and @samp{init_func} is the
-name of the initialization or bootstrap routine to run once loaded.
-
-The second method for dynamic loading of a library is to use the
-command line option @option{-l}:
-
-@example
-$ @kbd{gawk -l libname -f myprog}
-@end example
-
-This will work only if the initialization routine is named @code{dlload()}.
-
-If you use @code{extension()}, the library will be loaded
-at run time. This means that the functions are available only to the rest of
-your script. If you use the command line option @option{-l} instead,
-the library will be loaded before @command{gawk} starts compiling the
-actual program. The net effect is that you can use those functions
-anywhere in the program.
-
-@command{gawk} has a list of directories where it searches for libraries.
-By default, the list includes directories that depend upon how gawk was built
-and installed (@pxref{AWKPATH Variable}). If you want @command{gawk}
-to look for libraries in your private directory, you have to tell it.
-The way to do it is to set the @env{AWKPATH} environment variable
-(@pxref{AWKPATH Variable}).
-@command{gawk} supplies the default suffix @samp{.so} if it is not
-present in the name of the library.
-If the name of your library is @file{mylib.so}, you can simply type
-
-@example
-$ @kbd{gawk -l mylib -f myprog}
-@end example
-
-and @command{gawk} will do everything necessary to load in your library,
-and then call your @code{dlload()} routine.
-
-You can always specify the library using an absolute pathname, in which
-case @command{gawk} will not use @env{AWKPATH} to search for it.
-
-@node Sample Library
-@appendixsubsec Example: Directory and File Operation Built-ins
-@c STARTOFRANGE chdirg
-@cindex @code{chdir()} function@comma{} implementing in @command{gawk}
-@c STARTOFRANGE statg
-@cindex @code{stat()} function@comma{} implementing in @command{gawk}
-@c STARTOFRANGE filre
-@cindex files, information about@comma{} retrieving
-@c STARTOFRANGE dirch
-@cindex directories, changing
-
-Two useful functions that are not in @command{awk} are @code{chdir()}
-(so that an @command{awk} program can change its directory) and
-@code{stat()} (so that an @command{awk} program can gather information about
-a file).
-This @value{SECTION} implements these functions for @command{gawk} in an
-external extension library.
-
-@menu
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-@end menu
-
-@node Internal File Description
-@appendixsubsubsec Using @code{chdir()} and @code{stat()}
-
-This @value{SECTION} shows how to use the new functions at the @command{awk}
-level once they've been integrated into the running @command{gawk}
-interpreter.
-Using @code{chdir()} is very straightforward. It takes one argument,
-the new directory to change to:
-
-@example
-@dots{}
-newdir = "/home/arnold/funstuff"
-ret = chdir(newdir)
-if (ret < 0) @{
- printf("could not change to %s: %s\n",
- newdir, ERRNO) > "/dev/stderr"
- exit 1
-@}
-@dots{}
-@end example
-
-The return value is negative if the @code{chdir} failed,
-and @code{ERRNO}
-(@pxref{Built-in Variables})
-is set to a string indicating the error.
+@noindent
+will @emph{just work}.
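+
+The idea of touching files ``in the right order'' can be sketched as
+follows. This is only an illustration: the helper name
+@code{touch_in_order} and the file list are assumptions made for the
+example, and are not taken from the real @file{bootstrap.sh}. The point
+is that each generated file ends up no older than the file it depends on,
+so that @command{make} does not try to rerun the autotools.

```shell
#!/bin/sh
# Hypothetical sketch of a bootstrap-style script. The helper name and the
# file list are assumptions for illustration, not gawk's actual bootstrap.sh.

# touch_in_order FILE...: update the modification time of each existing
# file in the order given, so later files are newer than earlier ones.
touch_in_order() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            touch "$f"
            sleep 1   # keep timestamps distinct on coarse-grained filesystems
        fi
    done
}

# Prerequisites first, then the files generated from them.
touch_in_order aclocal.m4 configure config.h.in Makefile.in
```

+Files that do not exist are skipped, so the sketch is harmless when run
+outside a source tree.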
-Using @code{stat()} is a bit more complicated.
-The C @code{stat()} function fills in a structure that has a fair
-amount of information.
-The right way to model this in @command{awk} is to fill in an associative
-array with the appropriate information:
+This is extremely important for the @code{master} and
+@code{gawk-@var{X}.@var{Y}-stable} branches.
-@c broke printf for page breaking
-@example
-file = "/home/arnold/.profile"
-fdata[1] = "x" # force `fdata' to be an array
-ret = stat(file, fdata)
-if (ret < 0) @{
- printf("could not stat %s: %s\n",
- file, ERRNO) > "/dev/stderr"
- exit 1
-@}
-printf("size of %s is %d bytes\n", file, fdata["size"])
-@end example
+Further, the @command{gawk} maintainer would argue that it's also
+important for the @command{gawk} developers. When he tried to check out
+the @code{xgawk} branch@footnote{A branch created by one of the other
+developers that did not include the generated files.} to build it, he
+couldn't. (No @file{ltmain.sh} file, and he had no idea how to create it,
+and that was not the only problem.)
-The @code{stat()} function always clears the data array, even if
-the @code{stat()} fails. It fills in the following elements:
+He felt @emph{extremely} frustrated. With respect to that branch,
+the maintainer is no different than Jane User who wants to try to build
+@code{gawk-4.0-stable} or @code{master} from the repository.
-@table @code
-@item "name"
-The name of the file that was @code{stat()}'ed.
+Thus, the maintainer thinks that it's not just important, but critical,
+that for any given branch, the above incantation @emph{just works}.
-@item "dev"
-@itemx "ino"
-The file's device and inode numbers, respectively.
+@c So - that's my reasoning and philosophy.
-@item "mode"
-The file's mode, as a numeric value. This includes both the file's
-type and its permissions.
+What are some of the consequences and/or actions to take?
-@item "nlink"
-The number of hard links (directory entries) the file has.
-
-@item "uid"
-@itemx "gid"
-The numeric user and group ID numbers of the file's owner.
-
-@item "size"
-The size in bytes of the file.
-
-@item "blocks"
-The number of disk blocks the file actually occupies. This may not
-be a function of the file's size if the file has holes.
-
-@item "atime"
-@itemx "mtime"
-@itemx "ctime"
-The file's last access, modification, and inode update times,
-respectively. These are numeric timestamps, suitable for formatting
-with @code{strftime()}
-(@pxref{Built-in}).
+@enumerate 1
+@item
+We don't mind that there are differing files in the different branches
+as a result of different versions of the autotools.
-@item "pmode"
-The file's ``printable mode.'' This is a string representation of
-the file's type and permissions, such as what is produced by
-@samp{ls -l}---for example, @code{"drwxr-xr-x"}.
+@enumerate A
+@item
+It's the maintainer's job to merge them and he will deal with it.
-@item "type"
-A printable string representation of the file's type. The value
-is one of the following:
+@item
+He is really good at @samp{git diff x y > /tmp/diff1 ; gvim /tmp/diff1} to
+remove the diffs that aren't of interest in order to review code. @code{:-)}
+@end enumerate
-@table @code
-@item "blockdev"
-@itemx "chardev"
-The file is a block or character device (``special file'').
+@item
+It would certainly help if everyone used the same versions of the GNU tools
+as he does, which in general are the latest released versions of
+@command{automake},
+@command{autoconf},
+@command{bison},
+and
+@command{gettext}.
@ignore
-@item "door"
-The file is a Solaris ``door'' (special file used for
-interprocess communications).
+If it would help if I sent out an "I just upgraded to version x.y
+of tool Z" kind of message to this list, I can do that. Up until
+now it hasn't been a real issue since I'm the only one who's been
+dorking with the configuration machinery.
@end ignore
-@item "directory"
-The file is a directory.
-
-@item "fifo"
-The file is a named-pipe (also known as a FIFO).
-
-@item "file"
-The file is just a regular file.
-
-@item "socket"
-The file is an @code{AF_UNIX} (``Unix domain'') socket in the
-filesystem.
-
-@item "symlink"
-The file is a symbolic link.
-@end table
-@end table
-
-Several additional elements may be present depending upon the operating
-system and the type of the file. You can test for them in your @command{awk}
-program by using the @code{in} operator
-(@pxref{Reference to Elements}):
-
-@table @code
-@item "blksize"
-The preferred block size for I/O to the file. This field is not
-present on all POSIX-like systems in the C @code{stat} structure.
-
-@item "linkval"
-If the file is a symbolic link, this element is the name of the
-file the link points to (i.e., the value of the link).
-
-@item "rdev"
-@itemx "major"
-@itemx "minor"
-If the file is a block or character device file, then these values
-represent the numeric device number and the major and minor components
-of that number, respectively.
-@end table
-
-@node Internal File Ops
-@appendixsubsubsec C Code for @code{chdir()} and @code{stat()}
-
-Here is the C code for these extensions. They were written for
-GNU/Linux. The code needs some more work for complete portability
-to other POSIX-compliant systems:@footnote{This version is edited
-slightly for presentation. See
-@file{extension/filefuncs.c} in the @command{gawk} distribution
-for the complete version.}
-
-@c break line for page breaking
-@example
-#include "awk.h"
-
-#include <sys/sysmacros.h>
-
-int plugin_is_GPL_compatible;
-
-/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
-
-static NODE *
-do_chdir(int nargs)
-@{
- NODE *newdir;
- int ret = -1;
-
- if (do_lint && nargs != 1)
- lintwarn("chdir: called with incorrect number of arguments");
-
- newdir = get_scalar_argument(0, FALSE);
-@end example
-
-The file includes the @code{"awk.h"} header file for definitions
-for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>}
-for access to the @code{major()} and @code{minor}() macros.
-
-@cindex programming conventions, @command{gawk} internals
-By convention, for an @command{awk} function @code{foo}, the function that
-implements it is called @samp{do_foo}. The function should take
-a @samp{int} argument, usually called @code{nargs}, that
-represents the number of defined arguments for the function. The @code{newdir}
-variable represents the new directory to change to, retrieved
-with @code{get_scalar_argument()}. Note that the first argument is
-numbered zero.
-
-This code actually accomplishes the @code{chdir()}. It first forces
-the argument to be a string and passes the string value to the
-@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO}
-is updated.
-
-@example
- (void) force_string(newdir);
- ret = chdir(newdir->stptr);
- if (ret < 0)
- update_ERRNO();
-@end example
-
-Finally, the function returns the return value to the @command{awk} level:
-
-@example
- return make_number((AWKNUM) ret);
-@}
-@end example
-
-The @code{stat()} built-in is more involved. First comes a function
-that turns a numeric mode into a printable representation
-(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
+@enumerate A
+@item
+Installing from source is quite easy. It's how the maintainer worked for years
+under Fedora.
+He had @file{/usr/local/bin} at the front of his @env{PATH} and just did:
-@c break line for page breaking
@example
-/* format_mode --- turn a stat mode field into something readable */
-
-static char *
-format_mode(unsigned long fmode)
-@{
- @dots{}
-@}
+wget http://ftp.gnu.org/gnu/@var{package}/@var{package}-@var{x}.@var{y}.@var{z}.tar.gz
+tar -xpzvf @var{package}-@var{x}.@var{y}.@var{z}.tar.gz
+cd @var{package}-@var{x}.@var{y}.@var{z}
+./configure && make && make check
+make install # as root
@end example
-Next comes the @code{do_stat()} function. It starts with
-variable declarations and argument checking:
+@item
+These days the maintainer uses Ubuntu 10.11 which is medium current, but
+he is already doing the above for @command{autoconf} and @command{bison}.
@ignore
-Changed message for page breaking. Used to be:
- "stat: called with incorrect number of arguments (%d), should be 2",
+(C. Rant: Recent Linux versions with GNOME 3 really suck. What
+ are all those people thinking? Fedora 15 was such a bust it drove
+ me to Ubuntu, but Ubuntu 11.04 and 11.10 are totally unusable from
+ a UI perspective. Bleah.)
@end ignore
-@example
-/* do_stat --- provide a stat() function for gawk */
-
-static NODE *
-do_stat(int nargs)
-@{
- NODE *file, *array, *tmp;
- struct stat sbuf;
- int ret;
- NODE **aptr;
- char *pmode; /* printable mode */
- char *type = "unknown";
-
- if (do_lint && nargs > 2)
- lintwarn("stat: called with too many arguments");
-@end example
-
-Then comes the actual work. First, the function gets the arguments.
-Then, it always clears the array.
-The code use @code{lstat()} (instead of @code{stat()})
-to get the file information,
-in case the file is a symbolic link.
-If there's an error, it sets @code{ERRNO} and returns:
-
-@c comment made multiline for page breaking
-@example
- /* file is first arg, array to hold results is second */
- file = get_scalar_argument(0, FALSE);
- array = get_array_argument(1, FALSE);
-
- /* empty out the array */
- assoc_clear(array);
-
- /* lstat the file, if error, set ERRNO and return */
- (void) force_string(file);
- ret = lstat(file->stptr, & sbuf);
- if (ret < 0) @{
- update_ERRNO();
- return make_number((AWKNUM) ret);
- @}
-@end example
-
-Now comes the tedious part: filling in the array. Only a few of the
-calls are shown here, since they all follow the same pattern:
-
-@example
- /* fill in the array */
- aptr = assoc_lookup(array, tmp = make_string("name", 4));
- *aptr = dupnode(file);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("mode", 4));
- *aptr = make_number((AWKNUM) sbuf.st_mode);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("pmode", 5));
- pmode = format_mode(sbuf.st_mode);
- *aptr = make_string(pmode, strlen(pmode));
- unref(tmp);
-@end example
-
-When done, return the @code{lstat()} return value:
-
-@example
-
- return make_number((AWKNUM) ret);
-@}
-@end example
-
-@cindex programming conventions, @command{gawk} internals
-Finally, it's necessary to provide the ``glue'' that loads the
-new function(s) into @command{gawk}. By convention, each library has
-a routine named @code{dlload()} that does the job:
-
-@example
-/* dlload --- load new builtins in this library */
-
-NODE *
-dlload(NODE *tree, void *dl)
-@{
- make_builtin("chdir", do_chdir, 1);
- make_builtin("stat", do_stat, 2);
- return make_number((AWKNUM) 0);
-@}
-@end example
+@end enumerate
-And that's it! As an exercise, consider adding functions to
-implement system calls such as @code{chown()}, @code{chmod()},
-and @code{umask()}.
+@ignore
+@item
+If someone still feels really strongly about all this, then perhaps they
+can have two branches, one for their development with just the clean
+changes, and one that is buildable (xgawk and xgawk-buildable, maybe).
+Or, as I suggested in another mail, make commits in pairs, the first with
+the "real" changes and the second with "everything else needed for
+ building".
+@end ignore
+@end enumerate
-@node Using Internal File Ops
-@appendixsubsubsec Integrating the Extensions
+Most of the above was originally written by the maintainer to other
+@command{gawk} developers. It raised the objection from one of
+the developers ``@dots{} that anybody pulling down the source from
+@command{git} is not an end user.''
-@cindex @command{gawk}, interpreter@comma{} adding code to
-Now that the code is written, it must be possible to add it at
-runtime to the running @command{gawk} interpreter. First, the
-code must be compiled. Assuming that the functions are in
-a file named @file{filefuncs.c}, and @var{idir} is the location
-of the @command{gawk} include files,
-the following steps create
-a GNU/Linux shared library:
+However, this is not true. There are ``power @command{awk} users''
+who can build @command{gawk} (using the magic incantation shown previously)
+but who can't program in C. Thus, the major branches should be
+kept buildable all the time.
-@example
-$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c}
-$ @kbd{ld -o filefuncs.so -shared filefuncs.o}
-@end example
+It was then suggested that there be a @command{cron} job to create
+nightly tarballs of ``the source.'' Here, the problem is that there
+are multiple source trees, corresponding to the various branches! So,
+nightly tarballs aren't the answer, especially as the repository can go
+for weeks without significant change being introduced.
-@cindex @code{extension()} function (@command{gawk})
-Once the library exists, it is loaded by calling the @code{extension()}
-built-in function.
-This function takes two arguments: the name of the
-library to load and the name of a function to call when the library
-is first loaded. This function adds the new functions to @command{gawk}.
-It returns the value returned by the initialization function
-within the shared library:
+Fortunately, the @command{git} server can meet this need. For any given
+branch named @var{branchname}, use:
@example
-# file testff.awk
-BEGIN @{
- extension("./filefuncs.so", "dlload")
-
- chdir(".") # no-op
-
- data[1] = 1 # force `data' to be an array
- print "Info for testff.awk"
- ret = stat("testff.awk", data)
- print "ret =", ret
- for (i in data)
- printf "data[\"%s\"] = %s\n", i, data[i]
- print "testff.awk modified:",
- strftime("%m %d %y %H:%M:%S", data["mtime"])
-
- print "\nInfo for JUNK"
- ret = stat("JUNK", data)
- print "ret =", ret
- for (i in data)
- printf "data[\"%s\"] = %s\n", i, data[i]
- print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
-@}
+wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.tar.gz
@end example
-Here are the results of running the program:
+@noindent
+to retrieve a snapshot of the given branch.
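+
+Combining the snapshot URL with the build incantation shown earlier gives
+something like the following sketch. The function names
+@code{snapshot_url} and @code{fetch_and_build} are made up for this
+example, and the sketch assumes (without having verified it) that the
+tarball unpacks into a directory named after the branch.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# snapshot_url BRANCH: print the snapshot URL for BRANCH, following the
# URL pattern shown in the text.
snapshot_url() {
    printf 'http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-%s.tar.gz\n' "$1"
}

# fetch_and_build BRANCH: download, unpack, and build a branch snapshot.
# Runs in a subshell so the caller's working directory is unchanged.
# Assumption: the tarball unpacks into a directory named gawk-BRANCH.
fetch_and_build() (
    branch=$1
    wget "$(snapshot_url "$branch")" &&
    tar -xzf "gawk-$branch.tar.gz" &&
    cd "gawk-$branch" &&
    ./bootstrap.sh && ./configure && make
)
```

+For example, @samp{fetch_and_build master} would attempt to build the
+@code{master} branch.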
-@example
-$ @kbd{gawk -f testff.awk}
-@print{} Info for testff.awk
-@print{} ret = 0
-@print{} data["size"] = 607
-@print{} data["ino"] = 14945891
-@print{} data["name"] = testff.awk
-@print{} data["pmode"] = -rw-rw-r--
-@print{} data["nlink"] = 1
-@print{} data["atime"] = 1293993369
-@print{} data["mtime"] = 1288520752
-@print{} data["mode"] = 33204
-@print{} data["blksize"] = 4096
-@print{} data["dev"] = 2054
-@print{} data["type"] = file
-@print{} data["gid"] = 500
-@print{} data["uid"] = 500
-@print{} data["blocks"] = 8
-@print{} data["ctime"] = 1290113572
-@print{} testff.awk modified: 10 31 10 12:25:52
-@print{}
-@print{} Info for JUNK
-@print{} ret = -1
-@print{} JUNK modified: 01 01 70 02:00:00
-@end example
-@c ENDOFRANGE filre
-@c ENDOFRANGE dirch
-@c ENDOFRANGE statg
-@c ENDOFRANGE chdirg
-@c ENDOFRANGE gladfgaw
-@c ENDOFRANGE adfugaw
-@c ENDOFRANGE fubadgaw
@node Future Extensions
@appendixsec Probable Future Extensions
@@ -30958,66 +34880,37 @@ Arnold Robbins
Larry Wall
@end quotation
-This @value{SECTION} briefly lists extensions and possible improvements
-that indicate the directions we are
-currently considering for @command{gawk}. The file @file{FUTURES} in the
-@command{gawk} distribution lists these extensions as well.
-
-Following is a list of probable future changes visible at the
-@command{awk} language level:
-
-@c these are ordered by likelihood
-@table @asis
-@item Loadable module interface
-It is not clear that the @command{awk}-level interface to the
-modules facility is as good as it should be. The interface needs to be
-redesigned, particularly taking namespace issues into account, as
-well as possibly including issues such as library search path order
-and versioning.
-
-@item @code{RECLEN} variable for fixed-length records
-Along with @code{FIELDWIDTHS}, this would speed up the processing of
-fixed-length records.
-@code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"},
-depending upon which kind of record processing is in effect.
-
-@item Databases
-It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array.
-
-@item More @code{lint} warnings
-There are more things that could be checked for portability.
-@end table
-
-Following is a list of probable improvements that will make @command{gawk}'s
-source code easier to work with:
-
-@table @asis
-@item Loadable module mechanics
-The current extension mechanism works
-(@pxref{Dynamic Extensions}),
-but is rather primitive. It requires a fair amount of manual work
-to create and integrate a loadable module.
-Nor is the current mechanism as portable as might be desired.
-The GNU @command{libtool} package provides a number of features that
-would make using loadable modules much easier.
-@command{gawk} should be changed to use @command{libtool}.
-
-@item Loadable module internals
-The API to its internals that @command{gawk} ``exports'' should be revised.
-Too many things are needlessly exposed. A new API should be designed
-and implemented to make module writing easier.
-
-@item Better array subscript management
-@command{gawk}'s management of array subscript storage could use revamping,
-so that using the same value to index multiple arrays only
-stores one copy of the index value.
-@end table
-
-Finally,
-the programs in the test suite could use documenting in this @value{DOCUMENT}.
-
+The @file{TODO} file in the @command{gawk} Git repository lists possible
+future enhancements. Some of these relate to the source code, and others
+to possible new features. Please see that file for the list.
@xref{Additions},
-if you are interested in tackling any of these projects.
+if you are interested in tackling any of the projects listed there.
+
+@node Implementation Limitations
+@appendixsec Some Limitations of the Implementation
+
+The following table describes limits of @command{gawk} on a Unix-like
+system (although they can vary even then). Other systems may have
+different limits.
+
+@c @multitable {Number of file redirections} {min(number of processes per user, number of open files)}
+@multitable @columnfractions .40 .60
+@headitem Item @tab Limit
+@item Characters in a character class @tab 2^(number of bits per byte)
+@item Length of input record @tab @code{MAX_INT}
+@item Length of output record @tab Unlimited
+@item Length of source line @tab Unlimited
+@item Number of fields in a record @tab @code{MAX_LONG}
+@item Number of file redirections @tab Unlimited
+@item Number of input records in one file @tab @code{MAX_LONG}
+@item Number of input records total @tab @code{MAX_LONG}
+@item Number of pipe redirections @tab min(number of processes per user, number of open files)
+@item Numeric values @tab Double-precision floating point (if not using MPFR)
+@item Size of a field @tab @code{MAX_INT}
+@item Size of a literal string @tab @code{MAX_INT}
+@item Size of a printf string @tab @code{MAX_INT}
+@end multitable
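+
+As a rough illustration of the ``Numeric values'' row, the following
+sketch checks where double-precision integers stop being exact. It
+assumes an @command{awk} that uses ordinary double-precision arithmetic
+(that is, MPFR is not in use); the variable name @code{same} is just for
+this example.

```shell
#!/bin/sh
# Assumes double-precision arithmetic (not gawk's arbitrary-precision mode).
# In a double, 2^53 + 1 is not representable and rounds back to 2^53,
# so the comparison below is true and awk prints 1.
same=$(awk 'BEGIN { x = 2^53; print (x + 1 == x) }')
echo "$same"
```

+With arbitrary-precision arithmetic enabled, the two values would be
+expected to compare as different.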
+
@c ENDOFRANGE impis
@c ENDOFRANGE gawii
@@ -31038,7 +34931,6 @@ other introductory texts that you should refer to instead.)
@menu
* Basic High Level:: The high level view.
* Basic Data Typing:: A very quick intro to data types.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
@end menu
@node Basic High Level
@@ -31046,19 +34938,17 @@ other introductory texts that you should refer to instead.)
@cindex processing data
At the most basic level, the job of a program is to process
-some input data and produce results.
+some input data and produce results. See @ref{figure-general-flow}.
-@iftex
-@image{general-program}
-@end iftex
-@ifnottex
-@example
- _______
-+------+ / \ +---------+
-| Data | -----> < Program > -----> | Results |
-+------+ \_______/ +---------+
-@end example
-@end ifnottex
+@float Figure,figure-general-flow
+@caption{General Program Flow}
+@ifinfo
+@center @image{general-program, , , General program flow, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{general-program, , , General program flow}
+@end ifnotinfo
+@end float
@cindex compiled programs
@cindex interpreted programs
@@ -31074,26 +34964,18 @@ instructions in your program to process the data.
@cindex programming, basic steps
When you write a program, it usually consists
-of the following, very basic set of steps:
+of the following, very basic set of steps, as shown
+in @ref{figure-process-flow}:
-@iftex
-@image{process-flow}
-@end iftex
-@ifnottex
-@example
- ______
-+----------------+ / More \ No +----------+
-| Initialization | -------> < Data > -------> | Clean Up |
-+----------------+ ^ \ ? / +----------+
- | +--+-+
- | | Yes
- | |
- | V
- | +---------+
- +-----+ Process |
- +---------+
-@end example
-@end ifnottex
+@float Figure,figure-process-flow
+@caption{Basic Program Steps}
+@ifinfo
+@center @image{process-flow, , , Basic Program Stages, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{process-flow, , , Basic Program Stages}
+@end ifnotinfo
+@end float
@table @asis
@item Initialization
@@ -31189,47 +35071,10 @@ Individual variables, as well as numeric and string variables, are
referred to as @dfn{scalar} values.
Groups of values, such as arrays, are not scalars.
-@cindex integers
-@cindex floating-point, numbers
-@cindex numbers, floating-point
-Within computers, there are two kinds of numeric values: @dfn{integers}
-and @dfn{floating-point}.
-In school, integer values were referred to as ``whole'' numbers---that is,
-numbers without any fractional part, such as 1, 42, or @minus{}17.
-The advantage to integer numbers is that they represent values exactly.
-The disadvantage is that their range is limited. On most systems,
-this range is @minus{}2,147,483,648 to 2,147,483,647.
-However, many systems now support a range from
-@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
-
-@cindex unsigned integers
-@cindex integers, unsigned
-Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}.
-Signed values may be negative or positive, with the range of values just
-described.
-Unsigned values are always positive. On most systems,
-the range is from 0 to 4,294,967,295.
-However, many systems now support a range from
-0 to 18,446,744,073,709,551,615.
-
-@cindex double precision floating-point
-@cindex single precision floating-point
-Floating-point numbers represent what are called ``real'' numbers; i.e.,
-those that do have a fractional part, such as 3.1415927.
-The advantage to floating-point numbers is that they
-can represent a much larger range of values.
-The disadvantage is that there are numbers that they cannot represent
-exactly.
-@command{awk} uses @dfn{double precision} floating-point numbers, which
-can hold more digits than @dfn{single precision}
-floating-point numbers.
-Floating-point issues are discussed more fully in
-@ref{Floating Point Issues}.
-
-At the very lowest level, computers store values as groups of binary digits,
-or @dfn{bits}. Modern computers group bits into groups of eight, called @dfn{bytes}.
-Advanced applications sometimes have to manipulate bits directly,
-and @command{gawk} provides functions for doing so.
+@ref{General Arithmetic}, provided a basic introduction to numeric
+types (integer and floating-point) and how they are used in a computer.
+Please review that information, including a number of caveats that
+were presented.
@cindex null strings
While you are probably used to the idea of a number without a value (i.e., zero),
@@ -31253,6 +35098,11 @@ plus 0 times 1, or decimal 10.
Octal and hexadecimal are discussed more in
@ref{Nondecimal-numbers}.
+At the very lowest level, computers store values as groups of binary digits,
+or @dfn{bits}. Modern computers group bits into groups of eight, called @dfn{bytes}.
+Advanced applications sometimes have to manipulate bits directly,
+and @command{gawk} provides functions for doing so.
+
Programs are written in programming languages.
Hundreds, if not thousands, of programming languages exist.
One of the most popular is the C programming language.
@@ -31272,239 +35122,6 @@ standard for C. This standard became an ISO standard in 1990.
In 1999, a revised ISO C standard was approved and released.
Where it makes sense, POSIX @command{awk} is compatible with 1999 ISO C.
-@node Floating Point Issues
-@appendixsec Floating-Point Number Caveats
-
-As mentioned earlier, floating-point numbers represent what are called
-``real'' numbers, i.e., those that have a fractional part. @command{awk}
-uses double precision floating-point numbers to represent all
-numeric values. This @value{SECTION} describes some of the issues
-involved in using floating-point numbers.
-
-There is a very nice
-@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic}
-by David Goldberg,
-``What Every Computer Scientist Should Know About Floating-point Arithmetic,''
-@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48.
-This is worth reading if you are interested in the details,
-but it does require a background in computer science.
-
-@menu
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
-@end menu
-
-@node String Conversion Precision
-@appendixsubsec The String Value Can Lie
-
-Internally, @command{awk} keeps both the numeric value
-(double precision floating-point) and the string value for a variable.
-Separately, @command{awk} keeps
-track of what type the variable has
-(@pxref{Typing and Comparison}),
-which plays a role in how variables are used in comparisons.
-
-It is important to note that the string value for a number may not
-reflect the full value (all the digits) that the numeric value
-actually contains.
-The following program (@file{values.awk}) illustrates this:
-
-@example
-@{
- sum = $1 + $2
- # see it for what it is
- printf("sum = %.12g\n", sum)
- # use CONVFMT
- a = "<" sum ">"
- print "a =", a
- # use OFMT
- print "sum =", sum
-@}
-@end example
-
-@noindent
-This program shows the full value of the sum of @code{$1} and @code{$2}
-using @code{printf}, and then prints the string values obtained
-from both automatic conversion (via @code{CONVFMT}) and
-from printing (via @code{OFMT}).
-
-Here is what happens when the program is run:
-
-@example
-$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk}
-@print{} sum = 4.8888888
-@print{} a = <4.88889>
-@print{} sum = 4.88889
-@end example
-
-This makes it clear that the full numeric value is different from
-what the default string representations show.
-
-@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
-at least six significant digits. For some applications, you might want to
-change it to specify more precision.
-On most modern machines, most of the time,
-17 digits is enough to capture a floating-point number's
-value exactly.@footnote{Pathological cases can require up to
-752 digits (!), but we doubt that you need to worry about this.}
-
-@node Unexpected Results
-@appendixsubsec Floating Point Numbers Are Not Abstract Numbers
-
-@cindex floating-point, numbers
-Unlike numbers in the abstract sense (such as what you studied in high school
-or college math), numbers stored in computers are limited in certain ways.
-They cannot represent an infinite number of digits, nor can they always
-represent things exactly.
-In particular,
-floating-point numbers cannot
-always represent values exactly. Here is an example:
-
-@example
-$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'}
-515.79
-@print{} 0000051579
-515.80
-@print{} 0000051579
-515.81
-@print{} 0000051580
-515.82
-@print{} 0000051582
-@kbd{@value{CTL}-d}
-@end example
-
-@noindent
-This shows that some values can be represented exactly,
-whereas others are only approximated. This is not a ``bug''
-in @command{awk}, but simply an artifact of how computers
-represent numbers.
-
-@cindex negative zero
-@cindex positive zero
-@cindex zero@comma{} negative vs.@: positive
-Another peculiarity of floating-point numbers on modern systems
-is that they often have more than one representation for the number zero!
-In particular, it is possible to represent ``minus zero'' as well as
-regular, or ``positive'' zero.
-
-This example shows that negative and positive zero are distinct values
-when stored internally, but that they are in fact equal to each other,
-as well as to ``regular'' zero:
-
-@example
-$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0}
-> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz}
-> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0}
-> @kbd{@}'}
-@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1
-@print{} mz == 0 -> 1, pz == 0 -> 1
-@end example
-
-It helps to keep this in mind should you process numeric data
-that contains negative zero values; the fact that the zero is negative
-is noted and can affect comparisons.
-
-@node POSIX Floating Point Problems
-@appendixsubsec Standards Versus Existing Practice
-
-Historically, @command{awk} has converted any non-numeric looking string
-to the numeric value zero, when required. Furthermore, the original
-definition of the language and the original POSIX standards specified that
-@command{awk} only understands decimal numbers (base 10), and not octal
-(base 8) or hexadecimal numbers (base 16).
-
-Changes in the language of the
-2001 and 2004 POSIX standard can be interpreted to imply that @command{awk}
-should support additional features. These features are:
-
-@itemize @bullet
-@item
-Interpretation of floating point data values specified in hexadecimal
-notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not}
-source code constants.)
-
-@item
-Support for the special IEEE 754 floating point values ``Not A Number''
-(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf'').
-In particular, the format for these values is as specified by the ISO 1999
-C standard, which ignores case and can allow machine-dependent additional
-characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}.
-@end itemize
-
-The first problem is that both of these are clear changes to historical
-practice:
-
-@itemize @bullet
-@item
-The @command{gawk} maintainer feels that supporting hexadecimal floating
-point values, in particular, is ugly, and was never intended by the
-original designers to be part of the language.
-
-@item
-Allowing completely alphabetic strings to have valid numeric
-values is also a very severe departure from historical practice.
-@end itemize
-
-The second problem is that the @command{gawk} maintainer feels that this
-interpretation of the standard, which requires a certain amount of
-``language lawyering'' to arrive at in the first place, was not even
-intended by the standard developers. In other words, ``we see how you
-got where you are, but we don't think that that's where you want to be.''
-
-The 2008 POSIX standard added explicit wording to allow, but not require,
-that @command{awk} support hexadecimal floating point values and
-special values for ``Not A Number'' and infinity.
-
-Although the @command{gawk} maintainer continues to feel that
-providing those features is inadvisable,
-nevertheless, on systems that support IEEE floating point, it seems
-reasonable to provide @emph{some} way to support NaN and Infinity values.
-The solution implemented in @command{gawk} is as follows:
-
-@itemize @bullet
-@item
-With the @option{--posix} command-line option, @command{gawk} becomes
-``hands off.'' String values are passed directly to the system library's
-@code{strtod()} function, and if it successfully returns a numeric value,
-that is what's used.@footnote{You asked for it, you got it.}
-By definition, the results are not portable across
-different systems. They are also a little surprising:
-
-@example
-$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'}
-@print{} nan
-$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
-@print{} 3735928559
-@end example
-
-@item
-Without @option{--posix}, @command{gawk} interprets the four strings
-@samp{+inf},
-@samp{-inf},
-@samp{+nan},
-and
-@samp{-nan}
-specially, producing the corresponding special numeric values.
-The leading sign acts as a signal to @command{gawk} (and the user)
-that the value is really numeric. Hexadecimal floating point is
-not supported (unless you also use @option{--non-decimal-data},
-which is @emph{not} recommended). For example:
-
-@example
-$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'}
-@print{} 0
-$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'}
-@print{} nan
-$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
-@print{} 0
-@end example
-
-@command{gawk} does ignore case in the four special values.
-Thus @samp{+nan} and @samp{+NaN} are the same.
-@end itemize
-
@c ENDOFRANGE procon
@node Glossary
@@ -31713,6 +35330,50 @@ It was written in @command{awk}
by Brian Kernighan and Jon Bentley, and is available from
@uref{http://netlib.sandia.gov/netlib/typesetting/chem.gz}.
+@cindex cookie
+@item Cookie
+A peculiar goodie, token, saying or remembrance
+produced by or presented to a program. (With thanks to Doug McIlroy.)
+@ignore
+From: Doug McIlroy <doug@cs.dartmouth.edu>
+Date: Sat, 13 Oct 2012 19:55:25 -0400
+To: arnold@skeeve.com
+Subject: Re: origin of the term "cookie"?
+
+I believe the term "cookie", for a more or less inscrutable
+saying or crumb of information, was injected into Unix
+jargon by Bob Morris, who used the word quite frequently.
+It had no fixed meaning as it now does in browsers.
+
+The word had been around long before it was recognized in
+the 8th edition glossary (earlier editions had no glossary):
+
+cookie a peculiar goodie, token, saying or remembrance
+returned by or presented to a program. [I would say that
+"returned by" would better read "produced by", and assume
+responsibility for the inexactitude.]
+
+Doug McIlroy
+
+From: Doug McIlroy <doug@cs.dartmouth.edu>
+Date: Sun, 14 Oct 2012 10:08:43 -0400
+To: arnold@skeeve.com
+Subject: Re: origin of the term "cookie"?
+
+> Can I forward your email to Eric Raymond, for possible addition to the
+> Jargon File?
+
+Sure. I might add that I don't know how "cookie" entered Morris's
+vocabulary. Certainly "values of beta give rise to dom!" (see google)
+was an early, if not the earliest Unix cookie. The fact that it was
+found lying around on a model 37 teletype (which had Greek beta in
+its type box) suggests that maybe it was seen to be like milk and
+cookies laid out for Santa Claus. Morris was wont to make such
+connections.
+
+Doug
+@end ignore
+
@item Coprocess
A subordinate program with which two-way communications is possible.
@@ -31955,12 +35616,15 @@ in @command{awk} programs.
@cindex ISO
@item ISO
-The International Standards Organization.
+The International Organization for Standardization.
This organization produces international standards for many things, including
programming languages, such as C and C++.
In the computer arena, important standards like those for C, C++, and POSIX
become both American national and ISO international standards simultaneously.
This @value{DOCUMENT} refers to Standard C as ``ISO C'' throughout.
+See @uref{http://www.iso.org/iso/home/about.htm, the ISO website} for more
+information about the name of the organization and its language-independent
+three-letter acronym.
@cindex Java programming language
@cindex Programming languages, Java
@@ -33494,9 +37158,6 @@ Unresolved Issues:
of how to use them. It would be useful to perhaps have a "programming
style" section of the manual that would include this and other tips.
-2. The default AWKPATH search path should be configurable via `configure'
- The default and how this changes needs to be documented.
-
Consistency issues:
/.../ regexps are in @code, not @samp
".." strings are in @code, not @samp
@@ -33591,14 +37252,7 @@ ORA uses filename, thus the macro.
Suggestions:
------------
-Enhance FIELDWIDTHS with some way to indicate "the rest of the record".
-E.g., a length of 0 or -1 or something. May be "n"?
-
-Make FIELDWIDTHS be an array?
-
% Next edition:
-% 1. Talk about common extensions, those in nawk, gawk, mawk
-% 2. Use @code{foo} for variables and @code{foo()} for functions
-% 3. Standardize the error messages from the functions and programs
-% in Chapters 12 and 13.
-% 4. Nuke the BBS stuff and use something that won't be obsolete
+% 1. Standardize the error messages from the functions and programs
+% in the two sample code chapters.
+% 2. Nuke the BBS stuff and use something that won't be obsolete