aboutsummaryrefslogtreecommitdiffstats
path: root/doc/gawk.texi
diff options
context:
space:
mode:
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r--doc/gawk.texi12114
1 files changed, 8390 insertions, 3724 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index ec83964d..ab26df28 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -88,9 +88,11 @@
@c some special symbols
@iftex
@set LEQ @math{@leq}
+@set PI @math{@pi}
@end iftex
@ifnottex
@set LEQ <=
+@set PI @i{pi}
@end ifnottex
@ifnottex
@@ -292,20 +294,24 @@ particular records in a file and perform operations upon them.
* Arrays:: The description and use of arrays. Also
includes array-oriented control statements.
* Functions:: Built-in and user-defined functions.
+* Library Functions:: A Library of @command{awk} Functions.
+* Sample Programs:: Many @command{awk} programs with complete
+ explanations.
* Internationalization:: Getting @command{gawk} to speak your
language.
* Advanced Features:: Stuff for advanced users, specific to
@command{gawk}.
-* Library Functions:: A Library of @command{awk} Functions.
-* Sample Programs:: Many @command{awk} programs with complete
- explanations.
-* Debugger:: The @code{dgawk} debugger.
+* Debugger:: The @code{gawk} debugger.
+* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
+ @command{gawk}.
+* Dynamic Extensions:: Adding new built-in functions to
+ @command{gawk}.
* Language History:: The evolution of the @command{awk}
language.
* Installation:: Installing @command{gawk} under various
operating systems.
-* Notes:: Notes about @command{gawk} extensions and
- possible future work.
+* Notes:: Notes about adding things to @command{gawk}
+ and possible future work.
* Basic Concepts:: A very quick introduction to programming
concepts.
* Glossary:: An explanation of some unfamiliar terms.
@@ -315,399 +321,532 @@ particular records in a file and perform operations upon them.
* Index:: Concept and Variable Index.
@detailmenu
-* History:: The history of @command{gawk} and
- @command{awk}.
-* Names:: What name to use to find @command{awk}.
-* This Manual:: Using this @value{DOCUMENT}. Includes
- sample input files that you can use.
-* Conventions:: Typographical Conventions.
-* Manual History:: Brief history of the GNU project and this
- @value{DOCUMENT}.
-* How To Contribute:: Helping to save the world.
-* Acknowledgments:: Acknowledgments.
-* Running gawk:: How to run @command{gawk} programs;
- includes command-line syntax.
-* One-shot:: Running a short throwaway @command{awk}
- program.
-* Read Terminal:: Using no input files (input from terminal
- instead).
-* Long:: Putting permanent @command{awk} programs in
- files.
-* Executable Scripts:: Making self-contained @command{awk}
- programs.
-* Comments:: Adding documentation to @command{gawk}
- programs.
-* Quoting:: More discussion of shell quoting issues.
-* DOS Quoting:: Quoting in Windows Batch Files.
-* Sample Data Files:: Sample data files for use in the
- @command{awk} programs illustrated in this
- @value{DOCUMENT}.
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example using two
- rules.
-* More Complex:: A more complex example.
-* Statements/Lines:: Subdividing or combining statements into
- lines.
-* Other Features:: Other Features of @command{awk}.
-* When:: When to use @command{gawk} and when to use
- other things.
-* Command Line:: How to run @command{awk}.
-* Options:: Command-line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* Naming Standard Input:: How to specify standard input with other
- files.
-* Environment Variables:: The environment variables @command{gawk}
- uses.
-* AWKPATH Variable:: Searching directories for @command{awk}
- programs.
-* Other Environment Variables:: The environment variables.
-* Exit Status:: @command{gawk}'s exit status.
-* Include Files:: Including other files into your program.
-* Obsolete:: Obsolete Options and/or features.
-* Undocumented:: Undocumented Options and Features.
-* Regexp Usage:: How to Use Regular Expressions.
-* Escape Sequences:: How to write nonprinting characters.
-* Regexp Operators:: Regular Expression Operators.
-* Bracket Expressions:: What can go between @samp{[...]}.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
-* Leftmost Longest:: How much text matches.
-* Computed Regexps:: Using Dynamic Regexps.
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Nonconstant Fields:: Nonconstant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Default Field Splitting:: How fields are normally separated.
-* Regexp Field Splitting:: Using regexps as the field separator.
-* Single Character Fields:: Making each character a separate field.
-* Command Line Field Separator:: Setting @code{FS} from the command-line.
-* Field Splitting Summary:: Some final points and a summary table.
-* Constant Size:: Reading constant width data.
-* Splitting By Content:: Defining Fields By Content
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program
- control using the @code{getline} function.
-* Plain Getline:: Using @code{getline} with no arguments.
-* Getline/Variable:: Using @code{getline} into a variable.
-* Getline/File:: Using @code{getline} from a file.
-* Getline/Variable/File:: Using @code{getline} into a variable from a
- file.
-* Getline/Pipe:: Using @code{getline} from a pipe.
-* Getline/Variable/Pipe:: Using @code{getline} into a variable from a
- pipe.
-* Getline/Coprocess:: Using @code{getline} from a coprocess.
-* Getline/Variable/Coprocess:: Using @code{getline} into a variable from a
- coprocess.
-* Getline Notes:: Important things to know about
- @code{getline}.
-* Getline Summary:: Summary of @code{getline} Variants.
-* Command line directories:: What happens if you put a directory on the
- command line.
-* Print:: The @code{print} statement.
-* Print Examples:: Simple examples of @code{print} statements.
-* Output Separators:: The output separators and how to change
- them.
-* OFMT:: Controlling Numeric Output With
- @code{print}.
-* Printf:: The @code{printf} statement.
-* Basic Printf:: Syntax of the @code{printf} statement.
-* Control Letters:: Format-control letters.
-* Format Modifiers:: Format-specification modifiers.
-* Printf Examples:: Several examples.
-* Redirection:: How to redirect output to multiple files
- and pipes.
-* Special Files:: File name interpretation in @command{gawk}.
- @command{gawk} allows access to inherited
- file descriptors.
-* Special FD:: Special files for I/O.
-* Special Network:: Special files for network communications.
-* Special Caveats:: Things to watch out for.
-* Close Files And Pipes:: Closing Input and Output Files and Pipes.
-* Values:: Constants, Variables, and Regular
- Expressions.
-* Constants:: String, numeric and regexp constants.
-* Scalar Constants:: Numeric and string constants.
-* Nondecimal-numbers:: What are octal and hex numbers.
-* Regexp Constants:: Regular Expression constants.
-* Using Constant Regexps:: When and how to use a regexp constant.
-* Variables:: Variables give names to values for later
- use.
-* Using Variables:: Using variables in your programs.
-* Assignment Options:: Setting variables on the command-line and a
- summary of command-line syntax. This is an
- advanced method of input.
-* Conversion:: The conversion of strings to numbers and
- vice versa.
-* All Operators:: @command{gawk}'s operators.
-* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-},
- etc.)
-* Concatenation:: Concatenating strings.
-* Assignment Ops:: Changing the value of a variable or a
- field.
-* Increment Ops:: Incrementing the numeric value of a
- variable.
-* Truth Values and Conditions:: Testing for true and false.
-* Truth Values:: What is ``true'' and what is ``false''.
-* Typing and Comparison:: How variables acquire types and how this
- affects comparison of numbers and strings
- with @samp{<}, etc.
-* Variable Typing:: String type versus numeric type.
-* Comparison Operators:: The comparison operators.
-* POSIX String Comparison:: String comparison with POSIX rules.
-* Boolean Ops:: Combining comparison expressions using
- boolean operators @samp{||} (``or''),
- @samp{&&} (``and'') and @samp{!} (``not'').
-* Conditional Exp:: Conditional expressions select between two
- subexpressions under control of a third
- subexpression.
-* Function Calls:: A function call is an expression.
-* Precedence:: How various operators nest.
-* Locales:: How the locale affects things.
-* Pattern Overview:: What goes into a pattern.
-* Regexp Patterns:: Using regexps as patterns.
-* Expression Patterns:: Any expression can be used as a pattern.
-* Ranges:: Pairs of patterns specify record ranges.
-* BEGIN/END:: Specifying initialization and cleanup
- rules.
-* Using BEGIN/END:: How and why to use BEGIN/END rules.
-* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
-* BEGINFILE/ENDFILE:: Two special patterns for advanced control.
-* Empty:: The empty pattern, which matches every
- record.
-* Using Shell Variables:: How to use shell variables with
- @command{awk}.
-* Action Overview:: What goes into an action.
-* Statements:: Describes the various control statements in
- detail.
-* If Statement:: Conditionally execute some @command{awk}
- statements.
-* While Statement:: Loop until some condition is satisfied.
-* Do Statement:: Do specified action while looping until
- some condition is satisfied.
-* For Statement:: Another looping statement, that provides
- initialization and increment clauses.
-* Switch Statement:: Switch/case evaluation for conditional
- execution of statements based on a value.
-* Break Statement:: Immediately exit the innermost enclosing
- loop.
-* Continue Statement:: Skip to the end of the innermost enclosing
- loop.
-* Next Statement:: Stop processing the current input record.
-* Nextfile Statement:: Stop processing the current file.
-* Exit Statement:: Stop execution of @command{awk}.
-* Built-in Variables:: Summarizes the built-in variables.
-* User-modified:: Built-in variables that you change to
- control @command{awk}.
-* Auto-set:: Built-in variables where @command{awk}
- gives you information.
-* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}.
-* Array Basics:: The basics of arrays.
-* Array Intro:: Introduction to Arrays
-* Reference to Elements:: How to examine one element of an array.
-* Assigning Elements:: How to change an element of an array.
-* Array Example:: Basic Example of an Array
-* Scanning an Array:: A variation of the @code{for} statement. It
- loops through the indices of an array's
- existing elements.
-* Controlling Scanning:: Controlling the order in which arrays are
- scanned.
-* Delete:: The @code{delete} statement removes an
- element from an array.
-* Numeric Array Subscripts:: How to use numbers as subscripts in
- @command{awk}.
-* Uninitialized Subscripts:: Using Uninitialized variables as
- subscripts.
-* Multi-dimensional:: Emulating multidimensional arrays in
- @command{awk}.
-* Multi-scanning:: Scanning multidimensional arrays.
-* Arrays of Arrays:: True multidimensional arrays.
-* Built-in:: Summarizes the built-in functions.
-* Calling Built-in:: How to call built-in functions.
-* Numeric Functions:: Functions that work with numbers, including
- @code{int()}, @code{sin()} and
- @code{rand()}.
-* String Functions:: Functions for string manipulation, such as
- @code{split()}, @code{match()} and
- @code{sprintf()}.
-* Gory Details:: More than you want to know about @samp{\}
- and @samp{&} with @code{sub()},
- @code{gsub()}, and @code{gensub()}.
-* I/O Functions:: Functions for files and shell commands.
-* Time Functions:: Functions for dealing with timestamps.
-* Bitwise Functions:: Functions for bitwise operations.
-* Type Functions:: Functions for type information.
-* I18N Functions:: Functions for string translation.
-* User-defined:: Describes User-defined functions in detail.
-* Definition Syntax:: How to write definitions and what they
- mean.
-* Function Example:: An example function definition and what it
- does.
-* Function Caveats:: Things to watch out for.
-* Calling A Function:: Don't use spaces.
-* Variable Scope:: Controlling variable scope.
-* Pass By Value/Reference:: Passing parameters.
-* Return Statement:: Specifying the value a function returns.
-* Dynamic Typing:: How variable types can change at runtime.
-* Indirect Calls:: Choosing the function to call at runtime.
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU @code{gettext} works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging @code{printf} arguments.
-* I18N Portability:: @command{awk}-level portability issues.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: @command{gawk} is also internationalized.
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal
- and sorting arrays.
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use @code{asort()} and
- @code{asorti()}.
-* Two-way I/O:: Two-way communications with another
- process.
-* TCP/IP Networking:: Using @command{gawk} for network
- programming.
-* Profiling:: Profiling your @command{awk} programs.
-* Library Names:: How to best name private global variables
- in library functions.
-* General Functions:: Functions that are of general use.
-* Strtonum Function:: A replacement for the built-in
- @code{strtonum()} function.
-* Assert Function:: A function for assertions in @command{awk}
- programs.
-* Round Function:: A function for rounding if @code{sprintf()}
- does not do it correctly.
-* Cliff Random Function:: The Cliff Random Number Generator.
-* Ordinal Functions:: Functions for using characters as numbers
- and vice versa.
-* Join Function:: A function to join an array into a string.
-* Gettimeofday Function:: A function to get formatted times.
-* Data File Management:: Functions for managing command-line data
- files.
-* Filetrans Function:: A function for handling data file
- transitions.
-* Rewind Function:: A function for rereading the current file.
-* File Checking:: Checking that data files are readable.
-* Empty Files:: Checking for zero-length files.
-* Ignoring Assigns:: Treating assignments as file names.
-* Getopt Function:: A function for processing command-line
- arguments.
-* Passwd Functions:: Functions for getting user information.
-* Group Functions:: Functions for getting group information.
-* Walking Arrays:: A function to walk arrays of arrays.
-* Running Examples:: How to run these examples.
-* Clones:: Clones of common utilities.
-* Cut Program:: The @command{cut} utility.
-* Egrep Program:: The @command{egrep} utility.
-* Id Program:: The @command{id} utility.
-* Split Program:: The @command{split} utility.
-* Tee Program:: The @command{tee} utility.
-* Uniq Program:: The @command{uniq} utility.
-* Wc Program:: The @command{wc} utility.
-* Miscellaneous Programs:: Some interesting @command{awk} programs.
-* Dupword Program:: Finding duplicated words in a document.
-* Alarm Program:: An alarm clock.
-* Translate Program:: A program similar to the @command{tr}
- utility.
-* Labels Program:: Printing mailing labels.
-* Word Sorting:: A program to produce a word usage count.
-* History Sorting:: Eliminating duplicate entries from a
- history file.
-* Extract Program:: Pulling out programs from Texinfo source
- files.
-* Simple Sed:: A Simple Stream Editor.
-* Igawk Program:: A wrapper for @command{awk} that includes
- files.
-* Anagram Program:: Finding anagrams from a dictionary.
-* Signature Program:: People do amazing things with too much time
- on their hands.
-* Debugging:: Introduction to @command{dgawk}.
-* Debugging Concepts:: Debugging In General.
-* Debugging Terms:: Additional Debugging Concepts.
-* Awk Debugging:: Awk Debugging.
-* Sample dgawk session:: Sample @command{dgawk} session.
-* dgawk invocation:: @command{dgawk} Invocation.
-* Finding The Bug:: Finding The Bug.
-* List of Debugger Commands:: Main @command{dgawk} Commands.
-* Breakpoint Control:: Control of breakpoints.
-* Dgawk Execution Control:: Control of execution.
-* Viewing And Changing Data:: Viewing and changing data.
-* Dgawk Stack:: Dealing with the stack.
-* Dgawk Info:: Obtaining information about the program and
- the debugger state.
-* Miscellaneous Dgawk Commands:: Miscellaneous Commands.
-* Readline Support:: Readline Support.
-* Dgawk Limitations:: Limitations and future plans.
-* V7/SVR3.1:: The major changes between V7 and System V
- Release 3.1.
-* SVR4:: Minor changes between System V Releases 3.1
- and 4.
-* POSIX:: New features from the POSIX standard.
-* BTL:: New features from Brian Kernighan's version
- of @command{awk}.
-* POSIX/GNU:: The extensions in @command{gawk} not in
- POSIX @command{awk}.
-* Common Extensions:: Common Extensions Summary.
-* Ranges and Locales:: How locales used to affect regexp ranges.
-* Contributors:: The major contributors to @command{gawk}.
-* Gawk Distribution:: What is in the @command{gawk} distribution.
-* Getting:: How to get the distribution.
-* Extracting:: How to extract the distribution.
-* Distribution contents:: What is in the distribution.
-* Unix Installation:: Installing @command{gawk} under various
- versions of Unix.
-* Quick Installation:: Compiling @command{gawk} under Unix.
-* Additional Configuration Options:: Other compile-time options.
-* Configuration Philosophy:: How it's all supposed to work.
-* Non-Unix Installation:: Installation on Other Operating Systems.
-* PC Installation:: Installing and Compiling @command{gawk} on
- MS-DOS and OS/2.
-* PC Binary Installation:: Installing a prepared distribution.
-* PC Compiling:: Compiling @command{gawk} for MS-DOS,
- Windows32, and OS/2.
-* PC Testing:: Testing @command{gawk} on PC systems.
-* PC Using:: Running @command{gawk} on MS-DOS, Windows32
- and OS/2.
-* Cygwin:: Building and running @command{gawk} for
- Cygwin.
-* MSYS:: Using @command{gawk} In The MSYS
- Environment.
-* VMS Installation:: Installing @command{gawk} on VMS.
-* VMS Compilation:: How to compile @command{gawk} under VMS.
-* VMS Installation Details:: How to install @command{gawk} under VMS.
-* VMS Running:: How to run @command{gawk} under VMS.
-* VMS Old Gawk:: An old version comes with some VMS systems.
-* Bugs:: Reporting Problems and Bugs.
-* Other Versions:: Other freely available @command{awk}
- implementations.
-* Compatibility Mode:: How to disable certain @command{gawk}
- extensions.
-* Additions:: Making Additions To @command{gawk}.
-* Accessing The Source:: Accessing the Git repository.
-* Adding Code:: Adding code to the main body of
- @command{gawk}.
-* New Ports:: Porting @command{gawk} to a new operating
- system.
-* Dynamic Extensions:: Adding new built-in functions to
- @command{gawk}.
-* Internals:: A brief look at some @command{gawk}
- internals.
-* Plugin License:: A note about licensing.
-* Sample Library:: A example of new functions.
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-* Future Extensions:: New features that may be implemented one
- day.
-* Basic High Level:: The high level view.
-* Basic Data Typing:: A very quick intro to data types.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+* History:: The history of @command{gawk} and
+ @command{awk}.
+* Names:: What name to use to find
+ @command{awk}.
+* This Manual:: Using this @value{DOCUMENT}. Includes
+ sample input files that you can use.
+* Conventions:: Typographical Conventions.
+* Manual History:: Brief history of the GNU project and
+ this @value{DOCUMENT}.
+* How To Contribute:: Helping to save the world.
+* Acknowledgments:: Acknowledgments.
+* Running gawk:: How to run @command{gawk} programs;
+ includes command-line syntax.
+* One-shot:: Running a short throwaway
+ @command{awk} program.
+* Read Terminal:: Using no input files (input from
+ terminal instead).
+* Long:: Putting permanent @command{awk}
+ programs in files.
+* Executable Scripts:: Making self-contained @command{awk}
+ programs.
+* Comments:: Adding documentation to @command{gawk}
+ programs.
+* Quoting:: More discussion of shell quoting
+ issues.
+* DOS Quoting:: Quoting in Windows Batch Files.
+* Sample Data Files:: Sample data files for use in the
+ @command{awk} programs illustrated in
+ this @value{DOCUMENT}.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example using
+ two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements
+ into lines.
+* Other Features:: Other Features of @command{awk}.
+* When:: When to use @command{gawk} and when to
+ use other things.
+* Command Line:: How to run @command{awk}.
+* Options:: Command-line options and their
+ meanings.
+* Other Arguments:: Input file names and variable
+ assignments.
+* Naming Standard Input:: How to specify standard input with
+ other files.
+* Environment Variables:: The environment variables
+ @command{gawk} uses.
+* AWKPATH Variable:: Searching directories for
+ @command{awk} programs.
+* AWKLIBPATH Variable:: Searching directories for
+ @command{awk} shared libraries.
+* Other Environment Variables:: The environment variables.
+* Exit Status:: @command{gawk}'s exit status.
+* Include Files:: Including other files into your
+ program.
+* Loading Shared Libraries:: Loading shared libraries into your
+ program.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write nonprinting characters.
+* Regexp Operators:: Regular Expression Operators.
+* Bracket Expressions:: What can go between @samp{[...]}.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+* Records:: Controlling how data is split into
+ records.
+* Fields:: An introduction to fields.
+* Nonconstant Fields:: Nonconstant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change
+ it.
+* Default Field Splitting:: How fields are normally separated.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate
+ field.
+* Command Line Field Separator:: Setting @code{FS} from the
+ command-line.
+* Field Splitting Summary:: Some final points and a summary table.
+* Constant Size:: Reading constant width data.
+* Splitting By Content:: Defining Fields By Content
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program
+ control using the @code{getline}
+ function.
+* Plain Getline:: Using @code{getline} with no
+ arguments.
+* Getline/Variable:: Using @code{getline} into a variable.
+* Getline/File:: Using @code{getline} from a file.
+* Getline/Variable/File:: Using @code{getline} into a variable
+ from a file.
+* Getline/Pipe:: Using @code{getline} from a pipe.
+* Getline/Variable/Pipe:: Using @code{getline} into a variable
+ from a pipe.
+* Getline/Coprocess:: Using @code{getline} from a coprocess.
+* Getline/Variable/Coprocess:: Using @code{getline} into a variable
+ from a coprocess.
+* Getline Notes:: Important things to know about
+ @code{getline}.
+* Getline Summary:: Summary of @code{getline} Variants.
+* Read Timeout:: Reading input with a timeout.
+* Command line directories:: What happens if you put a directory on
+ the command line.
+* Print:: The @code{print} statement.
+* Print Examples:: Simple examples of @code{print}
+ statements.
+* Output Separators:: The output separators and how to
+ change them.
+* OFMT:: Controlling Numeric Output With
+ @code{print}.
+* Printf:: The @code{printf} statement.
+* Basic Printf:: Syntax of the @code{printf} statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+* Redirection:: How to redirect output to multiple
+ files and pipes.
+* Special Files:: File name interpretation in
+ @command{gawk}. @command{gawk} allows
+ access to inherited file descriptors.
+* Special FD:: Special files for I/O.
+* Special Network:: Special files for network
+ communications.
+* Special Caveats:: Things to watch out for.
+* Close Files And Pipes:: Closing Input and Output Files and
+ Pipes.
+* Values:: Constants, Variables, and Regular
+ Expressions.
+* Constants:: String, numeric and regexp constants.
+* Scalar Constants:: Numeric and string constants.
+* Nondecimal-numbers:: What are octal and hex numbers.
+* Regexp Constants:: Regular Expression constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for
+ later use.
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command-line
+ and a summary of command-line syntax.
+ This is an advanced method of input.
+* Conversion:: The conversion of strings to numbers
+ and vice versa.
+* All Operators:: @command{gawk}'s operators.
+* Arithmetic Ops:: Arithmetic operations (@samp{+},
+ @samp{-}, etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a
+ field.
+* Increment Ops:: Incrementing the numeric value of a
+ variable.
+* Truth Values and Conditions:: Testing for true and false.
+* Truth Values:: What is ``true'' and what is
+ ``false''.
+* Typing and Comparison:: How variables acquire types and how
+ this affects comparison of numbers and
+ strings with @samp{<}, etc.
+* Variable Typing:: String type versus numeric type.
+* Comparison Operators:: The comparison operators.
+* POSIX String Comparison:: String comparison with POSIX rules.
+* Boolean Ops:: Combining comparison expressions using
+ boolean operators @samp{||} (``or''),
+ @samp{&&} (``and'') and @samp{!}
+ (``not'').
+* Conditional Exp:: Conditional expressions select between
+ two subexpressions under control of a
+ third subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+* Locales:: How the locale affects things.
+* Pattern Overview:: What goes into a pattern.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a
+ pattern.
+* Ranges:: Pairs of patterns specify record
+ ranges.
+* BEGIN/END:: Specifying initialization and cleanup
+ rules.
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+* BEGINFILE/ENDFILE:: Two special patterns for advanced
+ control.
+* Empty:: The empty pattern, which matches every
+ record.
+* Using Shell Variables:: How to use shell variables with
+ @command{awk}.
+* Action Overview:: What goes into an action.
+* Statements:: Describes the various control
+ statements in detail.
+* If Statement:: Conditionally execute some
+ @command{awk} statements.
+* While Statement:: Loop until some condition is
+ satisfied.
+* Do Statement:: Do specified action while looping
+ until some condition is satisfied.
+* For Statement:: Another looping statement, that
+ provides initialization and increment
+ clauses.
+* Switch Statement:: Switch/case evaluation for conditional
+ execution of statements based on a
+ value.
+* Break Statement:: Immediately exit the innermost
+ enclosing loop.
+* Continue Statement:: Skip to the end of the innermost
+ enclosing loop.
+* Next Statement:: Stop processing the current input
+ record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of @command{awk}.
+* Built-in Variables:: Summarizes the built-in variables.
+* User-modified:: Built-in variables that you change to
+ control @command{awk}.
+* Auto-set:: Built-in variables where @command{awk}
+ gives you information.
+* ARGC and ARGV:: Ways to use @code{ARGC} and
+ @code{ARGV}.
+* Array Basics:: The basics of arrays.
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an
+ array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the @code{for}
+ statement. It loops through the
+ indices of an array's existing
+ elements.
+* Controlling Scanning:: Controlling the order in which arrays
+ are scanned.
+* Delete:: The @code{delete} statement removes an
+ element from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ @command{awk}.
+* Uninitialized Subscripts:: Using Uninitialized variables as
+ subscripts.
+* Multi-dimensional:: Emulating multidimensional arrays in
+ @command{awk}.
+* Multi-scanning:: Scanning multidimensional arrays.
+* Arrays of Arrays:: True multidimensional arrays.
+* Built-in:: Summarizes the built-in functions.
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers,
+ including @code{int()}, @code{sin()}
+ and @code{rand()}.
+* String Functions:: Functions for string manipulation,
+ such as @code{split()}, @code{match()}
+ and @code{sprintf()}.
+* Gory Details:: More than you want to know about
+ @samp{\} and @samp{&} with
+ @code{sub()}, @code{gsub()}, and
+ @code{gensub()}.
+* I/O Functions:: Functions for files and shell
+ commands.
+* Time Functions:: Functions for dealing with timestamps.
+* Bitwise Functions:: Functions for bitwise operations.
+* Type Functions:: Functions for type information.
+* I18N Functions:: Functions for string translation.
+* User-defined:: Describes User-defined functions in
+ detail.
+* Definition Syntax:: How to write definitions and what they
+ mean.
+* Function Example:: An example function definition and
+ what it does.
+* Function Caveats:: Things to watch out for.
+* Calling A Function:: Don't use spaces.
+* Variable Scope:: Controlling variable scope.
+* Pass By Value/Reference:: Passing parameters.
+* Return Statement:: Specifying the value a function
+ returns.
+* Dynamic Typing:: How variable types can change at
+ runtime.
+* Indirect Calls:: Choosing the function to call at
+ runtime.
+* Library Names:: How to best name private global
+ variables in library functions.
+* General Functions:: Functions that are of general use.
+* Strtonum Function:: A replacement for the built-in
+ @code{strtonum()} function.
+* Assert Function:: A function for assertions in
+ @command{awk} programs.
+* Round Function:: A function for rounding if
+ @code{sprintf()} does not do it
+ correctly.
+* Cliff Random Function:: The Cliff Random Number Generator.
+* Ordinal Functions:: Functions for using characters as
+ numbers and vice versa.
+* Join Function:: A function to join an array into a
+ string.
+* Getlocaltime Function:: A function to get formatted times.
+* Data File Management:: Functions for managing command-line
+ data files.
+* Filetrans Function:: A function for handling data file
+ transitions.
+* Rewind Function:: A function for rereading the current
+ file.
+* File Checking:: Checking that data files are readable.
+* Empty Files:: Checking for zero-length files.
+* Ignoring Assigns:: Treating assignments as file names.
+* Getopt Function:: A function for processing command-line
+ arguments.
+* Passwd Functions:: Functions for getting user
+ information.
+* Group Functions:: Functions for getting group
+ information.
+* Walking Arrays:: A function to walk arrays of arrays.
+* Running Examples:: How to run these examples.
+* Clones:: Clones of common utilities.
+* Cut Program:: The @command{cut} utility.
+* Egrep Program:: The @command{egrep} utility.
+* Id Program:: The @command{id} utility.
+* Split Program:: The @command{split} utility.
+* Tee Program:: The @command{tee} utility.
+* Uniq Program:: The @command{uniq} utility.
+* Wc Program:: The @command{wc} utility.
+* Miscellaneous Programs:: Some interesting @command{awk}
+ programs.
+* Dupword Program:: Finding duplicated words in a
+ document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the @command{tr}
+ utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage
+ count.
+* History Sorting:: Eliminating duplicate entries from a
+ history file.
+* Extract Program:: Pulling out programs from Texinfo
+ source files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for @command{awk} that
+ includes files.
+* Anagram Program:: Finding anagrams from a dictionary.
+* Signature Program:: People do amazing things with too much
+ time on their hands.
+* I18N and L10N:: Internationalization and Localization.
+* Explaining gettext:: How GNU @code{gettext} works.
+* Programmer i18n:: Features for the programmer.
+* Translator i18n:: Features for the translator.
+* String Extraction:: Extracting marked strings.
+* Printf Ordering:: Rearranging @code{printf} arguments.
+* I18N Portability:: @command{awk}-level portability
+ issues.
+* I18N Example:: A simple i18n example.
+* Gawk I18N:: @command{gawk} is also
+ internationalized.
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array
+ traversal and sorting arrays.
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions:: How to use @code{asort()} and
+ @code{asorti()}.
+* Two-way I/O:: Two-way communications with another
+ process.
+* TCP/IP Networking:: Using @command{gawk} for network
+ programming.
+* Profiling:: Profiling your @command{awk} programs.
+* Debugging:: Introduction to @command{gawk}
+ debugger.
+* Debugging Concepts:: Debugging in General.
+* Debugging Terms:: Additional Debugging Concepts.
+* Awk Debugging:: Awk Debugging.
+* Sample Debugging Session:: Sample debugging session.
+* Debugger Invocation:: How to Start the Debugger.
+* Finding The Bug:: Finding the Bug.
+* List of Debugger Commands:: Main debugger commands.
+* Breakpoint Control:: Control of Breakpoints.
+* Debugger Execution Control:: Control of Execution.
+* Viewing And Changing Data:: Viewing and Changing Data.
+* Execution Stack:: Dealing with the Stack.
+* Debugger Info:: Obtaining Information about the
+ Program and the Debugger State.
+* Miscellaneous Debugger Commands:: Miscellaneous Commands.
+* Readline Support:: Readline support.
+* Limitations:: Limitations and future plans.
+* General Arithmetic:: An introduction to computer
+ arithmetic.
+* Floating Point Issues:: Stuff to know about floating-point
+ numbers.
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not
+ Abstract Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+* Integer Programming:: Effective integer programming.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Floating-point Representation:: Binary floating-point representation.
+* Floating-point Context:: Floating-point context.
+* Rounding Mode:: Floating-point rounding mode.
+* Gawk and MPFR:: How @command{gawk} provides
+ arbitrary-precision arithmetic.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
+ Arithmetic with @command{gawk}.
+* Setting Precision:: Setting the working precision.
+* Setting Rounding Mode:: Setting the rounding mode.
+* Floating-point Constants:: Representing floating-point constants.
+* Changing Precision:: Changing the precision of a number.
+* Exact Arithmetic:: Exact arithmetic with floating-point
+ numbers.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic
+ with @command{gawk}.
+* Extension Intro:: What is an extension.
+* Plugin License:: A note about licensing.
+* Extension Design:: Design notes about the extension API.
+* Old Extension Problems:: Problems with the old mechanism.
+* Extension New Mechanism Goals:: Goals for the new mechanism.
+* Extension Other Design Decisions:: Some other design decisions.
+* Extension Mechanism Outline:: An outline of how it works.
+* Extension Future Growth:: Some room for future growth.
+* Extension API Description:: A full description of the API.
+* Extension API Functions Introduction:: Introduction to the API functions.
+* General Data Types:: The data types.
+* Requesting Values:: How to get a value.
+* Constructor Functions:: Functions for creating values.
+* Registration Functions:: Functions to register things with
+ @command{gawk}.
+* Extension Functions:: Registering extension functions.
+* Exit Callback Functions:: Registering an exit callback.
+* Extension Version String:: Registering a version string.
+* Input Parsers:: Registering an input parser.
+* Output Wrappers:: Registering an output wrapper.
+* Two-way processors:: Registering a two-way processor.
+* Printing Messages:: Functions for printing messages.
+* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}.
+* Accessing Parameters:: Functions for accessing parameters.
+* Symbol Table Access:: Functions for accessing global
+ variables.
+* Symbol table by name:: Accessing variables by name.
+* Symbol table by cookie:: Accessing variables by ``cookie''.
+* Cached values:: Creating and using cached values.
+* Array Manipulation:: Functions for working with arrays.
+* Array Data Types:: Data types for working with arrays.
+* Array Functions:: Functions for working with arrays.
+* Flattening Arrays:: How to flatten arrays.
+* Creating Arrays:: How to create and populate arrays.
+* Extension API Variables:: Variables provided by the API.
+* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about
+ @command{gawk}'s invocation.
+* Extension API Boilerplate:: Boilerplate code for using the API.
+* Finding Extensions:: How @command{gawk} finds compiled
+ extensions.
+* Extension Example:: Example C code for an extension.
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
+* Extension Samples:: The sample extensions that ship with
+ @code{gawk}.
+* Extension Sample File Functions:: The file functions sample.
+* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
+* Extension Sample Fork:: An interface to @code{fork()} and
+ other process functions.
+* Extension Sample Ord:: Character to value to character
+ conversions.
+* Extension Sample Readdir:: An interface to @code{readdir()}.
+* Extension Sample Revout:: Reversing output sample output
+ wrapper.
+* Extension Sample Rev2way:: Reversing data sample two-way
+ processor.
+* Extension Sample Read write array:: Serializing an array to a file.
+* Extension Sample Readfile:: Reading an entire file into a string.
+* Extension Sample API Tests:: Tests for the API.
+* Extension Sample Time:: An interface to @code{gettimeofday()}
+ and @code{sleep()}.
+* gawkextlib:: The @code{gawkextlib} project.
+* V7/SVR3.1:: The major changes between V7 and
+ System V Release 3.1.
+* SVR4:: Minor changes between System V
+ Releases 3.1 and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from Brian Kernighan's
+ version of @command{awk}.
+* POSIX/GNU:: The extensions in @command{gawk} not
+ in POSIX @command{awk}.
+* Common Extensions:: Common Extensions Summary.
+* Ranges and Locales:: How locales used to affect regexp
+ ranges.
+* Contributors:: The major contributors to
+ @command{gawk}.
+* Gawk Distribution:: What is in the @command{gawk}
+ distribution.
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+* Unix Installation:: Installing @command{gawk} under
+ various versions of Unix.
+* Quick Installation:: Compiling @command{gawk} under Unix.
+* Additional Configuration Options:: Other compile-time options.
+* Configuration Philosophy:: How it's all supposed to work.
+* Non-Unix Installation:: Installation on Other Operating
+ Systems.
+* PC Installation:: Installing and Compiling
+ @command{gawk} on MS-DOS and OS/2.
+* PC Binary Installation:: Installing a prepared distribution.
+* PC Compiling:: Compiling @command{gawk} for MS-DOS,
+ Windows32, and OS/2.
+* PC Testing:: Testing @command{gawk} on PC systems.
+* PC Using:: Running @command{gawk} on MS-DOS,
+ Windows32 and OS/2.
+* Cygwin:: Building and running @command{gawk}
+ for Cygwin.
+* MSYS:: Using @command{gawk} In The MSYS
+ Environment.
+* VMS Installation:: Installing @command{gawk} on VMS.
+* VMS Compilation:: How to compile @command{gawk} under
+ VMS.
+* VMS Installation Details:: How to install @command{gawk} under
+ VMS.
+* VMS Running:: How to run @command{gawk} under VMS.
+* VMS Old Gawk:: An old version comes with some VMS
+ systems.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available @command{awk}
+ implementations.
+* Compatibility Mode:: How to disable certain @command{gawk}
+ extensions.
+* Additions:: Making Additions To @command{gawk}.
+* Accessing The Source:: Accessing the Git repository.
+* Adding Code:: Adding code to the main body of
+ @command{gawk}.
+* New Ports:: Porting @command{gawk} to a new
+ operating system.
+* Derived Files:: Why derived files are kept in the
+ @command{git} repository.
+* Future Extensions:: New features that may be implemented
+ one day.
+* Implementation Limitations:: Some limitations of the implementation.
+* Basic High Level:: The high level view.
+* Basic Data Typing:: A very quick intro to data types.
@end detailmenu
@end menu
@@ -1113,6 +1252,12 @@ expert should find useful. In particular, the description of POSIX
@ref{Sample Programs},
should be of interest.
+This @value{DOCUMENT} is split into several parts, as follows:
+
+Part I describes the @command{awk} language and @command{gawk} program in detail.
+It starts with the basics, and continues through all of the features of @command{awk}.
+It contains the following chapters:
+
@ref{Getting Started},
provides the essentials you need to know to begin using @command{awk}.
@@ -1156,6 +1301,22 @@ describes the built-in functions @command{awk} and
@command{gawk} provide, as well as how to define
your own functions.
+Part II shows how to use @command{awk} and @command{gawk} for problem solving.
+There is lots of code here for you to read and learn from.
+It contains the following chapters:
+
+@ref{Library Functions}, which provides a number of functions meant to
+be used from main @command{awk} programs.
+
+@ref{Sample Programs},
+which provides many sample @command{awk} programs.
+
+Reading these two chapters allows you to see @command{awk}
+solving real problems.
+
+Part III focuses on features specific to @command{gawk}.
+It contains the following chapters:
+
@ref{Internationalization},
describes special features in @command{gawk} for translating program
messages into different languages at runtime.
@@ -1167,14 +1328,18 @@ are the abilities to have two-way communications with another process,
perform TCP/IP networking, and
profile your @command{awk} programs.
-@ref{Library Functions}, and
-@ref{Sample Programs},
-provide many sample @command{awk} programs.
-Reading them allows you to see @command{awk}
-solving real problems.
+@ref{Debugger}, describes the @command{awk} debugger.
+
+@ref{Arbitrary Precision Arithmetic},
+describes advanced arithmetic facilities provided by
+@command{gawk}.
-@ref{Debugger}, describes the @command{awk} debugger,
-@command{dgawk}.
+@ref{Dynamic Extensions}, describes how to add new variables and
+functions to @command{gawk} by writing extensions in C.
+
+Part IV provides the appendices, the Glossary, and two licenses that cover
+the @command{gawk} source code and this @value{DOCUMENT}, respectively.
+It contains the following appendices:
@ref{Language History},
describes how the @command{awk} language has evolved since
@@ -1192,8 +1357,7 @@ available @command{awk} implementations.
@ref{Notes},
describes how to disable @command{gawk}'s extensions, as
well as how to contribute new code to @command{gawk},
-how to write extension libraries, and some possible
-future directions for @command{gawk} development.
+and some possible future directions for @command{gawk} development.
@ref{Basic Concepts},
provides some very cursory background material for those who
@@ -1603,10 +1767,13 @@ has been and continues to be a pleasure working with this team of fine
people.
John Haque contributed the modifications to convert @command{gawk}
-into a byte-code interpreter, including the debugger. Stephen Davies
+into a byte-code interpreter, including the debugger, and the
+additional modifications for support of arbitrary precision arithmetic.
+Stephen Davies
contributed to the effort to bring the byte-code changes into the mainstream
code base.
Efraim Yawitz contributed the initial text of @ref{Debugger}.
+John Haque contributed the initial text of @ref{Arbitrary Precision Arithmetic}.
@cindex Kernighan, Brian
I would like to thank Brian Kernighan for invaluable assistance during the
@@ -1634,12 +1801,14 @@ Nof Ayalon @*
ISRAEL @*
March, 2011
-@ignore
-@c Try this
@iftex
-@page
-@headings off
-@majorheading I@ @ @ @ The @command{awk} Language and @command{gawk}
+@part Part I:@* The @command{awk} Language
+@end iftex
+
+@ignore
+@ifdocbook
+@part Part I:@* The @command{awk} Language
+
Part I describes the @command{awk} language and @command{gawk} program in detail.
It starts with the basics, and continues through all of the features of @command{awk}
and @command{gawk}. It contains the following chapters:
@@ -1649,6 +1818,9 @@ and @command{gawk}. It contains the following chapters:
@ref{Getting Started}.
@item
+@ref{Invoking Gawk}.
+
+@item
@ref{Regexp}.
@item
@@ -1668,21 +1840,8 @@ and @command{gawk}. It contains the following chapters:
@item
@ref{Functions}.
-
-@item
-@ref{Internationalization}.
-
-@item
-@ref{Advanced Features}.
-
-@item
-@ref{Invoking Gawk}.
@end itemize
-
-@page
-@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
-@oddheading @| @| @strong{@thischapter}@ @ @ @thispage
-@end iftex
+@end ifdocbook
@end ignore
@node Getting Started
@@ -2913,6 +3072,7 @@ things in this @value{CHAPTER} that don't interest you right now.
* Environment Variables:: The environment variables @command{gawk} uses.
* Exit Status:: @command{gawk}'s exit status.
* Include Files:: Including other files into your program.
+* Loading Shared Libraries:: Loading shared libraries into your program.
* Obsolete:: Obsolete Options and/or features.
* Undocumented:: Undocumented Options and Features.
@end menu
@@ -3004,6 +3164,22 @@ This option may be given multiple times; the @command{awk}
program consists of the concatenation the contents of
each specified @var{source-file}.
+@item -i @var{source-file}
+@itemx --include @var{source-file}
+@cindex @code{-i} option
+@cindex @code{--include} option
+@cindex @command{awk} programs, location of
+Read @command{awk} source library from @var{source-file}. This option is
+completely equivalent to using the @samp{@@include} directive inside
+your program. This option is very
+similar to the @option{-f} option, but there are two important differences.
+First, when @option{-i} is used, the program source will not be loaded if it has
+been previously loaded, whereas the @option{-f} will always load the file.
+Second, because this option is intended to be used with code libraries, the
+@command{awk} command does not recognize such files as constituting main program
+input. Thus, after processing an @option{-i} argument, we still expect to
+find the main source code via the @option{-f} option or on the command-line.
+
@item -v @var{var}=@var{val}
@itemx --assign @var{var}=@var{val}
@cindex @code{-v} option
@@ -3115,6 +3291,19 @@ inadvertently use global variables that you meant to be local.
(This is a particularly easy mistake to make with simple variable
names like @code{i}, @code{j}, etc.)
+@item -D@r{[}@var{file}@r{]}
+@itemx --debug=@r{[}@var{file}@r{]}
+@cindex @code{-D} option
+@cindex @code{--debug} option
+@cindex @command{awk} debugging, enabling
+Enable debugging of @command{awk} programs
+(@pxref{Debugging}).
+By default, the debugger reads commands interactively from the terminal.
+The optional @var{file} argument allows you to specify a file with a list
+of commands for the debugger to execute non-interactively.
+No space is allowed between the @option{-D} and @var{file}, if
+@var{file} is supplied.
+
@item -e @var{program-text}
@itemx --source @var{program-text}
@cindex @code{-e} option
@@ -3180,6 +3369,18 @@ for information about this option.
Print a ``usage'' message summarizing the short and long style options
that @command{gawk} accepts and then exit.
+@item -l @var{lib}
+@itemx --load @var{lib}
+@cindex @code{-l} option
+@cindex @code{--load} option
+@cindex loading, library
+Load a shared library @var{lib}. This searches for the library using the @env{AWKLIBPATH}
+environment variable. The correct library suffix for your platform will be
+supplied by default, so it need not be specified in the library name.
+The library initialization routine should be named @code{dl_load()}.
+An alternative is to use the @samp{@@load} keyword inside the program to load
+a shared library.
+
@item -L @r{[}value@r{]}
@itemx --lint@r{[}=value@r{]}
@cindex @code{-l} option
@@ -3203,6 +3404,14 @@ when eliminating problems pointed out by @option{--lint}, you should take
care to search for all occurrences of each inappropriate construct. As
@command{awk} programs are usually short, doing so is not burdensome.
+@item -M
+@itemx --bignum
+@cindex @code{-M} option
+@cindex @code{--bignum} option
+Force arbitrary precision arithmetic on numbers. This option has no effect
+if @command{gawk} is not compiled to use the GNU MPFR and MP libraries
+(@pxref{Arbitrary Precision Arithmetic}).
+
@item -n
@itemx --non-decimal-data
@cindex @code{-n} option
@@ -3226,6 +3435,18 @@ Use with care.
Force the use of the locale's decimal point character
when parsing numeric input data (@pxref{Locales}).
+@item -o@r{[}@var{file}@r{]}
+@itemx --pretty-print@r{[}=@var{file}@r{]}
+@cindex @code{-o} option
+@cindex @code{--pretty-print} option
+@cindex @command{awk} enabling
+Enable pretty-printing of @command{awk} programs.
+By default, output program is created in a file named @file{awkprof.out}.
+The optional @var{file} argument allows you to specify a different
+@value{FN} for the output.
+No space is allowed between the @option{-o} and @var{file}, if
+@var{file} is supplied.
+
@item -O
@itemx --optimize
@cindex @code{--optimize} option
@@ -3238,7 +3459,7 @@ maintainer hopes to add more optimizations over time.
@itemx --profile@r{[}=@var{file}@r{]}
@cindex @code{-p} option
@cindex @code{--profile} option
-@cindex @command{awk} programs, profiling, enabling
+@cindex @command{awk} profiling, enabling
Enable profiling of @command{awk} programs
(@pxref{Profiling}).
By default, profiles are created in a file named @file{awkprof.out}.
@@ -3247,10 +3468,8 @@ The optional @var{file} argument allows you to specify a different
No space is allowed between the @option{-p} and @var{file}, if
@var{file} is supplied.
-When run with @command{gawk}, the profile is just a ``pretty printed'' version
-of the program. When run with @command{pgawk}, the profile contains execution
-counts for each statement in the program in the left margin, and function
-call counts for each function.
+The profile contains execution counts for each statement in the program
+in the left margin, and function call counts for each function.
@item -P
@itemx --posix
@@ -3314,14 +3533,6 @@ This is now @command{gawk}'s default behavior.
Nevertheless, this option remains both for backward compatibility,
and for use in combination with the @option{--traditional} option.
-@item -R @var{file}
-@itemx --command=@var{file}
-@cindex @code{-R} option
-@cindex @code{--command} option
-@command{dgawk} only.
-Read @command{dgawk} debugger options and commands from @var{file}.
-@xref{Dgawk Info}, for more information.
-
@item -S
@itemx --sandbox
@cindex @code{-S} option
@@ -3547,6 +3758,8 @@ behaves.
@menu
* AWKPATH Variable:: Searching directories for @command{awk}
programs.
+* AWKLIBPATH Variable:: Searching directories for @command{awk} shared
+ libraries.
* Other Environment Variables:: The environment variables.
@end menu
@@ -3564,7 +3777,8 @@ on the command-line with the @option{-f} option.
In most @command{awk}
implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.
-But in @command{gawk}, if the @value{FN} supplied to the @option{-f} option
+But in @command{gawk}, if the @value{FN} supplied to the @option{-f}
+or @option{-i} options
does not contain a @samp{/}, then @command{gawk} searches a list of
directories (called the @dfn{search path}), one by one, looking for a
file with the specified name.
@@ -3586,13 +3800,16 @@ standard directory in the default path and then specified on
the command line with a short @value{FN}. Otherwise, the full @value{FN}
would have to be typed for each file.
-By using both the @option{--source} and @option{-f} options, your command-line
+By using the @option{-i} option, or the @option{--source} and @option{-f} options, your command-line
@command{awk} programs can use facilities in @command{awk} library files
(@pxref{Library Functions}).
Path searching is not done if @command{gawk} is in compatibility mode.
This is true for both @option{--traditional} and @option{--posix}.
@xref{Options}.
+If the source code is not found after the initial search, the path is searched
+again after adding the default @samp{.awk} suffix to the filename.
+
@quotation NOTE
To include
the current directory in the path, either place
@@ -3622,6 +3839,21 @@ sense: the @env{AWKPATH} environment variable is used to find the program
source files. Once your program is running, all the files have been
found, and @command{gawk} no longer needs to use @env{AWKPATH}.
+@node AWKLIBPATH Variable
+@subsection The @env{AWKLIBPATH} Environment Variable
+@cindex @env{AWKLIBPATH} environment variable
+@cindex directories, searching
+@cindex search paths
+@cindex search paths, for shared libraries
+@cindex differences in @command{awk} and @command{gawk}, @code{AWKLIBPATH} environment variable
+
+The @env{AWKLIBPATH} environment variable is similar to the @env{AWKPATH}
+variable, but it is used to search for shared libraries specified
+with the @option{-l} option rather than for source files. If the library
+is not found, the path is searched again after adding the appropriate
+shared library suffix for the platform. For example, on GNU/Linux systems,
+the suffix @samp{.so} is used.
+
@node Other Environment Variables
@subsection Other Environment Variables
@@ -3645,6 +3877,11 @@ Specifies the interval between connection retries,
in milliseconds. On systems that do not support
the @code{usleep()} system call,
the value is rounded up to an integral number of seconds.
+
+@item GAWK_READ_TIMEOUT
+Specifies the time, in milliseconds, for @command{gawk} to
+wait for input before returning with an error.
+@xref{Read Timeout}.
@end table
The environment variables in the following list are meant
@@ -3718,8 +3955,9 @@ into smaller, more manageable pieces, and also lets you reuse common @command{aw
code from various @command{awk} scripts. In other words, you can group
together @command{awk} functions, used to carry out specific tasks,
into external files. These files can be used just like function libraries,
-using the @samp{@@include} keyword in conjunction with the @code{AWKPATH}
-environment variable.
+using the @samp{@@include} keyword in conjunction with the @env{AWKPATH}
+environment variable. Note that source files may also be included
+using the @option{-i} option.
Let's see an example.
We'll start with two (trivial) @command{awk} scripts, namely
@@ -3825,6 +4063,41 @@ As mentioned in @ref{AWKPATH Variable}, the current directory is always
searched first for source files, before searching in @env{AWKPATH},
and this also applies to files named with @samp{@@include}.
+@node Loading Shared Libraries
+@section Loading Shared Libraries Into Your Program
+
+This @value{SECTION} describes a feature that is specific to @command{gawk}.
+
+The @samp{@@load} keyword can be used to read external @command{awk} shared
+libraries. This allows you to link in compiled code that may offer superior
+performance and/or give you access to extended capabilities not supported
+by the @command{awk} language. The @env{AWKLIBPATH} variable is used to
+search for the shared library. Using @samp{@@load} is completely equivalent
+to using the @option{-l} command-line option.
+
+If the shared library is not initially found in @env{AWKLIBPATH}, another
+search is conducted after appending the platform's default shared library
+suffix to the filename. For example, on GNU/Linux systems, the suffix
+@samp{.so} is used.
+
+@example
+$ @kbd{gawk '@@load "ordchr"; BEGIN @{print chr(65)@}'}
+@print{} A
+@end example
+
+@noindent
+This is equivalent to the following example:
+
+@example
+$ @kbd{gawk -lordchr 'BEGIN @{print chr(65)@}'}
+@print{} A
+@end example
+
+@noindent
+For command-line usage, the @option{-l} option is more convenient,
+but @samp{@@load} is useful for embedding inside an @command{awk} source file
+that requires access to a shared library.
+
@node Obsolete
@section Obsolete Options and/or Features
@@ -3921,31 +4194,6 @@ long-undocumented ``feature'' of Unix @code{awk}.
@end ignore
-@ignore
-@c Try this
-@iftex
-@page
-@headings off
-@majorheading II@ @ @ Using @command{awk} and @command{gawk}
-Part II shows how to use @command{awk} and @command{gawk} for problem solving.
-There is lots of code here for you to read and learn from.
-It contains the following chapters:
-
-@itemize @bullet
-@item
-@ref{Library Functions}.
-
-@item
-@ref{Sample Programs}.
-
-@end itemize
-
-@page
-@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
-@oddheading @| @| @strong{@thischapter}@ @ @ @thispage
-@end iftex
-@end ignore
-
@node Regexp
@chapter Regular Expressions
@cindex regexp, See regular expressions
@@ -5131,6 +5379,7 @@ used with it do not have to be named on the @command{awk} command line
* Multiple Line:: Reading multi-line records.
* Getline:: Reading files under explicit program control
using the @code{getline} function.
+* Read Timeout:: Reading input with a timeout.
* Command line directories:: What happens if you put a directory on the
command line.
@end menu
@@ -7224,9 +7473,10 @@ know that there is a string value to be assigned. Caveat Emptor.
summarizes the eight variants of @code{getline},
listing which built-in variables are set by each one,
and whether the variant is standard or a @command{gawk} extension.
+Note: for each variant, @command{gawk} sets the @code{RT} built-in variable.
@float Table,table-getline-variants
-@caption{getline Variants and What They Set}
+@caption{@code{getline} Variants and What They Set}
@multitable @columnfractions .33 .38 .27
@headitem Variant @tab Effect @tab Standard / Extension
@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR}, and @code{NR} @tab Standard
@@ -7243,6 +7493,110 @@ and whether the variant is standard or a @command{gawk} extension.
@c ENDOFRANGE inex
@c ENDOFRANGE infir
+@node Read Timeout
+@section Reading Input With A Timeout
+@cindex timeout, reading input
+
+You may specify a timeout in milliseconds for reading input from a terminal,
+pipe or two-way communication including, TCP/IP sockets. This can be done
+on a per input, command or connection basis, by setting a special element
+in the @code{PROCINFO} array:
+
+@example
+PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds}
+@end example
+
+When set, this will cause @command{gawk} to time out and return failure
+if no data is available to read within the specified timeout period.
+For example, a TCP client can decide to give up on receiving
+any response from the server after a certain amount of time:
+
+@example
+Service = "/inet/tcp/0/localhost/daytime"
+PROCINFO[Service, "READ_TIMEOUT"] = 100
+if ((Service |& getline) > 0)
+ print $0
+else if (ERRNO != "")
+ print ERRNO
+@end example
+
+Here is how to read interactively from the terminal@footnote{This assumes
+that standard input is the keyboard} without waiting
+for more than five seconds:
+
+@example
+PROCINFO["/dev/stdin", "READ_TIMEOUT"] = 5000
+while ((getline < "/dev/stdin") > 0)
+ print $0
+@end example
+
+@command{gawk} will terminate the read operation if input does not
+arrive after waiting for the timeout period, return failure
+and set the @code{ERRNO} variable to an appropriate string value.
+A negative or zero value for the timeout is the same as specifying
+no timeout at all.
+
+A timeout can also be set for reading from the terminal in the implicit
+loop that reads input records and matches them against patterns,
+like so:
+
+@example
+$ @kbd{ gawk 'BEGIN @{ PROCINFO["-", "READ_TIMEOUT"] = 5000 @}}
+> @kbd{@{ print "You entered: " $0 @}'}
+@kbd{gawk}
+@print{} You entered: gawk
+@end example
+
+In this case, failure to respond within five seconds results in the following
+error message:
+
+@example
+@error{} gawk: cmd. line:2: (FILENAME=- FNR=1) fatal: error reading input file `-': Connection timed out
+@end example
+
+The timeout can be set or changed at any time, and will take effect on the
+next attempt to read from the input device. In the following example,
+we start with a timeout value of one second, and progressively
+reduce it by one-tenth of a second until we wait indefinitely
+for the input to arrive:
+
+@example
+PROCINFO[Service, "READ_TIMEOUT"] = 1000
+while ((Service |& getline) > 0) @{
+ print $0
+ PROCINFO[S, "READ_TIMEOUT"] -= 100
+@}
+@end example
+
+@quotation NOTE
+You should not assume that the read operation will block
+exactly after the tenth record has been printed. It is possible that
+@command{gawk} will read and buffer more than one record's
+worth of data the first time. Because of this, changing the value
+of timeout like in the above example is not very useful.
+@end quotation
+
+If the @code{PROCINFO} element is not present and the environment
+variable @env{GAWK_READ_TIMEOUT} exists,
+@command{gawk} uses its value to initialize the timeout value.
+The exclusive use of the environment variable to specify timeout
+has the disadvantage of not being able to control it
+on a per command or connection basis.
+
+@command{gawk} considers a timeout event to be an error even though
+the attempt to read from the underlying device may
+succeed in a later attempt. This is a limitation, and it also
+means that you cannot use this to multiplex input from
+two or more sources.
+
+Assigning a timeout value prevents read operations from
+blocking indefinitely. But bear in mind that there are other ways
+@command{gawk} can stall waiting for an input device to be ready.
+A network client can sometimes take a long time to establish
+a connection before it can start reading any data,
+or the attempt to open a FIFO special file for reading can block
+indefinitely until some other process opens it for writing.
+
@node Command line directories
@section Directories On The Command Line
@cindex directories, command line
@@ -11361,9 +11715,9 @@ fatal error.
@item
If you have written extensions that modify the record handling (by inserting
-an ``open hook''), you can invoke them at this point, before @command{gawk}
+an ``input parser''), you can invoke them at this point, before @command{gawk}
has started processing the file. (This is a @emph{very} advanced feature,
-currently used only by the @uref{http://xmlgawk.sourceforge.net, XMLgawk project}.)
+currently used only by the @uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.)
@end itemize
The @code{ENDFILE} rule is called when @command{gawk} has finished processing
@@ -12525,6 +12879,18 @@ This is the output record separator. It is output at the end of every
@code{print} statement. Its default value is @code{"\n"}, the newline
character. (@xref{Output Separators}.)
+@cindex @code{PREC} variable
+@item PREC #
+The working precision of arbitrary precision floating-point numbers,
+53 by default (@pxref{Setting Precision}).
+
+@cindex @code{ROUNDMODE} variable
+@item ROUNDMODE #
+The rounding mode to use for arbitrary precision arithmetic on
+numbers, by default @code{"N"} (@samp{roundTiesToEven} in
+the IEEE-754 standard)
+(@pxref{Setting Rounding Mode}).
+
@cindex @code{RS} variable
@cindex separators, for records
@cindex record separators
@@ -12671,7 +13037,9 @@ does not affect the environment passed on to any programs that
Some operating systems may not have environment variables.
On such systems, the @code{ENVIRON} array is empty (except for
@w{@code{ENVIRON["AWKPATH"]}},
-@pxref{AWKPATH Variable}).
+@pxref{AWKPATH Variable} and
+@w{@code{ENVIRON["AWKLIBPATH"]}},
+@pxref{AWKLIBPATH Variable}).
@cindex @command{gawk}, @code{ERRNO} variable in
@cindex @code{ERRNO} variable
@@ -12747,6 +13115,16 @@ assigning a value to @code{NF} has the potential to affect
to @code{NF} can be used to create or remove fields from the
current record. @xref{Changing Fields}.
+@cindex @code{FUNCTAB} array
+@cindex @command{gawk}, @code{FUNCTAB} array in
+@cindex differences in @command{awk} and @command{gawk}, @code{FUNCTAB} variable
+@item FUNCTAB #
+An array whose indices are the names of all the user-defined
+or extension functions in the program.
+@strong{NOTE}: The array values cannot currently be used.
+Also, you may not use the @code{delete} statement with the
+@code{FUNCTAB} array.
+
@cindex @code{NR} variable
@item NR
The number of input records @command{awk} has processed since
@@ -12776,6 +13154,34 @@ This is
@code{"FIELDWIDTHS"} if field splitting with @code{FIELDWIDTHS} is in effect,
or @code{"FPAT"} if field matching with @code{FPAT} is in effect.
+@item PROCINFO["identifiers"]
+A subarray, indexed by the names of all identifiers used in the
+text of the AWK program. For each identifier, the value of the element is one of the following:
+
+@table @code
+@item "array"
+The identifier is an array.
+
+@item "extension"
+The identifier is an extension function loaded via
+@code{@@load}.
+
+@item "scalar"
+The identifier is a scalar.
+
+@item "untyped"
+The identifier is untyped (could be used as a scalar or array,
+@command{gawk} doesn't know yet).
+
+@item "user"
+The identifier is a user-defined function.
+@end table
+
+@noindent
+The values indicate what @command{gawk} knows about the identifiers
+after it has finished parsing the program; they are @emph{not} updated
+while the program runs.
+
@item PROCINFO["gid"]
The value of the @code{getgid()} system call.
@@ -12808,6 +13214,25 @@ The value of the @code{getuid()} system call.
The version of @command{gawk}.
@end table
+The following additional elements in the array
+are available to provide information about the MPFR and GMP libraries
+if your version of @command{gawk} supports arbitrary precision numbers
+(@pxref{Arbitrary Precision Arithmetic}):
+
+@table @code
+@item PROCINFO["mpfr_version"]
+The version of the GNU MPFR library.
+
+@item PROCINFO["gmp_version"]
+The version of the GNU MP library.
+
+@item PROCINFO["prec_max"]
+The maximum precision supported by MPFR.
+
+@item PROCINFO["prec_min"]
+The minimum precision required by MPFR.
+@end table
+
On some systems, there may be elements in the array, @code{"group1"}
through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of
supplementary groups that the process has. Use the @code{in} operator
@@ -12855,6 +13280,57 @@ In other @command{awk} implementations,
or if @command{gawk} is in compatibility mode
(@pxref{Options}),
it is not special.
+
+@cindex @command{gawk}, @code{SYMTAB} array in
+@cindex @code{SYMTAB} array
+@cindex differences in @command{awk} and @command{gawk}, @code{SYMTAB} variable
+@item SYMTAB #
+An array whose indices are the names of all currently defined
+global variables and arrays in the program. The array may be used
+for indirect access to read or write the value of a variable:
+
+@example
+foo = 5
+SYMTAB["foo"] = 4
+print foo # prints 4
+@end example
+
+@noindent
+The @code{isarray()} function (@pxref{Type Functions}) may be used to test
+if an element in @code{SYMTAB} is an array.
+Also, you may not use the @code{delete} statement with the
+@code{SYMTAB} array.
+
+You may use an index for @code{SYMTAB} that is not a predefined identifer:
+
+@example
+SYMTAB["xxx"] = 5
+print SYMTAB["xxx"]
+@end example
+
+@noindent
+This works as expected: in this case @code{SYMTAB} acts just like
+a regular array. The only difference is that you can't then delete
+@code{SYMTAB["xxx"]}.
+
+The @code{SYMTAB} array is more interesting than it looks. Andrew Schorr
+points out that it effectively gives @command{awk} data pointers. Consider his
+example:
+
+@example
+# Indirect multiply of any variable by amount, return result
+
+function multiply(variable, amount)
+@{
+ return SYMTAB[variable] *= amount
+@}
+@end example
+
+@quotation NOTE
+In order to avoid severe time-travel paradoxes@footnote{Not to mention difficult
+implementation issues.}, neither @code{FUNCTAB} nor @code{SYMTAB}
+are available as elements within the @code{SYMTAB} array.
+@end quotation
@end table
@c ENDOFRANGE bvconi
@c ENDOFRANGE vbconi
@@ -14271,13 +14747,7 @@ Optional parameters are enclosed in square brackets@w{ ([ ]):}
@item atan2(@var{y}, @var{x})
@cindex @code{atan2()} function
Return the arctangent of @code{@var{y} / @var{x}} in radians.
-You can use @samp{pi = atan2(0, -1)} to retrieve the value of
-@tex
-$\pi$.
-@end tex
-@ifnottex
-pi.
-@end ifnottex
+You can use @samp{pi = atan2(0, -1)} to retrieve the value of @value{PI}.
@item cos(@var{x})
@cindex @code{cos()} function
@@ -15149,22 +15619,23 @@ through unchanged. This is illustrated in @ref{table-sub-escapes}.
@caption{Historical Escape Sequence Processing for @code{sub()} and @code{gsub()}}
@tex
\vbox{\bigskip
-% This table has lots of &'s and \'s, so unspecialize them.
+% We need more characters for escape and tab ...
+\catcode`_ = 0
+\catcode`! = 4
+% ... since this table has lots of &'s and \'s, so we unspecialize them.
\catcode`\& = \other \catcode`\\ = \other
-% But then we need character for escape and tab.
-@catcode`! = 4
-@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
- You type!@code{sub()} sees!@code{sub()} generates@cr
-@hrulefill!@hrulefill!@hrulefill@cr
- @code{\&}! @code{&}!the matched text@cr
- @code{\\&}! @code{\&}!a literal @samp{&}@cr
- @code{\\\&}! @code{\&}!a literal @samp{&}@cr
-@code{\\\\&}! @code{\\&}!a literal @samp{\&}@cr
-@code{\\\\\&}! @code{\\&}!a literal @samp{\&}@cr
-@code{\\\\\\&}! @code{\\\&}!a literal @samp{\\&}@cr
- @code{\\q}! @code{\q}!a literal @samp{\q}@cr
+_halign{_hfil#!_qquad_hfil#!_qquad#_hfil_cr
+ You type!@code{sub()} sees!@code{sub()} generates_cr
+_hrulefill!_hrulefill!_hrulefill_cr
+ @code{\&}! @code{&}!the matched text_cr
+ @code{\\&}! @code{\&}!a literal @samp{&}_cr
+ @code{\\\&}! @code{\&}!a literal @samp{&}_cr
+ @code{\\\\&}! @code{\\&}!a literal @samp{\&}_cr
+ @code{\\\\\&}! @code{\\&}!a literal @samp{\&}_cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\\&}_cr
+ @code{\\q}! @code{\q}!a literal @samp{\q}_cr
}
-@bigskip}
+_bigskip}
@end tex
@ifdocbook
@multitable @columnfractions .20 .20 .60
@@ -15214,21 +15685,22 @@ output literally. The interpretation of @samp{\} and @samp{&} then becomes
as shown in @ref{table-sub-posix-92}.
@float Table,table-sub-posix-92
-@caption{1992 POSIX Rules for sub and gsub Escape Sequence Processing}
+@caption{1992 POSIX Rules for @code{sub()} and @code{gsub()} Escape Sequence Processing}
@c thanks to Karl Berry for formatting this table
@tex
\vbox{\bigskip
-% This table has lots of &'s and \'s, so unspecialize them.
+% We need more characters for escape and tab ...
+\catcode`_ = 0
+\catcode`! = 4
+% ... since this table has lots of &'s and \'s, so we unspecialize them.
\catcode`\& = \other \catcode`\\ = \other
-% But then we need character for escape and tab.
-@catcode`! = 4
-@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
- You type!@code{sub()} sees!@code{sub()} generates@cr
-@hrulefill!@hrulefill!@hrulefill@cr
- @code{&}! @code{&}!the matched text@cr
- @code{\\&}! @code{\&}!a literal @samp{&}@cr
-@code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text@cr
-@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
+_halign{_hfil#!_qquad_hfil#!_qquad#_hfil_cr
+ You type!@code{sub()} sees!@code{sub()} generates_cr
+_hrulefill!_hrulefill!_hrulefill_cr
+ @code{&}! @code{&}!the matched text_cr
+ @code{\\&}! @code{\&}!a literal @samp{&}_cr
+@code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text_cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}_cr
}
@bigskip}
@end tex
@@ -15283,23 +15755,24 @@ to produce a @samp{\} preceding the matched text. This is shown in
@ref{table-sub-proposed}.
@float Table,table-sub-proposed
-@caption{Proposed rules for sub and backslash}
+@caption{Proposed Rules For @code{sub()} And Backslash}
@tex
\vbox{\bigskip
-% This table has lots of &'s and \'s, so unspecialize them.
+% We need more characters for escape and tab ...
+\catcode`_ = 0
+\catcode`! = 4
+% ... since this table has lots of &'s and \'s, so we unspecialize them.
\catcode`\& = \other \catcode`\\ = \other
-% But then we need character for escape and tab.
-@catcode`! = 4
-@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
- You type!@code{sub()} sees!@code{sub()} generates@cr
-@hrulefill!@hrulefill!@hrulefill@cr
-@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
-@code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text@cr
- @code{\\&}! @code{\&}!a literal @samp{&}@cr
- @code{\\q}! @code{\q}!a literal @samp{\q}@cr
- @code{\\\\}! @code{\\}!@code{\\}@cr
+_halign{_hfil#!_qquad_hfil#!_qquad#_hfil_cr
+ You type!@code{sub()} sees!@code{sub()} generates_cr
+_hrulefill!_hrulefill!_hrulefill_cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}_cr
+@code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text_cr
+ @code{\\&}! @code{\&}!a literal @samp{&}_cr
+ @code{\\q}! @code{\q}!a literal @samp{\q}_cr
+ @code{\\\\}! @code{\\}!@code{\\}_cr
}
-@bigskip}
+_bigskip}
@end tex
@ifdocbook
@multitable @columnfractions .20 .20 .60
@@ -15345,23 +15818,24 @@ by anything else is not special; the @samp{\} is placed straight into the output
These rules are presented in @ref{table-posix-sub}.
@float Table,table-posix-sub
-@caption{POSIX rules for @code{sub()} and @code{gsub()}}
+@caption{POSIX Rules For @code{sub()} And @code{gsub()}}
@tex
\vbox{\bigskip
-% This table has lots of &'s and \'s, so unspecialize them.
+% We need more characters for escape and tab ...
+\catcode`_ = 0
+\catcode`! = 4
+% ... since this table has lots of &'s and \'s, so we unspecialize them.
\catcode`\& = \other \catcode`\\ = \other
-% But then we need character for escape and tab.
-@catcode`! = 4
-@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
- You type!@code{sub()} sees!@code{sub()} generates@cr
-@hrulefill!@hrulefill!@hrulefill@cr
-@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
-@code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text@cr
- @code{\\&}! @code{\&}!a literal @samp{&}@cr
- @code{\\q}! @code{\q}!a literal @samp{\q}@cr
- @code{\\\\}! @code{\\}!@code{\}@cr
+_halign{_hfil#!_qquad_hfil#!_qquad#_hfil_cr
+ You type!@code{sub()} sees!@code{sub()} generates_cr
+_hrulefill!_hrulefill!_hrulefill_cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}_cr
+@code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text_cr
+ @code{\\&}! @code{\&}!a literal @samp{&}_cr
+ @code{\\q}! @code{\q}!a literal @samp{\q}_cr
+ @code{\\\\}! @code{\\}!@code{\}_cr
}
-@bigskip}
+_bigskip}
@end tex
@ifdocbook
@multitable @columnfractions .20 .20 .60
@@ -15396,7 +15870,7 @@ when @option{--posix} is specified (@pxref{Options}). Otherwise,
it continued to follow the 1996 proposed rules, since
that had been its behavior for many years.
-When @value{PVERSION} 4.0.0, was released, the @command{gawk} maintainer
+When @value{PVERSION} 4.0.0 was released, the @command{gawk} maintainer
made the POSIX rules the default, breaking well over a decade's worth
of backwards compatibility.@footnote{This was rather naive of him, despite
there being a note in this section indicating that the next major version
@@ -15413,24 +15887,25 @@ appears in the generated text and the @samp{\} does not,
as shown in @ref{table-gensub-escapes}.
@float Table,table-gensub-escapes
-@caption{Escape Sequence Processing for @code{gensub()}}
+@caption{Escape Sequence Processing For @code{gensub()}}
@tex
\vbox{\bigskip
-% This table has lots of &'s and \'s, so unspecialize them.
+% We need more characters for escape and tab ...
+\catcode`_ = 0
+\catcode`! = 4
+% ... since this table has lots of &'s and \'s, so we unspecialize them.
\catcode`\& = \other \catcode`\\ = \other
-% But then we need character for escape and tab.
-@catcode`! = 4
-@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
- You type!@code{gensub()} sees!@code{gensub()} generates@cr
-@hrulefill!@hrulefill!@hrulefill@cr
- @code{&}! @code{&}!the matched text@cr
- @code{\\&}! @code{\&}!a literal @samp{&}@cr
- @code{\\\\}! @code{\\}!a literal @samp{\}@cr
- @code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text@cr
-@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
- @code{\\q}! @code{\q}!a literal @samp{q}@cr
+_halign{_hfil#!_qquad_hfil#!_qquad#_hfil_cr
+ You type!@code{gensub()} sees!@code{gensub()} generates_cr
+_hrulefill!_hrulefill!_hrulefill_cr
+ @code{&}! @code{&}!the matched text_cr
+ @code{\\&}! @code{\&}!a literal @samp{&}_cr
+ @code{\\\\}! @code{\\}!a literal @samp{\}_cr
+ @code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text_cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}_cr
+ @code{\\q}! @code{\q}!a literal @samp{q}_cr
}
-@bigskip}
+_bigskip}
@end tex
@ifdocbook
@multitable @columnfractions .20 .20 .60
@@ -16265,8 +16740,8 @@ bitwise operations just described. They are:
@cindex @command{gawk}, bitwise operations in
@table @code
@cindex @code{and()} function (@command{gawk})
-@item and(@var{v1}, @var{v2})
-Return the bitwise AND of the values provided by @var{v1} and @var{v2}.
+@item and(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
+Return the bitwise AND of the arguments. There must be at least two.
@cindex @code{compl()} function (@command{gawk})
@item compl(@var{val})
@@ -16277,16 +16752,16 @@ Return the bitwise complement of @var{val}.
Return the value of @var{val}, shifted left by @var{count} bits.
@cindex @code{or()} function (@command{gawk})
-@item or(@var{v1}, @var{v2})
-Return the bitwise OR of the values provided by @var{v1} and @var{v2}.
+@item or(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
+Return the bitwise OR of the arguments. There must be at least two.
@cindex @code{rshift()} function (@command{gawk})
@item rshift(@var{val}, @var{count})
Return the value of @var{val}, shifted right by @var{count} bits.
@cindex @code{xor()} function (@command{gawk})
-@item xor(@var{v1}, @var{v2})
-Return the bitwise XOR of the values provided by @var{v1} and @var{v2}.
+@item xor(@var{v1}, @var{v2} @r{[}, @r{@dots{}]})
+Return the bitwise XOR of the arguments. There must be at least two.
@end table
For all of these functions, first the double precision floating-point value is
@@ -16851,6 +17326,47 @@ foo's i=1
top's i=10
@end example
+Besides scalar values (strings and numbers), you may also have
+local arrays. By using a parameter name as an array, @command{awk}
+treats it as an array, and it is local to the function.
+In addition, recursive calls create new arrays.
+Consider this example:
+
+@example
+function some_func(p1, a)
+@{
+ if (p1++ > 3)
+ return
+
+ a[p1] = p1
+
+ some_func(p1)
+
+ printf("At level %d, index %d %s found in a\n",
+ p1, (p1 - 1), (p1 - 1) in a ? "is" : "is not")
+ printf("At level %d, index %d %s found in a\n",
+ p1, p1, p1 in a ? "is" : "is not")
+ print ""
+@}
+
+BEGIN @{
+ some_func(1)
+@}
+@end example
+
+When run, this program produces the following output:
+
+@example
+At level 4, index 3 is not found in a
+At level 4, index 4 is found in a
+
+At level 3, index 2 is not found in a
+At level 3, index 3 is found in a
+
+At level 2, index 1 is not found in a
+At level 2, index 2 is found in a
+@end example
+
@node Pass By Value/Reference
@subsubsection Passing Function Arguments By Value Or By Reference
@@ -17456,1899 +17972,28 @@ for (i = 1; i <= n; i++)
@c ENDOFRANGE funcud
-@node Internationalization
-@chapter Internationalization with @command{gawk}
-
-Once upon a time, computer makers
-wrote software that worked only in English.
-Eventually, hardware and software vendors noticed that if their
-systems worked in the native languages of non-English-speaking
-countries, they were able to sell more systems.
-As a result, internationalization and localization
-of programs and software systems became a common practice.
-
-@c STARTOFRANGE inloc
-@cindex internationalization, localization
-@cindex @command{gawk}, internationalization and, See internationalization
-@cindex internationalization, localization, @command{gawk} and
-For many years, the ability to provide internationalization
-was largely restricted to programs written in C and C++.
-This @value{CHAPTER} describes the underlying library @command{gawk}
-uses for internationalization, as well as how
-@command{gawk} makes internationalization
-features available at the @command{awk} program level.
-Having internationalization available at the @command{awk} level
-gives software developers additional flexibility---they are no
-longer forced to write in C or C++ when internationalization is
-a requirement.
-
-@menu
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU @code{gettext} works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: @command{gawk} is also internationalized.
-@end menu
-
-@node I18N and L10N
-@section Internationalization and Localization
-
-@cindex internationalization
-@cindex localization, See internationalization@comma{} localization
-@cindex localization
-@dfn{Internationalization} means writing (or modifying) a program once,
-in such a way that it can use multiple languages without requiring
-further source-code changes.
-@dfn{Localization} means providing the data necessary for an
-internationalized program to work in a particular language.
-Most typically, these terms refer to features such as the language
-used for printing error messages, the language used to read
-responses, and information related to how numerical and
-monetary values are printed and read.
-
-@node Explaining gettext
-@section GNU @code{gettext}
-
-@cindex internationalizing a program
-@c STARTOFRANGE gettex
-@cindex @code{gettext} library
-The facilities in GNU @code{gettext} focus on messages; strings printed
-by a program, either directly or via formatting with @code{printf} or
-@code{sprintf()}.@footnote{For some operating systems, the @command{gawk}
-port doesn't support GNU @code{gettext}.
-Therefore, these features are not available
-if you are using one of those operating systems. Sorry.}
-
-@cindex portability, @code{gettext} library and
-When using GNU @code{gettext}, each application has its own
-@dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk},
-that identifies the application.
-A complete application may have multiple components---programs written
-in C or C++, as well as scripts written in @command{sh} or @command{awk}.
-All of the components use the same text domain.
-
-To make the discussion concrete, assume we're writing an application
-named @command{guide}. Internationalization consists of the
-following steps, in this order:
-
-@enumerate
-@item
-The programmer goes
-through the source for all of @command{guide}'s components
-and marks each string that is a candidate for translation.
-For example, @code{"`-F': option required"} is a good candidate for translation.
-A table with strings of option names is not (e.g., @command{gawk}'s
-@option{--profile} option should remain the same, no matter what the local
-language).
-
-@cindex @code{textdomain()} function (C library)
-@item
-The programmer indicates the application's text domain
-(@code{"guide"}) to the @code{gettext} library,
-by calling the @code{textdomain()} function.
-
-@cindex @code{.pot} files
-@cindex files, @code{.pot}
-@cindex portable object template files
-@cindex files, portable object template
-@item
-Messages from the application are extracted from the source code and
-collected into a portable object template file (@file{guide.pot}),
-which lists the strings and their translations.
-The translations are initially empty.
-The original (usually English) messages serve as the key for
-lookup of the translations.
-
-@cindex @code{.po} files
-@cindex files, @code{.po}
-@cindex portable object files
-@cindex files, portable object
-@item
-For each language with a translator, @file{guide.pot}
-is copied to a portable object file (@code{.po})
-and translations are created and shipped with the application.
-For example, there might be a @file{fr.po} for a French translation.
-
-@cindex @code{.mo} files
-@cindex files, @code{.mo}
-@cindex message object files
-@cindex files, message object
-@item
-Each language's @file{.po} file is converted into a binary
-message object (@file{.mo}) file.
-A message object file contains the original messages and their
-translations in a binary format that allows fast lookup of translations
-at runtime.
-
-@item
-When @command{guide} is built and installed, the binary translation files
-are installed in a standard place.
-
-@cindex @code{bindtextdomain()} function (C library)
-@item
-For testing and development, it is possible to tell @code{gettext}
-to use @file{.mo} files in a different directory than the standard
-one by using the @code{bindtextdomain()} function.
-
-@cindex @code{.mo} files, specifying directory of
-@cindex files, @code{.mo}, specifying directory of
-@cindex message object files, specifying directory of
-@cindex files, message object, specifying directory of
-@item
-At runtime, @command{guide} looks up each string via a call
-to @code{gettext()}. The returned string is the translated string
-if available, or the original string if not.
-
-@item
-If necessary, it is possible to access messages from a different
-text domain than the one belonging to the application, without
-having to switch the application's default text domain back
-and forth.
-@end enumerate
-
-@cindex @code{gettext()} function (C library)
-In C (or C++), the string marking and dynamic translation lookup
-are accomplished by wrapping each string in a call to @code{gettext()}:
-
-@example
-printf("%s", gettext("Don't Panic!\n"));
-@end example
-
-The tools that extract messages from source code pull out all
-strings enclosed in calls to @code{gettext()}.
-
-@cindex @code{_} (underscore), @code{_} C macro
-@cindex underscore (@code{_}), @code{_} C macro
-The GNU @code{gettext} developers, recognizing that typing
-@samp{gettext(@dots{})} over and over again is both painful and ugly to look
-at, use the macro @samp{_} (an underscore) to make things easier:
-
-@example
-/* In the standard header file: */
-#define _(str) gettext(str)
-
-/* In the program text: */
-printf("%s", _("Don't Panic!\n"));
-@end example
-
-@cindex internationalization, localization, locale categories
-@cindex @code{gettext} library, locale categories
-@cindex locale categories
-@noindent
-This reduces the typing overhead to just three extra characters per string
-and is considerably easier to read as well.
-
-There are locale @dfn{categories}
-for different types of locale-related information.
-The defined locale categories that @code{gettext} knows about are:
-
-@table @code
-@cindex @code{LC_MESSAGES} locale category
-@item LC_MESSAGES
-Text messages. This is the default category for @code{gettext}
-operations, but it is possible to supply a different one explicitly,
-if necessary. (It is almost never necessary to supply a different category.)
-
-@cindex sorting characters in different languages
-@cindex @code{LC_COLLATE} locale category
-@item LC_COLLATE
-Text-collation information; i.e., how different characters
-and/or groups of characters sort in a given language.
-
-@cindex @code{LC_CTYPE} locale category
-@item LC_CTYPE
-Character-type information (alphabetic, digit, upper- or lowercase, and
-so on).
-This information is accessed via the
-POSIX character classes in regular expressions,
-such as @code{/[[:alnum:]]/}
-(@pxref{Regexp Operators}).
-
-@cindex monetary information, localization
-@cindex currency symbols, localization
-@cindex @code{LC_MONETARY} locale category
-@item LC_MONETARY
-Monetary information, such as the currency symbol, and whether the
-symbol goes before or after a number.
-
-@cindex @code{LC_NUMERIC} locale category
-@item LC_NUMERIC
-Numeric information, such as which characters to use for the decimal
-point and the thousands separator.@footnote{Americans
-use a comma every three decimal places and a period for the decimal
-point, while many Europeans do exactly the opposite:
-1,234.56 versus 1.234,56.}
-
-@cindex @code{LC_RESPONSE} locale category
-@item LC_RESPONSE
-Response information, such as how ``yes'' and ``no'' appear in the
-local language, and possibly other information as well.
-
-@cindex time, localization and
-@cindex dates, information related to@comma{} localization
-@cindex @code{LC_TIME} locale category
-@item LC_TIME
-Time- and date-related information, such as 12- or 24-hour clock, month printed
-before or after the day in a date, local month abbreviations, and so on.
-
-@cindex @code{LC_ALL} locale category
-@item LC_ALL
-All of the above. (Not too useful in the context of @code{gettext}.)
-@end table
-@c ENDOFRANGE gettex
-
-@node Programmer i18n
-@section Internationalizing @command{awk} Programs
-@c STARTOFRANGE inap
-@cindex @command{awk} programs, internationalizing
-
-@command{gawk} provides the following variables and functions for
-internationalization:
-
-@table @code
-@cindex @code{TEXTDOMAIN} variable
-@item TEXTDOMAIN
-This variable indicates the application's text domain.
-For compatibility with GNU @code{gettext}, the default
-value is @code{"messages"}.
-
-@cindex internationalization, localization, marked strings
-@cindex strings, for localization
-@item _"your message here"
-String constants marked with a leading underscore
-are candidates for translation at runtime.
-String constants without a leading underscore are not translated.
-
-@cindex @code{dcgettext()} function (@command{gawk})
-@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
-Return the translation of @var{string} in
-text domain @var{domain} for locale category @var{category}.
-The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
-The default value for @var{category} is @code{"LC_MESSAGES"}.
-
-If you supply a value for @var{category}, it must be a string equal to
-one of the known locale categories described in
-@ifnotinfo
-the previous @value{SECTION}.
-@end ifnotinfo
-@ifinfo
-@ref{Explaining gettext}.
-@end ifinfo
-You must also supply a text domain. Use @code{TEXTDOMAIN} if
-you want to use the current domain.
-
-@quotation CAUTION
-The order of arguments to the @command{awk} version
-of the @code{dcgettext()} function is purposely different from the order for
-the C version. The @command{awk} version's order was
-chosen to be simple and to allow for reasonable @command{awk}-style
-default arguments.
-@end quotation
-
-@cindex @code{dcngettext()} function (@command{gawk})
-@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
-Return the plural form used for @var{number} of the
-translation of @var{string1} and @var{string2} in text domain
-@var{domain} for locale category @var{category}. @var{string1} is the
-English singular variant of a message, and @var{string2} the English plural
-variant of the same message.
-The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
-The default value for @var{category} is @code{"LC_MESSAGES"}.
-
-The same remarks about argument order as for the @code{dcgettext()} function apply.
-
-@cindex @code{.mo} files, specifying directory of
-@cindex files, @code{.mo}, specifying directory of
-@cindex message object files, specifying directory of
-@cindex files, message object, specifying directory of
-@cindex @code{bindtextdomain()} function (@command{gawk})
-@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]})
-Change the directory in which
-@code{gettext} looks for @file{.mo} files, in case they
-will not or cannot be placed in the standard locations
-(e.g., during testing).
-Return the directory in which @var{domain} is ``bound.''
-
-The default @var{domain} is the value of @code{TEXTDOMAIN}.
-If @var{directory} is the null string (@code{""}), then
-@code{bindtextdomain()} returns the current binding for the
-given @var{domain}.
-@end table
-
-To use these facilities in your @command{awk} program, follow the steps
-outlined in
-@ifnotinfo
-the previous @value{SECTION},
-@end ifnotinfo
-@ifinfo
-@ref{Explaining gettext},
-@end ifinfo
-like so:
-
-@enumerate
-@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
-@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and
-@item
-Set the variable @code{TEXTDOMAIN} to the text domain of
-your program. This is best done in a @code{BEGIN} rule
-(@pxref{BEGIN/END}),
-or it can also be done via the @option{-v} command-line
-option (@pxref{Options}):
-
-@example
-BEGIN @{
- TEXTDOMAIN = "guide"
- @dots{}
-@}
-@end example
-
-@cindex @code{_} (underscore), translatable string
-@cindex underscore (@code{_}), translatable string
-@item
-Mark all translatable strings with a leading underscore (@samp{_})
-character. It @emph{must} be adjacent to the opening
-quote of the string. For example:
-
-@example
-print _"hello, world"
-x = _"you goofed"
-printf(_"Number of users is %d\n", nusers)
-@end example
-
-@item
-If you are creating strings dynamically, you can
-still translate them, using the @code{dcgettext()}
-built-in function:
-
-@example
-message = nusers " users logged in"
-message = dcgettext(message, "adminprog")
-print message
-@end example
-
-Here, the call to @code{dcgettext()} supplies a different
-text domain (@code{"adminprog"}) in which to find the
-message, but it uses the default @code{"LC_MESSAGES"} category.
-
-@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk})
-@item
-During development, you might want to put the @file{.mo}
-file in a private directory for testing. This is done
-with the @code{bindtextdomain()} built-in function:
-
-@example
-BEGIN @{
- TEXTDOMAIN = "guide" # our text domain
- if (Testing) @{
- # where to find our files
- bindtextdomain("testdir")
- # joe is in charge of adminprog
- bindtextdomain("../joe/testdir", "adminprog")
- @}
- @dots{}
-@}
-@end example
-
-@end enumerate
-
-@xref{I18N Example},
-for an example program showing the steps to create
-and use translations from @command{awk}.
-
-@node Translator i18n
-@section Translating @command{awk} Programs
-
-@cindex @code{.po} files
-@cindex files, @code{.po}
-@cindex portable object files
-@cindex files, portable object
-Once a program's translatable strings have been marked, they must
-be extracted to create the initial @file{.po} file.
-As part of translation, it is often helpful to rearrange the order
-in which arguments to @code{printf} are output.
-
-@command{gawk}'s @option{--gen-pot} command-line option extracts
-the messages and is discussed next.
-After that, @code{printf}'s ability to
-rearrange the order for @code{printf} arguments at runtime
-is covered.
-
-@menu
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging @code{printf} arguments.
-* I18N Portability:: @command{awk}-level portability issues.
-@end menu
-
-@node String Extraction
-@subsection Extracting Marked Strings
-@cindex strings, extracting
-@cindex marked strings@comma{} extracting
-@cindex @code{--gen-pot} option
-@cindex command-line options, string extraction
-@cindex string extraction (internationalization)
-@cindex marked string extraction (internationalization)
-@cindex extraction, of marked strings (internationalization)
-
-@cindex @code{--gen-pot} option
-Once your @command{awk} program is working, and all the strings have
-been marked and you've set (and perhaps bound) the text domain,
-it is time to produce translations.
-First, use the @option{--gen-pot} command-line option to create
-the initial @file{.pot} file:
-
-@example
-$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
-@end example
-
-@cindex @code{xgettext} utility
-When run with @option{--gen-pot}, @command{gawk} does not execute your
-program. Instead, it parses it as usual and prints all marked strings
-to standard output in the format of a GNU @code{gettext} Portable Object
-file. Also included in the output are any constant strings that
-appear as the first argument to @code{dcgettext()} or as the first and
-second argument to @code{dcngettext()}.@footnote{The
-@command{xgettext} utility that comes with GNU
-@code{gettext} can handle @file{.awk} files.}
-@xref{I18N Example},
-for the full list of steps to go through to create and test
-translations for @command{guide}.
-
-@node Printf Ordering
-@subsection Rearranging @code{printf} Arguments
-
-@cindex @code{printf} statement, positional specifiers
-@cindex positional specifiers, @code{printf} statement
-Format strings for @code{printf} and @code{sprintf()}
-(@pxref{Printf})
-present a special problem for translation.
-Consider the following:@footnote{This example is borrowed
-from the GNU @code{gettext} manual.}
-
-@c line broken here only for smallbook format
-@example
-printf(_"String `%s' has %d characters\n",
- string, length(string)))
-@end example
-
-A possible German translation for this might be:
-
-@example
-"%d Zeichen lang ist die Zeichenkette `%s'\n"
-@end example
-
-The problem should be obvious: the order of the format
-specifications is different from the original!
-Even though @code{gettext()} can return the translated string
-at runtime,
-it cannot change the argument order in the call to @code{printf}.
-
-To solve this problem, @code{printf} format specifiers may have
-an additional optional element, which we call a @dfn{positional specifier}.
-For example:
-
-@example
-"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
-@end example
-
-Here, the positional specifier consists of an integer count, which indicates which
-argument to use, and a @samp{$}. Counts are one-based, and the
-format string itself is @emph{not} included. Thus, in the following
-example, @samp{string} is the first argument and @samp{length(string)} is the second:
-
-@example
-$ @kbd{gawk 'BEGIN @{}
-> @kbd{string = "Dont Panic"}
-> @kbd{printf _"%2$d characters live in \"%1$s\"\n",}
-> @kbd{string, length(string)}
-> @kbd{@}'}
-@print{} 10 characters live in "Dont Panic"
-@end example
-
-If present, positional specifiers come first in the format specification,
-before the flags, the field width, and/or the precision.
-
-Positional specifiers can be used with the dynamic field width and
-precision capability:
-
-@example
-$ @kbd{gawk 'BEGIN @{}
-> @kbd{printf("%*.*s\n", 10, 20, "hello")}
-> @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")}
-> @kbd{@}'}
-@print{} hello
-@print{} hello
-@end example
-
-@quotation NOTE
-When using @samp{*} with a positional specifier, the @samp{*}
-comes first, then the integer position, and then the @samp{$}.
-This is somewhat counterintuitive.
-@end quotation
-
-@cindex @code{printf} statement, positional specifiers, mixing with regular formats
-@cindex positional specifiers, @code{printf} statement, mixing with regular formats
-@cindex format specifiers, mixing regular with positional specifiers
-@command{gawk} does not allow you to mix regular format specifiers
-and those with positional specifiers in the same string:
-
-@example
-$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'}
-@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none
-@end example
-
-@quotation NOTE
-There are some pathological cases that @command{gawk} may fail to
-diagnose. In such cases, the output may not be what you expect.
-It's still a bad idea to try mixing them, even if @command{gawk}
-doesn't detect it.
-@end quotation
-
-Although positional specifiers can be used directly in @command{awk} programs,
-their primary purpose is to help in producing correct translations of
-format strings into languages different from the one in which the program
-is first written.
-
-@node I18N Portability
-@subsection @command{awk} Portability Issues
-
-@cindex portability, internationalization and
-@cindex internationalization, localization, portability and
-@command{gawk}'s internationalization features were purposely chosen to
-have as little impact as possible on the portability of @command{awk}
-programs that use them to other versions of @command{awk}.
-Consider this program:
-
-@example
-BEGIN @{
- TEXTDOMAIN = "guide"
- if (Test_Guide) # set with -v
- bindtextdomain("/test/guide/messages")
- print _"don't panic!"
-@}
-@end example
-
-@noindent
-As written, it won't work on other versions of @command{awk}.
-However, it is actually almost portable, requiring very little
-change:
-
-@itemize @bullet
-@cindex @code{TEXTDOMAIN} variable, portability and
-@item
-Assignments to @code{TEXTDOMAIN} won't have any effect,
-since @code{TEXTDOMAIN} is not special in other @command{awk} implementations.
-
-@item
-Non-GNU versions of @command{awk} treat marked strings
-as the concatenation of a variable named @code{_} with the string
-following it.@footnote{This is good fodder for an ``Obfuscated
-@command{awk}'' contest.} Typically, the variable @code{_} has
-the null string (@code{""}) as its value, leaving the original string constant as
-the result.
-
-@item
-By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}
-and @code{bindtextdomain()}, the @command{awk} program can be made to run, but
-all the messages are output in the original language.
-For example:
-
-@cindex @code{bindtextdomain()} function (@command{gawk}), portability and
-@cindex @code{dcgettext()} function (@command{gawk}), portability and
-@cindex @code{dcngettext()} function (@command{gawk}), portability and
-@example
-@c file eg/lib/libintl.awk
-function bindtextdomain(dir, domain)
-@{
- return dir
-@}
-
-function dcgettext(string, domain, category)
-@{
- return string
-@}
-
-function dcngettext(string1, string2, number, domain, category)
-@{
- return (number == 1 ? string1 : string2)
-@}
-@c endfile
-@end example
-
-@item
-The use of positional specifications in @code{printf} or
-@code{sprintf()} is @emph{not} portable.
-To support @code{gettext()} at the C level, many systems' C versions of
-@code{sprintf()} do support positional specifiers. But it works only if
-enough arguments are supplied in the function call. Many versions of
-@command{awk} pass @code{printf} formats and arguments unchanged to the
-underlying C library version of @code{sprintf()}, but only one format and
-argument at a time. What happens if a positional specification is
-used is anybody's guess.
-However, since the positional specifications are primarily for use in
-@emph{translated} format strings, and since non-GNU @command{awk}s never
-retrieve the translated string, this should not be a problem in practice.
-@end itemize
-@c ENDOFRANGE inap
-
-@node I18N Example
-@section A Simple Internationalization Example
-
-Now let's look at a step-by-step example of how to internationalize and
-localize a simple @command{awk} program, using @file{guide.awk} as our
-original source:
-
-@example
-@c file eg/prog/guide.awk
-BEGIN @{
- TEXTDOMAIN = "guide"
- bindtextdomain(".") # for testing
- print _"Don't Panic"
- print _"The Answer Is", 42
- print "Pardon me, Zaphod who?"
-@}
-@c endfile
-@end example
-
-@noindent
-Run @samp{gawk --gen-pot} to create the @file{.pot} file:
-
-@example
-$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
-@end example
-
-@noindent
-This produces:
-
-@example
-@c file eg/data/guide.po
-#: guide.awk:4
-msgid "Don't Panic"
-msgstr ""
-
-#: guide.awk:5
-msgid "The Answer Is"
-msgstr ""
-
-@c endfile
-@end example
-
-This original portable object template file is saved and reused for each language
-into which the application is translated. The @code{msgid}
-is the original string and the @code{msgstr} is the translation.
-
-@quotation NOTE
-Strings not marked with a leading underscore do not
-appear in the @file{guide.pot} file.
-@end quotation
-
-Next, the messages must be translated.
-Here is a translation to a hypothetical dialect of English,
-called ``Mellow'':@footnote{Perhaps it would be better if it were
-called ``Hippy.'' Ah, well.}
-
-@example
-@group
-$ cp guide.pot guide-mellow.po
-@var{Add translations to} guide-mellow.po @dots{}
-@end group
-@end example
-
-@noindent
-Following are the translations:
-
-@example
-@c file eg/data/guide-mellow.po
-#: guide.awk:4
-msgid "Don't Panic"
-msgstr "Hey man, relax!"
-
-#: guide.awk:5
-msgid "The Answer Is"
-msgstr "Like, the scoop is"
-
-@c endfile
-@end example
-
-@cindex Linux
-@cindex GNU/Linux
-The next step is to make the directory to hold the binary message object
-file and then to create the @file{guide.mo} file.
-The directory layout shown here is standard for GNU @code{gettext} on
-GNU/Linux systems. Other versions of @code{gettext} may use a different
-layout:
-
-@example
-$ @kbd{mkdir en_US en_US/LC_MESSAGES}
-@end example
-
-@cindex @code{.po} files, converting to @code{.mo}
-@cindex files, @code{.po}, converting to @code{.mo}
-@cindex @code{.mo} files, converting from @code{.po}
-@cindex files, @code{.mo}, converting from @code{.po}
-@cindex portable object files, converting to message object files
-@cindex files, portable object, converting to message object files
-@cindex message object files, converting from portable object files
-@cindex files, message object, converting from portable object files
-@cindex @command{msgfmt} utility
-The @command{msgfmt} utility does the conversion from human-readable
-@file{.po} file to machine-readable @file{.mo} file.
-By default, @command{msgfmt} creates a file named @file{messages}.
-This file must be renamed and placed in the proper directory so that
-@command{gawk} can find it:
-
-@example
-$ @kbd{msgfmt guide-mellow.po}
-$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo}
-@end example
-
-Finally, we run the program to test it:
-
-@example
-$ @kbd{gawk -f guide.awk}
-@print{} Hey man, relax!
-@print{} Like, the scoop is 42
-@print{} Pardon me, Zaphod who?
-@end example
-
-If the three replacement functions for @code{dcgettext()}, @code{dcngettext()}
-and @code{bindtextdomain()}
-(@pxref{I18N Portability})
-are in a file named @file{libintl.awk},
-then we can run @file{guide.awk} unchanged as follows:
-
-@example
-$ @kbd{gawk --posix -f guide.awk -f libintl.awk}
-@print{} Don't Panic
-@print{} The Answer Is 42
-@print{} Pardon me, Zaphod who?
-@end example
-
-@node Gawk I18N
-@section @command{gawk} Can Speak Your Language
-
-@command{gawk} itself has been internationalized
-using the GNU @code{gettext} package.
-(GNU @code{gettext} is described in
-complete detail in
-@ifinfo
-@inforef{Top, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.)
-@end ifinfo
-@ifnotinfo
-@cite{GNU gettext tools}.)
-@end ifnotinfo
-As of this writing, the latest version of GNU @code{gettext} is
-@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}.
-
-If a translation of @command{gawk}'s messages exists,
-then @command{gawk} produces usage messages, warnings,
-and fatal errors in the local language.
-@c ENDOFRANGE inloc
+@iftex
+@part Part II:@* Problem Solving With @command{awk}
+@end iftex
-@node Advanced Features
-@chapter Advanced Features of @command{gawk}
-@cindex advanced features, network connections, See Also networks, connections
-@c STARTOFRANGE gawadv
-@cindex @command{gawk}, features, advanced
-@c STARTOFRANGE advgaw
-@cindex advanced features, @command{gawk}
@ignore
-Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com>
-
- Found in Steve English's "signature" line:
-
-"Write documentation as if whoever reads it is a violent psychopath
-who knows where you live."
-@end ignore
-@quotation
-@i{Write documentation as if whoever reads it is
-a violent psychopath who knows where you live.}@*
-Steve English, as quoted by Peter Langston
-@end quotation
-
-This @value{CHAPTER} discusses advanced features in @command{gawk}.
-It's a bit of a ``grab bag'' of items that are otherwise unrelated
-to each other.
-First, a command-line option allows @command{gawk} to recognize
-nondecimal numbers in input data, not just in @command{awk}
-programs.
-Then, @command{gawk}'s special features for sorting arrays are presented.
-Next, two-way I/O, discussed briefly in earlier parts of this
-@value{DOCUMENT}, is described in full detail, along with the basics
-of TCP/IP networking. Finally, @command{gawk}
-can @dfn{profile} an @command{awk} program, making it possible to tune
-it for performance.
-
-@ref{Dynamic Extensions},
-discusses the ability to dynamically add new built-in functions to
-@command{gawk}. As this feature is still immature and likely to change,
-its description is relegated to an appendix.
-
-@menu
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal and
- sorting arrays.
-* Two-way I/O:: Two-way communications with another process.
-* TCP/IP Networking:: Using @command{gawk} for network programming.
-* Profiling:: Profiling your @command{awk} programs.
-@end menu
-
-@node Nondecimal Data
-@section Allowing Nondecimal Input Data
-@cindex @code{--non-decimal-data} option
-@cindex advanced features, @command{gawk}, nondecimal input data
-@cindex input, data@comma{} nondecimal
-@cindex constants, nondecimal
-
-If you run @command{gawk} with the @option{--non-decimal-data} option,
-you can have nondecimal constants in your input data:
-
-@c line break here for small book format
-@example
-$ @kbd{echo 0123 123 0x123 |}
-> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",}
-> @kbd{$1, $2, $3 @}'}
-@print{} 83, 123, 291
-@end example
-
-For this feature to work, write your program so that
-@command{gawk} treats your data as numeric:
-
-@example
-$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
-@print{} 0123 123 0x123
-@end example
-
-@noindent
-The @code{print} statement treats its expressions as strings.
-Although the fields can act as numbers when necessary,
-they are still strings, so @code{print} does not try to treat them
-numerically. You may need to add zero to a field to force it to
-be treated as a number. For example:
-
-@example
-$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
-> @kbd{@{ print $1, $2, $3}
-> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
-@print{} 0123 123 0x123
-@print{} 83 123 291
-@end example
-
-Because it is common to have decimal data with leading zeros, and because
-using this facility could lead to surprising results, the default is to leave it
-disabled. If you want it, you must explicitly request it.
-
-@cindex programming conventions, @code{--non-decimal-data} option
-@cindex @code{--non-decimal-data} option, @code{strtonum()} function and
-@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
-@quotation CAUTION
-@emph{Use of this option is not recommended.}
-It can break old programs very badly.
-Instead, use the @code{strtonum()} function to convert your data
-(@pxref{Nondecimal-numbers}).
-This makes your programs easier to write and easier to read, and
-leads to less surprising results.
-@end quotation
-
-@node Array Sorting
-@section Controlling Array Traversal and Array Sorting
-
-@command{gawk} lets you control the order in which a @samp{for (i in array)}
-loop traverses an array.
-
-In addition, two built-in functions, @code{asort()} and @code{asorti()},
-let you sort arrays based on the array values and indices, respectively.
-These two functions also provide control over the sorting criteria used
-to order the elements during sorting.
-
-@menu
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}.
-@end menu
-
-@node Controlling Array Traversal
-@subsection Controlling Array Traversal
-
-By default, the order in which a @samp{for (i in array)} loop
-scans an array is not defined; it is generally based upon
-the internal implementation of arrays inside @command{awk}.
-
-Often, though, it is desirable to be able to loop over the elements
-in a particular order that you, the programmer, choose. @command{gawk}
-lets you do this.
-
-@ref{Controlling Scanning}, describes how you can assign special,
-pre-defined values to @code{PROCINFO["sorted_in"]} in order to
-control the order in which @command{gawk} will traverse an array
-during a @code{for} loop.
-
-In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name.
-This lets you traverse an array based on any custom criterion.
-The array elements are ordered according to the return value of this
-function. The comparison function should be defined with at least
-four arguments:
-
-@example
-function comp_func(i1, v1, i2, v2)
-@{
- @var{compare elements 1 and 2 in some fashion}
- @var{return < 0; 0; or > 0}
-@}
-@end example
-
-Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
-are the corresponding values of the two elements being compared.
-Either @var{v1} or @var{v2}, or both, can be arrays if the array being
-traversed contains subarrays as values.
-(@xref{Arrays of Arrays}, for more information about subarrays.)
-The three possible return values are interpreted as follows:
-
-@table @code
-@item comp_func(i1, v1, i2, v2) < 0
-Index @var{i1} comes before index @var{i2} during loop traversal.
-
-@item comp_func(i1, v1, i2, v2) == 0
-Indices @var{i1} and @var{i2}
-come together but the relative order with respect to each other is undefined.
-
-@item comp_func(i1, v1, i2, v2) > 0
-Index @var{i1} comes after index @var{i2} during loop traversal.
-@end table
-
-Our first comparison function can be used to scan an array in
-numerical order of the indices:
-
-@example
-function cmp_num_idx(i1, v1, i2, v2)
-@{
- # numerical index comparison, ascending order
- return (i1 - i2)
-@}
-@end example
-
-Our second function traverses an array based on the string order of
-the element values rather than by indices:
-
-@example
-function cmp_str_val(i1, v1, i2, v2)
-@{
- # string value comparison, ascending order
- v1 = v1 ""
- v2 = v2 ""
- if (v1 < v2)
- return -1
- return (v1 != v2)
-@}
-@end example
-
-The third
-comparison function makes all numbers, and numeric strings without
-any leading or trailing spaces, come out first during loop traversal:
-
-@example
-function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
-@{
- # numbers before string value comparison, ascending order
- n1 = v1 + 0
- n2 = v2 + 0
- if (n1 == v1)
- return (n2 == v2) ? (n1 - n2) : -1
- else if (n2 == v2)
- return 1
- return (v1 < v2) ? -1 : (v1 != v2)
-@}
-@end example
-
-Here is a main program to demonstrate how @command{gawk}
-behaves using each of the previous functions:
-
-@example
-BEGIN @{
- data["one"] = 10
- data["two"] = 20
- data[10] = "one"
- data[100] = 100
- data[20] = "two"
-
- f[1] = "cmp_num_idx"
- f[2] = "cmp_str_val"
- f[3] = "cmp_num_str_val"
- for (i = 1; i <= 3; i++) @{
- printf("Sort function: %s\n", f[i])
- PROCINFO["sorted_in"] = f[i]
- for (j in data)
- printf("\tdata[%s] = %s\n", j, data[j])
- print ""
- @}
-@}
-@end example
-
-Here are the results when the program is run:
-@page
-
-@example
-$ @kbd{gawk -f compdemo.awk}
-@print{} Sort function: cmp_num_idx @ii{Sort by numeric index}
-@print{} data[two] = 20
-@print{} data[one] = 10 @ii{Both strings are numerically zero}
-@print{} data[10] = one
-@print{} data[20] = two
-@print{} data[100] = 100
-@print{}
-@print{} Sort function: cmp_str_val @ii{Sort by element values as strings}
-@print{} data[one] = 10
-@print{} data[100] = 100 @ii{String 100 is less than string 20}
-@print{} data[two] = 20
-@print{} data[10] = one
-@print{} data[20] = two
-@print{}
-@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings}
-@print{} data[one] = 10
-@print{} data[two] = 20
-@print{} data[100] = 100
-@print{} data[10] = one
-@print{} data[20] = two
-@end example
-
-Consider sorting the entries of a GNU/Linux system password file
-according to login name. The following program sorts records
-by a specific field position and can be used for this purpose:
-
-@example
-# sort.awk --- simple program to sort by field position
-# field position is specified by the global variable POS
-
-function cmp_field(i1, v1, i2, v2)
-@{
- # comparison by value, as string, and ascending order
- return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
-@}
-
-@{
- for (i = 1; i <= NF; i++)
- a[NR][i] = $i
-@}
-
-END @{
- PROCINFO["sorted_in"] = "cmp_field"
- if (POS < 1 || POS > NF)
- POS = 1
- for (i in a) @{
- for (j = 1; j <= NF; j++)
- printf("%s%c", a[i][j], j < NF ? ":" : "")
- print ""
- @}
-@}
-@end example
-
-The first field in each entry of the password file is the user's login name,
-and the fields are separated by colons.
-Each record defines a subarray,
-with each field as an element in the subarray.
-Running the program produces the
-following output:
-
-@example
-$ @kbd{gawk -vPOS=1 -F: -f sort.awk /etc/passwd}
-@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin
-@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin
-@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin
-@dots{}
-@end example
-
-The comparison should normally always return the same value when given a
-specific pair of array elements as its arguments. If inconsistent
-results are returned then the order is undefined. This behavior can be
-exploited to introduce random order into otherwise seemingly
-ordered data:
-
-@example
-function cmp_randomize(i1, v1, i2, v2)
-@{
- # random order
- return (2 - 4 * rand())
-@}
-@end example
-
-As mentioned above, the order of the indices is arbitrary if two
-elements compare equal. This is usually not a problem, but letting
-the tied elements come out in arbitrary order can be an issue, especially
-when comparing item values. The partial ordering of the equal elements
-may change during the next loop traversal, if other elements are added or
-removed from the array. One way to resolve ties when comparing elements
-with otherwise equal values is to include the indices in the comparison
-rules. Note that doing this may make the loop traversal less efficient,
-so consider it only if necessary. The following comparison functions
-force a deterministic order, and are based on the fact that the
-indices of two elements are never equal:
-
-@example
-function cmp_numeric(i1, v1, i2, v2)
-@{
- # numerical value (and index) comparison, descending order
- return (v1 != v2) ? (v2 - v1) : (i2 - i1)
-@}
-
-function cmp_string(i1, v1, i2, v2)
-@{
- # string value (and index) comparison, descending order
- v1 = v1 i1
- v2 = v2 i2
- return (v1 > v2) ? -1 : (v1 != v2)
-@}
-@end example
-
-@c Avoid using the term ``stable'' when describing the unpredictable behavior
-@c if two items compare equal. Usually, the goal of a "stable algorithm"
-@c is to maintain the original order of the items, which is a meaningless
-@c concept for a list constructed from a hash.
-
-A custom comparison function can often simplify ordered loop
-traversal, and the sky is really the limit when it comes to
-designing such a function.
-
-When string comparisons are made during a sort, either for element
-values where one or both aren't numbers, or for element indices
-handled as strings, the value of @code{IGNORECASE}
-(@pxref{Built-in Variables}) controls whether
-the comparisons treat corresponding uppercase and lowercase letters as
-equivalent or distinct.
-
-Another point to keep in mind is that in the case of subarrays
-the element values can themselves be arrays; a production comparison
-function should use the @code{isarray()} function
-(@pxref{Type Functions}),
-to check for this, and choose a defined sorting order for subarrays.
-
-All sorting based on @code{PROCINFO["sorted_in"]}
-is disabled in POSIX mode,
-since the @code{PROCINFO} array is not special in that case.
-
-As a side note, sorting the array indices before traversing
-the array has been reported to add 15% to 20% overhead to the
-execution time of @command{awk} programs. For this reason,
-sorted array traversal is not the default.
-
-@c The @command{gawk}
-@c maintainers believe that only the people who wish to use a
-@c feature should have to pay for it.
-
-@node Array Sorting Functions
-@subsection Sorting Array Values and Indices with @command{gawk}
-
-@cindex arrays, sorting
-@cindex @code{asort()} function (@command{gawk})
-@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting
-@cindex sort function, arrays, sorting
-In most @command{awk} implementations, sorting an array requires
-writing a @code{sort()} function.
-While this can be educational for exploring different sorting algorithms,
-usually that's not the point of the program.
-@command{gawk} provides the built-in @code{asort()}
-and @code{asorti()} functions
-(@pxref{String Functions})
-for sorting arrays. For example:
-
-@example
-@var{populate the array} data
-n = asort(data)
-for (i = 1; i <= n; i++)
- @var{do something with} data[i]
-@end example
-
-After the call to @code{asort()}, the array @code{data} is indexed from 1
-to some number @var{n}, the total number of elements in @code{data}.
-(This count is @code{asort()}'s return value.)
-@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
-The comparison is based on the type of the elements
-(@pxref{Typing and Comparison}).
-All numeric values come before all string values,
-which in turn come before all subarrays.
-
-@cindex side effects, @code{asort()} function
-An important side effect of calling @code{asort()} is that
-@emph{the array's original indices are irrevocably lost}.
-As this isn't always desirable, @code{asort()} accepts a
-second argument:
-
-@example
-@var{populate the array} source
-n = asort(source, dest)
-for (i = 1; i <= n; i++)
- @var{do something with} dest[i]
-@end example
-
-In this case, @command{gawk} copies the @code{source} array into the
-@code{dest} array and then sorts @code{dest}, destroying its indices.
-However, the @code{source} array is not affected.
-
-@code{asort()} accepts a third string argument to control comparison of
-array elements. As with @code{PROCINFO["sorted_in"]}, this argument
-may be one of the predefined names that @command{gawk} provides
-(@pxref{Controlling Scanning}), or the name of a user-defined function
-(@pxref{Controlling Array Traversal}).
-
-@quotation NOTE
-In all cases, the sorted element values consist of the original
-array's element values. The ability to control comparison merely
-affects the way in which they are sorted.
-@end quotation
-
-Often, what's needed is to sort on the values of the @emph{indices}
-instead of the values of the elements.
-To do that, use the
-@code{asorti()} function. The interface is identical to that of
-@code{asort()}, except that the index values are used for sorting, and
-become the values of the result array:
-
-@example
-@{ source[$0] = some_func($0) @}
-
-END @{
- n = asorti(source, dest)
- for (i = 1; i <= n; i++) @{
- @ii{Work with sorted indices directly:}
- @var{do something with} dest[i]
- @dots{}
- @ii{Access original array via sorted indices:}
- @var{do something with} source[dest[i]]
- @}
-@}
-@end example
-
-Similar to @code{asort()},
-in all cases, the sorted element values consist of the original
-array's indices. The ability to control comparison merely
-affects the way in which they are sorted.
-
-Sorting the array by replacing the indices provides maximal flexibility.
-To traverse the elements in decreasing order, use a loop that goes from
-@var{n} down to 1, either over the elements or over the indices.@footnote{You
-may also use one of the predefined sorting names that sorts in
-decreasing order.}
-
-@cindex reference counting, sorting arrays
-Copying array indices and elements isn't expensive in terms of memory.
-Internally, @command{gawk} maintains @dfn{reference counts} to data.
-For example, when @code{asort()} copies the first array to the second one,
-there is only one copy of the original array elements' data, even though
-both arrays use the values.
-
-@c Document It And Call It A Feature. Sigh.
-@cindex @command{gawk}, @code{IGNORECASE} variable in
-@cindex @code{IGNORECASE} variable
-@cindex arrays, sorting, @code{IGNORECASE} variable and
-@cindex @code{IGNORECASE} variable, array sorting and
-Because @code{IGNORECASE} affects string comparisons, the value
-of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
-Note also that the locale's sorting order does @emph{not}
-come into play; comparisons are based on character values only.@footnote{This
-is true because locale-based comparison occurs only when in POSIX
-compatibility mode, and since @code{asort()} and @code{asorti()} are
-@command{gawk} extensions, they are not available in that case.}
-Caveat Emptor.
-
-@node Two-way I/O
-@section Two-Way Communications with Another Process
-@cindex Brennan, Michael
-@cindex programmers, attractiveness of
-@smallexample
-@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
-From: brennan@@whidbey.com (Mike Brennan)
-Newsgroups: comp.lang.awk
-Subject: Re: Learn the SECRET to Attract Women Easily
-Date: 4 Aug 1997 17:34:46 GMT
-@c Organization: WhidbeyNet
-@c Lines: 12
-Message-ID: <5s53rm$eca@@news.whidbey.com>
-@c References: <5s20dn$2e1@chronicle.concentric.net>
-@c Reply-To: brennan@whidbey.com
-@c NNTP-Posting-Host: asn202.whidbey.com
-@c X-Newsreader: slrn (0.9.4.1 UNIX)
-@c Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
-
-On 3 Aug 1997 13:17:43 GMT, Want More Dates???
-<tracy78@@kilgrona.com> wrote:
->Learn the SECRET to Attract Women Easily
->
->The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
-
-The scent of awk programmers is a lot more attractive to women than
-the scent of perl programmers.
---
-Mike Brennan
-@c brennan@@whidbey.com
-@end smallexample
-
-@cindex advanced features, @command{gawk}, processes@comma{} communicating with
-@cindex processes, two-way communications with
-It is often useful to be able to
-send data to a separate program for
-processing and then read the result. This can always be
-done with temporary files:
-
-@example
-# Write the data for processing
-tempfile = ("mydata." PROCINFO["pid"])
-while (@var{not done with data})
- print @var{data} | ("subprogram > " tempfile)
-close("subprogram > " tempfile)
-
-# Read the results, remove tempfile when done
-while ((getline newdata < tempfile) > 0)
- @var{process} newdata @var{appropriately}
-close(tempfile)
-system("rm " tempfile)
-@end example
-
-@noindent
-This works, but not elegantly. Among other things, it requires that
-the program be run in a directory that cannot be shared among users;
-for example, @file{/tmp} will not do, as another user might happen
-to be using a temporary file with the same name.
-
-@cindex coprocesses
-@cindex input/output, two-way
-@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
-@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
-@cindex @command{csh} utility, @code{|&} operator, comparison with
-However, with @command{gawk}, it is possible to
-open a @emph{two-way} pipe to another process. The second process is
-termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}.
-The two-way connection is created using the @samp{|&} operator
-(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
-different from the same operator in the C shell.}
-
-@example
-do @{
- print @var{data} |& "subprogram"
- "subprogram" |& getline results
-@} while (@var{data left to process})
-close("subprogram")
-@end example
-
-The first time an I/O operation is executed using the @samp{|&}
-operator, @command{gawk} creates a two-way pipeline to a child process
-that runs the other program. Output created with @code{print}
-or @code{printf} is written to the program's standard input, and
-output from the program's standard output can be read by the @command{gawk}
-program using @code{getline}.
-As is the case with processes started by @samp{|}, the subprogram
-can be any program, or pipeline of programs, that can be started by
-the shell.
+@ifdocbook
+@part Part II:@* Problem Solving With @command{awk}
-There are some cautionary items to be aware of:
+Part II shows how to use @command{awk} and @command{gawk} for problem solving.
+There is lots of code here for you to read and learn from.
+It contains the following chapters:
@itemize @bullet
@item
-As the code inside @command{gawk} currently stands, the coprocess's
-standard error goes to the same place that the parent @command{gawk}'s
-standard error goes. It is not possible to read the child's
-standard error separately.
+@ref{Library Functions}.
-@cindex deadlocks
-@cindex buffering, input/output
-@cindex @code{getline} command, deadlock and
@item
-I/O buffering may be a problem. @command{gawk} automatically
-flushes all output down the pipe to the coprocess.
-However, if the coprocess does not flush its output,
-@command{gawk} may hang when doing a @code{getline} in order to read
-the coprocess's results. This could lead to a situation
-known as @dfn{deadlock}, where each process is waiting for the
-other one to do something.
+@ref{Sample Programs}.
@end itemize
-
-@cindex @code{close()} function, two-way pipes and
-It is possible to close just one end of the two-way pipe to
-a coprocess, by supplying a second argument to the @code{close()}
-function of either @code{"to"} or @code{"from"}
-(@pxref{Close Files And Pipes}).
-These strings tell @command{gawk} to close the end of the pipe
-that sends data to the coprocess or the end that reads from it,
-respectively.
-
-@cindex @command{sort} utility, coprocesses and
-This is particularly necessary in order to use
-the system @command{sort} utility as part of a coprocess;
-@command{sort} must read @emph{all} of its input
-data before it can produce any output.
-The @command{sort} program does not receive an end-of-file indication
-until @command{gawk} closes the write end of the pipe.
-
-When you have finished writing data to the @command{sort}
-utility, you can close the @code{"to"} end of the pipe, and
-then start reading sorted data via @code{getline}.
-For example:
-
-@example
-BEGIN @{
- command = "LC_ALL=C sort"
- n = split("abcdefghijklmnopqrstuvwxyz", a, "")
-
- for (i = n; i > 0; i--)
- print a[i] |& command
- close(command, "to")
-
- while ((command |& getline line) > 0)
- print "got", line
- close(command)
-@}
-@end example
-
-This program writes the letters of the alphabet in reverse order, one
-per line, down the two-way pipe to @command{sort}. It then closes the
-write end of the pipe, so that @command{sort} receives an end-of-file
-indication. This causes @command{sort} to sort the data and write the
-sorted data back to the @command{gawk} program. Once all of the data
-has been read, @command{gawk} terminates the coprocess and exits.
-
-As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
-command ensures traditional Unix (ASCII) sorting from @command{sort}.
-
-@cindex @command{gawk}, @code{PROCINFO} array in
-@cindex @code{PROCINFO} array
-You may also use pseudo-ttys (ptys) for
-two-way communication instead of pipes, if your system supports them.
-This is done on a per-command basis, by setting a special element
-in the @code{PROCINFO} array
-(@pxref{Auto-set}),
-like so:
-
-@example
-command = "sort -nr" # command, save in convenience variable
-PROCINFO[command, "pty"] = 1 # update PROCINFO
-print @dots{} |& command # start two-way pipe
-@dots{}
-@end example
-
-@noindent
-Using ptys avoids the buffer deadlock issues described earlier, at some
-loss in performance. If your system does not have ptys, or if all the
-system's ptys are in use, @command{gawk} automatically falls back to
-using regular pipes.
-
-@node TCP/IP Networking
-@section Using @command{gawk} for Network Programming
-@cindex advanced features, @command{gawk}, network programming
-@cindex networks, programming
-@c STARTOFRANGE tcpip
-@cindex TCP/IP
-@cindex @code{/inet/@dots{}} special files (@command{gawk})
-@cindex files, @code{/inet/@dots{}} (@command{gawk})
-@cindex @code{/inet4/@dots{}} special files (@command{gawk})
-@cindex files, @code{/inet4/@dots{}} (@command{gawk})
-@cindex @code{/inet6/@dots{}} special files (@command{gawk})
-@cindex files, @code{/inet6/@dots{}} (@command{gawk})
-@cindex @code{EMISTERED}
-@quotation
-@code{EMISTERED}:@*
-@ @ @ @ @i{A host is a host from coast to coast,@*
-@ @ @ @ and no-one can talk to host that's close,@*
-@ @ @ @ unless the host that isn't close@*
-@ @ @ @ is busy hung or dead.}
-@end quotation
-
-In addition to being able to open a two-way pipeline to a coprocess
-on the same system
-(@pxref{Two-way I/O}),
-it is possible to make a two-way connection to
-another process on another system across an IP network connection.
-
-You can think of this as just a @emph{very long} two-way pipeline to
-a coprocess.
-The way @command{gawk} decides that you want to use TCP/IP networking is
-by recognizing special @value{FN}s that begin with one of @samp{/inet/},
-@samp{/inet4/} or @samp{/inet6}.
-
-The full syntax of the special @value{FN} is
-@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
-The components are:
-
-@table @var
-@item net-type
-Specifies the kind of Internet connection to make.
-Use @samp{/inet4/} to force IPv4, and
-@samp{/inet6/} to force IPv6.
-Plain @samp{/inet/} (which used to be the only option) uses
-the system default, most likely IPv4.
-
-@item protocol
-The protocol to use over IP. This must be either @samp{tcp}, or
-@samp{udp}, for a TCP or UDP IP connection,
-respectively. The use of TCP is recommended for most applications.
-
-@item local-port
-@cindex @code{getaddrinfo()} function (C library)
-The local TCP or UDP port number to use. Use a port number of @samp{0}
-when you want the system to pick a port. This is what you should do
-when writing a TCP or UDP client.
-You may also use a well-known service name, such as @samp{smtp}
-or @samp{http}, in which case @command{gawk} attempts to determine
-the predefined port number using the C @code{getaddrinfo()} function.
-
-@item remote-host
-The IP address or fully-qualified domain name of the Internet
-host to which you want to connect.
-
-@item remote-port
-The TCP or UDP port number to use on the given @var{remote-host}.
-Again, use @samp{0} if you don't care, or else a well-known
-service name.
-@end table
-
-@cindex @command{gawk}, @code{ERRNO} variable in
-@cindex @code{ERRNO} variable
-@quotation NOTE
-Failure in opening a two-way socket will result in a non-fatal error
-being returned to the calling code. The value of @code{ERRNO} indicates
-the error (@pxref{Auto-set}).
-@end quotation
-
-Consider the following very simple example:
-
-@example
-BEGIN @{
- Service = "/inet/tcp/0/localhost/daytime"
- Service |& getline
- print $0
- close(Service)
-@}
-@end example
-
-This program reads the current date and time from the local system's
-TCP @samp{daytime} server.
-It then prints the results and closes the connection.
-
-Because this topic is extensive, the use of @command{gawk} for
-TCP/IP programming is documented separately.
-@ifinfo
-See
-@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}},
-@end ifinfo
-@ifnotinfo
-See @cite{TCP/IP Internetworking with @command{gawk}},
-which comes as part of the @command{gawk} distribution,
-@end ifnotinfo
-for a much more complete introduction and discussion, as well as
-extensive examples.
-
-@c ENDOFRANGE tcpip
-
-@node Profiling
-@section Profiling Your @command{awk} Programs
-@c STARTOFRANGE awkp
-@cindex @command{awk} programs, profiling
-@c STARTOFRANGE proawk
-@cindex profiling @command{awk} programs
-@c STARTOFRANGE pgawk
-@cindex @command{pgawk} program
-@cindex profiling @command{gawk}, See @command{pgawk} program
-
-You may produce execution
-traces of your @command{awk} programs.
-This is done with a specially compiled version of @command{gawk},
-called @command{pgawk} (``profiling @command{gawk}'').
-
-@cindex @code{awkprof.out} file
-@cindex files, @code{awkprof.out}
-@cindex @command{pgawk} program, @code{awkprof.out} file
-@command{pgawk} is identical in every way to @command{gawk}, except that when
-it has finished running, it creates a profile of your program in a file
-named @file{awkprof.out}.
-Because it is profiling, it also executes up to 45% slower than
-@command{gawk} normally does.
-
-@cindex @code{--profile} option
-As shown in the following example,
-the @option{--profile} option can be used to change the name of the file
-where @command{pgawk} will write the profile:
-
-@example
-pgawk --profile=myprog.prof -f myprog.awk data1 data2
-@end example
-
-@noindent
-In the above example, @command{pgawk} places the profile in
-@file{myprog.prof} instead of in @file{awkprof.out}.
-
-Here is a sample
-session showing a simple @command{awk} program, its input data, and the
-results from running @command{pgawk}. First, the @command{awk} program:
-
-@example
-BEGIN @{ print "First BEGIN rule" @}
-
-END @{ print "First END rule" @}
-
-/foo/ @{
- print "matched /foo/, gosh"
- for (i = 1; i <= 3; i++)
- sing()
-@}
-
-@{
- if (/foo/)
- print "if is true"
- else
- print "else is true"
-@}
-
-BEGIN @{ print "Second BEGIN rule" @}
-
-END @{ print "Second END rule" @}
-
-function sing( dummy)
-@{
- print "I gotta be me!"
-@}
-@end example
-
-Following is the input data:
-
-@example
-foo
-bar
-baz
-foo
-junk
-@end example
-
-Here is the @file{awkprof.out} that results from running @command{pgawk}
-on this program and data (this example also illustrates that @command{awk}
-programmers sometimes have to work late):
-
-@cindex @code{BEGIN} pattern, @command{pgawk} program
-@cindex @code{END} pattern, @command{pgawk} program
-@example
- # gawk profile, created Sun Aug 13 00:00:15 2000
-
- # BEGIN block(s)
-
- BEGIN @{
- 1 print "First BEGIN rule"
- 1 print "Second BEGIN rule"
- @}
-
- # Rule(s)
-
- 5 /foo/ @{ # 2
- 2 print "matched /foo/, gosh"
- 6 for (i = 1; i <= 3; i++) @{
- 6 sing()
- @}
- @}
-
- 5 @{
- 5 if (/foo/) @{ # 2
- 2 print "if is true"
- 3 @} else @{
- 3 print "else is true"
- @}
- @}
-
- # END block(s)
-
- END @{
- 1 print "First END rule"
- 1 print "Second END rule"
- @}
-
- # Functions, listed alphabetically
-
- 6 function sing(dummy)
- @{
- 6 print "I gotta be me!"
- @}
-@end example
-
-This example illustrates many of the basic features of profiling output.
-They are as follows:
-
-@itemize @bullet
-@item
-The program is printed in the order @code{BEGIN} rule,
-@code{BEGINFILE} rule,
-pattern/action rules,
-@code{ENDFILE} rule, @code{END} rule and functions, listed
-alphabetically.
-Multiple @code{BEGIN} and @code{END} rules are merged together,
-as are multiple @code{BEGINFILE} and @code{ENDFILE} rules.
-
-@cindex patterns, counts
-@item
-Pattern-action rules have two counts.
-The first count, to the left of the rule, shows how many times
-the rule's pattern was @emph{tested}.
-The second count, to the right of the rule's opening left brace
-in a comment,
-shows how many times the rule's action was @emph{executed}.
-The difference between the two indicates how many times the rule's
-pattern evaluated to false.
-
-@item
-Similarly,
-the count for an @code{if}-@code{else} statement shows how many times
-the condition was tested.
-To the right of the opening left brace for the @code{if}'s body
-is a count showing how many times the condition was true.
-The count for the @code{else}
-indicates how many times the test failed.
-
-@cindex loops, count for header
-@item
-The count for a loop header (such as @code{for}
-or @code{while}) shows how many times the loop test was executed.
-(Because of this, you can't just look at the count on the first
-statement in a rule to determine how many times the rule was executed.
-If the first statement is a loop, the count is misleading.)
-
-@cindex functions, user-defined, counts
-@cindex user-defined, functions, counts
-@item
-For user-defined functions, the count next to the @code{function}
-keyword indicates how many times the function was called.
-The counts next to the statements in the body show how many times
-those statements were executed.
-
-@cindex @code{@{@}} (braces), @command{pgawk} program
-@cindex braces (@code{@{@}}), @command{pgawk} program
-@item
-The layout uses ``K&R'' style with TABs.
-Braces are used everywhere, even when
-the body of an @code{if}, @code{else}, or loop is only a single statement.
-
-@cindex @code{()} (parentheses), @command{pgawk} program
-@cindex parentheses @code{()}, @command{pgawk} program
-@item
-Parentheses are used only where needed, as indicated by the structure
-of the program and the precedence rules.
-@c extra verbiage here satisfies the copyeditor. ugh.
-For example, @samp{(3 + 5) * 4} means add three plus five, then multiply
-the total by four. However, @samp{3 + 5 * 4} has no parentheses, and
-means @samp{3 + (5 * 4)}.
-
-@ignore
-@item
-All string concatenations are parenthesized too.
-(This could be made a bit smarter.)
+@end ifdocbook
@end ignore
-@item
-Parentheses are used around the arguments to @code{print}
-and @code{printf} only when
-the @code{print} or @code{printf} statement is followed by a redirection.
-Similarly, if
-the target of a redirection isn't a scalar, it gets parenthesized.
-
-@item
-@command{pgawk} supplies leading comments in
-front of the @code{BEGIN} and @code{END} rules,
-the pattern/action rules, and the functions.
-
-@end itemize
-
-The profiled version of your program may not look exactly like what you
-typed when you wrote it. This is because @command{pgawk} creates the
-profiled version by ``pretty printing'' its internal representation of
-the program. The advantage to this is that @command{pgawk} can produce
-a standard representation. The disadvantage is that all source-code
-comments are lost, as are the distinctions among multiple @code{BEGIN},
-@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as:
-
-@example
-/foo/
-@end example
-
-@noindent
-come out as:
-
-@example
-/foo/ @{
- print $0
-@}
-@end example
-
-@noindent
-which is correct, but possibly surprising.
-
-@cindex profiling @command{awk} programs, dynamically
-@cindex @command{pgawk} program, dynamic profiling
-Besides creating profiles when a program has completed,
-@command{pgawk} can produce a profile while it is running.
-This is useful if your @command{awk} program goes into an
-infinite loop and you want to see what has been executed.
-To use this feature, run @command{pgawk} in the background:
-
-@example
-$ @kbd{pgawk -f myprog &}
-[1] 13992
-@end example
-
-@cindex @command{kill} command@comma{} dynamic profiling
-@cindex @code{USR1} signal
-@cindex @code{SIGUSR1} signal
-@cindex signals, @code{USR1}/@code{SIGUSR1}
-@noindent
-The shell prints a job number and process ID number; in this case, 13992.
-Use the @command{kill} command to send the @code{USR1} signal
-to @command{pgawk}:
-
-@example
-$ @kbd{kill -USR1 13992}
-@end example
-
-@noindent
-As usual, the profiled version of the program is written to
-@file{awkprof.out}, or to a different file if you use the @option{--profile}
-option.
-
-Along with the regular profile, as shown earlier, the profile
-includes a trace of any active functions:
-
-@example
-# Function Call Stack:
-
-# 3. baz
-# 2. bar
-# 1. foo
-# -- main --
-@end example
-
-You may send @command{pgawk} the @code{USR1} signal as many times as you like.
-Each time, the profile and function call trace are appended to the output
-profile file.
-
-@cindex @code{HUP} signal
-@cindex @code{SIGHUP} signal
-@cindex signals, @code{HUP}/@code{SIGHUP}
-If you use the @code{HUP} signal instead of the @code{USR1} signal,
-@command{pgawk} produces the profile and the function call trace and then exits.
-
-@cindex @code{INT} signal (MS-Windows)
-@cindex @code{SIGINT} signal (MS-Windows)
-@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows)
-@cindex @code{QUIT} signal (MS-Windows)
-@cindex @code{SIGQUIT} signal (MS-Windows)
-@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
-When @command{pgawk} runs on MS-Windows systems, it uses the
-@code{INT} and @code{QUIT} signals for producing the profile and, in
-the case of the @code{INT} signal, @command{pgawk} exits. This is
-because these systems don't support the @command{kill} command, so the
-only signals you can deliver to a program are those generated by the
-keyboard. The @code{INT} signal is generated by the
-@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the
-@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key.
-
-Finally, regular @command{gawk} also accepts the @option{--profile} option.
-When called this way, @command{gawk} ``pretty prints'' the program into
-@file{awkprof.out}, without any execution counts.
-@c ENDOFRANGE advgaw
-@c ENDOFRANGE gawadv
-@c ENDOFRANGE pgawk
-@c ENDOFRANGE awkp
-@c ENDOFRANGE proawk
-
@node Library Functions
@chapter A Library of @command{awk} Functions
@c STARTOFRANGE libf
@@ -19551,7 +18196,7 @@ programming use.
* Ordinal Functions:: Functions for using characters as numbers and
vice versa.
* Join Function:: A function to join an array into a string.
-* Gettimeofday Function:: A function to get formatted times.
+* Getlocaltime Function:: A function to get formatted times.
@end menu
@node Strtonum Function
@@ -20076,7 +18721,7 @@ be nice if @command{awk} had an assignment operator for concatenation.
The lack of an explicit operator for concatenation makes string operations
more difficult than they really need to be.}
-@node Gettimeofday Function
+@node Getlocaltime Function
@subsection Managing the Time of Day
@cindex libraries of @command{awk} functions, managing, time
@@ -20090,14 +18735,14 @@ in human readable form. While @code{strftime()} is extensive, the control
formats are not necessarily easy to remember or intuitively obvious when
reading a program.
-The following function, @code{gettimeofday()}, populates a user-supplied array
+The following function, @code{getlocaltime()}, populates a user-supplied array
with preformatted time information. It returns a string with the current
time formatted in the same way as the @command{date} utility:
-@cindex @code{gettimeofday()} user-defined function
+@cindex @code{getlocaltime()} user-defined function
@example
@c file eg/lib/gettime.awk
-# gettimeofday.awk --- get the time of day in a usable format
+# getlocaltime.awk --- get the time of day in a usable format
@c endfile
@ignore
@c file eg/lib/gettime.awk
@@ -20130,7 +18775,7 @@ time formatted in the same way as the @command{date} utility:
# time["weeknum"] -- week number, Sunday first day
# time["altweeknum"] -- week number, Monday first day
-function gettimeofday(time, ret, now, i)
+function getlocaltime(time, ret, now, i)
@{
# get time once, avoids unnecessary system calls
now = systime()
@@ -20172,7 +18817,7 @@ The string indices are easier to use and read than the various formats
required by @code{strftime()}. The @code{alarm} program presented in
@ref{Alarm Program},
uses this function.
-A more general design for the @code{gettimeofday()} function would have
+A more general design for the @code{getlocaltime()} function would have
allowed the user to supply an optional timestamp value to use instead
of the current time.
@@ -23464,8 +22109,8 @@ it prints the message on the standard output. In addition, you can give it
the number of times to repeat the message as well as a delay between
repetitions.
-This program uses the @code{gettimeofday()} function from
-@ref{Gettimeofday Function}.
+This program uses the @code{getlocaltime()} function from
+@ref{Getlocaltime Function}.
All the work is done in the @code{BEGIN} rule. The first part is argument
checking and setting of defaults: the delay, the count, and the message to
@@ -23484,7 +22129,7 @@ Here is the program:
@c file eg/prog/alarm.awk
# alarm.awk --- set an alarm
#
-# Requires gettimeofday() library function
+# Requires getlocaltime() library function
@c endfile
@ignore
@c file eg/prog/alarm.awk
@@ -23556,7 +22201,7 @@ is how long to wait before setting off the alarm:
minute = atime[2] + 0 # force numeric
# get current broken down time
- gettimeofday(now)
+ getlocaltime(now)
# if time given is 12-hour hours and it's after that
# hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
@@ -25223,45 +23868,1960 @@ BEGIN {
}
@end ignore
+@iftex
+@part Part III:@* Moving Beyond Standard @command{awk} With @command{gawk}
+@end iftex
+
+@ignore
+@ifdocbook
+
+@part Part III:@* Moving Beyond Standard @command{awk} With @command{gawk}
+
+Part III focuses on features specific to @command{gawk}.
+It contains the following chapters:
+
+@itemize @bullet
+@item
+@ref{Internationalization}.
+
+@item
+@ref{Advanced Features}.
+
+@item
+@ref{Debugger}.
+
+@item
+@ref{Arbitrary Precision Arithmetic}.
+
+@item
+@ref{Dynamic Extensions}.
+@end ifdocbook
+@end ignore
+
+@node Internationalization
+@chapter Internationalization with @command{gawk}
+
+Once upon a time, computer makers
+wrote software that worked only in English.
+Eventually, hardware and software vendors noticed that if their
+systems worked in the native languages of non-English-speaking
+countries, they were able to sell more systems.
+As a result, internationalization and localization
+of programs and software systems became a common practice.
+
+@c STARTOFRANGE inloc
+@cindex internationalization, localization
+@cindex @command{gawk}, internationalization and, See internationalization
+@cindex internationalization, localization, @command{gawk} and
+For many years, the ability to provide internationalization
+was largely restricted to programs written in C and C++.
+This @value{CHAPTER} describes the underlying library @command{gawk}
+uses for internationalization, as well as how
+@command{gawk} makes internationalization
+features available at the @command{awk} program level.
+Having internationalization available at the @command{awk} level
+gives software developers additional flexibility---they are no
+longer forced to write in C or C++ when internationalization is
+a requirement.
+
+@menu
+* I18N and L10N:: Internationalization and Localization.
+* Explaining gettext:: How GNU @code{gettext} works.
+* Programmer i18n:: Features for the programmer.
+* Translator i18n:: Features for the translator.
+* I18N Example:: A simple i18n example.
+* Gawk I18N:: @command{gawk} is also internationalized.
+@end menu
+
+@node I18N and L10N
+@section Internationalization and Localization
+
+@cindex internationalization
+@cindex localization, See internationalization@comma{} localization
+@cindex localization
+@dfn{Internationalization} means writing (or modifying) a program once,
+in such a way that it can use multiple languages without requiring
+further source-code changes.
+@dfn{Localization} means providing the data necessary for an
+internationalized program to work in a particular language.
+Most typically, these terms refer to features such as the language
+used for printing error messages, the language used to read
+responses, and information related to how numerical and
+monetary values are printed and read.
+
+@node Explaining gettext
+@section GNU @code{gettext}
+
+@cindex internationalizing a program
+@c STARTOFRANGE gettex
+@cindex @code{gettext} library
+The facilities in GNU @code{gettext} focus on messages; strings printed
+by a program, either directly or via formatting with @code{printf} or
+@code{sprintf()}.@footnote{For some operating systems, the @command{gawk}
+port doesn't support GNU @code{gettext}.
+Therefore, these features are not available
+if you are using one of those operating systems. Sorry.}
+
+@cindex portability, @code{gettext} library and
+When using GNU @code{gettext}, each application has its own
+@dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk},
+that identifies the application.
+A complete application may have multiple components---programs written
+in C or C++, as well as scripts written in @command{sh} or @command{awk}.
+All of the components use the same text domain.
+
+To make the discussion concrete, assume we're writing an application
+named @command{guide}. Internationalization consists of the
+following steps, in this order:
+
+@enumerate
+@item
+The programmer goes
+through the source for all of @command{guide}'s components
+and marks each string that is a candidate for translation.
+For example, @code{"`-F': option required"} is a good candidate for translation.
+A table with strings of option names is not (e.g., @command{gawk}'s
+@option{--profile} option should remain the same, no matter what the local
+language).
+
+@cindex @code{textdomain()} function (C library)
+@item
+The programmer indicates the application's text domain
+(@code{"guide"}) to the @code{gettext} library,
+by calling the @code{textdomain()} function.
+
+@cindex @code{.pot} files
+@cindex files, @code{.pot}
+@cindex portable object template files
+@cindex files, portable object template
+@item
+Messages from the application are extracted from the source code and
+collected into a portable object template file (@file{guide.pot}),
+which lists the strings and their translations.
+The translations are initially empty.
+The original (usually English) messages serve as the key for
+lookup of the translations.
+
+@cindex @code{.po} files
+@cindex files, @code{.po}
+@cindex portable object files
+@cindex files, portable object
+@item
+For each language with a translator, @file{guide.pot}
+is copied to a portable object file (@code{.po})
+and translations are created and shipped with the application.
+For example, there might be a @file{fr.po} for a French translation.
+
+@cindex @code{.mo} files
+@cindex files, @code{.mo}
+@cindex message object files
+@cindex files, message object
+@item
+Each language's @file{.po} file is converted into a binary
+message object (@file{.mo}) file.
+A message object file contains the original messages and their
+translations in a binary format that allows fast lookup of translations
+at runtime.
+
+@item
+When @command{guide} is built and installed, the binary translation files
+are installed in a standard place.
+
+@cindex @code{bindtextdomain()} function (C library)
+@item
+For testing and development, it is possible to tell @code{gettext}
+to use @file{.mo} files in a different directory than the standard
+one by using the @code{bindtextdomain()} function.
+
+@cindex @code{.mo} files, specifying directory of
+@cindex files, @code{.mo}, specifying directory of
+@cindex message object files, specifying directory of
+@cindex files, message object, specifying directory of
+@item
+At runtime, @command{guide} looks up each string via a call
+to @code{gettext()}. The returned string is the translated string
+if available, or the original string if not.
+
+@item
+If necessary, it is possible to access messages from a different
+text domain than the one belonging to the application, without
+having to switch the application's default text domain back
+and forth.
+@end enumerate
+
+@cindex @code{gettext()} function (C library)
+In C (or C++), the string marking and dynamic translation lookup
+are accomplished by wrapping each string in a call to @code{gettext()}:
+
+@example
+printf("%s", gettext("Don't Panic!\n"));
+@end example
+
+The tools that extract messages from source code pull out all
+strings enclosed in calls to @code{gettext()}.
+
+@cindex @code{_} (underscore), @code{_} C macro
+@cindex underscore (@code{_}), @code{_} C macro
+The GNU @code{gettext} developers, recognizing that typing
+@samp{gettext(@dots{})} over and over again is both painful and ugly to look
+at, use the macro @samp{_} (an underscore) to make things easier:
+
+@example
+/* In the standard header file: */
+#define _(str) gettext(str)
+
+/* In the program text: */
+printf("%s", _("Don't Panic!\n"));
+@end example
+
+@cindex internationalization, localization, locale categories
+@cindex @code{gettext} library, locale categories
+@cindex locale categories
+@noindent
+This reduces the typing overhead to just three extra characters per string
+and is considerably easier to read as well.
+
+There are locale @dfn{categories}
+for different types of locale-related information.
+The defined locale categories that @code{gettext} knows about are:
+
+@table @code
+@cindex @code{LC_MESSAGES} locale category
+@item LC_MESSAGES
+Text messages. This is the default category for @code{gettext}
+operations, but it is possible to supply a different one explicitly,
+if necessary. (It is almost never necessary to supply a different category.)
+
+@cindex sorting characters in different languages
+@cindex @code{LC_COLLATE} locale category
+@item LC_COLLATE
+Text-collation information; i.e., how different characters
+and/or groups of characters sort in a given language.
+
+@cindex @code{LC_CTYPE} locale category
+@item LC_CTYPE
+Character-type information (alphabetic, digit, upper- or lowercase, and
+so on).
+This information is accessed via the
+POSIX character classes in regular expressions,
+such as @code{/[[:alnum:]]/}
+(@pxref{Regexp Operators}).
+
+@cindex monetary information, localization
+@cindex currency symbols, localization
+@cindex @code{LC_MONETARY} locale category
+@item LC_MONETARY
+Monetary information, such as the currency symbol, and whether the
+symbol goes before or after a number.
+
+@cindex @code{LC_NUMERIC} locale category
+@item LC_NUMERIC
+Numeric information, such as which characters to use for the decimal
+point and the thousands separator.@footnote{Americans
+use a comma every three decimal places and a period for the decimal
+point, while many Europeans do exactly the opposite:
+1,234.56 versus 1.234,56.}
+
+@cindex @code{LC_RESPONSE} locale category
+@item LC_RESPONSE
+Response information, such as how ``yes'' and ``no'' appear in the
+local language, and possibly other information as well.
+
+@cindex time, localization and
+@cindex dates, information related to@comma{} localization
+@cindex @code{LC_TIME} locale category
+@item LC_TIME
+Time- and date-related information, such as 12- or 24-hour clock, month printed
+before or after the day in a date, local month abbreviations, and so on.
+
+@cindex @code{LC_ALL} locale category
+@item LC_ALL
+All of the above. (Not too useful in the context of @code{gettext}.)
+@end table
+@c ENDOFRANGE gettex
+
+@node Programmer i18n
+@section Internationalizing @command{awk} Programs
+@c STARTOFRANGE inap
+@cindex @command{awk} programs, internationalizing
+
+@command{gawk} provides the following variables and functions for
+internationalization:
+
+@table @code
+@cindex @code{TEXTDOMAIN} variable
+@item TEXTDOMAIN
+This variable indicates the application's text domain.
+For compatibility with GNU @code{gettext}, the default
+value is @code{"messages"}.
+
+@cindex internationalization, localization, marked strings
+@cindex strings, for localization
+@item _"your message here"
+String constants marked with a leading underscore
+are candidates for translation at runtime.
+String constants without a leading underscore are not translated.
+
+@cindex @code{dcgettext()} function (@command{gawk})
+@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
+Return the translation of @var{string} in
+text domain @var{domain} for locale category @var{category}.
+The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
+The default value for @var{category} is @code{"LC_MESSAGES"}.
+
+If you supply a value for @var{category}, it must be a string equal to
+one of the known locale categories described in
+@ifnotinfo
+the previous @value{SECTION}.
+@end ifnotinfo
+@ifinfo
+@ref{Explaining gettext}.
+@end ifinfo
+You must also supply a text domain. Use @code{TEXTDOMAIN} if
+you want to use the current domain.
+
+@quotation CAUTION
+The order of arguments to the @command{awk} version
+of the @code{dcgettext()} function is purposely different from the order for
+the C version. The @command{awk} version's order was
+chosen to be simple and to allow for reasonable @command{awk}-style
+default arguments.
+@end quotation
+
+@cindex @code{dcngettext()} function (@command{gawk})
+@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
+Return the plural form used for @var{number} of the
+translation of @var{string1} and @var{string2} in text domain
+@var{domain} for locale category @var{category}. @var{string1} is the
+English singular variant of a message, and @var{string2} the English plural
+variant of the same message.
+The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
+The default value for @var{category} is @code{"LC_MESSAGES"}.
+
+The same remarks about argument order as for the @code{dcgettext()} function apply.
+
+@cindex @code{.mo} files, specifying directory of
+@cindex files, @code{.mo}, specifying directory of
+@cindex message object files, specifying directory of
+@cindex files, message object, specifying directory of
+@cindex @code{bindtextdomain()} function (@command{gawk})
+@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]})
+Change the directory in which
+@code{gettext} looks for @file{.mo} files, in case they
+will not or cannot be placed in the standard locations
+(e.g., during testing).
+Return the directory in which @var{domain} is ``bound.''
+
+The default @var{domain} is the value of @code{TEXTDOMAIN}.
+If @var{directory} is the null string (@code{""}), then
+@code{bindtextdomain()} returns the current binding for the
+given @var{domain}.
+@end table
+
+To use these facilities in your @command{awk} program, follow the steps
+outlined in
+@ifnotinfo
+the previous @value{SECTION},
+@end ifnotinfo
+@ifinfo
+@ref{Explaining gettext},
+@end ifinfo
+like so:
+
+@enumerate
+@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
+@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and
+@item
+Set the variable @code{TEXTDOMAIN} to the text domain of
+your program. This is best done in a @code{BEGIN} rule
+(@pxref{BEGIN/END}),
+or it can also be done via the @option{-v} command-line
+option (@pxref{Options}):
+
+@example
+BEGIN @{
+ TEXTDOMAIN = "guide"
+ @dots{}
+@}
+@end example
+
+@cindex @code{_} (underscore), translatable string
+@cindex underscore (@code{_}), translatable string
+@item
+Mark all translatable strings with a leading underscore (@samp{_})
+character. It @emph{must} be adjacent to the opening
+quote of the string. For example:
+
+@example
+print _"hello, world"
+x = _"you goofed"
+printf(_"Number of users is %d\n", nusers)
+@end example
+
+@item
+If you are creating strings dynamically, you can
+still translate them, using the @code{dcgettext()}
+built-in function:
+
+@example
+message = nusers " users logged in"
+message = dcgettext(message, "adminprog")
+print message
+@end example
+
+Here, the call to @code{dcgettext()} supplies a different
+text domain (@code{"adminprog"}) in which to find the
+message, but it uses the default @code{"LC_MESSAGES"} category.
+
+@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk})
+@item
+During development, you might want to put the @file{.mo}
+file in a private directory for testing. This is done
+with the @code{bindtextdomain()} built-in function:
+
+@example
+BEGIN @{
+ TEXTDOMAIN = "guide" # our text domain
+ if (Testing) @{
+ # where to find our files
+ bindtextdomain("testdir")
+ # joe is in charge of adminprog
+ bindtextdomain("../joe/testdir", "adminprog")
+ @}
+ @dots{}
+@}
+@end example
+
+@end enumerate
+
+@xref{I18N Example},
+for an example program showing the steps to create
+and use translations from @command{awk}.
+
+@node Translator i18n
+@section Translating @command{awk} Programs
+
+@cindex @code{.po} files
+@cindex files, @code{.po}
+@cindex portable object files
+@cindex files, portable object
+Once a program's translatable strings have been marked, they must
+be extracted to create the initial @file{.po} file.
+As part of translation, it is often helpful to rearrange the order
+in which arguments to @code{printf} are output.
+
+@command{gawk}'s @option{--gen-pot} command-line option extracts
+the messages and is discussed next.
+After that, @code{printf}'s ability to
+rearrange the order for @code{printf} arguments at runtime
+is covered.
+
+@menu
+* String Extraction:: Extracting marked strings.
+* Printf Ordering:: Rearranging @code{printf} arguments.
+* I18N Portability:: @command{awk}-level portability issues.
+@end menu
+
+@node String Extraction
+@subsection Extracting Marked Strings
+@cindex strings, extracting
+@cindex marked strings@comma{} extracting
+@cindex @code{--gen-pot} option
+@cindex command-line options, string extraction
+@cindex string extraction (internationalization)
+@cindex marked string extraction (internationalization)
+@cindex extraction, of marked strings (internationalization)
+
+@cindex @code{--gen-pot} option
+Once your @command{awk} program is working, and all the strings have
+been marked and you've set (and perhaps bound) the text domain,
+it is time to produce translations.
+First, use the @option{--gen-pot} command-line option to create
+the initial @file{.pot} file:
+
+@example
+$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
+@end example
+
+@cindex @code{xgettext} utility
+When run with @option{--gen-pot}, @command{gawk} does not execute your
+program. Instead, it parses it as usual and prints all marked strings
+to standard output in the format of a GNU @code{gettext} Portable Object
+file. Also included in the output are any constant strings that
+appear as the first argument to @code{dcgettext()} or as the first and
+second argument to @code{dcngettext()}.@footnote{The
+@command{xgettext} utility that comes with GNU
+@code{gettext} can handle @file{.awk} files.}
+@xref{I18N Example},
+for the full list of steps to go through to create and test
+translations for @command{guide}.
+
+@node Printf Ordering
+@subsection Rearranging @code{printf} Arguments
+
+@cindex @code{printf} statement, positional specifiers
+@cindex positional specifiers, @code{printf} statement
+Format strings for @code{printf} and @code{sprintf()}
+(@pxref{Printf})
+present a special problem for translation.
+Consider the following:@footnote{This example is borrowed
+from the GNU @code{gettext} manual.}
+
+@c line broken here only for smallbook format
+@example
+printf(_"String `%s' has %d characters\n",
+ string, length(string)))
+@end example
+
+A possible German translation for this might be:
+
+@example
+"%d Zeichen lang ist die Zeichenkette `%s'\n"
+@end example
+
+The problem should be obvious: the order of the format
+specifications is different from the original!
+Even though @code{gettext()} can return the translated string
+at runtime,
+it cannot change the argument order in the call to @code{printf}.
+
+To solve this problem, @code{printf} format specifiers may have
+an additional optional element, which we call a @dfn{positional specifier}.
+For example:
+
+@example
+"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
+@end example
+
+Here, the positional specifier consists of an integer count, which indicates which
+argument to use, and a @samp{$}. Counts are one-based, and the
+format string itself is @emph{not} included. Thus, in the following
+example, @samp{string} is the first argument and @samp{length(string)} is the second:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{string = "Dont Panic"}
+> @kbd{printf _"%2$d characters live in \"%1$s\"\n",}
+> @kbd{string, length(string)}
+> @kbd{@}'}
+@print{} 10 characters live in "Dont Panic"
+@end example
+
+If present, positional specifiers come first in the format specification,
+before the flags, the field width, and/or the precision.
+
+Positional specifiers can be used with the dynamic field width and
+precision capability:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{printf("%*.*s\n", 10, 20, "hello")}
+> @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")}
+> @kbd{@}'}
+@print{} hello
+@print{} hello
+@end example
+
+@quotation NOTE
+When using @samp{*} with a positional specifier, the @samp{*}
+comes first, then the integer position, and then the @samp{$}.
+This is somewhat counterintuitive.
+@end quotation
+
+@cindex @code{printf} statement, positional specifiers, mixing with regular formats
+@cindex positional specifiers, @code{printf} statement, mixing with regular formats
+@cindex format specifiers, mixing regular with positional specifiers
+@command{gawk} does not allow you to mix regular format specifiers
+and those with positional specifiers in the same string:
+
+@example
+$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'}
+@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none
+@end example
+
+@quotation NOTE
+There are some pathological cases that @command{gawk} may fail to
+diagnose. In such cases, the output may not be what you expect.
+It's still a bad idea to try mixing them, even if @command{gawk}
+doesn't detect it.
+@end quotation
+
+Although positional specifiers can be used directly in @command{awk} programs,
+their primary purpose is to help in producing correct translations of
+format strings into languages different from the one in which the program
+is first written.
+
+@node I18N Portability
+@subsection @command{awk} Portability Issues
+
+@cindex portability, internationalization and
+@cindex internationalization, localization, portability and
+@command{gawk}'s internationalization features were purposely chosen to
+have as little impact as possible on the portability of @command{awk}
+programs that use them to other versions of @command{awk}.
+Consider this program:
+
+@example
+BEGIN @{
+ TEXTDOMAIN = "guide"
+ if (Test_Guide) # set with -v
+ bindtextdomain("/test/guide/messages")
+ print _"don't panic!"
+@}
+@end example
+
+@noindent
+As written, it won't work on other versions of @command{awk}.
+However, it is actually almost portable, requiring very little
+change:
+
+@itemize @bullet
+@cindex @code{TEXTDOMAIN} variable, portability and
+@item
+Assignments to @code{TEXTDOMAIN} won't have any effect,
+since @code{TEXTDOMAIN} is not special in other @command{awk} implementations.
+
+@item
+Non-GNU versions of @command{awk} treat marked strings
+as the concatenation of a variable named @code{_} with the string
+following it.@footnote{This is good fodder for an ``Obfuscated
+@command{awk}'' contest.} Typically, the variable @code{_} has
+the null string (@code{""}) as its value, leaving the original string constant as
+the result.
+
+@item
+By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}
+and @code{bindtextdomain()}, the @command{awk} program can be made to run, but
+all the messages are output in the original language.
+For example:
+
+@cindex @code{bindtextdomain()} function (@command{gawk}), portability and
+@cindex @code{dcgettext()} function (@command{gawk}), portability and
+@cindex @code{dcngettext()} function (@command{gawk}), portability and
+@example
+@c file eg/lib/libintl.awk
+function bindtextdomain(dir, domain)
+@{
+ return dir
+@}
+
+function dcgettext(string, domain, category)
+@{
+ return string
+@}
+
+function dcngettext(string1, string2, number, domain, category)
+@{
+ return (number == 1 ? string1 : string2)
+@}
+@c endfile
+@end example
+
+@item
+The use of positional specifications in @code{printf} or
+@code{sprintf()} is @emph{not} portable.
+To support @code{gettext()} at the C level, many systems' C versions of
+@code{sprintf()} do support positional specifiers. But it works only if
+enough arguments are supplied in the function call. Many versions of
+@command{awk} pass @code{printf} formats and arguments unchanged to the
+underlying C library version of @code{sprintf()}, but only one format and
+argument at a time. What happens if a positional specification is
+used is anybody's guess.
+However, since the positional specifications are primarily for use in
+@emph{translated} format strings, and since non-GNU @command{awk}s never
+retrieve the translated string, this should not be a problem in practice.
+@end itemize
+@c ENDOFRANGE inap
+
+@node I18N Example
+@section A Simple Internationalization Example
+
+Now let's look at a step-by-step example of how to internationalize and
+localize a simple @command{awk} program, using @file{guide.awk} as our
+original source:
+
+@example
+@c file eg/prog/guide.awk
+BEGIN @{
+ TEXTDOMAIN = "guide"
+ bindtextdomain(".") # for testing
+ print _"Don't Panic"
+ print _"The Answer Is", 42
+ print "Pardon me, Zaphod who?"
+@}
+@c endfile
+@end example
+
+@noindent
+Run @samp{gawk --gen-pot} to create the @file{.pot} file:
+
+@example
+$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
+@end example
+
+@noindent
+This produces:
+
+@example
+@c file eg/data/guide.po
+#: guide.awk:4
+msgid "Don't Panic"
+msgstr ""
+
+#: guide.awk:5
+msgid "The Answer Is"
+msgstr ""
+
+@c endfile
+@end example
+
+This original portable object template file is saved and reused for each language
+into which the application is translated. The @code{msgid}
+is the original string and the @code{msgstr} is the translation.
+
+@quotation NOTE
+Strings not marked with a leading underscore do not
+appear in the @file{guide.pot} file.
+@end quotation
+
+Next, the messages must be translated.
+Here is a translation to a hypothetical dialect of English,
+called ``Mellow'':@footnote{Perhaps it would be better if it were
+called ``Hippy.'' Ah, well.}
+
+@example
+@group
+$ cp guide.pot guide-mellow.po
+@var{Add translations to} guide-mellow.po @dots{}
+@end group
+@end example
+
+@noindent
+Following are the translations:
+
+@example
+@c file eg/data/guide-mellow.po
+#: guide.awk:4
+msgid "Don't Panic"
+msgstr "Hey man, relax!"
+
+#: guide.awk:5
+msgid "The Answer Is"
+msgstr "Like, the scoop is"
+
+@c endfile
+@end example
+
+@cindex Linux
+@cindex GNU/Linux
+The next step is to make the directory to hold the binary message object
+file and then to create the @file{guide.mo} file.
+The directory layout shown here is standard for GNU @code{gettext} on
+GNU/Linux systems. Other versions of @code{gettext} may use a different
+layout:
+
+@example
+$ @kbd{mkdir en_US en_US/LC_MESSAGES}
+@end example
+
+@cindex @code{.po} files, converting to @code{.mo}
+@cindex files, @code{.po}, converting to @code{.mo}
+@cindex @code{.mo} files, converting from @code{.po}
+@cindex files, @code{.mo}, converting from @code{.po}
+@cindex portable object files, converting to message object files
+@cindex files, portable object, converting to message object files
+@cindex message object files, converting from portable object files
+@cindex files, message object, converting from portable object files
+@cindex @command{msgfmt} utility
+The @command{msgfmt} utility does the conversion from human-readable
+@file{.po} file to machine-readable @file{.mo} file.
+By default, @command{msgfmt} creates a file named @file{messages}.
+This file must be renamed and placed in the proper directory so that
+@command{gawk} can find it:
+
+@example
+$ @kbd{msgfmt guide-mellow.po}
+$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo}
+@end example
+
+Finally, we run the program to test it:
+
+@example
+$ @kbd{gawk -f guide.awk}
+@print{} Hey man, relax!
+@print{} Like, the scoop is 42
+@print{} Pardon me, Zaphod who?
+@end example
+
+If the three replacement functions for @code{dcgettext()}, @code{dcngettext()}
+and @code{bindtextdomain()}
+(@pxref{I18N Portability})
+are in a file named @file{libintl.awk},
+then we can run @file{guide.awk} unchanged as follows:
+
+@example
+$ @kbd{gawk --posix -f guide.awk -f libintl.awk}
+@print{} Don't Panic
+@print{} The Answer Is 42
+@print{} Pardon me, Zaphod who?
+@end example
+
+@node Gawk I18N
+@section @command{gawk} Can Speak Your Language
+
+@command{gawk} itself has been internationalized
+using the GNU @code{gettext} package.
+(GNU @code{gettext} is described in
+complete detail in
+@ifinfo
+@inforef{Top, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.)
+@end ifinfo
+@ifnotinfo
+@cite{GNU gettext tools}.)
+@end ifnotinfo
+As of this writing, the latest version of GNU @code{gettext} is
+@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}.
+
+If a translation of @command{gawk}'s messages exists,
+then @command{gawk} produces usage messages, warnings,
+and fatal errors in the local language.
+@c ENDOFRANGE inloc
+
+@node Advanced Features
+@chapter Advanced Features of @command{gawk}
+@cindex advanced features, network connections, See Also networks, connections
+@c STARTOFRANGE gawadv
+@cindex @command{gawk}, features, advanced
+@c STARTOFRANGE advgaw
+@cindex advanced features, @command{gawk}
+@ignore
+Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com>
+
+ Found in Steve English's "signature" line:
+
+"Write documentation as if whoever reads it is a violent psychopath
+who knows where you live."
+@end ignore
+@quotation
+@i{Write documentation as if whoever reads it is
+a violent psychopath who knows where you live.}@*
+Steve English, as quoted by Peter Langston
+@end quotation
+
+This @value{CHAPTER} discusses advanced features in @command{gawk}.
+It's a bit of a ``grab bag'' of items that are otherwise unrelated
+to each other.
+First, a command-line option allows @command{gawk} to recognize
+nondecimal numbers in input data, not just in @command{awk}
+programs.
+Then, @command{gawk}'s special features for sorting arrays are presented.
+Next, two-way I/O, discussed briefly in earlier parts of this
+@value{DOCUMENT}, is described in full detail, along with the basics
+of TCP/IP networking. Finally, @command{gawk}
+can @dfn{profile} an @command{awk} program, making it possible to tune
+it for performance.
+
+@ref{Dynamic Extensions},
+discusses the ability to dynamically add new built-in functions to
+@command{gawk}. As this feature is still immature and likely to change,
+its description is relegated to an appendix.
+
+@menu
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array traversal and
+ sorting arrays.
+* Two-way I/O:: Two-way communications with another process.
+* TCP/IP Networking:: Using @command{gawk} for network programming.
+* Profiling:: Profiling your @command{awk} programs.
+@end menu
+
+@node Nondecimal Data
+@section Allowing Nondecimal Input Data
+@cindex @code{--non-decimal-data} option
+@cindex advanced features, @command{gawk}, nondecimal input data
+@cindex input, data@comma{} nondecimal
+@cindex constants, nondecimal
+
+If you run @command{gawk} with the @option{--non-decimal-data} option,
+you can have nondecimal constants in your input data:
+
+@c line break here for small book format
+@example
+$ @kbd{echo 0123 123 0x123 |}
+> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",}
+> @kbd{$1, $2, $3 @}'}
+@print{} 83, 123, 291
+@end example
+
+For this feature to work, write your program so that
+@command{gawk} treats your data as numeric:
+
+@example
+$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
+@print{} 0123 123 0x123
+@end example
+
+@noindent
+The @code{print} statement treats its expressions as strings.
+Although the fields can act as numbers when necessary,
+they are still strings, so @code{print} does not try to treat them
+numerically. You may need to add zero to a field to force it to
+be treated as a number. For example:
+
+@example
+$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
+> @kbd{@{ print $1, $2, $3}
+> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
+@print{} 0123 123 0x123
+@print{} 83 123 291
+@end example
+
+Because it is common to have decimal data with leading zeros, and because
+using this facility could lead to surprising results, the default is to leave it
+disabled. If you want it, you must explicitly request it.
+
+@cindex programming conventions, @code{--non-decimal-data} option
+@cindex @code{--non-decimal-data} option, @code{strtonum()} function and
+@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
+@quotation CAUTION
+@emph{Use of this option is not recommended.}
+It can break old programs very badly.
+Instead, use the @code{strtonum()} function to convert your data
+(@pxref{Nondecimal-numbers}).
+This makes your programs easier to write and easier to read, and
+leads to less surprising results.
+@end quotation
+
+@node Array Sorting
+@section Controlling Array Traversal and Array Sorting
+
+@command{gawk} lets you control the order in which a @samp{for (i in array)}
+loop traverses an array.
+
+In addition, two built-in functions, @code{asort()} and @code{asorti()},
+let you sort arrays based on the array values and indices, respectively.
+These two functions also provide control over the sorting criteria used
+to order the elements during sorting.
+
+@menu
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}.
+@end menu
+
+@node Controlling Array Traversal
+@subsection Controlling Array Traversal
+
+By default, the order in which a @samp{for (i in array)} loop
+scans an array is not defined; it is generally based upon
+the internal implementation of arrays inside @command{awk}.
+
+Often, though, it is desirable to be able to loop over the elements
+in a particular order that you, the programmer, choose. @command{gawk}
+lets you do this.
+
+@ref{Controlling Scanning}, describes how you can assign special,
+pre-defined values to @code{PROCINFO["sorted_in"]} in order to
+control the order in which @command{gawk} will traverse an array
+during a @code{for} loop.
+
+In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name.
+This lets you traverse an array based on any custom criterion.
+The array elements are ordered according to the return value of this
+function. The comparison function should be defined with at least
+four arguments:
+
+@example
+function comp_func(i1, v1, i2, v2)
+@{
+ @var{compare elements 1 and 2 in some fashion}
+ @var{return < 0; 0; or > 0}
+@}
+@end example
+
+Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
+are the corresponding values of the two elements being compared.
+Either @var{v1} or @var{v2}, or both, can be arrays if the array being
+traversed contains subarrays as values.
+(@xref{Arrays of Arrays}, for more information about subarrays.)
+The three possible return values are interpreted as follows:
+
+@table @code
+@item comp_func(i1, v1, i2, v2) < 0
+Index @var{i1} comes before index @var{i2} during loop traversal.
+
+@item comp_func(i1, v1, i2, v2) == 0
+Indices @var{i1} and @var{i2}
+come together but the relative order with respect to each other is undefined.
+
+@item comp_func(i1, v1, i2, v2) > 0
+Index @var{i1} comes after index @var{i2} during loop traversal.
+@end table
+
+Our first comparison function can be used to scan an array in
+numerical order of the indices:
+
+@example
+function cmp_num_idx(i1, v1, i2, v2)
+@{
+ # numerical index comparison, ascending order
+ return (i1 - i2)
+@}
+@end example
+
+Our second function traverses an array based on the string order of
+the element values rather than by indices:
+
+@example
+function cmp_str_val(i1, v1, i2, v2)
+@{
+ # string value comparison, ascending order
+ v1 = v1 ""
+ v2 = v2 ""
+ if (v1 < v2)
+ return -1
+ return (v1 != v2)
+@}
+@end example
+
+The third
+comparison function makes all numbers, and numeric strings without
+any leading or trailing spaces, come out first during loop traversal:
+
+@example
+function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
+@{
+ # numbers before string value comparison, ascending order
+ n1 = v1 + 0
+ n2 = v2 + 0
+ if (n1 == v1)
+ return (n2 == v2) ? (n1 - n2) : -1
+ else if (n2 == v2)
+ return 1
+ return (v1 < v2) ? -1 : (v1 != v2)
+@}
+@end example
+
+Here is a main program to demonstrate how @command{gawk}
+behaves using each of the previous functions:
+
+@example
+BEGIN @{
+ data["one"] = 10
+ data["two"] = 20
+ data[10] = "one"
+ data[100] = 100
+ data[20] = "two"
+
+ f[1] = "cmp_num_idx"
+ f[2] = "cmp_str_val"
+ f[3] = "cmp_num_str_val"
+ for (i = 1; i <= 3; i++) @{
+ printf("Sort function: %s\n", f[i])
+ PROCINFO["sorted_in"] = f[i]
+ for (j in data)
+ printf("\tdata[%s] = %s\n", j, data[j])
+ print ""
+ @}
+@}
+@end example
+
+Here are the results when the program is run:
+@page
+
+@example
+$ @kbd{gawk -f compdemo.awk}
+@print{} Sort function: cmp_num_idx @ii{Sort by numeric index}
+@print{} data[two] = 20
+@print{} data[one] = 10 @ii{Both strings are numerically zero}
+@print{} data[10] = one
+@print{} data[20] = two
+@print{} data[100] = 100
+@print{}
+@print{} Sort function: cmp_str_val @ii{Sort by element values as strings}
+@print{} data[one] = 10
+@print{} data[100] = 100 @ii{String 100 is less than string 20}
+@print{} data[two] = 20
+@print{} data[10] = one
+@print{} data[20] = two
+@print{}
+@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings}
+@print{} data[one] = 10
+@print{} data[two] = 20
+@print{} data[100] = 100
+@print{} data[10] = one
+@print{} data[20] = two
+@end example
+
+Consider sorting the entries of a GNU/Linux system password file
+according to login name. The following program sorts records
+by a specific field position and can be used for this purpose:
+
+@example
+# sort.awk --- simple program to sort by field position
+# field position is specified by the global variable POS
+
+function cmp_field(i1, v1, i2, v2)
+@{
+ # comparison by value, as string, and ascending order
+ return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
+@}
+
+@{
+ for (i = 1; i <= NF; i++)
+ a[NR][i] = $i
+@}
+
+END @{
+ PROCINFO["sorted_in"] = "cmp_field"
+ if (POS < 1 || POS > NF)
+ POS = 1
+ for (i in a) @{
+ for (j = 1; j <= NF; j++)
+ printf("%s%c", a[i][j], j < NF ? ":" : "")
+ print ""
+ @}
+@}
+@end example
+
+The first field in each entry of the password file is the user's login name,
+and the fields are separated by colons.
+Each record defines a subarray,
+with each field as an element in the subarray.
+Running the program produces the
+following output:
+
+@example
+$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd}
+@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin
+@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin
+@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin
+@dots{}
+@end example
+
+The comparison should normally always return the same value when given a
+specific pair of array elements as its arguments. If inconsistent
+results are returned then the order is undefined. This behavior can be
+exploited to introduce random order into otherwise seemingly
+ordered data:
+
+@example
+function cmp_randomize(i1, v1, i2, v2)
+@{
+ # random order
+ return (2 - 4 * rand())
+@}
+@end example
+
+As mentioned above, the order of the indices is arbitrary if two
+elements compare equal. This is usually not a problem, but letting
+the tied elements come out in arbitrary order can be an issue, especially
+when comparing item values. The partial ordering of the equal elements
+may change during the next loop traversal, if other elements are added or
+removed from the array. One way to resolve ties when comparing elements
+with otherwise equal values is to include the indices in the comparison
+rules. Note that doing this may make the loop traversal less efficient,
+so consider it only if necessary. The following comparison functions
+force a deterministic order, and are based on the fact that the
+indices of two elements are never equal:
+
+@example
+function cmp_numeric(i1, v1, i2, v2)
+@{
+ # numerical value (and index) comparison, descending order
+ return (v1 != v2) ? (v2 - v1) : (i2 - i1)
+@}
+
+function cmp_string(i1, v1, i2, v2)
+@{
+ # string value (and index) comparison, descending order
+ v1 = v1 i1
+ v2 = v2 i2
+ return (v1 > v2) ? -1 : (v1 != v2)
+@}
+@end example
+
+@c Avoid using the term ``stable'' when describing the unpredictable behavior
+@c if two items compare equal. Usually, the goal of a "stable algorithm"
+@c is to maintain the original order of the items, which is a meaningless
+@c concept for a list constructed from a hash.
+
+A custom comparison function can often simplify ordered loop
+traversal, and the sky is really the limit when it comes to
+designing such a function.
+
+When string comparisons are made during a sort, either for element
+values where one or both aren't numbers, or for element indices
+handled as strings, the value of @code{IGNORECASE}
+(@pxref{Built-in Variables}) controls whether
+the comparisons treat corresponding uppercase and lowercase letters as
+equivalent or distinct.
+
+Another point to keep in mind is that in the case of subarrays
+the element values can themselves be arrays; a production comparison
+function should use the @code{isarray()} function
+(@pxref{Type Functions}),
+to check for this, and choose a defined sorting order for subarrays.
+
+All sorting based on @code{PROCINFO["sorted_in"]}
+is disabled in POSIX mode,
+since the @code{PROCINFO} array is not special in that case.
+
+As a side note, sorting the array indices before traversing
+the array has been reported to add 15% to 20% overhead to the
+execution time of @command{awk} programs. For this reason,
+sorted array traversal is not the default.
+
+@c The @command{gawk}
+@c maintainers believe that only the people who wish to use a
+@c feature should have to pay for it.
+
+@node Array Sorting Functions
+@subsection Sorting Array Values and Indices with @command{gawk}
+
+@cindex arrays, sorting
+@cindex @code{asort()} function (@command{gawk})
+@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting
+@cindex sort function, arrays, sorting
+In most @command{awk} implementations, sorting an array requires
+writing a @code{sort()} function.
+While this can be educational for exploring different sorting algorithms,
+usually that's not the point of the program.
+@command{gawk} provides the built-in @code{asort()}
+and @code{asorti()} functions
+(@pxref{String Functions})
+for sorting arrays. For example:
+
+@example
+@var{populate the array} data
+n = asort(data)
+for (i = 1; i <= n; i++)
+ @var{do something with} data[i]
+@end example
+
+After the call to @code{asort()}, the array @code{data} is indexed from 1
+to some number @var{n}, the total number of elements in @code{data}.
+(This count is @code{asort()}'s return value.)
+@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
+The comparison is based on the type of the elements
+(@pxref{Typing and Comparison}).
+All numeric values come before all string values,
+which in turn come before all subarrays.
+
+@cindex side effects, @code{asort()} function
+An important side effect of calling @code{asort()} is that
+@emph{the array's original indices are irrevocably lost}.
+As this isn't always desirable, @code{asort()} accepts a
+second argument:
+
+@example
+@var{populate the array} source
+n = asort(source, dest)
+for (i = 1; i <= n; i++)
+ @var{do something with} dest[i]
+@end example
+
+In this case, @command{gawk} copies the @code{source} array into the
+@code{dest} array and then sorts @code{dest}, destroying its indices.
+However, the @code{source} array is not affected.
+
+@code{asort()} accepts a third string argument to control comparison of
+array elements. As with @code{PROCINFO["sorted_in"]}, this argument
+may be one of the predefined names that @command{gawk} provides
+(@pxref{Controlling Scanning}), or the name of a user-defined function
+(@pxref{Controlling Array Traversal}).
+
+@quotation NOTE
+In all cases, the sorted element values consist of the original
+array's element values. The ability to control comparison merely
+affects the way in which they are sorted.
+@end quotation
+
+Often, what's needed is to sort on the values of the @emph{indices}
+instead of the values of the elements.
+To do that, use the
+@code{asorti()} function. The interface is identical to that of
+@code{asort()}, except that the index values are used for sorting, and
+become the values of the result array:
+
+@example
+@{ source[$0] = some_func($0) @}
+
+END @{
+ n = asorti(source, dest)
+ for (i = 1; i <= n; i++) @{
+ @ii{Work with sorted indices directly:}
+ @var{do something with} dest[i]
+ @dots{}
+ @ii{Access original array via sorted indices:}
+ @var{do something with} source[dest[i]]
+ @}
+@}
+@end example
+
+Similar to @code{asort()},
+in all cases, the sorted element values consist of the original
+array's indices. The ability to control comparison merely
+affects the way in which they are sorted.
+
+Sorting the array by replacing the indices provides maximal flexibility.
+To traverse the elements in decreasing order, use a loop that goes from
+@var{n} down to 1, either over the elements or over the indices.@footnote{You
+may also use one of the predefined sorting names that sorts in
+decreasing order.}
+
+@cindex reference counting, sorting arrays
+Copying array indices and elements isn't expensive in terms of memory.
+Internally, @command{gawk} maintains @dfn{reference counts} to data.
+For example, when @code{asort()} copies the first array to the second one,
+there is only one copy of the original array elements' data, even though
+both arrays use the values.
+
+@c Document It And Call It A Feature. Sigh.
+@cindex @command{gawk}, @code{IGNORECASE} variable in
+@cindex @code{IGNORECASE} variable
+@cindex arrays, sorting, @code{IGNORECASE} variable and
+@cindex @code{IGNORECASE} variable, array sorting and
+Because @code{IGNORECASE} affects string comparisons, the value
+of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
+Note also that the locale's sorting order does @emph{not}
+come into play; comparisons are based on character values only.@footnote{This
+is true because locale-based comparison occurs only when in POSIX
+compatibility mode, and since @code{asort()} and @code{asorti()} are
+@command{gawk} extensions, they are not available in that case.}
+Caveat Emptor.
+
+@node Two-way I/O
+@section Two-Way Communications with Another Process
+@cindex Brennan, Michael
+@cindex programmers, attractiveness of
+@smallexample
+@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
+From: brennan@@whidbey.com (Mike Brennan)
+Newsgroups: comp.lang.awk
+Subject: Re: Learn the SECRET to Attract Women Easily
+Date: 4 Aug 1997 17:34:46 GMT
+@c Organization: WhidbeyNet
+@c Lines: 12
+Message-ID: <5s53rm$eca@@news.whidbey.com>
+@c References: <5s20dn$2e1@chronicle.concentric.net>
+@c Reply-To: brennan@whidbey.com
+@c NNTP-Posting-Host: asn202.whidbey.com
+@c X-Newsreader: slrn (0.9.4.1 UNIX)
+@c Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
+
+On 3 Aug 1997 13:17:43 GMT, Want More Dates???
+<tracy78@@kilgrona.com> wrote:
+>Learn the SECRET to Attract Women Easily
+>
+>The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
+
+The scent of awk programmers is a lot more attractive to women than
+the scent of perl programmers.
+--
+Mike Brennan
+@c brennan@@whidbey.com
+@end smallexample
+
+@cindex advanced features, @command{gawk}, processes@comma{} communicating with
+@cindex processes, two-way communications with
+It is often useful to be able to
+send data to a separate program for
+processing and then read the result. This can always be
+done with temporary files:
+
+@example
+# Write the data for processing
+tempfile = ("mydata." PROCINFO["pid"])
+while (@var{not done with data})
+ print @var{data} | ("subprogram > " tempfile)
+close("subprogram > " tempfile)
+
+# Read the results, remove tempfile when done
+while ((getline newdata < tempfile) > 0)
+ @var{process} newdata @var{appropriately}
+close(tempfile)
+system("rm " tempfile)
+@end example
+
+@noindent
+This works, but not elegantly. Among other things, it requires that
+the program be run in a directory that cannot be shared among users;
+for example, @file{/tmp} will not do, as another user might happen
+to be using a temporary file with the same name.
+
+@cindex coprocesses
+@cindex input/output, two-way
+@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
+@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
+@cindex @command{csh} utility, @code{|&} operator, comparison with
+However, with @command{gawk}, it is possible to
+open a @emph{two-way} pipe to another process. The second process is
+termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}.
+The two-way connection is created using the @samp{|&} operator
+(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
+different from the same operator in the C shell.}
+
+@example
+do @{
+ print @var{data} |& "subprogram"
+ "subprogram" |& getline results
+@} while (@var{data left to process})
+close("subprogram")
+@end example
+
+The first time an I/O operation is executed using the @samp{|&}
+operator, @command{gawk} creates a two-way pipeline to a child process
+that runs the other program. Output created with @code{print}
+or @code{printf} is written to the program's standard input, and
+output from the program's standard output can be read by the @command{gawk}
+program using @code{getline}.
+As is the case with processes started by @samp{|}, the subprogram
+can be any program, or pipeline of programs, that can be started by
+the shell.
+
+There are some cautionary items to be aware of:
+
+@itemize @bullet
+@item
+As the code inside @command{gawk} currently stands, the coprocess's
+standard error goes to the same place that the parent @command{gawk}'s
+standard error goes. It is not possible to read the child's
+standard error separately.
+
+@cindex deadlocks
+@cindex buffering, input/output
+@cindex @code{getline} command, deadlock and
+@item
+I/O buffering may be a problem. @command{gawk} automatically
+flushes all output down the pipe to the coprocess.
+However, if the coprocess does not flush its output,
+@command{gawk} may hang when doing a @code{getline} in order to read
+the coprocess's results. This could lead to a situation
+known as @dfn{deadlock}, where each process is waiting for the
+other one to do something.
+@end itemize
+
+@cindex @code{close()} function, two-way pipes and
+It is possible to close just one end of the two-way pipe to
+a coprocess, by supplying a second argument to the @code{close()}
+function of either @code{"to"} or @code{"from"}
+(@pxref{Close Files And Pipes}).
+These strings tell @command{gawk} to close the end of the pipe
+that sends data to the coprocess or the end that reads from it,
+respectively.
+
+@cindex @command{sort} utility, coprocesses and
+This is particularly necessary in order to use
+the system @command{sort} utility as part of a coprocess;
+@command{sort} must read @emph{all} of its input
+data before it can produce any output.
+The @command{sort} program does not receive an end-of-file indication
+until @command{gawk} closes the write end of the pipe.
+
+When you have finished writing data to the @command{sort}
+utility, you can close the @code{"to"} end of the pipe, and
+then start reading sorted data via @code{getline}.
+For example:
+
+@example
+BEGIN @{
+ command = "LC_ALL=C sort"
+ n = split("abcdefghijklmnopqrstuvwxyz", a, "")
+
+ for (i = n; i > 0; i--)
+ print a[i] |& command
+ close(command, "to")
+
+ while ((command |& getline line) > 0)
+ print "got", line
+ close(command)
+@}
+@end example
+
+This program writes the letters of the alphabet in reverse order, one
+per line, down the two-way pipe to @command{sort}. It then closes the
+write end of the pipe, so that @command{sort} receives an end-of-file
+indication. This causes @command{sort} to sort the data and write the
+sorted data back to the @command{gawk} program. Once all of the data
+has been read, @command{gawk} terminates the coprocess and exits.
+
+As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
+command ensures traditional Unix (ASCII) sorting from @command{sort}.
+
+@cindex @command{gawk}, @code{PROCINFO} array in
+@cindex @code{PROCINFO} array
+You may also use pseudo-ttys (ptys) for
+two-way communication instead of pipes, if your system supports them.
+This is done on a per-command basis, by setting a special element
+in the @code{PROCINFO} array
+(@pxref{Auto-set}),
+like so:
+
+@example
+command = "sort -nr" # command, save in convenience variable
+PROCINFO[command, "pty"] = 1 # update PROCINFO
+print @dots{} |& command # start two-way pipe
+@dots{}
+@end example
+
+@noindent
+Using ptys avoids the buffer deadlock issues described earlier, at some
+loss in performance. If your system does not have ptys, or if all the
+system's ptys are in use, @command{gawk} automatically falls back to
+using regular pipes.
+
+@node TCP/IP Networking
+@section Using @command{gawk} for Network Programming
+@cindex advanced features, @command{gawk}, network programming
+@cindex networks, programming
+@c STARTOFRANGE tcpip
+@cindex TCP/IP
+@cindex @code{/inet/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet/@dots{}} (@command{gawk})
+@cindex @code{/inet4/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet4/@dots{}} (@command{gawk})
+@cindex @code{/inet6/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet6/@dots{}} (@command{gawk})
+@cindex @code{EMISTERED}
+@quotation
+@code{EMISTERED}:@*
+@ @ @ @ @i{A host is a host from coast to coast,@*
+@ @ @ @ and no-one can talk to host that's close,@*
+@ @ @ @ unless the host that isn't close@*
+@ @ @ @ is busy hung or dead.}
+@end quotation
+
+In addition to being able to open a two-way pipeline to a coprocess
+on the same system
+(@pxref{Two-way I/O}),
+it is possible to make a two-way connection to
+another process on another system across an IP network connection.
+
+You can think of this as just a @emph{very long} two-way pipeline to
+a coprocess.
+The way @command{gawk} decides that you want to use TCP/IP networking is
+by recognizing special @value{FN}s that begin with one of @samp{/inet/},
+@samp{/inet4/} or @samp{/inet6}.
+
+The full syntax of the special @value{FN} is
+@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
+The components are:
+
+@table @var
+@item net-type
+Specifies the kind of Internet connection to make.
+Use @samp{/inet4/} to force IPv4, and
+@samp{/inet6/} to force IPv6.
+Plain @samp{/inet/} (which used to be the only option) uses
+the system default, most likely IPv4.
+
+@item protocol
+The protocol to use over IP. This must be either @samp{tcp}, or
+@samp{udp}, for a TCP or UDP IP connection,
+respectively. The use of TCP is recommended for most applications.
+
+@item local-port
+@cindex @code{getaddrinfo()} function (C library)
+The local TCP or UDP port number to use. Use a port number of @samp{0}
+when you want the system to pick a port. This is what you should do
+when writing a TCP or UDP client.
+You may also use a well-known service name, such as @samp{smtp}
+or @samp{http}, in which case @command{gawk} attempts to determine
+the predefined port number using the C @code{getaddrinfo()} function.
+
+@item remote-host
+The IP address or fully-qualified domain name of the Internet
+host to which you want to connect.
+
+@item remote-port
+The TCP or UDP port number to use on the given @var{remote-host}.
+Again, use @samp{0} if you don't care, or else a well-known
+service name.
+@end table
+
+@cindex @command{gawk}, @code{ERRNO} variable in
+@cindex @code{ERRNO} variable
+@quotation NOTE
+Failure in opening a two-way socket will result in a non-fatal error
+being returned to the calling code. The value of @code{ERRNO} indicates
+the error (@pxref{Auto-set}).
+@end quotation
+
+Consider the following very simple example:
+
+@example
+BEGIN @{
+ Service = "/inet/tcp/0/localhost/daytime"
+ Service |& getline
+ print $0
+ close(Service)
+@}
+@end example
+
+This program reads the current date and time from the local system's
+TCP @samp{daytime} server.
+It then prints the results and closes the connection.
+
+Because this topic is extensive, the use of @command{gawk} for
+TCP/IP programming is documented separately.
+@ifinfo
+See
+@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}},
+@end ifinfo
+@ifnotinfo
+See @cite{TCP/IP Internetworking with @command{gawk}},
+which comes as part of the @command{gawk} distribution,
+@end ifnotinfo
+for a much more complete introduction and discussion, as well as
+extensive examples.
+
+@c ENDOFRANGE tcpip
+
+@node Profiling
+@section Profiling Your @command{awk} Programs
+@c STARTOFRANGE awkp
+@cindex @command{awk} programs, profiling
+@c STARTOFRANGE proawk
+@cindex profiling @command{awk} programs
+@cindex profiling @command{gawk}
+@cindex @code{awkprof.out} file
+@cindex files, @code{awkprof.out}
+
+You may produce execution traces of your @command{awk} programs.
+This is done by passing the option @option{--profile} to @command{gawk}.
+When @command{gawk} has finished running, it creates a profile of your program in a file
+named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
+@command{gawk} normally does.
+
+@cindex @code{--profile} option
+As shown in the following example,
+the @option{--profile} option can be used to change the name of the file
+where @command{gawk} will write the profile:
+
+@example
+gawk --profile=myprog.prof -f myprog.awk data1 data2
+@end example
+
+@noindent
+In the above example, @command{gawk} places the profile in
+@file{myprog.prof} instead of in @file{awkprof.out}.
+
+Here is a sample session showing a simple @command{awk} program, its input data, and the
+results from running @command{gawk} with the @option{--profile} option.
+First, the @command{awk} program:
+
+@example
+BEGIN @{ print "First BEGIN rule" @}
+
+END @{ print "First END rule" @}
+
+/foo/ @{
+ print "matched /foo/, gosh"
+ for (i = 1; i <= 3; i++)
+ sing()
+@}
+
+@{
+ if (/foo/)
+ print "if is true"
+ else
+ print "else is true"
+@}
+
+BEGIN @{ print "Second BEGIN rule" @}
+
+END @{ print "Second END rule" @}
+
+function sing( dummy)
+@{
+ print "I gotta be me!"
+@}
+@end example
+
+Following is the input data:
+
+@example
+foo
+bar
+baz
+foo
+junk
+@end example
+
+Here is the @file{awkprof.out} that results from running the @command{gawk}
+profiler on this program and data (this example also illustrates that @command{awk}
+programmers sometimes have to work late):
+
+@cindex @code{BEGIN} pattern
+@cindex @code{END} pattern
+@example
+ # gawk profile, created Sun Aug 13 00:00:15 2000
+
+ # BEGIN block(s)
+
+ BEGIN @{
+ 1 print "First BEGIN rule"
+ 1 print "Second BEGIN rule"
+ @}
+
+ # Rule(s)
+
+ 5 /foo/ @{ # 2
+ 2 print "matched /foo/, gosh"
+ 6 for (i = 1; i <= 3; i++) @{
+ 6 sing()
+ @}
+ @}
+
+ 5 @{
+ 5 if (/foo/) @{ # 2
+ 2 print "if is true"
+ 3 @} else @{
+ 3 print "else is true"
+ @}
+ @}
+
+ # END block(s)
+
+ END @{
+ 1 print "First END rule"
+ 1 print "Second END rule"
+ @}
+
+ # Functions, listed alphabetically
+
+ 6 function sing(dummy)
+ @{
+ 6 print "I gotta be me!"
+ @}
+@end example
+
+This example illustrates many of the basic features of profiling output.
+They are as follows:
+
+@itemize @bullet
+@item
+The program is printed in the order @code{BEGIN} rule,
+@code{BEGINFILE} rule,
+pattern/action rules,
+@code{ENDFILE} rule, @code{END} rule and functions, listed
+alphabetically.
+Multiple @code{BEGIN} and @code{END} rules are merged together,
+as are multiple @code{BEGINFILE} and @code{ENDFILE} rules.
+
+@cindex patterns, counts
+@item
+Pattern-action rules have two counts.
+The first count, to the left of the rule, shows how many times
+the rule's pattern was @emph{tested}.
+The second count, to the right of the rule's opening left brace
+in a comment,
+shows how many times the rule's action was @emph{executed}.
+The difference between the two indicates how many times the rule's
+pattern evaluated to false.
+
+@item
+Similarly,
+the count for an @code{if}-@code{else} statement shows how many times
+the condition was tested.
+To the right of the opening left brace for the @code{if}'s body
+is a count showing how many times the condition was true.
+The count for the @code{else}
+indicates how many times the test failed.
+
+@cindex loops, count for header
+@item
+The count for a loop header (such as @code{for}
+or @code{while}) shows how many times the loop test was executed.
+(Because of this, you can't just look at the count on the first
+statement in a rule to determine how many times the rule was executed.
+If the first statement is a loop, the count is misleading.)
+
+@cindex functions, user-defined, counts
+@cindex user-defined, functions, counts
+@item
+For user-defined functions, the count next to the @code{function}
+keyword indicates how many times the function was called.
+The counts next to the statements in the body show how many times
+those statements were executed.
+
+@cindex @code{@{@}} (braces)
+@cindex braces (@code{@{@}})
+@item
+The layout uses ``K&R'' style with TABs.
+Braces are used everywhere, even when
+the body of an @code{if}, @code{else}, or loop is only a single statement.
+
+@cindex @code{()} (parentheses)
+@cindex parentheses @code{()}
+@item
+Parentheses are used only where needed, as indicated by the structure
+of the program and the precedence rules.
+@c extra verbiage here satisfies the copyeditor. ugh.
+For example, @samp{(3 + 5) * 4} means add three plus five, then multiply
+the total by four. However, @samp{3 + 5 * 4} has no parentheses, and
+means @samp{3 + (5 * 4)}.
+
+@ignore
+@item
+All string concatenations are parenthesized too.
+(This could be made a bit smarter.)
+@end ignore
+
+@item
+Parentheses are used around the arguments to @code{print}
+and @code{printf} only when
+the @code{print} or @code{printf} statement is followed by a redirection.
+Similarly, if
+the target of a redirection isn't a scalar, it gets parenthesized.
+
+@item
+@command{gawk} supplies leading comments in
+front of the @code{BEGIN} and @code{END} rules,
+the pattern/action rules, and the functions.
+
+@end itemize
+
+The profiled version of your program may not look exactly like what you
+typed when you wrote it. This is because @command{gawk} creates the
+profiled version by ``pretty printing'' its internal representation of
+the program. The advantage to this is that @command{gawk} can produce
+a standard representation. The disadvantage is that all source-code
+comments are lost, as are the distinctions among multiple @code{BEGIN},
+@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as:
+
+@example
+/foo/
+@end example
+
+@noindent
+come out as:
+
+@example
+/foo/ @{
+ print $0
+@}
+@end example
+
+@noindent
+which is correct, but possibly surprising.
+
+@cindex profiling @command{awk} programs, dynamically
+@cindex @command{gawk} program, dynamic profiling
+Besides creating profiles when a program has completed,
+@command{gawk} can produce a profile while it is running.
+This is useful if your @command{awk} program goes into an
+infinite loop and you want to see what has been executed.
+To use this feature, run @command{gawk} with the @option{--profile}
+option in the background:
+
+@example
+$ @kbd{gawk --profile -f myprog &}
+[1] 13992
+@end example
+
+@cindex @command{kill} command@comma{} dynamic profiling
+@cindex @code{USR1} signal
+@cindex @code{SIGUSR1} signal
+@cindex signals, @code{USR1}/@code{SIGUSR1}
+@noindent
+The shell prints a job number and process ID number; in this case, 13992.
+Use the @command{kill} command to send the @code{USR1} signal
+to @command{gawk}:
+
+@example
+$ @kbd{kill -USR1 13992}
+@end example
+
+@noindent
+As usual, the profiled version of the program is written to
+@file{awkprof.out}, or to a different file if one specified with
+the @option{--profile} option.
+
+Along with the regular profile, as shown earlier, the profile
+includes a trace of any active functions:
+
+@example
+# Function Call Stack:
+
+# 3. baz
+# 2. bar
+# 1. foo
+# -- main --
+@end example
+
+You may send @command{gawk} the @code{USR1} signal as many times as you like.
+Each time, the profile and function call trace are appended to the output
+profile file.
+
+@cindex @code{HUP} signal
+@cindex @code{SIGHUP} signal
+@cindex signals, @code{HUP}/@code{SIGHUP}
+If you use the @code{HUP} signal instead of the @code{USR1} signal,
+@command{gawk} produces the profile and the function call trace and then exits.
+
+@cindex @code{INT} signal (MS-Windows)
+@cindex @code{SIGINT} signal (MS-Windows)
+@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows)
+@cindex @code{QUIT} signal (MS-Windows)
+@cindex @code{SIGQUIT} signal (MS-Windows)
+@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
+When @command{gawk} runs on MS-Windows systems, it uses the
+@code{INT} and @code{QUIT} signals for producing the profile and, in
+the case of the @code{INT} signal, @command{gawk} exits. This is
+because these systems don't support the @command{kill} command, so the
+only signals you can deliver to a program are those generated by the
+keyboard. The @code{INT} signal is generated by the
+@kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the
+@code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key.
+
+Finally, @command{gawk} also accepts another option @option{--pretty-print}.
+When called this way, @command{gawk} ``pretty prints'' the program into
+@file{awkprof.out}, without any execution counts.
+@c ENDOFRANGE advgaw
+@c ENDOFRANGE gawadv
+@c ENDOFRANGE awkp
+@c ENDOFRANGE proawk
+
@c The original text for this chapter was contributed by Efraim Yawitz.
@c FIXME: Add more indexing.
@node Debugger
-@chapter @command{dgawk}: The @command{awk} Debugger
-@cindex @command{dgawk}
+@chapter Debugging @command{awk} Programs
+@cindex debugging @command{awk} programs
It would be nice if computer programs worked perfectly the first time they
were run, but in real life, this rarely happens for programs of
any complexity. Thus, most programming languages have facilities available
for ``debugging'' programs, and now @command{awk} is no exception.
-The @command{dgawk} debugger is purposely modeled after
+The @command{gawk} debugger is purposely modeled after
@uref{http://www.gnu.org/software/gdb/, the GNU Debugger (GDB)}
command-line debugger. If you are familiar with GDB, learning
-@command{dgawk} is easy.
+how to use @command{gawk} for debugging your program is easy.
@menu
-* Debugging:: Introduction to @command{dgawk}.
-* Sample dgawk session:: Sample @command{dgawk} session.
-* List of Debugger Commands:: Main @command{dgawk} Commands.
-* Readline Support:: Readline Support.
-* Dgawk Limitations:: Limitations and future plans.
+* Debugging:: Introduction to @command{gawk} debugger.
+* Sample Debugging Session:: Sample debugging session.
+* List of Debugger Commands:: Main debugger commands.
+* Readline Support:: Readline support.
+* Limitations:: Limitations and future plans.
@end menu
@node Debugging
-@section Introduction to @command{dgawk}
+@section Introduction to @command{gawk} Debugger
This @value{SECTION} introduces debugging in general and begins
the discussion of debugging in @command{gawk}.
@menu
-* Debugging Concepts:: Debugging In General.
+* Debugging Concepts:: Debugging in General.
* Debugging Terms:: Additional Debugging Concepts.
* Awk Debugging:: Awk Debugging.
@end menu
@node Debugging Concepts
-@subsection Debugging In General
+@subsection Debugging in General
(If you have used debuggers in other languages, you may want to skip
ahead to the next section on the specific features of the @command{awk}
@@ -25307,8 +25867,7 @@ functional program that you or someone else wrote).
@subsection Additional Debugging Concepts
Before diving in to the details, we need to introduce several
-important concepts that apply to just about all debuggers, including
-@command{dgawk}.
+important concepts that apply to just about all debuggers.
The following list defines terms used throughout the rest of
this @value{CHAPTER}.
@@ -25327,7 +25886,7 @@ that contains the function's parameters, local variables, and return value,
as well as any other ``bookkeeping'' information needed to manage the
call stack. This data area is termed a @dfn{stack frame}.
-@command{gawk} also follows this model, and @command{dgawk} gives you
+@command{gawk} also follows this model, and gives you
access to the call stack and to each stack frame. You can see the
call stack, as well as from where each function on the stack was
invoked. Commands that print the call stack print information about
@@ -25372,48 +25931,48 @@ each line of @command{awk} code. The debugger provides the opportunity
to look at the individual primitive instructions carried out
by the higher-level @command{awk} commands.
-@node Sample dgawk session
-@section Sample @command{dgawk} session
+@node Sample Debugging Session
+@section Sample Debugging Session
-In order to illustrate the use of @command{dgawk}, let's look at a sample
+In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample
debugging session. We will use the @command{awk} implementation of the
POSIX @command{uniq} command described earlier (@pxref{Uniq Program})
as our example.
@menu
-* dgawk invocation:: @command{dgawk} Invocation.
-* Finding The Bug:: Finding The Bug.
+* Debugger Invocation:: How to Start the Debugger.
+* Finding The Bug:: Finding the Bug.
@end menu
-@node dgawk invocation
-@subsection @command{dgawk} Invocation
+@node Debugger Invocation
+@subsection How to Start the Debugger
-Starting @command{dgawk} is exactly like running @command{awk}. The
-file(s) containing the program and any supporting code are given on the
-command line as arguments to one or more @option{-f} options.
-(@command{dgawk} is not designed to debug command-line
-programs, only programs contained in files.) In our case,
-we call @command{dgawk} like this:
+Starting the debugger is almost exactly like running @command{awk}, except you have to
+pass an additional option @option{--debug} or the corresponding short option @option{-D}.
+The file(s) containing the program and any supporting code are given on the command
+line as arguments to one or more @option{-f} options. (@command{gawk} is not designed
+to debug command-line programs, only programs contained in files.) In our case,
+we invoke the debugger like this:
@example
-$ @kbd{dgawk -f getopt.awk -f join.awk -f uniq.awk inputfile}
+$ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile}
@end example
@noindent
where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}.
(Experienced users of GDB or similar debuggers should note that
this syntax is slightly different from what they are used to.
-With @command{dgawk}, the arguments for running the program are given
+With @command{gawk} debugger, the arguments for running the program are given
in the command line to the debugger rather than as part of the @code{run}
command at the debugger prompt.)
Instead of immediately running the program on @file{inputfile}, as
-@command{gawk} would ordinarily do, @command{dgawk} merely loads all
+@command{gawk} would ordinarily do, the debugger merely loads all
the program source files, compiles them internally, and then gives
us a prompt:
@example
-dgawk>
+gawk>
@end example
@noindent
@@ -25421,7 +25980,7 @@ from which we can issue commands to the debugger. At this point, no
code has been executed.
@node Finding The Bug
-@subsection Finding The Bug
+@subsection Finding the Bug
Let's say that we are having a problem using (a faulty version of)
@file{uniq.awk} in the ``field-skipping'' mode, and it doesn't seem to be
@@ -25457,7 +26016,7 @@ a breakpoint in @file{uniq.awk} is at the beginning of the function
the breakpoint, use the @code{b} (breakpoint) command:
@example
-dgawk> @kbd{b are_equal}
+gawk> @kbd{b are_equal}
@print{} Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 64
@end example
@@ -25466,22 +26025,22 @@ Now type @samp{r} or @samp{run} and the program runs until it hits
the breakpoint for the first time:
@example
-dgawk> @kbd{r}
+gawk> @kbd{r}
@print{} Starting program:
@print{} Stopping in Rule ...
@print{} Breakpoint 1, are_equal(n, m, clast, cline, alast, aline)
at `awklib/eg/prog/uniq.awk':64
@print{} 64 if (fcount == 0 && charcount == 0)
-dgawk>
+gawk>
@end example
Now we can look at what's going on inside our program. First of all,
let's see how we got to where we are. At the prompt, we type @samp{bt}
-(short for ``backtrace''), and @command{dgawk} responds with a
+(short for ``backtrace''), and the debugger responds with a
listing of the current stack frames:
@example
-dgawk> @kbd{bt}
+gawk> @kbd{bt}
@print{} #0 are_equal(n, m, clast, cline, alast, aline)
at `awklib/eg/prog/uniq.awk':69
@print{} #1 in main() at `awklib/eg/prog/uniq.awk':89
@@ -25496,11 +26055,11 @@ the key to finding the source of the problem.)
Now that we're in @code{are_equal()}, we can start looking at the values
of some variables. Let's say we type @samp{p n}
(@code{p} is short for ``print''). We would expect to see the value of
-@code{n}, a parameter to @code{are_equal()}. Actually, @command{dgawk}
+@code{n}, a parameter to @code{are_equal()}. Actually, the debugger
gives us:
@example
-dgawk> @kbd{p n}
+gawk> @kbd{p n}
@print{} n = untyped variable
@end example
@@ -25511,7 +26070,7 @@ function was called without arguments (@pxref{Function Calls}).
A more useful variable to display might be the current record:
@example
-dgawk> @kbd{p $0}
+gawk> @kbd{p $0}
@print{} $0 = string ("gawk is a wonderful program!")
@end example
@@ -25520,7 +26079,7 @@ This might be a bit puzzling at first since this is the second line of
our test input above. Let's look at @code{NR}:
@example
-dgawk> @kbd{p NR}
+gawk> @kbd{p NR}
@print{} NR = number (2)
@end example
@@ -25539,7 +26098,7 @@ NR == 1 @{
OK, let's just check that that rule worked correctly:
@example
-dgawk> @kbd{p last}
+gawk> @kbd{p last}
@print{} last = string ("awk is a wonderful program!")
@end example
@@ -25550,7 +26109,7 @@ be inside this function. To investigate further, we must begin
@samp{n} (for ``next''):
@example
-dgawk> @kbd{n}
+gawk> @kbd{n}
@print{} 67 if (fcount > 0) @{
@end example
@@ -25570,9 +26129,9 @@ Continuing to step, we now get to the splitting of the current and
last records:
@example
-dgawk> @kbd{n}
+gawk> @kbd{n}
@print{} 68 n = split(last, alast)
-dgawk> @kbd{n}
+gawk> @kbd{n}
@print{} 69 m = split($0, aline)
@end example
@@ -25580,7 +26139,7 @@ At this point, we should be curious to see what our records were split
into, so we try to look:
@example
-dgawk> @kbd{p n m alast aline}
+gawk> @kbd{p n m alast aline}
@print{} n = number (5)
@print{} m = number (5)
@print{} alast = array, 5 elements
@@ -25599,7 +26158,7 @@ inside the array?
The first choice would be to use subscripts:
@example
-dgawk> @kbd{p alast[0]}
+gawk> @kbd{p alast[0]}
@print{} "0" not in array `alast'
@end example
@@ -25607,16 +26166,16 @@ dgawk> @kbd{p alast[0]}
Oops!
@example
-dgawk> @kbd{p alast[1]}
+gawk> @kbd{p alast[1]}
@print{} alast["1"] = string ("awk")
@end example
This would be kind of slow for a 100-member array, though, so
-@command{dgawk} provides a shortcut (reminiscent of another language
+@command{gawk} provides a shortcut (reminiscent of another language
not to be mentioned):
@example
-dgawk> @kbd{p @@alast}
+gawk> @kbd{p @@alast}
@print{} alast["1"] = string ("awk")
@print{} alast["2"] = string ("is")
@print{} alast["3"] = string ("a")
@@ -25628,9 +26187,9 @@ It looks like we got this far OK. Let's take another step
or two:
@example
-dgawk> @kbd{n}
+gawk> @kbd{n}
@print{} 70 clast = join(alast, fcount, n)
-dgawk> @kbd{n}
+gawk> @kbd{n}
@print{} 71 cline = join(aline, fcount, m)
@end example
@@ -25640,7 +26199,7 @@ the virtual record to compare, and if the first field was numbered zero,
this would work. Let's look at what we've got:
@example
-dgawk> @kbd{p cline clast}
+gawk> @kbd{p cline clast}
@print{} cline = string ("gawk is a wonderful program!")
@print{} clast = string ("awk is a wonderful program!")
@end example
@@ -25649,10 +26208,10 @@ Hey, those look pretty familiar! They're just our original, unaltered,
input records. A little thinking (the human brain is still the best
debugging tool), and we realize that we were off by one!
-We get out of @command{dgawk}:
+We get out of the debugger:
@example
-dgawk> @kbd{q}
+gawk> @kbd{q}
@print{} The program is running. Exit anyway (y/n)? @kbd{y}
@end example
@@ -25668,9 +26227,9 @@ cline = join(aline, fcount+1, m)
and problem solved!
@node List of Debugger Commands
-@section Main @command{dgawk} Commands
+@section Main Debugger Commands
-The @command{dgawk} command set can be divided into the
+The @command{gawk} debugger command set can be divided into the
following categories:
@itemize @bullet{}
@@ -25697,24 +26256,24 @@ Miscellaneous
Each of these are discussed in the following subsections.
In the following descriptions, commands which may be abbreviated
show the abbreviation on a second description line.
-A @command{dgawk} command name may also be truncated if that partial
-name is unambiguous. @command{dgawk} has the built-in capability to
+A debugger command name may also be truncated if that partial
+name is unambiguous. The debugger has the built-in capability to
automatically repeat the previous command when just hitting @key{Enter}.
This works for the commands @code{list}, @code{next}, @code{nexti}, @code{step}, @code{stepi}
and @code{continue} executed without any argument.
@menu
-* Breakpoint Control:: Control of breakpoints.
-* Dgawk Execution Control:: Control of execution.
-* Viewing And Changing Data:: Viewing and changing data.
-* Dgawk Stack:: Dealing with the stack.
-* Dgawk Info:: Obtaining information about the program and
- the debugger state.
-* Miscellaneous Dgawk Commands:: Miscellaneous Commands.
+* Breakpoint Control:: Control of Breakpoints.
+* Debugger Execution Control:: Control of Execution.
+* Viewing And Changing Data:: Viewing and Changing Data.
+* Execution Stack:: Dealing with the Stack.
+* Debugger Info:: Obtaining Information about the Program and
+ the Debugger State.
+* Miscellaneous Debugger Commands:: Miscellaneous Commands.
@end menu
@node Breakpoint Control
-@subsection Control Of Breakpoints
+@subsection Control of Breakpoints
As we saw above, the first thing you probably want to do in a debugging
session is to get your breakpoints set up, since otherwise your program
@@ -25749,10 +26308,10 @@ Each breakpoint is assigned a number which can be used to delete it from
the breakpoint list using the @code{delete} command.
With a breakpoint, you may also supply a condition. This is an
-@command{awk} expression (enclosed in double quotes) that @command{dgawk}
+@command{awk} expression (enclosed in double quotes) that the debugger
evaluates whenever the breakpoint is reached. If the condition is true,
-then @command{dgawk} stops execution and prompts for a command. Otherwise,
-@command{dgawk} continues executing the program.
+then the debugger stops execution and prompts for a command. Otherwise,
+it continues executing the program.
@cindex debugger commands, @code{clear}
@cindex @code{clear} debugger command
@@ -25778,10 +26337,10 @@ Delete breakpoint(s) set at entry to function @var{function}.
@cindex @code{condition} debugger command
@item @code{condition} @var{n} @code{"@var{expression}"}
Add a condition to existing breakpoint or watchpoint @var{n}. The
-condition is an @command{awk} expression that @command{dgawk} evaluates
+condition is an @command{awk} expression that the debugger evaluates
whenever the breakpoint or watchpoint is reached. If the condition is true, then
-@command{dgawk} stops execution and prompts for a command. Otherwise,
-@command{dgawk} continues executing the program. If the condition expression is
+the debugger stops execution and prompts for a command. Otherwise,
+the debugger continues executing the program. If the condition expression is
not specified, any existing condition is removed; i.e., the breakpoint or
watchpoint is made unconditional.
@@ -25837,7 +26396,7 @@ Set a temporary breakpoint (enabled for only one stop).
The arguments are the same as for @code{break}.
@end table
-@node Dgawk Execution Control
+@node Debugger Execution Control
@subsection Control of Execution
Now that your breakpoints are ready, you can start running the program
@@ -25866,14 +26425,14 @@ in the list that resumes execution (e.g., @code{continue}) terminates the list
For example:
@example
-dgawk> @kbd{commands}
+gawk> @kbd{commands}
> @kbd{silent}
> @kbd{printf "A silent breakpoint; i = %d\n", i}
> @kbd{info locals}
> @kbd{set i = 10}
> @kbd{continue}
> @kbd{end}
-dgawk>
+gawk>
@end example
@cindex debugger commands, @code{c} (@code{continue})
@@ -25923,7 +26482,7 @@ and the caller of that frame becomes the innermost frame.
@cindex @code{r} debugger command (alias for @code{run})
@item @code{run}
@itemx @code{r}
-Start/restart execution of the program. When restarting, @command{dgawk}
+Start/restart execution of the program. When restarting, the debugger
retains the current breakpoints, watchpoints, command history,
automatic display variables, and debugger options.
@@ -25946,7 +26505,7 @@ stopping, unless it encounters a breakpoint or watchpoint.
@itemx @code{si} [@var{count}]
Execute one (or @var{count}) instruction(s), stepping inside function calls.
(For illustration of what is meant by an ``instruction'' in @command{gawk},
-see the output shown under @code{dump} in @ref{Miscellaneous Dgawk Commands}.)
+see the output shown under @code{dump} in @ref{Miscellaneous Debugger Commands}.)
@cindex debugger commands, @code{u} (@code{until})
@cindex debugger commands, @code{until}
@@ -25974,7 +26533,7 @@ The value of the variable or field is displayed each time the program stops.
Each variable added to the list is identified by a unique number:
@example
-dgawk> @kbd{display x}
+gawk> @kbd{display x}
@print{} 10: x = 1
@end example
@@ -26011,7 +26570,7 @@ Print the value of a @command{gawk} variable or field.
Fields must be referenced by constants:
@example
-dgawk> @kbd{print $3}
+gawk> @kbd{print $3}
@end example
@noindent
@@ -26053,16 +26612,16 @@ You can also set special @command{awk} variables, such as @code{FS},
@item @code{watch} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}]
@itemx @code{w} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}]
Add variable @var{var} (or field @code{$@var{n}}) to the watch list.
-@command{dgawk} then stops whenever
+The debugger then stops whenever
the value of the variable or field changes. Each watched item is assigned a
number which can be used to delete it from the watch list using the
@code{unwatch} command.
With a watchpoint, you may also supply a condition. This is an
-@command{awk} expression (enclosed in double quotes) that @command{dgawk}
+@command{awk} expression (enclosed in double quotes) that the debugger
evaluates whenever the watchpoint is reached. If the condition is true,
-then @command{dgawk} stops execution and prompts for a command. Otherwise,
-@command{dgawk} continues executing the program.
+then the debugger stops execution and prompts for a command. Otherwise,
+@command{gawk} continues executing the program.
@cindex debugger commands, @code{undisplay}
@cindex @code{undisplay} debugger command
@@ -26078,8 +26637,8 @@ watch list.
@end table
-@node Dgawk Stack
-@subsection Dealing With The Stack
+@node Execution Stack
+@subsection Dealing with the Stack
Whenever you run a program which contains any function calls,
@command{gawk} maintains a stack of all of the function calls leading up
@@ -26123,12 +26682,12 @@ Move @var{count} (default 1) frames up the stack toward the outermost frame.
Then select and print the frame.
@end table
-@node Dgawk Info
-@subsection Obtaining Information About The Program and The Debugger State
+@node Debugger Info
+@subsection Obtaining Information about the Program and the Debugger State
Besides looking at the values of variables, there is often a need to get
other sorts of information about the state of your program and of the
-debugging environment itself. @command{dgawk} has one command which
+debugging environment itself. The @command{gawk} debugger has one command which
provides this information, appropriately called @code{info}. @code{info}
is used with one of a number of arguments that tell it exactly what
you want to know:
@@ -26166,7 +26725,7 @@ Local variables of the selected frame.
@item source
The name of the current source file. Each time the program stops, the
current source file is the file containing the current instruction.
-When @command{dgawk} first starts, the current source file is the first file
+When the debugger first starts, the current source file is the first file
included via the @option{-f} option. The
@samp{list @var{filename}:@var{lineno}} command can
be used at any time to change the current source.
@@ -26202,7 +26761,7 @@ The available options are:
@c nested table
@table @code
@item history_size
-The maximum number of lines to keep in the history file @file{./.dgawk_history}.
+The maximum number of lines to keep in the history file @file{./.gawk_history}.
The default is 100.
@item listsize
@@ -26214,14 +26773,14 @@ to standard output. An empty string (@code{""}) resets output to
standard output.
@item prompt
-The debugger prompt. The default is @samp{@w{dgawk> }}.
+The debugger prompt. The default is @samp{@w{gawk> }}.
@item save_history @r{[}on @r{|} off@r{]}
-Save command history to file @file{./.dgawk_history}.
+Save command history to file @file{./.gawk_history}.
The default is @code{on}.
@item save_options @r{[}on @r{|} off@r{]}
-Save current options to file @file{./.dgawkrc} upon exit.
+Save current options to file @file{./.gawkrc} upon exit.
The default is @code{on}.
Options are read back in to the next session upon startup.
@@ -26241,16 +26800,16 @@ Empty lines are ignored; they do @emph{not}
repeat the last command.
You can't restart the program by having more than one @code{run}
command in the file. Also, the list of commands may include additional
-@code{source} commands; however, @command{dgawk} will not source the
+@code{source} commands; however, the @command{gawk} debugger will not source the
same file more than once in order to avoid infinite recursion.
In addition to, or instead of the @code{source} command, you can use
-the @option{-R @var{file}} or @option{--command=@var{file}} command-line
+the @option{-D @var{file}} or @option{--debug=@var{file}} command-line
options to execute commands from a file non-interactively
(@pxref{Options}.
@end table
-@node Miscellaneous Dgawk Commands
+@node Miscellaneous Debugger Commands
@subsection Miscellaneous Commands
There are a few more commands which do not fit into the
@@ -26268,7 +26827,7 @@ partial dump of Davide Brini's obfuscated code
(@pxref{Signature Program}) demonstrates:
@smallexample
-dgawk> @kbd{dump}
+gawk> @kbd{dump}
@print{} # BEGIN
@print{}
@print{} [ 2:0x89faef4] Op_rule : [in_rule = BEGIN] [source_file = brini.awk]
@@ -26317,7 +26876,7 @@ dgawk> @kbd{dump}
@print{} [ :0x89fa3b0] Op_after_beginfile :
@print{} [ :0x89fa388] Op_no_op :
@print{} [ :0x89fa3c4] Op_after_endfile :
-dgawk>
+gawk>
@end smallexample
@cindex debugger commands, @code{h} (@code{help})
@@ -26326,7 +26885,7 @@ dgawk>
@cindex @code{h} debugger command (alias for @code{help})
@item @code{help}
@itemx @code{h}
-Print a list of all of the @command{dgawk} commands with a short
+Print a list of all of the @command{gawk} debugger commands with a short
summary of their usage. @samp{help @var{command}} prints the information
about the command @var{command}.
@@ -26373,7 +26932,7 @@ function @var{function}. This command may change the current source file.
Exit the debugger. Debugging is great fun, but sometimes we all have
to tend to other obligations in life, and sometimes we find the bug,
and are free to go on to the next one! As we saw above, if you are
-running a program, @command{dgawk} warns you if you accidentally type
+running a program, the debugger warns you if you accidentally type
@samp{q} or @samp{quit}, to make sure you really want to quit.
@cindex debugger commands, @code{trace}
@@ -26392,7 +26951,7 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while
@node Readline Support
@section Readline Support
-If @command{dgawk} is compiled with the @code{readline} library, you
+If @command{gawk} is compiled with the @code{readline} library, you
can take advantage of that library's command completion and history expansion
features. The following types of completion are available:
@@ -26424,28 +26983,28 @@ and
@end table
-@node Dgawk Limitations
+@node Limitations
@section Limitations and Future Plans
-We hope you find @command{dgawk} useful and enjoyable to work with,
+We hope you find the @command{gawk} debugger useful and enjoyable to work with,
but as with any program, especially in its early releases, it still has
some limitations. A few which are worth being aware of are:
@itemize @bullet{}
@item
-At this point, @command{dgawk} does not give a detailed explanation of
+At this point, the debugger does not give a detailed explanation of
what you did wrong when you type in something it doesn't like. Rather, it just
responds @samp{syntax error}. When you do figure out what your mistake was,
though, you'll feel like a real guru.
@item
-If you perused the dump of opcodes in @ref{Miscellaneous Dgawk Commands},
+If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands},
(or if you are already familiar with @command{gawk} internals),
you will realize that much of the internal manipulation of data
in @command{gawk}, as in many interpreters, is done on a stack.
@code{Op_push}, @code{Op_pop}, etc., are the ``bread and butter'' of
-most @command{gawk} code. Unfortunately, as of now, @command{dgawk}
-does not allow you to examine the stack's contents.
+most @command{gawk} code. Unfortunately, as of now, the @command{gawk}
+debugger does not allow you to examine the stack's contents.
That is, the intermediate results of expression evaluation are on the
stack, but cannot be printed. Rather, only variables which are defined
@@ -26460,28 +27019,4986 @@ programmer, you are expected to know what @code{/[^[:alnum:][:blank:]]/}
means.
@item
-@command{dgawk} is designed to be used by running a program (with all its
-parameters) on the command line, as described in @ref{dgawk invocation}.
+The @command{gawk} debugger is designed to be used by running a program (with all its
+parameters) on the command line, as described in @ref{Debugger Invocation}.
There is no way (as of now) to attach or ``break in'' to a running program.
This seems reasonable for a language which is used mainly for quickly
executing, short programs.
@item
-@command{dgawk} only accepts source supplied with the @option{-f} option.
+The @command{gawk} debugger only accepts source supplied with the @option{-f} option.
@end itemize
Look forward to a future release when these and other missing features may
be added, and of course feel free to try to add them yourself!
+@node Arbitrary Precision Arithmetic
+@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk}
+@cindex arbitrary precision
+@cindex multiple precision
+@cindex infinite precision
+@cindex floating-point numbers, arbitrary precision
+@cindex MPFR
+@cindex GMP
+
+@cindex Knuth, Donald
+@quotation
+@i{There's a credibility gap: We don't know how much of the computer's answers
+to believe. Novice computer users solve this problem by implicitly trusting
+in the computer as an infallible authority; they tend to believe that all
+digits of a printed answer are significant. Disillusioned computer users have
+just the opposite approach; they are constantly afraid that their answers
+are almost meaningless.}@*
+Donald Knuth@footnote{Donald E.@: Knuth.
+@cite{The Art of Computer Programming}. Volume 2,
+@cite{Seminumerical Algorithms}, third edition,
+1998, ISBN 0-201-89683-4, p.@: 229.}
+@end quotation
+
+This @value{CHAPTER} discusses issues that you may encounter
+when performing arithmetic. It begins by discussing some of
+the general attributes of computer arithmetic, along with how
+this can influence what you see when running @command{awk} programs.
+This discussion applies to all versions of @command{awk}.
+
+Then the @value{CHAPTER} moves on to @dfn{arbitrary precision
+arithmetic}, a feature which is specific to @command{gawk}.
+
+@menu
+* General Arithmetic:: An introduction to computer arithmetic.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Gawk and MPFR:: How @command{gawk} provides
+ arbitrary-precision arithmetic.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic
+ with @command{gawk}.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
+ @command{gawk}.
+@end menu
+
+@node General Arithmetic
+@section A General Description of Computer Arithmetic
+
+@cindex integers
+@cindex floating-point, numbers
+@cindex numbers, floating-point
+Within computers, there are two kinds of numeric values: @dfn{integers}
+and @dfn{floating-point}.
+In school, integer values were referred to as ``whole'' numbers---that is,
+numbers without any fractional part, such as 1, 42, or @minus{}17.
+The advantage to integer numbers is that they represent values exactly.
+The disadvantage is that their range is limited. On most systems,
+this range is @minus{}2,147,483,648 to 2,147,483,647.
+However, many systems now support a range from
+@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
+
+@cindex unsigned integers
+@cindex integers, unsigned
+Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}.
+Signed values may be negative or positive, with the range of values just
+described.
+Unsigned values are always positive. On most systems,
+the range is from 0 to 4,294,967,295.
+However, many systems now support a range from
+0 to 18,446,744,073,709,551,615.
+
+@cindex double precision floating-point
+@cindex single precision floating-point
+Floating-point numbers represent what are called ``real'' numbers; i.e.,
+those that do have a fractional part, such as 3.1415927.
+The advantage to floating-point numbers is that they
+can represent a much larger range of values.
+The disadvantage is that there are numbers that they cannot represent
+exactly.
+@command{awk} uses @dfn{double precision} floating-point numbers, which
+can hold more digits than @dfn{single precision}
+floating-point numbers.
+@c Floating-point issues are discussed more fully in
+@c @ref{Floating Point Issues}.
+
+There a several important issues to be aware of, described next.
+
+@menu
+* Floating Point Issues:: Stuff to know about floating-point numbers.
+* Integer Programming:: Effective integer programming.
+@end menu
+
+@node Floating Point Issues
+@subsection Floating-Point Number Caveats
+
+This @value{SECTION} describes some of the issues
+involved in using floating-point numbers.
+
+There is a very nice
+@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic}
+by David Goldberg,
+``What Every Computer Scientist Should Know About Floating-point Arithmetic,''
+@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48.
+This is worth reading if you are interested in the details,
+but it does require a background in computer science.
+
+@menu
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not Abstract
+ Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+@end menu
+
+@node String Conversion Precision
+@subsubsection The String Value Can Lie
+
+Internally, @command{awk} keeps both the numeric value
+(double precision floating-point) and the string value for a variable.
+Separately, @command{awk} keeps
+track of what type the variable has
+(@pxref{Typing and Comparison}),
+which plays a role in how variables are used in comparisons.
+
+It is important to note that the string value for a number may not
+reflect the full value (all the digits) that the numeric value
+actually contains.
+The following program (@file{values.awk}) illustrates this:
+
+@example
+@{
+ sum = $1 + $2
+ # see it for what it is
+ printf("sum = %.12g\n", sum)
+ # use CONVFMT
+ a = "<" sum ">"
+ print "a =", a
+ # use OFMT
+ print "sum =", sum
+@}
+@end example
+
+@noindent
+This program shows the full value of the sum of @code{$1} and @code{$2}
+using @code{printf}, and then prints the string values obtained
+from both automatic conversion (via @code{CONVFMT}) and
+from printing (via @code{OFMT}).
+
+Here is what happens when the program is run:
+
+@example
+$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk}
+@print{} sum = 4.8888888
+@print{} a = <4.88889>
+@print{} sum = 4.88889
+@end example
+
+This makes it clear that the full numeric value is different from
+what the default string representations show.
+
+@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
+at least six significant digits. For some applications, you might want to
+change it to specify more precision.
+On most modern machines, most of the time,
+17 digits is enough to capture a floating-point number's
+value exactly.@footnote{Pathological cases can require up to
+752 digits (!), but we doubt that you need to worry about this.}
+
+@node Unexpected Results
+@subsubsection Floating Point Numbers Are Not Abstract Numbers
+
+@cindex floating-point, numbers
+Unlike numbers in the abstract sense (such as what you studied in high school
+or college arithmetic), numbers stored in computers are limited in certain ways.
+They cannot represent an infinite number of digits, nor can they always
+represent things exactly.
+In particular,
+floating-point numbers cannot
+always represent values exactly. Here is an example:
+
+@example
+$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'}
+515.79
+@print{} 0000051579
+515.80
+@print{} 0000051579
+515.81
+@print{} 0000051580
+515.82
+@print{} 0000051582
+@kbd{@value{CTL}-d}
+@end example
+
+@noindent
+This shows that some values can be represented exactly,
+whereas others are only approximated. This is not a ``bug''
+in @command{awk}, but simply an artifact of how computers
+represent numbers.
+
+@quotation NOTE
+It cannot be emphasized enough that the behavior just
+described is fundamental to modern computers. You will
+see this kind of thing happen in @emph{any} programming
+language using hardware floating-point numbers. It is @emph{not}
+a bug in @command{gawk}, nor is it something that can be ``just
+fixed.''
+@end quotation
+
+@cindex negative zero
+@cindex positive zero
+@cindex zero@comma{} negative vs.@: positive
+Another peculiarity of floating-point numbers on modern systems
+is that they often have more than one representation for the number zero!
+In particular, it is possible to represent ``minus zero'' as well as
+regular, or ``positive'' zero.
+
+This example shows that negative and positive zero are distinct values
+when stored internally, but that they are in fact equal to each other,
+as well as to ``regular'' zero:
+
+@example
+$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0}
+> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz}
+> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0}
+> @kbd{@}'}
+@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1
+@print{} mz == 0 -> 1, pz == 0 -> 1
+@end example
+
+It helps to keep this in mind should you process numeric data
+that contains negative zero values; the fact that the zero is negative
+is noted and can affect comparisons.
+
+@node POSIX Floating Point Problems
+@subsubsection Standards Versus Existing Practice
+
+Historically, @command{awk} has converted any non-numeric looking string
+to the numeric value zero, when required. Furthermore, the original
+definition of the language and the original POSIX standards specified that
+@command{awk} only understands decimal numbers (base 10), and not octal
+(base 8) or hexadecimal numbers (base 16).
+
+Changes in the language of the
+2001 and 2004 POSIX standards can be interpreted to imply that @command{awk}
+should support additional features. These features are:
+
+@itemize @bullet
+@item
+Interpretation of floating point data values specified in hexadecimal
+notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not}
+source code constants.)
+
+@item
+Support for the special IEEE 754 floating point values ``Not A Number''
+(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf'').
+In particular, the format for these values is as specified by the ISO 1999
+C standard, which ignores case and can allow machine-dependent additional
+characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}.
+@end itemize
+
+The first problem is that both of these are clear changes to historical
+practice:
+
+@itemize @bullet
+@item
+The @command{gawk} maintainer feels that supporting hexadecimal floating
+point values, in particular, is ugly, and was never intended by the
+original designers to be part of the language.
+
+@item
+Allowing completely alphabetic strings to have valid numeric
+values is also a very severe departure from historical practice.
+@end itemize
+
+The second problem is that the @code{gawk} maintainer feels that this
+interpretation of the standard, which requires a certain amount of
+``language lawyering'' to arrive at in the first place, was not even
+intended by the standard developers. In other words, ``we see how you
+got where you are, but we don't think that that's where you want to be.''
+
+Recognizing the above issues, but attempting to provide compatibility
+with the earlier versions of the standard,
+the 2008 POSIX standard added explicit wording to allow, but not require,
+that @command{awk} support hexadecimal floating point values and
+special values for ``Not A Number'' and infinity.
+
+Although the @command{gawk} maintainer continues to feel that
+providing those features is inadvisable,
+nevertheless, on systems that support IEEE floating point, it seems
+reasonable to provide @emph{some} way to support NaN and Infinity values.
+The solution implemented in @command{gawk} is as follows:
+
+@itemize @bullet
+@item
+With the @option{--posix} command-line option, @command{gawk} becomes
+``hands off.'' String values are passed directly to the system library's
+@code{strtod()} function, and if it successfully returns a numeric value,
+that is what's used.@footnote{You asked for it, you got it.}
+By definition, the results are not portable across
+different systems. They are also a little surprising:
+
+@example
+$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'}
+@print{} nan
+$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
+@print{} 3735928559
+@end example
+
+@item
+Without @option{--posix}, @command{gawk} interprets the four strings
+@samp{+inf},
+@samp{-inf},
+@samp{+nan},
+and
+@samp{-nan}
+specially, producing the corresponding special numeric values.
+The leading sign acts a signal to @command{gawk} (and the user)
+that the value is really numeric. Hexadecimal floating point is
+not supported (unless you also use @option{--non-decimal-data},
+which is @emph{not} recommended). For example:
+
+@example
+$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'}
+@print{} 0
+$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'}
+@print{} nan
+$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
+@print{} 0
+@end example
+
+@command{gawk} does ignore case in the four special values.
+Thus @samp{+nan} and @samp{+NaN} are the same.
+@end itemize
+
+@node Integer Programming
+@subsection Mixing Integers And Floating-point
+
+As has been mentioned already, @command{gawk} ordinarily uses hardware double
+precision with 64-bit IEEE binary floating-point representation
+for numbers on most systems. A large integer like 9,007,199,254,740,997
+has a binary representation that, although finite, is more than 53 bits long;
+it must also be rounded to 53 bits.
+The biggest integer that can be stored in a C @code{double} is usually the same
+as the largest possible value of a @code{double}. If your system @code{double}
+is an IEEE 64-bit @code{double}, this largest possible value is an integer and
+can be represented precisely. What more should one know about integers?
+
+If you want to know what is the largest integer, such that it and
+all smaller integers can be stored in 64-bit doubles without losing precision,
+then the answer is
+@iftex
+@math{2^{53}}.
+@end iftex
+@ifnottex
+2^53.
+@end ifnottex
+The next representable number is the even number
+@iftex
+@math{2^{53} + 2},
+@end iftex
+@ifnottex
+2^53 + 2,
+@end ifnottex
+meaning it is unlikely that you will be able to make
+@command{gawk} print
+@iftex
+@math{2^{53} + 1}
+@end iftex
+@ifnottex
+2^53 + 1
+@end ifnottex
+in integer format.
+The range of integers exactly representable by a 64-bit double
+is
+@iftex
+@math{[-2^{53}, 2^{53}]}.
+@end iftex
+@ifnottex
+[@minus{}2^53, 2^53].
+@end ifnottex
+If you ever see an integer outside this range in @command{gawk}
+using 64-bit doubles, you have reason to be very suspicious about
+the accuracy of the output. Here is a simple program with erroneous output:
+
+@example
+$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'}
+@print{} 9007199254740991
+@print{} 9007199254740992
+@print{} 9007199254740992
+@print{} 9007199254740994
+@end example
+
+The lesson is to not assume that any large integer printed by @command{gawk}
+represents an exact result from your computation, especially if it wraps
+around on your screen.
+
+@node Floating-point Programming
+@section Understanding Floating-point Programming
+
+Numerical programming is an extensive area; if you need to develop
+sophisticated numerical algorithms then @command{gawk} may not be
+the ideal tool, and this documentation may not be sufficient.
+@c FIXME: JOHN: Do you want to cite some actual books?
+It might require digesting a book or two to really internalize how to compute
+with ideal accuracy and precision,
+and the result often depends on the particular application.
+
+@quotation NOTE
+A floating-point calculation's @dfn{accuracy} is how close it comes
+to the real value. This is as opposed to the @dfn{precision}, which
+usually refers to the number of bits used to represent the number
+(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision,
+the Wikipedia article} for more information).
+@end quotation
+
+There are two options for doing floating-point calculations:
+hardware floating-point (as used by standard @command{awk} and
+the default for @command{gawk}), and @dfn{arbitrary-precision}
+floating-point, which is software based.
+From this point forward, this @value{CHAPTER}
+aims to provide enough information to understand both, and then
+will focus on @command{gawk}'s facilities for the latter.@footnote{If you
+are interested in other tools that perform arbitrary precision arithmetic,
+you may want to investigate the POSIX @command{bc} tool. See
+@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html,
+the POSIX specification for it}, for more information.}
+
+Binary floating-point representations and arithmetic are inexact.
+Simple values like 0.1 cannot be precisely represented using
+binary floating-point numbers, and the limited precision of
+floating-point numbers means that slight changes in
+the order of operations or the precision of intermediate storage
+can change the result. To make matters worse, with arbitrary precision
+floating-point, you can set the precision before starting a computation,
+but then you cannot be sure of the number of significant decimal places
+in the final result.
+
+Sometimes, before you start to write any code, you should think more
+about what you really want and what's really happening. Consider the
+two numbers in the following example:
+
+@example
+x = 0.875 # 1/2 + 1/4 + 1/8
+y = 0.425
+@end example
+
+Unlike the number in @code{y}, the number stored in @code{x}
+is exactly representable
+in binary since it can be written as a finite sum of one or
+more fractions whose denominators are all powers of two.
+When @command{gawk} reads a floating-point number from
+program source, it automatically rounds that number to whatever
+precision your machine supports. If you try to print the numeric
+content of a variable using an output format string of @code{"%.17g"},
+it may not produce the same number as you assigned to it:
+
+@example
+$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425}
+> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'}
+@print{} 0.875, 0.42499999999999999
+@end example
+
+Often the error is so small you do not even notice it, and if you do,
+you can always specify how much precision you would like in your output.
+Usually this is a format string like @code{"%.15g"}, which when
+used in the previous example, produces an output identical to the input.
+
+Because the underlying representation can be a little bit off from the exact value,
+comparing floating-point values to see if they are equal is generally not a good idea.
+Here is an example where it does not work like you expect:
+
+@example
+$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 0
+@end example
+
+The loss of accuracy during a single computation with floating-point numbers
+usually isn't enough to worry about. However, if you compute a value
+which is the result of a sequence of floating point operations,
+the error can accumulate and greatly affect the computation itself.
+Here is an attempt to compute the value of the constant
+@value{PI} using one of its many series representations:
+
+@example
+BEGIN @{
+ x = 1.0 / sqrt(3.0)
+ n = 6
+ for (i = 1; i < 30; i++) @{
+ n = n * 2.0
+ x = (sqrt(x * x + 1) - 1) / x
+ printf("%.15f\n", n * x)
+ @}
+@}
+@end example
+
+When run, the early errors propagating through later computations
+cause the loop to terminate prematurely after an attempt to divide by zero.
+
+@example
+$ @kbd{gawk -f pi.awk}
+@print{} 3.215390309173475
+@print{} 3.159659942097510
+@print{} 3.146086215131467
+@print{} 3.142714599645573
+@dots{}
+@print{} 3.224515243534819
+@print{} 2.791117213058638
+@print{} 0.000000000000000
+@error{} gawk: pi.awk:6: fatal: division by zero attempted
+@end example
+
+Here is an additional example where the inaccuracies in internal representations
+yield an unexpected result:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
+> @kbd{i++}
+> @kbd{print i}
+> @kbd{@}'}
+@print{} 4
+@end example
+
+Can computation using arbitrary precision help with the previous examples?
+If you are impatient to know, see
+@ref{Exact Arithmetic}.
+
+Instead of arbitrary precision floating-point arithmetic,
+often all you need is an adjustment of your logic
+or a different order for the operations in your calculation.
+The stability and the accuracy of the computation of the constant @value{PI}
+in the previous example can be enhanced by using the following
+simple algebraic transformation:
+
+@example
+(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1)
+@end example
+
+@noindent
+After making this, change the program does converge to
+@value{PI} in under 30 iterations:
+
+@example
+$ @kbd{gawk -f /tmp/pi2.awk}
+@print{} 3.215390309173473
+@print{} 3.159659942097501
+@print{} 3.146086215131436
+@print{} 3.142714599645370
+@print{} 3.141873049979825
+@dots{}
+@print{} 3.141592653589797
+@print{} 3.141592653589797
+@end example
+
+There is no need to be unduly suspicious about the results from
+floating-point arithmetic. The lesson to remember is that
+floating-point arithmetic is always more complex than arithmetic using
+pencil and paper. In order to take advantage of the power
+of computer floating-point, you need to know its limitations
+and work within them. For most casual use of floating-point arithmetic,
+you will often get the expected result in the end if you simply round
+the display of your final results to the correct number of significant
+decimal digits.
+
+As general advice, avoid presenting numerical data in a manner that
+implies better precision than is actually the case.
+
+@menu
+* Floating-point Representation:: Binary floating-point representation.
+* Floating-point Context:: Floating-point context.
+* Rounding Mode:: Floating-point rounding mode.
+@end menu
+
+@node Floating-point Representation
+@subsection Binary Floating-point Representation
+@cindex IEEE-754 format
+
+Although floating-point representations vary from machine to machine,
+the most commonly encountered representation is that defined by the
+IEEE 754 Standard. An IEEE-754 format value has three components:
+
+@itemize @bullet
+@item
+A sign bit telling whether the number is positive or negative.
+
+@item
+An @dfn{exponent}, @var{e}, giving its order of magnitude.
+
+@item
+A @dfn{significand}, @var{s},
+specifying the actual digits of the number.
+@end itemize
+
+The value of the
+number is then
+@iftex
+@math{s @cdot 2^e}.
+@end iftex
+@ifnottex
+@var{s * 2^e}.
+@end ifnottex
+The first bit of a non-zero binary significand
+is always one, so the significand in an IEEE-754 format only includes the
+fractional part, leaving the leading one implicit.
+The significand is stored in @dfn{normalized} format,
+which means that the first bit is always a one.
+
+Three of the standard IEEE-754 types are 32-bit single precision,
+64-bit double precision and 128-bit quadruple precision.
+The standard also specifies extended precision formats
+to allow greater precisions and larger exponent ranges.
+
+@node Floating-point Context
+@subsection Floating-point Context
+@cindex context, floating-point
+
+A floating-point @dfn{context} defines the environment for arithmetic operations.
+It governs precision, sets rules for rounding, and limits the range for exponents.
+The context has the following primary components:
+
+@table @dfn
+@item Precision
+Precision of the floating-point format in bits.
+@item emax
+Maximum exponent allowed for this format.
+@item emin
+Minimum exponent allowed for this format.
+@item Underflow behavior
+The format may or may not support gradual underflow.
+@item Rounding
+The rounding mode of this context.
+@end table
+
+@ref{table-ieee-formats} lists the precision and exponent
+field values for the basic IEEE-754 binary formats:
+
+@float Table,table-ieee-formats
+@caption{Basic IEEE Format Context Values}
+@multitable @columnfractions .20 .20 .20 .20 .20
+@headitem Name @tab Total bits @tab Precision @tab emin @tab emax
+@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127
+@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023
+@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383
+@end multitable
+@end float
+
+@quotation NOTE
+The precision numbers include the implied leading one that gives them
+one extra bit of significand.
+@end quotation
+
+A floating-point context can also determine which signals are treated
+as exceptions, and can set rules for arithmetic with special values.
+Please consult the IEEE-754 standard or other resources for details.
+
+@command{gawk} ordinarily uses the hardware double precision
+representation for numbers. On most systems, this is IEEE-754
+floating-point format, corresponding to 64-bit binary with 53 bits
+of precision.
+
+@quotation NOTE
+In case an underflow occurs, the standard allows, but does not require,
+the result from an arithmetic operation to be a number smaller than
+the smallest nonzero normalized number. Such numbers do
+not have as many significant digits as normal numbers, and are called
+@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero,
+is called @dfn{flush to zero}. The basic IEEE-754 binary formats
+support subnormal numbers.
+@end quotation
+
+@node Rounding Mode
+@subsection Floating-point Rounding Mode
+@cindex rounding mode, floating-point
+
+The @dfn{rounding mode} specifies the behavior for the results of numerical
+operations when discarding extra precision. Each rounding mode indicates
+how the least significant returned digit of a rounded result is to
+be calculated.
+@ref{table-rounding-modes} lists the IEEE-754 defined
+rounding modes:
+
+@float Table,table-rounding-modes
+@caption{IEEE 754 Rounding Modes}
+@multitable @columnfractions .45 .55
+@headitem Rounding Mode @tab IEEE Name
+@item Round to nearest, ties to even @tab @code{roundTiesToEven}
+@item Round toward plus Infinity @tab @code{roundTowardPositive}
+@item Round toward negative Infinity @tab @code{roundTowardNegative}
+@item Round toward zero @tab @code{roundTowardZero}
+@item Round to nearest, ties away from zero @tab @code{roundTiesToAway}
+@end multitable
+@end float
+
+The default mode @code{roundTiesToEven} is the most preferred,
+but the least intuitive. This method does the obvious thing for most values,
+by rounding them up or down to the nearest digit.
+For example, rounding 1.132 to two digits yields 1.13,
+and rounding 1.157 yields 1.16.
+
+However, when it comes to rounding a value that is exactly halfway between,
+things do not work the way you probably learned in school.
+In this case, the number is rounded to the nearest even digit.
+So rounding 0.125 to two digits rounds down to 0.12,
+but rounding 0.6875 to three digits rounds up to 0.688.
+You probably have already encountered this rounding mode when
+using the @code{printf} routine to format floating-point numbers.
+For example:
+
+@example
+BEGIN @{
+ x = -4.5
+ for (i = 1; i < 10; i++) @{
+ x += 1.0
+ printf("%4.1f => %2.0f\n", x, x)
+ @}
+@}
+@end example
+
+@noindent
+produces the following output when run:@footnote{It
+is possible for the output to be completely different if the
+C library in your system does not use the IEEE-754 even-rounding
+rule to round halfway cases for @code{printf()}.}
+
+@example
+-3.5 => -4
+-2.5 => -2
+-1.5 => -2
+-0.5 => 0
+ 0.5 => 0
+ 1.5 => 2
+ 2.5 => 2
+ 3.5 => 4
+ 4.5 => 4
+@end example
+
+The theory behind the rounding mode @code{roundTiesToEven} is that
+it more or less evenly distributes upward and downward rounds
+of exact halves, which might cause the round-off error
+to cancel itself out. This is the default rounding mode used
+in IEEE-754 computing functions and operators.
+
+The other rounding modes are rarely used.
+Round toward positive infinity (@code{roundTowardPositive})
+and round toward negative infinity (@code{roundTowardNegative})
+are often used to implement interval arithmetic,
+where you adjust the rounding mode to calculate upper and lower bounds
+for the range of output. The @code{roundTowardZero}
+mode can be used for converting floating-point numbers to integers.
+The rounding mode @code{roundTiesToAway} rounds the result to the
+nearest number and selects the number with the larger magnitude
+if a tie occurs.
+
+Some numerical analysts will tell you that your choice of rounding style
+has tremendous impact on the final outcome, and advise you to wait until
+final output for any rounding. Instead, you can often avoid round-off error problems by
+setting the precision initially to some value sufficiently larger than
+the final desired precision, so that the accumulation of round-off error
+does not influence the outcome.
+If you suspect that results from your computation are
+sensitive to accumulation of round-off error,
+one way to be sure is to look for a significant difference in output
+when you change the rounding mode.
+
+@node Gawk and MPFR
+@section @command{gawk} + MPFR = Powerful Arithmetic
+
+The rest of this @value{CHAPTER} describes how to use the arbitrary precision
+(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric
+capabilities in @command{gawk} to produce maximally accurate results
+when you need it.
+
+But first you should check if your version of
+@command{gawk} supports arbitrary precision arithmetic.
+The easiest way to find out is to look at the output of
+the following command:
+
+@example
+$ @kbd{gawk --version}
+@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3)
+@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation.
+@dots{}
+@end example
+
+@command{gawk} uses the
+@uref{http://www.mpfr.org, GNU MPFR}
+and
+@uref{http://gmplib.org, GNU MP} (GMP)
+libraries for arbitrary precision
+arithmetic on numbers. So if you do not see the names of these libraries
+in the output, then your version of @command{gawk} does not support
+arbitrary precision arithmetic.
+
+Additionally,
+there are a few elements available in the @code{PROCINFO} array
+to provide information about the MPFR and GMP libraries.
+@xref{Auto-set}, for more information.
+
@ignore
-@c Try this
+Even if you aren't interested in arbitrary precision arithmetic, you
+may still benefit from knowing about how @command{gawk} handles numbers
+in general, and the limitations of doing arithmetic with ordinary
+@command{gawk} numbers.
+@end ignore
+
+
+@node Arbitrary Precision Floats
+@section Arbitrary Precision Floating-point Arithmetic with @command{gawk}
+
+@command{gawk} uses the GNU MPFR library
+for arbitrary precision floating-point arithmetic. The MPFR library
+provides precise control over precisions and rounding modes, and gives
+correctly rounded, reproducible, platform-independent results. With the
+command-line option @option{--bignum} or @option{-M},
+all floating-point arithmetic operators and numeric functions can yield
+results to any desired precision level supported by MPFR.
+Two built-in variables, @code{PREC} and @code{ROUNDMODE},
+provide control over the working precision and the rounding mode
+(@pxref{Setting Precision}, and
+@pxref{Setting Rounding Mode}).
+The precision and the rounding mode are set globally for every operation
+to follow.
+
+The default working precision for arbitrary precision floating-point values is 53,
+and the default value for @code{ROUNDMODE} is @code{"N"},
+which selects the IEEE-754 @code{roundTiesToEven} rounding mode
+(@pxref{Rounding Mode}).@footnote{The
+default precision is 53, since according to the MPFR documentation,
+the library should be able to exactly reproduce all computations with
+double-precision machine floating-point numbers (@code{double} type
+in C), except the default exponent range is much wider and subnormal
+numbers are not implemented.}
+@command{gawk} uses the default exponent range in MPFR
@iftex
-@page
-@headings off
-@majorheading III@ @ @ Appendixes
-Part III provides the appendixes, the Glossary, and two licenses that cover
+(@math{emax = 2^{30} - 1, emin = -emax})
+@end iftex
+@ifnottex
+(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax})
+@end ifnottex
+for all floating-point contexts.
+There is no explicit mechanism to adjust the exponent range.
+MPFR does not implement subnormal numbers by default,
+and this behavior cannot be changed in @command{gawk}.
+
+@quotation NOTE
+When emulating an IEEE-754 format (@pxref{Setting Precision}),
+@command{gawk} internally adjusts the exponent range
+to the value defined for the format and also performs computations needed for
+gradual underflow (subnormal numbers).
+@end quotation
+
+@quotation NOTE
+MPFR numbers are variable-size entities, consuming only as much space as
+needed to store the significant digits. Since the performance using MPFR
+numbers pales in comparison to doing arithmetic using the underlying machine
+types, you should consider using only as much precision as needed by
+your program.
+@end quotation
+
+@menu
+* Setting Precision:: Setting the working precision.
+* Setting Rounding Mode:: Setting the rounding mode.
+* Floating-point Constants:: Representing floating-point constants.
+* Changing Precision:: Changing the precision of a number.
+* Exact Arithmetic:: Exact arithmetic with floating-point numbers.
+@end menu
+
+@node Setting Precision
+@subsection Setting the Working Precision
+@cindex @code{PREC} variable
+
+@command{gawk} uses a global working precision; it does not keep track of
+the precision or accuracy of individual numbers. Performing an arithmetic
+operation or calling a built-in function rounds the result to the current
+working precision. The default working precision is 53, which can be
+modified using the built-in variable @code{PREC}. You can also set the
+value to one of the following pre-defined case-insensitive strings
+to emulate an IEEE-754 binary format:
+
+@multitable {@code{"double"}} {12345678901234567890123456789012345}
+@headitem @code{PREC} @tab IEEE-754 Binary Format
+@item @code{"half"} @tab 16-bit half-precision.
+@item @code{"single"} @tab Basic 32-bit single precision.
+@item @code{"double"} @tab Basic 64-bit double precision.
+@item @code{"quad"} @tab Basic 128-bit quadruple precision.
+@item @code{"oct"} @tab 256-bit octuple precision.
+@end multitable
+
+The following example illustrates the effects of changing precision
+on arithmetic operations:
+
+@example
+$ @kbd{gawk -M -v PREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \}
+> @kbd{PREC = "double"; print x + 0 @}'}
+@print{} 1e-400
+@print{} 0
+@end example
+
+Binary and decimal precisions are related approximately, according to the
+formula:
+
+@iftex
+@math{prec = 3.322 @cdot dps}
+@end iftex
+@ifnottex
+@var{prec} = 3.322 * @var{dps}
+@end ifnottex
+
+@noindent
+Here, @var{prec} denotes the binary precision
+(measured in bits) and @var{dps} (short for decimal places)
+is the decimal digits. We can easily calculate how many decimal
+digits the 53-bit significand of an IEEE double is equivalent to:
+53 / 3.332 which is equal to about 15.95.
+But what does 15.95 digits actually mean? It depends whether you are
+concerned about how many digits you can rely on, or how many digits
+you need.
+
+It is important to know how many bits it takes to uniquely identify
+a double-precision value (the C type @code{double}). If you want to
+convert from @code{double} to decimal and back to @code{double} (e.g.,
+saving a @code{double} representing an intermediate result to a file, and
+later reading it back to restart the computation), then a few more decimal
+digits are required. 17 digits is generally enough for a @code{double}.
+
+It can also be important to know what decimal numbers can be uniquely
+represented with a @code{double}. If you want to convert
+from decimal to @code{double} and back again, 15 digits is the most that
+you can get. Stated differently, you should not present
+the numbers from your floating-point computations with more than 15
+significant digits in them.
+
+Conversely, it takes a precision of 332 bits to hold an approximation
+of the constant @value{PI} that is accurate to 100 decimal places.
+
+You should always add some extra bits in order to avoid the confusing round-off
+issues that occur because numbers are stored internally in binary.
+
+@node Setting Rounding Mode
+@subsection Setting the Rounding Mode
+@cindex @code{ROUNDMODE} variable
+
+The @code{ROUNDMODE} variable provides
+program level control over the rounding mode.
+The correspondence between @code{ROUNDMODE} and the IEEE
+rounding modes is shown in @ref{table-gawk-rounding-modes}.
+
+@float Table,table-gawk-rounding-modes
+@caption{@command{gawk} Rounding Modes}
+@multitable @columnfractions .45 .30 .25
+@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE}
+@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"}
+@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"}
+@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"}
+@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"}
+@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"}
+@end multitable
+@end float
+
+@code{ROUNDMODE} has the default value @code{"N"},
+which selects the IEEE-754 rounding mode @code{roundTiesToEven}.
+@ref{table-gawk-rounding-modes}, lists @code{"A"} to select the IEEE-754 mode
+@code{roundTiesToAway}. This is only available
+if your version of the MPFR library supports it; otherwise setting
+@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode},
+for the meanings of the various rounding modes.
+
+Here is an example of how to change the default rounding behavior of
+@code{printf}'s output:
+
+@example
+$ @kbd{gawk -M -v ROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'}
+@print{} 1.37
+@end example
+
+@node Floating-point Constants
+@subsection Representing Floating-point Constants
+@cindex constants, floating-point
+
+Be wary of floating-point constants! When reading a floating-point constant
+from program source code, @command{gawk} uses the default precision,
+unless overridden
+by an assignment to the special variable @code{PREC} on the command
+line, to store it internally as a MPFR number.
+Changing the precision using @code{PREC} in the program text does
+@emph{not} change the precision of a constant. If you need to
+represent a floating-point constant at a higher precision than the
+default and cannot use a command line assignment to @code{PREC},
+you should either specify the constant as a string, or
+as a rational number, whenever possible. The following example
+illustrates the differences among various ways to
+print a floating-point constant:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'}
+@print{} 0.1000000000000000055511151
+$ @kbd{gawk -M -v PREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'}
+@print{} 0.1000000000000000000000000
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'}
+@print{} 0.1000000000000000000000000
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'}
+@print{} 0.1000000000000000000000000
+@end example
+
+In the first case, the number is stored with the default precision of 53.
+
+@node Changing Precision
+@subsection Changing the Precision of a Number
+
+@cindex Laurie, Dirk
+@quotation
+@i{The point is that in any variable-precision package,
+a decision is made on how to treat numbers given as data,
+or arising in intermediate results, which are represented in
+floating-point format to a precision lower than working precision.
+Do we promote them to full membership of the high-precision club,
+or do we treat them and all their associates as second-class citizens?
+Sometimes the first course is proper, sometimes the second, and it takes
+careful analysis to tell which.}
+
+Dirk Laurie@footnote{Dirk Laurie.
+@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}.
+Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.}
+@end quotation
+
+@command{gawk} does not implicitly modify the precision of any previously
+computed results when the working precision is changed with an assignment
+to @code{PREC}. The precision of a number is always the one that was
+used at the time of its creation, and there is no way for the user
+to explicitly change it afterwards. However, since the result of a
+floating-point arithmetic operation is always an arbitrary precision
+floating-point value---with a precision set by the value of @code{PREC}---one of the
+following workarounds effectively accomplishes the desired behavior:
+
+@example
+x = x + 0.0
+@end example
+
+@noindent
+or:
+
+@example
+x += 0.0
+@end example
+
+@node Exact Arithmetic
+@subsection Exact Arithmetic with Floating-point Numbers
+
+@quotation CAUTION
+Never depend on the exactness of floating-point arithmetic,
+even for apparently simple expressions!
+@end quotation
+
+Can arbitrary precision arithmetic give exact results? There are
+no easy answers. The standard rules of algebra often do not apply
+when using floating-point arithmetic.
+Among other things, the distributive and associative laws
+do not hold completely, and order of operation may be important
+for your computation. Rounding error, cumulative precision loss
+and underflow are often troublesome.
+
+When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3}
+for equality
+using the machine double precision arithmetic, it decides that they
+are not equal!
+(@xref{Floating-point Programming}.)
+You can get the result you want by increasing the precision;
+56 in this case will get the job done:
+
+@example
+$ @kbd{gawk -M -v PREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 1
+@end example
+
+If adding more bits is good, perhaps adding even more bits of
+precision is better?
+Here is what happens if we use an even larger value of @code{PREC}:
+
+@example
+$ @kbd{gawk -M -v PREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 0
+@end example
+
+This is not a bug in @command{gawk} or in the MPFR library.
+It is easy to forget that the finite number of bits used to store the value
+is often just an approximation after proper rounding.
+The test for equality succeeds if and only if @emph{all} bits in the two operands
+are exactly the same. Since this is not necessarily true after floating-point
+computations with a particular precision and effective rounding rule,
+a straight test for equality may not work.
+
+So, don't assume that floating-point values can be compared for equality.
+You should also exercise caution when using other forms of comparisons.
+The standard way to compare between floating-point numbers is to determine
+how much error (or @dfn{tolerance}) you will allow in a comparison and
+check to see if one value is within this error range of the other.
+
+In applications where 15 or fewer decimal places suffice,
+hardware double precision arithmetic can be adequate, and is usually much faster.
+But you do need to keep in mind that every floating-point operation
+can suffer a new rounding error with catastrophic consequences as illustrated
+by our earlier attempt to compute the value of the constant @value{PI}
+(@pxref{Floating-point Programming}).
+Extra precision can greatly enhance the stability and the accuracy
+of your computation in such cases.
+
+Repeated addition is not necessarily equivalent to multiplication
+in floating-point arithmetic. In the example in
+@ref{Floating-point Programming}:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
+> @kbd{i++}
+> @kbd{print i}
+> @kbd{@}'}
+@print{} 4
+@end example
+
+@noindent
+you may or may not succeed in getting the correct result by choosing
+an arbitrarily large value for @code{PREC}. Reformulation of
+the problem at hand is often the correct approach in such situations.
+
+@node Arbitrary Precision Integers
+@section Arbitrary Precision Integer Arithmetic with @command{gawk}
+@cindex integer, arbitrary precision
+
+If the option @option{--bignum} or @option{-M} is specified,
+@command{gawk} performs all
+integer arithmetic using GMP arbitrary precision integers.
+Any number that looks like an integer in a program source or data file
+is stored as an arbitrary precision integer.
+The size of the integer is limited only by your computer's memory.
+The current floating-point context has no effect on operations involving integers.
+For example, the following computes
+@iftex
+@math{5^{4^{3^{2}}}},
+@end iftex
+@ifnottex
+5^4^3^2,
+@end ifnottex
+the result of which is beyond the
+limits of ordinary @command{gawk} numbers:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{}
+> @kbd{x = 5^4^3^2}
+> @kbd{print "# of digits =", length(x)}
+> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)}
+> @kbd{@}'}
+@print{} # of digits = 183231
+@print{} 62060698786608744707 ... 92256259918212890625
+@end example
+
+If you were to compute the same value using arbitrary precision
+floating-point values instead, the precision needed for correct output
+(using the formula
+@iftex
+@math{prec = 3.322 @cdot dps}),
+would be @math{3.322 @cdot 183231},
+@end iftex
+@ifnottex
+@samp{prec = 3.322 * dps}),
+would be 3.322 x 183231,
+@end ifnottex
+or 608693.
+
+The result from an arithmetic operation with an integer and a floating-point value
+is a floating-point value with a precision equal to the working precision.
+The following program calculates the eighth term in
+Sylvester's sequence@footnote{Weisstein, Eric W.
+@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource.
+@url{http://mathworld.wolfram.com/SylvestersSequence.html}}
+using a recurrence:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{}
+> @kbd{s = 2.0}
+> @kbd{for (i = 1; i <= 7; i++)}
+> @kbd{s = s * (s - 1) + 1}
+> @kbd{print s}
+> @kbd{@}'}
+@print{} 113423713055421845118910464
+@end example
+
+The output differs from the actual number, 113,423,713,055,421,844,361,000,443,
+because the default precision of 53 is not enough to represent the
+floating-point results exactly. You can either increase the precision
+(100 is enough in this case), or replace the floating-point constant
+@samp{2.0} with an integer, to perform all computations using integer
+arithmetic to get the correct output.
+
+It will sometimes be necessary for @command{gawk} to implicitly convert an
+arbitrary precision integer into an arbitrary precision floating-point value.
+This is primarily because the MPFR library does not always provide the
+relevant interface to process arbitrary precision integers or mixed-mode
+numbers as needed by an operation or function.
+In such a case, the precision is set to the minimum value necessary
+for exact conversion, and the working precision is not used for this purpose.
+If this is not what you need or want, you can employ a subterfuge
+like this:
+
+@example
+gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}'
+@end example
+
+You can avoid this issue altogether by specifying the number as a floating-point value
+to begin with:
+
+@example
+gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}'
+@end example
+
+Note that for the particular example above, there is likely best
+to just use the following:
+
+@example
+gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
+@end example
+
+@node Dynamic Extensions
+@chapter Writing Extensions for @command{gawk}
+
+It is possible to add new built-in functions to @command{gawk} using
+dynamically loaded libraries. This facility is available on systems
+that support the C @code{dlopen()} and @code{dlsym()}
+functions. This @value{CHAPTER} describes how to create extensions
+using code written in C or C++.
+
+If you don't know anything about C programming, you can safely skip this
+@value{CHAPTER}, although you may wish to review the documentation on the
+extensions that come with @command{gawk} (@pxref{Extension Samples}),
+and the @value{SECTION} on the @code{gawkextlib} project (@pxref{gawkextlib}).
+The sample extensions are automatically built and installed when
+@command{gawk} is.
+
+@quotation NOTE
+When @option{--sandbox} is specified, extensions are disabled
+(@pxref{Options}).
+@end quotation
+
+@menu
+* Extension Intro:: What is an extension.
+* Plugin License:: A note about licensing.
+* Extension Design:: Design notes about the extension API.
+* Extension API Description:: A full description of the API.
+* Extension Example:: Example C code for an extension.
+* Extension Samples:: The sample extensions that ship with
+ @code{gawk}.
+* gawkextlib:: The @code{gawkextlib} project.
+@end menu
+
+@node Extension Intro
+@section Introduction
+
+An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of
+external compiled code that @command{gawk} can load at runtime to
+provide additional functionality, over and above the built-in capabilities
+described in the rest of this @value{DOCUMENT}.
+
+Extensions are useful because they allow you (of course) to extend
+@command{gawk}'s functionality. For example, they can provide access to
+system calls (such as @code{chdir()} to change directory) and to other
+C library routines that could be of use. As with most software,
+``the sky is the limit;'' if you can imagine something that you might
+want to do and can write in C or C++, you can write an extension to do it!
+
+Extensions are written in C or C++, using the @dfn{Application Programming
+Interface} (API) defined for this purpose by the @command{gawk}
+developers. The rest of this @value{CHAPTER} explains the design
+decisions behind the API, the facilities that it provides and how to use
+them, and presents a small sample extension. In addition, it documents
+the sample extensions included in the @command{gawk} distribution,
+and describes the @code{gawkextlib} project.
+
+@node Plugin License
+@section Extension Licensing
+
+Every dynamic extension should define the global symbol
+@code{plugin_is_GPL_compatible} to assert that it has been licensed under
+a GPL-compatible license. If this symbol does not exist, @command{gawk}
+emits a fatal error and exits when it tries to load your extension.
+
+The declared type of the symbol should be @code{int}. It does not need
+to be in any allocated section, though. The code merely asserts that
+the symbol exists in the global scope. Something like this is enough:
+
+@example
+int plugin_is_GPL_compatible;
+@end example
+
+@node Extension Design
+@section Extension API Design
+
+The first version of extensions for @command{gawk} was developed in
+the mid-1990s and released with @command{gawk} 3.1 in the late 1990s.
+The basic mechanisms and design remained unchanged for close to 15 years,
+until 2012.
+
+The old extension mechanism used data types and functions from
+@command{gawk} itself, with a ``clever hack'' to install extension
+functions.
+
+@command{gawk} included some sample extensions, of which a few were
+really useful. However, it was clear from the outset that the extension
+mechanism was bolted onto the side and was not really thought out.
+
+@menu
+* Old Extension Problems:: Problems with the old mechanism.
+* Extension New Mechanism Goals:: Goals for the new mechanism.
+* Extension Other Design Decisions:: Some other design decisions.
+* Extension Mechanism Outline:: An outline of how it works.
+* Extension Future Growth:: Some room for future growth.
+@end menu
+
+@node Old Extension Problems
+@subsection Problems With The Old Mechanism
+
+The old extension mechanism had several problems:
+
+@itemize @bullet
+@item
+It depended heavily upon @command{gawk} internals. Any time the
+@code{NODE} structure@footnote{A critical central data structure
+inside @command{gawk}.} changed, an extension would have to be
+recompiled. Furthermore, to really write extensions required understanding
+something about @command{gawk}'s internal functions. There was some
+documentation in this @value{DOCUMENT}, but it was quite minimal.
+
+@item
+Being able to call into @command{gawk} from an extension required linker
+facilities that are common on Unix-derived systems but that did
+not work on Windows systems; users wanting extensions on Windows
+had to statically link them into @command{gawk}, even though Windows supports
+dynamic loading of shared objects.
+
+@item
+The API would change occasionally as @command{gawk} changed; no compatibility
+between versions was ever offered or planned for.
+@end itemize
+
+Despite the drawbacks, the @command{xgawk} project developers forked
+@command{gawk} and developed several significant extensions. They also
+enhanced @command{gawk}'s facilities relating to file inclusion and
+shared object access.
+
+A new API was desired for a long time, but only in 2012 did the
+@command{gawk} maintainer and the @command{xgawk} developers finally
+start working on it together. More information about the @command{xgawk}
+project is provided in @ref{gawkextlib}.
+
+@node Extension New Mechanism Goals
+@subsection Goals For A New Mechanism
+
+Some goals for the new API were:
+
+@itemize @bullet
+@item
+The API should be independent of @command{gawk} internals. Changes in
+@command{gawk} internals should not be visible to the writer of an
+extension function.
+
+@item
+The API should provide @emph{binary} compatibility across @command{gawk}
+releases as long as the API itself does not change.
+
+@item
+The API should enable extensions written in C to have roughly the
+same ``appearance'' to @command{awk}-level code as @command{awk}
+functions do. This means that extensions should have:
+
+@itemize @minus
+@item
+The ability to access function parameters.
+
+@item
+The ability to turn an undefined parameter into an array (call by reference).
+
+@item
+The ability to create, access and update global variables.
+
+@item
+Easy access to all the elements of an array at once (``array flattening'')
+in order to loop over all the element in an easy fashion for C code.
+
+@item
+The ability to create arrays (including @command{gawk}'s true
+multi-dimensional arrays).
+@end itemize
+@end itemize
+
+Some additional important goals were:
+
+@itemize @bullet
+@item
+The API should use only features in ISO C 90, so that extensions
+can be written using the widest range of C and C++ compilers. The header
+should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"}
+magic so that a C++ compiler could be used. (If using C++, the runtime
+system has to be smart enough to call any constructors and destructors,
+as @command{gawk} is a C program. As of this writing, this has not been
+tested.)
+
+@item
+The API mechanism should not require access to @command{gawk}'s
+symbols@footnote{The @dfn{symbols} are the variables and functions
+defined inside @command{gawk}. Access to these symbols by code
+external to @command{gawk} loaded dynamically at runtime is
+problematic on Windows.} by the compile-time or dynamic linker,
+in order to enable creation of extensions that also work on Windows.
+@end itemize
+
+During development, it became clear that there were other features
+that should be available to extensions, which were also subsequently
+provided:
+
+@itemize @bullet
+@item
+Extensions should have the ability to hook into @command{gawk}'s
+I/O redirection mechanism. In particular, the @command{xgawk}
+developers provided a so-called ``open hook'' to take over reading
+records. During development, this was generalized to allow
+extensions to hook into input processing, output processing, and
+two-way I/O.
+
+@item
+An extension should be able to provide a ``call back'' function
+to perform clean up actions when @command{gawk} exits.
+
+@item
+An extension should be able to provide a version string so that
+@command{gawk}'s @option{--version} option can provide information
+about extensions as well.
+@end itemize
+
+@node Extension Other Design Decisions
+@subsection Other Design Decisions
+
+As an arbitrary design decision, extensions can read the values of
+built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot
+change them, with the exception of @code{PROCINFO}.
+
+The reason for this is to prevent an extension function from affecting
+the flow of an @command{awk} program outside its control. While a real
+@command{awk} function can do what it likes, that is at the discretion
+of the programmer. An extension function should provide a service or
+make a C API available for use within @command{awk}, and not mess with
+@code{FS} or @code{ARGC} and @code{ARGV}.
+
+In addition, it becomes easy to start down a slippery slope. How
+much access to @command{gawk} facilities do extensions need?
+Do they need @code{getline}? What about calling @code{gsub()} or
+compiling regular expressions? What about calling into @command{awk}
+functions? (@emph{That} would be messy.)
+
+In order to avoid these issues, the @command{gawk} developers chose
+to start with the simplest, most basic features that are still truly useful.
+
+Another decision is that although @command{gawk} provides nice things like
+MPFR, and arrays indexed internally by integers, these features are not
+being brought out to the API in order to keep things simple and close to
+traditional @command{awk} semantics. (In fact, arrays indexed internally
+by integers are so transparent that they aren't even documented!)
+
+Additionally, all functions in the API check that their pointer
+input parameters are not @code{NULL}. If they are, they return an error.
+(It is a good idea for extension code to verify that
+pointers received from @command{gawk} are not @code{NULL}.
+Such a thing should not happen, but the @command{gawk} developers
+are only human, and they have been known to occasionally make
+mistakes.)
+
+With time, the API will undoubtedly evolve; the @command{gawk} developers
+expect this to be driven by user needs. For now, the current API seems
+to provide a minimal yet powerful set of features for creating extensions.
+
+@node Extension Mechanism Outline
+@subsection At A High Level How It Works
+
+The requirement to avoid access to @command{gawk}'s symbols is, at first
+glance, a difficult one to meet.
+
+One design, apparently used by Perl and Ruby and maybe others, would
+be to make the mainline @command{gawk} code into a library, with the
+@command{gawk} utility a small C @code{main()} function linked against
+the library.
+
+This seemed like the tail wagging the dog, complicating build and
+installation and making a simple copy of the @command{gawk} executable
+from one system to another (or one place to another on the same
+system!) into a chancy operation.
+
+Pat Rankin suggested the solution that was adopted. Communication between
+@command{gawk} and an extension is two-way. First, when an extension
+is loaded, it is passed a pointer to a @code{struct} whose fields are
+function pointers.
+This is shown in @ref{load-extension}.
+
+@float Figure,load-extension
+@caption{Loading The Extension}
+@ifinfo
+@center @image{api-figure1, , , Loading the extension, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{api-figure1, , , Loading the extension}
+@end ifnotinfo
+@end float
+
+The extension can call functions inside @command{gawk} through these
+function pointers, at runtime, without needing (link-time) access
+to @command{gawk}'s symbols. One of these function pointers is to a
+function for ``registering'' new built-in functions.
+This is shown in @ref{load-new-function}.
+
+@float Figure,load-new-function
+@caption{Loading The New Function}
+@ifinfo
+@center @image{api-figure2, , , Loading the new function, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{api-figure2, , , Loading the new function}
+@end ifnotinfo
+@end float
+
+In the other direction, the extension registers its new functions
+with @command{gawk} by passing function pointers to the functions that
+provide the new feature (@code{do_chdir()}, for example). @command{gawk}
+associates the function pointer with a name and can then call it, using a
+defined calling convention.
+This is shown in @ref{call-new-function}.
+
+@float Figure,call-new-function
+@caption{Calling The New Function}
+@ifinfo
+@center @image{api-figure3, , , Calling the new function, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{api-figure3, , , Calling the new function}
+@end ifnotinfo
+@end float
+
+The @code{do_@var{xxx}()} function, in turn, then uses the function
+pointers in the API @code{struct} to do its work, such as updating
+variables or arrays, printing messages, setting @code{ERRNO}, and so on.
+
+Convenience macros in the @file{gawkapi.h} header file make calling
+through the function pointers look like regular function calls so that
+extension code is quite readable and understandable.
+
+Although all of this sounds somewhat complicated, the result is that
+extension code is quite straightforward to write and to read. You can
+see this in the sample extensions @file{filefuncs.c} (@pxref{Extension
+Example}) and also the @file{testext.c} code for testing the APIs.
+
+Some other bits and pieces:
+
+@itemize @bullet
+@item
+The API provides access to @command{gawk}'s @code{do_@var{xxx}} values,
+reflecting command line options, like @code{do_lint}, @code{do_profiling}
+and so on (@pxref{Extension API Variables}).
+These are informational: an extension cannot affect these
+inside @command{gawk}. In addition, attempting to assign to them
+produces a compile-time error.
+
+@item
+The API also provides major and minor version numbers, so that an
+extension can check if the @command{gawk} it is loaded with supports the
+facilities it was compiled with. (Version mismatches ``shouldn't''
+happen, but we all know how @emph{that} goes.)
+@xref{Extension Versioning}, for details.
+@end itemize
+
+@node Extension Future Growth
+@subsection Room For Future Growth
+
+The API can later be expanded, in two ways:
+
+@itemize @bullet
+@item
+@command{gawk} passes an ``extension id'' into the extension when it
+first loads the extension. The extension then passes this id back
+to @command{gawk} with each function call. This mechanism allows
+@command{gawk} to identify the extension calling into it, should it need
+to know.
+
+@item
+Similarly, the extension passes a ``name space'' into @command{gawk}
+when it registers each extension function. This allows a future
+mechanism for grouping extension functions and possibly avoiding name
+conflicts.
+@end itemize
+
+Of course, as of this writing, no decisions have been made with respect
+to any of the above.
+
+@node Extension API Description
+@section API Description
+
+This (rather large) @value{SECTION} describes the API in detail.
+
+@menu
+* Extension API Functions Introduction:: Introduction to the API functions.
+* General Data Types:: The data types.
+* Requesting Values:: How to get a value.
+* Constructor Functions:: Functions for creating values.
+* Registration Functions:: Functions to register things with
+ @command{gawk}.
+* Printing Messages:: Functions for printing messages.
+* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}.
+* Accessing Parameters:: Functions for accessing parameters.
+* Symbol Table Access:: Functions for accessing global
+ variables.
+* Array Manipulation:: Functions for working with arrays.
+* Extension API Variables:: Variables provided by the API.
+* Extension API Boilerplate:: Boilerplate code for using the API.
+* Finding Extensions:: How @command{gawk} finds compiled
+ extensions.
+@end menu
+
+@node Extension API Functions Introduction
+@subsection Introduction
+
+Access to facilities within @command{gawk} are made available
+by calling through function pointers passed into your extension.
+
+API function pointers are provided for the following kinds of operations:
+
+@itemize @bullet
+@item
+Registrations functions. You may register:
+@itemize @minus
+@item
+extension functions,
+@item
+exit callbacks,
+@item
+a version string,
+@item
+input parsers,
+@item
+output wrappers,
+@item
+and two-way processors.
+@end itemize
+All of these are discussed in detail, later in this @value{CHAPTER}.
+
+@item
+Printing fatal, warning, and ``lint'' warning messages.
+
+@item
+Updating @code{ERRNO}, or unsetting it.
+
+@item
+Accessing parameters, including converting an undefined parameter into
+an array.
+
+@item
+Symbol table access: retrieving a global variable, creating one,
+or changing one. This also includes the ability to create a scalar
+variable that will be @emph{constant} within @command{awk} code.
+
+@item
+Creating and releasing cached values; this provides an
+efficient way to use values for multiple variables and
+can be a big performance win.
+
+@item
+Manipulating arrays:
+@itemize @minus
+@item
+Retrieving, adding, deleting, and modifying elements
+@item
+Getting the count of elements in an array
+@item
+Creating a new array
+@item
+Clearing an array
+@item
+Flattening an array for easy C style looping over all its indices and elements
+@end itemize
+@end itemize
+
+Some points about using the API:
+
+@itemize @bullet
+@item
+The following types and/or macros and/or functions are referenced
+in @file{gawkapi.h}. For correct use, you must therefore include the
+corresponding standard header file @emph{before} including @file{gawkapi.h}:
+
+@multitable {C Entity} {@code{<sys/types.h>}}
+@headitem C Entity @tab Header File
+@item @code{FILE} @tab @code{<stdio.h>}
+@item @code{NULL} @tab @code{<stddef.h>}
+@item @code{malloc()} @tab @code{<stdlib.h>}
+@item @code{memset()}, @code{memcpy()} @tab @code{<string.h>}
+@item @code{size_t} @tab @code{<sys/types.h>}
+@item @code{struct stat} @tab @code{<sys/stat.h>}
+@end multitable
+
+Due to portability concerns, especially to systems that are not
+fully standards-compliant, it is your responsibility
+to include the correct files in the correct way. This requirement
+is necessary in order to keep @file{gawkapi.h} clean, instead of becoming
+a portability hodge-podge as can be seen in the @command{gawk} source code.
+
+To pass reasonable integer values for @code{ERRNO}, you will also need to
+include @code{<errno.h>}.
+
+@item
+The @file{gawkapi.h} file may be included more than once without ill effect.
+Doing so, however, is poor coding practice.
+
+@item
+Although the API only uses ISO C 90 features, there is an exception; the
+``constructor'' functions use the @code{inline} keyword. If your compiler
+does not support this keyword, you should either place
+@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a
+@file{config.h} file in your extensions.
+
+@item
+All pointers filled in by @command{gawk} are to memory
+managed by @command{gawk} and should be treated by the extension as
+read-only. Memory for @emph{all} strings passed into @command{gawk}
+from the extension @emph{must} come from @code{malloc()} and is managed
+by @command{gawk} from then on.
+
+@item
+The API defines several simple structs that map values as seen
+from @command{awk}. A value can be a @code{double}, a string, or an
+array (as in multidimensional arrays, or when creating a new array).
+Strings maintain both pointer and length since embedded @code{NUL}
+characters are allowed.
+
+By intent, strings are maintained using the current multibyte encoding (as
+defined by @env{LC_@var{xxx}} environment variables) and not using wide
+characters. This matches how @command{gawk} stores strings internally
+and also how characters are likely to be input and output from files.
+
+@item
+When retrieving a value (such as a parameter or that of a global variable
+or array element), the extension requests a specific type (number, string,
+scalars, value cookie, array, or ``undefined''). When the request is
+``undefined,'' the returned value will have the real underlying type.
+
+However, if the request and actual type don't match, the access function
+returns ``false'' and fills in the type of the actual value that is there,
+so that the extension can, e.g., print an error message
+(``scalar passed where array expected'').
+
+@c This is documented in the header file and needs some expanding upon.
+@c The table there should be presented here
+@end itemize
+
+While you may call the API functions by using the function pointers
+directly, the interface is not so pretty. To make extension code look
+more like regular code, the @file{gawkapi.h} header file defines several
+macros that you should use in your code. This @value{SECTION} presents
+the macros as if they were functions.
+
+@node General Data Types
+@subsection General Purpose Data Types
+
+@quotation
+@i{I have a true love/hate relationship with unions.}@*
+Arnold Robbins
+
+@i{That's the thing about unions: the compiler will arrange things so they
+can accommodate both love and hate.}@*
+Chet Ramey
+@end quotation
+
+The extension API defines a number of simple types and structures for general
+purpose use. Additional, more specialized, data structures, are introduced
+in subsequent @value{SECTION}s, together with the functions that use them.
+
+@table @code
+@item typedef void *awk_ext_id_t;
+A value of this type is received from @command{gawk} when an extension is loaded.
+That value must then be passed back to @command{gawk} as the first parameter of
+each API function.
+
+@item #define awk_const @dots{}
+This macro expands to @samp{const} when compiling an extension,
+and to nothing when compiling @command{gawk} itself. This makes
+certain fields in the API data structures unwritable from extension code,
+while allowing @command{gawk} to use them as it needs to.
+
+@item typedef enum awk_bool @{
+@item @ @ @ @ awk_false = 0,
+@item @ @ @ @ awk_true
+@item @} awk_bool_t;
+A simple boolean type.
+
+@item typedef struct awk_string @{
+@itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */
+@itemx @ @ @ @ size_t len;@ @ @ @ @ /* length thereof, in chars */
+@itemx @} awk_string_t;
+This represents a mutable string. @command{gawk}
+owns the memory pointed to if it supplied
+the value. Otherwise, it takes ownership of the memory pointed to.
+@strong{Such memory must come from @code{malloc()}!}
+
+As mentioned earlier, strings are maintained using the current
+multibyte encoding.
+
+@item typedef enum @{
+@itemx @ @ @ @ AWK_UNDEFINED,
+@itemx @ @ @ @ AWK_NUMBER,
+@itemx @ @ @ @ AWK_STRING,
+@itemx @ @ @ @ AWK_ARRAY,
+@itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */
+@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */
+@itemx @} awk_valtype_t;
+This @code{enum} indicates the type of a value.
+It is used in the following @code{struct}.
+
+@item typedef struct awk_value @{
+@itemx @ @ @ @ awk_valtype_t val_type;
+@itemx @ @ @ @ union @{
+@itemx @ @ @ @ @ @ @ @ awk_string_t@ @ @ @ @ @ @ s;
+@itemx @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d;
+@itemx @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a;
+@itemx @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl;
+@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t@ vc;
+@itemx @ @ @ @ @} u;
+@itemx @} awk_value_t;
+An ``@command{awk} value.''
+The @code{val_type} member indicates what kind of value the
+@code{union} holds, and each member is of the appropriate type.
+
+@item #define str_value@ @ @ @ @ @ u.s
+@itemx #define num_value@ @ @ @ @ @ u.d
+@itemx #define array_cookie@ @ @ u.a
+@itemx #define scalar_cookie@ @ u.scl
+@itemx #define value_cookie@ @ @ u.vc
+These macros make accessing the fields of the @code{awk_value_t} more
+readable.
+
+@item typedef void *awk_scalar_t;
+Scalars can be represented as an opaque type. These values are obtained from
+@command{gawk} and then passed back into it. This is discussed in a general fashion below,
+and in more detail in @ref{Symbol table by cookie}.
+
+@item typedef void *awk_value_cookie_t;
+A ``value cookie'' is an opaque type representing a cached value.
+This is also discussed in a general fashion below,
+and in more detail in @ref{Cached values}.
+
+@end table
+
+Scalar values in @command{awk} are either numbers or strings. The
+@code{awk_value_t} struct represents values. The @code{val_type} member
+indicates what is in the @code{union}.
+
+Representing numbers is easy---the API uses a C @code{double}. Strings
+require more work. Since @command{gawk} allows embedded @code{NUL} bytes
+in string values, a string must be represented as a pair containing a
+data-pointer and length. This is the @code{awk_string_t} type.
+
+Identifiers (i.e., the names of global variables) can be associated
+with either scalar values or with arrays. In addition, @command{gawk}
+provides true arrays of arrays, where any given array element can
+itself be an array. Discussion of arrays is delayed until
+@ref{Array Manipulation}.
+
+The various macros listed earlier make it easier to use the elements
+of the @code{union} as if they were fields in a @code{struct}; this
+is a common coding practice in C. Such code is easier to write and to
+read, however it remains @emph{your} responsibility to make sure that
+the @code{val_type} member correctly reflects the type of the value in
+the @code{awk_value_t}.
+
+Conceptually, the first three members of the @code{union} (number, string,
+and array) are all that is needed for working with @command{awk} values.
+However, since the API provides routines for accessing and changing
+the value of global scalar variables only by using the variable's name,
+there is a performance penalty: @command{gawk} must find the variable
+each time it is accessed and changed. This turns out to be a real issue,
+not just a theoretical one.
+
+Thus, if you know that your extension will spend considerable time
+reading and/or changing the value of one or more scalar variables, you
+can obtain a @dfn{scalar cookie}@footnote{See
+@uref{http://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a
+definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html,
+the ``magic cookie'' entry in the Jargon file} for a nice example. See
+also the entry for ``Cookie'' in the @ref{Glossary}.}
+object for that variable, and then use
+the cookie for getting the variable's value or for changing the variable's
+value.
+This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro.
+Given a scalar cookie, @command{gawk} can directly retrieve or
+modify the value, as required, without having to first find it.
+
+The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar.
+If you know that you wish to
+use the same numeric or string @emph{value} for one or more variables,
+you can create the value once, retaining a @dfn{value cookie} for it,
+and then pass in that value cookie whenever you wish to set the value of a
+variable. This saves both storage space within the running @command{gawk}
+process as well as the time needed to create the value.
+
+@node Requesting Values
+@subsection Requesting Values
+
+All of the functions that return values from @command{gawk}
+work in the same way. You pass in an @code{awk_valtype_t} value
+to indicate what kind of value you expect. If the actual value
+matches what you requested, the function returns true and fills
+in the @code{awk_value_t} result.
+Otherwise, the function returns false, and the @code{val_type}
+member indicates the type of the actual value. You may then
+print an error message, or reissue the request for the actual
+value type, as appropriate. This behavior is summarized in
+@ref{table-value-types-returned}.
+
+@ifnotplaintext
+@float Table,table-value-types-returned
+@caption{Value Types Returned}
+@multitable @columnfractions .50 .50
+@headitem @tab Type of Actual Value:
+@end multitable
+@multitable @columnfractions .166 .166 .198 .15 .15 .166
+@headitem @tab @tab String @tab Number @tab Array @tab Undefined
+@item @tab @b{String} @tab String @tab String @tab false @tab false
+@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false
+@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false
+@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false
+@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined
+@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false
+@end multitable
+@end float
+@end ifnotplaintext
+@ifplaintext
+@float Table,table-value-types-returned
+@caption{Value Types Returned}
+@example
+ +-------------------------------------------------+
+ | Type of Actual Value: |
+ +------------+------------+-----------+-----------+
+ | String | Number | Array | Undefined |
++-----------+-----------+------------+------------+-----------+-----------+
+| | String | String | String | false | false |
+| |-----------+------------+------------+-----------+-----------+
+| | Number | Number if | Number | false | false |
+| | | can be | | | |
+| | | converted, | | | |
+| | | else false | | | |
+| |-----------+------------+------------+-----------+-----------+
+| Type | Array | false | false | Array | false |
+| Requested |-----------+------------+------------+-----------+-----------+
+| | Scalar | Scalar | Scalar | false | false |
+| |-----------+------------+------------+-----------+-----------+
+| | Undefined | String | Number | Array | Undefined |
+| |-----------+------------+------------+-----------+-----------+
+| | Value | false | false | false | false |
+| | Cookie | | | | |
++-----------+-----------+------------+------------+-----------+-----------+
+@end example
+@end float
+@end ifplaintext
+
+@node Constructor Functions
+@subsection Constructor Functions and Convenience Macros
+
+The API provides a number of @dfn{constructor} functions for creating
+string and numeric values, as well as a number of convenience macros.
+This @value{SUBSECTION} presents them all as function prototypes, in
+the way that extension code would use them.
+
+@table @code
+@item static inline awk_value_t *
+@itemx make_const_string(const char *string, size_t length, awk_value_t *result)
+This function creates a string value in the @code{awk_value_t} variable
+pointed to by @code{result}. It expects @code{string} to be a C string constant
+(or other string data), and automatically creates a @emph{copy} of the data
+for storage in @code{result}. It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result)
+This function creates a string value in the @code{awk_value_t} variable
+pointed to by @code{result}. It expects @code{string} to be a @samp{char *}
+value pointing to data previously obtained from @code{malloc()}. The idea here
+is that the data is passed directly to @command{gawk}, which assumes
+responsibility for it. It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_null_string(awk_value_t *result)
+This specialized function creates a null string (the ``undefined'' value)
+in the @code{awk_value_t} variable pointed to by @code{result}.
+It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_number(double num, awk_value_t *result)
+This function simply creates a numeric value in the @code{awk_value_t} variable
+pointed to by @code{result}.
+@end table
+
+Two convenience macros may be used for allocating storage from @code{malloc()}
+and @code{realloc()}. If the allocation fails, they cause @command{gawk} to
+exit with a fatal error message. They should be used as if they were
+procedure calls that do not return a value.
+
+@table @code
+@item emalloc(pointer, type, size, message)
+The arguments to this macro are as follows:
+@c nested table
+@table @code
+@item pointer
+The pointer variable to point at the allocated storage.
+
+@item type
+The type of the pointer variable, used to create a cast for the call to @code{malloc()}.
+
+@item size
+The total number of bytes to be allocated.
+
+@item message
+A message to be prefixed to the fatal error message. Typically this is the name
+of the function using the macro.
+@end table
+
+@noindent
+For example, you might allocate a string value like so:
+
+@example
+awk_value_t result;
+char *message;
+const char greet[] = "Don't Panic!";
+
+emalloc(message, char *, sizeof(greet), "myfunc");
+strcpy(message, greet);
+make_malloced_string(message, strlen(message), & result);
+@end example
+
+@item erealloc(pointer, type, size, message)
+This is like @code{emalloc()}, but it calls @code{realloc()},
+instead of @code{malloc()}.
+The arguments are the same as for the @code{emalloc()} macro.
+@end table
+
+@node Registration Functions
+@subsection Registration Functions
+
+This @value{SECTION} describes the API functions for
+registering parts of your extension with @command{gawk}.
+
+@menu
+* Extension Functions:: Registering extension functions.
+* Exit Callback Functions:: Registering an exit callback.
+* Extension Version String:: Registering a version string.
+* Input Parsers:: Registering an input parser.
+* Output Wrappers:: Registering an output wrapper.
+* Two-way processors:: Registering a two-way processor.
+@end menu
+
+@node Extension Functions
+@subsubsection Registering An Extension Function
+
+Extension functions are described by the following record:
+
+@example
+typedef struct awk_ext_func @{
+@ @ @ @ const char *name;
+@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
+@ @ @ @ size_t num_expected_args;
+@} awk_ext_func_t;
+@end example
+
+The fields are:
+
+@table @code
+@item const char *name;
+The name of the new function.
+@command{awk} level code calls the function by this name.
+This is a regular C string.
+
+@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
+This is a pointer to the C function that provides the desired
+functionality.
+The function must fill in the result with either a number
+or a string. @command{awk} takes ownership of any string memory.
+As mentioned earlier, string memory @strong{must} come from @code{malloc()}.
+
+The function must return the value of @code{result}.
+This is for the convenience of the calling code inside @command{gawk}.
+
+@item size_t num_expected_args;
+This is the number of arguments the function expects to receive.
+Each extension function may decide what to do if the number of
+arguments isn't what it expected. Following @command{awk} functions, it
+is likely OK to ignore extra arguments.
+@end table
+
+Once you have a record representing your extension function, you register
+it with @command{gawk} using this API function:
+
+@table @code
+@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func);
+This function returns true upon success, false otherwise.
+The @code{namespace} parameter is currently not used; you should pass in an
+empty string (@code{""}). The @code{func} pointer is the address of a
+@code{struct} representing your function, as just described.
+@end table
+
+@node Exit Callback Functions
+@subsubsection Registering An Exit Callback Function
+
+An @dfn{exit callback} function is a function that
+@command{gawk} calls before it exits.
+Such functions are useful if you have general ``clean up'' tasks
+that should be performed in your extension (such as closing data
+base connections or other resource deallocations).
+You can register such
+a function with @command{gawk} using the following function.
+
+@table @code
+@item void awk_atexit(void (*funcp)(void *data, int exit_status),
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0);
+The parameters are:
+@c nested table
+@table @code
+@item funcp
+A pointer to the function to be called before @command{gawk} exits. The @code{data}
+parameter will be the original value of @code{arg0}.
+The @code{exit_status} parameter is
+the exit status value that @command{gawk} will pass to the @code{exit()} system call.
+
+@item arg0
+A pointer to private data which @command{gawk} saves in order to pass to
+the function pointed to by @code{funcp}.
+@end table
+@end table
+
+Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in
+the reverse order in which they are registered with @command{gawk}.
+
+@node Extension Version String
+@subsubsection Registering An Extension Version String
+
+You can register a version string which indicates the name and
+version of your extension, with @command{gawk}, as follows:
+
+@table @code
+@item void register_ext_version(const char *version);
+Register the string pointed to by @code{version} with @command{gawk}.
+@command{gawk} does @emph{not} copy the @code{version} string, so
+it should not be changed.
+@end table
+
+@command{gawk} prints all registered extension version strings when it
+is invoked with the @option{--version} option.
+
+@node Input Parsers
+@subsubsection Customized Input Parsers
+
+By default, @command{gawk} reads text files as its input. It uses the value
+of @code{RS} to find the end of the record, and then uses @code{FS}
+(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}).
+Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}).
+
+If you want, you can provide your own custom input parser. An input
+parser's job is to return a record to the @command{gawk} record processing
+code, along with indicators for the value and length of the data to be
+used for @code{RT}, if any.
+
+To provide an input parser, you must first provide two functions
+(where @var{XXX} is a prefix name for your extension):
+
+@table @code
+@item awk_bool_t @var{XXX}_can_take_file(const awk_input_buf_t *iobuf)
+This function examines the information available in @code{iobuf}
+(which we discuss shortly). Based on the information there, it
+decides if the input parser should be used for this file.
+If so, it should return true. Otherwise, it should return false.
+It should not change any state (variable values, etc.) within @command{gawk}.
+
+@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf)
+When @command{gawk} decides to hand control of the file over to the
+input parser, it calls this function. This function in turn must fill
+in certain fields in the @code{awk_input_buf_t} structure, and ensure
+that certain conditions are true. It should then return true. If an
+error of some kind occurs, it should not fill in any fields, and should
+return false; then @command{gawk} will not use the input parser.
+The details are presented shortly.
+@end table
+
+Your extension should package these functions inside an
+@code{awk_input_parser_t}, which looks like this:
+
+@example
+typedef struct awk_input_parser @{
+ const char *name; /* name of parser */
+ awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
+ awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
+ awk_const struct awk_input_parser *awk_const next; /* for use by gawk */
+@} awk_input_parser_t;
+@end example
+
+The fields are:
+
+@table @code
+@item const char *name;
+The name of the input parser. This is a regular C string.
+
+@item awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
+A pointer to your @code{@var{XXX}_can_take_file()} function.
+
+@item awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
+A pointer to your @code{@var{XXX}_take_control_of()} function.
+
+@item awk_const struct input_parser *awk_const next;
+This pointer is used by @command{gawk}.
+The extension cannot modify it.
+@end table
+
+The steps are as follows:
+
+@enumerate
+@item
+Create a @code{static awk_input_parser_t} variable and initialize it
+appropriately.
+
+@item
+When your extension is loaded, register your input parser with
+@command{gawk} using the @code{register_input_parser()} API function
+(described below).
+@end enumerate
+
+An @code{awk_input_buf_t} looks like this:
+
+@example
+typedef struct awk_input @{
+ const char *name; /* filename */
+ int fd; /* file descriptor */
+#define INVALID_HANDLE (-1)
+ void *opaque; /* private data for input parsers */
+ int (*get_record)(char **out, struct awk_input *iobuf,
+ int *errcode, char **rt_start, size_t *rt_len);
+ void (*close_func)(struct awk_input *iobuf);
+ struct stat sbuf; /* stat buf */
+@} awk_input_buf_t;
+@end example
+
+The fields can be divided into two categories: those for use (initially,
+at least) by @code{@var{XXX}_can_take_file()}, and those for use by
+@code{@var{XXX}_take_control_of()}. The first group of fields and their uses
+are as follows:
+
+@table @code
+@item const char *name;
+The name of the file.
+
+@item int fd;
+A file descriptor for the file. If @command{gawk} was able to
+open the file, then @code{fd} will @emph{not} be equal to
+@code{INVALID_HANDLE}. Otherwise, it will.
+
+@item struct stat sbuf;
+If file descriptor is valid, then @command{gawk} will have filled
+in this structure via a call to the @code{fstat()} system call.
+@end table
+
+The @code{@var{XXX}_can_take_file()} function should examine these
+fields and decide if the input parser should be used for the file.
+The decision can be made based upon @command{gawk} state (the value
+of a variable defined previously by the extension and set by
+@command{awk} code), the name of the
+file, whether or not the file descriptor is valid, the information
+in the @code{struct stat}, or any combination of the above.
+
+Once @code{@var{XXX}_can_take_file()} has returned true, and
+@command{gawk} has decided to use your input parser, it calls
+@code{@var{XXX}_take_control_of()}. That function then fills in at
+least the @code{get_record} field of the @code{awk_input_buf_t}. It must
+also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of
+the fields that may be filled by @code{@var{XXX}_take_control_of()}
+are as follows:
+
+@table @code
+@item void *opaque;
+This is used to hold any state information needed by the input parser
+for this file. It is ``opaque'' to @command{gawk}. The input parser
+is not required to use this pointer.
+
+@item int@ (*get_record)(char@ **out,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len);
+This function pointer should point to a function that creates the input
+records. Said function is the core of the input parser. Its behavior
+is described below.
+
+@item void (*close_func)(struct awk_input *iobuf);
+This function pointer should point to a function that does
+the ``tear down.'' It should release any resources allocated by
+@code{@var{XXX}_take_control_of()}. It may also close the file. If it
+does so, it should set the @code{fd} field to @code{INVALID_HANDLE}.
+
+If @code{fd} is still not @code{INVALID_HANDLE} after the call to this
+function, @command{gawk} calls the regular @code{close()} system call.
+
+Having a ``tear down'' function is optional. If your input parser does
+not need it, do not set this field. Then, @command{gawk} calls the
+regular @code{close()} system call on the file descriptor, so it should
+be valid.
+@end table
+
+The @code{@var{XXX}_get_record()} function does the work of creating
+input records. The parameters are as follows:
+
+@table @code
+@item char **out
+This is a pointer to a @code{char *} variable which is set to point
+to the record. @command{gawk} makes its own copy of the data, so
+the extension must manage this storage.
+
+@item struct awk_input *iobuf
+This is the @code{awk_input_buf_t} for the file. The fields should be
+used for reading data (@code{fd}) and for managing private state
+(@code{opaque}), if any.
+
+@item int *errcode
+If an error occurs, @code{*errcode} should be set to an appropriate
+code from @code{<errno.h>}.
+
+@item char **rt_start
+@itemx size_t *rt_len
+If the concept of a ``record terminator'' makes sense, then
+@code{*rt_start} should be set to point to the data to be used for
+@code{RT}, and @code{*rt_len} should be set to the length of the
+data. Otherwise, @code{*rt_len} should be set to zero.
+@code{gawk} makes its own copy of this data, so the
+extension must manage the storage.
+@end table
+
+The return value is the length of the buffer pointed to by
+@code{*out}, or @code{EOF} if end-of-file was reached or an
+error occurred.
+
+It is guaranteed that @code{errcode} is a valid pointer, so there is no
+need to test for a @code{NULL} value. @command{gawk} sets @code{*errcode}
+to zero, so there is no need to set it unless an error occurs.
+
+If an error does occur, the function should return @code{EOF} and set
+@code{*errcode} to a non-zero value. In that case, if @code{*errcode}
+does not equal @minus{}1, @command{gawk} automatically updates
+the @code{ERRNO} variable based on the value of @code{*errcode} (e.g.,
+setting @samp{*errcode = errno} should do the right thing).
+
+@command{gawk} ships with a sample extension that reads directories,
+returning records for each entry in the directory (@pxref{Extension
+Sample Readdir}). You may wish to use that code as a guide for writing
+your own input parser.
+
+When writing an input parser, you should think about (and document)
+how it is expected to interact with @command{awk} code. You may want
+it to always be called, and take effect as appropriate (as the
+@code{readdir} extension does). Or you may want it to take effect
+based upon the value of an @code{awk} variable, as the XML extension
+from the @code{gawkextlib} project does (@pxref{gawkextlib}).
+In the latter case, code in a @code{BEGINFILE} section
+can look at @code{FILENAME} and @code{ERRNO} to decide whether or
+not to activate an input parser (@pxref{BEGINFILE/ENDFILE}).
+
+You register your input parser with the following function:
+
+@table @code
+@item void register_input_parser(awk_input_parser_t *input_parser);
+Register the input parser pointed to by @code{input_parser} with
+@command{gawk}.
+@end table
+
+@node Output Wrappers
+@subsubsection Customized Output Wrappers
+
+An @dfn{output wrapper} is the mirror image of an input parser.
+It allows an extension to take over the output to a file opened
+with the @samp{>} or @samp{>>} operators (@pxref{Redirection}).
+
+The output wrapper is very similar to the input parser structure:
+
+@example
+typedef struct awk_output_wrapper @{
+ const char *name; /* name of the wrapper */
+ awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
+ awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
+ awk_const struct awk_output_wrapper *awk_const next; /* for use by gawk */
+@} awk_output_wrapper_t;
+@end example
+
+The members are as follows:
+
+@table @code
+@item const char *name;
+This is the name of the output wrapper.
+
+@item awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
+This points to a function that examines the information in
+the @code{awk_output_buf_t} structure pointed to by @code{outbuf}.
+It should return true if the output wrapper wants to take over the
+file, and false otherwise. It should not change any state (variable
+values, etc.) within @command{gawk}.
+
+@item awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
+The function pointed to by this field is called when @command{gawk}
+decides to let the output wrapper take control of the file. It should
+fill in appropriate members of the @code{awk_output_buf_t} structure,
+as described below, and return true if successful, false otherwise.
+
+@item awk_const struct output_wrapper *awk_const next;
+This is for use by @command{gawk}.
+@end table
+
+The @code{awk_output_buf_t} structure looks like this:
+
+@example
+typedef struct awk_output_buf @{
+ const char *name; /* name of output file */
+ const char *mode; /* mode argument to fopen */
+ FILE *fp; /* stdio file pointer */
+ awk_bool_t redirected; /* true if a wrapper is active */
+ void *opaque; /* for use by output wrapper */
+ size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
+ FILE *fp, void *opaque);
+ int (*gawk_fflush)(FILE *fp, void *opaque);
+ int (*gawk_ferror)(FILE *fp, void *opaque);
+ int (*gawk_fclose)(FILE *fp, void *opaque);
+@} awk_output_buf_t;
+@end example
+
+Here too, your extension will define @code{@var{XXX}_can_take_file()}
+and @code{@var{XXX}_take_control_of()} functions that examine and update
+data members in the @code{awk_output_buf_t}.
+The data members are as follows:
+
+@table @code
+@item const char *name;
+The name of the output file.
+
+@item const char *mode;
+The mode string (as would be used in the second argument to @code{fopen()})
+with which the file was opened.
+
+@item FILE *fp;
+The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file
+before attempting to find an output wrapper.
+
+@item awk_bool_t redirected;
+This field must be set to true by the @code{@var{XXX}_take_control_of()} function.
+
+@item void *opaque;
+This pointer is opaque to @command{gawk}. The extension should use it to store
+a pointer to any private data associated with the file.
+
+@item size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ FILE *fp, void *opaque);
+@itemx int (*gawk_fflush)(FILE *fp, void *opaque);
+@itemx int (*gawk_ferror)(FILE *fp, void *opaque);
+@itemx int (*gawk_fclose)(FILE *fp, void *opaque);
+These pointers should be set to point to functions that perform
+the equivalent function as the @code{<stdio.h>} functions do, if appropriate.
+@command{gawk} uses these function pointers for all output.
+@command{gawk} initializes the pointers to point to internal, ``pass through''
+functions that just call the regular @code{<stdio.h>} functions, so an
+extension only needs to redefine those functions that are appropriate for
+what it does.
+@end table
+
+The @code{@var{XXX}_can_take_file()} function should make a decision based
+upon the @code{name} and @code{mode} fields, and any additional state
+(such as @command{awk} variable values) that is appropriate.
+
+When @command{gawk} calls @code{@var{XXX}_take_control_of()}, it should fill
+in the other fields, as appropriate, except for @code{fp}, which it should just
+use normally.
+
+You register your output wrapper with the following function:
+
+@table @code
+@item void register_output_wrapper(awk_output_wrapper_t *output_wrapper);
+Register the output wrapper pointed to by @code{output_wrapper} with
+@command{gawk}.
+@end table
+
+@node Two-way processors
+@subsubsection Customized Two-way Processors
+
+A @dfn{two-way processor} combines an input parser and an output wrapper for
+two-way I/O with the @samp{|&} operator (@pxref{Redirection}). It makes identical
+use of the @code{awk_input_parser_t} and @code{awk_output_buf_t} structures
+as described earlier.
+
+A two-way processor is represented by the following structure:
+
+@example
+typedef struct awk_two_way_processor @{
+ const char *name; /* name of the two-way processor */
+ awk_bool_t (*can_take_two_way)(const char *name);
+ awk_bool_t (*take_control_of)(const char *name,
+ awk_input_buf_t *inbuf,
+ awk_output_buf_t *outbuf);
+ awk_const struct awk_two_way_processor *awk_const next; /* for use by gawk */
+@} awk_two_way_processor_t;
+@end example
+
+The fields are as follows:
+
+@table @code
+@item const char *name;
+The name of the two-way processor.
+
+@item awk_bool_t (*can_take_two_way)(const char *name);
+This function returns true if it wants to take over two-way I/O for this filename.
+It should not change any state (variable
+values, etc.) within @command{gawk}.
+
+@item awk_bool_t (*take_control_of)(const char *name,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf);
+This function should fill in the @code{awk_input_buf_t} and
+@code{awk_outut_buf_t} structures pointed to by @code{inbuf} and
+@code{outbuf}, respectively. These structures were described earlier.
+
+@item awk_const struct two_way_processor *awk_const next;
+This is for use by @command{gawk}.
+@end table
+
+As with the input parser and output processor, you provide
+``yes I can take this'' and ``take over for this'' functions,
+@code{@var{XXX}_can_take_two_way()} and @code{@var{XXX}_take_control_of()}.
+
+You register your two-way processor with the following function:
+
+@table @code
+@item void register_two_way_processor(awk_two_way_processor_t *two_way_processor);
+Register the two-way processor pointed to by @code{two_way_processor} with
+@command{gawk}.
+@end table
+
+@node Printing Messages
+@subsection Printing Messages
+
+You can print different kinds of warning messages from your
+extension, as described below. Note that for these functions,
+you must pass in the extension id received from @command{gawk}
+when the extension was loaded.@footnote{Because the API uses only ISO C 90
+features, it cannot make use of the ISO C 99 variadic macro feature to hide
+that parameter. More's the pity.}
+
+@table @code
+@item void fatal(awk_ext_id_t id, const char *format, ...);
+Print a message and then cause @command{gawk} to exit immediately.
+
+@item void warning(awk_ext_id_t id, const char *format, ...);
+Print a warning message.
+
+@item void lintwarn(awk_ext_id_t id, const char *format, ...);
+Print a ``lint warning.'' Normally this is the same as printing a
+warning message, but if @command{gawk} was invoked with @samp{--lint=fatal},
+then lint warnings become fatal error messages.
+@end table
+
+All of these functions are otherwise like the C @code{printf()}
+family of functions, where the @code{format} parameter is a string
+with literal characters and formatting codes intermixed.
+
+@node Updating @code{ERRNO}
+@subsection Updating @code{ERRNO}
+
+The following functions allow you to update the @code{ERRNO}
+variable:
+
+@table @code
+@item void update_ERRNO_int(int errno_val);
+Set @code{ERRNO} to the string equivalent of the error code
+in @code{errno_val}. The value should be one of the defined
+error codes in @code{<errno.h>}, and @command{gawk} turns it
+into a (possibly translated) string using the C @code{strerror()} function.
+
+@item void update_ERRNO_string(const char *string);
+Set @code{ERRNO} directly to the string value of @code{ERRNO}.
+@command{gawk} makes a copy of the value of @code{string}.
+
+@item void unset_ERRNO();
+Unset @code{ERRNO}.
+@end table
+
+@node Accessing Parameters
+@subsection Accessing and Updating Parameters
+
+Two functions give you access to the arguments (parameters)
+passed to your extension function. They are:
+
+@table @code
+@item awk_bool_t get_argument(size_t count,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Fill in the @code{awk_value_t} structure pointed to by @code{result}
+with the @code{count}'th argument. Return true if the actual
+type matches @code{wanted}, false otherwise. In the latter
+case, @code{result@w{->}val_type} indicates the actual type
+(@pxref{table-value-types-returned}). Counts are zero based---the first
+argument is numbered zero, the second one, and so on. @code{wanted}
+indicates the type of value expected.
+
+@item awk_bool_t set_argument(size_t count, awk_array_t array);
+Convert a parameter that was undefined into an array; this provides
+call-by-reference for arrays. Return false if @code{count} is too big,
+or if the argument's type is not undefined. @xref{Array Manipulation},
+for more information on creating arrays.
+@end table
+
+@node Symbol Table Access
+@subsection Symbol Table Access
+
+Two sets of routines provide access to global variables, and one set
+allows you to create and release cached values.
+
+@menu
+* Symbol table by name:: Accessing variables by name.
+* Symbol table by cookie:: Accessing variables by ``cookie''.
+* Cached values:: Creating and using cached values.
+@end menu
+
+@node Symbol table by name
+@subsubsection Variable Access and Update by Name
+
+The following routines provide the ability to access and update
+global @command{awk}-level variables by name. In compiler terminology,
+identifiers of different kinds are termed @dfn{symbols}, thus the ``sym''
+in the routines' names. The data structure which stores information
+about symbols is termed a @dfn{symbol table}.
+
+@table @code
+@item awk_bool_t sym_lookup(const char *name,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Fill in the @code{awk_value_t} structure pointed to by @code{result}
+with the value of the variable named by the string @code{name}, which is
+a regular C string. @code{wanted} indicates the type of value expected.
+Return true if the actual type matches @code{wanted}, false otherwise
+In the latter case, @code{result->val_type} indicates the actual type
+(@pxref{table-value-types-returned}).
+
+@item awk_bool_t sym_update(const char *name, awk_value_t *value);
+Update the variable named by the string @code{name}, which is a regular
+C string. The variable is added to @command{gawk}'s symbol table
+if it is not there. Return true if everything worked, false otherwise.
+
+Changing types (scalar to array or vice versa) of an existing variable
+is @emph{not} allowed, nor may this routine be used to update an array.
+This routine cannot be used to update any of the predefined
+variables (such as @code{ARGC} or @code{NF}).
+
+@item awk_bool_t sym_constant(const char *name, awk_value_t *value);
+Create a variable named by the string @code{name}, which is
+a regular C string, that has the constant value as given by
+@code{value}. @command{awk}-level code cannot change the value of this
+variable.@footnote{There (currently) is no @code{awk}-level feature that
+provides this ability.} The extension may change the value of @code{name}'s
+variable with subsequent calls to this routine, and may also convert
+a variable created by @code{sym_update()} into a constant. However,
+once a variable becomes a constant, it cannot later be reverted into a
+mutable variable.
+@end table
+
+@node Symbol table by cookie
+@subsubsection Variable Access and Update by Cookie
+
+A @dfn{scalar cookie} is an opaque handle that provide access
+to a global variable or array. It is an optimization that
+avoids looking up variables in @command{gawk}'s symbol table every time
+access is needed. This was discussed earlier, in @ref{General Data Types}.
+
+The following functions let you work with scalar cookies.
+
+@table @code
+@item awk_bool_t sym_lookup_scalar(awk_scalar_t cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Retrieve the current value of a scalar cookie.
+Once you have obtained a scalar_cookie using @code{sym_lookup()}, you can
+use this function to get its value more efficiently.
+Return false if the value cannot be retrieved.
+
+@item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value);
+Update the value associated with a scalar cookie. Return false if
+the new value is not one of @code{AWK_STRING} or @code{AWK_NUMBER}.
+Here too, the built-in variables may not be updated.
+@end table
+
+It is not obvious at first glance how to work with scalar cookies or
+what their @i{raison d@^etre} really is. In theory, the @code{sym_lookup()}
+and @code{sym_update()} routines are all you really need to work with
+variables. For example, you might have code that looked up the value of
+a variable, evaluated a condition, and then possibly changed the value
+of the variable based on the result of that evaluation, like so:
+
+@example
+/* do_magic --- do something really great */
+
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t value;
+
+ if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value)
+ && some_condition(value.num_value)) @{
+ value.num_value += 42;
+ sym_update("MAGIC_VAR", & value);
+ @}
+
+ return make_number(0.0, result);
+@}
+@end example
+
+@noindent
+This code looks (and is) simple and straightforward. So what's the problem?
+
+Consider what happens if @command{awk}-level code associated with your
+extension calls the @code{magic()} function (implemented in C by @code{do_magic()}),
+once per record, while processing hundreds of thousands or millions of records.
+The @code{MAGIC_VAR} variable is looked up in the symbol table once or twice per function call!
+
+The symbol table lookup is really pure overhead; it is considerably more efficient
+to get a cookie that represents the variable, and use that to get the variable's
+value and update it as needed.@footnote{The difference is measurable and quite real. Trust us.}
+
+Thus, the way to use cookies is as follows. First, install your extension's variable
+in @command{gawk}'s symbol table using @code{sym_update()}, as usual. Then get a
+scalar cookie for the variable using @code{sym_lookup()}:
+
+@example
+static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */
+
+static void
+my_extension_init()
+@{
+ awk_value_t value;
+
+ /* install initial value */
+ sym_update("MAGIC_VAR", make_number(42.0, & value));
+
+ /* get cookie */
+ sym_lookup("MAGIC_VAR", AWK_SCALAR, & value);
+
+ /* save the cookie */
+ magic_var_cookie = value.scalar_cookie;
+ @dots{}
+@}
+@end example
+
+Next, use the routines in this section for retrieving and updating
+the value through the cookie. Thus, @code{do_magic()} now becomes
+something like this:
+
+@example
+/* do_magic --- do something really great */
+
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t value;
+
+ if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value)
+ && some_condition(value.num_value)) @{
+ value.num_value += 42;
+ sym_update_scalar(magic_var_cookie, & value);
+ @}
+ @dots{}
+
+ return make_number(0.0, result);
+@}
+@end example
+
+@quotation NOTE
+The previous code omitted error checking for
+presentation purposes. Your extension code should be more robust
+and carefully check the return values from the API functions.
+@end quotation
+
+@node Cached values
+@subsubsection Creating and Using Cached Values
+
+The routines in this section allow you to create and release
+cached values. As with scalar cookies, in theory, cached values
+are not necessary. You can create numbers and strings using
+the functions in @ref{Constructor Functions}. You can then
+assign those values to variables using @code{sym_update()}
+or @code{sym_update_scalar()}, as you like.
+
+However, you can understand the point of cached values if you remember that
+@emph{every} string value's storage @emph{must} come from @code{malloc()}.
+If you have 20 variables, all of which have the same string value, you
+must create 20 identical copies of the string.@footnote{Numeric values
+are clearly less problematic, requiring only a C @code{double} to store.}
+
+It is clearly more efficient, if possible, to create a value once, and
+then tell @command{gawk} to reuse the value for multiple variables. That
+is what the routines in this section let you do. The functions are as follows:
+
+@table @code
+@item awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result);
+Create a cached string or numeric value from @code{value} for efficient later
+assignment.
+Only @code{AWK_NUMBER} and @code{AWK_STRING} values are allowed. Any other type
+is rejected. While @code{AWK_UNDEFINED} could be allowed, doing so would
+result in inferior performance.
+
+@item awk_bool_t release_value(awk_value_cookie_t vc);
+Release the memory associated with a value cookie obtained
+from @code{create_value()}.
+@end table
+
+You use value cookies in a fashion similar to the way you use scalar cookies.
+In the extension initialization routine, you create the value cookie:
+
+@example
+static awk_value_cookie_t answer_cookie; /* static value cookie */
+
+static void
+my_extension_init()
+@{
+ awk_value_t value;
+ char *long_string;
+ size_t long_string_len;
+
+ /* code from earlier */
+ @dots{}
+ /* @dots{} fill in long_string and long_string_len @dots{} */
+ make_malloced_string(long_string, long_string_len, & value);
+ create_value(& value, & answer_cookie); /* create cookie */
+ @dots{}
+@}
+@end example
+
+Once the value is created, you can use it as the value of any number
+of variables:
+
+@example
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t new_value;
+
+ @dots{} /* as earlier */
+
+ value.val_type = AWK_VALUE_COOKIE;
+ value.value_cookie = answer_cookie;
+ sym_update("VAR1", & value);
+ sym_update("VAR2", & value);
+ @dots{}
+ sym_update("VAR100", & value);
+ @dots{}
+@}
+@end example
+
+@noindent
+Using value cookies in this way saves considerable storage, since all of
+@code{VAR1} through @code{VAR100} share the same value.
+
+You might be wondering, ``Is this sharing problematic?
+What happens if @command{awk} code assigns a new value to @code{VAR1},
+are all the others be changed too?''
+
+That's a great question. The answer is that no, it's not a problem.
+Internally, @command{gawk} uses reference-counted strings. This means
+that many variables can share the same string, and @command{gawk}
+keeps track of the usage. When a variable's value changes, @command{gawk}
+simply decrements the reference count on the old value and updates
+the variable to use the new value.
+
+Finally, as part of your clean up action (@pxref{Exit Callback Functions})
+you should release any cached values that you created, using
+@code{release_value()}.
+
+@node Array Manipulation
+@subsection Array Manipulation
+
+The primary data structure@footnote{Okay, the only data structure.} in @command{awk}
+is the associative array (@pxref{Arrays}).
+Extensions need to be able to manipulate @command{awk} arrays.
+The API provides a number of data structures for working with arrays,
+functions for working with individual elements, and functions for
+working with arrays as a whole. This includes the ability to
+``flatten'' an array so that it is easy for C code to traverse
+every element in an array. The array data structures integrate
+nicely with the data structures for values to make it easy to
+both work with and create true arrays of arrays (@pxref{General Data Types}).
+
+@menu
+* Array Data Types:: Data types for working with arrays.
+* Array Functions:: Functions for working with arrays.
+* Flattening Arrays:: How to flatten arrays.
+* Creating Arrays:: How to create and populate arrays.
+@end menu
+
+@node Array Data Types
+@subsubsection Array Data Types
+
+The data types associated with arrays are listed below.
+
+@table @code
+@item typedef void *awk_array_t;
+If you request the value of an array variable, you get back an
+@code{awk_array_t} value. This value is opaque@footnote{It is also
+a ``cookie,'' but the @command{gawk} developers did not wish to overuse this
+term.} to the extension; it uniquely identifies the array but can
+only be used by passing it into API functions or receiving it from API
+functions. This is very similar to way @samp{FILE *} values are used
+with the @code{<stdio.h>} library routines.
+
+@item typedef struct awk_element @{
+@itemx @ @ @ @ /* convenience linked list pointer, not used by gawk */
+@itemx @ @ @ @ struct awk_element *next;
+@itemx @ @ @ @ enum @{
+@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */
+@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if should be deleted */
+@itemx @ @ @ @ @} flags;
+@itemx @ @ @ @ awk_value_t index;
+@itemx @ @ @ @ awk_value_t value;
+@itemx @} awk_element_t;
+The @code{awk_element_t} is a ``flattened''
+array element. @command{awk} produces an array of these
+inside the @code{awk_flat_array_t} (see the next item).
+Individual elements may be marked for deletion. New elements must be added
+individually, one at a time, using the separate API for that purpose.
+The fields are as follows:
+
+@c nested table
+@table @code
+@item struct awk_element *next;
+This pointer is for the convenience of extension writers. It allows
+an extension to create a linked list of new elements that can then be
+added to an array in a loop that traverses the list.
+
+@item enum @{ @dots{} @} flags;
+A set of flag values that convey information between @command{gawk}
+and the extension. Currently there is only one: @code{AWK_ELEMENT_DELETE}.
+Setting it causes @command{gawk} to delete the
+element from the original array upon release of the flattened array.
+
+@item index
+@itemx value
+The index and value of the element, respectively.
+@emph{All} memory pointed to by @code{index} and @code{value} belongs to @command{gawk}.
+@end table
+
+@item typedef struct awk_flat_array @{
+@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */
+@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */
+@itemx @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */
+@itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */
+@itemx @} awk_flat_array_t;
+This is a flattened array. When an extension gets one of these
+from @command{gawk}, the @code{elements} array is of actual
+size @code{count}.
+The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk};
+therefore they are marked @code{awk_const} so that the extension cannot
+modify them.
+@end table
+
+@node Array Functions
+@subsubsection Array Functions
+
+The following functions relate to individual array elements.
+
+@table @code
+@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);
+For the array represented by @code{a_cookie}, return in @code{*count}
+the number of elements it contains. A subarray counts as a single element.
+Return false if there is an error.
+
+@item awk_bool_t get_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t *const index,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+For the array represented by @code{a_cookie}, return in @code{*result}
+the value of the element whose index is @code{index}.
+@code{wanted} specifies the type of value you wish to retrieve.
+Return false if @code{wanted} does not match the actual type or if
+@code{index} is not in the array (@pxref{table-value-types-returned}).
+
+The value for @code{index} can be numeric, in which case @command{gawk}
+converts it to a string. Using non-integral values is possible, but
+requires that you understand how such values are converted to strings
+(@pxref{Conversion}); thus using integral values is safest.
+
+As with @emph{all} strings passed into @code{gawk} from an extension,
+the string value of @code{index} must come from @code{malloc()}, and
+@command{gawk} releases the storage.
+
+@item awk_bool_t set_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const value);
+In the array represented by @code{a_cookie}, create or modify
+the element whose index is given by @code{index}.
+The @code{ARGV} and @code{ENVIRON} arrays may not be changed.
+
+@item awk_bool_t set_array_element_by_elem(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_element_t element);
+Like @code{set_array_element()}, but take the @code{index} and @code{value}
+from @code{element}. This is a convenience macro.
+
+@item awk_bool_t del_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t* const index);
+Remove the element with the given index from the array
+represented by @code{a_cookie}.
+Return true if the element was removed, or false if the element did
+not exist in the array.
+@end table
+
+The following functions relate to arrays as a whole:
+
+@table @code
+@item awk_array_t create_array();
+Create a new array to which elements may be added.
+@xref{Creating Arrays}, for a discussion of how to
+create a new array and add elements to it.
+
+@item awk_bool_t clear_array(awk_array_t a_cookie);
+Clear the array represented by @code{a_cookie}.
+Return false if there was some kind of problem, true otherwise.
+The array remains an array, but after calling this function, it
+has no elements. This is equivalent to using the @code{delete}
+statement (@pxref{Delete}).
+
+@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data);
+For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t}
+structure and fill it in. Set the pointer whose address is passed as @code{data}
+to point to this structure.
+Return true upon success, or false otherwise.
+@xref{Flattening Arrays}, for a discussion of how to
+flatten an array and work with it.
+
+@item awk_bool_t release_flattened_array(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data);
+When done with a flattened array, release the storage using this function.
+You must pass in both the original array cookie, and the address of
+the created @code{awk_flat_array_t} structure.
+The function returns true upon success, false otherwise.
+@end table
+
+@node Flattening Arrays
+@subsubsection Working With All The Elements of an Array
+
+To @dfn{flatten} an array is create a structure that
+represents the full array in a fashion that makes it easy
+for C code to traverse the entire array. Test code
+in @file{extension/testext.c} does this, and also serves
+as a nice example to show how to use the APIs.
+
+First, the @command{gawk} script that drives the test extension:
+
+@example
+@@load "testext"
+BEGIN @{
+ n = split("blacky rusty sophie raincloud lucky", pets)
+ printf("pets has %d elements\n", length(pets))
+ ret = dump_array_and_delete("pets", "3")
+ printf("dump_array_and_delete(pets) returned %d\n", ret)
+ if ("3" in pets)
+ printf("dump_array_and_delete() did NOT remove index \"3\"!\n")
+ else
+ printf("dump_array_and_delete() did remove index \"3\"!\n")
+ print ""
+@}
+@end example
+
+@noindent
+This code creates an array with @code{split()} (@pxref{String Functions})
+and then calls @code{dump_and_delete()}. That function looks up
+the array whose name is passed as the first argument, and
+deletes the element at the index passed in the second argument.
+It then prints the return value and checks if the element
+was indeed deleted. Here is the C code that implements
+@code{dump_array_and_delete()}. It has been edited slightly for
+presentation.
+
+The first part declares variables, sets up the default
+return value in @code{result}, and checks that the function
+was called with the correct number of arguments:
+
+@example
+static awk_value_t *
+dump_array_and_delete(int nargs, awk_value_t *result)
+@{
+ awk_value_t value, value2, value3;
+ awk_flat_array_t *flat_array;
+ size_t count;
+ char *name;
+ int i;
+
+ assert(result != NULL);
+ make_number(0.0, result);
+
+ if (nargs != 2) @{
+ printf("dump_array_and_delete: nargs not right "
+ "(%d should be 2)\n", nargs);
+ goto out;
+ @}
+@end example
+
+The function then proceeds in steps, as follows. First, retrieve
+the name of the array, passed as the first argument. Then
+retrieve the array itself. If either operation fails, print
+error messages and return:
+
+@example
+ /* get argument named array as flat array and print it */
+ if (get_argument(0, AWK_STRING, & value)) @{
+ name = value.str_value.str;
+ if (sym_lookup(name, AWK_ARRAY, & value2))
+ printf("dump_array_and_delete: sym_lookup of %s passed\n",
+ name);
+ else @{
+ printf("dump_array_and_delete: sym_lookup of %s failed\n",
+ name);
+ goto out;
+ @}
+ @} else @{
+ printf("dump_array_and_delete: get_argument(0) failed\n");
+ goto out;
+ @}
+@end example
+
+For testing purposes and to make sure that the C code sees
+the same number of elements as the @command{awk} code,
+the second step is to get the count of elements in the array
+and print it:
+
+@example
+ if (! get_element_count(value2.array_cookie, & count)) @{
+ printf("dump_array_and_delete: get_element_count failed\n");
+ goto out;
+ @}
+
+ printf("dump_array_and_delete: incoming size is %lu\n",
+ (unsigned long) count);
+@end example
+
+The third step is to actually flatten the array, and then
+to double check that the count in the @code{awk_flat_array_t}
+is the same as the count just retrieved:
+
+@example
+ if (! flatten_array(value2.array_cookie, & flat_array)) @{
+ printf("dump_array_and_delete: could not flatten array\n");
+ goto out;
+ @}
+
+ if (flat_array->count != count) @{
+ printf("dump_array_and_delete: flat_array->count (%lu)"
+ " != count (%lu)\n",
+ (unsigned long) flat_array->count,
+ (unsigned long) count);
+ goto out;
+ @}
+@end example
+
+The fourth step is to retrieve the index of the element
+to be deleted, which was passed as the second argument.
+Remember that argument counts passed to @code{get_argument()}
+are zero-based, thus the second argument is numbered one:
+
+@example
+ if (! get_argument(1, AWK_STRING, & value3)) @{
+ printf("dump_array_and_delete: get_argument(1) failed\n");
+ goto out;
+ @}
+@end example
+
+The fifth step is where the ``real work'' is done. The function
+loops over every element in the array, printing the index and
+element values. In addition, upon finding the element with the
+index that is supposed to be deleted, the function sets the
+@code{AWK_ELEMENT_DELETE} bit in the @code{flags} field
+of the element. When the array is released, @command{gawk}
+traverses the flattened array, and deletes any element which
+have this flag bit set:
+
+@example
+ for (i = 0; i < flat_array->count; i++) @{
+ printf("\t%s[\"%.*s\"] = %s\n",
+ name,
+ (int) flat_array->elements[i].index.str_value.len,
+ flat_array->elements[i].index.str_value.str,
+ valrep2str(& flat_array->elements[i].value));
+
+ if (strcmp(value3.str_value.str,
+ flat_array->elements[i].index.str_value.str)
+ == 0) @{
+ flat_array->elements[i].flags |= AWK_ELEMENT_DELETE;
+ printf("dump_array_and_delete: marking element \"%s\" "
+ "for deletion\n",
+ flat_array->elements[i].index.str_value.str);
+ @}
+ @}
+@end example
+
+The sixth step is to release the flattened array. This tells
+@command{gawk} that the extension is no longer using the array,
+and that it should delete any elements marked for deletion.
+@command{gawk} also frees any storage that was allocated,
+so you should not use the pointer (@code{flat_array} in this
+code) once you have called @code{release_flattened_array()}:
+
+@example
+ if (! release_flattened_array(value2.array_cookie, flat_array)) @{
+ printf("dump_array_and_delete: could not release flattened array\n");
+ goto out;
+ @}
+@end example
+
+Finally, since everything was successful, the function sets the
+return value to success, and returns:
+
+@example
+ make_number(1.0, result);
+out:
+ return result;
+@}
+@end example
+
+Here is the output from running this part of the test:
+
+@example
+pets has 5 elements
+dump_array_and_delete: sym_lookup of pets passed
+dump_array_and_delete: incoming size is 5
+ pets["1"] = "blacky"
+ pets["2"] = "rusty"
+ pets["3"] = "sophie"
+dump_array_and_delete: marking element "3" for deletion
+ pets["4"] = "raincloud"
+ pets["5"] = "lucky"
+dump_array_and_delete(pets) returned 1
+dump_array_and_delete() did remove index "3"!
+@end example
+
+@node Creating Arrays
+@subsubsection How To Create and Populate Arrays
+
+Besides working with arrays created by @command{awk} code, you can
+create arrays and populate them as you see fit, and then @command{awk}
+code can access them and manipulate them.
+
+There are two important points about creating arrays from extension code:
+
+@enumerate 1
+@item
+You must install a new array into @command{gawk}'s symbol
+table immediately upon creating it. Once you have done so,
+you can then populate the array.
+
+@ignore
+Strictly speaking, this is required only
+for arrays that will have subarrays as elements; however it is
+a good idea to always do this. This restriction may be relaxed
+in a subsequent revision of the API.
+@end ignore
+
+Similarly, if installing a new array as a subarray of an existing array,
+you must add the new array to its parent before adding any elements to it.
+
+Thus, the correct way to build an array is to work ``top down.'' Create
+the array, and immediately install it in @command{gawk}'s symbol table
+using @code{sym_update()}, or install it as an element in a previously
+existing array using @code{set_element()}. We show example code shortly.
+
+@item
+Due to gawk internals, after using @code{sym_update()} to install an array
+into @command{gawk}, you have to retrieve the array cookie from the value
+passed in to @command{sym_update()} before doing anything else with it, like so:
+
+@example
+awk_value_t index, value;
+awk_array_t new_array;
+
+make_const_string("an index", 8, & index);
+
+new_array = create_array();
+val.val_type = AWK_ARRAY;
+val.array_cookie = new_array;
+
+/* install array in the symbol table */
+sym_update("array", & index, & val);
+
+new_array = val.array_cookie; /* YOU MUST DO THIS */
+@end example
+
+If installing an array as a subarray, you must also retrieve the value
+of the array cookie after the call to @code{set_element()}.
+@end enumerate
+
+The following C code is a simple test extension to create an array
+with two regular elements and with a subarray. The leading @samp{#include}
+directives and boilerplate variable declarations are omitted for brevity.
+The first step is to create a new array and then install it
+in the symbol table:
+
+@example
+@ignore
+#ifdef HAVE_CONFIG_H
+#include <config.h>
+#endif
+
+#include <stdio.h>
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "gawkapi.h"
+
+static const gawk_api_t *api; /* for convenience macros to work */
+static awk_ext_id_t *ext_id;
+static const char *ext_version = "testarray extension: version 1.0";
+
+int plugin_is_GPL_compatible;
+
+@end ignore
+/* create_new_array --- create a named array */
+
+static void
+create_new_array()
+@{
+ awk_array_t a_cookie;
+ awk_array_t subarray;
+ awk_value_t index, value;
+
+ a_cookie = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = a_cookie;
+
+ if (! sym_update("new_array", & value))
+ printf("create_new_array: sym_update(\"new_array\") failed!\n");
+ a_cookie = value.array_cookie;
+@end example
+
+@noindent
+Note how @code{a_cookie} is reset from the @code{array_cookie} field in
+the @code{value} structure.
+
+The second step is to install two regular values into @code{new_array}:
+
+@example
+ (void) make_const_string("hello", 5, & index);
+ (void) make_const_string("world", 5, & value);
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+
+ (void) make_const_string("answer", 6, & index);
+ (void) make_number(42.0, & value);
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+@end example
+
+The third step is to create the subarray and install it:
+
+@example
+ (void) make_const_string("subarray", 8, & index);
+ subarray = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = subarray;
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+ subarray = value.array_cookie;
+@end example
+
+The final step is to populate the subarray with its own element:
+
+@example
+ (void) make_const_string("foo", 3, & index);
+ (void) make_const_string("bar", 3, & value);
+ if (! set_array_element(subarray, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+@}
+@ignore
+static awk_ext_func_t func_table[] = @{
+ @{ NULL, NULL, 0 @}
+@};
+
+/* init_testarray --- additional initialization function */
+
+static awk_bool_t init_testarray(void)
+@{
+ create_new_array();
+
+ return awk_true;
+@}
+
+static awk_bool_t (*init_func)(void) = init_testarray;
+
+dl_load_func(func_table, testarray, "")
+@end ignore
+@end example
+
+Here is sample script that loads the extension
+and then dumps the array:
+
+@example
+@@load "subarray"
+
+function dumparray(name, array, i)
+@{
+ for (i in array)
+ if (isarray(array[i]))
+ dumparray(name "[\"" i "\"]", array[i])
+ else
+ printf("%s[\"%s\"] = %s\n", name, i, array[i])
+@}
+
+BEGIN @{
+ dumparray("new_array", new_array);
+@}
+@end example
+
+Here is the result of running the script:
+
+@example
+$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk}
+@print{} new_array["subarray"]["foo"] = bar
+@print{} new_array["hello"] = world
+@print{} new_array["answer"] = 42
+@end example
+
+@noindent
+(@xref{Finding Extensions}, for more information on the
+@env{AWKLIBPATH} environment variable.)
+
+@node Extension API Variables
+@subsection API Variables
+
+The API provides two sets of variables. The first provides information
+about the version of the API (both with which the extension was compiled,
+and with which @command{gawk} was compiled). The second provides
+information about how @command{gawk} was invoked.
+
+@menu
+* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about
+ @command{gawk}'s invocation.
+@end menu
+
+@node Extension Versioning
+@subsubsection API Version Constants and Variables
+
+The API provides both a ``major'' and a ``minor'' version number.
+The API versions are available at compile time as constants:
+
+@table @code
+@item GAWK_API_MAJOR_VERSION
+The major version of the API.
+
+@item GAWK_API_MINOR_VERSION
+The minor version of the API.
+@end table
+
+The minor version increases when new functions are added to the API. Such
+new functions are always added to the end of the API @code{struct}.
+
+The major version increases (and the minor version is reset to zero) if any
+of the data types change size or member order, or if any of the existing
+functions change signature.
+
+It could happen that an extension may be compiled against one version
+of the API but loaded by a version of @command{gawk} using a different
+version. For this reason, the major and minor API versions of the
+running @command{gawk} are included in the API @code{struct} as read-only
+constant integers:
+
+@table @code
+@item api->major_version
+The major version of the running @command{gawk}.
+
+@item api->minor_version
+The minor version of the running @command{gawk}.
+@end table
+
+It is up to the extension to decide if there are API incompatibilities.
+Typically a check like this is enough:
+
+@example
+if (api->major_version != GAWK_API_MAJOR_VERSION
+ || api->minor_version < GAWK_API_MINOR_VERSION) @{
+ fprintf(stderr, "foo_extension: version mismatch with gawk!\n");
+ fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n",
+ GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION,
+ api->major_version, api->minor_version);
+ exit(1);
+@}
+@end example
+
+Such code is included in the boilerplate @code{dl_load_func()} macro
+provided in @file{gawkapi.h} (discussed later, in
+@ref{Extension API Boilerplate}).
+
+@node Extension API Informational Variables
+@subsubsection Informational Variables
+
+The API provides access to several variables that describe
+whether the corresponding command-line options were enabled when
+@command{gawk} was invoked. The variables are:
+
+@table @code
+@item do_lint
+This variable is true if @command{gawk} was invoked with @option{--lint} option
+(@pxref{Options}).
+
+@item do_traditional
+This variable is true if @command{gawk} was invoked with @option{--traditional} option.
+
+@item do_profile
+This variable is true if @command{gawk} was invoked with @option{--profile} option.
+
+@item do_sandbox
+This variable is true if @command{gawk} was invoked with @option{--sandbox} option.
+
+@item do_debug
+This variable is true if @command{gawk} was invoked with @option{--debug} option.
+
+@item do_mpfr
+This variable is true if @command{gawk} was invoked with @option{--bignum} option.
+@end table
+
+The value of @code{do_lint} can change if @command{awk} code
+modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}).
+The others should not change during execution.
+
+@node Extension API Boilerplate
+@subsection Boilerplate Code
+
+As mentioned earlier (@pxref{Extension Mechanism Outline}), the function
+definitions as presented are really macros. To use these macros, your
+extension must provide a small amount of boilerplate code (variables and
+functions) towards the top of your source file, using pre-defined names
+as described below. The boilerplate needed is also provided in comments
+in the @file{gawkapi.h} header file:
+
+@example
+/* Boiler plate code: */
+int plugin_is_GPL_compatible;
+
+static gawk_api_t *const api;
+static awk_ext_id_t ext_id;
+static const char *ext_version = NULL; /* or @dots{} = "some string" */
+
+static awk_ext_func_t func_table[] = @{
+ @{ "name", do_name, 1 @},
+ /* @dots{} */
+@};
+
+/* EITHER: */
+
+static awk_bool_t (*init_func)(void) = NULL;
+
+/* OR: */
+
+static awk_bool_t
+init_my_module(void)
+@{
+ @dots{}
+@}
+
+static awk_bool_t (*init_func)(void) = init_my_module;
+
+dl_load_func(func_table, some_name, "name_space_in_quotes")
+@end example
+
+These variables and functions are as follows:
+
+@table @code
+@item int plugin_is_GPL_compatible;
+This asserts that the extension is compatible with the GNU GPL
+(@pxref{Copying}). If your extension does not have this, @command{gawk}
+will not load it (@pxref{Plugin License}).
+
+@item static gawk_api_t *const api;
+This global @code{static} variable should be set to point to
+the @code{gawk_api_t} pointer that @command{gawk} passes to your
+@code{dl_load()} function. This variable is used by all of the macros.
+
+@item static awk_ext_id_t ext_id;
+This global static variable should be set to the @code{awk_ext_id_t}
+value that @command{gawk} passes to your @code{dl_load()} function.
+This variable is used by all of the macros.
+
+@item static const char *ext_version = NULL; /* or @dots{} = "some string" */
+This global @code{static} variable should be set either
+to @code{NULL}, or to point to a string giving the name and version of
+your extension.
+
+@item static awk_ext_func_t func_table[] = @{ @dots{} @};
+This is an array of one or more @code{awk_ext_func_t} structures
+as described earlier (@pxref{Extension Functions}).
+It can then be looped over for multiple calls to
+@code{add_ext_func()}.
+
+@item static awk_bool_t (*init_func)(void) = NULL;
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @r{OR}
+@itemx static awk_bool_t init_my_module(void) @{ @dots{} @}
+@itemx static awk_bool_t (*init_func)(void) = init_my_module;
+If you need to do some initialization work, you should define a
+function that does it (creates variables, opens files, etc.)
+and then define the @code{init_func} pointer to point to your
+function.
+The function should return @code{awk_false} upon failure, or @code{awk_true}
+if everything goes well.
+
+If you don't need to do any initialization, define the pointer and
+initialize it to @code{NULL}.
+
+@item dl_load_func(func_table, some_name, "name_space_in_quotes")
+This macro expands to a @code{dl_load()} function that performs
+all the necessary initializations.
+@end table
+
+The point of the all the variables and arrays is to let the
+@code{dl_load()} function (from the @code{dl_load_func()}
+macro) do all the standard work. It does the following:
+
+@enumerate 1
+@item
+Check the API versions. If the extension major version does not match
+@command{gawk}'s, or if the extension minor version is greater than
+@command{gawk}'s, it prints a fatal error message and exits.
+
+@item
+Load the functions defined in @code{func_table}.
+If any of them fails to load, it prints a warning message but
+continues on.
+
+@item
+If the @code{init_func} pointer is not @code{NULL}, call the
+function it points to. If it returns @code{awk_false}, print a
+warning message.
+
+@item
+If @code{ext_version} is not @code{NULL}, register
+the version string with @command{gawk}.
+@end enumerate
+
+@node Finding Extensions
+@subsection How @command{gawk} Finds Extensions
+
+Compiled extensions have to be installed in a directory where
+@command{gawk} can find them. If @command{gawk} is configured and
+built in the default fashion, the directory in which to find
+extensions is @file{/usr/local/lib/gawk}. You can also specify a search
+path with a list of directories to search for compiled extensions.
+@xref{AWKLIBPATH Variable}, for more information.
+
+@node Extension Example
+@section Example: Some File Functions
+
+@quotation
+@i{No matter where you go, there you are.} @*
+Buckaroo Bonzai
+@end quotation
+
+@c It's enough to show chdir and stat, no need for fts
+
+Two useful functions that are not in @command{awk} are @code{chdir()} (so
+that an @command{awk} program can change its directory) and @code{stat()}
+(so that an @command{awk} program can gather information about a file).
+This @value{SECTION} implements these functions for @command{gawk}
+in an extension.
+
+@menu
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
+@end menu
+
+@node Internal File Description
+@subsection Using @code{chdir()} and @code{stat()}
+
+This @value{SECTION} shows how to use the new functions at
+the @command{awk} level once they've been integrated into the
+running @command{gawk} interpreter. Using @code{chdir()} is very
+straightforward. It takes one argument, the new directory to change to:
+
+@example
+@@load "filefuncs"
+@dots{}
+newdir = "/home/arnold/funstuff"
+ret = chdir(newdir)
+if (ret < 0) @{
+ printf("could not change to %s: %s\n",
+ newdir, ERRNO) > "/dev/stderr"
+ exit 1
+@}
+@dots{}
+@end example
+
+The return value is negative if the @code{chdir()} failed, and
+@code{ERRNO} (@pxref{Built-in Variables}) is set to a string indicating
+the error.
+
+Using @code{stat()} is a bit more complicated. The C @code{stat()}
+function fills in a structure that has a fair amount of information.
+The right way to model this in @command{awk} is to fill in an associative
+array with the appropriate information:
+
+@c broke printf for page breaking
+@example
+file = "/home/arnold/.profile"
+ret = stat(file, fdata)
+if (ret < 0) @{
+ printf("could not stat %s: %s\n",
+ file, ERRNO) > "/dev/stderr"
+ exit 1
+@}
+printf("size of %s is %d bytes\n", file, fdata["size"])
+@end example
+
+The @code{stat()} function always clears the data array, even if
+the @code{stat()} fails. It fills in the following elements:
+
+@table @code
+@item "name"
+The name of the file that was @code{stat()}'ed.
+
+@item "dev"
+@itemx "ino"
+The file's device and inode numbers, respectively.
+
+@item "mode"
+The file's mode, as a numeric value. This includes both the file's
+type and its permissions.
+
+@item "nlink"
+The number of hard links (directory entries) the file has.
+
+@item "uid"
+@itemx "gid"
+The numeric user and group ID numbers of the file's owner.
+
+@item "size"
+The size in bytes of the file.
+
+@item "blocks"
+The number of disk blocks the file actually occupies. This may not
+be a function of the file's size if the file has holes.
+
+@item "atime"
+@itemx "mtime"
+@itemx "ctime"
+The file's last access, modification, and inode update times,
+respectively. These are numeric timestamps, suitable for formatting
+with @code{strftime()}
+(@pxref{Time Functions}).
+
+@item "pmode"
+The file's ``printable mode.'' This is a string representation of
+the file's type and permissions, such as is produced by
+@samp{ls -l}---for example, @code{"drwxr-xr-x"}.
+
+@item "type"
+A printable string representation of the file's type. The value
+is one of the following:
+
+@table @code
+@item "blockdev"
+@itemx "chardev"
+The file is a block or character device (``special file'').
+
+@ignore
+@item "door"
+The file is a Solaris ``door'' (special file used for
+interprocess communications).
+@end ignore
+
+@item "directory"
+The file is a directory.
+
+@item "fifo"
+The file is a named-pipe (also known as a FIFO).
+
+@item "file"
+The file is just a regular file.
+
+@item "socket"
+The file is an @code{AF_UNIX} (``Unix domain'') socket in the
+filesystem.
+
+@item "symlink"
+The file is a symbolic link.
+@end table
+@end table
+
+Several additional elements may be present depending upon the operating
+system and the type of the file. You can test for them in your @command{awk}
+program by using the @code{in} operator
+(@pxref{Reference to Elements}):
+
+@table @code
+@item "blksize"
+The preferred block size for I/O to the file. This field is not
+present on all POSIX-like systems in the C @code{stat} structure.
+
+@item "linkval"
+If the file is a symbolic link, this element is the name of the
+file the link points to (i.e., the value of the link).
+
+@item "rdev"
+@itemx "major"
+@itemx "minor"
+If the file is a block or character device file, then these values
+represent the numeric device number and the major and minor components
+of that number, respectively.
+@end table
+
+@node Internal File Ops
+@subsection C Code for @code{chdir()} and @code{stat()}
+
+Here is the C code for these extensions.@footnote{This version is
+edited slightly for presentation. See @file{extension/filefuncs.c}
+in the @command{gawk} distribution for the complete version.}
+
+The file includes a number of standard header files, and then includes
+the @file{gawkapi.h} header file which provides the API definitions.
+Those are followed by the necessary variable declarations
+to make use of the API macros and boilerplate code
+(@pxref{Extension API Boilerplate}).
+
+@c break line for page breaking
+@example
+#ifdef HAVE_CONFIG_H
+#include <config.h>
+#endif
+
+#include <stdio.h>
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "gawkapi.h"
+
+#include "gettext.h"
+#define _(msgid) gettext(msgid)
+#define N_(msgid) msgid
+
+#include "gawkfts.h"
+#include "stack.h"
+
+static const gawk_api_t *api; /* for convenience macros to work */
+static awk_ext_id_t *ext_id;
+static awk_bool_t init_filefuncs(void);
+static awk_bool_t (*init_func)(void) = init_filefuncs;
+static const char *ext_version = "filefuncs extension: version 1.0";
+
+int plugin_is_GPL_compatible;
+@end example
+
+@cindex programming conventions, @command{gawk} internals
+By convention, for an @command{awk} function @code{foo()}, the C function
+that implements it is called @code{do_foo()}. The function should have
+two arguments: the first is an @code{int} usually called @code{nargs},
+that represents the number of actual arguments for the function.
+The second is a pointer to an @code{awk_value_t}, usually named
+@code{result}.
+
+@example
+/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
+
+static awk_value_t *
+do_chdir(int nargs, awk_value_t *result)
+@{
+ awk_value_t newdir;
+ int ret = -1;
+
+ assert(result != NULL);
+
+ if (do_lint && nargs != 1)
+ lintwarn(ext_id,
+ _("chdir: called with incorrect number of arguments, "
+ "expecting 1"));
+@end example
+
+The @code{newdir}
+variable represents the new directory to change to, retrieved
+with @code{get_argument()}. Note that the first argument is
+numbered zero.
+
+If the argument is retrieved successfully, the function calls the
+@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO}
+is updated.
+
+@example
+ if (get_argument(0, AWK_STRING, & newdir)) @{
+ ret = chdir(newdir.str_value.str);
+ if (ret < 0)
+ update_ERRNO_int(errno);
+ @}
+@end example
+
+Finally, the function returns the return value to the @command{awk} level:
+
+@example
+ return make_number(ret, result);
+@}
+@end example
+
+The @code{stat()} extension is more involved. First comes a function
+that turns a numeric mode into a printable representation
+(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
+
+@c break line for page breaking
+@example
+/* format_mode --- turn a stat mode field into something readable */
+
+static char *
+format_mode(unsigned long fmode)
+@{
+ @dots{}
+@}
+@end example
+
+Next comes a function for reading symbolic links, which is also
+omitted here for brevity:
+
+@example
+/* read_symlink --- read a symbolic link into an allocated buffer.
+ @dots{} */
+
+static char *
+read_symlink(const char *fname, size_t bufsize, ssize_t *linksize)
+@{
+ @dots{}
+@}
+@end example
+
+Two helper functions simplify entering values in the
+array that will contain the result of the @code{stat()}:
+
+@example
+/* array_set --- set an array element */
+
+static void
+array_set(awk_array_t array, const char *sub, awk_value_t *value)
+@{
+ awk_value_t index;
+
+ set_array_element(array,
+ make_const_string(sub, strlen(sub), & index),
+ value);
+
+@}
+
+/* array_set_numeric --- set an array element with a number */
+
+static void
+array_set_numeric(awk_array_t array, const char *sub, double num)
+@{
+ awk_value_t tmp;
+
+ array_set(array, sub, make_number(num, & tmp));
+@}
+@end example
+
+The following function does most of the work to fill in
+the @code{awk_array_t} result array with values obtained
+from a valid @code{struct stat}. It is done in a separate function
+to support the @code{stat()} function for @command{gawk} and also
+to support the @code{fts()} extension which is included in
+the same file but whose code is not shown here
+(@pxref{Extension Sample File Functions}).
+
+The first part of the function is variable declarations,
+including a table to map file types to strings:
+
+@example
+/* fill_stat_array --- do the work to fill an array with stat info */
+
+static int
+fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf)
+@{
+ char *pmode; /* printable mode */
+ const char *type = "unknown";
+ awk_value_t tmp;
+ static struct ftype_map @{
+ unsigned int mask;
+ const char *type;
+ @} ftype_map[] = @{
+ @{ S_IFREG, "file" @},
+ @{ S_IFBLK, "blockdev" @},
+ @{ S_IFCHR, "chardev" @},
+ @{ S_IFDIR, "directory" @},
+#ifdef S_IFSOCK
+ @{ S_IFSOCK, "socket" @},
+#endif
+#ifdef S_IFIFO
+ @{ S_IFIFO, "fifo" @},
+#endif
+#ifdef S_IFLNK
+ @{ S_IFLNK, "symlink" @},
+#endif
+#ifdef S_IFDOOR /* Solaris weirdness */
+ @{ S_IFDOOR, "door" @},
+#endif /* S_IFDOOR */
+ @};
+ int j, k;
+@end example
+
+The destination array is cleared, and then code fills in
+various elements based on values in the @code{struct stat}:
+
+@example
+ /* empty out the array */
+ clear_array(array);
+
+ /* fill in the array */
+ array_set(array, "name", make_const_string(name, strlen(name),
+ & tmp));
+ array_set_numeric(array, "dev", sbuf->st_dev);
+ array_set_numeric(array, "ino", sbuf->st_ino);
+ array_set_numeric(array, "mode", sbuf->st_mode);
+ array_set_numeric(array, "nlink", sbuf->st_nlink);
+ array_set_numeric(array, "uid", sbuf->st_uid);
+ array_set_numeric(array, "gid", sbuf->st_gid);
+ array_set_numeric(array, "size", sbuf->st_size);
+ array_set_numeric(array, "blocks", sbuf->st_blocks);
+ array_set_numeric(array, "atime", sbuf->st_atime);
+ array_set_numeric(array, "mtime", sbuf->st_mtime);
+ array_set_numeric(array, "ctime", sbuf->st_ctime);
+
+ /* for block and character devices, add rdev,
+ major and minor numbers */
+ if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) @{
+ array_set_numeric(array, "rdev", sbuf->st_rdev);
+ array_set_numeric(array, "major", major(sbuf->st_rdev));
+ array_set_numeric(array, "minor", minor(sbuf->st_rdev));
+ @}
+@end example
+
+@noindent
+The latter part of the function makes selective additions
+to the destination array, depending upon the availability of
+certain members and/or the type of the file. It then returns zero,
+for success:
+
+@example
+#ifdef HAVE_ST_BLKSIZE
+ array_set_numeric(array, "blksize", sbuf->st_blksize);
+#endif /* HAVE_ST_BLKSIZE */
+
+ pmode = format_mode(sbuf->st_mode);
+ array_set(array, "pmode", make_const_string(pmode, strlen(pmode),
+ & tmp));
+
+ /* for symbolic links, add a linkval field */
+ if (S_ISLNK(sbuf->st_mode)) @{
+ char *buf;
+ ssize_t linksize;
+
+ if ((buf = read_symlink(name, sbuf->st_size,
+ & linksize)) != NULL)
+ array_set(array, "linkval",
+ make_malloced_string(buf, linksize, & tmp));
+ else
+ warning(ext_id, _("stat: unable to read symbolic link `%s'"),
+ name);
+ @}
+
+ /* add a type field */
+ type = "unknown"; /* shouldn't happen */
+ for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) @{
+ if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) @{
+ type = ftype_map[j].type;
+ break;
+ @}
+ @}
+
+ array_set(array, "type", make_const_string(type, strlen(type), &tmp));
+
+ return 0;
+@}
+@end example
+
+Finally, here is the @code{do_stat()} function. It starts with
+variable declarations and argument checking:
+
+@ignore
+Changed message for page breaking. Used to be:
+ "stat: called with incorrect number of arguments (%d), should be 2",
+@end ignore
+@example
+/* do_stat --- provide a stat() function for gawk */
+
+static awk_value_t *
+do_stat(int nargs, awk_value_t *result)
+@{
+ awk_value_t file_param, array_param;
+ char *name;
+ awk_array_t array;
+ int ret;
+ struct stat sbuf;
+ int (*statfunc)(const char *path, struct stat *sbuf) = lstat; /* default */
+
+ assert(result != NULL);
+
+ if (nargs != 2 && nargs != 3) @{
+ if (do_lint)
+ lintwarn(ext_id, _("stat: called with wrong number of arguments"));
+ return make_number(-1, result);
+ @}
+@end example
+
+The third argument to @code{stat()} was not discussed previously. This argument
+is optional. If present, it causes @code{stat()} to use the @code{stat()}
+system call instead of the @code{lstat()} system call.
+
+Then comes the actual work. First, the function gets the arguments.
+Next, it gets the information for the file.
+The code use @code{lstat()} (instead of @code{stat()})
+to get the file information,
+in case the file is a symbolic link.
+If there's an error, it sets @code{ERRNO} and returns:
+
+@example
+ /* file is first arg, array to hold results is second */
+ if ( ! get_argument(0, AWK_STRING, & file_param)
+ || ! get_argument(1, AWK_ARRAY, & array_param)) @{
+ warning(ext_id, _("stat: bad parameters"));
+ return make_number(-1, result);
+ @}
+
+ if (nargs == 3) @{
+ statfunc = stat;
+ @}
+
+ name = file_param.str_value.str;
+ array = array_param.array_cookie;
+
+ /* always empty out the array */
+ clear_array(array);
+
+ /* stat the file, if error, set ERRNO and return */
+ ret = statfunc(name, & sbuf);
+ if (ret < 0) @{
+ update_ERRNO_int(errno);
+ return make_number(ret, result);
+ @}
+@end example
+
+The tedious work is done by @code{fill_stat_array()}, shown
+earlier. When done, return the result from @code{fill_stat_array()}:
+
+@example
+ ret = fill_stat_array(name, array, & sbuf);
+
+ return make_number(ret, result);
+@}
+@end example
+
+@cindex programming conventions, @command{gawk} internals
+Finally, it's necessary to provide the ``glue'' that loads the
+new function(s) into @command{gawk}.
+
+The @code{filefuncs} extension also provides an @code{fts()}
+function, which we omit here. For its sake there is an initialization
+function:
+
+@example
+/* init_filefuncs --- initialization routine */
+
+static awk_bool_t
+init_filefuncs(void)
+@{
+ @dots{}
+@}
+@end example
+
+We are almost done. We need an array of @code{awk_ext_func_t}
+structures for loading each function into @command{gawk}:
+
+@example
+static awk_ext_func_t func_table[] = @{
+ @{ "chdir", do_chdir, 1 @},
+ @{ "stat", do_stat, 2 @},
+ @{ "fts", do_fts, 3 @},
+@};
+@end example
+
+Each extension must have a routine named @code{dl_load()} to load
+everything that needs to be loaded. It is simplest to use the
+@code{dl_load_func()} macro in @code{gawkapi.h}:
+
+@example
+/* define the dl_load() function using the boilerplate macro */
+
+dl_load_func(func_table, filefuncs, "")
+@end example
+
+And that's it! As an exercise, consider adding functions to
+implement system calls such as @code{chown()}, @code{chmod()},
+and @code{umask()}.
+
+@node Using Internal File Ops
+@subsection Integrating The Extensions
+
+@cindex @command{gawk}, interpreter@comma{} adding code to
+Now that the code is written, it must be possible to add it at
+runtime to the running @command{gawk} interpreter. First, the
+code must be compiled. Assuming that the functions are in
+a file named @file{filefuncs.c}, and @var{idir} is the location
+of the @file{gawkapi.h} header file,
+the following steps@footnote{In practice, you would probably want to
+use the GNU Autotools---Automake, Autoconf, Libtool, and Gettext---to
+configure and build your libraries. Instructions for doing so are beyond
+the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to
+the tools.} create a GNU/Linux shared library:
+
+@example
+$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c}
+$ @kbd{gcc -o filefuncs.so -shared filefuncs.o}
+@end example
+
+Once the library exists, it is loaded by using the @code{@@load} keyword.
+
+@example
+# file testff.awk
+@@load "filefuncs"
+
+BEGIN @{
+ "pwd" | getline curdir # save current directory
+ close("pwd")
+
+ chdir("/tmp")
+ system("pwd") # test it
+ chdir(curdir) # go back
+
+ print "Info for testff.awk"
+ ret = stat("testff.awk", data)
+ print "ret =", ret
+ for (i in data)
+ printf "data[\"%s\"] = %s\n", i, data[i]
+ print "testff.awk modified:",
+ strftime("%m %d %y %H:%M:%S", data["mtime"])
+
+ print "\nInfo for JUNK"
+ ret = stat("JUNK", data)
+ print "ret =", ret
+ for (i in data)
+ printf "data[\"%s\"] = %s\n", i, data[i]
+ print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
+@}
+@end example
+
+The @env{AWKLIBPATH} environment variable tells
+@command{gawk} where to find shared libraries (@pxref{Finding Extensions}).
+We set it to the current directory and run the program:
+
+@example
+$ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk}
+@print{} /tmp
+@print{} Info for testff.awk
+@print{} ret = 0
+@print{} data["blksize"] = 4096
+@print{} data["mtime"] = 1350838628
+@print{} data["mode"] = 33204
+@print{} data["type"] = file
+@print{} data["dev"] = 2053
+@print{} data["gid"] = 1000
+@print{} data["ino"] = 1719496
+@print{} data["ctime"] = 1350838628
+@print{} data["blocks"] = 8
+@print{} data["nlink"] = 1
+@print{} data["name"] = testff.awk
+@print{} data["atime"] = 1350838632
+@print{} data["pmode"] = -rw-rw-r--
+@print{} data["size"] = 662
+@print{} data["uid"] = 1000
+@print{} testff.awk modified: 10 21 12 18:57:08
+@print{}
+@print{} Info for JUNK
+@print{} ret = -1
+@print{} JUNK modified: 01 01 70 02:00:00
+@end example
+
+@node Extension Samples
+@section The Sample Extensions In The @command{gawk} Distribution
+
+This @value{SECTION} provides brief overviews of the sample extensions
+that come in the @command{gawk} distribution. Some of them are intended
+for production use, such the @code{filefuncs} and @code{readdir} extensions.
+Others mainly provide example code that shows how to use the extension API.
+
+@menu
+* Extension Sample File Functions:: The file functions sample.
+* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
+* Extension Sample Fork:: An interface to @code{fork()} and other
+ process functions.
+* Extension Sample Ord:: Character to value to character
+ conversions.
+* Extension Sample Readdir:: An interface to @code{readdir()}.
+* Extension Sample Revout:: Reversing output sample output wrapper.
+* Extension Sample Rev2way:: Reversing data sample two-way processor.
+* Extension Sample Read write array:: Serializing an array to a file.
+* Extension Sample Readfile:: Reading an entire file into a string.
+* Extension Sample API Tests:: Tests for the API.
+* Extension Sample Time:: An interface to @code{gettimeofday()}
+ and @code{sleep()}.
+@end menu
+
+@node Extension Sample File Functions
+@subsection File Related Functions
+
+The @code{filefuncs} extension provides three different functions, as follows:
+The usage is:
+
+@table @code
+@item @@load "filefuncs"
+This is how you load the extension.
+
+@item result = chdir("/some/directory")
+The @code{chdir()} function is a direct hook to the @code{chdir()}
+system call to change the current directory. It returns zero
+upon success or less than zero upon error. In the latter case it updates
+@code{ERRNO}.
+
+@item result = stat("/some/path", statdata [, follow])
+The @code{stat()} function provides a hook into the
+@code{stat()} system call.
+It returns zero upon success or less than zero upon error.
+In the latter case it updates @code{ERRNO}.
+
+By default, it uses the @code{lstat()} system call. However, if passed
+a third argument, it uses @code{stat()} instead.
+
+In all cases, it clears the @code{statdata} array.
+When the call is successful, @code{stat()} fills the @code{statdata}
+array with information retrieved from the filesystem, as follows:
+
+@c nested table
+@multitable @columnfractions .25 .60
+@item @code{statdata["name"]} @tab
+The name of the file.
+
+@item @code{statdata["dev"]} @tab
+Corresponds to the @code{st_dev} field in the @code{struct stat}.
+
+@item @code{statdata["ino"]} @tab
+Corresponds to the @code{st_ino} field in the @code{struct stat}.
+
+@item @code{statdata["mode"]} @tab
+Corresponds to the @code{st_mode} field in the @code{struct stat}.
+
+@item @code{statdata["nlink"]} @tab
+Corresponds to the @code{st_nlink} field in the @code{struct stat}.
+
+@item @code{statdata["uid"]} @tab
+Corresponds to the @code{st_uid} field in the @code{struct stat}.
+
+@item @code{statdata["gid"]} @tab
+Corresponds to the @code{st_gid} field in the @code{struct stat}.
+
+@item @code{statdata["size"]} @tab
+Corresponds to the @code{st_size} field in the @code{struct stat}.
+
+@item @code{statdata["atime"]} @tab
+Corresponds to the @code{st_atime} field in the @code{struct stat}.
+
+@item @code{statdata["mtime"]} @tab
+Corresponds to the @code{st_mtime} field in the @code{struct stat}.
+
+@item @code{statdata["ctime"]} @tab
+Corresponds to the @code{st_ctime} field in the @code{struct stat}.
+
+@item @code{statdata["rdev"]} @tab
+Corresponds to the @code{st_rdev} field in the @code{struct stat}.
+This element is only present for device files.
+
+@item @code{statdata["major"]} @tab
+Corresponds to the @code{st_major} field in the @code{struct stat}.
+This element is only present for device files.
+
+@item @code{statdata["minor"]} @tab
+Corresponds to the @code{st_minor} field in the @code{struct stat}.
+This element is only present for device files.
+
+@item @code{statdata["blksize"]} @tab
+Corresponds to the @code{st_blksize} field in the @code{struct stat}.
+if this field is present on your system.
+(It is present on all modern systems that we know of.)
+
+@item @code{statdata["pmode"]} @tab
+A human-readable version of the mode value, such as printed by
+@command{ls}. For example, @code{"-rwxr-xr-x"}.
+
+@item @code{statdata["linkval"]} @tab
+If the named file is a symbolic link, this element will exist
+and its value is the value of the symbolic link (where the
+symbolic link points to).
+
+@item @code{statdata["type"]} @tab
+The type of the file as a string. One of
+@code{"file"},
+@code{"blockdev"},
+@code{"chardev"},
+@code{"directory"},
+@code{"socket"},
+@code{"fifo"},
+@code{"symlink"},
+@code{"door"},
+or
+@code{"unknown"}.
+Not all systems support all file types.
+@end multitable
+
+@item flags = or(FTS_PHYSICAL, ...)
+@itemx result = fts(pathlist, flags, filedata)
+Walk the file trees provided in @code{pathlist} and fill in the
+@code{filedata} array as described below. @code{flags} is the bitwise
+OR of several predefined constant values, also as described below.
+Return zero if there were no errors, otherwise return @minus{}1.
+@end table
+
+The @code{fts()} function provides a hook to the C library @code{fts()}
+routines for traversing file hierarchies. Instead of returning data
+about one file at a time in a stream, it fills in a multi-dimensional
+array with data about each file and directory encountered in the requested
+hierarchies.
+
+The arguments are as follows:
+
+@table @code
+@item pathlist
+An array of filenames. The element values are used; the index values are ignored.
+
+@item flags
+This should be the bitwise OR of one or more of the following
+predefined constant flag values. At least one of
+@code{FTS_LOGICAL} or @code{FTS_PHYSICAL} must be provided; otherwise
+@code{fts()} returns an error value and sets @code{ERRNO}.
+The flags are:
+
+@c nested table
+@table @code
+@item FTS_LOGICAL
+Do a ``logical'' file traversal, where the information returned for
+a symbolic link refers to the linked-to file, and not to the symbolic
+link itself. This flag is mutually exclusive with @code{FTS_PHYSICAL}.
+
+@item FTS_PHYSICAL
+Do a ``physical'' file traversal, where the information returned for a
+symbolic link refers to the symbolic link itself. This flag is mutually
+exclusive with @code{FTS_LOGICAL}.
+
+@item FTS_NOCHDIR
+As a performance optimization, the C library @code{fts()} routines
+change directory as they traverse a file hierarchy. This flag disables
+that optimization.
+
+@item FTS_COMFOLLOW
+Immediately follow a symbolic link named in @code{pathlist},
+whether or not @code{FTS_LOGICAL} is set.
+
+@item FTS_SEEDOT
+By default, the @code{fts()} routines do not return entries for @file{.}
+and @file{..}. This option causes entries for @file{..} to also
+be included. (The extension always includes an entry for @file{.},
+see below.)
+
+@item FTS_XDEV
+During a traversal, do not cross onto a different mounted filesystem.
+@end table
+
+@item filedata
+The @code{filedata} array is first cleared. Then, @code{fts()} creates
+an element in @code{filedata} for every element in @code{pathlist}.
+The index is the name of the directory or file given in @code{pathlist}.
+The element for this index is itself an array. There are two cases.
+
+@c nested table
+@table @emph
+@item The path is a file.
+In this case, the array contains two or three elements:
+
+@c doubly nested table
+@table @code
+@item "path"
+The full path to this file, starting from the ``root'' that was given
+in the @code{pathlist} array.
+
+@item "stat"
+This element is itself an array, containing the same information as provided
+by the @code{stat()} function described earlier for its
+@code{statdata} argument. The element may not be present if
+the @code{stat()} system call for the file failed.
+
+@item "error"
+If some kind of error was encountered, the array will also
+contain an element named @code{"error"}, which is a string describing the error.
+@end table
+
+@item The path is a directory.
+In this case, the array contains one element for each entry in the
+directory. If an entry is a file, that element is as for files, just
+described. If the entry is a directory, that element is (recursively),
+an array describing the subdirectory. If @code{FTS_SEEDOT} was provided
+in the flags, then there will also be an element named @code{".."}. This
+element will be an array containing the data as provided by @code{stat()}.
+
+In addition, there will be an element whose index is @code{"."}.
+This element is an array containing the same two or three elements as
+for a file: @code{"path"}, @code{"stat"}, and @code{"error"}.
+@end table
+@end table
+
+The @code{fts()} function returns zero if there were no errors.
+Otherwise it returns @minus{}1.
+
+@quotation NOTE
+The @code{fts()} extension does not exactly mimic the
+interface of the C library @code{fts()} routines, choosing instead to
+provide an interface that is based on associative arrays, which should
+be more comfortable to use from an @command{awk} program. This includes the
+lack of a comparison function, since @command{gawk} already provides
+powerful array sorting facilities. While an @code{fts_read()}-like
+interface could have been provided, this felt less natural than simply
+creating a multi-dimensional array to represent the file hierarchy and
+its information.
+@end quotation
+
+See @file{test/fts.awk} in the @command{gawk} distribution for an example.
+
+@node Extension Sample Fnmatch
+@subsection Interface To @code{fnmatch()}
+
+This extension provides an interface to the C library
+@code{fnmatch()} function. The usage is:
+
+@example
+@@load "fnmatch"
+
+result = fnmatch(pattern, string, flags)
+@end example
+
+The @code{fnmatch} extension adds a single function named
+@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of
+flag values named @code{FNM}.
+
+The arguments to @code{fnmatch()} are:
+
+@table @code
+@item pattern
+The filename wildcard to match.
+
+@item string
+The filename string,
+
+@item flag
+Either zero, or the bitwise OR of one or more of the
+flags in the @code{FNM} array.
+@end table
+
+The return value is zero on success, @code{FNM_NOMATCH}
+if the string did not match the pattern, or
+a different non-zero value if an error occurred.
+
+The flags are follows:
+
+@multitable @columnfractions .25 .75
+@item @code{FNM["CASEFOLD"]} @tab
+Corresponds to the @code{FNM_CASEFOLD} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["FILE_NAME"]} @tab
+Corresponds to the @code{FNM_FILE_NAME} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["LEADING_DIR"]} @tab
+Corresponds to the @code{FNM_LEADING_DIR} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["NOESCAPE"]} @tab
+Corresponds to the @code{FNM_NOESCAPE} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["PATHNAME"]} @tab
+Corresponds to the @code{FNM_PATHNAME} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["PERIOD"]} @tab
+Corresponds to the @code{FNM_PERIOD} flag as defined in @code{fnmatch()}.
+@end multitable
+
+Here is an example:
+
+@example
+@@load "fnmatch"
+@dots{}
+flags = or(FNM["PERIOD"], FNM["NOESCAPE"])
+if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH)
+ print "no match"
+@end example
+
+@node Extension Sample Fork
+@subsection Interface To @code{fork()}, @code{wait()} and @code{waitpid()}
+
+The @code{fork} extension adds three functions, as follows.
+
+@table @code
+@item @@load "fork"
+This is how you load the extension.
+
+@item pid = fork()
+This function creates a new process. The return value is the zero in the
+child and the process-id number of the child in the parent, or @minus{}1
+upon error. In the latter case, @code{ERRNO} indicates the problem.
+In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are
+updated to reflect the correct values.
+
+@item ret = waitpid(pid)
+This function takes a numeric argument, which is the process-id to
+wait for. The return value is that of the
+@code{waitpid()} system call.
+
+@item ret = wait()
+This function waits for the first child to die.
+The return value is that of the
+@code{wait()} system call.
+@end table
+
+There is no corresponding @code{exec()} function.
+
+Here is an example:
+
+@example
+@@load "fork"
+@dots{}
+if ((pid = fork()) == 0)
+ print "hello from the child"
+else
+ print "hello from the parent"
+@end example
+
+@node Extension Sample Ord
+@subsection Character and Numeric values: @code{ord()} and @code{chr()}
+
+The @code{ordchr} extension adds two functions, named
+@code{ord()} and @code{chr()}, as follows.
+
+@table @code
+@item number = ord(string)
+Return the numeric value of the first character in @code{string}.
+
+@item char = chr(number)
+Return the string whose first character is that represented by @code{number}.
+@end table
+
+These functions are inspired by the Pascal language functions
+of the same name. Here is an example:
+
+@example
+@@load "ordchr"
+@dots{}
+printf("The numeric value of 'A' is %d\n", ord("A"))
+printf("The string value of 65 is %s\n", chr(65))
+@end example
+
+@node Extension Sample Readdir
+@subsection Reading Directories
+
+The @code{readdir} extension adds an input parser for directories.
+The usage is as follows:
+
+@example
+@@load "readdir"
+@end example
+
+When this extension is in use, instead of skipping directories named
+on the command line (or with @code{getline}),
+they are read, with each entry returned as a record.
+
+The record consists of three fields. The first two are the inode number and the
+filename, separated by a forward slash character.
+On systems where the directory entry contains the file type, the record
+has a third field which is a single letter indicating the type of the
+file:
+
+@multitable @columnfractions .1 .9
+@headitem Letter @tab File Type
+@item @code{b} @tab Block device
+@item @code{c} @tab Character device
+@item @code{d} @tab Directory
+@item @code{f} @tab Regular file
+@item @code{l} @tab Symbolic link
+@item @code{p} @tab Named pipe (FIFO)
+@item @code{s} @tab Socket
+@item @code{u} @tab Anything else (unknown)
+@end multitable
+
+On systems without the file type information, the third field is always
+@samp{u}.
+
+@quotation NOTE
+On GNU/Linux systems, there are filesystems that don't support the
+@code{d_type} entry (see the @i{readdir}(3) manual page), and so the file
+type is always @samp{u}. You can use the @code{filefuncs} extension to call
+@code{stat()} in order to get correct type information.
+@end quotation
+
+Here is an example:
+
+@example
+@@load "readdir"
+@dots{}
+BEGIN @{ FS = "/" @}
+@{ print "file name is", $2 @}
+@end example
+
+@node Extension Sample Revout
+@subsection Reversing Output
+
+The @code{revoutput} extension adds a simple output wrapper that reverses
+the characters in each output line. It's main purpose is to show how to
+write an output wrapper, although it may be mildly amusing for the unwary.
+Here is an example:
+
+@example
+@@load "revoutput"
+
+BEGIN @{
+ REVOUT = 1
+ print "hello, world" > "/dev/stdout"
+@}
+@end example
+
+The output from this program is:
+@samp{dlrow ,olleh}.
+
+@node Extension Sample Rev2way
+@subsection Two-Way I/O Example
+
+The @code{revtwoway} extension adds a simple two-way processor that
+reverses the characters in each line sent to it for reading back by
+the @command{awk} program. It's main purpose is to show how to write
+a two-way processor, although it may also be mildly amusing.
+The following example shows how to use it:
+
+@example
+@@load "revtwoway"
+
+BEGIN @{
+ cmd = "/magic/mirror"
+ print "hello, world" |& cmd
+ cmd |& getline result
+ print result
+ close(cmd)
+@}
+@end example
+
+@node Extension Sample Read write array
+@subsection Dumping and Restoring An Array
+
+The @code{rwarray} extension adds two functions,
+named @code{writea()} and @code{reada()}, as follows:
+
+@table @code
+@item ret = writea(file, array)
+This function takes a string argument, which is the name of the file
+to which dump the array, and the array itself as the second argument.
+@code{writea()} understands multidimensional arrays. It returns one on
+success, or zero upon failure.
+
+@item ret = reada(file, array)
+@code{reada()} is the inverse of @code{writea()};
+it reads the file named as its first argument, filling in
+the array named as the second argument. It clears the array first.
+Here too, the return value is one on success and zero upon failure.
+@end table
+
+The array created by @code{reada()} is identical to that written by
+@code{writea()} in the sense that the contents are the same. However,
+due to implementation issues, the array traversal order of the recreated
+array is likely to be different from that of the original array. As array
+traversal order in @command{awk} is by default undefined, this is not
+(technically) a problem. If you need to guarantee a particular traversal
+order, use the array sorting features in @command{gawk} to do so
+(@pxref{Array Sorting}).
+
+The file contains binary data. All integral values are written in network
+byte order. However, double precision floating-point values are written
+as native binary data. Thus, arrays containing only string data can
+theoretically be dumped on systems with one byte order and restored on
+systems with a different one, but this has not been tried.
+
+Here is an example:
+
+@example
+@@load "rwarray"
+@dots{}
+ret = writea("arraydump.bin", array)
+@dots{}
+ret = reada("arraydump.bin", array)
+@end example
+
+@node Extension Sample Readfile
+@subsection Reading An Entire File
+
+The @code{readfile} extension adds a single function
+named @code{readfile()}:
+
+@table @code
+@item result = readfile("/some/path")
+The argument is the name of the file to read. The return value is a
+string containing the entire contents of the requested file. Upon error,
+the function returns the empty string and sets @code{ERRNO}.
+@end table
+
+Here is an example:
+
+@example
+@@load "readfile"
+@dots{}
+contents = readfile("/path/to/file");
+if (contents == "" && ERRNO != "") @{
+ print("problem reading file", ERRNO) > "/dev/stderr"
+ ...
+@}
+@end example
+
+@node Extension Sample API Tests
+@subsection API Tests
+
+The @code{testext} extension exercises parts of the extension API that
+are not tested by the other samples. The @file{extension/testext.c}
+file contains both the C code for the extension and @command{awk}
+test code inside C comments that run the tests. The testing framework
+extracts the @command{awk} code and runs the tests. See the source file
+for more information.
+
+@node Extension Sample Time
+@subsection Extension Time Functions
+
+@cindex time
+@cindex sleep
+
+These functions can be used by either invoking @command{gawk}
+with a command-line argument of @samp{-l time} or by
+inserting @samp{@@load "time"} in your script.
+
+@table @code
+
+@cindex @code{gettimeofday} time extension function
+@item the_time = gettimeofday()
+Return the time in seconds that has elapsed since 1970-01-01 UTC as a
+floating point value. If the time is unavailable on this platform, return
+@minus{}1 and set @code{ERRNO}. The returned time should have sub-second
+precision, but the actual precision may vary based on the platform.
+If the standard C @code{gettimeofday()} system call is available on this
+platform, then it simply returns the value. Otherwise, if on Windows,
+it tries to use @code{GetSystemTimeAsFileTime()}.
+
+@cindex @code{sleep} time extension function
+@item result = sleep(@var{seconds})
+Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative,
+or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}.
+Otherwise, return zero after sleeping for the indicated amount of time.
+Note that @var{seconds} may be a floating-point (non-integral) value.
+Implementation details: depending on platform availability, this function
+tries to use @code{nanosleep()} or @code{select()} to implement the delay.
+@end table
+
+@node gawkextlib
+@section The @code{gawkextlib} Project
+
+The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}}
+project provides a number of @command{gawk} extensions, including one for
+processing XML files. This is the evolution of the original @command{xgawk}
+(XML @command{gawk}) project.
+
+As of this writing, there are four extensions:
+
+@itemize @bullet
+@item
+XML parser extension, using the @uref{http://expat.sourceforge.net, Expat}
+XML parsing library.
+
+@item
+PostgreSQL extension.
+
+@item
+GD graphics library extension.
+
+@item
+MPFR library extension.
+This provides access to a number of MPFR functions which @command{gawk}'s
+native MPFR support does not.
+@end itemize
+
+The @code{time} extension described earlier (@pxref{Extension Sample
+Time}) was originally from this project but has been moved in to the
+main @command{gawk} distribution.
+
+You can check out the code for the @code{gawkextlib} project
+using the @uref{http://git-scm.com, GIT} distributed source
+code control system. The command is as follows:
+
+@example
+git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code
+@end example
+
+You will need to have the @uref{http://expat.sourceforge.net, Expat}
+XML parser library installed in order to build and use the XML extension.
+
+In addition, you must have the GNU Autotools installed
+(@uref{http://www.gnu.org/software/autoconf, Autoconf},
+@uref{http://www.gnu.org/software/automake, Automake},
+@uref{http://www.gnu.org/software/libtool, Libtool},
+and
+@uref{http://www.gnu.org/software/gettext, Gettext}).
+
+The simple recipe for building and testing @code{gawkextlib} is as follows.
+First, build and install @command{gawk}:
+
+@example
+cd .../path/to/gawk/code
+./configure --prefix=/tmp/newgawk @ii{Install in /tmp/newgawk for now}
+make && make check @ii{Build and check that all is OK}
+make install @ii{Install gawk}
+@end example
+
+Next, build @code{gawkextlib} and test it:
+
+@example
+cd .../path/to/gawkextlib-code
+./update-autotools @ii{Generate configure, etc.}
+ @ii{You may have to run this command twice}
+./configure --with-gawk=/tmp/newgawk @ii{Configure, point at ``installed'' gawk}
+make && make check @ii{Build and check that all is OK}
+@end example
+
+If you write an extension that you wish to share with other
+@command{gawk} users, please consider doing so through the
+@code{gawkextlib} project.
+
+@iftex
+@part Part IV:@* Appendices
+@end iftex
+
+@ignore
+@ifdocbook
+
+@part Part IV:@* Appendices
+
+Part IV provides the appendices, the Glossary, and two licenses that cover
the @command{gawk} source code and this @value{DOCUMENT}, respectively.
-It contains the following appendixes:
+It contains the following appendices:
@itemize @bullet
@item
@@ -26505,11 +32022,7 @@ It contains the following appendixes:
@item
@ref{GNU Free Documentation License}.
@end itemize
-
-@page
-@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
-@oddheading @| @| @strong{@thischapter}@ @ @ @thispage
-@end iftex
+@end ifdocbook
@end ignore
@node Language History
@@ -26527,8 +32040,6 @@ This @value{CHAPTER} briefly describes the
evolution of the @command{awk} language, with cross-references to other parts
of the @value{DOCUMENT} where you can find more information.
-@c FIXME: Try to determine whether it was 3.1 or 3.2 that had new awk.
-
@menu
* V7/SVR3.1:: The major changes between V7 and System V
Release 3.1.
@@ -26944,6 +32455,7 @@ and
@code{xor()}
functions for bit manipulation
(@pxref{Bitwise Functions}).
+@c In 4.1, and(), or() and xor() grew the ability to take > 2 arguments
@item
The @code{asort()} and @code{asorti()} functions for sorting arrays
@@ -26955,11 +32467,6 @@ functions for internationalization
(@pxref{Programmer i18n}).
@item
-The @code{extension()} built-in function and the ability to add
-new functions dynamically
-(@pxref{Dynamic Extensions}).
-
-@item
The @code{fflush()} function from Brian Kernighan's
version of @command{awk}
(@pxref{I/O Functions}).
@@ -26986,29 +32493,70 @@ the @option{-f} command-line option
(@pxref{Options}).
@item
-The ability to use GNU-style long-named options that start with @option{--}
+The @env{AWKLIBPATH} environment variable for specifying a path search for
+the @option{-l} command-line option
+(@pxref{Options}).
+
+@item
+The
+@option{-b},
+@option{-c},
+@option{-C},
+@option{-d},
+@option{-D},
+@option{-e},
+@option{-E},
+@option{-g},
+@option{-h},
+@option{-i},
+@option{-l},
+@option{-L},
+@option{-M},
+@option{-n},
+@option{-N},
+@option{-o},
+@option{-O},
+@option{-p},
+@option{-P},
+@option{-r},
+@option{-S},
+@option{-t},
+and
+@option{-V}
+short options. Also, the
+ability to use GNU-style long-named options that start with @option{--}
and the
+@option{--assign},
+@option{--bignum},
@option{--characters-as-bytes},
-@option{--compat},
+@option{--copyright},
+@option{--debug},
@option{--dump-variables},
-@option{--exec},
+@option{--execle},
+@option{--field-separator},
+@option{--file},
@option{--gen-pot},
+@option{--help},
+@option{--include},
@option{--lint},
@option{--lint-old},
+@option{--load},
@option{--non-decimal-data},
+@option{--optimize},
@option{--posix},
+@option{--pretty-print},
@option{--profile},
@option{--re-interval},
@option{--sandbox},
@option{--source},
@option{--traditional},
+@option{--use-lc-numeric},
and
-@option{--use-lc-numeric}
-options
+@option{--version}
+long options
(@pxref{Options}).
@end itemize
-
@c new ports
@item
@@ -27073,7 +32621,7 @@ the three most widely-used freely available versions of @command{awk}
@item @samp{\x} Escape sequence @tab X @tab X @tab X
@item @code{RS} as regexp @tab @tab X @tab X
@item @code{FS} as null string @tab X @tab X @tab X
-@item @file{/dev/stdin} special file @tab X @tab X @tab X
+@item @file{/dev/stdin} special file @tab X @tab @tab X
@item @file{/dev/stdout} special file @tab X @tab X @tab X
@item @file{/dev/stderr} special file @tab X @tab X @tab X
@item @code{**} and @code{**=} operators @tab X @tab @tab X
@@ -27183,7 +32731,6 @@ to implementors to implement ranges in whatever way they choose.
The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all
cases: the default regexp matching; with @option{--traditional}, and with
@option{--posix}; in all cases, @command{gawk} remains POSIX compliant.
-
@node Contributors
@appendixsec Major Contributors to @command{gawk}
@cindex @command{gawk}, list of contributors to
@@ -27316,6 +32863,7 @@ the various PC platforms.
Christos Zoulas
provided the @code{extension()}
built-in function for dynamically adding new modules.
+(This was removed at @command{gawk} 4.1.)
@item
@cindex Kahrs, J@"urgen
@@ -27379,7 +32927,7 @@ environments.
@cindex Haque, John
John Haque
reworked the @command{gawk} internals to use a byte-code engine,
-providing the @command{dgawk} debugger for @command{awk} programs.
+providing the @command{gawk} debugger for @command{awk} programs.
@item
@cindex Yawitz, Efraim
@@ -27946,10 +33494,6 @@ install-info --info-dir=x:/usr/info x:/usr/info/gawkinet.info
The binary distribution may contain a separate file containing additional
or more detailed installation instructions.
-As of April, 2012, up to date @command{gawk} binaries for MS Windows
-are available from @uref{http://sourceforge.net/projects/ezwinports/files/,
-Eli Zaretskii's ports project}.
-
@node PC Compiling
@appendixsubsubsec Compiling @command{gawk} for PC Operating Systems
@@ -28654,7 +34198,7 @@ since approximately 2003.
@item @command{pawk}
Nelson H.F.@: Beebe at the University of Utah has modified
Brian Kernighan's @command{awk} to provide timing and profiling information.
-It is different from @command{pgawk}
+It is different from @command{gawk} with the @option{--profile} option.
(@pxref{Profiling}),
in that it uses CPU-based profiling, not line-count
profiling. You may find it at either
@@ -28749,9 +34293,8 @@ maintainers of @command{gawk}. Everything in it applies specifically to
* Compatibility Mode:: How to disable certain @command{gawk}
extensions.
* Additions:: Making Additions To @command{gawk}.
-* Dynamic Extensions:: Adding new built-in functions to
- @command{gawk}.
* Future Extensions:: New features that may be implemented one day.
+* Implementation Limitations:: Some limitations of the implementation.
@end menu
@node Compatibility Mode
@@ -28796,6 +34339,8 @@ as well as any considerations you should bear in mind.
@command{gawk}.
* New Ports:: Porting @command{gawk} to a new operating
system.
+* Derived Files:: Why derived files are kept in the
+ @command{git} repository.
@end menu
@node Accessing The Source
@@ -28827,7 +34372,7 @@ git clone http://git.savannah.gnu.org/r/gawk.git
@end example
Once you have made changes, you can use @samp{git diff} to produce a
-patch, and send that to the @command{gawk} maintainer; see @ref{Bugs}
+patch, and send that to the @command{gawk} maintainer; see @ref{Bugs},
for how to do that.
Finally, if you cannot install Git (e.g., if it hasn't been ported
@@ -28838,6 +34383,10 @@ to check out a copy using CVS, as follows:
cvs -d:pserver:anonymous@@pserver.git.sv.gnu.org:/gawk.git co -d gawk master
@end example
+Note that this gateway is flakey; you may have better luck using
+a more modern version control system like Bazaar, that has a Git
+plug-in for working with Git repositories.
+
@node Adding Code
@appendixsubsec Adding New Features
@@ -28939,7 +34488,8 @@ of @code{switch} statements, instead of just the
plain pointer or character value.
@item
-Use the @code{TRUE}, @code{FALSE} and @code{NULL} symbolic constants
+Use @code{true}, @code{false} for @code{bool} values,
+the @code{NULL} symbolic constant for pointer values,
and the character constant @code{'\0'} where appropriate, instead of @code{1}
and @code{0}.
@@ -28986,8 +34536,9 @@ You will also have to sign paperwork for your documentation changes.
Submit changes as unified diffs.
Use @samp{diff -u -r -N} to compare
the original @command{gawk} source tree with your version.
-I recommend using the GNU version of @command{diff}.
-Send the output produced by either run of @command{diff} to me when you
+I recommend using the GNU version of @command{diff}, or best of all,
+@samp{git diff} or @samp{git format-patch}.
+Send the output produced by @command{diff} to me when you
submit your changes.
(@xref{Bugs}, for the electronic mail
information.)
@@ -29113,802 +34664,188 @@ operating systems' code that is already there.
In the code that you supply and maintain, feel free to use a
coding style and brace layout that suits your taste.
-@node Dynamic Extensions
-@appendixsec Adding New Built-in Functions to @command{gawk}
-@cindex Robinson, Will
-@cindex robot, the
-@cindex Lost In Space
-@quotation
-@i{Danger Will Robinson! Danger!!@*
-Warning! Warning!}@*
-The Robot
-@end quotation
-
-@c STARTOFRANGE gladfgaw
-@cindex @command{gawk}, functions, adding
-@c STARTOFRANGE adfugaw
-@cindex adding, functions to @command{gawk}
-@c STARTOFRANGE fubadgaw
-@cindex functions, built-in, adding to @command{gawk}
-It is possible to add new built-in
-functions to @command{gawk} using dynamically loaded libraries. This
-facility is available on systems (such as GNU/Linux) that support
-the C @code{dlopen()} and @code{dlsym()} functions.
-This @value{SECTION} describes how to write and use dynamically
-loaded extensions for @command{gawk}.
-Experience with programming in
-C or C++ is necessary when reading this @value{SECTION}.
+@node Derived Files
+@appendixsubsec Why Generated Files Are Kept In @command{git}
-@quotation CAUTION
-The facilities described in this @value{SECTION}
-are very much subject to change in a future @command{gawk} release.
-Be aware that you may have to re-do everything,
-at some future time.
-
-If you have written your own dynamic extensions,
-be sure to recompile them for each new @command{gawk} release.
-There is no guarantee of binary compatibility between different
-releases, nor will there ever be such a guarantee.
-@end quotation
+@c From emails written March 22, 2012, to the gawk developers list.
-@quotation NOTE
-When @option{--sandbox} is specified, extensions are disabled
-(@pxref{Options}.
-@end quotation
+If you look at the @command{gawk} source in the @command{git}
+repository, you will notice that it includes files that are automatically
+generated by GNU infrastructure tools, such as @file{Makefile.in} from
+@command{automake} and even @file{configure} from @command{autoconf}.
-@menu
-* Internals:: A brief look at some @command{gawk} internals.
-* Plugin License:: A note about licensing.
-* Sample Library:: A example of new functions.
-@end menu
+This is different from many Free Software projects that do not store
+the derived files, because that keeps the repository less cluttered,
+and it is easier to see the substantive changes when comparing versions
+and trying to understand what changed between commits.
-@node Internals
-@appendixsubsec A Minimal Introduction to @command{gawk} Internals
-@c STARTOFRANGE gawint
-@cindex @command{gawk}, internals
-
-The truth is that @command{gawk} was not designed for simple extensibility.
-The facilities for adding functions using shared libraries work, but
-are something of a ``bag on the side.'' Thus, this tour is
-brief and simplistic; would-be @command{gawk} hackers are encouraged to
-spend some time reading the source code before trying to write
-extensions based on the material presented here. Of particular note
-are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}.
-Reading @file{awkgram.y} in order to see how the parse tree is built
-would also be of use.
-
-@cindex @code{awk.h} file (internal)
-With the disclaimers out of the way, the following types, structure
-members, functions, and macros are declared in @file{awk.h} and are of
-use when writing extensions. The next @value{SECTION}
-shows how they are used:
+However, there are two reasons why the @command{gawk} maintainer
+likes to have everything in the repository.
-@table @code
-@cindex floating-point, numbers, @code{AWKNUM} internal type
-@cindex numbers, floating-point, @code{AWKNUM} internal type
-@cindex @code{AWKNUM} internal type
-@cindex internal type, @code{AWKNUM}
-@item AWKNUM
-An @code{AWKNUM} is the internal type of @command{awk}
-floating-point numbers. Typically, it is a C @code{double}.
-
-@cindex @code{NODE} internal type
-@cindex internal type, @code{NODE}
-@cindex strings, @code{NODE} internal type
-@cindex numbers, @code{NODE} internal type
-@item NODE
-Just about everything is done using objects of type @code{NODE}.
-These contain both strings and numbers, as well as variables and arrays.
-
-@cindex @code{force_number()} internal function
-@cindex internal function, @code{force_number()}
-@cindex numeric, values
-@item AWKNUM force_number(NODE *n)
-This macro forces a value to be numeric. It returns the actual
-numeric value contained in the node.
-It may end up calling an internal @command{gawk} function.
-
-@cindex @code{force_string()} internal function
-@cindex internal function, @code{force_string()}
-@item void force_string(NODE *n)
-This macro guarantees that a @code{NODE}'s string value is current.
-It may end up calling an internal @command{gawk} function.
-It also guarantees that the string is zero-terminated.
-
-@cindex @code{force_wstring()} internal function
-@cindex internal function, @code{force_wstring()}
-@item void force_wstring(NODE *n)
-Similarly, this
-macro guarantees that a @code{NODE}'s wide-string value is current.
-It may end up calling an internal @command{gawk} function.
-It also guarantees that the wide string is zero-terminated.
-
-@cindex @code{get_curfunc_arg_count()} internal function
-@cindex internal function, @code{get_curfunc_arg_count()}
-@item size_t get_curfunc_arg_count(void)
-This function returns the actual number of parameters passed
-to the current function. Inside the code of an extension
-this can be used to determine the maximum index which is
-safe to use with @code{get_actual_argument}. If this value is
-greater than @code{nargs}, the function was
-called incorrectly from the @command{awk} program.
-
-@cindex parameters@comma{} number of
-@cindex @code{nargs} internal variable
-@cindex internal variable, @code{nargs}
-@item nargs
-Inside an extension function, this is the maximum number of
-expected parameters, as set by the @code{make_builtin()} function.
-
-@cindex @code{stptr} internal variable
-@cindex internal variable, @code{stptr}
-@cindex @code{stlen} internal variable
-@cindex internal variable, @code{stlen}
-@item n->stptr
-@itemx n->stlen
-The data and length of a @code{NODE}'s string value, respectively.
-The string is @emph{not} guaranteed to be zero-terminated.
-If you need to pass the string value to a C library function, save
-the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it,
-call the routine, and then restore the value.
-
-@cindex @code{wstptr} internal variable
-@cindex internal variable, @code{wstptr}
-@cindex @code{wstlen} internal variable
-@cindex internal variable, @code{wstlen}
-@item n->wstptr
-@itemx n->wstlen
-The data and length of a @code{NODE}'s wide-string value, respectively.
-Use @code{force_wstring()} to make sure these values are current.
-
-@cindex @code{type} internal variable
-@cindex internal variable, @code{type}
-@item n->type
-The type of the @code{NODE}. This is a C @code{enum}. Values should
-be one of @code{Node_var}, @code{Node_var_new}, or @code{Node_var_array}
-for function parameters.
-
-@cindex @code{vname} internal variable
-@cindex internal variable, @code{vname}
-@item n->vname
-The ``variable name'' of a node. This is not of much use inside
-externally written extensions.
-
-@cindex arrays, associative, clearing
-@cindex @code{assoc_clear()} internal function
-@cindex internal function, @code{assoc_clear()}
-@item void assoc_clear(NODE *n)
-Clears the associative array pointed to by @code{n}.
-Make sure that @samp{n->type == Node_var_array} first.
-
-@cindex arrays, elements, installing
-@cindex @code{assoc_lookup()} internal function
-@cindex internal function, @code{assoc_lookup()}
-@item NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference)
-Finds, and installs if necessary, array elements.
-@code{symbol} is the array, @code{subs} is the subscript.
-This is usually a value created with @code{make_string()} (see below).
-@code{reference} should be @code{TRUE} if it is an error to use the
-value before it is created. Typically, @code{FALSE} is the
-correct value to use from extension functions.
-
-@cindex strings
-@cindex @code{make_string()} internal function
-@cindex internal function, @code{make_string()}
-@item NODE *make_string(char *s, size_t len)
-Take a C string and turn it into a pointer to a @code{NODE} that
-can be stored appropriately. This is permanent storage; understanding
-of @command{gawk} memory management is helpful.
-
-@cindex numbers
-@cindex @code{make_number()} internal function
-@cindex internal function, @code{make_number()}
-@item NODE *make_number(AWKNUM val)
-Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that
-can be stored appropriately. This is permanent storage; understanding
-of @command{gawk} memory management is helpful.
-
-
-@cindex nodes@comma{} duplicating
-@cindex @code{dupnode()} internal function
-@cindex internal function, @code{dupnode()}
-@item NODE *dupnode(NODE *n)
-Duplicate a node. In most cases, this increments an internal
-reference count instead of actually duplicating the entire @code{NODE};
-understanding of @command{gawk} memory management is helpful.
-
-@cindex memory, releasing
-@cindex @code{unref()} internal function
-@cindex internal function, @code{unref()}
-@item void unref(NODE *n)
-This macro releases the memory associated with a @code{NODE}
-allocated with @code{make_string()} or @code{make_number()}.
-Understanding of @command{gawk} memory management is helpful.
-
-@cindex @code{make_builtin()} internal function
-@cindex internal function, @code{make_builtin()}
-@item void make_builtin(const char *name, NODE *(*func)(NODE *), int count)
-Register a C function pointed to by @code{func} as new built-in
-function @code{name}. @code{name} is a regular C string. @code{count}
-is the maximum number of arguments that the function takes.
-The function should be written in the following manner:
-
-@example
-/* do_xxx --- do xxx function for gawk */
-
-NODE *
-do_xxx(int nargs)
-@{
- @dots{}
-@}
-@end example
+First, because it is then easy to reproduce any given version completely,
+without relying upon the availability of (older, likely obsolete, and
+maybe even impossible to find) other tools.
-@cindex arguments, retrieving
-@cindex @code{get_argument()} internal function
-@cindex internal function, @code{get_argument()}
-@item NODE *get_argument(int i)
-This function is called from within a C extension function to get
-the @code{i}-th argument from the function call.
-The first argument is argument zero.
-
-@cindex @code{get_actual_argument()} internal function
-@cindex internal function, @code{get_actual_argument()}
-@item NODE *get_actual_argument(int i,
-@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int@ optional,@ int@ wantarray);
-This function retrieves a particular argument @code{i}. @code{wantarray} is @code{TRUE}
-if the argument should be an array, @code{FALSE} otherwise. If @code{optional} is
-@code{TRUE}, the argument need not have been supplied. If it wasn't, the return
-value is @code{NULL}. It is a fatal error if @code{optional} is @code{TRUE} but
-the argument was not provided.
-
-@cindex @code{get_scalar_argument()} internal macro
-@cindex internal macro, @code{get_scalar_argument()}
-@item get_scalar_argument(i, opt)
-This is a convenience macro that calls @code{get_actual_argument()}.
-
-@cindex @code{get_array_argument()} internal macro
-@cindex internal macro, @code{get_array_argument()}
-@item get_array_argument(i, opt)
-This is a convenience macro that calls @code{get_actual_argument()}.
-
-@cindex functions, return values@comma{} setting
+As an extreme example, if you ever even think about trying to compile,
+oh, say, the V7 @command{awk}, you will discover that not only do you
+have to bootstrap the V7 @command{yacc} to do so, but you also need the
+V7 @command{lex}. And the latter is pretty much impossible to bring up
+on a modern GNU/Linux system.@footnote{We tried. It was painful.}
-@cindex @code{ERRNO} variable
-@cindex @code{update_ERRNO()} internal function
-@cindex internal function, @code{update_ERRNO()}
-@item void update_ERRNO(void)
-This function is called from within a C extension function to set
-the value of @command{gawk}'s @code{ERRNO} variable, based on the current
-value of the C @code{errno} global variable.
-It is provided as a convenience.
+(Or, let's say @command{gawk} 1.2 required @command{bison} whatever-it-was
+in 1989 and that there was no @file{awkgram.c} file in the repository. Is
+there a guarantee that we could find that @command{bison} version? Or that
+@emph{it} would build?)
-@cindex @code{ERRNO} variable
-@cindex @code{update_ERRNO_saved()} internal function
-@cindex internal function, @code{update_ERRNO_saved()}
-@item void update_ERRNO_saved(int errno_saved)
-This function is called from within a C extension function to set
-the value of @command{gawk}'s @code{ERRNO} variable, based on the error
-value provided as the argument.
-It is provided as a convenience.
+If the repository has all the generated files, then it's easy to just check
+them out and build. (Or @emph{easier}, depending upon how far back we go.
+@code{:-)})
-@cindex @code{ENVIRON} array
-@cindex @code{PROCINFO} array
-@cindex @code{register_deferred_variable()} internal function
-@cindex internal function, @code{register_deferred_variable()}
-@item void register_deferred_variable(const char *name, NODE *(*load_func)(void))
-This function is called to register a function to be called when a
-reference to an undefined variable with the given name is encountered.
-The callback function will never be called if the variable exists already,
-so, unless the calling code is running at program startup, it should first
-check whether a variable of the given name already exists.
-The argument function must return a pointer to a @code{NODE} containing the
-newly created variable. This function is used to implement the builtin
-@code{ENVIRON} and @code{PROCINFO} arrays, so you can refer to them
-for examples.
-
-@cindex @code{IOBUF} internal structure
-@cindex internal structure, @code{IOBUF}
-@cindex @code{iop_alloc()} internal function
-@cindex internal function, @code{iop_alloc()}
-@cindex @code{get_record()} input method
-@cindex @code{close_func}() input method
-@cindex @code{INVALID_HANDLE} internal constant
-@cindex internal constant, @code{INVALID_HANDLE}
-@cindex XML (eXtensible Markup Language)
-@cindex eXtensible Markup Language (XML)
-@cindex @code{register_open_hook()} internal function
-@cindex internal function, @code{register_open_hook()}
-@item void register_open_hook(void *(*open_func)(IOBUF *))
-This function is called to register a function to be called whenever
-a new data file is opened, leading to the creation of an @code{IOBUF}
-structure in @code{iop_alloc()}. After creating the new @code{IOBUF},
-@code{iop_alloc()} will call (in reverse order of registration, so the last
-function registered is called first) each open hook until one returns
-non-@code{NULL}. If any hook returns a non-@code{NULL} value, that value is assigned
-to the @code{IOBUF}'s @code{opaque} field (which will presumably point
-to a structure containing additional state associated with the input
-processing), and no further open hooks are called.
-
-The function called will most likely want to set the @code{IOBUF}'s
-@code{get_record} method to indicate that future input records should
-be retrieved by calling that method instead of using the standard
-@command{gawk} input processing.
-
-And the function will also probably want to set the @code{IOBUF}'s
-@code{close_func} method to be called when the file is closed to clean
-up any state associated with the input.
-
-Finally, hook functions should be prepared to receive an @code{IOBUF}
-structure where the @code{fd} field is set to @code{INVALID_HANDLE},
-meaning that @command{gawk} was not able to open the file itself. In
-this case, the hook function must be able to successfully open the file
-and place a valid file descriptor there.
-
-Currently, for example, the hook function facility is used to implement
-the XML parser shared library extension. For more info, please look in
-@file{awk.h} and in @file{io.c}.
-@end table
+And that brings us to the second (and stronger) reason why all the files
+really need to be in @command{git}. It boils down to who do you cater
+to---the @command{gawk} developer(s), or the user who just wants to check
+out a version and try it out?
-An argument that is supposed to be an array needs to be handled with
-some extra code, in case the array being passed in is actually
-from a function parameter.
+The @command{gawk} maintainer
+wants it to be possible for any interested @command{awk} user in the
+world to just clone the repository, check out the branch of interest and
+build it. Without their having to have the correct version(s) of the
+autotools.@footnote{There is one GNU program that is (in our opinion)
+severely difficult to bootstrap from the @command{git} repository. For
+example, on the author's old (but still working) PowerPC macintosh with
+Mac OS X 10.5, it was necessary to bootstrap a ton of software, starting
+with @command{git} itself, in order to try to work with the latest code.
+It's not pleasant, and especially on older systems, it's a big waste
+of time.
-The following boilerplate code shows how to do this:
+Starting with the latest tarball was no picnic either. The maintainers
+had dropped @file{.gz} and @file{.bz2} files and only distribute
+@file{.tar.xz} files. It was necessary to bootstrap @command{xz} first!}
+That is the point of the @file{bootstrap.sh} file. It touches the
+various other files in the right order such that
@example
-NODE *the_arg;
-
-/* assume need 3rd arg, 0-based */
-the_arg = get_array_argument(2, FALSE);
+# The canonical incantation for building GNU software:
+./bootstrap.sh && ./configure && make
@end example
-Again, you should spend time studying the @command{gawk} internals;
-don't just blindly copy this code.
-@c ENDOFRANGE gawint
-
-@node Plugin License
-@appendixsubsec Extension Licensing
-
-Every dynamic extension should define the global symbol
-@code{plugin_is_GPL_compatible} to assert that it has been licensed under
-a GPL-compatible license. If this symbol does not exist, @command{gawk}
-will emit a fatal error and exit.
-
-The declared type of the symbol should be @code{int}. It does not need
-to be in any allocated section, though. The code merely asserts that
-the symbol exists in the global scope. Something like this is enough:
-
-@example
-int plugin_is_GPL_compatible;
-@end example
-
-@node Sample Library
-@appendixsubsec Example: Directory and File Operation Built-ins
-@c STARTOFRANGE chdirg
-@cindex @code{chdir()} function@comma{} implementing in @command{gawk}
-@c STARTOFRANGE statg
-@cindex @code{stat()} function@comma{} implementing in @command{gawk}
-@c STARTOFRANGE filre
-@cindex files, information about@comma{} retrieving
-@c STARTOFRANGE dirch
-@cindex directories, changing
-
-Two useful functions that are not in @command{awk} are @code{chdir()}
-(so that an @command{awk} program can change its directory) and
-@code{stat()} (so that an @command{awk} program can gather information about
-a file).
-This @value{SECTION} implements these functions for @command{gawk} in an
-external extension library.
-
-@menu
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-@end menu
-
-@node Internal File Description
-@appendixsubsubsec Using @code{chdir()} and @code{stat()}
-
-This @value{SECTION} shows how to use the new functions at the @command{awk}
-level once they've been integrated into the running @command{gawk}
-interpreter.
-Using @code{chdir()} is very straightforward. It takes one argument,
-the new directory to change to:
-
-@example
-@dots{}
-newdir = "/home/arnold/funstuff"
-ret = chdir(newdir)
-if (ret < 0) @{
- printf("could not change to %s: %s\n",
- newdir, ERRNO) > "/dev/stderr"
- exit 1
-@}
-@dots{}
-@end example
-
-The return value is negative if the @code{chdir} failed,
-and @code{ERRNO}
-(@pxref{Built-in Variables})
-is set to a string indicating the error.
-
-Using @code{stat()} is a bit more complicated.
-The C @code{stat()} function fills in a structure that has a fair
-amount of information.
-The right way to model this in @command{awk} is to fill in an associative
-array with the appropriate information:
-
-@c broke printf for page breaking
-@example
-file = "/home/arnold/.profile"
-fdata[1] = "x" # force `fdata' to be an array
-ret = stat(file, fdata)
-if (ret < 0) @{
- printf("could not stat %s: %s\n",
- file, ERRNO) > "/dev/stderr"
- exit 1
-@}
-printf("size of %s is %d bytes\n", file, fdata["size"])
-@end example
-
-The @code{stat()} function always clears the data array, even if
-the @code{stat()} fails. It fills in the following elements:
-
-@table @code
-@item "name"
-The name of the file that was @code{stat()}'ed.
+@noindent
+will @emph{just work}.
-@item "dev"
-@itemx "ino"
-The file's device and inode numbers, respectively.
+This is extremely important for the @code{master} and
+@code{gawk-@var{X}.@var{Y}-stable} branches.
-@item "mode"
-The file's mode, as a numeric value. This includes both the file's
-type and its permissions.
+Further, the @command{gawk} maintainer would argue that it's also
+important for the @command{gawk} developers. When he tried to check out
+the @code{xgawk} branch@footnote{A branch created by one of the other
+developers that did not include the generated files.} to build it, he
+couldn't. (No @file{ltmain.sh} file, and he had no idea how to create it,
+and that was not the only problem.)
-@item "nlink"
-The number of hard links (directory entries) the file has.
+He felt @emph{extremely} frustrated. With respect to that branch,
+the maintainer is no different than Jane User who wants to try to build
+@code{gawk-4.0-stable} or @code{master} from the repository.
-@item "uid"
-@itemx "gid"
-The numeric user and group ID numbers of the file's owner.
+Thus, the maintainer thinks that it's not just important, but critical,
+that for any given branch, the above incantation @emph{just works}.
-@item "size"
-The size in bytes of the file.
+@c So - that's my reasoning and philosophy.
-@item "blocks"
-The number of disk blocks the file actually occupies. This may not
-be a function of the file's size if the file has holes.
+What are some of the consequences and/or actions to take?
-@item "atime"
-@itemx "mtime"
-@itemx "ctime"
-The file's last access, modification, and inode update times,
-respectively. These are numeric timestamps, suitable for formatting
-with @code{strftime()}
-(@pxref{Built-in}).
+@enumerate 1
+@item
+We don't mind that there are differing files in the different branches
+as a result of different versions of the autotools.
-@item "pmode"
-The file's ``printable mode.'' This is a string representation of
-the file's type and permissions, such as what is produced by
-@samp{ls -l}---for example, @code{"drwxr-xr-x"}.
+@enumerate A
+@item
+It's the maintainer's job to merge them and he will deal with it.
-@item "type"
-A printable string representation of the file's type. The value
-is one of the following:
+@item
+He is really good at @samp{git diff x y > /tmp/diff1 ; gvim /tmp/diff1} to
+remove the diffs that aren't of interest in order to review code. @code{:-)}
+@end enumerate
-@table @code
-@item "blockdev"
-@itemx "chardev"
-The file is a block or character device (``special file'').
+@item
+It would certainly help if everyone used the same versions of the GNU tools
+as he does, which in general are the latest released versions of
+@command{automake},
+@command{autoconf},
+@command{bison},
+and
+@command{gettext}.
@ignore
-@item "door"
-The file is a Solaris ``door'' (special file used for
-interprocess communications).
+If it would help if I sent out an "I just upgraded to version x.y
+of tool Z" kind of message to this list, I can do that. Up until
+now it hasn't been a real issue since I'm the only one who's been
+dorking with the configuration machinery.
@end ignore
-@item "directory"
-The file is a directory.
-
-@item "fifo"
-The file is a named-pipe (also known as a FIFO).
-
-@item "file"
-The file is just a regular file.
-
-@item "socket"
-The file is an @code{AF_UNIX} (``Unix domain'') socket in the
-filesystem.
-
-@item "symlink"
-The file is a symbolic link.
-@end table
-@end table
-
-Several additional elements may be present depending upon the operating
-system and the type of the file. You can test for them in your @command{awk}
-program by using the @code{in} operator
-(@pxref{Reference to Elements}):
-
-@table @code
-@item "blksize"
-The preferred block size for I/O to the file. This field is not
-present on all POSIX-like systems in the C @code{stat} structure.
-
-@item "linkval"
-If the file is a symbolic link, this element is the name of the
-file the link points to (i.e., the value of the link).
-
-@item "rdev"
-@itemx "major"
-@itemx "minor"
-If the file is a block or character device file, then these values
-represent the numeric device number and the major and minor components
-of that number, respectively.
-@end table
-
-@node Internal File Ops
-@appendixsubsubsec C Code for @code{chdir()} and @code{stat()}
-
-Here is the C code for these extensions. They were written for
-GNU/Linux. The code needs some more work for complete portability
-to other POSIX-compliant systems:@footnote{This version is edited
-slightly for presentation. See
-@file{extension/filefuncs.c} in the @command{gawk} distribution
-for the complete version.}
-
-@c break line for page breaking
-@example
-#include "awk.h"
-
-#include <sys/sysmacros.h>
-
-int plugin_is_GPL_compatible;
-
-/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
-
-static NODE *
-do_chdir(int nargs)
-@{
- NODE *newdir;
- int ret = -1;
-
- if (do_lint && get_curfunc_arg_count() != 1)
- lintwarn("chdir: called with incorrect number of arguments");
-
- newdir = get_scalar_argument(0, FALSE);
-@end example
-
-The file includes the @code{"awk.h"} header file for definitions
-for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>}
-for access to the @code{major()} and @code{minor}() macros.
-
-@cindex programming conventions, @command{gawk} internals
-By convention, for an @command{awk} function @code{foo}, the function that
-implements it is called @samp{do_foo}. The function should take
-a @samp{int} argument, usually called @code{nargs}, that
-represents the number of defined arguments for the function. The @code{newdir}
-variable represents the new directory to change to, retrieved
-with @code{get_scalar_argument()}. Note that the first argument is
-numbered zero.
-
-This code actually accomplishes the @code{chdir()}. It first forces
-the argument to be a string and passes the string value to the
-@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO}
-is updated.
-
-@example
- (void) force_string(newdir);
- ret = chdir(newdir->stptr);
- if (ret < 0)
- update_ERRNO();
-@end example
-
-Finally, the function returns the return value to the @command{awk} level:
-
-@example
- return make_number((AWKNUM) ret);
-@}
-@end example
-
-The @code{stat()} built-in is more involved. First comes a function
-that turns a numeric mode into a printable representation
-(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
+@enumerate A
+@item
+Installing from source is quite easy. It's how the maintainer worked for years
+under Fedora.
+He had @file{/usr/local/bin} at the front of hs @env{PATH} and just did:
-@c break line for page breaking
@example
-/* format_mode --- turn a stat mode field into something readable */
-
-static char *
-format_mode(unsigned long fmode)
-@{
- @dots{}
-@}
+wget http://ftp.gnu.org/gnu/@var{package}/@var{package}-@var{x}.@var{y}.@var{z}.tar.gz
+tar -xpzvf @var{package}-@var{x}.@var{y}.@var{z}.tar.gz
+cd @var{package}-@var{x}.@var{y}.@var{z}
+./configure && make && make check
+make install # as root
@end example
-Next comes the @code{do_stat()} function. It starts with
-variable declarations and argument checking:
+@item
+These days the maintainer uses Ubuntu 10.11 which is medium current, but
+he is already doing the above for @command{autoconf} and @command{bison}.
@ignore
-Changed message for page breaking. Used to be:
- "stat: called with incorrect number of arguments (%d), should be 2",
+(C. Rant: Recent Linux versions with GNOME 3 really suck. What
+ are all those people thinking? Fedora 15 was such a bust it drove
+ me to Ubuntu, but Ubuntu 11.04 and 11.10 are totally unusable from
+ a UI perspective. Bleah.)
@end ignore
-@example
-/* do_stat --- provide a stat() function for gawk */
-
-static NODE *
-do_stat(int nargs)
-@{
- NODE *file, *array, *tmp;
- struct stat sbuf;
- int ret;
- NODE **aptr;
- char *pmode; /* printable mode */
- char *type = "unknown";
-
- if (do_lint && get_curfunc_arg_count() > 2)
- lintwarn("stat: called with too many arguments");
-@end example
-
-Then comes the actual work. First, the function gets the arguments.
-Then, it always clears the array.
-The code use @code{lstat()} (instead of @code{stat()})
-to get the file information,
-in case the file is a symbolic link.
-If there's an error, it sets @code{ERRNO} and returns:
-
-@c comment made multiline for page breaking
-@example
- /* file is first arg, array to hold results is second */
- file = get_scalar_argument(0, FALSE);
- array = get_array_argument(1, FALSE);
-
- /* empty out the array */
- assoc_clear(array);
-
- /* lstat the file, if error, set ERRNO and return */
- (void) force_string(file);
- ret = lstat(file->stptr, & sbuf);
- if (ret < 0) @{
- update_ERRNO();
- return make_number((AWKNUM) ret);
- @}
-@end example
-
-Now comes the tedious part: filling in the array. Only a few of the
-calls are shown here, since they all follow the same pattern:
-
-@example
- /* fill in the array */
- aptr = assoc_lookup(array, tmp = make_string("name", 4), FALSE);
- *aptr = dupnode(file);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("mode", 4), FALSE);
- *aptr = make_number((AWKNUM) sbuf.st_mode);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("pmode", 5), FALSE);
- pmode = format_mode(sbuf.st_mode);
- *aptr = make_string(pmode, strlen(pmode));
- unref(tmp);
-@end example
-
-When done, return the @code{lstat()} return value:
-
-@example
-
- return make_number((AWKNUM) ret);
-@}
-@end example
-
-@cindex programming conventions, @command{gawk} internals
-Finally, it's necessary to provide the ``glue'' that loads the
-new function(s) into @command{gawk}. By convention, each library has
-a routine named @code{dlload()} that does the job:
-
-@example
-/* dlload --- load new builtins in this library */
+@end enumerate
-NODE *
-dlload(NODE *tree, void *dl)
-@{
- make_builtin("chdir", do_chdir, 1);
- make_builtin("stat", do_stat, 2);
- return make_number((AWKNUM) 0);
-@}
-@end example
+@ignore
+@item
+If someone still feels really strongly about all this, then perhaps they
+can have two branches, one for their development with just the clean
+changes, and one that is buildable (xgawk and xgawk-buildable, maybe).
+Or, as I suggested in another mail, make commits in pairs, the first with
+the "real" changes and the second with "everything else needed for
+ building".
+@end ignore
+@end enumerate
-And that's it! As an exercise, consider adding functions to
-implement system calls such as @code{chown()}, @code{chmod()},
-and @code{umask()}.
+Most of the above was originally written by the maintainer to other
+@command{gawk} developers. It raised the objection from one of
+the developers ``@dots{} that anybody pulling down the source from
+@command{git} is not an end user.''
-@node Using Internal File Ops
-@appendixsubsubsec Integrating the Extensions
+However, this is not true. There are ``power @command{awk} users''
+who can build @command{gawk} (using the magic incantation shown previously)
+but who can't program in C. Thus, the major branches should be
+kept buildable all the time.
-@cindex @command{gawk}, interpreter@comma{} adding code to
-Now that the code is written, it must be possible to add it at
-runtime to the running @command{gawk} interpreter. First, the
-code must be compiled. Assuming that the functions are in
-a file named @file{filefuncs.c}, and @var{idir} is the location
-of the @command{gawk} include files,
-the following steps create
-a GNU/Linux shared library:
-
-@example
-$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c}
-$ @kbd{ld -o filefuncs.so -shared filefuncs.o}
-@end example
+It was then suggested that there be a @command{cron} job to create
+nightly tarballs of ``the source.'' Here, the problem is that there
+are source trees, corresponding to the various branches! So,
+nightly tar balls aren't the answer, especially as the repository can go
+for weeks without significant change being introduced.
-@cindex @code{extension()} function (@command{gawk})
-Once the library exists, it is loaded by calling the @code{extension()}
-built-in function.
-This function takes two arguments: the name of the
-library to load and the name of a function to call when the library
-is first loaded. This function adds the new functions to @command{gawk}.
-It returns the value returned by the initialization function
-within the shared library:
+Fortunately, the @command{git} server can meet this need. For any given
+branch named @var{branchname}, use:
@example
-# file testff.awk
-BEGIN @{
- extension("./filefuncs.so", "dlload")
-
- chdir(".") # no-op
-
- data[1] = 1 # force `data' to be an array
- print "Info for testff.awk"
- ret = stat("testff.awk", data)
- print "ret =", ret
- for (i in data)
- printf "data[\"%s\"] = %s\n", i, data[i]
- print "testff.awk modified:",
- strftime("%m %d %y %H:%M:%S", data["mtime"])
-
- print "\nInfo for JUNK"
- ret = stat("JUNK", data)
- print "ret =", ret
- for (i in data)
- printf "data[\"%s\"] = %s\n", i, data[i]
- print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
-@}
+wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.tar.gz
@end example
-Here are the results of running the program:
+@noindent
+to retrieve a snapshot of the given branch.
-@example
-$ @kbd{gawk -f testff.awk}
-@print{} Info for testff.awk
-@print{} ret = 0
-@print{} data["size"] = 607
-@print{} data["ino"] = 14945891
-@print{} data["name"] = testff.awk
-@print{} data["pmode"] = -rw-rw-r--
-@print{} data["nlink"] = 1
-@print{} data["atime"] = 1293993369
-@print{} data["mtime"] = 1288520752
-@print{} data["mode"] = 33204
-@print{} data["blksize"] = 4096
-@print{} data["dev"] = 2054
-@print{} data["type"] = file
-@print{} data["gid"] = 500
-@print{} data["uid"] = 500
-@print{} data["blocks"] = 8
-@print{} data["ctime"] = 1290113572
-@print{} testff.awk modified: 10 31 10 12:25:52
-@print{}
-@print{} Info for JUNK
-@print{} ret = -1
-@print{} JUNK modified: 01 01 70 02:00:00
-@end example
-@c ENDOFRANGE filre
-@c ENDOFRANGE dirch
-@c ENDOFRANGE statg
-@c ENDOFRANGE chdirg
-@c ENDOFRANGE gladfgaw
-@c ENDOFRANGE adfugaw
-@c ENDOFRANGE fubadgaw
@node Future Extensions
@appendixsec Probable Future Extensions
@@ -29957,66 +34894,37 @@ Arnold Robbins
Larry Wall
@end quotation
-This @value{SECTION} briefly lists extensions and possible improvements
-that indicate the directions we are
-currently considering for @command{gawk}. The file @file{FUTURES} in the
-@command{gawk} distribution lists these extensions as well.
-
-Following is a list of probable future changes visible at the
-@command{awk} language level:
-
-@c these are ordered by likelihood
-@table @asis
-@item Loadable module interface
-It is not clear that the @command{awk}-level interface to the
-modules facility is as good as it should be. The interface needs to be
-redesigned, particularly taking namespace issues into account, as
-well as possibly including issues such as library search path order
-and versioning.
-
-@item @code{RECLEN} variable for fixed-length records
-Along with @code{FIELDWIDTHS}, this would speed up the processing of
-fixed-length records.
-@code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"},
-depending upon which kind of record processing is in effect.
-
-@item Databases
-It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array.
-
-@item More @code{lint} warnings
-There are more things that could be checked for portability.
-@end table
-
-Following is a list of probable improvements that will make @command{gawk}'s
-source code easier to work with:
-
-@table @asis
-@item Loadable module mechanics
-The current extension mechanism works
-(@pxref{Dynamic Extensions}),
-but is rather primitive. It requires a fair amount of manual work
-to create and integrate a loadable module.
-Nor is the current mechanism as portable as might be desired.
-The GNU @command{libtool} package provides a number of features that
-would make using loadable modules much easier.
-@command{gawk} should be changed to use @command{libtool}.
-
-@item Loadable module internals
-The API to its internals that @command{gawk} ``exports'' should be revised.
-Too many things are needlessly exposed. A new API should be designed
-and implemented to make module writing easier.
-
-@item Better array subscript management
-@command{gawk}'s management of array subscript storage could use revamping,
-so that using the same value to index multiple arrays only
-stores one copy of the index value.
-@end table
-
-Finally,
-the programs in the test suite could use documenting in this @value{DOCUMENT}.
-
+The @file{TODO} file in the @command{gawk} Git repository lists possible
+future enhancements. Some of these relate to the source code, and others
+to possible new features. Please see that file for the list.
@xref{Additions},
-if you are interested in tackling any of these projects.
+if you are interested in tackling any of the projects listed there.
+
+@node Implementation Limitations
+@appendixsec Some Limitations of the Implementation
+
+This following table describes limits of @command{gawk} on a Unix-like
+system (although it is variable even then). Other systems may have
+different limits.
+
+@c @multitable {Number of file redirections} {min(number of processes per user, number of open files)}
+@multitable @columnfractions .40 .60
+@headitem Item @tab Limit
+@item Characters in a character class @tab 2^(number of bits per byte)
+@item Length of input record @tab @code{MAX_INT }
+@item Length of output record @tab Unlimited
+@item Length of source line @tab Unlimited
+@item Number of fields in a record @tab @code{MAX_LONG}
+@item Number of file redirections @tab Unlimited
+@item Number of input records in one file @tab @code{MAX_LONG}
+@item Number of input records total @tab @code{MAX_LONG}
+@item Number of pipe redirections @tab min(number of processes per user, number of open files)
+@item Numeric values @tab Double-precision floating point (if not using MPFR)
+@item Size of a field @tab @code{MAX_INT }
+@item Size of a literal string @tab @code{MAX_INT }
+@item Size of a printf string @tab @code{MAX_INT }
+@end multitable
+
@c ENDOFRANGE impis
@c ENDOFRANGE gawii
@@ -30037,7 +34945,6 @@ other introductory texts that you should refer to instead.)
@menu
* Basic High Level:: The high level view.
* Basic Data Typing:: A very quick intro to data types.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
@end menu
@node Basic High Level
@@ -30045,19 +34952,17 @@ other introductory texts that you should refer to instead.)
@cindex processing data
At the most basic level, the job of a program is to process
-some input data and produce results.
+some input data and produce results. See @ref{figure-general-flow}.
-@iftex
-@image{general-program}
-@end iftex
-@ifnottex
-@example
- _______
-+------+ / \ +---------+
-| Data | -----> < Program > -----> | Results |
-+------+ \_______/ +---------+
-@end example
-@end ifnottex
+@float Figure,figure-general-flow
+@caption{General Program Flow}
+@ifinfo
+@center @image{general-program, , , General program flow, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{general-program, , , General program flow}
+@end ifnotinfo
+@end float
@cindex compiled programs
@cindex interpreted programs
@@ -30073,26 +34978,18 @@ instructions in your program to process the data.
@cindex programming, basic steps
When you write a program, it usually consists
-of the following, very basic set of steps:
+of the following, very basic set of steps, as shown
+in @ref{figure-process-flow}:
-@iftex
-@image{process-flow}
-@end iftex
-@ifnottex
-@example
- ______
-+----------------+ / More \ No +----------+
-| Initialization | -------> < Data > -------> | Clean Up |
-+----------------+ ^ \ ? / +----------+
- | +--+-+
- | | Yes
- | |
- | V
- | +---------+
- +-----+ Process |
- +---------+
-@end example
-@end ifnottex
+@float Figure,figure-process-flow
+@caption{Basic Program Steps}
+@ifinfo
+@center @image{process-flow, , , Basic Program Stages, txt}
+@end ifinfo
+@ifnotinfo
+@center @image{process-flow, , , Basic Program Stages}
+@end ifnotinfo
+@end float
@table @asis
@item Initialization
@@ -30188,47 +35085,10 @@ Individual variables, as well as numeric and string variables, are
referred to as @dfn{scalar} values.
Groups of values, such as arrays, are not scalars.
-@cindex integers
-@cindex floating-point, numbers
-@cindex numbers, floating-point
-Within computers, there are two kinds of numeric values: @dfn{integers}
-and @dfn{floating-point}.
-In school, integer values were referred to as ``whole'' numbers---that is,
-numbers without any fractional part, such as 1, 42, or @minus{}17.
-The advantage to integer numbers is that they represent values exactly.
-The disadvantage is that their range is limited. On most systems,
-this range is @minus{}2,147,483,648 to 2,147,483,647.
-However, many systems now support a range from
-@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
-
-@cindex unsigned integers
-@cindex integers, unsigned
-Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}.
-Signed values may be negative or positive, with the range of values just
-described.
-Unsigned values are always positive. On most systems,
-the range is from 0 to 4,294,967,295.
-However, many systems now support a range from
-0 to 18,446,744,073,709,551,615.
-
-@cindex double precision floating-point
-@cindex single precision floating-point
-Floating-point numbers represent what are called ``real'' numbers; i.e.,
-those that do have a fractional part, such as 3.1415927.
-The advantage to floating-point numbers is that they
-can represent a much larger range of values.
-The disadvantage is that there are numbers that they cannot represent
-exactly.
-@command{awk} uses @dfn{double precision} floating-point numbers, which
-can hold more digits than @dfn{single precision}
-floating-point numbers.
-Floating-point issues are discussed more fully in
-@ref{Floating Point Issues}.
-
-At the very lowest level, computers store values as groups of binary digits,
-or @dfn{bits}. Modern computers group bits into groups of eight, called @dfn{bytes}.
-Advanced applications sometimes have to manipulate bits directly,
-and @command{gawk} provides functions for doing so.
+@ref{General Arithmetic}, provided a basic introduction to numeric
+types (integer and floating-point) and how they are used in a computer.
+Please review that information, including a number of caveats that
+were presented.
@cindex null strings
While you are probably used to the idea of a number without a value (i.e., zero),
@@ -30252,6 +35112,11 @@ plus 0 times 1, or decimal 10.
Octal and hexadecimal are discussed more in
@ref{Nondecimal-numbers}.
+At the very lowest level, computers store values as groups of binary digits,
+or @dfn{bits}. Modern computers group bits into groups of eight, called @dfn{bytes}.
+Advanced applications sometimes have to manipulate bits directly,
+and @command{gawk} provides functions for doing so.
+
Programs are written in programming languages.
Hundreds, if not thousands, of programming languages exist.
One of the most popular is the C programming language.
@@ -30271,239 +35136,6 @@ standard for C. This standard became an ISO standard in 1990.
In 1999, a revised ISO C standard was approved and released.
Where it makes sense, POSIX @command{awk} is compatible with 1999 ISO C.
-@node Floating Point Issues
-@appendixsec Floating-Point Number Caveats
-
-As mentioned earlier, floating-point numbers represent what are called
-``real'' numbers, i.e., those that have a fractional part. @command{awk}
-uses double precision floating-point numbers to represent all
-numeric values. This @value{SECTION} describes some of the issues
-involved in using floating-point numbers.
-
-There is a very nice
-@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic}
-by David Goldberg,
-``What Every Computer Scientist Should Know About Floating-point Arithmetic,''
-@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48.
-This is worth reading if you are interested in the details,
-but it does require a background in computer science.
-
-@menu
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
-@end menu
-
-@node String Conversion Precision
-@appendixsubsec The String Value Can Lie
-
-Internally, @command{awk} keeps both the numeric value
-(double precision floating-point) and the string value for a variable.
-Separately, @command{awk} keeps
-track of what type the variable has
-(@pxref{Typing and Comparison}),
-which plays a role in how variables are used in comparisons.
-
-It is important to note that the string value for a number may not
-reflect the full value (all the digits) that the numeric value
-actually contains.
-The following program (@file{values.awk}) illustrates this:
-
-@example
-@{
- sum = $1 + $2
- # see it for what it is
- printf("sum = %.12g\n", sum)
- # use CONVFMT
- a = "<" sum ">"
- print "a =", a
- # use OFMT
- print "sum =", sum
-@}
-@end example
-
-@noindent
-This program shows the full value of the sum of @code{$1} and @code{$2}
-using @code{printf}, and then prints the string values obtained
-from both automatic conversion (via @code{CONVFMT}) and
-from printing (via @code{OFMT}).
-
-Here is what happens when the program is run:
-
-@example
-$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk}
-@print{} sum = 4.8888888
-@print{} a = <4.88889>
-@print{} sum = 4.88889
-@end example
-
-This makes it clear that the full numeric value is different from
-what the default string representations show.
-
-@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
-at least six significant digits. For some applications, you might want to
-change it to specify more precision.
-On most modern machines, most of the time,
-17 digits is enough to capture a floating-point number's
-value exactly.@footnote{Pathological cases can require up to
-752 digits (!), but we doubt that you need to worry about this.}
-
-@node Unexpected Results
-@appendixsubsec Floating Point Numbers Are Not Abstract Numbers
-
-@cindex floating-point, numbers
-Unlike numbers in the abstract sense (such as what you studied in high school
-or college math), numbers stored in computers are limited in certain ways.
-They cannot represent an infinite number of digits, nor can they always
-represent things exactly.
-In particular,
-floating-point numbers cannot
-always represent values exactly. Here is an example:
-
-@example
-$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'}
-515.79
-@print{} 0000051579
-515.80
-@print{} 0000051579
-515.81
-@print{} 0000051580
-515.82
-@print{} 0000051582
-@kbd{@value{CTL}-d}
-@end example
-
-@noindent
-This shows that some values can be represented exactly,
-whereas others are only approximated. This is not a ``bug''
-in @command{awk}, but simply an artifact of how computers
-represent numbers.
-
-@cindex negative zero
-@cindex positive zero
-@cindex zero@comma{} negative vs.@: positive
-Another peculiarity of floating-point numbers on modern systems
-is that they often have more than one representation for the number zero!
-In particular, it is possible to represent ``minus zero'' as well as
-regular, or ``positive'' zero.
-
-This example shows that negative and positive zero are distinct values
-when stored internally, but that they are in fact equal to each other,
-as well as to ``regular'' zero:
-
-@example
-$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0}
-> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz}
-> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0}
-> @kbd{@}'}
-@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1
-@print{} mz == 0 -> 1, pz == 0 -> 1
-@end example
-
-It helps to keep this in mind should you process numeric data
-that contains negative zero values; the fact that the zero is negative
-is noted and can affect comparisons.
-
-@node POSIX Floating Point Problems
-@appendixsubsec Standards Versus Existing Practice
-
-Historically, @command{awk} has converted any non-numeric looking string
-to the numeric value zero, when required. Furthermore, the original
-definition of the language and the original POSIX standards specified that
-@command{awk} only understands decimal numbers (base 10), and not octal
-(base 8) or hexadecimal numbers (base 16).
-
-Changes in the language of the
-2001 and 2004 POSIX standard can be interpreted to imply that @command{awk}
-should support additional features. These features are:
-
-@itemize @bullet
-@item
-Interpretation of floating point data values specified in hexadecimal
-notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not}
-source code constants.)
-
-@item
-Support for the special IEEE 754 floating point values ``Not A Number''
-(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf'').
-In particular, the format for these values is as specified by the ISO 1999
-C standard, which ignores case and can allow machine-dependent additional
-characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}.
-@end itemize
-
-The first problem is that both of these are clear changes to historical
-practice:
-
-@itemize @bullet
-@item
-The @command{gawk} maintainer feels that supporting hexadecimal floating
-point values, in particular, is ugly, and was never intended by the
-original designers to be part of the language.
-
-@item
-Allowing completely alphabetic strings to have valid numeric
-values is also a very severe departure from historical practice.
-@end itemize
-
-The second problem is that the @code{gawk} maintainer feels that this
-interpretation of the standard, which requires a certain amount of
-``language lawyering'' to arrive at in the first place, was not even
-intended by the standard developers. In other words, ``we see how you
-got where you are, but we don't think that that's where you want to be.''
-
-The 2008 POSIX standard added explicit wording to allow, but not require,
-that @command{awk} support hexadecimal floating point values and
-special values for ``Not A Number'' and infinity.
-
-Although the @command{gawk} maintainer continues to feel that
-providing those features is inadvisable,
-nevertheless, on systems that support IEEE floating point, it seems
-reasonable to provide @emph{some} way to support NaN and Infinity values.
-The solution implemented in @command{gawk} is as follows:
-
-@itemize @bullet
-@item
-With the @option{--posix} command-line option, @command{gawk} becomes
-``hands off.'' String values are passed directly to the system library's
-@code{strtod()} function, and if it successfully returns a numeric value,
-that is what's used.@footnote{You asked for it, you got it.}
-By definition, the results are not portable across
-different systems. They are also a little surprising:
-
-@example
-$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'}
-@print{} nan
-$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
-@print{} 3735928559
-@end example
-
-@item
-Without @option{--posix}, @command{gawk} interprets the four strings
-@samp{+inf},
-@samp{-inf},
-@samp{+nan},
-and
-@samp{-nan}
-specially, producing the corresponding special numeric values.
-The leading sign acts a signal to @command{gawk} (and the user)
-that the value is really numeric. Hexadecimal floating point is
-not supported (unless you also use @option{--non-decimal-data},
-which is @emph{not} recommended). For example:
-
-@example
-$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'}
-@print{} 0
-$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'}
-@print{} nan
-$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
-@print{} 0
-@end example
-
-@command{gawk} does ignore case in the four special values.
-Thus @samp{+nan} and @samp{+NaN} are the same.
-@end itemize
-
@c ENDOFRANGE procon
@node Glossary
@@ -30712,6 +35344,50 @@ It was written in @command{awk}
by Brian Kernighan and Jon Bentley, and is available from
@uref{http://netlib.sandia.gov/netlib/typesetting/chem.gz}.
+@cindex cookie
+@item Cookie
+A peculiar goodie, token, saying or remembrance
+produced by or presented to a program. (With thanks to Doug McIlroy.)
+@ignore
+From: Doug McIlroy <doug@cs.dartmouth.edu>
+Date: Sat, 13 Oct 2012 19:55:25 -0400
+To: arnold@skeeve.com
+Subject: Re: origin of the term "cookie"?
+
+I believe the term "cookie", for a more or less inscrutable
+saying or crumb of information, was injected into Unix
+jargon by Bob Morris, who used the word quite frequently.
+It had no fixed meaning as it now does in browsers.
+
+The word had been around long before it was recognized in
+the 8th edition glossary (earlier editions had no glossary):
+
+cookie a peculiar goodie, token, saying or remembrance
+returned by or presented to a program. [I would say that
+"returned by" would better read "produced by", and assume
+responsibility for the inexactitude.]
+
+Doug McIlroy
+
+From: Doug McIlroy <doug@cs.dartmouth.edu>
+Date: Sun, 14 Oct 2012 10:08:43 -0400
+To: arnold@skeeve.com
+Subject: Re: origin of the term "cookie"?
+
+> Can I forward your email to Eric Raymond, for possible addition to the
+> Jargon File?
+
+Sure. I might add that I don't know how "cookie" entered Morris's
+vocabulary. Certainly "values of beta give rise to dom!" (see google)
+was an early, if not the earliest Unix cookie. The fact that it was
+found lying around on a model 37 teletype (which had Greek beta in
+its type box) suggests that maybe it was seen to be like milk and
+cookies laid out for Santa Claus. Morris was wont to make such
+connections.
+
+Doug
+@end ignore
+
@item Coprocess
A subordinate program with which two-way communications is possible.
@@ -32496,9 +37172,6 @@ Unresolved Issues:
of how to use them. It would be useful to perhaps have a "programming
style" section of the manual that would include this and other tips.
-2. The default AWKPATH search path should be configurable via `configure'
- The default and how this changes needs to be documented.
-
Consistency issues:
/.../ regexps are in @code, not @samp
".." strings are in @code, not @samp
@@ -32593,14 +37266,7 @@ ORA uses filename, thus the macro.
Suggestions:
------------
-Enhance FIELDWIDTHS with some way to indicate "the rest of the record".
-E.g., a length of 0 or -1 or something. May be "n"?
-
-Make FIELDWIDTHS be an array?
-
% Next edition:
-% 1. Talk about common extensions, those in nawk, gawk, mawk
-% 2. Use @code{foo} for variables and @code{foo()} for functions
-% 3. Standardize the error messages from the functions and programs
-% in Chapters 12 and 13.
-% 4. Nuke the BBS stuff and use something that won't be obsolete
+% 1. Standardize the error messages from the functions and programs
+% in the two sample code chapters.
+% 2. Nuke the BBS stuff and use something that won't be obsolete