author Arnold D. Robbins <arnold@skeeve.com> 2012-11-04 15:14:34 +0200
committer Arnold D. Robbins <arnold@skeeve.com> 2012-11-04 15:14:34 +0200
commit 2204f38c05fef5747b8f6764a202b646f4126338
tree a24b6a63658bb9c0ef4baf989061f65959496992
parent 5d3c11459bf9c8870cfc599722118b910aa17394
Finally! Integrated API chapter into gawk doc.
-rw-r--r-- doc/ChangeLog 4
-rw-r--r-- doc/gawk.info 5197
-rw-r--r-- doc/gawk.texi 4521
3 files changed, 8154 insertions, 1568 deletions
diff --git a/doc/ChangeLog b/doc/ChangeLog
index c05f5586..5aa8f674 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2012-11-04 Arnold D. Robbins <arnold@skeeve.com>
+
+ * gawk.texi: New chapter on extension API.
+
2012-11-03 Arnold D. Robbins <arnold@skeeve.com>
* api-figure1.pdf, api-figure2.pdf, api-figure3.pdf: Removed.
diff --git a/doc/gawk.info b/doc/gawk.info
index baa064d7..73fac121 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -113,419 +113,531 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
* GNU Free Documentation License:: The license for this Info file.
* Index:: Concept and Variable Index.
-* History:: The history of `gawk' and
- `awk'.
-* Names:: What name to use to find `awk'.
-* This Manual:: Using this Info file. Includes
- sample input files that you can use.
-* Conventions:: Typographical Conventions.
-* Manual History:: Brief history of the GNU project and this
- Info file.
-* How To Contribute:: Helping to save the world.
-* Acknowledgments:: Acknowledgments.
-* Running gawk:: How to run `gawk' programs;
- includes command-line syntax.
-* One-shot:: Running a short throwaway `awk'
- program.
-* Read Terminal:: Using no input files (input from terminal
- instead).
-* Long:: Putting permanent `awk' programs in
- files.
-* Executable Scripts:: Making self-contained `awk'
- programs.
-* Comments:: Adding documentation to `gawk'
- programs.
-* Quoting:: More discussion of shell quoting issues.
-* DOS Quoting:: Quoting in Windows Batch Files.
-* Sample Data Files:: Sample data files for use in the
- `awk' programs illustrated in this
- Info file.
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example using two
- rules.
-* More Complex:: A more complex example.
-* Statements/Lines:: Subdividing or combining statements into
- lines.
-* Other Features:: Other Features of `awk'.
-* When:: When to use `gawk' and when to use
- other things.
-* Command Line:: How to run `awk'.
-* Options:: Command-line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* Naming Standard Input:: How to specify standard input with other
- files.
-* Environment Variables:: The environment variables `gawk'
- uses.
-* AWKPATH Variable:: Searching directories for `awk'
- programs.
-* AWKLIBPATH Variable:: Searching directories for `awk'
- shared libraries.
-* Other Environment Variables:: The environment variables.
-* Exit Status:: `gawk''s exit status.
-* Include Files:: Including other files into your program.
-* Loading Shared Libraries:: Loading shared libraries into your program.
-* Obsolete:: Obsolete Options and/or features.
-* Undocumented:: Undocumented Options and Features.
-* Regexp Usage:: How to Use Regular Expressions.
-* Escape Sequences:: How to write nonprinting characters.
-* Regexp Operators:: Regular Expression Operators.
-* Bracket Expressions:: What can go between `[...]'.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
-* Leftmost Longest:: How much text matches.
-* Computed Regexps:: Using Dynamic Regexps.
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Nonconstant Fields:: Nonconstant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Default Field Splitting:: How fields are normally separated.
-* Regexp Field Splitting:: Using regexps as the field separator.
-* Single Character Fields:: Making each character a separate field.
-* Command Line Field Separator:: Setting `FS' from the command-line.
-* Field Splitting Summary:: Some final points and a summary table.
-* Constant Size:: Reading constant width data.
-* Splitting By Content:: Defining Fields By Content
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program
- control using the `getline' function.
-* Plain Getline:: Using `getline' with no arguments.
-* Getline/Variable:: Using `getline' into a variable.
-* Getline/File:: Using `getline' from a file.
-* Getline/Variable/File:: Using `getline' into a variable from a
- file.
-* Getline/Pipe:: Using `getline' from a pipe.
-* Getline/Variable/Pipe:: Using `getline' into a variable from a
- pipe.
-* Getline/Coprocess:: Using `getline' from a coprocess.
-* Getline/Variable/Coprocess:: Using `getline' into a variable from a
- coprocess.
-* Getline Notes:: Important things to know about
- `getline'.
-* Getline Summary:: Summary of `getline' Variants.
-* Read Timeout:: Reading input with a timeout.
-* Command line directories:: What happens if you put a directory on the
- command line.
-* Print:: The `print' statement.
-* Print Examples:: Simple examples of `print' statements.
-* Output Separators:: The output separators and how to change
- them.
-* OFMT:: Controlling Numeric Output With
- `print'.
-* Printf:: The `printf' statement.
-* Basic Printf:: Syntax of the `printf' statement.
-* Control Letters:: Format-control letters.
-* Format Modifiers:: Format-specification modifiers.
-* Printf Examples:: Several examples.
-* Redirection:: How to redirect output to multiple files
- and pipes.
-* Special Files:: File name interpretation in `gawk'.
- `gawk' allows access to inherited
- file descriptors.
-* Special FD:: Special files for I/O.
-* Special Network:: Special files for network communications.
-* Special Caveats:: Things to watch out for.
-* Close Files And Pipes:: Closing Input and Output Files and Pipes.
-* Values:: Constants, Variables, and Regular
- Expressions.
-* Constants:: String, numeric and regexp constants.
-* Scalar Constants:: Numeric and string constants.
-* Nondecimal-numbers:: What are octal and hex numbers.
-* Regexp Constants:: Regular Expression constants.
-* Using Constant Regexps:: When and how to use a regexp constant.
-* Variables:: Variables give names to values for later
- use.
-* Using Variables:: Using variables in your programs.
-* Assignment Options:: Setting variables on the command-line and a
- summary of command-line syntax. This is an
- advanced method of input.
-* Conversion:: The conversion of strings to numbers and
- vice versa.
-* All Operators:: `gawk''s operators.
-* Arithmetic Ops:: Arithmetic operations (`+', `-',
- etc.)
-* Concatenation:: Concatenating strings.
-* Assignment Ops:: Changing the value of a variable or a
- field.
-* Increment Ops:: Incrementing the numeric value of a
- variable.
-* Truth Values and Conditions:: Testing for true and false.
-* Truth Values:: What is ``true'' and what is ``false''.
-* Typing and Comparison:: How variables acquire types and how this
- affects comparison of numbers and strings
- with `<', etc.
-* Variable Typing:: String type versus numeric type.
-* Comparison Operators:: The comparison operators.
-* POSIX String Comparison:: String comparison with POSIX rules.
-* Boolean Ops:: Combining comparison expressions using
- boolean operators `||' (``or''),
- `&&' (``and'') and `!' (``not'').
-* Conditional Exp:: Conditional expressions select between two
- subexpressions under control of a third
- subexpression.
-* Function Calls:: A function call is an expression.
-* Precedence:: How various operators nest.
-* Locales:: How the locale affects things.
-* Pattern Overview:: What goes into a pattern.
-* Regexp Patterns:: Using regexps as patterns.
-* Expression Patterns:: Any expression can be used as a pattern.
-* Ranges:: Pairs of patterns specify record ranges.
-* BEGIN/END:: Specifying initialization and cleanup
- rules.
-* Using BEGIN/END:: How and why to use BEGIN/END rules.
-* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
-* BEGINFILE/ENDFILE:: Two special patterns for advanced control.
-* Empty:: The empty pattern, which matches every
- record.
-* Using Shell Variables:: How to use shell variables with
- `awk'.
-* Action Overview:: What goes into an action.
-* Statements:: Describes the various control statements in
- detail.
-* If Statement:: Conditionally execute some `awk'
- statements.
-* While Statement:: Loop until some condition is satisfied.
-* Do Statement:: Do specified action while looping until
- some condition is satisfied.
-* For Statement:: Another looping statement, that provides
- initialization and increment clauses.
-* Switch Statement:: Switch/case evaluation for conditional
- execution of statements based on a value.
-* Break Statement:: Immediately exit the innermost enclosing
- loop.
-* Continue Statement:: Skip to the end of the innermost enclosing
- loop.
-* Next Statement:: Stop processing the current input record.
-* Nextfile Statement:: Stop processing the current file.
-* Exit Statement:: Stop execution of `awk'.
-* Built-in Variables:: Summarizes the built-in variables.
-* User-modified:: Built-in variables that you change to
- control `awk'.
-* Auto-set:: Built-in variables where `awk'
- gives you information.
-* ARGC and ARGV:: Ways to use `ARGC' and `ARGV'.
-* Array Basics:: The basics of arrays.
-* Array Intro:: Introduction to Arrays
-* Reference to Elements:: How to examine one element of an array.
-* Assigning Elements:: How to change an element of an array.
-* Array Example:: Basic Example of an Array
-* Scanning an Array:: A variation of the `for' statement. It
- loops through the indices of an array's
- existing elements.
-* Controlling Scanning:: Controlling the order in which arrays are
- scanned.
-* Delete:: The `delete' statement removes an
- element from an array.
-* Numeric Array Subscripts:: How to use numbers as subscripts in
- `awk'.
-* Uninitialized Subscripts:: Using Uninitialized variables as
- subscripts.
-* Multi-dimensional:: Emulating multidimensional arrays in
- `awk'.
-* Multi-scanning:: Scanning multidimensional arrays.
-* Arrays of Arrays:: True multidimensional arrays.
-* Built-in:: Summarizes the built-in functions.
-* Calling Built-in:: How to call built-in functions.
-* Numeric Functions:: Functions that work with numbers, including
- `int()', `sin()' and
- `rand()'.
-* String Functions:: Functions for string manipulation, such as
- `split()', `match()' and
- `sprintf()'.
-* Gory Details:: More than you want to know about `\'
- and `&' with `sub()',
- `gsub()', and `gensub()'.
-* I/O Functions:: Functions for files and shell commands.
-* Time Functions:: Functions for dealing with timestamps.
-* Bitwise Functions:: Functions for bitwise operations.
-* Type Functions:: Functions for type information.
-* I18N Functions:: Functions for string translation.
-* User-defined:: Describes User-defined functions in detail.
-* Definition Syntax:: How to write definitions and what they
- mean.
-* Function Example:: An example function definition and what it
- does.
-* Function Caveats:: Things to watch out for.
-* Calling A Function:: Don't use spaces.
-* Variable Scope:: Controlling variable scope.
-* Pass By Value/Reference:: Passing parameters.
-* Return Statement:: Specifying the value a function returns.
-* Dynamic Typing:: How variable types can change at runtime.
-* Indirect Calls:: Choosing the function to call at runtime.
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU `gettext' works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging `printf' arguments.
-* I18N Portability:: `awk'-level portability issues.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: `gawk' is also internationalized.
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal
- and sorting arrays.
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use `asort()' and
- `asorti()'.
-* Two-way I/O:: Two-way communications with another
- process.
-* TCP/IP Networking:: Using `gawk' for network
- programming.
-* Profiling:: Profiling your `awk' programs.
-* Library Names:: How to best name private global variables
- in library functions.
-* General Functions:: Functions that are of general use.
-* Strtonum Function:: A replacement for the built-in
- `strtonum()' function.
-* Assert Function:: A function for assertions in `awk'
- programs.
-* Round Function:: A function for rounding if `sprintf()'
- does not do it correctly.
-* Cliff Random Function:: The Cliff Random Number Generator.
-* Ordinal Functions:: Functions for using characters as numbers
- and vice versa.
-* Join Function:: A function to join an array into a string.
-* Getlocaltime Function:: A function to get formatted times.
-* Data File Management:: Functions for managing command-line data
- files.
-* Filetrans Function:: A function for handling data file
- transitions.
-* Rewind Function:: A function for rereading the current file.
-* File Checking:: Checking that data files are readable.
-* Empty Files:: Checking for zero-length files.
-* Ignoring Assigns:: Treating assignments as file names.
-* Getopt Function:: A function for processing command-line
- arguments.
-* Passwd Functions:: Functions for getting user information.
-* Group Functions:: Functions for getting group information.
-* Walking Arrays:: A function to walk arrays of arrays.
-* Running Examples:: How to run these examples.
-* Clones:: Clones of common utilities.
-* Cut Program:: The `cut' utility.
-* Egrep Program:: The `egrep' utility.
-* Id Program:: The `id' utility.
-* Split Program:: The `split' utility.
-* Tee Program:: The `tee' utility.
-* Uniq Program:: The `uniq' utility.
-* Wc Program:: The `wc' utility.
-* Miscellaneous Programs:: Some interesting `awk' programs.
-* Dupword Program:: Finding duplicated words in a document.
-* Alarm Program:: An alarm clock.
-* Translate Program:: A program similar to the `tr'
- utility.
-* Labels Program:: Printing mailing labels.
-* Word Sorting:: A program to produce a word usage count.
-* History Sorting:: Eliminating duplicate entries from a
- history file.
-* Extract Program:: Pulling out programs from Texinfo source
- files.
-* Simple Sed:: A Simple Stream Editor.
-* Igawk Program:: A wrapper for `awk' that includes
- files.
-* Anagram Program:: Finding anagrams from a dictionary.
-* Signature Program:: People do amazing things with too much time
- on their hands.
-* Debugging:: Introduction to `gawk' debugger.
-* Debugging Concepts:: Debugging in General.
-* Debugging Terms:: Additional Debugging Concepts.
-* Awk Debugging:: Awk Debugging.
-* Sample Debugging Session:: Sample debugging session.
-* Debugger Invocation:: How to Start the Debugger.
-* Finding The Bug:: Finding the Bug.
-* List of Debugger Commands:: Main debugger commands.
-* Breakpoint Control:: Control of Breakpoints.
-* Debugger Execution Control:: Control of Execution.
-* Viewing And Changing Data:: Viewing and Changing Data.
-* Execution Stack:: Dealing with the Stack.
-* Debugger Info:: Obtaining Information about the Program and
- the Debugger State.
-* Miscellaneous Debugger Commands:: Miscellaneous Commands.
-* Readline Support:: Readline support.
-* Limitations:: Limitations and future plans.
-* General Arithmetic:: An introduction to computer arithmetic.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
-* Integer Programming:: Effective integer programming.
-* Floating-point Programming:: Effective Floating-point Programming.
-* Floating-point Representation:: Binary floating-point representation.
-* Floating-point Context:: Floating-point context.
-* Rounding Mode:: Floating-point rounding mode.
-* Gawk and MPFR:: How `gawk' provides
- arbitrary-precision arithmetic.
-* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
- Arithmetic with `gawk'.
-* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode.
-* Floating-point Constants:: Representing floating-point constants.
-* Changing Precision:: Changing the precision of a number.
-* Exact Arithmetic:: Exact arithmetic with floating-point
- numbers.
-* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
- `gawk'.
-* Plugin License:: A note about licensing.
-* Sample Library:: A example of new functions.
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-* V7/SVR3.1:: The major changes between V7 and System V
- Release 3.1.
-* SVR4:: Minor changes between System V Releases 3.1
- and 4.
-* POSIX:: New features from the POSIX standard.
-* BTL:: New features from Brian Kernighan's version
- of `awk'.
-* POSIX/GNU:: The extensions in `gawk' not in
- POSIX `awk'.
-* Common Extensions:: Common Extensions Summary.
-* Ranges and Locales:: How locales used to affect regexp ranges.
-* Contributors:: The major contributors to `gawk'.
-* Gawk Distribution:: What is in the `gawk' distribution.
-* Getting:: How to get the distribution.
-* Extracting:: How to extract the distribution.
-* Distribution contents:: What is in the distribution.
-* Unix Installation:: Installing `gawk' under various
- versions of Unix.
-* Quick Installation:: Compiling `gawk' under Unix.
-* Additional Configuration Options:: Other compile-time options.
-* Configuration Philosophy:: How it's all supposed to work.
-* Non-Unix Installation:: Installation on Other Operating Systems.
-* PC Installation:: Installing and Compiling `gawk' on
- MS-DOS and OS/2.
-* PC Binary Installation:: Installing a prepared distribution.
-* PC Compiling:: Compiling `gawk' for MS-DOS,
- Windows32, and OS/2.
-* PC Testing:: Testing `gawk' on PC systems.
-* PC Using:: Running `gawk' on MS-DOS, Windows32
- and OS/2.
-* Cygwin:: Building and running `gawk' for
- Cygwin.
-* MSYS:: Using `gawk' In The MSYS
- Environment.
-* VMS Installation:: Installing `gawk' on VMS.
-* VMS Compilation:: How to compile `gawk' under VMS.
-* VMS Installation Details:: How to install `gawk' under VMS.
-* VMS Running:: How to run `gawk' under VMS.
-* VMS Old Gawk:: An old version comes with some VMS systems.
-* Bugs:: Reporting Problems and Bugs.
-* Other Versions:: Other freely available `awk'
- implementations.
-* Compatibility Mode:: How to disable certain `gawk'
- extensions.
-* Additions:: Making Additions To `gawk'.
-* Accessing The Source:: Accessing the Git repository.
-* Adding Code:: Adding code to the main body of
- `gawk'.
-* New Ports:: Porting `gawk' to a new operating
- system.
-* Derived Files:: Why derived files are kept in the
- `git' repository.
-* Future Extensions:: New features that may be implemented one
- day.
-* Basic High Level:: The high level view.
-* Basic Data Typing:: A very quick intro to data types.
+* History:: The history of `gawk' and
+ `awk'.
+* Names:: What name to use to find
+ `awk'.
+* This Manual:: Using this Info file. Includes
+ sample input files that you can use.
+* Conventions:: Typographical Conventions.
+* Manual History:: Brief history of the GNU project and
+ this Info file.
+* How To Contribute:: Helping to save the world.
+* Acknowledgments:: Acknowledgments.
+* Running gawk:: How to run `gawk' programs;
+ includes command-line syntax.
+* One-shot:: Running a short throwaway
+ `awk' program.
+* Read Terminal:: Using no input files (input from
+ terminal instead).
+* Long:: Putting permanent `awk'
+ programs in files.
+* Executable Scripts:: Making self-contained `awk'
+ programs.
+* Comments:: Adding documentation to `gawk'
+ programs.
+* Quoting:: More discussion of shell quoting
+ issues.
+* DOS Quoting:: Quoting in Windows Batch Files.
+* Sample Data Files:: Sample data files for use in the
+ `awk' programs illustrated in
+ this Info file.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example using
+ two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements
+ into lines.
+* Other Features:: Other Features of `awk'.
+* When:: When to use `gawk' and when to
+ use other things.
+* Command Line:: How to run `awk'.
+* Options:: Command-line options and their
+ meanings.
+* Other Arguments:: Input file names and variable
+ assignments.
+* Naming Standard Input:: How to specify standard input with
+ other files.
+* Environment Variables:: The environment variables
+ `gawk' uses.
+* AWKPATH Variable:: Searching directories for
+ `awk' programs.
+* AWKLIBPATH Variable:: Searching directories for
+ `awk' shared libraries.
+* Other Environment Variables:: The environment variables.
+* Exit Status:: `gawk''s exit status.
+* Include Files:: Including other files into your
+ program.
+* Loading Shared Libraries:: Loading shared libraries into your
+ program.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write nonprinting characters.
+* Regexp Operators:: Regular Expression Operators.
+* Bracket Expressions:: What can go between `[...]'.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+* Records:: Controlling how data is split into
+ records.
+* Fields:: An introduction to fields.
+* Nonconstant Fields:: Nonconstant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change
+ it.
+* Default Field Splitting:: How fields are normally separated.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate
+ field.
+* Command Line Field Separator:: Setting `FS' from the
+ command-line.
+* Field Splitting Summary:: Some final points and a summary table.
+* Constant Size:: Reading constant width data.
+* Splitting By Content:: Defining Fields By Content
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program
+ control using the `getline'
+ function.
+* Plain Getline:: Using `getline' with no
+ arguments.
+* Getline/Variable:: Using `getline' into a variable.
+* Getline/File:: Using `getline' from a file.
+* Getline/Variable/File:: Using `getline' into a variable
+ from a file.
+* Getline/Pipe:: Using `getline' from a pipe.
+* Getline/Variable/Pipe:: Using `getline' into a variable
+ from a pipe.
+* Getline/Coprocess:: Using `getline' from a coprocess.
+* Getline/Variable/Coprocess:: Using `getline' into a variable
+ from a coprocess.
+* Getline Notes:: Important things to know about
+ `getline'.
+* Getline Summary:: Summary of `getline' Variants.
+* Read Timeout:: Reading input with a timeout.
+* Command line directories:: What happens if you put a directory on
+ the command line.
+* Print:: The `print' statement.
+* Print Examples:: Simple examples of `print'
+ statements.
+* Output Separators:: The output separators and how to
+ change them.
+* OFMT:: Controlling Numeric Output With
+ `print'.
+* Printf:: The `printf' statement.
+* Basic Printf:: Syntax of the `printf' statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+* Redirection:: How to redirect output to multiple
+ files and pipes.
+* Special Files:: File name interpretation in
+ `gawk'. `gawk' allows
+ access to inherited file descriptors.
+* Special FD:: Special files for I/O.
+* Special Network:: Special files for network
+ communications.
+* Special Caveats:: Things to watch out for.
+* Close Files And Pipes:: Closing Input and Output Files and
+ Pipes.
+* Values:: Constants, Variables, and Regular
+ Expressions.
+* Constants:: String, numeric and regexp constants.
+* Scalar Constants:: Numeric and string constants.
+* Nondecimal-numbers:: What are octal and hex numbers.
+* Regexp Constants:: Regular Expression constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for
+ later use.
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command-line
+ and a summary of command-line syntax.
+ This is an advanced method of input.
+* Conversion:: The conversion of strings to numbers
+ and vice versa.
+* All Operators:: `gawk''s operators.
+* Arithmetic Ops:: Arithmetic operations (`+',
+ `-', etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a
+ field.
+* Increment Ops:: Incrementing the numeric value of a
+ variable.
+* Truth Values and Conditions:: Testing for true and false.
+* Truth Values:: What is ``true'' and what is
+ ``false''.
+* Typing and Comparison:: How variables acquire types and how
+ this affects comparison of numbers and
+ strings with `<', etc.
+* Variable Typing:: String type versus numeric type.
+* Comparison Operators:: The comparison operators.
+* POSIX String Comparison:: String comparison with POSIX rules.
+* Boolean Ops:: Combining comparison expressions using
+ boolean operators `||' (``or''),
+ `&&' (``and'') and `!'
+ (``not'').
+* Conditional Exp:: Conditional expressions select between
+ two subexpressions under control of a
+ third subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+* Locales:: How the locale affects things.
+* Pattern Overview:: What goes into a pattern.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a
+ pattern.
+* Ranges:: Pairs of patterns specify record
+ ranges.
+* BEGIN/END:: Specifying initialization and cleanup
+ rules.
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+* BEGINFILE/ENDFILE:: Two special patterns for advanced
+ control.
+* Empty:: The empty pattern, which matches every
+ record.
+* Using Shell Variables:: How to use shell variables with
+ `awk'.
+* Action Overview:: What goes into an action.
+* Statements:: Describes the various control
+ statements in detail.
+* If Statement:: Conditionally execute some
+ `awk' statements.
+* While Statement:: Loop until some condition is
+ satisfied.
+* Do Statement:: Do specified action while looping
+ until some condition is satisfied.
+* For Statement:: Another looping statement, that
+ provides initialization and increment
+ clauses.
+* Switch Statement:: Switch/case evaluation for conditional
+ execution of statements based on a
+ value.
+* Break Statement:: Immediately exit the innermost
+ enclosing loop.
+* Continue Statement:: Skip to the end of the innermost
+ enclosing loop.
+* Next Statement:: Stop processing the current input
+ record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of `awk'.
+* Built-in Variables:: Summarizes the built-in variables.
+* User-modified:: Built-in variables that you change to
+ control `awk'.
+* Auto-set:: Built-in variables where `awk'
+ gives you information.
+* ARGC and ARGV:: Ways to use `ARGC' and
+ `ARGV'.
+* Array Basics:: The basics of arrays.
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an
+ array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the `for'
+ statement. It loops through the
+ indices of an array's existing
+ elements.
+* Controlling Scanning:: Controlling the order in which arrays
+ are scanned.
+* Delete:: The `delete' statement removes an
+ element from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ `awk'.
+* Uninitialized Subscripts:: Using Uninitialized variables as
+ subscripts.
+* Multi-dimensional:: Emulating multidimensional arrays in
+ `awk'.
+* Multi-scanning:: Scanning multidimensional arrays.
+* Arrays of Arrays:: True multidimensional arrays.
+* Built-in:: Summarizes the built-in functions.
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers,
+ including `int()', `sin()'
+ and `rand()'.
+* String Functions:: Functions for string manipulation,
+ such as `split()', `match()'
+ and `sprintf()'.
+* Gory Details:: More than you want to know about
+ `\' and `&' with
+ `sub()', `gsub()', and
+ `gensub()'.
+* I/O Functions:: Functions for files and shell
+ commands.
+* Time Functions:: Functions for dealing with timestamps.
+* Bitwise Functions:: Functions for bitwise operations.
+* Type Functions:: Functions for type information.
+* I18N Functions:: Functions for string translation.
+* User-defined:: Describes User-defined functions in
+ detail.
+* Definition Syntax:: How to write definitions and what they
+ mean.
+* Function Example:: An example function definition and
+ what it does.
+* Function Caveats:: Things to watch out for.
+* Calling A Function:: Don't use spaces.
+* Variable Scope:: Controlling variable scope.
+* Pass By Value/Reference:: Passing parameters.
+* Return Statement:: Specifying the value a function
+ returns.
+* Dynamic Typing:: How variable types can change at
+ runtime.
+* Indirect Calls:: Choosing the function to call at
+ runtime.
+* I18N and L10N:: Internationalization and Localization.
+* Explaining gettext:: How GNU `gettext' works.
+* Programmer i18n:: Features for the programmer.
+* Translator i18n:: Features for the translator.
+* String Extraction:: Extracting marked strings.
+* Printf Ordering:: Rearranging `printf' arguments.
+* I18N Portability:: `awk'-level portability
+ issues.
+* I18N Example:: A simple i18n example.
+* Gawk I18N:: `gawk' is also
+ internationalized.
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array
+ traversal and sorting arrays.
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions:: How to use `asort()' and
+ `asorti()'.
+* Two-way I/O:: Two-way communications with another
+ process.
+* TCP/IP Networking:: Using `gawk' for network
+ programming.
+* Profiling:: Profiling your `awk' programs.
+* Library Names:: How to best name private global
+ variables in library functions.
+* General Functions:: Functions that are of general use.
+* Strtonum Function:: A replacement for the built-in
+ `strtonum()' function.
+* Assert Function:: A function for assertions in
+ `awk' programs.
+* Round Function:: A function for rounding if
+ `sprintf()' does not do it
+ correctly.
+* Cliff Random Function:: The Cliff Random Number Generator.
+* Ordinal Functions:: Functions for using characters as
+ numbers and vice versa.
+* Join Function:: A function to join an array into a
+ string.
+* Getlocaltime Function:: A function to get formatted times.
+* Data File Management:: Functions for managing command-line
+ data files.
+* Filetrans Function:: A function for handling data file
+ transitions.
+* Rewind Function:: A function for rereading the current
+ file.
+* File Checking:: Checking that data files are readable.
+* Empty Files:: Checking for zero-length files.
+* Ignoring Assigns:: Treating assignments as file names.
+* Getopt Function:: A function for processing command-line
+ arguments.
+* Passwd Functions:: Functions for getting user
+ information.
+* Group Functions:: Functions for getting group
+ information.
+* Walking Arrays:: A function to walk arrays of arrays.
+* Running Examples:: How to run these examples.
+* Clones:: Clones of common utilities.
+* Cut Program:: The `cut' utility.
+* Egrep Program:: The `egrep' utility.
+* Id Program:: The `id' utility.
+* Split Program:: The `split' utility.
+* Tee Program:: The `tee' utility.
+* Uniq Program:: The `uniq' utility.
+* Wc Program:: The `wc' utility.
+* Miscellaneous Programs:: Some interesting `awk'
+ programs.
+* Dupword Program:: Finding duplicated words in a
+ document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the `tr'
+ utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage
+ count.
+* History Sorting:: Eliminating duplicate entries from a
+ history file.
+* Extract Program:: Pulling out programs from Texinfo
+ source files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for `awk' that
+ includes files.
+* Anagram Program:: Finding anagrams from a dictionary.
+* Signature Program:: People do amazing things with too much
+ time on their hands.
+* Debugging:: Introduction to `gawk'
+ debugger.
+* Debugging Concepts:: Debugging in General.
+* Debugging Terms:: Additional Debugging Concepts.
+* Awk Debugging:: Awk Debugging.
+* Sample Debugging Session:: Sample debugging session.
+* Debugger Invocation:: How to Start the Debugger.
+* Finding The Bug:: Finding the Bug.
+* List of Debugger Commands:: Main debugger commands.
+* Breakpoint Control:: Control of Breakpoints.
+* Debugger Execution Control:: Control of Execution.
+* Viewing And Changing Data:: Viewing and Changing Data.
+* Execution Stack:: Dealing with the Stack.
+* Debugger Info:: Obtaining Information about the
+ Program and the Debugger State.
+* Miscellaneous Debugger Commands:: Miscellaneous Commands.
+* Readline Support:: Readline support.
+* Limitations:: Limitations and future plans.
+* General Arithmetic:: An introduction to computer
+ arithmetic.
+* Floating Point Issues:: Stuff to know about floating-point
+ numbers.
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not
+ Abstract Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+* Integer Programming:: Effective integer programming.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Floating-point Representation:: Binary floating-point representation.
+* Floating-point Context:: Floating-point context.
+* Rounding Mode:: Floating-point rounding mode.
+* Gawk and MPFR:: How `gawk' provides
+ arbitrary-precision arithmetic.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
+ Arithmetic with `gawk'.
+* Setting Precision:: Setting the working precision.
+* Setting Rounding Mode:: Setting the rounding mode.
+* Floating-point Constants:: Representing floating-point constants.
+* Changing Precision:: Changing the precision of a number.
+* Exact Arithmetic:: Exact arithmetic with floating-point
+ numbers.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic
+ with `gawk'.
+* Extension Intro:: What is an extension.
+* Plugin License:: A note about licensing.
+* Extension Design:: Design notes about the extension API.
+* Old Extension Problems:: Problems with the old mechanism.
+* Extension New Mechanism Goals:: Goals for the new mechanism.
+* Extension Other Design Decisions:: Some other design decisions.
+* Extension Mechanism Outline:: An outline of how it works.
+* Extension Future Growth:: Some room for future growth.
+* Extension API Description:: A full description of the API.
+* Extension API Functions Introduction:: Introduction to the API functions.
+* General Data Types:: The data types.
+* Requesting Values:: How to get a value.
+* Constructor Functions:: Functions for creating values.
+* Registration Functions:: Functions to register things with
+ `gawk'.
+* Extension Functions:: Registering extension functions.
+* Exit Callback Functions:: Registering an exit callback.
+* Extension Version String:: Registering a version string.
+* Input Parsers:: Registering an input parser.
+* Output Wrappers:: Registering an output wrapper.
+* Two-way processors:: Registering a two-way processor.
+* Printing Messages:: Functions for printing messages.
+* Updating `ERRNO':: Functions for updating `ERRNO'.
+* Accessing Parameters:: Functions for accessing parameters.
+* Symbol Table Access:: Functions for accessing global
+ variables.
+* Symbol table by name:: Accessing variables by name.
+* Symbol table by cookie:: Accessing variables by ``cookie''.
+* Cached values:: Creating and using cached values.
+* Array Manipulation:: Functions for working with arrays.
+* Array Data Types:: Data types for working with arrays.
+* Array Functions:: Functions for working with arrays.
+* Flattening Arrays:: How to flatten arrays.
+* Creating Arrays:: How to create and populate arrays.
+* Extension API Variables:: Variables provided by the API.
+* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about
+ `gawk''s invocation.
+* Extension API Boilerplate:: Boilerplate code for using the API.
+* Finding Extensions:: How `gawk' finds compiled
+ extensions.
+* Extension Example:: Example C code for an extension.
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
+* Extension Samples:: The sample extensions that ship with
+ `gawk'.
+* Extension Sample File Functions:: The file functions sample.
+* Extension Sample Fnmatch:: An interface to `fnmatch()'.
+* Extension Sample Fork:: An interface to `fork()' and
+ other process functions.
+* Extension Sample Ord:: Character to value to character
+ conversions.
+* Extension Sample Readdir:: An interface to `readdir()'.
+* Extension Sample Revout:: Reversing output sample output
+ wrapper.
+* Extension Sample Rev2way:: Reversing data sample two-way
+ processor.
+* Extension Sample Read write array:: Serializing an array to a file.
+* Extension Sample Readfile:: Reading an entire file into a string.
+* Extension Sample API Tests:: Tests for the API.
+* Extension Sample Time:: An interface to `gettimeofday()'
+ and `sleep()'.
+* gawkextlib:: The `gawkextlib' project.
+* V7/SVR3.1:: The major changes between V7 and
+ System V Release 3.1.
+* SVR4:: Minor changes between System V
+ Releases 3.1 and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from Brian Kernighan's
+ version of `awk'.
+* POSIX/GNU:: The extensions in `gawk' not
+ in POSIX `awk'.
+* Common Extensions:: Common Extensions Summary.
+* Ranges and Locales:: How locales used to affect regexp
+ ranges.
+* Contributors:: The major contributors to
+ `gawk'.
+* Gawk Distribution:: What is in the `gawk'
+ distribution.
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+* Unix Installation:: Installing `gawk' under
+ various versions of Unix.
+* Quick Installation:: Compiling `gawk' under Unix.
+* Additional Configuration Options:: Other compile-time options.
+* Configuration Philosophy:: How it's all supposed to work.
+* Non-Unix Installation:: Installation on Other Operating
+ Systems.
+* PC Installation:: Installing and Compiling
+ `gawk' on MS-DOS and OS/2.
+* PC Binary Installation:: Installing a prepared distribution.
+* PC Compiling:: Compiling `gawk' for MS-DOS,
+ Windows32, and OS/2.
+* PC Testing:: Testing `gawk' on PC systems.
+* PC Using:: Running `gawk' on MS-DOS,
+ Windows32 and OS/2.
+* Cygwin:: Building and running `gawk'
+ for Cygwin.
+* MSYS:: Using `gawk' In The MSYS
+ Environment.
+* VMS Installation:: Installing `gawk' on VMS.
+* VMS Compilation:: How to compile `gawk' under
+ VMS.
+* VMS Installation Details:: How to install `gawk' under
+ VMS.
+* VMS Running:: How to run `gawk' under VMS.
+* VMS Old Gawk:: An old version comes with some VMS
+ systems.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available `awk'
+ implementations.
+* Compatibility Mode:: How to disable certain `gawk'
+ extensions.
+* Additions:: Making Additions To `gawk'.
+* Accessing The Source:: Accessing the Git repository.
+* Adding Code:: Adding code to the main body of
+ `gawk'.
+* New Ports:: Porting `gawk' to a new
+ operating system.
+* Derived Files:: Why derived files are kept in the
+ `git' repository.
+* Future Extensions:: New features that may be implemented
+ one day.
+* Basic High Level:: The high level view.
+* Basic Data Typing:: A very quick intro to data types.
To Miriam, for making me complete.
@@ -21180,34 +21292,66 @@ File: gawk.info, Node: Dynamic Extensions, Next: Language History, Prev: Arbi
16 Writing Extensions for `gawk'
********************************
-This chapter is a placeholder, pending a rewrite for the new API. Some
-of the old bits remain, since they can be partially reused.
-
- It is possible to add new built-in functions to `gawk' using
+It is possible to add new built-in functions to `gawk' using
dynamically loaded libraries. This facility is available on systems
(such as GNU/Linux) that support the C `dlopen()' and `dlsym()'
-functions. This major node describes how to write and use dynamically
-loaded extensions for `gawk'. Experience with programming in C or C++
-is necessary when reading this minor node.
+functions. This major node describes how to create extensions using
+code written in C or C++. If you don't know anything about C
+programming, you can safely skip this major node, although you may wish
+to review the documentation on the extensions that come with `gawk'
+(*note Extension Samples::), and the section on the `gawkextlib'
+project (*note gawkextlib::).
NOTE: When `--sandbox' is specified, extensions are disabled
- (*note Options::.
+ (*note Options::).
* Menu:
+* Extension Intro:: What is an extension.
* Plugin License:: A note about licensing.
-* Sample Library:: A example of new functions.
+* Extension Design:: Design notes about the extension API.
+* Extension API Description:: A full description of the API.
+* Extension Example:: Example C code for an extension.
+* Extension Samples:: The sample extensions that ship with
+ `gawk'.
+* gawkextlib:: The `gawkextlib' project.
+
+
+File: gawk.info, Node: Extension Intro, Next: Plugin License, Up: Dynamic Extensions
+
+16.1 Introduction
+=================
+
+An "extension" (sometimes called a "plug-in") is a piece of external
+compiled code that `gawk' can load at runtime to provide additional
+functionality, over and above the built-in capabilities described in
+the rest of this Info file.
+
+ Extensions are useful because they allow you (of course) to extend
+`gawk''s functionality. For example, they can provide access to system
+calls (such as `chdir()' to change directory) and to other C library
+routines that could be of use. As with most software, "the sky is the
+limit;" if you can imagine something that you might want to do and can
+write in C or C++, you can write an extension to do it!
+
+ Extensions are written in C or C++, using the "Application
+Programming Interface" (API) defined for this purpose by the `gawk'
+developers. The rest of this major node explains the design decisions
+behind the API, the facilities it provides and how to use them, and
+presents a small sample extension. In addition, it documents the
+sample extensions included in the `gawk' distribution, and describes
+the `gawkextlib' project.

-File: gawk.info, Node: Plugin License, Next: Sample Library, Up: Dynamic Extensions
+File: gawk.info, Node: Plugin License, Next: Extension Design, Prev: Extension Intro, Up: Dynamic Extensions
-16.1 Extension Licensing
+16.2 Extension Licensing
========================
Every dynamic extension should define the global symbol
`plugin_is_GPL_compatible' to assert that it has been licensed under a
-GPL-compatible license. If this symbol does not exist, `gawk' will
-emit a fatal error and exit.
+GPL-compatible license. If this symbol does not exist, `gawk' emits a
+fatal error and exits when it tries to load your extension.
The declared type of the symbol should be `int'. It does not need
to be in any allocated section, though. The code merely asserts that
@@ -21216,15 +21360,2213 @@ the symbol exists in the global scope. Something like this is enough:
int plugin_is_GPL_compatible;

-File: gawk.info, Node: Sample Library, Prev: Plugin License, Up: Dynamic Extensions
+File: gawk.info, Node: Extension Design, Next: Extension API Description, Prev: Plugin License, Up: Dynamic Extensions
-16.2 Example: Directory and File Operation Built-ins
-====================================================
+16.3 Extension API Design
+=========================
+
+The first version of extensions for `gawk' was developed in the
+mid-1990s and released with `gawk' 3.1 in the late 1990s. The basic
+mechanisms and design remained unchanged for close to 15 years, until
+2012.
+
+ The old extension mechanism used data types and functions from
+`gawk' itself, with a "clever hack" to install extension functions.
+
+ `gawk' included some sample extensions, of which a few were really
+useful. However, it was clear from the outset that the extension
+mechanism was bolted onto the side and was not really thought out.
+
+* Menu:
+
+* Old Extension Problems:: Problems with the old mechanism.
+* Extension New Mechanism Goals:: Goals for the new mechanism.
+* Extension Other Design Decisions:: Some other design decisions.
+* Extension Mechanism Outline:: An outline of how it works.
+* Extension Future Growth:: Some room for future growth.
+
+
+File: gawk.info, Node: Old Extension Problems, Next: Extension New Mechanism Goals, Up: Extension Design
+
+16.3.1 Problems With The Old Mechanism
+--------------------------------------
+
+The old extension mechanism had several problems:
+
+ * It depended heavily upon `gawk' internals. Any time the `NODE'
+ structure(1) changed, an extension would have to be recompiled.
+ Furthermore, to really write extensions required understanding
+ something about `gawk''s internal functions. There was some
+ documentation in this Info file, but it was quite minimal.
+
+ * Being able to call into `gawk' from an extension required linker
+ facilities that are common on Unix-derived systems but that did
+ not work on Windows systems; users wanting extensions on Windows
+ had to statically link them into `gawk', even though Windows
+ supports dynamic loading of shared objects.
+
+ * The API would change occasionally as `gawk' changed; no
+ compatibility between versions was ever offered or planned for.
+
+ Despite the drawbacks, the `xgawk' project developers forked `gawk'
+and developed several significant extensions. They also enhanced
+`gawk''s facilities relating to file inclusion and shared object access.
+
+ A new API was desired for a long time, but only in 2012 did the
+`gawk' maintainer and the `xgawk' developers finally start working on
+it together. More information about the `xgawk' project is provided in
+*note gawkextlib::.
+
+ ---------- Footnotes ----------
+
+ (1) A critical central data structure inside `gawk'.
+
+
+File: gawk.info, Node: Extension New Mechanism Goals, Next: Extension Other Design Decisions, Prev: Old Extension Problems, Up: Extension Design
+
+16.3.2 Goals For A New Mechanism
+--------------------------------
+
+Some goals for the new API were:
+
+ * The API should be independent of `gawk' internals. Changes in
+ `gawk' internals should not be visible to the writer of an
+ extension function.
+
+ * The API should provide _binary_ compatibility across `gawk'
+ releases as long as the API itself does not change.
+
+ * The API should enable extensions written in C to have roughly the
+ same "appearance" to `awk'-level code as `awk' functions do. This
+ means that extensions should have:
+
+ - The ability to access function parameters.
+
+ - The ability to turn an undefined parameter into an array
+ (call by reference).
+
+ - The ability to create, access and update global variables.
+
+ - Easy access to all the elements of an array at once ("array
+ flattening") in order to loop over all the elements in an easy
+ fashion for C code.
+
+ - The ability to create arrays (including `gawk''s true
+ multi-dimensional arrays).
+
+ Some additional important goals were:
+
+ * The API should use only features in ISO C 90, so that extensions
+ can be written using the widest range of C and C++ compilers. The
+ header should include the appropriate `#ifdef __cplusplus' and
+ `extern "C"' magic so that a C++ compiler could be used. (If
+ using C++, the runtime system has to be smart enough to call any
+ constructors and destructors, as `gawk' is a C program. As of this
+ writing, this has not been tested.)
+
+ * The API mechanism should not require access to `gawk''s symbols(1)
+ by the compile-time or dynamic linker, in order to enable creation
+ of extensions that also work on Windows.
+
+ During development, it became clear that there were other features
+that should be available to extensions, which were also subsequently
+provided:
+
+ * Extensions should have the ability to hook into `gawk''s I/O
+ redirection mechanism. In particular, the `xgawk' developers
+ provided a so-called "open hook" to take over reading records.
+ During development, this was generalized to allow extensions to
+ hook into input processing, output processing, and two-way I/O.
+
+ * An extension should be able to provide a "call back" function to
+ perform clean up actions when `gawk' exits.
+
+ * An extension should be able to provide a version string so that
+ `gawk''s `--version' option can provide information about
+ extensions as well.
+
+ ---------- Footnotes ----------
+
+ (1) The "symbols" are the variables and functions defined inside
+`gawk'. Access to these symbols by code external to `gawk' loaded
+dynamically at runtime is problematic on Windows.
+
+
+File: gawk.info, Node: Extension Other Design Decisions, Next: Extension Mechanism Outline, Prev: Extension New Mechanism Goals, Up: Extension Design
+
+16.3.3 Other Design Decisions
+-----------------------------
+
+As an "arbitrary" design decision, extensions can read the values of
+built-in variables and arrays (such as `ARGV' and `FS'), but cannot
+change them, with the exception of `PROCINFO'.
+
+ The reason for this is to prevent an extension function from
+affecting the flow of an `awk' program outside its control. While a
+real `awk' function can do what it likes, that is at the discretion of
+the programmer. An extension function should provide a service or make
+a C API available for use within `awk', and not mess with `FS' or
+`ARGC' and `ARGV'.
+
+ In addition, it becomes easy to start down a slippery slope. How
+much access to `gawk' facilities do extensions need? Do they need
+`getline'? What about calling `gsub()' or compiling regular
+expressions? What about calling into `awk' functions? (_That_ would be
+messy.)
+
+ In order to avoid these issues, the `gawk' developers chose to start
+with the simplest, most basic features that are still truly useful.
+
+ Another decision is that although `gawk' provides nice things like
+MPFR, and arrays indexed internally by integers, these features are not
+being brought out to the API in order to keep things simple and close to
+traditional `awk' semantics. (In fact, arrays indexed internally by
+integers are so transparent that they aren't even documented!)
+
+ With time, the API will undoubtedly evolve; the `gawk' developers
+expect this to be driven by user needs. For now, the current API seems
+to provide a minimal yet powerful set of features for creating
+extensions.
+
+
+File: gawk.info, Node: Extension Mechanism Outline, Next: Extension Future Growth, Prev: Extension Other Design Decisions, Up: Extension Design
+
+16.3.4 At A High Level How It Works
+-----------------------------------
+
+The requirement to avoid access to `gawk''s symbols is, at first
+glance, a difficult one to meet.
+
+ One design, apparently used by Perl and Ruby and maybe others, would
+be to make the mainline `gawk' code into a library, with the `gawk'
+utility a small C `main()' function linked against the library.
+
+ This seemed like the tail wagging the dog, complicating build and
+installation and making a simple copy of the `gawk' executable from one
+system to another (or one place to another on the same system!) into a
+chancy operation.
+
+ Pat Rankin suggested the solution that was adopted. Communication
+between `gawk' and an extension is two-way. First, when an extension
+is loaded, it is passed a pointer to a `struct' whose fields are
+function pointers.
+
+ API
+ Struct
+ +---+
+ | |
+ +---+
+ +---------------| |
+ | +---+ dl_load(api_p, id);
+ | | | ___________________
+ | +---+ |
+ | +---------| | __________________ |
+ | | +---+ ||
+ | | | | ||
+ | | +---+ ||
+ | | +---| | ||
+ | | | +---+ \ || /
+ | | | \ /
+ v v v \/
++-------+-+---+-+---+-+------------------+--------------------+
+| |x| |x| |x| |OOOOOOOOOOOOOOOOOOOO|
+| |x| |x| |x| |OOOOOOOOOOOOOOOOOOOO|
+| |x| |x| |x| |OOOOOOOOOOOOOOOOOOOO|
++-------+-+---+-+---+-+------------------+--------------------+
+
+ gawk Main Program Address Space Extension
+Figure 16.1: Loading the extension
+
+ The extension can call functions inside `gawk' through these
+function pointers, at runtime, without needing (link-time) access to
+`gawk''s symbols. One of these function pointers is to a function for
+"registering" new built-in functions.
+
+ register_ext_func({ "chdir", do_chdir, 1 });
+
+ +--------------------------------------------+
+ | |
+ V |
++-------+-+---+-+---+-+------------------+--------------+-+---+
+| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO|
+| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO|
+| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO|
++-------+-+---+-+---+-+------------------+--------------+-+---+
+
+ gawk Main Program Address Space Extension
+Figure 16.2: Loading the new function
+
+ In the other direction, the extension registers its new functions
+with `gawk' by passing function pointers to the functions that provide
+the new feature (`do_chdir()', for example). `gawk' associates the
+function pointer with a name and can then call it, using a defined
+calling convention.
+
+ BEGIN {
+ chdir("/path") (*fnptr)(1);
+ }
+ +--------------------------------------------+
+ | |
+ | V
++-------+-+---+-+---+-+------------------+--------------+-+---+
+| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO|
+| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO|
+| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO|
++-------+-+---+-+---+-+------------------+--------------+-+---+
+
+ gawk Main Program Address Space Extension
+Figure 16.3: Calling the new function
+
+ The `do_XXX()' function, in turn, then uses the function pointers in
+the API `struct' to do its work, such as updating variables or arrays,
+printing messages, setting `ERRNO', and so on.
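The round trip described above can be sketched in a single C file. This is only a mock under stated assumptions: every `mock_*' and `host_*' name is hypothetical, the function signatures are simplified, and the real declarations are the ones in `gawkapi.h'. It shows the host handing over a table of function pointers, the extension registering `do_chdir()' through that table, and the host later calling the extension through the stored pointer.

```c
#include <stddef.h>
#include <string.h>

/* All mock_* and host_* names are hypothetical; the real types and
 * signatures are defined in gawkapi.h and differ from these. */
typedef void *mock_ext_id_t;
typedef int (*mock_ext_func_t)(int nargs);

/* The table of function pointers handed to the extension at load time. */
typedef struct {
    int  (*add_ext_func)(mock_ext_id_t id, const char *name,
                         mock_ext_func_t fn);
    void (*update_errno)(mock_ext_id_t id, const char *msg);
} mock_api_t;

/* ---- host ("gawk") side ---- */
static const char *host_name;       /* name of the registered function */
static mock_ext_func_t host_fn;     /* pointer into the extension */
static const char *host_errno = ""; /* stand-in for ERRNO */

static int host_add_ext_func(mock_ext_id_t id, const char *name,
                             mock_ext_func_t fn)
{
    (void) id;
    host_name = name;
    host_fn = fn;
    return 1;
}

static void host_update_errno(mock_ext_id_t id, const char *msg)
{
    (void) id;
    host_errno = msg;
}

static const mock_api_t host_api = { host_add_ext_func, host_update_errno };

/* ---- extension side: sees only the table and its id ---- */
static const mock_api_t *api;
static mock_ext_id_t ext_id;

static int do_chdir(int nargs)
{
    if (nargs != 1) {
        api->update_errno(ext_id, "chdir: expected 1 argument");
        return -1;
    }
    return 0;   /* a real extension would call chdir() here */
}

int mock_dl_load(const mock_api_t *api_p, mock_ext_id_t id)
{
    api = api_p;
    ext_id = id;
    return api->add_ext_func(ext_id, "chdir", do_chdir);
}

/* ---- host side again: load the extension, then call by name ---- */
int host_load(void)
{
    static int token;   /* its address serves as the extension id */
    return mock_dl_load(&host_api, &token);
}

int host_call(const char *name, int nargs)
{
    if (host_fn == NULL || strcmp(name, host_name) != 0)
        return -2;      /* no such function registered */
    return (*host_fn)(nargs);
}

const char *host_errno_value(void)
{
    return host_errno;
}
```

Calling `host_load()' and then `host_call("chdir", 1)' exercises both directions. Neither side refers to the other's symbols by name at link time; they communicate only through pointers exchanged at load time.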
+
+ Convenience macros in the `gawkapi.h' header file make calling
+through the function pointers look like regular function calls so that
+extension code is quite readable and understandable.
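A minimal sketch of how such a convenience macro might look, assuming a hypothetical one-entry API table. The names `mock_api_t', `api_sym_update' and `sym_update' are invented for illustration and are not claimed to match the macros that `gawkapi.h' actually defines.

```c
/* Hypothetical one-entry API table; not the real gawkapi.h layout. */
typedef void *mock_ext_id_t;

typedef struct {
    int (*api_sym_update)(mock_ext_id_t id, const char *name, double value);
} mock_api_t;

/* Set once at load time; the macro below relies on these two names. */
static const mock_api_t *api;
static mock_ext_id_t ext_id;

/* The macro hides the table pointer and the extension id, so the
 * call site reads like an ordinary function call. */
#define sym_update(name, value) \
    (api->api_sym_update(ext_id, (name), (value)))

/* Host-side stand-in that records the last value it was given. */
static double last_value;

static int host_sym_update(mock_ext_id_t id, const char *name, double value)
{
    (void) id;
    (void) name;
    last_value = value;
    return 1;
}

static const mock_api_t host_api = { host_sym_update };

void mock_init(void)
{
    static int token;   /* its address serves as the extension id */
    api = &host_api;
    ext_id = &token;
}

/* Extension code: no function pointer syntax in sight. */
int set_answer(void)
{
    return sym_update("answer", 42.0);
}

double mock_last_value(void)
{
    return last_value;
}
```

The cost of this style is that the macro depends on variables named `api' and `ext_id' being in scope wherever it is used.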
+
+ Although all of this sounds somewhat complicated, the result is that
+extension code is quite clean and straightforward. This can be seen in
+the sample extensions `filefuncs.c' (*note Extension Example::) and
+also the `testext.c' code for testing the APIs.
+
+ Some other bits and pieces:
+
+ * The API provides access to `gawk''s `do_XXX' values, reflecting
+ command line options, like `do_lint', `do_profiling' and so on
+ (*note Extension API Variables::). These are informational: an
+ extension cannot affect these inside `gawk'. In addition,
+ attempting to assign to them produces a compile-time error.
+
+ * The API also provides major and minor version numbers, so that an
+ extension can check if the `gawk' it is loaded with supports the
+ facilities it was compiled with. (Version mismatches "shouldn't"
+ happen, but we all know how _that_ goes.) *Note Extension
+ Versioning::, for details.
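A version check of the kind mentioned in the last item could look like the following sketch. The macro names, the version numbers, and the policy itself (same major version, minor version at least as new as the one compiled against) are assumptions made for illustration; the authoritative rules are in the section on extension versioning.

```c
/* Versions this extension was compiled against (hypothetical values). */
#define MOCK_API_MAJOR 1
#define MOCK_API_MINOR 2

/* Return 1 if the running gawk is assumed compatible, 0 otherwise.
 * Assumed policy: the major version must match exactly, and the minor
 * version must be at least the one the extension was built with. */
int mock_version_ok(int gawk_major, int gawk_minor)
{
    return gawk_major == MOCK_API_MAJOR && gawk_minor >= MOCK_API_MINOR;
}
```

An extension would typically perform such a check first thing when it is loaded and refuse to continue on a mismatch.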
+
+
+File: gawk.info, Node: Extension Future Growth, Prev: Extension Mechanism Outline, Up: Extension Design
+
+16.3.5 Room For Future Growth
+-----------------------------
+
+The API provides room for future growth, in two ways.
+
+ An "extension id" is passed into the extension when it's loaded. This
+extension id is then passed back to `gawk' with each function call.
+This allows `gawk' to identify the extension calling into it, should it
+need to know.
+
+ A "name space" is passed into `gawk' when an extension function is
+registered. This provides for a future mechanism for grouping
+extension functions and possibly avoiding name conflicts.
+
+ Of course, as of this writing, no decisions have been made with
+respect to any of the above.
+
+
+File: gawk.info, Node: Extension API Description, Next: Extension Example, Prev: Extension Design, Up: Dynamic Extensions
+
+16.4 API Description
+====================
+
+This (rather large) minor node describes the API in detail.
+
+* Menu:
+
+* Extension API Functions Introduction:: Introduction to the API functions.
+* General Data Types:: The data types.
+* Requesting Values:: How to get a value.
+* Constructor Functions:: Functions for creating values.
+* Registration Functions:: Functions to register things with
+ `gawk'.
+* Printing Messages:: Functions for printing messages.
+* Updating `ERRNO':: Functions for updating `ERRNO'.
+* Accessing Parameters:: Functions for accessing parameters.
+* Symbol Table Access:: Functions for accessing global
+ variables.
+* Array Manipulation:: Functions for working with arrays.
+* Extension API Variables:: Variables provided by the API.
+* Extension API Boilerplate:: Boilerplate code for using the API.
+* Finding Extensions:: How `gawk' finds compiled
+ extensions.
+
+
+File: gawk.info, Node: Extension API Functions Introduction, Next: General Data Types, Up: Extension API Description
+
+16.4.1 Introduction
+-------------------
+
+Access to facilities within `gawk' is made available by calling
+through function pointers passed into your extension.
+
+ API function pointers are provided for the following kinds of
+operations:
+
+ * Registration functions. You may register:
+ - extension functions,
+
+ - exit callbacks,
+
+ - a version string,
+
+ - input parsers,
+
+ - output wrappers,
+
+ - and two-way processors.
+ All of these are discussed in detail, later in this major node.
+
+ * Printing fatal, warning, and "lint" warning messages.
+
+ * Updating `ERRNO', or unsetting it.
+
+ * Accessing parameters, including converting an undefined parameter
+ into an array.
+
+ * Symbol table access: retrieving a global variable, creating one,
+ or changing one. This also includes the ability to create a scalar
+ variable that will be _constant_ within `awk' code.
+
+ * Creating and releasing cached values; this provides an efficient
+ way to use values for multiple variables and can be a big
+ performance win.
+
+ * Manipulating arrays:
+ - Retrieving, adding, deleting, and modifying elements
+
+ - Getting the count of elements in an array
+
+ - Creating a new array
+
+ - Clearing an array
+
+ - Flattening an array for easy C style looping over all its
+ indices and elements
+
+ Some points about using the API:
+
+ * You must include `<sys/types.h>' and `<sys/stat.h>' before
+ including the `gawkapi.h' header file. In addition, you must
+ include either `<stddef.h>' or `<stdlib.h>' to get the definition
+ of `size_t'. If you wish to use the boilerplate `dl_load_func()'
+ macro, you will need to include `<stdio.h>' as well. Finally, to
+ pass reasonable integer values for `ERRNO', you will need to
+ include `<errno.h>'.
+
+ * Although the API only uses ISO C 90 features, there is an
+ exception; the "constructor" functions use the `inline' keyword.
+ If your compiler does not support this keyword, you should either
+ place `-Dinline=''' on your command line, or use the GNU Autotools
+ and include a `config.h' file in your extensions.
+
+ * All pointers filled in by `gawk' are to memory managed by `gawk'
+ and should be treated by the extension as read-only. Memory for
+ _all_ strings passed into `gawk' from the extension _must_ come
+ from `malloc()' and is managed by `gawk' from then on.
+
+ * The API defines several simple structs that map values as seen
+ from `awk'. A value can be a `double', a string, or an array (as
+ in multidimensional arrays, or when creating a new array).
+ Strings maintain both pointer and length since embedded `NUL'
+ characters are allowed.
+
+ By intent, strings are maintained using the current multibyte
+ encoding (as defined by `LC_XXX' environment variables) and not
+ using wide characters. This matches how `gawk' stores strings
+ internally and also how characters are likely to be input and
+ output from files.
+
+ * When retrieving a value (such as a parameter or that of a global
+ variable or array element), the extension requests a specific type
+ (number, string, scalar, value cookie, array, or "undefined").
+ When the request is "undefined," the returned value will have the
+ real underlying type.
+
+ However, if the request and actual type don't match, the access
+ function returns "false" and fills in the type of the actual value
+ that is there, so that the extension can, e.g., print an error
+ message ("scalar passed where array expected").
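The request-and-report behavior in the last item can be sketched with a mock accessor. Everything here is hypothetical (`mock_get_value', `mock_check_array', and the `MOCK_*' constants), and the mock ignores any conversions between numbers and strings that the real API may perform. It only illustrates returning "false" while still filling in the actual type.

```c
#include <stddef.h>

/* Hypothetical, simplified value types for this sketch only. */
typedef enum {
    MOCK_UNDEFINED,
    MOCK_NUMBER,
    MOCK_STRING,
    MOCK_ARRAY
} mock_valtype_t;

typedef struct {
    mock_valtype_t val_type;
} mock_value_t;

/* Host side: `actual' is the type of the stored value, `wanted' is
 * what the extension asked for.  The real type is always reported;
 * the return value says whether the request could be satisfied. */
int mock_get_value(mock_valtype_t actual, mock_valtype_t wanted,
                   mock_value_t *result)
{
    result->val_type = actual;
    return wanted == MOCK_UNDEFINED || wanted == actual;
}

/* Extension side: ask for an array and build a diagnostic from the
 * type that was actually found.  Returns NULL when all is well. */
const char *mock_check_array(mock_valtype_t actual)
{
    mock_value_t v;

    if (mock_get_value(actual, MOCK_ARRAY, &v))
        return NULL;
    if (v.val_type == MOCK_NUMBER || v.val_type == MOCK_STRING)
        return "scalar passed where array expected";
    return "undefined value passed where array expected";
}
```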
+
+
+ While you may call the API functions by using the function pointers
+directly, the interface is not so pretty. To make extension code look
+more like regular code, the `gawkapi.h' header file defines a number of
+macros which you should use in your code. This minor node presents the
+macros as if they were functions.
+
+
+File: gawk.info, Node: General Data Types, Next: Requesting Values, Prev: Extension API Functions Introduction, Up: Extension API Description
+
+16.4.2 General Purpose Data Types
+---------------------------------
+
+ I have a true love/hate relationship with unions.
+ Arnold Robbins
+
+ That's the thing about unions: the compiler will arrange things so
+ they can accommodate both love and hate.
+ Chet Ramey
+
+ The extension API defines a number of simple types and structures
+for general purpose use. Additional, more specialized data structures
+are introduced in subsequent minor nodes, together with the functions
+that use them.
+
+`typedef void *awk_ext_id_t;'
+ A value of this type is received from `gawk' when an extension is
+ loaded. That value must then be passed back to `gawk' as the
+ first parameter of each API function.
+
+`#define awk_const ...'
+ This macro expands to `const' when compiling an extension, and to
+ nothing when compiling `gawk' itself. This makes certain fields
+ in the API data structures unwritable from extension code, while
+ allowing `gawk' to use them as it needs to.
+
+`typedef int awk_bool_t;'
+ A simple boolean type. At the moment, the API does not define
+ special "true" and "false" values, although perhaps it should.
+
+`typedef struct {'
+` char *str; /* data */'
+` size_t len; /* length thereof, in chars */'
+`} awk_string_t;'
+ This represents a mutable string. `gawk' owns the memory pointed
+ to if it supplied the value. Otherwise, it takes ownership of the
+ memory pointed to. *Such memory must come from `malloc()'!*
+
+ As mentioned earlier, strings are maintained using the current
+ multibyte encoding.
+
+`typedef enum {'
+` AWK_UNDEFINED,'
+` AWK_NUMBER,'
+` AWK_STRING,'
+` AWK_ARRAY,'
+` AWK_SCALAR, /* opaque access to a variable */'
+` AWK_VALUE_COOKIE /* for updating a previously created value */'
+`} awk_valtype_t;'
+ This `enum' indicates the type of a value. It is used in the
+ following `struct'.
+
+`typedef struct {'
+` awk_valtype_t val_type;'
+` union {'
+` awk_string_t s;'
+` double d;'
+` awk_array_t a;'
+` awk_scalar_t scl;'
+` awk_value_cookie_t vc;'
+` } u;'
+`} awk_value_t;'
+ An "`awk' value." The `val_type' member indicates what kind of
+ value the `union' holds, and each member is of the appropriate
+ type.
+
+`#define str_value u.s'
+`#define num_value u.d'
+`#define array_cookie u.a'
+`#define scalar_cookie u.scl'
+`#define value_cookie u.vc'
+ These macros make accessing the fields of the `awk_value_t' more
+ readable.
+
+`typedef void *awk_scalar_t;'
+ Scalars can be represented as an opaque type. These values are
+ obtained from `gawk' and then passed back into it. This is
+ discussed in a general fashion below, and in more detail in *note
+ Symbol table by cookie::.
+
+`typedef void *awk_value_cookie_t;'
+ A "value cookie" is an opaque type representing a cached value.
+ This is also discussed in a general fashion below, and in more
+ detail in *note Cached values::.
+
+
+ Scalar values in `awk' are either numbers or strings. The
+`awk_value_t' struct represents values. The `val_type' member
+indicates what is in the `union'.
+
+ Representing numbers is easy--the API uses a C `double'. Strings
+require more work. Since `gawk' allows embedded `NUL' bytes in string
+values, a string must be represented as a pair containing a
+data-pointer and length. This is the `awk_string_t' type.
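+
+ As a minimal sketch of why the length must travel with the pointer,
+the following standalone C fragment repeats the `awk_string_t' layout
+locally (a real extension gets it from `gawkapi.h') and defines two
+hypothetical helpers, `count_embedded_nuls()' and
+`c_string_view_len()', which are not part of the API:
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the awk_string_t layout shown earlier, for illustration
   only; a real extension gets this type from gawkapi.h. */
typedef struct {
    char *str;      /* data */
    size_t len;     /* length thereof, in chars */
} awk_string_t;

/* Hypothetical helper: count the NUL bytes inside the string's data.
   It walks exactly s->len bytes instead of stopping at the first NUL. */
static size_t
count_embedded_nuls(const awk_string_t *s)
{
    size_t i, count = 0;

    assert(s != NULL);
    for (i = 0; i < s->len; i++)
        if (s->str[i] == '\0')
            count++;
    return count;
}

/* Hypothetical helper: the length that NUL-terminated C string code
   would see for the same data (never more than s->len). */
static size_t
c_string_view_len(const awk_string_t *s)
{
    const char *nul;

    assert(s != NULL);
    nul = memchr(s->str, '\0', s->len);
    return nul != NULL ? (size_t) (nul - s->str) : s->len;
}
```

+
+ For the five bytes `a', `b', `NUL', `c', `d', the first helper finds
+one `NUL' byte, while the second reports a length of only two, which
+is all that `strlen()'-style code would see.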
+
+ Identifiers (i.e., the names of global variables) can be associated
+with either scalar values or with arrays. In addition, `gawk' provides
+true arrays of arrays, where any given array element can itself be an
+array. Discussion of arrays is delayed until *note Array
+Manipulation::.
+
+ The various macros listed earlier make it easier to use the elements
+of the `union' as if they were fields in a `struct'; this is a common
+coding practice in C. Such code is easier to write and to read;
+however, it remains _your_ responsibility to make sure that the
+`val_type' member correctly reflects the type of the value in the
+`awk_value_t'.
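+
+ One way to keep `val_type' and the `union' in step is to set and
+check them together. The sketch below repeats the declarations from
+this node locally so that it stands alone; the `void *' definition of
+`awk_array_t' is an assumption made for the example, and the helpers
+`set_number()' and `get_number()' are hypothetical, not API functions:
+

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the declarations shown in this node, for illustration
   only; a real extension gets them from gawkapi.h. */
typedef void *awk_array_t;          /* assumption: an opaque pointer */
typedef void *awk_scalar_t;
typedef void *awk_value_cookie_t;

typedef struct {
    char *str;
    size_t len;
} awk_string_t;

typedef enum {
    AWK_UNDEFINED, AWK_NUMBER, AWK_STRING,
    AWK_ARRAY, AWK_SCALAR, AWK_VALUE_COOKIE
} awk_valtype_t;

typedef struct {
    awk_valtype_t val_type;
    union {
        awk_string_t s;
        double d;
        awk_array_t a;
        awk_scalar_t scl;
        awk_value_cookie_t vc;
    } u;
} awk_value_t;

#define str_value u.s
#define num_value u.d

/* Hypothetical helper: store a number and its type tag in one step,
   so the tag can never disagree with the union member in use. */
static awk_value_t *
set_number(awk_value_t *v, double num)
{
    assert(v != NULL);
    v->val_type = AWK_NUMBER;
    v->num_value = num;     /* expands to v->u.d */
    return v;
}

/* Hypothetical helper: read the number only if the tag says the union
   holds one.  Returns 1 on success, 0 otherwise. */
static int
get_number(const awk_value_t *v, double *out)
{
    assert(v != NULL && out != NULL);
    if (v->val_type != AWK_NUMBER)
        return 0;
    *out = v->num_value;
    return 1;
}
```

+
+ Reading `num_value' from a value whose `val_type' is `AWK_STRING'
+would silently reinterpret the string's bytes; checking the tag first,
+as `get_number()' does, avoids that.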
+
+ Conceptually, the first three members of the `union' (string, number,
+and array) are all that is needed for working with `awk' values.
+However, since the API provides routines for accessing and changing the
+value of global scalar variables only by using the variable's name,
+there is a performance penalty: `gawk' must find the variable each time
+it is accessed and changed. This turns out to be a real issue, not
+just a theoretical one.
+
+ Thus, if you know that your extension will spend considerable time
+reading and/or changing the value of one or more scalar variables, you
+can obtain a "scalar cookie"(1) object for that variable, and then use
+the cookie for getting the variable's value or for changing the
+variable's value. This is the `awk_scalar_t' type and `scalar_cookie'
+macro. Given a scalar cookie, `gawk' can directly retrieve or modify
+the value, as required, without having to first find it.
+
+ The `awk_value_cookie_t' type and `value_cookie' macro are similar.
+If you know that you wish to use the same numeric or string _value_ for
+one or more variables, you can create the value once, retaining a
+"value cookie" for it, and then pass in that value cookie whenever you
+wish to set the value of a variable. This saves both storage space
+within the running `gawk' process as well as the time needed to create
+the value.
+
+ ---------- Footnotes ----------
+
+ (1) See the "cookie" entry in the Jargon file
+(http://catb.org/jargon/html/C/cookie.html) for a definition of
+"cookie", and the "magic cookie" entry in the Jargon file
+(http://catb.org/jargon/html/M/magic-cookie.html) for a nice example.
+See also the entry for "Cookie" in the *note Glossary::.
+
+
+File: gawk.info, Node: Requesting Values, Next: Constructor Functions, Prev: General Data Types, Up: Extension API Description
+
+16.4.3 Requesting Values
+------------------------
+
+All of the functions that return values from `gawk' work in the same
+way. You pass in an `awk_valtype_t' value to indicate what kind of
+value you expect. If the actual value matches what you requested, the
+function returns true and fills in the `awk_value_t' result.
+Otherwise, the function returns false, and the `val_type' member
+indicates the type of the actual value. You may then print an error
+message, or reissue the request for the actual value type, as
+appropriate. This behavior is summarized in *note
+table-value-types-returned::.
+
+ Type of Actual Value:
+--------------------------------------------------------------------------
+
+ String Number Array Undefined
+------------------------------------------------------------------------------
+ String String String false false
+ Number Number if can Number false false
+ be converted,
+ else false
+Type Array false false Array false
+Requested: Scalar Scalar Scalar false false
+ Undefined String Number Array Undefined
+ Value false false false false
+ Cookie
+
+Table 16.1: Value Types Returned
+
+
+File: gawk.info, Node: Constructor Functions, Next: Registration Functions, Prev: Requesting Values, Up: Extension API Description
+
+16.4.4 Constructor Functions and Convenience Macros
+---------------------------------------------------
+
+The API provides a number of "constructor" functions for creating
+string and numeric values, as well as a number of convenience macros.
+This node presents them all as function prototypes, in the way that
+extension code would use them.
+
+`static inline awk_value_t *'
+`make_const_string(const char *string, size_t length, awk_value_t *result)'
+ This function creates a string value in the `awk_value_t' variable
+ pointed to by `result'. It expects `string' to be a C string
+ constant (or other string data), and automatically creates a
+ _copy_ of the data for storage in `result'. It returns `result'.
+
+`static inline awk_value_t *'
+`make_malloced_string(const char *string, size_t length, awk_value_t *result)'
+ This function creates a string value in the `awk_value_t' variable
+ pointed to by `result'. It expects `string' to be a `char *' value
+ pointing to data previously obtained from `malloc()'. The idea here
+ is that the data is passed directly to `gawk', which assumes
+ responsibility for it. It returns `result'.
+
+`static inline awk_value_t *'
+`make_null_string(awk_value_t *result)'
+ This specialized function creates a null string (the "undefined"
+ value) in the `awk_value_t' variable pointed to by `result'. It
+ returns `result'.
+
+`static inline awk_value_t *'
+`make_number(double num, awk_value_t *result)'
+ This function simply creates a numeric value in the `awk_value_t'
+ variable pointed to by `result'. It returns `result'.
+
+ Two convenience macros may be used for allocating storage from
+`malloc()' and `realloc()'. If the allocation fails, they cause `gawk'
+to exit with a fatal error message. They should be used as if they were
+procedure calls that do not return a value.
+
+`emalloc(pointer, type, size, message)'
+ The arguments to this macro are as follows:
+ `pointer'
+ The pointer variable to point at the allocated storage.
+
+ `type'
+ The type of the pointer variable, used to create a cast for
+ the call to `malloc()'.
+
+ `size'
+ The total number of bytes to be allocated.
+
+ `message'
+ A message to be prefixed to the fatal error message.
+ Typically this is the name of the function using the macro.
+
+ For example, you might allocate a string value like so:
+
+ awk_value_t result;
+ char *message;
+ const char greet[] = "Don't Panic!";
+
+ emalloc(message, char *, sizeof(greet), "myfunc");
+ strcpy(message, greet);
+ make_malloced_string(message, strlen(message), & result);
+
+`erealloc(pointer, type, size, message)'
+ This is like `emalloc()', but it calls `realloc()', instead of
+ `malloc()'. The arguments are the same as for the `emalloc()'
+ macro.
+
+
+File: gawk.info, Node: Registration Functions, Next: Printing Messages, Prev: Constructor Functions, Up: Extension API Description
+
+16.4.5 Registration Functions
+-----------------------------
+
+This minor node describes the API functions for registering parts of
+your extension with `gawk'.
+
+* Menu:
+
+* Extension Functions:: Registering extension functions.
+* Exit Callback Functions:: Registering an exit callback.
+* Extension Version String:: Registering a version string.
+* Input Parsers:: Registering an input parser.
+* Output Wrappers:: Registering an output wrapper.
+* Two-way processors:: Registering a two-way processor.
+
+
+File: gawk.info, Node: Extension Functions, Next: Exit Callback Functions, Up: Registration Functions
+
+16.4.5.1 Registering An Extension Function
+..........................................
+
+Extension functions are described by the following record:
+
+ typedef struct {
+ const char *name;
+ awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
+ size_t num_expected_args;
+ } awk_ext_func_t;
+
+ The fields are:
+
+`const char *name;'
+ The name of the new function. `awk' level code calls the function
+ by this name. This is a regular C string.
+
+`awk_value_t *(*function)(int num_actual_args, awk_value_t *result);'
+ This is a pointer to the C function that provides the desired
+ functionality. The function must fill in the result with either a
+ number or a string. `awk' takes ownership of any string memory.
+ As mentioned earlier, string memory *must* come from `malloc()'.
+
+ The function must return the value of `result'. This is for the
+ convenience of the calling code inside `gawk'.
+
+`size_t num_expected_args;'
+ This is the number of arguments the function expects to receive.
+ Each extension function may decide what to do if the number of
+ arguments isn't what it expected. Following `awk' functions, it
+ is likely OK to ignore extra arguments.
+
+ Once you have a record representing your extension function, you
+register it with `gawk' using this API function:
+
+`awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func);'
+ This function returns true upon success, false otherwise. The
+ `namespace' parameter is currently not used; you should pass in an
+ empty string (`""'). The `func' pointer is the address of a
+ `struct' representing your function, as just described.
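+
+ As a rough sketch of how such a record might be filled in, the
+fragment below uses reduced local stand-ins for `awk_value_t' and
+`awk_ext_func_t' so that it compiles without `gawkapi.h'. The
+function `do_argcount()' and the `func_table' array are hypothetical
+names chosen for the example:
+

```c
#include <assert.h>
#include <stddef.h>

/* Reduced local stand-ins for the API types, for illustration only;
   the real awk_value_t union has more members (see the earlier node). */
typedef enum {
    AWK_UNDEFINED, AWK_NUMBER, AWK_STRING,
    AWK_ARRAY, AWK_SCALAR, AWK_VALUE_COOKIE
} awk_valtype_t;

typedef struct {
    awk_valtype_t val_type;
    union {
        double d;
    } u;
} awk_value_t;

typedef struct {
    const char *name;
    awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
    size_t num_expected_args;
} awk_ext_func_t;

/* Hypothetical extension function: its awk-level result is simply the
   number of arguments it was called with. */
static awk_value_t *
do_argcount(int num_actual_args, awk_value_t *result)
{
    assert(result != NULL);
    result->val_type = AWK_NUMBER;
    result->u.d = (double) num_actual_args;
    return result;      /* the function must return `result' */
}

/* One record per extension function. */
static awk_ext_func_t func_table[] = {
    { "argcount", do_argcount, 0 },
};
```

+
+ In a real extension, each element of such a table would then be
+passed to `add_ext_func()' with an empty string as the `namespace'
+argument, and the return value checked for failure.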
+
+
+File: gawk.info, Node: Exit Callback Functions, Next: Extension Version String, Prev: Extension Functions, Up: Registration Functions
+
+16.4.5.2 Registering An Exit Callback Function
+..............................................
+
+An "exit callback" function is a function that `gawk' calls before it
+exits. Such functions are useful if you have general "clean up" tasks
+that should be performed in your extension (such as closing data base
+connections or other resource deallocations). You can register such a
+function with `gawk' using the following function.
+
+`void awk_atexit(void (*funcp)(void *data, int exit_status),'
+` void *arg0);'
+ The parameters are:
+ `funcp'
+ A pointer to the function to be called before `gawk' exits.
+ The `data' parameter will be the original value of `arg0'.
+ The `exit_status' parameter is the exit status value that
+ `gawk' will pass to the `exit()' system call.
+
+ `arg0'
+ A pointer to private data which `gawk' saves in order to pass
+ to the function pointed to by `funcp'.
+
+ Exit callback functions are called in Last-In-First-Out (LIFO)
+order--that is, in the reverse of the order in which they are registered with
+`gawk'.
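+
+ The LIFO ordering can be illustrated with a toy model. Nothing below
+is `gawk' code: `demo_atexit()' and `demo_run_exit_callbacks()' are
+hypothetical stand-ins for `awk_atexit()' and for what `gawk' does as
+it exits, and `log_tag()' is an example callback with the required
+signature:
+

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CALLBACKS 8

typedef void (*exit_func_t)(void *data, int exit_status);

/* A small stack of registered callbacks and their private data. */
static struct {
    exit_func_t funcp;
    void *arg0;
} demo_stack[DEMO_MAX_CALLBACKS];
static size_t demo_count;

/* Hypothetical stand-in for awk_atexit(): push a callback. */
static void
demo_atexit(exit_func_t funcp, void *arg0)
{
    assert(funcp != NULL && demo_count < DEMO_MAX_CALLBACKS);
    demo_stack[demo_count].funcp = funcp;
    demo_stack[demo_count].arg0 = arg0;
    demo_count++;
}

/* Hypothetical stand-in for gawk's exit path: pop and call each
   callback, most recently registered first. */
static void
demo_run_exit_callbacks(int exit_status)
{
    while (demo_count > 0) {
        demo_count--;
        demo_stack[demo_count].funcp(demo_stack[demo_count].arg0,
                                     exit_status);
    }
}

/* Example callback: append the character that `data' points to onto a
   log, so the calling order can be inspected afterwards. */
static char demo_log[DEMO_MAX_CALLBACKS + 1];
static size_t demo_log_len;

static void
log_tag(void *data, int exit_status)
{
    (void) exit_status;     /* not needed by this callback */
    if (demo_log_len < DEMO_MAX_CALLBACKS)
        demo_log[demo_log_len++] = *(const char *) data;
}
```

+
+ Registering `log_tag()' three times with data pointing at `A', `B',
+and `C', in that order, leaves `CBA' in the log once the callbacks
+have run.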
+
+
+File: gawk.info, Node: Extension Version String, Next: Input Parsers, Prev: Exit Callback Functions, Up: Registration Functions
+
+16.4.5.3 Registering An Extension Version String
+................................................
+
+You can register a version string which indicates the name and version
+of your extension, with `gawk', as follows:
+
+`void register_ext_version(const char *version);'
+ Register the string pointed to by `version' with `gawk'. `gawk'
+ does _not_ copy the `version' string, so it should not be changed.
+
+ `gawk' prints all registered extension version strings when it is
+invoked with the `--version' option.
+
+
+File: gawk.info, Node: Input Parsers, Next: Output Wrappers, Prev: Extension Version String, Up: Registration Functions
+
+16.4.5.4 Customized Input Parsers
+.................................
+
+By default, `gawk' reads text files as its input. It uses the value of
+`RS' to find the end of the record, and then uses `FS' (or
+`FIELDWIDTHS') to split it into fields (*note Reading Files::).
+Additionally, it sets the value of `RT' (*note Built-in Variables::).
+
+ If you want, you can provide your own, custom, input parser. An
+input parser's job is to return a record to the `gawk' record processing
+code, along with indicators for the value and length of the data to be
+used for `RT', if any.
+
+ To provide an input parser, you must first provide two functions
+(where XXX is a prefix name for your extension):
+
+`awk_bool_t XXX_can_take_file(const awk_input_buf_t *iobuf)'
+ This function examines the information available in `iobuf' (which
+ we discuss shortly). Based on the information there, it decides
+ if the input parser should be used for this file. If so, it
+ should return true. Otherwise, it should return false. It should
+ not change any state (variable values, etc.) within `gawk'.
+
+`awk_bool_t XXX_take_control_of(awk_input_buf_t *iobuf)'
+ When `gawk' decides to hand control of the file over to the input
+ parser, it calls this function. This function in turn must fill
+ in certain fields in the `awk_input_buf_t' structure, and ensure
+ that certain conditions are true. It should then return true. If
+ an error of some kind occurs, it should not fill in any fields,
+ and should return false; then `gawk' will not use the input parser.
+ The details are presented shortly.
+
+ Your extension should package these functions inside an
+`awk_input_parser_t', which looks like this:
+
+ typedef struct input_parser {
+ const char *name; /* name of parser */
+ awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
+ awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
+ awk_const struct input_parser *awk_const next; /* for use by gawk */
+ } awk_input_parser_t;
+
+ The fields are:
+
+`const char *name;'
+ The name of the input parser. This is a regular C string.
+
+`awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);'
+ A pointer to your `XXX_can_take_file()' function.
+
+`awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);'
+ A pointer to your `XXX_take_control_of()' function.
+
+`awk_const struct input_parser *awk_const next;'
+ This pointer is used by `gawk'. The extension cannot modify it.
+
+ The steps are as follows:
+
+ 1. Create a `static awk_input_parser_t' variable and initialize it
+ appropriately.
+
+ 2. When your extension is loaded, register your input parser with
+ `gawk' using the `register_input_parser()' API function (described
+ below).
+
+ An `awk_input_buf_t' looks like this:
+
+ typedef struct awk_input {
+ const char *name; /* filename */
+ int fd; /* file descriptor */
+ #define INVALID_HANDLE (-1)
+ void *opaque; /* private data for input parsers */
+ int (*get_record)(char **out, struct awk_input *iobuf,
+ int *errcode, char **rt_start, size_t *rt_len);
+ void (*close_func)(struct awk_input *iobuf);
+ struct stat sbuf; /* stat buf */
+ } awk_input_buf_t;
+
+ The fields can be divided into two categories: those for use
+(initially, at least) by `XXX_can_take_file()', and those for use by
+`XXX_take_control_of()'. The first group of fields and their uses are
+as follows:
+
+`const char *name;'
+ The name of the file.
+
+`int fd;'
+ A file descriptor for the file. If `gawk' was able to open the
+ file, then `fd' will _not_ be equal to `INVALID_HANDLE'.
+ Otherwise, it will.
+
+`struct stat sbuf;'
+ If the file descriptor is valid, then `gawk' will have filled in this
+ structure via a call to the `fstat()' system call.
+
+ The `XXX_can_take_file()' function should examine these fields and
+decide if the input parser should be used for the file. The decision
+can be made based upon `gawk' state (the value of a variable defined
+previously by the extension and set by `awk' code), the name of the
+file, whether or not the file descriptor is valid, the information in
+the `struct stat', or any combination of the above.
+
+ Once `XXX_can_take_file()' has returned true, and `gawk' has decided
+to use your input parser, it calls `XXX_take_control_of()'. That
+function then fills in at least the `get_record' field of the
+`awk_input_buf_t'. It must also ensure that `fd' is not set to
+`INVALID_HANDLE'. All of the fields that may be filled by
+`XXX_take_control_of()' are as follows:
+
+`void *opaque;'
+ This is used to hold any state information needed by the input
+ parser for this file. It is "opaque" to `gawk'. The input parser
+ is not required to use this pointer.
+
+`int (*get_record)(char **out,'
+` struct awk_input *iobuf,'
+` int *errcode,'
+` char **rt_start,'
+` size_t *rt_len);'
+ This function pointer should point to a function that creates the
+ input records. Said function is the core of the input parser.
+ Its behavior is described below.
+
+`void (*close_func)(struct awk_input *iobuf);'
+ This function pointer should point to a function that does the
+ "tear down." It should release any resources allocated by
+ `XXX_take_control_of()'. It may also close the file. If it does
+ so, it should set the `fd' field to `INVALID_HANDLE'.
+
+ If `fd' is still not `INVALID_HANDLE' after the call to this
+ function, `gawk' calls the regular `close()' system call.
+
+ Having a "tear down" function is optional. If your input parser
+ does not need it, do not set this field. Then, `gawk' calls the
+ regular `close()' system call on the file descriptor, so it should
+ be valid.
+
+ The `XXX_get_record()' function does the work of creating input
+records. The parameters are as follows:
+
+`char **out'
+ This is a pointer to a `char *' variable which is set to point to
+ the record. `gawk' makes its own copy of the data, so the
+ extension must manage this storage.
+
+`struct awk_input *iobuf'
+ This is the `awk_input_buf_t' for the file. The fields should be
+ used for reading data (`fd') and for managing private state
+ (`opaque'), if any.
+
+`int *errcode'
+ If an error occurs, `*errcode' should be set to an appropriate
+ code from `<errno.h>'.
+
+`char **rt_start'
+`size_t *rt_len'
+ If the concept of a "record terminator" makes sense, then
+ `*rt_start' should be set to point to the data to be used for
+ `RT', and `*rt_len' should be set to the length of the data.
+ Otherwise, `*rt_len' should be set to zero. `gawk' makes its own
+ copy of this data, so the extension must manage the storage.
+
+ The return value is the length of the buffer pointed to by `*out',
+or `EOF' if end-of-file was reached or an error occurred.
+
+ It is guaranteed that `errcode' is a valid pointer, so there is no
+need to test for a `NULL' value. `gawk' sets `*errcode' to zero, so
+there is no need to set it unless an error occurs.
+
+ If an error does occur, the function should return `EOF' and set
+`*errcode' to a non-zero value. In that case, if `*errcode' does not
+equal -1, `gawk' automatically updates the `ERRNO' variable based on
+the value of `*errcode' (e.g., setting `*errcode = errno' should do the
+right thing).
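+
+ To make the calling convention concrete, here is a minimal sketch of
+a `get_record' function that returns newline-terminated records. It
+makes several assumptions for the sake of a standalone example: the
+`awk_input_buf_t' stand-in omits the `sbuf' member, the data is
+already in memory rather than read from `fd', and `demo_state_t' and
+`demo_get_record()' are hypothetical names:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>      /* EOF */
#include <string.h>     /* memchr */

/* Reduced stand-in for awk_input_buf_t, for illustration only; the
   `struct stat sbuf' member shown in the text is omitted here. */
typedef struct awk_input {
    const char *name;
    int fd;
    void *opaque;
    int (*get_record)(char **out, struct awk_input *iobuf,
                      int *errcode, char **rt_start, size_t *rt_len);
    void (*close_func)(struct awk_input *iobuf);
} awk_input_buf_t;

/* Hypothetical private state kept in `opaque': the data (assumed to be
   in memory already) and the current read position. */
typedef struct {
    char *data;
    size_t size;
    size_t pos;
} demo_state_t;

/* Hypothetical get_record: hand back one newline-terminated record per
   call.  The record points into the state's own buffer, so the parser
   keeps ownership of the storage. */
static int
demo_get_record(char **out, struct awk_input *iobuf,
                int *errcode, char **rt_start, size_t *rt_len)
{
    demo_state_t *st;
    char *start, *nl;
    size_t avail;

    assert(out != NULL && iobuf != NULL && iobuf->opaque != NULL);
    assert(rt_start != NULL && rt_len != NULL);
    (void) errcode;     /* no error can occur here; it stays zero */

    st = (demo_state_t *) iobuf->opaque;
    *rt_len = 0;
    if (st->pos >= st->size)
        return EOF;     /* end of input */

    start = st->data + st->pos;
    avail = st->size - st->pos;
    nl = memchr(start, '\n', avail);

    *out = start;
    if (nl == NULL) {   /* final record without a terminator */
        st->pos = st->size;
        return (int) avail;
    }
    *rt_start = nl;     /* the newline becomes the value of RT */
    *rt_len = 1;
    st->pos += (size_t) (nl - start) + 1;
    return (int) (nl - start);
}
```

+
+ A parser that reads from `iobuf->fd' would follow the same pattern,
+but would also set `*errcode' (for example, from `errno') and return
+`EOF' when a read fails.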
+
+ `gawk' ships with a sample extension that reads directories,
+returning records for each entry in the directory (*note Extension
+Sample Readdir::). You may wish to use that code as a guide for writing
+your own input parser.
+
+ When writing an input parser, you should think about (and document)
+how it is expected to interact with `awk' code. You may want it to
+always be called, and take effect as appropriate (as the `readdir'
+extension does). Or you may want it to take effect based upon the
+value of an `awk' variable, as the XML extension from the `gawkextlib'
+project does (*note gawkextlib::). In the latter case, code in a
+`BEGINFILE' section can look at `FILENAME' and `ERRNO' to decide
+whether or not to activate an input parser (*note BEGINFILE/ENDFILE::).
+
+ You register your input parser with the following function:
+
+`void register_input_parser(awk_input_parser_t *input_parser);'
+ Register the input parser pointed to by `input_parser' with `gawk'.
+
+
+File: gawk.info, Node: Output Wrappers, Next: Two-way processors, Prev: Input Parsers, Up: Registration Functions
+
+16.4.5.5 Customized Output Wrappers
+...................................
+
+An "output wrapper" is the mirror image of an input parser. It allows
+an extension to take over the output to a file opened with the `>' or
+`>>' operators (*note Redirection::).
+
+ The output wrapper is very similar to the input parser structure:
+
+ typedef struct output_wrapper {
+ const char *name; /* name of the wrapper */
+ awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
+ awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
+ awk_const struct output_wrapper *awk_const next; /* for use by gawk */
+ } awk_output_wrapper_t;
+
+ The members are as follows:
+
+`const char *name;'
+ This is the name of the output wrapper.
+
+`awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);'
+ This points to a function that examines the information in the
+ `awk_output_buf_t' structure pointed to by `outbuf'. It should
+ return true if the output wrapper wants to take over the file, and
+ false otherwise. It should not change any state (variable values,
+ etc.) within `gawk'.
+
+`awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);'
+ The function pointed to by this field is called when `gawk'
+ decides to let the output wrapper take control of the file. It
+ should fill in appropriate members of the `awk_output_buf_t'
+ structure, as described below, and return true if successful,
+ false otherwise.
+
+`awk_const struct output_wrapper *awk_const next;'
+ This is for use by `gawk'.
+
+ The `awk_output_buf_t' structure looks like this:
+
+ typedef struct {
+ const char *name; /* name of output file */
+ const char *mode; /* mode argument to fopen */
+ FILE *fp; /* stdio file pointer */
+ awk_bool_t redirected; /* true if a wrapper is active */
+ void *opaque; /* for use by output wrapper */
+ size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
+ FILE *fp, void *opaque);
+ int (*gawk_fflush)(FILE *fp, void *opaque);
+ int (*gawk_ferror)(FILE *fp, void *opaque);
+ int (*gawk_fclose)(FILE *fp, void *opaque);
+ } awk_output_buf_t;
+
+ Here too, your extension will define `XXX_can_take_file()' and
+`XXX_take_control_of()' functions that examine and update data members
+in the `awk_output_buf_t'. The data members are as follows:
+
+`const char *name;'
+ The name of the output file.
+
+`const char *mode;'
+ The mode string (as would be used in the second argument to
+ `fopen()') with which the file was opened.
+
+`FILE *fp;'
+ The `FILE' pointer from `<stdio.h>'. `gawk' opens the file before
+ attempting to find an output wrapper.
+
+`awk_bool_t redirected;'
+ This field must be set to true by the `XXX_take_control_of()'
+ function.
+
+`void *opaque;'
+ This pointer is opaque to `gawk'. The extension should use it to
+ store a pointer to any private data associated with the file.
+
+`size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,'
+` FILE *fp, void *opaque);'
+`int (*gawk_fflush)(FILE *fp, void *opaque);'
+`int (*gawk_ferror)(FILE *fp, void *opaque);'
+`int (*gawk_fclose)(FILE *fp, void *opaque);'
+ These pointers should be set to point to functions that perform
+ the same job as the corresponding `<stdio.h>' functions, if
+ appropriate. `gawk' uses these function pointers for all output.
+ `gawk' initializes the pointers to point to internal, "pass
+ through" functions that just call the regular `<stdio.h>'
+ functions, so an extension only needs to redefine those functions
+ that are appropriate for what it does.
+
+ The `XXX_can_take_file()' function should make a decision based upon
+the `name' and `mode' fields, and any additional state (such as `awk'
+variable values) that is appropriate.
+
+ When `gawk' calls `XXX_take_control_of()', it should fill in the
+other fields, as appropriate, except for `fp', which it should just use
+normally.
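+
+ The following sketch shows one possible shape for these two
+functions: a wrapper that counts the bytes written to any file whose
+name starts with `count:'. The structure is repeated locally so that
+the fragment stands alone; the `count:' naming convention and every
+identifier beginning with `demo_' are assumptions made for the
+example:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int awk_bool_t;

/* Local copy of the awk_output_buf_t layout shown above, for
   illustration only; a real extension gets it from gawkapi.h. */
typedef struct {
    const char *name;
    const char *mode;
    FILE *fp;
    awk_bool_t redirected;
    void *opaque;
    size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
                          FILE *fp, void *opaque);
    int (*gawk_fflush)(FILE *fp, void *opaque);
    int (*gawk_ferror)(FILE *fp, void *opaque);
    int (*gawk_fclose)(FILE *fp, void *opaque);
} awk_output_buf_t;

/* Hypothetical private state: a running total of bytes requested. */
typedef struct {
    size_t bytes;
} demo_counter_t;

static demo_counter_t demo_counter;

/* Replacement for fwrite(): add the requested byte count to the total,
   then pass the data through unchanged. */
static size_t
demo_fwrite(const void *buf, size_t size, size_t count,
            FILE *fp, void *opaque)
{
    demo_counter_t *ctr = (demo_counter_t *) opaque;

    assert(ctr != NULL);
    ctr->bytes += size * count;
    if (fp == NULL)     /* demo only: no stream to pass through to */
        return count;
    return fwrite(buf, size, count, fp);
}

/* Decide from the name alone, without changing any state. */
static awk_bool_t
demo_can_take_file(const awk_output_buf_t *outbuf)
{
    if (outbuf == NULL || outbuf->name == NULL)
        return 0;
    return strncmp(outbuf->name, "count:", 6) == 0;
}

/* Install private state and the one function being overridden; the
   other function pointers and `fp' are left as they were. */
static awk_bool_t
demo_take_control_of(awk_output_buf_t *outbuf)
{
    if (outbuf == NULL)
        return 0;
    demo_counter.bytes = 0;
    outbuf->opaque = &demo_counter;
    outbuf->gawk_fwrite = demo_fwrite;
    outbuf->redirected = 1;
    return 1;
}
```

+
+ Only `gawk_fwrite' is replaced here; the flush, error, and close
+pointers keep whatever pass-through functions were already installed.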
+
+ You register your output wrapper with the following function:
+
+`void register_output_wrapper(awk_output_wrapper_t *output_wrapper);'
+ Register the output wrapper pointed to by `output_wrapper' with
+ `gawk'.
+
+
+File: gawk.info, Node: Two-way processors, Prev: Output Wrappers, Up: Registration Functions
+
+16.4.5.6 Customized Two-way Processors
+......................................
+
+A "two-way processor" combines an input parser and an output wrapper for
+two-way I/O with the `|&' operator (*note Redirection::). It makes
+identical use of the `awk_input_buf_t' and `awk_output_buf_t'
+structures as described earlier.
+
+ A two-way processor is represented by the following structure:
+
+ typedef struct two_way_processor {
+ const char *name; /* name of the two-way processor */
+ awk_bool_t (*can_take_two_way)(const char *name);
+ awk_bool_t (*take_control_of)(const char *name,
+ awk_input_buf_t *inbuf,
+ awk_output_buf_t *outbuf);
+ awk_const struct two_way_processor *awk_const next; /* for use by gawk */
+ } awk_two_way_processor_t;
+
+ The fields are as follows:
+
+`const char *name;'
+ The name of the two-way processor.
+
+`awk_bool_t (*can_take_two_way)(const char *name);'
+ This function returns true if it wants to take over two-way I/O
+ for this filename. It should not change any state (variable
+ values, etc.) within `gawk'.
+
+`awk_bool_t (*take_control_of)(const char *name,'
+` awk_input_buf_t *inbuf,'
+` awk_output_buf_t *outbuf);'
+ This function should fill in the `awk_input_buf_t' and
+ `awk_output_buf_t' structures pointed to by `inbuf' and `outbuf',
+ respectively. These structures were described earlier.
+
+`awk_const struct two_way_processor *awk_const next;'
+ This is for use by `gawk'.
+
+ As with the input parser and output wrapper, you provide "yes I
+can take this" and "take over for this" functions,
+`XXX_can_take_two_way()' and `XXX_take_control_of()'.
+
+ You register your two-way processor with the following function:
+
+`void register_two_way_processor(awk_two_way_processor_t *two_way_processor);'
+ Register the two-way processor pointed to by `two_way_processor'
+ with `gawk'.
+
+
+File: gawk.info, Node: Printing Messages, Next: Updating `ERRNO', Prev: Registration Functions, Up: Extension API Description
+
+16.4.6 Printing Messages
+------------------------
+
+You can print different kinds of warning messages from your extension,
+as described below. Note that for these functions, you must pass in
+the extension id received from `gawk' when the extension was loaded.(1)
+
+`void fatal(awk_ext_id_t id, const char *format, ...);'
+ Print a message and then cause `gawk' to exit immediately.
+
+`void warning(awk_ext_id_t id, const char *format, ...);'
+ Print a warning message.
+
+`void lintwarn(awk_ext_id_t id, const char *format, ...);'
+ Print a "lint warning." Normally this is the same as printing a
+ warning message, but if `gawk' was invoked with `--lint=fatal',
+ then lint warnings become fatal error messages.
+
+ All of these functions are otherwise like the C `printf()' family of
+functions, where the `format' parameter is a string with literal
+characters and formatting codes intermixed.
+
+ ---------- Footnotes ----------
+
+ (1) Because the API uses only ISO C 90 features, it cannot make use
+of the ISO C 99 variadic macro feature to hide that parameter. More's
+the pity.
+
+
+File: gawk.info, Node: Updating `ERRNO', Next: Accessing Parameters, Prev: Printing Messages, Up: Extension API Description
+
+16.4.7 Updating `ERRNO'
+-----------------------
+
+The following functions allow you to update the `ERRNO' variable:
+
+`void update_ERRNO_int(int errno_val);'
+ Set `ERRNO' to the string equivalent of the error code in
+ `errno_val'. The value should be one of the defined error codes in
+ `<errno.h>', and `gawk' turns it into a (possibly translated)
+ string using the C `strerror()' function.
+
+`void update_ERRNO_string(const char *string);'
+ Set `ERRNO' directly to the string value of `string'. `gawk' makes
+ a copy of the value of `string'.
+
+`void unset_ERRNO();'
+ Unset `ERRNO'.
+
+
+File: gawk.info, Node: Accessing Parameters, Next: Symbol Table Access, Prev: Updating `ERRNO', Up: Extension API Description
+
+16.4.8 Accessing and Updating Parameters
+----------------------------------------
+
+Two functions give you access to the arguments (parameters) passed to
+your extension function. They are:
+
+`awk_bool_t get_argument(size_t count,'
+` awk_valtype_t wanted,'
+` awk_value_t *result);'
+ Fill in the `awk_value_t' structure pointed to by `result' with
+ the `count''th argument. Return true if the actual type matches
+ `wanted', false otherwise. In the latter case, `result->val_type'
+ indicates the actual type (*note Table 16.1:
+ table-value-types-returned.). Counts are zero based--the first
+ argument is numbered zero, the second one, and so on. `wanted'
+ indicates the type of value expected.
+
+`awk_bool_t set_argument(size_t count, awk_array_t array);'
+ Convert a parameter that was undefined into an array; this provides
+ call-by-reference for arrays. Return false if `count' is too big,
+ or if the argument's type is not undefined. *Note Array
+ Manipulation::, for more information on creating arrays.
+
+
+File: gawk.info, Node: Symbol Table Access, Next: Array Manipulation, Prev: Accessing Parameters, Up: Extension API Description
+
+16.4.9 Symbol Table Access
+--------------------------
+
+Two sets of routines provide access to global variables, and one set
+allows you to create and release cached values.
+
+* Menu:
+
+* Symbol table by name:: Accessing variables by name.
+* Symbol table by cookie:: Accessing variables by ``cookie''.
+* Cached values:: Creating and using cached values.
+
+
+File: gawk.info, Node: Symbol table by name, Next: Symbol table by cookie, Up: Symbol Table Access
+
+16.4.9.1 Variable Access and Update by Name
+...........................................
+
+The following routines provide the ability to access and update global
+`awk'-level variables by name. In compiler terminology, identifiers of
+different kinds are termed "symbols", thus the "sym" in the routines'
+names. The data structure which stores information about symbols is
+termed a "symbol table".
+
+`awk_bool_t sym_lookup(const char *name,'
+` awk_valtype_t wanted,'
+` awk_value_t *result);'
+ Fill in the `awk_value_t' structure pointed to by `result' with
+ the value of the variable named by the string `name', which is a
+ regular C string. `wanted' indicates the type of value expected.
+ Return true if the actual type matches `wanted', false otherwise.
+ In the latter case, `result->val_type' indicates the actual type
+ (*note Table 16.1: table-value-types-returned.).
+
+`awk_bool_t sym_update(const char *name, awk_value_t *value);'
+ Update the variable named by the string `name', which is a regular
+ C string. The variable is added to `gawk''s symbol table if it is
+ not there. Return true if everything worked, false otherwise.
+
+ Changing types (scalar to array or vice versa) of an existing
+ variable is _not_ allowed, nor may this routine be used to update
+ an array. This routine cannot be used to update any of the
+ predefined variables (such as `ARGC' or `NF').
+
+`awk_bool_t sym_constant(const char *name, awk_value_t *value);'
+ Create a variable named by the string `name', which is a regular C
+ string, that has the constant value as given by `value'.
+ `awk'-level code cannot change the value of this variable.(1) The
+ extension may change the value of `name''s variable with
+ subsequent calls to this routine, and may also convert a variable
+ created by `sym_update()' into a constant. However, once a
+ variable becomes a constant it cannot later be reverted into a
+ mutable variable.
+
+ ---------- Footnotes ----------
+
+ (1) There (currently) is no `awk'-level feature that provides this
+ability.
+
+
+File: gawk.info, Node: Symbol table by cookie, Next: Cached values, Prev: Symbol table by name, Up: Symbol Table Access
+
+16.4.9.2 Variable Access and Update by Cookie
+.............................................
+
+A "scalar cookie" is an opaque handle that provides access to a global
+variable or array. It is an optimization that avoids looking up
+variables in `gawk''s symbol table every time access is needed. This
+was discussed earlier, in *note General Data Types::.
+
+ The following functions let you work with scalar cookies.
+
+`awk_bool_t sym_lookup_scalar(awk_scalar_t cookie,'
+` awk_valtype_t wanted,'
+` awk_value_t *result);'
+ Retrieve the current value of a scalar cookie. Once you have
+ obtained a scalar cookie using `sym_lookup()', you can use this
+ function to get its value more efficiently. Return false if the
+ value cannot be retrieved.
+
+`awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value);'
+ Update the value associated with a scalar cookie. Return false if
+ the new value is not one of `AWK_STRING' or `AWK_NUMBER'. Here
+ too, the built-in variables may not be updated.
+
+ It is not obvious at first glance how to work with scalar cookies or
+what their raison d'etre really is. In theory, the `sym_lookup()' and
+`sym_update()' routines are all you really need to work with variables.
+For example, you might have code that looked up the value of a
+variable, evaluated a condition, and then possibly changed the value of
+the variable based on the result of that evaluation, like so:
+
+ /* do_magic --- do something really great */
+
+ static awk_value_t *
+ do_magic(int nargs, awk_value_t *result)
+ {
+ awk_value_t value;
+
+ if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value)
+ && some_condition(value.num_value)) {
+ value.num_value += 42;
+ sym_update("MAGIC_VAR", & value);
+ }
+
+ return make_number(0.0, result);
+ }
+
+This code looks (and is) simple and straightforward. So what's the
+problem?
+
+ Consider what happens if `awk'-level code associated with your
+extension calls the `magic()' function (implemented in C by
+`do_magic()'), once per record, while processing hundreds of thousands
+or millions of records. The `MAGIC_VAR' variable is looked up in the
+symbol table once or twice per function call!
+
+ The symbol table lookup is really pure overhead; it is considerably
+more efficient to get a cookie that represents the variable, and use
+that to get the variable's value and update it as needed.(1)
+
+ Thus, the way to use cookies is as follows. First, install your
+extension's variable in `gawk''s symbol table using `sym_update()', as
+usual. Then get a scalar cookie for the variable using `sym_lookup()':
+
+ static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */
+
+ static void
+ my_extension_init()
+ {
+ awk_value_t value;
+
+ /* install initial value */
+ sym_update("MAGIC_VAR", make_number(42.0, & value));
+
+ /* get cookie */
+ sym_lookup("MAGIC_VAR", AWK_SCALAR, & value);
+
+ /* save the cookie */
+ magic_var_cookie = value.scalar_cookie;
+ ...
+ }
+
+ Next, use the routines in this section for retrieving and updating
+the value through the cookie. Thus, `do_magic()' now becomes something
+like this:
+
+ /* do_magic --- do something really great */
+
+ static awk_value_t *
+ do_magic(int nargs, awk_value_t *result)
+ {
+ awk_value_t value;
+
+ if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value)
+ && some_condition(value.num_value)) {
+ value.num_value += 42;
+ sym_update_scalar(magic_var_cookie, & value);
+ }
+ ...
+
+ return make_number(0.0, result);
+ }
+
+ NOTE: The previous code omitted error checking for presentation
+ purposes. Your extension code should be more robust and carefully
+ check the return values from the API functions.
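+
+ To see why the cookie saves work, consider the following toy model.
+It is only a sketch: `toy_table', `toy_lookup()', `bump_by_name()', and
+`bump_by_cookie()' are invented for illustration and say nothing about
+how `gawk' actually stores its symbols. The model counts name
+comparisons to show that a saved handle avoids the search entirely:
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy symbol table; a stand-in for illustration, not gawk's real one. */
struct toy_symbol {
    const char *name;
    double value;
};

static struct toy_symbol toy_table[] = {
    { "NR_SEEN", 0.0 },
    { "TOTAL", 0.0 },
    { "MAGIC_VAR", 42.0 },
};

static unsigned long toy_compares;   /* name comparisons performed so far */

/* toy_lookup --- find a symbol by name, counting the comparisons */
static struct toy_symbol *
toy_lookup(const char *name)
{
    size_t i, n = sizeof(toy_table) / sizeof(toy_table[0]);

    for (i = 0; i < n; i++) {
        toy_compares++;
        if (strcmp(toy_table[i].name, name) == 0)
            return & toy_table[i];
    }
    return NULL;
}

/* bump_by_name --- update MAGIC_VAR, searching the table on every call */
static void
bump_by_name(void)
{
    struct toy_symbol *sym = toy_lookup("MAGIC_VAR");

    if (sym != NULL)
        sym->value += 42;
}

/* bump_by_cookie --- update through a handle saved at initialization */
static void
bump_by_cookie(struct toy_symbol *cookie)
{
    cookie->value += 42;
}
```
+
+ In this model each call to `bump_by_name()' costs three comparisons,
+while `bump_by_cookie()' costs none once the handle has been saved.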
+
+ ---------- Footnotes ----------
+
+ (1) The difference is measurable and quite real. Trust us.
+
+
+File: gawk.info, Node: Cached values, Prev: Symbol table by cookie, Up: Symbol Table Access
+
+16.4.9.3 Creating and Using Cached Values
+.........................................
+
+The routines in this section allow you to create and release cached
+values. As with scalar cookies, in theory, cached values are not
+necessary. You can create numbers and strings using the functions in
+*note Constructor Functions::. You can then assign those values to
+variables using `sym_update()' or `sym_update_scalar()', as you like.
+
+ However, you can understand the point of cached values if you
+remember that _every_ string value's storage _must_ come from
+`malloc()'. If you have 20 variables, all of which have the same
+string value, you must create 20 identical copies of the string.(1)
+
+ It is clearly more efficient, if possible, to create a value once,
+and then tell `gawk' to reuse the value for multiple variables. That is
+what the routines in this section let you do. The functions are as
+follows:
+
+`awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result);'
+ Create a cached string or numeric value from `value' for efficient
+ later assignment. Only `AWK_NUMBER' and `AWK_STRING' values are
+ allowed. Any other type is rejected. While `AWK_UNDEFINED' could
+ be allowed, doing so would result in inferior performance.
+
+`awk_bool_t release_value(awk_value_cookie_t vc);'
+ Release the memory associated with a value cookie obtained from
+ `create_value()'.
+
+ You use value cookies in a fashion similar to the way you use scalar
+cookies. In the extension initialization routine, you create the value
+cookie:
+
+ static awk_value_cookie_t answer_cookie; /* static value cookie */
+
+ static void
+ my_extension_init()
+ {
+ awk_value_t value;
+ char *long_string;
+ size_t long_string_len;
+
+ /* code from earlier */
+ ...
+ /* ... fill in long_string and long_string_len ... */
+ make_malloced_string(long_string, long_string_len, & value);
+ create_value(& value, & answer_cookie); /* create cookie */
+ ...
+ }
+
+ Once the value is created, you can use it as the value of any number
+of variables:
+
+ static awk_value_t *
+ do_magic(int nargs, awk_value_t *result)
+ {
+ awk_value_t value;
+
+ ... /* as earlier */
+
+ value.val_type = AWK_VALUE_COOKIE;
+ value.value_cookie = answer_cookie;
+ sym_update("VAR1", & value);
+ sym_update("VAR2", & value);
+ ...
+ sym_update("VAR100", & value);
+ ...
+ }
+
+Using value cookies in this way saves considerable storage, since all of
+`VAR1' through `VAR100' share the same value.
+
+ You might be wondering, "Is this sharing problematic? What happens
+if `awk' code assigns a new value to `VAR1', will all the others be
+changed too?"
+
+ That's a great question. The answer is that no, it's not a problem.
+`gawk' is smart enough to avoid such problems.
+
+ Finally, as part of your clean up action (*note Exit Callback
+Functions::) you should release any cached values that you created,
+using `release_value()'.
+
+ ---------- Footnotes ----------
+
+ (1) Numeric values are clearly less problematic, requiring only a C
+`double' to store.
+
+
+File: gawk.info, Node: Array Manipulation, Next: Extension API Variables, Prev: Symbol Table Access, Up: Extension API Description
+
+16.4.10 Array Manipulation
+--------------------------
+
+The primary data structure(1) in `awk' is the associative array (*note
+Arrays::). Extensions need to be able to manipulate `awk' arrays. The
+API provides a number of data structures for working with arrays,
+functions for working with individual elements, and functions for
+working with arrays as a whole. This includes the ability to "flatten"
+an array so that it is easy for C code to traverse every element in an
+array. The array data structures integrate nicely with the data
+structures for values to make it easy to both work with and create true
+arrays of arrays (*note General Data Types::).
+
+* Menu:
+
+* Array Data Types:: Data types for working with arrays.
+* Array Functions:: Functions for working with arrays.
+* Flattening Arrays:: How to flatten arrays.
+* Creating Arrays:: How to create and populate arrays.
+
+ ---------- Footnotes ----------
+
+ (1) Okay, the only data structure.
+
+
+File: gawk.info, Node: Array Data Types, Next: Array Functions, Up: Array Manipulation
+
+16.4.10.1 Array Data Types
+..........................
+
+The data types associated with arrays are listed below.
+
+`typedef void *awk_array_t;'
+ If you request the value of an array variable, you get back an
+ `awk_array_t' value. This value is opaque(1) to the extension; it
+ uniquely identifies the array but can only be used by passing it
+ into API functions or receiving it from API functions. This is
+ very similar to the way `FILE *' values are used with the `<stdio.h>'
+ library routines.
+
+`'
+
+`typedef struct awk_element {'
+` /* convenience linked list pointer, not used by gawk */'
+` struct awk_element *next;'
+` enum {'
+` AWK_ELEMENT_DEFAULT = 0, /* set by gawk */'
+` AWK_ELEMENT_DELETE = 1 /* set by extension if should be deleted */'
+` } flags;'
+` awk_value_t index;'
+` awk_value_t value;'
+`} awk_element_t;'
+ The `awk_element_t' is a "flattened" array element. `awk' produces
+ an array of these inside the `awk_flat_array_t' (see the next
+ item). Individual elements may be marked for deletion. New
+ elements must be added individually, one at a time, using the
+ separate API for that purpose. The fields are as follows:
+
+ `struct awk_element *next;'
+ This pointer is for the convenience of extension writers. It
+ allows an extension to create a linked list of new elements
+ which can then be added to an array in a loop that traverses
+ the list.
+
+ `enum { ... } flags;'
+ A set of flag values that convey information between `gawk'
+ and the extension. Currently there is only one:
+ `AWK_ELEMENT_DELETE', which the extension can set to cause
+ `gawk' to delete the element from the original array upon
+ release of the flattened array.
+
+ `index'
+ `value'
+ The index and value of the element, respectively. _All_
+ memory pointed to by `index' and `value' belongs to `gawk'.
+
+`typedef struct awk_flat_array {'
+` awk_const void *awk_const opaque1; /* private data for use by gawk */'
+` awk_const void *awk_const opaque2; /* private data for use by gawk */'
+` awk_const size_t count; /* how many elements */'
+` awk_element_t elements[1]; /* will be extended */'
+`} awk_flat_array_t;'
+ This is a flattened array. When an extension gets one of these
+ from `gawk', the `elements' array is of actual size `count'. The
+ `opaque1' and `opaque2' pointers are for use by `gawk'; therefore
+ they are marked `awk_const' so that the extension cannot modify
+ them.
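+
+ The `elements[1]' declaration is the traditional C idiom for a
+structure whose final array is sized when the structure is allocated.
+A minimal sketch of the idiom follows, assuming stand-in types
+(`demo_element_t', `demo_flat_array_t') and a hypothetical allocator
+`demo_flat_alloc()'. None of these names exist in `gawkapi.h'; `gawk'
+allocates the real structure itself, and an extension never does so:
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in types; NOT the real definitions from gawkapi.h. */
typedef struct demo_element {
    int flags;
    const char *index;
    const char *value;
} demo_element_t;

typedef struct demo_flat_array {
    size_t count;                 /* how many elements */
    demo_element_t elements[1];   /* will be extended */
} demo_flat_array_t;

/*
 * demo_flat_alloc --- allocate a header plus room for `count' elements.
 * The structure already holds one element, so only count - 1 more
 * are added to the allocation size.
 */
static demo_flat_array_t *
demo_flat_alloc(size_t count)
{
    size_t extra = (count > 1) ? (count - 1) * sizeof(demo_element_t) : 0;
    demo_flat_array_t *fa = calloc(1, sizeof(*fa) + extra);

    if (fa != NULL)
        fa->count = count;
    return fa;
}
```
+
+ Code that receives such a structure simply indexes `elements' from
+zero up to `count - 1'.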
+
+ ---------- Footnotes ----------
+
+ (1) It is also a "cookie," but the `gawk' developers did not wish to
+overuse this term.
+
+
+File: gawk.info, Node: Array Functions, Next: Flattening Arrays, Prev: Array Data Types, Up: Array Manipulation
+
+16.4.10.2 Array Functions
+.........................
+
+The following functions relate to individual array elements.
+
+`awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);'
+ For the array represented by `a_cookie', return in `*count' the
+ number of elements it contains. A subarray counts as a single
+ element. Return false if there is an error.
+
+`awk_bool_t get_array_element(awk_array_t a_cookie,'
+` const awk_value_t *const index,'
+` awk_valtype_t wanted,'
+` awk_value_t *result);'
+ For the array represented by `a_cookie', return in `*result' the
+ value of the element whose index is `index'. `wanted' specifies
+ the type of value you wish to retrieve. Return false if `wanted'
+ does not match the actual type or if `index' is not in the array
+ (*note Table 16.1: table-value-types-returned.).
+
+ The value for `index' can be numeric, in which case `gawk'
+ converts it to a string. Using non-integral values is possible, but
+ requires that you understand how such values are converted to
+ strings (*note Conversion::); thus using integral values is safest.
+
+ As with _all_ strings passed into `gawk' from an extension, the
+ string value of `index' must come from `malloc()', and `gawk'
+ releases the storage.
+
+`awk_bool_t set_array_element(awk_array_t a_cookie,'
+` const awk_value_t *const index,'
+` const awk_value_t *const value);'
+ In the array represented by `a_cookie', create or modify the
+ element whose index is given by `index'. The `ARGV' and `ENVIRON'
+ arrays may not be changed.
+
+`awk_bool_t set_array_element_by_elem(awk_array_t a_cookie,'
+` awk_element_t element);'
+ Like `set_array_element()', but take the `index' and `value' from
+ `element'. This is a convenience macro.
+
+`awk_bool_t del_array_element(awk_array_t a_cookie,'
+` const awk_value_t* const index);'
+ Remove the element with the given index from the array represented
+ by `a_cookie'. Return true if the element was removed, or false
+ if the element did not exist in the array.
+
+ The following functions relate to arrays as a whole:
+
+`awk_array_t create_array();'
+ Create a new array to which elements may be added. *Note Creating
+ Arrays::, for a discussion of how to create a new array and add
+ elements to it.
+
+`awk_bool_t clear_array(awk_array_t a_cookie);'
+ Clear the array represented by `a_cookie'. Return false if there
+ was some kind of problem, true otherwise. The array remains an
+ array, but after calling this function, it has no elements. This
+ is equivalent to using the `delete' statement (*note Delete::).
+
+`awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data);'
+ For the array represented by `a_cookie', create an
+ `awk_flat_array_t' structure and fill it in. Set the pointer whose
+ address is passed as `data' to point to this structure. Return
+ true upon success, or false otherwise. *Note Flattening Arrays::,
+ for a discussion of how to flatten an array and work with it.
+
+`awk_bool_t release_flattened_array(awk_array_t a_cookie,'
+` awk_flat_array_t *data);'
+ When done with a flattened array, release the storage using this
+ function. You must pass in both the original array cookie, and
+ the address of the created `awk_flat_array_t' structure. The
+ function returns true upon success, false otherwise.
+
+
+File: gawk.info, Node: Flattening Arrays, Next: Creating Arrays, Prev: Array Functions, Up: Array Manipulation
+
+16.4.10.3 Working With All The Elements of an Array
+...................................................
+
+To "flatten" an array is to create a structure that represents the full
+array in a fashion that makes it easy for C code to traverse the entire
+array. Test code in `extension/testext.c' does this, and also serves
+as a nice example to show how to use the APIs.
+
+ First, the `gawk' script that drives the test extension:
+
+ @load "testext"
+ BEGIN {
+ n = split("blacky rusty sophie raincloud lucky", pets)
+ printf "pets has %d elements\n", length(pets)
+ ret = dump_array_and_delete("pets", "3")
+ printf "dump_array_and_delete(pets) returned %d\n", ret
+ if ("3" in pets)
+ printf("dump_array_and_delete() did NOT remove index \"3\"!\n")
+ else
+ printf("dump_array_and_delete() did remove index \"3\"!\n")
+ print ""
+ }
+
+This code creates an array with `split()' (*note String Functions::)
+and then calls `dump_array_and_delete()'. That function looks up the array
+whose name is passed as the first argument, and deletes the element at
+the index passed in the second argument. It then prints the return
+value and checks if the element was indeed deleted. Here is the C code
+that implements `dump_array_and_delete()'. It has been edited slightly
+for presentation.
+
+ The first part declares variables, sets up the default return value
+in `result', and checks that the function was called with the correct
+number of arguments:
+
+ static awk_value_t *
+ dump_array_and_delete(int nargs, awk_value_t *result)
+ {
+ awk_value_t value, value2, value3;
+ awk_flat_array_t *flat_array;
+ size_t count;
+ char *name;
+ int i;
+
+ assert(result != NULL);
+ make_number(0.0, result);
+
+ if (nargs != 2) {
+ printf("dump_array_and_delete: nargs not right "
+ "(%d should be 2)\n", nargs);
+ goto out;
+ }
+
+ The function then proceeds in steps, as follows. First, retrieve the
+name of the array, passed as the first argument. Then retrieve the
+array itself. If either operation fails, print error messages and
+return:
+
+ /* get argument named array as flat array and print it */
+ if (get_argument(0, AWK_STRING, & value)) {
+ name = value.str_value.str;
+ if (sym_lookup(name, AWK_ARRAY, & value2))
+ printf("dump_array_and_delete: sym_lookup of %s passed\n",
+ name);
+ else {
+ printf("dump_array_and_delete: sym_lookup of %s failed\n",
+ name);
+ goto out;
+ }
+ } else {
+ printf("dump_array_and_delete: get_argument(0) failed\n");
+ goto out;
+ }
+
+ For testing purposes and to make sure that the C code sees the same
+number of elements as the `awk' code, the second step is to get the
+count of elements in the array and print it:
+
+ if (! get_element_count(value2.array_cookie, & count)) {
+ printf("dump_array_and_delete: get_element_count failed\n");
+ goto out;
+ }
+
+ printf("dump_array_and_delete: incoming size is %lu\n",
+ (unsigned long) count);
+
+ The third step is to actually flatten the array, and then to double
+check that the count in the `awk_flat_array_t' is the same as the count
+just retrieved:
+
+ if (! flatten_array(value2.array_cookie, & flat_array)) {
+ printf("dump_array_and_delete: could not flatten array\n");
+ goto out;
+ }
+
+ if (flat_array->count != count) {
+ printf("dump_array_and_delete: flat_array->count (%lu)"
+ " != count (%lu)\n",
+ (unsigned long) flat_array->count,
+ (unsigned long) count);
+ goto out;
+ }
+
+ The fourth step is to retrieve the index of the element to be
+deleted, which was passed as the second argument. Remember that
+argument counts passed to `get_argument()' are zero-based, thus the
+second argument is numbered one:
+
+ if (! get_argument(1, AWK_STRING, & value3)) {
+ printf("dump_array_and_delete: get_argument(1) failed\n");
+ goto out;
+ }
+
+ The fifth step is where the "real work" is done. The function loops
+over every element in the array, printing the index and element values.
+In addition, upon finding the element with the index that is supposed
+to be deleted, the function sets the `AWK_ELEMENT_DELETE' bit in the
+`flags' field of the element. When the array is released, `gawk'
+traverses the flattened array, and deletes any elements that have this
+flag bit set:
+
+ for (i = 0; i < flat_array->count; i++) {
+ printf("\t%s[\"%.*s\"] = %s\n",
+ name,
+ (int) flat_array->elements[i].index.str_value.len,
+ flat_array->elements[i].index.str_value.str,
+ valrep2str(& flat_array->elements[i].value));
+
+ if (strcmp(value3.str_value.str,
+ flat_array->elements[i].index.str_value.str)
+ == 0) {
+ flat_array->elements[i].flags |= AWK_ELEMENT_DELETE;
+ printf("dump_array_and_delete: marking element \"%s\" "
+ "for deletion\n",
+ flat_array->elements[i].index.str_value.str);
+ }
+ }
+
+ The sixth step is to release the flattened array. This tells `gawk'
+that the extension is no longer using the array, and that it should
+delete any elements marked for deletion. `gawk' also frees any storage
+that was allocated, so you should not use the pointer (`flat_array' in
+this code) once you have called `release_flattened_array()':
+
+ if (! release_flattened_array(value2.array_cookie, flat_array)) {
+ printf("dump_array_and_delete: could not release flattened array\n");
+ goto out;
+ }
+
+ Finally, since everything was successful, the function sets the
+return value to success, and returns:
+
+ make_number(1.0, result);
+ out:
+ return result;
+ }
+
+ Here is the output from running this part of the test:
+
+ pets has 5 elements
+ dump_array_and_delete: sym_lookup of pets passed
+ dump_array_and_delete: incoming size is 5
+ pets["1"] = "blacky"
+ pets["2"] = "rusty"
+ pets["3"] = "sophie"
+ dump_array_and_delete: marking element "3" for deletion
+ pets["4"] = "raincloud"
+ pets["5"] = "lucky"
+ dump_array_and_delete(pets) returned 1
+ dump_array_and_delete() did remove index "3"!
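+
+ The mark-then-release protocol can be modelled without `gawk' at
+all. The sketch below is an assumption-laden simplification: `struct
+demo_elem', `demo_mark()', and `demo_release()' are invented names, and
+a single C array plays the role of both the flattened copy and the
+original array. In the real API the extension only sets the flag, and
+`gawk' performs the deletion inside `release_flattened_array()':
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_DEFAULT = 0, DEMO_DELETE = 1 };

/* Stand-in for a flattened element; not the real awk_element_t. */
struct demo_elem {
    int flags;
    const char *index;
    const char *value;
};

/* demo_mark --- flag the element whose index matches; return 1 if found */
static int
demo_mark(struct demo_elem *elems, size_t count, const char *index)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(elems[i].index, index) == 0) {
            elems[i].flags |= DEMO_DELETE;
            return 1;
        }
    }
    return 0;
}

/* demo_release --- drop flagged elements in place; return the new count */
static size_t
demo_release(struct demo_elem *elems, size_t count)
{
    size_t i, kept = 0;

    for (i = 0; i < count; i++) {
        if ((elems[i].flags & DEMO_DELETE) == 0)
            elems[kept++] = elems[i];
    }
    return kept;
}
```
+
+ Marking is cheap and can happen during traversal; the deletions
+themselves are deferred until the release step.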
+
+
+File: gawk.info, Node: Creating Arrays, Prev: Flattening Arrays, Up: Array Manipulation
+
+16.4.10.4 How To Create and Populate Arrays
+...........................................
+
+Besides working with arrays created by `awk' code, you can create
+arrays and populate them as you see fit, and then `awk' code can access
+them and manipulate them.
+
+ There are two important points about creating arrays from extension
+code:
+
+ 1. You must install a new array into `gawk''s symbol table
+ immediately upon creating it. Once you have done so, you can then
+ populate the array.
+
+ Similarly, if installing a new array as a subarray of an existing
+ array, you must add the new array to its parent before adding any
+ elements to it.
+
+ Thus, the correct way to build an array is to work "top down."
+ Create the array, and immediately install it in `gawk''s symbol
+ table using `sym_update()', or install it as an element in a
+ previously existing array using `set_array_element()'. Example code is
+ coming shortly.
+
+ 2. Due to gawk internals, after using `sym_update()' to install an
+ array into `gawk', you have to retrieve the array cookie from the
+ value passed in to `sym_update()' before doing anything else with
+ it, like so:
+
+ awk_value_t value;
+ awk_array_t new_array;
+
+ new_array = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = new_array;
+
+ /* install array in the symbol table */
+ sym_update("array", & value);
+
+ new_array = value.array_cookie; /* YOU MUST DO THIS */
+
+ If installing an array as a subarray, you must also retrieve the
+ value of the array cookie after the call to `set_array_element()'.
+
+ The following C code is a simple test extension to create an array
+with two regular elements and with a subarray. The leading `#include'
+directives and boilerplate variable declarations are omitted for
+brevity. The first step is to create a new array and then install it
+in the symbol table:
+
+ /* create_new_array --- create a named array */
+
+ static void
+ create_new_array()
+ {
+ awk_array_t a_cookie;
+ awk_array_t subarray;
+ awk_value_t index, value;
+
+ a_cookie = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = a_cookie;
+
+ if (! sym_update("new_array", & value))
+ printf("create_new_array: sym_update(\"new_array\") failed!\n");
+ a_cookie = value.array_cookie;
+
+Note how `a_cookie' is reset from the `array_cookie' field in the
+`value' structure.
+
+ The second step is to install two regular values into `new_array':
+
+ (void) make_const_string("hello", 5, & index);
+ (void) make_const_string("world", 5, & value);
+ if (! set_array_element(a_cookie, & index, & value)) {
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ }
+
+ (void) make_const_string("answer", 6, & index);
+ (void) make_number(42.0, & value);
+ if (! set_array_element(a_cookie, & index, & value)) {
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ }
+
+ The third step is to create the subarray and install it:
+
+ (void) make_const_string("subarray", 8, & index);
+ subarray = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = subarray;
+ if (! set_array_element(a_cookie, & index, & value)) {
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ }
+ subarray = value.array_cookie;
+
+ The final step is to populate the subarray with its own element:
+
+ (void) make_const_string("foo", 3, & index);
+ (void) make_const_string("bar", 3, & value);
+ if (! set_array_element(subarray, & index, & value)) {
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ }
+ }
+
+ Here is a sample script that loads the extension and then dumps the
+array:
+
+ @load "subarray"
+
+ function dumparray(name, array, i)
+ {
+ for (i in array)
+ if (isarray(array[i]))
+ dumparray(name "[\"" i "\"]", array[i])
+ else
+ printf("%s[\"%s\"] = %s\n", name, i, array[i])
+ }
+
+ BEGIN {
+ dumparray("new_array", new_array);
+ }
+
+ Here is the result of running the script:
+
+ $ AWKLIBPATH=$PWD ./gawk -f subarray.awk
+ -| new_array["subarray"]["foo"] = bar
+ -| new_array["hello"] = world
+ -| new_array["answer"] = 42
+
+(*Note Finding Extensions::, for more information on the `AWKLIBPATH'
+environment variable.)
+
+
+File: gawk.info, Node: Extension API Variables, Next: Extension API Boilerplate, Prev: Array Manipulation, Up: Extension API Description
+
+16.4.11 API Variables
+---------------------
+
+The API provides two sets of variables. The first provides information
+about the version of the API (both with which the extension was
+compiled, and with which `gawk' was compiled). The second provides
+information about how `gawk' was invoked.
+
+* Menu:
+
+* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about
+ `gawk''s invocation.
+
+
+File: gawk.info, Node: Extension Versioning, Next: Extension API Informational Variables, Up: Extension API Variables
+
+16.4.11.1 API Version Constants and Variables
+.............................................
+
+The API provides both a "major" and a "minor" version number. The API
+versions are available at compile time as constants:
+
+`GAWK_API_MAJOR_VERSION'
+ The major version of the API.
+
+`GAWK_API_MINOR_VERSION'
+ The minor version of the API.
+
+ The minor version increases when new functions are added to the API.
+Such new functions are always added to the end of the API `struct'.
+
+ The major version increases (and the minor version is reset to zero)
+if any of the data types change size or member order, or if any of the
+existing functions change signature.
+
+ An extension may be compiled against one version of the API but
+loaded by a `gawk' that implements a different version.
+For this reason, the major and minor API versions of the running `gawk'
+are included in the API `struct' as read-only constant integers:
+
+`api->major_version'
+ The major version of the running `gawk'.
+
+`api->minor_version'
+ The minor version of the running `gawk'.
+
+ It is up to the extension to decide if there are API
+incompatibilities. Typically a check like this is enough:
-Two useful functions that are not in `awk' are `chdir()' (so that an
+ if (api->major_version != GAWK_API_MAJOR_VERSION
+ || api->minor_version < GAWK_API_MINOR_VERSION) {
+ fprintf(stderr, "foo_extension: version mismatch with gawk!\n");
+ fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n",
+ GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION,
+ api->major_version, api->minor_version);
+ exit(1);
+ }
+
+ Such code is included in the boilerplate `dl_load_func()' macro
+provided in `gawkapi.h' (discussed later, in *note Extension API
+Boilerplate::).
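+
+ The rule in the check above can be restated as a small predicate,
+which makes the individual cases easy to reason about. Note that
+`api_compatible()' is a hypothetical helper written for this
+illustration; it is not part of `gawkapi.h':
+
```c
#include <assert.h>

/*
 * api_compatible --- hypothetical helper restating the version check.
 * ext_* are the versions the extension was compiled against;
 * gawk_* are the versions reported by the running gawk.
 */
static int
api_compatible(int ext_major, int ext_minor, int gawk_major, int gawk_minor)
{
    if (gawk_major != ext_major)
        return 0;   /* data types or function signatures changed */
    if (gawk_minor < ext_minor)
        return 0;   /* running gawk lacks functions the extension may call */
    return 1;       /* same major version, and gawk is at least as new */
}
```
+
+ For example, an extension compiled against version (1, 0) loads
+under a `gawk' providing (1, 3), but an extension compiled against
+(1, 3) is rejected by a `gawk' providing only (1, 0).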
+
+
+File: gawk.info, Node: Extension API Informational Variables, Prev: Extension Versioning, Up: Extension API Variables
+
+16.4.11.2 Informational Variables
+.................................
+
+The API provides access to several variables that describe whether the
+corresponding command-line options were enabled when `gawk' was
+invoked. The variables are:
+
+`do_lint'
+ This variable is true if `gawk' was invoked with the `--lint'
+ option (*note Options::).
+
+`do_traditional'
+ This variable is true if `gawk' was invoked with the
+ `--traditional' option.
+
+`do_profile'
+ This variable is true if `gawk' was invoked with the `--profile'
+ option.
+
+`do_sandbox'
+ This variable is true if `gawk' was invoked with the `--sandbox'
+ option.
+
+`do_debug'
+ This variable is true if `gawk' was invoked with the `--debug'
+ option.
+
+`do_mpfr'
+ This variable is true if `gawk' was invoked with the `--bignum'
+ option.
+
+ The value of `do_lint' can change if `awk' code modifies the `LINT'
+built-in variable (*note Built-in Variables::). The others should not
+change during execution.
+
+
+File: gawk.info, Node: Extension API Boilerplate, Next: Finding Extensions, Prev: Extension API Variables, Up: Extension API Description
+
+16.4.12 Boilerplate Code
+------------------------
+
+As mentioned earlier (*note Extension Mechanism Outline::), the function
+definitions as presented are really macros. To use these macros, your
+extension must provide a small amount of boilerplate code (variables and
+functions) towards the top of your source file, using pre-defined names
+as described below. The boilerplate needed is also provided in comments
+in the `gawkapi.h' header file:
+
+ /* Boiler plate code: */
+ int plugin_is_GPL_compatible;
+
+ static gawk_api_t *const api;
+ static awk_ext_id_t ext_id;
+ static const char *ext_version = NULL; /* or ... = "some string" */
+
+ static awk_ext_func_t func_table[] = {
+ { "name", do_name, 1 },
+ /* ... */
+ };
+
+ /* EITHER: */
+
+ static awk_bool_t (*init_func)(void) = NULL;
+
+ /* OR: */
+
+ static awk_bool_t
+ init_my_module(void)
+ {
+ ...
+ }
+
+ static awk_bool_t (*init_func)(void) = init_my_module;
+
+ dl_load_func(func_table, some_name, "name_space_in_quotes")
+
+ These variables and functions are as follows:
+
+`int plugin_is_GPL_compatible;'
+ This asserts that the extension is compatible with the GNU GPL
+ (*note Copying::). If your extension does not have this, `gawk'
+ will not load it (*note Plugin License::).
+
+`static gawk_api_t *const api;'
+ This global `static' variable should be set to point to the
+ `gawk_api_t' pointer that `gawk' passes to your `dl_load()'
+ function. This variable is used by all of the macros.
+
+`static awk_ext_id_t ext_id;'
+ This global static variable should be set to the `awk_ext_id_t'
+ value that `gawk' passes to your `dl_load()' function. This
+ variable is used by all of the macros.
+
+`static const char *ext_version = NULL; /* or ... = "some string" */'
+ This global `static' variable should be set either to `NULL', or
+ to point to a string giving the name and version of your extension.
+
+`static awk_ext_func_t func_table[] = { ... };'
+ This is an array of one or more `awk_ext_func_t' structures as
+ described earlier (*note Extension Functions::). It can then be
+ looped over for multiple calls to `add_ext_func()'.
+
+`static awk_bool_t (*init_func)(void) = NULL;'
+` OR'
+`static awk_bool_t init_my_module(void) { ... }'
+`static awk_bool_t (*init_func)(void) = init_my_module;'
+ If you need to do some initialization work, you should define a
+ function that does it (creates variables, opens files, etc.) and
+ then define the `init_func' pointer to point to your function.
+ The function should return zero (false) upon failure, non-zero
+ (success) if everything goes well.
+
+ If you don't need to do any initialization, define the pointer and
+ initialize it to `NULL'.
+
+`dl_load_func(func_table, some_name, "name_space_in_quotes")'
+ This macro expands to a `dl_load()' function that performs all the
+ necessary initializations.
+
+ The point of all the variables and arrays is to let the
+`dl_load()' function (from the `dl_load_func()' macro) do all the
+standard work. It does the following:
+
+ 1. Check the API versions. If the extension major version does not
+ match `gawk''s, or if the extension minor version is greater than
+ `gawk''s, it prints a fatal error message and exits.
+
+ 2. Load the functions defined in `func_table'. If any of them fails
+ to load, it prints a warning message but continues on.
+
+ 3. If the `init_func' pointer is not `NULL', call the function it
+ points to. If it returns zero (false), print a warning message.
+
+ 4. If `ext_version' is not `NULL', register the version string with
+ `gawk'.
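
 The version test in step 1 can be sketched in isolation. This is a
minimal illustration, not the text of the `dl_load_func()' macro: the
function name `api_version_ok()' and its parameters are hypothetical,
and the real macro prints a fatal message and exits instead of
returning a status.

```c
#include <assert.h>

/*
 * api_version_ok --- hypothetical helper mirroring the version check.
 * Returns 1 if the extension may be loaded, 0 if it must be rejected.
 */
static int
api_version_ok(int ext_major, int ext_minor, int gawk_major, int gawk_minor)
{
	assert(ext_major >= 0 && ext_minor >= 0);
	assert(gawk_major >= 0 && gawk_minor >= 0);

	/* the major versions must match exactly */
	if (ext_major != gawk_major)
		return 0;

	/* the extension must not need a newer minor version than gawk has */
	if (ext_minor > gawk_minor)
		return 0;

	return 1;
}
```

 An extension built against an older minor version therefore still
loads into a newer `gawk' with the same major version, but not the
other way around.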
+
+
+File: gawk.info, Node: Finding Extensions, Prev: Extension API Boilerplate, Up: Extension API Description
+
+16.4.13 How `gawk' Finds Extensions
+-----------------------------------
+
+Compiled extensions have to be installed in a directory where `gawk'
+can find them. If `gawk' is configured and built in the default
+fashion, the directory in which to find extensions is
+`/usr/local/lib/gawk'. You can also specify a search path with a list
+of directories to search for compiled extensions. *Note AWKLIBPATH
+Variable::, for more information.
+
+
+File: gawk.info, Node: Extension Example, Next: Extension Samples, Prev: Extension API Description, Up: Dynamic Extensions
+
+16.5 Example: Some File Functions
+=================================
+
+ No matter where you go, there you are.
+ Buckaroo Banzai
+
+ Two useful functions that are not in `awk' are `chdir()' (so that an
`awk' program can change its directory) and `stat()' (so that an `awk'
program can gather information about a file). This minor node
-implements these functions for `gawk' in an external extension library.
+implements these functions for `gawk' in an extension.
* Menu:
@@ -21233,9 +23575,9 @@ implements these functions for `gawk' in an external extension library.
* Using Internal File Ops:: How to use an external extension.

-File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Sample Library
+File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Extension Example
-16.2.1 Using `chdir()' and `stat()'
+16.5.1 Using `chdir()' and `stat()'
-----------------------------------
This minor node shows how to use the new functions at the `awk' level
@@ -21243,6 +23585,7 @@ once they've been integrated into the running `gawk' interpreter.
Using `chdir()' is very straightforward. It takes one argument, the new
directory to change to:
+ @load "filefuncs"
...
newdir = "/home/arnold/funstuff"
ret = chdir(newdir)
@@ -21253,7 +23596,7 @@ directory to change to:
}
...
- The return value is negative if the `chdir' failed, and `ERRNO'
+ The return value is negative if the `chdir()' failed, and `ERRNO'
(*note Built-in Variables::) is set to a string indicating the error.
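
 At the C level, the behavior just described corresponds to checking
the result of `chdir()' and turning `errno' into text. The following
sketch assumes a POSIX system; `try_chdir()' is a hypothetical name used
only for illustration and is not part of the extension.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * try_chdir --- hypothetical wrapper, not part of the extension.
 * Returns the chdir() result; on failure, copies the error text
 * into errbuf, otherwise leaves errbuf empty.
 */
static int
try_chdir(const char *dir, char *errbuf, size_t errlen)
{
	int ret;

	assert(dir != NULL && errbuf != NULL && errlen > 0);

	errbuf[0] = '\0';
	ret = chdir(dir);
	if (ret < 0) {
		/* strerror() yields the kind of message found in ERRNO */
		strncpy(errbuf, strerror(errno), errlen - 1);
		errbuf[errlen - 1] = '\0';
	}
	return ret;
}
```

 A caller tests for a negative return value and then reports the
message, just as the `awk' code above tests `ret' and prints `ERRNO'.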
Using `stat()' is a bit more complicated. The C `stat()' function
@@ -21262,7 +23605,6 @@ way to model this in `awk' is to fill in an associative array with the
appropriate information:
file = "/home/arnold/.profile"
- fdata[1] = "x" # force `fdata' to be an array
ret = stat(file, fdata)
if (ret < 0) {
printf("could not stat %s: %s\n",
@@ -21304,11 +23646,11 @@ appropriate information:
`"ctime"'
The file's last access, modification, and inode update times,
respectively. These are numeric timestamps, suitable for
- formatting with `strftime()' (*note Built-in::).
+ formatting with `strftime()' (*note Time Functions::).
`"pmode"'
The file's "printable mode." This is a string representation of
- the file's type and permissions, such as what is produced by `ls
+ the file's type and permissions, such as is produced by `ls
-l'--for example, `"drwxr-xr-x"'.
`"type"'
@@ -21356,57 +23698,87 @@ Elements::):
components of that number, respectively.

-File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Sample Library
+File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Extension Example
-16.2.2 C Code for `chdir()' and `stat()'
+16.5.2 C Code for `chdir()' and `stat()'
----------------------------------------
-Here is the C code for these extensions. They were written for
-GNU/Linux. The code needs some more work for complete portability to
-other POSIX-compliant systems:(1)
+Here is the C code for these extensions.(1)
- #include "awk.h"
+ The file includes a number of standard header files, and then
+includes the `gawkapi.h' header file which provides the API definitions.
+Those are followed by the necessary variable declarations to make use
+of the API macros and boilerplate code (*note Extension API
+Boilerplate::).
- #include <sys/sysmacros.h>
+ #ifdef HAVE_CONFIG_H
+ #include <config.h>
+ #endif
+
+ #include <stdio.h>
+ #include <assert.h>
+ #include <errno.h>
+ #include <stdlib.h>
+ #include <string.h>
+ #include <unistd.h>
+
+ #include <sys/types.h>
+ #include <sys/stat.h>
+
+ #include "gawkapi.h"
+
+ #include "gettext.h"
+ #define _(msgid) gettext(msgid)
+ #define N_(msgid) msgid
+
+ #include "gawkfts.h"
+ #include "stack.h"
+
+ static const gawk_api_t *api; /* for convenience macros to work */
+ static awk_ext_id_t *ext_id;
+ static awk_bool_t init_filefuncs(void);
+ static awk_bool_t (*init_func)(void) = init_filefuncs;
+ static const char *ext_version = "filefuncs extension: version 1.0";
int plugin_is_GPL_compatible;
+ By convention, for an `awk' function `foo()', the C function that
+implements it is called `do_foo()'. The function should have two
+arguments: the first is an `int' usually called `nargs', that
+represents the number of actual arguments for the function. The second
+is a pointer to an `awk_value_t', usually named `result'.
+
/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
- static NODE *
- do_chdir(int nargs)
+ static awk_value_t *
+ do_chdir(int nargs, awk_value_t *result)
{
- NODE *newdir;
+ awk_value_t newdir;
int ret = -1;
- if (do_lint && nargs != 1)
- lintwarn("chdir: called with incorrect number of arguments");
-
- newdir = get_scalar_argument(0, FALSE);
+ assert(result != NULL);
- The file includes the `"awk.h"' header file for definitions for the
-`gawk' internals. It includes `<sys/sysmacros.h>' for access to the
-`major()' and `minor'() macros.
+ if (do_lint && nargs != 1)
+ lintwarn(ext_id,
+ _("chdir: called with incorrect number of arguments, "
+ "expecting 1"));
- By convention, for an `awk' function `foo', the function that
-implements it is called `do_foo'. The function should take a `int'
-argument, usually called `nargs', that represents the number of defined
-arguments for the function. The `newdir' variable represents the new
-directory to change to, retrieved with `get_scalar_argument()'. Note
-that the first argument is numbered zero.
+ The `newdir' variable represents the new directory to change to,
+retrieved with `get_argument()'. Note that the first argument is
+numbered zero.
- This code actually accomplishes the `chdir()'. It first forces the
-argument to be a string and passes the string value to the `chdir()'
-system call. If the `chdir()' fails, `ERRNO' is updated.
+ If the argument is retrieved successfully, the function calls the
+`chdir()' system call. If the `chdir()' fails, `ERRNO' is updated.
- (void) force_string(newdir);
- ret = chdir(newdir->stptr);
- if (ret < 0)
- update_ERRNO_int(errno);
+ if (get_argument(0, AWK_STRING, & newdir)) {
+ ret = chdir(newdir.str_value.str);
+ if (ret < 0)
+ update_ERRNO_int(errno);
+ }
Finally, the function returns the return value to the `awk' level:
- return make_number((AWKNUM) ret);
+ return make_number(ret, result);
}
The `stat()' built-in is more involved. First comes a function that
@@ -21421,71 +23793,239 @@ turns a numeric mode into a printable representation (e.g., 644 becomes
...
}
- Next comes the `do_stat()' function. It starts with variable
+ Next comes a function for reading symbolic links, which is also
+omitted here for brevity:
+
+ /* read_symlink --- read a symbolic link into an allocated buffer.
+ ... */
+
+ static char *
+ read_symlink(const char *fname, size_t bufsize, ssize_t *linksize)
+ {
+ ...
+ }
+
+ Two helper functions simplify entering values in the array that will
+contain the result of the `stat()':
+
+ /* array_set --- set an array element */
+
+ static void
+ array_set(awk_array_t array, const char *sub, awk_value_t *value)
+ {
+ awk_value_t index;
+
+ set_array_element(array,
+ make_const_string(sub, strlen(sub), & index),
+ value);
+
+ }
+
+ /* array_set_numeric --- set an array element with a number */
+
+ static void
+ array_set_numeric(awk_array_t array, const char *sub, double num)
+ {
+ awk_value_t tmp;
+
+ array_set(array, sub, make_number(num, & tmp));
+ }
+
+ The following function does most of the work to fill in the
+`awk_array_t' result array with values obtained from a valid `struct
+stat'. It is done in a separate function to support the `stat()'
+function for `gawk' and also to support the `fts()' extension which is
+included in the same file but whose code is not shown here (*note
+Extension Sample File Functions::).
+
+ The first part of the function is variable declarations, including a
+table to map file types to strings:
+
+ /* fill_stat_array --- do the work to fill an array with stat info */
+
+ static int
+ fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf)
+ {
+ char *pmode; /* printable mode */
+ const char *type = "unknown";
+ awk_value_t tmp;
+ static struct ftype_map {
+ unsigned int mask;
+ const char *type;
+ } ftype_map[] = {
+ { S_IFREG, "file" },
+ { S_IFBLK, "blockdev" },
+ { S_IFCHR, "chardev" },
+ { S_IFDIR, "directory" },
+ #ifdef S_IFSOCK
+ { S_IFSOCK, "socket" },
+ #endif
+ #ifdef S_IFIFO
+ { S_IFIFO, "fifo" },
+ #endif
+ #ifdef S_IFLNK
+ { S_IFLNK, "symlink" },
+ #endif
+ #ifdef S_IFDOOR /* Solaris weirdness */
+ { S_IFDOOR, "door" },
+ #endif /* S_IFDOOR */
+ };
+ int j, k;
+
+ The destination array is cleared, and then code fills in various
+elements based on values in the `struct stat':
+
+ /* empty out the array */
+ clear_array(array);
+
+ /* fill in the array */
+ array_set(array, "name", make_const_string(name, strlen(name),
+ & tmp));
+ array_set_numeric(array, "dev", sbuf->st_dev);
+ array_set_numeric(array, "ino", sbuf->st_ino);
+ array_set_numeric(array, "mode", sbuf->st_mode);
+ array_set_numeric(array, "nlink", sbuf->st_nlink);
+ array_set_numeric(array, "uid", sbuf->st_uid);
+ array_set_numeric(array, "gid", sbuf->st_gid);
+ array_set_numeric(array, "size", sbuf->st_size);
+ array_set_numeric(array, "blocks", sbuf->st_blocks);
+ array_set_numeric(array, "atime", sbuf->st_atime);
+ array_set_numeric(array, "mtime", sbuf->st_mtime);
+ array_set_numeric(array, "ctime", sbuf->st_ctime);
+
+ /* for block and character devices, add rdev,
+ major and minor numbers */
+ if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) {
+ array_set_numeric(array, "rdev", sbuf->st_rdev);
+ array_set_numeric(array, "major", major(sbuf->st_rdev));
+ array_set_numeric(array, "minor", minor(sbuf->st_rdev));
+ }
+
+The latter part of the function makes selective additions to the
+destination array, depending upon the availability of certain members
+and/or the type of the file. It then returns zero, for success:
+
+ #ifdef HAVE_ST_BLKSIZE
+ array_set_numeric(array, "blksize", sbuf->st_blksize);
+ #endif /* HAVE_ST_BLKSIZE */
+
+ pmode = format_mode(sbuf->st_mode);
+ array_set(array, "pmode", make_const_string(pmode, strlen(pmode),
+ & tmp));
+
+ /* for symbolic links, add a linkval field */
+ if (S_ISLNK(sbuf->st_mode)) {
+ char *buf;
+ ssize_t linksize;
+
+ if ((buf = read_symlink(name, sbuf->st_size,
+ & linksize)) != NULL)
+ array_set(array, "linkval",
+ make_malloced_string(buf, linksize, & tmp));
+ else
+ warning(ext_id, _("stat: unable to read symbolic link `%s'"),
+ name);
+ }
+
+ /* add a type field */
+ type = "unknown"; /* shouldn't happen */
+ for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) {
+ if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) {
+ type = ftype_map[j].type;
+ break;
+ }
+ }
+
+ array_set(array, "type", make_const_string(type, strlen(type), &tmp));
+
+ return 0;
+ }
+
+ Finally, here is the `do_stat()' function. It starts with variable
declarations and argument checking:
/* do_stat --- provide a stat() function for gawk */
- static NODE *
- do_stat(int nargs)
+ static awk_value_t *
+ do_stat(int nargs, awk_value_t *result)
{
- NODE *file, *array, *tmp;
- struct stat sbuf;
+ awk_value_t file_param, array_param;
+ char *name;
+ awk_array_t array;
int ret;
- NODE **aptr;
- char *pmode; /* printable mode */
- char *type = "unknown";
+ struct stat sbuf;
+
+ assert(result != NULL);
- if (do_lint && nargs > 2)
- lintwarn("stat: called with too many arguments");
+ if (do_lint && nargs != 2) {
+ lintwarn(ext_id,
+ _("stat: called with wrong number of arguments"));
+ return make_number(-1, result);
+ }
Then comes the actual work. First, the function gets the arguments.
-Then, it always clears the array. The code use `lstat()' (instead of
-`stat()') to get the file information, in case the file is a symbolic
-link. If there's an error, it sets `ERRNO' and returns:
+Next, it gets the information for the file. The code uses `lstat()'
+(instead of `stat()') to get the file information, in case the file is
+a symbolic link. If there's an error, it sets `ERRNO' and returns:
/* file is first arg, array to hold results is second */
- file = get_scalar_argument(0, FALSE);
- array = get_array_argument(1, FALSE);
+ if ( ! get_argument(0, AWK_STRING, & file_param)
+ || ! get_argument(1, AWK_ARRAY, & array_param)) {
+ warning(ext_id, _("stat: bad parameters"));
+ return make_number(-1, result);
+ }
- /* empty out the array */
- assoc_clear(array);
+ name = file_param.str_value.str;
+ array = array_param.array_cookie;
+
+ /* always empty out the array */
+ clear_array(array);
/* lstat the file, if error, set ERRNO and return */
- (void) force_string(file);
- ret = lstat(file->stptr, & sbuf);
+ ret = lstat(name, & sbuf);
if (ret < 0) {
update_ERRNO_int(errno);
- return make_number((AWKNUM) ret);
+ return make_number(ret, result);
}
- Now comes the tedious part: filling in the array. Only a few of the
-calls are shown here, since they all follow the same pattern:
+ The tedious work is done by `fill_stat_array()', shown earlier.
+When done, return the result from `fill_stat_array()':
- /* fill in the array */
- aptr = assoc_lookup(array, tmp = make_string("name", 4));
- *aptr = dupnode(file);
- unref(tmp);
+ ret = fill_stat_array(name, array, & sbuf);
- aptr = assoc_lookup(array, tmp = make_string("mode", 4));
- *aptr = make_number((AWKNUM) sbuf.st_mode);
- unref(tmp);
+ return make_number(ret, result);
+ }
- aptr = assoc_lookup(array, tmp = make_string("pmode", 5));
- pmode = format_mode(sbuf.st_mode);
- *aptr = make_string(pmode, strlen(pmode));
- unref(tmp);
+ Finally, it's necessary to provide the "glue" that loads the new
+function(s) into `gawk'.
- When done, return the `lstat()' return value:
+ The `filefuncs' extension also provides an `fts()' function, which
+we omit here. For its sake there is an initialization function:
+ /* init_filefuncs --- initialization routine */
- return make_number((AWKNUM) ret);
+ static awk_bool_t
+ init_filefuncs(void)
+ {
+ ...
}
- Finally, it's necessary to provide the "glue" that loads the new
-function(s) into `gawk'. By convention, each library has a routine
-named `dl_load()' that does the job. The simplest way is to use the
-`dl_load_func' macro in `gawkapi.h'.
+ We are almost done. We need an array of `awk_ext_func_t' structures
+for loading each function into `gawk':
+
+ static awk_ext_func_t func_table[] = {
+ { "chdir", do_chdir, 1 },
+ { "stat", do_stat, 2 },
+ { "fts", do_fts, 3 },
+ };
+
+ Each extension must have a routine named `dl_load()' to load
+everything that needs to be loaded. It is simplest to use the
+`dl_load_func()' macro in `gawkapi.h':
+
+ /* define the dl_load() function using the boilerplate macro */
+
+ dl_load_func(func_table, filefuncs, "")
And that's it! As an exercise, consider adding functions to
implement system calls such as `chown()', `chmod()', and `umask()'.
@@ -21497,34 +24037,33 @@ implement system calls such as `chown()', `chmod()', and `umask()'.
version.

-File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Sample Library
+File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Extension Example
-16.2.3 Integrating the Extensions
+16.5.3 Integrating The Extensions
---------------------------------
Now that the code is written, it must be possible to add it at runtime
to the running `gawk' interpreter. First, the code must be compiled.
Assuming that the functions are in a file named `filefuncs.c', and IDIR
-is the location of the `gawk' include files, the following steps create
-a GNU/Linux shared library:
+is the location of the `gawkapi.h' header file, the following steps(1)
+create a GNU/Linux shared library:
$ gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -IIDIR filefuncs.c
- $ ld -o filefuncs.so -shared filefuncs.o
+ $ ld -o filefuncs.so -shared filefuncs.o -lc
- Once the library exists, it is loaded by calling the `extension()'
-built-in function. This function takes two arguments: the name of the
-library to load and the name of a function to call when the library is
-first loaded. This function adds the new functions to `gawk'. It
-returns the value returned by the initialization function within the
-shared library:
+ Once the library exists, it is loaded by using the `@load' keyword.
# file testff.awk
+ @load "filefuncs"
+
BEGIN {
- extension("./filefuncs.so", "dl_load")
+ "pwd" | getline curdir # save current directory
+ close("pwd")
- chdir(".") # no-op
+ chdir("/tmp")
+ system("pwd") # test it
+ chdir(curdir) # go back
- data[1] = 1 # force `data' to be an array
print "Info for testff.awk"
ret = stat("testff.awk", data)
print "ret =", ret
@@ -21541,32 +24080,642 @@ shared library:
print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
}
- Here are the results of running the program:
+ The `AWKLIBPATH' environment variable tells `gawk' where to find
+shared libraries (*note Finding Extensions::). We set it to the
+current directory and run the program:
- $ gawk -f testff.awk
+ $ AWKLIBPATH=$PWD gawk -f testff.awk
+ -| /tmp
-| Info for testff.awk
-| ret = 0
- -| data["size"] = 607
- -| data["ino"] = 14945891
- -| data["name"] = testff.awk
- -| data["pmode"] = -rw-rw-r--
- -| data["nlink"] = 1
- -| data["atime"] = 1293993369
- -| data["mtime"] = 1288520752
- -| data["mode"] = 33204
-| data["blksize"] = 4096
- -| data["dev"] = 2054
+ -| data["mtime"] = 1350838628
+ -| data["mode"] = 33204
-| data["type"] = file
- -| data["gid"] = 500
- -| data["uid"] = 500
+ -| data["dev"] = 2053
+ -| data["gid"] = 1000
+ -| data["ino"] = 1719496
+ -| data["ctime"] = 1350838628
-| data["blocks"] = 8
- -| data["ctime"] = 1290113572
- -| testff.awk modified: 10 31 10 12:25:52
+ -| data["nlink"] = 1
+ -| data["name"] = testff.awk
+ -| data["atime"] = 1350838632
+ -| data["pmode"] = -rw-rw-r--
+ -| data["size"] = 662
+ -| data["uid"] = 1000
+ -| testff.awk modified: 10 21 12 18:57:08
-|
-| Info for JUNK
-| ret = -1
-| JUNK modified: 01 01 70 02:00:00
+ ---------- Footnotes ----------
+
+ (1) In practice, you would probably want to use the GNU
+Autotools--Automake, Autoconf, Libtool, and Gettext--to configure and
+build your libraries. Instructions for doing so are beyond the scope of
+this Info file. *Note gawkextlib::, for WWW links to the tools.
+
+
+File: gawk.info, Node: Extension Samples, Next: gawkextlib, Prev: Extension Example, Up: Dynamic Extensions
+
+16.6 The Sample Extensions In The `gawk' Distribution
+=====================================================
+
+This minor node provides brief overviews of the sample extensions that
+come in the `gawk' distribution. Some of them are intended for
+production use, such as the `filefuncs' and `readdir' extensions. Others
+mainly provide example code that shows how to use the extension API.
+
+* Menu:
+
+* Extension Sample File Functions:: The file functions sample.
+* Extension Sample Fnmatch:: An interface to `fnmatch()'.
+* Extension Sample Fork:: An interface to `fork()' and other
+ process functions.
+* Extension Sample Ord:: Character to value to character
+ conversions.
+* Extension Sample Readdir:: An interface to `readdir()'.
+* Extension Sample Revout:: Reversing output sample output wrapper.
+* Extension Sample Rev2way:: Reversing data sample two-way processor.
+* Extension Sample Read write array:: Serializing an array to a file.
+* Extension Sample Readfile:: Reading an entire file into a string.
+* Extension Sample API Tests:: Tests for the API.
+* Extension Sample Time:: An interface to `gettimeofday()'
+ and `sleep()'.
+
+
+File: gawk.info, Node: Extension Sample File Functions, Next: Extension Sample Fnmatch, Up: Extension Samples
+
+16.6.1 File Related Functions
+-----------------------------
+
+The `filefuncs' extension provides three different functions. The
+usage is:
+
+`@load "filefuncs"'
+ This is how you load the extension.
+
+`result = chdir("/some/directory")'
+ The `chdir()' function is a direct hook to the `chdir()' system
+ call to change the current directory. It returns zero upon
+ success or less than zero upon error. In the latter case it
+ updates `ERRNO'.
+
+`result = stat("/some/path", statdata)'
+ The `stat()' function provides a hook into the `stat()' system
+ call. In fact, it uses `lstat()'. It returns zero upon success or
+ less than zero upon error. In the latter case it updates `ERRNO'.
+
+ In all cases, it clears the `statdata' array. When the call is
+ successful, `stat()' fills the `statdata' array with information
+ retrieved from the filesystem, as follows:
+
+ `statdata["name"]' The name of the file.
+ `statdata["dev"]' Corresponds to the `st_dev' field in
+ the `struct stat'.
+ `statdata["ino"]' Corresponds to the `st_ino' field in
+ the `struct stat'.
+ `statdata["mode"]' Corresponds to the `st_mode' field in
+ the `struct stat'.
+ `statdata["nlink"]' Corresponds to the `st_nlink' field in
+ the `struct stat'.
+ `statdata["uid"]' Corresponds to the `st_uid' field in
+ the `struct stat'.
+ `statdata["gid"]' Corresponds to the `st_gid' field in
+ the `struct stat'.
+ `statdata["size"]' Corresponds to the `st_size' field in
+ the `struct stat'.
+ `statdata["atime"]' Corresponds to the `st_atime' field in
+ the `struct stat'.
+ `statdata["mtime"]' Corresponds to the `st_mtime' field in
+ the `struct stat'.
+ `statdata["ctime"]' Corresponds to the `st_ctime' field in
+ the `struct stat'.
+ `statdata["rdev"]' Corresponds to the `st_rdev' field in
+ the `struct stat'. This element is
+ only present for device files.
+ `statdata["major"]' The major device number, extracted
+ from `st_rdev' with `major()'. This element is
+ only present for device files.
+ `statdata["minor"]' The minor device number, extracted
+ from `st_rdev' with `minor()'. This element is
+ only present for device files.
+ `statdata["blksize"]' Corresponds to the `st_blksize' field
+ in the `struct stat', if this field is
+ present on your system. (It is present
+ on all modern systems that we know of.)
+ `statdata["pmode"]' A human-readable version of the mode
+ value, such as printed by `ls'. For
+ example, `"-rwxr-xr-x"'.
+ `statdata["linkval"]' If the named file is a symbolic link,
+ this element will exist and its value
+ is the value of the symbolic link
+ (where the symbolic link points to).
+ `statdata["type"]' The type of the file as a string. One
+ of `"file"', `"blockdev"', `"chardev"',
+ `"directory"', `"socket"', `"fifo"',
+ `"symlink"', `"door"', or `"unknown"'.
+ Not all systems support all file types.
+
+`flags = or(FTS_PHYSICAL, ...)'
+`result = fts(pathlist, flags, filedata)'
+ Walk the file trees provided in `pathlist' and fill in the
+ `filedata' array as described below. `flags' is the bitwise OR of
+ several predefined constant values, also as described below.
+ Return zero if there were no errors, otherwise return -1.
+
+ The `fts()' function provides a hook to the C library `fts()'
+routines for traversing file hierarchies. Instead of returning data
+about one file at a time in a stream, it fills in a multi-dimensional
+array with data about each file and directory encountered in the
+requested hierarchies.
+
+ The arguments are as follows:
+
+`pathlist'
+ An array of filenames. The element values are used; the index
+ values are ignored.
+
+`flags'
+ This should be the bitwise OR of one or more of the following
+ predefined constant flag values. At least one of `FTS_LOGICAL' or
+ `FTS_PHYSICAL' must be provided; otherwise `fts()' returns an
+ error value and sets `ERRNO'. The flags are:
+
+ `FTS_LOGICAL'
+ Do a "logical" file traversal, where the information returned
+ for a symbolic link refers to the linked-to file, and not to
+ the symbolic link itself. This flag is mutually exclusive
+ with `FTS_PHYSICAL'.
+
+ `FTS_PHYSICAL'
+ Do a "physical" file traversal, where the information
+ returned for a symbolic link refers to the symbolic link
+ itself. This flag is mutually exclusive with `FTS_LOGICAL'.
+
+ `FTS_NOCHDIR'
+ As a performance optimization, the C library `fts()' routines
+ change directory as they traverse a file hierarchy. This
+ flag disables that optimization.
+
+ `FTS_COMFOLLOW'
+ Immediately follow a symbolic link named in `pathlist',
+ whether or not `FTS_LOGICAL' is set.
+
+ `FTS_SEEDOT'
+ By default, the `fts()' routines do not return entries for `.'
+ and `..'. This option causes entries for `..' to also be
+ included. (The extension always includes an entry for `.',
+ see below.)
+
+ `FTS_XDEV'
+ During a traversal, do not cross onto a different mounted
+ filesystem.
+
+`filedata'
+ The `filedata' array is first cleared. Then, `fts()' creates an
+ element in `filedata' for every element in `pathlist'. The index
+ is the name of the directory or file given in `pathlist'. The
+ element for this index is itself an array. There are two cases.
+
+ _The path is a file._
+ In this case, the array contains two or three elements:
+
+ `"path"'
+ The full path to this file, starting from the "root"
+ that was given in the `pathlist' array.
+
+ `"stat"'
+ This element is itself an array, containing the same
+ information as provided by the `stat()' function
+ described earlier for its `statdata' argument. The
+ element may not be present if the `stat()' system call
+ for the file failed.
+
+ `"error"'
+ If some kind of error was encountered, the array will
+ also contain an element named `"error"', which is a
+ string describing the error.
+
+ _The path is a directory._
+ In this case, the array contains one element for each entry
+ in the directory. If an entry is a file, that element is as
+ for files, just described. If the entry is a directory, that
+ element is (recursively) an array describing the
+ subdirectory. If `FTS_SEEDOT' was provided in the flags,
+ then there will also be an element named `".."'. This
+ element will be an array containing the data as provided by
+ `stat()'.
+
+ In addition, there will be an element whose index is `"."'.
+ This element is an array containing the same two or three
+ elements as for a file: `"path"', `"stat"', and `"error"'.
+
+ The `fts()' function returns zero if there were no errors.
+Otherwise it returns -1.
+
+ NOTE: The `fts()' extension does not exactly mimic the interface
+ of the C library `fts()' routines, choosing instead to provide an
+ interface that is based on associative arrays, which should be
+ more comfortable to use from an `awk' program. This includes the
+ lack of a comparison function, since `gawk' already provides
+ powerful array sorting facilities. While an `fts_read()'-like
+ interface could have been provided, this felt less natural than
+ simply creating a multi-dimensional array to represent the file
+ hierarchy and its information.
+
+ See `test/fts.awk' in the `gawk' distribution for an example.
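
 For comparison, here is how the underlying C library routines are
typically driven. This is a sketch under the assumption that `<fts.h>'
is available (as on GNU/Linux and BSD systems); it is not the
extension's code, and `count_regular_files()' is a hypothetical name.
Note that the C interface also requires one of `FTS_LOGICAL' or
`FTS_PHYSICAL'.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>

/*
 * count_regular_files --- hypothetical helper: walk one tree with the
 * C library fts routines and count the regular files found.
 * Returns -1 on error.
 */
static long
count_regular_files(const char *root)
{
	char *paths[2];
	FTS *tree;
	FTSENT *ent;
	long count = 0;

	assert(root != NULL);

	paths[0] = (char *) root;	/* fts_open() wants char * const * */
	paths[1] = NULL;		/* the list is NULL-terminated */

	tree = fts_open(paths, FTS_PHYSICAL, NULL);
	if (tree == NULL) {
		perror("fts_open");
		return -1;
	}

	/* fts_read() returns one entry at a time, as a stream */
	while ((ent = fts_read(tree)) != NULL) {
		if (ent->fts_info == FTS_F)
			count++;
	}

	if (fts_close(tree) < 0)
		return -1;
	return count;
}
```

 The stream of `fts_read()' results seen here is what the `awk'-level
`fts()' function replaces with a single nested array.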
+
+
+File: gawk.info, Node: Extension Sample Fnmatch, Next: Extension Sample Fork, Prev: Extension Sample File Functions, Up: Extension Samples
+
+16.6.2 Interface To `fnmatch()'
+-------------------------------
+
+This extension provides an interface to the C library `fnmatch()'
+function. The usage is:
+
+ @load "fnmatch"
+
+ result = fnmatch(pattern, string, flags)
+
+ The `fnmatch' extension adds a single function named `fnmatch()',
+one constant (`FNM_NOMATCH'), and an array of flag values named `FNM'.
+
+ The arguments to `fnmatch()' are:
+
+`pattern'
+ The filename wildcard to match.
+
+`string'
+ The filename string.
+
+`flags'
+ Either zero, or the bitwise OR of one or more of the flags in the
+ `FNM' array.
+
+ The return value is zero on success, `FNM_NOMATCH' if the string did
+not match the pattern, or a different non-zero value if an error
+occurred.
+
+ The flags are as follows:
+
+`FNM["CASEFOLD"]' Corresponds to the `FNM_CASEFOLD' flag as defined in
+ `fnmatch()'.
+`FNM["FILE_NAME"]' Corresponds to the `FNM_FILE_NAME' flag as defined
+ in `fnmatch()'.
+`FNM["LEADING_DIR"]' Corresponds to the `FNM_LEADING_DIR' flag as defined
+ in `fnmatch()'.
+`FNM["NOESCAPE"]' Corresponds to the `FNM_NOESCAPE' flag as defined in
+ `fnmatch()'.
+`FNM["PATHNAME"]' Corresponds to the `FNM_PATHNAME' flag as defined in
+ `fnmatch()'.
+`FNM["PERIOD"]' Corresponds to the `FNM_PERIOD' flag as defined in
+ `fnmatch()'.
+
+ Here is an example:
+
+ @load "fnmatch"
+ ...
+ flags = or(FNM["PERIOD"], FNM["NOESCAPE"])
+ if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH)
+ print "no match"
+
+
+File: gawk.info, Node: Extension Sample Fork, Next: Extension Sample Ord, Prev: Extension Sample Fnmatch, Up: Extension Samples
+
+16.6.3 Interface To `fork()', `wait()' and `waitpid()'
+------------------------------------------------------
+
+The `fork' extension adds three functions, as follows.
+
+`@load "fork"'
+ This is how you load the extension.
+
+`pid = fork()'
+ This function creates a new process. The return value is zero
+ in the child and the process-id number of the child in the parent,
+ or -1 upon error. In the latter case, `ERRNO' indicates the
+ problem. In the child, `PROCINFO["pid"]' and `PROCINFO["ppid"]'
+ are updated to reflect the correct values.
+
+`ret = waitpid(pid)'
+ This function takes a numeric argument, which is the process-id to
+ wait for. The return value is that of the `waitpid()' system call.
+
+`ret = wait()'
+ This function waits for the first child to die. The return value
+ is that of the `wait()' system call.
+
+ There is no corresponding `exec()' function.
+
+ Here is an example:
+
+ @load "fork"
+ ...
+ if ((pid = fork()) == 0)
+ print "hello from the child"
+ else
+ print "hello from the parent"
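
 The return value conventions are those of the underlying system
calls. A minimal C sketch of the same fork-then-wait pattern, assuming
a POSIX system, is shown below; `child_exit_status()' is a hypothetical
helper, not part of the extension.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * child_exit_status --- hypothetical helper: fork a child that exits
 * with `code', wait for it, and return its exit status (-1 on error).
 */
static int
child_exit_status(int code)
{
	pid_t pid;
	int status;

	assert(code >= 0 && code <= 255);

	pid = fork();
	if (pid < 0)
		return -1;	/* fork failed; errno says why */
	if (pid == 0)
		_exit(code);	/* child: fork() returned zero */

	/* parent: fork() returned the child's process ID */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```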
+
+
+File: gawk.info, Node: Extension Sample Ord, Next: Extension Sample Readdir, Prev: Extension Sample Fork, Up: Extension Samples
+
+16.6.4 Character and Numeric values: `ord()' and `chr()'
+--------------------------------------------------------
+
+The `ordchr' extension adds two functions, named `ord()' and `chr()',
+as follows.
+
+`number = ord(string)'
+ Return the numeric value of the first character in `string'.
+
+`char = chr(number)'
+ Return the string whose first character is that represented by
+ `number'.
+
+ These functions are inspired by the Pascal language functions of the
+same name. Here is an example:
+
+ @load "ordchr"
+ ...
+ printf("The numeric value of 'A' is %d\n", ord("A"))
+ printf("The string value of 65 is %s\n", chr(65))
+
+
+File: gawk.info, Node: Extension Sample Readdir, Next: Extension Sample Revout, Prev: Extension Sample Ord, Up: Extension Samples
+
+16.6.5 Reading Directories
+--------------------------
+
+The `readdir' extension adds an input parser for directories, and adds
+a single function named `readdir_do_ftype()'. The usage is as follows:
+
+ @load "readdir"
+
+ readdir_do_ftype("stat") # or "dirent" or "never"
+
+ When this extension is in use, instead of skipping directories named
+on the command line (or with `getline'), they are read, with each entry
+returned as a record.
+
+ The record consists of at least two fields: the inode number and the
+filename, separated by a forward slash character. On systems where the
+directory entry contains the file type, the record has a third field
+which is a single letter indicating the type of the file:
+
+Letter File Type
+--------------------------------------------------------------------------
+`b' Block device
+`c' Character device
+`d' Directory
+`f' Regular file
+`l' Symbolic link
+`p' Named pipe (FIFO)
+`s' Socket
+`u' Anything else (unknown)
+
+ On systems without the file type information, calling
+`readdir_do_ftype("stat")' causes the extension to use the `lstat()'
+system call to retrieve the appropriate information. This is not the
+default, since `lstat()' is a potentially expensive operation. By
+calling `readdir_do_ftype("never")' one can ensure that the file type
+information is never displayed, even when readily available in the
+directory entry.
+
+ The third option, `readdir_do_ftype("dirent")', takes file type
+information from the directory entry, if it is available. This is the
+default on systems that supply this information.
+
+ The `readdir_do_ftype()' function sets `ERRNO' if called without
+arguments or with invalid arguments.
+
+ NOTE: On GNU/Linux systems, there are filesystems that don't
+ support the `d_type' entry (see the readdir(3) manual page), and
+ so the file type is always `u'. Therefore, using
+ `readdir_do_ftype("stat")' is advisable even on GNU/Linux systems.
+ In this case, the `readdir' extension falls back to using
+ `lstat()' when it encounters an unknown file type.
+
+ Here is an example:
+
+ @load "readdir"
+ ...
+ BEGIN { FS = "/" }
+ { print "file name is", $2 }
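+
+   Since the third field is optional, a program should check `NF'
+before using it.  The following sketch assumes it is run as `gawk -f
+prog.awk DIRECTORY' (`prog.awk' is a hypothetical file name):
+
+     @load "readdir"
+
+     BEGIN { FS = "/" }
+     {
+         type = (NF >= 3 ? $3 : "u")   # treat a missing type as unknown
+         printf("%s (inode %s, type %s)\n", $2, $1, type)
+     }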
+
+
+File: gawk.info, Node: Extension Sample Revout, Next: Extension Sample Rev2way, Prev: Extension Sample Readdir, Up: Extension Samples
+
+16.6.6 Reversing Output
+-----------------------
+
+The `revoutput' extension adds a simple output wrapper that reverses
+the characters in each output line. Its main purpose is to show how to
+write an output wrapper, although it may be mildly amusing for the
+unwary. Here is an example:
+
+ @load "revoutput"
+
+ BEGIN {
+ REVOUT = 1
+ print "hello, world" > "/dev/stdout"
+ }
+
+ The output from this program is: `dlrow ,olleh'.
+
+
+File: gawk.info, Node: Extension Sample Rev2way, Next: Extension Sample Read write array, Prev: Extension Sample Revout, Up: Extension Samples
+
+16.6.7 Two-Way I/O Example
+--------------------------
+
+The `revtwoway' extension adds a simple two-way processor that reverses
+the characters in each line sent to it for reading back by the `awk'
+program. Its main purpose is to show how to write a two-way
+processor, although it may also be mildly amusing. The following
+example shows how to use it:
+
+ @load "revtwoway"
+
+ BEGIN {
+ cmd = "/magic/mirror"
+ print "hello, world" |& cmd
+ cmd |& getline result
+ print result
+ close(cmd)
+ }
+
+
+File: gawk.info, Node: Extension Sample Read write array, Next: Extension Sample Readfile, Prev: Extension Sample Rev2way, Up: Extension Samples
+
+16.6.8 Dumping and Restoring An Array
+-------------------------------------
+
+The `rwarray' extension adds two functions, named `writea()' and
+`reada()', as follows:
+
+`ret = writea(file, array)'
+ This function takes a string argument, which is the name of the
+ file to which to dump the array, and the array itself as the second
+ argument. `writea()' understands multidimensional arrays. It
+ returns one on success, or zero upon failure.
+
+`ret = reada(file, array)'
+ `reada()' is the inverse of `writea()'; it reads the file named as
+ its first argument, filling in the array named as the second
+ argument. It clears the array first. Here too, the return value
+ is one on success and zero upon failure.
+
+ The array created by `reada()' is identical to that written by
+`writea()' in the sense that the contents are the same. However, due to
+implementation issues, the array traversal order of the recreated array
+is likely to be different from that of the original array. As array
+traversal order in `awk' is by default undefined, this is not
+(technically) a problem. If you need to guarantee a particular
+traversal order, use the array sorting features in `gawk' to do so
+(*note Array Sorting::).
+
+ The file contains binary data. All integral values are written in
+network byte order. However, double precision floating-point values
+are written as native binary data. Thus, arrays containing only string
+data can theoretically be dumped on systems with one byte order and
+restored on systems with a different one, but this has not been tried.
+
+ Here is an example:
+
+ @load "rwarray"
+ ...
+ ret = writea("arraydump.bin", array)
+ ...
+ ret = reada("arraydump.bin", array)
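+
+   Checking the return values makes failures visible.  A minimal
+round-trip sketch (the array names `data' and `copy' are only
+illustrative):
+
+     @load "rwarray"
+
+     BEGIN {
+         data["one"] = 1
+         data["two"] = "deux"
+         if (! writea("arraydump.bin", data)) {
+             print "writea() failed" > "/dev/stderr"
+             exit 1
+         }
+         split("", copy)          # make sure `copy' is an array
+         if (! reada("arraydump.bin", copy)) {
+             print "reada() failed" > "/dev/stderr"
+             exit 1
+         }
+         for (key in copy)        # traversal order may differ from `data'
+             print key, copy[key]
+     }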
+
+
+File: gawk.info, Node: Extension Sample Readfile, Next: Extension Sample API Tests, Prev: Extension Sample Read write array, Up: Extension Samples
+
+16.6.9 Reading An Entire File
+-----------------------------
+
+The `readfile' extension adds a single function named `readfile()':
+
+`result = readfile("/some/path")'
+ The argument is the name of the file to read. The return value is
+ a string containing the entire contents of the requested file.
+ Upon error, the function returns the empty string and sets `ERRNO'.
+
+ Here is an example:
+
+ @load "readfile"
+ ...
+ contents = readfile("/path/to/file");
+ if (contents == "" && ERRNO != "") {
+ print("problem reading file", ERRNO) > "/dev/stderr"
+ ...
+ }
+
+
+File: gawk.info, Node: Extension Sample API Tests, Next: Extension Sample Time, Prev: Extension Sample Readfile, Up: Extension Samples
+
+16.6.10 API Tests
+-----------------
+
+The `testext' extension exercises parts of the extension API that are
+not tested by the other samples. The `extension/testext.c' file
+contains both the C code for the extension and `awk' test code inside C
+comments that run the tests. The testing framework extracts the `awk'
+code and runs the tests. See the source file for more information.
+
+
+File: gawk.info, Node: Extension Sample Time, Prev: Extension Sample API Tests, Up: Extension Samples
+
+16.6.11 Extension Time Functions
+--------------------------------
+
+These functions can be used by either invoking `gawk' with a
+command-line argument of `-l time' or by inserting `@load "time"' in
+your script.
+
+`the_time = gettimeofday()'
+ Return the time in seconds that has elapsed since 1970-01-01 UTC
+ as a floating-point value. If the time is unavailable on this
+ platform, return -1 and set `ERRNO'. The returned time should
+ have sub-second precision, but the actual precision will vary
+ based on the platform. If the standard C `gettimeofday()' system
+ call is available on this platform, then it simply returns the
+ value. Otherwise, if on Windows, it tries to use
+ `GetSystemTimeAsFileTime()'.
+
+`result = sleep(SECONDS)'
+ Attempt to sleep for SECONDS seconds. If SECONDS is negative, or
+ the attempt to sleep fails, return -1 and set `ERRNO'. Otherwise,
+ return zero after sleeping for the indicated amount of time. Note
+ that SECONDS may be a floating-point (non-integral) value.
+ Implementation details: depending on platform availability, this
+ function tries to use `nanosleep()' or `select()' to implement the
+ delay.
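+
+   As an illustrative sketch, the two functions can be combined to
+measure how long a delay actually took (the 0.25-second interval is
+an arbitrary choice):
+
+     @load "time"
+
+     BEGIN {
+         start = gettimeofday()
+         if (sleep(0.25) < 0)
+             print "sleep() failed:", ERRNO > "/dev/stderr"
+         elapsed = gettimeofday() - start
+         printf("slept for about %.3f seconds\n", elapsed)
+     }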
+
+
+File: gawk.info, Node: gawkextlib, Prev: Extension Samples, Up: Dynamic Extensions
+
+16.7 The `gawkextlib' Project
+=============================
+
+The `gawkextlib' (http://sourceforge.net/projects/gawkextlib/) project
+provides a number of `gawk' extensions, including one for processing
+XML files. This is the evolution of the original `xgawk' (XML `gawk')
+project.
+
+ As of this writing, there are four extensions:
+
+ * XML parser extension, using the Expat
+ (http://expat.sourceforge.net) XML parsing library.
+
+ * Postgres SQL extension.
+
+ * GD graphics library extension.
+
+ * MPFR library extension. This provides access to a number of MPFR
+ functions which `gawk''s native MPFR support does not.
+
+ The `time' extension described earlier (*note Extension Sample
+Time::) was originally from this project but has been moved into the
+main `gawk' distribution.
+
+ You can check out the code for the `gawkextlib' project using the
+GIT (http://git-scm.com) distributed source code control system. The
+command is as follows:
+
+ git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code
+
+ You will need to have the Expat (http://expat.sourceforge.net) XML
+parser library installed in order to build and use the XML extension.
+
+ In addition, you must have the GNU Autotools installed (Autoconf
+(http://www.gnu.org/software/autoconf), Automake
+(http://www.gnu.org/software/automake), Libtool
+(http://www.gnu.org/software/libtool), and Gettext
+(http://www.gnu.org/software/gettext)).
+
+ The simple recipe for building and testing `gawkextlib' is as
+follows. First, build and install `gawk':
+
+ cd .../path/to/gawk/code
+ ./configure --prefix=/tmp/newgawk Install in /tmp/newgawk for now
+ make && make check Build and check that all is OK
+ make install Install gawk
+
+ Next, build `gawkextlib' and test it:
+
+ cd .../path/to/gawkextlib-code
+ ./update-autotools Generate configure, etc.
+ You may have to run this command twice
+ ./configure --with-gawk=/tmp/newgawk Configure, point at "installed" gawk
+ make && make check Build and check that all is OK
+
+ If you write an extension that you wish to share with other `gawk'
+users, please consider doing so through the `gawkextlib' project.
+

File: gawk.info, Node: Language History, Next: Installation, Prev: Dynamic Extensions, Up: Top
@@ -26041,7 +29190,6 @@ Index
* Ada programming language: Glossary. (line 20)
* adding, features to gawk: Adding Code. (line 6)
* adding, fields: Changing Fields. (line 53)
-* adding, functions to gawk: Dynamic Extensions. (line 9)
* advanced features, buffering: I/O Functions. (line 98)
* advanced features, close() function: Close Files And Pipes.
(line 131)
@@ -26399,7 +29547,6 @@ Index
* characters, transliterating: Translate Program. (line 6)
* characters, values of as numbers: Ordinal Functions. (line 6)
* Chassell, Robert J.: Acknowledgments. (line 33)
-* chdir() function, implementing in gawk: Sample Library. (line 6)
* chem utility: Glossary. (line 151)
* chr() user-defined function: Ordinal Functions. (line 16)
* clear debugger command: Breakpoint Control. (line 36)
@@ -26771,7 +29918,6 @@ Index
(line 162)
* differences in awk and gawk, trunc-mod operation: Arithmetic Ops.
(line 66)
-* directories, changing: Sample Library. (line 6)
* directories, command line: Command line directories.
(line 6)
* directories, searching <1>: Igawk Program. (line 368)
@@ -26896,8 +30042,6 @@ Index
(line 9)
* expressions, selecting: Conditional Exp. (line 6)
* Extended Regular Expressions (EREs): Bracket Expressions. (line 24)
-* extension() function (gawk): Using Internal File Ops.
- (line 15)
* extensions, Brian Kernighan's awk <1>: Other Versions. (line 13)
* extensions, Brian Kernighan's awk: BTL. (line 6)
* extensions, common, ** operator: Arithmetic Ops. (line 36)
@@ -26992,7 +30136,6 @@ Index
* files, closing: I/O Functions. (line 10)
* files, descriptors, See file descriptors: Special FD. (line 6)
* files, group: Group Functions. (line 6)
-* files, information about, retrieving: Sample Library. (line 6)
* files, initialization and cleanup: Filetrans Function. (line 6)
* files, input, See input files: Read Terminal. (line 17)
* files, log, timestamps in: Time Functions. (line 6)
@@ -27088,7 +30231,6 @@ Index
(line 47)
* functions, built-in <1>: Functions. (line 6)
* functions, built-in: Function Calls. (line 10)
-* functions, built-in, adding to gawk: Dynamic Extensions. (line 9)
* functions, built-in, evaluation order: Calling Built-in. (line 30)
* functions, defining: Definition Syntax. (line 6)
* functions, library: Library Functions. (line 6)
@@ -27170,7 +30312,6 @@ Index
(line 26)
* gawk, FUNCTAB array in: Auto-set. (line 119)
* gawk, function arguments and: Calling Built-in. (line 16)
-* gawk, functions, adding: Dynamic Extensions. (line 9)
* gawk, hexadecimal numbers and: Nondecimal-numbers. (line 42)
* gawk, IGNORECASE variable in <1>: Array Sorting Functions.
(line 81)
@@ -27268,6 +30409,8 @@ Index
* gettext library: Explaining gettext. (line 6)
* gettext library, locale categories: Explaining gettext. (line 80)
* gettext() function (C library): Explaining gettext. (line 62)
+* gettimeofday time extension function: Extension Sample Time.
+ (line 10)
* GMP: Arbitrary Precision Arithmetic.
(line 6)
* GNITS mailing list: Acknowledgments. (line 52)
@@ -27922,7 +31065,7 @@ Index
(line 10)
* programming conventions, functions, writing: Definition Syntax.
(line 55)
-* programming conventions, gawk internals: Internal File Ops. (line 33)
+* programming conventions, gawk internals: Internal File Ops. (line 45)
* programming conventions, private variable names: Library Names.
(line 23)
* programming language, recipe for: History. (line 6)
@@ -28172,6 +31315,10 @@ Index
* single-character fields: Single Character Fields.
(line 6)
* Skywalker, Luke: Undocumented. (line 6)
+* sleep: Extension Sample Time.
+ (line 6)
+* sleep time extension function: Extension Sample Time.
+ (line 20)
* sleep utility: Alarm Program. (line 109)
* Solaris, POSIX-compliant awk: Other Versions. (line 87)
* sort function, arrays, sorting: Array Sorting Functions.
@@ -28216,7 +31363,6 @@ Index
* standard input <1>: Special FD. (line 6)
* standard input: Read Terminal. (line 6)
* standard output: Special FD. (line 6)
-* stat() function, implementing in gawk: Sample Library. (line 6)
* statements, compound, control statements and: Statements. (line 10)
* statements, control, in actions: Statements. (line 6)
* statements, multiple: Statements/Lines. (line 91)
@@ -28308,6 +31454,8 @@ Index
* tilde (~), ~ operator <5>: Computed Regexps. (line 6)
* tilde (~), ~ operator <6>: Case-sensitivity. (line 26)
* tilde (~), ~ operator: Regexp Usage. (line 19)
+* time: Extension Sample Time.
+ (line 6)
* time, alarm clock example program: Alarm Program. (line 9)
* time, localization and: Explaining gettext. (line 115)
* time, managing: Getlocaltime Function.
@@ -28517,452 +31665,515 @@ Index

Tag Table:
Node: Top1352
-Node: Foreword31870
-Node: Preface36215
-Ref: Preface-Footnote-139268
-Ref: Preface-Footnote-239374
-Node: History39606
-Node: Names41997
-Ref: Names-Footnote-143474
-Node: This Manual43546
-Ref: This Manual-Footnote-148674
-Node: Conventions48774
-Node: Manual History50908
-Ref: Manual History-Footnote-154178
-Ref: Manual History-Footnote-254219
-Node: How To Contribute54293
-Node: Acknowledgments55437
-Node: Getting Started59933
-Node: Running gawk62312
-Node: One-shot63498
-Node: Read Terminal64723
-Ref: Read Terminal-Footnote-166373
-Ref: Read Terminal-Footnote-266649
-Node: Long66820
-Node: Executable Scripts68196
-Ref: Executable Scripts-Footnote-170065
-Ref: Executable Scripts-Footnote-270167
-Node: Comments70714
-Node: Quoting73181
-Node: DOS Quoting77804
-Node: Sample Data Files78479
-Node: Very Simple81511
-Node: Two Rules86110
-Node: More Complex88257
-Ref: More Complex-Footnote-191187
-Node: Statements/Lines91272
-Ref: Statements/Lines-Footnote-195734
-Node: Other Features95999
-Node: When96927
-Node: Invoking Gawk99074
-Node: Command Line100535
-Node: Options101318
-Ref: Options-Footnote-1116716
-Node: Other Arguments116741
-Node: Naming Standard Input119399
-Node: Environment Variables120493
-Node: AWKPATH Variable121051
-Ref: AWKPATH Variable-Footnote-1123809
-Node: AWKLIBPATH Variable124069
-Node: Other Environment Variables124666
-Node: Exit Status127161
-Node: Include Files127836
-Node: Loading Shared Libraries131405
-Node: Obsolete132630
-Node: Undocumented133327
-Node: Regexp133570
-Node: Regexp Usage134959
-Node: Escape Sequences136985
-Node: Regexp Operators142748
-Ref: Regexp Operators-Footnote-1150128
-Ref: Regexp Operators-Footnote-2150275
-Node: Bracket Expressions150373
-Ref: table-char-classes152263
-Node: GNU Regexp Operators154786
-Node: Case-sensitivity158509
-Ref: Case-sensitivity-Footnote-1161477
-Ref: Case-sensitivity-Footnote-2161712
-Node: Leftmost Longest161820
-Node: Computed Regexps163021
-Node: Reading Files166431
-Node: Records168434
-Ref: Records-Footnote-1177358
-Node: Fields177395
-Ref: Fields-Footnote-1180428
-Node: Nonconstant Fields180514
-Node: Changing Fields182716
-Node: Field Separators188697
-Node: Default Field Splitting191326
-Node: Regexp Field Splitting192443
-Node: Single Character Fields195785
-Node: Command Line Field Separator196844
-Node: Field Splitting Summary200285
-Ref: Field Splitting Summary-Footnote-1203477
-Node: Constant Size203578
-Node: Splitting By Content208162
-Ref: Splitting By Content-Footnote-1211888
-Node: Multiple Line211928
-Ref: Multiple Line-Footnote-1217775
-Node: Getline217954
-Node: Plain Getline220170
-Node: Getline/Variable222259
-Node: Getline/File223400
-Node: Getline/Variable/File224722
-Ref: Getline/Variable/File-Footnote-1226321
-Node: Getline/Pipe226408
-Node: Getline/Variable/Pipe228968
-Node: Getline/Coprocess230075
-Node: Getline/Variable/Coprocess231318
-Node: Getline Notes232032
-Node: Getline Summary234819
-Ref: table-getline-variants235227
-Node: Read Timeout236083
-Ref: Read Timeout-Footnote-1239828
-Node: Command line directories239885
-Node: Printing240515
-Node: Print242146
-Node: Print Examples243483
-Node: Output Separators246267
-Node: OFMT248027
-Node: Printf249385
-Node: Basic Printf250291
-Node: Control Letters251830
-Node: Format Modifiers255642
-Node: Printf Examples261651
-Node: Redirection264366
-Node: Special Files271350
-Node: Special FD271883
-Ref: Special FD-Footnote-1275508
-Node: Special Network275582
-Node: Special Caveats276432
-Node: Close Files And Pipes277228
-Ref: Close Files And Pipes-Footnote-1284251
-Ref: Close Files And Pipes-Footnote-2284399
-Node: Expressions284549
-Node: Values285681
-Node: Constants286357
-Node: Scalar Constants287037
-Ref: Scalar Constants-Footnote-1287896
-Node: Nondecimal-numbers288078
-Node: Regexp Constants291137
-Node: Using Constant Regexps291612
-Node: Variables294667
-Node: Using Variables295322
-Node: Assignment Options297046
-Node: Conversion298918
-Ref: table-locale-affects304294
-Ref: Conversion-Footnote-1304918
-Node: All Operators305027
-Node: Arithmetic Ops305657
-Node: Concatenation308162
-Ref: Concatenation-Footnote-1310955
-Node: Assignment Ops311075
-Ref: table-assign-ops316063
-Node: Increment Ops317471
-Node: Truth Values and Conditions320941
-Node: Truth Values322024
-Node: Typing and Comparison323073
-Node: Variable Typing323862
-Ref: Variable Typing-Footnote-1327759
-Node: Comparison Operators327881
-Ref: table-relational-ops328291
-Node: POSIX String Comparison331840
-Ref: POSIX String Comparison-Footnote-1332796
-Node: Boolean Ops332934
-Ref: Boolean Ops-Footnote-1337012
-Node: Conditional Exp337103
-Node: Function Calls338835
-Node: Precedence342429
-Node: Locales346098
-Node: Patterns and Actions347187
-Node: Pattern Overview348241
-Node: Regexp Patterns349910
-Node: Expression Patterns350453
-Node: Ranges354138
-Node: BEGIN/END357104
-Node: Using BEGIN/END357866
-Ref: Using BEGIN/END-Footnote-1360597
-Node: I/O And BEGIN/END360703
-Node: BEGINFILE/ENDFILE362985
-Node: Empty365889
-Node: Using Shell Variables366205
-Node: Action Overview368490
-Node: Statements370847
-Node: If Statement372701
-Node: While Statement374200
-Node: Do Statement376244
-Node: For Statement377400
-Node: Switch Statement380552
-Node: Break Statement382649
-Node: Continue Statement384639
-Node: Next Statement386432
-Node: Nextfile Statement388822
-Node: Exit Statement391463
-Node: Built-in Variables393879
-Node: User-modified394974
-Ref: User-modified-Footnote-1403329
-Node: Auto-set403391
-Ref: Auto-set-Footnote-1415742
-Ref: Auto-set-Footnote-2415947
-Node: ARGC and ARGV416003
-Node: Arrays419854
-Node: Array Basics421359
-Node: Array Intro422185
-Node: Reference to Elements426503
-Node: Assigning Elements428773
-Node: Array Example429264
-Node: Scanning an Array430996
-Node: Controlling Scanning433310
-Ref: Controlling Scanning-Footnote-1438243
-Node: Delete438559
-Ref: Delete-Footnote-1441324
-Node: Numeric Array Subscripts441381
-Node: Uninitialized Subscripts443564
-Node: Multi-dimensional445192
-Node: Multi-scanning448286
-Node: Arrays of Arrays449877
-Node: Functions454522
-Node: Built-in455344
-Node: Calling Built-in456422
-Node: Numeric Functions458410
-Ref: Numeric Functions-Footnote-1462242
-Ref: Numeric Functions-Footnote-2462599
-Ref: Numeric Functions-Footnote-3462647
-Node: String Functions462916
-Ref: String Functions-Footnote-1486413
-Ref: String Functions-Footnote-2486542
-Ref: String Functions-Footnote-3486790
-Node: Gory Details486877
-Ref: table-sub-escapes488556
-Ref: table-sub-posix-92489910
-Ref: table-sub-proposed491253
-Ref: table-posix-sub492603
-Ref: table-gensub-escapes494149
-Ref: Gory Details-Footnote-1495356
-Ref: Gory Details-Footnote-2495407
-Node: I/O Functions495558
-Ref: I/O Functions-Footnote-1502213
-Node: Time Functions502360
-Ref: Time Functions-Footnote-1513252
-Ref: Time Functions-Footnote-2513320
-Ref: Time Functions-Footnote-3513478
-Ref: Time Functions-Footnote-4513589
-Ref: Time Functions-Footnote-5513701
-Ref: Time Functions-Footnote-6513928
-Node: Bitwise Functions514194
-Ref: table-bitwise-ops514752
-Ref: Bitwise Functions-Footnote-1518973
-Node: Type Functions519157
-Node: I18N Functions519627
-Node: User-defined521254
-Node: Definition Syntax522058
-Ref: Definition Syntax-Footnote-1526968
-Node: Function Example527037
-Node: Function Caveats529631
-Node: Calling A Function530052
-Node: Variable Scope531167
-Node: Pass By Value/Reference533142
-Node: Return Statement536582
-Node: Dynamic Typing539563
-Node: Indirect Calls540298
-Node: Internationalization549983
-Node: I18N and L10N551409
-Node: Explaining gettext552095
-Ref: Explaining gettext-Footnote-1557161
-Ref: Explaining gettext-Footnote-2557345
-Node: Programmer i18n557510
-Node: Translator i18n561710
-Node: String Extraction562503
-Ref: String Extraction-Footnote-1563464
-Node: Printf Ordering563550
-Ref: Printf Ordering-Footnote-1566334
-Node: I18N Portability566398
-Ref: I18N Portability-Footnote-1568847
-Node: I18N Example568910
-Ref: I18N Example-Footnote-1571545
-Node: Gawk I18N571617
-Node: Advanced Features572234
-Node: Nondecimal Data573747
-Node: Array Sorting575330
-Node: Controlling Array Traversal576027
-Node: Array Sorting Functions584265
-Ref: Array Sorting Functions-Footnote-1587939
-Ref: Array Sorting Functions-Footnote-2588032
-Node: Two-way I/O588226
-Ref: Two-way I/O-Footnote-1593658
-Node: TCP/IP Networking593728
-Node: Profiling596572
-Node: Library Functions604026
-Ref: Library Functions-Footnote-1607033
-Node: Library Names607204
-Ref: Library Names-Footnote-1610675
-Ref: Library Names-Footnote-2610895
-Node: General Functions610981
-Node: Strtonum Function611934
-Node: Assert Function614864
-Node: Round Function618190
-Node: Cliff Random Function619733
-Node: Ordinal Functions620749
-Ref: Ordinal Functions-Footnote-1623819
-Ref: Ordinal Functions-Footnote-2624071
-Node: Join Function624280
-Ref: Join Function-Footnote-1626051
-Node: Getlocaltime Function626251
-Node: Data File Management629966
-Node: Filetrans Function630598
-Node: Rewind Function634737
-Node: File Checking636124
-Node: Empty Files637218
-Node: Ignoring Assigns639448
-Node: Getopt Function641001
-Ref: Getopt Function-Footnote-1652305
-Node: Passwd Functions652508
-Ref: Passwd Functions-Footnote-1661483
-Node: Group Functions661571
-Node: Walking Arrays669655
-Node: Sample Programs671224
-Node: Running Examples671889
-Node: Clones672617
-Node: Cut Program673841
-Node: Egrep Program683686
-Ref: Egrep Program-Footnote-1691459
-Node: Id Program691569
-Node: Split Program695185
-Ref: Split Program-Footnote-1698704
-Node: Tee Program698832
-Node: Uniq Program701635
-Node: Wc Program709064
-Ref: Wc Program-Footnote-1713330
-Ref: Wc Program-Footnote-2713530
-Node: Miscellaneous Programs713622
-Node: Dupword Program714810
-Node: Alarm Program716841
-Node: Translate Program721590
-Ref: Translate Program-Footnote-1725977
-Ref: Translate Program-Footnote-2726205
-Node: Labels Program726339
-Ref: Labels Program-Footnote-1729710
-Node: Word Sorting729794
-Node: History Sorting733678
-Node: Extract Program735517
-Ref: Extract Program-Footnote-1743000
-Node: Simple Sed743128
-Node: Igawk Program746190
-Ref: Igawk Program-Footnote-1761347
-Ref: Igawk Program-Footnote-2761548
-Node: Anagram Program761686
-Node: Signature Program764754
-Node: Debugger765854
-Node: Debugging766820
-Node: Debugging Concepts767253
-Node: Debugging Terms769109
-Node: Awk Debugging771706
-Node: Sample Debugging Session772598
-Node: Debugger Invocation773118
-Node: Finding The Bug774447
-Node: List of Debugger Commands780935
-Node: Breakpoint Control782269
-Node: Debugger Execution Control785933
-Node: Viewing And Changing Data789293
-Node: Execution Stack792649
-Node: Debugger Info794116
-Node: Miscellaneous Debugger Commands798097
-Node: Readline Support803542
-Node: Limitations804373
-Node: Arbitrary Precision Arithmetic806625
-Ref: Arbitrary Precision Arithmetic-Footnote-1808267
-Node: General Arithmetic808415
-Node: Floating Point Issues810135
-Node: String Conversion Precision811016
-Ref: String Conversion Precision-Footnote-1812722
-Node: Unexpected Results812831
-Node: POSIX Floating Point Problems814984
-Ref: POSIX Floating Point Problems-Footnote-1818809
-Node: Integer Programming818847
-Node: Floating-point Programming820600
-Ref: Floating-point Programming-Footnote-1826909
-Node: Floating-point Representation827173
-Node: Floating-point Context828338
-Ref: table-ieee-formats829180
-Node: Rounding Mode830564
-Ref: table-rounding-modes831043
-Ref: Rounding Mode-Footnote-1834047
-Node: Gawk and MPFR834228
-Node: Arbitrary Precision Floats835470
-Ref: Arbitrary Precision Floats-Footnote-1837899
-Node: Setting Precision838210
-Node: Setting Rounding Mode840943
-Ref: table-gawk-rounding-modes841347
-Node: Floating-point Constants842527
-Node: Changing Precision843951
-Ref: Changing Precision-Footnote-1845351
-Node: Exact Arithmetic845525
-Node: Arbitrary Precision Integers848633
-Ref: Arbitrary Precision Integers-Footnote-1851633
-Node: Dynamic Extensions851780
-Node: Plugin License852698
-Node: Sample Library853312
-Node: Internal File Description853996
-Node: Internal File Ops857709
-Ref: Internal File Ops-Footnote-1862272
-Node: Using Internal File Ops862412
-Node: Language History864788
-Node: V7/SVR3.1866310
-Node: SVR4868631
-Node: POSIX870073
-Node: BTL871081
-Node: POSIX/GNU871815
-Node: Common Extensions877350
-Node: Ranges and Locales878457
-Ref: Ranges and Locales-Footnote-1883075
-Ref: Ranges and Locales-Footnote-2883102
-Ref: Ranges and Locales-Footnote-3883362
-Node: Contributors883583
-Node: Installation887879
-Node: Gawk Distribution888773
-Node: Getting889257
-Node: Extracting890083
-Node: Distribution contents891775
-Node: Unix Installation896997
-Node: Quick Installation897614
-Node: Additional Configuration Options899576
-Node: Configuration Philosophy901053
-Node: Non-Unix Installation903395
-Node: PC Installation903853
-Node: PC Binary Installation905152
-Node: PC Compiling907000
-Node: PC Testing909944
-Node: PC Using911120
-Node: Cygwin915305
-Node: MSYS916305
-Node: VMS Installation916819
-Node: VMS Compilation917422
-Ref: VMS Compilation-Footnote-1918429
-Node: VMS Installation Details918487
-Node: VMS Running920122
-Node: VMS Old Gawk921729
-Node: Bugs922203
-Node: Other Versions926055
-Node: Notes931370
-Node: Compatibility Mode931957
-Node: Additions932740
-Node: Accessing The Source933667
-Node: Adding Code935093
-Node: New Ports941135
-Node: Derived Files945270
-Ref: Derived Files-Footnote-1950575
-Ref: Derived Files-Footnote-2950609
-Ref: Derived Files-Footnote-3951209
-Node: Future Extensions951307
-Node: Basic Concepts952794
-Node: Basic High Level953475
-Ref: figure-general-flow953746
-Ref: figure-process-flow954345
-Ref: Basic High Level-Footnote-1957574
-Node: Basic Data Typing957759
-Node: Glossary961114
-Node: Copying986425
-Node: GNU Free Documentation License1023982
-Node: Index1049119
+Node: Foreword40050
+Node: Preface44395
+Ref: Preface-Footnote-147448
+Ref: Preface-Footnote-247554
+Node: History47786
+Node: Names50177
+Ref: Names-Footnote-151654
+Node: This Manual51726
+Ref: This Manual-Footnote-156854
+Node: Conventions56954
+Node: Manual History59088
+Ref: Manual History-Footnote-162358
+Ref: Manual History-Footnote-262399
+Node: How To Contribute62473
+Node: Acknowledgments63617
+Node: Getting Started68113
+Node: Running gawk70492
+Node: One-shot71678
+Node: Read Terminal72903
+Ref: Read Terminal-Footnote-174553
+Ref: Read Terminal-Footnote-274829
+Node: Long75000
+Node: Executable Scripts76376
+Ref: Executable Scripts-Footnote-178245
+Ref: Executable Scripts-Footnote-278347
+Node: Comments78894
+Node: Quoting81361
+Node: DOS Quoting85984
+Node: Sample Data Files86659
+Node: Very Simple89691
+Node: Two Rules94290
+Node: More Complex96437
+Ref: More Complex-Footnote-199367
+Node: Statements/Lines99452
+Ref: Statements/Lines-Footnote-1103914
+Node: Other Features104179
+Node: When105107
+Node: Invoking Gawk107254
+Node: Command Line108715
+Node: Options109498
+Ref: Options-Footnote-1124896
+Node: Other Arguments124921
+Node: Naming Standard Input127579
+Node: Environment Variables128673
+Node: AWKPATH Variable129231
+Ref: AWKPATH Variable-Footnote-1131989
+Node: AWKLIBPATH Variable132249
+Node: Other Environment Variables132846
+Node: Exit Status135341
+Node: Include Files136016
+Node: Loading Shared Libraries139585
+Node: Obsolete140810
+Node: Undocumented141507
+Node: Regexp141750
+Node: Regexp Usage143139
+Node: Escape Sequences145165
+Node: Regexp Operators150928
+Ref: Regexp Operators-Footnote-1158308
+Ref: Regexp Operators-Footnote-2158455
+Node: Bracket Expressions158553
+Ref: table-char-classes160443
+Node: GNU Regexp Operators162966
+Node: Case-sensitivity166689
+Ref: Case-sensitivity-Footnote-1169657
+Ref: Case-sensitivity-Footnote-2169892
+Node: Leftmost Longest170000
+Node: Computed Regexps171201
+Node: Reading Files174611
+Node: Records176614
+Ref: Records-Footnote-1185538
+Node: Fields185575
+Ref: Fields-Footnote-1188608
+Node: Nonconstant Fields188694
+Node: Changing Fields190896
+Node: Field Separators196877
+Node: Default Field Splitting199506
+Node: Regexp Field Splitting200623
+Node: Single Character Fields203965
+Node: Command Line Field Separator205024
+Node: Field Splitting Summary208465
+Ref: Field Splitting Summary-Footnote-1211657
+Node: Constant Size211758
+Node: Splitting By Content216342
+Ref: Splitting By Content-Footnote-1220068
+Node: Multiple Line220108
+Ref: Multiple Line-Footnote-1225955
+Node: Getline226134
+Node: Plain Getline228350
+Node: Getline/Variable230439
+Node: Getline/File231580
+Node: Getline/Variable/File232902
+Ref: Getline/Variable/File-Footnote-1234501
+Node: Getline/Pipe234588
+Node: Getline/Variable/Pipe237148
+Node: Getline/Coprocess238255
+Node: Getline/Variable/Coprocess239498
+Node: Getline Notes240212
+Node: Getline Summary242999
+Ref: table-getline-variants243407
+Node: Read Timeout244263
+Ref: Read Timeout-Footnote-1248008
+Node: Command line directories248065
+Node: Printing248695
+Node: Print250326
+Node: Print Examples251663
+Node: Output Separators254447
+Node: OFMT256207
+Node: Printf257565
+Node: Basic Printf258471
+Node: Control Letters260010
+Node: Format Modifiers263822
+Node: Printf Examples269831
+Node: Redirection272546
+Node: Special Files279530
+Node: Special FD280063
+Ref: Special FD-Footnote-1283688
+Node: Special Network283762
+Node: Special Caveats284612
+Node: Close Files And Pipes285408
+Ref: Close Files And Pipes-Footnote-1292431
+Ref: Close Files And Pipes-Footnote-2292579
+Node: Expressions292729
+Node: Values293861
+Node: Constants294537
+Node: Scalar Constants295217
+Ref: Scalar Constants-Footnote-1296076
+Node: Nondecimal-numbers296258
+Node: Regexp Constants299317
+Node: Using Constant Regexps299792
+Node: Variables302847
+Node: Using Variables303502
+Node: Assignment Options305226
+Node: Conversion307098
+Ref: table-locale-affects312474
+Ref: Conversion-Footnote-1313098
+Node: All Operators313207
+Node: Arithmetic Ops313837
+Node: Concatenation316342
+Ref: Concatenation-Footnote-1319135
+Node: Assignment Ops319255
+Ref: table-assign-ops324243
+Node: Increment Ops325651
+Node: Truth Values and Conditions329121
+Node: Truth Values330204
+Node: Typing and Comparison331253
+Node: Variable Typing332042
+Ref: Variable Typing-Footnote-1335939
+Node: Comparison Operators336061
+Ref: table-relational-ops336471
+Node: POSIX String Comparison340020
+Ref: POSIX String Comparison-Footnote-1340976
+Node: Boolean Ops341114
+Ref: Boolean Ops-Footnote-1345192
+Node: Conditional Exp345283
+Node: Function Calls347015
+Node: Precedence350609
+Node: Locales354278
+Node: Patterns and Actions355367
+Node: Pattern Overview356421
+Node: Regexp Patterns358090
+Node: Expression Patterns358633
+Node: Ranges362318
+Node: BEGIN/END365284
+Node: Using BEGIN/END366046
+Ref: Using BEGIN/END-Footnote-1368777
+Node: I/O And BEGIN/END368883
+Node: BEGINFILE/ENDFILE371165
+Node: Empty374069
+Node: Using Shell Variables374385
+Node: Action Overview376670
+Node: Statements379027
+Node: If Statement380881
+Node: While Statement382380
+Node: Do Statement384424
+Node: For Statement385580
+Node: Switch Statement388732
+Node: Break Statement390829
+Node: Continue Statement392819
+Node: Next Statement394612
+Node: Nextfile Statement397002
+Node: Exit Statement399643
+Node: Built-in Variables402059
+Node: User-modified403154
+Ref: User-modified-Footnote-1411509
+Node: Auto-set411571
+Ref: Auto-set-Footnote-1423922
+Ref: Auto-set-Footnote-2424127
+Node: ARGC and ARGV424183
+Node: Arrays428034
+Node: Array Basics429539
+Node: Array Intro430365
+Node: Reference to Elements434683
+Node: Assigning Elements436953
+Node: Array Example437444
+Node: Scanning an Array439176
+Node: Controlling Scanning441490
+Ref: Controlling Scanning-Footnote-1446423
+Node: Delete446739
+Ref: Delete-Footnote-1449504
+Node: Numeric Array Subscripts449561
+Node: Uninitialized Subscripts451744
+Node: Multi-dimensional453372
+Node: Multi-scanning456466
+Node: Arrays of Arrays458057
+Node: Functions462702
+Node: Built-in463524
+Node: Calling Built-in464602
+Node: Numeric Functions466590
+Ref: Numeric Functions-Footnote-1470422
+Ref: Numeric Functions-Footnote-2470779
+Ref: Numeric Functions-Footnote-3470827
+Node: String Functions471096
+Ref: String Functions-Footnote-1494593
+Ref: String Functions-Footnote-2494722
+Ref: String Functions-Footnote-3494970
+Node: Gory Details495057
+Ref: table-sub-escapes496736
+Ref: table-sub-posix-92498090
+Ref: table-sub-proposed499433
+Ref: table-posix-sub500783
+Ref: table-gensub-escapes502329
+Ref: Gory Details-Footnote-1503536
+Ref: Gory Details-Footnote-2503587
+Node: I/O Functions503738
+Ref: I/O Functions-Footnote-1510393
+Node: Time Functions510540
+Ref: Time Functions-Footnote-1521432
+Ref: Time Functions-Footnote-2521500
+Ref: Time Functions-Footnote-3521658
+Ref: Time Functions-Footnote-4521769
+Ref: Time Functions-Footnote-5521881
+Ref: Time Functions-Footnote-6522108
+Node: Bitwise Functions522374
+Ref: table-bitwise-ops522932
+Ref: Bitwise Functions-Footnote-1527153
+Node: Type Functions527337
+Node: I18N Functions527807
+Node: User-defined529434
+Node: Definition Syntax530238
+Ref: Definition Syntax-Footnote-1535148
+Node: Function Example535217
+Node: Function Caveats537811
+Node: Calling A Function538232
+Node: Variable Scope539347
+Node: Pass By Value/Reference541322
+Node: Return Statement544762
+Node: Dynamic Typing547743
+Node: Indirect Calls548478
+Node: Internationalization558163
+Node: I18N and L10N559589
+Node: Explaining gettext560275
+Ref: Explaining gettext-Footnote-1565341
+Ref: Explaining gettext-Footnote-2565525
+Node: Programmer i18n565690
+Node: Translator i18n569890
+Node: String Extraction570683
+Ref: String Extraction-Footnote-1571644
+Node: Printf Ordering571730
+Ref: Printf Ordering-Footnote-1574514
+Node: I18N Portability574578
+Ref: I18N Portability-Footnote-1577027
+Node: I18N Example577090
+Ref: I18N Example-Footnote-1579725
+Node: Gawk I18N579797
+Node: Advanced Features580414
+Node: Nondecimal Data581927
+Node: Array Sorting583510
+Node: Controlling Array Traversal584207
+Node: Array Sorting Functions592445
+Ref: Array Sorting Functions-Footnote-1596119
+Ref: Array Sorting Functions-Footnote-2596212
+Node: Two-way I/O596406
+Ref: Two-way I/O-Footnote-1601838
+Node: TCP/IP Networking601908
+Node: Profiling604752
+Node: Library Functions612206
+Ref: Library Functions-Footnote-1615213
+Node: Library Names615384
+Ref: Library Names-Footnote-1618855
+Ref: Library Names-Footnote-2619075
+Node: General Functions619161
+Node: Strtonum Function620114
+Node: Assert Function623044
+Node: Round Function626370
+Node: Cliff Random Function627913
+Node: Ordinal Functions628929
+Ref: Ordinal Functions-Footnote-1631999
+Ref: Ordinal Functions-Footnote-2632251
+Node: Join Function632460
+Ref: Join Function-Footnote-1634231
+Node: Getlocaltime Function634431
+Node: Data File Management638146
+Node: Filetrans Function638778
+Node: Rewind Function642917
+Node: File Checking644304
+Node: Empty Files645398
+Node: Ignoring Assigns647628
+Node: Getopt Function649181
+Ref: Getopt Function-Footnote-1660485
+Node: Passwd Functions660688
+Ref: Passwd Functions-Footnote-1669663
+Node: Group Functions669751
+Node: Walking Arrays677835
+Node: Sample Programs679404
+Node: Running Examples680069
+Node: Clones680797
+Node: Cut Program682021
+Node: Egrep Program691866
+Ref: Egrep Program-Footnote-1699639
+Node: Id Program699749
+Node: Split Program703365
+Ref: Split Program-Footnote-1706884
+Node: Tee Program707012
+Node: Uniq Program709815
+Node: Wc Program717244
+Ref: Wc Program-Footnote-1721510
+Ref: Wc Program-Footnote-2721710
+Node: Miscellaneous Programs721802
+Node: Dupword Program722990
+Node: Alarm Program725021
+Node: Translate Program729770
+Ref: Translate Program-Footnote-1734157
+Ref: Translate Program-Footnote-2734385
+Node: Labels Program734519
+Ref: Labels Program-Footnote-1737890
+Node: Word Sorting737974
+Node: History Sorting741858
+Node: Extract Program743697
+Ref: Extract Program-Footnote-1751180
+Node: Simple Sed751308
+Node: Igawk Program754370
+Ref: Igawk Program-Footnote-1769527
+Ref: Igawk Program-Footnote-2769728
+Node: Anagram Program769866
+Node: Signature Program772934
+Node: Debugger774034
+Node: Debugging775000
+Node: Debugging Concepts775433
+Node: Debugging Terms777289
+Node: Awk Debugging779886
+Node: Sample Debugging Session780778
+Node: Debugger Invocation781298
+Node: Finding The Bug782627
+Node: List of Debugger Commands789115
+Node: Breakpoint Control790449
+Node: Debugger Execution Control794113
+Node: Viewing And Changing Data797473
+Node: Execution Stack800829
+Node: Debugger Info802296
+Node: Miscellaneous Debugger Commands806277
+Node: Readline Support811722
+Node: Limitations812553
+Node: Arbitrary Precision Arithmetic814805
+Ref: Arbitrary Precision Arithmetic-Footnote-1816447
+Node: General Arithmetic816595
+Node: Floating Point Issues818315
+Node: String Conversion Precision819196
+Ref: String Conversion Precision-Footnote-1820902
+Node: Unexpected Results821011
+Node: POSIX Floating Point Problems823164
+Ref: POSIX Floating Point Problems-Footnote-1826989
+Node: Integer Programming827027
+Node: Floating-point Programming828780
+Ref: Floating-point Programming-Footnote-1835089
+Node: Floating-point Representation835353
+Node: Floating-point Context836518
+Ref: table-ieee-formats837360
+Node: Rounding Mode838744
+Ref: table-rounding-modes839223
+Ref: Rounding Mode-Footnote-1842227
+Node: Gawk and MPFR842408
+Node: Arbitrary Precision Floats843650
+Ref: Arbitrary Precision Floats-Footnote-1846079
+Node: Setting Precision846390
+Node: Setting Rounding Mode849123
+Ref: table-gawk-rounding-modes849527
+Node: Floating-point Constants850707
+Node: Changing Precision852131
+Ref: Changing Precision-Footnote-1853531
+Node: Exact Arithmetic853705
+Node: Arbitrary Precision Integers856813
+Ref: Arbitrary Precision Integers-Footnote-1859813
+Node: Dynamic Extensions859960
+Node: Extension Intro861283
+Node: Plugin License862486
+Node: Extension Design863160
+Node: Old Extension Problems864231
+Ref: Old Extension Problems-Footnote-1865741
+Node: Extension New Mechanism Goals865798
+Ref: Extension New Mechanism Goals-Footnote-1868510
+Node: Extension Other Design Decisions868696
+Node: Extension Mechanism Outline870443
+Ref: load-extension871426
+Ref: load-new-function872859
+Ref: call-new-function873795
+Node: Extension Future Growth875776
+Node: Extension API Description876518
+Node: Extension API Functions Introduction877838
+Node: General Data Types881913
+Ref: General Data Types-Footnote-1887546
+Node: Requesting Values887845
+Ref: table-value-types-returned888576
+Node: Constructor Functions889530
+Node: Registration Functions892526
+Node: Extension Functions893211
+Node: Exit Callback Functions895030
+Node: Extension Version String896273
+Node: Input Parsers896923
+Node: Output Wrappers905504
+Node: Two-way processors909897
+Node: Printing Messages912019
+Ref: Printing Messages-Footnote-1913096
+Node: Updating `ERRNO'913248
+Node: Accessing Parameters913987
+Node: Symbol Table Access915217
+Node: Symbol table by name915729
+Ref: Symbol table by name-Footnote-1917901
+Node: Symbol table by cookie917981
+Ref: Symbol table by cookie-Footnote-1922110
+Node: Cached values922173
+Ref: Cached values-Footnote-1925374
+Node: Array Manipulation925465
+Ref: Array Manipulation-Footnote-1926563
+Node: Array Data Types926602
+Ref: Array Data Types-Footnote-1929328
+Node: Array Functions929420
+Node: Flattening Arrays933186
+Node: Creating Arrays940017
+Node: Extension API Variables944813
+Node: Extension Versioning945449
+Node: Extension API Informational Variables947350
+Node: Extension API Boilerplate948436
+Node: Finding Extensions952270
+Node: Extension Example952817
+Node: Internal File Description953555
+Node: Internal File Ops957243
+Ref: Internal File Ops-Footnote-1968327
+Node: Using Internal File Ops968467
+Ref: Using Internal File Ops-Footnote-1970823
+Node: Extension Samples971089
+Node: Extension Sample File Functions972532
+Node: Extension Sample Fnmatch980901
+Node: Extension Sample Fork982627
+Node: Extension Sample Ord983841
+Node: Extension Sample Readdir984617
+Node: Extension Sample Revout986955
+Node: Extension Sample Rev2way987548
+Node: Extension Sample Read write array988238
+Node: Extension Sample Readfile990121
+Node: Extension Sample API Tests990876
+Node: Extension Sample Time991401
+Node: gawkextlib992710
+Node: Language History995093
+Node: V7/SVR3.1996615
+Node: SVR4998936
+Node: POSIX1000378
+Node: BTL1001386
+Node: POSIX/GNU1002120
+Node: Common Extensions1007655
+Node: Ranges and Locales1008762
+Ref: Ranges and Locales-Footnote-11013380
+Ref: Ranges and Locales-Footnote-21013407
+Ref: Ranges and Locales-Footnote-31013667
+Node: Contributors1013888
+Node: Installation1018184
+Node: Gawk Distribution1019078
+Node: Getting1019562
+Node: Extracting1020388
+Node: Distribution contents1022080
+Node: Unix Installation1027302
+Node: Quick Installation1027919
+Node: Additional Configuration Options1029881
+Node: Configuration Philosophy1031358
+Node: Non-Unix Installation1033700
+Node: PC Installation1034158
+Node: PC Binary Installation1035457
+Node: PC Compiling1037305
+Node: PC Testing1040249
+Node: PC Using1041425
+Node: Cygwin1045610
+Node: MSYS1046610
+Node: VMS Installation1047124
+Node: VMS Compilation1047727
+Ref: VMS Compilation-Footnote-11048734
+Node: VMS Installation Details1048792
+Node: VMS Running1050427
+Node: VMS Old Gawk1052034
+Node: Bugs1052508
+Node: Other Versions1056360
+Node: Notes1061675
+Node: Compatibility Mode1062262
+Node: Additions1063045
+Node: Accessing The Source1063972
+Node: Adding Code1065398
+Node: New Ports1071440
+Node: Derived Files1075575
+Ref: Derived Files-Footnote-11080880
+Ref: Derived Files-Footnote-21080914
+Ref: Derived Files-Footnote-31081514
+Node: Future Extensions1081612
+Node: Basic Concepts1083099
+Node: Basic High Level1083780
+Ref: figure-general-flow1084051
+Ref: figure-process-flow1084650
+Ref: Basic High Level-Footnote-11087879
+Node: Basic Data Typing1088064
+Node: Glossary1091419
+Node: Copying1116730
+Node: GNU Free Documentation License1154287
+Node: Index1179424

End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 59695171..573768ea 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -321,419 +321,531 @@ particular records in a file and perform operations upon them.
* Index:: Concept and Variable Index.
@detailmenu
-* History:: The history of @command{gawk} and
- @command{awk}.
-* Names:: What name to use to find @command{awk}.
-* This Manual:: Using this @value{DOCUMENT}. Includes
- sample input files that you can use.
-* Conventions:: Typographical Conventions.
-* Manual History:: Brief history of the GNU project and this
- @value{DOCUMENT}.
-* How To Contribute:: Helping to save the world.
-* Acknowledgments:: Acknowledgments.
-* Running gawk:: How to run @command{gawk} programs;
- includes command-line syntax.
-* One-shot:: Running a short throwaway @command{awk}
- program.
-* Read Terminal:: Using no input files (input from terminal
- instead).
-* Long:: Putting permanent @command{awk} programs in
- files.
-* Executable Scripts:: Making self-contained @command{awk}
- programs.
-* Comments:: Adding documentation to @command{gawk}
- programs.
-* Quoting:: More discussion of shell quoting issues.
-* DOS Quoting:: Quoting in Windows Batch Files.
-* Sample Data Files:: Sample data files for use in the
- @command{awk} programs illustrated in this
- @value{DOCUMENT}.
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example using two
- rules.
-* More Complex:: A more complex example.
-* Statements/Lines:: Subdividing or combining statements into
- lines.
-* Other Features:: Other Features of @command{awk}.
-* When:: When to use @command{gawk} and when to use
- other things.
-* Command Line:: How to run @command{awk}.
-* Options:: Command-line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* Naming Standard Input:: How to specify standard input with other
- files.
-* Environment Variables:: The environment variables @command{gawk}
- uses.
-* AWKPATH Variable:: Searching directories for @command{awk}
- programs.
-* AWKLIBPATH Variable:: Searching directories for @command{awk}
- shared libraries.
-* Other Environment Variables:: The environment variables.
-* Exit Status:: @command{gawk}'s exit status.
-* Include Files:: Including other files into your program.
-* Loading Shared Libraries:: Loading shared libraries into your program.
-* Obsolete:: Obsolete Options and/or features.
-* Undocumented:: Undocumented Options and Features.
-* Regexp Usage:: How to Use Regular Expressions.
-* Escape Sequences:: How to write nonprinting characters.
-* Regexp Operators:: Regular Expression Operators.
-* Bracket Expressions:: What can go between @samp{[...]}.
-* GNU Regexp Operators:: Operators specific to GNU software.
-* Case-sensitivity:: How to do case-insensitive matching.
-* Leftmost Longest:: How much text matches.
-* Computed Regexps:: Using Dynamic Regexps.
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Nonconstant Fields:: Nonconstant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Default Field Splitting:: How fields are normally separated.
-* Regexp Field Splitting:: Using regexps as the field separator.
-* Single Character Fields:: Making each character a separate field.
-* Command Line Field Separator:: Setting @code{FS} from the command-line.
-* Field Splitting Summary:: Some final points and a summary table.
-* Constant Size:: Reading constant width data.
-* Splitting By Content:: Defining Fields By Content
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program
- control using the @code{getline} function.
-* Plain Getline:: Using @code{getline} with no arguments.
-* Getline/Variable:: Using @code{getline} into a variable.
-* Getline/File:: Using @code{getline} from a file.
-* Getline/Variable/File:: Using @code{getline} into a variable from a
- file.
-* Getline/Pipe:: Using @code{getline} from a pipe.
-* Getline/Variable/Pipe:: Using @code{getline} into a variable from a
- pipe.
-* Getline/Coprocess:: Using @code{getline} from a coprocess.
-* Getline/Variable/Coprocess:: Using @code{getline} into a variable from a
- coprocess.
-* Getline Notes:: Important things to know about
- @code{getline}.
-* Getline Summary:: Summary of @code{getline} Variants.
-* Read Timeout:: Reading input with a timeout.
-* Command line directories:: What happens if you put a directory on the
- command line.
-* Print:: The @code{print} statement.
-* Print Examples:: Simple examples of @code{print} statements.
-* Output Separators:: The output separators and how to change
- them.
-* OFMT:: Controlling Numeric Output With
- @code{print}.
-* Printf:: The @code{printf} statement.
-* Basic Printf:: Syntax of the @code{printf} statement.
-* Control Letters:: Format-control letters.
-* Format Modifiers:: Format-specification modifiers.
-* Printf Examples:: Several examples.
-* Redirection:: How to redirect output to multiple files
- and pipes.
-* Special Files:: File name interpretation in @command{gawk}.
- @command{gawk} allows access to inherited
- file descriptors.
-* Special FD:: Special files for I/O.
-* Special Network:: Special files for network communications.
-* Special Caveats:: Things to watch out for.
-* Close Files And Pipes:: Closing Input and Output Files and Pipes.
-* Values:: Constants, Variables, and Regular
- Expressions.
-* Constants:: String, numeric and regexp constants.
-* Scalar Constants:: Numeric and string constants.
-* Nondecimal-numbers:: What are octal and hex numbers.
-* Regexp Constants:: Regular Expression constants.
-* Using Constant Regexps:: When and how to use a regexp constant.
-* Variables:: Variables give names to values for later
- use.
-* Using Variables:: Using variables in your programs.
-* Assignment Options:: Setting variables on the command-line and a
- summary of command-line syntax. This is an
- advanced method of input.
-* Conversion:: The conversion of strings to numbers and
- vice versa.
-* All Operators:: @command{gawk}'s operators.
-* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-},
- etc.)
-* Concatenation:: Concatenating strings.
-* Assignment Ops:: Changing the value of a variable or a
- field.
-* Increment Ops:: Incrementing the numeric value of a
- variable.
-* Truth Values and Conditions:: Testing for true and false.
-* Truth Values:: What is ``true'' and what is ``false''.
-* Typing and Comparison:: How variables acquire types and how this
- affects comparison of numbers and strings
- with @samp{<}, etc.
-* Variable Typing:: String type versus numeric type.
-* Comparison Operators:: The comparison operators.
-* POSIX String Comparison:: String comparison with POSIX rules.
-* Boolean Ops:: Combining comparison expressions using
- boolean operators @samp{||} (``or''),
- @samp{&&} (``and'') and @samp{!} (``not'').
-* Conditional Exp:: Conditional expressions select between two
- subexpressions under control of a third
- subexpression.
-* Function Calls:: A function call is an expression.
-* Precedence:: How various operators nest.
-* Locales:: How the locale affects things.
-* Pattern Overview:: What goes into a pattern.
-* Regexp Patterns:: Using regexps as patterns.
-* Expression Patterns:: Any expression can be used as a pattern.
-* Ranges:: Pairs of patterns specify record ranges.
-* BEGIN/END:: Specifying initialization and cleanup
- rules.
-* Using BEGIN/END:: How and why to use BEGIN/END rules.
-* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
-* BEGINFILE/ENDFILE:: Two special patterns for advanced control.
-* Empty:: The empty pattern, which matches every
- record.
-* Using Shell Variables:: How to use shell variables with
- @command{awk}.
-* Action Overview:: What goes into an action.
-* Statements:: Describes the various control statements in
- detail.
-* If Statement:: Conditionally execute some @command{awk}
- statements.
-* While Statement:: Loop until some condition is satisfied.
-* Do Statement:: Do specified action while looping until
- some condition is satisfied.
-* For Statement:: Another looping statement, that provides
- initialization and increment clauses.
-* Switch Statement:: Switch/case evaluation for conditional
- execution of statements based on a value.
-* Break Statement:: Immediately exit the innermost enclosing
- loop.
-* Continue Statement:: Skip to the end of the innermost enclosing
- loop.
-* Next Statement:: Stop processing the current input record.
-* Nextfile Statement:: Stop processing the current file.
-* Exit Statement:: Stop execution of @command{awk}.
-* Built-in Variables:: Summarizes the built-in variables.
-* User-modified:: Built-in variables that you change to
- control @command{awk}.
-* Auto-set:: Built-in variables where @command{awk}
- gives you information.
-* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}.
-* Array Basics:: The basics of arrays.
-* Array Intro:: Introduction to Arrays
-* Reference to Elements:: How to examine one element of an array.
-* Assigning Elements:: How to change an element of an array.
-* Array Example:: Basic Example of an Array
-* Scanning an Array:: A variation of the @code{for} statement. It
- loops through the indices of an array's
- existing elements.
-* Controlling Scanning:: Controlling the order in which arrays are
- scanned.
-* Delete:: The @code{delete} statement removes an
- element from an array.
-* Numeric Array Subscripts:: How to use numbers as subscripts in
- @command{awk}.
-* Uninitialized Subscripts:: Using Uninitialized variables as
- subscripts.
-* Multi-dimensional:: Emulating multidimensional arrays in
- @command{awk}.
-* Multi-scanning:: Scanning multidimensional arrays.
-* Arrays of Arrays:: True multidimensional arrays.
-* Built-in:: Summarizes the built-in functions.
-* Calling Built-in:: How to call built-in functions.
-* Numeric Functions:: Functions that work with numbers, including
- @code{int()}, @code{sin()} and
- @code{rand()}.
-* String Functions:: Functions for string manipulation, such as
- @code{split()}, @code{match()} and
- @code{sprintf()}.
-* Gory Details:: More than you want to know about @samp{\}
- and @samp{&} with @code{sub()},
- @code{gsub()}, and @code{gensub()}.
-* I/O Functions:: Functions for files and shell commands.
-* Time Functions:: Functions for dealing with timestamps.
-* Bitwise Functions:: Functions for bitwise operations.
-* Type Functions:: Functions for type information.
-* I18N Functions:: Functions for string translation.
-* User-defined:: Describes User-defined functions in detail.
-* Definition Syntax:: How to write definitions and what they
- mean.
-* Function Example:: An example function definition and what it
- does.
-* Function Caveats:: Things to watch out for.
-* Calling A Function:: Don't use spaces.
-* Variable Scope:: Controlling variable scope.
-* Pass By Value/Reference:: Passing parameters.
-* Return Statement:: Specifying the value a function returns.
-* Dynamic Typing:: How variable types can change at runtime.
-* Indirect Calls:: Choosing the function to call at runtime.
-* I18N and L10N:: Internationalization and Localization.
-* Explaining gettext:: How GNU @code{gettext} works.
-* Programmer i18n:: Features for the programmer.
-* Translator i18n:: Features for the translator.
-* String Extraction:: Extracting marked strings.
-* Printf Ordering:: Rearranging @code{printf} arguments.
-* I18N Portability:: @command{awk}-level portability issues.
-* I18N Example:: A simple i18n example.
-* Gawk I18N:: @command{gawk} is also internationalized.
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal
- and sorting arrays.
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use @code{asort()} and
- @code{asorti()}.
-* Two-way I/O:: Two-way communications with another
- process.
-* TCP/IP Networking:: Using @command{gawk} for network
- programming.
-* Profiling:: Profiling your @command{awk} programs.
-* Library Names:: How to best name private global variables
- in library functions.
-* General Functions:: Functions that are of general use.
-* Strtonum Function:: A replacement for the built-in
- @code{strtonum()} function.
-* Assert Function:: A function for assertions in @command{awk}
- programs.
-* Round Function:: A function for rounding if @code{sprintf()}
- does not do it correctly.
-* Cliff Random Function:: The Cliff Random Number Generator.
-* Ordinal Functions:: Functions for using characters as numbers
- and vice versa.
-* Join Function:: A function to join an array into a string.
-* Getlocaltime Function:: A function to get formatted times.
-* Data File Management:: Functions for managing command-line data
- files.
-* Filetrans Function:: A function for handling data file
- transitions.
-* Rewind Function:: A function for rereading the current file.
-* File Checking:: Checking that data files are readable.
-* Empty Files:: Checking for zero-length files.
-* Ignoring Assigns:: Treating assignments as file names.
-* Getopt Function:: A function for processing command-line
- arguments.
-* Passwd Functions:: Functions for getting user information.
-* Group Functions:: Functions for getting group information.
-* Walking Arrays:: A function to walk arrays of arrays.
-* Running Examples:: How to run these examples.
-* Clones:: Clones of common utilities.
-* Cut Program:: The @command{cut} utility.
-* Egrep Program:: The @command{egrep} utility.
-* Id Program:: The @command{id} utility.
-* Split Program:: The @command{split} utility.
-* Tee Program:: The @command{tee} utility.
-* Uniq Program:: The @command{uniq} utility.
-* Wc Program:: The @command{wc} utility.
-* Miscellaneous Programs:: Some interesting @command{awk} programs.
-* Dupword Program:: Finding duplicated words in a document.
-* Alarm Program:: An alarm clock.
-* Translate Program:: A program similar to the @command{tr}
- utility.
-* Labels Program:: Printing mailing labels.
-* Word Sorting:: A program to produce a word usage count.
-* History Sorting:: Eliminating duplicate entries from a
- history file.
-* Extract Program:: Pulling out programs from Texinfo source
- files.
-* Simple Sed:: A Simple Stream Editor.
-* Igawk Program:: A wrapper for @command{awk} that includes
- files.
-* Anagram Program:: Finding anagrams from a dictionary.
-* Signature Program:: People do amazing things with too much time
- on their hands.
-* Debugging:: Introduction to @command{gawk} debugger.
-* Debugging Concepts:: Debugging in General.
-* Debugging Terms:: Additional Debugging Concepts.
-* Awk Debugging:: Awk Debugging.
-* Sample Debugging Session:: Sample debugging session.
-* Debugger Invocation:: How to Start the Debugger.
-* Finding The Bug:: Finding the Bug.
-* List of Debugger Commands:: Main debugger commands.
-* Breakpoint Control:: Control of Breakpoints.
-* Debugger Execution Control:: Control of Execution.
-* Viewing And Changing Data:: Viewing and Changing Data.
-* Execution Stack:: Dealing with the Stack.
-* Debugger Info:: Obtaining Information about the Program and
- the Debugger State.
-* Miscellaneous Debugger Commands:: Miscellaneous Commands.
-* Readline Support:: Readline support.
-* Limitations:: Limitations and future plans.
-* General Arithmetic:: An introduction to computer arithmetic.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
-* Integer Programming:: Effective integer programming.
-* Floating-point Programming:: Effective Floating-point Programming.
-* Floating-point Representation:: Binary floating-point representation.
-* Floating-point Context:: Floating-point context.
-* Rounding Mode:: Floating-point rounding mode.
-* Gawk and MPFR:: How @command{gawk} provides
- arbitrary-precision arithmetic.
-* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
- Arithmetic with @command{gawk}.
-* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode.
-* Floating-point Constants:: Representing floating-point constants.
-* Changing Precision:: Changing the precision of a number.
-* Exact Arithmetic:: Exact arithmetic with floating-point
- numbers.
-* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
- @command{gawk}.
-* Plugin License:: A note about licensing.
-* Sample Library:: A example of new functions.
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-* V7/SVR3.1:: The major changes between V7 and System V
- Release 3.1.
-* SVR4:: Minor changes between System V Releases 3.1
- and 4.
-* POSIX:: New features from the POSIX standard.
-* BTL:: New features from Brian Kernighan's version
- of @command{awk}.
-* POSIX/GNU:: The extensions in @command{gawk} not in
- POSIX @command{awk}.
-* Common Extensions:: Common Extensions Summary.
-* Ranges and Locales:: How locales used to affect regexp ranges.
-* Contributors:: The major contributors to @command{gawk}.
-* Gawk Distribution:: What is in the @command{gawk} distribution.
-* Getting:: How to get the distribution.
-* Extracting:: How to extract the distribution.
-* Distribution contents:: What is in the distribution.
-* Unix Installation:: Installing @command{gawk} under various
- versions of Unix.
-* Quick Installation:: Compiling @command{gawk} under Unix.
-* Additional Configuration Options:: Other compile-time options.
-* Configuration Philosophy:: How it's all supposed to work.
-* Non-Unix Installation:: Installation on Other Operating Systems.
-* PC Installation:: Installing and Compiling @command{gawk} on
- MS-DOS and OS/2.
-* PC Binary Installation:: Installing a prepared distribution.
-* PC Compiling:: Compiling @command{gawk} for MS-DOS,
- Windows32, and OS/2.
-* PC Testing:: Testing @command{gawk} on PC systems.
-* PC Using:: Running @command{gawk} on MS-DOS, Windows32
- and OS/2.
-* Cygwin:: Building and running @command{gawk} for
- Cygwin.
-* MSYS:: Using @command{gawk} In The MSYS
- Environment.
-* VMS Installation:: Installing @command{gawk} on VMS.
-* VMS Compilation:: How to compile @command{gawk} under VMS.
-* VMS Installation Details:: How to install @command{gawk} under VMS.
-* VMS Running:: How to run @command{gawk} under VMS.
-* VMS Old Gawk:: An old version comes with some VMS systems.
-* Bugs:: Reporting Problems and Bugs.
-* Other Versions:: Other freely available @command{awk}
- implementations.
-* Compatibility Mode:: How to disable certain @command{gawk}
- extensions.
-* Additions:: Making Additions To @command{gawk}.
-* Accessing The Source:: Accessing the Git repository.
-* Adding Code:: Adding code to the main body of
- @command{gawk}.
-* New Ports:: Porting @command{gawk} to a new operating
- system.
-* Derived Files:: Why derived files are kept in the
- @command{git} repository.
-* Future Extensions:: New features that may be implemented one
- day.
-* Basic High Level:: The high level view.
-* Basic Data Typing:: A very quick intro to data types.
+* History:: The history of @command{gawk} and
+ @command{awk}.
+* Names:: What name to use to find
+ @command{awk}.
+* This Manual:: Using this @value{DOCUMENT}. Includes
+ sample input files that you can use.
+* Conventions:: Typographical Conventions.
+* Manual History:: Brief history of the GNU project and
+ this @value{DOCUMENT}.
+* How To Contribute:: Helping to save the world.
+* Acknowledgments:: Acknowledgments.
+* Running gawk:: How to run @command{gawk} programs;
+ includes command-line syntax.
+* One-shot:: Running a short throwaway
+ @command{awk} program.
+* Read Terminal:: Using no input files (input from
+ terminal instead).
+* Long:: Putting permanent @command{awk}
+ programs in files.
+* Executable Scripts:: Making self-contained @command{awk}
+ programs.
+* Comments:: Adding documentation to @command{gawk}
+ programs.
+* Quoting:: More discussion of shell quoting
+ issues.
+* DOS Quoting:: Quoting in Windows Batch Files.
+* Sample Data Files:: Sample data files for use in the
+ @command{awk} programs illustrated in
+ this @value{DOCUMENT}.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example using
+ two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements
+ into lines.
+* Other Features:: Other Features of @command{awk}.
+* When:: When to use @command{gawk} and when to
+ use other things.
+* Command Line:: How to run @command{awk}.
+* Options:: Command-line options and their
+ meanings.
+* Other Arguments:: Input file names and variable
+ assignments.
+* Naming Standard Input:: How to specify standard input with
+ other files.
+* Environment Variables:: The environment variables
+ @command{gawk} uses.
+* AWKPATH Variable:: Searching directories for
+ @command{awk} programs.
+* AWKLIBPATH Variable:: Searching directories for
+ @command{awk} shared libraries.
+* Other Environment Variables:: The environment variables.
+* Exit Status:: @command{gawk}'s exit status.
+* Include Files:: Including other files into your
+ program.
+* Loading Shared Libraries:: Loading shared libraries into your
+ program.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write nonprinting characters.
+* Regexp Operators:: Regular Expression Operators.
+* Bracket Expressions:: What can go between @samp{[...]}.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+* Records:: Controlling how data is split into
+ records.
+* Fields:: An introduction to fields.
+* Nonconstant Fields:: Nonconstant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change
+ it.
+* Default Field Splitting:: How fields are normally separated.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate
+ field.
+* Command Line Field Separator:: Setting @code{FS} from the
+ command-line.
+* Field Splitting Summary:: Some final points and a summary table.
+* Constant Size:: Reading constant width data.
+* Splitting By Content:: Defining Fields By Content
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program
+ control using the @code{getline}
+ function.
+* Plain Getline:: Using @code{getline} with no
+ arguments.
+* Getline/Variable:: Using @code{getline} into a variable.
+* Getline/File:: Using @code{getline} from a file.
+* Getline/Variable/File:: Using @code{getline} into a variable
+ from a file.
+* Getline/Pipe:: Using @code{getline} from a pipe.
+* Getline/Variable/Pipe:: Using @code{getline} into a variable
+ from a pipe.
+* Getline/Coprocess:: Using @code{getline} from a coprocess.
+* Getline/Variable/Coprocess:: Using @code{getline} into a variable
+ from a coprocess.
+* Getline Notes:: Important things to know about
+ @code{getline}.
+* Getline Summary:: Summary of @code{getline} Variants.
+* Read Timeout:: Reading input with a timeout.
+* Command line directories:: What happens if you put a directory on
+ the command line.
+* Print:: The @code{print} statement.
+* Print Examples:: Simple examples of @code{print}
+ statements.
+* Output Separators:: The output separators and how to
+ change them.
+* OFMT:: Controlling Numeric Output With
+ @code{print}.
+* Printf:: The @code{printf} statement.
+* Basic Printf:: Syntax of the @code{printf} statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+* Redirection:: How to redirect output to multiple
+ files and pipes.
+* Special Files:: File name interpretation in
+ @command{gawk}. @command{gawk} allows
+ access to inherited file descriptors.
+* Special FD:: Special files for I/O.
+* Special Network:: Special files for network
+ communications.
+* Special Caveats:: Things to watch out for.
+* Close Files And Pipes:: Closing Input and Output Files and
+ Pipes.
+* Values:: Constants, Variables, and Regular
+ Expressions.
+* Constants:: String, numeric and regexp constants.
+* Scalar Constants:: Numeric and string constants.
+* Nondecimal-numbers:: What are octal and hex numbers.
+* Regexp Constants:: Regular Expression constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for
+ later use.
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command-line
+ and a summary of command-line syntax.
+ This is an advanced method of input.
+* Conversion:: The conversion of strings to numbers
+ and vice versa.
+* All Operators:: @command{gawk}'s operators.
+* Arithmetic Ops:: Arithmetic operations (@samp{+},
+ @samp{-}, etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a
+ field.
+* Increment Ops:: Incrementing the numeric value of a
+ variable.
+* Truth Values and Conditions:: Testing for true and false.
+* Truth Values:: What is ``true'' and what is
+ ``false''.
+* Typing and Comparison:: How variables acquire types and how
+ this affects comparison of numbers and
+ strings with @samp{<}, etc.
+* Variable Typing:: String type versus numeric type.
+* Comparison Operators:: The comparison operators.
+* POSIX String Comparison:: String comparison with POSIX rules.
+* Boolean Ops:: Combining comparison expressions using
+ boolean operators @samp{||} (``or''),
+ @samp{&&} (``and'') and @samp{!}
+ (``not'').
+* Conditional Exp:: Conditional expressions select between
+ two subexpressions under control of a
+ third subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+* Locales:: How the locale affects things.
+* Pattern Overview:: What goes into a pattern.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a
+ pattern.
+* Ranges:: Pairs of patterns specify record
+ ranges.
+* BEGIN/END:: Specifying initialization and cleanup
+ rules.
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+* BEGINFILE/ENDFILE:: Two special patterns for advanced
+ control.
+* Empty:: The empty pattern, which matches every
+ record.
+* Using Shell Variables:: How to use shell variables with
+ @command{awk}.
+* Action Overview:: What goes into an action.
+* Statements:: Describes the various control
+ statements in detail.
+* If Statement:: Conditionally execute some
+ @command{awk} statements.
+* While Statement:: Loop until some condition is
+ satisfied.
+* Do Statement:: Do specified action while looping
+ until some condition is satisfied.
+* For Statement:: Another looping statement, that
+ provides initialization and increment
+ clauses.
+* Switch Statement:: Switch/case evaluation for conditional
+ execution of statements based on a
+ value.
+* Break Statement:: Immediately exit the innermost
+ enclosing loop.
+* Continue Statement:: Skip to the end of the innermost
+ enclosing loop.
+* Next Statement:: Stop processing the current input
+ record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of @command{awk}.
+* Built-in Variables:: Summarizes the built-in variables.
+* User-modified:: Built-in variables that you change to
+ control @command{awk}.
+* Auto-set:: Built-in variables where @command{awk}
+ gives you information.
+* ARGC and ARGV:: Ways to use @code{ARGC} and
+ @code{ARGV}.
+* Array Basics:: The basics of arrays.
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an
+ array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the @code{for}
+ statement. It loops through the
+ indices of an array's existing
+ elements.
+* Controlling Scanning:: Controlling the order in which arrays
+ are scanned.
+* Delete:: The @code{delete} statement removes an
+ element from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ @command{awk}.
+* Uninitialized Subscripts:: Using Uninitialized variables as
+ subscripts.
+* Multi-dimensional:: Emulating multidimensional arrays in
+ @command{awk}.
+* Multi-scanning:: Scanning multidimensional arrays.
+* Arrays of Arrays:: True multidimensional arrays.
+* Built-in:: Summarizes the built-in functions.
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers,
+ including @code{int()}, @code{sin()}
+ and @code{rand()}.
+* String Functions:: Functions for string manipulation,
+ such as @code{split()}, @code{match()}
+ and @code{sprintf()}.
+* Gory Details:: More than you want to know about
+ @samp{\} and @samp{&} with
+ @code{sub()}, @code{gsub()}, and
+ @code{gensub()}.
+* I/O Functions:: Functions for files and shell
+ commands.
+* Time Functions:: Functions for dealing with timestamps.
+* Bitwise Functions:: Functions for bitwise operations.
+* Type Functions:: Functions for type information.
+* I18N Functions:: Functions for string translation.
+* User-defined:: Describes User-defined functions in
+ detail.
+* Definition Syntax:: How to write definitions and what they
+ mean.
+* Function Example:: An example function definition and
+ what it does.
+* Function Caveats:: Things to watch out for.
+* Calling A Function:: Don't use spaces.
+* Variable Scope:: Controlling variable scope.
+* Pass By Value/Reference:: Passing parameters.
+* Return Statement:: Specifying the value a function
+ returns.
+* Dynamic Typing:: How variable types can change at
+ runtime.
+* Indirect Calls:: Choosing the function to call at
+ runtime.
+* I18N and L10N:: Internationalization and Localization.
+* Explaining gettext:: How GNU @code{gettext} works.
+* Programmer i18n:: Features for the programmer.
+* Translator i18n:: Features for the translator.
+* String Extraction:: Extracting marked strings.
+* Printf Ordering:: Rearranging @code{printf} arguments.
+* I18N Portability:: @command{awk}-level portability
+ issues.
+* I18N Example:: A simple i18n example.
+* Gawk I18N:: @command{gawk} is also
+ internationalized.
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array
+ traversal and sorting arrays.
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions:: How to use @code{asort()} and
+ @code{asorti()}.
+* Two-way I/O:: Two-way communications with another
+ process.
+* TCP/IP Networking:: Using @command{gawk} for network
+ programming.
+* Profiling:: Profiling your @command{awk} programs.
+* Library Names:: How to best name private global
+ variables in library functions.
+* General Functions:: Functions that are of general use.
+* Strtonum Function:: A replacement for the built-in
+ @code{strtonum()} function.
+* Assert Function:: A function for assertions in
+ @command{awk} programs.
+* Round Function:: A function for rounding if
+ @code{sprintf()} does not do it
+ correctly.
+* Cliff Random Function:: The Cliff Random Number Generator.
+* Ordinal Functions:: Functions for using characters as
+ numbers and vice versa.
+* Join Function:: A function to join an array into a
+ string.
+* Getlocaltime Function:: A function to get formatted times.
+* Data File Management:: Functions for managing command-line
+ data files.
+* Filetrans Function:: A function for handling data file
+ transitions.
+* Rewind Function:: A function for rereading the current
+ file.
+* File Checking:: Checking that data files are readable.
+* Empty Files:: Checking for zero-length files.
+* Ignoring Assigns:: Treating assignments as file names.
+* Getopt Function:: A function for processing command-line
+ arguments.
+* Passwd Functions:: Functions for getting user
+ information.
+* Group Functions:: Functions for getting group
+ information.
+* Walking Arrays:: A function to walk arrays of arrays.
+* Running Examples:: How to run these examples.
+* Clones:: Clones of common utilities.
+* Cut Program:: The @command{cut} utility.
+* Egrep Program:: The @command{egrep} utility.
+* Id Program:: The @command{id} utility.
+* Split Program:: The @command{split} utility.
+* Tee Program:: The @command{tee} utility.
+* Uniq Program:: The @command{uniq} utility.
+* Wc Program:: The @command{wc} utility.
+* Miscellaneous Programs:: Some interesting @command{awk}
+ programs.
+* Dupword Program:: Finding duplicated words in a
+ document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the @command{tr}
+ utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage
+ count.
+* History Sorting:: Eliminating duplicate entries from a
+ history file.
+* Extract Program:: Pulling out programs from Texinfo
+ source files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for @command{awk} that
+ includes files.
+* Anagram Program:: Finding anagrams from a dictionary.
+* Signature Program:: People do amazing things with too much
+ time on their hands.
+* Debugging:: Introduction to @command{gawk}
+ debugger.
+* Debugging Concepts:: Debugging in General.
+* Debugging Terms:: Additional Debugging Concepts.
+* Awk Debugging:: Awk Debugging.
+* Sample Debugging Session:: Sample debugging session.
+* Debugger Invocation:: How to Start the Debugger.
+* Finding The Bug:: Finding the Bug.
+* List of Debugger Commands:: Main debugger commands.
+* Breakpoint Control:: Control of Breakpoints.
+* Debugger Execution Control:: Control of Execution.
+* Viewing And Changing Data:: Viewing and Changing Data.
+* Execution Stack:: Dealing with the Stack.
+* Debugger Info:: Obtaining Information about the
+ Program and the Debugger State.
+* Miscellaneous Debugger Commands:: Miscellaneous Commands.
+* Readline Support:: Readline support.
+* Limitations:: Limitations and future plans.
+* General Arithmetic:: An introduction to computer
+ arithmetic.
+* Floating Point Issues:: Stuff to know about floating-point
+ numbers.
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not
+ Abstract Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+* Integer Programming:: Effective integer programming.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Floating-point Representation:: Binary floating-point representation.
+* Floating-point Context:: Floating-point context.
+* Rounding Mode:: Floating-point rounding mode.
+* Gawk and MPFR:: How @command{gawk} provides
+ arbitrary-precision arithmetic.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
+ Arithmetic with @command{gawk}.
+* Setting Precision:: Setting the working precision.
+* Setting Rounding Mode:: Setting the rounding mode.
+* Floating-point Constants:: Representing floating-point constants.
+* Changing Precision:: Changing the precision of a number.
+* Exact Arithmetic:: Exact arithmetic with floating-point
+ numbers.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic
+ with @command{gawk}.
+* Extension Intro:: What is an extension.
+* Plugin License:: A note about licensing.
+* Extension Design:: Design notes about the extension API.
+* Old Extension Problems:: Problems with the old mechanism.
+* Extension New Mechanism Goals:: Goals for the new mechanism.
+* Extension Other Design Decisions:: Some other design decisions.
+* Extension Mechanism Outline:: An outline of how it works.
+* Extension Future Growth:: Some room for future growth.
+* Extension API Description:: A full description of the API.
+* Extension API Functions Introduction:: Introduction to the API functions.
+* General Data Types:: The data types.
+* Requesting Values:: How to get a value.
+* Constructor Functions:: Functions for creating values.
+* Registration Functions:: Functions to register things with
+ @command{gawk}.
+* Extension Functions:: Registering extension functions.
+* Exit Callback Functions:: Registering an exit callback.
+* Extension Version String:: Registering a version string.
+* Input Parsers:: Registering an input parser.
+* Output Wrappers:: Registering an output wrapper.
+* Two-way processors:: Registering a two-way processor.
+* Printing Messages:: Functions for printing messages.
+* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}.
+* Accessing Parameters:: Functions for accessing parameters.
+* Symbol Table Access:: Functions for accessing global
+ variables.
+* Symbol table by name:: Accessing variables by name.
+* Symbol table by cookie:: Accessing variables by ``cookie''.
+* Cached values:: Creating and using cached values.
+* Array Manipulation:: Functions for working with arrays.
+* Array Data Types:: Data types for working with arrays.
+* Array Functions:: Functions for working with arrays.
+* Flattening Arrays:: How to flatten arrays.
+* Creating Arrays:: How to create and populate arrays.
+* Extension API Variables:: Variables provided by the API.
+* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about
+ @command{gawk}'s invocation.
+* Extension API Boilerplate:: Boilerplate code for using the API.
+* Finding Extensions:: How @command{gawk} finds compiled
+ extensions.
+* Extension Example:: Example C code for an extension.
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
+* Extension Samples:: The sample extensions that ship with
+ @code{gawk}.
+* Extension Sample File Functions:: The file functions sample.
+* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
+* Extension Sample Fork:: An interface to @code{fork()} and
+ other process functions.
+* Extension Sample Ord:: Character to value to character
+ conversions.
+* Extension Sample Readdir:: An interface to @code{readdir()}.
+* Extension Sample Revout:: Sample output wrapper that reverses
+ output.
+* Extension Sample Rev2way:: Sample two-way processor that reverses
+ data.
+* Extension Sample Read write array:: Serializing an array to a file.
+* Extension Sample Readfile:: Reading an entire file into a string.
+* Extension Sample API Tests:: Tests for the API.
+* Extension Sample Time:: An interface to @code{gettimeofday()}
+ and @code{sleep()}.
+* gawkextlib:: The @code{gawkextlib} project.
+* V7/SVR3.1:: The major changes between V7 and
+ System V Release 3.1.
+* SVR4:: Minor changes between System V
+ Releases 3.1 and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from Brian Kernighan's
+ version of @command{awk}.
+* POSIX/GNU:: The extensions in @command{gawk} not
+ in POSIX @command{awk}.
+* Common Extensions:: Common Extensions Summary.
+* Ranges and Locales:: How locales used to affect regexp
+ ranges.
+* Contributors:: The major contributors to
+ @command{gawk}.
+* Gawk Distribution:: What is in the @command{gawk}
+ distribution.
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+* Unix Installation:: Installing @command{gawk} under
+ various versions of Unix.
+* Quick Installation:: Compiling @command{gawk} under Unix.
+* Additional Configuration Options:: Other compile-time options.
+* Configuration Philosophy:: How it's all supposed to work.
+* Non-Unix Installation:: Installation on Other Operating
+ Systems.
+* PC Installation:: Installing and Compiling
+ @command{gawk} on MS-DOS and OS/2.
+* PC Binary Installation:: Installing a prepared distribution.
+* PC Compiling:: Compiling @command{gawk} for MS-DOS,
+ Windows32, and OS/2.
+* PC Testing:: Testing @command{gawk} on PC systems.
+* PC Using:: Running @command{gawk} on MS-DOS,
+ Windows32 and OS/2.
+* Cygwin:: Building and running @command{gawk}
+ for Cygwin.
+* MSYS:: Using @command{gawk} In The MSYS
+ Environment.
+* VMS Installation:: Installing @command{gawk} on VMS.
+* VMS Compilation:: How to compile @command{gawk} under
+ VMS.
+* VMS Installation Details:: How to install @command{gawk} under
+ VMS.
+* VMS Running:: How to run @command{gawk} under VMS.
+* VMS Old Gawk:: An old version comes with some VMS
+ systems.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available @command{awk}
+ implementations.
+* Compatibility Mode:: How to disable certain @command{gawk}
+ extensions.
+* Additions:: Making Additions To @command{gawk}.
+* Accessing The Source:: Accessing the Git repository.
+* Adding Code:: Adding code to the main body of
+ @command{gawk}.
+* New Ports:: Porting @command{gawk} to a new
+ operating system.
+* Derived Files:: Why derived files are kept in the
+ @command{git} repository.
+* Future Extensions:: New features that may be implemented
+ one day.
+* Basic High Level:: The high level view.
+* Basic Data Typing:: A very quick intro to data types.
@end detailmenu
@end menu
@@ -28049,42 +28161,62 @@ gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
@node Dynamic Extensions
@chapter Writing Extensions for @command{gawk}
-This chapter is a placeholder, pending a rewrite for the new API.
-Some of the old bits remain, since they can be partially reused.
-
-
-@c STARTOFRANGE gladfgaw
-@cindex @command{gawk}, functions, adding
-@c STARTOFRANGE adfugaw
-@cindex adding, functions to @command{gawk}
-@c STARTOFRANGE fubadgaw
-@cindex functions, built-in, adding to @command{gawk}
-It is possible to add new built-in
-functions to @command{gawk} using dynamically loaded libraries. This
-facility is available on systems (such as GNU/Linux) that support
-the C @code{dlopen()} and @code{dlsym()} functions.
-This @value{CHAPTER} describes how to write and use dynamically
-loaded extensions for @command{gawk}.
-Experience with programming in
-C or C++ is necessary when reading this @value{SECTION}.
+It is possible to add new built-in functions to @command{gawk} using
+dynamically loaded libraries. This facility is available on systems (such
+as GNU/Linux) that support the C @code{dlopen()} and @code{dlsym()}
+functions. This @value{CHAPTER} describes how to create extensions
+using code written in C or C++. If you don't know anything about C
+programming, you can safely skip this @value{CHAPTER}, although you
+may wish to review the documentation on the extensions that come with
+@command{gawk} (@pxref{Extension Samples}), and the section on the
+@code{gawkextlib} project (@pxref{gawkextlib}).
@quotation NOTE
When @option{--sandbox} is specified, extensions are disabled
-(@pxref{Options}.
+(@pxref{Options}).
@end quotation
@menu
+* Extension Intro:: What is an extension.
* Plugin License:: A note about licensing.
-* Sample Library:: A example of new functions.
+* Extension Design:: Design notes about the extension API.
+* Extension API Description:: A full description of the API.
+* Extension Example:: Example C code for an extension.
+* Extension Samples:: The sample extensions that ship with
+ @code{gawk}.
+* gawkextlib:: The @code{gawkextlib} project.
@end menu
+@node Extension Intro
+@section Introduction
+
+An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of
+external compiled code that @command{gawk} can load at runtime to
+provide additional functionality, over and above the built-in capabilities
+described in the rest of this @value{DOCUMENT}.
+
+Extensions are useful because they allow you (of course) to extend
+@command{gawk}'s functionality. For example, they can provide access to
+system calls (such as @code{chdir()} to change directory) and to other
+C library routines that could be of use. As with most software,
+``the sky is the limit;'' if you can imagine something that you might
+want to do and can write in C or C++, you can write an extension to do it!
+
+Extensions are written in C or C++, using the @dfn{Application Programming
+Interface} (API) defined for this purpose by the @command{gawk}
+developers. The rest of this @value{CHAPTER} explains the design
+decisions behind the API, the facilities it provides and how to use
+them, and presents a small sample extension. In addition, it documents
+the sample extensions included in the @command{gawk} distribution,
+and describes the @code{gawkextlib} project.
+
@node Plugin License
@section Extension Licensing
Every dynamic extension should define the global symbol
@code{plugin_is_GPL_compatible} to assert that it has been licensed under
a GPL-compatible license. If this symbol does not exist, @command{gawk}
-will emit a fatal error and exit.
+emits a fatal error and exits when it tries to load your extension.
The declared type of the symbol should be @code{int}. It does not need
to be in any allocated section, though. The code merely asserts that
@@ -28094,23 +28226,2383 @@ the symbol exists in the global scope. Something like this is enough:
int plugin_is_GPL_compatible;
@end example
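To make the ``symbol exists'' check concrete, here is a minimal sketch of how
a loader could test a shared library for the symbol using the @code{dlopen()}
and @code{dlsym()} functions mentioned at the start of this chapter. It is an
illustration only and is not taken from @command{gawk}'s source; the function
name @code{declares_gpl_compatible()} is hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Sketch only: report whether the shared library at `path' defines
 * plugin_is_GPL_compatible.  Returns 1 if it does, 0 if the library
 * loads but lacks the symbol, and -1 if the library cannot be opened.
 */
int declares_gpl_compatible(const char *path)
{
    void *handle;
    int found;

    assert(path != NULL);   /* dlopen(NULL) would open the main program */

    handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* Only the symbol's existence matters; its value is never read. */
    found = (dlsym(handle, "plugin_is_GPL_compatible") != NULL);
    dlclose(handle);
    return found;
}
```

On some systems, a program using these functions must be linked with
@option{-ldl}.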
-@node Sample Library
-@section Example: Directory and File Operation Built-ins
-@c STARTOFRANGE chdirg
-@cindex @code{chdir()} function@comma{} implementing in @command{gawk}
-@c STARTOFRANGE statg
-@cindex @code{stat()} function@comma{} implementing in @command{gawk}
-@c STARTOFRANGE filre
-@cindex files, information about@comma{} retrieving
-@c STARTOFRANGE dirch
-@cindex directories, changing
-
-Two useful functions that are not in @command{awk} are @code{chdir()}
-(so that an @command{awk} program can change its directory) and
-@code{stat()} (so that an @command{awk} program can gather information about
-a file).
-This @value{SECTION} implements these functions for @command{gawk} in an
-external extension library.
+@node Extension Design
+@section Extension API Design
+
+The first version of extensions for @command{gawk} was developed in
+the mid-1990s and released with @command{gawk} 3.1 in the late 1990s.
+The basic mechanisms and design remained unchanged for close to 15 years,
+until 2012.
+
+The old extension mechanism used data types and functions from
+@command{gawk} itself, with a ``clever hack'' to install extension
+functions.
+
+@command{gawk} included some sample extensions, of which a few were
+really useful. However, it was clear from the outset that the extension
+mechanism was bolted onto the side and was not really well thought out.
+
+@menu
+* Old Extension Problems:: Problems with the old mechanism.
+* Extension New Mechanism Goals:: Goals for the new mechanism.
+* Extension Other Design Decisions:: Some other design decisions.
+* Extension Mechanism Outline:: An outline of how it works.
+* Extension Future Growth:: Some room for future growth.
+@end menu
+
+@node Old Extension Problems
+@subsection Problems With The Old Mechanism
+
+The old extension mechanism had several problems:
+
+@itemize @bullet
+@item
+It depended heavily upon @command{gawk} internals. Any time the
+@code{NODE} structure@footnote{A critical central data structure
+inside @command{gawk}.} changed, an extension would have to be
+recompiled. Furthermore, writing extensions required understanding
+something about @command{gawk}'s internal functions. There was some
+documentation in this @value{DOCUMENT}, but it was quite minimal.
+
+@item
+Being able to call into @command{gawk} from an extension required linker
+facilities that are common on Unix-derived systems but that did
+not work on Windows systems; users wanting extensions on Windows
+had to statically link them into @command{gawk}, even though Windows supports
+dynamic loading of shared objects.
+
+@item
+The API would change occasionally as @command{gawk} changed; no compatibility
+between versions was ever offered or planned for.
+@end itemize
+
+Despite the drawbacks, the @command{xgawk} project developers forked
+@command{gawk} and developed several significant extensions. They also
+enhanced @command{gawk}'s facilities relating to file inclusion and
+shared object access.
+
+A new API was desired for a long time, but only in 2012 did the
+@command{gawk} maintainer and the @command{xgawk} developers finally
+start working on it together. More information about the @command{xgawk}
+project is provided in @ref{gawkextlib}.
+
+@node Extension New Mechanism Goals
+@subsection Goals For A New Mechanism
+
+Some goals for the new API were:
+
+@itemize @bullet
+@item
+The API should be independent of @command{gawk} internals. Changes in
+@command{gawk} internals should not be visible to the writer of an
+extension function.
+
+@item
+The API should provide @emph{binary} compatibility across @command{gawk}
+releases as long as the API itself does not change.
+
+@item
+The API should enable extensions written in C to have roughly the
+same ``appearance'' to @command{awk}-level code as @command{awk}
+functions do. This means that extensions should have:
+
+@itemize @minus
+@item
+The ability to access function parameters.
+
+@item
+The ability to turn an undefined parameter into an array (call by reference).
+
+@item
+The ability to create, access and update global variables.
+
+@item
+Easy access to all the elements of an array at once (``array flattening'')
+in order to loop over all the elements in an easy fashion for C code.
+
+@item
+The ability to create arrays (including @command{gawk}'s true
+multi-dimensional arrays).
+@end itemize
+@end itemize
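The idea behind ``array flattening'' can be sketched in plain C. The code
below is an assumption-laden illustration, not the API itself: the types
@code{struct demo_elem} and @code{struct demo_flat} and the functions
@code{demo_flatten()} and @code{demo_sum()} are hypothetical, and a linked
list stands in for whatever structure really holds the array. The point is
only that once the elements are copied into one contiguous block, C code can
visit them with an ordinary indexed loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical associative array: a singly linked list of pairs. */
struct demo_elem {
    const char *index;
    double value;
    struct demo_elem *next;
};

/* One entry of the flattened view. */
struct demo_flat {
    const char *index;
    double value;
};

/*
 * Copy every element into one malloc()ed block and store the count
 * in *count.  Returns NULL for an empty list or on allocation failure;
 * the caller releases the result with free().
 */
struct demo_flat *demo_flatten(const struct demo_elem *head, size_t *count)
{
    const struct demo_elem *p;
    struct demo_flat *flat;
    size_t n = 0, i = 0;

    assert(count != NULL);
    for (p = head; p != NULL; p = p->next)
        n++;
    *count = n;
    if (n == 0)
        return NULL;

    flat = (struct demo_flat *) malloc(n * sizeof(*flat));
    if (flat == NULL) {
        *count = 0;
        return NULL;
    }
    for (p = head; p != NULL; p = p->next) {
        flat[i].index = p->index;
        flat[i].value = p->value;
        i++;
    }
    return flat;
}

/* Typical use: flatten once, then walk the block with a plain loop. */
double demo_sum(const struct demo_elem *head)
{
    size_t n, i;
    double total = 0.0;
    struct demo_flat *flat = demo_flatten(head, &n);

    for (i = 0; i < n; i++)
        total += flat[i].value;
    free(flat);     /* free(NULL) is harmless */
    return total;
}
```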
+
+Some additional important goals were:
+
+@itemize @bullet
+@item
+The API should use only features in ISO C 90, so that extensions
+can be written using the widest range of C and C++ compilers. The header
+should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"}
+magic so that a C++ compiler could be used. (If using C++, the runtime
+system has to be smart enough to call any constructors and destructors,
+as @command{gawk} is a C program. As of this writing, this has not been
+tested.)
+
+@item
+The API mechanism should not require access to @command{gawk}'s
+symbols@footnote{The @dfn{symbols} are the variables and functions
+defined inside @command{gawk}. Access to these symbols by code
+external to @command{gawk} loaded dynamically at runtime is
+problematic on Windows.} by the compile-time or dynamic linker,
+in order to enable creation of extensions that also work on Windows.
+@end itemize
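One common way to meet both of these goals is to hand the extension a table
of function pointers, so that the extension never refers to the host
program's symbols by name. The following sketch shows that technique along
with the @samp{extern "C"} guard. It assumes nothing about the real API:
@code{demo_api_t}, @code{demo_ext_init()}, @code{demo_load()}, and the
version check are all hypothetical names and choices made for this example,
and both the ``host'' and ``extension'' sides live in one file only to keep
it self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical API table: the host fills it in and passes it along. */
typedef struct demo_api {
    int major_version;
    /* Look up a numeric variable; returns 1 and sets *result if found. */
    int (*lookup_number)(const char *name, double *result);
} demo_api_t;

/* Entry point that a hypothetical extension would export. */
int demo_ext_init(const demo_api_t *api);

#ifdef __cplusplus
}
#endif

/* ---- Host side: the only place that knows how lookups work. ---- */
static int host_lookup_number(const char *name, double *result)
{
    assert(name != NULL && result != NULL);
    if (strcmp(name, "NR") == 0) {
        *result = 42.0;     /* stand-in value for this sketch */
        return 1;
    }
    return 0;
}

/* ---- Extension side: uses only what the table provides. ---- */
double demo_seen_nr = -1.0;

int demo_ext_init(const demo_api_t *api)
{
    /* Refuse to run against an API version we were not built for. */
    if (api == NULL || api->major_version != 1)
        return 0;
    return api->lookup_number("NR", &demo_seen_nr);
}

/* Host-side loader: build the table and call the entry point. */
int demo_load(int host_major_version)
{
    demo_api_t api;

    api.major_version = host_major_version;
    api.lookup_number = host_lookup_number;
    return demo_ext_init(&api);
}
```

Because the extension reaches the host only through the table, the linker
never has to resolve a host symbol on the extension's behalf.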
+
+During development, it became clear that there were other features
+that should be available to extensions, which were also subsequently
+provided:
+
+@itemize @bullet
+@item
+Extensions should have the ability to hook into @command{gawk}'s
+I/O redirection mechanism. In particular, the @command{xgawk}
+developers provided a so-called ``open hook'' to take over reading
+records. During development, this was generalized to allow
+extensions to hook into input processing, output processing, and
+two-way I/O.
+
+@item
+An extension should be able to provide a ``call back'' function
+to perform clean up actions when @command{gawk} exits.
+
+@item
+An extension should be able to provide a version string so that
+@command{gawk}'s @option{--version} option can provide information
+about extensions as well.
+@end itemize
+
+@node Extension Other Design Decisions
+@subsection Other Design Decisions
+
+As an ``arbitrary'' design decision, extensions can read the values of
+built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot
+change them, with the exception of @code{PROCINFO}.
+
+The reason for this is to prevent an extension function from affecting
+the flow of an @command{awk} program outside its control. While a real
+@command{awk} function can do what it likes, that is at the discretion
+of the programmer. An extension function should provide a service or
+make a C API available for use within @command{awk}, and not mess with
+@code{FS} or @code{ARGC} and @code{ARGV}.
+
+In addition, it becomes easy to start down a slippery slope. How
+much access to @command{gawk} facilities do extensions need?
+Do they need @code{getline}? What about calling @code{gsub()} or
+compiling regular expressions? What about calling into @command{awk}
+functions? (@emph{That} would be messy.)
+
+In order to avoid these issues, the @command{gawk} developers chose
+to start with the simplest, most basic features that are still truly useful.
+
+Another decision is that although @command{gawk} provides nice things like
+MPFR, and arrays indexed internally by integers, these features are not
+being brought out to the API in order to keep things simple and close to
+traditional @command{awk} semantics. (In fact, arrays indexed internally
+by integers are so transparent that they aren't even documented!)
+
+With time, the API will undoubtedly evolve; the @command{gawk} developers
+expect this to be driven by user needs. For now, the current API seems
+to provide a minimal yet powerful set of features for creating extensions.
+
+@node Extension Mechanism Outline
+@subsection At A High Level How It Works
+
+The requirement to avoid access to @command{gawk}'s symbols is, at first
+glance, a difficult one to meet.
+
+One design, apparently used by Perl and Ruby and maybe others, would
+be to make the mainline @command{gawk} code into a library, with the
+@command{gawk} utility a small C @code{main()} function linked against
+the library.
+
+This seemed like the tail wagging the dog, complicating build and
+installation and making a simple copy of the @command{gawk} executable
+from one system to another (or one place to another on the same
+system!) into a chancy operation.
+
+Pat Rankin suggested the solution that was adopted. Communication between
+@command{gawk} and an extension is two-way. First, when an extension
+is loaded, it is passed a pointer to a @code{struct} whose fields are
+function pointers.
+@iftex
+This is shown in @ref{load-extension}.
+@end iftex
+
+@float Figure,load-extension
+@caption{Loading the extension}
+@ifinfo
+@center @image{api-figure1, , , Loading the extension, txt}
+@end ifinfo
+@ifhtml
+@center @image{api-figure1, , , Loading the extension, png}
+@end ifhtml
+@ifnotinfo
+@ifnothtml
+@center @image{api-figure1, , , Loading the extension}
+@end ifnothtml
+@end ifnotinfo
+@end float
+
+The extension can call functions inside @command{gawk} through these
+function pointers, at runtime, without needing (link-time) access
+to @command{gawk}'s symbols. One of these function pointers is to a
+function for ``registering'' new built-in functions.
+@iftex
+This is shown in @ref{load-new-function}.
+@end iftex
+
+@float Figure,load-new-function
+@caption{Loading the new function}
+@ifinfo
+@center @image{api-figure2, , , Loading the new function, txt}
+@end ifinfo
+@ifhtml
+@center @image{api-figure2, , , Loading the new function, png}
+@end ifhtml
+@ifnotinfo
+@ifnothtml
+@center @image{api-figure2, , , Loading the new function}
+@end ifnothtml
+@end ifnotinfo
+@end float
+
+In the other direction, the extension registers its new functions
+with @command{gawk} by passing function pointers to the functions that
+provide the new feature (@code{do_chdir()}, for example). @command{gawk}
+associates the function pointer with a name and can then call it, using a
+defined calling convention.
+@iftex
+This is shown in @ref{call-new-function}.
+@end iftex
+
+@float Figure,call-new-function
+@caption{Calling the new function}
+@ifinfo
+@center @image{api-figure3, , , Calling the new function, txt}
+@end ifinfo
+@ifhtml
+@center @image{api-figure3, , , Calling the new function, png}
+@end ifhtml
+@ifnotinfo
+@ifnothtml
+@center @image{api-figure3, , , Calling the new function}
+@end ifnothtml
+@end ifnotinfo
+@end float
+
+The @code{do_@var{xxx}()} function, in turn, then uses the function
+pointers in the API @code{struct} to do its work, such as updating
+variables or arrays, printing messages, setting @code{ERRNO}, and so on.
+
+Convenience macros in the @file{gawkapi.h} header file make calling
+through the function pointers look like regular function calls so that
+extension code is quite readable and understandable.
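+
+The two-way exchange of function pointers can be sketched in plain C.
+This is only an illustration, not the real API: the names
+@code{sketch_api_t}, @code{host_add_func()}, @code{host_call()},
+@code{ext_load()}, and @code{do_twice()} are hypothetical, and the
+``extension'' is compiled into the same program instead of being loaded
+dynamically.
+
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a function supplied by the extension. */
typedef double (*sketch_func_t)(double arg);

/* Hypothetical stand-in for the API struct: a table of function
 * pointers that the host hands to the extension at load time. */
typedef struct sketch_api {
    int (*add_func)(const char *name, sketch_func_t func);
} sketch_api_t;

/* ---- host side (plays the role of gawk) ---- */
#define SKETCH_MAX_FUNCS 8

static struct {
    const char *name;
    sketch_func_t func;
} sketch_table[SKETCH_MAX_FUNCS];
static size_t sketch_count;

/* Associate a name with a function pointer; returns 1 on success. */
static int host_add_func(const char *name, sketch_func_t func)
{
    if (sketch_count >= SKETCH_MAX_FUNCS)
        return 0;
    sketch_table[sketch_count].name = name;
    sketch_table[sketch_count].func = func;
    sketch_count++;
    return 1;
}

static const sketch_api_t host_api = { host_add_func };

/* Look up a registered function by name and call it. */
static int host_call(const char *name, double arg, double *result)
{
    size_t i;

    for (i = 0; i < sketch_count; i++) {
        if (strcmp(sketch_table[i].name, name) == 0) {
            *result = sketch_table[i].func(arg);
            return 1;
        }
    }
    return 0;
}

/* ---- extension side ---- */
static const sketch_api_t *ext_api;    /* saved at load time */

/* A convenience macro hides the call through the pointer. */
#define sketch_add_func(name, func) (ext_api->add_func((name), (func)))

static double do_twice(double x)
{
    return 2.0 * x;
}

/* Called by the host when the extension is loaded. */
static int ext_load(const sketch_api_t *api)
{
    ext_api = api;
    return sketch_add_func("twice", do_twice);
}
```
+
+The extension never refers to a host symbol by name; everything it
+needs arrives through the pointer passed to @code{ext_load()}.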
+
+Although all of this sounds moderately complicated, the result is that
+extension code is quite clean and straightforward. This can be seen in
+the sample extension @file{filefuncs.c} (@pxref{Extension Example})
+and also in the @file{testext.c} code for testing the APIs.
+
+Some other bits and pieces:
+
+@itemize @bullet
+@item
+The API provides access to @command{gawk}'s @code{do_@var{xxx}} values,
+reflecting command line options, like @code{do_lint}, @code{do_profiling}
+and so on (@pxref{Extension API Variables}).
+These are informational: an extension cannot affect these
+inside @command{gawk}. In addition, attempting to assign to them
+produces a compile-time error.
+
+@item
+The API also provides major and minor version numbers, so that an
+extension can check if the @command{gawk} it is loaded with supports the
+facilities it was compiled with. (Version mismatches ``shouldn't''
+happen, but we all know how @emph{that} goes.)
+@xref{Extension Versioning}, for details.
+@end itemize
+
+@node Extension Future Growth
+@subsection Room For Future Growth
+
+The API provides room for future growth, in two ways.
+
+An ``extension id'' is passed into the extension when it is loaded. This
+extension id is then passed back to @command{gawk} with each function
+call. This allows @command{gawk} to identify the extension calling into it,
+should it need to know.
+
+A ``name space'' is passed into @command{gawk} when an extension function
+is registered. This provides for a future mechanism for grouping
+extension functions and possibly avoiding name conflicts.
+
+Of course, as of this writing, no decisions have been made with respect
+to any of the above.
+
+@node Extension API Description
+@section API Description
+
+This (rather large) @value{SECTION} describes the API in detail.
+
+@menu
+* Extension API Functions Introduction:: Introduction to the API functions.
+* General Data Types:: The data types.
+* Requesting Values:: How to get a value.
+* Constructor Functions:: Functions for creating values.
+* Registration Functions:: Functions to register things with
+ @command{gawk}.
+* Printing Messages:: Functions for printing messages.
+* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}.
+* Accessing Parameters:: Functions for accessing parameters.
+* Symbol Table Access:: Functions for accessing global
+ variables.
+* Array Manipulation:: Functions for working with arrays.
+* Extension API Variables:: Variables provided by the API.
+* Extension API Boilerplate:: Boilerplate code for using the API.
+* Finding Extensions:: How @command{gawk} finds compiled
+ extensions.
+@end menu
+
+@node Extension API Functions Introduction
+@subsection Introduction
+
+Access to facilities within @command{gawk} is made available
+by calling through function pointers passed into your extension.
+
+API function pointers are provided for the following kinds of operations:
+
+@itemize @bullet
+@item
+Registration functions. You may register:
+@itemize @minus
+@item
+extension functions,
+@item
+exit callbacks,
+@item
+a version string,
+@item
+input parsers,
+@item
+output wrappers,
+@item
+and two-way processors.
+@end itemize
+All of these are discussed in detail, later in this @value{CHAPTER}.
+
+@item
+Printing fatal, warning, and ``lint'' warning messages.
+
+@item
+Updating @code{ERRNO}, or unsetting it.
+
+@item
+Accessing parameters, including converting an undefined parameter into
+an array.
+
+@item
+Symbol table access: retrieving a global variable, creating one,
+or changing one. This also includes the ability to create a scalar
+variable that will be @emph{constant} within @command{awk} code.
+
+@item
+Creating and releasing cached values; this provides an
+efficient way to use values for multiple variables and
+can be a big performance win.
+
+@item
+Manipulating arrays:
+@itemize @minus
+@item
+Retrieving, adding, deleting, and modifying elements
+@item
+Getting the count of elements in an array
+@item
+Creating a new array
+@item
+Clearing an array
+@item
+Flattening an array for easy C style looping over all its indices and elements
+@end itemize
+@end itemize
+
+Some points about using the API:
+
+@itemize @bullet
+@item
+You must include @code{<sys/types.h>} and @code{<sys/stat.h>} before including
+the @file{gawkapi.h} header file. In addition, you must include either
+@code{<stddef.h>} or @code{<stdlib.h>} to get the definition of @code{size_t}.
+If you wish to use the boilerplate @code{dl_load_func()} macro, you will
+need to include @code{<stdio.h>} as well.
+Finally, to pass reasonable integer values for @code{ERRNO}, you
+will need to include @code{<errno.h>}.
+
+@item
+The API uses only ISO C 90 features, with one exception: the
+``constructor'' functions use the @code{inline} keyword. If your compiler
+does not support this keyword, you should either place
+@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a
+@file{config.h} file in your extensions.
+
+@item
+All pointers filled in by @command{gawk} are to memory
+managed by @command{gawk} and should be treated by the extension as
+read-only. Memory for @emph{all} strings passed into @command{gawk}
+from the extension @emph{must} come from @code{malloc()} and is managed
+by @command{gawk} from then on.
+
+@item
+The API defines several simple structs that map values as seen
+from @command{awk}. A value can be a @code{double}, a string, or an
+array (as in multidimensional arrays, or when creating a new array).
+Strings maintain both pointer and length since embedded @code{NUL}
+characters are allowed.
+
+By intent, strings are maintained using the current multibyte encoding (as
+defined by @env{LC_@var{xxx}} environment variables) and not using wide
+characters. This matches how @command{gawk} stores strings internally
+and also how characters are likely to be input and output from files.
+
+@item
+When retrieving a value (such as a parameter or that of a global variable
+or array element), the extension requests a specific type (number, string,
+scalar, value cookie, array, or ``undefined''). When the request is
+``undefined,'' the returned value will have the real underlying type.
+
+However, if the request and actual type don't match, the access function
+returns ``false'' and fills in the type of the actual value that is there,
+so that the extension can, e.g., print an error message
+(``scalar passed where array expected'').
+
+@c This is documented in the header file and needs some expanding upon.
+@c The table there should be presented here
+@end itemize
+
+While you may call the API functions by using the function pointers
+directly, the interface is not so pretty. To make extension code look
+more like regular code, the @file{gawkapi.h} header file defines a number
+of macros which you should use in your code. This @value{SECTION} presents
+the macros as if they were functions.
+
+@node General Data Types
+@subsection General Purpose Data Types
+
+@quotation
+@i{I have a true love/hate relationship with unions.}@*
+Arnold Robbins
+
+@i{That's the thing about unions: the compiler will arrange things so they
+can accommodate both love and hate.}@*
+Chet Ramey
+@end quotation
+
+The extension API defines a number of simple types and structures for general
+purpose use. Additional, more specialized data structures are introduced
+in subsequent @value{SECTION}s, together with the functions that use them.
+
+@table @code
+@item typedef void *awk_ext_id_t;
+A value of this type is received from @command{gawk} when an extension is loaded.
+That value must then be passed back to @command{gawk} as the first parameter of
+each API function.
+
+@item #define awk_const @dots{}
+This macro expands to @samp{const} when compiling an extension,
+and to nothing when compiling @command{gawk} itself. This makes
+certain fields in the API data structures unwritable from extension code,
+while allowing @command{gawk} to use them as it needs to.
+
+@item typedef int awk_bool_t;
+A simple boolean type. At the moment, the API does not define special
+``true'' and ``false'' values, although perhaps it should.
+
+@item typedef struct @{
+@itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */
+@itemx @ @ @ @ size_t len;@ @ @ @ @ /* length thereof, in chars */
+@itemx @} awk_string_t;
+This represents a mutable string. @command{gawk}
+owns the memory pointed to if it supplied
+the value. Otherwise, it takes ownership of the memory pointed to.
+@strong{Such memory must come from @code{malloc()}!}
+
+As mentioned earlier, strings are maintained using the current
+multibyte encoding.
+
+@item typedef enum @{
+@itemx @ @ @ @ AWK_UNDEFINED,
+@itemx @ @ @ @ AWK_NUMBER,
+@itemx @ @ @ @ AWK_STRING,
+@itemx @ @ @ @ AWK_ARRAY,
+@itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */
+@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */
+@itemx @} awk_valtype_t;
+This @code{enum} indicates the type of a value.
+It is used in the following @code{struct}.
+
+@item typedef struct @{
+@itemx @ @ @ @ awk_valtype_t val_type;
+@itemx @ @ @ @ union @{
+@itemx @ @ @ @ @ @ @ @ awk_string_t@ @ @ @ @ @ @ s;
+@itemx @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d;
+@itemx @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a;
+@itemx @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl;
+@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t@ vc;
+@itemx @ @ @ @ @} u;
+@itemx @} awk_value_t;
+An ``@command{awk} value.''
+The @code{val_type} member indicates what kind of value the
+@code{union} holds, and each member is of the appropriate type.
+
+@item #define str_value@ @ @ @ @ @ u.s
+@itemx #define num_value@ @ @ @ @ @ u.d
+@itemx #define array_cookie@ @ @ u.a
+@itemx #define scalar_cookie@ @ u.scl
+@itemx #define value_cookie@ @ @ u.vc
+These macros make accessing the fields of the @code{awk_value_t} more
+readable.
+
+@item typedef void *awk_scalar_t;
+Scalars can be represented as an opaque type. These values are obtained from
+@command{gawk} and then passed back into it. This is discussed in a general fashion below,
+and in more detail in @ref{Symbol table by cookie}.
+
+@item typedef void *awk_value_cookie_t;
+A ``value cookie'' is an opaque type representing a cached value.
+This is also discussed in a general fashion below,
+and in more detail in @ref{Cached values}.
+
+@end table
+
+Scalar values in @command{awk} are either numbers or strings. The
+@code{awk_value_t} struct represents values. The @code{val_type} member
+indicates what is in the @code{union}.
+
+Representing numbers is easy---the API uses a C @code{double}. Strings
+require more work. Since @command{gawk} allows embedded @code{NUL} bytes
+in string values, a string must be represented as a pair containing a
+data-pointer and length. This is the @code{awk_string_t} type.
+
+Identifiers (i.e., the names of global variables) can be associated
+with either scalar values or with arrays. In addition, @command{gawk}
+provides true arrays of arrays, where any given array element can
+itself be an array. Discussion of arrays is delayed until
+@ref{Array Manipulation}.
+
+The various macros listed earlier make it easier to use the elements
+of the @code{union} as if they were fields in a @code{struct}; this
+is a common coding practice in C. Such code is easier to write and to
+read; however, it remains @emph{your} responsibility to make sure that
+the @code{val_type} member correctly reflects the type of the value in
+the @code{awk_value_t}.
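+
+As a minimal sketch of that responsibility, the following code checks
+@code{val_type} before touching the @code{union}. The type declarations
+are local copies of the ones shown above (with the opaque types written
+as @samp{void *}), so that the sketch compiles without @file{gawkapi.h};
+the helpers @code{sketch_as_number()} and @code{sketch_string_length()}
+are hypothetical and not part of the API.
+
```c
#include <stddef.h>

/* Local copies of the declarations shown above (assumption: the
 * opaque array, scalar, and cookie types are plain void pointers). */
typedef struct {
    char *str;      /* data */
    size_t len;     /* length thereof, in chars */
} awk_string_t;

typedef enum {
    AWK_UNDEFINED,
    AWK_NUMBER,
    AWK_STRING,
    AWK_ARRAY,
    AWK_SCALAR,
    AWK_VALUE_COOKIE
} awk_valtype_t;

typedef struct {
    awk_valtype_t val_type;
    union {
        awk_string_t s;
        double d;
        void *a;
        void *scl;
        void *vc;
    } u;
} awk_value_t;

#define str_value u.s
#define num_value u.d

/* Hypothetical helper: fetch the number only if the value is one. */
static int sketch_as_number(const awk_value_t *v, double *out)
{
    if (v->val_type != AWK_NUMBER)
        return 0;
    *out = v->num_value;
    return 1;
}

/* Hypothetical helper: report the length of a string value.  The
 * length comes from the len member, never from strlen(), since the
 * data may contain embedded NUL characters. */
static int sketch_string_length(const awk_value_t *v, size_t *out)
{
    if (v->val_type != AWK_STRING)
        return 0;
    *out = v->str_value.len;
    return 1;
}
```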
+
+Conceptually, the first three members of the @code{union} (number, string,
+and array) are all that is needed for working with @command{awk} values.
+However, since the API provides routines for accessing and changing
+the value of global scalar variables only by using the variable's name,
+there is a performance penalty: @command{gawk} must find the variable
+each time it is accessed and changed. This turns out to be a real issue,
+not just a theoretical one.
+
+Thus, if you know that your extension will spend considerable time
+reading and/or changing the value of one or more scalar variables, you
+can obtain a @dfn{scalar cookie}@footnote{See
+@uref{http://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a
+definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html,
+the ``magic cookie'' entry in the Jargon file} for a nice example. See
+also the entry for ``Cookie'' in the @ref{Glossary}.}
+object for that variable, and then use
+the cookie for getting the variable's value or for changing the variable's
+value.
+This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro.
+Given a scalar cookie, @command{gawk} can directly retrieve or
+modify the value, as required, without having to first find it.
+
+The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar.
+If you know that you wish to
+use the same numeric or string @emph{value} for one or more variables,
+you can create the value once, retaining a @dfn{value cookie} for it,
+and then pass in that value cookie whenever you wish to set the value of a
+variable. This saves both storage space within the running @command{gawk}
+process as well as the time needed to create the value.
+
+@node Requesting Values
+@subsection Requesting Values
+
+All of the functions that return values from @command{gawk}
+work in the same way. You pass in an @code{awk_valtype_t} value
+to indicate what kind of value you expect. If the actual value
+matches what you requested, the function returns true and fills
+in the @code{awk_value_t} result.
+Otherwise, the function returns false, and the @code{val_type}
+member indicates the type of the actual value. You may then
+print an error message, or reissue the request for the actual
+value type, as appropriate. This behavior is summarized in
+@ref{table-value-types-returned}.
+
+@ifnotplaintext
+@float Table,table-value-types-returned
+@caption{Value Types Returned}
+@multitable @columnfractions .50 .50
+@headitem @tab Type of Actual Value:
+@end multitable
+@multitable @columnfractions .166 .166 .198 .15 .15 .166
+@headitem @tab @tab String @tab Number @tab Array @tab Undefined
+@item @tab @b{String} @tab String @tab String @tab false @tab false
+@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false
+@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false
+@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false
+@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined
+@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false
+@end multitable
+@end float
+@end ifnotplaintext
+@ifplaintext
+@float Table,table-value-types-returned
+@caption{Value Types Returned}
+@example
+ +-------------------------------------------------+
+ | Type of Actual Value: |
+ +------------+------------+-----------+-----------+
+ | String | Number | Array | Undefined |
++-----------+-----------+------------+------------+-----------+-----------+
+| | String | String | String | false | false |
+| |-----------+------------+------------+-----------+-----------+
+| | Number | Number if | Number | false | false |
+| | | can be | | | |
+| | | converted, | | | |
+| | | else false | | | |
+| |-----------+------------+------------+-----------+-----------+
+| Type | Array | false | false | Array | false |
+| Requested |-----------+------------+------------+-----------+-----------+
+| | Scalar | Scalar | Scalar | false | false |
+| |-----------+------------+------------+-----------+-----------+
+| | Undefined | String | Number | Array | Undefined |
+| |-----------+------------+------------+-----------+-----------+
+| | Value | false | false | false | false |
+| | Cookie | | | | |
++-----------+-----------+------------+------------+-----------+-----------+
+@end example
+@end float
+@end ifplaintext
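+
+The table can also be read as a small decision function. The sketch
+below encodes it directly; @code{sketch_request()} and its
+@code{string_is_numeric} parameter are hypothetical and exist only to
+illustrate the rules. The function returns true and stores the type
+delivered, or returns false and stores the actual type.
+
```c
/* Local copy of the enum shown earlier. */
typedef enum {
    AWK_UNDEFINED,
    AWK_NUMBER,
    AWK_STRING,
    AWK_ARRAY,
    AWK_SCALAR,
    AWK_VALUE_COOKIE
} awk_valtype_t;

/*
 * wanted:            the type the extension requested
 * actual:            the real type (string, number, array, undefined)
 * string_is_numeric: whether a string value can be converted to a number
 * delivered:         on success, the type handed back; on failure,
 *                    the actual type
 */
static int sketch_request(awk_valtype_t wanted, awk_valtype_t actual,
                          int string_is_numeric, awk_valtype_t *delivered)
{
    int ok;

    switch (wanted) {
    case AWK_STRING:
    case AWK_SCALAR:
        /* Any scalar (string or number) satisfies these requests. */
        ok = (actual == AWK_STRING || actual == AWK_NUMBER);
        break;
    case AWK_NUMBER:
        ok = (actual == AWK_NUMBER
              || (actual == AWK_STRING && string_is_numeric));
        break;
    case AWK_ARRAY:
        ok = (actual == AWK_ARRAY);
        break;
    case AWK_UNDEFINED:
        /* "Undefined" means: give me whatever is really there. */
        *delivered = actual;
        return 1;
    case AWK_VALUE_COOKIE:
    default:
        ok = 0;
        break;
    }

    *delivered = ok ? wanted : actual;
    return ok;
}
```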
+
+@node Constructor Functions
+@subsection Constructor Functions and Convenience Macros
+
+The API provides a number of @dfn{constructor} functions for creating
+string and numeric values, as well as a number of convenience macros.
+This @value{SUBSECTION} presents them all as function prototypes, in
+the way that extension code would use them.
+
+@table @code
+@item static inline awk_value_t *
+@itemx make_const_string(const char *string, size_t length, awk_value_t *result)
+This function creates a string value in the @code{awk_value_t} variable
+pointed to by @code{result}. It expects @code{string} to be a C string constant
+(or other string data), and automatically creates a @emph{copy} of the data
+for storage in @code{result}. It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result)
+This function creates a string value in the @code{awk_value_t} variable
+pointed to by @code{result}. It expects @code{string} to be a @samp{char *}
+value pointing to data previously obtained from @code{malloc()}. The idea here
+is that the data is passed directly to @command{gawk}, which assumes
+responsibility for it. It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_null_string(awk_value_t *result)
+This specialized function creates a null string (the ``undefined'' value)
+in the @code{awk_value_t} variable pointed to by @code{result}.
+It returns @code{result}.
+
+@item static inline awk_value_t *
+@itemx make_number(double num, awk_value_t *result)
+This function simply creates a numeric value in the @code{awk_value_t} variable
+pointed to by @code{result}. It returns @code{result}.
+@end table
+
+Two convenience macros may be used for allocating storage from @code{malloc()}
+and @code{realloc()}. If the allocation fails, they cause @command{gawk} to
+exit with a fatal error message. They should be used as if they were
+procedure calls that do not return a value.
+
+@table @code
+@item emalloc(pointer, type, size, message)
+The arguments to this macro are as follows:
+@c nested table
+@table @code
+@item pointer
+The pointer variable to point at the allocated storage.
+
+@item type
+The type of the pointer variable, used to create a cast for the call to @code{malloc()}.
+
+@item size
+The total number of bytes to be allocated.
+
+@item message
+A message to be prefixed to the fatal error message. Typically this is the name
+of the function using the macro.
+@end table
+
+@noindent
+For example, you might allocate a string value like so:
+
+@example
+awk_value_t result;
+char *message;
+const char greet[] = "Don't Panic!";
+
+emalloc(message, char *, sizeof(greet), "myfunc");
+strcpy(message, greet);
+make_malloced_string(message, strlen(message), & result);
+@end example
+
+@item erealloc(pointer, type, size, message)
+This is like @code{emalloc()}, but it calls @code{realloc()},
+instead of @code{malloc()}.
+The arguments are the same as for the @code{emalloc()} macro.
+@end table
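+
+To show the shape of such a macro, here is a stand-alone approximation.
+It is an assumption, not the definition from @file{gawkapi.h}: the
+hypothetical @code{sketch_emalloc()} reports failure with
+@code{fprintf()} and @code{exit()}, whereas the real macros cause
+@command{gawk} itself to exit with a fatal error message.
+
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for emalloc(): allocate or die.  The
 * do/while wrapper lets the macro be used like a procedure call. */
#define sketch_emalloc(pointer, type, size, message) \
    do { \
        if (((pointer) = (type) malloc(size)) == NULL) { \
            fprintf(stderr, "%s: malloc of %lu bytes failed\n", \
                    (message), (unsigned long) (size)); \
            exit(EXIT_FAILURE); \
        } \
    } while (0)

/* Hypothetical helper: make a malloc()ed, NUL-terminated copy of
 * len bytes of text, suitable for handing over to a caller that
 * takes ownership of the memory. */
static char *sketch_copy(const char *text, size_t len)
{
    char *copy;

    sketch_emalloc(copy, char *, len + 1, "sketch_copy");
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```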
+
+@node Registration Functions
+@subsection Registration Functions
+
+This @value{SECTION} describes the API functions for
+registering parts of your extension with @command{gawk}.
+
+@menu
+* Extension Functions:: Registering extension functions.
+* Exit Callback Functions:: Registering an exit callback.
+* Extension Version String:: Registering a version string.
+* Input Parsers:: Registering an input parser.
+* Output Wrappers:: Registering an output wrapper.
+* Two-way processors:: Registering a two-way processor.
+@end menu
+
+@node Extension Functions
+@subsubsection Registering An Extension Function
+
+Extension functions are described by the following record:
+
+@example
+typedef struct @{
+@ @ @ @ const char *name;
+@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
+@ @ @ @ size_t num_expected_args;
+@} awk_ext_func_t;
+@end example
+
+The fields are:
+
+@table @code
+@item const char *name;
+The name of the new function.
+@command{awk} level code calls the function by this name.
+This is a regular C string.
+
+@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
+This is a pointer to the C function that provides the desired
+functionality.
+The function must fill in the result with either a number
+or a string. @command{gawk} takes ownership of any string memory.
+As mentioned earlier, string memory @strong{must} come from @code{malloc()}.
+
+The function must return the value of @code{result}.
+This is for the convenience of the calling code inside @command{gawk}.
+
+@item size_t num_expected_args;
+This is the number of arguments the function expects to receive.
+Each extension function may decide what to do if the number of
+arguments isn't what it expected. Following @command{awk} functions, it
+is likely OK to ignore extra arguments.
+@end table
+
+Once you have a record representing your extension function, you register
+it with @command{gawk} using this API function:
+
+@table @code
+@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func);
+This function returns true upon success, false otherwise.
+The @code{namespace} parameter is currently not used; you should pass in an
+empty string (@code{""}). The @code{func} pointer is the address of a
+@code{struct} representing your function, as just described.
+@end table
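+
+Putting the pieces together, a function and its record might look like
+the following sketch. The types are reduced local copies of the
+declarations shown earlier so that the code compiles on its own;
+@code{sketch_make_number()} stands in for @code{make_number()}, and
+@code{do_nargs()} is a hypothetical extension function. In a real
+extension the types come from @file{gawkapi.h}, and the address of the
+record is what gets passed to @code{add_ext_func()}.
+
```c
#include <stddef.h>

/* Reduced local copies of the API declarations (assumption: only
 * the members needed for this sketch are included). */
typedef enum {
    AWK_UNDEFINED,
    AWK_NUMBER,
    AWK_STRING
} awk_valtype_t;

typedef struct {
    char *str;
    size_t len;
} awk_string_t;

typedef struct {
    awk_valtype_t val_type;
    union {
        awk_string_t s;
        double d;
    } u;
} awk_value_t;

typedef struct {
    const char *name;
    awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
    size_t num_expected_args;
} awk_ext_func_t;

/* Stand-in for make_number(): fill in a numeric result. */
static awk_value_t *sketch_make_number(double num, awk_value_t *result)
{
    result->val_type = AWK_NUMBER;
    result->u.d = num;
    return result;
}

/* Hypothetical extension function: returns the number of arguments
 * it was called with.  Note that it returns result itself. */
static awk_value_t *do_nargs(int num_actual_args, awk_value_t *result)
{
    return sketch_make_number((double) num_actual_args, result);
}

/* The record describing the function: awk-level name, C function,
 * and expected argument count. */
static const awk_ext_func_t sketch_func = { "nargs", do_nargs, 0 };
```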
+
+@node Exit Callback Functions
+@subsubsection Registering An Exit Callback Function
+
+An @dfn{exit callback} function is a function that
+@command{gawk} calls before it exits.
+Such functions are useful if you have general ``clean up'' tasks
+that should be performed in your extension (such as closing database
+connections or other resource deallocations).
+You can register such
+a function with @command{gawk} using the following function.
+
+@table @code
+@item void awk_atexit(void (*funcp)(void *data, int exit_status),
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0);
+The parameters are:
+@c nested table
+@table @code
+@item funcp
+A pointer to the function to be called before @command{gawk} exits. The @code{data}
+parameter will be the original value of @code{arg0}.
+The @code{exit_status} parameter is
+the exit status value that @command{gawk} will pass to the @code{exit()} system call.
+
+@item arg0
+A pointer to private data which @command{gawk} saves in order to pass to
+the function pointed to by @code{funcp}.
+@end table
+@end table
+
+Exit callback functions are called in Last-In-First-Out (LIFO) order; that is, in
+the reverse of the order in which they are registered with @command{gawk}.
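+
+The ordering can be illustrated with a simple linked stack. This is
+only a sketch of one way to obtain LIFO behavior, not a description of
+how @command{gawk} implements it; @code{sketch_atexit()},
+@code{sketch_run_exit_callbacks()}, and @code{sketch_note()} are
+hypothetical names.
+
```c
#include <stdlib.h>

typedef void (*sketch_exit_fn)(void *data, int exit_status);

/* One registered callback; new entries are pushed on the front. */
struct sketch_exit_node {
    sketch_exit_fn funcp;
    void *arg0;
    struct sketch_exit_node *next;
};

static struct sketch_exit_node *sketch_exit_list;

/* Register a callback; returns 1 on success, 0 if out of memory. */
static int sketch_atexit(sketch_exit_fn funcp, void *arg0)
{
    struct sketch_exit_node *n;

    n = (struct sketch_exit_node *) malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->funcp = funcp;
    n->arg0 = arg0;
    n->next = sketch_exit_list;     /* push: newest entry comes first */
    sketch_exit_list = n;
    return 1;
}

/* Run and discard all callbacks, most recently registered first. */
static void sketch_run_exit_callbacks(int exit_status)
{
    while (sketch_exit_list != NULL) {
        struct sketch_exit_node *n = sketch_exit_list;

        sketch_exit_list = n->next;
        n->funcp(n->arg0, exit_status);
        free(n);
    }
}

/* Example callback: append the character that data points at to a
 * small log, so that the calling order can be inspected. */
static char sketch_log[8];
static size_t sketch_log_len;

static void sketch_note(void *data, int exit_status)
{
    (void) exit_status;
    if (sketch_log_len + 1 < sizeof sketch_log)
        sketch_log[sketch_log_len++] = *(const char *) data;
}
```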
+
+@node Extension Version String
+@subsubsection Registering An Extension Version String
+
+You can register a version string which indicates the name and
+version of your extension, with @command{gawk}, as follows:
+
+@table @code
+@item void register_ext_version(const char *version);
+Register the string pointed to by @code{version} with @command{gawk}.
+@command{gawk} does @emph{not} copy the @code{version} string, so
+it should not be changed.
+@end table
+
+@command{gawk} prints all registered extension version strings when it
+is invoked with the @option{--version} option.
+
+@node Input Parsers
+@subsubsection Customized Input Parsers
+
+By default, @command{gawk} reads text files as its input. It uses the value
+of @code{RS} to find the end of the record, and then uses @code{FS}
+(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}).
+Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}).
+
+If you want, you can provide your own, custom, input parser. An input
+parser's job is to return a record to the @command{gawk} record processing
+code, along with indicators for the value and length of the data to be
+used for @code{RT}, if any.
+
+To provide an input parser, you must first provide two functions
+(where @var{XXX} is a prefix name for your extension):
+
+@table @code
+@item awk_bool_t @var{XXX}_can_take_file(const awk_input_buf_t *iobuf)
+This function examines the information available in @code{iobuf}
+(which we discuss shortly). Based on the information there, it
+decides if the input parser should be used for this file.
+If so, it should return true. Otherwise, it should return false.
+It should not change any state (variable values, etc.) within @command{gawk}.
+
+@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf)
+When @command{gawk} decides to hand control of the file over to the
+input parser, it calls this function. This function in turn must fill
+in certain fields in the @code{awk_input_buf_t} structure, and ensure
+that certain conditions are true. It should then return true. If an
+error of some kind occurs, it should not fill in any fields, and should
+return false; then @command{gawk} will not use the input parser.
+The details are presented shortly.
+@end table
+
+Your extension should package these functions inside an
+@code{awk_input_parser_t}, which looks like this:
+
+@example
+typedef struct input_parser @{
+ const char *name; /* name of parser */
+ awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
+ awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
+ awk_const struct input_parser *awk_const next; /* for use by gawk */
+@} awk_input_parser_t;
+@end example
+
+The fields are:
+
+@table @code
+@item const char *name;
+The name of the input parser. This is a regular C string.
+
+@item awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
+A pointer to your @code{@var{XXX}_can_take_file()} function.
+
+@item awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
+A pointer to your @code{@var{XXX}_take_control_of()} function.
+
+@item awk_const struct input_parser *awk_const next;
+This pointer is used by @command{gawk}.
+The extension cannot modify it.
+@end table
+
+The steps are as follows:
+
+@enumerate
+@item
+Create a @code{static awk_input_parser_t} variable and initialize it
+appropriately.
+
+@item
+When your extension is loaded, register your input parser with
+@command{gawk} using the @code{register_input_parser()} API function
+(described below).
+@end enumerate
+
+An @code{awk_input_buf_t} looks like this:
+
+@example
+typedef struct awk_input @{
+ const char *name; /* filename */
+ int fd; /* file descriptor */
+#define INVALID_HANDLE (-1)
+ void *opaque; /* private data for input parsers */
+ int (*get_record)(char **out, struct awk_input *iobuf,
+ int *errcode, char **rt_start, size_t *rt_len);
+ void (*close_func)(struct awk_input *iobuf);
+ struct stat sbuf; /* stat buf */
+@} awk_input_buf_t;
+@end example
+
+The fields can be divided into two categories: those for use (initially,
+at least) by @code{@var{XXX}_can_take_file()}, and those for use by
+@code{@var{XXX}_take_control_of()}. The first group of fields and their uses
+are as follows:
+
+@table @code
+@item const char *name;
+The name of the file.
+
+@item int fd;
+A file descriptor for the file. If @command{gawk} was able to
+open the file, then @code{fd} will @emph{not} be equal to
+@code{INVALID_HANDLE}. Otherwise, it will.
+
+@item struct stat sbuf;
+If the file descriptor is valid, then @command{gawk} will have filled
+in this structure via a call to the @code{fstat()} system call.
+@end table
+
+The @code{@var{XXX}_can_take_file()} function should examine these
+fields and decide if the input parser should be used for the file.
+The decision can be made based upon @command{gawk} state (the value
+of a variable defined previously by the extension and set by
+@command{awk} code), the name of the
+file, whether or not the file descriptor is valid, the information
+in the @code{struct stat}, or any combination of the above.
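+
+As a rough sketch of such a decision function, the following hypothetical
+@code{rec_can_take_file()} accepts only files that @command{gawk} managed
+to open and whose names end in @file{.rec}. The suffix, the function name,
+and the stand-in type definitions are assumptions made so that the fragment
+compiles on its own; a real extension gets the types from @file{gawkapi.h}.

```c
#include <string.h>     /* strlen, strcmp */
#include <sys/stat.h>   /* struct stat */

/* Stand-ins for the API types (assumed here; normally from gawkapi.h). */
typedef enum { awk_false = 0, awk_true } awk_bool_t;
#define INVALID_HANDLE (-1)

typedef struct awk_input {
	const char *name;       /* filename */
	int fd;                 /* file descriptor */
	void *opaque;           /* private data for input parsers */
	int (*get_record)(char **out, struct awk_input *iobuf,
			int *errcode, char **rt_start, size_t *rt_len);
	void (*close_func)(struct awk_input *iobuf);
	struct stat sbuf;       /* stat buf */
} awk_input_buf_t;

/* rec_can_take_file --- hypothetical: claim opened files named *.rec */

static awk_bool_t
rec_can_take_file(const awk_input_buf_t *iobuf)
{
	static const char suffix[] = ".rec";
	size_t nlen, slen = sizeof(suffix) - 1;

	if (iobuf == NULL || iobuf->name == NULL)
		return awk_false;
	if (iobuf->fd == INVALID_HANDLE)        /* gawk could not open it */
		return awk_false;

	nlen = strlen(iobuf->name);
	if (nlen <= slen)                       /* need a nonempty base name */
		return awk_false;

	return strcmp(iobuf->name + nlen - slen, suffix) == 0
			? awk_true : awk_false;
}
```

+The function only reads the structure and changes nothing, which keeps the
+decision separate from the work done later when the parser takes control.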
+
+Once @code{@var{XXX}_can_take_file()} has returned true, and
+@command{gawk} has decided to use your input parser, it calls
+@code{@var{XXX}_take_control_of()}. That function then fills in at
+least the @code{get_record} field of the @code{awk_input_buf_t}. It must
+also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of
+the fields that may be filled by @code{@var{XXX}_take_control_of()}
+are as follows:
+
+@table @code
+@item void *opaque;
+This is used to hold any state information needed by the input parser
+for this file. It is ``opaque'' to @command{gawk}. The input parser
+is not required to use this pointer.
+
+@item int@ (*get_record)(char@ **out,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len);
+This function pointer should point to a function that creates the input
+records. Said function is the core of the input parser. Its behavior
+is described below.
+
+@item void (*close_func)(struct awk_input *iobuf);
+This function pointer should point to a function that does
+the ``tear down.'' It should release any resources allocated by
+@code{@var{XXX}_take_control_of()}. It may also close the file. If it
+does so, it should set the @code{fd} field to @code{INVALID_HANDLE}.
+
+If @code{fd} is still not @code{INVALID_HANDLE} after the call to this
+function, @command{gawk} calls the regular @code{close()} system call.
+
+Having a ``tear down'' function is optional. If your input parser does
+not need it, do not set this field. Then, @command{gawk} calls the
+regular @code{close()} system call on the file descriptor, so it should
+be valid.
+@end table
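+
+A ``tear down'' function following the rules above might look like the
+sketch below. It assumes that the private state in @code{opaque} came from
+@code{malloc()}; the name @code{rec_close()} and the stand-in type
+definitions are illustrative only and are not part of the API.

```c
#include <stdlib.h>     /* free */
#include <unistd.h>     /* close */
#include <sys/stat.h>   /* struct stat */

/* Stand-ins for the API definitions (normally from gawkapi.h). */
#define INVALID_HANDLE (-1)

typedef struct awk_input {
	const char *name;
	int fd;
	void *opaque;
	int (*get_record)(char **out, struct awk_input *iobuf,
			int *errcode, char **rt_start, size_t *rt_len);
	void (*close_func)(struct awk_input *iobuf);
	struct stat sbuf;
} awk_input_buf_t;

/* rec_close --- hypothetical tear down: free state and close the file */

static void
rec_close(struct awk_input *iobuf)
{
	if (iobuf == NULL)
		return;

	free(iobuf->opaque);    /* assumed to have been malloc()ed earlier */
	iobuf->opaque = NULL;

	if (iobuf->fd != INVALID_HANDLE) {
		(void) close(iobuf->fd);
		/* tell gawk not to call close() a second time */
		iobuf->fd = INVALID_HANDLE;
	}
}
```

+Resetting @code{fd} after closing is the important step: it is how the
+function signals that the descriptor has already been dealt with.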
+
+The @code{@var{XXX}_get_record()} function does the work of creating
+input records. The parameters are as follows:
+
+@table @code
+@item char **out
+This is a pointer to a @code{char *} variable which is set to point
+to the record. @command{gawk} makes its own copy of the data, so
+the extension must manage this storage.
+
+@item struct awk_input *iobuf
+This is the @code{awk_input_buf_t} for the file. The fields should be
+used for reading data (@code{fd}) and for managing private state
+(@code{opaque}), if any.
+
+@item int *errcode
+If an error occurs, @code{*errcode} should be set to an appropriate
+code from @code{<errno.h>}.
+
+@item char **rt_start
+@itemx size_t *rt_len
+If the concept of a ``record terminator'' makes sense, then
+@code{*rt_start} should be set to point to the data to be used for
+@code{RT}, and @code{*rt_len} should be set to the length of the
+data. Otherwise, @code{*rt_len} should be set to zero.
+@command{gawk} makes its own copy of this data, so the
+extension must manage the storage.
+@end table
+
+The return value is the length of the buffer pointed to by
+@code{*out}, or @code{EOF} if end-of-file was reached or an
+error occurred.
+
+It is guaranteed that @code{errcode} is a valid pointer, so there is no
+need to test for a @code{NULL} value. @command{gawk} sets @code{*errcode}
+to zero, so there is no need to set it unless an error occurs.
+
+If an error does occur, the function should return @code{EOF} and set
+@code{*errcode} to a non-zero value. In that case, if @code{*errcode}
+does not equal @minus{}1, @command{gawk} automatically updates
+the @code{ERRNO} variable based on the value of @code{*errcode} (e.g.,
+setting @samp{*errcode = errno} should do the right thing).
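+
+To make the calling convention concrete, here is a minimal sketch of a
+@code{get_record} function. It is not taken from any shipped extension.
+It assumes that the hypothetical @code{line_state_t} structure stored in
+@code{opaque} already holds the whole file in memory, and it treats a
+newline as the record terminator. The type definitions are stand-ins for
+those in @file{gawkapi.h}.

```c
#include <errno.h>      /* EINVAL */
#include <stdio.h>      /* EOF */
#include <string.h>     /* memchr */
#include <sys/stat.h>   /* struct stat */

/* Stand-in for the API structure (normally from gawkapi.h). */
typedef struct awk_input {
	const char *name;
	int fd;
	void *opaque;
	int (*get_record)(char **out, struct awk_input *iobuf,
			int *errcode, char **rt_start, size_t *rt_len);
	void (*close_func)(struct awk_input *iobuf);
	struct stat sbuf;
} awk_input_buf_t;

/* Hypothetical private state: file contents already read into memory. */
typedef struct line_state {
	char *data;     /* buffer owned by the extension */
	size_t len;     /* number of valid bytes in data */
	size_t pos;     /* offset of the next unread byte */
} line_state_t;

/* line_get_record --- return the next newline-terminated record */

static int
line_get_record(char **out, struct awk_input *iobuf,
		int *errcode, char **rt_start, size_t *rt_len)
{
	line_state_t *st = (line_state_t *) iobuf->opaque;
	char *start, *nl;
	size_t avail;

	if (st == NULL) {               /* no state: report an error */
		*errcode = EINVAL;
		return EOF;
	}
	if (st->pos >= st->len)         /* plain end of file, errcode untouched */
		return EOF;

	start = st->data + st->pos;
	avail = st->len - st->pos;
	*out = start;                   /* gawk copies the data; we keep the buffer */

	nl = (char *) memchr(start, '\n', avail);
	if (nl == NULL) {               /* last record has no terminator */
		*rt_len = 0;
		st->pos = st->len;
		return (int) avail;
	}

	*rt_start = nl;                 /* RT becomes "\n" */
	*rt_len = 1;
	st->pos += (size_t) (nl - start) + 1;
	return (int) (nl - start);
}
```

+Because @command{gawk} copies both the record and the terminator, the
+function can hand back pointers into its own buffer without allocating
+anything per record.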
+
+@command{gawk} ships with a sample extension that reads directories,
+returning records for each entry in the directory (@pxref{Extension
+Sample Readdir}). You may wish to use that code as a guide for writing
+your own input parser.
+
+When writing an input parser, you should think about (and document)
+how it is expected to interact with @command{awk} code. You may want
+it to always be called, and take effect as appropriate (as the
+@code{readdir} extension does). Or you may want it to take effect
+based upon the value of an @code{awk} variable, as the XML extension
+from the @code{gawkextlib} project does (@pxref{gawkextlib}).
+In the latter case, code in a @code{BEGINFILE} section
+can look at @code{FILENAME} and @code{ERRNO} to decide whether or
+not to activate an input parser (@pxref{BEGINFILE/ENDFILE}).
+
+You register your input parser with the following function:
+
+@table @code
+@item void register_input_parser(awk_input_parser_t *input_parser);
+Register the input parser pointed to by @code{input_parser} with
+@command{gawk}.
+@end table
+
+@node Output Wrappers
+@subsubsection Customized Output Wrappers
+
+An @dfn{output wrapper} is the mirror image of an input parser.
+It allows an extension to take over the output to a file opened
+with the @samp{>} or @samp{>>} operators (@pxref{Redirection}).
+
+The output wrapper is very similar to the input parser structure:
+
+@example
+typedef struct output_wrapper @{
+ const char *name; /* name of the wrapper */
+ awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
+ awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
+ awk_const struct output_wrapper *awk_const next; /* for use by gawk */
+@} awk_output_wrapper_t;
+@end example
+
+The members are as follows:
+
+@table @code
+@item const char *name;
+This is the name of the output wrapper.
+
+@item awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
+This points to a function that examines the information in
+the @code{awk_output_buf_t} structure pointed to by @code{outbuf}.
+It should return true if the output wrapper wants to take over the
+file, and false otherwise. It should not change any state (variable
+values, etc.) within @command{gawk}.
+
+@item awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
+The function pointed to by this field is called when @command{gawk}
+decides to let the output wrapper take control of the file. It should
+fill in appropriate members of the @code{awk_output_buf_t} structure,
+as described below, and return true if successful, false otherwise.
+
+@item awk_const struct output_wrapper *awk_const next;
+This is for use by @command{gawk}.
+@end table
+
+The @code{awk_output_buf_t} structure looks like this:
+
+@example
+typedef struct @{
+ const char *name; /* name of output file */
+ const char *mode; /* mode argument to fopen */
+ FILE *fp; /* stdio file pointer */
+ awk_bool_t redirected; /* true if a wrapper is active */
+ void *opaque; /* for use by output wrapper */
+ size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
+ FILE *fp, void *opaque);
+ int (*gawk_fflush)(FILE *fp, void *opaque);
+ int (*gawk_ferror)(FILE *fp, void *opaque);
+ int (*gawk_fclose)(FILE *fp, void *opaque);
+@} awk_output_buf_t;
+@end example
+
+Here too, your extension will define @code{@var{XXX}_can_take_file()}
+and @code{@var{XXX}_take_control_of()} functions that examine and update
+data members in the @code{awk_output_buf_t}.
+The data members are as follows:
+
+@table @code
+@item const char *name;
+The name of the output file.
+
+@item const char *mode;
+The mode string (as would be used in the second argument to @code{fopen()})
+with which the file was opened.
+
+@item FILE *fp;
+The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file
+before attempting to find an output wrapper.
+
+@item awk_bool_t redirected;
+This field must be set to true by the @code{@var{XXX}_take_control_of()} function.
+
+@item void *opaque;
+This pointer is opaque to @command{gawk}. The extension should use it to store
+a pointer to any private data associated with the file.
+
+@item size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ FILE *fp, void *opaque);
+@itemx int (*gawk_fflush)(FILE *fp, void *opaque);
+@itemx int (*gawk_ferror)(FILE *fp, void *opaque);
+@itemx int (*gawk_fclose)(FILE *fp, void *opaque);
+These pointers should be set to point to functions that perform
+the same tasks as the corresponding @code{<stdio.h>} functions, if appropriate.
+@command{gawk} uses these function pointers for all output.
+@command{gawk} initializes the pointers to point to internal, ``pass through''
+functions that just call the regular @code{<stdio.h>} functions, so an
+extension only needs to redefine those functions that are appropriate for
+what it does.
+@end table
+
+The @code{@var{XXX}_can_take_file()} function should make a decision based
+upon the @code{name} and @code{mode} fields, and any additional state
+(such as @command{awk} variable values) that is appropriate.
+
+When @command{gawk} calls @code{@var{XXX}_take_control_of()}, it should fill
+in the other fields, as appropriate, except for @code{fp}, which it should just
+use normally.
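+
+The following sketch shows one way a @code{take_control_of} function
+might fill in those fields. The hypothetical wrapper simply counts the bytes
+written and otherwise passes everything through to @code{<stdio.h>}. The
+names @code{count_state_t}, @code{count_fwrite()}, @code{count_fclose()}, and
+@code{count_take_control_of()} are invented for illustration, and the type
+definitions are stand-ins for those in @file{gawkapi.h}.

```c
#include <stdio.h>      /* FILE, fwrite, fclose */
#include <stdlib.h>     /* calloc, free */

/* Stand-ins for the API types (normally from gawkapi.h). */
typedef enum { awk_false = 0, awk_true } awk_bool_t;

typedef struct {
	const char *name;       /* name of output file */
	const char *mode;       /* mode argument to fopen */
	FILE *fp;               /* stdio file pointer */
	awk_bool_t redirected;  /* true if a wrapper is active */
	void *opaque;           /* for use by output wrapper */
	size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
			FILE *fp, void *opaque);
	int (*gawk_fflush)(FILE *fp, void *opaque);
	int (*gawk_ferror)(FILE *fp, void *opaque);
	int (*gawk_fclose)(FILE *fp, void *opaque);
} awk_output_buf_t;

/* Hypothetical private state for this wrapper. */
typedef struct count_state {
	size_t bytes;           /* total bytes successfully written */
} count_state_t;

/* count_fwrite --- write through to stdio and tally the bytes */

static size_t
count_fwrite(const void *buf, size_t size, size_t count,
		FILE *fp, void *opaque)
{
	count_state_t *st = (count_state_t *) opaque;
	size_t n = fwrite(buf, size, count, fp);

	st->bytes += n * size;
	return n;
}

/* count_fclose --- release private state, then close the file */

static int
count_fclose(FILE *fp, void *opaque)
{
	free(opaque);
	return fclose(fp);
}

/* count_take_control_of --- install the wrapper functions */

static awk_bool_t
count_take_control_of(awk_output_buf_t *outbuf)
{
	count_state_t *st = (count_state_t *) calloc(1, sizeof(*st));

	if (st == NULL)
		return awk_false;

	outbuf->opaque = st;
	outbuf->gawk_fwrite = count_fwrite;
	outbuf->gawk_fclose = count_fclose;
	/* gawk_fflush and gawk_ferror keep gawk's pass-through defaults */
	outbuf->redirected = awk_true;
	return awk_true;
}
```

+Only the two functions that need the private state are replaced; the
+others keep the defaults that @command{gawk} installed.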
+
+You register your output wrapper with the following function:
+
+@table @code
+@item void register_output_wrapper(awk_output_wrapper_t *output_wrapper);
+Register the output wrapper pointed to by @code{output_wrapper} with
+@command{gawk}.
+@end table
+
+@node Two-way processors
+@subsubsection Customized Two-way Processors
+
+A @dfn{two-way processor} combines an input parser and an output wrapper for
+two-way I/O with the @samp{|&} operator (@pxref{Redirection}). It makes identical
+use of the @code{awk_input_buf_t} and @code{awk_output_buf_t} structures
+as described earlier.
+
+A two-way processor is represented by the following structure:
+
+@example
+typedef struct two_way_processor @{
+ const char *name; /* name of the two-way processor */
+ awk_bool_t (*can_take_two_way)(const char *name);
+ awk_bool_t (*take_control_of)(const char *name,
+ awk_input_buf_t *inbuf,
+ awk_output_buf_t *outbuf);
+ awk_const struct two_way_processor *awk_const next; /* for use by gawk */
+@} awk_two_way_processor_t;
+@end example
+
+The fields are as follows:
+
+@table @code
+@item const char *name;
+The name of the two-way processor.
+
+@item awk_bool_t (*can_take_two_way)(const char *name);
+This function returns true if it wants to take over two-way I/O for this filename.
+It should not change any state (variable
+values, etc.) within @command{gawk}.
+
+@item awk_bool_t (*take_control_of)(const char *name,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf);
+This function should fill in the @code{awk_input_buf_t} and
+@code{awk_output_buf_t} structures pointed to by @code{inbuf} and
+@code{outbuf}, respectively. These structures were described earlier.
+
+@item awk_const struct two_way_processor *awk_const next;
+This is for use by @command{gawk}.
+@end table
+
+As with the input parser and output wrapper, you provide
+``yes I can take this'' and ``take over for this'' functions,
+@code{@var{XXX}_can_take_two_way()} and @code{@var{XXX}_take_control_of()}.
+
+You register your two-way processor with the following function:
+
+@table @code
+@item void register_two_way_processor(awk_two_way_processor_t *two_way_processor);
+Register the two-way processor pointed to by @code{two_way_processor} with
+@command{gawk}.
+@end table
+
+@node Printing Messages
+@subsection Printing Messages
+
+You can print different kinds of messages from your
+extension, as described below. Note that for these functions,
+you must pass in the extension id received from @command{gawk}
+when the extension was loaded.@footnote{Because the API uses only ISO C 90
+features, it cannot make use of the ISO C 99 variadic macro feature to hide
+that parameter. More's the pity.}
+
+@table @code
+@item void fatal(awk_ext_id_t id, const char *format, ...);
+Print a message and then cause @command{gawk} to exit immediately.
+
+@item void warning(awk_ext_id_t id, const char *format, ...);
+Print a warning message.
+
+@item void lintwarn(awk_ext_id_t id, const char *format, ...);
+Print a ``lint warning.'' Normally this is the same as printing a
+warning message, but if @command{gawk} was invoked with @samp{--lint=fatal},
+then lint warnings become fatal error messages.
+@end table
+
+All of these functions are otherwise like the C @code{printf()}
+family of functions, where the @code{format} parameter is a string
+with literal characters and formatting codes intermixed.
+
+@node Updating @code{ERRNO}
+@subsection Updating @code{ERRNO}
+
+The following functions allow you to update the @code{ERRNO}
+variable:
+
+@table @code
+@item void update_ERRNO_int(int errno_val);
+Set @code{ERRNO} to the string equivalent of the error code
+in @code{errno_val}. The value should be one of the defined
+error codes in @code{<errno.h>}, and @command{gawk} turns it
+into a (possibly translated) string using the C @code{strerror()} function.
+
+@item void update_ERRNO_string(const char *string);
+Set @code{ERRNO} directly to the string value of @code{string}.
+@command{gawk} makes a copy of the value of @code{string}.
+
+@item void unset_ERRNO();
+Unset @code{ERRNO}.
+@end table
+
+@node Accessing Parameters
+@subsection Accessing and Updating Parameters
+
+Two functions give you access to the arguments (parameters)
+passed to your extension function. They are:
+
+@table @code
+@item awk_bool_t get_argument(size_t count,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Fill in the @code{awk_value_t} structure pointed to by @code{result}
+with the @code{count}'th argument. Return true if the actual
+type matches @code{wanted}, false otherwise. In the latter
+case, @code{result@w{->}val_type} indicates the actual type
+(@pxref{table-value-types-returned}). Counts are zero based---the first
+argument is numbered zero, the second one, and so on. @code{wanted}
+indicates the type of value expected.
+
+@item awk_bool_t set_argument(size_t count, awk_array_t array);
+Convert a parameter that was undefined into an array; this provides
+call-by-reference for arrays. Return false if @code{count} is too big,
+or if the argument's type is not undefined. @xref{Array Manipulation},
+for more information on creating arrays.
+@end table
+
+@node Symbol Table Access
+@subsection Symbol Table Access
+
+Two sets of routines provide access to global variables, and one set
+allows you to create and release cached values.
+
+@menu
+* Symbol table by name:: Accessing variables by name.
+* Symbol table by cookie:: Accessing variables by ``cookie''.
+* Cached values:: Creating and using cached values.
+@end menu
+
+@node Symbol table by name
+@subsubsection Variable Access and Update by Name
+
+The following routines provide the ability to access and update
+global @command{awk}-level variables by name. In compiler terminology,
+identifiers of different kinds are termed @dfn{symbols}, thus the ``sym''
+in the routines' names. The data structure which stores information
+about symbols is termed a @dfn{symbol table}.
+
+@table @code
+@item awk_bool_t sym_lookup(const char *name,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Fill in the @code{awk_value_t} structure pointed to by @code{result}
+with the value of the variable named by the string @code{name}, which is
+a regular C string. @code{wanted} indicates the type of value expected.
+Return true if the actual type matches @code{wanted}, false otherwise.
+In the latter case, @code{result->val_type} indicates the actual type
+(@pxref{table-value-types-returned}).
+
+@item awk_bool_t sym_update(const char *name, awk_value_t *value);
+Update the variable named by the string @code{name}, which is a regular
+C string. The variable is added to @command{gawk}'s symbol table
+if it is not there. Return true if everything worked, false otherwise.
+
+Changing types (scalar to array or vice versa) of an existing variable
+is @emph{not} allowed, nor may this routine be used to update an array.
+This routine cannot be used to update any of the predefined
+variables (such as @code{ARGC} or @code{NF}).
+
+@item awk_bool_t sym_constant(const char *name, awk_value_t *value);
+Create a variable named by the string @code{name}, which is
+a regular C string, that has the constant value as given by
+@code{value}. @command{awk}-level code cannot change the value of this
+variable.@footnote{There (currently) is no @code{awk}-level feature that
+provides this ability.} The extension may change the value of @code{name}'s
+variable with subsequent calls to this routine, and may also convert
+a variable created by @code{sym_update()} into a constant. However,
+once a variable becomes a constant it cannot later be reverted into a
+mutable variable.
+@end table
+
+@node Symbol table by cookie
+@subsubsection Variable Access and Update by Cookie
+
+A @dfn{scalar cookie} is an opaque handle that provides access
+to a global variable or array. It is an optimization that
+avoids looking up variables in @command{gawk}'s symbol table every time
+access is needed. This was discussed earlier, in @ref{General Data Types}.
+
+The following functions let you work with scalar cookies.
+
+@table @code
+@item awk_bool_t sym_lookup_scalar(awk_scalar_t cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+Retrieve the current value of a scalar cookie.
+Once you have obtained a scalar cookie using @code{sym_lookup()}, you can
+use this function to get its value more efficiently.
+Return false if the value cannot be retrieved.
+
+@item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value);
+Update the value associated with a scalar cookie. Return false if
+the new value is not one of @code{AWK_STRING} or @code{AWK_NUMBER}.
+Here too, the built-in variables may not be updated.
+@end table
+
+It is not obvious at first glance how to work with scalar cookies or
+what their @i{raison d'@^etre} really is. In theory, the @code{sym_lookup()}
+and @code{sym_update()} routines are all you really need to work with
+variables. For example, you might have code that looked up the value of
+a variable, evaluated a condition, and then possibly changed the value
+of the variable based on the result of that evaluation, like so:
+
+@example
+/* do_magic --- do something really great */
+
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t value;
+
+ if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value)
+ && some_condition(value.num_value)) @{
+ value.num_value += 42;
+ sym_update("MAGIC_VAR", & value);
+ @}
+
+ return make_number(0.0, result);
+@}
+@end example
+
+@noindent
+This code looks (and is) simple and straightforward. So what's the problem?
+
+Consider what happens if @command{awk}-level code associated with your
+extension calls the @code{magic()} function (implemented in C by @code{do_magic()}),
+once per record, while processing hundreds of thousands or millions of records.
+The @code{MAGIC_VAR} variable is looked up in the symbol table once or twice per function call!
+
+The symbol table lookup is really pure overhead; it is considerably more efficient
+to get a cookie that represents the variable, and use that to get the variable's
+value and update it as needed.@footnote{The difference is measurable and quite real. Trust us.}
+
+Thus, the way to use cookies is as follows. First, install your extension's variable
+in @command{gawk}'s symbol table using @code{sym_update()}, as usual. Then get a
+scalar cookie for the variable using @code{sym_lookup()}:
+
+@example
+static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */
+
+static void
+my_extension_init()
+@{
+ awk_value_t value;
+
+ /* install initial value */
+ sym_update("MAGIC_VAR", make_number(42.0, & value));
+
+ /* get cookie */
+ sym_lookup("MAGIC_VAR", AWK_SCALAR, & value);
+
+ /* save the cookie */
+ magic_var_cookie = value.scalar_cookie;
+ @dots{}
+@}
+@end example
+
+Next, use the routines in this section for retrieving and updating
+the value through the cookie. Thus, @code{do_magic()} now becomes
+something like this:
+
+@example
+/* do_magic --- do something really great */
+
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t value;
+
+ if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value)
+ && some_condition(value.num_value)) @{
+ value.num_value += 42;
+ sym_update_scalar(magic_var_cookie, & value);
+ @}
+ @dots{}
+
+ return make_number(0.0, result);
+@}
+@end example
+
+@quotation NOTE
+The previous code omitted error checking for
+presentation purposes. Your extension code should be more robust
+and carefully check the return values from the API functions.
+@end quotation
+
+@node Cached values
+@subsubsection Creating and Using Cached Values
+
+The routines in this section allow you to create and release
+cached values. As with scalar cookies, in theory, cached values
+are not necessary. You can create numbers and strings using
+the functions in @ref{Constructor Functions}. You can then
+assign those values to variables using @code{sym_update()}
+or @code{sym_update_scalar()}, as you like.
+
+However, you can understand the point of cached values if you remember that
+@emph{every} string value's storage @emph{must} come from @code{malloc()}.
+If you have 20 variables, all of which have the same string value, you
+must create 20 identical copies of the string.@footnote{Numeric values
+are clearly less problematic, requiring only a C @code{double} to store.}
+
+It is clearly more efficient, if possible, to create a value once, and
+then tell @command{gawk} to reuse the value for multiple variables. That
+is what the routines in this section let you do. The functions are as follows:
+
+@table @code
+@item awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result);
+Create a cached string or numeric value from @code{value} for efficient later
+assignment.
+Only @code{AWK_NUMBER} and @code{AWK_STRING} values are allowed. Any other type
+is rejected. While @code{AWK_UNDEFINED} could be allowed, doing so would
+result in inferior performance.
+
+@item awk_bool_t release_value(awk_value_cookie_t vc);
+Release the memory associated with a value cookie obtained
+from @code{create_value()}.
+@end table
+
+You use value cookies in a fashion similar to the way you use scalar cookies.
+In the extension initialization routine, you create the value cookie:
+
+@example
+static awk_value_cookie_t answer_cookie; /* static value cookie */
+
+static void
+my_extension_init()
+@{
+ awk_value_t value;
+ char *long_string;
+ size_t long_string_len;
+
+ /* code from earlier */
+ @dots{}
+ /* @dots{} fill in long_string and long_string_len @dots{} */
+ make_malloced_string(long_string, long_string_len, & value);
+ create_value(& value, & answer_cookie); /* create cookie */
+ @dots{}
+@}
+@end example
+
+Once the value is created, you can use it as the value of any number
+of variables:
+
+@example
+static awk_value_t *
+do_magic(int nargs, awk_value_t *result)
+@{
+ awk_value_t value;
+
+ @dots{} /* as earlier */
+
+ value.val_type = AWK_VALUE_COOKIE;
+ value.value_cookie = answer_cookie;
+ sym_update("VAR1", & value);
+ sym_update("VAR2", & value);
+ @dots{}
+ sym_update("VAR100", & value);
+ @dots{}
+@}
+@end example
+
+@noindent
+Using value cookies in this way saves considerable storage, since all of
+@code{VAR1} through @code{VAR100} share the same value.
+
+You might be wondering, ``Is this sharing problematic?
+What happens if @command{awk} code assigns a new value to @code{VAR1},
+are all the others changed too?''
+
+That's a great question. The answer is that no, it's not a problem.
+@command{gawk} is smart enough to avoid such problems.
+
+Finally, as part of your clean up action (@pxref{Exit Callback Functions})
+you should release any cached values that you created, using
+@code{release_value()}.
+
+@node Array Manipulation
+@subsection Array Manipulation
+
+The primary data structure@footnote{Okay, the only data structure.} in @command{awk}
+is the associative array (@pxref{Arrays}).
+Extensions need to be able to manipulate @command{awk} arrays.
+The API provides a number of data structures for working with arrays,
+functions for working with individual elements, and functions for
+working with arrays as a whole. This includes the ability to
+``flatten'' an array so that it is easy for C code to traverse
+every element in an array. The array data structures integrate
+nicely with the data structures for values to make it easy to
+both work with and create true arrays of arrays (@pxref{General Data Types}).
+
+@menu
+* Array Data Types:: Data types for working with arrays.
+* Array Functions:: Functions for working with arrays.
+* Flattening Arrays:: How to flatten arrays.
+* Creating Arrays:: How to create and populate arrays.
+@end menu
+
+@node Array Data Types
+@subsubsection Array Data Types
+
+The data types associated with arrays are listed below.
+
+@table @code
+@item typedef void *awk_array_t;
+If you request the value of an array variable, you get back an
+@code{awk_array_t} value. This value is opaque@footnote{It is also
+a ``cookie,'' but the @command{gawk} developers did not wish to overuse this
+term.} to the extension; it uniquely identifies the array but can
+only be used by passing it into API functions or receiving it from API
+functions. This is very similar to the way @samp{FILE *} values are used
+with the @code{<stdio.h>} library routines.
+
+
+@item typedef struct awk_element @{
+@itemx @ @ @ @ /* convenience linked list pointer, not used by gawk */
+@itemx @ @ @ @ struct awk_element *next;
+@itemx @ @ @ @ enum @{
+@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */
+@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if should be deleted */
+@itemx @ @ @ @ @} flags;
+@itemx @ @ @ @ awk_value_t index;
+@itemx @ @ @ @ awk_value_t value;
+@itemx @} awk_element_t;
+The @code{awk_element_t} is a ``flattened''
+array element. @command{gawk} produces an array of these
+inside the @code{awk_flat_array_t} (see the next item).
+Individual elements may be marked for deletion. New elements must be added
+individually, one at a time, using the separate API for that purpose.
+The fields are as follows:
+
+@c nested table
+@table @code
+@item struct awk_element *next;
+This pointer is for the convenience of extension writers. It allows
+an extension to create a linked list of new elements which can then be
+added to an array in a loop that traverses the list.
+
+@item enum @{ @dots{} @} flags;
+A set of flag values that convey information between @command{gawk}
+and the extension. Currently there is only one: @code{AWK_ELEMENT_DELETE},
+which the extension can set to cause @command{gawk} to delete the
+element from the original array upon release of the flattened array.
+
+@item index
+@itemx value
+The index and value of the element, respectively.
+@emph{All} memory pointed to by @code{index} and @code{value} belongs to @command{gawk}.
+@end table
+
+@item typedef struct awk_flat_array @{
+@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */
+@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */
+@itemx @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */
+@itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */
+@itemx @} awk_flat_array_t;
+This is a flattened array. When an extension gets one of these
+from @command{gawk}, the @code{elements} array is of actual
+size @code{count}.
+The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk};
+therefore they are marked @code{awk_const} so that the extension cannot
+modify them.
+@end table
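+
+The sketch below illustrates how C code might walk the @code{elements}
+array and mark entries for deletion. To stay self-contained it uses a
+deliberately simplified @code{awk_value_t} that holds only a string (the real
+type is a tagged union) and omits the @code{awk_const} qualifiers; the helper
+name @code{mark_for_delete()} is hypothetical.

```c
#include <stddef.h>     /* size_t */
#include <string.h>     /* strlen, memcmp */

/* Simplified stand-in: the real awk_value_t is a tagged union. */
typedef struct {
	const char *str;
	size_t len;
} awk_value_t;

typedef struct awk_element {
	struct awk_element *next;       /* convenience pointer, unused by gawk */
	enum {
		AWK_ELEMENT_DEFAULT = 0,    /* set by gawk */
		AWK_ELEMENT_DELETE = 1      /* set by extension */
	} flags;
	awk_value_t index;
	awk_value_t value;
} awk_element_t;

typedef struct awk_flat_array {
	const void *opaque1;            /* private data for use by gawk */
	const void *opaque2;            /* private data for use by gawk */
	size_t count;                   /* how many elements */
	awk_element_t elements[1];      /* actually count entries long */
} awk_flat_array_t;

/* mark_for_delete --- flag each element whose index equals key */

static size_t
mark_for_delete(awk_flat_array_t *flat, const char *key)
{
	size_t i, marked = 0;
	size_t klen = strlen(key);

	for (i = 0; i < flat->count; i++) {
		awk_element_t *e = flat->elements + i;

		if (e->index.len == klen
		    && memcmp(e->index.str, key, klen) == 0) {
			e->flags = AWK_ELEMENT_DELETE;
			marked++;
		}
	}
	return marked;  /* number of elements flagged */
}
```

+Nothing is removed by the loop itself; the flag only asks @command{gawk}
+to delete the element when the flattened array is released.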
+
+@node Array Functions
+@subsubsection Array Functions
+
+The following functions relate to individual array elements.
+
+@table @code
+@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);
+For the array represented by @code{a_cookie}, return in @code{*count}
+the number of elements it contains. A subarray counts as a single element.
+Return false if there is an error.
+
+@item awk_bool_t get_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t *const index,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
+For the array represented by @code{a_cookie}, return in @code{*result}
+the value of the element whose index is @code{index}.
+@code{wanted} specifies the type of value you wish to retrieve.
+Return false if @code{wanted} does not match the actual type or if
+@code{index} is not in the array (@pxref{table-value-types-returned}).
+
+The value for @code{index} can be numeric, in which case @command{gawk}
+converts it to a string. Using non-integral values is possible, but
+requires that you understand how such values are converted to strings
+(@pxref{Conversion}); thus using integral values is safest.
+
+As with @emph{all} strings passed into @command{gawk} from an extension,
+the string value of @code{index} must come from @code{malloc()}, and
+@command{gawk} releases the storage.
+
+@item awk_bool_t set_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const value);
+In the array represented by @code{a_cookie}, create or modify
+the element whose index is given by @code{index}.
+The @code{ARGV} and @code{ENVIRON} arrays may not be changed.
+
+@item awk_bool_t set_array_element_by_elem(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_element_t element);
+Like @code{set_array_element()}, but take the @code{index} and @code{value}
+from @code{element}. This is a convenience macro.
+
+@item awk_bool_t del_array_element(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t* const index);
+Remove the element with the given index from the array
+represented by @code{a_cookie}.
+Return true if the element was removed, or false if the element did
+not exist in the array.
+@end table
+
+The following functions relate to arrays as a whole:
+
+@table @code
+@item awk_array_t create_array();
+Create a new array to which elements may be added.
+@xref{Creating Arrays}, for a discussion of how to
+create a new array and add elements to it.
+
+@item awk_bool_t clear_array(awk_array_t a_cookie);
+Clear the array represented by @code{a_cookie}.
+Return false if there was some kind of problem, true otherwise.
+The array remains an array, but after calling this function, it
+has no elements. This is equivalent to using the @code{delete}
+statement (@pxref{Delete}).
+
+@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data);
+For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t}
+structure and fill it in. Set the pointer whose address is passed as @code{data}
+to point to this structure.
+Return true upon success, or false otherwise.
+@xref{Flattening Arrays}, for a discussion of how to
+flatten an array and work with it.
+
+@item awk_bool_t release_flattened_array(awk_array_t a_cookie,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data);
+When done with a flattened array, release the storage using this function.
+You must pass in both the original array cookie, and the address of
+the created @code{awk_flat_array_t} structure.
+The function returns true upon success, false otherwise.
+@end table
+
+@node Flattening Arrays
+@subsubsection Working With All The Elements of an Array
+
+To @dfn{flatten} an array is to create a structure that
+represents the full array in a fashion that makes it easy
+for C code to traverse the entire array. Test code
+in @file{extension/testext.c} does this, and also serves
+as a nice example to show how to use the APIs.
+
+First, the @command{gawk} script that drives the test extension:
+
+@example
+@@load "testext"
+BEGIN @{
+ n = split("blacky rusty sophie raincloud lucky", pets)
+ printf "pets has %d elements\n", length(pets)
+ ret = dump_array_and_delete("pets", "3")
+ printf "dump_array_and_delete(pets) returned %d\n", ret
+ if ("3" in pets)
+ printf("dump_array_and_delete() did NOT remove index \"3\"!\n")
+ else
+ printf("dump_array_and_delete() did remove index \"3\"!\n")
+ print ""
+@}
+@end example
+
+@noindent
+This code creates an array with @code{split()} (@pxref{String Functions})
+and then calls @code{dump_array_and_delete()}. That function looks up
+the array whose name is passed as the first argument, and
+deletes the element at the index passed in the second argument.
+The @command{awk} code then prints the return value and checks if the element
+was indeed deleted. Here is the C code that implements
+@code{dump_array_and_delete()}. It has been edited slightly for
+presentation.
+
+The first part declares variables, sets up the default
+return value in @code{result}, and checks that the function
+was called with the correct number of arguments:
+
+@example
+static awk_value_t *
+dump_array_and_delete(int nargs, awk_value_t *result)
+@{
+ awk_value_t value, value2, value3;
+ awk_flat_array_t *flat_array;
+ size_t count;
+ char *name;
+ int i;
+
+ assert(result != NULL);
+ make_number(0.0, result);
+
+ if (nargs != 2) @{
+ printf("dump_array_and_delete: nargs not right "
+ "(%d should be 2)\n", nargs);
+ goto out;
+ @}
+@end example
+
+The function then proceeds in steps, as follows. First, retrieve
+the name of the array, passed as the first argument. Then
+retrieve the array itself. If either operation fails, print
+error messages and return:
+
+@example
+ /* get argument named array as flat array and print it */
+ if (get_argument(0, AWK_STRING, & value)) @{
+ name = value.str_value.str;
+ if (sym_lookup(name, AWK_ARRAY, & value2))
+ printf("dump_array_and_delete: sym_lookup of %s passed\n",
+ name);
+ else @{
+ printf("dump_array_and_delete: sym_lookup of %s failed\n",
+ name);
+ goto out;
+ @}
+ @} else @{
+ printf("dump_array_and_delete: get_argument(0) failed\n");
+ goto out;
+ @}
+@end example
+
+For testing purposes and to make sure that the C code sees
+the same number of elements as the @command{awk} code,
+the second step is to get the count of elements in the array
+and print it:
+
+@example
+ if (! get_element_count(value2.array_cookie, & count)) @{
+ printf("dump_array_and_delete: get_element_count failed\n");
+ goto out;
+ @}
+
+ printf("dump_array_and_delete: incoming size is %lu\n",
+ (unsigned long) count);
+@end example
+
+The third step is to actually flatten the array, and then
+to double check that the count in the @code{awk_flat_array_t}
+is the same as the count just retrieved:
+
+@example
+ if (! flatten_array(value2.array_cookie, & flat_array)) @{
+ printf("dump_array_and_delete: could not flatten array\n");
+ goto out;
+ @}
+
+ if (flat_array->count != count) @{
+ printf("dump_array_and_delete: flat_array->count (%lu)"
+ " != count (%lu)\n",
+ (unsigned long) flat_array->count,
+ (unsigned long) count);
+ goto out;
+ @}
+@end example
+
+The fourth step is to retrieve the index of the element
+to be deleted, which was passed as the second argument.
+Remember that argument numbers passed to @code{get_argument()}
+are zero-based; thus the second argument is numbered one:
+
+@example
+ if (! get_argument(1, AWK_STRING, & value3)) @{
+ printf("dump_array_and_delete: get_argument(1) failed\n");
+ goto out;
+ @}
+@end example
+
+The fifth step is where the ``real work'' is done. The function
+loops over every element in the array, printing the index and
+element values. In addition, upon finding the element with the
+index that is supposed to be deleted, the function sets the
+@code{AWK_ELEMENT_DELETE} bit in the @code{flags} field
+of the element. When the array is released, @command{gawk}
+traverses the flattened array, and deletes any elements that
+have this flag bit set:
+
+@example
+ for (i = 0; i < flat_array->count; i++) @{
+ printf("\t%s[\"%.*s\"] = %s\n",
+ name,
+ (int) flat_array->elements[i].index.str_value.len,
+ flat_array->elements[i].index.str_value.str,
+ valrep2str(& flat_array->elements[i].value));
+
+ if (strcmp(value3.str_value.str,
+ flat_array->elements[i].index.str_value.str)
+ == 0) @{
+ flat_array->elements[i].flags |= AWK_ELEMENT_DELETE;
+ printf("dump_array_and_delete: marking element \"%s\" "
+ "for deletion\n",
+ flat_array->elements[i].index.str_value.str);
+ @}
+ @}
+@end example
+
+The sixth step is to release the flattened array. This tells
+@command{gawk} that the extension is no longer using the array,
+and that it should delete any elements marked for deletion.
+@command{gawk} also frees any storage that was allocated,
+so you should not use the pointer (@code{flat_array} in this
+code) once you have called @code{release_flattened_array()}:
+
+@example
+ if (! release_flattened_array(value2.array_cookie, flat_array)) @{
+ printf("dump_array_and_delete: could not release flattened array\n");
+ goto out;
+ @}
+@end example
+
+Finally, since everything was successful, the function sets the
+return value to success, and returns:
+
+@example
+ make_number(1.0, result);
+out:
+ return result;
+@}
+@end example
+
+Here is the output from running this part of the test:
+
+@example
+pets has 5 elements
+dump_array_and_delete: sym_lookup of pets passed
+dump_array_and_delete: incoming size is 5
+ pets["1"] = "blacky"
+ pets["2"] = "rusty"
+ pets["3"] = "sophie"
+dump_array_and_delete: marking element "3" for deletion
+ pets["4"] = "raincloud"
+ pets["5"] = "lucky"
+dump_array_and_delete(pets) returned 1
+dump_array_and_delete() did remove index "3"!
+@end example
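+
+The mark-then-release pattern shown above can be tried out without
+@command{gawk} at all. The following standalone C sketch uses hypothetical
+stand-in names (@code{mock_elem_t}, @code{mock_flat_t},
+@code{MOCK_ELEMENT_DELETE}, @code{mock_mark()} and @code{mock_release()});
+none of them is part of @file{gawkapi.h}, and the real
+@code{release_flattened_array()} is implemented inside @command{gawk},
+so this only illustrates the idea:
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the gawkapi.h types. */
#define MOCK_ELEMENT_DELETE 0x01

typedef struct {
    const char *index;   /* element index, as a C string */
    const char *value;   /* element value, as a C string */
    unsigned flags;      /* MOCK_ELEMENT_DELETE requests removal */
} mock_elem_t;

typedef struct {
    size_t count;
    mock_elem_t elements[8];
} mock_flat_t;

/* mock_mark --- flag every element whose index equals `index';
   return how many elements were flagged */

static size_t
mock_mark(mock_flat_t *flat, const char *index)
{
    size_t i, marked = 0;

    for (i = 0; i < flat->count; i++) {
        if (strcmp(flat->elements[i].index, index) == 0) {
            flat->elements[i].flags |= MOCK_ELEMENT_DELETE;
            marked++;
        }
    }
    return marked;
}

/* mock_release --- drop flagged elements, keeping the rest in order;
   return the number of elements that remain */

static size_t
mock_release(mock_flat_t *flat)
{
    size_t i, kept = 0;

    for (i = 0; i < flat->count; i++) {
        if ((flat->elements[i].flags & MOCK_ELEMENT_DELETE) == 0)
            flat->elements[kept++] = flat->elements[i];
    }
    flat->count = kept;
    return kept;
}
```
+
+Marking changes nothing by itself; the element count drops only when the
+release step runs, which mirrors the division of labor between the
+extension and @command{gawk} described above.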
+
+@node Creating Arrays
+@subsubsection How To Create and Populate Arrays
+
+Besides working with arrays created by @command{awk} code, you can
+create arrays and populate them as you see fit, and then @command{awk}
+code can access them and manipulate them.
+
+There are two important points about creating arrays from extension code:
+
+@enumerate 1
+@item
+You must install a new array into @command{gawk}'s symbol
+table immediately upon creating it. Once you have done so,
+you can then populate the array.
+
+@ignore
+Strictly speaking, this is required only
+for arrays that will have subarrays as elements; however it is
+a good idea to always do this. This restriction may be relaxed
+in a subsequent revision of the API.
+@end ignore
+
+Similarly, if installing a new array as a subarray of an existing array,
+you must add the new array to its parent before adding any elements to it.
+
+Thus, the correct way to build an array is to work ``top down.'' Create
+the array, and immediately install it in @command{gawk}'s symbol table
+using @code{sym_update()}, or install it as an element in a previously
+existing array using @code{set_array_element()}. Example code is coming shortly.
+
+@item
+Due to @command{gawk} internals, after using @code{sym_update()} to install an array
+into @command{gawk}, you have to retrieve the array cookie from the value
+passed in to @code{sym_update()} before doing anything else with it, like so:
+
+@example
+awk_value_t val;
+awk_array_t new_array;
+
+new_array = create_array();
+val.val_type = AWK_ARRAY;
+val.array_cookie = new_array;
+
+/* install array in the symbol table */
+sym_update("array", & val);
+
+new_array = val.array_cookie; /* YOU MUST DO THIS */
+@end example
+
+If installing an array as a subarray, you must also retrieve the value
+of the array cookie after the call to @code{set_array_element()}.
+@end enumerate
+
+The following C code is a simple test extension to create an array
+with two regular elements and with a subarray. The leading @samp{#include}
+directives and boilerplate variable declarations are omitted for brevity.
+The first step is to create a new array and then install it
+in the symbol table:
+
+@example
+@ignore
+#ifdef HAVE_CONFIG_H
+#include <config.h>
+#endif
+
+#include <stdio.h>
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "gawkapi.h"
+
+static const gawk_api_t *api; /* for convenience macros to work */
+static awk_ext_id_t *ext_id;
+static const char *ext_version = "testarray extension: version 1.0";
+
+int plugin_is_GPL_compatible;
+
+@end ignore
+/* create_new_array --- create a named array */
+
+static void
+create_new_array()
+@{
+ awk_array_t a_cookie;
+ awk_array_t subarray;
+ awk_value_t index, value;
+
+ a_cookie = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = a_cookie;
+
+ if (! sym_update("new_array", & value))
+ printf("create_new_array: sym_update(\"new_array\") failed!\n");
+ a_cookie = value.array_cookie;
+@end example
+
+@noindent
+Note how @code{a_cookie} is reset from the @code{array_cookie} field in
+the @code{value} structure.
+
+The second step is to install two regular values into @code{new_array}:
+
+@example
+ (void) make_const_string("hello", 5, & index);
+ (void) make_const_string("world", 5, & value);
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+
+ (void) make_const_string("answer", 6, & index);
+ (void) make_number(42.0, & value);
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+@end example
+
+The third step is to create the subarray and install it:
+
+@example
+ (void) make_const_string("subarray", 8, & index);
+ subarray = create_array();
+ value.val_type = AWK_ARRAY;
+ value.array_cookie = subarray;
+ if (! set_array_element(a_cookie, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+ subarray = value.array_cookie;
+@end example
+
+The final step is to populate the subarray with its own element:
+
+@example
+ (void) make_const_string("foo", 3, & index);
+ (void) make_const_string("bar", 3, & value);
+ if (! set_array_element(subarray, & index, & value)) @{
+ printf("fill_in_array: set_array_element failed\n");
+ return;
+ @}
+@}
+@ignore
+static awk_ext_func_t func_table[] = @{
+ @{ NULL, NULL, 0 @}
+@};
+
+/* init_testarray --- additional initialization function */
+
+static awk_bool_t init_testarray(void)
+@{
+ create_new_array();
+
+ return 1;
+@}
+
+static awk_bool_t (*init_func)(void) = init_testarray;
+
+dl_load_func(func_table, testarray, "")
+@end ignore
+@end example
+
+Here is a sample script that loads the extension
+and then dumps the array:
+
+@example
+@@load "subarray"
+
+function dumparray(name, array, i)
+@{
+ for (i in array)
+ if (isarray(array[i]))
+ dumparray(name "[\"" i "\"]", array[i])
+ else
+ printf("%s[\"%s\"] = %s\n", name, i, array[i])
+@}
+
+BEGIN @{
+ dumparray("new_array", new_array);
+@}
+@end example
+
+Here is the result of running the script:
+
+@example
+$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk}
+@print{} new_array["subarray"]["foo"] = bar
+@print{} new_array["hello"] = world
+@print{} new_array["answer"] = 42
+@end example
+
+@noindent
+(@xref{Finding Extensions}, for more information on the
+@env{AWKLIBPATH} environment variable.)
+
+@node Extension API Variables
+@subsection API Variables
+
+The API provides two sets of variables. The first provides information
+about the version of the API (both the version with which the extension
+was compiled, and the version with which @command{gawk} was compiled).
+The second provides information about how @command{gawk} was invoked.
+
+@menu
+* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about
+ @command{gawk}'s invocation.
+@end menu
+
+@node Extension Versioning
+@subsubsection API Version Constants and Variables
+
+The API provides both a ``major'' and a ``minor'' version number.
+The API versions are available at compile time as constants:
+
+@table @code
+@item GAWK_API_MAJOR_VERSION
+The major version of the API.
+
+@item GAWK_API_MINOR_VERSION
+The minor version of the API.
+@end table
+
+The minor version increases when new functions are added to the API. Such
+new functions are always added to the end of the API @code{struct}.
+
+The major version increases (and the minor version is reset to zero) if any
+of the data types change size or member order, or if any of the existing
+functions change signature.
+
+An extension may be compiled against one version
+of the API but loaded by a @command{gawk} that provides a different
+version. For this reason, the major and minor API versions of the
+running @command{gawk} are included in the API @code{struct} as read-only
+constant integers:
+
+@table @code
+@item api->major_version
+The major version of the running @command{gawk}.
+
+@item api->minor_version
+The minor version of the running @command{gawk}.
+@end table
+
+It is up to the extension to decide if there are API incompatibilities.
+Typically a check like this is enough:
+
+@example
+if (api->major_version != GAWK_API_MAJOR_VERSION
+ || api->minor_version < GAWK_API_MINOR_VERSION) @{
+ fprintf(stderr, "foo_extension: version mismatch with gawk!\n");
+ fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n",
+ GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION,
+ api->major_version, api->minor_version);
+ exit(1);
+@}
+@end example
+
+Such code is included in the boilerplate @code{dl_load_func()} macro
+provided in @file{gawkapi.h} (discussed later, in
+@ref{Extension API Boilerplate}).
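+
+The compatibility rule can also be stated as a small predicate. This is
+only a sketch: @code{MY_API_MAJOR}, @code{MY_API_MINOR} and
+@code{api_compatible()} are hypothetical names chosen for illustration,
+with the two macros standing in for @code{GAWK_API_MAJOR_VERSION} and
+@code{GAWK_API_MINOR_VERSION}, and the two parameters standing in for
+@code{api->major_version} and @code{api->minor_version}:
+
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical compile-time versions; the real constants are
   GAWK_API_MAJOR_VERSION and GAWK_API_MINOR_VERSION in gawkapi.h. */
#define MY_API_MAJOR 1
#define MY_API_MINOR 2

/* api_compatible --- true if a gawk providing API version
   (run_major, run_minor) can load an extension compiled against
   (MY_API_MAJOR, MY_API_MINOR) */

static bool
api_compatible(int run_major, int run_minor)
{
    if (run_major != MY_API_MAJOR)
        return false;   /* data types or signatures may differ */

    /* a newer minor version only appends functions to the struct */
    return run_minor >= MY_API_MINOR;
}
```
+
+A running @command{gawk} with a higher minor version is acceptable because
+new functions are only ever added to the end of the API @code{struct};
+a lower one is not, since the extension may call functions that are missing.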
+
+@node Extension API Informational Variables
+@subsubsection Informational Variables
+
+The API provides access to several variables that describe
+whether the corresponding command-line options were enabled when
+@command{gawk} was invoked. The variables are:
+
+@table @code
+@item do_lint
+This variable is true if @command{gawk} was invoked with the @option{--lint} option
+(@pxref{Options}).
+
+@item do_traditional
+This variable is true if @command{gawk} was invoked with the @option{--traditional} option.
+
+@item do_profile
+This variable is true if @command{gawk} was invoked with the @option{--profile} option.
+
+@item do_sandbox
+This variable is true if @command{gawk} was invoked with the @option{--sandbox} option.
+
+@item do_debug
+This variable is true if @command{gawk} was invoked with the @option{--debug} option.
+
+@item do_mpfr
+This variable is true if @command{gawk} was invoked with the @option{--bignum} option.
+@end table
+
+The value of @code{do_lint} can change if @command{awk} code
+modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}).
+The others should not change during execution.
+
+@node Extension API Boilerplate
+@subsection Boilerplate Code
+
+As mentioned earlier (@pxref{Extension Mechanism Outline}), the function
+definitions as presented are really macros. To use these macros, your
+extension must provide a small amount of boilerplate code (variables and
+functions) towards the top of your source file, using pre-defined names
+as described below. The boilerplate needed is also provided in comments
+in the @file{gawkapi.h} header file:
+
+@example
+/* Boilerplate code: */
+int plugin_is_GPL_compatible;
+
+static gawk_api_t *const api;
+static awk_ext_id_t ext_id;
+static const char *ext_version = NULL; /* or @dots{} = "some string" */
+
+static awk_ext_func_t func_table[] = @{
+ @{ "name", do_name, 1 @},
+ /* @dots{} */
+@};
+
+/* EITHER: */
+
+static awk_bool_t (*init_func)(void) = NULL;
+
+/* OR: */
+
+static awk_bool_t
+init_my_module(void)
+@{
+ @dots{}
+@}
+
+static awk_bool_t (*init_func)(void) = init_my_module;
+
+dl_load_func(func_table, some_name, "name_space_in_quotes")
+@end example
+
+These variables and functions are as follows:
+
+@table @code
+@item int plugin_is_GPL_compatible;
+This asserts that the extension is compatible with the GNU GPL
+(@pxref{Copying}). If your extension does not have this, @command{gawk}
+will not load it (@pxref{Plugin License}).
+
+@item static gawk_api_t *const api;
+This global @code{static} variable should be set to point to
+the @code{gawk_api_t} pointer that @command{gawk} passes to your
+@code{dl_load()} function. This variable is used by all of the macros.
+
+@item static awk_ext_id_t ext_id;
+This global static variable should be set to the @code{awk_ext_id_t}
+value that @command{gawk} passes to your @code{dl_load()} function.
+This variable is used by all of the macros.
+
+@item static const char *ext_version = NULL; /* or @dots{} = "some string" */
+This global @code{static} variable should be set either
+to @code{NULL}, or to point to a string giving the name and version of
+your extension.
+
+@item static awk_ext_func_t func_table[] = @{ @dots{} @};
+This is an array of one or more @code{awk_ext_func_t} structures
+as described earlier (@pxref{Extension Functions}).
+It can then be looped over for multiple calls to
+@code{add_ext_func()}.
+
+@item static awk_bool_t (*init_func)(void) = NULL;
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @r{OR}
+@itemx static awk_bool_t init_my_module(void) @{ @dots{} @}
+@itemx static awk_bool_t (*init_func)(void) = init_my_module;
+If you need to do some initialization work, you should define a
+function that does it (creates variables, opens files, etc.)
+and then define the @code{init_func} pointer to point to your
+function.
+The function should return zero (false) upon failure, non-zero
+(success) if everything goes well.
+
+If you don't need to do any initialization, define the pointer and
+initialize it to @code{NULL}.
+
+@item dl_load_func(func_table, some_name, "name_space_in_quotes")
+This macro expands to a @code{dl_load()} function that performs
+all the necessary initializations.
+@end table
+
+The point of all the variables and arrays is to let the
+@code{dl_load()} function (from the @code{dl_load_func()}
+macro) do all the standard work. It does the following:
+
+@enumerate 1
+@item
+Check the API versions. If the extension major version does not match
+@command{gawk}'s, or if the extension minor version is greater than
+@command{gawk}'s, it prints a fatal error message and exits.
+
+@item
+Load the functions defined in @code{func_table}.
+If any of them fails to load, it prints a warning message but
+continues on.
+
+@item
+If the @code{init_func} pointer is not @code{NULL}, call the
+function it points to. If it returns zero (false), print a
+warning message.
+
+@item
+If @code{ext_version} is not @code{NULL}, register
+the version string with @command{gawk}.
+@end enumerate
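+
+The order of these steps can be sketched with a small stand-in function.
+Everything here is hypothetical: @code{mock_func_t}, @code{mock_init_t},
+@code{mock_load()}, @code{init_ok()} and @code{init_fails()} are invented
+for illustration and do not reflect the actual expansion of
+@code{dl_load_func()}. The sketch assumes that an initialization
+function reports failure by returning zero, as described for
+@code{init_func} earlier, and it returns @minus{}1 where the real code
+would exit:
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; NOT the gawkapi.h definitions. */
typedef struct {
    const char *name;
    int ok;             /* pretend result of registering this function */
} mock_func_t;

typedef int (*mock_init_t)(void);

/* mock_load --- mimic the order of operations of the loading function.
   Return -1 for a version mismatch (the real code exits instead),
   otherwise the number of warnings issued. */

static int
mock_load(int ext_major, int ext_minor, int gawk_major, int gawk_minor,
          const mock_func_t *funcs, size_t nfuncs,
          mock_init_t init, const char *version)
{
    size_t i;
    int warnings = 0;

    /* 1. check the API versions */
    if (ext_major != gawk_major || ext_minor > gawk_minor) {
        fprintf(stderr, "mock_load: version mismatch\n");
        return -1;
    }

    /* 2. register each function; warn on failure but keep going */
    for (i = 0; i < nfuncs; i++) {
        if (! funcs[i].ok) {
            fprintf(stderr, "mock_load: could not add %s\n",
                    funcs[i].name);
            warnings++;
        }
    }

    /* 3. run the optional initialization function */
    if (init != NULL && ! init()) {
        fprintf(stderr, "mock_load: initialization failed\n");
        warnings++;
    }

    /* 4. register the version string, if there is one */
    if (version != NULL)
        fprintf(stderr, "mock_load: registered \"%s\"\n", version);

    return warnings;
}

static int init_ok(void) { return 1; }
static int init_fails(void) { return 0; }
```
+
+Only the version check is fatal; a function that fails to register or an
+initialization function that fails produces a warning, and loading continues.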
+
+@node Finding Extensions
+@subsection How @command{gawk} Finds Extensions
+
+Compiled extensions have to be installed in a directory where
+@command{gawk} can find them. If @command{gawk} is configured and
+built in the default fashion, the directory in which to find
+extensions is @file{/usr/local/lib/gawk}. You can also specify a search
+path with a list of directories to search for compiled extensions.
+@xref{AWKLIBPATH Variable}, for more information.
+
+@node Extension Example
+@section Example: Some File Functions
+
+@quotation
+@i{No matter where you go, there you are.} @*
+Buckaroo Banzai
+@end quotation
+
+@c It's enough to show chdir and stat, no need for fts
+
+Two useful functions that are not in @command{awk} are @code{chdir()} (so
+that an @command{awk} program can change its directory) and @code{stat()}
+(so that an @command{awk} program can gather information about a file).
+This @value{SECTION} implements these functions for @command{gawk}
+in an extension.
@menu
* Internal File Description:: What the new functions will do.
@@ -28121,13 +30613,13 @@ external extension library.
@node Internal File Description
@subsection Using @code{chdir()} and @code{stat()}
-This @value{SECTION} shows how to use the new functions at the @command{awk}
-level once they've been integrated into the running @command{gawk}
-interpreter.
-Using @code{chdir()} is very straightforward. It takes one argument,
-the new directory to change to:
+This @value{SECTION} shows how to use the new functions at
+the @command{awk} level once they've been integrated into the
+running @command{gawk} interpreter. Using @code{chdir()} is very
+straightforward. It takes one argument, the new directory to change to:
@example
+@@load "filefuncs"
@dots{}
newdir = "/home/arnold/funstuff"
ret = chdir(newdir)
@@ -28139,21 +30631,18 @@ if (ret < 0) @{
@dots{}
@end example
-The return value is negative if the @code{chdir} failed,
-and @code{ERRNO}
-(@pxref{Built-in Variables})
-is set to a string indicating the error.
+The return value is negative if the @code{chdir()} failed, and
+@code{ERRNO} (@pxref{Built-in Variables}) is set to a string indicating
+the error.
-Using @code{stat()} is a bit more complicated.
-The C @code{stat()} function fills in a structure that has a fair
-amount of information.
+Using @code{stat()} is a bit more complicated. The C @code{stat()}
+function fills in a structure that has a fair amount of information.
The right way to model this in @command{awk} is to fill in an associative
array with the appropriate information:
@c broke printf for page breaking
@example
file = "/home/arnold/.profile"
-fdata[1] = "x" # force `fdata' to be an array
ret = stat(file, fdata)
if (ret < 0) @{
printf("could not stat %s: %s\n",
@@ -28198,11 +30687,11 @@ be a function of the file's size if the file has holes.
The file's last access, modification, and inode update times,
respectively. These are numeric timestamps, suitable for formatting
with @code{strftime()}
-(@pxref{Built-in}).
+(@pxref{Time Functions}).
@item "pmode"
The file's ``printable mode.'' This is a string representation of
-the file's type and permissions, such as what is produced by
+the file's type and permissions, such as is produced by
@samp{ls -l}---for example, @code{"drwxr-xr-x"}.
@item "type"
@@ -28263,64 +30752,96 @@ of that number, respectively.
@node Internal File Ops
@subsection C Code for @code{chdir()} and @code{stat()}
-Here is the C code for these extensions. They were written for
-GNU/Linux. The code needs some more work for complete portability
-to other POSIX-compliant systems:@footnote{This version is edited
-slightly for presentation. See
-@file{extension/filefuncs.c} in the @command{gawk} distribution
-for the complete version.}
+Here is the C code for these extensions.@footnote{This version is
+edited slightly for presentation. See @file{extension/filefuncs.c}
+in the @command{gawk} distribution for the complete version.}
+
+The file includes a number of standard header files, and then includes
+the @file{gawkapi.h} header file which provides the API definitions.
+Those are followed by the necessary variable declarations
+to make use of the API macros and boilerplate code
+(@pxref{Extension API Boilerplate}).
@c break line for page breaking
@example
-#include "awk.h"
+#ifdef HAVE_CONFIG_H
+#include <config.h>
+#endif
+
+#include <stdio.h>
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
-#include <sys/sysmacros.h>
+#include "gawkapi.h"
+
+#include "gettext.h"
+#define _(msgid) gettext(msgid)
+#define N_(msgid) msgid
+
+#include "gawkfts.h"
+#include "stack.h"
+
+static const gawk_api_t *api; /* for convenience macros to work */
+static awk_ext_id_t *ext_id;
+static awk_bool_t init_filefuncs(void);
+static awk_bool_t (*init_func)(void) = init_filefuncs;
+static const char *ext_version = "filefuncs extension: version 1.0";
int plugin_is_GPL_compatible;
+@end example
+@cindex programming conventions, @command{gawk} internals
+By convention, for an @command{awk} function @code{foo()}, the C function
+that implements it is called @code{do_foo()}. The function should have
+two arguments: the first is an @code{int}, usually called @code{nargs},
+that represents the number of actual arguments for the function.
+The second is a pointer to an @code{awk_value_t}, usually named
+@code{result}.
+
+@example
/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
-static NODE *
-do_chdir(int nargs)
+static awk_value_t *
+do_chdir(int nargs, awk_value_t *result)
@{
- NODE *newdir;
+ awk_value_t newdir;
int ret = -1;
- if (do_lint && nargs != 1)
- lintwarn("chdir: called with incorrect number of arguments");
+ assert(result != NULL);
- newdir = get_scalar_argument(0, FALSE);
+ if (do_lint && nargs != 1)
+ lintwarn(ext_id,
+ _("chdir: called with incorrect number of arguments, "
+ "expecting 1"));
@end example
-The file includes the @code{"awk.h"} header file for definitions
-for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>}
-for access to the @code{major()} and @code{minor}() macros.
-
-@cindex programming conventions, @command{gawk} internals
-By convention, for an @command{awk} function @code{foo}, the function that
-implements it is called @samp{do_foo}. The function should take
-a @samp{int} argument, usually called @code{nargs}, that
-represents the number of defined arguments for the function. The @code{newdir}
+The @code{newdir}
variable represents the new directory to change to, retrieved
-with @code{get_scalar_argument()}. Note that the first argument is
+with @code{get_argument()}. Note that the first argument is
numbered zero.
-This code actually accomplishes the @code{chdir()}. It first forces
-the argument to be a string and passes the string value to the
+If the argument is retrieved successfully, the function calls the
@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO}
is updated.
@example
- (void) force_string(newdir);
- ret = chdir(newdir->stptr);
- if (ret < 0)
- update_ERRNO_int(errno);
+ if (get_argument(0, AWK_STRING, & newdir)) @{
+ ret = chdir(newdir.str_value.str);
+ if (ret < 0)
+ update_ERRNO_int(errno);
+ @}
@end example
Finally, the function returns the return value to the @command{awk} level:
@example
- return make_number((AWKNUM) ret);
+ return make_number(ret, result);
@}
@end example
@@ -28339,7 +30860,168 @@ format_mode(unsigned long fmode)
@}
@end example
-Next comes the @code{do_stat()} function. It starts with
+Next comes a function for reading symbolic links, which is also
+omitted here for brevity:
+
+@example
+/* read_symlink --- read a symbolic link into an allocated buffer.
+ @dots{} */
+
+static char *
+read_symlink(const char *fname, size_t bufsize, ssize_t *linksize)
+@{
+ @dots{}
+@}
+@end example
+
+Two helper functions simplify entering values in the
+array that will contain the result of the @code{stat()}:
+
+@example
+/* array_set --- set an array element */
+
+static void
+array_set(awk_array_t array, const char *sub, awk_value_t *value)
+@{
+ awk_value_t index;
+
+ set_array_element(array,
+ make_const_string(sub, strlen(sub), & index),
+ value);
+
+@}
+
+/* array_set_numeric --- set an array element with a number */
+
+static void
+array_set_numeric(awk_array_t array, const char *sub, double num)
+@{
+ awk_value_t tmp;
+
+ array_set(array, sub, make_number(num, & tmp));
+@}
+@end example
+
+The following function does most of the work to fill in
+the @code{awk_array_t} result array with values obtained
+from a valid @code{struct stat}. It is done in a separate function
+to support the @code{stat()} function for @command{gawk} and also
+to support the @code{fts()} extension which is included in
+the same file but whose code is not shown here
+(@pxref{Extension Sample File Functions}).
+
+The first part of the function is variable declarations,
+including a table to map file types to strings:
+
+@example
+/* fill_stat_array --- do the work to fill an array with stat info */
+
+static int
+fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf)
+@{
+ char *pmode; /* printable mode */
+ const char *type = "unknown";
+ awk_value_t tmp;
+ static struct ftype_map @{
+ unsigned int mask;
+ const char *type;
+ @} ftype_map[] = @{
+ @{ S_IFREG, "file" @},
+ @{ S_IFBLK, "blockdev" @},
+ @{ S_IFCHR, "chardev" @},
+ @{ S_IFDIR, "directory" @},
+#ifdef S_IFSOCK
+ @{ S_IFSOCK, "socket" @},
+#endif
+#ifdef S_IFIFO
+ @{ S_IFIFO, "fifo" @},
+#endif
+#ifdef S_IFLNK
+ @{ S_IFLNK, "symlink" @},
+#endif
+#ifdef S_IFDOOR /* Solaris weirdness */
+ @{ S_IFDOOR, "door" @},
+#endif /* S_IFDOOR */
+ @};
+ int j, k;
+@end example
+
+The destination array is cleared, and then code fills in
+various elements based on values in the @code{struct stat}:
+
+@example
+ /* empty out the array */
+ clear_array(array);
+
+ /* fill in the array */
+ array_set(array, "name", make_const_string(name, strlen(name),
+ & tmp));
+ array_set_numeric(array, "dev", sbuf->st_dev);
+ array_set_numeric(array, "ino", sbuf->st_ino);
+ array_set_numeric(array, "mode", sbuf->st_mode);
+ array_set_numeric(array, "nlink", sbuf->st_nlink);
+ array_set_numeric(array, "uid", sbuf->st_uid);
+ array_set_numeric(array, "gid", sbuf->st_gid);
+ array_set_numeric(array, "size", sbuf->st_size);
+ array_set_numeric(array, "blocks", sbuf->st_blocks);
+ array_set_numeric(array, "atime", sbuf->st_atime);
+ array_set_numeric(array, "mtime", sbuf->st_mtime);
+ array_set_numeric(array, "ctime", sbuf->st_ctime);
+
+ /* for block and character devices, add rdev,
+ major and minor numbers */
+ if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) @{
+ array_set_numeric(array, "rdev", sbuf->st_rdev);
+ array_set_numeric(array, "major", major(sbuf->st_rdev));
+ array_set_numeric(array, "minor", minor(sbuf->st_rdev));
+ @}
+@end example
+
+@noindent
+The latter part of the function makes selective additions
+to the destination array, depending upon the availability of
+certain members and/or the type of the file. It then returns zero,
+for success:
+
+@example
+#ifdef HAVE_ST_BLKSIZE
+ array_set_numeric(array, "blksize", sbuf->st_blksize);
+#endif /* HAVE_ST_BLKSIZE */
+
+ pmode = format_mode(sbuf->st_mode);
+ array_set(array, "pmode", make_const_string(pmode, strlen(pmode),
+ & tmp));
+
+ /* for symbolic links, add a linkval field */
+ if (S_ISLNK(sbuf->st_mode)) @{
+ char *buf;
+ ssize_t linksize;
+
+ if ((buf = read_symlink(name, sbuf->st_size,
+ & linksize)) != NULL)
+ array_set(array, "linkval",
+ make_malloced_string(buf, linksize, & tmp));
+ else
+ warning(ext_id, _("stat: unable to read symbolic link `%s'"),
+ name);
+ @}
+
+ /* add a type field */
+ type = "unknown"; /* shouldn't happen */
+ for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) @{
+ if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) @{
+ type = ftype_map[j].type;
+ break;
+ @}
+ @}
+
+ array_set(array, "type", make_const_string(type, strlen(type), &tmp));
+
+ return 0;
+@}
+@end example
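+
+The type lookup at the end of @code{fill_stat_array()} can be exercised
+by itself. The following standalone sketch uses only POSIX definitions
+from @code{<sys/stat.h>}; the function name @code{file_type_name()} is
+hypothetical and is not part of @file{filefuncs.c}, and the table is
+reduced to four entries to keep the example short:
+
```c
#define _XOPEN_SOURCE 700   /* ask for the S_IF* constants */

#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* file_type_name --- map the type bits of a mode to a short string.
   A standalone sketch that mirrors the lookup loop in the text. */

static const char *
file_type_name(mode_t mode)
{
    static const struct {
        mode_t mask;
        const char *type;
    } map[] = {
        { S_IFREG, "file" },
        { S_IFDIR, "directory" },
        { S_IFCHR, "chardev" },
        { S_IFBLK, "blockdev" },
    };
    size_t j;

    for (j = 0; j < sizeof(map) / sizeof(map[0]); j++) {
        /* S_IFMT isolates the type bits; permission bits are ignored */
        if ((mode & S_IFMT) == map[j].mask)
            return map[j].type;
    }
    return "unknown";
}
```
+
+Comparing @code{mode & S_IFMT} for equality, rather than testing single
+bits, matters because the type values overlap: the bits for a block
+device include those for a directory and for a character device.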
+
+Finally, here is the @code{do_stat()} function. It starts with
variable declarations and argument checking:
@ignore
@@ -28349,116 +31031,140 @@ Changed message for page breaking. Used to be:
@example
/* do_stat --- provide a stat() function for gawk */
-static NODE *
-do_stat(int nargs)
+static awk_value_t *
+do_stat(int nargs, awk_value_t *result)
@{
- NODE *file, *array, *tmp;
- struct stat sbuf;
+ awk_value_t file_param, array_param;
+ char *name;
+ awk_array_t array;
int ret;
- NODE **aptr;
- char *pmode; /* printable mode */
- char *type = "unknown";
+ struct stat sbuf;
- if (do_lint && nargs > 2)
- lintwarn("stat: called with too many arguments");
+ assert(result != NULL);
+
+ if (do_lint && nargs != 2) @{
+ lintwarn(ext_id,
+ _("stat: called with wrong number of arguments"));
+ return make_number(-1, result);
+ @}
@end example
Then comes the actual work. First, the function gets the arguments.
-Then, it always clears the array.
+Next, it gets the information for the file.
The code uses @code{lstat()} (instead of @code{stat()})
to get the file information,
in case the file is a symbolic link.
If there's an error, it sets @code{ERRNO} and returns:
-@c comment made multiline for page breaking
@example
/* file is first arg, array to hold results is second */
- file = get_scalar_argument(0, FALSE);
- array = get_array_argument(1, FALSE);
+ if ( ! get_argument(0, AWK_STRING, & file_param)
+ || ! get_argument(1, AWK_ARRAY, & array_param)) @{
+ warning(ext_id, _("stat: bad parameters"));
+ return make_number(-1, result);
+ @}
- /* empty out the array */
- assoc_clear(array);
+ name = file_param.str_value.str;
+ array = array_param.array_cookie;
+
+ /* always empty out the array */
+ clear_array(array);
/* lstat the file, if error, set ERRNO and return */
- (void) force_string(file);
- ret = lstat(file->stptr, & sbuf);
+ ret = lstat(name, & sbuf);
if (ret < 0) @{
update_ERRNO_int(errno);
- return make_number((AWKNUM) ret);
+ return make_number(ret, result);
@}
@end example
-Now comes the tedious part: filling in the array. Only a few of the
-calls are shown here, since they all follow the same pattern:
+The tedious work is done by @code{fill_stat_array()}, shown
+earlier. When done, return the result from @code{fill_stat_array()}:
@example
- /* fill in the array */
- aptr = assoc_lookup(array, tmp = make_string("name", 4));
- *aptr = dupnode(file);
- unref(tmp);
+ ret = fill_stat_array(name, array, & sbuf);
- aptr = assoc_lookup(array, tmp = make_string("mode", 4));
- *aptr = make_number((AWKNUM) sbuf.st_mode);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("pmode", 5));
- pmode = format_mode(sbuf.st_mode);
- *aptr = make_string(pmode, strlen(pmode));
- unref(tmp);
+ return make_number(ret, result);
+@}
@end example
-When done, return the @code{lstat()} return value:
+@cindex programming conventions, @command{gawk} internals
+Finally, it's necessary to provide the ``glue'' that loads the
+new function(s) into @command{gawk}.
+
+The @code{filefuncs} extension also provides an @code{fts()}
+function, which we omit here. For its sake there is an initialization
+function:
@example
+/* init_filefuncs --- initialization routine */
- return make_number((AWKNUM) ret);
+static awk_bool_t
+init_filefuncs(void)
+@{
+ @dots{}
@}
@end example
-@cindex programming conventions, @command{gawk} internals
-Finally, it's necessary to provide the ``glue'' that loads the
-new function(s) into @command{gawk}. By convention, each library has
-a routine named @code{dl_load()} that does the job. The simplest way
-is to use the @code{dl_load_func} macro in @code{gawkapi.h}.
+We are almost done. We need an array of @code{awk_ext_func_t}
+structures for loading each function into @command{gawk}:
+
+@example
+static awk_ext_func_t func_table[] = @{
+ @{ "chdir", do_chdir, 1 @},
+ @{ "stat", do_stat, 2 @},
+ @{ "fts", do_fts, 3 @},
+@};
+@end example
+
+Each extension must have a routine named @code{dl_load()} to load
+everything that needs to be loaded. It is simplest to use the
+@code{dl_load_func()} macro in @code{gawkapi.h}:
+
+@example
+/* define the dl_load() function using the boilerplate macro */
+
+dl_load_func(func_table, filefuncs, "")
+@end example
And that's it! As an exercise, consider adding functions to
implement system calls such as @code{chown()}, @code{chmod()},
and @code{umask()}.
@node Using Internal File Ops
-@subsection Integrating the Extensions
+@subsection Integrating The Extensions
@cindex @command{gawk}, interpreter@comma{} adding code to
Now that the code is written, it must be possible to add it at
runtime to the running @command{gawk} interpreter. First, the
code must be compiled. Assuming that the functions are in
a file named @file{filefuncs.c}, and @var{idir} is the location
-of the @command{gawk} include files,
-the following steps create
-a GNU/Linux shared library:
+of the @file{gawkapi.h} header file,
+the following steps@footnote{In practice, you would probably want to
+use the GNU Autotools---Automake, Autoconf, Libtool, and Gettext---to
+configure and build your libraries. Instructions for doing so are beyond
+the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to
+the tools.} create a GNU/Linux shared library:
@example
$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c}
-$ @kbd{ld -o filefuncs.so -shared filefuncs.o}
+$ @kbd{ld -o filefuncs.so -shared filefuncs.o -lc}
@end example
-@cindex @code{extension()} function (@command{gawk})
-Once the library exists, it is loaded by calling the @code{extension()}
-built-in function.
-This function takes two arguments: the name of the
-library to load and the name of a function to call when the library
-is first loaded. This function adds the new functions to @command{gawk}.
-It returns the value returned by the initialization function
-within the shared library:
+Once the library exists, it is loaded by using the @code{@@load} keyword.
@example
# file testff.awk
+@@load "filefuncs"
+
BEGIN @{
- extension("./filefuncs.so", "dl_load")
+ "pwd" | getline curdir # save current directory
+ close("pwd")
- chdir(".") # no-op
+ chdir("/tmp")
+ system("pwd") # test it
+ chdir(curdir) # go back
- data[1] = 1 # force `data' to be an array
print "Info for testff.awk"
ret = stat("testff.awk", data)
print "ret =", ret
@@ -28476,40 +31182,705 @@ BEGIN @{
@}
@end example
-Here are the results of running the program:
+The @env{AWKLIBPATH} environment variable tells
+@command{gawk} where to find shared libraries (@pxref{Finding Extensions}).
+We set it to the current directory and run the program:
@example
-$ @kbd{gawk -f testff.awk}
+$ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk}
+@print{} /tmp
@print{} Info for testff.awk
@print{} ret = 0
-@print{} data["size"] = 607
-@print{} data["ino"] = 14945891
-@print{} data["name"] = testff.awk
-@print{} data["pmode"] = -rw-rw-r--
-@print{} data["nlink"] = 1
-@print{} data["atime"] = 1293993369
-@print{} data["mtime"] = 1288520752
-@print{} data["mode"] = 33204
@print{} data["blksize"] = 4096
-@print{} data["dev"] = 2054
+@print{} data["mtime"] = 1350838628
+@print{} data["mode"] = 33204
@print{} data["type"] = file
-@print{} data["gid"] = 500
-@print{} data["uid"] = 500
+@print{} data["dev"] = 2053
+@print{} data["gid"] = 1000
+@print{} data["ino"] = 1719496
+@print{} data["ctime"] = 1350838628
@print{} data["blocks"] = 8
-@print{} data["ctime"] = 1290113572
-@print{} testff.awk modified: 10 31 10 12:25:52
+@print{} data["nlink"] = 1
+@print{} data["name"] = testff.awk
+@print{} data["atime"] = 1350838632
+@print{} data["pmode"] = -rw-rw-r--
+@print{} data["size"] = 662
+@print{} data["uid"] = 1000
+@print{} testff.awk modified: 10 21 12 18:57:08
@print{}
@print{} Info for JUNK
@print{} ret = -1
@print{} JUNK modified: 01 01 70 02:00:00
@end example
-@c ENDOFRANGE filre
-@c ENDOFRANGE dirch
-@c ENDOFRANGE statg
-@c ENDOFRANGE chdirg
-@c ENDOFRANGE gladfgaw
-@c ENDOFRANGE adfugaw
-@c ENDOFRANGE fubadgaw
+
+@node Extension Samples
+@section The Sample Extensions In The @command{gawk} Distribution
+
+This @value{SECTION} provides brief overviews of the sample extensions
+that come in the @command{gawk} distribution. Some of them are intended
+for production use, such as the @code{filefuncs} and @code{readdir} extensions.
+Others mainly provide example code that shows how to use the extension API.
+
+@menu
+* Extension Sample File Functions:: The file functions sample.
+* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
+* Extension Sample Fork:: An interface to @code{fork()} and other
+ process functions.
+* Extension Sample Ord:: Character to value and value to character
+ conversions.
+* Extension Sample Readdir:: An interface to @code{readdir()}.
+* Extension Sample Revout:: Reversing output sample output wrapper.
+* Extension Sample Rev2way:: Reversing data sample two-way processor.
+* Extension Sample Read write array:: Serializing an array to a file.
+* Extension Sample Readfile:: Reading an entire file into a string.
+* Extension Sample API Tests:: Tests for the API.
+* Extension Sample Time:: An interface to @code{gettimeofday()}
+ and @code{sleep()}.
+@end menu
+
+@node Extension Sample File Functions
+@subsection File Related Functions
+
+The @code{filefuncs} extension provides three different functions.
+The usage is:
+
+@table @code
+@item @@load "filefuncs"
+This is how you load the extension.
+
+@item result = chdir("/some/directory")
+The @code{chdir()} function is a direct hook to the @code{chdir()}
+system call to change the current directory. It returns zero
+upon success or less than zero upon error. In the latter case it updates
+@code{ERRNO}.
+
+@item result = stat("/some/path", statdata)
+The @code{stat()} function provides a hook into the
+@code{stat()} system call. In fact, it uses @code{lstat()}.
+It returns zero upon success or less than zero upon error.
+In the latter case it updates @code{ERRNO}.
+
+In all cases, it clears the @code{statdata} array.
+When the call is successful, @code{stat()} fills the @code{statdata}
+array with information retrieved from the filesystem, as follows:
+
+@c nested table
+@multitable @columnfractions .25 .60
+@item @code{statdata["name"]} @tab
+The name of the file.
+
+@item @code{statdata["dev"]} @tab
+Corresponds to the @code{st_dev} field in the @code{struct stat}.
+
+@item @code{statdata["ino"]} @tab
+Corresponds to the @code{st_ino} field in the @code{struct stat}.
+
+@item @code{statdata["mode"]} @tab
+Corresponds to the @code{st_mode} field in the @code{struct stat}.
+
+@item @code{statdata["nlink"]} @tab
+Corresponds to the @code{st_nlink} field in the @code{struct stat}.
+
+@item @code{statdata["uid"]} @tab
+Corresponds to the @code{st_uid} field in the @code{struct stat}.
+
+@item @code{statdata["gid"]} @tab
+Corresponds to the @code{st_gid} field in the @code{struct stat}.
+
+@item @code{statdata["size"]} @tab
+Corresponds to the @code{st_size} field in the @code{struct stat}.
+
+@item @code{statdata["atime"]} @tab
+Corresponds to the @code{st_atime} field in the @code{struct stat}.
+
+@item @code{statdata["mtime"]} @tab
+Corresponds to the @code{st_mtime} field in the @code{struct stat}.
+
+@item @code{statdata["ctime"]} @tab
+Corresponds to the @code{st_ctime} field in the @code{struct stat}.
+
+@item @code{statdata["rdev"]} @tab
+Corresponds to the @code{st_rdev} field in the @code{struct stat}.
+This element is only present for device files.
+
+@item @code{statdata["major"]} @tab
+The major device number, extracted from the @code{st_rdev} field
+with the @code{major()} macro.
+This element is only present for device files.
+
+@item @code{statdata["minor"]} @tab
+The minor device number, extracted from the @code{st_rdev} field
+with the @code{minor()} macro.
+This element is only present for device files.
+
+@item @code{statdata["blksize"]} @tab
+Corresponds to the @code{st_blksize} field in the @code{struct stat},
+if this field is present on your system.
+(It is present on all modern systems that we know of.)
+
+@item @code{statdata["pmode"]} @tab
+A human-readable version of the mode value, such as printed by
+@command{ls}. For example, @code{"-rwxr-xr-x"}.
+
+@item @code{statdata["linkval"]} @tab
+If the named file is a symbolic link, this element will exist
+and its value is the value of the symbolic link (where the
+symbolic link points to).
+
+@item @code{statdata["type"]} @tab
+The type of the file as a string. One of
+@code{"file"},
+@code{"blockdev"},
+@code{"chardev"},
+@code{"directory"},
+@code{"socket"},
+@code{"fifo"},
+@code{"symlink"},
+@code{"door"},
+or
+@code{"unknown"}.
+Not all systems support all file types.
+@end multitable
+
+@item flags = or(FTS_PHYSICAL, ...)
+@itemx result = fts(pathlist, flags, filedata)
+Walk the file trees provided in @code{pathlist} and fill in the
+@code{filedata} array as described below. @code{flags} is the bitwise
+OR of several predefined constant values, also as described below.
+Return zero if there were no errors, otherwise return @minus{}1.
+@end table
+
+The @code{fts()} function provides a hook to the C library @code{fts()}
+routines for traversing file hierarchies. Instead of returning data
+about one file at a time in a stream, it fills in a multi-dimensional
+array with data about each file and directory encountered in the requested
+hierarchies.
+
+The arguments are as follows:
+
+@table @code
+@item pathlist
+An array of filenames. The element values are used; the index values are ignored.
+
+@item flags
+This should be the bitwise OR of one or more of the following
+predefined constant flag values. At least one of
+@code{FTS_LOGICAL} or @code{FTS_PHYSICAL} must be provided; otherwise
+@code{fts()} returns an error value and sets @code{ERRNO}.
+The flags are:
+
+@c nested table
+@table @code
+@item FTS_LOGICAL
+Do a ``logical'' file traversal, where the information returned for
+a symbolic link refers to the linked-to file, and not to the symbolic
+link itself. This flag is mutually exclusive with @code{FTS_PHYSICAL}.
+
+@item FTS_PHYSICAL
+Do a ``physical'' file traversal, where the information returned for a
+symbolic link refers to the symbolic link itself. This flag is mutually
+exclusive with @code{FTS_LOGICAL}.
+
+@item FTS_NOCHDIR
+As a performance optimization, the C library @code{fts()} routines
+change directory as they traverse a file hierarchy. This flag disables
+that optimization.
+
+@item FTS_COMFOLLOW
+Immediately follow a symbolic link named in @code{pathlist},
+whether or not @code{FTS_LOGICAL} is set.
+
+@item FTS_SEEDOT
+By default, the @code{fts()} routines do not return entries for @file{.}
+and @file{..}. This option causes entries for @file{..} to also
+be included. (The extension always includes an entry for @file{.},
+see below.)
+
+@item FTS_XDEV
+During a traversal, do not cross onto a different mounted filesystem.
+@end table
+
+@item filedata
+The @code{filedata} array is first cleared. Then, @code{fts()} creates
+an element in @code{filedata} for every element in @code{pathlist}.
+The index is the name of the directory or file given in @code{pathlist}.
+The element for this index is itself an array. There are two cases.
+
+@c nested table
+@table @emph
+@item The path is a file.
+In this case, the array contains two or three elements:
+
+@c doubly nested table
+@table @code
+@item "path"
+The full path to this file, starting from the ``root'' that was given
+in the @code{pathlist} array.
+
+@item "stat"
+This element is itself an array, containing the same information as provided
+by the @code{stat()} function described earlier for its
+@code{statdata} argument. The element may not be present if
+the @code{stat()} system call for the file failed.
+
+@item "error"
+If some kind of error was encountered, the array will also
+contain an element named @code{"error"}, which is a string describing the error.
+@end table
+
+@item The path is a directory.
+In this case, the array contains one element for each entry in the
+directory. If an entry is a file, that element is as for files, just
+described. If the entry is a directory, that element is (recursively)
+an array describing the subdirectory. If @code{FTS_SEEDOT} was provided
+in the flags, then there will also be an element named @code{".."}. This
+element will be an array containing the data as provided by @code{stat()}.
+
+In addition, there will be an element whose index is @code{"."}.
+This element is an array containing the same two or three elements as
+for a file: @code{"path"}, @code{"stat"}, and @code{"error"}.
+@end table
+@end table
+
+The @code{fts()} function returns zero if there were no errors.
+Otherwise it returns @minus{}1.
+
+@quotation NOTE
+The @code{fts()} extension does not exactly mimic the
+interface of the C library @code{fts()} routines, choosing instead to
+provide an interface that is based on associative arrays, which should
+be more comfortable to use from an @command{awk} program. This includes the
+lack of a comparison function, since @command{gawk} already provides
+powerful array sorting facilities. While an @code{fts_read()}-like
+interface could have been provided, this felt less natural than simply
+creating a multi-dimensional array to represent the file hierarchy and
+its information.
+@end quotation
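+
+For comparison, the stream-style C library interface that the note
+mentions looks roughly like the following. This is a sketch only, not
+code from the extension; the function name
+@code{count_regular_files()} is made up for illustration, and it assumes
+a system (such as GNU/Linux or BSD) that supplies @file{fts.h}:
+
```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>

/* Count the regular files found under root, using a physical
   traversal.  Return -1 if the traversal cannot be set up or closed. */
long count_regular_files(const char *root)
{
    char *paths[2];
    FTS *tree;
    FTSENT *ent;
    long count = 0;

    paths[0] = (char *) root;   /* fts_open() wants a NULL-terminated list */
    paths[1] = NULL;

    tree = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (tree == NULL)
        return -1;

    /* one entry per call: this is the "stream" the extension avoids */
    while ((ent = fts_read(tree)) != NULL) {
        if (ent->fts_info == FTS_F)
            count++;
    }

    if (fts_close(tree) < 0)
        return -1;

    return count;
}
```
+
+The caller sees one @code{FTSENT} at a time and must keep its own state,
+whereas the extension hands the whole hierarchy back in a single array.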
+
+See @file{test/fts.awk} in the @command{gawk} distribution for an example.
+
+@node Extension Sample Fnmatch
+@subsection Interface To @code{fnmatch()}
+
+This extension provides an interface to the C library
+@code{fnmatch()} function. The usage is:
+
+@example
+@@load "fnmatch"
+
+result = fnmatch(pattern, string, flags)
+@end example
+
+The @code{fnmatch} extension adds a single function named
+@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of
+flag values named @code{FNM}.
+
+The arguments to @code{fnmatch()} are:
+
+@table @code
+@item pattern
+The filename wildcard to match.
+
+@item string
+The filename string to match against the pattern.
+
+@item flags
+Either zero, or the bitwise OR of one or more of the
+flags in the @code{FNM} array.
+@end table
+
+The return value is zero on success, @code{FNM_NOMATCH}
+if the string did not match the pattern, or
+a different non-zero value if an error occurred.
+
+The flags are as follows:
+
+@multitable @columnfractions .25 .75
+@item @code{FNM["CASEFOLD"]} @tab
+Corresponds to the @code{FNM_CASEFOLD} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["FILE_NAME"]} @tab
+Corresponds to the @code{FNM_FILE_NAME} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["LEADING_DIR"]} @tab
+Corresponds to the @code{FNM_LEADING_DIR} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["NOESCAPE"]} @tab
+Corresponds to the @code{FNM_NOESCAPE} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["PATHNAME"]} @tab
+Corresponds to the @code{FNM_PATHNAME} flag as defined in @code{fnmatch()}.
+
+@item @code{FNM["PERIOD"]} @tab
+Corresponds to the @code{FNM_PERIOD} flag as defined in @code{fnmatch()}.
+@end multitable
+
+Here is an example:
+
+@example
+@@load "fnmatch"
+@dots{}
+flags = or(FNM["PERIOD"], FNM["NOESCAPE"])
+if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH)
+ print "no match"
+@end example
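+
+The three-way return value (match, no match, error) comes straight from
+the C library. The following sketch shows the underlying call; the
+wrapper name @code{wildcard_match()} is hypothetical, and the mapping of
+results onto 1, 0, and @minus{}1 is a choice made for this illustration
+only:
+
```c
#include <assert.h>
#include <stddef.h>
#include <fnmatch.h>

/* Return 1 if string matches pattern, 0 if it does not,
   and -1 if fnmatch() reports an error. */
int wildcard_match(const char *pattern, const char *string, int flags)
{
    int ret;

    assert(pattern != NULL && string != NULL);

    ret = fnmatch(pattern, string, flags);
    if (ret == 0)
        return 1;
    if (ret == FNM_NOMATCH)
        return 0;

    return -1;                  /* some other nonzero value: an error */
}
```
+
+With @code{FNM_PERIOD} set, a leading period in the string must be
+matched explicitly, so the pattern @samp{*} does not match
+@file{.hidden}; without the flag, it does.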
+
+@node Extension Sample Fork
+@subsection Interface To @code{fork()}, @code{wait()} and @code{waitpid()}
+
+The @code{fork} extension adds three functions, as follows.
+
+@table @code
+@item @@load "fork"
+This is how you load the extension.
+
+@item pid = fork()
+This function creates a new process. The return value is zero in the
+child and the process-id number of the child in the parent, or @minus{}1
+upon error. In the latter case, @code{ERRNO} indicates the problem.
+In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are
+updated to reflect the correct values.
+
+@item ret = waitpid(pid)
+This function takes a numeric argument, which is the process-id to
+wait for. The return value is that of the
+@code{waitpid()} system call.
+
+@item ret = wait()
+This function waits for the first child to die.
+The return value is that of the
+@code{wait()} system call.
+@end table
+
+There is no corresponding @code{exec()} function.
+
+Here is an example:
+
+@example
+@@load "fork"
+@dots{}
+if ((pid = fork()) == 0)
+ print "hello from the child"
+else
+ print "hello from the parent"
+@end example
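+
+The same pattern at the C level, including the wait that the
+@command{awk} example leaves out, might look like the following sketch.
+The function @code{run_child()} is hypothetical and only illustrates the
+system calls that the extension wraps:
+
```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with the given code.
   Return the exit status seen by the parent, or -1 on error. */
int run_child(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed; errno says why */

    if (pid == 0)
        _exit(code);            /* child: fork() returned zero */

    /* parent: fork() returned the child's process ID */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
+
+A parent that never waits leaves a zombie process behind, which is why
+@code{fork()} is normally paired with @code{wait()} or @code{waitpid()}.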
+
+@node Extension Sample Ord
+@subsection Character and Numeric Values: @code{ord()} and @code{chr()}
+
+The @code{ordchr} extension adds two functions, named
+@code{ord()} and @code{chr()}, as follows.
+
+@table @code
+@item number = ord(string)
+Return the numeric value of the first character in @code{string}.
+
+@item char = chr(number)
+Return the string whose first character is that represented by @code{number}.
+@end table
+
+These functions are inspired by the Pascal language functions
+of the same name. Here is an example:
+
+@example
+@@load "ordchr"
+@dots{}
+printf("The numeric value of 'A' is %d\n", ord("A"))
+printf("The string value of 65 is %s\n", chr(65))
+@end example
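+
+In C terms, both conversions are little more than casts. The sketch
+below assumes single-byte characters, which may not hold for the real
+extension in a multibyte locale; the names @code{ord_first()} and
+@code{chr_string()} are made up for illustration:
+
```c
#include <assert.h>
#include <stddef.h>

/* Numeric value of the first byte of s (0 for the empty string). */
int ord_first(const char *s)
{
    assert(s != NULL);
    return (unsigned char) s[0];
}

/* Store the one-character string for n in out; out must hold 2 bytes. */
void chr_string(int n, char out[2])
{
    assert(out != NULL);
    out[0] = (char) n;
    out[1] = '\0';
}
```
+
+Casting through @code{unsigned char} keeps values above 127 from coming
+back negative on systems where plain @code{char} is signed.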
+
+@node Extension Sample Readdir
+@subsection Reading Directories
+
+The @code{readdir} extension adds an input parser for directories, and
+adds a single function named @code{readdir_do_ftype()}.
+The usage is as follows:
+
+@example
+@@load "readdir"
+
+readdir_do_ftype("stat") # or "dirent" or "never"
+@end example
+
+When this extension is in use, instead of skipping directories named
+on the command line (or with @code{getline}),
+they are read, with each entry returned as a record.
+
+The record consists of at least two fields: the inode number and the
+filename, separated by a forward slash character.
+On systems where the directory entry contains the file type, the record
+has a third field which is a single letter indicating the type of the
+file:
+
+@multitable @columnfractions .1 .9
+@headitem Letter @tab File Type
+@item @code{b} @tab Block device
+@item @code{c} @tab Character device
+@item @code{d} @tab Directory
+@item @code{f} @tab Regular file
+@item @code{l} @tab Symbolic link
+@item @code{p} @tab Named pipe (FIFO)
+@item @code{s} @tab Socket
+@item @code{u} @tab Anything else (unknown)
+@end multitable
+
+On systems without the file type information, calling
+@samp{readdir_do_ftype("stat")} causes the extension to use the
+@code{lstat()} system call to retrieve the appropriate information. This
+is not the default, since @code{lstat()} is a potentially expensive
+operation. By calling @samp{readdir_do_ftype("never")} one can ensure
+that the file type information is never displayed, even when readily
+available in the directory entry.
+
+The third option, @samp{readdir_do_ftype("dirent")}, takes file type
+information from the directory entry, if it is available. This is the
+default on systems that supply this information.
+
+The @code{readdir_do_ftype()} function sets @code{ERRNO} if called
+without arguments or with invalid arguments.
+
+@quotation NOTE
+On GNU/Linux systems, there are filesystems that don't support the
+@code{d_type} entry (see the @i{readdir}(3) manual page), and so the file
+type is always @samp{u}. Therefore, using @samp{readdir_do_ftype("stat")}
+is advisable even on GNU/Linux systems. In this case, the @code{readdir}
+extension falls back to using @code{lstat()} when it encounters an
+unknown file type.
+@end quotation
+
+Here is an example:
+
+@example
+@@load "readdir"
+@dots{}
+BEGIN @{ FS = "/" @}
+@{ print "file name is", $2 @}
+@end example
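+
+The record layout described above can be sketched in C as follows. This
+is not the extension's code: @code{ftype_letter()} and
+@code{format_record()} are hypothetical helpers, and the sketch assumes
+a system whose @file{dirent.h} defines the @code{DT_*} constants:
+
```c
#define _DEFAULT_SOURCE         /* for the DT_* constants on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <dirent.h>

/* Map a d_type value to the single-letter code from the table. */
char ftype_letter(unsigned char d_type)
{
    switch (d_type) {
    case DT_BLK:  return 'b';
    case DT_CHR:  return 'c';
    case DT_DIR:  return 'd';
    case DT_REG:  return 'f';
    case DT_LNK:  return 'l';
    case DT_FIFO: return 'p';
    case DT_SOCK: return 's';
    default:      return 'u';   /* DT_UNKNOWN and anything else */
    }
}

/* Build an "inode/name/type" record in buf; return snprintf()'s result. */
int format_record(char *buf, size_t size, unsigned long long ino,
                  const char *name, unsigned char d_type)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, size, "%llu/%s/%c", ino, name,
                    ftype_letter(d_type));
}
```
+
+Because a filename cannot contain a slash, splitting such a record with
+@samp{FS = "/"} is unambiguous.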
+
+@node Extension Sample Revout
+@subsection Reversing Output
+
+The @code{revoutput} extension adds a simple output wrapper that reverses
+the characters in each output line. Its main purpose is to show how to
+write an output wrapper, although it may be mildly amusing for the unwary.
+Here is an example:
+
+@example
+@@load "revoutput"
+
+BEGIN @{
+ REVOUT = 1
+ print "hello, world" > "/dev/stdout"
+@}
+@end example
+
+The output from this program is:
+@samp{dlrow ,olleh}.
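+
+The transformation itself is a plain in-place string reversal. The
+following sketch shows only that step; @code{reverse_line()} is a
+hypothetical name, and none of the output-wrapper registration that the
+real extension performs is shown:
+
```c
#include <assert.h>
#include <string.h>

/* Reverse the characters of s in place. */
void reverse_line(char *s)
{
    size_t i, n;

    assert(s != NULL);

    n = strlen(s);
    for (i = 0; i < n / 2; i++) {
        char c = s[i];

        s[i] = s[n - 1 - i];
        s[n - 1 - i] = c;
    }
}
```
+
+Applied to @samp{hello, world}, this yields @samp{dlrow ,olleh}, the
+output shown above.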
+
+@node Extension Sample Rev2way
+@subsection Two-Way I/O Example
+
+The @code{revtwoway} extension adds a simple two-way processor that
+reverses the characters in each line sent to it for reading back by
+the @command{awk} program. Its main purpose is to show how to write
+a two-way processor, although it may also be mildly amusing.
+The following example shows how to use it:
+
+@example
+@@load "revtwoway"
+
+BEGIN @{
+ cmd = "/magic/mirror"
+ print "hello, world" |& cmd
+ cmd |& getline result
+ print result
+ close(cmd)
+@}
+@end example
+
+@node Extension Sample Read write array
+@subsection Dumping and Restoring An Array
+
+The @code{rwarray} extension adds two functions,
+named @code{writea()} and @code{reada()}, as follows:
+
+@table @code
+@item ret = writea(file, array)
+This function takes a string argument, which is the name of the file
+to which to dump the array, and the array itself as the second argument.
+@code{writea()} understands multidimensional arrays. It returns one on
+success, or zero upon failure.
+
+@item ret = reada(file, array)
+@code{reada()} is the inverse of @code{writea()};
+it reads the file named as its first argument, filling in
+the array named as the second argument. It clears the array first.
+Here too, the return value is one on success and zero upon failure.
+@end table
+
+The array created by @code{reada()} is identical to that written by
+@code{writea()} in the sense that the contents are the same. However,
+due to implementation issues, the array traversal order of the recreated
+array is likely to be different from that of the original array. As array
+traversal order in @command{awk} is by default undefined, this is not
+(technically) a problem. If you need to guarantee a particular traversal
+order, use the array sorting features in @command{gawk} to do so
+(@pxref{Array Sorting}).
+
+The file contains binary data. All integral values are written in network
+byte order. However, double precision floating-point values are written
+as native binary data. Thus, arrays containing only string data can
+theoretically be dumped on systems with one byte order and restored on
+systems with a different one, but this has not been tried.
+
+Here is an example:
+
+@example
+@@load "rwarray"
+@dots{}
+ret = writea("arraydump.bin", array)
+@dots{}
+ret = reada("arraydump.bin", array)
+@end example
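+
+The byte-order convention described above can be illustrated with a
+short sketch. It assumes 32-bit values and uses the standard
+@code{htonl()} and @code{ntohl()} routines; the helper names
+@code{put_u32()} and @code{get_u32()} are hypothetical, and the actual
+file layout used by @code{rwarray} is not reproduced here:
+
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Store value in out as four bytes in network (big-endian) order. */
void put_u32(uint32_t value, unsigned char out[4])
{
    uint32_t net;

    assert(out != NULL);
    net = htonl(value);
    memcpy(out, &net, sizeof(net));
}

/* Read four bytes in network order and return the host-order value. */
uint32_t get_u32(const unsigned char in[4])
{
    uint32_t net;

    assert(in != NULL);
    memcpy(&net, in, sizeof(net));
    return ntohl(net);
}
```
+
+Whatever the host byte order, the bytes on disk are the same, which is
+what makes integral values portable. A @code{double} written as native
+binary data gets no such treatment.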
+
+@node Extension Sample Readfile
+@subsection Reading An Entire File
+
+The @code{readfile} extension adds a single function
+named @code{readfile()}:
+
+@table @code
+@item result = readfile("/some/path")
+The argument is the name of the file to read. The return value is a
+string containing the entire contents of the requested file. Upon error,
+the function returns the empty string and sets @code{ERRNO}.
+@end table
+
+Here is an example:
+
+@example
+@@load "readfile"
+@dots{}
+contents = readfile("/path/to/file");
+if (contents == "" && ERRNO != "") @{
+ print("problem reading file", ERRNO) > "/dev/stderr"
+ @dots{}
+@}
+@end example
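+
+Reading a whole file into memory is straightforward at the C level. The
+sketch below is one possible approach, not the extension's code: the
+function @code{read_whole_file()} is hypothetical, and it grows its
+buffer as it reads rather than trusting the size reported for the file:
+
```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

/* Read all of path into a malloc()ed, NUL-terminated buffer.
   Store the length in *len.  Return NULL upon error. */
char *read_whole_file(const char *path, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf;
    int fd;

    assert(path != NULL && len != NULL);

    if ((fd = open(path, O_RDONLY)) < 0)
        return NULL;

    if ((buf = malloc(cap + 1)) == NULL) {
        close(fd);
        return NULL;
    }

    for (;;) {
        ssize_t n = read(fd, buf + used, cap - used);

        if (n < 0) {            /* read error */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)             /* end of file */
            break;
        used += (size_t) n;
        if (used == cap) {      /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2 + 1);

            if (tmp == NULL) {
                free(buf);
                close(fd);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    close(fd);
    buf[used] = '\0';
    *len = used;
    return buf;
}
```
+
+An empty file and a failed read both produce an empty string at the
+@command{awk} level, which is why the example above also checks
+@code{ERRNO}.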
+
+@node Extension Sample API Tests
+@subsection API Tests
+
+The @code{testext} extension exercises parts of the extension API that
+are not tested by the other samples. The @file{extension/testext.c}
+file contains both the C code for the extension and @command{awk}
+test code inside C comments that run the tests. The testing framework
+extracts the @command{awk} code and runs the tests. See the source file
+for more information.
+
+@node Extension Sample Time
+@subsection Extension Time Functions
+
+@cindex time
+@cindex sleep
+
+These functions can be used by either invoking @command{gawk}
+with a command-line argument of @samp{-l time} or by
+inserting @samp{@@load "time"} in your script.
+
+@table @code
+
+@cindex @code{gettimeofday} time extension function
+@item the_time = gettimeofday()
+Return the time in seconds that has elapsed since 1970-01-01 UTC as a
+floating-point value. If the time is unavailable on this platform, return
+@minus{}1 and set @code{ERRNO}. The returned time should have sub-second
+precision, but the actual precision will vary based on the platform.
+If the standard C @code{gettimeofday()} system call is available on this
+platform, then it simply returns the value. Otherwise, if on Windows,
+it tries to use @code{GetSystemTimeAsFileTime()}.
+
+@cindex @code{sleep} time extension function
+@item result = sleep(@var{seconds})
+Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative,
+or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}.
+Otherwise, return zero after sleeping for the indicated amount of time.
+Note that @var{seconds} may be a floating-point (non-integral) value.
+Implementation details: depending on platform availability, this function
+tries to use @code{nanosleep()} or @code{select()} to implement the delay.
+@end table
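+
+On a POSIX system, the two functions could be built along the following
+lines. This is a sketch under that assumption, covering only the
+@code{gettimeofday()} and @code{nanosleep()} cases; the names
+@code{now_seconds()} and @code{sleep_seconds()} are hypothetical:
+
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Seconds since the Epoch as a double, or -1 if unavailable. */
double now_seconds(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) < 0)
        return -1.0;

    return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
}

/* Sleep for a possibly fractional number of seconds.
   Return 0 on success; return -1 and set errno upon failure. */
int sleep_seconds(double seconds)
{
    struct timespec req;

    if (seconds < 0) {
        errno = EINVAL;
        return -1;
    }

    /* split into whole seconds and leftover nanoseconds */
    req.tv_sec = (time_t) seconds;
    req.tv_nsec = (long) ((seconds - (double) req.tv_sec) * 1e9);

    return nanosleep(&req, NULL) < 0 ? -1 : 0;
}
```
+
+Splitting the argument into whole seconds and nanoseconds is what lets
+a call such as @samp{sleep(0.25)} work with a non-integral value.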
+
+@node gawkextlib
+@section The @code{gawkextlib} Project
+
+The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}}
+project provides a number of @command{gawk} extensions, including one for
+processing XML files. This is the evolution of the original @command{xgawk}
+(XML @command{gawk}) project.
+
+As of this writing, there are four extensions:
+
+@itemize @bullet
+@item
+XML parser extension, using the @uref{http://expat.sourceforge.net, Expat}
+XML parsing library.
+
+@item
+PostgreSQL extension.
+
+@item
+GD graphics library extension.
+
+@item
+MPFR library extension.
+This provides access to a number of MPFR functions which @command{gawk}'s
+native MPFR support does not.
+@end itemize
+
+The @code{time} extension described earlier (@pxref{Extension Sample
+Time}) was originally from this project but has been moved into the
+main @command{gawk} distribution.
+
+You can check out the code for the @code{gawkextlib} project
+using the @uref{http://git-scm.com, Git} distributed source
+code control system. The command is as follows:
+
+@example
+git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code
+@end example
+
+You will need to have the @uref{http://expat.sourceforge.net, Expat}
+XML parser library installed in order to build and use the XML extension.
+
+In addition, you must have the GNU Autotools installed
+(@uref{http://www.gnu.org/software/autoconf, Autoconf},
+@uref{http://www.gnu.org/software/automake, Automake},
+@uref{http://www.gnu.org/software/libtool, Libtool},
+and
+@uref{http://www.gnu.org/software/gettext, Gettext}).
+
+The simple recipe for building and testing @code{gawkextlib} is as follows.
+First, build and install @command{gawk}:
+
+@example
+cd .../path/to/gawk/code
+./configure --prefix=/tmp/newgawk @ii{Install in /tmp/newgawk for now}
+make && make check @ii{Build and check that all is OK}
+make install @ii{Install gawk}
+@end example
+
+Next, build @code{gawkextlib} and test it:
+
+@example
+cd .../path/to/gawkextlib-code
+./update-autotools @ii{Generate configure, etc.}
+ @ii{You may have to run this command twice}
+./configure --with-gawk=/tmp/newgawk @ii{Configure, point at ``installed'' gawk}
+make && make check @ii{Build and check that all is OK}
+@end example
+
+If you write an extension that you wish to share with other
+@command{gawk} users, please consider doing so through the
+@code{gawkextlib} project.
+
@ignore
@c Try this