author | Arnold D. Robbins <arnold@skeeve.com> | 2012-11-04 15:14:34 +0200 |
---|---|---|
committer | Arnold D. Robbins <arnold@skeeve.com> | 2012-11-04 15:14:34 +0200 |
commit | 2204f38c05fef5747b8f6764a202b646f4126338 (patch) | |
tree | a24b6a63658bb9c0ef4baf989061f65959496992 /doc/gawk.texi | |
parent | 5d3c11459bf9c8870cfc599722118b910aa17394 (diff) | |
Finally! Integrated API chapter into gawk doc.
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 4521 |
1 files changed, 3946 insertions, 575 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 59695171..573768ea 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -321,419 +321,531 @@ particular records in a file and perform operations upon them.
 * Index:: Concept and Variable Index.
 @detailmenu
-* History:: The history of @command{gawk} and
-    @command{awk}.
-* Names:: What name to use to find @command{awk}.
-* This Manual:: Using this @value{DOCUMENT}. Includes
-    sample input files that you can use.
-* Conventions:: Typographical Conventions.
-* Manual History:: Brief history of the GNU project and this
-    @value{DOCUMENT}.
-* How To Contribute:: Helping to save the world.
-* Acknowledgments:: Acknowledgments.
-* Running gawk:: How to run @command{gawk} programs;
-    includes command-line syntax.
-* One-shot:: Running a short throwaway @command{awk}
-    program.
-* Read Terminal:: Using no input files (input from terminal
-    instead).
-* Long:: Putting permanent @command{awk} programs in
-    files.
-* Executable Scripts:: Making self-contained @command{awk}
-    programs.
-* Comments:: Adding documentation to @command{gawk}
-    programs.
-* Quoting:: More discussion of shell quoting issues.
-* DOS Quoting:: Quoting in Windows Batch Files.
-* Sample Data Files:: Sample data files for use in the
-    @command{awk} programs illustrated in this
-    @value{DOCUMENT}.
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example using two
-    rules.
-* More Complex:: A more complex example.
-* Statements/Lines:: Subdividing or combining statements into
-    lines.
-* Other Features:: Other Features of @command{awk}.
-* When:: When to use @command{gawk} and when to use
-    other things.
-* Command Line:: How to run @command{awk}.
-* Options:: Command-line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* Naming Standard Input:: How to specify standard input with other
-    files.
-* Environment Variables:: The environment variables @command{gawk}
-    uses.
-* AWKPATH Variable:: Searching directories for @command{awk} - programs. -* AWKLIBPATH Variable:: Searching directories for @command{awk} - shared libraries. -* Other Environment Variables:: The environment variables. -* Exit Status:: @command{gawk}'s exit status. -* Include Files:: Including other files into your program. -* Loading Shared Libraries:: Loading shared libraries into your program. -* Obsolete:: Obsolete Options and/or features. -* Undocumented:: Undocumented Options and Features. -* Regexp Usage:: How to Use Regular Expressions. -* Escape Sequences:: How to write nonprinting characters. -* Regexp Operators:: Regular Expression Operators. -* Bracket Expressions:: What can go between @samp{[...]}. -* GNU Regexp Operators:: Operators specific to GNU software. -* Case-sensitivity:: How to do case-insensitive matching. -* Leftmost Longest:: How much text matches. -* Computed Regexps:: Using Dynamic Regexps. -* Records:: Controlling how data is split into records. -* Fields:: An introduction to fields. -* Nonconstant Fields:: Nonconstant Field Numbers. -* Changing Fields:: Changing the Contents of a Field. -* Field Separators:: The field separator and how to change it. -* Default Field Splitting:: How fields are normally separated. -* Regexp Field Splitting:: Using regexps as the field separator. -* Single Character Fields:: Making each character a separate field. -* Command Line Field Separator:: Setting @code{FS} from the command-line. -* Field Splitting Summary:: Some final points and a summary table. -* Constant Size:: Reading constant width data. -* Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. -* Getline:: Reading files under explicit program - control using the @code{getline} function. -* Plain Getline:: Using @code{getline} with no arguments. -* Getline/Variable:: Using @code{getline} into a variable. -* Getline/File:: Using @code{getline} from a file. 
-* Getline/Variable/File:: Using @code{getline} into a variable from a - file. -* Getline/Pipe:: Using @code{getline} from a pipe. -* Getline/Variable/Pipe:: Using @code{getline} into a variable from a - pipe. -* Getline/Coprocess:: Using @code{getline} from a coprocess. -* Getline/Variable/Coprocess:: Using @code{getline} into a variable from a - coprocess. -* Getline Notes:: Important things to know about - @code{getline}. -* Getline Summary:: Summary of @code{getline} Variants. -* Read Timeout:: Reading input with a timeout. -* Command line directories:: What happens if you put a directory on the - command line. -* Print:: The @code{print} statement. -* Print Examples:: Simple examples of @code{print} statements. -* Output Separators:: The output separators and how to change - them. -* OFMT:: Controlling Numeric Output With - @code{print}. -* Printf:: The @code{printf} statement. -* Basic Printf:: Syntax of the @code{printf} statement. -* Control Letters:: Format-control letters. -* Format Modifiers:: Format-specification modifiers. -* Printf Examples:: Several examples. -* Redirection:: How to redirect output to multiple files - and pipes. -* Special Files:: File name interpretation in @command{gawk}. - @command{gawk} allows access to inherited - file descriptors. -* Special FD:: Special files for I/O. -* Special Network:: Special files for network communications. -* Special Caveats:: Things to watch out for. -* Close Files And Pipes:: Closing Input and Output Files and Pipes. -* Values:: Constants, Variables, and Regular - Expressions. -* Constants:: String, numeric and regexp constants. -* Scalar Constants:: Numeric and string constants. -* Nondecimal-numbers:: What are octal and hex numbers. -* Regexp Constants:: Regular Expression constants. -* Using Constant Regexps:: When and how to use a regexp constant. -* Variables:: Variables give names to values for later - use. -* Using Variables:: Using variables in your programs. 
-* Assignment Options:: Setting variables on the command-line and a - summary of command-line syntax. This is an - advanced method of input. -* Conversion:: The conversion of strings to numbers and - vice versa. -* All Operators:: @command{gawk}'s operators. -* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, - etc.) -* Concatenation:: Concatenating strings. -* Assignment Ops:: Changing the value of a variable or a - field. -* Increment Ops:: Incrementing the numeric value of a - variable. -* Truth Values and Conditions:: Testing for true and false. -* Truth Values:: What is ``true'' and what is ``false''. -* Typing and Comparison:: How variables acquire types and how this - affects comparison of numbers and strings - with @samp{<}, etc. -* Variable Typing:: String type versus numeric type. -* Comparison Operators:: The comparison operators. -* POSIX String Comparison:: String comparison with POSIX rules. -* Boolean Ops:: Combining comparison expressions using - boolean operators @samp{||} (``or''), - @samp{&&} (``and'') and @samp{!} (``not''). -* Conditional Exp:: Conditional expressions select between two - subexpressions under control of a third - subexpression. -* Function Calls:: A function call is an expression. -* Precedence:: How various operators nest. -* Locales:: How the locale affects things. -* Pattern Overview:: What goes into a pattern. -* Regexp Patterns:: Using regexps as patterns. -* Expression Patterns:: Any expression can be used as a pattern. -* Ranges:: Pairs of patterns specify record ranges. -* BEGIN/END:: Specifying initialization and cleanup - rules. -* Using BEGIN/END:: How and why to use BEGIN/END rules. -* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. -* BEGINFILE/ENDFILE:: Two special patterns for advanced control. -* Empty:: The empty pattern, which matches every - record. -* Using Shell Variables:: How to use shell variables with - @command{awk}. -* Action Overview:: What goes into an action. 
-* Statements:: Describes the various control statements in - detail. -* If Statement:: Conditionally execute some @command{awk} - statements. -* While Statement:: Loop until some condition is satisfied. -* Do Statement:: Do specified action while looping until - some condition is satisfied. -* For Statement:: Another looping statement, that provides - initialization and increment clauses. -* Switch Statement:: Switch/case evaluation for conditional - execution of statements based on a value. -* Break Statement:: Immediately exit the innermost enclosing - loop. -* Continue Statement:: Skip to the end of the innermost enclosing - loop. -* Next Statement:: Stop processing the current input record. -* Nextfile Statement:: Stop processing the current file. -* Exit Statement:: Stop execution of @command{awk}. -* Built-in Variables:: Summarizes the built-in variables. -* User-modified:: Built-in variables that you change to - control @command{awk}. -* Auto-set:: Built-in variables where @command{awk} - gives you information. -* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}. -* Array Basics:: The basics of arrays. -* Array Intro:: Introduction to Arrays -* Reference to Elements:: How to examine one element of an array. -* Assigning Elements:: How to change an element of an array. -* Array Example:: Basic Example of an Array -* Scanning an Array:: A variation of the @code{for} statement. It - loops through the indices of an array's - existing elements. -* Controlling Scanning:: Controlling the order in which arrays are - scanned. -* Delete:: The @code{delete} statement removes an - element from an array. -* Numeric Array Subscripts:: How to use numbers as subscripts in - @command{awk}. -* Uninitialized Subscripts:: Using Uninitialized variables as - subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in - @command{awk}. -* Multi-scanning:: Scanning multidimensional arrays. -* Arrays of Arrays:: True multidimensional arrays. 
-* Built-in:: Summarizes the built-in functions. -* Calling Built-in:: How to call built-in functions. -* Numeric Functions:: Functions that work with numbers, including - @code{int()}, @code{sin()} and - @code{rand()}. -* String Functions:: Functions for string manipulation, such as - @code{split()}, @code{match()} and - @code{sprintf()}. -* Gory Details:: More than you want to know about @samp{\} - and @samp{&} with @code{sub()}, - @code{gsub()}, and @code{gensub()}. -* I/O Functions:: Functions for files and shell commands. -* Time Functions:: Functions for dealing with timestamps. -* Bitwise Functions:: Functions for bitwise operations. -* Type Functions:: Functions for type information. -* I18N Functions:: Functions for string translation. -* User-defined:: Describes User-defined functions in detail. -* Definition Syntax:: How to write definitions and what they - mean. -* Function Example:: An example function definition and what it - does. -* Function Caveats:: Things to watch out for. -* Calling A Function:: Don't use spaces. -* Variable Scope:: Controlling variable scope. -* Pass By Value/Reference:: Passing parameters. -* Return Statement:: Specifying the value a function returns. -* Dynamic Typing:: How variable types can change at runtime. -* Indirect Calls:: Choosing the function to call at runtime. -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU @code{gettext} works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging @code{printf} arguments. -* I18N Portability:: @command{awk}-level portability issues. -* I18N Example:: A simple i18n example. -* Gawk I18N:: @command{gawk} is also internationalized. -* Nondecimal Data:: Allowing nondecimal input data. -* Array Sorting:: Facilities for controlling array traversal - and sorting arrays. 
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. -* Array Sorting Functions:: How to use @code{asort()} and - @code{asorti()}. -* Two-way I/O:: Two-way communications with another - process. -* TCP/IP Networking:: Using @command{gawk} for network - programming. -* Profiling:: Profiling your @command{awk} programs. -* Library Names:: How to best name private global variables - in library functions. -* General Functions:: Functions that are of general use. -* Strtonum Function:: A replacement for the built-in - @code{strtonum()} function. -* Assert Function:: A function for assertions in @command{awk} - programs. -* Round Function:: A function for rounding if @code{sprintf()} - does not do it correctly. -* Cliff Random Function:: The Cliff Random Number Generator. -* Ordinal Functions:: Functions for using characters as numbers - and vice versa. -* Join Function:: A function to join an array into a string. -* Getlocaltime Function:: A function to get formatted times. -* Data File Management:: Functions for managing command-line data - files. -* Filetrans Function:: A function for handling data file - transitions. -* Rewind Function:: A function for rereading the current file. -* File Checking:: Checking that data files are readable. -* Empty Files:: Checking for zero-length files. -* Ignoring Assigns:: Treating assignments as file names. -* Getopt Function:: A function for processing command-line - arguments. -* Passwd Functions:: Functions for getting user information. -* Group Functions:: Functions for getting group information. -* Walking Arrays:: A function to walk arrays of arrays. -* Running Examples:: How to run these examples. -* Clones:: Clones of common utilities. -* Cut Program:: The @command{cut} utility. -* Egrep Program:: The @command{egrep} utility. -* Id Program:: The @command{id} utility. -* Split Program:: The @command{split} utility. -* Tee Program:: The @command{tee} utility. -* Uniq Program:: The @command{uniq} utility. 
-* Wc Program:: The @command{wc} utility. -* Miscellaneous Programs:: Some interesting @command{awk} programs. -* Dupword Program:: Finding duplicated words in a document. -* Alarm Program:: An alarm clock. -* Translate Program:: A program similar to the @command{tr} - utility. -* Labels Program:: Printing mailing labels. -* Word Sorting:: A program to produce a word usage count. -* History Sorting:: Eliminating duplicate entries from a - history file. -* Extract Program:: Pulling out programs from Texinfo source - files. -* Simple Sed:: A Simple Stream Editor. -* Igawk Program:: A wrapper for @command{awk} that includes - files. -* Anagram Program:: Finding anagrams from a dictionary. -* Signature Program:: People do amazing things with too much time - on their hands. -* Debugging:: Introduction to @command{gawk} debugger. -* Debugging Concepts:: Debugging in General. -* Debugging Terms:: Additional Debugging Concepts. -* Awk Debugging:: Awk Debugging. -* Sample Debugging Session:: Sample debugging session. -* Debugger Invocation:: How to Start the Debugger. -* Finding The Bug:: Finding the Bug. -* List of Debugger Commands:: Main debugger commands. -* Breakpoint Control:: Control of Breakpoints. -* Debugger Execution Control:: Control of Execution. -* Viewing And Changing Data:: Viewing and Changing Data. -* Execution Stack:: Dealing with the Stack. -* Debugger Info:: Obtaining Information about the Program and - the Debugger State. -* Miscellaneous Debugger Commands:: Miscellaneous Commands. -* Readline Support:: Readline support. -* Limitations:: Limitations and future plans. -* General Arithmetic:: An introduction to computer arithmetic. -* Floating Point Issues:: Stuff to know about floating-point numbers. -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. 
-* Integer Programming:: Effective integer programming. -* Floating-point Programming:: Effective Floating-point Programming. -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -* Gawk and MPFR:: How @command{gawk} provides - arbitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point - Arithmetic with @command{gawk}. -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point - numbers. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - @command{gawk}. -* Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -* V7/SVR3.1:: The major changes between V7 and System V - Release 3.1. -* SVR4:: Minor changes between System V Releases 3.1 - and 4. -* POSIX:: New features from the POSIX standard. -* BTL:: New features from Brian Kernighan's version - of @command{awk}. -* POSIX/GNU:: The extensions in @command{gawk} not in - POSIX @command{awk}. -* Common Extensions:: Common Extensions Summary. -* Ranges and Locales:: How locales used to affect regexp ranges. -* Contributors:: The major contributors to @command{gawk}. -* Gawk Distribution:: What is in the @command{gawk} distribution. -* Getting:: How to get the distribution. -* Extracting:: How to extract the distribution. -* Distribution contents:: What is in the distribution. -* Unix Installation:: Installing @command{gawk} under various - versions of Unix. 
-* Quick Installation:: Compiling @command{gawk} under Unix. -* Additional Configuration Options:: Other compile-time options. -* Configuration Philosophy:: How it's all supposed to work. -* Non-Unix Installation:: Installation on Other Operating Systems. -* PC Installation:: Installing and Compiling @command{gawk} on - MS-DOS and OS/2. -* PC Binary Installation:: Installing a prepared distribution. -* PC Compiling:: Compiling @command{gawk} for MS-DOS, - Windows32, and OS/2. -* PC Testing:: Testing @command{gawk} on PC systems. -* PC Using:: Running @command{gawk} on MS-DOS, Windows32 - and OS/2. -* Cygwin:: Building and running @command{gawk} for - Cygwin. -* MSYS:: Using @command{gawk} In The MSYS - Environment. -* VMS Installation:: Installing @command{gawk} on VMS. -* VMS Compilation:: How to compile @command{gawk} under VMS. -* VMS Installation Details:: How to install @command{gawk} under VMS. -* VMS Running:: How to run @command{gawk} under VMS. -* VMS Old Gawk:: An old version comes with some VMS systems. -* Bugs:: Reporting Problems and Bugs. -* Other Versions:: Other freely available @command{awk} - implementations. -* Compatibility Mode:: How to disable certain @command{gawk} - extensions. -* Additions:: Making Additions To @command{gawk}. -* Accessing The Source:: Accessing the Git repository. -* Adding Code:: Adding code to the main body of - @command{gawk}. -* New Ports:: Porting @command{gawk} to a new operating - system. -* Derived Files:: Why derived files are kept in the - @command{git} repository. -* Future Extensions:: New features that may be implemented one - day. -* Basic High Level:: The high level view. -* Basic Data Typing:: A very quick intro to data types. +* History:: The history of @command{gawk} and + @command{awk}. +* Names:: What name to use to find + @command{awk}. +* This Manual:: Using this @value{DOCUMENT}. Includes + sample input files that you can use. +* Conventions:: Typographical Conventions. 
+* Manual History:: Brief history of the GNU project and + this @value{DOCUMENT}. +* How To Contribute:: Helping to save the world. +* Acknowledgments:: Acknowledgments. +* Running gawk:: How to run @command{gawk} programs; + includes command-line syntax. +* One-shot:: Running a short throwaway + @command{awk} program. +* Read Terminal:: Using no input files (input from + terminal instead). +* Long:: Putting permanent @command{awk} + programs in files. +* Executable Scripts:: Making self-contained @command{awk} + programs. +* Comments:: Adding documentation to @command{gawk} + programs. +* Quoting:: More discussion of shell quoting + issues. +* DOS Quoting:: Quoting in Windows Batch Files. +* Sample Data Files:: Sample data files for use in the + @command{awk} programs illustrated in + this @value{DOCUMENT}. +* Very Simple:: A very simple example. +* Two Rules:: A less simple one-line example using + two rules. +* More Complex:: A more complex example. +* Statements/Lines:: Subdividing or combining statements + into lines. +* Other Features:: Other Features of @command{awk}. +* When:: When to use @command{gawk} and when to + use other things. +* Command Line:: How to run @command{awk}. +* Options:: Command-line options and their + meanings. +* Other Arguments:: Input file names and variable + assignments. +* Naming Standard Input:: How to specify standard input with + other files. +* Environment Variables:: The environment variables + @command{gawk} uses. +* AWKPATH Variable:: Searching directories for + @command{awk} programs. +* AWKLIBPATH Variable:: Searching directories for + @command{awk} shared libraries. +* Other Environment Variables:: The environment variables. +* Exit Status:: @command{gawk}'s exit status. +* Include Files:: Including other files into your + program. +* Loading Shared Libraries:: Loading shared libraries into your + program. +* Obsolete:: Obsolete Options and/or features. +* Undocumented:: Undocumented Options and Features. 
+* Regexp Usage:: How to Use Regular Expressions. +* Escape Sequences:: How to write nonprinting characters. +* Regexp Operators:: Regular Expression Operators. +* Bracket Expressions:: What can go between @samp{[...]}. +* GNU Regexp Operators:: Operators specific to GNU software. +* Case-sensitivity:: How to do case-insensitive matching. +* Leftmost Longest:: How much text matches. +* Computed Regexps:: Using Dynamic Regexps. +* Records:: Controlling how data is split into + records. +* Fields:: An introduction to fields. +* Nonconstant Fields:: Nonconstant Field Numbers. +* Changing Fields:: Changing the Contents of a Field. +* Field Separators:: The field separator and how to change + it. +* Default Field Splitting:: How fields are normally separated. +* Regexp Field Splitting:: Using regexps as the field separator. +* Single Character Fields:: Making each character a separate + field. +* Command Line Field Separator:: Setting @code{FS} from the + command-line. +* Field Splitting Summary:: Some final points and a summary table. +* Constant Size:: Reading constant width data. +* Splitting By Content:: Defining Fields By Content +* Multiple Line:: Reading multi-line records. +* Getline:: Reading files under explicit program + control using the @code{getline} + function. +* Plain Getline:: Using @code{getline} with no + arguments. +* Getline/Variable:: Using @code{getline} into a variable. +* Getline/File:: Using @code{getline} from a file. +* Getline/Variable/File:: Using @code{getline} into a variable + from a file. +* Getline/Pipe:: Using @code{getline} from a pipe. +* Getline/Variable/Pipe:: Using @code{getline} into a variable + from a pipe. +* Getline/Coprocess:: Using @code{getline} from a coprocess. +* Getline/Variable/Coprocess:: Using @code{getline} into a variable + from a coprocess. +* Getline Notes:: Important things to know about + @code{getline}. +* Getline Summary:: Summary of @code{getline} Variants. +* Read Timeout:: Reading input with a timeout. 
+* Command line directories:: What happens if you put a directory on + the command line. +* Print:: The @code{print} statement. +* Print Examples:: Simple examples of @code{print} + statements. +* Output Separators:: The output separators and how to + change them. +* OFMT:: Controlling Numeric Output With + @code{print}. +* Printf:: The @code{printf} statement. +* Basic Printf:: Syntax of the @code{printf} statement. +* Control Letters:: Format-control letters. +* Format Modifiers:: Format-specification modifiers. +* Printf Examples:: Several examples. +* Redirection:: How to redirect output to multiple + files and pipes. +* Special Files:: File name interpretation in + @command{gawk}. @command{gawk} allows + access to inherited file descriptors. +* Special FD:: Special files for I/O. +* Special Network:: Special files for network + communications. +* Special Caveats:: Things to watch out for. +* Close Files And Pipes:: Closing Input and Output Files and + Pipes. +* Values:: Constants, Variables, and Regular + Expressions. +* Constants:: String, numeric and regexp constants. +* Scalar Constants:: Numeric and string constants. +* Nondecimal-numbers:: What are octal and hex numbers. +* Regexp Constants:: Regular Expression constants. +* Using Constant Regexps:: When and how to use a regexp constant. +* Variables:: Variables give names to values for + later use. +* Using Variables:: Using variables in your programs. +* Assignment Options:: Setting variables on the command-line + and a summary of command-line syntax. + This is an advanced method of input. +* Conversion:: The conversion of strings to numbers + and vice versa. +* All Operators:: @command{gawk}'s operators. +* Arithmetic Ops:: Arithmetic operations (@samp{+}, + @samp{-}, etc.) +* Concatenation:: Concatenating strings. +* Assignment Ops:: Changing the value of a variable or a + field. +* Increment Ops:: Incrementing the numeric value of a + variable. 
+* Truth Values and Conditions:: Testing for true and false. +* Truth Values:: What is ``true'' and what is + ``false''. +* Typing and Comparison:: How variables acquire types and how + this affects comparison of numbers and + strings with @samp{<}, etc. +* Variable Typing:: String type versus numeric type. +* Comparison Operators:: The comparison operators. +* POSIX String Comparison:: String comparison with POSIX rules. +* Boolean Ops:: Combining comparison expressions using + boolean operators @samp{||} (``or''), + @samp{&&} (``and'') and @samp{!} + (``not''). +* Conditional Exp:: Conditional expressions select between + two subexpressions under control of a + third subexpression. +* Function Calls:: A function call is an expression. +* Precedence:: How various operators nest. +* Locales:: How the locale affects things. +* Pattern Overview:: What goes into a pattern. +* Regexp Patterns:: Using regexps as patterns. +* Expression Patterns:: Any expression can be used as a + pattern. +* Ranges:: Pairs of patterns specify record + ranges. +* BEGIN/END:: Specifying initialization and cleanup + rules. +* Using BEGIN/END:: How and why to use BEGIN/END rules. +* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. +* BEGINFILE/ENDFILE:: Two special patterns for advanced + control. +* Empty:: The empty pattern, which matches every + record. +* Using Shell Variables:: How to use shell variables with + @command{awk}. +* Action Overview:: What goes into an action. +* Statements:: Describes the various control + statements in detail. +* If Statement:: Conditionally execute some + @command{awk} statements. +* While Statement:: Loop until some condition is + satisfied. +* Do Statement:: Do specified action while looping + until some condition is satisfied. +* For Statement:: Another looping statement, that + provides initialization and increment + clauses. +* Switch Statement:: Switch/case evaluation for conditional + execution of statements based on a + value. 
+* Break Statement:: Immediately exit the innermost + enclosing loop. +* Continue Statement:: Skip to the end of the innermost + enclosing loop. +* Next Statement:: Stop processing the current input + record. +* Nextfile Statement:: Stop processing the current file. +* Exit Statement:: Stop execution of @command{awk}. +* Built-in Variables:: Summarizes the built-in variables. +* User-modified:: Built-in variables that you change to + control @command{awk}. +* Auto-set:: Built-in variables where @command{awk} + gives you information. +* ARGC and ARGV:: Ways to use @code{ARGC} and + @code{ARGV}. +* Array Basics:: The basics of arrays. +* Array Intro:: Introduction to Arrays +* Reference to Elements:: How to examine one element of an + array. +* Assigning Elements:: How to change an element of an array. +* Array Example:: Basic Example of an Array +* Scanning an Array:: A variation of the @code{for} + statement. It loops through the + indices of an array's existing + elements. +* Controlling Scanning:: Controlling the order in which arrays + are scanned. +* Delete:: The @code{delete} statement removes an + element from an array. +* Numeric Array Subscripts:: How to use numbers as subscripts in + @command{awk}. +* Uninitialized Subscripts:: Using Uninitialized variables as + subscripts. +* Multi-dimensional:: Emulating multidimensional arrays in + @command{awk}. +* Multi-scanning:: Scanning multidimensional arrays. +* Arrays of Arrays:: True multidimensional arrays. +* Built-in:: Summarizes the built-in functions. +* Calling Built-in:: How to call built-in functions. +* Numeric Functions:: Functions that work with numbers, + including @code{int()}, @code{sin()} + and @code{rand()}. +* String Functions:: Functions for string manipulation, + such as @code{split()}, @code{match()} + and @code{sprintf()}. +* Gory Details:: More than you want to know about + @samp{\} and @samp{&} with + @code{sub()}, @code{gsub()}, and + @code{gensub()}. 
+* I/O Functions:: Functions for files and shell + commands. +* Time Functions:: Functions for dealing with timestamps. +* Bitwise Functions:: Functions for bitwise operations. +* Type Functions:: Functions for type information. +* I18N Functions:: Functions for string translation. +* User-defined:: Describes User-defined functions in + detail. +* Definition Syntax:: How to write definitions and what they + mean. +* Function Example:: An example function definition and + what it does. +* Function Caveats:: Things to watch out for. +* Calling A Function:: Don't use spaces. +* Variable Scope:: Controlling variable scope. +* Pass By Value/Reference:: Passing parameters. +* Return Statement:: Specifying the value a function + returns. +* Dynamic Typing:: How variable types can change at + runtime. +* Indirect Calls:: Choosing the function to call at + runtime. +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU @code{gettext} works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging @code{printf} arguments. +* I18N Portability:: @command{awk}-level portability + issues. +* I18N Example:: A simple i18n example. +* Gawk I18N:: @command{gawk} is also + internationalized. +* Nondecimal Data:: Allowing nondecimal input data. +* Array Sorting:: Facilities for controlling array + traversal and sorting arrays. +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use @code{asort()} and + @code{asorti()}. +* Two-way I/O:: Two-way communications with another + process. +* TCP/IP Networking:: Using @command{gawk} for network + programming. +* Profiling:: Profiling your @command{awk} programs. +* Library Names:: How to best name private global + variables in library functions. +* General Functions:: Functions that are of general use. 
+* Strtonum Function:: A replacement for the built-in + @code{strtonum()} function. +* Assert Function:: A function for assertions in + @command{awk} programs. +* Round Function:: A function for rounding if + @code{sprintf()} does not do it + correctly. +* Cliff Random Function:: The Cliff Random Number Generator. +* Ordinal Functions:: Functions for using characters as + numbers and vice versa. +* Join Function:: A function to join an array into a + string. +* Getlocaltime Function:: A function to get formatted times. +* Data File Management:: Functions for managing command-line + data files. +* Filetrans Function:: A function for handling data file + transitions. +* Rewind Function:: A function for rereading the current + file. +* File Checking:: Checking that data files are readable. +* Empty Files:: Checking for zero-length files. +* Ignoring Assigns:: Treating assignments as file names. +* Getopt Function:: A function for processing command-line + arguments. +* Passwd Functions:: Functions for getting user + information. +* Group Functions:: Functions for getting group + information. +* Walking Arrays:: A function to walk arrays of arrays. +* Running Examples:: How to run these examples. +* Clones:: Clones of common utilities. +* Cut Program:: The @command{cut} utility. +* Egrep Program:: The @command{egrep} utility. +* Id Program:: The @command{id} utility. +* Split Program:: The @command{split} utility. +* Tee Program:: The @command{tee} utility. +* Uniq Program:: The @command{uniq} utility. +* Wc Program:: The @command{wc} utility. +* Miscellaneous Programs:: Some interesting @command{awk} + programs. +* Dupword Program:: Finding duplicated words in a + document. +* Alarm Program:: An alarm clock. +* Translate Program:: A program similar to the @command{tr} + utility. +* Labels Program:: Printing mailing labels. +* Word Sorting:: A program to produce a word usage + count. +* History Sorting:: Eliminating duplicate entries from a + history file. 
+* Extract Program:: Pulling out programs from Texinfo + source files. +* Simple Sed:: A Simple Stream Editor. +* Igawk Program:: A wrapper for @command{awk} that + includes files. +* Anagram Program:: Finding anagrams from a dictionary. +* Signature Program:: People do amazing things with too much + time on their hands. +* Debugging:: Introduction to @command{gawk} + debugger. +* Debugging Concepts:: Debugging in General. +* Debugging Terms:: Additional Debugging Concepts. +* Awk Debugging:: Awk Debugging. +* Sample Debugging Session:: Sample debugging session. +* Debugger Invocation:: How to Start the Debugger. +* Finding The Bug:: Finding the Bug. +* List of Debugger Commands:: Main debugger commands. +* Breakpoint Control:: Control of Breakpoints. +* Debugger Execution Control:: Control of Execution. +* Viewing And Changing Data:: Viewing and Changing Data. +* Execution Stack:: Dealing with the Stack. +* Debugger Info:: Obtaining Information about the + Program and the Debugger State. +* Miscellaneous Debugger Commands:: Miscellaneous Commands. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. +* General Arithmetic:: An introduction to computer + arithmetic. +* Floating Point Issues:: Stuff to know about floating-point + numbers. +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not + Abstract Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. +* Integer Programming:: Effective integer programming. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Gawk and MPFR:: How @command{gawk} provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. 
+* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point + numbers. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic + with @command{gawk}. +* Extension Intro:: What is an extension. +* Plugin License:: A note about licensing. +* Extension Design:: Design notes about the extension API. +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. +* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. +* Extension API Description:: A full description of the API. +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + @command{gawk}. +* Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. +* Printing Messages:: Functions for printing messages. +* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. 
+* Array Manipulation:: Functions for working with arrays. +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. +* Extension API Variables:: Variables provided by the API. +* Extension Versioning:: API Version information. +* Extension API Informational Variables:: Variables providing information about + @command{gawk}'s invocation. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How @command{gawk} finds compiled + extensions. +* Extension Example:: Example C code for an extension. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. +* Extension Samples:: The sample extensions that ship with + @code{gawk}. +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to @code{fnmatch()}. +* Extension Sample Fork:: An interface to @code{fork()} and + other process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to @code{readdir()}. +* Extension Sample Revout:: Reversing output sample output + wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way + processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to @code{gettimeofday()} + and @code{sleep()}. +* gawkextlib:: The @code{gawkextlib} project. +* V7/SVR3.1:: The major changes between V7 and + System V Release 3.1. +* SVR4:: Minor changes between System V + Releases 3.1 and 4. +* POSIX:: New features from the POSIX standard. 
+* BTL:: New features from Brian Kernighan's + version of @command{awk}. +* POSIX/GNU:: The extensions in @command{gawk} not + in POSIX @command{awk}. +* Common Extensions:: Common Extensions Summary. +* Ranges and Locales:: How locales used to affect regexp + ranges. +* Contributors:: The major contributors to + @command{gawk}. +* Gawk Distribution:: What is in the @command{gawk} + distribution. +* Getting:: How to get the distribution. +* Extracting:: How to extract the distribution. +* Distribution contents:: What is in the distribution. +* Unix Installation:: Installing @command{gawk} under + various versions of Unix. +* Quick Installation:: Compiling @command{gawk} under Unix. +* Additional Configuration Options:: Other compile-time options. +* Configuration Philosophy:: How it's all supposed to work. +* Non-Unix Installation:: Installation on Other Operating + Systems. +* PC Installation:: Installing and Compiling + @command{gawk} on MS-DOS and OS/2. +* PC Binary Installation:: Installing a prepared distribution. +* PC Compiling:: Compiling @command{gawk} for MS-DOS, + Windows32, and OS/2. +* PC Testing:: Testing @command{gawk} on PC systems. +* PC Using:: Running @command{gawk} on MS-DOS, + Windows32 and OS/2. +* Cygwin:: Building and running @command{gawk} + for Cygwin. +* MSYS:: Using @command{gawk} In The MSYS + Environment. +* VMS Installation:: Installing @command{gawk} on VMS. +* VMS Compilation:: How to compile @command{gawk} under + VMS. +* VMS Installation Details:: How to install @command{gawk} under + VMS. +* VMS Running:: How to run @command{gawk} under VMS. +* VMS Old Gawk:: An old version comes with some VMS + systems. +* Bugs:: Reporting Problems and Bugs. +* Other Versions:: Other freely available @command{awk} + implementations. +* Compatibility Mode:: How to disable certain @command{gawk} + extensions. +* Additions:: Making Additions To @command{gawk}. +* Accessing The Source:: Accessing the Git repository. 
+* Adding Code:: Adding code to the main body of + @command{gawk}. +* New Ports:: Porting @command{gawk} to a new + operating system. +* Derived Files:: Why derived files are kept in the + @command{git} repository. +* Future Extensions:: New features that may be implemented + one day. +* Basic High Level:: The high level view. +* Basic Data Typing:: A very quick intro to data types. @end detailmenu @end menu @@ -28049,42 +28161,62 @@ gawk -M 'BEGIN @{ n = 13; print n % 2 @}' @node Dynamic Extensions @chapter Writing Extensions for @command{gawk} -This chapter is a placeholder, pending a rewrite for the new API. -Some of the old bits remain, since they can be partially reused. - - -@c STARTOFRANGE gladfgaw -@cindex @command{gawk}, functions, adding -@c STARTOFRANGE adfugaw -@cindex adding, functions to @command{gawk} -@c STARTOFRANGE fubadgaw -@cindex functions, built-in, adding to @command{gawk} -It is possible to add new built-in -functions to @command{gawk} using dynamically loaded libraries. This -facility is available on systems (such as GNU/Linux) that support -the C @code{dlopen()} and @code{dlsym()} functions. -This @value{CHAPTER} describes how to write and use dynamically -loaded extensions for @command{gawk}. -Experience with programming in -C or C++ is necessary when reading this @value{SECTION}. +It is possible to add new built-in functions to @command{gawk} using +dynamically loaded libraries. This facility is available on systems (such +as GNU/Linux) that support the C @code{dlopen()} and @code{dlsym()} +functions. This @value{CHAPTER} describes how to create extensions +using code written in C or C++. If you don't know anything about C +programming, you can safely skip this @value{CHAPTER}, although you +may wish to review the documentation on the extensions that come with +@command{gawk} (@pxref{Extension Samples}), and the section on the +@code{gawkextlib} project (@pxref{gawkextlib}). 
@quotation NOTE When @option{--sandbox} is specified, extensions are disabled -(@pxref{Options}. +(@pxref{Options}). @end quotation @menu +* Extension Intro:: What is an extension. * Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. +* Extension Design:: Design notes about the extension API. +* Extension API Description:: A full description of the API. +* Extension Example:: Example C code for an extension. +* Extension Samples:: The sample extensions that ship with + @code{gawk}. +* gawkextlib:: The @code{gawkextlib} project. @end menu +@node Extension Intro +@section Introduction + +An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of +external compiled code that @command{gawk} can load at runtime to +provide additional functionality, over and above the built-in capabilities +described in the rest of this @value{DOCUMENT}. + +Extensions are useful because they allow you (of course) to extend +@command{gawk}'s functionality. For example, they can provide access to +system calls (such as @code{chdir()} to change directory) and to other +C library routines that could be of use. As with most software, +``the sky is the limit;'' if you can imagine something that you might +want to do and can write in C or C++, you can write an extension to do it! + +Extensions are written in C or C++, using the @dfn{Application Programming +Interface} (API) defined for this purpose by the @command{gawk} +developers. The rest of this @value{CHAPTER} explains the design +decisions behind the API, the facilities it provides and how to use +them, and presents a small sample extension. In addition, it documents +the sample extensions included in the @command{gawk} distribution, +and describes the @code{gawkextlib} project. + @node Plugin License @section Extension Licensing Every dynamic extension should define the global symbol @code{plugin_is_GPL_compatible} to assert that it has been licensed under a GPL-compatible license. 
If this symbol does not exist, @command{gawk} -will emit a fatal error and exit. +emits a fatal error and exits when it tries to load your extension. The declared type of the symbol should be @code{int}. It does not need to be in any allocated section, though. The code merely asserts that @@ -28094,23 +28226,2383 @@ the symbol exists in the global scope. Something like this is enough: int plugin_is_GPL_compatible; @end example -@node Sample Library -@section Example: Directory and File Operation Built-ins -@c STARTOFRANGE chdirg -@cindex @code{chdir()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE statg -@cindex @code{stat()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE filre -@cindex files, information about@comma{} retrieving -@c STARTOFRANGE dirch -@cindex directories, changing - -Two useful functions that are not in @command{awk} are @code{chdir()} -(so that an @command{awk} program can change its directory) and -@code{stat()} (so that an @command{awk} program can gather information about -a file). -This @value{SECTION} implements these functions for @command{gawk} in an -external extension library. +@node Extension Design +@section Extension API Design + +The first version of extensions for @command{gawk} was developed in +the mid-1990s and released with @command{gawk} 3.1 in the late 1990s. +The basic mechanisms and design remained unchanged for close to 15 years, +until 2012. + +The old extension mechanism used data types and functions from +@command{gawk} itself, with a ``clever hack'' to install extension +functions. + +@command{gawk} included some sample extensions, of which a few were +really useful. However, it was clear from the outset that the extension +mechanism was bolted onto the side and was not really thought out. + +@menu +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. 
+* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. +@end menu + +@node Old Extension Problems +@subsection Problems With The Old Mechanism + +The old extension mechanism had several problems: + +@itemize @bullet +@item +It depended heavily upon @command{gawk} internals. Any time the +@code{NODE} structure@footnote{A critical central data structure +inside @command{gawk}.} changed, an extension would have to be +recompiled. Furthermore, to really write extensions required understanding +something about @command{gawk}'s internal functions. There was some +documentation in this @value{DOCUMENT}, but it was quite minimal. + +@item +Being able to call into @command{gawk} from an extension required linker +facilities that are common on Unix-derived systems but that did +not work on Windows systems; users wanting extensions on Windows +had to statically link them into @command{gawk}, even though Windows supports +dynamic loading of shared objects. + +@item +The API would change occasionally as @command{gawk} changed; no compatibility +between versions was ever offered or planned for. +@end itemize + +Despite the drawbacks, the @command{xgawk} project developers forked +@command{gawk} and developed several significant extensions. They also +enhanced @command{gawk}'s facilities relating to file inclusion and +shared object access. + +A new API was desired for a long time, but only in 2012 did the +@command{gawk} maintainer and the @command{xgawk} developers finally +start working on it together. More information about the @command{xgawk} +project is provided in @ref{gawkextlib}. + +@node Extension New Mechanism Goals +@subsection Goals For A New Mechanism + +Some goals for the new API were: + +@itemize @bullet +@item +The API should be independent of @command{gawk} internals. 
Changes in +@command{gawk} internals should not be visible to the writer of an +extension function. + +@item +The API should provide @emph{binary} compatibility across @command{gawk} +releases as long as the API itself does not change. + +@item +The API should enable extensions written in C to have roughly the +same ``appearance'' to @command{awk}-level code as @command{awk} +functions do. This means that extensions should have: + +@itemize @minus +@item +The ability to access function parameters. + +@item +The ability to turn an undefined parameter into an array (call by reference). + +@item +The ability to create, access and update global variables. + +@item +Easy access to all the elements of an array at once (``array flattening'') +so that C code can loop over all the elements easily. + +@item +The ability to create arrays (including @command{gawk}'s true +multi-dimensional arrays). +@end itemize +@end itemize + +Some additional important goals were: + +@itemize @bullet +@item +The API should use only features in ISO C 90, so that extensions +can be written using the widest range of C and C++ compilers. The header +should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"} +magic so that a C++ compiler could be used. (If using C++, the runtime +system has to be smart enough to call any constructors and destructors, +as @command{gawk} is a C program. As of this writing, this has not been +tested.) + +@item +The API mechanism should not require access to @command{gawk}'s +symbols@footnote{The @dfn{symbols} are the variables and functions +defined inside @command{gawk}. Access to these symbols by code +external to @command{gawk} loaded dynamically at runtime is +problematic on Windows.} by the compile-time or dynamic linker, +in order to enable creation of extensions that also work on Windows. 
+@end itemize + +During development, it became clear that there were other features +that should be available to extensions, which were also subsequently +provided: + +@itemize @bullet +@item +Extensions should have the ability to hook into @command{gawk}'s +I/O redirection mechanism. In particular, the @command{xgawk} +developers provided a so-called ``open hook'' to take over reading +records. During development, this was generalized to allow +extensions to hook into input processing, output processing, and +two-way I/O. + +@item +An extension should be able to provide a ``call back'' function +to perform clean up actions when @command{gawk} exits. + +@item +An extension should be able to provide a version string so that +@command{gawk}'s @option{--version} option can provide information +about extensions as well. +@end itemize + +@node Extension Other Design Decisions +@subsection Other Design Decisions + +As an ``arbitrary'' design decision, extensions can read the values of +built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot +change them, with the exception of @code{PROCINFO}. + +The reason for this is to prevent an extension function from affecting +the flow of an @command{awk} program outside its control. While a real +@command{awk} function can do what it likes, that is at the discretion +of the programmer. An extension function should provide a service or +make a C API available for use within @command{awk}, and not mess with +@code{FS} or @code{ARGC} and @code{ARGV}. + +In addition, it becomes easy to start down a slippery slope. How +much access to @command{gawk} facilities do extensions need? +Do they need @code{getline}? What about calling @code{gsub()} or +compiling regular expressions? What about calling into @command{awk} +functions? (@emph{That} would be messy.) + +In order to avoid these issues, the @command{gawk} developers chose +to start with the simplest, most basic features that are still truly useful. 
+ +Another decision is that although @command{gawk} provides nice things like +MPFR, and arrays indexed internally by integers, these features are not +being brought out to the API in order to keep things simple and close to +traditional @command{awk} semantics. (In fact, arrays indexed internally +by integers are so transparent that they aren't even documented!) + +With time, the API will undoubtedly evolve; the @command{gawk} developers +expect this to be driven by user needs. For now, the current API seems +to provide a minimal yet powerful set of features for creating extensions. + +@node Extension Mechanism Outline +@subsection At A High Level How It Works + +The requirement to avoid access to @command{gawk}'s symbols is, at first +glance, a difficult one to meet. + +One design, apparently used by Perl and Ruby and maybe others, would +be to make the mainline @command{gawk} code into a library, with the +@command{gawk} utility a small C @code{main()} function linked against +the library. + +This seemed like the tail wagging the dog, complicating build and +installation and making a simple copy of the @command{gawk} executable +from one system to another (or one place to another on the same +system!) into a chancy operation. + +Pat Rankin suggested the solution that was adopted. Communication between +@command{gawk} and an extension is two-way. First, when an extension +is loaded, it is passed a pointer to a @code{struct} whose fields are +function pointers. +@iftex +This is shown in @ref{load-extension}. 
+@end iftex + +@float Figure,load-extension +@caption{Loading the extension} +@ifinfo +@center @image{api-figure1, , , Loading the extension, txt} +@end ifinfo +@ifhtml +@center @image{api-figure1, , , Loading the extension, png} +@end ifhtml +@ifnotinfo +@ifnothtml +@center @image{api-figure1, , , Loading the extension} +@end ifnothtml +@end ifnotinfo +@end float + +The extension can call functions inside @command{gawk} through these +function pointers, at runtime, without needing (link-time) access +to @command{gawk}'s symbols. One of these function pointers is to a +function for ``registering'' new built-in functions. +@iftex +This is shown in @ref{load-new-function}. +@end iftex + +@float Figure,load-new-function +@caption{Loading the new function} +@ifinfo +@center @image{api-figure2, , , Loading the new function, txt} +@end ifinfo +@ifhtml +@center @image{api-figure2, , , Loading the new function, png} +@end ifhtml +@ifnotinfo +@ifnothtml +@center @image{api-figure2, , , Loading the new function} +@end ifnothtml +@end ifnotinfo +@end float + +In the other direction, the extension registers its new functions +with @command{gawk} by passing function pointers to the functions that +provide the new feature (@code{do_chdir()}, for example). @command{gawk} +associates the function pointer with a name and can then call it, using a +defined calling convention. +@iftex +This is shown in @ref{call-new-function}. 
+@end iftex + +@float Figure,call-new-function +@caption{Calling the new function} +@ifinfo +@center @image{api-figure3, , , Calling the new function, txt} +@end ifinfo +@ifhtml +@center @image{api-figure3, , , Calling the new function, png} +@end ifhtml +@ifnotinfo +@ifnothtml +@center @image{api-figure3, , , Calling the new function} +@end ifnothtml +@end ifnotinfo +@end float + +The @code{do_@var{xxx}()} function, in turn, then uses the function +pointers in the API @code{struct} to do its work, such as updating +variables or arrays, printing messages, setting @code{ERRNO}, and so on. + +Convenience macros in the @file{gawkapi.h} header file make calling +through the function pointers look like regular function calls so that +extension code is quite readable and understandable. + +Although all of this sounds somewhat complicated, the result is that +extension code is quite clean and straightforward. This can be seen in +the sample extension @file{filefuncs.c} (@pxref{Extension Example}) +and also the @file{testext.c} code for testing the APIs. + +Some other bits and pieces: + +@itemize @bullet +@item +The API provides access to @command{gawk}'s @code{do_@var{xxx}} values, +reflecting command-line options, like @code{do_lint}, @code{do_profiling} +and so on (@pxref{Extension API Variables}). +These are informational: an extension cannot affect their values +inside @command{gawk}. In addition, attempting to assign to them +produces a compile-time error. + +@item +The API also provides major and minor version numbers, so that an +extension can check if the @command{gawk} it is loaded with supports the +facilities it was compiled with. (Version mismatches ``shouldn't'' +happen, but we all know how @emph{that} goes.) +@xref{Extension Versioning}, for details. +@end itemize + +@node Extension Future Growth +@subsection Room For Future Growth + +The API provides room for future growth, in two ways. + +An ``extension id'' is passed into the extension when it is loaded. 
This +extension id is then passed back to @command{gawk} with each function +call. This allows @command{gawk} to identify the extension calling into it, +should it need to know. + +A ``name space'' is passed into @command{gawk} when an extension function +is registered. This provides for a future mechanism for grouping +extension functions and possibly avoiding name conflicts. + +Of course, as of this writing, no decisions have been made with respect +to any of the above. + +@node Extension API Description +@section API Description + +This (rather large) @value{SECTION} describes the API in detail. + +@menu +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + @command{gawk}. +* Printing Messages:: Functions for printing messages. +* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Array Manipulation:: Functions for working with arrays. +* Extension API Variables:: Variables provided by the API. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How @command{gawk} finds compiled + extensions. +@end menu + +@node Extension API Functions Introduction +@subsection Introduction + +Access to facilities within @command{gawk} is made available +by calling through function pointers passed into your extension. + +API function pointers are provided for the following kinds of operations: + +@itemize @bullet +@item +Registration functions. You may register: +@itemize @minus +@item +extension functions, +@item +exit callbacks, +@item +a version string, +@item +input parsers, +@item +output wrappers, +@item +and two-way processors. 
+@end itemize +All of these are discussed in detail, later in this @value{CHAPTER}. + +@item +Printing fatal, warning, and ``lint'' warning messages. + +@item +Updating @code{ERRNO}, or unsetting it. + +@item +Accessing parameters, including converting an undefined parameter into +an array. + +@item +Symbol table access: retrieving a global variable, creating one, +or changing one. This also includes the ability to create a scalar +variable that will be @emph{constant} within @command{awk} code. + +@item +Creating and releasing cached values; this provides an +efficient way to use values for multiple variables and +can be a big performance win. + +@item +Manipulating arrays: +@itemize @minus +@item +Retrieving, adding, deleting, and modifying elements +@item +Getting the count of elements in an array +@item +Creating a new array +@item +Clearing an array +@item +Flattening an array for easy C style looping over all its indices and elements +@end itemize +@end itemize + +Some points about using the API: + +@itemize @bullet +@item +You must include @code{<sys/types.h>} and @code{<sys/stat.h>} before including +the @file{gawkapi.h} header file. In addition, you must include either +@code{<stddef.h>} or @code{<stdlib.h>} to get the definition of @code{size_t}. +If you wish to use the boilerplate @code{dl_load_func()} macro, you will +need to include @code{<stdio.h>} as well. +Finally, to pass reasonable integer values for @code{ERRNO}, you +will need to include @code{<errno.h>}. + +@item +Although the API only uses ISO C 90 features, there is an exception; the +``constructor'' functions use the @code{inline} keyword. If your compiler +does not support this keyword, you should either place +@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a +@file{config.h} file in your extensions. + +@item +All pointers filled in by @command{gawk} are to memory +managed by @command{gawk} and should be treated by the extension as +read-only. 
Memory for @emph{all} strings passed into @command{gawk} +from the extension @emph{must} come from @code{malloc()} and is managed +by @command{gawk} from then on. + +@item +The API defines several simple structs that map values as seen +from @command{awk}. A value can be a @code{double}, a string, or an +array (as in multidimensional arrays, or when creating a new array). +Strings carry both a pointer and a length since embedded @code{NUL} +characters are allowed. + +By intent, strings are maintained using the current multibyte encoding (as +defined by @env{LC_@var{xxx}} environment variables) and not using wide +characters. This matches how @command{gawk} stores strings internally +and also how characters are likely to be input and output from files. + +@item +When retrieving a value (such as a parameter or that of a global variable +or array element), the extension requests a specific type (number, string, +scalar, value cookie, array, or ``undefined''). When the request is +``undefined,'' the returned value will have the real underlying type. + +However, if the requested and actual types don't match, the access function +returns ``false'' and fills in the type of the actual value that is there, +so that the extension can, e.g., print an error message +(``scalar passed where array expected''). + +@c This is documented in the header file and needs some expanding upon. +@c The table there should be presented here +@end itemize + +While you may call the API functions by using the function pointers +directly, the interface is not so pretty. To make extension code look +more like regular code, the @file{gawkapi.h} header file defines a number +of macros that you should use in your code. This @value{SECTION} presents +the macros as if they were functions. 
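The arrangement described so far (a @code{struct} of function pointers handed to the extension at load time, an extension id passed back on every call, and macros that hide the indirection) can be sketched in miniature. This is only an illustration under stated assumptions: every name beginning with @code{toy_} or @code{host_} is hypothetical, and none of the types or signatures are taken from @file{gawkapi.h}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical miniature of the mechanism; NOT the real gawkapi.h. */
typedef void *toy_ext_id_t;
typedef int (*toy_func_t)(int nargs);

/* The host hands the extension a table of function pointers. */
typedef struct toy_api {
    int (*register_func)(toy_ext_id_t id, const char *name, toy_func_t func);
    void (*warning)(toy_ext_id_t id, const char *message);
} toy_api_t;

/* ---- extension side: needs no link-time access to host symbols ---- */
static const toy_api_t *api;
static toy_ext_id_t ext_id;

/* Convenience macros make the indirect calls read like plain calls. */
#define toy_register(name, func) (api->register_func(ext_id, (name), (func)))
#define toy_warning(message)     (api->warning(ext_id, (message)))

static int do_answer(int nargs)
{
    if (nargs != 0)
        toy_warning("answer: arguments ignored");
    return 42;
}

/* Called once by the host when the extension is loaded. */
int toy_load(const toy_api_t *api_p, toy_ext_id_t id)
{
    assert(api_p != NULL);
    api = api_p;
    ext_id = id;
    return toy_register("answer", do_answer);
}

/* ---- host side: stands in for the program doing the loading ---- */
static const char *registered_name;
static toy_func_t registered_func;
static int warning_count;

static int host_register(toy_ext_id_t id, const char *name, toy_func_t func)
{
    (void) id;                  /* the id would identify the caller */
    registered_name = name;
    registered_func = func;
    return 1;
}

static void host_warning(toy_ext_id_t id, const char *message)
{
    (void) id;
    warning_count++;
    fprintf(stderr, "warning: %s\n", message);
}

static const toy_api_t host_api = { host_register, host_warning };

/* Load the extension, then call the function it registered. */
int toy_demo(int nargs)
{
    static int cookie;          /* any unique address serves as the id */

    if (!toy_load(&host_api, &cookie))
        return -1;
    return registered_func(nargs);
}
```

In this sketch the extension never refers to a host symbol by name; everything it needs arrives through the pointer table, which is the property that matters on systems where a loaded object cannot link back into the executable.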
+ +@node General Data Types +@subsection General Purpose Data Types + +@quotation +@i{I have a true love/hate relationship with unions.}@* +Arnold Robbins + +@i{That's the thing about unions: the compiler will arrange things so they +can accommodate both love and hate.}@* +Chet Ramey +@end quotation + +The extension API defines a number of simple types and structures for general +purpose use. Additional, more specialized, data structures, are introduced +in subsequent @value{SECTION}s, together with the functions that use them. + +@table @code +@item typedef void *awk_ext_id_t; +A value of this type is received from @command{gawk} when an extension is loaded. +That value must then be passed back to @command{gawk} as the first parameter of +each API function. + +@item #define awk_const @dots{} +This macro expands to @samp{const} when compiling an extension, +and to nothing when compiling @command{gawk} itself. This makes +certain fields in the API data structures unwritable from extension code, +while allowing @command{gawk} to use them as it needs to. + +@item typedef int awk_bool_t; +A simple boolean type. At the moment, the API does not define special +``true'' and ``false'' values, although perhaps it should. + +@item typedef struct @{ +@itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */ +@itemx @ @ @ @ size_t len;@ @ @ @ @ /* length thereof, in chars */ +@itemx @} awk_string_t; +This represents a mutable string. @command{gawk} +owns the memory pointed to if it supplied +the value. Otherwise, it takes ownership of the memory pointed to. +@strong{Such memory must come from @code{malloc()}!} + +As mentioned earlier, strings are maintained using the current +multibyte encoding. 
+ +@item typedef enum @{ +@itemx @ @ @ @ AWK_UNDEFINED, +@itemx @ @ @ @ AWK_NUMBER, +@itemx @ @ @ @ AWK_STRING, +@itemx @ @ @ @ AWK_ARRAY, +@itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */ +@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */ +@itemx @} awk_valtype_t; +This @code{enum} indicates the type of a value. +It is used in the following @code{struct}. + +@item typedef struct @{ +@itemx @ @ @ @ awk_valtype_t val_type; +@itemx @ @ @ @ union @{ +@itemx @ @ @ @ @ @ @ @ awk_string_t@ @ @ @ @ @ @ s; +@itemx @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d; +@itemx @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a; +@itemx @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl; +@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t@ vc; +@itemx @ @ @ @ @} u; +@itemx @} awk_value_t; +An ``@command{awk} value.'' +The @code{val_type} member indicates what kind of value the +@code{union} holds, and each member is of the appropriate type. + +@item #define str_value@ @ @ @ @ @ u.s +@itemx #define num_value@ @ @ @ @ @ u.d +@itemx #define array_cookie@ @ @ u.a +@itemx #define scalar_cookie@ @ u.scl +@itemx #define value_cookie@ @ @ u.vc +These macros make accessing the fields of the @code{awk_value_t} more +readable. + +@item typedef void *awk_scalar_t; +Scalars can be represented as an opaque type. These values are obtained from +@command{gawk} and then passed back into it. This is discussed in a general fashion below, +and in more detail in @ref{Symbol table by cookie}. + +@item typedef void *awk_value_cookie_t; +A ``value cookie'' is an opaque type representing a cached value. +This is also discussed in a general fashion below, +and in more detail in @ref{Cached values}. + +@end table + +Scalar values in @command{awk} are either numbers or strings. The +@code{awk_value_t} struct represents values. The @code{val_type} member +indicates what is in the @code{union}. + +Representing numbers is easy---the API uses a C @code{double}. 
Strings +require more work. Since @command{gawk} allows embedded @code{NUL} bytes +in string values, a string must be represented as a pair containing a +data-pointer and length. This is the @code{awk_string_t} type. + +Identifiers (i.e., the names of global variables) can be associated +with either scalar values or with arrays. In addition, @command{gawk} +provides true arrays of arrays, where any given array element can +itself be an array. Discussion of arrays is delayed until +@ref{Array Manipulation}. + +The various macros listed earlier make it easier to use the elements +of the @code{union} as if they were fields in a @code{struct}; this +is a common coding practice in C. Such code is easier to write and to +read, however it remains @emph{your} responsibility to make sure that +the @code{val_type} member correctly reflects the type of the value in +the @code{awk_value_t}. + +Conceptually, the first three members of the @code{union} (number, string, +and array) are all that is needed for working with @command{awk} values. +However, since the API provides routines for accessing and changing +the value of global scalar variables only by using the variable's name, +there is a performance penalty: @command{gawk} must find the variable +each time it is accessed and changed. This turns out to be a real issue, +not just a theoretical one. + +Thus, if you know that your extension will spend considerable time +reading and/or changing the value of one or more scalar variables, you +can obtain a @dfn{scalar cookie}@footnote{See +@uref{http://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a +definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html, +the ``magic cookie'' entry in the Jargon file} for a nice example. See +also the entry for ``Cookie'' in the @ref{Glossary}.} +object for that variable, and then use +the cookie for getting the variable's value or for changing the variable's +value. 
+This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro. +Given a scalar cookie, @command{gawk} can directly retrieve or +modify the value, as required, without having to first find it. + +The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar. +If you know that you wish to +use the same numeric or string @emph{value} for one or more variables, +you can create the value once, retaining a @dfn{value cookie} for it, +and then pass in that value cookie whenever you wish to set the value of a +variable. This saves both storage space within the running @command{gawk} +process as well as the time needed to create the value. + +@node Requesting Values +@subsection Requesting Values + +All of the functions that return values from @command{gawk} +work in the same way. You pass in an @code{awk_valtype_t} value +to indicate what kind of value you expect. If the actual value +matches what you requested, the function returns true and fills +in the @code{awk_value_t} result. +Otherwise, the function returns false, and the @code{val_type} +member indicates the type of the actual value. You may then +print an error message, or reissue the request for the actual +value type, as appropriate. This behavior is summarized in +@ref{table-value-types-returned}. 
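The request-and-check pattern described above might be sketched as follows. This is a self-contained illustration, not code from @file{gawkapi.h}: the types are reduced stand-ins for the ones shown earlier, and @code{demo_get_value()} and @code{demo_describe()} are hypothetical helpers. The pretend ``actual'' value is always the number 42, so the outcomes follow the ``Number'' column of the table.

```c
#include <stddef.h>
#include <stdio.h>

/* Reduced stand-ins for the API types described earlier (not gawkapi.h). */
typedef int awk_bool_t;
typedef enum { AWK_UNDEFINED, AWK_NUMBER, AWK_STRING, AWK_ARRAY } awk_valtype_t;
typedef struct { char *str; size_t len; } awk_string_t;
typedef struct {
    awk_valtype_t val_type;
    union { awk_string_t s; double d; } u;
} awk_value_t;
#define str_value u.s
#define num_value u.d

/* Hypothetical accessor: the value being requested is always the number 42. */
static awk_bool_t demo_get_value(awk_valtype_t wanted, awk_value_t *result)
{
    static char buf[32];
    const double actual = 42.0;

    switch (wanted) {
    case AWK_NUMBER:
    case AWK_UNDEFINED:            /* "undefined" yields the real type */
        result->val_type = AWK_NUMBER;
        result->num_value = actual;
        return 1;
    case AWK_STRING:               /* a number may be returned as a string */
        result->val_type = AWK_STRING;
        result->str_value.len = (size_t) snprintf(buf, sizeof buf, "%g", actual);
        result->str_value.str = buf;
        return 1;
    default:                       /* array requested, but a number is there */
        result->val_type = AWK_NUMBER;
        return 0;
    }
}

/* Extension-side pattern: ask for what you expect, then check the result. */
static const char *demo_describe(awk_valtype_t wanted)
{
    awk_value_t val;

    if (demo_get_value(wanted, &val))
        return val.val_type == AWK_NUMBER ? "number" : "string";
    /* On failure, val_type reports what is really there. */
    return val.val_type == AWK_NUMBER ? "mismatch: found number" : "mismatch";
}
```

On a mismatch, the extension can use the reported type either to print a diagnostic or to reissue the request with the type that is actually present.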
+ +@ifnotplaintext +@float Table,table-value-types-returned +@caption{Value Types Returned} +@multitable @columnfractions .50 .50 +@headitem @tab Type of Actual Value: +@end multitable +@multitable @columnfractions .166 .166 .198 .15 .15 .166 +@headitem @tab @tab String @tab Number @tab Array @tab Undefined +@item @tab @b{String} @tab String @tab String @tab false @tab false +@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false +@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false +@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false +@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined +@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false +@end multitable +@end float +@end ifnotplaintext +@ifplaintext +@float Table,table-value-types-returned +@caption{Value Types Returned} +@example + +-------------------------------------------------+ + | Type of Actual Value: | + +------------+------------+-----------+-----------+ + | String | Number | Array | Undefined | ++-----------+-----------+------------+------------+-----------+-----------+ +| | String | String | String | false | false | +| |-----------+------------+------------+-----------+-----------+ +| | Number | Number if | Number | false | false | +| | | can be | | | | +| | | converted, | | | | +| | | else false | | | | +| |-----------+------------+------------+-----------+-----------+ +| Type | Array | false | false | Array | false | +| Requested |-----------+------------+------------+-----------+-----------+ +| | Scalar | Scalar | Scalar | false | false | +| |-----------+------------+------------+-----------+-----------+ +| | Undefined | String | Number | Array | Undefined | +| |-----------+------------+------------+-----------+-----------+ +| | Value | false | false | false | false | +| | Cookie | | | | | 
++-----------+-----------+------------+------------+-----------+-----------+ +@end example +@end float +@end ifplaintext + +@node Constructor Functions +@subsection Constructor Functions and Convenience Macros + +The API provides a number of @dfn{constructor} functions for creating +string and numeric values, as well as a number of convenience macros. +This @value{SUBSECTION} presents them all as function prototypes, in +the way that extension code would use them. + +@table @code +@item static inline awk_value_t * +@itemx make_const_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a C string constant +(or other string data), and automatically creates a @emph{copy} of the data +for storage in @code{result}. It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a @samp{char *} +value pointing to data previously obtained from @code{malloc()}. The idea here +is that the data is passed directly to @command{gawk}, which assumes +responsibility for it. It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_null_string(awk_value_t *result) +This specialized function creates a null string (the ``undefined'' value) +in the @code{awk_value_t} variable pointed to by @code{result}. +It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_number(double num, awk_value_t *result) +This function simply creates a numeric value in the @code{awk_value_t} variable +pointed to by @code{result}. +@end table + +Two convenience macros may be used for allocating storage from @code{malloc()} +and @code{realloc()}. 
If the allocation fails, they cause @command{gawk} to +exit with a fatal error message. They should be used as if they were +procedure calls that do not return a value. + +@table @code +@item emalloc(pointer, type, size, message) +The arguments to this macro are as follows: +@c nested table +@table @code +@item pointer +The pointer variable to point at the allocated storage. + +@item type +The type of the pointer variable, used to create a cast for the call to @code{malloc()}. + +@item size +The total number of bytes to be allocated. + +@item message +A message to be prefixed to the fatal error message. Typically this is the name +of the function using the macro. +@end table + +@noindent +For example, you might allocate a string value like so: + +@example +awk_value_t result; +char *message; +const char greet[] = "Don't Panic!"; + +emalloc(message, char *, sizeof(greet), "myfunc"); +strcpy(message, greet); +make_malloced_string(message, strlen(message), & result); +@end example + +@item erealloc(pointer, type, size, message) +This is like @code{emalloc()}, but it calls @code{realloc()}, +instead of @code{malloc()}. +The arguments are the same as for the @code{emalloc()} macro. +@end table + +@node Registration Functions +@subsection Registration Functions + +This @value{SECTION} describes the API functions for +registering parts of your extension with @command{gawk}. + +@menu +* Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. 
+@end menu + +@node Extension Functions +@subsubsection Registering An Extension Function + +Extension functions are described by the following record: + +@example +typedef struct @{ +@ @ @ @ const char *name; +@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result); +@ @ @ @ size_t num_expected_args; +@} awk_ext_func_t; +@end example + +The fields are: + +@table @code +@item const char *name; +The name of the new function. +@command{awk} level code calls the function by this name. +This is a regular C string. + +@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result); +This is a pointer to the C function that provides the desired +functionality. +The function must fill in the result with either a number +or a string. @command{awk} takes ownership of any string memory. +As mentioned earlier, string memory @strong{must} come from @code{malloc()}. + +The function must return the value of @code{result}. +This is for the convenience of the calling code inside @command{gawk}. + +@item size_t num_expected_args; +This is the number of arguments the function expects to receive. +Each extension function may decide what to do if the number of +arguments isn't what it expected. Following @command{awk} functions, it +is likely OK to ignore extra arguments. +@end table + +Once you have a record representing your extension function, you register +it with @command{gawk} using this API function: + +@table @code +@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func); +This function returns true upon success, false otherwise. +The @code{namespace} parameter is currently not used; you should pass in an +empty string (@code{""}). The @code{func} pointer is the address of a +@code{struct} representing your function, as just described. +@end table + +@node Exit Callback Functions +@subsubsection Registering An Exit Callback Function + +An @dfn{exit callback} function is a function that +@command{gawk} calls before it exits. 
+Such functions are useful if you have general ``clean up'' tasks +that should be performed in your extension (such as closing data +base connections or other resource deallocations). +You can register such +a function with @command{gawk} using the following function. + +@table @code +@item void awk_atexit(void (*funcp)(void *data, int exit_status), +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0); +The parameters are: +@c nested table +@table @code +@item funcp +A pointer to the function to be called before @command{gawk} exits. The @code{data} +parameter will be the original value of @code{arg0}. +The @code{exit_status} parameter is +the exit status value that @command{gawk} will pass to the @code{exit()} system call. + +@item arg0 +A pointer to private data which @command{gawk} saves in order to pass to +the function pointed to by @code{funcp}. +@end table +@end table + +Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in +the reverse order in which they are registered with @command{gawk}. + +@node Extension Version String +@subsubsection Registering An Extension Version String + +You can register a version string which indicates the name and +version of your extension, with @command{gawk}, as follows: + +@table @code +@item void register_ext_version(const char *version); +Register the string pointed to by @code{version} with @command{gawk}. +@command{gawk} does @emph{not} copy the @code{version} string, so +it should not be changed. +@end table + +@command{gawk} prints all registered extension version strings when it +is invoked with the @option{--version} option. + +@node Input Parsers +@subsubsection Customized Input Parsers + +By default, @command{gawk} reads text files as its input. It uses the value +of @code{RS} to find the end of the record, and then uses @code{FS} +(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}). +Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}). 
+ +If you want, you can provide your own, custom, input parser. An input +parser's job is to return a record to the @command{gawk} record processing +code, along with indicators for the value and length of the data to be +used for @code{RT}, if any. + +To provide an input parser, you must first provide two functions +(where @var{XXX} is a prefix name for your extension): + +@table @code +@item awk_bool_t @var{XXX}_can_take_file(const awk_input_buf_t *iobuf) +This function examines the information available in @code{iobuf} +(which we discuss shortly). Based on the information there, it +decides if the input parser should be used for this file. +If so, it should return true. Otherwise, it should return false. +It should not change any state (variable values, etc.) within @command{gawk}. + +@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf) +When @command{gawk} decides to hand control of the file over to the +input parser, it calls this function. This function in turn must fill +in certain fields in the @code{awk_input_buf_t} structure, and ensure +that certain conditions are true. It should then return true. If an +error of some kind occurs, it should not fill in any fields, and should +return false; then @command{gawk} will not use the input parser. +The details are presented shortly. +@end table + +Your extension should package these functions inside an +@code{awk_input_parser_t}, which looks like this: + +@example +typedef struct input_parser @{ + const char *name; /* name of parser */ + awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); + awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); + awk_const struct input_parser *awk_const next; /* for use by gawk */ +@} awk_input_parser_t; +@end example + +The fields are: + +@table @code +@item const char *name; +The name of the input parser. This is a regular C string. 
+ +@item awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); +A pointer to your @code{@var{XXX}_can_take_file()} function. + +@item awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); +A pointer to your @code{@var{XXX}_take_control_of()} function. + +@item awk_const struct input_parser *awk_const next; +This pointer is used by @command{gawk}. +The extension cannot modify it. +@end table + +The steps are as follows: + +@enumerate +@item +Create a @code{static awk_input_parser_t} variable and initialize it +appropriately. + +@item +When your extension is loaded, register your input parser with +@command{gawk} using the @code{register_input_parser()} API function +(described below). +@end enumerate + +An @code{awk_input_buf_t} looks like this: + +@example +typedef struct awk_input @{ + const char *name; /* filename */ + int fd; /* file descriptor */ +#define INVALID_HANDLE (-1) + void *opaque; /* private data for input parsers */ + int (*get_record)(char **out, struct awk_input *iobuf, + int *errcode, char **rt_start, size_t *rt_len); + void (*close_func)(struct awk_input *iobuf); + struct stat sbuf; /* stat buf */ +@} awk_input_buf_t; +@end example + +The fields can be divided into two categories: those for use (initially, +at least) by @code{@var{XXX}_can_take_file()}, and those for use by +@code{@var{XXX}_take_control_of()}. The first group of fields and their uses +are as follows: + +@table @code +@item const char *name; +The name of the file. + +@item int fd; +A file descriptor for the file. If @command{gawk} was able to +open the file, then @code{fd} will @emph{not} be equal to +@code{INVALID_HANDLE}. Otherwise, it will. + +@item struct stat sbuf; +If the file descriptor is valid, then @command{gawk} will have filled +in this structure via a call to the @code{fstat()} system call. +@end table + +The @code{@var{XXX}_can_take_file()} function should examine these +fields and decide if the input parser should be used for the file.
+The decision can be made based upon @command{gawk} state (the value +of a variable defined previously by the extension and set by +@command{awk} code), the name of the +file, whether or not the file descriptor is valid, the information +in the @code{struct stat}, or any combination of the above. + +Once @code{@var{XXX}_can_take_file()} has returned true, and +@command{gawk} has decided to use your input parser, it calls +@code{@var{XXX}_take_control_of()}. That function then fills in at +least the @code{get_record} field of the @code{awk_input_buf_t}. It must +also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of +the fields that may be filled by @code{@var{XXX}_take_control_of()} +are as follows: + +@table @code +@item void *opaque; +This is used to hold any state information needed by the input parser +for this file. It is ``opaque'' to @command{gawk}. The input parser +is not required to use this pointer. + +@item int@ (*get_record)(char@ **out, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len); +This function pointer should point to a function that creates the input +records. Said function is the core of the input parser. Its behavior +is described below. + +@item void (*close_func)(struct awk_input *iobuf); +This function pointer should point to a function that does +the ``tear down.'' It should release any resources allocated by +@code{@var{XXX}_take_control_of()}. It may also close the file. If it +does so, it should set the @code{fd} field to @code{INVALID_HANDLE}. + +If @code{fd} is still not @code{INVALID_HANDLE} after the call to this +function, @command{gawk} calls the regular @code{close()} system call. + +Having a ``tear down'' function is optional. If your input parser does +not need it, do not set this field. 
Then, @command{gawk} calls the +regular @code{close()} system call on the file descriptor, so it should +be valid. +@end table + +The @code{@var{XXX}_get_record()} function does the work of creating +input records. The parameters are as follows: + +@table @code +@item char **out +This is a pointer to a @code{char *} variable which is set to point +to the record. @command{gawk} makes its own copy of the data, so +the extension must manage this storage. + +@item struct awk_input *iobuf +This is the @code{awk_input_buf_t} for the file. The fields should be +used for reading data (@code{fd}) and for managing private state +(@code{opaque}), if any. + +@item int *errcode +If an error occurs, @code{*errcode} should be set to an appropriate +code from @code{<errno.h>}. + +@item char **rt_start +@itemx size_t *rt_len +If the concept of a ``record terminator'' makes sense, then +@code{*rt_start} should be set to point to the data to be used for +@code{RT}, and @code{*rt_len} should be set to the length of the +data. Otherwise, @code{*rt_len} should be set to zero. +@command{gawk} makes its own copy of this data, so the +extension must manage the storage. +@end table + +The return value is the length of the buffer pointed to by +@code{*out}, or @code{EOF} if end-of-file was reached or an +error occurred. + +It is guaranteed that @code{errcode} is a valid pointer, so there is no +need to test for a @code{NULL} value. @command{gawk} sets @code{*errcode} +to zero, so there is no need to set it unless an error occurs. + +If an error does occur, the function should return @code{EOF} and set +@code{*errcode} to a non-zero value. In that case, if @code{*errcode} +does not equal @minus{}1, @command{gawk} automatically updates +the @code{ERRNO} variable based on the value of @code{*errcode} (e.g., +setting @samp{*errcode = errno} should do the right thing).
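The contract just described for @code{@var{XXX}_get_record()} can be illustrated with a small, self-contained sketch. It is not taken from the @command{gawk} sources: @code{struct awk_input} is cut down to the two fields the sketch needs, @code{struct demo_state} and @code{demo_get_record()} are hypothetical names, and the function simply treats a newline as the record terminator. Reading one byte at a time with the POSIX @code{read()} call is done only to keep the example short.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>      /* EOF */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read() */

/* Reduced stand-in for awk_input_buf_t: only the fields this sketch uses. */
struct awk_input {
    int fd;
    void *opaque;
};

/* Hypothetical private state, placed in iobuf->opaque by take_control_of(). */
struct demo_state {
    char buf[256];
};

static int demo_get_record(char **out, struct awk_input *iobuf,
                           int *errcode, char **rt_start, size_t *rt_len)
{
    struct demo_state *st = iobuf->opaque;
    size_t n = 0;
    ssize_t got;
    char c;

    *rt_len = 0;                        /* no terminator unless one is found */
    while ((got = read(iobuf->fd, &c, 1)) == 1) {
        if (c == '\n') {
            st->buf[n] = c;             /* keep the terminator after the record */
            *rt_start = st->buf + n;
            *rt_len = 1;
            *out = st->buf;
            return (int) n;             /* length of the record itself */
        }
        if (n < sizeof st->buf - 1)     /* overlong records are truncated */
            st->buf[n++] = c;
    }
    if (got < 0) {                      /* read error: report it via errcode */
        *errcode = errno;
        return EOF;
    }
    if (n == 0)
        return EOF;                     /* clean end of file */
    *out = st->buf;                     /* last record, without a terminator */
    return (int) n;
}
```

Because the record and terminator live in storage owned by the parser's private state, the extension remains responsible for that memory, in line with the rule that the caller makes its own copy.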
+ +@command{gawk} ships with a sample extension that reads directories, +returning records for each entry in the directory (@pxref{Extension +Sample Readdir}). You may wish to use that code as a guide for writing +your own input parser. + +When writing an input parser, you should think about (and document) +how it is expected to interact with @command{awk} code. You may want +it to always be called, and take effect as appropriate (as the +@code{readdir} extension does). Or you may want it to take effect +based upon the value of an @code{awk} variable, as the XML extension +from the @code{gawkextlib} project does (@pxref{gawkextlib}). +In the latter case, code in a @code{BEGINFILE} section +can look at @code{FILENAME} and @code{ERRNO} to decide whether or +not to activate an input parser (@pxref{BEGINFILE/ENDFILE}). + +You register your input parser with the following function: + +@table @code +@item void register_input_parser(awk_input_parser_t *input_parser); +Register the input parser pointed to by @code{input_parser} with +@command{gawk}. +@end table + +@node Output Wrappers +@subsubsection Customized Output Wrappers + +An @dfn{output wrapper} is the mirror image of an input parser. +It allows an extension to take over the output to a file opened +with the @samp{>} or @samp{>>} operators (@pxref{Redirection}). + +The output wrapper is very similar to the input parser structure: + +@example +typedef struct output_wrapper @{ + const char *name; /* name of the wrapper */ + awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); + awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); + awk_const struct output_wrapper *awk_const next; /* for use by gawk */ +@} awk_output_wrapper_t; +@end example + +The members are as follows: + +@table @code +@item const char *name; +This is the name of the output wrapper. 
+ +@item awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); +This points to a function that examines the information in +the @code{awk_output_buf_t} structure pointed to by @code{outbuf}. +It should return true if the output wrapper wants to take over the +file, and false otherwise. It should not change any state (variable +values, etc.) within @command{gawk}. + +@item awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); +The function pointed to by this field is called when @command{gawk} +decides to let the output wrapper take control of the file. It should +fill in appropriate members of the @code{awk_output_buf_t} structure, +as described below, and return true if successful, false otherwise. + +@item awk_const struct output_wrapper *awk_const next; +This is for use by @command{gawk}. +@end table + +The @code{awk_output_buf_t} structure looks like this: + +@example +typedef struct @{ + const char *name; /* name of output file */ + const char *mode; /* mode argument to fopen */ + FILE *fp; /* stdio file pointer */ + awk_bool_t redirected; /* true if a wrapper is active */ + void *opaque; /* for use by output wrapper */ + size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, + FILE *fp, void *opaque); + int (*gawk_fflush)(FILE *fp, void *opaque); + int (*gawk_ferror)(FILE *fp, void *opaque); + int (*gawk_fclose)(FILE *fp, void *opaque); +@} awk_output_buf_t; +@end example + +Here too, your extension will define @code{@var{XXX}_can_take_file()} +and @code{@var{XXX}_take_control_of()} functions that examine and update +data members in the @code{awk_output_buf_t}. +The data members are as follows: + +@table @code +@item const char *name; +The name of the output file. + +@item const char *mode; +The mode string (as would be used in the second argument to @code{fopen()}) +with which the file was opened. + +@item FILE *fp; +The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file +before attempting to find an output wrapper. 
+ +@item awk_bool_t redirected; +This field must be set to true by the @code{@var{XXX}_take_control_of()} function. + +@item void *opaque; +This pointer is opaque to @command{gawk}. The extension should use it to store +a pointer to any private data associated with the file. + +@item size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ FILE *fp, void *opaque); +@itemx int (*gawk_fflush)(FILE *fp, void *opaque); +@itemx int (*gawk_ferror)(FILE *fp, void *opaque); +@itemx int (*gawk_fclose)(FILE *fp, void *opaque); +These pointers should be set to point to functions that perform +the equivalent function as the @code{<stdio.h>} functions do, if appropriate. +@command{gawk} uses these function pointers for all output. +@command{gawk} initializes the pointers to point to internal, ``pass through'' +functions that just call the regular @code{<stdio.h>} functions, so an +extension only needs to redefine those functions that are appropriate for +what it does. +@end table + +The @code{@var{XXX}_can_take_file()} function should make a decision based +upon the @code{name} and @code{mode} fields, and any additional state +(such as @command{awk} variable values) that is appropriate. + +When @command{gawk} calls @code{@var{XXX}_take_control_of()}, it should fill +in the other fields, as appropriate, except for @code{fp}, which it should just +use normally. + +You register your output wrapper with the following function: + +@table @code +@item void register_output_wrapper(awk_output_wrapper_t *output_wrapper); +Register the output wrapper pointed to by @code{output_wrapper} with +@command{gawk}. +@end table + +@node Two-way processors +@subsubsection Customized Two-way Processors + +A @dfn{two-way processor} combines an input parser and an output wrapper for +two-way I/O with the @samp{|&} operator (@pxref{Redirection}). 
It makes identical +use of the @code{awk_input_parser_t} and @code{awk_output_buf_t} structures +as described earlier. + +A two-way processor is represented by the following structure: + +@example +typedef struct two_way_processor @{ + const char *name; /* name of the two-way processor */ + awk_bool_t (*can_take_two_way)(const char *name); + awk_bool_t (*take_control_of)(const char *name, + awk_input_buf_t *inbuf, + awk_output_buf_t *outbuf); + awk_const struct two_way_processor *awk_const next; /* for use by gawk */ +@} awk_two_way_processor_t; +@end example + +The fields are as follows: + +@table @code +@item const char *name; +The name of the two-way processor. + +@item awk_bool_t (*can_take_two_way)(const char *name); +This function returns true if it wants to take over two-way I/O for this filename. +It should not change any state (variable +values, etc.) within @command{gawk}. + +@item awk_bool_t (*take_control_of)(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf); +This function should fill in the @code{awk_input_buf_t} and +@code{awk_output_buf_t} structures pointed to by @code{inbuf} and +@code{outbuf}, respectively. These structures were described earlier. + +@item awk_const struct two_way_processor *awk_const next; +This is for use by @command{gawk}. +@end table + +As with the input parser and output wrapper, you provide +``yes I can take this'' and ``take over for this'' functions, +@code{@var{XXX}_can_take_two_way()} and @code{@var{XXX}_take_control_of()}. + +You register your two-way processor with the following function: + +@table @code +@item void register_two_way_processor(awk_two_way_processor_t *two_way_processor); +Register the two-way processor pointed to by @code{two_way_processor} with +@command{gawk}.
+@end table + +@node Printing Messages +@subsection Printing Messages + +You can print different kinds of warning messages from your +extension, as described below. Note that for these functions, +you must pass in the extension id received from @command{gawk} +when the extension was loaded.@footnote{Because the API uses only ISO C 90 +features, it cannot make use of the ISO C 99 variadic macro feature to hide +that parameter. More's the pity.} + +@table @code +@item void fatal(awk_ext_id_t id, const char *format, ...); +Print a message and then cause @command{gawk} to exit immediately. + +@item void warning(awk_ext_id_t id, const char *format, ...); +Print a warning message. + +@item void lintwarn(awk_ext_id_t id, const char *format, ...); +Print a ``lint warning.'' Normally this is the same as printing a +warning message, but if @command{gawk} was invoked with @samp{--lint=fatal}, +then lint warnings become fatal error messages. +@end table + +All of these functions are otherwise like the C @code{printf()} +family of functions, where the @code{format} parameter is a string +with literal characters and formatting codes intermixed. + +@node Updating @code{ERRNO} +@subsection Updating @code{ERRNO} + +The following functions allow you to update the @code{ERRNO} +variable: + +@table @code +@item void update_ERRNO_int(int errno_val); +Set @code{ERRNO} to the string equivalent of the error code +in @code{errno_val}. The value should be one of the defined +error codes in @code{<errno.h>}, and @command{gawk} turns it +into a (possibly translated) string using the C @code{strerror()} function. + +@item void update_ERRNO_string(const char *string); +Set @code{ERRNO} directly to the string value of @code{string}. +@command{gawk} makes a copy of the value of @code{string}. + +@item void unset_ERRNO(); +Unset @code{ERRNO}.
+@end table + +@node Accessing Parameters +@subsection Accessing and Updating Parameters + +Two functions give you access to the arguments (parameters) +passed to your extension function. They are: + +@table @code +@item awk_bool_t get_argument(size_t count, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Fill in the @code{awk_value_t} structure pointed to by @code{result} +with the @code{count}'th argument. Return true if the actual +type matches @code{wanted}, false otherwise. In the latter +case, @code{result@w{->}val_type} indicates the actual type +(@pxref{table-value-types-returned}). Counts are zero based---the first +argument is numbered zero, the second one, and so on. @code{wanted} +indicates the type of value expected. + +@item awk_bool_t set_argument(size_t count, awk_array_t array); +Convert a parameter that was undefined into an array; this provides +call-by-reference for arrays. Return false if @code{count} is too big, +or if the argument's type is not undefined. @xref{Array Manipulation}, +for more information on creating arrays. +@end table + +@node Symbol Table Access +@subsection Symbol Table Access + +Two sets of routines provide access to global variables, and one set +allows you to create and release cached values. + +@menu +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. +@end menu + +@node Symbol table by name +@subsubsection Variable Access and Update by Name + +The following routines provide the ability to access and update +global @command{awk}-level variables by name. In compiler terminology, +identifiers of different kinds are termed @dfn{symbols}, thus the ``sym'' +in the routines' names. The data structure which stores information +about symbols is termed a @dfn{symbol table}. 
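To make the terminology concrete, the following is a toy sketch of a symbol table with lookup and update by name. It is only an illustration under simplifying assumptions: the names @code{toy_symbol}, @code{toy_lookup()} and @code{toy_update()} are hypothetical and are not part of the @command{gawk} API, values are plain C @code{double}s, and the table is a small fixed-size array searched linearly. @command{gawk}'s real symbol table is internal and is reached only through the routines described next.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy symbol table: NOT gawk's implementation, only an illustration. */
struct toy_symbol {
    const char *name;   /* identifier as written in the program */
    double value;       /* numeric value only, for simplicity */
};

#define TOY_MAX_SYMBOLS 16

static struct toy_symbol toy_table[TOY_MAX_SYMBOLS];
static size_t toy_count = 0;

/* toy_lookup --- find a symbol by name; return NULL if it is absent */

struct toy_symbol *
toy_lookup(const char *name)
{
    size_t i;

    assert(name != NULL);

    /* every by-name access pays for this search */
    for (i = 0; i < toy_count; i++)
        if (strcmp(toy_table[i].name, name) == 0)
            return &toy_table[i];

    return NULL;
}

/* toy_update --- set a symbol's value, adding the symbol if needed;
   return 1 on success, 0 if the table is full */

int
toy_update(const char *name, double value)
{
    struct toy_symbol *sym = toy_lookup(name);

    if (sym == NULL) {
        if (toy_count == TOY_MAX_SYMBOLS)
            return 0;
        sym = &toy_table[toy_count++];
        sym->name = name;   /* caller must keep the string alive */
    }
    sym->value = value;

    return 1;
}
```

An extension never touches such a structure directly. The point of the sketch is only that each access by name implies a search, which is the cost that the scalar cookies described later are meant to avoid.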
+ +@table @code +@item awk_bool_t sym_lookup(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Fill in the @code{awk_value_t} structure pointed to by @code{result} +with the value of the variable named by the string @code{name}, which is +a regular C string. @code{wanted} indicates the type of value expected. +Return true if the actual type matches @code{wanted}, false otherwise. +In the latter case, @code{result->val_type} indicates the actual type +(@pxref{table-value-types-returned}). + +@item awk_bool_t sym_update(const char *name, awk_value_t *value); +Update the variable named by the string @code{name}, which is a regular +C string. The variable is added to @command{gawk}'s symbol table +if it is not there. Return true if everything worked, false otherwise. + +Changing types (scalar to array or vice versa) of an existing variable +is @emph{not} allowed, nor may this routine be used to update an array. +This routine cannot be used to update any of the predefined +variables (such as @code{ARGC} or @code{NF}). + +@item awk_bool_t sym_constant(const char *name, awk_value_t *value); +Create a variable named by the string @code{name}, which is +a regular C string, that has the constant value as given by +@code{value}. @command{awk}-level code cannot change the value of this +variable.@footnote{There (currently) is no @code{awk}-level feature that +provides this ability.} The extension may change the value of @code{name}'s +variable with subsequent calls to this routine, and may also convert +a variable created by @code{sym_update()} into a constant. However, +once a variable becomes a constant it cannot later be reverted into a +mutable variable. +@end table + +@node Symbol table by cookie +@subsubsection Variable Access and Update by Cookie + +A @dfn{scalar cookie} is an opaque handle that provides access +to a global variable or array. 
It is an optimization that +avoids looking up variables in @command{gawk}'s symbol table every time +access is needed. This was discussed earlier, in @ref{General Data Types}. + +The following functions let you work with scalar cookies. + +@table @code +@item awk_bool_t sym_lookup_scalar(awk_scalar_t cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Retrieve the current value of a scalar cookie. +Once you have obtained a scalar_cookie using @code{sym_lookup()}, you can +use this function to get its value more efficiently. +Return false if the value cannot be retrieved. + +@item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value); +Update the value associated with a scalar cookie. Return false if +the new value is not one of @code{AWK_STRING} or @code{AWK_NUMBER}. +Here too, the built-in variables may not be updated. +@end table + +It is not obvious at first glance how to work with scalar cookies or +what their @i{raison d'etre} really is. In theory, the @code{sym_lookup()} +and @code{sym_update()} routines are all you really need to work with +variables. For example, you might have code that looked up the value of +a variable, evaluated a condition, and then possibly changed the value +of the variable based on the result of that evaluation, like so: + +@example +/* do_magic --- do something really great */ + +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t value; + + if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value) + && some_condition(value.num_value)) @{ + value.num_value += 42; + sym_update("MAGIC_VAR", & value); + @} + + return make_number(0.0, result); +@} +@end example + +@noindent +This code looks (and is) simple and straightforward. So what's the problem? 
+ +Consider what happens if @command{awk}-level code associated with your +extension calls the @code{magic()} function (implemented in C by @code{do_magic()}), +once per record, while processing hundreds of thousands or millions of records. +The @code{MAGIC_VAR} variable is looked up in the symbol table once or twice per function call! + +The symbol table lookup is really pure overhead; it is considerably more efficient +to get a cookie that represents the variable, and use that to get the variable's +value and update it as needed.@footnote{The difference is measurable and quite real. Trust us.} + +Thus, the way to use cookies is as follows. First, install your extension's variable +in @command{gawk}'s symbol table using @code{sym_update()}, as usual. Then get a +scalar cookie for the variable using @code{sym_lookup()}: + +@example +static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */ + +static void +my_extension_init() +@{ + awk_value_t value; + + /* install initial value */ + sym_update("MAGIC_VAR", make_number(42.0, & value)); + + /* get cookie */ + sym_lookup("MAGIC_VAR", AWK_SCALAR, & value); + + /* save the cookie */ + magic_var_cookie = value.scalar_cookie; + @dots{} +@} +@end example + +Next, use the routines in this section for retrieving and updating +the value through the cookie. Thus, @code{do_magic()} now becomes +something like this: + +@example +/* do_magic --- do something really great */ + +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t value; + + if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value) + && some_condition(value.num_value)) @{ + value.num_value += 42; + sym_update_scalar(magic_var_cookie, & value); + @} + @dots{} + + return make_number(0.0, result); +@} +@end example + +@quotation NOTE +The previous code omitted error checking for +presentation purposes. Your extension code should be more robust +and carefully check the return values from the API functions. 
+@end quotation + +@node Cached values +@subsubsection Creating and Using Cached Values + +The routines in this section allow you to create and release +cached values. As with scalar cookies, in theory, cached values +are not necessary. You can create numbers and strings using +the functions in @ref{Constructor Functions}. You can then +assign those values to variables using @code{sym_update()} +or @code{sym_update_scalar()}, as you like. + +However, you can understand the point of cached values if you remember that +@emph{every} string value's storage @emph{must} come from @code{malloc()}. +If you have 20 variables, all of which have the same string value, you +must create 20 identical copies of the string.@footnote{Numeric values +are clearly less problematic, requiring only a C @code{double} to store.} + +It is clearly more efficient, if possible, to create a value once, and +then tell @command{gawk} to reuse the value for multiple variables. That +is what the routines in this section let you do. The functions are as follows: + +@table @code +@item awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result); +Create a cached string or numeric value from @code{value} for efficient later +assignment. +Only @code{AWK_NUMBER} and @code{AWK_STRING} values are allowed. Any other type +is rejected. While @code{AWK_UNDEFINED} could be allowed, doing so would +result in inferior performance. + +@item awk_bool_t release_value(awk_value_cookie_t vc); +Release the memory associated with a value cookie obtained +from @code{create_value()}. +@end table + +You use value cookies in a fashion similar to the way you use scalar cookies. 
+In the extension initialization routine, you create the value cookie: + +@example +static awk_value_cookie_t answer_cookie; /* static value cookie */ + +static void +my_extension_init() +@{ + awk_value_t value; + char *long_string; + size_t long_string_len; + + /* code from earlier */ + @dots{} + /* @dots{} fill in long_string and long_string_len @dots{} */ + make_malloced_string(long_string, long_string_len, & value); + create_value(& value, & answer_cookie); /* create cookie */ + @dots{} +@} +@end example + +Once the value is created, you can use it as the value of any number +of variables: + +@example +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t new_value; + + @dots{} /* as earlier */ + + new_value.val_type = AWK_VALUE_COOKIE; + new_value.value_cookie = answer_cookie; + sym_update("VAR1", & new_value); + sym_update("VAR2", & new_value); + @dots{} + sym_update("VAR100", & new_value); + @dots{} +@} +@end example + +@noindent +Using value cookies in this way saves considerable storage, since all of +@code{VAR1} through @code{VAR100} share the same value. + +You might be wondering, ``Is this sharing problematic? +What happens if @command{awk} code assigns a new value to @code{VAR1}, +are all the others changed too?'' + +That's a great question. The answer is that no, it's not a problem. +@command{gawk} is smart enough to avoid such problems. + +Finally, as part of your clean up action (@pxref{Exit Callback Functions}) +you should release any cached values that you created, using +@code{release_value()}. + +@node Array Manipulation +@subsection Array Manipulation + +The primary data structure@footnote{Okay, the only data structure.} in @command{awk} +is the associative array (@pxref{Arrays}). +Extensions need to be able to manipulate @command{awk} arrays. +The API provides a number of data structures for working with arrays, +functions for working with individual elements, and functions for +working with arrays as a whole. 
This includes the ability to +``flatten'' an array so that it is easy for C code to traverse +every element in an array. The array data structures integrate +nicely with the data structures for values to make it easy to +both work with and create true arrays of arrays (@pxref{General Data Types}). + +@menu +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. +@end menu + +@node Array Data Types +@subsubsection Array Data Types + +The data types associated with arrays are listed below. + +@table @code +@item typedef void *awk_array_t; +If you request the value of an array variable, you get back an +@code{awk_array_t} value. This value is opaque@footnote{It is also +a ``cookie,'' but the @command{gawk} developers did not wish to overuse this +term.} to the extension; it uniquely identifies the array but can +only be used by passing it into API functions or receiving it from API +functions. This is very similar to the way @samp{FILE *} values are used +with the @code{<stdio.h>} library routines. + +@item typedef struct awk_element @{ +@itemx @ @ @ @ /* convenience linked list pointer, not used by gawk */ +@itemx @ @ @ @ struct awk_element *next; +@itemx @ @ @ @ enum @{ +@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */ +@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if should be deleted */ +@itemx @ @ @ @ @} flags; +@itemx @ @ @ @ awk_value_t index; +@itemx @ @ @ @ awk_value_t value; +@itemx @} awk_element_t; +The @code{awk_element_t} is a ``flattened'' +array element. @command{gawk} produces an array of these +inside the @code{awk_flat_array_t} (see the next item). +Individual elements may be marked for deletion. New elements must be added +individually, one at a time, using the separate API for that purpose. 
+The fields are as follows: + +@c nested table +@table @code +@item struct awk_element *next; +This pointer is for the convenience of extension writers. It allows +an extension to create a linked list of new elements which can then be +added to an array in a loop that traverses the list. + +@item enum @{ @dots{} @} flags; +A set of flag values that convey information between @command{gawk} +and the extension. Currently there is only one: @code{AWK_ELEMENT_DELETE}, +which the extension can set to cause @command{gawk} to delete the +element from the original array upon release of the flattened array. + +@item index +@itemx value +The index and value of the element, respectively. +@emph{All} memory pointed to by @code{index} and @code{value} belongs to @command{gawk}. +@end table + +@item typedef struct awk_flat_array @{ +@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */ +@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */ +@itemx @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */ +@itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */ +@itemx @} awk_flat_array_t; +This is a flattened array. When an extension gets one of these +from @command{gawk}, the @code{elements} array is of actual +size @code{count}. +The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk}; +therefore they are marked @code{awk_const} so that the extension cannot +modify them. +@end table + +@node Array Functions +@subsubsection Array Functions + +The following functions relate to individual array elements. + +@table @code +@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count); +For the array represented by @code{a_cookie}, return in @code{*count} +the number of elements it contains. A subarray counts as a single element. +Return false if there is an error. 
+ +@item awk_bool_t get_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t *const index, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +For the array represented by @code{a_cookie}, return in @code{*result} +the value of the element whose index is @code{index}. +@code{wanted} specifies the type of value you wish to retrieve. +Return false if @code{wanted} does not match the actual type or if +@code{index} is not in the array (@pxref{table-value-types-returned}). + +The value for @code{index} can be numeric, in which case @command{gawk} +converts it to a string. Using non-integral values is possible, but +requires that you understand how such values are converted to strings +(@pxref{Conversion}); thus using integral values is safest. + +As with @emph{all} strings passed into @code{gawk} from an extension, +the string value of @code{index} must come from @code{malloc()}, and +@command{gawk} releases the storage. + +@item awk_bool_t set_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const value); +In the array represented by @code{a_cookie}, create or modify +the element whose index is given by @code{index}. +The @code{ARGV} and @code{ENVIRON} arrays may not be changed. + +@item awk_bool_t set_array_element_by_elem(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_element_t element); +Like @code{set_array_element()}, but take the @code{index} and @code{value} +from @code{element}. This is a convenience macro. 
+ +@item awk_bool_t del_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t* const index); +Remove the element with the given index from the array +represented by @code{a_cookie}. +Return true if the element was removed, or false if the element did +not exist in the array. +@end table + +The following functions relate to arrays as a whole: + +@table @code +@item awk_array_t create_array(); +Create a new array to which elements may be added. +@xref{Creating Arrays}, for a discussion of how to +create a new array and add elements to it. + +@item awk_bool_t clear_array(awk_array_t a_cookie); +Clear the array represented by @code{a_cookie}. +Return false if there was some kind of problem, true otherwise. +The array remains an array, but after calling this function, it +has no elements. This is equivalent to using the @code{delete} +statement (@pxref{Delete}). + +@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data); +For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t} +structure and fill it in. Set the pointer whose address is passed as @code{data} +to point to this structure. +Return true upon success, or false otherwise. +@xref{Flattening Arrays}, for a discussion of how to +flatten an array and work with it. + +@item awk_bool_t release_flattened_array(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data); +When done with a flattened array, release the storage using this function. +You must pass in both the original array cookie, and the address of +the created @code{awk_flat_array_t} structure. +The function returns true upon success, false otherwise. 
+@end table + +@node Flattening Arrays +@subsubsection Working With All The Elements of an Array + +To @dfn{flatten} an array is to create a structure that +represents the full array in a fashion that makes it easy +for C code to traverse the entire array. Test code +in @file{extension/testext.c} does this, and also serves +as a nice example to show how to use the APIs. + +First, the @command{gawk} script that drives the test extension: + +@example +@@load "testext" +BEGIN @{ + n = split("blacky rusty sophie raincloud lucky", pets) + printf "pets has %d elements\n", length(pets) + ret = dump_array_and_delete("pets", "3") + printf "dump_array_and_delete(pets) returned %d\n", ret + if ("3" in pets) + printf("dump_array_and_delete() did NOT remove index \"3\"!\n") + else + printf("dump_array_and_delete() did remove index \"3\"!\n") + print "" +@} +@end example + +@noindent +This code creates an array with @code{split()} (@pxref{String Functions}) +and then calls @code{dump_array_and_delete()}. That function looks up +the array whose name is passed as the first argument, and +deletes the element at the index passed in the second argument. +It then prints the return value and checks if the element +was indeed deleted. Here is the C code that implements +@code{dump_array_and_delete()}. It has been edited slightly for +presentation. + +The first part declares variables, sets up the default +return value in @code{result}, and checks that the function +was called with the correct number of arguments: + +@example +static awk_value_t * +dump_array_and_delete(int nargs, awk_value_t *result) +@{ + awk_value_t value, value2, value3; + awk_flat_array_t *flat_array; + size_t count; + char *name; + int i; + + assert(result != NULL); + make_number(0.0, result); + + if (nargs != 2) @{ + printf("dump_array_and_delete: nargs not right " + "(%d should be 2)\n", nargs); + goto out; + @} +@end example + +The function then proceeds in steps, as follows. 
First, retrieve +the name of the array, passed as the first argument. Then +retrieve the array itself. If either operation fails, print +error messages and return: + +@example + /* get argument named array as flat array and print it */ + if (get_argument(0, AWK_STRING, & value)) @{ + name = value.str_value.str; + if (sym_lookup(name, AWK_ARRAY, & value2)) + printf("dump_array_and_delete: sym_lookup of %s passed\n", + name); + else @{ + printf("dump_array_and_delete: sym_lookup of %s failed\n", + name); + goto out; + @} + @} else @{ + printf("dump_array_and_delete: get_argument(0) failed\n"); + goto out; + @} +@end example + +For testing purposes and to make sure that the C code sees +the same number of elements as the @command{awk} code, +the second step is to get the count of elements in the array +and print it: + +@example + if (! get_element_count(value2.array_cookie, & count)) @{ + printf("dump_array_and_delete: get_element_count failed\n"); + goto out; + @} + + printf("dump_array_and_delete: incoming size is %lu\n", + (unsigned long) count); +@end example + +The third step is to actually flatten the array, and then +to double check that the count in the @code{awk_flat_array_t} +is the same as the count just retrieved: + +@example + if (! flatten_array(value2.array_cookie, & flat_array)) @{ + printf("dump_array_and_delete: could not flatten array\n"); + goto out; + @} + + if (flat_array->count != count) @{ + printf("dump_array_and_delete: flat_array->count (%lu)" + " != count (%lu)\n", + (unsigned long) flat_array->count, + (unsigned long) count); + goto out; + @} +@end example + +The fourth step is to retrieve the index of the element +to be deleted, which was passed as the second argument. +Remember that argument counts passed to @code{get_argument()} +are zero-based, thus the second argument is numbered one: + +@example + if (! 
get_argument(1, AWK_STRING, & value3)) @{ + printf("dump_array_and_delete: get_argument(1) failed\n"); + goto out; + @} +@end example + +The fifth step is where the ``real work'' is done. The function +loops over every element in the array, printing the index and +element values. In addition, upon finding the element with the +index that is supposed to be deleted, the function sets the +@code{AWK_ELEMENT_DELETE} bit in the @code{flags} field +of the element. When the array is released, @command{gawk} +traverses the flattened array, and deletes any elements that +have this flag bit set: + +@example + for (i = 0; i < flat_array->count; i++) @{ + printf("\t%s[\"%.*s\"] = %s\n", + name, + (int) flat_array->elements[i].index.str_value.len, + flat_array->elements[i].index.str_value.str, + valrep2str(& flat_array->elements[i].value)); + + if (strcmp(value3.str_value.str, + flat_array->elements[i].index.str_value.str) + == 0) @{ + flat_array->elements[i].flags |= AWK_ELEMENT_DELETE; + printf("dump_array_and_delete: marking element \"%s\" " + "for deletion\n", + flat_array->elements[i].index.str_value.str); + @} + @} +@end example + +The sixth step is to release the flattened array. This tells +@command{gawk} that the extension is no longer using the array, +and that it should delete any elements marked for deletion. +@command{gawk} also frees any storage that was allocated, +so you should not use the pointer (@code{flat_array} in this +code) once you have called @code{release_flattened_array()}: + +@example + if (! 
release_flattened_array(value2.array_cookie, flat_array)) @{ + printf("dump_array_and_delete: could not release flattened array\n"); + goto out; + @} +@end example + +Finally, since everything was successful, the function sets the +return value to success, and returns: + +@example + make_number(1.0, result); +out: + return result; +@} +@end example + +Here is the output from running this part of the test: + +@example +pets has 5 elements +dump_array_and_delete: sym_lookup of pets passed +dump_array_and_delete: incoming size is 5 + pets["1"] = "blacky" + pets["2"] = "rusty" + pets["3"] = "sophie" +dump_array_and_delete: marking element "3" for deletion + pets["4"] = "raincloud" + pets["5"] = "lucky" +dump_array_and_delete(pets) returned 1 +dump_array_and_delete() did remove index "3"! +@end example + +@node Creating Arrays +@subsubsection How To Create and Populate Arrays + +Besides working with arrays created by @command{awk} code, you can +create arrays and populate them as you see fit, and then @command{awk} +code can access them and manipulate them. + +There are two important points about creating arrays from extension code: + +@enumerate 1 +@item +You must install a new array into @command{gawk}'s symbol +table immediately upon creating it. Once you have done so, +you can then populate the array. + +@ignore +Strictly speaking, this is required only +for arrays that will have subarrays as elements; however it is +a good idea to always do this. This restriction may be relaxed +in a subsequent revision of the API. +@end ignore + +Similarly, if installing a new array as a subarray of an existing array, +you must add the new array to its parent before adding any elements to it. + +Thus, the correct way to build an array is to work ``top down.'' Create +the array, and immediately install it in @command{gawk}'s symbol table +using @code{sym_update()}, or install it as an element in a previously +existing array using @code{set_array_element()}. 
Example code is coming shortly. + +@item +Due to @command{gawk} internals, after using @code{sym_update()} to install an array +into @command{gawk}, you have to retrieve the array cookie from the value +passed in to @code{sym_update()} before doing anything else with it, like so: + +@example +awk_value_t val; +awk_array_t new_array; + +new_array = create_array(); +val.val_type = AWK_ARRAY; +val.array_cookie = new_array; + +/* install array in the symbol table */ +sym_update("array", & val); + +new_array = val.array_cookie; /* YOU MUST DO THIS */ +@end example + +If installing an array as a subarray, you must also retrieve the value +of the array cookie after the call to @code{set_array_element()}. +@end enumerate + +The following C code is a simple test extension to create an array +with two regular elements and with a subarray. The leading @samp{#include} +directives and boilerplate variable declarations are omitted for brevity. +The first step is to create a new array and then install it +in the symbol table: + +@example +@ignore +#ifdef HAVE_CONFIG_H +#include <config.h> +#endif + +#include <stdio.h> +#include <assert.h> +#include <errno.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include <sys/types.h> +#include <sys/stat.h> + +#include "gawkapi.h" + +static const gawk_api_t *api; /* for convenience macros to work */ +static awk_ext_id_t *ext_id; +static const char *ext_version = "testarray extension: version 1.0"; + +int plugin_is_GPL_compatible; + +@end ignore +/* create_new_array --- create a named array */ + +static void +create_new_array() +@{ + awk_array_t a_cookie; + awk_array_t subarray; + awk_value_t index, value; + + a_cookie = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = a_cookie; + + if (! 
sym_update("new_array", & value)) + printf("create_new_array: sym_update(\"new_array\") failed!\n"); + a_cookie = value.array_cookie; +@end example + +@noindent +Note how @code{a_cookie} is reset from the @code{array_cookie} field in +the @code{value} structure. + +The second step is to install two regular values into @code{new_array}: + +@example + (void) make_const_string("hello", 5, & index); + (void) make_const_string("world", 5, & value); + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array: set_array_element failed\n"); + return; + @} + + (void) make_const_string("answer", 6, & index); + (void) make_number(42.0, & value); + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array: set_array_element failed\n"); + return; + @} +@end example + +The third step is to create the subarray and install it: + +@example + (void) make_const_string("subarray", 8, & index); + subarray = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = subarray; + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array: set_array_element failed\n"); + return; + @} + subarray = value.array_cookie; +@end example + +The final step is to populate the subarray with its own element: + +@example + (void) make_const_string("foo", 3, & index); + (void) make_const_string("bar", 3, & value); + if (! 
set_array_element(subarray, & index, & value)) @{ + printf("fill_in_array: set_array_element failed\n"); + return; + @} +@} +@ignore +static awk_ext_func_t func_table[] = @{ + @{ NULL, NULL, 0 @} +@}; + +/* init_testarray --- additional initialization function */ + +static awk_bool_t init_testarray(void) +@{ + create_new_array(); + + return 1; +@} + +static awk_bool_t (*init_func)(void) = init_testarray; + +dl_load_func(func_table, testarray, "") +@end ignore +@end example + +Here is a sample script that loads the extension +and then dumps the array: + +@example +@@load "subarray" + +function dumparray(name, array, i) +@{ + for (i in array) + if (isarray(array[i])) + dumparray(name "[\"" i "\"]", array[i]) + else + printf("%s[\"%s\"] = %s\n", name, i, array[i]) +@} + +BEGIN @{ + dumparray("new_array", new_array); +@} +@end example + +Here is the result of running the script: + +@example +$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk} +@print{} new_array["subarray"]["foo"] = bar +@print{} new_array["hello"] = world +@print{} new_array["answer"] = 42 +@end example + +@noindent +(@xref{Finding Extensions}, for more information on the +@env{AWKLIBPATH} environment variable.) + +@node Extension API Variables +@subsection API Variables + +The API provides two sets of variables. The first provides information +about the version of the API (both with which the extension was compiled, +and with which @command{gawk} was compiled). The second provides +information about how @command{gawk} was invoked. + +@menu +* Extension Versioning:: API Version information. +* Extension API Informational Variables:: Variables providing information about + @command{gawk}'s invocation. +@end menu + +@node Extension Versioning +@subsubsection API Version Constants and Variables + +The API provides both a ``major'' and a ``minor'' version number. +The API versions are available at compile time as constants: + +@table @code +@item GAWK_API_MAJOR_VERSION +The major version of the API. 
+ +@item GAWK_API_MINOR_VERSION +The minor version of the API. +@end table + +The minor version increases when new functions are added to the API. Such +new functions are always added to the end of the API @code{struct}. + +The major version increases (and the minor version is reset to zero) if any +of the data types change size or member order, or if any of the existing +functions change signature. + +It could happen that an extension may be compiled against one version +of the API but loaded by a version of @command{gawk} using a different +version. For this reason, the major and minor API versions of the +running @command{gawk} are included in the API @code{struct} as read-only +constant integers: + +@table @code +@item api->major_version +The major version of the running @command{gawk}. + +@item api->minor_version +The minor version of the running @command{gawk}. +@end table + +It is up to the extension to decide if there are API incompatibilities. +Typically a check like this is enough: + +@example +if (api->major_version != GAWK_API_MAJOR_VERSION + || api->minor_version < GAWK_API_MINOR_VERSION) @{ + fprintf(stderr, "foo_extension: version mismatch with gawk!\n"); + fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n", + GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION, + api->major_version, api->minor_version); + exit(1); +@} +@end example + +Such code is included in the boilerplate @code{dl_load_func()} macro +provided in @file{gawkapi.h} (discussed later, in +@ref{Extension API Boilerplate}). + +@node Extension API Informational Variables +@subsubsection Informational Variables + +The API provides access to several variables that describe +whether the corresponding command-line options were enabled when +@command{gawk} was invoked. The variables are: + +@table @code +@item do_lint +This variable is true if @command{gawk} was invoked with the @option{--lint} option +(@pxref{Options}). 
+ +@item do_traditional +This variable is true if @command{gawk} was invoked with @option{--traditional} option. + +@item do_profile +This variable is true if @command{gawk} was invoked with @option{--profile} option. + +@item do_sandbox +This variable is true if @command{gawk} was invoked with @option{--sandbox} option. + +@item do_debug +This variable is true if @command{gawk} was invoked with @option{--debug} option. + +@item do_mpfr +This variable is true if @command{gawk} was invoked with @option{--bignum} option. +@end table + +The value of @code{do_lint} can change if @command{awk} code +modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}). +The others should not change during execution. + +@node Extension API Boilerplate +@subsection Boilerplate Code + +As mentioned earlier (@pxref{Extension Mechanism Outline}), the function +definitions as presented are really macros. To use these macros, your +extension must provide a small amount of boilerplate code (variables and +functions) towards the top of your source file, using pre-defined names +as described below. The boilerplate needed is also provided in comments +in the @file{gawkapi.h} header file: + +@example +/* Boiler plate code: */ +int plugin_is_GPL_compatible; + +static gawk_api_t *const api; +static awk_ext_id_t ext_id; +static const char *ext_version = NULL; /* or @dots{} = "some string" */ + +static awk_ext_func_t func_table[] = @{ + @{ "name", do_name, 1 @}, + /* @dots{} */ +@}; + +/* EITHER: */ + +static awk_bool_t (*init_func)(void) = NULL; + +/* OR: */ + +static awk_bool_t +init_my_module(void) +@{ + @dots{} +@} + +static awk_bool_t (*init_func)(void) = init_my_module; + +dl_load_func(func_table, some_name, "name_space_in_quotes") +@end example + +These variables and functions are as follows: + +@table @code +@item int plugin_is_GPL_compatible; +This asserts that the extension is compatible with the GNU GPL +(@pxref{Copying}). 
If your extension does not have this, @command{gawk} +will not load it (@pxref{Plugin License}). + +@item static gawk_api_t *const api; +This global @code{static} variable should be set to point to +the @code{gawk_api_t} pointer that @command{gawk} passes to your +@code{dl_load()} function. This variable is used by all of the macros. + +@item static awk_ext_id_t ext_id; +This global static variable should be set to the @code{awk_ext_id_t} +value that @command{gawk} passes to your @code{dl_load()} function. +This variable is used by all of the macros. + +@item static const char *ext_version = NULL; /* or @dots{} = "some string" */ +This global @code{static} variable should be set either +to @code{NULL}, or to point to a string giving the name and version of +your extension. + +@item static awk_ext_func_t func_table[] = @{ @dots{} @}; +This is an array of one or more @code{awk_ext_func_t} structures +as described earlier (@pxref{Extension Functions}). +It can then be looped over for multiple calls to +@code{add_ext_func()}. + +@item static awk_bool_t (*init_func)(void) = NULL; +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @r{OR} +@itemx static awk_bool_t init_my_module(void) @{ @dots{} @} +@itemx static awk_bool_t (*init_func)(void) = init_my_module; +If you need to do some initialization work, you should define a +function that does it (creates variables, opens files, etc.) +and then define the @code{init_func} pointer to point to your +function. +The function should return zero (false) upon failure, non-zero +(success) if everything goes well. + +If you don't need to do any initialization, define the pointer and +initialize it to @code{NULL}. + +@item dl_load_func(func_table, some_name, "name_space_in_quotes") +This macro expands to a @code{dl_load()} function that performs +all the necessary initializations. 
+@end table + +The point of all the variables and arrays is to let the +@code{dl_load()} function (from the @code{dl_load_func()} +macro) do all the standard work. It does the following: + +@enumerate 1 +@item +Check the API versions. If the extension major version does not match +@command{gawk}'s, or if the extension minor version is greater than +@command{gawk}'s, it prints a fatal error message and exits. + +@item +Load the functions defined in @code{func_table}. +If any of them fails to load, it prints a warning message but +continues on. + +@item +If the @code{init_func} pointer is not @code{NULL}, call the +function it points to. If it returns zero (false), print a +warning message. + +@item +If @code{ext_version} is not @code{NULL}, register +the version string with @command{gawk}. +@end enumerate + +@node Finding Extensions +@subsection How @command{gawk} Finds Extensions + +Compiled extensions have to be installed in a directory where +@command{gawk} can find them. If @command{gawk} is configured and +built in the default fashion, the directory in which to find +extensions is @file{/usr/local/lib/gawk}. You can also specify a search +path with a list of directories to search for compiled extensions. +@xref{AWKLIBPATH Variable}, for more information. + +@node Extension Example +@section Example: Some File Functions + +@quotation +@i{No matter where you go, there you are.} @* +Buckaroo Banzai +@end quotation + +@c It's enough to show chdir and stat, no need for fts + +Two useful functions that are not in @command{awk} are @code{chdir()} (so +that an @command{awk} program can change its directory) and @code{stat()} +(so that an @command{awk} program can gather information about a file). +This @value{SECTION} implements these functions for @command{gawk} +in an extension. @menu * Internal File Description:: What the new functions will do. @@ -28121,13 +30613,13 @@ external extension library. 
@node Internal File Description @subsection Using @code{chdir()} and @code{stat()} -This @value{SECTION} shows how to use the new functions at the @command{awk} -level once they've been integrated into the running @command{gawk} -interpreter. -Using @code{chdir()} is very straightforward. It takes one argument, -the new directory to change to: +This @value{SECTION} shows how to use the new functions at +the @command{awk} level once they've been integrated into the +running @command{gawk} interpreter. Using @code{chdir()} is very +straightforward. It takes one argument, the new directory to change to: @example +@@load "filefuncs" @dots{} newdir = "/home/arnold/funstuff" ret = chdir(newdir) @@ -28139,21 +30631,18 @@ if (ret < 0) @{ @dots{} @end example -The return value is negative if the @code{chdir} failed, -and @code{ERRNO} -(@pxref{Built-in Variables}) -is set to a string indicating the error. +The return value is negative if the @code{chdir()} failed, and +@code{ERRNO} (@pxref{Built-in Variables}) is set to a string indicating +the error. -Using @code{stat()} is a bit more complicated. -The C @code{stat()} function fills in a structure that has a fair -amount of information. +Using @code{stat()} is a bit more complicated. The C @code{stat()} +function fills in a structure that has a fair amount of information. The right way to model this in @command{awk} is to fill in an associative array with the appropriate information: @c broke printf for page breaking @example file = "/home/arnold/.profile" -fdata[1] = "x" # force `fdata' to be an array ret = stat(file, fdata) if (ret < 0) @{ printf("could not stat %s: %s\n", @@ -28198,11 +30687,11 @@ be a function of the file's size if the file has holes. The file's last access, modification, and inode update times, respectively. These are numeric timestamps, suitable for formatting with @code{strftime()} -(@pxref{Built-in}). +(@pxref{Time Functions}). 
@item "pmode" The file's ``printable mode.'' This is a string representation of -the file's type and permissions, such as what is produced by +the file's type and permissions, such as is produced by @samp{ls -l}---for example, @code{"drwxr-xr-x"}. @item "type" @@ -28263,64 +30752,96 @@ of that number, respectively. @node Internal File Ops @subsection C Code for @code{chdir()} and @code{stat()} -Here is the C code for these extensions. They were written for -GNU/Linux. The code needs some more work for complete portability -to other POSIX-compliant systems:@footnote{This version is edited -slightly for presentation. See -@file{extension/filefuncs.c} in the @command{gawk} distribution -for the complete version.} +Here is the C code for these extensions.@footnote{This version is +edited slightly for presentation. See @file{extension/filefuncs.c} +in the @command{gawk} distribution for the complete version.} + +The file includes a number of standard header files, and then includes +the @file{gawkapi.h} header file which provides the API definitions. +Those are followed by the necessary variable declarations +to make use of the API macros and boilerplate code +(@pxref{Extension API Boilerplate}). 
@c break line for page breaking @example -#include "awk.h" +#ifdef HAVE_CONFIG_H +#include <config.h> +#endif + +#include <stdio.h> +#include <assert.h> +#include <errno.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include <sys/types.h> +#include <sys/stat.h> -#include <sys/sysmacros.h> +#include "gawkapi.h" + +#include "gettext.h" +#define _(msgid) gettext(msgid) +#define N_(msgid) msgid + +#include "gawkfts.h" +#include "stack.h" + +static const gawk_api_t *api; /* for convenience macros to work */ +static awk_ext_id_t *ext_id; +static awk_bool_t init_filefuncs(void); +static awk_bool_t (*init_func)(void) = init_filefuncs; +static const char *ext_version = "filefuncs extension: version 1.0"; int plugin_is_GPL_compatible; +@end example +@cindex programming conventions, @command{gawk} internals +By convention, for an @command{awk} function @code{foo()}, the C function +that implements it is called @code{do_foo()}. The function should have +two arguments: the first is an @code{int} usually called @code{nargs}, +that represents the number of actual arguments for the function. +The second is a pointer to an @code{awk_value_t}, usually named +@code{result}. + +@example /* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ -static NODE * -do_chdir(int nargs) +static awk_value_t * +do_chdir(int nargs, awk_value_t *result) @{ - NODE *newdir; + awk_value_t newdir; int ret = -1; - if (do_lint && nargs != 1) - lintwarn("chdir: called with incorrect number of arguments"); + assert(result != NULL); - newdir = get_scalar_argument(0, FALSE); + if (do_lint && nargs != 1) + lintwarn(ext_id, + _("chdir: called with incorrect number of arguments, " + "expecting 1")); @end example -The file includes the @code{"awk.h"} header file for definitions -for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>} -for access to the @code{major()} and @code{minor}() macros. 
- -@cindex programming conventions, @command{gawk} internals -By convention, for an @command{awk} function @code{foo}, the function that -implements it is called @samp{do_foo}. The function should take -a @samp{int} argument, usually called @code{nargs}, that -represents the number of defined arguments for the function. The @code{newdir} +The @code{newdir} variable represents the new directory to change to, retrieved -with @code{get_scalar_argument()}. Note that the first argument is +with @code{get_argument()}. Note that the first argument is numbered zero. -This code actually accomplishes the @code{chdir()}. It first forces -the argument to be a string and passes the string value to the +If the argument is retrieved successfully, the function calls the @code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} is updated. @example - (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO_int(errno); + if (get_argument(0, AWK_STRING, & newdir)) @{ + ret = chdir(newdir.str_value.str); + if (ret < 0) + update_ERRNO_int(errno); + @} @end example Finally, the function returns the return value to the @command{awk} level: @example - return make_number((AWKNUM) ret); + return make_number(ret, result); @} @end example @@ -28339,7 +30860,168 @@ format_mode(unsigned long fmode) @} @end example -Next comes the @code{do_stat()} function. It starts with +Next comes a function for reading symbolic links, which is also +omitted here for brevity: + +@example +/* read_symlink --- read a symbolic link into an allocated buffer. 
+ @dots{} */ + +static char * +read_symlink(const char *fname, size_t bufsize, ssize_t *linksize) +@{ + @dots{} +@} +@end example + +Two helper functions simplify entering values in the +array that will contain the result of the @code{stat()}: + +@example +/* array_set --- set an array element */ + +static void +array_set(awk_array_t array, const char *sub, awk_value_t *value) +@{ + awk_value_t index; + + set_array_element(array, + make_const_string(sub, strlen(sub), & index), + value); + +@} + +/* array_set_numeric --- set an array element with a number */ + +static void +array_set_numeric(awk_array_t array, const char *sub, double num) +@{ + awk_value_t tmp; + + array_set(array, sub, make_number(num, & tmp)); +@} +@end example + +The following function does most of the work to fill in +the @code{awk_array_t} result array with values obtained +from a valid @code{struct stat}. It is done in a separate function +to support the @code{stat()} function for @command{gawk} and also +to support the @code{fts()} extension which is included in +the same file but whose code is not shown here +(@pxref{Extension Sample File Functions}). 
+ +The first part of the function is variable declarations, +including a table to map file types to strings: + +@example +/* fill_stat_array --- do the work to fill an array with stat info */ + +static int +fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf) +@{ + char *pmode; /* printable mode */ + const char *type = "unknown"; + awk_value_t tmp; + static struct ftype_map @{ + unsigned int mask; + const char *type; + @} ftype_map[] = @{ + @{ S_IFREG, "file" @}, + @{ S_IFBLK, "blockdev" @}, + @{ S_IFCHR, "chardev" @}, + @{ S_IFDIR, "directory" @}, +#ifdef S_IFSOCK + @{ S_IFSOCK, "socket" @}, +#endif +#ifdef S_IFIFO + @{ S_IFIFO, "fifo" @}, +#endif +#ifdef S_IFLNK + @{ S_IFLNK, "symlink" @}, +#endif +#ifdef S_IFDOOR /* Solaris weirdness */ + @{ S_IFDOOR, "door" @}, +#endif /* S_IFDOOR */ + @}; + int j, k; +@end example + +The destination array is cleared, and then code fills in +various elements based on values in the @code{struct stat}: + +@example + /* empty out the array */ + clear_array(array); + + /* fill in the array */ + array_set(array, "name", make_const_string(name, strlen(name), + & tmp)); + array_set_numeric(array, "dev", sbuf->st_dev); + array_set_numeric(array, "ino", sbuf->st_ino); + array_set_numeric(array, "mode", sbuf->st_mode); + array_set_numeric(array, "nlink", sbuf->st_nlink); + array_set_numeric(array, "uid", sbuf->st_uid); + array_set_numeric(array, "gid", sbuf->st_gid); + array_set_numeric(array, "size", sbuf->st_size); + array_set_numeric(array, "blocks", sbuf->st_blocks); + array_set_numeric(array, "atime", sbuf->st_atime); + array_set_numeric(array, "mtime", sbuf->st_mtime); + array_set_numeric(array, "ctime", sbuf->st_ctime); + + /* for block and character devices, add rdev, + major and minor numbers */ + if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) @{ + array_set_numeric(array, "rdev", sbuf->st_rdev); + array_set_numeric(array, "major", major(sbuf->st_rdev)); + array_set_numeric(array, "minor", 
minor(sbuf->st_rdev)); + @} +@end example + +@noindent +The latter part of the function makes selective additions +to the destination array, depending upon the availability of +certain members and/or the type of the file. It then returns zero, +for success: + +@example +#ifdef HAVE_ST_BLKSIZE + array_set_numeric(array, "blksize", sbuf->st_blksize); +#endif /* HAVE_ST_BLKSIZE */ + + pmode = format_mode(sbuf->st_mode); + array_set(array, "pmode", make_const_string(pmode, strlen(pmode), + & tmp)); + + /* for symbolic links, add a linkval field */ + if (S_ISLNK(sbuf->st_mode)) @{ + char *buf; + ssize_t linksize; + + if ((buf = read_symlink(name, sbuf->st_size, + & linksize)) != NULL) + array_set(array, "linkval", + make_malloced_string(buf, linksize, & tmp)); + else + warning(ext_id, _("stat: unable to read symbolic link `%s'"), + name); + @} + + /* add a type field */ + type = "unknown"; /* shouldn't happen */ + for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) @{ + if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) @{ + type = ftype_map[j].type; + break; + @} + @} + + array_set(array, "type", make_const_string(type, strlen(type), &tmp)); + + return 0; +@} +@end example + +Finally, here is the @code{do_stat()} function. It starts with variable declarations and argument checking: @ignore @@ -28349,116 +31031,140 @@ Changed message for page breaking. 
Used to be: @example /* do_stat --- provide a stat() function for gawk */ -static NODE * -do_stat(int nargs) +static awk_value_t * +do_stat(int nargs, awk_value_t *result) @{ - NODE *file, *array, *tmp; - struct stat sbuf; + awk_value_t file_param, array_param; + char *name; + awk_array_t array; int ret; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; + struct stat sbuf; - if (do_lint && nargs > 2) - lintwarn("stat: called with too many arguments"); + assert(result != NULL); + + if (do_lint && nargs != 2) @{ + lintwarn(ext_id, + _("stat: called with wrong number of arguments")); + return make_number(-1, result); + @} @end example Then comes the actual work. First, the function gets the arguments. -Then, it always clears the array. +Next, it gets the information for the file. The code uses @code{lstat()} (instead of @code{stat()}) to get the file information, in case the file is a symbolic link. If there's an error, it sets @code{ERRNO} and returns: -@c comment made multiline for page breaking @example /* file is first arg, array to hold results is second */ - file = get_scalar_argument(0, FALSE); - array = get_array_argument(1, FALSE); + if ( ! get_argument(0, AWK_STRING, & file_param) + || ! get_argument(1, AWK_ARRAY, & array_param)) @{ + warning(ext_id, _("stat: bad parameters")); + return make_number(-1, result); + @} - /* empty out the array */ - assoc_clear(array); + name = file_param.str_value.str; + array = array_param.array_cookie; + + /* always empty out the array */ + clear_array(array); /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); + ret = lstat(name, & sbuf); if (ret < 0) @{ update_ERRNO_int(errno); - return make_number((AWKNUM) ret); + return make_number(ret, result); @} @end example -Now comes the tedious part: filling in the array. 
Only a few of the -calls are shown here, since they all follow the same pattern: +The tedious work is done by @code{fill_stat_array()}, shown +earlier. When done, return the result from @code{fill_stat_array()}: @example - /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4)); - *aptr = dupnode(file); - unref(tmp); + ret = fill_stat_array(name, array, & sbuf); - aptr = assoc_lookup(array, tmp = make_string("mode", 4)); - *aptr = make_number((AWKNUM) sbuf.st_mode); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); - unref(tmp); + return make_number(ret, result); +@} @end example -When done, return the @code{lstat()} return value: +@cindex programming conventions, @command{gawk} internals +Finally, it's necessary to provide the ``glue'' that loads the +new function(s) into @command{gawk}. + +The @code{filefuncs} extension also provides an @code{fts()} +function, which we omit here. For its sake there is an initialization +function: @example +/* init_filefuncs --- initialization routine */ - return make_number((AWKNUM) ret); +static awk_bool_t +init_filefuncs(void) +@{ + @dots{} @} @end example -@cindex programming conventions, @command{gawk} internals -Finally, it's necessary to provide the ``glue'' that loads the -new function(s) into @command{gawk}. By convention, each library has -a routine named @code{dl_load()} that does the job. The simplest way -is to use the @code{dl_load_func} macro in @code{gawkapi.h}. +We are almost done. We need an array of @code{awk_ext_func_t} +structures for loading each function into @command{gawk}: + +@example +static awk_ext_func_t func_table[] = @{ + @{ "chdir", do_chdir, 1 @}, + @{ "stat", do_stat, 2 @}, + @{ "fts", do_fts, 3 @}, +@}; +@end example + +Each extension must have a routine named @code{dl_load()} to load +everything that needs to be loaded. 
It is simplest to use the +@code{dl_load_func()} macro in @code{gawkapi.h}: + +@example +/* define the dl_load() function using the boilerplate macro */ + +dl_load_func(func_table, filefuncs, "") +@end example And that's it! As an exercise, consider adding functions to implement system calls such as @code{chown()}, @code{chmod()}, and @code{umask()}. @node Using Internal File Ops -@subsection Integrating the Extensions +@subsection Integrating The Extensions @cindex @command{gawk}, interpreter@comma{} adding code to Now that the code is written, it must be possible to add it at runtime to the running @command{gawk} interpreter. First, the code must be compiled. Assuming that the functions are in a file named @file{filefuncs.c}, and @var{idir} is the location -of the @command{gawk} include files, -the following steps create -a GNU/Linux shared library: +of the @file{gawkapi.h} header file, +the following steps@footnote{In practice, you would probably want to +use the GNU Autotools---Automake, Autoconf, Libtool, and Gettext---to +configure and build your libraries. Instructions for doing so are beyond +the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to +the tools.} create a GNU/Linux shared library: @example $ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} -$ @kbd{ld -o filefuncs.so -shared filefuncs.o} +$ @kbd{ld -o filefuncs.so -shared filefuncs.o -lc} @end example -@cindex @code{extension()} function (@command{gawk}) -Once the library exists, it is loaded by calling the @code{extension()} -built-in function. -This function takes two arguments: the name of the -library to load and the name of a function to call when the library -is first loaded. This function adds the new functions to @command{gawk}. -It returns the value returned by the initialization function -within the shared library: +Once the library exists, it is loaded by using the @code{@@load} keyword. 
@example # file testff.awk +@@load "filefuncs" + BEGIN @{ - extension("./filefuncs.so", "dl_load") + "pwd" | getline curdir # save current directory + close("pwd") - chdir(".") # no-op + chdir("/tmp") + system("pwd") # test it + chdir(curdir) # go back - data[1] = 1 # force `data' to be an array print "Info for testff.awk" ret = stat("testff.awk", data) print "ret =", ret @@ -28476,40 +31182,705 @@ BEGIN @{ @} @end example -Here are the results of running the program: +The @env{AWKLIBPATH} environment variable tells +@command{gawk} where to find shared libraries (@pxref{Finding Extensions}). +We set it to the current directory and run the program: @example -$ @kbd{gawk -f testff.awk} +$ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk} +@print{} /tmp @print{} Info for testff.awk @print{} ret = 0 -@print{} data["size"] = 607 -@print{} data["ino"] = 14945891 -@print{} data["name"] = testff.awk -@print{} data["pmode"] = -rw-rw-r-- -@print{} data["nlink"] = 1 -@print{} data["atime"] = 1293993369 -@print{} data["mtime"] = 1288520752 -@print{} data["mode"] = 33204 @print{} data["blksize"] = 4096 -@print{} data["dev"] = 2054 +@print{} data["mtime"] = 1350838628 +@print{} data["mode"] = 33204 @print{} data["type"] = file -@print{} data["gid"] = 500 -@print{} data["uid"] = 500 +@print{} data["dev"] = 2053 +@print{} data["gid"] = 1000 +@print{} data["ino"] = 1719496 +@print{} data["ctime"] = 1350838628 @print{} data["blocks"] = 8 -@print{} data["ctime"] = 1290113572 -@print{} testff.awk modified: 10 31 10 12:25:52 +@print{} data["nlink"] = 1 +@print{} data["name"] = testff.awk +@print{} data["atime"] = 1350838632 +@print{} data["pmode"] = -rw-rw-r-- +@print{} data["size"] = 662 +@print{} data["uid"] = 1000 +@print{} testff.awk modified: 10 21 12 18:57:08 @print{} @print{} Info for JUNK @print{} ret = -1 @print{} JUNK modified: 01 01 70 02:00:00 @end example -@c ENDOFRANGE filre -@c ENDOFRANGE dirch -@c ENDOFRANGE statg -@c ENDOFRANGE chdirg -@c ENDOFRANGE gladfgaw -@c ENDOFRANGE 
adfugaw -@c ENDOFRANGE fubadgaw + +@node Extension Samples +@section The Sample Extensions In The @command{gawk} Distribution + +This @value{SECTION} provides brief overviews of the sample extensions +that come in the @command{gawk} distribution. Some of them are intended +for production use, such as the @code{filefuncs} and @code{readdir} extensions. +Others mainly provide example code that shows how to use the extension API. + +@menu +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to @code{fnmatch()}. +* Extension Sample Fork:: An interface to @code{fork()} and other + process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to @code{readdir()}. +* Extension Sample Revout:: Reversing output sample output wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to @code{gettimeofday()} + and @code{sleep()}. +@end menu + +@node Extension Sample File Functions +@subsection File Related Functions + +The @code{filefuncs} extension provides three different functions. +The usage is: + +@table @code +@item @@load "filefuncs" +This is how you load the extension. + +@item result = chdir("/some/directory") +The @code{chdir()} function is a direct hook to the @code{chdir()} +system call to change the current directory. It returns zero +upon success or less than zero upon error. In the latter case it updates +@code{ERRNO}. + +@item result = stat("/some/path", statdata) +The @code{stat()} function provides a hook into the +@code{stat()} system call. In fact, it uses @code{lstat()}. +It returns zero upon success or less than zero upon error. +In the latter case it updates @code{ERRNO}. 
+ +In all cases, it clears the @code{statdata} array. +When the call is successful, @code{stat()} fills the @code{statdata} +array with information retrieved from the filesystem, as follows: + +@c nested table +@multitable @columnfractions .25 .60 +@item @code{statdata["name"]} @tab +The name of the file. + +@item @code{statdata["dev"]} @tab +Corresponds to the @code{st_dev} field in the @code{struct stat}. + +@item @code{statdata["ino"]} @tab +Corresponds to the @code{st_ino} field in the @code{struct stat}. + +@item @code{statdata["mode"]} @tab +Corresponds to the @code{st_mode} field in the @code{struct stat}. + +@item @code{statdata["nlink"]} @tab +Corresponds to the @code{st_nlink} field in the @code{struct stat}. + +@item @code{statdata["uid"]} @tab +Corresponds to the @code{st_uid} field in the @code{struct stat}. + +@item @code{statdata["gid"]} @tab +Corresponds to the @code{st_gid} field in the @code{struct stat}. + +@item @code{statdata["size"]} @tab +Corresponds to the @code{st_size} field in the @code{struct stat}. + +@item @code{statdata["atime"]} @tab +Corresponds to the @code{st_atime} field in the @code{struct stat}. + +@item @code{statdata["mtime"]} @tab +Corresponds to the @code{st_mtime} field in the @code{struct stat}. + +@item @code{statdata["ctime"]} @tab +Corresponds to the @code{st_ctime} field in the @code{struct stat}. + +@item @code{statdata["rdev"]} @tab +Corresponds to the @code{st_rdev} field in the @code{struct stat}. +This element is only present for device files. + +@item @code{statdata["major"]} @tab +Corresponds to the @code{st_major} field in the @code{struct stat}. +This element is only present for device files. + +@item @code{statdata["minor"]} @tab +Corresponds to the @code{st_minor} field in the @code{struct stat}. +This element is only present for device files. + +@item @code{statdata["blksize"]} @tab +Corresponds to the @code{st_blksize} field in the @code{struct stat}. +if this field is present on your system. 
+(It is present on all modern systems that we know of.) + +@item @code{statdata["pmode"]} @tab +A human-readable version of the mode value, such as printed by +@command{ls}. For example, @code{"-rwxr-xr-x"}. + +@item @code{statdata["linkval"]} @tab +If the named file is a symbolic link, this element will exist +and its value is the value of the symbolic link (where the +symbolic link points to). + +@item @code{statdata["type"]} @tab +The type of the file as a string. One of +@code{"file"}, +@code{"blockdev"}, +@code{"chardev"}, +@code{"directory"}, +@code{"socket"}, +@code{"fifo"}, +@code{"symlink"}, +@code{"door"}, +or +@code{"unknown"}. +Not all systems support all file types. +@end multitable + +@item flags = or(FTS_PHYSICAL, ...) +@itemx result = fts(pathlist, flags, filedata) +Walk the file trees provided in @code{pathlist} and fill in the +@code{filedata} array as described below. @code{flags} is the bitwise +OR of several predefined constant values, also as described below. +Return zero if there were no errors, otherwise return @minus{}1. +@end table + +The @code{fts()} function provides a hook to the C library @code{fts()} +routines for traversing file hierarchies. Instead of returning data +about one file at a time in a stream, it fills in a multi-dimensional +array with data about each file and directory encountered in the requested +hierarchies. + +The arguments are as follows: + +@table @code +@item pathlist +An array of filenames. The element values are used; the index values are ignored. + +@item flags +This should be the bitwise OR of one or more of the following +predefined constant flag values. At least one of +@code{FTS_LOGICAL} or @code{FTS_PHYSICAL} must be provided; otherwise +@code{fts()} returns an error value and sets @code{ERRNO}. 
+The flags are: + +@c nested table +@table @code +@item FTS_LOGICAL +Do a ``logical'' file traversal, where the information returned for +a symbolic link refers to the linked-to file, and not to the symbolic +link itself. This flag is mutually exclusive with @code{FTS_PHYSICAL}. + +@item FTS_PHYSICAL +Do a ``physical'' file traversal, where the information returned for a +symbolic link refers to the symbolic link itself. This flag is mutually +exclusive with @code{FTS_LOGICAL}. + +@item FTS_NOCHDIR +As a performance optimization, the C library @code{fts()} routines +change directory as they traverse a file hierarchy. This flag disables +that optimization. + +@item FTS_COMFOLLOW +Immediately follow a symbolic link named in @code{pathlist}, +whether or not @code{FTS_LOGICAL} is set. + +@item FTS_SEEDOT +By default, the @code{fts()} routines do not return entries for @file{.} +and @file{..}. This option causes entries for @file{..} to also +be included. (The extension always includes an entry for @file{.}, +see below.) + +@item FTS_XDEV +During a traversal, do not cross onto a different mounted filesystem. +@end table + +@item filedata +The @code{filedata} array is first cleared. Then, @code{fts()} creates +an element in @code{filedata} for every element in @code{pathlist}. +The index is the name of the directory or file given in @code{pathlist}. +The element for this index is itself an array. There are two cases. + +@c nested table +@table @emph +@item The path is a file. +In this case, the array contains two or three elements: + +@c doubly nested table +@table @code +@item "path" +The full path to this file, starting from the ``root'' that was given +in the @code{pathlist} array. + +@item "stat" +This element is itself an array, containing the same information as provided +by the @code{stat()} function described earlier for its +@code{statdata} argument. The element may not be present if +the @code{stat()} system call for the file failed. 
+ +@item "error" +If some kind of error was encountered, the array will also +contain an element named @code{"error"}, which is a string describing the error. +@end table + +@item The path is a directory. +In this case, the array contains one element for each entry in the +directory. If an entry is a file, that element is the same as for files, just +described. If the entry is a directory, that element is (recursively) +an array describing the subdirectory. If @code{FTS_SEEDOT} was provided +in the flags, then there will also be an element named @code{".."}. This +element will be an array containing the data as provided by @code{stat()}. + +In addition, there will be an element whose index is @code{"."}. +This element is an array containing the same two or three elements as +for a file: @code{"path"}, @code{"stat"}, and @code{"error"}. +@end table +@end table + +The @code{fts()} function returns zero if there were no errors. +Otherwise it returns @minus{}1. + +@quotation NOTE +The @code{fts()} extension does not exactly mimic the +interface of the C library @code{fts()} routines, choosing instead to +provide an interface that is based on associative arrays, which should +be more comfortable to use from an @command{awk} program. This includes the +lack of a comparison function, since @command{gawk} already provides +powerful array sorting facilities. While an @code{fts_read()}-like +interface could have been provided, this felt less natural than simply +creating a multi-dimensional array to represent the file hierarchy and +its information. +@end quotation + +See @file{test/fts.awk} in the @command{gawk} distribution for an example. + +@node Extension Sample Fnmatch +@subsection Interface To @code{fnmatch()} + +This extension provides an interface to the C library +@code{fnmatch()} function. 
The usage is: + +@example +@@load "fnmatch" + +result = fnmatch(pattern, string, flags) +@end example + +The @code{fnmatch} extension adds a single function named +@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of +flag values named @code{FNM}. + +The arguments to @code{fnmatch()} are: + +@table @code +@item pattern +The filename wildcard to match. + +@item string +The filename string to match against the pattern. + +@item flags +Either zero, or the bitwise OR of one or more of the +flags in the @code{FNM} array. +@end table + +The return value is zero on success, @code{FNM_NOMATCH} +if the string did not match the pattern, or +a different non-zero value if an error occurred. + +The flags are as follows: + +@multitable @columnfractions .25 .75 +@item @code{FNM["CASEFOLD"]} @tab +Corresponds to the @code{FNM_CASEFOLD} flag as defined in @code{fnmatch()}. + +@item @code{FNM["FILE_NAME"]} @tab +Corresponds to the @code{FNM_FILE_NAME} flag as defined in @code{fnmatch()}. + +@item @code{FNM["LEADING_DIR"]} @tab +Corresponds to the @code{FNM_LEADING_DIR} flag as defined in @code{fnmatch()}. + +@item @code{FNM["NOESCAPE"]} @tab +Corresponds to the @code{FNM_NOESCAPE} flag as defined in @code{fnmatch()}. + +@item @code{FNM["PATHNAME"]} @tab +Corresponds to the @code{FNM_PATHNAME} flag as defined in @code{fnmatch()}. + +@item @code{FNM["PERIOD"]} @tab +Corresponds to the @code{FNM_PERIOD} flag as defined in @code{fnmatch()}. +@end multitable + +Here is an example: + +@example +@@load "fnmatch" +@dots{} +flags = or(FNM["PERIOD"], FNM["NOESCAPE"]) +if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH) + print "no match" +@end example + +@node Extension Sample Fork +@subsection Interface To @code{fork()}, @code{wait()} and @code{waitpid()} + +The @code{fork} extension adds three functions, as follows. + +@table @code +@item @@load "fork" +This is how you load the extension. + +@item pid = fork() +This function creates a new process. 
The return value is zero in the +child and the process-id number of the child in the parent, or @minus{}1 +upon error. In the latter case, @code{ERRNO} indicates the problem. +In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are +updated to reflect the correct values. + +@item ret = waitpid(pid) +This function takes a numeric argument, which is the process-id to +wait for. The return value is that of the +@code{waitpid()} system call. + +@item ret = wait() +This function waits for the first child to die. +The return value is that of the +@code{wait()} system call. +@end table + +There is no corresponding @code{exec()} function. + +Here is an example: + +@example +@@load "fork" +@dots{} +if ((pid = fork()) == 0) + print "hello from the child" +else + print "hello from the parent" +@end example + +@node Extension Sample Ord +@subsection Character and Numeric values: @code{ord()} and @code{chr()} + +The @code{ordchr} extension adds two functions, named +@code{ord()} and @code{chr()}, as follows. + +@table @code +@item number = ord(string) +Return the numeric value of the first character in @code{string}. + +@item char = chr(number) +Return the string whose first character is that represented by @code{number}. +@end table + +These functions are inspired by the Pascal language functions +of the same name. Here is an example: + +@example +@@load "ordchr" +@dots{} +printf("The numeric value of 'A' is %d\n", ord("A")) +printf("The string value of 65 is %s\n", chr(65)) +@end example + +@node Extension Sample Readdir +@subsection Reading Directories + +The @code{readdir} extension adds an input parser for directories, and +adds a single function named @code{readdir_do_ftype()}. 
+The usage is as follows: + +@example +@@load "readdir" + +readdir_do_ftype("stat") # or "dirent" or "never" +@end example + +When this extension is in use, instead of skipping directories named +on the command line (or with @code{getline}), +they are read, with each entry returned as a record. + +The record consists of at least two fields: the inode number and the +filename, separated by a forward slash character. +On systems where the directory entry contains the file type, the record +has a third field which is a single letter indicating the type of the +file: + +@multitable @columnfractions .1 .9 +@headitem Letter @tab File Type +@item @code{b} @tab Block device +@item @code{c} @tab Character device +@item @code{d} @tab Directory +@item @code{f} @tab Regular file +@item @code{l} @tab Symbolic link +@item @code{p} @tab Named pipe (FIFO) +@item @code{s} @tab Socket +@item @code{u} @tab Anything else (unknown) +@end multitable + +On systems without the file type information, calling +@samp{readdir_do_ftype("stat")} causes the extension to use the +@code{lstat()} system call to retrieve the appropriate information. This +is not the default, since @code{lstat()} is a potentially expensive +operation. By calling @samp{readdir_do_ftype("never")} one can ensure +that the file type information is never displayed, even when readily +available in the directory entry. + +The third option, @samp{readdir_do_ftype("dirent")}, takes file type +information from the directory entry, if it is available. This is the +default on systems that supply this information. + +The @code{readdir_do_ftype()} function sets @code{ERRNO} if called +without arguments or with invalid arguments. + +@quotation NOTE +On GNU/Linux systems, there are filesystems that don't support the +@code{d_type} entry (see the @i{readdir}(3) manual page), and so the file +type is always @samp{u}. Therefore, using @samp{readdir_do_ftype("stat")} +is advisable even on GNU/Linux systems. 
In this case, the @code{readdir} +extension falls back to using @code{lstat()} when it encounters an +unknown file type. +@end quotation + +Here is an example: + +@example +@@load "readdir" +@dots{} +BEGIN @{ FS = "/" @} +@{ print "file name is", $2 @} +@end example + +@node Extension Sample Revout +@subsection Reversing Output + +The @code{revoutput} extension adds a simple output wrapper that reverses +the characters in each output line. Its main purpose is to show how to +write an output wrapper, although it may be mildly amusing for the unwary. +Here is an example: + +@example +@@load "revoutput" + +BEGIN @{ + REVOUT = 1 + print "hello, world" > "/dev/stdout" +@} +@end example + +The output from this program is: +@samp{dlrow ,olleh}. + +@node Extension Sample Rev2way +@subsection Two-Way I/O Example + +The @code{revtwoway} extension adds a simple two-way processor that +reverses the characters in each line sent to it for reading back by +the @command{awk} program. Its main purpose is to show how to write +a two-way processor, although it may also be mildly amusing. +The following example shows how to use it: + +@example +@@load "revtwoway" + +BEGIN @{ + cmd = "/magic/mirror" + print "hello, world" |& cmd + cmd |& getline result + print result + close(cmd) +@} +@end example + +@node Extension Sample Read write array +@subsection Dumping and Restoring An Array + +The @code{rwarray} extension adds two functions, +named @code{writea()} and @code{reada()}, as follows: + +@table @code +@item ret = writea(file, array) +This function takes a string argument, which is the name of the file +to which to dump the array, and the array itself as the second argument. +@code{writea()} understands multidimensional arrays. It returns one on +success, or zero upon failure. + +@item ret = reada(file, array) +@code{reada()} is the inverse of @code{writea()}; +it reads the file named as its first argument, filling in +the array named as the second argument. It clears the array first. 
+Here too, the return value is one on success and zero upon failure. +@end table + +The array created by @code{reada()} is identical to that written by +@code{writea()} in the sense that the contents are the same. However, +due to implementation issues, the array traversal order of the recreated +array is likely to be different from that of the original array. As array +traversal order in @command{awk} is by default undefined, this is not +(technically) a problem. If you need to guarantee a particular traversal +order, use the array sorting features in @command{gawk} to do so +(@pxref{Array Sorting}). + +The file contains binary data. All integral values are written in network +byte order. However, double precision floating-point values are written +as native binary data. Thus, arrays containing only string data can +theoretically be dumped on systems with one byte order and restored on +systems with a different one, but this has not been tried. + +Here is an example: + +@example +@@load "rwarray" +@dots{} +ret = writea("arraydump.bin", array) +@dots{} +ret = reada("arraydump.bin", array) +@end example + +@node Extension Sample Readfile +@subsection Reading An Entire File + +The @code{readfile} extension adds a single function +named @code{readfile()}: + +@table @code +@item result = readfile("/some/path") +The argument is the name of the file to read. The return value is a +string containing the entire contents of the requested file. Upon error, +the function returns the empty string and sets @code{ERRNO}. +@end table + +Here is an example: + +@example +@@load "readfile" +@dots{} +contents = readfile("/path/to/file"); +if (contents == "" && ERRNO != "") @{ + print("problem reading file", ERRNO) > "/dev/stderr" + ... +@} +@end example + +@node Extension Sample API Tests +@subsection API Tests + +The @code{testext} extension exercises parts of the extension API that +are not tested by the other samples. 
The @file{extension/testext.c} +file contains both the C code for the extension and @command{awk} +test code inside C comments that run the tests. The testing framework +extracts the @command{awk} code and runs the tests. See the source file +for more information. + +@node Extension Sample Time +@subsection Extension Time Functions + +@cindex time +@cindex sleep + +These functions can be used by either invoking @command{gawk} +with a command-line argument of @samp{-l time} or by +inserting @samp{@@load "time"} in your script. + +@table @code + +@cindex @code{gettimeofday} time extension function +@item the_time = gettimeofday() +Return the time in seconds that has elapsed since 1970-01-01 UTC as a +floating point value. If the time is unavailable on this platform, return +@minus{}1 and set @code{ERRNO}. The returned time should have sub-second +precision, but the actual precision will vary based on the platform. +If the standard C @code{gettimeofday()} system call is available on this +platform, then it simply returns the value. Otherwise, if on Windows, +it tries to use @code{GetSystemTimeAsFileTime()}. + +@cindex @code{sleep} time extension function +@item result = sleep(@var{seconds}) +Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative, +or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}. +Otherwise, return zero after sleeping for the indicated amount of time. +Note that @var{seconds} may be a floating-point (non-integral) value. +Implementation details: depending on platform availability, this function +tries to use @code{nanosleep()} or @code{select()} to implement the delay. +@end table + +@node gawkextlib +@section The @code{gawkextlib} Project + +The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}} +project provides a number of @command{gawk} extensions, including one for +processing XML files. This is the evolution of the original @command{xgawk} +(XML @command{gawk}) project. 
+ +As of this writing, there are four extensions: + +@itemize @bullet +@item +XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} +XML parsing library. + +@item +PostgreSQL extension. + +@item +GD graphics library extension. + +@item +MPFR library extension. +This provides access to a number of MPFR functions that @command{gawk}'s +native MPFR support does not provide. +@end itemize + +The @code{time} extension described earlier (@pxref{Extension Sample +Time}) was originally from this project but has been moved into the +main @command{gawk} distribution. + +You can check out the code for the @code{gawkextlib} project +using the @uref{http://git-scm.com, Git} distributed source +code control system. The command is as follows: + +@example +git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code +@end example + +You will need to have the @uref{http://expat.sourceforge.net, Expat} +XML parser library installed in order to build and use the XML extension. + +In addition, you must have the GNU Autotools installed +(@uref{http://www.gnu.org/software/autoconf, Autoconf}, +@uref{http://www.gnu.org/software/automake, Automake}, +@uref{http://www.gnu.org/software/libtool, Libtool}, +and +@uref{http://www.gnu.org/software/gettext, Gettext}). + +The simple recipe for building and testing @code{gawkextlib} is as follows. 
+First, build and install @command{gawk}: + +@example +cd .../path/to/gawk/code +./configure --prefix=/tmp/newgawk @ii{Install in /tmp/newgawk for now} +make && make check @ii{Build and check that all is OK} +make install @ii{Install gawk} +@end example + +Next, build @code{gawkextlib} and test it: + +@example +cd .../path/to/gawkextlib-code +./update-autotools @ii{Generate configure, etc.} + @ii{You may have to run this command twice} +./configure --with-gawk=/tmp/newgawk @ii{Configure, point at ``installed'' gawk} +make && make check @ii{Build and check that all is OK} +@end example + +If you write an extension that you wish to share with other +@command{gawk} users, please consider doing so through the +@code{gawkextlib} project. + @ignore @c Try this |