author | Arnold D. Robbins <arnold@skeeve.com> | 2012-11-04 15:14:34 +0200 |
---|---|---|
committer | Arnold D. Robbins <arnold@skeeve.com> | 2012-11-04 15:14:34 +0200 |
commit | 2204f38c05fef5747b8f6764a202b646f4126338 (patch) | |
tree | a24b6a63658bb9c0ef4baf989061f65959496992 | |
parent | 5d3c11459bf9c8870cfc599722118b910aa17394 (diff) | |
download | egawk-2204f38c05fef5747b8f6764a202b646f4126338.tar.gz egawk-2204f38c05fef5747b8f6764a202b646f4126338.tar.bz2 egawk-2204f38c05fef5747b8f6764a202b646f4126338.zip |
Finally! Integrated API chapter into gawk doc.
-rw-r--r-- | doc/ChangeLog | 4 |
-rw-r--r-- | doc/gawk.info | 5197 |
-rw-r--r-- | doc/gawk.texi | 4521 |
3 files changed, 8154 insertions, 1568 deletions
diff --git a/doc/ChangeLog b/doc/ChangeLog
index c05f5586..5aa8f674 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2012-11-04 Arnold D. Robbins <arnold@skeeve.com>
+
+	* gawk.texi: New chapter on extension API.
+
 2012-11-03 Arnold D. Robbins <arnold@skeeve.com>
 
 	* api-figure1.pdf, api-figure2.pdf, api-figure3.pdf: Removed.
diff --git a/doc/gawk.info b/doc/gawk.info
index baa064d7..73fac121 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -113,419 +113,531 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 * GNU Free Documentation License:: The license for this Info file.
 * Index:: Concept and Variable Index.
-* History:: The history of `gawk' and - `awk'. -* Names:: What name to use to find `awk'. -* This Manual:: Using this Info file. Includes - sample input files that you can use. -* Conventions:: Typographical Conventions. -* Manual History:: Brief history of the GNU project and this - Info file. -* How To Contribute:: Helping to save the world. -* Acknowledgments:: Acknowledgments. -* Running gawk:: How to run `gawk' programs; - includes command-line syntax. -* One-shot:: Running a short throwaway `awk' - program. -* Read Terminal:: Using no input files (input from terminal - instead). -* Long:: Putting permanent `awk' programs in - files. -* Executable Scripts:: Making self-contained `awk' - programs. -* Comments:: Adding documentation to `gawk' - programs. -* Quoting:: More discussion of shell quoting issues. -* DOS Quoting:: Quoting in Windows Batch Files. -* Sample Data Files:: Sample data files for use in the - `awk' programs illustrated in this - Info file. -* Very Simple:: A very simple example. -* Two Rules:: A less simple one-line example using two - rules. -* More Complex:: A more complex example. -* Statements/Lines:: Subdividing or combining statements into - lines. -* Other Features:: Other Features of `awk'. -* When:: When to use `gawk' and when to use - other things. -* Command Line:: How to run `awk'. 
-* Options:: Command-line options and their meanings. -* Other Arguments:: Input file names and variable assignments. -* Naming Standard Input:: How to specify standard input with other - files. -* Environment Variables:: The environment variables `gawk' - uses. -* AWKPATH Variable:: Searching directories for `awk' - programs. -* AWKLIBPATH Variable:: Searching directories for `awk' - shared libraries. -* Other Environment Variables:: The environment variables. -* Exit Status:: `gawk''s exit status. -* Include Files:: Including other files into your program. -* Loading Shared Libraries:: Loading shared libraries into your program. -* Obsolete:: Obsolete Options and/or features. -* Undocumented:: Undocumented Options and Features. -* Regexp Usage:: How to Use Regular Expressions. -* Escape Sequences:: How to write nonprinting characters. -* Regexp Operators:: Regular Expression Operators. -* Bracket Expressions:: What can go between `[...]'. -* GNU Regexp Operators:: Operators specific to GNU software. -* Case-sensitivity:: How to do case-insensitive matching. -* Leftmost Longest:: How much text matches. -* Computed Regexps:: Using Dynamic Regexps. -* Records:: Controlling how data is split into records. -* Fields:: An introduction to fields. -* Nonconstant Fields:: Nonconstant Field Numbers. -* Changing Fields:: Changing the Contents of a Field. -* Field Separators:: The field separator and how to change it. -* Default Field Splitting:: How fields are normally separated. -* Regexp Field Splitting:: Using regexps as the field separator. -* Single Character Fields:: Making each character a separate field. -* Command Line Field Separator:: Setting `FS' from the command-line. -* Field Splitting Summary:: Some final points and a summary table. -* Constant Size:: Reading constant width data. -* Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. 
-* Getline:: Reading files under explicit program - control using the `getline' function. -* Plain Getline:: Using `getline' with no arguments. -* Getline/Variable:: Using `getline' into a variable. -* Getline/File:: Using `getline' from a file. -* Getline/Variable/File:: Using `getline' into a variable from a - file. -* Getline/Pipe:: Using `getline' from a pipe. -* Getline/Variable/Pipe:: Using `getline' into a variable from a - pipe. -* Getline/Coprocess:: Using `getline' from a coprocess. -* Getline/Variable/Coprocess:: Using `getline' into a variable from a - coprocess. -* Getline Notes:: Important things to know about - `getline'. -* Getline Summary:: Summary of `getline' Variants. -* Read Timeout:: Reading input with a timeout. -* Command line directories:: What happens if you put a directory on the - command line. -* Print:: The `print' statement. -* Print Examples:: Simple examples of `print' statements. -* Output Separators:: The output separators and how to change - them. -* OFMT:: Controlling Numeric Output With - `print'. -* Printf:: The `printf' statement. -* Basic Printf:: Syntax of the `printf' statement. -* Control Letters:: Format-control letters. -* Format Modifiers:: Format-specification modifiers. -* Printf Examples:: Several examples. -* Redirection:: How to redirect output to multiple files - and pipes. -* Special Files:: File name interpretation in `gawk'. - `gawk' allows access to inherited - file descriptors. -* Special FD:: Special files for I/O. -* Special Network:: Special files for network communications. -* Special Caveats:: Things to watch out for. -* Close Files And Pipes:: Closing Input and Output Files and Pipes. -* Values:: Constants, Variables, and Regular - Expressions. -* Constants:: String, numeric and regexp constants. -* Scalar Constants:: Numeric and string constants. -* Nondecimal-numbers:: What are octal and hex numbers. -* Regexp Constants:: Regular Expression constants. 
-* Using Constant Regexps:: When and how to use a regexp constant. -* Variables:: Variables give names to values for later - use. -* Using Variables:: Using variables in your programs. -* Assignment Options:: Setting variables on the command-line and a - summary of command-line syntax. This is an - advanced method of input. -* Conversion:: The conversion of strings to numbers and - vice versa. -* All Operators:: `gawk''s operators. -* Arithmetic Ops:: Arithmetic operations (`+', `-', - etc.) -* Concatenation:: Concatenating strings. -* Assignment Ops:: Changing the value of a variable or a - field. -* Increment Ops:: Incrementing the numeric value of a - variable. -* Truth Values and Conditions:: Testing for true and false. -* Truth Values:: What is ``true'' and what is ``false''. -* Typing and Comparison:: How variables acquire types and how this - affects comparison of numbers and strings - with `<', etc. -* Variable Typing:: String type versus numeric type. -* Comparison Operators:: The comparison operators. -* POSIX String Comparison:: String comparison with POSIX rules. -* Boolean Ops:: Combining comparison expressions using - boolean operators `||' (``or''), - `&&' (``and'') and `!' (``not''). -* Conditional Exp:: Conditional expressions select between two - subexpressions under control of a third - subexpression. -* Function Calls:: A function call is an expression. -* Precedence:: How various operators nest. -* Locales:: How the locale affects things. -* Pattern Overview:: What goes into a pattern. -* Regexp Patterns:: Using regexps as patterns. -* Expression Patterns:: Any expression can be used as a pattern. -* Ranges:: Pairs of patterns specify record ranges. -* BEGIN/END:: Specifying initialization and cleanup - rules. -* Using BEGIN/END:: How and why to use BEGIN/END rules. -* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. -* BEGINFILE/ENDFILE:: Two special patterns for advanced control. -* Empty:: The empty pattern, which matches every - record. 
-* Using Shell Variables:: How to use shell variables with - `awk'. -* Action Overview:: What goes into an action. -* Statements:: Describes the various control statements in - detail. -* If Statement:: Conditionally execute some `awk' - statements. -* While Statement:: Loop until some condition is satisfied. -* Do Statement:: Do specified action while looping until - some condition is satisfied. -* For Statement:: Another looping statement, that provides - initialization and increment clauses. -* Switch Statement:: Switch/case evaluation for conditional - execution of statements based on a value. -* Break Statement:: Immediately exit the innermost enclosing - loop. -* Continue Statement:: Skip to the end of the innermost enclosing - loop. -* Next Statement:: Stop processing the current input record. -* Nextfile Statement:: Stop processing the current file. -* Exit Statement:: Stop execution of `awk'. -* Built-in Variables:: Summarizes the built-in variables. -* User-modified:: Built-in variables that you change to - control `awk'. -* Auto-set:: Built-in variables where `awk' - gives you information. -* ARGC and ARGV:: Ways to use `ARGC' and `ARGV'. -* Array Basics:: The basics of arrays. -* Array Intro:: Introduction to Arrays -* Reference to Elements:: How to examine one element of an array. -* Assigning Elements:: How to change an element of an array. -* Array Example:: Basic Example of an Array -* Scanning an Array:: A variation of the `for' statement. It - loops through the indices of an array's - existing elements. -* Controlling Scanning:: Controlling the order in which arrays are - scanned. -* Delete:: The `delete' statement removes an - element from an array. -* Numeric Array Subscripts:: How to use numbers as subscripts in - `awk'. -* Uninitialized Subscripts:: Using Uninitialized variables as - subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in - `awk'. -* Multi-scanning:: Scanning multidimensional arrays. 
-* Arrays of Arrays:: True multidimensional arrays. -* Built-in:: Summarizes the built-in functions. -* Calling Built-in:: How to call built-in functions. -* Numeric Functions:: Functions that work with numbers, including - `int()', `sin()' and - `rand()'. -* String Functions:: Functions for string manipulation, such as - `split()', `match()' and - `sprintf()'. -* Gory Details:: More than you want to know about `\' - and `&' with `sub()', - `gsub()', and `gensub()'. -* I/O Functions:: Functions for files and shell commands. -* Time Functions:: Functions for dealing with timestamps. -* Bitwise Functions:: Functions for bitwise operations. -* Type Functions:: Functions for type information. -* I18N Functions:: Functions for string translation. -* User-defined:: Describes User-defined functions in detail. -* Definition Syntax:: How to write definitions and what they - mean. -* Function Example:: An example function definition and what it - does. -* Function Caveats:: Things to watch out for. -* Calling A Function:: Don't use spaces. -* Variable Scope:: Controlling variable scope. -* Pass By Value/Reference:: Passing parameters. -* Return Statement:: Specifying the value a function returns. -* Dynamic Typing:: How variable types can change at runtime. -* Indirect Calls:: Choosing the function to call at runtime. -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU `gettext' works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging `printf' arguments. -* I18N Portability:: `awk'-level portability issues. -* I18N Example:: A simple i18n example. -* Gawk I18N:: `gawk' is also internationalized. -* Nondecimal Data:: Allowing nondecimal input data. -* Array Sorting:: Facilities for controlling array traversal - and sorting arrays. -* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. 
-* Array Sorting Functions:: How to use `asort()' and - `asorti()'. -* Two-way I/O:: Two-way communications with another - process. -* TCP/IP Networking:: Using `gawk' for network - programming. -* Profiling:: Profiling your `awk' programs. -* Library Names:: How to best name private global variables - in library functions. -* General Functions:: Functions that are of general use. -* Strtonum Function:: A replacement for the built-in - `strtonum()' function. -* Assert Function:: A function for assertions in `awk' - programs. -* Round Function:: A function for rounding if `sprintf()' - does not do it correctly. -* Cliff Random Function:: The Cliff Random Number Generator. -* Ordinal Functions:: Functions for using characters as numbers - and vice versa. -* Join Function:: A function to join an array into a string. -* Getlocaltime Function:: A function to get formatted times. -* Data File Management:: Functions for managing command-line data - files. -* Filetrans Function:: A function for handling data file - transitions. -* Rewind Function:: A function for rereading the current file. -* File Checking:: Checking that data files are readable. -* Empty Files:: Checking for zero-length files. -* Ignoring Assigns:: Treating assignments as file names. -* Getopt Function:: A function for processing command-line - arguments. -* Passwd Functions:: Functions for getting user information. -* Group Functions:: Functions for getting group information. -* Walking Arrays:: A function to walk arrays of arrays. -* Running Examples:: How to run these examples. -* Clones:: Clones of common utilities. -* Cut Program:: The `cut' utility. -* Egrep Program:: The `egrep' utility. -* Id Program:: The `id' utility. -* Split Program:: The `split' utility. -* Tee Program:: The `tee' utility. -* Uniq Program:: The `uniq' utility. -* Wc Program:: The `wc' utility. -* Miscellaneous Programs:: Some interesting `awk' programs. -* Dupword Program:: Finding duplicated words in a document. 
-* Alarm Program:: An alarm clock. -* Translate Program:: A program similar to the `tr' - utility. -* Labels Program:: Printing mailing labels. -* Word Sorting:: A program to produce a word usage count. -* History Sorting:: Eliminating duplicate entries from a - history file. -* Extract Program:: Pulling out programs from Texinfo source - files. -* Simple Sed:: A Simple Stream Editor. -* Igawk Program:: A wrapper for `awk' that includes - files. -* Anagram Program:: Finding anagrams from a dictionary. -* Signature Program:: People do amazing things with too much time - on their hands. -* Debugging:: Introduction to `gawk' debugger. -* Debugging Concepts:: Debugging in General. -* Debugging Terms:: Additional Debugging Concepts. -* Awk Debugging:: Awk Debugging. -* Sample Debugging Session:: Sample debugging session. -* Debugger Invocation:: How to Start the Debugger. -* Finding The Bug:: Finding the Bug. -* List of Debugger Commands:: Main debugger commands. -* Breakpoint Control:: Control of Breakpoints. -* Debugger Execution Control:: Control of Execution. -* Viewing And Changing Data:: Viewing and Changing Data. -* Execution Stack:: Dealing with the Stack. -* Debugger Info:: Obtaining Information about the Program and - the Debugger State. -* Miscellaneous Debugger Commands:: Miscellaneous Commands. -* Readline Support:: Readline support. -* Limitations:: Limitations and future plans. -* General Arithmetic:: An introduction to computer arithmetic. -* Floating Point Issues:: Stuff to know about floating-point numbers. -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. -* Integer Programming:: Effective integer programming. -* Floating-point Programming:: Effective Floating-point Programming. -* Floating-point Representation:: Binary floating-point representation. 
-* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -* Gawk and MPFR:: How `gawk' provides - arbitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point - Arithmetic with `gawk'. -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point - numbers. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - `gawk'. -* Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -* V7/SVR3.1:: The major changes between V7 and System V - Release 3.1. -* SVR4:: Minor changes between System V Releases 3.1 - and 4. -* POSIX:: New features from the POSIX standard. -* BTL:: New features from Brian Kernighan's version - of `awk'. -* POSIX/GNU:: The extensions in `gawk' not in - POSIX `awk'. -* Common Extensions:: Common Extensions Summary. -* Ranges and Locales:: How locales used to affect regexp ranges. -* Contributors:: The major contributors to `gawk'. -* Gawk Distribution:: What is in the `gawk' distribution. -* Getting:: How to get the distribution. -* Extracting:: How to extract the distribution. -* Distribution contents:: What is in the distribution. -* Unix Installation:: Installing `gawk' under various - versions of Unix. -* Quick Installation:: Compiling `gawk' under Unix. -* Additional Configuration Options:: Other compile-time options. -* Configuration Philosophy:: How it's all supposed to work. -* Non-Unix Installation:: Installation on Other Operating Systems. 
-* PC Installation:: Installing and Compiling `gawk' on - MS-DOS and OS/2. -* PC Binary Installation:: Installing a prepared distribution. -* PC Compiling:: Compiling `gawk' for MS-DOS, - Windows32, and OS/2. -* PC Testing:: Testing `gawk' on PC systems. -* PC Using:: Running `gawk' on MS-DOS, Windows32 - and OS/2. -* Cygwin:: Building and running `gawk' for - Cygwin. -* MSYS:: Using `gawk' In The MSYS - Environment. -* VMS Installation:: Installing `gawk' on VMS. -* VMS Compilation:: How to compile `gawk' under VMS. -* VMS Installation Details:: How to install `gawk' under VMS. -* VMS Running:: How to run `gawk' under VMS. -* VMS Old Gawk:: An old version comes with some VMS systems. -* Bugs:: Reporting Problems and Bugs. -* Other Versions:: Other freely available `awk' - implementations. -* Compatibility Mode:: How to disable certain `gawk' - extensions. -* Additions:: Making Additions To `gawk'. -* Accessing The Source:: Accessing the Git repository. -* Adding Code:: Adding code to the main body of - `gawk'. -* New Ports:: Porting `gawk' to a new operating - system. -* Derived Files:: Why derived files are kept in the - `git' repository. -* Future Extensions:: New features that may be implemented one - day. -* Basic High Level:: The high level view. -* Basic Data Typing:: A very quick intro to data types. +* History:: The history of `gawk' and + `awk'. +* Names:: What name to use to find + `awk'. +* This Manual:: Using this Info file. Includes + sample input files that you can use. +* Conventions:: Typographical Conventions. +* Manual History:: Brief history of the GNU project and + this Info file. +* How To Contribute:: Helping to save the world. +* Acknowledgments:: Acknowledgments. +* Running gawk:: How to run `gawk' programs; + includes command-line syntax. +* One-shot:: Running a short throwaway + `awk' program. +* Read Terminal:: Using no input files (input from + terminal instead). +* Long:: Putting permanent `awk' + programs in files. 
+* Executable Scripts:: Making self-contained `awk' + programs. +* Comments:: Adding documentation to `gawk' + programs. +* Quoting:: More discussion of shell quoting + issues. +* DOS Quoting:: Quoting in Windows Batch Files. +* Sample Data Files:: Sample data files for use in the + `awk' programs illustrated in + this Info file. +* Very Simple:: A very simple example. +* Two Rules:: A less simple one-line example using + two rules. +* More Complex:: A more complex example. +* Statements/Lines:: Subdividing or combining statements + into lines. +* Other Features:: Other Features of `awk'. +* When:: When to use `gawk' and when to + use other things. +* Command Line:: How to run `awk'. +* Options:: Command-line options and their + meanings. +* Other Arguments:: Input file names and variable + assignments. +* Naming Standard Input:: How to specify standard input with + other files. +* Environment Variables:: The environment variables + `gawk' uses. +* AWKPATH Variable:: Searching directories for + `awk' programs. +* AWKLIBPATH Variable:: Searching directories for + `awk' shared libraries. +* Other Environment Variables:: The environment variables. +* Exit Status:: `gawk''s exit status. +* Include Files:: Including other files into your + program. +* Loading Shared Libraries:: Loading shared libraries into your + program. +* Obsolete:: Obsolete Options and/or features. +* Undocumented:: Undocumented Options and Features. +* Regexp Usage:: How to Use Regular Expressions. +* Escape Sequences:: How to write nonprinting characters. +* Regexp Operators:: Regular Expression Operators. +* Bracket Expressions:: What can go between `[...]'. +* GNU Regexp Operators:: Operators specific to GNU software. +* Case-sensitivity:: How to do case-insensitive matching. +* Leftmost Longest:: How much text matches. +* Computed Regexps:: Using Dynamic Regexps. +* Records:: Controlling how data is split into + records. +* Fields:: An introduction to fields. 
+* Nonconstant Fields:: Nonconstant Field Numbers. +* Changing Fields:: Changing the Contents of a Field. +* Field Separators:: The field separator and how to change + it. +* Default Field Splitting:: How fields are normally separated. +* Regexp Field Splitting:: Using regexps as the field separator. +* Single Character Fields:: Making each character a separate + field. +* Command Line Field Separator:: Setting `FS' from the + command-line. +* Field Splitting Summary:: Some final points and a summary table. +* Constant Size:: Reading constant width data. +* Splitting By Content:: Defining Fields By Content +* Multiple Line:: Reading multi-line records. +* Getline:: Reading files under explicit program + control using the `getline' + function. +* Plain Getline:: Using `getline' with no + arguments. +* Getline/Variable:: Using `getline' into a variable. +* Getline/File:: Using `getline' from a file. +* Getline/Variable/File:: Using `getline' into a variable + from a file. +* Getline/Pipe:: Using `getline' from a pipe. +* Getline/Variable/Pipe:: Using `getline' into a variable + from a pipe. +* Getline/Coprocess:: Using `getline' from a coprocess. +* Getline/Variable/Coprocess:: Using `getline' into a variable + from a coprocess. +* Getline Notes:: Important things to know about + `getline'. +* Getline Summary:: Summary of `getline' Variants. +* Read Timeout:: Reading input with a timeout. +* Command line directories:: What happens if you put a directory on + the command line. +* Print:: The `print' statement. +* Print Examples:: Simple examples of `print' + statements. +* Output Separators:: The output separators and how to + change them. +* OFMT:: Controlling Numeric Output With + `print'. +* Printf:: The `printf' statement. +* Basic Printf:: Syntax of the `printf' statement. +* Control Letters:: Format-control letters. +* Format Modifiers:: Format-specification modifiers. +* Printf Examples:: Several examples. 
+* Redirection:: How to redirect output to multiple + files and pipes. +* Special Files:: File name interpretation in + `gawk'. `gawk' allows + access to inherited file descriptors. +* Special FD:: Special files for I/O. +* Special Network:: Special files for network + communications. +* Special Caveats:: Things to watch out for. +* Close Files And Pipes:: Closing Input and Output Files and + Pipes. +* Values:: Constants, Variables, and Regular + Expressions. +* Constants:: String, numeric and regexp constants. +* Scalar Constants:: Numeric and string constants. +* Nondecimal-numbers:: What are octal and hex numbers. +* Regexp Constants:: Regular Expression constants. +* Using Constant Regexps:: When and how to use a regexp constant. +* Variables:: Variables give names to values for + later use. +* Using Variables:: Using variables in your programs. +* Assignment Options:: Setting variables on the command-line + and a summary of command-line syntax. + This is an advanced method of input. +* Conversion:: The conversion of strings to numbers + and vice versa. +* All Operators:: `gawk''s operators. +* Arithmetic Ops:: Arithmetic operations (`+', + `-', etc.) +* Concatenation:: Concatenating strings. +* Assignment Ops:: Changing the value of a variable or a + field. +* Increment Ops:: Incrementing the numeric value of a + variable. +* Truth Values and Conditions:: Testing for true and false. +* Truth Values:: What is ``true'' and what is + ``false''. +* Typing and Comparison:: How variables acquire types and how + this affects comparison of numbers and + strings with `<', etc. +* Variable Typing:: String type versus numeric type. +* Comparison Operators:: The comparison operators. +* POSIX String Comparison:: String comparison with POSIX rules. +* Boolean Ops:: Combining comparison expressions using + boolean operators `||' (``or''), + `&&' (``and'') and `!' + (``not''). 
+* Conditional Exp:: Conditional expressions select between + two subexpressions under control of a + third subexpression. +* Function Calls:: A function call is an expression. +* Precedence:: How various operators nest. +* Locales:: How the locale affects things. +* Pattern Overview:: What goes into a pattern. +* Regexp Patterns:: Using regexps as patterns. +* Expression Patterns:: Any expression can be used as a + pattern. +* Ranges:: Pairs of patterns specify record + ranges. +* BEGIN/END:: Specifying initialization and cleanup + rules. +* Using BEGIN/END:: How and why to use BEGIN/END rules. +* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. +* BEGINFILE/ENDFILE:: Two special patterns for advanced + control. +* Empty:: The empty pattern, which matches every + record. +* Using Shell Variables:: How to use shell variables with + `awk'. +* Action Overview:: What goes into an action. +* Statements:: Describes the various control + statements in detail. +* If Statement:: Conditionally execute some + `awk' statements. +* While Statement:: Loop until some condition is + satisfied. +* Do Statement:: Do specified action while looping + until some condition is satisfied. +* For Statement:: Another looping statement, that + provides initialization and increment + clauses. +* Switch Statement:: Switch/case evaluation for conditional + execution of statements based on a + value. +* Break Statement:: Immediately exit the innermost + enclosing loop. +* Continue Statement:: Skip to the end of the innermost + enclosing loop. +* Next Statement:: Stop processing the current input + record. +* Nextfile Statement:: Stop processing the current file. +* Exit Statement:: Stop execution of `awk'. +* Built-in Variables:: Summarizes the built-in variables. +* User-modified:: Built-in variables that you change to + control `awk'. +* Auto-set:: Built-in variables where `awk' + gives you information. +* ARGC and ARGV:: Ways to use `ARGC' and + `ARGV'. 
+* Array Basics:: The basics of arrays. +* Array Intro:: Introduction to Arrays +* Reference to Elements:: How to examine one element of an + array. +* Assigning Elements:: How to change an element of an array. +* Array Example:: Basic Example of an Array +* Scanning an Array:: A variation of the `for' + statement. It loops through the + indices of an array's existing + elements. +* Controlling Scanning:: Controlling the order in which arrays + are scanned. +* Delete:: The `delete' statement removes an + element from an array. +* Numeric Array Subscripts:: How to use numbers as subscripts in + `awk'. +* Uninitialized Subscripts:: Using Uninitialized variables as + subscripts. +* Multi-dimensional:: Emulating multidimensional arrays in + `awk'. +* Multi-scanning:: Scanning multidimensional arrays. +* Arrays of Arrays:: True multidimensional arrays. +* Built-in:: Summarizes the built-in functions. +* Calling Built-in:: How to call built-in functions. +* Numeric Functions:: Functions that work with numbers, + including `int()', `sin()' + and `rand()'. +* String Functions:: Functions for string manipulation, + such as `split()', `match()' + and `sprintf()'. +* Gory Details:: More than you want to know about + `\' and `&' with + `sub()', `gsub()', and + `gensub()'. +* I/O Functions:: Functions for files and shell + commands. +* Time Functions:: Functions for dealing with timestamps. +* Bitwise Functions:: Functions for bitwise operations. +* Type Functions:: Functions for type information. +* I18N Functions:: Functions for string translation. +* User-defined:: Describes User-defined functions in + detail. +* Definition Syntax:: How to write definitions and what they + mean. +* Function Example:: An example function definition and + what it does. +* Function Caveats:: Things to watch out for. +* Calling A Function:: Don't use spaces. +* Variable Scope:: Controlling variable scope. +* Pass By Value/Reference:: Passing parameters. 
+* Return Statement:: Specifying the value a function + returns. +* Dynamic Typing:: How variable types can change at + runtime. +* Indirect Calls:: Choosing the function to call at + runtime. +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU `gettext' works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging `printf' arguments. +* I18N Portability:: `awk'-level portability + issues. +* I18N Example:: A simple i18n example. +* Gawk I18N:: `gawk' is also + internationalized. +* Nondecimal Data:: Allowing nondecimal input data. +* Array Sorting:: Facilities for controlling array + traversal and sorting arrays. +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use `asort()' and + `asorti()'. +* Two-way I/O:: Two-way communications with another + process. +* TCP/IP Networking:: Using `gawk' for network + programming. +* Profiling:: Profiling your `awk' programs. +* Library Names:: How to best name private global + variables in library functions. +* General Functions:: Functions that are of general use. +* Strtonum Function:: A replacement for the built-in + `strtonum()' function. +* Assert Function:: A function for assertions in + `awk' programs. +* Round Function:: A function for rounding if + `sprintf()' does not do it + correctly. +* Cliff Random Function:: The Cliff Random Number Generator. +* Ordinal Functions:: Functions for using characters as + numbers and vice versa. +* Join Function:: A function to join an array into a + string. +* Getlocaltime Function:: A function to get formatted times. +* Data File Management:: Functions for managing command-line + data files. +* Filetrans Function:: A function for handling data file + transitions. +* Rewind Function:: A function for rereading the current + file. 
+* File Checking:: Checking that data files are readable. +* Empty Files:: Checking for zero-length files. +* Ignoring Assigns:: Treating assignments as file names. +* Getopt Function:: A function for processing command-line + arguments. +* Passwd Functions:: Functions for getting user + information. +* Group Functions:: Functions for getting group + information. +* Walking Arrays:: A function to walk arrays of arrays. +* Running Examples:: How to run these examples. +* Clones:: Clones of common utilities. +* Cut Program:: The `cut' utility. +* Egrep Program:: The `egrep' utility. +* Id Program:: The `id' utility. +* Split Program:: The `split' utility. +* Tee Program:: The `tee' utility. +* Uniq Program:: The `uniq' utility. +* Wc Program:: The `wc' utility. +* Miscellaneous Programs:: Some interesting `awk' + programs. +* Dupword Program:: Finding duplicated words in a + document. +* Alarm Program:: An alarm clock. +* Translate Program:: A program similar to the `tr' + utility. +* Labels Program:: Printing mailing labels. +* Word Sorting:: A program to produce a word usage + count. +* History Sorting:: Eliminating duplicate entries from a + history file. +* Extract Program:: Pulling out programs from Texinfo + source files. +* Simple Sed:: A Simple Stream Editor. +* Igawk Program:: A wrapper for `awk' that + includes files. +* Anagram Program:: Finding anagrams from a dictionary. +* Signature Program:: People do amazing things with too much + time on their hands. +* Debugging:: Introduction to `gawk' + debugger. +* Debugging Concepts:: Debugging in General. +* Debugging Terms:: Additional Debugging Concepts. +* Awk Debugging:: Awk Debugging. +* Sample Debugging Session:: Sample debugging session. +* Debugger Invocation:: How to Start the Debugger. +* Finding The Bug:: Finding the Bug. +* List of Debugger Commands:: Main debugger commands. +* Breakpoint Control:: Control of Breakpoints. +* Debugger Execution Control:: Control of Execution. 
+* Viewing And Changing Data:: Viewing and Changing Data. +* Execution Stack:: Dealing with the Stack. +* Debugger Info:: Obtaining Information about the + Program and the Debugger State. +* Miscellaneous Debugger Commands:: Miscellaneous Commands. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. +* General Arithmetic:: An introduction to computer + arithmetic. +* Floating Point Issues:: Stuff to know about floating-point + numbers. +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not + Abstract Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. +* Integer Programming:: Effective integer programming. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Gawk and MPFR:: How `gawk' provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with `gawk'. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point + numbers. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic + with `gawk'. +* Extension Intro:: What is an extension. +* Plugin License:: A note about licensing. +* Extension Design:: Design notes about the extension API. +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. +* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. 
+* Extension API Description:: A full description of the API. +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + `gawk'. +* Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. +* Printing Messages:: Functions for printing messages. +* Updating `ERRNO':: Functions for updating `ERRNO'. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. +* Array Manipulation:: Functions for working with arrays. +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. +* Extension API Variables:: Variables provided by the API. +* Extension Versioning:: API Version information. +* Extension API Informational Variables:: Variables providing information about + `gawk''s invocation. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How `gawk' finds compiled + extensions. +* Extension Example:: Example C code for an extension. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension.
+* Extension Samples:: The sample extensions that ship with + `gawk'. +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to `fnmatch()'. +* Extension Sample Fork:: An interface to `fork()' and + other process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to `readdir()'. +* Extension Sample Revout:: Reversing output sample output + wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way + processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to `gettimeofday()' + and `sleep()'. +* gawkextlib:: The `gawkextlib' project. +* V7/SVR3.1:: The major changes between V7 and + System V Release 3.1. +* SVR4:: Minor changes between System V + Releases 3.1 and 4. +* POSIX:: New features from the POSIX standard. +* BTL:: New features from Brian Kernighan's + version of `awk'. +* POSIX/GNU:: The extensions in `gawk' not + in POSIX `awk'. +* Common Extensions:: Common Extensions Summary. +* Ranges and Locales:: How locales used to affect regexp + ranges. +* Contributors:: The major contributors to + `gawk'. +* Gawk Distribution:: What is in the `gawk' + distribution. +* Getting:: How to get the distribution. +* Extracting:: How to extract the distribution. +* Distribution contents:: What is in the distribution. +* Unix Installation:: Installing `gawk' under + various versions of Unix. +* Quick Installation:: Compiling `gawk' under Unix. +* Additional Configuration Options:: Other compile-time options. +* Configuration Philosophy:: How it's all supposed to work. +* Non-Unix Installation:: Installation on Other Operating + Systems. +* PC Installation:: Installing and Compiling + `gawk' on MS-DOS and OS/2. 
+* PC Binary Installation:: Installing a prepared distribution. +* PC Compiling:: Compiling `gawk' for MS-DOS, + Windows32, and OS/2. +* PC Testing:: Testing `gawk' on PC systems. +* PC Using:: Running `gawk' on MS-DOS, + Windows32 and OS/2. +* Cygwin:: Building and running `gawk' + for Cygwin. +* MSYS:: Using `gawk' In The MSYS + Environment. +* VMS Installation:: Installing `gawk' on VMS. +* VMS Compilation:: How to compile `gawk' under + VMS. +* VMS Installation Details:: How to install `gawk' under + VMS. +* VMS Running:: How to run `gawk' under VMS. +* VMS Old Gawk:: An old version comes with some VMS + systems. +* Bugs:: Reporting Problems and Bugs. +* Other Versions:: Other freely available `awk' + implementations. +* Compatibility Mode:: How to disable certain `gawk' + extensions. +* Additions:: Making Additions To `gawk'. +* Accessing The Source:: Accessing the Git repository. +* Adding Code:: Adding code to the main body of + `gawk'. +* New Ports:: Porting `gawk' to a new + operating system. +* Derived Files:: Why derived files are kept in the + `git' repository. +* Future Extensions:: New features that may be implemented + one day. +* Basic High Level:: The high level view. +* Basic Data Typing:: A very quick intro to data types. To Miriam, for making me complete. @@ -21180,34 +21292,66 @@ File: gawk.info, Node: Dynamic Extensions, Next: Language History, Prev: Arbi 16 Writing Extensions for `gawk' ******************************** -This chapter is a placeholder, pending a rewrite for the new API. Some -of the old bits remain, since they can be partially reused. - - It is possible to add new built-in functions to `gawk' using +It is possible to add new built-in functions to `gawk' using dynamically loaded libraries. This facility is available on systems (such as GNU/Linux) that support the C `dlopen()' and `dlsym()' -functions. This major node describes how to write and use dynamically -loaded extensions for `gawk'. 
Experience with programming in C or C++ -is necessary when reading this minor node. +functions. This major node describes how to create extensions using +code written in C or C++. If you don't know anything about C +programming, you can safely skip this major node, although you may wish +to review the documentation on the extensions that come with `gawk' +(*note Extension Samples::), and the section on the `gawkextlib' +project (*note gawkextlib::). NOTE: When `--sandbox' is specified, extensions are disabled - (*note Options::. + (*note Options::). * Menu: +* Extension Intro:: What is an extension. * Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. +* Extension Design:: Design notes about the extension API. +* Extension API Description:: A full description of the API. +* Extension Example:: Example C code for an extension. +* Extension Samples:: The sample extensions that ship with + `gawk'. +* gawkextlib:: The `gawkextlib' project. + + +File: gawk.info, Node: Extension Intro, Next: Plugin License, Up: Dynamic Extensions + +16.1 Introduction +================= + +An "extension" (sometimes called a "plug-in") is a piece of external +compiled code that `gawk' can load at runtime to provide additional +functionality, over and above the built-in capabilities described in +the rest of this Info file. + + Extensions are useful because they allow you (of course) to extend +`gawk''s functionality. For example, they can provide access to system +calls (such as `chdir()' to change directory) and to other C library +routines that could be of use. As with most software, "the sky is the +limit;" if you can imagine something that you might want to do and can +write in C or C++, you can write an extension to do it! + + Extensions are written in C or C++, using the "Application +Programming Interface" (API) defined for this purpose by the `gawk' +developers. 
The rest of this major node explains the design decisions +behind the API, the facilities it provides and how to use them, and +presents a small sample extension. In addition, it documents the +sample extensions included in the `gawk' distribution, and describes +the `gawkextlib' project. -File: gawk.info, Node: Plugin License, Next: Sample Library, Up: Dynamic Extensions +File: gawk.info, Node: Plugin License, Next: Extension Design, Prev: Extension Intro, Up: Dynamic Extensions -16.1 Extension Licensing +16.2 Extension Licensing ======================== Every dynamic extension should define the global symbol `plugin_is_GPL_compatible' to assert that it has been licensed under a -GPL-compatible license. If this symbol does not exist, `gawk' will -emit a fatal error and exit. +GPL-compatible license. If this symbol does not exist, `gawk' emits a +fatal error and exits when it tries to load your extension. The declared type of the symbol should be `int'. It does not need to be in any allocated section, though. The code merely asserts that @@ -21216,15 +21360,2213 @@ the symbol exists in the global scope. Something like this is enough: int plugin_is_GPL_compatible; -File: gawk.info, Node: Sample Library, Prev: Plugin License, Up: Dynamic Extensions +File: gawk.info, Node: Extension Design, Next: Extension API Description, Prev: Plugin License, Up: Dynamic Extensions -16.2 Example: Directory and File Operation Built-ins -==================================================== +16.3 Extension API Design +========================= + +The first version of extensions for `gawk' was developed in the +mid-1990s and released with `gawk' 3.1 in the late 1990s. The basic +mechanisms and design remained unchanged for close to 15 years, until +2012. + + The old extension mechanism used data types and functions from +`gawk' itself, with a "clever hack" to install extension functions. + + `gawk' included some sample extensions, of which a few were really +useful. 
However, it was clear from the outset that the extension +mechanism was bolted onto the side and was not really thought out. + +* Menu: + +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. +* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. + + +File: gawk.info, Node: Old Extension Problems, Next: Extension New Mechanism Goals, Up: Extension Design + +16.3.1 Problems With The Old Mechanism +-------------------------------------- + +The old extension mechanism had several problems: + + * It depended heavily upon `gawk' internals. Any time the `NODE' + structure(1) changed, an extension would have to be recompiled. + Furthermore, to really write extensions required understanding + something about `gawk''s internal functions. There was some + documentation in this Info file, but it was quite minimal. + + * Being able to call into `gawk' from an extension required linker + facilities that are common on Unix-derived systems but that did + not work on Windows systems; users wanting extensions on Windows + had to statically link them into `gawk', even though Windows + supports dynamic loading of shared objects. + + * The API would change occasionally as `gawk' changed; no + compatibility between versions was ever offered or planned for. + + Despite the drawbacks, the `xgawk' project developers forked `gawk' +and developed several significant extensions. They also enhanced +`gawk''s facilities relating to file inclusion and shared object access. + + A new API was desired for a long time, but only in 2012 did the +`gawk' maintainer and the `xgawk' developers finally start working on +it together. More information about the `xgawk' project is provided in +*note gawkextlib::. + + ---------- Footnotes ---------- + + (1) A critical central data structure inside `gawk'. 
+ + +File: gawk.info, Node: Extension New Mechanism Goals, Next: Extension Other Design Decisions, Prev: Old Extension Problems, Up: Extension Design + +16.3.2 Goals For A New Mechanism +-------------------------------- + +Some goals for the new API were: + + * The API should be independent of `gawk' internals. Changes in + `gawk' internals should not be visible to the writer of an + extension function. + + * The API should provide _binary_ compatibility across `gawk' + releases as long as the API itself does not change. + + * The API should enable extensions written in C to have roughly the + same "appearance" to `awk'-level code as `awk' functions do. This + means that extensions should have: + + - The ability to access function parameters. + + - The ability to turn an undefined parameter into an array + (call by reference). + + - The ability to create, access and update global variables. + + - Easy access to all the elements of an array at once ("array + flattening") in order to loop over all the elements in an easy + fashion for C code. + + - The ability to create arrays (including `gawk''s true + multi-dimensional arrays). + + Some additional important goals were: + + * The API should use only features in ISO C 90, so that extensions + can be written using the widest range of C and C++ compilers. The + header should include the appropriate `#ifdef __cplusplus' and + `extern "C"' magic so that a C++ compiler could be used. (If + using C++, the runtime system has to be smart enough to call any + constructors and destructors, as `gawk' is a C program. As of this + writing, this has not been tested.) + + * The API mechanism should not require access to `gawk''s symbols(1) + by the compile-time or dynamic linker, in order to enable creation + of extensions that also work on Windows.
+ + During development, it became clear that there were other features +that should be available to extensions, which were also subsequently +provided: + + * Extensions should have the ability to hook into `gawk''s I/O + redirection mechanism. In particular, the `xgawk' developers + provided a so-called "open hook" to take over reading records. + During development, this was generalized to allow extensions to + hook into input processing, output processing, and two-way I/O. + + * An extension should be able to provide a "callback" function to + perform clean-up actions when `gawk' exits. + + * An extension should be able to provide a version string so that + `gawk''s `--version' option can provide information about + extensions as well. + + ---------- Footnotes ---------- + + (1) The "symbols" are the variables and functions defined inside +`gawk'. Access to these symbols by code that is external to `gawk' and +loaded dynamically at runtime is problematic on Windows. + + +File: gawk.info, Node: Extension Other Design Decisions, Next: Extension Mechanism Outline, Prev: Extension New Mechanism Goals, Up: Extension Design + +16.3.3 Other Design Decisions +----------------------------- + +As an "arbitrary" design decision, extensions can read the values of +built-in variables and arrays (such as `ARGV' and `FS'), but cannot +change them, with the exception of `PROCINFO'. + + The reason for this is to prevent an extension function from +affecting the flow of an `awk' program outside its control. While a +real `awk' function can do what it likes, that is at the discretion of +the programmer. An extension function should provide a service or make +a C API available for use within `awk', and not mess with `FS' or +`ARGC' and `ARGV'. + + In addition, it becomes easy to start down a slippery slope. How +much access to `gawk' facilities do extensions need? Do they need +`getline'? What about calling `gsub()' or compiling regular +expressions? What about calling into `awk' functions?
(_That_ would be +messy.) + + In order to avoid these issues, the `gawk' developers chose to start +with the simplest, most basic features that are still truly useful. + + Another decision is that although `gawk' provides nice things like +MPFR, and arrays indexed internally by integers, these features are not +being brought out to the API in order to keep things simple and close to +traditional `awk' semantics. (In fact, arrays indexed internally by +integers are so transparent that they aren't even documented!) + + With time, the API will undoubtedly evolve; the `gawk' developers +expect this to be driven by user needs. For now, the current API seems +to provide a minimal yet powerful set of features for creating +extensions. + + +File: gawk.info, Node: Extension Mechanism Outline, Next: Extension Future Growth, Prev: Extension Other Design Decisions, Up: Extension Design + +16.3.4 At A High Level How It Works +----------------------------------- + +The requirement to avoid access to `gawk''s symbols is, at first +glance, a difficult one to meet. + + One design, apparently used by Perl and Ruby and maybe others, would +be to make the mainline `gawk' code into a library, with the `gawk' +utility a small C `main()' function linked against the library. + + This seemed like the tail wagging the dog, complicating build and +installation and making a simple copy of the `gawk' executable from one +system to another (or one place to another on the same system!) into a +chancy operation. + + Pat Rankin suggested the solution that was adopted. Communication +between `gawk' and an extension is two-way. First, when an extension +is loaded, it is passed a pointer to a `struct' whose fields are +function pointers. 
+ + API + Struct + +---+ + | | + +---+ + +---------------| | + | +---+ dl_load(api_p, id); + | | | ___________________ + | +---+ | + | +---------| | __________________ | + | | +---+ || + | | | | || + | | +---+ || + | | +---| | || + | | | +---+ \ || / + | | | \ / + v v v \/ ++-------+-+---+-+---+-+------------------+--------------------+ +| |x| |x| |x| |OOOOOOOOOOOOOOOOOOOO| +| |x| |x| |x| |OOOOOOOOOOOOOOOOOOOO| +| |x| |x| |x| |OOOOOOOOOOOOOOOOOOOO| ++-------+-+---+-+---+-+------------------+--------------------+ + + gawk Main Program Address Space Extension +Figure 16.1: Loading the extension + + The extension can call functions inside `gawk' through these +function pointers, at runtime, without needing (link-time) access to +`gawk''s symbols. One of these function pointers is to a function for +"registering" new built-in functions. + + register_ext_func({ "chdir", do_chdir, 1 }); + + +--------------------------------------------+ + | | + V | ++-------+-+---+-+---+-+------------------+--------------+-+---+ +| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO| +| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO| +| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO| ++-------+-+---+-+---+-+------------------+--------------+-+---+ + + gawk Main Program Address Space Extension +Figure 16.2: Loading the new function + + In the other direction, the extension registers its new functions +with `gawk' by passing function pointers to the functions that provide +the new feature (`do_chdir()', for example). `gawk' associates the +function pointer with a name and can then call it, using a defined +calling convention. 
+ + BEGIN { + chdir("/path") (*fnptr)(1); + } + +--------------------------------------------+ + | | + | V ++-------+-+---+-+---+-+------------------+--------------+-+---+ +| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO| +| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO| +| |x| |x| |x| |OOOOOOOOOOOOOO|X|OOO| ++-------+-+---+-+---+-+------------------+--------------+-+---+ + + gawk Main Program Address Space Extension +Figure 16.3: Calling the new function + + The `do_XXX()' function, in turn, uses the function pointers in +the API `struct' to do its work, such as updating variables or arrays, +printing messages, setting `ERRNO', and so on. + + Convenience macros in the `gawkapi.h' header file make calling +through the function pointers look like regular function calls so that +extension code is quite readable and understandable. + + Although all of this sounds somewhat complicated, the result is that +extension code is quite clean and straightforward. This can be seen in +the sample extension `filefuncs.c' (*note Extension Example::) and +also the `testext.c' code for testing the APIs. + + Some other bits and pieces: + + * The API provides access to `gawk''s `do_XXX' values, reflecting + command-line options, like `do_lint', `do_profiling' and so on + (*note Extension API Variables::). These are informational: an + extension cannot affect these inside `gawk'. In addition, + attempting to assign to them produces a compile-time error. + + * The API also provides major and minor version numbers, so that an + extension can check if the `gawk' it is loaded with supports the + facilities it was compiled with. (Version mismatches "shouldn't" + happen, but we all know how _that_ goes.) *Note Extension + Versioning::, for details. + + +File: gawk.info, Node: Extension Future Growth, Prev: Extension Mechanism Outline, Up: Extension Design + +16.3.5 Room For Future Growth +----------------------------- + +The API provides room for future growth, in two ways.
+ + An "extension id" is passed into the extension when it is loaded. This +extension id is then passed back to `gawk' with each function call. +This allows `gawk' to identify the extension calling into it, should it +need to know. + + A "name space" is passed into `gawk' when an extension function is +registered. This provides for a future mechanism for grouping +extension functions and possibly avoiding name conflicts. + + Of course, as of this writing, no decisions have been made with +respect to any of the above. + + +File: gawk.info, Node: Extension API Description, Next: Extension Example, Prev: Extension Design, Up: Dynamic Extensions + +16.4 API Description +==================== + +This (rather large) minor node describes the API in detail. + +* Menu: + +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + `gawk'. +* Printing Messages:: Functions for printing messages. +* Updating `ERRNO':: Functions for updating `ERRNO'. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Array Manipulation:: Functions for working with arrays. +* Extension API Variables:: Variables provided by the API. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How `gawk' finds compiled + extensions. + + +File: gawk.info, Node: Extension API Functions Introduction, Next: General Data Types, Up: Extension API Description + +16.4.1 Introduction +------------------- + +Access to facilities within `gawk' is made available by calling +through function pointers passed into your extension. + + API function pointers are provided for the following kinds of +operations: + + * Registration functions. You may register:
You may register: + - extension functions, + + - exit callbacks, + + - a version string, + + - input parsers, + + - output wrappers, + + - and two-way processors. + All of these are discussed in detail, later in this major node. + + * Printing fatal, warning, and "lint" warning messages. + + * Updating `ERRNO', or unsetting it. + + * Accessing parameters, including converting an undefined parameter + into an array. + + * Symbol table access: retrieving a global variable, creating one, + or changing one. This also includes the ability to create a scalar + variable that will be _constant_ within `awk' code. + + * Creating and releasing cached values; this provides an efficient + way to use values for multiple variables and can be a big + performance win. + + * Manipulating arrays: + - Retrieving, adding, deleting, and modifying elements + + - Getting the count of elements in an array + + - Creating a new array + + - Clearing an array + + - Flattening an array for easy C style looping over all its + indices and elements + + Some points about using the API: + + * You must include `<sys/types.h>' and `<sys/stat.h>' before + including the `gawkapi.h' header file. In addition, you must + include either `<stddef.h>' or `<stdlib.h>' to get the definition + of `size_t'. If you wish to use the boilerplate `dl_load_func()' + macro, you will need to include `<stdio.h>' as well. Finally, to + pass reasonable integer values for `ERRNO', you will need to + include `<errno.h>'. + + * Although the API only uses ISO C 90 features, there is an + exception; the "constructor" functions use the `inline' keyword. + If your compiler does not support this keyword, you should either + place `-Dinline=''' on your command line, or use the GNU Autotools + and include a `config.h' file in your extensions. + + * All pointers filled in by `gawk' are to memory managed by `gawk' + and should be treated by the extension as read-only. 
+ Memory for + _all_ strings passed into `gawk' from the extension _must_ come + from `malloc()' and is managed by `gawk' from then on. + + * The API defines several simple structs that map values as seen + from `awk'. A value can be a `double', a string, or an array (as + in multidimensional arrays, or when creating a new array). + Strings maintain both pointer and length since embedded `NUL' + characters are allowed. + + By intent, strings are maintained using the current multibyte + encoding (as defined by `LC_XXX' environment variables) and not + using wide characters. This matches how `gawk' stores strings + internally and also how characters are likely to be read from and + written to files. + + * When retrieving a value (such as a parameter or that of a global + variable or array element), the extension requests a specific type + (number, string, scalar, value cookie, array, or "undefined"). + When the request is "undefined," the returned value will have the + real underlying type. + + However, if the request and actual type don't match, the access + function returns "false" and fills in the type of the actual value + that is there, so that the extension can, e.g., print an error + message ("scalar passed where array expected"). + + + While you may call the API functions by using the function pointers +directly, the interface is not so pretty. To make extension code look +more like regular code, the `gawkapi.h' header file defines a number of +macros that you should use in your code. This minor node presents the +macros as if they were functions. + + +File: gawk.info, Node: General Data Types, Next: Requesting Values, Prev: Extension API Functions Introduction, Up: Extension API Description + +16.4.2 General Purpose Data Types +--------------------------------- + + I have a true love/hate relationship with unions. + Arnold Robbins + + That's the thing about unions: the compiler will arrange things so + they can accommodate both love and hate.
+ Chet Ramey + + The extension API defines a number of simple types and structures +for general-purpose use. Additional, more specialized data structures +are introduced in subsequent minor nodes, together with the functions +that use them. + +`typedef void *awk_ext_id_t;' + A value of this type is received from `gawk' when an extension is + loaded. That value must then be passed back to `gawk' as the + first parameter of each API function. + +`#define awk_const ...' + This macro expands to `const' when compiling an extension, and to + nothing when compiling `gawk' itself. This makes certain fields + in the API data structures unwritable from extension code, while + allowing `gawk' to use them as it needs to. + +`typedef int awk_bool_t;' + A simple boolean type. At the moment, the API does not define + special "true" and "false" values, although perhaps it should. + +`typedef struct {' +` char *str; /* data */' +` size_t len; /* length thereof, in chars */' +`} awk_string_t;' + This represents a mutable string. `gawk' owns the memory pointed + to if it supplied the value. Otherwise, it takes ownership of the + memory pointed to. *Such memory must come from `malloc()'!* + + As mentioned earlier, strings are maintained using the current + multibyte encoding. + +`typedef enum {' +` AWK_UNDEFINED,' +` AWK_NUMBER,' +` AWK_STRING,' +` AWK_ARRAY,' +` AWK_SCALAR, /* opaque access to a variable */' +` AWK_VALUE_COOKIE /* for updating a previously created value */' +`} awk_valtype_t;' + This `enum' indicates the type of a value. It is used in the + following `struct'. + +`typedef struct {' +` awk_valtype_t val_type;' +` union {' +` awk_string_t s;' +` double d;' +` awk_array_t a;' +` awk_scalar_t scl;' +` awk_value_cookie_t vc;' +` } u;' +`} awk_value_t;' + An "`awk' value." The `val_type' member indicates what kind of + value the `union' holds, and each member is of the appropriate + type.
+ +`#define str_value u.s' +`#define num_value u.d' +`#define array_cookie u.a' +`#define scalar_cookie u.scl' +`#define value_cookie u.vc' + These macros make accessing the fields of the `awk_value_t' more + readable. + +`typedef void *awk_scalar_t;' + Scalars can be represented as an opaque type. These values are + obtained from `gawk' and then passed back into it. This is + discussed in a general fashion below, and in more detail in *note + Symbol table by cookie::. + +`typedef void *awk_value_cookie_t;' + A "value cookie" is an opaque type representing a cached value. + This is also discussed in a general fashion below, and in more + detail in *note Cached values::. + + + Scalar values in `awk' are either numbers or strings. The +`awk_value_t' struct represents values. The `val_type' member +indicates what is in the `union'. + + Representing numbers is easy--the API uses a C `double'. Strings +require more work. Since `gawk' allows embedded `NUL' bytes in string +values, a string must be represented as a pair containing a +data-pointer and length. This is the `awk_string_t' type. + + Identifiers (i.e., the names of global variables) can be associated +with either scalar values or with arrays. In addition, `gawk' provides +true arrays of arrays, where any given array element can itself be an +array. Discussion of arrays is delayed until *note Array +Manipulation::. + + The various macros listed earlier make it easier to use the elements +of the `union' as if they were fields in a `struct'; this is a common +coding practice in C. Such code is easier to write and to read; +however, it remains _your_ responsibility to make sure that the +`val_type' member correctly reflects the type of the value in the +`awk_value_t'. + + Conceptually, the first three members of the `union' (number, string, +and array) are all that is needed for working with `awk' values.
+However, since the API provides routines for accessing and changing the +value of global scalar variables only by using the variable's name, +there is a performance penalty: `gawk' must find the variable each time +it is accessed and changed. This turns out to be a real issue, not +just a theoretical one. + + Thus, if you know that your extension will spend considerable time +reading and/or changing the value of one or more scalar variables, you +can obtain a "scalar cookie"(1) object for that variable, and then use +the cookie for getting the variable's value or for changing the +variable's value. This is the `awk_scalar_t' type and `scalar_cookie' +macro. Given a scalar cookie, `gawk' can directly retrieve or modify +the value, as required, without having to first find it. + + The `awk_value_cookie_t' type and `value_cookie' macro are similar. +If you know that you wish to use the same numeric or string _value_ for +one or more variables, you can create the value once, retaining a +"value cookie" for it, and then pass in that value cookie whenever you +wish to set the value of a variable. This saves both storage space +within the running `gawk' process as well as the time needed to create +the value. + + ---------- Footnotes ---------- + + (1) See the "cookie" entry in the Jargon file +(http://catb.org/jargon/html/C/cookie.html) for a definition of +"cookie", and the "magic cookie" entry in the Jargon file +(http://catb.org/jargon/html/M/magic-cookie.html) for a nice example. +See also the entry for "Cookie" in the *note Glossary::. + + +File: gawk.info, Node: Requesting Values, Next: Constructor Functions, Prev: General Data Types, Up: Extension API Description + +16.4.3 Requesting Values +------------------------ + +All of the functions that return values from `gawk' work in the same +way. You pass in an `awk_valtype_t' value to indicate what kind of +value you expect. 
 If the actual value matches what you requested, the +function returns true and fills in the `awk_value_t' result. +Otherwise, the function returns false, and the `val_type' member +indicates the type of the actual value. You may then print an error +message, or reissue the request for the actual value type, as +appropriate. This behavior is summarized in *note +table-value-types-returned::.
+
+                        Type of Actual Value:
+                        ---------------------------------------------
+
+                        String          Number    Array     Undefined
+---------------------------------------------------------------------
+            String      String          String    false     false
+            Number      Number if can   Number    false     false
+                        be converted,
+                        else false
+Type        Array       false           false     Array     false
+Requested:  Scalar      Scalar          Scalar    false     false
+            Undefined   String          Number    Array     Undefined
+            Value       false           false     false     false
+            Cookie
+
+Table 16.1: Value Types Returned
+ + +File: gawk.info, Node: Constructor Functions, Next: Registration Functions, Prev: Requesting Values, Up: Extension API Description + +16.4.4 Constructor Functions and Convenience Macros +--------------------------------------------------- + +The API provides a number of "constructor" functions for creating +string and numeric values, as well as a number of convenience macros. +This node presents them all as function prototypes, in the way that +extension code would use them. + +`static inline awk_value_t *' +`make_const_string(const char *string, size_t length, awk_value_t *result)' + This function creates a string value in the `awk_value_t' variable + pointed to by `result'. It expects `string' to be a C string + constant (or other string data), and automatically creates a + _copy_ of the data for storage in `result'. It returns `result'. + +`static inline awk_value_t *' +`make_malloced_string(const char *string, size_t length, awk_value_t *result)' + This function creates a string value in the `awk_value_t' variable + pointed to by `result'.
It expects `string' to be a `char *' value + pointing to data previously obtained from `malloc()'. The idea here + is that the data is passed directly to `gawk', which assumes + responsibility for it. It returns `result'. + +`static inline awk_value_t *' +`make_null_string(awk_value_t *result)' + This specialized function creates a null string (the "undefined" + value) in the `awk_value_t' variable pointed to by `result'. It + returns `result'. + +`static inline awk_value_t *' +`make_number(double num, awk_value_t *result)' + This function simply creates a numeric value in the `awk_value_t' + variable pointed to by `result'. + + Two convenience macros may be used for allocating storage from +`malloc()' and `realloc()'. If the allocation fails, they cause `gawk' +to exit with a fatal error message. They should be used as if they were +procedure calls that do not return a value. + +`emalloc(pointer, type, size, message)' + The arguments to this macro are as follows: + `pointer' + The pointer variable to point at the allocated storage. + + `type' + The type of the pointer variable, used to create a cast for + the call to `malloc()'. + + `size' + The total number of bytes to be allocated. + + `message' + A message to be prefixed to the fatal error message. + Typically this is the name of the function using the macro. + + For example, you might allocate a string value like so: + + awk_value_t result; + char *message; + const char greet[] = "Don't Panic!"; + + emalloc(message, char *, sizeof(greet), "myfunc"); + strcpy(message, greet); + make_malloced_string(message, strlen(message), & result); + +`erealloc(pointer, type, size, message)' + This is like `emalloc()', but it calls `realloc()', instead of + `malloc()'. The arguments are the same as for the `emalloc()' + macro. 
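To make the ownership rules above concrete, here is a standalone sketch that models the constructors just described. It is not part of `gawk'. It re-declares a reduced copy of the `awk_value_t' types shown earlier instead of including `gawkapi.h'. The `demo_make_*()' functions are hypothetical stand-ins that follow the documented behavior: `make_const_string()' copies the data, while `make_malloced_string()' stores the caller's `malloc()'ed buffer as-is. The real `static inline' versions in `gawkapi.h' may differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reduced local mirrors of the documented declarations (array, scalar,
   and value-cookie members omitted). A real extension includes
   gawkapi.h instead of redefining these. */
typedef enum { AWK_UNDEFINED, AWK_NUMBER, AWK_STRING } awk_valtype_t;

typedef struct {
    char *str;    /* data */
    size_t len;   /* length thereof, in chars */
} awk_string_t;

typedef struct {
    awk_valtype_t val_type;
    union {
        awk_string_t s;
        double d;
    } u;
} awk_value_t;

#define str_value u.s
#define num_value u.d

/* Hypothetical stand-in for make_number(). */
static awk_value_t *
demo_make_number(double num, awk_value_t *result)
{
    result->val_type = AWK_NUMBER;
    result->num_value = num;
    return result;
}

/* Hypothetical stand-in for make_malloced_string(): the malloc()ed
   buffer is stored without copying; ownership travels with it. */
static awk_value_t *
demo_make_malloced_string(char *string, size_t length, awk_value_t *result)
{
    assert(result != NULL);
    result->val_type = AWK_STRING;
    result->str_value.str = string;
    result->str_value.len = length;
    return result;
}

/* Hypothetical stand-in for make_const_string(): copies the data,
   embedded NUL bytes included, into fresh malloc()ed storage. */
static awk_value_t *
demo_make_const_string(const char *string, size_t length, awk_value_t *result)
{
    char *copy = malloc(length + 1);

    if (copy == NULL) {   /* mimic emalloc(): treat failure as fatal */
        fprintf(stderr, "demo_make_const_string: out of memory\n");
        exit(EXIT_FAILURE);
    }
    memcpy(copy, string, length);
    copy[length] = '\0';  /* convenience terminator; len is authoritative */
    return demo_make_malloced_string(copy, length, result);
}
```

The sketch shows why a string handed to `make_malloced_string()' must never be a string constant. That data is passed through untouched and later freed by its new owner. A copied string, by contrast, always lives in fresh storage.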
+ + +File: gawk.info, Node: Registration Functions, Next: Printing Messages, Prev: Constructor Functions, Up: Extension API Description + +16.4.5 Registration Functions +----------------------------- + +This minor node describes the API functions for registering parts of +your extension with `gawk'. + +* Menu: + +* Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. + + +File: gawk.info, Node: Extension Functions, Next: Exit Callback Functions, Up: Registration Functions + +16.4.5.1 Registering An Extension Function +.......................................... + +Extension functions are described by the following record: + + typedef struct { + const char *name; + awk_value_t *(*function)(int num_actual_args, awk_value_t *result); + size_t num_expected_args; + } awk_ext_func_t; + + The fields are: + +`const char *name;' + The name of the new function. `awk' level code calls the function + by this name. This is a regular C string. + +`awk_value_t *(*function)(int num_actual_args, awk_value_t *result);' + This is a pointer to the C function that provides the desired + functionality. The function must fill in the result with either a + number or a string. `awk' takes ownership of any string memory. + As mentioned earlier, string memory *must* come from `malloc()'. + + The function must return the value of `result'. This is for the + convenience of the calling code inside `gawk'. + +`size_t num_expected_args;' + This is the number of arguments the function expects to receive. + Each extension function may decide what to do if the number of + arguments isn't what it expected. Following `awk' functions, it + is likely OK to ignore extra arguments. 
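A minimal sketch of such a record follows. The types are reduced local copies of the declarations in this chapter, limited to numbers. `do_answer()' is a hypothetical extension function that `awk' code would call as `answer()'. It fills in `result', returns it as required, and ignores any extra arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced local mirrors of the documented types (numbers only). */
typedef enum { AWK_UNDEFINED, AWK_NUMBER, AWK_STRING } awk_valtype_t;

typedef struct {
    awk_valtype_t val_type;
    union { double d; } u;
} awk_value_t;

#define num_value u.d

typedef struct {
    const char *name;
    awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
    size_t num_expected_args;
} awk_ext_func_t;

/* do_answer --- hypothetical extension function that returns 42.
   A real one would fetch its arguments with get_argument(). */
static awk_value_t *
do_answer(int num_actual_args, awk_value_t *result)
{
    (void) num_actual_args;   /* extra arguments are simply ignored */
    result->val_type = AWK_NUMBER;
    result->num_value = 42.0;
    return result;            /* must return the value of result */
}

/* The record describing the function: awk code calls answer(). */
static awk_ext_func_t func_table[] = {
    { "answer", do_answer, 0 },
};
```

In an actual extension, `&func_table[0]' is what would be passed to `add_ext_func()', with an empty string (`""') as the namespace.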
+ + Once you have a record representing your extension function, you +register it with `gawk' using this API function: + +`awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func);' + This function returns true upon success, false otherwise. The + `namespace' parameter is currently not used; you should pass in an + empty string (`""'). The `func' pointer is the address of a + `struct' representing your function, as just described. + + +File: gawk.info, Node: Exit Callback Functions, Next: Extension Version String, Prev: Extension Functions, Up: Registration Functions + +16.4.5.2 Registering An Exit Callback Function +.............................................. + +An "exit callback" function is a function that `gawk' calls before it +exits. Such functions are useful if you have general "clean up" tasks +that should be performed in your extension (such as closing data base +connections or other resource deallocations). You can register such a +function with `gawk' using the following function. + +`void awk_atexit(void (*funcp)(void *data, int exit_status),' +` void *arg0);' + The parameters are: + `funcp' + A pointer to the function to be called before `gawk' exits. + The `data' parameter will be the original value of `arg0'. + The `exit_status' parameter is the exit status value that + `gawk' will pass to the `exit()' system call. + + `arg0' + A pointer to private data which `gawk' saves in order to pass + to the function pointed to by `funcp'. + + Exit callback functions are called in Last-In-First-Out (LIFO) +order--that is, in the reverse order in which they are registered with +`gawk'. + + +File: gawk.info, Node: Extension Version String, Next: Input Parsers, Prev: Exit Callback Functions, Up: Registration Functions + +16.4.5.3 Registering An Extension Version String +................................................ 
+ +You can register a version string which indicates the name and version +of your extension, with `gawk', as follows: + +`void register_ext_version(const char *version);' + Register the string pointed to by `version' with `gawk'. `gawk' + does _not_ copy the `version' string, so it should not be changed. + + `gawk' prints all registered extension version strings when it is +invoked with the `--version' option. + + +File: gawk.info, Node: Input Parsers, Next: Output Wrappers, Prev: Extension Version String, Up: Registration Functions + +16.4.5.4 Customized Input Parsers +................................. + +By default, `gawk' reads text files as its input. It uses the value of +`RS' to find the end of the record, and then uses `FS' (or +`FIELDWIDTHS') to split it into fields (*note Reading Files::). +Additionally, it sets the value of `RT' (*note Built-in Variables::). + + If you want, you can provide your own, custom, input parser. An +input parser's job is to return a record to the `gawk' record processing +code, along with indicators for the value and length of the data to be +used for `RT', if any. + + To provide an input parser, you must first provide two functions +(where XXX is a prefix name for your extension): + +`awk_bool_t XXX_can_take_file(const awk_input_buf_t *iobuf)' + This function examines the information available in `iobuf' (which + we discuss shortly). Based on the information there, it decides + if the input parser should be used for this file. If so, it + should return true. Otherwise, it should return false. It should + not change any state (variable values, etc.) within `gawk'. + +`awk_bool_t XXX_take_control_of(awk_input_buf_t *iobuf)' + When `gawk' decides to hand control of the file over to the input + parser, it calls this function. This function in turn must fill + in certain fields in the `awk_input_buf_t' structure, and ensure + that certain conditions are true. It should then return true. 
If + an error of some kind occurs, it should not fill in any fields, + and should return false; then `gawk' will not use the input parser. + The details are presented shortly. + + Your extension should package these functions inside an +`awk_input_parser_t', which looks like this: + + typedef struct input_parser { + const char *name; /* name of parser */ + awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); + awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); + awk_const struct input_parser *awk_const next; /* for use by gawk */ + } awk_input_parser_t; + + The fields are: + +`const char *name;' + The name of the input parser. This is a regular C string. + +`awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);' + A pointer to your `XXX_can_take_file()' function. + +`awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);' + A pointer to your `XXX_take_control_of()' function. + +`awk_const struct input_parser *awk_const next;' + This pointer is used by `gawk'. The extension cannot modify it. + + The steps are as follows: + + 1. Create a `static awk_input_parser_t' variable and initialize it + appropriately. + + 2. When your extension is loaded, register your input parser with + `gawk' using the `register_input_parser()' API function (described + below). + + An `awk_input_buf_t' looks like this: + + typedef struct awk_input { + const char *name; /* filename */ + int fd; /* file descriptor */ + #define INVALID_HANDLE (-1) + void *opaque; /* private data for input parsers */ + int (*get_record)(char **out, struct awk_input *iobuf, + int *errcode, char **rt_start, size_t *rt_len); + void (*close_func)(struct awk_input *iobuf); + struct stat sbuf; /* stat buf */ + } awk_input_buf_t; + + The fields can be divided into two categories: those for use +(initially, at least) by `XXX_can_take_file()', and those for use by +`XXX_take_control_of()'. The first group of fields and their uses are +as follows: + +`const char *name;' + The name of the file. 
+ +`int fd;' + A file descriptor for the file. If `gawk' was able to open the + file, then `fd' will _not_ be equal to `INVALID_HANDLE'. + Otherwise, it will. + +`struct stat sbuf;' + If the file descriptor is valid, then `gawk' will have filled in this + structure via a call to the `fstat()' system call. + + The `XXX_can_take_file()' function should examine these fields and +decide if the input parser should be used for the file. The decision +can be made based upon `gawk' state (the value of a variable defined +previously by the extension and set by `awk' code), the name of the +file, whether or not the file descriptor is valid, the information in +the `struct stat', or any combination of the above. + + Once `XXX_can_take_file()' has returned true, and `gawk' has decided +to use your input parser, it calls `XXX_take_control_of()'. That +function then fills in at least the `get_record' field of the +`awk_input_buf_t'. It must also ensure that `fd' is not set to +`INVALID_HANDLE'. All of the fields that may be filled by +`XXX_take_control_of()' are as follows: + +`void *opaque;' + This is used to hold any state information needed by the input + parser for this file. It is "opaque" to `gawk'. The input parser + is not required to use this pointer. + +`int (*get_record)(char **out,' +` struct awk_input *iobuf,' +` int *errcode,' +` char **rt_start,' +` size_t *rt_len);' + This function pointer should point to a function that creates the + input records. Said function is the core of the input parser. + Its behavior is described below. + +`void (*close_func)(struct awk_input *iobuf);' + This function pointer should point to a function that does the + "tear down." It should release any resources allocated by + `XXX_take_control_of()'. It may also close the file. If it does + so, it should set the `fd' field to `INVALID_HANDLE'. + + If `fd' is still not `INVALID_HANDLE' after the call to this + function, `gawk' calls the regular `close()' system call.
+ + Having a "tear down" function is optional. If your input parser + does not need it, do not set this field. Then, `gawk' calls the + regular `close()' system call on the file descriptor, so it should + be valid. + + The `XXX_get_record()' function does the work of creating input +records. The parameters are as follows: + +`char **out' + This is a pointer to a `char *' variable which is set to point to + the record. `gawk' makes its own copy of the data, so the + extension must manage this storage. + +`struct awk_input *iobuf' + This is the `awk_input_buf_t' for the file. The fields should be + used for reading data (`fd') and for managing private state + (`opaque'), if any. + +`int *errcode' + If an error occurs, `*errcode' should be set to an appropriate + code from `<errno.h>'. + +`char **rt_start' +`size_t *rt_len' + If the concept of a "record terminator" makes sense, then + `*rt_start' should be set to point to the data to be used for + `RT', and `*rt_len' should be set to the length of the data. + Otherwise, `*rt_len' should be set to zero. `gawk' makes its own + copy of this data, so the extension must manage the storage. + + The return value is the length of the buffer pointed to by `*out', +or `EOF' if end-of-file was reached or an error occurred. + + It is guaranteed that `errcode' is a valid pointer, so there is no +need to test for a `NULL' value. `gawk' sets `*errcode' to zero, so +there is no need to set it unless an error occurs. + + If an error does occur, the function should return `EOF' and set +`*errcode' to a non-zero value. In that case, if `*errcode' does not +equal -1, `gawk' automatically updates the `ERRNO' variable based on +the value of `*errcode' (e.g., setting `*errcode = errno' should do the +right thing). + + `gawk' ships with a sample extension that reads directories, +returning records for each entry in the directory (*note Extension +Sample Readdir::). You may wish to use that code as a guide for writing +your own input parser. 
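The hypothetical `line_get_record()' below sketches the calling convention just described. It assumes the whole input is already in memory; a real parser would read from `iobuf->fd' and keep its buffers via `opaque'. Each call returns one newline-terminated record and points `*rt_start' at the newline, so that `RT' is the newline. It returns `EOF' once the data runs out. Only the `awk_input_buf_t' fields it uses are mirrored locally, and `struct line_state' is an illustrative name, not part of the API.

```c
#include <assert.h>
#include <stdio.h>    /* EOF */
#include <string.h>   /* memchr */

/* Reduced local mirror of awk_input_buf_t (unused fields omitted). */
typedef struct awk_input {
    const char *name;   /* filename */
    int fd;             /* file descriptor */
    void *opaque;       /* private data for input parsers */
} awk_input_buf_t;

/* Hypothetical private state kept in iobuf->opaque. */
struct line_state {
    char *data;   /* input owned by the extension */
    size_t len;   /* total bytes in data */
    size_t pos;   /* start of the next record */
};

/* line_get_record --- return one newline-terminated record per call. */
static int
line_get_record(char **out, struct awk_input *iobuf, int *errcode,
                char **rt_start, size_t *rt_len)
{
    struct line_state *st = iobuf->opaque;
    char *start, *nl;
    size_t remaining;

    (void) errcode;   /* nothing can fail here; gawk pre-sets it to zero */

    if (st->pos >= st->len)
        return EOF;

    start = st->data + st->pos;
    remaining = st->len - st->pos;
    nl = memchr(start, '\n', remaining);

    *out = start;     /* gawk copies the record */
    if (nl != NULL) {
        *rt_start = nl;               /* RT is the newline itself */
        *rt_len = 1;
        st->pos += (size_t) (nl - start) + 1;
        return (int) (nl - start);
    }

    *rt_len = 0;      /* last record has no terminator */
    st->pos = st->len;
    return (int) remaining;
}
```

Because `gawk' copies both the record and the `RT' data, returning pointers into the extension's own buffer is fine. The extension simply keeps owning that buffer.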
+ + When writing an input parser, you should think about (and document) +how it is expected to interact with `awk' code. You may want it to +always be called, and take effect as appropriate (as the `readdir' +extension does). Or you may want it to take effect based upon the +value of an `awk' variable, as the XML extension from the `gawkextlib' +project does (*note gawkextlib::). In the latter case, code in a +`BEGINFILE' section can look at `FILENAME' and `ERRNO' to decide +whether or not to activate an input parser (*note BEGINFILE/ENDFILE::). + + You register your input parser with the following function: + +`void register_input_parser(awk_input_parser_t *input_parser);' + Register the input parser pointed to by `input_parser' with `gawk'. + + +File: gawk.info, Node: Output Wrappers, Next: Two-way processors, Prev: Input Parsers, Up: Registration Functions + +16.4.5.5 Customized Output Wrappers +................................... + +An "output wrapper" is the mirror image of an input parser. It allows +an extension to take over the output to a file opened with the `>' or +`>>' operators (*note Redirection::). + + The output wrapper is very similar to the input parser structure: + + typedef struct output_wrapper { + const char *name; /* name of the wrapper */ + awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); + awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); + awk_const struct output_wrapper *awk_const next; /* for use by gawk */ + } awk_output_wrapper_t; + + The members are as follows: + +`const char *name;' + This is the name of the output wrapper. + +`awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);' + This points to a function that examines the information in the + `awk_output_buf_t' structure pointed to by `outbuf'. It should + return true if the output wrapper wants to take over the file, and + false otherwise. It should not change any state (variable values, + etc.) within `gawk'. 
+ +`awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);' + The function pointed to by this field is called when `gawk' + decides to let the output wrapper take control of the file. It + should fill in appropriate members of the `awk_output_buf_t' + structure, as described below, and return true if successful, + false otherwise. + +`awk_const struct output_wrapper *awk_const next;' + This is for use by `gawk'. + + The `awk_output_buf_t' structure looks like this: + + typedef struct { + const char *name; /* name of output file */ + const char *mode; /* mode argument to fopen */ + FILE *fp; /* stdio file pointer */ + awk_bool_t redirected; /* true if a wrapper is active */ + void *opaque; /* for use by output wrapper */ + size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, + FILE *fp, void *opaque); + int (*gawk_fflush)(FILE *fp, void *opaque); + int (*gawk_ferror)(FILE *fp, void *opaque); + int (*gawk_fclose)(FILE *fp, void *opaque); + } awk_output_buf_t; + + Here too, your extension will define `XXX_can_take_file()' and +`XXX_take_control_of()' functions that examine and update data members +in the `awk_output_buf_t'. The data members are as follows: + +`const char *name;' + The name of the output file. + +`const char *mode;' + The mode string (as would be used in the second argument to + `fopen()') with which the file was opened. + +`FILE *fp;' + The `FILE' pointer from `<stdio.h>'. `gawk' opens the file before + attempting to find an output wrapper. + +`awk_bool_t redirected;' + This field must be set to true by the `XXX_take_control_of()' + function. + +`void *opaque;' + This pointer is opaque to `gawk'. The extension should use it to + store a pointer to any private data associated with the file. 
+ +`size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,' +` FILE *fp, void *opaque);' +`int (*gawk_fflush)(FILE *fp, void *opaque);' +`int (*gawk_ferror)(FILE *fp, void *opaque);' +`int (*gawk_fclose)(FILE *fp, void *opaque);' + These pointers should be set to point to functions that perform + the same operations as the corresponding `<stdio.h>' functions, if + appropriate. `gawk' uses these function pointers for all output. + `gawk' initializes the pointers to point to internal, "pass + through" functions that just call the regular `<stdio.h>' + functions, so an extension only needs to redefine those functions + that are appropriate for what it does. + + The `XXX_can_take_file()' function should make a decision based upon +the `name' and `mode' fields, and any additional state (such as `awk' +variable values) that is appropriate. + + When `gawk' calls `XXX_take_control_of()', that function should fill in the +other fields, as appropriate, except for `fp', which it should just use +normally. + + You register your output wrapper with the following function: + +`void register_output_wrapper(awk_output_wrapper_t *output_wrapper);' + Register the output wrapper pointed to by `output_wrapper' with + `gawk'. + + +File: gawk.info, Node: Two-way processors, Prev: Output Wrappers, Up: Registration Functions + +16.4.5.6 Customized Two-way Processors +...................................... + +A "two-way processor" combines an input parser and an output wrapper for +two-way I/O with the `|&' operator (*note Redirection::). It makes +identical use of the `awk_input_buf_t' and `awk_output_buf_t' +structures as described earlier.
+ + A two-way processor is represented by the following structure: + + typedef struct two_way_processor { + const char *name; /* name of the two-way processor */ + awk_bool_t (*can_take_two_way)(const char *name); + awk_bool_t (*take_control_of)(const char *name, + awk_input_buf_t *inbuf, + awk_output_buf_t *outbuf); + awk_const struct two_way_processor *awk_const next; /* for use by gawk */ + } awk_two_way_processor_t; + + The fields are as follows: + +`const char *name;' + The name of the two-way processor. + +`awk_bool_t (*can_take_two_way)(const char *name);' + This function returns true if it wants to take over two-way I/O + for this filename. It should not change any state (variable + values, etc.) within `gawk'. + +`awk_bool_t (*take_control_of)(const char *name,' +` awk_input_buf_t *inbuf,' +` awk_output_buf_t *outbuf);' + This function should fill in the `awk_input_buf_t' and + `awk_output_buf_t' structures pointed to by `inbuf' and `outbuf', + respectively. These structures were described earlier. + +`awk_const struct two_way_processor *awk_const next;' + This is for use by `gawk'. + + As with the input parser and output wrapper, you provide "yes I +can take this" and "take over for this" functions, +`XXX_can_take_two_way()' and `XXX_take_control_of()'. + + You register your two-way processor with the following function: + +`void register_two_way_processor(awk_two_way_processor_t *two_way_processor);' + Register the two-way processor pointed to by `two_way_processor' + with `gawk'. + + +File: gawk.info, Node: Printing Messages, Next: Updating `ERRNO', Prev: Registration Functions, Up: Extension API Description + +16.4.6 Printing Messages +------------------------ + +You can print different kinds of warning messages from your extension, +as described below.
 Note that for these functions, you must pass in +the extension id received from `gawk' when the extension was loaded.(1) + +`void fatal(awk_ext_id_t id, const char *format, ...);' + Print a message and then cause `gawk' to exit immediately. + +`void warning(awk_ext_id_t id, const char *format, ...);' + Print a warning message. + +`void lintwarn(awk_ext_id_t id, const char *format, ...);' + Print a "lint warning." Normally this is the same as printing a + warning message, but if `gawk' was invoked with `--lint=fatal', + then lint warnings become fatal error messages. + + All of these functions are otherwise like the C `printf()' family of +functions, where the `format' parameter is a string with literal +characters and formatting codes intermixed. + + ---------- Footnotes ---------- + + (1) Because the API uses only ISO C 90 features, it cannot make use +of the ISO C 99 variadic macro feature to hide that parameter. More's +the pity. + + +File: gawk.info, Node: Updating `ERRNO', Next: Accessing Parameters, Prev: Printing Messages, Up: Extension API Description + +16.4.7 Updating `ERRNO' +----------------------- + +The following functions allow you to update the `ERRNO' variable: + +`void update_ERRNO_int(int errno_val);' + Set `ERRNO' to the string equivalent of the error code in + `errno_val'. The value should be one of the defined error codes in + `<errno.h>', and `gawk' turns it into a (possibly translated) + string using the C `strerror()' function. + +`void update_ERRNO_string(const char *string);' + Set `ERRNO' directly to the string value of `string'. `gawk' makes + a copy of the value of `string'. + +`void unset_ERRNO();' + Unset `ERRNO'. + + +File: gawk.info, Node: Accessing Parameters, Next: Symbol Table Access, Prev: Updating `ERRNO', Up: Extension API Description + +16.4.8 Accessing and Updating Parameters +---------------------------------------- + +Two functions give you access to the arguments (parameters) passed to +your extension function.
They are: + +`awk_bool_t get_argument(size_t count,' +` awk_valtype_t wanted,' +` awk_value_t *result);' + Fill in the `awk_value_t' structure pointed to by `result' with + the `count''th argument. Return true if the actual type matches + `wanted', false otherwise. In the latter case, `result->val_type' + indicates the actual type (*note Table 16.1: + table-value-types-returned.). Counts are zero based--the first + argument is numbered zero, the second one, and so on. `wanted' + indicates the type of value expected. + +`awk_bool_t set_argument(size_t count, awk_array_t array);' + Convert a parameter that was undefined into an array; this provides + call-by-reference for arrays. Return false if `count' is too big, + or if the argument's type is not undefined. *Note Array + Manipulation::, for more information on creating arrays. + + +File: gawk.info, Node: Symbol Table Access, Next: Array Manipulation, Prev: Accessing Parameters, Up: Extension API Description + +16.4.9 Symbol Table Access +-------------------------- + +Two sets of routines provide access to global variables, and one set +allows you to create and release cached values. + +* Menu: + +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. + + +File: gawk.info, Node: Symbol table by name, Next: Symbol table by cookie, Up: Symbol Table Access + +16.4.9.1 Variable Access and Update by Name +........................................... + +The following routines provide the ability to access and update global +`awk'-level variables by name. In compiler terminology, identifiers of +different kinds are termed "symbols", thus the "sym" in the routines' +names. The data structure which stores information about symbols is +termed a "symbol table". 
+ +`awk_bool_t sym_lookup(const char *name,' +` awk_valtype_t wanted,' +` awk_value_t *result);' + Fill in the `awk_value_t' structure pointed to by `result' with + the value of the variable named by the string `name', which is a + regular C string. `wanted' indicates the type of value expected. + Return true if the actual type matches `wanted', false otherwise. + In the latter case, `result->val_type' indicates the actual type + (*note Table 16.1: table-value-types-returned.). + +`awk_bool_t sym_update(const char *name, awk_value_t *value);' + Update the variable named by the string `name', which is a regular + C string. The variable is added to `gawk''s symbol table if it is + not there. Return true if everything worked, false otherwise. + + Changing types (scalar to array or vice versa) of an existing + variable is _not_ allowed, nor may this routine be used to update + an array. This routine cannot be used to update any of the + predefined variables (such as `ARGC' or `NF'). + +`awk_bool_t sym_constant(const char *name, awk_value_t *value);' + Create a variable named by the string `name', which is a regular C + string, that has the constant value as given by `value'. + `awk'-level code cannot change the value of this variable.(1) The + extension may change the value of `name''s variable with + subsequent calls to this routine, and may also convert a variable + created by `sym_update()' into a constant. However, once a + variable becomes a constant it cannot later be reverted into a + mutable variable. + + ---------- Footnotes ---------- + + (1) There (currently) is no `awk'-level feature that provides this +ability. + + +File: gawk.info, Node: Symbol table by cookie, Next: Cached values, Prev: Symbol table by name, Up: Symbol Table Access + +16.4.9.2 Variable Access and Update by Cookie +............................................. + +A "scalar cookie" is an opaque handle that provides access to a global +variable or array.
It is an optimization that avoids looking up +variables in `gawk''s symbol table every time access is needed. This +was discussed earlier, in *note General Data Types::. + + The following functions let you work with scalar cookies. + +`awk_bool_t sym_lookup_scalar(awk_scalar_t cookie,' +` awk_valtype_t wanted,' +` awk_value_t *result);' + Retrieve the current value of a scalar cookie. Once you have + obtained a scalar_cookie using `sym_lookup()', you can use this + function to get its value more efficiently. Return false if the + value cannot be retrieved. + +`awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value);' + Update the value associated with a scalar cookie. Return false if + the new value is not one of `AWK_STRING' or `AWK_NUMBER'. Here + too, the built-in variables may not be updated. + + It is not obvious at first glance how to work with scalar cookies or +what their raison d'etre really is. In theory, the `sym_lookup()' and +`sym_update()' routines are all you really need to work with variables. +For example, you might have code that looked up the value of a +variable, evaluated a condition, and then possibly changed the value of +the variable based on the result of that evaluation, like so: + + /* do_magic --- do something really great */ + + static awk_value_t * + do_magic(int nargs, awk_value_t *result) + { + awk_value_t value; + + if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value) + && some_condition(value.num_value)) { + value.num_value += 42; + sym_update("MAGIC_VAR", & value); + } + + return make_number(0.0, result); + } + +This code looks (and is) simple and straightforward. So what's the +problem? + + Consider what happens if `awk'-level code associated with your +extension calls the `magic()' function (implemented in C by +`do_magic()'), once per record, while processing hundreds of thousands +or millions of records. The `MAGIC_VAR' variable is looked up in the +symbol table once or twice per function call! 
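To put a number on the repeated lookups just described, here is a toy, standalone model that counts name searches. It is a sketch, not `gawk' code: `toy_var', `toy_by_name()', `magic_by_name()' and `magic_by_cookie()' are hypothetical names, the "cookie" is simply a pointer saved after one search, and the `some_condition()' test from the example above is dropped for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy: counts name lookups to show what a cookie saves. */
typedef struct { const char *name; double value; } toy_var;

static toy_var toy_vars[] = { { "NR_SEEN", 0.0 }, { "MAGIC_VAR", 0.0 } };
static unsigned long toy_lookups = 0;   /* how many name searches ran */

/* Search the table by name; every call costs one search. */
static toy_var *toy_by_name(const char *name)
{
    toy_lookups++;
    for (size_t i = 0; i < sizeof toy_vars / sizeof toy_vars[0]; i++)
        if (strcmp(toy_vars[i].name, name) == 0)
            return &toy_vars[i];
    return NULL;
}

/* By name: one search to read and another to write, on every call.
 * MAGIC_VAR is known to exist here, so the result is not NULL. */
static void magic_by_name(void)
{
    double v = toy_by_name("MAGIC_VAR")->value;
    toy_by_name("MAGIC_VAR")->value = v + 42;
}

/* By "cookie": search once during initialization, then reuse it. */
static toy_var *toy_cookie;

static void magic_init(void) { toy_cookie = toy_by_name("MAGIC_VAR"); }
static void magic_by_cookie(void) { toy_cookie->value += 42; }
```

Calling `magic_by_name()' 1000 times performs 2000 searches, while `magic_init()' followed by 1000 calls to `magic_by_cookie()' performs just one. With a real symbol table holding many variables, each avoided search saves correspondingly more work.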
+ + The symbol table lookup is really pure overhead; it is considerably +more efficient to get a cookie that represents the variable, and use +that to get the variable's value and update it as needed.(1) + + Thus, the way to use cookies is as follows. First, install your +extension's variable in `gawk''s symbol table using `sym_update()', as +usual. Then get a scalar cookie for the variable using `sym_lookup()': + + static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */ + + static void + my_extension_init() + { + awk_value_t value; + + /* install initial value */ + sym_update("MAGIC_VAR", make_number(42.0, & value)); + + /* get cookie */ + sym_lookup("MAGIC_VAR", AWK_SCALAR, & value); + + /* save the cookie */ + magic_var_cookie = value.scalar_cookie; + ... + } + + Next, use the routines in this section for retrieving and updating +the value through the cookie. Thus, `do_magic()' now becomes something +like this: + + /* do_magic --- do something really great */ + + static awk_value_t * + do_magic(int nargs, awk_value_t *result) + { + awk_value_t value; + + if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value) + && some_condition(value.num_value)) { + value.num_value += 42; + sym_update_scalar(magic_var_cookie, & value); + } + ... + + return make_number(0.0, result); + } + + NOTE: The previous code omitted error checking for presentation + purposes. Your extension code should be more robust and carefully + check the return values from the API functions. + + ---------- Footnotes ---------- + + (1) The difference is measurable and quite real. Trust us. + + +File: gawk.info, Node: Cached values, Prev: Symbol table by cookie, Up: Symbol Table Access + +16.4.9.3 Creating and Using Cached Values +......................................... + +The routines in this section allow you to create and release cached +values. As with scalar cookies, in theory, cached values are not +necessary. 
You can create numbers and strings using the functions in +*note Constructor Functions::. You can then assign those values to +variables using `sym_update()' or `sym_update_scalar()', as you like. + + However, you can understand the point of cached values if you +remember that _every_ string value's storage _must_ come from +`malloc()'. If you have 20 variables, all of which have the same +string value, you must create 20 identical copies of the string.(1) + + It is clearly more efficient, if possible, to create a value once, +and then tell `gawk' to reuse the value for multiple variables. That is +what the routines in this section let you do. The functions are as +follows: + +`awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result);' + Create a cached string or numeric value from `value' for efficient + later assignment. Only `AWK_NUMBER' and `AWK_STRING' values are + allowed. Any other type is rejected. While `AWK_UNDEFINED' could + be allowed, doing so would result in inferior performance. + +`awk_bool_t release_value(awk_value_cookie_t vc);' + Release the memory associated with a value cookie obtained from + `create_value()'. + + You use value cookies in a fashion similar to the way you use scalar +cookies. In the extension initialization routine, you create the value +cookie: + + static awk_value_cookie_t answer_cookie; /* static value cookie */ + + static void + my_extension_init() + { + awk_value_t value; + char *long_string; + size_t long_string_len; + + /* code from earlier */ + ... + /* ... fill in long_string and long_string_len ... */ + make_malloced_string(long_string, long_string_len, & value); + create_value(& value, & answer_cookie); /* create cookie */ + ... + } + + Once the value is created, you can use it as the value of any number +of variables: + + static awk_value_t * + do_magic(int nargs, awk_value_t *result) + { + awk_value_t new_value; + + ... 
/* as earlier */ + + new_value.val_type = AWK_VALUE_COOKIE; + new_value.value_cookie = answer_cookie; + sym_update("VAR1", & new_value); + sym_update("VAR2", & new_value); + ... + sym_update("VAR100", & new_value); + ... + } + +Using value cookies in this way saves considerable storage, since all of +`VAR1' through `VAR100' share the same value. + + You might be wondering, "Is this sharing problematic? What happens +if `awk' code assigns a new value to `VAR1'; are all the others +changed too?" + + That's a great question. The answer is that no, it's not a problem. +`gawk' is smart enough to avoid such problems. + + Finally, as part of your clean up action (*note Exit Callback +Functions::) you should release any cached values that you created, +using `release_value()'. + + ---------- Footnotes ---------- + + (1) Numeric values are clearly less problematic, requiring only a C +`double' to store. + + +File: gawk.info, Node: Array Manipulation, Next: Extension API Variables, Prev: Symbol Table Access, Up: Extension API Description + +16.4.10 Array Manipulation +-------------------------- + +The primary data structure(1) in `awk' is the associative array (*note +Arrays::). Extensions need to be able to manipulate `awk' arrays. The +API provides a number of data structures for working with arrays, +functions for working with individual elements, and functions for +working with arrays as a whole. This includes the ability to "flatten" +an array so that it is easy for C code to traverse every element in an +array. The array data structures integrate nicely with the data +structures for values to make it easy to both work with and create true +arrays of arrays (*note General Data Types::). + +* Menu: + +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. 
+ + ---------- Footnotes ---------- + + (1) Okay, the only data structure.
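Flattening is the least familiar of these ideas, so here is a toy, standalone model of it. This is a sketch under stated assumptions, not `gawk''s data structures or API. `toy_elem', `toy_array', `toy_flat', `toy_flatten()' and `toy_release()' are hypothetical names, and `TOY_ELEM_DELETE' merely imitates the role of `AWK_ELEMENT_DELETE'. Flattening copies every element into one contiguous block that C code can index. Releasing the block deletes, from the original array, every element the caller flagged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of flattening; NOT gawk's data structures. */
enum { TOY_ELEM_DEFAULT = 0, TOY_ELEM_DELETE = 1 };

typedef struct {
    const char *index;   /* element subscript */
    const char *value;   /* element value */
    int flags;           /* TOY_ELEM_* bits set by the extension */
} toy_elem;

typedef struct {
    size_t count;
    toy_elem elems[8];   /* fixed capacity keeps the sketch short */
} toy_array;

typedef struct {
    size_t count;
    toy_elem *elements;  /* contiguous copy that C code can index */
} toy_flat;

/* Copy every element into one contiguous block with all flags cleared. */
static toy_flat *toy_flatten(const toy_array *a)
{
    toy_flat *f = malloc(sizeof *f);

    if (f == NULL)
        return NULL;
    f->count = a->count;
    f->elements = malloc((a->count > 0 ? a->count : 1) * sizeof *f->elements);
    if (f->elements == NULL) {
        free(f);
        return NULL;
    }
    memcpy(f->elements, a->elems, a->count * sizeof *f->elements);
    for (size_t i = 0; i < f->count; i++)
        f->elements[i].flags = TOY_ELEM_DEFAULT;
    return f;
}

/* Delete from the original every element flagged in the flat copy,
 * then free the copy; `f' must not be used afterwards. */
static void toy_release(toy_array *a, toy_flat *f)
{
    for (size_t i = 0; i < f->count; i++) {
        if ((f->elements[i].flags & TOY_ELEM_DELETE) == 0)
            continue;
        for (size_t j = 0; j < a->count; j++) {
            if (strcmp(a->elems[j].index, f->elements[i].index) == 0) {
                a->elems[j] = a->elems[a->count - 1]; /* order not kept */
                a->count--;
                break;
            }
        }
    }
    free(f->elements);
    free(f);
}
```

Deferring deletion until release means the flat copy stays valid while the extension loops over it. The real `awk_flat_array_t' stores its elements inline (the `elements[1]' member) rather than behind a separate pointer as this toy does.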
+ + +File: gawk.info, Node: Array Data Types, Next: Array Functions, Up: Array Manipulation + +16.4.10.1 Array Data Types +.......................... + +The data types associated with arrays are listed below. + +`typedef void *awk_array_t;' + If you request the value of an array variable, you get back an + `awk_array_t' value. This value is opaque(1) to the extension; it + uniquely identifies the array but can only be used by passing it + into API functions or receiving it from API functions. This is + very similar to the way `FILE *' values are used with the `<stdio.h>' + library routines. + +`typedef struct awk_element {' +` /* convenience linked list pointer, not used by gawk */' +` struct awk_element *next;' +` enum {' +` AWK_ELEMENT_DEFAULT = 0, /* set by gawk */' +` AWK_ELEMENT_DELETE = 1 /* set by extension if should be deleted */' +` } flags;' +` awk_value_t index;' +` awk_value_t value;' +`} awk_element_t;' + The `awk_element_t' is a "flattened" array element. `gawk' produces + an array of these inside the `awk_flat_array_t' (see the next + item). Individual elements may be marked for deletion. New + elements must be added individually, one at a time, using the + separate API for that purpose. The fields are as follows: + + `struct awk_element *next;' + This pointer is for the convenience of extension writers. It + allows an extension to create a linked list of new elements + which can then be added to an array in a loop that traverses + the list. + + `enum { ... } flags;' + A set of flag values that convey information between `gawk' + and the extension. Currently there is only one: + `AWK_ELEMENT_DELETE', which the extension can set to cause + `gawk' to delete the element from the original array upon + release of the flattened array. + + `index' + `value' + The index and value of the element, respectively. _All_ + memory pointed to by `index' and `value' belongs to `gawk'.
+ +`typedef struct awk_flat_array {' +` awk_const void *awk_const opaque1; /* private data for use by gawk */' +` awk_const void *awk_const opaque2; /* private data for use by gawk */' +` awk_const size_t count; /* how many elements */' +` awk_element_t elements[1]; /* will be extended */' +`} awk_flat_array_t;' + This is a flattened array. When an extension gets one of these + from `gawk', the `elements' array is of actual size `count'. The + `opaque1' and `opaque2' pointers are for use by `gawk'; therefore + they are marked `awk_const' so that the extension cannot modify + them. + + ---------- Footnotes ---------- + + (1) It is also a "cookie," but the `gawk' developers did not wish to +overuse this term. + + +File: gawk.info, Node: Array Functions, Next: Flattening Arrays, Prev: Array Data Types, Up: Array Manipulation + +16.4.10.2 Array Functions +......................... + +The following functions relate to individual array elements. + +`awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);' + For the array represented by `a_cookie', return in `*count' the + number of elements it contains. A subarray counts as a single + element. Return false if there is an error. + +`awk_bool_t get_array_element(awk_array_t a_cookie,' +` const awk_value_t *const index,' +` awk_valtype_t wanted,' +` awk_value_t *result);' + For the array represented by `a_cookie', return in `*result' the + value of the element whose index is `index'. `wanted' specifies + the type of value you wish to retrieve. Return false if `wanted' + does not match the actual type or if `index' is not in the array + (*note Table 16.1: table-value-types-returned.). + + The value for `index' can be numeric, in which case `gawk' + converts it to a string. Using non-integral values is possible, but + requires that you understand how such values are converted to + strings (*note Conversion::); thus using integral values is safest. 
+ + As with _all_ strings passed into `gawk' from an extension, the + string value of `index' must come from `malloc()', and `gawk' + releases the storage. + +`awk_bool_t set_array_element(awk_array_t a_cookie,' +` const awk_value_t *const index,' +` const awk_value_t *const value);' + In the array represented by `a_cookie', create or modify the + element whose index is given by `index'. The `ARGV' and `ENVIRON' + arrays may not be changed. + +`awk_bool_t set_array_element_by_elem(awk_array_t a_cookie,' +` awk_element_t element);' + Like `set_array_element()', but take the `index' and `value' from + `element'. This is a convenience macro. + +`awk_bool_t del_array_element(awk_array_t a_cookie,' +` const awk_value_t* const index);' + Remove the element with the given index from the array represented + by `a_cookie'. Return true if the element was removed, or false + if the element did not exist in the array. + + The following functions relate to arrays as a whole: + +`awk_array_t create_array();' + Create a new array to which elements may be added. *Note Creating + Arrays::, for a discussion of how to create a new array and add + elements to it. + +`awk_bool_t clear_array(awk_array_t a_cookie);' + Clear the array represented by `a_cookie'. Return false if there + was some kind of problem, true otherwise. The array remains an + array, but after calling this function, it has no elements. This + is equivalent to using the `delete' statement (*note Delete::). + +`awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data);' + For the array represented by `a_cookie', create an + `awk_flat_array_t' structure and fill it in. Set the pointer whose + address is passed as `data' to point to this structure. Return + true upon success, or false otherwise. *Note Flattening Arrays::, + for a discussion of how to flatten an array and work with it. 
+ +`awk_bool_t release_flattened_array(awk_array_t a_cookie,' +` awk_flat_array_t *data);' + When done with a flattened array, release the storage using this + function. You must pass in both the original array cookie, and + the address of the created `awk_flat_array_t' structure. The + function returns true upon success, false otherwise. + + +File: gawk.info, Node: Flattening Arrays, Next: Creating Arrays, Prev: Array Functions, Up: Array Manipulation + +16.4.10.3 Working With All The Elements of an Array +................................................... + +To "flatten" an array is to create a structure that represents the full +array in a fashion that makes it easy for C code to traverse the entire +array. Test code in `extension/testext.c' does this, and also serves +as a nice example to show how to use the APIs. + + First, the `gawk' script that drives the test extension: + + @load "testext" + BEGIN { + n = split("blacky rusty sophie raincloud lucky", pets) + printf "pets has %d elements\n", length(pets) + ret = dump_array_and_delete("pets", "3") + printf "dump_array_and_delete(pets) returned %d\n", ret + if ("3" in pets) + printf("dump_array_and_delete() did NOT remove index \"3\"!\n") + else + printf("dump_array_and_delete() did remove index \"3\"!\n") + print "" + } + +This code creates an array with `split()' (*note String Functions::) +and then calls `dump_array_and_delete()'. That function looks up the array +whose name is passed as the first argument, and deletes the element at +the index passed in the second argument. It then prints the return +value and checks if the element was indeed deleted. Here is the C code +that implements `dump_array_and_delete()'. It has been edited slightly +for presentation.
+ + The first part declares variables, sets up the default return value +in `result', and checks that the function was called with the correct +number of arguments: + + static awk_value_t * + dump_array_and_delete(int nargs, awk_value_t *result) + { + awk_value_t value, value2, value3; + awk_flat_array_t *flat_array; + size_t count; + char *name; + int i; + + assert(result != NULL); + make_number(0.0, result); + + if (nargs != 2) { + printf("dump_array_and_delete: nargs not right " + "(%d should be 2)\n", nargs); + goto out; + } + + The function then proceeds in steps, as follows. First, retrieve the +name of the array, passed as the first argument. Then retrieve the +array itself. If either operation fails, print error messages and +return: + + /* get argument named array as flat array and print it */ + if (get_argument(0, AWK_STRING, & value)) { + name = value.str_value.str; + if (sym_lookup(name, AWK_ARRAY, & value2)) + printf("dump_array_and_delete: sym_lookup of %s passed\n", + name); + else { + printf("dump_array_and_delete: sym_lookup of %s failed\n", + name); + goto out; + } + } else { + printf("dump_array_and_delete: get_argument(0) failed\n"); + goto out; + } + + For testing purposes and to make sure that the C code sees the same +number of elements as the `awk' code, the second step is to get the +count of elements in the array and print it: + + if (! get_element_count(value2.array_cookie, & count)) { + printf("dump_array_and_delete: get_element_count failed\n"); + goto out; + } + + printf("dump_array_and_delete: incoming size is %lu\n", + (unsigned long) count); + + The third step is to actually flatten the array, and then to double +check that the count in the `awk_flat_array_t' is the same as the count +just retrieved: + + if (! 
flatten_array(value2.array_cookie, & flat_array)) { + printf("dump_array_and_delete: could not flatten array\n"); + goto out; + } + + if (flat_array->count != count) { + printf("dump_array_and_delete: flat_array->count (%lu)" + " != count (%lu)\n", + (unsigned long) flat_array->count, + (unsigned long) count); + goto out; + } + + The fourth step is to retrieve the index of the element to be +deleted, which was passed as the second argument. Remember that +argument counts passed to `get_argument()' are zero-based, thus the +second argument is numbered one: + + if (! get_argument(1, AWK_STRING, & value3)) { + printf("dump_array_and_delete: get_argument(1) failed\n"); + goto out; + } + + The fifth step is where the "real work" is done. The function loops +over every element in the array, printing the index and element values. +In addition, upon finding the element with the index that is supposed +to be deleted, the function sets the `AWK_ELEMENT_DELETE' bit in the +`flags' field of the element. When the array is released, `gawk' +traverses the flattened array, and deletes any elements that have this +flag bit set: + + for (i = 0; i < flat_array->count; i++) { + printf("\t%s[\"%.*s\"] = %s\n", + name, + (int) flat_array->elements[i].index.str_value.len, + flat_array->elements[i].index.str_value.str, + valrep2str(& flat_array->elements[i].value)); + + if (strcmp(value3.str_value.str, + flat_array->elements[i].index.str_value.str) + == 0) { + flat_array->elements[i].flags |= AWK_ELEMENT_DELETE; + printf("dump_array_and_delete: marking element \"%s\" " + "for deletion\n", + flat_array->elements[i].index.str_value.str); + } + } + + The sixth step is to release the flattened array. This tells `gawk' +that the extension is no longer using the array, and that it should +delete any elements marked for deletion.
`gawk' also frees any storage +that was allocated, so you should not use the pointer (`flat_array' in +this code) once you have called `release_flattened_array()': + + if (! release_flattened_array(value2.array_cookie, flat_array)) { + printf("dump_array_and_delete: could not release flattened array\n"); + goto out; + } + + Finally, since everything was successful, the function sets the +return value to success, and returns: + + make_number(1.0, result); + out: + return result; + } + + Here is the output from running this part of the test: + + pets has 5 elements + dump_array_and_delete: sym_lookup of pets passed + dump_array_and_delete: incoming size is 5 + pets["1"] = "blacky" + pets["2"] = "rusty" + pets["3"] = "sophie" + dump_array_and_delete: marking element "3" for deletion + pets["4"] = "raincloud" + pets["5"] = "lucky" + dump_array_and_delete(pets) returned 1 + dump_array_and_delete() did remove index "3"! + + +File: gawk.info, Node: Creating Arrays, Prev: Flattening Arrays, Up: Array Manipulation + +16.4.10.4 How To Create and Populate Arrays +........................................... + +Besides working with arrays created by `awk' code, you can create +arrays and populate them as you see fit, and then `awk' code can access +them and manipulate them. + + There are two important points about creating arrays from extension +code: + + 1. You must install a new array into `gawk''s symbol table + immediately upon creating it. Once you have done so, you can then + populate the array. + + Similarly, if installing a new array as a subarray of an existing + array, you must add the new array to its parent before adding any + elements to it. + + Thus, the correct way to build an array is to work "top down." + Create the array, and immediately install it in `gawk''s symbol + table using `sym_update()', or install it as an element in a + previously existing array using `set_array_element()'. Example code is + coming shortly. + 2.
Due to `gawk' internals, after using `sym_update()' to install an + array into `gawk', you have to retrieve the array cookie from the + value passed in to `sym_update()' before doing anything else with + it, like so: + + awk_value_t value; + awk_array_t new_array; + + new_array = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = new_array; + + /* install array in the symbol table */ + sym_update("array", & value); + + new_array = value.array_cookie; /* YOU MUST DO THIS */ + + If installing an array as a subarray, you must also retrieve the + value of the array cookie after the call to `set_array_element()'. + + The following C code is a simple test extension to create an array +with two regular elements and with a subarray. The leading `#include' +directives and boilerplate variable declarations are omitted for +brevity. The first step is to create a new array and then install it +in the symbol table: + + /* create_new_array --- create a named array */ + + static void + create_new_array() + { + awk_array_t a_cookie; + awk_array_t subarray; + awk_value_t index, value; + + a_cookie = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = a_cookie; + + if (! sym_update("new_array", & value)) + printf("create_new_array: sym_update(\"new_array\") failed!\n"); + a_cookie = value.array_cookie; + +Note how `a_cookie' is reset from the `array_cookie' field in the +`value' structure. + + The second step is to install two regular values into `new_array': + + (void) make_const_string("hello", 5, & index); + (void) make_const_string("world", 5, & value); + if (! set_array_element(a_cookie, & index, & value)) { + printf("fill_in_array: set_array_element failed\n"); + return; + } + + (void) make_const_string("answer", 6, & index); + (void) make_number(42.0, & value); + if (! 
set_array_element(a_cookie, & index, & value)) { + printf("fill_in_array: set_array_element failed\n"); + return; + } + + The third step is to create the subarray and install it: + + (void) make_const_string("subarray", 8, & index); + subarray = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = subarray; + if (! set_array_element(a_cookie, & index, & value)) { + printf("fill_in_array: set_array_element failed\n"); + return; + } + subarray = value.array_cookie; + + The final step is to populate the subarray with its own element: + + (void) make_const_string("foo", 3, & index); + (void) make_const_string("bar", 3, & value); + if (! set_array_element(subarray, & index, & value)) { + printf("fill_in_array: set_array_element failed\n"); + return; + } + } + + Here is a sample script that loads the extension and then dumps the +array: + + @load "subarray" + + function dumparray(name, array, i) + { + for (i in array) + if (isarray(array[i])) + dumparray(name "[\"" i "\"]", array[i]) + else + printf("%s[\"%s\"] = %s\n", name, i, array[i]) + } + + BEGIN { + dumparray("new_array", new_array); + } + + Here is the result of running the script: + + $ AWKLIBPATH=$PWD ./gawk -f subarray.awk + -| new_array["subarray"]["foo"] = bar + -| new_array["hello"] = world + -| new_array["answer"] = 42 + +(*Note Finding Extensions::, for more information on the `AWKLIBPATH' +environment variable.) + + +File: gawk.info, Node: Extension API Variables, Next: Extension API Boilerplate, Prev: Array Manipulation, Up: Extension API Description + +16.4.11 API Variables +--------------------- + +The API provides two sets of variables. The first provides information +about the version of the API (both with which the extension was +compiled, and with which `gawk' was compiled). The second provides +information about how `gawk' was invoked. + +* Menu: + +* Extension Versioning:: API Version information.
+* Extension API Informational Variables:: Variables providing information about + `gawk''s invocation. + + +File: gawk.info, Node: Extension Versioning, Next: Extension API Informational Variables, Up: Extension API Variables + +16.4.11.1 API Version Constants and Variables +............................................. + +The API provides both a "major" and a "minor" version number. The API +versions are available at compile time as constants: + +`GAWK_API_MAJOR_VERSION' + The major version of the API. + +`GAWK_API_MINOR_VERSION' + The minor version of the API. + + The minor version increases when new functions are added to the API. +Such new functions are always added to the end of the API `struct'. + + The major version increases (and the minor version is reset to zero) +if any of the data types change size or member order, or if any of the +existing functions change signature. + + It could happen that an extension may be compiled against one version +of the API but loaded by a version of `gawk' using a different version. +For this reason, the major and minor API versions of the running `gawk' +are included in the API `struct' as read-only constant integers: + +`api->major_version' + The major version of the running `gawk'. + +`api->minor_version' + The minor version of the running `gawk'. + + It is up to the extension to decide if there are API +incompatibilities. Typically a check like this is enough: -Two useful functions that are not in `awk' are `chdir()' (so that an + if (api->major_version != GAWK_API_MAJOR_VERSION + || api->minor_version < GAWK_API_MINOR_VERSION) { + fprintf(stderr, "foo_extension: version mismatch with gawk!\n"); + fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n", + GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION, + api->major_version, api->minor_version); + exit(1); + } + + Such code is included in the boilerplate `dl_load_func()' macro +provided in `gawkapi.h' (discussed later, in *note Extension API +Boilerplate::). 
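The compatibility rule in the check above can be pulled out into a small predicate and tested without `gawk'. This is a sketch: `api_compatible()' is a hypothetical helper, not part of `gawkapi.h'. It encodes the two rules stated in this node. The major versions must match exactly, and the running `gawk' must offer at least the minor version the extension was compiled against, because new functions are only ever appended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the version rule from the check above. */
static bool api_compatible(int ext_major, int ext_minor,
                           int gawk_major, int gawk_minor)
{
    /* A major version change means data types or function
     * signatures changed, so the numbers must match exactly. */
    if (gawk_major != ext_major)
        return false;
    /* A newer minor version only appends functions, so a gawk that is
     * at least as new as the extension's headers is acceptable. */
    return gawk_minor >= ext_minor;
}
```

Inside an extension, this would be called as `api_compatible(GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION, api->major_version, api->minor_version)'. As noted above, the `dl_load_func()' boilerplate already performs an equivalent check.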
+ + +File: gawk.info, Node: Extension API Informational Variables, Prev: Extension Versioning, Up: Extension API Variables + +16.4.11.2 Informational Variables +................................. + +The API provides access to several variables that describe whether the +corresponding command-line options were enabled when `gawk' was +invoked. The variables are: + +`do_lint' + This variable is true if `gawk' was invoked with the `--lint' option + (*note Options::). + +`do_traditional' + This variable is true if `gawk' was invoked with the `--traditional' + option. + +`do_profile' + This variable is true if `gawk' was invoked with the `--profile' + option. + +`do_sandbox' + This variable is true if `gawk' was invoked with the `--sandbox' + option. + +`do_debug' + This variable is true if `gawk' was invoked with the `--debug' option. + +`do_mpfr' + This variable is true if `gawk' was invoked with the `--bignum' option. + + The value of `do_lint' can change if `awk' code modifies the `LINT' +built-in variable (*note Built-in Variables::). The others should not +change during execution. + + +File: gawk.info, Node: Extension API Boilerplate, Next: Finding Extensions, Prev: Extension API Variables, Up: Extension API Description + +16.4.12 Boilerplate Code +------------------------ + +As mentioned earlier (*note Extension Mechanism Outline::), the function +definitions as presented are really macros. To use these macros, your +extension must provide a small amount of boilerplate code (variables and +functions) towards the top of your source file, using pre-defined names +as described below. The boilerplate needed is also provided in comments +in the `gawkapi.h' header file: + + /* Boiler plate code: */ + int plugin_is_GPL_compatible; + + static gawk_api_t *const api; + static awk_ext_id_t ext_id; + static const char *ext_version = NULL; /* or ... = "some string" */ + + static awk_ext_func_t func_table[] = { + { "name", do_name, 1 }, + /* ... 
*/ + }; + + /* EITHER: */ + + static awk_bool_t (*init_func)(void) = NULL; + + /* OR: */ + + static awk_bool_t + init_my_module(void) + { + ... + } + + static awk_bool_t (*init_func)(void) = init_my_module; + + dl_load_func(func_table, some_name, "name_space_in_quotes") + + These variables and functions are as follows: + +`int plugin_is_GPL_compatible;' + This asserts that the extension is compatible with the GNU GPL + (*note Copying::). If your extension does not have this, `gawk' + will not load it (*note Plugin License::). + +`static gawk_api_t *const api;' + This global `static' variable should be set to point to the + `gawk_api_t' pointer that `gawk' passes to your `dl_load()' + function. This variable is used by all of the macros. + +`static awk_ext_id_t ext_id;' + This global static variable should be set to the `awk_ext_id_t' + value that `gawk' passes to your `dl_load()' function. This + variable is used by all of the macros. + +`static const char *ext_version = NULL; /* or ... = "some string" */' + This global `static' variable should be set either to `NULL', or + to point to a string giving the name and version of your extension. + +`static awk_ext_func_t func_table[] = { ... };' + This is an array of one or more `awk_ext_func_t' structures as + described earlier (*note Extension Functions::). It can then be + looped over for multiple calls to `add_ext_func()'. + +`static awk_bool_t (*init_func)(void) = NULL;' +` OR' +`static awk_bool_t init_my_module(void) { ... }' +`static awk_bool_t (*init_func)(void) = init_my_module;' + If you need to do some initialization work, you should define a + function that does it (creates variables, opens files, etc.) and + then define the `init_func' pointer to point to your function. + The function should return zero (false) upon failure, non-zero + (success) if everything goes well. + + If you don't need to do any initialization, define the pointer and + initialize it to `NULL'. 
+ +`dl_load_func(func_table, some_name, "name_space_in_quotes")' + This macro expands to a `dl_load()' function that performs all the + necessary initializations. + + The point of all the variables and arrays is to let the +`dl_load()' function (from the `dl_load_func()' macro) do all the +standard work. It does the following: + + 1. Check the API versions. If the extension major version does not + match `gawk''s, or if the extension minor version is greater than + `gawk''s, it prints a fatal error message and exits. + + 2. Load the functions defined in `func_table'. If any of them fails + to load, it prints a warning message but continues on. + + 3. If the `init_func' pointer is not `NULL', call the function it + points to. If it returns false (zero), print a warning message. + + 4. If `ext_version' is not `NULL', register the version string with + `gawk'. + + +File: gawk.info, Node: Finding Extensions, Prev: Extension API Boilerplate, Up: Extension API Description + +16.4.13 How `gawk' Finds Extensions +----------------------------------- + +Compiled extensions have to be installed in a directory where `gawk' +can find them. If `gawk' is configured and built in the default +fashion, the directory in which to find extensions is +`/usr/local/lib/gawk'. You can also specify a search path with a list +of directories to search for compiled extensions. *Note AWKLIBPATH +Variable::, for more information. + + +File: gawk.info, Node: Extension Example, Next: Extension Samples, Prev: Extension API Description, Up: Dynamic Extensions + +16.5 Example: Some File Functions +================================= + + No matter where you go, there you are. + Buckaroo Banzai + + Two useful functions that are not in `awk' are `chdir()' (so that an `awk' program can change its directory) and `stat()' (so that an `awk' program can gather information about a file). This minor node -implements these functions for `gawk' in an external extension library.
+implements these functions for `gawk' in an extension. * Menu: @@ -21233,9 +23575,9 @@ implements these functions for `gawk' in an external extension library. * Using Internal File Ops:: How to use an external extension. -File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Sample Library +File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Extension Example -16.2.1 Using `chdir()' and `stat()' +16.5.1 Using `chdir()' and `stat()' ----------------------------------- This minor node shows how to use the new functions at the `awk' level @@ -21243,6 +23585,7 @@ once they've been integrated into the running `gawk' interpreter. Using `chdir()' is very straightforward. It takes one argument, the new directory to change to: + @load "filefuncs" ... newdir = "/home/arnold/funstuff" ret = chdir(newdir) @@ -21253,7 +23596,7 @@ directory to change to: } ... - The return value is negative if the `chdir' failed, and `ERRNO' + The return value is negative if the `chdir()' failed, and `ERRNO' (*note Built-in Variables::) is set to a string indicating the error. Using `stat()' is a bit more complicated. The C `stat()' function @@ -21262,7 +23605,6 @@ way to model this in `awk' is to fill in an associative array with the appropriate information: file = "/home/arnold/.profile" - fdata[1] = "x" # force `fdata' to be an array ret = stat(file, fdata) if (ret < 0) { printf("could not stat %s: %s\n", @@ -21304,11 +23646,11 @@ appropriate information: `"ctime"' The file's last access, modification, and inode update times, respectively. These are numeric timestamps, suitable for - formatting with `strftime()' (*note Built-in::). + formatting with `strftime()' (*note Time Functions::). `"pmode"' The file's "printable mode." This is a string representation of - the file's type and permissions, such as what is produced by `ls + the file's type and permissions, such as is produced by `ls -l'--for example, `"drwxr-xr-x"'. 
`"type"' @@ -21356,57 +23698,87 @@ Elements::): components of that number, respectively. -File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Sample Library +File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Extension Example -16.2.2 C Code for `chdir()' and `stat()' +16.5.2 C Code for `chdir()' and `stat()' ---------------------------------------- -Here is the C code for these extensions. They were written for -GNU/Linux. The code needs some more work for complete portability to -other POSIX-compliant systems:(1) +Here is the C code for these extensions.(1) - #include "awk.h" + The file includes a number of standard header files, and then +includes the `gawkapi.h' header file which provides the API definitions. +Those are followed by the necessary variable declarations to make use +of the API macros and boilerplate code (*note Extension API +Boilerplate::). - #include <sys/sysmacros.h> + #ifdef HAVE_CONFIG_H + #include <config.h> + #endif + + #include <stdio.h> + #include <assert.h> + #include <errno.h> + #include <stdlib.h> + #include <string.h> + #include <unistd.h> + + #include <sys/types.h> + #include <sys/stat.h> + + #include "gawkapi.h" + + #include "gettext.h" + #define _(msgid) gettext(msgid) + #define N_(msgid) msgid + + #include "gawkfts.h" + #include "stack.h" + + static const gawk_api_t *api; /* for convenience macros to work */ + static awk_ext_id_t *ext_id; + static awk_bool_t init_filefuncs(void); + static awk_bool_t (*init_func)(void) = init_filefuncs; + static const char *ext_version = "filefuncs extension: version 1.0"; int plugin_is_GPL_compatible; + By convention, for an `awk' function `foo()', the C function that +implements it is called `do_foo()'. The function should have two +arguments: the first is an `int' usually called `nargs', that +represents the number of actual arguments for the function. 
The second +is a pointer to an `awk_value_t', usually named `result'. + /* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ - static NODE * - do_chdir(int nargs) + static awk_value_t * + do_chdir(int nargs, awk_value_t *result) { - NODE *newdir; + awk_value_t newdir; int ret = -1; - if (do_lint && nargs != 1) - lintwarn("chdir: called with incorrect number of arguments"); - - newdir = get_scalar_argument(0, FALSE); + assert(result != NULL); - The file includes the `"awk.h"' header file for definitions for the -`gawk' internals. It includes `<sys/sysmacros.h>' for access to the -`major()' and `minor'() macros. + if (do_lint && nargs != 1) + lintwarn(ext_id, + _("chdir: called with incorrect number of arguments, " + "expecting 1")); - By convention, for an `awk' function `foo', the function that -implements it is called `do_foo'. The function should take a `int' -argument, usually called `nargs', that represents the number of defined -arguments for the function. The `newdir' variable represents the new -directory to change to, retrieved with `get_scalar_argument()'. Note -that the first argument is numbered zero. + The `newdir' variable represents the new directory to change to, +retrieved with `get_argument()'. Note that the first argument is +numbered zero. - This code actually accomplishes the `chdir()'. It first forces the -argument to be a string and passes the string value to the `chdir()' -system call. If the `chdir()' fails, `ERRNO' is updated. + If the argument is retrieved successfully, the function calls the +`chdir()' system call. If the `chdir()' fails, `ERRNO' is updated. 
- (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO_int(errno); + if (get_argument(0, AWK_STRING, & newdir)) { + ret = chdir(newdir.str_value.str); + if (ret < 0) + update_ERRNO_int(errno); + } Finally, the function returns the return value to the `awk' level: - return make_number((AWKNUM) ret); + return make_number(ret, result); } The `stat()' built-in is more involved. First comes a function that @@ -21421,71 +23793,239 @@ turns a numeric mode into a printable representation (e.g., 644 becomes ... } - Next comes the `do_stat()' function. It starts with variable + Next comes a function for reading symbolic links, which is also +omitted here for brevity: + + /* read_symlink --- read a symbolic link into an allocated buffer. + ... */ + + static char * + read_symlink(const char *fname, size_t bufsize, ssize_t *linksize) + { + ... + } + + Two helper functions simplify entering values in the array that will +contain the result of the `stat()': + + /* array_set --- set an array element */ + + static void + array_set(awk_array_t array, const char *sub, awk_value_t *value) + { + awk_value_t index; + + set_array_element(array, + make_const_string(sub, strlen(sub), & index), + value); + + } + + /* array_set_numeric --- set an array element with a number */ + + static void + array_set_numeric(awk_array_t array, const char *sub, double num) + { + awk_value_t tmp; + + array_set(array, sub, make_number(num, & tmp)); + } + + The following function does most of the work to fill in the +`awk_array_t' result array with values obtained from a valid `struct +stat'. It is done in a separate function to support the `stat()' +function for `gawk' and also to support the `fts()' extension which is +included in the same file but whose code is not shown here (*note +Extension Sample File Functions::). 
+ + The first part of the function is variable declarations, including a +table to map file types to strings: + + /* fill_stat_array --- do the work to fill an array with stat info */ + + static int + fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf) + { + char *pmode; /* printable mode */ + const char *type = "unknown"; + awk_value_t tmp; + static struct ftype_map { + unsigned int mask; + const char *type; + } ftype_map[] = { + { S_IFREG, "file" }, + { S_IFBLK, "blockdev" }, + { S_IFCHR, "chardev" }, + { S_IFDIR, "directory" }, + #ifdef S_IFSOCK + { S_IFSOCK, "socket" }, + #endif + #ifdef S_IFIFO + { S_IFIFO, "fifo" }, + #endif + #ifdef S_IFLNK + { S_IFLNK, "symlink" }, + #endif + #ifdef S_IFDOOR /* Solaris weirdness */ + { S_IFDOOR, "door" }, + #endif /* S_IFDOOR */ + }; + int j, k; + + The destination array is cleared, and then code fills in various +elements based on values in the `struct stat': + + /* empty out the array */ + clear_array(array); + + /* fill in the array */ + array_set(array, "name", make_const_string(name, strlen(name), + & tmp)); + array_set_numeric(array, "dev", sbuf->st_dev); + array_set_numeric(array, "ino", sbuf->st_ino); + array_set_numeric(array, "mode", sbuf->st_mode); + array_set_numeric(array, "nlink", sbuf->st_nlink); + array_set_numeric(array, "uid", sbuf->st_uid); + array_set_numeric(array, "gid", sbuf->st_gid); + array_set_numeric(array, "size", sbuf->st_size); + array_set_numeric(array, "blocks", sbuf->st_blocks); + array_set_numeric(array, "atime", sbuf->st_atime); + array_set_numeric(array, "mtime", sbuf->st_mtime); + array_set_numeric(array, "ctime", sbuf->st_ctime); + + /* for block and character devices, add rdev, + major and minor numbers */ + if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) { + array_set_numeric(array, "rdev", sbuf->st_rdev); + array_set_numeric(array, "major", major(sbuf->st_rdev)); + array_set_numeric(array, "minor", minor(sbuf->st_rdev)); + } + +The latter part of the 
function makes selective additions to the +destination array, depending upon the availability of certain members +and/or the type of the file. It then returns zero, for success: + + #ifdef HAVE_ST_BLKSIZE + array_set_numeric(array, "blksize", sbuf->st_blksize); + #endif /* HAVE_ST_BLKSIZE */ + + pmode = format_mode(sbuf->st_mode); + array_set(array, "pmode", make_const_string(pmode, strlen(pmode), + & tmp)); + + /* for symbolic links, add a linkval field */ + if (S_ISLNK(sbuf->st_mode)) { + char *buf; + ssize_t linksize; + + if ((buf = read_symlink(name, sbuf->st_size, + & linksize)) != NULL) + array_set(array, "linkval", + make_malloced_string(buf, linksize, & tmp)); + else + warning(ext_id, _("stat: unable to read symbolic link `%s'"), + name); + } + + /* add a type field */ + type = "unknown"; /* shouldn't happen */ + for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) { + if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) { + type = ftype_map[j].type; + break; + } + } + + array_set(array, "type", make_const_string(type, strlen(type), &tmp)); + + return 0; + } + + Finally, here is the `do_stat()' function. It starts with variable declarations and argument checking: /* do_stat --- provide a stat() function for gawk */ - static NODE * - do_stat(int nargs) + static awk_value_t * + do_stat(int nargs, awk_value_t *result) { - NODE *file, *array, *tmp; - struct stat sbuf; + awk_value_t file_param, array_param; + char *name; + awk_array_t array; int ret; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; + struct stat sbuf; + + assert(result != NULL); - if (do_lint && nargs > 2) - lintwarn("stat: called with too many arguments"); + if (do_lint && nargs != 2) { + lintwarn(ext_id, + _("stat: called with wrong number of arguments")); + return make_number(-1, result); + } Then comes the actual work. First, the function gets the arguments. -Then, it always clears the array. 
The code use `lstat()' (instead of -`stat()') to get the file information, in case the file is a symbolic -link. If there's an error, it sets `ERRNO' and returns: +Next, it gets the information for the file. The code use `lstat()' +(instead of `stat()') to get the file information, in case the file is +a symbolic link. If there's an error, it sets `ERRNO' and returns: /* file is first arg, array to hold results is second */ - file = get_scalar_argument(0, FALSE); - array = get_array_argument(1, FALSE); + if ( ! get_argument(0, AWK_STRING, & file_param) + || ! get_argument(1, AWK_ARRAY, & array_param)) { + warning(ext_id, _("stat: bad parameters")); + return make_number(-1, result); + } - /* empty out the array */ - assoc_clear(array); + name = file_param.str_value.str; + array = array_param.array_cookie; + + /* always empty out the array */ + clear_array(array); /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); + ret = lstat(name, & sbuf); if (ret < 0) { update_ERRNO_int(errno); - return make_number((AWKNUM) ret); + return make_number(ret, result); } - Now comes the tedious part: filling in the array. Only a few of the -calls are shown here, since they all follow the same pattern: + The tedious work is done by `fill_stat_array()', shown earlier. +When done, return the result from `fill_stat_array()': - /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4)); - *aptr = dupnode(file); - unref(tmp); + ret = fill_stat_array(name, array, & sbuf); - aptr = assoc_lookup(array, tmp = make_string("mode", 4)); - *aptr = make_number((AWKNUM) sbuf.st_mode); - unref(tmp); + return make_number(ret, result); + } - aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); - unref(tmp); + Finally, it's necessary to provide the "glue" that loads the new +function(s) into `gawk'. 
- When done, return the `lstat()' return value: + The `filefuncs' extension also provides an `fts()' function, which +we omit here. For its sake there is an initialization function: + /* init_filefuncs --- initialization routine */ - return make_number((AWKNUM) ret); + static awk_bool_t + init_filefuncs(void) + { + ... } - Finally, it's necessary to provide the "glue" that loads the new -function(s) into `gawk'. By convention, each library has a routine -named `dl_load()' that does the job. The simplest way is to use the -`dl_load_func' macro in `gawkapi.h'. + We are almost done. We need an array of `awk_ext_func_t' structures +for loading each function into `gawk': + + static awk_ext_func_t func_table[] = { + { "chdir", do_chdir, 1 }, + { "stat", do_stat, 2 }, + { "fts", do_fts, 3 }, + }; + + Each extension must have a routine named `dl_load()' to load +everything that needs to be loaded. It is simplest to use the +`dl_load_func()' macro in `gawkapi.h': + + /* define the dl_load() function using the boilerplate macro */ + + dl_load_func(func_table, filefuncs, "") And that's it! As an exercise, consider adding functions to implement system calls such as `chown()', `chmod()', and `umask()'. @@ -21497,34 +24037,33 @@ implement system calls such as `chown()', `chmod()', and `umask()'. version. -File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Sample Library +File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Extension Example -16.2.3 Integrating the Extensions +16.5.3 Integrating The Extensions --------------------------------- Now that the code is written, it must be possible to add it at runtime to the running `gawk' interpreter. First, the code must be compiled. 
Assuming that the functions are in a file named `filefuncs.c', and IDIR -is the location of the `gawk' include files, the following steps create -a GNU/Linux shared library: +is the location of the `gawkapi.h' header file, the following steps(1) +create a GNU/Linux shared library: $ gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -IIDIR filefuncs.c - $ ld -o filefuncs.so -shared filefuncs.o + $ ld -o filefuncs.so -shared filefuncs.o -lc - Once the library exists, it is loaded by calling the `extension()' -built-in function. This function takes two arguments: the name of the -library to load and the name of a function to call when the library is -first loaded. This function adds the new functions to `gawk'. It -returns the value returned by the initialization function within the -shared library: + Once the library exists, it is loaded by using the `@load' keyword. # file testff.awk + @load "filefuncs" + BEGIN { - extension("./filefuncs.so", "dl_load") + "pwd" | getline curdir # save current directory + close("pwd") - chdir(".") # no-op + chdir("/tmp") + system("pwd") # test it + chdir(curdir) # go back - data[1] = 1 # force `data' to be an array print "Info for testff.awk" ret = stat("testff.awk", data) print "ret =", ret @@ -21541,32 +24080,642 @@ shared library: print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) } - Here are the results of running the program: + The `AWKLIBPATH' environment variable tells `gawk' where to find +shared libraries (*note Finding Extensions::). 
We set it to the +current directory and run the program: - $ gawk -f testff.awk + $ AWKLIBPATH=$PWD gawk -f testff.awk + -| /tmp -| Info for testff.awk -| ret = 0 - -| data["size"] = 607 - -| data["ino"] = 14945891 - -| data["name"] = testff.awk - -| data["pmode"] = -rw-rw-r-- - -| data["nlink"] = 1 - -| data["atime"] = 1293993369 - -| data["mtime"] = 1288520752 - -| data["mode"] = 33204 -| data["blksize"] = 4096 - -| data["dev"] = 2054 + -| data["mtime"] = 1350838628 + -| data["mode"] = 33204 -| data["type"] = file - -| data["gid"] = 500 - -| data["uid"] = 500 + -| data["dev"] = 2053 + -| data["gid"] = 1000 + -| data["ino"] = 1719496 + -| data["ctime"] = 1350838628 -| data["blocks"] = 8 - -| data["ctime"] = 1290113572 - -| testff.awk modified: 10 31 10 12:25:52 + -| data["nlink"] = 1 + -| data["name"] = testff.awk + -| data["atime"] = 1350838632 + -| data["pmode"] = -rw-rw-r-- + -| data["size"] = 662 + -| data["uid"] = 1000 + -| testff.awk modified: 10 21 12 18:57:08 -| -| Info for JUNK -| ret = -1 -| JUNK modified: 01 01 70 02:00:00 + ---------- Footnotes ---------- + + (1) In practice, you would probably want to use the GNU +Autotools--Automake, Autoconf, Libtool, and Gettext--to configure and +build your libraries. Instructions for doing so are beyond the scope of +this Info file. *Note gawkextlib::, for WWW links to the tools. + + +File: gawk.info, Node: Extension Samples, Next: gawkextlib, Prev: Extension Example, Up: Dynamic Extensions + +16.6 The Sample Extensions In The `gawk' Distribution +===================================================== + +This minor node provides brief overviews of the sample extensions that +come in the `gawk' distribution. Some of them are intended for +production use, such as the `filefuncs' and `readdir' extensions. Others +mainly provide example code that shows how to use the extension API. + +* Menu: + +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to `fnmatch()'.
+* Extension Sample Fork:: An interface to `fork()' and other + process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to `readdir()'. +* Extension Sample Revout:: Reversing output sample output wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to `gettimeofday()' + and `sleep()'. + + +File: gawk.info, Node: Extension Sample File Functions, Next: Extension Sample Fnmatch, Up: Extension Samples + +16.6.1 File Related Functions +----------------------------- + +The `filefuncs' extension provides three different functions. The +usage is: + +`@load "filefuncs"' + This is how you load the extension. + +`result = chdir("/some/directory")' + The `chdir()' function is a direct hook to the `chdir()' system + call to change the current directory. It returns zero upon + success or less than zero upon error. In the latter case it + updates `ERRNO'. + +`result = stat("/some/path", statdata)' + The `stat()' function provides a hook into the `stat()' system + call. In fact, it uses `lstat()'. It returns zero upon success or + less than zero upon error. In the latter case it updates `ERRNO'. + + In all cases, it clears the `statdata' array. When the call is + successful, `stat()' fills the `statdata' array with information + retrieved from the filesystem, as follows: + + `statdata["name"]' The name of the file. + `statdata["dev"]' Corresponds to the `st_dev' field in + the `struct stat'. + `statdata["ino"]' Corresponds to the `st_ino' field in + the `struct stat'. + `statdata["mode"]' Corresponds to the `st_mode' field in + the `struct stat'. + `statdata["nlink"]' Corresponds to the `st_nlink' field in + the `struct stat'.
+ `statdata["uid"]' Corresponds to the `st_uid' field in + the `struct stat'. + `statdata["gid"]' Corresponds to the `st_gid' field in + the `struct stat'. + `statdata["size"]' Corresponds to the `st_size' field in + the `struct stat'. + `statdata["atime"]' Corresponds to the `st_atime' field in + the `struct stat'. + `statdata["mtime"]' Corresponds to the `st_mtime' field in + the `struct stat'. + `statdata["ctime"]' Corresponds to the `st_ctime' field in + the `struct stat'. + `statdata["rdev"]' Corresponds to the `st_rdev' field in + the `struct stat'. This element is + only present for device files. + `statdata["major"]' The major device number of the + `st_rdev' field in the `struct stat'. + This element is only present for + device files. + `statdata["minor"]' The minor device number of the + `st_rdev' field in the `struct stat'. + This element is only present for + device files. + `statdata["blksize"]' Corresponds to the `st_blksize' field + in the `struct stat', if this field is + present on your system. (It is present + on all modern systems that we know of.) + `statdata["pmode"]' A human-readable version of the mode + value, such as printed by `ls'. For + example, `"-rwxr-xr-x"'. + `statdata["linkval"]' If the named file is a symbolic link, + this element will exist and its value + is the value of the symbolic link + (where the symbolic link points to). + `statdata["type"]' The type of the file as a string. One + of `"file"', `"blockdev"', `"chardev"', + `"directory"', `"socket"', `"fifo"', + `"symlink"', `"door"', or `"unknown"'. + Not all systems support all file types. + +`flags = or(FTS_PHYSICAL, ...)' +`result = fts(pathlist, flags, filedata)' + Walk the file trees provided in `pathlist' and fill in the + `filedata' array as described below. `flags' is the bitwise OR of + several predefined constant values, also as described below. + Return zero if there were no errors, otherwise return -1.
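The `"rdev"', `"major"' and `"minor"' elements all derive from the single `st_rdev' value that `lstat()' returns for a device file. Here is a hypothetical C helper that performs the same split. `device_numbers()' is an illustrative name, not part of the extension. The sketch assumes GNU/Linux, where `major()' and `minor()' are declared in `<sys/sysmacros.h>'; other systems may declare them elsewhere.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() on GNU/Linux */

/* device_numbers --- hypothetical helper mirroring the "rdev", "major"
   and "minor" elements.  Returns 1 and fills the outputs only for
   block or character devices, 0 for any other file type, and -1 if
   lstat() fails (errno then says why). */
static int
device_numbers(const char *path, unsigned long *rdev,
	       unsigned int *maj, unsigned int *mnr)
{
	struct stat sbuf;

	if (lstat(path, & sbuf) < 0)
		return -1;

	/* like fill_stat_array(), only devices get these elements */
	if (! S_ISBLK(sbuf.st_mode) && ! S_ISCHR(sbuf.st_mode))
		return 0;

	*rdev = (unsigned long) sbuf.st_rdev;
	*maj = major(sbuf.st_rdev);
	*mnr = minor(sbuf.st_rdev);
	return 1;
}
```

Calling it on `/dev/null', a character device, returns 1. Calling it on a directory such as `/' returns 0 and leaves the outputs untouched, just as `stat()' omits those three elements for non-device files.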
+ + The `fts()' function provides a hook to the C library `fts()' +routines for traversing file hierarchies. Instead of returning data +about one file at a time in a stream, it fills in a multi-dimensional +array with data about each file and directory encountered in the +requested hierarchies. + + The arguments are as follows: + +`pathlist' + An array of filenames. The element values are used; the index + values are ignored. + +`flags' + This should be the bitwise OR of one or more of the following + predefined constant flag values. At least one of `FTS_LOGICAL' or + `FTS_PHYSICAL' must be provided; otherwise `fts()' returns an + error value and sets `ERRNO'. The flags are: + + `FTS_LOGICAL' + Do a "logical" file traversal, where the information returned + for a symbolic link refers to the linked-to file, and not to + the symbolic link itself. This flag is mutually exclusive + with `FTS_PHYSICAL'. + + `FTS_PHYSICAL' + Do a "physical" file traversal, where the information + returned for a symbolic link refers to the symbolic link + itself. This flag is mutually exclusive with `FTS_LOGICAL'. + + `FTS_NOCHDIR' + As a performance optimization, the C library `fts()' routines + change directory as they traverse a file hierarchy. This + flag disables that optimization. + + `FTS_COMFOLLOW' + Immediately follow a symbolic link named in `pathlist', + whether or not `FTS_LOGICAL' is set. + + `FTS_SEEDOT' + By default, the `fts()' routines do not return entries for `.' + and `..'. This option causes entries for `..' to also be + included. (The extension always includes an entry for `.', + see below.) + + `FTS_XDEV' + During a traversal, do not cross onto a different mounted + filesystem. + +`filedata' + The `filedata' array is first cleared. Then, `fts()' creates an + element in `filedata' for every element in `pathlist'. The index + is the name of the directory or file given in `pathlist'. The + element for this index is itself an array. There are two cases. 
+ + _The path is a file._ + In this case, the array contains two or three elements: + + `"path"' + The full path to this file, starting from the "root" + that was given in the `pathlist' array. + + `"stat"' + This element is itself an array, containing the same + information as provided by the `stat()' function + described earlier for its `statdata' argument. The + element may not be present if the `stat()' system call + for the file failed. + + `"error"' + If some kind of error was encountered, the array will + also contain an element named `"error"', which is a + string describing the error. + + _The path is a directory._ + In this case, the array contains one element for each entry + in the directory. If an entry is a file, that element is as + for files, just described. If the entry is a directory, that + element is, recursively, an array describing the + subdirectory. If `FTS_SEEDOT' was provided in the flags, + then there will also be an element named `".."'. This + element will be an array containing the data as provided by + `stat()'. + + In addition, there will be an element whose index is `"."'. + This element is an array containing the same two or three + elements as for a file: `"path"', `"stat"', and `"error"'. + + The `fts()' function returns zero if there were no errors. +Otherwise it returns -1. + + NOTE: The `fts()' extension does not exactly mimic the interface + of the C library `fts()' routines, choosing instead to provide an + interface that is based on associative arrays, which should be + more comfortable to use from an `awk' program. This includes the + lack of a comparison function, since `gawk' already provides + powerful array sorting facilities. While an `fts_read()'-like + interface could have been provided, this felt less natural than + simply creating a multi-dimensional array to represent the file + hierarchy and its information. + + See `test/fts.awk' in the `gawk' distribution for an example.
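For comparison with the array-based interface, here is roughly what a direct walk with the C library `fts()' routines looks like. In this style, entries are returned one at a time by `fts_read()' instead of being collected into an array. This is a sketch, not code from the extension. `count_tree()' is a hypothetical name, and the sketch assumes a C library that provides `<fts.h>', as the GNU C library and the BSDs do.

```c
#include <stddef.h>
#include <fts.h>

/* count_tree --- walk one hierarchy with the C library fts() routines,
   counting regular files and directories.  Returns 0 on success, -1 if
   fts_open() fails or any entry reports an error. */
static int
count_tree(const char *root, int *nfiles, int *ndirs)
{
	char *paths[2];
	FTS *fts;
	FTSENT *ent;
	int status = 0;

	/* fts_open() takes a NULL-terminated array of starting paths */
	paths[0] = (char *) root;
	paths[1] = NULL;
	*nfiles = *ndirs = 0;

	/* FTS_PHYSICAL: report symbolic links themselves;
	   FTS_NOCHDIR: do not change directory while walking */
	fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
	if (fts == NULL)
		return -1;

	while ((ent = fts_read(fts)) != NULL) {
		switch (ent->fts_info) {
		case FTS_D:     /* directory, first (preorder) visit */
			(*ndirs)++;
			break;
		case FTS_F:     /* regular file */
			(*nfiles)++;
			break;
		case FTS_DNR:   /* directory that cannot be read */
		case FTS_ERR:   /* other error */
		case FTS_NS:    /* stat() failed */
			status = -1;
			break;
		default:        /* FTS_DP postorder visits, symlinks, etc. */
			break;
		}
	}
	fts_close(fts);
	return status;
}
```

Each directory is reported twice, once as `FTS_D' on the way down and once as `FTS_DP' on the way back up, so the sketch counts only the first visit. The `awk'-level `fts()' hides that detail by building one nested array per directory.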
+ + +File: gawk.info, Node: Extension Sample Fnmatch, Next: Extension Sample Fork, Prev: Extension Sample File Functions, Up: Extension Samples + +16.6.2 Interface To `fnmatch()' +------------------------------- + +This extension provides an interface to the C library `fnmatch()' +function. The usage is: + + @load "fnmatch" + + result = fnmatch(pattern, string, flags) + + The `fnmatch' extension adds a single function named `fnmatch()', +one constant (`FNM_NOMATCH'), and an array of flag values named `FNM'. + + The arguments to `fnmatch()' are: + +`pattern' + The filename wildcard to match. + +`string' + The filename string. + +`flags' + Either zero, or the bitwise OR of one or more of the flags in the + `FNM' array. + + The return value is zero on success, `FNM_NOMATCH' if the string did +not match the pattern, or a different non-zero value if an error +occurred. + + The flags are as follows: + +`FNM["CASEFOLD"]' Corresponds to the `FNM_CASEFOLD' flag as defined in + `fnmatch()'. +`FNM["FILE_NAME"]' Corresponds to the `FNM_FILE_NAME' flag as defined + in `fnmatch()'. +`FNM["LEADING_DIR"]' Corresponds to the `FNM_LEADING_DIR' flag as defined + in `fnmatch()'. +`FNM["NOESCAPE"]' Corresponds to the `FNM_NOESCAPE' flag as defined in + `fnmatch()'. +`FNM["PATHNAME"]' Corresponds to the `FNM_PATHNAME' flag as defined in + `fnmatch()'. +`FNM["PERIOD"]' Corresponds to the `FNM_PERIOD' flag as defined in + `fnmatch()'. + + Here is an example: + + @load "fnmatch" + ... + flags = or(FNM["PERIOD"], FNM["NOESCAPE"]) + if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH) + print "no match" + + +File: gawk.info, Node: Extension Sample Fork, Next: Extension Sample Ord, Prev: Extension Sample Fnmatch, Up: Extension Samples + +16.6.3 Interface To `fork()', `wait()' and `waitpid()' +------------------------------------------------------ + +The `fork' extension adds three functions, as follows. + +`@load "fork"' + This is how you load the extension.
+ +`pid = fork()' + This function creates a new process. The return value is zero + in the child and the process-id number of the child in the parent, + or -1 upon error. In the latter case, `ERRNO' indicates the + problem. In the child, `PROCINFO["pid"]' and `PROCINFO["ppid"]' + are updated to reflect the correct values. + +`ret = waitpid(pid)' + This function takes a numeric argument, which is the process-id to + wait for. The return value is that of the `waitpid()' system call. + +`ret = wait()' + This function waits for the first child to die. The return value + is that of the `wait()' system call. + + There is no corresponding `exec()' function. + + Here is an example: + + @load "fork" + ... + if ((pid = fork()) == 0) + print "hello from the child" + else + print "hello from the parent" + + +File: gawk.info, Node: Extension Sample Ord, Next: Extension Sample Readdir, Prev: Extension Sample Fork, Up: Extension Samples + +16.6.4 Character and Numeric values: `ord()' and `chr()' +-------------------------------------------------------- + +The `ordchr' extension adds two functions, named `ord()' and `chr()', +as follows. + +`number = ord(string)' + Return the numeric value of the first character in `string'. + +`char = chr(number)' + Return the string whose first character is that represented by + `number'. + + These functions are inspired by the Pascal language functions of the +same name. Here is an example: + + @load "ordchr" + ... + printf("The numeric value of 'A' is %d\n", ord("A")) + printf("The string value of 65 is %s\n", chr(65)) + + +File: gawk.info, Node: Extension Sample Readdir, Next: Extension Sample Revout, Prev: Extension Sample Ord, Up: Extension Samples + +16.6.5 Reading Directories +-------------------------- + +The `readdir' extension adds an input parser for directories, and adds +a single function named `readdir_do_ftype()'. The usage is as follows:
The usage is as follows: + + @load "readdir" + + readdir_do_ftype("stat") # or "dirent" or "never" + + When this extension is in use, instead of skipping directories named +on the command line (or with `getline'), they are read, with each entry +returned as a record. + + The record consists of at least two fields: the inode number and the +filename, separated by a forward slash character. On systems where the +directory entry contains the file type, the record has a third field +which is a single letter indicating the type of the file: + +Letter File Type +-------------------------------------------------------------------------- +`b' Block device +`c' Character device +`d' Directory +`f' Regular file +`l' Symbolic link +`p' Named pipe (FIFO) +`s' Socket +`u' Anything else (unknown) + + On systems without the file type information, calling +`readdir_do_ftype("stat")' causes the extension to use the `lstat()' +system call to retrieve the appropriate information. This is not the +default, since `lstat()' is a potentially expensive operation. By +calling `readdir_do_ftype("never")' one can ensure that the file type +information is never displayed, even when readily available in the +directory entry. + + The third option, `readdir_do_ftype("dirent")', takes file type +information from the directory entry, if it is available. This is the +default on systems that supply this information. + + The `readdir_do_ftype()' function sets `ERRNO' if called without +arguments or with invalid arguments. + + NOTE: On GNU/Linux systems, there are filesystems that don't + support the `d_type' entry (see the readdir(3) manual page), and + so the file type is always `u'. Therefore, using + `readdir_do_ftype("stat")' is advisable even on GNU/Linux systems. + In this case, the `readdir' extension falls back to using + `lstat()' when it encounters an unknown file type. + + Here is an example: + + @load "readdir" + ... 
+ BEGIN { FS = "/" } + { print "file name is", $2 } + + +File: gawk.info, Node: Extension Sample Revout, Next: Extension Sample Rev2way, Prev: Extension Sample Readdir, Up: Extension Samples + +16.6.6 Reversing Output +----------------------- + +The `revoutput' extension adds a simple output wrapper that reverses +the characters in each output line. Its main purpose is to show how to +write an output wrapper, although it may be mildly amusing for the +unwary. Here is an example: + + @load "revoutput" + + BEGIN { + REVOUT = 1 + print "hello, world" > "/dev/stdout" + } + + The output from this program is: `dlrow ,olleh'. + + +File: gawk.info, Node: Extension Sample Rev2way, Next: Extension Sample Read write array, Prev: Extension Sample Revout, Up: Extension Samples + +16.6.7 Two-Way I/O Example +-------------------------- + +The `revtwoway' extension adds a simple two-way processor that reverses +the characters in each line sent to it for reading back by the `awk' +program. Its main purpose is to show how to write a two-way +processor, although it may also be mildly amusing. The following +example shows how to use it: + + @load "revtwoway" + + BEGIN { + cmd = "/magic/mirror" + print "hello, world" |& cmd + cmd |& getline result + print result + close(cmd) + } + + +File: gawk.info, Node: Extension Sample Read write array, Next: Extension Sample Readfile, Prev: Extension Sample Rev2way, Up: Extension Samples + +16.6.8 Dumping and Restoring An Array +------------------------------------- + +The `rwarray' extension adds two functions, named `writea()' and +`reada()', as follows: + +`ret = writea(file, array)' + This function takes a string argument, which is the name of the + file to which to dump the array, and the array itself as the second + argument. `writea()' understands multidimensional arrays. It + returns one on success, or zero upon failure.
+ +`ret = reada(file, array)' + `reada()' is the inverse of `writea()'; it reads the file named as + its first argument, filling in the array named as the second + argument. It clears the array first. Here too, the return value + is one on success and zero upon failure. + + The array created by `reada()' is identical to that written by +`writea()' in the sense that the contents are the same. However, due to +implementation issues, the array traversal order of the recreated array +is likely to be different from that of the original array. As array +traversal order in `awk' is by default undefined, this is not +(technically) a problem. If you need to guarantee a particular +traversal order, use the array sorting features in `gawk' to do so +(*note Array Sorting::). + + The file contains binary data. All integral values are written in +network byte order. However, double precision floating-point values +are written as native binary data. Thus, arrays containing only string +data can theoretically be dumped on systems with one byte order and +restored on systems with a different one, but this has not been tried. + + Here is an example: + + @load "rwarray" + ... + ret = writea("arraydump.bin", array) + ... + ret = reada("arraydump.bin", array) + + +File: gawk.info, Node: Extension Sample Readfile, Next: Extension Sample API Tests, Prev: Extension Sample Read write array, Up: Extension Samples + +16.6.9 Reading An Entire File +----------------------------- + +The `readfile' extension adds a single function named `readfile()': + +`result = readfile("/some/path")' + The argument is the name of the file to read. The return value is + a string containing the entire contents of the requested file. + Upon error, the function returns the empty string and sets `ERRNO'. + + Here is an example: + + @load "readfile" + ... + contents = readfile("/path/to/file"); + if (contents == "" && ERRNO != "") { + print("problem reading file", ERRNO) > "/dev/stderr" + ... 
+ } + + +File: gawk.info, Node: Extension Sample API Tests, Next: Extension Sample Time, Prev: Extension Sample Readfile, Up: Extension Samples + +16.6.10 API Tests +----------------- + +The `testext' extension exercises parts of the extension API that are +not tested by the other samples. The `extension/testext.c' file +contains both the C code for the extension and `awk' test code inside C +comments that run the tests. The testing framework extracts the `awk' +code and runs the tests. See the source file for more information. + + +File: gawk.info, Node: Extension Sample Time, Prev: Extension Sample API Tests, Up: Extension Samples + +16.6.11 Extension Time Functions +-------------------------------- + +These functions can be used by either invoking `gawk' with a +command-line argument of `-l time' or by inserting `@load "time"' in +your script. + +`the_time = gettimeofday()' + Return the time in seconds that has elapsed since 1970-01-01 UTC + as a floating point value. If the time is unavailable on this + platform, return -1 and set `ERRNO'. The returned time should + have sub-second precision, but the actual precision will vary + based on the platform. If the standard C `gettimeofday()' system + call is available on this platform, then it simply returns the + value. Otherwise, if on Windows, it tries to use + `GetSystemTimeAsFileTime()'. + +`result = sleep(SECONDS)' + Attempt to sleep for SECONDS seconds. If SECONDS is negative, or + the attempt to sleep fails, return -1 and set `ERRNO'. Otherwise, + return zero after sleeping for the indicated amount of time. Note + that SECONDS may be a floating-point (non-integral) value. + Implementation details: depending on platform availability, this + function tries to use `nanosleep()' or `select()' to implement the + delay. 
+ + +File: gawk.info, Node: gawkextlib, Prev: Extension Samples, Up: Dynamic Extensions + +16.7 The `gawkextlib' Project +============================= + +The `gawkextlib' (http://sourceforge.net/projects/gawkextlib/) project +provides a number of `gawk' extensions, including one for processing +XML files. This is the evolution of the original `xgawk' (XML `gawk') +project. + + As of this writing, there are four extensions: + + * XML parser extension, using the Expat + (http://expat.sourceforge.net) XML parsing library. + + * Postgres SQL extension. + + * GD graphics library extension. + + * MPFR library extension. This provides access to a number of MPFR + functions which `gawk''s native MPFR support does not. + + The `time' extension described earlier (*note Extension Sample +Time::) was originally from this project but has been moved into the +main `gawk' distribution. + + You can check out the code for the `gawkextlib' project using the +GIT (http://git-scm.com) distributed source code control system. The +command is as follows: + + git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code + + You will need to have the Expat (http://expat.sourceforge.net) XML +parser library installed in order to build and use the XML extension. + + In addition, you must have the GNU Autotools installed (Autoconf +(http://www.gnu.org/software/autoconf), Automake +(http://www.gnu.org/software/automake), Libtool +(http://www.gnu.org/software/libtool), and Gettext +(http://www.gnu.org/software/gettext)). + + The simple recipe for building and testing `gawkextlib' is as +follows. First, build and install `gawk': + + cd .../path/to/gawk/code + ./configure --prefix=/tmp/newgawk Install in /tmp/newgawk for now + make && make check Build and check that all is OK + make install Install gawk + + Next, build `gawkextlib' and test it: + + cd .../path/to/gawkextlib-code + ./update-autotools Generate configure, etc.
+ You may have to run this command twice + ./configure --with-gawk=/tmp/newgawk Configure, point at "installed" gawk + make && make check Build and check that all is OK + + If you write an extension that you wish to share with other `gawk' +users, please consider doing so through the `gawkextlib' project. + File: gawk.info, Node: Language History, Next: Installation, Prev: Dynamic Extensions, Up: Top @@ -26041,7 +29190,6 @@ Index * Ada programming language: Glossary. (line 20) * adding, features to gawk: Adding Code. (line 6) * adding, fields: Changing Fields. (line 53) -* adding, functions to gawk: Dynamic Extensions. (line 9) * advanced features, buffering: I/O Functions. (line 98) * advanced features, close() function: Close Files And Pipes. (line 131) @@ -26399,7 +29547,6 @@ Index * characters, transliterating: Translate Program. (line 6) * characters, values of as numbers: Ordinal Functions. (line 6) * Chassell, Robert J.: Acknowledgments. (line 33) -* chdir() function, implementing in gawk: Sample Library. (line 6) * chem utility: Glossary. (line 151) * chr() user-defined function: Ordinal Functions. (line 16) * clear debugger command: Breakpoint Control. (line 36) @@ -26771,7 +29918,6 @@ Index (line 162) * differences in awk and gawk, trunc-mod operation: Arithmetic Ops. (line 66) -* directories, changing: Sample Library. (line 6) * directories, command line: Command line directories. (line 6) * directories, searching <1>: Igawk Program. (line 368) @@ -26896,8 +30042,6 @@ Index (line 9) * expressions, selecting: Conditional Exp. (line 6) * Extended Regular Expressions (EREs): Bracket Expressions. (line 24) -* extension() function (gawk): Using Internal File Ops. - (line 15) * extensions, Brian Kernighan's awk <1>: Other Versions. (line 13) * extensions, Brian Kernighan's awk: BTL. (line 6) * extensions, common, ** operator: Arithmetic Ops. (line 36) @@ -26992,7 +30136,6 @@ Index * files, closing: I/O Functions. 
(line 10) * files, descriptors, See file descriptors: Special FD. (line 6) * files, group: Group Functions. (line 6) -* files, information about, retrieving: Sample Library. (line 6) * files, initialization and cleanup: Filetrans Function. (line 6) * files, input, See input files: Read Terminal. (line 17) * files, log, timestamps in: Time Functions. (line 6) @@ -27088,7 +30231,6 @@ Index (line 47) * functions, built-in <1>: Functions. (line 6) * functions, built-in: Function Calls. (line 10) -* functions, built-in, adding to gawk: Dynamic Extensions. (line 9) * functions, built-in, evaluation order: Calling Built-in. (line 30) * functions, defining: Definition Syntax. (line 6) * functions, library: Library Functions. (line 6) @@ -27170,7 +30312,6 @@ Index (line 26) * gawk, FUNCTAB array in: Auto-set. (line 119) * gawk, function arguments and: Calling Built-in. (line 16) -* gawk, functions, adding: Dynamic Extensions. (line 9) * gawk, hexadecimal numbers and: Nondecimal-numbers. (line 42) * gawk, IGNORECASE variable in <1>: Array Sorting Functions. (line 81) @@ -27268,6 +30409,8 @@ Index * gettext library: Explaining gettext. (line 6) * gettext library, locale categories: Explaining gettext. (line 80) * gettext() function (C library): Explaining gettext. (line 62) +* gettimeofday time extension function: Extension Sample Time. + (line 10) * GMP: Arbitrary Precision Arithmetic. (line 6) * GNITS mailing list: Acknowledgments. (line 52) @@ -27922,7 +31065,7 @@ Index (line 10) * programming conventions, functions, writing: Definition Syntax. (line 55) -* programming conventions, gawk internals: Internal File Ops. (line 33) +* programming conventions, gawk internals: Internal File Ops. (line 45) * programming conventions, private variable names: Library Names. (line 23) * programming language, recipe for: History. (line 6) @@ -28172,6 +31315,10 @@ Index * single-character fields: Single Character Fields. (line 6) * Skywalker, Luke: Undocumented. 
(line 6) +* sleep: Extension Sample Time. + (line 6) +* sleep time extension function: Extension Sample Time. + (line 20) * sleep utility: Alarm Program. (line 109) * Solaris, POSIX-compliant awk: Other Versions. (line 87) * sort function, arrays, sorting: Array Sorting Functions. @@ -28216,7 +31363,6 @@ Index * standard input <1>: Special FD. (line 6) * standard input: Read Terminal. (line 6) * standard output: Special FD. (line 6) -* stat() function, implementing in gawk: Sample Library. (line 6) * statements, compound, control statements and: Statements. (line 10) * statements, control, in actions: Statements. (line 6) * statements, multiple: Statements/Lines. (line 91) @@ -28308,6 +31454,8 @@ Index * tilde (~), ~ operator <5>: Computed Regexps. (line 6) * tilde (~), ~ operator <6>: Case-sensitivity. (line 26) * tilde (~), ~ operator: Regexp Usage. (line 19) +* time: Extension Sample Time. + (line 6) * time, alarm clock example program: Alarm Program. (line 9) * time, localization and: Explaining gettext. (line 115) * time, managing: Getlocaltime Function. 
@@ -28517,452 +31665,515 @@ Index Tag Table: Node: Top1352 -Node: Foreword31870 -Node: Preface36215 -Ref: Preface-Footnote-139268 -Ref: Preface-Footnote-239374 -Node: History39606 -Node: Names41997 -Ref: Names-Footnote-143474 -Node: This Manual43546 -Ref: This Manual-Footnote-148674 -Node: Conventions48774 -Node: Manual History50908 -Ref: Manual History-Footnote-154178 -Ref: Manual History-Footnote-254219 -Node: How To Contribute54293 -Node: Acknowledgments55437 -Node: Getting Started59933 -Node: Running gawk62312 -Node: One-shot63498 -Node: Read Terminal64723 -Ref: Read Terminal-Footnote-166373 -Ref: Read Terminal-Footnote-266649 -Node: Long66820 -Node: Executable Scripts68196 -Ref: Executable Scripts-Footnote-170065 -Ref: Executable Scripts-Footnote-270167 -Node: Comments70714 -Node: Quoting73181 -Node: DOS Quoting77804 -Node: Sample Data Files78479 -Node: Very Simple81511 -Node: Two Rules86110 -Node: More Complex88257 -Ref: More Complex-Footnote-191187 -Node: Statements/Lines91272 -Ref: Statements/Lines-Footnote-195734 -Node: Other Features95999 -Node: When96927 -Node: Invoking Gawk99074 -Node: Command Line100535 -Node: Options101318 -Ref: Options-Footnote-1116716 -Node: Other Arguments116741 -Node: Naming Standard Input119399 -Node: Environment Variables120493 -Node: AWKPATH Variable121051 -Ref: AWKPATH Variable-Footnote-1123809 -Node: AWKLIBPATH Variable124069 -Node: Other Environment Variables124666 -Node: Exit Status127161 -Node: Include Files127836 -Node: Loading Shared Libraries131405 -Node: Obsolete132630 -Node: Undocumented133327 -Node: Regexp133570 -Node: Regexp Usage134959 -Node: Escape Sequences136985 -Node: Regexp Operators142748 -Ref: Regexp Operators-Footnote-1150128 -Ref: Regexp Operators-Footnote-2150275 -Node: Bracket Expressions150373 -Ref: table-char-classes152263 -Node: GNU Regexp Operators154786 -Node: Case-sensitivity158509 -Ref: Case-sensitivity-Footnote-1161477 -Ref: Case-sensitivity-Footnote-2161712 -Node: Leftmost Longest161820 -Node: 
Computed Regexps163021 -Node: Reading Files166431 -Node: Records168434 -Ref: Records-Footnote-1177358 -Node: Fields177395 -Ref: Fields-Footnote-1180428 -Node: Nonconstant Fields180514 -Node: Changing Fields182716 -Node: Field Separators188697 -Node: Default Field Splitting191326 -Node: Regexp Field Splitting192443 -Node: Single Character Fields195785 -Node: Command Line Field Separator196844 -Node: Field Splitting Summary200285 -Ref: Field Splitting Summary-Footnote-1203477 -Node: Constant Size203578 -Node: Splitting By Content208162 -Ref: Splitting By Content-Footnote-1211888 -Node: Multiple Line211928 -Ref: Multiple Line-Footnote-1217775 -Node: Getline217954 -Node: Plain Getline220170 -Node: Getline/Variable222259 -Node: Getline/File223400 -Node: Getline/Variable/File224722 -Ref: Getline/Variable/File-Footnote-1226321 -Node: Getline/Pipe226408 -Node: Getline/Variable/Pipe228968 -Node: Getline/Coprocess230075 -Node: Getline/Variable/Coprocess231318 -Node: Getline Notes232032 -Node: Getline Summary234819 -Ref: table-getline-variants235227 -Node: Read Timeout236083 -Ref: Read Timeout-Footnote-1239828 -Node: Command line directories239885 -Node: Printing240515 -Node: Print242146 -Node: Print Examples243483 -Node: Output Separators246267 -Node: OFMT248027 -Node: Printf249385 -Node: Basic Printf250291 -Node: Control Letters251830 -Node: Format Modifiers255642 -Node: Printf Examples261651 -Node: Redirection264366 -Node: Special Files271350 -Node: Special FD271883 -Ref: Special FD-Footnote-1275508 -Node: Special Network275582 -Node: Special Caveats276432 -Node: Close Files And Pipes277228 -Ref: Close Files And Pipes-Footnote-1284251 -Ref: Close Files And Pipes-Footnote-2284399 -Node: Expressions284549 -Node: Values285681 -Node: Constants286357 -Node: Scalar Constants287037 -Ref: Scalar Constants-Footnote-1287896 -Node: Nondecimal-numbers288078 -Node: Regexp Constants291137 -Node: Using Constant Regexps291612 -Node: Variables294667 -Node: Using Variables295322 -Node: 
Assignment Options297046 -Node: Conversion298918 -Ref: table-locale-affects304294 -Ref: Conversion-Footnote-1304918 -Node: All Operators305027 -Node: Arithmetic Ops305657 -Node: Concatenation308162 -Ref: Concatenation-Footnote-1310955 -Node: Assignment Ops311075 -Ref: table-assign-ops316063 -Node: Increment Ops317471 -Node: Truth Values and Conditions320941 -Node: Truth Values322024 -Node: Typing and Comparison323073 -Node: Variable Typing323862 -Ref: Variable Typing-Footnote-1327759 -Node: Comparison Operators327881 -Ref: table-relational-ops328291 -Node: POSIX String Comparison331840 -Ref: POSIX String Comparison-Footnote-1332796 -Node: Boolean Ops332934 -Ref: Boolean Ops-Footnote-1337012 -Node: Conditional Exp337103 -Node: Function Calls338835 -Node: Precedence342429 -Node: Locales346098 -Node: Patterns and Actions347187 -Node: Pattern Overview348241 -Node: Regexp Patterns349910 -Node: Expression Patterns350453 -Node: Ranges354138 -Node: BEGIN/END357104 -Node: Using BEGIN/END357866 -Ref: Using BEGIN/END-Footnote-1360597 -Node: I/O And BEGIN/END360703 -Node: BEGINFILE/ENDFILE362985 -Node: Empty365889 -Node: Using Shell Variables366205 -Node: Action Overview368490 -Node: Statements370847 -Node: If Statement372701 -Node: While Statement374200 -Node: Do Statement376244 -Node: For Statement377400 -Node: Switch Statement380552 -Node: Break Statement382649 -Node: Continue Statement384639 -Node: Next Statement386432 -Node: Nextfile Statement388822 -Node: Exit Statement391463 -Node: Built-in Variables393879 -Node: User-modified394974 -Ref: User-modified-Footnote-1403329 -Node: Auto-set403391 -Ref: Auto-set-Footnote-1415742 -Ref: Auto-set-Footnote-2415947 -Node: ARGC and ARGV416003 -Node: Arrays419854 -Node: Array Basics421359 -Node: Array Intro422185 -Node: Reference to Elements426503 -Node: Assigning Elements428773 -Node: Array Example429264 -Node: Scanning an Array430996 -Node: Controlling Scanning433310 -Ref: Controlling Scanning-Footnote-1438243 -Node: Delete438559 
-Ref: Delete-Footnote-1441324 -Node: Numeric Array Subscripts441381 -Node: Uninitialized Subscripts443564 -Node: Multi-dimensional445192 -Node: Multi-scanning448286 -Node: Arrays of Arrays449877 -Node: Functions454522 -Node: Built-in455344 -Node: Calling Built-in456422 -Node: Numeric Functions458410 -Ref: Numeric Functions-Footnote-1462242 -Ref: Numeric Functions-Footnote-2462599 -Ref: Numeric Functions-Footnote-3462647 -Node: String Functions462916 -Ref: String Functions-Footnote-1486413 -Ref: String Functions-Footnote-2486542 -Ref: String Functions-Footnote-3486790 -Node: Gory Details486877 -Ref: table-sub-escapes488556 -Ref: table-sub-posix-92489910 -Ref: table-sub-proposed491253 -Ref: table-posix-sub492603 -Ref: table-gensub-escapes494149 -Ref: Gory Details-Footnote-1495356 -Ref: Gory Details-Footnote-2495407 -Node: I/O Functions495558 -Ref: I/O Functions-Footnote-1502213 -Node: Time Functions502360 -Ref: Time Functions-Footnote-1513252 -Ref: Time Functions-Footnote-2513320 -Ref: Time Functions-Footnote-3513478 -Ref: Time Functions-Footnote-4513589 -Ref: Time Functions-Footnote-5513701 -Ref: Time Functions-Footnote-6513928 -Node: Bitwise Functions514194 -Ref: table-bitwise-ops514752 -Ref: Bitwise Functions-Footnote-1518973 -Node: Type Functions519157 -Node: I18N Functions519627 -Node: User-defined521254 -Node: Definition Syntax522058 -Ref: Definition Syntax-Footnote-1526968 -Node: Function Example527037 -Node: Function Caveats529631 -Node: Calling A Function530052 -Node: Variable Scope531167 -Node: Pass By Value/Reference533142 -Node: Return Statement536582 -Node: Dynamic Typing539563 -Node: Indirect Calls540298 -Node: Internationalization549983 -Node: I18N and L10N551409 -Node: Explaining gettext552095 -Ref: Explaining gettext-Footnote-1557161 -Ref: Explaining gettext-Footnote-2557345 -Node: Programmer i18n557510 -Node: Translator i18n561710 -Node: String Extraction562503 -Ref: String Extraction-Footnote-1563464 -Node: Printf Ordering563550 -Ref: Printf 
Ordering-Footnote-1566334 -Node: I18N Portability566398 -Ref: I18N Portability-Footnote-1568847 -Node: I18N Example568910 -Ref: I18N Example-Footnote-1571545 -Node: Gawk I18N571617 -Node: Advanced Features572234 -Node: Nondecimal Data573747 -Node: Array Sorting575330 -Node: Controlling Array Traversal576027 -Node: Array Sorting Functions584265 -Ref: Array Sorting Functions-Footnote-1587939 -Ref: Array Sorting Functions-Footnote-2588032 -Node: Two-way I/O588226 -Ref: Two-way I/O-Footnote-1593658 -Node: TCP/IP Networking593728 -Node: Profiling596572 -Node: Library Functions604026 -Ref: Library Functions-Footnote-1607033 -Node: Library Names607204 -Ref: Library Names-Footnote-1610675 -Ref: Library Names-Footnote-2610895 -Node: General Functions610981 -Node: Strtonum Function611934 -Node: Assert Function614864 -Node: Round Function618190 -Node: Cliff Random Function619733 -Node: Ordinal Functions620749 -Ref: Ordinal Functions-Footnote-1623819 -Ref: Ordinal Functions-Footnote-2624071 -Node: Join Function624280 -Ref: Join Function-Footnote-1626051 -Node: Getlocaltime Function626251 -Node: Data File Management629966 -Node: Filetrans Function630598 -Node: Rewind Function634737 -Node: File Checking636124 -Node: Empty Files637218 -Node: Ignoring Assigns639448 -Node: Getopt Function641001 -Ref: Getopt Function-Footnote-1652305 -Node: Passwd Functions652508 -Ref: Passwd Functions-Footnote-1661483 -Node: Group Functions661571 -Node: Walking Arrays669655 -Node: Sample Programs671224 -Node: Running Examples671889 -Node: Clones672617 -Node: Cut Program673841 -Node: Egrep Program683686 -Ref: Egrep Program-Footnote-1691459 -Node: Id Program691569 -Node: Split Program695185 -Ref: Split Program-Footnote-1698704 -Node: Tee Program698832 -Node: Uniq Program701635 -Node: Wc Program709064 -Ref: Wc Program-Footnote-1713330 -Ref: Wc Program-Footnote-2713530 -Node: Miscellaneous Programs713622 -Node: Dupword Program714810 -Node: Alarm Program716841 -Node: Translate Program721590 -Ref: 
Translate Program-Footnote-1725977 -Ref: Translate Program-Footnote-2726205 -Node: Labels Program726339 -Ref: Labels Program-Footnote-1729710 -Node: Word Sorting729794 -Node: History Sorting733678 -Node: Extract Program735517 -Ref: Extract Program-Footnote-1743000 -Node: Simple Sed743128 -Node: Igawk Program746190 -Ref: Igawk Program-Footnote-1761347 -Ref: Igawk Program-Footnote-2761548 -Node: Anagram Program761686 -Node: Signature Program764754 -Node: Debugger765854 -Node: Debugging766820 -Node: Debugging Concepts767253 -Node: Debugging Terms769109 -Node: Awk Debugging771706 -Node: Sample Debugging Session772598 -Node: Debugger Invocation773118 -Node: Finding The Bug774447 -Node: List of Debugger Commands780935 -Node: Breakpoint Control782269 -Node: Debugger Execution Control785933 -Node: Viewing And Changing Data789293 -Node: Execution Stack792649 -Node: Debugger Info794116 -Node: Miscellaneous Debugger Commands798097 -Node: Readline Support803542 -Node: Limitations804373 -Node: Arbitrary Precision Arithmetic806625 -Ref: Arbitrary Precision Arithmetic-Footnote-1808267 -Node: General Arithmetic808415 -Node: Floating Point Issues810135 -Node: String Conversion Precision811016 -Ref: String Conversion Precision-Footnote-1812722 -Node: Unexpected Results812831 -Node: POSIX Floating Point Problems814984 -Ref: POSIX Floating Point Problems-Footnote-1818809 -Node: Integer Programming818847 -Node: Floating-point Programming820600 -Ref: Floating-point Programming-Footnote-1826909 -Node: Floating-point Representation827173 -Node: Floating-point Context828338 -Ref: table-ieee-formats829180 -Node: Rounding Mode830564 -Ref: table-rounding-modes831043 -Ref: Rounding Mode-Footnote-1834047 -Node: Gawk and MPFR834228 -Node: Arbitrary Precision Floats835470 -Ref: Arbitrary Precision Floats-Footnote-1837899 -Node: Setting Precision838210 -Node: Setting Rounding Mode840943 -Ref: table-gawk-rounding-modes841347 -Node: Floating-point Constants842527 -Node: Changing Precision843951 
-Ref: Changing Precision-Footnote-1845351 -Node: Exact Arithmetic845525 -Node: Arbitrary Precision Integers848633 -Ref: Arbitrary Precision Integers-Footnote-1851633 -Node: Dynamic Extensions851780 -Node: Plugin License852698 -Node: Sample Library853312 -Node: Internal File Description853996 -Node: Internal File Ops857709 -Ref: Internal File Ops-Footnote-1862272 -Node: Using Internal File Ops862412 -Node: Language History864788 -Node: V7/SVR3.1866310 -Node: SVR4868631 -Node: POSIX870073 -Node: BTL871081 -Node: POSIX/GNU871815 -Node: Common Extensions877350 -Node: Ranges and Locales878457 -Ref: Ranges and Locales-Footnote-1883075 -Ref: Ranges and Locales-Footnote-2883102 -Ref: Ranges and Locales-Footnote-3883362 -Node: Contributors883583 -Node: Installation887879 -Node: Gawk Distribution888773 -Node: Getting889257 -Node: Extracting890083 -Node: Distribution contents891775 -Node: Unix Installation896997 -Node: Quick Installation897614 -Node: Additional Configuration Options899576 -Node: Configuration Philosophy901053 -Node: Non-Unix Installation903395 -Node: PC Installation903853 -Node: PC Binary Installation905152 -Node: PC Compiling907000 -Node: PC Testing909944 -Node: PC Using911120 -Node: Cygwin915305 -Node: MSYS916305 -Node: VMS Installation916819 -Node: VMS Compilation917422 -Ref: VMS Compilation-Footnote-1918429 -Node: VMS Installation Details918487 -Node: VMS Running920122 -Node: VMS Old Gawk921729 -Node: Bugs922203 -Node: Other Versions926055 -Node: Notes931370 -Node: Compatibility Mode931957 -Node: Additions932740 -Node: Accessing The Source933667 -Node: Adding Code935093 -Node: New Ports941135 -Node: Derived Files945270 -Ref: Derived Files-Footnote-1950575 -Ref: Derived Files-Footnote-2950609 -Ref: Derived Files-Footnote-3951209 -Node: Future Extensions951307 -Node: Basic Concepts952794 -Node: Basic High Level953475 -Ref: figure-general-flow953746 -Ref: figure-process-flow954345 -Ref: Basic High Level-Footnote-1957574 -Node: Basic Data Typing957759 -Node: 
Glossary961114 -Node: Copying986425 -Node: GNU Free Documentation License1023982 -Node: Index1049119 +Node: Foreword40050 +Node: Preface44395 +Ref: Preface-Footnote-147448 +Ref: Preface-Footnote-247554 +Node: History47786 +Node: Names50177 +Ref: Names-Footnote-151654 +Node: This Manual51726 +Ref: This Manual-Footnote-156854 +Node: Conventions56954 +Node: Manual History59088 +Ref: Manual History-Footnote-162358 +Ref: Manual History-Footnote-262399 +Node: How To Contribute62473 +Node: Acknowledgments63617 +Node: Getting Started68113 +Node: Running gawk70492 +Node: One-shot71678 +Node: Read Terminal72903 +Ref: Read Terminal-Footnote-174553 +Ref: Read Terminal-Footnote-274829 +Node: Long75000 +Node: Executable Scripts76376 +Ref: Executable Scripts-Footnote-178245 +Ref: Executable Scripts-Footnote-278347 +Node: Comments78894 +Node: Quoting81361 +Node: DOS Quoting85984 +Node: Sample Data Files86659 +Node: Very Simple89691 +Node: Two Rules94290 +Node: More Complex96437 +Ref: More Complex-Footnote-199367 +Node: Statements/Lines99452 +Ref: Statements/Lines-Footnote-1103914 +Node: Other Features104179 +Node: When105107 +Node: Invoking Gawk107254 +Node: Command Line108715 +Node: Options109498 +Ref: Options-Footnote-1124896 +Node: Other Arguments124921 +Node: Naming Standard Input127579 +Node: Environment Variables128673 +Node: AWKPATH Variable129231 +Ref: AWKPATH Variable-Footnote-1131989 +Node: AWKLIBPATH Variable132249 +Node: Other Environment Variables132846 +Node: Exit Status135341 +Node: Include Files136016 +Node: Loading Shared Libraries139585 +Node: Obsolete140810 +Node: Undocumented141507 +Node: Regexp141750 +Node: Regexp Usage143139 +Node: Escape Sequences145165 +Node: Regexp Operators150928 +Ref: Regexp Operators-Footnote-1158308 +Ref: Regexp Operators-Footnote-2158455 +Node: Bracket Expressions158553 +Ref: table-char-classes160443 +Node: GNU Regexp Operators162966 +Node: Case-sensitivity166689 +Ref: Case-sensitivity-Footnote-1169657 +Ref: 
Case-sensitivity-Footnote-2169892 +Node: Leftmost Longest170000 +Node: Computed Regexps171201 +Node: Reading Files174611 +Node: Records176614 +Ref: Records-Footnote-1185538 +Node: Fields185575 +Ref: Fields-Footnote-1188608 +Node: Nonconstant Fields188694 +Node: Changing Fields190896 +Node: Field Separators196877 +Node: Default Field Splitting199506 +Node: Regexp Field Splitting200623 +Node: Single Character Fields203965 +Node: Command Line Field Separator205024 +Node: Field Splitting Summary208465 +Ref: Field Splitting Summary-Footnote-1211657 +Node: Constant Size211758 +Node: Splitting By Content216342 +Ref: Splitting By Content-Footnote-1220068 +Node: Multiple Line220108 +Ref: Multiple Line-Footnote-1225955 +Node: Getline226134 +Node: Plain Getline228350 +Node: Getline/Variable230439 +Node: Getline/File231580 +Node: Getline/Variable/File232902 +Ref: Getline/Variable/File-Footnote-1234501 +Node: Getline/Pipe234588 +Node: Getline/Variable/Pipe237148 +Node: Getline/Coprocess238255 +Node: Getline/Variable/Coprocess239498 +Node: Getline Notes240212 +Node: Getline Summary242999 +Ref: table-getline-variants243407 +Node: Read Timeout244263 +Ref: Read Timeout-Footnote-1248008 +Node: Command line directories248065 +Node: Printing248695 +Node: Print250326 +Node: Print Examples251663 +Node: Output Separators254447 +Node: OFMT256207 +Node: Printf257565 +Node: Basic Printf258471 +Node: Control Letters260010 +Node: Format Modifiers263822 +Node: Printf Examples269831 +Node: Redirection272546 +Node: Special Files279530 +Node: Special FD280063 +Ref: Special FD-Footnote-1283688 +Node: Special Network283762 +Node: Special Caveats284612 +Node: Close Files And Pipes285408 +Ref: Close Files And Pipes-Footnote-1292431 +Ref: Close Files And Pipes-Footnote-2292579 +Node: Expressions292729 +Node: Values293861 +Node: Constants294537 +Node: Scalar Constants295217 +Ref: Scalar Constants-Footnote-1296076 +Node: Nondecimal-numbers296258 +Node: Regexp Constants299317 +Node: Using Constant 
Regexps299792 +Node: Variables302847 +Node: Using Variables303502 +Node: Assignment Options305226 +Node: Conversion307098 +Ref: table-locale-affects312474 +Ref: Conversion-Footnote-1313098 +Node: All Operators313207 +Node: Arithmetic Ops313837 +Node: Concatenation316342 +Ref: Concatenation-Footnote-1319135 +Node: Assignment Ops319255 +Ref: table-assign-ops324243 +Node: Increment Ops325651 +Node: Truth Values and Conditions329121 +Node: Truth Values330204 +Node: Typing and Comparison331253 +Node: Variable Typing332042 +Ref: Variable Typing-Footnote-1335939 +Node: Comparison Operators336061 +Ref: table-relational-ops336471 +Node: POSIX String Comparison340020 +Ref: POSIX String Comparison-Footnote-1340976 +Node: Boolean Ops341114 +Ref: Boolean Ops-Footnote-1345192 +Node: Conditional Exp345283 +Node: Function Calls347015 +Node: Precedence350609 +Node: Locales354278 +Node: Patterns and Actions355367 +Node: Pattern Overview356421 +Node: Regexp Patterns358090 +Node: Expression Patterns358633 +Node: Ranges362318 +Node: BEGIN/END365284 +Node: Using BEGIN/END366046 +Ref: Using BEGIN/END-Footnote-1368777 +Node: I/O And BEGIN/END368883 +Node: BEGINFILE/ENDFILE371165 +Node: Empty374069 +Node: Using Shell Variables374385 +Node: Action Overview376670 +Node: Statements379027 +Node: If Statement380881 +Node: While Statement382380 +Node: Do Statement384424 +Node: For Statement385580 +Node: Switch Statement388732 +Node: Break Statement390829 +Node: Continue Statement392819 +Node: Next Statement394612 +Node: Nextfile Statement397002 +Node: Exit Statement399643 +Node: Built-in Variables402059 +Node: User-modified403154 +Ref: User-modified-Footnote-1411509 +Node: Auto-set411571 +Ref: Auto-set-Footnote-1423922 +Ref: Auto-set-Footnote-2424127 +Node: ARGC and ARGV424183 +Node: Arrays428034 +Node: Array Basics429539 +Node: Array Intro430365 +Node: Reference to Elements434683 +Node: Assigning Elements436953 +Node: Array Example437444 +Node: Scanning an Array439176 +Node: Controlling 
Scanning441490 +Ref: Controlling Scanning-Footnote-1446423 +Node: Delete446739 +Ref: Delete-Footnote-1449504 +Node: Numeric Array Subscripts449561 +Node: Uninitialized Subscripts451744 +Node: Multi-dimensional453372 +Node: Multi-scanning456466 +Node: Arrays of Arrays458057 +Node: Functions462702 +Node: Built-in463524 +Node: Calling Built-in464602 +Node: Numeric Functions466590 +Ref: Numeric Functions-Footnote-1470422 +Ref: Numeric Functions-Footnote-2470779 +Ref: Numeric Functions-Footnote-3470827 +Node: String Functions471096 +Ref: String Functions-Footnote-1494593 +Ref: String Functions-Footnote-2494722 +Ref: String Functions-Footnote-3494970 +Node: Gory Details495057 +Ref: table-sub-escapes496736 +Ref: table-sub-posix-92498090 +Ref: table-sub-proposed499433 +Ref: table-posix-sub500783 +Ref: table-gensub-escapes502329 +Ref: Gory Details-Footnote-1503536 +Ref: Gory Details-Footnote-2503587 +Node: I/O Functions503738 +Ref: I/O Functions-Footnote-1510393 +Node: Time Functions510540 +Ref: Time Functions-Footnote-1521432 +Ref: Time Functions-Footnote-2521500 +Ref: Time Functions-Footnote-3521658 +Ref: Time Functions-Footnote-4521769 +Ref: Time Functions-Footnote-5521881 +Ref: Time Functions-Footnote-6522108 +Node: Bitwise Functions522374 +Ref: table-bitwise-ops522932 +Ref: Bitwise Functions-Footnote-1527153 +Node: Type Functions527337 +Node: I18N Functions527807 +Node: User-defined529434 +Node: Definition Syntax530238 +Ref: Definition Syntax-Footnote-1535148 +Node: Function Example535217 +Node: Function Caveats537811 +Node: Calling A Function538232 +Node: Variable Scope539347 +Node: Pass By Value/Reference541322 +Node: Return Statement544762 +Node: Dynamic Typing547743 +Node: Indirect Calls548478 +Node: Internationalization558163 +Node: I18N and L10N559589 +Node: Explaining gettext560275 +Ref: Explaining gettext-Footnote-1565341 +Ref: Explaining gettext-Footnote-2565525 +Node: Programmer i18n565690 +Node: Translator i18n569890 +Node: String Extraction570683 +Ref: 
String Extraction-Footnote-1571644 +Node: Printf Ordering571730 +Ref: Printf Ordering-Footnote-1574514 +Node: I18N Portability574578 +Ref: I18N Portability-Footnote-1577027 +Node: I18N Example577090 +Ref: I18N Example-Footnote-1579725 +Node: Gawk I18N579797 +Node: Advanced Features580414 +Node: Nondecimal Data581927 +Node: Array Sorting583510 +Node: Controlling Array Traversal584207 +Node: Array Sorting Functions592445 +Ref: Array Sorting Functions-Footnote-1596119 +Ref: Array Sorting Functions-Footnote-2596212 +Node: Two-way I/O596406 +Ref: Two-way I/O-Footnote-1601838 +Node: TCP/IP Networking601908 +Node: Profiling604752 +Node: Library Functions612206 +Ref: Library Functions-Footnote-1615213 +Node: Library Names615384 +Ref: Library Names-Footnote-1618855 +Ref: Library Names-Footnote-2619075 +Node: General Functions619161 +Node: Strtonum Function620114 +Node: Assert Function623044 +Node: Round Function626370 +Node: Cliff Random Function627913 +Node: Ordinal Functions628929 +Ref: Ordinal Functions-Footnote-1631999 +Ref: Ordinal Functions-Footnote-2632251 +Node: Join Function632460 +Ref: Join Function-Footnote-1634231 +Node: Getlocaltime Function634431 +Node: Data File Management638146 +Node: Filetrans Function638778 +Node: Rewind Function642917 +Node: File Checking644304 +Node: Empty Files645398 +Node: Ignoring Assigns647628 +Node: Getopt Function649181 +Ref: Getopt Function-Footnote-1660485 +Node: Passwd Functions660688 +Ref: Passwd Functions-Footnote-1669663 +Node: Group Functions669751 +Node: Walking Arrays677835 +Node: Sample Programs679404 +Node: Running Examples680069 +Node: Clones680797 +Node: Cut Program682021 +Node: Egrep Program691866 +Ref: Egrep Program-Footnote-1699639 +Node: Id Program699749 +Node: Split Program703365 +Ref: Split Program-Footnote-1706884 +Node: Tee Program707012 +Node: Uniq Program709815 +Node: Wc Program717244 +Ref: Wc Program-Footnote-1721510 +Ref: Wc Program-Footnote-2721710 +Node: Miscellaneous Programs721802 +Node: Dupword 
Program722990 +Node: Alarm Program725021 +Node: Translate Program729770 +Ref: Translate Program-Footnote-1734157 +Ref: Translate Program-Footnote-2734385 +Node: Labels Program734519 +Ref: Labels Program-Footnote-1737890 +Node: Word Sorting737974 +Node: History Sorting741858 +Node: Extract Program743697 +Ref: Extract Program-Footnote-1751180 +Node: Simple Sed751308 +Node: Igawk Program754370 +Ref: Igawk Program-Footnote-1769527 +Ref: Igawk Program-Footnote-2769728 +Node: Anagram Program769866 +Node: Signature Program772934 +Node: Debugger774034 +Node: Debugging775000 +Node: Debugging Concepts775433 +Node: Debugging Terms777289 +Node: Awk Debugging779886 +Node: Sample Debugging Session780778 +Node: Debugger Invocation781298 +Node: Finding The Bug782627 +Node: List of Debugger Commands789115 +Node: Breakpoint Control790449 +Node: Debugger Execution Control794113 +Node: Viewing And Changing Data797473 +Node: Execution Stack800829 +Node: Debugger Info802296 +Node: Miscellaneous Debugger Commands806277 +Node: Readline Support811722 +Node: Limitations812553 +Node: Arbitrary Precision Arithmetic814805 +Ref: Arbitrary Precision Arithmetic-Footnote-1816447 +Node: General Arithmetic816595 +Node: Floating Point Issues818315 +Node: String Conversion Precision819196 +Ref: String Conversion Precision-Footnote-1820902 +Node: Unexpected Results821011 +Node: POSIX Floating Point Problems823164 +Ref: POSIX Floating Point Problems-Footnote-1826989 +Node: Integer Programming827027 +Node: Floating-point Programming828780 +Ref: Floating-point Programming-Footnote-1835089 +Node: Floating-point Representation835353 +Node: Floating-point Context836518 +Ref: table-ieee-formats837360 +Node: Rounding Mode838744 +Ref: table-rounding-modes839223 +Ref: Rounding Mode-Footnote-1842227 +Node: Gawk and MPFR842408 +Node: Arbitrary Precision Floats843650 +Ref: Arbitrary Precision Floats-Footnote-1846079 +Node: Setting Precision846390 +Node: Setting Rounding Mode849123 +Ref: 
table-gawk-rounding-modes849527 +Node: Floating-point Constants850707 +Node: Changing Precision852131 +Ref: Changing Precision-Footnote-1853531 +Node: Exact Arithmetic853705 +Node: Arbitrary Precision Integers856813 +Ref: Arbitrary Precision Integers-Footnote-1859813 +Node: Dynamic Extensions859960 +Node: Extension Intro861283 +Node: Plugin License862486 +Node: Extension Design863160 +Node: Old Extension Problems864231 +Ref: Old Extension Problems-Footnote-1865741 +Node: Extension New Mechanism Goals865798 +Ref: Extension New Mechanism Goals-Footnote-1868510 +Node: Extension Other Design Decisions868696 +Node: Extension Mechanism Outline870443 +Ref: load-extension871426 +Ref: load-new-function872859 +Ref: call-new-function873795 +Node: Extension Future Growth875776 +Node: Extension API Description876518 +Node: Extension API Functions Introduction877838 +Node: General Data Types881913 +Ref: General Data Types-Footnote-1887546 +Node: Requesting Values887845 +Ref: table-value-types-returned888576 +Node: Constructor Functions889530 +Node: Registration Functions892526 +Node: Extension Functions893211 +Node: Exit Callback Functions895030 +Node: Extension Version String896273 +Node: Input Parsers896923 +Node: Output Wrappers905504 +Node: Two-way processors909897 +Node: Printing Messages912019 +Ref: Printing Messages-Footnote-1913096 +Node: Updating `ERRNO'913248 +Node: Accessing Parameters913987 +Node: Symbol Table Access915217 +Node: Symbol table by name915729 +Ref: Symbol table by name-Footnote-1917901 +Node: Symbol table by cookie917981 +Ref: Symbol table by cookie-Footnote-1922110 +Node: Cached values922173 +Ref: Cached values-Footnote-1925374 +Node: Array Manipulation925465 +Ref: Array Manipulation-Footnote-1926563 +Node: Array Data Types926602 +Ref: Array Data Types-Footnote-1929328 +Node: Array Functions929420 +Node: Flattening Arrays933186 +Node: Creating Arrays940017 +Node: Extension API Variables944813 +Node: Extension Versioning945449 +Node: Extension API 
Informational Variables947350 +Node: Extension API Boilerplate948436 +Node: Finding Extensions952270 +Node: Extension Example952817 +Node: Internal File Description953555 +Node: Internal File Ops957243 +Ref: Internal File Ops-Footnote-1968327 +Node: Using Internal File Ops968467 +Ref: Using Internal File Ops-Footnote-1970823 +Node: Extension Samples971089 +Node: Extension Sample File Functions972532 +Node: Extension Sample Fnmatch980901 +Node: Extension Sample Fork982627 +Node: Extension Sample Ord983841 +Node: Extension Sample Readdir984617 +Node: Extension Sample Revout986955 +Node: Extension Sample Rev2way987548 +Node: Extension Sample Read write array988238 +Node: Extension Sample Readfile990121 +Node: Extension Sample API Tests990876 +Node: Extension Sample Time991401 +Node: gawkextlib992710 +Node: Language History995093 +Node: V7/SVR3.1996615 +Node: SVR4998936 +Node: POSIX1000378 +Node: BTL1001386 +Node: POSIX/GNU1002120 +Node: Common Extensions1007655 +Node: Ranges and Locales1008762 +Ref: Ranges and Locales-Footnote-11013380 +Ref: Ranges and Locales-Footnote-21013407 +Ref: Ranges and Locales-Footnote-31013667 +Node: Contributors1013888 +Node: Installation1018184 +Node: Gawk Distribution1019078 +Node: Getting1019562 +Node: Extracting1020388 +Node: Distribution contents1022080 +Node: Unix Installation1027302 +Node: Quick Installation1027919 +Node: Additional Configuration Options1029881 +Node: Configuration Philosophy1031358 +Node: Non-Unix Installation1033700 +Node: PC Installation1034158 +Node: PC Binary Installation1035457 +Node: PC Compiling1037305 +Node: PC Testing1040249 +Node: PC Using1041425 +Node: Cygwin1045610 +Node: MSYS1046610 +Node: VMS Installation1047124 +Node: VMS Compilation1047727 +Ref: VMS Compilation-Footnote-11048734 +Node: VMS Installation Details1048792 +Node: VMS Running1050427 +Node: VMS Old Gawk1052034 +Node: Bugs1052508 +Node: Other Versions1056360 +Node: Notes1061675 +Node: Compatibility Mode1062262 +Node: Additions1063045 +Node: 
Accessing The Source1063972 +Node: Adding Code1065398 +Node: New Ports1071440 +Node: Derived Files1075575 +Ref: Derived Files-Footnote-11080880 +Ref: Derived Files-Footnote-21080914 +Ref: Derived Files-Footnote-31081514 +Node: Future Extensions1081612 +Node: Basic Concepts1083099 +Node: Basic High Level1083780 +Ref: figure-general-flow1084051 +Ref: figure-process-flow1084650 +Ref: Basic High Level-Footnote-11087879 +Node: Basic Data Typing1088064 +Node: Glossary1091419 +Node: Copying1116730 +Node: GNU Free Documentation License1154287 +Node: Index1179424 End Tag Table diff --git a/doc/gawk.texi b/doc/gawk.texi index 59695171..573768ea 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -321,419 +321,531 @@ particular records in a file and perform operations upon them. * Index:: Concept and Variable Index. @detailmenu -* History:: The history of @command{gawk} and - @command{awk}. -* Names:: What name to use to find @command{awk}. -* This Manual:: Using this @value{DOCUMENT}. Includes - sample input files that you can use. -* Conventions:: Typographical Conventions. -* Manual History:: Brief history of the GNU project and this - @value{DOCUMENT}. -* How To Contribute:: Helping to save the world. -* Acknowledgments:: Acknowledgments. -* Running gawk:: How to run @command{gawk} programs; - includes command-line syntax. -* One-shot:: Running a short throwaway @command{awk} - program. -* Read Terminal:: Using no input files (input from terminal - instead). -* Long:: Putting permanent @command{awk} programs in - files. -* Executable Scripts:: Making self-contained @command{awk} - programs. -* Comments:: Adding documentation to @command{gawk} - programs. -* Quoting:: More discussion of shell quoting issues. -* DOS Quoting:: Quoting in Windows Batch Files. -* Sample Data Files:: Sample data files for use in the - @command{awk} programs illustrated in this - @value{DOCUMENT}. -* Very Simple:: A very simple example. 
-* Two Rules:: A less simple one-line example using two - rules. -* More Complex:: A more complex example. -* Statements/Lines:: Subdividing or combining statements into - lines. -* Other Features:: Other Features of @command{awk}. -* When:: When to use @command{gawk} and when to use - other things. -* Command Line:: How to run @command{awk}. -* Options:: Command-line options and their meanings. -* Other Arguments:: Input file names and variable assignments. -* Naming Standard Input:: How to specify standard input with other - files. -* Environment Variables:: The environment variables @command{gawk} - uses. -* AWKPATH Variable:: Searching directories for @command{awk} - programs. -* AWKLIBPATH Variable:: Searching directories for @command{awk} - shared libraries. -* Other Environment Variables:: The environment variables. -* Exit Status:: @command{gawk}'s exit status. -* Include Files:: Including other files into your program. -* Loading Shared Libraries:: Loading shared libraries into your program. -* Obsolete:: Obsolete Options and/or features. -* Undocumented:: Undocumented Options and Features. -* Regexp Usage:: How to Use Regular Expressions. -* Escape Sequences:: How to write nonprinting characters. -* Regexp Operators:: Regular Expression Operators. -* Bracket Expressions:: What can go between @samp{[...]}. -* GNU Regexp Operators:: Operators specific to GNU software. -* Case-sensitivity:: How to do case-insensitive matching. -* Leftmost Longest:: How much text matches. -* Computed Regexps:: Using Dynamic Regexps. -* Records:: Controlling how data is split into records. -* Fields:: An introduction to fields. -* Nonconstant Fields:: Nonconstant Field Numbers. -* Changing Fields:: Changing the Contents of a Field. -* Field Separators:: The field separator and how to change it. -* Default Field Splitting:: How fields are normally separated. -* Regexp Field Splitting:: Using regexps as the field separator. 
-* Single Character Fields:: Making each character a separate field. -* Command Line Field Separator:: Setting @code{FS} from the command-line. -* Field Splitting Summary:: Some final points and a summary table. -* Constant Size:: Reading constant width data. -* Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. -* Getline:: Reading files under explicit program - control using the @code{getline} function. -* Plain Getline:: Using @code{getline} with no arguments. -* Getline/Variable:: Using @code{getline} into a variable. -* Getline/File:: Using @code{getline} from a file. -* Getline/Variable/File:: Using @code{getline} into a variable from a - file. -* Getline/Pipe:: Using @code{getline} from a pipe. -* Getline/Variable/Pipe:: Using @code{getline} into a variable from a - pipe. -* Getline/Coprocess:: Using @code{getline} from a coprocess. -* Getline/Variable/Coprocess:: Using @code{getline} into a variable from a - coprocess. -* Getline Notes:: Important things to know about - @code{getline}. -* Getline Summary:: Summary of @code{getline} Variants. -* Read Timeout:: Reading input with a timeout. -* Command line directories:: What happens if you put a directory on the - command line. -* Print:: The @code{print} statement. -* Print Examples:: Simple examples of @code{print} statements. -* Output Separators:: The output separators and how to change - them. -* OFMT:: Controlling Numeric Output With - @code{print}. -* Printf:: The @code{printf} statement. -* Basic Printf:: Syntax of the @code{printf} statement. -* Control Letters:: Format-control letters. -* Format Modifiers:: Format-specification modifiers. -* Printf Examples:: Several examples. -* Redirection:: How to redirect output to multiple files - and pipes. -* Special Files:: File name interpretation in @command{gawk}. - @command{gawk} allows access to inherited - file descriptors. -* Special FD:: Special files for I/O. 
-* Special Network:: Special files for network communications. -* Special Caveats:: Things to watch out for. -* Close Files And Pipes:: Closing Input and Output Files and Pipes. -* Values:: Constants, Variables, and Regular - Expressions. -* Constants:: String, numeric and regexp constants. -* Scalar Constants:: Numeric and string constants. -* Nondecimal-numbers:: What are octal and hex numbers. -* Regexp Constants:: Regular Expression constants. -* Using Constant Regexps:: When and how to use a regexp constant. -* Variables:: Variables give names to values for later - use. -* Using Variables:: Using variables in your programs. -* Assignment Options:: Setting variables on the command-line and a - summary of command-line syntax. This is an - advanced method of input. -* Conversion:: The conversion of strings to numbers and - vice versa. -* All Operators:: @command{gawk}'s operators. -* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, - etc.) -* Concatenation:: Concatenating strings. -* Assignment Ops:: Changing the value of a variable or a - field. -* Increment Ops:: Incrementing the numeric value of a - variable. -* Truth Values and Conditions:: Testing for true and false. -* Truth Values:: What is ``true'' and what is ``false''. -* Typing and Comparison:: How variables acquire types and how this - affects comparison of numbers and strings - with @samp{<}, etc. -* Variable Typing:: String type versus numeric type. -* Comparison Operators:: The comparison operators. -* POSIX String Comparison:: String comparison with POSIX rules. -* Boolean Ops:: Combining comparison expressions using - boolean operators @samp{||} (``or''), - @samp{&&} (``and'') and @samp{!} (``not''). -* Conditional Exp:: Conditional expressions select between two - subexpressions under control of a third - subexpression. -* Function Calls:: A function call is an expression. -* Precedence:: How various operators nest. -* Locales:: How the locale affects things. 
-* Pattern Overview:: What goes into a pattern. -* Regexp Patterns:: Using regexps as patterns. -* Expression Patterns:: Any expression can be used as a pattern. -* Ranges:: Pairs of patterns specify record ranges. -* BEGIN/END:: Specifying initialization and cleanup - rules. -* Using BEGIN/END:: How and why to use BEGIN/END rules. -* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. -* BEGINFILE/ENDFILE:: Two special patterns for advanced control. -* Empty:: The empty pattern, which matches every - record. -* Using Shell Variables:: How to use shell variables with - @command{awk}. -* Action Overview:: What goes into an action. -* Statements:: Describes the various control statements in - detail. -* If Statement:: Conditionally execute some @command{awk} - statements. -* While Statement:: Loop until some condition is satisfied. -* Do Statement:: Do specified action while looping until - some condition is satisfied. -* For Statement:: Another looping statement, that provides - initialization and increment clauses. -* Switch Statement:: Switch/case evaluation for conditional - execution of statements based on a value. -* Break Statement:: Immediately exit the innermost enclosing - loop. -* Continue Statement:: Skip to the end of the innermost enclosing - loop. -* Next Statement:: Stop processing the current input record. -* Nextfile Statement:: Stop processing the current file. -* Exit Statement:: Stop execution of @command{awk}. -* Built-in Variables:: Summarizes the built-in variables. -* User-modified:: Built-in variables that you change to - control @command{awk}. -* Auto-set:: Built-in variables where @command{awk} - gives you information. -* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}. -* Array Basics:: The basics of arrays. -* Array Intro:: Introduction to Arrays -* Reference to Elements:: How to examine one element of an array. -* Assigning Elements:: How to change an element of an array. 
-* Array Example:: Basic Example of an Array -* Scanning an Array:: A variation of the @code{for} statement. It - loops through the indices of an array's - existing elements. -* Controlling Scanning:: Controlling the order in which arrays are - scanned. -* Delete:: The @code{delete} statement removes an - element from an array. -* Numeric Array Subscripts:: How to use numbers as subscripts in - @command{awk}. -* Uninitialized Subscripts:: Using Uninitialized variables as - subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in - @command{awk}. -* Multi-scanning:: Scanning multidimensional arrays. -* Arrays of Arrays:: True multidimensional arrays. -* Built-in:: Summarizes the built-in functions. -* Calling Built-in:: How to call built-in functions. -* Numeric Functions:: Functions that work with numbers, including - @code{int()}, @code{sin()} and - @code{rand()}. -* String Functions:: Functions for string manipulation, such as - @code{split()}, @code{match()} and - @code{sprintf()}. -* Gory Details:: More than you want to know about @samp{\} - and @samp{&} with @code{sub()}, - @code{gsub()}, and @code{gensub()}. -* I/O Functions:: Functions for files and shell commands. -* Time Functions:: Functions for dealing with timestamps. -* Bitwise Functions:: Functions for bitwise operations. -* Type Functions:: Functions for type information. -* I18N Functions:: Functions for string translation. -* User-defined:: Describes User-defined functions in detail. -* Definition Syntax:: How to write definitions and what they - mean. -* Function Example:: An example function definition and what it - does. -* Function Caveats:: Things to watch out for. -* Calling A Function:: Don't use spaces. -* Variable Scope:: Controlling variable scope. -* Pass By Value/Reference:: Passing parameters. -* Return Statement:: Specifying the value a function returns. -* Dynamic Typing:: How variable types can change at runtime. 
-* Indirect Calls:: Choosing the function to call at runtime. -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU @code{gettext} works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging @code{printf} arguments. -* I18N Portability:: @command{awk}-level portability issues. -* I18N Example:: A simple i18n example. -* Gawk I18N:: @command{gawk} is also internationalized. -* Nondecimal Data:: Allowing nondecimal input data. -* Array Sorting:: Facilities for controlling array traversal - and sorting arrays. -* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. -* Array Sorting Functions:: How to use @code{asort()} and - @code{asorti()}. -* Two-way I/O:: Two-way communications with another - process. -* TCP/IP Networking:: Using @command{gawk} for network - programming. -* Profiling:: Profiling your @command{awk} programs. -* Library Names:: How to best name private global variables - in library functions. -* General Functions:: Functions that are of general use. -* Strtonum Function:: A replacement for the built-in - @code{strtonum()} function. -* Assert Function:: A function for assertions in @command{awk} - programs. -* Round Function:: A function for rounding if @code{sprintf()} - does not do it correctly. -* Cliff Random Function:: The Cliff Random Number Generator. -* Ordinal Functions:: Functions for using characters as numbers - and vice versa. -* Join Function:: A function to join an array into a string. -* Getlocaltime Function:: A function to get formatted times. -* Data File Management:: Functions for managing command-line data - files. -* Filetrans Function:: A function for handling data file - transitions. -* Rewind Function:: A function for rereading the current file. -* File Checking:: Checking that data files are readable. -* Empty Files:: Checking for zero-length files. 
-* Ignoring Assigns:: Treating assignments as file names. -* Getopt Function:: A function for processing command-line - arguments. -* Passwd Functions:: Functions for getting user information. -* Group Functions:: Functions for getting group information. -* Walking Arrays:: A function to walk arrays of arrays. -* Running Examples:: How to run these examples. -* Clones:: Clones of common utilities. -* Cut Program:: The @command{cut} utility. -* Egrep Program:: The @command{egrep} utility. -* Id Program:: The @command{id} utility. -* Split Program:: The @command{split} utility. -* Tee Program:: The @command{tee} utility. -* Uniq Program:: The @command{uniq} utility. -* Wc Program:: The @command{wc} utility. -* Miscellaneous Programs:: Some interesting @command{awk} programs. -* Dupword Program:: Finding duplicated words in a document. -* Alarm Program:: An alarm clock. -* Translate Program:: A program similar to the @command{tr} - utility. -* Labels Program:: Printing mailing labels. -* Word Sorting:: A program to produce a word usage count. -* History Sorting:: Eliminating duplicate entries from a - history file. -* Extract Program:: Pulling out programs from Texinfo source - files. -* Simple Sed:: A Simple Stream Editor. -* Igawk Program:: A wrapper for @command{awk} that includes - files. -* Anagram Program:: Finding anagrams from a dictionary. -* Signature Program:: People do amazing things with too much time - on their hands. -* Debugging:: Introduction to @command{gawk} debugger. -* Debugging Concepts:: Debugging in General. -* Debugging Terms:: Additional Debugging Concepts. -* Awk Debugging:: Awk Debugging. -* Sample Debugging Session:: Sample debugging session. -* Debugger Invocation:: How to Start the Debugger. -* Finding The Bug:: Finding the Bug. -* List of Debugger Commands:: Main debugger commands. -* Breakpoint Control:: Control of Breakpoints. -* Debugger Execution Control:: Control of Execution. 
-* Viewing And Changing Data:: Viewing and Changing Data. -* Execution Stack:: Dealing with the Stack. -* Debugger Info:: Obtaining Information about the Program and - the Debugger State. -* Miscellaneous Debugger Commands:: Miscellaneous Commands. -* Readline Support:: Readline support. -* Limitations:: Limitations and future plans. -* General Arithmetic:: An introduction to computer arithmetic. -* Floating Point Issues:: Stuff to know about floating-point numbers. -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. -* Integer Programming:: Effective integer programming. -* Floating-point Programming:: Effective Floating-point Programming. -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -* Gawk and MPFR:: How @command{gawk} provides - arbitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point - Arithmetic with @command{gawk}. -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point - numbers. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - @command{gawk}. -* Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -* V7/SVR3.1:: The major changes between V7 and System V - Release 3.1. -* SVR4:: Minor changes between System V Releases 3.1 - and 4. 
-* POSIX:: New features from the POSIX standard. -* BTL:: New features from Brian Kernighan's version - of @command{awk}. -* POSIX/GNU:: The extensions in @command{gawk} not in - POSIX @command{awk}. -* Common Extensions:: Common Extensions Summary. -* Ranges and Locales:: How locales used to affect regexp ranges. -* Contributors:: The major contributors to @command{gawk}. -* Gawk Distribution:: What is in the @command{gawk} distribution. -* Getting:: How to get the distribution. -* Extracting:: How to extract the distribution. -* Distribution contents:: What is in the distribution. -* Unix Installation:: Installing @command{gawk} under various - versions of Unix. -* Quick Installation:: Compiling @command{gawk} under Unix. -* Additional Configuration Options:: Other compile-time options. -* Configuration Philosophy:: How it's all supposed to work. -* Non-Unix Installation:: Installation on Other Operating Systems. -* PC Installation:: Installing and Compiling @command{gawk} on - MS-DOS and OS/2. -* PC Binary Installation:: Installing a prepared distribution. -* PC Compiling:: Compiling @command{gawk} for MS-DOS, - Windows32, and OS/2. -* PC Testing:: Testing @command{gawk} on PC systems. -* PC Using:: Running @command{gawk} on MS-DOS, Windows32 - and OS/2. -* Cygwin:: Building and running @command{gawk} for - Cygwin. -* MSYS:: Using @command{gawk} In The MSYS - Environment. -* VMS Installation:: Installing @command{gawk} on VMS. -* VMS Compilation:: How to compile @command{gawk} under VMS. -* VMS Installation Details:: How to install @command{gawk} under VMS. -* VMS Running:: How to run @command{gawk} under VMS. -* VMS Old Gawk:: An old version comes with some VMS systems. -* Bugs:: Reporting Problems and Bugs. -* Other Versions:: Other freely available @command{awk} - implementations. -* Compatibility Mode:: How to disable certain @command{gawk} - extensions. -* Additions:: Making Additions To @command{gawk}. 
-* Accessing The Source:: Accessing the Git repository. -* Adding Code:: Adding code to the main body of - @command{gawk}. -* New Ports:: Porting @command{gawk} to a new operating - system. -* Derived Files:: Why derived files are kept in the - @command{git} repository. -* Future Extensions:: New features that may be implemented one - day. -* Basic High Level:: The high level view. -* Basic Data Typing:: A very quick intro to data types. +* History:: The history of @command{gawk} and + @command{awk}. +* Names:: What name to use to find + @command{awk}. +* This Manual:: Using this @value{DOCUMENT}. Includes + sample input files that you can use. +* Conventions:: Typographical Conventions. +* Manual History:: Brief history of the GNU project and + this @value{DOCUMENT}. +* How To Contribute:: Helping to save the world. +* Acknowledgments:: Acknowledgments. +* Running gawk:: How to run @command{gawk} programs; + includes command-line syntax. +* One-shot:: Running a short throwaway + @command{awk} program. +* Read Terminal:: Using no input files (input from + terminal instead). +* Long:: Putting permanent @command{awk} + programs in files. +* Executable Scripts:: Making self-contained @command{awk} + programs. +* Comments:: Adding documentation to @command{gawk} + programs. +* Quoting:: More discussion of shell quoting + issues. +* DOS Quoting:: Quoting in Windows Batch Files. +* Sample Data Files:: Sample data files for use in the + @command{awk} programs illustrated in + this @value{DOCUMENT}. +* Very Simple:: A very simple example. +* Two Rules:: A less simple one-line example using + two rules. +* More Complex:: A more complex example. +* Statements/Lines:: Subdividing or combining statements + into lines. +* Other Features:: Other Features of @command{awk}. +* When:: When to use @command{gawk} and when to + use other things. +* Command Line:: How to run @command{awk}. +* Options:: Command-line options and their + meanings. 
+* Other Arguments:: Input file names and variable + assignments. +* Naming Standard Input:: How to specify standard input with + other files. +* Environment Variables:: The environment variables + @command{gawk} uses. +* AWKPATH Variable:: Searching directories for + @command{awk} programs. +* AWKLIBPATH Variable:: Searching directories for + @command{awk} shared libraries. +* Other Environment Variables:: The environment variables. +* Exit Status:: @command{gawk}'s exit status. +* Include Files:: Including other files into your + program. +* Loading Shared Libraries:: Loading shared libraries into your + program. +* Obsolete:: Obsolete Options and/or features. +* Undocumented:: Undocumented Options and Features. +* Regexp Usage:: How to Use Regular Expressions. +* Escape Sequences:: How to write nonprinting characters. +* Regexp Operators:: Regular Expression Operators. +* Bracket Expressions:: What can go between @samp{[...]}. +* GNU Regexp Operators:: Operators specific to GNU software. +* Case-sensitivity:: How to do case-insensitive matching. +* Leftmost Longest:: How much text matches. +* Computed Regexps:: Using Dynamic Regexps. +* Records:: Controlling how data is split into + records. +* Fields:: An introduction to fields. +* Nonconstant Fields:: Nonconstant Field Numbers. +* Changing Fields:: Changing the Contents of a Field. +* Field Separators:: The field separator and how to change + it. +* Default Field Splitting:: How fields are normally separated. +* Regexp Field Splitting:: Using regexps as the field separator. +* Single Character Fields:: Making each character a separate + field. +* Command Line Field Separator:: Setting @code{FS} from the + command-line. +* Field Splitting Summary:: Some final points and a summary table. +* Constant Size:: Reading constant width data. +* Splitting By Content:: Defining Fields By Content +* Multiple Line:: Reading multi-line records. 
+* Getline:: Reading files under explicit program + control using the @code{getline} + function. +* Plain Getline:: Using @code{getline} with no + arguments. +* Getline/Variable:: Using @code{getline} into a variable. +* Getline/File:: Using @code{getline} from a file. +* Getline/Variable/File:: Using @code{getline} into a variable + from a file. +* Getline/Pipe:: Using @code{getline} from a pipe. +* Getline/Variable/Pipe:: Using @code{getline} into a variable + from a pipe. +* Getline/Coprocess:: Using @code{getline} from a coprocess. +* Getline/Variable/Coprocess:: Using @code{getline} into a variable + from a coprocess. +* Getline Notes:: Important things to know about + @code{getline}. +* Getline Summary:: Summary of @code{getline} Variants. +* Read Timeout:: Reading input with a timeout. +* Command line directories:: What happens if you put a directory on + the command line. +* Print:: The @code{print} statement. +* Print Examples:: Simple examples of @code{print} + statements. +* Output Separators:: The output separators and how to + change them. +* OFMT:: Controlling Numeric Output With + @code{print}. +* Printf:: The @code{printf} statement. +* Basic Printf:: Syntax of the @code{printf} statement. +* Control Letters:: Format-control letters. +* Format Modifiers:: Format-specification modifiers. +* Printf Examples:: Several examples. +* Redirection:: How to redirect output to multiple + files and pipes. +* Special Files:: File name interpretation in + @command{gawk}. @command{gawk} allows + access to inherited file descriptors. +* Special FD:: Special files for I/O. +* Special Network:: Special files for network + communications. +* Special Caveats:: Things to watch out for. +* Close Files And Pipes:: Closing Input and Output Files and + Pipes. +* Values:: Constants, Variables, and Regular + Expressions. +* Constants:: String, numeric and regexp constants. +* Scalar Constants:: Numeric and string constants. 
+* Nondecimal-numbers:: What are octal and hex numbers. +* Regexp Constants:: Regular Expression constants. +* Using Constant Regexps:: When and how to use a regexp constant. +* Variables:: Variables give names to values for + later use. +* Using Variables:: Using variables in your programs. +* Assignment Options:: Setting variables on the command-line + and a summary of command-line syntax. + This is an advanced method of input. +* Conversion:: The conversion of strings to numbers + and vice versa. +* All Operators:: @command{gawk}'s operators. +* Arithmetic Ops:: Arithmetic operations (@samp{+}, + @samp{-}, etc.) +* Concatenation:: Concatenating strings. +* Assignment Ops:: Changing the value of a variable or a + field. +* Increment Ops:: Incrementing the numeric value of a + variable. +* Truth Values and Conditions:: Testing for true and false. +* Truth Values:: What is ``true'' and what is + ``false''. +* Typing and Comparison:: How variables acquire types and how + this affects comparison of numbers and + strings with @samp{<}, etc. +* Variable Typing:: String type versus numeric type. +* Comparison Operators:: The comparison operators. +* POSIX String Comparison:: String comparison with POSIX rules. +* Boolean Ops:: Combining comparison expressions using + boolean operators @samp{||} (``or''), + @samp{&&} (``and'') and @samp{!} + (``not''). +* Conditional Exp:: Conditional expressions select between + two subexpressions under control of a + third subexpression. +* Function Calls:: A function call is an expression. +* Precedence:: How various operators nest. +* Locales:: How the locale affects things. +* Pattern Overview:: What goes into a pattern. +* Regexp Patterns:: Using regexps as patterns. +* Expression Patterns:: Any expression can be used as a + pattern. +* Ranges:: Pairs of patterns specify record + ranges. +* BEGIN/END:: Specifying initialization and cleanup + rules. +* Using BEGIN/END:: How and why to use BEGIN/END rules. 
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. +* BEGINFILE/ENDFILE:: Two special patterns for advanced + control. +* Empty:: The empty pattern, which matches every + record. +* Using Shell Variables:: How to use shell variables with + @command{awk}. +* Action Overview:: What goes into an action. +* Statements:: Describes the various control + statements in detail. +* If Statement:: Conditionally execute some + @command{awk} statements. +* While Statement:: Loop until some condition is + satisfied. +* Do Statement:: Do specified action while looping + until some condition is satisfied. +* For Statement:: Another looping statement, that + provides initialization and increment + clauses. +* Switch Statement:: Switch/case evaluation for conditional + execution of statements based on a + value. +* Break Statement:: Immediately exit the innermost + enclosing loop. +* Continue Statement:: Skip to the end of the innermost + enclosing loop. +* Next Statement:: Stop processing the current input + record. +* Nextfile Statement:: Stop processing the current file. +* Exit Statement:: Stop execution of @command{awk}. +* Built-in Variables:: Summarizes the built-in variables. +* User-modified:: Built-in variables that you change to + control @command{awk}. +* Auto-set:: Built-in variables where @command{awk} + gives you information. +* ARGC and ARGV:: Ways to use @code{ARGC} and + @code{ARGV}. +* Array Basics:: The basics of arrays. +* Array Intro:: Introduction to Arrays +* Reference to Elements:: How to examine one element of an + array. +* Assigning Elements:: How to change an element of an array. +* Array Example:: Basic Example of an Array +* Scanning an Array:: A variation of the @code{for} + statement. It loops through the + indices of an array's existing + elements. +* Controlling Scanning:: Controlling the order in which arrays + are scanned. +* Delete:: The @code{delete} statement removes an + element from an array. 
+* Numeric Array Subscripts:: How to use numbers as subscripts in + @command{awk}. +* Uninitialized Subscripts:: Using Uninitialized variables as + subscripts. +* Multi-dimensional:: Emulating multidimensional arrays in + @command{awk}. +* Multi-scanning:: Scanning multidimensional arrays. +* Arrays of Arrays:: True multidimensional arrays. +* Built-in:: Summarizes the built-in functions. +* Calling Built-in:: How to call built-in functions. +* Numeric Functions:: Functions that work with numbers, + including @code{int()}, @code{sin()} + and @code{rand()}. +* String Functions:: Functions for string manipulation, + such as @code{split()}, @code{match()} + and @code{sprintf()}. +* Gory Details:: More than you want to know about + @samp{\} and @samp{&} with + @code{sub()}, @code{gsub()}, and + @code{gensub()}. +* I/O Functions:: Functions for files and shell + commands. +* Time Functions:: Functions for dealing with timestamps. +* Bitwise Functions:: Functions for bitwise operations. +* Type Functions:: Functions for type information. +* I18N Functions:: Functions for string translation. +* User-defined:: Describes User-defined functions in + detail. +* Definition Syntax:: How to write definitions and what they + mean. +* Function Example:: An example function definition and + what it does. +* Function Caveats:: Things to watch out for. +* Calling A Function:: Don't use spaces. +* Variable Scope:: Controlling variable scope. +* Pass By Value/Reference:: Passing parameters. +* Return Statement:: Specifying the value a function + returns. +* Dynamic Typing:: How variable types can change at + runtime. +* Indirect Calls:: Choosing the function to call at + runtime. +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU @code{gettext} works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. +* String Extraction:: Extracting marked strings. 
+* Printf Ordering:: Rearranging @code{printf} arguments. +* I18N Portability:: @command{awk}-level portability + issues. +* I18N Example:: A simple i18n example. +* Gawk I18N:: @command{gawk} is also + internationalized. +* Nondecimal Data:: Allowing nondecimal input data. +* Array Sorting:: Facilities for controlling array + traversal and sorting arrays. +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use @code{asort()} and + @code{asorti()}. +* Two-way I/O:: Two-way communications with another + process. +* TCP/IP Networking:: Using @command{gawk} for network + programming. +* Profiling:: Profiling your @command{awk} programs. +* Library Names:: How to best name private global + variables in library functions. +* General Functions:: Functions that are of general use. +* Strtonum Function:: A replacement for the built-in + @code{strtonum()} function. +* Assert Function:: A function for assertions in + @command{awk} programs. +* Round Function:: A function for rounding if + @code{sprintf()} does not do it + correctly. +* Cliff Random Function:: The Cliff Random Number Generator. +* Ordinal Functions:: Functions for using characters as + numbers and vice versa. +* Join Function:: A function to join an array into a + string. +* Getlocaltime Function:: A function to get formatted times. +* Data File Management:: Functions for managing command-line + data files. +* Filetrans Function:: A function for handling data file + transitions. +* Rewind Function:: A function for rereading the current + file. +* File Checking:: Checking that data files are readable. +* Empty Files:: Checking for zero-length files. +* Ignoring Assigns:: Treating assignments as file names. +* Getopt Function:: A function for processing command-line + arguments. +* Passwd Functions:: Functions for getting user + information. +* Group Functions:: Functions for getting group + information. +* Walking Arrays:: A function to walk arrays of arrays. 
+* Running Examples:: How to run these examples. +* Clones:: Clones of common utilities. +* Cut Program:: The @command{cut} utility. +* Egrep Program:: The @command{egrep} utility. +* Id Program:: The @command{id} utility. +* Split Program:: The @command{split} utility. +* Tee Program:: The @command{tee} utility. +* Uniq Program:: The @command{uniq} utility. +* Wc Program:: The @command{wc} utility. +* Miscellaneous Programs:: Some interesting @command{awk} + programs. +* Dupword Program:: Finding duplicated words in a + document. +* Alarm Program:: An alarm clock. +* Translate Program:: A program similar to the @command{tr} + utility. +* Labels Program:: Printing mailing labels. +* Word Sorting:: A program to produce a word usage + count. +* History Sorting:: Eliminating duplicate entries from a + history file. +* Extract Program:: Pulling out programs from Texinfo + source files. +* Simple Sed:: A Simple Stream Editor. +* Igawk Program:: A wrapper for @command{awk} that + includes files. +* Anagram Program:: Finding anagrams from a dictionary. +* Signature Program:: People do amazing things with too much + time on their hands. +* Debugging:: Introduction to @command{gawk} + debugger. +* Debugging Concepts:: Debugging in General. +* Debugging Terms:: Additional Debugging Concepts. +* Awk Debugging:: Awk Debugging. +* Sample Debugging Session:: Sample debugging session. +* Debugger Invocation:: How to Start the Debugger. +* Finding The Bug:: Finding the Bug. +* List of Debugger Commands:: Main debugger commands. +* Breakpoint Control:: Control of Breakpoints. +* Debugger Execution Control:: Control of Execution. +* Viewing And Changing Data:: Viewing and Changing Data. +* Execution Stack:: Dealing with the Stack. +* Debugger Info:: Obtaining Information about the + Program and the Debugger State. +* Miscellaneous Debugger Commands:: Miscellaneous Commands. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. 
+* General Arithmetic:: An introduction to computer + arithmetic. +* Floating Point Issues:: Stuff to know about floating-point + numbers. +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not + Abstract Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. +* Integer Programming:: Effective integer programming. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Gawk and MPFR:: How @command{gawk} provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point + numbers. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic + with @command{gawk}. +* Extension Intro:: What is an extension. +* Plugin License:: A note about licensing. +* Extension Design:: Design notes about the extension API. +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. +* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. +* Extension API Description:: A full description of the API. +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. 
+* Registration Functions:: Functions to register things with + @command{gawk}. +* Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. +* Printing Messages:: Functions for printing messages. +* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. +* Array Manipulation:: Functions for working with arrays. +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. +* Extension API Variables:: Variables provided by the API. +* Extension Versioning:: API Version information. +* Extension API Informational Variables:: Variables providing information about + @command{gawk}'s invocation. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How @command{gawk} finds compiled + extensions. +* Extension Example:: Example C code for an extension. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. +* Extension Samples:: The sample extensions that ship with + @code{gawk}. +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to @code{fnmatch()}.
+* Extension Sample Fork:: An interface to @code{fork()} and + other process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to @code{readdir()}. +* Extension Sample Revout:: Reversing output sample output + wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way + processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to @code{gettimeofday()} + and @code{sleep()}. +* gawkextlib:: The @code{gawkextlib} project. +* V7/SVR3.1:: The major changes between V7 and + System V Release 3.1. +* SVR4:: Minor changes between System V + Releases 3.1 and 4. +* POSIX:: New features from the POSIX standard. +* BTL:: New features from Brian Kernighan's + version of @command{awk}. +* POSIX/GNU:: The extensions in @command{gawk} not + in POSIX @command{awk}. +* Common Extensions:: Common Extensions Summary. +* Ranges and Locales:: How locales used to affect regexp + ranges. +* Contributors:: The major contributors to + @command{gawk}. +* Gawk Distribution:: What is in the @command{gawk} + distribution. +* Getting:: How to get the distribution. +* Extracting:: How to extract the distribution. +* Distribution contents:: What is in the distribution. +* Unix Installation:: Installing @command{gawk} under + various versions of Unix. +* Quick Installation:: Compiling @command{gawk} under Unix. +* Additional Configuration Options:: Other compile-time options. +* Configuration Philosophy:: How it's all supposed to work. +* Non-Unix Installation:: Installation on Other Operating + Systems. +* PC Installation:: Installing and Compiling + @command{gawk} on MS-DOS and OS/2. +* PC Binary Installation:: Installing a prepared distribution. +* PC Compiling:: Compiling @command{gawk} for MS-DOS, + Windows32, and OS/2. 
+* PC Testing:: Testing @command{gawk} on PC systems. +* PC Using:: Running @command{gawk} on MS-DOS, + Windows32 and OS/2. +* Cygwin:: Building and running @command{gawk} + for Cygwin. +* MSYS:: Using @command{gawk} In The MSYS + Environment. +* VMS Installation:: Installing @command{gawk} on VMS. +* VMS Compilation:: How to compile @command{gawk} under + VMS. +* VMS Installation Details:: How to install @command{gawk} under + VMS. +* VMS Running:: How to run @command{gawk} under VMS. +* VMS Old Gawk:: An old version comes with some VMS + systems. +* Bugs:: Reporting Problems and Bugs. +* Other Versions:: Other freely available @command{awk} + implementations. +* Compatibility Mode:: How to disable certain @command{gawk} + extensions. +* Additions:: Making Additions To @command{gawk}. +* Accessing The Source:: Accessing the Git repository. +* Adding Code:: Adding code to the main body of + @command{gawk}. +* New Ports:: Porting @command{gawk} to a new + operating system. +* Derived Files:: Why derived files are kept in the + @command{git} repository. +* Future Extensions:: New features that may be implemented + one day. +* Basic High Level:: The high level view. +* Basic Data Typing:: A very quick intro to data types. @end detailmenu @end menu @@ -28049,42 +28161,62 @@ gawk -M 'BEGIN @{ n = 13; print n % 2 @}' @node Dynamic Extensions @chapter Writing Extensions for @command{gawk} -This chapter is a placeholder, pending a rewrite for the new API. -Some of the old bits remain, since they can be partially reused. - - -@c STARTOFRANGE gladfgaw -@cindex @command{gawk}, functions, adding -@c STARTOFRANGE adfugaw -@cindex adding, functions to @command{gawk} -@c STARTOFRANGE fubadgaw -@cindex functions, built-in, adding to @command{gawk} -It is possible to add new built-in -functions to @command{gawk} using dynamically loaded libraries. This -facility is available on systems (such as GNU/Linux) that support -the C @code{dlopen()} and @code{dlsym()} functions. 
-This @value{CHAPTER} describes how to write and use dynamically -loaded extensions for @command{gawk}. -Experience with programming in -C or C++ is necessary when reading this @value{SECTION}. +It is possible to add new built-in functions to @command{gawk} using +dynamically loaded libraries. This facility is available on systems (such +as GNU/Linux) that support the C @code{dlopen()} and @code{dlsym()} +functions. This @value{CHAPTER} describes how to create extensions +using code written in C or C++. If you don't know anything about C +programming, you can safely skip this @value{CHAPTER}, although you +may wish to review the documentation on the extensions that come with +@command{gawk} (@pxref{Extension Samples}), and the section on the +@code{gawkextlib} project (@pxref{gawkextlib}). @quotation NOTE When @option{--sandbox} is specified, extensions are disabled -(@pxref{Options}. +(@pxref{Options}). @end quotation @menu +* Extension Intro:: What is an extension. * Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. +* Extension Design:: Design notes about the extension API. +* Extension API Description:: A full description of the API. +* Extension Example:: Example C code for an extension. +* Extension Samples:: The sample extensions that ship with + @code{gawk}. +* gawkextlib:: The @code{gawkextlib} project. @end menu +@node Extension Intro +@section Introduction + +An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of +external compiled code that @command{gawk} can load at runtime to +provide additional functionality, over and above the built-in capabilities +described in the rest of this @value{DOCUMENT}. + +Extensions are useful because they allow you (of course) to extend +@command{gawk}'s functionality. For example, they can provide access to +system calls (such as @code{chdir()} to change directory) and to other +C library routines that could be of use. 
As with most software, +``the sky is the limit;'' if you can imagine something that you might +want to do and can write in C or C++, you can write an extension to do it! + +Extensions are written in C or C++, using the @dfn{Application Programming +Interface} (API) defined for this purpose by the @command{gawk} +developers. The rest of this @value{CHAPTER} explains the design +decisions behind the API, the facilities it provides and how to use +them, and presents a small sample extension. In addition, it documents +the sample extensions included in the @command{gawk} distribution, +and describes the @code{gawkextlib} project. + @node Plugin License @section Extension Licensing Every dynamic extension should define the global symbol @code{plugin_is_GPL_compatible} to assert that it has been licensed under a GPL-compatible license. If this symbol does not exist, @command{gawk} -will emit a fatal error and exit. +emits a fatal error and exits when it tries to load your extension. The declared type of the symbol should be @code{int}. It does not need to be in any allocated section, though. The code merely asserts that @@ -28094,23 +28226,2383 @@ the symbol exists in the global scope. Something like this is enough: int plugin_is_GPL_compatible; @end example -@node Sample Library -@section Example: Directory and File Operation Built-ins -@c STARTOFRANGE chdirg -@cindex @code{chdir()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE statg -@cindex @code{stat()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE filre -@cindex files, information about@comma{} retrieving -@c STARTOFRANGE dirch -@cindex directories, changing - -Two useful functions that are not in @command{awk} are @code{chdir()} -(so that an @command{awk} program can change its directory) and -@code{stat()} (so that an @command{awk} program can gather information about -a file). 
-This @value{SECTION} implements these functions for @command{gawk} in an -external extension library. +@node Extension Design +@section Extension API Design + +The first version of extensions for @command{gawk} was developed in +the mid-1990s and released with @command{gawk} 3.1 in the late 1990s. +The basic mechanisms and design remained unchanged for close to 15 years, +until 2012. + +The old extension mechanism used data types and functions from +@command{gawk} itself, with a ``clever hack'' to install extension +functions. + +@command{gawk} included some sample extensions, of which a few were +really useful. However, it was clear from the outset that the extension +mechanism was bolted onto the side and was not really thought out. + +@menu +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. +* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. +@end menu + +@node Old Extension Problems +@subsection Problems With The Old Mechanism + +The old extension mechanism had several problems: + +@itemize @bullet +@item +It depended heavily upon @command{gawk} internals. Any time the +@code{NODE} structure@footnote{A critical central data structure +inside @command{gawk}.} changed, an extension would have to be +recompiled. Furthermore, to really write extensions required understanding +something about @command{gawk}'s internal functions. There was some +documentation in this @value{DOCUMENT}, but it was quite minimal. + +@item +Being able to call into @command{gawk} from an extension required linker +facilities that are common on Unix-derived systems but that did +not work on Windows systems; users wanting extensions on Windows +had to statically link them into @command{gawk}, even though Windows supports +dynamic loading of shared objects. 
+ +@item +The API would change occasionally as @command{gawk} changed; no compatibility +between versions was ever offered or planned for. +@end itemize + +Despite the drawbacks, the @command{xgawk} project developers forked +@command{gawk} and developed several significant extensions. They also +enhanced @command{gawk}'s facilities relating to file inclusion and +shared object access. + +A new API was desired for a long time, but only in 2012 did the +@command{gawk} maintainer and the @command{xgawk} developers finally +start working on it together. More information about the @command{xgawk} +project is provided in @ref{gawkextlib}. + +@node Extension New Mechanism Goals +@subsection Goals For A New Mechanism + +Some goals for the new API were: + +@itemize @bullet +@item +The API should be independent of @command{gawk} internals. Changes in +@command{gawk} internals should not be visible to the writer of an +extension function. + +@item +The API should provide @emph{binary} compatibility across @command{gawk} +releases as long as the API itself does not change. + +@item +The API should enable extensions written in C to have roughly the +same ``appearance'' to @command{awk}-level code as @command{awk} +functions do. This means that extensions should have: + +@itemize @minus +@item +The ability to access function parameters. + +@item +The ability to turn an undefined parameter into an array (call by reference). + +@item +The ability to create, access and update global variables. + +@item +Easy access to all the elements of an array at once (``array flattening'') +in order to loop over all the elements in an easy fashion for C code. + +@item +The ability to create arrays (including @command{gawk}'s true +multi-dimensional arrays). +@end itemize +@end itemize + +Some additional important goals were: + +@itemize @bullet +@item +The API should use only features in ISO C 90, so that extensions +can be written using the widest range of C and C++ compilers.
The header +should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"} +magic so that a C++ compiler could be used. (If using C++, the runtime +system has to be smart enough to call any constructors and destructors, +as @command{gawk} is a C program. As of this writing, this has not been +tested.) + +@item +The API mechanism should not require access to @command{gawk}'s +symbols@footnote{The @dfn{symbols} are the variables and functions +defined inside @command{gawk}. Access to these symbols by code +external to @command{gawk} loaded dynamically at runtime is +problematic on Windows.} by the compile-time or dynamic linker, +in order to enable creation of extensions that also work on Windows. +@end itemize + +During development, it became clear that there were other features +that should be available to extensions, which were also subsequently +provided: + +@itemize @bullet +@item +Extensions should have the ability to hook into @command{gawk}'s +I/O redirection mechanism. In particular, the @command{xgawk} +developers provided a so-called ``open hook'' to take over reading +records. During development, this was generalized to allow +extensions to hook into input processing, output processing, and +two-way I/O. + +@item +An extension should be able to provide a ``call back'' function +to perform clean up actions when @command{gawk} exits. + +@item +An extension should be able to provide a version string so that +@command{gawk}'s @option{--version} option can provide information +about extensions as well. +@end itemize + +@node Extension Other Design Decisions +@subsection Other Design Decisions + +As an ``arbitrary'' design decision, extensions can read the values of +built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot +change them, with the exception of @code{PROCINFO}. + +The reason for this is to prevent an extension function from affecting +the flow of an @command{awk} program outside its control. 
While a real +@command{awk} function can do what it likes, that is at the discretion +of the programmer. An extension function should provide a service or +make a C API available for use within @command{awk}, and not mess with +@code{FS} or @code{ARGC} and @code{ARGV}. + +In addition, it becomes easy to start down a slippery slope. How +much access to @command{gawk} facilities do extensions need? +Do they need @code{getline}? What about calling @code{gsub()} or +compiling regular expressions? What about calling into @command{awk} +functions? (@emph{That} would be messy.) + +In order to avoid these issues, the @command{gawk} developers chose +to start with the simplest, most basic features that are still truly useful. + +Another decision is that although @command{gawk} provides nice things like +MPFR, and arrays indexed internally by integers, these features are not +being brought out to the API in order to keep things simple and close to +traditional @command{awk} semantics. (In fact, arrays indexed internally +by integers are so transparent that they aren't even documented!) + +With time, the API will undoubtedly evolve; the @command{gawk} developers +expect this to be driven by user needs. For now, the current API seems +to provide a minimal yet powerful set of features for creating extensions. + +@node Extension Mechanism Outline +@subsection At A High Level How It Works + +The requirement to avoid access to @command{gawk}'s symbols is, at first +glance, a difficult one to meet. + +One design, apparently used by Perl and Ruby and maybe others, would +be to make the mainline @command{gawk} code into a library, with the +@command{gawk} utility a small C @code{main()} function linked against +the library. + +This seemed like the tail wagging the dog, complicating build and +installation and making a simple copy of the @command{gawk} executable +from one system to another (or one place to another on the same +system!) into a chancy operation. 
+ +Pat Rankin suggested the solution that was adopted. Communication between +@command{gawk} and an extension is two-way. First, when an extension +is loaded, it is passed a pointer to a @code{struct} whose fields are +function pointers. +@iftex +This is shown in @ref{load-extension}. +@end iftex + +@float Figure,load-extension +@caption{Loading the extension} +@ifinfo +@center @image{api-figure1, , , Loading the extension, txt} +@end ifinfo +@ifhtml +@center @image{api-figure1, , , Loading the extension, png} +@end ifhtml +@ifnotinfo +@ifnothtml +@center @image{api-figure1, , , Loading the extension} +@end ifnothtml +@end ifnotinfo +@end float + +The extension can call functions inside @command{gawk} through these +function pointers, at runtime, without needing (link-time) access +to @command{gawk}'s symbols. One of these function pointers is to a +function for ``registering'' new built-in functions. +@iftex +This is shown in @ref{load-new-function}. +@end iftex + +@float Figure,load-new-function +@caption{Loading the new function} +@ifinfo +@center @image{api-figure2, , , Loading the new function, txt} +@end ifinfo +@ifhtml +@center @image{api-figure2, , , Loading the new function, png} +@end ifhtml +@ifnotinfo +@ifnothtml +@center @image{api-figure2, , , Loading the new function} +@end ifnothtml +@end ifnotinfo +@end float + +In the other direction, the extension registers its new functions +with @command{gawk} by passing function pointers to the functions that +provide the new feature (@code{do_chdir()}, for example). @command{gawk} +associates the function pointer with a name and can then call it, using a +defined calling convention. +@iftex +This is shown in @ref{call-new-function}. 
+@end iftex + +@float Figure,call-new-function +@caption{Calling the new function} +@ifinfo +@center @image{api-figure3, , , Calling the new function, txt} +@end ifinfo +@ifhtml +@center @image{api-figure3, , , Calling the new function, png} +@end ifhtml +@ifnotinfo +@ifnothtml +@center @image{api-figure3, , , Calling the new function} +@end ifnothtml +@end ifnotinfo +@end float + +The @code{do_@var{xxx}()} function, in turn, uses the function +pointers in the API @code{struct} to do its work, such as updating +variables or arrays, printing messages, setting @code{ERRNO}, and so on. + +Convenience macros in the @file{gawkapi.h} header file make calling +through the function pointers look like regular function calls so that +extension code is quite readable and understandable. + +Although all of this sounds moderately complicated, the result is that +extension code is quite clean and straightforward. This can be seen in +the sample extension @file{filefuncs.c} (@pxref{Extension Example}) +and also the @file{testext.c} code for testing the APIs. + +Some other bits and pieces: + +@itemize @bullet +@item +The API provides access to @command{gawk}'s @code{do_@var{xxx}} values, +reflecting command-line options, like @code{do_lint}, @code{do_profiling} +and so on (@pxref{Extension API Variables}). +These are informational: an extension cannot affect these +inside @command{gawk}. In addition, attempting to assign to them +produces a compile-time error. + +@item +The API also provides major and minor version numbers, so that an +extension can check if the @command{gawk} it is loaded with supports the +facilities it was compiled with. (Version mismatches ``shouldn't'' +happen, but we all know how @emph{that} goes.) +@xref{Extension Versioning}, for details. +@end itemize + +@node Extension Future Growth +@subsection Room For Future Growth + +The API provides room for future growth, in two ways. + +An ``extension id'' is passed into the extension when it is loaded.
This +extension id is then passed back to @command{gawk} with each function +call. This allows @command{gawk} to identify the extension calling into it, +should it need to know. + +A ``name space'' is passed into @command{gawk} when an extension function +is registered. This provides for a future mechanism for grouping +extension functions and possibly avoiding name conflicts. + +Of course, as of this writing, no decisions have been made with respect +to any of the above. + +@node Extension API Description +@section API Description + +This (rather large) @value{SECTION} describes the API in detail. + +@menu +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + @command{gawk}. +* Printing Messages:: Functions for printing messages. +* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Array Manipulation:: Functions for working with arrays. +* Extension API Variables:: Variables provided by the API. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How @command{gawk} finds compiled + extensions. +@end menu + +@node Extension API Functions Introduction +@subsection Introduction + +Access to facilities within @command{gawk} is made available +by calling through function pointers passed into your extension. + +API function pointers are provided for the following kinds of operations: + +@itemize @bullet +@item +Registration functions. You may register: +@itemize @minus +@item +extension functions, +@item +exit callbacks, +@item +a version string, +@item +input parsers, +@item +output wrappers, +@item +and two-way processors.
+@end itemize +All of these are discussed in detail, later in this @value{CHAPTER}. + +@item +Printing fatal, warning, and ``lint'' warning messages. + +@item +Updating @code{ERRNO}, or unsetting it. + +@item +Accessing parameters, including converting an undefined parameter into +an array. + +@item +Symbol table access: retrieving a global variable, creating one, +or changing one. This also includes the ability to create a scalar +variable that will be @emph{constant} within @command{awk} code. + +@item +Creating and releasing cached values; this provides an +efficient way to use values for multiple variables and +can be a big performance win. + +@item +Manipulating arrays: +@itemize @minus +@item +Retrieving, adding, deleting, and modifying elements +@item +Getting the count of elements in an array +@item +Creating a new array +@item +Clearing an array +@item +Flattening an array for easy C style looping over all its indices and elements +@end itemize +@end itemize + +Some points about using the API: + +@itemize @bullet +@item +You must include @code{<sys/types.h>} and @code{<sys/stat.h>} before including +the @file{gawkapi.h} header file. In addition, you must include either +@code{<stddef.h>} or @code{<stdlib.h>} to get the definition of @code{size_t}. +If you wish to use the boilerplate @code{dl_load_func()} macro, you will +need to include @code{<stdio.h>} as well. +Finally, to pass reasonable integer values for @code{ERRNO}, you +will need to include @code{<errno.h>}. + +@item +Although the API only uses ISO C 90 features, there is an exception; the +``constructor'' functions use the @code{inline} keyword. If your compiler +does not support this keyword, you should either place +@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a +@file{config.h} file in your extensions. + +@item +All pointers filled in by @command{gawk} are to memory +managed by @command{gawk} and should be treated by the extension as +read-only. 
Memory for @emph{all} strings passed into @command{gawk} +from the extension @emph{must} come from @code{malloc()} and is managed +by @command{gawk} from then on. + +@item +The API defines several simple structs that map values as seen +from @command{awk}. A value can be a @code{double}, a string, or an +array (as in multidimensional arrays, or when creating a new array). +Strings maintain both pointer and length since embedded @code{NUL} +characters are allowed. + +By intent, strings are maintained using the current multibyte encoding (as +defined by @env{LC_@var{xxx}} environment variables) and not using wide +characters. This matches how @command{gawk} stores strings internally +and also how characters are likely to be input from and output to files. + +@item +When retrieving a value (such as a parameter or that of a global variable +or array element), the extension requests a specific type (number, string, +scalar, value cookie, array, or ``undefined''). When the request is +``undefined,'' the returned value will have the real underlying type. + +However, if the request and actual type don't match, the access function +returns ``false'' and fills in the type of the actual value that is there, +so that the extension can, e.g., print an error message +(``scalar passed where array expected''). + +@c This is documented in the header file and needs some expanding upon. +@c The table there should be presented here +@end itemize + +While you may call the API functions by using the function pointers +directly, the interface is not so pretty. To make extension code look +more like regular code, the @file{gawkapi.h} header file defines a number +of macros that you should use in your code. This @value{SECTION} presents +the macros as if they were functions.
+@node General Data Types +@subsection General Purpose Data Types + +@quotation +@i{I have a true love/hate relationship with unions.}@* +Arnold Robbins + +@i{That's the thing about unions: the compiler will arrange things so they +can accommodate both love and hate.}@* +Chet Ramey +@end quotation + +The extension API defines a number of simple types and structures for general +purpose use. Additional, more specialized data structures are introduced +in subsequent @value{SECTION}s, together with the functions that use them. + +@table @code +@item typedef void *awk_ext_id_t; +A value of this type is received from @command{gawk} when an extension is loaded. +That value must then be passed back to @command{gawk} as the first parameter of +each API function. + +@item #define awk_const @dots{} +This macro expands to @samp{const} when compiling an extension, +and to nothing when compiling @command{gawk} itself. This makes +certain fields in the API data structures unwritable from extension code, +while allowing @command{gawk} to use them as it needs to. + +@item typedef int awk_bool_t; +A simple boolean type. At the moment, the API does not define special +``true'' and ``false'' values, although perhaps it should. + +@item typedef struct @{ +@itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */ +@itemx @ @ @ @ size_t len;@ @ @ @ @ /* length thereof, in chars */ +@itemx @} awk_string_t; +This represents a mutable string. @command{gawk} +owns the memory pointed to if it supplied +the value. Otherwise, it takes ownership of the memory pointed to. +@strong{Such memory must come from @code{malloc()}!} + +As mentioned earlier, strings are maintained using the current +multibyte encoding.
+ +@item typedef enum @{ +@itemx @ @ @ @ AWK_UNDEFINED, +@itemx @ @ @ @ AWK_NUMBER, +@itemx @ @ @ @ AWK_STRING, +@itemx @ @ @ @ AWK_ARRAY, +@itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */ +@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */ +@itemx @} awk_valtype_t; +This @code{enum} indicates the type of a value. +It is used in the following @code{struct}. + +@item typedef struct @{ +@itemx @ @ @ @ awk_valtype_t val_type; +@itemx @ @ @ @ union @{ +@itemx @ @ @ @ @ @ @ @ awk_string_t@ @ @ @ @ @ @ s; +@itemx @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d; +@itemx @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a; +@itemx @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl; +@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t@ vc; +@itemx @ @ @ @ @} u; +@itemx @} awk_value_t; +An ``@command{awk} value.'' +The @code{val_type} member indicates what kind of value the +@code{union} holds, and each member is of the appropriate type. + +@item #define str_value@ @ @ @ @ @ u.s +@itemx #define num_value@ @ @ @ @ @ u.d +@itemx #define array_cookie@ @ @ u.a +@itemx #define scalar_cookie@ @ u.scl +@itemx #define value_cookie@ @ @ u.vc +These macros make accessing the fields of the @code{awk_value_t} more +readable. + +@item typedef void *awk_scalar_t; +Scalars can be represented as an opaque type. These values are obtained from +@command{gawk} and then passed back into it. This is discussed in a general fashion below, +and in more detail in @ref{Symbol table by cookie}. + +@item typedef void *awk_value_cookie_t; +A ``value cookie'' is an opaque type representing a cached value. +This is also discussed in a general fashion below, +and in more detail in @ref{Cached values}. + +@end table + +Scalar values in @command{awk} are either numbers or strings. The +@code{awk_value_t} struct represents values. The @code{val_type} member +indicates what is in the @code{union}. + +Representing numbers is easy---the API uses a C @code{double}. 
Strings +require more work. Since @command{gawk} allows embedded @code{NUL} bytes +in string values, a string must be represented as a pair containing a +data-pointer and length. This is the @code{awk_string_t} type. + +Identifiers (i.e., the names of global variables) can be associated +with either scalar values or arrays. In addition, @command{gawk} +provides true arrays of arrays, where any given array element can +itself be an array. Discussion of arrays is delayed until +@ref{Array Manipulation}. + +The various macros listed earlier make it easier to use the elements +of the @code{union} as if they were fields in a @code{struct}; this +is a common coding practice in C. Such code is easier to write and to +read; however, it remains @emph{your} responsibility to make sure that +the @code{val_type} member correctly reflects the type of the value in +the @code{awk_value_t}. + +Conceptually, the first three members of the @code{union} (number, string, +and array) are all that is needed for working with @command{awk} values. +However, since the API provides routines for accessing and changing +the value of global scalar variables only by using the variable's name, +there is a performance penalty: @command{gawk} must find the variable +each time it is accessed and changed. This turns out to be a real issue, +not just a theoretical one. + +Thus, if you know that your extension will spend considerable time +reading and/or changing the value of one or more scalar variables, you +can obtain a @dfn{scalar cookie}@footnote{See +@uref{http://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a +definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html, +the ``magic cookie'' entry in the Jargon file} for a nice example. See +also the entry for ``Cookie'' in the @ref{Glossary}.} +object for that variable, and then use +the cookie for getting the variable's value or for changing the variable's +value.
+This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro. +Given a scalar cookie, @command{gawk} can directly retrieve or +modify the value, as required, without having to first find it. + +The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar. +If you know that you wish to +use the same numeric or string @emph{value} for one or more variables, +you can create the value once, retaining a @dfn{value cookie} for it, +and then pass in that value cookie whenever you wish to set the value of a +variable. This saves both storage space within the running @command{gawk} +process and the time needed to create the value. + +@node Requesting Values +@subsection Requesting Values + +All of the functions that return values from @command{gawk} +work in the same way. You pass in an @code{awk_valtype_t} value +to indicate what kind of value you expect. If the actual value +matches what you requested, the function returns true and fills +in the @code{awk_value_t} result. +Otherwise, the function returns false, and the @code{val_type} +member indicates the type of the actual value. You may then +print an error message, or reissue the request for the actual +value type, as appropriate. This behavior is summarized in +@ref{table-value-types-returned}.
+ +@ifnotplaintext +@float Table,table-value-types-returned +@caption{Value Types Returned} +@multitable @columnfractions .50 .50 +@headitem @tab Type of Actual Value: +@end multitable +@multitable @columnfractions .166 .166 .198 .15 .15 .166 +@headitem @tab @tab String @tab Number @tab Array @tab Undefined +@item @tab @b{String} @tab String @tab String @tab false @tab false +@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false +@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false +@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false +@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined +@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false +@end multitable +@end float +@end ifnotplaintext +@ifplaintext +@float Table,table-value-types-returned +@caption{Value Types Returned} +@example + +-------------------------------------------------+ + | Type of Actual Value: | + +------------+------------+-----------+-----------+ + | String | Number | Array | Undefined | ++-----------+-----------+------------+------------+-----------+-----------+ +| | String | String | String | false | false | +| |-----------+------------+------------+-----------+-----------+ +| | Number | Number if | Number | false | false | +| | | can be | | | | +| | | converted, | | | | +| | | else false | | | | +| |-----------+------------+------------+-----------+-----------+ +| Type | Array | false | false | Array | false | +| Requested |-----------+------------+------------+-----------+-----------+ +| | Scalar | Scalar | Scalar | false | false | +| |-----------+------------+------------+-----------+-----------+ +| | Undefined | String | Number | Array | Undefined | +| |-----------+------------+------------+-----------+-----------+ +| | Value | false | false | false | false | +| | Cookie | | | | | 
++-----------+-----------+------------+------------+-----------+-----------+ +@end example +@end float +@end ifplaintext + +@node Constructor Functions +@subsection Constructor Functions and Convenience Macros + +The API provides a number of @dfn{constructor} functions for creating +string and numeric values, as well as a number of convenience macros. +This @value{SUBSECTION} presents them all as function prototypes, in +the way that extension code would use them. + +@table @code +@item static inline awk_value_t * +@itemx make_const_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a C string constant +(or other string data), and automatically creates a @emph{copy} of the data +for storage in @code{result}. It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a @samp{char *} +value pointing to data previously obtained from @code{malloc()}. The idea here +is that the data is passed directly to @command{gawk}, which assumes +responsibility for it. It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_null_string(awk_value_t *result) +This specialized function creates a null string (the ``undefined'' value) +in the @code{awk_value_t} variable pointed to by @code{result}. +It returns @code{result}. + +@item static inline awk_value_t * +@itemx make_number(double num, awk_value_t *result) +This function simply creates a numeric value in the @code{awk_value_t} variable +pointed to by @code{result}. +@end table + +Two convenience macros may be used for allocating storage from @code{malloc()} +and @code{realloc()}. 
If the allocation fails, they cause @command{gawk} to +exit with a fatal error message. They should be used as if they were +procedure calls that do not return a value. + +@table @code +@item emalloc(pointer, type, size, message) +The arguments to this macro are as follows: +@c nested table +@table @code +@item pointer +The pointer variable to point at the allocated storage. + +@item type +The type of the pointer variable, used to create a cast for the call to @code{malloc()}. + +@item size +The total number of bytes to be allocated. + +@item message +A message to be prefixed to the fatal error message. Typically this is the name +of the function using the macro. +@end table + +@noindent +For example, you might allocate a string value like so: + +@example +awk_value_t result; +char *message; +const char greet[] = "Don't Panic!"; + +emalloc(message, char *, sizeof(greet), "myfunc"); +strcpy(message, greet); +make_malloced_string(message, strlen(message), & result); +@end example + +@item erealloc(pointer, type, size, message) +This is like @code{emalloc()}, but it calls @code{realloc()}, +instead of @code{malloc()}. +The arguments are the same as for the @code{emalloc()} macro. +@end table + +@node Registration Functions +@subsection Registration Functions + +This @value{SECTION} describes the API functions for +registering parts of your extension with @command{gawk}. + +@menu +* Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. 
+@end menu + +@node Extension Functions +@subsubsection Registering An Extension Function + +Extension functions are described by the following record: + +@example +typedef struct @{ +@ @ @ @ const char *name; +@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result); +@ @ @ @ size_t num_expected_args; +@} awk_ext_func_t; +@end example + +The fields are: + +@table @code +@item const char *name; +The name of the new function. +@command{awk} level code calls the function by this name. +This is a regular C string. + +@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result); +This is a pointer to the C function that provides the desired +functionality. +The function must fill in the result with either a number +or a string. @command{awk} takes ownership of any string memory. +As mentioned earlier, string memory @strong{must} come from @code{malloc()}. + +The function must return the value of @code{result}. +This is for the convenience of the calling code inside @command{gawk}. + +@item size_t num_expected_args; +This is the number of arguments the function expects to receive. +Each extension function may decide what to do if the number of +arguments isn't what it expected. Following @command{awk} functions, it +is likely OK to ignore extra arguments. +@end table + +Once you have a record representing your extension function, you register +it with @command{gawk} using this API function: + +@table @code +@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func); +This function returns true upon success, false otherwise. +The @code{namespace} parameter is currently not used; you should pass in an +empty string (@code{""}). The @code{func} pointer is the address of a +@code{struct} representing your function, as just described. +@end table + +@node Exit Callback Functions +@subsubsection Registering An Exit Callback Function + +An @dfn{exit callback} function is a function that +@command{gawk} calls before it exits. 
+Such functions are useful if you have general ``clean up'' tasks +that should be performed in your extension (such as closing data +base connections or other resource deallocations). +You can register such +a function with @command{gawk} using the following function. + +@table @code +@item void awk_atexit(void (*funcp)(void *data, int exit_status), +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0); +The parameters are: +@c nested table +@table @code +@item funcp +A pointer to the function to be called before @command{gawk} exits. The @code{data} +parameter will be the original value of @code{arg0}. +The @code{exit_status} parameter is +the exit status value that @command{gawk} will pass to the @code{exit()} system call. + +@item arg0 +A pointer to private data which @command{gawk} saves in order to pass to +the function pointed to by @code{funcp}. +@end table +@end table + +Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in +the reverse order in which they are registered with @command{gawk}. + +@node Extension Version String +@subsubsection Registering An Extension Version String + +You can register a version string which indicates the name and +version of your extension, with @command{gawk}, as follows: + +@table @code +@item void register_ext_version(const char *version); +Register the string pointed to by @code{version} with @command{gawk}. +@command{gawk} does @emph{not} copy the @code{version} string, so +it should not be changed. +@end table + +@command{gawk} prints all registered extension version strings when it +is invoked with the @option{--version} option. + +@node Input Parsers +@subsubsection Customized Input Parsers + +By default, @command{gawk} reads text files as its input. It uses the value +of @code{RS} to find the end of the record, and then uses @code{FS} +(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}). +Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}). 
+ +If you want, you can provide your own, custom, input parser. An input +parser's job is to return a record to the @command{gawk} record processing +code, along with indicators for the value and length of the data to be +used for @code{RT}, if any. + +To provide an input parser, you must first provide two functions +(where @var{XXX} is a prefix name for your extension): + +@table @code +@item awk_bool_t @var{XXX}_can_take_file(const awk_input_buf_t *iobuf) +This function examines the information available in @code{iobuf} +(which we discuss shortly). Based on the information there, it +decides if the input parser should be used for this file. +If so, it should return true. Otherwise, it should return false. +It should not change any state (variable values, etc.) within @command{gawk}. + +@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf) +When @command{gawk} decides to hand control of the file over to the +input parser, it calls this function. This function in turn must fill +in certain fields in the @code{awk_input_buf_t} structure, and ensure +that certain conditions are true. It should then return true. If an +error of some kind occurs, it should not fill in any fields, and should +return false; then @command{gawk} will not use the input parser. +The details are presented shortly. +@end table + +Your extension should package these functions inside an +@code{awk_input_parser_t}, which looks like this: + +@example +typedef struct input_parser @{ + const char *name; /* name of parser */ + awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); + awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); + awk_const struct input_parser *awk_const next; /* for use by gawk */ +@} awk_input_parser_t; +@end example + +The fields are: + +@table @code +@item const char *name; +The name of the input parser. This is a regular C string. 
+ +@item awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); +A pointer to your @code{@var{XXX}_can_take_file()} function. + +@item awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); +A pointer to your @code{@var{XXX}_take_control_of()} function. + +@item awk_const struct input_parser *awk_const next; +This pointer is used by @command{gawk}. +The extension cannot modify it. +@end table + +The steps are as follows: + +@enumerate +@item +Create a @code{static awk_input_parser_t} variable and initialize it +appropriately. + +@item +When your extension is loaded, register your input parser with +@command{gawk} using the @code{register_input_parser()} API function +(described below). +@end enumerate + +An @code{awk_input_buf_t} looks like this: + +@example +typedef struct awk_input @{ + const char *name; /* filename */ + int fd; /* file descriptor */ +#define INVALID_HANDLE (-1) + void *opaque; /* private data for input parsers */ + int (*get_record)(char **out, struct awk_input *iobuf, + int *errcode, char **rt_start, size_t *rt_len); + void (*close_func)(struct awk_input *iobuf); + struct stat sbuf; /* stat buf */ +@} awk_input_buf_t; +@end example + +The fields can be divided into two categories: those for use (initially, +at least) by @code{@var{XXX}_can_take_file()}, and those for use by +@code{@var{XXX}_take_control_of()}. The first group of fields and their uses +are as follows: + +@table @code +@item const char *name; +The name of the file. + +@item int fd; +A file descriptor for the file. If @command{gawk} was able to +open the file, then @code{fd} will @emph{not} be equal to +@code{INVALID_HANDLE}. Otherwise, it will. + +@item struct stat sbuf; +If file descriptor is valid, then @command{gawk} will have filled +in this structure via a call to the @code{fstat()} system call. +@end table + +The @code{@var{XXX}_can_take_file()} function should examine these +fields and decide if the input parser should be used for the file. 
+The decision can be made based upon @command{gawk} state (the value +of a variable defined previously by the extension and set by +@command{awk} code), the name of the +file, whether or not the file descriptor is valid, the information +in the @code{struct stat}, or any combination of the above. + +Once @code{@var{XXX}_can_take_file()} has returned true, and +@command{gawk} has decided to use your input parser, it calls +@code{@var{XXX}_take_control_of()}. That function then fills in at +least the @code{get_record} field of the @code{awk_input_buf_t}. It must +also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of +the fields that may be filled by @code{@var{XXX}_take_control_of()} +are as follows: + +@table @code +@item void *opaque; +This is used to hold any state information needed by the input parser +for this file. It is ``opaque'' to @command{gawk}. The input parser +is not required to use this pointer. + +@item int@ (*get_record)(char@ **out, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len); +This function pointer should point to a function that creates the input +records. Said function is the core of the input parser. Its behavior +is described below. + +@item void (*close_func)(struct awk_input *iobuf); +This function pointer should point to a function that does +the ``tear down.'' It should release any resources allocated by +@code{@var{XXX}_take_control_of()}. It may also close the file. If it +does so, it should set the @code{fd} field to @code{INVALID_HANDLE}. + +If @code{fd} is still not @code{INVALID_HANDLE} after the call to this +function, @command{gawk} calls the regular @code{close()} system call. + +Having a ``tear down'' function is optional. If your input parser does +not need it, do not set this field. 
Then, @command{gawk} calls the +regular @code{close()} system call on the file descriptor, so it should +be valid. +@end table + +The @code{@var{XXX}_get_record()} function does the work of creating +input records. The parameters are as follows: + +@table @code +@item char **out +This is a pointer to a @code{char *} variable that is set to point +to the record. @command{gawk} makes its own copy of the data, so +the extension remains responsible for managing this storage. + +@item struct awk_input *iobuf +This is the @code{awk_input_buf_t} for the file. The fields should be +used for reading data (@code{fd}) and for managing private state +(@code{opaque}), if any. + +@item int *errcode +If an error occurs, @code{*errcode} should be set to an appropriate +code from @code{<errno.h>}. + +@item char **rt_start +@itemx size_t *rt_len +If the concept of a ``record terminator'' makes sense, then +@code{*rt_start} should be set to point to the data to be used for +@code{RT}, and @code{*rt_len} should be set to the length of the +data. Otherwise, @code{*rt_len} should be set to zero. +@command{gawk} makes its own copy of this data, so the +extension remains responsible for managing the storage. +@end table + +The return value is the length of the buffer pointed to by +@code{*out}, or @code{EOF} if end-of-file was reached or an +error occurred. + +It is guaranteed that @code{errcode} is a valid pointer, so there is no +need to test for a @code{NULL} value. @command{gawk} sets @code{*errcode} +to zero, so there is no need to set it unless an error occurs. + +If an error does occur, the function should return @code{EOF} and set +@code{*errcode} to a non-zero value. In that case, if @code{*errcode} +does not equal @minus{}1, @command{gawk} automatically updates +the @code{ERRNO} variable based on the value of @code{*errcode} (e.g., +setting @samp{*errcode = errno} should do the right thing).
+@command{gawk} ships with a sample extension that reads directories, +returning records for each entry in the directory (@pxref{Extension +Sample Readdir}). You may wish to use that code as a guide for writing +your own input parser. + +When writing an input parser, you should think about (and document) +how it is expected to interact with @command{awk} code. You may want +it to always be called, and take effect as appropriate (as the +@code{readdir} extension does). Or you may want it to take effect +based upon the value of an @command{awk} variable, as the XML extension +from the @code{gawkextlib} project does (@pxref{gawkextlib}). +In the latter case, code in a @code{BEGINFILE} section +can look at @code{FILENAME} and @code{ERRNO} to decide whether or +not to activate an input parser (@pxref{BEGINFILE/ENDFILE}). + +You register your input parser with the following function: + +@table @code +@item void register_input_parser(awk_input_parser_t *input_parser); +Register the input parser pointed to by @code{input_parser} with +@command{gawk}. +@end table + +@node Output Wrappers +@subsubsection Customized Output Wrappers + +An @dfn{output wrapper} is the mirror image of an input parser. +It allows an extension to take over the output to a file opened +with the @samp{>} or @samp{>>} operators (@pxref{Redirection}). + +The output wrapper structure is very similar to the input parser structure: + +@example +typedef struct output_wrapper @{ + const char *name; /* name of the wrapper */ + awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); + awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); + awk_const struct output_wrapper *awk_const next; /* for use by gawk */ +@} awk_output_wrapper_t; +@end example + +The members are as follows: + +@table @code +@item const char *name; +This is the name of the output wrapper.
+ +@item awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); +This points to a function that examines the information in +the @code{awk_output_buf_t} structure pointed to by @code{outbuf}. +It should return true if the output wrapper wants to take over the +file, and false otherwise. It should not change any state (variable +values, etc.) within @command{gawk}. + +@item awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); +The function pointed to by this field is called when @command{gawk} +decides to let the output wrapper take control of the file. It should +fill in appropriate members of the @code{awk_output_buf_t} structure, +as described below, and return true if successful, false otherwise. + +@item awk_const struct output_wrapper *awk_const next; +This is for use by @command{gawk}. +@end table + +The @code{awk_output_buf_t} structure looks like this: + +@example +typedef struct @{ + const char *name; /* name of output file */ + const char *mode; /* mode argument to fopen */ + FILE *fp; /* stdio file pointer */ + awk_bool_t redirected; /* true if a wrapper is active */ + void *opaque; /* for use by output wrapper */ + size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, + FILE *fp, void *opaque); + int (*gawk_fflush)(FILE *fp, void *opaque); + int (*gawk_ferror)(FILE *fp, void *opaque); + int (*gawk_fclose)(FILE *fp, void *opaque); +@} awk_output_buf_t; +@end example + +Here too, your extension will define @code{@var{XXX}_can_take_file()} +and @code{@var{XXX}_take_control_of()} functions that examine and update +data members in the @code{awk_output_buf_t}. +The data members are as follows: + +@table @code +@item const char *name; +The name of the output file. + +@item const char *mode; +The mode string (as would be used in the second argument to @code{fopen()}) +with which the file was opened. + +@item FILE *fp; +The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file +before attempting to find an output wrapper. 
+ +@item awk_bool_t redirected; +This field must be set to true by the @code{@var{XXX}_take_control_of()} function. + +@item void *opaque; +This pointer is opaque to @command{gawk}. The extension should use it to store +a pointer to any private data associated with the file. + +@item size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ FILE *fp, void *opaque); +@itemx int (*gawk_fflush)(FILE *fp, void *opaque); +@itemx int (*gawk_ferror)(FILE *fp, void *opaque); +@itemx int (*gawk_fclose)(FILE *fp, void *opaque); +These pointers should be set to point to functions that perform +the same operations as the corresponding @code{<stdio.h>} functions, if appropriate. +@command{gawk} uses these function pointers for all output. +@command{gawk} initializes the pointers to point to internal, ``pass through'' +functions that just call the regular @code{<stdio.h>} functions, so an +extension only needs to redefine those functions that are appropriate for +what it does. +@end table + +The @code{@var{XXX}_can_take_file()} function should make a decision based +upon the @code{name} and @code{mode} fields, and any additional state +(such as @command{awk} variable values) that is appropriate. + +When @command{gawk} calls @code{@var{XXX}_take_control_of()}, that function should fill +in the other fields, as appropriate, except for @code{fp}, which it should just +use normally. + +You register your output wrapper with the following function: + +@table @code +@item void register_output_wrapper(awk_output_wrapper_t *output_wrapper); +Register the output wrapper pointed to by @code{output_wrapper} with +@command{gawk}. +@end table + +@node Two-way processors +@subsubsection Customized Two-way Processors + +A @dfn{two-way processor} combines an input parser and an output wrapper for +two-way I/O with the @samp{|&} operator (@pxref{Redirection}).
It uses the +@code{awk_input_buf_t} and @code{awk_output_buf_t} structures +described earlier. + +A two-way processor is represented by the following structure: + +@example +typedef struct two_way_processor @{ + const char *name; /* name of the two-way processor */ + awk_bool_t (*can_take_two_way)(const char *name); + awk_bool_t (*take_control_of)(const char *name, + awk_input_buf_t *inbuf, + awk_output_buf_t *outbuf); + awk_const struct two_way_processor *awk_const next; /* for use by gawk */ +@} awk_two_way_processor_t; +@end example + +The fields are as follows: + +@table @code +@item const char *name; +The name of the two-way processor. + +@item awk_bool_t (*can_take_two_way)(const char *name); +This function returns true if it wants to take over two-way I/O for this filename. +It should not change any state (variable +values, etc.) within @command{gawk}. + +@item awk_bool_t (*take_control_of)(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf); +This function should fill in the @code{awk_input_buf_t} and +@code{awk_output_buf_t} structures pointed to by @code{inbuf} and +@code{outbuf}, respectively. These structures were described earlier. + +@item awk_const struct two_way_processor *awk_const next; +This is for use by @command{gawk}. +@end table + +As with the input parser and output wrapper, you provide +``yes I can take this'' and ``take over for this'' functions, +@code{@var{XXX}_can_take_two_way()} and @code{@var{XXX}_take_control_of()}. + +You register your two-way processor with the following function: + +@table @code +@item void register_two_way_processor(awk_two_way_processor_t *two_way_processor); +Register the two-way processor pointed to by @code{two_way_processor} with +@command{gawk}.
+@end table + +@node Printing Messages +@subsection Printing Messages + +You can print different kinds of warning messages from your +extension, as described below. Note that for these functions, +you must pass in the extension id received from @command{gawk} +when the extension was loaded.@footnote{Because the API uses only ISO C 90 +features, it cannot make use of the ISO C 99 variadic macro feature to hide +that parameter. More's the pity.} + +@table @code +@item void fatal(awk_ext_id_t id, const char *format, ...); +Print a message and then cause @command{gawk} to exit immediately. + +@item void warning(awk_ext_id_t id, const char *format, ...); +Print a warning message. + +@item void lintwarn(awk_ext_id_t id, const char *format, ...); +Print a ``lint warning.'' Normally this is the same as printing a +warning message, but if @command{gawk} was invoked with @samp{--lint=fatal}, +then lint warnings become fatal error messages. +@end table + +All of these functions are otherwise like the C @code{printf()} +family of functions, where the @code{format} parameter is a string +with literal characters and formatting codes intermixed. + +@node Updating @code{ERRNO} +@subsection Updating @code{ERRNO} + +The following functions allow you to update the @code{ERRNO} +variable: + +@table @code +@item void update_ERRNO_int(int errno_val); +Set @code{ERRNO} to the string equivalent of the error code +in @code{errno_val}. The value should be one of the defined +error codes in @code{<errno.h>}, and @command{gawk} turns it +into a (possibly translated) string using the C @code{strerror()} function. + +@item void update_ERRNO_string(const char *string); +Set @code{ERRNO} directly to the string value of @code{string}. +@command{gawk} makes a copy of the value of @code{string}. + +@item void unset_ERRNO(); +Unset @code{ERRNO}.
+@end table + +@node Accessing Parameters +@subsection Accessing and Updating Parameters + +Two functions give you access to the arguments (parameters) +passed to your extension function. They are: + +@table @code +@item awk_bool_t get_argument(size_t count, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Fill in the @code{awk_value_t} structure pointed to by @code{result} +with the @code{count}'th argument. Return true if the actual +type matches @code{wanted}, false otherwise. In the latter +case, @code{result@w{->}val_type} indicates the actual type +(@pxref{table-value-types-returned}). Counts are zero based---the first +argument is numbered zero, the second one, and so on. @code{wanted} +indicates the type of value expected. + +@item awk_bool_t set_argument(size_t count, awk_array_t array); +Convert a parameter that was undefined into an array; this provides +call-by-reference for arrays. Return false if @code{count} is too big, +or if the argument's type is not undefined. @xref{Array Manipulation}, +for more information on creating arrays. +@end table + +@node Symbol Table Access +@subsection Symbol Table Access + +Two sets of routines provide access to global variables, and one set +allows you to create and release cached values. + +@menu +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. +@end menu + +@node Symbol table by name +@subsubsection Variable Access and Update by Name + +The following routines provide the ability to access and update +global @command{awk}-level variables by name. In compiler terminology, +identifiers of different kinds are termed @dfn{symbols}, thus the ``sym'' +in the routines' names. The data structure which stores information +about symbols is termed a @dfn{symbol table}. 
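To make the terminology concrete, here is a deliberately tiny model of a symbol table in C. It is a hypothetical illustration, not @command{gawk}'s implementation: @code{struct toy_symbol}, @code{toy_find()}, and @code{toy_lookup()} are invented names, and the ``table'' is a fixed array searched by name. @code{toy_lookup()} imitates the ``wanted type'' convention of @code{sym_lookup()}: it returns false on a type mismatch but still reports the actual type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; these are not gawk's real types or functions. */
enum toy_type { TOY_NUMBER, TOY_STRING };

struct toy_symbol {
    const char *name;    /* the identifier, e.g. "MAGIC_VAR" */
    enum toy_type type;  /* which of num/str is meaningful */
    double num;
    const char *str;
};

/* The "symbol table": every lookup by name searches this array. */
static struct toy_symbol toy_table[] = {
    { "MAGIC_VAR", TOY_NUMBER, 42.0, NULL },
    { "FILENAME",  TOY_STRING, 0.0,  "data.txt" },
};

/* Linear search by name; returns NULL if the symbol is absent. */
static struct toy_symbol *
toy_find(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
        if (strcmp(toy_table[i].name, name) == 0)
            return & toy_table[i];
    return NULL;
}

/* Copy a symbol's value into *result.  Mirroring the "wanted"
 * convention, return false on a type mismatch, in which case
 * result->type holds the actual type. */
static bool
toy_lookup(const char *name, enum toy_type wanted, struct toy_symbol *result)
{
    struct toy_symbol *sym;

    assert(name != NULL && result != NULL);
    sym = toy_find(name);
    if (sym == NULL)
        return false;
    *result = *sym;
    return sym->type == wanted;
}
```

Every call to @code{toy_lookup()} repeats the search by name. Holding on to the pointer returned by @code{toy_find()} and reusing it skips that search, which is the same idea behind the scalar cookies described in the next subsection.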
+ +@table @code +@item awk_bool_t sym_lookup(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Fill in the @code{awk_value_t} structure pointed to by @code{result} +with the value of the variable named by the string @code{name}, which is +a regular C string. @code{wanted} indicates the type of value expected. +Return true if the actual type matches @code{wanted}, false otherwise. +In the latter case, @code{result@w{->}val_type} indicates the actual type +(@pxref{table-value-types-returned}). + +@item awk_bool_t sym_update(const char *name, awk_value_t *value); +Update the variable named by the string @code{name}, which is a regular +C string. The variable is added to @command{gawk}'s symbol table +if it is not there. Return true if everything worked, false otherwise. + +Changing types (scalar to array or vice versa) of an existing variable +is @emph{not} allowed, nor may this routine be used to update an array. +This routine cannot be used to update any of the predefined +variables (such as @code{ARGC} or @code{NF}). + +@item awk_bool_t sym_constant(const char *name, awk_value_t *value); +Create a variable named by the string @code{name}, which is +a regular C string, that has the constant value as given by +@code{value}. @command{awk}-level code cannot change the value of this +variable.@footnote{There (currently) is no @command{awk}-level feature that +provides this ability.} The extension may change the value of @code{name}'s +variable with subsequent calls to this routine, and may also convert +a variable created by @code{sym_update()} into a constant. However, +once a variable becomes a constant it cannot later be reverted into a +mutable variable. +@end table + +@node Symbol table by cookie +@subsubsection Variable Access and Update by Cookie + +A @dfn{scalar cookie} is an opaque handle that provides access +to a global variable or array.
It is an optimization that +avoids looking up variables in @command{gawk}'s symbol table every time +access is needed. This was discussed earlier, in @ref{General Data Types}. + +The following functions let you work with scalar cookies. + +@table @code +@item awk_bool_t sym_lookup_scalar(awk_scalar_t cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Retrieve the current value of a scalar cookie. +Once you have obtained a scalar cookie using @code{sym_lookup()}, you can +use this function to get its value more efficiently. +Return false if the value cannot be retrieved. + +@item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value); +Update the value associated with a scalar cookie. Return false if +the new value is not one of @code{AWK_STRING} or @code{AWK_NUMBER}. +Here too, the built-in variables may not be updated. +@end table + +It is not obvious at first glance how to work with scalar cookies or +what their @i{raison d'@^etre} really is. In theory, the @code{sym_lookup()} +and @code{sym_update()} routines are all you really need to work with +variables. For example, you might have code that looked up the value of +a variable, evaluated a condition, and then possibly changed the value +of the variable based on the result of that evaluation, like so: + +@example +/* do_magic --- do something really great */ + +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t value; + + if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value) + && some_condition(value.num_value)) @{ + value.num_value += 42; + sym_update("MAGIC_VAR", & value); + @} + + return make_number(0.0, result); +@} +@end example + +@noindent +This code looks (and is) simple and straightforward. So what's the problem?
+ +Consider what happens if @command{awk}-level code associated with your +extension calls the @code{magic()} function (implemented in C by @code{do_magic()}), +once per record, while processing hundreds of thousands or millions of records. +The @code{MAGIC_VAR} variable is looked up in the symbol table once or twice per function call! + +The symbol table lookup is really pure overhead; it is considerably more efficient +to get a cookie that represents the variable, and use that to get the variable's +value and update it as needed.@footnote{The difference is measurable and quite real. Trust us.} + +Thus, the way to use cookies is as follows. First, install your extension's variable +in @command{gawk}'s symbol table using @code{sym_update()}, as usual. Then get a +scalar cookie for the variable using @code{sym_lookup()}: + +@example +static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */ + +static void +my_extension_init() +@{ + awk_value_t value; + + /* install initial value */ + sym_update("MAGIC_VAR", make_number(42.0, & value)); + + /* get cookie */ + sym_lookup("MAGIC_VAR", AWK_SCALAR, & value); + + /* save the cookie */ + magic_var_cookie = value.scalar_cookie; + @dots{} +@} +@end example + +Next, use the routines in this section for retrieving and updating +the value through the cookie. Thus, @code{do_magic()} now becomes +something like this: + +@example +/* do_magic --- do something really great */ + +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t value; + + if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value) + && some_condition(value.num_value)) @{ + value.num_value += 42; + sym_update_scalar(magic_var_cookie, & value); + @} + @dots{} + + return make_number(0.0, result); +@} +@end example + +@quotation NOTE +The previous code omitted error checking for +presentation purposes. Your extension code should be more robust +and carefully check the return values from the API functions. 
+@end quotation + +@node Cached values +@subsubsection Creating and Using Cached Values + +The routines in this section allow you to create and release +cached values. As with scalar cookies, in theory, cached values +are not necessary. You can create numbers and strings using +the functions in @ref{Constructor Functions}. You can then +assign those values to variables using @code{sym_update()} +or @code{sym_update_scalar()}, as you like. + +However, you can understand the point of cached values if you remember that +@emph{every} string value's storage @emph{must} come from @code{malloc()}. +If you have 20 variables, all of which have the same string value, you +must create 20 identical copies of the string.@footnote{Numeric values +are clearly less problematic, requiring only a C @code{double} to store.} + +It is clearly more efficient, if possible, to create a value once, and +then tell @command{gawk} to reuse the value for multiple variables. That +is what the routines in this section let you do. The functions are as follows: + +@table @code +@item awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result); +Create a cached string or numeric value from @code{value} for efficient later +assignment. +Only @code{AWK_NUMBER} and @code{AWK_STRING} values are allowed. Any other type +is rejected. While @code{AWK_UNDEFINED} could be allowed, doing so would +result in inferior performance. + +@item awk_bool_t release_value(awk_value_cookie_t vc); +Release the memory associated with a value cookie obtained +from @code{create_value()}. +@end table + +You use value cookies in a fashion similar to the way you use scalar cookies. 
+In the extension initialization routine, you create the value cookie: + +@example +static awk_value_cookie_t answer_cookie; /* static value cookie */ + +static void +my_extension_init() +@{ + awk_value_t value; + char *long_string; + size_t long_string_len; + + /* code from earlier */ + @dots{} + /* @dots{} fill in long_string and long_string_len @dots{} */ + make_malloced_string(long_string, long_string_len, & value); + create_value(& value, & answer_cookie); /* create cookie */ + @dots{} +@} +@end example + +Once the value is created, you can use it as the value of any number +of variables: + +@example +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t value; + + @dots{} /* as earlier */ + + value.val_type = AWK_VALUE_COOKIE; + value.value_cookie = answer_cookie; + sym_update("VAR1", & value); + sym_update("VAR2", & value); + @dots{} + sym_update("VAR100", & value); + @dots{} +@} +@end example + +@noindent +Using value cookies in this way saves considerable storage, since all of +@code{VAR1} through @code{VAR100} share the same value. + +You might be wondering, ``Is this sharing problematic? +What happens if @command{awk} code assigns a new value to @code{VAR1}, +are all the others changed too?'' + +That's a great question. The answer is that no, it's not a problem. +@command{gawk} is smart enough to avoid such problems. + +Finally, as part of your cleanup action (@pxref{Exit Callback Functions}) +you should release any cached values that you created, using +@code{release_value()}. + +@node Array Manipulation +@subsection Array Manipulation + +The primary data structure@footnote{Okay, the only data structure.} in @command{awk} +is the associative array (@pxref{Arrays}). +Extensions need to be able to manipulate @command{awk} arrays. +The API provides a number of data structures for working with arrays, +functions for working with individual elements, and functions for +working with arrays as a whole.
This includes the ability to +``flatten'' an array so that it is easy for C code to traverse +every element in an array. The array data structures integrate +nicely with the data structures for values to make it easy to +both work with and create true arrays of arrays (@pxref{General Data Types}). + +@menu +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. +@end menu + +@node Array Data Types +@subsubsection Array Data Types + +The data types associated with arrays are listed below. + +@table @code +@item typedef void *awk_array_t; +If you request the value of an array variable, you get back an +@code{awk_array_t} value. This value is opaque@footnote{It is also +a ``cookie,'' but the @command{gawk} developers did not wish to overuse this +term.} to the extension; it uniquely identifies the array but can +only be used by passing it into API functions or receiving it from API +functions. This is very similar to the way @samp{FILE *} values are used +with the @code{<stdio.h>} library routines. + +@item typedef struct awk_element @{ +@itemx @ @ @ @ /* convenience linked list pointer, not used by gawk */ +@itemx @ @ @ @ struct awk_element *next; +@itemx @ @ @ @ enum @{ +@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */ +@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if should be deleted */ +@itemx @ @ @ @ @} flags; +@itemx @ @ @ @ awk_value_t index; +@itemx @ @ @ @ awk_value_t value; +@itemx @} awk_element_t; +The @code{awk_element_t} is a ``flattened'' +array element. @command{gawk} produces an array of these +inside the @code{awk_flat_array_t} (see the next item). +Individual elements may be marked for deletion. New elements must be added +individually, one at a time, using the separate API for that purpose.
+The fields are as follows: + +@c nested table +@table @code +@item struct awk_element *next; +This pointer is for the convenience of extension writers. It allows +an extension to create a linked list of new elements which can then be +added to an array in a loop that traverses the list. + +@item enum @{ @dots{} @} flags; +A set of flag values that convey information between @command{gawk} +and the extension. Currently there is only one: @code{AWK_ELEMENT_DELETE}, +which the extension can set to cause @command{gawk} to delete the +element from the original array upon release of the flattened array. + +@item index +@itemx value +The index and value of the element, respectively. +@emph{All} memory pointed to by @code{index} and @code{value} belongs to @command{gawk}. +@end table + +@item typedef struct awk_flat_array @{ +@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */ +@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */ +@itemx @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */ +@itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */ +@itemx @} awk_flat_array_t; +This is a flattened array. When an extension gets one of these +from @command{gawk}, the @code{elements} array is of actual +size @code{count}. +The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk}; +therefore they are marked @code{awk_const} so that the extension cannot +modify them. +@end table + +@node Array Functions +@subsubsection Array Functions + +The following functions relate to individual array elements. + +@table @code +@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count); +For the array represented by @code{a_cookie}, return in @code{*count} +the number of elements it contains. A subarray counts as a single element. +Return false if there is an error. 
+ +@item awk_bool_t get_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t *const index, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +For the array represented by @code{a_cookie}, return in @code{*result} +the value of the element whose index is @code{index}. +@code{wanted} specifies the type of value you wish to retrieve. +Return false if @code{wanted} does not match the actual type or if +@code{index} is not in the array (@pxref{table-value-types-returned}). + +The value for @code{index} can be numeric, in which case @command{gawk} +converts it to a string. Using non-integral values is possible, but +requires that you understand how such values are converted to strings +(@pxref{Conversion}); thus using integral values is safest. + +As with @emph{all} strings passed into @command{gawk} from an extension, +the string value of @code{index} must come from @code{malloc()}, and +@command{gawk} releases the storage. + +@item awk_bool_t set_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const value); +In the array represented by @code{a_cookie}, create or modify +the element whose index is given by @code{index}. +The @code{ARGV} and @code{ENVIRON} arrays may not be changed. + +@item awk_bool_t set_array_element_by_elem(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_element_t element); +Like @code{set_array_element()}, but take the @code{index} and @code{value} +from @code{element}. This is a convenience macro.
+ +@item awk_bool_t del_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t* const index); +Remove the element with the given index from the array +represented by @code{a_cookie}. +Return true if the element was removed, or false if the element did +not exist in the array. +@end table + +The following functions relate to arrays as a whole: + +@table @code +@item awk_array_t create_array(); +Create a new array to which elements may be added. +@xref{Creating Arrays}, for a discussion of how to +create a new array and add elements to it. + +@item awk_bool_t clear_array(awk_array_t a_cookie); +Clear the array represented by @code{a_cookie}. +Return false if there was some kind of problem, true otherwise. +The array remains an array, but after calling this function, it +has no elements. This is equivalent to using the @code{delete} +statement (@pxref{Delete}). + +@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data); +For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t} +structure and fill it in. Set the pointer whose address is passed as @code{data} +to point to this structure. +Return true upon success, or false otherwise. +@xref{Flattening Arrays}, for a discussion of how to +flatten an array and work with it. + +@item awk_bool_t release_flattened_array(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data); +When done with a flattened array, release the storage using this function. +You must pass in both the original array cookie, and the address of +the created @code{awk_flat_array_t} structure. +The function returns true upon success, false otherwise. 
+@end table + +@node Flattening Arrays +@subsubsection Working With All The Elements of an Array + +To @dfn{flatten} an array is to create a structure that +represents the full array in a fashion that makes it easy +for C code to traverse the entire array. Test code +in @file{extension/testext.c} does this, and also serves +as a nice example to show how to use the APIs. + +First, the @command{gawk} script that drives the test extension: + +@example +@@load "testext" +BEGIN @{ + n = split("blacky rusty sophie raincloud lucky", pets) + printf "pets has %d elements\n", length(pets) + ret = dump_array_and_delete("pets", "3") + printf "dump_array_and_delete(pets) returned %d\n", ret + if ("3" in pets) + printf("dump_array_and_delete() did NOT remove index \"3\"!\n") + else + printf("dump_array_and_delete() did remove index \"3\"!\n") + print "" +@} +@end example + +@noindent +This code creates an array with @code{split()} (@pxref{String Functions}) +and then calls @code{dump_array_and_delete()}. That function looks up +the array whose name is passed as the first argument, and +deletes the element at the index passed in the second argument. +The @command{awk} code then prints the return value and checks whether the element +was indeed deleted. Here is the C code that implements +@code{dump_array_and_delete()}. It has been edited slightly for +presentation. + +The first part declares variables, sets up the default +return value in @code{result}, and checks that the function +was called with the correct number of arguments: + +@example +static awk_value_t * +dump_array_and_delete(int nargs, awk_value_t *result) +@{ + awk_value_t value, value2, value3; + awk_flat_array_t *flat_array; + size_t count; + char *name; + int i; + + assert(result != NULL); + make_number(0.0, result); + + if (nargs != 2) @{ + printf("dump_array_and_delete: nargs not right " + "(%d should be 2)\n", nargs); + goto out; + @} +@end example + +The function then proceeds in steps, as follows.
First, retrieve +the name of the array, passed as the first argument. Then +retrieve the array itself. If either operation fails, print +error messages and return: + +@example + /* get argument named array as flat array and print it */ + if (get_argument(0, AWK_STRING, & value)) @{ + name = value.str_value.str; + if (sym_lookup(name, AWK_ARRAY, & value2)) + printf("dump_array_and_delete: sym_lookup of %s passed\n", + name); + else @{ + printf("dump_array_and_delete: sym_lookup of %s failed\n", + name); + goto out; + @} + @} else @{ + printf("dump_array_and_delete: get_argument(0) failed\n"); + goto out; + @} +@end example + +For testing purposes and to make sure that the C code sees +the same number of elements as the @command{awk} code, +the second step is to get the count of elements in the array +and print it: + +@example + if (! get_element_count(value2.array_cookie, & count)) @{ + printf("dump_array_and_delete: get_element_count failed\n"); + goto out; + @} + + printf("dump_array_and_delete: incoming size is %lu\n", + (unsigned long) count); +@end example + +The third step is to actually flatten the array, and then +to double check that the count in the @code{awk_flat_array_t} +is the same as the count just retrieved: + +@example + if (! flatten_array(value2.array_cookie, & flat_array)) @{ + printf("dump_array_and_delete: could not flatten array\n"); + goto out; + @} + + if (flat_array->count != count) @{ + printf("dump_array_and_delete: flat_array->count (%lu)" + " != count (%lu)\n", + (unsigned long) flat_array->count, + (unsigned long) count); + goto out; + @} +@end example + +The fourth step is to retrieve the index of the element +to be deleted, which was passed as the second argument. +Remember that argument counts passed to @code{get_argument()} +are zero-based, thus the second argument is numbered one: + +@example + if (! 
get_argument(1, AWK_STRING, & value3)) @{ + printf("dump_array_and_delete: get_argument(1) failed\n"); + goto out; + @} +@end example + +The fifth step is where the ``real work'' is done. The function +loops over every element in the array, printing the index and +element values. In addition, upon finding the element with the +index that is supposed to be deleted, the function sets the +@code{AWK_ELEMENT_DELETE} bit in the @code{flags} field +of the element. When the array is released, @command{gawk} +traverses the flattened array, and deletes any elements that +have this flag bit set: + +@example + for (i = 0; i < flat_array->count; i++) @{ + printf("\t%s[\"%.*s\"] = %s\n", + name, + (int) flat_array->elements[i].index.str_value.len, + flat_array->elements[i].index.str_value.str, + valrep2str(& flat_array->elements[i].value)); + + if (strcmp(value3.str_value.str, + flat_array->elements[i].index.str_value.str) + == 0) @{ + flat_array->elements[i].flags |= AWK_ELEMENT_DELETE; + printf("dump_array_and_delete: marking element \"%s\" " + "for deletion\n", + flat_array->elements[i].index.str_value.str); + @} + @} +@end example + +The sixth step is to release the flattened array. This tells +@command{gawk} that the extension is no longer using the array, +and that it should delete any elements marked for deletion. +@command{gawk} also frees any storage that was allocated, +so you should not use the pointer (@code{flat_array} in this +code) once you have called @code{release_flattened_array()}: + +@example + if (!
release_flattened_array(value2.array_cookie, flat_array)) @{ + printf("dump_array_and_delete: could not release flattened array\n"); + goto out; + @} +@end example + +Finally, since everything was successful, the function sets the +return value to success, and returns: + +@example + make_number(1.0, result); +out: + return result; +@} +@end example + +Here is the output from running this part of the test: + +@example +pets has 5 elements +dump_array_and_delete: sym_lookup of pets passed +dump_array_and_delete: incoming size is 5 + pets["1"] = "blacky" + pets["2"] = "rusty" + pets["3"] = "sophie" +dump_array_and_delete: marking element "3" for deletion + pets["4"] = "raincloud" + pets["5"] = "lucky" +dump_array_and_delete(pets) returned 1 +dump_array_and_delete() did remove index "3"! +@end example + +@node Creating Arrays +@subsubsection How To Create and Populate Arrays + +Besides working with arrays created by @command{awk} code, you can +create arrays and populate them as you see fit, and then @command{awk} +code can access them and manipulate them. + +There are two important points about creating arrays from extension code: + +@enumerate 1 +@item +You must install a new array into @command{gawk}'s symbol +table immediately upon creating it. Once you have done so, +you can then populate the array. + +@ignore +Strictly speaking, this is required only +for arrays that will have subarrays as elements; however it is +a good idea to always do this. This restriction may be relaxed +in a subsequent revision of the API. +@end ignore + +Similarly, if installing a new array as a subarray of an existing array, +you must add the new array to its parent before adding any elements to it. + +Thus, the correct way to build an array is to work ``top down.'' Create +the array, and immediately install it in @command{gawk}'s symbol table +using @code{sym_update()}, or install it as an element in a previously +existing array using @code{set_array_element()}.
A complete example appears after this list. + +@item +Due to @command{gawk} internals, after using @code{sym_update()} to install an array +into @command{gawk}, you have to retrieve the array cookie from the value +passed to @code{sym_update()} before doing anything else with it, like so: + +@example +awk_value_t value; +awk_array_t new_array; + +new_array = create_array(); +value.val_type = AWK_ARRAY; +value.array_cookie = new_array; + +/* install array in the symbol table */ +sym_update("array", & value); + +new_array = value.array_cookie; /* YOU MUST DO THIS */ +@end example + +If installing an array as a subarray, you must also retrieve the value +of the array cookie after the call to @code{set_array_element()}. +@end enumerate + +The following C code is a simple test extension to create an array +with two regular elements and with a subarray. The leading @samp{#include} +directives and boilerplate variable declarations are omitted for brevity. +The first step is to create a new array and then install it +in the symbol table: + +@example +@ignore +#ifdef HAVE_CONFIG_H +#include <config.h> +#endif + +#include <stdio.h> +#include <assert.h> +#include <errno.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include <sys/types.h> +#include <sys/stat.h> + +#include "gawkapi.h" + +static const gawk_api_t *api; /* for convenience macros to work */ +static awk_ext_id_t *ext_id; +static const char *ext_version = "testarray extension: version 1.0"; + +int plugin_is_GPL_compatible; + +@end ignore +/* create_new_array --- create a named array */ + +static void +create_new_array() +@{ + awk_array_t a_cookie; + awk_array_t subarray; + awk_value_t index, value; + + a_cookie = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = a_cookie; + + if (! 
sym_update("new_array", & value)) + printf("create_new_array: sym_update(\"new_array\") failed!\n"); + a_cookie = value.array_cookie; +@end example + +@noindent +Note how @code{a_cookie} is reset from the @code{array_cookie} field in +the @code{value} structure. + +The second step is to install two regular values into @code{new_array}: + +@example + (void) make_const_string("hello", 5, & index); + (void) make_const_string("world", 5, & value); + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array: set_array_element failed\n"); + return; + @} + + (void) make_const_string("answer", 6, & index); + (void) make_number(42.0, & value); + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array: set_array_element failed\n"); + return; + @} +@end example + +The third step is to create the subarray and install it: + +@example + (void) make_const_string("subarray", 8, & index); + subarray = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = subarray; + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array: set_array_element failed\n"); + return; + @} + subarray = value.array_cookie; +@end example + +The final step is to populate the subarray with its own element: + +@example + (void) make_const_string("foo", 3, & index); + (void) make_const_string("bar", 3, & value); + if (! 
set_array_element(subarray, & index, & value)) @{ + printf("fill_in_array: set_array_element failed\n"); + return; + @} +@} +@ignore +static awk_ext_func_t func_table[] = @{ + @{ NULL, NULL, 0 @} +@}; + +/* init_testarray --- additional initialization function */ + +static awk_bool_t init_testarray(void) +@{ + create_new_array(); + + return 1; +@} + +static awk_bool_t (*init_func)(void) = init_testarray; + +dl_load_func(func_table, testarray, "") +@end ignore +@end example + +Here is a sample script that loads the extension +and then dumps the array: + +@example +@@load "subarray" + +function dumparray(name, array, i) +@{ + for (i in array) + if (isarray(array[i])) + dumparray(name "[\"" i "\"]", array[i]) + else + printf("%s[\"%s\"] = %s\n", name, i, array[i]) +@} + +BEGIN @{ + dumparray("new_array", new_array); +@} +@end example + +Here is the result of running the script: + +@example +$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk} +@print{} new_array["subarray"]["foo"] = bar +@print{} new_array["hello"] = world +@print{} new_array["answer"] = 42 +@end example + +@noindent +(@xref{Finding Extensions}, for more information on the +@env{AWKLIBPATH} environment variable.) + +@node Extension API Variables +@subsection API Variables + +The API provides two sets of variables. The first provides information +about the version of the API (both the version with which the extension was compiled, +and the version with which @command{gawk} was compiled). The second provides +information about how @command{gawk} was invoked. + +@menu +* Extension Versioning:: API Version information. +* Extension API Informational Variables:: Variables providing information about + @command{gawk}'s invocation. +@end menu + +@node Extension Versioning +@subsubsection API Version Constants and Variables + +The API provides both a ``major'' and a ``minor'' version number. +The API versions are available at compile time as constants: + +@table @code +@item GAWK_API_MAJOR_VERSION +The major version of the API. 
+ +@item GAWK_API_MINOR_VERSION +The minor version of the API. +@end table + +The minor version increases when new functions are added to the API. Such +new functions are always added to the end of the API @code{struct}. + +The major version increases (and the minor version is reset to zero) if any +of the data types change size or member order, or if any of the existing +functions change signature. + +An extension may be compiled against one version +of the API but loaded by a version of @command{gawk} using a different +version. For this reason, the major and minor API versions of the +running @command{gawk} are included in the API @code{struct} as read-only +constant integers: + +@table @code +@item api->major_version +The major version of the running @command{gawk}. + +@item api->minor_version +The minor version of the running @command{gawk}. +@end table + +It is up to the extension to decide if there are API incompatibilities. +Typically a check like this is enough: + +@example +if (api->major_version != GAWK_API_MAJOR_VERSION + || api->minor_version < GAWK_API_MINOR_VERSION) @{ + fprintf(stderr, "foo_extension: version mismatch with gawk!\n"); + fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n", + GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION, + api->major_version, api->minor_version); + exit(1); +@} +@end example + +Such code is included in the boilerplate @code{dl_load_func()} macro +provided in @file{gawkapi.h} (discussed later, in +@ref{Extension API Boilerplate}). + +@node Extension API Informational Variables +@subsubsection Informational Variables + +The API provides access to several variables that describe +whether the corresponding command-line options were enabled when +@command{gawk} was invoked. The variables are: + +@table @code +@item do_lint +This variable is true if @command{gawk} was invoked with the @option{--lint} option +(@pxref{Options}). 
+ +@item do_traditional +This variable is true if @command{gawk} was invoked with the @option{--traditional} option. + +@item do_profile +This variable is true if @command{gawk} was invoked with the @option{--profile} option. + +@item do_sandbox +This variable is true if @command{gawk} was invoked with the @option{--sandbox} option. + +@item do_debug +This variable is true if @command{gawk} was invoked with the @option{--debug} option. + +@item do_mpfr +This variable is true if @command{gawk} was invoked with the @option{--bignum} option. +@end table + +The value of @code{do_lint} can change if @command{awk} code +modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}). +The others should not change during execution. + +@node Extension API Boilerplate +@subsection Boilerplate Code + +As mentioned earlier (@pxref{Extension Mechanism Outline}), the function +definitions as presented are really macros. To use these macros, your +extension must provide a small amount of boilerplate code (variables and +functions) towards the top of your source file, using pre-defined names +as described below. The boilerplate needed is also provided in comments +in the @file{gawkapi.h} header file: + +@example +/* Boiler plate code: */ +int plugin_is_GPL_compatible; + +static const gawk_api_t *api; +static awk_ext_id_t ext_id; +static const char *ext_version = NULL; /* or @dots{} = "some string" */ + +static awk_ext_func_t func_table[] = @{ + @{ "name", do_name, 1 @}, + /* @dots{} */ +@}; + +/* EITHER: */ + +static awk_bool_t (*init_func)(void) = NULL; + +/* OR: */ + +static awk_bool_t +init_my_module(void) +@{ + @dots{} +@} + +static awk_bool_t (*init_func)(void) = init_my_module; + +dl_load_func(func_table, some_name, "name_space_in_quotes") +@end example + +These variables and functions are as follows: + +@table @code +@item int plugin_is_GPL_compatible; +This asserts that the extension is compatible with the GNU GPL +(@pxref{Copying}). 
If your extension does not have this, @command{gawk} +will not load it (@pxref{Plugin License}). + +@item static const gawk_api_t *api; +This global @code{static} variable should be set to +the @code{gawk_api_t} pointer that @command{gawk} passes to your +@code{dl_load()} function. This variable is used by all of the macros. + +@item static awk_ext_id_t ext_id; +This global static variable should be set to the @code{awk_ext_id_t} +value that @command{gawk} passes to your @code{dl_load()} function. +This variable is used by all of the macros. + +@item static const char *ext_version = NULL; /* or @dots{} = "some string" */ +This global @code{static} variable should be set either +to @code{NULL}, or to point to a string giving the name and version of +your extension. + +@item static awk_ext_func_t func_table[] = @{ @dots{} @}; +This is an array of one or more @code{awk_ext_func_t} structures +as described earlier (@pxref{Extension Functions}). +It can then be looped over for multiple calls to +@code{add_ext_func()}. + +@item static awk_bool_t (*init_func)(void) = NULL; +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @r{OR} +@itemx static awk_bool_t init_my_module(void) @{ @dots{} @} +@itemx static awk_bool_t (*init_func)(void) = init_my_module; +If you need to do some initialization work, you should define a +function that does it (creates variables, opens files, etc.) +and then define the @code{init_func} pointer to point to your +function. +The function should return zero (false) upon failure, non-zero +(success) if everything goes well. + +If you don't need to do any initialization, define the pointer and +initialize it to @code{NULL}. + +@item dl_load_func(func_table, some_name, "name_space_in_quotes") +This macro expands to a @code{dl_load()} function that performs +all the necessary initializations. 
+@end table + +The point of all the variables and arrays is to let the +@code{dl_load()} function (from the @code{dl_load_func()} +macro) do all the standard work. It does the following: + +@enumerate 1 +@item +Check the API versions. If the extension major version does not match +@command{gawk}'s, or if the extension minor version is greater than +@command{gawk}'s, it prints a fatal error message and exits. + +@item +Load the functions defined in @code{func_table}. +If any of them fails to load, it prints a warning message but +continues on. + +@item +If the @code{init_func} pointer is not @code{NULL}, call the +function it points to. If it returns zero (false), print a +warning message. + +@item +If @code{ext_version} is not @code{NULL}, register +the version string with @command{gawk}. +@end enumerate + +@node Finding Extensions +@subsection How @command{gawk} Finds Extensions + +Compiled extensions have to be installed in a directory where +@command{gawk} can find them. If @command{gawk} is configured and +built in the default fashion, the directory in which to find +extensions is @file{/usr/local/lib/gawk}. You can also specify a search +path with a list of directories to search for compiled extensions. +@xref{AWKLIBPATH Variable}, for more information. + +@node Extension Example +@section Example: Some File Functions + +@quotation +@i{No matter where you go, there you are.} @* +Buckaroo Banzai +@end quotation + +@c It's enough to show chdir and stat, no need for fts + +Two useful functions that are not in @command{awk} are @code{chdir()} (so +that an @command{awk} program can change its directory) and @code{stat()} +(so that an @command{awk} program can gather information about a file). +This @value{SECTION} implements these functions for @command{gawk} +in an extension. @menu * Internal File Description:: What the new functions will do. @@ -28121,13 +30613,13 @@ external extension library. 
@node Internal File Description @subsection Using @code{chdir()} and @code{stat()} -This @value{SECTION} shows how to use the new functions at the @command{awk} -level once they've been integrated into the running @command{gawk} -interpreter. -Using @code{chdir()} is very straightforward. It takes one argument, -the new directory to change to: +This @value{SECTION} shows how to use the new functions at +the @command{awk} level once they've been integrated into the +running @command{gawk} interpreter. Using @code{chdir()} is very +straightforward. It takes one argument, the new directory to change to: @example +@@load "filefuncs" @dots{} newdir = "/home/arnold/funstuff" ret = chdir(newdir) @@ -28139,21 +30631,18 @@ if (ret < 0) @{ @dots{} @end example -The return value is negative if the @code{chdir} failed, -and @code{ERRNO} -(@pxref{Built-in Variables}) -is set to a string indicating the error. +The return value is negative if the @code{chdir()} failed, and +@code{ERRNO} (@pxref{Built-in Variables}) is set to a string indicating +the error. -Using @code{stat()} is a bit more complicated. -The C @code{stat()} function fills in a structure that has a fair -amount of information. +Using @code{stat()} is a bit more complicated. The C @code{stat()} +function fills in a structure that has a fair amount of information. The right way to model this in @command{awk} is to fill in an associative array with the appropriate information: @c broke printf for page breaking @example file = "/home/arnold/.profile" -fdata[1] = "x" # force `fdata' to be an array ret = stat(file, fdata) if (ret < 0) @{ printf("could not stat %s: %s\n", @@ -28198,11 +30687,11 @@ be a function of the file's size if the file has holes. The file's last access, modification, and inode update times, respectively. These are numeric timestamps, suitable for formatting with @code{strftime()} -(@pxref{Built-in}). +(@pxref{Time Functions}). 
@item "pmode" The file's ``printable mode.'' This is a string representation of -the file's type and permissions, such as what is produced by +the file's type and permissions, such as is produced by @samp{ls -l}---for example, @code{"drwxr-xr-x"}. @item "type" @@ -28263,64 +30752,96 @@ of that number, respectively. @node Internal File Ops @subsection C Code for @code{chdir()} and @code{stat()} -Here is the C code for these extensions. They were written for -GNU/Linux. The code needs some more work for complete portability -to other POSIX-compliant systems:@footnote{This version is edited -slightly for presentation. See -@file{extension/filefuncs.c} in the @command{gawk} distribution -for the complete version.} +Here is the C code for these extensions.@footnote{This version is +edited slightly for presentation. See @file{extension/filefuncs.c} +in the @command{gawk} distribution for the complete version.} + +The file includes a number of standard header files, and then includes +the @file{gawkapi.h} header file which provides the API definitions. +Those are followed by the necessary variable declarations +to make use of the API macros and boilerplate code +(@pxref{Extension API Boilerplate}). 
@c break line for page breaking @example -#include "awk.h" +#ifdef HAVE_CONFIG_H +#include <config.h> +#endif + +#include <stdio.h> +#include <assert.h> +#include <errno.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include <sys/types.h> +#include <sys/stat.h> -#include <sys/sysmacros.h> +#include "gawkapi.h" + +#include "gettext.h" +#define _(msgid) gettext(msgid) +#define N_(msgid) msgid + +#include "gawkfts.h" +#include "stack.h" + +static const gawk_api_t *api; /* for convenience macros to work */ +static awk_ext_id_t *ext_id; +static awk_bool_t init_filefuncs(void); +static awk_bool_t (*init_func)(void) = init_filefuncs; +static const char *ext_version = "filefuncs extension: version 1.0"; int plugin_is_GPL_compatible; +@end example +@cindex programming conventions, @command{gawk} internals +By convention, for an @command{awk} function @code{foo()}, the C function +that implements it is called @code{do_foo()}. The function should have +two arguments: the first is an @code{int} usually called @code{nargs}, +that represents the number of actual arguments for the function. +The second is a pointer to an @code{awk_value_t}, usually named +@code{result}. + +@example /* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ -static NODE * -do_chdir(int nargs) +static awk_value_t * +do_chdir(int nargs, awk_value_t *result) @{ - NODE *newdir; + awk_value_t newdir; int ret = -1; - if (do_lint && nargs != 1) - lintwarn("chdir: called with incorrect number of arguments"); + assert(result != NULL); - newdir = get_scalar_argument(0, FALSE); + if (do_lint && nargs != 1) + lintwarn(ext_id, + _("chdir: called with incorrect number of arguments, " + "expecting 1")); @end example -The file includes the @code{"awk.h"} header file for definitions -for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>} -for access to the @code{major()} and @code{minor}() macros. 
- -@cindex programming conventions, @command{gawk} internals -By convention, for an @command{awk} function @code{foo}, the function that -implements it is called @samp{do_foo}. The function should take -a @samp{int} argument, usually called @code{nargs}, that -represents the number of defined arguments for the function. The @code{newdir} +The @code{newdir} variable represents the new directory to change to, retrieved -with @code{get_scalar_argument()}. Note that the first argument is +with @code{get_argument()}. Note that the first argument is numbered zero. -This code actually accomplishes the @code{chdir()}. It first forces -the argument to be a string and passes the string value to the +If the argument is retrieved successfully, the function calls the @code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} is updated. @example - (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO_int(errno); + if (get_argument(0, AWK_STRING, & newdir)) @{ + ret = chdir(newdir.str_value.str); + if (ret < 0) + update_ERRNO_int(errno); + @} @end example Finally, the function returns the return value to the @command{awk} level: @example - return make_number((AWKNUM) ret); + return make_number(ret, result); @} @end example @@ -28339,7 +30860,168 @@ format_mode(unsigned long fmode) @} @end example -Next comes the @code{do_stat()} function. It starts with +Next comes a function for reading symbolic links, which is also +omitted here for brevity: + +@example +/* read_symlink --- read a symbolic link into an allocated buffer. 
+ @dots{} */ + +static char * +read_symlink(const char *fname, size_t bufsize, ssize_t *linksize) +@{ + @dots{} +@} +@end example + +Two helper functions simplify entering values in the +array that will contain the result of the @code{stat()}: + +@example +/* array_set --- set an array element */ + +static void +array_set(awk_array_t array, const char *sub, awk_value_t *value) +@{ + awk_value_t index; + + set_array_element(array, + make_const_string(sub, strlen(sub), & index), + value); + +@} + +/* array_set_numeric --- set an array element with a number */ + +static void +array_set_numeric(awk_array_t array, const char *sub, double num) +@{ + awk_value_t tmp; + + array_set(array, sub, make_number(num, & tmp)); +@} +@end example + +The following function does most of the work to fill in +the @code{awk_array_t} result array with values obtained +from a valid @code{struct stat}. It is done in a separate function +to support the @code{stat()} function for @command{gawk} and also +to support the @code{fts()} extension which is included in +the same file but whose code is not shown here +(@pxref{Extension Sample File Functions}). 
+ +The first part of the function is variable declarations, +including a table to map file types to strings: + +@example +/* fill_stat_array --- do the work to fill an array with stat info */ + +static int +fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf) +@{ + char *pmode; /* printable mode */ + const char *type = "unknown"; + awk_value_t tmp; + static struct ftype_map @{ + unsigned int mask; + const char *type; + @} ftype_map[] = @{ + @{ S_IFREG, "file" @}, + @{ S_IFBLK, "blockdev" @}, + @{ S_IFCHR, "chardev" @}, + @{ S_IFDIR, "directory" @}, +#ifdef S_IFSOCK + @{ S_IFSOCK, "socket" @}, +#endif +#ifdef S_IFIFO + @{ S_IFIFO, "fifo" @}, +#endif +#ifdef S_IFLNK + @{ S_IFLNK, "symlink" @}, +#endif +#ifdef S_IFDOOR /* Solaris weirdness */ + @{ S_IFDOOR, "door" @}, +#endif /* S_IFDOOR */ + @}; + int j, k; +@end example + +The destination array is cleared, and then code fills in +various elements based on values in the @code{struct stat}: + +@example + /* empty out the array */ + clear_array(array); + + /* fill in the array */ + array_set(array, "name", make_const_string(name, strlen(name), + & tmp)); + array_set_numeric(array, "dev", sbuf->st_dev); + array_set_numeric(array, "ino", sbuf->st_ino); + array_set_numeric(array, "mode", sbuf->st_mode); + array_set_numeric(array, "nlink", sbuf->st_nlink); + array_set_numeric(array, "uid", sbuf->st_uid); + array_set_numeric(array, "gid", sbuf->st_gid); + array_set_numeric(array, "size", sbuf->st_size); + array_set_numeric(array, "blocks", sbuf->st_blocks); + array_set_numeric(array, "atime", sbuf->st_atime); + array_set_numeric(array, "mtime", sbuf->st_mtime); + array_set_numeric(array, "ctime", sbuf->st_ctime); + + /* for block and character devices, add rdev, + major and minor numbers */ + if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) @{ + array_set_numeric(array, "rdev", sbuf->st_rdev); + array_set_numeric(array, "major", major(sbuf->st_rdev)); + array_set_numeric(array, "minor", 
minor(sbuf->st_rdev)); + @} +@end example + +@noindent +The latter part of the function makes selective additions +to the destination array, depending upon the availability of +certain members and/or the type of the file. It then returns zero, +for success: + +@example +#ifdef HAVE_ST_BLKSIZE + array_set_numeric(array, "blksize", sbuf->st_blksize); +#endif /* HAVE_ST_BLKSIZE */ + + pmode = format_mode(sbuf->st_mode); + array_set(array, "pmode", make_const_string(pmode, strlen(pmode), + & tmp)); + + /* for symbolic links, add a linkval field */ + if (S_ISLNK(sbuf->st_mode)) @{ + char *buf; + ssize_t linksize; + + if ((buf = read_symlink(name, sbuf->st_size, + & linksize)) != NULL) + array_set(array, "linkval", + make_malloced_string(buf, linksize, & tmp)); + else + warning(ext_id, _("stat: unable to read symbolic link `%s'"), + name); + @} + + /* add a type field */ + type = "unknown"; /* shouldn't happen */ + for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) @{ + if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) @{ + type = ftype_map[j].type; + break; + @} + @} + + array_set(array, "type", make_const_string(type, strlen(type), &tmp)); + + return 0; +@} +@end example + +Finally, here is the @code{do_stat()} function. It starts with variable declarations and argument checking: @ignore @@ -28349,116 +31031,140 @@ Changed message for page breaking. 
Used to be: @example /* do_stat --- provide a stat() function for gawk */ -static NODE * -do_stat(int nargs) +static awk_value_t * +do_stat(int nargs, awk_value_t *result) @{ - NODE *file, *array, *tmp; - struct stat sbuf; + awk_value_t file_param, array_param; + char *name; + awk_array_t array; int ret; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; + struct stat sbuf; - if (do_lint && nargs > 2) - lintwarn("stat: called with too many arguments"); + assert(result != NULL); + + if (do_lint && nargs != 2) @{ + lintwarn(ext_id, + _("stat: called with wrong number of arguments")); + return make_number(-1, result); + @} @end example Then comes the actual work. First, the function gets the arguments. -Then, it always clears the array. +Next, it gets the information for the file. The code uses @code{lstat()} (instead of @code{stat()}) to get the file information, in case the file is a symbolic link. If there's an error, it sets @code{ERRNO} and returns: -@c comment made multiline for page breaking @example /* file is first arg, array to hold results is second */ - file = get_scalar_argument(0, FALSE); - array = get_array_argument(1, FALSE); + if ( ! get_argument(0, AWK_STRING, & file_param) + || ! get_argument(1, AWK_ARRAY, & array_param)) @{ + warning(ext_id, _("stat: bad parameters")); + return make_number(-1, result); + @} - /* empty out the array */ - assoc_clear(array); + name = file_param.str_value.str; + array = array_param.array_cookie; + + /* always empty out the array */ + clear_array(array); /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); + ret = lstat(name, & sbuf); if (ret < 0) @{ update_ERRNO_int(errno); - return make_number((AWKNUM) ret); + return make_number(ret, result); @} @end example -Now comes the tedious part: filling in the array. 
Only a few of the -calls are shown here, since they all follow the same pattern: +The tedious work is done by @code{fill_stat_array()}, shown +earlier. When done, return the result from @code{fill_stat_array()}: @example - /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4)); - *aptr = dupnode(file); - unref(tmp); + ret = fill_stat_array(name, array, & sbuf); - aptr = assoc_lookup(array, tmp = make_string("mode", 4)); - *aptr = make_number((AWKNUM) sbuf.st_mode); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); - unref(tmp); + return make_number(ret, result); +@} @end example -When done, return the @code{lstat()} return value: +@cindex programming conventions, @command{gawk} internals +Finally, it's necessary to provide the ``glue'' that loads the +new function(s) into @command{gawk}. + +The @code{filefuncs} extension also provides an @code{fts()} +function, which we omit here. For its sake there is an initialization +function: @example +/* init_filefuncs --- initialization routine */ - return make_number((AWKNUM) ret); +static awk_bool_t +init_filefuncs(void) +@{ + @dots{} @} @end example -@cindex programming conventions, @command{gawk} internals -Finally, it's necessary to provide the ``glue'' that loads the -new function(s) into @command{gawk}. By convention, each library has -a routine named @code{dl_load()} that does the job. The simplest way -is to use the @code{dl_load_func} macro in @code{gawkapi.h}. +We are almost done. We need an array of @code{awk_ext_func_t} +structures for loading each function into @command{gawk}: + +@example +static awk_ext_func_t func_table[] = @{ + @{ "chdir", do_chdir, 1 @}, + @{ "stat", do_stat, 2 @}, + @{ "fts", do_fts, 3 @}, +@}; +@end example + +Each extension must have a routine named @code{dl_load()} to load +everything that needs to be loaded. 
It is simplest to use the +@code{dl_load_func()} macro in @code{gawkapi.h}: + +@example +/* define the dl_load() function using the boilerplate macro */ + +dl_load_func(func_table, filefuncs, "") +@end example And that's it! As an exercise, consider adding functions to implement system calls such as @code{chown()}, @code{chmod()}, and @code{umask()}. @node Using Internal File Ops -@subsection Integrating the Extensions +@subsection Integrating The Extensions @cindex @command{gawk}, interpreter@comma{} adding code to Now that the code is written, it must be possible to add it at runtime to the running @command{gawk} interpreter. First, the code must be compiled. Assuming that the functions are in a file named @file{filefuncs.c}, and @var{idir} is the location -of the @command{gawk} include files, -the following steps create -a GNU/Linux shared library: +of the @file{gawkapi.h} header file, +the following steps@footnote{In practice, you would probably want to +use the GNU Autotools---Automake, Autoconf, Libtool, and Gettext---to +configure and build your libraries. Instructions for doing so are beyond +the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to +the tools.} create a GNU/Linux shared library: @example $ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} -$ @kbd{ld -o filefuncs.so -shared filefuncs.o} +$ @kbd{ld -o filefuncs.so -shared filefuncs.o -lc} @end example -@cindex @code{extension()} function (@command{gawk}) -Once the library exists, it is loaded by calling the @code{extension()} -built-in function. -This function takes two arguments: the name of the -library to load and the name of a function to call when the library -is first loaded. This function adds the new functions to @command{gawk}. -It returns the value returned by the initialization function -within the shared library: +Once the library exists, it is loaded by using the @code{@@load} keyword. 
@example # file testff.awk +@@load "filefuncs" + BEGIN @{ - extension("./filefuncs.so", "dl_load") + "pwd" | getline curdir # save current directory + close("pwd") - chdir(".") # no-op + chdir("/tmp") + system("pwd") # test it + chdir(curdir) # go back - data[1] = 1 # force `data' to be an array print "Info for testff.awk" ret = stat("testff.awk", data) print "ret =", ret @@ -28476,40 +31182,705 @@ BEGIN @{ @} @end example -Here are the results of running the program: +The @env{AWKLIBPATH} environment variable tells +@command{gawk} where to find shared libraries (@pxref{Finding Extensions}). +We set it to the current directory and run the program: @example -$ @kbd{gawk -f testff.awk} +$ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk} +@print{} /tmp @print{} Info for testff.awk @print{} ret = 0 -@print{} data["size"] = 607 -@print{} data["ino"] = 14945891 -@print{} data["name"] = testff.awk -@print{} data["pmode"] = -rw-rw-r-- -@print{} data["nlink"] = 1 -@print{} data["atime"] = 1293993369 -@print{} data["mtime"] = 1288520752 -@print{} data["mode"] = 33204 @print{} data["blksize"] = 4096 -@print{} data["dev"] = 2054 +@print{} data["mtime"] = 1350838628 +@print{} data["mode"] = 33204 @print{} data["type"] = file -@print{} data["gid"] = 500 -@print{} data["uid"] = 500 +@print{} data["dev"] = 2053 +@print{} data["gid"] = 1000 +@print{} data["ino"] = 1719496 +@print{} data["ctime"] = 1350838628 @print{} data["blocks"] = 8 -@print{} data["ctime"] = 1290113572 -@print{} testff.awk modified: 10 31 10 12:25:52 +@print{} data["nlink"] = 1 +@print{} data["name"] = testff.awk +@print{} data["atime"] = 1350838632 +@print{} data["pmode"] = -rw-rw-r-- +@print{} data["size"] = 662 +@print{} data["uid"] = 1000 +@print{} testff.awk modified: 10 21 12 18:57:08 @print{} @print{} Info for JUNK @print{} ret = -1 @print{} JUNK modified: 01 01 70 02:00:00 @end example -@c ENDOFRANGE filre -@c ENDOFRANGE dirch -@c ENDOFRANGE statg -@c ENDOFRANGE chdirg -@c ENDOFRANGE gladfgaw -@c ENDOFRANGE 
adfugaw -@c ENDOFRANGE fubadgaw + +@node Extension Samples +@section The Sample Extensions In The @command{gawk} Distribution + +This @value{SECTION} provides brief overviews of the sample extensions +that come in the @command{gawk} distribution. Some of them are intended +for production use, such as the @code{filefuncs} and @code{readdir} extensions. +Others mainly provide example code that shows how to use the extension API. + +@menu +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to @code{fnmatch()}. +* Extension Sample Fork:: An interface to @code{fork()} and other + process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to @code{readdir()}. +* Extension Sample Revout:: Reversing output sample output wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to @code{gettimeofday()} + and @code{sleep()}. +@end menu + +@node Extension Sample File Functions +@subsection File Related Functions + +The @code{filefuncs} extension provides three different functions. +The usage is as follows: + +@table @code +@item @@load "filefuncs" +This is how you load the extension. + +@item result = chdir("/some/directory") +The @code{chdir()} function is a direct hook to the @code{chdir()} +system call to change the current directory. It returns zero +upon success or less than zero upon error. In the latter case it updates +@code{ERRNO}. + +@item result = stat("/some/path", statdata) +The @code{stat()} function provides a hook into the +@code{stat()} system call. In fact, it uses @code{lstat()}. +It returns zero upon success or less than zero upon error. +In the latter case it updates @code{ERRNO}. 
+ +In all cases, it clears the @code{statdata} array. +When the call is successful, @code{stat()} fills the @code{statdata} +array with information retrieved from the filesystem, as follows: + +@c nested table +@multitable @columnfractions .25 .60 +@item @code{statdata["name"]} @tab +The name of the file. + +@item @code{statdata["dev"]} @tab +Corresponds to the @code{st_dev} field in the @code{struct stat}. + +@item @code{statdata["ino"]} @tab +Corresponds to the @code{st_ino} field in the @code{struct stat}. + +@item @code{statdata["mode"]} @tab +Corresponds to the @code{st_mode} field in the @code{struct stat}. + +@item @code{statdata["nlink"]} @tab +Corresponds to the @code{st_nlink} field in the @code{struct stat}. + +@item @code{statdata["uid"]} @tab +Corresponds to the @code{st_uid} field in the @code{struct stat}. + +@item @code{statdata["gid"]} @tab +Corresponds to the @code{st_gid} field in the @code{struct stat}. + +@item @code{statdata["size"]} @tab +Corresponds to the @code{st_size} field in the @code{struct stat}. + +@item @code{statdata["atime"]} @tab +Corresponds to the @code{st_atime} field in the @code{struct stat}. + +@item @code{statdata["mtime"]} @tab +Corresponds to the @code{st_mtime} field in the @code{struct stat}. + +@item @code{statdata["ctime"]} @tab +Corresponds to the @code{st_ctime} field in the @code{struct stat}. + +@item @code{statdata["rdev"]} @tab +Corresponds to the @code{st_rdev} field in the @code{struct stat}. +This element is only present for device files. + +@item @code{statdata["major"]} @tab +The device major number, extracted from the @code{st_rdev} field. +This element is only present for device files. + +@item @code{statdata["minor"]} @tab +The device minor number, extracted from the @code{st_rdev} field. +This element is only present for device files. + +@item @code{statdata["blksize"]} @tab +Corresponds to the @code{st_blksize} field in the @code{struct stat}, +if this field is present on your system.
+(It is present on all modern systems that we know of.) + +@item @code{statdata["pmode"]} @tab +A human-readable version of the mode value, such as that printed by +@command{ls}. For example, @code{"-rwxr-xr-x"}. + +@item @code{statdata["linkval"]} @tab +If the named file is a symbolic link, this element will exist +and its value is the value of the symbolic link (where the +symbolic link points to). + +@item @code{statdata["type"]} @tab +The type of the file as a string. One of +@code{"file"}, +@code{"blockdev"}, +@code{"chardev"}, +@code{"directory"}, +@code{"socket"}, +@code{"fifo"}, +@code{"symlink"}, +@code{"door"}, +or +@code{"unknown"}. +Not all systems support all file types. +@end multitable + +@item flags = or(FTS_PHYSICAL, ...) +@itemx result = fts(pathlist, flags, filedata) +Walk the file trees provided in @code{pathlist} and fill in the +@code{filedata} array as described below. @code{flags} is the bitwise +OR of several predefined constant values, also as described below. +Return zero if there were no errors, otherwise return @minus{}1. +@end table + +The @code{fts()} function provides a hook to the C library @code{fts()} +routines for traversing file hierarchies. Instead of returning data +about one file at a time in a stream, it fills in a multi-dimensional +array with data about each file and directory encountered in the requested +hierarchies. + +The arguments are as follows: + +@table @code +@item pathlist +An array of filenames. The element values are used; the index values are ignored. + +@item flags +This should be the bitwise OR of one or more of the following +predefined constant flag values. At least one of +@code{FTS_LOGICAL} or @code{FTS_PHYSICAL} must be provided; otherwise +@code{fts()} returns an error value and sets @code{ERRNO}.
+The flags are: + +@c nested table +@table @code +@item FTS_LOGICAL +Do a ``logical'' file traversal, where the information returned for +a symbolic link refers to the linked-to file, and not to the symbolic +link itself. This flag is mutually exclusive with @code{FTS_PHYSICAL}. + +@item FTS_PHYSICAL +Do a ``physical'' file traversal, where the information returned for a +symbolic link refers to the symbolic link itself. This flag is mutually +exclusive with @code{FTS_LOGICAL}. + +@item FTS_NOCHDIR +As a performance optimization, the C library @code{fts()} routines +change directory as they traverse a file hierarchy. This flag disables +that optimization. + +@item FTS_COMFOLLOW +Immediately follow a symbolic link named in @code{pathlist}, +whether or not @code{FTS_LOGICAL} is set. + +@item FTS_SEEDOT +By default, the @code{fts()} routines do not return entries for @file{.} +and @file{..}. This option causes entries for @file{..} to also +be included. (The extension always includes an entry for @file{.}, +see below.) + +@item FTS_XDEV +During a traversal, do not cross onto a different mounted filesystem. +@end table + +@item filedata +The @code{filedata} array is first cleared. Then, @code{fts()} creates +an element in @code{filedata} for every element in @code{pathlist}. +The index is the name of the directory or file given in @code{pathlist}. +The element for this index is itself an array. There are two cases. + +@c nested table +@table @emph +@item The path is a file. +In this case, the array contains two or three elements: + +@c doubly nested table +@table @code +@item "path" +The full path to this file, starting from the ``root'' that was given +in the @code{pathlist} array. + +@item "stat" +This element is itself an array, containing the same information as provided +by the @code{stat()} function described earlier for its +@code{statdata} argument. The element may not be present if +the @code{stat()} system call for the file failed. 
+ +@item "error" +If some kind of error was encountered, the array will also +contain an element named @code{"error"}, which is a string describing the error. +@end table + +@item The path is a directory. +In this case, the array contains one element for each entry in the +directory. If an entry is a file, that element is as for files, just +described. If the entry is a directory, that element is (recursively) +an array describing the subdirectory. If @code{FTS_SEEDOT} was provided +in the flags, then there will also be an element named @code{".."}. This +element will be an array containing the data as provided by @code{stat()}. + +In addition, there will be an element whose index is @code{"."}. +This element is an array containing the same two or three elements as +for a file: @code{"path"}, @code{"stat"}, and @code{"error"}. +@end table +@end table + +The @code{fts()} function returns zero if there were no errors. +Otherwise it returns @minus{}1. + +@quotation NOTE +The @code{fts()} extension does not exactly mimic the +interface of the C library @code{fts()} routines, choosing instead to +provide an interface that is based on associative arrays, which should +be more comfortable to use from an @command{awk} program. This includes the +lack of a comparison function, since @command{gawk} already provides +powerful array sorting facilities. While an @code{fts_read()}-like +interface could have been provided, this felt less natural than simply +creating a multi-dimensional array to represent the file hierarchy and +its information. +@end quotation + +See @file{test/fts.awk} in the @command{gawk} distribution for an example. + +@node Extension Sample Fnmatch +@subsection Interface To @code{fnmatch()} + +This extension provides an interface to the C library +@code{fnmatch()} function.
The usage is: + +@example +@@load "fnmatch" + +result = fnmatch(pattern, string, flags) +@end example + +The @code{fnmatch} extension adds a single function named +@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of +flag values named @code{FNM}. + +The arguments to @code{fnmatch()} are: + +@table @code +@item pattern +The filename wildcard to match. + +@item string +The filename string. + +@item flags +Either zero, or the bitwise OR of one or more of the +flags in the @code{FNM} array. +@end table + +The return value is zero if the string matched the pattern, @code{FNM_NOMATCH} +if the string did not match the pattern, or +a different non-zero value if an error occurred. + +The flags are as follows: + +@multitable @columnfractions .25 .75 +@item @code{FNM["CASEFOLD"]} @tab +Corresponds to the @code{FNM_CASEFOLD} flag as defined in @code{fnmatch()}. + +@item @code{FNM["FILE_NAME"]} @tab +Corresponds to the @code{FNM_FILE_NAME} flag as defined in @code{fnmatch()}. + +@item @code{FNM["LEADING_DIR"]} @tab +Corresponds to the @code{FNM_LEADING_DIR} flag as defined in @code{fnmatch()}. + +@item @code{FNM["NOESCAPE"]} @tab +Corresponds to the @code{FNM_NOESCAPE} flag as defined in @code{fnmatch()}. + +@item @code{FNM["PATHNAME"]} @tab +Corresponds to the @code{FNM_PATHNAME} flag as defined in @code{fnmatch()}. + +@item @code{FNM["PERIOD"]} @tab +Corresponds to the @code{FNM_PERIOD} flag as defined in @code{fnmatch()}. +@end multitable + +Here is an example: + +@example +@@load "fnmatch" +@dots{} +flags = or(FNM["PERIOD"], FNM["NOESCAPE"]) +if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH) + print "no match" +@end example + +@node Extension Sample Fork +@subsection Interface To @code{fork()}, @code{wait()} and @code{waitpid()} + +The @code{fork} extension adds three functions, as follows. + +@table @code +@item @@load "fork" +This is how you load the extension. + +@item pid = fork() +This function creates a new process.
The return value is zero in the +child and the process-id number of the child in the parent, or @minus{}1 +upon error. In the latter case, @code{ERRNO} indicates the problem. +In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are +updated to reflect the correct values. + +@item ret = waitpid(pid) +This function takes a numeric argument, which is the process-id to +wait for. The return value is that of the +@code{waitpid()} system call. + +@item ret = wait() +This function waits for the first child to die. +The return value is that of the +@code{wait()} system call. +@end table + +There is no corresponding @code{exec()} function. + +Here is an example: + +@example +@@load "fork" +@dots{} +if ((pid = fork()) == 0) + print "hello from the child" +else + print "hello from the parent" +@end example + +@node Extension Sample Ord +@subsection Character and Numeric Values: @code{ord()} and @code{chr()} + +The @code{ordchr} extension adds two functions, named +@code{ord()} and @code{chr()}, as follows. + +@table @code +@item number = ord(string) +Return the numeric value of the first character in @code{string}. + +@item char = chr(number) +Return the string whose first character is that represented by @code{number}. +@end table + +These functions are inspired by the Pascal language functions +of the same name. Here is an example: + +@example +@@load "ordchr" +@dots{} +printf("The numeric value of 'A' is %d\n", ord("A")) +printf("The string value of 65 is %s\n", chr(65)) +@end example + +@node Extension Sample Readdir +@subsection Reading Directories + +The @code{readdir} extension adds an input parser for directories, and +adds a single function named @code{readdir_do_ftype()}.
+The usage is as follows: + +@example +@@load "readdir" + +readdir_do_ftype("stat") # or "dirent" or "never" +@end example + +When this extension is in use, instead of skipping directories named +on the command line (or with @code{getline}), +they are read, with each entry returned as a record. + +The record consists of at least two fields: the inode number and the +filename, separated by a forward slash character. +On systems where the directory entry contains the file type, the record +has a third field which is a single letter indicating the type of the +file: + +@multitable @columnfractions .1 .9 +@headitem Letter @tab File Type +@item @code{b} @tab Block device +@item @code{c} @tab Character device +@item @code{d} @tab Directory +@item @code{f} @tab Regular file +@item @code{l} @tab Symbolic link +@item @code{p} @tab Named pipe (FIFO) +@item @code{s} @tab Socket +@item @code{u} @tab Anything else (unknown) +@end multitable + +On systems without the file type information, calling +@samp{readdir_do_ftype("stat")} causes the extension to use the +@code{lstat()} system call to retrieve the appropriate information. This +is not the default, since @code{lstat()} is a potentially expensive +operation. By calling @samp{readdir_do_ftype("never")} one can ensure +that the file type information is never displayed, even when readily +available in the directory entry. + +The third option, @samp{readdir_do_ftype("dirent")}, takes file type +information from the directory entry, if it is available. This is the +default on systems that supply this information. + +The @code{readdir_do_ftype()} function sets @code{ERRNO} if called +without arguments or with invalid arguments. + +@quotation NOTE +On GNU/Linux systems, there are filesystems that don't support the +@code{d_type} entry (see the @i{readdir}(3) manual page), and so the file +type is always @samp{u}. Therefore, using @samp{readdir_do_ftype("stat")} +is advisable even on GNU/Linux systems. 
In this case, the @code{readdir} +extension falls back to using @code{lstat()} when it encounters an +unknown file type. +@end quotation + +Here is an example: + +@example +@@load "readdir" +@dots{} +BEGIN @{ FS = "/" @} +@{ print "file name is", $2 @} +@end example + +@node Extension Sample Revout +@subsection Reversing Output + +The @code{revoutput} extension adds a simple output wrapper that reverses +the characters in each output line. Its main purpose is to show how to +write an output wrapper, although it may be mildly amusing for the unwary. +Here is an example: + +@example +@@load "revoutput" + +BEGIN @{ + REVOUT = 1 + print "hello, world" > "/dev/stdout" +@} +@end example + +The output from this program is: +@samp{dlrow ,olleh}. + +@node Extension Sample Rev2way +@subsection Two-Way I/O Example + +The @code{revtwoway} extension adds a simple two-way processor that +reverses the characters in each line sent to it for reading back by +the @command{awk} program. Its main purpose is to show how to write +a two-way processor, although it may also be mildly amusing. +The following example shows how to use it: + +@example +@@load "revtwoway" + +BEGIN @{ + cmd = "/magic/mirror" + print "hello, world" |& cmd + cmd |& getline result + print result + close(cmd) +@} +@end example + +@node Extension Sample Read write array +@subsection Dumping and Restoring An Array + +The @code{rwarray} extension adds two functions, +named @code{writea()} and @code{reada()}, as follows: + +@table @code +@item ret = writea(file, array) +This function takes a string argument, which is the name of the file +to which to dump the array, and the array itself as the second argument. +@code{writea()} understands multidimensional arrays. It returns one on +success, or zero upon failure. + +@item ret = reada(file, array) +@code{reada()} is the inverse of @code{writea()}; +it reads the file named as its first argument, filling in +the array named as the second argument. It clears the array first.
+Here too, the return value is one on success and zero upon failure. +@end table + +The array created by @code{reada()} is identical to that written by +@code{writea()} in the sense that the contents are the same. However, +due to implementation issues, the array traversal order of the recreated +array is likely to be different from that of the original array. As array +traversal order in @command{awk} is by default undefined, this is not +(technically) a problem. If you need to guarantee a particular traversal +order, use the array sorting features in @command{gawk} to do so +(@pxref{Array Sorting}). + +The file contains binary data. All integral values are written in network +byte order. However, double precision floating-point values are written +as native binary data. Thus, arrays containing only string data can +theoretically be dumped on systems with one byte order and restored on +systems with a different one, but this has not been tried. + +Here is an example: + +@example +@@load "rwarray" +@dots{} +ret = writea("arraydump.bin", array) +@dots{} +ret = reada("arraydump.bin", array) +@end example + +@node Extension Sample Readfile +@subsection Reading An Entire File + +The @code{readfile} extension adds a single function +named @code{readfile()}: + +@table @code +@item result = readfile("/some/path") +The argument is the name of the file to read. The return value is a +string containing the entire contents of the requested file. Upon error, +the function returns the empty string and sets @code{ERRNO}. +@end table + +Here is an example: + +@example +@@load "readfile" +@dots{} +contents = readfile("/path/to/file"); +if (contents == "" && ERRNO != "") @{ + print("problem reading file", ERRNO) > "/dev/stderr" + ... +@} +@end example + +@node Extension Sample API Tests +@subsection API Tests + +The @code{testext} extension exercises parts of the extension API that +are not tested by the other samples. 
The @file{extension/testext.c} +file contains both the C code for the extension and @command{awk} +test code inside C comments that run the tests. The testing framework +extracts the @command{awk} code and runs the tests. See the source file +for more information. + +@node Extension Sample Time +@subsection Extension Time Functions + +@cindex time +@cindex sleep + +These functions can be used by either invoking @command{gawk} +with a command-line argument of @samp{-l time} or by +inserting @samp{@@load "time"} in your script. + +@table @code + +@cindex @code{gettimeofday} time extension function +@item the_time = gettimeofday() +Return the time in seconds that has elapsed since 1970-01-01 UTC as a +floating point value. If the time is unavailable on this platform, return +@minus{}1 and set @code{ERRNO}. The returned time should have sub-second +precision, but the actual precision will vary based on the platform. +If the standard C @code{gettimeofday()} system call is available on this +platform, then it simply returns the value. Otherwise, if on Windows, +it tries to use @code{GetSystemTimeAsFileTime()}. + +@cindex @code{sleep} time extension function +@item result = sleep(@var{seconds}) +Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative, +or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}. +Otherwise, return zero after sleeping for the indicated amount of time. +Note that @var{seconds} may be a floating-point (non-integral) value. +Implementation details: depending on platform availability, this function +tries to use @code{nanosleep()} or @code{select()} to implement the delay. +@end table + +@node gawkextlib +@section The @code{gawkextlib} Project + +The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}} +project provides a number of @command{gawk} extensions, including one for +processing XML files. This is the evolution of the original @command{xgawk} +(XML @command{gawk}) project. 
+ +As of this writing, there are four extensions: + +@itemize @bullet +@item +XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} +XML parsing library. + +@item +PostgreSQL extension. + +@item +GD graphics library extension. + +@item +MPFR library extension. +This provides access to a number of MPFR functions which @command{gawk}'s +native MPFR support does not. +@end itemize + +The @code{time} extension described earlier (@pxref{Extension Sample +Time}) was originally from this project but has been moved into the +main @command{gawk} distribution. + +You can check out the code for the @code{gawkextlib} project +using the @uref{http://git-scm.com, Git} distributed source +code control system. The command is as follows: + +@example +git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code +@end example + +You will need to have the @uref{http://expat.sourceforge.net, Expat} +XML parser library installed in order to build and use the XML extension. + +In addition, you must have the GNU Autotools installed +(@uref{http://www.gnu.org/software/autoconf, Autoconf}, +@uref{http://www.gnu.org/software/automake, Automake}, +@uref{http://www.gnu.org/software/libtool, Libtool}, +and +@uref{http://www.gnu.org/software/gettext, Gettext}). + +The simple recipe for building and testing @code{gawkextlib} is as follows.
+First, build and install @command{gawk}: + +@example +cd .../path/to/gawk/code +./configure --prefix=/tmp/newgawk @ii{Install in /tmp/newgawk for now} +make && make check @ii{Build and check that all is OK} +make install @ii{Install gawk} +@end example + +Next, build @code{gawkextlib} and test it: + +@example +cd .../path/to/gawkextlib-code +./update-autotools @ii{Generate configure, etc.} + @ii{You may have to run this command twice} +./configure --with-gawk=/tmp/newgawk @ii{Configure, point at ``installed'' gawk} +make && make check @ii{Build and check that all is OK} +@end example + +If you write an extension that you wish to share with other +@command{gawk} users, please consider doing so through the +@code{gawkextlib} project. + @ignore @c Try this