Diffstat (limited to 'doc/gawk.info')
-rw-r--r-- | doc/gawk.info | 5197 |
1 files changed, 4204 insertions, 993 deletions
diff --git a/doc/gawk.info b/doc/gawk.info index baa064d7..73fac121 100644 --- a/doc/gawk.info +++ b/doc/gawk.info @@ -113,419 +113,531 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * GNU Free Documentation License:: The license for this Info file. * Index:: Concept and Variable Index. -* History:: The history of `gawk' and - `awk'. -* Names:: What name to use to find `awk'. -* This Manual:: Using this Info file. Includes - sample input files that you can use. -* Conventions:: Typographical Conventions. -* Manual History:: Brief history of the GNU project and this - Info file. -* How To Contribute:: Helping to save the world. -* Acknowledgments:: Acknowledgments. -* Running gawk:: How to run `gawk' programs; - includes command-line syntax. -* One-shot:: Running a short throwaway `awk' - program. -* Read Terminal:: Using no input files (input from terminal - instead). -* Long:: Putting permanent `awk' programs in - files. -* Executable Scripts:: Making self-contained `awk' - programs. -* Comments:: Adding documentation to `gawk' - programs. -* Quoting:: More discussion of shell quoting issues. -* DOS Quoting:: Quoting in Windows Batch Files. -* Sample Data Files:: Sample data files for use in the - `awk' programs illustrated in this - Info file. -* Very Simple:: A very simple example. -* Two Rules:: A less simple one-line example using two - rules. -* More Complex:: A more complex example. -* Statements/Lines:: Subdividing or combining statements into - lines. -* Other Features:: Other Features of `awk'. -* When:: When to use `gawk' and when to use - other things. -* Command Line:: How to run `awk'. -* Options:: Command-line options and their meanings. -* Other Arguments:: Input file names and variable assignments. -* Naming Standard Input:: How to specify standard input with other - files. -* Environment Variables:: The environment variables `gawk' - uses. -* AWKPATH Variable:: Searching directories for `awk' - programs. 
-* AWKLIBPATH Variable:: Searching directories for `awk' - shared libraries. -* Other Environment Variables:: The environment variables. -* Exit Status:: `gawk''s exit status. -* Include Files:: Including other files into your program. -* Loading Shared Libraries:: Loading shared libraries into your program. -* Obsolete:: Obsolete Options and/or features. -* Undocumented:: Undocumented Options and Features. -* Regexp Usage:: How to Use Regular Expressions. -* Escape Sequences:: How to write nonprinting characters. -* Regexp Operators:: Regular Expression Operators. -* Bracket Expressions:: What can go between `[...]'. -* GNU Regexp Operators:: Operators specific to GNU software. -* Case-sensitivity:: How to do case-insensitive matching. -* Leftmost Longest:: How much text matches. -* Computed Regexps:: Using Dynamic Regexps. -* Records:: Controlling how data is split into records. -* Fields:: An introduction to fields. -* Nonconstant Fields:: Nonconstant Field Numbers. -* Changing Fields:: Changing the Contents of a Field. -* Field Separators:: The field separator and how to change it. -* Default Field Splitting:: How fields are normally separated. -* Regexp Field Splitting:: Using regexps as the field separator. -* Single Character Fields:: Making each character a separate field. -* Command Line Field Separator:: Setting `FS' from the command-line. -* Field Splitting Summary:: Some final points and a summary table. -* Constant Size:: Reading constant width data. -* Splitting By Content:: Defining Fields By Content -* Multiple Line:: Reading multi-line records. -* Getline:: Reading files under explicit program - control using the `getline' function. -* Plain Getline:: Using `getline' with no arguments. -* Getline/Variable:: Using `getline' into a variable. -* Getline/File:: Using `getline' from a file. -* Getline/Variable/File:: Using `getline' into a variable from a - file. -* Getline/Pipe:: Using `getline' from a pipe. 
-* Getline/Variable/Pipe:: Using `getline' into a variable from a - pipe. -* Getline/Coprocess:: Using `getline' from a coprocess. -* Getline/Variable/Coprocess:: Using `getline' into a variable from a - coprocess. -* Getline Notes:: Important things to know about - `getline'. -* Getline Summary:: Summary of `getline' Variants. -* Read Timeout:: Reading input with a timeout. -* Command line directories:: What happens if you put a directory on the - command line. -* Print:: The `print' statement. -* Print Examples:: Simple examples of `print' statements. -* Output Separators:: The output separators and how to change - them. -* OFMT:: Controlling Numeric Output With - `print'. -* Printf:: The `printf' statement. -* Basic Printf:: Syntax of the `printf' statement. -* Control Letters:: Format-control letters. -* Format Modifiers:: Format-specification modifiers. -* Printf Examples:: Several examples. -* Redirection:: How to redirect output to multiple files - and pipes. -* Special Files:: File name interpretation in `gawk'. - `gawk' allows access to inherited - file descriptors. -* Special FD:: Special files for I/O. -* Special Network:: Special files for network communications. -* Special Caveats:: Things to watch out for. -* Close Files And Pipes:: Closing Input and Output Files and Pipes. -* Values:: Constants, Variables, and Regular - Expressions. -* Constants:: String, numeric and regexp constants. -* Scalar Constants:: Numeric and string constants. -* Nondecimal-numbers:: What are octal and hex numbers. -* Regexp Constants:: Regular Expression constants. -* Using Constant Regexps:: When and how to use a regexp constant. -* Variables:: Variables give names to values for later - use. -* Using Variables:: Using variables in your programs. -* Assignment Options:: Setting variables on the command-line and a - summary of command-line syntax. This is an - advanced method of input. -* Conversion:: The conversion of strings to numbers and - vice versa. 
-* All Operators:: `gawk''s operators. -* Arithmetic Ops:: Arithmetic operations (`+', `-', - etc.) -* Concatenation:: Concatenating strings. -* Assignment Ops:: Changing the value of a variable or a - field. -* Increment Ops:: Incrementing the numeric value of a - variable. -* Truth Values and Conditions:: Testing for true and false. -* Truth Values:: What is ``true'' and what is ``false''. -* Typing and Comparison:: How variables acquire types and how this - affects comparison of numbers and strings - with `<', etc. -* Variable Typing:: String type versus numeric type. -* Comparison Operators:: The comparison operators. -* POSIX String Comparison:: String comparison with POSIX rules. -* Boolean Ops:: Combining comparison expressions using - boolean operators `||' (``or''), - `&&' (``and'') and `!' (``not''). -* Conditional Exp:: Conditional expressions select between two - subexpressions under control of a third - subexpression. -* Function Calls:: A function call is an expression. -* Precedence:: How various operators nest. -* Locales:: How the locale affects things. -* Pattern Overview:: What goes into a pattern. -* Regexp Patterns:: Using regexps as patterns. -* Expression Patterns:: Any expression can be used as a pattern. -* Ranges:: Pairs of patterns specify record ranges. -* BEGIN/END:: Specifying initialization and cleanup - rules. -* Using BEGIN/END:: How and why to use BEGIN/END rules. -* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. -* BEGINFILE/ENDFILE:: Two special patterns for advanced control. -* Empty:: The empty pattern, which matches every - record. -* Using Shell Variables:: How to use shell variables with - `awk'. -* Action Overview:: What goes into an action. -* Statements:: Describes the various control statements in - detail. -* If Statement:: Conditionally execute some `awk' - statements. -* While Statement:: Loop until some condition is satisfied. 
-* Do Statement:: Do specified action while looping until - some condition is satisfied. -* For Statement:: Another looping statement, that provides - initialization and increment clauses. -* Switch Statement:: Switch/case evaluation for conditional - execution of statements based on a value. -* Break Statement:: Immediately exit the innermost enclosing - loop. -* Continue Statement:: Skip to the end of the innermost enclosing - loop. -* Next Statement:: Stop processing the current input record. -* Nextfile Statement:: Stop processing the current file. -* Exit Statement:: Stop execution of `awk'. -* Built-in Variables:: Summarizes the built-in variables. -* User-modified:: Built-in variables that you change to - control `awk'. -* Auto-set:: Built-in variables where `awk' - gives you information. -* ARGC and ARGV:: Ways to use `ARGC' and `ARGV'. -* Array Basics:: The basics of arrays. -* Array Intro:: Introduction to Arrays -* Reference to Elements:: How to examine one element of an array. -* Assigning Elements:: How to change an element of an array. -* Array Example:: Basic Example of an Array -* Scanning an Array:: A variation of the `for' statement. It - loops through the indices of an array's - existing elements. -* Controlling Scanning:: Controlling the order in which arrays are - scanned. -* Delete:: The `delete' statement removes an - element from an array. -* Numeric Array Subscripts:: How to use numbers as subscripts in - `awk'. -* Uninitialized Subscripts:: Using Uninitialized variables as - subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in - `awk'. -* Multi-scanning:: Scanning multidimensional arrays. -* Arrays of Arrays:: True multidimensional arrays. -* Built-in:: Summarizes the built-in functions. -* Calling Built-in:: How to call built-in functions. -* Numeric Functions:: Functions that work with numbers, including - `int()', `sin()' and - `rand()'. 
-* String Functions:: Functions for string manipulation, such as - `split()', `match()' and - `sprintf()'. -* Gory Details:: More than you want to know about `\' - and `&' with `sub()', - `gsub()', and `gensub()'. -* I/O Functions:: Functions for files and shell commands. -* Time Functions:: Functions for dealing with timestamps. -* Bitwise Functions:: Functions for bitwise operations. -* Type Functions:: Functions for type information. -* I18N Functions:: Functions for string translation. -* User-defined:: Describes User-defined functions in detail. -* Definition Syntax:: How to write definitions and what they - mean. -* Function Example:: An example function definition and what it - does. -* Function Caveats:: Things to watch out for. -* Calling A Function:: Don't use spaces. -* Variable Scope:: Controlling variable scope. -* Pass By Value/Reference:: Passing parameters. -* Return Statement:: Specifying the value a function returns. -* Dynamic Typing:: How variable types can change at runtime. -* Indirect Calls:: Choosing the function to call at runtime. -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU `gettext' works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging `printf' arguments. -* I18N Portability:: `awk'-level portability issues. -* I18N Example:: A simple i18n example. -* Gawk I18N:: `gawk' is also internationalized. -* Nondecimal Data:: Allowing nondecimal input data. -* Array Sorting:: Facilities for controlling array traversal - and sorting arrays. -* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. -* Array Sorting Functions:: How to use `asort()' and - `asorti()'. -* Two-way I/O:: Two-way communications with another - process. -* TCP/IP Networking:: Using `gawk' for network - programming. -* Profiling:: Profiling your `awk' programs. 
-* Library Names:: How to best name private global variables - in library functions. -* General Functions:: Functions that are of general use. -* Strtonum Function:: A replacement for the built-in - `strtonum()' function. -* Assert Function:: A function for assertions in `awk' - programs. -* Round Function:: A function for rounding if `sprintf()' - does not do it correctly. -* Cliff Random Function:: The Cliff Random Number Generator. -* Ordinal Functions:: Functions for using characters as numbers - and vice versa. -* Join Function:: A function to join an array into a string. -* Getlocaltime Function:: A function to get formatted times. -* Data File Management:: Functions for managing command-line data - files. -* Filetrans Function:: A function for handling data file - transitions. -* Rewind Function:: A function for rereading the current file. -* File Checking:: Checking that data files are readable. -* Empty Files:: Checking for zero-length files. -* Ignoring Assigns:: Treating assignments as file names. -* Getopt Function:: A function for processing command-line - arguments. -* Passwd Functions:: Functions for getting user information. -* Group Functions:: Functions for getting group information. -* Walking Arrays:: A function to walk arrays of arrays. -* Running Examples:: How to run these examples. -* Clones:: Clones of common utilities. -* Cut Program:: The `cut' utility. -* Egrep Program:: The `egrep' utility. -* Id Program:: The `id' utility. -* Split Program:: The `split' utility. -* Tee Program:: The `tee' utility. -* Uniq Program:: The `uniq' utility. -* Wc Program:: The `wc' utility. -* Miscellaneous Programs:: Some interesting `awk' programs. -* Dupword Program:: Finding duplicated words in a document. -* Alarm Program:: An alarm clock. -* Translate Program:: A program similar to the `tr' - utility. -* Labels Program:: Printing mailing labels. -* Word Sorting:: A program to produce a word usage count. 
-* History Sorting:: Eliminating duplicate entries from a - history file. -* Extract Program:: Pulling out programs from Texinfo source - files. -* Simple Sed:: A Simple Stream Editor. -* Igawk Program:: A wrapper for `awk' that includes - files. -* Anagram Program:: Finding anagrams from a dictionary. -* Signature Program:: People do amazing things with too much time - on their hands. -* Debugging:: Introduction to `gawk' debugger. -* Debugging Concepts:: Debugging in General. -* Debugging Terms:: Additional Debugging Concepts. -* Awk Debugging:: Awk Debugging. -* Sample Debugging Session:: Sample debugging session. -* Debugger Invocation:: How to Start the Debugger. -* Finding The Bug:: Finding the Bug. -* List of Debugger Commands:: Main debugger commands. -* Breakpoint Control:: Control of Breakpoints. -* Debugger Execution Control:: Control of Execution. -* Viewing And Changing Data:: Viewing and Changing Data. -* Execution Stack:: Dealing with the Stack. -* Debugger Info:: Obtaining Information about the Program and - the Debugger State. -* Miscellaneous Debugger Commands:: Miscellaneous Commands. -* Readline Support:: Readline support. -* Limitations:: Limitations and future plans. -* General Arithmetic:: An introduction to computer arithmetic. -* Floating Point Issues:: Stuff to know about floating-point numbers. -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. -* Integer Programming:: Effective integer programming. -* Floating-point Programming:: Effective Floating-point Programming. -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -* Gawk and MPFR:: How `gawk' provides - arbitrary-precision arithmetic. 
-* Arbitrary Precision Floats:: Arbitrary Precision Floating-point - Arithmetic with `gawk'. -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point - numbers. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - `gawk'. -* Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -* V7/SVR3.1:: The major changes between V7 and System V - Release 3.1. -* SVR4:: Minor changes between System V Releases 3.1 - and 4. -* POSIX:: New features from the POSIX standard. -* BTL:: New features from Brian Kernighan's version - of `awk'. -* POSIX/GNU:: The extensions in `gawk' not in - POSIX `awk'. -* Common Extensions:: Common Extensions Summary. -* Ranges and Locales:: How locales used to affect regexp ranges. -* Contributors:: The major contributors to `gawk'. -* Gawk Distribution:: What is in the `gawk' distribution. -* Getting:: How to get the distribution. -* Extracting:: How to extract the distribution. -* Distribution contents:: What is in the distribution. -* Unix Installation:: Installing `gawk' under various - versions of Unix. -* Quick Installation:: Compiling `gawk' under Unix. -* Additional Configuration Options:: Other compile-time options. -* Configuration Philosophy:: How it's all supposed to work. -* Non-Unix Installation:: Installation on Other Operating Systems. -* PC Installation:: Installing and Compiling `gawk' on - MS-DOS and OS/2. -* PC Binary Installation:: Installing a prepared distribution. -* PC Compiling:: Compiling `gawk' for MS-DOS, - Windows32, and OS/2. 
-* PC Testing:: Testing `gawk' on PC systems. -* PC Using:: Running `gawk' on MS-DOS, Windows32 - and OS/2. -* Cygwin:: Building and running `gawk' for - Cygwin. -* MSYS:: Using `gawk' In The MSYS - Environment. -* VMS Installation:: Installing `gawk' on VMS. -* VMS Compilation:: How to compile `gawk' under VMS. -* VMS Installation Details:: How to install `gawk' under VMS. -* VMS Running:: How to run `gawk' under VMS. -* VMS Old Gawk:: An old version comes with some VMS systems. -* Bugs:: Reporting Problems and Bugs. -* Other Versions:: Other freely available `awk' - implementations. -* Compatibility Mode:: How to disable certain `gawk' - extensions. -* Additions:: Making Additions To `gawk'. -* Accessing The Source:: Accessing the Git repository. -* Adding Code:: Adding code to the main body of - `gawk'. -* New Ports:: Porting `gawk' to a new operating - system. -* Derived Files:: Why derived files are kept in the - `git' repository. -* Future Extensions:: New features that may be implemented one - day. -* Basic High Level:: The high level view. -* Basic Data Typing:: A very quick intro to data types. +* History:: The history of `gawk' and + `awk'. +* Names:: What name to use to find + `awk'. +* This Manual:: Using this Info file. Includes + sample input files that you can use. +* Conventions:: Typographical Conventions. +* Manual History:: Brief history of the GNU project and + this Info file. +* How To Contribute:: Helping to save the world. +* Acknowledgments:: Acknowledgments. +* Running gawk:: How to run `gawk' programs; + includes command-line syntax. +* One-shot:: Running a short throwaway + `awk' program. +* Read Terminal:: Using no input files (input from + terminal instead). +* Long:: Putting permanent `awk' + programs in files. +* Executable Scripts:: Making self-contained `awk' + programs. +* Comments:: Adding documentation to `gawk' + programs. +* Quoting:: More discussion of shell quoting + issues. +* DOS Quoting:: Quoting in Windows Batch Files. 
+* Sample Data Files:: Sample data files for use in the + `awk' programs illustrated in + this Info file. +* Very Simple:: A very simple example. +* Two Rules:: A less simple one-line example using + two rules. +* More Complex:: A more complex example. +* Statements/Lines:: Subdividing or combining statements + into lines. +* Other Features:: Other Features of `awk'. +* When:: When to use `gawk' and when to + use other things. +* Command Line:: How to run `awk'. +* Options:: Command-line options and their + meanings. +* Other Arguments:: Input file names and variable + assignments. +* Naming Standard Input:: How to specify standard input with + other files. +* Environment Variables:: The environment variables + `gawk' uses. +* AWKPATH Variable:: Searching directories for + `awk' programs. +* AWKLIBPATH Variable:: Searching directories for + `awk' shared libraries. +* Other Environment Variables:: The environment variables. +* Exit Status:: `gawk''s exit status. +* Include Files:: Including other files into your + program. +* Loading Shared Libraries:: Loading shared libraries into your + program. +* Obsolete:: Obsolete Options and/or features. +* Undocumented:: Undocumented Options and Features. +* Regexp Usage:: How to Use Regular Expressions. +* Escape Sequences:: How to write nonprinting characters. +* Regexp Operators:: Regular Expression Operators. +* Bracket Expressions:: What can go between `[...]'. +* GNU Regexp Operators:: Operators specific to GNU software. +* Case-sensitivity:: How to do case-insensitive matching. +* Leftmost Longest:: How much text matches. +* Computed Regexps:: Using Dynamic Regexps. +* Records:: Controlling how data is split into + records. +* Fields:: An introduction to fields. +* Nonconstant Fields:: Nonconstant Field Numbers. +* Changing Fields:: Changing the Contents of a Field. +* Field Separators:: The field separator and how to change + it. +* Default Field Splitting:: How fields are normally separated. 
+* Regexp Field Splitting:: Using regexps as the field separator. +* Single Character Fields:: Making each character a separate + field. +* Command Line Field Separator:: Setting `FS' from the + command-line. +* Field Splitting Summary:: Some final points and a summary table. +* Constant Size:: Reading constant width data. +* Splitting By Content:: Defining Fields By Content +* Multiple Line:: Reading multi-line records. +* Getline:: Reading files under explicit program + control using the `getline' + function. +* Plain Getline:: Using `getline' with no + arguments. +* Getline/Variable:: Using `getline' into a variable. +* Getline/File:: Using `getline' from a file. +* Getline/Variable/File:: Using `getline' into a variable + from a file. +* Getline/Pipe:: Using `getline' from a pipe. +* Getline/Variable/Pipe:: Using `getline' into a variable + from a pipe. +* Getline/Coprocess:: Using `getline' from a coprocess. +* Getline/Variable/Coprocess:: Using `getline' into a variable + from a coprocess. +* Getline Notes:: Important things to know about + `getline'. +* Getline Summary:: Summary of `getline' Variants. +* Read Timeout:: Reading input with a timeout. +* Command line directories:: What happens if you put a directory on + the command line. +* Print:: The `print' statement. +* Print Examples:: Simple examples of `print' + statements. +* Output Separators:: The output separators and how to + change them. +* OFMT:: Controlling Numeric Output With + `print'. +* Printf:: The `printf' statement. +* Basic Printf:: Syntax of the `printf' statement. +* Control Letters:: Format-control letters. +* Format Modifiers:: Format-specification modifiers. +* Printf Examples:: Several examples. +* Redirection:: How to redirect output to multiple + files and pipes. +* Special Files:: File name interpretation in + `gawk'. `gawk' allows + access to inherited file descriptors. +* Special FD:: Special files for I/O. +* Special Network:: Special files for network + communications. 
+* Special Caveats:: Things to watch out for. +* Close Files And Pipes:: Closing Input and Output Files and + Pipes. +* Values:: Constants, Variables, and Regular + Expressions. +* Constants:: String, numeric and regexp constants. +* Scalar Constants:: Numeric and string constants. +* Nondecimal-numbers:: What are octal and hex numbers. +* Regexp Constants:: Regular Expression constants. +* Using Constant Regexps:: When and how to use a regexp constant. +* Variables:: Variables give names to values for + later use. +* Using Variables:: Using variables in your programs. +* Assignment Options:: Setting variables on the command-line + and a summary of command-line syntax. + This is an advanced method of input. +* Conversion:: The conversion of strings to numbers + and vice versa. +* All Operators:: `gawk''s operators. +* Arithmetic Ops:: Arithmetic operations (`+', + `-', etc.) +* Concatenation:: Concatenating strings. +* Assignment Ops:: Changing the value of a variable or a + field. +* Increment Ops:: Incrementing the numeric value of a + variable. +* Truth Values and Conditions:: Testing for true and false. +* Truth Values:: What is ``true'' and what is + ``false''. +* Typing and Comparison:: How variables acquire types and how + this affects comparison of numbers and + strings with `<', etc. +* Variable Typing:: String type versus numeric type. +* Comparison Operators:: The comparison operators. +* POSIX String Comparison:: String comparison with POSIX rules. +* Boolean Ops:: Combining comparison expressions using + boolean operators `||' (``or''), + `&&' (``and'') and `!' + (``not''). +* Conditional Exp:: Conditional expressions select between + two subexpressions under control of a + third subexpression. +* Function Calls:: A function call is an expression. +* Precedence:: How various operators nest. +* Locales:: How the locale affects things. +* Pattern Overview:: What goes into a pattern. +* Regexp Patterns:: Using regexps as patterns. 
+* Expression Patterns:: Any expression can be used as a + pattern. +* Ranges:: Pairs of patterns specify record + ranges. +* BEGIN/END:: Specifying initialization and cleanup + rules. +* Using BEGIN/END:: How and why to use BEGIN/END rules. +* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. +* BEGINFILE/ENDFILE:: Two special patterns for advanced + control. +* Empty:: The empty pattern, which matches every + record. +* Using Shell Variables:: How to use shell variables with + `awk'. +* Action Overview:: What goes into an action. +* Statements:: Describes the various control + statements in detail. +* If Statement:: Conditionally execute some + `awk' statements. +* While Statement:: Loop until some condition is + satisfied. +* Do Statement:: Do specified action while looping + until some condition is satisfied. +* For Statement:: Another looping statement, that + provides initialization and increment + clauses. +* Switch Statement:: Switch/case evaluation for conditional + execution of statements based on a + value. +* Break Statement:: Immediately exit the innermost + enclosing loop. +* Continue Statement:: Skip to the end of the innermost + enclosing loop. +* Next Statement:: Stop processing the current input + record. +* Nextfile Statement:: Stop processing the current file. +* Exit Statement:: Stop execution of `awk'. +* Built-in Variables:: Summarizes the built-in variables. +* User-modified:: Built-in variables that you change to + control `awk'. +* Auto-set:: Built-in variables where `awk' + gives you information. +* ARGC and ARGV:: Ways to use `ARGC' and + `ARGV'. +* Array Basics:: The basics of arrays. +* Array Intro:: Introduction to Arrays +* Reference to Elements:: How to examine one element of an + array. +* Assigning Elements:: How to change an element of an array. +* Array Example:: Basic Example of an Array +* Scanning an Array:: A variation of the `for' + statement. It loops through the + indices of an array's existing + elements. 
+* Controlling Scanning:: Controlling the order in which arrays + are scanned. +* Delete:: The `delete' statement removes an + element from an array. +* Numeric Array Subscripts:: How to use numbers as subscripts in + `awk'. +* Uninitialized Subscripts:: Using Uninitialized variables as + subscripts. +* Multi-dimensional:: Emulating multidimensional arrays in + `awk'. +* Multi-scanning:: Scanning multidimensional arrays. +* Arrays of Arrays:: True multidimensional arrays. +* Built-in:: Summarizes the built-in functions. +* Calling Built-in:: How to call built-in functions. +* Numeric Functions:: Functions that work with numbers, + including `int()', `sin()' + and `rand()'. +* String Functions:: Functions for string manipulation, + such as `split()', `match()' + and `sprintf()'. +* Gory Details:: More than you want to know about + `\' and `&' with + `sub()', `gsub()', and + `gensub()'. +* I/O Functions:: Functions for files and shell + commands. +* Time Functions:: Functions for dealing with timestamps. +* Bitwise Functions:: Functions for bitwise operations. +* Type Functions:: Functions for type information. +* I18N Functions:: Functions for string translation. +* User-defined:: Describes User-defined functions in + detail. +* Definition Syntax:: How to write definitions and what they + mean. +* Function Example:: An example function definition and + what it does. +* Function Caveats:: Things to watch out for. +* Calling A Function:: Don't use spaces. +* Variable Scope:: Controlling variable scope. +* Pass By Value/Reference:: Passing parameters. +* Return Statement:: Specifying the value a function + returns. +* Dynamic Typing:: How variable types can change at + runtime. +* Indirect Calls:: Choosing the function to call at + runtime. +* I18N and L10N:: Internationalization and Localization. +* Explaining gettext:: How GNU `gettext' works. +* Programmer i18n:: Features for the programmer. +* Translator i18n:: Features for the translator. 
+* String Extraction:: Extracting marked strings. +* Printf Ordering:: Rearranging `printf' arguments. +* I18N Portability:: `awk'-level portability + issues. +* I18N Example:: A simple i18n example. +* Gawk I18N:: `gawk' is also + internationalized. +* Nondecimal Data:: Allowing nondecimal input data. +* Array Sorting:: Facilities for controlling array + traversal and sorting arrays. +* Controlling Array Traversal:: How to use PROCINFO["sorted_in"]. +* Array Sorting Functions:: How to use `asort()' and + `asorti()'. +* Two-way I/O:: Two-way communications with another + process. +* TCP/IP Networking:: Using `gawk' for network + programming. +* Profiling:: Profiling your `awk' programs. +* Library Names:: How to best name private global + variables in library functions. +* General Functions:: Functions that are of general use. +* Strtonum Function:: A replacement for the built-in + `strtonum()' function. +* Assert Function:: A function for assertions in + `awk' programs. +* Round Function:: A function for rounding if + `sprintf()' does not do it + correctly. +* Cliff Random Function:: The Cliff Random Number Generator. +* Ordinal Functions:: Functions for using characters as + numbers and vice versa. +* Join Function:: A function to join an array into a + string. +* Getlocaltime Function:: A function to get formatted times. +* Data File Management:: Functions for managing command-line + data files. +* Filetrans Function:: A function for handling data file + transitions. +* Rewind Function:: A function for rereading the current + file. +* File Checking:: Checking that data files are readable. +* Empty Files:: Checking for zero-length files. +* Ignoring Assigns:: Treating assignments as file names. +* Getopt Function:: A function for processing command-line + arguments. +* Passwd Functions:: Functions for getting user + information. +* Group Functions:: Functions for getting group + information. +* Walking Arrays:: A function to walk arrays of arrays. 
+* Running Examples:: How to run these examples. +* Clones:: Clones of common utilities. +* Cut Program:: The `cut' utility. +* Egrep Program:: The `egrep' utility. +* Id Program:: The `id' utility. +* Split Program:: The `split' utility. +* Tee Program:: The `tee' utility. +* Uniq Program:: The `uniq' utility. +* Wc Program:: The `wc' utility. +* Miscellaneous Programs:: Some interesting `awk' + programs. +* Dupword Program:: Finding duplicated words in a + document. +* Alarm Program:: An alarm clock. +* Translate Program:: A program similar to the `tr' + utility. +* Labels Program:: Printing mailing labels. +* Word Sorting:: A program to produce a word usage + count. +* History Sorting:: Eliminating duplicate entries from a + history file. +* Extract Program:: Pulling out programs from Texinfo + source files. +* Simple Sed:: A Simple Stream Editor. +* Igawk Program:: A wrapper for `awk' that + includes files. +* Anagram Program:: Finding anagrams from a dictionary. +* Signature Program:: People do amazing things with too much + time on their hands. +* Debugging:: Introduction to `gawk' + debugger. +* Debugging Concepts:: Debugging in General. +* Debugging Terms:: Additional Debugging Concepts. +* Awk Debugging:: Awk Debugging. +* Sample Debugging Session:: Sample debugging session. +* Debugger Invocation:: How to Start the Debugger. +* Finding The Bug:: Finding the Bug. +* List of Debugger Commands:: Main debugger commands. +* Breakpoint Control:: Control of Breakpoints. +* Debugger Execution Control:: Control of Execution. +* Viewing And Changing Data:: Viewing and Changing Data. +* Execution Stack:: Dealing with the Stack. +* Debugger Info:: Obtaining Information about the + Program and the Debugger State. +* Miscellaneous Debugger Commands:: Miscellaneous Commands. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. +* General Arithmetic:: An introduction to computer + arithmetic. 
+* Floating Point Issues:: Stuff to know about floating-point + numbers. +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not + Abstract Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. +* Integer Programming:: Effective integer programming. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Gawk and MPFR:: How `gawk' provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with `gawk'. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point + numbers. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic + with `gawk'. +* Extension Intro:: What is an extension. +* Plugin License:: A note about licensing. +* Extension Design:: Design notes about the extension API. +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. +* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. +* Extension API Description:: A full description of the API. +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + `gawk'. 
+* Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. +* Printing Messages:: Functions for printing messages. +* Updating `ERRNO':: Functions for updating `ERRNO'. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. +* Array Manipulation:: Functions for working with arrays. +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. +* Extension API Variables:: Variables provided by the API. +* Extension Versioning:: API Version information. +* Extension API Informational Variables:: Variables providing information about + `gawk''s invocation. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How `gawk' finds compiled + extensions. +* Extension Example:: Example C code for an extension. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. +* Extension Samples:: The sample extensions that ship with + `gawk'. +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to `fnmatch()'. +* Extension Sample Fork:: An interface to `fork()' and + other process functions. +* Extension Sample Ord:: Character to value to character + conversions. 
+* Extension Sample Readdir:: An interface to `readdir()'. +* Extension Sample Revout:: Reversing output sample output + wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way + processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to `gettimeofday()' + and `sleep()'. +* gawkextlib:: The `gawkextlib' project. +* V7/SVR3.1:: The major changes between V7 and + System V Release 3.1. +* SVR4:: Minor changes between System V + Releases 3.1 and 4. +* POSIX:: New features from the POSIX standard. +* BTL:: New features from Brian Kernighan's + version of `awk'. +* POSIX/GNU:: The extensions in `gawk' not + in POSIX `awk'. +* Common Extensions:: Common Extensions Summary. +* Ranges and Locales:: How locales used to affect regexp + ranges. +* Contributors:: The major contributors to + `gawk'. +* Gawk Distribution:: What is in the `gawk' + distribution. +* Getting:: How to get the distribution. +* Extracting:: How to extract the distribution. +* Distribution contents:: What is in the distribution. +* Unix Installation:: Installing `gawk' under + various versions of Unix. +* Quick Installation:: Compiling `gawk' under Unix. +* Additional Configuration Options:: Other compile-time options. +* Configuration Philosophy:: How it's all supposed to work. +* Non-Unix Installation:: Installation on Other Operating + Systems. +* PC Installation:: Installing and Compiling + `gawk' on MS-DOS and OS/2. +* PC Binary Installation:: Installing a prepared distribution. +* PC Compiling:: Compiling `gawk' for MS-DOS, + Windows32, and OS/2. +* PC Testing:: Testing `gawk' on PC systems. +* PC Using:: Running `gawk' on MS-DOS, + Windows32 and OS/2. +* Cygwin:: Building and running `gawk' + for Cygwin. +* MSYS:: Using `gawk' In The MSYS + Environment. +* VMS Installation:: Installing `gawk' on VMS. 
+* VMS Compilation:: How to compile `gawk' under + VMS. +* VMS Installation Details:: How to install `gawk' under + VMS. +* VMS Running:: How to run `gawk' under VMS. +* VMS Old Gawk:: An old version comes with some VMS + systems. +* Bugs:: Reporting Problems and Bugs. +* Other Versions:: Other freely available `awk' + implementations. +* Compatibility Mode:: How to disable certain `gawk' + extensions. +* Additions:: Making Additions To `gawk'. +* Accessing The Source:: Accessing the Git repository. +* Adding Code:: Adding code to the main body of + `gawk'. +* New Ports:: Porting `gawk' to a new + operating system. +* Derived Files:: Why derived files are kept in the + `git' repository. +* Future Extensions:: New features that may be implemented + one day. +* Basic High Level:: The high level view. +* Basic Data Typing:: A very quick intro to data types. To Miriam, for making me complete. @@ -21180,34 +21292,66 @@ File: gawk.info, Node: Dynamic Extensions, Next: Language History, Prev: Arbi 16 Writing Extensions for `gawk' ******************************** -This chapter is a placeholder, pending a rewrite for the new API. Some -of the old bits remain, since they can be partially reused. - - It is possible to add new built-in functions to `gawk' using +It is possible to add new built-in functions to `gawk' using dynamically loaded libraries. This facility is available on systems (such as GNU/Linux) that support the C `dlopen()' and `dlsym()' -functions. This major node describes how to write and use dynamically -loaded extensions for `gawk'. Experience with programming in C or C++ -is necessary when reading this minor node. +functions. This major node describes how to create extensions using +code written in C or C++. 
If you don't know anything about C +programming, you can safely skip this major node, although you may wish +to review the documentation on the extensions that come with `gawk' +(*note Extension Samples::), and the section on the `gawkextlib' +project (*note gawkextlib::). NOTE: When `--sandbox' is specified, extensions are disabled - (*note Options::. + (*note Options::). * Menu: +* Extension Intro:: What is an extension. * Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. +* Extension Design:: Design notes about the extension API. +* Extension API Description:: A full description of the API. +* Extension Example:: Example C code for an extension. +* Extension Samples:: The sample extensions that ship with + `gawk'. +* gawkextlib:: The `gawkextlib' project. + + +File: gawk.info, Node: Extension Intro, Next: Plugin License, Up: Dynamic Extensions + +16.1 Introduction +================= + +An "extension" (sometimes called a "plug-in") is a piece of external +compiled code that `gawk' can load at runtime to provide additional +functionality, over and above the built-in capabilities described in +the rest of this Info file. + + Extensions are useful because they allow you (of course) to extend +`gawk''s functionality. For example, they can provide access to system +calls (such as `chdir()' to change directory) and to other C library +routines that could be of use. As with most software, "the sky is the +limit;" if you can imagine something that you might want to do and can +write in C or C++, you can write an extension to do it! + + Extensions are written in C or C++, using the "Application +Programming Interface" (API) defined for this purpose by the `gawk' +developers. The rest of this major node explains the design decisions +behind the API, the facilities it provides and how to use them, and +presents a small sample extension. 
In addition, it documents the +sample extensions included in the `gawk' distribution, and describes +the `gawkextlib' project. -File: gawk.info, Node: Plugin License, Next: Sample Library, Up: Dynamic Extensions +File: gawk.info, Node: Plugin License, Next: Extension Design, Prev: Extension Intro, Up: Dynamic Extensions -16.1 Extension Licensing +16.2 Extension Licensing ======================== Every dynamic extension should define the global symbol `plugin_is_GPL_compatible' to assert that it has been licensed under a -GPL-compatible license. If this symbol does not exist, `gawk' will -emit a fatal error and exit. +GPL-compatible license. If this symbol does not exist, `gawk' emits a +fatal error and exits when it tries to load your extension. The declared type of the symbol should be `int'. It does not need to be in any allocated section, though. The code merely asserts that @@ -21216,15 +21360,2213 @@ the symbol exists in the global scope. Something like this is enough: int plugin_is_GPL_compatible; -File: gawk.info, Node: Sample Library, Prev: Plugin License, Up: Dynamic Extensions +File: gawk.info, Node: Extension Design, Next: Extension API Description, Prev: Plugin License, Up: Dynamic Extensions -16.2 Example: Directory and File Operation Built-ins -==================================================== +16.3 Extension API Design +========================= + +The first version of extensions for `gawk' was developed in the +mid-1990s and released with `gawk' 3.1 in the late 1990s. The basic +mechanisms and design remained unchanged for close to 15 years, until +2012. + + The old extension mechanism used data types and functions from +`gawk' itself, with a "clever hack" to install extension functions. + + `gawk' included some sample extensions, of which a few were really +useful. However, it was clear from the outset that the extension +mechanism was bolted onto the side and was not really thought out. 
+ +* Menu: + +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. +* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. + + +File: gawk.info, Node: Old Extension Problems, Next: Extension New Mechanism Goals, Up: Extension Design + +16.3.1 Problems With The Old Mechanism +-------------------------------------- + +The old extension mechanism had several problems: + + * It depended heavily upon `gawk' internals. Any time the `NODE' + structure(1) changed, an extension would have to be recompiled. + Furthermore, to really write extensions required understanding + something about `gawk''s internal functions. There was some + documentation in this Info file, but it was quite minimal. + + * Being able to call into `gawk' from an extension required linker + facilities that are common on Unix-derived systems but that did + not work on Windows systems; users wanting extensions on Windows + had to statically link them into `gawk', even though Windows + supports dynamic loading of shared objects. + + * The API would change occasionally as `gawk' changed; no + compatibility between versions was ever offered or planned for. + + Despite the drawbacks, the `xgawk' project developers forked `gawk' +and developed several significant extensions. They also enhanced +`gawk''s facilities relating to file inclusion and shared object access. + + A new API was desired for a long time, but only in 2012 did the +`gawk' maintainer and the `xgawk' developers finally start working on +it together. More information about the `xgawk' project is provided in +*note gawkextlib::. + + ---------- Footnotes ---------- + + (1) A critical central data structure inside `gawk'. 
+ + +File: gawk.info, Node: Extension New Mechanism Goals, Next: Extension Other Design Decisions, Prev: Old Extension Problems, Up: Extension Design + +16.3.2 Goals For A New Mechanism +-------------------------------- + +Some goals for the new API were: + + * The API should be independent of `gawk' internals. Changes in + `gawk' internals should not be visible to the writer of an + extension function. + + * The API should provide _binary_ compatibility across `gawk' + releases as long as the API itself does not change. + + * The API should enable extensions written in C to have roughly the + same "appearance" to `awk'-level code as `awk' functions do. This + means that extensions should have: + + - The ability to access function parameters. + + - The ability to turn an undefined parameter into an array + (call by reference). + + - The ability to create, access and update global variables. + + - Easy access to all the elements of an array at once ("array + flattening") in order to loop over all the elements in an easy + fashion for C code. + + - The ability to create arrays (including `gawk''s true + multi-dimensional arrays). + + Some additional important goals were: + + * The API should use only features in ISO C 90, so that extensions + can be written using the widest range of C and C++ compilers. The + header should include the appropriate `#ifdef __cplusplus' and + `extern "C"' magic so that a C++ compiler could be used. (If + using C++, the runtime system has to be smart enough to call any + constructors and destructors, as `gawk' is a C program. As of this + writing, this has not been tested.) + + * The API mechanism should not require access to `gawk''s symbols(1) + by the compile-time or dynamic linker, in order to enable creation + of extensions that also work on Windows. 
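The C++ compatibility goal in the list above is usually met with a linkage guard in the header. What follows is a minimal sketch, assuming a hypothetical header `my_ext_api.h' with a made-up `my_string_t' type and `my_string_length()' function. It is not the real `gawkapi.h'; it only illustrates the `#ifdef __cplusplus' and `extern "C"' idiom using ISO C 90 constructs.

```c
/* Hypothetical header sketch; the names here are NOT from gawkapi.h. */
#ifndef MY_EXT_API_H
#define MY_EXT_API_H

#include <stddef.h>    /* size_t, NULL */

#ifdef __cplusplus
extern "C" {           /* give the declarations below C linkage under C++ */
#endif

/* ISO C 90 only: no inline, no mixed declarations and code. */
typedef struct my_string {
    const char *str;   /* data; may contain embedded NUL bytes */
    size_t len;        /* length in bytes */
} my_string_t;

/* Return the stored length, or 0 for a NULL pointer. */
size_t my_string_length(const my_string_t *s);

#ifdef __cplusplus
}
#endif

#endif /* MY_EXT_API_H */

/* Definition, as it might appear in the extension's .c file. */
size_t my_string_length(const my_string_t *s)
{
    return s != NULL ? s->len : 0;
}
```

Because the declarations sit inside the guard, the same header can be included from C or C++ source, and the function keeps C linkage in both cases.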
+ + During development, it became clear that there were other features +that should be available to extensions, which were also subsequently +provided: + + * Extensions should have the ability to hook into `gawk''s I/O + redirection mechanism. In particular, the `xgawk' developers + provided a so-called "open hook" to take over reading records. + During development, this was generalized to allow extensions to + hook into input processing, output processing, and two-way I/O. + + * An extension should be able to provide a "call back" function to + perform clean up actions when `gawk' exits. + + * An extension should be able to provide a version string so that + `gawk''s `--version' option can provide information about + extensions as well. + + ---------- Footnotes ---------- + + (1) The "symbols" are the variables and functions defined inside +`gawk'. Access to these symbols by code external to `gawk' loaded +dynamically at runtime is problematic on Windows. + + +File: gawk.info, Node: Extension Other Design Decisions, Next: Extension Mechanism Outline, Prev: Extension New Mechanism Goals, Up: Extension Design + +16.3.3 Other Design Decisions +----------------------------- + +As an "arbitrary" design decision, extensions can read the values of +built-in variables and arrays (such as `ARGV' and `FS'), but cannot +change them, with the exception of `PROCINFO'. + + The reason for this is to prevent an extension function from +affecting the flow of an `awk' program outside its control. While a +real `awk' function can do what it likes, that is at the discretion of +the programmer. An extension function should provide a service or make +a C API available for use within `awk', and not mess with `FS' or +`ARGC' and `ARGV'. + + In addition, it becomes easy to start down a slippery slope. How +much access to `gawk' facilities do extensions need? Do they need +`getline'? What about calling `gsub()' or compiling regular +expressions? What about calling into `awk' functions? 
(_That_ would be +messy.) + + In order to avoid these issues, the `gawk' developers chose to start +with the simplest, most basic features that are still truly useful. + + Another decision is that although `gawk' provides nice things like +MPFR, and arrays indexed internally by integers, these features are not +being brought out to the API in order to keep things simple and close to +traditional `awk' semantics. (In fact, arrays indexed internally by +integers are so transparent that they aren't even documented!) + + With time, the API will undoubtedly evolve; the `gawk' developers +expect this to be driven by user needs. For now, the current API seems +to provide a minimal yet powerful set of features for creating +extensions. + + +File: gawk.info, Node: Extension Mechanism Outline, Next: Extension Future Growth, Prev: Extension Other Design Decisions, Up: Extension Design + +16.3.4 At A High Level How It Works +----------------------------------- + +The requirement to avoid access to `gawk''s symbols is, at first +glance, a difficult one to meet. + + One design, apparently used by Perl and Ruby and maybe others, would +be to make the mainline `gawk' code into a library, with the `gawk' +utility a small C `main()' function linked against the library. + + This seemed like the tail wagging the dog, complicating build and +installation and making a simple copy of the `gawk' executable from one +system to another (or one place to another on the same system!) into a +chancy operation. + + Pat Rankin suggested the solution that was adopted. Communication +between `gawk' and an extension is two-way. First, when an extension +is loaded, it is passed a pointer to a `struct' whose fields are +function pointers. 
+
+                          API
+                         Struct
+                         +---+
+                         |   |
+                         +---+
+         +---------------|   |
+         |               +---+      dl_load(api_p, id);
+         |               |   |  ___________________
+         |               +---+                     |
+         |     +---------|   |  __________________ |
+         |     |         +---+                    ||
+         |     |         |   |                    ||
+         |     |         +---+                    ||
+         |     |     +---|   |                    ||
+         |     |     |   +---+                  \ || /
+         |     |     |                           \  /
+         v     v     v                            \/
++-------+-+---+-+---+-+------------------+--------------------+
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOOOOOOOO|
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOOOOOOOO|
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOOOOOOOO|
++-------+-+---+-+---+-+------------------+--------------------+
+
+    gawk Main Program Address Space          Extension
+Figure 16.1: Loading the extension
+
+   The extension can call functions inside `gawk' through these
+function pointers, at runtime, without needing (link-time) access to
+`gawk''s symbols.  One of these function pointers is to a function for
+"registering" new built-in functions.
+
+    register_ext_func({ "chdir", do_chdir, 1 });
+
+            +--------------------------------------------+
+            |                                            |
+            V                                            |
++-------+-+---+-+---+-+------------------+--------------+-+---+
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOO|X|OOO|
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOO|X|OOO|
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOO|X|OOO|
++-------+-+---+-+---+-+------------------+--------------+-+---+
+
+    gawk Main Program Address Space          Extension
+Figure 16.2: Loading the new function
+
+   In the other direction, the extension registers its new functions
+with `gawk' by passing function pointers to the functions that provide
+the new feature (`do_chdir()', for example).  `gawk' associates the
+function pointer with a name and can then call it, using a defined
+calling convention.
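The two-way exchange of function pointers described so far can be sketched in a single C file. This is only an illustration under stated assumptions: `host_api_t', `host_register()', `host_call()', `ext_load()' and `do_chdir_stub()' are hypothetical names invented for the sketch, and the real API `struct', registration call and calling convention in `gawkapi.h' differ.

```c
#include <string.h>

/* Hypothetical, simplified stand-ins; not gawk's real types. */
typedef int (*ext_func_t)(int nargs);

typedef struct host_api {
    /* the host's "register a new built-in" entry point */
    int (*register_func)(const char *name, ext_func_t fn);
} host_api_t;

/* ---- host ("gawk") side ---- */
#define MAX_FUNCS 8
static struct { const char *name; ext_func_t fn; } func_table[MAX_FUNCS];
static int func_count = 0;

/* Associate a name with a function pointer supplied by the extension. */
static int host_register(const char *name, ext_func_t fn)
{
    if (func_count >= MAX_FUNCS)
        return 0;
    func_table[func_count].name = name;
    func_table[func_count].fn = fn;
    func_count++;
    return 1;
}

/* Look a function up by name and call it through its pointer. */
int host_call(const char *name, int nargs, int *result)
{
    int i;

    for (i = 0; i < func_count; i++) {
        if (strcmp(func_table[i].name, name) == 0) {
            *result = (*func_table[i].fn)(nargs);
            return 1;
        }
    }
    return 0;   /* no such function registered */
}

/* ---- extension side ---- */
static int do_chdir_stub(int nargs)
{
    /* a real function would fetch its argument via the API and call chdir() */
    return nargs == 1 ? 0 : -1;
}

/* Receives the API struct and registers through it; no host symbols needed. */
static int ext_load(const host_api_t *api)
{
    return api->register_func("chdir", do_chdir_stub);
}

/* ---- both directions together ---- */
int demo_load_and_call(void)
{
    host_api_t api;
    int result = -2;

    api.register_func = host_register;
    if (!ext_load(&api))
        return -3;
    if (!host_call("chdir", 1, &result))
        return -4;
    return result;
}
```

The extension side refers to the host only through the `struct' it was handed, which is the property that removes the need for link-time access to the host's symbols.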
+
+    BEGIN {
+        chdir("/path")                             (*fnptr)(1);
+    }
+            +--------------------------------------------+
+            |                                            |
+            |                                            V
++-------+-+---+-+---+-+------------------+--------------+-+---+
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOO|X|OOO|
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOO|X|OOO|
+|       |x|   |x|   |x|                  |OOOOOOOOOOOOOO|X|OOO|
++-------+-+---+-+---+-+------------------+--------------+-+---+
+
+    gawk Main Program Address Space          Extension
+Figure 16.3: Calling the new function
+
+   The `do_XXX()' function, in turn, then uses the function pointers in
+the API `struct' to do its work, such as updating variables or arrays,
+printing messages, setting `ERRNO', and so on.
+
+   Convenience macros in the `gawkapi.h' header file make calling
+through the function pointers look like regular function calls so that
+extension code is quite readable and understandable.
+
+   Although all of this sounds medium complicated, the result is that
+extension code is quite clean and straightforward. This can be seen in
+the sample extensions `filefuncs.c' (*note Extension Example::) and
+also the `testext.c' code for testing the APIs.
+
+   Some other bits and pieces:
+
+   * The API provides access to `gawk''s `do_XXX' values, reflecting
+     command line options, like `do_lint', `do_profiling' and so on
+     (*note Extension API Variables::).  These are informational: an
+     extension cannot affect these inside `gawk'.  In addition,
+     attempting to assign to them produces a compile-time error.
+
+   * The API also provides major and minor version numbers, so that an
+     extension can check if the `gawk' it is loaded with supports the
+     facilities it was compiled with.  (Version mismatches "shouldn't"
+     happen, but we all know how _that_ goes.)  *Note Extension
+     Versioning::, for details.
+
+
+File: gawk.info, Node: Extension Future Growth, Prev: Extension Mechanism Outline, Up: Extension Design
+
+16.3.5 Room For Future Growth
+-----------------------------
+
+The API provides room for future growth, in two ways.
+ + An "extension id" is passed into the extension when it's loaded. This +extension id is then passed back to `gawk' with each function call. +This allows `gawk' to identify the extension calling into it, should it +need to know. + + A "name space" is passed into `gawk' when an extension function is +registered. This provides for a future mechanism for grouping +extension functions and possibly avoiding name conflicts. + + Of course, as of this writing, no decisions have been made with +respect to any of the above. + + +File: gawk.info, Node: Extension API Description, Next: Extension Example, Prev: Extension Design, Up: Dynamic Extensions + +16.4 API Description +==================== + +This (rather large) minor node describes the API in detail. + +* Menu: + +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + `gawk'. +* Printing Messages:: Functions for printing messages. +* Updating `ERRNO':: Functions for updating `ERRNO'. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Array Manipulation:: Functions for working with arrays. +* Extension API Variables:: Variables provided by the API. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How `gawk' finds compiled + extensions. + + +File: gawk.info, Node: Extension API Functions Introduction, Next: General Data Types, Up: Extension API Description + +16.4.1 Introduction +------------------- + +Access to facilities within `gawk' is made available by calling +through function pointers passed into your extension. + + API function pointers are provided for the following kinds of +operations: + + * Registration functions. 
You may register: + - extension functions, + + - exit callbacks, + + - a version string, + + - input parsers, + + - output wrappers, + + - and two-way processors. + All of these are discussed in detail, later in this major node. + + * Printing fatal, warning, and "lint" warning messages. + + * Updating `ERRNO', or unsetting it. + + * Accessing parameters, including converting an undefined parameter + into an array. + + * Symbol table access: retrieving a global variable, creating one, + or changing one. This also includes the ability to create a scalar + variable that will be _constant_ within `awk' code. + + * Creating and releasing cached values; this provides an efficient + way to use values for multiple variables and can be a big + performance win. + + * Manipulating arrays: + - Retrieving, adding, deleting, and modifying elements + + - Getting the count of elements in an array + + - Creating a new array + + - Clearing an array + + - Flattening an array for easy C style looping over all its + indices and elements + + Some points about using the API: + + * You must include `<sys/types.h>' and `<sys/stat.h>' before + including the `gawkapi.h' header file. In addition, you must + include either `<stddef.h>' or `<stdlib.h>' to get the definition + of `size_t'. If you wish to use the boilerplate `dl_load_func()' + macro, you will need to include `<stdio.h>' as well. Finally, to + pass reasonable integer values for `ERRNO', you will need to + include `<errno.h>'. + + * Although the API only uses ISO C 90 features, there is an + exception; the "constructor" functions use the `inline' keyword. + If your compiler does not support this keyword, you should either + place `-Dinline=''' on your command line, or use the GNU Autotools + and include a `config.h' file in your extensions. + + * All pointers filled in by `gawk' are to memory managed by `gawk' + and should be treated by the extension as read-only. 
Memory for + _all_ strings passed into `gawk' from the extension _must_ come + from `malloc()' and is managed by `gawk' from then on. + + * The API defines several simple structs that map values as seen + from `awk'. A value can be a `double', a string, or an array (as + in multidimensional arrays, or when creating a new array). + Strings maintain both pointer and length since embedded `NUL' + characters are allowed. + + By intent, strings are maintained using the current multibyte + encoding (as defined by `LC_XXX' environment variables) and not + using wide characters. This matches how `gawk' stores strings + internally and also how characters are likely to be input and + output from files. + + * When retrieving a value (such as a parameter or that of a global + variable or array element), the extension requests a specific type + (number, string, scalars, value cookie, array, or "undefined"). + When the request is "undefined," the returned value will have the + real underlying type. + + However, if the request and actual type don't match, the access + function returns "false" and fills in the type of the actual value + that is there, so that the extension can, e.g., print an error + message ("scalar passed where array expected"). + + + While you may call the API functions by using the function pointers +directly, the interface is not so pretty. To make extension code look +more like regular code, the `gawkapi.h' header file defines a number of +macros which you should use in your code. This minor node presents the +macros as if they were functions. + + +File: gawk.info, Node: General Data Types, Next: Requesting Values, Prev: Extension API Functions Introduction, Up: Extension API Description + +16.4.2 General Purpose Data Types +--------------------------------- + + I have a true love/hate relationship with unions. + Arnold Robbins + + That's the thing about unions: the compiler will arrange things so + they can accommodate both love and hate. 
+ Chet Ramey + + The extension API defines a number of simple types and structures +for general purpose use. Additional, more specialized, data structures, +are introduced in subsequent minor nodes, together with the functions +that use them. + +`typedef void *awk_ext_id_t;' + A value of this type is received from `gawk' when an extension is + loaded. That value must then be passed back to `gawk' as the + first parameter of each API function. + +`#define awk_const ...' + This macro expands to `const' when compiling an extension, and to + nothing when compiling `gawk' itself. This makes certain fields + in the API data structures unwritable from extension code, while + allowing `gawk' to use them as it needs to. + +`typedef int awk_bool_t;' + A simple boolean type. At the moment, the API does not define + special "true" and "false" values, although perhaps it should. + +`typedef struct {' +` char *str; /* data */' +` size_t len; /* length thereof, in chars */' +`} awk_string_t;' + This represents a mutable string. `gawk' owns the memory pointed + to if it supplied the value. Otherwise, it takes ownership of the + memory pointed to. *Such memory must come from `malloc()'!* + + As mentioned earlier, strings are maintained using the current + multibyte encoding. + +`typedef enum {' +` AWK_UNDEFINED,' +` AWK_NUMBER,' +` AWK_STRING,' +` AWK_ARRAY,' +` AWK_SCALAR, /* opaque access to a variable */' +` AWK_VALUE_COOKIE /* for updating a previously created value */' +`} awk_valtype_t;' + This `enum' indicates the type of a value. It is used in the + following `struct'. + +`typedef struct {' +` awk_valtype_t val_type;' +` union {' +` awk_string_t s;' +` double d;' +` awk_array_t a;' +` awk_scalar_t scl;' +` awk_value_cookie_t vc;' +` } u;' +`} awk_value_t;' + An "`awk' value." The `val_type' member indicates what kind of + value the `union' holds, and each member is of the appropriate + type. 
+ +`#define str_value u.s' +`#define num_value u.d' +`#define array_cookie u.a' +`#define scalar_cookie u.scl' +`#define value_cookie u.vc' + These macros make accessing the fields of the `awk_value_t' more + readable. + +`typedef void *awk_scalar_t;' + Scalars can be represented as an opaque type. These values are + obtained from `gawk' and then passed back into it. This is + discussed in a general fashion below, and in more detail in *note + Symbol table by cookie::. + +`typedef void *awk_value_cookie_t;' + A "value cookie" is an opaque type representing a cached value. + This is also discussed in a general fashion below, and in more + detail in *note Cached values::. + + + Scalar values in `awk' are either numbers or strings. The +`awk_value_t' struct represents values. The `val_type' member +indicates what is in the `union'. + + Representing numbers is easy--the API uses a C `double'. Strings +require more work. Since `gawk' allows embedded `NUL' bytes in string +values, a string must be represented as a pair containing a +data-pointer and length. This is the `awk_string_t' type. + + Identifiers (i.e., the names of global variables) can be associated +with either scalar values or with arrays. In addition, `gawk' provides +true arrays of arrays, where any given array element can itself be an +array. Discussion of arrays is delayed until *note Array +Manipulation::. + + The various macros listed earlier make it easier to use the elements +of the `union' as if they were fields in a `struct'; this is a common +coding practice in C. Such code is easier to write and to read, +however it remains _your_ responsibility to make sure that the +`val_type' member correctly reflects the type of the value in the +`awk_value_t'. + + Conceptually, the first three members of the `union' (number, string, +and array) are all that is needed for working with `awk' values. 
+However, since the API provides routines for accessing and changing the +value of global scalar variables only by using the variable's name, +there is a performance penalty: `gawk' must find the variable each time +it is accessed and changed. This turns out to be a real issue, not +just a theoretical one. + + Thus, if you know that your extension will spend considerable time +reading and/or changing the value of one or more scalar variables, you +can obtain a "scalar cookie"(1) object for that variable, and then use +the cookie for getting the variable's value or for changing the +variable's value. This is the `awk_scalar_t' type and `scalar_cookie' +macro. Given a scalar cookie, `gawk' can directly retrieve or modify +the value, as required, without having to first find it. + + The `awk_value_cookie_t' type and `value_cookie' macro are similar. +If you know that you wish to use the same numeric or string _value_ for +one or more variables, you can create the value once, retaining a +"value cookie" for it, and then pass in that value cookie whenever you +wish to set the value of a variable. This saves both storage space +within the running `gawk' process as well as the time needed to create +the value. + + ---------- Footnotes ---------- + + (1) See the "cookie" entry in the Jargon file +(http://catb.org/jargon/html/C/cookie.html) for a definition of +"cookie", and the "magic cookie" entry in the Jargon file +(http://catb.org/jargon/html/M/magic-cookie.html) for a nice example. +See also the entry for "Cookie" in the *note Glossary::. + + +File: gawk.info, Node: Requesting Values, Next: Constructor Functions, Prev: General Data Types, Up: Extension API Description + +16.4.3 Requesting Values +------------------------ + +All of the functions that return values from `gawk' work in the same +way. You pass in an `awk_valtype_t' value to indicate what kind of +value you expect. 
If the actual value matches what you requested, the +function returns true and fills in the `awk_value_t' result. +Otherwise, the function returns false, and the `val_type' member +indicates the type of the actual value. You may then print an error +message, or reissue the request for the actual value type, as +appropriate. This behavior is summarized in *note +table-value-types-returned::. +
+                              Type of Actual Value:
+----------------------------------------------------------------------
+Type Requested   String            Number    Array     Undefined
+----------------------------------------------------------------------
+String           String            String    false     false
+Number           Number if can     Number    false     false
+                 be converted,
+                 else false
+Array            false             false     Array     false
+Scalar           Scalar            Scalar    false     false
+Undefined        String            Number    Array     Undefined
+Value Cookie     false             false     false     false
+
+Table 16.1: Value Types Returned + + +File: gawk.info, Node: Constructor Functions, Next: Registration Functions, Prev: Requesting Values, Up: Extension API Description + +16.4.4 Constructor Functions and Convenience Macros +--------------------------------------------------- + +The API provides a number of "constructor" functions for creating +string and numeric values, as well as a number of convenience macros. +This node presents them all as function prototypes, in the way that +extension code would use them. + +`static inline awk_value_t *' +`make_const_string(const char *string, size_t length, awk_value_t *result)' + This function creates a string value in the `awk_value_t' variable + pointed to by `result'. It expects `string' to be a C string + constant (or other string data), and automatically creates a + _copy_ of the data for storage in `result'. It returns `result'. + +`static inline awk_value_t *' +`make_malloced_string(const char *string, size_t length, awk_value_t *result)' + This function creates a string value in the `awk_value_t' variable + pointed to by `result'. 
It expects `string' to be a `char *' value + pointing to data previously obtained from `malloc()'. The idea here + is that the data is passed directly to `gawk', which assumes + responsibility for it. It returns `result'. + +`static inline awk_value_t *' +`make_null_string(awk_value_t *result)' + This specialized function creates a null string (the "undefined" + value) in the `awk_value_t' variable pointed to by `result'. It + returns `result'. + +`static inline awk_value_t *' +`make_number(double num, awk_value_t *result)' + This function simply creates a numeric value in the `awk_value_t' + variable pointed to by `result'. + + Two convenience macros may be used for allocating storage from +`malloc()' and `realloc()'. If the allocation fails, they cause `gawk' +to exit with a fatal error message. They should be used as if they were +procedure calls that do not return a value. + +`emalloc(pointer, type, size, message)' + The arguments to this macro are as follows: + `pointer' + The pointer variable to point at the allocated storage. + + `type' + The type of the pointer variable, used to create a cast for + the call to `malloc()'. + + `size' + The total number of bytes to be allocated. + + `message' + A message to be prefixed to the fatal error message. + Typically this is the name of the function using the macro. + + For example, you might allocate a string value like so: + + awk_value_t result; + char *message; + const char greet[] = "Don't Panic!"; + + emalloc(message, char *, sizeof(greet), "myfunc"); + strcpy(message, greet); + make_malloced_string(message, strlen(message), & result); + +`erealloc(pointer, type, size, message)' + This is like `emalloc()', but it calls `realloc()', instead of + `malloc()'. The arguments are the same as for the `emalloc()' + macro. 
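The difference between the copying and the ownership-transferring string constructors described in this node can be illustrated with a small standalone sketch. Everything named `demo_...' below is a hypothetical stand-in written for this illustration, not part of the API; the functions only mimic the contracts described above (copy the data versus adopt a pointer that came from `malloc()').

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for awk_string_t as listed earlier (assumption: same layout). */
typedef struct {
    char *str;      /* data */
    size_t len;     /* length thereof, in chars */
} demo_string_t;

/*
 * demo_copy_string --- mimics the documented contract of
 * make_const_string(): the caller's data is copied, so the caller
 * keeps ownership of `string'.  Returns NULL if allocation fails.
 */
static demo_string_t *
demo_copy_string(const char *string, size_t length, demo_string_t *result)
{
    char *copy;

    assert(string != NULL && result != NULL);
    copy = malloc(length + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, string, length);
    copy[length] = '\0';
    result->str = copy;
    result->len = length;
    return result;
}

/*
 * demo_adopt_string --- mimics the documented contract of
 * make_malloced_string(): the pointer is stored as is, so it must
 * come from malloc() and the caller gives up ownership.
 */
static demo_string_t *
demo_adopt_string(char *string, size_t length, demo_string_t *result)
{
    assert(string != NULL && result != NULL);
    result->str = string;
    result->len = length;
    return result;
}
```

In real extension code the value would be handed to `gawk', which then owns the memory; in this sketch the caller still has to free it.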
+ + +File: gawk.info, Node: Registration Functions, Next: Printing Messages, Prev: Constructor Functions, Up: Extension API Description + +16.4.5 Registration Functions +----------------------------- + +This minor node describes the API functions for registering parts of +your extension with `gawk'. + +* Menu: + +* Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. + + +File: gawk.info, Node: Extension Functions, Next: Exit Callback Functions, Up: Registration Functions + +16.4.5.1 Registering An Extension Function +.......................................... + +Extension functions are described by the following record: + + typedef struct { + const char *name; + awk_value_t *(*function)(int num_actual_args, awk_value_t *result); + size_t num_expected_args; + } awk_ext_func_t; + + The fields are: + +`const char *name;' + The name of the new function. `awk' level code calls the function + by this name. This is a regular C string. + +`awk_value_t *(*function)(int num_actual_args, awk_value_t *result);' + This is a pointer to the C function that provides the desired + functionality. The function must fill in the result with either a + number or a string. `awk' takes ownership of any string memory. + As mentioned earlier, string memory *must* come from `malloc()'. + + The function must return the value of `result'. This is for the + convenience of the calling code inside `gawk'. + +`size_t num_expected_args;' + This is the number of arguments the function expects to receive. + Each extension function may decide what to do if the number of + arguments isn't what it expected. Following `awk' functions, it + is likely OK to ignore extra arguments. 
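As a rough illustration of the record just described, the following standalone sketch defines one extension function and the record that describes it. The type declarations are trimmed local copies of the ones shown in this chapter so that the fragment compiles by itself; a real extension would get them from the API header instead. The function name `do_answer' and the `awk'-level name `answer' are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed local copies of the API types shown earlier (assumption:
   only the members this sketch needs are reproduced). */
typedef enum { AWK_UNDEFINED, AWK_NUMBER, AWK_STRING } awk_valtype_t;

typedef struct {
    awk_valtype_t val_type;
    union {
        double d;
    } u;
} awk_value_t;
#define num_value u.d

typedef struct {
    const char *name;
    awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
    size_t num_expected_args;
} awk_ext_func_t;

/* do_answer --- hypothetical extension function: always yields 42 */

static awk_value_t *
do_answer(int num_actual_args, awk_value_t *result)
{
    (void) num_actual_args;     /* extra arguments are simply ignored */
    assert(result != NULL);

    result->val_type = AWK_NUMBER;
    result->num_value = 42.0;
    return result;              /* must return the value of `result' */
}

/* The record describing the function; it expects no arguments. */
static const awk_ext_func_t answer_func = { "answer", do_answer, 0 };
```

The record is kept `static' here on the assumption that it must outlive the registration call; the text does not say whether `gawk' copies it.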
+ + Once you have a record representing your extension function, you +register it with `gawk' using this API function: + +`awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func);' + This function returns true upon success, false otherwise. The + `namespace' parameter is currently not used; you should pass in an + empty string (`""'). The `func' pointer is the address of a + `struct' representing your function, as just described. + + +File: gawk.info, Node: Exit Callback Functions, Next: Extension Version String, Prev: Extension Functions, Up: Registration Functions + +16.4.5.2 Registering An Exit Callback Function +.............................................. + +An "exit callback" function is a function that `gawk' calls before it +exits. Such functions are useful if you have general "clean up" tasks +that should be performed in your extension (such as closing data base +connections or other resource deallocations). You can register such a +function with `gawk' using the following function. + +`void awk_atexit(void (*funcp)(void *data, int exit_status),' +` void *arg0);' + The parameters are: + `funcp' + A pointer to the function to be called before `gawk' exits. + The `data' parameter will be the original value of `arg0'. + The `exit_status' parameter is the exit status value that + `gawk' will pass to the `exit()' system call. + + `arg0' + A pointer to private data which `gawk' saves in order to pass + to the function pointed to by `funcp'. + + Exit callback functions are called in Last-In-First-Out (LIFO) +order--that is, in the reverse order in which they are registered with +`gawk'. + + +File: gawk.info, Node: Extension Version String, Next: Input Parsers, Prev: Exit Callback Functions, Up: Registration Functions + +16.4.5.3 Registering An Extension Version String +................................................ 
+ +You can register a version string which indicates the name and version +of your extension, with `gawk', as follows: + +`void register_ext_version(const char *version);' + Register the string pointed to by `version' with `gawk'. `gawk' + does _not_ copy the `version' string, so it should not be changed. + + `gawk' prints all registered extension version strings when it is +invoked with the `--version' option. + + +File: gawk.info, Node: Input Parsers, Next: Output Wrappers, Prev: Extension Version String, Up: Registration Functions + +16.4.5.4 Customized Input Parsers +................................. + +By default, `gawk' reads text files as its input. It uses the value of +`RS' to find the end of the record, and then uses `FS' (or +`FIELDWIDTHS') to split it into fields (*note Reading Files::). +Additionally, it sets the value of `RT' (*note Built-in Variables::). + + If you want, you can provide your own, custom, input parser. An +input parser's job is to return a record to the `gawk' record processing +code, along with indicators for the value and length of the data to be +used for `RT', if any. + + To provide an input parser, you must first provide two functions +(where XXX is a prefix name for your extension): + +`awk_bool_t XXX_can_take_file(const awk_input_buf_t *iobuf)' + This function examines the information available in `iobuf' (which + we discuss shortly). Based on the information there, it decides + if the input parser should be used for this file. If so, it + should return true. Otherwise, it should return false. It should + not change any state (variable values, etc.) within `gawk'. + +`awk_bool_t XXX_take_control_of(awk_input_buf_t *iobuf)' + When `gawk' decides to hand control of the file over to the input + parser, it calls this function. This function in turn must fill + in certain fields in the `awk_input_buf_t' structure, and ensure + that certain conditions are true. It should then return true. 
If + an error of some kind occurs, it should not fill in any fields, + and should return false; then `gawk' will not use the input parser. + The details are presented shortly. + + Your extension should package these functions inside an +`awk_input_parser_t', which looks like this: + + typedef struct input_parser { + const char *name; /* name of parser */ + awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); + awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); + awk_const struct input_parser *awk_const next; /* for use by gawk */ + } awk_input_parser_t; + + The fields are: + +`const char *name;' + The name of the input parser. This is a regular C string. + +`awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);' + A pointer to your `XXX_can_take_file()' function. + +`awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);' + A pointer to your `XXX_take_control_of()' function. + +`awk_const struct input_parser *awk_const next;' + This pointer is used by `gawk'. The extension cannot modify it. + + The steps are as follows: + + 1. Create a `static awk_input_parser_t' variable and initialize it + appropriately. + + 2. When your extension is loaded, register your input parser with + `gawk' using the `register_input_parser()' API function (described + below). + + An `awk_input_buf_t' looks like this: + + typedef struct awk_input { + const char *name; /* filename */ + int fd; /* file descriptor */ + #define INVALID_HANDLE (-1) + void *opaque; /* private data for input parsers */ + int (*get_record)(char **out, struct awk_input *iobuf, + int *errcode, char **rt_start, size_t *rt_len); + void (*close_func)(struct awk_input *iobuf); + struct stat sbuf; /* stat buf */ + } awk_input_buf_t; + + The fields can be divided into two categories: those for use +(initially, at least) by `XXX_can_take_file()', and those for use by +`XXX_take_control_of()'. The first group of fields and their uses are +as follows: + +`const char *name;' + The name of the file. 
+ +`int fd;' + A file descriptor for the file. If `gawk' was able to open the + file, then `fd' will _not_ be equal to `INVALID_HANDLE'. + Otherwise, it will. + +`struct stat sbuf;' + If the file descriptor is valid, then `gawk' will have filled in this + structure via a call to the `fstat()' system call. + + The `XXX_can_take_file()' function should examine these fields and +decide if the input parser should be used for the file. The decision +can be made based upon `gawk' state (the value of a variable defined +previously by the extension and set by `awk' code), the name of the +file, whether or not the file descriptor is valid, the information in +the `struct stat', or any combination of the above. + + Once `XXX_can_take_file()' has returned true, and `gawk' has decided +to use your input parser, it calls `XXX_take_control_of()'. That +function then fills in at least the `get_record' field of the +`awk_input_buf_t'. It must also ensure that `fd' is not set to +`INVALID_HANDLE'. All of the fields that may be filled by +`XXX_take_control_of()' are as follows: + +`void *opaque;' + This is used to hold any state information needed by the input + parser for this file. It is "opaque" to `gawk'. The input parser + is not required to use this pointer. + +`int (*get_record)(char **out,' +` struct awk_input *iobuf,' +` int *errcode,' +` char **rt_start,' +` size_t *rt_len);' + This function pointer should point to a function that creates the + input records. Said function is the core of the input parser. + Its behavior is described below. + +`void (*close_func)(struct awk_input *iobuf);' + This function pointer should point to a function that does the + "tear down." It should release any resources allocated by + `XXX_take_control_of()'. It may also close the file. If it does + so, it should set the `fd' field to `INVALID_HANDLE'. + + If `fd' is still not `INVALID_HANDLE' after the call to this + function, `gawk' calls the regular `close()' system call. 
+ + Having a "tear down" function is optional. If your input parser + does not need it, do not set this field. Then, `gawk' calls the + regular `close()' system call on the file descriptor, so it should + be valid. + + The `XXX_get_record()' function does the work of creating input +records. The parameters are as follows: + +`char **out' + This is a pointer to a `char *' variable which is set to point to + the record. `gawk' makes its own copy of the data, so the + extension must manage this storage. + +`struct awk_input *iobuf' + This is the `awk_input_buf_t' for the file. The fields should be + used for reading data (`fd') and for managing private state + (`opaque'), if any. + +`int *errcode' + If an error occurs, `*errcode' should be set to an appropriate + code from `<errno.h>'. + +`char **rt_start' +`size_t *rt_len' + If the concept of a "record terminator" makes sense, then + `*rt_start' should be set to point to the data to be used for + `RT', and `*rt_len' should be set to the length of the data. + Otherwise, `*rt_len' should be set to zero. `gawk' makes its own + copy of this data, so the extension must manage the storage. + + The return value is the length of the buffer pointed to by `*out', +or `EOF' if end-of-file was reached or an error occurred. + + It is guaranteed that `errcode' is a valid pointer, so there is no +need to test for a `NULL' value. `gawk' sets `*errcode' to zero, so +there is no need to set it unless an error occurs. + + If an error does occur, the function should return `EOF' and set +`*errcode' to a non-zero value. In that case, if `*errcode' does not +equal -1, `gawk' automatically updates the `ERRNO' variable based on +the value of `*errcode' (e.g., setting `*errcode = errno' should do the +right thing). + + `gawk' ships with a sample extension that reads directories, +returning records for each entry in the directory (*note Extension +Sample Readdir::). You may wish to use that code as a guide for writing +your own input parser. 
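The `get_record' contract described above can be sketched in a standalone form as follows. This is only an illustration under stated assumptions: `struct awk_input' is a local stand-in with just the two fields used here, `line_state_t', `slurp()' and `line_get_record()' are hypothetical names, and reading the whole file into memory on the first call is a simplification chosen to keep the example short. Records are newline-terminated lines, and the newline is reported as the data for `RT'.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>      /* EOF */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>     /* read() */

/* Local stand-in: only the awk_input_buf_t fields this sketch touches. */
struct awk_input {
    int fd;         /* file descriptor */
    void *opaque;   /* private data for the parser */
};

/* Hypothetical private state, reached through the `opaque' pointer. */
typedef struct {
    char *buf;      /* whole file contents; owned by the parser */
    size_t len;     /* number of bytes in buf */
    size_t pos;     /* offset of the next unread byte */
    int loaded;     /* nonzero once the file has been read */
} line_state_t;

/* slurp --- read everything from fd; return 0 on success, -1 on error */

static int
slurp(int fd, line_state_t *st)
{
    char chunk[4096];
    ssize_t n;

    while ((n = read(fd, chunk, sizeof chunk)) > 0) {
        char *p = realloc(st->buf, st->len + (size_t) n);

        if (p == NULL)
            return -1;
        memcpy(p + st->len, chunk, (size_t) n);
        st->buf = p;
        st->len += (size_t) n;
    }
    if (n < 0)
        return -1;
    st->loaded = 1;
    return 0;
}

/* line_get_record --- return the next line as one record */

static int
line_get_record(char **out, struct awk_input *iobuf, int *errcode,
                char **rt_start, size_t *rt_len)
{
    line_state_t *st = iobuf->opaque;
    char *start, *nl;
    size_t avail;

    assert(st != NULL);
    if (! st->loaded && slurp(iobuf->fd, st) != 0) {
        *errcode = errno;       /* only set on error */
        return EOF;
    }
    if (st->pos >= st->len)
        return EOF;             /* end of file, *errcode untouched */

    start = st->buf + st->pos;
    avail = st->len - st->pos;
    nl = memchr(start, '\n', avail);
    *out = start;               /* storage stays with the parser */
    if (nl != NULL) {
        *rt_start = nl;         /* the newline is the terminator */
        *rt_len = 1;
        st->pos += (size_t) (nl - start) + 1;
        return (int) (nl - start);
    }
    *rt_start = NULL;           /* last record has no terminator */
    *rt_len = 0;
    st->pos = st->len;
    return (int) avail;
}
```

A matching "tear down" function would free `buf'; it is omitted here to keep the sketch short.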
+ + When writing an input parser, you should think about (and document) +how it is expected to interact with `awk' code. You may want it to +always be called, and take effect as appropriate (as the `readdir' +extension does). Or you may want it to take effect based upon the +value of an `awk' variable, as the XML extension from the `gawkextlib' +project does (*note gawkextlib::). In the latter case, code in a +`BEGINFILE' section can look at `FILENAME' and `ERRNO' to decide +whether or not to activate an input parser (*note BEGINFILE/ENDFILE::). + + You register your input parser with the following function: + +`void register_input_parser(awk_input_parser_t *input_parser);' + Register the input parser pointed to by `input_parser' with `gawk'. + + +File: gawk.info, Node: Output Wrappers, Next: Two-way processors, Prev: Input Parsers, Up: Registration Functions + +16.4.5.5 Customized Output Wrappers +................................... + +An "output wrapper" is the mirror image of an input parser. It allows +an extension to take over the output to a file opened with the `>' or +`>>' operators (*note Redirection::). + + The output wrapper is very similar to the input parser structure: + + typedef struct output_wrapper { + const char *name; /* name of the wrapper */ + awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); + awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); + awk_const struct output_wrapper *awk_const next; /* for use by gawk */ + } awk_output_wrapper_t; + + The members are as follows: + +`const char *name;' + This is the name of the output wrapper. + +`awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);' + This points to a function that examines the information in the + `awk_output_buf_t' structure pointed to by `outbuf'. It should + return true if the output wrapper wants to take over the file, and + false otherwise. It should not change any state (variable values, + etc.) within `gawk'. 
+ +`awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);' + The function pointed to by this field is called when `gawk' + decides to let the output wrapper take control of the file. It + should fill in appropriate members of the `awk_output_buf_t' + structure, as described below, and return true if successful, + false otherwise. + +`awk_const struct output_wrapper *awk_const next;' + This is for use by `gawk'. + + The `awk_output_buf_t' structure looks like this: + + typedef struct { + const char *name; /* name of output file */ + const char *mode; /* mode argument to fopen */ + FILE *fp; /* stdio file pointer */ + awk_bool_t redirected; /* true if a wrapper is active */ + void *opaque; /* for use by output wrapper */ + size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, + FILE *fp, void *opaque); + int (*gawk_fflush)(FILE *fp, void *opaque); + int (*gawk_ferror)(FILE *fp, void *opaque); + int (*gawk_fclose)(FILE *fp, void *opaque); + } awk_output_buf_t; + + Here too, your extension will define `XXX_can_take_file()' and +`XXX_take_control_of()' functions that examine and update data members +in the `awk_output_buf_t'. The data members are as follows: + +`const char *name;' + The name of the output file. + +`const char *mode;' + The mode string (as would be used in the second argument to + `fopen()') with which the file was opened. + +`FILE *fp;' + The `FILE' pointer from `<stdio.h>'. `gawk' opens the file before + attempting to find an output wrapper. + +`awk_bool_t redirected;' + This field must be set to true by the `XXX_take_control_of()' + function. + +`void *opaque;' + This pointer is opaque to `gawk'. The extension should use it to + store a pointer to any private data associated with the file. 
+ +`size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,' +` FILE *fp, void *opaque);' +`int (*gawk_fflush)(FILE *fp, void *opaque);' +`int (*gawk_ferror)(FILE *fp, void *opaque);' +`int (*gawk_fclose)(FILE *fp, void *opaque);' + These pointers should be set to point to functions that perform + the same job as the corresponding `<stdio.h>' functions, if + appropriate. `gawk' uses these function pointers for all output. + `gawk' initializes the pointers to point to internal, "pass + through" functions that just call the regular `<stdio.h>' + functions, so an extension only needs to redefine those functions + that are appropriate for what it does. + + The `XXX_can_take_file()' function should make a decision based upon +the `name' and `mode' fields, and any additional state (such as `awk' +variable values) that is appropriate. + + When `gawk' calls `XXX_take_control_of()', that function should fill in the +other fields, as appropriate, except for `fp', which it should just use +normally. + + You register your output wrapper with the following function: + +`void register_output_wrapper(awk_output_wrapper_t *output_wrapper);' + Register the output wrapper pointed to by `output_wrapper' with + `gawk'. + + +File: gawk.info, Node: Two-way processors, Prev: Output Wrappers, Up: Registration Functions + +16.4.5.6 Customized Two-way Processors +...................................... + +A "two-way processor" combines an input parser and an output wrapper for +two-way I/O with the `|&' operator (*note Redirection::). It makes +identical use of the `awk_input_buf_t' and `awk_output_buf_t' +structures as described earlier. 
+ + A two-way processor is represented by the following structure: + + typedef struct two_way_processor { + const char *name; /* name of the two-way processor */ + awk_bool_t (*can_take_two_way)(const char *name); + awk_bool_t (*take_control_of)(const char *name, + awk_input_buf_t *inbuf, + awk_output_buf_t *outbuf); + awk_const struct two_way_processor *awk_const next; /* for use by gawk */ + } awk_two_way_processor_t; + + The fields are as follows: + +`const char *name;' + The name of the two-way processor. + +`awk_bool_t (*can_take_two_way)(const char *name);' + This function returns true if it wants to take over two-way I/O + for this filename. It should not change any state (variable + values, etc.) within `gawk'. + +`awk_bool_t (*take_control_of)(const char *name,' +` awk_input_buf_t *inbuf,' +` awk_output_buf_t *outbuf);' + This function should fill in the `awk_input_buf_t' and + `awk_output_buf_t' structures pointed to by `inbuf' and `outbuf', + respectively. These structures were described earlier. + +`awk_const struct two_way_processor *awk_const next;' + This is for use by `gawk'. + + As with the input parser and output wrapper, you provide "yes I +can take this" and "take over for this" functions, +`XXX_can_take_two_way()' and `XXX_take_control_of()'. + + You register your two-way processor with the following function: + +`void register_two_way_processor(awk_two_way_processor_t *two_way_processor);' + Register the two-way processor pointed to by `two_way_processor' + with `gawk'. + + +File: gawk.info, Node: Printing Messages, Next: Updating `ERRNO', Prev: Registration Functions, Up: Extension API Description + +16.4.6 Printing Messages +------------------------ + +You can print different kinds of warning messages from your extension, +as described below. 
Note that for these functions, you must pass in +the extension id received from `gawk' when the extension was loaded.(1) + +`void fatal(awk_ext_id_t id, const char *format, ...);' + Print a message and then cause `gawk' to exit immediately. + +`void warning(awk_ext_id_t id, const char *format, ...);' + Print a warning message. + +`void lintwarn(awk_ext_id_t id, const char *format, ...);' + Print a "lint warning." Normally this is the same as printing a + warning message, but if `gawk' was invoked with `--lint=fatal', + then lint warnings become fatal error messages. + + All of these functions are otherwise like the C `printf()' family of +functions, where the `format' parameter is a string with literal +characters and formatting codes intermixed. + + ---------- Footnotes ---------- + + (1) Because the API uses only ISO C 90 features, it cannot make use +of the ISO C 99 variadic macro feature to hide that parameter. More's +the pity. + + +File: gawk.info, Node: Updating `ERRNO', Next: Accessing Parameters, Prev: Printing Messages, Up: Extension API Description + +16.4.7 Updating `ERRNO' +----------------------- + +The following functions allow you to update the `ERRNO' variable: + +`void update_ERRNO_int(int errno_val);' + Set `ERRNO' to the string equivalent of the error code in + `errno_val'. The value should be one of the defined error codes in + `<errno.h>', and `gawk' turns it into a (possibly translated) + string using the C `strerror()' function. + +`void update_ERRNO_string(const char *string);' + Set `ERRNO' directly to the string value of `string'. `gawk' makes + a copy of the value of `string'. + +`void unset_ERRNO();' + Unset `ERRNO'. + + +File: gawk.info, Node: Accessing Parameters, Next: Symbol Table Access, Prev: Updating `ERRNO', Up: Extension API Description + +16.4.8 Accessing and Updating Parameters +---------------------------------------- + +Two functions give you access to the arguments (parameters) passed to +your extension function. 
They are: + +`awk_bool_t get_argument(size_t count,' +` awk_valtype_t wanted,' +` awk_value_t *result);' + Fill in the `awk_value_t' structure pointed to by `result' with + the `count''th argument. Return true if the actual type matches + `wanted', false otherwise. In the latter case, `result->val_type' + indicates the actual type (*note Table 16.1: + table-value-types-returned.). Counts are zero based--the first + argument is numbered zero, the second one, and so on. `wanted' + indicates the type of value expected. + +`awk_bool_t set_argument(size_t count, awk_array_t array);' + Convert a parameter that was undefined into an array; this provides + call-by-reference for arrays. Return false if `count' is too big, + or if the argument's type is not undefined. *Note Array + Manipulation::, for more information on creating arrays. + + +File: gawk.info, Node: Symbol Table Access, Next: Array Manipulation, Prev: Accessing Parameters, Up: Extension API Description + +16.4.9 Symbol Table Access +-------------------------- + +Two sets of routines provide access to global variables, and one set +allows you to create and release cached values. + +* Menu: + +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. + + +File: gawk.info, Node: Symbol table by name, Next: Symbol table by cookie, Up: Symbol Table Access + +16.4.9.1 Variable Access and Update by Name +........................................... + +The following routines provide the ability to access and update global +`awk'-level variables by name. In compiler terminology, identifiers of +different kinds are termed "symbols", thus the "sym" in the routines' +names. The data structure which stores information about symbols is +termed a "symbol table". 
+ +`awk_bool_t sym_lookup(const char *name,' +` awk_valtype_t wanted,' +` awk_value_t *result);' + Fill in the `awk_value_t' structure pointed to by `result' with + the value of the variable named by the string `name', which is a + regular C string. `wanted' indicates the type of value expected. + Return true if the actual type matches `wanted', false otherwise. + In the latter case, `result->val_type' indicates the actual type + (*note Table 16.1: table-value-types-returned.). + +`awk_bool_t sym_update(const char *name, awk_value_t *value);' + Update the variable named by the string `name', which is a regular + C string. The variable is added to `gawk''s symbol table if it is + not there. Return true if everything worked, false otherwise. + + Changing types (scalar to array or vice versa) of an existing + variable is _not_ allowed, nor may this routine be used to update + an array. This routine cannot be used to update any of the + predefined variables (such as `ARGC' or `NF'). + +`awk_bool_t sym_constant(const char *name, awk_value_t *value);' + Create a variable named by the string `name', which is a regular C + string, that has the constant value as given by `value'. + `awk'-level code cannot change the value of this variable.(1) The + extension may change the value of `name''s variable with + subsequent calls to this routine, and may also convert a variable + created by `sym_update()' into a constant. However, once a + variable becomes a constant it cannot later be reverted into a + mutable variable. + + ---------- Footnotes ---------- + + (1) There (currently) is no `awk'-level feature that provides this +ability. + + +File: gawk.info, Node: Symbol table by cookie, Next: Cached values, Prev: Symbol table by name, Up: Symbol Table Access + +16.4.9.2 Variable Access and Update by Cookie +............................................. + +A "scalar cookie" is an opaque handle that provides access to a global +variable or array. 
It is an optimization that avoids looking up +variables in `gawk''s symbol table every time access is needed. This +was discussed earlier, in *note General Data Types::. + + The following functions let you work with scalar cookies. + +`awk_bool_t sym_lookup_scalar(awk_scalar_t cookie,' +` awk_valtype_t wanted,' +` awk_value_t *result);' + Retrieve the current value of a scalar cookie. Once you have + obtained a scalar cookie using `sym_lookup()', you can use this + function to get its value more efficiently. Return false if the + value cannot be retrieved. + +`awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value);' + Update the value associated with a scalar cookie. Return false if + the new value is not one of `AWK_STRING' or `AWK_NUMBER'. Here + too, the built-in variables may not be updated. + + It is not obvious at first glance how to work with scalar cookies or +what their raison d'etre really is. In theory, the `sym_lookup()' and +`sym_update()' routines are all you really need to work with variables. +For example, you might have code that looked up the value of a +variable, evaluated a condition, and then possibly changed the value of +the variable based on the result of that evaluation, like so: + + /* do_magic --- do something really great */ + + static awk_value_t * + do_magic(int nargs, awk_value_t *result) + { + awk_value_t value; + + if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value) + && some_condition(value.num_value)) { + value.num_value += 42; + sym_update("MAGIC_VAR", & value); + } + + return make_number(0.0, result); + } + +This code looks (and is) simple and straightforward. So what's the +problem? + + Consider what happens if `awk'-level code associated with your +extension calls the `magic()' function (implemented in C by +`do_magic()'), once per record, while processing hundreds of thousands +or millions of records. The `MAGIC_VAR' variable is looked up in the +symbol table once or twice per function call! 
+ + The symbol table lookup is really pure overhead; it is considerably +more efficient to get a cookie that represents the variable, and use +that to get the variable's value and update it as needed.(1) + + Thus, the way to use cookies is as follows. First, install your +extension's variable in `gawk''s symbol table using `sym_update()', as +usual. Then get a scalar cookie for the variable using `sym_lookup()': + + static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */ + + static void + my_extension_init() + { + awk_value_t value; + + /* install initial value */ + sym_update("MAGIC_VAR", make_number(42.0, & value)); + + /* get cookie */ + sym_lookup("MAGIC_VAR", AWK_SCALAR, & value); + + /* save the cookie */ + magic_var_cookie = value.scalar_cookie; + ... + } + + Next, use the routines in this section for retrieving and updating +the value through the cookie. Thus, `do_magic()' now becomes something +like this: + + /* do_magic --- do something really great */ + + static awk_value_t * + do_magic(int nargs, awk_value_t *result) + { + awk_value_t value; + + if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value) + && some_condition(value.num_value)) { + value.num_value += 42; + sym_update_scalar(magic_var_cookie, & value); + } + ... + + return make_number(0.0, result); + } + + NOTE: The previous code omitted error checking for presentation + purposes. Your extension code should be more robust and carefully + check the return values from the API functions. + + ---------- Footnotes ---------- + + (1) The difference is measurable and quite real. Trust us. + + +File: gawk.info, Node: Cached values, Prev: Symbol table by cookie, Up: Symbol Table Access + +16.4.9.3 Creating and Using Cached Values +......................................... + +The routines in this section allow you to create and release cached +values. As with scalar cookies, in theory, cached values are not +necessary. 
You can create numbers and strings using the functions in +*note Constructor Functions::. You can then assign those values to +variables using `sym_update()' or `sym_update_scalar()', as you like. + + However, you can understand the point of cached values if you +remember that _every_ string value's storage _must_ come from +`malloc()'. If you have 20 variables, all of which have the same +string value, you must create 20 identical copies of the string.(1) + + It is clearly more efficient, if possible, to create a value once, +and then tell `gawk' to reuse the value for multiple variables. That is +what the routines in this section let you do. The functions are as +follows: + +`awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result);' + Create a cached string or numeric value from `value' for efficient + later assignment. Only `AWK_NUMBER' and `AWK_STRING' values are + allowed. Any other type is rejected. While `AWK_UNDEFINED' could + be allowed, doing so would result in inferior performance. + +`awk_bool_t release_value(awk_value_cookie_t vc);' + Release the memory associated with a value cookie obtained from + `create_value()'. + + You use value cookies in a fashion similar to the way you use scalar +cookies. In the extension initialization routine, you create the value +cookie: + + static awk_value_cookie_t answer_cookie; /* static value cookie */ + + static void + my_extension_init() + { + awk_value_t value; + char *long_string; + size_t long_string_len; + + /* code from earlier */ + ... + /* ... fill in long_string and long_string_len ... */ + make_malloced_string(long_string, long_string_len, & value); + create_value(& value, & answer_cookie); /* create cookie */ + ... + } + + Once the value is created, you can use it as the value of any number +of variables: + + static awk_value_t * + do_magic(int nargs, awk_value_t *result) + { + awk_value_t new_value; + + ... 
/* as earlier */ + + new_value.val_type = AWK_VALUE_COOKIE; + new_value.value_cookie = answer_cookie; + sym_update("VAR1", & new_value); + sym_update("VAR2", & new_value); + ... + sym_update("VAR100", & new_value); + ... + } + +Using value cookies in this way saves considerable storage, since all of +`VAR1' through `VAR100' share the same value. + + You might be wondering, "Is this sharing problematic? What happens +if `awk' code assigns a new value to `VAR1', are all the others +changed too?" + + That's a great question. The answer is that no, it's not a problem. +`gawk' is smart enough to avoid such problems. + + Finally, as part of your clean up action (*note Exit Callback +Functions::) you should release any cached values that you created, +using `release_value()'. + + ---------- Footnotes ---------- + + (1) Numeric values are clearly less problematic, requiring only a C +`double' to store. + + +File: gawk.info, Node: Array Manipulation, Next: Extension API Variables, Prev: Symbol Table Access, Up: Extension API Description + +16.4.10 Array Manipulation +-------------------------- + +The primary data structure(1) in `awk' is the associative array (*note +Arrays::). Extensions need to be able to manipulate `awk' arrays. The +API provides a number of data structures for working with arrays, +functions for working with individual elements, and functions for +working with arrays as a whole. This includes the ability to "flatten" +an array so that it is easy for C code to traverse every element in an +array. The array data structures integrate nicely with the data +structures for values to make it easy to both work with and create true +arrays of arrays (*note General Data Types::). + +* Menu: + +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. + + ---------- Footnotes ---------- + + (1) Okay, the only data structure. 
+ + +File: gawk.info, Node: Array Data Types, Next: Array Functions, Up: Array Manipulation + +16.4.10.1 Array Data Types +.......................... + +The data types associated with arrays are listed below. + +`typedef void *awk_array_t;' + If you request the value of an array variable, you get back an + `awk_array_t' value. This value is opaque(1) to the extension; it + uniquely identifies the array but can only be used by passing it + into API functions or receiving it from API functions. This is + very similar to the way `FILE *' values are used with the `<stdio.h>' + library routines. + +`' + +`typedef struct awk_element {' +` /* convenience linked list pointer, not used by gawk */' +` struct awk_element *next;' +` enum {' +` AWK_ELEMENT_DEFAULT = 0, /* set by gawk */' +` AWK_ELEMENT_DELETE = 1 /* set by extension if should be deleted */' +` } flags;' +` awk_value_t index;' +` awk_value_t value;' +`} awk_element_t;' + The `awk_element_t' is a "flattened" array element. `awk' produces + an array of these inside the `awk_flat_array_t' (see the next + item). Individual elements may be marked for deletion. New + elements must be added individually, one at a time, using the + separate API for that purpose. The fields are as follows: + + `struct awk_element *next;' + This pointer is for the convenience of extension writers. It + allows an extension to create a linked list of new elements + which can then be added to an array in a loop that traverses + the list. + + `enum { ... } flags;' + A set of flag values that convey information between `gawk' + and the extension. Currently there is only one: + `AWK_ELEMENT_DELETE', which the extension can set to cause + `gawk' to delete the element from the original array upon + release of the flattened array. + + `index' + `value' + The index and value of the element, respectively. _All_ + memory pointed to by `index' and `value' belongs to `gawk'. 
+ +`typedef struct awk_flat_array {' +` awk_const void *awk_const opaque1; /* private data for use by gawk */' +` awk_const void *awk_const opaque2; /* private data for use by gawk */' +` awk_const size_t count; /* how many elements */' +` awk_element_t elements[1]; /* will be extended */' +`} awk_flat_array_t;' + This is a flattened array. When an extension gets one of these + from `gawk', the `elements' array is of actual size `count'. The + `opaque1' and `opaque2' pointers are for use by `gawk'; therefore + they are marked `awk_const' so that the extension cannot modify + them. + + ---------- Footnotes ---------- + + (1) It is also a "cookie," but the `gawk' developers did not wish to +overuse this term. + + +File: gawk.info, Node: Array Functions, Next: Flattening Arrays, Prev: Array Data Types, Up: Array Manipulation + +16.4.10.2 Array Functions +......................... + +The following functions relate to individual array elements. + +`awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);' + For the array represented by `a_cookie', return in `*count' the + number of elements it contains. A subarray counts as a single + element. Return false if there is an error. + +`awk_bool_t get_array_element(awk_array_t a_cookie,' +` const awk_value_t *const index,' +` awk_valtype_t wanted,' +` awk_value_t *result);' + For the array represented by `a_cookie', return in `*result' the + value of the element whose index is `index'. `wanted' specifies + the type of value you wish to retrieve. Return false if `wanted' + does not match the actual type or if `index' is not in the array + (*note Table 16.1: table-value-types-returned.). + + The value for `index' can be numeric, in which case `gawk' + converts it to a string. Using non-integral values is possible, but + requires that you understand how such values are converted to + strings (*note Conversion::); thus using integral values is safest. 
+ + As with _all_ strings passed into `gawk' from an extension, the + string value of `index' must come from `malloc()', and `gawk' + releases the storage. + +`awk_bool_t set_array_element(awk_array_t a_cookie,' +` const awk_value_t *const index,' +` const awk_value_t *const value);' + In the array represented by `a_cookie', create or modify the + element whose index is given by `index'. The `ARGV' and `ENVIRON' + arrays may not be changed. + +`awk_bool_t set_array_element_by_elem(awk_array_t a_cookie,' +` awk_element_t element);' + Like `set_array_element()', but take the `index' and `value' from + `element'. This is a convenience macro. + +`awk_bool_t del_array_element(awk_array_t a_cookie,' +` const awk_value_t* const index);' + Remove the element with the given index from the array represented + by `a_cookie'. Return true if the element was removed, or false + if the element did not exist in the array. + + The following functions relate to arrays as a whole: + +`awk_array_t create_array();' + Create a new array to which elements may be added. *Note Creating + Arrays::, for a discussion of how to create a new array and add + elements to it. + +`awk_bool_t clear_array(awk_array_t a_cookie);' + Clear the array represented by `a_cookie'. Return false if there + was some kind of problem, true otherwise. The array remains an + array, but after calling this function, it has no elements. This + is equivalent to using the `delete' statement (*note Delete::). + +`awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data);' + For the array represented by `a_cookie', create an + `awk_flat_array_t' structure and fill it in. Set the pointer whose + address is passed as `data' to point to this structure. Return + true upon success, or false otherwise. *Note Flattening Arrays::, + for a discussion of how to flatten an array and work with it. 
+ +`awk_bool_t release_flattened_array(awk_array_t a_cookie,' +` awk_flat_array_t *data);' + When done with a flattened array, release the storage using this + function. You must pass in both the original array cookie, and + the address of the created `awk_flat_array_t' structure. The + function returns true upon success, false otherwise. + + +File: gawk.info, Node: Flattening Arrays, Next: Creating Arrays, Prev: Array Functions, Up: Array Manipulation + +16.4.10.3 Working With All The Elements of an Array +................................................... + +To "flatten" an array is to create a structure that represents the full +array in a fashion that makes it easy for C code to traverse the entire +array. Test code in `extension/testext.c' does this, and also serves +as a nice example to show how to use the APIs. + + First, the `gawk' script that drives the test extension: + + @load "testext" + BEGIN { + n = split("blacky rusty sophie raincloud lucky", pets) + printf "pets has %d elements\n", length(pets) + ret = dump_array_and_delete("pets", "3") + printf "dump_array_and_delete(pets) returned %d\n", ret + if ("3" in pets) + printf("dump_array_and_delete() did NOT remove index \"3\"!\n") + else + printf("dump_array_and_delete() did remove index \"3\"!\n") + print "" + } + +This code creates an array with `split()' (*note String Functions::) +and then calls `dump_array_and_delete()'. That function looks up the array +whose name is passed as the first argument, and deletes the element at +the index passed in the second argument. It then prints the return +value and checks if the element was indeed deleted. Here is the C code +that implements `dump_array_and_delete()'. It has been edited slightly +for presentation. 
+ + The first part declares variables, sets up the default return value +in `result', and checks that the function was called with the correct +number of arguments: + + static awk_value_t * + dump_array_and_delete(int nargs, awk_value_t *result) + { + awk_value_t value, value2, value3; + awk_flat_array_t *flat_array; + size_t count; + char *name; + int i; + + assert(result != NULL); + make_number(0.0, result); + + if (nargs != 2) { + printf("dump_array_and_delete: nargs not right " + "(%d should be 2)\n", nargs); + goto out; + } + + The function then proceeds in steps, as follows. First, retrieve the +name of the array, passed as the first argument. Then retrieve the +array itself. If either operation fails, print error messages and +return: + + /* get argument named array as flat array and print it */ + if (get_argument(0, AWK_STRING, & value)) { + name = value.str_value.str; + if (sym_lookup(name, AWK_ARRAY, & value2)) + printf("dump_array_and_delete: sym_lookup of %s passed\n", + name); + else { + printf("dump_array_and_delete: sym_lookup of %s failed\n", + name); + goto out; + } + } else { + printf("dump_array_and_delete: get_argument(0) failed\n"); + goto out; + } + + For testing purposes and to make sure that the C code sees the same +number of elements as the `awk' code, the second step is to get the +count of elements in the array and print it: + + if (! get_element_count(value2.array_cookie, & count)) { + printf("dump_array_and_delete: get_element_count failed\n"); + goto out; + } + + printf("dump_array_and_delete: incoming size is %lu\n", + (unsigned long) count); + + The third step is to actually flatten the array, and then to double +check that the count in the `awk_flat_array_t' is the same as the count +just retrieved: + + if (! 
flatten_array(value2.array_cookie, & flat_array)) { + printf("dump_array_and_delete: could not flatten array\n"); + goto out; + } + + if (flat_array->count != count) { + printf("dump_array_and_delete: flat_array->count (%lu)" + " != count (%lu)\n", + (unsigned long) flat_array->count, + (unsigned long) count); + goto out; + } + + The fourth step is to retrieve the index of the element to be +deleted, which was passed as the second argument. Remember that +argument counts passed to `get_argument()' are zero-based, thus the +second argument is numbered one: + + if (! get_argument(1, AWK_STRING, & value3)) { + printf("dump_array_and_delete: get_argument(1) failed\n"); + goto out; + } + + The fifth step is where the "real work" is done. The function loops +over every element in the array, printing the index and element values. +In addition, upon finding the element with the index that is supposed +to be deleted, the function sets the `AWK_ELEMENT_DELETE' bit in the +`flags' field of the element. When the array is released, `gawk' +traverses the flattened array, and deletes any elements that have this +flag bit set: + + for (i = 0; i < flat_array->count; i++) { + printf("\t%s[\"%.*s\"] = %s\n", + name, + (int) flat_array->elements[i].index.str_value.len, + flat_array->elements[i].index.str_value.str, + valrep2str(& flat_array->elements[i].value)); + + if (strcmp(value3.str_value.str, + flat_array->elements[i].index.str_value.str) + == 0) { + flat_array->elements[i].flags |= AWK_ELEMENT_DELETE; + printf("dump_array_and_delete: marking element \"%s\" " + "for deletion\n", + flat_array->elements[i].index.str_value.str); + } + } + + The sixth step is to release the flattened array. This tells `gawk' +that the extension is no longer using the array, and that it should +delete any elements marked for deletion. 
`gawk' also frees any storage +that was allocated, so you should not use the pointer (`flat_array' in +this code) once you have called `release_flattened_array()': + + if (! release_flattened_array(value2.array_cookie, flat_array)) { + printf("dump_array_and_delete: could not release flattened array\n"); + goto out; + } + + Finally, since everything was successful, the function sets the +return value to success, and returns: + + make_number(1.0, result); + out: + return result; + } + + Here is the output from running this part of the test: + + pets has 5 elements + dump_array_and_delete: sym_lookup of pets passed + dump_array_and_delete: incoming size is 5 + pets["1"] = "blacky" + pets["2"] = "rusty" + pets["3"] = "sophie" + dump_array_and_delete: marking element "3" for deletion + pets["4"] = "raincloud" + pets["5"] = "lucky" + dump_array_and_delete(pets) returned 1 + dump_array_and_delete() did remove index "3"! + + +File: gawk.info, Node: Creating Arrays, Prev: Flattening Arrays, Up: Array Manipulation + +16.4.10.4 How To Create and Populate Arrays +........................................... + +Besides working with arrays created by `awk' code, you can create +arrays and populate them as you see fit, and then `awk' code can access +them and manipulate them. + + There are two important points about creating arrays from extension +code: + + 1. You must install a new array into `gawk''s symbol table + immediately upon creating it. Once you have done so, you can then + populate the array. + + Similarly, if installing a new array as a subarray of an existing + array, you must add the new array to its parent before adding any + elements to it. + + Thus, the correct way to build an array is to work "top down." + Create the array, and immediately install it in `gawk''s symbol + table using `sym_update()', or install it as an element in a + previously existing array using `set_array_element()'. Example code is + coming shortly. + + 2. 
Due to `gawk' internals, after using `sym_update()' to install an + array into `gawk', you have to retrieve the array cookie from the + value passed in to `sym_update()' before doing anything else with + it, like so: + + awk_value_t value; + awk_array_t new_array; + + new_array = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = new_array; + + /* install array in the symbol table */ + sym_update("array", & value); + + new_array = value.array_cookie; /* YOU MUST DO THIS */ + + If installing an array as a subarray, you must also retrieve the + value of the array cookie after the call to `set_array_element()'. + + The following C code is a simple test extension to create an array +with two regular elements and with a subarray. The leading `#include' +directives and boilerplate variable declarations are omitted for +brevity. The first step is to create a new array and then install it +in the symbol table: + + /* create_new_array --- create a named array */ + + static void + create_new_array() + { + awk_array_t a_cookie; + awk_array_t subarray; + awk_value_t index, value; + + a_cookie = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = a_cookie; + + if (! sym_update("new_array", & value)) + printf("create_new_array: sym_update(\"new_array\") failed!\n"); + a_cookie = value.array_cookie; + +Note how `a_cookie' is reset from the `array_cookie' field in the +`value' structure. + + The second step is to install two regular values into `new_array': + + (void) make_const_string("hello", 5, & index); + (void) make_const_string("world", 5, & value); + if (! set_array_element(a_cookie, & index, & value)) { + printf("fill_in_array: set_array_element failed\n"); + return; + } + + (void) make_const_string("answer", 6, & index); + (void) make_number(42.0, & value); + if (! 
set_array_element(a_cookie, & index, & value)) { + printf("fill_in_array: set_array_element failed\n"); + return; + } + + The third step is to create the subarray and install it: + + (void) make_const_string("subarray", 8, & index); + subarray = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = subarray; + if (! set_array_element(a_cookie, & index, & value)) { + printf("fill_in_array: set_array_element failed\n"); + return; + } + subarray = value.array_cookie; + + The final step is to populate the subarray with its own element: + + (void) make_const_string("foo", 3, & index); + (void) make_const_string("bar", 3, & value); + if (! set_array_element(subarray, & index, & value)) { + printf("fill_in_array: set_array_element failed\n"); + return; + } + } + + Here is a sample script that loads the extension and then dumps the +array: + + @load "subarray" + + function dumparray(name, array, i) + { + for (i in array) + if (isarray(array[i])) + dumparray(name "[\"" i "\"]", array[i]) + else + printf("%s[\"%s\"] = %s\n", name, i, array[i]) + } + + BEGIN { + dumparray("new_array", new_array); + } + + Here is the result of running the script: + + $ AWKLIBPATH=$PWD ./gawk -f subarray.awk + -| new_array["subarray"]["foo"] = bar + -| new_array["hello"] = world + -| new_array["answer"] = 42 + +(*Note Finding Extensions::, for more information on the `AWKLIBPATH' +environment variable.) + + +File: gawk.info, Node: Extension API Variables, Next: Extension API Boilerplate, Prev: Array Manipulation, Up: Extension API Description + +16.4.11 API Variables +--------------------- + +The API provides two sets of variables. The first provides information +about the version of the API (both with which the extension was +compiled, and with which `gawk' was compiled). The second provides +information about how `gawk' was invoked. + +* Menu: + +* Extension Versioning:: API Version information. 
+* Extension API Informational Variables:: Variables providing information about + `gawk''s invocation. + + +File: gawk.info, Node: Extension Versioning, Next: Extension API Informational Variables, Up: Extension API Variables + +16.4.11.1 API Version Constants and Variables +............................................. + +The API provides both a "major" and a "minor" version number. The API +versions are available at compile time as constants: + +`GAWK_API_MAJOR_VERSION' + The major version of the API. + +`GAWK_API_MINOR_VERSION' + The minor version of the API. + + The minor version increases when new functions are added to the API. +Such new functions are always added to the end of the API `struct'. + + The major version increases (and the minor version is reset to zero) +if any of the data types change size or member order, or if any of the +existing functions change signature. + + It could happen that an extension may be compiled against one version +of the API but loaded by a version of `gawk' using a different version. +For this reason, the major and minor API versions of the running `gawk' +are included in the API `struct' as read-only constant integers: + +`api->major_version' + The major version of the running `gawk'. + +`api->minor_version' + The minor version of the running `gawk'. + + It is up to the extension to decide if there are API +incompatibilities. Typically a check like this is enough: -Two useful functions that are not in `awk' are `chdir()' (so that an + if (api->major_version != GAWK_API_MAJOR_VERSION + || api->minor_version < GAWK_API_MINOR_VERSION) { + fprintf(stderr, "foo_extension: version mismatch with gawk!\n"); + fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n", + GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION, + api->major_version, api->minor_version); + exit(1); + } + + Such code is included in the boilerplate `dl_load_func()' macro +provided in `gawkapi.h' (discussed later, in *note Extension API +Boilerplate::). 
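   The check above can be restated as a small predicate, which makes it
easier to see which combinations of versions are accepted. This is a
sketch only: `api_compatible()' is a hypothetical helper written for
this illustration and is not part of `gawkapi.h'. Its arguments stand in
for `GAWK_API_MAJOR_VERSION', `GAWK_API_MINOR_VERSION',
`api->major_version' and `api->minor_version', in that order:

```c
#include <assert.h>

/*
 * api_compatible --- hypothetical helper, not part of gawkapi.h.
 * Return 1 if an extension compiled against API version
 * (ext_major, ext_minor) may be loaded by a gawk that provides
 * API version (gawk_major, gawk_minor), 0 otherwise.
 */
static int
api_compatible(int ext_major, int ext_minor,
               int gawk_major, int gawk_minor)
{
    /* version numbers are never negative */
    assert(ext_major >= 0 && ext_minor >= 0);
    assert(gawk_major >= 0 && gawk_minor >= 0);

    /* a different major version means types or signatures changed */
    if (gawk_major != ext_major)
        return 0;

    /* gawk must offer at least the functions the extension expects */
    return gawk_minor >= ext_minor;
}
```

Under these rules, an extension built for version (1, 0) loads into a
`gawk' providing (1, 2), because new functions are only appended to the
API `struct'. An extension built for (1, 2) is rejected by a `gawk'
providing (1, 0), and any difference in major version is rejected.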
+ + +File: gawk.info, Node: Extension API Informational Variables, Prev: Extension Versioning, Up: Extension API Variables + +16.4.11.2 Informational Variables +................................. + +The API provides access to several variables that describe whether the +corresponding command-line options were enabled when `gawk' was +invoked. The variables are: + +`do_lint' + This variable is true if `gawk' was invoked with the `--lint' option + (*note Options::). + +`do_traditional' + This variable is true if `gawk' was invoked with the `--traditional' + option. + +`do_profile' + This variable is true if `gawk' was invoked with the `--profile' + option. + +`do_sandbox' + This variable is true if `gawk' was invoked with the `--sandbox' + option. + +`do_debug' + This variable is true if `gawk' was invoked with the `--debug' option. + +`do_mpfr' + This variable is true if `gawk' was invoked with the `--bignum' option. + + The value of `do_lint' can change if `awk' code modifies the `LINT' +built-in variable (*note Built-in Variables::). The others should not +change during execution. + + +File: gawk.info, Node: Extension API Boilerplate, Next: Finding Extensions, Prev: Extension API Variables, Up: Extension API Description + +16.4.12 Boilerplate Code +------------------------ + +As mentioned earlier (*note Extension Mechanism Outline::), the function +definitions as presented are really macros. To use these macros, your +extension must provide a small amount of boilerplate code (variables and +functions) towards the top of your source file, using pre-defined names +as described below. The boilerplate needed is also provided in comments +in the `gawkapi.h' header file: + + /* Boiler plate code: */ + int plugin_is_GPL_compatible; + + static gawk_api_t *const api; + static awk_ext_id_t ext_id; + static const char *ext_version = NULL; /* or ... = "some string" */ + + static awk_ext_func_t func_table[] = { + { "name", do_name, 1 }, + /* ... 
*/ + }; + + /* EITHER: */ + + static awk_bool_t (*init_func)(void) = NULL; + + /* OR: */ + + static awk_bool_t + init_my_module(void) + { + ... + } + + static awk_bool_t (*init_func)(void) = init_my_module; + + dl_load_func(func_table, some_name, "name_space_in_quotes") + + These variables and functions are as follows: + +`int plugin_is_GPL_compatible;' + This asserts that the extension is compatible with the GNU GPL + (*note Copying::). If your extension does not have this, `gawk' + will not load it (*note Plugin License::). + +`static gawk_api_t *const api;' + This global `static' variable should be set to point to the + `gawk_api_t' pointer that `gawk' passes to your `dl_load()' + function. This variable is used by all of the macros. + +`static awk_ext_id_t ext_id;' + This global static variable should be set to the `awk_ext_id_t' + value that `gawk' passes to your `dl_load()' function. This + variable is used by all of the macros. + +`static const char *ext_version = NULL; /* or ... = "some string" */' + This global `static' variable should be set either to `NULL', or + to point to a string giving the name and version of your extension. + +`static awk_ext_func_t func_table[] = { ... };' + This is an array of one or more `awk_ext_func_t' structures as + described earlier (*note Extension Functions::). It can then be + looped over for multiple calls to `add_ext_func()'. + +`static awk_bool_t (*init_func)(void) = NULL;' +` OR' +`static awk_bool_t init_my_module(void) { ... }' +`static awk_bool_t (*init_func)(void) = init_my_module;' + If you need to do some initialization work, you should define a + function that does it (creates variables, opens files, etc.) and + then define the `init_func' pointer to point to your function. + The function should return zero (false) upon failure, non-zero + (success) if everything goes well. + + If you don't need to do any initialization, define the pointer and + initialize it to `NULL'. 
+ +`dl_load_func(func_table, some_name, "name_space_in_quotes")' + This macro expands to a `dl_load()' function that performs all the + necessary initializations. + + The point of all the variables and arrays is to let the +`dl_load()' function (from the `dl_load_func()' macro) do all the +standard work. It does the following: + + 1. Check the API versions. If the extension major version does not + match `gawk''s, or if the extension minor version is greater than + `gawk''s, it prints a fatal error message and exits. + + 2. Load the functions defined in `func_table'. If any of them fails + to load, it prints a warning message but continues on. + + 3. If the `init_func' pointer is not `NULL', call the function it + points to. If it returns zero (false), print a warning message. + + 4. If `ext_version' is not `NULL', register the version string with + `gawk'. + + +File: gawk.info, Node: Finding Extensions, Prev: Extension API Boilerplate, Up: Extension API Description + +16.4.13 How `gawk' Finds Extensions +----------------------------------- + +Compiled extensions have to be installed in a directory where `gawk' +can find them. If `gawk' is configured and built in the default +fashion, the directory in which to find extensions is +`/usr/local/lib/gawk'. You can also specify a search path with a list +of directories to search for compiled extensions. *Note AWKLIBPATH +Variable::, for more information. + + +File: gawk.info, Node: Extension Example, Next: Extension Samples, Prev: Extension API Description, Up: Dynamic Extensions + +16.5 Example: Some File Functions +================================= + + No matter where you go, there you are. + Buckaroo Banzai + + Two useful functions that are not in `awk' are `chdir()' (so that an `awk' program can change its directory) and `stat()' (so that an `awk' program can gather information about a file). This minor node -implements these functions for `gawk' in an external extension library. 
+implements these functions for `gawk' in an extension. * Menu: @@ -21233,9 +23575,9 @@ implements these functions for `gawk' in an external extension library. * Using Internal File Ops:: How to use an external extension. -File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Sample Library +File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Extension Example -16.2.1 Using `chdir()' and `stat()' +16.5.1 Using `chdir()' and `stat()' ----------------------------------- This minor node shows how to use the new functions at the `awk' level @@ -21243,6 +23585,7 @@ once they've been integrated into the running `gawk' interpreter. Using `chdir()' is very straightforward. It takes one argument, the new directory to change to: + @load "filefuncs" ... newdir = "/home/arnold/funstuff" ret = chdir(newdir) @@ -21253,7 +23596,7 @@ directory to change to: } ... - The return value is negative if the `chdir' failed, and `ERRNO' + The return value is negative if the `chdir()' failed, and `ERRNO' (*note Built-in Variables::) is set to a string indicating the error. Using `stat()' is a bit more complicated. The C `stat()' function @@ -21262,7 +23605,6 @@ way to model this in `awk' is to fill in an associative array with the appropriate information: file = "/home/arnold/.profile" - fdata[1] = "x" # force `fdata' to be an array ret = stat(file, fdata) if (ret < 0) { printf("could not stat %s: %s\n", @@ -21304,11 +23646,11 @@ appropriate information: `"ctime"' The file's last access, modification, and inode update times, respectively. These are numeric timestamps, suitable for - formatting with `strftime()' (*note Built-in::). + formatting with `strftime()' (*note Time Functions::). `"pmode"' The file's "printable mode." This is a string representation of - the file's type and permissions, such as what is produced by `ls + the file's type and permissions, such as is produced by `ls -l'--for example, `"drwxr-xr-x"'. 
`"type"' @@ -21356,57 +23698,87 @@ Elements::): components of that number, respectively. -File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Sample Library +File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Extension Example -16.2.2 C Code for `chdir()' and `stat()' +16.5.2 C Code for `chdir()' and `stat()' ---------------------------------------- -Here is the C code for these extensions. They were written for -GNU/Linux. The code needs some more work for complete portability to -other POSIX-compliant systems:(1) +Here is the C code for these extensions.(1) - #include "awk.h" + The file includes a number of standard header files, and then +includes the `gawkapi.h' header file which provides the API definitions. +Those are followed by the necessary variable declarations to make use +of the API macros and boilerplate code (*note Extension API +Boilerplate::). - #include <sys/sysmacros.h> + #ifdef HAVE_CONFIG_H + #include <config.h> + #endif + + #include <stdio.h> + #include <assert.h> + #include <errno.h> + #include <stdlib.h> + #include <string.h> + #include <unistd.h> + + #include <sys/types.h> + #include <sys/stat.h> + + #include "gawkapi.h" + + #include "gettext.h" + #define _(msgid) gettext(msgid) + #define N_(msgid) msgid + + #include "gawkfts.h" + #include "stack.h" + + static const gawk_api_t *api; /* for convenience macros to work */ + static awk_ext_id_t *ext_id; + static awk_bool_t init_filefuncs(void); + static awk_bool_t (*init_func)(void) = init_filefuncs; + static const char *ext_version = "filefuncs extension: version 1.0"; int plugin_is_GPL_compatible; + By convention, for an `awk' function `foo()', the C function that +implements it is called `do_foo()'. The function should have two +arguments: the first is an `int' usually called `nargs', that +represents the number of actual arguments for the function. 
The second +is a pointer to an `awk_value_t', usually named `result'. + /* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ - static NODE * - do_chdir(int nargs) + static awk_value_t * + do_chdir(int nargs, awk_value_t *result) { - NODE *newdir; + awk_value_t newdir; int ret = -1; - if (do_lint && nargs != 1) - lintwarn("chdir: called with incorrect number of arguments"); - - newdir = get_scalar_argument(0, FALSE); + assert(result != NULL); - The file includes the `"awk.h"' header file for definitions for the -`gawk' internals. It includes `<sys/sysmacros.h>' for access to the -`major()' and `minor'() macros. + if (do_lint && nargs != 1) + lintwarn(ext_id, + _("chdir: called with incorrect number of arguments, " + "expecting 1")); - By convention, for an `awk' function `foo', the function that -implements it is called `do_foo'. The function should take a `int' -argument, usually called `nargs', that represents the number of defined -arguments for the function. The `newdir' variable represents the new -directory to change to, retrieved with `get_scalar_argument()'. Note -that the first argument is numbered zero. + The `newdir' variable represents the new directory to change to, +retrieved with `get_argument()'. Note that the first argument is +numbered zero. - This code actually accomplishes the `chdir()'. It first forces the -argument to be a string and passes the string value to the `chdir()' -system call. If the `chdir()' fails, `ERRNO' is updated. + If the argument is retrieved successfully, the function calls the +`chdir()' system call. If the `chdir()' fails, `ERRNO' is updated. 
- (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO_int(errno); + if (get_argument(0, AWK_STRING, & newdir)) { + ret = chdir(newdir.str_value.str); + if (ret < 0) + update_ERRNO_int(errno); + } Finally, the function returns the return value to the `awk' level: - return make_number((AWKNUM) ret); + return make_number(ret, result); } The `stat()' built-in is more involved. First comes a function that @@ -21421,71 +23793,239 @@ turns a numeric mode into a printable representation (e.g., 644 becomes ... } - Next comes the `do_stat()' function. It starts with variable + Next comes a function for reading symbolic links, which is also +omitted here for brevity: + + /* read_symlink --- read a symbolic link into an allocated buffer. + ... */ + + static char * + read_symlink(const char *fname, size_t bufsize, ssize_t *linksize) + { + ... + } + + Two helper functions simplify entering values in the array that will +contain the result of the `stat()': + + /* array_set --- set an array element */ + + static void + array_set(awk_array_t array, const char *sub, awk_value_t *value) + { + awk_value_t index; + + set_array_element(array, + make_const_string(sub, strlen(sub), & index), + value); + + } + + /* array_set_numeric --- set an array element with a number */ + + static void + array_set_numeric(awk_array_t array, const char *sub, double num) + { + awk_value_t tmp; + + array_set(array, sub, make_number(num, & tmp)); + } + + The following function does most of the work to fill in the +`awk_array_t' result array with values obtained from a valid `struct +stat'. It is done in a separate function to support the `stat()' +function for `gawk' and also to support the `fts()' extension which is +included in the same file but whose code is not shown here (*note +Extension Sample File Functions::). 
+ + The first part of the function is variable declarations, including a +table to map file types to strings: + + /* fill_stat_array --- do the work to fill an array with stat info */ + + static int + fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf) + { + char *pmode; /* printable mode */ + const char *type = "unknown"; + awk_value_t tmp; + static struct ftype_map { + unsigned int mask; + const char *type; + } ftype_map[] = { + { S_IFREG, "file" }, + { S_IFBLK, "blockdev" }, + { S_IFCHR, "chardev" }, + { S_IFDIR, "directory" }, + #ifdef S_IFSOCK + { S_IFSOCK, "socket" }, + #endif + #ifdef S_IFIFO + { S_IFIFO, "fifo" }, + #endif + #ifdef S_IFLNK + { S_IFLNK, "symlink" }, + #endif + #ifdef S_IFDOOR /* Solaris weirdness */ + { S_IFDOOR, "door" }, + #endif /* S_IFDOOR */ + }; + int j, k; + + The destination array is cleared, and then code fills in various +elements based on values in the `struct stat': + + /* empty out the array */ + clear_array(array); + + /* fill in the array */ + array_set(array, "name", make_const_string(name, strlen(name), + & tmp)); + array_set_numeric(array, "dev", sbuf->st_dev); + array_set_numeric(array, "ino", sbuf->st_ino); + array_set_numeric(array, "mode", sbuf->st_mode); + array_set_numeric(array, "nlink", sbuf->st_nlink); + array_set_numeric(array, "uid", sbuf->st_uid); + array_set_numeric(array, "gid", sbuf->st_gid); + array_set_numeric(array, "size", sbuf->st_size); + array_set_numeric(array, "blocks", sbuf->st_blocks); + array_set_numeric(array, "atime", sbuf->st_atime); + array_set_numeric(array, "mtime", sbuf->st_mtime); + array_set_numeric(array, "ctime", sbuf->st_ctime); + + /* for block and character devices, add rdev, + major and minor numbers */ + if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) { + array_set_numeric(array, "rdev", sbuf->st_rdev); + array_set_numeric(array, "major", major(sbuf->st_rdev)); + array_set_numeric(array, "minor", minor(sbuf->st_rdev)); + } + +The latter part of the 
function makes selective additions to the +destination array, depending upon the availability of certain members +and/or the type of the file. It then returns zero, for success: + + #ifdef HAVE_ST_BLKSIZE + array_set_numeric(array, "blksize", sbuf->st_blksize); + #endif /* HAVE_ST_BLKSIZE */ + + pmode = format_mode(sbuf->st_mode); + array_set(array, "pmode", make_const_string(pmode, strlen(pmode), + & tmp)); + + /* for symbolic links, add a linkval field */ + if (S_ISLNK(sbuf->st_mode)) { + char *buf; + ssize_t linksize; + + if ((buf = read_symlink(name, sbuf->st_size, + & linksize)) != NULL) + array_set(array, "linkval", + make_malloced_string(buf, linksize, & tmp)); + else + warning(ext_id, _("stat: unable to read symbolic link `%s'"), + name); + } + + /* add a type field */ + type = "unknown"; /* shouldn't happen */ + for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) { + if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) { + type = ftype_map[j].type; + break; + } + } + + array_set(array, "type", make_const_string(type, strlen(type), &tmp)); + + return 0; + } + + Finally, here is the `do_stat()' function. It starts with variable declarations and argument checking: /* do_stat --- provide a stat() function for gawk */ - static NODE * - do_stat(int nargs) + static awk_value_t * + do_stat(int nargs, awk_value_t *result) { - NODE *file, *array, *tmp; - struct stat sbuf; + awk_value_t file_param, array_param; + char *name; + awk_array_t array; int ret; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; + struct stat sbuf; + + assert(result != NULL); - if (do_lint && nargs > 2) - lintwarn("stat: called with too many arguments"); + if (do_lint && nargs != 2) { + lintwarn(ext_id, + _("stat: called with wrong number of arguments")); + return make_number(-1, result); + } Then comes the actual work. First, the function gets the arguments. -Then, it always clears the array. 
The code use `lstat()' (instead of -`stat()') to get the file information, in case the file is a symbolic -link. If there's an error, it sets `ERRNO' and returns: +Next, it gets the information for the file. The code use `lstat()' +(instead of `stat()') to get the file information, in case the file is +a symbolic link. If there's an error, it sets `ERRNO' and returns: /* file is first arg, array to hold results is second */ - file = get_scalar_argument(0, FALSE); - array = get_array_argument(1, FALSE); + if ( ! get_argument(0, AWK_STRING, & file_param) + || ! get_argument(1, AWK_ARRAY, & array_param)) { + warning(ext_id, _("stat: bad parameters")); + return make_number(-1, result); + } - /* empty out the array */ - assoc_clear(array); + name = file_param.str_value.str; + array = array_param.array_cookie; + + /* always empty out the array */ + clear_array(array); /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); + ret = lstat(name, & sbuf); if (ret < 0) { update_ERRNO_int(errno); - return make_number((AWKNUM) ret); + return make_number(ret, result); } - Now comes the tedious part: filling in the array. Only a few of the -calls are shown here, since they all follow the same pattern: + The tedious work is done by `fill_stat_array()', shown earlier. +When done, return the result from `fill_stat_array()': - /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4)); - *aptr = dupnode(file); - unref(tmp); + ret = fill_stat_array(name, array, & sbuf); - aptr = assoc_lookup(array, tmp = make_string("mode", 4)); - *aptr = make_number((AWKNUM) sbuf.st_mode); - unref(tmp); + return make_number(ret, result); + } - aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); - unref(tmp); + Finally, it's necessary to provide the "glue" that loads the new +function(s) into `gawk'. 
- When done, return the `lstat()' return value: + The `filefuncs' extension also provides an `fts()' function, which +we omit here. For its sake there is an initialization function: + /* init_filefuncs --- initialization routine */ - return make_number((AWKNUM) ret); + static awk_bool_t + init_filefuncs(void) + { + ... } - Finally, it's necessary to provide the "glue" that loads the new -function(s) into `gawk'. By convention, each library has a routine -named `dl_load()' that does the job. The simplest way is to use the -`dl_load_func' macro in `gawkapi.h'. + We are almost done. We need an array of `awk_ext_func_t' structures +for loading each function into `gawk': + + static awk_ext_func_t func_table[] = { + { "chdir", do_chdir, 1 }, + { "stat", do_stat, 2 }, + { "fts", do_fts, 3 }, + }; + + Each extension must have a routine named `dl_load()' to load +everything that needs to be loaded. It is simplest to use the +`dl_load_func()' macro in `gawkapi.h': + + /* define the dl_load() function using the boilerplate macro */ + + dl_load_func(func_table, filefuncs, "") And that's it! As an exercise, consider adding functions to implement system calls such as `chown()', `chmod()', and `umask()'. @@ -21497,34 +24037,33 @@ implement system calls such as `chown()', `chmod()', and `umask()'. version. -File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Sample Library +File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Extension Example -16.2.3 Integrating the Extensions +16.5.3 Integrating The Extensions --------------------------------- Now that the code is written, it must be possible to add it at runtime to the running `gawk' interpreter. First, the code must be compiled. 
Assuming that the functions are in a file named `filefuncs.c', and IDIR -is the location of the `gawk' include files, the following steps create -a GNU/Linux shared library: +is the location of the `gawkapi.h' header file, the following steps(1) +create a GNU/Linux shared library: $ gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -IIDIR filefuncs.c - $ ld -o filefuncs.so -shared filefuncs.o + $ ld -o filefuncs.so -shared filefuncs.o -lc - Once the library exists, it is loaded by calling the `extension()' -built-in function. This function takes two arguments: the name of the -library to load and the name of a function to call when the library is -first loaded. This function adds the new functions to `gawk'. It -returns the value returned by the initialization function within the -shared library: + Once the library exists, it is loaded by using the `@load' keyword. # file testff.awk + @load "filefuncs" + BEGIN { - extension("./filefuncs.so", "dl_load") + "pwd" | getline curdir # save current directory + close("pwd") - chdir(".") # no-op + chdir("/tmp") + system("pwd") # test it + chdir(curdir) # go back - data[1] = 1 # force `data' to be an array print "Info for testff.awk" ret = stat("testff.awk", data) print "ret =", ret @@ -21541,32 +24080,642 @@ shared library: print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) } - Here are the results of running the program: + The `AWKLIBPATH' environment variable tells `gawk' where to find +shared libraries (*note Finding Extensions::). 
We set it to the +current directory and run the program: - $ gawk -f testff.awk + $ AWKLIBPATH=$PWD gawk -f testff.awk + -| /tmp -| Info for testff.awk -| ret = 0 - -| data["size"] = 607 - -| data["ino"] = 14945891 - -| data["name"] = testff.awk - -| data["pmode"] = -rw-rw-r-- - -| data["nlink"] = 1 - -| data["atime"] = 1293993369 - -| data["mtime"] = 1288520752 - -| data["mode"] = 33204 -| data["blksize"] = 4096 - -| data["dev"] = 2054 + -| data["mtime"] = 1350838628 + -| data["mode"] = 33204 -| data["type"] = file - -| data["gid"] = 500 - -| data["uid"] = 500 + -| data["dev"] = 2053 + -| data["gid"] = 1000 + -| data["ino"] = 1719496 + -| data["ctime"] = 1350838628 -| data["blocks"] = 8 - -| data["ctime"] = 1290113572 - -| testff.awk modified: 10 31 10 12:25:52 + -| data["nlink"] = 1 + -| data["name"] = testff.awk + -| data["atime"] = 1350838632 + -| data["pmode"] = -rw-rw-r-- + -| data["size"] = 662 + -| data["uid"] = 1000 + -| testff.awk modified: 10 21 12 18:57:08 -| -| Info for JUNK -| ret = -1 -| JUNK modified: 01 01 70 02:00:00 + ---------- Footnotes ---------- + + (1) In practice, you would probably want to use the GNU +Autotools--Automake, Autoconf, Libtool, and Gettext--to configure and +build your libraries. Instructions for doing so are beyond the scope of +this Info file. *Note gawkextlib::, for WWW links to the tools. + + +File: gawk.info, Node: Extension Samples, Next: gawkextlib, Prev: Extension Example, Up: Dynamic Extensions + +16.6 The Sample Extensions In The `gawk' Distribution +===================================================== + +This minor node provides brief overviews of the sample extensions that +come in the `gawk' distribution. Some of them are intended for +production use, such as the `filefuncs' and `readdir' extensions. Others +mainly provide example code that shows how to use the extension API. + +* Menu: + +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to `fnmatch()'.
+* Extension Sample Fork:: An interface to `fork()' and other + process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to `readdir()'. +* Extension Sample Revout:: Reversing output sample output wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to `gettimeofday()' + and `sleep()'. + + +File: gawk.info, Node: Extension Sample File Functions, Next: Extension Sample Fnmatch, Up: Extension Samples + +16.6.1 File Related Functions +----------------------------- + +The `filefuncs' extension provides three different functions, as +follows: The usage is: + +`@load "filefuncs"' + This is how you load the extension. + +`result = chdir("/some/directory")' + The `chdir()' function is a direct hook to the `chdir()' system + call to change the current directory. It returns zero upon + success or less than zero upon error. In the latter case it + updates `ERRNO'. + +`result = stat("/some/path", statdata)' + The `stat()' function provides a hook into the `stat()' system + call. In fact, it uses `lstat()'. It returns zero upon success or + less than zero upon error. In the latter case it updates `ERRNO'. + + In all cases, it clears the `statdata' array. When the call is + successful, `stat()' fills the `statdata' array with information + retrieved from the filesystem, as follows: + + `statdata["name"]' The name of the file. + `statdata["dev"]' Corresponds to the `st_dev' field in + the `struct stat'. + `statdata["ino"]' Corresponds to the `st_ino' field in + the `struct stat'. + `statdata["mode"]' Corresponds to the `st_mode' field in + the `struct stat'. + `statdata["nlink"]' Corresponds to the `st_nlink' field in + the `struct stat'. 
+ `statdata["uid"]' Corresponds to the `st_uid' field in + the `struct stat'. + `statdata["gid"]' Corresponds to the `st_gid' field in + the `struct stat'. + `statdata["size"]' Corresponds to the `st_size' field in + the `struct stat'. + `statdata["atime"]' Corresponds to the `st_atime' field in + the `struct stat'. + `statdata["mtime"]' Corresponds to the `st_mtime' field in + the `struct stat'. + `statdata["ctime"]' Corresponds to the `st_ctime' field in + the `struct stat'. + `statdata["rdev"]' Corresponds to the `st_rdev' field in + the `struct stat'. This element is + only present for device files. + `statdata["major"]' The major device number, obtained by + applying `major()' to `st_rdev'. This element is + only present for device files. + `statdata["minor"]' The minor device number, obtained by + applying `minor()' to `st_rdev'. This element is + only present for device files. + `statdata["blksize"]' Corresponds to the `st_blksize' field + in the `struct stat', if this field is + present on your system. (It is present + on all modern systems that we know of.) + `statdata["pmode"]' A human-readable version of the mode + value, such as printed by `ls'. For + example, `"-rwxr-xr-x"'. + `statdata["linkval"]' If the named file is a symbolic link, + this element will exist and its value + is the value of the symbolic link + (where the symbolic link points to). + `statdata["type"]' The type of the file as a string. One + of `"file"', `"blockdev"', `"chardev"', + `"directory"', `"socket"', `"fifo"', + `"symlink"', `"door"', or `"unknown"'. + Not all systems support all file types. + +`flags = or(FTS_PHYSICAL, ...)' +`result = fts(pathlist, flags, filedata)' + Walk the file trees provided in `pathlist' and fill in the + `filedata' array as described below. `flags' is the bitwise OR of + several predefined constant values, also as described below. + Return zero if there were no errors, otherwise return -1.
+ + The `fts()' function provides a hook to the C library `fts()' +routines for traversing file hierarchies. Instead of returning data +about one file at a time in a stream, it fills in a multi-dimensional +array with data about each file and directory encountered in the +requested hierarchies. + + The arguments are as follows: + +`pathlist' + An array of filenames. The element values are used; the index + values are ignored. + +`flags' + This should be the bitwise OR of one or more of the following + predefined constant flag values. At least one of `FTS_LOGICAL' or + `FTS_PHYSICAL' must be provided; otherwise `fts()' returns an + error value and sets `ERRNO'. The flags are: + + `FTS_LOGICAL' + Do a "logical" file traversal, where the information returned + for a symbolic link refers to the linked-to file, and not to + the symbolic link itself. This flag is mutually exclusive + with `FTS_PHYSICAL'. + + `FTS_PHYSICAL' + Do a "physical" file traversal, where the information + returned for a symbolic link refers to the symbolic link + itself. This flag is mutually exclusive with `FTS_LOGICAL'. + + `FTS_NOCHDIR' + As a performance optimization, the C library `fts()' routines + change directory as they traverse a file hierarchy. This + flag disables that optimization. + + `FTS_COMFOLLOW' + Immediately follow a symbolic link named in `pathlist', + whether or not `FTS_LOGICAL' is set. + + `FTS_SEEDOT' + By default, the `fts()' routines do not return entries for `.' + and `..'. This option causes entries for `..' to also be + included. (The extension always includes an entry for `.', + see below.) + + `FTS_XDEV' + During a traversal, do not cross onto a different mounted + filesystem. + +`filedata' + The `filedata' array is first cleared. Then, `fts()' creates an + element in `filedata' for every element in `pathlist'. The index + is the name of the directory or file given in `pathlist'. The + element for this index is itself an array. There are two cases. 
+ + _The path is a file._ + In this case, the array contains two or three elements: + + `"path"' + The full path to this file, starting from the "root" + that was given in the `pathlist' array. + + `"stat"' + This element is itself an array, containing the same + information as provided by the `stat()' function + described earlier for its `statdata' argument. The + element may not be present if the `stat()' system call + for the file failed. + + `"error"' + If some kind of error was encountered, the array will + also contain an element named `"error"', which is a + string describing the error. + + _The path is a directory._ + In this case, the array contains one element for each entry + in the directory. If an entry is a file, that element is as + for files, just described. If the entry is a directory, that + element is (recursively), an array describing the + subdirectory. If `FTS_SEEDOT' was provided in the flags, + then there will also be an element named `".."'. This + element will be an array containing the data as provided by + `stat()'. + + In addition, there will be an element whose index is `"."'. + This element is an array containing the same two or three + elements as for a file: `"path"', `"stat"', and `"error"'. + + The `fts()' function returns zero if there were no errors. +Otherwise it returns -1. + + NOTE: The `fts()' extension does not exactly mimic the interface + of the C library `fts()' routines, choosing instead to provide an + interface that is based on associative arrays, which should be + more comfortable to use from an `awk' program. This includes the + lack of a comparison function, since `gawk' already provides + powerful array sorting facilities. While an `fts_read()'-like + interface could have been provided, this felt less natural than + simply creating a multi-dimensional array to represent the file + hierarchy and its information. + + See `test/fts.awk' in the `gawk' distribution for an example. 
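As a rough illustration of how the `"type"' element of `statdata' can be derived, the following self-contained C sketch masks `st_mode' with `S_IFMT' and looks the result up in a small table, in the spirit of the `ftype_map' table shown earlier. It is a minimal sketch, not the extension's code: the function name `file_type_name()' is hypothetical, the Solaris `"door"' entry is left out, and only POSIX calls are used.

```c
#define _XOPEN_SOURCE 700   /* request lstat() and the S_IF* macros */

#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * file_type_name --- hypothetical helper, not part of filefuncs.c.
 * Return a string naming the type of PATH, or NULL if lstat() fails
 * (errno is left as set by lstat()).
 */
const char *
file_type_name(const char *path)
{
    static const struct {
        mode_t mask;
        const char *type;
    } map[] = {
        { S_IFREG, "file" },
        { S_IFBLK, "blockdev" },
        { S_IFCHR, "chardev" },
        { S_IFDIR, "directory" },
#ifdef S_IFSOCK
        { S_IFSOCK, "socket" },
#endif
#ifdef S_IFIFO
        { S_IFIFO, "fifo" },
#endif
#ifdef S_IFLNK
        { S_IFLNK, "symlink" },
#endif
    };
    struct stat sbuf;
    size_t i;

    assert(path != NULL);

    /* lstat(), not stat(), so a symbolic link is reported as itself */
    if (lstat(path, &sbuf) < 0)
        return NULL;

    /* compare only the file-type bits of the mode against each entry */
    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
        if ((sbuf.st_mode & S_IFMT) == map[i].mask)
            return map[i].type;

    return "unknown";
}
```

A caller would treat a `NULL' return the way the `awk'-level `stat()' treats a negative return value: the type is unavailable and `errno' says why.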
+ + +File: gawk.info, Node: Extension Sample Fnmatch, Next: Extension Sample Fork, Prev: Extension Sample File Functions, Up: Extension Samples + +16.6.2 Interface To `fnmatch()' +------------------------------- + +This extension provides an interface to the C library `fnmatch()' +function. The usage is: + + @load "fnmatch" + + result = fnmatch(pattern, string, flags) + + The `fnmatch' extension adds a single function named `fnmatch()', +one constant (`FNM_NOMATCH'), and an array of flag values named `FNM'. + + The arguments to `fnmatch()' are: + +`pattern' + The filename wildcard to match. + +`string' + The filename string. + +`flags' + Either zero, or the bitwise OR of one or more of the flags in the + `FNM' array. + + The return value is zero on success, `FNM_NOMATCH' if the string did +not match the pattern, or a different non-zero value if an error +occurred. + + The flags are as follows: + +`FNM["CASEFOLD"]' Corresponds to the `FNM_CASEFOLD' flag as defined in + `fnmatch()'. +`FNM["FILE_NAME"]' Corresponds to the `FNM_FILE_NAME' flag as defined + in `fnmatch()'. +`FNM["LEADING_DIR"]' Corresponds to the `FNM_LEADING_DIR' flag as defined + in `fnmatch()'. +`FNM["NOESCAPE"]' Corresponds to the `FNM_NOESCAPE' flag as defined in + `fnmatch()'. +`FNM["PATHNAME"]' Corresponds to the `FNM_PATHNAME' flag as defined in + `fnmatch()'. +`FNM["PERIOD"]' Corresponds to the `FNM_PERIOD' flag as defined in + `fnmatch()'. + + Here is an example: + + @load "fnmatch" + ... + flags = or(FNM["PERIOD"], FNM["NOESCAPE"]) + if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH) + print "no match" + + +File: gawk.info, Node: Extension Sample Fork, Next: Extension Sample Ord, Prev: Extension Sample Fnmatch, Up: Extension Samples + +16.6.3 Interface To `fork()', `wait()' and `waitpid()' +------------------------------------------------------ + +The `fork' extension adds three functions, as follows. + +`@load "fork"' + This is how you load the extension.
+ +`pid = fork()' + This function creates a new process. The return value is zero + in the child and the process-id number of the child in the parent, + or -1 upon error. In the latter case, `ERRNO' indicates the + problem. In the child, `PROCINFO["pid"]' and `PROCINFO["ppid"]' + are updated to reflect the correct values. + +`ret = waitpid(pid)' + This function takes a numeric argument, which is the process-id to + wait for. The return value is that of the `waitpid()' system call. + +`ret = wait()' + This function waits for the first child to die. The return value + is that of the `wait()' system call. + + There is no corresponding `exec()' function. + + Here is an example: + + @load "fork" + ... + if ((pid = fork()) == 0) + print "hello from the child" + else + print "hello from the parent" + + +File: gawk.info, Node: Extension Sample Ord, Next: Extension Sample Readdir, Prev: Extension Sample Fork, Up: Extension Samples + +16.6.4 Character and Numeric values: `ord()' and `chr()' +-------------------------------------------------------- + +The `ordchr' extension adds two functions, named `ord()' and `chr()', +as follows. + +`number = ord(string)' + Return the numeric value of the first character in `string'. + +`char = chr(number)' + Return the string whose first character is that represented by + `number'. + + These functions are inspired by the Pascal language functions of the +same name. Here is an example: + + @load "ordchr" + ... + printf("The numeric value of 'A' is %d\n", ord("A")) + printf("The string value of 65 is %s\n", chr(65)) + + +File: gawk.info, Node: Extension Sample Readdir, Next: Extension Sample Revout, Prev: Extension Sample Ord, Up: Extension Samples + +16.6.5 Reading Directories +-------------------------- + +The `readdir' extension adds an input parser for directories, and adds +a single function named `readdir_do_ftype()'.
The usage is as follows: + + @load "readdir" + + readdir_do_ftype("stat") # or "dirent" or "never" + + When this extension is in use, instead of skipping directories named +on the command line (or with `getline'), they are read, with each entry +returned as a record. + + The record consists of at least two fields: the inode number and the +filename, separated by a forward slash character. On systems where the +directory entry contains the file type, the record has a third field +which is a single letter indicating the type of the file: + +Letter File Type +-------------------------------------------------------------------------- +`b' Block device +`c' Character device +`d' Directory +`f' Regular file +`l' Symbolic link +`p' Named pipe (FIFO) +`s' Socket +`u' Anything else (unknown) + + On systems without the file type information, calling +`readdir_do_ftype("stat")' causes the extension to use the `lstat()' +system call to retrieve the appropriate information. This is not the +default, since `lstat()' is a potentially expensive operation. By +calling `readdir_do_ftype("never")' one can ensure that the file type +information is never displayed, even when readily available in the +directory entry. + + The third option, `readdir_do_ftype("dirent")', takes file type +information from the directory entry, if it is available. This is the +default on systems that supply this information. + + The `readdir_do_ftype()' function sets `ERRNO' if called without +arguments or with invalid arguments. + + NOTE: On GNU/Linux systems, there are filesystems that don't + support the `d_type' entry (see the readdir(3) manual page), and + so the file type is always `u'. Therefore, using + `readdir_do_ftype("stat")' is advisable even on GNU/Linux systems. + In this case, the `readdir' extension falls back to using + `lstat()' when it encounters an unknown file type. + + Here is an example: + + @load "readdir" + ... 
+ BEGIN { FS = "/" } + { print "file name is", $2 } + + +File: gawk.info, Node: Extension Sample Revout, Next: Extension Sample Rev2way, Prev: Extension Sample Readdir, Up: Extension Samples + +16.6.6 Reversing Output +----------------------- + +The `revoutput' extension adds a simple output wrapper that reverses +the characters in each output line. Its main purpose is to show how to +write an output wrapper, although it may be mildly amusing for the +unwary. Here is an example: + + @load "revoutput" + + BEGIN { + REVOUT = 1 + print "hello, world" > "/dev/stdout" + } + + The output from this program is: `dlrow ,olleh'. + + +File: gawk.info, Node: Extension Sample Rev2way, Next: Extension Sample Read write array, Prev: Extension Sample Revout, Up: Extension Samples + +16.6.7 Two-Way I/O Example +-------------------------- + +The `revtwoway' extension adds a simple two-way processor that reverses +the characters in each line sent to it for reading back by the `awk' +program. Its main purpose is to show how to write a two-way +processor, although it may also be mildly amusing. The following +example shows how to use it: + + @load "revtwoway" + + BEGIN { + cmd = "/magic/mirror" + print "hello, world" |& cmd + cmd |& getline result + print result + close(cmd) + } + + +File: gawk.info, Node: Extension Sample Read write array, Next: Extension Sample Readfile, Prev: Extension Sample Rev2way, Up: Extension Samples + +16.6.8 Dumping and Restoring An Array +------------------------------------- + +The `rwarray' extension adds two functions, named `writea()' and +`reada()', as follows: + +`ret = writea(file, array)' + This function takes a string argument, which is the name of the + file to which to dump the array, and the array itself as the second + argument. `writea()' understands multidimensional arrays. It + returns one on success, or zero upon failure.
+ +`ret = reada(file, array)' + `reada()' is the inverse of `writea()'; it reads the file named as + its first argument, filling in the array named as the second + argument. It clears the array first. Here too, the return value + is one on success and zero upon failure. + + The array created by `reada()' is identical to that written by +`writea()' in the sense that the contents are the same. However, due to +implementation issues, the array traversal order of the recreated array +is likely to be different from that of the original array. As array +traversal order in `awk' is by default undefined, this is not +(technically) a problem. If you need to guarantee a particular +traversal order, use the array sorting features in `gawk' to do so +(*note Array Sorting::). + + The file contains binary data. All integral values are written in +network byte order. However, double precision floating-point values +are written as native binary data. Thus, arrays containing only string +data can theoretically be dumped on systems with one byte order and +restored on systems with a different one, but this has not been tried. + + Here is an example: + + @load "rwarray" + ... + ret = writea("arraydump.bin", array) + ... + ret = reada("arraydump.bin", array) + + +File: gawk.info, Node: Extension Sample Readfile, Next: Extension Sample API Tests, Prev: Extension Sample Read write array, Up: Extension Samples + +16.6.9 Reading An Entire File +----------------------------- + +The `readfile' extension adds a single function named `readfile()': + +`result = readfile("/some/path")' + The argument is the name of the file to read. The return value is + a string containing the entire contents of the requested file. + Upon error, the function returns the empty string and sets `ERRNO'. + + Here is an example: + + @load "readfile" + ... + contents = readfile("/path/to/file"); + if (contents == "" && ERRNO != "") { + print("problem reading file", ERRNO) > "/dev/stderr" + ... 
+ } + + +File: gawk.info, Node: Extension Sample API Tests, Next: Extension Sample Time, Prev: Extension Sample Readfile, Up: Extension Samples + +16.6.10 API Tests +----------------- + +The `testext' extension exercises parts of the extension API that are +not tested by the other samples. The `extension/testext.c' file +contains both the C code for the extension and `awk' test code inside C +comments that run the tests. The testing framework extracts the `awk' +code and runs the tests. See the source file for more information. + + +File: gawk.info, Node: Extension Sample Time, Prev: Extension Sample API Tests, Up: Extension Samples + +16.6.11 Extension Time Functions +-------------------------------- + +These functions can be used by either invoking `gawk' with a +command-line argument of `-l time' or by inserting `@load "time"' in +your script. + +`the_time = gettimeofday()' + Return the time in seconds that has elapsed since 1970-01-01 UTC + as a floating point value. If the time is unavailable on this + platform, return -1 and set `ERRNO'. The returned time should + have sub-second precision, but the actual precision will vary + based on the platform. If the standard C `gettimeofday()' system + call is available on this platform, then it simply returns the + value. Otherwise, if on Windows, it tries to use + `GetSystemTimeAsFileTime()'. + +`result = sleep(SECONDS)' + Attempt to sleep for SECONDS seconds. If SECONDS is negative, or + the attempt to sleep fails, return -1 and set `ERRNO'. Otherwise, + return zero after sleeping for the indicated amount of time. Note + that SECONDS may be a floating-point (non-integral) value. + Implementation details: depending on platform availability, this + function tries to use `nanosleep()' or `select()' to implement the + delay. 
+ + +File: gawk.info, Node: gawkextlib, Prev: Extension Samples, Up: Dynamic Extensions + +16.7 The `gawkextlib' Project +============================= + +The `gawkextlib' (http://sourceforge.net/projects/gawkextlib/) project +provides a number of `gawk' extensions, including one for processing +XML files. This is the evolution of the original `xgawk' (XML `gawk') +project. + + As of this writing, there are four extensions: + + * XML parser extension, using the Expat + (http://expat.sourceforge.net) XML parsing library. + + * PostgreSQL extension. + + * GD graphics library extension. + + * MPFR library extension. This provides access to a number of MPFR + functions which `gawk''s native MPFR support does not. + + The `time' extension described earlier (*note Extension Sample +Time::) was originally from this project but has been moved into the +main `gawk' distribution. + + You can check out the code for the `gawkextlib' project using the +Git (http://git-scm.com) distributed source code control system. The +command is as follows: + + git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code + + You will need to have the Expat (http://expat.sourceforge.net) XML +parser library installed in order to build and use the XML extension. + + In addition, you must have the GNU Autotools installed (Autoconf +(http://www.gnu.org/software/autoconf), Automake +(http://www.gnu.org/software/automake), Libtool +(http://www.gnu.org/software/libtool), and Gettext +(http://www.gnu.org/software/gettext)). + + The simple recipe for building and testing `gawkextlib' is as +follows. First, build and install `gawk': + + cd .../path/to/gawk/code + ./configure --prefix=/tmp/newgawk Install in /tmp/newgawk for now + make && make check Build and check that all is OK + make install Install gawk + + Next, build `gawkextlib' and test it: + + cd .../path/to/gawkextlib-code + ./update-autotools Generate configure, etc. 
+ You may have to run this command twice + ./configure --with-gawk=/tmp/newgawk Configure, point at "installed" gawk + make && make check Build and check that all is OK + + If you write an extension that you wish to share with other `gawk' +users, please consider doing so through the `gawkextlib' project. + File: gawk.info, Node: Language History, Next: Installation, Prev: Dynamic Extensions, Up: Top @@ -26041,7 +29190,6 @@ Index * Ada programming language: Glossary. (line 20) * adding, features to gawk: Adding Code. (line 6) * adding, fields: Changing Fields. (line 53) -* adding, functions to gawk: Dynamic Extensions. (line 9) * advanced features, buffering: I/O Functions. (line 98) * advanced features, close() function: Close Files And Pipes. (line 131) @@ -26399,7 +29547,6 @@ Index * characters, transliterating: Translate Program. (line 6) * characters, values of as numbers: Ordinal Functions. (line 6) * Chassell, Robert J.: Acknowledgments. (line 33) -* chdir() function, implementing in gawk: Sample Library. (line 6) * chem utility: Glossary. (line 151) * chr() user-defined function: Ordinal Functions. (line 16) * clear debugger command: Breakpoint Control. (line 36) @@ -26771,7 +29918,6 @@ Index (line 162) * differences in awk and gawk, trunc-mod operation: Arithmetic Ops. (line 66) -* directories, changing: Sample Library. (line 6) * directories, command line: Command line directories. (line 6) * directories, searching <1>: Igawk Program. (line 368) @@ -26896,8 +30042,6 @@ Index (line 9) * expressions, selecting: Conditional Exp. (line 6) * Extended Regular Expressions (EREs): Bracket Expressions. (line 24) -* extension() function (gawk): Using Internal File Ops. - (line 15) * extensions, Brian Kernighan's awk <1>: Other Versions. (line 13) * extensions, Brian Kernighan's awk: BTL. (line 6) * extensions, common, ** operator: Arithmetic Ops. (line 36) @@ -26992,7 +30136,6 @@ Index * files, closing: I/O Functions. 
(line 10) * files, descriptors, See file descriptors: Special FD. (line 6) * files, group: Group Functions. (line 6) -* files, information about, retrieving: Sample Library. (line 6) * files, initialization and cleanup: Filetrans Function. (line 6) * files, input, See input files: Read Terminal. (line 17) * files, log, timestamps in: Time Functions. (line 6) @@ -27088,7 +30231,6 @@ Index (line 47) * functions, built-in <1>: Functions. (line 6) * functions, built-in: Function Calls. (line 10) -* functions, built-in, adding to gawk: Dynamic Extensions. (line 9) * functions, built-in, evaluation order: Calling Built-in. (line 30) * functions, defining: Definition Syntax. (line 6) * functions, library: Library Functions. (line 6) @@ -27170,7 +30312,6 @@ Index (line 26) * gawk, FUNCTAB array in: Auto-set. (line 119) * gawk, function arguments and: Calling Built-in. (line 16) -* gawk, functions, adding: Dynamic Extensions. (line 9) * gawk, hexadecimal numbers and: Nondecimal-numbers. (line 42) * gawk, IGNORECASE variable in <1>: Array Sorting Functions. (line 81) @@ -27268,6 +30409,8 @@ Index * gettext library: Explaining gettext. (line 6) * gettext library, locale categories: Explaining gettext. (line 80) * gettext() function (C library): Explaining gettext. (line 62) +* gettimeofday time extension function: Extension Sample Time. + (line 10) * GMP: Arbitrary Precision Arithmetic. (line 6) * GNITS mailing list: Acknowledgments. (line 52) @@ -27922,7 +31065,7 @@ Index (line 10) * programming conventions, functions, writing: Definition Syntax. (line 55) -* programming conventions, gawk internals: Internal File Ops. (line 33) +* programming conventions, gawk internals: Internal File Ops. (line 45) * programming conventions, private variable names: Library Names. (line 23) * programming language, recipe for: History. (line 6) @@ -28172,6 +31315,10 @@ Index * single-character fields: Single Character Fields. (line 6) * Skywalker, Luke: Undocumented. 
(line 6) +* sleep: Extension Sample Time. + (line 6) +* sleep time extension function: Extension Sample Time. + (line 20) * sleep utility: Alarm Program. (line 109) * Solaris, POSIX-compliant awk: Other Versions. (line 87) * sort function, arrays, sorting: Array Sorting Functions. @@ -28216,7 +31363,6 @@ Index * standard input <1>: Special FD. (line 6) * standard input: Read Terminal. (line 6) * standard output: Special FD. (line 6) -* stat() function, implementing in gawk: Sample Library. (line 6) * statements, compound, control statements and: Statements. (line 10) * statements, control, in actions: Statements. (line 6) * statements, multiple: Statements/Lines. (line 91) @@ -28308,6 +31454,8 @@ Index * tilde (~), ~ operator <5>: Computed Regexps. (line 6) * tilde (~), ~ operator <6>: Case-sensitivity. (line 26) * tilde (~), ~ operator: Regexp Usage. (line 19) +* time: Extension Sample Time. + (line 6) * time, alarm clock example program: Alarm Program. (line 9) * time, localization and: Explaining gettext. (line 115) * time, managing: Getlocaltime Function. 
@@ -28517,452 +31665,515 @@ Index Tag Table: Node: Top1352 -Node: Foreword31870 -Node: Preface36215 -Ref: Preface-Footnote-139268 -Ref: Preface-Footnote-239374 -Node: History39606 -Node: Names41997 -Ref: Names-Footnote-143474 -Node: This Manual43546 -Ref: This Manual-Footnote-148674 -Node: Conventions48774 -Node: Manual History50908 -Ref: Manual History-Footnote-154178 -Ref: Manual History-Footnote-254219 -Node: How To Contribute54293 -Node: Acknowledgments55437 -Node: Getting Started59933 -Node: Running gawk62312 -Node: One-shot63498 -Node: Read Terminal64723 -Ref: Read Terminal-Footnote-166373 -Ref: Read Terminal-Footnote-266649 -Node: Long66820 -Node: Executable Scripts68196 -Ref: Executable Scripts-Footnote-170065 -Ref: Executable Scripts-Footnote-270167 -Node: Comments70714 -Node: Quoting73181 -Node: DOS Quoting77804 -Node: Sample Data Files78479 -Node: Very Simple81511 -Node: Two Rules86110 -Node: More Complex88257 -Ref: More Complex-Footnote-191187 -Node: Statements/Lines91272 -Ref: Statements/Lines-Footnote-195734 -Node: Other Features95999 -Node: When96927 -Node: Invoking Gawk99074 -Node: Command Line100535 -Node: Options101318 -Ref: Options-Footnote-1116716 -Node: Other Arguments116741 -Node: Naming Standard Input119399 -Node: Environment Variables120493 -Node: AWKPATH Variable121051 -Ref: AWKPATH Variable-Footnote-1123809 -Node: AWKLIBPATH Variable124069 -Node: Other Environment Variables124666 -Node: Exit Status127161 -Node: Include Files127836 -Node: Loading Shared Libraries131405 -Node: Obsolete132630 -Node: Undocumented133327 -Node: Regexp133570 -Node: Regexp Usage134959 -Node: Escape Sequences136985 -Node: Regexp Operators142748 -Ref: Regexp Operators-Footnote-1150128 -Ref: Regexp Operators-Footnote-2150275 -Node: Bracket Expressions150373 -Ref: table-char-classes152263 -Node: GNU Regexp Operators154786 -Node: Case-sensitivity158509 -Ref: Case-sensitivity-Footnote-1161477 -Ref: Case-sensitivity-Footnote-2161712 -Node: Leftmost Longest161820 -Node: 
Computed Regexps163021 -Node: Reading Files166431 -Node: Records168434 -Ref: Records-Footnote-1177358 -Node: Fields177395 -Ref: Fields-Footnote-1180428 -Node: Nonconstant Fields180514 -Node: Changing Fields182716 -Node: Field Separators188697 -Node: Default Field Splitting191326 -Node: Regexp Field Splitting192443 -Node: Single Character Fields195785 -Node: Command Line Field Separator196844 -Node: Field Splitting Summary200285 -Ref: Field Splitting Summary-Footnote-1203477 -Node: Constant Size203578 -Node: Splitting By Content208162 -Ref: Splitting By Content-Footnote-1211888 -Node: Multiple Line211928 -Ref: Multiple Line-Footnote-1217775 -Node: Getline217954 -Node: Plain Getline220170 -Node: Getline/Variable222259 -Node: Getline/File223400 -Node: Getline/Variable/File224722 -Ref: Getline/Variable/File-Footnote-1226321 -Node: Getline/Pipe226408 -Node: Getline/Variable/Pipe228968 -Node: Getline/Coprocess230075 -Node: Getline/Variable/Coprocess231318 -Node: Getline Notes232032 -Node: Getline Summary234819 -Ref: table-getline-variants235227 -Node: Read Timeout236083 -Ref: Read Timeout-Footnote-1239828 -Node: Command line directories239885 -Node: Printing240515 -Node: Print242146 -Node: Print Examples243483 -Node: Output Separators246267 -Node: OFMT248027 -Node: Printf249385 -Node: Basic Printf250291 -Node: Control Letters251830 -Node: Format Modifiers255642 -Node: Printf Examples261651 -Node: Redirection264366 -Node: Special Files271350 -Node: Special FD271883 -Ref: Special FD-Footnote-1275508 -Node: Special Network275582 -Node: Special Caveats276432 -Node: Close Files And Pipes277228 -Ref: Close Files And Pipes-Footnote-1284251 -Ref: Close Files And Pipes-Footnote-2284399 -Node: Expressions284549 -Node: Values285681 -Node: Constants286357 -Node: Scalar Constants287037 -Ref: Scalar Constants-Footnote-1287896 -Node: Nondecimal-numbers288078 -Node: Regexp Constants291137 -Node: Using Constant Regexps291612 -Node: Variables294667 -Node: Using Variables295322 -Node: 
Assignment Options297046 -Node: Conversion298918 -Ref: table-locale-affects304294 -Ref: Conversion-Footnote-1304918 -Node: All Operators305027 -Node: Arithmetic Ops305657 -Node: Concatenation308162 -Ref: Concatenation-Footnote-1310955 -Node: Assignment Ops311075 -Ref: table-assign-ops316063 -Node: Increment Ops317471 -Node: Truth Values and Conditions320941 -Node: Truth Values322024 -Node: Typing and Comparison323073 -Node: Variable Typing323862 -Ref: Variable Typing-Footnote-1327759 -Node: Comparison Operators327881 -Ref: table-relational-ops328291 -Node: POSIX String Comparison331840 -Ref: POSIX String Comparison-Footnote-1332796 -Node: Boolean Ops332934 -Ref: Boolean Ops-Footnote-1337012 -Node: Conditional Exp337103 -Node: Function Calls338835 -Node: Precedence342429 -Node: Locales346098 -Node: Patterns and Actions347187 -Node: Pattern Overview348241 -Node: Regexp Patterns349910 -Node: Expression Patterns350453 -Node: Ranges354138 -Node: BEGIN/END357104 -Node: Using BEGIN/END357866 -Ref: Using BEGIN/END-Footnote-1360597 -Node: I/O And BEGIN/END360703 -Node: BEGINFILE/ENDFILE362985 -Node: Empty365889 -Node: Using Shell Variables366205 -Node: Action Overview368490 -Node: Statements370847 -Node: If Statement372701 -Node: While Statement374200 -Node: Do Statement376244 -Node: For Statement377400 -Node: Switch Statement380552 -Node: Break Statement382649 -Node: Continue Statement384639 -Node: Next Statement386432 -Node: Nextfile Statement388822 -Node: Exit Statement391463 -Node: Built-in Variables393879 -Node: User-modified394974 -Ref: User-modified-Footnote-1403329 -Node: Auto-set403391 -Ref: Auto-set-Footnote-1415742 -Ref: Auto-set-Footnote-2415947 -Node: ARGC and ARGV416003 -Node: Arrays419854 -Node: Array Basics421359 -Node: Array Intro422185 -Node: Reference to Elements426503 -Node: Assigning Elements428773 -Node: Array Example429264 -Node: Scanning an Array430996 -Node: Controlling Scanning433310 -Ref: Controlling Scanning-Footnote-1438243 -Node: Delete438559 
-Ref: Delete-Footnote-1441324 -Node: Numeric Array Subscripts441381 -Node: Uninitialized Subscripts443564 -Node: Multi-dimensional445192 -Node: Multi-scanning448286 -Node: Arrays of Arrays449877 -Node: Functions454522 -Node: Built-in455344 -Node: Calling Built-in456422 -Node: Numeric Functions458410 -Ref: Numeric Functions-Footnote-1462242 -Ref: Numeric Functions-Footnote-2462599 -Ref: Numeric Functions-Footnote-3462647 -Node: String Functions462916 -Ref: String Functions-Footnote-1486413 -Ref: String Functions-Footnote-2486542 -Ref: String Functions-Footnote-3486790 -Node: Gory Details486877 -Ref: table-sub-escapes488556 -Ref: table-sub-posix-92489910 -Ref: table-sub-proposed491253 -Ref: table-posix-sub492603 -Ref: table-gensub-escapes494149 -Ref: Gory Details-Footnote-1495356 -Ref: Gory Details-Footnote-2495407 -Node: I/O Functions495558 -Ref: I/O Functions-Footnote-1502213 -Node: Time Functions502360 -Ref: Time Functions-Footnote-1513252 -Ref: Time Functions-Footnote-2513320 -Ref: Time Functions-Footnote-3513478 -Ref: Time Functions-Footnote-4513589 -Ref: Time Functions-Footnote-5513701 -Ref: Time Functions-Footnote-6513928 -Node: Bitwise Functions514194 -Ref: table-bitwise-ops514752 -Ref: Bitwise Functions-Footnote-1518973 -Node: Type Functions519157 -Node: I18N Functions519627 -Node: User-defined521254 -Node: Definition Syntax522058 -Ref: Definition Syntax-Footnote-1526968 -Node: Function Example527037 -Node: Function Caveats529631 -Node: Calling A Function530052 -Node: Variable Scope531167 -Node: Pass By Value/Reference533142 -Node: Return Statement536582 -Node: Dynamic Typing539563 -Node: Indirect Calls540298 -Node: Internationalization549983 -Node: I18N and L10N551409 -Node: Explaining gettext552095 -Ref: Explaining gettext-Footnote-1557161 -Ref: Explaining gettext-Footnote-2557345 -Node: Programmer i18n557510 -Node: Translator i18n561710 -Node: String Extraction562503 -Ref: String Extraction-Footnote-1563464 -Node: Printf Ordering563550 -Ref: Printf 
Ordering-Footnote-1566334 -Node: I18N Portability566398 -Ref: I18N Portability-Footnote-1568847 -Node: I18N Example568910 -Ref: I18N Example-Footnote-1571545 -Node: Gawk I18N571617 -Node: Advanced Features572234 -Node: Nondecimal Data573747 -Node: Array Sorting575330 -Node: Controlling Array Traversal576027 -Node: Array Sorting Functions584265 -Ref: Array Sorting Functions-Footnote-1587939 -Ref: Array Sorting Functions-Footnote-2588032 -Node: Two-way I/O588226 -Ref: Two-way I/O-Footnote-1593658 -Node: TCP/IP Networking593728 -Node: Profiling596572 -Node: Library Functions604026 -Ref: Library Functions-Footnote-1607033 -Node: Library Names607204 -Ref: Library Names-Footnote-1610675 -Ref: Library Names-Footnote-2610895 -Node: General Functions610981 -Node: Strtonum Function611934 -Node: Assert Function614864 -Node: Round Function618190 -Node: Cliff Random Function619733 -Node: Ordinal Functions620749 -Ref: Ordinal Functions-Footnote-1623819 -Ref: Ordinal Functions-Footnote-2624071 -Node: Join Function624280 -Ref: Join Function-Footnote-1626051 -Node: Getlocaltime Function626251 -Node: Data File Management629966 -Node: Filetrans Function630598 -Node: Rewind Function634737 -Node: File Checking636124 -Node: Empty Files637218 -Node: Ignoring Assigns639448 -Node: Getopt Function641001 -Ref: Getopt Function-Footnote-1652305 -Node: Passwd Functions652508 -Ref: Passwd Functions-Footnote-1661483 -Node: Group Functions661571 -Node: Walking Arrays669655 -Node: Sample Programs671224 -Node: Running Examples671889 -Node: Clones672617 -Node: Cut Program673841 -Node: Egrep Program683686 -Ref: Egrep Program-Footnote-1691459 -Node: Id Program691569 -Node: Split Program695185 -Ref: Split Program-Footnote-1698704 -Node: Tee Program698832 -Node: Uniq Program701635 -Node: Wc Program709064 -Ref: Wc Program-Footnote-1713330 -Ref: Wc Program-Footnote-2713530 -Node: Miscellaneous Programs713622 -Node: Dupword Program714810 -Node: Alarm Program716841 -Node: Translate Program721590 -Ref: 
Translate Program-Footnote-1725977 -Ref: Translate Program-Footnote-2726205 -Node: Labels Program726339 -Ref: Labels Program-Footnote-1729710 -Node: Word Sorting729794 -Node: History Sorting733678 -Node: Extract Program735517 -Ref: Extract Program-Footnote-1743000 -Node: Simple Sed743128 -Node: Igawk Program746190 -Ref: Igawk Program-Footnote-1761347 -Ref: Igawk Program-Footnote-2761548 -Node: Anagram Program761686 -Node: Signature Program764754 -Node: Debugger765854 -Node: Debugging766820 -Node: Debugging Concepts767253 -Node: Debugging Terms769109 -Node: Awk Debugging771706 -Node: Sample Debugging Session772598 -Node: Debugger Invocation773118 -Node: Finding The Bug774447 -Node: List of Debugger Commands780935 -Node: Breakpoint Control782269 -Node: Debugger Execution Control785933 -Node: Viewing And Changing Data789293 -Node: Execution Stack792649 -Node: Debugger Info794116 -Node: Miscellaneous Debugger Commands798097 -Node: Readline Support803542 -Node: Limitations804373 -Node: Arbitrary Precision Arithmetic806625 -Ref: Arbitrary Precision Arithmetic-Footnote-1808267 -Node: General Arithmetic808415 -Node: Floating Point Issues810135 -Node: String Conversion Precision811016 -Ref: String Conversion Precision-Footnote-1812722 -Node: Unexpected Results812831 -Node: POSIX Floating Point Problems814984 -Ref: POSIX Floating Point Problems-Footnote-1818809 -Node: Integer Programming818847 -Node: Floating-point Programming820600 -Ref: Floating-point Programming-Footnote-1826909 -Node: Floating-point Representation827173 -Node: Floating-point Context828338 -Ref: table-ieee-formats829180 -Node: Rounding Mode830564 -Ref: table-rounding-modes831043 -Ref: Rounding Mode-Footnote-1834047 -Node: Gawk and MPFR834228 -Node: Arbitrary Precision Floats835470 -Ref: Arbitrary Precision Floats-Footnote-1837899 -Node: Setting Precision838210 -Node: Setting Rounding Mode840943 -Ref: table-gawk-rounding-modes841347 -Node: Floating-point Constants842527 -Node: Changing Precision843951 
-Ref: Changing Precision-Footnote-1845351 -Node: Exact Arithmetic845525 -Node: Arbitrary Precision Integers848633 -Ref: Arbitrary Precision Integers-Footnote-1851633 -Node: Dynamic Extensions851780 -Node: Plugin License852698 -Node: Sample Library853312 -Node: Internal File Description853996 -Node: Internal File Ops857709 -Ref: Internal File Ops-Footnote-1862272 -Node: Using Internal File Ops862412 -Node: Language History864788 -Node: V7/SVR3.1866310 -Node: SVR4868631 -Node: POSIX870073 -Node: BTL871081 -Node: POSIX/GNU871815 -Node: Common Extensions877350 -Node: Ranges and Locales878457 -Ref: Ranges and Locales-Footnote-1883075 -Ref: Ranges and Locales-Footnote-2883102 -Ref: Ranges and Locales-Footnote-3883362 -Node: Contributors883583 -Node: Installation887879 -Node: Gawk Distribution888773 -Node: Getting889257 -Node: Extracting890083 -Node: Distribution contents891775 -Node: Unix Installation896997 -Node: Quick Installation897614 -Node: Additional Configuration Options899576 -Node: Configuration Philosophy901053 -Node: Non-Unix Installation903395 -Node: PC Installation903853 -Node: PC Binary Installation905152 -Node: PC Compiling907000 -Node: PC Testing909944 -Node: PC Using911120 -Node: Cygwin915305 -Node: MSYS916305 -Node: VMS Installation916819 -Node: VMS Compilation917422 -Ref: VMS Compilation-Footnote-1918429 -Node: VMS Installation Details918487 -Node: VMS Running920122 -Node: VMS Old Gawk921729 -Node: Bugs922203 -Node: Other Versions926055 -Node: Notes931370 -Node: Compatibility Mode931957 -Node: Additions932740 -Node: Accessing The Source933667 -Node: Adding Code935093 -Node: New Ports941135 -Node: Derived Files945270 -Ref: Derived Files-Footnote-1950575 -Ref: Derived Files-Footnote-2950609 -Ref: Derived Files-Footnote-3951209 -Node: Future Extensions951307 -Node: Basic Concepts952794 -Node: Basic High Level953475 -Ref: figure-general-flow953746 -Ref: figure-process-flow954345 -Ref: Basic High Level-Footnote-1957574 -Node: Basic Data Typing957759 -Node: 
Glossary961114 -Node: Copying986425 -Node: GNU Free Documentation License1023982 -Node: Index1049119 +Node: Foreword40050 +Node: Preface44395 +Ref: Preface-Footnote-147448 +Ref: Preface-Footnote-247554 +Node: History47786 +Node: Names50177 +Ref: Names-Footnote-151654 +Node: This Manual51726 +Ref: This Manual-Footnote-156854 +Node: Conventions56954 +Node: Manual History59088 +Ref: Manual History-Footnote-162358 +Ref: Manual History-Footnote-262399 +Node: How To Contribute62473 +Node: Acknowledgments63617 +Node: Getting Started68113 +Node: Running gawk70492 +Node: One-shot71678 +Node: Read Terminal72903 +Ref: Read Terminal-Footnote-174553 +Ref: Read Terminal-Footnote-274829 +Node: Long75000 +Node: Executable Scripts76376 +Ref: Executable Scripts-Footnote-178245 +Ref: Executable Scripts-Footnote-278347 +Node: Comments78894 +Node: Quoting81361 +Node: DOS Quoting85984 +Node: Sample Data Files86659 +Node: Very Simple89691 +Node: Two Rules94290 +Node: More Complex96437 +Ref: More Complex-Footnote-199367 +Node: Statements/Lines99452 +Ref: Statements/Lines-Footnote-1103914 +Node: Other Features104179 +Node: When105107 +Node: Invoking Gawk107254 +Node: Command Line108715 +Node: Options109498 +Ref: Options-Footnote-1124896 +Node: Other Arguments124921 +Node: Naming Standard Input127579 +Node: Environment Variables128673 +Node: AWKPATH Variable129231 +Ref: AWKPATH Variable-Footnote-1131989 +Node: AWKLIBPATH Variable132249 +Node: Other Environment Variables132846 +Node: Exit Status135341 +Node: Include Files136016 +Node: Loading Shared Libraries139585 +Node: Obsolete140810 +Node: Undocumented141507 +Node: Regexp141750 +Node: Regexp Usage143139 +Node: Escape Sequences145165 +Node: Regexp Operators150928 +Ref: Regexp Operators-Footnote-1158308 +Ref: Regexp Operators-Footnote-2158455 +Node: Bracket Expressions158553 +Ref: table-char-classes160443 +Node: GNU Regexp Operators162966 +Node: Case-sensitivity166689 +Ref: Case-sensitivity-Footnote-1169657 +Ref: 
Case-sensitivity-Footnote-2169892 +Node: Leftmost Longest170000 +Node: Computed Regexps171201 +Node: Reading Files174611 +Node: Records176614 +Ref: Records-Footnote-1185538 +Node: Fields185575 +Ref: Fields-Footnote-1188608 +Node: Nonconstant Fields188694 +Node: Changing Fields190896 +Node: Field Separators196877 +Node: Default Field Splitting199506 +Node: Regexp Field Splitting200623 +Node: Single Character Fields203965 +Node: Command Line Field Separator205024 +Node: Field Splitting Summary208465 +Ref: Field Splitting Summary-Footnote-1211657 +Node: Constant Size211758 +Node: Splitting By Content216342 +Ref: Splitting By Content-Footnote-1220068 +Node: Multiple Line220108 +Ref: Multiple Line-Footnote-1225955 +Node: Getline226134 +Node: Plain Getline228350 +Node: Getline/Variable230439 +Node: Getline/File231580 +Node: Getline/Variable/File232902 +Ref: Getline/Variable/File-Footnote-1234501 +Node: Getline/Pipe234588 +Node: Getline/Variable/Pipe237148 +Node: Getline/Coprocess238255 +Node: Getline/Variable/Coprocess239498 +Node: Getline Notes240212 +Node: Getline Summary242999 +Ref: table-getline-variants243407 +Node: Read Timeout244263 +Ref: Read Timeout-Footnote-1248008 +Node: Command line directories248065 +Node: Printing248695 +Node: Print250326 +Node: Print Examples251663 +Node: Output Separators254447 +Node: OFMT256207 +Node: Printf257565 +Node: Basic Printf258471 +Node: Control Letters260010 +Node: Format Modifiers263822 +Node: Printf Examples269831 +Node: Redirection272546 +Node: Special Files279530 +Node: Special FD280063 +Ref: Special FD-Footnote-1283688 +Node: Special Network283762 +Node: Special Caveats284612 +Node: Close Files And Pipes285408 +Ref: Close Files And Pipes-Footnote-1292431 +Ref: Close Files And Pipes-Footnote-2292579 +Node: Expressions292729 +Node: Values293861 +Node: Constants294537 +Node: Scalar Constants295217 +Ref: Scalar Constants-Footnote-1296076 +Node: Nondecimal-numbers296258 +Node: Regexp Constants299317 +Node: Using Constant 
Regexps299792 +Node: Variables302847 +Node: Using Variables303502 +Node: Assignment Options305226 +Node: Conversion307098 +Ref: table-locale-affects312474 +Ref: Conversion-Footnote-1313098 +Node: All Operators313207 +Node: Arithmetic Ops313837 +Node: Concatenation316342 +Ref: Concatenation-Footnote-1319135 +Node: Assignment Ops319255 +Ref: table-assign-ops324243 +Node: Increment Ops325651 +Node: Truth Values and Conditions329121 +Node: Truth Values330204 +Node: Typing and Comparison331253 +Node: Variable Typing332042 +Ref: Variable Typing-Footnote-1335939 +Node: Comparison Operators336061 +Ref: table-relational-ops336471 +Node: POSIX String Comparison340020 +Ref: POSIX String Comparison-Footnote-1340976 +Node: Boolean Ops341114 +Ref: Boolean Ops-Footnote-1345192 +Node: Conditional Exp345283 +Node: Function Calls347015 +Node: Precedence350609 +Node: Locales354278 +Node: Patterns and Actions355367 +Node: Pattern Overview356421 +Node: Regexp Patterns358090 +Node: Expression Patterns358633 +Node: Ranges362318 +Node: BEGIN/END365284 +Node: Using BEGIN/END366046 +Ref: Using BEGIN/END-Footnote-1368777 +Node: I/O And BEGIN/END368883 +Node: BEGINFILE/ENDFILE371165 +Node: Empty374069 +Node: Using Shell Variables374385 +Node: Action Overview376670 +Node: Statements379027 +Node: If Statement380881 +Node: While Statement382380 +Node: Do Statement384424 +Node: For Statement385580 +Node: Switch Statement388732 +Node: Break Statement390829 +Node: Continue Statement392819 +Node: Next Statement394612 +Node: Nextfile Statement397002 +Node: Exit Statement399643 +Node: Built-in Variables402059 +Node: User-modified403154 +Ref: User-modified-Footnote-1411509 +Node: Auto-set411571 +Ref: Auto-set-Footnote-1423922 +Ref: Auto-set-Footnote-2424127 +Node: ARGC and ARGV424183 +Node: Arrays428034 +Node: Array Basics429539 +Node: Array Intro430365 +Node: Reference to Elements434683 +Node: Assigning Elements436953 +Node: Array Example437444 +Node: Scanning an Array439176 +Node: Controlling 
Scanning441490 +Ref: Controlling Scanning-Footnote-1446423 +Node: Delete446739 +Ref: Delete-Footnote-1449504 +Node: Numeric Array Subscripts449561 +Node: Uninitialized Subscripts451744 +Node: Multi-dimensional453372 +Node: Multi-scanning456466 +Node: Arrays of Arrays458057 +Node: Functions462702 +Node: Built-in463524 +Node: Calling Built-in464602 +Node: Numeric Functions466590 +Ref: Numeric Functions-Footnote-1470422 +Ref: Numeric Functions-Footnote-2470779 +Ref: Numeric Functions-Footnote-3470827 +Node: String Functions471096 +Ref: String Functions-Footnote-1494593 +Ref: String Functions-Footnote-2494722 +Ref: String Functions-Footnote-3494970 +Node: Gory Details495057 +Ref: table-sub-escapes496736 +Ref: table-sub-posix-92498090 +Ref: table-sub-proposed499433 +Ref: table-posix-sub500783 +Ref: table-gensub-escapes502329 +Ref: Gory Details-Footnote-1503536 +Ref: Gory Details-Footnote-2503587 +Node: I/O Functions503738 +Ref: I/O Functions-Footnote-1510393 +Node: Time Functions510540 +Ref: Time Functions-Footnote-1521432 +Ref: Time Functions-Footnote-2521500 +Ref: Time Functions-Footnote-3521658 +Ref: Time Functions-Footnote-4521769 +Ref: Time Functions-Footnote-5521881 +Ref: Time Functions-Footnote-6522108 +Node: Bitwise Functions522374 +Ref: table-bitwise-ops522932 +Ref: Bitwise Functions-Footnote-1527153 +Node: Type Functions527337 +Node: I18N Functions527807 +Node: User-defined529434 +Node: Definition Syntax530238 +Ref: Definition Syntax-Footnote-1535148 +Node: Function Example535217 +Node: Function Caveats537811 +Node: Calling A Function538232 +Node: Variable Scope539347 +Node: Pass By Value/Reference541322 +Node: Return Statement544762 +Node: Dynamic Typing547743 +Node: Indirect Calls548478 +Node: Internationalization558163 +Node: I18N and L10N559589 +Node: Explaining gettext560275 +Ref: Explaining gettext-Footnote-1565341 +Ref: Explaining gettext-Footnote-2565525 +Node: Programmer i18n565690 +Node: Translator i18n569890 +Node: String Extraction570683 +Ref: 
String Extraction-Footnote-1571644 +Node: Printf Ordering571730 +Ref: Printf Ordering-Footnote-1574514 +Node: I18N Portability574578 +Ref: I18N Portability-Footnote-1577027 +Node: I18N Example577090 +Ref: I18N Example-Footnote-1579725 +Node: Gawk I18N579797 +Node: Advanced Features580414 +Node: Nondecimal Data581927 +Node: Array Sorting583510 +Node: Controlling Array Traversal584207 +Node: Array Sorting Functions592445 +Ref: Array Sorting Functions-Footnote-1596119 +Ref: Array Sorting Functions-Footnote-2596212 +Node: Two-way I/O596406 +Ref: Two-way I/O-Footnote-1601838 +Node: TCP/IP Networking601908 +Node: Profiling604752 +Node: Library Functions612206 +Ref: Library Functions-Footnote-1615213 +Node: Library Names615384 +Ref: Library Names-Footnote-1618855 +Ref: Library Names-Footnote-2619075 +Node: General Functions619161 +Node: Strtonum Function620114 +Node: Assert Function623044 +Node: Round Function626370 +Node: Cliff Random Function627913 +Node: Ordinal Functions628929 +Ref: Ordinal Functions-Footnote-1631999 +Ref: Ordinal Functions-Footnote-2632251 +Node: Join Function632460 +Ref: Join Function-Footnote-1634231 +Node: Getlocaltime Function634431 +Node: Data File Management638146 +Node: Filetrans Function638778 +Node: Rewind Function642917 +Node: File Checking644304 +Node: Empty Files645398 +Node: Ignoring Assigns647628 +Node: Getopt Function649181 +Ref: Getopt Function-Footnote-1660485 +Node: Passwd Functions660688 +Ref: Passwd Functions-Footnote-1669663 +Node: Group Functions669751 +Node: Walking Arrays677835 +Node: Sample Programs679404 +Node: Running Examples680069 +Node: Clones680797 +Node: Cut Program682021 +Node: Egrep Program691866 +Ref: Egrep Program-Footnote-1699639 +Node: Id Program699749 +Node: Split Program703365 +Ref: Split Program-Footnote-1706884 +Node: Tee Program707012 +Node: Uniq Program709815 +Node: Wc Program717244 +Ref: Wc Program-Footnote-1721510 +Ref: Wc Program-Footnote-2721710 +Node: Miscellaneous Programs721802 +Node: Dupword 
Program722990 +Node: Alarm Program725021 +Node: Translate Program729770 +Ref: Translate Program-Footnote-1734157 +Ref: Translate Program-Footnote-2734385 +Node: Labels Program734519 +Ref: Labels Program-Footnote-1737890 +Node: Word Sorting737974 +Node: History Sorting741858 +Node: Extract Program743697 +Ref: Extract Program-Footnote-1751180 +Node: Simple Sed751308 +Node: Igawk Program754370 +Ref: Igawk Program-Footnote-1769527 +Ref: Igawk Program-Footnote-2769728 +Node: Anagram Program769866 +Node: Signature Program772934 +Node: Debugger774034 +Node: Debugging775000 +Node: Debugging Concepts775433 +Node: Debugging Terms777289 +Node: Awk Debugging779886 +Node: Sample Debugging Session780778 +Node: Debugger Invocation781298 +Node: Finding The Bug782627 +Node: List of Debugger Commands789115 +Node: Breakpoint Control790449 +Node: Debugger Execution Control794113 +Node: Viewing And Changing Data797473 +Node: Execution Stack800829 +Node: Debugger Info802296 +Node: Miscellaneous Debugger Commands806277 +Node: Readline Support811722 +Node: Limitations812553 +Node: Arbitrary Precision Arithmetic814805 +Ref: Arbitrary Precision Arithmetic-Footnote-1816447 +Node: General Arithmetic816595 +Node: Floating Point Issues818315 +Node: String Conversion Precision819196 +Ref: String Conversion Precision-Footnote-1820902 +Node: Unexpected Results821011 +Node: POSIX Floating Point Problems823164 +Ref: POSIX Floating Point Problems-Footnote-1826989 +Node: Integer Programming827027 +Node: Floating-point Programming828780 +Ref: Floating-point Programming-Footnote-1835089 +Node: Floating-point Representation835353 +Node: Floating-point Context836518 +Ref: table-ieee-formats837360 +Node: Rounding Mode838744 +Ref: table-rounding-modes839223 +Ref: Rounding Mode-Footnote-1842227 +Node: Gawk and MPFR842408 +Node: Arbitrary Precision Floats843650 +Ref: Arbitrary Precision Floats-Footnote-1846079 +Node: Setting Precision846390 +Node: Setting Rounding Mode849123 +Ref: 
table-gawk-rounding-modes849527 +Node: Floating-point Constants850707 +Node: Changing Precision852131 +Ref: Changing Precision-Footnote-1853531 +Node: Exact Arithmetic853705 +Node: Arbitrary Precision Integers856813 +Ref: Arbitrary Precision Integers-Footnote-1859813 +Node: Dynamic Extensions859960 +Node: Extension Intro861283 +Node: Plugin License862486 +Node: Extension Design863160 +Node: Old Extension Problems864231 +Ref: Old Extension Problems-Footnote-1865741 +Node: Extension New Mechanism Goals865798 +Ref: Extension New Mechanism Goals-Footnote-1868510 +Node: Extension Other Design Decisions868696 +Node: Extension Mechanism Outline870443 +Ref: load-extension871426 +Ref: load-new-function872859 +Ref: call-new-function873795 +Node: Extension Future Growth875776 +Node: Extension API Description876518 +Node: Extension API Functions Introduction877838 +Node: General Data Types881913 +Ref: General Data Types-Footnote-1887546 +Node: Requesting Values887845 +Ref: table-value-types-returned888576 +Node: Constructor Functions889530 +Node: Registration Functions892526 +Node: Extension Functions893211 +Node: Exit Callback Functions895030 +Node: Extension Version String896273 +Node: Input Parsers896923 +Node: Output Wrappers905504 +Node: Two-way processors909897 +Node: Printing Messages912019 +Ref: Printing Messages-Footnote-1913096 +Node: Updating `ERRNO'913248 +Node: Accessing Parameters913987 +Node: Symbol Table Access915217 +Node: Symbol table by name915729 +Ref: Symbol table by name-Footnote-1917901 +Node: Symbol table by cookie917981 +Ref: Symbol table by cookie-Footnote-1922110 +Node: Cached values922173 +Ref: Cached values-Footnote-1925374 +Node: Array Manipulation925465 +Ref: Array Manipulation-Footnote-1926563 +Node: Array Data Types926602 +Ref: Array Data Types-Footnote-1929328 +Node: Array Functions929420 +Node: Flattening Arrays933186 +Node: Creating Arrays940017 +Node: Extension API Variables944813 +Node: Extension Versioning945449 +Node: Extension API 
Informational Variables947350 +Node: Extension API Boilerplate948436 +Node: Finding Extensions952270 +Node: Extension Example952817 +Node: Internal File Description953555 +Node: Internal File Ops957243 +Ref: Internal File Ops-Footnote-1968327 +Node: Using Internal File Ops968467 +Ref: Using Internal File Ops-Footnote-1970823 +Node: Extension Samples971089 +Node: Extension Sample File Functions972532 +Node: Extension Sample Fnmatch980901 +Node: Extension Sample Fork982627 +Node: Extension Sample Ord983841 +Node: Extension Sample Readdir984617 +Node: Extension Sample Revout986955 +Node: Extension Sample Rev2way987548 +Node: Extension Sample Read write array988238 +Node: Extension Sample Readfile990121 +Node: Extension Sample API Tests990876 +Node: Extension Sample Time991401 +Node: gawkextlib992710 +Node: Language History995093 +Node: V7/SVR3.1996615 +Node: SVR4998936 +Node: POSIX1000378 +Node: BTL1001386 +Node: POSIX/GNU1002120 +Node: Common Extensions1007655 +Node: Ranges and Locales1008762 +Ref: Ranges and Locales-Footnote-11013380 +Ref: Ranges and Locales-Footnote-21013407 +Ref: Ranges and Locales-Footnote-31013667 +Node: Contributors1013888 +Node: Installation1018184 +Node: Gawk Distribution1019078 +Node: Getting1019562 +Node: Extracting1020388 +Node: Distribution contents1022080 +Node: Unix Installation1027302 +Node: Quick Installation1027919 +Node: Additional Configuration Options1029881 +Node: Configuration Philosophy1031358 +Node: Non-Unix Installation1033700 +Node: PC Installation1034158 +Node: PC Binary Installation1035457 +Node: PC Compiling1037305 +Node: PC Testing1040249 +Node: PC Using1041425 +Node: Cygwin1045610 +Node: MSYS1046610 +Node: VMS Installation1047124 +Node: VMS Compilation1047727 +Ref: VMS Compilation-Footnote-11048734 +Node: VMS Installation Details1048792 +Node: VMS Running1050427 +Node: VMS Old Gawk1052034 +Node: Bugs1052508 +Node: Other Versions1056360 +Node: Notes1061675 +Node: Compatibility Mode1062262 +Node: Additions1063045 +Node: 
Accessing The Source1063972 +Node: Adding Code1065398 +Node: New Ports1071440 +Node: Derived Files1075575 +Ref: Derived Files-Footnote-11080880 +Ref: Derived Files-Footnote-21080914 +Ref: Derived Files-Footnote-31081514 +Node: Future Extensions1081612 +Node: Basic Concepts1083099 +Node: Basic High Level1083780 +Ref: figure-general-flow1084051 +Ref: figure-process-flow1084650 +Ref: Basic High Level-Footnote-11087879 +Node: Basic Data Typing1088064 +Node: Glossary1091419 +Node: Copying1116730 +Node: GNU Free Documentation License1154287 +Node: Index1179424 End Tag Table