Diffstat (limited to 'doc/api.texi')
-rw-r--r-- | doc/api.texi | 3969 |
1 files changed, 3969 insertions, 0 deletions
diff --git a/doc/api.texi b/doc/api.texi new file mode 100644 index 00000000..9dc2c300 --- /dev/null +++ b/doc/api.texi @@ -0,0 +1,3969 @@ +\input texinfo @c -*-texinfo-*- +@c %**start of header (This is for running Texinfo on a region.) +@setfilename api.info +@settitle Writing Extensions For Gawk +@c %**end of header (This is for running Texinfo on a region.) + +@dircategory Text creation and manipulation +@direntry +* Gawk: (gawk). A text scanning and processing language. +@end direntry +@dircategory Individual utilities +@direntry +* awk: (gawk)Invoking gawk. Text scanning and processing. +@end direntry + +@set xref-automatic-section-title + +@c The following information should be updated here only! +@c This sets the edition of the document, the version of gawk it +@c applies to and all the info about who's publishing this edition + +@c These apply across the board. +@set UPDATE-MONTH October, 2012 +@set VERSION 4.1 +@set PATCHLEVEL 0 + +@set FSF + +@set TITLE Writing Extensions for Gawk +@set SUBTITLE A Temporary Manual +@set EDITION 1 + +@iftex +@set DOCUMENT book +@set CHAPTER chapter +@set APPENDIX appendix +@set SECTION section +@set SUBSECTION subsection +@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}} +@set COMMONEXT (c.e.) +@end iftex +@ifinfo +@set DOCUMENT Info file +@set CHAPTER major node +@set APPENDIX major node +@set SECTION minor node +@set SUBSECTION node +@set DARKCORNER (d.c.) +@set COMMONEXT (c.e.) +@end ifinfo +@ifhtml +@set DOCUMENT Web page +@set CHAPTER chapter +@set APPENDIX appendix +@set SECTION section +@set SUBSECTION subsection +@set DARKCORNER (d.c.) +@set COMMONEXT (c.e.) +@end ifhtml +@ifdocbook +@set DOCUMENT book +@set CHAPTER chapter +@set APPENDIX appendix +@set SECTION section +@set SUBSECTION subsection +@set DARKCORNER (d.c.) +@set COMMONEXT (c.e.) 
+@end ifdocbook +@ifplaintext +@set DOCUMENT book +@set CHAPTER chapter +@set APPENDIX appendix +@set SECTION section +@set SUBSECTION subsection +@set DARKCORNER (d.c.) +@set COMMONEXT (c.e.) +@end ifplaintext + +@c some special symbols +@iftex +@set LEQ @math{@leq} +@set PI @math{@pi} +@end iftex +@ifnottex +@set LEQ <= +@set PI @i{pi} +@end ifnottex + +@ifnottex +@macro ii{text} +@i{\text\} +@end macro +@end ifnottex + +@c For HTML, spell out email addresses, to avoid problems with +@c address harvesters for spammers. +@ifhtml +@macro EMAIL{real,spelled} +``\spelled\'' +@end macro +@end ifhtml +@ifnothtml +@macro EMAIL{real,spelled} +@email{\real\} +@end macro +@end ifnothtml + +@set FN file name +@set FFN File Name +@set DF data file +@set DDF Data File +@set PVERSION version +@set CTL Ctrl + +@ignore +Some comments on the layout for TeX. +1. Use at least texinfo.tex 2000-09-06.09 +2. I have done A LOT of work to make this look good. There are `@page' commands + and use of `@group ... @end group' in a number of places. If you muck + with anything, it's your responsibility not to break the layout. +@end ignore + +@c merge the function and variable indexes into the concept index +@ifinfo +@synindex fn cp +@synindex vr cp +@end ifinfo +@iftex +@syncodeindex fn cp +@syncodeindex vr cp +@end iftex +@ifxml +@syncodeindex fn cp +@syncodeindex vr cp +@end ifxml + +@c If "finalout" is commented out, the printed output will show +@c black boxes that mark lines that are too long. Thus, it is +@c unwise to comment it out when running a master in case there are +@c overfulls which are deemed okay. + +@iftex +@finalout +@end iftex + +@copying +Copyright @copyright{} 2012 +Free Software Foundation, Inc. +@sp 2 + +This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, +for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU +implementation of AWK. 
+ +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with the +Invariant Sections being ``GNU General Public License'', the Front-Cover +texts being (a) (see below), and with the Back-Cover Texts being (b) +(see below). A copy of the license is included in the section entitled +``GNU Free Documentation License''. + +@enumerate a +@item +``A GNU Manual'' + +@item +``You have the freedom to +copy and modify this GNU manual. Buying copies from the FSF +supports it in developing GNU and promoting software freedom.'' +@end enumerate +@end copying + +@c Comment out the "smallbook" for technical review. Saves +@c considerable paper. Remember to turn it back on *before* +@c starting the page-breaking work. + +@c 4/2002: Karl Berry recommends commenting out this and the +@c `@setchapternewpage odd', and letting users use `texi2dvi -t' +@c if they want to waste paper. +@c @smallbook + + +@c Uncomment this for the release. Leaving it off saves paper +@c during editing and review. +@setchapternewpage odd + +@titlepage +@title @value{TITLE} +@subtitle @value{SUBTITLE} +@subtitle Edition @value{EDITION} +@subtitle @value{UPDATE-MONTH} +@author Arnold D. Robbins + +@c Include the Distribution inside the titlepage environment so +@c that headings are turned off. Headings on and off do not work. + +@page +@vskip 0pt plus 1filll +``To boldly go where no man has gone before'' is a +Registered Trademark of Paramount Pictures Corporation. 
@* +@c sorry, i couldn't resist +@sp 3 +Published by: +@sp 1 + +Free Software Foundation @* +51 Franklin Street, Fifth Floor @* +Boston, MA 02110-1301 USA @* +Phone: +1-617-542-5942 @* +Fax: +1-617-542-2652 @* +Email: @email{gnu@@gnu.org} @* +URL: @uref{http://www.gnu.org/} @* + +@c This one is correct for gawk 3.1.0 from the FSF +ISBN 1-882114-28-0 @* +@sp 2 +@insertcopying +@end titlepage + +@ifnottex +@node Top +@top Top Node + +Fake top node. + +@insertcopying + +@end ifnottex + +@menu +* Extension API:: Writing Extensions for @command{gawk}. +* Fake Chapter:: Fake Sections For Cross References. + +@detailmenu +* Extension Intro:: What is an extension. +* Plugin License:: A note about licensing. +* Extension Design:: Design notes about the extension API. +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. +* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. +* Extension API Description:: A full description of the API. +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + @command{gawk}. +* Extension Functions:: Registering extension functions. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. +* Printing Messages:: Functions for printing messages. +* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. +* Accessing Parameters:: Functions for accessing parameters. 
+* Symbol Table Access:: Functions for accessing global + variables. +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. +* Array Manipulation:: Functions for working with arrays. +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. +* Extension API Variables:: Variables provided by the API. +* Extension Versioning:: API Version information. +* Extension API Informational Variables:: Variables providing information about + @command{gawk}'s invocation. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How @command{gawk} finds compiled + extensions. +* Extension Example:: Example C code for an extension. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. +* Extension Samples:: The sample extensions that ship with + @code{gawk}. +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to @code{fnmatch()}. +* Extension Sample Fork:: An interface to @code{fork()} and + other process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to @code{readdir()}. +* Extension Sample Revout:: Reversing output sample output + wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way + processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to @code{gettimeofday()} + and @code{sleep()}. 
+* gawkextlib:: The @code{gawkextlib} project. +* Reference to Elements:: Referring to an Array Element. +* Built-in:: Built-in Functions. +* Built-in Variables:: Built-in Variables. +* Options:: Command-Line Options. +@end detailmenu +@end menu + +@contents + +@node Extension API +@chapter Writing Extensions for @command{gawk} + +It is possible to add new built-in +functions to @command{gawk} using dynamically loaded libraries. This +facility is available on systems (such as GNU/Linux) that support +the C @code{dlopen()} and @code{dlsym()} functions. +This @value{CHAPTER} describes how to do so using +code written in C or C++. If you don't know anything about C +programming, you can safely skip this @value{CHAPTER}, although you +may wish to review the documentation on the extensions that come +with @command{gawk} (@pxref{Extension Samples}). + +@quotation NOTE +When @option{--sandbox} is specified, extensions are disabled +(@pxref{Options}). +@end quotation + +@menu +* Extension Intro:: What is an extension. +* Plugin License:: A note about licensing. +* Extension Design:: Design notes about the extension API. +* Extension API Description:: A full description of the API. +* Extension Example:: Example C code for an extension. +* Extension Samples:: The sample extensions that ship with + @code{gawk}. +* gawkextlib:: The @code{gawkextlib} project. +@end menu + +@node Extension Intro +@section Introduction + +An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of +external compiled code that @command{gawk} can load at runtime to +provide additional functionality, over and above the built-in capabilities +described in the rest of this @value{DOCUMENT}. + +Extensions are useful because they allow you (of course) to extend +@command{gawk}'s functionality. For example, they can provide access to +system calls (such as @code{chdir()} to change directory) and to other +C library routines that could be of use. 
As with most software, +``the sky is the limit;'' if you can imagine something that you might +want to do and can write in C or C++, you can write an extension to do it! + +Extensions are written in C or C++, using the @dfn{Application Programming +Interface} (API) defined for this purpose by the @command{gawk} +developers. The rest of this @value{CHAPTER} explains the design +decisions behind the API, the facilities it provides and how to use +them, and presents a small sample extension. In addition, it documents +the sample extensions included in the @command{gawk} distribution. + +@node Plugin License +@section Extension Licensing + +Every dynamic extension should define the global symbol +@code{plugin_is_GPL_compatible} to assert that it has been licensed under +a GPL-compatible license. If this symbol does not exist, @command{gawk} +will emit a fatal error and exit. + +The declared type of the symbol should be @code{int}. It does not need +to be in any allocated section, though. The code merely asserts that +the symbol exists in the global scope. Something like this is enough: + +@example +int plugin_is_GPL_compatible; +@end example + +@node Extension Design +@section Extension API Design + +The first version of extensions for @command{gawk} was developed in +the mid-1990s and released with @command{gawk} 3.1 in the late 1990s. +The basic mechanisms and design remained unchanged for close to 15 years, +until 2012. + +The old extension mechanism used data types and functions from +@command{gawk} itself, with a ``clever hack'' to install extension +functions. + +@command{gawk} included some sample extensions, of which a few were +really useful. However, it was clear from the outset that the extension +mechanism was bolted onto the side and was not really thought out. + +@menu +* Old Extension Problems:: Problems with the old mechanism. +* Extension New Mechanism Goals:: Goals for the new mechanism. 
+* Extension Other Design Decisions:: Some other design decisions. +* Extension Mechanism Outline:: An outline of how it works. +* Extension Future Growth:: Some room for future growth. +@end menu + +@node Old Extension Problems +@subsection Problems With The Old Mechanism + +The old extension mechanism had several problems: + +@itemize @bullet +@item +It depended heavily upon @command{gawk} internals. Any time the +@code{NODE} structure@footnote{A critical central data structure +inside @command{gawk}.} changed, an extension would have to be +recompiled. Furthermore, to really write extensions required understanding +something about @command{gawk}'s internal functions. There was some +documentation in this @value{DOCUMENT}, but it was quite minimal. + +@item +Being able to call into @command{gawk} from an extension required linker +facilities that are common on Unix-derived systems but that did +not work on Windows systems; users wanting extensions on Windows +had to statically link them into @command{gawk}, even though Windows supports +dynamic loading of shared objects. + +@item +The API would change occasionally as @command{gawk} changed; no compatibility +between versions was ever offered or planned for. +@end itemize + +Despite the drawbacks, the @command{xgawk} project developers forked +@command{gawk} and developed several significant extensions. They also +enhanced @command{gawk}'s facilities relating to file inclusion and +shared object access. + +A new API was desired for a long time, but only in 2012 did the +@command{gawk} maintainer and the @command{xgawk} developers finally +start working on it together. More information about the @command{xgawk} +project is provided in @ref{gawkextlib}. + +@node Extension New Mechanism Goals +@subsection Goals For A New Mechanism + +Some goals for the new API were: + +@itemize @bullet +@item +The API should be independent of @command{gawk} internals. 
Changes in +@command{gawk} internals should not be visible to the writer of an +extension function. + +@item +The API should provide @emph{binary} compatibility across @command{gawk} +releases as long as the API itself does not change. + +@item +The API should enable extensions written in C to have roughly the +same ``appearance'' to @command{awk}-level code as @command{awk} +functions do. This means that extensions should have: + +@itemize @minus +@item +The ability to access function parameters. + +@item +The ability to turn an undefined parameter into an array (call by reference). + +@item +The ability to create, access and update global variables. + +@item +Easy access to all the elements of an array at once (``array flattening'') +in order to loop over all the elements in an easy fashion for C code. +@end itemize + +@item +The ability to create arrays (including @command{gawk}'s true +multi-dimensional arrays). +@end itemize + +Some additional important goals were: + +@itemize @bullet +@item +The API should use only features in ISO C 90, so that extensions +can be written using the widest range of C and C++ compilers. The header +should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"} +magic so that a C++ compiler could be used. (If using C++, the runtime +system has to be smart enough to call any constructors and destructors, +as @command{gawk} is a C program. As of this writing, this has not been +tested.) + +@item +The API mechanism should not require access to @command{gawk}'s +symbols@footnote{The @dfn{symbols} are the variables and functions +defined inside @command{gawk}. Access to these symbols by code +external to @command{gawk} loaded dynamically at runtime is +problematic on Windows.} by the compile-time or dynamic linker, +in order to enable creation of extensions that will also work on Windows. 
+@end itemize + +During development, it became clear that there were other features +that should be available to extensions, which were also subsequently +provided: + +@itemize @bullet +@item +Extensions should have the ability to hook into @command{gawk}'s +I/O redirection mechanism. In particular, the @command{xgawk} +developers provided a so-called ``open hook'' to take over reading +records. During the development, this was generalized to allow +extensions to hook into input processing, output processing, and +two-way I/O. + +@item +An extension should be able to provide a ``call back'' function +to perform clean up actions when @command{gawk} exits. + +@item +An extension should be able to provide a version string so that +@command{gawk}'s @option{--version} option can provide information +about extensions as well. +@end itemize + +@node Extension Other Design Decisions +@subsection Other Design Decisions + +As an ``arbitrary'' design decision, extensions can read the values of +built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot +change them, with the exception of @code{PROCINFO}. + +The reason for this is to prevent an extension function from affecting +the flow of an @command{awk} program outside its control. While a real +@command{awk} function can do what it likes, that is at the discretion +of the programmer. An extension function should provide a service or +make a C API available for use within @command{awk}, and not mess with +@code{FS} or @code{ARGC} and @code{ARGV}. + +In addition, it becomes easy to start down a slippery slope. How +much access to @command{gawk} facilities do extensions need? +Do they need @code{getline}? What about calling @code{gsub()} or +compiling regular expressions? What about calling into @command{awk} +functions? (@emph{That} would be messy.) + +In order to avoid these issues, the @command{gawk} developers chose +to start with the simplest, most basic features that are still truly useful. 
+ +Another decision is that although @command{gawk} provides nice things like +MPFR, and arrays indexed internally by integers, these features are not +being brought out to the API in order to keep things simple and close to +traditional @command{awk} semantics. (In fact, arrays indexed internally +by integers are so transparent that they aren't even documented!) + +With time, the API will undoubtedly evolve; the @command{gawk} developers +expect this to be driven by user needs. For now, the current API seems +to provide a minimal yet powerful set of features for extension creation. + +@node Extension Mechanism Outline +@subsection At A High Level How It Works + +The requirement to avoid access to @command{gawk}'s symbols is, at first +glance, a difficult one to meet. + +One design, apparently used by Perl and Ruby and maybe others, would +be to make the mainline @command{gawk} code into a library, with the +@command{gawk} program a small C @code{main()} function linked against +the library. + +This seemed like the tail wagging the dog, complicating build and +installation and making a simple copy of the @command{gawk} executable +from one system to another (or one place to another on the same +system!) into a chancy operation. + +Pat Rankin suggested the solution that was adopted. Communication between +@command{gawk} and an extension is two-way. First, when an extension +is loaded, it is passed a pointer to a @code{struct} whose fields are +function pointers. + +FIXME: Figure 1 + +The extension can call functions inside @command{gawk} through these +function pointers, at runtime, without needing (link-time) access +to @command{gawk}'s symbols. One of these function pointers is to a +function for ``registering'' new built-in functions. + +FIXME: Figure 2 + +In the other direction, the extension registers its new functions +with @command{gawk} by passing function pointers to the functions that +provide the new feature (@code{do_chdir()}, for example). 
@command{gawk} +associates the function pointer with a name and can then call it, using a +defined calling convention. The @code{do_@var{xxx}()} function, in turn, +then uses the function pointers in the API @code{struct} to do its work, +such as updating variables or arrays, printing messages, setting @code{ERRNO}, +and so on. + +FIXME: Figure 3 + +Convenience macros in the @file{gawkapi.h} header file make calling +through the function pointers look like regular function calls so that +extension code is quite readable and understandable. + +Although all of this sounds medium complicated, the result is that +extension code is quite clean and straightforward. This can be seen in +the sample extensions @file{filefuncs.c} and also the @file{testext.c} +code for testing the APIs. + +Some other bits and pieces: + +@itemize @bullet +@item +The API provides access to @command{gawk}'s @code{do_@var{xxx}} values, +reflecting command line options, like @code{do_lint}, @code{do_profiling} +and so on (@pxref{Extension API Variables}). +These are informational: an extension cannot affect these +inside @command{gawk}. In addition, attempting to assign to them +produces a compile-time error. + +@item +The API also provides major and minor version numbers, so that an +extension can check if the @command{gawk} it is loaded with supports the +facilities it was compiled with. (Version mismatches ``shouldn't'' +happen, but we all know how @emph{that} goes.) +@xref{Extension Versioning}, for details. + +@item +An extension may register a version string with @command{gawk}; this +allows @command{gawk} to dump extension version information when +invoked with the @option{--version} option. +@end itemize + +@node Extension Future Growth +@subsection Room For Future Growth + +The API provides room for future growth, in two ways. + +An ``extension id'' is passed into the extension when it's loaded. This +extension id is then passed back to @command{gawk} with each function +call. 
This allows @command{gawk} to identify the extension calling it, +should it need to know. + +A ``name space'' is passed into @command{gawk} when an extension function +is registered. This provides for a future mechanism for grouping +extension functions and possibly avoiding name conflicts. + +Of course, as of this writing, no decisions have been made with respect +to any of the above. + +@node Extension API Description +@section API Description + +This (rather large) @value{SECTION} describes the API in detail. + +@menu +* Extension API Functions Introduction:: Introduction to the API functions. +* General Data Types:: The data types. +* Requesting Values:: How to get a value. +* Constructor Functions:: Functions for creating values. +* Registration Functions:: Functions to register things with + @command{gawk}. +* Printing Messages:: Functions for printing messages. +* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. +* Accessing Parameters:: Functions for accessing parameters. +* Symbol Table Access:: Functions for accessing global + variables. +* Array Manipulation:: Functions for working with arrays. +* Extension API Variables:: Variables provided by the API. +* Extension API Boilerplate:: Boilerplate code for using the API. +* Finding Extensions:: How @command{gawk} finds compiled + extensions. +@end menu + +@node Extension API Functions Introduction +@subsection Introduction + +Access to facilities within @command{gawk} is made available +by calling through function pointers passed into your extension. + +API function pointers are provided for the following kinds of operations: + +@itemize @bullet +@item +Registration functions. You may register: +@itemize @minus +@item +extension functions, +@item +input parsers, +@item +output wrappers, +@item +two-way processors, +@item +exit callbacks, +@item +and a version string. +@end itemize +All of these are discussed in detail, later in this @value{CHAPTER}. 
+ +@item +Printing fatal, warning, and lint warning messages. + +@item +Updating @code{ERRNO}, or unsetting it. + +@item +Accessing parameters, including converting an undefined parameter into +an array. + +@item +Symbol table access: retrieving a global variable, creating one, +or changing one. This also includes the ability to create a scalar +variable that will be @emph{constant} within @command{awk} code. + +@item +Creating and releasing cached values; this provides an +efficient way to use values for multiple variables and +can be a big performance win. + +@item +Manipulating arrays: +@itemize @minus +@item +Retrieving, adding, deleting, and modifying elements +@item +Getting the count of elements in an array +@item +Creating a new array +@item +Clearing an array +@item +Flattening an array for easy C style looping over an array +@end itemize +@end itemize + +Some points about using the API: + +@itemize @bullet +@item +You must include @code{<sys/types.h>} and @code{<sys/stat.h>} before including +the @file{gawkapi.h} header file. In addition, you must include either +@code{<stddef.h>} or @code{<stdlib.h>} to get the definition of @code{size_t}. +Finally, if you wish to use the boilerplate @code{dl_load_func()} macro, you will +need to include @code{<stdio.h>} as well. + +@item +Although the API only uses ISO C 90 features, there is an exception; the +``constructor'' functions use the @code{inline} keyword. If your compiler +does not support this keyword, you should either place +@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a +@file{config.h} file in your extensions. + +@item +All pointers filled in by @command{gawk} are to memory +managed by @command{gawk} and should be treated by the extension as +read-only. Memory for @emph{all} strings passed into @command{gawk} +from the extension @emph{must} come from @code{malloc()} and is managed +by @command{gawk} from then on. 
+ +@item +The API defines several simple structs that map values as seen +from @command{awk}. A value can be a @code{double}, a string, or an +array (as in multidimensional arrays, or when creating a new array). +Strings maintain both pointer and length since embedded @code{NUL} +characters are allowed. + +By intent, strings are maintained using the current multibyte encoding (as +defined by @env{LC_@var{xxx}} environment variables) and not using wide +characters. This matches how @command{gawk} stores strings internally +and also how characters are likely to be input and output from files. + +@item +When retrieving a value (such as a parameter or that of a global variable +or array element), the extension requests a specific type (number, string, +scalar, value cookie, array, or ``undefined''). When the request is +``undefined,'' the returned value will have the real underlying type. + +However, if the request and actual type don't match, the access function +returns ``false'' and fills in the type of the actual value that is there, +so that the extension can, e.g., print an error message +(``scalar passed where array expected''). + +@c This is documented in the header file and needs some expanding upon. +@c The table there should be presented here +@end itemize + +While you may call the API functions by using the function pointers +directly, the interface is not so pretty. To make extension code look +more like regular code, the @file{gawkapi.h} header file defines a number +of macros which you should use in your code. This @value{SECTION} presents +the macros as if they were functions. 
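The function-pointer mechanism and the convenience macros described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the actual @command{gawk} code: every name in it (@code{demo_api_t}, @code{api_register}, @code{register_func}, @code{host_call}, @code{demo_load}, @code{do_twice}) is hypothetical, and the real types and macros are the ones defined in @file{gawkapi.h}. It uses only ISO C 90 features.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; the real types are in gawkapi.h. */
typedef void *demo_ext_id_t;
typedef int (*demo_func_t)(int arg);

/* The host hands the extension a struct whose fields are function pointers. */
typedef struct demo_api {
    int (*api_register)(demo_ext_id_t id, const char *name, demo_func_t fn);
} demo_api_t;

/* ---- host side (plays the role of gawk) ---- */
#define DEMO_MAX_FUNCS 8
static struct { const char *name; demo_func_t fn; } demo_table[DEMO_MAX_FUNCS];
static size_t demo_count;

/* Associate a function pointer with a name; returns 1 on success, 0 if full. */
static int host_register(demo_ext_id_t id, const char *name, demo_func_t fn)
{
    (void) id;  /* a host could use the id to identify the calling extension */
    if (demo_count == DEMO_MAX_FUNCS)
        return 0;
    demo_table[demo_count].name = name;
    demo_table[demo_count].fn = fn;
    demo_count++;
    return 1;
}

static const demo_api_t host_api = { host_register };

/* Look up a registered function by name and call it; -1 if it is unknown. */
static int host_call(const char *name, int arg)
{
    size_t i;

    for (i = 0; i < demo_count; i++)
        if (strcmp(demo_table[i].name, name) == 0)
            return demo_table[i].fn(arg);
    return -1;
}

/* ---- extension side ---- */
static const demo_api_t *api;   /* saved at load time */
static demo_ext_id_t ext_id;    /* passed back to the host on every call */

/* Convenience macro: hides the api pointer and the extension id, so the
   call reads like an ordinary function call. */
#define register_func(name, fn) (api->api_register(ext_id, (name), (fn)))

/* The new "built-in" provided by the extension. */
static int do_twice(int arg)
{
    return 2 * arg;
}

/* Called by the host when the extension is loaded. */
static int demo_load(const demo_api_t *api_p, demo_ext_id_t id)
{
    assert(api_p != NULL);
    api = api_p;
    ext_id = id;
    return register_func("twice", do_twice);
}
```

In this sketch the extension never references a host symbol by name at link time; everything it needs arrives through the @code{demo_api_t} pointer passed to @code{demo_load()}, which is the property the design relies on for platforms such as Windows.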
+ +@node General Data Types +@subsection General Purpose Data Types + +@quotation +@i{I have a true love/hate relationship with unions.}@* +Arnold Robbins + +@i{That's the thing about unions: the compiler will arrange things so they +can accommodate both love and hate.}@* +Chet Ramey +@end quotation + +The extension API defines a number of simple types and structures for general +purpose use. Additional, more specialized data structures are introduced +in subsequent @value{SECTION}s, together with the functions that use them. + +@table @code +@item typedef void *awk_ext_id_t; +A value of this type is received from @command{gawk} when an extension is loaded. +That value must then be passed back to @command{gawk} as the first parameter of +each API function. + +@item #define awk_const @dots{} +This macro expands to @code{const} when compiling an extension, +and to nothing when compiling @command{gawk} itself. This makes +certain fields in the API data structures unwritable from extension code, +while allowing @command{gawk} to use them as it needs to. + +@item typedef int awk_bool_t; +A simple boolean type. As of this moment, the API does not define special +``true'' and ``false'' values, although perhaps it should. + +@item typedef struct @{ +@itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */ +@itemx @ @ @ @ size_t len;@ @ @ @ @ /* length thereof, in chars */ +@itemx @} awk_string_t; +This represents a mutable string. @command{gawk} +owns the memory pointed to if it supplied +the value. Otherwise, it takes ownership of the memory pointed to. +@strong{Such memory must come from @code{malloc()}!} + +As mentioned earlier, strings are maintained using the current +multibyte encoding. 
+ +@item typedef enum @{ +@itemx @ @ @ @ AWK_UNDEFINED, +@itemx @ @ @ @ AWK_NUMBER, +@itemx @ @ @ @ AWK_STRING, +@itemx @ @ @ @ AWK_ARRAY, +@itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */ +@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */ +@itemx @} awk_valtype_t; +This @code{enum} indicates the type of a value. +It is used in the following @code{struct}. + +@item typedef struct @{ +@itemx @ @ @ @ awk_valtype_t val_type; +@itemx @ @ @ @ union @{ +@itemx @ @ @ @ @ @ @ @ awk_string_t@ @ @ @ @ @ @ s; +@itemx @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d; +@itemx @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a; +@itemx @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl; +@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t vc; +@itemx @ @ @ @ @} u; +@itemx @} awk_value_t; +An ``@command{awk} value.'' +The @code{val_type} member indicates what kind of value the +@code{union} holds, and each member is of the appropriate type. + +@item #define str_value@ @ @ @ @ @ u.s +@itemx #define num_value@ @ @ @ @ @ u.d +@itemx #define array_cookie@ @ @ u.a +@itemx #define scalar_cookie@ @ u.scl +@itemx #define value_cookie@ @ @ u.vc +These macros make accessing the fields of the @code{awk_value_t} more +readable. + +@item typedef void *awk_scalar_t; +Scalars can be represented as an opaque type. These values are obtained from +@command{gawk} and then passed back into it. This is discussed below. + +@item typedef void *awk_value_cookie_t; +A ``value cookie'' is an opaque type representing a cached value. +This is also discussed below. +@end table + +Scalar values in @command{awk} are either numbers or strings. The +@code{awk_value_t} struct represents values. The @code{val_type} member +indicates what is in the @code{union}. + +Representing numbers is easy---the API uses a C @code{double}. Strings +require more work. 
Since @command{gawk} allows embedded @code{NUL} bytes +in string values, a string must be represented as a pair containing a +data-pointer and length. This is the @code{awk_string_t} type. + +Identifiers (i.e., the names of global variables) can be associated +with either scalar values or with arrays. In addition, @command{gawk} +provides true arrays of arrays, where any given array element can +itself be an array. Discussion of arrays is delayed until +@ref{Array Manipulation}. + +The various macros listed earlier make it easier to use the elements +of the @code{union} as if they were fields in a @code{struct}; this +is a common coding practice in C. Such code is easier to write and to +read; however, it remains @emph{your} responsibility to make sure that +the @code{val_type} member correctly reflects the type of the value in +the @code{awk_value_t}. + +Conceptually, the first three members of the @code{union} (number, string, +and array) are all that is needed for working with @command{awk} values. +However, since the API provides routines for accessing and changing +the value of global scalar variables only by using the variable's name, +there is a performance penalty: @command{gawk} must find the variable +each time it is accessed or changed. This turns out to be a real issue, +not just a theoretical one. + +Thus, if you know that your extension will spend considerable time +reading and/or changing the value of one or more scalar variables, you +can obtain a @dfn{scalar cookie}@footnote{See +@uref{http://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a +definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html, +the ``magic cookie'' entry in the Jargon file} for a nice example. See +also the entry in the @ref{Glossary}.} +object for that variable, and then use +the cookie for getting the variable's value or for changing the variable's +value.
+This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro. +Given a scalar cookie, @command{gawk} can directly retrieve or +modify the value, as required, without having to first find it. + +The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar. +If you know that you wish to +use the same numeric or string @emph{value} for one or more variables, +you can create the value once, retaining a @dfn{value cookie} for it, +and then pass in that value cookie whenever you wish to set the value of a +variable. This saves both storage space within the running @command{gawk} +process as well as the time needed to create the value. + +@node Requesting Values +@subsection Requesting Values + +All of the functions that return values from @command{gawk} +work in the same way. You pass in an @code{awk_valtype_t} value +to indicate what kind of value you want. If the actual value +matches what you requested, the function returns true and fills +in the @code{awk_value_t} result. +Otherwise, the function returns false, and the @code{val_type} +member indicates the type of the actual value. You may then +print an error message, or reissue the request for the actual +value type, as appropriate. This behavior is summarized in +@ref{table-value-types-returned}. 
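
The request rules summarized in @ref{table-value-types-returned} can also be read as a small decision function. The following is only an illustrative sketch and not part of the API: @code{valtype_t}, its @code{T_} constants, and @code{request_result()} are hypothetical stand-ins (the enum mirrors @code{awk_valtype_t} so that the fragment compiles without @file{gawkapi.h}), and the @code{convertible} flag stands for ``the string can be converted to a number.''

```c
#include <stdbool.h>

/* Hypothetical stand-in for awk_valtype_t; not taken from gawkapi.h. */
typedef enum {
    T_UNDEFINED, T_NUMBER, T_STRING, T_ARRAY, T_SCALAR, T_VALUE_COOKIE
} valtype_t;

/*
 * request_result --- model one cell of the "Value Types Returned" table.
 *
 * wanted is the type requested; actual is the type of the value itself
 * (string, number, array, or undefined).  On success, return true and
 * store the type handed back in *got.  On failure, return false and
 * store the actual type in *got, mirroring what val_type reports.
 */
static bool
request_result(valtype_t wanted, valtype_t actual, bool convertible,
               valtype_t *got)
{
    bool is_scalar_value = (actual == T_STRING || actual == T_NUMBER);
    bool ok = false;

    *got = actual;          /* the failure case: report the actual type */

    switch (wanted) {
    case T_STRING:          /* strings and numbers both come back as strings */
        ok = is_scalar_value;
        break;
    case T_NUMBER:          /* a string qualifies only if it converts */
        ok = (actual == T_NUMBER) || (actual == T_STRING && convertible);
        break;
    case T_ARRAY:
        ok = (actual == T_ARRAY);
        break;
    case T_SCALAR:          /* any string or number can be had as a scalar */
        ok = is_scalar_value;
        break;
    case T_UNDEFINED:       /* "give me whatever is there" */
        ok = true;
        break;
    case T_VALUE_COOKIE:    /* never satisfiable as a request */
        ok = false;
        break;
    }

    if (ok && wanted != T_UNDEFINED)
        *got = wanted;
    return ok;
}
```

Read this way, requesting the undefined type is the ``tell me what is there'' case, while a failed request for any other type leaves the actual type available so that the caller can report an error or reissue the request.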
+ +@ifnotplaintext +@float Table,table-value-types-returned +@caption{Value Types Returned} +@multitable @columnfractions .50 .50 +@headitem @tab Type of Actual Value: +@end multitable +@multitable @columnfractions .166 .166 .198 .15 .15 .166 +@headitem @tab @tab String @tab Number @tab Array @tab Undefined +@item @tab @b{String} @tab String @tab String @tab false @tab false +@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false +@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false +@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false +@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined +@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false +@end multitable +@end float +@end ifnotplaintext +@ifplaintext +@float Table,table-value-types-returned +@caption{Value Types Returned} +@example + +-------------------------------------------------+ + | Type of Actual Value: | + +------------+------------+-----------+-----------+ + | String | Number | Array | Undefined | ++-----------+-----------+------------+------------+-----------+-----------+ +| | String | String | String | false | false | +| |-----------+------------+------------+-----------+-----------+ +| | Number | Number if | Number | false | false | +| | | can be | | | | +| | | converted, | | | | +| | | else false | | | | +| |-----------+------------+------------+-----------+-----------+ +| Type | Array | false | false | Array | false | +| Requested |-----------+------------+------------+-----------+-----------+ +| | Scalar | Scalar | Scalar | false | false | +| |-----------+------------+------------+-----------+-----------+ +| | Undefined | String | Number | Array | Undefined | +| |-----------+------------+------------+-----------+-----------+ +| | Value | false | false | false | false | +| | Cookie | | | | | 
++-----------+-----------+------------+------------+-----------+-----------+ +@end example +@end float +@end ifplaintext + +@node Constructor Functions +@subsection Constructor Functions and Convenience Macros + +The API provides a number of @dfn{constructor} functions for creating +string and numeric values, as well as a number of convenience macros. +This @value{SUBSECTION} presents them all as function prototypes, in +the way that extension code would use them. + +@table @code +@item static inline awk_value_t * +@itemx make_const_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a C string constant +(or other string data), and automatically creates a @emph{copy} of the data +for storage in @code{result}. + +@item static inline awk_value_t * +@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) +This function creates a string value in the @code{awk_value_t} variable +pointed to by @code{result}. It expects @code{string} to be a @samp{char *} +value pointing to data previously obtained from @code{malloc()}. The idea here +is that the data will be passed directly to @command{gawk}, which will assume +responsibility for it. + +@item static inline awk_value_t * +@itemx make_null_string(awk_value_t *result) +This specialized function creates a null string (the ``undefined'' value) +in the @code{awk_value_t} variable pointed to by @code{result}. + +@item static inline awk_value_t * +@itemx make_number(double num, awk_value_t *result) +This function simply creates a numeric value in the @code{awk_value_t} variable +pointed to by @code{result}. +@end table + +Two convenience macros may be used for allocating storage from @code{malloc()} +and @code{realloc()}. If the allocation fails, they cause @command{gawk} to +exit with a fatal error message. 
They should be used as if they were +procedure calls that do not return a value. + +@table @code +@item emalloc(pointer, type, size, message) +The arguments to this macro are as follows: +@c nested table +@table @code +@item pointer +The pointer variable to point at the allocated storage. + +@item type +The type of the pointer variable, used to create a cast for the call to @code{malloc()}. + +@item size +The total number of bytes to be allocated. + +@item message +A message to be prefixed to the fatal error message. Typically this is the name +of the function using the macro. +@end table + +@noindent +For example, you might allocate a string value like so: + +@example +awk_value_t result; +char *message; +const char greet[] = "Don't Panic!"; + +emalloc(message, char *, sizeof(greet), "myfunc"); +strcpy(message, greet); +make_malloced_string(message, strlen(message), & result); +@end example + +@item erealloc(pointer, type, size, message) +The arguments are the same as for the @code{emalloc()} macro. +@end table + +@node Registration Functions +@subsection Registration Functions + +This @value{SECTION} describes the API functions which let you +register parts of your extension with @command{gawk}. + +@menu +* Extension Functions:: Registering extension functions. +* Input Parsers:: Registering an input parser. +* Output Wrappers:: Registering an output wrapper. +* Two-way processors:: Registering a two-way processor. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. 
+@end menu + +@node Extension Functions +@subsubsection Registering An Extension Function + +Extension functions are described by the following record: + +@example +typedef struct @{ +@ @ @ @ const char *name; +@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result); +@ @ @ @ size_t num_expected_args; +@} awk_ext_func_t; +@end example + +The fields are: + +@table @code +@item const char *name; +The name of the new function. +@command{awk}-level code will call the function by this name. + +@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result); +This is a pointer to the C function that provides the desired +functionality. +The function must fill in the result with either a number +or a string. @command{gawk} takes ownership of any string memory. +As mentioned earlier, string memory @strong{must} come from @code{malloc()}. + +The function must return the value of @code{result}. +This is for the convenience of the calling code inside @command{gawk}. + +@item size_t num_expected_args; +This is the number of arguments the function expects to receive. +Each extension function may decide what to do if the number of +arguments isn't what it expected. As with @command{awk} functions, it +is likely OK to ignore extra arguments. +@end table + +Once you have a record representing your extension function, you register +it with @command{gawk} using this API function: + +@table @code +@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func); +This function returns true upon success, false otherwise. +The @code{namespace} parameter is currently not used; you should pass in an +empty string (@code{""}). The @code{func} pointer is the address of a +@code{struct} describing your function, as shown above. +@end table + +@node Input Parsers +@subsubsection Customized Input Parsers + +By default, @command{gawk} reads text files as its input.
It uses the value +of @code{RS} to find the end of the record, and then uses @code{FS} +(or @code{FIELDWIDTHS}) to split it into fields. Additionally, it sets +the value of @code{RT} (@pxref{Built-in Variables}). + +If you want, you can provide your own, custom, input parser. An input +parser's job is to return a record to the @command{gawk} record processing +code, along with indicators for the value and length of the data to be +used for @code{RT}, if any. + +To provide an input parser, you must first provide two functions +(where @var{XXX} is a prefix name for your extension): + +@table @code +@item awk_bool_t @var{XXX}_can_take_file(const awk_input_buf_t *iobuf) +This function examines the information available in @code{iobuf} +(which we discuss shortly). Based on the information there, it +decides if the input parser should be used for this file. +If so, it should return true (non-zero). Otherwise, it should +return false (zero). + +@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf) +When @command{gawk} decides to hand control of the file over to the +input parser, it calls this function. This function in turn must fill +in certain fields in the @code{awk_input_buf_t} structure, and ensure +that certain conditions are true. It should then return true. If an +error of some kind occurs, it should not fill in any fields, and should +return false; then @command{gawk} will not use the input parser. +The details are presented shortly. 
+@end table + +Your extension should package these functions inside an +@code{awk_input_parser_t}, which looks like this: + +@example +typedef struct input_parser @{ + const char *name; /* name of parser */ + awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); + awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); + awk_const struct input_parser *awk_const next; /* for use by gawk */ +@} awk_input_parser_t; +@end example + +The steps are as follows: + +@enumerate +@item +Create a @code{static awk_input_parser_t} variable and initialize it +appropriately. + +@item +When your extension is loaded, register your input parser with +@command{gawk} using the @code{register_input_parser()} API function +(described below). +@end enumerate + +An @code{awk_input_buf_t} looks like this: + +@example +typedef struct awk_input @{ + const char *name; /* filename */ + int fd; /* file descriptor */ +#define INVALID_HANDLE (-1) + void *opaque; /* private data for input parsers */ + int (*get_record)(char **out, struct awk_input *, int *errcode, + char **rt_start, size_t *rt_len); + void (*close_func)(struct awk_input *); + struct stat sbuf; /* stat buf */ +@} awk_input_buf_t; +@end example + +The fields can be divided into two categories: those for use (initially, +at least) by @code{@var{XXX}_can_take_file()}, and those for use by +@code{@var{XXX}_take_control_of()}. The first group of fields and their uses +are as follows: + +@table @code +@item const char *name; +The name of the file. + +@item int fd; +A file descriptor for the file. If @command{gawk} was able to +open the file, then it will @emph{not} be equal to +@code{INVALID_HANDLE}. Otherwise, it will. + +@item struct stat sbuf; +If file descriptor is valid, then @command{gawk} will have filled +in this structure with a call to the @code{fstat()} system call. +@end table + +The @code{@var{XXX}_can_take_file()} function should examine these +fields and decide if the input parser should be used for the file. 
+The decision can be made based upon @command{gawk} state (the value +of a variable defined previously by the extension and set by +@command{awk} code), the name of the +file, whether or not the file descriptor is valid, the information +in the @code{struct stat}, or any combination of the above. + +Once @code{@var{XXX}_can_take_file()} has returned true, and +@command{gawk} has decided to use your input parser, it will call +@code{@var{XXX}_take_control_of()}. That function then fills in at +least the @code{get_record} field of the @code{awk_input_buf_t}. It must +also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of +the fields that may be filled by @code{@var{XXX}_take_control_of()} +are as follows: + +@table @code +@item void *opaque; +This is used to hold any state information needed by the input parser +for this file. It is ``opaque'' to @command{gawk}. The input parser +is not required to use this pointer. + +@item int (*get_record)(char **out, struct awk_input *, int *errcode, +@itemx char **rt_start, size_t *rt_len); +This is a function pointer that should be set to point to the +function that creates the input records. +Said function is the core of the input parser. Its behavior is +described below. + +@item void (*close_func)(struct awk_input *); +This is a function pointer that should be set to point to the +function that does the ``tear down.'' It should release any resources +allocated by @code{@var{XXX}_take_control_of()}. It may also close +the file. If it does so, it should set the @code{fd} field to +@code{INVALID_HANDLE}. + +Having a ``tear down'' function is optional. If your input parser does +not need it, do not set this field. In that case, @command{gawk} +will call the regular @code{close()} system call on the +file descriptor, so it should be valid. +@end table + +The @code{@var{XXX}_get_record()} function does the work of creating +input records.
The parameters are as follows: + +@table @code +@item char **out +This is a pointer to a @code{char *} variable that is set to point +to the record. @command{gawk} will make its own copy of the data, so +the extension must manage this storage. + +@item struct awk_input *iobuf +This is the @code{awk_input_buf_t} for the file. The fields should be +used for reading data (@code{fd}) and for managing private state +(@code{opaque}), if any. + +@item int *errcode +If an error occurs, @code{*errcode} should be set to an appropriate +code from @code{<errno.h>}. + +@item char **rt_start +@itemx size_t *rt_len +If the concept of a ``record terminator'' makes sense, then +@code{*rt_start} should be set to point to the data to be used for +@code{RT}, and @code{*rt_len} should be set to the length of the +data. Otherwise, @code{*rt_len} should be set to zero. +@command{gawk} makes its own copy of this data, so the +extension must manage the storage. +@end table + +The return value is the length of the buffer pointed to by +@code{*out}, or @code{EOF} if end-of-file was reached or an +error occurred. + +It is guaranteed that @code{errcode} is a valid pointer, so there is no +need to test for a @code{NULL} value. @command{gawk} sets @code{*errcode} +to zero, so there is no need to set it unless an error occurs. + +If an error does occur, the function should return @code{EOF} and set +@code{*errcode} to a non-zero value. In that case, if @code{*errcode} +does not equal @minus{}1, @command{gawk} will automatically update +the @code{ERRNO} variable based on the value of @code{*errcode} (e.g., +setting @samp{*errcode = errno} should do the right thing). + +@command{gawk} ships with a sample extension (@pxref{Extension Sample +Readdir}) that reads directories, returning records for each entry in +the directory. You may wish to use that code as a guide for writing +your own input parser.
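
The contract described above can be sketched with a parser whose records are newline-terminated lines held in memory. This is only a sketch under stated assumptions: the reduced @code{struct awk_input} below is a stand-in carrying just the @code{opaque} field so that the fragment compiles on its own (a real extension would include @file{gawkapi.h} instead), and @code{struct line_state} and @code{line_get_record()} are hypothetical names.

```c
#include <stdio.h>      /* EOF */
#include <stddef.h>     /* size_t, NULL */

/* Reduced stand-in for awk_input_buf_t: only the field this sketch uses. */
struct awk_input {
    void *opaque;       /* private parser state */
};

/* Hypothetical private state: an in-memory buffer and a read position. */
struct line_state {
    char *buf;          /* data to split into records */
    size_t len;         /* number of bytes in buf */
    size_t pos;         /* offset of the next unread byte */
};

/* line_get_record --- hand back the next newline-terminated record */
static int
line_get_record(char **out, struct awk_input *iobuf, int *errcode,
                char **rt_start, size_t *rt_len)
{
    struct line_state *st = (struct line_state *) iobuf->opaque;
    size_t start, end;

    (void) errcode;     /* preset to zero by the caller; no error path here */

    if (st->pos >= st->len) {   /* nothing left: report end-of-file */
        *rt_len = 0;
        return EOF;
    }

    start = end = st->pos;
    while (end < st->len && st->buf[end] != '\n')
        end++;

    *out = st->buf + start;     /* the storage stays owned by the parser */
    if (end < st->len) {        /* a terminator was found */
        *rt_start = st->buf + end;
        *rt_len = 1;
        st->pos = end + 1;
    } else {                    /* final record has no terminator */
        *rt_start = NULL;
        *rt_len = 0;
        st->pos = end;
    }

    return (int) (end - start); /* length of the record */
}
```

A real @code{@var{XXX}_take_control_of()} would store a pointer to such state in @code{opaque} and set @code{get_record} to point to a function like this one. Reading from @code{fd} and reporting failures through @code{*errcode} are deliberately left out of the sketch.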
+ +When writing an input parser, you should think about (and document) +how it is expected to interact with @command{awk} code. You may want +it to always be called, and take effect as appropriate (as the +@code{readdir} extension does). Or you may want it to take effect +based upon the value of an @code{awk} variable, as the XML extension +from the @code{gawkextlib} project does (@pxref{gawkextlib}). +In the latter case, code in a @code{BEGINFILE} section (@pxref{BEGINFILE/ENDFILE}) +can look at @code{FILENAME} and @code{ERRNO} to decide whether or +not to activate an input parser. + +You register your input parser with the following function: + +@table @code +@item void register_input_parser(awk_input_parser_t *input_parser); +Register the input parser pointed to by @code{input_parser} with +@command{gawk}. +@end table + +@node Output Wrappers +@subsubsection Customized Output Wrappers + +An @dfn{output wrapper} is the mirror image of an input parser. +It allows an extension to take over the output to a file (opened +with the @samp{>} or @samp{>>} operators, @pxref{Redirection}). + +The output wrapper structure is very similar to the input parser structure: + +@example +typedef struct output_wrapper @{ + const char *name; /* name of the wrapper */ + awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); + awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); + awk_const struct output_wrapper *awk_const next; /* for use by gawk */ +@} awk_output_wrapper_t; +@end example + +The members are as follows: + +@table @code +@item const char *name; +This is the name of the output wrapper. + +@item awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); +This points to a function that examines the information in +the @code{awk_output_buf_t} structure pointed to by @code{outbuf}. +It should return true if the output wrapper wants to take over the +file, and false otherwise. It should not change any state (variable +values, etc.) within @command{gawk}.
+ +@item awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); +The function pointed to by this field is called when @command{gawk} +decides to let the output wrapper take control of the file. It should +fill in appropriate members of the @code{awk_output_buf_t} structure, +as described below, and return true if successful, false otherwise. + +@item awk_const struct output_wrapper *awk_const next; +This is for use by @command{gawk}. +@end table + +The @code{awk_output_buf_t} structure looks like this: + +@example +typedef struct @{ + const char *name; /* name of output file */ + const char *mode; /* mode argument to fopen */ + FILE *fp; /* stdio file pointer */ + awk_bool_t redirected; /* true if a wrapper is active */ + void *opaque; /* for use by output wrapper */ + size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, + FILE *fp, void *opaque); + int (*gawk_fflush)(FILE *fp, void *opaque); + int (*gawk_ferror)(FILE *fp, void *opaque); + int (*gawk_fclose)(FILE *fp, void *opaque); +@} awk_output_buf_t; +@end example + +Here too, your extension will define @code{@var{XXX}_can_take_file()} +and @code{@var{XXX}_take_control_of()} functions that examine and update +data members in the @code{awk_output_buf_t}. +The data members are as follows: + +@table @code +@item const char *name; +The name of the output file. + +@item const char *mode; +The mode string (as would be used in the second argument to @code{fopen()}) +with which the file was opened. + +@item FILE *fp; +The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file +before attempting to find an output wrapper. + +@item awk_bool_t redirected; +This field should be set to true by the @code{@var{XXX}_take_control_of()} function. + +@item void *opaque; +This pointer is opaque to @command{gawk}. The extension should use it to store +a pointer to any private data associated with the file.
+ +@item size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ FILE *fp, void *opaque); +@itemx int (*gawk_fflush)(FILE *fp, void *opaque); +@itemx int (*gawk_ferror)(FILE *fp, void *opaque); +@itemx int (*gawk_fclose)(FILE *fp, void *opaque); +These pointers should be set to point to functions that perform +the equivalent function as the @code{<stdio.h>} functions do, if appropriate. +@command{gawk} uses these function pointers for all output. +@command{gawk} initializes the pointers to point to internal, ``pass through'' +functions that just call the regular @code{<stdio.h>} functions, so an +extension only needs to redefine those functions that are appropriate for +what it does. +@end table + +The @code{@var{XXX}_can_take_file()} function should make a decision based +upon the @code{name} and @code{mode} fields, and any additional state +(such as @command{awk} variable values) that is appropriate. + +When @command{gawk} calls @code{@var{XXX}_take_control_of()}, it should fill +in the other fields, as appropriate, except for @code{fp}, which it should just +use normally. + +You register your output wrapper with the following function: + +@table @code +@item void register_output_wrapper(awk_output_wrapper_t *output_wrapper); +Register the output wrapper pointed to by @code{output_wrapper} with +@command{gawk}. +@end table + +@node Two-way processors +@subsubsection Customized Two-way Processors + +A @dfn{two-way processor} combines an input parser and an output wrapper for +two-way I/O with the @samp{|&} operator (@pxref{Redirection}). It makes identical +use of the @code{awk_input_parser_t} and @code{awk_output_buf_t} structures, +as described earlier. 
+ +A two-way processor is represented by the following structure: + +@example +typedef struct two_way_processor @{ + const char *name; /* name of the two-way processor */ + awk_bool_t (*can_take_two_way)(const char *name); + awk_bool_t (*take_control_of)(const char *name, awk_input_buf_t *inbuf, awk_output_buf_t *outbuf); + awk_const struct two_way_processor *awk_const next; /* for use by gawk */ +@} awk_two_way_processor_t; +@end example + +The fields are as follows: + +@table @code +@item const char *name; +The name of the two-way processor. + +@item awk_bool_t (*can_take_two_way)(const char *name); +This function returns true if it wants to take over the two-way I/O for this filename. + +@item awk_bool_t (*take_control_of)(const char *name, awk_input_buf_t *inbuf, awk_output_buf_t *outbuf); +This function should fill in the @code{awk_input_buf_t} and +@code{awk_output_buf_t} structures pointed to by @code{inbuf} and +@code{outbuf}, respectively. These structures were described earlier. + +@item awk_const struct two_way_processor *awk_const next; +This is for use by @command{gawk}. +@end table + +As with the input parser and output wrapper, you provide +``yes I can take this'' and ``take over for this'' functions, +@code{@var{XXX}_can_take_two_way()} and @code{@var{XXX}_take_control_of()}. + +You register your two-way processor with the following function: + +@table @code +@item void register_two_way_processor(awk_two_way_processor_t *two_way_processor); +Register the two-way processor pointed to by @code{two_way_processor} with +@command{gawk}. +@end table + +@node Exit Callback Functions +@subsubsection Registering An Exit Callback Function + +An @dfn{exit callback} function is a function that +@command{gawk} calls before it exits. +Such functions are useful if you have general ``clean up'' tasks +that should be performed in your extension (such as closing +database connections or releasing other resources).
+You can register such +a function with @command{gawk} using the following function. + +@table @code +@item void awk_atexit(void (*funcp)(void *data, int exit_status), +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0); +The parameters are: +@c nested table +@table @code +@item funcp +Points to the function to be called before @command{gawk} exits. The @code{data} +parameter will be the original value of @code{arg0}. +The @code{exit_status} parameter is +the exit status value that @command{gawk} will pass to the @code{exit()} system call. + +@item arg0 +A pointer to private data which @command{gawk} saves in order to pass to +the function pointed to by @code{funcp}. +@end table +@end table + +Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in +the reverse order in which they are registered with @command{gawk}. + +@node Extension Version String +@subsubsection Registering An Extension Version String + +You can register a version string which indicates the name and +version of your extension, with @command{gawk}, as follows: + +@table @code +@item void register_ext_version(const char *version); +Register the string pointed to by @code{version} with @command{gawk}. +@end table + +@command{gawk} prints all registered extension version strings when it +is invoked with the @option{--version} option. + +@node Printing Messages +@subsection Printing Messages + +You can print different kinds of warning messages from your +extension, as described below. Note that for these functions, +you must pass in the extension id received from @command{gawk} +when the extension was loaded.@footnote{Because the API uses only ISO C 90 +features, it cannot make use of the ISO C 99 variadic macro feature to hide +that parameter. More's the pity.} + +@table @code +@item void fatal(awk_ext_id_t id, const char *format, ...); +Print a message and then cause @command{gawk} to exit immediately. 
+ +@item void warning(awk_ext_id_t id, const char *format, ...); +Print a warning message. + +@item void lintwarn(awk_ext_id_t id, const char *format, ...); +Print a ``lint warning.'' Normally this is the same as printing a +warning message, but if @command{gawk} was invoked with @samp{--lint=fatal}, +then lint warnings become fatal error messages. +@end table + +All of these functions are otherwise like the C @code{printf()} +family of functions, where the @code{format} parameter is a string +with literal characters and formatting codes intermixed. + +@node Updating @code{ERRNO} +@subsection Updating @code{ERRNO} + +The following functions allow you to update the @code{ERRNO} +variable. + +@table @code +@item void update_ERRNO_int(int errno_val); +Set @code{ERRNO} to the string equivalent of the error code +in @code{errno_val}. The value should be one of the defined +error codes in @code{<errno.h>}, and @command{gawk} will turn it +into a (possibly translated) string using the C @code{strerror()} function. + +@item void update_ERRNO_string(const char *string); +Set @code{ERRNO} directly to the string pointed to by @code{string}. +@command{gawk} will make a copy of the value of @code{string}. + +@item void unset_ERRNO(); +Unset @code{ERRNO}. +@end table + +@node Accessing Parameters +@subsection Accessing and Updating Parameters + +Two functions give you access to the arguments (parameters) +passed to your extension function. They are: + +@table @code +@item awk_bool_t get_argument(size_t count, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Fill in the @code{awk_value_t} structure pointed to by @code{result} +with the @code{count}'th argument. Counts are zero based---the first argument is +numbered zero, the second one, and so on. @code{wanted} indicates the type of +value expected.
Return true if the actual type matches @code{wanted}, false otherwise. +In the latter case, @code{result->val_type} indicates the actual type. + +@item awk_bool_t set_argument(size_t count, awk_array_t array); +Convert a parameter that was undefined into an array; this provides +call-by-reference for arrays. Return false +if @code{count} is too big, or if the argument's type is +not undefined. +@end table + +@node Symbol Table Access +@subsection Symbol Table Access + +Three sets of routines provide access to global variables. + +@menu +* Symbol table by name:: Accessing variables by name. +* Symbol table by cookie:: Accessing variables by ``cookie''. +* Cached values:: Creating and using cached values. +@end menu + +@node Symbol table by name +@subsubsection Variable Access and Update by Name + +The following routines provide the ability to access and update +global @command{awk}-level variables by name. In compiler terminology, +identifiers of different kinds are termed @dfn{symbols}, thus the ``sym'' +in the routines' names. The data structure which stores information +about symbols is termed a @dfn{symbol table}. + +@table @code +@item awk_bool_t sym_lookup(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Fill in the @code{awk_value_t} structure pointed to by @code{result} +with the value of the variable named by the string @code{name}, which is +a regular C string. @code{wanted} indicates the type of value expected. +Return true if the actual type matches @code{wanted}, false otherwise. +In the latter case, @code{result->val_type} indicates the actual type. + +@item awk_bool_t sym_update(const char *name, awk_value_t *value); +Update the variable named by the string @code{name}, which is a regular +C string. The variable will be added to @command{gawk}'s symbol table +if it is not there. Return true if everything worked, false otherwise.
+ +Changing types (scalar to array or vice versa) of an existing variable +is @emph{not} allowed, nor may this routine be used to update an array. +This routine also cannot be used to update any of the predefined +variables (such as @code{ARGC} or @code{NF}). + +@item awk_bool_t sym_constant(const char *name, awk_value_t *value); +Create a variable named by the string @code{name}, which is +a regular C string, that has the constant value as given by +@code{value}. @command{awk}-level code cannot change the value of this +variable.@footnote{There (currently) is no @code{awk}-level feature that +provides this ability.} The extension may change the value of the variable +named by @code{name} with subsequent calls to this routine, and may also convert +a variable created by @code{sym_update()} into a constant. However, +once a variable becomes a constant, it cannot later be reverted to a +mutable variable. +@end table + +@node Symbol table by cookie +@subsubsection Variable Access and Update by Cookie + +A @dfn{scalar cookie} is an opaque handle that provides access +to a global variable or array. It is an optimization that +avoids looking up variables in @command{gawk}'s symbol table every time +access is needed. This was discussed earlier, in @ref{General Data Types}. + +The following functions let you work with scalar cookies. + +@table @code +@item awk_bool_t sym_lookup_scalar(awk_scalar_t cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +Retrieve the current value of a scalar cookie. +Once you have obtained a scalar cookie using @code{sym_lookup()}, you can +use this function to get its value more efficiently. +Return false if the value cannot be retrieved. + +@item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value); +Update the value associated with a scalar cookie.
+Return will be false if the new value is not one of +@code{AWK_STRING} or @code{AWK_NUMBER}. +Here too, the built-in variables may not be updated. +@end table + +It is not obvious at first glance how to work with scalar cookies or +what their @i{raison d'etre} really is. In theory, the @code{sym_lookup()} +and @code{sym_update()} routines are all you really need to work with +variables. For example, you might have code that looked up the value of +a variable, evaluated a condition, and then possibly changed the value +of the variable based on the result of that evaluation, like so: + +@example +/* do_magic --- do something really great */ + +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t value; + + if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value) + && some_condition(value.num_value)) @{ + value.num_value += 42; + sym_update("MAGIC_VAR", & value); + @} + + return make_number(0.0, result); +@} +@end example + +@noindent +This code looks (and is) simple and straightforward. So what's the problem? + +Consider what happens if @command{awk}-level code associated with your +extension calls the @code{magic()} function (implemented in C by @code{do_magic()}), +once per record, while processing hundreds of thousands or millions of records. +The @code{MAGIC_VAR} variable is looked up in the symbol table once or twice per function call! + +The symbol table lookup is really pure overhead; it is considerably more efficient +to get a cookie that represents the variable, and use that to get the variable's +value and update it as needed.@footnote{The difference is measurable and quite real. Trust us.} + +Thus, the way to use cookies is as follows. First, install your extension's variable +in @command{gawk}'s symbol table using @code{sym_update()}, as usual. 
Then get a +scalar cookie for the variable using @code{sym_lookup()}: + +@example +static awk_scalar_t magic_var_cookie; /* static global cookie for MAGIC_VAR */ + +static void +my_extension_init() +@{ + awk_value_t value; + + sym_update("MAGIC_VAR", make_number(42.0, & value)); /* install initial value */ + sym_lookup("MAGIC_VAR", AWK_SCALAR, & value); /* get cookie */ + magic_var_cookie = value.scalar_cookie; /* save the cookie */ + @dots{} +@} +@end example + +Next, use the routines in this section for retrieving and updating +the value by way of the cookie. Thus, @code{do_magic()} now becomes +something like this: + +@example +/* do_magic --- do something really great */ + +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t value; + + if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value) + && some_condition(value.num_value)) @{ + value.num_value += 42; + sym_update_scalar(magic_var_cookie, & value); + @} + @dots{} + + return make_number(0.0, result); +@} +@end example + +@quotation NOTE +The previous code omitted error checking for +presentation purposes. Your extension code should be more robust +and check the return values from the API functions carefully. +@end quotation + +@node Cached values +@subsubsection Creating and Using Cached Values + +The routines in this section allow you to create and release +cached values. As with scalar cookies, in theory, cached values +are not necessary. You can create numbers and strings using +the functions in @ref{Constructor Functions}. You can then +assign those values to variables using @code{sym_update()} +or @code{sym_update_scalar()}, as you like. + +However, you can understand the point of cached values if you remember that +@emph{every} string value's storage @emph{must} come from @code{malloc()}. 
+If you have 20 variables, all of which have the same string value, you +must create 20 identical copies of the string.@footnote{Numeric values +are clearly less problematic, requiring only a C @code{double} to store.} + +It is clearly more efficient, if possible, to create a value once, and +then tell @command{gawk} to reuse the value for multiple variables. That +is what the routines in this section let you do. The functions are as follows: + +@table @code +@item awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result); +Create a cached string or numeric value from @code{value} for efficient later +assignment. +Only @code{AWK_NUMBER} and @code{AWK_STRING} values are allowed. Any other type +is rejected. While @code{AWK_UNDEFINED} could be allowed, doing so would +result in inferior performance. + +@item awk_bool_t release_value(awk_value_cookie_t vc); +Release the memory associated with a value cookie obtained +from @code{create_value()}. +@end table + +You use value cookies in a fashion similar to the way you use scalar cookies. 
+In the extension initialization routine, you create the value cookie: + +@example +static awk_value_cookie_t answer_cookie; /* static value cookie */ + +static void +my_extension_init() +@{ + awk_value_t value; + char *long_string; + size_t long_string_len; + + @dots{} /* code from earlier */ + /* @dots{} fill in long_string and long_string_len @dots{} */ + make_malloced_string(long_string, long_string_len, & value); + create_value(& value, & answer_cookie); /* create cookie */ + @dots{} +@} +@end example + +Once the value is created, you can use it as the value of any number +of variables: + +@example +static awk_value_t * +do_magic(int nargs, awk_value_t *result) +@{ + awk_value_t new_value; + + @dots{} /* as earlier */ + + new_value.val_type = AWK_VALUE_COOKIE; + new_value.value_cookie = answer_cookie; + sym_update("VAR1", & new_value); + sym_update("VAR2", & new_value); + @dots{} + sym_update("VAR100", & new_value); + @dots{} +@} +@end example + +@noindent +Using value cookies in this way saves considerable storage, since all of +@code{VAR1} through @code{VAR100} share the same value. + +You might be wondering, ``Is this sharing problematic? +What happens if @command{awk} code assigns a new value to @code{VAR1}, +will all the others be changed too?'' + +That's a great question. The answer is that no, it's not a problem. +@command{gawk} is smart enough to avoid such problems. + +Finally, as part of your clean up action (@pxref{Exit Callback Functions}) +you should release any cached values that you created using +@code{release_value()}. + +@node Array Manipulation +@subsection Array Manipulation + +The primary data structure@footnote{Okay, the only data structure.} in @command{awk} +is the associative array (@pxref{Arrays}). +Extensions need to be able to manipulate @command{awk} arrays. +The API provides a number of data structures for working with arrays, +functions for working with individual elements, and functions for +working with arrays as a whole.
This includes the ability to +``flatten'' an array so that it is easy for C code to traverse +every element in an array. The array data structures integrate +nicely with the data structures for values to make it easy to +both work with and create true arrays of arrays (@pxref{General Data Types}). + +@menu +* Array Data Types:: Data types for working with arrays. +* Array Functions:: Functions for working with arrays. +* Flattening Arrays:: How to flatten arrays. +* Creating Arrays:: How to create and populate arrays. +@end menu + +@node Array Data Types +@subsubsection Array Data Types + +The data types associated with arrays are listed below. + +@table @code +@item typedef void *awk_array_t; +If you request the value of an array variable, you get back an +@code{awk_array_t} value. This value is opaque@footnote{It is also +a ``cookie,'' but the @command{gawk} developers did not wish to overuse this +term.} to the extension; it uniquely identifies the array but can +only be used by passing it into API functions or receiving it from API +functions. This is very similar to the way @samp{FILE *} values are used +with the @code{<stdio.h>} library routines. + +@item typedef struct awk_element @{ +@itemx @ @ @ @ /* convenience linked list pointer, not used by gawk */ +@itemx @ @ @ @ struct awk_element *next; +@itemx @ @ @ @ enum @{ +@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */ +@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ should be deleted */ +@itemx @ @ @ @ @} flags; +@itemx @ @ @ @ awk_value_t index; +@itemx @ @ @ @ awk_value_t value; +@itemx @} awk_element_t; +The @code{awk_element_t} is a ``flattened'' +array element. @command{gawk} produces an array of these +inside the @code{awk_flat_array_t} (see the next item). +ALL memory pointed to belongs to @command{gawk}. Individual elements may +be marked for deletion.
New elements must be added individually, +one at a time, using the separate API for that purpose. +The @code{next} pointer is for the convenience of extension writers. +It allows an extension to create a linked +list of new elements that can then be added to the array in a loop +that traverses the list. + +@item typedef struct awk_flat_array @{ +@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */ +@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */ +@itemx @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */ +@itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */ +@itemx @} awk_flat_array_t; +This is a flattened array. When an extension gets one of these +from @command{gawk}, the @code{elements} array will be of actual +size @code{count}. +The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk}; +therefore they are marked @code{awk_const} so that the extension cannot +modify them. +@end table + +@node Array Functions +@subsubsection Array Functions + +The following functions relate to individual array elements. + +@table @code +@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count); +For the array represented by @code{a_cookie}, return in @code{*count} +the number of elements it contains. A subarray counts as a single element. +Return false if there is an error. + +@item awk_bool_t get_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t *const index, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); +For the array represented by @code{a_cookie}, return in @code{*result} +the value of the element whose index is @code{index}. +The value for @code{index} can be numeric, in which case @command{gawk} +will convert it to a string.
Using non-integral values is possible, but +requires that you understand how such values are converted to strings +(@pxref{Conversion}); thus using integral values is safest. +@code{wanted} specifies the type of value you wish to retrieve. +Return false if @code{wanted} does not match the actual type or if +@code{index} is not in the array. + +As with @emph{all} strings passed into @command{gawk} from an extension, +the string value of @code{index} must come from @code{malloc()}, and +@command{gawk} will release the storage. + +@item awk_bool_t set_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const value); +In the array represented by @code{a_cookie}, create or modify +the element whose index is given by @code{index}. +The @code{ARGV} and @code{ENVIRON} arrays may not be changed. + +@item awk_bool_t set_array_element_by_elem(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_element_t element); +Like @code{set_array_element()}, but take the @code{index} and @code{value} +from @code{element}. This is a convenience macro. + +@item awk_bool_t del_array_element(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t* const index); +Remove the element with the given index from the array +represented by @code{a_cookie}. +Return true if the element was removed, or false if the element did +not exist in the array. +@end table + +The following functions relate to arrays as a whole. + +@table @code +@item awk_array_t create_array(); +Create a new array to which elements may be added. +@xref{Creating Arrays}, for a discussion of how to +create a new array and add elements to it. + +@item awk_bool_t clear_array(awk_array_t a_cookie); +Clear the array represented by @code{a_cookie}.
+Return false if there was some kind of problem, true otherwise. +The array remains an array, but after calling this function, it +has no elements. This is equivalent to using the @code{delete} +statement (@pxref{Delete}). + +@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data); +For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t} +structure and fill it in. Set the pointer whose address is passed as @code{data} +to point to this structure. +Return true upon success, or false otherwise. +@xref{Flattening Arrays}, for a discussion of how to +flatten an array and work with it. + +@item awk_bool_t release_flattened_array(awk_array_t a_cookie, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data); +When done with a flattened array, release the storage using this function. +You must pass in both the original array cookie, and the address of +the created @code{awk_flat_array_t} structure. +The function returns true upon success, false otherwise. +@end table + + +@node Flattening Arrays +@subsubsection Working With All The Elements of an Array + +To @dfn{flatten} an array is to create a structure that +represents the full array in a fashion that makes it easy +for C code to traverse the entire array. Test code +in @file{extension/testext.c} does this, and also serves +as a nice example to show how to use the APIs.
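Before reading the real test code, it may help to see the ``walk the flat array and flag elements'' idea in isolation. The following is only a minimal sketch under stated assumptions: the types @code{flat_element_t} and @code{flat_array_t}, the @code{ELEMENT_*} constants, and both functions are hypothetical stand-ins invented for this illustration. They are simplified models, not the definitions from @file{gawkapi.h}, and no @command{gawk} API function is called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-ins for illustration only.  These are NOT the
 * awk_element_t and awk_flat_array_t types from gawkapi.h; indices
 * and values are modeled as plain C strings to keep the sketch small.
 */
enum { ELEMENT_DEFAULT = 0, ELEMENT_DELETE = 1 };

typedef struct flat_element {
    int flags;          /* ELEMENT_DEFAULT, or ELEMENT_DELETE if flagged */
    const char *index;  /* element subscript */
    const char *value;  /* element value */
} flat_element_t;

typedef struct flat_array {
    size_t count;              /* how many elements */
    flat_element_t *elements;  /* count entries */
} flat_array_t;

/* mark_for_deletion: flag every element whose subscript equals index;
   return the number of elements flagged by this call */
static size_t
mark_for_deletion(flat_array_t *fa, const char *index)
{
    size_t i, marked = 0;

    assert(fa != NULL && index != NULL);
    for (i = 0; i < fa->count; i++) {
        if (strcmp(fa->elements[i].index, index) == 0) {
            fa->elements[i].flags |= ELEMENT_DELETE;
            marked++;
        }
    }
    return marked;
}

/* count_remaining: how many elements are not flagged for deletion */
static size_t
count_remaining(const flat_array_t *fa)
{
    size_t i, remaining = 0;

    assert(fa != NULL);
    for (i = 0; i < fa->count; i++) {
        if ((fa->elements[i].flags & ELEMENT_DELETE) == 0)
            remaining++;
    }
    return remaining;
}
```

The sketch only sets a flag during the traversal and leaves removal to a later step, which mirrors the division of labor described for @code{AWK_ELEMENT_DELETE}: the extension marks elements, and the actual deletion happens when the flattened array is released.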
+ +First, the @command{gawk} script that drives the test extension: + +@example +@@load "testext" +BEGIN @{ + n = split("blacky rusty sophie raincloud lucky", pets) + printf "pets has %d elements\n", length(pets) + ret = dump_array_and_delete("pets", "3") + printf "dump_array_and_delete(pets) returned %d\n", ret + if ("3" in pets) + printf("dump_array_and_delete() did NOT remove index \"3\"!\n") + else + printf("dump_array_and_delete() did remove index \"3\"!\n") + print "" +@} +@end example + +@noindent +This code creates an array with @code{split()} (@pxref{String Functions}) +and then calls @code{dump_array_and_delete()}. That function looks up +the array whose name is passed as the first argument, and +deletes the element at the index passed in the second argument. +The @command{awk} code then prints the return value and checks whether the element +was indeed deleted. Here is the C code that implements +@code{dump_array_and_delete()}. + +The first part declares variables, sets up the default +return value in @code{result}, and checks that the function +was called with the correct number of arguments: + +@example +static awk_value_t * +dump_array_and_delete(int nargs, awk_value_t *result) +@{ + awk_value_t value, value2, value3; + awk_flat_array_t *flat_array; + size_t count; + char *name; + int i; + + assert(result != NULL); + make_number(0.0, result); + + if (nargs != 2) @{ + printf("dump_array_and_delete: nargs not right (%d should be 2)\n", nargs); + goto out; + @} +@end example + +The function then proceeds in steps, as follows. First, retrieve +the name of the array, passed as the first argument. Then +retrieve the array itself. If either operation fails, print +error messages and return.
+ +@example + /* get argument named array as flat array and print it */ + if (get_argument(0, AWK_STRING, & value)) @{ + name = value.str_value.str; + if (sym_lookup(name, AWK_ARRAY, & value2)) + printf("dump_array_and_delete: sym_lookup of %s passed\n", name); + else @{ + printf("dump_array_and_delete: sym_lookup of %s failed\n", name); + goto out; + @} + @} else @{ + printf("dump_array_and_delete: get_argument(0) failed\n"); + goto out; + @} +@end example + +For testing purposes and to make sure that the C code sees +the same number of elements as the @command{awk} code, +the second step is to get the count of elements in the array +and print it: + +@example + if (! get_element_count(value2.array_cookie, & count)) @{ + printf("dump_array_and_delete: get_element_count failed\n"); + goto out; + @} + + printf("dump_array_and_delete: incoming size is %lu\n", (unsigned long) count); +@end example + +The third step is to actually flatten the array, and then +to double check that the count in the @code{awk_flat_array_t} +is the same as the count just retrieved. + +@example + if (! flatten_array(value2.array_cookie, & flat_array)) @{ + printf("dump_array_and_delete: could not flatten array\n"); + goto out; + @} + + if (flat_array->count != count) @{ + printf("dump_array_and_delete: flat_array->count (%lu) != count (%lu)\n", + (unsigned long) flat_array->count, + (unsigned long) count); + goto out; + @} +@end example + +The fourth step is to retrieve the index of the element +to be deleted, which was passed as the second argument. +Remember that argument counts passed to @code{get_argument()} +are zero-based, thus the second argument is numbered one. + +@example + if (! get_argument(1, AWK_STRING, & value3)) @{ + printf("dump_array_and_delete: get_argument(1) failed\n"); + goto out; + @} +@end example + +The fifth step is where the ``real work'' is done. The function +loops over every element in the array, printing the index and +element values.
In addition, upon finding the element with the +index that is supposed to be deleted, the function sets the +@code{AWK_ELEMENT_DELETE} bit in the @code{flags} field +of the element. When the array is released, @command{gawk} +traverses the flattened array, and deletes any elements that +have this flag bit set. + +@example + for (i = 0; i < flat_array->count; i++) @{ + printf("\t%s[\"%.*s\"] = %s\n", + name, + (int) flat_array->elements[i].index.str_value.len, + flat_array->elements[i].index.str_value.str, + valrep2str(& flat_array->elements[i].value)); + + if (strcmp(value3.str_value.str, flat_array->elements[i].index.str_value.str) == 0) @{ + flat_array->elements[i].flags |= AWK_ELEMENT_DELETE; + printf("dump_array_and_delete: marking element \"%s\" for deletion\n", + flat_array->elements[i].index.str_value.str); + @} + @} +@end example + +The sixth step is to release the flattened array. This tells +@command{gawk} that the extension is no longer using the array, +and that it should delete any elements marked for deletion. +@command{gawk} will also free any storage that was allocated, +so you should not use the pointer (@code{flat_array} in this +code) once you have called @code{release_flattened_array()}: + +@example + if (! release_flattened_array(value2.array_cookie, flat_array)) @{ + printf("dump_array_and_delete: could not release flattened array\n"); + goto out; + @} +@end example + +Finally, since everything was successful, the function sets the +return value to success, and returns.
+ +@example + make_number(1.0, result); +out: + return result; +@} +@end example + +Here is the output from running this part of the test: + +@example +pets has 5 elements +dump_array_and_delete: sym_lookup of pets passed +dump_array_and_delete: incoming size is 5 + pets["1"] = "blacky" + pets["2"] = "rusty" + pets["3"] = "sophie" +dump_array_and_delete: marking element "3" for deletion + pets["4"] = "raincloud" + pets["5"] = "lucky" +dump_array_and_delete(pets) returned 1 +dump_array_and_delete() did remove index "3"! +@end example + +@node Creating Arrays +@subsubsection How To Create and Populate Arrays + +Besides working with arrays created by @command{awk} code, you can +create arrays and populate them as you see fit, and then @command{awk} +code can access them and manipulate them. + +There are two important points about creating arrays from extension code: + +@enumerate 1 +@item +You must install a new array into @command{gawk}'s symbol +table immediately upon creating it. Once you have done so, +you can then populate the array. + +@ignore +Strictly speaking, this is required only +for arrays that will have subarrays as elements; however it is +a good idea to always do this. This restriction may be relaxed +in a subsequent revision of the API. +@end ignore + +Similarly, if installing a new array as a subarray of an existing array, +you must add the new array to its parent before adding any elements to it. + +Thus, the correct way to build an array is to work ``top down.'' Create +the array, and immediately install it in @command{gawk}'s symbol table +using @code{sym_update()}, or install it as an element in a previously +existing array using @code{set_array_element()}. Example code is coming shortly.
+ +@item +Due to @command{gawk} internals, after using @code{sym_update()} to install an array +into @command{gawk}, you have to retrieve the array cookie from the value +passed in to @code{sym_update()} before doing anything else with it, like so: + +@example +awk_value_t val; +awk_array_t new_array; + +new_array = create_array(); +val.val_type = AWK_ARRAY; +val.array_cookie = new_array; + +sym_update("array", & val); /* install array in the symbol table */ + +new_array = val.array_cookie; /* YOU MUST DO THIS */ +@end example + +If installing an array as a subarray, you must also retrieve the value +of the @code{array_cookie} after the call to @code{set_array_element()}. +@end enumerate + +The following C code is a simple test extension to create an array +with two regular elements and with a subarray. The leading @samp{#include} +directives and boilerplate variable declarations are omitted for brevity. +The first step is to create a new array and then install it +in the symbol table: + +@example +@ignore +#ifdef HAVE_CONFIG_H +#include <config.h> +#endif + +#include <stdio.h> +#include <assert.h> +#include <errno.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include <sys/types.h> +#include <sys/stat.h> + +#include "gawkapi.h" + +static const gawk_api_t *api; /* for convenience macros to work */ +static awk_ext_id_t *ext_id; +static const char *ext_version = "testarray extension: version 1.0"; + +int plugin_is_GPL_compatible; + +@end ignore +/* create_new_array --- create a named array */ + +static void +create_new_array() +@{ + awk_array_t a_cookie; + awk_array_t subarray; + awk_value_t index, value; + + a_cookie = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = a_cookie; + + if (!
sym_update("new_array", & value)) + printf("create_new_array: sym_update(\"new_array\") failed!\n"); + a_cookie = value.array_cookie; +@end example + +@noindent +Note how @code{a_cookie} is reset from the @code{array_cookie} field in +the @code{value} structure. + +The second step is to install two regular values into @code{new_array}: + +@example + (void) make_const_string("hello", 5, & index); + (void) make_const_string("world", 5, & value); + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array:%d: set_array_element failed\n", __LINE__); + return; + @} + + (void) make_const_string("answer", 6, & index); + (void) make_number(42.0, & value); + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array:%d: set_array_element failed\n", __LINE__); + return; + @} +@end example + +The third step is to create the subarray and install it: + +@example + (void) make_const_string("subarray", 8, & index); + subarray = create_array(); + value.val_type = AWK_ARRAY; + value.array_cookie = subarray; + if (! set_array_element(a_cookie, & index, & value)) @{ + printf("fill_in_array:%d: set_array_element failed\n", __LINE__); + return; + @} + subarray = value.array_cookie; +@end example + +The final step is to populate the subarray with its own element: + +@example + (void) make_const_string("foo", 3, & index); + (void) make_const_string("bar", 3, & value); + if (! 
set_array_element(subarray, & index, & value)) @{ + printf("fill_in_array:%d: set_array_element failed\n", __LINE__); + return; + @} +@} +@ignore +static awk_ext_func_t func_table[] = @{ + @{ NULL, NULL, 0 @} +@}; + +/* init_testarray --- additional initialization function */ + +static awk_bool_t init_testarray(void) +@{ + create_new_array(); + + return 1; +@} + +static awk_bool_t (*init_func)(void) = init_testarray; + +dl_load_func(func_table, testarray, "") +@end ignore +@end example + +Here is a sample script that loads the extension +and then dumps the array: + +@example +@@load "subarray" + +function dumparray(name, array, i) +@{ + for (i in array) + if (isarray(array[i])) + dumparray(name "[\"" i "\"]", array[i]) + else + printf("%s[\"%s\"] = %s\n", name, i, array[i]) +@} + +BEGIN @{ + dumparray("new_array", new_array); +@} +@end example + +Here is the result of running the script: + +@example +$ @kbd{AWKLIBPATH=$PWD ./gawk -f foo.awk} +@print{} new_array["subarray"]["foo"] = bar +@print{} new_array["hello"] = world +@print{} new_array["answer"] = 42 +@end example + +@node Extension API Variables +@subsection Variables + +The API provides two sets of variables. The first provides information +about the version of the API (both with which the extension was compiled, +and with which @command{gawk} was compiled). The second provides +information about how @command{gawk} was invoked. + +@menu +* Extension Versioning:: API Version information. +* Extension API Informational Variables:: Variables providing information about + @command{gawk}'s invocation. +@end menu + +@node Extension Versioning +@subsubsection API Version Constants and Variables + +The API provides both a ``major'' and a ``minor'' version number. +The API versions are available at compile time as constants: + +@table @code +@item GAWK_API_MAJOR_VERSION +The major version of the API. + +@item GAWK_API_MINOR_VERSION +The minor version of the API.
+@end table + +The minor version increases when new functions are added to the API. Such +new functions are always added to the end of the API @code{struct}. + +The major version increases (and the minor version is reset to zero) if any +of the data types change size or member order, or if any of the existing +functions change signature. + +It could happen that an extension may be compiled against one version +of the API but loaded by a version of @command{gawk} using a different +version. For this reason, the major and minor API versions of the +running @command{gawk} are included in the API @code{struct} as read-only +constant integers: + +@table @code +@item api->major_version +The major version of the running @command{gawk}. + +@item api->minor_version +The minor version of the running @command{gawk}. +@end table + +It is up to the extension to decide if there are API incompatibilities. +Typically a check like this is enough: + +@example +if (api->major_version != GAWK_API_MAJOR_VERSION + || api->minor_version < GAWK_API_MINOR_VERSION) @{ + fprintf(stderr, "foo_extension: version mismatch with gawk!\n"); + fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n", + GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION, + api->major_version, api->minor_version); + exit(1); +@} +@end example + +Such code is included in the boilerplate @code{dl_load_func} macro +provided in @file{gawkapi.h} (discussed later, in +@ref{Extension API Boilerplate}). + +@node Extension API Informational Variables +@subsubsection Informational Variables + +The API provides access to several variables that describe +whether the corresponding command-line options were enabled when +@command{gawk} was invoked. The variables are: + +@table @code +@item do_lint +This variable will be true if the @option{--lint} option was passed +(@pxref{Options}). + +@item do_traditional +This variable will be true if the @option{--traditional} option was passed. 
+ +@item do_profile +This variable will be true if the @option{--profile} option was passed. + +@item do_sandbox +This variable will be true if the @option{--sandbox} option was passed. + +@item do_debug +This variable will be true if the @option{--debug} option was passed. + +@item do_mpfr +This variable will be true if the @option{--bignum} option was passed. +@end table + +The value of @code{do_lint} can change if @command{awk} code +modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}). +The others should not change during execution. + +@node Extension API Boilerplate +@subsection Boilerplate Code + +As mentioned earlier (@pxref{Extension Mechanism Outline}), the function +definitions as presented are really macros. To use these macros, your +extension must provide a small amount of boilerplate code (variables and +functions) using pre-defined names as described below. The boilerplate +needed is also provided in comments in the @file{gawkapi.h} header file: + +@example +/* Boiler plate code: */ +int plugin_is_GPL_compatible; + +static gawk_api_t *const api; +static awk_ext_id_t ext_id; +static const char *ext_version = NULL; /* or @dots{} = "some string" */ + +static awk_ext_func_t func_table[] = @{ + @{ "name", do_name, 1 @}, + /* @dots{} */ +@}; + +/* EITHER: */ + +static awk_bool_t (*init_func)(void) = NULL; + +/* OR: */ + +static awk_bool_t +init_my_module(void) +@{ + @dots{} +@} + +static awk_bool_t (*init_func)(void) = init_my_module; + +dl_load_func(func_table, some_name, "name_space_in_quotes") +@end example + +These variables and functions are as follows: + +@table @code +@item int plugin_is_GPL_compatible; +This asserts that the extension is compatible with the GNU GPL (@pxref{Copying}). +If your extension does not have this, @command{gawk} will not load it. 
+ +@item static gawk_api_t *const api; +This global @code{static} variable should be set to point to +the @code{gawk_api_t} pointer that @command{gawk} passes to your +@code{dl_load()} function. +This variable is used by all of the macros. + +@item static awk_ext_id_t ext_id; +This global @code{static} variable should be set to +the @code{awk_ext_id_t} value that @command{gawk} passes to your +@code{dl_load()} function. +This variable is used by all of the macros. + +@item static const char *ext_version = NULL; /* or @dots{} = "some string" */ +This global @code{static} variable should be set either +to @code{NULL}, or to point to a string giving the name and version of +your extension. + +@item static awk_ext_func_t func_table[] = @{ @dots{} @}; +This is an array of one or more @code{awk_ext_func_t} structures +as described earlier (@pxref{Extension Functions}). +It can then be looped over for multiple calls to +@code{add_ext_func()}. + +@item static awk_bool_t (*init_func)(void) = NULL; +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @r{OR} +@itemx static awk_bool_t init_my_module(void) @{ @dots{} @} +@itemx static awk_bool_t (*init_func)(void) = init_my_module; +If you need to do some initialization work, you should define a +function that does it (creates variables, opens files, etc.) +and then define the @code{init_func} pointer to point to your +function. +The function should return zero (false) upon failure, non-zero +(true) if everything goes well. + +If you don't need to do any initialization, define the pointer and +initialize it to @code{NULL}. + +@item dl_load_func(func_table, some_name, "name_space_in_quotes") +This macro expands to a @code{dl_load()} function that performs +all the necessary initializations. +@end table + +The point of all the variables and arrays is to let the +@code{dl_load()} function do all the standard work. It does +the following: + +@enumerate 1 +@item +Check the API versions.
If the extension major version does not match +@command{gawk}'s, or if the extension minor version is greater than +@command{gawk}'s, it prints a fatal error message and exits. + +@item +Load the functions defined in @code{func_table}. +If any of them fails to load, it prints a warning message but +continues on. + +@item +If the @code{init_func} pointer is not @code{NULL}, call the +function it points to. If it returns zero (false), print a +warning message. + +@item +If @code{ext_version} is not @code{NULL}, register +the version string with @command{gawk}. +@end enumerate + +@node Finding Extensions +@subsection How @command{gawk} Finds Extensions + +Compiled extensions have to be installed in a directory where +@command{gawk} can find them. If @command{gawk} is configured and +built in the default fashion, the default directory in which to find +extensions is @file{/usr/local/lib/gawk}. You can also specify a search +path with a list of directories to search for compiled extensions. +@xref{AWKLIBPATH Variable}, for more information. + +@node Extension Example +@section Example: Some File Functions + +@quotation +@i{No matter where you go, there you are.} @* +Buckaroo Banzai +@end quotation + +@c It's enough to show chdir and stat, no need for fts + +Two useful functions that are not in @command{awk} are @code{chdir()} (so +that an @command{awk} program can change its directory) and @code{stat()} +(so that an @command{awk} program can gather information about a file). +This @value{SECTION} implements these functions for @command{gawk} +in an external extension. + +@menu +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension.
+@end menu + +@node Internal File Description +@subsection Using @code{chdir()} and @code{stat()} + +This @value{SECTION} shows how to use the new functions at +the @command{awk} level once they've been integrated into the +running @command{gawk} interpreter. Using @code{chdir()} is very +straightforward. It takes one argument, the new directory to change to: + +@example +@@load "filefuncs" +@dots{} +newdir = "/home/arnold/funstuff" +ret = chdir(newdir) +if (ret < 0) @{ + printf("could not change to %s: %s\n", + newdir, ERRNO) > "/dev/stderr" + exit 1 +@} +@dots{} +@end example + +The return value is negative if the @code{chdir()} failed, and +@code{ERRNO} (@pxref{Built-in Variables}) is set to a string indicating +the error. + +Using @code{stat()} is a bit more complicated. The C @code{stat()} +function fills in a structure that has a fair amount of information. +The right way to model this in @command{awk} is to fill in an associative +array with the appropriate information: + +@c broke printf for page breaking +@example +file = "/home/arnold/.profile" +ret = stat(file, fdata) +if (ret < 0) @{ + printf("could not stat %s: %s\n", + file, ERRNO) > "/dev/stderr" + exit 1 +@} +printf("size of %s is %d bytes\n", file, fdata["size"]) +@end example + +The @code{stat()} function always clears the data array, even if +the @code{stat()} fails. It fills in the following elements: + +@table @code +@item "name" +The name of the file that was @code{stat()}'ed. + +@item "dev" +@itemx "ino" +The file's device and inode numbers, respectively. + +@item "mode" +The file's mode, as a numeric value. This includes both the file's +type and its permissions. + +@item "nlink" +The number of hard links (directory entries) the file has. + +@item "uid" +@itemx "gid" +The numeric user and group ID numbers of the file's owner. + +@item "size" +The size in bytes of the file. + +@item "blocks" +The number of disk blocks the file actually occupies. 
This may not +be a function of the file's size if the file has holes. + +@item "atime" +@itemx "mtime" +@itemx "ctime" +The file's last access, modification, and inode update times, +respectively. These are numeric timestamps, suitable for formatting +with @code{strftime()} +(@pxref{Built-in}). + +@item "pmode" +The file's ``printable mode.'' This is a string representation of +the file's type and permissions, such as what is produced by +@samp{ls -l}---for example, @code{"drwxr-xr-x"}. + +@item "type" +A printable string representation of the file's type. The value +is one of the following: + +@table @code +@item "blockdev" +@itemx "chardev" +The file is a block or character device (``special file''). + +@ignore +@item "door" +The file is a Solaris ``door'' (special file used for +interprocess communications). +@end ignore + +@item "directory" +The file is a directory. + +@item "fifo" +The file is a named-pipe (also known as a FIFO). + +@item "file" +The file is just a regular file. + +@item "socket" +The file is an @code{AF_UNIX} (``Unix domain'') socket in the +filesystem. + +@item "symlink" +The file is a symbolic link. +@end table +@end table + +Several additional elements may be present depending upon the operating +system and the type of the file. You can test for them in your @command{awk} +program by using the @code{in} operator +(@pxref{Reference to Elements}): + +@table @code +@item "blksize" +The preferred block size for I/O to the file. This field is not +present on all POSIX-like systems in the C @code{stat} structure. + +@item "linkval" +If the file is a symbolic link, this element is the name of the +file the link points to (i.e., the value of the link). + +@item "rdev" +@itemx "major" +@itemx "minor" +If the file is a block or character device file, then these values +represent the numeric device number and the major and minor components +of that number, respectively. 
+@end table + +@node Internal File Ops +@subsection C Code for @code{chdir()} and @code{stat()} + +Here is the C code for these extensions.@footnote{This version is +edited slightly for presentation. See @file{extension/filefuncs.c} +in the @command{gawk} distribution for the complete version.} + +The file includes a number of standard header files, and then includes +the @code{"gawkapi.h"} header file which provides the API definitions. +Those are followed by the necessary variable declarations +to make use of the API macros and boilerplate code +(@pxref{Extension API Boilerplate}). + +@c break line for page breaking +@example +#ifdef HAVE_CONFIG_H +#include <config.h> +#endif + +#include <stdio.h> +#include <assert.h> +#include <errno.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include <sys/types.h> +#include <sys/stat.h> + +#include "gawkapi.h" + +#include "gettext.h" +#define _(msgid) gettext(msgid) +#define N_(msgid) msgid + +#include "gawkfts.h" +#include "stack.h" + +static const gawk_api_t *api; /* for convenience macros to work */ +static awk_ext_id_t *ext_id; +static awk_bool_t init_filefuncs(void); +static awk_bool_t (*init_func)(void) = init_filefuncs; +static const char *ext_version = "filefuncs extension: version 1.0"; + +int plugin_is_GPL_compatible; +@end example + +@cindex programming conventions, @command{gawk} internals +By convention, for an @command{awk} function @code{foo()}, the function that +implements it is called @samp{do_foo()}. The function should have two +arguments: the first is an +@samp{int}, usually called @code{nargs}, that +represents the number of defined arguments for the function. +The second is a pointer to an @code{awk_value_t}, usually named +@code{result}.
+ +@example +/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ + +static awk_value_t * +do_chdir(int nargs, awk_value_t *result) +@{ + awk_value_t newdir; + int ret = -1; + + assert(result != NULL); + + if (do_lint && nargs != 1) + lintwarn(ext_id, _("chdir: called with incorrect number of arguments, expecting 1")); +@end example + +The @code{newdir} +variable represents the new directory to change to, retrieved +with @code{get_argument()}. Note that the first argument is +numbered zero. + +If the argument is retrieved successfully, the function calls the +@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} +is updated. + +@example + if (get_argument(0, AWK_STRING, & newdir)) @{ + ret = chdir(newdir.str_value.str); + if (ret < 0) + update_ERRNO_int(errno); + @} +@end example + +Finally, the function returns the return value to the @command{awk} level: + +@example + return make_number(ret, result); +@} +@end example + +The @code{stat()} built-in is more involved. First comes a function +that turns a numeric mode into a printable representation +(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity: + +@c break line for page breaking +@example +/* format_mode --- turn a stat mode field into something readable */ + +static char * +format_mode(unsigned long fmode) +@{ + @dots{} +@} +@end example + +Next comes a function for reading symbolic links, which is also +omitted here for brevity: + +@example +/* read_symlink --- read a symbolic link into an allocated buffer. 
+ @dots{} */ + +static char * +read_symlink(const char *fname, size_t bufsize, ssize_t *linksize) +@{ + @dots{} +@} +@end example + +Two helper functions simplify entering values in the +array that will contain the result of the @code{stat()}: + +@example +/* array_set --- set an array element */ + +static void +array_set(awk_array_t array, const char *sub, awk_value_t *value) +@{ + awk_value_t index; + + set_array_element(array, + make_const_string(sub, strlen(sub), & index), + value); + +@} + +/* array_set_numeric --- set an array element with a number */ + +static void +array_set_numeric(awk_array_t array, const char *sub, double num) +@{ + awk_value_t tmp; + + array_set(array, sub, make_number(num, & tmp)); +@} +@end example + +The following function does most of the work to fill in +the @code{awk_array_t} result array with values obtained +from a valid @code{struct stat}. It is done in a separate function +to support the @code{stat()} function for @command{gawk} and also +to support the @code{fts()} extension which is included in +the same file but whose code is not shown here +(@pxref{Extension Sample File Functions}). 
+ +The first part of the function is variable declarations, +including a table to map file types to strings: + +@example +/* fill_stat_array --- do the work to fill an array with stat info */ + +static int +fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf) +@{ + char *pmode; /* printable mode */ + const char *type = "unknown"; + awk_value_t tmp; + static struct ftype_map @{ + unsigned int mask; + const char *type; + @} ftype_map[] = @{ + @{ S_IFREG, "file" @}, + @{ S_IFBLK, "blockdev" @}, + @{ S_IFCHR, "chardev" @}, + @{ S_IFDIR, "directory" @}, +#ifdef S_IFSOCK + @{ S_IFSOCK, "socket" @}, +#endif +#ifdef S_IFIFO + @{ S_IFIFO, "fifo" @}, +#endif +#ifdef S_IFLNK + @{ S_IFLNK, "symlink" @}, +#endif +#ifdef S_IFDOOR /* Solaris weirdness */ + @{ S_IFDOOR, "door" @}, +#endif /* S_IFDOOR */ + @}; + int j, k; +@end example + +The destination array is cleared, and then code fills in +various elements based on values in the @code{struct stat}: + +@example + /* empty out the array */ + clear_array(array); + + /* fill in the array */ + array_set(array, "name", make_const_string(name, strlen(name), & tmp)); + array_set_numeric(array, "dev", sbuf->st_dev); + array_set_numeric(array, "ino", sbuf->st_ino); + array_set_numeric(array, "mode", sbuf->st_mode); + array_set_numeric(array, "nlink", sbuf->st_nlink); + array_set_numeric(array, "uid", sbuf->st_uid); + array_set_numeric(array, "gid", sbuf->st_gid); + array_set_numeric(array, "size", sbuf->st_size); + array_set_numeric(array, "blocks", sbuf->st_blocks); + array_set_numeric(array, "atime", sbuf->st_atime); + array_set_numeric(array, "mtime", sbuf->st_mtime); + array_set_numeric(array, "ctime", sbuf->st_ctime); + + /* for block and character devices, add rdev, major and minor numbers */ + if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) @{ + array_set_numeric(array, "rdev", sbuf->st_rdev); + array_set_numeric(array, "major", major(sbuf->st_rdev)); + array_set_numeric(array, "minor", 
minor(sbuf->st_rdev)); + @} +@end example + +@noindent +The latter part of the function makes selective additions +to the destination array, depending upon the availability of +certain members and/or the type of the file. At the end, it returns zero, +for success: + +@example +#ifdef HAVE_ST_BLKSIZE + array_set_numeric(array, "blksize", sbuf->st_blksize); +#endif /* HAVE_ST_BLKSIZE */ + + pmode = format_mode(sbuf->st_mode); + array_set(array, "pmode", make_const_string(pmode, strlen(pmode), & tmp)); + + /* for symbolic links, add a linkval field */ + if (S_ISLNK(sbuf->st_mode)) @{ + char *buf; + ssize_t linksize; + + if ((buf = read_symlink(name, sbuf->st_size, + & linksize)) != NULL) + array_set(array, "linkval", make_malloced_string(buf, linksize, & tmp)); + else + warning(ext_id, _("stat: unable to read symbolic link `%s'"), name); + @} + + /* add a type field */ + type = "unknown"; /* shouldn't happen */ + for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) @{ + if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) @{ + type = ftype_map[j].type; + break; + @} + @} + + array_set(array, "type", make_const_string(type, strlen(type), &tmp)); + + return 0; +@} +@end example + +Finally, here is the @code{do_stat()} function. It starts with +variable declarations and argument checking: + +@ignore +Changed message for page breaking. Used to be: + "stat: called with incorrect number of arguments (%d), should be 2", +@end ignore +@example +/* do_stat --- provide a stat() function for gawk */ + +static awk_value_t * +do_stat(int nargs, awk_value_t *result) +@{ + awk_value_t file_param, array_param; + char *name; + awk_array_t array; + int ret; + struct stat sbuf; + + assert(result != NULL); + + if (do_lint && nargs != 2) @{ + lintwarn(ext_id, _("stat: called with wrong number of arguments")); + return make_number(-1, result); + @} +@end example + +Then comes the actual work. First, the function gets the arguments. +Next, it gets the information for the file.
+The code uses @code{lstat()} (instead of @code{stat()}) +to get the file information, +in case the file is a symbolic link. +If there's an error, it sets @code{ERRNO} and returns: + +@example + /* file is first arg, array to hold results is second */ + if ( ! get_argument(0, AWK_STRING, & file_param) + || ! get_argument(1, AWK_ARRAY, & array_param)) @{ + warning(ext_id, _("stat: bad parameters")); + return make_number(-1, result); + @} + + name = file_param.str_value.str; + array = array_param.array_cookie; + + /* always empty out the array */ + clear_array(array); + + /* lstat the file, if error, set ERRNO and return */ + ret = lstat(name, & sbuf); + if (ret < 0) @{ + update_ERRNO_int(errno); + return make_number(ret, result); + @} +@end example + +The tedious work is done by @code{fill_stat_array()}, shown +earlier. When done, the function returns the result from @code{fill_stat_array()}: + +@example + ret = fill_stat_array(name, array, & sbuf); + + return make_number(ret, result); +@} +@end example + +@cindex programming conventions, @command{gawk} internals +Finally, it's necessary to provide the ``glue'' that loads the +new function(s) into @command{gawk}. + +The @samp{filefuncs} extension also provides an @code{fts()} +function, which we omit here. For its sake there is an initialization +function: + +@example +/* init_filefuncs --- initialization routine */ + +static awk_bool_t +init_filefuncs(void) +@{ + @dots{} +@} +@end example + +We are almost done. We need an array of @code{awk_ext_func_t} +structures for loading each function into @command{gawk}: + +@example +static awk_ext_func_t func_table[] = @{ + @{ "chdir", do_chdir, 1 @}, + @{ "stat", do_stat, 2 @}, + @{ "fts", do_fts, 3 @}, +@}; +@end example + +Each extension must have a routine named @code{dl_load()} to load +everything that needs to be loaded.
The simplest way is to use the +@code{dl_load_func} macro in @code{gawkapi.h}: + +@example +/* define the dl_load function using the boilerplate macro */ + +dl_load_func(func_table, filefuncs, "") +@end example + +And that's it! As an exercise, consider adding functions to +implement system calls such as @code{chown()}, @code{chmod()}, +and @code{umask()}. + +@node Using Internal File Ops +@subsection Integrating the Extensions + +@cindex @command{gawk}, interpreter@comma{} adding code to +Now that the code is written, it must be possible to add it at +runtime to the running @command{gawk} interpreter. First, the +code must be compiled. Assuming that the functions are in +a file named @file{filefuncs.c}, and @var{idir} is the location +of the @file{gawkapi.h} header file, +the following steps@footnote{In practice, you would probably want to +use the GNU Autotools---Automake, Autoconf, Libtool, and Gettext---to +configure and build your libraries. Instructions for doing so are beyond +the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to +the tools.} create a GNU/Linux shared library: + +@example +$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} +$ @kbd{ld -o filefuncs.so -shared filefuncs.o -lc} +@end example + +Once the library exists, it is loaded by using the @code{@@load} keyword. 
+ +@example +# file testff.awk +@@load "filefuncs" + +BEGIN @{ + "pwd" | getline curdir # save current directory + close("pwd") + + chdir("/tmp") + system("pwd") # test it + chdir(curdir) # go back + + print "Info for testff.awk" + ret = stat("testff.awk", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "testff.awk modified:", + strftime("%m %d %y %H:%M:%S", data["mtime"]) + + print "\nInfo for JUNK" + ret = stat("JUNK", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) +@} +@end example + +The @env{AWKLIBPATH} environment variable tells +@command{gawk} where to find shared libraries (@pxref{Finding Extensions}). +We set it to the current directory and run the program: + +@example +$ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk} +@print{} /tmp +@print{} Info for testff.awk +@print{} ret = 0 +@print{} data["blksize"] = 4096 +@print{} data["mtime"] = 1350838628 +@print{} data["mode"] = 33204 +@print{} data["type"] = file +@print{} data["dev"] = 2053 +@print{} data["gid"] = 1000 +@print{} data["ino"] = 1719496 +@print{} data["ctime"] = 1350838628 +@print{} data["blocks"] = 8 +@print{} data["nlink"] = 1 +@print{} data["name"] = testff.awk +@print{} data["atime"] = 1350838632 +@print{} data["pmode"] = -rw-rw-r-- +@print{} data["size"] = 662 +@print{} data["uid"] = 1000 +@print{} testff.awk modified: 10 21 12 18:57:08 +@print{} +@print{} Info for JUNK +@print{} ret = -1 +@print{} JUNK modified: 01 01 70 02:00:00 +@end example + +@node Extension Samples +@section The Sample Extensions in the @command{gawk} Distribution + +This @value{SECTION} provides brief overviews of the sample extensions +that come in the @command{gawk} distribution. Some of them are intended +for production use, such as the @code{filefuncs} and @code{readdir} extensions. +Others mainly provide example code that shows how to use the extension API.
+ +@menu +* Extension Sample File Functions:: The file functions sample. +* Extension Sample Fnmatch:: An interface to @code{fnmatch()}. +* Extension Sample Fork:: An interface to @code{fork()} and other + process functions. +* Extension Sample Ord:: Character to value to character + conversions. +* Extension Sample Readdir:: An interface to @code{readdir()}. +* Extension Sample Revout:: Reversing output sample output wrapper. +* Extension Sample Rev2way:: Reversing data sample two-way processor. +* Extension Sample Read write array:: Serializing an array to a file. +* Extension Sample Readfile:: Reading an entire file into a string. +* Extension Sample API Tests:: Tests for the API. +* Extension Sample Time:: An interface to @code{gettimeofday()} + and @code{sleep()}. +@end menu + +@node Extension Sample File Functions +@subsection File Related Functions + +The @code{filefuncs} extension provides three different functions. +The usage is as follows: + +@table @code +@item @@load "filefuncs" +This is how you load the extension. + +@item result = chdir("/some/directory") +The @code{chdir()} function is a direct hook to the @code{chdir()} +system call to change the current directory. It returns zero +upon success or less than zero upon error. In the latter case it updates +@code{ERRNO}. + +@item result = stat("/some/path", statdata) +The @code{stat()} function provides a hook into the +@code{stat()} system call. In fact, it uses @code{lstat()}. +It returns zero upon success or less than zero upon error. +In the latter case it updates @code{ERRNO}. + +In all cases, it clears the @code{statdata} array. +When the call is successful, @code{stat()} fills the @code{statdata} +array with information retrieved from the filesystem, as follows: + +@c nested table +@table @code +@item statdata["name"] +The name of the file. + +@item statdata["dev"] +Corresponds to the @code{st_dev} field in the @code{struct stat}.
+ +@item statdata["ino"] +Corresponds to the @code{st_ino} field in the @code{struct stat}. + +@item statdata["mode"] +Corresponds to the @code{st_mode} field in the @code{struct stat}. + +@item statdata["nlink"] +Corresponds to the @code{st_nlink} field in the @code{struct stat}. + +@item statdata["uid"] +Corresponds to the @code{st_uid} field in the @code{struct stat}. + +@item statdata["gid"] +Corresponds to the @code{st_gid} field in the @code{struct stat}. + +@item statdata["size"] +Corresponds to the @code{st_size} field in the @code{struct stat}. + +@item statdata["atime"] +Corresponds to the @code{st_atime} field in the @code{struct stat}. + +@item statdata["mtime"] +Corresponds to the @code{st_mtime} field in the @code{struct stat}. + +@item statdata["ctime"] +Corresponds to the @code{st_ctime} field in the @code{struct stat}. + +@item statdata["rdev"] +Corresponds to the @code{st_rdev} field in the @code{struct stat}. +This element is only present for device files. + +@item statdata["major"] +The major component of the device number in the @code{st_rdev} field. +This element is only present for device files. + +@item statdata["minor"] +The minor component of the device number in the @code{st_rdev} field. +This element is only present for device files. + +@item statdata["blksize"] +Corresponds to the @code{st_blksize} field in the @code{struct stat}, +if this field is present on your system. +(It is present on all modern systems that we know of.) + +@item statdata["pmode"] +A human-readable version of the mode value, such as printed by +@command{ls}. For example, @code{"-rwxr-xr-x"}. + +@item statdata["linkval"] +If the named file is a symbolic link, this element will exist +and its value is the value of the symbolic link (where the +symbolic link points to). + +@item statdata["type"] +The type of the file as a string.
One of +@code{"file"}, +@code{"blockdev"}, +@code{"chardev"}, +@code{"directory"}, +@code{"socket"}, +@code{"fifo"}, +@code{"symlink"}, +@code{"door"}, +or +@code{"unknown"}. +Not all systems support all file types. +@end table + +@item flags = or(FTS_PHYSICAL, ...) +@itemx result = fts(pathlist, flags, filedata) +Walk the file trees provided in @code{pathlist} and +fill in the @code{filedata} array as described below. +@code{flags} is the bitwise OR of several predefined constant +values, also as described below. +@end table + +The @code{fts()} function provides a hook to the @code{fts()} set of +routines for traversing file hierarchies. Instead of returning data +about one file at a time in a stream, it fills in a multi-dimensional +array with data about each file and directory encountered in the requested +hierarchies. + +The arguments are as follows: + +@table @code +@item pathlist +An array of filenames. The element values are used; the index values are ignored. + +@item flags +This should be the bitwise OR of one or more of the following +predefined constant flag values. At least one of +@code{FTS_LOGICAL} or @code{FTS_PHYSICAL} must be provided; otherwise +@code{fts()} returns an error value and sets @code{ERRNO}. +The flags are: + +@c nested table +@table @code +@item FTS_LOGICAL +Do a ``logical'' file traversal, where the information returned for +a symbolic link refers to the linked-to file, and not to the symbolic +link itself. This flag is mutually exclusive with @code{FTS_PHYSICAL}. + +@item FTS_PHYSICAL +Do a ``physical'' file traversal, where the information returned for a +symbolic link refers to the symbolic link itself. This flag is mutually +exclusive with @code{FTS_LOGICAL}. + +@item FTS_NOCHDIR +As a performance optimization, the C library @code{fts()} routines +change directory as they traverse a file hierarchy. This flag disables +that optimization.
+ +@item FTS_COMFOLLOW +Immediately follow a symbolic link named in @code{pathlist}, +whether or not @code{FTS_LOGICAL} is set. + +@item FTS_SEEDOT +By default, the @code{fts()} routines do not return entries for @file{.} +and @file{..}. This option causes entries for @file{..} to also +be included. (This extension always includes an entry for @file{.}, +see below.) + +@item FTS_XDEV +During a traversal, do not cross onto a different mounted filesystem. +@end table + +@item filedata +The @code{filedata} array is first cleared. Then, @code{fts()} creates +an element in @code{filedata} for every element in @code{pathlist}. +The index is the name of the directory or file given in @code{pathlist}. +The element for this index is itself an array. There are two cases. + +@c nested table +@table @asis +@item The path is a file. +In this case, the array contains two or three elements: + +@c doubly nested table +@table @code +@item "path" +The full path to this file, starting from the ``root'' that was given +in the @code{pathlist} array. + +@item "stat" +This element is itself an array, containing the same information as provided +by the @code{stat()} function described earlier for its +@code{statdata} argument. The element may not be present if +the @code{stat()} system call for the file failed. + +@item "error" +If some kind of error was encountered, the array will also +contain an element named @code{"error"}, which is a string describing the error. +@end table + +@item The path is a directory. +In this case, the array contains one element for each entry in the +directory. If an entry is a file, that element is as for files, just +described. If the entry is a directory, that element is (recursively), +an array describing the subdirectory. If @code{FTS_SEEDOT} was provided +in the flags, then there will also be an element named @code{".."}. This +element will be an array containing the data as provided by @code{stat()}.
+ +In addition, there will be an element whose index is @code{"."}. +This element is an array containing the same two or three elements as +for a file: @code{"path"}, @code{"stat"}, and @code{"error"}. +@end table +@end table + +The @code{fts()} function returns 0 if there were no errors. Otherwise +it returns @minus{}1. + +@quotation NOTE +The @code{fts()} extension does not exactly mimic the +interface of the C library @code{fts()} routines, choosing instead to +provide an interface that is based on associative arrays, which should +be more comfortable to use from an @command{awk} program. This includes the +lack of a comparison function, since @command{gawk} already provides +powerful array sorting facilities. While an @code{fts_read()}-like +interface could have been provided, this felt less natural than simply +creating a multi-dimensional array to represent the file hierarchy and +its information. +@end quotation + +See @file{test/fts.awk} in the @command{gawk} distribution for an example. + +@node Extension Sample Fnmatch +@subsection Interface To @code{fnmatch()} + +This extension provides an interface to the C library +@code{fnmatch()} function. The usage is: + +@example +@@load "fnmatch" + +result = fnmatch(pattern, string, flags) +@end example + +The @code{fnmatch} extension adds a single function named +@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of +flag values named @code{FNM}. + +The arguments to @code{fnmatch()} are: + +@table @code +@item pattern +The filename wildcard to match. + +@item string +The filename string. + +@item flags +Either zero, or the bitwise OR of one or more of the +flags in the @code{FNM} array. +@end table + +The return value is zero on success, @code{FNM_NOMATCH} +if the string did not match the pattern, or +a different non-zero value if an error occurred. + +The flags are as follows: + +@table @code +@item FNM["CASEFOLD"] +Corresponds to the @code{FNM_CASEFOLD} flag as defined in @code{fnmatch()}.
+ +@item FNM["FILE_NAME"] +Corresponds to the @code{FNM_FILE_NAME} flag as defined in @code{fnmatch()}. + +@item FNM["LEADING_DIR"] +Corresponds to the @code{FNM_LEADING_DIR} flag as defined in @code{fnmatch()}. + +@item FNM["NOESCAPE"] +Corresponds to the @code{FNM_NOESCAPE} flag as defined in @code{fnmatch()}. + +@item FNM["PATHNAME"] +Corresponds to the @code{FNM_PATHNAME} flag as defined in @code{fnmatch()}. + +@item FNM["PERIOD"] +Corresponds to the @code{FNM_PERIOD} flag as defined in @code{fnmatch()}. +@end table + +Here is an example: + +@example +@@load "fnmatch" +@dots{} +flags = or(FNM["PERIOD"], FNM["NOESCAPE"]) +if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH) + print "no match" +@end example + +@node Extension Sample Fork +@subsection Interface to @code{fork()}, @code{wait()} and @code{waitpid()} + +The @code{fork} extension adds three functions, as follows. + +@table @code +@item @@load "fork" +This is how you load the extension. + +@item pid = fork() +This function creates a new process. The return value is zero in the +child and the process-id number of the child in the parent, or @minus{}1 +upon error. In the latter case, @code{ERRNO} indicates the problem. +In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are +updated to reflect the correct values. + +@item ret = waitpid(pid) +This function takes a numeric argument, which is the process-id to +wait for. The return value is that of the +@code{waitpid()} system call. + +@item ret = wait() +This function waits for the first child to die. +The return value is that of the +@code{wait()} system call. +@end table + +There is no corresponding @code{exec()} function.
+ +Here is an example: + +@example +@@load "fork" +@dots{} +if ((pid = fork()) == 0) + print "hello from the child" +else + print "hello from the parent" +@end example + +@node Extension Sample Ord +@subsection Character and Numeric values: @code{ord()} and @code{chr()} + +The @code{ordchr} extension adds two functions, named +@code{ord()} and @code{chr()}, as follows. + +@table @code +@item number = ord(string) +This function takes a string argument, and returns the +numeric value of the first character in the string. + +@item char = chr(number) +This function takes a numeric argument and returns a string +whose first character is that represented by the number. +@end table + +These functions are inspired by the Pascal language functions +of the same name. + +Here is an example: + +@example +@@load "ordchr" +@dots{} +printf("The numeric value of 'A' is %d\n", ord("A")) +printf("The string value of 65 is %s\n", chr(65)) +@end example + +@node Extension Sample Readdir +@subsection Reading Directories + +The @code{readdir} extension adds an input parser for directories, and +adds a single function named @code{readdir_do_ftype()}. +The usage is as follows: + +@example +@@load "readdir" + +readdir_do_ftype("stat") # or "dirent" or "never" +@end example + +When this extension is in use, instead of skipping directories named +on the command line (or with @code{getline}), +they are read, with each entry returned as a record. + +The record consists of at least two fields: the inode number and the +filename, separated by a forward slash character. +On systems where the directory entry contains the file type, the record +has a third field which is a single letter indicating the type of the +file: +@samp{f} +for file, +@samp{d} +for directory, +@samp{b} +for a block device, +@samp{c} +for a character device, +@samp{p} +for a FIFO, +@samp{l} +for a symbolic link, +@samp{s} +for a socket, and +@samp{u} +(unknown) for anything else. 
+ +On systems without the file type information, calling +@samp{readdir_do_ftype("stat")} causes the extension to use the +@code{lstat()} system call to retrieve the appropriate information. This +is not the default, since @code{lstat()} is a potentially expensive +operation. By calling @samp{readdir_do_ftype("never")} one can ensure +that the file type information is never displayed, even when readily +available in the directory entry. + +The third option, @samp{readdir_do_ftype("dirent")}, takes file type +information from the directory entry, if it is available. This is the +default on systems that supply this information. + +The @code{readdir_do_ftype()} function sets @code{ERRNO} if called +without arguments or with invalid arguments. + +@quotation NOTE +On GNU/Linux systems, there are filesystems that don't support the +@code{d_type} entry (see the @i{readdir}(3) manual page), and so the file +type is always @code{u}. Therefore, using @samp{readdir_do_ftype("stat")} +is advisable even on GNU/Linux systems. In this case, the @code{readdir} +extension will fall back to using @code{lstat()} when it encounters an +unknown file type. +@end quotation + +Here is an example: + +@example +@@load "readdir" +@dots{} +BEGIN @{ FS = "/" @} +@{ print "file name is", $2 @} +@end example + +@node Extension Sample Revout +@subsection Reversing Output + +The @code{revoutput} extension adds a simple output wrapper that reverses +the characters in each output line. Its main purpose is to show how to +write an output wrapper, although it may be mildly amusing for the unwary. + +@example +@@load "revoutput" + +BEGIN @{ + REVOUT = 1 + print "hello, world" > "/dev/stdout" +@} +@end example + +The output from this program is: +@samp{dlrow ,olleh}. + +@node Extension Sample Rev2way +@subsection Two-Way I/O Example + +The @code{revtwoway} extension adds a simple two-way processor that +reverses the characters in each line sent to it for reading back by +the @command{awk} program.
Its main purpose is to show how to write +a two-way extension, although it may also be mildly amusing. + +The following example shows how to use it: + +@example +@@load "revtwoway" + +BEGIN @{ + cmd = "/magic/mirror" + print "hello, world" |& cmd + cmd |& getline result + print result + close(cmd) +@} +@end example + +@node Extension Sample Read write array +@subsection Dumping and Restoring An Array + +The @code{rwarray} extension adds two functions, +named @code{writea()} and @code{reada()}, as follows: + +@table @code +@item ret = writea(file, array) +This function takes a string argument, which is the name of the file +to which to dump the array, and the array itself as the second argument. +@code{writea()} understands multidimensional arrays. It returns 1 on +success, 0 on failure. + +@item ret = reada(file, array) +@code{reada()} is the inverse of @code{writea()}; +it reads the file named as its first argument, filling in +the array named as the second argument. It clears the array first. +Here too, the return value is 1 on success and 0 on failure. +@end table + +The array created by @code{reada()} is identical to that written by +@code{writea()} in the sense that the contents are the same. However, +due to implementation issues, the array traversal order of the recreated +array will likely be different from that of the original array. As array +traversal order in @command{awk} is by default undefined, this is not +(technically) a problem. If you need to guarantee a particular traversal +order, use the array sorting features in @command{gawk} to do so. + +The file contains binary data. All integral values are written in network +byte order. However, double precision floating-point values are written +as native binary data. Thus, arrays containing only string data can +theoretically be dumped on systems with one byte order and restored on +systems with a different one, but this has not been tried.
+ +Here is an example: + +@example +@@load "rwarray" +@dots{} +ret = writea("arraydump.bin", array) +@dots{} +ret = reada("arraydump.bin", array) +@end example + +@node Extension Sample Readfile +@subsection Reading An Entire File + +The @code{readfile} extension adds a single function +named @code{readfile()}: + +@table @code +@item result = readfile("/some/path") +The argument is the name of the file to read. The return value is a +string containing the entire contents of the requested file. Upon error, +the function returns the empty string and sets @code{ERRNO}. +@end table + +Here is an example: + +@example +@@load "readfile" +@dots{} +contents = readfile("/path/to/file"); +if (contents == "" && ERRNO != "") @{ + print("problem reading file", ERRNO) > "/dev/stderr" + ... +@} +@end example + +@node Extension Sample API Tests +@subsection API Tests + +The @code{testext} extension exercises parts of the extension API that +are not tested by the other samples. The @file{extension/testext.c} +file contains both the C code for the extension and @command{awk} +test code inside C comments that run the tests. The testing framework +extracts the @command{awk} code and runs the tests. See the source file +for more information. + +@node Extension Sample Time +@subsection Time Functions + +@cindex time +@cindex sleep + +These functions can be used by either invoking @command{gawk} +with a command-line argument of @option{-l time} or by +inserting @code{@@load "time"} in your script. + +@table @code + +@cindex @code{gettimeofday} time extension function +@item gettimeofday() +This function returns the time that has elapsed since 1970-01-01 UTC +as a floating point value. It should have sub-second precision, but +the actual precision will vary based on the platform. If the time +is unavailable on this platform, it returns @minus{}1 and sets @code{ERRNO}. +If the standard C @code{gettimeofday()} system call is available on this platform, +then it simply returns the value. 
Otherwise, if on Windows, +it tries to use @code{GetSystemTimeAsFileTime()}. + +@cindex @code{sleep} time extension function +@item sleep(@var{seconds}) +This function attempts to sleep for @var{seconds} seconds. +Note that @var{seconds} may be a floating-point (non-integral) value. +If @var{seconds} is negative, or the attempt to sleep fails, +then it returns @minus{}1 and sets @code{ERRNO}. +Otherwise, the function should return 0 after sleeping +for the indicated amount of time. Implementation details: depending +on platform availability, it tries to use @code{nanosleep()} or @code{select()} +to implement the delay. + +@end table + +@node gawkextlib +@section The @code{gawkextlib} Project + +The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}} +project provides a number of @command{gawk} extensions, including one for +processing XML files. This is the evolution of the original @command{xgawk} +(XML @command{gawk}) project. + +As of this writing, there are four extensions: + +@itemize @bullet +@item +XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} +XML parsing library + +@item +PostgreSQL extension + +@item +GD graphics library extension + +@item +MPFR library extension. +This provides access to a number of MPFR functions that @command{gawk}'s +native MPFR support does not provide. +@end itemize + +The @code{time} extension described earlier +(@pxref{Extension Sample Time}) +was originally from this project but has been moved into the +main @command{gawk} distribution. + +You can check out the code for the @code{gawkextlib} project +using the @uref{http://git-scm.com, Git} distributed source +code control system. The command is as follows: + +@example +git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code +@end example + +You will need to have the @uref{http://expat.sourceforge.net, Expat} +XML parser library installed in order to build and use the XML extension.
+ +In addition, you should have the GNU Autotools installed +(@uref{http://www.gnu.org/software/autoconf, Autoconf}, +@uref{http://www.gnu.org/software/automake, Automake}, +@uref{http://www.gnu.org/software/libtool, Libtool}, +and +@uref{http://www.gnu.org/software/gettext, Gettext}). + +The simple recipe for building and testing @code{gawkextlib} is as follows. +First, build and install @command{gawk}: + +@example +cd .../path/to/gawk/code +./configure --prefix=/tmp/newgawk @ii{Install in /tmp/newgawk for now} +make && make check @ii{Build and check that all is OK} +make install @ii{Install gawk} +@end example + +Next, build @code{gawkextlib} and test it: + +@example +cd .../path/to/gawkextlib-code +./update-autotools @ii{Generate configure, etc. May have to run twice} +./configure --with-gawk=/tmp/newgawk @ii{Configure, point at ``installed'' gawk} +make && make check @ii{Build and check that all is OK} +@end example + +@node Fake Chapter +@chapter Fake Sections For Cross References + +@menu +* Reference to Elements:: Referring to an Array Element. +* Built-in:: Built-in Functions. +* Built-in Variables:: Built-in Variables. +* Options:: Command-Line Options. +* AWKLIBPATH Variable:: The @env{AWKLIBPATH} Environment Variable. +* BEGINFILE/ENDFILE:: The @code{BEGINFILE} and @code{ENDFILE} Special Patterns. +* Redirection:: Redirecting Output of @code{print} and @code{printf}. +* Arrays:: Arrays in @command{awk}. +* Conversion:: Conversion of Strings and Numbers. +* Delete:: The @code{delete} Statement. +* String Functions:: String-Manipulation Functions. +* Glossary:: Glossary. +* Copying:: GNU General Public License. 
+@end menu + +@node Reference to Elements +@section Referring to an Array Element + +@node Built-in +@section Built-in Functions + +@node Built-in Variables +@section Built-in Variables + +@node Options +@section Command-Line Options + +@node AWKLIBPATH Variable +@section The @env{AWKLIBPATH} Environment Variable + +@node BEGINFILE/ENDFILE +@section The @code{BEGINFILE} and @code{ENDFILE} Special Patterns + +@node Redirection +@section Redirecting Output of @code{print} and @code{printf} + +@node Arrays +@section Arrays in @command{awk} + +@node Conversion +@section Conversion of Strings and Numbers + +@node Delete +@section The @code{delete} Statement + +@node String Functions +@section String-Manipulation Functions + +@node Glossary +@section Glossary + +@node Copying +@section GNU General Public License + +@bye