-rw-r--r-- | doc/api.texi | 4103 |
1 files changed, 0 insertions, 4103 deletions
diff --git a/doc/api.texi b/doc/api.texi deleted file mode 100644 index 6d048def..00000000 --- a/doc/api.texi +++ /dev/null @@ -1,4103 +0,0 @@ -\input texinfo @c -*-texinfo-*- -@c %**start of header (This is for running Texinfo on a region.) -@setfilename api.info -@settitle Writing Extensions For Gawk -@c %**end of header (This is for running Texinfo on a region.) - -@dircategory Text creation and manipulation -@direntry -* Gawk: (gawk). A text scanning and processing language. -@end direntry -@dircategory Individual utilities -@direntry -* awk: (gawk)Invoking gawk. Text scanning and processing. -@end direntry - -@set xref-automatic-section-title - -@c The following information should be updated here only! -@c This sets the edition of the document, the version of gawk it -@c applies to and all the info about who's publishing this edition - -@c These apply across the board. -@set UPDATE-MONTH October, 2012 -@set VERSION 4.1 -@set PATCHLEVEL 0 - -@set FSF - -@set TITLE Writing Extensions for Gawk -@set SUBTITLE A Temporary Manual -@set EDITION 1 - -@iftex -@set DOCUMENT book -@set CHAPTER chapter -@set APPENDIX appendix -@set SECTION section -@set SUBSECTION subsection -@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}} -@set COMMONEXT (c.e.) -@end iftex -@ifinfo -@set DOCUMENT Info file -@set CHAPTER major node -@set APPENDIX major node -@set SECTION minor node -@set SUBSECTION node -@set DARKCORNER (d.c.) -@set COMMONEXT (c.e.) -@end ifinfo -@ifhtml -@set DOCUMENT Web page -@set CHAPTER chapter -@set APPENDIX appendix -@set SECTION section -@set SUBSECTION subsection -@set DARKCORNER (d.c.) -@set COMMONEXT (c.e.) -@end ifhtml -@ifdocbook -@set DOCUMENT book -@set CHAPTER chapter -@set APPENDIX appendix -@set SECTION section -@set SUBSECTION subsection -@set DARKCORNER (d.c.) -@set COMMONEXT (c.e.) 
-@end ifdocbook -@ifplaintext -@set DOCUMENT book -@set CHAPTER chapter -@set APPENDIX appendix -@set SECTION section -@set SUBSECTION subsection -@set DARKCORNER (d.c.) -@set COMMONEXT (c.e.) -@end ifplaintext - -@c some special symbols -@iftex -@set LEQ @math{@leq} -@set PI @math{@pi} -@end iftex -@ifnottex -@set LEQ <= -@set PI @i{pi} -@end ifnottex - -@ifnottex -@macro ii{text} -@i{\text\} -@end macro -@end ifnottex - -@c For HTML, spell out email addresses, to avoid problems with -@c address harvesters for spammers. -@ifhtml -@macro EMAIL{real,spelled} -``\spelled\'' -@end macro -@end ifhtml -@ifnothtml -@macro EMAIL{real,spelled} -@email{\real\} -@end macro -@end ifnothtml - -@set FN file name -@set FFN File Name -@set DF data file -@set DDF Data File -@set PVERSION version -@set CTL Ctrl - -@ignore -Some comments on the layout for TeX. -1. Use at least texinfo.tex 2000-09-06.09 -2. I have done A LOT of work to make this look good. There are `@page' commands - and use of `@group ... @end group' in a number of places. If you muck - with anything, it's your responsibility not to break the layout. -@end ignore - -@c merge the function and variable indexes into the concept index -@ifinfo -@synindex fn cp -@synindex vr cp -@end ifinfo -@iftex -@syncodeindex fn cp -@syncodeindex vr cp -@end iftex -@ifxml -@syncodeindex fn cp -@syncodeindex vr cp -@end ifxml - -@c If "finalout" is commented out, the printed output will show -@c black boxes that mark lines that are too long. Thus, it is -@c unwise to comment it out when running a master in case there are -@c overfulls which are deemed okay. - -@iftex -@finalout -@end iftex - -@copying -Copyright @copyright{} 2012 -Free Software Foundation, Inc. -@sp 2 - -This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, -for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU -implementation of AWK. 
- -Permission is granted to copy, distribute and/or modify this document -under the terms of the GNU Free Documentation License, Version 1.3 or -any later version published by the Free Software Foundation; with the -Invariant Sections being ``GNU General Public License'', the Front-Cover -texts being (a) (see below), and with the Back-Cover Texts being (b) -(see below). A copy of the license is included in the section entitled -``GNU Free Documentation License''. - -@enumerate a -@item -``A GNU Manual'' - -@item -``You have the freedom to -copy and modify this GNU manual. Buying copies from the FSF -supports it in developing GNU and promoting software freedom.'' -@end enumerate -@end copying - -@c Comment out the "smallbook" for technical review. Saves -@c considerable paper. Remember to turn it back on *before* -@c starting the page-breaking work. - -@c 4/2002: Karl Berry recommends commenting out this and the -@c `@setchapternewpage odd', and letting users use `texi2dvi -t' -@c if they want to waste paper. -@c @smallbook - - -@c Uncomment this for the release. Leaving it off saves paper -@c during editing and review. -@setchapternewpage odd - -@titlepage -@title @value{TITLE} -@subtitle @value{SUBTITLE} -@subtitle Edition @value{EDITION} -@subtitle @value{UPDATE-MONTH} -@author Arnold D. Robbins - -@c Include the Distribution inside the titlepage environment so -@c that headings are turned off. Headings on and off do not work. - -@page -@vskip 0pt plus 1filll -``To boldly go where no man has gone before'' is a -Registered Trademark of Paramount Pictures Corporation. 
@* -@c sorry, i couldn't resist -@sp 3 -Published by: -@sp 1 - -Free Software Foundation @* -51 Franklin Street, Fifth Floor @* -Boston, MA 02110-1301 USA @* -Phone: +1-617-542-5942 @* -Fax: +1-617-542-2652 @* -Email: @email{gnu@@gnu.org} @* -URL: @uref{http://www.gnu.org/} @* - -@c This one is correct for gawk 3.1.0 from the FSF -ISBN 1-882114-28-0 @* -@sp 2 -@insertcopying -@end titlepage - -@ifnottex -@node Top -@top Top Node - -Fake top node. - -@insertcopying - -@end ifnottex - -@menu -* Extension API:: Writing Extensions for @command{gawk}. -* Fake Chapter:: Fake Sections For Cross References. - -@detailmenu -* Extension Intro:: What is an extension. -* Plugin License:: A note about licensing. -* Extension Design:: Design notes about the extension API. -* Old Extension Problems:: Problems with the old mechanism. -* Extension New Mechanism Goals:: Goals for the new mechanism. -* Extension Other Design Decisions:: Some other design decisions. -* Extension Mechanism Outline:: An outline of how it works. -* Extension Future Growth:: Some room for future growth. -* Extension API Description:: A full description of the API. -* Extension API Functions Introduction:: Introduction to the API functions. -* General Data Types:: The data types. -* Requesting Values:: How to get a value. -* Constructor Functions:: Functions for creating values. -* Registration Functions:: Functions to register things with - @command{gawk}. -* Extension Functions:: Registering extension functions. -* Input Parsers:: Registering an input parser. -* Output Wrappers:: Registering an output wrapper. -* Two-way processors:: Registering a two-way processor. -* Exit Callback Functions:: Registering an exit callback. -* Extension Version String:: Registering a version string. -* Printing Messages:: Functions for printing messages. -* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. -* Accessing Parameters:: Functions for accessing parameters. 
-* Symbol Table Access:: Functions for accessing global - variables. -* Symbol table by name:: Accessing variables by name. -* Symbol table by cookie:: Accessing variables by ``cookie''. -* Cached values:: Creating and using cached values. -* Array Manipulation:: Functions for working with arrays. -* Array Data Types:: Data types for working with arrays. -* Array Functions:: Functions for working with arrays. -* Flattening Arrays:: How to flatten arrays. -* Creating Arrays:: How to create and populate arrays. -* Extension API Variables:: Variables provided by the API. -* Extension Versioning:: API Version information. -* Extension API Informational Variables:: Variables providing information about - @command{gawk}'s invocation. -* Extension API Boilerplate:: Boilerplate code for using the API. -* Finding Extensions:: How @command{gawk} finds compiled - extensions. -* Extension Example:: Example C code for an extension. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -* Extension Samples:: The sample extensions that ship with - @code{gawk}. -* Extension Sample File Functions:: The file functions sample. -* Extension Sample Fnmatch:: An interface to @code{fnmatch()}. -* Extension Sample Fork:: An interface to @code{fork()} and - other process functions. -* Extension Sample Ord:: Character to value to character - conversions. -* Extension Sample Readdir:: An interface to @code{readdir()}. -* Extension Sample Revout:: Reversing output sample output - wrapper. -* Extension Sample Rev2way:: Reversing data sample two-way - processor. -* Extension Sample Read write array:: Serializing an array to a file. -* Extension Sample Readfile:: Reading an entire file into a string. -* Extension Sample API Tests:: Tests for the API. -* Extension Sample Time:: An interface to @code{gettimeofday()} - and @code{sleep()}.
-* gawkextlib:: The @code{gawkextlib} project. -* Reference to Elements:: Referring to an Array Element. -* Built-in:: Built-in Functions. -* Built-in Variables:: Built-in Variables. -* Options:: Command-Line Options. -@end detailmenu -@end menu - -@contents - -@node Extension API -@chapter Writing Extensions for @command{gawk} - -It is possible to add new built-in functions to @command{gawk} using -dynamically loaded libraries. This facility is available on systems (such -as GNU/Linux) that support the C @code{dlopen()} and @code{dlsym()} -functions. This @value{CHAPTER} describes how to create extensions -using code written in C or C++. If you don't know anything about C -programming, you can safely skip this @value{CHAPTER}, although you -may wish to review the documentation on the extensions that come with -@command{gawk} (@pxref{Extension Samples}), and the section on the -@code{gawkextlib} project (@pxref{gawkextlib}). - -@quotation NOTE -When @option{--sandbox} is specified, extensions are disabled -(@pxref{Options}). -@end quotation - -@menu -* Extension Intro:: What is an extension. -* Plugin License:: A note about licensing. -* Extension Design:: Design notes about the extension API. -* Extension API Description:: A full description of the API. -* Extension Example:: Example C code for an extension. -* Extension Samples:: The sample extensions that ship with - @code{gawk}. -* gawkextlib:: The @code{gawkextlib} project. -@end menu - -@node Extension Intro -@section Introduction - -An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of -external compiled code that @command{gawk} can load at runtime to -provide additional functionality, over and above the built-in capabilities -described in the rest of this @value{DOCUMENT}. - -Extensions are useful because they allow you (of course) to extend -@command{gawk}'s functionality. 
For example, they can provide access to -system calls (such as @code{chdir()} to change directory) and to other -C library routines that could be of use. As with most software, -``the sky is the limit;'' if you can imagine something that you might -want to do and can write in C or C++, you can write an extension to do it! - -Extensions are written in C or C++, using the @dfn{Application Programming -Interface} (API) defined for this purpose by the @command{gawk} -developers. The rest of this @value{CHAPTER} explains the design -decisions behind the API, the facilities it provides and how to use -them, and presents a small sample extension. In addition, it documents -the sample extensions included in the @command{gawk} distribution, -and describes the @code{gawkextlib} project. - -@node Plugin License -@section Extension Licensing - -Every dynamic extension should define the global symbol -@code{plugin_is_GPL_compatible} to assert that it has been licensed under -a GPL-compatible license. If this symbol does not exist, @command{gawk} -emits a fatal error and exits when it tries to load your extension. - -The declared type of the symbol should be @code{int}. It does not need -to be in any allocated section, though. The code merely asserts that -the symbol exists in the global scope. Something like this is enough: - -@example -int plugin_is_GPL_compatible; -@end example - -@node Extension Design -@section Extension API Design - -The first version of extensions for @command{gawk} was developed in -the mid-1990s and released with @command{gawk} 3.1 in the late 1990s. -The basic mechanisms and design remained unchanged for close to 15 years, -until 2012. - -The old extension mechanism used data types and functions from -@command{gawk} itself, with a ``clever hack'' to install extension -functions. - -@command{gawk} included some sample extensions, of which a few were -really useful. 
However, it was clear from the outset that the extension -mechanism was bolted onto the side and was not really thought out. - -@menu -* Old Extension Problems:: Problems with the old mechanism. -* Extension New Mechanism Goals:: Goals for the new mechanism. -* Extension Other Design Decisions:: Some other design decisions. -* Extension Mechanism Outline:: An outline of how it works. -* Extension Future Growth:: Some room for future growth. -@end menu - -@node Old Extension Problems -@subsection Problems With The Old Mechanism - -The old extension mechanism had several problems: - -@itemize @bullet -@item -It depended heavily upon @command{gawk} internals. Any time the -@code{NODE} structure@footnote{A critical central data structure -inside @command{gawk}.} changed, an extension would have to be -recompiled. Furthermore, to really write extensions required understanding -something about @command{gawk}'s internal functions. There was some -documentation in this @value{DOCUMENT}, but it was quite minimal. - -@item -Being able to call into @command{gawk} from an extension required linker -facilities that are common on Unix-derived systems but that did -not work on Windows systems; users wanting extensions on Windows -had to statically link them into @command{gawk}, even though Windows supports -dynamic loading of shared objects. - -@item -The API would change occasionally as @command{gawk} changed; no compatibility -between versions was ever offered or planned for. -@end itemize - -Despite the drawbacks, the @command{xgawk} project developers forked -@command{gawk} and developed several significant extensions. They also -enhanced @command{gawk}'s facilities relating to file inclusion and -shared object access. - -A new API was desired for a long time, but only in 2012 did the -@command{gawk} maintainer and the @command{xgawk} developers finally -start working on it together. More information about the @command{xgawk} -project is provided in @ref{gawkextlib}. 
- -@node Extension New Mechanism Goals -@subsection Goals For A New Mechanism - -Some goals for the new API were: - -@itemize @bullet -@item -The API should be independent of @command{gawk} internals. Changes in -@command{gawk} internals should not be visible to the writer of an -extension function. - -@item -The API should provide @emph{binary} compatibility across @command{gawk} -releases as long as the API itself does not change. - -@item -The API should enable extensions written in C to have roughly the -same ``appearance'' to @command{awk}-level code as @command{awk} -functions do. This means that extensions should have: - -@itemize @minus -@item -The ability to access function parameters. - -@item -The ability to turn an undefined parameter into an array (call by reference). - -@item -The ability to create, access and update global variables. - -@item -Easy access to all the elements of an array at once (``array flattening'') -in order to loop over all the elements easily from C code. - -@item -The ability to create arrays (including @command{gawk}'s true -multi-dimensional arrays). -@end itemize -@end itemize - -Some additional important goals were: - -@itemize @bullet -@item -The API should use only features in ISO C 90, so that extensions -can be written using the widest range of C and C++ compilers. The header -should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"} -magic so that a C++ compiler could be used. (If using C++, the runtime -system has to be smart enough to call any constructors and destructors, -as @command{gawk} is a C program. As of this writing, this has not been -tested.) - -@item -The API mechanism should not require access to @command{gawk}'s -symbols@footnote{The @dfn{symbols} are the variables and functions -defined inside @command{gawk}.
Access to these symbols by code -external to @command{gawk} loaded dynamically at runtime is -problematic on Windows.} by the compile-time or dynamic linker, -in order to enable creation of extensions that also work on Windows. -@end itemize - -During development, it became clear that there were other features -that should be available to extensions, which were also subsequently -provided: - -@itemize @bullet -@item -Extensions should have the ability to hook into @command{gawk}'s -I/O redirection mechanism. In particular, the @command{xgawk} -developers provided a so-called ``open hook'' to take over reading -records. During development, this was generalized to allow -extensions to hook into input processing, output processing, and -two-way I/O. - -@item -An extension should be able to provide a ``call back'' function -to perform clean up actions when @command{gawk} exits. - -@item -An extension should be able to provide a version string so that -@command{gawk}'s @option{--version} option can provide information -about extensions as well. -@end itemize - -@node Extension Other Design Decisions -@subsection Other Design Decisions - -As an ``arbitrary'' design decision, extensions can read the values of -built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot -change them, with the exception of @code{PROCINFO}. - -The reason for this is to prevent an extension function from affecting -the flow of an @command{awk} program outside its control. While a real -@command{awk} function can do what it likes, that is at the discretion -of the programmer. An extension function should provide a service or -make a C API available for use within @command{awk}, and not mess with -@code{FS} or @code{ARGC} and @code{ARGV}. - -In addition, it becomes easy to start down a slippery slope. How -much access to @command{gawk} facilities do extensions need? -Do they need @code{getline}? What about calling @code{gsub()} or -compiling regular expressions? 
What about calling into @command{awk} -functions? (@emph{That} would be messy.) - -In order to avoid these issues, the @command{gawk} developers chose -to start with the simplest, most basic features that are still truly useful. - -Another decision is that although @command{gawk} provides nice things like -MPFR, and arrays indexed internally by integers, these features are not -being brought out to the API in order to keep things simple and close to -traditional @command{awk} semantics. (In fact, arrays indexed internally -by integers are so transparent that they aren't even documented!) - -With time, the API will undoubtedly evolve; the @command{gawk} developers -expect this to be driven by user needs. For now, the current API seems -to provide a minimal yet powerful set of features for creating extensions. - -@node Extension Mechanism Outline -@subsection At A High Level How It Works - -The requirement to avoid access to @command{gawk}'s symbols is, at first -glance, a difficult one to meet. - -One design, apparently used by Perl and Ruby and maybe others, would -be to make the mainline @command{gawk} code into a library, with the -@command{gawk} utility a small C @code{main()} function linked against -the library. - -This seemed like the tail wagging the dog, complicating build and -installation and making a simple copy of the @command{gawk} executable -from one system to another (or one place to another on the same -system!) into a chancy operation. - -Pat Rankin suggested the solution that was adopted. Communication between -@command{gawk} and an extension is two-way. First, when an extension -is loaded, it is passed a pointer to a @code{struct} whose fields are -function pointers. -@iftex -This is shown in @ref{load-extension}. 
-@end iftex - -@float Figure,load-extension -@caption{Loading the extension} -@ifinfo -@center @image{api-figure1, , , Loading the extension, txt} -@end ifinfo -@ifhtml -@center @image{api-figure1, , , Loading the extension, png} -@end ifhtml -@ifnotinfo -@ifnothtml -@center @image{api-figure1, , , Loading the extension} -@end ifnothtml -@end ifnotinfo -@end float - -The extension can call functions inside @command{gawk} through these -function pointers, at runtime, without needing (link-time) access -to @command{gawk}'s symbols. One of these function pointers is to a -function for ``registering'' new built-in functions. -@iftex -This is shown in @ref{load-new-function}. -@end iftex - -@float Figure,load-new-function -@caption{Loading the new function} -@ifinfo -@center @image{api-figure2, , , Loading the new function, txt} -@end ifinfo -@ifhtml -@center @image{api-figure2, , , Loading the new function, png} -@end ifhtml -@ifnotinfo -@ifnothtml -@center @image{api-figure2, , , Loading the new function} -@end ifnothtml -@end ifnotinfo -@end float - -In the other direction, the extension registers its new functions -with @command{gawk} by passing function pointers to the functions that -provide the new feature (@code{do_chdir()}, for example). @command{gawk} -associates the function pointer with a name and can then call it, using a -defined calling convention. -@iftex -This is shown in @ref{call-new-function}. 
-@end iftex - -@float Figure,call-new-function -@caption{Calling the new function} -@ifinfo -@center @image{api-figure3, , , Calling the new function, txt} -@end ifinfo -@ifhtml -@center @image{api-figure3, , , Calling the new function, png} -@end ifhtml -@ifnotinfo -@ifnothtml -@center @image{api-figure3, , , Calling the new function} -@end ifnothtml -@end ifnotinfo -@end float - -The @code{do_@var{xxx}()} function, in turn, then uses the function -pointers in the API @code{struct} to do its work, such as updating -variables or arrays, printing messages, setting @code{ERRNO}, and so on. - -Convenience macros in the @file{gawkapi.h} header file make calling -through the function pointers look like regular function calls so that -extension code is quite readable and understandable. - -Although all of this sounds somewhat complicated, the result is that -extension code is quite clean and straightforward. This can be seen in -the sample extensions @file{filefuncs.c} (@pxref{Extension Example}) -and also the @file{testext.c} code for testing the APIs. - -Some other bits and pieces: - -@itemize @bullet -@item -The API provides access to @command{gawk}'s @code{do_@var{xxx}} values, -reflecting command line options, like @code{do_lint}, @code{do_profiling} -and so on (@pxref{Extension API Variables}). -These are informational: an extension cannot affect these -inside @command{gawk}. In addition, attempting to assign to them -produces a compile-time error. - -@item -The API also provides major and minor version numbers, so that an -extension can check if the @command{gawk} it is loaded with supports the -facilities it was compiled with. (Version mismatches ``shouldn't'' -happen, but we all know how @emph{that} goes.) -@xref{Extension Versioning}, for details. -@end itemize - -@node Extension Future Growth -@subsection Room For Future Growth - -The API provides room for future growth, in two ways. - -An ``extension id'' is passed into the extension when it's loaded.
This -extension id is then passed back to @command{gawk} with each function -call. This allows @command{gawk} to identify the extension calling into it, -should it need to know. - -A ``name space'' is passed into @command{gawk} when an extension function -is registered. This provides for a future mechanism for grouping -extension functions and possibly avoiding name conflicts. - -Of course, as of this writing, no decisions have been made with respect -to any of the above. - -@node Extension API Description -@section API Description - -This (rather large) @value{SECTION} describes the API in detail. - -@menu -* Extension API Functions Introduction:: Introduction to the API functions. -* General Data Types:: The data types. -* Requesting Values:: How to get a value. -* Constructor Functions:: Functions for creating values. -* Registration Functions:: Functions to register things with - @command{gawk}. -* Printing Messages:: Functions for printing messages. -* Updating @code{ERRNO}:: Functions for updating @code{ERRNO}. -* Accessing Parameters:: Functions for accessing parameters. -* Symbol Table Access:: Functions for accessing global - variables. -* Array Manipulation:: Functions for working with arrays. -* Extension API Variables:: Variables provided by the API. -* Extension API Boilerplate:: Boilerplate code for using the API. -* Finding Extensions:: How @command{gawk} finds compiled - extensions. -@end menu - -@node Extension API Functions Introduction -@subsection Introduction - -Access to facilities within @command{gawk} is made available -by calling through function pointers passed into your extension. - -API function pointers are provided for the following kinds of operations: - -@itemize @bullet -@item -Registration functions. You may register: -@itemize @minus -@item -extension functions, -@item -exit callbacks, -@item -a version string, -@item -input parsers, -@item -output wrappers, -@item -and two-way processors.
-@end itemize -All of these are discussed in detail, later in this @value{CHAPTER}. - -@item -Printing fatal, warning, and ``lint'' warning messages. - -@item -Updating @code{ERRNO}, or unsetting it. - -@item -Accessing parameters, including converting an undefined parameter into -an array. - -@item -Symbol table access: retrieving a global variable, creating one, -or changing one. This also includes the ability to create a scalar -variable that will be @emph{constant} within @command{awk} code. - -@item -Creating and releasing cached values; this provides an -efficient way to use values for multiple variables and -can be a big performance win. - -@item -Manipulating arrays: -@itemize @minus -@item -Retrieving, adding, deleting, and modifying elements -@item -Getting the count of elements in an array -@item -Creating a new array -@item -Clearing an array -@item -Flattening an array for easy C style looping over all its indices and elements -@end itemize -@end itemize - -Some points about using the API: - -@itemize @bullet -@item -You must include @code{<sys/types.h>} and @code{<sys/stat.h>} before including -the @file{gawkapi.h} header file. In addition, you must include either -@code{<stddef.h>} or @code{<stdlib.h>} to get the definition of @code{size_t}. -If you wish to use the boilerplate @code{dl_load_func()} macro, you will -need to include @code{<stdio.h>} as well. -Finally, to pass reasonable integer values for @code{ERRNO}, you -will need to include @code{<errno.h>}. - -@item -Although the API only uses ISO C 90 features, there is an exception; the -``constructor'' functions use the @code{inline} keyword. If your compiler -does not support this keyword, you should either place -@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a -@file{config.h} file in your extensions. - -@item -All pointers filled in by @command{gawk} are to memory -managed by @command{gawk} and should be treated by the extension as -read-only. 
Memory for @emph{all} strings passed into @command{gawk} -from the extension @emph{must} come from @code{malloc()} and is managed -by @command{gawk} from then on. - -@item -The API defines several simple structs that map values as seen -from @command{awk}. A value can be a @code{double}, a string, or an -array (as in multidimensional arrays, or when creating a new array). -Strings maintain both pointer and length since embedded @code{NUL} -characters are allowed. - -By intent, strings are maintained using the current multibyte encoding (as -defined by @env{LC_@var{xxx}} environment variables) and not using wide -characters. This matches how @command{gawk} stores strings internally -and also how characters are likely to be input and output from files. - -@item -When retrieving a value (such as a parameter or that of a global variable -or array element), the extension requests a specific type (number, string, -scalar, value cookie, array, or ``undefined''). When the request is -``undefined,'' the returned value will have the real underlying type. - -However, if the request and actual type don't match, the access function -returns ``false'' and fills in the type of the actual value that is there, -so that the extension can, e.g., print an error message -(``scalar passed where array expected''). - -@c This is documented in the header file and needs some expanding upon. -@c The table there should be presented here -@end itemize - -While you may call the API functions by using the function pointers -directly, the interface is not so pretty. To make extension code look -more like regular code, the @file{gawkapi.h} header file defines a number -of macros which you should use in your code. This @value{SECTION} presents -the macros as if they were functions.
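The idea of hiding function-pointer calls behind macros can be sketched in a few lines of C. The following is only an illustration under stated assumptions: every name beginning with demo_ or host_ is hypothetical and is NOT taken from gawkapi.h, and the "host" functions merely stand in for gawk so that the sketch compiles on its own. It shows a struct of function pointers, an extension id passed back on every call, and two macros that make the calls read like ordinary functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the real gawkapi.h declarations. */
typedef void *demo_ext_id_t;

typedef struct demo_api {
    /* Print a warning on behalf of the extension identified by id. */
    void (*api_warning)(demo_ext_id_t id, const char *msg);
    /* Record an integer error code (a stand-in for updating ERRNO). */
    void (*api_update_errno)(demo_ext_id_t id, int errcode);
} demo_api_t;

/* Saved once when the extension is loaded. */
static const demo_api_t *api;
static demo_ext_id_t ext_id;

/* Convenience macros: hide both "api->" and the ext_id argument. */
#define demo_warning(msg)     (api->api_warning(ext_id, (msg)))
#define demo_update_errno(e)  (api->api_update_errno(ext_id, (e)))

/* Simulated host side, standing in for gawk itself. */
static int host_errno;
static int host_warning_count;

static void host_warning(demo_ext_id_t id, const char *msg)
{
    (void) id;
    host_warning_count++;
    fprintf(stderr, "warning: %s\n", msg);
}

static void host_update_errno(demo_ext_id_t id, int errcode)
{
    (void) id;
    host_errno = errcode;
}

static const demo_api_t host_api = { host_warning, host_update_errno };

/* Load step: the extension keeps the struct pointer and its id. */
static void demo_load(const demo_api_t *api_p, demo_ext_id_t id)
{
    assert(api_p != NULL);
    api = api_p;
    ext_id = id;
}

/* Extension code: the macro calls look like regular function calls. */
static int demo_do_work(int fail_code)
{
    if (fail_code != 0) {
        demo_update_errno(fail_code);
        demo_warning("demo_do_work: operation failed");
        return -1;
    }
    return 0;
}

/* Drive the sketch once; returns the error code recorded by the host. */
int demo_run(void)
{
    demo_load(&host_api, NULL);
    (void) demo_do_work(0);   /* success: nothing is reported */
    (void) demo_do_work(2);   /* failure: code recorded, one warning */
    return host_errno;
}
```

Because the macros supply the saved id themselves, the body of demo_do_work() never mentions the struct or the id, which is the readability benefit described above.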
- -@node General Data Types -@subsection General Purpose Data Types - -@quotation -@i{I have a true love/hate relationship with unions.}@* -Arnold Robbins - -@i{That's the thing about unions: the compiler will arrange things so they -can accommodate both love and hate.}@* -Chet Ramey -@end quotation - -The extension API defines a number of simple types and structures for general -purpose use. Additional, more specialized data structures are introduced -in subsequent @value{SECTION}s, together with the functions that use them. - -@table @code -@item typedef void *awk_ext_id_t; -A value of this type is received from @command{gawk} when an extension is loaded. -That value must then be passed back to @command{gawk} as the first parameter of -each API function. - -@item #define awk_const @dots{} -This macro expands to @samp{const} when compiling an extension, -and to nothing when compiling @command{gawk} itself. This makes -certain fields in the API data structures unwritable from extension code, -while allowing @command{gawk} to use them as it needs to. - -@item typedef int awk_bool_t; -A simple boolean type. At the moment, the API does not define special -``true'' and ``false'' values, although perhaps it should. - -@item typedef struct @{ -@itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */ -@itemx @ @ @ @ size_t len;@ @ @ @ @ /* length thereof, in chars */ -@itemx @} awk_string_t; -This represents a mutable string. @command{gawk} -owns the memory pointed to if it supplied -the value. Otherwise, it takes ownership of the memory pointed to. -@strong{Such memory must come from @code{malloc()}!} - -As mentioned earlier, strings are maintained using the current -multibyte encoding.
- -@item typedef enum @{ -@itemx @ @ @ @ AWK_UNDEFINED, -@itemx @ @ @ @ AWK_NUMBER, -@itemx @ @ @ @ AWK_STRING, -@itemx @ @ @ @ AWK_ARRAY, -@itemx @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */ -@itemx @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */ -@itemx @} awk_valtype_t; -This @code{enum} indicates the type of a value. -It is used in the following @code{struct}. - -@item typedef struct @{ -@itemx @ @ @ @ awk_valtype_t val_type; -@itemx @ @ @ @ union @{ -@itemx @ @ @ @ @ @ @ @ awk_string_t@ @ @ @ @ @ @ s; -@itemx @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d; -@itemx @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a; -@itemx @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl; -@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t@ vc; -@itemx @ @ @ @ @} u; -@itemx @} awk_value_t; -An ``@command{awk} value.'' -The @code{val_type} member indicates what kind of value the -@code{union} holds, and each member is of the appropriate type. - -@item #define str_value@ @ @ @ @ @ u.s -@itemx #define num_value@ @ @ @ @ @ u.d -@itemx #define array_cookie@ @ @ u.a -@itemx #define scalar_cookie@ @ u.scl -@itemx #define value_cookie@ @ @ u.vc -These macros make accessing the fields of the @code{awk_value_t} more -readable. - -@item typedef void *awk_scalar_t; -Scalars can be represented as an opaque type. These values are obtained from -@command{gawk} and then passed back into it. This is discussed in a general fashion below, -and in more detail in @ref{Symbol table by cookie}. - -@item typedef void *awk_value_cookie_t; -A ``value cookie'' is an opaque type representing a cached value. -This is also discussed in a general fashion below, -and in more detail in @ref{Cached values}. - -@end table - -Scalar values in @command{awk} are either numbers or strings. The -@code{awk_value_t} struct represents values. The @code{val_type} member -indicates what is in the @code{union}. - -Representing numbers is easy---the API uses a C @code{double}. 
Strings -require more work. Since @command{gawk} allows embedded @code{NUL} bytes -in string values, a string must be represented as a pair containing a -data-pointer and length. This is the @code{awk_string_t} type. - -Identifiers (i.e., the names of global variables) can be associated -with either scalar values or with arrays. In addition, @command{gawk} -provides true arrays of arrays, where any given array element can -itself be an array. Discussion of arrays is delayed until -@ref{Array Manipulation}. - -The various macros listed earlier make it easier to use the elements -of the @code{union} as if they were fields in a @code{struct}; this -is a common coding practice in C. Such code is easier to write and to -read; however, it remains @emph{your} responsibility to make sure that -the @code{val_type} member correctly reflects the type of the value in -the @code{awk_value_t}. - -Conceptually, the first three members of the @code{union} (number, string, -and array) are all that is needed for working with @command{awk} values. -However, since the API provides routines for accessing and changing -the value of global scalar variables only by using the variable's name, -there is a performance penalty: @command{gawk} must find the variable -each time it is accessed or changed. This turns out to be a real issue, -not just a theoretical one. - -Thus, if you know that your extension will spend considerable time -reading and/or changing the value of one or more scalar variables, you -can obtain a @dfn{scalar cookie}@footnote{See -@uref{http://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a -definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html, -the ``magic cookie'' entry in the Jargon file} for a nice example. See -also the entry for ``Cookie'' in the @ref{Glossary}.} -object for that variable, and then use -the cookie for getting the variable's value or for changing the variable's -value.
-This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro. -Given a scalar cookie, @command{gawk} can directly retrieve or -modify the value, as required, without having to first find it. - -The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar. -If you know that you wish to -use the same numeric or string @emph{value} for one or more variables, -you can create the value once, retaining a @dfn{value cookie} for it, -and then pass in that value cookie whenever you wish to set the value of a -variable. This saves both storage space within the running @command{gawk} -process as well as the time needed to create the value. - -@node Requesting Values -@subsection Requesting Values - -All of the functions that return values from @command{gawk} -work in the same way. You pass in an @code{awk_valtype_t} value -to indicate what kind of value you expect. If the actual value -matches what you requested, the function returns true and fills -in the @code{awk_value_t} result. -Otherwise, the function returns false, and the @code{val_type} -member indicates the type of the actual value. You may then -print an error message, or reissue the request for the actual -value type, as appropriate. This behavior is summarized in -@ref{table-value-types-returned}. 
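The request-and-retry pattern described above can be sketched as follows. This is only an illustration, not code from @command{gawk}: the type declarations are stand-ins copied from the declarations shown earlier (a real extension gets them from the API header), and @code{fake_lookup_array()} and @code{request_with_retry()} are hypothetical names. The fake lookup assumes the actual value is an array and applies the rules from the table of value types returned.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in declarations mirroring the types described earlier.
 * ASSUMPTION: a real extension takes these from gawk's API header. */
typedef int awk_bool_t;
typedef enum {
    AWK_UNDEFINED, AWK_NUMBER, AWK_STRING, AWK_ARRAY,
    AWK_SCALAR, AWK_VALUE_COOKIE
} awk_valtype_t;
typedef struct { char *str; size_t len; } awk_string_t;
typedef struct {
    awk_valtype_t val_type;
    union {
        awk_string_t s;
        double d;
        void *a;      /* stands in for awk_array_t */
        void *scl;    /* stands in for awk_scalar_t */
        void *vc;     /* stands in for awk_value_cookie_t */
    } u;
} awk_value_t;
#define num_value     u.d
#define array_cookie  u.a

static int dummy_array;   /* placeholder for a real array handle */

/* HYPOTHETICAL: behaves like a value-returning API function for a
 * variable whose actual value is an array. */
static awk_bool_t
fake_lookup_array(awk_valtype_t wanted, awk_value_t *result)
{
    assert(result != NULL);
    result->val_type = AWK_ARRAY;   /* the actual type is always reported */
    if (wanted == AWK_ARRAY || wanted == AWK_UNDEFINED) {
        result->array_cookie = &dummy_array;
        return 1;                   /* request satisfied */
    }
    return 0;                       /* type mismatch */
}

/* Ask for a number first; on a mismatch, reissue the request using
 * the actual type reported in result->val_type. */
static awk_valtype_t
request_with_retry(awk_value_t *result)
{
    if (fake_lookup_array(AWK_NUMBER, result))
        return AWK_NUMBER;
    if (fake_lookup_array(result->val_type, result))
        return result->val_type;
    return AWK_UNDEFINED;           /* nothing could be retrieved */
}
```

An extension that accepts only one type would instead print an error message at the point where the first request returns false.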
- -@ifnotplaintext -@float Table,table-value-types-returned -@caption{Value Types Returned} -@multitable @columnfractions .50 .50 -@headitem @tab Type of Actual Value: -@end multitable -@multitable @columnfractions .166 .166 .198 .15 .15 .166 -@headitem @tab @tab String @tab Number @tab Array @tab Undefined -@item @tab @b{String} @tab String @tab String @tab false @tab false -@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false -@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false -@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false -@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined -@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false -@end multitable -@end float -@end ifnotplaintext -@ifplaintext -@float Table,table-value-types-returned -@caption{Value Types Returned} -@example - +-------------------------------------------------+ - | Type of Actual Value: | - +------------+------------+-----------+-----------+ - | String | Number | Array | Undefined | -+-----------+-----------+------------+------------+-----------+-----------+ -| | String | String | String | false | false | -| |-----------+------------+------------+-----------+-----------+ -| | Number | Number if | Number | false | false | -| | | can be | | | | -| | | converted, | | | | -| | | else false | | | | -| |-----------+------------+------------+-----------+-----------+ -| Type | Array | false | false | Array | false | -| Requested |-----------+------------+------------+-----------+-----------+ -| | Scalar | Scalar | Scalar | false | false | -| |-----------+------------+------------+-----------+-----------+ -| | Undefined | String | Number | Array | Undefined | -| |-----------+------------+------------+-----------+-----------+ -| | Value | false | false | false | false | -| | Cookie | | | | | 
-+-----------+-----------+------------+------------+-----------+-----------+ -@end example -@end float -@end ifplaintext - -@node Constructor Functions -@subsection Constructor Functions and Convenience Macros - -The API provides a number of @dfn{constructor} functions for creating -string and numeric values, as well as a number of convenience macros. -This @value{SUBSECTION} presents them all as function prototypes, in -the way that extension code would use them. - -@table @code -@item static inline awk_value_t * -@itemx make_const_string(const char *string, size_t length, awk_value_t *result) -This function creates a string value in the @code{awk_value_t} variable -pointed to by @code{result}. It expects @code{string} to be a C string constant -(or other string data), and automatically creates a @emph{copy} of the data -for storage in @code{result}. It returns @code{result}. - -@item static inline awk_value_t * -@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) -This function creates a string value in the @code{awk_value_t} variable -pointed to by @code{result}. It expects @code{string} to be a @samp{char *} -value pointing to data previously obtained from @code{malloc()}. The idea here -is that the data is passed directly to @command{gawk}, which assumes -responsibility for it. It returns @code{result}. - -@item static inline awk_value_t * -@itemx make_null_string(awk_value_t *result) -This specialized function creates a null string (the ``undefined'' value) -in the @code{awk_value_t} variable pointed to by @code{result}. -It returns @code{result}. - -@item static inline awk_value_t * -@itemx make_number(double num, awk_value_t *result) -This function simply creates a numeric value in the @code{awk_value_t} variable -pointed to by @code{result}. -@end table - -Two convenience macros may be used for allocating storage from @code{malloc()} -and @code{realloc()}. 
If the allocation fails, they cause @command{gawk} to -exit with a fatal error message. They should be used as if they were -procedure calls that do not return a value. - -@table @code -@item emalloc(pointer, type, size, message) -The arguments to this macro are as follows: -@c nested table -@table @code -@item pointer -The pointer variable to point at the allocated storage. - -@item type -The type of the pointer variable, used to create a cast for the call to @code{malloc()}. - -@item size -The total number of bytes to be allocated. - -@item message -A message to be prefixed to the fatal error message. Typically this is the name -of the function using the macro. -@end table - -@noindent -For example, you might allocate a string value like so: - -@example -awk_value_t result; -char *message; -const char greet[] = "Don't Panic!"; - -emalloc(message, char *, sizeof(greet), "myfunc"); -strcpy(message, greet); -make_malloced_string(message, strlen(message), & result); -@end example - -@item erealloc(pointer, type, size, message) -This is like @code{emalloc()}, but it calls @code{realloc()}, -instead of @code{malloc()}. -The arguments are the same as for the @code{emalloc()} macro. -@end table - -@node Registration Functions -@subsection Registration Functions - -This @value{SECTION} describes the API functions for -registering parts of your extension with @command{gawk}. - -@menu -* Extension Functions:: Registering extension functions. -* Exit Callback Functions:: Registering an exit callback. -* Extension Version String:: Registering a version string. -* Input Parsers:: Registering an input parser. -* Output Wrappers:: Registering an output wrapper. -* Two-way processors:: Registering a two-way processor. 
-@end menu - -@node Extension Functions -@subsubsection Registering An Extension Function - -Extension functions are described by the following record: - -@example -typedef struct @{ -@ @ @ @ const char *name; -@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result); -@ @ @ @ size_t num_expected_args; -@} awk_ext_func_t; -@end example - -The fields are: - -@table @code -@item const char *name; -The name of the new function. -@command{awk} level code calls the function by this name. -This is a regular C string. - -@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result); -This is a pointer to the C function that provides the desired -functionality. -The function must fill in the result with either a number -or a string. @command{awk} takes ownership of any string memory. -As mentioned earlier, string memory @strong{must} come from @code{malloc()}. - -The function must return the value of @code{result}. -This is for the convenience of the calling code inside @command{gawk}. - -@item size_t num_expected_args; -This is the number of arguments the function expects to receive. -Each extension function may decide what to do if the number of -arguments isn't what it expected. Following @command{awk} functions, it -is likely OK to ignore extra arguments. -@end table - -Once you have a record representing your extension function, you register -it with @command{gawk} using this API function: - -@table @code -@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func); -This function returns true upon success, false otherwise. -The @code{namespace} parameter is currently not used; you should pass in an -empty string (@code{""}). The @code{func} pointer is the address of a -@code{struct} representing your function, as just described. -@end table - -@node Exit Callback Functions -@subsubsection Registering An Exit Callback Function - -An @dfn{exit callback} function is a function that -@command{gawk} calls before it exits. 
-Such functions are useful if you have general ``clean up'' tasks -that should be performed in your extension (such as closing data -base connections or other resource deallocations). -You can register such -a function with @command{gawk} using the following function. - -@table @code -@item void awk_atexit(void (*funcp)(void *data, int exit_status), -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0); -The parameters are: -@c nested table -@table @code -@item funcp -A pointer to the function to be called before @command{gawk} exits. The @code{data} -parameter will be the original value of @code{arg0}. -The @code{exit_status} parameter is -the exit status value that @command{gawk} will pass to the @code{exit()} system call. - -@item arg0 -A pointer to private data which @command{gawk} saves in order to pass to -the function pointed to by @code{funcp}. -@end table -@end table - -Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in -the reverse order in which they are registered with @command{gawk}. - -@node Extension Version String -@subsubsection Registering An Extension Version String - -You can register a version string which indicates the name and -version of your extension, with @command{gawk}, as follows: - -@table @code -@item void register_ext_version(const char *version); -Register the string pointed to by @code{version} with @command{gawk}. -@command{gawk} does @emph{not} copy the @code{version} string, so -it should not be changed. -@end table - -@command{gawk} prints all registered extension version strings when it -is invoked with the @option{--version} option. - -@node Input Parsers -@subsubsection Customized Input Parsers - -By default, @command{gawk} reads text files as its input. It uses the value -of @code{RS} to find the end of the record, and then uses @code{FS} -(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}). -Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}). 
- -If you want, you can provide your own, custom, input parser. An input -parser's job is to return a record to the @command{gawk} record processing -code, along with indicators for the value and length of the data to be -used for @code{RT}, if any. - -To provide an input parser, you must first provide two functions -(where @var{XXX} is a prefix name for your extension): - -@table @code -@item awk_bool_t @var{XXX}_can_take_file(const awk_input_buf_t *iobuf) -This function examines the information available in @code{iobuf} -(which we discuss shortly). Based on the information there, it -decides if the input parser should be used for this file. -If so, it should return true. Otherwise, it should return false. -It should not change any state (variable values, etc.) within @command{gawk}. - -@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf) -When @command{gawk} decides to hand control of the file over to the -input parser, it calls this function. This function in turn must fill -in certain fields in the @code{awk_input_buf_t} structure, and ensure -that certain conditions are true. It should then return true. If an -error of some kind occurs, it should not fill in any fields, and should -return false; then @command{gawk} will not use the input parser. -The details are presented shortly. -@end table - -Your extension should package these functions inside an -@code{awk_input_parser_t}, which looks like this: - -@example -typedef struct input_parser @{ - const char *name; /* name of parser */ - awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); - awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); - awk_const struct input_parser *awk_const next; /* for use by gawk */ -@} awk_input_parser_t; -@end example - -The fields are: - -@table @code -@item const char *name; -The name of the input parser. This is a regular C string. 
- -@item awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); -A pointer to your @code{@var{XXX}_can_take_file()} function. - -@item awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); -A pointer to your @code{@var{XXX}_take_control_of()} function. - -@item awk_const struct input_parser *awk_const next; -This pointer is used by @command{gawk}. -The extension cannot modify it. -@end table - -The steps are as follows: - -@enumerate -@item -Create a @code{static awk_input_parser_t} variable and initialize it -appropriately. - -@item -When your extension is loaded, register your input parser with -@command{gawk} using the @code{register_input_parser()} API function -(described below). -@end enumerate - -An @code{awk_input_buf_t} looks like this: - -@example -typedef struct awk_input @{ - const char *name; /* filename */ - int fd; /* file descriptor */ -#define INVALID_HANDLE (-1) - void *opaque; /* private data for input parsers */ - int (*get_record)(char **out, struct awk_input *iobuf, - int *errcode, char **rt_start, size_t *rt_len); - void (*close_func)(struct awk_input *iobuf); - struct stat sbuf; /* stat buf */ -@} awk_input_buf_t; -@end example - -The fields can be divided into two categories: those for use (initially, -at least) by @code{@var{XXX}_can_take_file()}, and those for use by -@code{@var{XXX}_take_control_of()}. The first group of fields and their uses -are as follows: - -@table @code -@item const char *name; -The name of the file. - -@item int fd; -A file descriptor for the file. If @command{gawk} was able to -open the file, then @code{fd} will @emph{not} be equal to -@code{INVALID_HANDLE}. Otherwise, it will. - -@item struct stat sbuf; -If the file descriptor is valid, then @command{gawk} will have filled -in this structure via a call to the @code{fstat()} system call. -@end table - -The @code{@var{XXX}_can_take_file()} function should examine these -fields and decide if the input parser should be used for the file.
-The decision can be made based upon @command{gawk} state (the value -of a variable defined previously by the extension and set by -@command{awk} code), the name of the -file, whether or not the file descriptor is valid, the information -in the @code{struct stat}, or any combination of the above. - -Once @code{@var{XXX}_can_take_file()} has returned true, and -@command{gawk} has decided to use your input parser, it calls -@code{@var{XXX}_take_control_of()}. That function then fills in at -least the @code{get_record} field of the @code{awk_input_buf_t}. It must -also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of -the fields that may be filled by @code{@var{XXX}_take_control_of()} -are as follows: - -@table @code -@item void *opaque; -This is used to hold any state information needed by the input parser -for this file. It is ``opaque'' to @command{gawk}. The input parser -is not required to use this pointer. - -@item int@ (*get_record)(char@ **out, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len); -This function pointer should point to a function that creates the input -records. Said function is the core of the input parser. Its behavior -is described below. - -@item void (*close_func)(struct awk_input *iobuf); -This function pointer should point to a function that does -the ``tear down.'' It should release any resources allocated by -@code{@var{XXX}_take_control_of()}. It may also close the file. If it -does so, it should set the @code{fd} field to @code{INVALID_HANDLE}. - -If @code{fd} is still not @code{INVALID_HANDLE} after the call to this -function, @command{gawk} calls the regular @code{close()} system call. - -Having a ``tear down'' function is optional. If your input parser does -not need it, do not set this field. 
Then, @command{gawk} calls the -regular @code{close()} system call on the file descriptor, so it should -be valid. -@end table - -The @code{@var{XXX}_get_record()} function does the work of creating -input records. The parameters are as follows: - -@table @code -@item char **out -This is a pointer to a @code{char *} variable which is set to point -to the record. @command{gawk} makes its own copy of the data, so -the extension must manage this storage. - -@item struct awk_input *iobuf -This is the @code{awk_input_buf_t} for the file. The fields should be -used for reading data (@code{fd}) and for managing private state -(@code{opaque}), if any. - -@item int *errcode -If an error occurs, @code{*errcode} should be set to an appropriate -code from @code{<errno.h>}. - -@item char **rt_start -@itemx size_t *rt_len -If the concept of a ``record terminator'' makes sense, then -@code{*rt_start} should be set to point to the data to be used for -@code{RT}, and @code{*rt_len} should be set to the length of the -data. Otherwise, @code{*rt_len} should be set to zero. -@command{gawk} makes its own copy of this data, so the -extension must manage the storage. -@end table - -The return value is the length of the buffer pointed to by -@code{*out}, or @code{EOF} if end-of-file was reached or an -error occurred. - -It is guaranteed that @code{errcode} is a valid pointer, so there is no -need to test for a @code{NULL} value. @command{gawk} sets @code{*errcode} -to zero, so there is no need to set it unless an error occurs. - -If an error does occur, the function should return @code{EOF} and set -@code{*errcode} to a non-zero value. In that case, if @code{*errcode} -does not equal @minus{}1, @command{gawk} automatically updates -the @code{ERRNO} variable based on the value of @code{*errcode} (e.g., -setting @samp{*errcode = errno} should do the right thing).
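As a rough illustration of the contract just described, here is a minimal sketch of a record function that hands out newline-terminated records. It rests on several assumptions that are not part of the API: the structure is a trimmed stand-in for @code{awk_input_buf_t} (the @code{sbuf} member is omitted), and @code{struct line_state} and @code{line_get_record()} are hypothetical names. It also assumes that @code{@var{XXX}_take_control_of()} has already read the whole input into memory and stored the state in @code{opaque}; a real parser would more likely read from @code{fd} as it goes.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>    /* EOF */
#include <string.h>   /* memchr, size_t */

/* Trimmed stand-in for the awk_input_buf_t shown earlier.
 * ASSUMPTION: the sbuf member is left out to keep the sketch portable. */
typedef struct awk_input {
    const char *name;
    int fd;
    void *opaque;
    int (*get_record)(char **out, struct awk_input *iobuf,
                      int *errcode, char **rt_start, size_t *rt_len);
    void (*close_func)(struct awk_input *iobuf);
} awk_input_buf_t;

/* HYPOTHETICAL private state kept in iobuf->opaque. */
struct line_state {
    char *data;     /* entire input, owned by the extension */
    size_t len;     /* number of bytes in data */
    size_t pos;     /* offset of the next unread byte */
};

/* Return the next record, using a single newline as the terminator. */
static int
line_get_record(char **out, struct awk_input *iobuf,
                int *errcode, char **rt_start, size_t *rt_len)
{
    struct line_state *st;
    char *start, *nl;
    size_t left;

    assert(out != NULL && iobuf != NULL);
    assert(rt_start != NULL && rt_len != NULL);

    st = (struct line_state *) iobuf->opaque;
    *rt_len = 0;                    /* no terminator unless one is found */
    if (st == NULL) {
        *errcode = EINVAL;          /* error: report a code and return EOF */
        return EOF;
    }
    if (st->pos >= st->len)
        return EOF;                 /* end of input; *errcode stays zero */

    start = st->data + st->pos;
    left = st->len - st->pos;
    nl = memchr(start, '\n', left);
    *out = start;                   /* storage remains the extension's */
    if (nl == NULL) {               /* final record has no terminator */
        st->pos = st->len;
        return (int) left;
    }
    *rt_start = nl;                 /* data to be used for RT */
    *rt_len = 1;
    st->pos += (size_t) (nl - start) + 1;
    return (int) (nl - start);      /* record length, terminator excluded */
}
```

Because the records point into the extension's own buffer, this sketch relies on the statement above that @command{gawk} makes its own copy of the record and terminator data.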
- -@command{gawk} ships with a sample extension that reads directories, -returning records for each entry in the directory (@pxref{Extension -Sample Readdir}). You may wish to use that code as a guide for writing -your own input parser. - -When writing an input parser, you should think about (and document) -how it is expected to interact with @command{awk} code. You may want -it to always be called, and take effect as appropriate (as the -@code{readdir} extension does). Or you may want it to take effect -based upon the value of an @code{awk} variable, as the XML extension -from the @code{gawkextlib} project does (@pxref{gawkextlib}). -In the latter case, code in a @code{BEGINFILE} section -can look at @code{FILENAME} and @code{ERRNO} to decide whether or -not to activate an input parser (@pxref{BEGINFILE/ENDFILE}). - -You register your input parser with the following function: - -@table @code -@item void register_input_parser(awk_input_parser_t *input_parser); -Register the input parser pointed to by @code{input_parser} with -@command{gawk}. -@end table - -@node Output Wrappers -@subsubsection Customized Output Wrappers - -An @dfn{output wrapper} is the mirror image of an input parser. -It allows an extension to take over the output to a file opened -with the @samp{>} or @samp{>>} operators (@pxref{Redirection}). - -The output wrapper is very similar to the input parser structure: - -@example -typedef struct output_wrapper @{ - const char *name; /* name of the wrapper */ - awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); - awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); - awk_const struct output_wrapper *awk_const next; /* for use by gawk */ -@} awk_output_wrapper_t; -@end example - -The members are as follows: - -@table @code -@item const char *name; -This is the name of the output wrapper. 
- -@item awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf); -This points to a function that examines the information in -the @code{awk_output_buf_t} structure pointed to by @code{outbuf}. -It should return true if the output wrapper wants to take over the -file, and false otherwise. It should not change any state (variable -values, etc.) within @command{gawk}. - -@item awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf); -The function pointed to by this field is called when @command{gawk} -decides to let the output wrapper take control of the file. It should -fill in appropriate members of the @code{awk_output_buf_t} structure, -as described below, and return true if successful, false otherwise. - -@item awk_const struct output_wrapper *awk_const next; -This is for use by @command{gawk}. -@end table - -The @code{awk_output_buf_t} structure looks like this: - -@example -typedef struct @{ - const char *name; /* name of output file */ - const char *mode; /* mode argument to fopen */ - FILE *fp; /* stdio file pointer */ - awk_bool_t redirected; /* true if a wrapper is active */ - void *opaque; /* for use by output wrapper */ - size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, - FILE *fp, void *opaque); - int (*gawk_fflush)(FILE *fp, void *opaque); - int (*gawk_ferror)(FILE *fp, void *opaque); - int (*gawk_fclose)(FILE *fp, void *opaque); -@} awk_output_buf_t; -@end example - -Here too, your extension will define @code{@var{XXX}_can_take_file()} -and @code{@var{XXX}_take_control_of()} functions that examine and update -data members in the @code{awk_output_buf_t}. -The data members are as follows: - -@table @code -@item const char *name; -The name of the output file. - -@item const char *mode; -The mode string (as would be used in the second argument to @code{fopen()}) -with which the file was opened. - -@item FILE *fp; -The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file -before attempting to find an output wrapper. 
- -@item awk_bool_t redirected; -This field must be set to true by the @code{@var{XXX}_take_control_of()} function. - -@item void *opaque; -This pointer is opaque to @command{gawk}. The extension should use it to store -a pointer to any private data associated with the file. - -@item size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ FILE *fp, void *opaque); -@itemx int (*gawk_fflush)(FILE *fp, void *opaque); -@itemx int (*gawk_ferror)(FILE *fp, void *opaque); -@itemx int (*gawk_fclose)(FILE *fp, void *opaque); -These pointers should be set to point to functions that perform -the equivalent function as the @code{<stdio.h>} functions do, if appropriate. -@command{gawk} uses these function pointers for all output. -@command{gawk} initializes the pointers to point to internal, ``pass through'' -functions that just call the regular @code{<stdio.h>} functions, so an -extension only needs to redefine those functions that are appropriate for -what it does. -@end table - -The @code{@var{XXX}_can_take_file()} function should make a decision based -upon the @code{name} and @code{mode} fields, and any additional state -(such as @command{awk} variable values) that is appropriate. - -When @command{gawk} calls @code{@var{XXX}_take_control_of()}, it should fill -in the other fields, as appropriate, except for @code{fp}, which it should just -use normally. - -You register your output wrapper with the following function: - -@table @code -@item void register_output_wrapper(awk_output_wrapper_t *output_wrapper); -Register the output wrapper pointed to by @code{output_wrapper} with -@command{gawk}. -@end table - -@node Two-way processors -@subsubsection Customized Two-way Processors - -A @dfn{two-way processor} combines an input parser and an output wrapper for -two-way I/O with the @samp{|&} operator (@pxref{Redirection}). 
It makes identical -use of the @code{awk_input_buf_t} and @code{awk_output_buf_t} structures -as described earlier. - -A two-way processor is represented by the following structure: - -@example -typedef struct two_way_processor @{ - const char *name; /* name of the two-way processor */ - awk_bool_t (*can_take_two_way)(const char *name); - awk_bool_t (*take_control_of)(const char *name, - awk_input_buf_t *inbuf, - awk_output_buf_t *outbuf); - awk_const struct two_way_processor *awk_const next; /* for use by gawk */ -@} awk_two_way_processor_t; -@end example - -The fields are as follows: - -@table @code -@item const char *name; -The name of the two-way processor. - -@item awk_bool_t (*can_take_two_way)(const char *name); -This function returns true if it wants to take over two-way I/O for this filename. -It should not change any state (variable -values, etc.) within @command{gawk}. - -@item awk_bool_t (*take_control_of)(const char *name, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf); -This function should fill in the @code{awk_input_buf_t} and -@code{awk_output_buf_t} structures pointed to by @code{inbuf} and -@code{outbuf}, respectively. These structures were described earlier. - -@item awk_const struct two_way_processor *awk_const next; -This is for use by @command{gawk}. -@end table - -As with the input parser and output wrapper, you provide -``yes I can take this'' and ``take over for this'' functions, -@code{@var{XXX}_can_take_two_way()} and @code{@var{XXX}_take_control_of()}. - -You register your two-way processor with the following function: - -@table @code -@item void register_two_way_processor(awk_two_way_processor_t *two_way_processor); -Register the two-way processor pointed to by @code{two_way_processor} with -@command{gawk}.
-@end table - -@node Printing Messages -@subsection Printing Messages - -You can print different kinds of warning messages from your -extension, as described below. Note that for these functions, -you must pass in the extension id received from @command{gawk} -when the extension was loaded.@footnote{Because the API uses only ISO C 90 -features, it cannot make use of the ISO C 99 variadic macro feature to hide -that parameter. More's the pity.} - -@table @code -@item void fatal(awk_ext_id_t id, const char *format, ...); -Print a message and then cause @command{gawk} to exit immediately. - -@item void warning(awk_ext_id_t id, const char *format, ...); -Print a warning message. - -@item void lintwarn(awk_ext_id_t id, const char *format, ...); -Print a ``lint warning.'' Normally this is the same as printing a -warning message, but if @command{gawk} was invoked with @samp{--lint=fatal}, -then lint warnings become fatal error messages. -@end table - -All of these functions are otherwise like the C @code{printf()} -family of functions, where the @code{format} parameter is a string -with literal characters and formatting codes intermixed. - -@node Updating @code{ERRNO} -@subsection Updating @code{ERRNO} - -The following functions allow you to update the @code{ERRNO} -variable: - -@table @code -@item void update_ERRNO_int(int errno_val); -Set @code{ERRNO} to the string equivalent of the error code -in @code{errno_val}. The value should be one of the defined -error codes in @code{<errno.h>}, and @command{gawk} turns it -into a (possibly translated) string using the C @code{strerror()} function. - -@item void update_ERRNO_string(const char *string); -Set @code{ERRNO} directly to the string value of @code{string}. -@command{gawk} makes a copy of the value of @code{string}. - -@item void unset_ERRNO(); -Unset @code{ERRNO}.
-@end table - -@node Accessing Parameters -@subsection Accessing and Updating Parameters - -Two functions give you access to the arguments (parameters) -passed to your extension function. They are: - -@table @code -@item awk_bool_t get_argument(size_t count, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); -Fill in the @code{awk_value_t} structure pointed to by @code{result} -with the @code{count}'th argument. Return true if the actual -type matches @code{wanted}, false otherwise. In the latter -case, @code{result@w{->}val_type} indicates the actual type -(@pxref{table-value-types-returned}). Counts are zero based---the first -argument is numbered zero, the second one, and so on. @code{wanted} -indicates the type of value expected. - -@item awk_bool_t set_argument(size_t count, awk_array_t array); -Convert a parameter that was undefined into an array; this provides -call-by-reference for arrays. Return false if @code{count} is too big, -or if the argument's type is not undefined. @xref{Array Manipulation}, -for more information on creating arrays. -@end table - -@node Symbol Table Access -@subsection Symbol Table Access - -Two sets of routines provide access to global variables, and one set -allows you to create and release cached values. - -@menu -* Symbol table by name:: Accessing variables by name. -* Symbol table by cookie:: Accessing variables by ``cookie''. -* Cached values:: Creating and using cached values. -@end menu - -@node Symbol table by name -@subsubsection Variable Access and Update by Name - -The following routines provide the ability to access and update -global @command{awk}-level variables by name. In compiler terminology, -identifiers of different kinds are termed @dfn{symbols}, thus the ``sym'' -in the routines' names. The data structure which stores information -about symbols is termed a @dfn{symbol table}. 
- -@table @code -@item awk_bool_t sym_lookup(const char *name, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); -Fill in the @code{awk_value_t} structure pointed to by @code{result} -with the value of the variable named by the string @code{name}, which is -a regular C string. @code{wanted} indicates the type of value expected. -Return true if the actual type matches @code{wanted}, false otherwise. -In the latter case, @code{result->val_type} indicates the actual type -(@pxref{table-value-types-returned}). - -@item awk_bool_t sym_update(const char *name, awk_value_t *value); -Update the variable named by the string @code{name}, which is a regular -C string. The variable is added to @command{gawk}'s symbol table -if it is not there. Return true if everything worked, false otherwise. - -Changing types (scalar to array or vice versa) of an existing variable -is @emph{not} allowed, nor may this routine be used to update an array. -This routine cannot be used to update any of the predefined -variables (such as @code{ARGC} or @code{NF}). - -@item awk_bool_t sym_constant(const char *name, awk_value_t *value); -Create a variable named by the string @code{name}, which is -a regular C string, that has the constant value as given by -@code{value}. @command{awk}-level code cannot change the value of this -variable.@footnote{There (currently) is no @code{awk}-level feature that -provides this ability.} The extension may change the value of @code{name}'s -variable with subsequent calls to this routine, and may also convert -a variable created by @code{sym_update()} into a constant. However, -once a variable becomes a constant it cannot later be reverted into a -mutable variable. -@end table - -@node Symbol table by cookie -@subsubsection Variable Access and Update by Cookie - -A @dfn{scalar cookie} is an opaque handle that provides access -to a global variable or array.
It is an optimization that -avoids looking up variables in @command{gawk}'s symbol table every time -access is needed. This was discussed earlier, in @ref{General Data Types}. - -The following functions let you work with scalar cookies. - -@table @code -@item awk_bool_t sym_lookup_scalar(awk_scalar_t cookie, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); -Retrieve the current value of a scalar cookie. -Once you have obtained a scalar cookie using @code{sym_lookup()}, you can -use this function to get its value more efficiently. -Return false if the value cannot be retrieved. - -@item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value); -Update the value associated with a scalar cookie. Return false if -the new value is not one of @code{AWK_STRING} or @code{AWK_NUMBER}. -Here too, the built-in variables may not be updated. -@end table - -It is not obvious at first glance how to work with scalar cookies or -what their @i{raison d'etre} really is. In theory, the @code{sym_lookup()} -and @code{sym_update()} routines are all you really need to work with -variables. For example, you might have code that looked up the value of -a variable, evaluated a condition, and then possibly changed the value -of the variable based on the result of that evaluation, like so: - -@example -/* do_magic --- do something really great */ - -static awk_value_t * -do_magic(int nargs, awk_value_t *result) -@{ - awk_value_t value; - - if ( sym_lookup("MAGIC_VAR", AWK_NUMBER, & value) - && some_condition(value.num_value)) @{ - value.num_value += 42; - sym_update("MAGIC_VAR", & value); - @} - - return make_number(0.0, result); -@} -@end example - -@noindent -This code looks (and is) simple and straightforward. So what's the problem?
- -Consider what happens if @command{awk}-level code associated with your -extension calls the @code{magic()} function (implemented in C by @code{do_magic()}), -once per record, while processing hundreds of thousands or millions of records. -The @code{MAGIC_VAR} variable is looked up in the symbol table once or twice per function call! - -The symbol table lookup is really pure overhead; it is considerably more efficient -to get a cookie that represents the variable, and use that to get the variable's -value and update it as needed.@footnote{The difference is measurable and quite real. Trust us.} - -Thus, the way to use cookies is as follows. First, install your extension's variable -in @command{gawk}'s symbol table using @code{sym_update()}, as usual. Then get a -scalar cookie for the variable using @code{sym_lookup()}: - -@example -static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */ - -static void -my_extension_init() -@{ - awk_value_t value; - - /* install initial value */ - sym_update("MAGIC_VAR", make_number(42.0, & value)); - - /* get cookie */ - sym_lookup("MAGIC_VAR", AWK_SCALAR, & value); - - /* save the cookie */ - magic_var_cookie = value.scalar_cookie; - @dots{} -@} -@end example - -Next, use the routines in this section for retrieving and updating -the value through the cookie. Thus, @code{do_magic()} now becomes -something like this: - -@example -/* do_magic --- do something really great */ - -static awk_value_t * -do_magic(int nargs, awk_value_t *result) -@{ - awk_value_t value; - - if ( sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value) - && some_condition(value.num_value)) @{ - value.num_value += 42; - sym_update_scalar(magic_var_cookie, & value); - @} - @dots{} - - return make_number(0.0, result); -@} -@end example - -@quotation NOTE -The previous code omitted error checking for -presentation purposes. Your extension code should be more robust -and carefully check the return values from the API functions. 
-@end quotation - -@node Cached values -@subsubsection Creating and Using Cached Values - -The routines in this section allow you to create and release -cached values. As with scalar cookies, in theory, cached values -are not necessary. You can create numbers and strings using -the functions in @ref{Constructor Functions}. You can then -assign those values to variables using @code{sym_update()} -or @code{sym_update_scalar()}, as you like. - -However, you can understand the point of cached values if you remember that -@emph{every} string value's storage @emph{must} come from @code{malloc()}. -If you have 20 variables, all of which have the same string value, you -must create 20 identical copies of the string.@footnote{Numeric values -are clearly less problematic, requiring only a C @code{double} to store.} - -It is clearly more efficient, if possible, to create a value once, and -then tell @command{gawk} to reuse the value for multiple variables. That -is what the routines in this section let you do. The functions are as follows: - -@table @code -@item awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result); -Create a cached string or numeric value from @code{value} for efficient later -assignment. -Only @code{AWK_NUMBER} and @code{AWK_STRING} values are allowed. Any other type -is rejected. While @code{AWK_UNDEFINED} could be allowed, doing so would -result in inferior performance. - -@item awk_bool_t release_value(awk_value_cookie_t vc); -Release the memory associated with a value cookie obtained -from @code{create_value()}. -@end table - -You use value cookies in a fashion similar to the way you use scalar cookies. 
-In the extension initialization routine, you create the value cookie: - -@example -static awk_value_cookie_t answer_cookie; /* static value cookie */ - -static void -my_extension_init() -@{ - awk_value_t value; - char *long_string; - size_t long_string_len; - - /* code from earlier */ - @dots{} - /* @dots{} fill in long_string and long_string_len @dots{} */ - make_malloced_string(long_string, long_string_len, & value); - create_value(& value, & answer_cookie); /* create cookie */ - @dots{} -@} -@end example - -Once the value is created, you can use it as the value of any number -of variables: - -@example -static awk_value_t * -do_magic(int nargs, awk_value_t *result) -@{ - awk_value_t value; - - @dots{} /* as earlier */ - - value.val_type = AWK_VALUE_COOKIE; - value.value_cookie = answer_cookie; - sym_update("VAR1", & value); - sym_update("VAR2", & value); - @dots{} - sym_update("VAR100", & value); - @dots{} -@} -@end example - -@noindent -Using value cookies in this way saves considerable storage, since all of -@code{VAR1} through @code{VAR100} share the same value. - -You might be wondering, ``Is this sharing problematic? -What happens if @command{awk} code assigns a new value to @code{VAR1}, -are all the others changed too?'' - -That's a great question. The answer is that no, it's not a problem. -@command{gawk} is smart enough to avoid such problems. - -Finally, as part of your clean up action (@pxref{Exit Callback Functions}) -you should release any cached values that you created, using -@code{release_value()}. - -@node Array Manipulation -@subsection Array Manipulation - -The primary data structure@footnote{Okay, the only data structure.} in @command{awk} -is the associative array (@pxref{Arrays}). -Extensions need to be able to manipulate @command{awk} arrays. -The API provides a number of data structures for working with arrays, -functions for working with individual elements, and functions for -working with arrays as a whole.
This includes the ability to -``flatten'' an array so that it is easy for C code to traverse -every element in an array. The array data structures integrate -nicely with the data structures for values to make it easy to -both work with and create true arrays of arrays (@pxref{General Data Types}). - -@menu -* Array Data Types:: Data types for working with arrays. -* Array Functions:: Functions for working with arrays. -* Flattening Arrays:: How to flatten arrays. -* Creating Arrays:: How to create and populate arrays. -@end menu - -@node Array Data Types -@subsubsection Array Data Types - -The data types associated with arrays are listed below. - -@table @code -@item typedef void *awk_array_t; -If you request the value of an array variable, you get back an -@code{awk_array_t} value. This value is opaque@footnote{It is also -a ``cookie,'' but the @command{gawk} developers did not wish to overuse this -term.} to the extension; it uniquely identifies the array but can -only be used by passing it into API functions or receiving it from API -functions. This is very similar to the way @samp{FILE *} values are used -with the @code{<stdio.h>} library routines. - - -@item typedef struct awk_element @{ -@itemx @ @ @ @ /* convenience linked list pointer, not used by gawk */ -@itemx @ @ @ @ struct awk_element *next; -@itemx @ @ @ @ enum @{ -@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */ -@itemx @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if should be deleted */ -@itemx @ @ @ @ @} flags; -@itemx @ @ @ @ awk_value_t index; -@itemx @ @ @ @ awk_value_t value; -@itemx @} awk_element_t; -The @code{awk_element_t} is a ``flattened'' -array element. @command{awk} produces an array of these -inside the @code{awk_flat_array_t} (see the next item). -Individual elements may be marked for deletion. New elements must be added -individually, one at a time, using the separate API for that purpose.
-The fields are as follows: - -@c nested table -@table @code -@item struct awk_element *next; -This pointer is for the convenience of extension writers. It allows -an extension to create a linked list of new elements which can then be -added to an array in a loop that traverses the list. - -@item enum @{ @dots{} @} flags; -A set of flag values that convey information between @command{gawk} -and the extension. Currently there is only one: @code{AWK_ELEMENT_DELETE}, -which the extension can set to cause @command{gawk} to delete the -element from the original array upon release of the flattened array. - -@item index -@itemx value -The index and value of the element, respectively. -@emph{All} memory pointed to by @code{index} and @code{value} belongs to @command{gawk}. -@end table - -@item typedef struct awk_flat_array @{ -@itemx @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */ -@itemx @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */ -@itemx @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */ -@itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */ -@itemx @} awk_flat_array_t; -This is a flattened array. When an extension gets one of these -from @command{gawk}, the @code{elements} array is of actual -size @code{count}. -The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk}; -therefore they are marked @code{awk_const} so that the extension cannot -modify them. -@end table - -@node Array Functions -@subsubsection Array Functions - -The following functions relate to individual array elements. - -@table @code -@item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count); -For the array represented by @code{a_cookie}, return in @code{*count} -the number of elements it contains. A subarray counts as a single element. -Return false if there is an error. 
- -@item awk_bool_t get_array_element(awk_array_t a_cookie, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t *const index, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); -For the array represented by @code{a_cookie}, return in @code{*result} -the value of the element whose index is @code{index}. -@code{wanted} specifies the type of value you wish to retrieve. -Return false if @code{wanted} does not match the actual type or if -@code{index} is not in the array (@pxref{table-value-types-returned}). - -The value for @code{index} can be numeric, in which case @command{gawk} -converts it to a string. Using non-integral values is possible, but -requires that you understand how such values are converted to strings -(@pxref{Conversion}); thus using integral values is safest. - -As with @emph{all} strings passed into @code{gawk} from an extension, -the string value of @code{index} must come from @code{malloc()}, and -@command{gawk} releases the storage. - -@item awk_bool_t set_array_element(awk_array_t a_cookie, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const value); -In the array represented by @code{a_cookie}, create or modify -the element whose index is given by @code{index}. -The @code{ARGV} and @code{ENVIRON} arrays may not be changed. - -@item awk_bool_t set_array_element_by_elem(awk_array_t a_cookie, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_element_t element); -Like @code{set_array_element()}, but take the @code{index} and @code{value} -from @code{element}. This is a convenience macro. 
- -@item awk_bool_t del_array_element(awk_array_t a_cookie, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t* const index); -Remove the element with the given index from the array -represented by @code{a_cookie}. -Return true if the element was removed, or false if the element did -not exist in the array. -@end table - -The following functions relate to arrays as a whole: - -@table @code -@item awk_array_t create_array(); -Create a new array to which elements may be added. -@xref{Creating Arrays}, for a discussion of how to -create a new array and add elements to it. - -@item awk_bool_t clear_array(awk_array_t a_cookie); -Clear the array represented by @code{a_cookie}. -Return false if there was some kind of problem, true otherwise. -The array remains an array, but after calling this function, it -has no elements. This is equivalent to using the @code{delete} -statement (@pxref{Delete}). - -@item awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data); -For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t} -structure and fill it in. Set the pointer whose address is passed as @code{data} -to point to this structure. -Return true upon success, or false otherwise. -@xref{Flattening Arrays}, for a discussion of how to -flatten an array and work with it. - -@item awk_bool_t release_flattened_array(awk_array_t a_cookie, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data); -When done with a flattened array, release the storage using this function. -You must pass in both the original array cookie, and the address of -the created @code{awk_flat_array_t} structure. -The function returns true upon success, false otherwise. 
-@end table - -@node Flattening Arrays -@subsubsection Working With All The Elements of an Array - -To @dfn{flatten} an array is to create a structure that -represents the full array in a fashion that makes it easy -for C code to traverse the entire array. Test code -in @file{extension/testext.c} does this, and also serves -as a nice example to show how to use the APIs. - -First, the @command{gawk} script that drives the test extension: - -@example -@@load "testext" -BEGIN @{ - n = split("blacky rusty sophie raincloud lucky", pets) - printf "pets has %d elements\n", length(pets) - ret = dump_array_and_delete("pets", "3") - printf "dump_array_and_delete(pets) returned %d\n", ret - if ("3" in pets) - printf("dump_array_and_delete() did NOT remove index \"3\"!\n") - else - printf("dump_array_and_delete() did remove index \"3\"!\n") - print "" -@} -@end example - -@noindent -This code creates an array with @code{split()} (@pxref{String Functions}) -and then calls @code{dump_array_and_delete()}. That function looks up -the array whose name is passed as the first argument, and -deletes the element at the index passed in the second argument. -It then prints the return value and checks if the element -was indeed deleted. Here is the C code that implements -@code{dump_array_and_delete()}. It has been edited slightly for -presentation. - -The first part declares variables, sets up the default -return value in @code{result}, and checks that the function -was called with the correct number of arguments: - -@example -static awk_value_t * -dump_array_and_delete(int nargs, awk_value_t *result) -@{ - awk_value_t value, value2, value3; - awk_flat_array_t *flat_array; - size_t count; - char *name; - int i; - - assert(result != NULL); - make_number(0.0, result); - - if (nargs != 2) @{ - printf("dump_array_and_delete: nargs not right " - "(%d should be 2)\n", nargs); - goto out; - @} -@end example - -The function then proceeds in steps, as follows.
First, retrieve -the name of the array, passed as the first argument. Then -retrieve the array itself. If either operation fails, print -error messages and return: - -@example - /* get argument named array as flat array and print it */ - if (get_argument(0, AWK_STRING, & value)) @{ - name = value.str_value.str; - if (sym_lookup(name, AWK_ARRAY, & value2)) - printf("dump_array_and_delete: sym_lookup of %s passed\n", - name); - else @{ - printf("dump_array_and_delete: sym_lookup of %s failed\n", - name); - goto out; - @} - @} else @{ - printf("dump_array_and_delete: get_argument(0) failed\n"); - goto out; - @} -@end example - -For testing purposes and to make sure that the C code sees -the same number of elements as the @command{awk} code, -the second step is to get the count of elements in the array -and print it: - -@example - if (! get_element_count(value2.array_cookie, & count)) @{ - printf("dump_array_and_delete: get_element_count failed\n"); - goto out; - @} - - printf("dump_array_and_delete: incoming size is %lu\n", - (unsigned long) count); -@end example - -The third step is to actually flatten the array, and then -to double check that the count in the @code{awk_flat_array_t} -is the same as the count just retrieved: - -@example - if (! flatten_array(value2.array_cookie, & flat_array)) @{ - printf("dump_array_and_delete: could not flatten array\n"); - goto out; - @} - - if (flat_array->count != count) @{ - printf("dump_array_and_delete: flat_array->count (%lu)" - " != count (%lu)\n", - (unsigned long) flat_array->count, - (unsigned long) count); - goto out; - @} -@end example - -The fourth step is to retrieve the index of the element -to be deleted, which was passed as the second argument. -Remember that argument counts passed to @code{get_argument()} -are zero-based, thus the second argument is numbered one: - -@example - if (! 
get_argument(1, AWK_STRING, & value3)) @{ - printf("dump_array_and_delete: get_argument(1) failed\n"); - goto out; - @} -@end example - -The fifth step is where the ``real work'' is done. The function -loops over every element in the array, printing the index and -element values. In addition, upon finding the element with the -index that is supposed to be deleted, the function sets the -@code{AWK_ELEMENT_DELETE} bit in the @code{flags} field -of the element. When the array is released, @command{gawk} -traverses the flattened array, and deletes any elements that -have this flag bit set: - -@example - for (i = 0; i < flat_array->count; i++) @{ - printf("\t%s[\"%.*s\"] = %s\n", - name, - (int) flat_array->elements[i].index.str_value.len, - flat_array->elements[i].index.str_value.str, - valrep2str(& flat_array->elements[i].value)); - - if (strcmp(value3.str_value.str, - flat_array->elements[i].index.str_value.str) - == 0) @{ - flat_array->elements[i].flags |= AWK_ELEMENT_DELETE; - printf("dump_array_and_delete: marking element \"%s\" " - "for deletion\n", - flat_array->elements[i].index.str_value.str); - @} - @} -@end example - -The sixth step is to release the flattened array. This tells -@command{gawk} that the extension is no longer using the array, -and that it should delete any elements marked for deletion. -@command{gawk} also frees any storage that was allocated, -so you should not use the pointer (@code{flat_array} in this -code) once you have called @code{release_flattened_array()}: - -@example - if (!
release_flattened_array(value2.array_cookie, flat_array)) @{ - printf("dump_array_and_delete: could not release flattened array\n"); - goto out; - @} -@end example - -Finally, since everything was successful, the function sets the -return value to success, and returns: - -@example - make_number(1.0, result); -out: - return result; -@} -@end example - -Here is the output from running this part of the test: - -@example -pets has 5 elements -dump_array_and_delete: sym_lookup of pets passed -dump_array_and_delete: incoming size is 5 - pets["1"] = "blacky" - pets["2"] = "rusty" - pets["3"] = "sophie" -dump_array_and_delete: marking element "3" for deletion - pets["4"] = "raincloud" - pets["5"] = "lucky" -dump_array_and_delete(pets) returned 1 -dump_array_and_delete() did remove index "3"! -@end example - -@node Creating Arrays -@subsubsection How To Create and Populate Arrays - -Besides working with arrays created by @command{awk} code, you can -create arrays and populate them as you see fit, and then @command{awk} -code can access them and manipulate them. - -There are two important points about creating arrays from extension code: - -@enumerate 1 -@item -You must install a new array into @command{gawk}'s symbol -table immediately upon creating it. Once you have done so, -you can then populate the array. - -@ignore -Strictly speaking, this is required only -for arrays that will have subarrays as elements; however it is -a good idea to always do this. This restriction may be relaxed -in a subsequent revision of the API. -@end ignore - -Similarly, if installing a new array as a subarray of an existing array, -you must add the new array to its parent before adding any elements to it. - -Thus, the correct way to build an array is to work ``top down.'' Create -the array, and immediately install it in @command{gawk}'s symbol table -using @code{sym_update()}, or install it as an element in a previously -existing array using @code{set_array_element()}.
Example code is coming shortly. - -@item -Due to @command{gawk} internals, after using @code{sym_update()} to install an array -into @command{gawk}, you have to retrieve the array cookie from the value -passed in to @code{sym_update()} before doing anything else with it, like so: - -@example -awk_value_t val; -awk_array_t new_array; - -new_array = create_array(); -val.val_type = AWK_ARRAY; -val.array_cookie = new_array; - -/* install array in the symbol table */ -sym_update("array", & val); - -new_array = val.array_cookie; /* YOU MUST DO THIS */ -@end example - -If installing an array as a subarray, you must also retrieve the value -of the array cookie after the call to @code{set_array_element()}. -@end enumerate - -The following C code is a simple test extension to create an array -with two regular elements and with a subarray. The leading @samp{#include} -directives and boilerplate variable declarations are omitted for brevity. -The first step is to create a new array and then install it -in the symbol table: - -@example -@ignore -#ifdef HAVE_CONFIG_H -#include <config.h> -#endif - -#include <stdio.h> -#include <assert.h> -#include <errno.h> -#include <stdlib.h> -#include <string.h> -#include <unistd.h> - -#include <sys/types.h> -#include <sys/stat.h> - -#include "gawkapi.h" - -static const gawk_api_t *api; /* for convenience macros to work */ -static awk_ext_id_t *ext_id; -static const char *ext_version = "testarray extension: version 1.0"; - -int plugin_is_GPL_compatible; - -@end ignore -/* create_new_array --- create a named array */ - -static void -create_new_array() -@{ - awk_array_t a_cookie; - awk_array_t subarray; - awk_value_t index, value; - - a_cookie = create_array(); - value.val_type = AWK_ARRAY; - value.array_cookie = a_cookie; - - if (!
sym_update("new_array", & value)) - printf("create_new_array: sym_update(\"new_array\") failed!\n"); - a_cookie = value.array_cookie; -@end example - -@noindent -Note how @code{a_cookie} is reset from the @code{array_cookie} field in -the @code{value} structure. - -The second step is to install two regular values into @code{new_array}: - -@example - (void) make_const_string("hello", 5, & index); - (void) make_const_string("world", 5, & value); - if (! set_array_element(a_cookie, & index, & value)) @{ - printf("fill_in_array: set_array_element failed\n"); - return; - @} - - (void) make_const_string("answer", 6, & index); - (void) make_number(42.0, & value); - if (! set_array_element(a_cookie, & index, & value)) @{ - printf("fill_in_array: set_array_element failed\n"); - return; - @} -@end example - -The third step is to create the subarray and install it: - -@example - (void) make_const_string("subarray", 8, & index); - subarray = create_array(); - value.val_type = AWK_ARRAY; - value.array_cookie = subarray; - if (! set_array_element(a_cookie, & index, & value)) @{ - printf("fill_in_array: set_array_element failed\n"); - return; - @} - subarray = value.array_cookie; -@end example - -The final step is to populate the subarray with its own element: - -@example - (void) make_const_string("foo", 3, & index); - (void) make_const_string("bar", 3, & value); - if (! 
set_array_element(subarray, & index, & value)) @{ - printf("fill_in_array: set_array_element failed\n"); - return; - @} -@} -@ignore -static awk_ext_func_t func_table[] = @{ - @{ NULL, NULL, 0 @} -@}; - -/* init_testarray --- additional initialization function */ - -static awk_bool_t init_testarray(void) -@{ - create_new_array(); - - return 1; -@} - -static awk_bool_t (*init_func)(void) = init_testarray; - -dl_load_func(func_table, testarray, "") -@end ignore -@end example - -Here is a sample script that loads the extension -and then dumps the array: - -@example -@@load "subarray" - -function dumparray(name, array, i) -@{ - for (i in array) - if (isarray(array[i])) - dumparray(name "[\"" i "\"]", array[i]) - else - printf("%s[\"%s\"] = %s\n", name, i, array[i]) -@} - -BEGIN @{ - dumparray("new_array", new_array); -@} -@end example - -Here is the result of running the script: - -@example -$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk} -@print{} new_array["subarray"]["foo"] = bar -@print{} new_array["hello"] = world -@print{} new_array["answer"] = 42 -@end example - -@noindent -(@xref{Finding Extensions}, for more information on the -@env{AWKLIBPATH} environment variable.) - -@node Extension API Variables -@subsection API Variables - -The API provides two sets of variables. The first provides information -about the version of the API (both with which the extension was compiled, -and with which @command{gawk} was compiled). The second provides -information about how @command{gawk} was invoked. - -@menu -* Extension Versioning:: API Version information. -* Extension API Informational Variables:: Variables providing information about - @command{gawk}'s invocation. -@end menu - -@node Extension Versioning -@subsubsection API Version Constants and Variables - -The API provides both a ``major'' and a ``minor'' version number. -The API versions are available at compile time as constants: - -@table @code -@item GAWK_API_MAJOR_VERSION -The major version of the API.
- -@item GAWK_API_MINOR_VERSION -The minor version of the API. -@end table - -The minor version increases when new functions are added to the API. Such -new functions are always added to the end of the API @code{struct}. - -The major version increases (and the minor version is reset to zero) if any -of the data types change size or member order, or if any of the existing -functions change signature. - -It could happen that an extension may be compiled against one version -of the API but loaded by a version of @command{gawk} using a different -version. For this reason, the major and minor API versions of the -running @command{gawk} are included in the API @code{struct} as read-only -constant integers: - -@table @code -@item api->major_version -The major version of the running @command{gawk}. - -@item api->minor_version -The minor version of the running @command{gawk}. -@end table - -It is up to the extension to decide if there are API incompatibilities. -Typically a check like this is enough: - -@example -if (api->major_version != GAWK_API_MAJOR_VERSION - || api->minor_version < GAWK_API_MINOR_VERSION) @{ - fprintf(stderr, "foo_extension: version mismatch with gawk!\n"); - fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n", - GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION, - api->major_version, api->minor_version); - exit(1); -@} -@end example - -Such code is included in the boilerplate @code{dl_load_func()} macro -provided in @file{gawkapi.h} (discussed later, in -@ref{Extension API Boilerplate}). - -@node Extension API Informational Variables -@subsubsection Informational Variables - -The API provides access to several variables that describe -whether the corresponding command-line options were enabled when -@command{gawk} was invoked. The variables are: - -@table @code -@item do_lint -This variable is true if @command{gawk} was invoked with the @option{--lint} option -(@pxref{Options}).
- -@item do_traditional -This variable is true if @command{gawk} was invoked with the @option{--traditional} option. - -@item do_profile -This variable is true if @command{gawk} was invoked with the @option{--profile} option. - -@item do_sandbox -This variable is true if @command{gawk} was invoked with the @option{--sandbox} option. - -@item do_debug -This variable is true if @command{gawk} was invoked with the @option{--debug} option. - -@item do_mpfr -This variable is true if @command{gawk} was invoked with the @option{--bignum} option. -@end table - -The value of @code{do_lint} can change if @command{awk} code -modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}). -The others should not change during execution. - -@node Extension API Boilerplate -@subsection Boilerplate Code - -As mentioned earlier (@pxref{Extension Mechanism Outline}), the function -definitions as presented are really macros. To use these macros, your -extension must provide a small amount of boilerplate code (variables and -functions) towards the top of your source file, using pre-defined names -as described below. The boilerplate needed is also provided in comments -in the @file{gawkapi.h} header file: - -@example -/* Boiler plate code: */ -int plugin_is_GPL_compatible; - -static gawk_api_t *const api; -static awk_ext_id_t ext_id; -static const char *ext_version = NULL; /* or @dots{} = "some string" */ - -static awk_ext_func_t func_table[] = @{ - @{ "name", do_name, 1 @}, - /* @dots{} */ -@}; - -/* EITHER: */ - -static awk_bool_t (*init_func)(void) = NULL; - -/* OR: */ - -static awk_bool_t -init_my_module(void) -@{ - @dots{} -@} - -static awk_bool_t (*init_func)(void) = init_my_module; - -dl_load_func(func_table, some_name, "name_space_in_quotes") -@end example - -These variables and functions are as follows: - -@table @code -@item int plugin_is_GPL_compatible; -This asserts that the extension is compatible with the GNU GPL -(@pxref{Copying}).
If your extension does not have this, @command{gawk} -will not load it (@pxref{Plugin License}). - -@item static gawk_api_t *const api; -This global @code{static} variable should be set to point to -the @code{gawk_api_t} pointer that @command{gawk} passes to your -@code{dl_load()} function. This variable is used by all of the macros. - -@item static awk_ext_id_t ext_id; -This global static variable should be set to the @code{awk_ext_id_t} -value that @command{gawk} passes to your @code{dl_load()} function. -This variable is used by all of the macros. - -@item static const char *ext_version = NULL; /* or @dots{} = "some string" */ -This global @code{static} variable should be set either -to @code{NULL}, or to point to a string giving the name and version of -your extension. - -@item static awk_ext_func_t func_table[] = @{ @dots{} @}; -This is an array of one or more @code{awk_ext_func_t} structures -as described earlier (@pxref{Extension Functions}). -It can then be looped over for multiple calls to -@code{add_ext_func()}. - -@item static awk_bool_t (*init_func)(void) = NULL; -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @r{OR} -@itemx static awk_bool_t init_my_module(void) @{ @dots{} @} -@itemx static awk_bool_t (*init_func)(void) = init_my_module; -If you need to do some initialization work, you should define a -function that does it (creates variables, opens files, etc.) -and then define the @code{init_func} pointer to point to your -function. -The function should return zero (false) upon failure, non-zero -(success) if everything goes well. - -If you don't need to do any initialization, define the pointer and -initialize it to @code{NULL}. - -@item dl_load_func(func_table, some_name, "name_space_in_quotes") -This macro expands to a @code{dl_load()} function that performs -all the necessary initializations. 
-@end table - -The point of all the variables and arrays is to let the -@code{dl_load()} function (from the @code{dl_load_func()} -macro) do all the standard work. It does the following: - -@enumerate 1 -@item -Check the API versions. If the extension major version does not match -@command{gawk}'s, or if the extension minor version is greater than -@command{gawk}'s, it prints a fatal error message and exits. - -@item -Load the functions defined in @code{func_table}. -If any of them fails to load, it prints a warning message but -continues on. - -@item -If the @code{init_func} pointer is not @code{NULL}, call the -function it points to. If it returns zero (false), print a -warning message. - -@item -If @code{ext_version} is not @code{NULL}, register -the version string with @command{gawk}. -@end enumerate - -@node Finding Extensions -@subsection How @command{gawk} Finds Extensions - -Compiled extensions have to be installed in a directory where -@command{gawk} can find them. If @command{gawk} is configured and -built in the default fashion, the directory in which to find -extensions is @file{/usr/local/lib/gawk}. You can also specify a search -path with a list of directories to search for compiled extensions. -@xref{AWKLIBPATH Variable}, for more information. - -@node Extension Example -@section Example: Some File Functions - -@quotation -@i{No matter where you go, there you are.} @* -Buckaroo Banzai -@end quotation - -@c It's enough to show chdir and stat, no need for fts - -Two useful functions that are not in @command{awk} are @code{chdir()} (so -that an @command{awk} program can change its directory) and @code{stat()} -(so that an @command{awk} program can gather information about a file). -This @value{SECTION} implements these functions for @command{gawk} -in an extension. - -@menu -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension. -@end menu - -@node Internal File Description -@subsection Using @code{chdir()} and @code{stat()} - -This @value{SECTION} shows how to use the new functions at -the @command{awk} level once they've been integrated into the -running @command{gawk} interpreter. Using @code{chdir()} is very -straightforward. It takes one argument, the new directory to change to: - -@example -@@load "filefuncs" -@dots{} -newdir = "/home/arnold/funstuff" -ret = chdir(newdir) -if (ret < 0) @{ - printf("could not change to %s: %s\n", - newdir, ERRNO) > "/dev/stderr" - exit 1 -@} -@dots{} -@end example - -The return value is negative if the @code{chdir()} failed, and -@code{ERRNO} (@pxref{Built-in Variables}) is set to a string indicating -the error. - -Using @code{stat()} is a bit more complicated. The C @code{stat()} -function fills in a structure that has a fair amount of information. -The right way to model this in @command{awk} is to fill in an associative -array with the appropriate information: - -@c broke printf for page breaking -@example -file = "/home/arnold/.profile" -ret = stat(file, fdata) -if (ret < 0) @{ - printf("could not stat %s: %s\n", - file, ERRNO) > "/dev/stderr" - exit 1 -@} -printf("size of %s is %d bytes\n", file, fdata["size"]) -@end example - -The @code{stat()} function always clears the data array, even if -the @code{stat()} fails. It fills in the following elements: - -@table @code -@item "name" -The name of the file that was @code{stat()}'ed. - -@item "dev" -@itemx "ino" -The file's device and inode numbers, respectively. - -@item "mode" -The file's mode, as a numeric value. This includes both the file's -type and its permissions. - -@item "nlink" -The number of hard links (directory entries) the file has. - -@item "uid" -@itemx "gid" -The numeric user and group ID numbers of the file's owner. - -@item "size" -The size in bytes of the file. 
- -@item "blocks" -The number of disk blocks the file actually occupies. This may not -be a function of the file's size if the file has holes. - -@item "atime" -@itemx "mtime" -@itemx "ctime" -The file's last access, modification, and inode update times, -respectively. These are numeric timestamps, suitable for formatting -with @code{strftime()} -(@pxref{Time Functions}). - -@item "pmode" -The file's ``printable mode.'' This is a string representation of -the file's type and permissions, such as is produced by -@samp{ls -l}---for example, @code{"drwxr-xr-x"}. - -@item "type" -A printable string representation of the file's type. The value -is one of the following: - -@table @code -@item "blockdev" -@itemx "chardev" -The file is a block or character device (``special file''). - -@ignore -@item "door" -The file is a Solaris ``door'' (special file used for -interprocess communications). -@end ignore - -@item "directory" -The file is a directory. - -@item "fifo" -The file is a named-pipe (also known as a FIFO). - -@item "file" -The file is just a regular file. - -@item "socket" -The file is an @code{AF_UNIX} (``Unix domain'') socket in the -filesystem. - -@item "symlink" -The file is a symbolic link. -@end table -@end table - -Several additional elements may be present depending upon the operating -system and the type of the file. You can test for them in your @command{awk} -program by using the @code{in} operator -(@pxref{Reference to Elements}): - -@table @code -@item "blksize" -The preferred block size for I/O to the file. This field is not -present on all POSIX-like systems in the C @code{stat} structure. - -@item "linkval" -If the file is a symbolic link, this element is the name of the -file the link points to (i.e., the value of the link). - -@item "rdev" -@itemx "major" -@itemx "minor" -If the file is a block or character device file, then these values -represent the numeric device number and the major and minor components -of that number, respectively. 
-@end table - -@node Internal File Ops -@subsection C Code for @code{chdir()} and @code{stat()} - -Here is the C code for these extensions.@footnote{This version is -edited slightly for presentation. See @file{extension/filefuncs.c} -in the @command{gawk} distribution for the complete version.} - -The file includes a number of standard header files, and then includes -the @file{gawkapi.h} header file which provides the API definitions. -Those are followed by the necessary variable declarations -to make use of the API macros and boilerplate code -(@pxref{Extension API Boilerplate}). - -@c break line for page breaking -@example -#ifdef HAVE_CONFIG_H -#include <config.h> -#endif - -#include <stdio.h> -#include <assert.h> -#include <errno.h> -#include <stdlib.h> -#include <string.h> -#include <unistd.h> - -#include <sys/types.h> -#include <sys/stat.h> - -#include "gawkapi.h" - -#include "gettext.h" -#define _(msgid) gettext(msgid) -#define N_(msgid) msgid - -#include "gawkfts.h" -#include "stack.h" - -static const gawk_api_t *api; /* for convenience macros to work */ -static awk_ext_id_t *ext_id; -static awk_bool_t init_filefuncs(void); -static awk_bool_t (*init_func)(void) = init_filefuncs; -static const char *ext_version = "filefuncs extension: version 1.0"; - -int plugin_is_GPL_compatible; -@end example - -@cindex programming conventions, @command{gawk} internals -By convention, for an @command{awk} function @code{foo()}, the C function -that implements it is called @code{do_foo()}. The function should have -two arguments: the first is an @code{int} usually called @code{nargs}, -that represents the number of actual arguments for the function. -The second is a pointer to an @code{awk_value_t}, usually named -@code{result}. 
- -@example -/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ - -static awk_value_t * -do_chdir(int nargs, awk_value_t *result) -@{ - awk_value_t newdir; - int ret = -1; - - assert(result != NULL); - - if (do_lint && nargs != 1) - lintwarn(ext_id, - _("chdir: called with incorrect number of arguments, " - "expecting 1")); -@end example - -The @code{newdir} -variable represents the new directory to change to, retrieved -with @code{get_argument()}. Note that the first argument is -numbered zero. - -If the argument is retrieved successfully, the function calls the -@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} -is updated. - -@example - if (get_argument(0, AWK_STRING, & newdir)) @{ - ret = chdir(newdir.str_value.str); - if (ret < 0) - update_ERRNO_int(errno); - @} -@end example - -Finally, the function returns the return value to the @command{awk} level: - -@example - return make_number(ret, result); -@} -@end example - -The @code{stat()} built-in is more involved. First comes a function -that turns a numeric mode into a printable representation -(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity: - -@c break line for page breaking -@example -/* format_mode --- turn a stat mode field into something readable */ - -static char * -format_mode(unsigned long fmode) -@{ - @dots{} -@} -@end example - -Next comes a function for reading symbolic links, which is also -omitted here for brevity: - -@example -/* read_symlink --- read a symbolic link into an allocated buffer. 
- @dots{} */ - -static char * -read_symlink(const char *fname, size_t bufsize, ssize_t *linksize) -@{ - @dots{} -@} -@end example - -Two helper functions simplify entering values in the -array that will contain the result of the @code{stat()}: - -@example -/* array_set --- set an array element */ - -static void -array_set(awk_array_t array, const char *sub, awk_value_t *value) -@{ - awk_value_t index; - - set_array_element(array, - make_const_string(sub, strlen(sub), & index), - value); - -@} - -/* array_set_numeric --- set an array element with a number */ - -static void -array_set_numeric(awk_array_t array, const char *sub, double num) -@{ - awk_value_t tmp; - - array_set(array, sub, make_number(num, & tmp)); -@} -@end example - -The following function does most of the work to fill in -the @code{awk_array_t} result array with values obtained -from a valid @code{struct stat}. It is done in a separate function -to support the @code{stat()} function for @command{gawk} and also -to support the @code{fts()} extension which is included in -the same file but whose code is not shown here -(@pxref{Extension Sample File Functions}). 
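The way a @code{struct stat} turns into named values can be tried outside of @command{gawk}. The following is a minimal standalone sketch, not taken from @file{filefuncs.c}: it uses no part of the extension API, and the names @code{file_type_name()} and @code{print_stat_info()} are hypothetical helpers invented for this illustration. It assumes a POSIX system, calls @code{lstat()}, and prints a few members where the extension would store them in the result array.

```c
/* Standalone sketch; assumes a POSIX system and does not use gawkapi.h. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* file_type_name --- hypothetical helper: map a mode to a type string */

static const char *
file_type_name(mode_t mode)
{
	static const struct {
		mode_t mask;
		const char *type;
	} map[] = {
		{ S_IFREG, "file" },
		{ S_IFBLK, "blockdev" },
		{ S_IFCHR, "chardev" },
		{ S_IFDIR, "directory" },
#ifdef S_IFSOCK
		{ S_IFSOCK, "socket" },
#endif
#ifdef S_IFIFO
		{ S_IFIFO, "fifo" },
#endif
#ifdef S_IFLNK
		{ S_IFLNK, "symlink" },
#endif
	};
	size_t i;

	/* only the file type bits take part in the comparison */
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if ((mode & S_IFMT) == map[i].mask)
			return map[i].type;

	return "unknown";
}

/* print_stat_info --- hypothetical helper: lstat() a file, print some members */

static int
print_stat_info(const char *name)
{
	struct stat sbuf;

	assert(name != NULL);

	/* lstat() reports on a symbolic link itself, not on its target */
	if (lstat(name, &sbuf) < 0) {
		perror(name);
		return -1;
	}

	printf("name = %s\n", name);
	printf("size = %lld\n", (long long) sbuf.st_size);
	printf("mode = %lo\n", (unsigned long) sbuf.st_mode);
	printf("type = %s\n", file_type_name(sbuf.st_mode));

	return 0;
}
```

Calling @code{print_stat_info()} on an existing path returns zero after printing four lines; on a missing path it reports the error and returns @minus{}1, which mirrors the return convention of the @command{awk}-level @code{stat()} described earlier.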
- -The first part of the function is variable declarations, -including a table to map file types to strings: - -@example -/* fill_stat_array --- do the work to fill an array with stat info */ - -static int -fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf) -@{ - char *pmode; /* printable mode */ - const char *type = "unknown"; - awk_value_t tmp; - static struct ftype_map @{ - unsigned int mask; - const char *type; - @} ftype_map[] = @{ - @{ S_IFREG, "file" @}, - @{ S_IFBLK, "blockdev" @}, - @{ S_IFCHR, "chardev" @}, - @{ S_IFDIR, "directory" @}, -#ifdef S_IFSOCK - @{ S_IFSOCK, "socket" @}, -#endif -#ifdef S_IFIFO - @{ S_IFIFO, "fifo" @}, -#endif -#ifdef S_IFLNK - @{ S_IFLNK, "symlink" @}, -#endif -#ifdef S_IFDOOR /* Solaris weirdness */ - @{ S_IFDOOR, "door" @}, -#endif /* S_IFDOOR */ - @}; - int j, k; -@end example - -The destination array is cleared, and then code fills in -various elements based on values in the @code{struct stat}: - -@example - /* empty out the array */ - clear_array(array); - - /* fill in the array */ - array_set(array, "name", make_const_string(name, strlen(name), - & tmp)); - array_set_numeric(array, "dev", sbuf->st_dev); - array_set_numeric(array, "ino", sbuf->st_ino); - array_set_numeric(array, "mode", sbuf->st_mode); - array_set_numeric(array, "nlink", sbuf->st_nlink); - array_set_numeric(array, "uid", sbuf->st_uid); - array_set_numeric(array, "gid", sbuf->st_gid); - array_set_numeric(array, "size", sbuf->st_size); - array_set_numeric(array, "blocks", sbuf->st_blocks); - array_set_numeric(array, "atime", sbuf->st_atime); - array_set_numeric(array, "mtime", sbuf->st_mtime); - array_set_numeric(array, "ctime", sbuf->st_ctime); - - /* for block and character devices, add rdev, - major and minor numbers */ - if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) @{ - array_set_numeric(array, "rdev", sbuf->st_rdev); - array_set_numeric(array, "major", major(sbuf->st_rdev)); - array_set_numeric(array, "minor", 
minor(sbuf->st_rdev)); - @} -@end example - -@noindent -The latter part of the function makes selective additions -to the destination array, depending upon the availability of -certain members and/or the type of the file. It then returns zero, -for success: - -@example -#ifdef HAVE_ST_BLKSIZE - array_set_numeric(array, "blksize", sbuf->st_blksize); -#endif /* HAVE_ST_BLKSIZE */ - - pmode = format_mode(sbuf->st_mode); - array_set(array, "pmode", make_const_string(pmode, strlen(pmode), - & tmp)); - - /* for symbolic links, add a linkval field */ - if (S_ISLNK(sbuf->st_mode)) @{ - char *buf; - ssize_t linksize; - - if ((buf = read_symlink(name, sbuf->st_size, - & linksize)) != NULL) - array_set(array, "linkval", - make_malloced_string(buf, linksize, & tmp)); - else - warning(ext_id, _("stat: unable to read symbolic link `%s'"), - name); - @} - - /* add a type field */ - type = "unknown"; /* shouldn't happen */ - for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) @{ - if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) @{ - type = ftype_map[j].type; - break; - @} - @} - - array_set(array, "type", make_const_string(type, strlen(type), &tmp)); - - return 0; -@} -@end example - -Finally, here is the @code{do_stat()} function. It starts with -variable declarations and argument checking: - -@ignore -Changed message for page breaking. Used to be: - "stat: called with incorrect number of arguments (%d), should be 2", -@end ignore -@example -/* do_stat --- provide a stat() function for gawk */ - -static awk_value_t * -do_stat(int nargs, awk_value_t *result) -@{ - awk_value_t file_param, array_param; - char *name; - awk_array_t array; - int ret; - struct stat sbuf; - - assert(result != NULL); - - if (do_lint && nargs != 2) @{ - lintwarn(ext_id, - _("stat: called with wrong number of arguments")); - return make_number(-1, result); - @} -@end example - -Then comes the actual work. First, the function gets the arguments. -Next, it gets the information for the file. 
-The code uses @code{lstat()} (instead of @code{stat()}) -to get the file information, -in case the file is a symbolic link. -If there's an error, it sets @code{ERRNO} and returns: - -@example - /* file is first arg, array to hold results is second */ - if ( ! get_argument(0, AWK_STRING, & file_param) - || ! get_argument(1, AWK_ARRAY, & array_param)) @{ - warning(ext_id, _("stat: bad parameters")); - return make_number(-1, result); - @} - - name = file_param.str_value.str; - array = array_param.array_cookie; - - /* always empty out the array */ - clear_array(array); - - /* lstat the file, if error, set ERRNO and return */ - ret = lstat(name, & sbuf); - if (ret < 0) @{ - update_ERRNO_int(errno); - return make_number(ret, result); - @} -@end example - -The tedious work is done by @code{fill_stat_array()}, shown -earlier. When done, return the result from @code{fill_stat_array()}: - -@example - ret = fill_stat_array(name, array, & sbuf); - - return make_number(ret, result); -@} -@end example - -@cindex programming conventions, @command{gawk} internals -Finally, it's necessary to provide the ``glue'' that loads the -new function(s) into @command{gawk}. - -The @code{filefuncs} extension also provides an @code{fts()} -function, which we omit here. For its sake there is an initialization -function: - -@example -/* init_filefuncs --- initialization routine */ - -static awk_bool_t -init_filefuncs(void) -@{ - @dots{} -@} -@end example - -We are almost done. We need an array of @code{awk_ext_func_t} -structures for loading each function into @command{gawk}: - -@example -static awk_ext_func_t func_table[] = @{ - @{ "chdir", do_chdir, 1 @}, - @{ "stat", do_stat, 2 @}, - @{ "fts", do_fts, 3 @}, -@}; -@end example - -Each extension must have a routine named @code{dl_load()} to load -everything that needs to be loaded.
It is simplest to use the -@code{dl_load_func()} macro in @code{gawkapi.h}: - -@example -/* define the dl_load() function using the boilerplate macro */ - -dl_load_func(func_table, filefuncs, "") -@end example - -And that's it! As an exercise, consider adding functions to -implement system calls such as @code{chown()}, @code{chmod()}, -and @code{umask()}. - -@node Using Internal File Ops -@subsection Integrating The Extensions - -@cindex @command{gawk}, interpreter@comma{} adding code to -Now that the code is written, it must be possible to add it at -runtime to the running @command{gawk} interpreter. First, the -code must be compiled. Assuming that the functions are in -a file named @file{filefuncs.c}, and @var{idir} is the location -of the @file{gawkapi.h} header file, -the following steps@footnote{In practice, you would probably want to -use the GNU Autotools---Automake, Autoconf, Libtool, and Gettext---to -configure and build your libraries. Instructions for doing so are beyond -the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to -the tools.} create a GNU/Linux shared library: - -@example -$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} -$ @kbd{ld -o filefuncs.so -shared filefuncs.o -lc} -@end example - -Once the library exists, it is loaded by using the @code{@@load} keyword. 
- -@example -# file testff.awk -@@load "filefuncs" - -BEGIN @{ - "pwd" | getline curdir # save current directory - close("pwd") - - chdir("/tmp") - system("pwd") # test it - chdir(curdir) # go back - - print "Info for testff.awk" - ret = stat("testff.awk", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "testff.awk modified:", - strftime("%m %d %y %H:%M:%S", data["mtime"]) - - print "\nInfo for JUNK" - ret = stat("JUNK", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) -@} -@end example - -The @env{AWKLIBPATH} environment variable tells -@command{gawk} where to find shared libraries (@pxref{Finding Extensions}). -We set it to the current directory and run the program: - -@example -$ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk} -@print{} /tmp -@print{} Info for testff.awk -@print{} ret = 0 -@print{} data["blksize"] = 4096 -@print{} data["mtime"] = 1350838628 -@print{} data["mode"] = 33204 -@print{} data["type"] = file -@print{} data["dev"] = 2053 -@print{} data["gid"] = 1000 -@print{} data["ino"] = 1719496 -@print{} data["ctime"] = 1350838628 -@print{} data["blocks"] = 8 -@print{} data["nlink"] = 1 -@print{} data["name"] = testff.awk -@print{} data["atime"] = 1350838632 -@print{} data["pmode"] = -rw-rw-r-- -@print{} data["size"] = 662 -@print{} data["uid"] = 1000 -@print{} testff.awk modified: 10 21 12 18:57:08 -@print{} -@print{} Info for JUNK -@print{} ret = -1 -@print{} JUNK modified: 01 01 70 02:00:00 -@end example - -@node Extension Samples -@section The Sample Extensions In The @command{gawk} Distribution - -This @value{SECTION} provides brief overviews of the sample extensions -that come in the @command{gawk} distribution. Some of them are intended -for production use, such as the @code{filefuncs} and @code{readdir} extensions. -Others mainly provide example code that shows how to use the extension API.
- -@menu -* Extension Sample File Functions:: The file functions sample. -* Extension Sample Fnmatch:: An interface to @code{fnmatch()}. -* Extension Sample Fork:: An interface to @code{fork()} and other - process functions. -* Extension Sample Ord:: Character to value to character - conversions. -* Extension Sample Readdir:: An interface to @code{readdir()}. -* Extension Sample Revout:: Reversing output sample output wrapper. -* Extension Sample Rev2way:: Reversing data sample two-way processor. -* Extension Sample Read write array:: Serializing an array to a file. -* Extension Sample Readfile:: Reading an entire file into a string. -* Extension Sample API Tests:: Tests for the API. -* Extension Sample Time:: An interface to @code{gettimeofday()} - and @code{sleep()}. -@end menu - -@node Extension Sample File Functions -@subsection File Related Functions - -The @code{filefuncs} extension provides three different functions. -Their usage is as follows: - -@table @code -@item @@load "filefuncs" -This is how you load the extension. - -@item result = chdir("/some/directory") -The @code{chdir()} function is a direct hook to the @code{chdir()} -system call to change the current directory. It returns zero -upon success or less than zero upon error. In the latter case it updates -@code{ERRNO}. - -@item result = stat("/some/path", statdata) -The @code{stat()} function provides a hook into the -@code{stat()} system call. In fact, it uses @code{lstat()}. -It returns zero upon success or less than zero upon error. -In the latter case it updates @code{ERRNO}. - -In all cases, it clears the @code{statdata} array. -When the call is successful, @code{stat()} fills the @code{statdata} -array with information retrieved from the filesystem, as follows: - -@c nested table -@multitable @columnfractions .25 .60 -@item @code{statdata["name"]} @tab -The name of the file. - -@item @code{statdata["dev"]} @tab -Corresponds to the @code{st_dev} field in the @code{struct stat}.
- -@item @code{statdata["ino"]} @tab -Corresponds to the @code{st_ino} field in the @code{struct stat}. - -@item @code{statdata["mode"]} @tab -Corresponds to the @code{st_mode} field in the @code{struct stat}. - -@item @code{statdata["nlink"]} @tab -Corresponds to the @code{st_nlink} field in the @code{struct stat}. - -@item @code{statdata["uid"]} @tab -Corresponds to the @code{st_uid} field in the @code{struct stat}. - -@item @code{statdata["gid"]} @tab -Corresponds to the @code{st_gid} field in the @code{struct stat}. - -@item @code{statdata["size"]} @tab -Corresponds to the @code{st_size} field in the @code{struct stat}. - -@item @code{statdata["atime"]} @tab -Corresponds to the @code{st_atime} field in the @code{struct stat}. - -@item @code{statdata["mtime"]} @tab -Corresponds to the @code{st_mtime} field in the @code{struct stat}. - -@item @code{statdata["ctime"]} @tab -Corresponds to the @code{st_ctime} field in the @code{struct stat}. - -@item @code{statdata["rdev"]} @tab -Corresponds to the @code{st_rdev} field in the @code{struct stat}. -This element is only present for device files. - -@item @code{statdata["major"]} @tab -The major device number, obtained by applying the @code{major()} macro -to the @code{st_rdev} field in the @code{struct stat}. -This element is only present for device files. - -@item @code{statdata["minor"]} @tab -The minor device number, obtained by applying the @code{minor()} macro -to the @code{st_rdev} field in the @code{struct stat}. -This element is only present for device files. - -@item @code{statdata["blksize"]} @tab -Corresponds to the @code{st_blksize} field in the @code{struct stat}, -if this field is present on your system. -(It is present on all modern systems that we know of.) - -@item @code{statdata["pmode"]} @tab -A human-readable version of the mode value, such as printed by -@command{ls}. For example, @code{"-rwxr-xr-x"}. - -@item @code{statdata["linkval"]} @tab -If the named file is a symbolic link, this element will exist -and its value is the value of the symbolic link (where the -symbolic link points to).
- -@item @code{statdata["type"]} @tab -The type of the file as a string. One of -@code{"file"}, -@code{"blockdev"}, -@code{"chardev"}, -@code{"directory"}, -@code{"socket"}, -@code{"fifo"}, -@code{"symlink"}, -@code{"door"}, -or -@code{"unknown"}. -Not all systems support all file types. -@end multitable - -@item flags = or(FTS_PHYSICAL, ...) -@itemx result = fts(pathlist, flags, filedata) -Walk the file trees provided in @code{pathlist} and fill in the -@code{filedata} array as described below. @code{flags} is the bitwise -OR of several predefined constant values, also as described below. -Return zero if there were no errors, otherwise return @minus{}1. -@end table - -The @code{fts()} function provides a hook to the C library @code{fts()} -routines for traversing file hierarchies. Instead of returning data -about one file at a time in a stream, it fills in a multi-dimensional -array with data about each file and directory encountered in the requested -hierarchies. - -The arguments are as follows: - -@table @code -@item pathlist -An array of filenames. The element values are used; the index values are ignored. - -@item flags -This should be the bitwise OR of one or more of the following -predefined constant flag values. At least one of -@code{FTS_LOGICAL} or @code{FTS_PHYSICAL} must be provided; otherwise -@code{fts()} returns an error value and sets @code{ERRNO}. -The flags are: - -@c nested table -@table @code -@item FTS_LOGICAL -Do a ``logical'' file traversal, where the information returned for -a symbolic link refers to the linked-to file, and not to the symbolic -link itself. This flag is mutually exclusive with @code{FTS_PHYSICAL}. - -@item FTS_PHYSICAL -Do a ``physical'' file traversal, where the information returned for a -symbolic link refers to the symbolic link itself. This flag is mutually -exclusive with @code{FTS_LOGICAL}. 
- -@item FTS_NOCHDIR -As a performance optimization, the C library @code{fts()} routines -change directory as they traverse a file hierarchy. This flag disables -that optimization. - -@item FTS_COMFOLLOW -Immediately follow a symbolic link named in @code{pathlist}, -whether or not @code{FTS_LOGICAL} is set. - -@item FTS_SEEDOT -By default, the @code{fts()} routines do not return entries for @file{.} -and @file{..}. This option causes entries for @file{..} to also -be included. (The extension always includes an entry for @file{.}, -see below.) - -@item FTS_XDEV -During a traversal, do not cross onto a different mounted filesystem. -@end table - -@item filedata -The @code{filedata} array is first cleared. Then, @code{fts()} creates -an element in @code{filedata} for every element in @code{pathlist}. -The index is the name of the directory or file given in @code{pathlist}. -The element for this index is itself an array. There are two cases. - -@c nested table -@table @emph -@item The path is a file. -In this case, the array contains two or three elements: - -@c doubly nested table -@table @code -@item "path" -The full path to this file, starting from the ``root'' that was given -in the @code{pathlist} array. - -@item "stat" -This element is itself an array, containing the same information as provided -by the @code{stat()} function described earlier for its -@code{statdata} argument. The element may not be present if -the @code{stat()} system call for the file failed. - -@item "error" -If some kind of error was encountered, the array will also -contain an element named @code{"error"}, which is a string describing the error. -@end table - -@item The path is a directory. -In this case, the array contains one element for each entry in the -directory. If an entry is a file, that element is as for files, just -described. If the entry is a directory, that element is (recursively), -an array describing the subdirectory. 
If @code{FTS_SEEDOT} was provided -in the flags, then there will also be an element named @code{".."}. This -element will be an array containing the data as provided by @code{stat()}. - -In addition, there will be an element whose index is @code{"."}. -This element is an array containing the same two or three elements as -for a file: @code{"path"}, @code{"stat"}, and @code{"error"}. -@end table -@end table - -The @code{fts()} function returns zero if there were no errors. -Otherwise it returns @minus{}1. - -@quotation NOTE -The @code{fts()} extension does not exactly mimic the -interface of the C library @code{fts()} routines, choosing instead to -provide an interface that is based on associative arrays, which should -be more comfortable to use from an @command{awk} program. This includes the -lack of a comparison function, since @command{gawk} already provides -powerful array sorting facilities. While an @code{fts_read()}-like -interface could have been provided, this felt less natural than simply -creating a multi-dimensional array to represent the file hierarchy and -its information. -@end quotation - -See @file{test/fts.awk} in the @command{gawk} distribution for an example. - -@node Extension Sample Fnmatch -@subsection Interface To @code{fnmatch()} - -This extension provides an interface to the C library -@code{fnmatch()} function. The usage is: - -@example -@@load "fnmatch" - -result = fnmatch(pattern, string, flags) -@end example - -The @code{fnmatch} extension adds a single function named -@code{fnmatch()}, one constant (@code{FNM_NOMATCH}), and an array of -flag values named @code{FNM}. - -The arguments to @code{fnmatch()} are: - -@table @code -@item pattern -The filename wildcard to match. - -@item string -The filename string to match against the pattern. - -@item flags -Either zero, or the bitwise OR of one or more of the -flags in the @code{FNM} array.
-@end table - -The return value is zero on success, @code{FNM_NOMATCH} -if the string did not match the pattern, or -a different non-zero value if an error occurred. - -The flags are as follows: - -@multitable @columnfractions .25 .75 -@item @code{FNM["CASEFOLD"]} @tab -Corresponds to the @code{FNM_CASEFOLD} flag as defined in @code{fnmatch()}. - -@item @code{FNM["FILE_NAME"]} @tab -Corresponds to the @code{FNM_FILE_NAME} flag as defined in @code{fnmatch()}. - -@item @code{FNM["LEADING_DIR"]} @tab -Corresponds to the @code{FNM_LEADING_DIR} flag as defined in @code{fnmatch()}. - -@item @code{FNM["NOESCAPE"]} @tab -Corresponds to the @code{FNM_NOESCAPE} flag as defined in @code{fnmatch()}. - -@item @code{FNM["PATHNAME"]} @tab -Corresponds to the @code{FNM_PATHNAME} flag as defined in @code{fnmatch()}. - -@item @code{FNM["PERIOD"]} @tab -Corresponds to the @code{FNM_PERIOD} flag as defined in @code{fnmatch()}. -@end multitable - -Here is an example: - -@example -@@load "fnmatch" -@dots{} -flags = or(FNM["PERIOD"], FNM["NOESCAPE"]) -if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH) - print "no match" -@end example - -@node Extension Sample Fork -@subsection Interface To @code{fork()}, @code{wait()} and @code{waitpid()} - -The @code{fork} extension adds three functions, as follows. - -@table @code -@item @@load "fork" -This is how you load the extension. - -@item pid = fork() -This function creates a new process. The return value is zero in the -child and the process-id number of the child in the parent, or @minus{}1 -upon error. In the latter case, @code{ERRNO} indicates the problem. -In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are -updated to reflect the correct values. - -@item ret = waitpid(pid) -This function takes a numeric argument, which is the process-id to -wait for. The return value is that of the -@code{waitpid()} system call. - -@item ret = wait() -This function waits for the first child to die.
-The return value is that of the -@code{wait()} system call. -@end table - -There is no corresponding @code{exec()} function. - -Here is an example: - -@example -@@load "fork" -@dots{} -if ((pid = fork()) == 0) - print "hello from the child" -else - print "hello from the parent" -@end example - -@node Extension Sample Ord -@subsection Character and Numeric values: @code{ord()} and @code{chr()} - -The @code{ordchr} extension adds two functions, named -@code{ord()} and @code{chr()}, as follows. - -@table @code -@item number = ord(string) -Return the numeric value of the first character in @code{string}. - -@item char = chr(number) -Return the string whose first character is that represented by @code{number}. -@end table - -These functions are inspired by the Pascal language functions -of the same name. Here is an example: - -@example -@@load "ordchr" -@dots{} -printf("The numeric value of 'A' is %d\n", ord("A")) -printf("The string value of 65 is %s\n", chr(65)) -@end example - -@node Extension Sample Readdir -@subsection Reading Directories - -The @code{readdir} extension adds an input parser for directories, and -adds a single function named @code{readdir_do_ftype()}. -The usage is as follows: - -@example -@@load "readdir" - -readdir_do_ftype("stat") # or "dirent" or "never" -@end example - -When this extension is in use, instead of skipping directories named -on the command line (or with @code{getline}), -they are read, with each entry returned as a record. - -The record consists of at least two fields: the inode number and the -filename, separated by a forward slash character. 
-On systems where the directory entry contains the file type, the record -has a third field which is a single letter indicating the type of the -file: - -@multitable @columnfractions .1 .9 -@headitem Letter @tab File Type -@item @code{b} @tab Block device -@item @code{c} @tab Character device -@item @code{d} @tab Directory -@item @code{f} @tab Regular file -@item @code{l} @tab Symbolic link -@item @code{p} @tab Named pipe (FIFO) -@item @code{s} @tab Socket -@item @code{u} @tab Anything else (unknown) -@end multitable - -On systems without the file type information, calling -@samp{readdir_do_ftype("stat")} causes the extension to use the -@code{lstat()} system call to retrieve the appropriate information. This -is not the default, since @code{lstat()} is a potentially expensive -operation. By calling @samp{readdir_do_ftype("never")} one can ensure -that the file type information is never displayed, even when readily -available in the directory entry. - -The third option, @samp{readdir_do_ftype("dirent")}, takes file type -information from the directory entry, if it is available. This is the -default on systems that supply this information. - -The @code{readdir_do_ftype()} function sets @code{ERRNO} if called -without arguments or with invalid arguments. - -@quotation NOTE -On GNU/Linux systems, there are filesystems that don't support the -@code{d_type} entry (see the @i{readdir}(3) manual page), and so the file -type is always @samp{u}. Therefore, using @samp{readdir_do_ftype("stat")} -is advisable even on GNU/Linux systems. In this case, the @code{readdir} -extension falls back to using @code{lstat()} when it encounters an -unknown file type. -@end quotation - -Here is an example: - -@example -@@load "readdir" -@dots{} -BEGIN @{ FS = "/" @} -@{ print "file name is", $2 @} -@end example - -@node Extension Sample Revout -@subsection Reversing Output - -The @code{revoutput} extension adds a simple output wrapper that reverses -the characters in each output line. 
Its main purpose is to show how to -write an output wrapper, although it may be mildly amusing for the unwary. -Here is an example: - -@example -@@load "revoutput" - -BEGIN @{ - REVOUT = 1 - print "hello, world" > "/dev/stdout" -@} -@end example - -The output from this program is: -@samp{dlrow ,olleh}. - -@node Extension Sample Rev2way -@subsection Two-Way I/O Example - -The @code{revtwoway} extension adds a simple two-way processor that -reverses the characters in each line sent to it for reading back by -the @command{awk} program. Its main purpose is to show how to write -a two-way processor, although it may also be mildly amusing. -The following example shows how to use it: - -@example -@@load "revtwoway" - -BEGIN @{ - cmd = "/magic/mirror" - print "hello, world" |& cmd - cmd |& getline result - print result - close(cmd) -@} -@end example - -@node Extension Sample Read write array -@subsection Dumping and Restoring An Array - -The @code{rwarray} extension adds two functions, -named @code{writea()} and @code{reada()}, as follows: - -@table @code -@item ret = writea(file, array) -This function takes a string argument, which is the name of the file -to which to dump the array, and the array itself as the second argument. -@code{writea()} understands multidimensional arrays. It returns one on -success, or zero upon failure. - -@item ret = reada(file, array) -@code{reada()} is the inverse of @code{writea()}; -it reads the file named as its first argument, filling in -the array named as the second argument. It clears the array first. -Here too, the return value is one on success and zero upon failure. -@end table - -The array created by @code{reada()} is identical to that written by -@code{writea()} in the sense that the contents are the same. However, -due to implementation issues, the array traversal order of the recreated -array is likely to be different from that of the original array.
As array -traversal order in @command{awk} is by default undefined, this is not -(technically) a problem. If you need to guarantee a particular traversal -order, use the array sorting features in @command{gawk} to do so -(@pxref{Array Sorting}). - -The file contains binary data. All integral values are written in network -byte order. However, double precision floating-point values are written -as native binary data. Thus, arrays containing only string data can -theoretically be dumped on systems with one byte order and restored on -systems with a different one, but this has not been tried. - -Here is an example: - -@example -@@load "rwarray" -@dots{} -ret = writea("arraydump.bin", array) -@dots{} -ret = reada("arraydump.bin", array) -@end example - -@node Extension Sample Readfile -@subsection Reading An Entire File - -The @code{readfile} extension adds a single function -named @code{readfile()}: - -@table @code -@item result = readfile("/some/path") -The argument is the name of the file to read. The return value is a -string containing the entire contents of the requested file. Upon error, -the function returns the empty string and sets @code{ERRNO}. -@end table - -Here is an example: - -@example -@@load "readfile" -@dots{} -contents = readfile("/path/to/file"); -if (contents == "" && ERRNO != "") @{ - print("problem reading file", ERRNO) > "/dev/stderr" - ... -@} -@end example - -@node Extension Sample API Tests -@subsection API Tests - -The @code{testext} extension exercises parts of the extension API that -are not tested by the other samples. The @file{extension/testext.c} -file contains both the C code for the extension and @command{awk} -test code inside C comments that run the tests. The testing framework -extracts the @command{awk} code and runs the tests. See the source file -for more information. 
- -@node Extension Sample Time -@subsection Extension Time Functions - -@cindex time -@cindex sleep - -These functions can be used by either invoking @command{gawk} -with a command-line argument of @samp{-l time} or by -inserting @samp{@@load "time"} in your script. - -@table @code - -@cindex @code{gettimeofday} time extension function -@item the_time = gettimeofday() -Return the time in seconds that has elapsed since 1970-01-01 UTC as a -floating point value. If the time is unavailable on this platform, return -@minus{}1 and set @code{ERRNO}. The returned time should have sub-second -precision, but the actual precision will vary based on the platform. -If the standard C @code{gettimeofday()} system call is available on this -platform, then it simply returns the value. Otherwise, if on Windows, -it tries to use @code{GetSystemTimeAsFileTime()}. - -@cindex @code{sleep} time extension function -@item result = sleep(@var{seconds}) -Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative, -or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}. -Otherwise, return zero after sleeping for the indicated amount of time. -Note that @var{seconds} may be a floating-point (non-integral) value. -Implementation details: depending on platform availability, this function -tries to use @code{nanosleep()} or @code{select()} to implement the delay. -@end table - -@node gawkextlib -@section The @code{gawkextlib} Project - -The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}} -project provides a number of @command{gawk} extensions, including one for -processing XML files. This is the evolution of the original @command{xgawk} -(XML @command{gawk}) project. - -As of this writing, there are four extensions: - -@itemize @bullet -@item -XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} -XML parsing library. - -@item -Postgres SQL extension. - -@item -GD graphics library extension. - -@item -MPFR library extension. 
-This provides access to a number of MPFR functions which @command{gawk}'s -native MPFR support does not. -@end itemize - -The @code{time} extension described earlier (@pxref{Extension Sample -Time}) was originally from this project but has been moved into the -main @command{gawk} distribution. - -You can check out the code for the @code{gawkextlib} project -using the @uref{http://git-scm.com, Git} distributed source -code control system. The command is as follows: - -@example -git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code -@end example - -You will need to have the @uref{http://expat.sourceforge.net, Expat} -XML parser library installed in order to build and use the XML extension. - -In addition, you must have the GNU Autotools installed -(@uref{http://www.gnu.org/software/autoconf, Autoconf}, -@uref{http://www.gnu.org/software/automake, Automake}, -@uref{http://www.gnu.org/software/libtool, Libtool}, -and -@uref{http://www.gnu.org/software/gettext, Gettext}). - -The simple recipe for building and testing @code{gawkextlib} is as follows. -First, build and install @command{gawk}: - -@example -cd .../path/to/gawk/code -./configure --prefix=/tmp/newgawk @ii{Install in /tmp/newgawk for now} -make && make check @ii{Build and check that all is OK} -make install @ii{Install gawk} -@end example - -Next, build @code{gawkextlib} and test it: - -@example -cd .../path/to/gawkextlib-code -./update-autotools @ii{Generate configure, etc.} - @ii{You may have to run this command twice} -./configure --with-gawk=/tmp/newgawk @ii{Configure, point at ``installed'' gawk} -make && make check @ii{Build and check that all is OK} -@end example - -If you write an extension that you wish to share with other -@command{gawk} users, please consider doing so through the -@code{gawkextlib} project. - -@node Fake Chapter -@chapter Fake Sections For Cross References - -@menu -* Reference to Elements:: Referring to an Array Element. -* Built-in:: Built-in Functions.
-* Built-in Variables:: Built-in Variables. -* Options:: Command-Line Options. -* AWKLIBPATH Variable:: The @env{AWKLIBPATH} Environment Variable. -* BEGINFILE/ENDFILE:: The @code{BEGINFILE} and @code{ENDFILE} Special Patterns. -* Redirection:: Redirecting Output of @code{print} and @code{printf}. -* Arrays:: Arrays in @command{awk}. -* Conversion:: Conversion of Strings and Numbers. -* Delete:: The @code{delete} Statement. -* String Functions:: String-Manipulation Functions. -* Glossary:: Glossary. -* Copying:: GNU General Public License. -* Reading Files:: Reading Input Files. -* Time Functions:: Time Functions. -* Array Sorting:: Controlling Array Traversal and Array Sorting. -@end menu - -@node Reference to Elements -@section Referring to an Array Element - -@node Built-in -@section Built-in Functions - -@node Built-in Variables -@section Built-in Variables - -@node Options -@section Command-Line Options - -@node AWKLIBPATH Variable -@section The @env{AWKLIBPATH} Environment Variable - -@node BEGINFILE/ENDFILE -@section The @code{BEGINFILE} and @code{ENDFILE} Special Patterns - -@node Redirection -@section Redirecting Output of @code{print} and @code{printf} - -@node Arrays -@section Arrays in @command{awk} - -@node Conversion -@section Conversion of Strings and Numbers - -@node Delete -@section The @code{delete} Statement - -@node String Functions -@section String-Manipulation Functions - -@node Glossary -@section Glossary - -@node Copying -@section GNU General Public License - -@node Reading Files -@section Reading Input Files - -@node Time Functions -@section Time Functions - -@node Array Sorting -@section Controlling Array Traversal and Array Sorting - -@bye