Diffstat (limited to 'doc/api.texi')
-rw-r--r-- | doc/api.texi | 366 |
1 files changed, 201 insertions, 165 deletions
diff --git a/doc/api.texi b/doc/api.texi index 52d11154..71f5f6e8 100644 --- a/doc/api.texi +++ b/doc/api.texi @@ -309,19 +309,19 @@ Fake top node. @node Extension API @chapter Writing Extensions for @command{gawk} -It is possible to add new built-in -functions to @command{gawk} using dynamically loaded libraries. This -facility is available on systems (such as GNU/Linux) that support -the C @code{dlopen()} and @code{dlsym()} functions. -This @value{CHAPTER} describes how to do so using -code written in C or C++. If you don't know anything about C +It is possible to add new built-in functions to @command{gawk} using +dynamically loaded libraries. This facility is available on systems (such +as GNU/Linux) that support the C @code{dlopen()} and @code{dlsym()} +functions. This @value{CHAPTER} describes how to create extensions +using code written in C or C++. If you don't know anything about C programming, you can safely skip this @value{CHAPTER}, although you -may wish to review the documentation on the extensions that come -with @command{gawk} (@pxref{Extension Samples}). +may wish to review the documentation on the extensions that come with +@command{gawk} (@pxref{Extension Samples}), and the section on the +@code{gawkextlib} project (@pxref{gawkextlib}). @quotation NOTE When @option{--sandbox} is specified, extensions are disabled -(@pxref{Options}. +(@pxref{Options}). @end quotation @menu @@ -355,7 +355,8 @@ Interface} (API) defined for this purpose by the @command{gawk} developers. The rest of this @value{CHAPTER} explains the design decisions behind the API, the facilities it provides and how to use them, and presents a small sample extension. In addition, it documents -the sample extensions included in the @command{gawk} distribution. +the sample extensions included in the @command{gawk} distribution, +and describes the @code{gawkextlib} project. 
@node Plugin License @section Extension Licensing @@ -363,7 +364,7 @@ the sample extensions included in the @command{gawk} distribution. Every dynamic extension should define the global symbol @code{plugin_is_GPL_compatible} to assert that it has been licensed under a GPL-compatible license. If this symbol does not exist, @command{gawk} -will emit a fatal error and exit. +emits a fatal error and exits when it tries to load your extension. The declared type of the symbol should be @code{int}. It does not need to be in any allocated section, though. The code merely asserts that @@ -466,12 +467,12 @@ The ability to create, access and update global variables. @item Easy access to all the elements of an array at once (``array flattening'') in order to loop over all the element in an easy fashion for C code. -@end itemize @item The ability to create arrays (including @command{gawk}'s true multi-dimensional arrays). @end itemize +@end itemize Some additional important goals were: @@ -480,7 +481,7 @@ Some additional important goals were: The API should use only features in ISO C 90, so that extensions can be written using the widest range of C and C++ compilers. The header should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"} -magic so that a C++ compiler could be used. (If using the C++, the runtime +magic so that a C++ compiler could be used. (If using C++, the runtime system has to be smart enough to call any constructors and destructors, as @command{gawk} is a C program. As of this writing, this has not been tested.) @@ -491,7 +492,7 @@ symbols@footnote{The @dfn{symbols} are the variables and functions defined inside @command{gawk}. Access to these symbols by code external to @command{gawk} loaded dynamically at runtime is problematic on Windows.} by the compile-time or dynamic linker, -in order to enable creation of extensions that will also work on Windows. +in order to enable creation of extensions that also work on Windows. 
@end itemize During development, it became clear that there were other features @@ -503,7 +504,7 @@ provided: Extensions should have the ability to hook into @command{gawk}'s I/O redirection mechanism. In particular, the @command{xgawk} developers provided a so-called ``open hook'' to take over reading -records. During the development, this was generalized to allow +records. During development, this was generalized to allow extensions to hook into input processing, output processing, and two-way I/O. @@ -548,7 +549,7 @@ by integers are so transparent that they aren't even documented!) With time, the API will undoubtedly evolve; the @command{gawk} developers expect this to be driven by user needs. For now, the current API seems -to provide a minimal yet powerful set of features for extension creation. +to provide a minimal yet powerful set of features for creating extensions. @node Extension Mechanism Outline @subsection At A High Level How It Works @@ -558,7 +559,7 @@ glance, a difficult one to meet. One design, apparently used by Perl and Ruby and maybe others, would be to make the mainline @command{gawk} code into a library, with the -@command{gawk} program a small C @code{main()} function linked against +@command{gawk} utility a small C @code{main()} function linked against the library. This seemed like the tail wagging the dog, complicating build and @@ -637,8 +638,8 @@ extension code is quite readable and understandable. Although all of this sounds medium complicated, the result is that extension code is quite clean and straightforward. This can be seen in -the sample extensions @file{filefuncs.c} and also the @file{testext.c} -code for testing the APIs. +the sample extensions @file{filefuncs.c} (@pxref{Extension Example}) +and also the @file{testext.c} code for testing the APIs. Some other bits and pieces: @@ -657,11 +658,6 @@ extension can check if the @command{gawk} it is loaded with supports the facilities it was compiled with. 
(Version mismatches ``shouldn't'' happen, but we all know how @emph{that} goes.) @xref{Extension Versioning}, for details. - -@item -An extension may register a version string with @command{gawk}; this -allows @command{gawk} to dump extension version information when -invoked with the @option{--version} option. @end itemize @node Extension Future Growth @@ -671,7 +667,7 @@ The API provides room for future growth, in two ways. An ``extension id'' is passed into the extension when its loaded. This extension id is then passed back to @command{gawk} with each function -call. This allows @command{gawk} to identify the extension calling it, +call. This allows @command{gawk} to identify the extension calling into it, should it need to know. A ``name space'' is passed into @command{gawk} when an extension function @@ -720,20 +716,20 @@ Registrations functions. You may register: @item extension functions, @item -input parsers, +exit callbacks, @item -output wrappers, +a version string, @item -two-way processors, +input parsers, @item -exit callbacks, +output wrappers, @item -and a version string. +and two-way processors. @end itemize All of these are discussed in detail, later in this @value{CHAPTER}. @item -Printing fatal, warning, and lint warning messages. +Printing fatal, warning, and ``lint'' warning messages. @item Updating @code{ERRNO}, or unsetting it. @@ -764,7 +760,7 @@ Creating a new array @item Clearing an array @item -Flattening an array for easy C style looping over an array +Flattening an array for easy C style looping over all its indices and elements @end itemize @end itemize @@ -850,13 +846,13 @@ That value must then be passed back to @command{gawk} as the first parameter of each API function. @item #define awk_const @dots{} -This macro expands to @code{const} when compiling an extension, +This macro expands to @samp{const} when compiling an extension, and to nothing when compiling @command{gawk} itself. 
This makes certain fields in the API data structures unwritable from extension code, while allowing @command{gawk} to use them as it needs to. @item typedef int awk_bool_t; -A simple boolean type. As of this moment, the API does not define special +A simple boolean type. At the moment, the API does not define special ``true'' and ``false'' values, although perhaps it should. @item typedef struct @{ @@ -889,7 +885,7 @@ It is used in the following @code{struct}. @itemx @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d; @itemx @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a; @itemx @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl; -@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t vc; +@itemx @ @ @ @ @ @ @ @ awk_value_cookie_t@ vc; @itemx @ @ @ @ @} u; @itemx @} awk_value_t; An ``@command{awk} value.'' @@ -906,11 +902,14 @@ readable. @item typedef void *awk_scalar_t; Scalars can be represented as an opaque type. These values are obtained from -@command{gawk} and then passed back into it. This is discussed below. +@command{gawk} and then passed back into it. This is discussed in a general fashion below, +and in more detail in @ref{Symbol table by cookie}. @item typedef void *awk_value_cookie_t; A ``value cookie'' is an opaque type representing a cached value. -This is also discussed below. +This is also discussed in a general fashion below, +and in more detail in @ref{Cached values}. + @end table Scalar values in @command{awk} are either numbers or strings. The @@ -926,7 +925,7 @@ Identifiers (i.e., the names of global variables) can be associated with either scalar values or with arrays. In addition, @command{gawk} provides true arrays of arrays, where any given array element can itself be an array. Discussion of arrays is delayed until -@ref{Array Manipulation} +@ref{Array Manipulation}. 
The various macros listed earlier make it easier to use the elements of the @code{union} as if they were fields in a @code{struct}; this @@ -949,9 +948,9 @@ can obtain a @dfn{scalar cookie}@footnote{See @uref{http://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html, the ``magic cookie'' entry in the Jargon file} for a nice example. See -also the entry in the @ref{Glossary}.} +also the entry for ``Cookie'' in the @ref{Glossary}.} object for that variable, and then use -the cookie for getting the variable's value for changing the variable's +the cookie for getting the variable's value or for changing the variable's value. This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro. Given a scalar cookie, @command{gawk} can directly retrieve or @@ -970,7 +969,7 @@ process as well as the time needed to create the value. All of the functions that return values from @command{gawk} work in the same way. You pass in an @code{awk_valtype_t} value -to indicate what kind of value you want. If the actual value +to indicate what kind of value you expect. If the actual value matches what you requested, the function returns true and fills in the @code{awk_value_t} result. Otherwise, the function returns false, and the @code{val_type} @@ -1039,20 +1038,21 @@ the way that extension code would use them. This function creates a string value in the @code{awk_value_t} variable pointed to by @code{result}. It expects @code{string} to be a C string constant (or other string data), and automatically creates a @emph{copy} of the data -for storage in @code{result}. +for storage in @code{result}. It returns @code{result}. @item static inline awk_value_t * @itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) This function creates a string value in the @code{awk_value_t} variable pointed to by @code{result}. 
It expects @code{string} to be a @samp{char *} value pointing to data previously obtained from @code{malloc()}. The idea here -is that the data will be passed directly to @command{gawk}, which will assume -responsibility for it. +is that the data is passed directly to @command{gawk}, which assumes +responsibility for it. It returns @code{result}. @item static inline awk_value_t * @itemx make_null_string(awk_value_t *result) This specialized function creates a null string (the ``undefined'' value) in the @code{awk_value_t} variable pointed to by @code{result}. +It returns @code{result}. @item static inline awk_value_t * @itemx make_number(double num, awk_value_t *result) @@ -1098,22 +1098,24 @@ make_malloced_string(message, strlen(message), & result); @end example @item erealloc(pointer, type, size, message) +This is like @code{emalloc()}, but it calls @code{realloc()}, +instead of @code{malloc()}. The arguments are the same as for the @code{emalloc()} macro. @end table @node Registration Functions @subsection Registration Functions -This @value{SECTION} describes the API functions which let you -register parts of your extension with @command{gawk}. +This @value{SECTION} describes the API functions for +registering parts of your extension with @command{gawk}. @menu * Extension Functions:: Registering extension functions. +* Exit Callback Functions:: Registering an exit callback. +* Extension Version String:: Registering a version string. * Input Parsers:: Registering an input parser. * Output Wrappers:: Registering an output wrapper. * Two-way processors:: Registering a two-way processor. -* Exit Callback Functions:: Registering an exit callback. -* Extension Version String:: Registering a version string. @end menu @node Extension Functions @@ -1134,13 +1136,14 @@ The fields are: @table @code @item const char *name; The name of the new function. -@command{awk} level code will call the function by this name. +@command{awk} level code calls the function by this name. 
+This is a regular C string. @item awk_value_t *(*function)(int num_actual_args, awk_value_t *result); This is a pointer to the C function that provides the desired functionality. The function must fill in the result with either a number -or a string. @command{awk takes ownership of any string memory}. +or a string. @command{awk} takes ownership of any string memory. As mentioned earlier, string memory @strong{must} come from @code{malloc()}. The function must return the value of @code{result}. @@ -1161,16 +1164,64 @@ it with @command{gawk} using this API function: This function returns true upon success, false otherwise. The @code{namespace} parameter is currently not used; you should pass in an empty string (@code{""}). The @code{func} pointer is the address of a -@code{struct} describing your function, as just described. +@code{struct} representing your function, as just described. @end table +@node Exit Callback Functions +@subsubsection Registering An Exit Callback Function + +An @dfn{exit callback} function is a function that +@command{gawk} calls before it exits. +Such functions are useful if you have general ``clean up'' tasks +that should be performed in your extension (such as closing data +base connections or other resource deallocations). +You can register such +a function with @command{gawk} using the following function. + +@table @code +@item void awk_atexit(void (*funcp)(void *data, int exit_status), +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0); +The parameters are: +@c nested table +@table @code +@item funcp +A pointer to the function to be called before @command{gawk} exits. The @code{data} +parameter will be the original value of @code{arg0}. +The @code{exit_status} parameter is +the exit status value that @command{gawk} will pass to the @code{exit()} system call. + +@item arg0 +A pointer to private data which @command{gawk} saves in order to pass to +the function pointed to by @code{funcp}. 
+@end table +@end table + +Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in +the reverse order in which they are registered with @command{gawk}. + +@node Extension Version String +@subsubsection Registering An Extension Version String + +You can register a version string which indicates the name and +version of your extension, with @command{gawk}, as follows: + +@table @code +@item void register_ext_version(const char *version); +Register the string pointed to by @code{version} with @command{gawk}. +@command{gawk} does @emph{not} copy the @code{version} string, so +it should not be changed. +@end table + +@command{gawk} prints all registered extension version strings when it +is invoked with the @option{--version} option. + @node Input Parsers @subsubsection Customized Input Parsers By default, @command{gawk} reads text files as its input. It uses the value of @code{RS} to find the end of the record, and then uses @code{FS} -(or @code{FIELDWIDTHS}) to split it into fields. Additionally, it sets -the value of @code{RT} (@pxref{Built-in Variables}). +(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}). +Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}). If you want, you can provide your own, custom, input parser. An input parser's job is to return a record to the @command{gawk} record processing @@ -1185,8 +1236,8 @@ To provide an input parser, you must first provide two functions This function examines the information available in @code{iobuf} (which we discuss shortly). Based on the information there, it decides if the input parser should be used for this file. -If so, it should return true (non-zero). Otherwise, it should -return false (zero). +If so, it should return true. Otherwise, it should return false. +It should not change any state (variable values, etc.) within @command{gawk}. 
@item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf) When @command{gawk} decides to hand control of the file over to the @@ -1210,6 +1261,23 @@ typedef struct input_parser @{ @} awk_input_parser_t; @end example +The fields are: + +@table @code +@item const char *name; +The name of the input parser. This is a regular C string. + +@item awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); +A pointer to your @code{@var{XXX}_can_take_file()} function. + +@item awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); +A pointer to your @code{@var{XXX}_take_control_of()} function. + +@item awk_const struct input_parser *awk_const next; +This pointer is used by @command{gawk}. +The extension cannot modify it. +@end table + The steps are as follows: @enumerate @@ -1231,9 +1299,9 @@ typedef struct awk_input @{ int fd; /* file descriptor */ #define INVALID_HANDLE (-1) void *opaque; /* private data for input parsers */ - int (*get_record)(char **out, struct awk_input *, int *errcode, - char **rt_start, size_t *rt_len); - void (*close_func)(struct awk_input *); + int (*get_record)(char **out, struct awk_input *iobuf, + int *errcode, char **rt_start, size_t *rt_len); + void (*close_func)(struct awk_input *iobuf); struct stat sbuf; /* stat buf */ @} awk_input_buf_t; @end example @@ -1249,12 +1317,12 @@ The name of the file. @item int fd; A file descriptor for the file. If @command{gawk} was able to -open the file, then it will @emph{not} be equal to +open the file, then @code{fd} will @emph{not} be equal to @code{INVALID_HANDLE}. Otherwise, it will. @item struct stat sbuf; If file descriptor is valid, then @command{gawk} will have filled -in this structure with a call to the @code{fstat()} system call. +in this structure via a call to the @code{fstat()} system call. 
@end table The @code{@var{XXX}_can_take_file()} function should examine these @@ -1266,7 +1334,7 @@ file, whether or not the file descriptor is valid, the information in the @code{struct stat}, or any combination of the above. Once @code{@var{XXX}_can_take_file()} has returned true, and -@command{gawk} has decided to use your input parser, it will call +@command{gawk} has decided to use your input parser, it calls @code{@var{XXX}_take_control_of()}. That function then fills in at least the @code{get_record} field of the @code{awk_input_buf_t}. It must also ensure that @code{fd} is not set to @code{INVALID_HANDLE}. All of @@ -1279,24 +1347,28 @@ This is used to hold any state information needed by the input parser for this file. It is ``opaque'' to @command{gawk}. The input parser is not required to use this pointer. -@item int (*get_record)(char **out, struct awk_input *, int *errcode, -@itemx char **rt_start, size_t *rt_len); -This is a function pointer that should be set to point to the -function that creates the input records. -Said function is the core of the input parser. Its behavior is -described below. +@item int@ (*get_record)(char@ **out, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len); +This function pointer should point to a function that creates the input +records. Said function is the core of the input parser. Its behavior +is described below. + +@item void (*close_func)(struct awk_input *iobuf); +This function pointer should point to a function that does +the ``tear down.'' It should release any resources allocated by +@code{@var{XXX}_take_control_of()}. It may also close the file. If it +does so, it shold set the @code{fd} field to @code{INVALID_HANDLE}. 
-@item void (*close_func)(struct awk_input *); -This is a function pointer that should be set to point to the -function that does the ``tear down.'' It should release any resources -allocated by @code{@var{XXX}_take_control_of()}. It may also close -the file. If it does so, it shold set the @code{fd} field to -@code{INVALID_HANDLE}. +If @code{fd} is still not @code{INVALID_HANDLE} after the call to this +function, @command{gawk} calls the regular @code{close()} system call. Having a ``tear down'' function is optional. If your input parser does -not need it, do not set this field. In that case, @command{gawk} -will close the regular @code{close()} system call on the -file descriptor, so it should be valid. +not need it, do not set this field. Then, @command{gawk} calls the +regular @code{close()} system call on the file descriptor, so it should +be valid. @end table The @code{@var{XXX}_get_record()} function does the work of creating @@ -1305,7 +1377,7 @@ input records. The parameters are as follows: @table @code @item char **out This is a pointer to a @code{char *} variable which is set to point -to the record. @command{gawk} will make its own copy of the data, so +to the record. @command{gawk} makes its own copy of the data, so the extension must manage this storage. @item struct awk_input *iobuf @@ -1337,13 +1409,13 @@ to zero, so there is no need to set it unless an error occurs. If an error does occur, the function should return @code{EOF} and set @code{*errcode} to a non-zero value. In that case, if @code{*errcode} -does not equal @minus{}1, @command{gawk|} will automatically update +does not equal @minus{}1, @command{gawk} automatically updates the @code{ERRNO} variable based on the value of @code{*errcode} (e.g., setting @samp{*errcode = errno} should do the right thing). -@command{gawk} ships with a sample extension (@pxref{Extension Sample -Readdir}) that reads directories, returning records for each entry in -the directory. 
You may wish to use that code as a guide for writing +@command{gawk} ships with a sample extension that reads directories, +returning records for each entry in the directory (@pxref{Extension +Sample Readdir}). You may wish to use that code as a guide for writing your own input parser. When writing an input parser, you should think about (and document) @@ -1352,9 +1424,9 @@ it to always be called, and take effect as appropriate (as the @code{readdir} extension does). Or you may want it to take effect based upon the value of an @code{awk} variable, as the XML extension from the @code{gawkextlib} project does (@pxref{gawkextlib}). -In the latter case, code in a @code{BEGINFILE} section (@pxref{BEGINFILE/ENDFILE}). +In the latter case, code in a @code{BEGINFILE} section can look at @code{FILENAME} and @code{ERRNO} to decide whether or -not to activate an input parser. +not to activate an input parser (@pxref{BEGINFILE/ENDFILE}). You register your input parser with the following function: @@ -1368,8 +1440,8 @@ Register the input parser pointed to by @code{input_parser} with @subsubsection Customized Output Wrappers An @dfn{output wrapper} is the mirror image of an input parser. -It allows an extension to take over the output to a file (opened -with the @samp{>} or @samp{>>} operators, @pxref{Redirection}). +It allows an extension to take over the output to a file opened +with the @samp{>} or @samp{>>} operators (@pxref{Redirection}). The output wrapper is very similar to the input parser structure: @@ -1432,7 +1504,7 @@ The data members are as follows: The name of the output file. @item const char *mode; -The mode string (as would be used in the second argument to @code{fopen()} +The mode string (as would be used in the second argument to @code{fopen()}) with which the file was opened. @item FILE *fp; @@ -1440,7 +1512,7 @@ The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file before attempting to find an output wrapper. 
@item awk_bool_t redirected; -The field should be set to true in the @code{@var{XXX}_take_control_of()} function. +This field must be set to true by the @code{@var{XXX}_take_control_of()} function. @item void *opaque; This pointer is opaque to @command{gawk}. The extension should use it to store @@ -1481,7 +1553,7 @@ Register the output wrapper pointed to by @code{output_wrapper} with A @dfn{two-way processor} combines an input parser and an output wrapper for two-way I/O with the @samp{|&} operator (@pxref{Redirection}). It makes identical -use of the @code{awk_input_parser_t} and @code{awk_output_buf_t} structures, +use of the @code{awk_input_parser_t} and @code{awk_output_buf_t} structures as described earlier. A two-way processor is represented by the following structure: @@ -1490,7 +1562,9 @@ A two-way processor is represented by the following structure: typedef struct two_way_processor @{ const char *name; /* name of the two-way processor */ awk_bool_t (*can_take_two_way)(const char *name); - awk_bool_t (*take_control_of)(const char *name, awk_input_buf_t *inbuf, awk_output_buf_t *outbuf); + awk_bool_t (*take_control_of)(const char *name, + awk_input_buf_t *inbuf, + awk_output_buf_t *outbuf); awk_const struct two_way_processor *awk_const next; /* for use by gawk */ @} awk_two_way_processor_t; @end example @@ -1502,9 +1576,13 @@ The fields are as follows: The name of the two-way processor. @item awk_bool_t (*can_take_two_way)(const char *name); -This function returns true if it wants to take over the two-way I/O for this filename. +This function returns true if it wants to take over two-way I/O for this filename. +It should not change any state (variable +values, etc.) within @command{gawk}. 
-@item awk_bool_t (*take_control_of)(const char *name, awk_input_buf_t *inbuf, awk_output_buf_t *outbuf); +@item awk_bool_t (*take_control_of)(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf); This function should fill in the @code{awk_input_buf_t} and @code{awk_outut_buf_t} structures pointed to by @code{inbuf} and @code{outbuf}, respectively. These structures were described earlier. @@ -1525,52 +1603,6 @@ Register the two-way processor pointed to by @code{two_way_processor} with @command{gawk}. @end table -@node Exit Callback Functions -@subsubsection Registering An Exit Callback Function - -An @dfn{exit callback} function is a function that -@command{gawk} calls before it exits. -Such functions are useful if you have general ``clean up'' tasks -that should be performed in your extension (such as closing data -base connections or other resource deallocations). -You can register such -a function with @command{gawk} using the following function. - -@table @code -@item void awk_atexit(void (*funcp)(void *data, int exit_status), -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0); -The parameters are: -@c nested table -@table @code -@item funcp -Points to the function to be called before @command{gawk} exits. The @code{data} -parameter will be the original value of @code{arg0}. -The @code{exit_status} parameter is -the exit status value that @command{gawk} will pass to the @code{exit()} system call. - -@item arg0 -A pointer to private data which @command{gawk} saves in order to pass to -the function pointed to by @code{funcp}. -@end table -@end table - -Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in -the reverse order in which they are registered with @command{gawk}. 
- -@node Extension Version String -@subsubsection Registering An Extension Version String - -You can register a version string which indicates the name and -version of your extension, with @command{gawk}, as follows: - -@table @code -@item void register_ext_version(const char *version); -Register the string pointed to by @code{version} with @command{gawk}. -@end table - -@command{gawk} prints all registered extension version strings when it -is invoked with the @option{--version} option. - @node Printing Messages @subsection Printing Messages @@ -1591,7 +1623,7 @@ Print a warning message. @item void lintwarn(awk_ext_id_t id, const char *format, ...); Print a ``lint warning.'' Normally this is the same as printing a warning message, but if @command{gawk} was invoked with @samp{--lint=fatal}, -then they become fatal error messages. +then lint warnings become fatal error messages. @end table All of these functions are otherwise like the C @code{printf()} @@ -1602,18 +1634,18 @@ with literal characters and formatting codes intermixed. @subsection Updating @code{ERRNO} The following functions allow you to update the @code{ERRNO} -variable. +variable: @table @code @item void update_ERRNO_int(int errno_val); Set @code{ERRNO} to the string equivalent of the error code in @code{errno_val}. The value should be one of the defined -error codes in @code{<errno.h>}, and @command{gawk} will turn it +error codes in @code{<errno.h>}, and @command{gawk} turns it into a (possibly translated) string using the C @code{strerror()} function. @item void update_ERRNO_string(const char *string); Set @code{ERRNO} directly to the string value of @code{ERRNO}. -@command{gawk} will make a copy of the value of @code{string}. +@command{gawk} makes a copy of the value of @code{string}. @item void unset_ERRNO(); Unset @code{ERRNO}. @@ -1674,7 +1706,7 @@ In the latter case, @code{result->val_type} indicates the actual type. 
@item awk_bool_t sym_update(const char *name, awk_value_t *value); Update the variable named by the string @code{name}, which is a regular -C string. The variable will be added to @command{gawk}'s symbol table +C string. The variable is added to @command{gawk}'s symbol table if it is not there. Return true if everything worked, false otherwise. Changing types (scalar to array or vice versa) of an existing variable @@ -1715,7 +1747,7 @@ Return false if the value cannot be retrieved. @item awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value); Update the value associated with a scalar cookie. -Return will be false if the new value is not one of +Return false if the new value is not one of @code{AWK_STRING} or @code{AWK_NUMBER}. Here too, the built-in variables may not be updated. @end table @@ -1886,7 +1918,7 @@ Using value cookies in this way saves considerable storage, since all of You might be wondering, ``Is this sharing problematic? What happens if @command{awk} code assigns a new value to @code{VAR1}, -will all the others be changed too?'' +are all the others be changed too?'' That's a great question. The answer is that no, it's not a problem. @command{gawk} is smart enough to avoid such problems. @@ -1962,7 +1994,7 @@ that traverses the list. @itemx @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */ @itemx @} awk_flat_array_t; This is a flattened array. When an extension gets one of these -from @command{gawk}, the @code{elements} array will be of actual +from @command{gawk}, the @code{elements} array is of actual size @code{count}. The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk}; therefore they are marked @code{awk_const} so that the extension cannot @@ -1987,7 +2019,7 @@ Return false if there is an error. For the array represented by @code{a_cookie}, return in @code{*result} the value of the element whose index is @code{index}. 
The value for @code{index} can be numeric, in which case @command{gawk} -will convert it to a string. Using non-integral values is possible, but +converts it to a string. Using non-integral values is possible, but requires that you understand how such values are converted to strings (@pxref{Conversion}); thus using integral values is safest. @code{wanted} specifies the type of value you wish to retrieve. @@ -1996,7 +2028,7 @@ Return false if @code{wanted} does not match the actual type or if As with @emph{all} strings passed into @code{gawk} from an extension, the string value of @code{index} must come from @code{malloc()}, and -@command{gawk} will release the storage. +@command{gawk} releases the storage. @item awk_bool_t set_array_element(awk_array_t a_cookie, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index, @@ -2201,7 +2233,7 @@ have this flag bit set. The sixth step is to release the flattened array. This tells @command{gawk} that the extension is no longer using the array, and that it should delete any elements marked for deletion. -@command{gawk} will also free any storage that was allocated, +@command{gawk} also frees any storage that was allocated, so you should not use the pointer (@code{flat_array} in this code) once you have called @code{release_flattened_array()}: @@ -2228,12 +2260,12 @@ Here is the output from running this part of the test: pets has 5 elements dump_array_and_delete: sym_lookup of pets passed dump_array_and_delete: incoming size is 5 - pets["1"] = "blacky" - pets["2"] = "rusty" - pets["3"] = "sophie" + pets["1"] = "blacky" + pets["2"] = "rusty" + pets["3"] = "sophie" dump_array_and_delete: marking element "3" for deletion - pets["4"] = "raincloud" - pets["5"] = "lucky" + pets["4"] = "raincloud" + pets["5"] = "lucky" dump_array_and_delete(pets) returned 1 dump_array_and_delete() did remove index "3"! 
@end example @@ -2437,7 +2469,7 @@ $ @kbd{AWKLIBPATH=$PWD ./gawk -f foo.awk} @end example @node Extension API Variables -@subsection Variables +@subsection API Variables The API provides two sets of variables. The first provides information about the version of the API (both with which the extension was compiled, @@ -2512,23 +2544,23 @@ whether the corresponding command-line options were enabled when @table @code @item do_lint -This variable will be true if the @option{--lint} option was passed +This variable is true if the @option{--lint} option was passed (@pxref{Options}). @item do_traditional -This variable will be true if the @option{--traditional} option was passed. +This variable is true if the @option{--traditional} option was passed. @item do_profile -This variable will be true if the @option{--profile} option was passed. +This variable is true if the @option{--profile} option was passed. @item do_sandbox -This variable will be true if the @option{--sandbox} option was passed. +This variable is true if the @option{--sandbox} option was passed. @item do_debug -This variable will be true if the @option{--debug} option was passed. +This variable is true if the @option{--debug} option was passed. @item do_mpfr -This variable will be true if the @option{--bignum} option was passed. +This variable is true if the @option{--bignum} option was passed. @end table The value of @code{do_lint} can change if @command{awk} code @@ -3195,7 +3227,7 @@ implement system calls such as @code{chown()}, @code{chmod()}, and @code{umask()}. 
@node Using Internal File Ops -@subsection Integrating the Extensions +@subsection Integrating The Extensions @cindex @command{gawk}, interpreter@comma{} adding code to Now that the code is written, it must be possible to add it at @@ -3277,7 +3309,7 @@ $ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk} @end example @node Extension Samples -@section The Sample Extensions in the @command{gawk} Distribution +@section The Sample Extensions In The @command{gawk} Distribution This @value{SECTION} provides brief overviews of the sample extensions that come in the @command{gawk} distribution. Some of them are intended @@ -3587,7 +3619,7 @@ if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH) @end example @node Extension Sample Fork -@subsection Interface to @code{fork()}, @code{wait()} and @code{waitpid()} +@subsection Interface To @code{fork()}, @code{wait()} and @code{waitpid()} The @code{fork} extension adds three functions, as follows. @@ -3713,7 +3745,7 @@ On GNU/Linux systems, there are filesystems that don't support the @code{d_type} entry (see the @i{readdir}(3) manual page), and so the file type is always @code{u}. Therefore, using @samp{readdir_do_ftype("stat")} is advisable even on GNU/Linux systems. In this case, the @code{readdir} -extension will fall back to using @code{lstat()} when it encounters an +extension falls back to using @code{lstat()} when it encounters an unknown file type. @end quotation @@ -3790,7 +3822,7 @@ Here too, the return value is 1 on success and 0 on failure. The array created by @code{reada()} is identical to that written by @code{writea()} in the sense that the contents are the same. However, due to implementation issues, the array traversal order of the recreated -array will likely be different from that of the original array. As array +array is likely to be different from that of the original array. As array traversal order in @command{awk} is by default undefined, this is not (technically) a problem. 
If you need to guarantee a particular traversal order, use the array sorting features in @command{gawk} to do so. @@ -3967,6 +3999,7 @@ make && make check @ii{Build and check that all is OK} * String Functions:: String-Manipulation Functions. * Glossary:: Glossary. * Copying:: GNU General Public License. +* Reading Files:: Reading Input Files. @end menu @node Reference to Elements @@ -4008,4 +4041,7 @@ make && make check @ii{Build and check that all is OK} @node Copying @section GNU General Public License +@node Reading Files +@section Reading Input Files + @bye
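
The "Updating ERRNO" hunk above describes three calls: `update_ERRNO_int()` converts an error code to a string with `strerror()`, `update_ERRNO_string()` stores a copy of the caller's string, and `unset_ERRNO()` clears the variable. The following is a minimal standalone sketch of that documented behavior, not gawk's implementation. It does not include `gawkapi.h`; the `fake_ERRNO` variable and every `sketch_*` function are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gawk's ERRNO variable; not part of the real API. */
static char *fake_ERRNO = NULL;

/* Mirrors the documented update_ERRNO_string(): store a private copy,
 * so the caller keeps ownership of its own buffer. */
static void sketch_update_ERRNO_string(const char *string)
{
    char *copy = malloc(strlen(string) + 1);

    if (copy == NULL)
        return;                 /* leave the old value in place on failure */
    strcpy(copy, string);
    free(fake_ERRNO);
    fake_ERRNO = copy;
}

/* Mirrors the documented update_ERRNO_int(): convert the code with strerror(). */
static void sketch_update_ERRNO_int(int errno_val)
{
    sketch_update_ERRNO_string(strerror(errno_val));
}

/* Mirrors the documented unset_ERRNO(). */
static void sketch_unset_ERRNO(void)
{
    free(fake_ERRNO);
    fake_ERRNO = NULL;
}

/* Exercises the copy semantics: changing the caller's buffer afterwards
 * must not change the stored value. Returns 0 on success. */
static int sketch_demo(void)
{
    char buf[] = "custom failure";

    sketch_update_ERRNO_string(buf);
    buf[0] = 'X';               /* caller reuses its buffer */
    assert(fake_ERRNO != NULL);
    assert(strcmp(fake_ERRNO, "custom failure") == 0);

    sketch_unset_ERRNO();
    assert(fake_ERRNO == NULL);
    return 0;
}
```

In a real extension these calls would go through the API after including `gawkapi.h`; the sketch only illustrates why the copy matters, since the extension may free or reuse its own string immediately after the call.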