diff options
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 165 |
1 files changed, 160 insertions, 5 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 27cbcab2..e9d987f0 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -604,6 +604,7 @@ particular records in a file and perform operations upon them. @code{getline}. * Getline Summary:: Summary of @code{getline} Variants. * Read Timeout:: Reading input with a timeout. +* Retrying Input:: Retrying input after certain errors. * Command-line directories:: What happens if you put a directory on the command line. * Input Summary:: Input summary. @@ -945,6 +946,7 @@ particular records in a file and perform operations upon them. * Array Functions:: Functions for working with arrays. * Flattening Arrays:: How to flatten arrays. * Creating Arrays:: How to create and populate arrays. +* Redirection API:: How to access and manipulate redirections. * Extension API Variables:: Variables provided by the API. * Extension Versioning:: API Version information. * Extension API Informational Variables:: Variables providing information about @@ -6327,6 +6329,7 @@ used with it do not have to be named on the @command{awk} command line * Getline:: Reading files under explicit program control using the @code{getline} function. * Read Timeout:: Reading input with a timeout. +* Retrying Input:: Retrying input after certain errors. * Command-line directories:: What happens if you put a directory on the command line. * Input Summary:: Input summary. @@ -8113,6 +8116,13 @@ a record, such as a file that cannot be opened, then @code{getline} returns @minus{}1. In this case, @command{gawk} sets the variable @code{ERRNO} to a string describing the error that occurred. +If @code{ERRNO} indicates that the I/O operation may be +retried, and @code{PROCINFO["@var{input}", "RETRY"]} is set, +then @code{getline} returns @minus{}2 +instead of @minus{}1, and further calls to @code{getline} +may be attemped. @DBXREF{Retrying Input} for further information about +this feature. + In the following examples, @var{command} stands for a string value that represents a shell command. @@ -8767,7 +8777,8 @@ on a per-command or per-connection basis. the attempt to read from the underlying device may succeed in a later attempt. This is a limitation, and it also means that you cannot use this to multiplex input from -two or more sources. +two or more sources. @DBXREF{Retrying Input} for a way to enable +later I/O attempts to succeed. Assigning a timeout value prevents read operations from blocking indefinitely. But bear in mind that there are other ways @@ -8777,6 +8788,35 @@ a connection before it can start reading any data, or the attempt to open a FIFO special file for reading can block indefinitely until some other process opens it for writing. +@node Retrying Input +@section Retrying Reads After Certain Input Errors +@cindex retrying input + +@cindex differences in @command{awk} and @command{gawk}, retrying input +This @value{SECTION} describes a feature that is specific to @command{gawk}. + +When @command{gawk} encounters an error while reading input, by +default @code{getline} returns @minus{}1, and subsequent attempts to +read from that file result in an end-of-file indication. However, you +may optionally instruct @command{gawk} to allow I/O to be retried when +certain errors are encountered by setting a special element in +the @code{PROCINFO} array (@pxref{Auto-set}): + +@example +PROCINFO["@var{input_name}", "RETRY"] = 1 +@end example + +When this element exists, @command{gawk} checks the value of the system +@code{errno} variable when an I/O error occurs. If @code{errno} indicates +a subsequent I/O attempt may succeed, @code{getline} instead returns +@minus{}2 and +further calls to @code{getline} may succeed. This applies to the @code{errno} +values @code{EAGAIN}, @code{EWOULDBLOCK}, @code{EINTR}, or @code{ETIMEDOUT}. + +This feature is useful in conjunction with +@code{PROCINFO["@var{input_name}", "READ_TIMEOUT"]} or situations where a file +descriptor has been configured to behave in a non-blocking fashion. + @node Command-line directories @section Directories on the Command Line @cindex differences in @command{awk} and @command{gawk}, command-line directories @@ -14928,6 +14968,11 @@ value to be meaningful when an I/O operation returns a failure value, such as @code{getline} returning @minus{}1. You are, of course, free to clear it yourself before doing an I/O operation. +If the value of @code{ERRNO} corresponds to a system error in the C +@code{errno} variable, then @code{PROCINFO["errno"]} will be set to the value +of @code{errno}. For non-system errors, @code{PROCINFO["errno"]} will +be zero. + @cindex @code{FILENAME} variable @cindex dark corner, @code{FILENAME} variable @item @code{FILENAME} @@ -14996,6 +15041,10 @@ are guaranteed to be available: @item PROCINFO["egid"] The value of the @code{getegid()} system call. +@item PROCINFO["errno"] +The value of the C @code{errno} variable when @code{ERRNO} is set to +the associated error message. + @item PROCINFO["euid"] @cindex effective user ID of @command{gawk} user The value of the @code{geteuid()} system call. @@ -15046,7 +15095,7 @@ while the program runs. The value of the @code{getgid()} system call. @item PROCINFO["pgrpid"] -@cindex process group idIDof @command{gawk} process +@cindex process group ID of @command{gawk} process The process group ID of the current process. @item PROCINFO["pid"] @@ -15098,7 +15147,7 @@ The version of the GNU MP library. The maximum precision supported by MPFR. @item PROCINFO["prec_min"] -@cindex minimum precision supported by MPFR library +@cindex minimum precision required by MPFR library The minimum precision required by MPFR. @end table @@ -15135,6 +15184,11 @@ open input file, pipe, or coprocess. @DBXREF{Read Timeout} for more information. @item +It may be used to indicate that input may be retried when it fails due to +certain errors. +@DBXREF{Retrying Input} for more information. + +@item It may be used to cause coprocesses to communicate over pseudo-ttys instead of through two-way pipes; this is discussed further in @ref{Two-way I/O}. @@ -31571,6 +31625,7 @@ This (rather large) @value{SECTION} describes the API in detail. * Symbol Table Access:: Functions for accessing global variables. * Array Manipulation:: Functions for working with arrays. +* Redirection API:: How to access and manipulate redirections. * Extension API Variables:: Variables provided by the API. * Extension API Boilerplate:: Boilerplate code for using the API. @end menu @@ -31646,6 +31701,10 @@ Clearing an array @item Flattening an array for easy C-style looping over all its indices and elements @end itemize + +@item +Accessing and manipulating redirections. + @end itemize Some points about using the API: @@ -33616,6 +33675,75 @@ $ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk} (@DBXREF{Finding Extensions} for more information on the @env{AWKLIBPATH} environment variable.) +@node Redirection API +@subsection Accessing and Manipulating Redirections + +The following function allows extensions to access and manipulate redirections. + +@table @code +@item awk_bool_t get_file(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t name_len, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const char *filetype, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int fd, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_input_buf_t **ibufp, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_output_buf_t **obufp); +Look up a file in @command{gawk}'s internal redirection table. +If @code{name} is @code{NULL} or @code{name_len} is zero, return +data for the currently open input file corresponding to @code{FILENAME}. +(This does not access the @code{filetype} argument, so that may be undefined). +If the file is not already open, attempt to open it. +The @code{filetype} argument must be zero-terminated and should be one of: + +@table @code +@item ">" +A file opened for output. + +@item ">>" +A file opened for append. + +@item "<" +A file opened for input. + +@item "|>" +A pipe opened for output. + +@item "|<" +A pipe opened for input. + +@item "|&" +A two-way coprocess. +@end table + +On error, return a @code{false} value. Otherwise, return +@code{true}, and return additional information about the redirection +in the @code{ibufp} and @code{obufp} pointers. For input +redirections, the @code{*ibufp} value should be non-@code{NULL}, +and @code{*obufp} should be @code{NULL}. For output redirections, +the @code{*obufp} value should be non-@code{NULL}, and @code{*ibufp} +should be @code{NULL}. For two-way coprocesses, both values should +be non-@code{NULL}. + +In the usual case, the extension is interested in @code{(*ibufp)->fd} +and/or @code{fileno((*obufp)->fp)}. If the file is not already +open, and the @code{fd} argument is non-negative, @command{gawk} +will use that file descriptor instead of opening the file in the +usual way. If @code{fd} is non-negative, but the file exists already, +@command{gawk} ignores @code{fd} and returns the existing file. It is +the caller's responsibility to notice that neither the @code{fd} in +the returned @code{awk_input_buf_t} nor the @code{fd} in the returned +@code{awk_output_buf_t} matches the requested value. + +Note that supplying a file descriptor is currently @emph{not} supported +for pipes. However, supplying a file descriptor should work for input, +output, append, and two-way (coprocess) sockets. If @code{filetype} +is two-way, @command{gawk} assumes that it is a socket! Note that in +the two-way case, the input and output file descriptors may differ. +To check for success, you must check whether either matches. +@end table + +It is anticipated that this API function will be used to implement I/O +multiplexing and a socket library. + @node Extension API Variables @subsection API Variables @@ -34830,11 +34958,16 @@ properly: # Please set INPLACE_SUFFIX to make a backup copy. For example, you may # want to set INPLACE_SUFFIX to .bak on the command line or in a BEGIN rule. +# N.B. We call inplace_end() in the BEGINFILE and END rules so that any +# actions in an ENDFILE rule will be redirected as expected. + BEGINFILE @{ - inplace_begin(FILENAME, INPLACE_SUFFIX) + if (_inplace_filename != "") + inplace_end(_inplace_filename, INPLACE_SUFFIX) + inplace_begin(_inplace_filename = FILENAME, INPLACE_SUFFIX) @} -ENDFILE @{ +END @{ inplace_end(FILENAME, INPLACE_SUFFIX) @} @end group @@ -34849,6 +34982,10 @@ If @code{INPLACE_SUFFIX} is not an empty string, the original file is linked to a backup @value{FN} created by appending that suffix. Finally, the temporary file is renamed to the original @value{FN}. +The @code{_inplace_filename} variable serves to keep track of the +current filename so as to not invoke @code{inplace_end()} before +processing the first file. + If any error occurs, the extension issues a fatal error to terminate processing immediately without damaging the original file. @@ -35343,6 +35480,24 @@ Add functions to implement system calls such as @code{chown()}, @code{chmod()}, and @code{umask()} to the file operations extension presented in @ref{Internal File Ops}. +@c Idea from comp.lang.awk, February 2015 +@item +Write an input parser that prints a prompt if the input is +a from a ``terminal'' device. You can use the @code{isatty()} +function to tell if the input file is a terminal. (Hint: this function +is usually expensive to call; try to call it just once.) +The content of the prompt should come from a variable settable +by @command{awk}-level code. +You can write the prompt to stanard error. However, +for best results, open a new file descriptor (or file pointer) +on @file{/dev/tty} and print the prompt there, in case standard +error has been redirected. + +Why is standard error a better +choice than standard output for writing the prompt? +Which reading mechanism should you replace, the one to get +a record, or the one to read raw bytes? + @item (Hard.) How would you provide namespaces in @command{gawk}, so that the |